◼︎Bibliographic Information
Ku Onoda, Paavo Paramas, Yutaka Matsuo, Efficient and Low-Bias Policy Gradient Estimation on Differentiable Simulators in Contact-Involving Environments, Proceedings of the 39th Annual Conference of the Japanese Society for Artificial Intelligence, 2025, Vol. JSAI2025, Session ID 1S5-GS-2-04, p. 1S5GS204.
◼︎Abstract
In policy gradient methods for reinforcement learning, using first-order gradient estimators on differentiable simulators accelerates learning compared with relying solely on zeroth-order estimators, which do not use derivatives. However, discontinuities in the objective function bias the first-order estimator and reduce its effectiveness. Existing methods construct a confidence interval around the zeroth-order estimate and use its range to detect discontinuities. The zeroth-order estimator, however, is noisy, is sample-inefficient, and requires task-specific hyperparameter tuning. This study therefore proposes a method called Discontinuity Detection Composite Gradient (DDCG), which detects discontinuities with a statistical test based on an assumption of smoothness and dynamically switches between gradient estimation methods accordingly. We evaluated the method on a control task in differentiable simulation, where it performed well with fixed hyperparameters and estimated gradients effectively even from a small number of samples.
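The idea of switching estimators when a discontinuity is detected can be illustrated with a minimal sketch on a one-dimensional toy objective. This is not the DDCG algorithm from the paper: the toy functions, the paired z-test, the threshold `z_threshold`, and all function names below are assumptions chosen only for illustration. The sketch relies on the fact that, for a smooth objective, the first-order (pathwise) and zeroth-order (score-function) estimators target the same gradient, so a large standardized difference between them suggests a discontinuity.

```python
import numpy as np

def smooth_f(x):
    return x ** 2

def smooth_df(x):
    return 2.0 * x

def step_f(x):
    # Same quadratic plus a unit jump at x = 0 (a stand-in for a contact event).
    return x ** 2 + (x > 0.0).astype(float)

def step_df(x):
    # Pathwise derivative: the jump contributes nothing almost everywhere.
    return 2.0 * x

def per_sample_gradients(f, df, theta, sigma, n, rng):
    """Per-sample estimates of d/dtheta E[f(theta + sigma * eps)], eps ~ N(0, 1)."""
    eps = rng.standard_normal(n)
    x = theta + sigma * eps
    first = df(x)                                   # first-order (pathwise)
    values = f(x)
    # Zeroth-order (score-function) estimate with a mean baseline;
    # the baseline adds a small O(1/n) bias but reduces variance.
    zeroth = (values - values.mean()) * eps / sigma
    return first, zeroth

def switched_gradient(f, df, theta, sigma=0.5, n=20000, z_threshold=4.0, seed=0):
    """Hypothetical switching rule: paired z-test on the estimator difference."""
    rng = np.random.default_rng(seed)
    first, zeroth = per_sample_gradients(f, df, theta, sigma, n, rng)
    diff = first - zeroth
    z = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    if abs(z) > z_threshold:
        # Estimators disagree: treat as a discontinuity, use the unbiased one.
        return zeroth.mean(), "zeroth-order", z
    return first.mean(), "first-order", z

if __name__ == "__main__":
    for name, f, df in [("smooth", smooth_f, smooth_df), ("step", step_f, step_df)]:
        grad, method, z = switched_gradient(f, df, theta=0.1)
        print(f"{name}: grad={grad:.3f} via {method} (z={z:.1f})")
```

For the step objective with `theta=0.1` and `sigma=0.5`, the true gradient is 2θ + φ(θ/σ)/σ ≈ 0.98, whereas the pathwise estimator converges to 2θ = 0.2, so the bias that the abstract describes is directly visible. The large sample count here is only to make the toy test stable; it says nothing about the sample efficiency reported for DDCG.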
