
Research
Publications
-
Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying
Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno, Toshinori Kitamura, Shin Ishii, Yutaka Matsuo
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), July 2026
-
Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
Ku Onoda, Paavo Parmas, Manato Yaguchi, Yutaka Matsuo
International Conference on Learning Representations (ICLR 2026), April 2026
-
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
Ru Wang, Wei Huang, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
International Conference on Learning Representations (ICLR 2026), April 2026
-
The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning
Jorge de Heuvel, Daniel Marta, Simon Holk, Iolanda Leite, Maren Bennewitz
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025), October 2025
-
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation
Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Wataru Kumagai, Kazumi Kasaura, Kenta Hoshino, Yohei Hosoe, Yutaka Matsuo
Advances in Neural Information Processing Systems (NeurIPS 2025), Spotlight, December 2025
-
A Unified MDP Framework for Solving Robust, Convex, Multi-Discount Constraints, and Beyond
Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Wataru Kumagai, Paavo Parmas, Yutaka Matsuo
Proceedings of the Reinforcement Learning Conference (RLC 2025), Finding the Frame Workshop, June 2025
-
FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions
Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), May 2025
-
Learning Whole-Body Control for End-Effector Stabilization of a Quadruped Manipulator
Yanzhou Jin, Yaonan Zhu, Yusuke Iwasawa, Yasuhisa Hasegawa, Yutaka Matsuo
Proceedings of the IEEE International Symposium on Micro-NanoMechatronics and Human Science (IEEE MHS 2025), July 2025
-
Double Horizon Model-Based Policy Optimization
Akihiro Kubo, Paavo Parmas, Shin Ishii
Transactions on Machine Learning Research (TMLR), April 2025
-
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo
International Conference on Learning Representations (ICLR 2025)