
Research
Publications
-
Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying
Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno, Toshinori Kitamura, Shin Ishii, Yutaka Matsuo
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), July 2026
-
Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
Ku Onoda, Paavo Parmas, Manato Yaguchi, Yutaka Matsuo
International Conference on Learning Representations 2026 (ICLR 2026), April 2026
-
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
Ru Wang, Wei Huang, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
International Conference on Learning Representations 2026 (ICLR 2026), April 2026
-
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation
Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Wataru Kumagai, Kazumi Kasaura, Kenta Hoshino, Yohei Hosoe, Yutaka Matsuo
Advances in Neural Information Processing Systems (NeurIPS 2025), Spotlight, December 2025
-
A Unified MDP Framework for Solving Robust, Convex, Multi-Discount Constraints, and Beyond
Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Wataru Kumagai, Paavo Parmas, Yutaka Matsuo
Proceedings of the Reinforcement Learning Conference (RLC 2025), Finding the Frame Workshop, June 2025
-
FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions
Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), May 2025
-
Double Horizon Model-Based Policy Optimization
Akihiro Kubo, Paavo Parmas, Shin Ishii
Transactions on Machine Learning Research (TMLR), April 2025
-
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo
International Conference on Learning Representations (ICLR 2025)
-
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo
International Conference on Machine Learning (ICML 2023), July 2023
-
Generalized Decision Transformer for Offline Hindsight Information Matching
Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu
International Conference on Learning Representations 2022 (ICLR 2022), Spotlight