Reinforcement Learning Archive - Matsuo-Iwasawa Laboratory, The University of Tokyo (Matsuo Lab)

2026

International conference

Reinforcement learning

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno, Toshinori Kitamura, Shin Ishii, Yutaka Matsuo

Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), July 2026

2026

International conference

LLM / NLP, Reinforcement learning

Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?

Ku Onoda, Paavo Parmas, Manato Yaguchi, Yutaka Matsuo

International Conference on Learning Representations 2026 (ICLR 2026), April 2026

2026

International conference

LLM / NLP, Reinforcement learning

Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning

Ru Wang, Wei Huang, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

International Conference on Learning Representations 2026 (ICLR 2026), April 2026

2025

International conference

Robotics, Reinforcement learning

The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning

Jorge de Heuvel, Daniel Marta, Simon Holk, Iolanda Leite, Maren Bennewitz

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025), October 2025

2025

International conference

Reinforcement learning theory

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Wataru Kumagai, Kazumi Kasaura, Kenta Hoshino, Yohei Hosoe, Yutaka Matsuo

Advances in Neural Information Processing Systems (NeurIPS 2025), Spotlight, December 2025

2025

International conference

Reinforcement learning theory

A Unified MDP Framework for Solving Robust, Convex, Multi-Discount Constraints, and Beyond

Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Wataru Kumagai, Paavo Parmas, Yutaka Matsuo

Proceedings of the Reinforcement Learning Conference (RLC 2025), Finding the Frame Workshop, June 2025

2025

International conference

Robotics, Reinforcement learning

FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions

Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), May 2025

2025

International conference

Robotics, Reinforcement learning

Learning Whole-Body Control for End-Effector Stabilization of a Quadruped Manipulator

Yanzhou Jin, Yaonan Zhu, Yusuke Iwasawa, Yasuhisa Hasegawa, Yutaka Matsuo

Proceedings of the IEEE International Symposium on Micro-NanoMechatronics and Human Science (IEEE MHS 2025), July 2025

2025

Journal

World models, Reinforcement learning

Double Horizon Model-Based Policy Optimization

Akihiro Kubo, Paavo Parmas, Shin Ishii

Transactions on Machine Learning Research (TMLR), April 2025

2025

International conference

Reinforcement learning theory

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo

International Conference on Learning Representations (ICLR 2025)
