Research

  Publications: Reinforcement Learning

    • Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

      Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno, Toshinori Kitamura, Shin Ishii, Yutaka Matsuo

      Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), July 2026

    • Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?

      Ku Onoda, Paavo Parmas, Manato Yaguchi, Yutaka Matsuo

      International Conference on Learning Representations (ICLR 2026), April 2026

    • Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning

      Ru Wang, Wei Huang, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

      International Conference on Learning Representations (ICLR 2026), April 2026

    • Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

      Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Wataru Kumagai, Kazumi Kasaura, Kenta Hoshino, Yohei Hosoe, Yutaka Matsuo

      Advances in Neural Information Processing Systems (NeurIPS 2025), Spotlight, December 2025

    • A Unified MDP Framework for Solving Robust, Convex, Multi-Discount Constraints, and Beyond

      Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Wataru Kumagai, Paavo Parmas, Yutaka Matsuo

      Proceedings of the Reinforcement Learning Conference (RLC 2025), Finding the Frame Workshop, June 2025

    • FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions

      Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite

      Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), May 2025

    • Double Horizon Model-Based Policy Optimization

      Akihiro Kubo, Paavo Parmas, Shin Ishii

      Transactions on Machine Learning Research (TMLR), April 2025

    • Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

      Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo

      International Conference on Learning Representations (ICLR 2025)

    • Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

      Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo

      International Conference on Machine Learning (ICML 2023), July 2023

    • Generalized Decision Transformer for Offline Hindsight Information Matching

      Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu

      International Conference on Learning Representations (ICLR 2022), Spotlight