Research



    Research area: Reinforcement Learning

    • Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

      Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno, Toshinori Kitamura, Shin Ishii, Yutaka Matsuo

      Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), July 2026

    • Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?

      Ku Onoda, Paavo Parmas, Manato Yaguchi, Yutaka Matsuo

      International Conference on Learning Representations (ICLR 2026), April 2026

    • Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning

      Ru Wang, Wei Huang, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

      International Conference on Learning Representations (ICLR 2026), April 2026

    • The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning

      Jorge de Heuvel, Daniel Marta, Simon Holk, Iolanda Leite, Maren Bennewitz

      Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025), October 2025

    • Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

      Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Wataru Kumagai, Kazumi Kasaura, Kenta Hoshino, Yohei Hosoe, Yutaka Matsuo

      Advances in Neural Information Processing Systems (NeurIPS 2025), Spotlight, December 2025

    • A Unified MDP Framework for Solving Robust, Convex, Multi-Discount Constraints, and Beyond

      Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Wataru Kumagai, Paavo Parmas, Yutaka Matsuo

      Proceedings of the Reinforcement Learning Conference (RLC 2025), Finding the Frame Workshop, June 2025

    • FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions

      Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite

      Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), May 2025

    • Learning Whole-Body Control for End-Effector Stabilization of a Quadruped Manipulator

      Yanzhou Jin, Yaonan Zhu, Yusuke Iwasawa, Yasuhisa Hasegawa, Yutaka Matsuo

      Proceedings of the IEEE International Symposium on Micro-NanoMechatronics and Human Science (IEEE MHS 2025), July 2025

    • Double Horizon Model-Based Policy Optimization

      Akihiro Kubo, Paavo Parmas, Shin Ishii

      Transactions on Machine Learning Research (TMLR), April 2025

    • Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

      Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo

      International Conference on Learning Representations (ICLR 2025)