Robot Showcase

Approaches to the GPSR Task and Their Solutions (2024)

Demo Developers: Aoi Horo, Hikaru Wada, Koki Fukuda, Yoshihiro Noumi

We combined several foundation models: a large language model (GPT-4), a speech recognition model (Whisper), an object detection model (Detic), and a multimodal foundation model (CLIP). Integrating these models on a robot lets it recognize the real world comprehensively and, in response to commands, generate actions that are appropriate to its own capabilities.
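As a rough illustration of how such an integration might be wired together, the sketch below chains a speech recognizer, a language-model planner, and a table of robot skills. It is not the demo's actual code: every name here (`Step`, `parse_plan`, `run_command`, `fake_transcribe`, `fake_plan`, `SKILLS`) and the one-step-per-line plan format are assumptions made for illustration, and the stubs stand in for Whisper, GPT-4, and the perception and motion skills.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    skill: str   # name of a robot skill, e.g. "go"
    target: str  # argument for the skill, e.g. "kitchen"


def parse_plan(text: str) -> List[Step]:
    """Parse a plan written as one '<skill> <target>' pair per line (assumed format)."""
    steps = []
    for line in text.strip().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            steps.append(Step(parts[0], parts[1]))
    return steps


def run_command(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    plan: Callable[[str, List[str]], str],
    skills: Dict[str, Callable[[str], str]],
) -> List[str]:
    """Speech -> text -> plan -> skill execution, restricted to known skills."""
    command = transcribe(audio)
    # The planner is told which skills exist so it can plan within them.
    steps = parse_plan(plan(command, sorted(skills)))
    log = []
    for step in steps:
        if step.skill not in skills:
            # Guard against plan steps the robot cannot execute.
            log.append(f"skipped unsupported skill: {step.skill}")
            continue
        log.append(skills[step.skill](step.target))
    return log


# Hypothetical stubs standing in for the real models and robot skills.
def fake_transcribe(audio: bytes) -> str:
    return "Bring me the cup from the kitchen"


def fake_plan(command: str, available: List[str]) -> str:
    return "go kitchen\nfind cup\ngrasp cup\nfly home\ngo operator"


SKILLS = {
    "go": lambda t: f"moved to {t}",
    "find": lambda t: f"detected {t}",
    "grasp": lambda t: f"grasped {t}",
}

if __name__ == "__main__":
    for line in run_command(b"", fake_transcribe, fake_plan, SKILLS):
        print(line)
```

Passing the skill names to the planner and rejecting unknown steps at execution time is one simple way to keep generated actions within what the robot can do; a real system would replace the stubs with model calls and handle failures of individual skills.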

References

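Integration of a Robot System Toward Realizing Diverse Tasks from Natural Language Using Foundation Models (original title: 基盤モデルを活用した自然言語による多様なタスク実現に向けたロボットシステムの統合)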

Self-Recovery Prompting: Promptable General Purpose Service Robot System with Foundation Models and Self-Recovery