
Two papers from our laboratory have been accepted to ICLR 2024.

2024.01.17


■ Bibliographic information
Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust. (*Equal Contribution) “A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis”. International Conference on Learning Representations (ICLR 2024, Oral)

■ Abstract
Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.
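The modular recipe described in the abstract (plan a sub-instruction, summarize the HTML into a task-relevant snippet, then generate a program from both) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the authors' implementation: the names `Step`, `run_episode`, and the `toy_*` functions are hypothetical, and simple rule-based stand-ins take the place of HTML-T5 and Flan-U-PaLM.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    sub_instruction: str  # output of the planning stage
    snippet: str          # task-relevant HTML excerpt from the summarization stage
    program: str          # generated program text (never executed in this sketch)


def run_episode(
    instruction: str,
    html: str,
    plan: Callable[[str, List[Step], str], Optional[str]],
    summarize: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> List[Step]:
    """Plan, summarize, and synthesize in a loop until the planner returns None."""
    history: List[Step] = []
    for _ in range(max_steps):
        sub = plan(instruction, history, html)
        if sub is None:  # hypothetical convention: None means the task is done
            break
        snippet = summarize(sub, html)
        program = synthesize(sub, snippet)
        history.append(Step(sub, snippet, program))
        html = execute(program)  # the environment returns the next page
    return history


# Rule-based stand-ins for the learned models (assumptions for illustration only).
PAGE = '<input id="search">\n<button id="go">search</button>\n<footer>about</footer>'
PLAN = ["type the city into the search box", "click the search button"]


def toy_plan(instruction: str, history: List[Step], html: str) -> Optional[str]:
    # Walk through a fixed list of sub-instructions, one per step.
    return PLAN[len(history)] if len(history) < len(PLAN) else None


def toy_summarize(sub: str, html: str) -> str:
    # Keep only the HTML lines that share a word with the sub-instruction.
    words = set(sub.lower().split())
    kept = [ln for ln in html.splitlines()
            if words & set(re.findall(r"[a-z]+", ln.lower()))]
    return "\n".join(kept)


def toy_synthesize(sub: str, snippet: str) -> str:
    # A real system would emit a browser-automation program; this is a placeholder string.
    return f"perform({sub!r})"


def toy_execute(program: str) -> str:
    # The toy environment ignores the program and returns the same page.
    return PAGE


steps = run_episode("find hotels in Tokyo", PAGE,
                    toy_plan, toy_summarize, toy_synthesize, toy_execute)
for s in steps:
    print(s.sub_instruction, "->", s.program)
```

The sketch keeps each stage behind its own callable, which mirrors the abstract's point that planning and summarization are handled by one model and code generation by another.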

■ Bibliographic information
Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, Shixiang Shane Gu, Izzeddin Gur. “Multimodal Web Navigation with Instruction-Finetuned Foundation Models”. International Conference on Learning Representations (ICLR 2024)

■ Abstract
The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data. In this work, we study data-driven offline training for web agents with vision-language foundation models. We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages and outputs web navigation actions, such as click and type. WebGUM is trained by jointly finetuning an instruction-finetuned language model and a vision encoder with temporal and local perception on a large corpus of demonstrations. We empirically demonstrate this recipe improves the agent’s ability of grounded multimodal perception, HTML comprehension, and multi-step reasoning, outperforming prior works by a significant margin. On the MiniWoB, we improve over the previous best offline methods by more than 45.8%, even outperforming online-finetuned SoTA, humans, and GPT-4-based agent. On the WebShop benchmark, our 3-billion-parameter model achieves superior performance to the existing SoTA, PaLM-540B. Furthermore, WebGUM exhibits strong positive transfer to the real-world planning tasks on the Mind2Web. We also collect 347K high-quality demonstrations using our trained models, 38 times larger than prior work, and make them available to promote future research in this direction.
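To make the agent interface in the abstract concrete (an observation made of a screenshot plus HTML, and an output action such as click or type), here is a small Python sketch. The text format for actions (`click #id`, `type #id "text"`), and the names `Observation`, `Action`, `parse_action`, and `step`, are assumptions invented for illustration; the source does not specify WebGUM's actual action encoding.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Observation:
    screenshot: bytes  # raw image bytes of the rendered page
    html: str          # HTML source of the same page


@dataclass(frozen=True)
class Action:
    kind: str                   # "click" or "type"
    element_id: str             # target element, identified by a hypothetical id
    text: Optional[str] = None  # only used by "type"


# Hypothetical textual action format: click #submit   or   type #email "a@b.com"
_ACTION_RE = re.compile(r'^(click|type)\s+#([\w-]+)(?:\s+"(.*)")?$')


def parse_action(raw: str) -> Action:
    """Turn a model's text output into a structured Action, or raise ValueError."""
    match = _ACTION_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {raw!r}")
    kind, element_id, text = match.groups()
    if kind == "type" and text is None:
        raise ValueError("type actions need quoted text")
    if kind == "click" and text is not None:
        raise ValueError("click actions take no text")
    return Action(kind, element_id, text)


def step(policy: Callable[[Observation], str], obs: Observation) -> Action:
    """Query a policy (any callable standing in for the model) and parse its output."""
    return parse_action(policy(obs))


obs = Observation(screenshot=b"", html='<input id="email"><button id="submit">')
print(step(lambda o: 'type #email "a@b.com"', obs))
print(step(lambda o: "click #submit", obs))
```

Parsing into a small, validated action type keeps malformed model output from reaching the browser, which is one common reason to separate the policy from the executor.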
