■Bibliographic Information
Jiaxian Guo*, Bo Yang*, Paul Yoo, Bill Yuchen Lin, Yusuke Iwasawa, Yutaka Matsuo. “Suspicion Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4”. 2024 First Conference on Language Modeling (COLM 2024)
■Overview
Unlike perfect information games, where all elements are known to every player, imperfect information games emulate the real-world complexities of decision-making under uncertain or incomplete information. GPT-4, the recent breakthrough in large language models (LLMs) trained on massive passive data, is notable for its knowledge retrieval and reasoning abilities. This paper delves into the applicability of GPT-4’s learned knowledge for imperfect information games. To achieve this, we introduce Suspicion-Agent, an innovative agent that leverages GPT-4’s capabilities for performing in imperfect information games. With proper prompt engineering to achieve different functions, Suspicion-Agent based on GPT-4 demonstrates remarkable adaptability across a range of imperfect information card games. Importantly, GPT-4 displays a strong high-order theory of mind (ToM) capacity, meaning it can understand others and intentionally impact others’ behavior. Leveraging this, we design a planning strategy that enables GPT-4 to competently play against different opponents, adapting its gameplay style as needed, while requiring only the game rules and descriptions of observations as input. In the experiments, we qualitatively showcase the capabilities of Suspicion-Agent across three different imperfect information games and then quantitatively evaluate it in Leduc Hold’em. The results show that Suspicion-Agent can potentially outperform traditional algorithms without any specialized training or examples, but still cannot beat Nash-equilibrium algorithms. In order to encourage and foster deeper insights within the community, we make our game-related data publicly available.
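To illustrate how such a ToM-aware planning strategy might be structured, the following is a minimal Python sketch: the prompt wording, the query_llm helper, and the two-stage belief/action split are illustrative assumptions, not the paper’s actual prompts or module design.

def query_llm(prompt: str) -> str:
    """Placeholder for a GPT-4 API call (assumed helper, e.g. via the OpenAI chat API)."""
    raise NotImplementedError

def plan_action(game_rules: str, observation: str, opponent_actions: list[str]) -> str:
    # First-order ToM: estimate the opponent's hidden cards and strategy from the
    # public game rules, the current observation, and the opponent's past behavior.
    belief = query_llm(
        f"Game rules:\n{game_rules}\n\n"
        f"Opponent's past actions:\n{opponent_actions}\n\n"
        f"Current observation:\n{observation}\n\n"
        "Estimate the opponent's likely cards and playing style."
    )
    # Higher-order ToM: reason about how the opponent will interpret our move,
    # then pick an action (e.g. fold / call / raise) that exploits or reshapes
    # that interpretation.
    action = query_llm(
        f"Game rules:\n{game_rules}\n\n"
        f"Current observation:\n{observation}\n\n"
        f"Estimated opponent belief and strategy:\n{belief}\n\n"
        "Considering how the opponent will read our move, choose the next action "
        "and justify it briefly."
    )
    return action

Note that only the game rules and an observation description are passed in, mirroring the claim that no game-specific training or examples are required.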
■Bibliographic Information
Yongmin Kim, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo. “Decoupling Noise and Toxic Parameters for Language Model Detoxification by Task Vector Merging”. 2024 First Conference on Language Modeling (COLM 2024)
■Outline
The goal of detoxifying language models is to reduce the chances of producing offensive or harmful output in pretrained language models (PLMs), ensuring their safer use. A recently proposed detoxification method utilizes the task vector obtained by subtracting the pretrained weights from a model fine-tuned on toxic datasets. This approach has shown effectiveness for detoxification but still suffers from degradation of the model’s general performance. To mitigate this degradation, we propose a method that detoxifies PLMs by fine-tuning multiple models on split toxic datasets and by merging the subtracted task vectors. We conducted experiments on two toxic datasets (Civil Comments and Toxigen) with four PLMs (GPT2, GPT2-medium, GPT2-large, and Phi), demonstrating that our method consistently achieves lower degradation. Especially, with the GPT2-small model on the Toxigen dataset, degradation was reduced by 38.9% compared to an existing task vector method while maintaining a similar toxicity score. In addition, we found that merging multiple detoxified models tends to increase the number of parameters that remain almost unchanged from the pre-trained model. We assume that by merging multiple detoxified models, “decoupling noise and toxic parameters” is implicitly achieved; accidental noise in parameters unrelated to detoxification disappears by averaging, while the parameter shift associated with detoxification is maintained. We hope that the findings of this study will be applied not only to detoxification but also to many other research domains that seek to suppress undesirable outputs of language models.
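To make the merging step concrete, here is a minimal Python sketch of negating and averaging toxic task vectors over model state dicts; the function names, the scaling coefficient, and the plain parameter-wise average are illustrative assumptions, not the authors’ exact implementation.

import torch

def task_vector(pretrained, toxic_finetuned):
    # Parameter-wise difference: what fine-tuning on one toxic split changed.
    return {k: toxic_finetuned[k] - pretrained[k] for k in pretrained}

def detoxify_by_merged_negation(pretrained, toxic_finetuned_list, scale=1.0):
    # One task vector per model fine-tuned on a different toxic data split.
    vectors = [task_vector(pretrained, ft) for ft in toxic_finetuned_list]
    # Averaging the task vectors is expected to cancel split-specific noise
    # while keeping the shared shift associated with toxicity.
    merged = {
        k: torch.stack([v[k] for v in vectors]).mean(dim=0)
        for k in vectors[0]
    }
    # Negate the merged toxic direction and apply it to the pretrained weights.
    return {k: pretrained[k] - scale * merged[k] for k in pretrained}

# Hypothetical usage with Hugging Face GPT-2 checkpoints (paths are placeholders):
# from transformers import GPT2LMHeadModel
# base = GPT2LMHeadModel.from_pretrained("gpt2").state_dict()
# splits = [GPT2LMHeadModel.from_pretrained(p).state_dict() for p in toxic_ckpt_paths]
# detoxified_state = detoxify_by_merged_negation(base, splits)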