■Bibliographic Information
Fumiya Uchiyama, Takeshi Kojima, Andrew Gambardella, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo. “Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?”. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)
■Overview
Recent large language models (LLMs) have demonstrated remarkable generalization abilities in mathematics and logical reasoning tasks. Prior research indicates that LLMs pre-trained with programming language data exhibit high mathematical and reasoning abilities; however, this causal relationship has not been rigorously tested. Our research aims to verify which programming languages and features during pre-training affect logical inference performance. Specifically, we pre-trained decoder-based language models from scratch using datasets from ten programming languages (e.g., Python, C, Java) and three natural language datasets (Wikipedia, FineWeb, C4) under identical conditions. We then evaluated the trained models in a few-shot in-context learning setting on logical reasoning tasks, FLD and bAbI, which do not require commonsense or world knowledge. The results demonstrate that nearly all models trained on programming languages consistently outperform those trained on natural languages, indicating that programming languages as a whole contain factors that elicit logical inference performance. In addition, we found that models trained on programming languages follow instructions better than those trained on natural languages. Further analysis reveals that the depth of the Abstract Syntax Trees representing parsed programs also affects logical reasoning performance. We hope these findings will offer insights into the essential elements of pre-training for acquiring the foundational abilities of LLMs.
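The overview does not specify how AST depth is measured; as a minimal illustrative sketch (an assumption, not the authors' procedure), the depth of a Python program's Abstract Syntax Tree can be computed with the standard ast module, where deeper nesting of control flow yields a deeper tree:

```python
import ast

def ast_depth(node: ast.AST) -> int:
    """Return the depth of an AST, counting the root node as depth 1.

    This is a hypothetical helper for illustration; the paper's exact
    depth metric may differ.
    """
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(ast_depth(child) for child in children)

# Example program: the nested if-statement adds levels to the tree.
source = """
def greet(name):
    if name:
        return "Hello, " + name
    return "Hello, world"
"""

tree = ast.parse(source)
print(ast_depth(tree))
```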
—