■Bibliographic Information
Yohei Kobashi, Fumiya Uchiyama, Takeshi Kojima, Andrew Gambardella, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo. “Crypto-LLM: Two-Stage Language Model Pre-training with Ciphered and Natural Language Data.” Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2025).
■Abstract
As the utilization of large language models (LLMs) continues to grow, the risk of sensitive data leakage from training datasets has become a critical concern. This study proposes a novel method for encrypting training data using a polyalphabetic substitution cipher. This approach prevents the model from learning sensitive information while allowing it to capture abstract linguistic patterns. We pre-trained a Llama 3 model (551M parameters) using approximately 7.5 billion tokens of encrypted data and subsequently fine-tuned it with another 2.5 billion tokens of plaintext data. The effectiveness of the model was evaluated by comparing its downstream task performance with a model trained solely on plaintext data. In addition, we evaluated the risk of sensitive data leakage through name reconstruction, true-prefix, and data extraction attacks. These results demonstrate the potential of our approach to balance data security with model performance.
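For intuition about the kind of cipher involved, the sketch below implements a generic polyalphabetic (Vigenère-style) substitution over ASCII letters. It is only a minimal illustration: the key, alphabet, and the granularity at which the paper actually enciphers its training data (characters vs. tokens) are assumptions, not the authors' scheme.

```python
# Minimal sketch of a polyalphabetic substitution cipher (Vigenère-style).
# Illustrative only; the key and alphabet are assumptions, not the paper's setup.
import string

ALPHABET = string.ascii_lowercase

def encipher(text: str, key: str) -> str:
    """Shift each letter by an amount that cycles through the key."""
    key_shifts = [ALPHABET.index(k) for k in key.lower()]
    out, i = [], 0
    for ch in text:
        low = ch.lower()
        if low in ALPHABET:
            shifted = ALPHABET[(ALPHABET.index(low) + key_shifts[i % len(key_shifts)]) % 26]
            out.append(shifted.upper() if ch.isupper() else shifted)
            i += 1  # advance the key position only on letters
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

# Example: the surface form changes, but positional/structural patterns remain.
print(encipher("Sensitive training sentence", "lemon"))
```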
■Bibliographic Information
Hirohane Takagi(*), Gouki Minegishi(*), Shota Kizawa, Issey Sukeda, Hitomi Yanaka. “Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models.” Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2025).
(*) Equal Contribution
■Abstract
Although behavioral studies have documented numerical reasoning errors in large language models (LLMs), the underlying representational mechanisms remain unclear. We hypothesize that numerical attributes occupy shared latent subspaces and investigate two questions: (1) How do LLMs internally integrate multiple numerical attributes of a single entity? (2) How does irrelevant numerical context perturb these representations and their downstream outputs? To address these questions, we combine linear probing with partial correlation analysis and prompt-based vulnerability tests across models of varying sizes. Our results show that LLMs encode real-world numerical correlations but tend to systematically amplify them. Moreover, irrelevant context induces consistent shifts in magnitude representations, with downstream effects that vary by model size. These findings reveal a vulnerability in LLM decision-making and lay the groundwork for fairer, representation-aware control under multi-attribute entanglement.
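As a rough illustration of the analysis pipeline described above, the sketch below fits linear probes on synthetic stand-in "hidden states" and computes a partial correlation between probe readouts of two numerical attributes. The data, attribute names, probe choice (ridge regression), and dimensions are placeholder assumptions for illustration, not the paper's actual models or measurements.

```python
# Sketch: linear probing + partial correlation on synthetic data.
# All quantities here are synthetic placeholders, not the paper's setup.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, d = 500, 64                       # entities x hidden-state dimension
H = rng.normal(size=(n, d))          # stand-in for LLM hidden states
w_a, w_b = rng.normal(size=d), rng.normal(size=d)
attr_a = H @ w_a + 0.1 * rng.normal(size=n)            # e.g. one numerical attribute
attr_b = 0.7 * attr_a + 0.3 * (H @ w_b) + 0.1 * rng.normal(size=n)  # a correlated one

# Linear probes: read each attribute out of the hidden states.
pred_a = Ridge(alpha=1.0).fit(H, attr_a).predict(H)
pred_b = Ridge(alpha=1.0).fit(H, attr_b).predict(H)

def partial_corr(x, y, z):
    """Correlation between x and y after regressing z (plus intercept) out of both."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Raw correlation between probe readouts vs. the correlation that remains
# after controlling for the true value of attribute A: a large residual value
# would suggest the two attributes share an entangled subspace.
print("probe corr:", np.corrcoef(pred_a, pred_b)[0, 1])
print("partial corr (control attr_a):", partial_corr(pred_b, pred_a, z=attr_a))
```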
