AI Security Portal K Program
An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress
Abstract
As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insufficient to characterize system reliability. This study proposes a thermodynamics-inspired modeling framework for analyzing the stability of LLM outputs under uncertainty and perturbation. The framework introduces a composite stability score that integrates task utility, entropy as a measure of external uncertainty, and two internal structural proxies: internal integration and aligned reflective capacity. These quantities are not interpreted as physical variables; the formulation is intended as an interpretable abstraction that captures how internal structure may modulate the impact of disorder on model behavior. Using the IST-20 benchmarking protocol and associated metadata, we analyze 80 model-scenario observations across four contemporary LLMs. The proposed formulation consistently yields higher stability scores than a reduced utility-entropy baseline, with a mean improvement of 0.0299 (95% CI: 0.0247–0.0351). The observed gain is more pronounced under higher-entropy conditions, suggesting that the framework captures a form of nonlinear attenuation of uncertainty. We do not claim a fundamental physical law or a complete theory of machine ethics. Instead, the contribution of this work is a compact and interpretable modeling perspective that connects uncertainty, performance, and internal structure within a unified evaluation lens. The framework is intended to complement existing benchmarking approaches and to support ongoing discussions in AI safety, reliability, and governance.
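As a rough illustration of how a mean improvement and its 95% confidence interval could be computed from paired stability scores, the following Python sketch uses a normal approximation. The abstract does not say how the interval was obtained, so the method, the function name `paired_mean_ci`, and the toy score values are all assumptions for illustration and are not data from the study.

```python
import math
import statistics


def paired_mean_ci(full_scores, baseline_scores, z=1.96):
    """Mean paired improvement with a normal-approximation confidence interval.

    Assumption: each observation has one score from the full formulation and
    one from the reduced baseline, and differences are treated as independent.
    """
    if len(full_scores) != len(baseline_scores):
        raise ValueError("score lists must have the same length")
    diffs = [f - b for f, b in zip(full_scores, baseline_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean = statistics.fmean(diffs)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(diffs) / math.sqrt(n)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical toy scores, NOT values from the study.
baseline = [0.50, 0.42, 0.61, 0.35]
full = [0.53, 0.44, 0.65, 0.39]

mean, lower, upper = paired_mean_ci(full, baseline)
print(f"mean improvement = {mean:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

With only a handful of observations a t-based interval would be more appropriate; the normal approximation is used here because the reported analysis covers 80 observations, where the two are close.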