Algorithm that gets ‘under the hood’ of AI models could effectively steer their responses
Is it possible to know whether the response of an artificial-intelligence model is factually correct without having a human check it? Neural networks, on which many AI systems are based, can encode concepts such as truthfulness. Concepts are often represented by neural networks as numeric patterns, but identifying these patterns and using them to steer the behaviour of AI models is a substantial challenge. Writing in Science, Beaglehole et al.¹ report an approach to AI steering that outperforms alternative methods on a coding task, and show that this approach can be used to control and monitor AI models from the ‘inside’.
doi: https://doi.org/10.1038/d41586-026-01267-4
References
- Beaglehole, D., Radhakrishnan, A., Boix-Adserà, E. & Belkin, M. Science 391, 787–792 (2026).
- Subramani, N., Suresh, N. & Peters, M. E. In Findings of the Association for Computational Linguistics: ACL 2022 (eds Muresan, S., Nakov, P. & Villavicencio, A.) 566–581 (ACL, 2022).
- Marks, S. & Tegmark, M. In Proc. 1st Conf. Lang. Model. (COLM, 2024).
- Radhakrishnan, A., Beaglehole, D., Pandit, P. & Belkin, M. Science 383, 1461–1467 (2024).
- Prasad, A. V. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2602.10067 (2026).
- Wu, Z. et al. In Proc. 42nd Intl. Conf. Mach. Learn. 267, 67035–67080 (2025).
- Mueller, A. et al. Comput. Linguist. 52, 331–378 (2026).
- Geiger, A. et al. J. Mach. Learn. Res. 26, 83 (2025).
Competing Interests
The author declares no competing interests.
Related Articles
- Read the paper: Toward universal steering and monitoring of AI models