Is it possible to know whether the response of an artificial-intelligence model is factually correct without having a human check it? Neural networks, on which many AI systems are based, can encode concepts such as truthfulness. Concepts are often represented by neural networks as numeric patterns, but identifying these patterns and using them to steer the behaviour of AI models is a substantial challenge. Writing in Science, Beaglehole et al. (ref. 1) report an approach to AI steering that outperforms alternative methods on a coding task, and show that this approach can be used to control and monitor AI models from the ‘inside’.
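To make the idea of a concept as a 'numeric pattern' concrete, the sketch below shows one simple, generic way to find such a pattern and use it for monitoring and steering. It is an illustration only and is not the method of Beaglehole et al.: it assumes a difference-of-means direction, and it uses synthetic 16-dimensional vectors as stand-ins for a model's internal activations. The function names (`concept_direction`, `monitor`, `steer`) and all numerical settings are hypothetical.

```python
import numpy as np

def concept_direction(pos_acts, neg_acts):
    """Unit vector from the mean 'negative' activation to the mean 'positive' one."""
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def monitor(acts, direction):
    """Score each activation by its projection onto the concept direction."""
    return acts @ direction

def steer(acts, direction, strength):
    """Shift every activation along the concept direction by `strength`."""
    return acts + strength * direction

# Synthetic stand-in for hidden activations (assumption: the concept is
# planted along one hidden axis, with unit-variance noise in every dimension).
rng = np.random.default_rng(0)
dim = 16
true_axis = rng.normal(size=dim)
true_axis /= np.linalg.norm(true_axis)
pos = rng.normal(size=(200, dim)) + 3.0 * true_axis  # examples with the concept
neg = rng.normal(size=(200, dim)) - 3.0 * true_axis  # examples without it

direction = concept_direction(pos, neg)

# Monitoring: read the concept off the activations.
scores_pos = monitor(pos, direction)
scores_neg = monitor(neg, direction)

# Steering: push the 'negative' activations towards the concept.
steered_neg = steer(neg, direction, strength=6.0)
```

Because `direction` has unit length, steering by a given strength raises the monitoring score of every activation by exactly that amount. In a real model, the activations would be read from, and written back to, a chosen layer during a forward pass, and the direction could be estimated with a more sophisticated method.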

doi: https://doi.org/10.1038/d41586-026-01267-4

References

1. Beaglehole, D., Radhakrishnan, A., Boix-Adserà, E. & Belkin, M. Science 391, 787–792 (2026).
2. Subramani, N., Suresh, N. & Peters, M. E. In Findings of the Association for Computational Linguistics: ACL 2022 (eds Muresan, S., Nakov, P. & Villavicencio, A.) 566–581 (ACM, 2022).
3. Marks, S. & Tegmark, M. In Proc. 1st Conf. Lang. Model. (COLM, 2024).
4. Radhakrishnan, A., Beaglehole, D., Pandit, P. & Belkin, M. Science 383, 1461–1467 (2024).
5. Prasad, A. V. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2602.10067 (2026).
6. Wu, Z. et al. In Proc. 42nd Intl. Conf. Mach. Learn. 267, 67035–67080 (2025).
7. Mueller, A. et al. Comput. Linguist. 52, 331–378 (2026).
8. Geiger, A. et al. J. Mach. Learn. Res. 26, 83 (2025).

Competing Interests

The author declares no competing interests.

Read the paper: Toward universal steering and monitoring of AI models

Algorithm that gets ‘under the hood’ of AI models could effectively steer their responses