New AI Atlas Maps Over One Billion Protein Structures
Researchers at the Chan Zuckerberg Biohub have unveiled the ESM Atlas, a massive open-source database containing predicted structures for over one billion proteins. Powered by the new ESMFold2 artificial intelligence model, the collection significantly expands the known protein universe by incorporating billions of sequences derived from metagenomic data (including samples from soil and marine environments) that were previously poorly understood.
ESMFold2 represents a notable advance in computational biology, with its developers claiming it outperforms existing industry standards, such as Google DeepMind’s AlphaFold3, in predicting complex protein interactions. Built on a "protein language" model trained on vast biological datasets, the tool excels at modeling how antibodies bind to their targets. This capability has already proven effective in the laboratory, where researchers successfully designed and tested new proteins intended to treat cancer and various immunological conditions.
The release is a major milestone for drug discovery and fundamental biological research. By providing free, open-source access to these structural predictions, the Biohub team is enabling scientists to identify hidden evolutionary relationships, such as the structural parallels discovered between microbial defense systems and eukaryotic gene-editing proteins. As AI-driven structural biology becomes increasingly competitive, the ESM Atlas stands out as a large, openly available resource that could accelerate the development of novel therapeutics and deepen our understanding of life at the molecular level.