Evo 2
Evo 2 is a genomic foundation model for generalist prediction and design tasks across DNA, RNA, and proteins. Its architecture models biological sequences at single-nucleotide resolution, with compute and memory scaling near-linearly in context length. The model has 40 billion parameters and a 1-megabase context length, and was trained on over 9 trillion nucleotides from diverse eukaryotic and prokaryotic genomes. This training enables zero-shot function prediction across all three modalities and the generation of novel sequences with plausible genomic architecture. Its capabilities have been demonstrated in tasks such as designing functional CRISPR systems and predicting disease-causing mutations in human genes. Evo 2 is publicly available through the Arc Institute's GitHub repository and is integrated into the NVIDIA BioNeMo framework.
Evo Designer
Evo Designer is a tool from the Arc Institute that uses the Evo 2 genomic foundation model for DNA sequence generation and analysis. Users enter a nucleotide sequence or specify an organism as a prompt, and the model generates a DNA sequence in response. The tool annotates coding regions and, for prokaryotic sequences, displays 3D protein structure predictions from ESMFold. It also scores sequences by perplexity and per-nucleotide entropy, which helps researchers assess sequence complexity and variability. The underlying Evo 2 model was trained on over 9 trillion nucleotides from prokaryotic and eukaryotic genomes and models sequences at single-nucleotide resolution with a context window of up to 1 million tokens.
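To make the two scores concrete, here is a minimal sketch of how perplexity and per-nucleotide entropy can be computed from a model's predicted distributions. It uses the standard textbook definitions and hand-written toy probabilities; the function names and data layout are assumptions for illustration and are not taken from Evo Designer or Evo 2.

```python
import math

def per_position_entropy(probs):
    """Shannon entropy in bits of each position's predicted distribution.

    `probs` is a list of dicts mapping each nucleotide to its predicted
    probability at that position (a hypothetical layout for this sketch).
    """
    return [
        -sum(p * math.log2(p) for p in dist.values() if p > 0)
        for dist in probs
    ]

def perplexity(probs, sequence):
    """Exponential of the mean negative log-likelihood of the observed bases.

    Raises ValueError if an observed base was assigned probability zero.
    """
    if len(probs) != len(sequence):
        raise ValueError("need one predicted distribution per nucleotide")
    nll = [-math.log(dist[base]) for dist, base in zip(probs, sequence)]
    return math.exp(sum(nll) / len(nll))

# Toy predictions for a three-nucleotide sequence (made-up numbers).
toy_probs = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # maximally uncertain
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully confident
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},      # split between two bases
]
toy_sequence = "GAC"

if __name__ == "__main__":
    print(per_position_entropy(toy_probs))
    print(perplexity(toy_probs, toy_sequence))
```

Low entropy at a position means the model is confident about what comes next; low perplexity over a whole sequence means the observed bases were, on average, ones the model considered likely.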
Profluent
Profluent's platform combines AI with in-house wet-lab capabilities to design proteins that are either inspired by nature or built from scratch. The company presents this combination as a precise, adaptable, and scalable way to address complex biological problems. Its foundation models aim to move protein design beyond random discovery by optimizing multiple attributes at once, reaching greater sequence diversity, and enabling novel functions. By extrapolating into new regions of protein space, Profluent offers options beyond natural or patented proteins, which it says makes commercial development cheaper, easier, and more feasible for partners. The company emphasizes scientific rigor and trains its models on diverse datasets.
NVIDIA BioNeMo
BioNeMo is an AI-powered drug discovery cloud service and framework built on NVIDIA NeMo Megatron for training and deploying large biomolecular transformer models at supercomputing scale. The service includes pre-trained large language models (LLMs) and native support for common file formats for proteins, DNA, RNA, and chemistry, with data loaders for SMILES (molecular structures) and FASTA (amino acid and nucleotide sequences). The BioNeMo framework will also be available for download to run on your own infrastructure. ESM-1, based on Meta AI’s ESM-1b, and ProtT5 are transformer-based protein language models that generate learned embeddings for tasks such as protein structure and property prediction. OpenFold, a deep learning model for 3D structure prediction of novel protein sequences, will be available in the BioNeMo service.
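For readers unfamiliar with FASTA, the sketch below shows roughly what a loader for that format has to do: pair each `>` header line with the sequence lines that follow it. This is a plain-Python illustration only; `read_fasta` is a hypothetical helper, not part of the BioNeMo API, and the example records are made up.

```python
from typing import Iterable, Iterator, Tuple

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    A record starts at a line beginning with '>' and its sequence may
    span several lines, which are concatenated. Blank lines are skipped.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence text before the first header is ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Made-up example with one nucleotide record and one amino acid record.
example = """>seq1 toy nucleotide record
ACGTAC
GTTA
>seq2 toy protein record
MKVLA
"""

if __name__ == "__main__":
    for name, seq in read_fasta(example.splitlines()):
        print(name, len(seq))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, which lets large files be streamed rather than loaded into memory at once.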