AminoAcid-0 (AA-0): A Protein LLM Trained with 2 Billion Proprietary Sequences

Reviewing the design and performance of the first model released for
Ginkgo’s AI developer platform

by Seth Ritter and Jake Wintermute



This is a new chapter for Ginkgo, and we’re just getting started. As we continue to develop and release more models and services, we’re excited to see how you’ll use these tools to drive innovation in biology. 


Large Language Models (LLMs), when trained with large collections of protein sequence data, have proven effective for protein engineering tasks including structure prediction [1], functional annotation [2], and generation of diverse enzymes [3]. The biological codebase at Ginkgo Bioworks includes the Unified Metagenomic Database (UMDB), a collection of metagenomic sequence data with more than 2 billion protein sequences, most of which do not appear in public repositories.

Here we introduce AA-0, a 650M parameter model following the ESM-2 architecture, trained on public data combined with proprietary sequences from the UMDB. We compare the performance of AA-0 to ESM-2 on popular benchmarks as well as a collection of internal benchmarks relevant to our commercial work in the Ginkgo Bioworks foundry.

AA-0 performs comparably to ESM-2 across a range of 235 external and 73 internal protein engineering tasks. Although the UMDB added 112 M distinct sequence clusters to the 51 M UniRef clusters available for training, the additional data did not result in uniform improvements across all tasks. These results suggest that modern protein LLMs are not limited strictly by the size of their training dataset. Reaching the full potential of AI for protein engineering may require more specialized, task-specific training data.

Why we built AA-0

Ginkgo’s mission is to make biology easier to engineer. Over the years, we’ve worked with more than 100 commercial partners to support R&D projects ranging from therapeutics and pharmaceutical manufacturing to industrial enzymes and agriculture. 

Like many in biotech, we’re excited about AI-based tools and we have used them extensively for projects including enzyme discovery and protein engineering.

By releasing AA-0 to the public, we hope to make Ginkgo’s capabilities and resources more accessible to biotechnology developers. We’re excited to see what you’ll build with them!

Accessing the AA-0 model

The AA-0 model API is available through Ginkgo’s AI developer portal. Read more about Ginkgo’s model API here.

The first release supports the common use cases of embedding calculation and generation via masked language modeling. The platform supports calls to both ginkgo-aa0-650M and esm2-650M, so that users can compare their performance as we have done here.

Users can access a free tier and competitive pricing for larger jobs.

About Ginkgo’s Unified Metagenomic Database (UMDB)

We developed AA-0 using the 2023 UMDB corpus of about 2B protein sequences. The UMDB is derived primarily from microbial DNA extracted from soil samples and sourced from diverse geographic regions. The sequence collection was initially assembled to support R&D projects for our customers including microbial strain engineering, enzyme discovery and protein engineering.

Importantly, the UMDB was not created with the primary goal of training a general-purpose protein LLM. The resource is heavily biased toward microbial genomes and includes few sequences from other taxa. One of our goals for creating AA-0 was to better understand how the composition of the training dataset impacts downstream model performance across different protein engineering tasks.

Since 2023, the UMDB has continued to grow and now includes about 3.3B unique protein sequences, spread across 416M clusters at a clustering threshold of 50% sequence identity (SeqID50). Recent additions include public resources like MGnify [4] as well as new proprietary collections of extremophiles and strains relevant to agriculture. Future releases may include models trained with this larger dataset.

Structuring the combined dataset

The AA-0 training dataset was constructed following an approach similar to that described for ESM-2 [1]. We started by collecting the publicly available UniRef50/90 clusters [5] from the September 2021 release. These sequences are clustered at two different levels of sequence identity, 50% (seqID50) and 90% (seqID90), allowing a hierarchical sampling procedure. Sequences are selected first from the larger seqID50 clusters, then from the smaller seqID90 clusters, to ensure representative diversity for training.
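As a rough illustration, hierarchical sampling of this kind might look like the sketch below. The nested dictionary layout, the cluster names, and the choice of uniform sampling at each level are assumptions made for the example; the actual sampler is not described in detail here.

```python
import random
from typing import Dict, List

# Hypothetical layout: seqID50 cluster id -> seqID90 cluster id -> member sequences.
Clusters = Dict[str, Dict[str, List[str]]]

def sample_hierarchical(clusters: Clusters, rng: random.Random) -> str:
    """Pick a seqID50 cluster, then a seqID90 cluster inside it, then one member.

    Uniform choice at every level is an assumption of this sketch.
    """
    id50 = rng.choice(sorted(clusters))
    id90 = rng.choice(sorted(clusters[id50]))
    return rng.choice(clusters[id50][id90])

# Toy data: one seqID50 cluster with three members, one with a single member.
toy: Clusters = {
    "c50_a": {"c90_a1": ["MKTAYIAK", "MKTAYIAR"], "c90_a2": ["MKSAYLAK"]},
    "c50_b": {"c90_b1": ["GSHMLEDP"]},
}

rng = random.Random(0)
draws = [sample_hierarchical(toy, rng) for _ in range(1000)]

# Fraction of draws that came from the single-member cluster "c50_b".
share_b = draws.count("GSHMLEDP") / len(draws)
```

Because the seqID50 cluster is chosen first, the single-member cluster is drawn about as often as the three-member one, which is how this style of sampling keeps large families from dominating training.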

We added sequences from the UMDB to the UniRef dataset by assigning them, when possible, to existing UniRef90 clusters meeting the 90% identity threshold. Representative sequences were chosen for each cluster and similarly assigned to the existing UniRef50 clusters. When clustering criteria weren’t satisfied, new clusters were spawned to contain the UMDB sequences. Clustering was performed using the easy-linclust workflow of MMseqs2 [6] with 80% coverage.

The clustering process resulted in 172M seqID50 clusters, a substantial increase from the ~60M found in the original UniRef50. Looking inside the new clusters, we found remarkably little overlap between the public and UMDB sequences (Fig. 1). These results indicate that the combined dataset includes many novel sequences unlike anything used to train previous models. New sequences mean new information and, potentially, new opportunities for AA-0 to learn the patterns that occur in naturally evolved proteins.

Figure 1. Sequence novelty in the UMDB. 65% of protein sequence clusters used to train AA-0 included only sequences from the UMDB, 30% included only UniRef50 sequences, and 5% included sequences from both sources. The low degree of overlap indicates that the UMDB supplied many novel sequences for training.
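The cluster counts quoted in this post can be cross-checked against the percentages in Figure 1. The short calculation below uses only numbers stated in the text; because the percentages are rounded, the results are approximate.

```python
# All counts are in millions of seqID50 clusters.
total_clusters_m = 172

# Shares reported in Figure 1.
shares = {"UMDB only": 0.65, "UniRef50 only": 0.30, "both": 0.05}

counts_m = {source: total_clusters_m * share for source, share in shares.items()}

# Every cluster that contains at least one UniRef50 sequence.
uniref_total_m = counts_m["UniRef50 only"] + counts_m["both"]

for source, count in counts_m.items():
    print(f"{source}: ~{count:.0f}M clusters")
print(f"clusters containing UniRef50 sequences: ~{uniref_total_m:.0f}M")
```

The UMDB-only share works out to roughly 112M clusters, the UniRef50-only share to a little over 51M, and the clusters containing any UniRef50 sequence to about 60M, which is consistent with the ~60M figure given for the original UniRef50.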

Selecting a strategy for filtering, sampling and training

We explored a variety of approaches to filter sequences for quality and sample them from the combined dataset for training (Table 1). To evaluate the impact of different strategies, we used each one to train a smaller model of 150M parameters, with the 150M-parameter version of ESM-2 providing a similarly powered baseline. Two kinds of benchmarking tests were used to evaluate performance: ProteinGym and Owl, our in-house benchmark, which we describe in more detail below. The sampling strategies we tried included:

  • Sequence quality filter. We removed sequences with indications of low quality, for example the inclusion of non-amino-acid characters.
  • Minimum cluster size. We removed SeqID50 clusters containing fewer than the indicated number of sequences, reasoning they might not provide representative data.
  • Samples per cluster. We sampled either 1 or the indicated number of sequences from each SeqID50 cluster, trading off wider cluster diversity for deeper cluster sampling.
  • Sequence length reweighting. We adjusted sampling to reduce the probability of choosing sequences shorter than the indicated length, which are more likely to represent sequences of lower utility (e.g. short non-structural proteins) or fragments.
  • Single-representative sampling. We sampled only the representative sequences for each SeqID50 cluster as determined by the clustering algorithm, simplifying sampling but losing finer in-cluster variations.

Setting                                  ESM2 150M  Trial 0  Trial 1  Trial 2  Trial 3  Trial 4  Trial 5
---------------------------------------  ---------  -------  -------  -------  -------  -------  -------
Sequence quality filtering               -          False    True     True     True     True     True
SeqID50 min cluster size                 -          1        1        1        2        100      2
Samples per SeqID50 cluster              -          1        1        1        1        50       1
Sequence length reweighting threshold    -          1        1        100      100      100      100
Only return cluster representatives      -          False    False    False    False    False    True
Owl Score                                0.204      0.173    0.161    0.185    0.223    0.240*   0.231
ProteinGym Score                         0.318*     0.292    0.293    0.291    0.318*   0.257    0.302

Table 1. Model comparisons under different filtering and sampling strategies. Performance metrics are reported as a Spearman correlation between model scores and experimental measurements. The top-performing score for each benchmark is marked with an asterisk (*).

Although no strategy was the unambiguous winner for both benchmarks, we chose the strategy in trial 3 as giving an effective balance of performance. This entailed removing all seqID50 clusters with only 1 sequence and introducing a length reweighting threshold of 100 amino acids to sample fewer short sequences. The maximum length for training sequences was set to 512, with random cropping of sequences longer than this length.
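The trial 3 settings can be sketched in a few lines. The constants come from the text above, but the helper names are hypothetical and the linear ramp used for length reweighting is an assumption: the exact reweighting function is not given.

```python
import random
from typing import Dict, List

MIN_CLUSTER_SIZE = 2    # trial 3: drop seqID50 clusters holding a single sequence
LENGTH_THRESHOLD = 100  # trial 3: down-weight sequences shorter than this
MAX_LENGTH = 512        # longer training sequences are randomly cropped

def filter_clusters(clusters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only clusters with at least MIN_CLUSTER_SIZE members."""
    return {cid: seqs for cid, seqs in clusters.items()
            if len(seqs) >= MIN_CLUSTER_SIZE}

def length_weight(seq: str) -> float:
    """Sampling weight for one sequence (assumed linear ramp up to the threshold)."""
    return min(1.0, len(seq) / LENGTH_THRESHOLD)

def sample_from_cluster(seqs: List[str], rng: random.Random) -> str:
    """Draw one sequence, favoring those at or above the length threshold."""
    weights = [length_weight(s) for s in seqs]
    return rng.choices(seqs, weights=weights, k=1)[0]

def random_crop(seq: str, rng: random.Random) -> str:
    """Return a random window of MAX_LENGTH residues if the sequence is longer."""
    if len(seq) <= MAX_LENGTH:
        return seq
    start = rng.randrange(len(seq) - MAX_LENGTH + 1)
    return seq[start:start + MAX_LENGTH]

# Toy usage: cluster "b" is a singleton and is removed by the filter.
rng = random.Random(0)
clusters = {"a": ["M" * 30, "M" * 600], "b": ["M" * 200]}
kept = filter_clusters(clusters)
picked = sample_from_cluster(kept["a"], rng)
cropped = random_crop(picked, rng)
```

With this ramp, a 30-residue sequence is sampled at 0.3 times the weight of any sequence of 100 residues or more, so short fragments still appear in training but less often.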

AA-0 was trained on an 8×8 configuration on Google Cloud Platform with A100 GPUs. Except as noted below, training followed the guidelines described for ESM-2 [1]. In hyperparameter search experiments, we didn’t find any settings that meaningfully improved outcomes. We did, however, implement two primary changes which, in our hands, were essential for reliable training:

  • We made use of Xavier uniform initializations for KVQ weights in the attention layers with gain set to 1/sqrt(2).
  • We used the AdamW optimizer with settings lr=4e-4, weight_decay=1e-5. 

As with ESM-2, we used a linear learning rate scheduler with 2000 warmup steps, decaying to 10% of the maximum learning rate over the training duration. Following the sampling and filtering pattern selected above, we trained for 1M steps on the combined dataset followed by 150k steps of fine-tuning on UniRef50 sequences. We found that this fine-tuning improved performance on some downstream tasks for a select number of targets, as described below.
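To make the initialization and schedule concrete, here is a minimal sketch in plain Python. The Xavier uniform bound is the standard formula; the hidden size of 1280 and the exact shape of the schedule (linear warmup from zero, then linear decay over the 1M-step run) are assumptions made for illustration.

```python
import math

def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float) -> float:
    """Xavier uniform initialization draws weights from U(-bound, bound)."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

def learning_rate(step: int,
                  max_lr: float = 4e-4,
                  warmup_steps: int = 2000,
                  total_steps: int = 1_000_000,
                  final_fraction: float = 0.1) -> float:
    """Linear warmup to max_lr, then linear decay to final_fraction * max_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return max_lr * (1.0 - (1.0 - final_fraction) * progress)

# Assumed: square attention projections with a hidden size of 1280.
bound = xavier_uniform_bound(1280, 1280, gain=1 / math.sqrt(2))

# Learning rate at a few points in the run.
schedule = {step: learning_rate(step) for step in (0, 1000, 2000, 500_000, 1_000_000)}
```

Under these assumptions the learning rate peaks at 4e-4 on step 2000 and ends at 4e-5, and the gain of 1/sqrt(2) shrinks the initial weight range by about 29% relative to a gain of 1.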

Model evaluation on standard and in-house protein engineering tasks

To evaluate the performance of AA-0, we made use of the public benchmark collections DGEB [7] and ProteinGym [8]. We were also interested in testing the model specifically against the kind of protein engineering workflows that we encounter at Ginkgo. For this, we used the internally developed Owl benchmark. In the plots below, we compare the performance of three models:

  • ESM-2 refers to esm2_t33_650M_UR50D, the model documented here and in the original paper [1].
  • AA-0-base indicates ginkgo-aa-0-650m, the model trained on the combined dataset including our UMDB sequences.
  • AA-0 is ginkgo-aa-0-650m-finetune-UR50-150k, in which AA-0-base underwent 150k additional steps of fine-tuning with sequences from UniRef50.

The Diverse Genomic Embedding Benchmark (DGEB), composed by TattaBio, is a collection of tasks that make use of the embeddings from a protein sequence encoder model. One example is using pooled representations to search a sequence collection for similar proteins.
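A retrieval task of this kind can be sketched as follows. Mean pooling and cosine similarity are assumptions chosen for illustration, the function names are hypothetical, and the two-dimensional vectors stand in for real per-residue embeddings, which would be much wider.

```python
import math
from typing import Dict, List, Sequence

def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector per protein."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query: Sequence[float],
                       library: Dict[str, List[float]]) -> List[str]:
    """Return library keys sorted from most to least similar to the query."""
    return sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)

# Toy per-residue embeddings for a query and two library proteins.
query = mean_pool([[1.0, 0.0], [0.8, 0.2]])
library = {
    "prot_A": mean_pool([[0.9, 0.1], [1.0, 0.0]]),
    "prot_B": mean_pool([[0.0, 1.0], [0.1, 0.9]]),
}
ranking = rank_by_similarity(query, library)
```

In this toy case prot_A ranks first because its pooled vector points in nearly the same direction as the query.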

Figure 2. Comparison of model performance using DGEB. The tasks on the left belong to six types: BiGene Mining, Evolutionary Distance Similarity (EDS), Classification, Pair Classification, Clustering and Retrieval. The reported scoring metric varies by task type, with higher scores representing better performance.

ProteinGym is a collection of benchmarks that challenge a model to predict the effect of mutations on the measured function of a protein sequence [8]. We focused on the collections of protein substitution variants created with Deep Mutational Scanning (DMS). The 217 total assays were collected into five assay categories: organismal fitness, enzyme activity, protein binding, protein expression and protein stability. The distribution of scores within each category gives an overview of the performance of each model.

Broadly speaking, the AA-0 and ESM-2 models performed comparably (Fig. 3). When examining the medians of the distributions, AA-0 was marginally better at tasks relating to predicting protein stability and marginally worse at predicting enzyme activity (though there is high overlap in the performance distributions). Tasks related to protein binding were challenging for both models, highlighting the difficulty of predicting interactions from sequence data.

Figure 3. Comparison of model performance using ProteinGym. The indicated models were used to score collections of protein sequences representing DMS substitutions. For each collection, performance is reported as a Spearman correlation between the model-derived score and the measured activity. 

The 217 assays are grouped into five categories by the type of property being measured. Box plots indicate the mean score for each category, as well as standard deviations and outliers.
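For readers less familiar with the metric, the Spearman correlation used for each assay is the Pearson correlation of the ranks of the two variables. A minimal, dependency-free sketch is shown below; the example scores and measurements are made up, and constant inputs (which have no defined correlation) are not handled.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Made-up example: four variants, model scores vs. measured activity.
model_scores = [-2.1, -0.3, -1.2, 0.4]
measured = [0.10, 0.80, 0.35, 0.90]

rho_same_order = spearman(model_scores, measured)               # close to +1
rho_reversed = spearman(model_scores, [-m for m in measured])   # close to -1
```

A value near +1 means the model orders the variants the same way the assay does, a value near 0 means the ordering is unrelated, and a negative value means the model tends to prefer the variants that performed worse.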

The Owl benchmark, named for our in-house protein design software suite, was developed at Ginkgo to reflect tasks relevant for our work in commercial protein engineering. AI-guided protein discovery uses the model as an embedder to identify functionally similar proteins. Protein engineering is aided by scoring potential sequence variations that may be functionally relevant.

Owl includes 73 collections of protein sequence variants, each labeled with a functional measurement performed during the course of a real customer program. Examples of functional measurements include enzyme activity, specificity or expression titer. As above, we report model performance as a Spearman correlation between model scores and empirical measurements, grouping scores into categories to provide a high-level overview (Fig. 4).

Figure 4. Comparison of model performance using Ginkgo’s Owl benchmark. The indicated models were used to score collections of engineered protein sequences. For each collection, performance is reported as a Spearman correlation between the model-derived score and the measured activity. 

The 73 assays are grouped into three categories by the type of property being measured. Box plots indicate the mean score for each category, as well as standard deviations and outliers.

Overall, we find roughly comparable results between the different models. Interestingly, we find many examples of a negative correlation between model scores and experimental outcome, particularly for the use case of predicting enzyme specificity.

Why might enzymes with improved specificity tend to have lower model-derived scores? The datasets collected for the Owl benchmark come from different kinds of enzymes being engineered for different functional goals, making generalizations difficult. But this result might indicate important differences in the kinds of sequences that result from natural evolution and protein engineering. For example, an enzyme engineering project might seek to focus an enzyme activity on a particular target that is disfavored in a natural context. If evolution and engineering tend to move sequences in different directions, model-derived scores might negatively correlate with actual measured performance.

Fine-tuning improves performance on viral sequences

The UMDB does not represent a uniform sample of all naturally evolved protein sequences. It is primarily a collection of microbial DNA extracted from soil. As we explored AA-0, we were interested in how this bias in the training data might impact its performance.

The ProteinGym benchmark assays include proteins sourced from humans, other eukaryotes, prokaryotes and viruses. Breaking out the performance of AA-0 by taxon, we found substantially weaker performance on viral proteins (Fig. 5). We suspect this is a result of viral sequences being poorly represented in our training data. Viral sequences are particularly diverse, fast-evolving, and often unlike proteins found in cellular life forms. This result emphasizes the importance of learning from viral sequences directly to be able to model them accurately.

Performance on viral sequences improved markedly following 150k steps of additional fine tuning with the UniRef50 sequences. This improvement motivated us to include the UniRef50 fine-tuning in the model now available through the Ginkgo AI developer portal.

Figure 5. Model performance by taxon. The 217 assays of the ProteinGym DMS collection are grouped by taxon of origin: Human, non-human Eukaryote, Prokaryote or Virus. For each assay, performance is reported as a Spearman correlation between the model-derived score and the measured activity. Box plots indicate the mean score for each category, as well as standard deviations and outliers.

Conclusions

What drives the performance of an LLM? In different contexts, AI researchers have identified model size, training data, and compute as fundamental resources that govern a model’s scaling behavior [9]. Here we investigated the impact of training data on the performance of a protein sequence LLM. We supplemented the ~60M UniRef50 sequence clusters used to train ESM-2 with an additional 112M clusters from the Ginkgo UMDB. The resulting model, AA-0, showed comparable performance across a range of benchmarking tasks, indicating that training data alone was not a limiting resource.

Our experience with AA-0 holds lessons for the development of AI models for applied protein engineering:

The importance of data quality. In preparing AA-0 we explored a variety of strategies for filtering and sampling sequences from the very large UMDB. The selected strategy significantly impacted model performance, suggesting that further exploration in this area might lead to continued improvements. DNA sequencing technology is advancing quickly, leading to exponential growth in datasets and rapid proliferation in data collection techniques. Sequence-based AI models will benefit from standardized and optimized approaches to curate all this data.

The value of data representation. We found the AA-0-base model performed poorly on viral sequences, probably because they were sparsely represented in its training data. This weakness was partially corrected by additional fine tuning with UniRef50 sequences, and could also be improved by curating more representative datasets for future models.

The particular challenges of protein engineering. AA-0 performed well when predicting enzyme activity, a common task in the Ginkgo foundry. Interestingly, the model struggled to predict enzyme specificity, often producing scores that were negatively correlated with measured outcome. This suggests that engineered proteins may include sequence features unlike the evolved proteins used for model training. Future models may require new datasets that capture the features of successful engineered proteins, or may need other strategies to accommodate protein engineering as a use case.

The need for more task-specific data. In commercial protein engineering projects at the Ginkgo foundry, LLMs are not used to generate functional proteins de novo. Instead, libraries of generated sequences are built and tested for a particular desired function. These results from assay-labeled libraries become training data for additional rounds of AI-guided engineering, leading to performance improvements greater than those achieved with sequence-based models alone. Future models will benefit from new datasets assay-labeled for functional outcomes of interest including substrate affinity, enzyme specificity, and expression in particular microbial hosts.

AI can make biology easier to engineer. This is the first of many intended releases from the Ginkgo AI team. We are excited to begin peeling back the curtain and enabling bioengineers across the world to access our technologies. As we scale up our training efforts (we are currently training models 10x larger than these and more!), we will be eager to share our findings and plan to make the resultant models available to the community.


Ready to see what’s possible? Visit our developer portal to access everything you need to start using the API’s free tier, including detailed documentation, tutorials, and sample code. Access the portal today and be among the first to explore our new API. To get you started, we’re offering 2,000 sequences (i.e. ~1M tokens) of free inference in our initial language model!


References

1. Lin Z, Akin H, Rao R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123-1130. doi:10.1126/science.ade2574
2. Yu T, Cui H, Li JC, Luo Y, Jiang G, Zhao H. Enzyme function prediction using contrastive learning. Science. 2023;379(6639):1358-1363. doi:10.1126/science.adf2465
3. Ruffolo JA, Nayfach S, Gallagher J, et al. Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences. bioRxiv 2024.04.22.590591. doi:10.1101/2024.04.22.590591
4. Richardson L, Allen B, Baldi G, et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research. 2023;51(D1):D753-D759. doi:10.1093/nar/gkac1080
5. Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2015;31(6):926-932. doi:10.1093/bioinformatics/btu739
6. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35(11):1026-1028. doi:10.1038/nbt.3988
7. West-Roberts J, Kravitz J, Jha N, Cornman A, Hwang Y. Diverse Genomic Embedding Benchmark for functional evaluation across the tree of life. bioRxiv 2024.07.10.602933. doi:10.1101/2024.07.10.602933
8. Notin P, Kollasch AW, Ritter D, et al. ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design. bioRxiv 2023.12.07.570727. doi:10.1101/2023.12.07.570727
9. Hoffmann J, Borgeaud S, Mensch A, et al. Training Compute-Optimal Large Language Models. arXiv:2203.15556. doi:10.48550/arXiv.2203.15556

Acknowledgements

Thanks to the Ginkgo protein engineers, software developers and AI experts who helped to build AA-0: Zachary Kurtz, Matt Chamberlin, Eric Danielson, Alex Carlin, Michal Jastrzebski, Dana Merrick, Dmitriy Ryaboy, Emily Wrenbeck & Ankit Gupta.


Humans of Ginkgo: Christian Lorenz Talks Yarrowia Lipolytica & Fatty Acids

Yarrowia lipolytica has become a workhorse for metabolic engineering of molecules derived from fatty acids.

In cars, a chassis is a base frame to house the engine and working parts. Program Director Christian Lorenz discusses how Ginkgo developed standardized strains, workflows and DNA parts for the yeast Yarrowia lipolytica to help launch innovative biological engineering campaigns.

Humans of Ginkgo is an interview series featuring Sudeep Agarwala interviewing some of the brilliant folks at Ginkgo to learn more about the technology that makes our work possible.


Sudeep Agarwala: I’m speaking from my personal bias, but when I think of yeast I usually think of Saccharomyces cerevisiae. Why is Yarrowia lipolytica such a powerful tool?

Christian Lorenz: Sure–I think there’s a good reason you don’t think of Yarrowia when someone mentions yeast. Most people would probably agree that Yarrowia is still a non-traditional yeast, but given how important it is in industrial biotech, there’s a lot more interest in the yeast in academic circles.

Basically, as the name suggests, Yarrowia lipolytica has been of interest because of its ability to accumulate lipids, or fat. We’ve seen natural strains accumulate 20 percent or more dry cell weight in lipids–I believe there are some reports of 40%, even. Some engineered strains can even have a higher content of fats.  So Yarrowia is an expert in making fat and has a high flux through the fatty acid biosynthesis pathway.

This pathway is upregulated when the cells experience nitrogen limitation–there’s a starvation response that triggers lipid production. They utilize the carbon in the media, usually in the form of glucose, but in the industrial setting it can be something cheaper like ethanol, glycerol, acetate, or even different types of oil, and start accumulating fat, which is stored in lipid bodies in the cell.

SA: But there are a lot of yeasts that produce fatty acids–what’s so special about Yarrowia that it’s able to do it at such high levels?

CL: This is something Yarrowia has in common with an entire family of yeasts; oleaginous yeasts, they’re called. In these yeasts, the precursor for the fatty acid synthesis pathway is acetyl CoA, which is converted to malonyl CoA. In oleaginous yeasts, citrate is produced in the mitochondria; it is shuttled into the cytosol, where an enzyme, ATP citrate lyase, converts it into acetyl CoA, which leads to fatty acid biosynthesis. The flux through this pathway is much higher in these yeasts.

SA: This lifecycle seems pretty great for metabolic engineering — I know Yarrowia’s a workhorse for metabolic engineering off the fatty acid production. And in protein production, it’s great for lipase production.

CL: Lipases, and a lot of other hydrolytic enzymes too. But there’s an interesting use in small molecule production as well: because of the metabolism, Yarrowia is a good acid producer and that usually means that it can tolerate low pH in fermentation. That’s important because for some yeasts, they die at low pH. So when you engineer these yeasts for some small molecules that drop the pH, you have to add a lot of base to your fermentation. That’s not the case for Yarrowia, which we’ve seen do well as low as pH 3, even.

SA: That’s the good — I want to talk about the ugly, though. Yarrowia is known as a dimorphic yeast, meaning it has ovoid cells, but can also form these filaments that can cause real issues when it comes to fermentation. How do you deal with that?

CL: This is a really good point. First of all, we have a collection of more than 30 different naturally-occurring strains of Yarrowia at Ginkgo that we’re allowed to engineer with. Each of these strains is Yarrowia lipolytica, but they differ a lot in how they behave when we engineer them, and how they behave in different fermentation processes. So for any particular engineering project, we have a wide choice of strains that we can test to see which ones will behave the best in our process.

SA: I feel like we’ve buried the lede — Ginkgo has more than 30 different wild Yarrowia strains?

CL: Sorry, yes. More than thirty, and we’ve been very careful about freedom-to-operate and IP restrictions in obtaining them. And we’ve done a lot of work characterizing them: there are the obvious questions about how they behave on solid medium on a petri dish vs how they grow in a liquid medium. But we’ve tested them further: How much flux do they have through the fatty acid biosynthesis pathway? How tolerant are they to different fermentation process conditions? How well do they tolerate different pH? And so on. We’ve gained quite a bit of information about them, so we can make intelligent decisions about how we deploy them in different engineering projects.

SA: What does it take to be able to genetically engineer more than 30 different Yarrowia strains?

CL: Well, we think about strain engineering in a Design-Build-Test-Learn cycle.

In terms of Design, we have standardized DNA tools that we know will work in most, if not all, of these strains. That is, we have standard promoters, terminators, drug selections on these engineering tools. And they’re targeted to parts of the genome that we know are easy to target.

That leads to Build, where we’ve developed methods of culturing these strains and introducing this DNA into them pretty effectively–transforming a wide range of these hosts in high-throughput is a pretty standard operation at Ginkgo.

For Test, like I mentioned before, we have a good understanding of how these strains behave in a wide variety of conditions–liquid media, solid media, and fermentation vessels. We know how to cultivate these strains in high-throughput so we can measure how much product–small molecule, protein, etc.–they’re producing.

SA: I hope you don’t mind the direct question, but at Ginkgo we refer to “chassis strains” when we’re developing an engineering plan. What exactly is the Yarrowia chassis at Ginkgo?

CL: I think we’ve done a good job summarizing that here, actually. A chassis is a frame or a housing for a piece of technology. Part of our engineering efforts at Ginkgo have been to make that framework in Yarrowia: we have strains, genetic tools for targeting and expressing DNA, protocols for building libraries of yeast strains in high-throughput, and protocols for testing these different strains in high-throughput. There will be some edge cases, always, but these basic elements form a very powerful framework for biological engineering.

So when a new project comes in, we don’t have to think about developing things from scratch or bringing in new yeasts and spending a lot of time testing them or developing processes for them. Instead, we have our engineering chassis: we’ve built the engineering infrastructure for Yarrowia so that our customers can bring exciting products to the market fast.


Christian Lorenz came to Ginkgo Bioworks after completing his PhD work on bacterial protein secretion systems with Ulla Bonas at Martin-Luther-Universität Halle-Wittenberg, and post-doctoral work on Pseudomonas aeruginosa in the lab of Stephen Lory at Harvard Medical School in Boston. At Ginkgo, he is an organism engineer specializing in metabolic engineering for small molecule production in bacteria and yeast systems.

Humans of Ginkgo: Sneha Srikrishnan Talks Protein Services

Ginkgo’s Protein Services leverage a diverse technological platform to support our partners’ R&D.

Sneha Srikrishnan, Senior Director of Business Development at Ginkgo, discusses how Ginkgo’s business model customizes partnerships to align with the developmental stage of partners’ products, offering tailored R&D services from strain engineering to fermentation optimization. 



Sudeep Agarwala: Sneha, you’ve been at Ginkgo more than seven years and have done everything from strain development workflows in the foundry all the way to talking to customers about how Ginkgo approaches protein expression issues. 

Sneha Srikrishnan: Well actually, I’ve been thinking about protein production for my entire professional career–I researched this as a graduate student and postdoc and you’re right–when I first came to Ginkgo, I was working on developing a lot of the early workflows in the foundry for engineering different types of yeast. It’s been incredibly gratifying seeing those workflows grow into what Ginkgo’s offering now and it’s also why I’m so eager to talk about Ginkgo’s place in protein production in my current role in business development at Ginkgo.

SA: I know you’ve talked here about how Ginkgo works with partners in the protein space. I’d like to get a sense of how the partnering works–an earlier-stage product may have a very different scope of work associated with it than a more mature product that is already out on the market. So how does a company’s stage in R&D play a role in how we partner with them?

SS: We’ve thought about how to work with a wide range of partners in the protein space and the different stages of products that they’re working on. You’re right that early-stage products may require more extensive R&D in terms of strain engineering or other aspects of product development. Add to this, many early-stage products, either at start-ups or at larger, more established companies, may not have a lot of cash to outsource R&D–either because people are working on these early-stage products as a preliminary proof-of-concept to see if it has legs or because people are still working on funding for their company.

Late-stage products can have a different scope associated with them. By this I mean products that are already on the market, so maybe there’s less emphasis on strain engineering and more on fermentation optimization and processing.

In my previous conversation, I’ve talked about how Ginkgo’s Protein Services’ Offerings are aimed to help projects at a wide variety of stages. But there’s another important point to discuss here in terms of how we try to maintain flexibility in our business model with these service offerings, and structure it so that we are tying our success to the success of our partner.

In general, we divide the cost of a project into two buckets. There’s an R&D fee, which we don’t generate margin from. The second bucket is downstream value–and this is where we tie our success to the success of our partners.

Realizing downstream value can be in the form of royalties on sales, a flat percentage of revenues, lump-sum payments on commercial milestones, or any combination of these. Or, for whatever reason, if it doesn’t make sense to have this payment tied to sales, we’ve been able to find ways to solve these situations as well–single payments, or a single payment broken up into different chunks; we try to be flexible on how this downstream value is realized and we try to work with our partners on how this can work.

What I’m trying to emphasize here, though, is that Ginkgo wants our partners to be successful, and that we’re willing to work with our partners to index on that success.

SA: I think maybe you’ve answered this question indirectly, but I want to make sure: when does it make sense to come work with Ginkgo versus, say, a CRO?

SS: I like this question because I get it in different ways from many potential partners. I think of it like this: companies can go to CROs with a very defined service that they are looking for–in many instances, this is the kind of work that companies have the capability but not the bandwidth to do or can’t justify the expense to do in-house for a project that might be in its very early stages. 

Ginkgo’s cell engineering platform is different. The goal of developing this platform is to bring a wide range of technologies to bear on a single project; and in some cases we’ve combined parts of the platform into our Service Offerings, which makes sure that we’re bringing state-of-the-art technology to our partners’ R&D.

SA: And, I suppose, there’s an effect of streamlining the work through Ginkgo in these Service Offerings, versus having to coordinate with a lot of different CROs or academic groups.

SS: Exactly–there are efficiencies when you’re bringing a product to market with Ginkgo, and this is what makes it a great place for companies interested in end-to-end research and development. In progressing your project from early stages–discovery or early strain development–all the way to fermentation optimization and scale-up, there’s an efficient knowledge-transfer that comes between the different teams working on your strain that can really shave a lot of time off your commercialization process. Imagine if instead you have to tech transfer from and to a different CRO every time you needed to move to the next stage of your project.

And there’s one other point that I think is worth mentioning here–that efficiency brings with it flexibility. Let’s say for example you’re developing a strain using rational engineering methods and are hitting a wall with reaching the target titers. Ginkgo can quickly move to unbiased engineering methods without having to identify a partner who can do this work. We know how to do this quickly because these methods are already part of our broad platform: the teams for rational engineering and unbiased engineering collaborate to provide new direction to the project (sometimes they’re even the same people). And this is how we can quickly shift gears to this new direction in the project.

We think about this efficiency in our platform in terms of “more shots on goal” at making a project successful. I think this is something to really emphasize: you can potentially mitigate a lot of technical risk by working with Ginkgo. Having many workstreams easily accessible in one place allows you to try different things quickly as you move forward to commercialization. And at a place like Ginkgo, we’re there to support the development that needs to happen so you can focus on the product-facing work needed to commercialize your protein.


Sneha Srikrishnan is Senior Director, Business Development at Ginkgo Bioworks, leading Protein sales & Product management. She previously served on the technical team of Ginkgo as a Sr. Director of platform technology for enzymes and protein production. Prior to Ginkgo, Sneha worked at Gevo, Inc. as a scientist developing yeasts for commercial production of isobutanol. She has over a decade of industrial experience in successfully delivering synthetic biology-based solutions within the nutrition & wellness space, in sustainable fuels, waste valorization and environmental remediation, and holds patents in these areas. Sneha graduated with a Bachelor’s in Chemical Engineering from the Indian Institute of Technology, Bombay and earned her Ph.D. in Chemical and Biochemical Engineering from the University of California, Irvine. Sneha is passionate about food security and circularity.

Humans of Ginkgo: Applying Selections and Strain Improvement

Ginkgo’s Selections and Strain Improvement (SSI) group empowers screening for metabolic engineering – Part 2

Ariel Langevin, Ginkgo’s current Head of Strain Engineering, and Adam Meyer, former Head of SSI and now part of the Foundry Leadership, talk about how SSI has been applied in the past and how the group is working to meet the demands of mammalian cell engineering and broader bio-based industries.

Humans of Ginkgo is an interview series featuring Sudeep Agarwala interviewing some of the brilliant folks at Ginkgo to learn more about the technology that makes our work possible.


— This is the second part of a two-part interview. Read Part 1 here

Sudeep Agarwala: In the first part of this discussion, we talked in some general terms about how SSI helps with metabolic engineering at Ginkgo.

I wonder if we could get into specific details? Adam, since you led SSI for a while, I wanted to turn this question to you first. Tell me about some of your favorite projects.

Adam Meyer: So the one that’s very near and dear to my heart is when we made the synthetic expression system for Pichia pastoris. Developing this system was just one part of a larger project we were working on, but that expression system has subsequently become the backbone of a lot of the protein expression projects that are currently at Ginkgo.

This project was a collaboration between SSI, NGS, and Fermentation. But in doing this work, it felt like it was a sort of “coming out party” where it became obvious the type of things that we could do.

For this project, we took on the design of hundreds of thousands of combinatorial synthetic expression systems and we were able to construct these in the Pichia expression host. But we didn’t test them individually–I mean, how could we? You’d need as many fermenters as you have designs and we didn’t have 100,000 fermentation reactors.

So we did a pooled approach to find the best expression system. We put all of our library together in one fermentation vessel, and ran a very ordinary fermentation process for Pichia–no fancy bells and whistles. We wanted to find the expression system that gave us blockbuster expression under very standard conditions without having to reinvent fermentation, either in the lab or, maybe more importantly, at scale.

And at the end of the day, we found ones that performed amazingly, frankly – far better than the best-in-class expression system. And I don’t know that we could have found those in any other, I should say, reasonable way.

SA: I remember seeing that development process and it yielded some pretty spectacular results for our customer. Ariel, tell me about some projects that stand out for you?

Ariel Langevin: Over the years, there’ve been several good ones. One set of projects that comes to mind was developing and deploying a biosensor to detect a small molecule of interest. This was a bacterial project–in E. coli. We developed a biosensor that would cause any cell producing the small molecule to emit fluorescence.

After developing this biosensor, we deployed the first version to screen a million-member strain library in a pooled fashion–the SSI way. And this biosensor made the project successful! We were able to find strains that produced higher concentrations of the compound. Last year, for the same project, we deployed a second version of the biosensor to screen another library and identify strains with increased titer.

I like this story because we were able to see the entire arc of the project from start to finish, and there were variants in the million-member libraries that would have been impossible to find without this tool.

SA: Before we move on, I notice you’ve both rattled off a bunch of Latin–and that got me curious: SSI doesn’t only work with yeast and bacteria, right?

AL: That’s right! That’s one of the things I love most about working in SSI and at Ginkgo: we get to work with so many different organisms and across so many different platforms to really accelerate the work of strain engineering. 

SSI has a lot of expertise with model yeast and bacteria, but we’ve also worked with filamentous fungi, and anaerobic soil bacteria that, while industrially important, have little in the way of available genetic tooling. Non-model organisms are more than welcome in SSI!  A lot of our workflows are organism agnostic–meaning that the same or similar SSI protocol can often be used for whichever organism the project may be working with. At Ginkgo, we’ve seen that even today being able to generate and screen a large library of random mutants for any organism in a fraction of the time of the usual methods is still a really powerful capability that has applications in a huge number of industries.

And we’re actively expanding our capabilities so that the scope of SSI’s work includes more mammalian cell lines, plant cell lines, and microalgae among other forms of life.

SA: I want to thank both of you for your time and begin wrapping up. First, Adam, you’ve been leading SSI for the past few years. What was that like? Where are you going next?

AM: Well, I’m not going anywhere–I’m still staying at Ginkgo! My official title is Senior Foundry Lead. I’m going to be taking on a broader role overseeing SSI, ALE, and EncapS, making sure that those groups integrate well into the rest of the Foundry and can be more flexible with the demand they’re seeing from the projects that are coming through Ginkgo. I’m here to make sure that we get more and more efficient at hitting our partners’ goals as well as scaling our Foundry platform further.

SA: Ariel, now that you’re taking over Adam’s previous role, what’s next for SSI?

AL: As Adam alluded to, there’s going to be a lot of work to make sure we’re supporting our partners’ projects as they move through design-build-test-learn cycles in the Foundry and that we’re making that process as smooth as it can be. A lot of innovative tools have already been developed to make this possible, but there’s still more that we plan to do.

And part of this is expanding and solidifying our capabilities. We’ve put a lot of energy into developing workflows for microbes–bacteria, yeast, and fungi. There are going to be a lot of exciting opportunities in growing the team to support mammalian workflows. As Ginkgo does more work with engineering mammalian systems, there’s going to be an increasing need to develop processes to screen them efficiently.

In the first half of this conversation, Adam talked about how rational engineering and our group really make a complete package of complementary approaches to cell engineering. Making sure that these techniques are as robust in mammalian cell engineering as they are on the microbial side is really going to provide a powerful cell engineering platform that can impact all parts of the bioeconomy.




Adam Meyer did his PhD work at UT Austin with Andy Ellington developing novel directed evolution methods, which he applied to the engineering of T7 RNA Polymerase.  He continued developing these methods with Christopher Voigt at MIT, where he improved the performance of small molecule biosensors.

He came to the Selections and Strain Improvement (SSI) Team at Ginkgo Bioworks in 2018, where he led the efforts for the team’s core technologies: 1-pot library generation, pooled screening, directed evolution, and genome editing.  Adam led the SSI Team from 2020 through 2023, and is now part of the Foundry Leadership Team, with a focus on deploying the SSI, EncapS, and ALE technologies.

Ariel Langevin, PhD, completed her doctoral work in Mary Dunlop’s group at Boston University, where she studied the dynamics and evolution of antibiotic resistance. She joined the SSI team at Ginkgo in 2020. At Ginkgo, she has focused on developing protocols for generating 1-pot libraries, workflows for multiplexed assays, and performing fluorescence-based and growth-coupled selections. Currently, she is the head of SSI at Ginkgo.

Humans of Ginkgo: Intro to Selections and Strain Improvement

Ginkgo’s Selections and Strain Improvement (SSI) group empowers screening for genetic engineering – Part 1

Where rational engineering hits roadblocks, unbiased strain development can come in to help. Ginkgo Bioworks’ SSI team finds ways to accelerate unbiased techniques to take metabolic engineering to the next level, fast. The current and former heads of SSI, Ariel Langevin and Adam Meyer, discuss how SSI complements rational strain engineering at Ginkgo.

Humans of Ginkgo is an interview series featuring Sudeep Agarwala interviewing some of the brilliant folks at Ginkgo to learn more about the technology that makes our work possible.


— This is the first part of a two-part interview.—

Sudeep Agarwala: Ariel, you’re taking over Adam’s position as head of SSI, and Adam, you’re moving on to a more senior role in Ginkgo’s Foundry, so I’m really pleased to have this opportunity to speak with both of you about SSI–its past and where it’s going–during this transition.

But some background first — Adam, you’ve been at Ginkgo for six years now, and this entire time, you’ve been thinking about making mutants and screening them. How does that play into developing strains for industry?

Adam Meyer: For decades, you could even argue for centuries, industrial microbiology has been based on screening random mutants. And I mean that this is what scientists did before we had developed the technology to synthesize and transform cells with DNA–I mean, this is even before people knew that DNA carried information.

So in these cases, say you have a bacterial strain that you want to produce more protease, for example. Traditionally (and this is from before genetic engineering or even the ability to introduce DNA to organisms) you would take that strain and add a DNA damaging chemical that would create different mutants of the original strain. Most of the resulting mutants won’t perform any better, many won’t perform as well, but in some of them, you’d hit the DNA just right and come up with an organism that actually makes more protease, in this example. 

Traditionally, this has taken a lot of human work: teams would test thousands, tens of thousands, of mutant strains one by one to find which one performs better compared to the parent. If you’re clever (or if you’re lucky) you don’t have to screen by brute force like this–you can just grow the whole population of mutants on a drug or some other specific condition that will only allow the best performers to live, meaning you can massively enrich for maybe only a handful of gifted mutants from among hundreds of millions of mutants. It’s a numbers game–the more mutants you can test at a time, either by screening each one individually or (if you can) by selectively enriching for just the strains you’re looking for, the better. You can think of this as Moore’s Law!

SA: I’d love to get into more detail: how does SSI play this numbers game?

AM: Well, our group, SSI, which stands for Selections and Strain Improvement, is in charge of identifying mutants with the performance we want. Our approach is generally to try to find ways to monitor these best performers in bulk, as opposed to screening each and every mutant one by one. 

SSI asks, essentially: why not just put everything all into one big pool? Our group designs strategies so that instead of screening through each mutant, you can find a way to easily select the ones that have the best performance.

For example, we can have a fluorescent reporter in the strain so that the cell that glows the brightest is the one that we want. So when we make the mutants, we can throw all of them into one flask, culture them, then, using cell-sorting technologies, select the cells that glow the brightest.

Maybe another example is with binding. Say you want to find a strain that binds a compound particularly well. Instead of testing each variant individually, you can flow them over a column and just select for the things that stick to that column, as opposed to doing a whole bunch of individual binding characterization studies.

Ariel Langevin: One thing I’d like to add is that approaches like this win when it comes to scale. In the traditional methods, screening 10,000 colonies is a big lift. When you’re able to convert an arrayed screening campaign into one where you’re just selecting for the best ones in a single pot, it’s much more straightforward to identify the best performers out of hundreds of millions of variants.

So it’s worth spending that time to think up ways to select the best players from a pooled approach–you can test many orders of magnitude more candidates and have a higher chance of success.

SA: This is a really great point Ariel. I’m curious how that works at Ginkgo: when do people decide to work with the SSI team? Does every project have some involvement from SSI?

AL: There are a couple of factors that come into play when folks are deciding how to leverage the SSI team. Scale is the most important one. If a project needs to test millions of strains, it’s going to be very hard to fit that into 96-well plates, or even 384- or 1536-well plates for that matter. This really comes into play when our partner would prefer to stay away from genetic engineering of their strain, either because in their particular instance genetic tool development would require quite some time, or because their market is looking for non-GM techniques for commercialization. Usually in these cases we turn to generating a large, diverse library of random mutants before down-selecting the best-performing strains.
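To put rough numbers on the scale argument above, here is a minimal sketch of the plate arithmetic for an arrayed screen. The library size of one million strains and the one-strain-per-well layout are illustrative assumptions, not figures from a specific Ginkgo project, and `plates_needed` is a hypothetical helper name.

```python
import math


def plates_needed(n_strains: int, wells_per_plate: int) -> int:
    """Plates required to array a library with one strain per well (no controls or replicates)."""
    return math.ceil(n_strains / wells_per_plate)


# Assumed library size for illustration: one million strains.
library_size = 1_000_000

for wells in (96, 384, 1536):
    print(f"{wells}-well plates needed: {plates_needed(library_size, wells):,}")
```

Under these assumptions, a one-million-strain library would need roughly 10,400 plates in 96-well format and still more than 650 plates in 1536-well format, before adding any controls or replicates.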

There are a couple of tools we can deploy to do this in a pooled way, as Adam was talking about before. Historically, SSI has had a heavy focus on developing biosensors. This means we engineer cells to express proteins that can bind to compounds of interest inside a cell and give an output signal–usually fluorescence. We’ve also seen some great advantages with anti-metabolites–compounds and proteins that interfere with a cell’s natural metabolism and result in growth changes. Both of these methods enable us to screen huge numbers of strains and to identify the best performers much more quickly than the arrayed screening approach that is still the gold standard in many industries.

In 2022, Ginkgo made two acquisitions that complement these techniques. The first is EncapS–which encapsulates cells in nanoliter reactors. These nanoliter reactors can enable fluorescent readouts of the cell’s metabolism or productivity, and it’s a really elegant way to rapidly sort through hundreds of thousands of cells in a single run and monitor how they’re functioning.

The second technology is Adaptive Laboratory Evolution, or ALE, where you start with a population of cells and grow them continuously under different selective pressures. Over time, the cells that grow best in the conditions take over the population. It’s more nuanced than that, but that’s an overview.

So with these two technologies, we’ve been able to have more of an impact on projects that come through Ginkgo.

AM:  That was great Ariel–may I add two things?

AL: Please do!

AM: You asked whether SSI is involved in every project. To be clear: not every project that comes into Ginkgo involves SSI.  Not every phenotype is amenable to a pooled approach.  When there is a fit, pooled methods are extremely powerful, so we end up contributing to a substantial fraction of programs.  

And a good chunk of this work is scoped from the very beginning, and really comes into play, like Ariel said, when our partner really wants to stay away from synthetic DNA, and when we’re looking at random mutagenesis. We’ve seen a lot of people in food and agriculture with these requirements. In these instances, the team designing the project will say: “Hey, this project clearly has an organism that needs to have some sort of output that’s amenable to a pooled screen or selection.” This is where we sit down and plan where we step in and what we’re going to deliver.

But another good chunk of the work we see is: “Hey, we thought that we could hit the titer for the customer by engineering this particular enzyme or pathway, etc. But it turns out we’ve hit a wall and we can’t improve this strain anymore.” And that’s where SSI comes in: we’re called in to “unstuck” a project that has hit some really hard walls.

In my personal opinion, this is where SSI becomes really valuable. When “traditional strain engineering” has come up against limitations, we take an alternative approach. Rational engineering tries to tell the cell what to do. SSI’s approach is to give the cell millions, even hundreds of millions, of different options. And because our team can screen through all of those options, we’re really asking the cell which option gets us where we want to go.

The two approaches really complement each other and work off of each other to deliver effective solutions for metabolic engineering.

—Stay tuned for Part 2, next week—



Adam Meyer did his PhD work at UT Austin with Andy Ellington developing novel directed evolution methods, which he applied to the engineering of T7 RNA Polymerase.  He continued developing these methods with Christopher Voigt at MIT, where he improved the performance of small molecule biosensors.

He came to the Selections and Strain Improvement (SSI) Team at Ginkgo Bioworks in 2018, where he led the efforts for the team’s core technologies: 1-pot library generation, pooled screening, directed evolution, and genome editing.  Adam led the SSI Team from 2020 through 2023, and is now part of the Foundry Leadership Team, with a focus on deploying the SSI, EncapS, and ALE technologies.

Ariel Langevin, PhD, completed her doctoral work in Mary Dunlop’s group at Boston University, where she studied the dynamics and evolution of antibiotic resistance. She joined the SSI team at Ginkgo in 2020. At Ginkgo, she has focused on developing protocols for generating 1-pot libraries, workflows for multiplexed assays, and performing fluorescence-based and growth-coupled selections. Currently, she is the head of SSI at Ginkgo.

Humans of Ginkgo: Applications and Opportunities of Adaptive Laboratory Evolution

Automated ALE harnesses the power of evolution — Part 3

Automated ALE has already proven a powerful player in the toolkit available for strain improvement at Ginkgo. Here, Simon Trancart, head of ALE at Ginkgo, discusses how partners have worked with Ginkgo in the past, as well as ongoing work that is aimed at making Automated ALE at Ginkgo accessible to new industries.

Humans of Ginkgo is an interview series featuring Sudeep Agarwala interviewing some of the brilliant folks at Ginkgo to learn more about the technology that makes our work possible.


— This is the final part of a three-part interview.—

Read Part 1, Why ALE?, here

Read Part 2, Inside ALE, here

Simon Trancart, Ginkgo's head of ALE

Sudeep Agarwala: In thinking about how different groups could interface with ALE at Ginkgo, it sounds like there are a few different scenarios: a first case in which ALE is part of a larger engineering program at Ginkgo. There’s another case in which a customer’s done a lot of work beforehand on their strain or maybe has been using that strain for years at commercial scale and just wants the output of ALE without a lot of characterization. Then maybe there’s this other hybrid case where the customer wants the strain and there’s characterization about what mutations have come into the strain, how it’s performing in high detail, etc.

Simon Trancart: When a customer comes with the sole goal of improving the commercial strength of their strain, most of the time, they don’t want to pay an additional bolus of money and time that would be necessary to understand what the mutations are. So of course we can offer a limited scope of work in these situations. And that’s fair: in many instances there’s no need to do extra work to understand the mutations; performance and time to market is what matters here.

But I would say that for earlier-stage programs, where ALE is part of the R&D process, of course we’ll look for mutations. If we think it’s relevant, then we can learn from it too. And if we demonstrate, by retro-engineering the parental strain with what we believe are the causative mutations, that they impact the phenotype, that’s a very powerful way to validate.

So I think of ALE as an evolutionary engine to generate mutations that can be added to our understanding of biology. That’s where I think there is a very important value as well.

SA: Ginkgo has a huge number of resources in its Foundry. If all I wanted was the ALE service, is that something that Ginkgo would offer?

ST: Absolutely! And I believe we are the best partner for it. ALE has become trendy and we recently have seen startups and spinoffs from academic labs that propose competing services. They’re probably cheaper as they need to penetrate the market. But from our perspective, the automated ALE that we’re working with has been validated for a wide range of organisms and applications. It includes certain selection modes that we believe are unique and have a lot less inherent risk than the other competing systems out there; we’ve worked hard to ensure that we have a superior technology. And, in thinking about how we partner with customers, we’re trying to be creative with our pricing, so that what we’ve built can be accessible to startups that need to achieve milestones quickly or industrial players that need to get a return on investment faster.

Yes, we do projects that are mostly based on the use of automated ALE for customers that are looking to get started with strain improvement, others looking for cost reduction through adaptation to new conditions such as new feedstocks or higher temperature, or other applications accessible by ALE. Our experience means you have a better chance of success. Having said that, Ginkgo has great power as a one-stop shop where you can have a full external program. That way you don’t have to coordinate development between separate teams. I think Ginkgo creates even more value for customers in this type of project: the way that we reduce costs is actually to improve the efficiency of an R&D workstream.

SA: What types of things are being developed for automated ALE at Ginkgo?

ST: We had a successful, I would say, “proof of concept” experiment with filamentous fungi that produced very large filaments. We were positively surprised by the results because we could perform all the basic fluidic operations, from transferring from one chamber to another to taking samples, diluting, et cetera, without too many issues.

The only issue is that the optical density measurement was very noisy. But I would say that we are pretty confident that it should work with low-viscosity Aspergillus strains that are at Ginkgo because they behave almost like yeast. Right now, we are working on a proof of concept with two of those low-viscosity strains, to evaluate the suitability of our automated ALE system with that type of organism.

We have also worked with acute myeloid leukemia cells. Even though it was a very short run, it was promising. I think that there is potential for other non-adherent mammalian cell lines as well. But we will need to investigate this further. We are evaluating how we can engineer our system for a wide range of cell lines.

SA: You’ve talked about how ALE can be used in conjunction with genetic engineering techniques or alone in unbiased strain construction. What are some of the more creative uses of ALE you’ve seen?

ST: I do also believe that our capability to continuously cultivate organisms for a very long time can be interesting for other applications than improvements through evolution. We have one customer who has been using our technology for many years. And in the last few months, they have been using it to benchmark different strains against the genetic stability criterion to choose the very one strain that they were going to inoculate in their first commercial fermentor. And they were concerned that there would be genetic drift because it’s a continuous process and their scheduled maintenance is every three months.

They wanted to have a very stable strain and they thought that they had no other technique that could reproducibly expose each of the different candidates to stresses similar to those they’ll see during the long fermentation. We developed a system that can expose strains to reproducible conditions for long durations and that could get close to those stresses of their particular process. But of course, it’s important to note that since automated ALE is at lab scale, we could not really mimic industrial conditions.

We’ve also been talking about the strain as the output of automated ALE, but the evolution can also tell us about certain products’ efficacy. For instance, this system can be used as a pre-screening tool for antibiotic molecules or prebiotic/probiotic strains and compounds: we would inoculate our system with a microbiome model or organisms and monitor how these molecules or strains modulate the population in continuous cultivation over time–the residence time of the product, the impact on the population, etc.

So the capability to cultivate cells for a very long period is powerful. And being able to maintain sterility and prevent biofilm formation while monitoring the genotypic and phenotypic changes in that population presents a versatile tool that has applications in a wide range of fields.


Simon Trancart joined Ginkgo through the acquisition of Altar, a French biotech company he co-founded and led as CEO. Altar specialized in automated adaptive laboratory evolution (ALE), a niche that Simon navigated with his background in engineering and civil engineering.

At Ginkgo, Simon leads the Adaptive Laboratory Evolution team, based in Évry-Courcouronnes, France. Simon’s work focuses on the automated ALE process and the performance of ALE campaigns. He has been instrumental in integrating the ALE team’s work with Ginkgo’s foundry services, enabling better execution and insight into ALE. Simon’s expertise extends to the application of ALE in various organisms and its coupling with rational design.

Humans of Ginkgo: Inside Adaptive Laboratory Evolution

Automated ALE harnesses the power of evolution — Part 2

Ginkgo’s head of ALE, Simon Trancart, discusses how ALE at Ginkgo with the Genemat technology overcomes the challenges of contamination and biofilm production. Ginkgo leverages ALE  to deliver genetic variants that have been carefully shaped by natural selection in the laboratory.

Humans of Ginkgo is an interview series featuring Sudeep Agarwala interviewing some of the brilliant folks at Ginkgo to learn more about the technology that makes our work possible.

 


— This is the second part of a three-part interview.—

Read Part 1, Why ALE?, here

Read Part 3, Applications and Opportunities, here

 

Simon Trancart, Ginkgo's head of ALE

Sudeep Agarwala: Adaptive Laboratory Evolution (ALE) is a powerful tool that’s been around for a long time; I believe you said since the 1940s. Maintaining a culture under a constant growth rate for weeks, months, even years harnesses the power of evolution for strain improvement campaigns. But making an automated system has remained challenging. Why is that?

Simon Trancart: The idea behind laboratory evolution is that you keep a suspension growing permanently.

One way to do ALE is to do serial passaging, which involves the sequential transfer of microorganisms from one growth medium to another. When the culture is transferred to a fresh medium (a passage), only a small portion of the culture is carried over. If a mutation confers a fitness advantage, the organisms with that mutation will grow and reproduce more quickly than those without it. Over time, they will make up a larger and larger proportion of the culture.

Another way, and I would argue a more effective one, is to cultivate continuously at constant volume. You add sterile medium to sustain growth, and you withdraw the same volume that you added. That creates a competition: there is growth on the one hand and dilution of the population on the other, so only the microbes that grow at least at a given base rate have a chance to survive and pass on their genetic heritage over time. Here too, a beneficial mutation will progressively come to dominate the population.
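
The competition between growth and dilution described here can be made concrete with a small sketch. It is a deliberately simplified model, not a description of the Genemat: it assumes each strain grows exponentially at a fixed specific growth rate and ignores nutrient limitation. The function names and all numbers (growth rates, dilution rate, starting fraction) are illustrative assumptions.

```python
import math


def net_fold_change(mu, dilution_rate, hours):
    """Fold change in a strain's density after `hours` of continuous culture.

    Growth adds cells at rate mu (per hour) and dilution removes them at
    rate D (per hour), so density changes as exp((mu - D) * t).
    """
    return math.exp((mu - dilution_rate) * hours)


def mutant_fraction(mu_wild, mu_mutant, start_fraction, hours):
    """Fraction of the population that is mutant after `hours`.

    Both strains share the same vessel and dilution rate, so D cancels and
    the mutant-to-wild-type ratio grows as exp((mu_mutant - mu_wild) * t).
    """
    ratio = start_fraction / (1.0 - start_fraction)
    ratio *= math.exp((mu_mutant - mu_wild) * hours)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not measurements).
    D = 0.52  # dilution rate, per hour
    print(net_fold_change(0.50, D, 24))  # below 1: slower strain is washed out
    print(net_fold_change(0.55, D, 24))  # above 1: faster strain accumulates
    for day in (0, 7, 14, 21):
        print(day, mutant_fraction(0.50, 0.55, 1e-6, 24 * day))
```

In this toy model a mutant that starts as one cell in a million and grows only slightly faster than its parent is still rare after one week but makes up most of the population after two, which is the "progressive domination" described above.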

When you attempt to automate either approach to ALE, you may face contamination issues. For ALE, you want something to be very, very reliable. And beyond contamination, there are other issues. In continuous culture in a single vessel, you will get biofilms, whereas serial passaging is really hard to automate. We know that robots need maintenance, and if you want to explore thousands of generations, you are really exposed to failure and interruption of your experiment.

SA: But you’ve found a way to reliably automate this?

ST: Yes, and this required tackling critical issues: biofilms and contamination. In any system where the culture is maintained in a single vessel, you will eventually get a biofilm.

That is, nature finds an easy way to cheat the system. Over time, in order to stay in the vessel and escape selective pressure, cells will stick to the vessel wall. Evolutionarily, it’s very effective: finding a physical way to remain in the population. Everywhere you have this kind of long-term cultivation under selective pressure, you will find biofilms, as in dental plaque, wastewater infrastructure, et cetera.

In our Genemat system at Ginkgo, we have two chambers and the culture resides in one of them. I can transfer this culture to another refuge chamber so that I can sterilize and then rinse the principal vessel. 

Meanwhile, the culture is safe in the other. After we have restored the original conditions, I can transfer it back. Then I can sterilize and rinse the second, refuge chamber. That completes a cycle, by the end of which the probability of survival by sticking somewhere is absolutely zero. So the idea is that we just get rid of these biofilms, and we do it in a closed setup, which means I do not need to replace containers, open tubes or otherwise interact with the system manually. This paved the way for full automation of ALE in a fluidic setup that can dramatically limit the chances of contamination and system failure, and that can work 24/7 for as long as necessary.

What we achieved with the Genemat after several years of development is a standalone, autonomous fluidic apparatus that automates ALE, with tubes connecting the growth chambers to each other and to tanks containing growth media, sterilizing agents, and water. This results in a closed circuit that is sterile and requires no manual intervention, and we can automate everything mainly through optical density measurement.
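
One common way to drive a continuous culture from optical density readings is a turbidostat-style rule: when the measured OD reaches a setpoint, exchange enough culture for fresh medium to bring it back down to a target. The sketch below illustrates that generic idea only; it is an assumption for illustration and not the Genemat’s actual control logic. The function name, thresholds and volumes are all hypothetical.

```python
def medium_to_exchange(od, od_setpoint, od_target, culture_volume_ml):
    """Volume of culture (mL) to swap for fresh medium at constant volume.

    Hypothetical turbidostat-style rule: do nothing until the optical
    density reaches the setpoint, then dilute back down to the target.
    Removing v mL of a well-mixed culture and adding v mL of sterile medium
    scales the density by (1 - v / V), so v = V * (1 - od_target / od).
    """
    if not 0 < od_target <= od_setpoint:
        raise ValueError("need 0 < od_target <= od_setpoint")
    if od < od_setpoint:
        return 0.0
    return culture_volume_ml * (1.0 - od_target / od)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(medium_to_exchange(0.30, 0.80, 0.40, 20.0))  # below setpoint: 0.0
    print(medium_to_exchange(0.80, 0.80, 0.40, 20.0))  # halve density: 10.0
```

Under a rule like this, a population that adapts and grows faster reaches the setpoint sooner and is diluted more often, so the dilution pressure follows the culture rather than a fixed schedule.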

What we’ve achieved over the last 20 years is a system that actually works and is really automated.

SA: How long can you run an ALE experiment for? 

ST: Typically, the duration of an ALE campaign on the Genemat is around 3 months. That can be shorter or longer; it really depends on the project. The length of the experiment is no constraint for us: the Genemat can support ALE experiments for as long as necessary.

At CEA Genoscope, a French research center that co-owns the technology, an experiment has been running for about 10 years. I think we have accumulated maybe 50,000 or 60,000 generations over those 10 years or so, which is maybe three or four times faster than doing the experiment by serial passaging.

The capacity to keep those cells growing in exponential phase at all times, or in different states depending on what we want to do, is pretty unique. The Genemat will adjust the selective pressure to the actual adaptation of the microbes. And so we have a system that works 24/7, and we can reach our target faster.
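
As a rough check on those figures, the generation counts can be converted into an average pace. This is back-of-the-envelope arithmetic on the approximate numbers quoted above, assuming uninterrupted growth over the whole period; the function names are just for illustration.

```python
HOURS_PER_DAY = 24.0
DAYS_PER_YEAR = 365.25


def generations_per_day(generations, years):
    """Average number of generations completed per day."""
    return generations / (years * DAYS_PER_YEAR)


def mean_doubling_time_hours(generations, years):
    """Average time per generation, in hours."""
    return HOURS_PER_DAY / generations_per_day(generations, years)


if __name__ == "__main__":
    for total in (50_000, 60_000):
        print(total,
              round(generations_per_day(total, 10), 1),
              round(mean_doubling_time_hours(total, 10), 2))
```

With these inputs the average works out to roughly 14 to 16 generations per day, or a mean doubling time of about an hour and a half to an hour and three quarters.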

Of course, it’s hard to imagine that a customer would come to us with a project that runs for 10 years! But our experiment shows the power of our ability to create a sterile environment and maintain cells for extended periods, while reaching the target faster.

SA: How does this tie into Ginkgo’s Foundry?

ST: We take samples from the Genemat during evolution, and they can be characterized at any point during the experiment. Historically, before we were acquired by Ginkgo, we had only been doing the work of inoculating the machine, evolving, taking samples, and shipping the samples to the customer, which also kept us from having much insight into the actual process and impact of the evolution going on in the Genemat.

But now, with Ginkgo’s Foundry, we can access the data generated with the samples. The ALE team keeps the same scope as before, and we will ship those cryotubes through the Foundry, but we now have access to information on how the evolved strains performed and what paths evolution took. That’s very exciting for us. Depending on what’s needed for the project, we will isolate clones, perform basic characterization and dispatch to other Foundry services for phenotyping or genotyping.

SA: What’s the output of automated ALE after everything goes through the Foundry? Are you working with the entire population? A single clone?

ST: We collect samples from any experiment on a routine basis every week. That’s our standard. We do a basic QC and we store that in our freezers as a backup of evolution. And this happens for the duration of an experiment, typically a few months.

What we collect from our system is a polyclonal population. We collect them in a 1 ml cryotube so we can characterize them.

For this, we usually first sequence a given number of clones to understand how many different genomes we have, so that we can plan the phenotyping accordingly. Once we understand that, we can test the different genotypic variants for how well they perform on the desired KPIs.

I like the way we do this: first sequence, understand how many genomes we have at this point in the ALE, and then calibrate the characterization.

So the customer will usually get the best clones, regardless of the nature of the program.

SA: What types of things would you consider in scoping Foundry services for a project?

ST: I would say a base scope of work includes a basic screening of the best clones against basic phenotypic indicators (KPIs). But if you want more insight into the performance, we might characterize other phenotypes using omics, fermentation, everything that Ginkgo can offer. We can now also sequence the strains to understand the beneficial mutations, which we could use in a rational engineering campaign that might be running in parallel. At Ginkgo, this is something that could be done by the Systems Biology group.

So integrated into Ginkgo’s Foundry, ALE is so much more powerful. Not only do you have the strains as an output, but now you can understand the pathway they took, how they perform, and have a roadmap for improving the phenotype in the background of your choice.

And the combination of these services, in one place without having to coordinate different efforts, that’s what makes it exciting to be a scientist at Ginkgo–you can understand a problem and find solutions. And that’s also an incredible service for our customers to have access to.

 




Humans of Ginkgo: Why Adaptive Laboratory Evolution?

Automated ALE harnesses the power of evolution — Part 1

Adaptive Laboratory Evolution (ALE) was developed in the mid-20th century, but it’s only recently that scientists have been able to leverage this process for industrial partners. Ginkgo’s Head of ALE, Simon Trancart, discusses how Ginkgo uses ALE as a fast, unbiased strain development tool that is powerful on its own or paired with a metabolic engineering campaign.

Humans of Ginkgo is an interview series featuring Sudeep Agarwala interviewing some of the brilliant folks at Ginkgo to learn more about the technology that makes our work possible.


— This is the first part of a three-part interview.—

Read Part 2, Inside ALE, here

Read Part 3, Applications and Opportunities, here

Simon Trancart, Ginkgo's head of ALE

Sudeep Agarwala: You’re in charge of Adaptive Laboratory Evolution (ALE) at Ginkgo, a way of guiding evolution in the lab. Why is this something that’s important for a company that engineers strains? Why is guiding evolution in the lab an important tool for metabolic engineering?

Simon Trancart: So the beauty of ALE or other “artificial selection techniques” that try to mimic natural selection is that you don’t need a priori knowledge of what the bottleneck is or what mutations will be required to optimize your pathway. So the best fit for ALE is when you have a strain that you want to improve, but you don’t know how.

Or you might know where you would intervene with genome engineering, but you cannot, because there are no tools for an exotic organism that can’t be engineered easily. Or you might need a non-GMO application.

So I would say those are the most obvious applications: things that are hard to engineer or can’t be engineered. One very important limitation is that the trait must be related to fitness, growth or survival.

SA: I noticed you mention “other artificial selection techniques” — so ALE is not the only way to exert natural selection in the lab?

ST: There are multiple ways to mimic natural selection in the lab for the purpose of directed evolution of cells or entire genomes. One approach consists of two sequential steps: diversity generation and then screening. That’s what the EncapS team at Ginkgo does: create a library, which is then screened in ultra-high throughput. In ALE, diversity generation and screening both take place during continuous cultivation. ALE takes advantage of genetic drift in a population and allows the variants that arise to be subjected to natural selection through continuous culturing.

Not all ALE methods are the same. Let’s take Richard Lenski’s work as a famous ALE example. Since 1988, Lenski has been conducting what’s known as a serial-passaging experiment, repeatedly transferring E. coli from one container into another container containing fresh media to observe evolution over thousands of generations. His manual approach over three decades has yielded remarkable insights into microbial evolution.

SA: So ALE is a way to capture this? To mimic natural selection in the laboratory?

ST: Yes. But there are other ways to implement laboratory evolution, or adaptive laboratory evolution, ALE. The picture here shows an implementation using continuous cultivation in a single vessel. This type of system was first implemented in the late 1940s. There was one team led by Aaron Novick and Leo Szilard at the University of Chicago, and another team in France led by Jacques Monod at the Institut Pasteur, that really understood that we can evolve microbes quite fast if we cultivate them continuously under controlled conditions.

SA: So these fermentation methods can actually evolve a population of cells to do what you’d like?

ST: Well, it’s interesting what you’re saying, because you’re talking about fermentation. We do not see our system as a fermentation tool. We could say that fermentation aims at optimizing the output of one genome, and you play with the conditions to optimize the output from that genome. Whereas evolution, meaning ALE, aims at producing an optimized genome from a starting strain or library, and you play with the conditions to direct adaptation to those conditions.

Having said that, though, it’s important to note that you cannot select for whatever trait you want, only for better fitness under specific selection conditions.

The other thing to point out is that people often use fermenters or liquid-handling robots in an attempt to automate ALE. And that’s interesting: people have taken equipment designed for another purpose and tried to turn it into an automated ALE system. But for many reasons, such as contamination, biofilms or maintenance requirements, this approach has drawbacks. It simply does not work if you really want to automate ALE, and that’s the reason why we designed a system specifically for this purpose.

SA: What are some examples for how ALE would work?

ST: For example, if we want to make a strain grow better in a given set of physical and chemical conditions, ALE is the right fit. So: increase the growth rate on a given medium, adapt to new media, new carbon sources or new nitrogen sources, increase tolerance to toxic chemicals, to extreme pH conditions or to higher oxygen levels.

These are very basic applications. And I actually see a lot of synergies with rational engineering. Because any time you modify the genome of a strain, it will, most of the time, be at the expense of some fitness, especially if you’re modifying a lot of genes. But you can recover fitness after genetic modifications by using ALE to stabilize the genome for further engineering.

SA: So ALE is a tool that can be used right alongside more conventional strain engineering?

ST: One of the most beautiful examples is when you can actually engineer a new synthetic activity that will be coupled to growth. So ALE is a great tool when your engineering is substrate-related. For example: “I would like to clone heterologous enzymes to utilize C5 sugars and improve that using ALE.”

There might be other opportunities where rational engineering methods are used to couple the targeted activity to growth. And then you use ALE as a lever to speed up the implementation of synthetic activities in the organisms.

SA: We’ve spoken about working with different organisms in ALE. What organisms have you worked with in this system?

ST: We have run many projects with different bacteria, different yeasts, and a few microalgae.

We had one project with plant cells where it worked well. It was maybe not a good fit, because the doubling time was something like four days, so you can imagine that evolution takes more time, right? But at least we could demonstrate that we can continuously cultivate this kind of cell for, I think, three months. It was gratifying because it proves that our system really does not contaminate. After so many hours, any contaminant would have dominated. We did that on rich medium, and we didn’t see anything.

SA: Any highlights?

ST: We worked on Pseudomonas putida as part of a collaborative project funded by the European Commission. We were invited to join by another group that had developed a bacterial chassis aimed at producing bio fluoro polymers.

They had designed a way both to produce these polymers in P. putida and to couple the fluorination to growth. So, to be clear, they had a scheme where the bacterium cannot grow if it doesn’t incorporate fluorine into its metabolism. And then the fluorine would be directed towards production of fluoro polymers. That was a beautiful synbio project where we could demonstrate the power of combining rational design with ALE to implement new-to-nature activity in life. We also tackled other problems with ALE, notably that P. putida doesn’t naturally grow at high levels of fluorine.

And so we did several ALE campaigns in that program, some for improving tolerance to fluorine and fluorinated compounds, which are highly toxic, and others that actually aimed at improving the growth of strains that were dependent upon the uptake of a fluorinated compound.

It worked pretty well. And we believe this is the type of approach that could be developed for other applications.


