Scaling up whole genome sequencing of COVID-19

Scaling up the infrastructure for testing and tracking COVID-19 is one of the most important things we can do at this stage of the pandemic. As part of this broader effort, we at Ginkgo are focusing our next-generation sequencing (NGS) pipeline on a target of sequencing 10,000 full viral genomes per day from de-identified patient samples, so that we can effectively track the evolution of the virus as it spreads. We’re planning to work on this with clinical labs, including the CLIA-approved testing facility at Battelle, and we are actively seeking other collaborations with labs generating patient RNAs for sequencing. Please reach out to us at

Current testing methods

The United States has limited capacity to detect new pathogens such as the SARS-CoV-2 coronavirus in infected individuals or environmental samples. Clinical laboratories rely on qPCR testing, which reports the presence or absence of the virus but provides no viral genetic sequence information.

Sequencing makes it possible to track variants as the virus spreads through the population, and so to trace transmission chains across time and location. Knowledge of transmission chains critically informs public health control and containment measures, while knowledge of virus variability can guide the development of vaccines and low-cost diagnostics.

Too few samples from infected individuals are currently being sequenced, and essentially no environmental samples are being tested. What is needed is the ability to sequence at enormous scale, enabling population-level testing and long-term surveillance.

With such a capability, tens of thousands of samples could be sequenced daily, including:

  • All patient-derived samples submitted to public health departments and similar agencies;
  • Random samples drawn from apparently well individuals; and
  • Diverse environmental samples, drawn from points of entry or transit, airplanes, trains, buses, and subways, other public places, and sewers.

Deep and broad sampling would provide the data needed to determine incubation times, spread mechanisms, and prevalence, to gauge the extent of herd immunity, and to create predictive models that may guide containment efforts in this and future pandemics. As therapeutic options become available, we also want to spot any emerging drug-resistance or vaccine-escape mutations.

Repurposing Ginkgo’s NGS pipeline

On a normal day at Ginkgo, we sequence DNA from thousands of different bacterial, fungal, and plant samples, thanks to our automated sample preparation methods and terabyte-scale data de-multiplexing algorithms. These tools make it possible for us to shift quickly to sequencing up to ten thousand small viral genomes every day.
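For readers unfamiliar with de-multiplexing, here is a minimal Python sketch of the general idea: reads from many pooled samples are sorted back into per-sample bins using a short barcode sequence. It is an illustration only, not Ginkgo's algorithm. The function names, the inline-barcode layout (barcode at the start of each read), and the one-mismatch tolerance are all assumptions made for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group reads by sample using the barcode at the start of each read.

    reads: iterable of read sequences (str), barcode first (assumed layout).
    barcodes: dict mapping barcode sequence -> sample ID; all barcodes
        must have the same length.
    Returns a dict of sample ID -> list of reads with the barcode trimmed.
    Reads that match no barcode, or more than one, go under "unassigned".
    """
    length = len(next(iter(barcodes)))
    binned = defaultdict(list)
    for read in reads:
        tag, insert = read[:length], read[length:]
        hits = []
        if len(tag) == length:  # skip reads too short to carry a barcode
            hits = [sample for code, sample in barcodes.items()
                    if hamming(tag, code) <= max_mismatches]
        if len(hits) == 1:
            binned[hits[0]].append(insert)
        else:
            binned["unassigned"].append(read)
    return dict(binned)


# Hypothetical barcodes and reads, for illustration only.
example_barcodes = {"ACGT": "sample_1", "TTGA": "sample_2"}
example_reads = ["ACGTGGGG", "TTGACCCC", "ACGAGGTT", "GGGGAAAA"]
example_bins = demultiplex(example_reads, example_barcodes)
```

In this toy input, the third read carries a single barcode error and is still assigned to sample_1, while the fourth read matches nothing and is set aside. A production system would work on FASTQ files and index reads rather than in-memory strings.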

We can start from RNA already extracted by clinical labs, or use our automated sample preparation system and standard methods to extract RNA from patient samples or environmental swabs. Viral RNA would be reverse transcribed and amplified using virus-specific primers (with human-specific primers serving as an internal positive control) to generate material suitable for our high-throughput NGS process. Our existing bioinformatics pipeline would be extended to quantify the virus in each sample, report any failure to detect the positive control (indicating a sample that must be re-queued), assemble the detected virus sequences, and format them for submission to public and/or government databases.
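The per-sample decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual pipeline: the SampleCounts record, the triage function, the status labels, and both read-count thresholds are hypothetical, and real cutoffs would have to be set from validation data.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    sample_id: str
    viral_reads: int    # reads from the virus-specific amplification
    control_reads: int  # reads from the human internal positive control


def triage(sample, min_control_reads=100, min_viral_reads=10):
    """Decide what to do with one sample (thresholds are placeholders).

    A missing positive control means the preparation failed, so the
    sample is re-queued no matter how many viral reads were seen.
    """
    if sample.control_reads < min_control_reads:
        return "requeue"
    if sample.viral_reads < min_viral_reads:
        return "virus_not_detected"
    return "assemble"


# Hypothetical counts, for illustration only.
batch = [
    SampleCounts("s1", viral_reads=5000, control_reads=800),
    SampleCounts("s2", viral_reads=0, control_reads=650),
    SampleCounts("s3", viral_reads=1200, control_reads=3),
]
decisions = {s.sample_id: triage(s) for s in batch}
```

Checking the control first matters: a sample with no control signal cannot be reported as virus-free, because the absence of viral reads may simply reflect a failed preparation.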

We hope that this sequence data can serve public health efforts as well as the growing community of biologists and bioengineers working to track the virus and develop therapies and vaccines. Our approach would not be possible without the great open science exemplified by the ARTIC Network, the CDC, Nextstrain, and others. We look forward to a day very soon when we start adding to the database of publicly available viral sequences. We will keep working to build the network of upstream collaborators who can generate RNA samples for analysis, and to ensure that our pipeline is doing the most good by sampling broadly.

Please spread the word so that we can ensure that samples which could be sequenced don’t languish in freezers, and if you are a clinical site generating patient RNAs we hope you’ll get in touch with us at to start talking about how we can work with you.

Posted By: Birgitte Simen