Scaling COVID-19 Testing to Millions Per Day

Detection is one of the key tools in our country’s fight against COVID-19. We previously discussed how qPCR and software algorithms are being used to diagnose the disease. At Ginkgo, we are developing a SARS-CoV-2 test based on Next Generation Sequencing (NGS). In this post, we’ll explore how NGS and software can augment the country’s testing capacity to support reopening our economy.

At the time of this writing, the U.S. conducts about 500K COVID-19 tests per day. Researchers have proposed that we, as a nation, must scale this number to tens of millions or even hundreds of millions of tests per day so that we can test people going to work a few times a week, allowing workplaces to help contain the spread of the virus. How can the U.S. rapidly scale up testing capacity 10- or 100-fold? In our recent whitepaper, Ginkgo proposed continuing to leverage and scale up qPCR capabilities, since qPCR is a well-understood, tried-and-true technology. To further augment U.S. testing capacity, Ginkgo is pursuing an additional approach: leveraging genomic sequencing technologies to detect the novel coronavirus. NGS was developed and refined in the years following the Human Genome Project for sequencing human genomes, but it is versatile and has found a number of uses beyond that original application, including detecting the presence of viral RNA in a sample.

NGS technology can read massive amounts of DNA. We could potentially use it to look for genome fragments of the novel coronavirus. In a single run, modern NGS instruments are able to read millions to billions of DNA fragments of up to 600 base pairs (individual A, C, T and G molecules that make up the genetic code of life). We could use this huge “read” capacity to simultaneously look for base pair fragments unique to the virus in individual samples taken from thousands of people, thereby enabling us to perform an enormous number of tests all at the same time, on the same instrument.
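The capacity argument above is simple division, and a rough sketch makes it concrete. The numbers here are purely illustrative assumptions, not figures for any particular instrument or assay: the function name, the reads-per-sample budget, and the usable-read fraction are all hypothetical.

```python
def samples_per_run(reads_per_run, reads_per_sample, usable_fraction=1.0):
    """Estimate how many barcoded samples fit in one sequencing run.

    All inputs are illustrative assumptions:
      reads_per_run    -- total reads the instrument produces in a run
      reads_per_sample -- reads we budget for each person's sample
      usable_fraction  -- share of reads that pass quality filtering
    """
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    usable_reads = int(reads_per_run * usable_fraction)
    return usable_reads // reads_per_sample


# Hypothetical example: 10 million reads, 1,000 reads budgeted per sample.
capacity = samples_per_run(10_000_000, 1_000)
print(capacity)  # 10000
```

Under these made-up inputs, a run at the low end of "millions of reads" would already cover thousands of samples, which is the point the paragraph above makes.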

We do have to customize existing NGS pipelines to detect the novel coronavirus. First, we would collect samples from people; these may be collected via saliva, nasopharyngeal swabs, or some other mechanism. We would convert the virus’s RNA to DNA and then amplify specific DNA fragments that are unique to the virus (we don’t sequence any identifying human DNA sequences). We could then attach unique DNA “barcodes” (sequences that aren’t biologically relevant but that allow us to track sequences in bioinformatics software) to each individual’s sample so that the DNA fragments can be traced back to that sample. The samples are then pooled together into a single NGS run, which the NGS instrument processes. Custom software then separates out the detected DNA fragments by individual: if coronavirus fragments are detected for a sample, then there is a high likelihood that the person who provided it is infected. This process sounds complicated (and I have omitted many important details), but the individual steps are all well understood and, more importantly, much of the infrastructure for performing these steps already exists, including the software.
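The software step of separating pooled reads by barcode (often called demultiplexing) can be sketched in a few lines. This is a toy illustration, not Ginkgo's pipeline: the barcodes, the viral target sequence, and the `min_viral_reads` cutoff are all made up, and real pipelines also deal with sequencing errors, quality scores, and controls.

```python
from collections import defaultdict

# Hypothetical 8-base barcodes, one per person's sample.
BARCODES = {
    "ACGTACGT": "sample_A",
    "TGCATGCA": "sample_B",
}
BARCODE_LEN = 8

# Hypothetical stand-in for a fragment unique to the virus.
VIRAL_TARGETS = {"GGTTCCAAGG"}


def demultiplex(reads):
    """Group reads by sample, using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, fragment = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = BARCODES.get(barcode)
        if sample is not None:  # reads with unknown barcodes are dropped
            by_sample[sample].append(fragment)
    return by_sample


def count_viral_reads(reads):
    """Count, per sample, the reads whose fragment matches a viral target."""
    counts = {sample: 0 for sample in BARCODES.values()}
    for sample, fragments in demultiplex(reads).items():
        counts[sample] = sum(1 for f in fragments if f in VIRAL_TARGETS)
    return counts


def call_samples(reads, min_viral_reads=2):
    """Flag a sample as detected if it has enough viral reads (toy cutoff)."""
    return {s: n >= min_viral_reads for s, n in count_viral_reads(reads).items()}


pooled_reads = [
    "ACGTACGT" + "GGTTCCAAGG",  # sample_A, viral fragment
    "ACGTACGT" + "GGTTCCAAGG",  # sample_A, viral fragment
    "TGCATGCA" + "AAAAAAAAAA",  # sample_B, no viral fragment
    "NNNNNNNN" + "GGTTCCAAGG",  # unknown barcode, ignored
]
print(call_samples(pooled_reads))
```

Because each read carries its own barcode, the pooled run can be untangled after the fact, which is what lets many people's samples share one instrument run.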

Software plays a major role in every step of NGS pipelines, from the initial laboratory work all the way to the end analysis. Laboratory information management software (LIMS) drives the workflows that keep track of samples and automate laboratory work such as adding the DNA barcodes discussed above. While this domain may be unfamiliar to you, modern LIMS systems are built with technologies that you are likely familiar with, including React, Docker, AWS and Python. Analyzing large NGS datasets involves processing a huge amount of data and sometimes uses algorithms that can consume a vast amount of computer memory and storage. Fortunately, a lot of this analysis involves pleasingly parallel tasks, which lend themselves well to tools such as AWS Batch and Apache Airflow, both of which we use at Ginkgo. This means that the pipelines are easy to orchestrate and we only pay for the compute power we actually use.
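"Pleasingly parallel" means each sample can be analyzed without knowing anything about the others. The sketch below shows that shape using only Python's standard library; a local thread pool stands in for independent jobs, and it does not use the AWS Batch or Airflow APIs. The `gc_fraction` analysis and the sample data are hypothetical placeholders for a real per-sample step.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(sequence):
    """Toy per-sample analysis: fraction of G and C bases in a sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def analyze_samples(samples, max_workers=4):
    """Run the same analysis on every sample independently.

    No task depends on another task's result, so the work could just as
    well be spread across many machines as across local threads.
    """
    names = list(samples)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(gc_fraction, (samples[n] for n in names)))
    return dict(zip(names, results))


# Hypothetical sample data.
samples = {"s1": "GGCC", "s2": "ATAT", "s3": "ACGT"}
print(analyze_samples(samples))
```

Since the tasks share no state, scaling up is mostly a matter of launching more workers, which is why this kind of workload maps well onto batch compute services.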

In summary, NGS is an important technology whose existing capacity can be repurposed to detect the novel coronavirus in people. While biotechnology may not be a familiar domain to most software engineers, it does leverage modern software tools and technologies we are familiar with. We will discuss some of the details of our NGS pipeline in a future post.

(Feature photo by National Cancer Institute on Unsplash)

qPCR Quantification

Software is playing many roles in addressing the COVID-19 pandemic. This Wired article covers a couple of software projects, including applications for sharing data and an application for tracking pathogen evolution. But software is also an essential part of many of the biological research tools that are playing a major role in fighting the spread of the disease. I want to talk a little about the software basics for qPCR, which is one of the methods the CDC is using to detect and measure COVID-19 in research settings. My colleague, Keith Robison, has an excellent discussion about qPCR and COVID-19 here.

qPCR is a quantitative form of PCR, which stands for polymerase chain reaction. PCR enables the detection of very small quantities of genetic material (such as a small amount of the COVID-19 virus) by amplifying it to millions or billions of copies, making detection easier. The basic idea is a process called thermocycling which, assuming 100% efficiency, doubles the amount of DNA during each of the 25 to 30 heating and cooling cycles of up to 6 minutes each. This in turn causes a dye to fluoresce proportionally to the number of copies made. By measuring the fluorescence at each cycle, we can estimate the relative amount of starting material from the shape of the curve. In practice the resulting reaction curve looks sigmoidal as shown below.

So how can we use these results to measure relative quantities of DNA? Well, keep in mind that the amount of DNA doubles after each cycle. Thus, if curve A comes up 3 cycles earlier than curve B, then we would expect curve A to represent a sample with 2^3, or 8, times the concentration of genetic material of the sample represented by curve B. One simple way to make this determination is to draw a horizontal line called the threshold and determine at which cycle curves A and B intersect it. The cycle of intersection is called the Cq, or quantification cycle, where smaller Cqs represent larger quantities of DNA.
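The threshold method above can be sketched with idealized data. Everything here is an assumption for illustration: the curves are perfect logistic functions rather than instrument readings, the threshold of 0.1 is arbitrary, and linear interpolation between cycles is just one simple way to estimate the crossing point.

```python
import math


def sigmoid_curve(midpoint, cycles=40, plateau=1.0, steepness=math.log(2)):
    """Idealized fluorescence readings for cycles 1..cycles (logistic curve)."""
    return [
        plateau / (1.0 + math.exp(-steepness * (cycle - midpoint)))
        for cycle in range(1, cycles + 1)
    ]


def cq(fluorescence, threshold):
    """Fractional cycle at which the curve first rises through the threshold.

    fluorescence[0] is the reading for cycle 1. Returns None if the curve
    never crosses the threshold.
    """
    for i in range(1, len(fluorescence)):
        before, after = fluorescence[i - 1], fluorescence[i]
        if before < threshold <= after:
            # The crossing lies between cycle i and cycle i + 1.
            return i + (threshold - before) / (after - before)
    return None


# Curve B is curve A delayed by exactly 3 cycles.
curve_a = sigmoid_curve(midpoint=20)
curve_b = sigmoid_curve(midpoint=23)

cq_a = cq(curve_a, threshold=0.1)
cq_b = cq(curve_b, threshold=0.1)
fold = 2 ** (cq_b - cq_a)  # assumes perfect doubling every cycle
print(round(cq_b - cq_a, 6), round(fold, 6))
```

With these idealized curves, the Cq difference comes out to 3 cycles and the inferred fold difference to 8, matching the worked example in the text.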

So how does this help detect COVID-19? Let’s say no curve appears for a given patient sample. From this data we may conclude that COVID-19 is not present in that sample and the patient is not infected. What if a curve appears with a smaller Cq than another curve? In that case we might infer that there is 2^ΔCq times more genetic material in that sample, where ΔCq is the difference between the two Cqs, and that the infection may be at a more advanced stage for that patient.
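The two cases above, no curve versus an earlier or later Cq, can be captured in a small helper. This is an illustrative sketch only, not a diagnostic rule: the function name, the choice of the latest-rising sample as the baseline, and the sample values are hypothetical, and `efficiency=1.0` encodes the perfect-doubling assumption.

```python
def relative_amounts(cq_by_sample, efficiency=1.0):
    """Express each detected sample relative to the one with the largest Cq.

    cq_by_sample maps a sample name to its Cq, or to None when no curve
    crossed the threshold. Each cycle is assumed to multiply the material
    by (1 + efficiency), so a sample that is dCq cycles earlier than the
    baseline has (1 + efficiency) ** dCq times as much material.
    """
    detected = [c for c in cq_by_sample.values() if c is not None]
    if not detected:
        return {name: None for name in cq_by_sample}
    baseline = max(detected)  # latest curve = least starting material
    return {
        name: None if c is None else (1.0 + efficiency) ** (baseline - c)
        for name, c in cq_by_sample.items()
    }


# Hypothetical Cq values; None means no curve appeared.
print(relative_amounts({"p1": 22.0, "p2": 25.0, "p3": None}))
```

Here `p1` rises 3 cycles before `p2` and so is estimated at 8 times the material, while `p3` stays `None` because nothing was detected.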

So, using these kinds of algorithms, qPCR can be used to detect relative quantities of genetic material such as that of COVID-19. This in turn can help assess things like the state of infection in a patient or the likelihood of different transmission mechanisms. We would also like to talk about what Ginkgo is offering in the world’s battle against COVID-19, so we will delve into how software contributes to those efforts in a future post.

(Feature photo by National Cancer Institute on Unsplash)

Welcome to GinkgoBits!

Ginkgo Bioworks is well known in the biotech world for our mission: to make biology easier to engineer. It powers our work and collaborations for sustainable agriculture, meat alternatives, living medicines, and most recently, playing an active role in the world’s fight against the COVID-19 pandemic. What is not well known is that digital technologies (software, data, IT, InfoSec and DevOps), as well as our product management and UX teams, are central to Ginkgo’s mission and success. These functions form our Digital Tech team.

Like other digital technology teams, we use modern first-class tools and technologies such as React, GraphQL, AWS, Snowflake, Jupyter, and Docker. We use these technologies to process terabytes of data every day in an environment where each year we process 3 times more data than the year before. We adhere to the tenets of Agile development, meeting regularly with users, delivering value early and often, and reflecting on how we can improve while having more fun.

So what makes Ginkgo Digital Tech unique? Well, for one, our team is building Ginkgo’s software platform, which will benefit humanity by tackling its most vexing problems, including the current COVID-19 pandemic. Second, we built the platform to scale to levels previously unimagined in biology research: it is processing and analyzing tens of thousands of biological samples daily, and will handle even more tomorrow. In the process of doing so, Digital Tech members work directly with scientists to solve the most advanced problems of the day. This also means that we have opportunities to play and interface with advanced scientific instruments including robots, mass spectrometers, next-generation sequencers and automated fermentors. Most importantly, we care deeply about our fun and nurturing environment and we actively work towards this through initiatives like our Digital Tech GrowING.

We are proud of our team and our work, so we will be talking more about it in the coming weeks via GinkgoBits, which encompasses Twitter and our blog. There are a number of things we want to talk about, ranging from algorithms, AWS, and product management to, yes, COVID-19. What would you like to hear about? Let us know. And welcome to GinkgoBits!

Software Internships at Ginkgo

Software Interns at Ginkgo

Our summer software interns make outsized contributions to building the software that makes it possible for Ginkgo to program cells. This software ultimately helps build organisms that do amazing things such as creating nutritious foods without animals, creating new flavors and fragrances, synthesizing affordable medicines and helping to make agriculture more sustainable by growing crops without fertilizers.

Our 2019 summer interns came from diverse backgrounds and universities across the US, with an equal number of men and women represented. Each intern was responsible for participating in the development of an impactful project that helps accelerate the pace of engineering biology, deploying it to production and presenting their work in front of all stakeholders.

Luisa presenting her work to stakeholders

The results were amazing. Let’s take a closer look at some of the projects they participated in:

  • A React UI was developed that makes it easier for scientists to prototype and refine biological lab services available at Ginkgo’s Foundry, which is kind of like an App Store for biology. This UI is awesome as it empowers scientists to experiment and learn quickly while keeping their focus on the science.
  • A project made it possible to chain together multiple automated robotic workflows using React and GraphQL. Consequently it becomes easier to design, build and test complex workflows that run our laboratory experiments.
  • A Python API to our DNA design software was extended to further automate the creation and ordering of a huge number of DNA designs.
  • Our DNA sequencing analysis pipeline was improved so that it scales more easily. It leverages many modern technologies, including Airflow and AWS Batch, to process terabytes of data daily.

Our interns exceeded our high expectations. We really had two main goals: to help our interns learn tons on their quest to become stellar software engineers, and to have them make impactful contributions that they help deploy to production. We achieved these goals in large part through our agile processes, which structure development in meaningful two-week sprints, and by providing mentors who help guide the growth of the interns. Another major component is our commitment to using first-class tools such as React, GraphQL, Docker, Python, Django, Flask and AWS cloud technologies that significantly streamline development.

Luisa and Leah

So Ginkgo’s 2019 Summer Intern program was fantastic and fun, and a lot came together to make it a big win. The most important part, however, was the interns and mentors themselves. They were awesome! We are looking to build on this success in our 2020 Summer Intern program, and the only missing part is you. Are you up for the challenge? If you are looking for a summer internship, apply here. Are you a software engineer interested in mentoring and using first-class tools to build the software for programming life? Sign up here!

Playing with dogs in the dog room!
Software engineers and intern coding at the Museum of Fine Arts
Coding inspired by the art at the Boston Museum of Fine Arts
Ginkgo has an awesome view with prestigious visitors like Old Ironsides