Sequencing the genome of SARS-CoV-2, featuring Grant Hall

MONDAY, 4 MAY 2020

Below is a transcript of the latest episode of the BlueSci Podcast, which we have provided to help our podcast reach a wider audience. The script has been transcribed using an audio-to-text converter (otter.ai) and edited by our hosts. If you spot any errors please do let us know! Please do subscribe, follow, like, and review our podcast to show us your support and give us feedback, whether you listen or read the episode.

You can find the Podcast on:

AnchorFM : www.anchor.fm/bluesci-podcast

Spotify: https://open.spotify.com/show/64iT2OKepq3muMeM3Puh06

Apple Podcasts: https://podcasts.apple.com/podcast/id1358079746?ct=podlink&mt=2

Google Podcasts: https://podcasts.google.com/?feed=aHR0cHM6Ly9hbmNob3IuZm0vcy8zZWRmMTdjL3BvZGNhc3QvcnNz&ved=0CAUQrrcFahcKEwjAtMKA8PboAhUAAAAAHQAAAAAQBQ

BlueSci Podcast

Released 4th May, 2020

BlueSci 0:05

Welcome to the BlueSci podcast - brought to you by Cambridge University Science Magazine. I'm Ruby and I'm Simone. Every two weeks, we speak to local researchers, university staff and students and anyone who works in science, to learn about their research and activities, hear about the work that they do, and uncover what goes on behind the scenes. If you want to get in touch with a question, suggestion or just want to be featured on the podcast, just drop us a tweet. Our handle is @BlueSciPod and you can also email us at bluescipodcast@gmail.com

Ruby 0:43

Hi, everyone. Welcome back to our series of episodes related to coronavirus. Today we're speaking to Grant Hall, who is working as part of the COG-UK initiative to sequence the genome of the novel coronavirus SARS-CoV-2. This is the virus that causes the disease COVID-19. He'll tell us all about why sequencing is important and what we can learn from sequencing the virus's genome.

So welcome Grant, thank you so much for joining us today to talk about your work with COVID sequencing. Can you just tell us a little bit about yourself and your background and how you ended up on this project specifically?

Grant 1:26

Yeah, absolutely. So I am an MPhil student right now in the Department of Pathology. And I work in the Division of Virology as a member of the Goodfellow lab, where we traditionally do research on noroviruses. This degree has been my introduction into the world of biology. For the most part, I'm a chemist by training. I completed my undergrad at the United States Military Academy, working on synthetic drug development for less money and I'd realized I was more intrigued by how the drugs interacted with a pathogen than by the method for creating them. And so I went out pursuing a way to explore those interactions and found myself in the world of virology. I definitely didn't realize that I would be presented with the opportunity to work on genomic sequencing for SARS-CoV-2 when I joined Professor Ian Goodfellow's lab. But now I find myself with the opportunity to be working and learning alongside a plethora of amazing scientists in the Cambridge node of the COVID-19 Genomics UK Consortium. Professor Goodfellow, as well as some of the postdocs in the lab, including Dr. Luke Meredith, have been involved in the past with an organization of universities known as the ARTIC network, which has experience doing real-time genomic sequencing and analysis during outbreak response. The ARTIC network ended up being the foundational body that would help us start the COG-UK Consortium. The prior expertise definitely shows, as our node here in Cambridge was able to, from a standing start, sequence over 300 genomes in the first two weeks of setting the lab up. After six weeks of work, we have now submitted over 1000 genomes to the COG-UK Consortium.

Simone 3:48

Can you tell us a bit about the sequencing itself? Like, what kind of information do you look for? What kind of information can we learn from sequencing something like a virus?

Grant 3:56

When we look at sequencing, it's not any sort of absolute data set. When we sequence viruses en masse, it's not to give us all the answers. We can't track where a virus has been through a bunch of patients just with the genetic data. But a genetic sequence can provide us with additional information that can help us paint a picture in an outbreak response. So in the case of a virus, all viruses, there's a lot of genetic diversity between them. Every time the virus replicates itself in a different host, there is the chance that you're going to incur a mutation. And so over time, viruses will slowly progress down a specific pathway. And you'll see a preponderance of a certain genome that is different from what you would see at the beginning of an outbreak. And so for a virus like SARS-CoV-2, we see right now an average mutation rate of one to two changes per month. So mapping this slow genetic shift in real time with the outbreak allows us to map the progression and see which kinds of viral sequences we're seeing in certain regions globally. And likewise, even more regionally within the UK, is there a specific diversity that's starting to be found up north more than down south? And then when you have a great organization, like the COVID-19 Genomics UK Consortium that we are one single node of, and we start working closely with Public Health England, you can take this data, like I said, and pair it with clinical data or epidemiological data, and you start to be able to extrapolate and find more information about an outbreak in general. So you can start to track transmission routes, and be better informed about the number of cases potentially introduced into the UK and then how those propagated. Or you can start to notice whether a certain phenotype of the disease, a certain way that patients present, is associated with a certain genetic sequence. And so at this point, there's nothing flashy or exciting. But by providing this information in real time, rather than in retrospect, you can better inform public health policy in a real-time manner and hopefully be able to make fast decisions that will save lives.
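[Illustrative aside, not part of the recording: the idea of tracking a slow genetic shift can be sketched in a few lines of Python. This is only a toy, assuming already-aligned sequences of equal length; the sequences, the sample names and the `count_differences` helper are made up for illustration and are not the consortium's actual pipeline.]

```python
def count_differences(reference: str, sample: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(reference, sample) if a != b)


# Toy, made-up sequences (a real SARS-CoV-2 genome is just under 30,000 bases).
reference = "ATGGCTAGCTTACGGA"
sample_march = "ATGGCTAGCTTACGGA"  # identical to the reference
sample_april = "ATGGCTAGTTTACGGA"  # one change
sample_may = "ATGACTAGTTTACGGA"    # keeps that change and gains another

for name, seq in [("March", sample_march), ("April", sample_april), ("May", sample_may)]:
    print(name, count_differences(reference, seq))
```

[In this toy, the later samples carry the earlier change plus a new one, which is the kind of pattern that lets related cases be grouped when paired with clinical and epidemiological data.]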

Ruby 6:26

It's so amazing, thinking about the fact that they've already calculated a mutation rate for the virus so quickly. It's amazing. Yeah, yeah, no, it truly is. The speed of the research is incredible.

Simone 6:40

And for people with a non-biological background, how big is the genetic code of a virus, compared with, let's say, the human genetic code?

Grant 6:52

So yeah, in comparison to the human genome, it's tiny. But even when we're looking at RNA viruses, coronaviruses in general are on the larger end of the spectrum. And this is just by nature: RNA is a lot more unstable in comparison to DNA. So SARS-CoV-2 has a genome that is just under 30,000 base pairs, so big in terms of RNA viruses, but small in comparison to the human genome.
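[Illustrative aside, not part of the recording: to put "tiny" into numbers, here is a rough back-of-the-envelope calculation. The viral figure is the rounded one from the interview; the human figure of about 3.1 billion base pairs for one copy of the genome is an assumption added for this comparison.]

```python
# Approximate genome sizes, in bases / base pairs.
SARS_COV_2_GENOME = 30_000        # "just under 30,000", rounded up
HUMAN_GENOME = 3_100_000_000      # assumption: ~3.1 billion, one haploid copy

ratio = HUMAN_GENOME / SARS_COV_2_GENOME
print(f"The human genome is roughly {ratio:,.0f} times larger")
```

[So the human genome is on the order of a hundred thousand times larger than that of the virus.]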

Ruby 7:25

Yeah. And so when you're talking about the accumulation of mutations or changes, what are the implications of this in terms of thinking about vaccine development or, you know, its ability to transmit?

Grant 7:42

So I think it comes back to what I was touching on a little bit, in the sense that there are no absolute implications. These mutations could lead to a significant change in, potentially, the presentation of a specific protein, or they could lead to a variation in the disease's ability to spread, or potentially in the phenotype of disease that's presented by the viral infection. So you could see more or fewer asymptomatic cases. But especially with a virus that we don't understand, we don't necessarily know the implications of these mutations, and they are random. There are probably plenty of skilled mathematicians who could start to predict and create models. But that requires data. And so what's great about, again, this mass approach towards compiling genomic data that the COG-UK consortium is attempting right now is that you're really providing a chance to develop a robust data set and sample of what the UK caseload is looking like.

Ruby 8:54

And how does the interaction with other COG-UK nodes work?

Grant 9:00

So COG-UK actually is a government-funded consortium and initiative. It was initially funded and set up with the help of the DHSC, UKRI, and the Wellcome Trust, and it consists of four different Public Health England institutions, the Sanger Institute, as well as 12 universities across the UK. And they are all working alongside local hospitals to sequence these samples and then process and upload the genomes, while pairing them with epidemiological data collected by the hospitals, and then clinical data as well. And so this then gets compiled into a massive data set and is handled by Public Health England and the government. But I think it's important to note that this wouldn't have been able to start as quickly as it did if it wasn't for the work of the ARTIC network, which is this subset of universities that had been involved with some of the genomic sequencing during the Ebola outbreak, and had developed these optimized protocols for outbreak-response genomic sequencing. And so by nature of having these well-established protocols, in the context of viruses like Zika, Ebola and measles, they were able to then adapt them to the current outbreak, and thus provide results very quickly and have a really well-established and uniform protocol that these labs and organizations could then work from.

Ruby 10:41

It sounds extremely well organized, considering how quickly this has all been taking off, with you speaking about these nodes and that sort of network approach. And do you think it's important to pair genomic sequencing with local hospitals?

Grant 11:00

Yeah, it's important that that interaction is happening so that we know what's happening locally. And I absolutely think that's completely right. So if we're looking at issues that you see any time you're trying to put together a massive collaboration or a massive initiative, and you take academia, the government, different nonprofits and other organizations, any time you have these bureaucratic organizations working together, there's going to be friction because everyone has a different operating procedure. And so any time you can, I think, localize these interactions, there's an easier way to overcome those points of friction. You're able to work together more closely, face to face, and find solutions to the problems that you are encountering, but at the same time provide reliable data. So the Sanger Institute obviously has this massive capacity for genomic sequencing. And so they are definitely going to lead the main front when it comes to the sequence databases, but obviously they're going to work in a much slower manner because they're receiving samples from across the UK. These small nodes, like ourselves here in Cambridge, are able to provide data directly back to the hospital. And again, it can help inform the epidemiological data and clinical data that they're getting, to compare what genetic diversity they're seeing in their caseloads. This explores the question: is there a way that genomic data can provide real-time information about nosocomial transmission of the virus? And so while there currently is, we're still working out those kinds of protocols, and people are trying to figure out whether this information is something that's needed just to aid the epidemiological data or if it can be more absolute. It does provide hospitals with the information, and they can then choose to use it as they see fit to inform their own decision-making processes.

Simone 12:59

That sounds really helpful. Yeah. And it's definitely a theme, because we've just started this series of episodes about the COVID-19 situation. Last time, we were speaking to Professor Stephen Baker, who was working on helping Addenbrooke's hospital with diagnostics. So obviously, they're not a diagnostic lab, but they were taking the load off the local hospital for the screening. And it just worked. So it definitely seems like finding those local solutions is really constructive and worthwhile. And one other thing that we talked to him about was this idea of people staying informed and finding reliable sources of information, so that they weren't misinformed during this situation, and also the fact that we have all this fake news going around and people don't really know who a reliable scientist is and so on. There are also a lot of conspiracy theories, I guess, about where the virus has come from, if it was made in the lab, if it was leaked from somewhere. I guess, what can sequencing tell us about that? Can we use the genetic information of the virus to trace it back the same way? Instead of tracing it forward to see where it's going, can we look back and see where it's come from?

Grant 14:11

So absolutely, when we actually look at genomic data analysis, we often look at it in a retrospective manner. So we can see the flow of the virus over time, how it mutated, and eventually try to work our way back to potentially that patient zero, that initial jump where the virus moved from a reservoir of some sort into a human population and began to start transmitting. There are definitely limitations to that. So we can gather as much information as we can, and definitely make really well-informed hypotheses on what kind of virus was the initial source and what kind of host the virus eventually made that jump from. But without that exact sample, there's always the possibility that there's something else that we aren't aware of, some sort of jump that occurred that we're not accounting for. And so it definitely, I think, can put people's minds at ease that it likely isn't some bioengineered warfare weapon, because there are indicators within the genetic sequence that would be able to denote that this really doesn't seem right. However, it's not an absolute: we can't just look at the sequence and say this absolutely came from this sample, or this host, and it's been through 12 people since then. It'd be great if we had that capacity. But it does provide us with the information to, I think, debunk some of these conspiracy theories, and I think, put faith in our health institutions.

Ruby 15:47

And so we've sort of spoken about the importance of the sequencing effort and why it's being done. Could you tell us more now about what it's like doing the actual work itself, presumably within a lab? And is that within Cambridge? And, you know, for non-biological people out there, could you maybe briefly explain how sequencing actually happens? Because it's kind of like this big black box, you just put it in and it gets sequenced, but, you know, it's quite an interesting process.

Grant 16:20

Yeah, absolutely, because it's definitely something. I think every scientist can think about the first time they were introduced to a new technique, and it definitely does feel like a black box. So our lab is well positioned to aid the hospital. We're actually in Addenbrooke's. So we're one floor below the diagnostics laboratory, and what's really great about the technique that we use is that we're able to work with the samples they have extracted upstairs. So we're never handling infectious material, which is great. That means that all of the work that we're doing in the lab is done at a class one/class two level. So we're not having to work in a BSL three facility, under a respirator, and we're not dealing with patients directly. So as long as we're enforcing really good social distancing practice within our lab, we're not at risk from our samples. I think we're more of a risk to our samples than anything! Because there's a lot of handling that has to be done. So after the diagnostic lab identifies a positive sample, we are notified and then given some of the extract, and it's with these extracts that we can start our process. So we use what's considered a multiplex PCR amplicon sequencing process, which is a lot of words, but to break it down simply: RNA viruses can degrade really easily, or we potentially have a really small amount of sample. So we'll get positive samples from upstairs from the diagnostic lab. And the very first thing we'll do is take whatever sample we have and convert it into cDNA. So this immediately takes our sample and makes it more stable. And then we'll use a multiplex PCR. And what this is, is we take these short primer regions, and we amplify small fragments of the genome. And so it's like making a massive jigsaw puzzle. We'll take our small amount of sample and amplify it so that there's more of it to work with, and we'll produce these small regions.
What's great about this approach is we can have a partially degraded sample and still be able to recover a significant part of the genome. But as a result, too, there is this high risk of cross-contamination, and so we have to account for that within our lab protocol. This is probably one of the bigger difficulties with this genomic sequencing technique, because we have all these tiny puzzle pieces that we have to keep from swapping boxes, because obviously that would then ruin our complete image. And in order to help deal with that, we have implemented some techniques in the lab, and one of them, a relic of this project really starting in a field environment, is that we work out of these black tents. They were designed to make it possible to set up a field sequencing unit or lab in any part of the world that doesn't have access to traditional laboratory resources. So we have certain steps that are done inside these black tents, which were originally used for hydroponic plant growing, but they are perfect because the material allows them to be kept sterile and helps keep two different libraries separate. And that helps, again, prevent the cross-contamination issue that can occur from working with this type of multiplex PCR. But then, too, it makes it really accessible. You don't need, in theory, to have this fancy lab to work in. You can see this approach in the publicly available protocols from the ARTIC network and associated with COG-UK, and a lab in the US or in South America or in Africa or in Southeast Asia could pick up these protocols, use the basic resources that they have access to, and be able to begin sequencing and adding to what is right now an international cause, which is really great. It's awesome that you can take them anywhere as well. When they're used in the field, what makes them really great is that they're lightweight. But obviously, that's not a concern for us here in Cambridge.
So we have our lab designed in a way that samples are worked on in a progressively dirtier manner, or cleaner manner, so that our fresh samples are handled in one room and then progressively moved through the process to prevent the risk of cross-contamination. And so, after amplification, we then prepare the samples so they can interface with our Oxford Nanopore GridION sequencing system. So it's a series of steps in which we just prepare the individual DNA sequences so that they can interface with the instrument and be sequenced. And that involves adding a barcode onto them, so that we can pool the samples together and run more than one sample at once. So we can run up to 24 samples on a single flow cell. The preparation also allows us to clean up and remove any excess material and wash buffers and other DNA that's going to be in the extracted sample that we get. Because it's not as simple as just picking out the virus in particular; we're going to get other genetic material that is associated with you as a person, as a patient. And so the workflow could roughly be handled within a 12 to 24 hour period. But what we traditionally do is follow about a 36 to 48 hour turnaround. So we'll get a sample from upstairs around noon and run it through the PCR step. That will take us through overnight, and the next day we prepare it to be sequenced and then run an overnight sequencing, and at that point we'll have a genome by midday the next day. And so it's pretty amazing to know that this kind of technique is rather recent. Ten years back, this kind of technology wasn't there yet. So our ability to respond to this outbreak is enabled by the advances made by the ARTIC group to get this protocol to a point where it can be well established and efficient and optimized, but also by the massive greats that have made these protocols to start with, and yeah, it's impressive.
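[Illustrative aside, not part of the recording: the barcoding step can be pictured as tagging every fragment with a short label for its sample, pooling everything onto one flow cell, and sorting the reads back out afterwards. Below is a minimal sketch of that sorting ("demultiplexing") idea. The four-base barcodes, the sample names and the `demultiplex` function are hypothetical and chosen for readability; real barcodes are longer and real software tolerates sequencing errors.]

```python
from collections import defaultdict

# Hypothetical barcodes mapped to hypothetical sample names.
BARCODES = {"ACGT": "patient_01", "TGCA": "patient_02"}


def demultiplex(reads, barcodes=BARCODES, barcode_length=4):
    """Group pooled reads by the barcode at the start of each read.

    The barcode is trimmed off; reads with an unknown barcode are
    collected under "unclassified" rather than guessed at.
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_length], read[barcode_length:]
        by_sample[barcodes.get(tag, "unclassified")].append(insert)
    return dict(by_sample)


# Four toy reads from one pooled run.
pooled_reads = ["ACGTGGCTAG", "TGCATTACGG", "ACGTCTAGCT", "GGGGAAAACC"]
sorted_reads = demultiplex(pooled_reads)
for sample, fragments in sorted_reads.items():
    print(sample, fragments)
```

[Keeping unrecognised reads in their own group, instead of assigning them to the closest sample, mirrors the concern in the interview about puzzle pieces ending up in the wrong box.]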

Ruby 23:31

Yeah, no, that's a really good point, actually. Because knowing how much the sequencing world has just taken off over the past 10 years, I mean, the little Oxford Nanopore things are great. They're constantly getting improved as well. And, yeah, it's kind of scary to think that if this had happened 10 years ago, we wouldn't have nearly as much knowledge as we do now already. So I think that's a really good positive that I hadn't really considered.

Simone 24:00

I guess it also means that there's hope that, if this kind of thing does happen again, not only will we probably have better technology, but it will also have helped us identify the things that we need to improve and that are lacking the most. So hopefully that will motivate those gaps to be filled and push us in the right direction for the future. Exactly. And I was going to ask as well, because obviously, you said you're doing your MPhil. I'm not sure if your MPhil was meant to be on the COVID virus, probably not. So what made you volunteer for this kind of scheme, and what is it like to be working on something where you can tangibly see the impact that it's having on a very fast turnaround?

Grant 24:52

Yeah. It's very exciting to be involved with that. I remember a couple of weeks ago, a friend asked me the question: is this the time for biologists to be alive? Like, is everyone just really up and excited? And I'd argue that no person is excited about an outbreak. And any kind of scientist definitely wants to be doing the research they're interested in. I do research on noroviruses traditionally. But any time you feel like you have a specific skill set that could potentially be helpful in any way, you want to provide help when you see people in need. I think there were some 1000 people who signed up within Cambridge to help out with a lot of the initiatives online. And they were just overwhelmed with volunteers with really practical and high-level skill sets. And so having this unique opportunity to work within genomics, which definitely wasn't my background, but having the basics in PCR that could allow experts to come in and teach, is really, really neat. And it's been able to, in some ways, I think, diversify my MPhil and give me broader exposure than I otherwise would have got. So in that sense it's, I think, really satisfying to be able to see the impact because, like you said, scientists don't always get to see the direct impact. They get to know what their work could potentially be used for. So there's, I don't want to say selfishly, but selfishly, there's something slightly satisfying about it. But obviously, no one wants to be in the circumstances that we are in.

Ruby 26:26

No, definitely not. But yeah, it's quite an amazing opportunity to be involved with all of it. And I guess your family and friends must know that you're involved in it. I mean, I don't even really work on viruses. I'm more of a bacteria kind of gal. And I've been flooded with questions as if I know all the answers. And I was just wondering, you know, especially now considering you're working with it, are you always getting questions from your family and relatives, and weird articles and videos? Is there a deluge of things like that?

Grant 27:03

Oh, absolutely, but I think most people involved with science tend to get those questions in general when anything happens. It's just one of those things: you answer the questions that you can, and then you do the ethically right thing and don't answer the questions you don't know. Because that does exactly that: it prevents the spread of misinformation. And so I think what's really empowering, even for the scientists who maybe don't have the opportunity to do something directly right now, is to be scientific advocates. Advocate for proper reading of a journal article, because I don't know about you guys, but so many of my friends and family have now taken to scientific journals, trying to read through an article and make sense of it, but not recognizing: oh, this is a preprint, it hasn't been reviewed yet. Or that you take it as a data point, not as the absolute new standing information to base all your practices and policy on, this one single journal article. And so any person who's involved with any form of science has the ability to pass on these basic techniques to their family and friends. And so I think that's really important.

Simone 28:13

Yeah, no, it's definitely a really good opportunity to do that.

Ruby 28:15

I think it's the confidence as well to critically appraise what you're being told.

Simone 28:20

Yeah. And definitely, like you said, to be able to say, look, I'm not the right person to ask for this, and point people in the direction of the experts that can actually answer those questions.

Grant 28:30

It's definitely the patience too. Like, when I just think about even being in this lab, obviously, in no way am I well versed in any of this. I've been taught so much of the protocol and the approach, but I've been able to do my part now in this lab group, thanks to the great minds and scientists who have had the patience to work with me. So it's the minds of people like Professor Ian Goodfellow, who is helping run the group, and Dr. Setoric, who is over in the department of clinical medicine, who's working really closely with us and helping us pair our genomic data with the epidemiology. It's the long list of scientists like Dr. Luke Meredith and Dr. Sericati, you, Dr. Charlotte Houldcroft, who are coming from either within the lab or different parts of the UK, with their own individual backgrounds in either genetic sequencing or outbreak response, or just a general biology background. And their experience is helping inform how our lab group works and how other groups work, along with their patience to then teach us students, who can then one day hopefully fill their roles. And so if we can take that same level of patience, even in our local communities, we can help make that same level of impact even if we're not directly involved.

Ruby 29:53

It's a real testament to how science can work so efficiently and so well. Absolutely. Well, thank you so much for chatting to us. I appreciate you're massively busy at the moment. And we really appreciate it.

Simone 30:06

Yeah, I mean, our listeners can't see, but you're in the lab right now.

Grant 30:13

I'm actually sitting right next to the flow cells that we have.

Ruby 30:16

Amazing. One step removed. Okay, well, thank you so much again.

Grant 30:28

Thank you for having me.

Simone 30:32

Thanks for tuning in. We hope you enjoyed the episode and learned something about sequencing the genetic code of a virus and the RNA. We have another interesting coronavirus episode lined up for you in two weeks' time. So do hit subscribe, follow, or whatever button it is on whichever podcasting platform you're listening to. And if you want to get in touch with us with any questions or suggestions for what you want to hear from us, especially at this time, when we are responding to what's going on in the world in real time, we would love to hear what you're curious about and what you want to know about the COVID-19 situation. So please get in touch with us. On Twitter you can contact us @bluescipod, that's our username, or you can send us an email. Our email is radio@bluesci.co.uk. And yeah, don't forget to follow us, and if you've enjoyed it, leave us a review and a rating.