Francisco Azuaje, Karim Beguir, Harry Farmer and Dr Rich Scott: How can cross-sector collaborations drive responsible use of AI for genomic innovation?
Description
In this episode of Behind the Genes, we explore how Artificial Intelligence (AI) is being applied in genomics through cross-sector collaborations. Genomics England and InstaDeep are working together on AI and machine learning-related projects to accelerate cancer research and drive more personalised healthcare.
Alongside these scientific advances, our guests also discuss the ethical, societal and policy challenges associated with the use of AI in genomics, including data privacy and genomic discrimination. Our guests ask what responsible deployment of AI in healthcare should look like and how the UK can lead by example.
Our host, Francisco Azuaje, Director of Bioinformatics at Genomics England, is joined by:
- Dr Rich Scott, Chief Executive Officer at Genomics England
- Karim Beguir, Chief Executive Officer at InstaDeep
- Harry Farmer, Senior Researcher at the Ada Lovelace Institute
If you enjoyed today’s conversation, please like and share wherever you listen to your podcasts. And for more on AI in genomics, tune in to our earlier episode: Can Artificial Intelligence Accelerate the Impact of Genomics?
"In terms of what AI’s actually doing and what it’s bringing, it’s really just making possible things that we’ve been trying to do in genomics for some time, making these things easier and cheaper and in some cases viable. So really it’s best to see it as an accelerant for genomic science; it doesn’t present any brand-new ethical problems, instead what it’s doing is taking some fairly old ethical challenges and making these things far more urgent."
You can download the transcript, or read it below.
Francisco: Welcome to Behind the Genes.
[Music plays]
Rich: The key is to deliver what we see at the heart of our mission which is bringing the potential of genomic healthcare to everyone. We can only do that by working in partnership. We bring our expertise and those unique capabilities. It’s about finding it in different ways, in different collaborations, that multiplier effect, and it’s really exciting. And I think the phase we’re in at the moment in terms of the use of AI in genomics is we’re still really early in that learning curve.
[Music plays]
Francisco: My name is Francisco Azuaje, and I am Director of Bioinformatics at Genomics England. On today’s episode I am joined by Karim Beguir, CEO of InstaDeep, a pioneering AI company, Harry Farmer, Senior Researcher at the Ada Lovelace Institute, and Rich Scott, CEO of Genomics England. Today we will explore how Genomics England is collaborating with InstaDeep to harness the power of AI in genomic research. We will also dive into the critical role of ethical considerations in the development and application of AI technologies for healthcare. If you enjoy today’s episode, please like and share wherever you listen to your podcasts.
[Music plays]
Let’s meet our guests.
Karim: Hi Francisco, it’s a pleasure to be here. I am the Co-Founder and CEO of InstaDeep, the AI arm of BioNTech Group, and I’m also an AI Researcher.
Harry: I’m Harry Farmer, I’m a Senior Researcher at the Ada Lovelace Institute, which is a think-tank that works on the ethical and the societal implications of AI, data and other emerging digital technologies, and it’s a pleasure to be here.
Rich: Hi, it’s great to be here with such a great panel. I’m Rich Scott, I’m the CEO of Genomics England.
Francisco: Thank you all for joining us. I am excited to explore this intersection of AI and genomics with all of you. To our listeners, if you wish to hear more about AI in genomics, listen to our previous podcast episode, ‘Can Artificial Intelligence Accelerate the Impact of Genomics’, which is linked in this podcast description. Let’s set the stage with what is happening right now. Rich, there have been lots of exciting advances in AI and biomedical research, but in genomics it’s far more than just hype. Can you walk us through some examples of how AI is actually impacting genomic healthcare research?
Rich: Yeah, so, as you say, Francisco, it is a lot more than hype and it’s really exciting. I’d also say that we’re just at the beginning of a real wave of change that’s coming. So while AI is already happening today and driving our thinking, really we’re at the beginning of a process. So when you think about how genomics could impact healthcare and people’s health in general, what we’re thinking about is genomics potentially playing a routine part in up to half of all healthcare encounters, we think, based on the sorts of differences it could make in different parts of our lives and our health journey. There are so many different areas where AI, we expect, will help us on that journey. So thinking about, for example, how we speed up the interpretation of genetic information through to its use and the simple presentation of how to use that in life, in routine healthcare, through to discovery of new biomarkers or classification that might help us identify the best treatment for people.
Where it’s making a difference already today is actually all of those different points. So, for example, there’s some really exciting work we’re doing jointly with Karim and team looking at how we might use classification of the DNA sequence of tumours to help identify what type of tumour - a tumour that we don’t know where it’s come from, so what we call a ‘cancer of unknown primary’ - to help in that classification process. We’re also working with various different people who are interested in classification for treatment and trials, but there’s also lots in between recognising patterns of genomic data together with other complex data. So we’ve been doing a lot of work bringing image data together with genomic data and other health data so that you can begin to recognise patterns that we couldn’t even dream of. Doing that hand in hand with thinking about what patients and participants want and expect, how their data is used and how their information is held, bringing it all together and understanding how this works, the evidence that we need before we can decide that a particular approach is one that policymakers, people in healthcare want to use, is all part of the conversation.
Francisco: Thank you, Rich. Speaking of cutting-edge AI applications and InstaDeep: Karim, could you give us a glimpse into your work and particularly how your technologies are tackling some of the biggest challenges in genomic research?
Karim: Absolutely, and I think what’s exciting is we’ve heard from Rich, which is the genomics expertise angle of things, and I come from the AI world, as do most of the InstaDeep team. What’s really fascinating is this intersection, which is extremely productive at the moment, where technologies that were developed for multiple AI applications turn out to be extremely useful in understanding genomic sequences.
This is, a little bit, our journey, Francisco. Back in 2021/2022 we started working on what was at the time a very intriguing question: could we better understand genomic sequences with the emerging technologies of NLP, natural language processing? You have to put this in context: this was before the term ‘generative AI’ was even coined, this was before ChatGPT, but we had an intuition that there was a lot of value in deploying this technology. And so my team, a team of passionate experts in AI research and engineering, tackled this problem, and the result of this work was our Nucleotide Transformer model, which we have open sourced; today it’s one of the most downloaded, most popular models in genomics. And what’s interesting is we observed that simply using the technologies of what we call ‘self-supervised learning’ or ‘unsupervised learning’ could actually help us unlock a lot of patterns.
As we know, most genomic information is poorly understood, and this is a way, using the AI tool, to get some sense of the structure that’s there.
So how do we do this? We basically mask a few parts of the sequence and we ask the system to figure them out. This is exactly how you teach a system to learn English; here, you are teaching it to understand the language of genomics. Incredibly, this approach, when done at scale (and we train a lot on the NVIDIA Cambridge-1 supercomputer), gives you results and performance that match multiple specialised models. Until then, machine learning in genomics was done task by task: for a particular task, I would develop a specific model using mostly supervised learning, which means I show the model a few labelled examples and it tries to match them, so essentially you had one model per task. What’s really revolutionary in this new paradigm of AI is that you have a single model trained at very large scale, the AI starts to understand the patterns, and this means that very concretely we can work with our partners to uncover fascinating relationships that were previously poorly understood. And so there is a wealth of potential that we are exploring together and it’s a very exciting time.
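[Editor's illustration] The masking step Karim describes can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not InstaDeep's implementation: the 6-base tokens, the `<mask>` placeholder, the masking rate and the function names are all hypothetical choices made for this example, and no model is trained here. It only shows how a training example (a sequence with hidden pieces, plus the answers to recover) might be built.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, chosen for this example


def tokenize_kmers(sequence, k=6):
    """Split a DNA string into non-overlapping k-mers.

    A trailing remainder shorter than k is kept as its own token.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence), k)]


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token the model would be asked to recover.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Always mask at least one token, never more than are available.
    n_masked = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Build one illustrative training example from a made-up 30-base sequence.
tokens = tokenize_kmers("ATGCGTACGTTAGCCTAGGATCCGATTACA", k=6)
masked, targets = mask_tokens(tokens, mask_rate=0.2, seed=42)
```

In self-supervised training, a model would see `masked` as input and be scored on how well it predicts the entries of `targets`; because the answers come from the sequence itself, no hand-labelled examples are needed.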
Francisco: What you’re describing really highlights both the potential and the opportunities but also the responsibility we have with these powerful tools, its power, and this





