Dr Natalie Banner, Dr Raghib Ali, Professor Naomi Allen, Dr Andrea Ramírez: How can we unlock the potential of large-scale health datasets?

Update: 2025-01-27

Description

In this episode, our guests discuss the potential of large-scale health datasets to transform research and improve patient outcomes and healthcare systems. Our guests also delve into the ethical, logistical, and technical challenges that come with these programmes.

We hear how organisations such as UK Biobank, Our Future Health, and All of Us are collecting rich, diverse datasets, collaborating and actively working to ensure that these resources are accessible to researchers worldwide.

Hosting this episode is Dr Natalie Banner, Director of Ethics at Genomics England. She is joined by Dr Raghib Ali, Chief Medical Officer and Chief Investigator at Our Future Health, Professor Naomi Allen, Professor of Epidemiology at the Nuffield Department of Population Health, University of Oxford, and Chief Scientist for UK Biobank, and Dr Andrea Ramírez, Chief Data Officer at the All of Us Research Program in the United States.

"There are areas where academia and the NHS are very strong, and areas where industry is very strong, and by working together as we saw very good examples during the pandemic with the vaccine and diagnostic tests etc, that collaboration between the NHS and academia industry leads to much more rapid and wider benefits for our patients and hopefully in the future for the population as a whole in terms of early detection and prevention of disease."

You can download the transcript or read it below.

Natalie: Welcome to Behind the Genes.

Naomi: So, we talk to each other quite regularly. We have tried to learn from each other about the efficiencies of what to do and what not to do in how to run these large-scale studies efficiently. When you are trying to recruit and engage hundreds of thousands of participants, you need to do things very cost effectively. How to send out web-based questionnaires to individuals, how to collect biological samples, how to make the data easily accessible to researchers so they know exactly what data they are using.

All of that we are learning from each other. You know, it is a work in progress all the time. In particular, you know, how can we standardise our data so that researchers who are using All of Us can then try and replicate their findings in a different population in the UK by using UK Biobank or Our Future Health.

Natalie: My name is Natalie Banner, and I am Director of Ethics at Genomics England. On today’s episode we will be discussing how we can unlock the potential of large health datasets. By that I mean bringing together data on a massive scale, including for example genomic, clinical, biometric, imaging, and other health information from hundreds of thousands of participants, and making it available in a secure way for a wide range of research purposes over a long time period.

Through collaboration and industry partnerships, these programmes have the potential to transform research and deliver real world benefits for patients and health systems. But they also come with challenges ranging from issues in equity and ethics through to logistics, funding, and considerable technical complexities. If you enjoy today’s episode, we would love your support. Please like, share, and rate us wherever you listen to your podcasts.

I’m delighted to be joined today by 3 fantastic experts to explore this topic. Dr Raghib Ali, Chief Medical Officer and Chief Investigator at Our Future Health. Professor Naomi Allen, Professor of Epidemiology at the Nuffield Department of Population Health, University of Oxford, and Chief Scientist for UK Biobank, and Dr Andrea Ramírez, Chief Data Officer at the All of Us Research Program in the United States.

Andrea, if I could start with you. It would be really great to hear about All of Us, an incredibly ambitious programme in the US, and maybe some of the successes it has achieved so far.

Andrea: Absolutely. Wonderful to be here with you and thank you for the invitation. The All of Us Research Program started in 2016 from the Precision Medicine Initiative and was funded with the goal of recruiting 1 million or more participants into a health database. That includes information not only from things like biospecimens including their whole genome sequence, but also surveys that participants provide, and importantly linking electronic health record information and other public data that is available, to create a large database that researchers can access and use to study precision health.

We have recruited over 830,000 participants to date and are currently sharing available data on over 600,000. So, we’re excited to be with your audience, and I hope we can learn more and contribute to educating people listening about precision medicine.

Natalie: Thank you, Andrea. And not that this is competitive at all, but Raghib, as we are recording this, I understand the Our Future Health programme is marking quite a phenomenal milestone of 1 million participants. Would you mind telling us a little bit about the programme and something that you see as the benefits of working at scale for health research?

Raghib: Thank you very much. So, Our Future Health is a relatively new project. It was launched in 2020 with the aim of understanding better ways to detect disease as early as possible, predict disease, and intervene early to prevent common chronic diseases. Similar to All of Us, we are creating a very large database of participants who contribute their questionnaire data, physical data, genetic data, and linkage to healthcare records, with the aim as I said, to really improve our understanding of how best to prevent common chronic diseases.

So, we launched recruitment in October 2022. Our aim is to recruit 5 million participants altogether, and in the last 2 years about 1.85 million people have now consented to join the project. But you are right, as of last week we have what we call 1 million full participants, so people that have donated a blood sample, completed the questionnaire, and consented to link to their healthcare records. In our trusted research environment, we now have data on over 1 million people available for researchers to use.

Of course, we have learnt a lot from the approach of UK Biobank, which we are going to hear about shortly, but the resource is open to researchers across the world, from academia, from the NHS, from industry, so that will hopefully maximise the benefits of that data to researchers, but as I say with a particular focus on early detection, early intervention, and prevention research.

Natalie: Thank you Raghib. Great to have you with us. Naomi, Raghib mentioned that UK Biobank has been running for a long time, since 2006. It is a real success story in terms of driving a huge range of valuable research efforts. Could you talk to us a little bit about the study and its history and what you have learned so far about the sort of benefits and some of the challenges of being able to bring lots of different datatypes together for research purposes?

Naomi: Yeah, sure. So, UK Biobank recruited 0.5 million participants from 2006 to 2010 from all across the UK with a view to generating a very deep dataset. So, we have collected information on their lifestyle, a whole range of physical measures. We collected biological samples, so we have data on their genomics and other biomarkers. Crucially, because they were recruited 15+ years ago, we have been able to follow up their health over time to find out what happens to their health by linkage to electronic healthcare records. So, we already have 8,000 women with breast cancer in the resource, cardiovascular disease, diabetes, and so on.

But perhaps most importantly, not only does it have great data depth, and data breadth, and the longitudinal aspect, but the data is easily accessible to researchers both from academia and industry, and we already have 18,000 researchers actively using the data as we speak, and over 12,000 publications already generating scientific discoveries from the resource.

Natalie: So, we have got 3 quite different approaches. Recruiting in different ways, different scale, different depth of data collection and analysis, but all very much around this ethos of bringing lots of different datatypes together for research purposes. I wonder if you could talk a little bit about how you might be sort of working together, even though you have got slightly different approaches. Are there things that you are learning from one another, from these different data infrastructures, or how might you be looking in the future to work together to address some of the challenges that might come up from working at scale?

Naomi: So, we talk to each other quite regularly. We have tried to learn from each other about the efficiencies of what to do and what not to do in how to run these large-scale studies efficiently. When you are trying to recruit and engage hundreds of thousands of participants, you need to do things very cost effectively. How to send out web-based questionnaires to individuals, how to collect biological samples, how to make the data easily accessible to researchers so they know exactly what data they are using.

All of that we are learning from each other, and you know it is a work in progress all the time. In particular, how can we standardise our data so that researchers who, say, are using All of Us can then try and replicate their findings in a different population in the UK by using UK Biobank or Our Future Health. So, can we come up with common standards so that
