Transforming public health with unstructured data and NLP in FDA's Sentinel Initiative

Update: 2024-07-23
Description

What is the MOSAIC-NLP project around structured and unstructured EHR data? Why is structured data not really enough for drug safety studies? And to what degree is NLP speeding up access to data and research results? We will learn all that and more in this episode of Research in Action with Dr. Darren Toh, Professor at Harvard Medical School and Principal Investigator at Sentinel Operations Center.

www.oracle.com/health

www.oracle.com/life

 www.sentinelinitiative.org

--------------------------------------------------------

Episode Transcript:

00;00;00;00 - 00;00;26;14
What is the MOSAIC-NLP project around structured and unstructured data? Why is structured data not really enough for drug safety studies? And to what degree is NLP speeding up access to data and research results? We'll find all that out and more on this episode of Research in Action. Hello and welcome to Research in Action, brought to you by Oracle Life Sciences.
 
00;00;26;14 - 00;00;50;14
I'm Mike Stiles. And today our guest is Dr. Darren Toh, professor at Harvard Medical School and principal investigator at the Sentinel Operations Center. He's got a lot of expertise in pharmacoepidemiology as well as comparative effectiveness research and real-world data. So, Darren, really glad to have you with us today. Thank you. My pleasure to be here. Well, tell us how you wound up where you are today.
 
00;00;50;14 - 00;01;26;22
What attracted you in the beginning to public health? Good question. So I trained in pharmacy originally, and I got my master's degree in Pharmaceutical Outcomes Research at the University of Illinois at Chicago. And that's where I first learned about a field called pharmacoepidemiology, which was very interesting to me because I like to solve problems with methods and data, and pharmacoepidemiology
 
00;01;26;22 - 00;02;00;29
seemed to be able to teach me how to do that. So I got into the program at the Harvard School of Public Health, and when I was finishing up, I was deciding between staying in academia and going somewhere and getting a real job. And that's when I found out about an opportunity within my current organization, and I'd heard great things about this organization.
 
00;02;00;29 - 00;02;29;26
So I thought I would give it a try. And the timing turned out to be perfect because when I joined, our group was responding to a request for proposal for what was called the Mini-Sentinel pilot, which ultimately became the Sentinel System that we have today. So I've been involved in the Sentinel System since the very beginning, or even before it began.
 
00;02;29;28 - 00;03;02;25
And for the past 15 years I've been with the system and the program because I really like its public health mission, and I'm also very drawn to the dedication of FDA, our partners and my colleagues to make this a successful program. Well, so now here you are, a principal investigator. What exactly is the Sentinel Operations Center? What's the mission there and what part do you specifically play in it?
 
00;03;02;27 - 00;03;52;26
Sentinel is a pretty unique system because it is a congressionally mandated system. So Congress passed what is called the FDA Amendments Act in 2007. And within that Act, Congress asked FDA to create a new program to complement FDA's existing systems to monitor medical product safety. More specifically, Congress asked FDA to create a post-market risk identification and analysis system that would use data from multiple sources, covering at least 100 million lives, to look at the safety of medical products after they are approved and marketed.
 
00;03;52;28 - 00;04;33;07
So in response to this congressional mandate, FDA launched what is called the Sentinel Initiative in 2008, and in 2009, as I mentioned, FDA issued its request for proposal to launch the Mini-Sentinel pilot program, and the program grew into the Sentinel System that we have today. So as for my involvement, it sort of grew over time. So when I joined, as I mentioned, we were responding to this request for proposal and we were very lucky to be awarded the contract.
 
00;04;33;09 - 00;05;04;05
So when it was starting, I served as one of the many epidemiologists on the team and I led several studies, and I gradually took on more leadership responsibility and became the principal investigator of the Sentinel Operations Center in 2022. So I've been very fortunate to have a team of very professional and very dedicated colleagues within the operations center.
 
00;05;04;05 - 00;05;27;26
So on a day-to-day basis, we work with FDA to make sure that we can help them answer the questions they would like to get addressed. And we also work with our partners to make sure that they have the resources that they need to answer the questions for FDA. And most of the time I'm just the cheerleader in chief, just cheering on my colleagues and our collaborators.
 
00;05;27;28 - 00;06;11;23
Now that's great. And then specifically, there's the MOSAIC-NLP project that you're involved with. What is that trying to achieve and what are the collaborations being leveraged to get that done? So the Sentinel System has always had access to medical claims data and electronic health record data, or EHR data. One of the main goals for the current Sentinel System is to incorporate even more data, both structured and unstructured, into the Sentinel System and to combine it with advanced analytic methods so that FDA can answer even more regulatory questions.
 
00;06;11;25 - 00;06;40;09
So the MOSAIC-NLP project was one of the projects that FDA funded to accomplish this goal. So the main goal of this project is to demonstrate how billing claims and data from multiple sources, when combined with advanced machine learning and natural language processing methods, could be used to extract useful information from unstructured clinical data to perform a more robust drug safety assessment.
 
00;06;40;11 - 00;07;21;18
When we tried to launch this project, we decided that we would issue our own request for proposal. So there was an open and competitive process, and Oracle, together with their collaborators, were selected to lead this project. So I want to talk in broad or general terms right now about data sharing, the standards and practices around that. It kind of feels silly for anyone to say it's not needed, that we can get a comprehensive view and analysis of diseases and how they're impacting the population without it.
 
00;07;21;20 - 00;07;46;15
NIH is on board. It updated the DMS policy to promote data sharing. You know, the FDA obviously is leaning into this. So is data sharing now happening and advancing research as expected, or are there still hang-ups? So I think we are making good progress. So I think the good news is data are just being accrued at an unprecedented rate.
 
00;07;46;17 - 00;08;28;21
So there is just so much data now for us to potentially access and analyze. There's always this concern about proper safeguarding of individual privacy. And through our work, we also became very appreciative of other considerations, for example, the fiduciary responsibilities of the delivery systems and payers to protect patient data and make sure that they are used properly. So you mentioned the recent changes, including in the NIH Data Management and Sharing Policy, which I think are moving us in the right direction.
 
00;08;28;26 - 00;08;56;23
But if you look closer at the NIH policy, it makes special considerations for proprietary data. So I would say that we have made some progress, but access to proprietary data remains very challenging. And the NIH policy doesn't actually fully resolve that yet. When you think about the people who do make that argument for limited data sharing, they do mostly talk about what you just said about patient privacy.
 
00;08;56;23 - 00;09;25;20
And proprietary data. Pharma is especially sensitive to that, I would imagine. So how do we incentivize the reluctant? How can we ease their risks and concerns, or can we? Yeah, it's a tough question. I think that this requires a multi-pronged approach and I can only comment on some aspects of this. So I would say that at least based on our experience, the willingness or ability to share data often depends on the purpose.
 
00;09;25;23 - 00;09;55;29
That is, why do we need the data? Many data partners participate in Sentinel because of its public health mission. Another consideration is how the data would be used. Again, is there proper safeguarding of patient privacy and institutional interests? There are other ways to share data. For example, instead of asking the data to come to us, we can send the analysis to where the data is.
 
00;09;56;06 - 00;10;34;22
And that is actually the principle followed by federated systems like Sentinel. So we don't pool the data centrally. We send an analysis to the data partners and only get back what we need. And it's usually in a summary-level format. So that actually encourages more data sharing instead of less sharing. I would say that recent advances in some domains, such as tokenization and encryption, might also reduce some concerns about data sharing and patient privacy. In academic settings,
 
00;10;34;29 - 00;11;24;26
we've been talking a lot about ways, for example, to recognize individuals who collect the data, and people have proposed to offer them authorship or proper acknowledgment if they are willing to share their data. But that is not suffici


Oracle Corporation