Using artificial intelligence techniques for early diagnosis of lung cancer in general practice

Update: 2025-05-20

Description

Today, we’re speaking to Professor Martijn Schut, Professor of Translational AI in Laboratory Medicine and Professor Henk CPM van Weert, GP and Emeritus Professor of General Practice, both based at Amsterdam University Medical Center.

Title of paper: Artificial intelligence for early detection of lung cancer in GPs’ clinical notes: a retrospective observational cohort study

Available at: https://doi.org/10.3399/BJGP.2023.0489

In most cancers, the prognosis depends substantially on the stage at the start of therapy. Therefore, many methods have been developed to enhance earlier diagnosis, for example, logistic regression models, biomarkers, and electronic-nose technology (exhaled volatile organic compounds). However, as most patients are referred by their GP, who keeps life-long histories of enlisted patients, general practice files might contain hidden information that could be used for earlier case finding. An algorithm was developed to identify patients with lung cancer 4 months earlier, just by analysing their files. Contrary to other methods, all medical information available in general practice was used.

Transcript

This transcript was generated using AI and has not been reviewed for accuracy. Please be aware it may contain errors or omissions.

Speaker A

00:00:01.600 - 00:00:55.370

Hello and welcome to BJGP Interviews. I'm Nada Khan and I'm one of the associate editors of the journal. Thanks for taking the time today to listen to this podcast.

Today we're speaking to Professor Martijn Schut, who is a professor in translational AI and laboratory medicine, and Professor Henk van Weert, GP and emeritus professor in general practice, who are both based at Amsterdam University Medical Center. We're here to discuss their paper, which is titled Artificial intelligence for early detection of lung cancer in GPs' clinical notes.

So, yeah, it's great to see you both here today. And Martijn, I'll come to you first.

I suppose we know that it's important to try and diagnose cancer early, but could you talk us through what's the potential for artificial intelligence here in terms of identifying cancer earlier based on patient records?

Speaker B

00:00:55.810 - 00:01:52.220

Yeah, that's a very interesting question because the potential kind of like goes hand in hand with the huge amount of interest in AI. And I think there are great opportunities. There are also great challenges.

But talking about the opportunities, especially in the context of the article that we wrote, the main one is on the data side. The digitalization of electronic health records gives great opportunities.

A lot more is digitalized, and that means that we, in our case, have access to free text. And with the advent of the large language models and other new developments in AI, we also have better ways of making use of those data. So those two combined create a really interesting formula for big opportunities for AI in general practice and in healthcare in general.

Speaker A

00:01:52.300 - 00:02:05.960

And you mentioned access to free text records. So, what GPs are typing into the records.

But before we get into the study, can you just briefly describe what is natural language processing and how that can be used in free text records?

Speaker B

00:02:06.760 - 00:03:10.100

So we know that a lot of clinical risk scores work with features of patients, such as their age and their gender or sex. But of course, a lot of information is also written up in an unstructured way, and in our case that is text.

But we can also think of images and audio, and we can access that data in different ways, of which natural language processing is one. It means that we give AI access to this text through, for example, advanced models like the ones ChatGPT uses.

But that's only one extreme of the spectrum, because you could also imagine that we simply search through the text with keywords, and if certain keywords are mentioned, you include that in the information that is available to your model.
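
To illustrate the simple end of that spectrum, here is a minimal Python sketch of keyword matching over a free-text note. The keyword list and the example note are hypothetical and chosen only for illustration; they are not taken from the study, which does not describe its features at this level of detail.

```python
import re

# Hypothetical keyword list, for illustration only (not from the study).
KEYWORDS = ["cough", "haemoptysis", "weight loss", "smoker"]


def keyword_features(note, keywords=KEYWORDS):
    """Return a 0/1 flag per keyword, matching whole words case-insensitively."""
    text = note.lower()
    return {
        kw: int(re.search(r"\b" + re.escape(kw) + r"\b", text) is not None)
        for kw in keywords
    }


# Invented example note, not real patient data.
note = "Persistent cough for 6 weeks. Ex-smoker. No weight loss reported."
features = keyword_features(note)
print(features)
```

Note that the flag for "weight loss" is set even though the note says "No weight loss": plain keyword matching ignores negation and context, which is one reason more advanced language models are attractive for clinical text.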

Speaker A

00:03:10.260 - 00:03:18.820

And Henk, I don't know if you want to comment on just what we know already about clinical scoring systems for early diagnosis of cancer.

Speaker C

00:03:19.140 - 00:04:21.310

The problem with what we already know is that we know things because they have been coded in the past. If you look at the ways to access data, the only way to access data was by using codes.

And the big jump forward is made by using not only codes, but also text, because codes will always be replicating themselves.

By which I mean that a GP who likes to make notes of what he has been speaking about with patients cannot code all the things that he will write down.

So codes will always form a very exquisite extraction of the content of a consultation and will never present us with new information, because codes only exist when the information was already there. Otherwise there would be no codes. So implicitly there will be a replication of what we know when we have to code our things.

Speaker A

00:04:21.899 - 00:04:49.139

Yeah, absolutely.

And I work with a colleague called Sarah Price who's done some research around coding and she's shown in her research that clinical coding can be biased depending on the outcome. So people who have bladder cancer, they're more likely to have codes for hematuria or blood in the urine.

So, yeah, there could be a discrepancy in how clinicians code things rather than write it in the free text.

Speaker C

00:04:49.139 - 00:05:09.160

Yeah, because in the past some marvelous research has been done by Willie Hamilton, for example, and Julia Hippisley-Cox is well known, but they had to use codes. So there was never a jump forward. And I think that now, with the aid of natural language processing, we can make a jump forward.

Speaker A

00:05:09.559 - 00:05:36.620

And the methods that you used here are quite complex, but I'll try to summarize them briefly.

So essentially you analyzed the electronic health records of over half a million Dutch patients and used these natural language processing techniques and machine learning to look back in the records of people diagnosed with cancer. And then you looked to see what data in those records could be used to predict lung cancer.

But is there anything you want to add to that, just for a lay audience, Martijn?

Speaker B

00:05:36.700 - 00:06:20.170

Yeah, one nuance, a small correction on that, is that we don't only look at the patients with cancer; we look at the cases and the controls. We look at both because the AI needs to be able to distinguish the cases from the controls.

I think that's one important distinction because in healthcare, fortunately, we always have to deal with low prevalences. We don't have too many patients compared to the healthy patients. That is part of the complexity of these kinds of models.

I think that is also important to realize when you develop these kinds of models.
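
To make the low-prevalence point concrete, the following Python sketch computes the positive predictive value (the share of flagged patients who truly have the disease) from sensitivity, specificity and prevalence using Bayes' rule. All numbers are invented for illustration and are not results from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive predictions that are true cases (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative values only: the same model applied at two prevalences.
ppv_rare = positive_predictive_value(0.9, 0.9, prevalence=0.01)
ppv_common = positive_predictive_value(0.9, 0.9, prevalence=0.5)

print(f"PPV at 1% prevalence:  {ppv_rare:.3f}")
print(f"PPV at 50% prevalence: {ppv_common:.3f}")
```

With identical sensitivity and specificity, the model's positive predictions are far less reliable when cases are rare, because false positives among the many controls outnumber the true cases.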

Speaker C

00:06:20.810 - 00:06:22.250

May I add something because.

Speaker A

00:06:22.250 - 00:06:22.730

Yes, please.

Speaker C

00:06:22.730 - 00:07:09.230

Because if you look at the scientific side of it, if you develop a prediction model for a cancer, for example, then you have to do that with a logistic regression method. And logistic regressions can contain many variables, but not as many as you can use with the new large language models.

So you can analyze many more variables. That's one point. And the second point is that you can analyze those variables in connection to each other.

That's a great advantage compared to the past. So if you look at the model that we used for this research, I think we used two layers of 100 variables in different relations to each other.

So that gives you 100 times 100 possibilities.
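
The "100 times 100" figure can be read as the number of connections between two layers of 100 units each. The Python sketch below counts those connections, assuming the layers are fully connected; that assumption, and the function name, are illustrative and not confirmed by the interview.

```python
def dense_connections(layer_sizes):
    """Number of weights between each pair of consecutive fully connected layers."""
    return [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]


# Two layers of 100 units each, as mentioned in the interview.
# Full connectivity between them is an assumption made for this sketch.
between_layers = dense_connections([100, 100])
print(between_layers)
```

By contrast, a logistic regression has one weight per input variable, so it cannot represent this many pairwise relations between variables unless interaction terms are added by hand.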

Speaker A

00:07:09.470 - 00:07:14.630

Talk us through what you developed here. Maybe Martijn, you can try to explain.

Speaker B

00:07:14.630 - 00:08:21.700

Yeah, can I start with this: we picked up a signal.

So we developed prediction models taking in all of these, as you said, over half a million patients, all the clinical notes, the consultations that they had, and put it in a prediction model. We pick up a signal; we can make a prediction model that performs well. So that's one.

But the second step is that ideally we would also like to get some information from that model: what do you use to predict, what contributes to a prediction for lung cancer?

And then we come to the nature of the complex methods that we use, which is that they are black boxes. We are not able to open them up and see what is in them.

And that is actually, I
