Dev and Doc: AI For Healthcare Podcast

30 Episodes


Everything you need to know about LLM benchmarks- Turing Test, OpenAI's Healthbench, ARC prize, LM arena

2025-08-22 · 55:19

Wherever there has been AI, there have been benchmarks. From the Turing test, to society-changing benchmarks like MNIST and ImageNet, to modern challenges like the ARC Prize, benchmarks have served a vital purpose: measuring the performance of AI models. But something has shifted in the LLM era. Have benchmarks lost their utility, becoming mere advertisements for big tech? Even seemingly more sophisticated benchmarks like LM Arena can be gamed by tech giants. We also deep dive into healthcare benchmarks like OpenAI's Healthbench (deeply problematic) and Microsoft's AI-DXO orchestrator agent for diagnosis. Where is this all going? How do we make the perfect benchmark? Or is the real work to be done afterwards, in the real world?
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Timestamps
00:00 Intro - The OG benchmarks - Turing test, MNIST, ImageNet
06:40 Are large language model benchmarks similar to humans taking tests?
10:05 Are we testing model capability or production readiness?
12:00 LLM era - data contamination
15:30 LM Arena - The leaderboard illusion paper - how big tech games benchmarks
28:35 Goodhart's law - When a measure becomes a target, it ceases to be a good measure
32:05 Some good benchmarks - games - Pokemon, ARC prize, Minecraft
34:35 Medical benchmarks - OpenAI's Healthbench has some big problems
46:50 Microsoft AI-DXO orchestrator for case reports
Connect with Us
Your Hosts:
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn
🤖 Dev - Zeljko Kraljevic - Twitter
Follow & Subscribe:
YT: https://youtube.com/@DevAndDoc
Spotify: Follow us on Spotify
Apple Podcasts: Listen on Apple Podcasts
Substack: https://aiforhealthcare.substack.com/
For enquiries: 📧 Devanddoc@gmail.com
Production Credits
🎞️ Editor: Dragan Kraljević - Instagram
🎨 Brand & Art: Ana Grigorovici - Behance

#28 AI agents explained - Manus AI, computer control, Agentic workflows (healthcare)

2025-05-09 · 01:00:48

AI agents are here, but how did we get here in the first place? How do we build and leverage AI agents for high-stakes domains like healthcare? In this episode of Dev and Doc, we go deep into the forest that is AI agents and computer control, starting from the "caveman" era of LLMs discovering tools and moving on to cultivating intelligent models and agentic workflows. We dissect everyday agents like Manus AI, and deep dive into how, where and when AI agents should be used. Are these agents hype or hope? Is this actually the second DeepSeek moment?
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Episode Timestamps:
00:00 Highlight
3:13 start / intro
5:20 LLMs' caveman era - tool usage
6:46 Agents have autonomy and interact with environment
11:15 workflows and agentic flows
15:30 when should you be using an agent?
24:27 vibe coding is like driving a car
29:07 Demo - Manus gathering financial trends, computer control
35:55 Demo - Manus AI website creation for Autism Assessment
49:05 computer control factions - Freedom vs Process automation
55:00 Autism website testing
59:13 summary + end
Hosts:
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr
Find us on:
YT - https://youtube.com/@DevAndDoc
Spotify - https://podcasters.spotify.com/pod/show/devanddoc
Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
Substack - https://aiforhealthcare.substack.com/
For enquiries: 📧 Devanddoc@gmail.com
Credits:
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d

#27 Exploring Claude Sonnet 3.7 for healthcare

2025-02-26 · 58:03

Can Claude perform a range of complex clinical tasks? Dev and Doc are here to investigate.
Claude Sonnet 3.7 was released less than 48 hours ago. The model is highly intelligent and is one of the best we have seen in recent memory. It definitely passes the vibe check.
We give some amazing examples of coding with Claude using few-shot prompts, cover technical and clinical evaluations, and share our first thoughts. We even tested Claude on taking a patient history!
NB - PLEASE don't do this at home. Obviously this is a demo, and we do not in any way condone or recommend using an LLM as your doctor or healthcare provider; we are just demonstrating what the future could be. If you are sick, please seek a medical professional.
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
TIMESTAMPS
00:00 start + highlights
01:54 Introduction
08:54 Benchmarks, state of the art
14:44 guardrails, refusals, AI safety and catastrophic risks
22:36 show and tell - great for coding and making video games!
26:54 example hospital runner
30:17 Medical use cases - clinical coding, biomedical entity extraction
37:04 only medical example in Claude model card - still hallucinating citations
38:37 making an anatomy app
40:10 forecasting clinical diagnoses
43:36 taking a medical history from a patient
53:33 wrap up
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - linkedin.com/in/dr-joshua-auyeung
🤖 Dev - Zeljko Kraljevic - twitter.com/zeljkokr
YT: youtube.com/@DevAndDoc
Spotify: podcasters.spotify.com/pod/show/devanddoc
Apple: podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
Substack: aiforhealthcare.substack.com
For enquiries - 📧 Devanddoc@gmail.com
🎞️ Editor - Dragan Kraljević instagram.com/dragan_kraljevic
🎨 Brand design - Ana Grigorovici behance.net/anagrigorovici027d

#26 Is it still worth doing a PhD in 2025? (Computer Science / Machine Learning)

2025-02-21 · 56:41

Is it still worth doing a PhD in 2025? Is the academic system broken in this publish-or-perish landscape? When is a PhD not worth pursuing? About this Episode In this Dev and Doc episode, Zeljko (now associate professor!) and Josh (doctor, PhD drop out) talk about the good and the bad of PhD life. They provide insight into the academic world with a focus on computer science and machine learning. 👋 Connect With Us! Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 🎙️ Hosts 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn 🤖 Dev - Zeljko Kraljevic - Twitter ⏳ Timestamps 00:00 - Start and highlight 01:42 - Intro 03:11 - What made you pursue PhD in the first place 05:05 - Industry or PhD first 10:00 - Positives - Moonshots 17:03 - Positives - Access to world experts and collaboration 20:55 - Positives - Open source and open science 24:49 - Positives - A good environment enables a smooth PhD 27:04 - Negatives - You are a one-man show 31:33 - Negatives - Publish or Perish 45:44 - Bring your research closer to the audience through blogs and other media, journals are legacy media 51:20 - Verdict - Is a PhD still worth it in 2025? 📢 Follow Us LinkedIn Newsletter YouTube Spotify Apple Podcasts Substack 📧 Contact Us For enquiries - devanddoc@gmail.com 🎞️ Video Production 🎬 Editor - Dragan Kraljević - Instagram 🎨 Brand Design & Art Direction - Ana Grigorovici - Behance

#25 Testing Deepseek R1 on Complex Medical Tasks. Here's what we found. (GRPO explainer)

2025-02-07 · 01:20:45

Dev and Doc put Deepseek R1 to the test in a technical and clinical deep dive. 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 👨🏻‍⚕️Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-au-yeung/ 🤖Dev - Zeljko Kraljevic https://twitter.com/zeljkokr TIMESTAMPS 00:00 Highlights 04:36 Intro 08:29 response from OpenAI, Anthropic- model training costs, tightening restrictions on China, pricing wars 13:13 what an open-source deepseek means for the world. 15:38 Sam altman and Dario amodei feeling the pressure 23:10 TECHNICAL deep dive - RLHF, ppo, dpo 37:08 GRPO, R1s secret sauce 45:02 the aha moment, learning like a human? 50:25 deepseek R1 training and controversy 59:08 deepseek healthcare evaluation - Ethnic Bias 1:06:17 The diagnostic acid test (fail) 1:12:46 Coding clinical data / Medical billing (shout out SNOMED) LinkedIn Newsletter https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817 YT - https://youtube.com/@DevAndDoc Spotify - https://podcasters.spotify.com/pod/show/devanddoc Apple- https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120 Substack- https://aiforhealthcare.substack.com/ For enquiries - 📧Devanddoc@gmail.com 🎞️ Editor- Dragan Kraljević https://www.instagram.com/dragan_kraljevic/ 🎨Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d

#24 Significantly advancing LLMs with RAG (Google's Gemini 2.0, Deep Research, notebookLM)

2025-01-10 · 57:46

Dev and Doc - Latest News
It's 2025, and Dev and Doc cover the latest news, including Google's Deep Research and NotebookLM, DeepMind's Promptbreeder, and Anthropic's new RAG approach. We also go through what retrieval-augmented generation (RAG) is, and how this technique is advancing LLM performance.
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Meet the Team
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn
🤖 Dev - Zeljko Kraljevic - Twitter
Where to Follow Us
LinkedIn Newsletter, YouTube, Spotify, Apple Podcasts, Substack
Contact Us
📧 For enquiries - Devanddoc@gmail.com
Credits
🎞️ Editor - Dragan Kraljević - Instagram
🎨 Brand Design and Art Direction - Ana Grigorovici - Behance
Episode Timeline
00:00 Highlights
00:53 News - NotebookLM, OpenAI 12 days of Christmas
07:44 Change in the meta - post-training
11:34 Optimizing prompts with DeepMind Promptbreeder
13:20 Is OpenAI losing their lead against Google?
16:45 Deep Research vs Perplexity
24:18 AMIE and oncology
26:00 Deep Research results
30:20 RAG intro
33:14 Second pass RAG
36:20 RAG didn't take off
38:40 Wikichat
39:16 How do we improve on RAG?
41:11 Semantic/topic chunking, cross-encoders, agentic RAG
51:15 Google's Problem Decomposition
53:32 Anthropic's Contextual Retrieval Processing
56:07 Summary and wrap up
References
Cross Encoders
Wikichat
Google's Problem Decomposition
Anthropic's Contextual Retrieval
Google AMIE in Oncology
DeepMind's Promptbreeder

#23 Can OpenAI's GPT o1 solve complex medical problems?

2024-09-20 · 39:44

First Thoughts and Preliminary Insights into OpenAI's GPT o1 Strawberry in the Medical Domain With some expected and unexpected findings, we have a "bake off" between o1 and Doc to demonstrate how o1 fares with tricky medical scenarios. Disclaimer Obviously, don't use AI to diagnose or treat your medical problems. If you are unwell, please seek a medical professional (AI isn't good enough just yet :)). 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) Contributors • 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ • 🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr Follow Us • https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817 • https://youtube.com/@DevAndDoc • https://podcasters.spotify.com/pod/show/devanddoc • https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120 • https://aiforhealthcare.substack.com/ For enquiries - 📧 mailto:Devanddoc@gmail.com Team • 🎞️ Editor - Dragan Kraljević - https://www.instagram.com/dragan_kraljevic/ • 🎨 Brand Design and Art Direction - Ana Grigorovici - https://www.behance.net/anagrigorovici027d Timestamps • 00:00 - Start + Highlights • 01:28 - Intro, What is GPT o1? • 05:18 - What is "Reasoning" in o1? • 12:38 - Benchmarks: o1's Successes and Failures • 24:07 - o1 and Doctor Bake Off! • 24:21 - The Pregnancy Acid Test for LLMs • 26:23 - Clinical Coding • 30:06 - Tricky Patient Scenarios • 32:25 - Opioid Dose Conversions

#22 Explaining Explainable AI (for healthcare) with Dr Annabelle Painter (RSM digital health section Podcast)

2024-08-15 · 58:40

Dev and Doc is joined by guest Annabelle Painter, doctor, CMO, and podcaster for the Royal Society of Medicine Digital Health Podcast. We deep dive into explainability and interpretability with concrete healthcare examples. Check out Dr. Painter's Podcast here, she has some amazing guests and great insights into AI in healthcare! - https://spotify.link/pzSgxmpD5yb 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ 🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr LinkedIn Newsletter YouTube Channel Spotify Apple Podcasts Substack For enquiries - 📧 Devanddoc@gmail.com 🎞️ Editor - Dragan Kraljević - https://www.instagram.com/dragan_kraljevic/ 🎨 Brand design and art direction - Ana Grigorovici - https://www.behance.net/anagrigorovici027d Timestamps: 00:00 - Start + highlights 03:47 - Intro 08:16 - Does all AI in healthcare need to be explainable? 15:56 - History and explanation of Explainable/Interpretable AI 20:43 - Gradient-based saliency and heat maps 24:14 - LIME - Local Interpretable Model-agnostic Explanations 30:09 - Nonsensical correlations - When explainability goes wrong 33:57 - Modern explainability - Anthropic 37:15 - Comparing LLMs with the human brain 40:02 - Clinician-AI interaction 47:11 - Where is this all going? Aligning models to ground truth and teaching them to say "I don't know" References: Fun Examples of when models go wrong - Nonsensical correlations Mechanistic interpretability Anthropic - Mapping the mind of language models Limitations of current AI explainability approaches Explainability does not improve automation bias in radiologists

#21 Foundational Models in Digital Pathology: Enhancing Cancer detection and outcomes

2024-08-02 · 01:01:43

An explainer on foundation models for pathology. From Microsoft's Gigapath to Owkin's H-optimus-0, every company, big or small, is building pathology AI models. In this episode, Doc talks to Sean M. Hacking, assistant professor in Pathology at NYU Grossman School of Medicine, and Özgür Şahin, particle physicist at CERN. Together they are building the infrastructure for digital pathology that in turn allows the training of pathology foundation models. Find out more at https://www.pathonn.com/.
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817
https://youtube.com/@DevAndDoc
https://podcasters.spotify.com/pod/show/devanddoc
https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
https://aiforhealthcare.substack.com/
👨🏻‍⚕️ Doc - https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - https://twitter.com/zeljkokr
🎞️ Editor - https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - https://www.behance.net/anagrigorovici027d
00:00 Introduction
03:28 Why pathology
06:42 Transporting slides is a logistical nightmare
13:20 When particle physics and AI pathology collide
17:55 AI digital pathology - Patch-based architecture and sparse topologies
27:09 Is there enough pathology data?
29:11 Microsoft and Gigapath, transformer models for pathology
33:55 Pathology models clinical applications
43:18 Staining applications of AI
49:22 Building a digital pathology startup - Patho-NN
57:36 Using AI to see tumor grading features that humans can't see
References:
https://www.nature.com/articles/s41586-024-07441-w
https://www.microsoft.com/en-us/research/blog/gigapath-whole-slide-foundation-model-for-digital-pathology/
https://www.nature.com/articles/s41379-021-00919-2

#20 How to build a successful healthTech/ BioTech start-up (2024 roadmap) - Derrick Khor

2024-07-18 · 01:08:33

Doc talks to Dr Derrick Khor - Cancer Doctor, HealthTech Consultant and LinkedIn Guru. We share Derrick's insights from consulting for over 120 companies and a step-by-step guide on how to build a successful healthcare company.
You can find more of Derrick and his helpful guides - https://adoptadoc.com/resources/
Profile - https://www.linkedin.com/in/derrick-khor/
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Dev&Doc is a podcast where doctors and developers deep dive into the potential of AI in healthcare.
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung
🤖 Dev - Zeljko Kraljevic
LinkedIn Newsletter, YouTube, Spotify, Apple, Substack
For enquiries - 📧 Devanddoc@gmail.com
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
Timestamps
00:00 Highlights and intro
3:01 Start
5:10 getting into health tech
8:03 lack of clinicians in start-ups
15:07 Derrick's own healthtech journey to consulting
23:37 Start-ups and failure
27:35 the start-up road map
32:16 are you a medical device (SaMD)? Intended use
40:55 clinical evidence generation
48:16 go to market, NHS DTAC
57:57 power of networking, social media, LinkedIn
1:02:43 top UK health tech companies to look out for

#19 Tracking health with technology and AI - demystifying digital biomarkers

2024-07-04 · 01:03:36

Dev and Doc deconstruct digital biomarkers! This is a fascinating and nascent field in the world of medicine. How have biomarkers transformed the way we practice medicine, and how will AI, wearables, sensors and digital fingerprints transform the way we practice in the future?
Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Find us on YouTube - @Dev and Doc
📙 Substack: https://aiforhealthcare.substack.com/
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - Zeljko Kraljevic https://twitter.com/zeljkokr
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
Timestamps
00:00 highlights
01:50 intro
02:40 how biomarkers evolved in the last century
6:02 what is the definition of a biomarker
10:00 biomarkers can be very biased depending on who you are testing
12:31 when does a test become a biomarker
17:30 the digital age and measurements - AI vision in retina scans, digital stethoscopes
23:50 what is an "analog" biomarker vs digital biomarker?
30:10 where do biomarkers fail in evidence-based medicine?
34:55 Biomarkers are pretty poor for mental health
47:57 can AI predict depression better than humans?
51:21 Digital biomarkers to detect movement disorders
01:00:04 this can change clinical trials forever
Refs
- variable definitions of biomarkers https://informatics.bmj.com/content/31/1/e100914
- digital biomarkers convergence nature paper https://www.nature.com/articles/s41746-022-00583-z
- digital stethoscope for heart failure https://www.thelancet.com/pdfs/journals/landig/PIIS2589-7500(21)00256-9.pdf
- touch screen typing depression paper https://www.nature.com/articles/s41746-022-00583-z
- Duchenne body suit biomarker https://www.nature.com/articles/s41591-022-02045-1#Sec9
- Friedreich's ataxia body suit https://www.nature.com/articles/s41591-022-02159-6?fromPaywallRec=false#Sec9

#18 Keith Grimes - Startups and doctors, HealthTech consulting, Babylon's demise, Leadership theory

2024-05-30 · 01:09:33

Dr Keith Grimes is a HealthTech consultant and General Practitioner working with companies to transform clinical ideas into something impactful. He worked as the digital health director in Babylon Health prior to its demise, and currently runs his own consulting firm, Curistica. This is one not to miss!
References
HealthTech consulting at Curistica www.curistica.com
Prof Amanda Goodall on leadership theory https://amandagoodall.com/
For those interested in Leadership opportunities:
- Faculty of medical leadership and management https://www.fmlm.ac.uk/
- Bite labs https://www.bitelabs.io/
Dev&Doc is a podcast where doctors and developers deep dive into the potential of AI in healthcare.
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - Zeljko Kraljevic https://twitter.com/zeljkokr
LinkedIn Newsletter https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817
YouTube https://youtube.com/@DevAndDoc
Spotify https://podcasters.spotify.com/pod/show/devanddoc
Apple https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
Substack https://aiforhealthcare.substack.com/
For enquiries - 📧 Devanddoc@gmail.com
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
Timestamps
00:00 start
1:10 Career career career - GP, Babylon Health, digital consultancy
6:40 working as a rural GP in Scotland
9:21 time is the biggest factor of clinical impact
12:11 finding impact through data
21:29 leading by example
23:52 Should doctors be leading healthtech businesses?
30:10 why do healthtech start-ups not have clinicians earlier?
36:30 Babylon failure - importance of having clinical influence at the top
43:55 experience being grilled on BBC Newsnight
49:45 lessons learnt from the downfall of Babylon
52:25 6 values of consulting firm Curistica
55:51 common problems in start-ups
59:36 how AI will change the healthcare landscape

#17 How to build a clinically safe Large Language Model - Hippocratic AI, Llama3, Biollama

2024-05-09 · 43:24

How do we reach the holy grail of a clinically safe LLM for healthcare? Dev and Doc are back to discuss news about Meta's Llama model and the potential of healthcare LLMs fine-tuned on top of it, like BioLlama. We discuss the key steps in building a clinically safe LLM for healthcare and how this was pursued by Hippocratic AI's latest model, Polaris.
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - Zeljko Kraljevic https://twitter.com/zeljkokr
The podcast 🎙️
🔊 Spotify: https://podcasters.spotify.com/pod/show/devanddoc
📙 Substack: https://aiforhealthcare.substack.com/
Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
References
Hippocratic AI LLM - https://arxiv.org/pdf/2403.13313
BioLLM tweet - https://twitter.com/aadityaura/status/1783662626901528803
Foresight Lancet paper - https://www.thelancet.com/journals/landig/article/PIIS2589-7500(24)00025-6/fulltext
Language processing units - https://wow.groq.com/lpu-inference-engine/
Timestamps
00:00 Start
01:10 Intro - Llama 3, a ChatGPT-level model in our hands
06:53 Language processing units to run LLMs
09:42 BioLLM for medical question answering
11:13 quality and size of dataset, using YouTube transcripts
12:41 Question-and-answer pairs do not reflect the real world - holy grail of healthcare LLM
18:43 Dev has beef with Hippocratic AI
20:25 Step 1 Training a clinical foundation model from scratch
22:43 Step 2 Instruction tuning with multi-turn simulated conversation
24:15 Step 3 Training the model to guide tangential conversations
27:42 Focusing on the hospital back office and specialist nurse phone calls
33:02 Evaluating Polaris - clinical safety LLM, bedside manner, medical safety advice

#16 Dev&Doc x Rewired - LLMs, Clinical foundation models and automating administrative tasks (live)

2024-03-21 · 46:59

In this special episode we share a recording of our live podcast episode at the Rewired UK conference, where NHS, industry and policy makers unite. We discuss current LLMs from a technical and practical perspective, and dive into how to build foundation models for the National Health Service and our experiences of doing so. We were also privileged to be joined by Dr. Wai Keong Wong, head of digital at Cambridge University Hospital NHS trust, to discuss how to evaluate AI products and how to automate administrative tasks for clinicians with ambient clinical documentation.
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - Zeljko Kraljevic https://twitter.com/zeljkokr
The podcast 🎙️
🔊 Spotify: https://podcasters.spotify.com/pod/show/devanddoc
📙 Substack: https://aiforhealthcare.substack.com/
Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
00:00 intro
02:05 AI vs doctors - are language models ready to replace doctors?
05:22 the transformer models and attention
08:51 human labour for reinforcement learning
11:00 building the NHS LLM, key concepts
13:55 Foresight GPT - predicting the next clinical event in a patient timeline
16:29 is text enough?
17:19 £3.8B investment into NHS digitisation and admin automation - ambient clinical documentation
20:14 how do you evaluate AI products for the NHS?
26:24 how do you vet the tech companies and future-proof your purchase?
27:23 do clinicians need more digital health education?
28:41 transparency of AI models and benchmarks
31:30 question - EHR data created by AI leads to homogenisation and errors
34:03 question - training on structured vs unstructured EHR data
38:06 question - LLMs as a brain. How do we give it a body?
41:05 framework for AI deployment

#15 The death of Prompt Engineering

2024-02-29 · 34:52

What do Prompt engineers have in common with telephone operators in the 1870s? Spoiler - they're both dying professions 👨🏻‍⚕️Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ 🤖Dev - Zeljko Kraljevic https://twitter.com/zeljkokr The podcast 🎙️ 🔊Spotify: https://podcasters.spotify.com/pod/show/devanddoc 📙Substack: https://aiforhealthcare.substack.com/ Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 🎞️ Editor- Dragan Kraljević https://www.instagram.com/dragan_kraljevic/ 🎨Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d 00:00 Highlights 01:10 Intro - where did prompt engineering go wrong? 4:10 what is prompt engineering fundamentally? 10:54 LLMs training data reflects prompt engineering 12:32 prompts are model dependent 14:02 prompts that make you think 18:26 combining expert and generalist medical models for doctors 19:49 Diagnostic reasoning prompts, is it interpretable? 26:55 can we find prompts more elegantly/ systematically ? 28:42 Will prompts become obsolete? Models that self discover prompts 31:09 Telephone operators and Prompt engineers - death of a profession Refs Prompt "hacks" (oh man) - https://learnprompting.org/docs/intermediate/chain_of_thought Diagnostic prompt interpretability paper - https://www.nature.com/articles/s41746-024-01010-1 self - discover https://arxiv.org/abs/2402.03620 telephone operators - https://www.history.com/news/rise-fall-telephone-switchboard-operators

#14 Aligning AI models for healthcare | Understanding Reinforcement Learning from Human Feedback (RLHF)

2024-02-14 · 42:01

How do we align AI models for healthcare? 👨‍⚕️ And importantly, given the moral codes and ethics that we practise every day, how does an LLM deal with ethical scenarios like the trolley problem? This is a fascinating topic and one we spend a lot of time thinking about. In this episode of Dev and Doc, Zeljko Kraljevic and I cover the latest topics around reinforcement learning, its benefits, and where it can go wrong. We also discuss different RL methods, including the algorithms used to train ChatGPT (RLHF).
Dev and Doc is a Podcast where developers and doctors join forces to deep dive into AI in healthcare. Together, we can build models that matter.
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua...
🤖 Dev - Zeljko Kraljevic https://twitter.com/zeljkokr
The podcast 🎙️
🔊 Spotify: https://open.spotify.com/show/3QO5Lr3...
📙 Substack: https://aiforhealthcare.substack.com/
Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kral...
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovic...
00:00 Highlights
01:27 start
4:38 aligning ethics of AI models
7:04 doctors' ethical choices daily
8:00 RLHF and AI training methods
16:29 reinforcement learning
19:35 Preference model - rewarding models correctly can make or break the success
27:05 exploiting reward function, model degradation (and how to fix it)
Ref
AI intro paper - https://pn.bmj.com/content/23/6/476
OpenAI RLHF paper - https://arxiv.org/abs/1909.08593
War and peace of LLMs! - https://arxiv.org/abs/2311.17227

#13 Research begins when hype ends - Doc's adventure, LlaMa3 , Code LlaMa, Gemini Ultra

2024-02-01 · 18:04

In this episode Doc goes on an adventure to chair an LLM/ generative AI conference session and reflects on his experience. Dev and Doc also discuss big news on meta's Llama3 and Code LlaMa. Dev and Doc is a Podcast where developers and doctors join forces to deep dive into AI in healthcare. Together, we can build models that matter. 👨🏻‍⚕️Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ 🤖Dev - Zeljko Kraljevic https://twitter.com/zeljkokr The podcast 🎙️ 🔊Spotify: https://open.spotify.com/show/3QO5Lr3w4Rd6lqwlfKDaB7?si=e7915d844994403e 📙Substack: https://aiforhealthcare.substack.com/ Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 🎞️ Editor- Dragan Kraljević https://www.instagram.com/dragan_kraljevic/ 🎨Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d 00:00 Highlight 00:36 Start 1:57 Are researchers just using Generative AI to get presentations /publications? 6:18 Hype cycles , lack of real world clinical studies using LLMs 8:08 LlaMa3 , Code LlaMa announcement and insights 13:30 Google bard / Gemini ultra second on leaderboard 17:30 wrap up and end

#12 2024 AI Predictions : Ambient clinical intelligence, language models as commodities, GPT-5 and AGI

2024-01-18 · 46:15

Dev And Doc are back ! Here we break down the biggest highlights of 2023, and AI predictions for 2024. Dev and Doc is a Podcast where developers and doctors join forces to deep dive into AI in healthcare. Together, we can build models that matter. 👨🏻‍⚕️Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ 🤖Dev - Zeljko Kraljevic https://twitter.com/zeljkokr 00:00 start 01:01 Intro, Advancing LLMs in healthcare 07:10 Ambient note documentation in Medicine 10:52 Meta LLaMa are the good guys ? 14:40 GPT store 19:40 Overhyped Google Gemini model 26:17 AGI again 29:05 6 big predictions Open source vs Closed source models 38:55 AI in healthcare- LLM clinical trials , AI drug discovery 42:05 end References GPT store- https://openai.com/blog/introducing-the-gpt-store Hugging face predictions- https://twitter.com/ClementDelangue/status/1729158744762626310 AI drug discovery (blog post to paper) - https://news.mit.edu/2023/using-ai-mit-researchers-identify-antibiotic-candidates-1220 Google AMIE blog - https://blog.research.google/2024/01/amie-research-ai-system-for-diagnostic_12.html The podcast 🎙️ 🔊Spotify: https://open.spotify.com/show/3QO5Lr3w4Rd6lqwlfKDaB7?si=e7915d844994403e 📙Substack: https://aiforhealthcare.substack.com/ Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 🎞️ Editor- Dragan Kraljević https://www.instagram.com/dragan_kraljevic/ 🎨Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d

#11 The AI race to automate clinical coding

2023-12-14 · 28:01

We have conversations between doctors and developers exploring the potential of AI in healthcare. Josh is a neurologist in training in the NHS, and an AI researcher at St Thomas' Hospital and King's College Hospital. He is also a PhD student at King's College London. Zeljko is an AI researcher and PhD student at King's College London, as well as CTO of a natural language processing company.

#10 The building blocks of AGI - Google's Gemini, OpenAI's Q*

2023-12-07 · 28:37

In this episode, Dev and Doc sit down to discuss artificial general intelligence from the perspective of a neurologist and computer scientist. We dive into the current developments around AGI , the 2 controversial schools of thought, LLMs and neuroscience, and give hot takes about whether we will ever reach AGI. Dev and Doc is a Podcast where developers and doctors join forces to deep dive into AI in healthcare. Together, we can build models that matter. 👨🏻‍⚕️Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ 🤖Dev - Zeljko Kraljevic https://twitter.com/zeljkokr Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 00:00 start 01:05 intro 03:46 two camps of AGI - Yann Lecun vs Geoffrey Hinton, Architecture vs Data 07:47 Do emergent capabilities of LLMs pose a threat to humanity? 08:45 Intelligence and AGI - neuroscience and computer science approach 16:59 LLMs vs the human brain 24:16 Do AIs need a human touch? - Intrinsic personalities, temperaments, motivations, joy and reward The podcast 🎙️ 🔊Spotify: https://open.spotify.com/show/3QO5Lr3w4Rd6lqwlfKDaB7?si=e7915d844994403e 📙Substack: https://aiforhealthcare.substack.com/ 🎞️ Editor- Dragan Kraljević https://www.instagram.com/dragan_kraljevic/ 🎨Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
