Digital Pathology Podcast

181: Can AI Read Clinical Text, Tissue, and Costs Better Than We Can?

Aleksandra Zuraw, DVM, PhD Episode 181


What happens when artificial intelligence moves beyond images and begins interpreting clinical notes, kidney biopsies, multimodal cancer data, and even healthcare costs?

In this episode, I open the year by exploring four recent studies that show how AI is expanding across the full spectrum of medical data. From large language models (LLMs) reading unstructured clinical text to computational pathology supporting rare kidney disease diagnosis, multimodal cancer prediction, and cost-effectiveness modeling in oncology, this session connects innovation with real-world clinical impact.

Across all discussions, one theme is clear: progress depends not just on performance, but on integration, validation, interpretability, and trust.

HIGHLIGHTS:

00:00–05:30 | Welcome & 2026 Outlook
New year reflections, global community check-in, and upcoming Digital Pathology Place initiatives.

05:30–16:00 | LLMs for Clinical Phenotyping
How GPT-4 and NLP automate phenotyping from free-text EHR notes in Crohn’s disease, reducing manual chart review while matching expert performance.

16:00–23:30 | AI Screening for Fabry Nephropathy
A computational pathology pipeline identifies foamy podocytes on renal biopsies and introduces a quantitative Zebra score to support nephropathologists.

23:30–29:30 | Is AI Cost-Effective in Oncology?
A Markov model evaluates AI-based response prediction in locally advanced rectal cancer, highlighting when AI delivers value—and when it does not.

29:30–38:30 | LLM-Guided Arbitration in Multimodal AI
A multi-expert deep learning framework uses large language models to resolve disagreement between AI models, improving transparency and robustness.

38:30–44:30 | Real-World AI & Cautionary Notes
Ambient clinical scribing in practice, AI hallucinated citations, and why guardrails remain essential.

KEY TAKEAWAYS

• LLMs can extract meaningful clinical phenotypes from narrative notes at scale
• AI can support rare disease diagnosis without replacing expert judgment
• Economic value matters as much as technical performance
• Explainability and arbitration are becoming critical in multimodal AI systems
• Human oversight remains central to responsible adoption


Get the "Digital Pathology 101" FREE E-book and join us!

TRANSCRIPT:


00:00:03 - 00:00:39

Welcome, trailblazers. Happy 2026. Welcome to the first DigiPath Digest of 2026. This is number 34, and I'm proud of us. Let's do this. I'm going to say hi in the chat, because we have some computer trouble. Let's see if I can say hi. If you can see me and hear me (and I see you joining already), let me know in the chat.



00:00:34 - 00:01:06

Just say hi in the chat and say where you're tuning in from. What time is it for you? It's 6:00 a.m. for me in Pennsylvania. A couple of updates while I wait for the rest of you. So, obviously, happy 2026. The last thing we did last year, and there was a recap on the podcast and probably a video on the YouTube channel as well, was the recap of 2025.



00:00:59 - 00:02:00

That was in December in London. You may see me walking through London, and you will see the double-decker buses in the background as I recount what happened in 2025 in digital pathology. I'm super excited about 2026 and the new developments, and of course our new papers, right? Ah, thank you so much: greetings from London. I see that comments from LinkedIn are coming through, so if you are on YouTube or Facebook, let me know as well so that we can have a discussion and a conversation. What else has happened since the beginning of the year? I know you've heard this several times, but I am working on the new version of the book, and everybody who has the book, and by the book I mean Digital Pathology 101,



00:01:58 - 00:02:28

All You Need to Know to Start and Continue Your Digital Pathology Journey... If you're new to the field, there is a free PDF on my website, and I'm going to give you a QR code right now. Let me click the QR code. You should see it at the bottom of the screen now: the QR code to get the free PDF of the book.



00:02:24 - 00:02:55

If you're connected with me on LinkedIn, you may be getting messages from me asking, "Hey, did you get the book already?" because there is a new version coming. I'm already on chapter three out of five, so it's actually happening. Chapters one and two are updated. Obviously, it's taking me longer than I wanted it to, even though I'm leveraging AI as hard as I can.



00:02:49 - 00:03:16

Because if I weren't, it would take me even more time. So when you get this book, it's going to be the previous version, but don't worry: everybody who has signed up for the free PDF of whichever version is automatically going to get the new versions. And I see you guys are scanning the code, so thank you so much.



00:03:11 - 00:03:42

That's one update. I've also mentioned a couple of times something I was planning to do, and I'm now committing to doing it around Valentine's Day, as a Valentine's special: how to create expert presentations with AI. I have gathered different tools.



00:03:38 - 00:04:07

I have described my process. At the conference in London organized by Global Engage, Digital Pathology and AI, I was speaking as well, and I used this process to prepare my presentation, to prepare the topic slides, and also to practice it. I did practice. I dread practicing once my slides are ready.



00:04:03 - 00:04:39

I'm like, "Oh, let's just be done with it. Let's wing it." And I know that winging it is okay. I'm just going to say hi to people. Hi to India; it's 4:30 in India right now. Who else is joining? Verona, that's so nice. We have some publications out of Italy today. Anyone else joining right now? Oh, Gettysburg as well.



00:04:32 - 00:04:59

Nothing from Sweden. Maybe next time. Okay, so what I was saying: the AI presentations. I have a process, and I actually practiced, and it went better than without practicing. In every presentation book you're going to see, "Practice, because then it's going to be so much better."



00:04:52 - 00:05:21

But I know that we are all busy people: doctors, researchers, digital pathology trailblazers. When we are done with the slides, often just 3 hours before we actually present if we're lucky, we're good to go, and we have enough experience. But there is a process that lets you practice without investing too much time in it.



00:05:18 - 00:05:46

So that's going to be out around Valentine's Day. If you're interested, let me know in the chat: "AI presentations." Obviously, if you have any other questions, let me know in the chat. But we need to start discussing what happened in the digital pathology space. Let me share my screen.



00:05:42 - 00:06:21

Oh, actually, I'm already sharing, but I blew myself up. Come on. Okay. Is the code still visible? I don't see it anymore. I'll put it up later. Okay, now I see it. Now I don't see it. Okay, good, because we need to show the paper. Apologies for having trouble with the computer.



00:06:14 - 00:07:12

I'm getting a new one next week, so we're going to be good again. But I want to see myself as well, because I make faces. I want you to see me making my faces; it's part of the process. Stay with me, chat in the chat, and I'll figure it out. Okay, let's do it one more time. Stop sharing. Start sharing. Share. And... please. No. Okay.



00:07:09 - 00:07:44

You're not going to see my faces today. I need to make the paper big, so you're going to be hearing my voice. And hello, Scott from Atlanta, welcoming us in the morning, right? It's 6 a.m. in Atlanta as well. Okay, let's start. Let's start with "Automating clinical phenotyping using natural language processing."



00:07:37 - 00:08:21

Let's see if my pen works. It works. Okay, good. So this one is super cool; you're going to see why. But basically, clinical phenotyping: what is that even? This is a group from Germany and New York, so an international collaboration. Real-world studies are often based on electronic health records, and they require manual chart review.



00:08:17 - 00:09:19

And when we see the word "manual," we know it's going to be slow and labor-intensive, with limited scalability, right? So they developed and compared computable phenotyping based on rules, using the spaCy framework, with phenotyping based on a large language model. spaCy is a Python framework for natural language processing. So they had this rule-based framework and a large language model, GPT-4, for subphenotyping patients with Crohn's disease, considering age at diagnosis and disease behavior. They worked with GPT-4; when you go to ChatGPT today, there's already GPT-5.2. And you guys are so sweet. I was having trouble, and people



00:09:16 - 00:10:18

are saying, "No rush, take your time." Thank you so much. And yes, this is the cool paper, because you can already see this little heart here. But okay. So they had this framework, let's call it the spaCy framework (I don't know if I'm pronouncing it correctly), and for the LLM-based approach they used the GPT-4 model. They had data that included almost 50,000 clinical notes and 2,24 radiology reports from 584 Crohn's disease patients, and then they had a test set of 280 clinical texts. These texts were labeled at the sentence level, in addition to patient-level ground truth data. And then they evaluated the algorithms for



00:10:15 - 00:10:52

recall, precision, specificity, and F1 values: all the parameters we have for evaluating these models. The results showed similar or better performance using GPT-4 compared to the rules. At the note level, the F1 score is at least 0.9 for disease behavior and 0.82 for age at diagnosis. At the patient level, it is a little lower: 0.66 for disease behavior and 0.71



00:10:44 - 00:11:22

for age at diagnosis. The conclusion they have here is that this is the first study to explore a computable phenotyping algorithm based on clinical narrative text, where prior inter-annotator agreements range from 0.54 to 0.98. And we know that all these inter-rater, inter-human comparisons always range from something like 0.54,



00:11:18 - 00:11:53

which is not too high. Sometimes they agree more, but in the pathology world I always say, "Hey, if it's 0.7, then we're so in agreement," and there's still 30% where we were not in agreement. So this underlines the potential of LLMs for computable phenotyping, and it may support large-scale cohort analysis from electronic health records and a streamlined chart review process in the future.
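For readers less familiar with the metrics read out above: precision, recall (sensitivity), specificity, and F1 all come from counts of true and false positives and negatives, while agreement between annotators is often reported as a chance-corrected statistic such as Cohen's kappa. The discussion does not say which agreement statistic produced the 0.54 to 0.98 range, so using kappa here is an assumption. A minimal sketch on made-up labels:

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 = phenotype present."""
    pairs = Counter(zip(y_true, y_pred))
    return pairs[(1, 1)], pairs[(0, 1)], pairs[(0, 0)], pairs[(1, 0)]


def binary_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def cohens_kappa(rater_a, rater_b):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(rater_a) | set(rater_b)) / n**2
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


gold = [1, 1, 1, 0, 0, 0, 1, 0]   # made-up note-level labels
pred = [1, 1, 0, 0, 1, 0, 1, 0]   # made-up model output
print(binary_metrics(gold, pred))  # every metric is 0.75 on this toy example
print(cohens_kappa(gold, pred))    # 0.5: 75% raw agreement, 50% expected by chance
```

The gap between 0.75 raw agreement and a kappa of 0.5 is one reason agreement numbers from different studies are hard to compare directly.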



00:11:46 - 00:12:23

And now, the heart is for the plain language summary, trailblazers. Let me show myself now. So, a plain language summary: how cool is that? They included it here, and I'm going to read it to you; I'm going to show it to you in a second. But basically, I read this one, and there are two other abstracts that are a little more complex, and I thought, "Hmm, how about I ask my LLM?" I use ChatGPT, I use Claude.



00:12:17 - 00:12:54

Sometimes I ask Perplexity, which is an LLM-powered search engine. But basically, I took a screenshot of this and asked, "Can you explain it in plain language?" and it did. So if you see an abstract that you would like explained in plain language, or you need to explain it to somebody else in plain language, screenshot it and put it into ChatGPT. And we have people from Hamburg, where it's around noon now, right?



00:12:52 - 00:13:24

Perfect. So let's do the plain language summary. It's beautiful. "Doctors and researchers often need to group patients by specific medical features." These are the phenotypes, and this is phenotyping: basically, grouping patients by specific medical features. "Much of this information is in free-text clinical notes rather than in tabular or structured data."



00:13:19 - 00:13:49

And I'm going to give you a comment on the efficiency of free text plus structured data. Here they had patients with Crohn's disease, and the free-text notes can include important details describing the disease course over time, such as bowel narrowing (strictures), abnormal openings (fistulas), problems with the area around the anus, and age at diagnosis.
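The rule-based arm of such a pipeline can be pictured as pattern matching at the sentence level followed by aggregation into one patient-level label. The sketch below is hypothetical throughout: it uses Python's standard `re` module rather than spaCy, its keyword lists and negation handling are far cruder than anything the authors built, and it assumes the disease-behavior and age-at-diagnosis categories follow the Montreal classification for Crohn's disease (B1 non-stricturing non-penetrating, B2 stricturing, B3 penetrating, "p" perianal modifier; A1 up to 16, A2 17 to 40, A3 over 40 years), which the discussion does not state.

```python
import re

# Hypothetical keyword rules; not the authors' spaCy rule set.
BEHAVIOR_RULES = {
    "B3": re.compile(r"\b(fistul\w*|penetrating|abscess\w*)", re.IGNORECASE),
    "B2": re.compile(r"\b(strictur\w*|stenos\w*)", re.IGNORECASE),
}
PERIANAL = re.compile(r"\bperianal\b", re.IGNORECASE)
AGE_AT_DX = re.compile(r"diagnosed at (?:the )?age (?:of )?(\d{1,3})", re.IGNORECASE)
# Crude negation: any cue earlier in the same sentence cancels the finding.
NEGATION = re.compile(r"\b(no|not|without|denies|negative for)\b", re.IGNORECASE)
SEVERITY = {"B1": 1, "B2": 2, "B3": 3}


def affirmed(pattern, sentence):
    """True if the pattern matches and no negation cue precedes the match."""
    for match in pattern.finditer(sentence):
        if not NEGATION.search(sentence[: match.start()]):
            return True
    return False


def montreal_age(age):
    """Montreal age-at-diagnosis group: A1 <= 16, A2 17-40, A3 > 40 years."""
    if age <= 16:
        return "A1"
    return "A2" if age <= 40 else "A3"


def patient_phenotype(sentences):
    """Aggregate sentence-level findings into one patient-level subphenotype."""
    behavior, perianal, age = "B1", False, None  # B1 assumed when nothing is found
    for sentence in sentences:
        for label, pattern in BEHAVIOR_RULES.items():
            if affirmed(pattern, sentence) and SEVERITY[label] > SEVERITY[behavior]:
                behavior = label  # keep the most severe behavior ever documented
        perianal = perianal or affirmed(PERIANAL, sentence)
        found = AGE_AT_DX.search(sentence)
        if found and age is None:
            age = int(found.group(1))
    return {
        "behavior": behavior + ("p" if perianal else ""),
        "age_group": montreal_age(age) if age is not None else None,
    }


notes = [
    "Patient was diagnosed at age 23 with ileocolonic Crohn's disease.",
    "MRI shows a short ileal stricture.",
    "No fistula identified on imaging.",
    "Perianal abscess drained last spring.",
]
print(patient_phenotype(notes))  # {'behavior': 'B3p', 'age_group': 'A2'}
```

The negated fistula sentence is ignored, while the perianal abscess raises the patient to B3 with the perianal modifier, which is exactly the kind of detail that structured fields tend to miss.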



00:13:43 - 00:14:29

How cool is that? They explain what a stricture is. They explain what a fistula is. This plain language summary is smartly written, because it explains without dumbing things down. And it aligns with a principle that I once learned called... I want to show myself again, and I can't. Now I can.



00:14:22 - 00:15:24

Never underestimate your audience's intelligence, but never overestimate their prior knowledge. Right? If you're listening to this, you're a highly intelligent healthcare professional, or somebody involved in healthcare in one way or another. So by default you're a super intelligent person, but you may not have the prior knowledge, right? That's my goal: to explain things in a way that everybody understands them. And the same counts for me, right? I do need to go back and check the computer vision terms. I'm a pathologist; my expertise and main domain is pathology, but I do talk about things from computer vision, from AI, from whatever is happening in this space. So I need to update my knowledge base on a regular basis as well. So anyway, a heart for the plain language



00:15:20 - 00:15:51

explanation. I love that. Because when we only use structured data, we often miss these details, whatever was written in the text, and reading notes by hand is very slow and costly. So we use NLP, natural language processing, and LLMs, large language models.



00:15:46 - 00:16:09

And they built two NLP approaches and created new sentence-level datasets to test them. They say this method could save time in research and help clinicians flag people who may need extra care. I love this plain language summary. I don't know if it was a requirement of the journal.



00:16:05 - 00:16:41

What is the journal? Let's check it. This is Communications Medicine, London. I don't know. I love it. I like it. Sometimes journals also ask for visual abstracts. That's cool as well. But the plain language summary is the highlight of today. Okay. Let me know what you think about using LLMs.



00:16:36 - 00:17:09

Oh, I need to tell you a story. Using LLMs for clinical notes brings me to my latest doctor's visit. Let me maximize myself for a second. Sorry for commenting on my clicking actions. So, related to that paper, using LLMs or AI in healthcare: I live in Fairfield, Pennsylvania.



00:17:02 - 00:18:02

Fairfield is not a big town; I think it has 500 inhabitants. So you can imagine what kind of infrastructure we have in a town of 500 inhabitants. I went to the doctor for just a regular checkup. I got a spot on, I think, the 31st of December, so the last day of the year. I needed a refill for my prescription, and they wouldn't give me the refill because I hadn't had a checkup for too long. So I go, and I learn that our little practice (well, it's part of a chain) uses an ambient scribe, which is this AI transcription tool where you, as a healthcare provider, can have a natural conversation with the patient. You don't have to type and



00:18:00 - 00:18:27

look at your screen and then pretend that you're actually paying attention to your patient's facial expressions, which you are not, because you're typing what they're saying. So now they use this ambient scribe transcription service, and I had a conversation with my provider, and it filled in the clinical notes, everything, without her having to type it.



00:18:22 - 00:19:00

That was super cool. I felt proud that Fairfield is so advanced in terms of AI. One thing I would improve is to actually inform people that this is happening. Did I ask about it? I think I asked about it, but, you know, I'm obviously interested in the topic. So we had a lengthy conversation about AI, probably longer than my checkup itself, and that was cool.



00:18:51 - 00:19:38

So AI in healthcare is happening. LLMs bringing back the patient-provider relationship and contact are happening as well. And now let's go back to the papers: "Zebra bodies recognition by artificial intelligence: a computational tool for Fabry nephropathy." Greetings to Italy.



00:19:32 - 00:20:17

This is a group from different universities in Italy. Fabry disease is a rare lysosomal storage disorder caused by mutations in the GLA gene, and there is an accumulation of a lysosomal substance called globotriaosylceramide. It has a very characteristic appearance on electron microscopy, which I'm not going to show you right now, because my computer is going to freak out and then we won't be able to see anything.



00:20:12 - 00:21:01

But you know what? I'm just going to draw it. Electron microscopy, right? That's an advanced imaging technique. These lysosomes look like zebras: they have stripes. That's why they're called zebra bodies. But here we are talking... let me remove my drawing. The diagnosis is more difficult, especially in females, and a renal biopsy remains essential to assess it, right? And obviously, interpretation requires expert pathologists.



00:20:55 - 00:21:23

Here they did not use electron microscopy, which would have been pretty straightforward, because I thought, "Oh, zebra bodies with AI on EM images, that's going to be a no-brainer." No, no, no. They used whole slide images from renal biopsies of Fabry nephropathy patients to develop and validate a foamy podocyte screening AI tool.



00:21:21 - 00:22:04

Foamy podocytes are a lot less characteristic than zebra bodies on electron microscopy, and you do need expert pathologists to recognize them. They are basically foamy cells. But they developed this AI tool that first classifies glomeruli and then segments podocytes, and they evaluated its performance using standard metrics.



00:21:58 - 00:22:33

And they designed a new Zebra score to quantify disease burden, and they assessed its correlation with histological scores and clinical parameters. They had a classification accuracy of 79% in identifying foamy podocytes. So this is a new AI screening tool that is supposed to support the pathologist. Let's highlight that here.
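The pipeline described here has two stages (classify glomeruli, then segment podocytes and decide which are foamy), plus a per-biopsy score summarizing burden. A minimal sketch, assuming the model calls are supplied as plain functions: `classify_glomerulus`, `segment_podocytes`, and `is_foamy` are hypothetical stand-ins for trained networks. The actual Zebra score is not defined in this discussion, so `burden_score` below is an invented placeholder (share of glomeruli with at least one foamy podocyte), not the authors' formula.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GlomerulusResult:
    glom_id: str
    flagged: bool          # stage 1 output
    n_podocytes: int = 0   # stage 2: podocytes segmented
    n_foamy: int = 0       # stage 2: podocytes classified as foamy


def screen_biopsy(tiles: Dict[str, object],
                  classify_glomerulus: Callable[[object], bool],
                  segment_podocytes: Callable[[object], List[object]],
                  is_foamy: Callable[[object], bool]) -> List[GlomerulusResult]:
    """Two-stage screen: glomerulus-level classification, then podocyte analysis."""
    results = []
    for glom_id, tile in tiles.items():
        res = GlomerulusResult(glom_id, classify_glomerulus(tile))
        if res.flagged:  # only flagged glomeruli go on to podocyte segmentation
            podocytes = segment_podocytes(tile)
            res.n_podocytes = len(podocytes)
            res.n_foamy = sum(1 for p in podocytes if is_foamy(p))
        results.append(res)
    return results


def burden_score(results: List[GlomerulusResult]) -> float:
    """Hypothetical burden score: share of glomeruli with >= 1 foamy podocyte."""
    if not results:
        return 0.0
    return sum(r.n_foamy > 0 for r in results) / len(results)


def review_queue(results: List[GlomerulusResult]) -> List[str]:
    """Glomeruli for the nephropathologist to check, most foamy podocytes first."""
    hits = [r for r in results if r.n_foamy > 0]
    return [r.glom_id for r in sorted(hits, key=lambda r: r.n_foamy, reverse=True)]


# Toy stand-ins: each "tile" already carries its answers.
tiles = {
    "g1": {"flag": True, "podocytes": [True, True, False]},
    "g2": {"flag": False, "podocytes": [False]},
    "g3": {"flag": True, "podocytes": [False, False]},
    "g4": {"flag": True, "podocytes": [True]},
}
results = screen_biopsy(tiles, lambda t: t["flag"], lambda t: t["podocytes"], lambda p: p)
print(burden_score(results))   # 0.5
print(review_queue(results))   # ['g1', 'g4']
```

Returning a ranked review queue rather than a diagnosis mirrors the screening-tool framing: the nephropathologist still makes the call.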



00:22:27 - 00:23:28

Sorry, wrong highlighter. "The AI-assisted Zebra pipeline highlights high-risk Fabry nephropathy features to support nephropathologists as a screening tool." Right? This is important because it's pathologist support. If you've been here, or at any digital pathology conference, more than once, you will still hear the question, "Hey, is it going to take away our jobs?" This is part of the conversation every time there is a new AI tool. Is it okay? Are we going to become less proficient in what we know? It's a technology discussion, right? So here's the important thing: it's a screening



00:23:25 - 00:24:10

tool that helps the pathologist, and it recognizes these foamy podocytes. Okay, any questions, any comments? Let me know in the comments. I see your comments. We have two more to go. Let's see. Oh yeah, this one is interesting as well. Let me make it bigger for you, because it's about whether we can save money with AI. There is no discussion about healthcare without a discussion about cost-effectiveness, about money in healthcare.



00:24:07 - 00:24:32

So here we have "Cost-effectiveness analysis of artificial intelligence for response prediction of neoadjuvant radio-chemotherapy in locally advanced rectal cancer (LARC) in the Netherlands." This one is from the Netherlands, and we also have authors from Poland.



00:24:26 - 00:25:05

So greetings to Belgium, the Netherlands, Poland, and Italy again. Italy is very well represented, so if there is anybody from Italy, say hi in the comments. Okay. This study aims to provide insights into the potential cost-effectiveness of an AI tool for predicting response to neoadjuvant chemoradiotherapy in stage II-III rectal cancer, compared with usual care.



00:24:59 - 00:26:00

This is a hypothetical study, right? It's a model; it's modeling. The study used a state-transition Markov model from a Dutch societal perspective. Quality-adjusted life years and costs were simulated over a 10-year horizon. And the important word here is that they were simulated, right? We're hypothesizing. Then sensitivity analyses and a threshold analysis were performed, and we're going to see in a second what that means. And the results: in the best-case scenario, when everything works well and the AI is performing well, there was an incremental cost saving of 2.5



00:25:56 - 00:26:36

million euros per quality-adjusted life year gained per thousand patients. So this is per thousand patients, and €2.5 million in cost savings. The main drivers of cost-effectiveness were the incidence of clinical complete response and the specificity of the tool, and cost-effectiveness was maintained if the cost of AI was €1,100 and €2,100.



00:26:27 - 00:27:17

So they simulated: excuse me, what if deploying this tool cost over €1,000? It was still cost-effective. Over €2,000, it was still cost-effective, with performance at 0.85 and 0.90, which is very good performance, right? So then the question arises: if deployment costs more, at some point you lose the cost-effectiveness, and if the performance goes down, then obviously this is not a viable case anymore.
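The threshold analysis described here asks how expensive the AI can get, and how well it must perform, before the benefit disappears. A toy state-transition (Markov) cohort model makes that concrete. Everything below is an assumption for illustration: the four health states, every transition probability, cost, utility, and discount rate, and the €20,000 per QALY willingness to pay are invented, and this is not the authors' model. In the sketch, AI-predicted complete responders skip surgery, while non-responders wrongly predicted as responders (governed by specificity) start in a regrowth/recurrence state.

```python
# Toy four-state Markov cohort model; every number is hypothetical.
STATES = ["NED_preserved", "NED_surgery", "recurrence", "dead"]
P = [  # annual transition probabilities, rows = from-state, columns = to-state
    [0.92, 0.00, 0.05, 0.03],   # no evidence of disease, organ preserved
    [0.00, 0.92, 0.05, 0.03],   # no evidence of disease after surgery
    [0.00, 0.30, 0.60, 0.10],   # recurrence/regrowth: salvage surgery, persist, die
    [0.00, 0.00, 0.00, 1.00],   # dead (absorbing)
]
ANNUAL_COST = [1_500, 1_000, 15_000, 0]   # € per patient-year in each state
UTILITY = [0.90, 0.80, 0.60, 0.00]        # QALY weight per patient-year


def run_cohort(start, years=10, cost_rate=0.04, qaly_rate=0.015):
    """Discounted costs and QALYs for one patient spread over STATES at year 0."""
    dist = list(start)
    cost = qalys = 0.0
    for t in range(years):
        cost += sum(d * c for d, c in zip(dist, ANNUAL_COST)) / (1 + cost_rate) ** t
        qalys += sum(d * u for d, u in zip(dist, UTILITY)) / (1 + qaly_rate) ** t
        # Move the cohort one cycle forward: new_j = sum_i dist_i * P[i][j].
        dist = [sum(dist[i] * P[i][j] for i in range(len(STATES)))
                for j in range(len(STATES))]
    return cost, qalys


def strategy(use_ai, ai_cost=0.0, p_ccr=0.20, sens=0.80, spec=0.85,
             surgery_cost=25_000):
    """Total cost and QALYs per patient for usual care or AI-guided care."""
    if use_ai:
        preserved = p_ccr * sens            # true complete responders spared surgery
        missed = (1 - p_ccr) * (1 - spec)   # non-responders wrongly spared surgery
        operated = 1 - preserved - missed
        start = [preserved, operated, missed, 0.0]
        upfront = ai_cost + operated * surgery_cost
    else:
        start, upfront = [0.0, 1.0, 0.0, 0.0], surgery_cost
    cost, qalys = run_cohort(start)
    return upfront + cost, qalys


def net_monetary_benefit(ai_cost, spec=0.85, wtp=20_000):
    """NMB of AI-guided care versus usual care, in € per patient."""
    cost_ai, qaly_ai = strategy(True, ai_cost=ai_cost, spec=spec)
    cost_uc, qaly_uc = strategy(False)
    return wtp * (qaly_ai - qaly_uc) - (cost_ai - cost_uc)


def max_ai_cost(spec=0.85, wtp=20_000):
    """Threshold analysis: highest AI price per patient that keeps NMB >= 0."""
    # The AI cost is charged once per patient, so NMB falls one-for-one with it.
    return net_monetary_benefit(0.0, spec=spec, wtp=wtp)


for spec in (0.85, 0.90):
    print(f"specificity {spec:.2f}: AI stays cost-effective up to "
          f"€{max_ai_cost(spec):,.0f} per patient")
```

Because the AI price enters linearly, its threshold falls out directly; specificity, by contrast, changes the cohort trace itself, which is one way a performance metric can become a main driver in models of this kind.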



00:27:11 - 00:27:59

So they say the findings of this study present the economic impact of a hypothetical AI-based approach to treatment response prediction in stage II-III locally advanced rectal cancer patients who received neoadjuvant chemoradiotherapy and are eligible for consecutive surgery, and the results of the study highlight the complexity of healthcare decision-making.



00:27:53 - 00:28:25

I think this is kind of a no-brainer, but in general it's so much easier to think in black and white, and so much easier to make decisions when you only have a good and a bad option. But there's often good and better, or bad and worse, and things like that.



00:28:21 - 00:28:52

So there are a lot of nuances specific to healthcare. These models help, and if we're lucky and we have a great scenario where everything is performing above 0.85, whichever metrics they chose, then we have a fantastic tool to save money. And if not, then we don't. But it's a great exercise, I think. Okay, let me know.



00:28:45 - 00:29:46

Oh, and Leeds, UK, is saying hi as well. Hello. And we have some comments that AI will increase productivity and efficiency in automation, and that this has already impacted clinical labs. I think so. I think there are a lot of, how do I call them... I do call them low-hanging-fruit AI tools, and I'm referring to workflow improvement tools and workflow redesign tools. Maybe we're going to get some papers next week, or soon, when agentic AI approaches get published. Basically, these are tools where you're trying not to replace or help the diagnostician in the



00:29:43 - 00:30:14

diagnosis, but to help them in their workflow, so they don't have to click through millions of windows and lose time and patience, and things like that. So AI entering these areas of clinical labs is definitely super valuable, without threatening the expertise of the physician, the doctor, the healthcare provider.



00:30:10 - 00:30:55

So let me make this one big. This is our last one, but stay till the end. I'm going to give the QR code for the book again so that you can scan it if you have not yet. Okay. This one is interesting as well: "A multi-expert deep learning framework with LLM-guided arbitration for multimodal histopathology prediction."



00:30:45 - 00:31:32

And this is from Cincinnati, Ohio, USA. We all know that we have advanced a lot. Deep learning changed the landscape of computer vision and of computer-aided healthcare; let's call it, super broadly, computer-aided healthcare. One of my podcast guests, Andrew Janowczyk, compared deep learning, in the broadest sense, to being given fire, like the invention of fire.



00:31:26 - 00:31:56

And now transformers and transformer-based architectures made this fire even bigger. This helped improve the accuracy of computational pathology. But a lot of models are being created, and conventional model ensembling strategies often lack adaptability and interpretability.



00:31:52 - 00:32:25

What does that mean, these ensembling strategies? Let's say you are developing multiple models: one model says one thing, another model says a different thing, and then you have several more. Then you let them vote, or you ensemble them: you collect all the results and combine them, and the authors mention what this usually looks like, such as majority voting or aggregating the outputs in a classical way.
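The two classical aggregation rules mentioned here can disagree with each other on the same case, and neither explains why. A minimal sketch with three hypothetical models scoring one image tile (the class names and probabilities are made up):

```python
from collections import Counter
from statistics import fmean


def majority_vote(labels):
    """Most frequent hard label; None signals an unresolved tie."""
    ranked = Counter(labels).most_common()
    top_count = ranked[0][1]
    winners = [label for label, count in ranked if count == top_count]
    return winners[0] if len(winners) == 1 else None


def uniform_average(prob_dicts):
    """Average class probabilities across models; return (label, averages)."""
    classes = list(prob_dicts[0])
    avg = {c: fmean(p[c] for p in prob_dicts) for c in classes}
    return max(avg, key=avg.get), avg


models = [
    {"tumor": 0.55, "normal": 0.45},  # model 1: barely tumor
    {"tumor": 0.52, "normal": 0.48},  # model 2: barely tumor
    {"tumor": 0.05, "normal": 0.95},  # model 3: confidently normal
]
hard_labels = [max(p, key=p.get) for p in models]
print(majority_vote(hard_labels))     # tumor (2 of 3 votes)
print(uniform_average(models)[0])     # normal (mean 0.63 vs 0.37)
```

Two hesitant models outvote one confident model under majority voting, but the confident model wins under probability averaging: the choice of rule decides the answer, and neither rule says which model deserved to be trusted.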



00:32:26 - 00:32:54

Because while these multiple AI models can provide complementary perspectives, aggregating their outputs is often insufficient for handling inter-model disagreement. So not only do pathologists disagree; the models disagree as well.



00:32:50 - 00:33:20

So what do we do with that? To address these challenges, the authors propose a multi-expert framework that integrates diverse vision-based predictions and a clinical feature-based model, with a large language model acting as an intelligent arbitrator. So we have a third party: we have these models that go and assess something.



00:33:16 - 00:34:01

They have two datasets. And then we have the large language model that looks at the reasoning of these models and picks the decision in a more intelligent, transparent way than just some mathematical operation, or whatever was done so far. So they leverage the contextual reasoning and explanation capabilities of LLMs, and their architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts. This is cool.
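One way to picture an LLM arbitrator is as a function that receives each expert's label, confidence, and track record, asks the LLM for a decision plus a rationale in a fixed format, and falls back to a deterministic rule if the reply is unusable. This is a hypothetical sketch: the prompt wording, the `ExpertOpinion` fields, and the `llm` callable (any function from prompt text to reply text) are assumptions, not the authors' implementation, and no real LLM API is called.

```python
import json
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class ExpertOpinion:
    name: str          # e.g. "CNN", "ViT", "clinical model" (illustrative)
    modality: str      # "image" or "clinical"
    label: str
    confidence: float  # the model's probability for its own label
    trust: float       # learned reliability weight in [0, 1]


def build_prompt(opinions: Sequence[ExpertOpinion], classes: Sequence[str]) -> str:
    lines = [
        "You are arbitrating between expert models on one histopathology case.",
        "Allowed labels: " + ", ".join(classes),
        "Expert outputs:",
    ]
    for o in opinions:
        lines.append(f"- {o.name} ({o.modality}): label={o.label}, "
                     f"confidence={o.confidence:.2f}, trust={o.trust:.2f}")
    lines.append('Answer as JSON: {"label": "<allowed label>", "rationale": "<one sentence>"}')
    return "\n".join(lines)


def trust_weighted_fallback(opinions: Sequence[ExpertOpinion]) -> str:
    """Deterministic backup: sum trust * confidence behind each label."""
    scores = {}
    for o in opinions:
        scores[o.label] = scores.get(o.label, 0.0) + o.trust * o.confidence
    return max(scores, key=scores.get)


def arbitrate(opinions: Sequence[ExpertOpinion], classes: Sequence[str],
              llm: Callable[[str], str]) -> Tuple[str, str]:
    """Ask the (hypothetical) LLM to decide; fall back if the reply is unusable."""
    reply = llm(build_prompt(opinions, classes))
    try:
        parsed = json.loads(reply)
        if parsed.get("label") in classes:
            return parsed["label"], str(parsed.get("rationale", ""))
    except (json.JSONDecodeError, AttributeError):
        pass  # not JSON, or JSON that is not an object
    return trust_weighted_fallback(opinions), "fallback: trust-weighted vote"


opinions = [
    ExpertOpinion("CNN", "image", "benign", 0.60, 0.55),
    ExpertOpinion("ViT", "image", "malignant", 0.70, 0.80),
    ExpertOpinion("clinical model", "clinical", "malignant", 0.65, 0.70),
]
classes = ["benign", "malignant"]
fake_llm = lambda prompt: '{"label": "malignant", "rationale": "The two higher-trust experts agree."}'
print(arbitrate(opinions, classes, fake_llm))
print(arbitrate(opinions, classes, lambda prompt: "Sorry, I cannot decide."))
```

Constraining the reply to an allowed label and keeping a deterministic fallback is a design choice for robustness; the rationale string is what a human expert would then review.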



00:33:54 - 00:34:56

And then it provides transparent, rational decisions. What they used were two cancer histopathology datasets. There was HMU-GC-HE-30K, which is for gastric cancer and only had pathology images. This is important, because it got me confused: they have these two datasets, and one of them only has pathology images, whereas the second dataset, BCNB, which is a breast cancer biopsy dataset, is truly multimodal and contains pathology imaging and clinical information. Their proposed multi-expert LLM-arbitrated framework, which they call Melma (multi-expert LLM arbitration), outperforms convolutional neural



00:34:54 - 00:35:33

networks and transformers, which are currently the de facto state-of-the-art classification models, as well as ensemble models. Their method has better overall results, and they tested different LLMs as arbitrators: Llama, GPT variants, and Mistral. Their proposed framework outperforms strong single-agent CNN and vision transformer baselines on the datasets.



00:35:27 - 00:35:58

And ablations show that learned per-agent trust materially improves the arbitrator's decisions without altering the prompt or data. LLM-guided arbitration consistently provides more robust and explainable performance than individual models and conventional ensembling with majority vote.



00:35:54 - 00:36:30

So here are the different conventional ensembling methods: majority vote, uniform average, and meta-learners. Their LLM arbitration outperforms these, so they offer LLM-driven arbitration as a promising option for building transparent and extensible AI systems in digital pathology.
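How the "learned per-agent trust" is computed is not specified in this discussion, so here is one simple, assumed way to estimate it: each model's Laplace-smoothed accuracy on a held-out validation set. According to the abstract, trust feeds the LLM arbitrator; this sketch instead uses it in a plain trust-weighted vote, which is enough to show how a reliable minority can overrule an unreliable majority. The agent names and validation labels are made up.

```python
from collections import defaultdict


def learn_trust(val_preds, val_truth, alpha=1.0):
    """Per-agent trust = Laplace-smoothed accuracy on held-out validation cases."""
    n = len(val_truth)
    trust = {}
    for agent, preds in val_preds.items():
        correct = sum(p == t for p, t in zip(preds, val_truth))
        trust[agent] = (correct + alpha) / (n + 2 * alpha)
    return trust


def trust_weighted_vote(case_preds, trust):
    """Sum each agent's trust behind its label and return the heaviest label."""
    scores = defaultdict(float)
    for agent, label in case_preds.items():
        scores[label] += trust[agent]
    return max(scores, key=scores.get)


truth = [1, 0, 1, 1, 0, 0, 1, 0]
val_preds = {
    "cnn_a": [1, 0, 1, 1, 0, 0, 1, 1],   # 7/8 correct
    "cnn_b": [0, 1, 0, 1, 0, 1, 0, 1],   # 2/8 correct
    "vit_c": [0, 1, 1, 0, 1, 0, 0, 1],   # 2/8 correct
}
trust = learn_trust(val_preds, truth)    # {'cnn_a': 0.8, 'cnn_b': 0.3, 'vit_c': 0.3}
case = {"cnn_a": "malignant", "cnn_b": "benign", "vit_c": "benign"}
print(trust_weighted_vote(case, trust))  # malignant, although the majority says benign
```

Smoothing keeps a model with very few validation cases from receiving a trust of exactly 0 or 1.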



00:36:24 - 00:37:25

For the first dataset, the gastric cancer one, I was like, "Okay, so what was it arbitrating if there were just images?" Well, there were different computer vision models giving answers, and what it was looking at was how often each of these models was right, how consistent it was, and the confidence with which a particular model made its prediction, not just majority voting or whatever the approaches were. So even on unimodal data, it can go into the models and figure out which model's reasoning was



00:37:21 - 00:38:22

better more often, and propose a kind of arbitration strategy, which I thought was pretty cool, because we are limited in doing that, and by "we" I mean human observers. How did I learn that? By taking part in a project where... these models have a lot of parameters, right? We were only trying combinations of a couple of parameters and a couple of performance metrics, and very quickly I was like, "Is there software to do this?" Because I am not able to decide visually. They're all kind of good enough, but I'm not looking at every cell; I'm not looking at every field of view. Then I'm looking at the metrics. How do I know which one is better? And now we have an LLM-powered framework that can help us arbitrate and provide some



00:38:20 - 00:39:21

kind of justification, and then a human expert can go and figure out whether this justification makes sense or not. And with that, I want to add a tip: take an abstract and ask an LLM to explain it to you in plain language. Ask clarifying questions, because that was one of my clarifying questions: how is this model arbitrating multimodally on a unimodal dataset? That's what it explained to me, but not on the first try. So if you have any questions, or anything is confusing, ask. Sometimes the LLM is going to tell you... oh no, most of the time it tells me, "Oh, great catch." Oh, and there was another thing I needed to tell you about. I have some pop-up notifications on my phone. Let's see if I can find this one, because when I saw it, I thought, I need to tell the trailblazers, because we



00:39:19 - 00:39:54

were talking about it. I think I told you the story about when I had this problem: I was submitting part of a publication, and I read through my part. It was really solid, I knew it was based on my presentation, and then I quickly used AI for the references. So I don't feel that bad anymore. I felt very bad when... So nobody really caught it, but they gave me...



00:39:49 - 00:40:17

Maybe they caught it, but maybe they wanted to give me the benefit of the doubt, because they asked me, "Oh, can you format the references differently?" I went in and formatted them differently, and in the process I realized that they were fake. So it wasn't only me.



00:40:08 - 00:40:50

Let me give you this from January 2026. This month, an analysis by the AI-detection startup GPTZero revealed that more than 100 AI-hallucinated citations were included in research papers accepted to NeurIPS 2025. Without going into the details, I took a screenshot of this yesterday because I thought I have to tell the trailblazers.



00:40:42 - 00:41:28

So yeah, over 100 AI-hallucinated citations. For us, this means yes, you can leverage AI for citations, but you need to scrutinize the output because it hallucinates. I think it's getting better, though. When I wrote the book (first edition 2023, second edition coming in 2026), it would hallucinate in normal text output a lot more than it does now. Right now, I feel it has leveled up.



00:41:21 - 00:41:48

But in very specific tasks, like reference finding or reference generation, it still hallucinates. If you find a tool that is actually optimized for that, let me know, because I was trying to find one a couple of months ago, last year. Obviously, last year is only a couple of months ago, because we're in January.



00:41:44 - 00:42:19

But I didn't find anything specific. There are different tools for writing different things. If you find anything, let me know. And thank you so much. Oh, and I have one more comment. Let's do this one: "Nowadays there's a lot of fog about how AI can be implemented to help or effectively boost pathology workflows."



00:42:14 - 00:42:50

I think improving workflow efficiency is key. Yes. A cool presentation I listened to at the DPAI conference in London, by Orly Ardon, was about QC, quality control. Quality control is an integral part of producing slides, but then there is an additional QC step: quality control of the scans.



00:42:43 - 00:43:12

So in digital pathology, unlike radiology, not only do we still produce the analog glass slide (maybe that will change with direct-to-digital imaging technologies), we then scan it, and on top of that we quality-control the scan, even though the glass slide was already QC'd. Because different artifacts and other issues matter in scanning, we need to do that.



00:43:08 - 00:43:48

So she was talking about implementing software for that, and I thought, yes, this is something that is high leverage and low risk. Using AI in these situations obviously helps with workflow, probably with cost savings, maybe even with generating revenue. But what it helps with most... and that was part of my conversation with my healthcare provider when I went to the doctor and we were talking about AI: hey, people are afraid.



00:43:46 - 00:44:22

And I think we talked about a paper reporting that when people were told AI was being used in a clinical trial, they would withdraw from the trial. I'm like, wow, that is not good. So anyway, there was this discussion about making AI understandable, introducing it to patients, introducing it to healthcare providers so they are comfortable with the tools, and implementing guardrails. This is an ongoing discussion.



00:44:19 - 00:44:45

You're going to be hearing about it. But for today, let's grab a coffee and start our working day. Thank you so much for joining me. Next week I'm going to have a new computer, and... no, I don't even know how to switch this one off. Oh, no. Well, reading takes you a long way.



00:44:40 - 00:44:54

It has a button that says "End stream." Thank you so much for trailblazing this trail with me. Leave me comments if you're listening to the recording.