Digital Pathology Podcast

192: AI Detects Hidden Lymph Node Metastases

Subscriber Episode Aleksandra Zuraw, DVM, PhD Episode 192

This episode is only available to subscribers.

Digital Pathology Podcast +

AI-powered summaries of the newest digital pathology and AI in healthcare papers


Paper Discussed in this Episode:

Region-Based Segmentation of Lymph Node Metastases in Whole-Slide Images of Colorectal Cancer: A Pilot Clinical Study. Fayzullin A, Savelov N, Balkivskiy A, et al. Cancer Medicine 2026.

Episode Summary: In this deep dive, we strip away the marketing gloss of AI as a mere time-saving tool and look at its true value in the lab: saving lives through relentless vigilance. We examine a 2026 study on colorectal cancer that deploys a two-stage AI pipeline to hunt down microscopic lymph node metastases. By highlighting "Specimen 8"—a speck of cancer hidden within a busy, benign background—we explore why the real return on investment for AI in digital pathology isn't about speeding up the human, but acting as an automated safety net that catches what the human eye naturally misses.

In This Episode, We Cover:

The 12-Node Burden: The grueling clinical reality of staging colorectal cancer, where pathologists must manually scan at least 12 regional lymph nodes for microscopic tumor cells—a perfect storm for change blindness and visual fatigue.

The Mimics of Pathology: Why finding metastases isn't just looking for a "needle in a haystack," but fighting visual mimics like sinus histiocytosis that effortlessly camouflage tiny, poorly differentiated cancer cells.

The Two-Stage AI Pipeline ("The Scout" and "The Artist"):

The Scout (GoogLeNet): A lightweight classification model that acts as a binary filter, achieving a staggering 100% recall by scanning image tiles and successfully filtering out confusing artifacts like tissue folds.

The Artist (DeepLabV3+): A heavy-duty semantic segmentation model that draws precise boundaries around viable tumor cells while intelligently ignoring necrosis and lakes of mucin.

The Hardware Validation Test: How the researchers proved their AI's robustness by testing it across different hardware (Hamamatsu and Leica scanners) to avoid the "silent killer" of AI projects: domain shift from scanner variability.

The "Specimen 8" Revelation: A breakdown of the crucial moment the AI caught a 0.14 mm by 0.06 mm metastasis hiding in a benign pattern. The AI didn't save the pathologists time here—it actually slowed them down to verify—but it prevented a catastrophic misdiagnosis.

The Return on Investment (ROI) Myth: Why hospital administrators need to stop looking at AI strictly for turnaround time speed. The study proved overall time savings were essentially negligible (1-3 seconds per case), but the quality assurance and patient safety derived from catching missed cancers were priceless.

Key Takeaway: The true value of AI in pathology isn't in racing the clock; it's in absolute vigilance. By successfully highlighting microscopic metastatic mimics that cause human false-negatives, AI proves its worth not as a turbo-button for the lab, but as a tireless quality assurance partner that ensures accurate cancer staging and optimal patient outcomes.

Get the "Digital Pathology 101" FREE E-book and join us!

Hello and welcome back, trailblazers. You're tuning in to another session of the Digital Pathology Podcast.


Yeah, great to be here.


Today we're kind of stripping away the marketing gloss and getting our hands dirty with the real science.


We're doing a deep dive into some really fascinating source material.


I am honestly relieved we're taking this journal club approach today,


right? Because we aren't just here to skim abstracts. We want to dissect the methodology, challenge the findings, and figure out what it actually means for your lab workflow.


Exactly. And the paper we're reviewing was published in Cancer Medicine by John Wiley and Sons just this year, 2026.


Yeah. It's dense, it's technical and it kind of contradicts a lot of the common wisdom we hear at conferences.


It really does.


So to give you the exact title, it's "Region-Based Segmentation of Lymph Node Metastases in Whole-Slide Images of Colorectal Cancer: A Pilot Clinical Study."


Quite a mouthful.


It is. The lead authors are Alexey Fayzullin and Nikita Savelov, representing a collaboration between the Institute for Regenerative Medicine at Sechenov University, Moscow City Oncology Hospital No. 62, and Medical Neuronets.


And you know, right off the bat, we need to address the elephant in the room with this topic.


The AI expectations.


Yes. If you walk into any pathology department right now or any hospital boardroom and you mention AI integration, what's the immediate assumption? What's the KPI everyone is obsessing over?


Oh, it's speed, turnaround time. It's always, how can we get these slides off the pathologist's desk faster?


That's the entire narrative. But this study presents a really fascinating counterargument to that. It suggests that if we optimize purely for speed, we might actually be optimizing for failure.


Because this research isn't about racing the clock. It's about finding the invisible killer.


Right. We're talking about metastases so small, literally 0.14 millimeters, that they are effectively just dust on a slide.


Which brings us to the clinical stakes of this deep dive. We're focusing on colorectal cancer,


which is incredibly common.


Third most common cancer globally. Almost 2 million new cases in 2020 alone. So this isn't some rare edge case. This is the bread and butter of many labs.


And for the healthcare professionals listening, you know the burden of the 12 node rule,


the standard set by the Royal College of Pathologists.


Exactly. You cannot strictly stage a patient without excising and examining at least 12 regional lymph nodes.


Right. But Let's be honest about what that workflow actually looks like physically and mentally.


It's exhausting.


Yeah. You have a tray of slides. Most of them are likely negative.


Mhm.


You're scanning high-resolution tissue searching for something that might not even be there.


It's the definition of needle in a haystack work.


Well, I'd say it's actually a bit more complex than a needle in a haystack.


How so?


Because in a haystack, the needle looks totally different from the hay. But in histopathology, you're dealing with mimics.


Ah, right.


You have sinus histiocytosis, follicular hyperplasia. These are benign conditions that under the microscope can look alarmingly similar to metastatic deposits,


especially if the tumor differentiation is poor.


Correct. If you have micrometastases, solitary tumor cells, or small clusters that lack clear glandular formation, they just blend right into the background of lymphocytes and histiocytes.


And if you miss one, I mean, if you call a node negative when it has a tiny 0.2 mm deposit,


you've understaged the patient. It's that simple,


Right. And then they might miss out on adjuvant chemotherapy,


and their prognosis just plummets.


So the margin for error is effectively zero. Now the authors approached this with a very specific technical architecture. They didn't just throw a standard ResNet at the problem and hope for the best.


No, they built a two-stage classification and segmentation pipeline


which is crucial.


And I want to unpack this architecture because I think it speaks to the specific challenges of whole-slide imaging. WSI is massive. You can't just feed a gigapixel image into a neural network.


No, the computational cost would be astronomical. It would melt your GPU.


So, they adopted a strategy that kind of mirrors how a human approaches a slide, but with a twist. They separated the tasks. Stage one is what we'll call the scout, and stage two is the artist.


Let's look at the scout first. They used the GoogLeNet architecture here.


Why that specific choice?


Well, GoogLeNet is relatively lightweight and fast. Its job isn't to draw perfect boundaries. Its job is just classification,


right?


They chop the whole-slide images into tiles, 1024 by 1024 pixels. The scout scans these tiles and acts as a binary filter: is there something suspicious here, or is there nothing to see?
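To make that scout stage concrete, here is a minimal sketch of the tile-and-classify loop in Python with PyTorch and OpenSlide. This is not the authors' code: the tile stride, the preprocessing, the untrained GoogLeNet weights, and the 0.5 probability threshold are all illustrative assumptions.

```python
import torch
import torchvision.transforms as T
from torchvision.models import googlenet
from openslide import OpenSlide

TILE = 1024  # tile size from the paper; the exact preprocessing here is assumed
preprocess = T.Compose([T.ToTensor(), T.Resize(224)])  # GoogLeNet-sized input

scout = googlenet(num_classes=2)  # binary filter: suspicious vs. nothing to see
scout.eval()

def flag_suspicious_tiles(slide_path, threshold=0.5):
    """Yield (x, y, probability) for tiles the scout flags as suspicious."""
    slide = OpenSlide(slide_path)
    width, height = slide.dimensions
    with torch.no_grad():
        for y in range(0, height - TILE + 1, TILE):
            for x in range(0, width - TILE + 1, TILE):
                tile = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
                logits = scout(preprocess(tile).unsqueeze(0))
                p_suspicious = torch.softmax(logits, dim=1)[0, 1].item()
                if p_suspicious >= threshold:
                    yield x, y, p_suspicious
```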


But this is where digital pathology gets messy. It's not just pure tissue on the slide. You have artifacts,


lots of them.


You have tissue folds from the microtome. You have staining inconsistencies, maybe a bubble under the coverslip. How did the scout handle all that noise?


That's a critical point in the paper. In their training data, they explicitly had to teach the model to recognize artifacts as negative.


Which makes sense


because a tissue fold often creates this dark high contrast line that a naive AI might interpret as a dense cellular cluster


or a nucleus.


Exactly. By training the scout to ignore these folds and staining issues, they drastically reduced the noise before it ever reached the second stage.


Okay. So, the scout filters the tiles. If a tile is flagged as suspicious, it gets passed to the artist stage two.


And this is where they deployed DeepLabV3+. Now, DeepLabV3+ is a much heavier model, correct?


Much heavier but far more capable for semantic segmentation.


Why that?


It uses atrous convolution, which allows it to capture multiscale context. Basically, it can see the fine details of a cell while also understanding the broader tissue architecture around it,


which is essential for drawing the mask,


Right, the precise outline of the tumor, especially those tiny isolated tumor foci.


And this is where the specificity of the design really shines for me. The scout finds the general area, so the heavy-duty DeepLab model doesn't have to waste resources analyzing empty fat or normal lymphoid tissue.


Exactly. It only focuses on the high probability targets. It's an efficiency play, but it's also a performance play. By narrowing the search space, you reduce the chance of false positives generated by the segmentation model.
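As a rough illustration of that cascade, here is a sketch using torchvision's off-the-shelf DeepLabV3 as a stand-in for the paper's "artist." The backbone choice, the two-class output, and the argmax post-processing are assumptions for illustration; the point is simply that only tiles the scout has already flagged ever reach this heavier model.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in "artist": a semantic segmentation head with two classes
# (background vs. viable tumor). The paper's actual weights are not reproduced.
artist = deeplabv3_resnet50(weights=None, num_classes=2)
artist.eval()

def segment_flagged_tiles(flagged_tiles):
    """flagged_tiles: iterable of (x, y, tile_tensor) from the scout stage,
    where tile_tensor is a float tensor of shape 3xHxW. Returns per-tile masks."""
    masks = {}
    with torch.no_grad():
        for x, y, tile in flagged_tiles:
            logits = artist(tile.unsqueeze(0))["out"]        # shape 1x2xHxW
            masks[(x, y)] = logits.argmax(dim=1).squeeze(0)  # 1 = tumor pixel
    return masks
```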


Let's talk about the data they used to train this system: 108 slides from Moscow City Oncology Hospital No. 62. That feels like a decent number, but in the world of deep learning, isn't that a little on the small side?


It is on the smaller side for a foundational model, but the validation strategy is what gives this paper weight.


The hardware validation.


Yes, they didn't just test it on the same scanner they trained on. They utilized both a Hamamatsu NanoZoomer S360 and a Leica Aperio AT2.


I am so glad you brought that up because scanner variability is the silent killer of AI projects.


It really is.


A model trained on a Hamamatsu might look at a Leica image, which has a slightly different color temperature and contrast profile, and just completely fail.


Exactly. It's called domain shift. By validating across different hardware, they demonstrated that their pipeline is robust to those variations.


That proves the model is learning the actual morphology of the cancer,


not just the specific color profile of a single scanner.
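A simple way to picture that cross-scanner check: run the same frozen model over held-out tiles from each scanner and compare sensitivity. The loader names below are hypothetical; this is just the shape of the comparison, not the authors' actual validation code.

```python
import torch

def tile_recall(model, loader):
    """Recall = TP / (TP + FN), with label 1 meaning 'contains metastasis'."""
    tp = fn = 0
    model.eval()
    with torch.no_grad():
        for tiles, labels in loader:
            preds = model(tiles).argmax(dim=1)
            tp += ((preds == 1) & (labels == 1)).sum().item()
            fn += ((preds == 0) & (labels == 1)).sum().item()
    return tp / max(tp + fn, 1)

# Hypothetical per-scanner validation loaders; a large gap between these two
# numbers would be the "domain shift" signature described above.
# print(tile_recall(scout, hamamatsu_loader), tile_recall(scout, leica_loader))
```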


Yes. So, we have the pipeline, we have the validation set. Let's look at the actual metrics. And there is one number in the results section that jumps right off the page. The recall for the classification model, the scout, was 1.0.


1.0. That's 100%.


Okay. As a skeptic, when I see 100% in a medical study, my alarm bells start ringing. It usually implies overfitting or maybe a simplified validation set.


I had the exact same reaction. A perfect recall is statistically suspicious. It means the model did not miss a single metastasis in the validation cohort. Not one.


Wow.


However, when you look at the breakdown, you see where the trade-off happened.


Uh the specificity,


Right. The specificity was 0.935.


So that means roughly 6.5% of normal, healthy lymph nodes were flagged as suspicious by the scout.


Precisely. And in a high throughput lab, some people might argue that's a nuisance. You know, why am I looking at this? It's just a reactive node.


But step back and look at the alternative. Would you rather review six false alarms or miss the one slide that actually has a metastasis?


In oncology, we always bias towards sensitivity. We can tolerate false positives, but we absolutely cannot tolerate false negatives.


That is the golden rule. The AI is designed to be a high sensitivity safety net. It is essentially saying, "I'm not entirely sure about this human, so you better take a look."


And when we look at the segmentation performance, the artist, the Dice coefficient was 0.818,


which is very solid.


Just a quick note for our trailblazers: a Dice score basically measures the overlap between the AI's drawing and the expert pathologist's drawing. Anything over 0.8 is generally considered good for this type of clinical task.
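For listeners who want the formula behind that number, here is a minimal sketch of the Dice coefficient on two binary masks. The tensors are placeholders; only the 0.818 figure comes from the paper.

```python
import torch

def dice(pred_mask: torch.Tensor, truth_mask: torch.Tensor, eps: float = 1e-7) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks of equal shape."""
    pred, truth = pred_mask.bool(), truth_mask.bool()
    intersection = (pred & truth).sum().item()
    return (2 * intersection + eps) / (pred.sum().item() + truth.sum().item() + eps)

# Identical masks score 1.0 and disjoint masks score ~0; the 0.818 reported in
# the paper means the AI outline and the expert outline largely coincide.
```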


It's a strong result. And visually, if you look at the figures in the paper, the segmentation is really intelligent. For example, it excluded lakes of mucin,


which is notoriously difficult for AI. Mucin is so unstructured, it can just look like debris or an artifact.


Exactly. And it also ignored necrotic detritus, dead tissue. Simple thresholding algorithms often get confused by the texture of necrosis.


But this model successfully ignored the garbage.


Yes. And only outlined the viable tumor cells. That really speaks to the quality of the DeepLabV3+ architecture,


But, and this is a huge but, technical validation is not clinical validation. We've seen plenty of algorithms that get amazing Dice scores but fail miserably when you put them in an actual workflow.


That is the classic implementation gap.


So Fayzullin's team tried to bridge that gap. They integrated this pipeline into the Axon medical information system and ran a pilot clinical study with two expert pathologists.


Right?


They timed them reviewing cases with and without the AI assistance, those semi-transparent masks generated by the artist.


And this is where the narrative really twists. If you read the marketing brochures for digital pathology AI, they all promise one thing, return on investment through time savings,


right? Review cases 30% faster.


Exactly. So they timed the experts and the result there was no statistically significant difference in time.


Let's actually look at the numbers because they are almost comical in their insignificance. Expert one saved about 1 second per case going from 15 seconds to 14.


1 second.


And expert 2 saved roughly 3 seconds. 25 seconds down to 22.


And when they looked at the hard cases, the ones with very small metastases, the savings were around 2 and 1/2 seconds. You cannot run a business case on 2 and 1/2 seconds.


If I am a hospital administrator looking at this data, I'm thinking, why am I paying for this software? It doesn't speed up the queue. It adds complexity, but doesn't improve my throughput.


If you look at it purely through the lens of efficiency, the study was a failure.


Right?


But that interpretation misses the single most important finding of the entire paper. The finding that actually justifies the entire system.


You're talking about Specimen 8.


Specimen 8. We really need to spend some time here because this is the absolute core of their argument.


Set the scene for us. What exactly was specimen 8?


It was a lymph node slide. To the human eye, even an expert eye, it appeared to be a straightforward case of sinus histiocytosis,


which we touched on earlier,


Right. It's a benign condition where the sinuses of the lymph node are expanded by histiocytes. It looks busy. It looks highly cellular, but it is fundamentally benign.


So, the pathologist scans it at 10x or 20x, sees that familiar pattern of histiocytosis, and their brain just says, "Okay, this is negative." Moving on to the next slide.


Exactly. That's standard pattern recognition. But the AI, the scout, and the artist working together flagged a specific region, a tiny tiny region.


How tiny are we talking?


0.14 mm by 0.06 mm.


That is effectively a speck of dust.


It is microscopic and crucially the morphology was incredibly deceptive. These cells had weak glandular features.


Meaning


meaning they didn't form those nice, round, distinct glands that scream adenocarcinoma to a pathologist. They were poorly differentiated, clustering in a way that just mimicked the surrounding histiocytes.


So you have a visual mimic hiding inside a naturally busy benign pattern that is smaller than a fraction of a millimeter.


It is the perfect storm for a false negative. But the AI flagged it. The mask drew a semi-transparent circle right around it.


Wow.


And the pathologist zoomed in to high power, probably 40x or higher, and realized, my god, that actually is a metastasis.


And they admitted in the discussion of the paper that without the AI's mask guiding their eye to that exact coordinate, they likely would have missed it completely.


They explicitly stated these are easy to miss. And that moment right there is exactly why the time savings graph was flat.


Oh, because verifying that unexpected finding took time.


Exactly. When the AI flagged that tiny spot, the pathologist didn't just glance at it and move on. They had to stop. They had to zoom all the way in. They had to switch objectives.


They had to really think about it.


Yes. They had to cognitively process. Is this hysteocytosis or is this cancer? That evaluation takes 30 seconds, maybe a full minute.


So the AI actually slowed them down on that slide.


Yes, it slowed them down to prevent a catastrophe. The efficiency was technically negative, but the quality of the read was infinite. You just cannot put a price on preventing a stage 3 cancer from being misclassified as stage 1 or 2.


This completely reframes the whole ROI conversation in our field. We shouldn't be selling AI to administrators as a turbo button for the lab.


No,


we should be selling it as an automated quality assurance officer.


That is the real trailblazer perspective here. The value isn't in speed, it's in the safety net. And the pathologists in the study noted that the user experience of those semi-transparent masks was very unobtrusive.


It didn't get in the way of their traditional read.


Exactly. It just acted as a guide. It simply said, "Hey, look here."


It effectively acts as a second pair of eyes that never gets tired, never gets distracted by a phone call, and doesn't suffer from change blindness after looking at 50 pink and purple slides in a row.


That is the key difference. Humans are incredible at high level pattern recognition, but we are terrible at sustained vigilance over highly repetitive tasks.


Right?


AI is the exact opposite. It has zero true intuition, but its vigilance is absolute.


Now, looking toward the future, the authors do touch on a pretty controversial idea in their discussion section. We established earlier that the scout had 100% recall. It caught every single cancer in the validation set.


Yes, it did.


So, logically, if the AI is genuinely perfect at finding cancer, could we eventually trust it to filter out the negatives entirely?


This is the holy grail of automated screening. Just imagine a workflow where the AI pre-screens the 12 lymph nodes from a patient.


Okay,


it determines that 10 of them are unequivocally negative. It archives them. The human pathologist never even sees those 10 slides.


That would reduce the workload by what 80%.


Easily, maybe more. In colorectal cancer, the vast majority of excised nodes are actually negative. If the pathologist only has to review the two suspicious nodes and the 10 negative ones are auto-verified, you have just increased that lab's capacity by an order of magnitude.
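Purely as a thought experiment, and not something the paper deploys, that pre-screening workflow could look like the sketch below: pick the triage threshold as the lowest score any truly positive validation slide received, so validation recall stays at 1.0, then auto-archive anything below it. Every name and number here is hypothetical.

```python
def pick_triage_threshold(val_slide_scores, val_slide_labels):
    """val_slide_scores: max suspicious-tile probability per validation slide.
    Return the largest threshold that still catches every positive slide."""
    positive_scores = [s for s, y in zip(val_slide_scores, val_slide_labels) if y == 1]
    return min(positive_scores)  # anything below this would have been a missed cancer

def triage(slide_scores, threshold):
    """Split incoming slides into auto-archived negatives and human-review cases."""
    archive = {s for s, p in slide_scores.items() if p < threshold}
    review = {s for s, p in slide_scores.items() if p >= threshold}
    return archive, review
```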


But the legal and ethical implications of that are just staggering to me.


Oh, absolutely.


If the AI does miss one, say that recall drops to 99.9% in the real world and a patient is harmed because of understaging, who is liable?


That's the billion-dollar question.


Is it the pathologist who never even looked at the slide? The software vendor? The hospital administration?


That is exactly why the authors are very cautious in their language. They suggest that a model with 100% sensitivity could theoretically assume this role, but we are not there yet.


We need more data.


We need massive multicentric clinical trials to prove that this 100% recall actually holds up across thousands of diverse patients, not just the 108 slides from one hospital.


Still, it represents a massive shift. It moves the role of the pathologist from being a searcher to being a verifier,


which is a much better use of their specialized training. I mean, medical school teaches you to diagnose complex pathology, not to play a high-stakes game of Where's Waldo on a gigapixel image, right?


Let the machine do the exhausting searching and let the human do the clinical reasoning.


There's also an interesting note on the user experience regarding the heat maps versus the masks.


Yes. So, the classification stage, the scout produces a heat map, which is essentially a probability cloud. But the segmentation stage, the artist produces a hard mask, a defined line.


And the pathologist found the combination of both to be essential.


They did because the mask tells you exactly what the object is, but the heat map tells you how confident the machine actually is about that object.


It provides nuance. A faint heat map hovering over a mask suggests, "Hey, I'm guessing here."


While a bright red one screams, "Look at this immediately."
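The viewer behavior the pathologists describe can be approximated in a few lines of matplotlib: the tissue tile underneath, the scout's probability heat map as a translucent layer, and the artist's hard mask as a crisp outline. The colormap and alpha choices are illustrative assumptions, not taken from the Axon system.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_overlay(tile_rgb: np.ndarray, heatmap: np.ndarray, mask: np.ndarray):
    """tile_rgb: HxWx3 image; heatmap: HxW probabilities in [0, 1]; mask: HxW binary."""
    fig, ax = plt.subplots()
    ax.imshow(tile_rgb)                             # the tissue itself
    ax.imshow(heatmap, cmap="jet", alpha=0.35)      # "how confident is the machine?"
    ax.contour(mask, levels=[0.5], colors="white")  # "this exact object"
    ax.set_axis_off()
    plt.show()
```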


And that transparency is so crucial for building trust. If the AI is just a black box that spits out the word cancer, a doctor will be naturally skeptical.


Naturally.


But if it shows its work, if it says I see these specific patterns here and here it becomes a collaborator rather than a dictator.


So let's bring this all back to our listeners, our trailblazers. We've reviewed the tech pipeline, the metrics, and the really critical story of Specimen 8.


Mhm.


What is the main takeaway for the lab director or the practicing pathologist tuning in today?


I think the main takeaway is that you need to completely change the questions you ask during vendor evaluations.


So instead of asking how much faster will this make my team,


ask them, "How does your model handle mimics like sinus histiocytosis?"


right?


Ask, "What is your true recall on micro metastases smaller than 02 millimeters?" And most importantly, ask, "Has this exact software been validated on a specific brand of scanners we have in our lab?"


Because if it was trained purely on a Hamamatsu and your lab runs on Philips hardware, you might as well be flipping a coin.


Exactly. The Fayzullin paper proves that hardware robustness is possible, but it has to be engineered intentionally from day one.


And I think there is a broader lesson here about how we define value. We are so conditioned by modern management to measure value in minutes and seconds saved.


We are


but in medicine real value is measured in patient outcomes.


If that pilot study had been a live clinical scenario, that AI effectively saved a life. Not by being fast, but by being relentlessly thorough. That is the real ROI.


That is a very powerful place to land. And it brings me to a final thought I want to leave our listeners with today. Something to mull over on your drive home.


Let's hear it.


If an AI doesn't save you time, but guarantees you never miss a 0.14 mm metastasis...


Uh,


it makes you wonder about the past.


What do you mean?


Well, historically, humans have been doing this staging manually for decades. If AI is catching these microscopic stage 3s that humans naturally miss, does that mean our historical survival data for stage 1 or stage 2 colorectal cancer is actually polluted by missed stage 3 cases?


Wow.


Will the adoption of AI force us to rewrite our historical staging guidelines entirely, just because our baseline accuracy is about to shift so dramatically? Is peace of mind a metric we need to start quantifying, not just for the pathologist but for the integrity of our medical data itself?


I would argue it's the only metric that truly matters going forward. That's a huge paradigm shift to think about.


Definitely. Well, a huge thank you to Alexey Fayzullin, Nikita Savelov, and their team for this incredibly rigorous contribution to Cancer Medicine. We highly recommend you look up the full paper; the images of the segmentation masks, specifically on those artifact-heavy slides, are honestly worth a look alone.


Absolutely. Seeing the artist model successfully navigate around the necrotic detritus is quite satisfying from a technical standpoint.


Thank you for joining us on this deep dive into our source material today. Keep questioning the metrics, keep demanding better validation, and keep blazing those trails. We will see you next time on the Digital Pathology Podcast.


Goodbye everyone.