Digital Pathology Podcast

167: Why Accuracy Matters in Digital Pathology Podcast with Keith Wharton, Jr.

Aleksandra Zuraw, DVM, PhD Episode 167


Why do some pathologists still hesitate to trust digital slides—even after the FDA says “yes”? Because accuracy in digital pathology isn’t just about pixels—it’s about precision, validation, and confidence.

In this episode, I talk with Dr. Keith Wharton, MD, PhD, Global Medical Director at Roche Diagnostics, about how the Roche Digital Pathology DX system earned its FDA clearance for primary diagnosis—and what that means for the field.

We explore the science and strategy behind whole slide imaging (WSI) validation, the challenges of feature recognition, the meaning of non-inferiority, and the future of interoperability and AI in diagnostic systems.

If you’ve ever wondered what it takes to make a digital system clinically equivalent to the microscope—this episode is your roadmap.


🔹 Highlights with Timestamps

  • [00:00–02:30] What “primary diagnosis” really means—and how the human brain processes histopathology features.
  • [02:30–06:00] How Roche achieved FDA clearance for its DP200 and DP600 scanners—and why it’s “clearance,” not “approval.”
  • [06:00–11:00] Breaking down the Roche Digital Pathology DX System components: scanner, viewer (Navify DP), and monitor.
  • [11:00–16:00] Understanding the pixel pathway—the heart of system validation.
  • [16:00–19:00] How the FDA defines precision and accuracy in validation studies.
  • [19:00–30:00] Inside the massive multi-year validation studies: design, washout periods, and thousands of slide reads.
  • [30:00–33:00] The non-inferiority margin (−4%)—why it matters and how Roche exceeded the benchmark.
  • [39:00–45:00] The surprising “nuclear groove” discovery and what it reveals about how pathologists adapt to digital.
  • [1:10:00–1:13:00] Future-ready systems and FDA flexibility through predetermined change control plans (PCCP).
  • [1:25:00–1:35:00] Keith’s reflection: bridging the gap between discovery and clinical impact, and why the future of digital pathology is brighter than ever.



Key Takeaways

✅ FDA clearance requires rigorous demonstration of precision, accuracy, and statistical confidence.
✅ Non-inferiority margins (typically −4%) define the threshold for clinical equivalence to microscopy.
✅ Feature recognition in digital environments (like nuclear grooves) challenges perception and training.
✅ Interoperability and predetermined change control plans (PCCP) may accelerate system evolution.
✅ Digital pathology’s foundation is the pixel pathway—where scanner, viewer, and monitor all align.
✅ The field’s future depends on bridging discovery and practice, guided by robust validation.


Get the "Digital Pathology 101" FREE E-book and join us!

Why Accuracy Matters in Digital Pathology | Podcast with Keith Wharton Jr.



Keith: [00:00:00] Primary diagnosis is when a pathologist looks at a slide and, through really kind of a miraculous process, the visual information goes in, something happens in the brain, and then, you know, either instantaneously or a few minutes later, or maybe after some tests, some words come out. The words get into the report, and that dictates, like, everything to follow with the patient.


So what's in the black box of the brain that's going on? So one is something called features. Features are what you learn in, you know, histology class. You know, here's a nucleus, here's a cytoplasm, here's a fat cell, here's a, you know, here's an infectious agent.


The second one is context. So, you can probably think of some examples where if I show you some features with this one history, but then I show you the same features with a different history, your primary diagnosis might be different, right? So, it turns out the context can be really, really important. And then the third one is language. So think about, you know, how important it is that we all have shared meanings of a set of words that come out [00:01:00] at the end, right? Invasive versus in situ, pleomorphic versus anaplastic. These register automatically, whether you like it or not: I say these words, and the part of your brain that makes pictures is already flashing pictures in your head. We don't really understand what's going on; all we have is really the diagnosis to compare.


Aleks: Welcome my digital pathology trailblazers. Today, our podcast guest is Dr. Keith Wharton from Roche Diagnostic Solutions, and our topic is Digital Pathology Validation Studies to Get a Whole Slide Imaging System Cleared by the FDA. Roche received clearances for their whole slide imaging system that includes two scanners: the DP200, and that was in the summer of 2024, and the DP600, and that was at the end of 2024. And we will be talking about what actually [00:02:00] needs to be done, and why, to achieve this milestone, and how this compares throughout the industry, not just what Roche did, because there is a paper in the making that we're going to be talking about. So good to have you on the podcast, Keith. Quick background of how we know each other: we actually worked together previously and we co-authored a paper. The title of our paper was Tissue Multiplex Analyte Detection in Anatomic Pathology: Pathways to Clinical Implementation. So we brainstormed and geeked out on how multiplexing can be used in the clinic.


But we always start with the guest first. So, Keith, let us know a little bit about yourself, your background, and what your role is now at Roche Diagnostics.


Keith: Thanks, Aleks. It's so great to be here, and I just want to say thank you for all the advocacy of digital pathology that you have accomplished in the last few years, since actually we used to work [00:03:00] together. So that paper was published in a Frontiers journal in 2021, and it explores…


Aleks: That was when I incorporated the podcast and blog, Keith.


Keith: Yeah, I figured you were just, yeah, thinking about it. I think my motivation for that paper, if you remember, was just this widening gap between what's technically possible, all of the discoveries we're making with digital pathology and spatial multiomic biology, we'll talk about that a little later, and what fraction of that is, you know, going to impact patients in the near future. I think it's a widening gap, and that's one of the reasons, you know, I'm very passionate about what we do at Roche. I think today's topic is very relevant, and for that reason we're going to touch on some topics like the pixel pathway. That paper's not obsolete yet, even though it's four years old.


So, my background: I started out as a chemical engineer, became a medical doctor, and then trained to be a pathologist. But I did the whole research spectrum from stem to stern [00:04:00] as well: I did a PhD in molecular biology, a postdoc. I was on the faculty, and then I was actually associate dean for medical education for a little while before entering industry in 2009, so that was 16 years ago. And I've been really lucky to spend about half the time in pharma and half the time in kind of the diagnostics life sciences tool space.


And so I've been with Roche Diagnostics since 2022.


So it was after we worked together. I'm in a global medical affairs role, and, I mean, you could guess what I'm covering, right?


I'm covering digital pathology and multiplexing. So, you know, if the shoe fits, wear it, and a handful of other things. Roche is a very rich company from a scientific and product development standpoint. So I'm really thrilled to be here at this point. And I will say I've been really lucky, from the standpoint of background, as a pathologist in industry, to actually do pathology the whole time. You know, a lot of MDs go into what I'd call clinical development, medical affairs, or drug safety. [00:05:00] These are all great things to do and they're really important, but none of them are actually pathology when you get right down to it, and you know this as a veterinary pathologist.


And that's actually in contrast to DVMs who go into industry, who mostly do toxicologic pathology.


Aleks: Yeah, we actually do one or another pathology in the industry. Yes.


Keith: I was going to say, you guys have a career path. I found, like, MDs really don't have this career path. So I had to…


Aleks: That is true. I never looked at it like that.


Keith: Kind of piece it together. Yeah, I had to piece it together, jump around in the pharma roles I was in. I was the only MD in departments of all DVMs. So from the standpoint of an experimentalist, I loved it.


So I was like, I call myself the token MD in that group. Because, as you know, a lot of those folks do combined safety and efficacy studies. So it's actually really important to know what you're looking at, all about the target and modality, and to try to interpret the lesions in context of, you know, potential safety signals or efficacy. [00:06:00]


And so I guess, after having pieced that together now for the past 16 years, I'm kind of happy I've been able to survive in industry as an MD pathologist. And I'm also proud, you know, that I've been able to publish like 12 papers and a book on topics that are very relevant to what we're going to talk about. So within Roche, Roche Diagnostic Solutions, most of what I support would be what we'd call the legacy Ventana.


Ventana started as Ventana Medical Systems, actually founded in 1985, I believe, by Tom Grogan, who's a pathologist, and I don't know if you're familiar with him. He was actually one of my professors in medical school. So I had the privilege of meeting him as a medical student, I think it was 1987. And, you know, I was a young person trying to figure out what I wanted to study and do. And actually what he was doing didn't excite me, but that's okay. He was very, very encouraging, and that kind of set the tone for the type of individual [00:07:00] he is, actually, to this day within Roche Ventana. So he retired from full-time leadership, I think, in 2017, but he's still around and he's still really inspiring to people. And I guess the last thing I should mention about Tom is that right before the pandemic he wrote an autobiography that is actually a really interesting autobiography. It's called Chasing the Invisible.


Aleks: I need to review it on the podcast. 


Keith: Yeah, he would love to talk to you, actually. 


Aleks: We're going to link to this book in the description. 


Keith: So, 2019, I think, Chasing the Invisible. So part of it is his kind of odyssey through founding Ventana and all the ups and downs he went through. He's really revealing about how much, frankly, how stressful it is to take a company from, you know, twinkle in your eye to acquisition, by Roche in this case, in 2008. But the other piece of it that you don't realize is what a major role, you know, Ventana has played [00:08:00] in the advancement of precision diagnostics, really through automated staining and companion diagnostics, and now we're entering this era of digital pathology, entering the next stage.


So, um, yeah, I recommend you… 

Aleks: You need to introduce me. Maybe you can do an introduction and then… 


Keith: We can do that. We can do that. Chasing the Invisible. And of course, he called it that because when you look at an H&E, you can't actually see the stained biomarkers, but he as a hematopathologist was doing a lot of, you know, initially manual immunohistochemistry, and that was the inspiration and motivation to actually roboticize the whole…


Aleks: To automate. Oh my goodness. 


Keith: Yeah. And he wasn't the only one, but he was certainly one of the leaders. And as you know, it's one of the leading platforms in pathology labs today. Yeah.


Aleks: I need to read the book, because what happens in somebody's mind to design something like this so early, when this method was still, like, a method that not [00:09:00] everybody trusted?


Keith: Right? 


Aleks: Like now, not everybody trusts it, they want others to do it first, even though it has been validated so many times already, right?


Keith: I mean, if you've done IHC by hand, maybe in your early days, like, it works, you know, some of the time…


Aleks: Want to do it again. 


Keith: Some of the time it doesn't work, and maybe it's, you know, twice as dark on Tuesday as on Thursday, and as we know, that just doesn't work, especially for semi-quantitative, companion diagnostic type biomarkers. So that's pretty much me, and as I said, I'm thrilled to be here today with you to chat.


Aleks: Let's talk about the Roche Digital Pathology DX system. You guys recently received the clearance for primary diagnosis. Can you give us a quick rundown of what the system is all about and how it fits into the digital pathology landscape? 


Keith: Sure. Sure. I'd be happy to. So, you mentioned the actual name of the system, and this will become important in a moment. It's called Roche Digital Pathology [00:10:00] DX, or, you know, we abbreviate it RDPD sometimes. So if you hear me say RDPD during our chat today, you'll know what I mean. What…


Aleks: That's what it is. 


Keith: …talking about. So the system, and we'll go into this in a bit of detail for the listeners who aren't familiar with it. The system, according to the FDA, has three basic components: a scanner component, a software viewer component, and then a display monitor component. And actually, in, I believe, 2016, the FDA put out a guidance document that kind of told the world, at least in the United States, how it wanted to look at these systems. And as you mentioned, we had a great year last year: two 510(k)s. In the summer, we had our first clearance, which included a couple of performance studies, which I'll go into. And then we had a second one, called a special 510(k), and that is for occurrences where you're basically adding components, but you're not changing the intended [00:11:00] use, and you're not really making major changes. So, we…


Aleks: Okay.


Keith: …presented an argument to the FDA that the two scanners, the DP200 and the DP600, were identical by design, even though they were actually manufactured a few years apart, and they make the same file type and have the same so-called pixel pathway. We'll go into that in a little more detail. And at the time we got our clearance last summer, it was the fifth overall whole slide imaging system clearance with a novel scanner. I'll explain what that means in a moment. And then, again, this is a 510(k), so the proper term is clearance and not approval, just for the…


Aleks: Yeah, I think that's the most confused term when you talk about something that the FDA said yes to, and people confuse it, you know.


Keith: [00:12:00] So, it's a cleared system, and generally when something is cleared, that means it's substantially equivalent to a so-called predicate device. We'll go into that a little bit with our studies as well. So, that was the fifth system. As of today, there are actually seven whole slide imaging systems with novel scanners. There are separate clearances for viewers and monitors; we won't talk about those today. And we talk a lot about the slow adoption of digital pathology. I know this was popular in, you know, the mid-teens of this century, because there were either none or very few cleared systems, and some labs didn't really want to adopt unless the FDA had blessed it accordingly. So the first clearance was in 2017, the second one was in 2019, I was actually part of that, and so now we're up to seven, and now it's eight years after the first clearance. What I'm getting at is that we can't really use a lack of choice or lack of options as an excuse anymore. [00:13:00] There may be other excuses, but we can't use that one as an excuse anymore. And then, maybe just for understanding why it's a system: the idea here is that the comparator device for pathologists to make a diagnosis is the microscope, right?


What do you need to make that work? Well, before there was electricity, you needed a good mirror lined up against the sun. But now that we have electricity and light bulbs, light sources, you basically need an electric plug, and then you pretty much trust the pathologist, you know, to do what they do. A scanner will not serve that purpose, right? If you have a slide scanner on your desk, you put a slide in, it scans, and then, you know, you need something to read it, right? So that's the rationale for the system: a scanner, at least the way we make them today, is not useful as a comparator. So you need to add viewing software, and our viewing software was called uPath; it's now called Navify DP, [00:14:00] Navify Digital Pathology. And we had to choose a specific monitor, and it was an ASUS brand monitor. So what does the FDA clearance allow us to do? Well, it allows us to make specific claims that are really no different from all of the other whole slide imaging system claims. In fact, if you line up all the intended use statements for all of the systems, they're like identical septuplets. They're pretty much the same. Yeah. The only difference is the components and the system names.


Aleks: And I want to highlight what you are highlighting: that it's a system, because the spoken shortcut is "a cleared scanner." But there is no such thing as a cleared scanner, and we keep referencing the pixel pathway. So I'm going to just let you talk about it whenever you want to touch on this, but that basically defines the [00:15:00] system.


Keith: So you could talk about the pixel pathway within the scanner, but I think what the FDA is really referring to is the pixel pathway of the system that includes the scanner. What I will say, and we'll talk about this a little later, is that for the algorithms that have been cleared by the FDA for use, there's a very small number of them, but I remember hearing this, though I tried to look it up in a document and couldn't find it, the position has been that in order to get an algorithm cleared through the FDA, or approved if it turns out to be, it has to be done on a, you know, previously cleared or approved system. So the idea is, it's like you're building a house, right? The foundation might be the scanner, and then on top of that the walls are the viewing software, and then maybe the second floor is a viewer, and then on top of that maybe there's some software functionality that's your crow's nest or your two-car garage, or however you want to look at it. So the idea is that the system is really [00:16:00] the foundation of kind of everything you need to do going forward, including algorithm development or any technical advances.


Aleks: We recorded a demo video at USCAP. I was at the Roche booth, and you are going to be able to see it on YouTube: the Ventana DP600 scanner. But today I want to dig into your validation study. So the scanners are going to be separate, and I'm going to send everybody who is on my list an email about that as well. But I'm interested in the validation. First, at the very high level, it is logical: the FDA wants to know if digital pathology is non-inferior to light microscopy. Like you said, pathologists have been doing their job on a light microscope for a long time. And the microscope is a system in itself. It has all the components, but because we went digital, we kind of dismantled the components from this one device. But we still want to know that this is non-inferior, [00:17:00] because this has been used for pathology since its invention.


Keith: A few centuries… 


Aleks: But how do we pro… Yeah. For centuries now. How many centuries? Two centuries. Three centuries. I don't know how many.


Keith: Depends on how far back you go. Almost two. Almost two. We'll put it that way.


Aleks: Exactly. But how do you prove, for something that has been in use for so long, that the new thing is the same as the old thing, even though they look totally different? Like, at a high level, what does the FDA want to know?


Keith: Sure. Sure. And now we have seven examples that have been published in decision summaries. So I would say the best way to get a sense of how the FDA looks at these is to first look at that 2016 guidance document and then look at the summaries, and then we have a paper in press, as I mentioned, that'll go into a bit more detail. So they're looking at a couple of things. And I will say, you mentioned "validate": we're a medical device [00:18:00] manufacturer, we're not a lab per se. So the V-word typically means slightly different things to medical device manufacturers than to a clinical laboratory. I'm talking now about the former. So they want an assessment of precision and accuracy, and these seem very intuitive, and foundational for really any device, right? But because there are so many cleared systems, and they're all substantially equivalent, the study designs were fairly fixed. So you have a little bit of leeway there, but not much. And actually, ours kind of rolled out at the beginning of the pandemic, and that'll be a really interesting tie-in I'll make near the end. So, the goal is that you have these two studies, and they're fairly sizable. One measures precision, one measures accuracy, and you have to meet so-called acceptance criteria. You have to decide ahead of time what level of precision or accuracy you need to achieve. And then the goal, of course, is using [00:19:00] statistics to disprove the null hypothesis. The null hypothesis in this case would be that digital pathology is nowhere near as good as microscopy. So to disprove that, the way we do it is take a difference in accuracies, and then that difference has to be a certain amount.
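In formal terms, the test Keith is describing can be sketched as a one-sided non-inferiority hypothesis. This is a minimal formalization, assuming the −4% margin discussed later in the episode and a standard confidence-interval decision rule; the study's actual statistical model may differ. With $A_{DP}$ and $A_{MO}$ the diagnostic accuracies of the digital and microscope reads and margin $\Delta = 0.04$:

$$H_0:\; A_{DP} - A_{MO} \le -\Delta \qquad \text{vs.} \qquad H_1:\; A_{DP} - A_{MO} > -\Delta$$

Non-inferiority is concluded when the lower bound of the 95% confidence interval for $A_{DP} - A_{MO}$ lies above $-\Delta$.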


Aleks: So, at a high level, what's the difference between precision and accuracy, and why does the FDA want to see both? Because, like you say, device manufacturers have different requirements than if you would take a device and then validate it in a lab, right? There you don't have to do so much, but here you do. Why do you need those two components, and what are they showing?


Keith: So, I will say, just before I get into that, this is an example where a single person with a single idea, and let's just say they were an [00:20:00] octopus and had lots of hands, still couldn't do it by themselves. As a pathologist, I have some idea of what would convince me, but I'm not what I would call a card-carrying statistician. So what I will say is that in the design of these studies, and of course in analysis of data for any health authority, statisticians need to be involved, so that you have proper study design and it's powered adequately enough to rule out any confounding, irrelevant, or biased explanation. So that is one reason these studies are pretty juicy. I'll get into that in a moment. So I would say the philosophical question of why is, well, let's just say, why don't I go into what we're actually…


Aleks: I love that you call it a philosophical question. 


Keith: Well, some of it's history, and because you're dealing with substantial equivalence, you can't rewrite the book, right? You've got the predicate, [00:21:00] you have to show you're equivalent, and that's, I want to say, pretty locked in. But let's step back and ask, what is primary diagnosis? Primary diagnosis is when a pathologist looks at a slide and, through really kind of a miraculous process, the visual information goes in, something happens in the brain, and then, you know, either instantaneously or a few minutes later, or maybe after some tests, some words come out. The words get into the report, and that dictates everything to follow with the patient. So what's in the black box of the brain that's going on? The way I think of it is that there are kind of three components of primary diagnosis. One is something called features. Features are what you learn in histopath, you know, histology class. Here's a nucleus, here's a cytoplasm, here's a fat cell, here's, you know, an infectious agent. The second one is context. So you can probably think of some examples where if I show you [00:22:00] some features with this one history, but then I show you the same features with a different history, your primary diagnosis might be different, right? So it turns out the context can be really, really important. We'll go into that in a little bit. And then the third one is language. So think about, you know, how important the fact is that we all have shared meanings of a set of words that come out at the end, right?


Invasive versus in situ, pleomorphic versus anaplastic. These register automatically, whether you like it or not: I say these words, and the part of your brain that makes pictures is already flashing pictures in your head. We don't really understand what's going on; all we have is really, you know, the diagnosis to compare. All right. And then, of course, my favorite study around making diagnoses is that it's not specific to humans, right? Remember Richard Levenson's famous paper where he trained pigeons [00:23:00] to read breast cancer, benign versus malignant?


Aleks: I know… 


Keith: With a little bit of, you know, caloric incentive. Nevertheless, like, you know, the pigeon version of Keith could probably do a pretty good job doing what Keith is trying to do. So that's an amazing insight. It's a very evolutionarily deep process. It probably relates to visual cues that help you find food and survive and mate, and stuff like that. All right. So, the precision study then looks at 23 pre-selected features that, again, were already baked into the prior predicate device's study. So we didn't really have a choice in those.


You would recognize all of them, right? There are things like, you know, adipocytes, skeletal muscle, nerves.


There are some abnormal things as well. There's no comparator group, though. But you do have a multi-reader, multi-site, multi-scanner system. So in the end, for that study, we had almost 5,000 instances where a pathologist looks [00:24:00] at a region of interest and is supposed to identify a feature. So 5,000. That's, you know, four digits. That usually makes the statisticians quite happy, to have an n that high.


Aleks: Yes.


Keith: Yeah. Right. 


Aleks: I remember from my PhD where we had like three mice per group, one died, and the statisticians were like, "No way."


Keith: Shhh…Don't tell anybody. 


Aleks: The higher the number, the better.


Keith: But really, there's no reproducibility crisis in science. Really? No, just kidding. All right. So then the other study, the accuracy study, that's like the granddaddy study: that was, in our case, four sites, four pathologists per site, over 2,000 cases. And the basic design is that the pathologist looks at a case with microscopy and then, after a washout, they look at the same case digitally, or vice versa. They're randomized, and they're put in smaller groups, and they're mixed up, and so on. So if you add [00:25:00] all of that up, I think it's 15,000-ish diagnoses that were made. So 15,000 instances where a reader looks at, you know, a case and says, yes, it's basal cell carcinoma, or it's a hernia, or whatever.


And then, that wasn't enough, actually. The endpoint was the disagreement rate. It's the disagreements, the clinical diagnostic disagreements, that make a difference in patient care. And so, in our case, we had 20 adjudicators. We trained them to judge: is this the same as that, or is it a minor disagreement, in which case it's counted as an agreement, or is it a major disagreement, because it has a change in clinical diagnosis. So when you add that all up, there were over 40,000 adjudications. So 40,000 instances where a pair of adjudicators actually look at two diagnoses, blinded, and then they say [00:26:00] these two are the same, or there's a minor disagreement, which means there's no change in clinical management, or there's a major disagreement, which means there's a change in clinical management. So both of these types of studies are multi-year; you know, they take thousands of hours of time on our side, hundreds to thousands of hours of reader time. These are massive studies. And now, again, as I said, there are seven cleared systems. So they've all been through this.


Aleks: So how much time did it take for you to run the study? Because you say you had four pathologists per institution. That's like coordinating 16 pathologists, and coordinating one is a challenge.


Keith: Well, right. And there were site leads. There were medical directors at each site that we worked with; they're co-authors on the paper. There are histotechnologists and scanner technologists at each site that are trained on the system. The systems, because they're investigational use only, are not [00:27:00] allowed to be, you know, tacked onto other side projects. They have to be dedicated to this purpose, because they're investigational. So the site has to be on board, the pathologists have to be on board, and they have to be able to dedicate a certain amount of time over weeks to months to get through the workload. In addition to which, of course, we all know they have lots of free time. So it's really not a problem…


Aleks: Like everyone, right? It's a side project, not… 


Keith: Yeah. Exactly. Exactly. So, you know, when you add it all up, it's thousands of hours. And it's years from the time you think of it to the time you get your protocol written down, your IRBs done, and the machines on site, and then I think for us it was a couple of years until, after we got the results, we got the clearance.


Aleks: So let's talk about the paper, your team's paper, which is in press in the [00:28:00] American Journal of Clinical Pathology. It mentions specific acceptance criteria for precision and accuracy. Are the numbers that you mentioned mandated by the FDA? Can you walk us through what numbers you need to hit to get FDA clearance, and how Roche's system performed against those benchmarks? I'm especially interested in the non-inferiority margin, because you mentioned they are basically quantifying how people disagree. So how much can they disagree? There is a non-inferiority margin of minus 4% that seems to be consistent across different whole slide imaging systems throughout the industry. Where does it come from? Why are we so consistent? Like, what are the numbers that you need to hit, in addition to how many [00:29:00] cases you need to do for the statisticians?


Keith: Right. So I'll say, thank you, and for both studies those cutoffs, the study sizes, and to some extent the power calculations are pretty baked in; you don't have much flexibility. And the goal there, I'll explain this when I explain the acceptance, is for there to be no doubt, no statistical doubt, that your data is telling you what you think it is; in other words, that you're not achieving acceptance kind of by accident. So the studies are sized so that's, like, impossible, if you talk to a statistician. Okay, so for the precision study, remember I said there are almost 5,000 instances where a pathologist, you know, looks at a screen, looks at a list of features, and says check, check, or not. We landed around 90% for all of the metrics, and that included intra-site, [00:30:00] inter-day, and inter-reader. The acceptance was 85%, and again, it's the lower bounds of the confidence intervals, not just the point estimates. So the point estimates were all around 90%; when you added in the confidence intervals, the lower bounds subtracted about one and a half points. So we were still above the acceptance, which was 85%. And 85% is, you know, to me, a common number you encounter with regulators, where I think, let's just say the acceptance was 70%: then they would have to go to the public and say, well, you know, three times in 10 this test doesn't work, but seven times in 10 it does, and that doesn't look very good for them, or, you know, really for anybody. So 85% is a compromise, I think, between what looks good and what's kind of achievable within the study domain. So we passed that. We were thrilled [00:31:00] to hear about that. All right.
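To make that lower-bound check concrete, here is a minimal sketch with hypothetical numbers only: a ~90% observed agreement rate over an assumed 1,600 reads for a single metric (not the study's actual counts), using the Wilson score interval, one standard way to get a confidence lower bound on a proportion:

```python
import math

def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the two-sided 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half_width

acceptance = 0.85                                  # acceptance criterion
lb = wilson_lower_bound(successes=1440, n=1600)    # assumed 90% over 1,600 reads
print(f"point estimate: {1440 / 1600:.1%}, CI lower bound: {lb:.1%}")
print("meets acceptance" if lb > acceptance else "fails acceptance")
```

With those assumed numbers, the lower bound lands about a point and a half below the 90% point estimate, the kind of subtraction Keith describes, and still clears the 85% bar.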


So then for the accuracy study, the metric was the difference in accuracy between the modalities. As I said, they read the cases manually, then after a washout digitally, or they read them digitally and after a washout manually. So every pathologist reads the case twice. And so if they do have recall bias, it's sort of spread across all cases, and the washout, you know, kind of prevents them from… Yeah…


Aleks: I did a similar exercise for a different validation project, and I was afraid that I was going to very much remember everything I saw. I was surprised how much I had forgotten, because when we were discussing the study, I was like, what do you mean it's going to be skewed by recall bias? And they gave us enough time, what's the time called, the washout period. The washout period was long enough that I actually didn't remember, or, like, in some instances I had no idea. Right.


Keith: That's the goal, right? [00:32:00] There are some pathologists who, you know, self-proclaim that they have memories like elephants and they never forget. And that's probably true, right? There are some. Yeah.


Yeah. So anyway, that is what it is. And again, your results aren't driven by one reader, because there are 16. So if you do have one elephant in the room who's remembering everything, they're only one sixteenth of the data. All right. So let's just say the manual accuracy was 99% and the digital accuracy was 97%. The metric was DR minus MR, which is minus 2 in that case. Okay. So if the acceptance is minus 4, you had to be greater than minus 4, and minus 2 is greater than minus 4. And actually, I just gave you the point estimates; if you include the confidence intervals, for studies of this size, [00:33:00] it drops it down about a percent. So where did we land? We landed at minus 0.6%, with confidence intervals that overlap zero, and the lower bound of that confidence interval was about one point lower, so minus 1.6%, which is still, I'll call it, miles away from minus 4. So we were quite happy there that we passed. And an interesting thing, I think, now with all of these studies: again, there have been seven clearances, so there have been seven studies of this kind of scale. In six out of seven of those, the difference in accuracy overlaps with zero, and actually all seven of seven were greater than negative 4%.
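As a back-of-the-envelope illustration of that check, here is a minimal sketch with made-up accuracies chosen to land near the numbers above; it uses a simple unpaired Wald interval, whereas the actual analysis of a paired, multi-reader study is more involved:

```python
import math

def noninferiority_check(acc_digital: float, acc_manual: float,
                         n_digital: int, n_manual: int,
                         margin: float = -0.04, z: float = 1.96):
    """DR minus MR with a Wald 95% CI; non-inferior if the CI's lower
    bound stays above the margin. Ignores pairing, so illustrative only."""
    diff = acc_digital - acc_manual
    se = math.sqrt(acc_digital * (1 - acc_digital) / n_digital
                   + acc_manual * (1 - acc_manual) / n_manual)
    lower = diff - z * se
    return diff, lower, lower > margin

# Hypothetical inputs: ~97.0% digital vs ~97.6% manual over ~2,000 reads each.
diff, lower, ok = noninferiority_check(0.970, 0.976, 2000, 2000)
print(f"DR - MR = {diff:+.1%}, CI lower bound = {lower:+.1%}, "
      f"{'non-inferior' if ok else 'inconclusive'} at a -4% margin")
```

With those invented inputs, the point estimate lands at about −0.6% and the lower bound near −1.6%, the shape of the result Keith describes: well inside the −4% margin.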


So that, to me, in addition to each system passing, tells me that DP is passing, right? I think we'll go into that.


Aleks: Yeah, that is my impression. It's basically, that's what it is. [00:34:00]


Keith: Yeah. Yeah. So that's good news for DP. It's good news for the systems, obviously, and it's a vote of confidence, I think, for the non-inferiority of the two.


Aleks: Do you think that, because of that, there's not going to be, like, a relaxation of anything, because every device is a new device and that's just the norm and people will expect that? I was just hoping, like, oh, maybe then people don't have to do 5,000 slides, or whatever, 2,000 cases, because it actually is good enough.


Keith: There... You know, there have been a few clearances based on viewers that had much smaller cohort sizes. The risk with a smaller cohort, of course, is that, you know, with the confidence intervals, you won't pass, right?


Aleks: That is true.


Keith: So, and we'll go into the factors that contribute. As I said, our study, all the studies, have a matched digital and manual group, [00:35:00] and so the endpoint is the difference between the two. What that means is that whatever environment is contributing to your modality accuracies, you know, should be contributing equally to both of them.


And what you're really looking for is a signal that shows that one is actually inferior. And so the fact that they overlap with zero says you can't find any, which is good. Again, it's a vote for each of the systems. It's a vote for digital pathology in general. Yeah, absolutely.


Aleks: So, were there any particular tissue types or any diagnoses that were really good on digital and not so good on the microscope, or the other way around: very concordant between pathologists on the microscope but not so great on digital? Was there anything specific there?


Keith: So there are signals, but I'll say there's really nothing specific, because [00:36:00] none of those subgroups, those organs, are powered appropriately to confidently state that, for example, you know, you can't read liver biopsies with DP. None of them had enough sample size to show that even if there was a big difference. So we controlled for that two ways, and this is similar to how the other groups looked at it. First of all, you're not allowed to frontload the cohorts with a bunch of easy cases, a bunch of BCCs and hernia sacs, right? You actually have a predetermined list of organs and diagnoses, and the proportions thereof, that each site needs to pull out of their archive and enroll.


And they do it pretty much consecutively, but they also have targets for specific organs, and we had to meet, you know, most of those. So there were 20 organ groups, and when we do subgroup analyses of the 2,000 cases across those 20 groups, the difference in accuracy [00:37:00] for every one of those groups was within plus or minus 4%. So, again, not powered appropriately, but in other words, there were no organs present that were really driving a good or bad accuracy with one or the other modality.


So the differences were, you know, within a range over all 20 organs. All right. So then we subdivided those 20 into 74 organ-plus-diagnosis or procedure types, and these are all predetermined as well, based on the predicate device. And there was a little bit more of a spread there: they ranged from like minus 8 to plus 8%. So they weren't all squeezed into plus or minus 4, but it was really interesting looking at the ones that landed at the margin. So it turns out two of the ones that were the most accurate digitally, believe it or not, not manually…


And again, underpowered. We don't know, you know, whether it's real. Bladder [00:38:00] neoplasia. 


So why is that? And I got start to thinking again, this is a hypothesis. Well, if you you looked at bladder biopsies, like they look like 64 continents on a on a projection map, right? There's tissue everywhere. They're really challenging to look at by microscopy. And maybe digital makes them easier, right? So maybe you can actually be more accurate. Again, underpowered, you know, all the disclaimers, but I thought that was really interesting. Two out of the 74 were bladder cancer categories, and that was more accurateuh digitally. Now at the other end more accurate with microscopy were two of the breast incitu carcinoma categories both resections and biopsies. 


And when I say more accurate with microscopy, again, not by much, like 8%. So if the entire study were 2,000 cases like that, I suppose it would have failed, but we don't know, [00:39:00] because the study was underpowered for that sample type. But what that sample type tells you is that it's a type that's intrinsically challenging with microscopy, right? You know, atypical ductal hyperplasia versus carcinoma in situ versus microinvasion. So when we actually went back and looked at all the diagnoses, and I'll get to that in a moment, we saw a significant number where they were both similar but not exactly alike, and the adjudicators called it one way and sometimes the other. But again, breast in general, this dysplasia-carcinoma spectrum, is intrinsically challenging with microscopy. So it's not a surprise that it's challenging with digital as well. Let's see. So then, when we looked at all these subgroups and chopped up the data several different ways, [00:40:00] we didn't really find any system-specific or DP-specific pitfalls. In other words, the statistics told us they were non-inferior, and when we looked at subgroups, we didn't see a real consistent trend with any of the non-statistically-significant subgroups, you know, beyond those that I mentioned, and most of the rest were within this plus or minus 4% range when you looked at the difference in accuracy.


Aleks: Okay, so these are organs or types of diagnoses, but in the paper you mention specific microscopic features, like nuclear grooves, that were sometimes challenging to identify digitally, and we're going to explain what these are, especially in the context of Langerhans cell histiocytosis. But then you had perfect identification of the same feature in other contexts. So, [00:41:00] you mentioned at the beginning that there is different context to the same feature, right? There was one context of a feature where people didn't recognize it digitally, but then there was the same feature in a different context, like thyroid cancer, where they did recognize it. So what does this tell us about how pathologists recognize features digitally versus on glass, and does this in any way inform the training of pathologists when they transition to digital?


Keith: Yeah, those are really good questions. So, remember, the precision study did not compare to microscopy. That would have required us having a microscope set up, and an arrow saying what's the feature on a microscope, and then doing the same thing digitally. That was not done. So we can't really say anything about comparison to microscopy with the precision study design. [00:42:00] But I think your questions raise kind of a larger question of, well, why isn't it 100%? Like, you know, can you only see nine out of 10 features with your system, what's going on? And it turns out, you know, that might be the null hypothesis answer, but actually, if you look at what they chose instead of the right answer, it tells you a lot about why they chose that answer and why they missed it. So I'll explain that quickly.


So you mentioned nuclear grooves. 


Aleks: Yeah. 


Keith: Nuclear grooves were the troublemakers. In our study, the design has three different cases with a given feature. And our curator chose two cases that are what I would call textbook: thyroid cancer, ovarian cancer. These are things where, if you're a medical student learning about these, it's like, oh, nuclear grooves, if you see those you think thyroid, you think ovarian, and there are, you know, a handful of other things you think of as well, but medical students can't memorize everything. Right? So there was [00:43:00] a case of, yeah, you mentioned it, Langerhans cell histiocytosis, which is a childhood disease. These are characterized by sort of marginally neoplastic Langerhans cells that accumulate in the body; there are different forms of it. But they have nuclear grooves, and they're immune cells.


So for some reason, in only two out of 72 instances where the pathologists looked at those LCH cases did they correctly call nuclear grooves. So only 3%. It was abysmal, and that accounted for nuclear grooves being the worst-identified feature in the entire set of 23. So that really, you know, dragged down our results. But when we looked at what they chose instead: two thirds of the 72, let's say two thirds of those or more, chose the so-called Reed-Sternberg cell. [00:44:00]


The Reed-Sternberg cell, as you know, is the pathognomonic malignant cell of Hodgkin disease. Classic owl's eyes, right? You know, it's the two big nuclei, two big nucleoli, and they're kind of next to each other, and it's like the owl is hooting at you. So most of them called it that. And then if you look at kind of the background of that lesion, it has a benign infiltrate.


These are kind of funky-looking cells. A lot of Reed-Sternberg cells, as you know, are what's called non-canonical. You can find owl's eyes, but not every one, you know, has those owl's eyes.


Aleks: So, they basically… 


Keith: We can infer that. Yeah, we inferred that they kind of looked at the whole background. They saw these big cells that were, you know, kind of bordering on multinucleated, and they said, ah, you know, it's Hodgkin's, must be Reed-Sternberg, check, next case. So that was one reason; I think that's plausible based on our data. [00:45:00] Yeah. So then the second reason they missed it was, I would just say, you know, atypical examples, right? We all take histology; we learn these beautiful prototypical examples of a feature. The ones that we highlight in the paper are called asteroid bodies. These are features of granulomatous and other types of inflammation. And they basically look like big red starfish or octopuses: they have long stringy arms and they have a nice red center. They don't really look like asteroids. But…


Aleks: Whoever is going to be watching this on YouTube, it's going to be on the screen. Don't worry.


Keith: Excellent, excellent!


Aleks: So if you're just listening and you want to see them, we're gonna put them on the screen. No worries. 


Keith: I'd love to hear that. So, these are features, but they don't occur alone, right? They typically occur inside of a giant cell with kind of a foamy background, and there are some great EM studies that were done [00:46:00] like in the '60s showing these are made out of some sort of filaments that nobody still understands. So what we found is that the really classic, canonical-looking ones, they got those 100% of the time. There were others that looked like they were kind of on fast cuts, or maybe deeper cuts, and had some of the background, but the features of the asteroid body were really minimal. But they still got those like 80% of the time. So in that case, it tells me that even though it was a really bad example, they got the context, and they said, oh, it must be an asteroid body, because that kind of looks like a poor asteroid body, we'll say. So that was the other one. And then the third reason they didn't get 100% was that, once again, let's see, we had six readers, almost 5,000 instances where readers are looking at an ROI, you know, looking at a checklist, check or not. And what we found is that there were many, many instances where, when [00:47:00] they missed it, it wasn't that the feature wasn't there; it's that it wasn't the feature chosen by the curator, because the study design only allows the curator to select one feature. And the most prominent example was osteocytes versus osteoclasts. Okay, so think of those, right? An osteocyte is embedded in bone, and it's got arms, but you can't really see it; it's kind of squished by the bone. And osteoclasts are, you know, bone-resorbing cells. They're multinucleated. They don't look anything like each other.


Aleks: No, but they're often in different places, though you're going to find them in the same slide.


Keith: They're in different places, but we don't, you know, we don't have an arrow in these ROIs saying, here's your feature, what is it? It's not "name the feature." It's "find the feature and then name it." Yes.


Aleks: Okay. 


Keith: So what we found a lot of the time was that it was a bone section. They picked [00:48:00] osteocytes, but it was actually one where there was an osteoclast in it, and they said, "Oh, it's an o—" and they didn't select both, though they were allowed to select both.


But I guess my argument here is like, "All right, I'm looking at a thousand ROIs and I don't want to select…" So I attribute that kind of miss to fatigue. Maybe it's like, this is a repetitive feature-hunting task, and, oh, I have to find three of them, or two of them, to get it right. So I use that as kind of the third bucket of explanation for why people missed it. So in no case was the feature not evident. There were a few cases where they said, "Well, this isn't in focus. I can't see it." But that study design did not allow us to rescan the slides, which is in contrast to the accuracy study, which did allow pathologists to rescan slides if they felt it was a bad scan.


So anyway, we have good explanations, I think, for why, you know, it wasn't 100%, [00:49:00] and it had nothing to do with DP.


Aleks: Yeah, that's kind of a common thread: it's how people work, because it's people, and pathology is an interpretative science. So, like you say, it had nothing to do with the method. It had everything to do with the art, or science, of pathology and the people who are practicing it. There was another thing that you described in the study: the longer, more detailed sign-out diagnoses were actually associated with higher disagreement rates, for both digital and microscope reads. And I don't know, I didn't think this was so intuitive, because I would think, okay, if it's [00:50:00] so detailed, like, described in such detail, there should be better agreement. But what do you think that reveals about how we would maybe approach diagnostic documentation and pathology reports? Or, like, what does it say about what you guys did?


Keith: Well, there's certainly a need to make diagnoses more patient-friendly, right? There are a lot of efforts underway…


Aleks: That’s a topic for a different discussion I guess… 


Keith: …about that. It's a topic for a different discussion. In this case, I will say, remember, our endpoint was DR minus MR, the difference in accuracy. Well, in several of the other digital pathology whole slide imaging system studies, a separate primary endpoint was the digital modality accuracy by itself, and I think for reasons that are historical, we did not declare that as an endpoint in our study. [00:51:00] Our study kind of took off during the pandemic, and it was very difficult to get feedback during that time. We had to take some risk going forward. So that endpoint was not declared prospectively, but of course it was measured, and then, when we actually put our application in, the FDA said, well, you know, what about this endpoint? All right, so it turns out that our modality disagreement rates were each around 8%. And the historical acceptance criterion, again with the, you know, confidence intervals, lower bounds, etc., the cutoff is 7%. So I will say, if we had declared that, or were held to it, you know, we would have failed, and I don't know what would have happened. But we did have to do a pretty in-depth, multi-month root cause analysis to understand why we were landing at 8% with both modalities. As I said, when we subtracted them, the difference was less than 1%. [00:52:00] But viewed separately, you know, at least the digital one would have failed. So we did a root cause analysis, and it turns out we can blame the pandemic. What happened was, when we were in the contracting phase with each of these labs, they were all scrambling to get their work done, right? We remember that time: digital was only used kind of spottily, there were some emergency EUAs and enforcement discretions and things. So all the sites we were working with, we worked with three clinical labs and one academic lab, were all game. They all wanted to participate, but they said, you know, we have to enroll the cases more quickly. So what they ended up doing was not shortening or focusing or, I'll just say, curating [00:53:00] the diagnosis down to, you know, what the primary diagnosis was. We all know pathologists are notorious for hedge language, for, you know, verbosity, for use of different types of terms. So we know that, right? So what happened for lots of the cases is that the whole report was just pasted into our database, and then the adjudicators had to sort it out. All right, so let me give you an example. Let's just say the reader sees a case and they say, oh, this is diagnosis A, but the reference is A and B, or maybe it's A but I can't rule out C. So do you call that an agreement, because they both got A, or do you call it a disagreement, because they missed B or C?


And [00:54:00] the adjudicators, per the designs of these studies, were given instructions on how to adjudicate, but they weren't overridden, right? Their opinion was their opinion. They looked at this, they looked at that, and they said, is this an agree or a disagree, and we trusted them to do that. Now…


Aleks: Okay 


Keith: Another unique feature of our study is that we had 20 adjudicators, and they worked in teams, and when we did our analysis, we ended up seeing kind of a similar phenomenon across all of the adjudicators; I'll explain that in a moment. So what this teaches us, I think, is that any kind of absolute modality accuracy in a study as an endpoint is not a good endpoint, right? It's very risky, because there are all kinds of other factors that can enter into the study design, execution, and data analysis that could get you, you know, [00:55:00] let's just say, a final number of 20%. Everybody would say, "Oh, well, oh my goodness." But then what if manual was also 20.1%? Then the difference is, you know, 0.1%. So I think it's dangerous, actually, to have modality accuracy by itself, without some context, as an endpoint in these types of studies. For sure.


Aleks: Yeah. I see it as, like, you are assessing something that's already an interpretation, by somebody who didn't interpret it. So, like you say, you add more variables to the system that you're evaluating, because you already evaluated the features, and the assumption or belief is, okay, whoever saw the features can take the features and make whatever the diagnosis should be out of those features. And this is where interpretation [00:56:00] comes into play, which results in a report that may be convoluted, because we're pathologists. So I can totally relate to why this adds another layer of complexity.


Keith: A little bit to add to that. Remember, the feature list was pre-specified. Now, some of those features might be in the actual cases, some might not be, some may never be, right? The cases in the accuracy study are really designed to be kind of consecutive, a broad representation of what a pathologist in general practice would see with respect to primary diagnosis. And then the last point I left out is that, to prove statistically that it was the length of the diagnosis and not difficulty or any of the other, you know, 50 variables you could think of, we actually did a pretty strict post hoc statistical analysis. We divided all the input diagnoses into long and short. And it turns out the ones [00:57:00] with the longest diagnoses, that half, their disagreement rate was twice what the short-diagnosis disagreement rate was. And that was controlling for designation of case difficulty, which is something that had to be designated at input.


So we felt like we had really strong data to justify blaming the pandemic, I'll say it that way. And that modality disagreement rate really has, again, nothing to do with the system.


Aleks: So your data showed that 20x scanning was good enough for almost all diagnoses. Did pathologists actually need 40x, and if they did, how did you handle that in the study?


Keith: Now, that's a really good question. In the precision study we had a set of features that were high-power features; those were scanned and shown at 40x. [00:58:00] And then we had another set that were lower-power features, and those were shown at 20x. So it was tested, but we didn't test the same feature at both 20x and 40x; that would be a different study design. For example, could you see salt-and-pepper chromatin better at 40x than 20x? The answer would probably be yes, but we didn't test that. There are high-power features and low-power features. And then for primary diagnosis, 40x scans on almost all the scanners are beautiful, right? The question is whether they're necessary. If you think back to your pathology training, a lot of primary diagnosis is 1x or 2x. It has a certain appearance, and then you really just need to go down to 20x or 40x, or sometimes even to oil, to verify something that is a lower-power impression. And then consider that while a lot of the scanners in this class can do 20x and 40x, by [00:59:00] definition, without compression, the 40x images are four times bigger, and they probably take roughly four times longer to collect. So the question is, do you need that for primary diagnosis? The final piece is that our predicate device tested 20x, so we were kind of on the hook to test 20x; if we had tested 40x, it would have been a different study. Our scanner, like the predicate device, will do 20x and 40x scans. So anyway, lots of background, but the bottom line is that we scanned everything at 20x and then gave the pathologist the option to rescan at 40x if they felt like, well, this isn't good enough, or I need to see a higher-power image. And it turns out there were 3,200 slides in the study, and on only six of those 3,200 did the pathologist say, I can't see this at 20x, I need higher magnification. That's far less than 1%. And I think [01:00:00] that's roughly what the predicate device study showed as well. It also says that 20x is fine for the vast, vast majority of diagnoses that happen in general surgical pathology.
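The back-of-the-envelope math behind those numbers, with an illustrative tissue size and typical (assumed) scanner resolutions:

```python
# Back-of-the-envelope: why a 40x scan is ~4x the data of a 20x scan.
# Illustrative 15 mm x 15 mm tissue area; resolutions are typical
# assumed values, not specifications of any particular scanner.
tissue_mm = 15.0
um_per_px_20x = 0.50   # ~0.5 um/pixel at "20x"
um_per_px_40x = 0.25   # ~0.25 um/pixel at "40x" (2x finer per axis)

def pixels(um_per_px: float) -> int:
    side = int(tissue_mm * 1000 / um_per_px)
    return side * side  # 2x finer per axis -> 4x the pixels

p20, p40 = pixels(um_per_px_20x), pixels(um_per_px_40x)
print(f"20x: {p20 / 1e9:.1f} gigapixels")
print(f"40x: {p40 / 1e9:.1f} gigapixels ({p40 / p20:.0f}x more data)")

# Rescan rate from the study: 6 of 3,200 slides needed 40x.
print(f"rescan rate: {6 / 3200:.2%}")  # ~0.19%, far less than 1%
```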


Aleks: So was there anything specific, was there a pattern? I know it was only a couple of slides, but what did they ask for? Why did they ask for 40x?


Keith: Yeah. Well, I'll just say that we have that data, but the number was so small that you can't really conclude anything. If I told you two of them said they couldn't see the chromatin, that's two, far less than 0.1%, so you can't make any conclusions from a statistical standpoint. On the other hand, people have looked at this in the digital pathology literature, and some of it is intuitive: we know there are some features [01:01:00] that you need to focus up and down on to get really confident about. And that made sense with the cases we saw that were challenging in both modalities: fine chromatin features, dysplasia versus atypical nuclei, things like that. Looking for H. pylori. But interestingly, once again at 20x, for all the rule-out H. pylori cases, the major discrepancy rate difference, again DR minus MR, was almost zero. So even at 20x there was no signal that looking for H. pylori in gastric biopsies was more of a problem at 20x versus 40x. Now, I will say, and I don't know if we're going to talk about this, that a lot of the algorithms now are trained at 40x, [01:02:00] or even at different magnifications. So 20x is adequate for primary diagnosis, but whether it will be adequate for all the algorithms that get made, if they're trained on 40x, is an open question.


Aleks: Yeah. Then you'd probably have to train at a certain magnification. I'm also thinking: how can you leverage, for efficiency, for reducing the size of images, the fact you just mentioned that most diagnoses are low-power diagnoses, or low-power diagnoses with high-power confirmation? You look at low power, you see the focus, and you just go to the focus of whatever you're interested in; you don't need the whole image at 40x. That's beyond the scope of our discussion, but basically, to me it's natural that pathologists [01:03:00] don't look at every single pixel of the image.


Keith: Yeah, they can’t. 


Aleks: They only need confirmation of a suspicion, and that takes a lot less than scanning the whole thing at 40x. So that's food for thought, probably for the technical experts on the digital pathology team: how can you leverage this diagnostic, pattern-recognition behavior in designing algorithms, in designing infrastructure? I don't have the answer, but to me it seems like a waste of resources to do 40x.


Keith: Yeah. Some context, maybe: the data is collected in a form called a data pyramid, which is analogous to being able to zoom in and out. And you're right, it would be very challenging to look at every pixel of a 40x image of a large resection; that's not how pathology is done. On the other hand, the actual [01:04:00] areas where you need high versus lower magnification are very case specific and depend on pathologist preference. So unless you get into something like virtual microscopy, where you're dynamically judging how much information you have to take in different areas, the way these scanners work right now is that they make one digital image, and it can be zoomed up and down. Then, of course, depending on the lab use case, you need a certain bandwidth to manage that data. There's so-called hot, warm, and cold storage that might be useful for initial diagnosis versus archiving versus secondary consultations. So I think all this tells us is that…
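For readers unfamiliar with the term, here is a minimal sketch of the pyramid idea; the base size is illustrative, and real whole slide image formats store these levels as tiled, compressed data.

```python
# Minimal sketch of a whole-slide-image data pyramid: each level halves
# the resolution of the one below it, so a viewer can zoom in and out
# without reading the full-resolution image. Base size is illustrative.
base_w, base_h = 60_000, 60_000  # full-resolution level, in pixels

w, h, level = base_w, base_h, 0
while w >= 1_000 and h >= 1_000:
    megapixels = w * h / 1e6
    print(f"level {level}: {w:>6} x {h:>6} px "
          f"({megapixels:>8.1f} MP, downsample {2 ** level}x)")
    w, h, level = w // 2, h // 2, level + 1
```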


Aleks: And you know, we don't need to have the answer, but…


Keith: …yeah, I'm just saying those are the variables that people are thinking about now.


Aleks: So, looking at all of the FDA-cleared whole slide imaging systems, they show very similar performance in their validation studies, [01:05:00] and I kind of alluded to this, but in your paper you mention that these study designs were originally negotiated between the FDA, the Digital Pathology Association (DPA), and various manufacturers. Given that six, and now seven, different whole slide imaging systems with distinct types of scanners have been cleared with comparable results, do you think it might be time to revise those agreements? We already started talking about it, but maybe focus more on image quality and usability testing, rather than repeating those highly resource-intensive clinical studies that give us similar outcomes. Thoughts on that?


Keith: Yeah. You're singing my song, for sure. I was with one of the manufacturers when we did the performance study for the second [01:06:00] system that was cleared, and this was around 2018, 2019. We were really excited to see that the numbers came out almost identical to the first system; they were right on top of them. I've given several talks on this and contributed to the literature in various ways, and in some of the informal talks I like to joke that now, with five, six, and now a seventh of these, it's almost like we're rediscovering a constant of nature. We're rediscovering pi; we're rediscovering the gravitational constant. We're seeing the same thing over and over again with very different systems from a design standpoint. So I think that does raise the question of whether you need studies of this magnitude after you have so much experience, now almost a decade, with these devices. This is a point in our paper, where for this purpose we actually compare our results with everybody else's. [01:07:00] And these studies are a lot of work. They're quite burdensome: millions of dollars, years, I don't even want to know how many hours of people's time. And we saw the effect of basically taking shortcuts during the pandemic, when people's time was critical. So it really does force the field, and regulators as well, to raise the question: is there a less burdensome approach? I think the things you mentioned, the system usability, the critical tasks, the risk analysis around that, whether that performance is adequate for primary diagnosis, we can test today. And then image quality: there are already metrics within the FDA about what constitutes image equivalence for pixel-identical images, and that could certainly be expanded to include pixel-nonidentical images or images of a particular type. There are metrics to do that. So I think we're entering an era [01:08:00] that maybe relieves some of the burden around the creation, definition, and updating of these systems in the future.


Aleks: I hope so. I also kind of secretly hope that after the pandemic, where we had the discretion to use these systems, that data is going to support the use of whichever technically good system for generating these images. And I still secretly hope that all this data from all the systems that were allowed to be used then is going to be gathered together with what the manufacturers did for clearances, so you can have a giant data pool saying, "Hey, it's good enough; ask the pathologists if it's good enough for them and let them work with it." Obviously, within a framework, within parameters [01:09:00] and things like that. But like you say, do we need to keep rediscovering nature? It's a microscope in a box in a different form, right?


Keith: Well, yeah, how many significant digits do we need pi to at this point, right? Again, almost ten years have passed since the first clearance, so it's always a good time for a contemporary reassessment.


Aleks: We've been touching on and mentioning the pixel pathway. Basically, when digital pathology first emerged, the pixel pathway, the path that pixels take from scanner to viewer to display, was typically maintained within systems from a single manufacturer, right? And now we're seeing systems cleared with components from different vendors. You have your own system, but I know that on the back end [01:10:00] of these systems you can attach algorithms. Can you tell me how Roche is approaching interoperability and third-party algorithms? What is needed in such a case to maintain confidence in validated solutions? Because if you have everything from one vendor, they took care of it. But if you want different components from different vendors, can you still trust it? How do you as a company think about it, and what do you personally think about it?


Keith: Yeah, the short answer is it depends, right, on the combination and…


Aleks: It depends on the use.


Keith: …And why do I say that? The initial clearances were one scanner, one software, one monitor. Boom. What we now have, with seven clearances, at least with novel scanners, is really exciting. We've got some with multiple file types, or at least additional file types, different browsers and viewers, [01:11:00] different monitors. And one of the manufacturers has something called a PCCP, a predetermined change control plan. It's an emerging regulatory tool that can be useful to update your system in a predetermined fashion negotiated with the FDA. If you're making a potentially major change that doesn't really change the intended use, and you do the appropriate risk analysis and some agreed-upon bench testing, and you pass, then you don't need to go through a whole new 510(k) or even a supplement. You just do the testing, you update the system, and it's a note to file. So this is a potentially less burdensome way to keep some of these systems up to date, where some of the components are going to be obsolete in a year because the other manufacturers just don't [01:12:00] make them anymore.


And you don't want to have to do another complete 2,000-case clinical study just to keep a 510(k) in place. I mean, you don't want to; if you have to, you have to, but that's really not a good use of time, assuming you've done…


Aleks: But if you don't have to, then why? 


Keith: …the design work and analytical testing. Yeah. But you did mention image analysis. I will say that a lot of what pathologists do for primary diagnosis is grade and estimate quantities, and there's a huge amount of excitement and activity in making algorithms to do some of this. I just want to emphasize that none of these seven studies used algorithms for any purpose; they are all just about primary diagnosis, diagnostic comparisons, and feature identification. And then, if you think about pathologists, well, you're a pathologist. [01:13:00]


I'm a pathologist. Some pathologists are blindingly fast with microscopes, and if you give them a clunky user interface, a bunch of screens, and a bunch of pull-down menus, they're not happy. All the systems are getting better, but there's a big ergonomic challenge among all of us vendors to really make our interfaces as blindingly fast as the microscope can be. There are lots of ways to do that: some of it is software, some of it is roller balls and extras like that. That's going to happen. The point I'm trying to get at is that pathologists can use these systems for primary diagnosis confidently, but there isn't a great number of choices for algorithms. The vast majority of algorithms used in the United States are research use only, [01:14:00] which means quality is variable and the lab is pretty much on its own to validate. That engenders a huge risk to the labs as to which algorithm to pick and how to validate it. As a manufacturer, if we have a research-use-only algorithm, we can't tell them how to do that; it's kind of on them. It does put a burden on us, however, to make a good product. We hope no manufacturer wants to put out a quick product that isn't fundamentally good. So then the question is, how is Roche looking at this? You may have heard I work with the commercial team, but I'm not the commercial guy. I am excited about this, though. We do have a suite of algorithms right now, again all research use only, not part of our primary diagnosis clearance in the United States, that pretty much read our IHC or ISH tests: HER2, PD-L1, ER, PR, Ki-67. These are tools designed [01:15:00] for things that happen in case-level sign-out. And we're interested in taking those through the FDA clearance process, as are a lot of manufacturers. Unfortunately, right now there are only two predicate-device algorithms cleared in the United States on FFPE sections for use in these systems: Paige Prostate, and then the Ibex Galen second read, as it's called. So this is exciting, but it's a tiny, tiny number of predicate devices. The way Roche is looking at this is that we've created something called the open environment, which is what I would call a full integration of third-party algorithm capability into the Navify DP environment. From a case-level standpoint, and again these are all for research use only, one doesn't have to flip between screens or browsers or interfaces; the integration is [01:16:00] complete. When you're dealing with a set of slides, all of the functionality is available to you within that environment. We now have, I think, ten-plus vendors signed up and over twenty algorithms. This is an experiment; it's a work in progress, and I've had good feedback so far. The first two that were fully integrated were Ibex and PathAI, two reputable companies, and we hope to be the marketplace of choice in the future for this reason. But again, I just want to make it super clear: none of these algorithms is part of primary diagnosis. They're research use only. They don't fall under our clearance.
So we just need to be very, very clear in our messaging to customers about what is FDA cleared and what's not.


The second point you mentioned: I [01:17:00] consider DP and algorithms, at this point in history, as a bolt-on. We've got forty-plus years of automated IHC staining, and we're getting better at it. We have several predictive companion diagnostics where pathologists read the slides and the treatment decision is made. There are hundreds and hundreds of so-called class I stains in the United States that are used as a basis of diagnosis and prognosis, and everything happens with those being read on a microscope, just fine, in the lab, today. And now we have all this excitement around big data and digitization, so we're scanning these slides and creating a ton of data.


So it's a bolt-on. I wouldn't say it's an afterthought, but it is kind of an afterthought; we're trying to add it. Then, of course, the question is, for any particular AI, really any time you make the system more complex [01:18:00] by adding something, are you magnifying error or are you compensating for it? I would say we don't know, because it's a bolt-on; there's only one way to find out. You've got to do the experiment. Are you getting a number that compensates, for example, for stain-type preanalytical variation, or are you getting a number that's actually making it worse? Are you worse than the problem? This is a whole area of investigation that needs to happen. I would say the Roche solution there is linking the design of the assay with the scanning and the algorithm, by design. What that does, in my mind, is de-risk the lab's decision on which assay to choose, which scanning system, which algorithm. You may pick this stain, run on this stainer; you add on this scanner, [01:19:00] you file it in this format, you run this algorithm. It might work fine; it might not; you don't know. And there are going to be some labs that maybe like that level of uncertainty, because they want to tinker and see what's going to happen, but there are going to be a lot who just don't want to repeat others' mistakes. So I think one of Roche's strengths is integration by design.


Aleks: Yeah. And I think not only because there is even more variability, or not only variability but variables, that you add to the system and have to test. You're testing something fundamentally different from what we were just talking about. You're not testing the pathologist's ability to recognize something and diagnose disease. And to me this is a flaw [01:20:00] of comparing quantitative scoring to a visual semi-quantitative assessment: how are you checking whether those algorithms are good enough? If you're checking against your ground truth, unfortunately your ground truth generator is a human who is worse at the task of quantification than the computer you're comparing them to. So to me that's a whole different way of evaluating this. Personally, I think it's not being done correctly; it's not being done according to our physiological and visual strengths. But we don't have a better way to do it, and the way we do it right now, visually, is the standard of care. So on the other hand, you're comparing to the standard of care. That's a whole other mix-up, [01:21:00] but I totally get the message you're trying to convey: doing it by design.


And the next question is, okay, what is the design to then test it and check it and make sure there isn't a lot of uncertainty coming out on the other end?
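A hypothetical toy simulation of the point being made here; the numbers are invented, but it shows how noise in a human ground truth caps the apparent performance of an algorithm that is actually more precise than the annotator.

```python
# Toy simulation (invented numbers): when the "ground truth" is itself a
# noisy human estimate, an algorithm that measures the true quantity more
# precisely than the human still shows imperfect agreement with the human.
import random

random.seed(0)
true_scores = [random.uniform(0, 100) for _ in range(10_000)]  # true % positivity

human_gt = [t + random.gauss(0, 15) for t in true_scores]   # semi-quantitative, noisy
algorithm = [t + random.gauss(0, 3) for t in true_scores]   # precise quantification

def mean_abs_err(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(f"algorithm vs truth:      {mean_abs_err(algorithm, true_scores):5.1f}")
print(f"algorithm vs human 'GT': {mean_abs_err(algorithm, human_gt):5.1f}")
# The algorithm looks several times worse when scored against the noisy
# human "ground truth" than it is against the (unobservable) truth.
```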


Keith: Well, I do think you bring up a really important point about the pathologist being the ground truth today. If you set up a study for registration or scientific credibility and you don't consult a pathologist or use their opinion, it's viewed as not credible, as if you're just doing random computer vision. I don't think that's completely true. But I think there will be a point, maybe in the future, when the algorithms get so good compared to the pathologists that it will no longer be acceptable [01:22:00] to consider the pathologist the ground truth; computer vision that's properly tested will be the ground truth. I think that's coming, and I think we're still a ways away from it. But of course, maybe it's like Lake Wobegon, do you know this NPR show where everybody's above average? Every pathologist thinks they're above average.


Aleks: No.


Keith: It's an American radio show. I don't think they do it anymore, but the joke was, yeah, all the children are above average.


Aleks: Yeah. I guess everybody thinks they're better than the average. I mean, I like to think of myself as better than the average.


Keith: Well, of course, of course you are. I mean, so am I, right?


Aleks: Of course. Oh, thank you. Thank you. But, you know, it depends on what. But [01:23:00] I agree with what you say. Or there's going to be another, orthogonal modality to check against. And what do I mean by that? In the first CAMELYON challenge…


Keith: Yeah. 


Aleks: That also counts for annotations, for everything that goes into algorithm development. In the CAMELYON challenge, where they were looking with image analysis for epithelial cells in lymph nodes, basically cancer metastases in lymph nodes, they just did IHC with cytokeratin, and that was their ground truth, not a pathologist annotating, who could miss a cell in the corner of the lymph node. So I thought that was a good way of solving it. Obviously it's resource- and money-intensive, but it is an objective method you can compare to, and maybe we need these types of objective methods, [01:24:00] some of which are not developed yet, right?


Keith: Well, there's this whole emerging area now of virtual staining, maybe with AI, that may be useful for defining the ground truth.


Aleks: I hope it emerges. So, based on everything you've learned from this validation process, what advice would you give to pathology labs that are thinking about going digital? And as someone with your finger on the pulse, and with your experience, you've been doing this for over a decade, what do you see coming down the pike in the next few years? We can include virtual staining in this one.


Keith: Yeah. No, I'm watching this carefully. I would say my pulse is quickening; it's a really exciting time in history to be part of this field. I'll start with the second question. I would say there's still a growing gap between the [01:25:00] conceptual discoveries occurring with a variety of technologies, including digital pathology and computer vision, and what I see being quickly adapted to actually improving human health. I'm really focused on that personally for, I think, the next decade of my life. What does that transition look like? How do you take something that comes out of a Nature paper and actually benefit patients? That's hard, and it's not just science; it's business, it's the aligned incentives of multiple stakeholders, and so on. As you know, we have a large amount of knowledge in deep learning and AI, and in different ways to do AI just on H&E-stained sections, and that's going to have tremendous impact down the road, maybe not immediately as a replacement for the pathologist, but maybe as a screen, so understanding what screening performance metrics [01:26:00] need to be for clinical use matters. There's been a huge amount of activity, just in the last year or two, on large language models, ChatGPT, agentic AI, and things like that. This is why I think the final output of primary diagnosis, which is all those words, is training fodder for these models. That's really exciting. A parallel area, and as you know we've followed this area together in the past, is the emerging field of single-cell and spatial multiomic biology. At this initial stage it isn't really AI, but it generates massive amounts of data at the single-cell level, either in soluble situations or in tissue. It's basically multimarker analysis: instead of two or three markers on a cell, you can get a thousand, or a whole transcriptome. It's incredible. These techniques have really just emerged [01:27:00] at this scale in the last couple of years, so the amount of data and discovery that will, I hope, feed into diagnostics is going to be massive. Probably the leading indication would be some use in immuno-oncology: understanding why some patients respond to checkpoint inhibitors and some don't. That really requires getting a handle on all the different cell types in the biopsy, not just the tumor cells. Which ones are helping the immune system eliminate the tumor, which ones are preventing it, and what is their mechanism?


Spatial biology tells you that. It's absolutely amazing. But the question is, how do you convert that into actually actionable clinical tests? Go ahead, you were going to say something.


Aleks: Yes. Immuno-oncology is where you cannot really go much further without image analysis. That is going to be the discipline where all these imaging, single-cell, multimarker [01:28:00] methods are going to be developed in parallel with the software, the image analysis analytics, because it's beyond the visual capacity of a human. You can confirm something, but there is no way you can analyze it in a meaningful way.


Keith: Absolutely. Yeah, it can't be done with the human eye. You can't do it with a heat map, looking at relative abundances. You need computational computer vision, in essence. And that's a really exciting area. All right, so then, the first part: advice. I think I have four general pieces of advice for labs considering digital pathology. As I said, there's been a slow level of adoption; I think it's still slow, but it's been increasing in the past few years, [01:29:00] which is exciting. First of all, and there are lots of organizations that help out with this: try to succeed on a small scale before you, maybe inadvertently, fail on a grand scale. You have to bring a lot of stakeholders to consensus to get the investment and the business case around digital pathology adoption, and most of the labs I've seen do this focus on a few things that are important, that help them learn lessons and collaborate with others in the field, other vendors, just other people. The second one: you need a champion, a local champion, and ideally that person is a pathologist, somebody coming out of training, at the early stage of their career. I haven't met many trainees finishing up their fellowships who are [01:30:00] chomping at the bit to use only a microscope for the rest of their lives. In fact, none of them are; they all want to use digital in the future. So you get a fellowship-trained new partner in your firm, a new pathologist, and you make part of their job being the champion: working with the different teams and helping enable an installation that will actually work. The third piece is networking. As I said, I've been doing digital pathology one way or another since I entered pharma, fifteen or sixteen years, and I've been incredibly impressed by how generous people in this field are with their time, how much they want to help people learn. I saw that when I started; early on I was using early versions of Vizia, and they were incredibly helpful in answering the pharma questions we had. Now there are a lot more vendors. [01:31:00] That firm is still here, but there are a lot more vendors. There are evangelists, I'll call them, in the digital pathology space, the Dr. Parwanis of the world. We need them because they give us hope that almost anything is possible, but there's a huge network of folks trained behind them who are equally knowledgeable in their areas. And I've found, for example, the API (Association for Pathology Informatics) to be very useful as an organization for that.


The DPA is another one, founded really to nurture this field from the side of the vendors, the research users, pharma, and now the AI companies. So anyway, don't be a wallflower; reach out. You don't get what you don't ask for. Ask the question, and ask a lot of people. And I guess the last thing I see as the major barrier is some lobbying we need to do, right? [01:32:00] It was the American Recovery and Reinvestment Act of 2009 that, at least in the US, mandated the use of electronic medical records to get reimbursed, and a few years later everybody was using electronic medical records, for better or for worse. We now have, what, 43 Category III CPT codes in the United States, and you get reimbursed a total of zero dollars for those today; they're experimental. I know that's not quite an incentive yet, but it's a first step. And we still don't have the professional component of using an algorithm defined: what's that worth? I think we're assuming it's worth nothing, because it's just the pathologist's job anyway, but that may not be the right way to go. So those are the four pieces of advice: succeed small; have a local champion, ideally a pathologist; network like crazy; and lobby as a group, and there are groups doing that, including CAP and others in the [01:33:00] United States. Maybe just a reflection at the end, if that's all right.


Aleks: Yeah, go ahead.


Keith: Yeah. So, it's been almost 50 years since I took my first biology course in high school. I think about learning about DNA and then eventually becoming a biologist, a physician, a pathologist, and I'm still in awe at how much has changed between then and now, last century to this century. I think, for example, of my grandmother: when she was born, planes didn't exist, and by the time she was an older person, we were flying everywhere. That sort of radical transformation in pathology, and in how science becomes new pathology, I've seen that, and I think you've seen a fraction of it as well. And now, of course, we have sequencing, we have big data, we have AI, we have all kinds of ways to analyze the data. I'm really excited to see what happens in the next [01:34:00] 10 to 15 years. But I do believe the importance of digital pathology is that all of this information is interpreted in the context of tissue; it all converges on the digital pathology work environment. That's one reason I'm really excited to see what happens in this space, because once again, you can publish a paper in a very reputable journal and show a t-SNE plot or pseudotime analyses and all of these really beautiful data visualizations, tools that are not pathology, right?


Everything that is happening in the lesion happens on the slide that we visualize. So how do we convert that knowledge into thinking about what's actually happening in the disease? That's digital pathology to me, and that's super exciting.


Aleks: Thank you so much. Your advice for new labs I can take as life advice as well. [01:35:00] Like you say, even though I have not been practicing for 50 years yet, I have seen so many things happen that I'm no longer thinking, "Oh, it's not possible." I'm like, let's wait half a year and see what smart people in this field come up with. And now I'm obviously fascinated by the large language models and how they can help us, and by multimodality, not only the kind we talked about, with multiple markers, but multiple data points. When I was starting, it was like, oh, it's not going to happen in my lifetime, and now I'm like, let me wait half a year and see. So thank you so much for the insights.


Keith: My pleasure. Thank you. It's a pleasure to talk to you. I'm so glad you're doing this for the field, and it's really great to see you. Thank you.


Aleks: I hope we see each other [01:36:00] at the next conference, and I wish you a fantastic rest of your day.