Digital Pathology Podcast
175: Deploying Digital Pathology Tools - Challenges and Insights with Dr. Andrew Janowczyk
Why does it take three years to deploy a digital pathology tool that only took three weeks to build? That’s the reality no one talks about, but one every lab feels every time it deploys a new tool...
In this episode, I sit down with Andrew Janowczyk, Assistant Professor at Emory University and one of the leading voices in computational pathology, to unpack the practical, messy, real-world truth behind deploying, validating, and accrediting digital pathology tools in the clinic.
We walk through Andrew’s experience building and implementing an H. pylori detection algorithm at Geneva University Hospital—a project that exposed every hidden challenge in the transition from research to a clinical-grade tool.
From algorithmic hardening, multidisciplinary roles, usability studies, and ISO 15189 accreditation, to the constant tug-of-war between research ambition and clinical reality… this conversation is a roadmap for anyone building digital tools that actually need to work in practice.
Episode Highlights
- [00:00–04:20] Why multidisciplinary collaboration is the non-negotiable cornerstone of clinical digital pathology deployment
- [04:20–08:30] Real-world insight: The H. pylori detection tool and how it surfaces “top 20” likely regions for pathologist review
- [08:30–12:50] The painful truth: Algorithms take weeks to build—but years to deploy, validate, and accredit
- [12:50–17:40] Why curated research datasets fail in the real world (and how to fix it with unbiased data collection)
- [17:40–23:00] Algorithmic hardening: turning fragile research code into production-ready clinical software
- [23:00–28:10] Why every hospital is a snowflake: no standard workflows, no copy-paste deployments
- [28:10–33:00] The 12 validation and accreditation roles every lab needs to define (EP, DE, QE, IT, etc.)
- [33:00–38:15] Validation vs. accreditation—what they are, how they differ, and when each matters
- [38:15–43:40] Version locking, drift prevention, and why monitoring is as important as deployment
- [43:40–48:55] Deskilling concerns: how AI changes perception and what pathologists need before adoption
- [48:55–55:00] Usability testing: why naive users reveal the truth about your UI
- [55:00–61:00] Scaling to dozens of algorithms: bottlenecks, documentation, and the future of clinical digital pathology and AI workflows
Resources From This Episode
- Janowczyk & Ferrari: Guide to Deploying Clinical Digital Pathology Tools (discussed)
- Sectra Image Management System (IMS)
- Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy: a multicentre, observational study - PubMed
- Digital Pathology 101 (Aleksandra Zuraw)
Key Takeaways
- Algorithm creation is the easy part—deployment is the mountain.
- Clinical algorithms require multidisciplinary ownership across 12 institutional roles.
- Real-world data is messy—and that’s exactly why algorithms must be trained on it.
- No two hospitals are alike; every deployment requires local adaptation.
- Usability matters as much as accuracy—naive users expose real workflow constraints.
- Patho
Introduction
Andrew: [00:00:00] I would argue even to this day there's not a single person on the team that understands 100% of all of the components at the depth needed to reproduce it. So it really has to be a synergistic collaboration where everyone contributes their different parts and makes sure the other person doesn't fall into any common potholes or these sorts of things. So I think going into the future more and more science is going to be this way, because we have more knowledge than ever before. You can imagine being a doctor back in the days of Socrates: you could probably fit all the medical knowledge into a 50-page book. And that's not the case anymore. Things are becoming more complex. We're gaining more knowledge, which means there are more working parts. There's more interaction, but it's also more difficult for a single person to be able to integrate all of those things successfully. So multidisciplinary is the way of the future.
Who is Andrew Janowczyk?
Aleks: Welcome digital pathology trailblazers. Today I am with a guest who was already
a guest on the podcast. Andrew, how many times?
Andrew: Like at least two times.
Aleks: Two times, and we had a webinar series. [00:01:00] Andrew Janowczyk. Did I pronounce your last name well?
Andrew: Correct. Well, after all these years, you've gotten it down.
Aleks: I know, cuz I was pronouncing it in Polish. It's a Polish last name, so it hurts me to pronounce it in English. But Andrew, welcome back to the show. I'm excited to present, well, to talk about a paper that you and your group put together: a guide for the deployment, validation, and accreditation of clinical digital pathology tools, which I studied extensively and felt would be a good thing for my trailblazers to get an insight into, because obviously they can read the paper. But before we dive in, let's talk about you: a quick, brief reintroduction to the trailblazers of Andrew Janowczyk.
Andrew: Yeah. So, I'm Andrew Janowczyk. I'm an assistant professor at Emory University and I also work with the Geneva University Hospitals, or the HUG for short. My specialization is in computational digital pathology and building algorithms for diagnosis, prognosis, and therapy response. [00:02:00]
Aleks: So, what drew you to digital pathology?
Andrew: You know, so I studied, actually my master's degree was in computer vision, and we were doing tracking systems and these sorts of things, and digital pathology seemed like a very natural transition. This was pretty long ago now; it was a while ago, when the field was still quite new, and, you know, it's grown, I think, in popularity. It's grown in potential impact. So with the advent of deep learning and all the other great technologies that are coming up now, I think the next 20 or 30 years we're going to see huge improvements in this one.
Aleks: It's interesting, because you said you joined when the field was young, and now I see you on this paper and it's more on the regulatory side of things, and I'm like, why is Andrew on this paper? So, what gap were you trying to fill with it? And you have two first authors and you're one of them, right?
Andrew: Absolutely. Yeah. So my other co-author, Johan Ferrari, he's an expert in essentially the validation of these algorithms. He knows all about the ISO regulations and these sorts of things [00:03:00], and he was instrumental in actually helping us deploy and validate our algorithm. So I would say we kind of provided the technical expertise and he provided the regulatory expertise, and together we were able to figure out a way to try and generalize this and share this information with the public.
Aleks: It's funny, because I had a similar experience with my employer, where I was the project manager, and we were just talking about what role you had in putting it all together. I was the project manager and I had all these really regulatory people and validation people who kind of live in a different universe, but then we all come together and do this work together, and everybody has to kind of speak a little bit of the other person's language, even without really deep knowledge of each other's subject. Was that similar in your case?
Andrew: Absolutely. I think one of the common themes that we'll discuss today is just how multidisciplinary the team is. I mean, something that you and I discussed before this interview: [00:04:00] ultimately, I would argue even to this day there's not a single person on the team that understands 100% of all of the components at the depth needed to reproduce it. So it really has to be a synergistic collaboration where everyone contributes their different parts and makes sure the other person doesn't fall into any common potholes or these sorts of things. So I think going into the future more and more science is going to be this way, because we have more knowledge than ever before. You can imagine being a doctor back in the days of Socrates: you could probably fit all the medical knowledge into, you know, a 50-page book, and that's not the case anymore. Things are becoming more complex. We're gaining more knowledge, which means there are more working parts. There's more interaction, but it's also more difficult for a single person to be able to integrate all of those things successfully. So multidisciplinary is the way of the future.
Aleks: I think it's beautiful what you said: not a single person would be able to reproduce the work that went into it. And I think it also, you know, borders on the [00:05:00] philosophical discussion of how science is going to be, but it's going to make us less lazy, because if you're the only person that can do it, you just do it your way. You don't have to do so many back-and-forths. You get it done, it goes. If somebody picks up on stuff that should be redone, great. If not, then fantastic as well. And here it's not the case, because you need to align in the group. So, let's start with the digital tool that you guys created and deployed, and the deployment of this tool. So the tool is called the H. pylori detection tool, do I call it correctly? And the deployment took like three years, right? So in those three years you created it and deployed it in the clinical environment. Let's explain the tool: what does the tool do, and why did deployment take so long?
Andrew: Yeah, well, that's the whole crux, I think, of the paper. So, you know, the main takeaway from the paper, if we want to just start [00:06:00] from a high level, is it was painful for us. I'll just be honest, it was painful. It took a lot of learning to do that, and that's kind of what you would expect a university hospital to do, right? If you want to be a thought leader and an innovator, you're expected to really be at the precipice and try and figure out a lot of these challenges and how to overcome them. And then I think our responsibility is
then to try and encapsulate them into lessons learned and then backfill it, so people that don't have the same level of resources or the same expertise or the same level of skill in their teams will be able to achieve the same things in an easier, more efficient way. Right? This is basically the idea of engineering: to enable others to do something that they might not have been able to do. So I think the idea here is that the actual building of that algorithm was fairly straightforward. Right? I think if you read the paper, I'm not here to convince anyone that it's, you know, a mind-altering tool. Essentially what we do is Geneva uses an immunohistochemical stain in order to identify H. pylori, [00:07:00] which is a stomach bacteria, and they look through a large section of tissue and see if that bacteria is there; if it's there, that patient's positive, if it's not there, that patient's negative. So we basically built a tool that screens the entire image and identifies the top 20 most likely regions that contain H. pylori. Even if it doesn't contain any, it always returns the top 20 and then produces those inside of a contact sheet. The user can look at that and say, yes, there is H. pylori here, no, there's not H. pylori here. And it's integrated into the image management system as well. So Geneva uses Sectra; it's integrated there. So the pathologist sees actual boxes around each of the detected H. pylori components. They can click on any of those boxes on the annotation tab, it brings them immediately to that region of the image, and it allows them to essentially complete this diagnosis in a more time-efficient manner. Now, building the actual algorithm: it's a fairly straightforward algorithm. I think, you know, the folks that have the expertise in computational pathology [00:08:00] won't be that surprised. It's essentially just a convolutional neural network. It's actually a fairly small one of, let's say, a few hundred thousand parameters. It's not these foundation models with billions of parameters, that sort of stuff. It's fairly small and dense, which makes sense because the algorithm itself, it's not that hard of a use case. And it goes and processes that before the pathologist sees the slide. So, essentially, as soon as the slide is scanned, the tool becomes aware of a slide that it should process. It processes that, it pushes the results into Sectra, so that when the pathologist does sit down, voila, there it is. It appears to them as if it was, you know, there all along.
Aleks: There all the time.
Andrew: There all along. So, what's interesting, I think that technology was pretty easy to put together. It's supervised learning. You had some annotations from pathologists: they circled regions and said, this is H. pylori, this is not H. pylori. We train it. You know, I think you could probably find a tutorial online to do something similar. [00:09:00] So, I would argue that you could do something like that in maybe two or three weeks, you know, maybe a week if you have all of the data and it's perfectly curated, that sort of stuff. And then you have 20 months of deployment activities. So why is there this gap, right? And this is the whole purpose of that manuscript: to delve deeper into what we spent the time on, the mistakes that we made, the lessons that we learned, and how we're approaching the same exact things in the future, trying to reduce that time period down to the shortest possible time.
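For readers who want to picture what "screen the entire image and return the top 20 most likely regions" could look like in code, here is a minimal, hypothetical sketch in Python. The tile format, model, and function names are assumptions for illustration only; this is not the Geneva/HUG implementation.

```python
# Hypothetical sketch: score whole-slide tiles with a small CNN and keep
# the 20 highest-probability tiles for pathologist review. Assumes `model`
# maps a (1, 3, H, W) float tensor to a single logit; names are illustrative.
import heapq
import torch

def screen_slide(tiles, model, top_k=20):
    """tiles: iterable of (tile_id, HxWx3 uint8 numpy array) from one slide."""
    model.eval()
    scored = []
    with torch.no_grad():
        for tile_id, img in tiles:
            x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            prob = torch.sigmoid(model(x)).item()  # P(H. pylori present in tile)
            scored.append((prob, tile_id))
    # Always return the top-k tiles, even if every probability is low,
    # mirroring the "always show the top 20" behaviour described above.
    return heapq.nlargest(top_k, scored, key=lambda s: s[0])
```

In the workflow described above, the returned regions would then be rendered as a contact sheet and pushed into the image management system as box annotations before the pathologist opens the case.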
Aleks: You know what? It makes me feel less bad because I'm supposed to submit a paper on our validation work and I'm like, "Oh my goodness, we did it like two years ago or two and a half. I don't remember exactly, but I was feeling so guilty that it takes me so long…
Andrew: Don’t feel guilty.
Aleks: But now I don't feel guilty anymore. I think it's going to be good, but ours is a little bit more narrow. It focuses more on the pathology side of the validation scripts. Here, you guys did not restrict yourselves to any specific area of this, because you defined [00:10:00] like 12 roles for these validation activities. So it was funny, because I read the full paper, it's like 23 pages, but I had an AI assistant natural voice reader, and you had abbreviations for these roles, like EP would be the experimental pathologist, but there is DE. What's DE?
Andrew: Development engineer.
Aleks: Yeah. So, DE, my AI would read as Delaware, and whatever looked like a state abbreviation, it would read as a state abbreviation. So I was like, "Oh, Delaware did this." And I want to highlight this, because this was something I didn't pick up when I was preparing the questions. So I read it with my AI assistant, then I ran it through NotebookLM, and my trailblazers are already familiar with the fact that NotebookLM can make podcasts out of papers, and I sometimes publish them, you know, with a little intro. I did that for this one. And when I was preparing questions, I kind of assumed, [00:11:00] oh, it's for an AI tool, because you're an AI engineer, because this was image analysis. It's AI, right? And you corrected me: no, it's for any digital pathology tool that you're going to be deploying. And then after we had that conversation, I listened to the AI version, and my AI never confused this. So yeah, NotebookLM is pretty accurate. But going back to these 12 defined roles for validation: your hospital is big, right?
Andrew: Yeah. Yeah, it is.
Aleks: How do smaller institutions handle digital deployment? How do how do smaller institutions handle that?
Andrew: You know, I was thinking about that same exact question. Keep in mind that in order to have, let's say, a digital pathology tool of any sort, you already have to have a digital pathology pipeline, right? So Geneva is 100% digital, right? They scan about 2,000 slides a day. The pathologists sign out digitally. Now, that is kind of the baseline requirement, where if you want to apply an algorithm to something, [00:12:00] maybe it doesn't have to be, of course, the entire digital workflow, the entire case load that you have, but if you want to apply an algorithm to, let's say, breast cancer slides, you have to have those breast cancer slides scanned and visible to the pathologist, right? So that's maybe pretty obvious, but I think the step after that is that in order to be able to even get to that stage, you have to have appropriate people and appropriate infrastructure in place. So someone had to, for example, purchase the scanner, someone had to integrate the scanner into the workflow, someone had to organize an image management system, right? So a lot of those things are baseline established things that I would assume already exist before you want to go and try and build your own algorithm and deploy it. And I would argue that the real difference
between that base infrastructure and the building and deploying of that algorithm is really just someone like me, right? Because we didn't have to hire, let's say, someone to deal with the digital pathology pipeline. That person [00:13:00] already existed; they were already hired by the hospital and they've already been doing their job. We didn't have to hire a quality engineer, because Johan was already there running the IHC laboratory. So it's really more about shifting, let's say, a small percentage of the time of existing people with expertise in the different domains, like the IT people. We didn't hire any additional IT people for an algorithm; they already existed for the digital pathology pipeline. So it was really just the engineers that have the expertise in, let's say, the deep learning component and building the algorithm that were the kind of next incremental investment, but everything else…
Aleks: But you just said that was like two or three weeks out of this whole thing, and it kind of makes me want to rephrase this question, or, I don't know, shine a different light on it, because that's also something where people from these smaller institutions that would want to go digital kind of feel at a disadvantage. But there's not really any disadvantage, because the tool that [00:14:00] you guys created was something that increased the efficiency of a high-throughput institution, right? It's not rocket science such that somebody is now going to have worse care because they didn't deploy this tool. But you can help more patients faster, even more than you were already helping and even faster than you were already doing, with all the other benefits of digital pathology, of digital sign-out, you know, scaling the number of pathologists or whatever you guys do. But basically it's not that you're depriving somebody else of any improvement of care; rather you're scaling up something that's already at a certain level, and then you have to validate, and you have to have all these people who do that, right?
One step back, because when you were developing this, was there a performance gap between the research, or did you even have a research dataset [00:15:00] for this, and then you would deploy? Because that's a hurdle that people have. But I remember in the paper you guys said, "Oh, at some point you needed more data," and basically you tasked the pathologists to click whatever they would want to have included in the algorithm as they were diagnosing without it.
It's not really a question, but I know you're…
Andrew: I'm happy to talk about it. I think that was a major takeaway. One of the challenges, I would say, from, you know, the last two and a half years that I've spent trying to focus more on clinical deployment work, is we start to appreciate that there's a gap, as you mentioned, between our research cohorts and the clinical cohorts. And what that essentially means in the real world is that, you know, we have very nice pathologist collaborators in research, and what they do for us is they pick beautiful cases, right? They pick situations, they're like, ah, this is a good one, I'm going to give you that, and they'll go through and selectively pick those, right? So now there's a bias in that, but it's a bias towards high quality, right? There's a bias towards clean images that are in focus, where the staining is very good… [00:16:00]
Aleks: I'm laughing because TCGA does not have a bias toward high quality at all right…
Andrew: Exactly. So if you go and start curating these cohorts to try and do some type of study, there is some implicit bias in there.
And then the challenge is that when you take those and you build an algorithm and you try and deploy it, you very quickly realize what that bias is, because now maybe you have a little bit more folded tissue.
What's your algorithm going to do?
Maybe the staining is a little bit more variable. What are you going to do?
And all of those cases where the pathologist says, I'm not going to give this to you, this is too complex, oh, this is not a very obvious case: well, unfortunately, in a clinical workflow, those slides are going to be processed, right? You don't get to a priori decide what you want to include. It gets scanned. We have criteria that define which slide should get fed to which algorithm, and that algorithm runs, right? Even if it's not the perfect slide, even if it's not something that you'd want, it's going to run. And you have to now think about what that means in terms of giving feedback to the pathologists, [00:17:00] because if they go and say,
“You know, these results are complete garbage.”
and you say,
“Well, it's very different than the training set that we had.” They're going to say, “well, I don't really care. You can't show me 20% of garbage every day. I'm just going to ignore your tool because it's just confusing. It's not really providing added value.”
So, the main lesson learned, I would say, from all of this, to anyone that wants to build, for example, image-based biomarkers, these sorts of things, is to collect that real-world data. And what I think that actually means is, in an unbiased way, say: I want every single case for the next two months that this algorithm would be run on, without any pre-selection beforehand, without any removal. I want to see the totality of that spectrum just as soon as it comes out of the scanner; I don't want anyone else to be able to filter those things out. And then once you see the totality of that, I would argue it actually changes the way that we build our algorithms, because now we can see the total space and we'll make better assumptions ultimately.
Aleks: And you can decide whether to run some other algorithm ahead of time, like [00:18:00] HistoQC, which we have a full series on, and I'm going to link to it. But here's another kind of philosophical touch point: okay, you don't want garbage data, but you don't want to curate data, so these are like two ends of a spectrum. So TCGA, The Cancer Genome Atlas, is like bad-quality data that was never intended for digital pathology; you guys are working in a totally digitally enabled hospital. So the lab already knows that the goal of producing these slides is for digital pathology. They're not going to be perfect 100% of the time, but they are good, because they need to be scanned, right? So then you have the real live data going into the algorithm. And we also want to discuss the terminology here, because I call your tool an algorithm, but it's a lot more than the algorithm: the algorithm development was two weeks and the tool deployment was three years. So let's talk about that.
Andrew: So I think there's actually a phase [00:19:00] that we introduced inside of the paper that we call algorithmic hardening, and essentially…
Aleks: Yeah, I want to talk about that.
Andrew: What I imagine is basically: an algorithm comes in and a tool comes out. That's what that algorithmic hardening stage is. So we have to define what these things are. I think an algorithm, really, if you look at the true definition of an algorithm in a dictionary, is basically a step-by-step procedure to accomplish something. So if I say add these two numbers…
Aleks: Yeah, diagnostic algorithm or whatever, right?
Andrew: It's a flowchart, right? Pathologists have algorithms as well in the medical books, where it says follow this flowchart, and it's a decision tree; that's also called an algorithm, on how maybe you want to diagnose a patient. So algorithms don't have to be computational. An algorithm, in the pure definition, is just a step-by-step way in which you do something. Now, we have to appreciate that research code, we'll say research scripts, are research grade, right? And by research grade, I mean they're kind of a proof of concept. They're a prototype. They demonstrate that something is feasible. [00:20:00] They may not be done to the best computer science standard. So this may be things like modularity, code reuse, unit testing, documentation, you know, proper, we'll say, structuring of the project into different subdirectories for different things, using proper utilities, having a software development kit; all of those things may or may not exist in research-grade code. It depends on who actually wrote it. But the main point is, if I think of myself when I was a PhD student, you just want something that works, and then when it works you can, for example, publish a paper, you can go and get some results and say, is this the right result? Yes or no? What do I have to fix? Right? So your goal is to really get as quickly as possible to an end point where you can then go back and improve, and go back and improve. And this is kind of the life of a PhD student and an innovator in that space. You're iterating on ideas. You're not iterating on products. You're not iterating on, like, a concrete deliverable. You're iterating on trying to get the best version of that idea possible. And that's great for what [00:21:00] it is.
You can't take research code and deploy it into a clinical environment, because it's not production grade. So what does production grade mean now? So this is that concept of hardening. Hardening an algorithm into a tool involves all of those things. It involves peer code review, where someone else reviews your code and makes sure it's understandable by the person who will eventually maintain this long term. It involves things like version control. It involves things like putting them inside of, for example, a Docker container, so it's containerized. You have fixed versions of every single one of those dependencies, and you know what those versions are and you know when they change and how they change, and you have more kind of strict monitoring of it. You have very clearly defined…
Aleks: You have software for that?
Andrew: Uh, well, we use Docker, which goes and keeps track; you can go and create files which will keep track of that sort of stuff…
Aleks: And version three I like lose track.
Andrew: There you go.
Aleks: That's obviously research.
Andrew: Exactly. So this is, I think, the main difference: that tool is basically [00:22:00] taking something that is potentially very sophisticated and only understandable or only usable by a very small number of people. So if I take, for example, my PhD thesis code and give it to you, you probably wouldn't understand it. If I give it to, you know, someone else who studied computer science and has a PhD in biomedical engineering, it would probably take them two weeks to understand it, right? Because I wrote it to generate a result, not for a product. And it's the process of taking that and putting it into a format where you personally can use it. So a tool in this sense is something like Microsoft Word. You don't have to look at the code. You don't know how it works. You just know: when I do this, this happens. I can click this button and it prints. I'm not really that worried about the insides. That's kind of production grade. If you have to start modifying code to get something to work, that's research grade. So that hardening process is taking this kind of ad hoc stuff, very strictly formalizing it, documenting it, and making sure that you know all of the components and how they interact, and assuring that they're going to continue to work that way [00:23:00] in the future.
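To make the version-locking side of hardening concrete, here is a small, hypothetical Python sketch that records the exact model-file hash and dependency versions into a manifest, so any later change is detectable. The deployment described above relies on Docker containers; this snippet only illustrates the underlying idea of pinning and recording versions, and every file and package name in it is an assumption.

```python
# Hypothetical sketch: freeze the deployed tool's identity so drift in
# code or dependencies is detectable. The real deployment used Docker
# containers; this only illustrates the version-locking concept.
import hashlib
import json
import sys
from importlib import metadata

def build_manifest(model_path, packages=("torch", "numpy")):
    with open(model_path, "rb") as f:
        model_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "model_file": model_path,
        "model_sha256": model_hash,  # exact weights that were validated
        "python": sys.version.split()[0],
        "packages": {p: metadata.version(p) for p in packages},
    }

if __name__ == "__main__":
    # "hp_detector_v1.pt" is a made-up file name for illustration.
    print(json.dumps(build_manifest("hp_detector_v1.pt"), indent=2))
```

A manifest like this could be attached to the validation documentation, so that a later audit can confirm the tool running in production is byte-for-byte the one that was validated.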
Aleks: I don't know when I said it last, but life is a funnel: you have a bunch of these research-grade things and fantastic ideas, and then reality just squeezes it in, and one product, or one deployment, comes out. Not even that, actually; you can have a product and not deploy it. And that goes back to another definition, or the difference between validation and accreditation. And we have a lot of definitions in this particular paper that are kind of important to understand. So, this validation and accreditation: why can't you just, okay, why couldn't I plug your tool into my environment?
Andrew: You know, I think the biggest problem is there are, I guess, two components, right? There's a technical component and there's a regulatory component. From the technical component, the challenge that, you know, I've realized now [00:24:00] is that every hospital is really fundamentally different. And this isn't too surprising, because hospitals generally tend to be pretty old overall, right? And hospitals existed before technology, right? We'll say before the internet, as an example. And as a result, hospitals were really designed only to treat the people in their near vicinity. And that also means that, you know, if you had to transfer a case from one place to the other, you would just physically mail it, right? You'd put the things on paper and mail them, that sort of stuff. So what ended up happening is, over, you know, the last, let's say, 200 years, hospitals grew in isolation, in silos. And the problem is that when something grows in a silo, just like, you know, we'll say bacteria or any other evolutionary process, it's constantly adapting to its local environment, but it's not adapting to someone else's environment. So every decision that that hospital makes, rightfully so, this isn't a critique, every decision that they make is the right decision for them. Now what does "for them" mean? Well, it means [00:25:00] based off of how the departments are organized. But how do you, for example, organize a department? Maybe pathology is inside of the hospital; a lot of hospitals in Switzerland have a separate institute for pathology. But all of those are old decisions. Now those old decisions feed forward year after year after year, where a hundred years later a decision about something as simple as structure that was made then now impacts how computer systems were purchased, and impacts how those systems are organized and how they communicate, and who has the ability to decide that this code goes to this sort of thing, and this kind of heterogeneity. So these hospitals work very well in an isolated format. But now when you build something in one format and you try and take it to another format, it's very different. An example of this is we're trying to take our algorithm now and perform a similar deployment and validation process at Emory, which also uses… So you'd think I should be able to…
Aleks: That was like my follow-up question.
Andrew: You would think you would be able to copy and paste this sort of thing. But what ends up happening is, [00:26:00] the 12 roles that you mentioned before: those 12 roles exist in spirit in every hospital, but they don't exist in practice in the same person in every hospital. So Geneva may have one person that does two of those roles, while Emory may have split those roles and have two different people that do them, but another role has actually merged three of the other kinds of responsibilities into a single person. So even just the overlaying of the person and the responsibilities that are involved in this process is not one to one. So you end up spending a lot of time just trying to figure out how to align these things from a person perspective, without even thinking about how all the technology is different. One has Epic, the other one doesn't. One codes them using this format, the other one codes them using that format. One goes and does the biopsy at this time of day, one does it at another, right? So there are all of these small little differences that have a larger impact than just the algorithm itself. So ultimately, when we succeed, because I believe we will, the algorithm itself [00:27:00] somehow will still not have changed. It will still involve the same steps of finding a piece of H. pylori, building a contact sheet, highlighting; all of that will remain probably 99% exactly the same.
But it's the entire environment around it that will have changed: how we deploy it, what kind of computers we deploy it on, whether we deploy it on a cloud-based system. All of these sorts of things have other ramifications that really prevent this kind of copy and paste between hospitals.
Aleks: I love your paper, because as you're saying this, I'm like, I just had the same experience. I went to the STP, the Society of Toxicologic Pathology Annual Meeting, which was, I think, the first time we met, but the European version. And as you're saying this about these two hospitals, Geneva and Emory, I'm like, "Oh yeah, there was one company that was deploying the software that we use for reading, and another one, and they had a totally different workflow." And it's so funny, because you will have different papers and people are like, but why are you redoing [00:28:00] this work, we already published it. Well, you have to redo the same work for your institution. And one thing I remembered was one person saying, "Oh, no, our QA didn't let us do it." Like, "What do you mean they didn't let you do it?" "Well, they didn't let us do it, so we had to figure out a workaround to get it done in a different way."
So it's like you cannot really argue hard; you have to figure out your way to get to the same, you know, outcome, whatever the outcome, however we define the outcome. The deployment of a tool may be an outcome, or whatever the tool is providing you may be the outcome. But it's so funny; I'm liking your paper more and more, even though it took me some time and effort to read it for the first time.
Andrew: It took a lot of time and effort to write, I can assure you of that. But you're absolutely right; I think the trick is in that word "same". You know, we're not doing the same work. We're doing very, very, very similar work. [00:29:00] And the difference between same and similar is hundreds of hours of effort in this context.
Aleks: So, question: do you see value in, and I think you guys have a consortium backing it up, and I'm jumping between my questions, but do you see value in consolidated efforts on how to do it, if everybody has to do it in a different way anyway? Because there were initiatives in the ToxPath space to do it together, and we were kind of ahead of the curve and had already done it, and I took myself out of this because I'm like, it doesn't serve me. What is your opinion on that?
Andrew: I think there's a large amount of overlap, and that overlap between hospitals is where we should be sharing expertise. So, I know that not every hospital has, for example, Sectra.
So going and mandating that everyone uses the same code to produce Sectra-type outputs is probably too much. [00:30:00] But we can agree on certain, let's say, standards; for example, GeoJSON is a common way of storing annotation information, and we can agree on what type of inputs we get. So if we can at least define a standardized way of receiving input and a standardized way of producing output, that allows more flexibility for some nuance and some changes on the inside. As well, from a, we'll say, regulatory perspective, it's not that they're fundamentally different, they're just slightly different, and that ultimately means most of them have the same requirements for information. So an example of this is, at Emory they have an internal risk assessment, which is done very differently in Geneva, but what I found doing both of these processes is that it's actually somehow the same information. So it's not a different way, it's a different kind of form, it's a different format they ask for, but it's the same kind of information. So if we go and build, and this is what I think we've started to do as part of that effort, if we go and start to build templated [00:31:00] documents, which are also in the appendix of our guide here, exactly aligned with the ones that we use internally in Geneva.
If you fill out those documents, or if everyone agrees that this is important information, which I think it is (I don't think there's a lot of fat there to trim), if everyone agrees that this is the important information and we all agree to fill it out in the same way and we all agree that these are important things, then even though the way each hospital reports that information is different, the actual information content is the same. So if we at least say, this is my algorithm, these are, you know, the five forms filled out with all of that information, I can then take that to different institutions and say, oh, copy and paste from here; oh, question seven is actually question nine from this, copy and paste; oh, this paragraph goes here, right? So it's more like breaking it apart into pieces and then filling it in where it goes, because the information itself is not fundamentally different.
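For readers unfamiliar with GeoJSON as an annotation interchange format, here is a minimal, hypothetical example of how one detected region might be expressed that way. The coordinates, property names, and score are invented for illustration and are not the Geneva schema.

```python
# Hypothetical sketch: one detected region expressed as a GeoJSON Feature,
# the kind of standardized output format discussed above. Coordinates and
# property names are illustrative only.
import json

detection = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # Axis-aligned bounding box in slide pixel coordinates (closed ring).
        "coordinates": [[[10240, 8192], [10496, 8192],
                         [10496, 8448], [10240, 8448], [10240, 8192]]],
    },
    "properties": {
        "label": "H. pylori candidate",
        "score": 0.93,
        "algorithm_version": "1.0.0",
    },
}

print(json.dumps(detection, indent=2))
```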
Aleks: Okay, so there is enough value in doing it together, not to reinvent the wheel.
Andrew: I mean, I think they also do this in, like, consortiums, right? When, you know, for example… [00:32:00]
Aleks: Yeah, yeah…
Andrew: Create, like, an oncology cohort, the oncologists from the different hospitals will get together and say, "What's the minimum metadata that we need for each of these patients?" And then they'll have meetings about it and they'll say, "Oh, we need at least the age. We need to know the smoking status. Oh, I want to know if this has a BRCA mutation. Oh, I don't care if it has a BRCA mutation." And then, ideally, there's some consensus that's eventually found, and then that forms that minimum viable cohort of metadata. And I think there's value in that, because now when people share those cohorts, you say, oh, I filled out the 30 required fields, you already know that you're going to start from a very, very strong place. Maybe there are a few other things to do, but you've already kind of set the stage; getting those 30 is already very difficult and likely requires pulling from different data sources, so if you already have that established, adding one or two more maybe isn't as complex as starting from scratch. So if we could do the same thing with algorithms and the requirements and kind of the expected validation and how we measure throughput and these sorts of things, then that will already get us maybe 85-90% of the way [00:33:00] there.
Aleks: So then let's go back to the validation and accreditation difference, and why are both validation and accreditation required? Do you need both? Probably, especially in a clinical setting.
Andrew: I've learned a lot about this. Just the definition of these two words is probably where I've had the most learning in the last two years. It took me a long time; I'm going to try my best here to explain it to the level that I've managed to achieve. A laboratory itself is accredited, and an accreditation is something that happens by an external entity. So the Geneva Hospital Laboratory is accredited. It's accredited by the Swiss accreditation service.
That means it's a 15189 laboratory. That's an ISO standard. If you've not read that standard, I recommend it. I found it very, very interesting, because it can apply to everything. I could use ISO 15189 to, like, cook eggs if you wanted to, right? All that accreditation…
Aleks: I'll cook eggs the research way Andrew.
Andrew: Quickly, quickly, and to taste. [00:34:00] The purpose of this accreditation is really a documentation process, right? So it says: what do we think is important? How do we detect when there are errors? How do we report those errors? How do we correct those errors? How do we then update our documentation to make sure that we know how our laboratory is running? Right? So it's essentially this idea of knowing what's going on inside of a large group of people, and if you have 20 or 30 people working together, if you want to have an accreditation or some visibility, you need some type of workflow. You need some type of diagrams. You need some structured way of communicating. And I think that's what that ISO is essentially providing for.
It's saying: I think this test is going to produce this result. This is how I'm going to check to make sure that that result is correct. If it's not correct, these are the things that I'm going to do to correct it. When I correct it, this is how I'm going to document that correction. This is how I'm going to transmit that information to the other people that use that workflow so they're aware of it, and then we're going to start the process over again. So, it's really…
Aleks: Oh my god, I'm gonna download it…
Andrew: …from this abstract concept, you can really apply [00:35:00] it to a broad set of things. Of course, in the medical setting with ISO 15189, there's medical-specific stuff, right?
They give medical examples. It's really very well tuned for that specific domain. But I would encourage people, if they do read it, to imagine it at a higher level and really appreciate how generalized of a concept it can be. It's about documenting and understanding what you're doing, how you'll detect errors, and how you're going to correct them. And what an accreditation agency does is they come and essentially look at all of the documentation that you've put together and say: is this a functioning system? Will this actually propagate the information correctly and percolate errors up to the top, so that people that are qualified to address them can address them, and then document and inform everyone that it's been addressed? So the external accreditor is really just validating, or accrediting, the fact that that system you've put in place is actually a functioning system. And the interesting thing here, really the trick here, is that multiple laboratories are accredited, [00:36:00] but everyone can come up with their own version of that process, right? And this is again what we described before, because maybe this person doesn't exist at that hospital. So you can't go and use, for example, the Geneva accreditation of their digital pathology workflow at Emory, because there isn't a one-to-one match of the technologies, of the people, of the image management systems, of the documentation system. You can use it for inspiration, and you can say, like, oh, we need some way to track document changes, we're going to use this piece of software; oh, we need some way to alert people, we're going to use email, or whatever it is. So you can be inspired by it, but you'll never be able to directly apply it. The accreditation agency comes and looks at your version when you think you're done with it and verifies that, yeah, this is working correctly, and then goes and reviews the last year or so and says, yeah, everything has remained within those bounds, in the sense that you've described the system. Now they're going to perform yearly checks, trying to make sure that you've actually abided by that system. So that's what that auditing is [00:37:00] saying: you've told us that you were going to do this; now let's make sure that you actually did do this to that level of expectations. And if you wander outside of there, then, you know, they give you recommendations and say, oh, you didn't do this or that, or whatever it is. So that process is really just having a third party verify what you've done. So that's my understanding of what accreditation is.
Aleks: So you say, "Oh, you can use it for cooking eggs." And I'm like, "No way am I going to ever touch it." And then like two minutes into your conversation, I'm like, "Yes, I think I'm going to download it for my next presentation that I'm giving at the at the conference, which is about like how to make sure that the AI tools that you're using are working."
Andrew: The thing is, it's more vague than I expected. It's a very interesting document, because it's not saying you need to make sure this scanner produces this result. It says you should organize a committee of people that are experts in that area, and using that, you should determine what the metrics should be and what the minimum useful values are. That's what the ISO says, [00:38:00] right? That's all it says. It doesn't say what they should be. It doesn't say how you should get those experts. It just says this is how you do this thing. And then if you take that, you can apply it to a very broad number of topics. It could really be anything. Because, oh, experts? Well, I want to cook an egg. Let me get a bunch of chefs that have cooked eggs and we'll talk about what they think the best egg is and how we get those metrics, and now we just document that. What does that document look like? Right?
So, it's all kind of this self-feeding thing, where you get to come up with and describe it yourself and then just make sure that you actually execute.
Aleks: Okay. Validation.
Andrew: Validation is the process... well, in Geneva, for example, we validated our algorithm as a lab-developed test. This is maybe not too surprising to the histology folks, because you could do that for, you know, IHCs; or PD-L1, I think, is another common one where they do it as a lab-developed test. You basically can buy a test externally and then you make sure that it works as expected. So for validation here, we basically ran the tool and we asked ourselves: [00:39:00] is the tool working the way we expect it to? What are the requirements for the tool? So this is what we defined a priori. We said, oh, you know, we want like 100% sensitivity, we want this, we want this, we want this; you come up with the different metrics that you want with your colleagues and with the experts, right? This is very well established within the kind of workflow there. And then you go and see if you actually met those requirements, right? So it's really just: I want this sort of thing; how can I verify that I've actually reached those goals? And an example of that is, you know, you can run it for three months in routine practice, you process some 2,000 cases, and then you go and say, did we hit the actual metrics? Yes, we hit all of these. Okay, well, that's a lot of evidence that suggests that our tool is validated to work the way that we think it's going to work.
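As a rough illustration of "define the requirements a priori, then check whether the validation run met them", here is a small, hypothetical Python sketch. The metric names and thresholds are invented; in practice they are set by the experts within the accredited workflow, as described above.

```python
# Hypothetical sketch: compare validation-run metrics against a priori
# acceptance criteria. Thresholds and metric names are illustrative only.
REQUIREMENTS = {          # defined with pathologists/experts before the run
    "sensitivity": 1.00,  # e.g. "we want 100% sensitivity"
    "specificity": 0.90,
    "cases_processed": 2000,
}

def check_validation(results):
    """results: dict of observed metrics from the routine-practice run."""
    failures = {}
    for metric, minimum in REQUIREMENTS.items():
        observed = results.get(metric, 0)
        if observed < minimum:
            failures[metric] = (observed, minimum)
    return failures  # empty dict means all a priori requirements were met

run = {"sensitivity": 1.00, "specificity": 0.94, "cases_processed": 2103}
print(check_validation(run) or "All acceptance criteria met")
```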
What I think is most interesting, and that I'd like to stress here, is that there really is no difference between us validating a tool as a lab-developed test and us purchasing an FDA-approved or CE-marked tool, because we still have to do that validation, right? In order to introduce something external [00:40:00] into an accredited workflow, part of our accredited workflow says: when you get external tools in, you have to validate them to make sure that they work, which kind of makes sense.
So even if you buy a tool, you still have to make sure that it works in your workflow. That makes sense. What's very interesting is that if you understand, at a deep level, how all of this stuff works, you can actually get good documentation for anything. So say I tell you I have an algorithm and it's 51% accurate. If I correctly document it, and I say this is 51% accurate, I could validate that it works with 51% accuracy, and now I can have a tool, as long as it's well documented. And, right, one of the concerns here, as part of that ISO requirement, is you'll have some risks. What is the risk? It's wrong 49% of the time.
Aleks: How are you going to mitigate that?
Andrew: Right. Now, if I come up with a good mitigation strategy, I can go and validate a 51% accurate tool, [00:41:00] if I have a good mitigation, and I can accredit that. An accreditor would look at that and say, I don't think it's a useful tool, maybe, but you've decided with your experts, and through your workflow you've gone through each of these steps. So I think what's interesting is people imagine that, oh, if I buy an FDA-approved algorithm or if I buy a CE-marked algorithm, it's 100% accurate or it's going to perform perfectly. That's really not the case. That FDA mark is also a similar concept: they said, my tool is going to perform at least this well, and then they showed that it performs at least that well. The question that a user, or someone that goes to purchase it, should ask is: what did you set as your requirement? Now, if they set it at 99.999%, that's great, but they easily could have set it at 70% and then shown that their algorithm is at least 70% accurate. The question that you should ask yourself isn't, does it have the FDA mark. The question is, what are the actual metrics that they suggested they would hit, and that they then managed to successfully hit. [00:42:00]
Aleks: Okay. Even for scanners, for everybody: you still have to test it in your lab. Like you said, whenever you bring in a tool, you have to incorporate it into your workflow the way you're supposed to, as mandated by your regulatory agencies.
Something you said is that the accreditors come in and say, okay, everything is within certain parameters, and a year after that it was still all good. And we had this discussion as we were going back and forth with the questions, and there's kind of a misconception among people who are newer to these digital tools, especially since, like you mentioned, there was deep learning involved in the algorithm. They think the algorithm drifts, that it gets trained every time it's used, and the question is, how do you mitigate it? Like it has a life of its own. And the truth is, [00:43:00] once you're ready to validate it and you version it, it cannot have a life of its own anymore. But what does have a life of its own is the population drift that we are experiencing, or like the disease drift.
So what do you do with that? How do you address it when your tool starts underperforming, not because anything happened to the tool, but because the presentation of something in the biology changed in the population? It's probably not going to be one year, but like five years. Why not? Like in the ToxPath space, you're not looking at control data beyond five years, because there is genetic drift in the population of animals. So how do you handle population drift affecting tool performance?
Andrew: So I think what's interesting here, you know, I'll just completely agree with what you said: that algorithm doesn't drift, right? That algorithm is locked down to a specific version, and that version is written, at least in Geneva, I'll speak from that experience, in that accreditation [00:44:00] or that validation document that says: this is the version that we used, it was made on this date, and that's it. It's then frozen in time, so it's technically impossible for that algorithm to drift, because that algorithm is just a bunch of computer code that's in zeros and ones, and those zeros and ones are locked down; they're not changing, because if they changed, then we wouldn't have control of our system, and now our system would be fluctuating. As you said, what can change is the input to that, or maybe some pre-analytics.
So it's interesting, because there are multiple cases where this can happen. One is, as you said, genetic drift, population drift: you know, if you have, let's say, a large immigration population, that maybe shifts the patient demographics, or, you know, these sorts of things happen. But you also have more subtle things, like we bought a new scanner, we bought a new stainer, we bought a new microtome that just cuts a little bit thicker than the other ones, which then somehow will impact algorithm performance. So all of these sorts of things are possible latent variables that feed into that system. [00:45:00]
What we can do, and this is also part of that ISO requirement, is monitoring, right? There's a heavy component of monitoring here, just to make sure that you know what's going on, right? Do I know what data is coming in, do I know what data is going out? I think what's interesting about this ISO concept is that if you change a scanner, changing that scanner will, by definition, trigger other things in that workflow, right? Because you go and look at the hierarchy of how the documents are interconnected in this accredited workflow. If I go and change the thing at the top, it says, "What things are connected to this?" Right? So, when we fill out our validation document, it says, "What scanners did you use in order to validate this tool?" And you list those scanners. Now, if I have a new scanner come in, I'll look at that document and say, "Oh, actually, this tool is no longer valid for this scanner, because it was never validated. Should we validate it for the scanner?" Right? So now there's kind of an interconnection, where as pieces, or at least large pieces, of the puzzle are changing, [00:46:00] the other part of the puzzle becomes aware of that and asks, is everything still in control, or is it out of control? For the population drift and these sorts of things, I think there are excellent opportunities for tools like HistoQC to perform routine monitoring and say, you know, is my hematoxylin still within my acceptable hematoxylin range?
I know what it looked like for the last five years; I can measure that, set it as a baseline, and then start measuring deviation from that baseline. We also know when the algorithms aren't performing, because ultimately the pathologists will complain, as they should, right? If something is wrong, just to be clear for any pathologists here, if something is wrong, they should tell someone and say, you know, the last five times I saw the output from your algorithm, it was completely wrong, or it was more wrong than usual; can you look into it? Something strange is going on. And they should report that, and they should feel comfortable reporting that, and then you go and run additional tests. The solution, ultimately, if you can't control the laboratory, because, you know, maybe the company that sold you that hematoxylin and eosin went out of business and you have to get it from another [00:47:00] third party, or the scanner broke and you had to buy a new one, whatever it is: you can still fine-tune that model. You can still go and, you know, change things if you need to change things. But you need to be aware that that's an expensive process, because now, as soon as I go and change things,
I've lost my LDT. My LDT was for this specific thing, not a version of it, but for this one in particular. So if I go in and now start changing my model weights, that's a different model. Now the question becomes, is it sufficiently similar? Then you should have a conversation about it. So there is kind of this opportunity to improve and correct, right? That's not really a big fear. It's maybe the idea of monitoring to be able to detect when that needs to take place.
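A hedged sketch of the monitoring idea discussed here: establish a baseline for a simple per-slide quality metric (for example, a HistoQC-style hematoxylin statistic) and flag new slides that deviate beyond an agreed range. The metric, the baseline window, and the three-sigma rule below are assumptions for illustration, not the Geneva procedure.

```python
# Hypothetical sketch: flag drift in a per-slide quality metric against a
# historical baseline. The metric, window, and 3-sigma rule are assumptions.
import numpy as np

def drift_alerts(baseline_values, new_values, n_sigma=3.0):
    """baseline_values: metric from a validated historical period;
       new_values: the same metric for recently scanned slides."""
    mu, sigma = np.mean(baseline_values), np.std(baseline_values)
    alerts = []
    for i, v in enumerate(new_values):
        z = (v - mu) / sigma if sigma > 0 else 0.0
        if abs(z) > n_sigma:
            alerts.append((i, v, round(z, 2)))  # slide index, value, z-score
    return alerts  # non-empty -> investigate scanner/stain/population change

baseline = np.random.normal(0.55, 0.02, 5000)  # e.g. years of hematoxylin means
recent = [0.56, 0.54, 0.71]                    # 0.71 would trigger an alert
print(drift_alerts(baseline, recent))
```

An alert like this would not change the locked-down tool by itself; it would feed the documented workflow above, where experts decide whether re-validation or fine-tuning is needed.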
Aleks: Is there anything that needs to change in the regulation to scale the deployment of a successful tool, like, let's say, your H. pylori tool that is successful? Anything? Because, I don't know, and you can tell me that I'm maybe missing something, but I don't [00:48:00] know what would need to change in this particular scenario to make it more scalable, because you have to go back to the lab and do the same work again.
Because I recently had a conversation with Keith Wharton from Roche, when they got their clearance for their scanner. And it was the sixth scanner, and he analyzed all these scanners in his paper, and he says, well, they're all doing the same thing, performing the same; why do we need to do all of these? Well, he doesn't say that, but basically the question is, do we still need to do this double work? Because to me it looks like double work. Because you buy it into your lab and then you have to do the work again.
And you know, you kind of already know that it's working; it should work if we're doing primary diagnosis on it. But do we need another, say, tenth scanner to go through three years of studies for the FDA and then a couple of months of validation in the lab? What's your take on that? [00:49:00]
Andrew: So I think the thing is that the regulation differs depending on the risk profile. In that sense, Geneva can build and validate an algorithm as an LDT that works in Geneva, on Geneva data. It's a very small, focal thing, and the people who built it and have the expertise in it are all essentially local in that environment. So I would argue that in that case it's fairly low risk, because everyone is aware and knows it, right? As for the FDA requirement, I saw that podcast with the Roche gentleman that you mentioned.
Aleks: You did?
Andrew: Yeah, I saw it a few days ago.
Aleks: Thank you so much for tuning in. You're a digital pathology trailblazer.
Andrew: For sure, my pleasure.
And I think the challenge is that that sort of thing has a much higher risk profile, because now, if I want to take my H. pylori algorithm and release it as a test that anyone in the entire world can use, it's not just affecting a very small [00:50:00] percentage of the world, the Geneva patient population. Now it has the potential to impact 8 billion people, right? That's really the scale difference. It goes from a few hundred thousand people to 8 billion people very quickly, especially with software, because it's copy and paste, copy and paste. So that's really the difference between a lab-developed test, where we know the population, we know it works in our population, we test it and we monitor it, versus something that Roche deploys, where that scanner can be bought by anyone, anywhere in the world. So it has to carry a much higher level of regulatory concern because of that potential risk profile.
Aleks: Okay. Well, I get it. You're right. I agree.
Andrew: And just to be clear, our LDT test, we cannot transfer that to anyone else. Our LDT status is a Geneva LDT status; it's no one else's. So I can't send it to, you know, Bern, and have them say, [00:51:00] "Oh, well, Geneva validated it as an LDT, so we're just going to use it." That's completely not permitted, right?
As soon as you start transferring things, it sounds like you're selling a product, right? It sounds like you have a product. And if you have a product, now you rise to this level of, oh, maybe I need an FDA approval, maybe I need a CE mark, because that risk profile is now not just me; it's all of these other hospitals that may, in fact, be impacted.
Aleks: But you can do an LDT as a service, right? So they could send you samples, and then your algorithm runs on their samples. That's how you could kind of commercialize these. Okay.
Andrew: My understanding is that it's very similar to IHC laboratories. Some laboratories have a given test and others don't, and maybe the clinician requests it, the pathology lab says, "We don't have it," and so they send it out to an accredited laboratory that has that test validated as an LDT, and that lab performs the test under its [00:52:00] own purview. I see no difference for algorithms in that context.
Aleks: I like your tool because it's not rocket science. It's not something where, without the tool, the pathologist isn't going to do the task anymore. But it could still lead to deskilling. There was a publication in the Lancet, by a Polish author actually, that they cited at the NCCN AI for cancer care guideline meetings, about the deskilling of endoscopists when they would use AI to screen for colorectal polyps during colonoscopy. And I was thinking, is that a big deal? Would they deskill themselves if you take away the tool? And if they're better with the tool, why would they need to maintain the skill without the tool? How would you address this concern, [00:53:00] especially for skeptical pathologists and other medical professionals?
Andrew: You know, I don't have a great answer to begin with. I have ideas for…
Aleks: I don't either, but I want to hear your opinion on that.
Andrew: I think the next generation will see, first of all, more human-computer interaction studies, right? Because if I think about it, I've been in this field about 20 years. The algorithms we had 20 years ago, just bluntly, were not good. They were brittle, they weren't robust, they didn't work very well across different hospitals, and they weren't fast enough because the computers weren't fast enough. There were just a lot of limitations back then. But technology advanced, the field advanced, and I feel like we've kind of been blessed with fire; deep learning is somehow fire. In the last few years we've gained this technology that allows us to do things we had been struggling with for years before, and that has really advanced our capabilities. So one of the things I think we have to be cautious about is that deskilling and these sorts of [00:54:00] questions were things we always philosophized about before, but we never really studied them, and we couldn't study them, because we simply didn't have good enough algorithms to do it, right?
If your algorithm is 70% accurate, no one was saying, "Oh, we should clinically deploy this and then everyone's going to be deskilled." All the pathologists were saying, please go make your algorithm better and then we can discuss it. So maybe now we're at the point where this becomes an issue, and I genuinely believe that; we ourselves are looking into how we can go and figure these things out, because I think it's a very important concept. I would also mention that Inti Zlobec put out a paper, I was a co-author on it, about a year ago, where we used a deep learning algorithm for tumor cell fraction and surveyed senior pathologists, junior pathologists, residents, all these sorts of folks. What you end up seeing is what happens when the algorithm is wrong, because algorithms are always going to be wrong somehow, right? They're just not that smart; they don't understand the full context like a pathologist does [00:55:00] when they have to report. So when the algorithm was wrong, the senior pathologists were very confident and perfectly able to just disregard the result and say, "I don't like this. I'm going to write down my own value." And that's what they should do. That's the right thing.
The challenge we found is that a lot of younger pathologists were actually negatively impacted by an incorrect deep learning result. They kind of said, "You know what? Maybe I didn't do this the right way. Okay, I'll add an extra 3% because of this."
And consistently, it got them further away from the truth instead of closer to the truth. So, it seemed like this is exactly what that deskilling is, except in that context, they were never skilled to begin with, right? They were just starting their skilling process, and then they received the tool, which, if they had continued to use that tool, may potentially have actually prevented them from reaching the expected level of competency.
So maybe one answer is that people should be trained without those algorithms and be expected to perform up to a specific par, with what counts as reasonable to be discussed with experts, [00:56:00] and reach that par before they get introduced to the algorithms and are able to take advantage of them, so that they have the internal confidence to say, "This is garbage, I'm not going to use it." That's maybe one solution.
Aleks: Yeah. Interesting. Interesting. And you mentioned the study of human-computer interaction. I'm going to tell you a story, a personal story. My kid went to school. He's in first grade, and he was homeschooled last year, so I didn't know if they were going to throw him out of school or something. Anyway, the first week he comes home with math and I explain something to him. The second week, he does it more or less on his own with a little supervision. The third week, he does it on his own without any supervision, and I realize he can actually read, because he reads the instructions. And then week four, I get
a letter from school: your kid needs serious help with math. And I'm like, what? It doesn't track at all. [00:57:00] And it turned out he doesn't have access to a laptop at home. At the beginning they gave him some standardized test on a laptop, and he scored in the lowest percentile he could be. I'm like, no, I don't think it's that bad; maybe the first week I had to help him. But it totally was because of the tool that he was not used to using, and obviously over that month his reading skills also improved; that goes fast in first grade. But yeah, tool use is another aspect of this, and that brings me to a question: how do you train teams on tools? Do you even need to pay attention to that nowadays? Now everybody knows how to click around in Microsoft Word; you don't really get specific training. So how do you train teams? Should tools be training-free, so good that you don't need the training? How do you deal with that?
Andrew: No, I think your point is very valid, because it impacts many things, right? [00:58:00] If you look at, for example, IQ tests in the past, there were exactly these types of biases as well, because the IQ tests were designed in the Western world using Western cultural norms. Then, if you transplant them somewhere else, people in Asia, as an example, had extremely low scores, and people said, "Ah, these people have low IQs," and it turns out it wasn't…
Aleks: Not really…
Andrew: No, it wasn't. The problem was that the way you tried to measure it was an incorrect measurement; it was a surrogate for too many things, and you weren't able to measure the actual thing. So I think…
Aleks: I need to interrupt you for one second here. Another thing that gives the perception of intelligence, or the lack of it, is how well you can speak a certain language, which totally distorts your perception of that person's domain knowledge. So that's another thing that goes into this.
Andrew: Absolutely. Which is also [00:59:00] interesting, because I would argue that your ability to speak English probably has no bearing on how well you can solve math equations. But somehow…
Aleks: It does not…
Andrew: …It does not. But if the only types of questions you're exposed to are very complex word problems, where it says Peter went to the store and bought 55 different watermelons, and you don't understand that sort of thing, then you end up with an inaccurate representation. It's like that quote: it's a terrible thing to measure a fish's quality by how well it can climb a tree, right?
It just doesn't make sense to do that sort of thing. So you have to figure out exactly what you want to measure and how you want to measure it, and make sure it's aligned with the thing you're actually trying to measure, right? With any gap there, you get these really weird downstream metrics or results that you look at and go, "Something's wrong here," as you noticed yourself. If I can segue from this, that's how the ISO standard approaches it, because ultimately, [01:00:00] if a person is going to be using something, the ISO concept is that they should be trained to use it. Maybe not necessarily trained, but they should be verified as knowing how to use it, right?
So one of the concepts is this idea of a competency register. If someone wants, for example, to use our H. pylori tool, they're encouraged to do that, but they have to undergo training. Now, if they already know how to use the tool, that's great. You send someone over and say, "Show me how you use the tool." They use it in front of them. Do you have any questions? No. Great, this person's competent. I saw them do it, they did a few cases, everything looks great, no problem, right?
So it's not even that you have to train them; it's more that you have to verify that a person is doing something the way you think they're doing it. And if they're not, then maybe you train them until they do it the way they should be doing it, that sort of thing. I think this connects with everything. As an example, I have some students, and we use a lot of Linux and a lot of high-performance computing. On occasion I'll say, hey, can you just share your screen, [01:01:00] maybe because they're having a technical problem of some sort. They'll share their screen, I'll look at how they're doing things, and I'm like, oh, I realize I failed here; I failed to give you the preliminary instruction. There are like five Linux tools you should be using right now that would have immediately prevented this problem from arising. You see people trying to copy things over manually and you're like, there's a tool for that; sorry, you should have used it. It's all of these things you don't really think about because they're so ingrained and natural for you. A lot of it is just making sure everyone's at the same level, and that's really hard to do, especially in research settings where the cutting edge is advancing very quickly and everyone comes in with different levels of competency. To get everyone to at least the bare minimum, you should always be verifying, if not training.
Just verify: can you show me how you would do this? Okay, you know at least the right tools. Maybe I can assume you could put those tools together well enough; maybe that's good enough. But in something like [01:02:00] patient care, maybe it has to be stricter than that: "Show me you using it. I'll watch. Great, you did a great job, you're competent," and then we move forward. It's like scuba diving. Even after they teach you all the material, you still have to actually perform it in front of the instructor, and they say, "Show me this, show me this, show me this." There's a written exam, of course, but it's really that practical part where they say, "Show me these different components." You show them, and they say, "Okay, I'll sign off. Here's your license." Right? It's about helping you make sure that you know what you're supposed to know as well.
Aleks: So, a story. I was recently doing a video for Grundium, for this little scanner, and they sent me the thing in a box. I make the video, I open it, do the unboxing, and I send it for review. They were very, very diplomatic and gave me feedback: oh, fantastic, maybe make the music a little less loud, and when you open the box, can you not hold it horizontally? It was designed to be opened vertically [01:03:00] so that you don't drop the scanner. And sure enough, there are block letters on the box: open in a vertical orientation. And there I am in front of the camera, opening it, super happy. But then they said, "Oh, it's interesting customer feedback, thank you for that. You didn't read the instruction on the box; it was very big." So you're not going to see that in the video; you're going to see it done the correct way. But that ties into this, you know, show me how you use the tool and then I'll tell you
if you need to train a little more or not.
Andrew: I mean, this for sure happens everywhere, right?
Because, you know, we have these NIH-supported, open-source tools that we've done webinars on, and we're now working on the next versions, which scale to billions of objects. I think it's a really impressive next iteration of these tools. So as part of this, we've actually done user studies where we've had some of our pathologist colleagues come in, and we've not told them anything, [01:04:00] right?
We said, "Here's the tool. I want you to do these things and it's called a usability study." And you say, "Can you upload an image?"
Okay, how would you upload an image? And they do it. Okay, what would you do now?
Aleks: Drag and drop.
Andrew: You want to annotate something. How would you do that? They're like, "Oh, I guess I would click on this." Okay, why don't you do that? Right? And then you're kind of asking them to go and do it. There's this concept called thinking aloud, where they're supposed to say, "Oh, I know I want to add an image. I guess I want to add an image. I'm going to do that." Right? Then we record them and review it afterwards. We did this recently with one of our tools, and the feedback was invaluable, because of the number of assumptions we had made, very similar to your story: it was obvious to us, but our documentation wasn't as good as large letters pointing at things or whatever it is. But that's how it is when you work so closely with something.
You don't fully imagine how that other person, [01:05:00] someone completely naive to the tool, will come in and approach it, so you go and study these sorts of things. It's the exact same thing as the ISO accreditation component: say I think I've built a very reasonable tool. Show me how you would use it. And you start saying, oh, you're going to click there? Yeah, I see, I see. Okay, I'm going to remove that, I'm going to add a button over here, I'm going to put this word here, I'm going to do these sorts of things. And that's also part of this iterative process.
So eventually, it's kind of like the iPhone effect. If you've done a great job, and I imagine they did tons of usability studies for that first iPhone, you can give it to someone who knows nothing about technology and they just start clicking around. Oh, you can pinch to zoom, and they're like, "Oh, look at this, I can do this, I can do this, I can do this." Right? It becomes very intuitive for them, even if they don't have that background. And that's really where we should be aiming.
Aleks: That's so funny, because I have an Android and I'm like, I don't want an iPhone because I don't want to learn a new operating system. And that's the one that apparently I would not have to learn so much. Yeah, I like the paper. I'm going to be using it in [01:06:00] my next talk, which is this week. I'll probably publish the NotebookLM version before we have this episode published, so that people are aware of it, and I'm going to link to it. For you, what's next? Are you guys making more tools, or what's the next frontier for you?
Andrew: So essentially, now that we have this workflow, and I think we've done a reasonable job of generalizing it, we're just grinding away at that workflow and trying to improve it itself. We're building a number of other in-house tools, similar diagnostic tools, that let us work on that workflow and make it better. And I think what's interesting here is that, in the near term, there's really only a handful of discrete categories, right? You have a detection category, you have a counting category, and you have maybe a segmentation category. And that's it, right? That's [01:07:00] the bulk of what these show up as…
Aleks: Let me show you something… This is my book, Digital Pathology 101.
Andrew: If we could build use cases for all of these, like in terms of templated documents, in terms of understanding what the validation is expected to be, how we're going to display the results, how we're going to get feedback from the pathologists, then we'll probably have 95% of the space covered.
Aleks: Ah, shoot, my camera is following me. These are what we just said: detection.
Andrew: Exactly. There's only three categories. There's only three.
Aleks: Yeah, you can count cells, find regions, and classify them, right?
Andrew: That's it.
Aleks: And then you figure out how to prove it, and you're good to go. You can deploy, and for every problem you can have a tool. And for every tool, you can have a problem as well.
Andrew: So I guess my thing now is to figure out how we can go in that direction and make it even more efficient, right? How we can build larger-scale [01:08:00] deployment efforts, so we can build tools more easily and validate them more easily. I think there are opportunities for things like large language models to help, because we have to write reports, and I don't really know that there's a lot of value added by us writing those reports, even though they take us a long time. So if I have to explain why we need an H. pylori detector, should I spend two hours writing that, or can I just ask ChatGPT, "Can you write me a page about why we need this?" and then quickly review it and say, "Yeah, this is accurate"? Now it takes me five minutes instead of two hours. And how many parts can I do that for? Of course I have to actually run my own experiments and do my own statistical analyses; I find that stuff interesting, and I'm happy to do it. But that's a small piece of this. So the questions are: how much of this can be generalized, how much can be reused, copied and pasted with slight modifications, so that we can really scale up, with a goal of, let's say, 50 algorithms in the next five years? What does it look like to start thinking at that level? And I don't think it's going to be that bad [01:09:00] if we really narrow down and find the bottlenecks, which I think we're already doing a very good job of.
If we find these small things, like, oh, every time we do this it takes two weeks, but if we do it this way, maybe it'll only take a day. Let's try that. And then, by putting these things together, maybe it becomes a lot easier to do. I think that's it.
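As a rough illustration of the report-drafting idea Andrew mentions above, here is a minimal sketch of asking a large language model to produce a first draft that a human then reviews. It assumes the OpenAI Python client; the model name, prompt wording, and output file are hypothetical placeholders, not the workflow actually used in Geneva.

```python
from openai import OpenAI  # assumes the openai package is installed and OPENAI_API_KEY is set

# Hypothetical sketch: draft a one-page justification for an algorithm so that a domain
# expert can review and correct it, instead of writing the whole document from scratch.
client = OpenAI()

prompt = (
    "Write a one-page justification, suitable for a clinical validation dossier, explaining "
    "why a pathology department would benefit from an automated H. pylori detection tool on "
    "gastric biopsy slides. Use a neutral, factual tone and mark any claim that needs a "
    "supporting reference as [CITATION NEEDED]."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name; substitute whatever model the lab has approved
    messages=[{"role": "user", "content": prompt}],
)

draft = response.choices[0].message.content

# The draft is only a starting point: an expert still reviews, corrects, and signs off,
# which is the "five minutes versus two hours" trade-off described in the conversation.
with open("hpylori_justification_draft.md", "w", encoding="utf-8") as f:
    f.write(draft)
```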
Aleks: And then you make an agent that does that.
Andrew: That would be pretty cool.
Aleks: There is so much cool stuff that can be done with it. And my problem is, in any institution, carving out the resources to look at the workflow from the outside and see where, in the short term, we need to put energy so that, in the long term, we can decrease the burden, or scale, or whatever. And the organizations that are thinking that way, you see it in the commercial space, where you have somebody [01:10:00] dedicated to this. Obviously healthcare is super complex and drug development is super complex, but you have people whose job is to walk around and see what could be done better with AI, and then they get a team, they do it, and then people just pick up the iPhone and start using it. Or the Android.
Andrew: I think it's exactly that. The thing is, in Geneva we're very fortunate. We have a head of department who is extremely visionary, who sees this value and is willing to make the investments to do these sorts of things. And we see other institutions that are not as committed, and things don't get done, right? It's essentially impossible to build a digital pathology pipeline and then build an algorithm on 2% effort from one person, right?
We really need to commit to having the right team, the right people in the right place, with the right resources and the right technology and all of these sorts of things.
Aleks: For three years, right?
Andrew: For three years…
Aleks: Or for whatever it takes. It's not that you do, oh, let's do it for a month and see if it works. It will not. [01:11:00]
Andrew: Yeah, exactly. So it's really about the commitment and the steadfastness, saying, "I believe in this; this is where we need to go," and then corralling the people and getting them to focus on achieving that goal. And it's hard to do that. It's not something trivial that everyone everywhere in the world has. Based on the last statistic I read, I think nowadays probably less than 1% of laboratories in the world have fully digital pipelines, or even partially digital pipelines.
It's still in its infancy. And what's the real difference between, let's say, Geneva and somewhere else? It's just pure vision, and the courage to go and deal with the consequences of that execution, to struggle. As I mentioned, it took three years, and it for sure wasn't easy. It's not as easy as doing a research project, where maybe I can sit down in isolation, write some code over a few months, and say, oh, this works, let me write a paper. That's a lot easier, but the scope of impact is much smaller. If you want to impact patient care, [01:12:00] you have to commit to a much broader challenge to get these things into the hands of the users who can actually use them.
Aleks: Thank you so much, thank you so much for writing it and for joining me today. I hope you have a wonderful rest of your day.
Andrew: Thank you. I appreciate it. Always a pleasure. I look forward to catching up again.