
Digital Pathology Podcast
122: The Role of Generative vs. Non-Generative AI in Medical Diagnostics
In this episode of the Digital Pathology Podcast, I explore the evolving role of Generative vs. Non-Generative AI in Medical Diagnostics. As AI continues to transform the medical field, understanding the differences between these two approaches is essential for pathologists, researchers, and healthcare professionals.
We break down the key concepts behind generative AI models (like ChatGPT and image-generation tools) and non-generative AI models (such as traditional machine learning for diagnostic support). I also highlight a groundbreaking seven-part AI review series published in Modern Pathology, which serves as a crucial reference for integrating AI into pathology.
🔬 Key Topics Covered:
- [00:00:00] Introduction and Technical Adjustments
- [00:02:00] Why AI Education in Pathology Is More Important Than Ever
- [00:04:00] Overview of the Modern Pathology AI Review Series
- [00:06:00] Generative vs. Non-Generative AI: What’s the Difference?
- [00:08:00] AI in Pathology: Current Applications and Future Potential
- [00:12:00] Addressing Bias and Ethical Concerns in AI Models
- [00:16:00] How AI Can Improve Accuracy in Medical Imaging
- [00:20:00] The Role of Large Language Models (LLMs) in Pathology
- [00:25:00] Multi-Modal AI: The Future of Integrating Imaging and Text Data
- [00:30:00] Real-World Use Cases and AI-Driven Diagnostics
🩺 Why This Episode Matters:
AI is no longer a futuristic concept—it’s here, and it’s shaping the future of digital pathology and medical diagnostics. In this episode, I break down how AI can enhance accuracy, improve workflow efficiency, and make diagnostic insights more accessible. However, AI models also come with risks, such as bias and interpretability challenges, which we need to address responsibly.
🚀 Take Action:
AI in pathology isn’t just a passing trend—it’s a paradigm shift. Whether you're a pathologist, researcher, or lab professional, this episode will give you the knowledge you need to stay ahead in the era of AI-driven diagnostics.
🎧 Listen now and explore the future of AI in pathology!
👉 Watch it here: https://www.youtube.com/live/Mq4Xwxoq_ok?si=o7bA90BlZff9iI_A
#DigitalPathology #AIinHealthcare #PathologyInnovation #GenerativeAI
Become a Digital Pathology Trailblazer, get the "Digital Pathology 101" FREE e-book, and join us!
[00:00:00] Good morning! Welcome, my Digital Pathology Trailblazers. So happy to be here. Trailblazers, we have something super, super special today. I kind of have been waiting for this for over a year. So what happened? There is a resource about AI that we can all use, and no excuses anymore that there are no resources, that the societies are not stepping up, that our education and training are not good enough, because we have the resources.
Learn about the newest digital pathology trends in science and industry, meet the most interesting people in the niche, and gain insights relevant to your own projects. Here is where pathology meets computer science. You are listening to the Digital Pathology Podcast with your host, Dr. Aleksandra Zuraw.
Let me tell you the story, [00:01:00] how I knew that it was going to happen. So what am I talking about? I'm talking about the series in Modern Pathology. Modern Pathology is the journal of the United States and Canadian Academy of Pathology, USCAP, which has a conference very soon in March, and I'm going to be there.
So if you're going to be there, let me know in the chat. I'm going to USCAP. Do you pronounce it U-S-CAP or USCAP? People have different versions. Anyway, going back to the main topic of our stream today: we have this seven-part artificial intelligence review series. So basically USCAP stepped up and invited a lot of people.
And what I'm showing you is the introduction. Let me tell you who is responsible for this, the main responsible people. Well, they are not the only [00:02:00] authors, but they worked on it pretty hard: Hooman Rashidi, Matthew Hanna, and Liron Pantanowitz, and they are from the Department of Pathology, University of Pittsburgh Medical Center.
And they are working on making it a super strong AI-in-pathology research and educational center. And what happened? How did I know about it last year already, even the year before? Actually, I knew about it in 2023. I met Hooman at ACVP, the American College of Veterinary Pathologists conference, where I was a speaker and he was a speaker.
We spoke at the same session, and he was talking about synthetic data. And there is going to be a paper on synthetic data, but he mentioned that that's what they were doing. That's what Matthew and Liron and a bunch of other experts in the field were doing: working on this seven-part series. And I'm like, [00:03:00] okay, good, good for you.
You're working on those papers. Fantastic. So we sat at dinner together and we were geeking out on AI and different image analysis things, and he told me about those publications. And obviously he's an author of many publications. I'm like, okay, okay. And then I went to the USCAP conference and I learned from Patrick Miles, the CEO of PathPresenter, that Raj Singh was also contributing.
And then I learned about other people contributing. So it kind of made this thing real for me. But you never know when those publications will be out, and now they are out. And basically they gave us a super timely, peer-reviewed... I don't want to say blueprint, but definitely a guide. And they call it a guide.
They call it a guide. Well, a review series. Anyway, so we have this resource in Modern Pathology. And what are we going to be doing for the next seven weeks or more? Because when I went through the first paper, it may be that it's going to take longer to [00:04:00] discuss it in the live stream. So let's discuss what we've got.
This essential seven-part artificial intelligence review series. So this is a series within Modern Pathology; the responsible organization is USCAP, and there are contributions from global experts, which is fantastic. It's not just me reviewing some more or less random papers coming to my inbox from PubMed.
It's actually people who are doing it, people whose departments are digital and are embarking on the AI journey. So we have a contribution from everybody who's doing it and probably working out the kinks and developing as they work with it, which is fantastic. And this series is intended for pathologists, clinicians, laboratory professionals, researchers, trainees,
and other health care [00:05:00] providers, and it provides essential knowledge about the present use, upcoming trends, and future possibilities of AI in medicine. So basically they're going to tell us what the state of the art is right now, including generative AI, including everything that was before, and the combinations. And obviously, I don't want to say set in stone, but this is a frozen resource.
So the development is going to be ongoing, but in the next two months we're going to go through this and we will have a super nice basic understanding. So what are we going to cover here? And I know this is a long intro, but then we're going to jump into the first paper. They're going to cover the fundamental principles and key AI terminologies.
And there's a big, big table of terminologies today that I'm going to leave for the podcast and not for this live stream. Practical applications, but also ethical and regulatory challenges. Complexities of generative AI and non-generative machine learning models. [00:06:00] Essential statistics: that's gonna be a fun one, guys.
For this one, for the statistics, I will need to prepare better than just, you know, show up with the paper. Ethics, biases, and regulations. And this is gonna help us navigate the wide range of AI applications within the medical field. There is a dictionary of 200 crucial AI and machine learning terms.
That's the table I'm going to show you later. So they are basically telling us what they're going to cover here. Article 1 is here; that's going to be today: basic principles, generative AI (so large language models, chatbots) and non-generative AI. Basically the fundamentals that we need to know to even be able to speak this language. If you have any questions, let me know; in the show notes and in the comments of this live stream I'm gonna pin the links to all the articles.
Some of them [00:07:00] are available online as pre-published journal pre-proofs. Hooman sent them to me on LinkedIn and I said, yes, I'm going to review all of them and I'm going to make people aware of them, because this is a fantastic resource for me, for you, and we can create interactive content around it to teach people about AI.
In Article 2, we're going to talk about generative AI: large language models and image generation models, right? In Article 3, the focus is going to be on non-generative AI, predictive analytics models. So this is widely used in clinical decision support. These are the things that are making it into the space of medical devices.
These are the algorithms for detection and classification of cells. The non-generative AI is [00:08:00] not disappearing. There is a merger of generative and non-generative, and that's what makes this space super exciting.
Okay. So that's Article 3, and Article 4 serves as a broad statistics tutorial. That's going to be interesting. So why do we need to care about statistics? I need to make myself care every time I'm working on a new model. Because the statistics for this particular application, for AI model evaluation, are not just general statistics; these are a little bit different statistics than we use for other stuff. So we need to know them, and they are different for generative and non-generative AI. And so that's what they say. So, you know Perplexity, this app that I'm sometimes using as my AI [00:09:00] assistant? Perplexity is actually the name of a performance metric for generative AI evaluation. And there are different things like bilingual evaluation understudy (BLEU) scores. And for non-generative machine learning, this is all based on the confusion matrix.
So these are the false positives and true positives, false negatives and true negatives. So this is the stuff that we know, that is a little bit more popular in the literature. So it's going to be accuracy, sensitivity, receiver operating characteristic area under the curve, and others, and they have a full article, number four, about it.
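Since we'll spend a whole session on these metrics, here is a minimal Python sketch of my own (not code from the review series) showing both metric families mentioned above: perplexity for generative models, and confusion-matrix metrics for non-generative classifiers. All numbers are made up for illustration.

```python
import math

# Generative AI: perplexity is the exponential of the average negative
# log-probability the language model assigned to each token; lower = better.
token_probs = [0.25, 0.5, 0.125, 0.25]  # hypothetical per-token probabilities
perplexity = math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))
print(f"perplexity = {perplexity:.2f}")

# Non-generative AI: confusion-matrix metrics for a binary classifier,
# e.g. tumor vs. non-tumor tiles. Counts are hypothetical.
tp, fp, fn, tn = 85, 10, 15, 90

accuracy = (tp + tn) / (tp + fp + fn + tn)
sensitivity = tp / (tp + fn)  # true positive rate (recall)
specificity = tn / (tn + fp)  # true negative rate
print(f"accuracy = {accuracy:.3f}, sensitivity = {sensitivity:.2f}, "
      f"specificity = {specificity:.2f}")
```

Note how the two families differ in kind: the confusion-matrix metrics need a ground-truth label per prediction, while perplexity only needs the probabilities the model itself assigned.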
So maybe we need to schedule two hours for that or split it into two live streams. Okay, so that was four. What do we have next? Article 5 [00:10:00] is an overview of the regulatory landscape. So obviously a big discussion: not only how to regulate AI, but how to regulate generative AI. And there is an FDA guidance on how to use it, which we need to cover in another live stream.
But that's going to be after we finish this series; we're going to be locked into this series for the next two months, if everything goes according to plan, or even three months. But it's worth investing in this, because after going through it, your AI conversations at conferences and in your departments are going to be at a totally different level.
So in Article 5, we're going to review the regulatory landscape. I'm also interested. And when you look at the authors (we're going to look at the authors of the first one), these are the experts in those fields. So if you have questions and need to reach out to people, these are the people who are going to be able to help you, in whichever capacity they can, beyond [00:11:00] providing those manuscripts, right?
Anything else in Article 5? Yeah, data privacy, of course; software as a medical device, this is a big thing; and the agency approval process. So this is going to be super important for vendors, for my Digital Pathology Trailblazers on the vendor side. Reimbursement policies and guidelines for LDTs, laboratory-developed tests. So that's going to be an important one as well. Article 6 is going to explore ethical and bias considerations. Yeah, super important to be aware of that, because otherwise we create tools that can propagate biases: biases in the data, societal biases. It's a big thing, and already given the healthcare disparities in the world, we don't want to make them bigger. And these tools, [00:12:00] if no checks and balances are introduced, can make this gap even bigger.
They can make it a lot smaller, or maybe even eliminate it, but it can go either way if there are not proper controls in place on different levels, right? So they provide a thoughtful examination of how AI can both help and hinder equitable medical care, from biases embedded in training data to the potential for AI to reinforce existing disparities, right? So they urge caution and discuss how to mitigate ethical risks, because a lot of these approaches are basically risk mitigation. The risk is going to stay there; what are you going to do to avoid and to mitigate it? And our Article 7 is going to look towards the future of AI.
The future? Probably next week, or before we finish this series, there's going to be [00:13:00] something new there, but that's okay. I want to learn about the, maybe not projections, but the thoughts of where it is going to be, how it's going to pan out across the U.S. and across the globe. You know, what are the obstacles to adoption? How can we increase adoption? How can we change the way it's being deployed, to be able to deploy it in places where it's not possible to deploy yet, because of logistics, because of whatever, right? So yeah, operational needs within pathology and medicine. This final article not only touches on the machine learning operational aspects in this new field, but also speculates on the near-term and long-term impacts of AI.
So we'll see; maybe some of the speculations are already happening, or didn't happen. So that's going to be an interesting one as well. And basically this is designed to be an [00:14:00] indispensable quick reference and guide for anyone involved in pathology, medicine, education, and AI research. So anybody who touches this needs to read the series.
And I am super happy about this because it's created by experts, by people who are doing it, and commissioned, or organized, by the United States and Canadian Academy of Pathology. So we have a resource now. This is something we can teach wherever pathology is taught, right? So, let's move to Article 1, and this is still a journal pre-proof. Maybe they already have a proof. So, today we're going to be talking about "Introduction to Artificial Intelligence and Machine Learning in Pathology and Medicine: Generative and Non-Generative AI Basics." And who are our experts? I'm going to be mentioning them because that's important.
We have [00:15:00] Hooman Rashidi, Joshua Pantanowitz, Matthew Hanna, Ahmed Pitafti, Perth Sangani, Adam Buczynski, Brandon Fennell, Mustafa Dibadja, Sarah Wheeler, Thomas Pierce, Ibrahim Abu Kiran, Scott Robertson, Octavia Palmer, Mert Gerdt, Nam K. Tran, and Liron Pantanowitz. I know we don't always mention all the authors. I do want to acknowledge them, because this is based on their work.
So thank you so much for doing that for us. And here we're going to be talking about the fundamental terminology in AI and ML, machine learning: AI/ML. And there is an extensive dictionary. This dictionary is in the table at the end of this article, which you will be able to download. And it provides a broad overview of the main domains of the AI and ML field, encompassing both generative and non-generative, which is traditional. So generative AI generates; non-[00:16:00]generative doesn't really generate, it reproduces, maybe. They have better definitions.
So, let's start at the beginning. In the 1950s, Alan Turing, this is an important name, Alan Turing, planted the seeds of this revolution, the AI revolution. He developed the so-called Turing test. And that was a standard test to check if a machine has a capacity for intelligent behavior comparable to a person.
How exactly they tested it, I don't know, but the Turing test is something that comes up, and it basically tests for artificial intelligence: is this AI or not? And then in 1957 we had neural networks, artificial neural networks, ANNs. And in the seventies there was the introduction of expert systems.
And the next big thing was the creation of deep [00:17:00] neural networks, the deep learning methods that have further propelled AI and ML into the forefront of medical research. So deep learning, and the convolutional neural networks, are used for imaging. Let me just go through the paper and not freestyle too much, because they have it explained very nicely.
And because of that, the AI and ML algorithms can now assess large quantities of data: image, text, and tabular data. This is going to be something important; pay attention to the different types of data. And this has already been demonstrated in radiology for a long time, but also in pathology: AI can help with tasks such as identifying breast cancer within mammograms and detecting prostate adenocarcinoma in whole slide images. And we know who made this into a medical device, the only FDA-cleared [00:18:00] system for AI computer-aided diagnostics in pathology. But there are other things happening right now, like breakthrough designations for other software. So there are going to be tables, and I have notes on which table you need to look at, but we need to start with the definitions.
I remember in 2019, I went to a conference, the European Society of Toxicologic Pathology, and that was when the ToxPath world was, maybe not starting, but already embarking on this AI journey, and that's what I started my presentation with. And you can see this in my book as well. But they have a super cool table with a lot more terms, so let's look into the basic concepts and terminology of AI and machine learning.
So let's start with data types. Three main data [00:19:00] types in medicine. We have images, including video. So video is important. In pathology we always whine like, oh, how big the whole slide images are, they're so big, and how can we logistically solve this? And then I heard that surgeons record hour-long or several-hour-long videos of surgeries that they also store, and those are comparable to or exceed the size of whole slide images. So I'm like, okay, we're not alone. But still we're special, because whole slide images are just special, right? Anyway, the image type includes video, and there are algorithms for video as well.
Then we have text, including audio, and numerical data. And there's a figure for this; we're gonna go through that figure as well. And those different types of data, the image, text, and numerical data, [00:20:00] require certain preparation activities to be processed, different for each. So images are going to be computer vision.
This branch of computer science is going to be used for images. And what are we going to do in computer vision? We're going to classify images, for example, distinguish between cancer and non-cancer. We can segment images, for example, segment cell nuclei, or whatever we want to segment, right? To segment is basically to delineate, delineate some stuff in the tissue, like maybe tumor mass versus non-tumor mass, and that's going to be computer vision. Then text is going to be natural language processing. So everything with text, this is natural language processing; that is the name of a branch of computer science, right?
So we have computer vision, we have natural language processing, and here are our friends, the large language models: ChatGPT [00:21:00] and the family. And there's a big family. Llama: I need to try this Llama, cause it's a cool name. Anyway, Llama by Meta, and our GPT. GPT stands for generative pre-trained transformer. So of course it's an abbreviation.
And then finally we have the tabular, numerical data. So, this is all the numbers that we can get from patients, and they can include continuous and discrete values. And this is a very common data type employed in electronic health records, and it can be used for various predictive modeling tasks.
And these can include, but are not limited to, predicting disease states, patient responses to therapy, and patient outcomes. So yeah: images, text, numbers. And you know that the power lies in combining all of them, right? We're going to be talking about that. So regardless of the data type, [00:22:00] irrespective of the data type, it is crucial to understand the different categories of machine learning and AI.
So we have supervised, unsupervised, and reinforcement learning methods, and they can apply to all data types, and to both generative and non-generative AI. So there are different divisions of how you can talk about AI. But I like the division by data type and generative versus non-generative. They thought it through. If we adopt this division, then we are all on the same page talking about that stuff. So, many of the generative AI tools that people are now, I love this phrase, enamored with. Yes, I'm a little enamored, I'm a little enamored with all those new technologies. And then I start using them and figure out that they have limitations, like all the other technologies.
[00:23:00] And then I become less enamored and more like, okay, how can I plug it into my workflow? How can I use this responsibly? But it's like that across the board. First I learned about random forests for image analysis, and it worked fantastically in one image, and I'm like, yes, it's the new best thing. And then it wasn't. And then the same for deep learning. It worked fantastically. I'm like, let's get rid of the thresholding. And then, you know, there is no one tool that fits all. I'm searching for this tool, but it's non-existent. Anyway, people are enamored with ChatGPT. Or if people are not anymore, I was at the beginning. I still think it's a super powerful technology.
I still think it's a super powerful technology And it The, the chat GPT is constructed through a pre training step. So this is the unsupervised or self supervised thing. Then there is a supervised and reinforcement learning approach that make it even [00:24:00] better. And for example, something called reinforcement learning with human feedback, of course it has an acronym RLHF reinforcement learning with human feedback.
So let's talk about what supervised, unsupervised, and reinforcement learning are. Supervised. Supervised is what we know from image analysis projects. It's the ground truth approach. It's the pathologist giving labels. Basically, if you hear the word label, if you hear ground truth, gold standard, all this pertains to supervised learning, because we supervise it; an expert supervises it. Sometimes it's not an expert, sometimes it's an additional method. The supervised learning can then be divided into other subcategories.
[00:25:00] Classification and regression tasks, and we have a table for that as well. Classification is predicting a categorical label or class. So for example, malignant versus benign, cancer versus non-cancer, one type of tissue versus another type of tissue. Basically, you classify something, and this can be classifying whole slide images as tumor or non-tumor. And regression tasks involve predicting a continuous value or range of values.
For example, the concentration of a biomarker in blood, or the likelihood of disease progression over time.
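To make the classification-versus-regression distinction concrete, here is a toy Python sketch of my own (not from the paper); the biomarker threshold and all data points are invented for illustration.

```python
def classify(biomarker_level, threshold=2.5):
    """Classification: map a measurement to a categorical label."""
    return "malignant" if biomarker_level >= threshold else "benign"

def fit_line(points):
    """Regression: least-squares line predicting a continuous value,
    e.g. biomarker concentration over time."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return slope, intercept

print(classify(3.8))                              # a categorical label
print(fit_line([(0, 1.0), (1, 2.0), (2, 3.0)]))   # a continuous trend
```

Same supervised idea in both cases: we learn from labeled examples, but classification outputs a category while regression outputs a number.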
These supervised applications rely on human experts. So this is where we're going to be annotating the data with the correct diagnosis or classification. So we can leverage annotated data. For example, sorry guys, I'm trying to click again between the windows. We can leverage [00:26:00] already labeled data, for example, an image and the report.
What else? An image and some genomic data on it. So basically we already have pairs of labels. Or we can leverage the human expert, which is not scalable but necessary: having a pathologist annotate features of the tumor, having a pathologist annotate where the tumor is versus where it is not on the slide, and then feeding this into the system.
That is supervised: we have labels, we have ground truth. And unsupervised, obviously, is when we don't have all these things, right? So this is unlabeled data sets, no ground truth. And some of the most common ways of doing this type of AI are clustering, dimensionality reduction, anomaly detection, and also the unsupervised part of the pre-training method.
Also known as self-supervised; that is [00:27:00] something that we're going to be talking about when it comes to ChatGPT and large language models. There is a table for this. And this enables the machines to identify patterns, so basically cluster or group data points that are similar. For example, clustering algorithms can be used to group patients based on their gene expression profiles, right?
Like, it's not feasible for a human to cluster, to group this type of data, right? And AI can do this for us. Then there are also the dimensionality reduction techniques like principal component analysis, or, this is super cool because I only knew the acronym, t-SNE, t-distributed stochastic neighbor embedding. These are the t-SNE plots; if you know them, you project multidimensional data points onto a two-[00:28:00]dimensional plot and you see, oh, there is a group of dots that are together here, and another one there. Basically, you make multi-dimensional data visual; you visualize it in fewer dimensions. Yeah, so this can be applied to high-dimensional data sets from, for example, mass spectrometry proteomics, and this can help us visualize and identify patterns in protein expression.
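To make the clustering idea concrete, here is a toy sketch of my own (not from the paper): a single k-means-style assignment step on made-up two-dimensional "expression profiles." Real profiles would have thousands of dimensions, and real k-means iterates assignment and centroid updates to convergence.

```python
# Made-up patient profiles; two apparent groups by construction.
patients = {
    "pt1": (1.0, 2.0), "pt2": (1.2, 1.8),
    "pt3": (8.0, 9.0), "pt4": (8.2, 9.1),
}

def assign(profiles, centroids):
    """One k-means-style assignment step: each sample goes to the
    index of its nearest centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return {name: min(range(len(centroids)),
                      key=lambda i: dist2(profile, centroids[i]))
            for name, profile in profiles.items()}

clusters = assign(patients, centroids=[(1.0, 2.0), (8.0, 9.0)])
print(clusters)
```

No labels or ground truth are used anywhere: the grouping emerges purely from distances between the data points, which is exactly what makes this unsupervised.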
And also this can facilitate the discovery of outliers. So, you know, you have a group of data points together, and there is something far away; the self-supervised or unsupervised learning can help with that. And then we have reinforcement learning, and there is a figure for that as well.
And so this is where something, us or something else (it can be AI as well), [00:29:00] gives feedback to the system that we are developing so that it can get better. So this is the ability to make decisions through iterative interactions with its environment. This autonomous agent acquires the ability to make decisions. So a program, an AI system, gets feedback and then gets better. It receives rewards or penalties as feedback for its actions. And then, for example, in the context of large language models and chatbots, what we can utilize is reinforcement learning from human feedback, this RLHF, and also reinforcement learning from AI feedback.
I prefer full words to acronyms, but RLAIF it is; I don't know how to pronounce it. Anyway, so you can optimize the model: you [00:30:00] design it and then give it feedback, and with feedback it gets better. Yeah, so for this reinforcement learning, the human version, the human feedback, involves human assessors offering feedback on the bot's responses.
So if you are using ChatGPT, there is an option, I think it's embedded in your chat: oh, was it good? Was it bad? What would you want to change? And that's also feedback; that's part of this reinforcement way of improving the model. I'm not talking about the "can we use your data to improve the model?" option. That's a separate thing, and you can actually switch it off in ChatGPT right now. You can in Claude and all the other models; you can tell them, no, don't use my data. But the feedback on whether you had a good response or a bad response from the bot helps them improve the model. So [00:31:00] let's talk about generative AI versus traditional machine learning platforms.
Overview of generative AI models. So generative, as the word says, generates. It creates something that was not there before. It can create text, it can create images. Now it can create, oh, music. Let me show you. I have this music, so I played it at the beginning. Basically I have AI-generated music, and next week is Valentine's, so we can play Valentine's music in the background.
It's AI-generated, generative AI, in my streaming software. Yeah, it can also generate synthetic data. But the most famous application, well, it didn't start there, but where it became famous as a mainstream AI application, [00:32:00] was of course our friend ChatGPT, and the large language models: transformer-based neural networks that can produce language outputs resembling those of humans.
And they fall under the general umbrella of foundation models. And foundation models, basically this term means a model trained on a lot of data. It doesn't mean any specific way of training the model; a foundation model is trained on huge amounts of data, and because it was trained on so much data, it can do a lot of things. So basically, like, talk to you. Amazing, right? So this transformer architecture became famous in 2017. There was a paper called "Attention Is All You Need" that introduced parallelization and self-attention mechanisms. So this is an AI architecture; [00:33:00] how exactly it looks is not that relevant to us. You know, you can Google it, but the name transformer is important, because it propelled AI capabilities to the next level. The same happened with convolutional neural networks: it was a type of architecture that suddenly started outperforming the old ways of doing things. And now this transformer architecture is also used for images and different things. But the popular models, we know them: ChatGPT from OpenAI; Claude from Anthropic; I use both of them. Gemini from Alphabet, which is Google's parent company; I use this one less, I have it, but I didn't leverage it yet. Mistral, I don't know it; I think Mistral is maybe open source. And Llama from Meta; I know of it, but I don't use it. So, another way of putting those AI frameworks [00:34:00] together: the GPT models were text-based foundation models, but we can also
use non-text data types, for example image and tabular, like we said. And something that is worth knowing about are generative adversarial networks: it's actually a combination of two networks, or maybe more, that can generate images. Also something called variational autoencoders, and diffusion models.
And then the introduction of multimodal models such as GPT-4o. Multimodal is going to be an important word; we're going to talk about it. Multimodal is the power for medicine. A multimodal model can handle text and image data simultaneously. Basically, such a model will be able to ingest a diagnostic text query and a pathology image, or radiology image, or whatever [00:35:00] image, and all the other data, and subsequently create a full pathology diagnostic report based on the text query input
and the context of the pathology image of interest. We're not there yet, but the potential for this to happen is there. When you put it all together, you have a lot more context. It's like doctors, like medical professionals, interpreting different data points together. So far the narrow AI applications are still the most popular, and they're going to keep happening in parallel.
And this multi model is not just stringing narrow AI applications together, but making an application that can take this data, like combine it, not just have them separate, but combine it like a human reader, human evaluator would, right? This is powerful. And [00:36:00] comparison of generative AI to traditional machine learning platforms.
So they share similarities, but of course they have differences. If they were the same, they wouldn't be called differently. An example of traditional, non-generative AI is going to be classification models. They are created to predict an outcome, and they use available data. So we feed the data; there is data given to it.
And they use confusion-matrix-based performance metrics. These are the metrics we know from all the image analysis papers, so we talk about accuracy, sensitivity, specificity, and ROC AUC. And generative AI, such as large language models, produces novel, artificial data, for example text.
So they produce something that was not there before. And they do not possess the same well-defined criteria for evaluation, [00:37:00] since it usually involves the generation of paragraph outputs instead of a simple classification or call. If you have a classification, it's either correct or incorrect, right?
If you have text, you have to read through it to say, okay, does it make sense or not? Is it in the style I want it to be? So there are different metrics for that. Also, traditional, non-generative AI models are typically dependent on their labeled training data, right?
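To make those confusion-matrix metrics concrete, here is a minimal sketch in plain Python. The labels are invented for illustration (1 = cancer, 0 = no cancer); they are not from the paper.

```python
# Confusion-matrix metrics for a toy binary classifier.
# Invented labels: 1 = cancer, 0 = no cancer.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # ground truth from the pathologist
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # the model's calls

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy    = (tp + tn) / len(y_true)  # overall fraction of correct calls
sensitivity = tp / (tp + fn)           # how many cancers were caught
specificity = tn / (tn + fp)           # how many non-cancers were cleared

print(accuracy, sensitivity, specificity)  # 0.75 0.75 0.75
```

Each call here is simply right or wrong, which is why these metrics are so clean; a paragraph of generated text has no equivalent single number.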
Generative AI models are less constrained and are able to generate outputs without explicit guidance. And that's foundation models: because they were trained on so much data, they see patterns and can generate according to the observed patterns. Whereas classical, non-generative AI predicts something from the data it was fed, and that's usually narrow, smaller data sets, right? [00:38:00]
Generative AI kind of predicts too, but it can predict so well that it can create. So, both AI model types have key issues, of course: bias and ethical constraints, regardless of whether it's generative or non-generative. Traditional, non-generative; I love to say traditional.
Traditional models perpetuate biases found in the original labeled training data, and generative AI generates biased outputs that can sometimes be challenging to identify and mitigate. It also comes from data. So for the traditional kind, what would an example of such bias be?
Let me see, they already described it. They have a figure and a table for this, of course. [00:39:00] In the traditional way, bias would be, okay, there are images coming from one medical center, and the model can only predict things, classify things, on images from that center.
When you try to deploy the model on a different data set, it's not going to perform. And if you're aware that it's a very narrow application tied to a certain data set appearance, then okay, you're not going to deploy it on a different data set. But if you are not aware of it, you're going to get results and think those results are comparable to the results you get on data similar to the training data set, but they will not be.
So basically you're going to make false predictions and perpetuate bias that can later lead to a false diagnosis, [00:40:00] false whatever, right? A wrong prediction, a wrong outcome of the model. But that's a narrow data set, right? If you have a huge data set, like for the language models, it's going to recreate the biases it was, I don't want to say programmed with, but basically, you know, societies are biased.
There are preconceptions, whatever, people are biased, right? And it is represented in the text, in, like, the whole internet. The model is going to reproduce what's currently in there, what it was trained on, and there is the danger of reproducing some societal biases, right? So we need to mitigate. It's important to know that this can be happening, and at scale.
It's not that you make one person aware to always check for some biases. You're going to make [00:41:00] the health professionals aware, but this is a tool that can be deployed at scale. So it's going to be everybody's responsibility, not a single person's.
We have to discuss the ethics and the way of deploying it. So let's move on to AI and machine learning toolkits, libraries, algorithms, and neural networks. Data analysis tools and libraries in generative AI and non-generative AI: researchers use data analysis tools and libraries to extract insights from complex data.
For generative AI, the frameworks include OpenAI's GPT-4, Google's Gemini, DALL·E, Stable Diffusion, LangChain, LlamaIndex, [00:42:00] AutoGen, and CrewAI. Some of them I know, some of them I'm hearing about for the first time, but because they're in this paper, they're important enough, so I'm going to learn them.
And they can be leveraged to generate synthetic images and augment datasets. This is something important: augment datasets. Data augmentation is basically making more data out of fewer data. An example: say we're training an algorithm for the narrow AI, the classical AI, and we need more data.
If you train an algorithm to detect mitotic figures, let's say this is the mitotic figure, like the chromosomes, right? And you have an image of this mitotic figure. You, the human observer, if you turn this image this way or that way, will still recognize that this is the same mitotic figure.
The computer will treat it as an additional data point. So you can turn it however you want, you can even [00:43:00] stretch it, make it narrow, change the colors, and basically artificially create more data points so an algorithm is able to detect it later. So, is this cheating, moving this mitotic figure around?
No, it's the same mitotic figure. You can even stretch it. Is there potential for it to be a little bit different? Yes. So it's good to augment data sets. And then we can use these frameworks also to develop novel customized chatbots and multimodal models, along with multi-agent frameworks.
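A minimal sketch of that rotate-and-flip idea, assuming the image patch is just a small NumPy array; the patch and counts are illustrative, not from the paper.

```python
import numpy as np

def augment(patch):
    """Return label-preserving variants of one image patch.

    A rotated or mirrored mitotic figure is still the same mitotic
    figure to a human, but each variant is a new data point to the model.
    """
    variants = [patch]
    for k in (1, 2, 3):                   # 90, 180, 270 degree rotations
        variants.append(np.rot90(patch, k))
    variants.append(np.fliplr(patch))     # horizontal mirror
    variants.append(np.flipud(patch))     # vertical mirror
    return variants

patch = np.arange(9).reshape(3, 3)        # toy stand-in for an image patch
augmented = augment(patch)
print(len(augmented))                     # 6: the original plus 5 variants
```

Real pipelines, for example in torchvision or Albumentations, add elastic stretching and color jitter on top of these basics.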
That's going to be something we're going to talk about as well. Multi-agent is basically having an army of AI agents, AI workers, that put different frameworks together. And these were the generative AI libraries. Traditional, non-generative AI studies in medicine rely [00:44:00] on more established libraries and AutoML tools like scikit-learn, PyTorch, TensorFlow, MILO, and STNG to develop predictive models for disease diagnosis, prognosis, and treatment responses.
I've heard about these, and you probably have as well. They also use validated synthetic tabular data sets. This enables those tools to be used in clinical decision support workflows, and you can combine the strengths of both generative and non-generative AI, and that's going to revolutionize the medical field.
But my thought was: you combine the tools. It's not that, oh, we have a better tool and now the old tools are not in use anymore. No, we have a bigger toolkit. [00:45:00] We can be more precise. We're like a carpenter now, right? It's not just one hammer. As much as I dream of one tool for everything, it's not going to happen.
And we have a lot of tools, which is good. We need to know what they're good for, when to use them, and how to use them responsibly, right? To have a good clinical decision support tool. And what about algorithms and neural networks in generative AI and non-generative AI? At the heart of many AI-enabled tools are those sophisticated algorithms and neural networks.
This is an important term. It's an AI architecture as well, one that has powered, propelled AI development to the next level. For generative AI, we have LLMs for text generation, and we talked about which these are: Gemini, Llama, Mistral. They take advantage of transformer-based neural networks, [00:46:00] while synthetic image generators, like Stable Diffusion, use diffusion-based neural networks.
So we mentioned transformers, here is diffusion, but these are still types of neural networks. These neural networks that were invented in, I think, '57, and then in the seventies the deep learning convolutional neural networks, they are still a building block of whatever the scientists, the AI researchers, are inventing right now.
Also, we mentioned generative adversarial networks; they can generate synthetic images. And traditional, non-generative AI studies may also rely on more traditional algorithms. So not only deep learning, but the classical stuff, like thresholding, although deep learning options are the gold standard for image and text analysis. So deep [00:47:00] learning is the powerful building block. And in non-generative AI we have convolutional neural networks, for example the ResNet family. We use these, and if you join the DigiPath Digest, the normal ones where I review PubMed publications, there are a lot of papers that still use ResNet or different types of CNNs, convolutional neural networks.
That's for cancer classification versus no cancer. It seems like a simple task, but when you deploy it at scale, it does help the pathologist, right? A pathologist sees which is cancer and which is non-cancer, unless it's, you know, a questionable or difficult case that needs a consultation.
Usually you see it; you're trained for that. But if you need to do it at scale, if these are small foci, these things still help. For tabular data, non-generative AI tasks, for example [00:48:00] classification of diabetes versus no diabetes, we can still employ non-neural-network algorithms, which could include random forest, logistic regression, SVM, and KNN, k-nearest neighbor models.
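Here is a hedged sketch of those four classical algorithms on tabular data, using scikit-learn with a randomly generated dataset standing in for "diabetes versus no diabetes." The data is synthetic, so the scores mean nothing clinically; the point is only how the classical workflow looks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic tabular data: 8 made-up "lab values" per patient, binary outcome.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "random forest":       RandomForestClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM":                 SVC(),
    "KNN":                 KNeighborsClassifier(),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)                   # learn from the labeled data
    scores[name] = model.score(X_te, y_te)  # held-out accuracy
    print(name, round(scores[name], 2))
```

Notice that every one of these is evaluated with the same confusion-matrix-style accuracy number, which is exactly the "well-defined criteria" that generative models lack.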
So this is pre-deep-learning. It is still in use; we can still use it. Now, let's talk about open source versus closed source platforms. They each have advantages and disadvantages. We have open source AI frameworks like PyTorch and open source models such as ResNet-50, among others. Of course, there are a lot more.
They are open source and have shown their capabilities and true worth, right? But with generative AI, we're thinking, okay, do we want it to be open source? Do we even have the option for it to be open source? Because many are proprietary, developed by companies. For example, [00:49:00] GPT and Claude are proprietary.
They are a super revolutionary force, and I'm literally using the words from the papers, but there are certain open source LLM models that also provide value and capabilities, for example Llama and the Mistral family of models. And when it's open source, you give access to more people, right?
So the advantages are obvious: they're cost effective and flexible, and they have this community-driven innovation. I don't know if you know how open source works; if you're in the computer science space, obviously you know. You have a community of people who are interested in developing something, and they go on GitHub or wherever, I don't know the names of all those platforms, but basically they develop it as a community, and then they can develop different versions of it. [00:50:00] They fork it, so-called: they make a fork, so a group develops it in a different direction, and there are different versions. So you are not restricted, you're super flexible, but then it's a less controlled environment. And obviously there's the choice, and this goes for the image analysis space and for any type of tool, especially in a regulated environment. You're going to be faced with the question:
what is the cost of using something that you have support for, which would be the proprietary thing, where you actually collaborate with a company that has a legal obligation to maintain it in a certain way for you? Or do you want the flexibility? Do you want this to be available to you for free, so you can do whatever you want with it?
But there is no official support, it can change, and you're more responsible for it. So it's going to be [00:51:00] weighing the benefits of cost, collaboration, security, and customization, whether it's going to be open source or not. Where do we even apply AI in pathology and medicine? Image analysis is my favorite.
Maybe it's not my favorite anymore, but this is where I started. I was a pathologist for an image analysis company, working with image analysis scientists, and that was so much fun. It still is so much fun for me. So this is a big application, right?
This has happened in radiology, and there's potential shown for whole slide image and radiology image analysis and classification projects. What we do here is detect and classify different things. We detect cells, cancer, classify them as certain types of cells, certain types of tissue, and we can also work on differential blood counts, anything you can do on an image, right?
[00:52:00] Automate urine microscopic analysis, screen liquid-based cytology Pap tests, and also do quantitative image analysis of biomarkers, like all the IHC biomarkers that pathologists are asked to visually guesstimate: what's the percentage of cells? You don't need to guesstimate, you can calculate it, right?
And deep learning models excel at recognizing and quantifying complex patterns within histopathological images, and this level of precision is difficult to achieve with classical image analysis. The classical way was giving thresholds, giving parameters. The deep-learning-based way is you give examples and it learns from those examples, and it's a lot more powerful to teach with examples, because there is no way you can define every particular way a cell or a structure can look.
Especially to the computer, because [00:53:00] the computer needs more instruction. So you give instruction by showing examples, different examples. Then we have tabular data. This is what I learned about the first time I met Hooman Rashidi at the ACVP; he was talking about synthetic tabular data. Here, AI applications can help eliminate errors.
For example, a common source of result errors from blood tests is contamination of the blood specimen by intravenous fluids being administered to the patient, or mislabeled specimens, where the blood is drawn from a different patient than listed on the tube's label. Here you can use an unsupervised, nonlinear, nearest-neighbor-based approach to eliminate those errors, right?
And it has shown promise in correctly identifying contamination that is missed by current non-machine-learning protocols. Identification of wrong-blood-in-tube errors was assessed using multiple analytes and both logistic regression and SVM models, and it significantly outperformed traditional detection methods. [00:54:00]
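To illustrate the flavor of that nearest-neighbor idea (not the paper's exact method), here is a sketch using scikit-learn's LocalOutlierFactor on invented sodium/potassium pairs; the two physiologically implausible tubes should be the ones flagged.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# 200 plausible results: sodium around 140 mmol/L, potassium around 4.0 mmol/L
normal = rng.normal(loc=[140.0, 4.0], scale=[3.0, 0.3], size=(200, 2))
# Two tubes with implausible combinations (e.g. IV-fluid dilution)
suspect = np.array([[120.0, 2.0], [165.0, 6.5]])
results = np.vstack([normal, suspect])

# fit_predict returns -1 for points that sit far from their nearest neighbors
flags = LocalOutlierFactor(n_neighbors=20).fit_predict(results)
flagged = np.where(flags == -1)[0]
print(flagged)  # indices to route for human double-checking
```

The point is the workflow: the tool doesn't overrule anyone, it just raises a hand so a human double-checks those tubes.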
So if we can employ a tool that is, you know, not prone to whatever a human being is affected by, like time of day or the amount of work they did before, if we can have a consistent tool that screens for these types of errors, fantastic. It's just going to alert you: hey, double check. Then you double check, and you can prevent a fatal error, right?
Another application: there is an option for point-of-care AI/ML for use in combat casualty care, whereby acute kidney injury can be predicted from a patient's tabular laboratory data when using a novel handheld device measuring blood neutrophil [00:55:00] gelatinase-associated lipocalin levels.
That's a mouthful, but basically it's an AI tool that can screen for patterns you cannot visually assess, or don't have time to when you're attending to patients. But you have this device, and there is a reference, reference number 21: a handheld device that measures blood neutrophil gelatinase-associated lipocalin levels.
These levels are going to tell you, okay, is there kidney injury or not. And another type of data analysis application: text data analysis, NLP, natural language processing. In the past, these were based on human logic. And the problem with words, it's a problem and a blessing, is that if you're just using logic, there are so many combinations, because words can be ambiguous. They are strung into [00:56:00] sentences and may have different meanings depending on the context, right?
And it's not possible to feed in all those contexts; you will not think of all of them to classically train an NLP model. But it's not an unlimited number of contexts, either. So if you give it enough examples, with, like, all the internet and book data, whatever data was used, and where that data came from is a topic for a different discussion, but basically a lot of data,
the computer will be able to see all those patterns without you having to specifically mention them. Then you account for this ambiguity. There are also cross-cultural challenges with text. And generative AI models are revolutionizing text data analysis in healthcare: they can generate contextually [00:57:00] appropriate, human-like text, summarize complex medical reports, and create synthetic clinical notes for training and testing purposes.
This has been a game changer for me, since I was able to start dictating things, dictating loose thoughts, and have them transcribed for different applications, like blog posts, everything. And obviously this can be applied to medical data as well. Basically, you can extract stuff out of your head in the form you want, as long as it has formed words. I bet it can also analyze some sounds now as well. But anyway, as long as you can say it, it doesn't have to be polished, it doesn't have to be in a certain format. You extract the stuff from your head in words, and AI helps you structure it.
And in the medical context, I [00:58:00] don't know how much it's worth; it's a goldmine, because this is where the time goes. This is a huge time sink, and a source of error, when things are not structured in the right way, not in the right place in the template, right?
Because if you rely on logic, you rely on people always being correct when generating certain kinds of text. That's not the case here. You can just have all the unstructured data. That was a big problem with pathology reports. There is this initiative of structured reporting, and some institutions use it, and there are templates, but not everybody uses them. Most pathology reports, and definitely in veterinary medicine, are just free text.
It's just free text And what are you going to do with free text? Well, now you can do a lot because we have the large language models that were the foundation, large language models that were trained on all those patterns, language patterns. And you [00:59:00] can basically query it. You can ask it, you can ask scientific literature, their apps to.
Like, if you don't want to read the paper, just want to ask questions about the paper, they are powered by retrieval augmented generation. I bet they're going to be talking about that. But basically, like, you have a piece of text and you can ask questions about this text, and your AI is just going to go into that text and find that information.
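The retrieval half of that idea can be sketched in a few lines. Real systems use embedding models and a vector store; plain word overlap stands in here, and the report sentences are invented for illustration.

```python
# Toy retrieval step of retrieval-augmented generation (RAG):
# pick the passage most relevant to a question, which an LLM
# would then answer from, quoting that exact passage.
report = [
    "The tumor is a moderately differentiated adenocarcinoma.",
    "Surgical margins are free of tumor.",
    "Three of twelve lymph nodes contain metastatic carcinoma.",
]

def retrieve(question, passages):
    """Return the passage sharing the most words with the question."""
    q = set(question.lower().split())
    def overlap(p):
        return len(q & set(p.lower().rstrip(".").split()))
    return max(passages, key=overlap)

best = retrieve("how many lymph nodes are positive", report)
print(best)
```

Because the answer is grounded in a retrieved passage, the system can point you back to the exact sentence it used.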
And it's going to show you even which line, and all these things, right? So the generative models can be trained to extract relevant information from pathology reports, such as cancer diagnosis and treatment plans, and generate concise summaries for clinicians and for patients. People often don't understand what's in the pathology report.
They have to Google it. Now they don't need to Google it. They can have it summarized in understandable language, understandable for their level of expertise, their level of medical knowledge, right? [01:00:00] And here, the holy grail that is already happening: multimodal learning. A modality refers to a single type of input or output data.
For example, like we said: text, image, video, audio, whatever, right? Different forms of data. And the development of those multimodal approaches will undoubtedly further shape the future of AI within medicine. That's a beautiful sentence. But the important thing is that multimodal differs from merely combining different unimodal models that were all trained separately.
Because those don't talk to each other: just text, with just image, with just, I don't know, audio or whatever, each optimized for its own data input. Instead, the multimodal approach simultaneously combines different types of input data to generate content, as well as improve performance and accuracy.
And multimodal models can [01:01:00] capture a more comprehensive understanding of complex diseases. This is still work in progress, but it has tremendous potential. But, as always with tools that are scalable, we need to think about diversity, inclusion, and bias, which, again, is an interesting topic right now in the U.S. But regardless of the common understanding of these topics, healthcare systems have been shown to have intrinsic bias that can be problematic, especially when applied to healthcare, right? Bias can be introduced, consciously or unconsciously, into the AI life cycle: in the design, development, and deployment stages.
So it can be introduced at different stages, and it is imperative to actively assess the performance of AI models [01:02:00] for bias. Because if we want to rectify it and achieve better health equity, then we need to be aware of it, and not only be aware of it, we need to develop a framework to check for it, some kind of metrics. And there is going to be a paper on this in this series.
So this field, this discipline, needs a clear understanding of known health disparities, biases, stereotypes, and discriminatory practices. Understanding their current impact is imperative and can provide key insights for designing AI models.
And I want to emphasize this clear understanding. Everybody's responsible, but the people who are going to be dealing with it need to know more about this. And there is an unlimited number of biases, [01:03:00] because we are biased by our experience, by our upbringing, by, you know, it's a whole societal discussion that we're not going to have right here.
But basically, the fact is, people are biased creatures. We want the tools to be unbiased, so we need to look for that. There need to be standardized guidelines; bias detection, assessment, and mitigation should remain a priority. There were already publications on this. Without careful consideration of training dataset diversity, AI systems risk perpetuating and amplifying existing societal biases related to gender, race, ethnicity, sexual orientation, and disability status, and potentially irreversibly propagating biases into healthcare delivery. [01:04:00] And these are just examples of different biases. And it's another discussion.
I know I'm going in circles a little bit, and we are almost at the end of the session. You're still with me; I think you're real digital pathology trailblazers. This discussion is, again, the one I kind of mentioned: oh, I would love to have one tool for everything, but exactly because of that, there may not be one tool for everything.
There may be an AI tool for a specific population. And the trick, or the art, the science, the important thing, is to know that there needs to be a tool like that, and not to blanket-use a tool on all the patients in the world when it was actually only developed on one patient population.
And the limited generalizability of AI tools is a drawback, [01:05:00] but maybe it's inevitable. Maybe it's a strength to know that this particular tool is not generalizable, and to not want to generalize it, but develop different ones. It of course creates more work, and, you know, that's a whole other discussion, but it's something to think about: how generalizable can it be?
Can we define how much it generalizes, so that we don't feed in stuff from outside its domain, and then say, oh no, for this patient population, or for this region, or for that thing, we need to use a different tool, or we cannot use a tool at all. We need to ask a pathologist and go back to the bench, back to the classical way of doing things before those tools.
And I think, as we develop new tools, as they become more sophisticated, as we provide more and more personalized medicine approaches, one of the options is going to be to not use the tool. [01:06:00] Oh, but then you're depriving a person of the tool? No. If you know that this tool is not going to work well for this type of treatment, or diagnosis, or whatever, then it's actually beneficial not to use it, because you're not creating any false results, any confusion. You just do the best you can, and the tool would not be the best you can do.
Yep, that's my rambling on generalizability of tools. Anyway, also: explainability in machine learning and AI methods. This is a whole new, well, not new, field. It has appeared in parallel with the black box concept: oh, what is this AI even doing? Are we arguing with the results or not? Do we care how those results were generated or not? Let's explain it. I love this acronym: XAI, explainable AI. It provides insight into how decisions are made by [01:07:00] such tools, helping in identifying and mitigating biases. It will also enable users to better understand the results. And there are different techniques to achieve that. Again, it's not like there's one explainability tool where you put your model through and it tells you: oh, this model has weights and biases like that and makes this decision because of that.
No, it's a toolbox of different tools. Some are still going back to biology and confirming the AI predictions with an orthogonal method, so-called orthogonal, meaning you can prove the same thing with a different method. Anyway, an example would be: okay, you can check for metastasis in lymph nodes.
You can use a human observer, and let's assume they are perfect for this particular example, delineating the epithelial cells because they know how they look. And you can [01:08:00] also use IHC that stains epithelial cells. Then you have two methods showing, hopefully, the same thing, and you check the AI against those two methods.
That's one way of actually checking if it's correct, not really explaining it, but that was my explanation of an orthogonal method, a different way of doing it. Very important for the regulators, and in general, because we're using an automated tool to make decisions about people, and we need to explain why those decisions are made. There are techniques such as feature importance analysis, model-agnostic methods like local interpretable model-agnostic explanations (LIME), and attention mechanisms in neural networks. These are some ways; there are others. Computer scientists who do XAI do this all day, all week long, and twice on Sunday. And the goal is to reduce the black box elements and [01:09:00] enhance trust, right? And facilitate the identification of unintended biases.
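Feature importance analysis, the first technique on that list, can be sketched with scikit-learn's permutation importance: shuffle one input column at a time and watch how much the model's accuracy drops. The data here is synthetic, and by construction only the first feature carries signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))   # three made-up input features
y = (X[:, 0] > 0).astype(int)   # the label depends on feature 0 only

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Shuffling feature 0 should hurt accuracy a lot; features 1 and 2 barely.
print(result.importances_mean)
```

An importance profile like this is one small window into the black box: it doesn't explain the mechanism, but it tells you which inputs the decisions actually hinge on.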
So, my digital pathology trailblazers, you actually stayed till the end. Conclusion and future directions. AI applied to healthcare has the potential to revolutionize medical diagnostics, research, and education. And, you know, "revolutionize" is a word used by ChatGPT very often, but here it actually fits very well.
Because it has this potential, but it also has a lot of challenges. It's often that the technology is somewhere there, but it cannot be applied because of all the other things that prevent it from actually reaching the place where it can revolutionize something. But the potential is there, there are researchers actively working on it, and let's acknowledge them here as well.
I'm so happy about this series. This is part of the educational effort of [01:10:00] CPACE, the Computational Pathology and AI Center of Excellence at the University of Pittsburgh. This is where Matthew Liron and Hooman are working at the moment. And look, they did use generative AI, which is perfectly fine. This is what it's designed for, and it can speed up the process of writing seven publications. I don't know how long seven publications normally take, but a long time, and they managed in, I don't know, maybe two years. They used DALL·E, they used GPTZero, Adobe Express online, and I don't know what else, but they did use generative AI, and they disclosed it, and that's okay.
And when you go past the references, maybe next time I'm going to structure it differently and go through these resources. So what I'm going to do, not now, don't worry, we're almost done, is Table 1, the dictionary of AI and [01:11:00] ML commonly used terminology. To speak this language, like any other foreign language, you need to learn the words and what they mean.
What I'm going to do is read it for you, or have some speech generator read it for you, in audio form. So, in case you don't have an hour or two to go through all those 200 terms, I will have it ready for you in audio form, because I think it's important, and you can listen to it twice if you need to.
If you want to study it, and I think it's worth studying, I will probably learn some terms myself that I'm not using on a regular basis. There are images; let's see if I can do it myself, or if I will use a tool, or maybe a combination of myself and a tool where commentary is needed.
But there is a huge [01:12:00] table, table one, that has all the terms that you need for AI literacy, basically for AI literacy, and that's the goal of these publications to make us, make pathology professionals, healthcare professionals, AI literate. Then all the concepts that we talked about are in those tables, table two, three, and four.
We have table two for supervised machine learning categories and the definition of this. Table three, the common unsupervised methods and what the key features are. And then we have the comparison and the table. So, you know, if you don't want to read the full table, you go to these, sorry, the full paper, you go to these tables and you just use them as cliff notes, which is already super useful.
We have the comparison of traditional and generative AI, so non-generative and generative. We have the history of AI and ML. And I heard some people are being asked to [01:13:00] prepare presentations on AI. Cite these papers. This is a fantastic resource, you know; they are made available in Modern Pathology, they have super cool graphics, and I love them.
And here is our journey through this seven-part series. We are here, Trailblazers. We still have six papers to go: one, two, three, four, five, six. If you have been here, you already know you'll get access to this. Give me the weekend to put together the recording, the podcast. So I'm going to tell you what the surprise is.
So together with my team, we're actually working on a shop, a shop with different things. I didn't wear my earrings today, the pathology earrings; I always have them here. Anyway, the pathology earrings will be there, the book is going to be there, and different [01:14:00] courses that were kind of hidden because I was reworking them. There's an AI course that I prepared on my own before this series came out, which is actually based on the resources I have on YouTube, but it's curated, in a logical order.
This series, edited without all my rambling, let's call it rambling, or excessive explanations, and without our live stream interaction, is going to be there too. So whenever you watch it, whether at the time of making this or two years in the future, if you want to learn, it is going to be there, and I hope to be able to show it to you next Friday.
If not next Friday, then whenever I'm able to show it to you. [01:15:00] Thank you so much for staying. You are true digital pathology trailblazers. If you have been here for an hour and a half, I think that's the longest live stream ever. If you're watching this as a recording, let me know in the comments that you were here, let me know what you want me to do differently next time, and let me know what you liked. If you want to go through the tables and figures rather than through the text,
let me know as well; I can adjust it. This was the first intro session of the series. I want it to be useful for you, not only regarding the content, which I already know is useful, and more people worked on this. So yeah, give me feedback on what you like and what you don't like.
And I'll adjust whatever I can to make it easier, as long as it doesn't require more [01:16:00] resources than I have available. I will do it because this is for you; it's going to be on YouTube for free. And oh, I want to thank someone. I don't know who that was, I have to check next time, but on YouTube there is a way to support a channel; you can give a thank-you, and somebody gave me a monetary thank-you.
Thank you I think it was I don't know, 1 or 2 of a super thanks for um, whether it was a live stream or something. There is a way to support the show in that way and that just made me so, that was heartwarming. Thank you so much for your continuous support, showing up, staying till the end and I talk to you in the next episode.