Healthcare AI is the domain where I feel most acutely the tension between the genuine promise of the technology and the very real consequences of its failures. A hallucination in a customer service bot is annoying. A hallucination in a clinical decision support system can kill someone. This isn't a reason to avoid AI in healthcare — the status quo of diagnostic errors, physician burnout, and drug discovery timelines that stretch to a decade also has enormous human costs. But it demands a level of rigor and epistemic humility that not everyone in the industry is bringing to the table.
Let me walk through the areas where GenAI is making real, documented progress in healthcare, and the areas where the hype is running ahead of the evidence.
Clinical Documentation: The Clearest Win
The problem of clinical documentation burden is severe and well-documented. Physicians spend 35–55% of their working time on documentation — entering notes, completing forms, coding diagnoses. This is time not spent with patients, and it is a primary driver of clinician burnout. The suicide rate among physicians is meaningfully higher than that of the general population, and burnout is a documented contributor.
AI-powered ambient clinical documentation has emerged as the most unambiguous GenAI success story in healthcare. Systems like Nuance's DAX (Dragon Ambient eXperience, now part of Microsoft), Suki AI, Abridge, and DeepScribe listen to physician-patient conversations and automatically generate structured clinical notes. The technology has matured significantly: recognition accuracy is high, specialty-specific vocabulary is handled well, and integration with major EHR systems (Epic, Cerner/Oracle Health) has improved.
The outcomes data is compelling. Studies and real-world deployments consistently show 50–70% reductions in documentation time, clinician satisfaction improvements, and — critically — patients reporting that their physician was more present and engaged during the visit. This is a case where AI is clearly removing friction from a high-value human interaction, not replacing the interaction itself.
The remaining challenges are consent and privacy (conversations are being processed), EHR integration complexity, and ensuring that AI-generated notes are reviewed rather than blindly signed — a behavioral challenge, not a technical one.
Diagnostic Imaging: Proven Performance, Adoption Hurdles
AI in radiology and pathology is no longer experimental — it's FDA-cleared and in production use at thousands of institutions. The FDA has cleared over 900 AI/ML-enabled medical devices as of early 2026, the majority in radiology. Google's ARDA (Automated Retinal Disease Assessment), Paige's cancer detection systems, Viz.ai's stroke detection platform, and dozens of specialized diagnostic AI tools have cleared regulatory review and demonstrated clinical utility in peer-reviewed studies.
The performance data on specific, narrow tasks is genuinely impressive. AI systems for detecting diabetic retinopathy from fundus images, flagging potential breast cancers in mammograms, or identifying pulmonary emboli on CT scans have achieved sensitivity and specificity comparable to or exceeding specialist physicians on benchmark datasets. IDx-DR, cleared in 2018, was the first fully autonomous diagnostic AI — it required no physician to interpret results.
But the gap between benchmark performance and real-world clinical utility is wide, and closing it has taken longer than enthusiasts projected. Distribution shift — the performance gap between a model trained on a curated dataset and the same model facing the messiness of real clinical images from diverse equipment and patient populations — is a real and serious problem. Several early radiology AI deployments were quietly discontinued after real-world performance failed to match validation data.
The institutions that are succeeding with diagnostic AI have learned that deployment is at least as hard as development: ongoing performance monitoring, feedback loops from radiologists to flag model errors, and clear protocols for how AI outputs are integrated into clinical workflow.
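The monitoring discipline described above can be made concrete. Here is a toy sketch of one common pattern — tracking rolling real-world sensitivity against the validation baseline and raising a flag when it degrades. The class name, thresholds, and window size are all illustrative assumptions, not any vendor's actual implementation:

```python
from collections import deque

class PerformanceMonitor:
    """Rolling post-deployment monitor for a diagnostic classifier.
    Records (model_flagged, truly_positive) pairs as radiologist
    ground truth arrives, and alerts when rolling sensitivity falls
    below a tolerance band under the validation baseline.
    Baseline, tolerance, and window values are illustrative."""

    def __init__(self, baseline_sensitivity, tolerance=0.05, window=500):
        self.baseline = baseline_sensitivity
        self.tolerance = tolerance
        self.cases = deque(maxlen=window)

    def record(self, model_flagged, truly_positive):
        self.cases.append((model_flagged, truly_positive))

    def rolling_sensitivity(self):
        flags_on_positives = [flagged for flagged, pos in self.cases if pos]
        if not flags_on_positives:
            return None  # no confirmed positives in the window yet
        return sum(flags_on_positives) / len(flags_on_positives)

    def degraded(self):
        sens = self.rolling_sensitivity()
        return sens is not None and sens < self.baseline - self.tolerance

monitor = PerformanceMonitor(baseline_sensitivity=0.90)
# Simulate a window in which only 8 of 10 confirmed positives were flagged.
for flagged in [True] * 8 + [False] * 2:
    monitor.record(model_flagged=flagged, truly_positive=True)
print(monitor.rolling_sensitivity(), monitor.degraded())
# → 0.8 True
```

The hard part in practice isn't the arithmetic — it's getting timely ground truth (radiologist confirmations lag by days or weeks) and deciding in advance what the response to a degradation alert is: recalibrate, retrain, or pull the model from the workflow.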
Drug Discovery: Where the Long-Term Stakes Are Highest
The economics of drug discovery are brutal. A new drug costs roughly $2.5 billion and 10–15 years to bring from compound identification to market approval, and the failure rate is staggering — roughly 90% of compounds that enter clinical trials fail. If AI can meaningfully accelerate any stage of this pipeline, the payoff is enormous in both human and financial terms.
The most credible progress is in structure-based drug design, protein structure prediction, and molecule generation. AlphaFold 2 (and the subsequent AlphaFold 3 from Google DeepMind) represents a genuine scientific breakthrough: protein structure prediction accuracy that previously required months of crystallography work can now be done computationally in hours. This has real downstream effects on target identification and binding site analysis.
AI-generated molecule candidates — using generative models to propose novel compounds with desired binding properties and pharmacokinetic profiles — have moved from academic papers to clinical trials. Insilico Medicine's INS018_055, an AI-designed molecule for idiopathic pulmonary fibrosis, reached Phase 2 clinical trials, a genuine milestone. Recursion Pharmaceuticals, Exscientia, and Absci are all operating in this space with real pipeline assets.
However, I want to push back on the "AI designed a drug" framing that dominates press coverage. What AI has done is significantly accelerate and improve the early discovery and optimization stages — lead identification, hit-to-lead optimization, ADMET property prediction. The clinical development stages — Phase 1, 2, 3 trials — remain human-intensive, expensive, and slow. AI hasn't yet fundamentally changed the probability of clinical success, which is the most expensive part of the equation.
The honest current state: AI is compressing early discovery timelines by 30–50% and enabling exploration of larger chemical spaces. It is not yet delivering a drug that would not have been discovered by conventional means (though a few candidates are getting close to proving that claim).
Genomics and Personalized Medicine
The intersection of large language models and genomic data is producing some fascinating science. Models trained on genomic sequences — treating DNA the same way language models treat text — have shown surprising capability in predicting gene expression, identifying regulatory elements, and characterizing variant effects. Nucleotide Transformer from InstaDeep (acquired by BioNTech) and Google's Enformer are examples of this class of foundation model.
The clinical application of genomic AI is advancing in oncology, where tumor genomic profiling increasingly guides treatment selection. Foundation Medicine's platform, Tempus's AI-powered genomics analysis, and similar products are integrating AI-assisted interpretation of genomic reports into oncology workflows. The challenge remains variant interpretation — a large proportion of variants found in genomic sequencing are "variants of uncertain significance," and AI is beginning to make inroads on classifying these more accurately, but the ground truth data for training is limited.
Mental Health: High Need, High Risk
Mental health is an area where the unmet need is massive — global prevalence of mental health conditions, combined with a shortage of mental health providers, creates enormous pressure to find AI-based solutions. Products like Woebot, Wysa, and more recently large LLM-based mental health companions have attracted both funding and scrutiny.
My view here is cautious. The evidence base for AI-assisted mental health intervention is thin, the failure modes are serious (a model giving poor advice to a suicidal user), and the regulatory framework is still developing. The legitimate applications — psychoeducation, symptom tracking, triage and appointment scheduling, supporting therapists with session notes and treatment planning — are meaningfully different from claims of AI providing therapy. The industry needs to be much more honest about which category a given product falls into.
The Regulatory Environment
The FDA's Digital Health Center of Excellence has been developing a more sophisticated approach to regulating AI-enabled medical devices, including guidance on "predetermined change control plans" — essentially pre-approving the range of model updates that don't require new clearance. This is a pragmatic acknowledgment that AI models need to be updated as they encounter new data, and the old device approval paradigm doesn't fit.
In Europe, the AI Act explicitly classifies AI systems for medical devices in high-risk categories requiring conformity assessment, technical documentation, and post-market surveillance. This aligns with how the most rigorous U.S. developers are already operating, but imposes compliance costs on smaller innovators.
The Bottom Line
Healthcare AI in 2026 is delivering real value in clinical documentation, mature diagnostic imaging applications, and early-stage drug discovery. The gap between early promise and reliable deployment has been painful, but organizations that have navigated it carefully are seeing genuine clinical impact.
The healthcare leaders I most respect in this space are those who approach AI the same way they approach any other clinical intervention: ask for evidence of safety and efficacy, demand real-world performance monitoring, design clear human oversight protocols, and be honest about what you know and don't know. That rigor isn't a barrier to AI adoption in healthcare — it's the foundation that makes adoption trustworthy. And in medicine, trust is not optional.