Are LLMs Replacing Physicians?
What Is Context Rot? Why Should We Care?
Context rot refers to the phenomenon where the performance of large language models (LLMs) degrades as the amount of context or conversational memory grows. Despite advances enabling LLMs to handle tens of thousands, even millions, of tokens, their ability to maintain consistent accuracy and relevance across a long conversation or extended input is uneven.
Many of us have seen it, and I’ve personally experienced it: As input length increases, response quality can decline because the model struggles to effectively prioritize and integrate all relevant information within its context window.
This inconsistency highlights a critical limitation of current LLMs: they do not process every token with the same fidelity, which can lead to errors or misinterpretations over long sessions.
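For the technically curious, here is a rough sketch of how you might see this for yourself: bury a single key fact in an ever-growing pile of distractor text and check whether a model can still recover it. The `ask_model` placeholder, the filler sentences, and the allergy example are purely illustrative; you would wire in whatever model and client you actually use.

```python
# Toy harness for observing context rot: plant one key fact inside a growing
# pile of distractor text and check whether a model's answer still recovers it.
# ask_model is a placeholder; connect it to whatever LLM client you use.

def ask_model(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its reply."""
    raise NotImplementedError("wire this to your model of choice")

def build_prompt(needle: str, n_distractors: int) -> str:
    """Bury the needle fact in the middle of repetitive, irrelevant notes."""
    filler = "Routine follow-up visit, no acute complaints noted. " * n_distractors
    return (f"{filler}\n{needle}\n{filler}\n"
            "Question: What documented drug allergy does the patient have?")

needle = "The patient has a documented severe allergy to penicillin."
for n in (10, 1_000, 10_000):  # increasingly long, mostly irrelevant context
    prompt = build_prompt(needle, n)
    print(f"{n:>6} distractor sentences -> prompt of {len(prompt):,} characters")
    # answer = ask_model(prompt)                      # real model call goes here
    # print("   fact recovered:", "penicillin" in answer.lower())
```

Falling recall as the haystack grows is, in essence, what context rot looks like in practice.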
In healthcare, this issue isn’t just a technical footnote. It’s a critical point of failure. Why? Because this is how medicine happens in the real world.
Healthcare in the Real World
Think about a typical clinical visit. Over a few minutes, patient and physician engage in a conversation that teases out key insights related to the current illness or concern. But there’s so much more.
The clinician acts as a detective. If they do it well, Sherlock Holmes would be proud.
That’s how I saw myself in clinical medicine. I wasn’t just a court reporter, typing furiously to capture information. I was observing the patient—their demeanor, affect, mood. I listened for clues in what was said and what wasn’t. I wasn’t sitting in judgment but striving to understand. I was capturing information with all my senses and human intuition—anything that could impact clinical decision-making.
These discussions were rarely linear or straightforward, though. They took unexpected turns, bypasses, and detours. They were filled with personal anecdotes, euphemisms, and narratives that jumped across inconsistent timelines, like a sci-fi multiverse movie. That’s why physicians spend years learning how to interview patients, ensuring nothing falls through the cracks. And that’s just the patient encounter.
Where Dangers May Be Lurking
Accurate medical diagnosis and treatment depend on a context-rich, precise understanding of patient histories, diagnostic data, and clinical guidelines. Context rot means an LLM may lose track of vital details or weigh information incorrectly as conversations or data inputs grow. This poses a serious risk to patient safety and care quality: without careful controls and human supervision, LLMs could make inaccurate recommendations or fail to adhere to standardized protocols.
Recent research underscores this issue: Even state-of-the-art LLMs struggle with negation, ambiguity, and maintaining context in clinical tasks—precisely the subtleties that define expert clinical judgment. Task-specific prompting helps, but it cannot fully compensate for these weaknesses.
Moreover, LLMs are sensitive not only to the amount of information but also to how it’s presented. The order, clarity, and structure of medical records or clinical inputs can drastically affect model output. This variability demands ongoing context engineering: carefully summarizing and reorganizing information to preserve critical signals and reduce noise so the model can function effectively.
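To make that concrete, here is a minimal sketch of one form of context engineering: ordering a patient record so the most decision-relevant sections come first and trimming it to a size budget. The field names, priority order, and character budget are illustrative assumptions on my part, not a clinical standard.

```python
# Illustrative context engineering: order a patient record so the most
# decision-relevant sections come first, and enforce a size budget so
# lower-priority detail is dropped rather than diluting the prompt.
# Field names, priority order, and the character budget are assumptions.

PRIORITY = ["active_problems", "allergies", "current_medications",
            "recent_labs", "history_of_present_illness", "past_visits"]

def build_context(record: dict, max_chars: int = 4000) -> str:
    """Assemble a prompt-ready context string, highest-priority sections first."""
    sections, used = [], 0
    for field in PRIORITY:
        text = record.get(field, "").strip()
        if not text:
            continue
        block = f"## {field.replace('_', ' ').title()}\n{text}\n"
        if used + len(block) > max_chars:
            break  # stay within budget; what matters least is what gets cut
        sections.append(block)
        used += len(block)
    return "\n".join(sections)

# Toy record: the long, low-priority visit history is what gets trimmed.
record = {
    "allergies": "Penicillin (rash).",
    "active_problems": "Type 2 diabetes; hypertension.",
    "current_medications": "Metformin 1000 mg BID; lisinopril 10 mg daily.",
    "past_visits": "2019 annual physical, unremarkable. " * 100,
}
print(build_context(record, max_chars=500))
```

The design choice is the point: someone has to decide what counts as signal and what counts as noise, and that judgment is clinical, not purely technical.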
Paradoxical Effect
Context rot underscores the limits of current LLMs in handling complex, context-dependent tasks like medical care. Unlike a human physician, whose care improves as they get to know a patient over time, LLMs experience the inverse: More data can mean more confusion.
That shouldn’t be the case in healthcare. Ideally, LLMs and other clinical decision tools should improve in accuracy, precision, and speed the more they “know” a patient. But that’s the thing. They’re not really learning, at least not yet.
Specialized clinical LLMs (like GatorTronGPT) show promise, but their real-world reliability, especially as context grows, remains unproven. Human oversight is still essential.
The Path Forward
Given these technical constraints, the vision of LLMs completely replacing physicians remains far-fetched for now. Instead, LLMs should be viewed as powerful tools that assist clinicians, with human judgment remaining essential to interpret and verify their output. LLMs can augment human clinical expertise by efficiently processing large datasets, supporting decision-making, and providing insights.
The field is making progress. Techniques like retrieval-augmented generation (RAG), improved prompting strategies, and hybrid human-AI workflows are being explored to mitigate context rot. But these are complements, not substitutes, for human expertise.
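As a rough illustration of the retrieval-augmented pattern, the sketch below scores a handful of guideline snippets against a clinician’s question and places only the best matches into the prompt, rather than the entire corpus. The word-overlap scoring and toy guideline corpus are simplifications; a production system would use embedding-based search, curated sources, and a real model call.

```python
# Minimal retrieval-augmented generation (RAG) sketch: score candidate
# guideline snippets against a question and keep only the top matches for
# the prompt. Word-overlap scoring stands in for real embedding search.
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set with punctuation stripped (toy tokenizer)."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def score(query: str, passage: str) -> int:
    """Toy relevance score: number of words shared by query and passage."""
    return len(tokens(query) & tokens(passage))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

corpus = [
    "Hypertension guideline: first-line agents include ACE inhibitors and thiazides.",
    "Diabetes guideline: metformin is the usual first-line oral agent.",
    "Asthma guideline: inhaled corticosteroids are the cornerstone of control.",
]

question = "Which agents are first-line for hypertension?"
context = "\n".join(retrieve(question, corpus, k=2))
prompt = (f"Answer using only the excerpts below.\n\n{context}\n\n"
          f"Question: {question}")
print(prompt)  # in a real workflow this goes to an LLM, and then to a clinician
```

The shape of the workflow is what matters: keep the model’s context small and relevant, and keep a clinician in the loop to judge the answer.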
A Call for Collaboration
This nuanced understanding can help shape realistic expectations about the role of AI in medicine and drive innovations that strengthen model reliability in real-world clinical settings. The future of healthcare is in collaboration—LLMs augmenting physicians, not replacing them.
Let’s focus on building systems that combine the best of AI’s analytical power with the irreplaceable human elements of care: empathy, intuition, and the ability to navigate complexity and ambiguity. Only then can we harness the full potential of this technology while safeguarding the outcomes that matter most—our patients’ health and well-being.
If you are passionate about empowering clinicians to help them deliver high-quality patient care, I’d love to work with you.