As AI makes its way into more and more facets of daily life — not least among these, healthcare — recent studies reveal the upside and downside of the rapidly expanding tech. Each of the studies examines the use of GPT-4, the latest system from OpenAI, the company also behind the ChatGPT chatbot, arguably the public’s first usable introduction to generative AI.
Earlier this year, researchers at Cambridge University tested GPT-4’s prowess in diagnosing eye problems, pitting the AI system against non-specialist junior doctors, trainee ophthalmologists and experienced ophthalmologists. They found the tech performed significantly better than the junior doctors and on par with the trainee and expert ophthalmologists, though the top-performing eye doctors still scored higher.
Also this year, researchers at Harvard University conducted a study that found GPT-4 outperformed both attending physicians and residents in clinical reasoning when presented with 20 simulated patient cases involving common issues like chest pain, coughing and headaches. On a 10-point scale, GPT-4 scored a median of 10, while attending physicians achieved a median of 9 and residents a median of 8.
On the other hand, GPT-4 produced significantly more responses containing incorrect clinical reasoning (13.8%) than the residents did (2.8%), and slightly more than the attending physicians (a still-troubling 12.5%).
Scrutinizing AI in healthcare through a different lens, a recent New York Times article explored the increased presence of GPT-4 in a decidedly more personal realm — composing doctors’ direct messages to patients. Through MyChart, Epic’s patient communications platform used by most hospitals, AI-generated replies are on the rise, and the results are mixed.
On the plus side, these GPT-4-drafted messages can save physicians time, helping to prevent burnout. That matters in light of the marked rise in virtual healthcare during the Covid-19 peak, when MyChart messages became a direct line from patients to their doctors. Though much of the overload hospitals experienced then has dropped to more manageable levels, patient messages still arrive at a relentless pace, and many physicians found their lunch breaks and evenings consumed by responding to them.
Critics of the AI-generated messages worry that the software could affect doctors’ personal relationships with their patients, as well as their clinical decision-making. MyChart’s message-drafting tool, called In Basket Art, taps into a patient’s previous messages and electronic medical records to create its drafts, and it can be instructed to write in a particular physician’s voice and tone. Importantly, though, the responses it drafts aren’t always correct. Another recent study found that of 116 GPT-4-generated In Basket Art messages, seven contained “hallucinations,” instances in which an AI presents outright false information.
Though a message goes out only after a doctor presses “Send,” critics are concerned that doctors’ vigilance could slip. The Times article also cites concerns over automation bias, the human tendency to accept an automated system’s suggestion even when it conflicts with one’s own expert judgment.
We are not at a point where we can declare AI in healthcare definitively good or definitively bad, and there will likely always be gray areas. It’s clear the technology can assist busy healthcare professionals and better serve patients in certain areas. As in any realm, those using AI must remain vigilant to ensure it actually benefits the humans it’s designed to serve.