ChatGPT Health misses urgent medical crises over 50% of the time

In summary:

PCWorld reports that new research in Nature Medicine reveals ChatGPT Health failed to recognize medical emergencies in 51.6% of the cases that required emergency care, advising patients to stay home or book a regular doctor’s appointment instead.
The AI struggled particularly with complex, rapidly escalating symptoms that could become life-threatening, posing significant safety risks for users seeking medical guidance.
While OpenAI claims continuous model refinement and disputes the study’s real-world applicability, the research highlights current limitations in AI medical assessment tools.

According to new research published in Nature Medicine and reported by The Guardian, ChatGPT Health (OpenAI’s dedicated AI-driven chatbot that’s “designed for health and wellness,” which launched earlier this year) repeatedly failed to identify emergencies that required immediate medical attention.

Lead researcher Dr. Ashwin Ramaswamy and his colleagues created “60 realistic patient scenarios covering health conditions from mild illnesses to emergencies,” which independent doctors then reviewed based on established clinical guidelines.

In 51.6% of the cases where patients should’ve been sent to the hospital for emergency care, they were instead advised to stay home and/or book a regular doctor’s appointment.

While ChatGPT Health performed well enough in clear-cut emergencies, such as strokes and severe allergic reactions, it didn’t fare so well when symptoms were more complex: not yet emergencies, but able to become life-threatening very quickly.

“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” said doctoral researcher Alex Ruani. “Eight times out of 10, [ChatGPT Health] sent a suffocating woman to a future appointment she would not live to see. […] Meanwhile, 64.8% of completely safe individuals were told to seek immediate medical care.”

OpenAI told The Guardian that these results don’t reflect how the service is normally used and that the model is continuously refined.

This article originally appeared on our sister publication M3 and was translated and localized from Swedish.