AI Chatbots: A New Frontier in Health Diagnosis
Exploring AI's Role in Health Queries
Unexpected rashes, persistent headaches, and troubling coughs are prompting more people to seek answers from ChatGPT rather than from a doctor or a lengthy web search. Amulya Yadav, an associate professor at Penn State's College of Information Sciences and Technology, notes that while this trend has merit, it also comes with significant caveats. Yadav recently conducted a study assessing how accurately large language models (LLMs) such as ChatGPT respond to health questions. The results were noteworthy: the chatbots provided correct answers roughly three-quarters of the time.
The research, which is set to be presented at the 2026 Association for Computing Machinery Conference on Fairness, Accountability, and Transparency (FAccT) in Montreal, found that the LLMs' health-related responses were about 76% accurate when evaluated by board-certified physicians.
AI vs. Traditional Search Engines
Better Than Search Engines, but Not a Medical Professional
The study stemmed from a straightforward observation: for years, individuals have turned to search engines to decipher their symptoms. With the emergence of conversational AI, researchers aimed to determine if tools like ChatGPT offered more reliable information than a typical Google search.
To investigate, participants submitted more than 200 descriptions of health symptoms to several AI models, and nine board-certified physicians then rated the responses for accuracy. ChatGPT was among the top performers, significantly outperforming traditional search engines such as Google and Bing. This advantage stems from the conversational format: a chatbot returns a concise answer instead of directing users to a list of websites. That same concise answer, however, can create a false sense of certainty. Unlike healthcare professionals, AI systems cannot physically examine patients, order diagnostic tests, read body language, or fully account for a patient's medical history. Their responses are generated by predicting plausible patterns of language, which can be misleading.
Yadav emphasizes that this distinction is crucial, as many users may place excessive trust in AI-generated information.
AI as a Potential Healthcare Resource
A Healthcare Tool?
Despite the potential risks, the study highlights a significant opportunity. According to estimates from the World Health Organization and the World Bank, nearly half of the global population lacks sufficient access to healthcare. In many areas, individuals may have no feasible way to consult a qualified doctor when health issues arise.
For these people, AI could provide an accessible source of initial guidance. Yadav asks, "If someone cannot reach a doctor, wouldn't it be reasonable for them to receive some imperfect assistance from a language model?" With a tool that is accurate about 76% of the time, people in remote areas or those facing financial barriers to care could gain useful insight into their symptoms, recognize warning signs, or judge whether they need urgent medical attention.
However, the research also found that the models struggled with questions related to dermatology, mental health, and internal medicine. Dermatological conditions, for example, often require visual evaluation, a task current AI systems handle less well than text-based ones.
The takeaway from the Penn State study is clear: while ChatGPT can serve as a helpful health resource, particularly for those with limited access to care, the expertise of a human doctor remains irreplaceable when it comes to diagnosing illnesses and making treatment decisions.