GPT-4o outperformed Gemini Flash 2.0 in diagnosing acne and rosacea, demonstrating high accuracy in primary diagnosis but moderate success in identifying subtypes, a new clinical study by Semmelweis University shows. AI tools can speed up diagnosis and access to care, but consultation with a dermatologist remains essential, the authors say.

The study tested how well two AI tools, OpenAI’s GPT-4o and Google’s Gemini Flash 2.0, could identify acne and rosacea from photos of patients with confirmed diagnoses who visited the dermatology department of Semmelweis University between 2021 and 2023. Two experienced dermatologists reviewed the images separately, and a third made a decision if they disagreed. Final diagnoses were based on the majority opinion.

“Acne and rosacea are very common skin conditions that can impact quality of life, yet their diagnosis can be challenging due to overlapping clinical features,” says corresponding author Norbert Kiss, assistant professor at the Department of Dermatology, Venereology and Dermatooncology, Semmelweis University.

Images were submitted to GPT-4o and Gemini Flash 2.0 using a standard prompt to simulate a patient without dermatological knowledge. The models were asked, “Can you guess the most likely diagnosis?” If correct, they were then asked to guess the subtype. An international dermatology team, including a Yale AI expert, oversaw the study.

GPT-4o provided a diagnosis in 100% of cases, with a correct diagnosis rate of 93% (achieving a sensitivity of 93.0% and specificity of 97.7%).

For acne, GPT-4o correctly identified 91 out of 100 cases and did not mistake any other condition for acne. For rosacea, it caught all the cases (100%) and very rarely mistook another condition for rosacea, getting it right 98% of the time.
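
For readers unfamiliar with the terms, the sketch below shows how sensitivity and specificity are conventionally computed from counts of correct and incorrect calls. It is an illustration only, not the study's analysis code: the function names are invented here, and the number of non-acne images is not given in this article, so the value used for it is a placeholder.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of real cases of a condition that the model correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of images without the condition that the model did not label as it."""
    return true_neg / (true_neg + false_pos)


# Acne figures as reported above: 91 of 100 cases identified, nothing else mistaken for acne.
acne_sensitivity = sensitivity(true_pos=91, false_neg=9)

# ASSUMPTION: the count of non-acne images is not stated in the article; 100 is a placeholder.
# With zero false positives the result is the same whatever that count is.
acne_specificity = specificity(true_neg=100, false_pos=0)

print(f"Acne sensitivity: {acne_sensitivity:.0%}")
print(f"Acne specificity: {acne_specificity:.0%}")
```

Read this way, sensitivity describes how many true cases a tool catches, while specificity describes how rarely it raises a false alarm.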

Google’s Gemini Flash 2.0 provided a diagnosis in only 21% of cases, which precluded further statistical analysis.

“We were surprised at how well (one of) the AI tools performed,” Kiss says.

“Patients tend to be distrustful of AI, but doctors also have their doubts. That’s why we were curious to see how capable these models are of diagnosing medical conditions,” he adds.

Identifying the different subtypes of the skin conditions proved more challenging: GPT-4o correctly identified 55% of acne subtypes and 50% of rosacea subtypes, with a specificity of 90% and 80%, respectively.

While demand for dermatology rises due to an aging population and an increasing focus on appearance, the number of dermatologists worldwide remains low, ranging from 1 per 100,000 people in the U.K. to 11.4 per 100,000 in Greece[1].

“As AI tools like ChatGPT become more accessible in dermatology, they increasingly influence decisions to seek professional care. While in our study GPT-4o excelled at diagnosing acne and rosacea, it struggled with subtyping, underscoring the need for objective evaluation and public awareness so people can make informed decisions when using AI tools to check their skin,” says first author Mehdi Boostani of Semmelweis University.

Augmented AI is a great support tool that can speed up access to care and help prevent illness from progressing, the authors say.

“However, for a confirmed diagnosis and to access prescribed medication, consultation with a dermatologist remains essential,” says co-author András Bánvölgyi, head of the Teledermatology and Outpatient Unit at the Department of Dermatology, Venereology and Dermatooncology, Semmelweis University.

[1] https://www.researchgate.net/figure/Total-number-and-ratio-of-dermatologists-and-general-practitioners-GPs-per-100-000_tbl1_232713101

Photo: Semmelweis University – Boglarka Zellei, illustration: iStock