LARGE LANGUAGE MODELS IN INTRACARDIAC ELECTROGRAM INTERPRETATION: A NEW FRONTIER IN CARDIAC DIAGNOSTICS FOR PACEMAKER PATIENTS
Serdar Bozyel, Ahmet Berk Duman, Şadiye Nur Dalgıç, Abdülcebar Çipal, Faysal Şaylık, Şükriye Ebru Gölcük Önder, Metin Çağdaş, Tümer Erdem Güler, Tolga Aksu, Ulaş Bağcı, Nurgül Keser
The Anatolian Journal of Cardiology - 2025;29(10):533-542
Department of Cardiology, Health Sciences University, Kocaeli City Hospital, Kocaeli, Türkiye

Background: Interpreting intracardiac electrograms (EGMs) requires expertise that many cardiologists lack. Artificial intelligence models such as ChatGPT-4o may improve diagnostic accuracy. This study evaluates ChatGPT-4o's performance in EGM interpretation across 4 scenarios (A-D) with increasing contextual information.

Methods: Twenty EGM cases from The EHRA Book of Pacemaker, ICD, and CRT Troubleshooting were analyzed using ChatGPT-4o. Ten predefined features were assessed in Scenarios A and B, while Scenarios C and D required 20 correct responses per scenario across all cases. Performance was evaluated over 2 months using McNemar's test, Cohen's kappa, and Prevalence- and Bias-Adjusted Kappa (PABAK).

Results: Providing clinical context enhanced ChatGPT-4o's accuracy, which improved from 57% (Scenario A) to 66% (Scenario B). "No Answer" rates decreased from 19.5% to 8%, while false responses increased from 8.5% to 11%, suggesting occasional misinterpretation. Agreement in Scenario A showed high reliability for atrial activity (kappa = 0.7) and synchronization (kappa = 0.7), but poor reliability for chamber (kappa = -0.26). In Scenario B, understanding achieved near-perfect agreement (PABAK = 1), while ventricular activity remained unreliable (kappa = -0.11). In Scenarios C (30%) and D (25%), accuracy was lower, and agreement between baseline and second-month responses remained only fair (kappa = 0.285 and 0.3, respectively), indicating limited consistency in complex decision-making tasks.

Conclusion: This study provides the first systematic evaluation of ChatGPT-4o in EGM interpretation, demonstrating promising accuracy and reliability in structured tasks. While the model integrated contextual data well, its adaptability to complex cases was limited. Further optimization and validation are needed before clinical use.
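The Methods rely on two agreement statistics: Cohen's kappa, which corrects observed agreement for chance, and PABAK, which additionally neutralizes prevalence and bias effects in the marginals. A minimal sketch of both, using hypothetical binary correct/incorrect ratings (not the study's data), shows why they can diverge when one category dominates:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is chance agreement from the marginal frequencies."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1

def pabak(rater_a, rater_b, k=2):
    """Prevalence- and Bias-Adjusted Kappa for k categories:
    PABAK = (k * p_o - 1) / (k - 1)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    return (k * p_o - 1) / (k - 1)

# Hypothetical 20-case example with a highly prevalent "correct" category:
model     = [1] * 18 + [0] * 2
reference = [1] * 20
print(cohens_kappa(model, reference))  # 0.0 despite 90% raw agreement
print(pabak(model, reference))         # 0.8
```

With 90% raw agreement but a skewed prevalence, kappa collapses to 0 while PABAK stays high, which is the usual motivation for reporting PABAK alongside kappa, as the abstract does for the "understanding" feature.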
