Hypothesis / aims of study
Since becoming publicly available, ChatGPT and other generative artificial intelligence (gAI) and large language model services have been tested for their utility in a range of medical diagnostic tasks and in surgical education. Over time, ChatGPT's ability to provide concise assessments and evaluations has improved, as has its ability to locate and cite genuine resources and references, while earlier concerns about its shortcomings, including hallucinations and factual errors, are easing. Good urodynamic practices and terminology have been described and published by the International Continence Society (ICS) and are recommended for practitioners [1-3]. We report on ChatGPT's ability to analyze urodynamic traces correctly before and after teaching it the ICS good practices and terminology for urodynamics, cystometry and pressure-flow studies.
Study design, materials and methods
We randomly selected urodynamic traces, including cystometries and pressure-flow studies, conducted at our institution by a single functional urology expert. The traces were then reassessed by a second expert, and agreement between the assessments was recorded; any disagreement was resolved by a third expert. ChatGPT-4o was asked to analyze the same traces in two batches: the first without any instruction to use specific resources or any teaching, and the second after it had been taught and instructed to use the ICS standards [1-3]. Parameters analyzed included the components of the urodynamic traces as well as selected nomograms, the overall diagnosis, and an appropriate management plan. Concordance with expert opinion before and after teaching ChatGPT was calculated, and the statistical significance of the effect of teaching was assessed using McNemar's test. Institutional review board approval was obtained.
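As a minimal sketch of the paired analysis described above, the before/after concordance for a single parameter can be compared with McNemar's test as follows. The counts and the use of the statsmodels implementation are illustrative assumptions, not the study data or the software actually used.

```python
# Minimal sketch of the paired concordance analysis (hypothetical counts,
# not the study data). McNemar's test compares ChatGPT's agreement with
# expert opinion before vs. after teaching it the ICS standards.
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of paired outcomes for one urodynamic parameter:
# rows    = concordant with experts BEFORE teaching (yes / no)
# columns = concordant with experts AFTER teaching  (yes / no)
table = [
    [60, 5],   # concordant before: stayed concordant / became discordant
    [12, 23],  # discordant before: became concordant / stayed discordant
]

# The exact (binomial) form is preferred when discordant-pair counts are small.
result = mcnemar(table, exact=True)
print(f"McNemar statistic = {result.statistic}, p-value = {result.pvalue:.4f}")
```

Only the discordant pairs (the off-diagonal cells) drive the test, which is why it is suited to detecting whether teaching changed ChatGPT's concordance on the same traces.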
Results
We analyzed a total of 100 urodynamic traces of different etiologies, of which 75% were from female patients. The most common presenting complaint was mixed urinary incontinence, and the most common diagnosis was non-neurogenic dry detrusor overactivity. With regard to the assessment of filling cystometry, ChatGPT showed neither improvement nor regression in its ability to correctly identify abnormalities of sensation (p=1), cystometric capacity (p=0.1336), bladder compliance (p=0.4795), urinary incontinence (p=0.3711), and electromyography (EMG) synergy or dyssynergia (p=0.7728). The only parameter that approached statistical significance was ChatGPT's ability to identify uninhibited detrusor contractions after teaching (p=0.07). Similarly, ChatGPT's ability to assess voided volume, detrusor pressure at maximum flow (Pdet@Qmax) and maximum urinary flow rate (Qmax), and to identify after-contractions remained unchanged after instruction and teaching (p>0.1), as did its ability to calculate the bladder contractility index (p=0.25), provide an overall diagnosis (p=1) and formulate an appropriate management plan (p=0.13).
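For background, the bladder contractility index referenced above is conventionally derived from the pressure-flow parameters already listed; the standard definition (not restated in the abstract) is sketched below.

```latex
% Standard bladder contractility index (BCI), derived from the
% pressure-flow study parameters Pdet@Qmax and Qmax.
\[
  \mathrm{BCI} = P_{\mathrm{det}@Q_{\max}} + 5\,Q_{\max}
\]
% Conventional interpretation: BCI > 150 strong, 100-150 normal,
% and < 100 weak detrusor contractility.
```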
Interpretation of results
This study demonstrates that ChatGPT-4o has a consistent baseline ability to interpret urodynamic traces irrespective of formal instruction. Notably, its capacity to identify uninhibited detrusor contractions showed a trend toward improvement following exposure to ICS terminology and standards (p=0.07), suggesting that targeted training may refine its performance in this area.