CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
A question-answering model that enables large language models to perform diagnostic reasoning over real-world clinical auscultation recordings.
CaReAQA is an open-ended audio question answering model designed to enable large language models to reason over real-world auscultation recordings of heart and lung sounds. Unlike traditional diagnostic systems that rely on predefined labels or closed-form outputs, CaReAQA generates free-form diagnostic responses to open-ended clinical questions, in a manner that more closely resembles human clinical reasoning.
The model combines a self-supervised medical audio encoder with a language model by aligning both modalities in a shared latent space. This alignment lets the model interpret subtle audio features (such as murmurs, crackles, and timing patterns) and integrate them with the semantic content of diverse clinical queries.
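One common way to align an audio encoder with a language model is to project the encoder's frame-level features into the language model's embedding space and prepend them to the embedded question. The sketch below illustrates that general idea only; the description above does not specify CaReAQA's actual alignment mechanism. The function names (`make_projection`, `project`, `build_lm_input`), the dimensions, and the random weights are all hypothetical, and pure Python lists stand in for tensors.

```python
import random

def make_projection(audio_dim, text_dim, seed=0):
    """Create a random audio_dim x text_dim matrix (stand-in for learned weights)."""
    rng = random.Random(seed)
    scale = 1.0 / audio_dim ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(text_dim)]
            for _ in range(audio_dim)]

def project(audio_frames, weights):
    """Map each audio frame vector (length audio_dim) to a vector of length text_dim."""
    text_dim = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(frame)) for j in range(text_dim)]
        for frame in audio_frames
    ]

def build_lm_input(audio_frames, question_embeddings, weights):
    """Prepend projected audio frames to the question's token embeddings."""
    return project(audio_frames, weights) + question_embeddings

# Hypothetical sizes: 4 audio frames of dimension 8, 3 question tokens of dimension 16.
AUDIO_DIM, TEXT_DIM = 8, 16
rng = random.Random(1)
audio_frames = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(4)]
question_embeddings = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(3)]

weights = make_projection(AUDIO_DIM, TEXT_DIM)
sequence = build_lm_input(audio_frames, question_embeddings, weights)
print(len(sequence), len(sequence[0]))
```

In a trained system, the projection weights would be learned so that the projected audio vectors become meaningful inputs to the language model. Here they are random and serve only to show how the shapes fit together.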
To support development and evaluation, we introduce a benchmark dataset of cardiac and respiratory recordings paired with natural language questions that reflect real diagnostic workflows. CaReAQA demonstrates strong qualitative and quantitative performance in generating accurate, interpretable responses, making it a promising step toward audio-based clinical decision support.