Combining the latest advancements in speech-to-text transcription and speaker-labeling (diarization), this platform quickly produces accurate transcriptions of audio-video files. The goal is to provide a scalable, automated, and secure approach to generating transcriptions and performing analysis on those outputs.
This effort began as a solution to the administrative burden healthcare professionals face, specifically in patient note-taking. Evaluation so far has covered over 40 hours of medical conversations between patients and their providers. The system is adept at handling complex conversations, noisy environments, and overlapping dialog. Above all, the system strives to make manual evaluation more efficient.
CAT-Talk is a secure, web-based AI platform offering fast, speaker-labeled, and time-stamped transcripts, integrating summarization and theme extraction tools on NIST-53, HIPAA-compliant infrastructure.
Integral to this system are two highly advanced open-source models:
Diarization and transcription tasks are executed and their outputs combined using WhisperX. When a user uploads an audio file via the web interface, ClearML manages job scheduling, ensuring simultaneous processing. The result is one unified, time-stamped, speaker-labeled transcript optimized for simple human verification.
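To illustrate what combining diarization and transcription into one transcript can involve, here is a minimal sketch in plain Python. It is not the platform's code and does not call WhisperX. The names Segment, Turn, label_segments, and format_transcript are hypothetical, and the rule of assigning each transcribed segment to the speaker whose turns overlap it longest is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of speech (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A diarization turn: who spoke during a time span (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds that two time spans share (0.0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Attach a speaker label to each transcribed segment.

    Assumption: a segment belongs to the speaker whose turns overlap it
    for the longest total time. Segments with no overlap get `unknown`.
    """
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((seg.start, seg.end, speaker, seg.text))
    return labeled


def format_transcript(labeled):
    """Render labeled segments as one time-stamped, speaker-labeled transcript."""
    return "\n".join(
        f"[{start:.2f}s - {end:.2f}s] {speaker}: {text}"
        for start, end, speaker, text in labeled
    )
```

Given a list of transcribed segments and a list of diarization turns, label_segments returns one labeled tuple per segment, and format_transcript renders them one per line for a reviewer to check. A production pipeline would typically align at the word level rather than the segment level, so this sketch only conveys the general idea.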
The transcription services platform is available for experimental use.
Email ai@uky.edu for more information.
A training video is available on YouTube.
Read more in the paper: Toward Automated Clinical Transcriptions (PubMed)
The platform relies on NIST-53, HIPAA-compliant infrastructure with ClearML for job management.