Hear: macOS CLI Transcription

Recently, I needed to run around 1000 mp3 files through a speech-to-text engine. Although I use dictation tools like Siri, Hey Google, and Alexa daily, transcribing audio files is surprisingly not something you can do out of the box.

After researching online services through GCP and Amazon that charge by the minute, I realized I could use the default dictation capability in macOS.

It’s possible to use an application like Rogue Amoeba’s Loopback and a kernel driver to automate this one file at a time; however, that approach would require lifting the restriction that only Apple-signed kernel drivers may load, which is sub-optimal.

I then ran into a tool, playfully named hear (in contrast to say, the built-in text-to-speech command), that uses the same macOS Dictation API in a way that allows threaded transcription. With the -d flag to force on-device transcription, hear made it easy to process all 1000 mp3 files. One caveat: after a reboot, I had to trigger dictation manually once, through the double-ctrl invocation, before the -d flag would work.
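
For anyone wanting to do something similar, here is a minimal sketch of how a batch run could look in Python. It is not the exact script I used. It assumes hear is installed and on the PATH, that it reads its input file from an -i flag and prints the transcript to standard output, and that four parallel workers is a sensible number for your machine. The function names and the choice to write a .txt file next to each mp3 are just illustrative.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def hear_command(mp3_path):
    """Build the hear call for one file.

    -d forces on-device transcription; -i (assumed here) names the input file.
    """
    return ["hear", "-d", "-i", str(mp3_path)]


def transcribe_file(mp3_path, run=subprocess.run):
    """Transcribe one mp3 and save the text next to it as <name>.txt."""
    mp3_path = Path(mp3_path)
    # check=True raises if hear exits with an error instead of saving bad output.
    result = run(hear_command(mp3_path), capture_output=True, text=True, check=True)
    txt_path = mp3_path.with_suffix(".txt")
    txt_path.write_text(result.stdout)
    return txt_path


def transcribe_all(folder, workers=4, run=subprocess.run):
    """Transcribe every .mp3 directly inside folder, several at a time."""
    mp3_files = sorted(Path(folder).glob("*.mp3"))
    # Threads are enough here: each one only waits on an external hear process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: transcribe_file(p, run=run), mp3_files))


if __name__ == "__main__" and len(sys.argv) > 1:
    # The folder of mp3 files is passed as the first command-line argument.
    for txt in transcribe_all(sys.argv[1]):
        print(txt)
```

The run parameter exists only so the functions can be tried out with a stand-in instead of the real hear binary. If on-device transcription slows down or starts failing with several files in flight, lowering workers is the first thing to try.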

The transcription quality is OK, though less polished than some online services that distinguish multiple speakers and punctuate accurately. Still, for my purpose of building a searchable index, it fits the bill for free.