Courtroom-grade transcription and ML-ready dataset creation for Africa.

Machine captions and translations butcher African accents.

Verbatim, Baby! fixes them with custom AI and human expertise.


At Verbatim, Baby!, we see the need for better representation of African languages every day. Existing ASR models are very effective for unaccented English audio, but far less so for South Africa's ten other official languages.

Verbatim, Baby! offers combined transcription and dataset services. For regular Afrikaans-language clients, we create a new speaker-based dataset that can auto-correct Whisper engines on the go, and we will soon open our isiXhosa division. Please navigate to our Transcription page for the enquiry form.

We always aim to have ML datasets ready to license across a range of speaker patterns and topics, and we also accept requests for specialized speaker profiles.

Two of our current speakers are showcasing the extent to which STT/TTS AI is failing them in Afrikaans. They deal with everything from accent changes and language flips to the model's voice changing mid-query, as well as severe misunderstandings.

Our private speakers for NLP are ethically sourced, and the overwhelming majority have tied their identities to their conversational Verbatim, Baby! dataset contributions. They can be contacted through the site administrator for further targeted speech and linguistic research. Please see our Data Sets section for samples and speaker profiles.


https://verbatimbaby.github.io/

Verbatim, Baby! 2025