What it does:
- Runs 7 model families: offline transcription (CTC, RNNT, TDT, TDT-CTC), streaming (EOU, Nemotron), and speaker diarization (Sortformer)
- Word-level timestamps
- Streaming transcription from microphone input
- Speaker diarization detecting up to 4 speakers
How does this compare?
With models like these, you often want to glue things together and manage multithreaded queues, and gluing in C++ is no fun.
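To make the "glue" point concrete, here is a minimal Python sketch of the kind of plumbing meant: one thread feeds audio chunks into a bounded queue, and a worker thread runs transcription and diarization on each chunk and forwards the results. `fake_transcribe` and `fake_diarize` are hypothetical placeholders, not this project's API or any real library call; in practice you would swap in actual model invocations.

```python
import queue
import threading

SENTINEL = None  # marks end of stream for each stage


def fake_transcribe(chunk):
    # Hypothetical stand-in for a real ASR model call; not part of any library.
    return f"text for chunk {chunk}"


def fake_diarize(chunk):
    # Hypothetical stand-in for a diarization model (up to 4 speakers).
    return f"speaker_{chunk % 4}"


def producer(chunks, out_q):
    # Push audio chunks (plain integers here) into the pipeline, then signal end.
    for c in chunks:
        out_q.put(c)
    out_q.put(SENTINEL)


def worker(in_q, out_q):
    # Consume chunks, run both placeholder models, forward combined results.
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put((chunk, fake_diarize(chunk), fake_transcribe(chunk)))


def run_pipeline(chunks):
    audio_q = queue.Queue(maxsize=8)  # bounded queue applies backpressure
    result_q = queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(chunks, audio_q)),
        threading.Thread(target=worker, args=(audio_q, result_q)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = result_q.get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for chunk, speaker, text in run_pipeline(range(5)):
        print(chunk, speaker, text)
```

With a single worker the output order matches the input order; adding more workers would need sequence numbers to reorder results, which is exactly the sort of bookkeeping that gets tedious in C++.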
This assumes that your offering will perform better than onnxruntime on at least some metric such as memory.
https://github.com/rishikanthc/Scriberr