From 47242e81de626781e00ae3d8b4c6984a13e537be Mon Sep 17 00:00:00 2001
From: tiyn
Date: Mon, 30 Mar 2026 04:26:03 +0200
Subject: [PATCH] speech recognition - Added Whisper CLI - Split Nerd Dictation into single file

---
 wiki/linux/nerd-dictation.md             | 48 ++++++++++++++++++++++++
 wiki/linux/whisper-cli.md                | 44 ++++++++++++++++++++++
 wiki/speech_recognition_and_synthesis.md | 34 +++++++++++++++--
 3 files changed, 123 insertions(+), 3 deletions(-)
 create mode 100644 wiki/linux/nerd-dictation.md
 create mode 100644 wiki/linux/whisper-cli.md

diff --git a/wiki/linux/nerd-dictation.md b/wiki/linux/nerd-dictation.md
new file mode 100644
index 0000000..337faaf
--- /dev/null
+++ b/wiki/linux/nerd-dictation.md
@@ -0,0 +1,48 @@
+# Nerd Dictation
+
+[Nerd Dictation](https://github.com/ideasman42/nerd-dictation/) is a real-time offline speech
+recognition program for [Linux](/wiki/linux.md)-based operating systems which uses the
+[VOSK API](/wiki/speech_recognition_and_synthesis.md#vosk-api).
+
+## Setup
+
+The Nerd Dictation program can be installed from source as described
+[on GitHub](https://github.com/ideasman42/nerd-dictation/).
+Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) provide Nerd
+Dictation as the `nerd-dictation` package.
+
+### Configuration
+
+Nerd Dictation needs a model to recognize and transcribe speech.
+The default path for this model is `~/.config/nerd-dictation/model`.
+A VOSK model placed in this directory is used as the default model.
+
+## Usage
+
+This section addresses the usage of Nerd Dictation.
+
+### Basic Usage
+
+Nerd Dictation can be started using the following command.
+
+```sh
+nerd-dictation begin
+```
+
+The model can also be specified using the `--vosk-model-dir` flag.
+Assuming the path to the model is ``, the command looks like the following.
+If no model path is specified, the default model is used as described in
+[the setup section](#configuration).
+
+```sh
+nerd-dictation begin --vosk-model-dir=
+```
+
+
+```sh
+nerd-dictation end
+```
+
+## Troubleshooting
+
+This section focuses on errors that can occur when using Nerd Dictation and how to fix them.
diff --git a/wiki/linux/whisper-cli.md b/wiki/linux/whisper-cli.md
new file mode 100644
index 0000000..566bdff
--- /dev/null
+++ b/wiki/linux/whisper-cli.md
@@ -0,0 +1,44 @@
+# Whisper CLI
+
+[Whisper CLI](https://github.com/vatsalaggarwal/whisper-cli) is a non-real-time offline speech
+transcription program for [Linux](/wiki/linux.md)-based operating systems which uses the
+[Whisper API](/wiki/speech_recognition_and_synthesis.md#whisper-api).
+
+## Setup
+
+The Whisper CLI program can be installed from source as described
+[on GitHub](https://github.com/vatsalaggarwal/whisper-cli).
+Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) provide Whisper CLI
+in the `whisper.cpp` package.
+
+### Configuration
+
+By default, Whisper CLI does not feature a global default model path.
+If no model path is given as shown in the [basic usage section](#basic-usage), the path
+`./models/ggml-base.en.bin` will be used.
+This behavior cannot be changed through the program's configuration, but a simple alias can set a
+default path as shown below.
+
+```sh
+alias whisper="whisper-cli -m ~/.config/whisper-cli/ggml-large-v3-turbo-german-q5_0.bin"
+```
+
+## Usage
+
+This section addresses the usage of Whisper CLI.
+
+### Basic Usage
+
+Whisper CLI can be used to transcribe an [audio](/wiki/audio.md) file as shown in the following
+command.
+In this example, `` is the path to the Whisper model.
+
+```sh
+whisper-cli -m
+```
+
+Additionally, the transcription can be written to a `.txt` file as shown below.
+
+```sh
+whisper-cli -m -otxt
+```
diff --git a/wiki/speech_recognition_and_synthesis.md b/wiki/speech_recognition_and_synthesis.md
index 7d5bd51..b9a287b 100644
--- a/wiki/speech_recognition_and_synthesis.md
+++ b/wiki/speech_recognition_and_synthesis.md
@@ -4,14 +4,42 @@ Speech recognition describes the process of understanding and interpreting spoke
 The most common form of this is speech-to-text (STT) programs, that convert spoken language into
 text.
 On the other hand speech synthesis describes the artificial production of human speech.
-A Text-to-speech (TTS) program is one, that converts an input text to speech.
+A text-to-speech (TTS) program is one that converts input text to speech.
 
 ## Speech-to-Text Programs
 
 The following is a list of STT programs.
 
-- [Nerd Dictation](https://github.com/ideasman42/nerd-dictation/) is an offline speech recognition
-  software for [Linux](/wiki/linux.md)-based operating systems.
+- [Nerd Dictation](/wiki/linux/nerd-dictation.md) is a real-time offline speech recognition
+  program for [Linux](/wiki/linux.md)-based operating systems which uses the
+  [VOSK API](#vosk-api).
+- [Whisper CLI](/wiki/linux/whisper-cli.md) is a non-real-time offline speech transcription
+  program for [Linux](/wiki/linux.md)-based operating systems which uses the
+  [Whisper API](#whisper-api).
 
 Some alternatives mostly for Linux systems were listed in a
 [Reddit post by tuananh_org](https://www.reddit.com/r/archlinux/comments/1j77921/speech_to_text_app/).
+
+## Model APIs
+
+Various APIs are used for speech recognition.
+
+### VOSK API
+
+The VOSK API mostly outputs text without punctuation and without capitalization.
+However, it is generally real-time capable.
+A good source for VOSK models is [alphacephei](https://alphacephei.com/vosk/models).
+
+For the German language the 900k model from the Tuda-DE project is recommended.
+It is also available from the [corresponding GitHub page](https://github.com/uhh-lt/kaldi-tuda-de).
+This model mostly cannot transcribe English words.
+
+### Whisper API
+
+The Whisper API mostly outputs text with punctuation and capitalization.
+However, it is generally not real-time capable.
+Many models are available on [Hugging Face](https://huggingface.co/).
+
+A good model for the German language is the
+[GGML Q5_0 quantization of primeline's whisper-large-v3-turbo-german](https://huggingface.co/F1sk/whisper-large-v3-turbo-german-ggml-q5_0).
+This model is also capable of transcribing some English words.