speech recognition

- Added Whisper CLI
- Split Nerd Dictation into single file
2026-05-27 10:41:36 +02:00 · 2026-03-30 04:26:03 +02:00
parent e285db8486
commit 47242e81de
3 changed files with 123 additions and 3 deletions
--- a/wiki/linux/nerd-dictation.md
+++ b/wiki/linux/nerd-dictation.md
@@ -0,0 +1,48 @@
 # Nerd Dictation
 [Nerd Dictation](https://github.com/ideasman42/nerd-dictation/) is a real-time offline speech
 recognition software for [Linux](/wiki/linux.md)-based operating systems which uses the
 [VOSK API](/wiki/speech_recognition_and_synthesis.md#vosk-api).
 ## Setup
 The Nerd Dictation program can be installed from source as described
 [on GitHub](https://github.com/ideasman42/nerd-dictation/).
 Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) package Nerd
 Dictation in the `nerd-dictation` package.
 ### Configuration
 Nerd Dictation needs a model to recognize and transcribe speech.
 The default path for this is `~/.config/nerd-dictation/model`.
 In this directory a VOSK model can be placed to use as default model.
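As a minimal sketch of the step above, the following hypothetical helper `install_vosk_model` copies an already downloaded and extracted VOSK model directory to the default path.
The function name and its arguments are illustrative and not part of Nerd Dictation.

```shell
# Hypothetical helper (not part of Nerd Dictation): copy an extracted VOSK
# model directory to the default model path, or to a custom destination.
install_vosk_model() {
    src="$1"
    dest="${2:-$HOME/.config/nerd-dictation/model}"

    # The source has to be an extracted model directory, not a zip archive.
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi

    # Refuse to overwrite an existing model so nothing is lost by accident.
    if [ -e "$dest" ]; then
        echo "already exists: $dest" >&2
        return 1
    fi

    mkdir -p "$(dirname "$dest")"
    cp -r "$src" "$dest"
}
```

Calling `install_vosk_model <extracted-model-dir>` would then place the model where `nerd-dictation begin` looks for it by default.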
 ## Usage
 This section addresses the usage of Nerd Dictation.
 ### Basic Usage
 Nerd Dictation can be started using the following command.
 ```sh 
 nerd-dictation begin
 ```
 If no model path is specified, the default model described in
 [the configuration section](#configuration) is used.
 A different model can be selected using the `--vosk-model-dir` flag.
 Assuming the path to the model is `<model-path>`, the command looks like the following.
 ```sh 
 nerd-dictation begin --vosk-model-dir=<model-path>
 ```
 Dictation can be stopped using the following command.
 ```sh 
 nerd-dictation end
 ```
 ## Troubleshooting
 This section addresses errors that can occur when using Nerd Dictation and how to fix them.
--- a/wiki/linux/whisper-cli.md
+++ b/wiki/linux/whisper-cli.md
@@ -0,0 +1,44 @@
 # Whisper CLI
 [Whisper CLI](https://github.com/vatsalaggarwal/whisper-cli) is a non-real-time offline speech
 transcription software for [Linux](/wiki/linux.md)-based operating systems which uses the
 [Whisper API](/wiki/speech_recognition_and_synthesis.md#whisper-api).
 ## Setup
 The Whisper CLI program can be installed from source as described
 [on GitHub](https://github.com/vatsalaggarwal/whisper-cli).
 Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) package Whisper CLI
 in the `whisper.cpp` package.
 ### Configuration
 Whisper CLI does not feature a global default model path.
 If no model path is given as shown in the [basic usage section](#basic-usage), the relative path
 `./models/ggml-base.en.bin` is used.
 This behavior cannot be changed through configuration of the program, but a simple alias can set
 a default path as shown below.
 ```sh
 alias whisper="whisper-cli -m ~/.config/whisper-cli/ggml-large-v3-turbo-german-q5_0.bin"
 ```
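An alias fixes the model path once and for all.
As a hedged alternative, the following sketch uses a shell function instead, so the model can still be overridden per call.
The function name `transcribe` and the environment variable `WHISPER_MODEL` are hypothetical and not part of Whisper CLI.

```shell
# Hypothetical wrapper (not part of Whisper CLI): use the model named in the
# WHISPER_MODEL environment variable, or fall back to a fixed default path.
transcribe() {
    model="${WHISPER_MODEL:-$HOME/.config/whisper-cli/ggml-large-v3-turbo-german-q5_0.bin}"
    # Pass all remaining arguments through to whisper-cli unchanged.
    whisper-cli -m "$model" "$@"
}
```

With this in a shell startup file, `WHISPER_MODEL=<model-path> transcribe ...` would select a different model for a single call, while a plain `transcribe ...` uses the default.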
 ## Usage
 This section addresses the usage of Whisper CLI.
 ### Basic Usage
 Whisper CLI can be used to transcribe an [audio](/wiki/audio.md) file as shown in the following
 command.
 In this example `<model-path>` is the path to the Whisper model and `<audio-file>` is the path to
 the audio file to transcribe.
 ```sh
 whisper-cli -m <model-path> -f <audio-file>
 ```
 Additionally, a `.txt` file containing the transcription can be generated by adding the `-otxt`
 flag as shown below.
 ```sh
 whisper-cli -m <model-path> -f <audio-file> -otxt
 ```
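Building on the commands above, the following is a minimal sketch for transcribing several files in one go.
The helper name `transcribe_dir` is hypothetical, and the sketch assumes that `whisper-cli` accepts the input file through the `-f` flag and that the audio files use the `.wav` extension.

```shell
# Hypothetical helper (not part of Whisper CLI): transcribe every .wav file in
# a directory and let whisper-cli write a .txt file for each of them.
# Assumption: the input file is passed with -f (check `whisper-cli --help`).
transcribe_dir() {
    model="$1"
    dir="${2:-.}"

    for audio in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no .wav files.
        [ -e "$audio" ] || continue
        whisper-cli -m "$model" -f "$audio" -otxt
    done
}
```

A call such as `transcribe_dir <model-path> <audio-dir>` would run one transcription per file, loading the model again each time.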
--- a/wiki/speech_recognition_and_synthesis.md
+++ b/wiki/speech_recognition_and_synthesis.md
@@ -4,14 +4,42 @@ Speech recognition describes the process of understanding and interpreting spoke
 The most common form of this is speech-to-text (STT) programs, that convert spoken language into
 text.
 On the other hand speech synthesis describes the artificial production of human speech.
-A Text-to-speech (TTS) program is one, that converts an input text to speech. 
+A Text-to-speech (TTS) program is one, that converts an input text to speech.
 ## Speech-to-Text Programs
 The following is a list of STT programs.
-- [Nerd Dictation](https://github.com/ideasman42/nerd-dictation/) is an offline speech recognition
+- [Nerd Dictation](/wiki/linux/nerd-dictation.md) is a real-time offline speech recognition
-    software for [Linux](/wiki/linux.md)-based operating systems.
+    software for [Linux](/wiki/linux.md)-based operating systems which uses the
    [VOSK API](#vosk-api).
 - [Whisper CLI](/wiki/linux/whisper-cli.md) is a non-real-time offline speech transcription
    software for [Linux](/wiki/linux.md)-based operating systems which uses the
    [Whisper API](#whisper-api).
 Some alternatives mostly for Linux systems were listed in a
 [Reddit post by tuananh_org](https://www.reddit.com/r/archlinux/comments/1j77921/speech_to_text_app/).
 ## Model APIs
 There are various APIs that are used to do speech recognition.
 ### VOSK API
 Transcriptions produced with the VOSK API mostly lack punctuation and capitalization.
 However, the VOSK API is generally real-time capable.
 A good source for VOSK models is [alphacephei](https://alphacephei.com/vosk/models).
 For the German language the 900k model from the Tuda-DE project is recommended.
 It is also available from the [corresponding GitHub page](https://github.com/uhh-lt/kaldi-tuda-de).
 This model is mostly not capable of transcribing English words.
 ### Whisper API
 Transcriptions produced with the Whisper API mostly include punctuation and capitalization.
 However, the Whisper API is generally not real-time capable.
 Many models are available on [Hugging Face](https://huggingface.co/).
 A good model for the German language is the
 [GGML Q5_0 quantization of primeline's whisper-large-v3-turbo-german](https://huggingface.co/F1sk/whisper-large-v3-turbo-german-ggml-q5_0).
 This model is also capable of transcribing some English words.