mirror of https://github.com/tiyn/wiki.git synced 2026-03-31 18:34:47 +02:00

speech recognition

- Added Whisper CLI
- Split Nerd Dictation into single file
2026-03-30 04:26:03 +02:00
parent e285db8486
commit 47242e81de
3 changed files with 123 additions and 3 deletions


@@ -0,0 +1,48 @@
# Nerd Dictation
[Nerd Dictation](https://github.com/ideasman42/nerd-dictation/) is real-time offline speech
recognition software for [Linux](/wiki/linux.md)-based operating systems which uses the
[VOSK API](/wiki/speech_recognition_and_synthesis.md#vosk-api).
## Setup
The Nerd Dictation program can be installed from source as described
[on GitHub](https://github.com/ideasman42/nerd-dictation/).
Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) package Nerd
Dictation in the `nerd-dictation` package.
### Configuration
Nerd Dictation needs a model to recognize and transcribe speech.
The default path for this is `~/.config/nerd-dictation/model`.
A VOSK model placed in this directory is used as the default model.
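Placing a model at the default path can be sketched as a small shell helper.
It assumes a VOSK model has already been downloaded and extracted into a directory.
The function name `install_vosk_model` and its arguments are hypothetical and not part of Nerd
Dictation.
```sh
# Hypothetical helper, not part of Nerd Dictation.
# Usage: install_vosk_model <extracted-model-dir> [<target-dir>]
install_vosk_model() {
    src="$1"
    # Fall back to the default model path when no target is given.
    dest="${2:-$HOME/.config/nerd-dictation/model}"
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi
    # Refuse to overwrite an already installed model.
    if [ -e "$dest" ]; then
        echo "target already exists: $dest" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")"
    cp -R "$src" "$dest"
}
```
Called with only the extracted model directory, the helper copies it to
`~/.config/nerd-dictation/model`.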
## Usage
This section addresses the usage of Nerd Dictation.
### Basic Usage
Nerd Dictation can be started using the following command.
```sh
nerd-dictation begin
```
The model can also be specified using the `--vosk-model-dir` flag.
If no model path is specified, the default model is used as described in
[the configuration section](#configuration).
Assuming the path to the model is `<model-path>`, the command looks like the following.
```sh
nerd-dictation begin --vosk-model-dir=<model-path>
```
Dictation can be stopped using the following command.
```sh
nerd-dictation end
```
## Troubleshooting
This section will focus on errors and the fixing of errors of Nerd Dictation.

wiki/linux/whisper-cli.md

@@ -0,0 +1,44 @@
# Whisper CLI
[Whisper CLI](https://github.com/vatsalaggarwal/whisper-cli) is non-real-time offline speech
transcription software for [Linux](/wiki/linux.md)-based operating systems which uses the
[Whisper API](/wiki/speech_recognition_and_synthesis.md#whisper-api).
## Setup
The Whisper CLI program can be installed from source as described
[on GitHub](https://github.com/vatsalaggarwal/whisper-cli).
Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) package Whisper CLI
in the `whisper.cpp` package.
### Configuration
By default, Whisper CLI does not feature a global default model path.
If no model path is given as shown in the [basic usage section](#basic-usage), the path
`./models/ggml-base.en.bin` will be used.
This behavior cannot be changed through the program's configuration, but a simple alias can set a
default path as shown below.
```sh
alias whisper="whisper-cli -m ~/.config/whisper-cli/ggml-large-v3-turbo-german-q5_0.bin"
```
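As an alternative to the alias, a small shell function can read the default model from an
environment variable and fail early when the file is missing.
This is only a sketch: the function name `transcribe` and the variable `WHISPER_MODEL` are
hypothetical and not part of Whisper CLI.
```sh
# Hypothetical wrapper; `transcribe` and WHISPER_MODEL are illustrative names.
transcribe() {
    # Use WHISPER_MODEL if set, otherwise the model path from the alias example.
    model="${WHISPER_MODEL:-$HOME/.config/whisper-cli/ggml-large-v3-turbo-german-q5_0.bin}"
    if [ ! -f "$model" ]; then
        echo "model not found: $model" >&2
        return 1
    fi
    # Pass all remaining arguments through to whisper-cli unchanged.
    whisper-cli -m "$model" "$@"
}
```
Unlike the alias, the function reports a missing model file itself before the program is started.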
## Usage
This section addresses the usage of Whisper CLI.
### Basic Usage
Whisper CLI can be used to transcribe an [audio](/wiki/audio.md) file as shown in the following
command.
In this example `<model-path>` is the path to the Whisper model and `<audio-file>` is the path to
the audio file to transcribe.
```sh
whisper-cli -m <model-path> -f <audio-file>
```
Additionally, a `.txt` file containing the transcription can be generated as shown below.
```sh
whisper-cli -m <model-path> -f <audio-file> -otxt
```
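Several files can be transcribed in one go with a small loop.
The following is a minimal sketch that assumes `whisper-cli` takes the input file via the `-f`
flag and that all files are WAV files the installed build can read.
The function name `transcribe_dir` and its arguments are hypothetical and not part of Whisper CLI.
```sh
# Hypothetical helper, not part of Whisper CLI.
# Usage: transcribe_dir <model-path> <directory>
transcribe_dir() {
    model="$1"
    dir="$2"
    for file in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no WAV files.
        [ -e "$file" ] || continue
        # Stop at the first file that fails to transcribe.
        whisper-cli -m "$model" -f "$file" -otxt || return 1
    done
}
```
The loop stops at the first failing file so that errors are not hidden in a long batch.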


@@ -4,14 +4,42 @@ Speech recognition describes the process of understanding and interpreting spoke
The most common form of this is speech-to-text (STT) programs that convert spoken language into
text.
On the other hand, speech synthesis describes the artificial production of human speech.
A text-to-speech (TTS) program is one that converts an input text to speech.
## Speech-to-Text Programs
The following is a list of STT programs.
- [Nerd Dictation](/wiki/linux/nerd-dictation.md) is real-time offline speech recognition
  software for [Linux](/wiki/linux.md)-based operating systems which uses the
  [VOSK API](#vosk-api).
- [Whisper CLI](/wiki/linux/whisper-cli.md) is non-real-time offline speech transcription
  software for [Linux](/wiki/linux.md)-based operating systems which uses the
  [Whisper API](#whisper-api).
Some alternatives, mostly for Linux systems, were listed in a
[Reddit post by tuananh_org](https://www.reddit.com/r/archlinux/comments/1j77921/speech_to_text_app/).
## Model APIs
There are various APIs that are used to do speech recognition.
### VOSK API
The VOSK API mostly does not include punctuation and is not case-sensitive.
However, it is generally real-time capable.
A good source for VOSK models is [alphacephei](https://alphacephei.com/vosk/models).
For the German language, the 900k model from the Tuda-DE project is recommended.
It is also available from the [corresponding GitHub page](https://github.com/uhh-lt/kaldi-tuda-de).
This model is mostly not capable of transcribing English words.
### Whisper API
The Whisper API mostly includes punctuation and is case-sensitive.
However, it is generally not real-time capable.
Many models are available on [Hugging Face](https://huggingface.co/).
A good model for the German language is the
[GGML Q5_0 quantization of primeline's whisper-large-v3-turbo-german](https://huggingface.co/F1sk/whisper-large-v3-turbo-german-ggml-q5_0).
This model is also capable of transcribing some English words.