mirror of
https://github.com/tiyn/wiki.git
synced 2026-03-31 18:34:47 +02:00
speech recognition
- Added Whisper CLI
- Split Nerd Dictation into single file
48
wiki/linux/nerd-dictation.md
Normal file
@@ -0,0 +1,48 @@
# Nerd Dictation

[Nerd Dictation](https://github.com/ideasman42/nerd-dictation/) is real-time offline speech
recognition software for [Linux](/wiki/linux.md)-based operating systems which uses the
[VOSK API](/wiki/speech_recognition_and_synthesis.md#vosk-api).

## Setup

The Nerd Dictation program can be installed from source as described
[on GitHub](https://github.com/ideasman42/nerd-dictation/).
Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) package Nerd
Dictation in the `nerd-dictation` package.

### Configuration

Nerd Dictation needs a model to recognize and transcribe speech.
The default path for this is `~/.config/nerd-dictation/model`.
A VOSK model can be placed in this directory to be used as the default model.

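As a sketch, a model can be downloaded and placed at the default path like this. The specific model name `vosk-model-small-en-us-0.15` is only an example; any model from alphacephei can be substituted.

```sh
# Download an example VOSK model and install it as Nerd Dictation's
# default model. The model name is an example, not a requirement.
mkdir -p "$HOME/.config/nerd-dictation"
cd /tmp
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 "$HOME/.config/nerd-dictation/model"
```
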
## Usage

This section addresses the usage of Nerd Dictation.

### Basic Usage

Nerd Dictation can be started using the following command.

```sh
nerd-dictation begin
```

The model can also be specified using the `--vosk-model-dir` flag.
If no model path is specified, the default model described in
[the setup section](#configuration) is used.
Assuming the path to the model is `<model-path>`, the command looks like the following.

```sh
nerd-dictation begin --vosk-model-dir=<model-path>
```

Dictation can be stopped using the following command.

```sh
nerd-dictation end
```

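Since `begin` keeps running until `end` is called, a small toggle function is convenient for binding dictation to a hotkey. The following is a sketch, not part of Nerd Dictation itself; it assumes `pgrep` is available.

```sh
# Toggle dictation: start Nerd Dictation if it is not running,
# stop it otherwise. Intended for use in a hotkey binding.
toggle_dictation() {
    if pgrep -f "nerd-dictation begin" >/dev/null 2>&1; then
        nerd-dictation end
    else
        nerd-dictation begin &
    fi
}
```
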
## Troubleshooting

This section focuses on errors of Nerd Dictation and how to fix them.
44
wiki/linux/whisper-cli.md
Normal file
@@ -0,0 +1,44 @@
# Whisper CLI

[Whisper CLI](https://github.com/vatsalaggarwal/whisper-cli) is non-real-time offline speech
transcription software for [Linux](/wiki/linux.md)-based operating systems which uses the
[Whisper API](/wiki/speech_recognition_and_synthesis.md#whisper-api).

## Setup

The Whisper CLI program can be installed from source as described
[on GitHub](https://github.com/vatsalaggarwal/whisper-cli).
Some [Linux](/wiki/linux.md) [package managers](/wiki/linux/package_manager.md) package Whisper CLI
in the `whisper.cpp` package.

### Configuration

By default, Whisper CLI does not feature a global default model path.
If no model path is given as shown in the [basic usage section](#basic-usage), the path
`./models/ggml-base.en.bin` is used.
This behavior cannot be changed through configuration, but a simple alias can set a
default path as shown below.

```sh
alias whisper="whisper-cli -m ~/.config/whisper-cli/ggml-large-v3-turbo-german-q5_0.bin"
```

## Usage

This section addresses the usage of Whisper CLI.

### Basic Usage

Whisper CLI can be used to transcribe an [audio](/wiki/audio.md) file as shown in the following
command.
In this example `<model-path>` is the path to the Whisper model and `<audio-file>` is the audio
file to transcribe.

```sh
whisper-cli -m <model-path> -f <audio-file>
```

Additionally, a `.txt` file can be generated as shown below.

```sh
whisper-cli -m <model-path> -f <audio-file> -otxt
```

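The output flags can be combined. As a sketch, a small wrapper function can transcribe a recording to both a text file and a SubRip subtitle file; the model path used here is an assumption and should be adjusted to the downloaded model.

```sh
# Transcribe an audio file, writing a .txt transcript and an .srt
# subtitle file next to the input. The model path is an assumption.
transcribe() {
    whisper-cli -m "$HOME/.config/whisper-cli/ggml-base.en.bin" \
        -f "$1" -otxt -osrt
}
```
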
@@ -10,8 +10,36 @@ A Text-to-speech (TTS) program is one that converts an input text to speech.

The following is a list of STT programs.

- [Nerd Dictation](/wiki/linux/nerd-dictation.md) is real-time offline speech recognition
  software for [Linux](/wiki/linux.md)-based operating systems which uses the
  [VOSK API](#vosk-api).
- [Whisper CLI](/wiki/linux/whisper-cli.md) is non-real-time offline speech transcription
  software for [Linux](/wiki/linux.md)-based operating systems which uses the
  [Whisper API](#whisper-api).

Some alternatives mostly for Linux systems were listed in a
[Reddit post by tuananh_org](https://www.reddit.com/r/archlinux/comments/1j77921/speech_to_text_app/).

## Model APIs

There are various APIs that are used for speech recognition.

### VOSK API

The VOSK API mostly does not include punctuation and is not case-sensitive.
However, it is generally real-time capable.
A good source for VOSK models is [alphacephei](https://alphacephei.com/vosk/models).

For the German language the 900k model from the Tuda-DE project is recommended.
It is also available from the [corresponding GitHub page](https://github.com/uhh-lt/kaldi-tuda-de).
This model is mostly not capable of transcribing English words.

### Whisper API

The Whisper API mostly includes punctuation and is case-sensitive.
However, it is generally not real-time capable.
Many models are available on [Hugging Face](https://huggingface.co/).

A good model for the German language is the
[GGML Q5_0 quantization of primelines whisper-large-v3-turbo-german](https://huggingface.co/F1sk/whisper-large-v3-turbo-german-ggml-q5_0).
This model is also capable of transcribing some English words.