telkeron.blogg.se - Mozilla speech to text


To generate a VTT file:

$ python3 autosub/main.py --file ~/movie.mp4 --vtt

The result is nearly identical to the VTT file downloaded from YouTube with youtube_dl.

Mozilla DeepSpeech is an amazing open-source speech-to-text engine with support for fine-tuning using custom datasets, external language models, exporting memory-mapped models and a lot more. You should definitely check it out for STT tasks.

When you first run the script, I use FFMPEG to extract the audio from the video and save it in audio/. By default, DeepSpeech is configured to accept 16 kHz audio samples for inference, so I make FFMPEG use a 16 kHz sampling rate while extracting. Then I use pyAudioAnalysis for silence removal: it takes the large audio file that was extracted and splits it wherever silent regions are encountered, resulting in smaller audio segments that are much easier to process.
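The extraction and splitting steps can be sketched roughly as follows. This is a minimal illustration, not AutoSub's actual code: the function names are hypothetical, the mono (`-ac 1`) and 16-bit PCM settings are assumptions on top of the 16 kHz rate mentioned above, and a plain energy threshold stands in for pyAudioAnalysis's silence removal, which works differently.

```python
import subprocess
from typing import List, Sequence, Tuple


def ffmpeg_extract_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFMPEG command that extracts the audio track as a WAV file."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it already exists
        "-i", video_path,          # input video
        "-vn",                     # ignore the video stream
        "-ac", "1",                # mono (assumption, not stated in the post)
        "-ar", str(sample_rate),   # resample to the rate the model expects
        "-acodec", "pcm_s16le",    # 16-bit PCM (assumption)
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run FFMPEG; requires the ffmpeg binary to be on PATH."""
    subprocess.run(ffmpeg_extract_cmd(video_path, wav_path), check=True)


def split_on_silence(
    samples: Sequence[int],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 500.0,
    min_silence_ms: int = 300,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans of non-silent audio.

    A frame counts as voiced when its mean absolute amplitude reaches
    `threshold`. A segment is closed once `min_silence_ms` of consecutive
    silent frames follow its last voiced frame.
    """
    frame_len = sample_rate * frame_ms // 1000
    min_gap = max(1, min_silence_ms // frame_ms)
    spans = []          # (first_frame, one_past_last_voiced_frame)
    start = None
    last_voiced = None
    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        voiced = sum(abs(s) for s in frame) / frame_len >= threshold
        if voiced:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= min_gap:
            spans.append((start, last_voiced + 1))
            start = None
    if start is not None:
        spans.append((start, last_voiced + 1))
    # Convert frame indices to seconds.
    return [(a * frame_ms / 1000, b * frame_ms / 1000) for a, b in spans]
```

Each returned span could then be cut out of the extracted audio and passed to the speech-to-text model as its own short segment.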

WebVTT output: the output VTT file includes cue points for individual words.
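To make the idea of per-word cue points concrete, here is a small sketch of how a WebVTT cue with inline word timestamps can be assembled. It follows the WebVTT cue timestamp syntax (`<HH:MM:SS.mmm>` inside the cue text); the helper names and the `(word, start, end)` input shape are hypothetical, and AutoSub's real output may be laid out differently.

```python
from typing import List, Sequence, Tuple

Word = Tuple[str, float, float]  # (text, start_sec, end_sec), hypothetical shape


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def vtt_cue(words: Sequence[Word]) -> str:
    """Build one cue; every word after the first carries its own start time."""
    start, end = words[0][1], words[-1][2]
    parts: List[str] = [words[0][0]]
    for text, word_start, _ in words[1:]:
        parts.append(f"<{vtt_timestamp(word_start)}> {text}")
    return f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n" + "".join(parts)


def build_vtt(cues: Sequence[Sequence[Word]]) -> str:
    """Join cues into a file body; WebVTT files start with a WEBVTT header line."""
    return "WEBVTT\n\n" + "\n\n".join(vtt_cue(c) for c in cues) + "\n"
```

For example, `vtt_cue([("hello", 0.0, 0.4), ("world", 0.5, 0.9)])` gives a cue running from `00:00:00.000` to `00:00:00.900` whose text marks `world` as starting at `00:00:00.500`.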

Make sure the model and scorer files are in the root directory.

After following the installation instructions, you can run autosub/main.py as given below. The --file argument is the video file for which the SRT file is to be generated.

$ python3 autosub/main.py --file ~/movie.mp4

After the script finishes, the SRT file is saved in output/.

Open the video file and add this SRT file as a subtitle, or just drag and drop it in VLC.
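For readers unfamiliar with the SRT format mentioned here, the sketch below shows how timed transcripts map onto SRT entries: a running index, a `start --> end` line with comma-separated milliseconds, the text, and a blank line between entries. The function names and the `(start, end, text)` input shape are hypothetical and only illustrate the file layout, not AutoSub's own writer.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, text), hypothetical shape


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_entry(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered subtitle entry."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


def build_srt(segments: Sequence[Segment]) -> str:
    """Number the segments from 1 and separate entries with a blank line."""
    return "\n".join(
        srt_entry(i, start, end, text)
        for i, (start, end, text) in enumerate(segments, start=1)
    )
```

Writing the returned string to a `.srt` file in output/ would yield something a video player can load as subtitles.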

Installation using Docker is pretty straightforward. The model build-arg configures which model and scorer versions to use. You can manually edit them to point to other model files easily.

$ docker build --build-arg model=0.9.3 -t ds-stt .

Make sure to use the container name when copying files to your local machine.


Clone the repo. All further steps should be performed while in the AutoSub/ directory.

$ git clone

Create a pip virtual environment to install the required packages.

$ python3 -m venv sub

Download the model and scorer files from the DeepSpeech repo. The scorer file is optional, but it greatly improves inference results.

# Model file (~190 MB)

Create two folders, audio/ and output/, to store the audio segments and the final SRT file.

$ mkdir audio output

If you're running Ubuntu, this should work fine.

If you would like the subtitles to be generated faster, you can use the GPU package instead. Make sure to install the appropriate CUDA version.
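Before a first run, it can help to confirm that the pieces from the installation steps are in place. The sketch below is a hypothetical pre-flight check, not part of AutoSub: it assumes the model file uses the `.pbmm` extension and the scorer the `.scorer` extension, as in the DeepSpeech release files, and it treats the scorer as optional, as described above.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_model_files(root: str) -> Tuple[Path, Optional[Path]]:
    """Locate a model (*.pbmm, assumed extension) and an optional scorer (*.scorer)."""
    root_path = Path(root)
    models = sorted(root_path.glob("*.pbmm"))
    if not models:
        raise FileNotFoundError(f"no .pbmm model file found in {root_path}")
    scorers = sorted(root_path.glob("*.scorer"))
    # The scorer is optional, so return None when it is missing.
    return models[0], (scorers[0] if scorers else None)


def ensure_work_dirs(root: str, names: Tuple[str, ...] = ("audio", "output")) -> None:
    """Create the audio/ and output/ folders if they do not exist yet."""
    for name in names:
        (Path(root) / name).mkdir(exist_ok=True)
```

Calling `find_model_files(".")` from the AutoSub/ directory would raise an error if the model download step was skipped, and return `None` for the scorer if only the model is present.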

AutoSub is a CLI application that generates a subtitle file (.srt) for any video file using Mozilla DeepSpeech. I use the DeepSpeech Python API to run inference on audio segments and pyAudioAnalysis to split the initial audio on silent segments, producing multiple small files.

⭐ Featured in DeepSpeech Examples by Mozilla

Motivation

In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook or even torrents rather than stream. I am one of them, and on one such occasion I couldn't find the subtitle file for a particular movie I had downloaded. Then the idea for AutoSub struck me, and since I had worked with DeepSpeech previously, I decided to use it.