

Run the script on a video file. The -vtt flag produces a VTT file:

$ python3 autosub/main.py -file ~/movie.mp4 -vtt

The result is nearly identical to the VTT file downloaded from YouTube with youtube_dl.

Mozilla DeepSpeech is an amazing open-source speech-to-text engine with support for fine-tuning on custom datasets, external language models, exporting memory-mapped models, and a lot more. You should definitely check it out for STT tasks.

When you first run the script, I use FFMPEG to extract the audio from the video and save it in audio/. By default, DeepSpeech is configured to accept 16 kHz audio samples for inference, so I make FFMPEG use a 16 kHz sampling rate during extraction.

Then, I use pyAudioAnalysis for silence removal: it takes the large audio file initially extracted and splits it wherever silent regions are encountered, producing smaller audio segments that are much easier to process.
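The extraction and splitting steps can be sketched roughly as follows. This is a minimal illustration, not AutoSub's actual code: the function names are hypothetical, mono output (`-ac 1`) is an assumption, and the splitter is a simple energy-threshold stand-in rather than pyAudioAnalysis's own silence-removal method.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Return an ffmpeg argument list that extracts audio at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one channel (assumption, not from the README)
        "-ar", str(sample_rate),  # resample to 16 kHz for DeepSpeech
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if extraction fails."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)


def split_on_silence(samples, sample_rate, window_s=0.05,
                     threshold=1e-4, min_silence_s=0.3):
    """Return (start_s, end_s) tuples for the non-silent regions.

    A window counts as voiced when its mean squared amplitude exceeds
    `threshold`. A segment is closed once at least `min_silence_s`
    seconds of silence follow it.
    """
    win = max(1, round(window_s * sample_rate))
    min_silence = round(min_silence_s * sample_rate)
    segments = []
    start = None   # first sample of the segment being built
    last_end = 0   # sample index just past the last voiced window
    for i in range(0, len(samples), win):
        chunk = samples[i:i + win]
        energy = sum(s * s for s in chunk) / len(chunk)
        end = i + len(chunk)
        if energy > threshold:
            if start is None:
                start = i
            last_end = end
        elif start is not None and end - last_end >= min_silence:
            segments.append((start / sample_rate, last_end / sample_rate))
            start = None
    if start is not None:
        segments.append((start / sample_rate, last_end / sample_rate))
    return segments
```

Each returned `(start, end)` pair could then be cut out of the extracted WAV file and passed to the speech-to-text model as its own segment. The threshold and window values here are arbitrary and would need tuning on real audio.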
Clone the repository with git clone. All further steps should be performed while in the AutoSub/ directory.

Create a pip virtual environment to install the required packages:

$ python3 -m venv sub

Download the model and scorer files from the DeepSpeech repo. The model file is about 190 MB. The scorer file is optional, but it greatly improves inference results.

Create two folders, audio/ and output/, to store the audio segments and the final SRT file:

$ mkdir audio output

FFMPEG must also be installed. If you're running Ubuntu, installing it should work fine.

If you would like the subtitles to be generated faster, you can use the GPU package instead. Make sure to install the appropriate CUDA version.
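Before the first run, it may help to verify the setup. The sketch below is a hypothetical helper, not part of AutoSub: it creates the two working folders, checks whether an `ffmpeg` executable is on the PATH, and lists any model and scorer files it finds. The `.pbmm` and `.scorer` extensions are an assumption based on how DeepSpeech release files are usually named.

```python
import glob
import os
import shutil


def check_setup(root=".", required_dirs=("audio", "output")):
    """Create the working folders and report what was found under root."""
    for name in required_dirs:
        # exist_ok avoids an error when the folder is already there
        os.makedirs(os.path.join(root, name), exist_ok=True)
    return {
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
        # Assumed file extensions for the DeepSpeech model and scorer
        "model_files": sorted(glob.glob(os.path.join(root, "*.pbmm"))),
        "scorer_files": sorted(glob.glob(os.path.join(root, "*.scorer"))),
    }


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

An empty `scorer_files` list is not an error, since the scorer is optional.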

AutoSub is a CLI application that generates a subtitle file (.srt) for any video file using Mozilla DeepSpeech. I use the DeepSpeech Python API to run inference on audio segments and pyAudioAnalysis to split the initial audio on silent segments, producing multiple small files.

⭐ Featured in DeepSpeech Examples by Mozilla

Motivation

In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook or even torrents rather than stream. I am one of them, and on one such occasion I couldn't find the subtitle file for a particular movie I had downloaded. Then the idea for AutoSub struck me, and since I had worked with DeepSpeech previously, I decided to use it.
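Since the end product is an .srt file, here is a small sketch of how per-segment transcripts could be turned into SRT entries. The helper names are hypothetical and this is not necessarily how AutoSub writes its output. It only relies on the standard SRT layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line.

```python
def format_srt_time(seconds):
    """Convert a time in seconds to the SRT timestamp form HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_entry(index, start_s, end_s, text):
    """Build one SRT block: index, timing line, text, blank separator."""
    timing = f"{format_srt_time(start_s)} --> {format_srt_time(end_s)}"
    return f"{index}\n{timing}\n{text}\n\n"


def build_srt(segments):
    """Join (start_s, end_s, text) tuples into a full SRT document."""
    return "".join(
        srt_entry(i, start, end, text)
        for i, (start, end, text) in enumerate(segments, start=1)
    )
```

Here each tuple would pair a segment's start and end time from the silence split with the transcript returned by the speech-to-text model for that segment.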
