Non-Commercial Digital Voice Projects and Technology Sites
The entries below emphasize non-commercial, research, community, and open-source efforts in speech recognition, speech synthesis, vocoders, corpora, and related digital voice tooling. URLs are displayed in full and remain clickable.
-
Mozilla. "Common Voice." Mozilla, n.d., https://commonvoice.mozilla.org/. Accessed 27 Dec. 2025.
Community-led platform for collecting and sharing open speech data to support speech recognition and related voice technologies.
-
OpenSLR. "Open Speech and Language Resources." OpenSLR, n.d., https://openslr.org/. Accessed 27 Dec. 2025.
Repository-style site hosting speech and language resources (training corpora, language models, and speech-related software), commonly used in ASR research workflows.
-
Panayotov, Vassil, with assistance from Daniel Povey. "LibriSpeech ASR Corpus." OpenSLR, n.d., https://www.openslr.org/12. Accessed 27 Dec. 2025.
Widely used, freely available English read-speech corpus (~1,000 hours) for training and evaluating automatic speech recognition systems.
-
CNRS (LACITO). "Pangloss Collection." Pangloss, n.d., https://pangloss.cnrs.fr/?lang=en. Accessed 27 Dec. 2025.
Open-access archive of linguistic audio documents (often aligned with transcriptions/translations), useful for speech and language research and preservation of rare or less-studied languages.
-
CMUSphinx Project. "CMUSphinx: Open Source Speech Recognition Toolkit." CMUSphinx, n.d., https://cmusphinx.github.io/. Accessed 27 Dec. 2025.
Classic open-source speech recognition suite (including PocketSphinx) with documentation, models, and tooling for building recognizers.
-
Povey, Daniel, et al. "Kaldi: Speech Recognition Toolkit." Kaldi, n.d., https://kaldi-asr.org/. Accessed 27 Dec. 2025.
Research-grade ASR toolkit with extensive recipes and documentation; widely used for building and benchmarking speech recognition pipelines.
-
Cambridge University Engineering Department. "HTK Speech Recognition Toolkit." HTK, n.d., https://htk.eng.cam.ac.uk/. Accessed 27 Dec. 2025.
Hidden Markov Model Toolkit (HTK) for building and manipulating HMMs, used in speech recognition research; includes model training and analysis utilities and supporting documentation.
-
Julius Speech Team. "Julius: Large Vocabulary Continuous Speech Recognition Engine." GitHub, n.d., https://github.com/julius-speech/julius. Accessed 27 Dec. 2025.
Open-source LVCSR decoder aimed at researchers and developers; commonly paired with community acoustic models and grammars for on-device recognition.
-
Centre for Speech Technology Research (CSTR), University of Edinburgh. "Festival Speech Synthesis System." University of Edinburgh, n.d., https://www.cstr.ed.ac.uk/projects/festival/. Accessed 27 Dec. 2025.
General framework for building speech synthesis systems, providing a full text-to-speech workbench with multiple APIs and multilingual support.
-
FestVox Project. "FestVox." FestVox, n.d., https://www.festvox.org/. Accessed 27 Dec. 2025.
Tools, scripts, and documentation for building new synthetic voices (often used alongside Festival and related research toolchains).
-
Language Technologies Institute, Carnegie Mellon University. "CMU Flite: Speech Synthesizer." CMU Flite, n.d., https://cmuflite.org/. Accessed 27 Dec. 2025.
Small-footprint, fast, open-source TTS engine designed for embedded systems and server deployments where Festival is too heavy.
-
OpenMary Project. "MARY TTS (OpenMary) Development Page." OpenDFKI, 11 Aug. 2018, https://mary.opendfki.de/. Accessed 27 Dec. 2025.
Open-source, multilingual Java-based text-to-speech platform; entry point to releases, demos, and project documentation.
-
Duddington, Jonathan. "eSpeak Text to Speech." eSpeak, n.d., https://espeak.sourceforge.net/. Accessed 27 Dec. 2025.
Compact open-source formant-synthesis TTS engine supporting many languages, frequently used where small size and wide language coverage are priorities.
-
Open JTalk Team. "Open JTalk." Open JTalk, 25 Dec. 2018, https://open-jtalk.sourceforge.net/. Accessed 27 Dec. 2025.
Japanese text-to-speech system released under a BSD-style license; useful reference implementation and toolchain for Japanese TTS.
-
Numediart Institute. "MBROLA: Diphone-Based Speech Synthesizer." GitHub, n.d., https://github.com/numediart/MBROLA. Accessed 27 Dec. 2025.
Diphone-concatenation synthesizer; often used as a back end with a separate text-to-phoneme front end (e.g., eSpeak NG) and voice databases.
-
McAuliffe, Michael, et al. "Montreal Forced Aligner Documentation." Read the Docs, n.d., https://montreal-forced-aligner.readthedocs.io/. Accessed 27 Dec. 2025.
Forced-alignment toolchain, built on Kaldi, for time-aligning transcripts with audio; commonly used for speech corpus preparation and phonetic analysis.
-
Boersma, Paul, and David Weenink. "Praat: Doing Phonetics by Computer." University of Amsterdam, n.d., https://www.fon.hum.uva.nl/praat/. Accessed 27 Dec. 2025.
Widely used phonetics tool for speech analysis (pitch, formants, spectrograms), manipulation, and synthesis; includes extensive documentation and scripting.
-
Morise, Masanori. "WORLD." Meiji University, n.d., https://www.isc.meiji.ac.jp/~mmorise/world/english/. Accessed 27 Dec. 2025.
High-quality speech vocoder (analysis/synthesis) used in speech processing and singing/voice synthesis workflows; provides parameter extraction and resynthesis.
-
SPTK Working Group. "Speech Signal Processing Toolkit (SPTK)." SPTK, 25 Dec. 2017, https://sp-tk.sourceforge.net/. Accessed 27 Dec. 2025.
Suite of command-line speech signal processing tools (e.g., LPC/PARCOR/LSP analysis and related utilities) for UNIX-style pipelines.
-
Centre for Speech Technology Research (CSTR), University of Edinburgh. "Merlin: The Neural Network Speech Synthesis System." CSTR, n.d., https://www.cstr.ed.ac.uk/projects/merlin/. Accessed 27 Dec. 2025.
Toolkit for building deep neural network models for statistical parametric speech synthesis; designed to integrate with front ends like Festival and vocoders such as WORLD.
-
Weinberger, Steven. "Speech Accent Archive." George Mason University, n.d., https://accent.gmu.edu/. Accessed 27 Dec. 2025.
Accent recordings collected and presented in a uniform format (with metadata and transcriptions), used for linguistics teaching, phonetic analysis, and speech technology evaluation.
-
VoxForge Community. "VoxForge." VoxForge, n.d., https://www.voxforge.org/. Accessed 27 Dec. 2025.
Community effort to collect transcribed speech and provide acoustic models for free/open-source speech recognition engines (e.g., Sphinx, Julius).