I've been doing tons of Googling and RSS-reading lately. It turns out that there is plenty of speech technology out there, much of it open source, but all of it is difficult to find on the web. I'm not sure why that is; I think voice has just been under the radar for a while.
In any case, here is a list of software and related sites for speech technology. I hope to play with each of these and give my impressions here. If you're familiar with any of them, please leave me a note!
Speech Recognition Engines
CMU Sphinx
* cmusphinx.sourceforge.net/html/cmusphinx.php
* sourceforge.net/projects/cmusphinx
* www.speech.cs.cmu.edu/
* Several versions, C & Java
* sphinx2, sphinx3, sphinx4, and PocketSphinx
* BSD license
Julius
* julius.sourceforge.jp/en/
* C/C++
* custom open source license
* mostly Japanese, but supports English
HTK 3
* htk.eng.cam.ac.uk/
* C/C++
* software & code are free for internal use, but distribution of any kind is prohibited
ISIP ASR
* Institute for Signal and Information Processing: Automatic Speech Recognition
* www.ece.msstate.edu/research/isip/projects/speech/index.html
* C/C++
* Public Domain
Snack Sound Toolkit
* www.speech.kth.se/snack/
* C/C++ plus Python & Tcl
* BSD license
* Snack for Ruby: rbsnack.sourceforge.net/
Open Mind Speech
* freespeech.sourceforge.net/
* Previously called "FreeSpeech"
* appears dead, last release 2002
Speech Synthesis
Festival
* www.cstr.ed.ac.uk/projects/festival/
* C/C++
* very popular on Linux, built into several apps like KDE's KSayIt and GNOME's gnome-speech
* BSD-ish license
Flite
* www.speech.cs.cmu.edu/flite/
* "Festival Lite"
* a lighter-weight version of Festival written by CMU
* written entirely in C rather than C++
* BSD license
FreeTTS
* freetts.sourceforge.net/docs/index.php
* Java port of Flite
* BSD license
Voice & Speech Corpora Sources
FestVox
* www.festvox.org
* from CMU; provides documentation and scripts for creating your own voices
VoxForge
* www.voxforge.org/
* GPL-licensed collection of voice recordings and their transcriptions (called "speech corpora")
* can be used with most of the speech recognition engines listed above (Sphinx, Julius, HTK 3, and ISIP; possibly Snack)
Cepstral, LLC
* www.cepstral.com/
* high-quality synthetic voices that are Festival compatible
* commercial, but much higher quality than most free voices
* great dynamic samples on their website
Comments (1)
I'll definitely be checking back on your blog, as I am currently looking at Sphinx 4 + VoxForge for a personal proof of concept.
Definitely interested in knowing how each compares. Will you compare the open source options to commercially available apps like Dragon NaturallySpeaking?
Posted by Sean Scott | July 11, 2007 9:57 AM
Posted on July 11, 2007 09:57