Overview of Open Source Speech Software

I've been doing tons of Googling and RSS-reading lately. It turns out that there is plenty of speech technology out there, much of it open source, but all of it is difficult to find on the web. I'm not sure why that is; I think voice has just been under the radar for a while.

In any case, here is a list of software and related sites for speech technology. I hope to play with each of these and give my impressions here. If you're familiar with any of them, please leave me a note!

Speech Recognition Engines

CMU Sphinx
* cmusphinx.sourceforge.net/html/cmusphinx.php
* sourceforge.net/projects/cmusphinx
* www.speech.cs.cmu.edu/
* Several versions, C & Java
* sphinx2, sphinx3, sphinx4, and PocketSphinx
* BSD license

Julius
* julius.sourceforge.jp/en/
* C/C++
* custom open source license
* mostly Japanese, but supports English

HTK 3
* htk.eng.cam.ac.uk/
* C/C++
* software & code are free for internal use, but distribution of any kind is prohibited

ISIP ASR
* Institute for Signal and Information Processing: Automatic Speech Recognition
* www.ece.msstate.edu/research/isip/projects/speech/index.html
* C/C++
* Public Domain

Snack Sound Toolkit
* www.speech.kth.se/snack/
* C/C++ plus Python & Tcl
* BSD license
* Snack for Ruby: rbsnack.sourceforge.net/

Open Mind Speech
* freespeech.sourceforge.net/
* Previously called "FreeSpeech"
* appears dead, last release 2002

Speech Synthesis

Festival
* www.cstr.ed.ac.uk/projects/festival/
* C/C++
* very popular on Linux, built into several apps like KDE's KSayIt and Gnome's gnome-speech
* BSD-ish license

Flite
* www.speech.cs.cmu.edu/flite/
* "Festival Lite"
* a lighter-weight version of Festival, written by CMU
* written entirely in C rather than C++
* BSD license

FreeTTS
* freetts.sourceforge.net/docs/index.php
* Java port of Flite
* BSD license

Voice & Speech Corpora Sources

FestVox
* www.festvox.org
* from CMU, they provide documentation and scripts to create your own voices

VoxForge
* www.voxforge.org/
* GPL-licensed collection of voice recordings and their transcriptions (called "speech corpora")
* can be used in most of the speech recognition engines listed above (Sphinx, Julius, HTK 3, and ISIP; possibly Snack)

Cepstral, LLC
* www.cepstral.com/
* high-quality synthetic voices that are Festival compatible
* commercial, but much higher quality than most free voices
* great dynamic samples on their website

Comments (1)

I'll definitely be checking back on your blog, as I am currently looking at Sphinx 4 + VoxForge for a personal proof of concept.

Definitely interested in knowing how each compares. Will you compare the open source options to commercially available apps like Dragon NaturallySpeaking?

About

This page contains a single entry from the blog posted on May 31, 2007 9:00 PM.
