Copyright © 2005-2016 MultiMedia Soft

How to integrate Microsoft's Speech API


The Speech Application Programming Interface, or SAPI, is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications.


A managed-code version of the API ships as part of the .NET Framework 3.0. The System.Speech.Synthesis namespace gives access to the SAPI synthesizer engine, which renders text into speech using a voice installed on the system. The System.Speech.Synthesis.SpeechSynthesizer.SetOutputToDefaultAudioDevice method sends speech directly to the system default sound card, but it offers no way to redirect the sound to a different sound card or to perform speaker assignment; likewise, there is no way to apply special effects or to display visual feedback for the sound stream being output.
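For reference, the default-device behaviour described above can be sketched as follows. This is a minimal illustration that uses only System.Speech calls; the class name and the spoken text are arbitrary, and the project is assumed to reference System.Speech.dll.

```csharp
using System;
using System.Speech.Synthesis; // assumes a project reference to System.Speech.dll

public static class DefaultDeviceSpeech
{
    public static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // List the voices installed on the system.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
                Console.WriteLine(voice.VoiceInfo.Name);

            // Route speech to the system default sound card and speak synchronously.
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("Hello from the default audio device.");
        }
    }
}
```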


The System.Speech.Synthesis.SpeechSynthesizer.SetOutputToAudioStream method sends speech to a generic System.IO.Stream object instead, and Audio DJ Studio can leverage this feature to take full control over the sound stream of the speech. The required steps are the following:


Create a queued stream through the StreamQueueCreate method
Put the queued stream in playback through the PlaySound method
Create an object of class System.Speech.AudioFormat.SpeechAudioFormatInfo having the same frequency, channels and bits per sample used for the StreamQueueCreate method
Create an object of class System.IO.MemoryStream
Call the System.Speech.Synthesis.SpeechSynthesizer.SetOutputToAudioStream method passing both the MemoryStream object and the SpeechAudioFormatInfo object
Start generating the audio stream through the System.Speech.Synthesis.SpeechSynthesizer.Speak method
Call the System.Speech.Synthesis.SpeechSynthesizer.SetOutputToNull method to release the synthesizer's reference to the MemoryStream object once synthesis has completed
Call the StreamQueuePushDataMs method passing the MemoryStream object
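
The System.Speech portion of the steps above (creating the format object and the MemoryStream, redirecting the output, speaking and releasing the stream) might look like the sketch below. The class and method names are hypothetical, and the 44100 Hz, 16-bit, stereo format is only an assumption: it must match whatever was passed to StreamQueueCreate. The Audio DJ Studio calls are left as comments because their parameters are not described in this section.

```csharp
using System;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis; // assumes a project reference to System.Speech.dll

public static class SpeechToMemory
{
    // Assumed value: use the same frequency given to StreamQueueCreate.
    const int Frequency = 44100;

    // Renders the given text into a MemoryStream of audio data.
    public static MemoryStream Synthesize(string text)
    {
        // Assumed 16-bit stereo: must also match the queued stream.
        var format = new SpeechAudioFormatInfo(
            Frequency, AudioBitsPerSample.Sixteen, AudioChannel.Stereo);
        var stream = new MemoryStream();

        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToAudioStream(stream, format);
            synth.Speak(text);        // blocks until synthesis has completed
            synth.SetOutputToNull();  // release the reference to the stream
        }

        stream.Position = 0; // rewind so the data can be read from the start
        return stream;
    }

    public static void Main()
    {
        // Before this point: StreamQueueCreate and PlaySound on the player.
        using (MemoryStream speech = Synthesize("Hello from Audio DJ Studio."))
        {
            Console.WriteLine("Synthesized {0} bytes of audio.", speech.Length);
            // After this point: pass the data to StreamQueuePushDataMs.
        }
    }
}
```

Speak is synchronous, so the stream is complete when the method returns; an asynchronous variant based on SpeakAsync would need to wait for the SpeakCompleted event before pushing the data.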


Because the StreamQueueCreate and StreamQueuePushDataMs methods act on a specific player instance, once the queued stream is in playback you can apply output redirection, speaker management, special effects and visual feedback exactly as you would for a sound file loaded through the LoadSound method.


An example of SAPI integration in Visual C# and Visual Basic.NET can be found in the following sample, installed by the product's setup package:

- TextToSpeechHelper (requires Visual Studio 2008 or higher)