Introduction:
This article describes an application used to exercise
some of the Text To Speech features available to .NET developers through the
Microsoft Speech 5.1 SDK. This article does not address the newer speech server
related libraries nor does it address web based deployments of speech related
technologies.
The application performs several functions, although all
work in basically the same manner. It is intended to provide an
introduction to working with the TTS library by illustrating how to
gain access to and manipulate voices, and how to play text as synthesized
speech. The application provides examples of generating speech as you type,
passing canned phrases to TTS, and passing entire text files to TTS.
Getting Started:
In order to get started, unzip the included project and
open the solution in the Visual Studio 2005 environment. You will note that the
project contains a file cleverly named "Form1.vb". This form contains all of
the code necessary to get a start with programming TTS.
To begin, you may not have the necessary references on
your machine, as the application requires the installation of Microsoft's Speech
5.1 SDK and the Microsoft sample TTS engine library. Both may be downloaded
with the SDK at no cost from this URL:
Speech 5.1 SDK:
http://www.microsoft.com/downloads/details.aspx?FamilyId=5E86EC97-40A7-453F-B0EE-6583171B4530&displaylang=en
You may also obtain a couple of additional voices (the
SDK includes Microsoft Mary, Microsoft Mike, and Microsoft Sam) by downloading
Microsoft Reader and the additional TTS components found at the URL below. These
are not required, but you will gain two additional voices if you add them to your
system:
http://www.microsoft.com/reader/downloads/default.asp
You do not need to activate the reader for this to
work; however, you cannot install the additional voices unless the
reader is installed.
If you have any other voices on your system, they may
also be exposed to the application. For example, my Toshiba laptop has an
additional voice called "TOSHIBA male adult (U.S.)" and this voice also appears
as available to this application at runtime.
If you need to update the project references, do so
prior to attempting to run the application. Once you have installed the speech
SDK, go back to the project and run a build. If the build reports that the
speech references cannot be found, remove these (highlighted) references: (Figure 1)
Figure 1: Speech Related Project References
After removing the old references, right click on the
project and select "Add Reference". Once the dialog opens, select the COM tab
(then go get a cup of coffee while it takes forever to load) and when you get
back, look for and add these two references (figures 2 and 3) (Note: You really
don't need the second reference to the sample TTS engine):
Figure 2: Add the Microsoft Speech Object Library Reference
Figure 3: Add the Sample TTS Engine Type Library Reference
Having added these references, go ahead and do a build
to see if anything else is missing. If anything turns up, add the missing
project references in the same manner and build again. Once you have a good
build, go ahead and run the application. On start, you will see this form
appear:
Figure 4: The main form of the TTS Reader application
Looking at the form, note that it has five control
groups: "Configuration", "Speak As You Type", "Speak Specific Phrases", "Speak
On Enter", and "Load a Text File and Read It".
Configuration.
This control group contains two controls: the speaker
combo box and the speech rate track bar control. The speaker combo box is
populated with the names of each of the TTS speaker voices; you may change the
current speaker by selecting a different option from this combo box.
The rate track bar control will speed up or reduce the
cadence of the synthesized speech. It is set to contain five positions and
whenever its value is changed, the rate of speech will be altered to execute at
the newly set rate.
Speak As You Type.
This control group contains a single text box which has
been configured such that, whenever the user hits the space bar, the speaker
will read the contents of the text box and, once finished reading, it will clear
the text box. The intent here was to see if you could type as you go and speak
through TTS. It seemed like a nice idea and it seems like it would be
worthwhile for someone lacking the capacity for speech to use a function like
this to speak by typing. In reality, the action is a little choppy and the
speech rendered is not too terrific. With the application running, you may key
in a word and listen to the results for yourself. If you type slowly enough, it
is adequate, but it is not quite quick enough to use as a form of conversation.
Speak Specific Phrases.
This control group contains a single combo box;
whenever a new value is selected from the box, it will immediately be read by
the speaker.
Speak On Enter.
This appears to be a far more viable way to conduct a
conversation using TTS as a voice medium. This control works in a manner very
similar to the "Speak As You Type" option; however, it reads and clears the text
box only after the user hits the "enter" key. You may try typing in a sentence
and then hitting the enter key to get a feel for how that works.
Load a Text File and Read It.
This control group contains a single multi-line text
box control and three buttons: "Open File", "Stop", and "Read File". Click on
the "Open File" button and use the open file dialog box to navigate to any text
file. The text file will load into the text box and with a file loaded, you may
click on the "Read File" button to have the speaker read the contents of the
text box end to end. TTS does a fair job of this; however, I will point out that
the 5.1 SDK does not handle punctuation and abbreviations particularly well.
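One way to soften the abbreviation problem is to expand known abbreviations before the text is handed to the speaker. The following is only a sketch and is not part of the sample project: the `ExpandAbbreviations` helper and its small lookup table are hypothetical, and a plain string replacement like this will misread ambiguous cases (for example, "Dr." meaning "Drive").

```vb
Imports System
Imports System.Collections.Generic

Module AbbreviationSketch

    ' Hypothetical helper: replaces a few known abbreviations with the
    ' words they stand for. The table is illustrative, not exhaustive.
    Function ExpandAbbreviations(ByVal text As String) As String
        Dim expansions As New Dictionary(Of String, String)
        expansions.Add("Dr.", "Doctor")
        expansions.Add("approx.", "approximately")
        expansions.Add("e.g.", "for example")

        For Each pair As KeyValuePair(Of String, String) In expansions
            text = text.Replace(pair.Key, pair.Value)
        Next
        Return text
    End Function

    Sub Main()
        ' Prints: Doctor Smith waited approximately an hour.
        Console.WriteLine(ExpandAbbreviations("Dr. Smith waited approx. an hour."))
    End Sub

End Module
```

The expanded string could then be passed to the speaker in place of the raw text box contents.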
You may also key text into the text box and invoke the
"Read File" function to read the contents of the text box.
The Code.
The code is pretty straightforward and easy to
follow. The class definition begins as follows:
Imports SpeechLib
Imports System.Environment
Imports System.DateTime

Public Class Form1

#Region "Declarations"
    Public WithEvents vox As New SpVoice
    Public RateOfSpeech As Integer = 3
#End Region

    Private Sub Form1_Load(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles MyBase.Load
        ' Load the voices combo box
        Dim Token As ISpeechObjectToken
        For Each Token In vox.GetVoices
            cboVoxOptions.Items.Add(Token.GetDescription())
        Next
        cboVoxOptions.SelectedIndex = 0
        Dim str As String = Environment.UserName.ToString()
        SayGreeting(str)
    End Sub
As you can see, the imports section includes the speech library. A declaration region
was next defined and two variables were declared within that region. The first
creates an instance of an SpVoice; note that the declaration is made with
events. The other variable, RateOfSpeech, is used to keep track of the current
rate of speech selected using the rate of speech track bar control.
In form load, we begin by collecting all of the current voices and adding them to the
combo box used to select a speaker. The current index is set to zero such that,
when the form loads, a current speaker will be defined.
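Setting the selected index to zero assumes that at least one voice is installed. The console sketch below shows one way to list the voices and check for an empty list first. The `GetVoiceNames` helper is hypothetical and not part of the sample project, and the code assumes a project reference to the Microsoft Speech Object Library.

```vb
Imports System
Imports System.Collections.Generic
Imports SpeechLib

Module VoiceListSketch

    ' Hypothetical helper: returns the description of every installed voice.
    Function GetVoiceNames(ByVal voice As SpVoice) As List(Of String)
        Dim names As New List(Of String)
        Dim tokens As ISpeechObjectTokens = voice.GetVoices()
        For i As Integer = 0 To tokens.Count - 1
            names.Add(tokens.Item(i).GetDescription())
        Next
        Return names
    End Function

    Sub Main()
        Dim names As List(Of String) = GetVoiceNames(New SpVoice())
        If names.Count = 0 Then
            ' Nothing to select, so a combo box index of zero would fail here.
            Console.WriteLine("No TTS voices are installed.")
        Else
            For Each voiceName As String In names
                Console.WriteLine(voiceName)
            Next
        End If
    End Sub

End Module
```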
The last two lines of the form load subroutine are used to capture the user's name
(however it may be defined on the target machine) and to pass the name to the
SayGreeting subroutine, which presents a welcome message to the user through
TTS. The SayGreeting subroutine is written as follows:
Public Sub SayGreeting(ByVal strUser As String)
    ' Now say something
    vox.Voice = vox.GetVoices().Item(cboVoxOptions.SelectedIndex)
    Dim dt As DateTime
    dt = Now
    ' clear your throat
    vox.Rate = RateOfSpeech
    vox.Speak("".ToString, SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
    Try
        vox.Speak("Greetings " & strUser & " from Text To Speech", _
            SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
        vox.Speak("Today's Date is " & dt.ToShortDateString, _
            SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
        vox.Speak("The time is " & dt.ToShortTimeString, _
            SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
    Catch ex As Exception
        MsgBox(ex.ToString, MsgBoxStyle.Exclamation, "I'm Speechless")
    End Try
End Sub
As you can see, the subroutine formats a message
containing the passed in user name as well as the date and time and then reads
that message aloud using the current speaker voice. Note the use of the
SVSFPurgeBeforeSpeak flag: it discards any speech that is still pending before
the new text is spoken. Because the asynchronous flag is not set, each call to
Speak returns only after its text has been read, so the statements are spoken in order.
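The difference between the speak flags is easier to hear than to describe. The following console sketch is not part of the sample project; it assumes a reference to the Microsoft Speech Object Library and at least one installed voice, and the phrases are placeholders.

```vb
Imports System
Imports SpeechLib

Module SpeakFlagsSketch

    Sub Main()
        Dim voice As New SpVoice()

        ' Synchronous: Speak does not return until the phrase has been spoken.
        voice.Speak("First phrase.", SpeechVoiceSpeakFlags.SVSFDefault)

        ' Asynchronous: both calls return at once and the phrases are queued.
        voice.Speak("Second phrase.", SpeechVoiceSpeakFlags.SVSFlagsAsync)
        voice.Speak("Third phrase.", SpeechVoiceSpeakFlags.SVSFlagsAsync)

        ' Purge: drops whatever is still queued or being spoken,
        ' then speaks this phrase instead.
        voice.Speak("Fourth phrase.", _
            SpeechVoiceSpeakFlags.SVSFlagsAsync Or _
            SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)

        ' Give the asynchronous speech up to ten seconds to finish
        ' before the program exits.
        voice.WaitUntilDone(10000)
    End Sub

End Module
```

When run, the second and third phrases should be cut short or skipped, because the purge arrives before they have been spoken.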
Next up is the track bar control's handler, which is
written as follows:
Private Sub tbarRateOfSpeech_Scroll(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles tbarRateOfSpeech.Scroll
    Me.RateOfSpeech = tbarRateOfSpeech.Value
End Sub
This function merely sets the rate of speech variable
to contain the current track bar value. The variable is used to set the rate
property for the speaker whenever the speaker is passed text to read.
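The Rate property of an SpVoice accepts values from -10 (slowest) to 10 (fastest). If the track bar were ever reconfigured with a wider range, the value could be clamped before it is assigned. The sketch below is hypothetical and not part of the sample project; `ClampRate` is an illustrative name.

```vb
Imports System

Module RateSketch

    ' Limits of the SAPI voice rate.
    Const MinRate As Integer = -10
    Const MaxRate As Integer = 10

    ' Hypothetical helper: forces a requested rate into the supported range.
    Function ClampRate(ByVal requested As Integer) As Integer
        Return Math.Max(MinRate, Math.Min(MaxRate, requested))
    End Function

    Sub Main()
        Console.WriteLine(ClampRate(3))    ' Prints 3
        Console.WriteLine(ClampRate(25))   ' Prints 10
        Console.WriteLine(ClampRate(-40))  ' Prints -10
    End Sub

End Module
```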
Following the track bar control handler, you will see
the following code:
Private Sub TextBox1_KeyPress(ByVal sender As Object, _
        ByVal e As System.Windows.Forms.KeyPressEventArgs) Handles TextBox1.KeyPress
    vox.Rate = RateOfSpeech
    ' this will try to speak each word as you type, it does not keep up
    ' all that well
    If e.KeyChar = Microsoft.VisualBasic.ChrW(Keys.Space) Or _
            e.KeyChar = Microsoft.VisualBasic.ChrW(Keys.Enter) Then
        vox.Speak(TextBox1.Text, SpeechVoiceSpeakFlags.SVSFDefault)
        TextBox1.Text = ""
    End If
End Sub
This bit of code is used to drive the Speak As You Type
function. Here the rate of speech is set to the current rate of speech
variable's value and the text box is set to look for a space (or enter) key hit.
Whenever one is entered, the code will pass the contents of the text box to the
speaker, the speaker will read the text, and then the text box will be cleared
and made ready for the next word to be typed.
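Because the handler speaks synchronously, the form cannot accept further typing until the current word has been read, which is one likely contributor to the choppiness. A possible variation, sketched here as an assumption rather than as the project's approach, is to queue each word asynchronously. The `SpeakQueued` helper is hypothetical.

```vb
Imports System
Imports SpeechLib

Module SpeakAsYouTypeSketch

    ' Hypothetical helper: queues non-empty text without blocking the caller.
    Sub SpeakQueued(ByVal voice As SpVoice, ByVal text As String)
        If text Is Nothing OrElse text.Trim().Length = 0 Then Return
        voice.Speak(text.Trim(), SpeechVoiceSpeakFlags.SVSFlagsAsync)
    End Sub

    Sub Main()
        Dim voice As New SpVoice()
        Dim words() As String = {"typing", "one", "word", "at", "a", "time"}

        ' Each call returns immediately; the words are spoken in order.
        For Each word As String In words
            SpeakQueued(voice, word)
        Next

        ' Give the queued speech up to ten seconds to finish before exiting.
        voice.WaitUntilDone(10000)
    End Sub

End Module
```

In the form, the key press handler could call such a helper in place of the synchronous Speak call.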
The next bit of code will drive the Speak On Enter
function. The code is nearly identical to that used in the Speak As You Type function,
but rather than reading out the contents of the text box on space, the contents
will be read out only when the user hits the enter key. That code looks like
this:
Private Sub TextBox2_KeyPress(ByVal sender As Object, _
        ByVal e As System.Windows.Forms.KeyPressEventArgs) Handles TextBox2.KeyPress
    vox.Rate = RateOfSpeech
    ' this will try to speak the contents of the textbox on Enter
    If e.KeyChar = Microsoft.VisualBasic.ChrW(Keys.Enter) Then
        vox.Speak(TextBox2.Text, SpeechVoiceSpeakFlags.SVSFDefault)
        TextBox2.Text = ""
    End If
End Sub
The last pieces of code to look at manage the function
used to read from a text file. The first item displays an open file
dialog and reads the selected text file into the control group's text box. That code looks
like this:
Private Sub btnOpenFile_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles btnOpenFile.Click
    vox.Rate = RateOfSpeech
    If OpenFileDialog1.ShowDialog() = Windows.Forms.DialogResult.OK Then
        ' Using closes the reader even if the read fails part way through
        Using sr As New System.IO.StreamReader(OpenFileDialog1.FileName)
            Me.txtReadFile.Text = sr.ReadToEnd()
        End Using
    End If
End Sub
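The load step can also be written with File.ReadAllText, which opens, reads, and closes the file in a single call. The sketch below is a stand-alone illustration rather than the project's code. The `LoadText` helper is hypothetical, and returning an empty string on failure is just one possible policy.

```vb
Imports System
Imports System.IO

Module ReadFileSketch

    ' Hypothetical helper: returns the file's text, or an empty string
    ' if the file cannot be read.
    Function LoadText(ByVal filePath As String) As String
        Try
            Return File.ReadAllText(filePath)
        Catch ex As IOException
            Return String.Empty
        End Try
    End Function

    Sub Main()
        ' Write a small temporary file so the sketch runs on its own.
        Dim tempPath As String = Path.GetTempFileName()
        File.WriteAllText(tempPath, "Hello from a text file.")

        ' Prints: Hello from a text file.
        Console.WriteLine(LoadText(tempPath))

        File.Delete(tempPath)
    End Sub

End Module
```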
The next bit is used to read the file; it looks like
this:
Private Sub btnReadFile_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles btnReadFile.Click
    vox.Rate = RateOfSpeech
    vox.Speak(txtReadFile.Text.ToString(), _
        SpeechVoiceSpeakFlags.SVSFlagsAsync)
End Sub
You will note that the function is basically the same
as that used to read from one of the other form text boxes, except that the speak
flag is set to the asynchronous mode. The call therefore returns immediately and the
form stays responsive while the text is read. The next item to look at is used to stop
the speaker from continuing to read from the text; that code looks like this:
Private Sub btnStop_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles btnStop.Click
    vox.Speak("", SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak)
End Sub
This subroutine passes an empty string to the speaker
with the SVSFPurgeBeforeSpeak flag. The flag discards the speech still in
progress, and since the new text is empty, the speaker falls silent.
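Stopping discards the remaining text. If the goal is only to interrupt the reading for a while, SpVoice also offers Pause and Resume. The console sketch below is not part of the sample project; the one second delays are arbitrary, and it assumes a reference to the Microsoft Speech Object Library.

```vb
Imports System
Imports System.Threading
Imports SpeechLib

Module PauseResumeSketch

    Sub Main()
        Dim voice As New SpVoice()

        ' Start reading asynchronously so that this thread stays free.
        voice.Speak("This sentence is long enough to be interrupted " & _
            "part of the way through and then picked up again.", _
            SpeechVoiceSpeakFlags.SVSFlagsAsync)

        Thread.Sleep(1000)
        voice.Pause()      ' Speech halts but the text is kept

        Thread.Sleep(1000)
        voice.Resume()     ' Speech continues from where it paused

        ' Give the speech up to ten seconds to finish before exiting.
        voice.WaitUntilDone(10000)
    End Sub

End Module
```

In the form, a pair of hypothetical "Pause" and "Resume" buttons could call these two methods on vox directly.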
The last bit of code in the application is used to
change the speaker's voice to the one selected from the speaker combo box. That code
looks like this:
Private Sub cboVoxOptions_SelectedIndexChanged(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles cboVoxOptions.SelectedIndexChanged
    vox.Voice = vox.GetVoices().Item(cboVoxOptions.SelectedIndex)
End Sub
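The handler relies on the combo box order matching the order returned by GetVoices. A voice can also be chosen by its description, which is handy outside of a form. The sketch below is an illustration under that assumption. The `TrySelectVoice` helper is hypothetical, and "Microsoft Sam" is used only because it ships with the SDK.

```vb
Imports System
Imports SpeechLib

Module SelectVoiceSketch

    ' Hypothetical helper: switches to the voice whose description matches,
    ' returning False if no such voice is installed.
    Function TrySelectVoice(ByVal voice As SpVoice, _
            ByVal description As String) As Boolean
        Dim tokens As ISpeechObjectTokens = voice.GetVoices()
        For i As Integer = 0 To tokens.Count - 1
            If String.Equals(tokens.Item(i).GetDescription(), description, _
                    StringComparison.OrdinalIgnoreCase) Then
                voice.Voice = tokens.Item(i)
                Return True
            End If
        Next
        Return False
    End Function

    Sub Main()
        Dim voice As New SpVoice()
        If TrySelectVoice(voice, "Microsoft Sam") Then
            voice.Speak("Voice selected.", SpeechVoiceSpeakFlags.SVSFDefault)
        Else
            Console.WriteLine("That voice is not installed.")
        End If
    End Sub

End Module
```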
Summary.
This article and code sample were intended to provide a
very easy introduction to TTS based speech synthesis; there are a great many
more things that you can do with the speech SDK than have been addressed in this
document. A review of the contents of the speech SDK will provide greater
detail on the use of the speech libraries.