SPEAKING TO WRITE / WORD FOR WORD: An overview of
Speech Recognition
by Bob Follansbee, Ed. D.
(Reprinted with permission from the International
Dyslexia Association quarterly newsletter, Perspectives, Fall
2003, Vol. 29, No. 4. It's worth joining IDA just to get Perspectives
- each issue has many articles like this. Their website is
http://www.interdys.org.)
Few technological innovations have prompted as much promise
for students with learning disabilities (LD), and as many
dashed expectations, as speech recognition. You may have heard
that speech recognition is the "magic bullet" for
some students, but you may also have heard stories about how
speech recognition simply "doesn't work." What do
we really know about speech recognition for students with
LD and how it works or doesn't?
The current situation with speech recognition.
Generally, when people today speak of "speech recognition"
they are referring to continuous speech recognition, so named
because, based on its claims, one can speak to the computer
in a natural voice and at a normal rate of speech, and it
will simply write what was said. This is a fine example of
marketing hyperbole that glosses over important aspects of
using the software. We will discuss below how such results
might be achieved for some students with LD.
The primary continuous speech products now available are
various editions of Dragon NaturallySpeaking (now owned
by ScanSoft) and IBM ViaVoice. Dragon products operate
in Windows only, while ViaVoice offers versions for both Windows
and the Macintosh. There is also a newer Macintosh-only product,
iListen, by MacSpeech. (An older technology, discrete speech
recognition, represented by DragonDictate, is less readily
available now but may still be preferable for some individuals
- see below).
Product information can be obtained at the following
sites:
For Dragon NaturallySpeaking: http://www.scansoft.com/naturallyspeaking/
For IBM ViaVoice®: http://www-3.ibm.com/software/speech
For MacSpeech iListen®: http://www.macspeech.com/products/iListen.html
Does speech recognition work? The promise and the reality.
This question does not have a simple, one-size-fits-all
answer. However, with the proper introduction to and training
on the technology, and with reasonable expectations, speech
recognition software can operate with remarkable ease and
accuracy, and can be a tremendous boon to some students who
might otherwise never have a successful writing experience.
It can be an avenue for some students to begin to participate
more effectively and more independently in grade-appropriate
work. Students who use speech recognition do not automatically
become better writers, but they are almost always able to
produce more work, more easily, which then allows them to
engage in writing instruction and composition at a more advanced
level. Instead of simple editing decisions like recopying
a few sentences to make them more legible, or correcting spelling
and grammar mistakes, students can focus on the more complex
cognitive tasks that come with producing more text - tasks
like revision and reorganization. Students can begin to write
at a level that matches their grade level and, in fact, the
level at which they could actually dictate text.
On the other hand, failure to understand the training and
usage requirements of speech recognition products often leads
to unsuccessful experiences with this technology. Additional
unsuccessful experiences are not what you want for your students
with LD.
The first speech recognition usage issue relates to
the potential user. Speech recognition is not the appropriate
tool for every individual. Generally, successful users
of speech recognition exhibit at least some of the following
characteristics:
-
Ability to compose orally. This does not have to be well-developed,
but at least show potential.
-
Ability to reflect on and modify their own speech patterns
and articulation of words. This requires a level of linguistic
self-monitoring and control over the muscles of articulation.
-
Patience and perseverance needed to complete the
initial stages of using the software.
-
Understanding of the purposes of literacy. Even students
with LD who shun participation in literacy-based activities
may understand WHY literacy is important.
These points assume a basic level of cognitive ability and
development, and most successful users are at least 10 years
old. However, these points are not rules, but guidelines
for consideration. With the appropriate training
and support, students of varying ages and abilities can use
speech recognition in some manner.
Evaluating the success of speech recognition is not necessarily
a straightforward or objective matter, but should be based
on how well the software addresses the individual's source(s)
of writing difficulty. For example, a logical "objective"
measure of success is the error rate in the software's recognition
of a student's speech. However, some struggling and frustrated
students may feel very successful, perhaps for the first time,
simply being able to produce a much greater amount of text,
regardless of the software's recognition accuracy. Corrections
to errors in software accuracy can be treated as a different
form of editing, which is part of any writer's task.
We know from considerable experience through our projects,
and from the input of many participants in our listserv, that
speech recognition is being successfully implemented with
students around the country. Many successful users are students
with LD who are using the software at home to complete homework
and longer written assignments - the school may not even be
aware that a student is using speech recognition. Some successful
users are individuals who begin using speech recognition in
school through advocacy and under the requirements of an IEP;
situations like this can often quickly result in other students
using speech recognition also, with students helping to train
each other.
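For readers who want to track the "objective" measure mentioned earlier, the recognition error rate, the following is a minimal sketch rather than a feature of any product named in this article. It assumes you have typed up what the student actually said (the reference) and saved what the software wrote (the recognized text). The function name word_error_rate is hypothetical, and the calculation shown (word-level edit distance divided by the number of words spoken) is one common way to define an error rate, not necessarily the one a given vendor reports.

```python
def word_error_rate(reference: str, recognized: str) -> float:
    """Fraction of spoken words the software got wrong.

    Counts the fewest word substitutions, insertions, and deletions
    needed to turn the recognized text into the reference text, then
    divides by the number of words in the reference.
    """
    ref = reference.lower().split()
    hyp = recognized.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j recognized words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a spoken word was dropped
                curr[j - 1] + 1,     # an extra word was written
                prev[j - 1] + cost,  # a word was matched or substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "I have to study tonight"
    written = "I hafta study tonight"
    print(word_error_rate(said, written))  # prints 0.4
```

A number like this is best read alongside the student's own sense of success: as noted above, a student who is finally producing much more text may be doing well even when the measured error rate is still high.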
What is needed?
Successful implementation of speech recognition does not
happen by accident. Schools must be committed to providing
basic resources, staff must be committed to working with the
student and the technology, and students must be committed
to the process of improving their performance both in the
mastery of the software and in their writing skills.
First, it is desirable that speech recognition be considered
as only part of a continuum of assistive technology strategies.
Speech recognition is relatively intensive in terms of the
training required for successful use, and it is also somewhat
limited in terms of the environments where it can be used
successfully. Other strategies should be considered for any
given student. On the other hand, it is a very powerful solution
in some cases and may be the most effective (or only) way
to "reclaim" certain students. Finally, there is
no reason why some students might not use multiple text creation
strategies, including speech recognition, depending on the
demands of the assignment or the situation.
Introduction and training.
Learning to use speech recognition effectively requires
four things.
- First, the new user must train the software to recognize
his or her voice through an initial enrollment process that
can be demanding for poor readers: the enrollee must read
text that is presented on the screen. However, the enrollment
process allows pausing so that the text can be previewed
with the student screen by screen. While this process used
to take 30 minutes of continual reading in earlier software
versions, it has now been cut down to about 10 minutes.
Then the user must learn to:
- Speak so that the software can understand what is said.
This IS NOT the same as speaking in conversation. While
the software does adapt to the user's particular voice,
for most users each word must be enunciated relatively clearly
(e.g., "I have to study..." rather than "I
hafta study..."). Users are encouraged to speak in
multiple word utterances, but this can be varied from whole
sentences to phrases to even single words, depending on
what works best for the student's voice and style. This
process will require some monitoring by a knowledgeable
support person and time spent practicing, much as it takes
practice to learn to type. We recommend using simple writing
tasks with low cognitive demands during this phase - don't
begin with a major assignment.
- Make corrections and otherwise operate the software.
Learning to make corrections through the software is especially
important so that the software learns the user's voice better.
This is the step that is often left too much to chance.
The first few hours of dictation should be carefully
monitored so that errors in recognition are corrected through
the software and the student understands where the software
is having difficulty understanding his or her voice.
Other aspects of operating the software, such as using voice
commands (e.g., "Save File") may be important
or motivating to some students, but voice commands may be
mis-recognized, thereby serving as additional sources of
frustration. For students with LD, we recommend use of the
mouse and keyboard for such actions whenever possible.
- Compose through a new medium. Composing via speech is
different from doing so through a pencil or the keyboard.
This requires time to accommodate. In the meantime, other
scaffolding strategies to support writing, such as pre-writing
activities and editing, may still be appropriate.
The most common complaint encountered in speech recognition
implementation is that the software "just doesn't work"
when the student talks to it. Digging below the surface of
such complaints often demonstrates that the student and adult
supporter have not really understood some of the basic requirements
for training the software and have expectations of "natural
speech" that exceed the capacities of the technology.
Users must remember that the computer does not have any real
comprehension of language. When the user is enrolling to a
"voice file," the software is matching that user's
voice to the models it already has. It is essential that the
user learn both how to speak to the software to maximize its
understanding (the second requirement above), and also how to make corrections
the proper way to provide the software voice file with useful
information about its recognition errors (the third requirement). Proper training
and patience are often the solution.
For successful implementation of speech recognition, students,
staff, and schools have specific responsibilities.
Schools must provide:
-
A support staff member to provide training and support
in use of the technology. This person might be a special
education teacher, speech pathologist, technology or inclusion
specialist, or even the English teacher.
-
Opportunities for collaboration between the speech recognition
support staff member and the teacher(s) who implement
or support the student's writing requirements. In some
cases this MIGHT be the same person, but will often involve
two or more different teachers.
-
Training in implementation for an appropriate staff
member. This applies not only to actual workshop time,
but also supported practice time for the staff member.
-
Consulting support for staff and students as needed during
implementation with students.
-
Adequate hardware and technical support for hardware
problems, software installation, etc.
-
Space for use of speech recognition. This technology
does not require absolute silence, and can be used with
considerable background noise if set up properly. However,
some environments are very difficult to accommodate. A
typically problematic space is the kind often encountered
in older school buildings: high ceilings with hard surfaces
(tile, plaster, etc.) everywhere and no acoustic absorption.
Finding smaller spaces or area adjustments (e.g., a carpeted
corner, use of a carrel, etc.) can help with this. Sometimes
this space may be found within the classroom when it is
not excessively noisy (e.g., one would not try to dictate
during an "inside recess" period). Most successful
users in schools have at least one alternative location
identified for dictating in any given period, and the
user must be willing to use that alternative location
when it seems appropriate. See also the next two points.
-
Space consideration for speech recognition is also important
because the act of composing is often a private matter
and some students may feel awkward "writing out loud"
in front of others. On the other hand, the speech recognition
user may react negatively to being removed from the regular
classroom, so students' perceptions of these issues should
be understood.
-
Location of an appropriate space for use of speech recognition
also requires sensitivity to the needs of other students
in their classes. The use of speech recognition by one
or more students might be disruptive to other students,
so this matter must be considered.
-
Time for staff to work with the student during initial
stages of speech recognition use. Students need the most
support when they are first using the software, and staff
should have some leeway to provide this.
-
Academic (substitute) credit for students who learn to
use speech recognition. Rather than adding an extra requirement
for the already over-burdened student with LD, learning
to use speech recognition might count as part of a class
in computer literacy or be integrated into requirements
of an English writing class.
Support staff should:
-
Be willing to learn how to support the students using
speech recognition. Adequate support requires learning
the strategies that successful speech recognition users
must know, which is further helped by learning how to
use the software themselves.
-
Provide a gradual "ramping up" of work requirements
for students using speech recognition. THIS IS
VERY IMPORTANT! Once students gain fluency in
using speech recognition they are often faced with a new
phenomenon - the requirement to complete the same work
as their peers. We have seen situations where students
responded to this realization by rejecting the technology
or rebelling in other ways. We believe it is critical
to increase work demands gradually to allow students who
previously were unable to write effectively a chance to
accept their "new" writing abilities and acclimate
to these new responsibilities.
-
Be committed to providing some "make up" instruction.
Typically, by the time of middle school, students who
have perennially struggled with writing have missed a
lot of important instruction in writing basics. Immediate
overemphasis on deficient mechanics can be discouraging.
Teachers should first value and support the increase in
amount produced and work on higher-level organization
(thinking) issues while helping students come to a gradual
appreciation of the importance of writing mechanics.
-
Be willing to try speech recognition with some students
who might be unlikely to ever use the technology completely
independently, but who might use the software with some
level of support to produce text that they otherwise could
not.
Individual students (and their parents) must:
-
Acknowledge that this technology is not necessarily
the correct solution for all students, including themselves.
-
Acknowledge that mastery of the software requires effort
and some flexibility in ways of working.
-
Acknowledge that mastery of the software will entail
an increase (hopefully gradual) in workload to reflect
the level of work expected for grade level (or depending
on other identified disabilities), and express a willingness
to participate on that basis.
From the perspective of costs, speech recognition software
is relatively inexpensive. Moreover, a single piece of speech
recognition software can be used by more than one student,
limited only by amount of disk space for voice files and available
time to use a single computer. Speech recognition does require
relatively newer, more powerful computers, so there may be
an initial hardware expense. The greatest costs in use of
speech recognition are those involved in initial training
of support staff who will be teaching students how to use
the software effectively, and subsequent time for actual student
training. However, we are seeing that training costs per student
are dropping as each staff member begins working with more
than one student.
Other considerations.
Another type of speech recognition was mentioned above. Discrete
speech recognition requires that the user speak one word at a time.
We have found that some students prefer this slower pace of
dictation and composition. The last product to provide this,
DragonDictate, is now very hard to find. However, the current
versions of continuous speech recognition products accommodate
slower-paced dictation to some extent.
Our recommendations (see endnote 2).
There are important differences for individuals, and especially for
children and students with disabilities, in the operation of
the various continuous speech products, so knowledge of these
can be important. A few pointers are listed below.
All other things being equal, Dragon NaturallySpeaking
Preferred is our recommended system for students
with LD in schools. The Preferred edition costs between $150 and $200
and is recommended over less expensive versions of NaturallySpeaking
because it includes several features that can be of particular
benefit: digital playback of the user's voice (what the user
actually said) and synthesized speech readback of the text
(what the software actually put on the page). Students with
LD can use these features to check for errors. All versions
of NaturallySpeaking also include other features that can
be important for students: dynamic updating of the words in
the correction window, easy management of the correction window,
and presence of an arrow to signal location (like a bouncing
ball) during digital readback. NaturallySpeaking versions
4 and above accommodate adolescent voices.
According to most reviews and the author's personal experience,
IBM ViaVoice is equal to the Dragon
products in accuracy for most adult and adolescent users.
ViaVoice also includes synthesized speech readback of the
text to see what the software actually put on the page, but
has more limited digital readback of the speaker's voice.
On the whole, we believe that the interface is somewhat less
effective than that of the Dragon products for some students
with LD, as management of the correction window and cursor
location seem more intrusive. These characteristics might
not be a problem for many users. Certainly, if students need
a Macintosh product, ViaVoice is a fine alternative.
MacSpeech iListen is a Macintosh-only alternative. The author has only used an early release
version that seemed promising, but was not a complete product.
Several members of the Speak to Write listserv (www.edc.org/spk2wrt/hyper-mail)
report that the product performed well with middle-school
and older students, and compared it favorably to ViaVoice
for the Macintosh. It is not clear whether the most recent
version has text-to-speech readback.
Two products from the UK bear mention. One is Screen
Speaker (Keystone), a utility that provides much-needed
text-to-speech support within the correction window
and adaptations to the enrollment process for the Dragon speech
recognition products. Another product is Read &
Write Gold (textHELP!), a utility that provides general
literacy support in many applications and which has a speech
recognition module.
Final thoughts and words of encouragement.
As with any promising new educational strategy, the use of
speech recognition has proceeded in fits and starts. Yet,
we have seen firsthand the positive outcomes of use of speech
recognition software on the lives of individual students with
LD. Now, there are increasing numbers of students across
the country who are using speech recognition successfully
to meet some or all of their writing needs. As more schools
move forward with this and other technologies in a more systematic
way, we better understand how to assure successful outcomes
for students and also how to manage the costs that inevitably
accompany such initiatives.
Endnotes
1 This article is based on work done under two U.S. Dept.
of Education/National Institute on Disability and Rehabilitation
Research (NIDRR) funded projects awarded to the Education
Development Center, Newton, MA: Speaking to Write
(# H133G70143) and Word for Word
(# H133G000204). Some of the information here is elaborated
in the Speaking to Write website (www.edc.org/spk2wrt), although
that project is now inactive. However, the Speaking to Write
project operates a listserv that continues to be an active
source of information about speech recognition from many people
who are supporting its use in schools and elsewhere
(www.edc.org/spk2wrt/spk2wrt.html).
2 Based on the following versions of the software: Dragon
NaturallySpeaking v.6; IBM ViaVoice v.9. Newer versions of
both products now exist.
Bob Follansbee, Ed.D., works at the Education Development Center
(EDC) in Newton, MA where he has directed several federally-funded
projects that involve development of educational tools and
strategies to help students with disabilities participate
and succeed in the regular education curriculum. Among these
are two projects focused on the use of speech recognition
by students. He oversees the operation of the Speak to Write
listserv (www.edc.org/spk2wrt/spk2wrt.html). Before coming
to EDC, he was director of the Computer Learning Program, an
assistive technology service of the Communication Enhancement
Center at Children's Hospital, Boston.