The IT industry seems to work like London buses: you see nothing new in a
field for ages, then several developments appear in a short time. This time it
is speech recognition. Nuance announced the release of Dragon
NaturallySpeaking version 9, and Microsoft followed up a
disastrous demonstration of its own voice technology with an impressive
presentation that worked perfectly at the SpeechTEK show.
I have used earlier versions of both products and, when used carefully with a
good microphone, they can produce excellent results. I haven't had a chance to
try out Dragon 9 yet but people who have tried it say it is a big step forward.
The accuracy is better and, for the first time, it has a speaker-independent
mode. If this feature works well it will open up a huge range of applications
for speech recognition.
I must admit that I've always thought that talking to your computer is
something to be avoided or, at least, done in private.
Microsoft obviously disagrees as
its speech technology will be integrated into Windows Vista, and I expect that
there will be a good deal of publicity about it when Vista is launched. I hope
this doesn't lead to a problem potentially far worse than mobile phones on
trains.
It is surprising to me how much research into speech recognition is currently
going on; many big companies have teams working on it, as do a large number of
universities. If it's not for talking to your PC, what is it all for? The answer
may lie in networked applications.
The European Telecommunications Standards Institute (Etsi) has a project
called Aurora to
develop standards for distributed speech recognition in mobile networks. The
idea is that the front end of the recognition engine runs on the mobile device
and transmits a compact description of the speech, rather than the raw audio,
to a central recogniser. This sort of thing seems much more sensible to
me; it's more natural to talk into your mobile phone than your laptop.
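To make that split concrete, here is a minimal Python sketch of the device and server halves. It is only an illustration and not the Aurora front end: the single log-energy value per frame stands in for whatever features a real system would extract, and every name in it (extract_features, encode_payload, decode_payload, count_speech_frames), along with the frame size and threshold, is an assumption made for the example.

```python
import math
import struct

FRAME_SIZE = 80  # samples per frame (assumed: 10 ms at 8 kHz)


def extract_features(samples):
    """Device side: reduce each full frame of audio to one log-energy value."""
    features = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        energy = sum(s * s for s in frame) / FRAME_SIZE
        # Small offset avoids log(0) on silent frames.
        features.append(math.log(energy + 1e-10))
    return features


def encode_payload(features):
    """Device side: pack the features as little-endian 32-bit floats."""
    return struct.pack(f"<{len(features)}f", *features)


def decode_payload(payload):
    """Server side: unpack the features sent by the device."""
    count = len(payload) // 4
    return list(struct.unpack(f"<{count}f", payload))


def count_speech_frames(features, threshold=-5.0):
    """Server side: placeholder for the recogniser.

    It only counts frames whose log energy exceeds an assumed threshold.
    """
    return sum(1 for value in features if value > threshold)
```

The point of the design is that the device sends one number per frame instead of 80 samples, so the network carries far less data while the heavy lifting stays on the central server.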
Etsi says that one application is to dictate your impressions of a meeting
and then have them emailed to you so that you can do the editing when you get
back to your office or hotel room. That's quite neat. It's not too hard to think
of other neat services that could be integrated with various types of networked
servers.
As the popular SpinVox service has
shown, it is often much better to receive voicemail messages as text than as
speech. A good centralised speech recogniser could let companies run a similar
service on their networks, especially if speaker-independent systems really are
just around the corner. Microsoft's Office Communications Server 2007 should be
helpful as it will have integrated speech recognition.
Bill Gates says that speech recognition will enter the mainstream in the next
decade; he may be right, but it may become mainstream in networked applications
rather than just for interacting with your PC. In any case, corporates may now
wish to consider whether good speech recognition could improve the effectiveness
of their systems.
Do you agree?