Recognizing Text and Speech

By William LaMartin, Editor, Tampa PC Users Group


Several months back CompUSA ran a special on a bundle of two software items--TextBridge Pro 98 and ViaVoice Office 98. It was $49 after the in-store $50 rebate. But even at $100 it would still have been a bargain based on recent prices of both programs. The implied goal here was to set the computer user up so that he would have little typing to do in the future. He could either scan in old paper documents and have TextBridge Pro turn them into textual documents, all the while preserving the layout and graphics, or he could simply sit in his chair with his hands folded, dictate into the supplied headset microphone, and have ViaVoice turn his speech into a textual document in the computer.

One of these scenarios worked superbly and one didn't. Can you guess which one fell short? It was the voice recognition. Up front I must admit that the computer I tried ViaVoice on, a Pentium 166 with 96 MB of memory, fell one notch short of the required configuration--it additionally needed to be an MMX processor. But I think the problem was more than that. The first time I tried out the program, letting it train itself while I read a short Mark Twain story, things seemed to go OK. During the setup for training, I got to where I could speak the required continuous speech in a manner that needed few retries to get the training phrases understood. And after the training was over, the program recognized my speech in a so-so manner. The main problem was that the recognition was quite slow. But I will attribute that to my slow processor.

So I closed out the program and set it aside for a couple of weeks with the thought that when I next tried the program it would undoubtedly do better, since it would have learned my speech patterns better. That was not the case. On the second try, using the same profile I had created before, it recognized almost nothing of what I said. So I decided to train it again. This time I read a passage from Treasure Island. And after training, things were better--but nowhere near as good as my first try with the program. In frustration I decided to shelve the program until I had a faster desktop computer. I suppose I could try it on my faster laptop, but I think I will wait until I have one of those 400 MHz or 500 MHz machines. That way I will be sure it is the program and not the computer that has shortcomings.

In general, I think the voice recognition programs have a long way to go. I previously tried some Microsoft developmental software for discrete speech recognition and wrote about it in the October 1997 newsletter (see http://www.tpcug.org/reviews/microsoft_dictation.html). I got it to work a bit better than I did IBM's ViaVoice. I know others who have tried other programs with poor results. In fact, one of our members has a speech recognition program right now from another of the big name producers of such programs for which he is supposed to write a review. Unfortunately he cannot get it to work satisfactorily and has not yet produced a review. So, if there are any members out there who have had success with a speech recognition program, let us hear about it.


Text recognition, on the other hand, has come of age. I am simply amazed at how well TextBridge does at converting the printed page to a computer document. For years I have been using an earlier version of TextBridge Pro and was happy with the results. TextBridge Pro 98, however, leaves that program in the dust in both speed and accuracy. As one of my first projects with the new program, I converted to a computer document a small book that a friend of mine had written 22 years ago and that was now out of print. The average time it took to scan a page (you need a scanner before you can use such a program), perform optical character recognition (OCR) on it, and then present any questionable words to me was less than one minute. Additionally, there were usually no more than a couple of questionable words on a page. In fact, the program had usually guessed correctly and I only had to click OK. Then on to the next page.

I used the feature that allows you to open TextBridge directly from Microsoft Word (a similar feature is available for WordPerfect and Microsoft Excel). After I had done about a dozen pages, I would close TextBridge and it would convert those 12 pages to a Word document. Then I would open TextBridge again in Word and do another dozen pages. I was afraid of doing too much at one time and possibly losing it due to some computer glitch. If you choose not to work with Word, you can save your scanned text as an RTF file. I do miss one feature of the old TextBridge that allowed you to also save your work to the clipboard (in RTF format, I assume).

Once I had the entire book in my computer as a Word document, I could of course print it out as a normal printed document. But this is the age of the Internet, so I saved it in HTML format and brought it into Microsoft FrontPage. Separately (this has nothing to do with TextBridge), I scanned in most of the photos used in the book, which the author had sent to me, brought them into FrontPage, inserted them in the web document, and published everything to my web site, http://www.lamartin.com. To view the results, go there and click on the link History of Okeechobee County Florida. By the way, preparing the photos for the web took a lot more time than scanning the text.

Optical Character Recognition, in my opinion, is one of the great success stories of computing. Voice recognition, in time, will probably be a success too.
