Don’t Use Text To Speech Software For eLearning

Summary: This article makes the case that using Text To Speech software for eLearning, while allowing transferal of information, eliminates the possibility of learning. In all cases, voice talent professionally trained to deliver eLearning should be employed.

Why You Shouldn’t Use Text To Speech Software For eLearning

Many eLearning providers have decided to use Text To Speech (TTS) software to voice their courses. With the advances in Text To Speech over the last few years, the result is less robotic-sounding than before, and people are getting used to computer-based voices such as Siri and Cortana. Text To Speech software is reasonably cheap; it can be a one-time purchase and be used to voice any sort of courseware. It seems like a no-brainer to go for a decent TTS package and solve that pesky problem of hiring professional eLearning voice talent.

As both a voice talent and someone who has taught eLearning and Instructional Design, I have learned that the question to ask is: Do you want to present instruction, or do you want students to learn? These are very different concepts. The presentation and transferal of information is fairly easy; just get the software to read copy. The prospect of a student retaining information and learning is far more nuanced.

If you think back to your time in school, or if you are in school now, how much did you learn from a teacher who droned on and on through the class period? And how much did you learn, or remember learning, from an engaging teacher who was passionate about the subject matter and felt compelled to get the message across? Text To Speech is similar to the droning teacher. These days, Text To Speech software may even be better than a dry lecturer, since it can sound good, but I would posit that learning will not easily take place.

Why? It’s a matter of engagement, talent, and inflection. As Mike Harrison, an eLearning voice talent, notes: “TTS is nothing more than words being dictated (data being manipulated) by what is essentially a robot that has no idea what the subject matter is, so it cannot know context with which to judge where proper inflection is placed. How inflection is applied can actually change the meaning of sentences.”

Widely used examples of how inflection changes meaning can be found with a simple browser search. Here’s one.

Here’s another that Mike sent along. Read each sentence out loud, stressing the word noted beside it, and see how the meaning changes:

  1. “I never said she ate your sandwich” – stress on “I” (Someone else said it)
  2. “I never said she ate your sandwich” – stress on “never” (I definitely did not say anything)
  3. “I never said she ate your sandwich” – stress on “said” (I implied it)
  4. “I never said she ate your sandwich” – stress on “she” (I said someone else did it)
  5. “I never said she ate your sandwich” – stress on “ate” (I said she did something else with the sandwich)
  6. “I never said she ate your sandwich” – stress on “your” (I said she ate someone else’s sandwich)
  7. “I never said she ate your sandwich” – stress on “sandwich” (I said she ate something else)

Text To Speech software cannot differentiate between these seven very different interpretations of the same sentence. The software can enunciate the sentence, but the meaning will not come through. Common sense will tell you that without the ability to emphasize, meaning can wind up anywhere between muddy and lost. In normal speech, people naturally emphasize the proper word because they know what they are saying. Text To Speech software does not know what it is saying, and meaning is lost.

Engaging an audience of learners takes training and talent, and unless an audience is engaged, they just won’t pay attention. It’s often been said that teaching is like acting. My significant other, who taught secondary English for 29 years, told me that she acted every day; as a former college professor, I concur completely. Getting the message across, and even beginning to move a concept from short-term to long-term memory, requires a great degree of student engagement with the material and, on the other side, the talent to present the material in a manner that is engaging.

It takes a good teacher to successfully teach, and it takes a trained voice talent to present learning using asynchronous technology in a manner in which learning can take place. The money spent on hiring such talent is well spent if it’s important that the words become learning.