Deciding On The Voice For Your eLearning Projects
Over five years ago, I recorded a simple video about Adobe Captivate 8's text-to-speech software that became a big part of my legacy in eLearning videos. To date, it has more views on YouTube than any of the 500-plus videos I've recorded since then. If you would like to see it, you can watch it here.
Because of that video, Adobe Captivate users ask me several times per month where they can get additional or alternative voices or languages for their eLearning courses. I've decided it is about time to revisit the topic with an updated article. In this article, I'm going to look at the options for other voices from the same software provider that makes the Captivate text-to-speech voices, as well as some alternative voices from Microsoft. I will also share a completely separate text-to-speech solution from Amazon and a new text-to-speech startup that might amaze you. Finally, I will discuss the solution we should all consider as a great alternative to all of these offerings.
1. NeoSpeech/ReadSpeaker
Captivate users might already know the text-to-speech included with the software from Adobe. Included with Captivate is a text-to-speech add-in from a company called NeoSpeech. NeoSpeech has since become part of a company called ReadSpeaker. Included with Adobe Captivate are several English voices, a French voice, and a Korean voice. I've always been puzzled by these additional choices as I would have expected Mandarin Chinese, Hindi, and Spanish due to the popularity of these languages.
The main benefit of the NeoSpeech voices included with Captivate would be consistency. Consistency is the main benefit of all text-to-speech solutions. Recordings made with one of the NeoSpeech voices five years ago will sound identical to a new recording made today. If you recorded and compared your voice five years ago with a recording today, there would be differences that might prove distracting to your learners. Your age, general health, and equipment used to record your voice can all affect the quality. The other benefit of the NeoSpeech voices is the price. These voices are included with the software at no additional cost to you or your organization, and you can use them indefinitely, and for any quantity of eLearning you can produce.
If the seven voices included with Adobe Captivate are not suitable for your needs and you want to purchase additional voices from ReadSpeaker that work with the Adobe Captivate text-to-speech software, you will need to visit ReadSpeaker's website. There are no options on their site to purchase voices a la carte. Instead, you will need to contact them and discuss your text-to-speech needs with a representative. If your needs are minimal, they will likely refer you to their web-based tool, where you can pay per use and download the resulting audio files as you need them. If your needs are more significant and you wish to add voices to your Captivate installation, they will likely offer you voices/languages at a rate of about US$1,000 to US$1,300 per voice, per year.
Whether we are talking about the NeoSpeech voices included with Adobe Captivate or the additional voices for sale from ReadSpeaker, the resulting narration does not sound like it comes from an actual person. There is a robotic quality that makes these voices suitable for only the most basic of eLearning courses. For this reason, I cannot recommend the NeoSpeech/ReadSpeaker text-to-speech voices.
2. Microsoft
If you are using a Windows-based computer and are looking at the list of text-to-speech voices in Adobe Captivate, you might notice that there are several voices from Microsoft. I recently created a video that explored how to add additional voice packages from Microsoft. The advantage of the Microsoft Speech packages is that they integrate well with Adobe Captivate. You can select languages other than English, Korean, and French, and the packages are free to install in your instance of Windows 10. As with the NeoSpeech/ReadSpeaker product, the disadvantage of the Microsoft Speech packages is the quality. Once again, I cannot recommend this solution for quality eLearning projects.
3. Amazon
Amazon offers a text-to-speech solution they call Amazon Polly. Amazon Polly doesn't integrate with Adobe Captivate. You would need to copy and paste your narration text into their web-based tool slide by slide and generate audio files that you can download one by one. From there, you can import these files into the slides of your eLearning course regardless of the authoring tool. The main benefit of Amazon Polly is the price. Amazon offers a pay-as-you-go pricing model for what it calls its neural voices. It works out to about US$16 per 1 million characters. They also offer a free tier that uses their standard voices. If all this sounds confusing, don't worry too much. I don't fully understand it either. In the time I've experimented with their technology, Amazon hasn't billed me for any of it. While I believe their product to be superior to all the solutions I've mentioned so far, it still doesn't come close enough to replace a human voice. There is a distinct robotic quality to these voices, but I might use them for basic eLearning courses that don't have the budget of a larger project.
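To get a feel for what the pay-as-you-go rate means in practice, here is a minimal sketch that estimates the cost of narrating a course. It assumes the roughly US$16 per 1 million characters quoted above, and it assumes that counting characters with Python's `len()` approximates how Amazon counts billable characters. The function name and the sample slide text are made up for illustration, and the rate may have changed, so check Amazon's current pricing before relying on the numbers.

```python
# Rate quoted in this article for the neural voices (an assumption; it may change).
NEURAL_USD_PER_MILLION_CHARS = 16.00


def estimate_polly_cost(slide_texts, usd_per_million=NEURAL_USD_PER_MILLION_CHARS):
    """Return (total_characters, estimated_cost_usd) for a list of slide narrations.

    Characters are counted with len(), which only approximates how a
    provider might count billable characters.
    """
    total_chars = sum(len(text) for text in slide_texts)
    cost = total_chars / 1_000_000 * usd_per_million
    return total_chars, round(cost, 4)


if __name__ == "__main__":
    # Hypothetical course: 60 slides with 600 characters of narration each.
    slides = ["x" * 600 for _ in range(60)]
    chars, cost = estimate_polly_cost(slides)
    print(f"{chars} characters, about ${cost:.2f}")
```

At that rate, a 60-slide course averaging 600 characters of narration per slide comes to 36,000 characters, or well under one dollar, which helps explain why light experimentation may never produce a noticeable bill.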
4. WellSaid
Next, I want to share with you a new startup called WellSaid. What's interesting is that WellSaid has taken a different approach to text-to-speech. They are using Artificial Intelligence to predict what we would accept as a real human voice, and the quality is surprisingly good. The exciting thing is that each time you generate the narration, you may notice slight differences from the last time you produced the same passage. I equate this to asking a voice actor to record another take. That's the Artificial Intelligence at work, and it can work to your advantage by giving you alternative clips to choose from. You also have some control over pacing and phonetics. You can add extra spaces to increase the pause between words and sentences, and you can save alternative spellings for dealing with things like acronyms.
They offer a free trial version of their service that gives you access to a subset of the available voices and a limited number of passages that you can generate. If you decide that their service meets your needs, you can sign up for US$100 per month. This gives you access to all the voices, and you can generate as many clips as you need for your eLearning narration. In my opinion, this is the best text-to-speech solution I've ever heard.
The disadvantage of the product is that you can only render a certain number of characters at a time. You will need to stitch clips together to build a longer passage if you intend to import it as slide audio in your eLearning course. This will add to your production time if you have many hundreds of clips in your eLearning course. While it's the best text-to-speech I've ever heard, it's still far from perfect. Occasionally, you will encounter issues that remind you it isn't a human voice. For example, the energy and tone might be different from what a human would use. You will sometimes have to intervene manually, adjusting the pacing and perhaps the spelling of certain words, to fix these issues. If WellSaid can increase the amount of text that it can render and the speed at which it operates, I would recommend it as a replacement for the text-to-speech built into your eLearning authoring tool.
The other issue for me is that in a given year, I don't always need text-to-speech services. Paying US$100 per month just doesn't fit within my budget, especially considering that it's more than my monthly budget for all my software combined. My understanding is that you can cancel or put your WellSaid subscription on hold, but they put the onus on you to withdraw and re-subscribe. I would find a pay-per-use model more practical. Again, they are a new service, so we may see that at some point. They are certainly an organization to watch in the future of text-to-speech.
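If you do end up rendering a long script in pieces, it helps to split it at sentence boundaries so that each clip starts and ends cleanly before you stitch the audio back together. The following is a minimal sketch of that splitting step only. The 1,000-character default is a placeholder and not WellSaid's actual limit, and the function name is hypothetical. The sentence detection is deliberately simple (it breaks after a period, exclamation mark, or question mark followed by whitespace), so abbreviations such as "Dr." will cause extra breaks.

```python
import re


def split_script(script, max_chars=1000):
    """Split narration text into chunks of at most max_chars characters.

    Chunks break only between sentences. max_chars is a placeholder
    value, not any provider's documented limit.
    """
    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) > max_chars:
            raise ValueError(
                "A single sentence exceeds max_chars; shorten it or raise the limit."
            )
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    narration = "Welcome to the course. Click Next to begin! Ready?"
    for number, chunk in enumerate(split_script(narration, max_chars=30), start=1):
        print(number, chunk)
```

Each chunk can then be pasted into the text-to-speech tool in order, and the resulting clips joined in an audio editor before importing them as slide audio.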
5. Human Beings
In the end, I still recommend looking to other human beings for your voice-over work. I'm fortunate enough to have some talent as a voice-over artist, and I have used my voice to provide narration for many of my courses. Unfortunately, I only speak English, and if the narrative calls for a female voice, I don't fit the bill (unless you're going for a Monty Python style of female voice). To provide my training in another language requires either a volunteer who speaks that language or hiring a voice actor to deliver those recordings. Volunteers who don't have much experience with recording narration can take too much time and can end up costing you in recording studio overruns. I prefer to hire voice talent and provide them with a script to record. A simple Google search or a search on LinkedIn can help you find who you need.
Additionally, there is a variety of talent for hire from sites like Fiverr.com or Freelancer.com. To me, the advantage of working with a voice-over actor is that you pay for the work you need. Also, if they mispronounce something or otherwise make a mistake with the script, they often will rerecord that passage at no additional cost to you. Most importantly, you use an actual voice-over artist for the more humanistic types of training where text-to-speech wouldn't be dynamic or expressive enough. Imagine using text-to-speech voices to simulate a conversation between two employees in a soft-skills course. I think that would be terrible to listen to and perhaps even a little insulting to your learners.