How To Incorporate AI-Generated Talking Head Videos Into Your Learning Design

Summary: As a learning professional, you're likely no stranger to AI-powered talking head video generators these days. But we shouldn't adopt new technologies just for the sake of using them. Here are some tips to help you make wiser decisions about incorporating these videos into your learning design.

AI-Generated Talking Head Videos In Learning Design

At their core, AI-generated talking head videos are produced by tools that convert written content into videos featuring human-like avatars. The popular tools on the market offer a wide selection of avatars that narrate your text input, generating a voiceover in the language of your choice with matching lip-sync.

Where To Use AI-Generated Talking Head Videos: 4 Use Cases

Suppose your company has just provided you with a new license for this innovative tool, and you're eager to use it in your next learning project. I hear you! However, before you dive in headfirst, it's important to pause and consider your approach so that you don't overwhelm your learners simply because there's a new tool at your disposal. With that in mind, I'd like to share some scenarios where I've found this technology to be particularly beneficial. These examples can help you integrate the tool into your learning strategy so that it enhances, rather than complicates, the learning experience.

1. Storytelling

Gone are the days when training meant monotonous lecturing accompanied by endless PowerPoint slides. Today, talking head videos can be added to our tooling repertoire to enhance our storytelling capabilities. For instance, a digital avatar could be used to present real-world dilemmas. Imagine a scenario where an employee grapples with the decision to report a colleague's misconduct. The avatar could vividly express the internal debate occurring in their mind: what is leading them to believe it's misconduct, and what is preventing them from reporting it to the manager. This approach not only brings the situation to life but also adds depth and complexity to the learning experience.

Another example is adding a personal touch to workplace narratives. Picture a digital avatar acting as a manager briefing you about a newly onboarded client, or perhaps a customer avatar voicing complaints about hard-to-navigate product features. These scenarios do more than just relay information. They create an environment where learners can see and empathize with different perspectives. Using talking head videos for storytelling adds more flavor to the learning experience, as it drives learners to actively engage with the content and encourages a deeper level of interaction and reflection.

2. Scenario-Based Practice

Scenario-based practice is a powerful mechanism through which we develop power skills in a more immersive and authentic way. Picture a scenario brought vividly to life with digital avatars, placing learners right in the midst of a workplace situation. Here, they are not just observers but active decision-makers, challenged to identify and select the most appropriate course of action.

Now, imagine using the same approach to focus on skills like active listening and communicating feedback, with two digital avatars engaged in a detailed, authentic workplace conversation. As this dialogue unfolds, learners aren't merely listening; they're engaging with the content and practicing active listening. Following the interaction, they are encouraged to reflect: How effectively did they listen? What elements of the conversation could be enhanced or approached differently? This reflective exercise not only reinforces listening skills but also deepens the learners' understanding of effective communication dynamics, turning a simple exercise into a powerful learning experience.

By immersing learners in these simulated environments, scenario-based practice with digital avatars opens up new avenues for learning. Traditionally, such videos were filmed in professional settings, but now you can produce them from your computer in just minutes!

3. Product Training

In the world of product training where information is dynamic and frequently requires updates, talking head video generators emerge as invaluable tools. Imagine the convenience of using these generators to create a digital clone of your spokesperson. This avatar could effortlessly announce the latest product updates, bringing a consistent and familiar face to your communications.

Moreover, you could edit your product training videos just like editing a Word doc, making sure that they always stay current with the latest features and information. This not only streamlines the process of keeping training materials up-to-date but also maintains a high level of engagement and clarity for the end users, making complex product details more accessible and easier to digest.

4. Onboarding Training (Maybe Not Yet?)

Although numerous tools on the market claim that they can be used to deliver onboarding and compliance training, I remain skeptical. Onboarding, in particular, is a critical process that essentially serves as the "face" of the company for new joiners. It sets the tone for an employee's experience and expectations. In this context, a personal and humanistic approach often holds more value than what can be achieved through mass-produced, automated solutions. The warmth, nuance, and direct interaction that come from human-led onboarding cannot be fully replicated by technology just yet.

Tips On Generating Avatar Videos Of Better Quality

If you decide to give these tools a try on your next learning project, below are some practical tips for creating higher-quality talking head videos.

1. Pronunciation And Pacing

Check the pronunciation guide of the tool you are using. Most tools let you insert explicit pauses for a longer stop, or use punctuation marks to improve the rhythm between sentences. Hyphens (-), for example, separate syllables so that they are pronounced individually. Commas (,) add a shorter break. Periods (.) add a longer break and a downward inflection.

2. Spelling

Spell correctly and don’t mix languages. For example, don’t include Chinese words in an English script. Spell words the way they should be pronounced: the name of a fictitious company, Ultra Sys Ltd., is best written as "Ultra Sys Limited." Write out the whole word, or spell it phonetically, to get the pronunciation you want. Similarly:

  • Numbers
    2012 → twenty twelve, 3/8 → three-eighths, 01:18 → one minute and eighteen seconds, 10-19-2016 → October nineteenth two thousand sixteen.
  • Characters and digits
    Try inserting spaces if you want the characters or digits pronounced separately. For example: test → t e s t, 12345 → 1 2 3 4 5
  • One sentence per scene
    If you don't want to spend a ton of time listening and re-listening to the script, it's recommended to keep each scene short, one sentence per scene if possible.
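
If you prepare many scripts, some of these spelling tips can be applied with a small helper before you paste text into your tool. The following is only a minimal sketch in plain Python, not any vendor's API: the names ABBREVIATIONS, spell_out, and prepare_script are hypothetical, and the abbreviation list is an assumption you would extend yourself. It expands abbreviations, spaces out tokens that should be read character by character, and splits the script into one sentence per scene.

```python
import re

# Hypothetical abbreviation map (an assumption); extend it for your own scripts.
ABBREVIATIONS = {"Ltd.": "Limited"}


def spell_out(token: str) -> str:
    """Insert spaces so each character or digit is read separately."""
    return " ".join(token)


def prepare_script(script: str, spell_tokens=()) -> list:
    """Return the script as a list of scenes, one sentence per scene."""
    # Expand abbreviations first so their periods are not mistaken for
    # sentence endings. Limitation: an abbreviation that ends a sentence
    # loses its period and will not be split from the next sentence.
    for short, full in ABBREVIATIONS.items():
        script = script.replace(short, full)

    # Space out only the tokens the caller explicitly lists.
    for token in spell_tokens:
        pattern = rf"\b{re.escape(token)}\b"
        script = re.sub(pattern, lambda m: spell_out(m.group(0)), script)

    # Split after sentence-ending punctuation followed by whitespace.
    scenes = re.split(r"(?<=[.!?])\s+", script.strip())
    return [scene for scene in scenes if scene]


if __name__ == "__main__":
    text = "Ultra Sys Ltd. has a new portal. Enter code 12345 to begin."
    for number, scene in enumerate(prepare_script(text, ["12345"]), start=1):
        print(number, scene)
```

Numbers such as years, fractions, and dates are deliberately left out of this sketch; writing them out by hand, as in the examples above, is usually more reliable than automating them.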

Concerns And Critiques Of Using Talking Head Videos

If this is the first time you are pitching the tool to your client or stakeholders, you may face some pushback. Here are some common comments I've heard from people who are hesitant about this type of tool:

1. "This Looks A Bit Weird"

It's common for first-time viewers to find the appearance of digital avatars a bit unusual or uncanny. The technology, while advanced, sometimes struggles to replicate the subtleties of human appearance, leading to avatars that can seem slightly off to the human eye. This initial reaction is understandable, so I suggest not using avatars on high-visibility, high-impact projects just yet.

2. Robotic Voiceovers

Another point people raise is the quality of the AI-generated voiceovers. While incredibly advanced, these synthetic voices can sometimes lack the natural fluidity and emotional range of a human speaker, coming across as somewhat robotic or monotone. This can affect the perceived authenticity of the content and how engaging it feels. That said, I hasten to add that the voice-generating capabilities of these tools are progressing rapidly, as the models underpinning them are a hot topic of research.

3. Lack Of Gestures And Facial Expressions In Digital Avatars

Digital avatars may not always fully capture the range of human gestures and facial expressions. These nuances are crucial in communication and their absence can make the avatars seem less relatable or engaging. The subtlety of human nonverbal communication is a complex aspect that technology is still striving to master.

It's important to acknowledge that the technologies underpinning these tools are still evolving. While they have made significant strides, they haven't yet reached their full potential. Making mindful decisions about where to use these tools and staying current with the latest updates is something we should all be doing as learning professionals!


Whether you have used AI-powered talking head video generators or not, if you feel inspired by this article, feel free to test them out yourself!