In the AI and robotics industries, there’s a phenomenon known as the “uncanny valley”: the observation that until a synthetically created human is flawless, people will probably find it a bit spooky. We’re not out of the uncanny valley yet, but we’re getting close, and plenty of software companies are putting together practical uses for AI-generated human avatars in the meantime.
One of them is Synthesia, which focuses its AI video technology on enterprise use cases like employee onboarding and training videos. These tend not to carry the same expectations of hyper-realism as videos created for consumer entertainment.
To explore Synthesia’s capabilities, I created a variety of videos using different aspects of its technology. While most people won’t be fooled into thinking these avatars are real humans, Synthesia’s versatility and usefulness for business still managed to surprise me.
What is Synthesia?
Synthesia is an AI video generator that uses AI avatars in place of actors or voiceovers. It orients its entire product around the needs of enterprise and tech businesses, which often maintain hundreds (or thousands) of videos for employee onboarding, training, product walkthroughs, and customer how-tos.
The traditional way to create these videos—with actors or voiceovers—is expensive and time-consuming. It’s also cumbersome to update. But creating videos in Synthesia, making changes to existing videos, or even translating them into another language is quick, and doesn’t require re-recording.
The proprietary avatars themselves are each based on a real actor who has licensed their likeness to Synthesia. Over one hundred actors were recorded by Synthesia’s team using 160 cameras simultaneously to capture a range of natural movements and facial cues. Add a touch of AI—or, to use Synthesia’s term, “neural video synthesis”—and you’ve got some of the most realistic AI avatars on the market.
A deep dive into Synthesia’s features
Synthesia’s core features are its AI video avatars, its text-to-speech engine, and its presentation design tool. I tested each—here’s what I found.
AI video avatars
Synthesia’s avatar inventory is massive, with 150+ diverse avatars. Since these avatars are based on the licensed likenesses of real actors, they’re more realistic than those of many competitors. To find the right avatar, you can simply type in what you’re looking for (“a corporate woman,” “a trendy young man”) and you’ll see a wide range of options.
I ended up choosing one of Synthesia’s improved “V3” avatars, “Natalie.” The output—which you can see at the end of this article—is impressive: not flawless, but close enough not to be too distracting.
As you’re creating your Synthesia video, you can change the avatar at any time. You can also feature different avatars on each slide.
Synthesia can also create a custom AI avatar based on your likeness. (You can “clone” your voice, too.) The main drawback? The custom AI avatar costs $1,000/year, which is more than the Synthesia subscription itself.
Once you drop your script into the presentation builder, you can customize your avatar’s performance by adding in gestures like head nods and raised eyebrows. Gestures make the avatar look more natural, but since you have to add them manually, the process can be time-consuming. It’s also tricky to get the timing right. It’d be nice if Synthesia were able to use AI to interpret the tone of your text and update the avatar’s gestures accordingly.
Text-to-speech engine
Synthesia offers 120+ languages, accents, and voice tones: its English-language accents alone range from Irish to Nigerian to Indian, and most accents have multiple variations and styles.
Once you’ve added your script, you can change your avatar’s voice at any time. You can also change your language, although that requires you to change the language of your script, too.
As you look through the available voices, you can listen to a brief preview of each. Each voice is given a label, like “lifelike” or “professional.”
I tested all of the US voices, and a range of other English-language accents. I also input Spanish-language text and tested the Mexican Spanish voice out with my wife, who is a native Spanish speaker.
The verdict? The quality is surprisingly inconsistent. Some AI voices are fantastic; many are pretty good; others sound so robotic and flat that they reminded me of the phone menus I navigate while waiting on hold with my bank. The Spanish-language voice was OK, but still noticeably robotic. My suspicion is that Synthesia loaded up its latest audio models without removing the original ones; as a result, users need to experiment to sort the good from the bad.
Even the best of Synthesia’s AI voices don’t pronounce everything perfectly. In my experience, there was probably one word per slide that was off in some way. Synthesia had trouble pronouncing years naturally: 1995 came out as “nineteen-hundred ninety-five” instead of “nineteen ninety-five.” “Camaraderie” was another tough one. Fortunately, there’s a solution: Synthesia’s Diction feature, which allows you to give custom pronunciation instructions. This works well overall, though I couldn’t for the life of me fix “camaraderie” (even after Googling the phonetic spelling).
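A possible workaround for the year problem, separate from the Diction feature, is to spell years out in the script before pasting it into Synthesia. The Python sketch below is only an illustration: the function names are hypothetical, it does not use any Synthesia API, and whether the spelled-out form is actually read more naturally is an assumption rather than something tested here.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def year_to_words(year):
    """Spell a year (1000-2099) roughly the way it is usually spoken."""
    century, rest = divmod(year, 100)
    if rest == 0:
        # 1900 -> "nineteen hundred", 2000 -> "two thousand"
        return "two thousand" if year == 2000 else f"{two_digits(century)} hundred"
    if 2000 < year < 2010:
        # 2005 -> "two thousand five"
        return f"two thousand {ONES[rest]}"
    if rest < 10:
        # 1905 -> "nineteen oh five"
        return f"{two_digits(century)} oh {ONES[rest]}"
    # 1995 -> "nineteen ninety-five"
    return f"{two_digits(century)} {two_digits(rest)}"


def spell_out_years(script):
    """Replace standalone four-digit numbers from 1000 to 2099 with words.

    Assumption: every such number in the script is a year. A figure like
    "1500 employees" would also be rewritten, so review the result by hand.
    """
    pattern = r"\b(1[0-9]{3}|20[0-9]{2})\b"
    return re.sub(pattern, lambda m: year_to_words(int(m.group())), script)


print(spell_out_years("Our company was founded in 1995."))
```

Running the script on a sentence containing 1995 yields “nineteen ninety-five” in place of the digits. The trade-off is that the on-screen script no longer shows the numerals, which may matter if the same text feeds the closed captions.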
Synthesia’s text-to-speech tool also lets you add pauses anywhere in the script, and automatically generates closed captions in the final video—a key accessibility feature for corporate video-making.
Presentation design tool
One of Synthesia’s bright spots is its user interface, which feels instantly familiar. Generating an AI video feels a lot like assembling a PowerPoint presentation. The learning curve is short, and features like Synthesia’s AI script assistant and 65+ video templates make it easy to assemble a video.
As with other presentation tools, you can use animations to make your presentation more dynamic. Synthesia also includes a built-in screen recorder, perfect for creating product walkthroughs and how-to videos. Collaboration features allow coworkers to add comments once your video is finished; each comment is linked to a video timestamp, so you can quickly understand what needs to be changed.
For such a cutting-edge platform, it’s surprising how intuitive this tool is. In my experience, there was only one issue: speed. Generating a one-minute video took around fifteen minutes. That wait is understandable, since a large amount of computational power goes into animating the AI avatar, but navigating the rest of the presentation tool is a bit laggy, too.
You can make use of all these features even more efficiently by connecting Synthesia to Zapier. Then you can automate processes like creating personalized Synthesia videos, sending the videos to prospects, backing them up, and uploading them to your project management tool, all without any manual effort.
Putting Synthesia to the test
To test Synthesia, I created an employee onboarding presentation. After generating an employee onboarding script, selecting an avatar, adding facial gestures, and choosing the most realistic accent I could find, I clicked “generate” to put it all together.
Here’s a quick clip of the output:
Yes, you can tell this isn’t a real human. But here’s the real question for Synthesia’s customers: is it good enough for an internal employee onboarding video? For many companies—like Heineken, which has used Synthesia to train 70,000 employees—the answer is yes, especially when it means saving money and boosting productivity. It’s clear that this is the future for much of video production, especially since the gap between AI and humans will only shrink in the coming years.
I found the avatars themselves one of the most compelling parts of Synthesia’s technology. The AI voices are impressive, too, as long as you avoid the more robotic-sounding accents. The facial expressions still need some work, but the most glaring issue is the lip syncing—the avatar struggles to find a pacing that precisely matches the dialogue.
The key thing to understand is that this technology is not right for every use case. For high-impact presentations, especially in sales and marketing, where first impressions matter, it’s worth the investment to have a real person deliver them. It says a lot that Synthesia’s own walkthrough video uses a real person, one of the company founders, and not one of its AI avatars. (Its “Synthesia Academy,” which features how-to videos, does use AI avatars.)
HeyGen, a competing AI video generator, produces output that’s a bit more polished in terms of lip syncing, pronunciation, and facial expressions; for marketing videos and anything less “corporate,” a tool like that might be a better choice. However, Synthesia’s orientation toward the enterprise market makes it a great choice for larger companies looking to scale their internal communications and training videos.
The future of AI video generation
Is Synthesia perfect—and does it avoid the uncanny valley? Not yet. But it’s a powerful tool that serves a real need for businesses. In the meantime, AI avatars and text-to-speech technology are getting better quickly. It’s hard to escape the conclusion that the Synthesia team may be right in its prediction: “In the not-too-distant future synthetic media will replace the need for physical cameras and complex video editing tasks.”
Pricing for Synthesia can seem steep: its Starter plan, at $29/month, lets you generate just 10 minutes of video per month. Meanwhile, its $89/month Creator plan offers 30 minutes of video per month. Still, businesses that stand to save money by using Synthesia should be able to easily justify these rates.
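To put the two plans on a common footing, here is a quick sketch of the cost per minute of generated video. It uses only the prices and minute allowances quoted above. The plan table and function name are illustrative, and the figures may not match Synthesia’s current pricing.

```python
# Plan figures as quoted in this article; they may not reflect current pricing.
PLANS = {
    "Starter": {"price_per_month": 29, "minutes_per_month": 10},
    "Creator": {"price_per_month": 89, "minutes_per_month": 30},
}


def cost_per_minute(plan):
    """Monthly price divided by the monthly video allowance, in dollars."""
    return plan["price_per_month"] / plan["minutes_per_month"]


for name, plan in PLANS.items():
    print(f"{name}: ${cost_per_minute(plan):.2f} per minute of video")
```

By this measure the two plans cost about the same per minute (roughly $2.90 versus $2.97), so the larger plan buys more volume rather than a lower rate.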
Interested in trying Synthesia for yourself? You can generate a test video for free before signing up for a plan.