Microsoft plans to let Teams users clone their voices so that AI-generated sound-alikes can speak on their behalf, in other languages, during meetings.
At Microsoft Ignite 2024 on Tuesday, the company revealed Interpreter in Teams, a tool for Microsoft Teams that delivers “real-time, speech-to-speech” interpretation capabilities. Starting in early 2025, people using Teams for meetings will be able to use Interpreter to simulate their voices in up to nine languages: English, French, German, Italian, Japanese, Korean, Portuguese, Mandarin Chinese, and Spanish.
“Imagine being able to sound just like you in a different language,” Microsoft CMO Jared Spataro wrote in a blog post shared with TechCrunch. “The Interpreter agent in Teams provides real-time speech-to-speech translation during meetings, and you can opt to have it simulate your speaking voice for a more personal and engaging experience.”
Microsoft gave few concrete details about the feature, which will only be available to Microsoft 365 subscribers. But it did say that the tool doesn’t store any biometric data, doesn’t add sentiments beyond what’s “naturally present” in a voice, and can be disabled through Teams settings.
“Interpreter is designed to replicate the speaker’s message as faithfully as possible without adding assumptions or extraneous information,” a Microsoft spokesperson told TechCrunch. “Voice simulation can only be enabled when users provide consent via a notification during the meeting or by enabling ‘Voice simulation consent’ in settings.”
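Microsoft hasn’t described how Interpreter works under the hood, but speech-to-speech systems like it are commonly built as a three-stage pipeline: speech recognition, machine translation, then speech synthesis conditioned on the speaker’s voice. Here is a minimal illustrative sketch of that generic flow with a consent gate; every name, type, and stub below is an assumption for illustration, not Microsoft’s API.

```python
# Illustrative sketch only: a generic speech-to-speech interpretation
# pipeline with a consent gate. This is NOT Microsoft's implementation;
# all names, signatures, and stubs here are hypothetical.

from dataclasses import dataclass
from typing import Optional

# The nine languages the article says Interpreter will support at launch.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "it", "ja", "ko", "pt", "zh", "es"}

@dataclass
class InterpreterSettings:
    target_language: str            # e.g. "fr" for French
    voice_simulation_consent: bool  # mirrors the "Voice simulation consent" toggle

def transcribe(audio: bytes) -> str:
    """Stage 1 (stub): speech recognition on the speaker's audio."""
    return "example transcript"

def translate(text: str, target_language: str) -> str:
    """Stage 2 (stub): machine translation of the transcript."""
    return f"[{target_language}] {text}"

def synthesize(text: str, reference_audio: Optional[bytes]) -> bytes:
    """Stage 3 (stub): text-to-speech. Given reference audio, condition the
    output on the speaker's voice (voice simulation); otherwise fall back
    to a generic synthetic voice."""
    return text.encode()

def interpret(audio: bytes, settings: InterpreterSettings) -> bytes:
    if settings.target_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {settings.target_language}")
    transcript = transcribe(audio)
    translated = translate(transcript, settings.target_language)
    # Per Microsoft's statement, voice simulation runs only with consent.
    reference = audio if settings.voice_simulation_consent else None
    return synthesize(translated, reference)
```

In a real system each stub would be a streaming model running with low latency, but the consent check would sit in the same place: upstream of any voice cloning.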
A number of firms have developed tech to digitally mimic voices that sound reasonably natural. Meta recently said that it’s piloting a translation tool that can automatically translate voices in Instagram Reels, while ElevenLabs offers a robust platform for multilingual speech generation.
AI translations tend to be less lexically rich than those from human interpreters, and AI translators often struggle to accurately convey colloquialisms, analogies and cultural nuances. Yet, the cost savings are attractive enough to make the trade-off worth it for some. According to Markets and Markets, the sector for natural language processing technologies, including translation technologies, could be worth $35.1 billion by 2026.
AI voice clones pose security challenges, however.
Deepfakes have spread like wildfire across social media, making it harder to distinguish truth from disinformation. So far this year, deepfakes featuring President Joe Biden, Taylor Swift, and Vice President Kamala Harris have racked up millions of views and reshares. Deepfakes have also been used to target individuals, for example by impersonating loved ones. Losses linked to impersonation scams topped $1 billion last year, per the FTC.
Just this year, a team of cybercriminals reportedly staged a Teams meeting with deepfaked versions of a company’s C-level staff that was so convincing, the targeted employee wired $25 million to the criminals.
In part due to the risks (and optics), OpenAI earlier this year decided against releasing its voice cloning tech, Voice Engine.
From what’s been revealed so far, Interpreter in Teams is a relatively narrow application of voice cloning. Still, that doesn’t mean the tool will be safe from abuse. One can imagine a bad actor feeding Interpreter a misleading recording — for example, someone asking for bank account information — to get a translation in the language of their target.
Hopefully, we’ll get a better idea of the safeguards Microsoft will add around Interpreter in the months to come.