A startup claims to have launched a world-first AI speech-to-speech translation system.
The tool, called Aivia, was developed by Interprefy, a Zurich-based provider of translation services. The firm focuses on interpreting meetings and events — a market being turbocharged by globalisation.
As interactions spread across borders, they can become harder to understand. Although English is the language of international business, it’s spoken by only an estimated 17% of the world’s population. The remainder is often excluded from the conversation.
Interprefy supplies a way to remove this language barrier — and the demand seems strong. In the eight years since the company was founded, Interprefy says it’s supported over 50,000 meetings. They range from remote press conferences at the Euro 2020 football tournament to interviews with astronauts at the International Space Station.
Aivia was designed to expand this client base. At the touch of a button, the service translates speech into audio and captions in real time. Interprefy claims it’s the first-ever advanced automated speech translation service for online and live events.
Oddmund Braaten, Interprefy’s CEO, has grand ambitions for Aivia. He wants the tool to finally make simultaneous translations mainstream.
“Over the last eight years, our remote interpreting technology has helped democratise access to these services greatly and has seen wide adoption, especially during the Covid era,” Braaten tells TNW.
“But we’ve still seen many organisations and events lacking the budget to book professional interpretation. That’s why we developed a service that provides affordable real-time translation as well as the flexibility and support needed to ensure a seamless multilingual user experience.”
Braaten is bullish about the results. He believes Aivia is the most accurate and flexible AI speech translator on the market.
Under the hood, Aivia integrates three main AI technologies: automatic speech recognition, machine translation, and synthetic voice generation.
To enhance their outputs, Interprefy built a benchmarking toolkit to evaluate the best AI for every language combination. The company also uses a glossary extraction tool to further customise Aivia for each event. This preps the system with relevant keywords and hard-to-catch names or abbreviations from pertinent content.
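Interprefy has not published how its benchmarking toolkit works, but the general idea of picking the best engine per language pair and then chaining recognition, translation, and voice synthesis can be sketched in a few lines. The example below is only an illustration under stated assumptions: the engine names, scores, and stub stages are all hypothetical and do not describe Aivia's actual implementation.

```python
from typing import Callable, Dict, Tuple

# A stage takes a string in and returns a string out (stand-in for audio/text).
Stage = Callable[[str], str]

# Hypothetical benchmark scores per language pair (higher is better).
# Engine names and numbers are invented for illustration only.
BENCHMARK_SCORES: Dict[Tuple[str, str], Dict[str, float]] = {
    ("en", "de"): {"engine_a": 0.81, "engine_b": 0.86},
    ("en", "ja"): {"engine_a": 0.74, "engine_b": 0.69},
}


def best_engine(source: str, target: str) -> str:
    """Return the highest-scoring engine for a given language pair."""
    scores = BENCHMARK_SCORES[(source, target)]
    return max(scores, key=scores.get)


def run_pipeline(audio: str, recognise: Stage, translate: Stage, synthesise: Stage) -> str:
    """Chain the three stages: speech recognition, translation, voice synthesis."""
    return synthesise(translate(recognise(audio)))


# Stub stages that only label their input, so the data flow is visible.
def stub_recognise(audio: str) -> str:
    return f"text({audio})"


def stub_translate(text: str) -> str:
    return f"translated({text})"


def stub_synthesise(text: str) -> str:
    return f"speech({text})"


if __name__ == "__main__":
    print(best_engine("en", "de"))
    print(run_pipeline("clip", stub_recognise, stub_translate, stub_synthesise))
```

In a real system each stage would call a speech or translation service, and the scores would come from evaluating those services on test material for each language combination.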
According to Braaten, this approach addresses two shortcomings in real-time speech translation: inconsistent results and the unmet needs of event organisers.
“We believe we’ve solved both pain points,” he says. “Because we’ve been supporting events of all shapes and sizes for nearly a decade, we have the expertise to support event organisers hands-on. We’ve also built a solution that can benchmark leading AI solutions to use only the best-performing AI technologies available on the market.”
Initially, Aivia will be available in 24 languages and regional accents. Both in-person audiences and platforms such as Microsoft Teams, Zoom, and ON24 can use the service.
Interprefy plans to add many more languages in the near future — and with good reason. Globally, an estimated 30% of internet users now use online translation tools every week — but real-time speech interpretation remains a challenge. Aivia offers a new solution to the problem.
Aivia arrives amid rapid advances in AI translation. Last year, an Italian interpretation company predicted that machines would surpass the top human translations by the end of the decade.
Naturally, the progress raises concerns about the future prospects for the profession. Braaten argues that AI and humans can play complementary roles.
Only skilled linguists, he says, can translate the subtleties of context, tone, humour, and idioms. Furthermore, they’re the only safe option for sensitive content.
“Interpreters have the unique ability to adapt their translation to every scenario, as well as being able to paraphrase and interpret non-spoken information such as body language and tone of voice,” says Braaten.
“These are qualities that AI simply cannot replicate and are especially important for higher-level communication such as board meetings, legal meetings, or diplomatic conversations.”
AI, meanwhile, is better suited to situations in which nuance is rare and risks are low. In these scenarios, machine translations can provide a more affordable and practical alternative.
Yet for live events and meetings, simultaneous interpretation remains a niche service. Braaten hopes Aivia’s accessibility can change that.