Artificial intelligence systems that simulate human speech often sound as flat as the robots depicted in cinema: lifeless, with repetitive rhythm and a purely literal attachment to the meaning of words. Most of the time, these synthetic voices do not fool humans, apart from a few remarkable exceptions such as Google Duplex, introduced in 2018.
There is still a long way to go before we reach a world of replicant androids like those in Blade Runner, but NVIDIA seems well on its way. This Tuesday (31), the company released a video showing its progress on I AM AI, a project originally presented in 2017. The footage highlights the advances in the artificial intelligence, especially the naturalness with which it communicates, sounding convincingly human.
The idea has always been for the artificial intelligence to narrate the series itself and explain how the technology has evolved, but even today the episodes are narrated by humans. In 2020, the NVIDIA research team resumed its efforts to improve the narrator.
Flowtron, as the AI voice was called, sounded human but was not yet complete. The team still needed a way to connect the director's vision for a clip to the virtual narrator, so work continued on a model called RAD-TTS. With it, directors can record their own speech and use that audio to shape the tone, word duration, and expressiveness of the artificial intelligence.
An AI capable of copying tones
“With RAD-TTS I was able to record myself speaking a specific line. When I emphasized or put more energy into a word, made my voice deeper or higher, or slowed down my speech, it affected the narrator's voice in the same way,” commented NVIDIA video producer David Weissman.
The impressive result, which can be seen in the video, was made possible by treating speech like music. “Speech has notes, it has rhythm, and as a researcher coming from a musical background, I am always listening to the voice as an instrument that I can manipulate,” said Rafael Valle, a Rio de Janeiro-born researcher also involved in the project. For Valle, the new model makes it possible to “create art.”
For NVIDIA, the model's evolution opens several doors. The artificial intelligence voice could be used to help people with speech impairments communicate, or to recreate performances by iconic singers, since melody is among its capabilities.
For now, the results of this evolution are unlikely to appear in everyday products, but the path is being paved. NVIDIA's open-source NeMo toolkit gives developers the opportunity to explore and experiment with the development of speech models.
More details on the evolution of NVIDIA's artificial intelligence voice will be presented at Interspeech 2021, a conference on speech communication that runs from August 30 to September 3.