The monotonous and synthesized voices of virtual assistants seem to have their days numbered. NVIDIA has developed a new artificial intelligence (AI) that reproduces an extremely realistic voice.
By combining AI with reference recordings of human speech, the synthesized voice sounds almost identical to that of a real person. During the Interspeech 2021 conference, the company released a video about the process of creating this “natural voice”.
The video showcases recent advances from NVIDIA’s speech technology research. For this project, the researchers used a version of the open source NeMo toolkit optimized to run on the company’s GPUs.
Experts compare speech to music: it has complex rhythms, tones and timbres that are not simple to replicate. However, new tools are helping to reduce that complexity.
The AI can be driven in two ways. In the first, a text-to-speech model built from human-dictated speech is used. The software can then take excerpts from a passage of text and convert them into a female voice.
The second method is direct voice conversion. The tool takes an audio file of a person speaking and converts it into the AI voice, carrying over the speaker’s patterns and intonation.
New NVIDIA AI can be applied to accessibility projects. Source: NVIDIA/Handout
AI narrator for the next NVIDIA series
To show how far the technology has come, NVIDIA’s AI will narrate the video series I am AI. The series will show the influence and impact of machine learning in various sectors.
The company also wants to show that the new technology has the potential to go much further. For example, the tool can help people with vocal disabilities or help users translate between languages using their own voice.