Microsoft’s VALL-E AI Can Simulate Anyone’s Voice

Microsoft has developed a new AI model named VALL-E that can simulate anyone’s voice. It can recreate a human voice from just a 3-second audio sample, whereas other AI models require at least a minute of recorded audio.

The model was trained on 60,000 hours of English-language recordings from Meta’s Libri-Light dataset, which contains audio from over 7,000 speakers. How accurately VALL-E simulates a voice depends on how similar the input voice is to one of the speakers in its training data.

Microsoft plans to continue developing the model to improve its accuracy and pronunciation. The code is not currently open source, but a demo of VALL-E is available online.

Surprised there isn't more chatter around VALL-E

This new model by @Microsoft can generate speech in any voice after only hearing a 3s sample of that voice 🤯

Demo → https://t.co/GgFO6kWKha pic.twitter.com/JY88vf4lYc
— Steven Tey (@steventey) January 9, 2023