Microsoft has developed a new AI model named VALL-E that can simulate anyone’s voice. It can recreate a speaker’s voice from just a 3-second audio sample, whereas other AI models require at least a minute of recorded audio.
The model was trained on 60,000 hours of English-language speech from Meta’s Libri-Light library, which contains audio from over 7,000 speakers. How accurately VALL-E simulates a voice depends on how closely that voice resembles one of the speakers in its training data.
Microsoft plans to continue developing the model to improve its accuracy and pronunciation. The code is not currently open source, but a demo of VALL-E is available.