Ahead of the launch of iOS 11 this fall, Apple has published a research paper detailing its methods for improving Siri to make the voice assistant sound more natural, with the help of machine learning.
Beyond capturing several hours of high-quality audio that can be sliced and diced to create voice responses, developers face the challenge of getting the prosody – the patterns of stress and intonation in spoken language – just right. The problem is compounded by compute: searching a large database of recorded snippets for the best-sounding sequence is processor-intensive, so straightforward methods of stringing sounds together would be too much for a phone to handle.
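To make the cost concrete, here is a minimal, hypothetical sketch of the kind of search that classic unit-selection synthesis performs: for each target sound, pick one recorded unit so that the combined "target cost" (prosody mismatch) and "concatenation cost" (how badly adjacent units join) is minimized by dynamic programming. All names and numbers are illustrative, not Apple's implementation; the point is that the search scales with the number of targets times the square of the candidates per target, which grows quickly with hours of recorded audio.

```python
def select_units(targets, candidates, target_cost, concat_cost):
    """Viterbi-style search: choose one candidate unit per target
    position, minimizing total target cost plus concatenation cost.
    This is a generic textbook sketch, not Apple's algorithm."""
    n = len(targets)
    # best[i][j]: minimal total cost of a path ending at candidates[i][j]
    best = [[0.0] * len(candidates[i]) for i in range(n)]
    back = [[0] * len(candidates[i]) for i in range(n)]

    for j, unit in enumerate(candidates[0]):
        best[0][j] = target_cost(targets[0], unit)

    for i in range(1, n):
        for j, unit in enumerate(candidates[i]):
            # Try every predecessor unit; keep the cheapest transition.
            costs = [best[i - 1][k] + concat_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            best[i][j] = costs[k_best] + target_cost(targets[i], unit)
            back[i][j] = k_best

    # Backtrack from the cheapest final unit to recover the chosen path.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]


# Toy usage: units are just pitch values; target cost penalizes pitch
# mismatch, concatenation cost penalizes jumps between adjacent units.
targets = [100, 120, 110]
candidates = [[95, 130], [118, 90], [111, 140]]
chosen = select_units(
    targets, candidates,
    target_cost=lambda t, u: abs(t - u),
    concat_cost=lambda a, b: 0.1 * abs(a - b),
)
print(chosen)  # the low-cost sequence [95, 118, 111]
```

Even this toy version does work proportional to every pairing of adjacent candidates; a real system searching hours of audio per request is far heavier, which is why on-device synthesis needs smarter, learned guidance for the search.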