AI
you map it to whether it was a "bu" or a "th". Neural nets actually match the nature of the data because audio is quite hierarchical. With images, at first you can learn edges, followed by whole body parts, and then uncovering what a whole cat is. It's very similar for audio. You have these low-level features that you need to learn, and you can build on top of them to understand high-level features.”
Speechmatics’ specific implementation accommodates language in all its variety. Historically, methods for ASR struggled with variations within a language such as accents and dialects. English, owing to its geographical spread, likely features the most variation of all – a problem which demands attention considering it is the world’s most widely spoken language. Speechmatics specifically addresses this concern inside its offerings. “Compared to competitors, we are quite accent agnostic,” says Williams. “We have a model, for example, called Global English that we've trained on tons of different accents and variants of English, and we bundle it all into one model.”
“WE RELY ON MACHINE LEARNING, WHERE WE CAN SOLVE GENERAL PROBLEMS, IN ORDER TO USE LARGE AMOUNTS OF DATA TO SMOOTH OVER ALL THE PROBLEMS THAT YOU MIGHT HAVE IN BUILDING A NEW LANGUAGE”
— Will Williams, Machine learning researcher, Speechmatics
DECEMBER 2019