Computer takkallam? The challenges of speech recognition in Arabic

Speech recognition technology has come on in leaps and bounds in recent years, and we are starting to see systems that work in near-natural language. From Apple's Siri through Amazon's Alexa to Microsoft's Cortana, virtual assistants can communicate verbally with increasing sophistication.


Middle East banking


“We are seeing Artificial Intelligence (AI) and speech technology popularity soar in the Arab world, particularly in Middle East banking and with ‘voice as password’ services. As time moves on, technology that can recognise and ‘understand’ nuanced voice patterns will have a huge impact across all areas of customer services and business operations in the region,” said Anil Kumar, strategic business unit head for Middle East and Africa at customer experience management specialist Servion.


“Where Arabic speech recognition, AI and machine learning are really making an impact is in areas where we recognise that we cannot preprogram everything. We let the machine learn for itself and fine tune the speech recognition using data capture from customer interactions. After three months of letting machines ‘learn’ and tune, we have already seen over 80% improvement in accuracy in our recent implementations and this, crucially, is better than many human agents could provide,” Kumar added.


Arabic abjad, a leap in inference


So yes, machine learning is helping to bring more Arabic speech recognition into the real world, because it helps us iron out the inconsistencies of real spoken words. But no, it is never going to be easy. One challenge we may always face with Arabic is that its script is an ‘abjad’, i.e. the writing system records mainly consonants (short vowels are normally left unwritten) and the reader is required to ‘infer’ the missing vowels.


“We’re quite used to this in the real world, but the problem arises because speech recognition software requires, at some point, to link pronunciation and spelling. This means it is always harder if there is less information in the spelling,” explains Benoît Brard in his capacity as head of languages at speech recognition company Speechmatics.
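
To make the point about spelling carrying less information concrete, here is a minimal Python sketch. It assumes three fully vocalised example words (chosen for illustration, not taken from the interview) and a hypothetical helper, strip_diacritics, that removes the short-vowel marks the way everyday Arabic text omits them. All three pronunciations collapse to the same written form.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the
    Arabic short-vowel signs, leaving only the base letters."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Illustrative examples only: three different words built on the letters k-t-b.
vocalised = {
    "kataba (he wrote)": "\u0643\u064E\u062A\u064E\u0628\u064E",
    "kutub (books)": "\u0643\u064F\u062A\u064F\u0628",
    "kutiba (it was written)": "\u0643\u064F\u062A\u0650\u0628\u064E",
}

# Once the vowel marks are removed, every entry becomes the same spelling.
written_forms = {strip_diacritics(w) for w in vocalised.values()}
print(len(written_forms))  # 1
```

A recogniser that maps sounds to spellings therefore has to choose among several pronunciations for one written word, which is the difficulty Brard describes.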


The good news is that we are getting around all these challenges, and Arabic speech recognition and AI are developing rapidly. There’s more good news, too. Arabic is written right-to-left, whereas many of the computer sequences used to process Arabic speech are handled left-to-right, but this is less of an obstacle than it sounds. As Speechmatics’ Brard puts it, to a machine, a sequence is a sequence, so reversing direction is not so hard. It should be said, though, that it’s not simply a dumb mirroring task and there are many details to be wary of.
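
One way to read the "a sequence is a sequence" remark is sketched below in Python. It assumes an example word chosen for illustration, and it shows only how Unicode text is stored, not how any particular recogniser works. The letters sit in memory in the order they are read and spoken, and the right-to-left layout is a display property recorded per character.

```python
import unicodedata

# Illustrative word: the letters kaf, teh, beh, stored in reading order.
word = "\u0643\u062A\u0628"

# The first element of the string is the first letter read aloud,
# even though it is drawn at the right-hand edge on screen.
names = [unicodedata.name(ch) for ch in word]
print(names[0])  # ARABIC LETTER KAF

# Each character carries a bidirectional class; "AL" marks an Arabic letter
# that a renderer lays out right-to-left.
directions = {unicodedata.bidirectional(ch) for ch in word}
print(directions)  # {'AL'}
```

The details to be wary of tend to appear when right-to-left text is mixed with digits or Latin-script words, where display order and stored order no longer mirror each other neatly.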


Arabic agglutination 


“Last but not least, Arabic has rich morphology. Word endings change depending on the gender, the number, the function of the word in the utterance. Some agglutination occurs when articles, prepositions and conjunctions are joined with the following word,” notes Brard.
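
The joining Brard describes can be illustrated with a toy Python sketch. The prefix list, the split_proclitics helper and the example token are all assumptions made for illustration. Real systems rely on proper morphological analysis, because a greedy rule like this one wrongly splits ordinary words that happen to start with the same letters.

```python
# Hypothetical, deliberately tiny prefix list, in the order the pieces attach:
# conjunction "wa" (and), preposition "bi" (with), article "al" (the).
PROCLITICS = ["\u0648", "\u0628", "\u0627\u0644"]


def split_proclitics(token: str, min_stem: int = 3) -> list[str]:
    """Greedily peel known prefixes off a token, keeping at least
    min_stem letters for the remaining stem. Toy rule, not a real analyser."""
    parts = []
    rest = token
    for clitic in PROCLITICS:
        if rest.startswith(clitic) and len(rest) - len(clitic) >= min_stem:
            parts.append(clitic)
            rest = rest[len(clitic):]
    parts.append(rest)
    return parts


# One written token meaning "and with the book": wa + bi + al + kitab.
token = "\u0648\u0628\u0627\u0644\u0643\u062A\u0627\u0628"
segments = split_proclitics(token)
print(len(segments))  # 4
```

A single space-delimited token here carries what English spreads over four words, so a recogniser's vocabulary has to cope with many more distinct surface forms.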


Could Arabic agglutination (the term simply meaning a joining or ‘clumping’ together) plus abjad-related and dialect-specific challenges prove to be the Achilles’ heel of Arabic speech recognition? Well, no, but they do go some way to explaining why Arabic speech recognition is hard and why we’re still working to perfect these technologies. Speeding up that work is important: voice is increasingly recognised as the new interface, and we can clearly see AI-driven systems coming that will depend on near-perfect natural language capabilities. And while Arabic is lagging behind English right now, developers are working on the challenges specific to Arabic.


Computer takkallam Arabi? Shwiyyeh, shwiyyeh… but getting much better soon.