NeuralSpace has achieved exceptional results in a benchmarking experiment on Natural Language Understanding (NLU) intent classification and entity recognition across a range of languages, including Arabic, Hindi, Chinese (Mandarin) and many others. Our NLU technology surpasses industry leaders Google, IBM and Rasa in accurately identifying user intents, and surpasses Rasa in extracting entities. These results are a testament to our commitment to delivering reliable, best-in-class NLU solutions to our global community.
In this blog post, we will explore the significance of NLU benchmarking and our results, as well as the challenges associated with building highly accurate NLU models in multiple languages.
Natural Language Understanding (NLU) refers to the ability of a computer or machine to comprehend and interpret human language in a way that is similar to how humans understand it. It involves the analysis of various aspects of language, including syntax, semantics, and context, to derive meaning from text. NLU is a critical component of many applications such as chatbots, virtual assistants, sentiment analysis dashboards and (semantic) search engines, as it enables these systems to understand user queries and respond appropriately.
Intent classification and entity recognition are the two core tasks of NLU models. Intent classification refers to the ability of an NLU system to accurately understand the purpose or intention behind a user’s input or query. For example, if a user types “احجز رحلة إلى دبي في مساحة النادي” (in English, “book a flight to Dubai in the Club Space”) into a travel website’s virtual assistant, the intent of their message is “book flight”.
Entity recognition, on the other hand, refers to the ability of an NLU system to identify and extract relevant information from a user’s input or query. Entities are specific pieces of information that are relevant to the user’s intent, such as names of people, places, dates, or prices. In the above example, “دبي” (Dubai) is recognised as the entity “city” and “مساحة النادي” (Club Space) is recognised as a custom entity “seat type”. The accurate understanding of both intents and entities is crucial for a successful NLU model.
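To make this concrete, here is a rough sketch (in Python) of the kind of structured output an NLU model produces for the utterance above. The field names, intent label and confidence score are illustrative only and do not correspond to any particular provider’s API.

```python
# Illustrative only: the field names and confidence value below are
# hypothetical and not tied to any specific NLU provider's response format.
utterance = "احجز رحلة إلى دبي في مساحة النادي"  # "book a flight to Dubai in the Club Space"

prediction = {
    "text": utterance,
    "intent": {"name": "book_flight", "confidence": 0.97},
    "entities": [
        {"type": "city", "value": "دبي"},                # built-in entity
        {"type": "seat_type", "value": "مساحة النادي"},  # custom entity
    ],
}
```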
Our performance benchmarking was conducted using the open-source multilingual Amazon MASSIVE dataset. NeuralSpace’s model outperformed Google, IBM and Rasa in every language we tested! We trained each NLU model with the out-of-the-box methods available through the user interface or APIs of Google Dialogflow, IBM Watson, Rasa and NeuralSpace.
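For readers who want to reproduce the setup, the MASSIVE dataset is publicly available; the sketch below loads one locale through the Hugging Face `datasets` library. It assumes the `AmazonScience/massive` dataset identifier and the `utt`, `intent` and `annot_utt` field names used in the public release.

```python
# Minimal sketch: inspect one locale of the Amazon MASSIVE dataset.
# Depending on your `datasets` version you may need to pass trust_remote_code=True.
from datasets import load_dataset

massive_ar = load_dataset("AmazonScience/massive", "ar-SA")

example = massive_ar["train"][0]
print(example["utt"])        # raw utterance text
print(example["intent"])     # intent label (class index)
print(example["annot_utt"])  # utterance with slot/entity annotations
```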
Not all of the evaluated service providers support training custom entities, so our entity recognition benchmark is limited to Rasa and NeuralSpace.
We evaluated each model’s performance using the F1 score, the harmonic mean of precision and recall. The strict F1 score requires an exact match of both the entity’s surface-string boundaries and its type, while the partial F1 score credits a partial boundary match over the surface string, regardless of entity type.
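To illustrate how these scores are computed, the sketch below re-implements strict and partial entity matching over (type, start, end) spans. It is a simplified illustration for this post, not the exact evaluation script used in the benchmark.

```python
# Simplified illustration of strict vs. partial entity-level F1.
# Entities are represented as (type, start, end) character spans.

def f1(num_correct, num_predicted, num_gold):
    """F1 is the harmonic mean of precision and recall."""
    if num_correct == 0 or num_predicted == 0 or num_gold == 0:
        return 0.0
    precision = num_correct / num_predicted
    recall = num_correct / num_gold
    return 2 * precision * recall / (precision + recall)

def strict_matches(predicted, gold):
    # Exact boundary and entity-type match.
    return len(set(predicted) & set(gold))

def partial_matches(predicted, gold):
    # Any boundary overlap counts, regardless of entity type.
    return sum(
        any(p_start < g_end and g_start < p_end for _, g_start, g_end in gold)
        for _, p_start, p_end in predicted
    )

gold = [("city", 14, 17), ("seat_type", 21, 33)]
predicted = [("city", 14, 17), ("seat_type", 27, 33)]  # second span only partly recovered

print(f1(strict_matches(predicted, gold), len(predicted), len(gold)))   # 0.5 (strict)
print(f1(partial_matches(predicted, gold), len(predicted), len(gold)))  # 1.0 (partial)
```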
Every language has its own unique vocabulary, grammar, and sentence structure. Capturing the nuances of each language is essential for accurate NLU. Colloquialisms and idiomatic phrases can have entirely different meanings than their literal translations, making it difficult for NLU models to understand them.
Achieving high NLU accuracy requires understanding the context in which words or phrases are used. This can be particularly challenging when dealing with multiple languages and their cultural nuances.
For many languages, there is a lack of large-scale, high-quality training data, which hinders the development of accurate NLU models. This is especially true for languages that do not use the Latin alphabet, for which digital keyboards and text input tools were developed considerably later than for Latin scripts.
In many countries, speakers often switch between languages within a single conversation. In Saudi Arabia, for example, Arabic is often mixed with English, and in India, Hindi and English are often mixed. NLU models must be able to detect and understand content regardless of the language-mix to maintain high accuracy.
We collect proprietary data to train in-house models for better performance. These data come from a wide range of sources and aim to capture the nuances of different industries, regions, age groups, religious beliefs and many other cultural and contextual settings.
We fine-tune our models on mixed language datasets, making them more effective in many practical settings where users tend to use English, French or Spanish words within another language.
Our models are fine-tuned on domain and task-specific data to ensure high performance in NLU tasks, such as sentiment analysis, text categorisation and content analysis.
Our models are designed to handle multiple languages effectively, enabling seamless cross-lingual understanding and transfer learning across languages.
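As a rough illustration of cross-lingual transfer (and not a description of NeuralSpace’s proprietary models), the sketch below sets up a public multilingual encoder, XLM-RoBERTa, for intent classification. Because all languages share one tokenizer and one encoder, a model fine-tuned on utterances in one language can often be applied to utterances in others.

```python
# Illustrative sketch of cross-lingual intent classification with a public
# multilingual encoder; this is not NeuralSpace's actual model or pipeline.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_INTENTS = 60  # e.g. the number of intent classes defined in MASSIVE

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=NUM_INTENTS
)

# Arabic and English utterances pass through the same tokenizer and encoder,
# which is what enables transfer across languages after fine-tuning.
batch = tokenizer(
    ["احجز رحلة إلى دبي", "book a flight to Dubai"],
    padding=True,
    return_tensors="pt",
)
logits = model(**batch).logits  # shape: (2, NUM_INTENTS)
```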
Highly accurate Natural Language Understanding (NLU) is crucial for AI-powered services to effectively interpret and respond to user inputs. When this technology underperforms, it can lead to frustrating user experiences, which hinders product adoption. Since its inception, NeuralSpace has focused on developing NLU models that capture the nuances of different languages and dialects. With this approach, we aim to unlock seamless AI-powered experiences for people around the world, irrespective of their language or location. We’re delighted to see this long-standing commitment lead to highly accurate models that deliver better outcomes for our customers.
Get in touch to learn more about NeuralSpace or visit our website.