Written by Lily Chuang / former Voice User Interface Architect at VUI.agency
Before we dive into the topic, let me tell you a few things about me. For the past five and a half years I worked as an NLU developer for automotive voice control systems in Chinese Mandarin and Japanese for Volkswagen, Porsche, and Audi car models.
Prior to my first Natural Language Understanding (NLU) job, I had never heard of this field, and for most people this is still the case, including business leaders and managers. However, NLU is a key component of voice products and voice experiences. I like to call it the invisible engine. Without NLU there would be no voice assistants. No matter how good the voice design is, it would be like trying to make a luxury car run without an engine.
Today, I want to talk about five things that I think managers in general should know about NLU, even if they have absolutely no background in the field.
What exactly does an NLU developer do?
As I said at the beginning of the article, I am an experienced NLU developer, and even now I am pretty sure none of my family members understand what I do for a living. Not that my job is as mysterious as being a Michelin Star inspector, but the concept of someone working with languages and essentially programming a machine to understand human language is, so far, just very difficult to grasp.
If I had to come up with a one-line explanation I would say: NLU is an interpreter between humans and machines.
NLU is very language-specific
This is something managers and product owners should definitely be aware of, especially if they are looking to create virtual assistants and immersive voice experiences for their customers. Not every NLP (Natural Language Processing) developer can be a good NLU developer. Sure, it is essential to have a well-developed pipeline in a language model created by an NLP developer, but that does not mean an NLP developer automatically has extensive linguistic knowledge of a specific language.
If you compare Chinese Mandarin and English, the syntax might be similar: a sentence is often constructed with a subject plus a verb. But the Chinese language is very poor in morphology, and what complicates things even more is that it contains a vast number of homophones (words that have the same pronunciation but different meanings, for instance, “pair” and “pear” in English).
And on top of that, Chinese is an ideographic language, meaning a character can represent its meaning without reflecting the pronunciation. Conversely, without the characters, the sound alone does not necessarily determine the semantics, i.e. the meaning of the word. Thus, an English-speaking developer cannot simply transfer their knowledge to developing a Chinese NLU. Another example would be languages such as Japanese, where a verb suffix can change the meaning of the word.
Take, for instance, the verb つける (to switch on) and its passive form つけられる (to be switched on). From an English perspective, one can argue that the stem is つけ (switch). The stem is the part of a word that carries its core semantic meaning. However, the suffix られる turns the active verb into a passive one, and with that the direction of the meaning is completely altered.
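The homophone problem described above can be sketched in a few lines of Python. This is only an illustration, not how any particular NLU engine works: the tiny lexicon, the tone-numbered pinyin keys (`shi4`, `hua1`), and the function names are assumptions made for this example. It shows that one and the same sound can point to several unrelated characters, so the sound alone does not settle the meaning.

```python
from collections import defaultdict

# Toy lexicon (an assumption for illustration):
# (character, pinyin with tone number, English gloss).
LEXICON = [
    ("是", "shi4", "to be"),
    ("事", "shi4", "matter, affair"),
    ("市", "shi4", "city"),
    ("试", "shi4", "to try"),
    ("花", "hua1", "flower"),
]


def build_sound_index(lexicon):
    """Group all characters under the sound they share."""
    index = defaultdict(list)
    for character, pinyin, gloss in lexicon:
        index[pinyin].append((character, gloss))
    return dict(index)


def candidates_for(sound, index):
    """Return every (character, gloss) pair that matches a sound."""
    return index.get(sound, [])


SOUND_INDEX = build_sound_index(LEXICON)

if __name__ == "__main__":
    for sound in ("shi4", "hua1"):
        options = candidates_for(sound, SOUND_INDEX)
        print(f"{sound}: {len(options)} candidate(s)")
        for character, gloss in options:
            print(f"  {character}  {gloss}")
```

In this toy data, the sound `shi4` yields four candidates while `hua1` yields one. Deciding which of the four the speaker meant requires context and knowledge of the language, which is exactly where language-specific NLU work comes in.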
The difference between NLP and NLU
Continuing with the interpreter analogy, NLP is the translation software an interpreter can use, so how you use it and what you use it for is essential. Quite often NLP developers also cover NLU development, so sometimes there is no clear-cut division between the two unless the work is language-specific. Good NLU is all about understanding linguistic details and how to utilize them with the given tools.
It often surprises the user when a voice assistant can understand something the user actually did not expect to be understood. Moments like these create a sense of empathy, engagement, and sometimes even charisma.
Why clean data and intents are so important
I call unclean data and intents the “gaslighting in NLU”.
The language model itself is often not as complicated as people think. Imagine you are drawing a flower (the data) on a canvas (the machine). The label on the pigment package (the intent) is wrong: it says yellow, but the pigment inside is blue. The canvas is passive. You are expecting to draw a yellow flower, but the canvas will, of course, only show a blue one.
Or perhaps you painted a yellow flower and a yellow elephant on the canvas, and you only classify yellow as one intent. Yes, they are both yellow, but you would not say that a yellow flower and a yellow elephant are the same. It would be much more useful to classify the yellow plants and the yellow animals as different intents.
You would not teach your children confusing information, so why would you do that with your NLU engine? We should apply the same care to data, keeping our data and intents clean and treating them with respect.
When a product is released with confusing data and intents, it may work for a while if you only have a simple function. But once you start to expand your functionalities, it will get more complicated to debug, i.e. to find out where the mistaken data comes from. And it also gives your users an impression of poor quality and lack of sophistication.
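One simple check for unclean data is to look for utterances that have been given more than one intent. The following Python sketch illustrates the idea under stated assumptions: the training examples, the intent names (such as `media.radio_on`), and the helper functions are all hypothetical and not taken from any specific NLU platform.

```python
from collections import defaultdict

# Hypothetical training data: (utterance, intent label).
TRAINING_DATA = [
    ("turn on the radio", "media.radio_on"),
    ("Turn on  the radio", "climate.fan_on"),  # same utterance, conflicting intent
    ("switch on the fan", "climate.fan_on"),
    ("play some music", "media.play"),
]


def normalize(utterance):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(utterance.lower().split())


def find_conflicts(examples):
    """Return utterances that were labeled with more than one intent."""
    intents_by_utterance = defaultdict(set)
    for utterance, intent in examples:
        intents_by_utterance[normalize(utterance)].add(intent)
    return {
        utterance: sorted(intents)
        for utterance, intents in intents_by_utterance.items()
        if len(intents) > 1
    }


if __name__ == "__main__":
    for utterance, intents in find_conflicts(TRAINING_DATA).items():
        print(f"Conflict: {utterance!r} is labeled as {intents}")
```

A check like this only catches exact duplicates after normalization. Near-duplicates and intents that overlap in meaning, like the yellow flower and the yellow elephant, still need a review by someone who knows the language.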
Is there a perfect NLU platform?
Unfortunately, just like everything else, perfection is just a perception and not a reality. Every NLU platform has its own pros and cons. You can overcome a platform’s cons by working with linguistic experts and using more or better training data and tuning rules, which is more practical than searching for the perfect ONE.
The NLU engine is a tool, after all. How you use it and what you use it for is far more critical. Imagine, for instance, that Alexa is a pan and Google Assistant is a pot. You can use both for cooking, but if you want to cook a delicious dinner, the actual ingredients and your cooking skills are probably far more crucial than the pan or pot you use. I am sure Gordon Ramsay can still make a wonderful meal with very bad cookware. And let’s keep in mind that these two are just two puzzle pieces out of thousands in the conversational AI world.
Things that a tech manager or product owner should remember to ensure better results when building virtual assistants and voice experiences
A lot of the time people aim for the stars and fall short. Try to keep NLU simple and clean, one step at a time. Non-ambiguous intents defined by qualified linguistic experts are always a good starting point. However, at some point, a clearly defined decision will be necessary to deal with ambiguous data.
So, try to keep in mind the actual goal of your service or product and how you aim to achieve it with the data you have on hand. And remember to put emphasis on the linguistic domain because that is where quality voice experiences are created. Training an NLU engine is like teaching a child, but do not forget that it is still a tool. The more meaningful and clear the language rules you teach your NLU engine to process, the better it understands you. After all, you don’t need a pleasant voice assistant without functionalities.