What is NLP?
NLP, short for Natural Language Processing, is a field of Artificial Intelligence that helps machines communicate with humans in natural language. The primary aim of NLP is to read, analyze, and understand human languages in a useful way. Over the years, there has been massive research and development in this area, and even though it’s a challenging field to navigate, significant breakthroughs have been made.
In this tutorial, we will get down to the nuts and bolts of NLP. By the end of this tutorial, you will understand:
- Some NLP terminologies
- The History of NLP
- How NLP works
- The Components of NLP
- Why NLP is so challenging
- How to Implement NLP
- Practical Applications of NLP
- Natural Language Vs. Computer Language
- Advantages of NLP
- Disadvantages of NLP
Some NLP Terminologies
- Corpus: A corpus is simply a collection of texts. It could be a collection of book reviews, tweets, or a conversation over WhatsApp.
- Vocabulary: It is the complete set of terms used in a text body.
- Document: A document is the body of a text. So, one or more documents form a corpus. A book review, for instance, is a document.
- Syntax: This refers to how the words are arranged in a sentence. The syntax involves the identification of the structural implication of words in a phrase or sentence.
- Morphology: Morphology is the study of how words are formed from smaller units (morphemes).
- Semantics: This involves identifying the meanings of words and how they combine to make a meaningful sentence or phrase.
- Discourse: This refers to how the preceding sentences shape the meaning of the current sentence.
The History of NLP
The 1930s – Researchers submitted patents for ‘translating machines’. Peter Troyanskii proposed a dictionary that handled grammatical discrepancies across languages.
The 1940s – Machine translation was attempted during World War II to translate Russian to English, with poor results.
1950 – ‘Computing Machinery and Intelligence’ was published by Alan Turing. This was the paper that detailed the concept of the Turing Test.
1968 to 1970 – Terry Winograd developed an NLP program that allowed computers to interact with humans, albeit with limitations.
1970 to 1980 – Roger Schank presented the conceptual dependency theory of NLP. Several bots were also created.
2006 – IBM Watson was created.
2010 to 2020 – Technologies that use NLP entered homes and workplaces: Siri in 2011, Alexa in 2014, Google Assistant in 2016, and so on. The use of chatbots in business operations skyrocketed.
How NLP Works
When we communicate as humans, many things come into play. We do not communicate with words alone; gestures and voice modulation are important ingredients too. NLP does not, however, infer meaning from tone of voice or body language, but rather from speech or text and the contextual patterns of various words.
For instance,
Lion is to lioness as king is to …
Lion -> King
Therefore, Lioness -> Queen
From the above, you can quickly see that a lion is male while a lioness is female. In the same vein, a king is male while a queen is female. Let’s see another example:
Man is to men as woman is to …
Guessed the answer? Yes, women! From the first part of the sentence, it is clear that the analogy relates the singular form of ‘man’ to its plural form. In like manner, for ‘woman’ in the singular, the answer should intuitively be the plural form. This is how Natural Language Processing (NLP) works: contextual patterns.
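A common way to make such analogies computable is to represent each word as a vector of numbers, so that the analogy becomes vector arithmetic (king - lion + lioness ≈ queen). Below is a toy sketch; the two-dimensional vectors and tiny vocabulary are hand-crafted assumptions purely for illustration, whereas real models such as word2vec learn hundreds of dimensions from large corpora.

```python
# Toy word vectors: dimension 0 encodes gender, dimension 1 encodes royalty.
# These numbers are invented for illustration only.
vectors = {
    "lion":    [1.0, 0.0],   # male, not royal
    "lioness": [-1.0, 0.0],  # female, not royal
    "king":    [1.0, 1.0],   # male, royal
    "queen":   [-1.0, 1.0],  # female, royal
}

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via b - a + c, then return the
    closest remaining word by squared Euclidean distance."""
    target = [vb - va + vc for va, vb, vc in
              zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in {a, b, c}}
    return min(candidates,
               key=lambda w: sum((x - y) ** 2
                                 for x, y in zip(candidates[w], target)))

# "lion is to king as lioness is to ...?"
print(analogy("lion", "king", "lioness"))  # queen
```

The same arithmetic, applied to learned vectors, is what lets real systems answer analogy questions they were never explicitly taught.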
So the next question is, how does the machine know the intrinsic meaning of each word? Through experience! Imagine a class where your lecturer provides you with many worked examples. You’re more likely to ace a test than in a class where you were given only one worked example.
The machine needs a lot of data to learn from. Each sample of the data provides more learning opportunities for the model, enabling it to perform better. In the example above, the training data could be the corpus below.
“The queen is the mother of the Lion pride.”
“The queen must be a female lion.”
“The queen takes care of the Pride.”
“The queen does not have a mane.”
“The King and Queen have enormous jaw strength, while the cubs’ jaws are still weak.”
With these sentences, the model begins to understand what the entity ‘queen’ is by forming a word vector from the surrounding words.
Based on the shared attribute of strength, the King and Queen are closely related; likewise, based on gender, the model relates the lion to the lioness.
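To make the idea of “surrounding words” concrete, the sketch below counts which words co-occur with “queen” within a two-word window in a small sample corpus modeled on the sentences above. Real systems turn such co-occurrence statistics into dense word vectors; the corpus and window size here are assumptions for illustration.

```python
# Count words that appear within 2 positions of "queen" in a tiny corpus.
import re
from collections import Counter

corpus = [
    "The queen is the mother of the lion pride.",
    "The queen must be a female lion.",
    "The queen takes care of the pride.",
    "The queen does not have a mane.",
]

WINDOW = 2
context = Counter()
for sentence in corpus:
    tokens = re.findall(r"[a-z]+", sentence.lower())
    for i, tok in enumerate(tokens):
        if tok == "queen":
            neighbours = tokens[max(0, i - WINDOW):i + WINDOW + 1]
            context.update(t for t in neighbours if t != "queen")

# The counts show which words most often surround "queen";
# widening the window would also pull in words like "female" and "mane".
print(context.most_common(5))
```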
The Components of Natural Language Processing
There are five major components of NLP, which are:
- Lexical analysis
- Syntactic analysis
- Semantic analysis
- Discourse integration
- Pragmatic analysis
Lexical Analysis
Lexical analysis involves the analysis of words and their structures. In lexical analysis, a chunk of text is divided into paragraphs, sentences, and words. Punctuation marks are removed from the words, and each word is analyzed into its various components.
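As a rough sketch of this step, the snippet below uses only the Python standard library to split a chunk of text into sentences and then into punctuation-free words. The sample text and regular expressions are simplifications; production tokenizers such as those in NLTK or spaCy handle many more edge cases.

```python
# Split text into sentences, then each sentence into punctuation-free words.
import re

text = "The queen takes care of the pride. The queen does not have a mane."

# A sentence ends at ., ! or ? followed by whitespace (a simplification).
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# A word is a run of letters or apostrophes (another simplification).
tokens = [re.findall(r"[A-Za-z']+", s) for s in sentences]

print(sentences)
print(tokens)
```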
Syntactic Analysis
This can also be called parsing. It is the analysis of words to show the relationships between them in a sentence. Syntactic analysis arranges the words based on the syntax of a given language. In other words, syntactic analysis makes sure that the grammatical rules of a particular language are adhered to.
For instance, the syntactic analyzer does not allow a sentence like “The boy is ate”, since the auxiliary “is” calls for a continuous tense, but a past-tense verb was used.
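A real syntactic analyzer builds a full parse tree from a grammar; as a heavily simplified stand-in, the toy checker below encodes just that one rule, rejecting a past-tense verb after the auxiliary “is”. The hard-coded word list is an assumption purely for illustration.

```python
# Toy rule: the auxiliary "is" must not be followed by a past-tense verb.
PAST_TENSE = {"ate", "ran", "went", "wrote"}  # tiny illustrative list

def passes_toy_rule(sentence):
    words = sentence.lower().rstrip(".!?").split()
    for i in range(len(words) - 1):
        if words[i] == "is" and words[i + 1] in PAST_TENSE:
            return False
    return True

print(passes_toy_rule("The boy is eating"))  # True
print(passes_toy_rule("The boy is ate"))     # False
```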
Semantic Analysis
Semantic analysis checks the dictionary meanings of individual words in a text and ensures that the sentence makes sense. This is done by mapping the objects in the task domain onto the syntactic structure of the sentence. The semantic analyzer shows how the words in a sentence are connected.
For instance, the semantic analyzer would not permit a sentence like “Lioness queen the is of the pride the” since the sentence does not make any sense.
Discourse Integration
In any language, there are situations where a sentence depends on its context. Discourse integration checks how the preceding sentences are connected with a particular sentence. It makes sure that every sentence is placed in the right context, so that no sentence is interpreted in isolation.
For instance, in the sentence “I asked, which did he like?”, the word “which” implies that there is a list of options, and what it refers to is determined by the preceding sentences.
Pragmatic Analysis
Pragmatic analysis is the use of general real-world knowledge to extract meaning from a sentence. It deals with the overall content and the consequences of interpretation. In pragmatic analysis, the focus is on what was actually meant in light of what was said.
For instance, “Pass me the glass of water?” can be seen not as an order, but as a request.
Why is NLP so Challenging?
NLP is not a done deal yet. While some progress has been made over the years, there are still many hurdles to cross. The major challenge of NLP is that languages are ambiguous, and this ambiguity can take different forms and levels. Lexical ambiguity, for instance, occurs at the word level: determining whether a word is being used as a verb or a noun can pose a problem.
Referential ambiguity is another level of ambiguity. It arises when we refer to something using pronouns. For instance, in the sentence “I like him”, we do not know exactly who “him” refers to.
Syntactic ambiguity is yet another level of ambiguity, where a whole sentence can be interpreted in different ways. For example, take the sentence “He juggled the ball with his shoes”. We can’t be sure whether he held his shoes and juggled the ball with them, or he wore his shoes while juggling the ball with his feet.
In a nutshell, the performance of NLP depends on how solid the corpus is. With a wide-domain repository, it gets challenging to pin down the context.
How to Implement NLP
Natural Language Processing can be implemented in two primary ways: statistical inference and machine learning.
- Statistical inference: Statistical inference involves the use of probabilities and other statistical theories to generalize from a finite sample to a potentially infinite set of new observations. The inference can be done through estimation or hypothesis testing. NLP can utilize statistical inference algorithms to create robust models.
- Machine learning: Various machine learning algorithms, such as decision trees, or even deep learning approaches can also be used to build sophisticated NLP models.
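As one concrete sketch of the machine-learning route, the snippet below trains a tiny Naive Bayes text classifier from scratch using only the standard library. The four training sentences and their labels are invented for illustration; real models learn from thousands of labeled documents.

```python
# A from-scratch Naive Bayes classifier with add-one (Laplace) smoothing.
import math
import re
from collections import Counter, defaultdict

train = [
    ("I love this product", "pos"),
    ("great quality and fast delivery", "pos"),
    ("terrible experience very bad", "neg"),
    ("I hate it waste of money", "neg"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class_counts = Counter()            # documents per class
word_counts = defaultdict(Counter)  # word frequencies per class
vocab = set()

for text, label in train:
    class_counts[label] += 1
    for word in tokenize(text):
        word_counts[label][word] += 1
        vocab.add(word)

def predict(text):
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for word in tokenize(text):
            # log likelihood with add-one smoothing over the vocabulary
            score += math.log((word_counts[label][word] + 1) /
                              (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("great product, I love it"))  # pos
print(predict("terrible waste of money"))   # neg
```

The statistical-inference route estimates the same kind of probabilities; the machine-learning framing simply fits them from labeled examples.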
Popular Applications of NLP
The applications of Natural Language Processing are vast and still increasing. The field started roughly in the mid-1950s but has grown enormously, especially in recent years. NLP is now integrated into many of the websites, applications, and software we interact with. Let’s look at some of the most popular applications.
- Machine translation: Machine translation is the automatic translation of text from one language to another by a machine. The most popular example is Google Translate. When you visit a website in a language you are not accustomed to, Google Translate renders it in your preferred language without human intervention. This is machine translation.
- Speech recognition: Interestingly, speech recognition has been in development for more than 50 years now, but it has made tremendous progress in the last few decades thanks to NLP. Speech recognition is fast replacing other modes of input such as typing, selecting text, or clicking. Voice assistants such as Google Assistant and Siri are powered by speech recognition to take commands and perform actions.
- Sentiment analysis: When you buy a product online and drop a review, the seller can parse the comments to ascertain customers’ opinions and their level of satisfaction. With the mined data, they can build ML models through NLP that help improve their product. The same applies when you drop a comment on social media: your text is processed using NLP and can be used to identify complaints and reduce dissatisfaction.
- Chatbot: Interestingly, chatbots have been in development since the 1960s. Over the past decade, however, NLP has enabled this field to grow in leaps and bounds. NLP is the primary driver of chatbots today, enabling the machine to parse your message, process it, and return a meaningful reply.
- Spell checking: Spell checking is used to identify possible errors in a text and suggest the most probable intended word. Many text editors now incorporate it into their systems; common examples include MS Word, Google Docs, and Grammarly.
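One classic spell-checking idea is to suggest the dictionary word with the smallest edit (Levenshtein) distance to the misspelled word. Below is a minimal sketch; the four-word dictionary is an assumption for illustration, and real spell checkers combine edit distance with word-frequency and context models.

```python
# Suggest a correction by minimizing Levenshtein (edit) distance.
def edit_distance(a, b):
    """Dynamic-programming edit distance using a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

dictionary = ["language", "machine", "process", "translate"]
misspelled = "langage"
suggestion = min(dictionary, key=lambda w: edit_distance(misspelled, w))
print(suggestion)  # language
```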
Some other applications worthy of mention include information retrieval, text summarization, and question answering.
Natural Language Vs Computer Language
Natural language is ambiguous, in that its rules and rule exceptions are inexhaustible, while computer languages are unambiguous. Natural languages can make use of idiomatic expressions or metaphors; a computer language, on the other hand, means exactly what it says.
Advantages of NLP
- NLP helps humans communicate with computers in our own language.
- Users can ask flexible questions and get a direct answer on the spot.
- Answers are provided in natural language, clear and lucid.
- NLP can get a structured answer from a highly unstructured data source.
Disadvantages of NLP
- When a question lacks adequate information, the system is likely to provide an incorrect answer.
- An NLP user interface often does not allow the user to interact further with the system.
- Every system is built for a particular task, which makes it difficult for the system to handle new problems.
In summary, we said that NLP enables a computer to interact with humans using natural language. You have learned how NLP works and went further to understand each component of Natural Language Processing. We compared computer language with natural language and finally discussed the pros and cons of NLP. In the next tutorial, we will guide you on how to install NLTK on your machine.