The background, use cases, and challenges of NER in the field of Natural Language Processing.

The amount of textual information stored on the web keeps growing as more and more people around the world engage with web platforms. As the available data increases, extracting useful information from it has become one of the most important activities across many domains.

In this article, we’ll discuss the basic processes of NER, its various use cases, and the major challenges in the field. In the last section, we’ll also learn how to build our own NER pipeline with the Python spaCy library.

Named Entity Recognition (NER) is a subtask of Information Extraction: the automatic extraction of named entities by finding the entities in a given text (recognition) and assigning each one a type (classification).

In other words, NER identifies and classifies the pieces of text that belong to particular semantic types, such as persons, organizations, locations, and quantities.

Example of NER result

NER is now widely used across many domains and industries to automate the information extraction process.

NER serves the goal of Information Extraction (IE), which is to produce a knowledge base. It organizes the extracted information in a way that is useful both to people and to Natural Language Processing (NLP) algorithms that draw inferences from it.
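As a rough illustration of what "producing a knowledge base" can mean at its simplest, here is a minimal Python sketch that groups NER output by entity type. The sample pairs and the `build_knowledge_base` helper are hypothetical; a real knowledge base would usually also store relations between entities, not just lists of names.

```python
from collections import defaultdict

# Hypothetical NER output: (entity text, entity type) pairs.
# In a real system these would come from an NER model.
EXTRACTED = [
    ("Google", "ORG"),
    ("London", "GPE"),
    ("Ada Lovelace", "PERSON"),
    ("Google", "ORG"),
    ("Paris", "GPE"),
]


def build_knowledge_base(entities):
    """Group unique entity mentions by their type."""
    kb = defaultdict(set)
    for text, label in entities:
        kb[label].add(text)  # a set drops repeated mentions
    # Sort each group so the result is deterministic and easy to read.
    return {label: sorted(names) for label, names in kb.items()}


if __name__ == "__main__":
    print(build_knowledge_base(EXTRACTED))
```

Here the repeated "Google" mention collapses into a single ORG entry, and "London" and "Paris" end up together under GPE.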

Identifying named entities in raw text and sorting them into groups is called entity recognition. The basic steps of NER are shown in the diagram below:

Basic Steps of NER

Let’s break down each step of NER to see exactly how it works:

First, the raw text goes through the Sentence Segmentation stage, which divides a string of written language into its component sentences. For instance, the text is split apart wherever a sentence-ending punctuation mark, such as a period, is found. The purpose of sentence segmentation is to mark sentence boundaries.
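A naive version of this rule fits in a few lines of Python. This sketch assumes that '.', '!' or '?' followed by whitespace always ends a sentence; the `segment_sentences` name is our own, not a library function.

```python
import re

# Split on whitespace that follows '.', '!' or '?'.
# Naive rule: it also splits after abbreviations such as "Dr.".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment_sentences(text):
    """Return the sentences of `text`, using end punctuation as boundaries."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


if __name__ == "__main__":
    sample = "Apple opened a store in Paris. It was busy! Was it worth it?"
    for sentence in segment_sentences(sample):
        print(sentence)
```

The weakness of the rule shows up quickly: `segment_sentences("Dr. Smith arrived.")` returns `["Dr.", "Smith arrived."]`, which is why NLP libraries use trained or more elaborate rule-based segmenters.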

Next, Word Tokenization divides a string of written language (here, each sentence) into its component words, or tokens.
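A minimal regex-based tokenizer is sketched below, assuming a token is either a run of word characters or a single punctuation mark. This is a simplification, and the `tokenize` helper is hypothetical.

```python
import re

# \w+ matches runs of letters, digits or underscores (words and numbers);
# [^\w\s] matches any single punctuation character as its own token.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    print(tokenize("Apple opened a store in Paris."))
```

`tokenize("Apple opened a store in Paris.")` returns `["Apple", "opened", "a", "store", "in", "Paris", "."]`. Contractions such as "don't" come out as three tokens, one of the special cases real tokenizers handle with exception rules.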

Example of Word Tokenization

Part of Speech (POS) tagging is the process of assigning each word in a text a part-of-speech tag (noun, verb, adjective, and so on) according to its definition and context. The tags describe the grammatical structure of a sentence, which means they can be used to make assumptions about semantics; for example, named entities are usually proper nouns.
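To make the idea concrete, here is a toy tagger that uses Universal POS tag names. The lexicon and fallback rules are hypothetical, and unlike a real statistical tagger it looks at each word in isolation, so it ignores context entirely.

```python
# Hypothetical mini-lexicon using Universal POS tag names.
LEXICON = {
    "a": "DET", "the": "DET", "in": "ADP", "on": "ADP",
    "opened": "VERB", "store": "NOUN", ".": "PUNCT",
}


def pos_tag(tokens):
    """Tag each token by lexicon lookup, falling back to simple spelling rules."""
    tagged = []
    for token in tokens:
        if token.lower() in LEXICON:
            tag = LEXICON[token.lower()]
        elif token[:1].isupper():
            # Unknown capitalized words are guessed to be proper nouns,
            # the usual candidates for named entities.
            tag = "PROPN"
        elif token.endswith(("ing", "ed")):
            tag = "VERB"
        else:
            tag = "NOUN"
        tagged.append((token, tag))
    return tagged


if __name__ == "__main__":
    print(pos_tag(["Apple", "opened", "a", "store", "in", "Paris", "."]))
```

The PROPN tags on "Apple" and "Paris" are exactly the kind of signal the next stage can use when looking for entities.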

Example of Part of Speech Tagging

Lastly comes the Entity Detection stage: identifying key elements in the text and then classifying them into predefined categories. This is the crucial step of NER, as it is the one that actually fulfills NER’s purpose. The image below shows an example result when named entities are detected in a sentence.
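The simplest way to see both halves of entity detection (recognition, then classification) is a dictionary-lookup ("gazetteer") approach. The sketch below is not how spaCy or other statistical models work; the gazetteer entries and the `detect_entities` helper are hypothetical.

```python
import re

# Hypothetical gazetteer: known names mapped to predefined categories.
# Real NER systems learn these decisions from annotated data instead.
GAZETTEER = {
    "Apple": "ORG",
    "Paris": "GPE",
    "Tim Cook": "PERSON",
}


def detect_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, text, label) for every gazetteer name found in text."""
    if not gazetteer:
        return []
    # Recognition: find candidate spans. Longer names are tried first so a
    # multi-word name is preferred over any shorter name it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    # Classification: look up the category of each recognized span.
    return [
        (m.start(), m.end(), m.group(), gazetteer[m.group()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    print(detect_entities("Tim Cook said Apple will open a store in Paris."))
```

A fixed list cannot recognize names it has never seen and cannot tell "Apple" the company from "apple" the fruit, which is why practical systems use trained models.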

Example of Entity Detection

These are the basic steps through which NER serves its purpose in the field of Information Extraction (IE).

Alright, now let’s look at some real-world use cases of NER to see how it is applied to solve specific problems.

NER can be used to index and catalog collections of text documents, such as online reports, news, and articles. It extracts meaningful entities from each document, which can then be used to classify the documents into appropriate categories. The same entities can also serve as index terms, helping search engines recommend relevant documents for a given search keyword.
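One way to picture entity-based indexing is an inverted index from each entity to the documents that mention it. In this minimal sketch the document ids, their entities, and both helper functions are hypothetical, and the search only matches entities exactly.

```python
from collections import defaultdict

# Hypothetical documents with the entities an NER model might have extracted.
DOC_ENTITIES = {
    "news-1": ["Apple", "Paris"],
    "news-2": ["Google", "London"],
    "report-7": ["Apple", "London"],
}


def build_entity_index(doc_entities):
    """Map each entity to the sorted list of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc_id)
    return {entity: sorted(ids) for entity, ids in index.items()}


def search(index, keyword):
    """Return the documents whose extracted entities match the keyword exactly."""
    return index.get(keyword, [])


if __name__ == "__main__":
    index = build_entity_index(DOC_ENTITIES)
    print(search(index, "Apple"))
    print(search(index, "London"))
```

Because the index is built once from the extracted entities, each search becomes a dictionary lookup instead of a scan over every document's full text.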

NER is an important element of Question Answering systems, for instance customer support chatbots. Many fact-based questions have answers that are themselves entities, which NER can detect. Using NER in a chatbot therefore makes finding answers much more efficient, and it saves customers time because they don’t have to contact customer support staff.
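A heavily simplified sketch of how a support chatbot might use a detected entity is shown below. The product names, the fact store, and the `answer` helper are all hypothetical, and both the "NER" and the attribute detection are plain substring matching, used only to show the flow: find the entity, then look up the fact about it.

```python
# Hypothetical product entities the chatbot can recognize, and a
# hypothetical fact store keyed by (product, attribute).
PRODUCTS = ["Phone X", "Laptop Pro"]
ATTRIBUTES = ["warranty", "price"]
FACTS = {
    ("Phone X", "warranty"): "Phone X has a 2-year warranty.",
    ("Laptop Pro", "warranty"): "Laptop Pro has a 1-year warranty.",
}
FALLBACK = "Let me connect you with a support agent."


def answer(question):
    """Detect the product and the attribute asked about, then look up the fact."""
    lowered = question.lower()
    product = next((p for p in PRODUCTS if p.lower() in lowered), None)
    attribute = next((a for a in ATTRIBUTES if a in lowered), None)
    if product is None or attribute is None:
        return FALLBACK
    return FACTS.get((product, attribute), FALLBACK)


if __name__ == "__main__":
    print(answer("What is the warranty on the Laptop Pro?"))
```

Only questions the fact store cannot answer get handed to a human, which is where the time saving comes from.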

Opinion mining, also known as sentiment analysis, involves building an algorithm to collect and categorize opinions about an entity, such as a product, an individual, or a service. NER is one of the important components of opinion mining, because it is used to extract the relevant attributes of products or other entities. With these extracted data, opinions can be analyzed over time to determine whether sentiment toward a product is rising or falling (becoming more positive or more negative).
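Once NER has tied each opinion to an entity, tracking sentiment over time is a grouping-and-averaging job. In this sketch the records, the scores (assumed to come from a separate sentiment classifier), and both helpers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (month, entity found by NER, sentiment score in [-1, 1]).
RECORDS = [
    ("2021-01", "Phone X", 0.5),
    ("2021-01", "Phone X", 0.25),
    ("2021-02", "Phone X", -0.5),
    ("2021-01", "Laptop Pro", -0.25),
    ("2021-02", "Laptop Pro", 0.75),
]


def sentiment_trend(records, entity):
    """Average sentiment per month for one entity, in chronological order."""
    by_month = defaultdict(list)
    for month, name, score in records:
        if name == entity:
            by_month[month].append(score)
    # "YYYY-MM" strings sort chronologically.
    return [(month, mean(scores)) for month, scores in sorted(by_month.items())]


def direction(trend):
    """Say whether sentiment rose or fell between the first and last month."""
    if len(trend) < 2:
        return "not enough data"
    first, last = trend[0][1], trend[-1][1]
    if last > first:
        return "rising"
    if last < first:
        return "falling"
    return "flat"


if __name__ == "__main__":
    for product in ("Phone X", "Laptop Pro"):
        trend = sentiment_trend(RECORDS, product)
        print(product, trend, direction(trend))
```

With this sample data, sentiment toward "Phone X" falls from 0.375 to -0.5, while sentiment toward "Laptop Pro" rises.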

NER performance is challenged by various complexities that occur in any natural language.

Text can be ambiguous when the same word appears as a named entity in one place and as a common noun in another, or when it can refer to different entity types. For example, "Apple" can name a company while "apple" is a fruit, and "Washington" can be a person or a place. In other words, NER has to handle words that have multiple meanings, where the right meaning depends on the sentence they appear in.

NER also finds it very challenging to identify entities when textual resources are scarce, for example when little annotated text exists for a particular language or domain.

Rare words, meaning words that are seldom used these days or that few people have heard of, are another challenging part of this field.

Abbreviations are shortened forms of words. We abbreviate words because it makes writing faster, and as humans we can still understand them from context. In NLP, however, abbreviations are a major problem, because expanding them to their original words is a tedious process that requires labeled mappings. They undeniably affect NER performance, since an abbreviated word may not be classified as its correct entity type, so abbreviations need to be handled properly in the text preprocessing phase.
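A common preprocessing step is to expand known abbreviations before running NER. In the sketch below, the abbreviation table and the `expand_abbreviations` helper are hypothetical. Building such a table for each domain is exactly the tedious labeling work described above, and a blind lookup can still go wrong when one abbreviation has several meanings.

```python
import re

# Hypothetical abbreviation table; real ones are built per domain.
ABBREVIATIONS = {
    "NYC": "New York City",
    "UN": "United Nations",
    "Jan.": "January",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace whole-word abbreviations with their full forms."""
    if not table:
        return text
    # Longest keys first; lookarounds stop "UN" from matching inside "UNICEF"
    # while still allowing keys that end in punctuation, such as "Jan.".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, keys)) + r")(?!\w)")
    return pattern.sub(lambda m: table[m.group()], text)


if __name__ == "__main__":
    print(expand_abbreviations("The UN met in NYC on Jan. 5."))
```

After expansion, an NER model sees "United Nations" and "New York City", which it is far more likely to classify correctly than the bare abbreviations.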

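spaCy is a free, open-source Python library for NLP, and its pretrained pipelines include an NER component. We can therefore use spaCy to develop and run NER models on any available text data according to our needs.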

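You can install the package by using pip: pip install -U spacy

Download a pretrained language model, for example the small English pipeline: python -m spacy download en_core_web_sm

Import spacy and load the model into an object called nlp.

Feed the text data to the spaCy model nlp.

Display the result.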
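The steps above can be sketched as follows. This is a minimal example under our own assumptions, not the original article's code: it assumes spaCy v3 with the en_core_web_sm model, and the sample sentence and the `extract_entities` helper are our own. `spacy.load`, `doc.ents`, and the `start_char`, `end_char` and `label_` attributes are standard spaCy API.

```python
import spacy


def extract_entities(nlp, text):
    """Feed `text` to a spaCy pipeline and return (text, start, end, label) tuples."""
    doc = nlp(text)
    return [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    # Assumes the model was downloaded with: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    text = "Apple is opening a new office in London next March."  # our own example sentence
    for entity_text, start, end, label in extract_entities(nlp, text):
        print(f"{entity_text!r:<20} {label:<8} chars {start}-{end}")
```

Because `extract_entities` accepts any pipeline, you can also try it without downloading a model by building a rule-based one with `spacy.blank("en")` and an `"entity_ruler"` component. The exact entities found depend on the model and its version.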

Here’s the output:

In the output above, the spaCy NER model identified four entities, labeled ORG, GPE, DATE, and EVENT. Each label is one of spaCy’s predefined entity types, and spaCy can print a short description of any label with spacy.explain().
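Rather than memorizing the label set, we can ask spaCy to describe these four types. `spacy.explain()` is part of spaCy and looks labels up in its built-in glossary, so no trained model is needed; the `describe_labels` wrapper is our own, hypothetical name.

```python
import spacy


def describe_labels(labels):
    """Map each entity label to spaCy's glossary description (None if unknown)."""
    return {label: spacy.explain(label) for label in labels}


if __name__ == "__main__":
    for label, description in describe_labels(["ORG", "GPE", "DATE", "EVENT"]).items():
        print(f"{label}: {description}")
```

The same call works for any other label a model produces, such as PERSON or LOC.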

To visualize the named entities within the text itself, we can use displaCy, the visualizer built into spaCy.
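A minimal sketch using `displacy.render` with `style="ent"`, assuming spaCy v3 and the en_core_web_sm model. The `render_entities_html` helper, the sample sentence, and the output file name are our own choices.

```python
import spacy
from spacy import displacy


def render_entities_html(nlp, text, page=False):
    """Return displaCy's entity view of `text` as HTML markup."""
    doc = nlp(text)
    # jupyter=False makes render() return the markup instead of trying to
    # display it inline; page=True wraps it in a complete HTML page.
    return displacy.render(doc, style="ent", page=page, jupyter=False)


if __name__ == "__main__":
    # Assumes en_core_web_sm has been downloaded; the sentence is our own example.
    nlp = spacy.load("en_core_web_sm")
    html = render_entities_html(nlp, "Apple is opening a new office in London next March.", page=True)
    with open("entities.html", "w", encoding="utf-8") as f:
        f.write(html)
    print("Open entities.html in a browser to see the highlighted entities.")
```

In a Jupyter notebook, `displacy.render(doc, style="ent")` displays the highlighted text inline. From a plain script, `displacy.serve(doc, style="ent")` starts a small local web server instead.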

The output

You can try it yourself and see how spaCy does the magic.

In conclusion, NER is a subtask of Natural Language Processing (NLP) that plays a significant role in automated information extraction. The huge amount of textual information now available on the web has made NER increasingly important for extracting useful insights. NER is an active field that will likely keep improving because of its important role in many NLP applications.

In this article, we discussed a few important aspects of NER. We started by explaining the purpose of NER and then looked at how it works by exploring its basic steps: Sentence Segmentation, Word Tokenization, POS Tagging, and Entity Detection. Next, we went through some real-world use cases of NER and the challenges NER applications face. Lastly, we ran an NER model using spaCy, which is a good starting point before building larger NER applications.

I hope you enjoyed reading this article!

Thank you for reading.

Peace! ✌️
