
Identify Person, Place and Organisation in content using Python

This article outlines the concept of Named Entity Recognition and its Python implementation using StanfordNERTagger. Technical challenges such as installation problems, version conflicts and operating-system-specific issues, which are very common in this kind of analysis, are out of scope for this article.

NER NLP using Python

Table of contents:

1. Named Entity Recognition Defined
2. Business Use Cases
3. Installation Prerequisites
4. Python Code for Implementation
5. Additional Reading: the CRF model and the multiple models available in the package
6. Disclaimer

1. Named Entity Recognition Defined
Named Entity Recognition (NER) is the process of detecting proper names mentioned in a text and classifying them into predefined categories. In simple words, it locates person names, organisations, locations, etc. in the content. This is generally the first step in most Information Extraction (IE) tasks in Natural Language Processing.
NER Sample

2. Business Use Cases

There is a need for NER across multiple domains. Below are a few sample business use cases for your reference.
  1. Investment research: To identify company announcements, people's reactions to them and their impact on stock prices, one needs to recognise person and organisation names in the text
  2. Chat-bots in multiple domains: To identify places and dates for booking hotel rooms, air tickets etc.
  3. Insurance domain: Identify and mask people's names in feedback forms before analysing them. This is needed to stay regulatory compliant (for example, HIPAA); a minimal masking sketch follows this list
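
For use case 3 above, the masking step itself is straightforward once the tokens are tagged. Below is a minimal sketch that replaces PERSON tokens with a placeholder; the mask_persons helper and the [REDACTED] string are illustrative choices, not part of the Stanford package, and the input format matches the tagged output shown in section 4.

#Minimal masking sketch: replace tokens tagged as PERSON with a placeholder
def mask_persons(tagged_tokens, placeholder="[REDACTED]"):
    return " ".join(placeholder if tag == "PERSON" else token
                    for token, tag in tagged_tokens)

#Example input in the same (token, tag) format produced by the tagger
sample = [("srinivas", "PERSON"), ("ramanujan", "PERSON"),
          ("lives", "O"), ("in", "O"), ("london", "LOCATION")]
print(mask_persons(sample))
#Output: [REDACTED] [REDACTED] lives in london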

3. Installation Prerequisites
1. Download the Stanford NER package (the code below uses the stanford-ner-2015-04-20 release) from the Stanford NLP website
2. Unzip the zipped folder and save it in a drive
3. Copy "stanford-ner.jar" from the folder and save it just outside the folder, as shown in the image
4. Download the caseless models from https://stanfordnlp.github.io/CoreNLP/history.html by clicking on "caseless" as shown below. The models in the first link work as well, but the caseless models help in identifying named entities even when they are not capitalised as required by formal grammar rules
5. Save the models folder in the same location as the Stanford NER folder for ease of access (a quick check of these paths is sketched below)
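
Before moving on, it can help to confirm that the saved files are where the code in section 4 expects them. The snippet below is an optional sanity check, not part of the Stanford setup; the paths are placeholders that mirror the ones used later and should be replaced with your own locations.

#Optional check that the jar, the models folder and the java executable exist
import os

ner_jar = "<path to the file>/stanford-ner-2015-04-20/stanford-ner.jar"
model_dir = "<path to the file>/stanford-corenlp-caseless-2015-04-20-models/edu/stanford/nlp/models/ner"
java_exe = "C:/Program Files/Java/jdk1.8.0_161/bin/java.exe"

for label, path in [("NER jar", ner_jar), ("models folder", model_dir), ("java", java_exe)]:
    print("{0}: {1}".format(label, "found" if os.path.exists(path) else "NOT found"))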
Stanford NER Installation - Step 1

NER Installation - Step 2


4. Python Code for Implementation
#Import all the required libraries.
import os
from nltk.tag import StanfordNERTagger
import pandas as pd

#Set environment variables programmatically.
#Set the classpath to the path where the jar file is located
os.environ['CLASSPATH'] = "<path to the file>/stanford-ner-2015-04-20/stanford-ner.jar"

#Set the Stanford models to the path where the models are stored
os.environ['STANFORD_MODELS'] = '<path to the file>/stanford-corenlp-caseless-2015-04-20-models/edu/stanford/nlp/models/ner'

#Set the java jdk path
java_path = "C:/Program Files/Java/jdk1.8.0_161/bin/java.exe"
os.environ['JAVAHOME'] = java_path


#Set the path to the model that you would like to use
stanford_classifier = '<path to the file>/stanford-corenlp-caseless-2015-04-20-models/edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz'

#Build NER tagger object
st = StanfordNERTagger(stanford_classifier)

#A sample text for NER tagging
text = 'srinivas ramanujan went to the united kingdom. There he studied at cambridge university.'

#Tag the sentence and print output
tagged = st.tag(text.split())
print(tagged)

Output
[(u'srinivas', u'PERSON'), 
(u'ramanujan', u'PERSON'),
(u'went', u'O'),
(u'to', u'O'),
(u'the', u'O'),
(u'united', u'LOCATION'),
(u'kingdom.', u'LOCATION'),
(u'There', u'O'),
(u'he', u'O'),
(u'studied', u'O'),
(u'at', u'O'),
(u'cambridge', u'ORGANIZATION'),
(u'university', u'ORGANIZATION')]

5. Additional Reading

The StanfordNER algorithm leverages a general implementation of linear chain Conditional Random Field (CRF) sequence models. CRFs may look very similar to Hidden Markov Models (HMMs) but are quite different.

Below are some key points to note about the CRFs in general.
  1. A CRF is a discriminative model, unlike the HMM, and therefore directly models the conditional probability of the labels given the words
  2. It does not assume independence of features, unlike the HMM. This means the current word, the previous word and the next word can all be used as features for the model (see the sketch after this list)
  3. Relative to HMMs and Maximum Entropy Markov Models, CRFs are the slowest to train
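
StanfordNER itself is a Java implementation of the CRF, so there is nothing to code here on the Python side. Purely to illustrate point 2 above, the sketch below uses the separate sklearn-crfsuite package to train a toy linear chain CRF whose features include the current, previous and next word; the single training sentence and its labels are made up for illustration.

#Toy linear chain CRF with overlapping word features (requires sklearn-crfsuite)
from sklearn_crfsuite import CRF

def token_features(sentence, i):
    features = {'word': sentence[i].lower()}
    if i > 0:
        features['prev_word'] = sentence[i - 1].lower()
    if i < len(sentence) - 1:
        features['next_word'] = sentence[i + 1].lower()
    return features

#One made-up sentence with token-level entity labels
sentence = ['srinivas', 'ramanujan', 'studied', 'at', 'cambridge', 'university']
labels = ['PERSON', 'PERSON', 'O', 'O', 'ORGANIZATION', 'ORGANIZATION']

X = [[token_features(sentence, i) for i in range(len(sentence))]]
y = [labels]

crf = CRF(algorithm='lbfgs', max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))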

6. Disclaimer
This article explains the implementation of the StanfordNER algorithm for research purposes and does not promote it for commercial gain. For any questions on the commercial aspects of implementing this algorithm, please contact Stanford University.
