TECH OFFER

A System for Extracting Named Entities in News Articles

KEY INFORMATION

TECHNOLOGY CATEGORY:
Infocomm - Big Data, Data Analytics, Data Mining & Data Visualisation
TECHNOLOGY READINESS LEVEL (TRL):
LOCATION:
Singapore
ID NUMBER:
TO087668

TECHNOLOGY OVERVIEW

This software technology extracts named entities (people, companies, locations) mentioned in news articles and in readers’ comments.

The system implements fast and effective named entity recognition and linking techniques.

It also offers an interactive visualization of the results and of the linking process, showing the associations between the named entities.

TECHNOLOGY FEATURES & SPECIFICATIONS

The software system is developed in Python. It uses the Stanford Named Entity Recognizer (NER) to extract entity mentions from text and classifies each of them into the person, location, organization, or miscellaneous category. It implements the Pair-Linking algorithm to quickly disambiguate each extracted mention to its corresponding profile in Wikipedia. The front-end web service is based on Flask.
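The linking step can be illustrated with a minimal sketch of greedy pair-by-pair disambiguation, in the spirit of Pair-Linking. This is not the system's actual implementation: the function name pair_linking, the candidate lists, and the toy relatedness table are all assumptions made for illustration, and a real system would derive its scores from Wikipedia-based features.

```python
from itertools import combinations

# Toy relatedness between candidate entities (hypothetical values for illustration).
RELATEDNESS = {
    frozenset({"Michael Jordan", "Chicago Bulls"}): 0.9,
    frozenset({"Chicago Bulls", "Chicago"}): 0.8,
    frozenset({"Michael Jordan", "Chicago"}): 0.5,
    frozenset({"Jordan (country)", "Chicago (musical)"}): 0.1,
}


def relatedness(entity_a, entity_b):
    """Return the toy relatedness of two entities (0.0 if unknown)."""
    return RELATEDNESS.get(frozenset({entity_a, entity_b}), 0.0)


def pair_linking(candidates, score=relatedness):
    """Greedily resolve mentions pair by pair, most confident pair first.

    candidates maps each mention to its list of candidate entity titles.
    """
    # Score every pair of candidate assignments that spans two different mentions.
    pairs = []
    for m_a, m_b in combinations(candidates, 2):
        for e_a in candidates[m_a]:
            for e_b in candidates[m_b]:
                pairs.append((score(e_a, e_b), m_a, e_a, m_b, e_b))
    pairs.sort(key=lambda p: p[0], reverse=True)

    assigned = {}
    for _, m_a, e_a, m_b, e_b in pairs:
        # Skip pairs that contradict a decision made for a more confident pair.
        if assigned.get(m_a, e_a) != e_a or assigned.get(m_b, e_b) != e_b:
            continue
        assigned[m_a] = e_a
        assigned[m_b] = e_b
        if len(assigned) == len(candidates):
            break

    # A mention never covered by any pair falls back to its first candidate.
    for mention, cands in candidates.items():
        if mention not in assigned and cands:
            assigned[mention] = cands[0]
    return assigned


candidates = {
    "Jordan": ["Michael Jordan", "Jordan (country)"],
    "Bulls": ["Chicago Bulls", "Bull"],
    "Chicago": ["Chicago", "Chicago (musical)"],
}
links = pair_linking(candidates)
```

In this toy run the most related pair ("Michael Jordan", "Chicago Bulls") is fixed first, and the remaining mention is then resolved consistently with that decision. Resolving pairs in order of confidence avoids scoring every joint assignment of all mentions at once.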

The system’s input and output are described as follows:

Input: The system takes the raw text of news articles and their associated comments as input. Each article and its comments are formatted in JSON, and an example can be found in the folder “core/sample_documents”.
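As a rough illustration of such an input, the sketch below parses one article with its comments and yields each piece of text to be processed. The field names ("id", "content", "comments") and the sample text are assumptions made for this example; the actual schema is the one shown in the sample documents folder.

```python
import json

# Hypothetical input document; the real field names may differ.
SAMPLE_JSON = """
{
  "id": "article-001",
  "content": "Michael Jordan visited Chicago on Monday.",
  "comments": [
    {"id": "c1", "content": "The Bulls were never the same without him."},
    {"id": "c2", "content": "Great to see him back in Chicago."}
  ]
}
"""


def iter_texts(document):
    """Yield (kind, id, text) for the article body and each of its comments."""
    yield ("article", document["id"], document["content"])
    for comment in document.get("comments", []):
        yield ("comment", comment["id"], comment["content"])


document = json.loads(SAMPLE_JSON)
texts = list(iter_texts(document))
```

Keeping the article and its comments in one document lets a downstream linker treat them as a single context.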

Output: The system outputs the entity mentions extracted from each article and comment. Each mention comes with a label (person, location, organization, or miscellaneous) and a link to Wikipedia (if the mention is likely to be associated with an entity profile).
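One way to picture an output record is sketched below. The Mention class, its field names, and the confidence threshold used to decide whether a link is attached are hypothetical; the document does not specify the system's actual output schema or linking criterion.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# The four categories named in the description above.
LABELS = {"person", "location", "organization", "miscellaneous"}


@dataclass
class Mention:
    text: str                            # surface form found in the article or comment
    label: str                           # one of LABELS
    wikipedia_url: Optional[str] = None  # None when no profile is likely enough

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


def wikipedia_url(title):
    """Build an English Wikipedia URL from a page title."""
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def make_mention(text, label, title=None, confidence=0.0, threshold=0.5):
    """Attach a link only when the (assumed) linking confidence is high enough."""
    linked = title is not None and confidence >= threshold
    return Mention(text, label, wikipedia_url(title) if linked else None)


linked = make_mention("Jordan", "person", "Michael Jordan", confidence=0.9)
unlinked = make_mention("Jordan", "person", "Michael Jordan", confidence=0.2)
```

Here the first mention carries a Wikipedia link, while the second keeps only its label because its confidence falls below the assumed threshold.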

POTENTIAL APPLICATIONS

Entity extraction is usually the first step in analyzing text documents.

It enables a wide range of applications and use cases, such as resolving a person’s identity for government security and fraud detection, tracking customer sentiment around products and companies, and providing targeted search for content publishers and recommendation engines.

Market Trends & Opportunities

People interact with digital news on a daily basis, which creates a huge source of information for mining.

However, existing systems have their own limitations, and they are not specialized for news articles and user comments as this software system is.

Benefits

  • Unlike existing systems on the market, this software system focuses specifically on extracting entities from news articles and user comments.
  • It collectively links concepts across articles and comments for better accuracy.
  • The visualization in the software system is unique and it provides customers with an interactive view of the extraction results.
  • The whole system is modularized into smaller sub-components that can be modified and maintained easily.