Speech Classification and Localisation

Technology Overview

Video surveillance has become widespread in recent years. These surveillance systems generally do not incorporate acoustic modules because of noise and interference problems. Our team has developed a technology that discriminates speech signals from noise and interference. This allows one to detect and localise sound sources, and to classify sounds such as screaming and glass breaking.


This technology addresses the problems of acoustic source localisation, separation from other interfering sources, and enhancement in a cocktail-party environment, in both enclosed and open environments. It consists of a microphone array with an aperture comparable in size to a sheet of A4 paper. A video sub-module also allows the operator to indicate the area of interest to the system. As such, it is well suited to applications in:

1. Defence, security and surveillance agencies

2. Research agencies

3. Corporate conference rooms, press conferences

4. Robotics

Technology Features & Specifications

The technology comprises a microphone array of 308 digital MEMS microphones, with a video camera integrated at the centre of the array. The overall size of the array is approximately A4. Data acquisition is performed using off-the-shelf devices and is integrated into a PC-based platform.
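
To give a feel for what an A4-sized aperture with 308 elements implies, the sketch below lays the microphones out on a uniform grid and estimates the frequency above which spatial aliasing would set in. The 14 x 22 grid, the function names and the speed of sound are assumptions made for illustration only; the actual element layout is not stated here, and the centre-mounted camera is ignored.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def grid_positions(rows, cols, width_m, height_m):
    """Return (x, y) coordinates of a uniform grid centred on the origin."""
    dx = width_m / (cols - 1)
    dy = height_m / (rows - 1)
    return [
        (c * dx - width_m / 2, r * dy - height_m / 2)
        for r in range(rows)
        for c in range(cols)
    ]


def aliasing_limit_hz(spacing_m):
    """Highest frequency free of spatial aliasing (half-wavelength spacing rule)."""
    return SPEED_OF_SOUND / (2 * spacing_m)


# Hypothetical layout: 14 x 22 = 308 elements filling an A4 sheet (0.210 m x 0.297 m).
A4_WIDTH_M, A4_HEIGHT_M = 0.210, 0.297
mics = grid_positions(rows=22, cols=14, width_m=A4_WIDTH_M, height_m=A4_HEIGHT_M)

# The wider of the two grid spacings sets the more restrictive aliasing limit.
spacing = max(A4_WIDTH_M / 13, A4_HEIGHT_M / 21)
print(len(mics), "microphones, spacing", round(spacing * 1000, 1), "mm")
print("aliasing limit about", round(aliasing_limit_hz(spacing)), "Hz")
```

Under these assumptions the element spacing is on the order of 15 mm, so the array could be steered without grating lobes across the whole speech band.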


This platform lets users select, via the video display, the location they would like the microphone array to zoom into for acoustic surveillance/eavesdropping. The system can also track the location of any speech source while rejecting non-speech interference. This is useful for applications where determining the location of a person talking is of paramount importance.
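
The "zoom in" and localisation behaviour described above can be illustrated with delay-and-sum beamforming, a textbook technique for steering a microphone array. The sketch below is not the method used in this technology, which is not disclosed here. It assumes a hypothetical 8-element line array, a far-field source, a synthetic 2 kHz tone as input, and nearest-sample alignment; all function names and parameters are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_delays(mic_x, azimuth_deg):
    """Far-field arrival delay (s) at each microphone of a line array along x,
    relative to the microphone that hears the wavefront first."""
    cos_az = math.cos(math.radians(azimuth_deg))
    raw = [-x * cos_az / SPEED_OF_SOUND for x in mic_x]
    earliest = min(raw)
    return [t - earliest for t in raw]


def simulate_tone(mic_x, azimuth_deg, fs, freq_hz, n_samples):
    """Synthetic test input: a pure tone arriving as a plane wave."""
    delays = arrival_delays(mic_x, azimuth_deg)
    return [
        [math.sin(2 * math.pi * freq_hz * (n / fs - d)) for n in range(n_samples)]
        for d in delays
    ]


def delay_and_sum(signals, mic_x, azimuth_deg, fs):
    """Advance each channel by its steering delay (nearest sample) and average."""
    shifts = [round(d * fs) for d in arrival_delays(mic_x, azimuth_deg)]
    n_out = min(len(s) - k for s, k in zip(signals, shifts))
    return [
        sum(s[n + k] for s, k in zip(signals, shifts)) / len(signals)
        for n in range(n_out)
    ]


def mean_power(samples):
    """Average power of a block of samples."""
    return sum(v * v for v in samples) / len(samples)


def localise_azimuth(signals, mic_x, fs, candidates):
    """Return the candidate azimuth whose steered output has the most power."""
    return max(
        candidates,
        key=lambda az: mean_power(delay_and_sum(signals, mic_x, az, fs)),
    )


# Hypothetical set-up: 8 microphones 3 cm apart, source at 60 degrees azimuth.
mic_x = [i * 0.03 for i in range(8)]
fs = 48000
signals = simulate_tone(mic_x, azimuth_deg=60, fs=fs, freq_hz=2000, n_samples=2048)

on_target = mean_power(delay_and_sum(signals, mic_x, 60, fs))
off_target = mean_power(delay_and_sum(signals, mic_x, 120, fs))
estimate = localise_azimuth(signals, mic_x, fs, range(0, 181, 5))
print("on-target power:", on_target)
print("off-target power:", off_target)
print("estimated azimuth (degrees):", estimate)
```

Steering towards the true direction adds the channels coherently, while steering elsewhere lets them partially cancel, which is what allows an operator-selected direction to be emphasised. Rejecting non-speech sources would need an additional speech/non-speech decision stage, which this sketch does not attempt.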

Potential Applications

Potential applications of the technology include:

1. Teleconferencing systems.

2. Source localisation for hands-free telephony.

3. Speech enhancement and speaker recognition applications.

4. Keyword spotting.

5. Classification of sounds.

Market Trends and Opportunities

The market size cannot be determined at the moment. As it stands, the IP would be very attractive to government agencies that have started installing cameras country-wide, both in Singapore and overseas. These surveillance solutions do not, in general, come with acoustic/sound capability. The proposed solution can be incorporated into existing systems with minimal intervention.

Customer Benefits

Existing microphone arrays need rather large dimensions to achieve good results. The benefit of the proposed system is that it is A4 size or smaller and can achieve comparable performance.

In addition, existing systems do not discriminate speech sources from interferers; that is, they localise any sound in the environment. The proposed system allows users to localise only speech sources.

