Topic Dictionary and Topic Modelling, English and Chinese Language (APIs/SaaS)


Infocomm - Big Data, Data Analytics, Data Mining & Data Visualisation
Infocomm - Data Processing


We are a text analysis startup based in one of Singapore's national universities, specialising in advanced topic-modelling systems.

Our topic modelling databases are built from web search results rather than from mathematical word-proximity scores. Researchers doing text analysis currently rely on statistical formulae based on how words occur together; the principal method is latent Dirichlet allocation (LDA). Our technology instead compares the themes in your document with concept maps crowdsourced via web search results. By using the internet hive-mind as a giant categorisation machine, rather than relying on complex statistical calculations, our system produces far more human-like results than competing systems.
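
The general idea of dictionary-based topic tagging, as opposed to fitting a statistical model such as LDA, can be illustrated with a minimal sketch. It assumes a small, hand-written topic dictionary that maps terms to topics; the real dictionary is built from web search results, and its format and scoring method are not described here, so `TOPIC_DICTIONARY`, `tokenise` and `score_topics` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Hypothetical topic dictionary: each term maps to one or more topics.
# The product described above builds its dictionary from web search
# results; this one is hand-written for illustration only.
TOPIC_DICTIONARY: dict[str, list[str]] = {
    "actor": ["film"],
    "film": ["film"],
    "hollywood": ["film"],
    "sheriff": ["law enforcement"],
    "marshal": ["law enforcement"],
    "gunfight": ["law enforcement", "old west"],
    "tombstone": ["old west"],
}


def tokenise(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_topics(text: str, dictionary: dict[str, list[str]]) -> Counter:
    """Count how many tokens in the text point to each topic.

    Each token costs a single dictionary lookup, so the work grows
    linearly with document length and no model has to be trained.
    """
    scores: Counter = Counter()
    for token in tokenise(text):
        for topic in dictionary.get(token, []):
            scores[topic] += 1
    return scores


if __name__ == "__main__":
    doc = ("Wyatt Earp served as a deputy marshal and took part in the "
           "gunfight at the O.K. Corral in Tombstone.")
    print(score_topics(doc, TOPIC_DICTIONARY).most_common())
```

Because scoring is a lookup rather than a training step, editing the dictionary (adding a term, or moving it to another topic) changes the output on the next run without any retraining.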

You can see the difference in the example projects above, made using two Wikipedia articles: those on Bruce Willis and Wyatt Earp. The LDA model took our researcher around a day and a half to code and test, whereas our own word clouds took just three seconds to generate. And while the LDA model is static, our results can be edited, even by non-technical users.

If you would like a free demo, please get in touch!


We can provide:

  • English Language - Specific
    • English language tokeniser
    • English language tokeniser with common phrase identification
    • English language topic modelling
    • English language topic dictionary
    • English language sentiment analysis
  • Chinese Language - Specific
    • Chinese language tokeniser
    • Chinese language tokeniser with common phrase identification
    • Chinese language topic modelling
    • Chinese language topic dictionary
    • Chinese language sentiment analysis
  • Topic modelling, visualisation and text analysis SaaS platforms
  • Custom algorithms and APIs (enquire for details)
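
The tokenisers with common phrase identification are not described in detail. One common way to identify known multi-word phrases is greedy forward maximum matching against a phrase list, and the same approach works for Chinese if characters rather than space-separated words are treated as the units. The following is a minimal sketch under that assumption; the phrase lists and function names are hypothetical, and the actual tokenisers may work quite differently.

```python
import re

# Hypothetical phrase lists; a production tokeniser would use a much
# larger, curated set of common phrases.
ENGLISH_PHRASES: set[tuple[str, ...]] = {
    ("customer", "service"),
    ("new", "york"),
    ("new", "york", "city"),
}
CHINESE_PHRASES: set[tuple[str, ...]] = {tuple("客户服务"), tuple("新加坡")}


def match_phrases(units: list[str], phrases: set[tuple[str, ...]],
                  joiner: str) -> list[str]:
    """Greedy forward maximum matching.

    At each position, take the longest known phrase that starts there;
    if none matches, emit the single unit on its own.
    """
    max_len = max((len(p) for p in phrases), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(units):
        # Try the longest candidate first, falling back to one unit.
        for n in range(min(max_len, len(units) - i), 0, -1):
            candidate = tuple(units[i:i + n])
            if n == 1 or candidate in phrases:
                tokens.append(joiner.join(candidate))
                i += n
                break
    return tokens


def tokenise_english(text: str) -> list[str]:
    """Split on words, then merge known multi-word phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    return match_phrases(words, ENGLISH_PHRASES, " ")


def tokenise_chinese(text: str) -> list[str]:
    """Chinese is written without spaces, so each character is a unit."""
    chars = [c for c in text if not c.isspace()]
    return match_phrases(chars, CHINESE_PHRASES, "")


if __name__ == "__main__":
    print(tokenise_english("New York City has great customer service"))
    print(tokenise_chinese("新加坡的客户服务"))
```

Greedy matching is fast but can mis-segment ambiguous text; statistical segmenters are the usual alternative when a phrase list alone is not enough.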


Our clients have used our software to analyse and predict financial trends from news data, to process survey and focus-group responses, and to mine customer complaints data for underlying trends. Non-technical users can operate the software easily, and the APIs can be integrated into your own in-house systems to improve their output. Industries and fields that can benefit from accurate, customisable topic and keyword identification include:

  • Polling, surveys and market research
  • CRM
  • Academia
  • Marketing 
  • Data analytics platforms
  • Social media analysis
  • SEO
  • Law

Market Trends & Opportunities

Many data analytics platforms are available online, and more and more companies are building their own in-house software to handle the data their activities generate. We are not proposing to replace these systems, but rather to upgrade a single component. For example, most online survey platforms currently offer only hand-tagging for analysing open-ended responses, a process that usually takes days or weeks; our system takes just a few seconds. These platforms have over a billion users worldwide, concentrated in the English and Chinese language markets.

At minimal cost, their existing text analysis modules can be replaced with our topic dictionary API, producing far better results much faster. Because our dictionary produces more human-like categorisations than competing systems, it gives researchers a more accurate perspective on datasets too large for any human to analyse, while its customisation options let even non-technical users adapt its models to their precise needs. No other algorithm on the market offers these features.

If you have text data to analyse, or even a text data analytics platform, the chances are that you are already using either a hand-tagging system or a computational linguistics algorithm. Our API can provide massively improved speed, accuracy and customisability for your users at minimal cost. If you have text analysis systems that you are looking to improve, or you are simply looking to build one, get in touch!


Accuracy: Because our topic database is crowdsourced via web search results, it produces more human-like categorisations than competing systems.

Speed: Our method keeps the number of calculations, and thus the amount of processing power required, to a minimum, making it fast to use.

Customisability: No AI will ever produce a perfect analysis, simply because it can't know which topics interest you. Our system allows users with no programming skills to edit the model that the computer produces, adapting it to their individual needs.
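
The source does not say how user edits to the generated model are stored. One simple design, sketched below purely as an assumption, keeps user edits in a separate layer on top of the generated term-to-topic mapping, so a non-technical front end only has to record actions such as "move this term to that topic" or "ignore this term". The class and method names are hypothetical, and each term maps to a single topic for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class EditableTopicModel:
    """A generated term-to-topic mapping plus a separate layer of user edits.

    Keeping edits apart from the generated mapping means they can be
    reviewed, undone, or reapplied if the underlying model is regenerated.
    """
    generated: dict[str, str]
    added: dict[str, str] = field(default_factory=dict)
    removed: set[str] = field(default_factory=set)

    def assign(self, term: str, topic: str) -> None:
        """Move a term to a user-chosen topic, or add a new term."""
        self.added[term] = topic
        self.removed.discard(term)

    def drop(self, term: str) -> None:
        """Exclude a term the user considers irrelevant."""
        self.removed.add(term)
        self.added.pop(term, None)

    def effective(self) -> dict[str, str]:
        """The mapping actually used for analysis: generated plus edits."""
        merged = {**self.generated, **self.added}
        return {t: topic for t, topic in merged.items() if t not in self.removed}


if __name__ == "__main__":
    model = EditableTopicModel(
        generated={"delay": "delivery", "refund": "billing", "app": "technology"}
    )
    model.assign("refund", "customer service")  # user re-files a term
    model.drop("app")                           # user ignores a term
    print(model.effective())
```

The generated mapping itself is never modified, so regenerating it from fresh data leaves the user's edits intact.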

Easy integration: Whether via our API or our online SaaS options, our system integrates easily with your existing analytics tools.
