Machine Learning (ML) and Deep Learning (DL) have been the primary growth drivers of Artificial Intelligence (AI) and have seen widespread adoption in areas such as Computer Vision, Speech Processing, Natural Language Processing, and Graph Search, among many others. AI is also well known to both consume and produce large amounts of data. However, traditional data repositories have not scaled effectively to the large volumes of vector representations that are common in AI applications. Because they are optimised for hash-based lookups, searching them for similarities across high-dimensional vectors is inefficient. Vector databases were developed to overcome these limitations of hash-based search and to scale similarity search to large datasets.
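The sketch below makes the scaling problem concrete. It performs exact (brute-force) similarity search using cosine similarity, one common similarity measure. It is an illustration only and is not taken from this technology, and the function names are hypothetical. Every query scores all N stored vectors of dimension d, so the cost per query grows as O(N × d). That cost becomes impractical at billion scale without an index.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Exact k-nearest-neighbour search (hypothetical helper).

    Scores every stored vector, so the work per query is O(N * d).
    Returns (index, similarity) pairs, most similar first.
    """
    scored = ((i, cosine_similarity(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    print([i for i, _ in brute_force_top_k([1.0, 0.1], vectors, k=2)])  # [0, 2]
```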
This technology offer is a unified Online Analytical Processing (OLAP) data platform that supports approximate vector search, enabling efficient search over billion-scale structured and vector data. The data engine simplifies the building of enterprise-level AI applications such as search and recommendation systems, video analytics, text-based search, and chatbots, and accelerates the development of production-ready systems. Developers no longer need complicated scripts to query vector data. Vector indexing methods and an extended Structured Query Language (SQL) syntax provide low-latency, high-performance search over both structured and vector data.
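Approximate vector search trades a small amount of accuracy for speed by scanning only part of the data. The source does not state which indexing methods this engine uses. The sketch below therefore shows a toy inverted-file (IVF) index, one widely used family of approximate methods. The class ToyIVFIndex and its parameters (n_lists, nprobe) are hypothetical. Vectors are bucketed by their nearest centroid, and a query scans only the nprobe buckets whose centroids are closest to it.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (same ranking as L2, no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point: Vector, centroids: Sequence[Vector]) -> int:
    return min(range(len(centroids)), key=lambda c: squared_l2(point, centroids[c]))


class ToyIVFIndex:
    """Hypothetical inverted-file index, for illustration only."""

    def __init__(self, n_lists: int, n_iter: int = 5, seed: int = 0) -> None:
        self.n_lists = n_lists
        self.n_iter = n_iter
        self.rng = random.Random(seed)
        self.centroids: List[List[float]] = []
        self.lists: Dict[int, List[Tuple[int, Vector]]] = {}

    def build(self, vectors: List[Vector]) -> None:
        # Coarse quantiser: a few rounds of Lloyd's k-means from random seeds.
        self.centroids = [list(v) for v in self.rng.sample(vectors, self.n_lists)]
        for _ in range(self.n_iter):
            groups: Dict[int, List[Vector]] = {c: [] for c in range(self.n_lists)}
            for v in vectors:
                groups[nearest_centroid(v, self.centroids)].append(v)
            for c, members in groups.items():
                if members:  # an empty bucket keeps its previous centroid
                    self.centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Inverted lists: each vector id is stored under its nearest centroid.
        self.lists = {c: [] for c in range(self.n_lists)}
        for i, v in enumerate(vectors):
            self.lists[nearest_centroid(v, self.centroids)].append((i, v))

    def search(self, query: Vector, k: int, nprobe: int = 1) -> List[Tuple[int, float]]:
        # Probe only the nprobe buckets whose centroids are closest to the query.
        order = sorted(range(self.n_lists), key=lambda c: squared_l2(query, self.centroids[c]))
        candidates = [
            (i, squared_l2(query, v)) for c in order[:nprobe] for i, v in self.lists[c]
        ]
        return sorted(candidates, key=lambda item: item[1])[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random(), rng.random()] for _ in range(1000)]
    index = ToyIVFIndex(n_lists=16)
    index.build(data)
    print(index.search([0.5, 0.5], k=3, nprobe=2))
```

Raising nprobe scans more buckets and improves recall at the cost of latency. When nprobe equals n_lists, the search becomes exhaustive and returns the exact result.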
The technology is a purpose-built OLAP database that runs on CPUs only and includes a built-in vector query engine that uses extended SQL statements for data querying. Supported data include structured data (tabular text, numbers, dates, and times) and unstructured data (images, video, and audio) that have been converted into vector representations. The technology enables high-performance joint queries, so that labels, text, and numbers can be queried together with vector data within a single SQL statement. It supports SQL + vector searches over billion-scale data with a latency of 200 milliseconds at a throughput of 200 queries per second (QPS).
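Conceptually, a joint query combines a filter on structured columns with a ranking by vector distance. The source does not show the extended SQL syntax, so the sketch below expresses the same idea in plain Python rather than guessing at it. The Row fields (label, price, embedding) and the hybrid_query function are hypothetical. Because the filter and the ranking are evaluated together, every returned row satisfies the predicate.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Row:
    """Hypothetical record holding structured columns plus an embedding."""
    id: int
    label: str
    price: float
    embedding: Sequence[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_query(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    query: Sequence[float],
    k: int,
) -> List[Row]:
    """Keep rows where the predicate holds, order them by distance to the
    query vector, and return at most k of them."""
    matching = [r for r in rows if predicate(r)]
    matching.sort(key=lambda r: l2_distance(r.embedding, query))
    return matching[:k]


if __name__ == "__main__":
    rows = [
        Row(1, "car", 12000.0, [0.9, 0.1]),
        Row(2, "car", 30000.0, [0.8, 0.2]),
        Row(3, "flower", 5.0, [0.9, 0.1]),
        Row(4, "car", 15000.0, [0.1, 0.9]),
    ]
    hits = hybrid_query(rows, lambda r: r.label == "car" and r.price < 20000, [1.0, 0.0], k=2)
    print([r.id for r in hits])  # [1, 4]
```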
The key features of this technology are as follows:
The following similarity metrics are currently supported:
The following indexing libraries are currently supported:
The following interfaces are available for developer integration:
This technology can be applied to similarity search (identifying similar high-dimensional vectors) or classification (locating images that contain a certain element, e.g. a car or a flower). The following potential applications of this technology have also been tested:
Compared with existing techniques, this technology provides a single, unified pipeline for querying vector data, without the need to store structured data and vectors separately in traditional (SQL) databases and vector repositories. This removes the need to merge results from standard database engines (which are optimised for hash-based searches) with those from vector databases. The data engine includes a vector search function and can efficiently store, index, and manage vectors generated by deep learning networks and machine learning models. In addition, its extended SQL query syntax enables efficient, simplified search across a wide range of AI applications.
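A toy example shows one pitfall of merging results from two separate systems. In the sketch below (all names hypothetical), a standalone vector store returns the global top-k nearest ids without knowing the structured filter. The application then intersects those ids with the ids that pass a SQL-style filter. When most of the nearest vectors fail the filter, the merged result can contain fewer than k rows, or none. Evaluating the filter and the ranking together, as a single engine can, avoids this. Real deployments often mitigate the problem by over-fetching candidates, which this sketch omits.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_system_query(
    records: List[Record], predicate: Callable[[Record], bool], query: Sequence[float], k: int
) -> List[int]:
    # Vector store: global top-k by distance, unaware of the predicate.
    ranked = sorted(records, key=lambda r: distance(r["vec"], query))
    candidate_ids = [r["id"] for r in ranked[:k]]
    # Relational store: ids that satisfy the structured predicate.
    allowed = {r["id"] for r in records if predicate(r)}
    # Application-side merge: intersect, keeping the vector-rank order.
    return [i for i in candidate_ids if i in allowed]


def unified_query(
    records: List[Record], predicate: Callable[[Record], bool], query: Sequence[float], k: int
) -> List[int]:
    # Filter first, then rank only the matching records.
    matching = [r for r in records if predicate(r)]
    matching.sort(key=lambda r: distance(r["vec"], query))
    return [r["id"] for r in matching[:k]]


if __name__ == "__main__":
    records = [
        {"id": 1, "label": "flower", "vec": [1.0, 0.0]},
        {"id": 2, "label": "flower", "vec": [0.9, 0.1]},
        {"id": 3, "label": "car", "vec": [0.5, 0.5]},
        {"id": 4, "label": "car", "vec": [0.0, 1.0]},
    ]
    is_car = lambda r: r["label"] == "car"
    print(two_system_query(records, is_car, [1.0, 0.0], k=2))  # []
    print(unified_query(records, is_car, [1.0, 0.0], k=2))  # [3, 4]
```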
The technology owner is keen to collaborate with companies that are conducting in-house AI application development in industries such as, but not limited to, e-commerce, video analytics, smart city, and healthcare.
The following is an example of how the vector search engine can be used to query for similar logos (images):