Conventional big data analytics systems (e.g., MapReduce, Dryad, Spark) were originally designed for offline, batch-based processing: all data must be available in advance and is processed as a whole. However, data is often generated continuously and needs to be processed in real time. Network traffic data in telecommunication environments is one example.
The team has developed a novel system for online, distributed stream processing of big data. It provides a high-performance, fault-tolerant, general-purpose analytics platform for applications such as data synopsis, stream database queries, and online machine learning. Its intended users are the telecommunication and big data analytics industries and IT service operators.
The technology could be used for network measurement (e.g., anomaly detection, flow size distribution, failure diagnosis) and for data mining and machine learning (e.g., frequent pattern mining, classification, regression, prediction). One application is preventive maintenance of heavy-traffic servers, where abnormal or specific patterns can be identified to detect a potential failure early.
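To make the online processing model concrete, the following is a minimal, single-process sketch of streaming anomaly detection. It is not the team's system and says nothing about its distributed or fault-tolerant design. It only illustrates the idea of scoring each record as it arrives, without storing the stream. The class name `OnlineAnomalyDetector`, the function `detect`, the threshold, and the warm-up length are all hypothetical choices made for this illustration. The running statistics use Welford's online algorithm.

```python
import math


class OnlineAnomalyDetector:
    """Flags values far from the running mean (hypothetical illustration).

    Mean and variance are maintained with Welford's online algorithm,
    so each record is seen once and no history is kept in memory.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.warmup = warmup        # assumed number of records before scoring starts
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, value):
        """Score `value` against past records, then fold it into the statistics."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                is_anomaly = True

        # Welford update. For simplicity, anomalous values are folded in too.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


def detect(stream, **kwargs):
    """Yield (index, value) for each record flagged as anomalous."""
    detector = OnlineAnomalyDetector(**kwargs)
    for index, value in enumerate(stream):
        if detector.update(value):
            yield index, value


if __name__ == "__main__":
    # Made-up per-second request counts for one server, with a single spike.
    traffic = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 500, 100]
    for index, value in detect(traffic):
        print(f"anomaly at record {index}: {value}")
```

Because `detect` consumes any iterable lazily, the same code works on an unbounded generator, such as one reading from a socket, which is the property that separates this style from batch processing.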