A NEW APPROACH TO REAL-TIME ANALYTICS FOR STREAMING DATA AT CLOUD SCALE

Designed to take full advantage of low-cost commodity cloud hardware and deliver high-performance real-time analytics:

  • Future-proof existing data warehouses by scaling horizontally on inexpensive commodity hardware without downtime
  • Easily profile, classify and integrate new data sources into existing models for analysis
  • Distributed computing (in-memory processing, distributed storage, parallel execution) with massively parallel processing on Big Data
  • Linear scalability for large numbers of users and projects with complex data models
  • Full integration with R, Python, Scala, RStudio and other data science tools

Cloud Execution Schema

Distributed Repository

Column-based approach

A Cluster is a set of columns that share a common aggregation level and constraints.

Each Cluster is split horizontally into Partitions, and within each Partition the columns are split into Chunks.

Partitions are organized into Allocation Units, which define how data is distributed over computing nodes (to optimize parallel processing and ensure redundancy for high availability).
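
As a rough illustration of this layout, the Python sketch below splits a small set of columns into row-range Partitions with one Chunk per column, then groups the Partitions into Allocation Units and places each unit on more than one node. The function names (`split_cluster`, `assign_allocation_units`), the `replicas` parameter and the round-robin placement policy are assumptions made for the example; the actual placement strategy is not described here.

```python
from typing import Dict, List, Tuple

Cluster = Dict[str, List[int]]  # column name -> column values


def split_cluster(cluster: Cluster, rows_per_partition: int) -> List[Cluster]:
    """Split a cluster horizontally into partitions.

    Each partition maps a column name to that column's chunk
    (the slice of the column falling inside the partition's row range).
    """
    n_rows = len(next(iter(cluster.values())))
    partitions = []
    for start in range(0, n_rows, rows_per_partition):
        end = start + rows_per_partition
        partitions.append({name: values[start:end] for name, values in cluster.items()})
    return partitions


def assign_allocation_units(
    n_partitions: int,
    partitions_per_unit: int,
    nodes: List[str],
    replicas: int = 2,
) -> Dict[int, Tuple[List[int], List[str]]]:
    """Group partitions into allocation units and place each unit on
    `replicas` distinct nodes (round-robin; a hypothetical policy)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    unit_count = (n_partitions + partitions_per_unit - 1) // partitions_per_unit
    placement = {}
    for unit in range(unit_count):
        first = unit * partitions_per_unit
        members = list(range(first, min(first + partitions_per_unit, n_partitions)))
        owners = [nodes[(unit + r) % len(nodes)] for r in range(replicas)]
        placement[unit] = (members, owners)
    return placement


cluster = {"amount": list(range(10)), "qty": list(range(10, 20))}
partitions = split_cluster(cluster, rows_per_partition=4)
placement = assign_allocation_units(len(partitions), 2, ["node-a", "node-b", "node-c"])
```

With ten rows and four rows per partition, this yields three partitions grouped into two allocation units, each held by two nodes, so the loss of a single node leaves every unit readable.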

Each data Chunk can exist in several versions concurrently. This allows reports to be time-shifted, so that data can be viewed as it appeared at different points in time.
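
One way to picture concurrent chunk versions is an append-only list of timestamped versions with an "as of" lookup. The sketch below is a minimal illustration under that assumption; the class name `VersionedChunk`, its methods and the use of integer timestamps are hypothetical and do not describe the product's internal format.

```python
import bisect
from typing import Any, List


class VersionedChunk:
    """Keeps every version of a chunk, ordered by write timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._versions: List[Any] = []

    def write(self, timestamp: int, data: Any) -> None:
        """Append a new version; older versions stay readable."""
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._timestamps.append(timestamp)
        self._versions.append(data)

    def read_as_of(self, timestamp: int) -> Any:
        """Return the latest version written at or before `timestamp`."""
        idx = bisect.bisect_right(self._timestamps, timestamp)
        if idx == 0:
            raise KeyError("no version exists at or before this time")
        return self._versions[idx - 1]


chunk = VersionedChunk()
chunk.write(100, [1, 2, 3])
chunk.write(200, [1, 2, 4])
earlier_view = chunk.read_as_of(150)   # the data as a report at t=150 would see it
current_view = chunk.read_as_of(200)
```

A report that is time-shifted to an earlier moment simply passes that moment to `read_as_of`, while current reports read the newest version.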

Chunks that contain the same data share a common physical data space.
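
Sharing physical space between identical chunks can be sketched as content-addressed storage: each logical chunk points at a blob keyed by a hash of its contents, so equal contents are stored once. The `ChunkStore` class, its method names and the choice of SHA-256 are assumptions for illustration only, and the sketch omits reclaiming blobs that are no longer referenced.

```python
import hashlib
from typing import Dict


class ChunkStore:
    """Stores chunk contents once per distinct value (content addressing)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # content digest -> physical bytes
        self._refs: Dict[str, str] = {}     # logical chunk id -> content digest

    def put(self, chunk_id: str, data: bytes) -> None:
        """Register a logical chunk; identical data reuses the existing blob."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._refs[chunk_id] = digest

    def get(self, chunk_id: str) -> bytes:
        """Resolve a logical chunk id to its stored bytes."""
        return self._blobs[self._refs[chunk_id]]

    def physical_count(self) -> int:
        """Number of distinct blobs actually stored."""
        return len(self._blobs)


store = ChunkStore()
store.put("sales/amount@v1", b"\x01\x02\x03")
store.put("sales/amount@v2", b"\x01\x02\x03")  # unchanged between versions
store.put("sales/qty@v1", b"\x09\x09")
```

Here three logical chunks occupy only two physical blobs, which is the effect described above: a new version whose data did not change costs no additional storage.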