Inzata Labs

Inzata 2nd Generation Aggregation Engine

Nikol H
A passionate tech enthusiast and seasoned tech blogger, Nikol's writing style is characterized by its clarity and accessibility. Whether demystifying the intricacies of artificial intelligence, or guiding readers through the world of data modeling, her articles are a beacon for those navigating the ever-evolving tech landscape.

Business intelligence (BI) technology faces growing pressure. Data sets keep getting larger, more users query them at the same time, and ad-hoc reporting and big data analysis demand performance that scales in a rapidly changing environment. To address these challenges, Inzata is upgrading its current 1st Generation Aggregation Engine (1GAE) to a 2nd Generation engine (2GAE).

Inzata’s current 1st-gen aggregation engine is based on patented algorithms from 2013, optimized for Intel’s 2012 Core processor family (Ivy Bridge silicon) and for NVIDIA GPU hardware. The upgraded engine takes full advantage of the newest CPU hardware through a form of instruction-level parallelism known as vectorization: converting an algorithm from operating on a single value at a time to operating on a set of values (a vector) at once. Vectorization allows a CPU to apply one instruction to multiple pieces of data at the same time.
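To make the scalar-versus-vector distinction concrete, here is a minimal Go sketch. Go is used because the 2GAE is written in it, but this is an illustration, not Inzata’s code. The four-lane width is an assumption standing in for a 256-bit register holding 64-bit integers. Plain Go does not expose SIMD instructions, so the four lane additions below still run as separate scalar instructions. What the sketch does show is the shape a vectorized loop takes: independent per-lane accumulators, a horizontal reduction at the end, and a scalar tail for leftover elements.

```go
package main

import "fmt"

// lanes is the number of values handled per loop step. Four 64-bit lanes
// correspond to one 256-bit vector register; the width is an assumption.
const lanes = 4

// sumScalar adds one value per loop iteration.
func sumScalar(xs []int64) int64 {
	var total int64
	for _, x := range xs {
		total += x
	}
	return total
}

// sumLanes keeps four independent partial sums, mirroring the lanes of a
// SIMD register. Plain Go compiles this to scalar adds; a vector unit would
// perform the four additions inside the loop as a single instruction.
func sumLanes(xs []int64) int64 {
	var acc [lanes]int64
	i := 0
	for ; i+lanes <= len(xs); i += lanes {
		acc[0] += xs[i]
		acc[1] += xs[i+1]
		acc[2] += xs[i+2]
		acc[3] += xs[i+3]
	}
	// Horizontal reduction: combine the lanes into one value.
	total := acc[0] + acc[1] + acc[2] + acc[3]
	// Scalar tail for elements that do not fill a whole vector.
	for ; i < len(xs); i++ {
		total += xs[i]
	}
	return total
}

func main() {
	xs := []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(sumScalar(xs), sumLanes(xs)) // 55 55
}
```

Both functions return the same result. The lane layout matters because it removes the dependency of every addition on the one before it, which is what lets hardware process several values per instruction.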

The 2nd Gen Aggregation Engine also has a new Map-Reduce aggregation module that combines parallel multithreaded execution with simultaneous vector (SIMD) operations, providing significantly better scalability and performance than the 1st-gen engine. The reduce algorithm is optimized to make the best use of CPU vectorization. In addition, the 2GAE replaces the OpenCL-based threads of the 1st-gen engine with a main algorithm written in Go, with embedded assembly routines that exploit the vectorization capabilities of current CPUs.
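The map and reduce phases of a multithreaded aggregation can be sketched in Go with goroutines. This is a simplified illustration, not the 2GAE’s actual module. The `Row` type, the hash-map grouping, and the one-chunk-per-worker split are assumptions chosen for clarity, and the SIMD work the engine performs inside each worker is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Row is a hypothetical fact-table row: a group key and a measure.
type Row struct {
	Key   string
	Value int64
}

// mapChunk aggregates one chunk into a partial result (the map phase).
func mapChunk(rows []Row) map[string]int64 {
	partial := make(map[string]int64)
	for _, r := range rows {
		partial[r.Key] += r.Value
	}
	return partial
}

// reducePartials merges per-worker partial results (the reduce phase).
func reducePartials(partials []map[string]int64) map[string]int64 {
	out := make(map[string]int64)
	for _, p := range partials {
		for k, v := range p {
			out[k] += v
		}
	}
	return out
}

// aggregate gives each worker one contiguous chunk, runs the map phase in
// parallel goroutines, then reduces the partial results on one goroutine.
func aggregate(rows []Row, workers int) map[string]int64 {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(rows) + workers - 1) / workers
	partials := make([]map[string]int64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(rows) {
			lo = len(rows)
		}
		if hi > len(rows) {
			hi = len(rows)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no lock is needed.
			partials[w] = mapChunk(rows[lo:hi])
		}(w, lo, hi)
	}
	wg.Wait()
	return reducePartials(partials)
}

func main() {
	rows := []Row{{"east", 10}, {"west", 5}, {"east", 7}, {"north", 1}, {"west", 3}}
	totals := aggregate(rows, 3)
	fmt.Println(totals["east"], totals["west"], totals["north"]) // 17 8 1
}
```

Keeping partial results private to each worker avoids contention during the map phase; the only synchronization point is the `WaitGroup` before the reduce.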

The upgraded 2GAE also includes a new sorting algorithm, “net sort,” designed specifically for small data chunks. For this use case it is 10-15x faster than general-purpose sorting algorithms such as quicksort. Vectorization is also used in the reduction phases of the Map-Reduce process.
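The source does not describe how “net sort” works internally. A common technique for fast sorting of small, fixed-size chunks is a sorting network, and the Go sketch below assumes that reading by implementing a bitonic network. The sequence of compare-exchange positions depends only on the chunk length, never on the data, so each stage can map onto vector min/max instructions without unpredictable branches. The function name and the power-of-two length restriction belong to this sketch, not to Inzata’s engine.

```go
package main

import "fmt"

// bitonicSort sorts a slice whose length is a power of two using a bitonic
// sorting network. Which pairs are compared depends only on len(a), not on
// the values, which is the property that suits SIMD execution.
func bitonicSort(a []int32) {
	n := len(a)
	if n&(n-1) != 0 {
		panic("bitonicSort: length must be a power of two")
	}
	// k is the size of the bitonic sequences being merged.
	for k := 2; k <= n; k <<= 1 {
		// j is the distance between the two elements of a compare-exchange.
		for j := k >> 1; j > 0; j >>= 1 {
			for i := 0; i < n; i++ {
				l := i ^ j
				if l <= i {
					continue // each pair is handled once, from its lower index
				}
				ascending := i&k == 0
				if (ascending && a[i] > a[l]) || (!ascending && a[i] < a[l]) {
					a[i], a[l] = a[l], a[i]
				}
			}
		}
	}
}

func main() {
	chunk := []int32{5, 1, 4, 8, 2, 7, 3, 6}
	bitonicSort(chunk)
	fmt.Println(chunk) // [1 2 3 4 5 6 7 8]
}
```

A network does more comparisons than quicksort on large inputs, but for small chunks its fixed, branch-free structure is usually the better trade.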

The new GAE uses a micro just-in-time compilation (µJIT) approach: a small, special-purpose compiler written in Go compiles short chunks of code that implement the critical vector-based parts of the Reduce algorithm. To maximize compilation speed, this compiler works at the assembly-language level instead of using OpenCL, as the 1st-gen engine did.
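Emitting machine code at runtime is beyond a short example, but the motivation behind µJIT can be sketched in Go with runtime kernel specialization. In this hypothetical sketch, `buildKernel` resolves the aggregation type once per query and returns a tight loop for it, so no per-row dispatch on the operation remains. This is explicitly not JIT compilation: the closures are compiled ahead of time, whereas a µJIT compiler would generate new vectorized code for each query.

```go
package main

import (
	"fmt"
	"math"
)

// Kernel reduces a column of values to a single aggregate.
type Kernel func([]int64) int64

// buildKernel picks a specialized loop once, before any rows are read.
// Hypothetical stand-in for µJIT: it selects precompiled closures rather
// than generating machine code.
func buildKernel(op string) (Kernel, error) {
	switch op {
	case "sum":
		return func(xs []int64) int64 {
			var s int64
			for _, x := range xs {
				s += x
			}
			return s
		}, nil
	case "min":
		return func(xs []int64) int64 {
			lo := int64(math.MaxInt64) // returned unchanged for empty input
			for _, x := range xs {
				if x < lo {
					lo = x
				}
			}
			return lo
		}, nil
	case "max":
		return func(xs []int64) int64 {
			hi := int64(math.MinInt64) // returned unchanged for empty input
			for _, x := range xs {
				if x > hi {
					hi = x
				}
			}
			return hi
		}, nil
	case "count":
		return func(xs []int64) int64 { return int64(len(xs)) }, nil
	}
	return nil, fmt.Errorf("unsupported aggregation %q", op)
}

func main() {
	col := []int64{4, -2, 9, 1}
	for _, op := range []string{"sum", "min", "max", "count"} {
		k, err := buildKernel(op)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(op, k(col))
	}
}
```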

In summary, the 2GAE offers the following benefits:

– Faster, more scalable, and more flexible BI analysis for big data

– Enhanced instruction-level parallelism and SIMD operations for faster processing

– A new approach to software vectorization for improved parallelism

– Better compatibility with the Inzata Kubernetes-based cloud platform

– Optimized CPU cache utilization for improved performance

– Faster just-in-time compilation of performance-critical code