HEAVY.AI Docs
v8.1.0
v8.1.0
  • Welcome to HEAVY.AI Documentation
  • Overview
    • Overview
    • Release Notes
  • Installation and Configuration
    • System Requirements
      • Hardware Reference
      • Software Requirements
      • Licensing
    • Installation
      • Free Version
      • Installing on Docker
        • HEAVY.AI Installation using Docker on Ubuntu
      • Installing on Ubuntu
        • HEAVY.AI Installation on Ubuntu
        • Install NVIDIA Drivers and Vulkan on Ubuntu
      • Installing on Rocky Linux / RHEL
        • HEAVY.AI Installation on RHEL
        • Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL
      • Getting Started on AWS
      • Getting Started on GCP
      • Getting Started on Azure
      • Getting Started on Kubernetes (BETA)
      • Upgrading
        • Upgrading HEAVY.AI
        • Upgrading from Omnisci to HEAVY.AI 6.0
        • CUDA Compatibility Drivers
      • Uninstalling
      • Ports
    • Services and Utilities
      • Using Services
      • Using Utilities
    • Executor Resource Manager
    • Configuration Parameters
      • Overview
      • Configuration Parameters for HeavyDB
      • Configuration Parameters for HEAVY.AI Web Server
      • Configuration Parameters for HeavyIQ
    • Security
      • Roles and Privileges
        • Column-Level Security
      • Connecting Using SAML
      • Implementing a Secure Binary Interface
      • Encrypted Credentials in Custom Applications
      • LDAP Integration
    • Distributed Configuration
  • Loading and Exporting Data
    • Supported Data Sources
      • Kafka
      • Using HeavyImmerse Data Manager
      • Importing Geospatial Data
    • Command Line
      • Loading Data with SQL
      • Exporting Data
  • SQL
    • Data Definition (DDL)
      • Datatypes
      • Users and Databases
      • Tables
      • System Tables
      • Views
      • Policies
      • Comment
    • Data Manipulation (DML)
      • SQL Capabilities
        • ALTER SESSION SET
        • ALTER SYSTEM CLEAR
        • DELETE
        • EXPLAIN
        • INSERT
        • KILL QUERY
        • LIKELY/UNLIKELY
        • SELECT
        • SHOW
        • UPDATE
        • Arrays
        • Logical Operators and Conditional and Subquery Expressions
        • Table Expression and Join Support
        • Type Casts
      • Geospatial Capabilities
        • Uber H3 Hexagonal Modeling
      • Functions and Operators
      • System Table Functions
        • generate_random_strings
        • generate_series
        • tf_compute_dwell_times
        • tf_feature_self_similarity
        • tf_feature_similarity
        • tf_geo_rasterize
        • tf_geo_rasterize_slope
        • tf_graph_shortest_path
        • tf_graph_shortest_paths_distances
        • tf_load_point_cloud
        • tf_mandelbrot*
        • tf_point_cloud_metadata
        • tf_raster_contour_lines; tf_raster_contour_polygons
        • tf_raster_graph_shortest_slope_weighted_path
        • tf_rf_prop_max_signal (Directional Antennas)
        • ts_rf_prop_max_signal (Isotropic Antennas)
        • tf_rf_prop
      • Window Functions
      • Reserved Words
      • SQL Extensions
      • HeavyIQ LLM_TRANSFORM
  • HeavyImmerse
    • Introduction to HeavyImmerse
    • Admin Portal
    • Control Panel
    • Working with Dashboards
      • Dashboard List
      • Creating a Dashboard
      • Configuring a Dashboard
      • Duplicating and Sharing Dashboards
    • Measures and Dimensions
    • Using Parameters
    • Using Filters
    • Using Cross-link
    • Chart Animation
    • Multilayer Charts
    • SQL Editor
    • Customization
    • Joins (Beta)
    • Chart Types
      • Overview
      • Bubble
      • Choropleth
      • Combo
      • Contour
      • Cross-Section
      • Gauge
      • Geo Heatmap
      • Heatmap
      • Linemap
      • Number
      • Pie
      • Pointmap
      • Scatter Plot
      • Skew-T
      • Table
      • Text Widget
      • Wind Barb
    • Deprecated Charts
      • Bar
      • Combo - Original
      • Histogram
      • Line
      • Stacked Bar
    • HeavyIQ SQL Notebook
  • HEAVYIQ Conversational Analytics
    • HeavyIQ Overview
      • HeavyIQ Guidance
  • HeavyRF
    • Introduction to HeavyRF
    • Getting Started
    • HeavyRF Table Functions
  • HeavyConnect
    • HeavyConnect Release Overview
    • Getting Started
    • Best Practices
    • Examples
    • Command Reference
    • Parquet Data Wrapper Reference
    • ODBC Data Wrapper Reference
    • Raster Data Wrapper Reference
  • HeavyML (BETA)
    • HeavyML Overview
    • Clustering Algorithms
    • Regression Algorithms
      • Linear Regression
      • Random Forest Regression
      • Decision Tree Regression
      • Gradient Boosting Tree Regression
    • Principal Components Analysis
  • Python / Data Science
    • Data Science Foundation
    • JupyterLab Installation and Configuration
    • Using HEAVY.AI with JupyterLab
    • Python User-Defined Functions (UDFs) with the Remote Backend Compiler (RBC)
      • Installation
      • Registering and Using a Function
      • User-Defined Table Functions
      • RBC UDF/UDTF Example Notebooks
      • General UDF/UDTF Tutorial Notebooks
      • RBC API Reference
    • Ibis
    • Interactive Data Exploration with Altair
    • Additional Examples
      • Forecasting with HEAVY.AI and Prophet
  • APIs and Interfaces
    • Overview
    • heavysql
    • Thrift
    • JDBC
    • ODBC
    • Vega
      • Vega Tutorials
        • Vega at a Glance
        • Getting Started with Vega
        • Getting More from Your Data
        • Creating More Advanced Charts
        • Using Polys Marks Type
        • Vega Accumulator
        • Using Transform Aggregation
        • Improving Rendering with SQL Extensions
      • Vega Reference Overview
        • data Property
        • projections Property
        • scales Property
        • marks Property
      • Migration
        • Migrating Vega Code to Dynamic Poly Rendering
      • Try Vega
    • RJDBC
    • SQuirreL SQL
    • heavyai-connector
  • Tutorials and Demos
    • Loading Data
    • Using Heavy Immerse
    • Hello World
    • Creating a Kafka Streaming Application
    • Getting Started with Open Source
    • Try Vega
  • Troubleshooting and Special Topics
    • FAQs
    • Troubleshooting
    • Vulkan Renderer
    • Optimizing
    • Known Issues and Limitations
    • Logs and Monitoring
    • Archived Release Notes
      • Release 6.x
      • Release 5.x
      • Release 4.x
      • Release 3.x
Powered by GitBook
On this page
  • Rendering a Heatmap Colored by Bin Statistics
  • SQL Queries
  • Render the Heatmap in Vega
  • Creating a Geo Pointmap Using Transforms to Drive Color and Size
  • Distinct
  • Median
  • Quantile
Export as PDF
  1. APIs and Interfaces
  2. Vega
  3. Vega Tutorials

Using Transform Aggregation

PreviousVega AccumulatorNextImproving Rendering with SQL Extensions

You can create Vega-based visualizations with render properties that are driven by aggregated statistics. You can use Vega transform aggregation and formula expressions to automate the process of gathering statistical information about a rendered query. By doing so, you do not have to run an SQL prequery to get the information, thereby reducing the time it takes to process and render a chart.

The following examples show how to use transforms in Vega to do the following:

  • Render a heatmap that is colored using dynamic statistics of the bins

  • Create a geo pointmap with different transform modes

NOTE: You can see Vega examples in the . For more information about the OmniSci Vega engine, see .

Rendering a Heatmap Colored by Bin Statistics

The following heatmap example demonstrates the benefits of Vega transforms for performance and reducing redundancy:

  • First, the example shows using an SQL expression to render a heatmap, as well as an additional expression to color the hexagonal bins according to the min and max of the cnt value of the aggregated bins from the query.

  • Then, you will see how to render the heatmap and color the bins directly in Vega by using source data definitions and performing aggregation transforms on that data, decreasing chart rendering time and redundancy.

SQL Queries

The following is a typical SQL query used for rendering a hexagonal heatmap:

SELECT
        reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as x,
        reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as y,
        count(*) as cnt
FROM tweets_nov_feb
WHERE ((lon >= -157 AND lon <= 157) AND (lat >= -63 AND lat <= 81))
GROUP BY x, y;

To color the hexagonal bins according to the min and max of the cnt value of the bins from the query, you need to run a prequery to gather these statistics manually. Here, this is done using a subquery SQL statement:

    SELECT
             min(cnt), max(cnt)
    FROM (SELECT
       reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as x,
       reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as y,
       count(*) as cnt
FROM tweets_nov_feb
WHERE ((lon >= -157 AND lon <= 157) AND (lat >= -63 AND lat <= 81))
GROUP BY x, y
    );

The values returned from this query can then be embedded in the Vega code to color the heatmap bins. Notice that the second query does an aggregation over the query, effectively running the query twice.

To avoid the redundancy and expense of running the query twice, you can instead specify the aggregation in Vega.

Render the Heatmap in Vega

The following Vega code renders the heatmap colored by aggregated statistics using transforms.

{
  "width": 897,
  "height": 647,
  "data": [
        {
          "name": "heatmap_query",
          "sql": "SELECT reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as x, reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as y, count(*) as cnt FROM tweets_nov_feb WHERE ((lon >= -157 AND lon <= 157) AND (lat >= -63 AND lat <= 81)) GROUP BY x, y"
        },
        {
          "name": "heatmap_stats",
          "source": "heatmap_query",
          "transform": [
                {
                  "type": "aggregate",
                  "fields": ["cnt", "cnt"],
                  "ops":    ["min", "max"],
                  "as":     ["mincnt", "maxcnt"]
                }
          ]
        }
  ],
  "scales": [
        {
          "name": "heat_color",
          "type": "quantize",
          "domain": {"data": "heatmap_stats", "fields": ["mincnt", "maxcnt"]},
          "range": ["#115f9a", "#1984c5", "#22a7f0", "#48b5c4", "#76c68f",
                                "#a6d75b", "#c9e52f", "#d0ee11", "#d0f400"
          ]
        }
  ],
  "marks": [
        {
          "type": "symbol",
          "from": {
                "data": "heatmap_query"
          },
          "properties": {
                "shape": "hexagon-horiz",
                "xc": {
                  "field": "x"
                },
                "yc": {
                  "field": "y"
                },
                "width": 9.9667,
                "height": 11.5085,
                "fillColor": {
                  "scale": "heat_color",
                  "field": "cnt"
                }
          }
        }
  ]
}

The data section named heatmap_stats has a source data table defined by the "source": "heatmap_query" line:

"name": "heatmap_stats",
"source": "heatmap_query",

The "heatmap_stats" data takes as input the "heatmap_query" data, which is the data supplied by the SQL query. Use the source data type to apply intermediary steps or expressions (transforms) to the input source data.

To color the data according to the range of values defined by two standard deviations from the mean, edit the "heatmap_stats" section as follows to:

  • Aggregate the minimum, maximum, average, and sampled standard deviation of the count column.

  • Use formula expressions to calculate values that are two standard deviations from the average.

{
   "name": "heatmap_stats",
   "source": "heatmap_query",
   "transform": [
     {
       "type": "aggregate",
       "fields": ["cnt", "cnt", "cnt", "cnt"],
       "ops":    ["min", "max", "avg", "stddev"],
       "as":     ["mincnt", "maxcnt", "avgcnt", "stdcnt"]
     },
     {
       "type": "formula",
       "expr": "max(mincnt, avgcnt-2*stdcnt)",
       "as": "mincnttouse"
     },
     {
       "type": "formula",
       "expr": "min(maxcnt, avgcnt+2*stdcnt)",
       "as": "maxcnttouse"
     }
   ]
 }

Then, reference these values in the scale domain:

{
   "name": "heat_color",
   "type": "quantize",
   "domain": {"data": "heatmap_stats", "fields": ["mincnttouse", "maxcnttouse"]},
   "range": ["#115f9a", "#1984c5", "#22a7f0", "#48b5c4", "#76c68f",
             "#a6d75b", "#c9e52f", "#d0ee11", "#d0f400"
   ]
 }

Performing these calculations in Vega improves performance because the SQL query is only run once and the aggregated statistics are done “on the fly.” Because the query is not repeated in a statistical prequery step, you can reduce the full render time by half by performing the statistics step in Vega at render time.

Creating a Geo Pointmap Using Transforms to Drive Color and Size

This section shows how to use Vega tranforms to drive the color and size of points in a geo pointmap. Specifically, it show examples using the following aggregation transforms:

distinct: An array of distinct values from an input data column.

median: The median of an input data column.

quantile: An array of quantile separators; operates on numeric columns and takes the following pameters:

  • numQuantiles: The number of contiguous intervals to create; returns the separators for the intervals. The number of separators equals numQuantiles - 1.

  • includeExtrema: Whether to include min and max values (extrema) in the resulting separator array. The size of the resulting separator array will be numQuantiles + 1.

As with the heatmap example described earlier, using Vega transforms eliminate the need for an SQL prequery and significantly improves performance for dynamic operations.

The examples that follow use a Twitter dataset to create a geo pointmap.

Distinct

In the following example, the size of the points in a geo pointmap are defined by the numeric range two standard deviations from the average number of followers of the input data. The color of the points is driven by the distinct languages of the input data. To calculate the distinct languages, you could run a prequery using DISTINCT and then populate a Vega color scale with the results. However, the query would need to be run before every render update if the distinct data is meant to be dynamic, which would be very costly.

With the distinct Vega transform, this can be performed when evaluating the Vega code in the backend, so you do not need to run the prequery. This can improve performance considerably.

{
  "width": $width,
  "height": $height,
  "data": [
    {
      "name": "table",
      "sql": "SELECT conv_4326_900913_x(lon) as x,conv_4326_900913_y(lat) as y,followers,lang,rowid FROM tweets_2017_may WHERE (lon between $minLon and $maxLon AND lat between $minLat and $maxLat) LIMIT 200000"
    },
    {
      "name": "xformtable",
      "source": "table",
      "transform": [
        {
          "type": "aggregate",
          "fields": ["followers", "followers", "followers", "followers", "lang"],
          "ops":    ["min", "max", "avg", "stddev", "distinct"],
          "as":     ["minfol", "maxfol", "avgfol", "stdfol", "distinctlang"]
        },
        {
          "type": "formula",
          "expr": "max(minfol, avgfol-2*stdfol)",
          "as": "minfoltouse"
        },
        {
          "type": "formula",
          "expr": "min(maxfol, avgfol+2*stdfol)",
          "as": "maxfoltouse"
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": [
        $minXBounds,
        $maxXBounds
      ],
      "range": "width"
    },
    {
      "name": "y",
      "type": "linear",
      "domain": [
        $minYBounds,
        $maxYBounds
      ],
      "range": "height"
    },
    {
      "name": "size",
      "type": "linear",
      "domain": {"data": "xformtable", "fields": ["minfoltouse", "maxfoltouse"]},
      "range": [
        1, 20
      ],
      "clamp": true
    },
    {
      "name": "color",
      "type": "ordinal",
      "domain": {"data": "xformtable", "field": "distinctlang"},
      "range": [
        "blue", "red", "green", "yellow", "magenta", "purple", "teal"
      ]
    }
  ],
  "marks": [
    {
      "type": "points",
      "from": {
        "data": "table"
      },
      "properties": {
        "x": {
          "scale": "x",
          "field": "x"
        },
        "y": {
          "scale": "y",
          "field": "y"
        },
        "fillColor": {
          "scale": "color",
          "field": "lang"
        },
        "size": {
          "scale": "size",
          "field": "followers"
        }
      }
    }
  ]
}

This Vega code results in this image:

Median

Outliers in a dataset can significantly skew statistics such as AVG and STDDEV. To mitigate this, you can use median and quantile to create a more meaningful probability distribution of the data. Median and quantiles are computed dynamically when Vega is evaluated and can be used to drive different render properties.

The following hexmap example uses median to drive the color of the hex bins. Notice in the final render that roughly half of the bins are colored red, and the other half are blue.

{
  "width": $width,
  "height": $height,
  "data": [
    {
      "name": "heatmap_query",
      "sql": "SELECT reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),$minXBounds,$maxXBounds,conv_4326_900913_y(lat),$minYBounds,
	  $maxYBounds,9.931506849315069,11.467916305821335,0,0,$width,$height) as x,reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),$minXBounds,$maxXBounds,
	  conv_4326_900913_y(lat),$minYBounds,$maxYBounds,9.931506849315069,11.467916305821335,0,0,$width,$height) as y, count(*) as color FROM tweets_2017_may 
	  WHERE ((lon >= $minLon AND lon <= $maxLon) AND (lat >= $minLat AND lat <= $maxLat)) GROUP BY x, y"
    },
    {
      "name": "heatmap_statistics",
      "source": "heatmap_query",
      "transform": [
        {
          "type": "aggregate",
          "fields": ["color"],
          "ops":    ["median"],
          "as":     ["mediancolor"]
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "heat_color",
      "type": "threshold",
      "domain": {"data": "heatmap_statistics", "field": "mediancolor"},
      "range": [
        "blue", "red"
      ]
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {
        "data": "heatmap_query"
      },
      "properties": {
        "shape": "hexagon-horiz",
        "xc": {
          "field": "x"
        },
        "yc": {
          "field": "y"
        },
        "width": 9.931506849315069,
        "height": 11.467916305821335,
        "fillColor": {
          "scale": "heat_color",
          "field": "color"
        }
      }
    }
  ]
}

Quantile

The quantile function takes two additional parameters:

  • numQuantiles is the number of contiguous intervals to create and returns the separators for the intervals. The number of returned separators is numQuantiles - 1.

  • includeExtrema is a true or false value indicating whether to include the extrema (min and max) in the resulting separator array. If true, the number of returned values is numQuantiles + 1.

To see how a quantile works, consider a query that results in this set of values for "followers":

{3, 6, 7, 8, 8, 10, 13, 15, 16, 20}

With a quantile operator defined as {"type": "quantile", "numQuantiles": 4}, the result of the operator would be the following array:

[7, 9, 15]

25% of the data has less than 7 followers, 25% has between 7 and 9, 25% has between 9 and 15, and 25% has more than 15.

With a quantile operator defined as {"type": "quantile", "numQuantiles": 4, "includeExtrema": true}, the result of the operator would be the following array:

[3, 7, 9, 15, 20].

With "includeExtrema" == true, the min and max are included in the resulting array, so 25% of the data has between 3 and 7 followers, 25% has between 7 and 9, 25% has between 9 and 15, and 25% has between 15 and 20.

The following Vega code snippet gets the octiles (8-quantiles) and sextiles (6-quantiles) of a column called "followers":

...
{
 "name": "xformtable",
 "source": "table",
 "transform": [
   {
     "type": "aggregate",
     "fields": ["followers", "followers"],
     "ops":    [{"type": "quantile", "numQuantiles": 8}, {"type": "quantile", "numQuantiles": 6}],
     "as":     ["octile_fol", "sextile_fol"]
   }
 ]
}
...

Here is a more complete example using sextiles. Notice in the resulting image approximately the same number of hexagons appears in each of the six quantile groups colored blue to red, from left to right.

{
  "width": $width,
  "height": $height,
  "data": [
    {
      "name": "heatmap_query",
      "sql": "SELECT reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),$minXBounds,$maxXBounds,conv_4326_900913_y(lat),
	  $minYBounds,$maxYBounds,9.931506849315069,11.467916305821335,0,0,$width,$height) as x,reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),
      $minXBounds,$maxXBounds,conv_4326_900913_y(lat),$minYBounds,$maxYBounds,9.931506849315069, 11.467916305821335,0,0,$width,$height) as y, 
      count(*) as color FROM tweets_2017_may WHERE ((lon >= $minLon AND lon <= $maxLon) AND (lat >= $minLat AND lat <= $maxLat)) GROUP BY x, y"
    },
    {
      "name": "heatmap_statistics",
      "source": "heatmap_query",
      "transform": [
        {
          "type": "aggregate",
          "fields": ["x"],
          "ops":    [{"type": "quantile", "numQuantiles": 6}],
          "as":     ["sextilex"]
        }
      ]
    }
  ],
  "scales": [
     {
      "name": "heat_color",
      "type": "threshold",
      "domain": {"data": "heatmap_statistics", "field": "sextilex"},
      "range": [
        "rgb(0, 0, 255)", "rgb(51, 0, 204)", "rgb(102, 0, 153)", "rgb(153, 0, 102)", "rgb(204, 0, 51)", "rgb(255, 0, 0)"
      ]
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {
        "data": "heatmap_query"
      },
      "properties": {
        "shape": "hexagon-horiz",
        "xc": {
          "field": "x"
        },
        "yc": {
          "field": "y"
        },
        "width": 9.931506849315069,
        "height": 11.467916305821335,
        "fillColor": {
          "scale": "heat_color",
          "field": "x"
        }
      }
    }
  ]
}

For information about syntax and requirements for the source and transform properties, see the property.

For more information about the aggregate functions used in these examples, see in the Vega reference.

For more information about quantiles, see: .

OmniSci Vega Editor
Try Vega
Data
transforms
Data
https://en.wikipedia.org/wiki/Quantile