All pages
Powered by GitBook
1 of 9

Vega Tutorials

These tutorials introduce you to common Vega specification patterns so you can start creating visualizations quickly and easily. Each tutorial uses a code example that demonstrates a particular Vega feature or pattern. The Getting Started with Vega tutorial covers basic Vega concepts and serves as the foundation for some of the tutorials that follow, and introduces you to the API for communication with the backend. Other tutorials provide more in-depth information about specific Vega implementations.

Use these tutorials to gain a better understanding of Vega by experimenting with them to create new visualizations on your own HEAVY.AI system and database. You can also Try Vega to make adjustments to Vega code and see real-time changes in charts.

For information about the Vega specification syntax and properties, see Vega Reference.

Tutorial Framework

Because the tutorials focus on the Vega specification, they use a simple client implementation that sends the render request to the HEAVY.AI server and handles the response:

Common index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>HEAVY.AI</title>
    <meta charset="UTF-8">
  </head>
  <body>
    <script src="js/browser-connector.js"></script>
    <script src="js/vegaspec.js"></script>
    <script src="js/vegademo.js"></script>

    <script>
    document.addEventListener('DOMContentLoaded', init, false);
    </script>
  </body>
</html>

Common vegademo.js

function init() {
    var conn = new MapdCon()
        .protocol("http")
        .host("my.host.com")
        .port("6273")
        .dbName("heavyai")
        .user("heavyai")
        .password("changeme")
        .connect(function(error, con) {
          con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
            if (error) {
              console.log(error.message);
            }
            else {
              var blobUrl = `data:image/png;base64,${result.image}`
              var body = document.querySelector('body')
              var vegaImg = new Image()
              vegaImg.src = blobUrl
              body.append(vegaImg)
            }
          });
        });
}

The renderVega() function sends the exampleVega JSON structure described in the tutorials. Getting Started with Vega covers Vega library dependencies and the renderVega() function in more detail.

Finding the Cause of Errors

On a connection error, you can view the error message to determine the cause of the error. To determine the cause of Vega specification errors, catch and handle the renderVega() exception.

Available Tutorials

  • Vega at a Glance - Provides an overiew of Vega and a simple example to visualize tweets.

  • Getting Started with Vega - Maps a continuous, quantitative input domain to a continuous output range. Uses the same visualization as Vega at a Glance, but elaborates on the runtime environment and implementation steps.

  • Getting More from Your Data - Builds on the Getting Started with Vega tutorial by color-coding tweets according to language.

  • Creating More Advanced Charts - Introduces Symbol Type marks by creating a heatmap visualization of political donations.

  • Using the Poly Marks Type - Shows how to use the Polys Type marks, which uses an implicit polygon data table format. The visualization in the tutorial is a map of zip codes, color-coded according to average political contribution amount.

  • Vega Accumulator - Describes the three modes of accumulation rendering and provides some implementation examples. The data used contains information about political donations, including party affiliation, the amount of the donation, and location of the donor.

  • Using Transform Aggregation - Shows how to create Vega-based visualizations with render properties that are driven by aggregated statistics. Use Vega transform aggregation and formula expressions to automate the process of gathering statistical information about a rendered query.

  • Improving Rendering with SQL Extensions - Describes how to use SQL extension functions in Vega to map meters to pixels and improve map rendering.

Vega at a Glance

Source code is located at the end of this topic.

This tutorial provides an overiew of Vega and a simple example to visualize tweets in the EMEA geographic region:

Defining the Source Data

The Vega JSON structure maps data to geometric primitives.

A first task is to specify the data source. You can either define data statically or use a SQL query. This examples uses a SQL query to get tweet geolocation information from a tweets database:

SELECT goog_x as x, goog_y as y, tweets_nov_feb.rowid FROM tweets_nov_feb

The resulting SQL columns can be referenced in other parts of the specification to drive visualization elements. In this example, the projected columns are goog_x and goog_y, which are renamed x and y, and rowid, which is a requirement for hit-testing.

Creating a Visualization using Vega

The Vega specification for this example includes the following top-level properties:

  • height and width, which define the height and width of the visualization area.

  • data, which defines the data source. The SQL data described above is defined here with the label tweets for later referencing.

  • marks, which describes the geometric primitives used to render the visualization.

  • scales, which are referenced by marks to map input domain values to appropriate output range values.

Here is the full Vega specification used in this example:

const exampleVega = {
  "width": 384,
  "height": 564,
  "data": [
    {
      "name": "tweets",
      "sql": "SELECT goog_x as x, goog_y as y, tweets_nov_feb.rowid FROM tweets_nov_feb"
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": [
        -3650484.1235206556,
        7413325.514451755
      ],
      "range": "width"
    },
    {
      "name": "y",
      "type": "linear",
      "domain": [
        -5778161.9183506705,
        10471808.487466192
      ],
      "range": "height"
    }
  ],
  "marks": [
    {
      "type": "points",
      "from": {
        "data": "tweets"
      },
      "properties": {
        "x": {
          "scale": "x",
          "field": "x"
        },
        "y": {
          "scale": "y",
          "field": "y"
        },
        "fillColor": "blue",
        "size": {
          "value": 3
        }
      }
    }
  ]
};

The following sections describe the top-level Vega specification properties.

Define the Visualization Area Dimensions

The width and height properties define a visualization area 384 pixels wide and 564 pixels high:

"width": 384
"height": 564

The scales position encoding properties map the marks into this visualization area.

Define the Marks

The marks property defines visualization geometric primitives. The OmniSci Vega implementation defines the following primitive types:

  • lines A line

  • points A point

  • polys A polygon

  • symbol A geometric symbol, such as a circle or square

Each primitive type has a set of properties that describe how the primitive is positioned and styled.

This example uses points to represent the tweets data:

"marks": [
  {
    "type": "points",
    "from": {
      "data": "tweets"
    },
    "properties": {
      "x": {
        "scale": "x",
        "field": "x"
      },
      "y": {
        "scale": "y",
        "field": "y"
      },
      "fillColor": "blue",
      "size": {
        "value": 3
      }
    }
  }
]

Points support the following properties; not all are included in the example:

  • x The x position of the point in pixels.

  • y The y position of the point in pixels.

  • z The depth coordinate of the point in pixels.

  • fillColor The color of the point.

  • fillOpacity The opacity of the fill, from transparent (0) to opaque (1).

  • opacity The opacity of the point as a whole, from transparent (0) to opaque (1).

  • size The diameter of the point in pixels.

The points in the example reference the tweets SQL data and use the x and y columns from the SQL to drive the position of the points. The positions are appropriately mapped to the visualization area using scales as described in Scale Input Domain to Output Range. The fill color is set to blue and point size is set to three pixels.

Scale Input Domain to Output Range

The scales definition maps data domain values to visual range values, where the domain property determines the input domain for the scale. See the d3-scale reference for background information about how scaling works.

This example uses linear scales to map mercator-projected coordinates into pixel coordinates for rendering.

"scales": [
  {
    "name": "x",
    "type": "linear",
    "domain": [
      -3650484.1235206556,
      7413325.514451755
    ],
    "range": "width"
  },
  {
    "name": "y",
    "type": "linear",
    "domain": [
      -5778161.9183506705,
      10471808.487466192
    ],
    "range": "height"
  },
]

The x and y scales use linear interpolation to map point x- and y-coordinates to the width and height of the viewing area. The width and height properties are predefined keywords that equate to the range [0, <current width>] and [0, <current height>].

After completing the Vega specification, you send the JSON structure to the backend for rendering.

Connecting to the Server and Rendering the Visualization

The following steps summarize the rendering and visualization sequence:

  1. Instantiate the MapdCon object for connecting to the backend.

  2. Call the connect method with server information, user credentials, and data table name.

  3. Provide the renderVega() callback function to connect() and include the Vega specification as a parameter.

  4. Display the returned PNG image in you client browser window.

OmniSci uses Apache Thrift for cross-language client communication with the backend. Include the browser-connector.js, connector API, which includes Thrift interface libraries and the renderVega() function:

<script src="js/browser-connector.js"></script>

The following example encapsulates the connect, render request, and response handling sequence:

var vegaOptions = {}
var connector = new MapdCon()
  .protocol("http")
  .host("my.host.com")
  .port("6273")
  .dbName("omnisci")
  .user("omnisci")
  .password("changeme")
  .connect(function(error, con) {
    con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
      if (error) {
        console.log(error.message);
      }
      else {
        var blobUrl = `data:image/png;base64,${result.image}`
        var body = document.querySelector('body')
        var vegaImg = new Image()
        vegaImg.src = blobUrl
        body.append(vegaImg)
      }
    });
  });

Next Steps

This example demonstrated the basic concepts for understanding and using Vega. To become comfortable with Vega, try this example using your own OmniSci instance, changing the MapdCon() parameters according to match your host environment and database.

As you gain experience with Vega and begin writing your own applications, see the Reference for detailed information about Vega code.

Source Code

HTML

Vega at a Glance index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>OmniSci</title>
    <meta charset="UTF-8">
    <style>
      .title {
        font-weight: bold;
        text-align:center;
      }
      .omnisci {
        position: relative;
        top: 2px;
      }
      .search{
        displa"y": inline-block;
        margin-top: 12px;
        margin-left: 50px;
      }
      .data-count {
        padding-right:20px;
      }
      .filter-count {
        font-weight: bold;
        color:  #45B1E8;
      }
    </style>
  </head>
  <body>

    <script src="js/browser-connector.js"></script>

    <script>

    function init() {
      const exampleVega = {
        "width": 384,
        "height": 564,
        "data": [
          {
            "name": "tweets",
            "sql": "SELECT goog_x as x, goog_y as y, tweets_data_table.rowid FROM tweets_data_table"
          }
        ],
        "scales": [
          {
            "name": "x",
            "type": "linear",
            "domain": [
              -3650484.1235206556,
              7413325.514451755
            ],
            "range": "width"
          },
          {
            "name": "y",
            "type": "linear",
            "domain": [
              -5778161.9183506705,
              10471808.487466192
            ],
            "range": "height"
          }
        ],
        "marks": [
          {
            "type": "points",
            "from": {
              "data": "tweets"
            },
            "properties": {
              "x": {
                "scale": "x",
                "field": "x"
              },
              "y": {
                "scale": "y",
                "field": "y"
              },
              "fillColor": "blue",
              "size": {
                "value": 3
              }
            }
          }
        ]
      };

       var vegaOptions = {}
       var connector = new MapdCon()
         .protocol("http")
         .host("my.host.com")
         .port("6273")
         .dbName("omnisci")
         .user("omnisci")
         .password("changeme")
         .connect(function(error, con) {
           con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
             if (error) {
               console.log(error.message);
             }
             else {
               var blobUrl = `data:image/png;base64,${result.image}`
               var body = document.querySelector('body')
               var vegaImg = new Image()
               vegaImg.src = blobUrl
               body.append(vegaImg)
             }
           });
         });
    }

    document.addEventListener('DOMContentLoaded', init, false);
    </script>
  </body>
</html>

Getting Started with Vega

Source code is located at the end of the tutorial.

This tutorial uses the same visualization as Vega at a Glance but elaborates on the runtime environment and implementation steps. The Vega usage pattern described here applies to all Vega implementations. Subsequent tutorials differ only in describing more advanced Vega features.

This visualization maps a continuous, quantitative input domain to a continuous output range. Again, the visualization shows tweets in the EMEA region, from a tweets data table:

Backend rendering using Vega involves the following steps:

  1. Create the Vega Specification

  2. Connect to the Backend

  3. Make the Render Request and Handle the Response

You can create the Vega specification statically, as shown in this tutorial, or programmatically. See the Poly Map with Backend Rendering charting example for a programmatic implementation. Here is the programmatic source code:

  • example3.html

  • example3.js

Step 1 - Create the Vega Specification

A Vega JSON specification has the following general structure:

const exampleVega = {
  width: <numeric>,
  height: <numeric>,
  data: [ ... ],
  scales: [ ... ],
  marks: [ ... ]
};

Specify the Visualization Area

The width and height properties define the width and height of your visualization area, in pixels:

const exampleVega = {
  width: 384,
  height: 564,
  data: [ ... ],
  scales: [ ... ],
  marks: [ ... ]
};

Specify the Data Source

This example uses the following SQL statement to get the tweets data:

data: [
    {
        "name": "tweets",
        "sql": "SELECT goog_x as x, goog_y as y, tweets_nov_feb.rowid FROM tweets_nov_feb"
    }
]

The input data are the latitude and longitude coordinates of tweets from the tweets_nov_feb data table. The coordinates are labeled x and y for Field Reference in the marks property, which references the data using the tweets name.

Specify the Graphical Properties of the Rendered Data Item

The marks property specifies the graphical attributes of how each data item is rendered:

marks: [
    {
        type: "points",
        from: {
            data: "tweets"
        },
        properties: {
            x: {
                scale: "x",
                field: "x"
            },
            y: {
                scale: "y",
                field: "y"
            },
            "fillColor": "blue",
            size: {
                value: 3
            }
        }
    }
]

In this example, each data item from the tweets data table is rendered as a point. The points marks type includes position, fill color, and size attributes. The marks property specifies how to visually encode points according to these attributes. Points in this example are three pixels in diameter and colored blue.

Points are scaled to the visualization area using the scales property.

Specify How Input Data are Scaled to the Visualization Area

The following scales specification maps marks to the visualization area.

scales: [
    {
        name: "x",
        type: "linear",
        domain: [
            -3650484.1235206556,
            7413325.514451755
        ],
        range: "width"
      },
      {
        name: "y",
        type: "linear",
        domain: [
            -5778161.9183506705,
            10471808.487466192
        ],
        range: "height"
    }
]

Both x and y scales specify a linear mapping of the continuous, quantitative input domain to a continuous output range. In this example, input data values are transformed to predefined width and height range values.

Later tutorials show how to specify data transformation using discrete domain-to-range mapping.

Step 2 - Connect to the Backend

Use the browser-connector.js renderVega() API to communicate with the backend. The connector is layered on Apache Thrift for cross-language client communication with the server.

Follow these steps to instantiate the connector and to connect to the backend:

  1. Include browser-connector.js located at https://github.com/omnisci/mapd-connector/tree/master/dist to include the MapD connector and Thrift interface APIs.

    <script src="<localJSdir>/browser-connector.js"></script>
  2. Instantiate the MapdCon() connector and set the server name, protocol information, and your authentication credentials, as described in the MapD Connector API:

    var vegaOptions = {}
    var connector = new MapdCon()
      .protocol("http")
      .host("my.host.com")
      .port("6273")
      .dbName("omnisci")
      .user("omnisci")
      .password("HyperInteractive")

    Property

    Description

    dbName

    OmniSci database name.

    host

    OmniSci web server name.

    password

    OmniSci user password.

    port

    OmniSci web server port

    protocol

    Communication protocol: http, https

    user

    OmniSci user name.

  3. Finally, call the MapD connector API connect() function to initiate a connect request, passing a callback function with a (error, success) signature as the parameter.

For example,

.connect(function(error, con) { ... });

The connect() function generates client and session IDs for this connection instance, which are unique for each instance and are used in subsequent API calls for the session.

On a successful connection, the callback function is called. The callback function in this example calls the renderVega() function.

Step 3 - Make the Render Request and Handle the Response

The MapD connector API renderVega() function sends the Vega JSON to the backend, and has the following parameters:

.connect(function(error, con) {
  con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
    if (error)
      console.log(error.message);
    else {
      var blobUrl = `data:image/png;base64,${result.image}`
      var body = document.querySelector('body')
      var vegaImg = new Image()
      vegaImg.src = blobUrl
      body.append(vegaImg)
    }
  });
});

Parameter

Type

Required

Description

widgetid

number

X

Calling widget ID.

vega

string

X

Vega JSON object, as described in Step 1 - Create the Vega Specification.

options

number

Render query options.

compressionLevel:PNG compression level. 1 (low, fast) to 10 (high, slow). Default = 3

callback

function

Callback function with (error, success) signature.

Return

Description

Base64 image

PNG image rendered on server

The backend returns the rendered Base64 image in results.image, which you can display in the browser window using a data URI.

Source Code

Getting Started Directory Structure

 index.html
 /js
   browser-connector.js
   vegaspec.js
   vegademo.js

HTML

Getting Started index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>OmniSci</title>
    <meta charset="UTF-8">
  </head>
  <body>
    <script src="js/browser-connector.js"></script>
    <script src="js/vegaspec.js"></script>
    <script src="js/vegademo.js"></script>

    <script>
    document.addEventListener('DOMContentLoaded', init, false);
    </script>
  </body>
</html>

JavaScript

Getting Started vegademo.js

function init() {
  var vegaOptions = {}
  var connector = new MapdCon()
    .protocol("http")
    .host("my.host.com")
    .port("6273")
    .dbName("omnisci")
    .user("omnisci")
    .password("changeme")
    .connect(function(error, con) {
      con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
        if (error) {
          console.log(error.message);
        }
        else {
          var blobUrl = `data:image/png;base64,${result.image}`
          var body = document.querySelector('body')
          var vegaImg = new Image()
          vegaImg.src = blobUrl
          body.append(vegaImg)
        }
      });
    });
}

Getting Started vegaspec.js

const exampleVega = {
  "width": 384,
  "height": 564,
  "data": [
    {
      "name": "tweets",
      "sql": "SELECT goog_x as x, goog_y as y, tweets_data_table.rowid FROM tweets_data_table"
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": [
        -3650484.1235206556,
        7413325.514451755
      ],
      "range": "width"
    },
    {
      "name": "y",
      "type": "linear",
      "domain": [
        -5778161.9183506705,
        10471808.487466192
      ],
      "range": "height"
    }
  ],
  "marks": [
    {
      "type": "points",
      "from": {
        "data": "tweets"
      },
      "properties": {
        "x": {
          "scale": "x",
          "field": "x"
        },
        "y": {
          "scale": "y",
          "field": "y"
        },
        "fillColor": "blue",
        "size": {
          "value": 3
        }
      }
    }
  ]
};

Getting More from Your Data

Source code can be found at the end of this tutorial.

This tutorial builds on the Getting Started with Vega tutorial by color-coding tweets according to language:

  • Tweets in English are blue.

  • Tweets in French are orange.

  • Tweets in Spanish are green.

  • All other tweets are light or dark gray.

To highlight language in the visualization, the example specifies the language column query in the Vega data property, and associates language with color.

"data:" [
  {
    "name": "tweets",
    "sql": "SELECT goog_x as x, goog_y as y, lang as color, tweets_nov_feb.rowid FROM tweets_nov_feb"
  }

The Scales property maps the language abbreviation string to a color value. Because we want to map discrete domain values to discrete range values, we specify a color scale with an ordinal type scale:

"scales:" [
     .
     .
     .
  {
    "name": "color",
    "type": "ordinal",
    "domain": ["en",  "es", "fr"],
    "range": ["#27aeef", "#87bc45", "#ef9b20"],
    "default": "gray",
    "nullValue": "#cacaca"
  }
]

You can specify a default color values for values not specified in range and for data items with a value of null. In this example, tweets in languages other than English, Spanish, or French are colored gray and tweets with a language value of null are colored light gray (#cacaca).

Similar to using x and y scales to map Marks property x and y fields to the visualization area, you can scale the fillColor property to the visualization area.

In previous examples the fill color of points representing tweets was statically specified as blue:

"marks:" [
  {
    "type:" "points",
    "from:" {
      "data:" "tweets"
    },
    "properties:" {
         .
         .
         .
      },
      "fillColor": "blue",
      "size:" {"value:" 3}
    }
  }
]

This example, uses Value Reference to specify the fill color:

"marks:" [
  {
    "type:" "points",
    "from:" {
      "data:" "tweets"
    },
    "properties:" {
         .
         .
         .
      },
      "fillColor:" {
        "scale": "color",
        "field": "color"
      },
      "size:" 3
    }
  }
]

The fillColor references the color scale and performs a lookup on the current language value, from the color data table field.

Source Code

Getting More Insight Tutorial Directory Structure

index.html
/js
  browser-connector.js
  vegaspec.js
  vegademo.js

HTML

Getting More Insight Tutorial index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>OmniSci</title>
    <meta charset="UTF-8">
  </head>
  <body>
    <script src="js/browser-connector.js"></script>
    <script src="js/vegaspec.js"></script>
    <script src="js/vegademo.js"></script>

    <script>
    document.addEventListener('DOMContentLoaded', init, false);
    </script>
  </body>
</html>

JavaScript

Getting More Insight Tutorial vegademo.js

function init() {
  var vegaOptions = {}
  var connector = new MapdCon()
    .protocol("http")
    .host("my.host.com")
    .port("6273")
    .dbName("omnisci")
    .user("omnisci")
    .password("changeme")
    .connect(function(error, con) {
      con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
        if (error) {
          console.log(error.message);
        }
        else {
          var blobUrl = `data:image/png;base64,${result.image}`
          var body = document.querySelector('body')
          var vegaImg = new Image()
          vegaImg.src = blobUrl
          body.append(vegaImg)
        }
      });
    });
}

Getting More Insight Tutorial vegaspec.js

const exampleVega = {
  "width": 384,
  "height": 564,
  "data": [
    {
      "name": "tweets",
      "sql": "SELECT goog_x as x, goog_y as y, lang as color, tweets_data_table.rowid FROM tweets_data_table"
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": [
        -3650484.1235206556,
        7413325.514451755
      ],
      "range": "width"
    },
    {
      "name": "y",
      "type": "linear",
      "domain": [
        -5778161.9183506705,
        10471808.487466192
      ],
      "range": "height"
    },
    {
      "name": "color",
      "type": "ordinal",
      "domain": ["en",  "es", "fr"],
      "range": ["#27aeef", "#87bc45", "#ef9b20"],
      "default": "gray",
      "nullValue": "#cacaca"
    }
  ],
  "marks": [
    {
      "type": "points",
      "from": {
        "data": "tweets"
      },
      "properties": {
        "x": {
          "scale": "x",
          "field": "x"
        },
        "y": {
          "scale": "y",
          "field": "y"
        },
        "fillColor": {
          "scale": "color",
          "field": "color"
        },
        "size": 3
      }
    }
  ]
};

Creating More Advanced Charts

Source code is located at the end of this tutorial.

This tutorial introduces you to Symbol Type marks by creating a heatmap visualization. The heatmap shows contribution level to the Republican party within the continental United States:

The contribution data are obtained using the following SQL query:

"data": [
  {
   "name": "heatmap_query",
   "sql": "SELECT rect_pixel_bin(conv_4326_900913_x(lon), -13847031.457875465, -7451726.712679257, 733, 733) as x,
           rect_pixel_bin(conv_4326_900913_y(lat), 2346114.147993467, 6970277.197053557, 530, 530) as y,
           SUM(amount) as cnt
           FROM contributions
           WHERE (lon >= -124.39000000000038 AND lon <= -66.93999999999943) AND
               (lat >= 20.61570573311549 AND lat <= 52.93117449504004) AND
               amount > 0 AND
               recipient_party = 'R'
           GROUP BY x, y"
  }
]

The visualization uses a Symbol Type marks type to represent each data item in the heatmap_query data table:

"marks": [
    {
        "type": "symbol",
        "from": {
            "data": "heatmap_query"
        },
        "properties": { ... elided ... }
    }
]

The marks properties property specifies the symbol shape, which is a square. Each square has a pixel width and height of one pixel.

"marks": [
    {
        ... elided ...

        "properties": {
            "shape": "square",
            "x": {
                "field": "x"
            },
            "y": {
                "field": "y"
            },
            "width": 1,
            "height": 1,
            "fillColor": {
                "scale": "heat_color",
                "field": "cnt"
            }
        }
    }
]

Notice that the data x and y location values do not reference a scale. The location values are the values of the SQL query, transformed using extension functions.

The fill color of the square uses the heat_color scale to determine the color used to represent the data item.

Quantize scales are similar to linear scales, except they use a discrete rather than continuous range. The continuous input domain is divided into uniform segments based on the number of values in the output range.

"scales": [
    {
        "name": "heat_color",
        "type": "quantize",
        "domain": [
            10000.0,
            1000000.0
        ],
        "range": [ "#0d0887", "#2a0593", "#41049d", "#5601a4", "#6a00a8",
                   "#7e03a8", "#8f0da4", "#a11b9b", "#b12a90", "#bf3984",
                   "#cb4679", "#d6556d", "#e16462", "#ea7457", "#f2844b",
                   "#f89540", "#fca636", "#feba2c", "#fcce25", "#f7e425", "#f0f921"
        ],
        "default": "#0d0887",
        "nullValue": "#0d0887"
    }
]

A heatmap shows a continuous input domain divided into uniform segments based on the number of values in the output range. This is a quantize scales type. In the example, dollar amounts between $10,000 and $1 million are uniformly divided among 21 range values, where the larger amounts are represented by brighter colors.

Values outside the domain and null values are rendered as dark blue, #0d0887.

Source Code

Advanced Chart Type Tutorial Directory Structure

index.html
/js
  browser-connector.js
  vegaspec.js
  vegademo.js

HTML

Advanced Chart Type Tutorial index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>OmniSci</title>
    <meta charset="UTF-8">
  </head>
  <body>
    <script src="js/browser-connector.js"></script>
    <script src="js/vegaspec.js"></script>
    <script src="js/vegademo.js"></script>

    <script>
    document.addEventListener('DOMContentLoaded', init, false);
    </script>
  </body>
</html>

JavaScript

Advanced Chart Type Tutorial vegademo.js

function init() {
  var vegaOptions = {}
  var connector = new MapdCon()
    .protocol("http")
    .host("my.host.com")
    .port("6273")
    .dbName("omnisci")
    .user("omnisci")
    .password("changeme")
    .connect(function(error, con) {
      con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
        if (error) {
          console.log(error.message);
        }
        else {
          var blobUrl = `data:image/png;base64,${result.image}`
          var body = document.querySelector('body')
          var vegaImg = new Image()
          vegaImg.src = blobUrl
          body.append(vegaImg)
        }
      });
    });
}

Advanced Chart Type Tutorial vegaspec.js

const exampleVega = {
  "width": 733,
  "height": 530,
  "data": [
    {
      "name": "heatmap_query",
      "sql": "SELECT rect_pixel_bin(conv_4326_900913_x(lon), -13847031.457875465, -7451726.712679257, 733, 733) as x,
                     rect_pixel_bin(conv_4326_900913_y(lat), 2346114.147993467, 6970277.197053557, 530, 530) as y,
                     SUM(amount) as cnt
                     FROM contributions
                     WHERE (lon >= -124.39000000000038 AND lon <= -66.93999999999943) AND
                           (lat >= 20.61570573311549 AND lat <= 52.93117449504004) AND
                           amount > 0 AND
                           recipient_party = 'R'
                           GROUP BY x, y"
    }
  ],
  "scales": [
    {
      "name": "heat_color",
      "type": "quantize",
      "domain": [
        10000.0,
        1000000.0
      ],
      "range": [ "#0d0887", "#2a0593", "#41049d", "#5601a4", "#6a00a8",
                 "#7e03a8", "#8f0da4", "#a11b9b", "#b12a90", "#bf3984",
                 "#cb4679", "#d6556d", "#e16462", "#ea7457", "#f2844b",
                 "#f89540", "#fca636", "#feba2c", "#fcce25", "#f7e425", "#f0f921"
      ],
      "default": "#0d0887",
      "nullValue": "#0d0887"
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {
        "data": "heatmap_query"
      },
      "properties": {
        "shape": "square",
        "x": {
          "field": "x"
        },
        "y": {
          "field": "y"
        },
        "width": 1,
        "height": 1,
        "fillColor": {
          "scale": "heat_color",
          "field": "cnt"
        }
      }
    }
  ]
};

Using Polys Marks Type

Source code can be found at the end of this tutorial.

This tutorial introduces you to the Polys Type marks, which uses an implicit polygon data table format. The visualization is a map of zip codes color-coded according to average contribution amount. The data table encodes polygons representing zip code areas.

See the Poly Map with Backend Rendering charting example for a programmatic rendering of this visualization.

The following data property extracts the average contribution amount from the contributions_donotmodify data table, omitting rows that do not have a contribution amount:

"data": [
  {
    "name": "polys",
    "format": "polys",
    "sql": "SELECT zipcodes.rowid,AVG(contributions_donotmodify.amount) AS avgContrib FROM contributions_donotmodify,zipcodes WHERE contributions_donotmodify.amount IS NOT NULL AND contributions_donotmodify.contributor_zipcode = zipcodes.ZCTA5CE10 GROUP BY zipcodes.rowid ORDER BY avgContrib DESC"
  }
]

When working with polygon data, the "format": "polys" property must be specified.

The scales specification scales x values to the visualization area width and y values to the height. A color scale, polys_fillColor is also specified that linearly scales nine contribution amount ranges to nine colors:

"scales": [
  {
    "name": "x",
    "type": "linear",
    "domain": [-19646150.75527339, 19646150.755273417],
    "range": "width"
  },
  {
    "name": "y",
    "type": "linear",
    "domain": [-3071257.58106188, 10078357.267122284],
    "range": "height"
  },
  {
    "name": "polys_fillColor",
    "type": "linear",
    "domain": [0, 325, 650, 975, 1300, 1625, 1950, 2275, 2600],
    "range": ["#115f9a", "#1984c5", "#22a7f0", "#48b5c4", "#76c68f", "#a6d75b", "#c9e52f", "#d0ee11", "#d0f400"],
    "default": "green"
  }
]

Zip code areas for which average contribution amounts are not specified by the domain are color-coded green.

The marks property specifies visually encoding the data from the polys data table as polygons:

"marks": [
  {
    "type": "polys",
    "from": { "data": "polys" },

       ... elided ...

    }
  }
]

Polygon x and y vertex locations are transformed to the visualization area using the x and y scales.

"marks": [
  {
       ... elided ...

    "properties": {
      "x": {
        "scale": "x",
        "field": "x"
      },
      "y": {
        "scale": "y",
        "field": "y"
      },
       ... elided ...
    }
  }
]

The x and y polygon vertex locations are implicitly encoded in the data table as described in Polys Type.

Polygon fill color color-codes the average contribution amount, avgContrib, linearly scaled by the polys_fillColor scale:

"marks": [
  {
       ... elided ...

    "properties": {

       ... elided ...

      "fillColor": {
        "scale": "polys_fillColor",
        "field": "avgContrib"
      },

       ... elided ...

    }
  }
]

Finally, the marks property specifies the polygon border width and color, and line join constraints:

"marks": [
  {
       ... elided ...

    "properties": {

       ... elided ...

      "strokeColor": "white",
      "strokeWidth": 0,
      "lineJoin": "miter",
      "miterLimit": 10
    }
  }
]

Source Code

Working with Polys Tutorial Directory Structure

index.html
/js
  browser-connector.js
  vegaspec.js
  vegademo.js

HTML

Working with Polys Tutorial index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>OmniSci</title>
    <meta charset="UTF-8">
  </head>
  <body>
    <script src="js/browser-connector.js"></script>
    <script src="js/vegaspec.js"></script>
    <script src="js/vegademo.js"></script>

    <script>
    document.addEventListener('DOMContentLoaded', init, false);
    </script>
  </body>
</html>

JavaScript

Working with Polys Tutorial vegademo.js

function init() {
  var vegaOptions = {}
  var connector = new MapdCon()
    .protocol("http")
    .host("my.host.com")
    .port("6273")
    .dbName("omnisci")
    .user("omnisci")
    .password("changeme")
    .connect(function(error, con) {
      con.renderVega(1, JSON.stringify(exampleVega), vegaOptions, function(error, result) {
        if (error) {
          console.log(error.message);
        }
        else {
          var blobUrl = `data:image/png;base64,${result.image}`
          var body = document.querySelector('body')
          var vegaImg = new Image()
          vegaImg.src = blobUrl
          body.append(vegaImg)
        }
      });
    });
}

Working with Polys Tutorial vegaspec.js

const exampleVega = {
  "width": 1004,
  "height": 336,
  "data": [
    {
      "name": "polys",
      "format": "polys",
      "sql": "SELECT zipcodes.rowid,AVG(contributions_donotmodify.amount) AS avgContrib FROM contributions_donotmodify,zipcodes WHERE contributions_donotmodify.amount IS NOT NULL AND contributions_donotmodify.contributor_zipcode = zipcodes.ZCTA5CE10 GROUP BY zipcodes.rowid ORDER BY avgContrib DESC"
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": [
        -19646150.75527339,
        19646150.755273417
      ],
      "range": "width"
    },
    {
      "name": "y",
      "type": "linear",
      "domain": [
        -3071257.58106188,
        10078357.267122284
      ],
      "range": "height"
    },
    {
      "name": "polys_fillColor",
      "type": "linear",
      "domain": [
        0, 325, 650, 975,
        1300, 1625, 1950, 2275, 2600
      ],
      "range": [
        "#115f9a", "#1984c5", "#22a7f0", "#48b5c4",
        "#76c68f", "#a6d75b", "#c9e52f", "#d0ee11", "#d0f400"
      ],
      "default": "green",
      "nullValue": "#CACACA"
    }
  ],
  "marks&quoquot;: [
    {
      "type": "polys",
      "from": {
        "data": "polys"
      },
      "properties": {
        "x": {
          "scale": "x",
          "field": "x"
        },
        "y": {
          "scale": "y",
          "field": "y"
        },
        "fillColor": {
          "scale": "polys_fillColor",
          "field": "avgContrib"
        },
        "strokeColor": "white",
        "strokeWidth": 0,
        "lineJoin": "miter",
        "miterLimit": 10
      }
    }
  ]
};

Vega Accumulator

Accumulation works by aggregating data per pixel during a backend render. Data is accumulated for every pixel for every shape rendered. Accumulation rendering is activated through color scales – scales that have colors defined for their range.

Note: Currently, only the COUNT aggregation function is supported.

This topic describes accumulation rendering and provides some implementation examples. The data source used here – a table called contributions – contains information about political donations made in the New York City area, including party affiliation, the amount of the donation, and location of the donor.

There are three accumulation modes:

  • Density mode

  • Blend mode

  • Percentage mode

Density Mode

Density accumulation performs a count aggregation by pixel. It allows you to color a pixel by normalizing the count and applying a color to it, based on a color scale. In Heavy Immerse, if you open or create a Pointmap chart, you can toggle density accumulation on and off by using the Density Gradient attribute. For more information, see Pointmap.

Note: Blend and percentage accumulation are not currently available in Heavy Immerse.

The density mode examples use the following base code:

{
  "width": 714,
  "height": 535,
  "data": [
        {
          "name": "table",
          "sql": "SELECT conv_4326_900913_x(lon) as x,  conv_4326_900913_y(lat) as y,amount,rowid FROM contributions WHERE (conv_4326_900913_x(lon) between -8274701.640628147 and -8192178.083370286) AND (conv_4326_900913_y(lat) between 4946220.843530051 and 5008055.72186748) LIMIT 2000000",
          "dbTableName": "contributions"
        }
  ],
  "scales": [
        {
          "name": "x",
          "type": "linear",
          "domain": [-8274701.640628147,-8192178.083370286],
          "range": "width"
        },
        {
          "name": "y",
          "type": "linear",
          "domain": [4946220.843530051,5008055.72186748],
          "range": "height"
        },
        {
          "name": "pointcolor",
          "type": "linear",
          "domain": [100,10000],
          "range": ["blue","red"],
          "clamp": true
        }
  ],
  "marks": [
        {
          "type": "points",
          "from": {"data": "table"},
          "properties": {
                "x": {"scale": "x","field": "x"},
                "y": {"scale": "y","field": "y"},
                "fillColor": {"scale": "pointcolor","field": "amount"},
                "size": {"value": 2}
          }
        }
  ]
}

This code generates the following image:

All points are rendered with a size of 2 and colored according to the contribution amount:

  • $100 or less is colored blue.

  • $10,000 or more is colored red.

  • Anything in between is colored somewhere between blue and red, depending on the contribution. Amounts closer to $100 are more blue, and amounts closer to $10,000 are more red.

The examples that follow adjust the pointcolor scale and show the effects of various adjustments. Any changes made to Vega code are isolated to that scale definition.

Density accumulation can be activated for any scale that takes as input a continuous domain (linear, sqrt, pow, log, and threshold scales) and outputs a color range. In the following code snippet, the density accumulator has been added to the linear pointcolor scale:

{
  "name": "pointcolor",
  "type": "linear",
  "domain": [0.0,1.0],
  "range": ["blue","red"],
  "clamp": true,
  "accumulator": "density",
  "minDensityCnt": 1,
  "maxDensityCnt": 100
}

The final color at a pixel is determined by normalizing the per-pixel aggregated counts and using that value in the scale function to calculate a color. The domains of density accumulation scales should be values between 0 and 1 inclusive, referring to the normalized values between 0 and 1. The normalization is performed according to the minDensityCnt and maxDensityCnt properties. After normalization, minDensityCnt refers to 0 and maxDensityCnt refers to 1 in the domain. In this case, 0 in the domain equates to a per-pixel count of 1, and 1 in the domain equates to a per-pixel count of 100.

minDensityCnt and maxDensityCnt are required properties. They can have explicit integer values, or they can use keywords that automatically compute statistical information about the per-pixel counts. Currently available keywords are:

  • min

  • max

  • 1stStdDev

  • 2ndStdDev

  • -1stStdDev

  • -2ndStdDev

If you change the color scale to the following:

{
  "name": "pointcolor",
  "type": "linear",
  "domain": [0.01,1.0],
  "range": ["blue","red"],
  "clamp": true,
  "accumulator": "density",
  "minDensityCnt": "min",
  "maxDensityCnt": "max"
}

The minimum aggregated count of all the pixels is used as the minDensityCnt, and the maximum aggregated count used as the maxDensityCnt. This results in the following:

Notice that the area with the most overlapping points is in the upper east side of Manhattan.

Now, use +/- 2 standard deviations for your counts:

{
  "name": "pointcolor",
  "type": "linear",
  "domain": [0.01,1.0],
  "range": ["blue","red"],
  "clamp": true,
  "accumulator": "density",
  "minDensityCnt": "-2ndStdDev",
  "maxDensityCnt": "2ndStdDev"
}

This produces the following:

In this example, the scale is changed to a threshold scale, and the colors are adjusted to create a more interesting image:

{
  "name": "pointcolor",
  "type": "threshold",
  "domain": [0.111, 0.222, 0.333, 0.444, 0.555, 0.666, 0.777, 0.888],
  "range": ["rgba(17,95,154,1)", "rgba(25,132,197,1)", "rgba(34,167,240,1)", "rgba(72,181,196,1)", "rgba(118,198,143,1)", "rgba(166,215,91,1)", "rgba(201,229,47,1)", "rgba(208,238,17,1)", "rgba(208,244,0,1)"],
  "clamp": true,
  "accumulator": "density",
  "minDensityCnt": "-2ndStdDev",
  "maxDensityCnt": "2ndStdDev"
}

This results in:

Note: You can mix and match explicit values and keywords for minDensityCnt and maxDensityCnt. However, if your min value is greater than your max value, your image might look inverted.

Blend Mode

Blend accumulation works only with ordinal scales. This accumulation type blends the per-category colors set by an ordinal scale so that you can visualize which categories are more or less prevalent in a particular area.

The following Vega code colors the points according to the value in the recipient_party column:

{
  "width": $width,
  "height": $height,
  "data": [
        {
          "name": "table",
          "sql": "SELECT conv_4326_900913_x(lon) as x,  conv_4326_900913_y(lat) as y,recipient_party,rowid FROM contributions WHERE (conv_4326_900913_x(lon) between $minXBounds and $maxXBounds) AND (conv_4326_900913_y(lat) between $minYBounds and $maxYBounds) LIMIT 2000000"
        }
  ],
  "scales": [
        {
          "name": "x",
          "type": "linear",
          "domain": [
                $minXBounds,
                $maxXBounds
          ],
          "range": "width"
        },
        {
          "name": "y",
          "type": "linear",
          "domain": [
                $minYBounds,
                $maxYBounds
          ],
          "range": "height"
        },
        {
          "name": "pointcolor",
          "type": "ordinal",
          "domain": ["R", "D"],
          "range": ["red", "blue"],
          "default": "green"
        }
  ],
  "marks": [
        {
          "type": "points",
          "from": {
                "data": "table"
          },
          "properties": {
                "x": {
                  "scale": "x",
                  "field": "x"
                },
                "y": {
                  "scale": "y",
                  "field": "y"
                },
                "fillColor": {
                  "scale": "pointcolor",
                  "field": "recipient_party"
                },
                "size": {
                  "value": 5
                }
          }
        }
  ]
}

This results in the following chart:

Each point is colored according to recipient party. Values of R (republican) are colored red, D (democrat) are colored blue, and everything else is colored green.

To activate blend accumulation, add the "accumulator": "blend" property to an ordinal scale.

{
  "name": "pointcolor",
  "type": "ordinal",
  "domain": ["R", "D"],
  "range": ["red", "blue"],
  "default": "green",
  "accumulator": "blend"
}

This generates the following chart:

Activating blend accumulation shows you where one party is more dominant in a particular area. The COUNT aggregation is now being applied for each category, and the colors associated with each category are blended according to the final percentage of each category per pixel.

Note: Unlike in density mode, a field property is required in mark properties that reference blend accumulator scales.

Percentage Mode

Percentage (pct) mode can help you visualize how prevalent a specific category is based on a percantage. Any scale can be used in percentage mode, but the domain values must be between 0 and 1, where 0 is 0% and 1 is 100%.

Using the political donations database, you can determine where the recipient_party of “R” (republican) is more prevalent.

Here’s the color scale:

{
  "name": "pointcolor",
  "type": "threshold",
  "domain": [0.33, 0.66],
  "range": ["blue", "purple", "red"],
  "accumulator": "pct",
  "pctCategory": "R"
}

And the resulting image:

Using the threshold scale, anything colored blue is between 0%-33% republican, purple is 33%-66% republican, and red is 66%-100% republican.

pctCategory is a required property for percentage mode and can be numeric or a string. A string refers to a string value from a dictionary-encoded column.

You can modify the example to use a numeric value for pctCategory. First, modify the SQL in the Vega to select the contribution amount for each data point:

"SELECT conv_4326_900913_x(lon) as x, conv_4326_900913_y(lat) as y,amount,rowid FROM contributions WHERE (conv_4326_900913_x(lon) between -8274701.640628147 and -8192178.083370286) AND (conv_4326_900913_y(lat) between 4946220.843530051 and 5008055.72186748) LIMIT 2000000"

Now use the amount as the driving field for the pct accumulator scale:

"fillColor": {"scale": "pointcolor","field": "amount"},

Now, change the pct scale to the following:

{
  "name": "pointcolor",
  "type": "threshold",
  "domain": [0.33, 0.66],
  "range": ["blue", "purple", "red"],
  "accumulator": "pct",
  "pctCategory": 1000
}

This results in the following output, showing where thousand-dollar contributions are most prevalent:

You can use the pctCategoryMargin property to buffer numeric pctCategory values, so you can use a range for the numeric category.

Using Transform Aggregation

You can create Vega-based visualizations with render properties that are driven by aggregated statistics. You can use Vega transform aggregation and formula expressions to automate the process of gathering statistical information about a rendered query. By doing so, you do not have to run an SQL prequery to get the information, thereby reducing the time it takes to process and render a chart.

The following examples show how to use transforms in Vega to do the following:

  • Render a heatmap that is colored using dynamic statistics of the bins

  • Create a geo pointmap with different transform modes

NOTE: You can see Vega examples in the OmniSci Vega Editor. For more information about the OmniSci Vega engine, see Try Vega.

Rendering a Heatmap Colored by Bin Statistics

The following heatmap example demonstrates the benefits of Vega transforms for performance and reducing redundancy:

  • First, the example shows using an SQL expression to render a heatmap, as well as an additional expression to color the hexagonal bins according to the min and max of the cnt value of the aggregated bins from the query.

  • Then, you will see how to render the heatmap and color the bins directly in Vega by using source data definitions and performing aggregation transforms on that data, decreasing chart rendering time and redundancy.

SQL Queries

The following is a typical SQL query used for rendering a hexagonal heatmap:

SELECT
        reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as x,
        reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as y,
        count(*) as cnt
FROM tweets_nov_feb
WHERE ((lon >= -157 AND lon <= 157) AND (lat >= -63 AND lat <= 81))
GROUP BY x, y;

To color the hexagonal bins according to the min and max of the cnt value of the bins from the query, you need to run a prequery to gather these statistics manually. Here, this is done using a subquery SQL statement:

    SELECT
             min(cnt), max(cnt)
    FROM (SELECT
       reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as x,
       reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as y,
       count(*) as cnt
FROM tweets_nov_feb
WHERE ((lon >= -157 AND lon <= 157) AND (lat >= -63 AND lat <= 81))
GROUP BY x, y
    );

The values returned from this query can then be embedded in the Vega code to color the heatmap bins. Notice that the second query does an aggregation over the query, effectively running the query twice.

To avoid the redundancy and expense of running the query twice, you can instead specify the aggregation in Vega.

Render the Heatmap in Vega

The following Vega code renders the heatmap colored by aggregated statistics using transforms.

{
  "width": 897,
  "height": 647,
  "data": [
        {
          "name": "heatmap_query",
          "sql": "SELECT reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as x, reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),conv_4326_900913_x(-157),conv_4326_900913_x(157),conv_4326_900913_y(lat),conv_4326_900913_y(-63),conv_4326_900913_y(81),9.9667,11.5085,0,0,897,647) as y, count(*) as cnt FROM tweets_nov_feb WHERE ((lon >= -157 AND lon <= 157) AND (lat >= -63 AND lat <= 81)) GROUP BY x, y"
        },
        {
          "name": "heatmap_stats",
          "source": "heatmap_query",
          "transform": [
                {
                  "type": "aggregate",
                  "fields": ["cnt", "cnt"],
                  "ops":    ["min", "max"],
                  "as":     ["mincnt", "maxcnt"]
                }
          ]
        }
  ],
  "scales": [
        {
          "name": "heat_color",
          "type": "quantize",
          "domain": {"data": "heatmap_stats", "fields": ["mincnt", "maxcnt"]},
          "range": ["#115f9a", "#1984c5", "#22a7f0", "#48b5c4", "#76c68f",
                                "#a6d75b", "#c9e52f", "#d0ee11", "#d0f400"
          ]
        }
  ],
  "marks": [
        {
          "type": "symbol",
          "from": {
                "data": "heatmap_query"
          },
          "properties": {
                "shape": "hexagon-horiz",
                "xc": {
                  "field": "x"
                },
                "yc": {
                  "field": "y"
                },
                "width": 9.9667,
                "height": 11.5085,
                "fillColor": {
                  "scale": "heat_color",
                  "field": "cnt"
                }
          }
        }
  ]
}

The data section named heatmap_stats has a source data table defined by the "source": "heatmap_query" line:

"name": "heatmap_stats",
"source": "heatmap_query",

The "heatmap_stats" data takes as input the "heatmap_query" data, which is the data supplied by the SQL query. Use the source data type to apply intermediary steps or expressions (transforms) to the input source data.

For information about syntax and requirements for the source and transform properties, see the Data property.

To color the data according to the range of values defined by two standard deviations from the mean, edit the "heatmap_stats" section as follows to:

  • Aggregate the minimum, maximum, average, and sampled standard deviation of the count column.

  • Use formula expressions to calculate values that are two standard deviations from the average.

{
   "name": "heatmap_stats",
   "source": "heatmap_query",
   "transform": [
     {
       "type": "aggregate",
       "fields": ["cnt", "cnt", "cnt", "cnt"],
       "ops":    ["min", "max", "avg", "stddev"],
       "as":     ["mincnt", "maxcnt", "avgcnt", "stdcnt"]
     },
     {
       "type": "formula",
       "expr": "max(mincnt, avgcnt-2*stdcnt)",
       "as": "mincnttouse"
     },
     {
       "type": "formula",
       "expr": "min(maxcnt, avgcnt+2*stdcnt)",
       "as": "maxcnttouse"
     }
   ]
 }

Then, reference these values in the scale domain:

{
   "name": "heat_color",
   "type": "quantize",
   "domain": {"data": "heatmap_stats", "fields": ["mincnttouse", "maxcnttouse"]},
   "range": ["#115f9a", "#1984c5", "#22a7f0", "#48b5c4", "#76c68f",
             "#a6d75b", "#c9e52f", "#d0ee11", "#d0f400"
   ]
 }

Performing these calculations in Vega improves performance because the SQL query is only run once and the aggregated statistics are done “on the fly.” Because the query is not repeated in a statistical prequery step, you can reduce the full render time by half by performing the statistics step in Vega at render time.

Creating a Geo Pointmap Using Transforms to Drive Color and Size

This section shows how to use Vega tranforms to drive the color and size of points in a geo pointmap. Specifically, it show examples using the following aggregation transforms:

distinct: An array of distinct values from an input data column.

median: The median of an input data column.

quantile: An array of quantile separators; operates on numeric columns and takes the following pameters:

  • numQuantiles: The number of contiguous intervals to create; returns the separators for the intervals. The number of separators equals numQuantiles - 1.

  • includeExtrema: Whether to include min and max values (extrema) in the resulting separator array. The size of the resulting separator array will be numQuantiles + 1.

As with the heatmap example described earlier, using Vega transforms eliminate the need for an SQL prequery and significantly improves performance for dynamic operations.

The examples that follow use a Twitter dataset to create a geo pointmap.

For more information about the aggregate functions used in these examples, see transforms in the Vega Data reference.

Distinct

In the following example, the size of the points in a geo pointmap are defined by the numeric range two standard deviations from the average number of followers of the input data. The color of the points is driven by the distinct languages of the input data. To calculate the distinct languages, you could run a prequery using DISTINCT and then populate a Vega color scale with the results. However, the query would need to be run before every render update if the distinct data is meant to be dynamic, which would be very costly.

With the distinct Vega transform, this can be performed when evaluating the Vega code in the backend, so you do not need to run the prequery. This can improve performance considerably.

{
  "width": $width,
  "height": $height,
  "data": [
    {
      "name": "table",
      "sql": "SELECT conv_4326_900913_x(lon) as x,conv_4326_900913_y(lat) as y,followers,lang,rowid FROM tweets_2017_may WHERE (lon between $minLon and $maxLon AND lat between $minLat and $maxLat) LIMIT 200000"
    },
    {
      "name": "xformtable",
      "source": "table",
      "transform": [
        {
          "type": "aggregate",
          "fields": ["followers", "followers", "followers", "followers", "lang"],
          "ops":    ["min", "max", "avg", "stddev", "distinct"],
          "as":     ["minfol", "maxfol", "avgfol", "stdfol", "distinctlang"]
        },
        {
          "type": "formula",
          "expr": "max(minfol, avgfol-2*stdfol)",
          "as": "minfoltouse"
        },
        {
          "type": "formula",
          "expr": "min(maxfol, avgfol+2*stdfol)",
          "as": "maxfoltouse"
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": [
        $minXBounds,
        $maxXBounds
      ],
      "range": "width"
    },
    {
      "name": "y",
      "type": "linear",
      "domain": [
        $minYBounds,
        $maxYBounds
      ],
      "range": "height"
    },
    {
      "name": "size",
      "type": "linear",
      "domain": {"data": "xformtable", "fields": ["minfoltouse", "maxfoltouse"]},
      "range": [
        1, 20
      ],
      "clamp": true
    },
    {
      "name": "color",
      "type": "ordinal",
      "domain": {"data": "xformtable", "field": "distinctlang"},
      "range": [
        "blue", "red", "green", "yellow", "magenta", "purple", "teal"
      ]
    }
  ],
  "marks": [
    {
      "type": "points",
      "from": {
        "data": "table"
      },
      "properties": {
        "x": {
          "scale": "x",
          "field": "x"
        },
        "y": {
          "scale": "y",
          "field": "y"
        },
        "fillColor": {
          "scale": "color",
          "field": "lang"
        },
        "size": {
          "scale": "size",
          "field": "followers"
        }
      }
    }
  ]
}

This Vega code results in this image:

Median

Outliers in a dataset can significantly skew statistics such as AVG and STDDEV. To mitigate this, you can use median and quantile to create a more meaningful probability distribution of the data. Median and quantiles are computed dynamically when Vega is evaluated and can be used to drive different render properties.

The following hexmap example uses median to drive the color of the hex bins. Notice in the final render that roughly half of the bins are colored red, and the other half are blue.

{
  "width": $width,
  "height": $height,
  "data": [
    {
      "name": "heatmap_query",
      "sql": "SELECT reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),$minXBounds,$maxXBounds,conv_4326_900913_y(lat),$minYBounds,
	  $maxYBounds,9.931506849315069,11.467916305821335,0,0,$width,$height) as x,reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),$minXBounds,$maxXBounds,
	  conv_4326_900913_y(lat),$minYBounds,$maxYBounds,9.931506849315069,11.467916305821335,0,0,$width,$height) as y, count(*) as color FROM tweets_2017_may 
	  WHERE ((lon >= $minLon AND lon <= $maxLon) AND (lat >= $minLat AND lat <= $maxLat)) GROUP BY x, y"
    },
    {
      "name": "heatmap_statistics",
      "source": "heatmap_query",
      "transform": [
        {
          "type": "aggregate",
          "fields": ["color"],
          "ops":    ["median"],
          "as":     ["mediancolor"]
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "heat_color",
      "type": "threshold",
      "domain": {"data": "heatmap_statistics", "field": "mediancolor"},
      "range": [
        "blue", "red"
      ]
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {
        "data": "heatmap_query"
      },
      "properties": {
        "shape": "hexagon-horiz",
        "xc": {
          "field": "x"
        },
        "yc": {
          "field": "y"
        },
        "width": 9.931506849315069,
        "height": 11.467916305821335,
        "fillColor": {
          "scale": "heat_color",
          "field": "color"
        }
      }
    }
  ]
}

Quantile

The quantile function takes two additional parameters:

  • numQuantiles is the number of contiguous intervals to create and returns the separators for the intervals. The number of returned separators is numQuantiles - 1.

  • includeExtrema is a true or false value indicating whether to include the extrema (min and max) in the resulting separator array. If true, the number of returned values is numQuantiles + 1.

To see how a quantile works, consider a query that results in this set of values for "followers":

{3, 6, 7, 8, 8, 10, 13, 15, 16, 20}

With a quantile operator defined as {"type": "quantile", "numQuantiles": 4}, the result of the operator would be the following array:

[7, 9, 15]

25% of the data has less than 7 followers, 25% has between 7 and 9, 25% has between 9 and 15, and 25% has more than 15.

With a quantile operator defined as {"type": "quantile", "numQuantiles": 4, "includeExtrema": true}, the result of the operator would be the following array:

[3, 7, 9, 15, 20].

With "includeExtrema" == true, the min and max are included in the resulting array, so 25% of the data has between 3 and 7 followers, 25% has between 7 and 9, 25% has between 9 and 15, and 25% has between 15 and 20.

The following Vega code snippet gets the octiles (8-quantiles) and sextiles (6-quantiles) of a column called "followers":

...
{
 "name": "xformtable",
 "source": "table",
 "transform": [
   {
     "type": "aggregate",
     "fields": ["followers", "followers"],
     "ops":    [{"type": "quantile", "numQuantiles": 8}, {"type": "quantile", "numQuantiles": 6}],
     "as":     ["octile_fol", "sextile_fol"]
   }
 ]
}
...

For more information about quantiles, see: https://en.wikipedia.org/wiki/Quantile.

Here is a more complete example using sextiles. Notice in the resulting image approximately the same number of hexagons appears in each of the six quantile groups colored blue to red, from left to right.

{
  "width": $width,
  "height": $height,
  "data": [
    {
      "name": "heatmap_query",
      "sql": "SELECT reg_hex_horiz_pixel_bin_x(conv_4326_900913_x(lon),$minXBounds,$maxXBounds,conv_4326_900913_y(lat),
	  $minYBounds,$maxYBounds,9.931506849315069,11.467916305821335,0,0,$width,$height) as x,reg_hex_horiz_pixel_bin_y(conv_4326_900913_x(lon),
      $minXBounds,$maxXBounds,conv_4326_900913_y(lat),$minYBounds,$maxYBounds,9.931506849315069, 11.467916305821335,0,0,$width,$height) as y, 
      count(*) as color FROM tweets_2017_may WHERE ((lon >= $minLon AND lon <= $maxLon) AND (lat >= $minLat AND lat <= $maxLat)) GROUP BY x, y"
    },
    {
      "name": "heatmap_statistics",
      "source": "heatmap_query",
      "transform": [
        {
          "type": "aggregate",
          "fields": ["x"],
          "ops":    [{"type": "quantile", "numQuantiles": 6}],
          "as":     ["sextilex"]
        }
      ]
    }
  ],
  "scales": [
     {
      "name": "heat_color",
      "type": "threshold",
      "domain": {"data": "heatmap_statistics", "field": "sextilex"},
      "range": [
        "rgb(0, 0, 255)", "rgb(51, 0, 204)", "rgb(102, 0, 153)", "rgb(153, 0, 102)", "rgb(204, 0, 51)", "rgb(255, 0, 0)"
      ]
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {
        "data": "heatmap_query"
      },
      "properties": {
        "shape": "hexagon-horiz",
        "xc": {
          "field": "x"
        },
        "yc": {
          "field": "y"
        },
        "width": 9.931506849315069,
        "height": 11.467916305821335,
        "fillColor": {
          "scale": "heat_color",
          "field": "x"
        }
      }
    }
  ]
}

Improving Rendering with SQL Extensions

Marks defined in Vega specify how to render data-backed geometric primitives for a visualization. Because these are visual primitives, the default units for defining position and size are in pixels. Pixel units usually are not directly representable by the data space, so the driving data must be mapped to pixel space to be used effectively. In many cases, this data space-to-pixel space mapping can be handled with scales. However, in a number of instances, particularly in geo-related cases, you want to size the primitives in world space units, such as meters. These units cannot be easily converted to pixel units using Vega scales.

This tutorial describes how to use available SQL extension functions in Vega to map meters to pixels, thereby improving map rendering.

Let's look at a basic example. The following uses a public polical contributions dataset, and draws circles for the points positioned using the GPS location of the contributor. The circles are colored by the recipient's political party affiliation and sized to be 10 pixels in diameter:

{
  "width": 1146,
  "height": 1116,
  "data": [
    {
      "name": "pointmap",
      "sql": "SELECT lon, lat, recipient_party, rowid FROM fec_contributions_oct WHERE (lon BETWEEN -119.49268182426508 AND -76.518508633361) AND (lat BETWEEN 21.99999999999997 AND 53.999999999999716) LIMIT 2000000"
    }
  ],
  "scales": [
    {
      "name": "pointmap_fillColor",
      "type": "ordinal",
      "domain": ["D","R","I"],
      "range": ["deepskyblue","crimson","gold"],
      "default": "peru",
      "nullValue": "gainsboro"
    }
  ],
  "projections": [
    {
      "name": "merc",
      "type": "mercator",
      "bounds": {
        "x": [-119.49268182426508,-76.518508633361],
        "y": [21.99999999999997,53.999999999999716]
      }
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {"data": "pointmap"},
      "properties": {
        "xc": {"field": "lon"},
        "yc": {"field": "lat"},
        "fillColor": {"scale": "pointmap_fillColor","field": "recipient_party"},
        "shape": "circle",
        "width": 10,
        "height": 10
      },
      "transform": {"projection": "merc"}
    }
  ]
}

The resulting render, composited over a basemap courtesy of Mapbox, looks like this:

Because the circles are sized using pixels, if you zoom in, the circles stay sized at a fixed 10 pixels. The size of the dots does not stay relative to the area of the map originally covered:

...

"projections": [
  {
    "name": "merc",
    "type": "mercator",
    "bounds": {
      "x": [-112.67762110616854,-112.15822455336946],
      "y": [40.30629722096336,40.69091660556256]
    }
  }
],

...

The resulting render in this case looks like this:

To keep the size of the points relative to an area on the map, you need to define the size of the pixels in meters. Currently, Vega does not provide a scale that maps meters in a mercator-projected space to pixel units. To bypass this limitation, you can use an OmniSci extension function that performs meters-to-pixels conversion using a mercator-projected space.

For scalar columns, such as lon/lat, use the following:

  • convert_meters_to_merc_pixel_width

  • convert_meters_to_merc_pixel_height

For geo POINT columns, you use:

  • convert_meters_to_pixel_width

  • convert_meters_to_pixel_height

Because the extension functions can only return scalar values, each dimension (width and height) must have its own extension function.

To apply these functions to the previous example, add these extension functions to your SQL code, and use the results of the extension functions to determine the width and height of the circles. The following example sizes the points to 1 km in diameter:

javascript
{
  "width": 1146,
  "height": 1116,
  "data": [
    {
      "name": "pointmap",
      "sql": "SELECT lon, lat, convert_meters_to_merc_pixel_width(1000, lon, lat, -119.49268182426508, -76.518508633361, 1146, 1) as width
	  convert_meters_to_merc_pixel_height(1000, lon, lat, 21.99999999999997, 53.999999999999716, 1116, 1) as height, recipient_party as color, 
	  rowid FROM fec_contributions_oct WHERE (lon BETWEEN -119.49268182426508 AND -76.518508633361) AND
	  (lat BETWEEN 21.99999999999997 AND 53.999999999999716) LIMIT 2000000"
    }
  ],
  "scales": [
    {
      "name": "pointmap_fillColor",
      "type": "ordinal",
      "domain": ["D","R","I"],
      "range": ["deepskyblue","crimson","gold"],
      "default": "peru",
      "nullValue": "gainsboro"
    }
  ],
  "projections": [
    {
      "name": "merc",
      "type": "mercator",
      "bounds": {
        "x": [-119.49268182426508,-76.518508633361],
        "y": [21.99999999999997,53.999999999999716]
      }
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {"data": "pointmap"},
      "properties": {
        "xc": {"field": "lon"},
        "yc": {"field": "lat"},
        "fillColor": {"scale": "pointmap_fillColor","field": "recipient_party"},
        "shape": "circle",
        "width": {"field": "width"},
        "height": {"field": "height"}
      },
      "transform": {"projection": "merc"}
    }
  ]
}

Note the differences in this Vega code compared to the earlier example; two projections were added to the SQL code:

  • convert_meters_to_merc_pixel_width(1000, lon, lat, -119.49268182426508, -76.518508633361, 1146, 1) as width

  • convert_meters_to_merc_pixel_height(1000, lon, lat, 21.99999999999997, 53.999999999999716, 1116, 1) as height

This converts 1 km to a pixel value in width/height based on the current view of a mercator-projected map.

The width/height calculated here is now used to drive the width/height of the circle using this JSON in the Vega mark:

"width": {"field": "width"},
"height": {"field": "height"}

The resulting render looks like this:

Now, if you zoom in, the size of the points stays relative to the map:

...

"projections": [
  {
    "name": "merc",
    "type": "mercator",
    "bounds": {
      "x": [-112.67762110616854,-112.15822455336946],
      "y": [40.30629722096336,40.69091660556256]
    }
  }
],

...

...with the following resulting render:

The following code zooms in a bit more:

...

"projections": [
  {
    "name": "merc",
    "type": "mercator",
    "bounds": {
      "x": [-112.52569969159018,-112.37518840098163],
      "y": [40.527435942877986,40.63875135019538]
    }
  }
],

...

and results in the following render:

Notice that the WHERE clause of the SQL filters out points not in view:

... WHERE (lon BETWEEN -119.49268182426508 AND -76.518508633361) AND (lat BETWEEN 21.99999999999997 AND 53.999999999999716) ...

However, when zoomed in far enough, a point can disappear, even though its associated circle is still in view. This occurs because only the center of the circle is checked in this filter and not the whole rendered circle.

To illustrate this, consider a render of the following query:

SELECT lon, lat, convert_meters_to_merc_pixel_width(1000, lon, lat, -112.49286564041051, -112.34235434980197, 1146, 1) as width,
convert_meters_to_merc_pixel_height(1000, lon, lat, 40.53172840847458, 40.64303667787769, 1116, 1) as height, recipient_party, 
rowid FROM fec_contributions_oct WHERE (lon BETWEEN -112.49286564041051 AND -112.34235434980197) AND (lat BETWEEN 40.53172840847458 
AND 40.64303667787769) LIMIT 2000000

The resulting image looks like this:

If you pan to the left, the blue dot disappears, although it should still be visible. Here is the query:

SELECT lon, lat, convert_meters_to_merc_pixel_width(1000, lon, lat, -112.48984490770093, -112.33933361709238, 1146, 1) as width,
convert_meters_to_merc_pixel_height(1000, lon, lat, 40.5315287650088, 40.642837366411584, 1116, 1) as height, recipient_party, 
rowid FROM fec_contributions_oct WHERE (lon BETWEEN -112.48984490770093 AND -112.33933361709238) AND (lat BETWEEN 40.5315287650088 
AND 40.642837366411584) LIMIT 2000000

...and the resulting image:

To alleviate this issue, you can use the extension functions as a filter:

  • For scalar columns (such as lon/lat): is_point_size_in_merc_view

  • For geo POINT columns: is_point_size_in_view

These extension functions take as arguments the parameters of the view along with the point size in meters, and return true if the point is in the defined view, or false otherwise.

Refering back to the original example, replace the WHERE clause with its is_point_size_in_merc_view equivalent:

SELECT lon, lat, convert_meters_to_merc_pixel_width(1000, lon, lat, -112.49286956397471, -112.34028759586535, 1146, 1) as width,
convert_meters_to_merc_pixel_height(1000, lon, lat, 40.531763370983555, 40.64460162657784, 1116, 1) as height, recipient_party, 
rowid FROM fec_contributions_oct WHERE is_point_size_in_merc_view(lon, lat, 1000, -112.49286956397471, -112.34028759586535, 
40.531763370983555, 40.64460162657784) LIMIT 2000000

This results in:

Now, pan slightly to the left again:

SELECT lon, lat, convert_meters_to_merc_pixel_width(1000, lon, lat, -112.48980727316768, -112.33722530505833, 1146, 1) as width, convert_meters_to_merc_pixel_height(1000, lon, lat, 40.531763370983555, 40.64460162657784, 1116, 1) as height, recipient_party, 
rowid FROM fec_contributions_oct WHERE is_point_size_in_merc_view(lon, lat, 1000, -112.48980727316768, -112.33722530505833, 
40.531763370983555, 40.64460162657784) LIMIT 2000000

The result is:

Notice that the blue dot now passes the filter and stays in view.

Current Limitations

  • This approach is not an accurate representation of area on a map. It provides a reasonable approximate, but more error is introduced as you approach the poles, because this approach works only in two dimensions. As you approach the poles, you would realistically see areas that are oblong and egg-shaped. However, this approach works reasonably well for most inhabitable geo locations.

  • The symbol mark types are procedurally generated and use a simple POINT primitive in the underlying graphics API. This primitive has a maximum pixel size for this primitive. The limit is graphics driver–implementation defined, but testing shows this limit to be 2000 pixels in diameter. This limit can have an effect if you zoom in tight on areas where the circles have large areas. You may see points disappear, similar to the filtering issue described earlier. This most likely occurs because the ultimate width/height generated by the convert_meters_to_pixels extension functions exceed this limit.

    As a workaround, use the legacysymbol mark type instead of symbol. The legacysymbol mark type does not render the shape procedurally, so it is not affected by this limit. The legacysymbol mark was deprecated in favor of the improved rendering performance of the procedural approach.

  • When you use extension functions in SQL, you cannot use Vega scales to do further mapping; for example, you cannot use the contribution "amount" column to drive the size of the points in meters with a Vega scale. Any additional mapping must be done in the SQL, which may not be trivial depending on the complexity of the mapping.