Vega is a visualization specification language that describes how to map your source data to your viewing area. By creating a JSON Vega specification structure, you define the data and the transformations to apply to the data to produce meaningful visualizations. The specification includes the geometric shape that represents your data, scaling properties that map the data to the visualization area, and graphical rendering properties.
HEAVY.AI uses Vega for backend rendering. Using the HEAVY.AI Connector API, the client sends the Vega JSON to the backend, which renders the visualization and returns a PNG image for display. See the HEAVY.AI Charting Examples for backend rendering examples.
The topics in this guide define and describe the HEAVY.AI implementation of Vega, and provide examples you can use as a basis for your own visualizations:
Tutorials - Introduces you to Vega specification patterns so you can start creating visualizations quickly and easily. Each tutorial has example code that demonstrates a particular feature or pattern. Tutorials start with basic Vega concepts and an introduction to the API for communication with the backend. Other tutorials provide more in-depth information about specific Vega implementations.
Reference - Describes the HEAVY.AI implementation of Vega specification syntax and associated rules. Also includes links to Vega standards and related specifications.
Code Migration - If you are upgrading to Release 5.2 or higher, you need to migrate any code that renders polygons in cached mode to dynamic poly rendering.
Try Vega - Try the HEAVY.AI Vega engine and work with various examples. See your changes to Vega code in real time.
data property
projections property
scales property
marks property
Source code is located at the end of the tutorial.
This tutorial uses the same visualization as Vega at a Glance but elaborates on the runtime environment and implementation steps. The Vega usage pattern described here applies to all Vega implementations. Subsequent tutorials differ only in describing more advanced Vega features.
This visualization maps a continuous, quantitative input domain to a continuous output range. Again, the visualization shows tweets in the EMEA region, from a tweets data table:
Backend rendering using Vega involves the following steps:
You can create the Vega specification statically, as shown in this tutorial, or programmatically. See the Poly Map with Backend Rendering charting example for a programmatic implementation. Here is the programmatic source code:
A Vega JSON specification has the following general structure:
The width and height properties define the width and height of your visualization area, in pixels:
This example uses the following SQL statement to get the tweets data:
The input data are the latitude and longitude coordinates of tweets from the tweets_nov_feb data table. The coordinates are labeled x and y for Field Reference in the marks property, which references the data using the tweets name.
The marks property specifies the graphical attributes of how each data item is rendered:
In this example, each data item from the tweets data table is rendered as a point. The points marks type includes position, fill color, and size attributes. The marks property specifies how to visually encode points according to these attributes. Points in this example are three pixels in diameter and colored blue.
Points are scaled to the visualization area using the scales property.
The following scales specification maps marks to the visualization area.
Both x and y scales specify a linear mapping of the continuous, quantitative input domain to a continuous output range. In this example, input data values are transformed to predefined width and height range values.
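To make the linear mapping concrete, the following is a minimal sketch in plain JavaScript of what a linear scale computes. It is not the backend's implementation; the function name linearScale and the domain values are hypothetical, and the 384 x 564 size is borrowed from the Vega at a Glance example.

```javascript
// Build a linear scale: maps a continuous domain [d0, d1]
// onto a continuous range [r0, r1] by linear interpolation.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  if (d0 === d1) {
    throw new Error("domain must span a non-zero interval");
  }
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical input domains; the ranges stand in for the
// "width" and "height" keywords of the visualization area.
const width = 384;
const height = 564;
const xScale = linearScale([-100, 100], [0, width]);
const yScale = linearScale([0, 50], [0, height]);
```

With these assumed domains, a data value in the middle of the domain lands in the middle of the visualization area.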
Later tutorials show how to specify data transformation using discrete domain-to-range mapping.
Use the browser-connector.js renderVega() API to communicate with the backend. The connector is layered on Apache Thrift for cross-language client communication with the server.
Follow these steps to instantiate the connector and to connect to the backend:
Include browser-connector.js, located at https://github.com/omnisci/mapd-connector/tree/master/dist, to include the MapD connector and Thrift interface APIs.
Instantiate the MapdCon() connector and set the server name, protocol information, and your authentication credentials, as described in the MapD Connector API:
Finally, call the MapD connector API connect() function to initiate a connect request, passing a callback function with a (error, success) signature as the parameter.
For example,
The connect() function generates client and session IDs for this connection instance, which are unique for each instance and are used in subsequent API calls for the session.
On a successful connection, the callback function is called. The callback function in this example calls the renderVega() function.
The MapD connector API renderVega() function sends the Vega JSON to the backend, and has the following parameters:
The backend returns the rendered Base64 image in results.image, which you can display in the browser window using a data URI.
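As a small illustration of the data URI step, the sketch below wraps a Base64 PNG payload in a data URI. The helper name toPngDataUri and the element id "chart" are hypothetical; only the data:image/png;base64, prefix is standard.

```javascript
// Turn the Base64 PNG payload returned by the backend into a data URI.
function toPngDataUri(base64Image) {
  if (typeof base64Image !== "string" || base64Image.length === 0) {
    throw new Error("expected a non-empty Base64 string");
  }
  return "data:image/png;base64," + base64Image;
}

// In a browser, the URI can be assigned to an <img> element, for example:
//   document.getElementById("chart").src = toPngDataUri(results.image);
// ("chart" is a hypothetical element id.)
```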
Getting Started Directory Structure
Getting Started index.html
Getting Started vegademo.js
Getting Started vegaspec.js
Property | Description
dbName | OmniSci database name.
host | OmniSci web server name.
password | OmniSci user password.
port | OmniSci web server port.
protocol | Communication protocol: http, https
user | OmniSci user name.

Parameter | Type | Required | Description
widgetid | number | X | Calling widget ID.
vega | string | X | Vega JSON object, as described in Step 1 - Create the Vega Specification.
options | number | | Render query options. compressionLevel: PNG compression level, from 1 (low, fast) to 10 (high, slow). Default = 3.
callback | function | | Callback function with (error, success) signature.

Return | Description
Base64 image | PNG image rendered on server
Source code can be found at the end of this tutorial.
This tutorial introduces you to the polys marks type, which uses an implicit polygon data table format. The visualization is a map of zip codes color-coded according to average contribution amount. The data table encodes polygons representing zip code areas.
See the Poly Map with Backend Rendering charting example for a programmatic rendering of this visualization.
The following data property extracts the average contribution amount from the contributions_donotmodify data table, omitting rows that do not have a contribution amount:
When working with polygon data, the "format": "polys" property must be specified.
The scales specification scales x values to the visualization area width and y values to the height. A color scale, polys_fillColor, is also specified that linearly scales nine contribution amount ranges to nine colors:
Zip code areas for which average contribution amounts are not specified by the domain are color-coded green.
The marks property specifies visually encoding the data from the polys data table as polygons:
Polygon x and y vertex locations are transformed to the visualization area using the x and y scales.
Polygon fill color color-codes the average contribution amount, avgContrib, linearly scaled by the polys_fillColor scale:
Finally, the marks property specifies the polygon border width and color, and line join constraints:
Working with Polys Tutorial Directory Structure
Working with Polys Tutorial index.html
Working with Polys Tutorial vegademo.js
Working with Polys Tutorial vegaspec.js
These tutorials introduce you to common Vega specification patterns so you can start creating visualizations quickly and easily. Each tutorial uses a code example that demonstrates a particular Vega feature or pattern. The first tutorial covers basic Vega concepts, serves as the foundation for some of the tutorials that follow, and introduces you to the API for communication with the backend. Other tutorials provide more in-depth information about specific Vega implementations.
Use these tutorials to gain a better understanding of Vega by experimenting with them to create new visualizations on your own HEAVY.AI system and database. You can also use Try Vega to make adjustments to Vega code and see real-time changes in charts.
For information about the Vega specification syntax and properties, see Reference.
Because the tutorials focus on the Vega specification, they use a simple client implementation that sends the render request to the HEAVY.AI server and handles the response:
Common index.html
Common vegademo.js
On a connection error, you can view the error message to determine the cause of the error. To determine the cause of Vega specification errors, catch and handle the renderVega() exception.
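One way to handle both kinds of failure in the same place is sketched below. It assumes a connector object whose renderVega() takes the parameters listed in the tutorials (widget ID, Vega JSON string, options, callback) and returns the image in result.image; the wrapper name renderSafely and the two handler parameters are hypothetical.

```javascript
// Calls renderVega() on a connector-like object and routes every failure,
// whether thrown or reported through the callback, to one error handler.
function renderSafely(connector, widgetId, vegaSpec, options, onImage, onError) {
  try {
    connector.renderVega(
      widgetId,
      JSON.stringify(vegaSpec), // the vega parameter is passed as a string
      options,
      (error, result) => {
        if (error) {
          onError(error);
          return;
        }
        onImage(result.image);
      }
    );
  } catch (exception) {
    // Assumption: Vega specification errors surface as a thrown exception.
    onError(exception);
  }
}
```

Passing the connector in as a parameter keeps the sketch independent of how the connection was created and makes it easy to exercise with a stand-in object.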
Source code is located at the end of this tutorial.
This tutorial introduces you to Symbol Type marks by creating a heatmap visualization. The heatmap shows contribution level to the Republican party within the continental United States:
The contribution data are obtained using the following SQL query:
The visualization uses a Symbol Type marks type to represent each data item in the heatmap_query data table:
The marks properties property specifies the symbol shape, which is a square. Each square has a width and height of one pixel.
Notice that the data x and y location values do not reference a scale. The location values are the values of the SQL query, transformed using extension functions.
The fill color of the square uses the heat_color scale to determine the color used to represent the data item.
Quantize scales are similar to linear scales, except they use a discrete rather than continuous range. The continuous input domain is divided into uniform segments based on the number of values in the output range.
A heatmap shows a continuous input domain divided into uniform segments based on the number of values in the output range. This is a quantize scales type. In the example, dollar amounts between $10,000 and $1 million are uniformly divided among 21 range values, where the larger amounts are represented by brighter colors.
Values outside the domain and null values are rendered as dark blue, #0d0887.
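The quantize behavior described above can be sketched in a few lines of JavaScript. This is an illustration of the idea rather than the rendering engine's code: the function name quantizeScale is hypothetical, and three made-up colors stand in for the 21 range values. Only the domain bounds and the #0d0887 fallback come from the tutorial.

```javascript
// Quantize scale: split a continuous domain into range.length uniform
// segments and return the range value of the segment a value falls in.
function quantizeScale(domain, range, fallback) {
  const [min, max] = domain;
  const step = (max - min) / range.length;
  return (value) => {
    const missing = value === null || value === undefined || Number.isNaN(value);
    if (missing || value < min || value > max) {
      return fallback; // nulls and out-of-domain values
    }
    // Clamp so that value === max falls into the last segment.
    const index = Math.min(range.length - 1, Math.floor((value - min) / step));
    return range[index];
  };
}

// Hypothetical three-color range; the tutorial uses 21 colors.
const heatColor = quantizeScale(
  [10000, 1000000],
  ["#5c01a6", "#e16462", "#f0f921"],
  "#0d0887"
);
```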
Advanced Chart Type Tutorial Directory Structure
Advanced Chart Type Tutorial index.html
Advanced Chart Type Tutorial vegademo.js
Advanced Chart Type Tutorial vegaspec.js
The x and y polygon vertex locations are implicitly encoded in the data table as described in the polys marks type reference.
The renderVega() function sends the exampleVega JSON structure described in the tutorials. Getting Started with Vega covers Vega library dependencies and the renderVega() function in more detail.
Vega at a Glance - Provides an overview of Vega and a simple example to visualize tweets.
Getting Started with Vega - Maps a continuous, quantitative input domain to a continuous output range. Uses the same visualization as Vega at a Glance, but elaborates on the runtime environment and implementation steps.
Getting More Insight - Builds on the Getting Started with Vega tutorial by color-coding tweets according to language.
Advanced Chart Type - Introduces Symbol Type marks by creating a heatmap visualization of political donations.
Working with Polys - Shows how to use the polys marks type, which uses an implicit polygon data table format. The visualization in the tutorial is a map of zip codes, color-coded according to average political contribution amount.
- Describes the three modes of accumulation rendering and provides some implementation examples. The data used contains information about political donations, including party affiliation, the amount of the donation, and location of the donor.
- Shows how to create Vega-based visualizations with render properties that are driven by aggregated statistics. Use Vega transform aggregation and formula expressions to automate the process of gathering statistical information about a rendered query.
- Describes how to use SQL extension functions in Vega to map meters to pixels and improve map rendering.
In Release 5.2, the polygon cache for rendering was deprecated and will be completely removed in a subsequent release. Any poly cache rendering in your code must be reworked to use dynamic poly rendering. This topic describes how to migrate poly cache code to dynamic, cacheless rendering.
Caching poly buffers has two main drawbacks, both of which have a significant impact on memory:
The cache cannot span multiple GPUs.
The entire table is cached, regardless of the filter in the query.
In contrast, dynamic poly rendering can utilize all available GPUs and only uses the data that passes any filters.
To move to dynamic poly rendering, determine if you are using the poly cache, and then adjust your code if needed.
HEAVY.AI strongly recommends that you also remove the render-poly-cache-bytes option from your server configuration file, if used. This will help prevent startup warnings or errors in subsequent releases of HEAVY.AI.
It may not be immediately obvious if you are using poly cache rendering because no flag is used to enable it. Instead, poly cache rendering is enabled according to the SQL code used in a poly-formatted data block of Vega code. If the query ultimately projects or results in a POLYGON/MULTIPOLYGON column, the cache is not used and no code changes are required.
However, if the query does not project a POLYGON/MULTIPOLYGON column, but projects a rowid column, then poly caching is in use.
The following Vega code has a poly query that uses the cache system:
The sql property of the polys data block, which uses "format": "polys", projects a rowid column. This activates poly caching, even though the geo column (heavyai_geo in this case) is used in the filter.
To convert this Vega code to dynamic, cacheless rendering, change the SQL query:
You can keep rowid in this query and use it for later hit-testing. Its presence does not affect dynamic rendering. See the simple projection query in Example 1 for an example.
Alternatively, you can check the INFO logs to see if poly caching is used. Look for a LOG statement similar to the following:
or
If you find either of these, then a poly cache render query is used.
To migrate from poly cache to dynamic poly rendering, you output a POLYGON/MULTIPOLYGON column in the SQL query of a poly-formatted data block of your Vega code. In the following examples, the heavyai_geo column is a MULTIPOLYGON.
Example 1 - Simple projection query
Cached (only rowid is output):
Dynamic (heavyai_geo is now projected, along with rowid):
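As a hedged illustration of the simple projection case, the sketch below shows a cached and a dynamic data block side by side, plus a deliberately naive check of the SELECT list. The table name zipcodes, the state filter, and the helper names are hypothetical; the check only handles a single flat SELECT and is not how the server decides whether to use the cache.

```javascript
// Cached form: only rowid is projected (hypothetical table and filter).
const cachedData = {
  name: "polys",
  format: "polys",
  sql: "SELECT rowid FROM zipcodes WHERE state = 'CA'"
};

// Dynamic form: the geo column is projected along with rowid.
const dynamicData = {
  name: "polys",
  format: "polys",
  sql: "SELECT rowid, heavyai_geo FROM zipcodes WHERE state = 'CA'"
};

// Naive helper: returns the projected column expressions of a flat SELECT.
function selectList(sql) {
  const match = /^\s*select\s+(.*?)\s+from\s/is.exec(sql);
  return match ? match[1].split(",").map((c) => c.trim().toLowerCase()) : [];
}

// Flags queries that project rowid without projecting the geo column.
function looksLikePolyCacheQuery(sql, geoColumn) {
  const columns = selectList(sql);
  return columns.includes("rowid") && !columns.includes(geoColumn.toLowerCase());
}
```

A check like this can help when auditing a large set of stored specifications, but queries with joins, subqueries, or aliases still need to be reviewed by hand.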
Example 2 - Join query using a WITH subquery
Cached (no geo column is projected in the outer query of the join):
Dynamic (the geo column is projected in place of rowid):
A Vega specification is a JSON-formatted structure that describes a visualization, which can be sent to the backend for rendering. This document introduces the Vega specification syntax and provides links to topics that provide more details about each Vega property.
For examples of using Vega, see Tutorials. You can also see and edit examples in Try Vega.
The Vega specification includes properties for describing the source data, mapping the data to the visualization area, and visual encoding. The root Vega specification supported by OmniSci has the following JSON structure and top-level properties:
Property names are case-sensitive.
Property values are typed.
Unsupported properties are ignored by the rendering engine.
Source code can be found at the end of this tutorial.
This tutorial builds on the Getting Started with Vega tutorial by color-coding tweets according to language:
Tweets in English are blue.
Tweets in French are orange.
Tweets in Spanish are green.
All other tweets are light or dark gray.
To highlight language in the visualization, the example specifies the language column query in the Vega data property, and associates language with color.
The Scales property maps the language abbreviation string to a color value. Because we want to map discrete domain values to discrete range values, we specify a color scale with an ordinal type scale:
You can specify a default color value for values not specified in the domain, and a separate color for data items with a value of null. In this example, tweets in languages other than English, Spanish, or French are colored gray and tweets with a language value of null are colored light gray (#cacaca).
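The lookup an ordinal scale performs can be sketched as follows. This is a plain JavaScript illustration, not the engine's code: the function name ordinalScale, the option names defaultValue and nullValue, and the language abbreviations "en", "fr", and "es" are assumptions made for the example. The colors follow the tutorial.

```javascript
// Ordinal scale: map discrete domain values to discrete range values,
// with separate fallbacks for unknown values and for null.
function ordinalScale({ domain, range, defaultValue, nullValue }) {
  const lookup = new Map(domain.map((key, i) => [key, range[i]]));
  return (value) => {
    if (value === null || value === undefined) {
      return nullValue;
    }
    return lookup.has(value) ? lookup.get(value) : defaultValue;
  };
}

// Assumed language abbreviations; colors as described in the tutorial.
const color = ordinalScale({
  domain: ["en", "fr", "es"],
  range: ["blue", "orange", "green"],
  defaultValue: "gray",
  nullValue: "#cacaca"
});
```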
Similar to using x and y scales to map Marks property x and y fields to the visualization area, you can scale the fillColor property to the visualization area.
In previous examples the fill color of points representing tweets was statically specified as blue:
This example uses a Value Reference to specify the fill color:
The fillColor references the color scale and performs a lookup on the current language value, from the color data table field.
Getting More Insight Tutorial Directory Structure
Getting More Insight Tutorial index.html
Getting More Insight Tutorial vegademo.js
Getting More Insight Tutorial vegaspec.js
Source code is located at the end of this topic.
This tutorial provides an overview of Vega and a simple example to visualize tweets in the EMEA geographic region:
The Vega JSON structure maps data to geometric primitives.
A first task is to specify the data source. You can either define data statically or use a SQL query. This example uses a SQL query to get tweet geolocation information from a tweets database:
The resulting SQL columns can be referenced in other parts of the specification to drive visualization elements. In this example, the projected columns are goog_x and goog_y, which are renamed x and y, and rowid, which is a requirement for hit-testing.
The Vega specification for this example includes the following top-level properties:
height and width, which define the height and width of the visualization area.
data, which defines the data source. The SQL data described above is defined here with the label tweets for later referencing.
marks, which describes the geometric primitives used to render the visualization.
scales, which are referenced by marks to map input domain values to appropriate output range values.
Here is the full Vega specification used in this example:
The following sections describe the top-level Vega specification properties.
The width and height properties define a visualization area 384 pixels wide and 564 pixels high:
The scales position encoding properties map the marks into this visualization area.
The marks property defines visualization geometric primitives. The OmniSci Vega implementation defines the following primitive types:
lines - A line
points - A point
polys - A polygon
symbol - A geometric symbol, such as a circle or square
Each primitive type has a set of properties that describe how the primitive is positioned and styled.
This example uses points to represent the tweets data:
Points support the following properties; not all are included in the example:
x - The x position of the point in pixels.
y - The y position of the point in pixels.
z - The depth coordinate of the point in pixels.
fillColor - The color of the point.
fillOpacity - The opacity of the fill, from transparent (0) to opaque (1).
opacity - The opacity of the point as a whole, from transparent (0) to opaque (1).
size - The diameter of the point in pixels.
The points in the example reference the tweets SQL data and use the x and y columns from the SQL to drive the position of the points. The positions are appropriately mapped to the visualization area using scales as described in Scale Input Domain to Output Range. The fill color is set to blue and point size is set to three pixels.
The scales definition maps data domain values to visual range values, where the domain property determines the input domain for the scale. See the d3-scale reference for background information about how scaling works.
This example uses linear scales to map Mercator-projected coordinates into pixel coordinates for rendering.
The x and y scales use linear interpolation to map point x- and y-coordinates to the width and height of the viewing area. The width and height properties are predefined keywords that equate to the range [0, <current width>] and [0, <current height>].
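Putting the properties described above together, a specification along these lines could be assembled as a JavaScript object and serialized for the backend. Treat it as a sketch: the scale domain values are made up for illustration, the table name tweets_nov_feb is taken from the Getting Started tutorial, and the exact property layout should be checked against the Reference.

```javascript
// A minimal Vega specification sketch: points drawn from a SQL query,
// positioned with two linear scales. Domain values are hypothetical.
const vegaSpec = {
  width: 384,
  height: 564,
  data: [
    {
      name: "tweets",
      // rowid is projected because hit-testing requires it.
      sql: "SELECT goog_x AS x, goog_y AS y, rowid FROM tweets_nov_feb"
    }
  ],
  scales: [
    { name: "x", type: "linear", domain: [-3650000, 7410000], range: "width" },
    { name: "y", type: "linear", domain: [-5780000, 10675000], range: "height" }
  ],
  marks: [
    {
      type: "points",
      from: { data: "tweets" },
      properties: {
        x: { scale: "x", field: "x" },
        y: { scale: "y", field: "y" },
        fillColor: "blue",
        size: { value: 3 }
      }
    }
  ]
};

// The specification is sent to the backend as a JSON string.
const vegaJson = JSON.stringify(vegaSpec);
```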
After completing the Vega specification, you send the JSON structure to the backend for rendering.
The following steps summarize the rendering and visualization sequence:
Instantiate the MapdCon object for connecting to the backend.
Call the connect method with server information, user credentials, and data table name.
Provide the renderVega() callback function to connect() and include the Vega specification as a parameter.
Display the returned PNG image in your client browser window.
OmniSci uses Apache Thrift for cross-language client communication with the backend. Include browser-connector.js, the connector API, which includes Thrift interface libraries and the renderVega() function:
The following example encapsulates the connect, render request, and response handling sequence:
This example demonstrated the basic concepts for understanding and using Vega. To become comfortable with Vega, try this example using your own OmniSci instance, changing the MapdCon() parameters to match your host environment and database.
As you gain experience with Vega and begin writing your own applications, see the Reference for detailed information about Vega code.
Vega at a Glance index.html
Marks visually encode data using geometric primitives.
General JSON format:
A Vega marks specification includes the following properties:
Each marks property is associated with the specified data property.
Marks are rendered in marks property array order.
Marks property values can be constants or data references. You can use the scales property to transform marks property values to the visualization area.
Apply the x and y scales to the x and y database table columns to scale the data to the visualization area width and height. For example:
Marks must include a type property that specifies the geometric primitive to use to render the data.
Specify x and y coordinate values using either constants, or domain and range values of a data reference. If the from property is not specified, the x and y properties fields must be constants.
Define a point with size, color, and opacity:
Associate the points geometric primitive with tweets data items.
Specifying the data format property as lines causes the rendering engine to assume a lines database table layout and to extract line-related columns from the table.
Specify x and y coordinate values using either constants, or domain and range values of a data reference. If the from property is not specified, the x and y properties fields must be constants.
The polys type renders data as a polygon.
When the data format property is polys, the rendering engine assumes a polys database table layout and extracts the poly-related columns from the table. A polys database table layout implies that the first data column is the vertex x- and y-positions. The vertices are interleaved x and y values, such that vertex[0] = vert0.x, vertex[1] = vert0.y, vertex[2] = vert1.x, and vertex[3] = vert1.y, for example. The next three positions of a polys database table are the triangulated indices, and line loop and drawing information for unpacking multiple, associated polygons that can be packed as a single data item.
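To illustrate the interleaved vertex layout, the following sketch unpacks a flat array of alternating x and y values into vertex objects. The function name unpackVertices and the sample coordinates are hypothetical, and the sketch covers only the vertex column, not the index or line loop columns.

```javascript
// Unpack interleaved vertex data: [x0, y0, x1, y1, ...] -> [{x, y}, ...]
function unpackVertices(interleaved) {
  if (interleaved.length % 2 !== 0) {
    throw new Error("expected an even number of values");
  }
  const vertices = [];
  for (let i = 0; i < interleaved.length; i += 2) {
    vertices.push({ x: interleaved[i], y: interleaved[i + 1] });
  }
  return vertices;
}

// Hypothetical triangle: vertex[0] = vert0.x, vertex[1] = vert0.y, and so on.
const triangle = unpackVertices([0, 0, 10, 0, 10, 5]);
```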
The symbol marks type renders data as one of the supported shapes.
Currently, in symbol mark types, strokes are not visible beneath other marks, regardless of opacity settings.
Specify x and y coordinate values using either constants or domain and range values of a data reference. If the from property is not specified, the x and y properties fields must be specified using constant values.
symbol Examples
The following example defines symbol mark types including fill, stroke, and general opacity properties:
The from field specifies the input database table to use.
Example
Use the tweets database table for marks input data.
If from is not specified, the data source is implicitly a single point with the value defined in the points properties.
The properties property specifies type-dependent visual encoding that defines the position and appearance of mark instances. The property value is specified using one of the Value Reference options.
Typically, a single mark instance is generated per input data element, except for polys, which uses multiple data elements to represent a line or area shape.
The following table describes the various marks properties and lists the types for which the property is valid.
A value reference describes how to specify marks properties values. The value can be a constant or data object reference:
Examples:
Statically set the point fillColor and size.
For the x marks property, apply the x scale transform to the implicit x-coordinate data column.
A field reference is either a string literal or an object. For object values, the following properties are supported:
Typically, color values are specified as a single RGB color value. To specify specific color fields or use a different color space, use one of the following color value reference formats:
Examples
Set the red and blue channels of an RGB color as constants, and use a scale transform to determine the green channel:
Use the rgb color space for the color field:
The transform object specifies any Vega projections to be applied to the mark. Each transform is specified as a key:value pair in the transform object:
The value references an existing Vega object by name.
For example, the following transform references the projection my_mercator_projection defined in the top-level Vega projections property.
Currently, the only supported transform is projection.
You can create Vega-based visualizations with render properties that are driven by aggregated statistics. You can use Vega transform aggregation and formula expressions to automate the process of gathering statistical information about a rendered query. By doing so, you do not have to run an SQL prequery to get the information, thereby reducing the time it takes to process and render a chart.
The following examples show how to use transforms in Vega to do the following:
Render a heatmap that is colored using dynamic statistics of the bins
Create a geo pointmap with different transform modes
NOTE: You can see Vega examples in the . For more information about the OmniSci Vega engine, see .
The following heatmap example demonstrates the benefits of Vega transforms for performance and reducing redundancy:
First, the example shows using an SQL expression to render a heatmap, as well as an additional expression to color the hexagonal bins according to the min and max of the cnt value of the aggregated bins from the query.
Then, you will see how to render the heatmap and color the bins directly in Vega by using source data definitions and performing aggregation transforms on that data, decreasing chart rendering time and redundancy.
The following is a typical SQL query used for rendering a hexagonal heatmap:
To color the hexagonal bins according to the min and max of the cnt value of the bins from the query, you need to run a prequery to gather these statistics manually. Here, this is done using a subquery SQL statement:
The values returned from this query can then be embedded in the Vega code to color the heatmap bins. Notice that the second query does an aggregation over the query, effectively running the query twice.
To avoid the redundancy and expense of running the query twice, you can instead specify the aggregation in Vega.
The following Vega code renders the heatmap colored by aggregated statistics using transforms.
The data section named heatmap_stats has a source data table defined by the "source": "heatmap_query" line:
The "heatmap_stats" data takes as input the "heatmap_query" data, which is the data supplied by the SQL query. Use the source data type to apply intermediary steps or expressions (transforms) to the input source data.
To color the data according to the range of values defined by two standard deviations from the mean, edit the "heatmap_stats" section to do the following:
Aggregate the minimum, maximum, average, and sampled standard deviation of the count column.
Use formula expressions to calculate values that are two standard deviations from the average.
Then, reference these values in the scale domain:
Performing these calculations in Vega improves performance because the SQL query is only run once and the aggregated statistics are done “on the fly.” Because the query is not repeated in a statistical prequery step, you can reduce the full render time by half by performing the statistics step in Vega at render time.
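The arithmetic behind the aggregate and formula steps can be sketched in plain JavaScript. This mirrors the calculation only, not the Vega transform syntax; the function names are hypothetical, and clamping the two-standard-deviation bounds to the observed min and max is an assumption made for the example.

```javascript
// Aggregate step: min, max, average, and sample standard deviation.
function summarize(values) {
  const n = values.length;
  if (n < 2) {
    throw new Error("need at least two values for a sample standard deviation");
  }
  const min = Math.min(...values);
  const max = Math.max(...values);
  const avg = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - avg) ** 2, 0) / (n - 1);
  return { min, max, avg, stddev: Math.sqrt(variance) };
}

// Formula step: a scale domain two standard deviations around the average,
// clamped to the observed extremes (the clamping is an assumption).
function twoSigmaDomain(stats) {
  return [
    Math.max(stats.min, stats.avg - 2 * stats.stddev),
    Math.min(stats.max, stats.avg + 2 * stats.stddev)
  ];
}

// Hypothetical bin counts with one large outlier.
const counts = [10, 10, 10, 10, 10, 10, 10, 10, 10, 1000];
const colorDomain = twoSigmaDomain(summarize(counts));
```

With the outlier present, the upper bound of the domain falls well below the maximum count, so the color range is not stretched by a single extreme bin.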
This section shows how to use Vega transforms to drive the color and size of points in a geo pointmap. Specifically, it shows examples using the following aggregation transforms:
distinct: An array of distinct values from an input data column.
median: The median of an input data column.
quantile: An array of quantile separators; operates on numeric columns and takes the following parameters:
numQuantiles: The number of contiguous intervals to create; returns the separators for the intervals. The number of separators equals numQuantiles - 1.
includeExtrema: Whether to include min and max values (extrema) in the resulting separator array. When true, the size of the resulting separator array is numQuantiles + 1.
As with the heatmap example described earlier, using Vega transforms eliminates the need for an SQL prequery and significantly improves performance for dynamic operations.
The examples that follow use a Twitter dataset to create a geo pointmap.
In the following example, the size of the points in a geo pointmap is defined by the numeric range two standard deviations from the average number of followers of the input data. The color of the points is driven by the distinct languages of the input data. To calculate the distinct languages, you could run a prequery using DISTINCT and then populate a Vega color scale with the results. However, the query would need to be run before every render update if the distinct data is meant to be dynamic, which would be very costly.
With the distinct Vega transform, this can be performed when evaluating the Vega code in the backend, so you do not need to run the prequery. This can improve performance considerably.
This Vega code results in this image:
Outliers in a dataset can significantly skew statistics such as AVG and STDDEV. To mitigate this, you can use median and quantile to create a more meaningful probability distribution of the data. Median and quantiles are computed dynamically when Vega is evaluated and can be used to drive different render properties.
The following hexmap example uses median to drive the color of the hex bins. Notice in the final render that roughly half of the bins are colored red, and the other half are blue.
The quantile function takes two additional parameters:
numQuantiles is the number of contiguous intervals to create and returns the separators for the intervals. The number of returned separators is numQuantiles - 1.
includeExtrema is a true or false value indicating whether to include the extrema (min and max) in the resulting separator array. If true, the number of returned values is numQuantiles + 1.
To see how a quantile works, consider a query that results in this set of values for "followers":
{3, 6, 7, 8, 8, 10, 13, 15, 16, 20}
With a quantile operator defined as {"type": "quantile", "numQuantiles": 4}, the result of the operator would be the following array:
[7, 9, 15]
25% of the data has fewer than 7 followers, 25% has between 7 and 9, 25% has between 9 and 15, and 25% has more than 15.
With a quantile operator defined as {"type": "quantile", "numQuantiles": 4, "includeExtrema": true}, the result of the operator would be the following array:
[3, 7, 9, 15, 20].
With "includeExtrema" == true, the min and max are included in the resulting array, so 25% of the data has between 3 and 7 followers, 25% has between 7 and 9, 25% has between 9 and 15, and 25% has between 15 and 20.
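The worked example can be reproduced with a short JavaScript sketch. It uses one common quantile convention: take the 1-based rank n * k / numQuantiles, average the two neighboring values when the rank is a whole number, and otherwise round the rank up. That convention is an assumption chosen because it matches the arrays shown above; the backend may compute quantiles differently on other data. The function name quantileSeparators is hypothetical.

```javascript
// Compute quantile separators for a numeric column.
function quantileSeparators(values, numQuantiles, includeExtrema = false) {
  if (values.length === 0 || numQuantiles < 1) {
    throw new Error("need at least one value and one quantile");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const separators = [];
  for (let k = 1; k < numQuantiles; k++) {
    const rank = (n * k) / numQuantiles; // 1-based rank of the separator
    if (Number.isInteger(rank)) {
      // Whole rank: average the value at the rank and the next one.
      separators.push((sorted[rank - 1] + sorted[rank]) / 2);
    } else {
      // Fractional rank: round up to the next value.
      separators.push(sorted[Math.ceil(rank) - 1]);
    }
  }
  return includeExtrema
    ? [sorted[0], ...separators, sorted[n - 1]]
    : separators;
}

const followers = [3, 6, 7, 8, 8, 10, 13, 15, 16, 20];
const quartiles = quantileSeparators(followers, 4);
const quartilesWithExtrema = quantileSeparators(followers, 4, true);
```

Here quartiles holds numQuantiles - 1 separators and quartilesWithExtrema holds numQuantiles + 1 values, matching the counts described above.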
The following Vega code snippet gets the octiles (8-quantiles) and sextiles (6-quantiles) of a column called "followers"
:
Here is a more complete example using sextiles. Notice in the resulting image approximately the same number of hexagons appears in each of the six quantile groups colored blue to red, from left to right.
Use the Vega data property to specify the visualization data sources by providing an array of one or more data definitions. A data definition must be an object identified by a unique name, which can be referenced in other areas of the specification. Data can be statically defined inline ("values":), can reference columns from a database table using a SQL statement ("sql":), or can be loaded from an existing data set ("source":).
JSON format:
The data specification has the following properties:
Load discrete x- and y column values using the values
database table type:
Use the sql
database table type to load latitude and longitude coordinates from the tweets_data
database table:
Use the source
type to use the data set defined in the sql
data section and perform aggregation transforms:
The format
property indicates that data preprocessing is needed before rendering the query result. If this property is not specified, data is assumed to be in row-oriented JSON format.
The "short form", where format
is a single string, which must be either polys
or lines
. This form is used for all polygon rendering, and for fast ‘in-situ’ rendering of LINESTRING data.
The "long form", where format
is an object containing other properties, as follows:
For lines
, each row in the query corresponds to a single line.
This lines format
example of interleaved
data renders ten lines, all of the same length.
In this lines format
example of sequential
data, x
only stores points corresponding to the x coordinate and y
only stores points corresponding to the y coordinate. Make sure that columns only contain a single coordinate if using multiple columns in sequential layout.
The following example shows a fast "in-situ" LINESTRING format
:
The following example shows a polys format
:
The database table source property key-value pair specifies the location of the data and defines how the data is loaded:
Transforms process a data stream to calculate new aggregated statistic fields and derive new data streams from them. Currently, transforms are specified only as part of a source
data definition. Transforms are defined as an array of specific transform types that are executed in sequential order. Each element of the array must be an object and must contain a type
property. Currently, two transform types are supported: aggregate
and formula
.
If true
, automatically adds rowid column(s) to the SQL statement where appropriate, enabling the data block for hit-testing using the get_result_row_for_pixel
endpoint.
If false
, the data block is not automatically hit-test enabled, and any later get_result_row_for_pixel
calls return empty hit-test results.
If the enableHitTesting property is not present, the following legacy behavior is used as the default:
If the SQL statement represents a projection query, hit-testing is enabled if a rowid column is explicitly projected.
If the SQL statement represents an aggregate query, hit-testing is always enabled.
This legacy behavior will likely be deprecated and removed in an upcoming version of OmniSci. At that point, the enableHitTesting property will be required for activating hit-test support for the data.
Marks defined in Vega specify how to render data-backed geometric primitives for a visualization. Because these are visual primitives, the default units for defining position and size are in pixels. Pixel units usually are not directly representable by the data space, so the driving data must be mapped to pixel space to be used effectively. In many cases, this data space-to-pixel space mapping can be handled with scales. However, in a number of instances, particularly in geo-related cases, you want to size the primitives in world space units, such as meters. These units cannot be easily converted to pixel units using Vega scales.
This tutorial describes how to use available SQL extension functions in Vega to map meters to pixels, thereby improving map rendering.
Let's look at a basic example. The following uses a public political contributions dataset, and draws circles for the points positioned using the GPS location of the contributor. The circles are colored by the recipient's political party affiliation and sized to be 10 pixels in diameter:
Because the circles are sized using pixels, if you zoom in, the circles stay sized at a fixed 10 pixels. The size of the dots does not stay relative to the area of the map originally covered:
The resulting render in this case looks like this:
To keep the size of the points relative to an area on the map, you need to define the size of the points in meters. Currently, Vega does not provide a scale that maps meters in a mercator-projected space to pixel units. To bypass this limitation, you can use an OmniSci extension function that performs meters-to-pixels conversion using a mercator-projected space.
For scalar columns, such as lon/lat, use the following:
For geo POINT columns, you use:
Because the extension functions can only return scalar values, each dimension (width and height) must have its own extension function.
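As a rough illustration of what a meters-to-pixels conversion in a mercator-projected space involves, here is a minimal Python sketch for the width dimension. It is not the backend implementation. It assumes a spherical Web Mercator model, and the argument order (meters, lon, lat, view minimum longitude, view maximum longitude, image width, minimum pixel width) is inferred from the SQL call shown in this tutorial rather than from a documented signature. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # spherical Web Mercator radius, in meters (assumption)


def meters_to_merc_pixel_width(meters, lon, lat, min_lon, max_lon, img_width, min_width):
    """Approximate the pixel width of a ground distance centered at (lon, lat).

    `lon` is unused in this spherical approximation; it is kept only so the
    argument order mirrors the SQL call used in this tutorial.
    """
    # Mercator stretches ground distances by 1 / cos(latitude).
    merc_meters = meters / math.cos(math.radians(lat))
    # Width of the visible view, in mercator meters.
    view_merc_width = EARTH_RADIUS_M * math.radians(max_lon - min_lon)
    pixels = merc_meters / view_merc_width * img_width
    # Assumption: the final argument is a lower bound on the returned size.
    return max(pixels, min_width)


# 1000 km measured at the equator and at 60 degrees latitude, whole-world view.
at_equator = meters_to_merc_pixel_width(1_000_000, 0, 0, -180, 180, 1000, 1)
at_60_north = meters_to_merc_pixel_width(1_000_000, 0, 60, -180, 180, 1000, 1)
print(at_equator, at_60_north)
```

The same distance covers about twice as many pixels at 60 degrees latitude as at the equator, which is the mercator stretching that a plain Vega scale cannot express.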
To apply these functions to the previous example, add these extension functions to your SQL code, and use the results of the extension functions to determine the width and height of the circles. The following example sizes the points to 1 km in diameter:
Note the differences in this Vega code compared to the earlier example; two projections were added to the SQL code:
convert_meters_to_merc_pixel_width(1000, lon, lat, -119.49268182426508, -76.518508633361, 1146, 1)
as width
convert_meters_to_merc_pixel_height(1000, lon, lat, 21.99999999999997, 53.999999999999716, 1116, 1)
as height
This converts 1 km to a pixel value in width/height based on the current view of a mercator-projected map.
The width/height calculated here is now used to drive the width/height of the circle using this JSON in the Vega mark
:
The resulting render looks like this:
Now, if you zoom in, the size of the points stays relative to the map:
...with the following resulting render:
The following code zooms in a bit more:
and results in the following render:
Notice that the WHERE
clause of the SQL filters out points not in view:
However, when zoomed in far enough, a point can disappear, even though its associated circle is still in view. This occurs because only the center of the circle is checked in this filter and not the whole rendered circle.
To illustrate this, consider a render of the following query:
The resulting image looks like this:
If you pan to the left, the blue dot disappears, although it should still be visible. Here is the query:
...and the resulting image:
To alleviate this issue, you can use the extension functions as a filter:
These extension functions take as arguments the parameters of the view along with the point size in meters, and return true
if the point is in the defined view, or false
otherwise.
This results in:
Now, pan slightly to the left again:
The result is:
Notice that the blue dot now passes the filter and stays in view.
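The difference between the two filters can be sketched in Python. This is an illustration under stated assumptions, not the extension functions themselves: it uses a spherical Web Mercator model, pads the view by the circle radius (a bounding-box test that is slightly generous near the corners), and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # spherical Web Mercator radius, in meters (assumption)


def merc_x(lon):
    """Longitude in degrees to mercator x, in meters."""
    return EARTH_RADIUS_M * math.radians(lon)


def merc_y(lat):
    """Latitude in degrees to mercator y, in meters."""
    return EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))


def center_in_view(lon, lat, min_lon, max_lon, min_lat, max_lat):
    """Plain WHERE-clause style filter: only the circle center is tested."""
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def circle_in_view(lon, lat, diameter_m, min_lon, max_lon, min_lat, max_lat):
    """Size-aware filter: the view is padded by the circle radius."""
    radius = (diameter_m / 2) / math.cos(math.radians(lat))  # mercator meters
    x, y = merc_x(lon), merc_y(lat)
    return (merc_x(min_lon) - radius <= x <= merc_x(max_lon) + radius
            and merc_y(min_lat) - radius <= y <= merc_y(max_lat) + radius)


view = (0.0, 10.0, 0.0, 10.0)  # min_lon, max_lon, min_lat, max_lat
# A 1 km circle whose center sits just outside the left edge of the view.
print(center_in_view(-0.001, 5.0, *view))
print(circle_in_view(-0.001, 5.0, 1000, *view))
```

The center-only test rejects the point even though part of its circle overlaps the view, which is the disappearing-dot behavior described above.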
This approach is not an accurate representation of area on a map. It provides a reasonable approximation, but more error is introduced as you approach the poles, because this approach works only in two dimensions. As you approach the poles, you would realistically see areas that are oblong and egg-shaped. However, this approach works reasonably well for most inhabitable geo locations.
As a workaround, use the legacysymbol
mark type instead of symbol
. The legacysymbol
mark type does not render the shape procedurally, so it is not affected by this limit. The legacysymbol
mark was deprecated in favor of the improved rendering performance of the procedural approach.
When you use extension functions in SQL, you cannot use Vega scales to do further mapping; for example, you cannot use the contribution "amount" column to drive the size of the points in meters with a Vega scale. Any additional mapping must be done in the SQL, which may not be trivial depending on the complexity of the mapping.
Vega projections
map longitude and latitude data to projected x
and y
coordinates. When working with geospatial data in OmniSci, you can use projections to define geographic points and regions.
General projections
property JSON format:
When you specify a projection, you must reference it in the marks property using the transform
object. For example, if you define the projection my_mercator_projection
:
you then reference it as follows:
The projections specification has the following properties:
Use Vega projection projection
alongside array columns:
Accumulation works by aggregating data per pixel during a backend render. Data is accumulated for every pixel for every shape rendered. Accumulation rendering is activated through color scales – scales that have colors defined for their range.
Note: Currently, only the COUNT aggregation function is supported.
This topic describes accumulation rendering and provides some implementation examples. The data source used here – a table called contributions
– contains information about political donations made in the New York City area, including party affiliation, the amount of the donation, and location of the donor.
There are three accumulation modes:
Density accumulation performs a count aggregation by pixel. It allows you to color a pixel by normalizing the count and applying a color to it, based on a color scale. In Heavy Immerse, if you open or create a Pointmap chart, you can toggle density accumulation on and off by using the Density Gradient attribute.
Note: Blend and percentage accumulation are not currently available in Heavy Immerse.
The density mode examples use the following base code:
This code generates the following image:
All points are rendered with a size of 2 and colored according to the contribution amount:
$100 or less is colored blue.
$10,000 or more is colored red.
Anything in between is colored somewhere between blue and red, depending on the contribution. Amounts closer to $100 are more blue, and amounts closer to $10,000 are more red.
The examples that follow adjust the pointcolor
scale and show the effects of various adjustments. Any changes made to Vega code are isolated to that scale definition.
Density accumulation can be activated for any scale that takes as input a continuous domain (linear
, sqrt
, pow
, log
, and threshold
scales) and outputs a color range. In the following code snippet, the density accumulator has been added to the linear pointcolor
scale:
The final color at a pixel is determined by normalizing the per-pixel aggregated counts and using that value in the scale function to calculate a color. The domains of density accumulation scales should be values between 0 and 1 inclusive, referring to the normalized values between 0 and 1. The normalization is performed according to the minDensityCnt
and maxDensityCnt
properties. After normalization, minDensityCnt
refers to 0 and maxDensityCnt
refers to 1 in the domain. In this case, 0 in the domain equates to a per-pixel count of 1, and 1 in the domain equates to a per-pixel count of 100.
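The normalization step can be sketched in Python as follows. This is an illustration only: clamping counts outside the min/max range to 0 or 1 is an assumption, the helper names are hypothetical, and the blue-to-red range simply mirrors the pointcolor scale used in this topic.

```python
def normalize_count(count, min_density_cnt, max_density_cnt):
    """Map a per-pixel count to [0, 1] so it can be fed to the color scale."""
    span = max_density_cnt - min_density_cnt
    if span == 0:
        return 0.0
    t = (count - min_density_cnt) / span
    # Assumption: counts outside [min, max] are clamped to the ends of the domain.
    return min(1.0, max(0.0, t))


def linear_color(t, start_rgb, end_rgb):
    """Linearly interpolate between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start_rgb, end_rgb))


BLUE, RED = (0, 0, 255), (255, 0, 0)
for count in (1, 50, 100, 250):
    t = normalize_count(count, 1, 100)  # minDensityCnt = 1, maxDensityCnt = 100
    print(count, round(t, 3), linear_color(t, BLUE, RED))
```

With these settings a pixel covered by one point is fully blue and a pixel covered by 100 or more points is fully red.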
minDensityCnt
and maxDensityCnt
are required properties. They can have explicit integer values, or they can use keywords that automatically compute statistical information about the per-pixel counts. Currently available keywords are:
min
max
1stStdDev
2ndStdDev
-1stStdDev
-2ndStdDev
If you change the color scale to the following:
The minimum aggregated count of all the pixels is used as the minDensityCnt
, and the maximum aggregated count used as the maxDensityCnt
. This results in the following:
Notice that the area with the most overlapping points is in the upper east side of Manhattan.
Now, use +/- 2 standard deviations for your counts:
This produces the following:
In this example, the scale is changed to a threshold scale, and the colors are adjusted to create a more interesting image:
This results in:
Note: You can mix and match explicit values and keywords for minDensityCnt
and maxDensityCnt
. However, if your min
value is greater than your max
value, your image might look inverted.
Blend accumulation works only with ordinal scales. This accumulation type blends the per-category colors set by an ordinal scale so that you can visualize which categories are more or less prevalent in a particular area.
The following Vega code colors the points according to the value in the recipient_party
column:
This results in the following chart:
Each point is colored according to recipient party. Values of R
(republican) are colored red, D
(democrat) are colored blue, and everything else is colored green.
To activate blend accumulation, add the "accumulator": "blend"
property to an ordinal scale.
This generates the following chart:
Activating blend accumulation shows you where one party is more dominant in a particular area. The COUNT aggregation is now being applied for each category, and the colors associated with each category are blended according to the final percentage of each category per pixel.
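One way to picture the per-pixel blend is a count-weighted average of the category colors. The Python sketch below assumes exactly that; the backend's actual blending formula is not given in this topic, and blend_pixel is a hypothetical helper.

```python
def blend_pixel(category_counts, category_colors, default_color):
    """Blend category colors for one pixel, weighted by each category's share."""
    total = sum(category_counts.values())
    if total == 0:
        return None  # nothing was rendered at this pixel
    blended = [0.0, 0.0, 0.0]
    for category, count in category_counts.items():
        color = category_colors.get(category, default_color)
        weight = count / total  # final percentage of this category at the pixel
        for channel in range(3):
            blended[channel] += color[channel] * weight
    return tuple(round(value) for value in blended)


party_colors = {"R": (255, 0, 0), "D": (0, 0, 255)}
GREEN = (0, 255, 0)  # used for every other party
print(blend_pixel({"R": 3, "D": 1}, party_colors, GREEN))
```

A pixel with three R points and one D point comes out mostly red with some blue mixed in, which is how dominance shows up in the chart.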
Note: Unlike in density mode, a field
property is required in mark
properties that reference blend accumulator scales.
Percentage (pct
) mode can help you visualize how prevalent a specific category is based on a percentage. Any scale can be used in percentage mode, but the domain values must be between 0 and 1, where 0 is 0% and 1 is 100%.
Using the political donations database, you can determine where the recipient_party
of “R” (republican) is more prevalent.
Here’s the color scale:
And the resulting image:
Using the threshold scale, anything colored blue is between 0%-33% republican, purple is 33%-66% republican, and red is 66%-100% republican.
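The percentage computation and the threshold lookup described here can be sketched in Python. The helper names are hypothetical, and the handling of a value that falls exactly on a threshold (it moves to the next bucket) is an assumption borrowed from common threshold-scale implementations.

```python
from bisect import bisect_right


def category_percentage(category_counts, pct_category):
    """Share of one category among all points accumulated at a pixel."""
    total = sum(category_counts.values())
    return category_counts.get(pct_category, 0) / total if total else 0.0


def threshold_scale(domain, range_values):
    """Build a threshold scale: N domain thresholds map to N + 1 range values."""
    if len(range_values) != len(domain) + 1:
        raise ValueError("range must have one more element than domain")
    return lambda value: range_values[bisect_right(domain, value)]


pct_color = threshold_scale([0.33, 0.66], ["blue", "purple", "red"])
pixel_counts = {"R": 3, "D": 1}
share = category_percentage(pixel_counts, "R")  # pctCategory = "R"
print(share, pct_color(share))
```

A pixel where three of four contributions went to "R" has a 75% share and therefore falls in the red bucket.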
pctCategory
is a required property for percentage mode and can be numeric or a string. A string refers to a string value from a dictionary-encoded column.
You can modify the example to use a numeric value for pctCategory. First, modify the SQL in the Vega to select the contribution amount for each data point:
Now use the amount as the driving field for the pct
accumulator scale:
Now, change the pct
scale to the following:
This results in the following output, showing where thousand-dollar contributions are most prevalent:
You can use the pctCategoryMargin
property to buffer numeric pctCategory
values, so you can use a range for the numeric category.
For information about syntax and requirements for the source
and transform
properties, see the property.
For more information about the aggregate functions used in these examples, see in the Vega reference.
For more information about quantiles, see: .
The name
property uniquely identifies a data set, and is used for reference by other Vega properties, such as the property.
This property is required for and mark types. The property has one of two forms:
See for more detailed examples.
The resulting render, composited over a basemap courtesy of , looks like this:
For scalar columns (such as lon/lat):
For geo POINT columns:
Referring back to the original example, replace the WHERE
clause with its equivalent:
The symbol
mark types are procedurally generated and use a simple POINT primitive in the underlying graphics API. This primitive has a maximum pixel size. The limit is defined by the graphics driver implementation, but testing shows it to be 2000 pixels in diameter. This limit can have an effect if you zoom in tight on areas where the circles have large areas. You may see points disappear, similar to the filtering issue described earlier. This most likely occurs because the ultimate width/height generated by the extension functions exceeds this limit.
Property
Type
Description
width
and height
unsigned integer
Visualization area width and height, in pixels. Both properties are required. Example: Set the viewing area width to 384 pixels and the height to 564 pixels:
array
Source data. The Vega data model uses tabular data, similar to a spreadsheet, organized in rows with any number of named columns. JSON format:
array
Projection data. Maps longitude and latitude data to projected x
and y
coordinates.
JSON format:
array
Data-to-visualization area mapping. Maps visually encoded data values to pixel positions with attributes, such as color. JSON format:
array
Geometric primitive used to visually encode data. JSON format:
Property
Data Type
Required
Description
string
X
Graphical marks type or shape:
points
lines
polys
symbol
object
Database table associated with the marks.
object
X
Visual encoding rules. Valid properties depend on marks type
.
object
Transforms applied to a mark.
Marks Type | Description |
points | Render marks as points. |
lines | Render marks as lines. |
polys | Render marks as a polygon. |
symbol | Render marks as a shape. |
Data Source Field
Data Type
Description
data
string
Name of the data source. The data
name must be defined in the data property.
Property
Data Type
Valid Primitive Types
Description
angle
number
symbol
Amount of rotation about the defined center of rotation. The center of rotation depends on the properties that specify the symbol location:
x
and y
: Lower-left corner.
x
and yc
: Left center.
xc
and y
: Bottom center.
xc
and yc
: Center.
Must be a numerical constant or a scale that provides numerical values.
In the following example, the triangle-down symbol is rotated 30 degrees about the downward point:
angleUnit
string
symbol
Optional. Unit of measure for the rotation of a symbol around the center of rotation, defined in angle
. Either degrees
(default) or radians
.
fillColor
color
points, polys, symbol
Fill color. Must be a scale/data reference, a string, or a color represented by a 32-bit integer or unsigned integer. See Color Value Reference.
fillOpacity
number
points, polys, symbol
The fill opacity, from transparent (0
) to opaque (1
). If used with opacity
, the values are multiplied together to determine final opacity.
height
number
symbol
Mark height, in pixels.
lineJoin
string
line, polys, symbol
Line join method:
bevel
- Corner is clipped with a flat edge
miter
- Outer edges are extended until they meet at a point
round
- Corner is rounded
miterLimit
number
line, polys, symbol
The miter limit at which to bevel a line join, in pixels.
Must be a positive number. Default = 10.0
opacity
number
all
The line opacity as a whole, from transparent (0
) to opaque (1
). If used with fillOpacity
(points
, polys
, symbol
) or strokeOpacity
(lines), the values are multiplied together to determine final opacity.
shape
string
symbol
Shape name:
circle
cross
diamond
hexagon-horiz
hexagon-vert
square
triangle-down
triangle-left
triangle-right
triangle-up
wedge
size
number
points
Graphical primitive size, in pixels. Must be a scale/data reference or a number.
stroke
color
symbol
Stroke color.
strokeColor
color
line, polys
Stroke color. Must be a scale/data reference, a string, or a color represented by a 32-bit integer or unsigned integer. See Color Value Reference.
Default color = white
strokeOpacity
number
line, polys, symbol
Stroke opacity, from transparent (0
) to opaque (1
). If used with opacity
, the values are multiplied together to determine final opacity.
strokeWidth
number
line, polys, symbol
Stroke width, in pixels. Must be a scale/data reference or a number.
width
number
symbol
Mark width, in pixels.
x
number
all
Primary x-coordinate, in pixels. Must be a scale/data reference for polys
, or a scale/data reference or a number for points
, lines
, or symbol
. See Value Reference.
x2
number
symbol
Secondary x-coordinate, in pixels. See Value Reference.
xc
number
symbol
Center x-coordinate, in pixels. Incompatible with x
and x2
. See Value Reference.
y
number
all
Primary y-coordinate, in pixels. Must be a scale/data reference for polys
, or a scale/data reference or a number for points
, lines
, or symbol
. See Value Reference.
y2
number
symbol
Secondary y-coordinate, in pixels. See Value Reference.
yc
number
symbol
Center y-coordinate, in pixels. Incompatible with y
and y2
. See Value Reference.
z
number
points, symbol
Primary depth-coordinate, in pixels. Must be a scale/data reference or a number. See Value Reference.
Name
Type
Description
value
Any
Constant value. If field
is specified, value
is ignored.
field
Perform a lookup on the current data value. The marks from
property determines the source data table and the field
name must be a column defined in the data
property.
scale
Name of a scale transform to apply to the mark. If the input is an object, it indicates a field value from which to dynamically look up the scale name and follows the Field Reference format.
Property
Type
Description
Property Name
FieldRef
Perform a lookup on the property name. This is the default operation when a field reference is a string.
Property Value Field
Data Type
Description
field
string
Name of the attribute from the data: sql
field.
colorSpace
string
Space in which the color is defined:
Hue-Chroma-Luminance color space. A version of LAB that uses polar coordinates for the AB plane. See HCL color space.
Use h, c, and l property names.
Hue, saturation, and lightness color space. See HSL and HSV color space.
Use h, s, and l property names.
Lab color space. A perceptual color space with distances based on human color judgments. The L dimension represents luminance, the A dimension represents green-red opposition and the B dimension represents blue-yellow opposition. See Lab color space.
Use l, a, and b property names.
RGB color space. See RGB color space.
Use r, g, and b property names.
Property | Data Type | Required | Description |
| string | X | User-assigned name of the projection. |
| string | X | Projection type. Currently supported types:
|
| object | Specifies the longitude and latitude bounding box for the projection. Default values:
|
Property | Data Type | Required | Description |
string | X | User-assigned database table name. |
string/object | How the data are parsed. |
string | Data source:
|
string | An array of transforms to perform on the input data. The output of the transform pipeline then becomes the value of this data set. Currently, can only be used with |
boolean | If true, automatically adds rowid column(s) to the SQL statement, which is required for hit-testing using the |
Format Property | Description |
| Marks property type: |
| Applies to Specifies This permits column extraction pertaining to line rendering and place them in a rendering buffer. The Separate x- and y-array columns are also supported. |
| (optional) Applies to Specifies how vertices are packed in the vertices column. All arrays must have the same layout:
|
Key | Value | Description |
| String | Data is loaded from an existing data set. |
| SQL statement | Data is loaded using a SQL statement. |
| JSON data | Data is loaded from static, key-value pair data definitions. |
Type | Description and Properties |
| Performs aggregation operations on input data columns to calculate new aggregated statistic fields and derive new data streams from them. The following properties are required:
|
| Evaluates a user-defined expression. The following properties are required:
Note: Currently, expressions can only be performed against outputs (as values) from prior aggregate transforms. |
The Vega scales
property maps visually encoded data values to pixel positions with attributes, such as color. See the D3 scales documentation for additional background information about scales.
General scales
property JSON format:
The scales specification is one or more arrays with the following properties:
Note: As a general rule, limit the total number of domain and range values used to a maximum of 1000. Exceeding this limit can cause an error.
Define two scales, x
and y
. For the x
scale, linearly transform input data values between -100
and 999
to the visualization area width
. For the y
scale, linearly transform input data values between 0
and 500
to the visualization area height
. The width
and height
range values are pre-defined literals that reference the width
and height
properties.
The name property uniquely identifies the scale for reference by other properties.
The type property specifies how to transform the input domain data to output range visual values. Vega supports the following transforms, categorized by quantitative, discrete, and discretizing scales:
The domain
field specifies the domain of input data values. For quantitative data, this can take the form of a two-element array.
Specify minimum and maximum input values.
For ordinal or categorical data, the domain can be an array of valid input values.
Specify valid input data languages.
Scale range specifies the set of visual values. For numeric values, the range can take the form of a two-element array with minimum and maximum values. For ordinal or quantized data, the range can be an array of desired output values, which are mapped to elements in the specified domain.
Scale ranges can be specified in the following ways:
As an array of static values: "range": [0, 500]
or "range": ['a', 'b', 'c']
.
Using pre-defined literals: "range": "width"
or "range": "height"
.
Specify a color scale that quantizes input values between 0
and 100
among five visual output colors.
Scale ranges can accept width
and height
string literals that map to the Width and Height Properties.
Specify a y
scale that linearly maps input values between 0
and 500
to the height of the visualization area.
The default
scales property specifies the output value to use when the input domain value does not map to the range.
The default
property is not applicable to the threshold
scale type, which maps domain values outside of the range to either the lowest or highest range value.
The accumulator property enables you to identify regional density of data in a layer of a backend render and apply pixel coloring based on the accumulation mode that you have defined. Each data point is rendered individually, providing an accurate representation of data distribution in a spatial setting.
Apply a density accumulator to a linear scale named pointcolor
:
The color at a pixel is determined by normalizing per-pixel aggregated counts and using that value in the scale function to calculate a color. Normalization is performed according to the required minDensityCnt
and maxDensityCnt
properties. After normalization, minDensityCnt
== 0
and maxDensityCnt
== 1
.
minDensityCnt
and maxDensityCnt
can have explicit integer values or use one of the following keywords to compute statistical information about per-pixel counts: min
, max
, -1stStdDev
, -2ndStdDev
, 1stStdDev
, 2ndStdDev
.
For more detailed examples of using accumulators, see Tutorial: Vega Accumulator.
source
: Name of an existing Vega data set to use as this data set’s source. Use in combination with a pipeline to derive new data. You can source only one existing data set.
You can use extension functions to convert distance in meters from a coordinate or point to a pixel size, and determine if a coordinate or point is located within a view defined by latitude and longitude.
expr
: An expression string to be evaluated. Expressions currently support .
Property Field
Data Type
Required
Description
name
string
X
User-defined scale name.
type
string
Scale type, which specifies the domain
-to-range
transform:
linear
: Quantitative, continuous scale that preserves proportion among data items.
log
: Quantitative scale that applies a logarithmic transform to the data.
ordinal
: Discrete domain and range scale.
pow
: Quantitative scale that applies an exponential transform to the input data.
quantize
: Quantitative, discrete scale that divides input data into segments.
sqrt
: Quantitative scale that applies a square root transform to the input data.
threshold
: Discrete scale that maps arbitrary domain subsets to discrete range values.
domain
array
Domain. Array of input interval data values.
range
string or array
Range. Array of output interval visual data values.
default
number
Default output value to use when domain value does not map to range value.
accumulator
string
Accumulation rendering type:
blend
: Blends colors by category. Works only for discrete output scales (ordinal, quantize, and threshold).
density
: Performs count aggregation per pixel and applies the supplied color based on the normalization of the per-pixel aggregated counts over a specified range. The range is determined by the required minDensityCnt
and maxDensityCnt
properties. minDensityCnt
and maxDensityCnt
can be explicit integer values or one of the following keywords that automatically compute statistical information about the per-pixel counts:
min
max
-1stStdDev
-2ndStdDev
1stStdDev
2ndStdDev
pct
: Apply a color range based on percentage accumulation for a specific category.
nullValue
number
Specify the output value to use when the input value is null
.
Type
Description
Additional Information
linear
Preserves proportional differences, where range value y can be expressed as a linear function of the domain value x: y = mx + b
.
log
Applies a logarithmic transform to the input domain value before the output range value is computed. The mapping to the range value y can be expressed as a logarithmic function of the domain value x: y = m log(x) + b
.
As log(0) = -∞
, a log scale domain must be strictly-positive or strictly-negative. The domain must not include or cross zero. A log scale with a positive domain has a well-defined behavior for positive values. A log scale with a negative domain has a well-defined behavior for negative values. For a negative domain, input and output values are implicitly multiplied by -1
. The behavior of the scale is undefined if you compute a negative value for a log scale with a positive domain, and vice versa.
log
scale values must be positive. Default = base 10
.
pow
Applies an exponential transform to the input domain value before the output range value is computed. Range value y can be expressed as a polynomial function of the domain value x: y = mx^k + b
, where k
is the exponent. Power scales also support negative domain values, and input value and resulting output value are then multiplied by -1.
Default exponent = 1
.
sqrt
A shorthand for power scales with an exponent of 0.5, indicating a square root transform.
sqrt
scale values must be positive.
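The quantitative scale formulas in the table above can be sketched in Python. This is a minimal illustration of the math (y = m f(x) + b for a transform f), not the renderer's implementation; make_scale and signed_pow are hypothetical helpers, and the domains and ranges are example values.

```python
import math


def make_scale(domain, range_, transform=lambda x: x):
    """Build a two-point quantitative scale: y = m * transform(x) + b."""
    d0, d1 = transform(domain[0]), transform(domain[1])
    r0, r1 = range_

    def scale(x):
        t = (transform(x) - d0) / (d1 - d0)  # position within the transformed domain
        return r0 + t * (r1 - r0)

    return scale


def signed_pow(exponent):
    """Power transform that keeps the sign, so negative inputs are supported."""
    return lambda x: math.copysign(abs(x) ** exponent, x)


linear_scale = make_scale([0, 500], [0, 564])           # y = mx + b
log_scale = make_scale([1, 1000], [0, 300], math.log10)  # y = m log(x) + b, base 10
sqrt_scale = make_scale([0, 100], [0, 10], signed_pow(0.5))  # pow with exponent 0.5

print(linear_scale(250), log_scale(10), sqrt_scale(25))
```

The log scale here uses a strictly positive domain, as the table requires; passing zero or a negative value to it would raise an error in this sketch.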
Type
Description
Resource
ordinal
Applies a discrete domain-to-range transform, and functions as a lookup table from a domain value to a range value.
Specify a default value for domain values that do not map to a range.
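An ordinal scale behaves like a dictionary lookup with a fallback. The Python sketch below illustrates that behavior, including the default and nullValue properties; ordinal_scale is a hypothetical helper, and the party colors mirror the accumulator tutorial.

```python
def ordinal_scale(domain, range_values, default=None, null_value=None):
    """Build a lookup-table scale from domain values to range values."""
    lookup = dict(zip(domain, range_values))

    def scale(value):
        if value is None:
            return null_value  # corresponds to the nullValue property
        return lookup.get(value, default)  # default covers unmapped inputs

    return scale


party_color = ordinal_scale(["R", "D"], ["red", "blue"], default="green", null_value="gray")
print(party_color("R"), party_color("D"), party_color("I"), party_color(None))
```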
Type
Description
Resource
quantize
Divides input domain values into uniform segments based on the number of values in, or the cardinality of, the output range, where range value y can be expressed as a quantized linear function of the domain value x: y = m round(x) + b.
threshold
Maps arbitrary, non-uniform subsets of the domain to discrete range values. The input domain is continuous but divided into slices based on a set of domain threshold values. The range must have N+1 elements, where N is the number of domain threshold boundaries.
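The quantize behavior described above can be sketched in Python: the domain is divided into as many uniform segments as there are range values. The helper name and the color values are hypothetical, and clamping inputs that fall outside the domain to the first or last range value is an assumption.

```python
import math


def quantize_scale(domain, range_values):
    """Divide [lo, hi] into len(range_values) uniform segments."""
    lo, hi = domain
    n = len(range_values)

    def scale(x):
        t = (x - lo) / (hi - lo)
        # Assumption: out-of-domain inputs are clamped to the end segments.
        index = min(n - 1, max(0, math.floor(t * n)))
        return range_values[index]

    return scale


# Five output colors for input values between 0 and 100.
colors = ["#115f9a", "#1984c5", "#c9e52f", "#d0ee11", "#d0f400"]
color_scale = quantize_scale([0, 100], colors)
print(color_scale(10), color_scale(50), color_scale(100))
```

Each color covers a 20-unit slice of the domain, so 10 maps to the first color and 50 to the third.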
Value | Description |
width | A spatial range that is the value of width. |
height | A spatial range that is the value of height. The direction of the range, top-to-bottom or bottom-to-top, is determined by the scale type. |
Mode
Description
density
Perform count aggregation per pixel and define a color for a pixel by normalizing the count and applying a color to it based on a color scale.
You can activate density accumulation for any scale that takes as input a continuous domain (linear, sqrt, pow, log, threshold scales) and outputs a color range. The range is determined by the required minDensityCnt
and maxDensityCnt
properties. minDensityCnt
and maxDensityCnt
can be explicit integer values or one of the following keywords that automatically compute statistical information about the per-pixel counts:
min
max
-1stStdDev
-2ndStdDev
1stStdDev
2ndStdDev
Note: Domain values of density
accumulators must be between 0 and 1 inclusive.
blend
Blend by category (ultimately an ordinal scale). You can provide a color to a category and blend those colors to show the density of the distinct categorical values at a pixel.
pct
For a specific category, apply color based on the percentage of the category in a region.