HEAVY.AI Docs
v8.1.0
v8.1.0
  • Welcome to HEAVY.AI Documentation
  • Overview
    • Overview
    • Release Notes
  • Installation and Configuration
    • System Requirements
      • Hardware Reference
      • Software Requirements
      • Licensing
    • Installation
      • Free Version
      • Installing on Docker
        • HEAVY.AI Installation using Docker on Ubuntu
      • Installing on Ubuntu
        • HEAVY.AI Installation on Ubuntu
        • Install NVIDIA Drivers and Vulkan on Ubuntu
      • Installing on Rocky Linux / RHEL
        • HEAVY.AI Installation on RHEL
        • Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL
      • Getting Started on AWS
      • Getting Started on GCP
      • Getting Started on Azure
      • Getting Started on Kubernetes (BETA)
      • Upgrading
        • Upgrading HEAVY.AI
        • Upgrading from Omnisci to HEAVY.AI 6.0
        • CUDA Compatibility Drivers
      • Uninstalling
      • Ports
    • Services and Utilities
      • Using Services
      • Using Utilities
    • Executor Resource Manager
    • Configuration Parameters
      • Overview
      • Configuration Parameters for HeavyDB
      • Configuration Parameters for HEAVY.AI Web Server
      • Configuration Parameters for HeavyIQ
    • Security
      • Roles and Privileges
        • Column-Level Security
      • Connecting Using SAML
      • Implementing a Secure Binary Interface
      • Encrypted Credentials in Custom Applications
      • LDAP Integration
    • Distributed Configuration
  • Loading and Exporting Data
    • Supported Data Sources
      • Kafka
      • Using HeavyImmerse Data Manager
      • Importing Geospatial Data
    • Command Line
      • Loading Data with SQL
      • Exporting Data
  • SQL
    • Data Definition (DDL)
      • Datatypes
      • Users and Databases
      • Tables
      • System Tables
      • Views
      • Policies
      • Comment
    • Data Manipulation (DML)
      • SQL Capabilities
        • ALTER SESSION SET
        • ALTER SYSTEM CLEAR
        • DELETE
        • EXPLAIN
        • INSERT
        • KILL QUERY
        • LIKELY/UNLIKELY
        • SELECT
        • SHOW
        • UPDATE
        • Arrays
        • Logical Operators and Conditional and Subquery Expressions
        • Table Expression and Join Support
        • Type Casts
      • Geospatial Capabilities
        • Uber H3 Hexagonal Modeling
      • Functions and Operators
      • System Table Functions
        • generate_random_strings
        • generate_series
        • tf_compute_dwell_times
        • tf_feature_self_similarity
        • tf_feature_similarity
        • tf_geo_rasterize
        • tf_geo_rasterize_slope
        • tf_graph_shortest_path
        • tf_graph_shortest_paths_distances
        • tf_load_point_cloud
        • tf_mandelbrot*
        • tf_point_cloud_metadata
        • tf_raster_contour_lines; tf_raster_contour_polygons
        • tf_raster_graph_shortest_slope_weighted_path
        • tf_rf_prop_max_signal (Directional Antennas)
        • ts_rf_prop_max_signal (Isotropic Antennas)
        • tf_rf_prop
      • Window Functions
      • Reserved Words
      • SQL Extensions
      • HeavyIQ LLM_TRANSFORM
  • HeavyImmerse
    • Introduction to HeavyImmerse
    • Admin Portal
    • Control Panel
    • Working with Dashboards
      • Dashboard List
      • Creating a Dashboard
      • Configuring a Dashboard
      • Duplicating and Sharing Dashboards
    • Measures and Dimensions
    • Using Parameters
    • Using Filters
    • Using Cross-link
    • Chart Animation
    • Multilayer Charts
    • SQL Editor
    • Customization
    • Joins (Beta)
    • Chart Types
      • Overview
      • Bubble
      • Choropleth
      • Combo
      • Contour
      • Cross-Section
      • Gauge
      • Geo Heatmap
      • Heatmap
      • Linemap
      • Number
      • Pie
      • Pointmap
      • Scatter Plot
      • Skew-T
      • Table
      • Text Widget
      • Wind Barb
    • Deprecated Charts
      • Bar
      • Combo - Original
      • Histogram
      • Line
      • Stacked Bar
    • HeavyIQ SQL Notebook
  • HEAVYIQ Conversational Analytics
    • HeavyIQ Overview
      • HeavyIQ Guidance
  • HeavyRF
    • Introduction to HeavyRF
    • Getting Started
    • HeavyRF Table Functions
  • HeavyConnect
    • HeavyConnect Release Overview
    • Getting Started
    • Best Practices
    • Examples
    • Command Reference
    • Parquet Data Wrapper Reference
    • ODBC Data Wrapper Reference
    • Raster Data Wrapper Reference
  • HeavyML (BETA)
    • HeavyML Overview
    • Clustering Algorithms
    • Regression Algorithms
      • Linear Regression
      • Random Forest Regression
      • Decision Tree Regression
      • Gradient Boosting Tree Regression
    • Principal Components Analysis
  • Python / Data Science
    • Data Science Foundation
    • JupyterLab Installation and Configuration
    • Using HEAVY.AI with JupyterLab
    • Python User-Defined Functions (UDFs) with the Remote Backend Compiler (RBC)
      • Installation
      • Registering and Using a Function
      • User-Defined Table Functions
      • RBC UDF/UDTF Example Notebooks
      • General UDF/UDTF Tutorial Notebooks
      • RBC API Reference
    • Ibis
    • Interactive Data Exploration with Altair
    • Additional Examples
      • Forecasting with HEAVY.AI and Prophet
  • APIs and Interfaces
    • Overview
    • heavysql
    • Thrift
    • JDBC
    • ODBC
    • Vega
      • Vega Tutorials
        • Vega at a Glance
        • Getting Started with Vega
        • Getting More from Your Data
        • Creating More Advanced Charts
        • Using Polys Marks Type
        • Vega Accumulator
        • Using Transform Aggregation
        • Improving Rendering with SQL Extensions
      • Vega Reference Overview
        • data Property
        • projections Property
        • scales Property
        • marks Property
      • Migration
        • Migrating Vega Code to Dynamic Poly Rendering
      • Try Vega
    • RJDBC
    • SQuirreL SQL
    • heavyai-connector
  • Tutorials and Demos
    • Loading Data
    • Using Heavy Immerse
    • Hello World
    • Creating a Kafka Streaming Application
    • Getting Started with Open Source
    • Try Vega
  • Troubleshooting and Special Topics
    • FAQs
    • Troubleshooting
    • Vulkan Renderer
    • Optimizing
    • Known Issues and Limitations
    • Logs and Monitoring
    • Archived Release Notes
      • Release 6.x
      • Release 5.x
      • Release 4.x
      • Release 3.x
Powered by GitBook
On this page
  • Altair and Ibis
  • Examples
  • Simple Example
  • Adding Interactivity
  • Adding More Interactivity
  • Crossfiltering
  • Geospatial Visualization
  • Exploring Further
Export as PDF
  1. Python / Data Science

Interactive Data Exploration with Altair

PreviousIbisNextAdditional Examples

See the Altair open source to learn more and .

The HEAVY.AI data science foundation includes the Altair visualization library. An overview of Altair from the project website:

Altair is a declarative statistical visualization library for Python, based on and , and the source is available on .

With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.

Altair and Ibis

Although Altair is typically used with smaller, local datasets, HEAVY.AI has integrated it with (and this integration itself is ). This combination allows interactive visualization over extremely large datasets consisting of billions of data points, all with minimal Python code.

In addition, Altair supports visualization, which allows for more than just local data exploration on small datasets when combined with Ibis. Because Ibis can support multiple storage backends, you can, for example, create charts that cover more than one (remote) data source at a time.

Examples

The following examples highlight the capabilities of Altair and ibis together with HEAVY.AI.

JupyterLab version 2.0 or higher is required for the following examples.

First, install ibis-vega-transform, which in turn installs Altair and Ibis.

pip install ibis-vega-transform
jupyter labextension install ibis-vega-transform

Simple Example

The following minimal example of Ibis and Altair together starts with a simple pandas dataframe.

import altair as alt
import ibis
import ibis_vega_transform
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

connection = ibis.pandas.connect({'source': source })
table = connection.table('source')

alt.Chart(table).mark_bar().encode(
    x='a',
    y='b'
)

This produces an image like this.

You can use Altair directly with pandas, without using Ibis (see the Altair documentation). This example shows how Ibis can support pandas itself as a backend in addition to the SQL backends Ibis supports.

Adding Interactivity

This example connects to a public HEAVY.AI server, but you can use any HEAVY.AI server you have access to.

conn = ibis.heavyai.connect(
    host='metis.mapd.com', user='admin', password='HyperInteractive',
    port=443, database='heavyai', protocol= 'https'
)

t = conn.table('flights_donotmodify')

Here is a chart definition using Altair (and Vega/Vega-Lite) interactivity to parametrize the chart. Unlike a static pandas dataframe shown earlier, this uses an Ibis expression.

heavyai_cli = ibis.heavyai.connect(
    host='metis.mapd.com', user='admin', password='HyperInteractive',
    port=443, database='heavyai', protocol= 'https'
)

t = conn.table('flights_donotmodify')

Next, let's create a simple Altair chart. This chart groups the list of airlines by the number of records (i.e flights) in this dataset. Doing so should produce a bar chart like the earlier example, but the difference here, is that we're connected to an HEAVY.AI backend rather than using a local pandas dataframe.

In the background, the ibis expression t[t.carrier_name]) is translated into a SQL query, and the results are rendered as a chart directly - no SQL knowledge required!

c = alt.Chart(t[t.carrier_name]).mark_bar().encode(
    x='carrier_name',
    y='count()'
)
delay_by_month = alt.Chart(t[t.flight_dayofmonth, t.flight_month, t.depdelay]).mark_rect().encode(
    x='flight_dayofmonth:O',
    y='flight_month:O',
    color='average(depdelay)',
    tooltip=['average(depdelay):Q']
)
delay_by_month

This should create a chart like this, where hovering over the cells shows an interactive tooltip

Adding More Interactivity

#The next 2 lines create a selection slider to drive a parametrized Ibis expression
slider = alt.binding_range(name='Month', min=1, max=12, step=1)
select_month = alt.selection_single(fields=['flight_month'],
                                    bind=slider, init={'flight_month': 1})

#Note how this uses an Ibis expression for the chart data source
alt.Chart(t[t.flight_dayofmonth, t.depdelay, t.flight_month]).mark_line().encode(
    x='flight_dayofmonth:O',
    y='average(depdelay)'
).add_selection(
    select_month
).transform_filter(
    select_month
)

This creates an interactive chart that is parametrized by the slider. Moving the slider changes the selected month and updates the chart. Unlike working with a static, local dataset, you are now running SQL queries against HeavyDB each time the slide value changes.

You can see this in the logs, in the final query generated:

"SELECT ""flight_dayofmonth"", avg(""depdelay"") AS average_depdelay
FROM (
  SELECT ""flight_dayofmonth"", ""depdelay"", ""flight_month""
  FROM flights_2008_7M
  WHERE ""flight_month"" = 3.0 #this is from the slider value
) t0
GROUP BY flight_dayofmonth"T

Crossfiltering

You can build sophisticated chart combinations that combine several Altair capabilities with Ibis to create a cross-filtered visualization, like in Heavy Immerse. In this example, every data source is an Ibis expression that generates SQL queries to a HEAVY.AI backend. A total of five queries are generated and executed to create the cross-filtered visualization.

states = alt.selection_multi(fields=['origin_state'])
airlines = alt.selection_multi(fields=['carrier_name'])

DEBOUNCE_MS = 50

dates = alt.selection_interval(
    fields=['dep_timestamp'],
    encodings=['x'],
    on=f'[mousedown, window:mouseup] > window:mousemove!{{0, {DEBOUNCE_MS}}}',
    translate=f'[mousedown, window:mouseup] > window:mousemove!{{0, {DEBOUNCE_MS}}}',
    zoom=False
)

HEIGHT = 750
WIDTH = 1000

count_filter = alt.Chart(
    t[t.dep_timestamp, t.depdelay, t.origin_state, t.carrier_name],
    title="Selected Rows"
).transform_filter(
    airlines
).transform_filter(
    dates
).transform_filter(
    states
).mark_text().encode(
    text='count()'
)

count_total = alt.Chart(
    t,
    title="Total Rows"
).mark_text().encode(
    text='count()'
)

flights_by_state = alt.Chart(
    t[t.origin_state, t.carrier_name, t.dep_timestamp],
    title="Total Number of Flights by State"
).transform_filter(
    airlines
).transform_filter(
    dates
).mark_bar().encode(
    x='count()',
    y=alt.Y('origin_state', sort=alt.Sort(encoding='x', order='descending')),
    color=alt.condition(states, alt.ColorValue("steelblue"), alt.ColorValue("grey"))
).add_selection(
    states
).properties(
    height= 2 * HEIGHT / 3,
    width=WIDTH / 2
) + alt.Chart(
    t[t.origin_state, t.carrier_name, t.dep_timestamp],
).transform_filter(
    airlines
).transform_filter(
    dates
).mark_text(dx=20).encode(
    x='count()',
    y=alt.Y('origin_state', sort=alt.Sort(encoding='x', order='descending')),
    text='count()'
).properties(
    height= 2 * HEIGHT / 3.25,
    width=WIDTH / 2
)

carrier_delay = alt.Chart(
    t[t.depdelay, t.arrdelay, t.carrier_name, t.origin_state, t.dep_timestamp],
    title="Carrier Departure Delay by Arrival Delay (Minutes)"
).transform_filter(
    states
).transform_filter(
    dates
).transform_aggregate(
    depdelay='mean(depdelay)',
    arrdelay='mean(arrdelay)',
    groupby=["carrier_name"]
).mark_point(filled=True, size=200).encode(
    x='depdelay',
    y='arrdelay',
    color=alt.condition(airlines, alt.ColorValue("steelblue"), alt.ColorValue("grey")),
    tooltip=['carrier_name', 'depdelay', 'arrdelay']
).add_selection(
    airlines
).properties(
    height=2 * HEIGHT / 3.25,
    width=WIDTH / 2
) + alt.Chart(
    t[t.depdelay, t.arrdelay, t.carrier_name, t.origin_state, t.dep_timestamp],
).transform_filter(
    states
).transform_filter(
    dates
).transform_aggregate(
    depdelay='mean(depdelay)',
    arrdelay='mean(arrdelay)',
    groupby=["carrier_name"]
).mark_text().encode(
    x='depdelay',
    y='arrdelay',
    text='carrier_name',
).properties(
    height=2 * HEIGHT / 3.25,
    width=WIDTH / 2
)

time = alt.Chart(
    t[t.dep_timestamp, t.depdelay, t.origin_state, t.carrier_name],
    title='Number of Flights by Departure Time'
).transform_filter(
    'datum.dep_timestamp != null'
).transform_filter(
    airlines
).transform_filter(
    states
).mark_line().encode(
    alt.X(
        'yearmonthdate(dep_timestamp):T',
    ),
    alt.Y(
        'count():Q',
        scale=alt.Scale(zero=False)
    )
).add_selection(
    dates
).properties(
    height=HEIGHT / 3,
    width=WIDTH + 50
)

(
    (count_filter | count_total) & (flights_by_state | carrier_delay) & time
).configure_axis(
    grid=False
).configure_view(
    strokeOpacity=0
).properties(padding=50)

This generates the following Altair visualization, which leverages composable charting and provides greater interactivity with enhanced selections powered by dynamic data loading via Ibis.

Geospatial Visualization

Exploring Further

You can combine Ibis and Altair inside JupyterLab. By defining multiple Ibis backend connections with Ibis, you can create complex interactive visualizations that span multiple data sources, all without moving data into local memory. This allows greater flexibility and productivity in data exploration.

Next, let's use Altair with a more scalable Ibis backend. This example uses HeavyDB, but you can try this with other Ibis backends supported via the project that bridges Altair to Ibis.

Let's create a more interesting chart beyond a simple bar chart - in this case an Altair .

Altair provides many ways to add interactivity to charts. Actions like selection and brush filters can provide more in Altair, that allow you to explore data in a far richer manner, beyond creating static charts.

Altair and Ibis can also be used to visualize geospatial data. Altair supports multiple visualizations and can accept dataframes as input. Some Ibis backends, including HEAVY.AI, support , which output to GeoPandas dataframes. By combining the two, you can create map-based visualizations.

project documentation
stay up to date
Vega
Vega-Lite
GitHub
Vega-Lite
Ibis
open-source
composable
ibis-vega-transform
heatmap
dynamic data visualizations
geospatial
GeoPandas
spatial operations