The HEAVY.AI data science foundation includes the Altair visualization library. An overview of Altair from the project website:
Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.
With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.
Altair and Ibis
Although Altair is typically used with smaller, local datasets, HEAVY.AI has integrated it with Ibis (and this integration itself is open-source). This combination allows interactive visualization over extremely large datasets consisting of billions of data points, all with minimal Python code.
In addition, Altair supports composable visualization, which allows for more than just local data exploration on small datasets when combined with Ibis. Because Ibis can support multiple storage backends, you can, for example, create charts that cover more than one (remote) data source at a time.
Examples
The following examples highlight the capabilities of Altair and Ibis together with HEAVY.AI.
JupyterLab version 2.0 or higher is required for the following examples.
First, install ibis-vega-transform, which in turn installs Altair and Ibis.
You can use Altair directly with pandas, without using Ibis (see the Altair documentation). This example shows how Ibis can support pandas itself as a backend in addition to the SQL backends Ibis supports.
Adding Interactivity
Next, let's use Altair with a more scalable Ibis backend. This example uses HeavyDB, but you can try this with other Ibis backends supported via the ibis-vega-transform project that bridges Altair to Ibis.
This example connects to a public HEAVY.AI server, but you can use any HEAVY.AI server you have access to.
Here is a chart definition that uses Altair (and Vega/Vega-Lite) interactivity to parametrize the chart. Unlike the static pandas dataframe shown earlier, this uses an Ibis expression.
Next, let's create a simple Altair chart. This chart groups the records (that is, flights) in this dataset by airline and counts them. It should produce a bar chart like the earlier example; the difference is that we're connected to a HEAVY.AI backend rather than using a local pandas dataframe.
In the background, the Ibis expression t[t.carrier_name] is translated into a SQL query, and the results are rendered directly as a chart, with no SQL knowledge required.
c = alt.Chart(t[t.carrier_name]).mark_bar().encode(
    x='carrier_name',
    y='count()'
)
Now let's create something more interesting than a simple bar chart: an Altair heatmap.
This should produce a heatmap in which hovering over a cell shows an interactive tooltip.
Adding More Interactivity
Altair provides many ways to add interactivity to charts. Actions such as selections and brush filters make visualizations more dynamic, letting you explore data far more richly than static charts allow.
# The next 2 lines create a selection slider to drive a parametrized Ibis expression
slider = alt.binding_range(name='Month', min=1, max=12, step=1)
select_month = alt.selection_single(fields=['flight_month'], bind=slider, init={'flight_month': 1})

# Note how this uses an Ibis expression for the chart data source
alt.Chart(t[t.flight_dayofmonth, t.depdelay, t.flight_month]).mark_line().encode(
    x='flight_dayofmonth:O',
    y='average(depdelay)'
).add_selection(
    select_month
).transform_filter(
    select_month
)
This creates an interactive chart that is parametrized by the slider. Moving the slider changes the selected month and updates the chart. Unlike working with a static, local dataset, you are now running SQL queries against HeavyDB each time the slider value changes.
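To make the per-slider-change query concrete, here is a rough sketch that uses Python's built-in sqlite3 module as a local stand-in for HeavyDB. Everything in it is an assumption for illustration: the sample rows are made up, the `delays_for_month` helper is hypothetical, and the real integration builds its SQL through Ibis rather than through parameter binding. An `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

# In-memory database standing in for the remote HeavyDB server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights_2008_7M "
    "(flight_dayofmonth INTEGER, depdelay REAL, flight_month INTEGER)"
)
# Hypothetical sample rows: three flights in March, one in April.
conn.executemany(
    "INSERT INTO flights_2008_7M VALUES (?, ?, ?)",
    [(1, 10.0, 3), (1, 20.0, 3), (2, 6.0, 3), (1, 99.0, 4)],
)

# The inner query filters by month; the outer query averages per day.
QUERY = """
SELECT "flight_dayofmonth", avg("depdelay") AS average_depdelay
FROM (
  SELECT "flight_dayofmonth", "depdelay", "flight_month"
  FROM flights_2008_7M
  WHERE "flight_month" = ?
) t0
GROUP BY flight_dayofmonth
ORDER BY flight_dayofmonth
"""

def delays_for_month(month):
    """Run the aggregate query for one slider position (hypothetical helper)."""
    return conn.execute(QUERY, (month,)).fetchall()

# Each slider position corresponds to one fresh query against the database.
march = delays_for_month(3)
april = delays_for_month(4)
```

Only the aggregated rows come back for each slider position, so the amount of data transferred depends on the number of days plotted, not on the size of the underlying table.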
The logs show each of these queries. The final query generated for a slider value of 3 is:
SELECT "flight_dayofmonth", avg("depdelay") AS average_depdelay
FROM (
  SELECT "flight_dayofmonth", "depdelay", "flight_month"
  FROM flights_2008_7M
  WHERE "flight_month" = 3.0  -- this is from the slider value
) t0
GROUP BY flight_dayofmonth
Crossfiltering
You can combine several Altair capabilities with Ibis to build a sophisticated cross-filtered visualization, similar to those in Heavy Immerse. In this example, every data source is an Ibis expression that generates SQL queries to a HEAVY.AI backend. A total of five queries are generated and executed to create the cross-filtered visualization.
This generates the following Altair visualization, which leverages composable charting and provides greater interactivity with enhanced selections powered by dynamic data loading via Ibis.
Geospatial Visualization
Altair and Ibis can also be used to visualize geospatial data. Altair supports multiple geospatial visualizations and can accept GeoPandas dataframes as input. Some Ibis backends, including HEAVY.AI, support spatial operations, which output to GeoPandas dataframes. By combining the two, you can create map-based visualizations.
Exploring Further
You can combine Ibis and Altair inside JupyterLab. By defining multiple Ibis backend connections, you can create complex interactive visualizations that span multiple data sources, all without moving data into local memory. This allows greater flexibility and productivity in data exploration.