
Ibis

Explore large datasets in HEAVY.AI with the full power of SQL, with a pandas-like API


The Ibis open source project documentation is the best place to learn more and stay up to date.

Ibis allows you to explore data in a variety of remote storage systems, with an API inspired by the pandas toolkit.

You can use Ibis to interact directly with HeavyDB and several other supported SQL systems by writing high-level Python code instead of lower-level SQL. Using familiar tools can increase productivity.

Following is a summary of what Ibis provides, directly from the Ibis website:

  • Full coverage of SQL features: You can code in Ibis anything you can implement in a SQL SELECT

  • Transparent to SQL implementation differences: Write standard code that translates to any SQL syntax

  • High performance execution: Execute at the speed of your backend, not your local computer

  • Integration with community data formats and tools (e.g. pandas, Parquet, Avro...)

Ibis supports several execution backends, including common SQL databases and processing engines. This allows you to author analytical workflows or operations once and run them consistently against these backends. You can also create complex workflows that simultaneously materialize data from multiple backends and combine the resulting outputs in various ways.

Supported backends include PySpark, Google BigQuery, and PostgreSQL.

Install the backend ibis-heavyai

You can install ibis-heavyai using conda (recommended) or mamba, which is a faster implementation of conda. For more information, see the Mamba documentation.

conda install -c conda-forge ibis-heavyai

# or, to install ibis-heavyai into a new environment, run
conda create -n ibis -c conda-forge ibis-heavyai
conda activate ibis

# check that ibis-heavyai installed successfully
python -c 'import ibis; ibis.heavyai.version'

You can also use pip for package management:

pip install ibis-heavyai
# check that ibis-heavyai installed successfully
python -c 'import ibis; ibis.heavyai.version'

Ibis and HeavyDB

This short example demonstrates how to use Ibis with HeavyDB. Inside a notebook cell, you can first connect to a HeavyDB database instance:

import ibis

con = ibis.heavyai.connect(
    host='localhost',
    database='heavyai',
    user='admin',
    password='HyperInteractive'
)

If you are launching JupyterLab from Immerse, this connection is already set up for you based on your Immerse login and credentials. The connection object has several utility methods that can come in handy for listing database objects, creating tables, loading data, and so on.
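For example, a minimal sketch (these method names follow the generic Ibis backend API of this release; treat them as assumptions if your ibis-heavyai version differs):

# List the tables visible to this connection
print(con.list_tables())

# Inspect the schema of a single table
print(con.table('github').schema())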

Next, let's identify a table and define a simple Ibis expression against it:

tablename = 'github' 
t = con.table(tablename)
count_expr = t.count()
count_expr.compile()

This expression is compiled by Ibis into SQL that runs against HeavyDB:

SELECT count(*) AS "count"
FROM github

When executed, the above SQL, as expected, counts the rows of the table:

print(f'Count of records in table {tablename} is {count_expr.execute()}')
Count of records in table github is 2033049065

You can also write raw SQL in Ibis: a SQL query can be wrapped as an Ibis expression and composed together with other Ibis expressions, as in the short sketch below.
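A minimal sketch of this wrapping, assuming the con.sql method that Ibis backends expose for raw queries (the table name reuses the upstream_reservoir example that follows):

# Wrap a raw SQL query as an Ibis table expression
raw = con.sql('SELECT api_no_10, oil_monthly, date_prod FROM upstream_reservoir')

# The wrapped query composes like any other Ibis expression
expr = raw.filter(raw.oil_monthly.notnull()).limit(10)
print(expr.compile())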

Ibis offers a powerful way to compose sophisticated analytical expressions and then leverage the processing power and scale of supported storage backends to evaluate them and return results. Because you assemble expressions in a full programming language (Python), you can cover SQL features such as joins, filters, and aggregates. The following shows a more complex Ibis expression and the SQL it generates.

well_list = ['4201334812', '4201335379', '4201335112', '4201334947', '4201334329', '4201334965', '4201334978', '4201334998', '4201335467', '4201334917', '4201334930', '4212331826', '4212331164', '4212300017', '4212331782', '4212332076', '4212331839', '4228532777', '4228533451', '4212332354', '4212332207', '4212330405', '4212332014']

# Find oil well production for the specific well list above.
# Note that this example queries the upstream_reservoir table.
t = con.table('upstream_reservoir')
sum_oil = t.oil_monthly.sum().log10().name('oil_monthly')
expr = (t.filter([t.api_no_10.isin(well_list), t.oil_monthly.notnull()])
        .group_by(['date_prod', 'api_no_10'])
        .having(sum_oil >= 0.)
        .aggregate(sum_oil)
        .sort_by('date_prod'))

# Find the max production month per well in the well set
max_oil = expr.oil_monthly.max().name('oil_monthly')
expr1 = expr.group_by('api_no_10').aggregate(max_oil)

expr2 = (expr.join(expr1, [expr.api_no_10 == expr1.api_no_10,
                           expr.oil_monthly == expr1.oil_monthly])
         [expr.api_no_10.name("api"),
          expr.oil_monthly.name("om2"),
          expr.date_prod.name("dp2")])

print(expr2.compile())

This results in automatically generated SQL:

WITH t0 AS (
  SELECT "date_prod", "api_no_10", log10(sum("oil_monthly")) AS oil_monthly
  FROM upstream_reservoir
  WHERE ("api_no_10" IN ('4201334998', '4212331782', '4212331826', '4201334965', '4212300017', '4201334329', '4228533451', '4212331164', '4201335467', '4201334930', '4212330405', '4201334947', '4212332354', '4201334978', '4201334917', '4201335379', '4201334812', '4212332207', '4228532777', '4201335112', '4212332076', '4212331839', '4212332014')) AND
        ("oil_monthly" IS NOT NULL)
  GROUP BY date_prod, api_no_10
  HAVING log10(sum("oil_monthly")) >= 0.0
  ORDER BY "date_prod"
)
SELECT t0."api_no_10" AS api, t0."oil_monthly" AS om2, t0."date_prod" AS dp2
FROM t0
  JOIN (
    SELECT "api_no_10", max("oil_monthly") AS oil_monthly
    FROM t0
    GROUP BY api_no_10
  ) t1
    ON (t0."api_no_10" = t1."api_no_10") AND
       (t0."oil_monthly" = t1."oil_monthly")

The result of the evaluation is by default a pandas dataframe, making it convenient to use Ibis inside other Python data analysis workflows.

expr2.execute()

   api         om2       dp2
0  4201335379  3.944927  2016-11-01
1  4201335467  4.125253  2017-12-01
2  4201334998  3.822887  2014-09-01
3  4201334930  3.698709  2014-05-01
4  4201334917  4.353243  2014-06-01
5  4201334812  4.127299  2014-03-01
6  4201334947  3.715502  2014-06-01
7  4201334978  3.822887  2014-09-01
8  4201335112  4.010766  2015-01-01
9  4201334329  3.721068  2015-07-01
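Because the result is an ordinary pandas dataframe, downstream analysis is plain pandas; for instance, a trivial follow-on:

df = expr2.execute()
# Rank wells by their peak (log10) monthly oil production, in plain pandas
print(df.sort_values('om2', ascending=False).head())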

Although Ibis can output to pandas, the result is first materialized and may be transferred over the network to the client running Ibis (such as a Jupyter notebook in a browser on a laptop). Avoid executing expressions that would move millions of rows to the client.

Apache Arrow Support

Ibis depends on heavyai (formerly pyomnisci) as a low-level connector, and it can leverage key features of that client, including output of query results in the Apache Arrow format via the select_ipc and select_ipc_gpu calls.

Building on this capability, the HeavyDB backend for Ibis can output query results using Apache Arrow to CPU or GPU memory, the latter by using the cudf dataframe library from NVIDIA. This makes it convenient to use with the GPU-based RAPIDS machine learning methods. Output to cudf is as simple as passing a couple of extra parameters to the execute method in Ibis, for example:

gdf = get_ts_by_period_station(zip_cd=94568).execute(gpu_device=3, ipc=True)

RAPIDS is currently usable only with supported NVIDIA GPUs and requires CUDA. See the RAPIDS example for how you can build a workflow integrating HEAVY.AI and RAPIDS.
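For reference, a rough sketch of those lower-level calls on the heavyai client itself (connection parameters are placeholders, and the select_ipc_gpu path additionally requires CUDA and cudf):

from heavyai import connect

heavy_con = connect(user='admin', password='HyperInteractive',
                    host='localhost', dbname='heavyai')

# Arrow-backed result into CPU memory, returned as a pandas DataFrame
df = heavy_con.select_ipc('SELECT oil_monthly FROM upstream_reservoir LIMIT 100')

# Arrow-backed result into GPU memory, returned as a cudf DataFrame
gdf = heavy_con.select_ipc_gpu('SELECT oil_monthly FROM upstream_reservoir LIMIT 100',
                               device_id=0)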

Geospatial Operations

The HeavyDB backend for Ibis supports geospatial functions, building directly on HeavyDB SQL support for these functions. These operations produce geopandas dataframes, allowing their use in Python-based geospatial analytics workflows. Here is an example of how a table that has geospatial data in HEAVY.AI can be output directly to a geopandas dataframe:

heavyai_cli = ibis.heavyai.connect(
    host='metis.heavyai.com', user='heavyai', password='HyperInteractive',
    port=443, database='heavyai', protocol='https'
)
t = heavyai_cli.table('zipcodes_2017')
df = t.head().execute()
type(df)

# geopandas.geodataframe.GeoDataFrame
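From here, standard geopandas operations apply on the client side; a trivial sketch (assuming ibis-heavyai set the table's geometry column as the active geometry):

# Continue in geopandas: inspect the CRS and compute polygon centroids
print(df.crs)
print(df.geometry.centroid.head())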

User-Defined Functions

Using Ibis, you can create and use user-defined functions (UDFs) in Python that execute inside HeavyDB. This UDF framework leverages integration with Numba, a just-in-time (JIT) compiler for Python, to produce lower-level code and a more performant execution path than interpreted Python provides.

It also makes it easy to author UDFs in Python and then use those UDFs in a SQL workflow.
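As a rough sketch of the underlying mechanism, using RBC directly (class and decorator names follow the RBC documentation elsewhere in this guide; treat the connection details as placeholders):

import math
from rbc.heavydb import RemoteHeavyDB

heavydb = RemoteHeavyDB(user='admin', password='HyperInteractive',
                        host='localhost', port=6274)

# The decorator declares the SQL signature; Numba compiles the body
@heavydb('double(double)')
def log10_plus_one(x):
    return math.log10(x + 1.0)

heavydb.register()  # push the compiled UDF to the HeavyDB server

# The function is now callable from SQL, and therefore from Ibis:
#   SELECT log10_plus_one(oil_monthly) FROM upstream_reservoir LIMIT 5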

Additional Resources

See the following Ibis documentation to get started:

  • The Ibis User Guide, in particular the Ibis for SQL programmers section, which shows how Ibis simplifies workflows involving analytical SQL.

  • The extensive Ibis tutorials.
