Explore large datasets in HEAVY.AI with the full power of SQL and a pandas-like API
You can use Ibis to interact directly with HeavyDB and several other supported SQL systems by writing high-level Python code instead of lower-level SQL. Using familiar tools can increase productivity.
Full coverage of SQL features: You can code in Ibis anything you can implement in a SQL SELECT
Transparent to SQL implementation differences: Write standard code that translates to any supported SQL dialect
High performance execution: Execute at the speed of your backend, not your local computer
Integration with community data formats and tools (e.g. pandas, Parquet, Avro...)
Supported backends include PySpark, Google BigQuery, and PostgreSQL.
You can also use pip for package management:
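For example, installing the HeavyDB backend for Ibis (the package name `ibis-heavyai` is the published PyPI distribution for this backend):

```shell
pip install ibis-heavyai
```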
This short example demonstrates Ibis. Inside a notebook cell, you can first connect to a HeavyDB database instance:
Next, let's identify a table and define a simple Ibis expression against it:
This expression is compiled by Ibis into SQL that runs against HeavyDB:
When executed, the SQL above counts the rows of the table, as expected:
A more complex expression likewise results in automatically generated SQL:
The result of the evaluation is by default a pandas DataFrame, making Ibis convenient to use inside other Python data analysis workflows.
Here is an example of how a table that contains geospatial data in HEAVY.AI can be output directly to a geopandas GeoDataFrame.
Using Ibis, you can create and use user-defined functions (UDFs) written in Python that execute inside HeavyDB. This UDF framework leverages Numba, a just-in-time (JIT) compiler for Python, to produce lower-level code with a more performant execution path than interpreted Python.
It also makes it easy to author UDFs in Python and then use them in a SQL workflow.
See the following Ibis documentation to get started: