HEAVY.AI Docs
v7.2.4
Data Science Foundation

An Overview of the HEAVY.AI Integrated Data Science Foundation


Last updated 2 years ago

HEAVY.AI provides an integrated data science foundation built on several open-source components of the PyData stack. This set of tools is integrated with Heavy Immerse and lets users switch from dashboards to a notebook environment connected to HeavyDB in the background. You can move from visual data exploration in Immerse to a deeper dive into a specific dataset, build predictive models using standard Python-based data science libraries and tools, and push results back into HeavyDB for use with Immerse.

Several components make up the HEAVY.AI data science foundation.

JupyterLab

HEAVY.AI provides deep integration with JupyterLab, the next-generation version of the most popular notebook environment used by data scientists for interactive computing. You can access JupyterLab by clicking an icon in Immerse.

Using the heavyai API:

>>> from heavyai import connect
>>> con = connect(user="admin", password="HyperInteractive", host="localhost",
...               dbname="heavyai")
>>> con
Connection(mapd://admin:***@localhost:6274/heavyai?protocol=binary)

Or using the Ibis heavyai backend:

import ibis

con = ibis.heavyai.connect(
    host='localhost',
    database='ibis_testing',
    user='admin',
    password='HyperInteractive',
)

heavyai

The heavyai client interface provides a Python DB API 2.0-compliant HEAVY.AI interface. In addition, it provides methods to get results in the Apache Arrow-based GDF format for efficient data interchange.

Documentation

Examples

Create a Cursor and Execute a Query

Step 1: Create a connection

>>> from heavyai import connect
>>> con = connect(user="heavyai", password="HyperInteractive", host="my.host.com", dbname="heavyai")

Step 2: Create a cursor

>>> c = con.cursor()
>>> c

Step 3: Query database table of flight departure and arrival delay times

>>> c.execute("SELECT depdelay, arrdelay FROM flights LIMIT 100")

Step 4: Display number of rows returned

>>> c.rowcount
100

Step 5: Display the Description objects list

Each entry in the list is a named tuple with the attributes required by the specification. There is one entry per returned column, and the name, type_code, and null_ok attributes are filled.

>>> c.description
[Description(name=u'depdelay', type_code=0, display_size=None, internal_size=None, precision=None, scale=None, null_ok=True), Description(name=u'arrdelay', type_code=0, display_size=None, internal_size=None, precision=None, scale=None, null_ok=True)]

Step 6: Iterate over the cursor, returning a list of tuples of values

>>> result = list(c)
>>> result[:5]
[(1, 14), (2, 4), (5, 22), (-1, 8), (-1, -2)]
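Because `cursor.description` follows the DB API specification, you can pair it with the fetched rows to build labeled records. A stdlib-only sketch, using hypothetical sample values rather than data fetched from a live server:

```python
from collections import namedtuple

# Hypothetical sample data shaped like the cursor output above: DB API
# Description tuples (name, type_code, ..., null_ok) plus rows of values.
Description = namedtuple(
    "Description",
    ["name", "type_code", "display_size", "internal_size",
     "precision", "scale", "null_ok"],
)
description = [
    Description("depdelay", 0, None, None, None, None, True),
    Description("arrdelay", 0, None, None, None, None, True),
]
rows = [(1, 14), (2, 4), (5, 22)]

# Zip the column names from the description with each row of values.
records = [dict(zip([d.name for d in description], row)) for row in rows]
print(records[0])  # {'depdelay': 1, 'arrdelay': 14}
```

The same pattern works with any DB API 2.0 cursor, so results can be handed to tools that expect labeled records rather than bare tuples.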

Select Data into a GPU DataFrame Provided by cudf (formerly pygdf)

Step 1: Create a connection to local HEAVY.AI instance

>>> from heavyai import connect
>>> con = connect(user="heavyai", password="HyperInteractive", host="localhost",
...               dbname="heavyai")

Step 2: Query GpuDataFrame database table of flight departure and arrival delay times

>>> query = "SELECT depdelay, arrdelay FROM flights_2008_10k limit 100"
>>> df = con.select_ipc_gpu(query)

Step 3: Display results

>>> df.head()
  depdelay arrdelay
0       -2      -13
1       -1      -13
2       -3        1
3        4       -3
4       12        7

Remote Backend Compiler (RBC)

You can define your own SQL functions in HeavyDB, but adding them natively requires recompiling the engine. To avoid this, HeavyDB supports User-Defined Functions (UDFs) and User-Defined Table Functions (UDTFs), which can also be compiled for and executed on GPUs. A UDF operates on the elements of a table; a UDTF operates on an entire table.

Functions are not persisted in the database and need to be registered again if the server is restarted.

Internally, the RBC converts the Python function to an intermediate representation (IR), which is then sent to the server. The IR is compiled for a CPU or a GPU, depending on the specified hardware resources.
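The define-decorate-register flow described above can be sketched locally. The decorator below is a stand-in that only records the SQL signature; the commented lines show the assumed shape of the real rbc entry point, which requires a running HeavyDB server:

```python
# Sketch of an RBC-style UDF. With the real rbc package you would use
# something like (assumed API; needs a live HeavyDB server):
#
#   from rbc.heavydb import RemoteHeavyDB
#   heavydb = RemoteHeavyDB(user="admin", password="HyperInteractive",
#                           host="localhost", port=6274)
#   @heavydb("double(double, double)")
#   def total_delay(dep, arr): ...
#
# after which SQL queries on the server can call TOTAL_DELAY(...).

def udf(signature):
    """Stand-in decorator: attach the declared SQL signature to the function."""
    def wrap(fn):
        fn.sql_signature = signature
        return fn
    return wrap

@udf("double(double, double)")
def total_delay(depdelay, arrdelay):
    # Plain Python body; RBC converts a body like this to IR for CPU/GPU.
    return depdelay + arrdelay

print(total_delay(5.0, -2.0))        # the function still works locally
print(total_delay.sql_signature)
```

Because the body is ordinary Python, the same function can be unit-tested locally before being registered on the server.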

Ibis

Ibis is a productivity API for working in Python and analyzing data in remote SQL-based data stores such as HeavyDB. Inspired by the pandas toolkit for data analysis, Ibis provides a Pythonic API that compiles to SQL. Combined with the scale and speed of HeavyDB, Ibis offers a familiar but more powerful method for analyzing very large datasets "in place."

Ibis supports multiple SQL database backends, and also supports pandas as a native backend. Combined with Altair, this integration allows you to explore multiple datasets across different data sources.
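The compiles-to-SQL idea can be illustrated with a toy, stdlib-only sketch (the class and method names here are invented; real Ibis expressions are far richer):

```python
# Toy illustration of the deferred-expression idea behind Ibis: build an
# expression object in Python, then compile it to SQL text instead of
# executing it eagerly.
class MeanExpr:
    def __init__(self, table, column):
        self.table, self.column = table, column

    def compile(self):
        # Translate the expression tree (here, a single node) into SQL.
        return f"SELECT AVG({self.column}) FROM {self.table}"

expr = MeanExpr("flights", "depdelay")
print(expr.compile())  # SELECT AVG(depdelay) FROM flights
```

Deferring execution this way is what lets a backend such as HeavyDB run the generated SQL where the data lives, instead of pulling rows into Python.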

Altair

NVIDIA RAPIDS

Other Tools and Utilities

In addition to the seamless integration with Immerse, you can also use JupyterLab with HEAVY.AI by creating an explicit connection object, either via the heavyai API or via the Ibis heavyai backend, which builds on heavyai.

For more information, see the JupyterLab documentation.

See the GitHub heavyai repository for documentation, including installation instructions, a getting-started guide, and the API reference.

Using Python, you can interact with databases in multiple ways. Libraries like SQLAlchemy provide a translation mechanism that converts Python to SQL; this is an example of an ORM (Object-Relational Mapping). With SQLAlchemy and similar approaches, user interactions with the database are simplified (and optimized) as a set of high-level functions provided by the ORM. Unfortunately, to run tasks not supported by the ORM, you still need to write SQL code.

The Remote Backend Compiler (RBC) package provides a Python interface to define UDFs and UDTFs easily. Any UDF or UDTF written in Python can be registered at run time on the HeavyDB server and subsequently used in any SQL query by any client.

Some ORMs support defining UDFs in C++ for certain types of databases; however, they do not provide a Python interface.

Altair is another key component of the HEAVY.AI data science foundation. Building on the same data visualization engine used by Immerse for geospatial charts, Altair provides a Pythonic API over Vega-Lite, a subset of the full Vega specification for declarative charting based on the "Grammar of Graphics" paradigm. The HEAVY.AI data science foundation goes further and includes interface code that enables Altair to transparently use Ibis expressions instead of pandas data frames. This allows data visualization over much larger datasets in HEAVY.AI without writing SQL code.
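Vega-Lite specifications are plain JSON documents, which is what makes this layering possible. A minimal, hand-written spec of the kind Altair generates (the dataset and field names here are hypothetical):

```python
import json

# A minimal Vega-Lite-style spec: a binned histogram of departure delays.
# Altair builds dictionaries like this from Python chart objects.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"name": "flights"},
    "mark": "bar",
    "encoding": {
        "x": {"field": "depdelay", "type": "quantitative", "bin": True},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}
print(json.dumps(spec, indent=2))
```

With the Ibis integration, the named data source can be resolved by a SQL query against HeavyDB rather than an in-memory pandas frame.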

The NVIDIA RAPIDS toolkit is a collection of foundational libraries for GPU-accelerated data science and machine learning. It includes popular algorithms for clustering, classification, and linear models, as well as a GPU-based dataframe (cudf). HEAVY.AI allows configurable output to cudf from any query (including via Ibis or heavyai), so you can quickly run machine-learning algorithms on top of query results from HEAVY.AI.

In addition, the data science foundation Docker container includes Facebook's Prophet library for forecasting, and Prefect, a lightweight but powerful workflow engine that enables you to build and manage workflows in Python.

[Figure: JupyterLab access in Immerse]
[Figure: JupyterLab access from the SQL Editor]
[Figure: User-defined function schematic: decorate a Python function to be able to call it with SQL.]