HEAVY.AI Docs
v6.4.3

Registering and Using a Function

Register a function and then use it

Making a function available to HeavyDB (registering it) is based on decorating a Python function. Consider the following simple function, which takes a single argument and returns a single value.

def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9

Register this function to HeavyDB using the following steps:

  1. Declare the function’s signature.

  2. Attach the signature to the function.

  3. Register the function to the database.

Declaring the Signature

Annotate the function with type information to tell RBC how to translate it into an intermediate representation, using the following syntax:

'returnType(inputType1, inputType2, ...)'

The function can only return a single element.

Available types are similar to C types:

[Array,Column[List]][int[8,16,32,64],float[32,64],double,bool]
bytes
void
TextEncoding[None,Dict]
Column<[List]TextEncodingDict>
Cursor

In the types listed, items in brackets indicate options to choose from. For example, [List,Array]Int[8,16] is expanded to mean ListInt8, ArrayInt8, ListInt16, and ArrayInt16. The literals float and int can be abbreviated by f and i, respectively.
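The bracket notation can be read as a Cartesian product over the option groups. The following is a minimal sketch of that expansion in plain Python; `expand_type_pattern` is a hypothetical helper written for illustration and is not part of RBC.

```python
import itertools
import re

def expand_type_pattern(pattern):
    """Expand bracketed option groups, e.g. '[List,Array]Int[8,16]'.

    Illustrative only: this mirrors the notation used in the type list
    above and is not how RBC parses signatures.
    """
    # Split the pattern into literal text and bracketed option groups.
    parts = re.split(r'(\[[^\]]*\])', pattern)
    choices = []
    for part in parts:
        if part.startswith('[') and part.endswith(']'):
            # A bracketed group contributes one option per comma-separated item.
            choices.append(part[1:-1].split(','))
        else:
            # Literal text contributes exactly one option.
            choices.append([part])
    # Every combination of one option per group is one concrete type name.
    return [''.join(combo) for combo in itertools.product(*choices)]

print(expand_type_pattern('[List,Array]Int[8,16]'))
# ['ListInt8', 'ListInt16', 'ArrayInt8', 'ArrayInt16']
```

The result contains the same four names given in the text, in a different order.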

Returning to the function, if you want both the input argument and the output values as doubles, you could write:

from rbc.heavydb import RemoteHeavyDB
heavy = RemoteHeavyDB()
signature = heavy('double(double)')

Templating

What happens when the input is an integer? RBC does not cast input values to the expected types automatically. If you expect multiple input types, RBC supports templating (as in C++, or generics in Rust or Go). Templating allows you to define a type using a variable, like T in this example:

signature = heavy('T(T)', T=['int32', 'double'])

In this example, T can be replaced by int32 or double. This can also be written without using a variable.

signature = heavy('int32(int32)', 'double(double)')

You can also use several template variables. The signature is expanded over the Cartesian product of their values, so the following example covers four concrete signatures.

signature = heavy('T(Z)', T=['double', 'float'], Z=['int8', 'int32'])
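To make the Cartesian product concrete, the sketch below expands a templated signature into the explicit signatures it stands for. It is plain Python for illustration only; `expand_template` is a hypothetical helper, not an RBC function, and RBC performs its own expansion internally.

```python
import itertools
import re

def expand_template(signature, **templates):
    """List the concrete signatures a templated signature stands for."""
    names = list(templates)
    if not names:
        return [signature]
    # Match any template variable as a whole word.
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, names)) + r')\b')
    expanded = []
    # One concrete signature per combination of template values.
    for combo in itertools.product(*(templates[name] for name in names)):
        mapping = dict(zip(names, combo))
        expanded.append(pattern.sub(lambda m: mapping[m.group(1)], signature))
    return expanded

print(expand_template('T(Z)', T=['double', 'float'], Z=['int8', 'int32']))
# ['double(int8)', 'double(int32)', 'float(int8)', 'float(int32)']
```

Under this reading, the templated form is shorthand for passing those four signature strings explicitly.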

Attaching the Signature

Once you have the signature, you can attach it to the function. As a best practice, use the signature as a decorator.

@heavy('double(double)')
def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9

After decoration, the function can no longer be called as a regular Python function. It is now “marked” to be registered on the server and used there.
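Because the decorated function is meant to run on the server, one option is to check the logic locally on an undecorated copy before attaching the signature. The sketch below assumes nothing beyond plain Python; the name `fahrenheit2celsius_py` is hypothetical and chosen only to keep the local copy distinct from the registered function.

```python
def fahrenheit2celsius_py(f):
    """Undecorated copy of the function body, kept for local checks."""
    return (f - 32) * 5 / 9

# Known reference points for the conversion.
assert fahrenheit2celsius_py(32) == 0.0      # freezing point of water
assert fahrenheit2celsius_py(212) == 100.0   # boiling point of water
assert fahrenheit2celsius_py(-40) == -40.0   # the two scales coincide here
```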

Overloading

RBC supports overloading function definitions. This permits several function implementations using a common identifier, with the execution path determined by specific inputs.

@heavy('double(double, double)')
def fahrenheit2celsius(f, offset):
    return offset + (f - 32) * 5 / 9

@heavy('double(double)')
def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9

Arrays

Both inputs and outputs can be marked as 1D arrays or lists of any type. To indicate an array in the function signature, append brackets ([]) to the type literal.

from rbc.stdlib import array_api
@heavy('double(double[])')
def fahrenheit2celsius(f_array):
    return (array_api.mean(f_array) - 32) * 5 / 9

You can also define an array using the 'Array<double>' syntax.

Some functions with array support are provided. In this example, the imported function rbc.stdlib.array_api.mean computes the mean over an array of inputs f_array. Functions can also return arrays.

To create an array within a function, the class Array must be used to define an empty array. It can then be indexed to be filled. Slicing or complex indexing is not currently supported. If the array is returned, it’s important that the type specified during the array creation matches the return type specified in the function signature.

from numba import types
from rbc.stdlib import Array
@heavy('int64[](int64)')
def create_and_fill_array(size):
    arr = Array(size, types.int64)
    for i in range(size):
        arr[i] = i
    return arr

Selecting a Device

You can explicitly select the devices on which a function is allowed to run by passing the keyword argument devices to the decorator when registering the function. The devices argument is a list that can contain 'cpu' and 'gpu'. It indicates which implementations should be made available and used. Consequently, requesting 'gpu' does not work if the server has no GPU.

@heavy('double(double)', devices=['cpu'])
def fahrenheit2celsius(f):
    return (f-32) * 5 / 9

A function can also be made available on both the CPU and GPU by using devices=['cpu', 'gpu'].

For 'gpu', only NVIDIA GPUs that can handle CUDA instructions are currently supported.

Registering the Function

Once you define the functions, with appropriate signatures in the decorator, you have to register them with HeavyDB. This is done automatically if the function is used in the same Python session. If multiple functions are defined in a file and need to be registered for use by another process or user, then you need to register them manually.

heavy.register()

It is less efficient to call RemoteHeavyDB.register() after every function declaration. Instead, use a single call after all functions are defined.

Similarly, you can clean the current session of all previously registered functions. The registration and unregistration of functions take into account only the functions defined in the current session associated with the object heavy.

heavy.unregister()

Using Registered Functions

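Calling the basic implementation of fahrenheit2celsius from Python does not run it locally. Instead, it returns an expression that prints as the corresponding SQL call: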

print(fahrenheit2celsius(32))
# 'fahrenheit2celsius(CAST(32 AS DOUBLE))'

To get the result of the function, you have to explicitly request execution on the server using the execute method:

fahrenheit2celsius(32).execute()
# 0.0
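The printed form above is a SQL expression, which suggests that a registered function can be called by name inside a query. The sketch below only builds such a query string and does not connect to a server. The helper `udf_select`, the table `weather`, and the column `temp_f` are hypothetical names used for illustration. The explicit CAST mirrors the printed form, since RBC does not cast inputs automatically.

```python
def udf_select(function_name, column, table, alias="temp_c"):
    """Build a SELECT statement that applies a registered UDF to a column."""
    for identifier in (function_name, column, table, alias):
        # Accept plain identifiers only, since the values are spliced into SQL text.
        if not identifier.isidentifier():
            raise ValueError(f"not a plain identifier: {identifier!r}")
    return (
        f"SELECT {function_name}(CAST({column} AS DOUBLE)) AS {alias} "
        f"FROM {table}"
    )

# Hypothetical table and column names.
query = udf_select("fahrenheit2celsius", "temp_f", "weather")
print(query)
# SELECT fahrenheit2celsius(CAST(temp_f AS DOUBLE)) AS temp_c FROM weather
```

Running the query requires a database connection, which is outside the scope of this sketch.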


Last updated 2 years ago

rbc.stdlib.array_api.mean is a special function bundled with RBC. In this case, numpy.mean has been overridden for convenience to users familiar with NumPy’s API. See here for more details and information about supported functions.

Standard Python constructors like list, dict, or numpy.array cannot be used to construct arrays supported by RBC. See here for a complete list of array creation functions.

The execute method is a convenience feature; it should not be used in production code. For production code, use heavyai or ibis via the ibis-heavyai backend to compose SQL queries using an ORM-like syntax.