Registering and Using a Function
Register a function and then use it
Making a function available to HeavyDB (registering it) is based on decorating a Python function. Consider the following simple function, which takes a single argument and returns a single value:
def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9
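Before registering anything, it can help to pin down a few reference values in plain Python, so that results returned later by the server have something to be compared against. This is an ordinary, undecorated function; nothing in this snippet involves RBC.

```python
def fahrenheit2celsius(f):
    """Plain-Python reference implementation (not registered with HeavyDB)."""
    return (f - 32) * 5 / 9


# Reference points: freezing point, boiling point, and the -40 crossover.
reference = {32: 0.0, 212: 100.0, -40: -40.0}
for fahrenheit, celsius in reference.items():
    assert fahrenheit2celsius(fahrenheit) == celsius
```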
Register this function to HeavyDB using the following steps:
1. Declare the function’s signature.
2. Attach the signature to the function.
3. Register the function to the database.
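To make the three steps concrete without a running server, here is a toy, in-memory stand-in for the pattern. ToyRemote and its methods are hypothetical and only mimic the workflow: calling the object with a signature string declares the signature, using the result as a decorator attaches it, and register() stands in for sending the marked functions to the database. It is not RBC's implementation; for one thing, unlike RBC, the toy leaves the function callable as ordinary Python.

```python
class ToyRemote:
    """Hypothetical stand-in for a remote JIT object; not part of RBC."""

    def __init__(self):
        self.pending = {}     # name -> (signatures, function), marked but not yet sent
        self.registered = {}  # what the simulated "server" knows about

    def __call__(self, *signatures):
        # Step 1: declare the signature(s); the result is a decorator.
        def attach(func):
            # Step 2: attach the signatures to the function and mark it.
            self.pending[func.__name__] = (signatures, func)
            return func
        return attach

    def register(self):
        # Step 3: "send" every marked function in one batch.
        self.registered.update(self.pending)
        self.pending.clear()


toy = ToyRemote()
signature = toy('double(double)')  # step 1


@signature                         # step 2
def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9


toy.register()                     # step 3
```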
To declare the signature, annotate the function with type information that tells RBC how to translate the function into LLVM intermediate representation (IR), using the following syntax:
'returnType(inputType1, inputType2, ...)'
The function can only return a single element.
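As an illustration of the 'returnType(inputType1, inputType2, ...)' format, the hypothetical helper parse_signature below splits a signature string into its single return type and its list of argument types. It is only a sketch of the notation; RBC's own parser accepts more forms than this handles (templates, for instance).

```python
def parse_signature(sig):
    """Split 'ret(arg1, arg2, ...)' into (ret, [arg1, arg2, ...]). Hypothetical helper."""
    sig = sig.strip()
    open_paren = sig.index('(')  # raises ValueError if there is no '('
    if not sig.endswith(')'):
        raise ValueError(f'malformed signature: {sig!r}')
    return_type = sig[:open_paren].strip()
    inner = sig[open_paren + 1:-1].strip()
    arg_types = [a.strip() for a in inner.split(',')] if inner else []
    # The format allows exactly one return type.
    if not return_type or ',' in return_type:
        raise ValueError(f'expected exactly one return type in {sig!r}')
    return return_type, arg_types


print(parse_signature('double(double, int32)'))  # ('double', ['double', 'int32'])
```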
Available types are similar to C types:
In the types listed, items in brackets indicate options to choose from. For example, [List,Array]Int[8,16] expands to ListInt8, ListInt16, ArrayInt8, or ArrayInt16. The literal int can be abbreviated as i (for example, i32 for int32).
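The bracket notation is shorthand for every combination of the listed options. The helper below (expand_type_pattern is a hypothetical illustration, not an RBC function) expands such a pattern with itertools.product, which makes the rule explicit. Empty brackets, as in an array suffix like double[], are left untouched.

```python
import itertools
import re


def expand_type_pattern(pattern):
    """Expand '[A,B]x[1,2]' into every combination. Hypothetical helper."""
    # re.split with a capturing group keeps the bracketed groups in the result.
    parts = re.split(r'(\[[^\]]*\])', pattern)
    choices = []
    for part in parts:
        if part.startswith('[') and part.endswith(']') and len(part) > 2:
            choices.append([opt.strip() for opt in part[1:-1].split(',')])
        elif part:
            choices.append([part])  # literal text (including '[]'): one fixed choice
    return [''.join(combo) for combo in itertools.product(*choices)]


print(expand_type_pattern('[List,Array]Int[8,16]'))
# ['ListInt8', 'ListInt16', 'ArrayInt8', 'ArrayInt16']
```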
Returning to the function, if you want both the input argument and the return value to be doubles, you could write:
from rbc.heavydb import RemoteHeavyDB
heavy = RemoteHeavyDB()
signature = heavy('double(double)')
What happens when the input is an integer? RBC does not cast input values to the expected types automatically. If you expect multiple input types, RBC supports templating (as with templates in C++ or generics in Rust or Go). Templating lets you define a type using a variable, like T in this example:
signature = heavy('T(T)', T=['int32', 'double'])
In this example, T can be replaced by int32 or double. The same signatures can also be written without using a variable:
signature = heavy('int32(int32)', 'double(double)')
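Because inputs are not cast, a call only finds an implementation when some declared signature matches the argument types exactly, and the templated and explicit forms above both produce the same two concrete signatures. The toy lookup below models that exact-match idea under the assumption that no implicit conversion happens at all; expand_template and has_exact_match are hypothetical helpers, not RBC functions.

```python
import re


def expand_template(template, var, candidates):
    """Replace the template variable with each candidate type (hypothetical helper)."""
    return [re.sub(rf'\b{var}\b', t, template) for t in candidates]


def has_exact_match(signatures, arg_types):
    """True if a signature's argument list equals arg_types exactly; no casting is modeled."""
    for sig in signatures:
        inner = sig[sig.index('(') + 1:sig.rindex(')')]
        declared = [a.strip() for a in inner.split(',')] if inner.strip() else []
        if declared == list(arg_types):
            return True
    return False


only_double = ['double(double)']
templated = expand_template('T(T)', 'T', ['int32', 'double'])
explicit = ['int32(int32)', 'double(double)']

assert templated == explicit
assert not has_exact_match(only_double, ['int32'])  # an int32 input finds no match
assert has_exact_match(templated, ['int32'])
```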
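You can also use several template variables. RBC then creates a signature for every combination of their values (the Cartesian product), so the following line yields four signatures: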
signature = heavy('T(Z)', T=['double', 'float'], Z=['int8', 'int32'])
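The Cartesian-product rule can be spelled out in a few lines. The hypothetical helper expand_signatures below (not an RBC function) instantiates a template once per combination of variable values, substituting all variables in a single regex pass so that one substitution cannot interfere with another.

```python
import itertools
import re


def expand_signatures(template, **variables):
    """Instantiate a template once per combination of variable values (hypothetical helper)."""
    if not variables:
        return [template]
    names = list(variables)
    # One regex matching any variable name as a whole word, so substitution is single-pass.
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, names)) + r')\b')
    result = []
    for combo in itertools.product(*(variables[name] for name in names)):
        mapping = dict(zip(names, combo))
        result.append(pattern.sub(lambda m: mapping[m.group(1)], template))
    return result


sigs = expand_signatures('T(Z)', T=['double', 'float'], Z=['int8', 'int32'])
print(sigs)
# ['double(int8)', 'double(int32)', 'float(int8)', 'float(int32)']
```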
Once you have the signature, you can attach it to the function. As a best practice, use the signature as a decorator.
@heavy('double(double)')
def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9
This prevents the decorated function from being called as a regular Python function. Instead, the function is marked to be registered on the server and executed there.
RBC supports overloading function definitions: several implementations can share a common name, and the one that runs is selected based on the inputs.
@heavy('double(double, double)')
def fahrenheit2celsius(f, offset):
    return offset + (f - 32) * 5 / 9

@heavy('double(double)')
def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9
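The idea behind overloading can be shown locally with a toy registry. Overloads is a hypothetical class that dispatches on the function name and number of arguments only; it is a simplification of the concept, since the real selection happens on the server and also takes argument types into account.

```python
class Overloads:
    """Toy registry keyed by (name, arity); hypothetical, not RBC's mechanism."""

    def __init__(self):
        self._impls = {}

    def register(self, func):
        key = (func.__name__, func.__code__.co_argcount)
        self._impls[key] = func
        return func

    def call(self, name, *args):
        # Pick the implementation whose arity matches the call.
        return self._impls[(name, len(args))](*args)


ov = Overloads()


@ov.register
def fahrenheit2celsius(f, offset):
    return offset + (f - 32) * 5 / 9


@ov.register
def fahrenheit2celsius(f):  # rebinding the name is fine: ov keeps both versions
    return (f - 32) * 5 / 9


print(ov.call('fahrenheit2celsius', 212.0))       # 100.0
print(ov.call('fahrenheit2celsius', 212.0, 1.0))  # 101.0
```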
Both inputs and outputs can be declared as 1D arrays or lists of any type. To indicate an array in the function signature, append brackets ([]) to the type literal, for example double[].
from rbc.stdlib import array_api

@heavy('double(double[])')
def fahrenheit2celsius(f_array):
    return (array_api.mean(f_array) - 32) * 5 / 9
You can also define an array using the
Some functions with array support are provided. In this example, the imported function rbc.stdlib.array_api.mean computes the mean over the input array f_array. Functions can also return arrays.

rbc.stdlib.array_api.mean is a special function bundled with RBC. In this case, numpy.mean has also been overridden for the convenience of users familiar with NumPy’s API. See the RBC documentation for more details and information about supported functions.
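For checking results, a plain-Python equivalent of the array version is handy. The sketch below uses statistics.fmean from the standard library in place of rbc.stdlib.array_api.mean; fahrenheit_array_to_celsius is a hypothetical name, and the code runs locally only, not inside HeavyDB.

```python
from statistics import fmean


def fahrenheit_array_to_celsius(f_array):
    """Local reference: convert the mean of an array of Fahrenheit values to Celsius."""
    if not f_array:
        raise ValueError('mean of an empty array is undefined')
    return (fmean(f_array) - 32) * 5 / 9


print(fahrenheit_array_to_celsius([32.0, 212.0]))  # 50.0
```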
To create an array within a function, use the Array class to define an empty array, then fill it by indexing. Slicing and complex indexing are not currently supported. If the array is returned, the element type specified when creating the array must match the return type specified in the function signature.
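from numba import types
from rbc.stdlib import Array

@heavy('int64[](int64)')
def create_and_fill_array(size):  # function name chosen for illustration
    arr = Array(size, types.int64)  # element type matches the int64[] return type
    for i in range(size):
        arr[i] = i
    return arr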
Standard Python constructors such as numpy.array cannot be used to construct arrays supported by RBC. See the RBC documentation for a complete list of array creation functions.
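Outside HeavyDB, the closest standard-library analogue of a fixed-size, typed, index-filled array is Python's array module. The sketch below mirrors the create-then-fill pattern with typecode 'q' (a signed integer of at least 64 bits, standing in for types.int64). create_and_fill is a hypothetical name, and this is only a local analogue: unlike RBC's Array, array.array does support slicing.

```python
from array import array


def create_and_fill(size):
    """Local analogue of the RBC example: allocate a typed array, then fill it by index."""
    arr = array('q', [0]) * size  # 'q' = signed long long, at least 64 bits
    for i in range(size):
        arr[i] = i
    return arr


result = create_and_fill(5)
print(result.tolist())  # [0, 1, 2, 3, 4]
```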
You can explicitly select the devices on which a function is allowed to execute by passing the keyword argument devices to the decorator when registering the function. The devices argument is a list that can contain 'cpu', 'gpu', or both. It indicates which implementations should be made available and used. Hence, if the server has no GPU, a function restricted to 'gpu' cannot run there.

@heavy('double(double)', devices=['cpu'])
def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9
A function can also be made available on both the CPU and the GPU by using ['cpu', 'gpu']. For 'gpu', only NVIDIA GPUs that support CUDA are currently supported.
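The device rule stated above can be written as a small check. usable_devices and VALID_DEVICES are hypothetical names; the function models only the rule in the text (a 'gpu' implementation is useless on a server without a GPU) and does not query any hardware.

```python
VALID_DEVICES = {'cpu', 'gpu'}


def usable_devices(requested, server_has_gpu):
    """Hypothetical check: which requested devices could actually run the function."""
    unknown = set(requested) - VALID_DEVICES
    if unknown:
        raise ValueError(f'unknown device(s): {sorted(unknown)}')
    return [d for d in requested if d == 'cpu' or server_has_gpu]


print(usable_devices(['gpu'], server_has_gpu=False))         # []
print(usable_devices(['cpu', 'gpu'], server_has_gpu=False))  # ['cpu']
```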
Once you have defined the functions, with appropriate signatures in their decorators, you have to register them with HeavyDB. This happens automatically if a function is used in the same Python session. If multiple functions are defined in a file and must be registered for use by another process or user, you need to register them manually:

heavy.register()

Calling RemoteHeavyDB.register() after every function declaration is less efficient. Instead, use a single call after all functions are defined.
Similarly, you can clear the current session of all previously registered functions. Registration and unregistration take into account only the functions defined in the current session associated with the RemoteHeavyDB object (heavy in these examples):

heavy.unregister()
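The efficiency point comes down to batching. The toy counter below assumes, for illustration only, that each register() or unregister() call costs one round trip to the server; ToySession is hypothetical and involves no network. It shows that registering after each definition costs one trip per function while a single call at the end costs one, and that unregistering clears only what this session registered.

```python
class ToySession:
    """Hypothetical session that counts simulated server round trips."""

    def __init__(self):
        self.pending = []
        self.registered = []
        self.round_trips = 0

    def define(self, name):
        self.pending.append(name)

    def register(self):
        if self.pending:
            self.round_trips += 1  # assumed: one round trip per register() batch
            self.registered.extend(self.pending)
            self.pending.clear()

    def unregister(self):
        # Forget only what this session registered.
        if self.registered:
            self.round_trips += 1
            self.registered.clear()


per_function = ToySession()
for name in ['f1', 'f2', 'f3']:
    per_function.define(name)
    per_function.register()

batched = ToySession()
for name in ['f1', 'f2', 'f3']:
    batched.define(name)
batched.register()

print(per_function.round_trips, batched.round_trips)  # 3 1
```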
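To use the basic implementation of fahrenheit2celsius, call it with a value. The call does not compute the result locally; it produces the SQL expression that will be evaluated on the server:

fahrenheit2celsius(32)
# 'fahrenheit2celsius(CAST(32 AS DOUBLE))'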
To get the result of the function, you have to explicitly request execution on the server using the execute method:

fahrenheit2celsius(32).execute()

The execute method is a convenience feature; it should not be used in production code. For production code, use heavyai, or ibis via the ibis-heavyai backend, to compose SQL queries using an ORM-like syntax.
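The split between building a call and executing it is a deferred-evaluation pattern. DeferredCall below is a hypothetical, local-only model of it: sql() renders the call as SQL text in the same shape as the example above, and execute() evaluates the plain Python function, whereas the real execute sends the query to HeavyDB.

```python
class DeferredCall:
    """Hypothetical stand-in: records a call as SQL text and evaluates only on execute()."""

    def __init__(self, func, sql_type, *args):
        self.func = func
        self.sql_type = sql_type
        self.args = args

    def sql(self):
        casts = ', '.join(f'CAST({a} AS {self.sql_type})' for a in self.args)
        return f'{self.func.__name__}({casts})'

    def execute(self):
        # The real execute would send the SQL to the server; here we evaluate locally.
        return self.func(*self.args)


def fahrenheit2celsius(f):
    return (f - 32) * 5 / 9


call = DeferredCall(fahrenheit2celsius, 'DOUBLE', 32)
print(call.sql())      # fahrenheit2celsius(CAST(32 AS DOUBLE))
print(call.execute())  # 0.0
```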