System Table Functions

HEAVY.AI provides access to a set system-provided table functions, also known as table-valued functions (TVS). System table functions, like user-defined table functions, support execution of queries on both CPU and GPU over one or more SQL result-set inputs. Table function support in HEAVY.AI can be split into two broad categories: system table functions and user-defined table functions (UDTFs). System table functions are built-in to the HEAVY.AI server, while UDTFs can be declared dynamically at run-time by specifying them in Numba, a subset of the Python language. For more information on UDTFs, see User-Defined Table Functions.

To improve performance, table functions can be declared to enable filter pushdown optimization, which allows the Calcite optimizer to "push down" filters on the output(s) of a table functions to its input(s) when the inputs and outputs are declared to be semantically equivalent (for example, a longitude variable that is input and output from a table function). This can significantly increase performance in cases where only a small portion of one or more input tables is required to compute the filtered output of a table function.

Whether system- or user-provided, table functions can execute over one or more result sets specified by subqueries, and can also take any number of additional constant literal arguments specified in the function definition. SQL subquery inputs can consist of any SQL expression (including multiple subqueries, joins, and so on) allowed by HeavyDB, and the output can be filtered, grouped by, joined, and so on like a normal SQL subquery, including being input into additional table functions by wrapping it in a CURSOR argument. The number and types of input arguments, as well as the number and types of output arguments, are specified in the table function definition itself.

Table functions allow for the efficient execution of advanced algorithms that may be difficult or impossible to express in canonical SQL. By allowing execution of code directly over SQL result sets, leveraging the same hardware parallelism used for fast SQL execution and visualization rendering, HEAVY.AI provides orders-of-magnitude speed increases over the alternative of transporting large result sets to other systems for post-processing and then returning to HEAVY.AI for storage or downstream manipulation. You can easily invoke system-provided or user-defined algorithms directly inline with SQL and rendering calls, making prototyping and deployment of advanced analytics capabilities easier and more streamlined.

Concepts

CURSOR Subquery Inputs

Table functions can take as input arguments both constant literals (including scalar results of subqueries) as well as results of other SQL queries (consisting of one or more rows). The latter (SQL query inputs), per the SQL standard, must be wrapped in the keyword CURSOR. Depending on the table function, there can be 0, 1, or multiple CURSOR inputs. For example:

SELECT * FROM (TABLE(my_table_function /* This is only an example! */ (
 CURSOR(SELECT arg1, arg2, arg3 FROM input_1 WHERE x > 10) /* First CURSOR 
 argument consisting of 3 columns */,
 CURSOR(SELECT arg1, AVG(arg2) FROM input_2 GROUP BY arg1 where y < 40) 
 /* Second CURSOR argument constisting of 2 columns. This could be from the same
 table as the first CURSOR, or as is the case here, a completely different table
 (or even joined table or logical value expression) */,
 'Fred' /* TEXT constant literal argument */,
 true /* BOOLEAN constant literal argument */,
 (SELECT COUNT(*) FROM another_table), /* scalar subquery results do not need
 to be wrapped in a CURSOR */,
 27.3 /* FLOAT constant literal argument */))
WHERE output1 BETWEEN 32.2 AND 81.8;

ColumnList Inputs

Certain table functions can take 1 or more columns of a specified type or types as inputs, denoted as ColumnList<TYPE1 | Type2... TypeN>. Even if a function allows aColumnList input of multiple types, the arguments must be all of one type; types cannot be mixed. For example, if a function allows ColumnList<INT | TEXT ENCODING DICT>, one or more columns of either INTEGER or TEXT ENCODING DICT can be used as inputs, but all must be either INT columns or TEXT ENCODING DICT columns.

Named Arguments

All HEAVY.AI system table functions allow you to specify argument either in conventional comma-separated form in the order specified by the table function signature, or alternatively via a key-value map where input argument names are mapped to argument values using the => token. For example, the following two calls are equivalent:

/* The following two table function calls, the first with unnamed
 signature-ordered arguments, and the second with named arguments,
 are equivalent */

select
  *
from
  table(
    tf_compute_dwell_times(
      /* Without the use of named arguments, input arguments must
      be ordered as specified by the table function signature */
      cursor(
        select
          user_id,
          movie_id,
          ts
        from
          netflix_audience_behavior
      ),
      3,
      600,
      10800
    )
  )
order by
  num_dwell_points desc
limit
  10;


select
  *
from
  table(
    tf_compute_dwell_times(
     /* Using named arguments, input arguments can be
     ordered in any order, as long as all arguments are named */
     min_dwell_seconds => 600,
     max_inactive_seconds => 10800
      data => cursor(
        select
          user_id,
          movie_id,
          ts
        from
          netflix_audience_behavior
      ),
      min_dwell_points => 3
    )
  )
order by
  num_dwell_points desc
limit
  10;

Filter Push-Down

For performance reasons, particularly when table functions are used as actual tables in a client like Heavy Immerse, many system table functions in HEAVY.AI automatically "push down" filters on certain output columns in the query onto the inputs. For example, if a table does some computation over an x and y range such that x and y are in both the input and output for the table function, filter push-down would likely be enabled so that a query like the following would automatically push down the filter on the x and y outputs to the x and y inputs. This potentially increases query performance significantly.

SELECT
  *
FROM
  TABLE(
    my_spatial_table_function(
      CURSOR(
        SELECT
          x,
          y
        from
          spatial_data_table
          /* Presuming filter push down is enabled for 
          my_spatial_table_function, the filter applied to 
          x and y will be applied here to the table function
          input CURSOR */
      )
    )
  )
WHERE
  x BETWEEN 38.2
  AND 39.1
  and Y BETWEEN -121.4
  and -120.1;

To determine whether filter push-down is used, you can check the Boolean value of the filter_table_transpose column from the query:

SHOW TABLE FUNCTIONS DETAILS <table_function_name>;

Currently for system table functions, you cannot change push-down behavior.

Querying Registered Table Functions

You can query which table functions are available using SHOW TABLE FUNCTIONS:

SHOW TABLE FUNCTIONS;

Table UDF

tf_feature_similarity
tf_feature_self_similarity
tf_geo_rasterize_slope
...

Query Metadata for a Specific Table Function

Information about the expected input and output argument names and types, as well as other info such as whether the function can run on CPU, GPU or both, and whether filter push-down is enabled, can be queried via SHOW TABLE FUNCTIONS DETAILS <table_function_name>;

SHOW TABLE FUNCTIONS DETAILS <table_function_name>;

name|signature|input_names|input_types|output_names|output_types|CPU|GPU|Runtime|filter_table_transpose
generate_series|(i64 series_start, i64 series_stop, i64 series_step) -> Column<i64>|[series_start, series_stop, series_step]|[i64, i64, i64]|[generate_series]|[Column<i64>]|true|false|false|false
generate_series|(i64 series_start, i64 series_stop) -> Column<i64>|[series_start, series_stop]|[i64, i64]|[generate_series]|[Column<i64>]|true|false|false|false

System Table Functions

The following system table functions are available in HEAVY.AI. The table provides a summary and links to more inforamation about each function.

For information about the HeavyRF radio frequency propagation simulation and HeavyRF table functions, see HeavyRF.

The TABLE command is required to wrap a table function clause; for example: select * from TABLE(generate_series(1, 10));

The CURSOR command is required to wrap any subquery inputs.

Last updated