Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) of the search vector, which can optionally be TF/IDF weighted.
Input Arguments

Parameter | Description | Data Type |
---|---|---|
primary_key | Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function will compute the similarity to the search vector specified by the comparison_pivot_features cursor. Examples include countries, census block groups, user IDs of website visitors, and aircraft call signs. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
pivot_features | One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities are compared only by the census block groups visited, regardless of time overlap. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
metric | Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
comparison_pivot_features | One or more columns constituting a compound feature for the search vector. This should match pivot_features in number of sub-features, types, and semantics. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
comparison_metric | Column denoting the values used as input for the cosine similarity metric computation from the search vector. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
use_tf_idf | Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation. | BOOLEAN |

Output Columns

Name | Description | Data Type |
---|---|---|
class | ID of the primary_key being compared against the search vector. | Column<TEXT ENCODING DICT \| INT \| BIGINT> (type will be the same as the primary_key input column) |
similarity_score | Computed cosine similarity score between each primary_key entity and the search vector, with values falling between 0 (completely dissimilar) and 1 (completely similar). | Column<FLOAT> |
Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session (dwell time).
Example
Parameter | Description | Data Type |
---|---|---|
series_start | Starting integer value, inclusive. | BIGINT |
series_end | Ending integer value, inclusive. | BIGINT |
series_step (optional, defaults to 1) | Increment between successive values; can be negative to generate a descending series. | BIGINT |

Name | Description | Data Type |
---|---|---|
generate_series | The integer series specified by the input arguments. | Column<BIGINT> |

Parameter | Description | Data Type |
---|---|---|
series_start | Starting timestamp value, inclusive. | TIMESTAMP(9) (timestamp literals with other precisions will be auto-casted to TIMESTAMP(9)) |
series_end | Ending timestamp value, inclusive. | TIMESTAMP(9) (timestamp literals with other precisions will be auto-casted to TIMESTAMP(9)) |
series_step | Time/date interval signifying the step between each element in the returned series. | INTERVAL |

Name | Description | Data Type |
---|---|---|
generate_series | The timestamp series specified by the input arguments. | Column<TIMESTAMP(9)> |
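Invocation sketches for both variants (values are illustrative; the timestamp variant assumes standard SQL timestamp and interval literal syntax):

```sql
-- integer series: 1, 3, 5, 7, 9
SELECT * FROM TABLE(generate_series(1, 10, 2));

-- timestamp series at 6-hour steps
SELECT * FROM TABLE(
  generate_series(
    TIMESTAMP '2023-01-01 00:00:00',
    TIMESTAMP '2023-01-02 00:00:00',
    INTERVAL '6' HOUR
  )
);
```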
Input Arguments

Parameter | Description | Data Type |
---|---|---|
| Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes. | Column<TEXT ENCODING DICT \| BIGINT> |
| Column containing keys/IDs of dwell "sites" or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned H3 hex IDs for geographic location. | Column<TEXT ENCODING DICT \| BIGINT> |
| Column denoting the time at which an event occurred. | Column<TIMESTAMP(0\|3\|6\|9)> |
| Constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered record for an entity ID at a site ID to constitute a valid session and compute and return an entity's dwell time at a site. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapses between an entity's first and last ordered timestamp records at a site, these records are not considered a valid session and a dwell time for that session is not calculated. | BIGINT (other integer types are automatically casted to BIGINT) |
| A constant integer value specifying the minimum number of successive observations required to constitute a valid session. | BIGINT (other integer types are automatically casted to BIGINT) |
| A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity ID at a given site ID is 86500 seconds, the session is considered ended at the first timestamp-ordered record, and a new session is started at the timestamp of the second record. | BIGINT (other integer types are automatically casted to BIGINT) |

Output Columns

Name | Description | Data Type |
---|---|---|
| The ID of the entity for the output dwell time, identical to the corresponding input entity ID. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type is the same as the input entity ID column) |
| The site ID for the output dwell time, identical to the corresponding input site ID. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type is the same as the input site ID column) |
| The site ID for the session preceding the current session, which might be a different site. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type is the same as the input site ID column) |
| The site ID for the session after the current session, which might be a different site. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type will be the same as the input site ID column) |
| An auto-incrementing session ID specific/relative to the current entity. | Column<INT> |
| The index of the nth timestamp-ordered record for the session. | Column<INT> |
| The duration in seconds for the session. | Column<INT> |
| The number of records/observations constituting the current output row's session. | Column<INT> |
Parameter | Description | Data Type |
---|---|---|
| The number of strings to randomly generate. | BIGINT |
| Length of the generated strings. | BIGINT |

Name | Description | Data Type |
---|---|---|
id | Integer ID of output, starting at 0 and increasing monotonically. | Column<BIGINT> |
rand_str | Random string. | Column<TEXT ENCODING DICT> |
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing an aggregate over all points in each bin as the output value for the bin. The aggregate performed to compute the value for each bin is specified by agg_type, with allowed aggregate types of AVG, COUNT, SUM, MIN, and MAX. If neighborhood_fill_radius is set greater than 0, a blur pass/kernel is computed on top of the results according to the optionally specified fill_agg_type, with allowed types of GAUSS_AVG, BOX_AVG, COUNT, SUM, MIN, and MAX (if not specified, defaults to GAUSS_AVG, a Gaussian-average kernel). If fill_only_nulls is set to true, only null bins from the first aggregate step have final output values computed from the blur pass; otherwise, all values are affected by the blur pass.
Note that the arguments to bound the spatial output grid (x_min, x_max, y_min, y_max) are optional; however, either all or none of these arguments must be supplied. If the arguments are not supplied, the spatial output grid is bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize table function, these filters also constrain the output range.
Example
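An invocation sketch, assuming the CURSOR argument is named raster (the points table and its column names are illustrative):

```sql
CREATE TABLE elevation_grid AS
SELECT * FROM TABLE(
  tf_geo_rasterize(
    raster => CURSOR(SELECT lon AS x, lat AS y, elevation AS z FROM points),
    agg_type => 'MAX',
    bin_dim_meters => 100.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 2,
    fill_only_nulls => TRUE
  )
);
```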
HEAVY.AI provides access to a set of system-provided table functions, also known as table-valued functions (TVFs). System table functions, like user-defined table functions, support execution of queries on both CPU and GPU over one or more SQL result-set inputs. Table function support in HEAVY.AI can be split into two broad categories: system table functions and user-defined table functions (UDTFs). System table functions are built into the HEAVY.AI server, while UDTFs can be declared dynamically at run time by specifying them in Numba, a subset of the Python language. For more information on UDTFs, see User-Defined Table Functions.
To improve performance, table functions can be declared to enable filter pushdown optimization, which allows the Calcite optimizer to "push down" filters on the output(s) of a table function to its input(s) when the inputs and outputs are declared to be semantically equivalent (for example, a longitude variable that is input to and output from a table function). This can significantly increase performance in cases where only a small portion of one or more input tables is required to compute the filtered output of a table function.
Whether system- or user-provided, table functions can execute over one or more result sets specified by subqueries, and can also take any number of additional constant literal arguments specified in the function definition. SQL subquery inputs can consist of any SQL expression (including multiple subqueries, joins, and so on) allowed by HeavyDB, and the output can be filtered, grouped by, joined, and so on like a normal SQL subquery, including being input into additional table functions by wrapping it in a CURSOR
argument. The number and types of input arguments, as well as the number and types of output arguments, are specified in the table function definition itself.
Table functions allow for the efficient execution of advanced algorithms that may be difficult or impossible to express in canonical SQL. By allowing execution of code directly over SQL result sets, leveraging the same hardware parallelism used for fast SQL execution and visualization rendering, HEAVY.AI provides orders-of-magnitude speed increases over the alternative of transporting large result sets to other systems for post-processing and then returning to HEAVY.AI for storage or downstream manipulation. You can easily invoke system-provided or user-defined algorithms directly inline with SQL and rendering calls, making prototyping and deployment of advanced analytics capabilities easier and more streamlined.
Table functions can take as input arguments both constant literals (including scalar results of subqueries) as well as results of other SQL queries (consisting of one or more rows). The latter (SQL query inputs), per the SQL standard, must be wrapped in the keyword CURSOR
. Depending on the table function, there can be 0, 1, or multiple CURSOR inputs. For example:
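For instance, tf_graph_shortest_path (detailed below) takes one CURSOR input plus constant arguments; the edges table here is illustrative:

```sql
SELECT * FROM TABLE(
  tf_graph_shortest_path(
    edge_list => CURSOR(SELECT node1, node2, distance FROM edges),
    origin_node => 1,
    destination_node => 42
  )
);
```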
Certain table functions can take 1 or more columns of a specified type or types as inputs, denoted as ColumnList<TYPE1 | Type2... TypeN>
Even if a function allows a ColumnList
input of multiple types, the arguments must all be of one type; types cannot be mixed. For example, if a function allows ColumnList<INT | TEXT ENCODING DICT>
, one or more columns of either INTEGER or TEXT ENCODING DICT can be used as inputs, but all must be either INT columns or TEXT ENCODING DICT columns.
All HEAVY.AI system table functions allow you to specify arguments either in conventional comma-separated form in the order specified by the table function signature, or alternatively via a key-value map where input argument names are mapped to argument values using the =>
token. For example, the following two calls are equivalent:
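For example, the following two calls to generate_series are equivalent (argument names per its signature above):

```sql
-- positional form
SELECT * FROM TABLE(generate_series(1, 10, 2));

-- named-argument (key => value) form
SELECT * FROM TABLE(
  generate_series(series_start => 1, series_end => 10, series_step => 2)
);
```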
For performance reasons, particularly when table functions are used as actual tables in a client like Heavy Immerse, many system table functions in HEAVY.AI automatically "push down" filters on certain output columns in the query onto the inputs. For example, if a table function does some computation over an x
and y
range such that x
and y
are in both the input and output for the table function, filter push-down would likely be enabled so that a query like the following would automatically push down the filter on the x and y outputs to the x and y inputs. This potentially increases query performance significantly.
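A sketch of such a query, assuming the tf_geo_rasterize CURSOR argument is named raster (the points table and filter bounds are illustrative); the WHERE clause on the x/y outputs can be pushed down to the x/y inputs:

```sql
SELECT * FROM TABLE(
  tf_geo_rasterize(
    raster => CURSOR(SELECT lon AS x, lat AS y, elevation AS z FROM points),
    agg_type => 'AVG',
    bin_dim_meters => 50.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 0,
    fill_only_nulls => FALSE
  )
)
WHERE x BETWEEN -122.5 AND -122.3
  AND y BETWEEN 37.7 AND 37.9;
```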
To determine whether filter push-down is used, you can check the Boolean value of the filter_table_transpose
column from the query:
Currently, you cannot change push-down behavior for system table functions.
You can query which table functions are available using SHOW TABLE FUNCTIONS
:
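For example:

```sql
SHOW TABLE FUNCTIONS;
```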
Information about the expected input and output argument names and types, as well as other information such as whether the function can run on CPU, GPU, or both, and whether filter push-down is enabled, can be queried via SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
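For example, to inspect the generate_series signature:

```sql
SHOW TABLE FUNCTIONS DETAILS generate_series;
```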
The following system table functions are available in HEAVY.AI. The table provides a summary and links to more information about each function.
For information about the HeavyRF radio frequency propagation simulation and HeavyRF table functions, see HeavyRF.
The TABLE
command is required to wrap a table function clause; for example:
select * from TABLE(generate_series(1, 10));
The CURSOR
command is required to wrap any subquery inputs.
Aggregate point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius
, optionally only filling in null-valued bins if fill_only_nulls
is set to true.
The graph shortest path is then computed between an origin point on the grid specified by origin_x
and origin_y
and a destination point on the grid specified by destination_x
and destination_y
, where the shortest path is weighted by the nth exponent of the computed slope between a bin and its neighbors, with the nth exponent being specified by slope_weighted_exponent
. A max allowed traversable slope can be specified by slope_pct_max
, such that no traversal is considered or allowed between bins with absolute computed slopes greater than the percentage specified by slope_pct_max
.
Input Arguments
Output Columns
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin and destination node, tf_graph_shortest_path
computes the shortest distance-weighted path through the graph between origin_node
and destination_node
, returning a row for each node along the computed shortest path, with the traversal-ordered index of that node and the cumulative distance from the origin_node
to that node. If either origin_node
or destination_node
does not exist, an error is returned.
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example A
Example B
Similar to tf_geo_rasterize
, but also computes the slope and aspect per output bin.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius
, optionally only filling in null-valued bins if fill_only_nulls
is set to true. The slope and aspect are then computed for every bin, based on the z values of that bin and its neighboring bins. The slope can be returned in degrees or as a fraction between 0 and 1, depending on the boolean argument to compute_slope_in_degrees
.
Note that the bounds of the spatial output grid will be bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize_slope
table function, these filters will also constrain the output range.
Parameter | Description | Data Types |
---|---|---|
Example
Computes the Mandelbrot set over the complex domain [x_min
, x_max
), [y_min
, y_max
), discretizing the xy-space into an output of dimensions x_pixels
X y_pixels
. The output for each cell is the number of iterations needed to escape to infinity, up to and including the specified max_iterations
.
Parameter | Data Type |
---|---|
Example
Given a query input of entity keys/IDs (for example, airplane tail numbers), a set of feature columns (for example, airports visited), and a metric column (for example, the number of times each airport was visited), scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
Parameter | Description | Data Type |
---|---|---|
Example
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin node, tf_graph_shortest_paths_distances
computes the shortest distance-weighted path distance between the origin_node
and every other node in the graph. It returns a row for each node in the graph, with output columns consisting of the input origin_node
, the given destination_node
, the distance for the shortest path between the two nodes, and the number of edges or graph "hops" between the two nodes. If origin_node
does not exist in the node1
column of the edge_list
CURSOR
, an error is returned.
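An invocation sketch (the edges table and its column aliases are illustrative; cursor and argument names follow the tables below):

```sql
SELECT * FROM TABLE(
  tf_graph_shortest_paths_distances(
    edge_list => CURSOR(SELECT node1, node2, distance FROM edges),
    origin_node => 1
  )
)
ORDER BY distance DESC
LIMIT 10;
```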
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example A
Example B
Process a raster input to derive contour lines or regions and output them as LINESTRING or POLYGON for rendering or further processing. Each has two variants: one that re-rasterizes the input points, and one that accepts raw raster points directly.
Use the rasterizing variants if the raster table rows are not already sorted in row-major order (for example, if they represent an arbitrary 2D point cloud), or if filtering or binning is required to reduce the input data to a manageable count (to speed up the contour processing) or to smooth the input data before contour processing. If the input rows do not already form a rectilinear region, the output region will be their 2D bounding box. Many of the parameters of the rasterizing variant are directly equivalent to those of tf_geo_rasterize; see that function for details.
The direct variants require that the input rows represent a rectilinear region of pixels in nonsparse row-major order. The dimensions must also be provided, and (raster_width * raster_height) must match the input row count. The contour processing is then performed directly on the raster values with no preprocessing.
The line variants generate LINESTRING geometries that represent the contour lines of the raster space at the given interval with the optional given offset. For example, a raster space representing a height field with a range of 0.0 to 1000.0 will likely result in 10 or 11 lines, each with a corresponding contour_values
value, 0.0, 100.0, 200.0 etc. If contour_offset
is set to 50.0, then the lines are generated at 50.0, 150.0, 250.0, and so on. The lines can be open or closed and can form rings or terminate at the edges of the raster space.
The polygon variants generate POLYGON geometries that represent regions between contour lines (for example from 0.0 to 100.0), and from 100.0 to 200.0. If the raster space has multiple regions with that value range, then a POLYGON row is output for each of those regions. The corresponding contour_values
value for each is the lower bound of the range for that region.
Loads one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs
(if not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs).
If use_cache
is set to true
, an internal point cloud-specific cache will be used to hold the results per input file, and if queried again will significantly speed up the query time, allowing for interactive querying of a point cloud source. If the results of tf_load_point_cloud
will only be consumed once (for example, as part of a CREATE TABLE
statement), it is highly recommended that use_cache
is set to false
or left unspecified (as it is defaulted to false
) to avoid the performance and memory overhead incurred by use of the cache.
The bounds of the data retrieved can be optionally specified with the x_min
, x_max
, y_min
, y_max
arguments. These arguments can be useful when the user desires to retrieve a small geographic area from a large point-cloud file set, as files containing data outside the bounds of the specified bounding box will be quickly skipped by tf_load_point_cloud
, only requiring a quick read of the spatial metadata for the file.
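An invocation sketch; the file-path argument name (path) and the file location shown are illustrative assumptions, not confirmed parameter names:

```sql
SELECT * FROM TABLE(
  tf_load_point_cloud(
    path => '/data/lidar/tiles.laz',  -- hypothetical argument name and path
    out_srs => 'EPSG:4326',
    use_cache => false
  )
);
```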
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Example A
Example B
Input Arguments

Parameter | Description | Data Type |
---|---|---|
x | X-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
y | Y-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
z | Z-coordinate column or expression. The output bin is computed as the maximum z-value for all points falling in each bin. | Column<FLOAT \| DOUBLE> |
agg_type | The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'. | TEXT ENCODING NONE |
fill_agg_type (optional) | The aggregate to be performed when computing the blur pass on the output bins. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', 'MAX', 'GAUSS_AVG', or 'BOX_AVG'. Note that AVG is synonymous with GAUSS_AVG in this context, and the default fill_agg_type if not specified is GAUSS_AVG. | TEXT ENCODING NONE |
bin_dim_meters | The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are already assumed to be in meters. | DOUBLE |
geographic_coords | If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max. | BOOLEAN |
neighborhood_fill_radius | The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius bins. | DOUBLE |
fill_only_nulls | Specifies that the box blur should only be used to provide output values for null output bins (that is, bins that contained no data points or had only data points with null z-values). | BOOLEAN |
x_min (optional) | Min x-coordinate value (in input units) for the spatial output grid. | DOUBLE |
x_max (optional) | Max x-coordinate value (in input units) for the spatial output grid. | DOUBLE |
y_min (optional) | Min y-coordinate value (in input units) for the spatial output grid. | DOUBLE |
y_max (optional) | Max y-coordinate value (in input units) for the spatial output grid. | DOUBLE |

Output Columns

Name | Description | Data Type |
---|---|---|
x | The x-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input x-coordinate column/expression) |
y | The y-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input y-coordinate column/expression) |
z | The maximum z-coordinate of all input data assigned to a given spatial bin. | Column<FLOAT \| DOUBLE> (same as input z-coordinate column/expression) |
Input Arguments

Parameter | Description | Data Type |
---|---|---|
node1 | Origin node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> |
node2 | Destination node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (must be the same type as node1) |
distance | Distance between the origin and destination node in the directed edge list CURSOR. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
origin_node | The origin node to start graph traversal from. If the value is not present in edge_list.node1, an empty result set is returned. | BIGINT \| TEXT ENCODING DICT |
destination_node | The destination node to finish graph traversal at. If the value is not present in edge_list.node1, an empty result set is returned. | BIGINT \| TEXT ENCODING DICT |

Output Columns

Name | Description | Data Type |
---|---|---|
path_step | The index of this node along the path traversal from origin_node to destination_node, with the first node (the origin_node) indexed as 1. | Column<INT> |
node | The current node along the path traversal from origin_node to destination_node. The first node (as denoted by path_step = 1) will always be the input origin_node, and the final node (as denoted by MAX(path_step)) will always be the input destination_node. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (same type as the node1 and node2 input columns) |
cume_distance | The cumulative distance adding all input distance values from the origin_node to the current node. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> (same type as the distance input column) |
Generates random string data.
Generates a series of integer values.
Generates a series of timestamp values from start_timestamp
to end_timestamp
.
Given a query input with entity keys and timestamps, and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session.
Given a query input of entity keys/IDs, a set of feature columns, and a metric column, scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) of the search vector, which can optionally be TF/IDF weighted.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the aggregate specified by agg_type across all points in each bin as the output value for the bin. Allowed aggregate types are AVG, COUNT, SUM, MIN, and MAX.
Similar to tf_geo_rasterize
, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin and destination node, computes the shortest distance-weighted path through the graph between origin_node
and destination_node
.
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin node, computes the shortest distance-weighted path distance between the origin_node
and every other node in the graph.
Loads one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs. If not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs.
Computes the Mandelbrot set over the complex domain [x_min
, x_max
), [y_min
, y_max
), discretizing the xy-space into an output of dimensions x_pixels
X y_pixels
.
Returns metadata for one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min
, x_max
, y_min
, y_max
arguments.
Process a raster input to derive contour lines or regions and output as LINESTRING or POLYGON for rendering or further processing.
Aggregate point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Used for generating top-k signals where 'k' represents the maximum number of antennas to consider at each geographic location. The full relevant parameter name is strongest_k_sources_per_terrain_bin.
Taking a set of point elevations and a set of signal source locations as input, tf_rf_prop_max_signal
executes line-of-sight 2.5D RF signal propagation from the provided sources over a binned 2.5D elevation grid derived from the provided point locations, calculating the max signal in dBm at each grid cell, using the formula for free-space power loss.
x
Input x-coordinate column or expression.
Column<FLOAT | DOUBLE>
y
Input y-coordinate column or expression.
Column<FLOAT | DOUBLE>
z
Input z-coordinate column or expression. The output bin is computed as the maximum z-value for all points falling in each bin.
Column<FLOAT | DOUBLE>
agg_type
The aggregate to be performed to compute the output z-column. Should be one of 'AVG'
, 'COUNT'
, 'SUM',
'MIN'
, or 'MAX'.
TEXT ENCODING NONE
bin_dim_meters
The width and height of each x/y bin in meters. If geographic_coords
is not set to true, the input x/y units are already assumed to be in meters.
DOUBLE
geographic_coords
If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.
BOOLEAN
neighborhood_fill_radius
The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius
bins.
BIGINT
fill_only_nulls
Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).
BOOLEAN
compute_slope_in_degrees
If true, specifies the slope should be computed in degrees (with 0 degrees perfectly flat and 90 degrees perfectly vertical). If false, specifies the slope should be computed as a fraction from 0 (flat) to 1 (vertical). In a future release, we are planning to move the default output to percentage slope.
BOOLEAN
x
The x-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input x column/expression)
y
The y-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input y column/expression)
z
The maximum z-coordinate of all input data assigned to a given spatial bin.
Column<FLOAT | DOUBLE> (same as input z column/expression)
slope
The average slope of an output grid cell (in degrees or a fraction between 0 and 1, depending on the argument to compute_slope_in_degrees
).
Column<FLOAT | DOUBLE> (same as input z column/expression)
aspect
The direction from 0 to 360 degrees pointing towards the maximum downhill gradient, with 0 degrees being due north and moving clockwise from N (0°) -> NE (45°) -> E (90°) -> SE (135°) -> S (180°) -> SW (225°) -> W (270°) -> NW (315°).
Column<FLOAT | DOUBLE> (same as input z column/expression)
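The slope and aspect outputs can be illustrated with a central-difference sketch over a height grid. This assumes a simplified stencil (row index increasing northward, column index increasing eastward), not the function's exact internal method:

```python
import math

def slope_and_aspect(z, i, j, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) at an
    interior cell (i, j) of height grid z, via central differences.
    Aspect is the azimuth of the downhill (negative-gradient)
    direction, matching the 0 = north, clockwise convention above;
    it is reported as 0 for perfectly flat cells."""
    dz_dx = (z[i][j + 1] - z[i][j - 1]) / (2.0 * cell_size)  # east
    dz_dy = (z[i + 1][j] - z[i - 1][j]) / (2.0 * cell_size)  # north
    slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope_deg, aspect
```

For a plane rising one unit per cell toward the east, the slope is 45 degrees and the aspect is 270 (due west, the downhill direction).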
x_pixels
32-bit integer
y_pixels
32-bit integer
x_min
DOUBLE
x_max
DOUBLE
y_min
DOUBLE
y_max
DOUBLE
max_iterations
32-bit integer
primary_key
Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function computes co-similarity. Examples include countries, census block groups, user IDs of website visitors, and aircraft callsigns.
Column<TEXT ENCODING DICT | INT | BIGINT>
pivot_features
One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key
based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key
entities would be compared only by the census block groups visited, regardless of time overlap.
Column<TEXT ENCODING DICT | INT | BIGINT>
metric
Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is COUNT(*)
such that feature overlaps are weighted by the number of co-occurrences.
Column<INT | BIGINT | FLOAT | DOUBLE>
use_tf_idf
Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation.
BOOLEAN
class1
ID of the first primary key
in the pair-wise comparison.
Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key
input column)
class2
ID of the second primary key
in the pair-wise comparison. Because the computed similarity score for a pair of primary keys
is order-invariant, results are output only for ordering such that class1
<= class2
. For primary keys of type TextEncodingDict
, the order is based on the internal integer IDs for each string value and not lexicographic ordering.
Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key
input column)
similarity_score
Computed cosine similarity score between each primary_key
pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).
Column<FLOAT>
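The similarity score described above can be sketched in a few lines. The TF-IDF branch below uses one common log(N/df) weighting, which may differ from the variant used internally; pair ordering follows the class1 <= class2 convention:

```python
import math
from collections import defaultdict

def cosine_similarity_scores(rows, use_tf_idf=False):
    """Pairwise cosine similarity between entities, given rows of
    (primary_key, pivot_feature, metric) tuples. Sketch of the score
    this function produces, not its implementation."""
    vecs = defaultdict(dict)
    for key, feature, metric in rows:
        vecs[key][feature] = vecs[key].get(feature, 0.0) + metric
    if use_tf_idf:
        n = len(vecs)
        df = defaultdict(int)          # document frequency per feature
        for fv in vecs.values():
            for f in fv:
                df[f] += 1
        for fv in vecs.values():
            for f in fv:
                fv[f] *= math.log(n / df[f])
    keys = sorted(vecs)
    out = {}
    for a_i, a in enumerate(keys):
        for b in keys[a_i:]:           # emit only class1 <= class2
            dot = sum(v * vecs[b].get(f, 0.0) for f, v in vecs[a].items())
            norm = math.sqrt(sum(v * v for v in vecs[a].values())) * \
                   math.sqrt(sum(v * v for v in vecs[b].values()))
            out[(a, b)] = dot / norm if norm else 0.0
    return out
```

Entities sharing all weighted features score 1; entities sharing none score 0.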
x
Input x-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE>
y
Input y-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE> (must be the same type as x
)
z
Input z-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE>
agg_type
The aggregate to be performed to compute the output z-column. Should be one of 'AVG'
, 'COUNT'
, 'SUM',
'MIN'
, or 'MAX'.
TEXT ENCODING NONE
bin_dim
The width and height of each x/y bin. If geographic_coords
is true, the input x/y units will be translated to meters according to a local coordinate transform appropriate for the x/y bounds of the data.
DOUBLE
geographic_coords
If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.
BOOLEAN
neighborhood_bin_radius
The radius in bins to compute the Gaussian blur/filter over, such that each output bin will be the Gaussian-weighted average value of all bins within neighborhood_bin_radius
bins.
BIGINT
fill_only_nulls
Specifies that the Gaussian blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).
BOOLEAN
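The Gaussian blur referenced by neighborhood_bin_radius can be sketched by its kernel weights. The sigma default below (radius / 2) is an assumption, not a documented internal default; a 2D blur would apply this kernel separably in x and then y:

```python
import math

def gaussian_kernel(radius, sigma=None):
    """Normalized 1D Gaussian weights spanning [-radius, radius] bins.
    Weights peak at the center bin and fall off symmetrically, then
    are normalized to sum to 1 so the blur preserves overall mass."""
    sigma = sigma or max(radius / 2.0, 1e-9)
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma))
         for k in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]
```

Unlike the box blur above, nearby bins contribute more than distant ones, yielding smoother output.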
origin_x
The x-coordinate for the starting point for the graph traversal, in input (not bin) units.
DOUBLE
origin_y
The y-coordinate for the starting point for the graph traversal, in input (not bin) units.
DOUBLE
destination_x
The x-coordinate for the destination point for the graph traversal, in input (not bin) units.
DOUBLE
destination_y
The y-coordinate for the destination point for the graph traversal, in input (not bin) units.
DOUBLE
slope_weighted_exponent
The slope weight between neighboring raster cells will be weighted by the slope_weighted_exponent
power. A value of 1 signifies that the raw slopes between neighboring cells should be used; increasing this value above 1 more heavily penalizes paths that traverse steep slopes.
DOUBLE
slope_pct_max
The maximum absolute value of slopes (measured in percentages) between neighboring raster cells that will be considered for traversal. A neighboring graph cell with an absolute slope greater than this amount will not be considered in the shortest slope-weighted path graph traversal.
DOUBLE
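One plausible reading of how slope_weighted_exponent and slope_pct_max combine into an edge cost is sketched below. This is hypothetical; the actual cost model is internal to the function:

```python
def slope_weighted_cost(distance, rise, exponent, slope_pct_max):
    """Candidate edge cost between two raster cells: the base distance
    inflated by (1 + |slope|) ** exponent, where slope = rise / run.
    Edges steeper than slope_pct_max percent are excluded entirely
    (returned as None), matching the cutoff behavior described above."""
    slope = abs(rise) / distance
    if slope * 100.0 > slope_pct_max:
        return None          # too steep to traverse
    return distance * (1.0 + slope) ** exponent
```

With exponent 1, cost grows linearly with slope; larger exponents make steep cells disproportionately expensive, steering paths toward flatter terrain.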
node1
Origin node column in directed edge list CURSOR
Column<INT | BIGINT | TEXT ENCODING DICT>
node2
Destination node column in directed edge list CURSOR
Column<INT | BIGINT | TEXT ENCODING DICT> (must be the same type as node1
)
distance
Distance between origin and destination node in directed edge list CURSOR
Column<INT | BIGINT | FLOAT | DOUBLE>
origin_node
The origin node from which to start the graph traversal. If this value is not present in edge_list.node1
, an empty result set is returned.
BIGINT | TEXT ENCODING DICT
origin_node
Starting node in graph traversal. Always equal to input origin_node
.
Column<INT | BIGINT | TEXT ENCODING DICT> (same type as the node1
and node2
input columns)
destination_node
Final node in graph traversal. Will be equal to one of the values of the node2
input column.
Column<INT | BIGINT | TEXT ENCODING DICT> (same type as the node1
and node2
input columns)
distance
Cumulative distance between origin and destination node for shortest path graph traversal.
Column<INT | BIGINT | FLOAT | DOUBLE> (same type as the distance
input column)
num_edges_traversed
Number of edges (or "hops") traversed in the graph to arrive at destination_node
from origin_node
for the shortest path graph traversal between these two nodes.
Column<INT>
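The traversal that produces the origin_node, destination_node, distance, and num_edges_traversed outputs can be sketched with classic Dijkstra over the directed edge list:

```python
import heapq
from collections import defaultdict

def shortest_paths(edges, origin_node):
    """Single-source shortest paths over a directed edge list of
    (node1, node2, distance) tuples. Returns, for each reachable node,
    (cumulative distance, number of edges traversed). A sketch in the
    spirit of this function's output; note the SQL function returns an
    empty result set when origin_node never appears as node1, whereas
    this sketch always includes the origin itself."""
    adj = defaultdict(list)
    for n1, n2, d in edges:
        adj[n1].append((n2, d))
    best = {origin_node: (0.0, 0)}          # node -> (distance, hops)
    heap = [(0.0, 0, origin_node)]
    while heap:
        dist, hops, node = heapq.heappop(heap)
        if (dist, hops) != best.get(node):
            continue                         # stale heap entry
        for nxt, w in adj[node]:
            cand = (dist + w, hops + 1)
            if nxt not in best or cand[0] < best[nxt][0]:
                best[nxt] = cand
                heapq.heappush(heap, (cand[0], cand[1], nxt))
    return best
```

Each result row of the table function corresponds to one (destination, distance, hops) entry here.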
| Output geometries. | Column<LINESTRING | POLYGON> |
| Raster values associated with each contour geometry. | Column<FLOAT | DOUBLE> (will be the same type as value) |
| The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths. | TEXT ENCODING NONE |
| EPSG code of the output SRID. If not specified, output points are automatically converted to lon/lat (EPSG 4326). | TEXT ENCODING NONE |
| If true, use internal point cloud cache. Useful for inline querying of the output of | BOOLEAN |
| Min x-coordinate value (in degrees) for the output data. | DOUBLE |
| Max x-coordinate value (in degrees) for the output data. | DOUBLE |
| Min y-coordinate value (in degrees) for the output data. | DOUBLE |
| Max y-coordinate value (in degrees) for the output data. | DOUBLE |
| Longitude value of raster point (degrees, SRID 4326). | Column<FLOAT | DOUBLE> |
| Latitude value of raster point (degrees, SRID 4326). | Column<FLOAT | DOUBLE> (must be the same as <lon>) |
| Raster band value from which to derive contours. | Column<FLOAT | DOUBLE> |
| Optionally flip resulting geometries in latitude (default FALSE). (This parameter may be removed in future releases) | BOOLEAN |
| Desired contour interval. The function will generate a line at each interval, or a polygon region that covers that interval. | FLOAT/DOUBLE (must be same type as value) |
| Optional offset for resulting intervals. | FLOAT/DOUBLE (must be same type as value) |
| Pixel width (stride) of the raster data. | INTEGER |
| Pixel height of the raster data. | INTEGER |
| Point x-coordinate | Column<DOUBLE> |
| Point y-coordinate | Column<DOUBLE> |
| Point z-coordinate | Column<DOUBLE> |
| Point intensity | Column<INT> |
| The ordered number of the return for a given LiDAR pulse. The first returns (lowest return numbers) are generally associated with the highest-elevation points for a LiDAR pulse; i.e., the forest canopy will generally have a lower return number than the underlying terrain. | Column<TINYINT> |
| The total number of returns for a LiDAR pulse. Multiple returns occur when there are multiple objects between the LiDAR source and the lowest ground or water elevation for a location. | Column<TINYINT> |
| Column<TINYINT> |
| Column<TINYINT> |
| Column<SMALLINT> |
| Column<TINYINT> |
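The contour interval and offset parameters listed above determine the iso-values at which contour geometries are generated. A sketch of that level computation:

```python
import math

def contour_levels(value_min, value_max, interval, offset=0.0):
    """Iso-values covering [value_min, value_max] at the given
    interval, shifted by offset: the values at which contour lines
    (or polygon-band boundaries) would be drawn."""
    k = math.ceil((value_min - offset) / interval)
    levels = []
    while offset + k * interval <= value_max:
        levels.append(offset + k * interval)
        k += 1
    return levels
```

For example, a raster spanning 0 to 10 with interval 2.5 yields lines at 0, 2.5, 5, 7.5, and 10; adding an offset of 1 with interval 3 shifts them to 1, 4, 7, and 10.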
Returns metadata for one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min
, x_max
, y_min
, y_max
arguments.
Note: the specified path must be contained in the global allowed-import-paths
setting; otherwise, an error is returned.
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example
From the ASPRS LAS specification: "The scan direction flag denotes the direction at which the scanner mirror was traveling at the time of the output pulse. A bit value of 1 is a positive scan direction, and a bit value of 0 is a negative scan direction."
From the ASPRS LAS specification: "The edge of flight line data bit has a value of 1 only when the point is at the end of a scan. It is the last point on a given scan line before it changes direction."
From the ASPRS LAS specification: "The classification field is a number to signify a given classification during filter processing. The ASPRS standard has a public list of classifications which shall be used when mixing vendor specific user software."
From the ASPRS LAS specification: "The angle at which the laser point was output from the laser system, including the roll of the aircraft... The scan angle is an angle based on 0 degrees being NADIR, and –90 degrees to the left side of the aircraft in the direction of flight."
path
The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths
.
TEXT ENCODING NONE
x_min
(optional)
Min x-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
x_max
(optional)
Max x-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
y_min
(optional)
Min y-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
y_max
(optional)
Max y-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
file_path
Full path for the las or laz file
Column<TEXT ENCODING DICT>
file_name
Filename for the las or laz file
Column<TEXT ENCODING DICT>
file_source_id
File source id per file metadata
Column<SMALLINT>
version_major
LAS version major number
Column<SMALLINT>
version_minor
LAS version minor number
Column<SMALLINT>
creation_year
Data creation year
Column<SMALLINT>
is_compressed
Whether data is compressed, i.e. LAZ format
Column<BOOLEAN>
num_points
Number of points in this file
Column<BIGINT>
num_dims
Number of data dimensions for this file
Column<SMALLINT>
point_len
Not currently used
Column<SMALLINT>
has_time
Whether data has time values
Column<BOOLEAN>
has_color
Whether data contains RGB color values
Column<BOOLEAN>
has_wave
Whether data contains wave info
Column<BOOLEAN>
has_infrared
Whether data contains infrared values
Column<BOOLEAN>
has_14_point_format
Whether data adheres to the 14-attribute standard
Column<BOOLEAN>
specified_utm_zone
UTM zone of data
Column<INT>
x_min_source
Minimum x-coordinate in source projection
Column<DOUBLE>
x_max_source
Maximum x-coordinate in source projection
Column<DOUBLE>
y_min_source
Minimum y-coordinate in source projection
Column<DOUBLE>
y_max_source
Maximum y-coordinate in source projection
Column<DOUBLE>
z_min_source
Minimum z-coordinate in source projection
Column<DOUBLE>
z_max_source
Maximum z-coordinate in source projection
Column<DOUBLE>
x_min_4326
Minimum x-coordinate in lon/lat degrees
Column<DOUBLE>
x_max_4326
Maximum x-coordinate in lon/lat degrees
Column<DOUBLE>
y_min_4326
Minimum y-coordinate in lon/lat degrees
Column<DOUBLE>
y_max_4326
Maximum y-coordinate in lon/lat degrees
Column<DOUBLE>
z_min_4326
Minimum z-coordinate in meters above mean sea level (AMSL)
Column<DOUBLE>
z_max_4326
Maximum z-coordinate in meters above mean sea level (AMSL)
Column<DOUBLE>