Use INSERT for both single- and multi-row ad hoc inserts. (When inserting many rows, use the more efficient COPY command.)
You can also insert into a table using INSERT INTO ... SELECT, as shown in the following examples:
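A hedged sketch (table and column names are illustrative):

INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < '2020-01-01';
INSERT INTO order_totals (id, total) SELECT id, SUM(amount) FROM orders GROUP BY id;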
You can insert array literals into array columns. The inserts in the following example each have three array values, and demonstrate how you can:

- Create a table with variable-length and fixed-length array columns.
- Insert NULL arrays into these columns.
- Specify and insert array literals using {...} or ARRAY[...] syntax.
- Insert empty variable-length arrays using {} and ARRAY[] syntax.
- Insert array values that contain NULL elements.
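A minimal sketch covering each case (table and column names are illustrative):

CREATE TABLE ar (ai INTEGER[], af FLOAT[2], at TEXT[] ENCODING DICT(32));
INSERT INTO ar VALUES ({1, 2, 3}, {4.0, 5.0}, {'a', 'b'});            -- {...} syntax
INSERT INTO ar VALUES (ARRAY[1, 2, 3], ARRAY[4.0, 5.0], ARRAY['a']);  -- ARRAY[...] syntax
INSERT INTO ar VALUES (NULL, NULL, NULL);                             -- NULL arrays
INSERT INTO ar VALUES ({}, {4.0, 5.0}, {});                           -- empty variable-length arrays
INSERT INTO ar VALUES ({1, NULL, 3}, {4.0, NULL}, {'a', NULL});       -- NULL elements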
If you create a table with a column that has a default value, or alter a table to add a column with a default value, using the INSERT command creates a record that includes the default value if it is omitted from the INSERT. For example, assume a table created as follows:
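A plausible definition consistent with the description (sketch):

CREATE TABLE tbl (
  id INTEGER,
  name TEXT DEFAULT 'John Doe',
  age INTEGER);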
If you omit the name column from an INSERT or INSERT FROM SELECT statement, the missing value for column name is set to 'John Doe'.

INSERT INTO tbl (id, age) VALUES (1, 36); creates the record 1|'John Doe'|36.

INSERT INTO tbl (id, age) SELECT id, age FROM old_tbl; also sets all the name values to 'John Doe'.
Deletes rows that satisfy the WHERE clause from the specified table. If the WHERE clause is absent, all rows in the table are deleted, resulting in a valid but empty table.

In Release 6.4 and higher, you can run DELETE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.

To execute queries against another database, you must have ACCESS privilege on that database, as well as DELETE privilege.

Delete rows from a table in the my_other_db database:
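A hedged sketch (table and filter are illustrative):

DELETE FROM my_other_db.sales WHERE region = 'EU';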
Shows generated Intermediate Representation (IR) code, identifying whether it is executed on GPU or CPU. This is primarily used internally by HEAVY.AI to monitor behavior.

For example, when you use the EXPLAIN command on a basic statement, the utility returns 90 lines of IR code that is not meant to be human readable. However, at the top of the listing, a heading indicates whether it is IR for the CPU or IR for the GPU, which can be useful to know in some situations.
Returns a relational algebra tree describing the high-level plan to execute the statement.
The table below lists the relational algebra classes used to describe the execution plan for a SQL statement.
For example, a SELECT statement is described as a table scan and projection. If you add a sort order, the table projection is folded under a LogicalSort procedure.
When the SQL statement is simple, the EXPLAIN CALCITE version is actually less “human readable.” EXPLAIN CALCITE is more useful when you work with more complex SQL statements, like the one that follows. This query performs a scan on the BOOK table before scanning the BOOK_ORDER table.
Revising the original SQL command results in a more natural selection order and a more performant query.
Augments the EXPLAIN CALCITE command by adding details about referenced columns in the query plan.
For example, for the following EXPLAIN CALCITE command execution:
EXPLAIN CALCITE DETAILED adds more column details as seen below:
Change a parameter value for the current session.
Parameter name | Values |
---|---|
Switch to another database without needing to log in again. Your session silently switches to the requested database.

The database exists, but the user does not have access to it:

The database does not exist:

Force the session to run the subsequent SQL commands in CPU mode:

Switch the session back to run in GPU mode:
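Sketches of the corresponding commands (the database name is illustrative):

ALTER SESSION SET CURRENT_DATABASE='my_other_db';
ALTER SESSION SET EXECUTOR_DEVICE='CPU';
ALTER SESSION SET EXECUTOR_DEVICE='GPU';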
HEAVY.AI supports arrays in dictionary-encoded text and number fields (TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, and DOUBLE). Data stored in arrays is not normalized. For example, {green,yellow} is not the same as {yellow,green}. As with many SQL-based services, HEAVY.AI array indexes are 1-based.
HEAVY.AI supports NULL variable-length arrays for all integer and floating-point data types, including dictionary-encoded string arrays. For example, you can insert NULL into BIGINT[], DOUBLE[], or TEXT[] columns. HEAVY.AI supports NULL fixed-length arrays for all integer and floating-point data types, but not for dictionary-encoded string arrays. For example, you can insert NULL into BIGINT[2] or DOUBLE[3] columns, but not into TEXT[2] columns.
The following examples show query results based on the table test_array, created with the following statement:
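A plausible definition (sketch; the actual layout of test_array is not shown in this excerpt):

CREATE TABLE test_array (id INTEGER, ai INTEGER[], at TEXT[] ENCODING DICT(32));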
The following queries use arrays in an INTEGER field:
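Sketches of such queries (array indexes are 1-based; the ANY comparison is an assumption):

SELECT id FROM test_array WHERE ai[1] = 1;
SELECT id FROM test_array WHERE 3 = ANY ai;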
Usage Notes
SQL normally assumes that terms in the WHERE clause that cannot be used by indices are usually true. If this assumption is incorrect, it could lead to a suboptimal query plan. Use the LIKELY(X) and UNLIKELY(X) SQL functions to provide hints to the query planner about clause terms that are probably not true, which helps the query planner to select the best possible plan.

Use LIKELY/UNLIKELY to optimize evaluation of OR/AND logical expressions. LIKELY/UNLIKELY causes the left side of an expression to be evaluated first. This allows the right side of the query to be skipped when possible. For example, in the clause UNLIKELY(A) AND B, if A evaluates to FALSE, B does not need to be evaluated.
Consider the following:
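A sketch matching the description that follows (table and column names are illustrative):

SELECT COUNT(*) FROM test WHERE UNLIKELY(x IN (7, 8, 9, 10)) AND y > 42;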
If x is one of the values 7, 8, 9, or 10, the filter y > 42 is applied. If x is not one of those values, the filter y > 42 is not applied.
Geospatial and array column projections are not supported in the COALESCE function and CASE expressions.
You can use a subquery anywhere an expression can be used, subject to any runtime constraints of that expression. For example, a subquery in a CASE statement must return exactly one row, but a subquery can return multiple values to an IN expression.
You can use a subquery anywhere a table is allowed (for example, FROM subquery), using aliases to name any reference to the table and columns returned by the subquery.
The SELECT command returns a set of records from one or more tables.
Sort order defaults to ascending (ASC).
Sorts null values after non-null values by default in an ascending sort, before non-null values in a descending sort. For any query, you can use NULLS FIRST to sort null values to the top of the results or NULLS LAST to sort null values to the bottom of the results.
Allows you to use a positional reference to choose the sort column. For example, the command SELECT colA, colB FROM table1 ORDER BY 2 sorts the results on colB because it is in position 2.
HEAVY.AI provides various query hints for controlling the behavior of the query execution engine.
SELECT hints must appear first, immediately after the SELECT statement; otherwise, the query fails.
By default, a hint is applied to the query step in which it is defined. If you have multiple SELECT clauses and define a query hint in one of those clauses, the hint is applied only to that specific query step; the rest of the query steps are unaffected. For example, applying the /*+ cpu_mode */ hint affects only the SELECT clause in which it exists.

You can define a hint to apply to all query steps by prepending g_ to the query hint. For example, if you define /*+ g_cpu_mode */, CPU execution is applied to all query steps.
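Sketches of both forms (query body illustrative):

SELECT /*+ cpu_mode */ COUNT(*) FROM flights;
SELECT /*+ g_cpu_mode */ COUNT(*) FROM flights;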
HEAVY.AI supports the following query hints.
The marker hint type represents a Boolean flag.
The key-value pair type is a hint name and its value.
In Release 6.4 and higher, you can run SELECT queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases. This enables more efficient storage and memory utilization by eliminating the need for table duplication across databases, and simplifies access to shared data and tables.
To execute queries against another database, you must have ACCESS privilege on that database, as well as SELECT privilege.
Execute a join query involving a table in the current database and another table in the my_other_db database:
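A hedged sketch (table and column names are illustrative):

SELECT c.name, o.total
FROM customers c
JOIN my_other_db.orders o ON c.id = o.customer_id;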
If a join column name or alias is not unique, it must be prefixed by its table name.
You can use BIGINT, INTEGER, SMALLINT, TINYINT, DATE, TIME, TIMESTAMP, or TEXT ENCODING DICT data types. TEXT ENCODING DICT is the most efficient because corresponding dictionary IDs are sequential and span a smaller range than, for example, the 65,535 values supported in a SMALLINT field. Depending on the number of values in your field, you can use TEXT ENCODING DICT(32) (up to approximately 2,150,000,000 distinct values), TEXT ENCODING DICT(16) (up to 64,000 distinct values), or TEXT ENCODING DICT(8) (up to 255 distinct values).
When possible, joins involving a geospatial operator (such as ST_Contains) build a binned spatial hash table (overlaps hash join), falling back to a Cartesian loop join if a spatial hash join cannot be constructed.

The enable-overlaps-hashjoin flag controls whether the system attempts to use the overlaps spatial join strategy (true by default). If enable-overlaps-hashjoin is set to false, or if the system cannot build an overlaps hash join table for a geospatial join operator, the system attempts to fall back to a loop join. Loop joins can be performant in situations where one or both join tables have a small number of rows. When both tables grow large, loop join performance decreases.
Two flags control whether the system allows loop joins for a query (geospatial or not): allow-loop-joins and trivial-loop-join-threshold. By default, allow-loop-joins is set to false and trivial-loop-join-threshold to 1,000 (rows). If allow-loop-joins is set to true, the system allows any query with a loop join, regardless of table cardinalities (measured in number of rows). If left to the implicit default of false or set explicitly to false, the system allows loop join queries as long as the inner table (right-side table) has fewer rows than the threshold specified by trivial-loop-join-threshold.
For optimal performance, the system should utilize overlaps hash joins whenever possible. Use the following guidelines to maximize the use of the overlaps hash join framework and minimize fallback to loop joins when conducting geospatial joins:
- The inner (right-side) table should always be the more complicated primitive. For example, for ST_Contains(polygon, point), the point table should be the outer (left) table and the polygon table should be the inner (right) table.
- Currently, ST_CONTAINS and ST_INTERSECTS joins between point and polygon/multipolygon tables, and ST_DISTANCE < {distance} between two point tables, are supported for accelerated overlaps hash join queries.
- For pointwise-distance joins, only the pattern WHERE ST_DISTANCE(table_a.point_col, table_b.point_col) < distance_in_degrees supports overlaps hash joins. Patterns like the following fall back to loop joins:
  - WHERE ST_DWITHIN(table_a.point_col, table_b.point_col, distance_in_degrees)
  - WHERE ST_DISTANCE(ST_TRANSFORM(table_a.point_col, 900913), ST_TRANSFORM(table_b.point_col, 900913)) < 100
You can create joins in a distributed environment in two ways:
Replicate small dimension tables that are used in the join.
Create a shard key on the column used in the join (note that there is a limit of one shard key per table). If the column involved in the join is a TEXT ENCODED field, you must create a SHARED DICTIONARY that references the FACT table key you are using to make the join.
The join order for one small table and one large table matters. If you swap the sales and customer tables on the join, it throws an exception stating that table "sales" must be replicated.
Interrupt a queued query. Specify the query by using its session ID.
To see the queries in the queue, use the SHOW QUERIES command:
To interrupt the last query in the list (ID 946-ooNP):
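The corresponding command, using the session ID from the example:

KILL QUERY '946-ooNP';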
Showing the queries again indicates that 946-ooNP has been deleted:
KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.
Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.
To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt to TRUE. (It is enabled by default.)
Changes the values of the specified columns based on the assign argument (identifier=expression) in all rows that satisfy the condition in the WHERE clause.
Currently, HEAVY.AI does not support updating a geo column type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON) in a table.
You can update a table via subquery, which allows you to update based on calculations performed on another table.
Examples
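A hedged sketch of an update via correlated subquery (names illustrative):

UPDATE employees
SET dept_avg_salary = (SELECT AVG(s.salary) FROM salaries s WHERE s.dept_id = employees.dept_id);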
In Release 6.4 and higher, you can run UPDATE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.
To execute queries against another database, you must have ACCESS privilege on that database, as well as UPDATE privilege.
Update a row in a table in the my_other_db database:
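A hedged sketch (table, column, and filter are illustrative):

UPDATE my_other_db.inventory SET quantity = 0 WHERE sku = 'A-100';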
Method | Description |
---|---|
LogicalAggregate | Operator that eliminates duplicates and computes totals. |
LogicalCalc | Expression that computes project expressions and also filters. |
LogicalChi | Operator that converts a stream to a relation. |
LogicalCorrelate | Operator that performs nested-loop joins. |
LogicalDelta | Operator that converts a relation to a stream. |
LogicalExchange | Expression that imposes a particular distribution on its input without otherwise changing its content. |
LogicalFilter | Expression that iterates over its input and returns elements for which a condition evaluates to true. |
LogicalIntersect | Expression that returns the intersection of the rows of its inputs. |
LogicalJoin | Expression that combines two relational expressions according to some condition. |
LogicalMatch | Expression that represents a MATCH_RECOGNIZE node. |
LogicalMinus | Expression that returns the rows of its first input minus any matching rows from its other inputs. Corresponds to the SQL EXCEPT operator. |
LogicalProject | Expression that computes a set of 'select expressions' from its input relational expression. |
LogicalSort | Expression that imposes a particular sort order on its input without otherwise changing its content. |
LogicalTableFunctionScan | Expression that calls a table-valued function. |
LogicalTableModify | Expression that modifies a table. Similar to TableScan, but represents a request to modify a table instead of read from it. |
LogicalTableScan | Reads all the rows from a RelOptTable. |
LogicalUnion | Expression that returns the union of the rows of its inputs, optionally eliminating duplicates. |
LogicalValues | Expression for which the value is a sequence of zero or more literal row values. |
LogicalWindow | Expression representing a set of window aggregates. See Window Functions. |
EXECUTOR_DEVICE

CPU - Set the session to CPU execution mode:

ALTER SESSION SET EXECUTOR_DEVICE='CPU';

GPU - Set the session to GPU execution mode:

ALTER SESSION SET EXECUTOR_DEVICE='GPU';

NOTE: These parameter values have the same effect as the \cpu and \gpu commands in heavysql, but can be used with any tool capable of running SQL commands.

CURRENT_DATABASE

Can be set to any string value. If the value is a valid database name and the current user has access to it, the session switches to the new database. If the user does not have access or the database does not exist, an error is returned and the session falls back to the starting database.
Operator | Description |
---|---|
AND | Logical AND |
NOT | Negates value |
OR | Logical OR |
Expression | Description |
---|---|
CASE WHEN ... THEN ... ELSE ... END | Case operator |
COALESCE(val1, val2, ...) | Returns the first non-null value in the list |
Expression | Description |
---|---|
expr IN (list) | Evaluates whether expr equals any value of the IN list. |
expr NOT IN (list) | Evaluates whether expr does not equal any value of the IN list. |
Hint | Details |
---|---|
allow_loop_join | Enable loop joins. |
cpu_mode | Force CPU execution mode. |
columnar_output | Enable columnar output for the input query. |
disable_loop_join | Disable loop joins. |
dynamic_watchdog | Enable dynamic watchdog. |
dynamic_watchdog_off | Disable dynamic watchdog. |
force_baseline_hash_join | Use the baseline hash join scheme by skipping the perfect hash join scheme, which is used by default. |
force_one_to_many_hash_join | Deploy a one-to-many hash join by skipping one-to-one hash join, which is used by default. |
keep_result | Add result set of the input query to the result set cache. |
keep_table_function_result | Add result set of the table function query to the result set cache. |
overlaps_allow_gpu_build | Use GPU (if available) to build an overlaps join hash table. (CPU is used by default.) |
overlaps_no_cache | Skip adding an overlaps join hash table to the hash table cache. |
rowwise_output | Enable row-wise output for the input query. |
watchdog | Enable watchdog. |
watchdog_off | Disable watchdog. |
Hint | Details | Example |
---|---|---|
aggregate_tree_fanout | Defines the fan-out of the tree used to compute window aggregation over a frame. Depending on the frame size, the tree fan-out affects the performance of aggregation and of tree construction for each window function with a frame clause. | |
loop_join_inner_table_max_num_rows | Set the maximum number of rows available for a loop join. | Set the maximum number of rows to 100: |
max_join_hash_table_size | Set the maximum size of the hash table. | Set the maximum size of the join hash table to 100: |
overlaps_bucket_threshold | Set the overlaps bucket threshold. | Set the overlaps threshold to 10: |
overlaps_max_size | Set the maximum overlaps size. | Set the maximum overlap to 10: |
overlaps_keys_per_bin | Set the number of overlaps keys per bin. | |
query_time_limit | Set the maximum time for the query to run. | |
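A sketch of the key-value form (the parenthesized value syntax is an assumption):

SELECT /*+ query_time_limit(10000) */ COUNT(*) FROM flights;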
Expression | Example | Description |
---|---|---|
CAST(expr AS type) | CAST(f AS INTEGER) | Converts an expression to another data type. For conversions to a TEXT type, use TRY_CAST. |
TRY_CAST(expr AS type) | TRY_CAST(t AS FLOAT) | Converts text to a non-text type, returning null if the conversion could not be successfully performed. |
ENCODE_TEXT(expr) | ENCODE_TEXT(t) | Converts a none-encoded text type to a dictionary-encoded text type. |
FROM/TO:
| - | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | n/a |
| Yes | - | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | n/a |
| Yes | Yes | - | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No |
| Yes | Yes | Yes | - | Yes | Yes | Yes | Yes | No | No | No | No |
| Yes | Yes | Yes | Yes | - | Yes | No | Yes | No | No | No | No |
| Yes | Yes | Yes | Yes | Yes | - | No | Yes | No | No | No | n/a |
| Yes | Yes | Yes | Yes | Yes | Yes | - | Yes | No | No | No | n/a |
| Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | - | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) |
| No | No | Yes | No | No | No | No | Yes | - | n/a | n/a | n/a |
| No | No | No | No | No | No | No | Yes | n/a | - | No | Yes |
| No | No | No | No | No | No | No | Yes | n/a | No | - | n/a |
| No | No | No | No | No | No | No | Yes | n/a | Yes | No | - |
Expression | Description |
---|---|
LIKELY(expr) | Provides a hint to the query planner that argument expr is likely to be true. |
UNLIKELY(expr) | Provides a hint to the query planner that argument expr is unlikely to be true. |
arr[i] | Returns the value(s) at the given array index (array indexes are 1-based). |
UNNEST(arr) | Extracts the values in the array to a set of rows. |
CARDINALITY(arr) | Returns the number of elements in an array. |
Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session (dwell time).
Example
Use SHOW commands to get information about databases, tables, and user sessions.
Shows the CREATE SERVER statement that could have been used to create the server.
Shows the CREATE TABLE statement that could have been used to create the table.
Retrieve the databases accessible for the current user, showing the database name and owner.
Show registered compile-time UDFs and extension functions in the system and their arguments.
Displays a list of all row-level security (RLS) policies that exist for a user or role; admin rights are required. If EFFECTIVE is used, the list also includes any policies that exist for all roles that apply to the requested user or role.
Returns a list of queued queries in the system; information includes session ID, status, query string, account login name, client address, database name, and device type (CPU or GPU).
Admin users can see and interrupt all queries; non-admin users can see and interrupt only their own queries.
NOTE: SHOW QUERIES is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.
To interrupt a query in the queue, see KILL QUERY.
If included with a name, lists the roles granted directly to a user or role. SHOW EFFECTIVE ROLES with a name lists the roles directly granted to a user or role, and also lists the roles indirectly inherited through the directly granted roles.
If the user name or role name is omitted, then a regular user sees their own roles, and a superuser sees a list of all roles existing in the system.
Show user-defined runtime functions and table functions.
Show data connectors.
Displays storage-related information for a table, such as the table ID/name, number of data/metadata files used by the table, total size of data/metadata files, and table epoch values.
You can see table details for all tables that you have access to in the current database, or for only those tables you specify.
Show details for all tables you have access to:
Show details for table omnisci_states:
The number of columns returned includes system columns. As a result, the number of columns in column_count can be up to two greater than the number of columns created by the user.
Displays the list of available system (built-in) table functions.
For more information, see System Table Functions.
Show detailed output information for the specified table function. Output details vary depending on the table function specified.
View SHOW output for the generate_series table function:
Retrieve the servers accessible for the current user.
Retrieve the tables accessible for the current user.
Lists name, ID, and default database for all or specified users for the current database. If the command is issued by a superuser, login permission status is also shown. Only superusers see users who do not have permission to log in.
SHOW [ALL] USER DETAILS lists name, ID, superuser status, default database, and login permission status for all users across the HeavyDB instance. This variant of the command is available only to superusers. Regular users who run the SHOW ALL USER DETAILS command receive an error message.
Show all user details for all users:
Show all user details for specified users ue, ud, ua, and uf:
If a specified user is not found, the superuser sees an error message:
Show user details for specified users ue, ud, and uf:
Show user details for all users:
Running SHOW ALL USER DETAILS results in an error message:
Show user details for all users:
If a specified user is not found, the user sees an error message:
Show user details for user ua:
Retrieve all persisted user sessions, showing the session ID, user login name, client address, and database name. Admin or superuser privileges required.
HEAVY.AI provides access to a set of system-provided table functions, also known as table-valued functions (TVFs). System table functions, like user-defined table functions, support execution of queries on both CPU and GPU over one or more SQL result-set inputs. Table function support in HEAVY.AI can be split into two broad categories: system table functions and user-defined table functions (UDTFs). System table functions are built in to the HEAVY.AI server, while UDTFs can be declared dynamically at run time by writing them with Numba, which compiles a subset of the Python language. For more information on UDTFs, see User-Defined Table Functions.
To improve performance, table functions can be declared to enable filter pushdown optimization, which allows the Calcite optimizer to "push down" filters on the output(s) of a table function to its input(s) when the inputs and outputs are declared to be semantically equivalent (for example, a longitude variable that is input to and output from a table function). This can significantly increase performance in cases where only a small portion of one or more input tables is required to compute the filtered output of a table function.
Whether system- or user-provided, table functions can execute over one or more result sets specified by subqueries, and can also take any number of additional constant literal arguments specified in the function definition. SQL subquery inputs can consist of any SQL expression (including multiple subqueries, joins, and so on) allowed by HeavyDB, and the output can be filtered, grouped by, joined, and so on like a normal SQL subquery, including being input into additional table functions by wrapping it in a CURSOR argument. The number and types of input arguments, as well as the number and types of output arguments, are specified in the table function definition itself.
Table functions allow for the efficient execution of advanced algorithms that may be difficult or impossible to express in canonical SQL. By allowing execution of code directly over SQL result sets, leveraging the same hardware parallelism used for fast SQL execution and visualization rendering, HEAVY.AI provides orders-of-magnitude speed increases over the alternative of transporting large result sets to other systems for post-processing and then returning to HEAVY.AI for storage or downstream manipulation. You can easily invoke system-provided or user-defined algorithms directly inline with SQL and rendering calls, making prototyping and deployment of advanced analytics capabilities easier and more streamlined.
Table functions can take as input arguments both constant literals (including scalar results of subqueries) as well as results of other SQL queries (consisting of one or more rows). The latter (SQL query inputs), per the SQL standard, must be wrapped in the keyword CURSOR. Depending on the table function, there can be 0, 1, or multiple CURSOR inputs. For example:
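Sketches; tf_example is an illustrative, hypothetical function:

SELECT * FROM TABLE(generate_series(1, 10));                          -- zero CURSOR inputs
SELECT * FROM TABLE(tf_example(CURSOR(SELECT x, y FROM points), 10)); -- one CURSOR input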
Certain table functions can take one or more columns of a specified type or types as inputs, denoted as ColumnList<TYPE1 | TYPE2 ... TYPEN>. Even if a function allows a ColumnList input of multiple types, the arguments must all be of one type; types cannot be mixed. For example, if a function allows ColumnList<INT | TEXT ENCODING DICT>, one or more columns of either INTEGER or TEXT ENCODING DICT can be used as inputs, but all must be either INT columns or TEXT ENCODING DICT columns.
All HEAVY.AI system table functions allow you to specify arguments either in conventional comma-separated form in the order specified by the table function signature, or via a key-value map where input argument names are mapped to argument values using the => token. For example, the following two calls are equivalent:
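A sketch using generate_series (argument names match the generate_series reference later in this section):

SELECT * FROM TABLE(generate_series(1, 10, 2));
SELECT * FROM TABLE(generate_series(series_start => 1, series_end => 10, series_step => 2));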
For performance reasons, particularly when table functions are used as actual tables in a client like Heavy Immerse, many system table functions in HEAVY.AI automatically "push down" filters on certain output columns in the query onto the inputs. For example, if a table function does some computation over an x and y range such that x and y are in both the input and output for the table function, filter push-down would likely be enabled, so that a query like the following automatically pushes down the filter on the x and y outputs to the x and y inputs. This potentially increases query performance significantly.
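A sketch (tf_example is an illustrative, hypothetical function):

SELECT * FROM TABLE(tf_example(CURSOR(SELECT x, y, val FROM points)))
WHERE x BETWEEN -120.0 AND -100.0 AND y BETWEEN 30.0 AND 40.0;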
To determine whether filter push-down is used, you can check the Boolean value of the filter_table_transpose column from the query:
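The filter_table_transpose column appears in the table function details output; a sketch (function name illustrative):

SHOW TABLE FUNCTIONS DETAILS tf_geo_rasterize;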
Currently for system table functions, you cannot change push-down behavior.
You can query which table functions are available using SHOW TABLE FUNCTIONS:

Information about the expected input and output argument names and types, as well as other information such as whether the function can run on CPU, GPU, or both, and whether filter push-down is enabled, can be queried via SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
The following system table functions are available in HEAVY.AI. The table provides a summary and links to more information about each function.
For information about the HeavyRF radio frequency propagation simulation and HeavyRF table functions, see HeavyRF.
The TABLE command is required to wrap a table function clause; for example:

SELECT * FROM TABLE(generate_series(1, 10));

The CURSOR command is required to wrap any subquery inputs.
Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) for the search vector, which can optionally be TF/IDF weighted.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. Allowed aggregate types are AVG, COUNT, SUM, MIN, and MAX. If neighborhood_fill_radius is set greater than 0, a blur pass/kernel is computed on top of the results according to the optionally specified fill_agg_type, with allowed types of GAUSS_AVG, BOX_AVG, COUNT, SUM, MIN, and MAX (if not specified, defaults to GAUSS_AVG, a Gaussian-average kernel). If fill_only_nulls is set to true, only null bins from the first aggregate step have final output values computed from the blur pass; otherwise, all values are affected by the blur pass.

Note that the arguments to bound the spatial output grid (x_min, x_max, y_min, y_max) are optional; however, either all or none of these arguments must be supplied. If the arguments are not supplied, the spatial output grid is bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize table function, these filters also constrain the output range.
Example
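A hedged sketch of an invocation; the raster cursor argument name and bin_dim_meters are assumptions not documented in this excerpt:

SELECT * FROM TABLE(tf_geo_rasterize(
  raster => CURSOR(SELECT lon, lat, z FROM point_table),
  agg_type => 'MAX',
  bin_dim_meters => 100.0,
  neighborhood_fill_radius => 0,
  fill_only_nulls => false));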
HEAVY.AI supports a subset of object types and functions for storing and writing queries for geospatial definitions.
For information about geospatial datatype sizes, see Storage and Compression in Datatypes.
For more information on WKT primitives, see Wikipedia: Well-known Text: Geometric objects.
HEAVY.AI supports SRID 4326 (WGS 84), 900913 (Google Web Mercator), and 32601-32660 and 32701-32760 (Universal Transverse Mercator (UTM) zones). When using geospatial fields, you set the SRID to determine which reference system to use. HEAVY.AI does not assign a default SRID.
If you do not set the SRID of the geo field in the table, you can set it in a SQL query using ST_SETSRID(column_name, SRID). For example, ST_SETSRID(a.pt, 4326).
When representing longitude and latitude, the first coordinate is assumed to be longitude in HEAVY.AI geospatial primitives.
You create geospatial objects as geometries (planar spatial data types), which are supported by the planar geometry engine at run time. When you call ST_DISTANCE on two geometry objects, the engine returns the shortest straight-line planar distance, in degrees, between those points. For example, the following query returns the shortest distance between the point(s) in p1 and the polygon(s) in poly1:
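A sketch (the table name geo1 is illustrative):

SELECT ST_DISTANCE(p1, poly1) FROM geo1;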
For information about importing data, see Importing Geospatial Data.
Geospatial functions that expect geospatial object arguments accept geospatial columns, geospatial objects returned by other functions, or string literals containing WKT representations of geospatial objects. Supplying a WKT string is equivalent to calling a geometry constructor. For example, these two queries are identical:
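A sketch of the equivalence (table and column names are illustrative):

SELECT COUNT(*) FROM t WHERE ST_CONTAINS('POLYGON((0 0,4 0,4 4,0 4,0 0))', pt);
SELECT COUNT(*) FROM t WHERE ST_CONTAINS(ST_GeomFromText('POLYGON((0 0,4 0,4 4,0 4,0 0))'), pt);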
You can create geospatial literals with a specific SRID. For example:
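A sketch:

SELECT ST_GeomFromText('POINT(-71.064544 42.28787)', 4326);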
HEAVY.AI provides support for geography objects and geodesic distance calculations, with some limitations.
HeavyDB supports import from any coordinate system supported by the Geospatial Data Abstraction Library (GDAL). On import, HeavyDB will convert to and store in WGS84 encoding, and rendering is accurate in Immerse.
However, no built-in way to reference the original coordinates currently exists in Immerse, and coordinates exported from Immerse will be WGS84 coordinates. You can work around this limitation by adding to the dataset a column or columns in non-geo format that could be included for display in Immerse (for example, in a popup) or on export.
Currently, HEAVY.AI supports spheroidal distance calculation between:
Two points using either SRID 4326 or 900913.
A point and a polygon/multipolygon using SRID 900913.
Using SRID 900913 results in variance compared to SRID 4326 as polygons approach the North and South Poles.
The following query returns the points and polygons within 1,000 meters of each other:
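A hedged sketch, assuming point and polygon tables with geometries comparable in SRID 900913 (meters):

SELECT a.id, b.id
FROM points a, polys b
WHERE ST_DISTANCE(ST_TRANSFORM(a.pt, 900913), b.poly_900913) < 1000;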
See the tables in Geospatial Functions below for examples.
HEAVY.AI supports the functions listed below.

ST_Buffer

Special processing is automatically applied to WGS84 input geometries (SRID=4326) to limit buffer distortion:

- The implementation first determines the best planar SRID to which to project the 4326 input geometry.
- Preferred SRIDs are UTM and Lambert (LAEA) North/South zones, with Mercator used as a fallback.
- The buffer distance is interpreted as distance in meters (the units of all planar SRIDs being considered).
- The input geometry is transformed to the best planar SRID and handed to GEOS, along with the buffer distance.
- The buffer geometry built by GEOS is then transformed back to SRID=4326 and returned.

Example: Build 10-meter buffer geometries (SRID=4326) with limited distortion: SELECT ST_Buffer(poly4326, 10.0) FROM tbl;

ST_Centroid

Computes the geometric center of a geometry as a POINT.
You can use SQL code similar to the examples in this topic as global filters in Immerse.
CREATE TABLE AS SELECT is not currently supported for geo data types in distributed mode.

GROUP BY is not supported for geo types (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON).
You can use \d table_name to determine if the SRID is set for the geo field:

If no SRID is returned, you can set the SRID using ST_SETSRID(column_name, SRID). For example, ST_SETSRID(myPoint, 4326).
Uber H3 is an open-source geospatial system created by Uber Technologies. H3 provides a hierarchical grid system that divides the Earth's surface into hexagons of varying sizes, allowing for easy location-based indexing, search, and analysis.
Hexagons can be created at a single scale, for instance to fill an arbitrary polygon at one resolution. They can also be used to generate a much smaller number of hexagons at multiple scales. In general, operating on H3 hexagons is much faster than on raw arbitrary geometries, at a cost of some precision. Because each hexagon is exactly the same size, this is particularly advantageous for GPU-accelerated workflows.
A principal advantage of the system is that, for a given scale, hexagons are approximately equal in area. This stands in contrast to other subdivision schemes based on longitudes and latitudes or web Mercator map projections.
A second advantage is that with hexagons, neighbors in all directions are equidistant. This is not true for rectangular subdivisions like pixels, whose 8 neighbors are at different distances.
The exact amount of precision lost can be tightly bounded, with the smallest supported hexagons being about 1 m². That is more accurate than most currently available data sources, short of survey data.
There are some disadvantages to be aware of. The first is that the world cannot actually be divided up completely cleanly into hexagons. It turns out that a few pentagons are needed, and this introduces discontinuities. However, the system has cleverly placed those pentagons far away from any land masses, so this is only a practical concern for specific maritime operations.
The second issue is that hexagons at adjacent scales do not nest exactly:
This does not much affect practical operations at any single scale. But if you look carefully at a multiscale plot, such as the California example referenced above, you will discover tiny discontinuities in the form of gaps or overlaps. These do not amount to a large percentage of the total area, but they nonetheless mean this method is not appropriate when exact counts are required.
Encodes columnar point geometry into a globally-unique h3 cell ID for the specified h3_scale. Scales run from 0 to 15 inclusive, where 0 represents the coarsest resolution and 15 the finest. For details on h3 scales, please see the base library documentation.
This can be applied to literal values:
Or to columnar geographic points:
Note that if you have geographic point data rather than columnar latitude and longitude, you can use the ST_X and ST_Y functions. Also, if you wish to encode the centroids of polygons, such as for building footprints, you can combine this with the ST_CENTROID function.
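Sketches of both usages; the encoding function name geoToH3(lon, lat, scale) is an assumption, as it is not shown in this excerpt:

SELECT geoToH3(-122.4194, 37.7749, 10);                 -- literal values
SELECT geoToH3(lon, lat, 10) FROM sites;                -- columnar lon/lat
SELECT geoToH3(ST_X(pt), ST_Y(pt), 10) FROM sites_geo;  -- from a POINT column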
To retrieve geometric coordinates from an H3 code, two functions are available.
h3ToLat and h3ToLon extract the latitude and longitude respectively, for example:
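A sketch (the column name h3 is illustrative):

SELECT h3ToLat(h3), h3ToLon(h3) FROM sites_h3;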
Given an H3 code, the function h3ToParent is available to find cells above that cell at any hierarchical level. This means that once codes are computed at high resolution, they can be compared to codes at other scales.
Uber's h3 Python library provides a wider range of functions than those available above (although at significantly slower performance). The library defaults to generating H3 codes as hexadecimal strings, but can be configured to support BIGINT codes. See Uber's documentation for details.
H3 codes can be used in regular joins, including joins in Immerse. They can also be used as aggregators, such as in Immerse custom dimensions. For points that are exactly aligned, such as imports from raster data bands of the same source, aggregating on H3 codes is faster than the exact geographic overlaps function ST_EQUALS.
Functions and Operators (DML)
Parenthesization
Multiplication and division
Addition and subtraction
Usage Notes
The following wildcard characters are supported by LIKE and ILIKE:

% matches any number of characters, including zero characters.

_ matches exactly one character.
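A sketch (names illustrative):

SELECT name FROM users WHERE name LIKE 'Jo%';   -- matches Jo, Joe, John, ...
SELECT name FROM users WHERE name ILIKE 'j_n%'; -- case-insensitive; matches Jan, jon, jane, ...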
Supported date_part types:
Supported interval types:
For two-digit years, years 69-99 are assumed to be the previous century (for example, 1969), and 0-68 are assumed to be the current century (for example, 2016).
For four-digit years, negative years (BC) are not supported.
Hours are expressed in 24-hour format.
When time components are separated by colons, you can write them as one or two digits.
Months are case insensitive. You can spell them out or abbreviate to three characters.
For timestamps, decimal seconds are ignored. Time zone offsets are written as +/-HHMM.
For timestamps, a numeric string is converted to +/- seconds since January 1, 1970. Supported timestamps range from -30610224000 (January 1, 1000) through 29379456000 (December 31, 2900).
On output, dates are formatted as YYYY-MM-DD. Times are formatted as HH:MM:SS.
Linux EPOCH values range from -30610224000 (1/1/1000) through 185542587100800 (1/1/5885487). Complete range in years: +/-5,883,517 around epoch.
Both double-precision (standard) and single-precision floating point statistical functions are provided. Single-precision functions run faster on GPUs but might cause overflow errors.
COUNT(DISTINCT x), especially when used in conjunction with GROUP BY, can require a very large amount of memory to keep track of all distinct values in large tables with large cardinalities. To avoid this large overhead, use APPROX_COUNT_DISTINCT.

APPROX_COUNT_DISTINCT(x, e) gives an approximate count of the value x, based on an expected error rate defined in e. The error rate is an integer value from 1 to 100. The lower the value of e, the higher the precision, and the higher the memory cost. Select a value for e based on the level of precision required. On large tables with large cardinalities, consider using APPROX_COUNT_DISTINCT when possible to preserve memory. When data cardinalities permit, HEAVY.AI uses the precise implementation of COUNT(DISTINCT x) for APPROX_COUNT_DISTINCT. Set the default error rate using the -hll-precision-bits configuration parameter.
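A sketch (table and column names are illustrative):

SELECT APPROX_COUNT_DISTINCT(user_id, 10) FROM events;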
The accuracy of APPROX_MEDIAN(x) depends upon the distribution of the data. For example:

For 100,000,000 integers (1, 2, 3, ... 100M) in random order, APPROX_MEDIAN can provide a highly accurate answer, to 5+ significant digits.

For 100,000,001 integers, where 50,000,000 have the value 0 and 50,000,001 have the value 1, APPROX_MEDIAN returns a value close to 0.5, even though the median is 1.
Currently, HEAVY.AI does not support grouping by non-dictionary-encoded strings. However, with the SAMPLE aggregate function, you can select non-dictionary-encoded strings that are presumed to be unique in a group. For example:
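A sketch consistent with the example described below (table and column names are illustrative):

SELECT user_name, SAMPLE(user_description) FROM tweets GROUP BY user_name;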
If the aggregated column (user_description in the example above) is not unique within a group, SAMPLE selects a value that might be nondeterministic because of the parallel nature of HEAVY.AI query execution.
You can create your own C++ functions and use them in your SQL queries.
User-defined functions (UDFs) require clang++ version 9. You can verify the installed version using the command clang++ --version.

UDFs currently allow any authenticated user to register and execute a runtime function. By default, runtime UDFs are globally disabled but can be enabled with the runtime flag enable-runtime-udf.
Create your function and save it in a .cpp file; for example, /var/lib/omnisci/udf_myFunction.cpp.
Add the UDF configuration flag to omnisci.conf. For example:
Use your function in a SQL query. For example:
This function, udf_diff.cpp, returns the difference of two values from a table.
Include the standard integer library, which supports the following datatypes:
bool
int8_t (cstdint), char
int16_t (cstdint), short
int32_t (cstdint), int
int64_t (cstdint), size_t
float
double
void
The next four lines are boilerplate code that allows OmniSci to determine whether the server is running with GPUs. OmniSci chooses whether it should compile the function inline to achieve the best possible performance.
The next line is the actual user-defined function, which returns the difference between INTEGER values x and y.
To run the udf_diff function, add this line to your /var/lib/omnisci/omnisci.conf file (in this example, the .cpp file is stored at /var/lib/omnisci/udf_diff.cpp):
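A sketch of the configuration line (the flag name udf is an assumption):

udf = "/var/lib/omnisci/udf_diff.cpp"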
Restart the OmniSci server.
Use your command from an OmniSci SQL client to query, for example, a table named myTable that contains the INTEGER columns myInt1 and myInt2.
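A sketch:

SELECT udf_diff(myInt1, myInt2) FROM myTable;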
OmniSci returns the difference as an INTEGER value.
Given a distance-weighted directed graph, consisting of a query CURSOR input consisting of the starting and ending node for each edge and a distance, and a specified origin and destination node, tf_graph_shortest_path computes the shortest distance-weighted path through the graph between origin_node and destination_node, returning a row for each node along the computed shortest path, with the traversal-ordered index of that node and the cumulative distance from origin_node to that node. If either origin_node or destination_node does not exist, an error is returned.
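A hedged sketch of an invocation; the edge_list argument name is borrowed from the tf_graph_shortest_paths_distances description, and the cursor column names are illustrative:

SELECT * FROM TABLE(tf_graph_shortest_path(
  edge_list => CURSOR(SELECT node1, node2, distance FROM edges),
  origin_node => 1,
  destination_node => 42));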
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example A
Example B
Similar to tf_geo_rasterize, but also computes the slope and aspect per output bin.

Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally filling in only null-valued bins if fill_only_nulls is set to true. The slope and aspect are then computed for every bin, based on the z values of that bin and its neighboring bins. The slope can be returned in degrees or as a fraction between 0 and 1, depending on the Boolean argument to compute_slope_in_degrees.
Note that the bounds of the spatial output grid are bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize_slope table function, these filters also constrain the output range.
Parameter | Description | Data Types |
---|---|---|
Example
Given a distance-weighted directed graph, consisting of a query CURSOR input consisting of the starting and ending node for each edge and a distance, and a specified origin node, tf_graph_shortest_paths_distances computes the shortest distance-weighted path distance between the origin_node and every other node in the graph. It returns a row for each node in the graph, with output columns consisting of the input origin_node, the given destination_node, the distance for the shortest path between the two nodes, and the number of edges or graph "hops" between the two nodes. If origin_node does not exist in the node1 column of the edge_list CURSOR, an error is returned.
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example A
Example B
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally filling in only null-valued bins if fill_only_nulls is set to true.

The graph shortest path is then computed between an origin point on the grid specified by origin_x and origin_y and a destination point on the grid specified by destination_x and destination_y, where the shortest path is weighted by the nth exponent of the computed slope between a bin and its neighbors, with the nth exponent specified by slope_weighted_exponent. A maximum allowed traversable slope can be specified by slope_pct_max, such that no traversal is considered or allowed between bins with absolute computed slopes greater than the percentage specified by slope_pct_max.
Input Arguments
Output Columns
Loads one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs (if not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs).

If use_cache is set to true, an internal point cloud-specific cache is used to hold the results per input file; if the file is queried again, this significantly speeds up the query, allowing for interactive querying of a point cloud source. If the results of tf_load_point_cloud will only be consumed once (for example, as part of a CREATE TABLE statement), it is highly recommended that use_cache be set to false or left unspecified (it defaults to false) to avoid the performance and memory overhead incurred by use of the cache.
The bounds of the data retrieved can optionally be specified with the x_min, x_max, y_min, y_max arguments. These arguments can be useful when you want to retrieve a small geographic area from a large point-cloud file set, because files containing data outside the bounds of the specified bounding box are quickly skipped by tf_load_point_cloud, requiring only a quick read of the spatial metadata for the file.
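A hedged sketch (the path argument name and file path are assumptions):

SELECT * FROM TABLE(tf_load_point_cloud(
  path => '/data/lidar/tile_042.laz',
  use_cache => false));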
Input Arguments
Parameter | Description | Data Types |
---|
Output Columns
Example A
Example B
Returns metadata for one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min, x_max, y_min, y_max arguments.

Note: The specified path must be contained in the global allowed-import-paths configuration; otherwise, an error is returned.
Input Arguments
Parameter | Description | Data Types |
---|
Output Columns
Name | Description | Data Types |
---|
Example
Window functions allow you to work with a subset of rows related to the currently selected row. For a given dimension, you can find the most associated dimension by some other measure (for example, number of records or sum of revenue).
Window functions must always contain an OVER clause. The OVER clause splits up the rows of the query for processing by the window function.
The PARTITION BY list divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed using all rows in the same partition as the current row.
Rows that have the same value in the ORDER BY clause are considered peers. The ranking functions give the same answer for any two peer rows.
HeavyDB supports the aggregate functions AVG, MIN, MAX, SUM, and COUNT in window functions.
Updates on window functions are supported, assuming the target table is single-fragment. Updates on multi-fragment target tables are not currently supported.
This query shows the top airline carrier for each state, based on the number of departures.
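A hedged sketch of such a query (table and column names are illustrative):

SELECT origin_state, carrier_name, departures
FROM (
  SELECT origin_state, carrier_name, COUNT(*) AS departures,
         ROW_NUMBER() OVER (PARTITION BY origin_state ORDER BY COUNT(*) DESC) AS rn
  FROM flights
  GROUP BY origin_state, carrier_name) t
WHERE rn = 1;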
A window function can include a frame clause that specifies a set of neighboring rows of the current row belonging to the same partition. This allows us to compute a window aggregate function over the window frame, instead of computing it against the entire partition. Note that a window frame for the current row is computed based on either 1) the number of rows before or after the current row (called rows mode) or 2) the specified ordering column value in the frame clause (called range mode).
For example:
From the starting row of the partition to the current row: Using the SUM aggregate function, you can compute the running sum of the partition.

You can construct a frame based on the position of the rows (rows mode). For example, a frame from 3 rows before to 2 rows after the current row:

You can compute the aggregate function of the frame having up to six rows (including the current row).

You can organize a frame based on the value of the ordering column (range mode): Assuming C is the current ordering column value, you can compute the aggregate value of the window frame containing rows having ordering column values between (C - 3) and (C + 2).
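Sketches of the corresponding frame clauses (table and column names are illustrative):

-- Running sum from the partition start to the current row:
SELECT SUM(x) OVER (PARTITION BY p ORDER BY o ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t;
-- Rows mode: 3 rows before through 2 rows after (up to six rows):
SELECT SUM(x) OVER (PARTITION BY p ORDER BY o ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING) FROM t;
-- Range mode: ordering values between C - 3 and C + 2:
SELECT SUM(x) OVER (PARTITION BY p ORDER BY o RANGE BETWEEN 3 PRECEDING AND 2 FOLLOWING) FROM t;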
Window functions that ignore the frame are evaluated on the entire partition.
Note that we can define the window frame clause using rows mode with an ordering column.
You can use the following aggregate functions with the window frame clause.
<frame_mode> <frame_bound>

<frame_mode> can be one of the following:

rows

range
1 | 2 | 3 | 4 | 5.5 | 7.5 | 8 | 9 | 10 → the value of each tuple’s ORDER BY expression.

When the current row has the value 5.5:

ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING: 3 rows before and 3 rows after → {2, 3, 4, 5.5, 7.5, 8, 9}

RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING: 5.5 - 3 <= x <= 5.5 + 3 → {3, 4, 5.5, 7.5, 8}
<frame_bound> can be one of the following:

frame_start, or

frame_between: BETWEEN frame_start AND frame_end

frame_start and frame_end can be one of the following:

UNBOUNDED PRECEDING: The start row of the partition that the current row belongs to.

UNBOUNDED FOLLOWING: The end row of the partition that the current row belongs to.
CURRENT ROW
For rows mode: the current row.
For range mode: the peers of the current row. A peer is a row having the same value as the ordering column expression of the current row. Note that all null values are peers of each other.
expr PRECEDING
For rows mode: expr rows before the current row.
For range mode: rows with the current row’s ordering expression value minus expr.
For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:
TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR
TIME type: SECOND, MINUTE, and HOUR
DATE type: DAY, MONTH, and YEAR
For example:
RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING
Currently, only literal expressions, such as 1 PRECEDING and 100 PRECEDING, are supported as expr.
expr FOLLOWING
For rows mode: expr rows after the current row.
For range mode: rows with the current row’s ordering expression value plus expr.
For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:
TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR
TIME type: SECOND, MINUTE, and HOUR
DATE type: DAY, MONTH, and YEAR
For example:
RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING
Currently, only literal expressions, such as 1 FOLLOWING and 100 FOLLOWING, are supported as expr.
UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING have the same meaning in both rows and range mode.
When the query has no window frame bound, the window aggregate function is computed differently depending on the existence of the ORDER BY clause:
Has ORDER BY clause: The window function is computed with the default frame bound, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
No ORDER BY clause: The window function is computed over the entire partition.
You can refer to the same window clause in multiple window aggregate functions by defining it with a unique name in the query definition.
For example, you can define the named window clauses W1 and W2 as follows:
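A sketch (query body illustrative):

SELECT SUM(revenue) OVER w1, AVG(revenue) OVER w2
FROM sales
WINDOW w1 AS (PARTITION BY region ORDER BY ts),
       w2 AS (PARTITION BY region ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING);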
The named window clause w1 refers to a window function clause without a window frame clause, and w2 refers to a named window frame clause.
To use window framing, you may need an ORDER BY clause in the window definition. Depending on the framing mode used, the constraint varies:

Rows mode: No restriction on the existence of an ordering column; multiple ordering columns can be included.

Range mode: Exactly one ordering column is required (multi-column ordering is not supported).
Currently, all window functions, including aggregates over window frames, are computed in CPU mode.
For window frame bound expressions, only non-negative integer literals are supported.
GROUPING mode and EXCLUDING are not currently supported.
Computes the Mandelbrot set over the complex domain [x_min, x_max), [y_min, y_max), discretizing the xy-space into an output of dimensions x_pixels × y_pixels. The output for each cell is the number of iterations needed to escape to infinity, up to and including the specified max_iterations.
Parameter | Data Type |
---|
Example
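A hedged sketch of an invocation; the function name tf_mandelbrot is inferred from the parameters described above:

SELECT * FROM TABLE(tf_mandelbrot(
  x_pixels => 1024, y_pixels => 1024,
  x_min => -2.5, x_max => 1.0,
  y_min => -1.0, y_max => 1.0,
  max_iterations => 256));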
Process a raster input to derive contour lines or regions and output them as LINESTRING or POLYGON for rendering or further processing. Each has two variants:

- One that re-rasterizes the input points
- One that accepts raw raster points directly

Use the rasterizing variants if the raster table rows are not already sorted in row-major order (for example, if they represent an arbitrary 2D point cloud), or if filtering or binning is required to reduce the input data to a manageable count (to speed up the contour processing) or to smooth the input data before contour processing. If the input rows do not already form a rectilinear region, the output region is their 2D bounding box. Many of the parameters of the rasterizing variant are directly equivalent to those of tf_geo_rasterize; see that function for details.

The direct variants require that the input rows represent a rectilinear region of pixels in nonsparse row-major order. The dimensions must also be provided, and (raster_width * raster_height) must match the input row count. The contour processing is then performed directly on the raster values with no preprocessing.
The line variants generate LINESTRING geometries that represent the contour lines of the raster space at the given interval, with an optional offset. For example, a raster space representing a height field with a range of 0.0 to 1000.0 and a contour interval of 100.0 will likely result in 10 or 11 lines, each with a corresponding contour_values value (0.0, 100.0, 200.0, and so on). If contour_offset is set to 50.0, the lines are instead generated at 50.0, 150.0, 250.0, and so on. The lines can be open or closed, and can form rings or terminate at the edges of the raster space.
The polygon variants generate POLYGON geometries that represent the regions between contour lines (for example, from 0.0 to 100.0, and from 100.0 to 200.0). If the raster space has multiple regions within a given value range, a POLYGON row is output for each of those regions. The corresponding contour_values value for each is the lower bound of the range for that region.
Input parameters (integer series):

| Parameter | Description | Data Type |
|---|---|---|
| <series_start> | Starting integer value, inclusive. | BIGINT |
| <series_end> | Ending integer value, inclusive. | BIGINT |
| <series_step> (optional, defaults to 1) | Increment to increase or decrease the values that follow. | BIGINT |

Output:

| Name | Description | Data Type |
|---|---|---|
| generate_series | The integer series specified by the input arguments. | Column<BIGINT> |

Input parameters (timestamp series):

| Parameter | Description | Data Type |
|---|---|---|
| series_start | Starting timestamp value, inclusive. | TIMESTAMP(9) (timestamp literals with other precisions are auto-casted to TIMESTAMP(9)) |
| series_end | Ending timestamp value, inclusive. | TIMESTAMP(9) (timestamp literals with other precisions are auto-casted to TIMESTAMP(9)) |
| series_step | Time/date interval signifying the step between each element in the returned series. | INTERVAL |

Output:

| Name | Description | Data Type |
|---|---|---|
| generate_series | The timestamp series specified by the input arguments. | Column<TIMESTAMP(9)> |
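For example, a minimal sketch that generates the odd integers from 1 through 9, using the TABLE operator through which table functions are invoked:
SELECT * FROM TABLE(generate_series(1, 9, 2));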
| Type | Size | Example |
|---|---|---|
| LINESTRING | Variable | A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2) |
| MULTIPOLYGON | Variable | A set of one or more polygons. For example: MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1))) |
| POINT | Variable | A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0) |
| POLYGON | Variable | A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1)) |
| MULTIPOINT | Variable | A set of one or more points. For example: MULTIPOINT((0 0), (1 1), (2 2)) |
| MULTILINESTRING | Variable | A set of one or more associated lines, each of two or more points. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1)) |
| Function | Description |
|---|---|
| ST_Centroid | Computes the geometric center of a geometry as a POINT. |
| ST_GeomFromText(WKT) | Returns a specified geometry value from a Well-Known Text representation. |
| ST_GeomFromText(WKT, SRID) | Returns a specified geometry value from a Well-Known Text representation and an SRID. |
| ST_GeogFromText(WKT) | Returns a specified geography value from a Well-Known Text representation. |
| ST_GeogFromText(WKT, SRID) | Returns a specified geography value from a Well-Known Text representation and an SRID. |
| ST_Point(double lon, double lat) | Returns a point constructed on the fly from the provided coordinate values. Constant coordinates result in construction of a POINT literal. Example: ST_Contains(poly4326, ST_SetSRID(ST_Point(lon, lat), 4326)) |
| ST_AsText(geom) \| ST_AsWKT(geom) | Converts a geometry input to a Well-Known Text (WKT) string. |
| ST_AsBinary(geom) \| ST_AsWKB(geom) | Converts a geometry input to a Well-Known Binary (WKB) string. |
ST_Buffer
Returns a geometry covering all points within a specified distance from the input geometry. Performed by the GEOS module. The output is currently limited to the MULTIPOLYGON type.
Calculations are in the units of the input geometry’s SRID. Buffer distance is expressed in the same units. Example:
SELECT ST_Buffer('LINESTRING(0 0, 10 0, 10 10)', 1.0);
Special processing is automatically applied to WGS84 input geometries (SRID=4326) to limit buffer distortion:
The implementation first determines the best planar SRID to which to project the 4326 input geometry.
Preferred SRIDs are UTM and Lambert (LAEA) North/South zones, with Mercator used as a fallback.
Buffer distance is interpreted as distance in meters (units of all planar SRIDs being considered).
The input geometry is transformed to the best planar SRID and handed to GEOS, along with buffer distance.
The buffer geometry built by GEOS is then transformed back to SRID=4326 and returned.
Example: Build 10-meter buffer geometries (SRID=4326) with limited distortion:
SELECT ST_Buffer(poly4326, 10.0) FROM tbl;
ST_TRANSFORM
Returns a geometry with its coordinates transformed to a different spatial reference. Currently, WGS84 to Web Mercator transform is supported. For example:
ST_DISTANCE(
ST_TRANSFORM(ST_GeomFromText('POINT(-71.064544 42.28787)', 4326), 900913),
ST_GeomFromText('POINT(-13189665.9329505 3960189.38265416)', 900913)
)
ST_TRANSFORM is not currently supported in projections; it can be used only to transform geo inputs to other functions, such as ST_DISTANCE.
ST_SETSRID
Set the SRID to a specific integer value. For example:
ST_TRANSFORM(
ST_SETSRID(ST_GeomFromText('POINT(-71.064544 42.28787)'), 4326), 900913 )
| Function | Description |
|---|---|
| ST_X | Returns the X value from a POINT column. |
| ST_Y | Returns the Y value from a POINT column. |
| ST_XMIN | Returns the X minimum of a geometry. |
| ST_XMAX | Returns the X maximum of a geometry. |
| ST_YMIN | Returns the Y minimum of a geometry. |
| ST_YMAX | Returns the Y maximum of a geometry. |
| ST_STARTPOINT | Returns the first point of a LINESTRING as a POINT. |
| ST_ENDPOINT | Returns the last point of a LINESTRING as a POINT. |
| ST_POINTN | Returns the Nth point of a LINESTRING as a POINT. |
| ST_NPOINTS | Returns the number of points in a geometry. |
| ST_NRINGS | Returns the number of rings in a POLYGON or a MULTIPOLYGON. |
| ST_SRID | Returns the spatial reference identifier for the underlying object. |
| ST_NUMGEOMETRIES | Returns the number of component geometries in a MULTIPOINT, MULTILINESTRING, or MULTIPOLYGON. Returns 1 for a non-MULTI geometry. |
ST_INTERSECTION
Returns a geometry representing an intersection of two geometries; that is, the section that is shared between the two input geometries. Performed by the GEOS module.
The output is currently limited to MULTIPOLYGON type, because HEAVY.AI does not support mixed geometry types within a geometry column, and ST_INTERSECTION
can potentially return points, lines, and polygons from a single intersection operation.
Lower-dimension intersecting features such as points and line strings are returned as very small buffers around those features. If needed, true points can be recovered by applying the ST_CENTROID method to point intersection results. In addition, ST_PERIMETER/2 of resulting line intersection polygons can be used to approximate line length.
Empty/NULL geometry outputs are not currently supported.
Examples:
SELECT ST_Intersection('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))');
SELECT ST_Area(ST_Intersection(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;
ST_DIFFERENCE
Returns a geometry representing the portion of the first input geometry that does not intersect with the second input geometry. Performed by the GEOS module. Input order is important; the return geometry is always a section of the first input geometry.
The output is currently limited to MULTIPOLYGON type, for the same reasons described in ST_INTERSECTION
. Similar post-processing methods can be applied if needed.
Empty/NULL geometry outputs are not currently supported.
Examples:
SELECT ST_Difference('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))');
SELECT ST_Area(ST_Difference(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;
ST_UNION
Returns a geometry representing the union (or combination) of the two input geometries. Performed by the GEOS module.
The output is currently limited to MULTIPOLYGON type for the same reasons described in ST_INTERSECTION
. Similar post-processing methods can be applied if needed.
Empty/NULL geometry outputs are not currently supported.
Examples:
SELECT ST_UNION('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))');
SELECT ST_AREA(ST_UNION(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;
ST_DISTANCE
Returns shortest planar distance between geometries. For example:
ST_DISTANCE(poly1, ST_GeomFromText('POINT(0 0)'))
Returns shortest geodesic distance between two points, in meters, if given two point geographies. Point geographies can be specified through casts from point geometries or as literals. For example:
ST_DISTANCE(
CastToGeography(p2),
ST_GeogFromText('POINT(2.5559 49.0083)', 4326)
)
SELECT a.name,
ST_DISTANCE(
CAST(a.pt AS GEOGRAPHY),
CAST(b.pt AS GEOGRAPHY)
) AS dist_meters
FROM starting_point a, destination_points b;
You can also calculate the distance between a POLYGON and a POINT. If both fields use SRID 4326, then the calculated distance is in 4326 units (degrees). If both fields use SRID 4326, and both are transformed into 900913, then the results are in 900913 units (meters).
The following SQL code returns the names of polygons where the distance between the point and polygon is less than 1,000 meters.
SELECT a.poly_name FROM poly a, point b WHERE ST_DISTANCE(
ST_TRANSFORM(b.location,900913),
ST_TRANSFORM(a.heavyai_geo,900913)
) < 1000;
ST_EQUALS
Returns TRUE if the first input geometry and the second input geometry are spatially equal; that is, they occupy the same space. Different orderings of points can be accepted as equal if they represent the same geometry structure.
POINTs comparison is performed natively. All other geometry comparisons are performed by GEOS.
If the input geometries are both uncompressed or both compressed, all comparisons to identify equality are precise. For mixed combinations, the comparisons are performed with a compression-specific tolerance that allows recognition of equality despite the subtle precision losses that compression may introduce. Note: Geo columns and literals with SRID=4326 are compressed by default.
Examples:
SELECT COUNT(*) FROM tbl WHERE ST_EQUALS('POINT(2 2)', pt);
SELECT ST_EQUALS('POLYGON ((0 0,1 0,0 1))', 'POLYGON ((0 0,0 0.5,0 1,1 0,0 0))');
ST_MAXDISTANCE
Returns the longest planar distance between two geometries; in effect, the diameter of a circle that encloses both geometries. Only certain combinations of input geometry types are currently supported.
ST_CONTAINS
Returns true if the first geometry object contains the second object. You can use ST_CONTAINS to:
Return the count of polys that contain the point (here as WKT):
SELECT count(*) FROM geo1 WHERE ST_CONTAINS(poly1, 'POINT(0 0)');
Return names from a polys table that contain points in a points table:
SELECT a.name FROM polys a, points b WHERE ST_CONTAINS(a.heavyai_geo, b.location);
Return names from a polys table that contain points in a points table, using a single point in WKT instead of a field in another table:
SELECT name FROM poly WHERE ST_CONTAINS(
heavyai_geo, ST_GeomFromText('POINT(-98.4886935 29.4260508)', 4326)
);
ST_INTERSECTS
Returns true if two geometries intersect spatially, false if they do not share space. For example:
SELECT ST_INTERSECTS(
'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))',
'POINT(1 1)'
) FROM tbl;
ST_AREA
Returns the area of planar areas covered by POLYGON and MULTIPOLYGON geometries. For example:
SELECT ST_AREA(
'POLYGON((1 0, 0 1, -1 0, 0 -1, 1 0),(0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0))'
) FROM tbl;
ST_AREA
does not support calculation of geographic areas, but rather uses planar coordinates. Geographies must first be projected in order to use ST_AREA
. You can do this ahead of time before import or at runtime, ideally using an equal area projection (for example, a national equal-area Lambert projection). The area is calculated in the projection's units. For example, you might use Web Mercator runtime projection to get the area of a polygon in square meters:
ST_AREA(
ST_TRANSFORM(
ST_GeomFromText(
'POLYGON((-76.6168198439371 39.9703199555959,
-80.5189990254673 40.6493554919257,
-82.5189990254673 42.6493554919257,
-76.6168198439371 39.9703199555959)
)', 4326
),
900913)
)
Web Mercator is not an equal area projection, however. Unless compensated by a scaling factor, Web Mercator areas can vary considerably by latitude.
ST_PERIMETER
Returns the cartesian perimeter of POLYGON and MULTIPOLYGON geometries. For example:
SELECT ST_PERIMETER('POLYGON(
(1 0, 0 1, -1 0, 0 -1, 1 0),
(0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0)
)'
)
from tbl;
It also returns the geodesic perimeter of POLYGON and MULTIPOLYGON geographies. For example:
SELECT ST_PERIMETER(
ST_GeogFromText(
'POLYGON(
(-76.6168198439371 39.9703199555959,
-80.5189990254673 40.6493554919257,
-82.5189990254673 42.6493554919257,
-76.6168198439371 39.9703199555959)
)',
4326)
)
from tbl;
ST_LENGTH
Returns the cartesian length of LINESTRING geometries. For example:
SELECT ST_LENGTH('LINESTRING(1 0, 0 1, -1 0, 0 -1, 1 0)') FROM tbl;
It also returns the geodesic length of LINESTRING geographies. For example:
SELECT ST_LENGTH(
ST_GeogFromText('LINESTRING(
-76.6168198439371 39.9703199555959,
-80.5189990254673 40.6493554919257,
-82.5189990254673 42.6493554919257)',
4326)
) FROM tbl;
ST_WITHIN
Returns true if geometry A is completely within geometry B. For example, the following SELECT statement returns true:
SELECT ST_WITHIN(
'POLYGON ((1 1, 1 2, 2 2, 2 1))',
'POLYGON ((0 0, 0 3, 3 3, 3 0))'
) FROM tbl;
ST_DWITHIN
Returns true if the geometries are within the specified distance of one another. Distance is specified in units defined by the spatial reference system of the geometries. For example:
SELECT ST_DWITHIN(
'POINT(1 1)',
'LINESTRING (1 2,10 10,3 3)', 2.0
) FROM tbl;
ST_DWITHIN supports geodesic distances between geographies, currently limited to geographic points. For example, you can check whether Los Angeles and Paris, specified as WGS84 geographic point literals, are within 10,000 km of one another.
SELECT ST_DWITHIN(
ST_GeogFromText(
'POINT(-118.4079 33.9434)', 4326),
ST_GeogFromText('POINT(2.5559 49.0083)',
4326 ),
10000000.0) FROM tbl;
ST_DFULLYWITHIN
Returns true if the geometries are fully within the specified distance of one another. Distance is specified in units defined by the spatial reference system of the geometries. For example:
SELECT ST_DFULLYWITHIN(
'POINT(1 1)',
'LINESTRING (1 2,10 10,3 3)',
10.0) FROM tbl;
This function supports:
ST_DFULLYWITHIN(POINT, LINESTRING, distance)
ST_DFULLYWITHIN(LINESTRING, POINT, distance)
ST_DISJOINT
Returns true if the geometries are spatially disjoint (that is, the geometries do not overlap or touch). For example:
SELECT ST_DISJOINT(
'POINT(1 1)',
'LINESTRING (0 0,3 3)'
) FROM tbl;
Input parameters:

| Parameter | Description | Data Type |
|---|---|---|
| <num_strings> | The number of strings to randomly generate. | BIGINT |
| <string_length> | Length of the generated strings. | BIGINT |

Output:

| Name | Description | Data Type |
|---|---|---|
| id | Integer ID of the output row, starting at 0 and increasing monotonically. | Column<BIGINT> |
| rand_str | Random string. | Column<TEXT ENCODING DICT> |
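A hedged invocation sketch, assuming the function is named generate_random_strings and follows the named-argument TABLE syntax used by other table functions:
SELECT * FROM TABLE(generate_random_strings(num_strings => 5, string_length => 8));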
Input parameters:

| Parameter | Description | Data Type |
|---|---|---|
| entity_id | Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes. | Column<TEXT ENCODING DICT \| BIGINT> |
| site_id | Column containing keys/IDs of dwell "sites" or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned h3 hex IDs for geographic location. | Column<TEXT ENCODING DICT \| BIGINT> |
| ts | Column denoting the time at which an event occurred. | Column<TIMESTAMP(0\|3\|6\|9)> |
| min_dwell_seconds | Constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered record for an entity_id at a site_id to constitute a valid session and to compute and return an entity's dwell time at a site. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapse between an entity's first and last ordered timestamp records at a site, these records are not considered a valid session, and a dwell time for that session is not calculated. | BIGINT (other integer types are automatically casted to BIGINT) |
| min_dwell_points | A constant integer value specifying the minimum number of successive observations (in ts timestamp order) required to constitute a valid session and to compute and return an entity's dwell time at a site. For example, if this variable is set to 3, but only two consecutive records exist for a user at a site before they move to a new site, no dwell time is calculated for the user. | BIGINT (other integer types are automatically casted to BIGINT) |
| max_inactive_seconds | A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity ID at a given site ID is 86500 seconds, the session is considered ended at the first timestamp-ordered record, and a new session is started at the timestamp of the second record. | BIGINT (other integer types are automatically casted to BIGINT) |

Output:

| Name | Description | Data Type |
|---|---|---|
| entity_id | The ID of the entity for the output dwell time, identical to the corresponding entity_id column in the input. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (same as the entity_id input column type) |
| site_id | The site ID for the output dwell time, identical to the corresponding site_id column in the input. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (same as the site_id input column type) |
| prev_site_id | The site ID for the session preceding the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the last site_id visited was null. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (same as the site_id input column type) |
| next_site_id | The site ID for the session after the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the next site_id visited was null. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (same as the site_id input column type) |
| session_id | An auto-incrementing session ID specific/relative to the current entity_id, starting from 1 (first session) up to the total number of valid sessions for an entity_id, such that each valid session dwell time increments the session_id for an entity by 1. | Column<INT> |
| start_seq_id | The index of the timestamp-ordered (ts-ordered) record for a given entity denoting the start of the current output row's session. | Column<INT> |
| dwell_time_sec | The duration in seconds of the session. | Column<INT> |
| num_dwell_points | The number of records/observations constituting the current output row's session. | Column<INT> |
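A hedged invocation sketch; the function name tf_compute_dwell_times, the cursor argument name data, and the table/column names are assumptions for illustration:
SELECT * FROM TABLE(
  tf_compute_dwell_times(
    data => CURSOR(SELECT user_id, page_id, event_ts FROM web_events),
    min_dwell_seconds => 60,
    min_dwell_points => 3,
    max_inactive_seconds => 1800
  )
);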
| Property | Value |
|---|---|
| name | generate_series |
| signature | (i64 series_start, i64 series_stop, i64 series_step) -> Column; (i64 series_start, i64 series_stop) -> Column |
| input_names | series_start, series_stop, series_step; series_start, series_stop |
| input_types | i64 |
| output_names | generate_series |
| output_types | Column i64 |
| CPU | true |
| GPU | true |
| runtime | false |
| filter_table_transpose | false |
Generates random string data.
Generates a series of integer values.
Generates a series of timestamp values from start_timestamp
to end_timestamp
.
Given a query input with entity keys and timestamps, and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session.
Given a query input of entity keys/IDs, a set of feature columns, and a metric column, scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and metric, computes the similarity of each entity in the first input to the search vector based on their similarity. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) for the search vector, which can optionally be TF/IDF weighted.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing an aggregate over the z values of all points in each bin as the output value for the bin. The aggregate performed to compute the value for each bin is specified by agg_type, with allowed aggregate types of AVG, COUNT, SUM, MIN, and MAX.
Similar to tf_geo_rasterize
, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Given a distance-weighted directed graph, consisting of a query CURSOR input containing the starting and ending node for each edge and a distance, and a specified origin and destination node, computes the shortest distance-weighted path through the graph between origin_node and destination_node.
Given a distance-weighted directed graph, consisting of a query CURSOR input containing the starting and ending node for each edge and a distance, and a specified origin node, computes the shortest distance-weighted path distance between the origin_node and every other node in the graph.
Loads one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs. If out_srs is not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs.
Computes the Mandelbrot set over the complex domain [x_min
, x_max
), [y_min
, y_max
), discretizing the xy-space into an output of dimensions x_pixels
X y_pixels
.
Returns metadata for one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min
, x_max
, y_min
, y_max
arguments.
Process a raster input to derive contour lines or regions and output as LINESTRING or POLYGON for rendering or further processing.
Aggregate point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Used for generating top-k signals where 'k' represents the maximum number of antennas to consider at each geographic location. The full relevant parameter name is strongest_k_sources_per_terrain_bin.
Taking a set of point elevations and a set of signal source locations as input, tf_rf_prop_max_signal
executes line-of-sight 2.5D RF signal propagation from the provided sources over a binned 2.5D elevation grid derived from the provided point locations, calculating the max signal in dBm at each grid cell, using the formula for free-space power loss.
| Operator | Description |
|---|---|
| +numeric | Returns numeric |
| -numeric | Returns the negative value of numeric |
| numeric1 + numeric2 | Sum of numeric1 and numeric2 |
| numeric1 - numeric2 | Difference of numeric1 and numeric2 |
| numeric1 * numeric2 | Product of numeric1 and numeric2 |
| numeric1 / numeric2 | Quotient (numeric1 divided by numeric2) |
| Operator | Description |
|---|---|
| = | Equals |
| <> | Not equals |
| > | Greater than |
| >= | Greater than or equal to |
| < | Less than |
| <= | Less than or equal to |
| BETWEEN x AND y | Is a value within a range |
| NOT BETWEEN x AND y | Is a value not within a range |
| IS NULL | Is a value that is null |
| IS NOT NULL | Is a value that is not null |
| NULLIF(x, y) | Compares expressions x and y. If they differ, returns x; if they are the same, returns null. For example, if a dataset uses 'NA' for null values, you can return null using SELECT NULLIF(field_name, 'NA'). |
| IS TRUE | True if a value resolves to TRUE. |
| IS NOT TRUE | True if a value resolves to FALSE. |
| Function | Description |
|---|---|
| ABS(x) | Returns the absolute value of x |
| CEIL(x) | Returns the smallest integer not less than the argument |
| DEGREES(x) | Converts radians to degrees |
| EXP(x) | Returns the value of e to the power of x |
| FLOOR(x) | Returns the largest integer not greater than the argument |
| LN(x) | Returns the natural logarithm of x |
| LOG(x) | Returns the natural logarithm of x |
| LOG10(x) | Returns the base-10 logarithm of the specified float expression x |
| MOD(x, y) | Returns the remainder of int x divided by int y |
| PI() | Returns the value of pi |
| POWER(x, y) | Returns the value of x raised to the power of y |
| RADIANS(x) | Converts degrees to radians |
| ROUND(x) | Rounds x to the nearest integer value, but does not change the data type. For example, the double value 4.1 rounds to the double value 4. |
| ROUND_TO_DIGIT(x, y) | Rounds x to y decimal places |
| SIGN(x) | Returns the sign of x as -1, 0, or 1 for x negative, zero, or positive |
| SQRT(x) | Returns the square root of x |
| TRUNCATE(x, y) | Truncates x to y decimal places |
| WIDTH_BUCKET(target, lower-boundary, upper-boundary, bucket-count) | Defines equal-width intervals (buckets) in a range between the lower boundary and the upper boundary, and returns the bucket number to which the target expression is assigned. target: a constant, column variable, or general expression for which a bucket number is returned. lower-boundary: lower boundary for the range of values to be partitioned equally. upper-boundary: upper boundary for the range of values to be partitioned equally. bucket-count: number of equal-width buckets in the range defined by the lower and upper boundaries. Expressions can be constants, column variables, or general expressions. |

Example: Create 10 age buckets of equal width, with lower bound 0 and upper bound 100 ([0,10], [10,20] ... [90,100]), and classify the age of a customer accordingly:
SELECT WIDTH_BUCKET(age, 0, 100, 10) FROM customer;
For example, a customer of age 34 is assigned to bucket 3 ([30,40]), and the function returns the value 3.
| Function | Description |
|---|---|
| ACOS(x) | Returns the arc cosine of x |
| ASIN(x) | Returns the arc sine of x |
| ATAN(x) | Returns the arc tangent of x |
| ATAN2(y, x) | Returns the arc tangent of (x, y) in the range (-π, π]. Equal to ATAN(y/x) for x > 0. |
| COS(x) | Returns the cosine of x |
| COT(x) | Returns the cotangent of x |
| SIN(x) | Returns the sine of x |
| TAN(x) | Returns the tangent of x |
| Function | Description |
|---|---|
| DISTANCE_IN_METERS(fromLon, fromLat, toLon, toLat) | Calculates distance in meters between two WGS84 positions. |
| CONV_4326_900913_X(x) | Converts WGS84 longitude to a WGS84 Web Mercator x coordinate. |
| CONV_4326_900913_Y(y) | Converts WGS84 latitude to a WGS84 Web Mercator y coordinate. |
BASE64_DECODE(str)
Decodes a BASE64-encoded string.
BASE64_ENCODE(str)
Encodes a string to a BASE64-encoded string.
CHAR_LENGTH(str)
Returns the number of characters in a string. Only works with unencoded fields (ENCODING set to none).
str1 || str2 [ || str3 ... ]
Returns the string that results from concatenating the strings specified. Numeric, date, timestamp, and time types are implicitly casted to strings as necessary, so explicit casts of non-string types to string types are not required for inputs to the concatenation operator.
Note that concatenating a variable string with a string literal (for example, county_name || ' County') is significantly more performant than concatenating two or more variable strings (for example, county_name || ', ' || state_name). For multi-variable string concatenation, it is therefore recommended to use an UPDATE statement to materialize the concatenated output, rather than computing it inline, when such operations are expected to be routinely repeated.
ENCODE_TEXT(none_encoded_str)
Converts a none-encoded string to a transient dictionary-encoded string to allow for operations like group-by on top. When the watchdog is enabled, the number of strings that can be casted using this operator is capped by the value set with the watchdog-none-encoded-string-translation-limit flag (1,000,000 by default).
HASH(str)
Deterministically hashes a string input to a BIGINT output using a pseudo-random function. Can be useful for bucketing string values or deterministically coloring by string values for a high-cardinality TEXT column. Note that currently HASH only accepts TEXT inputs, but in the future it may also accept other data types. NULL values always hash to NULL outputs.
INITCAP(str)
Returns the string with initial caps after any of the defined delimiter characters, with the remainder of the characters lowercased. Valid delimiter characters are: ! ? @ " ^ # $ & ~ _ , . : ; + - * % / | \ [ ] ( ) { } < >
JAROWINKLER_SIMILARITY(str1, str2)
Computes the Jaro-Winkler similarity score between two input strings. The output is an integer between 0 and 100, with 0 representing completely dissimilar strings and 100 representing exactly matching strings.
JSON_VALUE(json_str, path)
Returns the string of the field given by path in json_str. Paths start with the $ character, with sub-fields split by . and array members indexed by [], with array indices starting at 0. For example, JSON_VALUE('{"name": "Brenda", "scores": [89, 98, 94]}', '$.scores[1]') would yield a TEXT return field of '98'.
Note that currently LAX parsing mode (any unmatched path returns null rather than an error) is the default, and STRICT parsing mode is not supported.
KEY_FOR_STRING(str)
Returns the dictionary key of a dictionary-encoded string column.
LCASE(str)
Returns the string in all lower case. Only the ASCII character set is currently supported. Same as LOWER.
LEFT(str, num)
Returns the left-most number (num) of characters in the string (str).
LENGTH(str)
Returns the length of a string in bytes. Only works with unencoded fields (ENCODING set to none).
LEVENSHTEIN_DISTANCE(str1, str2)
Computes the edit distance, or the number of single-character insertions, deletions, or substitutions that must be made to make the first string equal the second. Returns an integer greater than or equal to 0, with 0 meaning the strings are equal; the higher the return value, the more dissimilar the two strings.
LOWER(str)
Returns the string in all lower case. Only the ASCII character set is currently supported. Same as LCASE.
LPAD(str, len, [lpad_str])
Left-pads the string with the string defined in lpad_str to a total length of len. If the optional lpad_str is not specified, the space character is used to pad.
If the length of str is greater than len, then characters from the end of str are truncated to the length of len.
Characters are added from lpad_str successively until the target length len is met. If lpad_str concatenated with str is not long enough to equal the target len, lpad_str is repeated, partially if necessary, until the target length is met.
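For example, the following sketch returns 'xyxyabc': lpad_str is repeated, partially if necessary, until the target length of 7 is met:
SELECT LPAD('abc', 7, 'xy');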
LTRIM(str, chars)
Removes any leading characters specified in chars from the string. Alias for TRIM.
OVERLAY(str PLACING replacement_str FROM start [FOR len])
Replaces in str the number of characters defined in len with characters defined in replacement_str at the location start.
Regardless of the length of replacement_str, len characters are removed from str, unless start + replacement_str is greater than the length of str, in which case all characters from start to the end of str are replaced.
If start is negative, it specifies the number of characters from the end of str.
POSITION(search_str IN str [FROM start_position])
Returns the position of the first character of search_str if it is found in str, optionally starting the search at start_position.
If search_str is not found, 0 is returned. If search_str or str is null, null is returned.
REGEXP_COUNT(str, pattern [, position [, flags]])
Returns the number of times that the provided pattern occurs in the search string str.
position specifies the starting position in str at which the search for pattern starts (all matches before position are ignored). If position is negative, the search starts that many characters from the end of the string str.
Use the following optional flags to control the matching behavior:
c - Case-sensitive matching.
i - Case-insensitive matching.
REGEXP_REPLACE(str, pattern [, new_str, position, occurrence [, flags]])
Replaces one or all matches of a substring in string str that matches pattern, which is a regular expression in POSIX regex syntax.
new_str (optional) is the string that replaces the string matching the pattern. If new_str is empty or not supplied, all found matches are removed.
The occurrence integer argument (optional) specifies the single match occurrence of the pattern to replace, starting from the beginning of str; 0 (replace all) is the default. Use a negative occurrence argument to signify the nth-to-last occurrence to be replaced.
Use a positive position argument to indicate the number of characters from the beginning of str, and a negative position argument to indicate the number of characters from the end of str.
Back-references/capture groups can be used to capture and replace specific sub-expressions.
Use the following optional flags to control the matching behavior:
c - Case-sensitive matching.
i - Case-insensitive matching.
If no flag is specified, REGEXP_REPLACE defaults to case-sensitive search.
REGEXP_SUBSTR(str, pattern [, position, occurrence, flags, group_num])
Searches string str for pattern, which is a regular expression in POSIX syntax, and returns the matching substring.
Use position to set the character position at which to begin searching, and occurrence to specify the occurrence of the pattern to match. Use a positive position argument to indicate the number of characters from the beginning of str, and a negative position argument to indicate the number of characters from the end of str.
The occurrence integer argument (optional) specifies the single match occurrence of the pattern to return, with 0 mapped to the first (1) occurrence. Use a negative occurrence argument to signify that the nth-to-last occurrence is returned.
Use optional flags to control the matching behavior:
c - Case-sensitive matching.
e - Extract submatches.
i - Case-insensitive matching.
The c and i flags cannot be used together; e can be used with either. If neither c nor i is specified, or if pattern is not provided, REGEXP_SUBSTR defaults to case-sensitive search.
If the e flag is used, REGEXP_SUBSTR returns the capture group group_num of pattern matched in str. If the e flag is used but no capture groups are provided in pattern, REGEXP_SUBSTR returns the entire matching pattern, regardless of group_num. If the e flag is used but no group_num is provided, a value of 1 for group_num is assumed, so the first capture group is returned.
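For example, a sketch consistent with the flag semantics above; with the e flag and group_num = 2, the second capture group (the domain portion) is returned (the input string is illustrative):
SELECT REGEXP_SUBSTR('Contact: jane@heavy.ai', '([A-Za-z]+)@([A-Za-z.]+)', 1, 1, 'ce', 2);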
REPEAT(str, num)
Repeats the string the number of times defined in num.
REPLACE(str, from_str, new_str)
Replaces all occurrences of substring from_str within a string with a new substring new_str.
REVERSE(str)
Reverses the string.
RIGHT(str, num)
Returns the right-most number (num) of characters in the string (str).
RPAD(str, len, [rpad_str])
Right-pads the string with the string defined in rpad_str to a total length of len. If the optional rpad_str is not specified, the space character is used to pad.
If the length of str is greater than len, then characters from the beginning of str are truncated to the length of len.
Characters are added from rpad_str successively until the target length len is met. If rpad_str concatenated with str is not long enough to equal the target len, rpad_str is repeated, partially if necessary, until the target length is met.
RTRIM(str)
Removes any trailing spaces from the string.
SPLIT_PART(str, delim, field_num)
Splits the string based on a delimiter delim and returns the field identified by field_num. Fields are numbered from left to right.
STRTOK_TO_ARRAY(str, [delim])
Tokenizes the string str using the optional delimiter(s) delim and returns an array of tokens. An empty array is returned if no tokens are produced in tokenization. NULL is returned if either parameter is NULL.
SUBSTR(str, start, [len])
Alias for SUBSTRING.
SUBSTRING(str FROM start [FOR len])
Returns a substring of str starting at index start for len characters.
The start position is 1-based (that is, the first character of str is at index 1, not 0); however, start 0 aliases to start 1. If start is negative, it is considered to be |start| characters from the end of the string.
If len is not specified, the substring from start to the end of str is returned. If start + len is greater than the length of str, the characters in str from start to the end of the string are returned.
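For example, both of the following return 'EAVY', illustrating the 1-based start index:
SELECT SUBSTRING('HEAVY.AI' FROM 2 FOR 4);
SELECT SUBSTR('HEAVY.AI', 2, 4);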
TRIM([BOTH | LEADING | TRAILING] [trim_str FROM str])
Removes characters defined in trim_str from the beginning, end, or both ends of str. If trim_str is not specified, the space character is the default. If the trim location is not specified, defined characters are trimmed from both the beginning and end of str.
TRY_CAST(str AS type)
Attempts to cast/convert a string type to any valid numeric, timestamp, date, or time type. If the conversion cannot be performed, null is returned. Note that TRY_CAST is not valid for non-string input types.
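For example, the first of the following sketches returns 123 as an integer, while the second returns null because the conversion cannot be performed:
SELECT TRY_CAST('123' AS INTEGER);
SELECT TRY_CAST('12x' AS INTEGER);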
UCASE(str)
Returns the string in uppercase format. Only the ASCII character set is currently supported. Same as UPPER.
UPPER(str)
Returns the string in uppercase format. Only the ASCII character set is currently supported. Same as UCASE.
URL_DECODE(str)
Decodes a URL-encoded string. This is the inverse of the URL_ENCODE function.
URL_ENCODE(str)
URL-encodes a string. Alphanumeric characters and the 4 characters _-.~ are untranslated. The space character is translated to +. All other characters are translated into a 3-character sequence %XX, where XX is the 2-digit hexadecimal ASCII value of the character.
| Name | Example | Description |
|---|---|---|
| str LIKE pattern | 'ab' LIKE 'ab' | Returns true if the string matches the pattern (case-sensitive). |
| str NOT LIKE pattern | 'ab' NOT LIKE 'cd' | Returns true if the string does not match the pattern. |
| str ILIKE pattern | 'AB' ILIKE 'ab' | Returns true if the string matches the pattern (case-insensitive). Supported only when the right side is a string literal; for example, colors.name ILIKE 'b%'. |
| str REGEXP POSIX pattern | '^[a-z]+r$' | Lowercase string ending with r. |
| REGEXP_LIKE(str, POSIX pattern) | '^[hc]at' | cat or hat. |
CURRENT_DATE
CURRENT_DATE()
Returns the current date in the GMT time zone.
Example:
SELECT CURRENT_DATE();
CURRENT_TIME
CURRENT_TIME()
Returns the current time of day in the GMT time zone.
Example:
SELECT CURRENT_TIME();
CURRENT_TIMESTAMP
CURRENT_TIMESTAMP()
Return the current timestamp in the GMT time zone. Same as NOW()
.
Example:
SELECT CURRENT_TIMESTAMP();
DATEADD('date_part', interval, date | timestamp)
Returns a date after a specified time/date interval has been added.
Example:
SELECT DATEADD('MINUTE', 6000, dep_timestamp) Arrival_Estimate FROM flights_2008_10k LIMIT 10;
DATEDIFF('date_part', date, date)
Returns the difference between two dates, calculated to the lowest level of the date_part you specify. For example, if you set the date_part as DAY, only the year, month, and day are used to calculate the result. Other fields, such as hour and minute, are ignored.
Example:
SELECT DATEDIFF('YEAR', plane_issue_date, now()) Years_In_Service FROM flights_2008_10k LIMIT 10;
DATEPART('interval', date | timestamp)
Returns a specified part of a given date or timestamp as an integer value. Note that 'interval' must be enclosed in single quotes.
Example:
SELECT DATEPART('YEAR', plane_issue_date) Year_Issued FROM flights_2008_10k LIMIT 10;
DATE_TRUNC(date_part, timestamp)
Truncates the timestamp to the specified date_part. DATE_TRUNC(week,...)
starts on Monday (ISO), which is different than EXTRACT(dow,...)
, which starts on Sunday.
Example:
SELECT DATE_TRUNC(MINUTE, arr_timestamp) Arrival FROM flights_2008_10k LIMIT 10;
EXTRACT(date_part FROM timestamp)
Returns the specified date_part from timestamp.
Example:
SELECT EXTRACT(HOUR FROM arr_timestamp) Arrival_Hour FROM flights_2008_10k LIMIT 10;
INTERVAL 'count' date_part
Adds or subtracts count date_part units from a timestamp. Note that 'count' is enclosed in single quotes.
Example:
SELECT arr_timestamp + INTERVAL '10' YEAR FROM flights_2008_10k LIMIT 10;
NOW()
Return the current timestamp in the GMT time zone. Same as CURRENT_TIMESTAMP().
Example:
NOW();
TIMESTAMPADD(date_part, count, timestamp | date)
Adds an interval of count date_part to timestamp or date and returns signed date_part units in the provided timestamp or date form.
Example:
SELECT TIMESTAMPADD(DAY, 14, arr_timestamp) Fortnight FROM flights_2008_10k LIMIT 10;
TIMESTAMPDIFF(date_part, timestamp1, timestamp2)
Subtracts timestamp1 from timestamp2 and returns the result in signed date_part units.
Example:
SELECT TIMESTAMPDIFF(MINUTE, arr_timestamp, dep_timestamp) Flight_Time FROM flights_2008_10k LIMIT 10;
| Datatype | Formats | Examples |
|---|---|---|
| DATE | YYYY-MM-DD | 2013-10-31 |
| DATE | MM/DD/YYYY | 10/31/2013 |
| DATE | DD-MON-YY | 31-Oct-13 |
| DATE | DD/Mon/YYYY | 31/Oct/2013 |
| DATE | EPOCH | 1383262225 |
| TIME | HH:MM | 23:49 |
| TIME | HHMMSS | 234901 |
| TIME | HH:MM:SS | 23:49:01 |
| TIMESTAMP | DATE TIME | 31-Oct-13 23:49:01 |
| TIMESTAMP | DATETTIME | 31-Oct-13T23:49:01 |
| TIMESTAMP | DATE:TIME | 11/31/2013:234901 |
| TIMESTAMP | DATE TIME ZONE | 31-Oct-13 11:30:25 -0800 |
| TIMESTAMP | DATE HH.MM.SS PM | 31-Oct-13 11.30.25pm |
| TIMESTAMP | DATE HH:MM:SS PM | 31-Oct-13 11:30:25pm |
| TIMESTAMP | EPOCH | 1383262225 |
| Double-precision FP Function | Single-precision FP Function | Description |
|---|---|---|
| AVG(x) | | Returns the average value of x |
| COUNT() | | Returns the count of the number of rows returned |
| COUNT(DISTINCT x) | | Returns the count of distinct values of x |
| APPROX_COUNT_DISTINCT(x, e) | | Returns the approximate count of distinct values of x with defined expected error rate e, where e is an integer from 1 to 100. If no value is set for e, the approximate count is calculated using the system-wide hll-precision-bits configuration parameter. |
| APPROX_MEDIAN(x) | | Returns the approximate median of x. Two server configuration parameters affect memory usage: approx_quantile_centroids and approx_quantile_buffer. Accuracy of APPROX_MEDIAN depends on the distribution of data; see Usage Notes. |
| APPROX_PERCENTILE(x, y) | | Returns the approximate quantile of x, where y is a value between 0 and 1. For example, y=0 returns MIN(x), y=1 returns MAX(x), and y=0.5 returns APPROX_MEDIAN(x). |
| MAX(x) | | Returns the maximum value of x |
| MIN(x) | | Returns the minimum value of x |
| SINGLE_VALUE | | Returns the input value if there is only one distinct value in the input; otherwise, the query fails. |
| SUM(x) | | Returns the sum of the values of x |
| SAMPLE(x) | | Returns one sample value from aggregated column x; for example, population grouped by city, along with one value from the state column for each group. Note: This was previously LAST_SAMPLE, which is now deprecated. |
| CORRELATION(x, y) | CORRELATION_FLOAT(x, y) | Alias of CORR. Returns the coefficient of correlation of a set of number pairs. |
| CORR(x, y) | CORR_FLOAT(x, y) | Returns the coefficient of correlation of a set of number pairs. |
| COUNT_IF(conditional_expr) | | Returns the number of rows satisfying the given conditional_expr. |
| COVAR_POP(x, y) | COVAR_POP_FLOAT(x, y) | Returns the population covariance of a set of number pairs. |
| COVAR_SAMP(x, y) | COVAR_SAMP_FLOAT(x, y) | Returns the sample covariance of a set of number pairs. |
| STDDEV(x) | STDDEV_FLOAT(x) | Alias of STDDEV_SAMP. Returns the sample standard deviation of the value. |
| STDDEV_POP(x) | STDDEV_POP_FLOAT(x) | Returns the population standard deviation of the value. |
| STDDEV_SAMP(x) | STDDEV_SAMP_FLOAT(x) | Returns the sample standard deviation of the value. |
| SUM_IF(conditional_expr) | | Returns the sum of all expression values satisfying the given conditional_expr. |
| VARIANCE(x) | VARIANCE_FLOAT(x) | Alias of VAR_SAMP. Returns the sample variance of the value. |
| VAR_POP(x) | VAR_POP_FLOAT(x) | Returns the population variance of the value. |
| VAR_SAMP(x) | VAR_SAMP_FLOAT(x) | Returns the sample variance of the value. |
SAMPLE_RATIO(x)
Returns a Boolean value, with the probability of True being returned for a row equal to the input argument, a numeric value between 0.0 and 1.0. Negative input values return False, input values greater than 1.0 return True, and null input values return False.
The result of the function is deterministic per row; that is, all calls of the operator for a given row return the same result. The sample ratio is probabilistic, but is generally within a thousandth of a percentile of the actual range when the underlying dataset is millions of records or larger.
The following example filters approximately 50% of the rows from t
and returns a count that is approximately half the number of rows in t
:
SELECT COUNT(*) FROM t WHERE SAMPLE_RATIO(0.5)
Input parameters:

| Parameter | Description | Data Type |
|---|---|---|
| primary_key | Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function will compute the similarity to the search vector specified by the comparison_features cursor. Examples include countries, census block groups, user IDs of website visitors, and aircraft call signs. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| pivot_features | One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities are compared only by the census block groups visited, regardless of time overlap. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| metric | Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
| comparison_pivot_features | One or more columns constituting a compound feature for the search vector. This should match the pivot_features in number of sub-features, types, and semantics. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| comparison_metric | Column denoting the values used as input for the cosine similarity metric computation from the search vector. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| use_tf_idf | Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation. | BOOLEAN |

Output:

| Name | Description | Data Type |
|---|---|---|
| class | ID of the primary_key being compared against the search vector. | Column<TEXT ENCODING DICT \| INT \| BIGINT> (same as the primary_key input column type) |
| similarity_score | Computed cosine similarity score between each primary_key pair, with values falling between 0 (completely dissimilar) and 1 (completely similar). | Column<FLOAT> |
Input parameters:

| Parameter | Description | Data Type |
|---|---|---|
| node1 | Origin node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> |
| node2 | Destination node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (must be the same type as node1) |
| distance | Distance between the origin and destination node in the directed edge list CURSOR. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
| origin_node | The origin node at which graph traversal starts. If the value is not present in edge_list.node1, an empty result set is returned. | BIGINT \| TEXT ENCODING DICT |
| destination_node | The destination node at which graph traversal finishes. If the value is not present in edge_list.node1, an empty result set is returned. | BIGINT \| TEXT ENCODING DICT |

Output:

| Name | Description | Data Type |
|---|---|---|
| path_step | The index of this node along the path traversal from origin_node to destination_node, with the first node (the origin_node) indexed as 1. | Column<INT> |
| node | The current node along the path traversal from origin_node to destination_node. The first node (as denoted by path_step = 1) is always the input origin_node, and the final node (as denoted by MAX(path_step)) is always the input destination_node. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (same type as the node1 and node2 input columns) |
| cume_distance | The cumulative distance of all input distance values from the origin_node to the current node. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> (same type as the distance input column) |
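A hedged invocation sketch; the function name tf_graph_shortest_path and the edges table are assumptions, while the edge_list cursor name and column names follow the parameter table above:
SELECT * FROM TABLE(
  tf_graph_shortest_path(
    edge_list => CURSOR(SELECT node1, node2, distance FROM edges),
    origin_node => 'JFK',
    destination_node => 'LAX'
  )
)
ORDER BY path_step;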
Input parameters:

| Parameter | Description | Data Type |
|---|---|---|
| x | X-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
| y | Y-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
| z | Z-coordinate column or expression, aggregated per bin according to agg_type. | Column<FLOAT \| DOUBLE> |
| agg_type | The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'. | TEXT ENCODING NONE |
| fill_agg_type (optional) | The aggregate to be performed when computing the blur pass on the output bins. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', 'MAX', 'GAUSS_AVG', or 'BOX_AVG'. Note that 'AVG' is synonymous with 'GAUSS_AVG' in this context; the default fill_agg_type, if not specified, is 'GAUSS_AVG'. | TEXT ENCODING NONE |
| bin_dim_meters | The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are assumed to already be in meters. | DOUBLE |
| geographic_coords | If true, specifies that the input x/y coordinates are in lon/lat degrees. The function then computes a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max. | BOOLEAN |
| neighborhood_fill_radius | The radius in bins over which to compute the box blur/filter, such that each output bin is the average value of all bins within neighborhood_fill_radius bins. | DOUBLE |
| fill_only_nulls | Specifies that the box blur should be used only to provide output values for null output bins (that is, bins that contained no data points or only data points with null z-values). | BOOLEAN |
| x_min (optional) | Min x-coordinate value (in input units) for the spatial output grid. | DOUBLE |
| x_max (optional) | Max x-coordinate value (in input units) for the spatial output grid. | DOUBLE |
| y_min (optional) | Min y-coordinate value (in input units) for the spatial output grid. | DOUBLE |
| y_max (optional) | Max y-coordinate value (in input units) for the spatial output grid. | DOUBLE |

Output:

| Name | Description | Data Type |
|---|---|---|
| x | The x-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input x-coordinate column/expression) |
| y | The y-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input y-coordinate column/expression) |
| z | The aggregated z-value (per agg_type) of all input data assigned to a given spatial bin. | Column<FLOAT \| DOUBLE> (same as input z-coordinate column/expression) |
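A hedged invocation sketch for tf_geo_rasterize; the raster cursor argument name and the points table are assumptions, while the remaining argument names follow the parameter table above:
SELECT * FROM TABLE(
  tf_geo_rasterize(
    raster => CURSOR(SELECT lon, lat, elevation FROM points),
    agg_type => 'MAX',
    bin_dim_meters => 100.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 2,
    fill_only_nulls => FALSE
  )
);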
Input parameters:

| Parameter | Description | Data Type |
|---|---|---|
| x | Input x-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
| y | Input y-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
| z | Input z-coordinate column or expression, aggregated per bin according to agg_type. | Column<FLOAT \| DOUBLE> |
| agg_type | The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'. | TEXT ENCODING NONE |
| bin_dim_meters | The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are assumed to already be in meters. | DOUBLE |
| geographic_coords | If true, specifies that the input x/y coordinates are in lon/lat degrees. The function then computes a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max. | BOOLEAN |
| neighborhood_fill_radius | The radius in bins over which to compute the box blur/filter, such that each output bin is the average value of all bins within neighborhood_fill_radius bins. | BIGINT |
| fill_only_nulls | Specifies that the box blur should be used only to provide output values for null output bins (that is, bins that contained no data points or only data points with null z-values). | BOOLEAN |
| compute_slope_in_degrees | If true, the slope is computed in degrees (with 0 degrees perfectly flat and 90 degrees perfectly vertical). If false, the slope is computed as a fraction from 0 (flat) to 1 (vertical). In a future release, the default output is planned to move to percentage slope. | BOOLEAN |

Output:

| Name | Description | Data Type |
|---|---|---|
| x | The x-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input x column/expression) |
| y | The y-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input y column/expression) |
| z | The aggregated z-value (per agg_type) of all input data assigned to a given spatial bin. | Column<FLOAT \| DOUBLE> (same as input z column/expression) |
| slope | The average slope of an output grid cell (in degrees or as a fraction between 0 and 1, depending on the argument to compute_slope_in_degrees). | Column<FLOAT \| DOUBLE> (same as input z column/expression) |
| aspect | The direction from 0 to 360 degrees pointing towards the maximum downhill gradient, with 0 degrees being due north and moving clockwise: N (0°) -> NE (45°) -> E (90°) -> SE (135°) -> S (180°) -> SW (225°) -> W (270°) -> NW (315°). | Column<FLOAT \| DOUBLE> (same as input z column/expression) |
Input parameters:

| Parameter | Description | Data Type |
|---|---|---|
| node1 | Origin node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> |
| node2 | Destination node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (must be the same type as node1) |
| distance | Distance between the origin and destination node in the directed edge list CURSOR. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
| origin_node | The origin node at which graph traversal starts. If the value is not present in edge_list.node1, an empty result set is returned. | BIGINT \| TEXT ENCODING DICT |

Output:

| Name | Description | Data Type |
|---|---|---|
| origin_node | Starting node in the graph traversal. Always equal to the input origin_node. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (same type as the node1 and node2 input columns) |
| destination_node | Final node in the graph traversal. Equal to one of the values of the node2 input column. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (same type as the node1 and node2 input columns) |
| distance | Cumulative distance between the origin and destination node for the shortest path graph traversal. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> (same type as the distance input column) |
| num_edges_traversed | Number of edges (or "hops") traversed in the graph to arrive at destination_node from origin_node for the shortest path graph traversal between these two nodes. | Column<INT> |
| Parameter | Description | Data Type |
|---|---|---|
| x | Input x-coordinate column or expression of the data to be rasterized. | Column<FLOAT \| DOUBLE> |
| y | Input y-coordinate column or expression of the data to be rasterized. | Column<FLOAT \| DOUBLE> (must be the same type as x) |
| z | Input z-coordinate column or expression of the data to be rasterized. | Column<FLOAT \| DOUBLE> |
| agg_type | The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'. | TEXT ENCODING NONE |
| bin_dim_meters | The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are assumed to already be in meters. | DOUBLE |
| geographic_coords | If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max. | BOOLEAN |
| neighborhood_fill_radius | The radius in bins to compute the gaussian blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius bins. | BIGINT |
| fill_only_nulls | Specifies that the gaussian blur should only be used to provide output values for null output bins (that is, bins that contained no data points or had only data points with null z-values). | BOOLEAN |
| | The x-coordinate for the starting point for the graph traversal, in input (not bin) units. | DOUBLE |
| | The y-coordinate for the starting point for the graph traversal, in input (not bin) units. | DOUBLE |
| | The x-coordinate for the destination point for the graph traversal, in input (not bin) units. | DOUBLE |
| | The y-coordinate for the destination point for the graph traversal, in input (not bin) units. | DOUBLE |
| | The exponent by which the slope weight between neighboring raster cells is weighted. | DOUBLE |
| | The max absolute value of slopes (measured in percentages) between neighboring raster cells that will be considered for traversal. A neighboring grid cell with an absolute slope greater than this amount is not considered in the shortest slope-weighted path graph traversal. | DOUBLE |
Function | Arguments and Return |
---|---|
| Converts a distance in meters in a longitudinal direction from a latitude/longitude coordinate to a pixel size, using Mercator projection. Returns: Floating-point value in pixel units; can be used for the width of a symbol or a point in Vega. |
| Converts a distance in meters in a latitudinal direction from a latitude/longitude coordinate to a pixel size, using Mercator projection. Returns: Floating-point value in pixel units; can be used for the height of a symbol or a point in Vega. |
| Converts a distance in meters in a longitudinal direction from a latitude/longitude POINT to a pixel size. Supports only Mercator-projected points. Returns: Floating-point value in pixel units; can be used for the width of a symbol or a point in Vega. |
| Converts a distance in meters in a latitudinal direction from an EPSG:4326 POINT to a pixel size. Currently supports only Mercator-projected points. Returns: Floating-point value in pixel units; can be used for the height of a symbol or a point in Vega. |
| Returns true if a latitude/longitude coordinate is within a Mercator-projected view defined by the supplied view bounds. Returns: True if the point is within the view; false otherwise. |
| Returns true if a latitude/longitude coordinate, offset by a distance in meters, is within a Mercator-projected view defined by the supplied view bounds. Returns: True if the offset point is within the view; false otherwise. |
| Returns true if a latitude/longitude POINT defined in EPSG:4326 is within a Mercator-projected view defined by the supplied view bounds. Returns: True if the point is within the view; false otherwise. |
| Returns true if a latitude/longitude POINT defined in EPSG:4326, offset by a distance in meters, is within a Mercator-projected view defined by the supplied view bounds. Returns: True if the offset point is within the view; false otherwise. |
| Output geometries. | Column<LINESTRING | POLYGON> |
| Raster values associated with each contour geometry. | Column<FLOAT | DOUBLE> (will be the same type as value) |
| The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in | TEXT ENCODING NONE |
| EPSG code of the output SRID. If not specified, output points are automatically converted to lon/lat (EPSG 4326). | TEXT ENCODING NONE |
| If true, use internal point cloud cache. Useful for inline querying of the output of | BOOLEAN |
| Min x-coordinate value (in degrees) for the output data. | DOUBLE |
| Max x-coordinate value (in degrees) for the output data. | DOUBLE |
| Min y-coordinate value (in degrees) for the output data. | DOUBLE |
| Max y-coordinate value (in degrees) for the output data. | DOUBLE |
| The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in | TEXT ENCODING NONE |
| Min x-coordinate value for point cloud files to retrieve metadata from. | DOUBLE |
| Max x-coordinate value for point cloud files to retrieve metadata from. | DOUBLE |
| Min y-coordinate value for point cloud files to retrieve metadata from. | DOUBLE |
| Max y-coordinate value for point cloud files to retrieve metadata from. | DOUBLE |
| Full path for the las or laz file | Column<TEXT ENCODING DICT> |
| Filename for the las or laz file | Column<TEXT ENCODING DICT> |
| File source id per file metadata | Column<SMALLINT> |
| LAS version major number | Column<SMALLINT> |
| LAS version minor number | Column<SMALLINT> |
| Data creation year | Column<SMALLINT> |
| Whether data is compressed, i.e. LAZ format | Column<BOOLEAN> |
| Number of points in this file | Column<BIGINT> |
| Number of data dimensions for this file | Column<SMALLINT> |
| Not currently used | Column<SMALLINT> |
| Whether data has a time value | Column<BOOLEAN> |
| Whether data contains an RGB color value | Column<BOOLEAN> |
| Whether data contains wave info | Column<BOOLEAN> |
| Whether data contains an infrared value | Column<BOOLEAN> |
| Whether data adheres to the 14-attribute standard | Column<BOOLEAN> |
| UTM zone of data | Column<INT> |
| Minimum x-coordinate in source projection | Column<DOUBLE> |
| Maximum x-coordinate in source projection | Column<DOUBLE> |
| Minimum y-coordinate in source projection | Column<DOUBLE> |
| Maximum y-coordinate in source projection | Column<DOUBLE> |
| Minimum z-coordinate in source projection | Column<DOUBLE> |
| Maximum z-coordinate in source projection | Column<DOUBLE> |
| Minimum x-coordinate in lon/lat degrees | Column<DOUBLE> |
| Maximum x-coordinate in lon/lat degrees | Column<DOUBLE> |
| Minimum y-coordinate in lon/lat degrees | Column<DOUBLE> |
| Maximum y-coordinate in lon/lat degrees | Column<DOUBLE> |
| Minimum z-coordinate in meters above sea level (AMSL) | Column<DOUBLE> |
| Maximum z-coordinate in meters above sea level (AMSL) | Column<DOUBLE> |
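A usage sketch for the metadata function, assuming it is exposed as tf_point_cloud_metadata; the path, bounding box, and output column names shown are all assumptions, and the path must be readable by the server:

```sql
-- Sketch (assumed names): per-file metadata for LAZ tiles intersecting
-- a lon/lat window. '/lidar/*.laz' and the bounds are hypothetical.
SELECT file_path, num_points, is_compressed,
       x_min_4326, x_max_4326, y_min_4326, y_max_4326
FROM TABLE(
  tf_point_cloud_metadata(
    path => '/lidar/*.laz',
    x_min => -122.6, x_max => -122.3,
    y_min => 37.6, y_max => 37.9
  )
)
ORDER BY num_points DESC;
```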
| Point x-coordinate | Column<DOUBLE> |
| Point y-coordinate | Column<DOUBLE> |
| Point z-coordinate | Column<DOUBLE> |
| Point intensity | Column<INT> |
| The ordered number of the return for a given LiDAR pulse. The first returns (lowest return numbers) are generally associated with the highest-elevation points for a LiDAR pulse; for example, the forest canopy generally has a lower return number than the ground beneath it. | Column<TINYINT> |
| The total number of returns for a LiDAR pulse. Multiple returns occur when there are multiple objects between the LiDAR source and the lowest ground or water elevation for a location. | Column<TINYINT> |
| Column<TINYINT> |
| Column<TINYINT> |
| Column<SMALLINT> |
| Column<TINYINT> |
Function | Description |
---|---|
| Replaces each null value with the nearest non-null value of the value column, found by searching backward. At least one ordering column must be defined in the window clause. NULLS FIRST ordering of the input value is added automatically to any user-defined ordering. |
| For each partition, a zero-initialized counter is incremented every time the result of expr changes as the expression is evaluated over the partition. Requires an ORDER BY clause for the window. |
| Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the number of rows satisfying the given condition. |
| Cumulative distribution value of the current row: (number of rows preceding or peers of the current row) / (total rows). Window framing is ignored. |
| Rank of the current row without gaps. This function counts peer groups. Window framing is ignored. |
| Returns the value from the first row of the window frame (the rows from the start of the partition to the last peer of the current row). |
| Replaces each null value with the nearest non-null value of the value column, found by searching forward. NULLS FIRST ordering of the input value is added automatically to any user-defined ordering. |
| Returns the value from the last row of the window frame. |
| Returns the value of the given expression evaluated at a specified row of the window frame. |
| Subdivides the partition into the requested number of buckets. If the total number of rows is divisible by num_buckets, each bucket has an equal number of rows; otherwise, the function returns groups of two sizes with a difference of 1. Window framing is ignored. |
| Relative rank of the current row: (rank - 1) / (total rows - 1). Window framing is ignored. |
| Rank of the current row with gaps. Equal to the ROW_NUMBER of the row's first peer. |
| Number of the current row within the partition, counting from 1. Window framing is ignored. |
| Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the sum of all expression values satisfying the given condition. |
Frame aggregation | |
Frame navigation | |
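To make the fill functions concrete, here is a hedged sketch using FORWARD_FILL on a sensor time series; the sensor_readings table and its columns are illustrative:

```sql
-- Sketch (assumed names): replace each null reading with the nearest
-- preceding non-null reading per sensor, ordered by timestamp.
SELECT
  sensor_id,
  ts,
  FORWARD_FILL(reading) OVER (PARTITION BY sensor_id ORDER BY ts) AS filled_reading
FROM sensor_readings;
```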
| Longitude value of raster point (degrees, SRID 4326). | Column<FLOAT | DOUBLE> |
| Latitude value of raster point (degrees, SRID 4326). | Column<FLOAT | DOUBLE> (must be the same as <lon>) |
| Raster band value from which to derive contours. | Column<FLOAT | DOUBLE> |
| Optionally flip resulting geometries in latitude (default FALSE). This parameter may be removed in a future release. | BOOLEAN |
| Desired contour interval. The function will generate a line at each interval, or a polygon region that covers that interval. | FLOAT/DOUBLE (must be same type as value) |
| Optional offset for resulting intervals. | FLOAT/DOUBLE (must be same type as value) |
| Pixel width (stride) of the raster data. | INTEGER |
| Pixel height of the raster data. | INTEGER |
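Putting the contour parameters together, here is a sketch of a contour-lines call; the tf_raster_contour_lines name and the named-argument form are assumptions to verify against your release:

```sql
-- Sketch (assumed names): 10-unit elevation contour lines from a
-- raster point table. dem_points, lon, lat, and elev are hypothetical.
SELECT *
FROM TABLE(
  tf_raster_contour_lines(
    raster => CURSOR(SELECT lon, lat, elev FROM dem_points),
    contour_interval => 10.0,
    contour_offset => 0.0,
    flip_latitude => FALSE
  )
);
```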
Given a query input of entity keys/IDs (for example, airplane tail numbers), a set of feature columns (for example, airports visited), and a metric column (for example, the number of times each airport was visited), scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, optionally weighted by TF-IDF.
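As background, the cosine similarity of two entities' feature vectors $a$ and $b$ (each built from the metric values over the shared feature space) is

$$ \text{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2} \, \sqrt{\sum_i b_i^2}}, $$

which is 1 when the vectors point in the same direction and 0 when the entities share no features. With TF-IDF weighting enabled, each feature value is scaled down the more entities share that feature, so rare features contribute more to the score.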
Parameter | Description | Data Type |
---|---|---|
Example
From the ASPRS LAS specification: "The scan direction flag denotes the direction at which the scanner mirror was traveling at the time of the output pulse. A bit value of 1 is a positive scan direction, and a bit value of 0 is a negative scan direction."
From the ASPRS LAS specification: "The edge of flight line data bit has a value of 1 only when the point is at the end of a scan. It is the last point on a given scan line before it changes direction."
From the ASPRS LAS specification: "The classification field is a number to signify a given classification during filter processing. The ASPRS standard has a public list of classifications which shall be used when mixing vendor specific user software."
From the ASPRS LAS specification: "The angle at which the laser point was output from the laser system, including the roll of the aircraft... The scan angle is an angle based on 0 degrees being NADIR, and –90 degrees to the left side of the aircraft in the direction of flight."
Returns the value at the row that is offset rows before the current row within the partition. LAG_IN_FRAME is the window-frame-aware version.
Returns the value at the row that is offset rows after the current row within the partition. LEAD_IN_FRAME is the window-frame-aware version.
These are window-frame-aware versions of the FIRST_VALUE, LAST_VALUE, LAG, LEAD, and NTH_VALUE functions.
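A hedged sketch of a frame-aware lookback, assuming LAG_IN_FRAME takes the same (value, offset) arguments as LAG and honors the frame clause; the sensor_readings table and its columns are illustrative:

```sql
-- Sketch (assumed names): compare each reading to the one 3 rows back,
-- constrained to a 5-row moving frame.
SELECT
  sensor_id,
  ts,
  reading,
  LAG_IN_FRAME(reading, 3) OVER (
    PARTITION BY sensor_id
    ORDER BY ts
    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
  ) AS reading_3_back
FROM sensor_readings;
```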
Name | Description | Data Types |
---|---|---|
primary_key | Column containing the keys/entity IDs used to uniquely identify the entities for which the function computes co-similarity. Examples include countries, census block groups, user IDs of website visitors, and aircraft callsigns. | Column<TEXT ENCODING DICT | INT | BIGINT> |
pivot_features | One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities are compared only by the census block groups visited, regardless of time overlap. | Column<TEXT ENCODING DICT | INT | BIGINT> |
metric | Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is COUNT(*), such that feature overlaps are weighted by the number of co-occurrences. | Column<INT | BIGINT | FLOAT | DOUBLE> |
use_tf_idf | Boolean constant denoting whether TF-IDF weighting is used in the cosine similarity score computation. | BOOLEAN |
class1 | ID of the first primary_key in the pair-wise comparison. | Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key input column) |
class2 | ID of the second primary_key in the pair-wise comparison. Because the computed similarity score for a pair of primary keys is order-invariant, results are output only for orderings such that class1 <= class2. For primary keys of type TEXT ENCODING DICT, the order is based on the internal integer IDs of each string value, not on lexicographic ordering. | Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key input column) |
similarity_score | Computed cosine similarity score between each primary_key pair, with values between 0 (completely dissimilar) and 1 (completely similar). | Column<FLOAT> |
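A usage sketch, assuming the function is invoked as tf_feature_similarity and that the cursor argument is named primary_features (both assumptions); the flights table and its columns are illustrative:

```sql
-- Sketch (assumed names): score aircraft (by tail number) on how
-- similarly they use airports, weighting feature overlap by visit
-- counts with TF-IDF.
SELECT class1, class2, similarity_score
FROM TABLE(
  tf_feature_similarity(
    primary_features => CURSOR(SELECT tail_num, airport, COUNT(*)
                               FROM flights
                               GROUP BY tail_num, airport),
    use_tf_idf => TRUE
  )
)
ORDER BY similarity_score DESC
LIMIT 20;
```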