Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) of the search vector, which can optionally be TF/IDF weighted.
Input Arguments

Parameter | Description | Data Type |
---|---|---|
primary_key | Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function will compute the similarity to the search vector specified by the comparison_pivot_features cursor. Examples include countries, census block groups, user IDs of website visitors, and aircraft call signs. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
pivot_features | One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities are compared only by the census block groups visited, regardless of time overlap. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
metric | Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
comparison_pivot_features | One or more columns constituting a compound feature for the search vector. This should match pivot_features in number of sub-features, types, and semantics. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
comparison_metric | Column denoting the values used as input for the cosine similarity metric computation from the search vector. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
use_tf_idf | Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation. | BOOLEAN |

Output Columns

Name | Description | Data Type |
---|---|---|
class | ID of the primary_key being compared against the search vector. | Column<TEXT ENCODING DICT \| INT \| BIGINT> (type will be the same as the primary_key input column) |
similarity_score | Computed cosine similarity score between each primary_key entity and the search vector, with values falling between 0 (completely dissimilar) and 1 (completely similar). | Column<FLOAT> |
Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session (dwell time).
Example
Parameter | Description | Data Type |
---|---|---|
series_start | Starting integer value, inclusive. | BIGINT |
series_end | Ending integer value, inclusive. | BIGINT |
series_step (optional, defaults to 1) | Increment between successive values; can be negative to generate a descending series. | BIGINT |

Name | Description | Data Type |
---|---|---|
generate_series | The integer series specified by the input arguments. | Column<BIGINT> |

Parameter | Description | Data Type |
---|---|---|
series_start | Starting timestamp value, inclusive. | TIMESTAMP(9) (timestamp literals with other precisions will be auto-casted to TIMESTAMP(9)) |
series_end | Ending timestamp value, inclusive. | TIMESTAMP(9) (timestamp literals with other precisions will be auto-casted to TIMESTAMP(9)) |
series_step | Time/date interval signifying the step between each element in the returned series. | INTERVAL |

Name | Description | Data Type |
---|---|---|
generate_series | The timestamp series specified by the input arguments. | Column<TIMESTAMP(9)> |
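Invocation sketches for both variants (values are illustrative; the timestamp variant assumes standard SQL timestamp and interval literal syntax):

```sql
-- integer series: 1, 3, 5, 7, 9
SELECT * FROM TABLE(generate_series(1, 10, 2));

-- timestamp series at 6-hour steps
SELECT * FROM TABLE(
  generate_series(
    TIMESTAMP '2023-01-01 00:00:00',
    TIMESTAMP '2023-01-02 00:00:00',
    INTERVAL '6' HOUR
  )
);
```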
Input Arguments

Parameter | Description | Data Type |
---|---|---|
| Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes. | Column<TEXT ENCODING DICT \| BIGINT> |
| Column containing keys/IDs of dwell "sites" or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned H3 hex IDs for geographic location. | Column<TEXT ENCODING DICT \| BIGINT> |
| Column denoting the time at which an event occurred. | Column<TIMESTAMP(0\|3\|6\|9)> |
| Constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered record for an entity ID at a site ID to constitute a valid session and compute and return an entity's dwell time at a site. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapses between an entity's first and last ordered timestamp records at a site, these records are not considered a valid session and a dwell time for that session is not calculated. | BIGINT (other integer types are automatically casted to BIGINT) |
| A constant integer value specifying the minimum number of successive observations required to constitute a valid session. | BIGINT (other integer types are automatically casted to BIGINT) |
| A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity ID at a given site ID is 86500 seconds, the session is considered ended at the first timestamp-ordered record, and a new session is started at the timestamp of the second record. | BIGINT (other integer types are automatically casted to BIGINT) |

Output Columns

Name | Description | Data Type |
---|---|---|
| The ID of the entity for the output dwell time, identical to the corresponding input entity ID. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type is the same as the input entity ID column) |
| The site ID for the output dwell time, identical to the corresponding input site ID. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type is the same as the input site ID column) |
| The site ID for the session preceding the current session, which might be a different site. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type is the same as the input site ID column) |
| The site ID for the session after the current session, which might be a different site. | Column<TEXT ENCODING DICT> \| Column<BIGINT> (type will be the same as the input site ID column) |
| An auto-incrementing session ID specific/relative to the current entity. | Column<INT> |
| The index of the nth timestamp-ordered record for the session. | Column<INT> |
| The duration in seconds for the session. | Column<INT> |
| The number of records/observations constituting the current output row's session. | Column<INT> |
Parameter | Description | Data Type |
---|---|---|
| The number of strings to randomly generate. | BIGINT |
| Length of the generated strings. | BIGINT |

Name | Description | Data Type |
---|---|---|
id | Integer ID of output, starting at 0 and increasing monotonically. | Column<BIGINT> |
rand_str | Random string. | Column<TEXT ENCODING DICT> |
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing an aggregate over all points in each bin as the output value for the bin. The aggregate performed to compute the value for each bin is specified by agg_type, with allowed aggregate types of AVG, COUNT, SUM, MIN, and MAX. If neighborhood_fill_radius is set greater than 0, a blur pass/kernel is computed on top of the results according to the optionally specified fill_agg_type, with allowed types of GAUSS_AVG, BOX_AVG, COUNT, SUM, MIN, and MAX (if not specified, defaults to GAUSS_AVG, a Gaussian-average kernel). If fill_only_nulls is set to true, only null bins from the first aggregate step have final output values computed from the blur pass; otherwise, all values are affected by the blur pass.
Note that the arguments to bound the spatial output grid (x_min, x_max, y_min, y_max) are optional; however, either all or none of these arguments must be supplied. If the arguments are not supplied, the spatial output grid is bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize table function, these filters also constrain the output range.
Example
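An invocation sketch, assuming the CURSOR argument is named raster (the points table and its column names are illustrative):

```sql
CREATE TABLE elevation_grid AS
SELECT * FROM TABLE(
  tf_geo_rasterize(
    raster => CURSOR(SELECT lon AS x, lat AS y, elevation AS z FROM points),
    agg_type => 'MAX',
    bin_dim_meters => 100.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 2,
    fill_only_nulls => TRUE
  )
);
```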
HEAVY.AI provides access to a set of system-provided table functions, also known as table-valued functions (TVFs). System table functions, like user-defined table functions, support execution of queries on both CPU and GPU over one or more SQL result-set inputs. Table function support in HEAVY.AI can be split into two broad categories: system table functions and user-defined table functions (UDTFs). System table functions are built into the HEAVY.AI server, while UDTFs can be declared dynamically at run time by specifying them in Numba, a subset of the Python language. For more information on UDTFs, see User-Defined Table Functions.
To improve performance, table functions can be declared to enable filter pushdown optimization, which allows the Calcite optimizer to "push down" filters on the output(s) of a table function to its input(s) when the inputs and outputs are declared to be semantically equivalent (for example, a longitude variable that is input to and output from a table function). This can significantly increase performance in cases where only a small portion of one or more input tables is required to compute the filtered output of a table function.
Whether system- or user-provided, table functions can execute over one or more result sets specified by subqueries, and can also take any number of additional constant literal arguments specified in the function definition. SQL subquery inputs can consist of any SQL expression (including multiple subqueries, joins, and so on) allowed by HeavyDB, and the output can be filtered, grouped by, joined, and so on like a normal SQL subquery, including being input into additional table functions by wrapping it in a CURSOR
argument. The number and types of input arguments, as well as the number and types of output arguments, are specified in the table function definition itself.
Table functions allow for the efficient execution of advanced algorithms that may be difficult or impossible to express in canonical SQL. By allowing execution of code directly over SQL result sets, leveraging the same hardware parallelism used for fast SQL execution and visualization rendering, HEAVY.AI provides orders-of-magnitude speed increases over the alternative of transporting large result sets to other systems for post-processing and then returning to HEAVY.AI for storage or downstream manipulation. You can easily invoke system-provided or user-defined algorithms directly inline with SQL and rendering calls, making prototyping and deployment of advanced analytics capabilities easier and more streamlined.
Table functions can take as input arguments both constant literals (including scalar results of subqueries) as well as results of other SQL queries (consisting of one or more rows). The latter (SQL query inputs), per the SQL standard, must be wrapped in the keyword CURSOR
. Depending on the table function, there can be 0, 1, or multiple CURSOR inputs. For example:
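For instance, tf_graph_shortest_path (detailed below) takes one CURSOR input plus constant arguments; the edges table here is illustrative:

```sql
SELECT * FROM TABLE(
  tf_graph_shortest_path(
    edge_list => CURSOR(SELECT node1, node2, distance FROM edges),
    origin_node => 1,
    destination_node => 42
  )
);
```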
Certain table functions can take 1 or more columns of a specified type or types as inputs, denoted as ColumnList<TYPE1 | Type2... TypeN>
Even if a function allows a ColumnList
input of multiple types, the arguments must all be of one type; types cannot be mixed. For example, if a function allows ColumnList<INT | TEXT ENCODING DICT>
, one or more columns of either INTEGER or TEXT ENCODING DICT can be used as inputs, but all must be either INT columns or TEXT ENCODING DICT columns.
All HEAVY.AI system table functions allow you to specify arguments either in conventional comma-separated form in the order specified by the table function signature, or alternatively via a key-value map where input argument names are mapped to argument values using the =>
token. For example, the following two calls are equivalent:
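For example, the following two calls to generate_series are equivalent (argument names per its signature above):

```sql
-- positional form
SELECT * FROM TABLE(generate_series(1, 10, 2));

-- named-argument (key => value) form
SELECT * FROM TABLE(
  generate_series(series_start => 1, series_end => 10, series_step => 2)
);
```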
For performance reasons, particularly when table functions are used as actual tables in a client like Heavy Immerse, many system table functions in HEAVY.AI automatically "push down" filters on certain output columns in the query onto the inputs. For example, if a table function does some computation over an x
and y
range such that x
and y
are in both the input and output for the table function, filter push-down would likely be enabled so that a query like the following would automatically push down the filter on the x and y outputs to the x and y inputs. This potentially increases query performance significantly.
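A sketch of such a query, assuming the tf_geo_rasterize CURSOR argument is named raster (the points table and filter bounds are illustrative); the WHERE clause on the x/y outputs can be pushed down to the x/y inputs:

```sql
SELECT * FROM TABLE(
  tf_geo_rasterize(
    raster => CURSOR(SELECT lon AS x, lat AS y, elevation AS z FROM points),
    agg_type => 'AVG',
    bin_dim_meters => 50.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 0,
    fill_only_nulls => FALSE
  )
)
WHERE x BETWEEN -122.5 AND -122.3
  AND y BETWEEN 37.7 AND 37.9;
```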
To determine whether filter push-down is used, you can check the Boolean value of the filter_table_transpose
column from the query:
Currently, you cannot change push-down behavior for system table functions.
You can query which table functions are available using SHOW TABLE FUNCTIONS
:
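For example:

```sql
SHOW TABLE FUNCTIONS;
```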
Information about the expected input and output argument names and types, as well as other information such as whether the function can run on CPU, GPU, or both, and whether filter push-down is enabled, can be queried via SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
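For example, to inspect the generate_series signature:

```sql
SHOW TABLE FUNCTIONS DETAILS generate_series;
```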
The following system table functions are available in HEAVY.AI. The table provides a summary and links to more information about each function.
For information about the HeavyRF radio frequency propagation simulation and HeavyRF table functions, see HeavyRF.
The TABLE
command is required to wrap a table function clause; for example:
select * from TABLE(generate_series(1, 10));
The CURSOR
command is required to wrap any subquery inputs.
Aggregate point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius
, optionally only filling in null-valued bins if fill_only_nulls
is set to true.
The graph shortest path is then computed between an origin point on the grid specified by origin_x
and origin_y
and a destination point on the grid specified by destination_x
and destination_y
, where the shortest path is weighted by the nth exponent of the computed slope between a bin and its neighbors, with the nth exponent being specified by slope_weighted_exponent
. A max allowed traversable slope can be specified by slope_pct_max
, such that no traversal is considered or allowed between bins with absolute computed slopes greater than the percentage specified by slope_pct_max
.
Input Arguments
Output Columns
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin and destination node, tf_graph_shortest_path
computes the shortest distance-weighted path through the graph between origin_node
and destination_node
, returning a row for each node along the computed shortest path, with the traversal-ordered index of that node and the cumulative distance from the origin_node
to that node. If either origin_node
or destination_node
does not exist, an error is returned.
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example A
Example B
Similar to tf_geo_rasterize
, but also computes the slope and aspect per output bin.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius
, optionally only filling in null-valued bins if fill_only_nulls
is set to true. The slope and aspect are then computed for every bin, based on the z values of that bin and its neighboring bins. The slope can be returned in degrees or as a fraction between 0 and 1, depending on the boolean argument to compute_slope_in_degrees
.
Note that the bounds of the spatial output grid will be bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize_slope
table function, these filters will also constrain the output range.
Parameter | Description | Data Types |
---|---|---|
Example
Computes the Mandelbrot set over the complex domain [x_min
, x_max
), [y_min
, y_max
), discretizing the xy-space into an output of dimensions x_pixels
X y_pixels
. The output for each cell is the number of iterations needed to escape to infinity, up to and including the specified max_iterations
.
Parameter | Data Type |
---|---|
Example
Given a query input of entity keys/IDs (for example, airplane tail numbers), a set of feature columns (for example, airports visited), and a metric column (for example, the number of times each airport was visited), scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
Parameter | Description | Data Type |
---|---|---|
Example
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin node, tf_graph_shortest_paths_distances
computes the shortest distance-weighted path distance between the origin_node
and every other node in the graph. It returns a row for each node in the graph, with output columns consisting of the input origin_node
, the given destination_node
, the distance for the shortest path between the two nodes, and the number of edges or graph "hops" between the two nodes. If origin_node
does not exist in the node1
column of the edge_list
CURSOR
, an error is returned.
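An invocation sketch (the edges table and its column aliases are illustrative; cursor and argument names follow the tables below):

```sql
SELECT * FROM TABLE(
  tf_graph_shortest_paths_distances(
    edge_list => CURSOR(SELECT node1, node2, distance FROM edges),
    origin_node => 1
  )
)
ORDER BY distance DESC
LIMIT 10;
```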
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example A
Example B
Process a raster input to derive contour lines or regions and output them as LINESTRING or POLYGON for rendering or further processing. Each has two variants: one that re-rasterizes the input points, and one that accepts raw raster points directly.
Use the rasterizing variants if the raster table rows are not already sorted in row-major order (for example, if they represent an arbitrary 2D point cloud), or if filtering or binning is required to reduce the input data to a manageable count (to speed up the contour processing) or to smooth the input data before contour processing. If the input rows do not already form a rectilinear region, the output region will be their 2D bounding box. Many of the parameters of the rasterizing variant are directly equivalent to those of tf_geo_rasterize; see that function for details.
The direct variants require that the input rows represent a rectilinear region of pixels in nonsparse row-major order. The dimensions must also be provided, and (raster_width * raster_height) must match the input row count. The contour processing is then performed directly on the raster values with no preprocessing.
The line variants generate LINESTRING geometries that represent the contour lines of the raster space at the given interval with the optional given offset. For example, a raster space representing a height field with a range of 0.0 to 1000.0 will likely result in 10 or 11 lines, each with a corresponding contour_values
value, 0.0, 100.0, 200.0 etc. If contour_offset
is set to 50.0, then the lines are generated at 50.0, 150.0, 250.0, and so on. The lines can be open or closed and can form rings or terminate at the edges of the raster space.
The polygon variants generate POLYGON geometries that represent regions between contour lines (for example from 0.0 to 100.0), and from 100.0 to 200.0. If the raster space has multiple regions with that value range, then a POLYGON row is output for each of those regions. The corresponding contour_values
value for each is the lower bound of the range for that region.
Loads one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs
(if not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs).
If use_cache
is set to true
, an internal point cloud-specific cache will be used to hold the results per input file, and if queried again will significantly speed up the query time, allowing for interactive querying of a point cloud source. If the results of tf_load_point_cloud
will only be consumed once (for example, as part of a CREATE TABLE
statement), it is highly recommended that use_cache
is set to false
or left unspecified (as it is defaulted to false
) to avoid the performance and memory overhead incurred by use of the cache.
The bounds of the data retrieved can be optionally specified with the x_min
, x_max
, y_min
, y_max
arguments. These arguments can be useful when the user desires to retrieve a small geographic area from a large point-cloud file set, as files containing data outside the bounds of the specified bounding box will be quickly skipped by tf_load_point_cloud
, only requiring a quick read of the spatial metadata for the file.
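An invocation sketch; the file-path argument name (path) and the file location shown are illustrative assumptions, not confirmed parameter names:

```sql
SELECT * FROM TABLE(
  tf_load_point_cloud(
    path => '/data/lidar/tiles.laz',  -- hypothetical argument name and path
    out_srs => 'EPSG:4326',
    use_cache => false
  )
);
```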
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Example A
Example B
Input Arguments

Parameter | Description | Data Type |
---|---|---|
x | X-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
y | Y-coordinate column or expression. | Column<FLOAT \| DOUBLE> |
z | Z-coordinate column or expression. The output bin is computed as the maximum z-value for all points falling in each bin. | Column<FLOAT \| DOUBLE> |
agg_type | The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'. | TEXT ENCODING NONE |
fill_agg_type (optional) | The aggregate to be performed when computing the blur pass on the output bins. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', 'MAX', 'GAUSS_AVG', or 'BOX_AVG'. Note that AVG is synonymous with GAUSS_AVG in this context, and the default fill_agg_type if not specified is GAUSS_AVG. | TEXT ENCODING NONE |
bin_dim_meters | The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are already assumed to be in meters. | DOUBLE |
geographic_coords | If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max. | BOOLEAN |
neighborhood_fill_radius | The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius bins. | DOUBLE |
fill_only_nulls | Specifies that the box blur should only be used to provide output values for null output bins (that is, bins that contained no data points or had only data points with null z-values). | BOOLEAN |
x_min (optional) | Min x-coordinate value (in input units) for the spatial output grid. | DOUBLE |
x_max (optional) | Max x-coordinate value (in input units) for the spatial output grid. | DOUBLE |
y_min (optional) | Min y-coordinate value (in input units) for the spatial output grid. | DOUBLE |
y_max (optional) | Max y-coordinate value (in input units) for the spatial output grid. | DOUBLE |

Output Columns

Name | Description | Data Type |
---|---|---|
x | The x-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input x-coordinate column/expression) |
y | The y-coordinates for the centroids of the output spatial bins. | Column<FLOAT \| DOUBLE> (same as input y-coordinate column/expression) |
z | The maximum z-coordinate of all input data assigned to a given spatial bin. | Column<FLOAT \| DOUBLE> (same as input z-coordinate column/expression) |
Input Arguments

Parameter | Description | Data Type |
---|---|---|
node1 | Origin node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> |
node2 | Destination node column in the directed edge list CURSOR. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (must be the same type as node1) |
distance | Distance between the origin and destination node in the directed edge list CURSOR. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
origin_node | The origin node to start graph traversal from. If the value is not present in edge_list.node1, an empty result set is returned. | BIGINT \| TEXT ENCODING DICT |
destination_node | The destination node to finish graph traversal at. If the value is not present in edge_list.node1, an empty result set is returned. | BIGINT \| TEXT ENCODING DICT |

Output Columns

Name | Description | Data Type |
---|---|---|
path_step | The index of this node along the path traversal from origin_node to destination_node, with the first node (the origin_node) indexed as 1. | Column<INT> |
node | The current node along the path traversal from origin_node to destination_node. The first node (as denoted by path_step = 1) will always be the input origin_node, and the final node (as denoted by MAX(path_step)) will always be the input destination_node. | Column<INT \| BIGINT \| TEXT ENCODING DICT> (same type as the node1 and node2 input columns) |
cume_distance | The cumulative distance adding all input distance values from the origin_node to the current node. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> (same type as the distance input column) |
Generates random string data.
Generates a series of integer values.
Generates a series of timestamp values from start_timestamp
to end_timestamp
.
Given a query input with entity keys and timestamps, and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session.
Given a query input of entity keys/IDs, a set of feature columns, and a metric column, scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) of the search vector, which can optionally be TF/IDF weighted.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the aggregate specified by agg_type across all points in each bin as the output value for the bin. Allowed aggregate types are AVG, COUNT, SUM, MIN, and MAX.
Similar to tf_geo_rasterize
, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin and destination node, computes the shortest distance-weighted path through the graph between origin_node
and destination_node
.
Given a distance-weighted directed graph, consisting of a query CURSOR
input consisting of the starting and ending node for each edge and a distance, and a specified origin node, computes the shortest distance-weighted path distance between the origin_node
and every other node in the graph.
Loads one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs. If not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs.
Computes the Mandelbrot set over the complex domain [x_min
, x_max
), [y_min
, y_max
), discretizing the xy-space into an output of dimensions x_pixels
X y_pixels
.
Returns metadata for one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min
, x_max
, y_min
, y_max
arguments.
Process a raster input to derive contour lines or regions and output as LINESTRING or POLYGON for rendering or further processing.
Aggregate point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Used for generating top-k signals where 'k' represents the maximum number of antennas to consider at each geographic location. The full relevant parameter name is strongest_k_sources_per_terrain_bin.
Taking a set of point elevations and a set of signal source locations as input, tf_rf_prop_max_signal
executes line-of-sight 2.5D RF signal propagation from the provided sources over a binned 2.5D elevation grid derived from the provided point locations, calculating the max signal in dBm at each grid cell, using the formula for free-space power loss.
x
Input x-coordinate column or expression.
Column<FLOAT | DOUBLE>
y
Input y-coordinate column or expression.
Column<FLOAT | DOUBLE>
z
Input z-coordinate column or expression. The output bin is computed as the maximum z-value for all points falling in each bin.
Column<FLOAT | DOUBLE>
agg_type
The aggregate to be performed to compute the output z-column. Should be one of 'AVG'
, 'COUNT'
, 'SUM',
'MIN'
, or 'MAX'.
TEXT ENCODING NONE
bin_dim_meters
The width and height of each x/y bin in meters. If geographic_coords
is not set to true, the input x/y units are already assumed to be in meters.
DOUBLE
geographic_coords
If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.
BOOLEAN
neighborhood_fill_radius
The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius
bins.
BIGINT
fill_only_nulls
Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).
BOOLEAN
compute_slope_in_degrees
If true, specifies the slope should be computed in degrees (with 0 degrees perfectly flat and 90 degrees perfectly vertical). If false, specifies the slope should be computed as a fraction from 0 (flat) to 1 (vertical). In a future release, we are planning to move the default output to percentage slope.
BOOLEAN
x
The x-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input x column/expression)
y
The y-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input y column/expression)
z
The maximum z-coordinate of all input data assigned to a given spatial bin.
Column<FLOAT | DOUBLE> (same as input z column/expression)
slope
The average slope of an output grid cell (in degrees or a fraction between 0 and 1, depending on the argument to compute_slope_in_degrees
).
Column<FLOAT | DOUBLE> (same as input z column/expression)
aspect
The direction from 0 to 360 degrees pointing towards the maximum downhill gradient, with 0 degrees being due north and moving clockwise from N (0°) -> NE (45°) -> E (90°) -> SE (135°) -> S (180°) -> SW (225°) -> W (270°) -> NW (315°).
Column<FLOAT | DOUBLE> (same as input z column/expression)
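The slope and aspect outputs can be illustrated with a central-difference sketch over a height grid. This assumes a simplified stencil (row index increasing northward, column index increasing eastward), not the function's exact internal method:

```python
import math

def slope_and_aspect(z, i, j, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) at an
    interior cell (i, j) of height grid z, via central differences.
    Aspect is the azimuth of the downhill (negative-gradient)
    direction, matching the 0 = north, clockwise convention above;
    it is reported as 0 for perfectly flat cells."""
    dz_dx = (z[i][j + 1] - z[i][j - 1]) / (2.0 * cell_size)  # east
    dz_dy = (z[i + 1][j] - z[i - 1][j]) / (2.0 * cell_size)  # north
    slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope_deg, aspect
```

For a plane rising one unit per cell toward the east, the slope is 45 degrees and the aspect is 270 (due west, the downhill direction).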
x_pixels
32-bit integer
y_pixels
32-bit integer
x_min
DOUBLE
x_max
DOUBLE
y_min
DOUBLE
y_max
DOUBLE
max_iterations
32-bit integer
primary_key
Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function computes co-similarity. Examples include countries, census block groups, user IDs of website visitors, and aircraft callsigns.
Column<TEXT ENCODING DICT | INT | BIGINT>
pivot_features
One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key
based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key
entities would be compared only by the census block groups visited, regardless of time overlap.
Column<TEXT ENCODING DICT | INT | BIGINT>
metric
Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is COUNT(*)
such that feature overlaps are weighted by the number of co-occurrences.
Column<INT | BIGINT | FLOAT | DOUBLE>
use_tf_idf
Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation.
BOOLEAN
class1
ID of the first primary key
in the pair-wise comparison.
Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key
input column)
class2
ID of the second primary key
in the pair-wise comparison. Because the computed similarity score for a pair of primary keys
is order-invariant, results are output only for ordering such that class1
<= class2
. For primary keys of type TextEncodingDict
, the order is based on the internal integer IDs for each string value and not lexicographic ordering.
Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key
input column)
similarity_score
Computed cosine similarity score between each primary_key
pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).
Column<FLOAT>
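The similarity score described above can be sketched in a few lines. The TF-IDF branch below uses one common log(N/df) weighting, which may differ from the variant used internally; pair ordering follows the class1 <= class2 convention:

```python
import math
from collections import defaultdict

def cosine_similarity_scores(rows, use_tf_idf=False):
    """Pairwise cosine similarity between entities, given rows of
    (primary_key, pivot_feature, metric) tuples. Sketch of the score
    this function produces, not its implementation."""
    vecs = defaultdict(dict)
    for key, feature, metric in rows:
        vecs[key][feature] = vecs[key].get(feature, 0.0) + metric
    if use_tf_idf:
        n = len(vecs)
        df = defaultdict(int)          # document frequency per feature
        for fv in vecs.values():
            for f in fv:
                df[f] += 1
        for fv in vecs.values():
            for f in fv:
                fv[f] *= math.log(n / df[f])
    keys = sorted(vecs)
    out = {}
    for a_i, a in enumerate(keys):
        for b in keys[a_i:]:           # emit only class1 <= class2
            dot = sum(v * vecs[b].get(f, 0.0) for f, v in vecs[a].items())
            norm = math.sqrt(sum(v * v for v in vecs[a].values())) * \
                   math.sqrt(sum(v * v for v in vecs[b].values()))
            out[(a, b)] = dot / norm if norm else 0.0
    return out
```

Entities sharing all weighted features score 1; entities sharing none score 0.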
x
Input x-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE>
y
Input y-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE> (must be the same type as x
)
z
Input z-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE>
agg_type
The aggregate to be performed to compute the output z-column. Should be one of 'AVG'
, 'COUNT'
, 'SUM',
'MIN'
, or 'MAX'.
TEXT ENCODING NONE
bin_dim
The width and height of each x/y bin. If geographic_coords
is true, the input x/y units will be translated to meters according to a local coordinate transform appropriate for the x/y bounds of the data.
DOUBLE
geographic_coords
If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.
BOOLEAN
neighborhood_bin_radius
The radius in bins to compute the Gaussian blur/filter over, such that each output bin will be the Gaussian-weighted average value of all bins within neighborhood_bin_radius
bins.
BIGINT
fill_only_nulls
Specifies that the Gaussian blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).
BOOLEAN
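The Gaussian blur referenced by neighborhood_bin_radius can be sketched by its kernel weights. The sigma default below (radius / 2) is an assumption, not a documented internal default; a 2D blur would apply this kernel separably in x and then y:

```python
import math

def gaussian_kernel(radius, sigma=None):
    """Normalized 1D Gaussian weights spanning [-radius, radius] bins.
    Weights peak at the center bin and fall off symmetrically, then
    are normalized to sum to 1 so the blur preserves overall mass."""
    sigma = sigma or max(radius / 2.0, 1e-9)
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma))
         for k in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]
```

Unlike the box blur above, nearby bins contribute more than distant ones, yielding smoother output.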
origin_x
The x-coordinate for the starting point for the graph traversal, in input (not bin) units.
DOUBLE
origin_y
The y-coordinate for the starting point for the graph traversal, in input (not bin) units.
DOUBLE
destination_x
The x-coordinate for the destination point for the graph traversal, in input (not bin) units.
DOUBLE
destination_y
The y-coordinate for the destination point for the graph traversal, in input (not bin) units.
DOUBLE
slope_weighted_exponent
The slope weight between neighboring raster cells will be weighted by the slope_weighted_exponent
power. A value of 1 signifies that the raw slopes between neighboring cells should be used; increasing this value above 1 more heavily penalizes paths that traverse steep slopes.
DOUBLE
slope_pct_max
The maximum absolute value of slopes (measured in percentages) between neighboring raster cells that will be considered for traversal. A neighboring graph cell with an absolute slope greater than this amount will not be considered in the shortest slope-weighted path graph traversal.
DOUBLE
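One plausible reading of how slope_weighted_exponent and slope_pct_max combine into an edge cost is sketched below. This is hypothetical; the actual cost model is internal to the function:

```python
def slope_weighted_cost(distance, rise, exponent, slope_pct_max):
    """Candidate edge cost between two raster cells: the base distance
    inflated by (1 + |slope|) ** exponent, where slope = rise / run.
    Edges steeper than slope_pct_max percent are excluded entirely
    (returned as None), matching the cutoff behavior described above."""
    slope = abs(rise) / distance
    if slope * 100.0 > slope_pct_max:
        return None          # too steep to traverse
    return distance * (1.0 + slope) ** exponent
```

With exponent 1, cost grows linearly with slope; larger exponents make steep cells disproportionately expensive, steering paths toward flatter terrain.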
node1
Origin node column in directed edge list CURSOR
Column<INT | BIGINT | TEXT ENCODING DICT>
node2
Destination node column in directed edge list CURSOR
Column<INT | BIGINT | TEXT ENCODING DICT> (must be the same type as node1
)
distance
Distance between origin and destination node in directed edge list CURSOR
Column<INT | BIGINT | FLOAT | DOUBLE>
origin_node
The origin node from which to start the graph traversal. If this value is not present in edge_list.node1
, an empty result set is returned.
BIGINT | TEXT ENCODING DICT
origin_node
Starting node in graph traversal. Always equal to input origin_node
.
Column<INT | BIGINT | TEXT ENCODING DICT> (same type as the node1
and node2
input columns)
destination_node
Final node in graph traversal. Will be equal to one of the values of the node2
input column.
Column<INT | BIGINT | TEXT ENCODING DICT> (same type as the node1
and node2
input columns)
distance
Cumulative distance between origin and destination node for shortest path graph traversal.
Column<INT | BIGINT | FLOAT | DOUBLE> (same type as the distance
input column)
num_edges_traversed
Number of edges (or "hops") traversed in the graph to arrive at destination_node
from origin_node
for the shortest path graph traversal between these two nodes.
Column<INT>
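The traversal that produces the origin_node, destination_node, distance, and num_edges_traversed outputs can be sketched with classic Dijkstra over the directed edge list:

```python
import heapq
from collections import defaultdict

def shortest_paths(edges, origin_node):
    """Single-source shortest paths over a directed edge list of
    (node1, node2, distance) tuples. Returns, for each reachable node,
    (cumulative distance, number of edges traversed). A sketch in the
    spirit of this function's output; note the SQL function returns an
    empty result set when origin_node never appears as node1, whereas
    this sketch always includes the origin itself."""
    adj = defaultdict(list)
    for n1, n2, d in edges:
        adj[n1].append((n2, d))
    best = {origin_node: (0.0, 0)}          # node -> (distance, hops)
    heap = [(0.0, 0, origin_node)]
    while heap:
        dist, hops, node = heapq.heappop(heap)
        if (dist, hops) != best.get(node):
            continue                         # stale heap entry
        for nxt, w in adj[node]:
            cand = (dist + w, hops + 1)
            if nxt not in best or cand[0] < best[nxt][0]:
                best[nxt] = cand
                heapq.heappush(heap, (cand[0], cand[1], nxt))
    return best
```

Each result row of the table function corresponds to one (destination, distance, hops) entry here.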
| Output geometries. | Column<LINESTRING | POLYGON> |
| Raster values associated with each contour geometry. | Column<FLOAT | DOUBLE> (will be the same type as value) |
| The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths. | TEXT ENCODING NONE |
| EPSG code of the output SRID. If not specified, output points are automatically converted to lon/lat (EPSG 4326). | TEXT ENCODING NONE |
| If true, use internal point cloud cache. Useful for inline querying of the output of | BOOLEAN |
| Min x-coordinate value (in degrees) for the output data. | DOUBLE |
| Max x-coordinate value (in degrees) for the output data. | DOUBLE |
| Min y-coordinate value (in degrees) for the output data. | DOUBLE |
| Max y-coordinate value (in degrees) for the output data. | DOUBLE |
| Longitude value of raster point (degrees, SRID 4326). | Column<FLOAT | DOUBLE> |
| Latitude value of raster point (degrees, SRID 4326). | Column<FLOAT | DOUBLE> (must be the same as <lon>) |
| Raster band value from which to derive contours. | Column<FLOAT | DOUBLE> |
| Optionally flip resulting geometries in latitude (default FALSE). (This parameter may be removed in future releases) | BOOLEAN |
| Desired contour interval. The function will generate a line at each interval, or a polygon region that covers that interval. | FLOAT/DOUBLE (must be same type as value) |
| Optional offset for resulting intervals. | FLOAT/DOUBLE (must be same type as value) |
| Pixel width (stride) of the raster data. | INTEGER |
| Pixel height of the raster data. | INTEGER |
| Point x-coordinate | Column<DOUBLE> |
| Point y-coordinate | Column<DOUBLE> |
| Point z-coordinate | Column<DOUBLE> |
| Point intensity | Column<INT> |
| The ordered number of the return for a given LiDAR pulse. The first returns (lowest return numbers) are generally associated with the highest-elevation points for a LiDAR pulse; i.e., the forest canopy will generally have a lower return number than the underlying terrain. | Column<TINYINT> |
| The total number of returns for a LiDAR pulse. Multiple returns occur when there are multiple objects between the LiDAR source and the lowest ground or water elevation for a location. | Column<TINYINT> |
| Column<TINYINT> |
| Column<TINYINT> |
| Column<SMALLINT> |
| Column<TINYINT> |
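The contour interval and offset parameters listed above determine the iso-values at which contour geometries are generated. A sketch of that level computation:

```python
import math

def contour_levels(value_min, value_max, interval, offset=0.0):
    """Iso-values covering [value_min, value_max] at the given
    interval, shifted by offset: the values at which contour lines
    (or polygon-band boundaries) would be drawn."""
    k = math.ceil((value_min - offset) / interval)
    levels = []
    while offset + k * interval <= value_max:
        levels.append(offset + k * interval)
        k += 1
    return levels
```

For example, a raster spanning 0 to 10 with interval 2.5 yields lines at 0, 2.5, 5, 7.5, and 10; adding an offset of 1 with interval 3 shifts them to 1, 4, 7, and 10.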
Returns metadata for one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min
, x_max
, y_min
, y_max
arguments.
Note: the specified path must be contained in the global allowed-import-paths
setting; otherwise, an error is returned.
Input Arguments
Parameter | Description | Data Types |
---|---|---|
Output Columns
Name | Description | Data Types |
---|---|---|
Example
From the ASPRS LAS specification: "The scan direction flag denotes the direction at which the scanner mirror was traveling at the time of the output pulse. A bit value of 1 is a positive scan direction, and a bit value of 0 is a negative scan direction."
From the ASPRS LAS specification: "The edge of flight line data bit has a value of 1 only when the point is at the end of a scan. It is the last point on a given scan line before it changes direction."
From the ASPRS LAS specification: "The classification field is a number to signify a given classification during filter processing. The ASPRS standard has a public list of classifications which shall be used when mixing vendor specific user software."
From the ASPRS LAS specification: "The angle at which the laser point was output from the laser system, including the roll of the aircraft... The scan angle is an angle based on 0 degrees being NADIR, and –90 degrees to the left side of the aircraft in the direction of flight."
path
The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths
.
TEXT ENCODING NONE
x_min
(optional)
Min x-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
x_max
(optional)
Max x-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
y_min
(optional)
Min y-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
y_max
(optional)
Max y-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
file_path
Full path for the las or laz file
Column<TEXT ENCODING DICT>
file_name
Filename for the las or laz file
Column<TEXT ENCODING DICT>
file_source_id
File source id per file metadata
Column<SMALLINT>
version_major
LAS version major number
Column<SMALLINT>
version_minor
LAS version minor number
Column<SMALLINT>
creation_year
Data creation year
Column<SMALLINT>
is_compressed
Whether data is compressed, i.e. LAZ format
Column<BOOLEAN>
num_points
Number of points in this file
Column<BIGINT>
num_dims
Number of data dimensions for this file
Column<SMALLINT>
point_len
Not currently used
Column<SMALLINT>
has_time
Whether data has time values
Column<BOOLEAN>
has_color
Whether data contains RGB color values
Column<BOOLEAN>
has_wave
Whether data contains wave info
Column<BOOLEAN>
has_infrared
Whether data contains infrared values
Column<BOOLEAN>
has_14_point_format
Whether data adheres to the 14-attribute standard
Column<BOOLEAN>
specified_utm_zone
UTM zone of data
Column<INT>
x_min_source
Minimum x-coordinate in source projection
Column<DOUBLE>
x_max_source
Maximum x-coordinate in source projection
Column<DOUBLE>
y_min_source
Minimum y-coordinate in source projection
Column<DOUBLE>
y_max_source
Maximum y-coordinate in source projection
Column<DOUBLE>
z_min_source
Minimum z-coordinate in source projection
Column<DOUBLE>
z_max_source
Maximum z-coordinate in source projection
Column<DOUBLE>
x_min_4326
Minimum x-coordinate in lon/lat degrees
Column<DOUBLE>
x_max_4326
Maximum x-coordinate in lon/lat degrees
Column<DOUBLE>
y_min_4326
Minimum y-coordinate in lon/lat degrees
Column<DOUBLE>
y_max_4326
Maximum y-coordinate in lon/lat degrees
Column<DOUBLE>
z_min_4326
Minimum z-coordinate in meters above mean sea level (AMSL)
Column<DOUBLE>
z_max_4326
Maximum z-coordinate in meters above mean sea level (AMSL)
Column<DOUBLE>