v8.3.0

Contents

Overview
Installation and Configuration
Loading and Exporting Data
SQL
System Requirements

Installing on Rocky Linux / RHEL

In this section you will find a recipe to install the HEAVY.AI platform on Red Hat and derivatives such as Rocky Linux.

Installing on Ubuntu

In this section, you will find recipes to install the HEAVY.AI platform and NVIDIA drivers using a package manager such as apt, or from a tarball.

Command Line

Security

Configuration Parameters

Installing on Docker

Installing HEAVY.AI on Docker

In this section you will find recipes to install the HEAVY.AI platform using Docker. We provide instructions for installing Docker and HEAVY.AI on an Ubuntu host machine. However, advanced users can install the CPU version of HEAVY.AI (EE/Free or Open Source) on any host system running Docker, or the GPU version (EE/Free or Open Source) on any Linux-based system with supported NVIDIA drivers and nvidia-docker packages (see Software Requirements).

Services and Utilities

Data Definition (DDL)

Data Manipulation (DML)

Configuration Parameters for HeavyIQ

Following are the configuration parameters for runtime settings for HeavyIQ.

Flag | Description | Default | Example
heavydb_host | The hostname/IP of the HeavyDB instance (optional) | localhost | http://heavydb.example.com

Supported Data Sources

SQL Capabilities

ALTER SYSTEM CLEAR

Clear CPU, GPU, or RENDER memory. Available to superusers only.

ALTER SYSTEM CLEAR (CPU|GPU|RENDER) MEMORY

Examples

ALTER SYSTEM CLEAR CPU MEMORY
ALTER SYSTEM CLEAR GPU MEMORY
ALTER SYSTEM CLEAR RENDER MEMORY

Note: Generally, the server handles memory management, and you do not need to use this command. If you are having unexpected memory issues, try clearing the memory to see if performance improves.

Software Requirements

  • Operating Systems

Supported operating systems and minimum NVIDIA driver versions for each HEAVY.AI version are listed in the tables later in this section.

  • Additional Components

    • OpenJDK version 8 or higher

    • EPEL

    • Up to date Vulkan drivers

  • Supported web browsers (Enterprise Edition, Immerse). Latest stable release of:

    • Chrome

Installation

You can download the latest version of HEAVY.AI for your preferred platform from our support site or by using the links provided in the instructions.

The CPU (no GPUs) install does not support backend rendering. For example, Pointmap and Scatterplot charts are not available. The GPU install supports all chart types.

The Open Source options do not require a license, and omit HeavyImmerse.

  • Docker

DELETE

Deletes rows that satisfy the WHERE clause from the specified table. If the WHERE clause is absent, all rows in the table are deleted, resulting in a valid but empty table.

DELETE FROM table_name [ * ] [ [ AS ] alias ]
[ WHERE condition ]

Cross-Database Queries

In Release 6.4 and higher, you can run DELETE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.

To execute queries against another database, you must have ACCESS privilege on that database, as well as DELETE privilege.

Example

Delete rows from a table in the my_other_db database:

Overview

HEAVY.AI is an analytics platform designed to handle very large datasets. It leverages the processing power of GPUs alongside traditional CPUs to achieve very high performance. HEAVY.AI combines an open-source SQL engine (HeavyDB), server-side rendering (HeavyRender), and web-based data visualization (Heavy Immerse) to provide a comprehensive platform for data analysis.

HeavyDB

The foundation of the platform is HeavyDB, an open-source, GPU-accelerated database. HeavyDB harnesses GPU processing power and returns SQL query results in milliseconds, even on tables with billions of rows. HeavyDB delivers high performance with rapid query compilation, query vectorization, and advanced memory management.

Free Version

HEAVY.AI Free is a full-featured version of the HEAVY.AI platform available at no cost for non-hosted commercial use.

HEAVY.AI Free includes access to the following:

  • Up to 32 GB of RAM

  • Support for 1 GPU

Uninstalling

This is a recipe to permanently remove the HEAVY.AI software, services, and data from your system.

Uninstalling HEAVY.AI from Docker

To uninstall HEAVY.AI in Docker, stop and delete the current Docker container.

In a terminal window, get the Docker container ID:

Ports

HEAVY.AI uses the following ports.

Port
Service
Use

Upgrading

In this section, you will find recipes to upgrade from OmniSci to the HEAVY.AI platform and to upgrade between versions of the HEAVY.AI platform.

Supported Upgrade Path

The following table shows the steps needed to move from one version to a later one.

Initial Version

Views

DDL - Views

A view is a virtual table based on the result set of a SQL statement. It derives its fields from a SELECT statement. You can do anything with a HEAVY.AI view query that you can do in a non-view HEAVY.AI query.

Nomenclature Constraints

View object names must use the NAME format, described in regular-expression notation as:

LIKELY/UNLIKELY

Usage Notes

KILL QUERY

Interrupt a queued query. Specify the query by using its session ID.

To see the queries in the queue, use the SHOW QUERIES command.

To interrupt the last query in the list (ID 947-ooNP):

Showing the queries again indicates that 947-ooNP has been deleted:


Comment

Adds a comment or removes an existing comment for an existing table or column object.

COMMENT

Create or remove a comment for a TABLE or COLUMN object of name object_name. The comment must be a string literal or NULL.

Logical Operators and Conditional and Subquery Expressions

Logical Operator Support

Operator
Description

UPDATE

Changes the values of the specified columns based on the assign argument (identifier=expression) in all rows that satisfy the condition in the WHERE clause.

Example

Policies

You can use policies to provide row-level security (RLS) in HEAVY.AI.

CREATE POLICY

Create an RLS policy for a user or role (<name>); admin rights are required. All queries on the table for the user or role are automatically filtered to include only rows where the column contains any one of the values from the VALUES clause.

RLS filtering works similarly to the way a WHERE column = value clause, appended to every query or subquery on the table, would work.

ALTER SESSION SET

Change a parameter value for the current session.

Parameter name
Values

Up to 3 active users

  • Advanced Analytics

  • Rendering Engine

  • Immerse Dashboards

  • HeavyDB

  • Sharing

  • Support access through the HEAVY.AI Community

  • To get started with HEAVY.AI Free:

    1. Go to Get Started with HEAVY.AI, and in the HEAVY.AI Free section, click Free License.

    2. On the Get HEAVY.AI Free page, enter the requested information, agree to the HEAVY.AI EULA and the HeavyIQ EULA Addendum, then click I Agree.

    3. Open the Your HEAVY.AI Free License Key email to view and download the free edition license key. You will need this license key to run HEAVY.AI after you install it.

    4. In the Download HEAVY.AI section of the Free License Key email, click See Install Options to select the best version of HEAVY.AI for your hardware and software configuration. Follow the instructions for the download or cloud version you choose.

    5. Under the Install / Configure section of the Free License Key email, click the Installation Guide link and follow our documentation to install HEAVY.AI and get your environment up and running.

    Add Users

    You can create additional HEAVY.AI users to collaborate with.

    1. Connect to Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Open the SQL Editor.

    3. Use the CREATE USER command to create a new user. For information on syntax and options, see CREATE USER.
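    As a sketch, a minimal statement might look like the following; the user name and password are placeholders, and the is_super option is not required:

    ```sql
    -- Placeholder user name and password; see CREATE USER for all options
    CREATE USER collaborator (password = 'changeme', is_super = 'false');
    ```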

    wget or curl

  • Kernel headers

  • Kernel development packages

  • log4j 2.15.0 or higher

  • NVIDIA hardware and software (for GPU installs only)

    • Hardware: Ampere, Turing, Volta, or Pascal series GPU cards. HEAVY.AI recommends that each GPU card in a server or distributed environment be of the same series.

    • NOTE: The recommended NVIDIA driver for all supported HEAVY.AI versions is 535.

    • Software (Run nvidia-smi to determine the currently running driver):

  • Firefox

  • Safari version 15.x or higher

HEAVY Version | Bare Metal | Hosts / Docker | Information
8.x | Ubuntu 22.04 | Redhat, CentOS, Ubuntu / Ubuntu 22.04 | Ubuntu 22.04 - EOS: June 2027 (https://wiki.ubuntu.com/Releases)
7.x | Ubuntu 20.04 | Redhat, CentOS, Ubuntu / Ubuntu 20.04 | Ubuntu 20.04 - EOS: April 2025 (https://wiki.ubuntu.com/Releases); CentOS 7 - EOL: June 30, 2024 (https://www.redhat.com/en/topics/linux/centos-linux-eol)
6.x | Ubuntu 18.04, CentOS 7 | Redhat, CentOS, Ubuntu / Ubuntu 18.04 | Ubuntu 18.04 - EOS: June 2023 (https://wiki.ubuntu.com/Releases); CentOS 8 - EOL: December 31, 2021 (https://www.redhat.com/en/events/webinar/centos-linux-reaching-its-end-life-now-what)

HEAVY.AI Version | Minimum NVIDIA Driver
8.x | 535
7.x | 520

  • Ubuntu
  • Rocky Linux / RHEL
  • AWS
  • GCP
  • Azure
  • Jupyter
  • Upgrading
  • Uninstalling

Port | Service | Use
6273 | heavy_web_server | Used to access Heavy Immerse.
6274 | heavydb tcp | Used by connectors (heavyai, omnisql, odbc, and jdbc) to access the more efficient Thrift API.
6276 | heavy_web_server | Used to access the HTTP/JSON Thrift API.
6278 | heavydb http | Used to directly access the HTTP/binary Thrift API, without having to proxy through heavy_web_server. Recommended for debugging use only.

NOT | Negates value
OR | Logical OR

Conditional Expression Support

Expression | Description
CASE WHEN condition THEN result ELSE default END | Case operator
COALESCE(val1, val2, ..) | Returns the first non-null value in the list

Note: Geospatial and array column projections are not supported in the COALESCE function and CASE expressions.

Subquery Expression Support

Expression | Description
expr IN (subquery or list of values) | Evaluates whether expr equals any value of the IN list.
expr NOT IN (subquery or list of values) | Evaluates whether expr does not equal any value of the IN list.

    Usage Notes

    • You can use a subquery anywhere an expression can be used, subject to any runtime constraints of that expression. For example, a subquery in a CASE statement must return exactly one row, but a subquery can return multiple values to an IN expression.

    • You can use a subquery anywhere a table is allowed (for example, FROM subquery), using aliases to name any reference to the table and columns returned by the subquery.
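    To sketch the second point, a subquery can take the place of a table in FROM, with an alias naming the derived table; this example assumes the ratings table used in the CREATE VIEW example elsewhere in this guide:

    ```sql
    -- Derived-table subquery in FROM, aliased as "t"
    SELECT t.movieId, t.avg_rating
    FROM (SELECT movieId, AVG(rating) AS avg_rating
          FROM ratings
          GROUP BY movieId) AS t
    WHERE t.avg_rating > 4;
    ```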

AND | Logical AND

    DELETE FROM my_other_db.customers WHERE id > 100;
    • KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.

    • Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.

    • To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt to TRUE. (It is enabled by default.)

    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU
    947-ooNP        |RUNNING_IMPORTER    |0          |2021-08-03 ...|IMPORT_GEO_TABLE|Rio       |tcp:::ffff:127.0.0.1:47314|omnisci|CPU
    SHOW QUERIES
    You should see an output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c:
Note: To see all containers, both running and stopped, use the following command:

    Stop the HEAVY.AI Docker container. For example:

    Remove the HEAVY.AI Docker container to save disk space. For example:

Uninstalling HEAVY.AI on Redhat and Ubuntu

To uninstall an existing system installed with Yum, Apt, or a tarball, connect as the user that runs the platform (typically heavyai).

Disable and stop all HEAVY.AI services.

Remove the HEAVY.AI installation files ($HEAVYAI_PATH defaults to /opt/heavyai).

Delete the configuration files and storage by removing the $HEAVYAI_BASE directory (defaults to /var/lib/heavyai).

Permanently remove the service configuration.

    CREATE VIEW

    Creates a view based on a SQL statement.

    Example

    You can describe the view as you would a table.

    You can query the view as you would a table.

    DROP VIEW

    Removes a view created by the CREATE VIEW statement. The view definition is removed from the database schema, but no actual data in the underlying base tables is modified.

    Example

    SQL normally assumes that terms in the WHERE clause that cannot be used by indices are usually true. If this assumption is incorrect, it could lead to a suboptimal query plan. Use the LIKELY(X) and UNLIKELY(X) SQL functions to provide hints to the query planner about clause terms that are probably not true, which helps the query planner to select the best possible plan.

    Use LIKELY/UNLIKELY to optimize evaluation of OR/AND logical expressions. LIKELY/UNLIKELY causes the left side of an expression to be evaluated first. This allows the right side of the query to be skipped when possible. For example, in the clause UNLIKELY(A) AND B, if A evaluates to FALSE, B does not need to be evaluated.

    Consider the following:

    If x is one of the values 7, 8, 9, or 10, the filter y > 42 is applied. If x is not one of those values, the filter y > 42 is not applied.

Expression | Description
LIKELY(X) | Provides a hint to the query planner that argument X is a Boolean value that is usually true. The planner can prioritize filters on the value X earlier in the execution cycle and return results more efficiently.
UNLIKELY(X) | Provides a hint to the query planner that argument X is a Boolean value that is usually not true. The planner can prioritize filters on the value X later in the execution cycle and return results more efficiently.

If NULL, the comment is removed. Only superusers or owners of the object can modify comments on the object.

    Column and table comments can be viewed either in the information_schema system tables, or in the result of the SHOW CREATE TABLE command run on the relevant table.

Note: Currently, comments are not supported with the CREATE TABLE command; COMMENT ON is the canonical means to set or unset comments.

    Examples

    1. Create a table and add comments to it.

    Note: When specifying the name of the COLUMN object, it must be of the form <TABLE>.<COLUMN> to uniquely identify it.

    2. Show the comments and the DDL of the table.

    Note: Currently, COMMENT ON is supported only on tables and their columns. Other objects, such as views, are not currently supported.

    3. View the table and column comments in the respective system tables.

    WHERE column = value
    clause, appended to every query or subquery on the table, would work. If policies on multiple columns in the same table are defined for a user or role, then a row is visible to that user or role if any one or more of the policies matches that row.

    DROP POLICY

    Drop an RLS policy for a user or role (<name>); admin rights are required. All values specified for the column by the policy are dropped. Effective values from another policy on an inherited role are not dropped.

    SHOW POLICIES

    Displays a list of all RLS policies that exist for a user or role. If EFFECTIVE is used, the list also includes any policies that exist for all roles that apply to the requested user or role.

    CREATE POLICY ON COLUMN table.column TO <name> VALUES ('string', 123, ...);
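    For example, a policy like the following (the table, column, role, and values here are hypothetical) would limit the sales_west role to rows whose region column contains one of the listed values:

    ```sql
    -- Hypothetical table, column, role, and values
    CREATE POLICY ON COLUMN customers.region TO sales_west VALUES ('US-West', 'CA-West');
    ```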
    Alter Session Examples

    CURRENT_DATABASE

    Switch to another database without needing to log in again.

    Your session will silently switch to the requested database.

    The database exists, but the user does not have access to it:

    The database does not exist:

    EXECUTOR_DEVICE

    Force the session to run the subsequent SQL commands in CPU mode:

    Switch the session back to GPU mode:

    EXECUTOR_DEVICE

    CPU - Set the session to CPU execution mode:

    ALTER SESSION SET EXECUTOR_DEVICE='CPU';

    GPU - Set the session to GPU execution mode:

    ALTER SESSION SET EXECUTOR_DEVICE='GPU';

    NOTE: These parameter values have the same effect as the \cpu and \gpu commands in heavysql, but can be used with any tool capable of running SQL commands.

    CURRENT_DATABASE

    Can be set to any string value.

    If the value is a valid database name and the current user has access to it, the session switches to the new database. If the user does not have access, or the database does not exist, an error is returned and the session falls back to the starting database.

    kill query '947-ooNP'
    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU
    sudo docker container ps -a
    sudo yum remove heavyai.x86_64
    sudo apt remove heavyai
    sudo rm -r $(readlink $HEAVYAI_PATH) $HEAVYAI_PATH
    sudo docker container ps --format "{{.Id}} {{.Image}}" \
    -f status=running | grep heavyai\/
    9e01e520c30c omnisci/omnisci-ee-gpu
    sudo docker container stop 9e01e520c30c
    sudo docker container rm 9e01e520c30c
    sudo systemctl disable heavy_web_server --now
    sudo systemctl disable heavydb --now
    sudo rm  -r $HEAVYAI_BASE
    sudo rm /lib/systemd/heavydb*.service
    sudo rm /lib/systemd/heavy_web_server*.service
    sudo systemctl daemon-reload
    sudo systemctl reset-failed
    [A-Za-z_][A-Za-z0-9\$_]*
    CREATE VIEW view_movies
    AS SELECT movies.movieId, movies.title, movies.genres, avg(ratings.rating)
    FROM ratings
    JOIN movies on ratings.movieId=movies.movieId
    GROUP BY movies.title, movies.movieId, movies.genres;
    \d view_movies
    VIEW defined AS: SELECT  movies.movieId, movies.title, movies.genres,
    avg(ratings.rating) FROM ratings JOIN movies ON ratings.movieId=movies.movieId
    GROUP BY movies.title, movies.movieId, movies.genres
    Column types:
        movieId INTEGER,
        title TEXT ENCODING DICT(32),
        genres TEXT ENCODING DICT(32),
        EXPR$3 DOUBLE
    SELECT title, EXPR$3 from view_movies where movieId=260;
    Star Wars: Episode IV - A New Hope (1977)|4.048937
    DROP VIEW IF EXISTS v_reviews;
    SELECT COUNT(*) FROM test WHERE UNLIKELY(x IN (7, 8, 9, 10)) AND y > 42;
    COMMENT ON (TABLE | COLUMN) <object_name> IS (<string_literal> | NULL);
    CREATE TABLE employees (id INT, salary BIGINT);
    -- Add a comment to the 'employees' table
    COMMENT ON TABLE employees IS 'This table stores employee information';
    -- Add a comment to the 'salary' column
    COMMENT ON COLUMN employees.salary IS 'Stores the salary of the employee';
    SHOW CREATE TABLE employees;
    
    CREATE TABLE employees /* This table stores employee information */ (
      id INTEGER,
      salary BIGINT /* Stores the salary of the employee */);
    1 rows returned.
    -- Connect to information_schema database
    \c information_schema admin XXXXXXXX
    
    -- Select subset of columns from the tables system table
    SELECT table_id,table_name,"comment" FROM tables where table_name = 'employees';
    
    -- Returns one result for the table comment
    table_id|table_name|comment
    5|employees|This table stores employee information
    1 rows returned.
    
    -- Select subset of columns from the columns system table
    SELECT table_id,table_name,column_id,column_name,"comment" FROM columns where table_name = 'employees';
    
    -- Returns two results, one of the columns has no comment.
    table_id|table_name|column_id|column_name|comment
    5|employees|1|id|NULL
    5|employees|2|salary|Stores the salary of the employee
    2 rows returned.
    DROP POLICY ON COLUMN table.column FROM <name>;
    SHOW [EFFECTIVE] POLICIES <name>;
    ALTER SESSION SET <parameter_name>=<parameter_value>
    ALTER SESSION SET CURRENT_DATABASE='owned_database'; 
    ALTER SESSION SET CURRENT_DATABASE='information_schema';
    TException - service has thrown: TDBException(error_msg=Unauthorized access: 
    user test is not allowed to access database information_schema.)
    ALTER SESSION SET CURRENT_DATABASE='not_existent_db'; 
    TException - service has thrown: TDBException(error_msg=Database name 
    not_existent_db does not exist.)
    ALTER SESSION SET EXECUTOR_DEVICE='CPU';
    ALTER SESSION SET EXECUTOR_DEVICE='GPU';

    Native SQL

    With native SQL support, HeavyDB returns query results hundreds of times faster than CPU-only analytical database platforms. Use your existing SQL knowledge to query data. You can use the standalone SQL engine with the command line, or the SQL editor that is part of the Heavy Immerse visual analytics interface. Your SQL query results can output to Heavy Immerse or to third-party software such as Birst, Power BI, Qlik, or Tableau.

    Geospatial Data

    HeavyDB can store and query data using native Open Geospatial Consortium (OGC) types, including POINT, LINESTRING, POLYGON, and MULTIPOLYGON. With geo type support, you can query geo data at scale using special geospatial functions. Using the power of GPU processing, you can quickly and interactively calculate distances between two points and intersections between objects.

    Open Source

    HeavyDB is open source and encourages contribution and innovation from a global community of users. It is available on GitHub under the Apache 2.0 license, along with components like a Python interface (heavyai) and JavaScript infrastructure (mapd-connector, mapd-charting), making HEAVY.AI the leader in open-source analytics.

    HeavyRender

    HeavyRender works on the server side, using GPU buffer caching, graphics APIs, and a Vega-based interface to generate custom pointmaps, heatmaps, choropleths, scatterplots, and other visualizations. HEAVY.AI enables data exploration by creating and sending lightweight PNG images to the web browser, avoiding high-volume data transfers. Fast SQL queries make metadata in the visualizations appear as if the data exists on the browser side.

    Network bandwidth is a bottleneck for complex chart data, so HEAVY.AI uses in-situ rendering of on-GPU query results to accelerate visual rendering. This differentiates HEAVY.AI from systems that execute queries quickly but then transfer the results to the client for rendering, which slows performance.

    Geospatial Analysis

    Efficient geospatial analysis requires fast data-rendering of complex shapes on a map. HEAVY.AI can import and display millions of lines or polygons on a geo chart with minimal lag time. Server-side rendering technology prevents slowdowns associated with transferring data over the network to the client. You can select location shapes down to a local level, like census tracts or building footprints, and cross-filter interactively.

    Visualize with Vega

    Complex server-side visualizations are specified using an adaptation of the Vega Visualization Grammar. Heavy Immerse generates Vega rendering specifications behind the scenes; however, you can also generate custom visualizations using the same API. This customizable visualization system combines the agility of a lightweight frontend with the power of a GPU engine.

    Heavy Immerse

    Heavy Immerse is a web-based data visualization interface that uses HeavyDB and HeavyRender for visual interaction. Intuitive and easy to use, Heavy Immerse provides standard visualizations, such as line, bar, and pie charts, as well as complex data visualizations, such as geo point maps, geo heat maps, choropleths, and scatter plots. Heavy Immerse provides quick insights and makes them easy to recognize.

    Dashboards

    Use dashboards to create and organize your charts. Dashboards automatically cross-filter when interacting with data, and refresh with zero latency. You can create dashboards and interact with conventional charts and data tables, as well as scatterplots and geo charts created by HeavyRender. You can also create your own queries in the SQL editor.

    Charts

    Heavy Immerse lets you create a variety of different chart types. You can display pointmaps, heatmaps, and choropleths alongside non-geographic charts, graphs, and tables. When you zoom into any map, visualizations refresh immediately to show data filtered by that geographic context. Multiple sources of geographic data can be rendered as different layers on the same map, making it easy to find the spatial relationships between them.

    Create geo charts with multiple layers of data to visualize the relationship between factors within a geographic area. Each layer represents a distinct metric overlaid on the same map. Those different metrics can come from the same or a different underlying dataset. You can manipulate the layers in various ways, including reorder, show or hide, adjust opacity, or add or remove legends.

    Use Multiple Sources

    Heavy Immerse can visually display dozens of datasets in the same dashboard, allowing you to find multi-factor relationships that you might not otherwise consider. Each chart (or groups of charts) in a dashboard can point to a different table, and filters are applied at the dataset level. Multisource dashboards make it easier to quickly compare across datasets, without merging the underlying tables.

    Streaming Data

    Heavy Immerse is ideal for high-velocity data that is constantly streaming; for example, sensor, clickstream, telematics, or network data. You can see the latest data to spot anomalies and trend variances rapidly. Immerse auto-refresh automatically updates dashboards at flexible intervals that you can tailor to your use case.

    Ready to Get Started?

    I want to...

      • Install HEAVY.AI
      • Upgrade to the latest version
      • Configure HEAVY.AI

    [Figure: HEAVY.AI Platform Architecture Diagram, showing HeavyDB, HeavyRender, and Heavy Immerse]
    Initial Version | Final Version | Upgrade Steps
    OmniSci less than 5.5 | HEAVY.AI 7.0 | Upgrade to 5.5 --> 6.0 --> 7.0
    OmniSci 5.5 - 5.10 | HEAVY.AI 7.0 | Upgrade to 6.0 --> 7.0
    HEAVY.AI 6.0 | HEAVY.AI 7.0 | Upgrade to 7.0

    Note: Versions 5.x and 6.0.0 are no longer supported; use them only as needed to facilitate an upgrade to a supported version.

    Example: if you are running an OmniSci version older than 5.5, you must first upgrade to 5.5, then to 6.0, and then to 7.0. If you are running 6.0 - 6.4, you can upgrade directly to 7.0 in a single step.

    Note: Currently, HEAVY.AI does not support updating a geo column type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON) in a table.

    Update Via Subquery

    You can update a table via subquery, which allows you to update based on calculations performed on another table.

    Examples

    UPDATE test_facts SET lookup_id = (SELECT SAMPLE(test_lookup.id
    
    UPDATE test_facts SET val = val+1, lookup_id = (SELECT SAMPLE
    
    UPDATE test_facts SET lookup_id = (SELECT SAMPLE(test_lookup.id
    

    Cross-Database Queries

    In Release 6.4 and higher, you can run UPDATE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.

    To execute queries against another database, you must have ACCESS privilege on that database, as well as UPDATE privilege.

    Example

    Update a row in a table in the my_other_db database:
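    A sketch of such a statement, reusing the my_other_db.customers table from the DELETE example (the column and values are hypothetical):

    ```sql
    -- Hypothetical column and values
    UPDATE my_other_db.customers SET status = 'inactive' WHERE id = 100;
    ```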

    Overview

    HEAVY.AI has minimal configuration requirements with a number of additional configuration options. This topic describes the required and optional configuration changes you can use in your HEAVY.AI instance.

    Important: In release 4.5.0 and higher, HEAVY.AI requires that all configuration flags used at startup match a flag on the HEAVY.AI server. If any flag is misspelled or invalid, the server does not start. This helps ensure that all settings are intentional and do not have an unexpected impact on performance or data integrity.

    Storage Directory

    Before starting the HEAVY.AI server, you must initialize the persistent storage directory. To do so, create an empty directory at the desired path, such as /var/lib/heavyai.

    1. Create the environment variable $HEAVYAI_BASE.

    2. Then, change the owner of the directory to the user that the server will run as ($HEAVYAI_USER):

    where $HEAVYAI_USER is the system user account that the server runs as, such as heavyai, and $HEAVYAI_BASE is the path to the parent of the HEAVY.AI server storage directory.

    3. Run $HEAVYAI_PATH/bin/initheavy with the storage directory path as the argument:

    hashtag
    Configuring a Custom Heavy Immerse Subdirectory

    Immerse serves the application from the root path (/) by default. To serve the application from a sub-path, you must modify the $HEAVYAI_PATH/frontend/app-config.js file to change the IMMERSE_PATH_PREFIX value. The Heavy Immerse path must start with a forward slash (/).
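As an illustrative sketch only (the exact contents of app-config.js vary by release, and the /immerse sub-path here is hypothetical), the change amounts to editing the prefix value so that it begins with a forward slash:

```javascript
// $HEAVYAI_PATH/frontend/app-config.js (fragment; other settings omitted)
// Hypothetical sub-path; the value must begin with "/"
IMMERSE_PATH_PREFIX: "/immerse",
```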

    hashtag
    Configuration File

    The configuration file stores runtime options for your HEAVY.AI servers. You can use the file to change the default behavior.

    The heavy.conf file is stored in the $HEAVYAI_BASE directory. The configuration settings are picked up automatically by the sudo systemctl start heavydb and sudo systemctl start heavy_web_server commands.

    Set the flags in the configuration file using the format <flag> = <value>. Strings must be enclosed in quotes.

    The following is a sample configuration file. The entry for data path is a string and must be in quotes. The last entry in the first section, for null-div-by-zero, is the Boolean value true and does not require quotes.

    To comment out a line in heavy.conf, prepend the line with the pound sign (#) character.

    triangle-exclamation

For encrypted backend connections, if you do not use a configuration file to start the database, Calcite expects passwords to be supplied on the command line, where they are visible in the process table. If a configuration file is supplied, passwords must be supplied in the file; if they are not, Calcite will fail.

    CUDA Compatibility Drivers

    circle-exclamation

    This procedure is considered experimental.

    In some situations, you might not be able to upgrade NVIDIA CUDA drivers on a regular basis. To work around this issue, NVIDIA provides compatibility drivers that allow users to use newer features without requiring a full upgrade. For information about compatibility drivers, see https://docs.nvidia.com/deploy/cuda-compatibility/index.htmlarrow-up-right.

    hashtag
    Installing the Drivers

    Use the following commands to install the CUDA 11 compatibility drivers on Ubuntu:

    After the last nvidia-smi, ensure that CUDA shows the correct version.

    circle-info

    The driver version will still show as the old version.

    hashtag
    Updating systemd Files

    After installing the drivers, update the systemd files in /lib/systemd/system/heavydb.service.

In the [Service] section, add or update the Environment property.

The file should look like this:

Then force a reload of the systemd configuration:

    Using Utilities

    HeavyDB includes the utilities initdb for database initialization and generate_cert for generating certificates and private keys for an HTTPS server.

    hashtag
    initdb

    Before using HeavyDB, initialize the data directory using initdb:

This creates the following subdirectories:

    • catalogs: Stores HeavyDB catalogs

    • data: Stores HeavyDB data

    • log

    The -f flag forces initdb to overwrite existing data and catalogs in the specified directory.

    By default, initdb adds a sample table of geospatial data. Use the --skip-geo flag if you prefer not to load sample geospatial data.

    hashtag
    generate_cert

    This command generates certificates and private keys for an HTTPS server. The options are:

    • [{-ca} <bool>]: Whether this certificate should be its own Certificate Authority. The default is false.

    • [{-duration} <duration>]: Duration that the certificate is valid for. The default is 8760h0m0s.

    Kafka

    Apache Kafkaarrow-up-right is a distributed streaming platform. It allows you to create publishers, which create data streams, and consumers, which subscribe to and ingest the data streams produced by publishers.

You can use the HeavyDB KafkaImporterarrow-up-right C++ program to consume a topic created by running Kafka shell scripts from the command line. Follow the procedure below to use a Kafka producer to send data, and a Kafka consumer to store the data, in HeavyDB.

    circle-info

    This example assumes you have already installed and configured Apache Kafka. See the Kafka websitearrow-up-right.

    hashtag
    Creating a Topic

    Create a sample topic for your Kafka producer.

    1. Run the kafka-topics.sh script with the following arguments:

    2. Create a file named myfile that consists of comma-separated data. For example:

3. Use heavysql to create a table to store the stream.

    hashtag
    Using the Producer

    Load your file into the Kafka producer.

    1. Create and start a producer using the following command.

    hashtag
    Using the Consumer

    Load the data to HeavyDB using the Kafka console consumer and the KafkaImporter program.

    1. Pull the data from Kafka into the KafkaImporter program.

    2. Verify that the data arrived using heavysql.

    Getting Started on Azure

    Getting Started with HEAVY.AI on Microsoft Azure

    Follow these instructions to get started with HEAVY.AI on Microsoft Azure.

    hashtag
    Prerequisites

You must have a Microsoft Azure account. If you do not have an account, go to the Microsoft Azure home pagearrow-up-right to sign up for one.

    hashtag
    Configure Your HEAVY.AI Instance

    To launch HEAVY.AI on Microsoft Azure, you configure a GPU-enabled instance.

1) Log in to your Microsoft Azure portal.

    2) On the left side menu, create a Resource group, or use one that your organization has created.

    3) On the left side menu, click Virtual machines, and then click Add.

    4) Create your virtual machine:

    • On the Basics tab:

      • In Project Details, specify the Resource group.

      • Specify the Instance Details:

    5) Click Review + create. Azure reviews your entries, creates the required services, deploys them, and starts the VM.

    6) Once the VM is running, select the VM you just created and click the Networking tab.

    7) Click the Add inbound button and configure security rules to allow any source, any destination, and destination port 6273 so you can access Heavy Immerse from a browser on that port. Consider renaming the rule to 6273-Immerse or something similar so that the default name makes sense.

    8) Click Add and verify that your new rule appears.

Azure-specific configuration is complete. Now, follow the standard installation instructions for your Linux distribution and installation method.

    INSERT

    Use INSERT for both single- and multi-row ad hoc inserts. (When inserting many rows, use the more efficient COPY command.)

    hashtag
    Examples

    You can also insert into a table as SELECT, as shown in the following examples:

    INSERT INTO destination_table SELECT * FROM source_table;
    INSERT INTO destination_table (id, name, age, gender) SELECT * FROM source_table;

    You can insert array literals into array columns. The inserts in the following example each have three array values, and demonstrate how you can:

    • Create a table with variable-length and fixed-length array columns.

• Insert NULL arrays into these columns.

• Specify and insert array literals using {...} or ARRAY[...] syntax.

    hashtag
    Default Values

If you create a table with a column that has a default value, or alter a table to add a column with a default value, using the INSERT command creates a record that includes the default value if it is omitted from the INSERT. For example, assume a table created as follows:

    If you omit the name column from an INSERT or INSERT FROM SELECT statement, the missing value for column name is set to 'John Doe'.

    INSERT INTO tbl (id, age) VALUES (1, 36); creates the record 1|'John Doe'|36 .

    INSERT INTO tbl (id, age) SELECT id, age FROM old_tbl; also sets all the name values to John Doe .

    generate_random_strings

    Generates random string data.

SELECT * FROM TABLE(generate_random_strings(<num_strings>, <string_length>));

    hashtag
    Input Arguments

    Parameter
    Description
    Data Type

    hashtag
    Output Columns

    Name
    Description
    Data Type

    Example

    Getting Started on GCP

    Getting Started with HEAVY.AI on Google Cloud Platform

    Follow these instructions to get started with HEAVY.AI on Google Cloud Platform (GCP).

    hashtag
    Prerequisites

You must have a Google Cloud Platform account. If you do not have an account, follow these instructionsarrow-up-right to sign up for one.

    To launch HEAVY.AI on Google Cloud Platform, you select and configure an instance.

    Encrypted Credentials in Custom Applications

    HEAVY.AI can accept a set of encrypted credentials for secure authentication of a custom application. This topic provides a method for providing an encryption key to generate encrypted credentials and configuration options for enabling decryption of those encrypted credentials.

    hashtag
    Generating an Encryption Key

Generate a 128- or 256-bit encryption key and save it to a file. You can use https://acte.ltd/utils/randomkeygenarrow-up-right to generate a suitable encryption key.
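If you prefer to generate the key locally instead, any cryptographically secure source works. For example, with OpenSSL (a sketch; heavyai.key is a placeholder filename):

```shell
# 256-bit key written as 64 hex characters; use "openssl rand -hex 16" for a 128-bit key
openssl rand -hex 32 > heavyai.key
```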

    Column-Level Security

Grant or revoke SELECT privileges on columns in a table. These privileges can be managed separately from table-level privileges, allowing SELECT operations on a subset of columns.

    circle-exclamation
    • Column privileges are only enabled for tables.

    Executor Resource Manager

    hashtag
    Overview

    To enable concurrent execution of queries, we introduce the concept of an Executor Resource Manager (ERM). This keeps track of compute and memory resources to gate query execution and ensures that compute resources are not over-subscribed. As of version 7.0, ERM is enabled by default.

    The ERM evaluates several kinds of resources required by a query. Currently this includes CPU cores, GPUs, buffer and result set memory. It will leverage all available resources unless policy limits have been established, such as for maximum memory use or query time. It determines both the ideal/maximum amount of resources desirable for optimal performance and the minimum required. For example, a CPU query scanning 8 fragments could run with up to 8 threads, but could execute with as little as a single CPU thread with correspondingly less memory if needed.

    UPDATE table_name SET assign [, assign ]* [ WHERE booleanExpression ]
    UPDATE UFOs SET shape='ovate' where shape='eggish';
    UPDATE my_other_db.customers SET name = 'Joe' WHERE id = 10;
    initdb [-f | --skip-geo] $HEAVYAI_BASE/storage
    INSERT INTO <table> (column1, ...) VALUES (row_1_value_1, ...), ..., (row_n_value_1, ...);
    CREATE TABLE ar (ai INT[], af FLOAT[], ad2 DOUBLE[2]); 
    INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}); 
    INSERT INTO ar VALUES (ARRAY[NULL,2],NULL,NULL); 
    INSERT INTO ar VALUES (NULL,{},{2.0,NULL});
    -- or a multi-row insert equivalent
    INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}), (ARRAY[NULL,2],NULL,NULL), (NULL,{},{2.0,NULL});
    The ERM establishes a request queue. On every new request, as well as every time an existing request is completed, it checks available resources and picks the next resource request to grant. It currently always gives preference to earlier requests if resources permit launching them (first in, first out, or “FIFO”).
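The FIFO grant policy described above can be sketched as a toy model (illustrative only; this is not HeavyDB code, and the names and resource units are invented for the example):

```python
from collections import deque

# Toy model of a FIFO resource-grant queue: requests are granted in arrival
# order whenever enough resource units (e.g., CPU threads) are free.
class ResourceQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.available = capacity
        self.pending = deque()        # FIFO of (name, units) requests
        self.running = set()

    def request(self, name, units):
        self.pending.append((name, units))
        self._grant()

    def complete(self, name, units):
        self.running.discard(name)
        self.available += units
        self._grant()                 # re-check the queue on every completion

    def _grant(self):
        # Always prefer the earliest request; stop at the first that cannot fit.
        while self.pending and self.pending[0][1] <= self.available:
            name, units = self.pending.popleft()
            self.available -= units
            self.running.add(name)

q = ResourceQueue(capacity=8)
q.request("q1", 6)   # granted: 6 of 8 units
q.request("q2", 4)   # waits: only 2 units free
q.request("q3", 2)   # waits behind q2 (strict FIFO, even though it would fit)
print(sorted(q.running))   # ['q1']
q.complete("q1", 6)
print(sorted(q.running))   # ['q2', 'q3']
```

Note that q3 waits even though 2 units are free: a strict FIFO policy never grants a later request ahead of an earlier one.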

If the system-level multi-executor flag is enabled, the ERM will allow multiple queries to execute at once so long as resources are available. Currently, multiple execution is allowed for CPU queries (and multiple CPU queries and a single GPU query). This supports significant throughput gains by allowing inter-query-kernel concurrency, in addition to the major win of not having a long-running CPU query block the queue for other CPU queries or interactive GPU queries. The number of queries that can be run in parallel is limited by the number of executors.

    hashtag
    Use of CPU and GPU

    By default, if HeavyDB is compiled to run on GPUs and if GPUs are available, query steps/kernels will execute on GPU UNLESS:

    1. Some operations in the query step cannot run on GPU. Operators like MODE, APPROX_MEDIAN/PERCENTILE, and certain string functions are examples.

    2. Update and delete queries currently run on CPU.

    3. The query step requires more memory than available on GPU, but less than available on CPU.

    4. A user explicitly requests their query run on CPU, either via setting a session flag or via a query hint.

At the instance level, this behavior can be configured with system flags on startup. For example, a system with GPUs can be configured to use only CPU with the cpu-only flag, or the system's use of CPU RAM can be controlled using cpu-buffer-mem-bytes. Execution can also be routed to different device types with query hints such as “SELECT /*+ cpu_mode */ …”. These controls do not require the ERM, but are platform-wide.

    hashtag
    Example Use Cases

    hashtag
    Example 1: (no tuning required)

In a scenario where the system does not have enough memory available for the CPU cache, or the cache itself is too fragmented to accommodate all of the columns' chunks, the ERM, instead of failing the query with an OOM error, will:

1. Run the query reading a single chunk at a time, moving data to the GPU caches for GPU execution.

2. If there is not enough GPU memory, run the query chunk by chunk in CPU mode. In this case the query runs slower, but this frees up the GPU executor for less memory-demanding queries.

    hashtag
    Example 2: (minimal tuning required)

    You are deploying a new dashboard or chart which doesn’t require big data or high performance, and so you prefer to run it just on CPU. This way it doesn’t interfere with other performance-critical dashboards or charts.

1. Set the dashboard chart execution to CPU using query hints. Instead of referencing data directly, set a new “custom data source.” For example, if your data is in a table called ‘mydata’, then in the custom source, after the SELECT keyword, add the CPU query hint: SELECT /*+ g_cpu_mode */ * FROM mydata. You can repeat this for a data source supporting any number of charts, including all charts.

2. Bump up the number of executors (default 4) to 6-8. With more executors free, the dashboard will perform better without impacting the performance of the other dashboards.

    hashtag
    Example 3: (some tuning required)

    Improving performance of memory-intensive operations like high cardinality aggregates.

A user running exact “count distinct” operations on large, high-cardinality datasets that are likely to run on CPU, on a server with many CPU cores, might employ the following strategy:

    1. Increase the number of executors (default 4) to 8-16. --num-executors=16

2. Limit total CPU memory use with --cpu-buffer-mem-bytes (lowering it from the default 80%) to make room for large result sets, which are now limited by executor-cpu-result-mem-ratio.

Queries with sparse values or high cardinality that use a wide count distinct buffer will be pushed to CPU execution. Lower the executor-per-query-max-cpu-threads-ratio parameter to reduce the number of cores that run a single query; the group-by buffers are then built faster, lowering the memory footprint and speeding up query runtime.
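Put together in heavy.conf, the tuning in this example might look like the following sketch (values are illustrative and depend on your hardware):

```
num-executors = 16
# Lower the CPU buffer pool from its default to leave headroom for result sets
cpu-buffer-mem-bytes = 103079215104
# Fewer cores per query keeps per-thread group-by buffers small
executor-per-query-max-cpu-threads-ratio = 0.5
```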

  • Virtual machine name

  • Region

  • Image (Ubuntu 16.04 or higher, or CentOS/RHEL 7.0 or higher)

  • Size. Click Change size and use the Family filter to filter on GPU, based on your use case and requirements. Not all GPU VM variants are available in all regions.

  • For Username, add any user name other than admin.

  • In Inbound Port Rules, click Allow selected ports and select one or more of the following:

    • HTTP (80)

    • HTTPS (443)

    • SSH (22)

  • On the Disks tab, select Premium or Standard SSD, depending on your needs.

  • For the rest of the tabs and sections, use the default values.

  • HEAVY.AI installation instructions
: Contains all HeavyDB log files.
  • disk_cache: Stores the data cached by HEAVY Connect.

  • [{-ecdsa-curve} <string>]: ECDSA curve to use to generate a key. Valid values are P224, P256, P384, P521.

  • [{-host} <string>]: Comma-separated hostnames and IPs to generate a certificate for.

  • [{-rsa-bits} <int>]: Size of RSA key to generate. Ignored if –ecdsa-curve is set. The default is 2048.

  • [{-start-date} <string>]: Start date formatted as Jan 1 15:04:05 2011

  • Insert empty variable-length arrays using {} and ARRAY[] syntax.

  • Insert array values that contain NULL elements.

    INSERT INTO destination_table (name, gender, age, id) SELECT name, gender, age, id  FROM source_table;
INSERT INTO votes_summary (vote_id, vote_count) SELECT vote_id, COUNT(*) FROM votes GROUP BY vote_id;

    <num_strings>

    The number of strings to randomly generate.

    BIGINT

    <string_length>

    Length of the generated strings.

    BIGINT

    id

    Integer id of output, starting at 0 and increasing monotonically

    Column<BIGINT>

    rand_str

    Random String

    Column<TEXT ENCODING DICT>

    hashtag
    Configuring the Web Server

    Set the file path of the encryption key file to the encryption-key-file-path web server parameter in heavyai.conf:

    Alternatively, you can set the path using the --encryption-key-file-path=path/to/file command-line argument.

    hashtag
    Generating Encrypted Credentials

    Generate encrypted credentials for a custom application by running the following Go program, replacing the example key and credentials strings with an actual key and actual credentials. You can also run the program in a web browser at https://play.golang.org/p/nNBsZ8dhqr0arrow-up-right.

    [web]
encryption-key-file-path = "path/to/file"

• Column privileges other than SELECT, such as UPDATE and DELETE, are currently unsupported.

  • Column-level security is not supported on queries that use one or more views.

hashtag
Synopsis

    circle-info

    The <entity> referred to above can either be a role or user.

    circle-info

    The above GRANT and REVOKE commands can be compounded with other privileges. For example

    grants the SELECT column privilege on the table employees to test_user as well as UPDATE privileges.

    triangle-exclamation

When using UPDATE or DELETE on a table, any columns used in the WHERE condition must allow SELECT. That is, the entity issuing the command must have sufficient SELECT privileges on all columns in use. For example, SELECT privilege on the table being operated on is sufficient.

    triangle-exclamation

    Currently, when a query utilizes a view, column-level privileges are disabled. In such cases, only table-level privileges are considered. Consequently, queries that might have adequate column-level privileges but also involve a view will result in an insufficient privileges error.

    hashtag
    Examples

1. Grant SELECT on a single column.

2. Revoke SELECT on a single column.

The following also revokes column privileges.

3. Grant SELECT on multiple columns.

4. Revoke SELECT on multiple columns.

5. Granting SELECT on any column allows access to metadata.

6. Allowing SELECT privilege on a subset of columns enables certain queries and disables others.

7. Any subqueries used within a query enforce similar column-level security.

8. Table-level privileges supersede column-level privileges. Revoking column privileges does not affect table-level privileges.

    export HEAVYAI_BASE=/var/lib/heavyai
sudo mkdir -p $HEAVYAI_BASE
sudo chown -R $HEAVYAI_USER $HEAVYAI_BASE
    $HEAVYAI_PATH/bin/initheavy $HEAVYAI_BASE/storage
    port = 6274 
    http-port = 6278
    data = "/var/lib/heavyai/storage"
    null-div-by-zero = true
    
    [web]
    port = 6273
    frontend = "/opt/heavyai/frontend"
    servers-json = "/var/lib/heavyai/servers.json"
    enable-https = true
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin

mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub

add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
    
    apt update 
    
    nvidia-smi 
    
    apt install cuda-compat-11-0 
    
    nvidia-smi 
    
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/compat/ 
    
    nvidia-smi
Environment=LD_LIBRARY_PATH=/usr/local/cuda-11.0/compat:$LD_LIBRARY_PATH
    [Unit] 
    Description=HEAVY.AI database server 
    After=network.target remote-fs.target
    
    [Service] 
    Environment=LD_LIBRARY_PATH=/usr/local/cuda-11.0/compat:$LD_LIBRARY_PATH
    User=heavyai 
    Group=heavyai 
    WorkingDirectory=/opt/heavyai
    ExecStart=/opt/heavyai/bin/heavydb --config /var/lib/heavyai/heavy.conf 
    KillMode=control-group 
    SuccessExitStatus=143 
    LimitNOFILE=65536 
    Restart=always
    
    [Install] 
    WantedBy=multi-user.target
    generate_cert [{-ca} <bool>]
                  [{-duration} <duration>]
                  [{-ecdsa-curve} <string>]
                  [{-host} <host1,host2>]
                  [{-rsa-bits} <int>]
                  [{-start-date} <string>]
    CREATE TABLE ar (ai INT[], af FLOAT[], ad2 DOUBLE[2]); 
    INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}); 
    INSERT INTO ar VALUES (ARRAY[NULL,2],NULL,NULL); 
    INSERT INTO ar VALUES (NULL,{},{2.0,NULL});
    CREATE TABLE tbl (
       id INTEGER NOT NULL, 
       name TEXT NOT NULL DEFAULT 'John Doe', 
       age SMALLINT NOT NULL);
heavysql> SELECT * FROM TABLE(generate_random_strings(10, 20));
    id|rand_str
    0 |He9UeknrGYIOxHzh5OZC
    1 |Simnx7WQl1xRihLiH56u
    2 |m5H1lBTOErpS8is00YJ
    3 |eeDiNHfKzVQsSg0qHFS0
    4 |JwOhUoQEI6Z0L78mj8jo
    5 |kBTbSIMm25dvf64VMi
    6 |W3lUUvC5ajm0W24JML
    7 |XdtSQfdXQ85nvaIoyYUY
    8 |iUTfGN5Jaj25LjGJhiRN
    9 |72GUoTK2BzcBJVTgTGW
    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1
    --partitions 1 --topic matstream
    michael,1
    andrew,2
    ralph,3
    sandhya,4
    cat myfile | bin/kafka-console-producer.sh --broker-list localhost:9097
    --topic matstream
    /home/heavyai/build/bin/KafkaImporter stream1 heavyai -p HyperInteractive -u heavyai --port 6274 --batch 1 --brokers localhost:6283  
    --topic matstream --group-id 1
    
    Field Delimiter: ,
    Line Delimiter: \n
    Null String: \N
    Insert Batch Size: 1
    1 Rows Inserted, 0 rows skipped.
    2 Rows Inserted, 0 rows skipped.
    3 Rows Inserted, 0 rows skipped.
    4 Rows Inserted, 0 rows skipped.
    heavysql> select * from stream1;
    name|id
    michael|1
    andrew|2
    ralph|3
    sandhya|4
    create table stream1(name text, id int);
    package main
    
import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
    "io"
)
        
    // 1. Replace example key with encryption string
    var key = "v9y$B&E(H+MbQeThWmZq4t7w!z%C*F-J"
    
// 2. Replace strings "username", "password", "dbName" with credentials
    var stringsToBeEncrypted = []string{
        "username",
        "password",
        "dbName",
    }
    
    // 3. Run program to see encrypted credentials in console
    func main() {
        for i := range stringsToBeEncrypted {
            encrypted, err := EncryptString(stringsToBeEncrypted[i])
            if err != nil {
                panic(err)
            }
        fmt.Printf("%s => %s\n", stringsToBeEncrypted[i], encrypted)
        }
    }
    
func EncryptString(str string) (encrypted string, err error) {
        keyBytes := []byte(key)
        
        block, err := aes.NewCipher(keyBytes)
        if err != nil {
            panic(err.Error())
        }
        aesGCM, err := cipher.NewGCM(block)
        if err != nil {
            panic(err.Error())
        }
        nonce := make([]byte, aesGCM.NonceSize())
    if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
            panic(err.Error())
        }
        strBytes := []byte(str)
        
    cipherBytes := aesGCM.Seal(nonce, nonce, strBytes, nil)
        
        return fmt.Sprintf("%x", cipherBytes), err
    }
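For reference, decryption reverses this process: the nonce occupies the first NonceSize() bytes of the hex-decoded payload. The following sketch is not part of the shipped tooling; it assumes the same 32-byte example key and simply round-trips a value:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
)

// Same 32-byte example key as in the encryption program above.
var key = "v9y$B&E(H+MbQeThWmZq4t7w!z%C*F-J"

// encrypt mirrors EncryptString: a random nonce is prepended to the ciphertext.
func encrypt(str string) (string, error) {
	block, err := aes.NewCipher([]byte(key))
	if err != nil {
		return "", err
	}
	aesGCM, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, aesGCM.NonceSize())
	if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", aesGCM.Seal(nonce, nonce, []byte(str), nil)), nil
}

// decrypt reverses the process: split off the nonce, then open the ciphertext.
func decrypt(encHex string) (string, error) {
	data, err := hex.DecodeString(encHex)
	if err != nil {
		return "", err
	}
	block, err := aes.NewCipher([]byte(key))
	if err != nil {
		return "", err
	}
	aesGCM, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	n := aesGCM.NonceSize()
	if len(data) < n {
		return "", fmt.Errorf("ciphertext too short")
	}
	plain, err := aesGCM.Open(nil, data[:n], data[n:], nil)
	return string(plain), err
}

func main() {
	enc, err := encrypt("username")
	if err != nil {
		panic(err)
	}
	dec, err := decrypt(enc)
	if err != nil || dec != "username" {
		panic("round trip failed")
	}
	fmt.Println(dec)
}
```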
    GRANT SELECT (salary), UPDATE ON TABLE employees TO test_user;
    GRANT SELECT (<column1>,<column2>,...<columnN>) ON TABLE <table> TO <entity>;
    
    REVOKE SELECT (<column1>,<column2>,...<columnN>) ON TABLE <table> FROM <entity>;
    CREATE USER test_user (PASSWORD='test');
    CREATE TABLE employees (id INT, salary BIGINT);
    GRANT SELECT(id) ON TABLE employees TO test_user;
    REVOKE SELECT(id) ON TABLE employees FROM test_user;
    REVOKE ALL ON TABLE employees FROM test_user;
    GRANT SELECT (id,salary) ON TABLE employees TO test_user;
    REVOKE SELECT (id,salary) ON TABLE employees FROM test_user;
    -- Without privilege, the following exception will occur for test_user.
    -- "Violation of access privileges: user test_user has no proper privileges for object employees"
    SELECT count(*) FROM employees;
    
-- The following is run as a super-user or administrator.
    GRANT SELECT(id) ON TABLE employees TO test_user;
    -- The following works without issue for test_user.
    SELECT count(*) FROM employees; 
-- The following is run as a super-user or administrator.
    GRANT SELECT(id) ON TABLE employees TO test_user;
    
    -- The following query completes without error for test_user.
    SELECT id FROM employees;
    -- The following query does not complete and reports no proper privileges for test_user.
    SELECT id, salary FROM employees;
    -- The following query completes without error for test_user.
    SELECT * FROM (SELECT id FROM employees);
    
    -- The following query does not complete and reports no proper privileges for test_user.
    SELECT * FROM (SELECT id, salary FROM employees);
-- The following is run as a super-user or administrator.
    GRANT SELECT ON TABLE employees TO test_user;
    GRANT SELECT(id) ON TABLE employees TO test_user;
    
    -- The following query completes without error for test_user.
    SELECT id FROM employees;
    -- The following query completes without error for test_user.
    SELECT id, salary FROM employees;
    
-- The following is run as a super-user or administrator.
    REVOKE SELECT(id) ON TABLE employees FROM test_user;
    -- The following query completes without error for test_user. The user still has table-level privileges.
    SELECT id, salary FROM employees;
    hashtag
    Launching Your HEAVY.AI Instance

    On the solution Launcher Page, click Launch on Compute Engine to begin configuring your deployment.

    circle-exclamation

    Before deploying a solution with a GPU machine type, avoid potential deployment failure by checking your available quota for a projectarrow-up-right to make sure that you have not exceeded your limit.

    To launch HEAVY.AI on Google Cloud Platform, you select and configure a GPU-enabled instance.

    1. Search for HEAVY.AI on the heavyai-launcher-public project on Google Cloud Platformarrow-up-right, and select a solution. HEAVY.AI has four instance types:

      • HEAVY.AI Enterprise Edition (BYOL)arrow-up-right.

      • HEAVY.AI Enterprise Edition for CPU (BYOL)arrow-up-right.

      • .

      • .

    2. On the solution Launcher Page, click Launch to begin configuring your deployment.

    3. On the new deployment page, configure the following:

      • Deployment name

      • Zone

    4. Accept the GCP Marketplace Terms of Service and click Deploy.

    5. In the Deployment Manager, click the instance that you deployed.

    6. Launch the Heavy Immerse client:

      • Record the Admin password (Temporary).

      • Click the Site address link to go to the Heavy Immerse login page. Enter the password you recorded, and click Connect.


    See some tutorials and demos to help get up and running

    • Loading Data

    • Using OmniSci Immerse

    • Vega Tutorials

    Learn more about charts in Heavy Immerse

    • Heavy Immerse Chart Types

    Use HEAVY.AI in the cloud

    • Try HEAVY.AI Cloudarrow-up-right

    • Getting Started with AWS AMI

    • Getting Started with Microsoft Azure

    See what APIs work with HEAVY.AI

    • Vega Rendering API Overview

    • omnisql

    • Thrift

    Learn about features and resolved issues for each release

    • Release Notes

    Know what issues and limitations to look out for

    • Known Issues, Limitations, and Changes to Default Behavior

    See answers to frequently asked questions

    • FAQarrow-up-right

    Installation Recipes
    Upgrading OmniSci
    Configuration Flags and Runtime Settings

    Implementing a Secure Binary Interface

Follow these instructions to start a HEAVY.AI server with an encrypted main port.

    hashtag
    Required PKI Components

    You need the following PKI (Public Key Infrastructure) components to implement a Secure Binary Interface.

    • A CRT (short for certificate) file containing the server's PKI certificate. This file must be shared with the clients that connect using encrypted communications. Ideally, this file is signed by a recognized certificate issuing agency.

    • A key file containing the server's private key. Keep this file secret and secure.

    • A Java TrustStore containing the server's PKI certificate. The password for the trust store is also required.

    circle-info

    Although in this instance the trust store contains only information that can be shared, the Java TrustStore program requires it to be password protected.

    • A Java KeyStore and password.

    • In a distributed system, add the configuration parameters to the heavyai.conf file on the aggregator and all leaf nodes in your HeavyDB cluster.

    hashtag
    Demonstration Script to Create "Mock/Test" PKI Components

You can use OpenSSL utilities to create the various PKI elements. The server certificate in this instance is self-signed and should not be used in a production system.

    1. Generate a new private key.

    2. Use the private key to generate a certificate signing request.

    3. Self sign the certificate signing request to create a public certificate.

    To generate a keystore file from your server key:

    1. Copy server.key to server.txt. Concatenate it with server.crt.

    2. Use server.txt to create a PKCS12 file.

    3. Use server.p12 to create a keystore.
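The steps above can be sketched with OpenSSL and the JDK's keytool (filenames, the certificate subject, and the changeit password are placeholders for this mock setup, not production values):

```shell
# 1. Generate a new private key
openssl genrsa -out server.key 2048
# 2. Use the private key to generate a certificate signing request
openssl req -new -key server.key -out server.csr -subj "/CN=localhost"
# 3. Self-sign the request to create a public certificate (not for production)
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
# Concatenate key and certificate, then build a PKCS12 file and a keystore
cat server.key server.crt > server.txt
openssl pkcs12 -export -in server.txt -out server.p12 -name heavyai -passout pass:changeit
keytool -importkeystore -srckeystore server.p12 -srcstoretype pkcs12 -srcstorepass changeit \
        -destkeystore server.jks -deststorepass changeit
```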

    hashtag
    Start the Server in Encrypted Mode with PKI Client Authentication

    Start the server using the following options.

    hashtag
    Example

    hashtag
    Configuring heavyai.conf for Encrypted Connection

    Alternatively, you can add the following configuration parameters to heavyai.conf to establish a Secure Binary Interface. The following configuration flags implement the same encryption shown in the runtime example above:

    circle-info

    Passwords for the SSL truststore and keystore can be enclosed in single (') or double (") quotes.
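Such a heavy.conf section might look like the following sketch (verify each flag name against the Configuration Parameters reference; paths and passwords are placeholders):

```
ssl-cert = "/opt/heavyai/server.crt"
ssl-private-key = "/opt/heavyai/server.key"
ssl-trust-store = "/opt/heavyai/server.jks"
ssl-trust-password = 'changeit'
ssl-keystore = "/opt/heavyai/server.jks"
ssl-keystore-password = 'changeit'
```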

    hashtag
    Why Use Both server.crt and a Java TrustStore?

The server.crt file and the Java truststore contain the same public key information in different formats. The server requires both: it uses them to secure client communication with its various interfaces and to secure communication with its Calcite server. At startup, the Java truststore is passed to the Calcite server for authentication and to encrypt its traffic with the HEAVY.AI server.

    Using Services

    HEAVY.AI features two system services: heavydb and heavy_web_server. You can start these services individually using systemd.

    hashtag
    Starting and Stopping HeavyDB Using systemd

    For permanent installations of HeavyDB, HEAVY.AI recommends that you use systemd to manage HeavyDB services. systemd automatically handles tasks such as log management, starting the services on restart, and restarting the services if there is a problem.

In addition, systemd manages the open-file limit in Linux. Some cloud providers and distributions set this limit too low, which can result in errors as your HEAVY.AI environment and usage grow. For more information about adjusting the limits on open files, see the Troubleshooting and Monitoring Solutions section of our knowledge base.

    hashtag
    Initial Setup

    You use the install_heavy_systemd.sh script to prepare systemd to run HEAVY.AI services. The script asks questions about your environment, then installs the systemd service files in the correct location. You must run the script as the root user so that the script can perform tasks such as creating directories and changing ownership.

    The install_heavy_systemd.sh script asks for the information described in the following table.

    Variable
    Use
    Default
    Notes

    hashtag
    Starting HeavyDB Using systemd

    To manually start HeavyDB using systemd, run:

    hashtag
    Restarting HeavyDB Using systemd

    You can use systemd to restart HeavyDB — for example, after making configuration changes:

    hashtag
    Stopping HeavyDB Using systemd

    To manually stop HeavyDB using systemd, run:

    hashtag
    Enabling HeavyDB on Startup

    To enable the HeavyDB services to start on restart, run:

    hashtag
    Using Configuration Parameters

You can customize the behavior of your HEAVY.AI servers by modifying your heavy.conf configuration file. See Configuration Parameters.

    Licensing

    The Enterprise Edition of HEAVY.AI contains a rich set of analytical and location intelligence features at various capacity levels. A license provides the ability to use these features at specific capacities (e.g. specific number of GPUs) for a specified period of time, typically based on an end user license agreement.

A new license version and mechanism was implemented in the HEAVY.AI 8.0 release. This is a breaking change: all enterprise customers must request a new license before upgrading to the 8.0 release. This document describes the new licensing options, provides guidance on choosing the best option for your organization, and explains how to request and apply an appropriate license.

Starting with the 8.0 release, HEAVY.AI supports two types of licenses: Node Locked Licenses and Floating Licenses.


    Upgrading HEAVY.AI

This section provides a recipe for upgrading between fully compatible product versions.


    circle-exclamation

As with any software upgrade, it is important that you back up your data before upgrading. Each release introduces efficiencies that are not necessarily compatible with earlier releases of the platform. HEAVY.AI releases are never expected to be backward compatible.

    Back up the contents of your $HEAVYAI_STORAGE directory.

    Exporting Data

    hashtag
    COPY TO

    <file path> must be a path on the server. This command exports the results of any SELECT statement to the file. There is a special mode when <file path> is empty. In that case, the server automatically generates a file in <HEAVY.AI Directory>/export that is the client session id with the suffix .txt.

    generate_series

    hashtag
    generate_series (Integers)

    Generate a series of integer values.

    hashtag

    Table Expression and Join Support

    If a join column name or alias is not unique, it must be prefixed by its table name.

You can use BIGINT, INTEGER, SMALLINT, TINYINT, DATE, TIME, TIMESTAMP, or TEXT ENCODING DICT data types. TEXT ENCODING DICT is the most efficient because corresponding dictionary IDs are sequential and span a smaller range than, for example, the 65,535 values supported in a SMALLINT field. Depending on the number of values in your field, you can use TEXT ENCODING DICT(32) (up to approximately 2,150,000,000 distinct values), TEXT ENCODING DICT(16) (up to 64,000 distinct values), or TEXT ENCODING DICT(8) (up to 255 distinct values). For more information, see Data Types and Fixed Encoding.

    hashtag
    Geospatial Joins

    SQL Extensions

    HEAVY.AI implements a number of custom extension functions to SQL.

    hashtag
    Rendering

    The following table describes SQL extensions available for the HEAVY.AI implementation of Vega.

    Uber H3 Hexagonal Modeling

    hashtag
    Uber H3 Functions

    hashtag
    Overview

    Uber H3 is an open-source geospatial system created by Uber Technologies. H3 provides a hierarchical grid system that divides the Earth's surface into hexagons of varying sizes, allowing for easy location-based indexing, search, and analysis.


HEAVYAI_STORAGE

Path to the top-level storage directory

/var/lib/heavyai

Must be dedicated to HEAVY.AI. The installation script creates the directory $HEAVYAI_STORAGE/data, generates an appropriate configuration file, and saves the file as $HEAVYAI_STORAGE/heavy.conf.

    HEAVYAI_USER

    User HeavyDB is run as

    Current user

    User must exist before you run the script.

    HEAVYAI_GROUP

    Group HeavyDB is run as

    Current user's primary group

    Group must exist before you run the script.

    HEAVYAI_PATH

    Path to HeavyDB installation directory

    Current install directory

    HEAVY.AI recommends heavyai as the install directory.

    HEAVYAI_BASE

    Path to the storage directory for HeavyDB data and configuration files

    Why am I seeing the error "Too many open files...erno24"arrow-up-right
    Troubleshooting and Monitoring Solutionsarrow-up-right
    Configuration Parameters

    heavyai

    When possible, joins involving a geospatial operator (such as ST_Contains) build a binned spatial hash table (overlaps hash join), falling back to a Cartesian loop join if a spatial hash join cannot be constructed.

    The enable-overlaps-hashjoin flag controls whether the system attempts to use the overlaps spatial join strategy (true by default). If enable-overlaps-hashjoin is set to false, or if the system cannot build an overlaps hash join table for a geospatial join operator, the system attempts to fall back to a loop join. Loop joins can be performant in situations where one or both join tables have a small number of rows. When both tables grow large, loop join performance decreases.

Two flags control whether the system allows loop joins for a query (geospatial or not): allow-loop-joins and trivial-loop-join-threshold. By default, allow-loop-joins is set to false and trivial-loop-join-threshold to 1,000 (rows). If allow-loop-joins is set to true, the system allows any query with a loop join, regardless of table cardinalities (measured in number of rows). If left at the implicit default of false, or set explicitly to false, the system allows loop join queries as long as the inner table (right-side table) has fewer rows than the threshold specified by trivial-loop-join-threshold.
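As a sketch, the defaults described above correspond to the following heavy.conf fragment; include a flag only if you want to change its value:

```
# Attempt the binned spatial hash (overlaps) join strategy first.
enable-overlaps-hashjoin = true
# Permit loop joins only when the inner table is under the threshold.
allow-loop-joins = false
trivial-loop-join-threshold = 1000
```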

    For optimal performance, the system should utilize overlaps hash joins whenever possible. Use the following guidelines to maximize the use of the overlaps hash join framework and minimize fallback to loop joins when conducting geospatial joins:

    • The inner (right-side) table should always be the more complicated primitive. For example, for ST_Contains(polygon, point), the point table should be the outer (left) table and the polygon table should be the inner (right) table.

    • Currently, ST_CONTAINS and ST_INTERSECTS joins between point and polygon/multipolygon tables, and ST_DISTANCE < {distance} joins between two point tables, are supported for accelerated overlaps hash join queries.

    • For pointwise-distance joins, only the pattern WHERE ST_DISTANCE(table_a.point_col, table_b.point_col) < distance_in_degrees supports overlaps hash joins. Patterns like the following fall back to loop joins:

      • WHERE ST_DWITHIN(table_a.point_col, table_b.point_col, distance_in_degrees)

      • WHERE ST_DISTANCE(ST_TRANSFORM(table_a.point_col, 900913), ST_TRANSFORM(table_b.point_col, 900913)) < 100

    hashtag
    Using Joins in a Distributed Environment

    You can create joins in a distributed environment in two ways:

    • Replicate small dimension tables that are used in the join.

    • Create a shard key on the column used in the join (note that there is a limit of one shard key per table). If the column involved in the join is a TEXT ENCODED field, you must create a SHARED DICTIONARY that references the FACT table key you are using to make the join.

    circle-info

    The join order for one small table and one large table matters. If you swap the sales and customer tables on the join, it throws an exception stating that table "sales" must be replicated.

    Data Types and Fixed Encodingarrow-up-right
    Use the Java tools to create a key store from the public certificate.
    Node Locked Licenses

Node locked licenses restrict the use of the HEAVY.AI platform to a single machine. A node locked license is generated for a specific machine that an organization has set up to run their HEAVY.AI instance and is appropriate for organizations with single-instance deployment agreements and dedicated machines for HEAVY.AI use.

To get started with a node locked license, a unique machine identifier called a "Host ID" must be derived from the machine on which HEAVY.AI will be running. The Host ID can be displayed using the heavysql "\status" command. For example:

The HEAVY.AI instance that the heavysql client connects to must be launched on the machine in question, and with privileged access (for example, using "sudo"), in order to view the Host ID and use a node locked license. See the "Using systemd to Launch HeavyDB with Privileged Access" section for more details about how to launch HeavyDB with privileged access using systemd. For HEAVY.AI Docker deployments, no additional deployment steps are required.

    triangle-exclamation

    Note that changing fundamental hardware components of the machine can result in a change in the Host ID. If this happens, please reach out to the Customer Success team for assistance.

The Host ID is only meaningful within the context of HEAVY.AI and cannot be used to derive any information about the machine on which HEAVY.AI is running.

If you intend to get a node locked license, create a new Node Locked License Request support ticket and provide the Host ID in the ticket description.

    hashtag
    Floating Licenses

Floating licenses allow for the deployment of the HEAVY.AI platform on multiple machines on the same network. Floating licenses require the deployment of another type of server called a "License Server". The License Server monitors the use of HEAVY.AI licenses and resources across deployments and determines whether a deployment should proceed based on license restrictions related to the number of allowed deployments and total resources consumed across all deployments. When all license-specified resources are used to the limit, no additional HEAVY.AI deployments are allowed. A floating license is appropriate for organizations that have multiple instance deployment agreements, or that have single instance deployment agreements but cannot rely on a dedicated machine for HEAVY.AI use (for example, cloud deployments on non-dedicated hosts).

Similar to the process for node locked license deployments, a Host ID must be derived for the machine on which the License Server will run (that is, the License Server is node locked). The steps for getting the Host ID are the same as those in the "Node Locked Licenses" section above. If you intend to get a floating license, create a new Floating License Request support ticket and provide the following information in the ticket description:

    1. Host ID of machine on which the License Server will be running.

    2. Hostname of License Server (as seen by the HeavyDB instances).

    3. License Server port number (as seen by the HeavyDB instances).

    hashtag
    License Server Deployment

    The License Server can be deployed using the heavyai/heavyai-license-server docker image. A directory containing an initially empty storage directory and license_server.conf file should be bound to the /var/lib/heavyai path. Example deployment:

    circle-info

Organizations that have multiple teams, each using a different HEAVY.AI floating license, can use the same License Server for all HEAVY.AI deployments across all teams. The License Server is a lightweight application that can be deployed on a small CPU-only machine.

    hashtag
    Upgrade Scenarios

    hashtag
No Existing HEAVY.AI Deployments

Customers with no existing HEAVY.AI deployments can follow the above steps for getting either a node locked or floating license using the latest HEAVY.AI release.

    hashtag
Existing License on a HEAVY.AI 7.x.x Deployment

Customers with existing 7.x.x HEAVY.AI deployments should follow the above steps for getting either a node locked or floating license using their existing deployment. Upgrade to the latest release (or any 8.0 or later release) only after getting a new license from the Customer Success team. The old heavy.license file should be deleted or renamed before attempting to upgrade.

    hashtag
Existing License on a HEAVY.AI 6.x.x Deployment

Customers with existing 6.x.x HEAVY.AI deployments should first upgrade to a 7.0.x release and then follow the steps in the "Existing License on a HEAVY.AI 7.x.x Deployment" section above.

    hashtag
    Using systemd to Launch HeavyDB with Privileged Access

To launch HeavyDB with privileged access using systemd, remove the [Service] User and [Service] Group fields from the /lib/systemd/system/heavydb.service file (see CUDA Compatibility Drivers for more details). The updated file should look like:

    After updating the service file, force reload of the systemd configuration by executing the following command:

    hashtag
    Upgrading from Omnisci

If you need to upgrade from OmniSci to HEAVY.AI 6.0 or later, refer to the specific recipe.

    triangle-exclamation

Direct upgrades from OmniSci to HEAVY.AI versions later than 6.0 are not allowed or supported.

    hashtag
    Upgrading Using Docker

To upgrade HEAVY.AI in place in Docker:

    In a terminal window, get the Docker container ID.

    You should see output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c:

    Stop the HEAVY.AI Docker container. For example:

    Optionally, remove the HEAVY.AI Docker container. This removes unused Docker containers on your system and saves disk space.

Back up the OmniSci data directory (typically /var/lib/omnisci).

Download the latest version of the HEAVY.AI Docker image for your edition and device. Select the tab for the Edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you are upgrading.

    circle-info

If you don't want to upgrade to the latest version but want to upgrade to a specific version, change the latest tag to the version needed.

For example, if the version needed is 6.0, use v6.0.0 as the version tag in the image name:

    heavyai/heavyai-ee-cuda:v6.0.0

Check that the container is up and running with a docker ps command:

    You should see an output similar to the following.

    This runs both the HEAVY.AI database and Immerse in the same container.

    circle-info

    You can optionally add --rm to the Docker run command so that the container is removed when it is stopped.

    See also the note regarding the CUDA JIT Cachearrow-up-right in Optimizing Performance.

    hashtag
    Upgrading HEAVY.AI Using Package Managers and Tarball

To upgrade an existing system installed with package managers or a tarball, use the following commands. The commands upgrade HEAVY.AI in place without disturbing your configuration or stored data.

    Stop the HEAVY.AI services.

    Back up your $HEAVYAI_STORAGE directory (the default location is /var/lib/heavyai).

    Run the appropriate set of commands depending on the method used to install the previous version of the software.

Make a backup of your current installation.

Download and install the latest version, following the installation documentation for your operating system (CentOS/RHEL or Ubuntu).

    When the upgrade is complete, start the HEAVY.AI services.

Upgrading from Omnisci to HEAVY.AI 6.0
    cd $HEAVYAI_PATH/systemd
    sudo ./install_heavy_systemd.sh
    sudo systemctl start heavydb
    sudo systemctl start heavy_web_server
    sudo systemctl restart heavydb
    sudo systemctl restart heavy_web_server
    sudo systemctl stop heavydb
    sudo systemctl stop heavy_web_server
    sudo systemctl enable heavydb
    sudo systemctl enable heavy_web_server
-- Table customers is very small
    CREATE TABLE sales (
    id INTEGER,
    customerid TEXT ENCODING DICT(32),
    saledate DATE ENCODING DAYS(32),
    saleamt DOUBLE);
    
    CREATE TABLE customers (
    id TEXT ENCODING DICT(32),
    someid INTEGER,
    name TEXT ENCODING DICT(32))
WITH (partitions = 'replicated'); -- this causes the entire contents of this table to be replicated to each leaf node. Only recommended for small dimension tables.
    SELECT c.id, c.name from sales s inner join customers c on c.id = s.customerid limit 10;
    CREATE TABLE sales (
    id INTEGER,
customerid BIGINT, -- note the numeric datatype, so we don't need to specify a shared dictionary on the customer table
    saledate DATE ENCODING DAYS(32),
    saleamt DOUBLE,
    SHARD KEY (customerid))
WITH (SHARD_COUNT = <num gpus in cluster>);
    
    CREATE TABLE customers (
id BIGINT,
    someid INTEGER,
    name TEXT ENCODING DICT(32)
    SHARD KEY (id))
    WITH (SHARD_COUNT=<num gpus in cluster>);
    
    SELECT c.id, c.name FROM sales s INNER JOIN customers c ON c.id = s.customerid LIMIT 10;
    CREATE TABLE sales (
    id INTEGER,
    customerid TEXT ENCODING DICT(32),
    saledate DATE ENCODING DAYS(32),
    saleamt DOUBLE,
    SHARD KEY (customerid))
WITH (SHARD_COUNT = <num gpus in cluster>);
    
-- note the difference when customerid is a text encoded field:
    
    CREATE TABLE customers (
    id TEXT,
    someid INTEGER,
    name TEXT ENCODING DICT(32),
    SHARD KEY (id),
    SHARED DICTIONARY (id) REFERENCES sales(customerid))
WITH (SHARD_COUNT = <num gpus in cluster>);
    
    SELECT c.id, c.name FROM sales s INNER JOIN customers c ON c.id = s.customerid LIMIT 10;
    <table> , <table> WHERE <column> = <column>
    <table> [ LEFT ] JOIN <table> ON <column> = <column>
    openssl genrsa -out server.key 2048
    openssl req -new -key server.key -out server.csr
    openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
    cp server.key server.txt
    cat server.crt >> server.txt
    openssl pkcs12 -export -in server.txt -out server.p12
    keytool -importkeystore -v -srckeystore server.p12  -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype pkcs12
    --pki-db-client-auth true
    --ssl-cert 
    --ssl-private-key 
    --ssl-trust-store 
    --ssl-trust-password 
    --ssl-keystore 
    --ssl-keystore-password 
    --ssl-trust-ca 
    --ssl-trust-ca-server 
    sudo start heavyai_server --port 6274 --data /data --pki-db-client-auth true  
    --ssl-cert /tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem 
    --ssl-private-key /tls_certs/self_signed_server.example.com_self_signed/private/self_signed_server.example.com_key.pem 
    --ssl-trust-store /tls_certs/self_signed_server.example.com_self_signed/trust_store_self_signed_server.example.com.jks 
    --ssl-trust-password truststore_password 
--ssl-keystore /tls_certs/self_signed_server.example.com_self_signed/key_store_self_signed_server.example.com.jks
    --ssl-keystore-password keystore_password 
--ssl-trust-ca /tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem 
    --ssl-trust-ca-server /tls_certs/ca_primary/ca_primary_cert.pem
    # Start pki authentication 
    pki-db-client-auth = true 
    ssl-cert = "/tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem" 
    ssl-private-key = "/tls_certs/self_signed_server.example.com_self_signed/private/self_signed_server.example.com_key.pem" 
    ssl-trust-store = "/tls_certs/self_signed_server.example.com_self_signed/trust_store_self_signed_server.example.com.jks" 
    ssl-trust-password = "truststore_password"  
    ssl-keystore = "/tls_certs/self_signed_server.example.com_self_signed/key_store_self_signed_server.example.com.jks" 
    ssl-keystore-password = "keystore_password" 
    ssl-trust-ca = "/tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem" 
    ssl-trust-ca-server = "/tls_certs/ca_primary/ca_primary_cert.pem" 
    keytool -importcert  -file server.crt -keystore server.jks
    $ echo "\status" | ./bin/heavysql -p HyperInteractive
    License invalid
    Not connected to any database. See \h for help.
    Server Version                      : 7.2.4 Enterprise Edition
    Host ID                             : 1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqr
    --------------------------------------------------
    Server Name                         : example.com
    Server Start Time                   : 2024-03-01 : 00:00:00
    $ mkdir -p ~/heavyai-license-server/storage
    $ echo "port = 6278" > ~/heavyai-license-server/license_server.conf
    
    $ docker run --rm -p 6280:6280 \
      -v ~/heavyai-license-server:/var/lib/heavyai \
      heavyai/heavyai-license-server
    [Unit] 
    Description=HEAVY.AI database server 
    After=network.target remote-fs.target
    
    [Service] 
    Environment=LD_LIBRARY_PATH=/usr/local/cuda-11.0/compat:$LD_LIBRARY_PATH
    WorkingDirectory=/opt/heavyai
    ExecStart=/opt/heavyai/bin/heavydb --config /var/lib/heavyai/heavy.conf 
    KillMode=control-group 
    SuccessExitStatus=143 
    LimitNOFILE=65536 
    Restart=always
    
    [Install] 
    WantedBy=multi-user.target
    sudo systemctl daemon-reload
    sudo docker run -d --gpus=all \
      -v /var/lib/heavyai:/var/lib/heavyai \
      -p 6273-6278:6273-6278 \
      heavyai/heavyai-ee-cuda:latest
    sudo docker run -d -v \
    /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cpu:latest
sudo docker run -d --gpus=all \
      -v /var/lib/heavyai:/var/lib/heavyai \
      -p 6273-6278:6273-6278 \
      heavyai/core-os-cuda:latest
    sudo docker run -d -v \
    /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cpu:latest
    sudo yum update heavyai.x86_64
    sudo apt update
    sudo apt upgrade heavyai
    sudo mv /opt/heavyai /opt/heavyai_backup
sudo docker container ps --format "{{.ID}} {{.Image}}" \
    -f status=running | grep omnisci\/
    9e01e520c30c omnisci/omnisci-ee-gpu
    docker container stop 9e01e520c30c
    docker container rm 9e01e520c30c
    tar zcvf /backup_dir/omnisci_storage_backup.tar.gz /var/lib/omnisci
    sudo docker container ps --format "{{.Image}} {{.Status}}" \
    -f status=running | grep heavyai\/
    heavyai/heavyai-ee-cuda Up 48 seconds ago 
    sudo systemctl stop heavydb heavy_web_server
    sudo systemctl start heavydb heavy_web_server
  • Machine type - Click Customize and configure Cores and Memory, and select Extend memory if necessary.
  • GPU type. (Not applicable for CPU configurations.)

  • Number of GPUs - (Not applicable for CPU configurations.) Select the number of GPUs; subject to quota and GPU type by region. For more information about GPU-equipped instances and associated resources, see GPU Models for Compute Enginearrow-up-right.

  • Boot disk type

  • Boot disk size in GB

  • Networking - Set the Network, Subnetwork, and External IP.

  • Firewall - Select the required ports to allow TCP-based connectivity to HEAVY.AI. Click More to set IP ranges for port traffic and IP forwarding.

  • Copy your license key from the registration email message. If you have not received your license key, contact your Sales Representative or register for your 30-day trial herearrow-up-right.

  • Connect to Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

  • When prompted, paste your license key in the text box and click Apply.

  • Click Connect to start using HEAVY.AI.

  • On successful login, you see a list of sample dashboards loaded into your instance.

    HEAVY.AI Open Source Editionarrow-up-right
    HEAVY.AI for CPU (Open Source)arrow-up-right
    Available properties in the optional WITH clause are described in the following table.
    Parameter
    Description
    Default Value

    array_null_handling

    Define how to export with arrays that have null elements:

    • 'abort' - Abort the export. Default.

    • 'raw' - Export null elements as raw values.

    'abort'

    delimiter

    A single-character string for the delimiter between column values; most commonly:

    • , for CSV files

    • \t (tab character) for tab-delimited files

    Other delimiters include ~, ^, and ;.

    Applies to only CSV and tab-delimited files.

    Note: HEAVY.AI does not use file extensions to determine the delimiter.

    circle-info

    When using the COPY TO command, you might encounter the following error:

    To avoid this error, use the heavysql command \cpu to put your HEAVY.AI server in CPU mode before using the COPY TO command. See Configurationarrow-up-right.

    hashtag
    Example
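A minimal sketch of COPY TO; the sales table, its columns, and the output path here are hypothetical:

```sql
-- Export a query result to a server-side CSV file.
COPY (SELECT id, saledate, saleamt FROM sales)
  TO '/tmp/sales_export.csv' WITH (header = 'true', delimiter = ',');
```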

    Input Arguments
    Parameter
    Description
    Data Types

    <series_start>

    Starting integer value, inclusive.

    BIGINT

    <series_end>

    Ending integer value, inclusive.

    BIGINT

    hashtag
    Output Columns

    Name
    Description
    Data Types

    generate_series

    The integer series specified by the input arguments.

    Column<BIGINT>

    Example
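A minimal sketch; as with other HeavyDB table functions, generate_series is invoked through the TABLE() operator:

```sql
-- Produce the integers 1 through 5 in a column named generate_series.
-- Both bounds are inclusive, per the argument descriptions above.
SELECT * FROM TABLE(generate_series(1, 5));
```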

    hashtag
    generate_series (Timestamps)

Generate a series of timestamp values from start_timestamp to end_timestamp.

    Input Arguments

    Parameter
    Description
    Data Types

    series_start

    Starting timestamp value, inclusive.

TIMESTAMP(9) (timestamp literals with other precisions are auto-cast to TIMESTAMP(9))

    series_end

    Ending timestamp value, inclusive.

TIMESTAMP(9) (timestamp literals with other precisions are auto-cast to TIMESTAMP(9))

    Output Columns

    Name
    Description
    Output Types

    generate_series

    The timestamp series specified by the input arguments.

    COLUMN<TIMESTAMP(9)>

    Example
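A sketch, assuming a step interval is passed as a third argument (the argument table above lists only the start and end bounds):

```sql
-- One timestamp per day from Jan 1 through Jan 4, 2021, inclusive.
SELECT * FROM TABLE(
  generate_series(
    TIMESTAMP(9) '2021-01-01 00:00:00.000000000',
    TIMESTAMP(9) '2021-01-04 00:00:00.000000000',
    INTERVAL '1' DAY));
```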

    hashtag
    SQL SELECT
    Function
    Arguments and Return

    convert_meters_to_merc_pixel_width(meters, lon, lat, min_lon, max_lon, img_width, min_width)

    Converts a distance in meters in a longitudinal direction from a latitude/longitude coordinate to a pixel size using mercator projection:

    • meters: Distance in meters in a longitudinal direction to convert to pixel units.

    • lon: Longitude coordinate of the center point to size from.

    convert_meters_to_merc_pixel_height(meters, lon, lat, min_lat, max_lat, img_height, min_height)

    Converts a distance in meters in a latitudinal direction from a latitude/longitude coordinate to a pixel size, using mercator projection:

    • meters: Distance in meters in a latitudinal direction to convert to pixel units.

    • lon: Longitude coordinate of the center point to size from.

    convert_meters_to_pixel_width(meters, pt, min_lon, max_lon, img_width, min_width)

    Converts a distance in meters in a longitudinal direction from a latitude/longitude POINT to a pixel size. Supports only mercator-projected points.

    • meters: Distance in meters in a longitudinal direction to convert to pixel units.

    • pt: The center POINT to size from. The point must be defined in the EPSG:4326 spatial reference system.

    convert_meters_to_pixel_height(meters, pt, min_lat, max_lat, img_height, min_height)

    Hexagons can be created at a single scale, for instance to fill an arbitrary polygon at one resolution (see below). They can also be used to generate a much-smaller number of hexagons at multiple scales. In general, operating on H3 hexagons is much faster than on raw arbitrary geometries, at a cost of some precision. Because each hexagon is exactly the same size, this is particularly advantageous for GPU-accelerated workflows.

    A single-scale tessellation of California into Uber H3 hexagons
    A multi-scale tessellation of California into Uber H3 hexagons

    hashtag
    Advantages

A principal advantage of the system is that, for a given scale, hexagons are of approximately equal area. This stands in contrast to other subdivision schemes based on longitudes and latitudes or web Mercator map projections.

    A second advantage is that with hexagons, neighbors in all directions are equidistant. This is not true for rectangular subdivisions like pixels, whose 8 neighbors are at different distances.

    Pixel neighbors vary in distance
    Hexagon neighbors are equidistant

The exact amount of precision lost can be tightly bounded, with the smallest supported hexagons being about 1 m². That is more accurate than most currently available data sources, short of survey data.

    hashtag
    Disadvantages

There are some disadvantages to be aware of. The first is that the world cannot actually be divided up completely cleanly into hexagons. It turns out that a few pentagons are needed, and this introduces discontinuities. However, the system has cleverly placed those pentagons far away from any land masses, so in practice this is a concern only for specific maritime operations.

    The second issue is that hexagons at adjacent scales do not nest exactly:

    Hexagons at varying scales do not nest cleanly

This doesn't much affect practical operations at any single given scale. But if you look carefully at the California multi-scale plot above, you will discover tiny discontinuities in the form of gaps or overlaps. These don't amount to a large percentage of the total area, but they do mean this method is not appropriate when exact counts are required.

    hashtag
    Supported Methods

    hashtag
    Coordinates to H3 Indices

    H3_PointToCell(POINT p, INTEGER resolution) -> BIGINT

    H3_LonLatToCell(DOUBLE lon, DOUBLE lat, INTEGER resolution) -> BIGINT

These functions take a world-space coordinate (in WGS84/SRID 4326) as either a POINT or a pair of DOUBLE values, and a resolution value as INTEGER (which must be in the range 0 to 15), and return an H3 index as a BIGINT corresponding to the cell at that resolution whose center point is nearest to the given point. The index value may be projected, stored in a table, used as a join key, or passed as input to one of the other functions.

    Note: H3 indices in this form are intended to be immutable values and not human-readable. Any manipulation of the value may destroy its content.

    Note: The H3_PointToCell function only accepts POINT projections from columns. It does not accept temporary values or literals.
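The calling pattern can be sketched as follows (the table and column names here are hypothetical):

```sql
-- Derive H3 cells at resolution 10 from raw lon/lat columns and persist
-- them for later joins or aggregation.
CREATE TABLE trip_cells AS
SELECT trip_id, H3_LonLatToCell(pickup_lon, pickup_lat, 10) AS h3_cell
FROM taxi_trips;
```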

H3 Indices to Coordinates

    H3_CellToLon(BIGINT cell) -> DOUBLE

    H3_CellToLat(BIGINT cell) -> DOUBLE

These functions take an H3 Index value as BIGINT and convert it back to a world-space coordinate (in WGS84/SRID4326), returning the longitude or latitude value, respectively, of the center point of the cell represented by the input index.
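For example, the cell-center coordinates of stored indices can be recovered like this (table and column names hypothetical):

```sql
-- Project the center point of each stored H3 cell.
SELECT h3_cell,
       H3_CellToLon(h3_cell) AS center_lon,
       H3_CellToLat(h3_cell) AS center_lat
FROM trip_cells
LIMIT 5;
```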

Conversions to and from Hex String Representation

    H3_CellToString_TEXT(BIGINT cell) -> TEXT ENCODING DICT

    H3_CellToString_TEXT_NONE(BIGINT cell) -> TEXT ENCODING NONE

    These functions take an H3 Index value as BIGINT and convert it to the corresponding H3 Hex String representation. There are two variants of the function depending on whether the output needs to be a Dictionary-Encoded or None-Encoded TEXT value. The latter is fine for projection for display. The former should be used if required as a Join Key. Note that heavy use of the Dictionary-Encoded version to generate many unique values may result in very large string dictionaries in the database.

    H3_StringToCell([TEXT ENCODING DICT|TEXT ENCODING NONE]) -> BIGINT

This function performs the reverse, taking an H3 Hex String as either a Dictionary-Encoded or None-Encoded TEXT value and returning the corresponding numeric representation as BIGINT. It must be used to convert H3 Indices stored as TEXT columns into numeric values before passing them to the other functions.
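A round trip through the hex-string form should return the original BIGINT value (table and column names hypothetical):

```sql
-- Convert a stored index to hex and back; h3_cell and round_trip should match.
SELECT h3_cell,
       H3_CellToString_TEXT_NONE(h3_cell) AS cell_hex,
       H3_StringToCell(H3_CellToString_TEXT_NONE(h3_cell)) AS round_trip
FROM trip_cells
LIMIT 1;
```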

Getting the Index Parent

    H3_CellToParent(BIGINT cell, INTEGER resolution) -> BIGINT

This function takes an H3 Index value as BIGINT and returns the H3 Index of the parent cell containing the input cell at the specified resolution, which must be coarser (and hence corresponds to a larger cell).
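This is useful for rolling fine-grained cells up to a coarser level, for example (table and column names hypothetical):

```sql
-- Count members of each resolution-6 parent of fine-grained cells.
SELECT H3_CellToParent(h3_cell, 6) AS parent_cell, COUNT(*) AS n
FROM trip_cells
GROUP BY parent_cell;
```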

Checking Index Validity

    H3_IsValidCell(BIGINT cell) -> BOOL

    This function checks an H3 Index value for validity. Invalid input values result in a FALSE result.

Converting Index to Geometry

    H3_CellToBoundary_WKT(BIGINT cell) -> TEXT ENCODING DICT

This function takes an H3 Index as BIGINT and returns a WKT (Well-Known Text) representation of the geo POLYGON of the cell boundary as a Dictionary-Encoded TEXT value. Note that the polygon may be a hexagon or a pentagon, depending on the cell location. See the H3 documentation for more details.
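For example, projecting the boundary polygon of stored cells (table and column names hypothetical):

```sql
-- Return the cell boundary as a WKT POLYGON string.
SELECT H3_CellToBoundary_WKT(h3_cell) AS boundary_wkt
FROM trip_cells
LIMIT 1;
```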

Query Execution

    As the implementation of these functions is provided by the open-source H3 SDK, query stages containing them will run on CPU only.

H3 Usage Notes

Uber's H3 Python library provides a wider range of functions than those available above (although at significantly slower performance). The library defaults to generating H3 codes as hexadecimal strings, but can be configured to produce BIGINT codes. See the H3 documentation for more details.

    H3 codes can be used in regular joins, including joins in Immerse. They can also be used as aggregators, such as in Immerse custom dimensions. For points which are exactly aligned, such as imports from raster data bands of the same source, aggregating on H3 codes is faster than the exact geographic overlaps function ST_EQUALS.
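A join on shared H3 cells can be sketched as follows (table and column names hypothetical):

```sql
-- Join two point datasets on a shared H3 cell at resolution 9.
SELECT COUNT(*)
FROM (SELECT H3_LonLatToCell(lon, lat, 9) AS cell FROM sensors_a) a
JOIN (SELECT H3_LonLatToCell(lon, lat, 9) AS cell FROM sensors_b) b
  ON a.cell = b.cell;
```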

    Welcome to HEAVY.AI Documentation


Use of HEAVY.AI is subject to the terms of the HEAVY.AI End User License Agreement (EULA) and its Addendum.

What Will I Learn?

For Analysts

Learn how to use Heavy Immerse to gain new insights into your data with fast, responsive graphics and visualizations.

For Administrators

Learn how to install and configure your HEAVY.AI instance, then load your data for analysis.

For Developers and Data Scientists

    Learn how to extend HEAVY.AI with an integrated and custom . Contribute to the HEAVY.AI Core Open Source project.

Release Highlights

For more complete release information, see the Release Notes.

Release 8.2

With the 8.2 release, we now support the NVIDIA Grace Hopper Superchip, which combines an NVIDIA GPU with an Arm-based CPU in a single package. This architecture supports NVIDIA's NVLink interconnect, which provides high-speed bandwidth between the GPU and CPU.

Release 8.0

    We are pleased to introduce HeavyIQ, a custom LLM embedded within a brand new visual notebook interface. This combination of custom model and user experience represents our vision for the future of analytics. It supports the capabilities you’d expect, including English to SQL, English to SQL-backed answer and English to graphics. We think you will be very pleased with the “out of the box” results.

While HeavyIQ is certainly the headline, there are as always a number of additional features in this release. One not yet fully apparent to a casual user is support for table- and column-level metadata. This is available at 8.0 in SQL, and at release is already used by HeavyIQ to help in table and column selection. In cases where table or column names are ambiguous, we’ve found that adding a clarifying metadata comment is a simple way to improve HeavyIQ accuracy.

At 8.0, we’ve also significantly improved our support for raster and multidimensional array datasets. Since most raster data is available on huge external data stores, we’ve added raster to HeavyConnect. Now, rather than importing these datasets, you have the option to link to them on the fly as needed. We’ve also changed the internal storage of rasters to use a tile-oriented approach aligned with fragments. This lowers memory requirements and improves performance by allowing fragment skipping. What we’ve not changed is our unified syntax for raster and vector processing. That continues to make use of raster data significantly easier than on systems with entirely different internal languages for raster and vector data processing.

    Finally, this release includes major dependency updates and a more flexible license management system. The dependency updates should be transparent to most users, but are an important part of maintaining system security. The new licensing system deliberately mirrors those of our peers, now supporting “floating” as well as “node locked” licenses. As more of our customers deploy in the cloud, these new capabilities support more flexibility in resource management.

We hope you enjoy this major new release, and we look forward to seeing how you use these new capabilities to expand the power and accessibility of visual analytics within your organization.

Release 7.0

Overview

    We are also pleased to announce the general availability of our new backend Executor Resource Manager with CPU / GPU parallelism and query policy controls such as executor type, memory and time limits. We can also now support CPU queries larger than available CPU memory.

    This release also features the debut of a user interface for joins in Immerse (beta), supporting inner and left joins which are named and persisted in dashboards. This provides analytic and visualization access to joined columns, complementing the prior table linking function supporting cross-filtering.

    Powerful machine learning (beta) and statistical methods (beta) are now available in the database, supporting high performance predictive analytics workflows. For example you can now perform clustering or run linear regression or random forest models on large datasets with interactive inferencing.

    Immerse also gains a large set of dashboard refinements, including an optional ‘minimalist’ style with hidden chart titles, and an optional new text chart with full HTML and font controls.

    There are several major external dependency updates in this release. With Ubuntu 18 reaching its end of life we now require Ubuntu 20.04. For similar reasons, we now support NVIDIA CUDA version 11.8, which deprecates support for Kepler GPUs. Last but not least, we are formally retiring polygon ‘render groups’ within the database, a change which is not backwards compatible. So full database backups are required as part of this upgrade.

Heavy Immerse

    New Features and Improvements

    • BETA: Joins in Immerse

    • BETA: Enhanced text chart. The flag `ui/enable_new_text_chart` adds a “text2” chart type, with additional features:

      • font family (e.g. arial)

HeavyML (BETA)

    7.0 marks the beta release of HeavyML, a new set of capabilities to execute accelerated machine learning workflows directly from SQL.

    General Capabilities and Methods

    • Named model creation is supported via a new CREATE MODEL statement (see the release notes and documentation for more details)

    • Row-wise inference (GPU-accelerated for GPU queries) can be performed via a new ML_PREDICT row-wise operator. This can be used as an Immerse custom measure and persisted into dashboards, allowing end-users to consume models without needing to know how to create or administer them.

    • An EVALUATE model function is provided to test models against metrics (such as r2).
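The workflow above can be sketched as follows; the model, table, and column names are hypothetical, and the exact statement syntax may vary by version (see the release notes and documentation):

```sql
-- Create a named linear-regression model: the first projected column is the
-- target, the remaining columns are predictors.
CREATE MODEL fare_model OF TYPE linear_reg AS
SELECT fare_amount, trip_distance, passenger_count FROM taxi_trips;

-- Score rows with the ML_PREDICT row-wise operator.
SELECT ML_PREDICT('fare_model', trip_distance, passenger_count) AS predicted_fare
FROM taxi_trips
LIMIT 10;
```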

    Regression Algorithms

    • Four regression algorithms are supported initially: linear regression, random forest regression, decision trees, and Gradient Boosted Trees (GBT).

    • Both categorical text and continuous numeric regressors/predictors are supported. Categorical inputs are automatically one-hot-encoded.

• Continuous variable prediction is supported initially; categorical classification is planned for a later release.

    Clustering Algorithms

    • Two clustering algorithms are supported in this initial release: KMeans and DBScan.

    • Clustering algorithms can be called via associated table functions (more detail can be found in the relevant documentation), and currently support continuous numeric inputs only.

    Performance and Administration

    • A new Executor Resource Manager (ERM) framework is provided

    • The ERM allows for CPU queries to run fully in parallel, and one or more CPU queries to run in parallel while a GPU query is executing (parallel GPU query kernel execution is not supported yet).

    • It also allows execution of CPU queries where the input datasets do not fit into the CPU buffer pool by executing on a fragment-by-fragment basis, paging from storage.

HeavyRF

    New Features and Improvements

    A new “cell editor” is provided. This supports multi-band antennas mounted within various sites within a cell. Various antenna attributes such as horizontal and vertical falloff can be easily applied based on an extensible library of antenna types.

    Vegetation and building envelope attenuation can now be directly or indirectly specified. For example, typical values can be provided as scalar constants, or clutter object-specific attributes can be derived from normal SQL cursor queries. Vegetation attenuation can be tied to measurements of canopy moisture content from remote sensing based on seasonal statistics, or for individual dates to match drive test data. Building attenuation can be driven by various known or inferred characteristics, such as from parcels databases.

    The right-hand information panel has been extended to better support targeting of large numbers of buildings. This can be done directly by searching and filtering on building attributes in the HeavyRF application, such as building type or size. But it can also be combined with analyses in Immerse extending to multiple arbitrary tags. For example, a set of locations with high customer value and high potential for churn can be identified in Immerse and tagged with attributes searchable in HeavyRF.

    Last but not least, the HeavyRF platform will soon be available on NVIDIA’s LaunchPad. This facilitates initial evaluation of the software by making it immediately available together with appropriate supporting GPU hardware.

Release 6.4

HEAVY.AI continues to refine and extend the data connectors ecosystem. This release features general availability of data connectors for PostgreSQL, beta Immerse connectors for Snowflake and Redshift, and SQL support for Google BigQuery and Hive (beta). These managed data connections let you use HEAVY.AI as an acceleration platform, wherever your source data lives. Scheduling and automated caching ensure that from an end-user perspective, fast analytics are always running on the latest available data.

    Immerse features four new chart types: Contour, Cross-section, Wind barb and Skew-t. While these are especially useful for atmospheric and geotechnical data visualization, Contour and Cross-section also have more general application.

    Major improvements for time series analysis have been added. This includes time series comparison via window functions, and a large number of SQL window function additions and performance enhancements.

    This release also includes two major architectural improvements:

    • The ability to perform cross-database queries in SQL, increasing flexibility across the board.

    • Render queries no longer block other GPU queries. In many use cases, renders can be significantly slower than other common queries. This should result in significant performance gains, particularly in map-heavy dashboards.

Release 6.2

Heavy Immerse

    • Chart animation through cross filter replay, allowing controlled playback of time-based data such as weather maps or GPS tracks.

    • You can now directly export your charts and dashboards as image files.

    • New control panel enables administrators to view the configuration of the system and easily access logs and system tables.

General Analytics

    • Numerous improvements to core SQL and geoSQL capabilities.

• Support for casting strings to numeric, timestamp, date, and time types with the new TRY_CAST operator.

    • Explicit and implicit cast support for numeric, timestamp, date, and time types.
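The TRY_CAST behavior can be sketched as follows:

```sql
-- TRY_CAST yields NULL instead of an error when a conversion fails.
SELECT TRY_CAST('42' AS INTEGER) AS ok_value,       -- expected: 42
       TRY_CAST('abc' AS INTEGER) AS failed_value;  -- expected: NULL
```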

Advanced Analytics

• Two new functions now support direct loading of LiDAR data: tf_point_cloud_metadata quickly searches tile metadata and helps you find data to import, and tf_load_point_cloud performs the actual import.

    • Network graph analytics functions have been added. These can work on networks alone, including non-geographic networks, or can find the least-cost path along a geographic network.

Release 6.1

Release 6.1.0 features more granular administrative monitoring dashboards based on logs. These have been accessible in an open format on the server side, and now they are available in Immerse, broken down by specific dashboards, users, or queries. Intermediate and advanced SQL support continues to mature, with INSERT, window functions, and UNION ALL.

    This release contains a number of user interface polish items requested by customers. Cartography now supports polygons with colorful borders and transparent fills. Table presentation has been enhanced in various ways, from alignment to zebra striping. And dashboard saving reminders have been scaled back, based on customer feedback.

    The extension framework now features an enhanced “custom source” dialog, as well as new SQL commands to see installed extensions and their parameters. We introduce three new extensions. The first, tf_compute_dwell_times, reduces GPS event stream data volumes considerably while keeping relevant information. The others compute feature similarity scores and are very general.

    This release also includes initial public betas of our PostgreSQL Immerse connector, and SQL support for COPY FROM ODBC database connections, making it easier to connect to your enterprise data.

Release 6.0

    This release features large advances in data access, including intelligent linking to enterprise data (HeavyConnect) and support for raster geodata. SQL support includes high-performance string functions, as well as enhancements to window functions and table unions. Performance improvements are noticeable across the product, including fundamental advances in rendering, query compilation, and data transport. Our system administration tools have been expanded with a new Admin Portal, as well as additional system tables supporting detailed diagnostics. Major strides in extensibility include new charting options and a new extensions framework (beta).

Name Changes

    • Rebranded platform from OmniSci to HEAVY.AI, with OmniSciDB now HeavyDB, OmniSci Render now HeavyRender, and OmniSci Immerse now Heavy Immerse.

HeavyConnect and Data Import

    • HeavyConnect allows the HEAVY.AI platform to work seamlessly as an accelerator for data in other data lakes and data warehouses. For Release 6.0, CSV and Parquet files on local file systems and in S3 buckets can be linked or imported. Other SQL databases are also supported via ODBC (beta).

    • HeavyConnect enables users to specify a data refresh schedule, which ensures access to up-to-date data.

• Heavy Immerse now supports import of dozens of raster data formats, including GeoTIFF, GeoJPEG, and PNG. HeavySQL now supports almost any vector GIS file format.

Other Immerse Enhancements

    • New Gauge chart for easy visualization of key metrics relative to target thresholds.

    • New landing page and Help Center.

    • Enhanced mapping workflows with automated column picking.

SQL Enhancements

    • Support for a wide range of performant string operations using a new string dictionary translation framework, as well as the ability to on-the-fly dictionary encode none-encoded strings with a new ENCODE_TEXT operator.

    • Support for UNION ALL is now enabled by default, with significant performance improvements from the previous release (where it was beta flagged).

    • Significant functionality and performance improvements for window functions, including the ability to support expressions in PARTITION and ORDER clauses.

Performance

    • Parallel compilation of queries and a new multi-executor shared code cache provide up to 20% throughput/concurrency gains for interactive usage scenarios.

    • 10X+ performance improvements in many cases for initial join queries via optimized Join Hash Table framework.

    • New result set recycler allows for expensive query sub-steps to be cached via the SQL hint /*+ keep_result */, which can significantly increase performance when a subquery is used across multiple queries.
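The hint is placed directly in the SELECT clause; a minimal sketch (table and column names hypothetical):

```sql
-- Ask the result set recycler to cache this query's result for reuse
-- by subsequent queries that share the same sub-step.
SELECT /*+ keep_result */ carrier, COUNT(*) AS flight_count
FROM flights
GROUP BY carrier;
```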

System Administration

    • A new Admin Portal provides information on system resources usage and users.

    • System table support under a new information_schema database, containing 10 new system tables providing system statistics and memory and storage utilization.

Extensibility

    • New system and user-defined UDF framework (beta), comprising both row (scalar) and table (UDTF) functions, including the ability to define fast UDFs via Numba Python using the RBC framework, which are then inlined into the HeavyDB compiled query code for performant CPU and GPU execution.

    • System-provided table functions include generate_series for easy numeric series generation, tf_geo_rasterize_slope for fast geospatial binning and slope/aspect computation over elevation data, and others, with more capabilities planned for future releases.

• Leveraging the new table function framework, a new HeavyRF module (licensed separately) includes tf_rf_prop and tf_rf_prop_max_signal table functions for fast radio frequency signal propagation analysis and visualization.
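Table functions are invoked with the TABLE() wrapper; for example, using the system-provided generate_series:

```sql
-- Generate a numeric series from 1 to 5, one row per value.
SELECT * FROM TABLE(generate_series(1, 5));
```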

Release 5.10

    • Row-level security (RLS) can be used by an administrator to apply security filtering to queries run as a user or with a role.

    • Support for import from dozens of image and raster file types, such as jpeg, png, geotiff, and ESRI grid, including remote files.

    • Significantly more performant, parallelized window functions, executing up to 10X faster than in Release 5.9.

Release 5.9

    • Significant speedup for POINT and fixed-length array imports and CTAS/ITAS, generally 5-20X faster.

    • The PNG encoding step of a render request is no longer a blocking step, providing improvement to render concurrency.

    • Adds support to hide legacy chart types from add/edit chart menu in preparation for future deprecation (defaults to off).

Release 5.8

    • Parallel execution framework is on by default. Running with multiple executors allows parts of query evaluation, such as code generation and intermediate reductions, to be executed concurrently. Currently available for single-node deployments.

    • Spatial joins between geospatial point types using the ST_Distance operator are accelerated using the overlaps hash join framework, with speedups up to 100x compared to Release 5.7.1.

    • Significant performance gains for many query patterns through optimization of query code generation, particularly benefitting CPU queries.

Release 5.7

    • Extensive enhancements to Immerse support for parameters. Parameters can now be used in chart column selectors, chart filters, chart titles, global filters, and dashboard titles. Dashboards can have parameter widgets embedded on them, side-by-side with charts. Parameter values are visible in chart axes/labels, legends, and tooltips, and you can toggle parameter visibility.

• In Immerse Pointmap charts, you can specify which color-by attribute always renders on top, which is useful for highlighting anomalies in data.

    • Significantly faster and more accurate "lasso" tool filters geospatial data on Immerse Pointmap charts, leveraging native geospatial intersection operations.

Release 5.6

    • Custom SQL dimensions, measures, and filters can now be parameterized in Immerse, enabling more flexible and powerful scenario analysis, projections, and comparison use cases.

    • New angle measure added to Pointmap and Scatter charts, allowing orientation data to be visualized with wedge and arrow icons.

    • Custom SQL modal with validation and column name display now enabled across all charts in Immerse.

Release 5.5

    • Ability to set annotations on New Combo charts for different dimension/measure combinations.

    • New ‘Arrow-over-the-wire’ capability to deliver result sets in Apache Arrow format, with ~3x performance improvement over Thrift-based result set serialization.

• Support for concurrent SELECT and UPDATE/DELETE queries for single-node installations.

Release 5.4

    • Added initial compilation support for NVIDIA Ampere GPUs.

    • Improved performance for UPDATE and DELETE queries.

• Improved the performance of filtered group-by queries on large-cardinality string columns.

Release 5.3

    • New Combo chart type in Immerse provides increased configurability and flexibility.

    • Immerse chart-specific filters and quick filters add increased flexibility and speed.

    • Updated Immerse Filter panel provides a Simple mode and Advanced mode for viewing and creating filters.

Release 5.2

    • NULL support for geospatial types, including in ALTER TABLE ADD COLUMN.

• New SHOW commands: SHOW TABLES, SHOW DATABASES, SHOW CREATE TABLE, and SHOW USER SESSIONS.

    • Ability to perform updates and deletes on temporary tables.

Release 5.1

    • Added support for UPDATE via JOIN with a subquery in the WHERE clause.

• Initial support for temporary (that is, non-persistent) tables.

    • Improved performance for multi-column GROUP BY queries, as well as single column GROUP BY queries with high cardinality. Performance improvement varies depending on data volume and available hardware, but most use cases can expect a 1.5 to 2x performance increase over OmniSciDB 5.0.

Release 5.0

    • The new filter panel in Immerse enables the ability to toggle filters on and off, and introduces Filter Sets to provide quick access to different sets of filters in one dashboard.

    • Immerse now supports using global and cross-filters to interactively build cohorts of interest, and the ability to apply a cohort as a dashboard filter, either within the existing filter set or in a new filter set.

    • Data Catalog, located within Data Import, is a repository of datasets that users can use to enhance existing analyses.


    Getting Started on AWS

    Getting Started with AWS AMI

    You can use the HEAVY.AI AWS AMI (Amazon Web Services Amazon Machine Image) to try HeavyDB and Heavy Immerse in the cloud. Perform visual analytics with the included New York Taxi database, or import and explore your own data.

    Many options are available when deploying an AWS AMI. These instructions skip to the specific tasks you must perform to deploy a sample environment.

Prerequisite

    You need a security key pair when you launch your HEAVY.AI instance. If you do not have one, create one before you continue.

    1. Go to the EC2 Dashboard.

    2. Select Key Pairs under Network & Security.

    3. Click Create Key Pair.

Launching Your Instance

1. Go to the HEAVY.AI listing in the AWS Marketplace and select the version you want to use. You can get overview information about the product, see pricing, and get usage and support information.

    2. Click Continue to Subscribe to subscribe.

    3. Read the Terms and Conditions, and then click Continue to Configuration.

Using HEAVY.AI Immerse on Your AWS Instance

    To connect to Heavy Immerse, you need your Public IP address and Instance ID for the instance you created. You can find these values on the Description tab for your instance.

    To connect to Heavy Immerse:

    1. Point your Internet browser to the public IP address for your instance, on port 6273. For example, for public IP 54.83.211.182, you would use the URL https://54.83.211.182:6273.

    2. If you receive an error message stating that the connection is not private, follow the prompts onscreen to click through to the unsecured website. To secure your site, see .

    For more information on Heavy Immerse features, see .

Importing Your Own Data

    Working with your own familiar dataset makes it easier to see the advantages of HEAVY.AI processing speed and data visualization.

    To import your own data to Heavy Immerse:

    1. Export your data from your current datastore as a comma-separated value (CSV) or tab-separated value (TSV) file. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

    2. Point your Internet browser to the public IP address for your instance, on port 6273. For example, for public IP 54.83.211.182, you would use the URL https://54.83.211.182:6273.

    For more information, see .

Accessing Your HEAVY.AI Instance Using SSH

    Follow these instructions to connect to your instance using SSH from MacOS or Linux. For information on connecting from Windows, see .

    1. Open a terminal window.

    2. Locate your private key file (for example, MyKey.pem). The wizard automatically detects the key you used to launch the instance.

    3. Your key must not be publicly viewable for SSH to work. Use this command to change permissions, if needed:

    Getting Started on Kubernetes (BETA)

    Using HEAVY.AI's Helm Chart on Kubernetes

This documentation outlines how to use HEAVY.AI’s Helm Chart within a Kubernetes environment. It assumes you are a network administrator within your organization and an experienced Kubernetes administrator. This is not a beginner guide and does not cover Kubernetes installation or administration. You may well require additional manifest files for your environment.

Overview

The HEAVY.AI Helm Chart is a template for configuring deployment of the HEAVY.AI platform. The following files must be updated or created to reflect your deployment environment:

    • values.yml

    • <customer_created>-pv.yml

    • <customer_created>-pvc.yml

    Once the files are updated/created, follow the installation instructions below to install the Helm Chart into your Kubernetes environment.

Where to get the Helm Chart?

    The Helm Chart is located in the HEAVY.AI github repository. It can be found here:

What’s included?

    File Name
    Description

How to install?

1. Before installing, create a PV/PVC for the deployment to use. Save these files in the regular PV/PVC location used in your environment. Reference the README.pdf file found in the Helm Chart under templates, and the example PV/PVC manifests in the misc folder of the Helm Chart. The PVC name is then provided to the helm install command.

2. In your current directory, copy the values.yml file from the HEAVY.AI Helm Chart and customize it for your needs.

How to uninstall?

    To uninstall the helm installed HEAVY.AI instance:

    $ helm uninstall heavyai


    The PVC and PV space defined for the HEAVY.AI instance is not removed. The retained space must be manually deleted.

Example: values.yml

Example: example-heavyai-pvc.yml

Example: example-heavyai-pv.yml

    Hardware Reference

The amount of data you can process with the HEAVY.AI database depends primarily on the amount of GPU RAM and CPU RAM available across HEAVY.AI cluster servers. For zero-latency queries, the system caches compressed versions of the queried rows and columns in GPU RAM. This is called hot data. Semi-hot data uses CPU RAM for certain parts of the data.

Example configurations are provided to help you configure your system.

    Optimal GPUs on which to run the HEAVY.AI platform include:

    • NVIDIA Tesla A100

    Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL

Install Prerequisites

    Install the Extra Packages for Enterprise Linux (EPEL) repository and other packages before installing NVIDIA drivers.

    RHEL-based distributions require Dynamic Kernel Module Support (DKMS) to build the GPU driver kernel modules. For more information, see . Upgrade the kernel and restart the machine.

    tf_feature_similarity

Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) of the search vector, optionally TF/IDF weighted.

Input Arguments

    Parameter
    Description

    tf_compute_dwell_times

    Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session (dwell time).

Syntax

    Arrays

HEAVY.AI supports arrays in dictionary-encoded text and number fields (TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, and DOUBLE). Data stored in arrays are not normalized. For example, {green,yellow} is not the same as {yellow,green}. As with many SQL-based systems, HEAVY.AI array indexes are 1-based.

    HEAVY.AI supports NULL variable-length arrays for all integer and floating-point data types, including dictionary-encoded string arrays. For example, you can insert NULL into BIGINT[ ], DOUBLE[ ], or TEXT[ ] columns. HEAVY.AI supports NULL fixed-length arrays for all integer and floating-point data types, but not for dictionary-encoded string arrays. For example, you can insert NULL into BIGINT[2] or DOUBLE[3], but not into TEXT[2] columns.
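    For illustration, the sketch below shows 1-based indexing and the NULL rules described above; the table and column names are made up:

```sql
-- Illustrative only; ex_arrays and its columns are hypothetical.
CREATE TABLE ex_arrays (
  tags TEXT[] ENCODING DICT(32),  -- variable-length; NULL array allowed
  pair BIGINT[2]                  -- fixed-length numeric; NULL array allowed
);
INSERT INTO ex_arrays VALUES ({'green','yellow'}, {1, 2});
INSERT INTO ex_arrays VALUES (NULL, NULL);  -- both columns accept NULL arrays
SELECT tags[1] FROM ex_arrays;              -- indexes are 1-based
```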

    Expression
    Description

    EXPLAIN

    Shows generated Intermediate Representation (IR) code, identifying whether it is executed on GPU or CPU. This is primarily used internally by HEAVY.AI to monitor behavior.

    For example, when you use the EXPLAIN command on a basic statement, the utility returns 90 lines of IR code that is not meant to be human readable. However, at the top of the listing, a heading indicates whether it is IR for the CPU or IR for the GPU, which can be useful to know in some situations.
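    For example (the flights table here is a placeholder for any table in your database):

```sql
-- The first line of the output identifies the target device
-- ("IR for the GPU:" or "IR for the CPU:"), followed by generated
-- IR code that is not meant to be human readable.
EXPLAIN SELECT COUNT(*) FROM flights;
```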


    tf_raster_graph_shortest_slope_weighted_path

    Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally only filling in null-valued bins if fill_only_nulls is set to true.

    The graph shortest path is then computed between an origin point on the grid specified by origin_x and origin_y and a destination point on the grid specified by destination_x and destination_y.

    tf_load_point_cloud

    Loads one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs (if not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs).

    If use_cache is set to true, an internal point cloud-specific cache will be used to hold the results per input file, and if queried again will significantly speed up the query time, allowing for interactive querying of a point cloud source. If the results of tf_load_point_cloud will only be consumed once (for example, as part of a CREATE TABLE
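    A hedged invocation sketch: the path is hypothetical and the parameter names follow the description above; the exact signature may differ.

```sql
-- Hedged sketch; /data/lidar/tile_1234.laz is a hypothetical path.
SELECT * FROM TABLE(
  tf_load_point_cloud(
    path => '/data/lidar/tile_1234.laz',
    out_srs => 'EPSG:4326',
    use_cache => true
  )
);
```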

    tf_mandelbrot*

    Computes the Mandelbrot set over the complex domain [x_min, x_max), [y_min, y_max), discretizing the xy-space into an output of dimensions x_pixels X y_pixels. The output for each cell is the number of iterations needed to escape to infinity, up to and including the specified max_iterations.
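    A hedged sketch over the classic view of the set; the argument names mirror the description above and may not match the exact signature:

```sql
-- Hedged sketch: computes escape-iteration counts on a 1024 x 1024 grid.
SELECT * FROM TABLE(
  tf_mandelbrot(
    x_pixels => 1024, y_pixels => 1024,
    x_min => -2.5, x_max => 1.0,
    y_min => -1.5, y_max => 1.5,
    max_iterations => 256
  )
);
```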

    Query couldn’t keep the entire working set of columns in GPU Memory.
    COPY ( <SELECT statement> ) TO '<file path>' [WITH (<property> = value, ...)];
    COPY (SELECT * FROM tweets) TO '/tmp/tweets.csv';
    COPY (SELECT * FROM tweets ORDER BY tweet_time LIMIT 10000) TO
      '/tmp/tweets.tsv' WITH (delimiter = '\t', quoted = 'true', header = 'false');
    SELECT * FROM TABLE(
        generate_series(
            <series_start>,
            <series_end>
            [, <increment>]
        )
    )
    heavysql> select * from table(generate_series(2, 10, 2)); 
    series 
    2 
    4 
    6 
    8 
    10 
    5 rows returned.
    
    heavysql> select * from table(generate_series(8, -4, -3)); 
    series 
    8 
    5 
    2 
    -1 
    -4
    5 rows returned.
    SELECT * FROM TABLE(
        generate_series(
            <series_start>,
            <series_end>,
            <series_step>
        )
    )
    SELECT
      generate_series AS ts
    FROM
      TABLE(
        generate_series(
          TIMESTAMP(0) '2021-01-01 00:00:00',
          TIMESTAMP(0) '2021-09-04 00:00:00',
          INTERVAL '1' MONTH
        )
      )
      ORDER BY ts;
      
    ts
    2021-01-01 00:00:00.000000000
    2021-02-01 00:00:00.000000000
    2021-03-01 00:00:00.000000000
    2021-04-01 00:00:00.000000000
    2021-05-01 00:00:00.000000000
    2021-06-01 00:00:00.000000000
    2021-07-01 00:00:00.000000000
    2021-08-01 00:00:00.000000000
    2021-09-01 00:00:00.000000000
    'zero' - Export null elements as zero (or an empty string).
  • 'nullfield' - Set the entire array column field to null for that row.

  • Applies only to GeoJSON and GeoJSONL files.

    escape

    A single-character string for escaping quotes. Applies to only CSV and tab-delimited files.

    ' (quote)

    file_compression

    File compression; can be one of the following:

    • 'none'

    • 'gzip'

    • 'zip'

    For GeoJSON and GeoJSONL files, using GZip results in a compressed single file with a .gz extension. No other compression options are currently available.

    'none'

    file_type

    Type of file to export; can be one of the following:

    • 'csv' - Comma-separated values file.

    • 'geojson' - FeatureCollection GeoJSON file.

    • 'geojsonl' - Multiline GeoJSONL file.

    • 'shapefile' - Geospatial shapefile.

    For all file types except CSV, exactly one geo column (POINT, LINESTRING, POLYGON or MULTIPOLYGON) must be projected in the query. CSV exports can contain zero or any number of geo columns, exported as WKT strings.

    Export of array columns to shapefiles is not supported.

    'csv'

    header

    Either 'true' or 'false', indicating whether to output a header line for all the column names. Applies to only CSV and tab-delimited files.

    'true'

    layer_name

    A layer name for the geo layer in the file. If unspecified, the stem of the given filename is used, without path or extension.

    Applies to all file types except CSV.

    Stem of the filename, if unspecified

    line_delimiter

    A single-character string for terminating each line. Applies to only CSV and tab-delimited files.

    '\n'

    nulls

    A string pattern indicating that a field is NULL. Applies to only CSV and tab-delimited files.

    An empty string, 'NA', or

    quote

    A single-character string for quoting a column value. Applies to only CSV and tab-delimited files.

    " (double quote)

    quoted

    Either 'true' or 'false', indicating whether all the column values should be output in quotes. Applies to only CSV and tab-delimited files.

    'true'
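    Several of the options above can be combined in a single export. In this hedged example, the trips table and its pickup_point geo column are illustrative:

```sql
-- Export a single geo column as gzip-compressed GeoJSON with a named layer.
COPY (SELECT pickup_point FROM trips)
  TO '/tmp/pickups.geojson'
  WITH (file_type = 'geojson',
        file_compression = 'gzip',
        layer_name = 'pickups');
```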

    <series_step> (optional, defaults to 1)

    Increment to increase or decrease the values that follow. Integer.

    BIGINT

    series_step

    Time/Date interval signifying step between each element in the returned series.

    INTERVAL

    lat: Latitude coordinate of the center point to size from.
  • min_lon: Minimum longitude coordinate of the mercator-projected view.

  • max_lon: Maximum longitude coordinate of the mercator-projected view.

  • img_width: The width in pixels of the view.

  • min_width: Clamps the returned pixel size to be at least this width.

  • Returns: Floating-point value in pixel units. Can be used for the width of a symbol or a point in Vega.

    lat: Latitude coordinate of the center point to size from.
  • min_lat: Minimum latitude coordinate of the mercator-projected view.

  • max_lat: Maximum latitude coordinate of the mercator-projected view.

  • img_height: The height in pixels of the view.

  • min_height: Clamps the returned pixel size to be at least this height.

  • Returns: Floating-point value in pixel units. Can be used for the height of a symbol or a point in Vega.

    min_lon: Minimum longitude coordinate of the mercator-projected view.

  • max_lon: Maximum longitude coordinate of the mercator-projected view.

  • img_width: The width in pixels of the view.

  • min_width: Clamps the returned pixel size to be at least this width.

  • Returns: Floating-point value in pixel units. Can be used for the width of a symbol or a point in Vega.

    Converts a distance in meters in a latitudinal direction from an EPSG:4326 POINT to a pixel size. Currently only supports mercator-projected points:

    • meters: Distance in meters in a latitudinal direction to convert to pixel units.

    • pt: The center POINT to size from. The point must be defined in the EPSG:4326 spatial reference system.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    • img_height: The height in pixels of the view.

    • min_height: Clamps the returned pixel size to be at least this height.

    Returns: Floating-point value in pixel units. Can be used for the height of a symbol or a point in Vega.

    is_point_in_merc_view(lon, lat, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude coordinate is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • lon: Longitude coordinate of the point.

    • lat: Latitude coordinate of the point.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if the point is within the view defined by the min_lon/max_lon, min_lat/max_lat; otherwise, false.

    is_point_size_in_merc_view(lon, lat, meters, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude coordinate, offset by a distance in meters, is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • lon: Longitude coordinate of the point.

    • lat: Latitude coordinate of the point.

    • meters: Distance in meters to offset the point by, in any direction.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if the point is within the view defined by the min_lon/max_lon, min_lat/max_lat; otherwise, false.

    is_point_in_view(pt, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude POINT defined in EPSG:4326 is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • pt: The POINT to check. Must be defined in EPSG:4326 spatial reference system.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if the point is within the view defined by min_lon/max_lon, min_lat/max_lat; otherwise, false.

    is_point_size_in_view(pt, meters, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude POINT defined in EPSG:4326, offset by a distance in meters, is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • pt: The POINT to check. Must be defined in EPSG:4326 spatial reference system.

    • meters: Distance in meters to offset the point by, in any direction.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if a latitude/longitude POINT defined in EPSG:4326, offset by a distance in meters, is within the view defined by min_lon/max_lon, min_lat/max_lat; otherwise, false.
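    These helpers are typically used in the SQL of a Vega data transform to cull geometry to the current viewport. A hedged sketch, in which the tweets table, its columns, and the view bounds are illustrative:

```sql
-- Keep only points inside a mercator-projected view of the
-- continental United States (approximate bounds, for illustration).
SELECT lon, lat
FROM tweets
WHERE is_point_in_merc_view(lon, lat, -124.8, -66.9, 24.4, 49.4);
```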

    font sizes, line height
  • colors populated from dashboard palette

  • html table

  • undo/redo

  • separator line with styles

  • full html support

  • Added a new “minimal” style mode in which chart titles are hidden by default but appear on rollover. Controlled by the feature flag `ui/minimize_chart_size`, which defaults to off.

  • Within the map chart editor, geo layers can now be renamed.

  • Role-based access to control panel UX that previously required admin access.

  • Table functions are provided to access linear regression coefficients for linear regression models and variable importance scores for random forest models.

  • A new “SHOW MODELS” SQL command allows end users to determine which models are available.

  • More-detailed model metadata can be accessed by admins with SHOW MODEL DETAILS and in a new ml_models system table in the information_schema database.

  • The Executor Resource Manager takes into account the resources needed for each query to schedule them in the most efficient manner.
  • It is enabled by default, but can be turned off with the flag --enable-executor-resource-mgr=0, which causes query kernel execution to follow the same serial, pre-7.0 path.

  • HeavyConnect now provides graphical Heavy Immerse support for Redshift, Snowflake, and PostGIS connections.
  • For CPU-only systems, mapping capabilities are improved with the introduction of multilayer CPU-rendered geo.

  • Advanced string functions facilitate extraction of data from JSON and externally encoded string formats.
  • Improvements to COUNT DISTINCT reduces memory requirements considerably in cases with very large cardinalities or highly skewed data distributions.

  • Added MULTIPOINT and MULTILINESTRING geo types.

  • Convex and concave hull operators, allowing generation of polygons from points and multipoints. For example, you could generate polygons from clusters of GPS points.

  • Syntax and performance optimizations across all geometry types, table orderings, and commonly nested functions.

  • Significant functionality extension of window functions; define windows directly in temporal terms, which is particularly important in time series with missing observations. Window frame support allows improved control at the edges of windows.

  • New spatial aggregation and smoothing functions. Aggregations work particularly well with LiDAR data; for example, to pass through only the highest point within an area to create building or canopy height maps. Smoothing helps with noisy datasets and can reveal larger-scale patterns while minimizing visual distractions.

    Support is included for multidimensional arrays common in the sciences, including GRIB2, NetCDF, and HDF5.

  • Immerse now supports linking or import of files on the server filesystem (local or mounted). This helps prevent slow data transfers when client bandwidth is limited.

  • File globbing and filtering allow import of thousands of files at once.

  • Arrow execution endpoints now leverage the parallel execution framework, and Arrow performance has been significantly improved when high-cardinality dictionary-encoded text columns are returned

  • Introduces a novel polygon rendering algorithm that does not require pre-triangulated or pre-grouped polygons and can render dynamically generated geometry on the fly (via ST_Buffer). The new algorithm is comparable to its predecessor in terms of both performance and memory and enables optimizations and enhancements in future releases.

  • New binary transport protocol to Heavy Immerse that significantly increases performance and interactivity for large result sets

  • New Iframe chart type in Heavy Immerse to allow easier addition of custom chart types. (BETA)

    Automatic use of columnar output (instead of the default row-wise output) for large projections, reducing query times by 5-10X in some cases.
  • Support for full set of ST_TRANSFORM SRIDs supported by geos/proj4 library.

  • Support for numerous vector GIS files (100+ formats supported by current GDAL release).

  • Support for multidimensional array import from formats common in science and meteorology.

  • Improved Table chart export to access all data represented by a Table chart.

  • Introduced dashboard-level named custom SQL.

  • BETA - Adds custom expressions to table columns, allowing for reusable custom dimensions and measures within a single dashboard (defaults to off).
  • BETA - Adds Crosslink feature with Crosslink Panel UI, allowing crossfilters to fire across different data sources within the same dashboard (defaults to off).

  • BETA - Adds Custom SQL Source support and Custom SQL Source Manager, allowing the creation of a data source as a SQL statement (defaults to off)

  • Window functions can now be executed without a partition clause being specified (to signify a partition encompassing all rows in the table).

  • Window functions can now execute over tables with multiple fragments and/or shards.

  • Native support for ST_Transform between all UTM Zones and EPSG:4326 (Lon/Lat) and EPSG:900913 (Web Mercator).

  • ST_Equals support for geospatial columns.

  • Support for the ANSI SQL WIDTH_BUCKET operator for easier and more performant numeric binning, now also used in Immerse for all numeric histogram visualizations

  • The Vulkan backend renderer is now enabled by default. The legacy OpenGL renderer is still available as a fallback if there are blocking issues with Vulkan. You can disable the Vulkan renderer using the renderer-use-vulkan-driver=false configuration flag.

    • Vulkan provides improved performance, memory efficiency, and concurrency.

    • You are likely to see some performance and memory footprint improvements with Vulkan in Release 5.8, most significantly in multi-GPU systems.

  • Support for file path regex filter and sort order when executing the COPY FROM command.

  • New ALTER SYSTEM CLEAR commands that enable clearing CPU or GPU memory from Immerse SQL Editor or any other SQL client.

  • Immerse 3D Pointmap chart and HTML support in text charts are available as a beta feature.

  • Airplane symbol shape has been added as a built-in mark type for the Vega rendering API.

  • Vega symbol and multi-GPU polygon renders have been made significantly faster.

  • User-interrupt of query kernels is now on by default. Queries can be interrupted using Ctrl + C in omnisql, or by calling the interrupt API.

  • Parallel executors are in public beta (set with the --num-executors flag).

  • Support for APPROX_QUANTILE aggregate.

  • Support for default column values when creating a table and across all append endpoints, including COPY FROM, INSERT INTO TABLE SELECT, INSERT, and binary load APIs.

  • Faster and more robust ability to return result sets in Apache Arrow format when queried from a remote client (i.e. non-IPC).

  • More performant and robust high-cardinality group-by queries.

  • ODBC driver now supports Geospatial data types.

  • Significantly faster point-in-polygon joins through a new range join hash framework.
  • Approximate Median function support.

  • INSERT and INSERT FROM SELECT now support specification of a subset of columns.

  • Automatic metadata updates and vacuuming for optimizing space usage.

  • Significantly improved OmniSciDB startup time, as well as a number of significant load and performance improvements.

  • Improvements to line and polygon stroke rendering and point/symbol rendering.

  • Initial OmniSci Render support for CPU-only query execution ("Query on CPU, render on GPU"), allowing for a wider set of deployment infrastructure choices.
  • Cap metadata stored on previous states of a table by using MAX_ROLLBACK_EPOCHS, improving performance for streaming and small batch load use cases and modulating table size on disk

  • Added SQL function SAMPLE_RATIO, which takes a proportion between 0 and 1 as an input argument and filters rows to obtain a sampling of a dataset.
  • Added support for exporting geo data in GeoJSON format.

  • Dashboard filter functionality is expanded, and filters can be saved as views.

  • You can perform bulk actions on the dashboard list.

  • New UI Setting panel in Immerse for customizing charts.

  • Tabbed dashboards.

  • SQL Editor now handles Vega JSON requests.

  • On multilayer charts, layer visibility can be set by zoom level.
  • Different map charts can be synced together for pan and zoom actions, regardless of data source.

  • Array support for the Array type over JDBC.

  • SELECT DISTINCT in UNION ALL is supported. (UNION ALL is prerelease and must be explicitly enabled.)

  • Support for joins on DECIMAL types.

  • Performance improvements on CUDA GPUs, particularly Volta and Turing.

  • Updates to JDBC driver, including escape syntax handling for the fn keyword and added support to get table metadata.
  • Notable performance improvements, particularly for join queries, projection queries with order by and/or limit, queries with scalar subqueries, and multicolumn group-by queries.

  • Query interrupt capability improved to allow canceling long-running queries, also supports JDBC now.

  • Completely overhauled SQL Editor, including query formatting, snippets, history and more.

  • Database switching from within Immerse, as well as dashboard URLs that contain the database name.

  • Over 50% reduction in load times for the dashboards list initial load and search.

  • Cohort builder now supports count (# records) in aggregate filter.

  • Improved error handling and more meaningful error messages.

  • Custom logos can now be configured separately for light and dark themes.

  • Logos can be configured to deep-link to a specific URL.

  • Improved support for EXISTS and NOT EXISTS subqueries.

  • Added support for LINESTRING, POLYGON, and MULTIPOLYGON in user defined functions.

  • Immerse log-ins are fully sessionized and persist across page refreshes.

  • Pie chart now supports "All Others" and percentage labels.

  • Cohorts can now be built with aggregation-based filters.

  • New filter sets can be created through duplicating existing filter sets.

  • Dashboard URLs now link to individual filter sets.

  • To see these new features in action, please watch this video from Converge 2019, where Rachel Wang demonstrates how you can use them.

  • Added support for binary dump and restore of database tables.

  • Added support for compile-time registered user-defined functions in C++, and experimental support for runtime user-defined SQL functions and table functions in Python via the Remote Backend Compiler.

  • Support for some forms of correlated subqueries.

  • Support for update via subquery, to allow for updating a table based on calculations performed on another table.

  • Multistep queries that generate large, intermediate result sets now execute up to 2.5x faster by leveraging new JIT code generator for reductions and optimized columnarization of intermediate query results.

  • Frontend-rendered choropleths now support the selection of base map layers.


    Enter a name for your key pair. For example, MyKey.

  • Click Create. The key pair PEM file downloads to your local machine. For example, you would find MyKey.pem in your Downloads directory.

  • Select the Fulfillment Option, Software Version, and Region.

  • Click Continue to Launch.

  • On the Launch this software page, select Launch through EC2, and then click Launch.

  • From the Choose an Instance Type page, select an available EC2 instance type, and click Review and Launch.

  • Review the instance launch details, and click Launch.

  • Select a key pair, or click Create a key pair to create a new key pair and download it, and then click Launch Instances.

  • On the Launch Status page, click the instance name to see it on your EC2 Dashboard Instances page.

  • Enter the USERNAME (admin), PASSWORD ( {Instance ID} ), and DATABASE (heavyai). If you are using the BYOL version, enter your license key in the key field and click Apply.
  • Click Connect.

  • On the Dashboards page, click NYC Taxi Rides. Explore and filter the chart information on the NYC Taxis Dashboard.

  • Enter the USERNAME (admin) and PASSWORD ( {instance ID} ). If you are using the BYOL version, enter your license key in the key field and click Apply.
  • Click Connect.

  • Click Data Manager, and then click Import Data.

  • Drag your data file onto the table importer page, or use the directory selector.

  • Click Import Files.

  • Verify the column names and datatypes. Edit them if needed.

  • Enter a Name for your table.

  • Click Save Table.

  • Click Connect to Table.

  • On the New Dashboard page, click Add Chart.

  • Choose a chart type.

  • Add dimensions and measures as required.

  • Click Apply.

  • Enter a Name for your dashboard.

  • Click Save.

  • Connect to your instance using its Public DNS. The default user name is centos or ubuntu, depending on the version you are using. For example:

  • Use the following command to run the heavysql SQL command-line utility on HeavyDB. The default user is admin and the default password is { Instance ID }:

    For more information, see heavysql.

  • AWS Marketplace page for HEAVY.AI
    Tips for Securing Your EC2 Instance
    Introduction to Heavy Immerse
    Loading Data
    Connecting to Your Linux Instance from Windows Using PuTTY

    example-heavyai-pv.yml

    Example PV file.

    example-heavyai-pvc.yml

    Example PVC file.

    Run the helm install command with the desired deployment name and Helm Chart.

    1. When using a values.yml file:

      $ helm install heavyai --values values.yml heavyaihelmchart-1.0.0.tgz

    2. When not using a values.yml file:

      If you only need to change a value or two from the default values.yml file, you can use --set instead of a custom values.yml file.

      For example:

      $ helm install heavyai --set pvcName=MyPVCName heavyaihelmchart-1.0.0.tgz

    Chart.yml

    HEAVY.AI Helm Chart. Contains version and contact information.

    values.yml

    Copy this file and edit values specific to your HEAVY.AI deployment. This is where to note the PVC name. This file is annotated to identify typical customizations and is pre-populated with default values.

    README.pdf

    These instructions.

    deployment.yml

    https://releases.heavy.ai/ee/helm/heavyai-1.0.0.tgz

    HEAVY.AI platform deployment template. DO NOT EDIT

    ssh -i MyKey.pem [email protected]
    $HEAVYAI_PATH/bin/heavysql
     Helm-workspace
          ↳heavyai
               ↳Chart.yml
               ↳values.yml
               ↳templates
                    ↳README.pdf
                    ↳deployment.yml
          ↳misc
               ↳example-heavyai-pv.yml
               ↳example-heavyai-pvc.yml
    # Default values for heavyai.
    # This is a YAML-formatted file.
    # Declare variables to be passed into your templates.
    #
    # Version of heavyai to install in the format 'v7.0.0' or 'latest' for the latest version released.
    version: v7.0.0
    # Persistent volume claim name to use with heavyai.
    pvcName: heavyai-pvc
    # Namespace to install heavyai in.
    nameSpace: heavyai
    # Number of GPUs to assign to heavyai, or 0 to run the CPU version of heavyai.
    gpuNumber: 1
    # NodeName to install heavyai on; leave blank to let Kubernetes schedule a host.
    nodeName: heavyai-node
    # Immerse port redirect of 6273.
    hostPortImmerse: 9273
    # TCP port redirect of 6274.
    hostPortTCP: 9274
    # HTTP port redirect of 6278.
    hostPortHTTP: 9278
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
     name: heavyai-pvc
     namespace: heavyai
    spec:
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 100Gi
     storageClassName: heavyai
    apiVersion: v1
    kind: PersistentVolume
    metadata:
     name: heavyai-pv
    spec:
     capacity:
       storage: 100Gi
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     persistentVolumeReclaimPolicy: Retain
     storageClassName: heavyai
     mountOptions:
       - hard
       - nfsvers=4.1
     nfs:
       path: {your nfs path goes here }
       server: { your nfs server name goes here }

    NVIDIA Tesla V100 v2

  • NVIDIA Tesla V100 v1

  • NVIDIA Tesla P100

  • NVIDIA Tesla P40

  • NVIDIA Tesla T4

    The following configurations are valid for systems using any of these GPUs as building blocks. For production systems, use Tesla enterprise-grade cards. Avoid mixing card types in the same system; use a consistent card model across your environment.

    Primary factors to consider when choosing GPU cards are:

    • The amount of GPU RAM available on each card

    • The number of GPU cores

    • Memory bandwidth

    Newer cards like the Tesla V100 have higher double-precision compute performance, which is important in geospatial analytics. The Tesla V100 models support the NVLink interconnect, which can provide a significant speed increase for some query workloads.

    GPU
    Memory/GPU
    Cores
    Memory Bandwidth
    NVLink

    A100

    40 to 80 GB

    6912

    1134 GB/sec

    For advice on optimal GPU hardware for your particular use case, ask your HEAVY.AI sales representative.

    HeavyDB Architecture

    Before considering hardware details, this topic describes the HeavyDB architecture.

    HeavyDB is a hybrid compute architecture that utilizes GPU, CPU, and storage. GPU and CPU are the Compute Layer, and SSD storage is the Storage Layer.

    When determining the optimal hardware, make sure to consider the storage and compute layers separately.

    Loading raw data into HeavyDB ingests data onto disk, so you can load as much data as you have disk space available, allowing some overhead.

    When queries are executed, HeavyDB optimizer utilizes GPU RAM first if it is available. You can view GPU RAM as an L1 cache conceptually similar to modern CPU architectures. HeavyDB attempts to cache the hot data. If GPU RAM is unavailable or filled, HeavyDB optimizer utilizes CPU RAM (L2). If both L1 and L2 are filled, query records overflow to disk (L3). To minimize latency, use SSDs for the Storage Layer.

    You can run a query on a record set that spans both GPU RAM and CPU RAM as shown in the diagram above, which also shows the relative performance improvement you can expect based on whether the records all fit into L1, a mix of L1 and L2, only L2, or some combination of L1, L2, and L3.

    Hot Records and Columns

    The Hardware Sizing Schedule table refers to hot records, which are the number of records that you want to put into GPU RAM to get zero-lag performance when querying and interacting with the data. The Hardware Sizing Schedule assumes 16 hot columns, which is the number of columns involved in the predicate or computed projections (such as column1 / column2) of any one of your queries. A 15 percent GPU RAM overhead is reserved for rendering buffering and intermediate results. If your queries involve more columns, the number of records you can put in GPU RAM decreases accordingly.


    The server is not limited to any number of hot records. You can store as much data on disk as you want. The system can also store and query records in CPU RAM, but with higher latency. The hot records represent the number of records on which you can perform zero-latency queries.

    Projection-only Columns

    HeavyDB does not require all queried columns to be processed on the GPU. Non-aggregate projection columns, such as SELECT x, y FROM table, do not need to be processed on the GPU, so they can be stored in CPU RAM. The Hardware Sizing Schedule CPU RAM sizing assumes that up to 24 columns are used in only non-computed projections, in addition to the Hot Records and Columns.

    hashtag
    CPU RAM

    The amount of CPU RAM should equal four to eight times the amount of total available GPU memory. Each NVIDIA Tesla P40 has 24 GB of onboard RAM available, so if you determine that your application requires four NVIDIA P40 cards, you need between 4 x 24 GB x 4 (384 GB) and 4 x 24 GB x 8 (768 GB) of CPU RAM. This correlation between GPU RAM and CPU RAM exists because HeavyDB uses CPU RAM in certain operations for columns that are not filtered or aggregated.
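The 4x-8x rule above can be expressed as a quick sizing helper:

```python
# CPU RAM sizing rule from the text: 4x to 8x total GPU RAM.
def cpu_ram_range_gb(gpu_count, gpu_ram_gb):
    total_gpu_ram = gpu_count * gpu_ram_gb
    return 4 * total_gpu_ram, 8 * total_gpu_ram

# Four NVIDIA P40 cards at 24 GB each:
low, high = cpu_ram_range_gb(4, 24)
print(low, high)  # 384 768
```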

    hashtag
    SSD Storage

    A HEAVY.AI deployment should be provisioned with enough SSD storage to reliably store the required data on disk, both in compressed format and in HEAVY.AI itself. HEAVY.AI requires 30% overhead beyond compressed data volumes. HEAVY.AI recommends drives such as the Intel® SSD DC S3610 Series, or similar, in any size that meets your requirements.

    circle-info
    • For maximum ingestion speed, HEAVY.AI recommends ingesting data from files stored on the HEAVY.AI instance.

    • Most public cloud environments’ default storage is too small for the data volume HEAVY.AI ingests. Estimate your storage requirements and provision accordingly.
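The 30% overhead rule from the SSD Storage section translates to a simple estimate (a sketch; the data volume used is illustrative):

```python
# SSD capacity estimate: compressed data volume plus the 30% overhead
# HEAVY.AI requires beyond it.
def ssd_requirement_gb(compressed_data_gb, overhead=0.30):
    return compressed_data_gb * (1 + overhead)

# 1 TB of compressed data needs about 1.3 TB of SSD:
print(ssd_requirement_gb(1000))  # 1300.0
```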

    hashtag
    Hardware Sizing Schedule

    This schedule estimates the number of records you can process based on GPU RAM and CPU RAM sizes, assuming up to 16 hot columns (see Hot Records and Columns). This applies to the compute layer. For the storage layer, provision your application according to SSD Storage guidelines.

    GPU Count (NVIDIA P40)
    GPU RAM (GB)
    CPU RAM (GB, 8x GPU RAM)
    “Hot” Records (L1)

    1

    24

    If you already have your data in a database, you can look at the largest fact table, get a count of those records, and compare that with this schedule.

    If you have a .csv file, you need to get a count of the number of lines and compare it with this schedule.

    hashtag
    CPU Cores

    HEAVY.AI uses the CPU in addition to the GPU for some database operations. GPUs are the primary performance driver; CPUs are utilized secondarily. More cores provide better performance but increase the cost. Intel CPUs with 10 cores offer good performance for the price. For example, you could configure your system with a single NVIDIA P40 GPU and two 10-core CPUs. Similarly, you can configure a server with eight P40s and two 10-core CPUs.

    Suggested CPUs:

    • Intel® Xeon® E5-2650 v3 2.3GHz, 10 cores

    • Intel® Xeon® E5-2660 v3 2.6GHz, 10 cores

    • Intel® Xeon® E5-2687 v3 3.1GHz, 10 cores

    • Intel® Xeon® E5-2667 v3 3.2GHz, 8 cores

    hashtag
    PCI Express (PCIe)

    GPUs are typically connected to the motherboard using PCIe slots. The PCIe connection is based on the concept of a lane, which is a single-bit, full-duplex, high-speed serial communication channel. The most common numbers of lanes are x4, x8, and x16. The current PCIe 3.0 version with an x16 connection has a bandwidth of 16 GB/s. PCIe 2.0 bandwidth is half the PCIe 3.0 bandwidth, and PCIe 1.0 is half the PCIe 2.0 bandwidth. Use a motherboard that supports the highest bandwidth, preferably, PCIe 3.0. To achieve maximum performance, the GPU and the PCIe controller should have the same version number.

    The PCIe specification permits slots with different physical sizes, depending on the number of lanes connected to the slot. For example, a slot with an x1 connection uses a smaller slot, saving space on the motherboard. However, bigger slots can actually have fewer lanes than their physical designation. For example, motherboards can have x16 slots connected to x8, x4, or even x1 lanes. With bigger slots, check to see if their physical sizes correspond to the number of lanes. Additionally, some slots downgrade speeds when lanes are shared. This occurs most commonly on motherboards with two or more x16 slots. Some motherboards have only 16 lanes connecting the first two x16 slots to the PCIe controller. This means that when you install a single GPU, it has the full x16 bandwidth available, but two installed GPUs each have x8 bandwidth.
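The lane and generation arithmetic above can be sketched as follows; per-lane throughput is approximated from the stated 16 GB/s for a PCIe 3.0 x16 link:

```python
# Per-lane throughput approximated from the stated 16 GB/s for a
# PCIe 3.0 x16 link; each older generation halves the rate, and
# bandwidth scales linearly with lane count.
def pcie_bandwidth_gbs(version, lanes):
    per_lane_gen3 = 16 / 16  # ~1 GB/s per lane at PCIe 3.0
    return lanes * per_lane_gen3 / (2 ** (3 - version))

print(pcie_bandwidth_gbs(3, 16))  # 16.0 - one GPU with full x16
print(pcie_bandwidth_gbs(3, 8))   # 8.0  - two GPUs sharing 16 lanes
print(pcie_bandwidth_gbs(2, 16))  # 8.0  - PCIe 2.0 halves the rate
```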

    HEAVY.AI recommends installing GPUs in motherboards with support for as much PCIe bandwidth as possible. On modern Intel chip sets, each socket (CPU) offers 40 lanes, so with the correct motherboards, each GPU can receive x8 of bandwidth. All recommended System Examples have motherboards designed for maximizing PCIe bandwidth to the GPUs.

    HEAVY.AI does not recommend adding GPUs to a system that is not certified to support the cards. For example, to run eight GPU cards in a machine, the BIOS must register the additional address space required for the number of cards. Other considerations include power routing, power supply rating, and air movement through the chassis and cards for temperature control.

    For an emerging alternative to PCIe, see NVLink.

    hashtag
    NVLink

    NVLink is a bus technology developed by NVIDIA. Compared to PCIe, NVLink offers higher bandwidth between host CPU and GPU and between the GPU processors. NVLink-enabled servers, such as the IBM S822LC Minsky server, can provide up to 160 GB/sec bidirectional bandwidth to the GPUs, a significant increase over PCIe. Because Intel does not currently support NVLink, the technology is available only on IBM Power servers. Servers like the NVIDIA-manufactured DGX-1 offer NVLink between the GPUs but not between the host and the GPUs.

    hashtag
    System Examples

    A variety of hardware manufacturers make suitable GPU systems. For more information, follow these links to their product specifications.

    • Dell 2 GPU 2U Serverarrow-up-right

    • NVIDIA DGX Workstationarrow-up-right

    • System 76 Ibex Pro GPU Workstationarrow-up-right

    hashtag
    Install Kernel Headers

    Install kernel headers and development packages:

    circle-exclamation

    If installing kernel headers does not work correctly, follow these steps instead:

    1. Identify the Linux kernel you are using by issuing the uname -r command.

    2. Use the name of the kernel (4.18.0-553.el8_10.x86_64 in the following code example) to install kernel headers and development packages:

    Install the dependencies and extra packages:

    hashtag
    Install NVIDIA Drivers and Vulkan

    CUDA is a parallel computing platform and application programming interface (API) model. It uses a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The CUDA platform provides direct access to the GPU virtual instruction set and parallel computation elements. For more information on CUDA unrelated to installing HEAVY.AI, see https://developer.nvidia.com/cuda-zonearrow-up-right. You can install drivers in multiple ways. This section provides installation information using the NVIDIA website or using dnf.

    circle-info

    Although using the NVIDIA website is more time-consuming and less automated, you are assured that the driver is certified for your GPU. Use this method if you are not sure which driver to install. If you prefer a more automated method and are confident that the driver is certified, you can use the DNF package manager method.

    circle-info

    Please check that the version of the driver you download meets the HEAVY.AI minimum requirementsarrow-up-right.

    hashtag
    Install NVIDIA Drivers Using the NVIDIA Website

    Install the CUDA package for your platform and operating system according to the instructions on the NVIDIA website (https://developer.nvidia.com/cuda-downloadsarrow-up-right).

    If you do not know the GPU model installed on your system, run this command:

    The output shows the product type, series, and model. In this example, the product type is Tesla, the series is T (as Turing), and the model is T4.

    1. Select the product type shown after running the command above.

    2. Select the correct product series and model for your installation.

    3. In the Operating System dropdown list, select Linux 64-bit.

    4. In the CUDA Toolkit dropdown list, click a supported version (11.4 or higher).

    5. Click Search.

    6. On the resulting page, verify the download information and click Download.

    circle-info

    Please check that the version of the driver you download meets the HEAVY.AI minimum requirementsarrow-up-right.

    Move the downloaded file to the server, change the permissions, and run the installation.

    circle-info

    You might receive the following error during installation:

    ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.

    If you receive this error, blacklist the Nouveau driver by editing the /etc/modprobe.d/blacklist-nouveau.conf file, adding the following lines at the end:

    blacklist nouveau
    blacklist lbm-nouveau
    options nouveau modeset=0
    alias nouveau off
    alias lbm-nouveau off

    hashtag
    Install NVIDIA Drivers Using DNF

    Install a specific version of the driver for your GPU by installing the NVIDIA repository and using the DNF package manager.

    circle-info

    When installing the driver, ensure your GPU model is supported and meets the HEAVY.AI minimum requirementsarrow-up-right.

    Add the NVIDIA network repository to your system.

    Install the driver version needed with dnf. For 8.0, the minimum version is 535.

    To load the installed driver, run the sudo modprobe nvidia or nvidia-smi command. In the case of a driver upgrade, reboot your system with sudo reboot to ensure that the new version of the driver is loaded.

    hashtag
    Check NVIDIA Driver Installation

    Run the specified command to verify that your drivers are installed correctly and recognize the GPUs in your environment. Depending on your environment, you should see output confirming the presence of your NVIDIA GPUs and drivers. This verification step ensures that your system can identify and utilize the GPUs as intended.

    Output of nvidia-smi on a system with a correctly working driver
    circle-info

    If you encounter an error similar to the following, the NVIDIA drivers are likely installed incorrectly: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Please ensure that the latest NVIDIA driver is installed and running.

    Please review the Install NVIDIA Drivers section and correct any errors.

    hashtag
    Install Vulkan

    The back-end renderer requires a Vulkan-enabled driver and the Vulkan library to work correctly. Without these components, the database cannot start unless the back-end renderer is disabled.

    To ensure the Vulkan library and its dependencies are installed, use the DNF package manager:

    For more information about troubleshooting Vulkan, see the Vulkan Renderer section.

    hashtag
    Install CUDA Toolkit ᴼᴾᵀᴵᴼᴺᴬᴸ

    You must install the CUDA Toolkit if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities.

    1. Add the NVIDIA network repository to your system:

    2. List the available CUDA Toolkit versions using the dnf list command:

    3. Install the CUDA Toolkit version using DNF.

    4. Check that everything is working correctly:

    https://fedoraproject.org/wiki/EPELarrow-up-right
    Data Type

    primary_key

    Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function will compute the similarity to the search vector specified by the comparison_features cursor. Examples include countries, census block groups, user IDs of website visitors, and aircraft call signs.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    pivot_features

    One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities are compared only by the census block groups visited, regardless of time overlap.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    metric

    Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences.

    Column<INT | BIGINT | FLOAT | DOUBLE>

    comparison_pivot_features

    hashtag
    Output Columns

    Name
    Description
    Data Types

    class

    ID of the primary key being compared against the search vector.

    Column<TEXT ENCODING DICT | INT | BIGINT> (type will be the same as the primary_key input column)

    similarity_score

    Computed cosine similarity score between each primary_key pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).

    Column<FLOAT>

    hashtag
    Example

    hashtag
    Input Arguments
    Parameter
    Description
    Data Type

    entity_id

    Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes.

    Column<TEXT ENCODING DICT | BIGINT>

    site_id

    Column containing keys/IDs of dwell “sites” or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned h3 hex IDs for geographic location.

    Column<TEXT ENCODING DICT | BIGINT>

    hashtag
    Output Columns

    Name
    Description
    Data Type

    entity_id

    The ID of the entity for the output dwell time, identical to the corresponding entity_id column in the input.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the entity_id input column type)

    site_id

    The site ID for the output dwell time, identical to the corresponding site_id column in the input.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)

    Example

    ArrayCol[n] ...

    Returns value(s) from specific location n in the array.

    UNNEST(ArrayCol)

    Extracts the values in the array to a set of rows. Requires GROUP BY; projecting UNNEST is not currently supported.

    test = ANY ArrayCol

    ANY compares a scalar value with a single row or set of values in an array, returning results in which at least one item in the array matches. ANY must be preceded by a comparison operator.

    test = ALL ArrayCol

    ALL compares a scalar value with a single row or set of values in an array, returning results in which all records in the array field are compared to the scalar value. ALL must be preceded by a comparison operator.

    CARDINALITY()

    Returns the number of elements in an array. For example:

    hashtag
    Examples

    The following examples show query results based on the table test_array created with the following statement:

    The following queries use arrays in an INTEGER field:

    EXPLAIN CALCITE

    Returns a relational algebra tree describing the high-level plan to execute the statement.

    The table below lists the relational algebra classes used to describe the execution plan for a SQL statement.

    Method

    Description

    LogicalAggregate

    Operator that eliminates duplicates and computes totals.

    LogicalCalc

    Expression that computes project expressions and also filters.

    LogicalChi

    Operator that converts a stream to a relation.

    For example, a SELECT statement is described as a table scan and projection.

    If you add a sort order, the table projection is folded under a LogicalSort procedure.

    When the SQL statement is simple, the EXPLAIN CALCITE version is actually less “human readable.” EXPLAIN CALCITE is more useful when you work with more complex SQL statements, like the one that follows. This query performs a scan on the BOOK table before scanning the BOOK_ORDER table.

    Revising the original SQL command results in a more natural selection order and a more performant query.

    hashtag
    EXPLAIN CALCITE DETAILED

    Augments the EXPLAIN CALCITE command by adding details about referenced columns in the query plan.

    For example, for the following EXPLAIN CALCITE command execution:

    EXPLAIN CALCITE DETAILED adds more column details as seen below:

    , where the shortest path is weighted by the nth exponent of the computed slope between a bin and its neighbors, with the nth exponent being specified by
    slope_weighted_exponent
    . A max allowed traversable slope can be specified by
    slope_pct_max
    , such that no traversal is considered or allowed between bins with absolute computed slopes greater than the percentage specified by
    slope_pct_max
    .

    Input Arguments

    Parameter
    Description
    Data Types

    x

    Input x-coordinate column or expression of the data to be rasterized.

    Column <FLOAT | DOUBLE>

    y

    Input y-coordinate column or expression of the data to be rasterized.

    Column <FLOAT | DOUBLE> (must be the same type as x)

    Output Columns

    Result of the example query above, showing the shortest slope-weighted path between the Nepali plains and the peak of Mt. Everest. The path closely mirrors the actual climbing route used.
    statement), it is highly recommended that use_cache is set to false or left unspecified (it defaults to false) to avoid the performance and memory overhead incurred by use of the cache.

    The bounds of the data retrieved can be optionally specified with the x_min, x_max, y_min, y_max arguments. These arguments can be useful when the user desires to retrieve a small geographic area from a large point-cloud file set, as files containing data outside the bounds of the specified bounding box are quickly skipped by tf_load_point_cloud, requiring only a quick read of the spatial metadata for each file.

    Input Arguments

    Parameter
    Description
    Data Types

    path

    The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths.

    TEXT ENCODING NONE

    out_srs (optional)

    EPSG code of the output SRID. If not specified, output points are automatically converted to lon/lat (EPSG 4326).

    TEXT ENCODING NONE

    Output Columns

    Name
    Description
    Data Types

    x

    Point x-coordinate

    Column<DOUBLE>

    y

    Point y-coordinate

    Column<DOUBLE>

    Example A

    Example B

    LiDAR data from downtown Tallahassee, FL, colored by Z-value

    tf_mandelbrot

  • tf_mandelbrot_float

  • tf_mandelbrot_cuda

  • tf_mandelbrot_cuda_float

  • hashtag
    tf_mandelbrot

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    Example

    Computed Mandelbrot set using the HEAVY.AI Vega demo

    hashtag
    tf_mandelbrot_cuda

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    hashtag
    tf_mandelbrot_float

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    hashtag
    tf_mandelbrot_cuda_float

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    Mandelbrot setarrow-up-right

    Users and Databases

    HEAVY.AI has a default superuser named admin with default password HyperInteractive.

    When you create or alter a user, you can grant superuser privileges by setting the is_super property.

    You can also specify a default database when you create or alter a user by using the default_db property. During login, if a database is not specified, the server uses the default database assigned to that user. If no default database is assigned to the user and no database is specified during login, the heavyai database is used.

    circle-info

    When an administrator, superuser, or owner drops or renames a database, all current active sessions for users logged in to that database are invalidated. The users must log in again.

    Similarly, when an administrator or superuser drops or renames a user, all active sessions for that user are immediately invalidated.

    circle-info

    If a password includes characters that are nonalphanumeric, it must be enclosed in single quotes when logging in to heavysql. For example: $HEAVYAI_PATH/bin/heavysql heavyai -u admin -p '77Heavy!9Ai'

    For more information about users, roles, and privileges, see .

    hashtag
    Nomenclature Constraints

    The following are naming convention requirements for HEAVY.AI objects, described in regular-expression notation:

    • A NAME is [A-Za-z_][A-Za-z0-9\$_]*

    • A DASHEDNAME is [A-Za-z_][A-Za-z0-9\$_\-]*

    • An EMAIL is ([^[:space:]\"]+|\".+\")@[A-Za-z0-9][A-Za-z0-9\-\.]*\.[A-Za-z]+

    User objects can use NAME, DASHEDNAME, or EMAIL format.

    Role objects must use either NAME or DASHEDNAME format.

    Database and column objects must use NAME format.
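The three formats can be checked with ordinary regular expressions. This sketch transliterates the patterns above into Python (the POSIX [:space:] class is rendered as \s; the helper names are illustrative, not part of HEAVY.AI):

```python
import re

# The NAME, DASHEDNAME, and EMAIL patterns from the nomenclature rules,
# with fullmatch enforcing that the whole identifier conforms.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9\$_]*")
DASHEDNAME = re.compile(r"[A-Za-z_][A-Za-z0-9\$_\-]*")
EMAIL = re.compile(r'([^\s"]+|".+")@[A-Za-z0-9][A-Za-z0-9\-\.]*\.[A-Za-z]+')

def is_valid_db_name(s):    # databases and columns: NAME only
    return NAME.fullmatch(s) is not None

def is_valid_role_name(s):  # roles: NAME or DASHEDNAME
    return DASHEDNAME.fullmatch(s) is not None

def is_valid_user_name(s):  # users: NAME, DASHEDNAME, or EMAIL
    return is_valid_role_name(s) or EMAIL.fullmatch(s) is not None

print(is_valid_db_name("sales_2024"))              # True
print(is_valid_db_name("sales-2024"))              # False: dash not allowed
print(is_valid_user_name("jane.doe@example.com"))  # True
```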

    hashtag
    CREATE USER

    HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the user name.

    Property
    Value

    Examples:

    hashtag
    DROP USER

    Example:

    hashtag
    ALTER USER

    HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the old or new user name.

    Property
    Value

    Example:

    hashtag
    CREATE DATABASE

    Database names cannot include quotes, spaces, or special characters.

    circle-info

    In Release 6.3.0 and later, database names are case insensitive. Duplicate database names will cause a failure when attempting to start HeavyDB 6.3.0 or higher. Check database names and revise as necessary to avoid duplicate names.

    Property
    Value

    Example:

    hashtag
    DROP DATABASE

    Example:

    hashtag
    ALTER DATABASE

    To alter a database, you must be the owner of the database or a HeavyDB superuser.

    Example:

    hashtag
    ALTER DATABASE OWNER TO

    Enables superusers to change the owner of a database.

    hashtag
    Example

    Change the owner of my_database to user Joe:

    circle-info

    Only superusers can run the ALTER DATABASE OWNER TO command.

    hashtag
    REASSIGN OWNED

    Changes ownership of database objects (tables, views, dashboards, and so on) from a user or set of users to a different user. When the ALL keyword is specified, the ownership change applies to database objects across all databases. Otherwise, it applies only to database objects in the current database.

    Example: Reassign database objects owned by jason and mike in the current database to joe.

    Example: Reassign database objects owned by jason and mike across all databases to joe.

    Database object ownership changes only for objects within the database; ownership of the database itself is not affected. You must be a superuser to run this command.

    hashtag
    Database Security Example

    See in for a database security example.

    tf_graph_shortest_paths_distances

    Given a distance-weighted directed graph, specified as a query CURSOR input containing the starting and ending node for each edge and a distance, and a specified origin node, tf_graph_shortest_paths_distances computes the shortest distance-weighted path distance between the origin_node and every other node in the graph. It returns a row for each node in the graph, with output columns consisting of the input origin_node, the given destination_node, the distance for the shortest path between the two nodes, and the number of edges or graph "hops" between the two nodes. If origin_node does not exist in the node1 column of the edge_list CURSOR, an error is returned.
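The traversal described above can be sketched with a standard Dijkstra search. This is an illustrative stand-in, not the server-side implementation; the edge-row shape mirrors the edge_list CURSOR described in the text:

```python
import heapq
from collections import defaultdict

# Illustrative Dijkstra sketch: edges holds (node1, node2, distance)
# rows; returns {destination_node: (shortest distance, number of hops)}.
def shortest_path_distances(edges, origin_node):
    graph = defaultdict(list)
    for n1, n2, d in edges:
        graph[n1].append((n2, d))
    if origin_node not in graph:
        raise ValueError("origin_node not found in node1 column")
    best = {origin_node: (0, 0)}       # node -> (distance, num_edges)
    queue = [(0, 0, origin_node)]
    while queue:
        dist, hops, node = heapq.heappop(queue)
        if (dist, hops) > best.get(node, (float("inf"), 0)):
            continue  # stale queue entry
        for nbr, w in graph[node]:
            if dist + w < best.get(nbr, (float("inf"), 0))[0]:
                best[nbr] = (dist + w, hops + 1)
                heapq.heappush(queue, (dist + w, hops + 1, nbr))
    return best

edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
print(shortest_path_distances(edges, "a")["c"])  # (3.0, 2)
```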

    Input Arguments

    Parameter
    Description
    Data Types

    Output Columns

    Name
    Description
    Data Types

    Example A

    Example B

    LDAP Integration

    HEAVY.AI supports LDAP authentication using an IPA Server or Microsoft Active Directory.

    You can configure HEAVY.AI Enterprise edition to map LDAP roles 1-to-1 to HEAVY.AI roles. When you enable this mapping, LDAP becomes the main authority controlling user roles in HEAVY.AI.

    circle-info

    LDAP mapping is available only in HEAVY.AI Enterprise edition.

    HEAVY.AI supports five configuration settings that allow you to integrate with your LDAP server.

    Parameter
    Description
    Example

    hashtag
    Obtaining Credential Information

    To find the ldap-role-query-url and ldap-role-query-regex to use, query your user roles. For example, if there is a user named kiran on the IPA LDAP server ldap://myldapserver.mycompany.com, you could use the following curl command to get the role information:

    When successful, it returns information similar to the following:

    • ldap-dn matches the DN, which is uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com.

    • ldap-role-query-url includes the LDAP URI + the DN + the LDAP attribute that represents the role/group the member belongs to, such as memberOf.

    circle-exclamation

    Make sure that LDAP configuration appears before the [web] section of heavy.conf.

    circle-info

    Double quotes are not required for LDAP properties in heavy.conf. For example, both of the following are valid:

    ldap-uri = "ldap://myldapserver.mycompany.com" ldap-uri = ldap://myldapserver.mycompany.com

    hashtag
    Setting Up LDAP with HEAVY.AI

    To integrate LDAP with HEAVY.AI, you need the following:

    • A functional LDAP server, with all users/roles/groups created (ldap-uri, ldap-dn, ldap-role-query-url, ldap-role-query-regex, and ldap-superuser-role) to be used by HEAVY.AI. You can use the curl command to test and find the filters.

    Once you have your server information, you can configure HEAVY.AI to use LDAP authentication.

    1. Locate the heavy.conf file and edit it to include the LDAP parameter. For example:

    2. Restart the HEAVY.AI server:

    3. Log on to heavysql as MyCompany user, or any user who belongs to one of the roles/groups that match the filter.

    circle-exclamation

    When you use LDAP authentication, the default admin user and password HyperInteractive do not work unless you create the admin user with the same password on the LDAP server.

    If your login fails, inspect $HEAVYAI_STORAGE/mapd_log/heavyai_server.INFO to check for any obvious errors about LDAP authentication.

    Once you log in, you can create a new role name in heavysql, and then apply GRANT/REVOKE privileges to the role. Log in as another user with that role and confirm that GRANT/REVOKE works.

    circle-info

    If you refresh the browser window, you are required to log in and reauthenticate.

    hashtag
    Using LDAPS

    To use LDAPS, HEAVY.AI must trust the LDAP server's SSL certificate. To achieve this, you must have the CA for the server's certificate, or the server certificate itself. Install the certificate as a trusted certificate.

    hashtag
    IPA on CentOS

    To use IPA as your LDAP server with HEAVY.AI running on CentOS 7:

    1. Copy the IPA server CA certificate to your local machine.

    2. Update the PKI certificates.

    3. Edit /etc/openldap/ldap.conf to add the following line.

    hashtag
    IPA on Ubuntu

    To use IPA as your LDAP server with HEAVY.AI running on Ubuntu:

    1. Copy the IPA server CA certificate to your local machine.

    2. Rename ipa-ca.crm to ipa-ca.crt so that the certificates bundle update script can find it:

    3. Update the PKI certificates:

    hashtag
    Active Directory

    1. Locate the heavy.conf file and edit it to include the LDAP parameter.

    Example 1:

    Example 2:

    2. Restart the HEAVY.AI server:

    circle-info

    Other LDAP user authentication attributes, such as userPrincipalName, are not currently supported.

    tf_graph_shortest_path

    Given a distance-weighted directed graph, specified as a query CURSOR input containing the starting and ending node for each edge and a distance, and a specified origin and destination node, tf_graph_shortest_path computes the shortest distance-weighted path through the graph between origin_node and destination_node. It returns a row for each node along the computed shortest path, with the traversal-ordered index of that node and the cumulative distance from the origin_node to that node. If either origin_node or destination_node does not exist, an error is returned.

    Input Arguments

    Parameter
    Description
    Data Types

    Output Columns

    Name
    Description
    Data Types

    Example A

    Example B

    tf_geo_rasterize_slope

    Similar to tf_geo_rasterize, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally only filling in null-valued bins if fill_only_nulls is set to true. The slope and aspect are then computed for every bin, based on the z values of that bin and its neighboring bins. The slope can be returned in degrees or as a fraction between 0 and 1, depending on the boolean argument to compute_slope_in_degrees.

    Note that the bounds of the spatial output grid are determined by the x/y range of the input query, and if SQL filters are applied to the output of the tf_geo_rasterize_slope table function, those filters also constrain the output range.
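As an illustration of slope derivation, a single bin's slope can be computed from the z values of its axis-aligned neighbors using central differences. This is a sketch of the general technique; the exact per-bin formula tf_geo_rasterize_slope uses is not specified in this text:

```python
import math

# Central-difference slope for one bin from its west/east and
# south/north neighbors; bin_dim_meters is the bin size in meters.
def bin_slope(z_west, z_east, z_south, z_north, bin_dim_meters,
              compute_slope_in_degrees=True):
    dz_dx = (z_east - z_west) / (2 * bin_dim_meters)
    dz_dy = (z_north - z_south) / (2 * bin_dim_meters)
    rise_over_run = math.hypot(dz_dx, dz_dy)  # slope as a fraction
    if compute_slope_in_degrees:
        return math.degrees(math.atan(rise_over_run))
    return rise_over_run

# 10 m bins; the eastern neighbor sits 10 m above the western one:
print(bin_slope(0.0, 10.0, 0.0, 0.0, 10.0))  # ~26.57 degrees
```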

    hashtag
    Input Arguments

    Parameter
    Description
    Data Types

    hashtag
    Output Columns

    Name
    Description
    Data Types

    Example

    tf_feature_self_similarity

    Given a query input of entity keys/IDs (for example, airplane tail numbers), a set of feature columns (for example, airports visited), and a metric column (for example number of times each airport was visited), scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
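The underlying score is plain cosine similarity over each entity pair's feature vector. A minimal sketch, shown without the optional TF/IDF weighting; the airport example follows the text:

```python
import math

# Cosine similarity between two feature vectors: 1.0 for identical
# direction, 0.0 for no overlap.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Two airplanes' visit counts over the same three airports:
print(round(cosine_similarity([4, 0, 1], [2, 0, 2]), 4))  # 0.8575
```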

    hashtag
    Input Arguments

    Parameter
    Description
    Data Type

    hashtag
    Output Columns

    Name
    Description
    Data Types

    Example

    tf_point_cloud_metadata

    Returns metadata for one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min, x_max, y_min, y_max arguments.

    Note: the specified path must be contained in the global allowed-import-paths; otherwise, an error is returned.

    Input Arguments

    Parameter
    Description
    Data Types

    Output Columns

    Name
    Description
    Data Types

    Example

    Distributed Configuration

    circle-exclamation

    When installing a distributed cluster, you must run initdb --skip-geo to avoid the automatic creation of the sample geospatial data table. Otherwise, metadata across the cluster falls out of synchronization and can put the server in an unusable state.

    HEAVY.AI supports distributed configuration, which allows single queries to span more than one physical host when the scale of the data is too large to fit on a single machine.

    In addition to increased capacity, distributed configuration has other advantages:

    tf_raster_contour_lines; tf_raster_contour_polygons

    Process a raster input to derive contour lines or regions and output as LINESTRING or POLYGON for rendering or further processing. Each has two variants:

    • One that re-rasterizes the input points

    • One that accepts raw raster points directly

    Use the rasterizing variants if the raster table rows are not already sorted in row-major order (for example, if they represent an arbitrary 2D point cloud), or if filtering or binning is required, either to reduce the input data to a manageable count (to speed up the contour processing) or to smooth the input data before contour processing. If the input rows do not already form a rectilinear region, the output region is their 2D bounding box. Many of the parameters of the rasterizing variant are directly equivalent to those of
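As a minimal illustration of the re-rasterization step, the sketch below bins arbitrary 2D points into a row-major grid with an AVG aggregate (a Python sketch under assumed semantics, not the HEAVY.AI implementation; `rasterize_avg` is hypothetical):

```python
# Minimal sketch of binning arbitrary (x, y, z) points into a regular
# grid keyed by (col, row), aggregating z with AVG -- the kind of
# re-rasterization the rasterizing contour variants perform first.
def rasterize_avg(points, bin_dim):
    """points: list of (x, y, z). Returns dict (col, row) -> mean z."""
    sums, counts = {}, {}
    for x, y, z in points:
        key = (int(x // bin_dim), int(y // bin_dim))
        sums[key] = sums.get(key, 0.0) + z
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

grid = rasterize_avg([(0.5, 0.5, 10.0), (0.9, 0.2, 20.0), (3.1, 0.4, 5.0)], 1.0)
print(grid)  # two points fall in bin (0, 0) and average to 15.0
```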

    Type Casts

    Expression
    Example
    Description
    sudo dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    sudo dnf -y upgrade kernel
    sudo reboot now
    sudo dnf -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
    sudo dnf install -y kernel-devel kernel-headers pciutils dkms
    lspci -v | egrep "3D|VGA*.NVIDIA" | awk -F '\[|\]' ' { print $2 } '
    Tesla T4
    chmod +x NVIDIA-Linux-x86_64-*.run
    sudo ./NVIDIA-Linux-x86_64-*.run
    You might get the following error during installation:
    sudo dnf config-manager --add-repo \
    http://developer.download.nvidia.com/compute/cuda/repos/rhel8/$(uname -i)/cuda-rhel8.repo
    sudo dnf -y module install nvidia-driver:535-dkms
    sudo dnf -y install vulkan
    sudo dnf config-manager --add-repo \
    http://developer.download.nvidia.com/compute/cuda/repos/rhel8/$(uname -i)/cuda-rhel8.repo
    dnf list cuda-toolkit-* | egrep -v config
    
    Available Packages
    cuda-toolkit-10-1.x86_64                     10.1.243-1        cuda-rhel8-x86_64
    cuda-toolkit-10-2.x86_64                     10.2.89-1         cuda-rhel8-x86_64
    cuda-toolkit-11-0.x86_64                     11.0.3-1          cuda-rhel8-x86_64
    cuda-toolkit-11-1.x86_64                     11.1.1-1          cuda-rhel8-x86_64
    cuda-toolkit-11-2.x86_64                     11.2.2-1          cuda-rhel8-x86_64
    cuda-toolkit-11-3.x86_64                     11.3.1-1          cuda-rhel8-x86_64
    cuda-toolkit-11-4.x86_64                     11.4.4-1          cuda-rhel8-x86_64
    cuda-toolkit-11-5.x86_64                     11.5.2-1          cuda-rhel8-x86_64
    cuda-toolkit-11-6.x86_64                     11.6.2-1          cuda-rhel8-x86_64
    cuda-toolkit-11-7.x86_64                     11.7.1-1          cuda-rhel8-x86_64
    cuda-toolkit-11-8.x86_64                     11.8.0-1          cuda-rhel8-x86_64
    cuda-toolkit-12.x86_64                       12.5.0-1          cuda-rhel8-x86_64
    cuda-toolkit-12-0.x86_64                     12.0.1-1          cuda-rhel8-x86_64
    cuda-toolkit-12-1.x86_64                     12.1.1-1          cuda-rhel8-x86_64
    cuda-toolkit-12-2.x86_64                     12.2.2-1          cuda-rhel8-x86_64
    cuda-toolkit-12-3.x86_64                     12.3.2-1          cuda-rhel8-x86_64
    cuda-toolkit-12-4.x86_64                     12.4.1-1          cuda-rhel8-x86_64
    cuda-toolkit-12-5.x86_64                     12.5.0-1          cuda-rhel8-x86_64
    
    sudo dnf -y install cuda-toolkit-<version>.x86_64
    nvcc --version
    
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Nov_30_19:08:53_PST_2020
    Cuda compilation tools, release 11.2, V11.2.67
    Build cuda_11.2.r11.2/compiler.29373293_0
    SELECT
      *
    FROM
      TABLE(
        tf_feature_similarity(
          primary_features => CURSOR(
            SELECT
              primary_key,
              pivot_features,
              metric
            from
              table
            where
              ...
            group by
              primary_key,
              pivot_features
          ),
          comparison_features => CURSOR(
            SELECT
              comparison_metric
            from
              table
            where
              ...
            group by <column>
          ),
          use_tf_idf => <boolean>
        )
      )
    /* Compute the similarity of US airline flight nums to a particular
    Delta flight (DAL795), based on the cosine similarity of the overlap of
    flight paths binned to an H3 hex at zoom level 7 (roughly 5 sq km),
    and return the top 10 most similar flight nums */
    
    SELECT
      *
    FROM
      TABLE(
        tf_feature_similarity(
          primary_features => CURSOR(
            SELECT
              callsign,
              geotoh3(st_x(location), st_y(location), 7) as h3,
              count(*) as n
            from
              adsb_2021_03_01
            where
              operator in (
                'Delta Air Lines',
                'Alaska Airlines',
                'Southwest Airlines',
                'American Airlines',
                'United Airlines'
              )
              and altitude >= 1000
            group by
              callsign,
              h3
          ),
          comparison_features => CURSOR(
            SELECT
              geotoh3(st_x(location), st_y(location), 7) as h3,
              COUNT(*) as n
            from
              adsb_2021_03_01
            where
              callsign = 'DAL795'
              and altitude >= 1000
            group by
              h3
          ),
          use_tf_idf => false
        )
      )
    ORDER BY
      similarity_score desc
    limit
      10;
      
    class|similarity_score
    DAL795|1
    DAL538|0.610889
    DAL1192|0.3419932
    DAL1185|0.3391671
    SWA4346|0.3206964
    DAL365|0.3037131
    SWA953|0.2912168
    UAL1559|0.2747431
    SWA2098|0.2511763
    DAL526|0.2473387
    select * from table(
      tf_compute_dwell_times(
        data => CURSOR(
          select
            entity_id,
            site_id,
            ts
          from
            <table>
          where
            ...
        ),
        min_dwell_seconds => <seconds>,
        min_dwell_points => <points>,
        max_inactive_seconds => <seconds>
      )
    );
    /* Data from https://www.kaggle.com/datasets/vodclickstream/netflix-audience-behaviour-uk-movies */
    
    select
      *
    from
      table(
        tf_compute_dwell_times(
          data => cursor(
            select
              user_id,
              movie_id,
              ts
            from
              netflix_audience_behavior
          ),
          min_dwell_points => 3,
          min_dwell_seconds => 600,
          max_inactive_seconds => 10800
        )
      )
    order by
      num_dwell_points desc
    limit
      10;
    
    entity_id|site_id|prev_site_id|next_site_id|session_id|start_seq_id|ts|dwell_time_sec|num_dwell_points
    59416738c3|cbdf9820bc|d058594d1c|863b39bbe8|2|19|2017-02-21 15:12:11.000000000|4391|54
    16d994f6dd|1bae944666|4f1cf3c2dc|NULL|5|61|2017-11-11 20:27:02.000000000|9570|36
    3675d9ba4a|948f2b5bf6|948f2b5bf6|69cb38018a|2|11|2018-11-26 18:42:52.000000000|3600|34
    da01959c0b|fd711679f9|1f579d43c3|NULL|5|90|2019-03-21 05:37:22.000000000|7189|31
    23c52f9b50|df00041e47|df00041e47|NULL|2|39|2019-01-21 15:53:33.000000000|1227|29
    da01959c0b|8ab46a0cb1|f1fffa6ff4|1f579d43c3|3|29|2019-03-12 04:33:01.000000000|6026|29
    23c52f9b50|df00041e47|NULL|df00041e47|1|10|2019-01-21 15:33:39.000000000|1194|28
    da01959c0b|1f579d43c3|8ab46a0cb1|fd711679f9|4|63|2019-03-17 02:01:49.000000000|7240|27
    3261cb81a5|1cb40406ae|NULL|NULL|1|2|2019-04-28 20:48:24.000000000|11240|27
    dbed64ce9e|c5830185ca|NULL|NULL|1|3|2019-03-01 06:43:32.000000000|7261|25
    omnisql> SELECT * FROM test_array;
    name|colors|qty
    Banana|{green, yellow}|{1, 2}
    Cherry|{red, black}|{1, 1}
    Olive|{green, black}|{1, 0}
    Onion|{red, white}|{1, 1}
    Pepper|{red, green, yellow}|{1, 2, 3}
    Radish|{red, white}|{}
    Rutabaga|NULL|{}
    Zucchini|{green, yellow}|{NULL}
    omnisql> SELECT UNNEST(colors) AS c FROM test_array;
    Exception: UNNEST not supported in the projection list yet.
    omnisql> SELECT UNNEST(colors) AS c, count(*) FROM test_array group by c;
    c|EXPR$1
    green|4
    yellow|3
    red|4
    black|2
    white|2
    omnisql> SELECT name, colors [2] FROM test_array;
    name|EXPR$1
    Banana|yellow
    Cherry|black
    Olive|black
    Onion|white
    Pepper|green
    Radish|white
    Rutabaga|NULL
    Zucchini|yellow
    omnisql> SELECT name, colors FROM test_array WHERE colors[1]='green';
    name|colors
    Banana|{green, yellow}
    Olive|{green, black}
    Zucchini|{green, yellow}
    omnisql> SELECT * FROM test_array WHERE colors IS NULL;
    name|colors|qty
    Rutabaga|NULL|{}
    omnisql> SELECT name, qty FROM test_array WHERE qty[2] >1;
    name|qty
    Banana|{1, 2}
    Pepper|{1, 2, 3}
    omnisql> SELECT name, qty FROM test_array WHERE 15 < ALL qty;
    No rows returned.
    omnisql> SELECT name, qty FROM test_array WHERE 2 = ANY qty;
    name|qty
    Banana|{1, 2}
    Pepper|{1, 2, 3}
    omnisql> SELECT COUNT(*) FROM test_array WHERE qty IS NOT NULL;
    EXPR$0
    8
    omnisql> SELECT COUNT(*) FROM test_array WHERE CARDINALITY(qty) > 0;
    EXPR$0
    6
    CREATE TABLE test_array (name TEXT ENCODING DICT(32),colors TEXT[] ENCODING DICT(32), qty INT[]);
    EXPLAIN <STMT>
    EXPLAIN CALCITE <STMT>
    heavysql> EXPLAIN CALCITE (SELECT * FROM movies);
    Explanation
    LogicalProject(movieId=[$0], title=[$1], genres=[$2])
       LogicalTableScan(TABLE=[[CATALOG, heavyai, MOVIES]])
    heavysql> EXPLAIN calcite (SELECT * FROM movies ORDER BY title);
    Explanation
    LogicalSort(sort0=[$1], dir0=[ASC])
       LogicalProject(movieId=[$0], title=[$1], genres=[$2])
          LogicalTableScan(TABLE=[[CATALOG, omnisci, MOVIES]])
    heavysql> EXPLAIN calcite SELECT bc.firstname, bc.lastname, b.title, bo.orderdate, s.name
    FROM book b, book_customer bc, book_order bo, shipper s
    WHERE bo.cust_id = bc.cust_id AND b.book_id = bo.book_id AND bo.shipper_id = s.shipper_id
    AND s.name = 'UPS';
    Explanation
    LogicalProject(firstname=[$5], lastname=[$6], title=[$2], orderdate=[$11], name=[$14])
        LogicalFilter(condition=[AND(=($9, $4), =($0, $8), =($10, $13), =($14, 'UPS'))])
            LogicalJoin(condition=[true], joinType=[INNER])
                LogicalJoin(condition=[true], joinType=[INNER])
                    LogicalJoin(condition=[true], joinType=[INNER])
                        LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK]])
                        LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_CUSTOMER]])
                    LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_ORDER]])
                LogicalTableScan(TABLE=[[CATALOG, omnisci, SHIPPER]])
    heavysql> EXPLAIN calcite SELECT bc.firstname, bc.lastname, b.title, bo.orderdate, s.name
    FROM book_order bo, book_customer bc, book b, shipper s
    WHERE bo.cust_id = bc.cust_id AND bo.book_id = b.book_id AND bo.shipper_id = s.shipper_id
    AND s.name = 'UPS';
    Explanation
    LogicalProject(firstname=[$10], lastname=[$11], title=[$7], orderdate=[$3], name=[$14])
        LogicalFilter(condition=[AND(=($1, $9), =($5, $0), =($2, $13), =($14, 'UPS'))])
            LogicalJoin(condition=[true], joinType=[INNER])
                LogicalJoin(condition=[true], joinType=[INNER])
                    LogicalJoin(condition=[true], joinType=[INNER])
                      LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_ORDER]])
                      LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_CUSTOMER]])
                    LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK]])
                LogicalTableScan(TABLE=[[CATALOG, omnisci, SHIPPER]])
    heavysql> EXPLAIN CALCITE SELECT x, SUM(y) FROM test GROUP BY x;
    Explanation
    LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
      LogicalProject(x=[$0], y=[$2])
        LogicalTableScan(table=[[testDB, test]])
    heavysql> EXPLAIN CALCITE DETAILED SELECT x, SUM(y) FROM test GROUP BY x;
    Explanation
    LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])	{[$1->db:testDB,tableName:test,colName:y]}
      LogicalProject(x=[$0], y=[$2])	{[$2->db:testDB,tableName:test,colName:y], [$0->db:testDB,tableName:test,colName:x]}
        LogicalTableScan(table=[[testDB, test]])
    SELECT * FROM TABLE(
        tf_raster_graph_shortest_slope_weighted_path(
            raster => CURSOR(
                SELECT x, y, z FROM table
            ),
            agg_type => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'>,
            bin_dim => <meters>,
            geographic_coords => <true/false>,
            neighborhood_fill_radius => <num bins>,
            fill_only_nulls => <true/false>,
            origin_x => <origin x coordinate>,
            origin_y => <origin y coordinate>,
            destination_x => <destination x coordinate>,
            destination_y => <destination y coordinate>,
            slope_weighted_exponent => <exponent>,
            slope_pct_max => <max pct slope>
        )
    )
    /* Compute the shortest slope-weighted path over a 30m Copernicus
    Digital Elevation Model (DEM) input raster covering the area around Mt. Everest,
    from the plains of Nepal to the peak */
    
    create table mt_everest_climb as
    select
      path_step,
      st_setsrid(st_point(x, y), 4326) as path_pt
    from
      table(
        tf_raster_graph_shortest_slope_weighted_path(
          raster => cursor(
            select
              st_x(raster_point),
              st_y(raster_point),
              z
            from
              copernicus_30m_mt_everest
          ),
          agg_type => 'AVG',
          bin_dim => 30,
          geographic_coords => TRUE,
          neighborhood_fill_radius => 1,
          fill_only_nulls => FALSE,
          origin_x => 86.01,
          origin_y => 27.01,
          destination_x => 86.9250,
          destination_y => 27.9881,
          slope_weighted_exponent => 4,
          slope_pct_max => 50
        )
      );
    SELECT * FROM TABLE(
        tf_load_point_cloud(
            path => <path>,
            [out_srs => <out_srs>,
            use_cache => <use_cache>,
            x_min => <x_min>,
            x_max => <x_max>,
            y_min => <y_min>,
            y_max => <y_max>]
        )
    )    
    CREATE TABLE wake_co_lidar_test AS
    SELECT
      *
    FROM
      TABLE(
        tf_load_point_cloud(
          path => '/path/to/20150118_LA_37_20066601.laz'
        )
      );
    SELECT
      x, y, z, classification
    FROM
      TABLE(
        tf_load_point_cloud(
          path => '/path/to/las_files/*.las',
          out_srs => 'EPSG:4326',
          use_cache => true,
          y_min => 37.0,
          y_max => 38.0,
          x_min => -123.0,
          x_max => -122.0
        )
      )
    SELECT * FROM TABLE(
      tf_mandelbrot( 
        x_pixels => <x_pixels>,
        y_pixels => <y_pixels>,
        x_min => <x_min>,
        x_max => <x_max>,
        y_min => <y_min>,
        y_max => <y_max>,
        max_iterations => <max_iterations>
      )
    )  
    SELECT * FROM TABLE(
      tf_mandelbrot_cuda( <x_pixels>, <y_pixels>, <x_min>, <x_max>, <y_min>, <y_max>, <max_iterations>
      )
    )
    SELECT * FROM TABLE(
      tf_mandelbrot_float(<x_pixels>, <y_pixels>, <x_min>, <x_max>, <y_min>, <y_max>, <max_iterations>
      )
    )
    SELECT * FROM TABLE(
      tf_mandelbrot_cuda_float( <x_pixels>, <y_pixels>, <x_min>, <x_max>, <y_min>, <y_max>, <max_iterations>
      )
    )
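All four variants compute the same per-pixel escape-time iteration, differing only in precision (double vs. float) and execution target (CPU vs. CUDA). A minimal Python sketch of that iteration (illustrative; `mandelbrot_iterations` is a hypothetical helper):

```python
# Escape-time iteration per pixel: iterate z -> z^2 + c up to
# max_iterations and record how many steps it takes |z| to exceed 2.
def mandelbrot_iterations(cx, cy, max_iterations):
    z = complex(0, 0)
    c = complex(cx, cy)
    for i in range(max_iterations):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iterations  # never escaped: point is in the set

print(mandelbrot_iterations(0.0, 0.0, 100))  # in the set -> 100
print(mandelbrot_iterations(2.0, 2.0, 100))  # escapes after one step
```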
    SELECT * FROM TABLE(
        tf_graph_shortest_path(
            edge_list => CURSOR(
                SELECT node1, node2, distance FROM table
            ),
            origin_node => <origin node>,
            destination_node => <destination node>
        )
    )
    select * from table(
      tf_feature_self_similarity(
        primary_features => cursor(
          select
            primary_key,
            pivot_features,
            metric
          from
            table
          group by
            primary_key,
            pivot_features
        ),
        use_tf_idf => <boolean>))
    SELECT * FROM TABLE(
        tf_point_cloud_metadata(
            path => <path>,
            [x_min => <x_min>,
            x_max => <x_max>,
            y_min => <y_min>,
            y_max => <y_max>]
        )
    )

    One or more columns constituting a compound feature for the search vector. These should match the pivot features in number of sub-features, types, and semantics.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    comparison_metric

    Column denoting the values used as input for the cosine similarity metric computation from the search vector. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    use_tf_idf

    Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation.

    BOOLEAN

    ts

    Column denoting the time at which an event occurred.

    Column<TIMESTAMP(0|3|6|9)>

    min_dwell_seconds

    Constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered record for an entity_id at a site_id to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapse between an entity’s first and last ordered timestamp records at a site, these records are not considered a valid session and a dwell time for that session is not calculated.

    BIGINT (other integer types are automatically cast to BIGINT)

    min_dwell_points

    A constant integer value specifying the minimum number of successive observations (in ts timestamp order) required to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3, but only two consecutive records exist for a user at a site before they move to a new site, no dwell time is calculated for the user.

    BIGINT (other integer types are automatically cast to BIGINT)

    max_inactive_seconds

    A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity id at a given site id is 86500 seconds, the session is considered ended at the first timestamp-ordered record, and a new session is started at the timestamp of the second record.

    BIGINT (other integer types are automatically cast to BIGINT)
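The interaction of the three thresholds above can be illustrated with a small Python sketch (illustrative only, not HEAVY.AI's implementation; the `sessions` helper is hypothetical): records are ordered by timestamp, a gap larger than max_inactive_seconds starts a new session, and only sessions meeting min_dwell_points and min_dwell_seconds are kept.

```python
# Sketch of the sessionization rules for one entity at one site.
def sessions(timestamps, min_dwell_seconds, min_dwell_points, max_inactive_seconds):
    ts = sorted(timestamps)
    runs, current = [], [ts[0]]
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev > max_inactive_seconds:
            runs.append(current)  # gap too large: close the session
            current = []
        current.append(cur)
    runs.append(current)
    return [
        (r[-1] - r[0], len(r))  # (dwell_time_sec, num_dwell_points)
        for r in runs
        if len(r) >= min_dwell_points and r[-1] - r[0] >= min_dwell_seconds
    ]

# A 7200s gap splits the visit into two sessions; only the first one
# has enough points (3) and enough elapsed time (1300s >= 600s).
print(sessions([0, 600, 1300, 8500, 8600], 600, 3, 3600))  # [(1300, 3)]
```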

    prev_site_id

    The site ID for the session preceding the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the last site_id visited was null.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)

    next_site_id

    The site id for the session after the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the next site_id visited was null.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type will be the same as the site_id input column type)

    session_id

    An auto-incrementing session ID specific/relative to the current entity_id, starting from 1 (first session) up to the total number of valid sessions for an entity_id, such that each valid session dwell time increments the session_id for an entity by 1.

    Column<INT>

    start_seq_id

    The index of the nth timestamp (ts-ordered) record for a given entity denoting the start of the current output row's session.

    Column<INT>

    dwell_time_sec

    The duration in seconds for the session.

    Column<INT>

    num_dwell_points

    The number of records/observations constituting the current output row's session.

    Column<INT>

    LogicalCorrelate

    Operator that performs nested-loop joins.

    LogicalDelta

    Operator that converts a relation to a stream.

    LogicalExchange

    Expression that imposes a particular distribution on its input without otherwise changing its content.

    LogicalFilter

    Expression that iterates over its input and returns elements for which a condition evaluates to true.

    LogicalIntersect

    Expression that returns the intersection of the rows of its inputs.

    LogicalJoin

    Expression that combines two relational expressions according to some condition.

    LogicalMatch

    Expression that represents a MATCH_RECOGNIZE node.

    LogicalMinus

    Expression that returns the rows of its first input minus any matching rows from its other inputs. Corresponds to the SQL EXCEPT operator.

    LogicalProject

    Expression that computes a set of ‘select expressions’ from its input relational expression.

    LogicalSort

    Expression that imposes a particular sort order on its input without otherwise changing its content.

    LogicalTableFunctionScan

    Expression that calls a table-valued function.

    LogicalTableModify

    Expression that modifies a table. Similar to TableScan, but represents a request to modify a table instead of read from it.

    LogicalTableScan

    Reads all the rows from a RelOptTable.

    LogicalUnion

    Expression that returns the union of the rows of its inputs, optionally eliminating duplicates.

    LogicalValues

    Expression for which the value is a sequence of zero or more literal row values.

    LogicalWindow

    Expression representing a set of window aggregates. See Window Functions.

    V100 v2|32 GB|5120|900 GB/sec|Yes
    V100|16 GB|5120|900 GB/sec|Yes
    P100|16 GB|3584|732 GB/sec|Yes
    P40|24 GB|3840|346 GB/sec|No
    T4|16 GB|2560|320 GB/sec

    192|417M
    2|48|384|834M
    3|72|576|1.25B
    4|96|768|1.67B
    5|120|960|2.09B
    6|144|1,152|2.50B
    7|168|1,344|2.92B
    8|192|1,536|3.33B
    12|288|2,304|5.00B
    16|384|3,456|6.67B
    20|480|3,840|8.34B
    24|576|4,608|10.01B
    28|672|5,376|11.68B
    32|768|6,144|13.34B
    40|960|7,680|16.68B
    48|1,152|9,216|20.02B
    56|1,344|10,752|23.35B
    64|1,536|12,288|26.69B
    128|3,072|24,576|53.38B
    256|6,144|49,152|106.68B

    HPE ProLiant DL580 Gen10 Server
    Penguin Computers NVIDIA DGX Workstations
    Thinkmate NVIDIA Tesla GPU Servers
    Colfax NVIDIA DGX Workstations

    z

    Input z-coordinate column or expression of the data to be rasterized.

    Column <FLOAT | DOUBLE>

    agg_type

    The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.

    TEXT ENCODING NONE

    bin_dim

    The width and height of each x/y bin. If geographic_coords is true, the input x/y units are translated to meters according to a local coordinate transform appropriate for the x/y bounds of the data.

    DOUBLE

    geographic_coords

    If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.

    BOOLEAN

    neighborhood_fill_radius

    The radius in bins over which to compute the gaussian blur/filter, such that each output bin is the average value of all bins within neighborhood_fill_radius bins.

    BIGINT

    fill_only_nulls

    Specifies that the gaussian blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).

    BOOLEAN

    origin_x

    The x-coordinate for the starting point for the graph traversal, in input (not bin) units.

    DOUBLE

    origin_y

    The y-coordinate for the starting point for the graph traversal, in input (not bin) units.

    DOUBLE

    destination_x

    The x-coordinate for the destination point for the graph traversal, in input (not bin) units.

    DOUBLE

    destination_y

    The y-coordinate for the destination point for the graph traversal, in input (not bin) units.

    DOUBLE

    slope_weighted_exponent

    The slope between neighboring raster cells is raised to the slope_weighted_exponent power to form the traversal weight. A value of 1 signifies that the raw slopes between neighboring cells should be used; increasing the value above 1 more heavily penalizes paths that traverse steep slopes.

    DOUBLE

    slope_pct_max

    The maximum absolute value of slope (measured as a percentage) between neighboring raster cells that is considered for traversal. A neighboring graph cell with an absolute slope greater than this amount is not considered in the shortest slope-weighted path graph traversal.

    DOUBLE
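How these two parameters interact can be sketched as follows, under the assumed semantics that percent slope between neighboring cells is 100 × rise/run (a Python sketch for illustration, not HEAVY.AI's code; `edge_weight` is hypothetical):

```python
# Sketch: traversal weight between two neighboring cells is the percent
# slope raised to slope_weighted_exponent; cells steeper than
# slope_pct_max are excluded from the graph entirely.
def edge_weight(z_from, z_to, run_m, slope_weighted_exponent, slope_pct_max):
    slope_pct = abs(z_to - z_from) / run_m * 100.0
    if slope_pct > slope_pct_max:
        return None  # too steep: edge not considered for traversal
    return slope_pct ** slope_weighted_exponent

print(edge_weight(100.0, 106.0, 30.0, 4, 50))  # 20% slope, weight 20**4
print(edge_weight(100.0, 130.0, 30.0, 4, 50))  # 100% slope -> excluded
```

With an exponent of 4 as in the Mt. Everest example, a cell twice as steep costs 16 times more to cross, strongly steering the path around steep terrain.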

    use_cache (optional)

    If true, use the internal point cloud cache. Useful for inline querying of the output of tf_load_point_cloud. Turn it off for one-shot queries or when creating a table from the output, because adding data to the cache incurs performance and memory-usage overhead. Defaults to false/off if not specified.

    BOOLEAN

    x_min (optional)

    Min x-coordinate value (in degrees) for the output data.

    DOUBLE

    x_max (optional)

    Max x-coordinate value (in degrees) for the output data.

    DOUBLE

    y_min(optional)

    Min y-coordinate value (in degrees) for the output data.

    DOUBLE

    y_max (optional)

    Max y-coordinate value (in degrees) for the output data.

    DOUBLE

    z

    Point z-coordinate

    Column<DOUBLE>

    intensity

    Point intensity

    Column<INT>

    return_num

    The ordered number of the return for a given LiDAR pulse. The first returns (lowest return numbers) are generally associated with the highest-elevation points for a LiDAR pulse, i.e. the forest canopy will generally have a lower return_num than the ground beneath it.

    Column<TINYINT>

    num_returns

    The total number of returns for a LiDAR pulse. Multiple returns occur when there are multiple objects between the LiDAR source and the lowest ground or water elevation for a location.

    Column<TINYINT>

    scan_direction_flag

    From the ASPRS LiDAR Data Exchange Format Standard: "The scan direction flag denotes the direction at which the scanner mirror was traveling at the time of the output pulse. A bit value of 1 is a positive scan direction, and a bit value of 0 is a negative scan direction."

    Column<TINYINT>

    edge_of_flight_line_flag

    From the ASPRS LiDAR Data Exchange Format Standard: "The edge of flight line data bit has a value of 1 only when the point is at the end of a scan. It is the last point on a given scan line before it changes direction."

    Column<TINYINT>

    classification

    From the ASPRS LiDAR Data Exchange Format Standard: "The classification field is a number to signify a given classification during filter processing. The ASPRS standard has a public list of classifications which shall be used when mixing vendor specific user software."

    Column<SMALLINT>

    scan_angle_rank

    From the ASPRS LiDAR Data Exchange Format Standard: "The angle at which the laser point was output from the laser system, including the roll of the aircraft... The scan angle is an angle based on 0 degrees being NADIR, and –90 degrees to the left side of the aircraft in the direction of flight."

    Column<TINYINT>

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    password

    User's password.

    is_super

    Set to true if the user is a superuser. Default is false.

    default_db

    User's default database on login.

    can_login

    password

    User's password.

    is_super

    Set to true if the user is a superuser. Default is false.

    default_db

    User's default database on login.

    can_login

    owner

    User name of the database owner.

    DDL - Roles and Privileges
    regex
    Example: Data Security
    DDL - Roles and Privileges

    Set to true (default/implicit) to activate a user.

    When false, the user still retains all defined privileges and configuration settings, but cannot log in to HEAVY.AI. Deactivated users who try to log in receive the error message "Unauthorized Access: User is deactivated."

    Set to true (default/implicit) to activate a user.

    When false, the user still retains all defined privileges and configuration settings, but cannot log in to HEAVY.AI. Deactivated users who try to log in receive the error message "Unauthorized Access: User is deactivated."

    Distance between origin and destination node in directed edge list CURSOR

    Column<INT | BIGINT | FLOAT | DOUBLE>

    origin_node

    The origin node from which to start graph traversal. If the value is not present in edge_list.node1, an empty result set is returned.

    BIGINT | TEXT ENCODED DICT

    Cumulative distance between origin and destination node for shortest path graph traversal.

    Column<INT | BIGINT | FLOAT | DOUBLE> (same type as the distance input column)

    num_edges_traversed

    Number of edges (or "hops") traversed in the graph to arrive at destination_node from origin_node for the shortest path graph traversal between these two nodes.

    Column <INT>

    node1

    Origin node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT>

    node2

    Destination node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT> (must be the same type as node1)

    origin_node

    Starting node in graph traversal. Always equal to input origin_node.

    Column <INT | BIGINT | TEXT ENCODED DICT> (same type as the node1 and node2 input columns)

    destination_node

    Final node in graph traversal. Will be equal to one of the values of the node2 input column.

    Column <INT | BIGINT | TEXT ENCODED DICT> (same type as the node1 and node2 input columns)
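The traversal these functions perform can be sketched with a standard Dijkstra search over the directed edge list (an illustrative Python sketch, not HEAVY.AI's implementation; `shortest_path` is a hypothetical helper):

```python
import heapq

# Dijkstra over a directed edge list of (node1, node2, distance) rows,
# returning (cumulative distance, num_edges_traversed) to the destination.
def shortest_path(edges, origin, destination):
    adj = {}
    for n1, n2, d in edges:
        adj.setdefault(n1, []).append((n2, d))
    heap, best = [(0, 0, origin)], {}
    while heap:
        dist, hops, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a shorter distance
        best[node] = (dist, hops)
        for nxt, d in adj.get(node, []):
            if nxt not in best:
                heapq.heappush(heap, (dist + d, hops + 1, nxt))
    return best.get(destination)  # None if destination is unreachable

edges = [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0)]
print(shortest_path(edges, "A", "C"))  # (3.0, 2): A -> B -> C beats A -> C
```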

    Rendered chart of the output of tf_graph_shortest_paths_distances along an Eastern US time-traversal weighted edge graph. The shortest travel destinations are rendered in blue, and the furthest travel destinations in yellow.

    distance

    distance

    ldap://myServer.myCompany.com/uid=$USERNAME, cn=users, cn=accounts,dc=myCompany,dc=com?memberOf

    ldap-role-query-regex

    Applies a regex filter to find matching roles from the roles in the LDAP server.

    (MyCompany_.*?),

    ldap-superuser-role

    Identifies one of the filtered roles as a superuser role. If a user has this filtered ldap role, the user is marked as a superuser.

    MyCompany_SuperUser

• ldap-role-query-regex is a regular expression that matches the role names. The matching role names are used to grant and revoke privileges in HEAVY.AI. For example, if we created some roles on an IPA LDAP server where the role names begin with MyCompany_ (for example, MyCompany_Engineering, MyCompany_Sales, MyCompany_SuperUser), the regular expression can filter the role names using MyCompany_.
  • ldap-superuser-role is the role/group name for HEAVY.AI users who are superusers once they log on to the HEAVY.AI database. In this example, the superuser role name is MyCompany_SuperUser.

  • A functional HEAVY.AI server, version 4.1 or higher.

    Locate the heavy.conf file and edit it to include the LDAP parameter. For example:

  • Restart the HEAVY.AI server:

  • Edit /etc/openldap/ldap.conf to add the following line:

  • Locate the heavy.conf file and edit it to include the LDAP parameter. For example:

  • Restart the HEAVY.AI server:

  • ldap-uri

    LDAP server host or server URI.

    ldap://myLdapServer.myCompany.com

    ldap-dn

    LDAP distinguished name (DN).

uid=$USERNAME,cn=users,cn=accounts,dc=myCompany,dc=com

    ldap-role-query-url

    Returns the role names a user belongs to in the LDAP.

node1

    Origin node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT>

    node2

    Destination node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT> (must be the same type as node1)

    distance

    Distance between origin and destination node in directed edge list CURSOR

    Column<INT | BIGINT | FLOAT | DOUBLE>

    origin_node

    The origin node to start graph traversal from. If the value is not present in edge_list.node1, an empty result set is returned.

    BIGINT | TEXT ENCODED DICT

    destination_node

    The destination node to finish graph traversal at. If the value is not present in edge_list.node1, an empty result set is returned.

    BIGINT | TEXT ENCODED DICT

    path_step

    The index of this node along the path traversal from origin_node to destination_node, with the first node (the origin_node) indexed as 1.

    Column<INT>

    node

    The current node along the path traversal from origin_node to destination_node. The first node (as denoted by path_step = 1) will always be the input origin_node, and the final node (as denoted by MAX(path_step)) will always be the input destination_node.

    Column<INT | BIGINT | TEXT ENCODED DICT> (same type as the node1 and node2 input columns)

    cume_distance

    The cumulative distance adding all input distance values from the origin_node to the current node.

    Column<INT | BIGINT | FLOAT | DOUBLE> (same type as the distance input column)

    The computed shortest path along a time-traversal weighted road edge graph for the eastern US.

x

    Input x-coordinate column or expression.

    Column<FLOAT | DOUBLE>

    y

    Input y-coordinate column or expression.

    Column<FLOAT | DOUBLE>

    z

    Input z-coordinate column or expression. The output bin value is computed by applying agg_type to the z-values of all points falling in each bin.

    Column<FLOAT | DOUBLE>

    agg_type

    The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.

    TEXT ENCODING NONE

    bin_dim_meters

    The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are already assumed to be in meters.

    DOUBLE

    geographic_coords

    If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.

    BOOLEAN

    neighborhood_fill_radius

    The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius bins.

    BIGINT

    fill_only_nulls

    Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null z-values).

    BOOLEAN

    compute_slope_in_degrees

    If true, specifies the slope should be computed in degrees (with 0 degrees perfectly flat and 90 degrees perfectly vertical). If false, specifies the slope should be computed as a fraction from 0 (flat) to 1 (vertical). In a future release, we are planning to move the default output to percentage slope.

    BOOLEAN

    x

    The x-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input x column/expression)

    y

    The y-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input y column/expression)

    z

    The aggregated z-value (per agg_type) of all input data assigned to a given spatial bin.

    Column<FLOAT | DOUBLE> (same as input z column/expression)

    slope

    The average slope of an output grid cell (in degrees or a fraction between 0 and 1, depending on the argument to compute_slope_in_degrees).

    Column<FLOAT | DOUBLE> (same as input z column/expression)

    aspect

    The direction from 0 to 360 degrees pointing towards the maximum downhill gradient, with 0 degrees being due north and moving clockwise from N (0°) -> NE (45°) -> E (90°) -> SE (135°) -> S (180°) -> SW (225°) -> W (270°) -> NW (315°).

    Column<FLOAT | DOUBLE> (same as input z column/expression)

    Inline generation of slope-field using the above example query, showing the computed slopes over 90-meter binned Copernicus 30m DEM data. Note that this can be done in Immerse using a custom source, and optionally parametrized if desired. The direction of the slope (aspect) is indicated by the direction of the arrows.

path

    The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths.

    TEXT ENCODING NONE

    x_min (optional)

    Min x-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    x_max (optional)

    Max x-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    y_min (optional)

    Min y-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    y_max (optional)

    Max y-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    file_path

    Full path for the las or laz file

    Column<TEXT ENCODING DICT>

    file_name

    Filename for the las or laz file

    Column<TEXT ENCODING DICT>

    file_source_id

    File source id per file metadata

    Column<SMALLINT>

    version_major

    LAS version major number

    Column<SMALLINT>

    version_minor

    LAS version minor number

    Column<SMALLINT>

    creation_year

    Data creation year

    Column<SMALLINT>

    is_compressed

    Whether data is compressed, i.e. LAZ format

    Column<BOOLEAN>

    num_points

    Number of points in this file

    Column<BIGINT>

    num_dims

    Number of data dimensions for this file

    Column<SMALLINT>

    point_len

    Not currently used

    Column<SMALLINT>

    has_time

    Whether data has time value

    Column<BOOLEAN>

    has_color

    Whether data contains rgb color value

    Column<BOOLEAN>

    has_wave

    Whether data contains wave info

    Column<BOOLEAN>

    has_infrared

    Whether data contains infrared value

    Column<BOOLEAN>

    has_14_point_format

    Data adheres to 14-attribute standard

    Column<BOOLEAN>

    specified_utm_zone

    UTM zone of data

    Column<INT>

    x_min_source

    Minimum x-coordinate in source projection

    Column<DOUBLE>

    x_max_source

    Maximum x-coordinate in source projection

    Column<DOUBLE>

    y_min_source

    Minimum y-coordinate in source projection

    Column<DOUBLE>

    y_max_source

    Maximum y-coordinate in source projection

    Column<DOUBLE>

    z_min_source

    Minimum z-coordinate in source projection

    Column<DOUBLE>

    z_max_source

    Maximum z-coordinate in source projection

    Column<DOUBLE>

    x_min_4326

    Minimum x-coordinate in lon/lat degrees

    Column<DOUBLE>

    x_max_4326

    Maximum x-coordinate in lon/lat degrees

    Column<DOUBLE>

    y_min_4326

    Minimum y-coordinate in lon/lat degrees

    Column<DOUBLE>

    y_max_4326

    Maximum y-coordinate in lon/lat degrees

    Column<DOUBLE>

    z_min_4326

    Minimum z-coordinate in meters above sea level (AMSL)

    Column<DOUBLE>

    z_max_4326

    Maximum z-coordinate in meters above sea level (AMSL)

    Column<DOUBLE>

    CREATE TABLE arr (

    sia SMALLINT[])

    omnisql> select sia, CARDINALITY(sia) from arr;

sia|EXPR$0
    NULL|NULL
    {}|0
    {NULL}|1
    {1}|1
    {2,2}|2
    {3,3,3}|3

    DOT_PRODUCT(array_col_1, array_col_2)

    Computes the dot product between two arrays of the same length, returning a scalar floating point value. If the input arrays (vectors) are of unit length, the computed dot product will represent the angular similarity of the two vectors.
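As a sketch, a query like the following uses DOT_PRODUCT to rank pairs by angular similarity; the table and column names (doc_embeddings, doc_id, vec_a, vec_b) are illustrative and not from this document:

```sql
-- Illustrative only: a table doc_embeddings with two array columns.
-- If vec_a and vec_b are unit-length, the dot product equals the
-- cosine (angular) similarity of the two vectors.
SELECT
  doc_id,
  DOT_PRODUCT(vec_a, vec_b) AS similarity
FROM doc_embeddings
ORDER BY similarity DESC
LIMIT 10;
```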

    sudo dnf -y install \
    kernel-devel-4.18.0-553.el8_10.x86_64 \
    kernel-headers-4.18.0-553.el8_10.x86_64
    CREATE USER ["]<name>["] (<property> = value,...);
    CREATE USER jason (password = 'HeavyaiRocks!', is_super = 'true', default_db='tweets');
    CREATE USER "pembroke.q.aloysius" (password= 'HeavyaiRolls!', default_db='heavyai');
    DROP USER [IF EXISTS] ["]<name>["];
    DROP USER [IF EXISTS] jason;
DROP USER "pembroke.q.aloysius";
    ALTER USER ["]<name>["] (<property> = value, ...);
    ALTER USER ["]<oldUserName>["] RENAME TO ["]<newUserName>["];
    ALTER USER admin (password = 'HeavyaiIsFast!');
    ALTER USER jason (is_super = 'false', password = 'SilkySmooth', default_db='traffic');
    ALTER USER methuselah RENAME TO aurora;
    ALTER USER "pembroke.q.aloysius" RENAME TO "pembroke.q.murgatroyd";
    ALTER USER chumley (can_login='false');
    CREATE DATABASE [IF NOT EXISTS] <name> (<property> = value, ...);
    CREATE DATABASE test (owner = 'jason');
DROP DATABASE [IF EXISTS] <name>;
    DROP DATABASE IF EXISTS test;
ALTER DATABASE <old_name> RENAME TO <new_name>;
    ALTER DATABASE curmudgeonlyOldDatabase RENAME TO ingenuousNewDatabase;
    ALTER DATABASE <database name> OWNER TO <new_owner>;
    ALTER DATABASE my_database OWNER TO Joe;
REASSIGN [ALL] OWNED BY <old_owner>[, <old_owner>, ...] TO <new_owner>;
    REASSIGN OWNED BY jason, mike TO joe;
    REASSIGN ALL OWNED BY jason, mike TO joe;
SELECT * FROM TABLE(
        tf_graph_shortest_paths_distances(
            edge_list => CURSOR(
                SELECT node1, node2, distance FROM table
            ),
            origin_node => <origin node>
        )
    )
    /* Compute the 10 furthest destination airports as measured by average travel-time
    when departing origin airport 'RDU' (Raleigh-Durham, NC) on United Airlines for the
year 2008, adding 60 minutes for each leg to account for boarding/plane change time
    costs. */
    
    SELECT
      *
    FROM
      TABLE(
        tf_graph_shortest_paths_distances(
          edge_list => CURSOR(
            SELECT
              origin,
              dest,
              /* Add 60 minutes to each leg to account for boarding/plane change costs */
              AVG(airtime) + 60 as avg_airtime
            FROM
              flights_2008
            WHERE
              carrier_name = 'United Air Lines'
            GROUP by
              origin,
              dest
          ),
          origin_node => 'RDU'
        )
      )
    ORDER BY
      distance DESC
    LIMIT
      10;
      
    origin_node|destination_node|distance|num_edges_traversed
    RDU|JFK|803|3
    RDU|LIH|757|2
    RDU|KOA|746|2
    RDU|HNL|735|2
    RDU|OGG|728|2
    RDU|EUG|595|3
    RDU|ANC|586|2
    RDU|SJC|468|2
    RDU|SFO|468|2
    RDU|OAK|468|2
    /* Compute the all-destinations path distances along a time-traversal weighted
edge graph of roads in the Eastern United States from a location in North Carolina, joining to a node locations table to output the lon/lat pairs
    of each destination node. */
    
    select
      destination_node,
      lon,
lat,
      distance,
      num_edges_traversed
    from
      table(
        tf_graph_shortest_paths_distances(
          cursor(
            select
              node1,
              node2,
              traversal_time
            from
              usa_roads_east_time
          ),
          1561955
        )
      ),
      USA_roads_east_coords
    where
      destination_node = node_id
    order by
      distance desc
    limit
      20;
      
    destination_node|lon|lat|distance|num_edges_traversed
    2228153|-69.74701|46.941648|22021532|5387
    324156|-69.67822799999999|46.990543|21916494|5386
    324151|-69.687833|46.933106|21906798|5386
    1372661|-69.64962799999999|46.942144|21830101|5385
    320610|-69.47672399999999|46.967413|21807384|5379
    324152|-69.637714|46.958516|21798959|5385
    1372667|-69.633437|46.95189999999999|21793379|5385
    1372662|-69.63483099999999|46.954334|21786119|5384
    2228156|-69.622767|46.949534|21768541|5383
    1372670|-69.58720599999999|46.942504|21759257|5382
    1372663|-69.62387099999999|46.968569|21741445|5383
    2226724|-69.557773|46.969276|21714682|5381
    324159|-69.607209|46.967823|21709789|5382
    324160|-69.59385999999999|46.967445|21691648|5382
    2228155|-69.59575599999999|46.967461|21688053|5381
    320578|-69.57176699999999|47.067628|21683322|5377
    1372669|-69.58906999999999|46.977104|21675010|5382
    2226740|-69.582106|46.991048|21673764|5379
    320609|-69.55000199999999|46.966089|21668411|5378
    324158|-69.585776|46.973521|21663260|5381
    ldap-uri = "ldaps://myldapserver.mycompany.com"
    ldap-dn = "uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com"
    ldap-role-query-url = "ldaps://myldapserver.mycompany.com/uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(MyCompany_.*?),"
    ldap-superuser-role = "MyCompany_SuperUser"
    sudo systemctl restart heavyaidb
    sudo systemctl restart heavyai_web_server
    TLS_CACERT      /etc/ssl/certs/ca-certificates.crt
    ldap-uri = "ldaps://myldapserver.mycompany.com"
    ldap-dn = "uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com"
    ldap-role-query-url = "ldaps://myldapserver.mycompany.com/uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(MyCompany_.*?),"
    ldap-superuser-role = "MyCompany_SuperUser"
    sudo systemctl restart heavydb
    sudo systemctl restart heavyai_web_server
$ curl --user "uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com" \
    "ldap://myldapserver.mycompany.com/uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    DN: uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com
    memberOf: cn=ipausers,cn=groups,cn=accounts,dc=mycompany,dc=com
    memberOf: cn=MyCompany_SuperUser,cn=roles,cn=accounts,dc=mycompany,dc=com
    memberOf: cn=test,cn=groups,cn=accounts,dc=mycompany,dc=com
    ldap-uri = "ldap://myldapserver.mycompany.com"
    ldap-dn = "uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com"
    ldap-role-query-url = "ldap://myldapserver.mycompany.com/uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(MyCompany_.*?),"
    ldap-superuser-role = "MyCompany_SuperUser"
    sudo systemctl restart heavyai_server
    sudo systemctl restart heavyai_web_server
    scp root@myldapserver:/etc/ipa/ca.crt /etc/pki/ca-trust/source/anchors/ipa-ca.pem
    update-ca-trust
    TLS_CACERT      /etc/pki/tls/certs/ca-bundle.crt
    mkdir /usr/local/share/ca-certificates/ipa
    scp root@myldapserver:/etc/ipa/ca.crt /usr/local/share/ca-certificates/ipa/ipa-ca.pem
    mv /usr/local/share/ca-certificates/ipa/ipa-ca.pem /usr/local/share/ca-certificates/ipa/ipa-ca.crt
    update-ca-certificates
    ldap-uri = "ldap://myldapserver.mycompany.com"
    ldap-dn = "cn=$USERNAME,cn=users,dc=qa-mycompany,dc=com"
    ldap-role-query-url = "ldap:///myldapserver.mycompany.com/cn=$USERNAME,cn=users,dc=qa-mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(HEAVYAI_.*?),"
    ldap-superuser-role = "HEAVYAI_SuperUser"
    ldap-uri = "ldap://myldapserver.mycompany.com"
    ldap-dn = "[email protected]"
    ldap-role-query-url = "ldap:///myldapserver.mycompany.com/OU=MyCompany Users,dc=MyCompany,DC=com?memberOf?sub?(sAMAccountName=$USERNAME)"
    ldap-role-query-regex = "(HEAVYAI_.*?),"
    ldap-superuser-role = "HEAVYAI_SuperUser"
    sudo systemctl restart heavyai_server
    sudo systemctl restart heavyai_web_server
    /* Compute the shortest flight route on United Airlines for the year 2008 as measured
    by flight time between origin airport 'RDU' (Raleigh-Durham, NC) and destination 
    airport 'SAT' (San Antonio, TX), adding 60 minutes for each leg to account for 
    boarding/plane change time costs, and only counting routes that were flown at least
    300 times during the year. */
     
    SELECT
      *
    FROM
      TABLE(
        tf_graph_shortest_path(
          edge_list => CURSOR(
            SELECT
              origin,
              dest,
              /* Add 60 minutes to each leg to account
              for boarding/plane change costs */
              AVG(airtime) + 60 as avg_airtime
            FROM
              flights_2008
            WHERE
              carrier_name = 'United Air Lines'
            GROUP by
              origin,
              dest
            HAVING
              COUNT(*) > 300
          ),
          origin_node => 'RDU',
          destination_node => 'SAT'
        )
      )
    ORDER BY
      path_step
     
    path_step|node|cume_distance
    1|RDU|0
    2|ORD|167
    3|DEN|354
    4|SAT|519
/* Compute the shortest path along a time-traversal weighted
    edge graph of roads in the Eastern United States between a location in North Carolina and
    a location in Maine, joining to a node locations table to output the lon/lat pairs 
    of each node. */
    
    select
      path_step,
      node,
      lon,
      lat,
      cume_distance
    from
      table(
        tf_graph_shortest_path(
          cursor(
            select
              node1,
              node2,
              traversal_time
            from
              usa_roads_east_time
          ),
          1561955,
          1591319
        )
      ),
      USA_roads_east_coords
    where
      node = node_id 
    order by 
      cume_distance desc
    limit 20;
    
    path_step|node|lon|lat|cume_distance
    4380|1591319|-71.55136299999999|43.75256|13442017
    4379|1591989|-71.55174099999999|43.75245|13441199
    4378|1589348|-71.554147|43.752464|13436371
    4377|2315795|-71.554867|43.752489|13434924
    4376|1589286|-71.55497099999999|43.752113|13434214
    4375|1589285|-71.555049|43.751833|13433685
    4374|2315785|-71.555999|43.750704|13431238
    4373|2315973|-71.55798799999999|43.748622|13426553
    4372|2315950|-71.56366299999999|43.746268|13417798
    4371|1589788|-71.56476599999999|43.745765|13416053
    4370|1591997|-71.56484|43.745691|13415884
    4369|1589787|-71.564886|43.745645|13415779
    4368|2315951|-71.56517599999999|43.745353|13415113
    4367|2315952|-71.56659499999999|43.744599|13412756
    4366|1591999|-71.56685899999999|43.744565|13412397
    4365|543394|-71.567357|43.744335|13411606
    4364|543393|-71.567832|43.744116|13410852
    4363|543392|-71.571827|43.743673|13405444
    4362|541181|-71.57268499999999|43.743802|13404271
    4361|1589786|-71.572964|43.743844|13403890
    SELECT * FROM TABLE(
      tf_geo_rasterize_slope(
          raster => CURSOR(
            SELECT 
               x, y, z FROM table
          ),
          agg_type => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'>,
          bin_dim_meters => <meters>, 
          geographic_coords => <true/false>, 
          neighborhood_fill_radius => <radius in bins>,
          fill_only_nulls => <true/false>,
          compute_slope_in_degrees => <true/false>
        )
     ) 
    /* Compute the slope and aspect ratio for a 30-meter Copernicus 
    Digital Elevation Model (DEM) raster, binned to 90-meters */
    
    select
      *
    from
      table(
        tf_geo_rasterize_slope(
          raster => cursor(
            select
              st_x(raster_point),
              st_y(raster_point),
              CAST(z AS float)
            from
              copernicus_30m_mt_everest
          ),
          agg_type => 'AVG',
          bin_dim_meters => 90.0,
          geographic_coords => true,
          neighborhood_fill_radius => 1,
          fill_only_nulls => false,
          compute_slope_in_degrees => true
        )
      )
    order by
      slope desc nulls last
    limit
      20;
      
    x|y|z|slope|aspect
    86.96533511629579|27.96534132281817|6212.096|78.37033|18.09232
    87.23751907091268|27.78489838800869|3793.584|78.17864|125.03
    87.23660262662104|27.78408922686605|3929.989|78.06877|127.629
    86.96625156058742|27.96534132281817|6041.277|78.00574|19.00616
    87.2356861823294|27.78328006572341|3981.662|77.53327|127.3175
    86.96441867200414|27.96615048396082|5869.373|77.3751|20.82031
    86.95800356196267|27.96857796738875|6083.791|77.13709|29.89468
    86.96350222771251|27.96615048396082|6081.35|77.08266|21.6792
    87.23843551520432|27.78570754915134|3630.32|77.04676|125.2154
    86.96441867200414|27.96534132281817|6378.94|76.95021|17.77107
    87.22468885082972|27.81321902800121|4771.554|76.71017|253.2764
    87.2356861823294|27.78247090458076|3520.049|76.63997|113.6511
    87.23660262662104|27.78328006572341|3445.282|76.38319|127.2889
    86.96716800487906|27.96534132281817|5864.711|76.16835|19.27573
    87.23476973803776|27.78166174343812|3945.683|76.13519|102.7789
    86.95708711767104|27.96857796738875|6336.072|76.13168|24.90349
    87.22468885082972|27.81240986685857|4732.937|76.07494|264.7046
    87.23751907091268|27.78408922686605|3367.659|76.0099|126.7463
    86.9589200062543|27.9677688062461|6223.083|75.46346|26.85898
    87.22377240653809|27.81402818914385|4704.619|75.41299|205.3219
    SELECT
      file_name,
      num_points,
      specified_utm_zone,
      x_min_4326,
      x_max_4326,
      y_min_4326,
      y_max_4326
    FROM
      TABLE(
        tf_point_cloud_metadata(
          path => '/home/todd/data/lidar/las_files/*2010_00000*.las'
        )
      )
    ORDER BY
      file_name;
      
    file_name|num_points|specified_utm_zone|x_min_4326|x_max_4326|y_min_4326|y_max_4326
    ARRA-CA_GoldenGate_2010_000001.las|2063102|10|-122.9943066785969|-122.9772226614453|37.97913478250298|37.99265200734278
    ARRA-CA_GoldenGate_2010_000002.las|4755131|10|-122.9943056338411|-122.9772184796481|37.99265416515848|38.00617135784082
    ARRA-CA_GoldenGate_2010_000003.las|4833631|10|-122.9943045883859|-122.9772142950517|38.00617351665583|38.01969067717678
    ARRA-CA_GoldenGate_2010_000004.las|6518715|10|-122.9943035422309|-122.9772101076538|38.01969283699149|38.03320996534712
    ARRA-CA_GoldenGate_2010_000005.las|7508919|10|-122.9943024953755|-122.9772059174526|38.03321212616189|38.04672922234828
    ARRA-CA_GoldenGate_2010_000006.las|7442130|10|-122.9943014478193|-122.977201724446|38.04673138416345|38.06024844817669
    ARRA-CA_GoldenGate_2010_000007.las|5610772|10|-122.9943003995618|-122.9771975286321|38.06025061099263|38.07376764282882
    ARRA-CA_GoldenGate_2010_000008.las|3515095|10|-122.9942993506024|-122.9771933300088|38.07376980664591|38.08728680630115
    ARRA-CA_GoldenGate_2010_000009.las|1689283|10|-122.9942898783015|-122.9771554156435|38.19544116402802|38.20895787388029
    heavysql> \d arr

primary_key

    Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function computes co-similarity. Examples include countries, census block groups, user IDs of website visitors, and aircraft callsigns.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    pivot_features

    One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities would be compared only by the census block groups visited, regardless of time overlap.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    metric

    Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is COUNT(*) such that feature overlaps are weighted by the number of co-occurrences.

    Column<INT | BIGINT | FLOAT | DOUBLE>

    use_tf_idf

    Boolean constant denoting whether weighting should be used in the cosine similarity score computation.

    BOOLEAN

    class1

    ID of the first primary key in the pair-wise comparison.

    Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key input column)

    class2

    ID of the second primary key in the pair-wise comparison. Because the computed similarity score for a pair of primary keys is order-invariant, results are output only for orderings such that class1 <= class2. For primary keys of type TEXT ENCODING DICT, the order is based on the internal integer IDs for each string value and not lexicographic ordering.

    Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key input column)

    similarity_score

    Computed cosine similarity score between each primary_key pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).

    Column<FLOAT>

    Computed similarity score for US airlines for 2008, where similarity is computed by the cosine similarity of the airports each airline departs from, weighted by the number of flights from that airport (using the first example query above, sans LIMIT). Dataset courtesy of the FAA.

  • Writes to the database can be distributed across the nodes, thereby speeding up import.

  • Reads from disk are accelerated.

  • Additional GPUs in a distributed cluster can significantly increase read performance in many usage scenarios. Performance scales linearly, or near linearly, with the number of GPUs, for simple queries requiring little communication between servers.

  • Multiple GPUs across the cluster query data on their local hosts. This allows processing of larger datasets, distributed across multiple servers.

    HEAVY.AI Distributed Cluster Components

    A HEAVY.AI distributed database consists of three components:

    • An aggregator, which is a specialized HeavyDB instance for managing the cluster

    • One or more leaf nodes, each being a complete HeavyDB instance for storing and querying data

    • A String Dictionary Server, which is a centralized repository for all dictionary-encoded items

    Conceptually, a HEAVY.AI distributed database is horizontally sharded across n leaf nodes. Each leaf node holds one nth of the total dataset. Sharding currently is round-robin only. Queries and responses are orchestrated by a HEAVY.AI Aggregator server.

    The HEAVY.AI Aggregator

    Clients interact with the aggregator. The aggregator orchestrates execution of a query across the appropriate leaf nodes. The aggregator composes the steps of the query execution plan to send to each leaf node, and manages their results. The full query execution might require multiple iterations between the aggregator and leaf nodes before returning a result to the client.

    A core feature of the HeavyDB is back-end, GPU-based rendering for data-rich charts such as point maps. When running as a distributed cluster, the backend rendering is distributed across all leaf nodes, and the aggregator composes the final image.

    String Dictionary Server

    The String Dictionary Server manages and allocates IDs for dictionary-encoded fields, ensuring that these IDs are consistent across the entire cluster.

    The server creates a new ID for each new encoded value. For queries returning results from encoded fields, the IDs are automatically converted to the original values by the aggregator. Leaf nodes use the string dictionary for processing joins on encoded columns.

    For moderately sized configurations, the String Dictionary Server can share a host with a leaf node. For larger clusters, this service can be configured to run on a small, separate CPU-only server.

    Replicated Tables

A table is split by default so that each leaf node holds 1/nth of the complete dataset. When you create a table used to provide dimension information, you can improve performance by replicating its contents onto every leaf node using the partitions property. For example:
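A minimal sketch of such a replicated dimension table follows; the table and column names are illustrative, and WITH (partitions = 'REPLICATED') is the property referred to above:

```sql
-- Illustrative small dimension table replicated to every leaf node.
CREATE TABLE airports (
  airport_code TEXT ENCODING DICT(32),
  airport_name TEXT ENCODING DICT(32))
WITH (partitions = 'REPLICATED');
```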

    This reduces the distribution overhead during query execution in cases where sharding is not possible or appropriate. This is most useful for relatively small, heavily used dimension tables.

    Data Loading

    You can load data to a HEAVY.AI distributed cluster using a COPY FROM statement to load data to the aggregator, exactly as with HEAVY.AI single-node processing. The aggregator distributes data evenly across the leaf nodes.
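As a sketch, assuming a CSV file at an illustrative path on the aggregator host:

```sql
-- Run against the aggregator exactly as on a single node;
-- rows are distributed evenly across the leaf nodes.
COPY flights_2008 FROM '/path/to/flights_2008.csv' WITH (header = 'true');
```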

    Data Compression

    Records transferred between systems in a HEAVY.AI cluster are compressed to improve performance. HEAVY.AI uses the LZ4_HC compressor by default. It is the fastest compressor, but has the lowest compression rate of the available algorithms. The time required to compress each buffer is directly proportional to the final compressed size of the data. A better compression rate will likely require more time to process.

    You can specify another compressor on server startup using the runtime flag compressor. Compressor choices include:

    • blosclz

    • lz4

    • lz4hc

    • snappy

    • zlib

    • zstd

    For more information on the compressors used with HEAVY.AI, see also:

• http://blosc.org/pages/synthetic-benchmarks/

    • https://quixdb.github.io/squash-benchmark/

    • https://lz4.github.io/lz4/

    HEAVY.AI does not compress the payload until it reaches a certain size. The default size limit is 512MB. You can change the size using the runtime flag compression-limit-bytes.
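For example, a heavy.conf fragment selecting a different compressor and a lower compression threshold might look like the following sketch; the flag names come from the text above, and the values are illustrative:

```
compressor = "zstd"
compression-limit-bytes = 268435456
```

Here 268435456 bytes corresponds to 256MB, half the default limit.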

    HEAVY.AI Distributed Cluster Example

    This example uses four GPU-based machines, each with a combination of one or more CPUs and GPUs.

    Hostname
    IP
    Role(s)

    Node1

    10.10.10.1

    Leaf, Aggregator

    Node2

    10.10.10.2

    Leaf, String Dictionary Server

Install the HEAVY.AI server on each node. For larger deployments, you can place the installation on a shared drive.

    Set up the configuration file for the entire cluster. This file is the same for all nodes.

    In the cluster.conf file, the location of each leaf node is identified as well as the location of the String Dictionary server.

    Here, dbleaf is a leaf node, and string is the String Dictionary Server. The port each node is listening on is also identified. These ports must match the ports configured on the individual server.

    Each leaf node requires a heavy.conf configuration file.

    The parameter string-servers identifies the file containing the cluster configuration, to tell the leaf node where the String Dictionary Server is.

    The aggregator node requires a slightly different heavy.conf. The file is named heavy-agg.conf in this example.

    heavy-agg.conf

    The parameter cluster tells the HeavyDB instance that it is an aggregator node, and where to find the rest of its cluster.

    If your aggregator node is sharing a machine with a leaf node, there might be a conflict on the calcite-port. Consider changing the port number of the aggregator node to another that is not in use.

    hashtag
    Implementing a HEAVY.AI Distributed Cluster

    Contact HEAVY.AI support for assistance with HEAVY.AI Distributed Cluster implementation.

    ; see that function for details.

    The direct variants require that the input rows represent a rectilinear region of pixels in nonsparse row-major order. The dimensions must also be provided, and (raster_width * raster_height) must match the input row count. The contour processing is then performed directly on the raster values with no preprocessing.

    The line variants generate LINESTRING geometries that represent the contour lines of the raster space at the given interval with the optional given offset. For example, a raster space representing a height field with a range of 0.0 to 1000.0 will likely result in 10 or 11 lines, each with a corresponding contour_values value, 0.0, 100.0, 200.0 etc. If contour_offset is set to 50.0, then the lines are generated at 50.0, 150.0, 250.0, and so on. The lines can be open or closed and can form rings or terminate at the edges of the raster space.

The polygon variants generate POLYGON geometries that represent regions between contour lines (for example, from 0.0 to 100.0, and from 100.0 to 200.0). If the raster space has multiple regions within a given value range, a POLYGON row is output for each of those regions. The corresponding contour_values value for each is the lower bound of the range for that region.

    hashtag
    Rasterizing Variant

    hashtag
    Direct Variant

    hashtag
    Input Arguments

    Parameter
    Description
    Data Types

    lon

    Longitude value of raster point (degrees, SRID 4326).

    Column<FLOAT | DOUBLE>

    lat

    Latitude value of raster point (degrees, SRID 4326).

    Column<FLOAT | DOUBLE> (must be the same as <lon>)

    hashtag
    Output Columns

    Name
    Description
    Data Types

    contour_[lines|polygons]

    Output geometries.

    Column<LINESTRING | POLYGON>

    contour_values

    Raster values associated with each contour geometry.

    Column<FLOAT | DOUBLE> (will be the same type as value)

    hashtag
    Examples

    rasterizing variant
    direct variant
    tf_geo_rasterize

    ENCODE_TEXT(none_encoded_str)

    ENCODE_TEXT(long_str)

    Converts a none-encoded text type to a dictionary-encoded text type.

    The following table shows cast type conversion support.

    FROM/TO:

    TINYINT

    SMALLINT

    INTEGER

    BIGINT

    FLOAT

    DOUBLE

    DECIMAL

    TEXT

    BOOLEAN

    hashtag

    CAST(expr AS type)

    CAST(1.25 AS FLOAT)

Converts an expression to another data type. For conversions from a TEXT type, use TRY_CAST.

    TRY_CAST(text_expr AS type)

TRY_CAST('1.25' AS FLOAT)

Converts text to a non-text type, returning NULL if the conversion cannot be performed.
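As a sketch of the difference, TRY_CAST yields NULL on a failed conversion where CAST would raise an error:

```sql
SELECT TRY_CAST('1.25' AS FLOAT);          /* 1.25 */
SELECT TRY_CAST('not a number' AS FLOAT);  /* NULL */
```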

    tf_geo_rasterize

Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, with the output value for each bin computed by aggregating the z values of all points in that bin. The aggregate performed to compute the value for each bin is specified by agg_type, with allowed aggregate types of AVG, COUNT, SUM, MIN, and MAX. If neighborhood_fill_radius is set greater than 0, a blur pass/kernel is computed on top of the results according to the optionally specified fill_agg_type, with allowed types of GAUSS_AVG, BOX_AVG, COUNT, SUM, MIN, and MAX (if not specified, defaults to GAUSS_AVG, a Gaussian-average kernel). If fill_only_nulls is set to true, only null bins from the first aggregate step have final output values computed from the blur pass; otherwise, all values are affected by the blur pass.

Note that the arguments bounding the spatial output grid (x_min, x_max, y_min, y_max) are optional; however, either all or none of them must be supplied. If they are not supplied, the spatial output grid is bounded by the x/y range of the input query, and if SQL filters are applied to the output of the tf_geo_rasterize table function, those filters also constrain the output range.

    hashtag
    Input Arguments

    Parameter
    Description
    Data Types

    hashtag
    Output Columns

    Name
    Description
    Data Types

    Example
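A sketch of an invocation following the parameter descriptions above; the table and column names (elevation_table, lon, lat, elevation) are illustrative, and the parameter values are arbitrary:

```sql
SELECT x, y, z
FROM TABLE(
  tf_geo_rasterize(
    raster => CURSOR(SELECT lon, lat, elevation FROM elevation_table),
    agg_type => 'MAX',
    bin_dim_meters => 1.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 0,
    fill_only_nulls => FALSE
  )
);
```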

    Install NVIDIA Drivers and Vulkan on Ubuntu

    hashtag
    Installation Prerequisites

    Hardware Check

Check that your system has a compatible NVIDIA GPU. If you don't know the exact GPU model in your system, run this command:
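One common way to list NVIDIA devices on the PCI bus (this is a generic Linux command, not HEAVY.AI-specific, and assumes the pciutils package is installed):

```shell
# Print one line per NVIDIA device found on the PCI bus;
# fall back to a notice when none is present or lspci is unavailable.
lspci 2>/dev/null | grep -i nvidia || echo "No NVIDIA GPU detected"
```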

    This command should output one row per installed NVIDIA GPU. Check that your hardware is compatible on our page. Note that you can still install the CPU Edition of HEAVY.AI on machines that do not have an NVIDIA GPU.

    Window Functions

    Window functions allow you to work with a subset of rows related to the currently selected row. For a given dimension, you can find the most associated dimension by some other measure (for example, number of records or sum of revenue).

    Window functions must always contain an OVER clause. The OVER clause splits up the rows of the query for processing by the window function.

    The PARTITION BY list divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed using all rows in the same partition as the current row.

    Rows that have the same value in the ORDER BY clause are considered peers. The ranking functions give the same answer for any two peer rows.

    SELECT

    The SELECT command returns a set of records from one or more tables.

    For more information, see .

    hashtag
    ORDER BY

    /* Compute similarity of airlines by the airports they fly from */
    
    select
      *
    from
      table(
        tf_feature_self_similarity(
          primary_features => cursor(
            select
              carrier_name,
              origin,
              count(*) as num_flights
            from
              flights_2008
            group by
              carrier_name,
              origin
          ),
          use_tf_idf => false
        )
      )
    where
      similarity_score <= 0.99
    order by
      similarity_score desc
    limit
      20;
      
    class1|class2|similarity_score
    Expressjet Airlines|Continental Air Lines|0.9564615
    Delta Air Lines|Atlantic Southeast Airlines|0.9436753
    Delta Air Lines|AirTran Airways Corporation|0.9379856
    Atlantic Southeast Airlines|AirTran Airways Corporation|0.9326661
    American Eagle Airlines|American Airlines|0.8906327
    Northwest Airlines|Pinnacle Airlines|0.8222722
    Skywest Airlines|United Air Lines|0.6857293
    Mesa Airlines|US Airways|0.6116939
    United Air Lines|Frontier Airlines|0.5921053
    Mesa Airlines|United Air Lines|0.5686765
    United Air Lines|American Eagle Airlines|0.5272493
    Skywest Airlines|Frontier Airlines|0.4684323
    Southwest Airlines|US Airways|0.4166781
    United Air Lines|American Airlines|0.397027
    Comair|JetBlue Airways|0.3631534
    Mesa Airlines|American Eagle Airlines|0.3379275
    Skywest Airlines|American Eagle Airlines|0.3331468
    Mesa Airlines|Skywest Airlines|0.3235496
    Comair|Delta Air Lines|0.3075919
    Southwest Airlines|Mesa Airlines|0.2901711
    
    /* Compute the similarity of US States by the TF-IDF
     weighted cosine similarity of the words tweeted in each state */
     
     select
      *
    from
      table(
        tf_feature_self_similarity(
          primary_features => cursor(
            select
              state_abbr,
              unnest(tweet_tokens),
              count(*)
            from
              tweets_2022_06
            where country = 'US'
            group by
              state_abbr,
              unnest(tweet_tokens)
          ),
          use_tf_idf => TRUE
        )
      )
    where
      class1 <> class2
    order by
      similarity_score desc;
      
    TX|GA|0.9928479
    IL|TN|0.9920474
    IL|NC|0.9920027
    TX|IL|0.9917723
    IN|OH|0.9916649
    TN|NC|0.9915619
    CA|TX|0.9910875
    IN|VA|0.9909871
    CA|IL|0.9909689
    IL|OH|0.9909481
    TX|NC|0.9908867
    IL|MO|0.9907863
    IN|MI|0.990751
    TN|OH|0.9907123
    IL|MD|0.9907106
    OH|NC|0.9905779
    VA|OH|0.990536
    IN|IL|0.9904549
    IN|MO|0.9903805
    TX|TN|0.9903381
    CREATE TABLE flights … WITH (PARTITIONS='REPLICATED')
    [
      {
        "host": "node1",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "node2",
        "port": 16274,
        "role": "dbleaf"
      },
     {
        "host": "node3",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "node4",
        "port": 16274,
        "role": "dbleaf"
      },
    
      {
        "host": "node2",
        "port": 6277,
        "role": "string"
      }
    ]
    port = 16274
    http-port = 16278
    calcite-port = 16279
    data = "<location>/heavyai-storage/nodeLocal/data"
    read-only = false
    string-servers = "<location>/heavyai-storage/cluster.conf"
    port = 6274
    http-port = 6278
    calcite-port = 6279
    data = "<location>/heavyai-storage/nodeLocalAggregator/data"
    read-only = false
    num-gpus = 1
    cluster = "<location>/heavyai-storage/cluster.conf"
    
    [web]
    port = 6273
    frontend = "<location>/prod/heavyai/frontend"
    SELECT
      contour_[lines|polygons],
      contour_values
    FROM TABLE(
      tf_raster_contour_[lines|polygons](
        raster => CURSOR(
          <lon>,
          <lat>,
          <value>
        ),
    agg_type => '<agg_type>',
        bin_dim_meters => <bin_dim_meters>,
        neighborhood_fill_radius => <neighborhood_fill_radius>,
        fill_only_nulls => <fill_only_nulls>,
    fill_agg_type => '<fill_agg_type>',
        flip_latitude => <flip_latitude>,
        contour_interval => <contour_interval>,
        contour_offset => <contour_offset>
      )
    );
    SELECT
      contour_[lines|polygons],
      contour_values
    FROM TABLE(
      tf_raster_contour_[lines|polygons](
        raster => CURSOR(
          <lon>,
          <lat>,
          <value>
        ),
        raster_width => <raster_width>,
        raster_height => <raster_height>,
        flip_latitude => <flip_latitude>,
        contour_interval => <contour_interval>,
        contour_offset => <contour_offset>
      )
    );
    SELECT
      contour_lines,
      contour_values
    FROM TABLE(
      tf_raster_contour_lines(
        raster => CURSOR(
          SELECT
            lon,
            lat,
            elevation
          FROM
            elevation_table
        ),
    agg_type => 'AVG',
        bin_dim_meters => 10.0,
        neighborhood_fill_radius => 0,
        fill_only_nulls => FALSE,
    fill_agg_type => 'AVG',
        flip_latitude => FALSE,
        contour_interval => 100.0,
        contour_offset => 0.0
      )
    );
    SELECT
      contour_polygons,
      contour_values
    FROM TABLE(
      tf_raster_contour_polygons(
        raster => CURSOR(
          SELECT
            lon,
            lat,
            elevation
          FROM
            elevation_table
        ),
        raster_width => 1024,
        raster_height => 1024,
        flip_latitude => FALSE,
        contour_interval => 100.0,
        contour_offset => 0.0
      )
    );

    value

    Raster band value from which to derive contours.

    Column<FLOAT | DOUBLE>

    agg_type

    See tf_geo_rasterize

    bin_dim_meters

    See tf_geo_rasterize

    neighborhood_fill_radius

    See tf_geo_rasterize

    fill_only_nulls

    See tf_geo_rasterize

    fill_agg_type

    See tf_geo_rasterize

    flip_latitude

    Optionally flip resulting geometries in latitude (default FALSE).

    (This parameter may be removed in future releases)

    BOOLEAN

    contour_interval

    Desired contour interval. The function will generate a line at each interval, or a polygon region that covers that interval.

    FLOAT/DOUBLE (must be same type as value)

    contour_offset

    Optional offset for resulting intervals.

    FLOAT/DOUBLE (must be same type as value)

    raster_width

    Pixel width (stride) of the raster data.

    INTEGER

    raster_height

    Pixel height of the raster data.

    INTEGER


    Node3

    10.10.10.3

    Leaf

    Node4

    10.10.10.4

    Leaf

    DATE

    TIME

    TIMESTAMP

    TINYINT

    -

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    No

    No

    No

    SMALLINT

    Yes

    -

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    No

    No

    No

    INTEGER

    Yes

    Yes

    -

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    No

    No

    BIGINT

    Yes

    Yes

    Yes

    -

    Yes

    Yes

    Yes

    Yes

    No

    No

    No

    FLOAT

    Yes

    Yes

    Yes

    Yes

    -

    Yes

    No

    Yes

    No

    No

    No

    DOUBLE

    Yes

    Yes

    Yes

    Yes

    Yes

    -

    No

    Yes

    No

    No

    No

    DECIMAL

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    -

    Yes

    No

    No

    No

    TEXT

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    -

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    BOOLEAN

    No

    No

    Yes

    No

    No

    No

    No

    Yes

    -

    n/a

    n/a

    DATE

    No

    No

    No

    No

    No

    No

    No

    Yes

    n/a

    -

    No

    TIME

    No

    No

    No

    No

    No

    No

    No

    Yes

    n/a

    No

    -

    TIMESTAMP

    No

    No

    No

    No

    No

    No

    No

    Yes

    n/a

    Yes

    No

    n/a

    n/a

    No

    No

    No

    n/a

    n/a

    Yes (Use TRY_CAST)

    n/a

    Yes

    n/a

    -

Z-coordinate column or expression. The output value for each bin is computed by applying agg_type to the z-values of all points falling in that bin.

    Column<FLOAT | DOUBLE>

    agg_type

    The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.

    TEXT ENCODING NONE

    fill_agg_type (optional)

The aggregate to be performed when computing the blur pass on the output bins. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', 'MAX', 'GAUSS_AVG', or 'BOX_AVG'. Note that 'AVG' is synonymous with 'GAUSS_AVG' in this context, and the default fill_agg_type is 'GAUSS_AVG'.

    TEXT ENCODING NONE

    bin_dim_meters

    The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are already assumed to be in meters.

    DOUBLE

    geographic_coords

    If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.

    BOOLEAN

    neighborhood_fill_radius

    The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius bins.

    DOUBLE

    fill_only_nulls

    Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).

    BOOLEAN

    x_min (optional)

    Min x-coordinate value (in input units) for the spatial output grid.

    DOUBLE

    x_max (optional)

    Max x-coordinate value (in input units) for the spatial output grid.

    DOUBLE

    y_min (optional)

    Min y-coordinate value (in input units) for the spatial output grid.

    DOUBLE

    y_max (optional)

    Max y-coordinate value (in input units) for the spatial output grid.

    DOUBLE

The aggregate (per agg_type) of the z-coordinates of all input data assigned to a given spatial bin.

    Column<FLOAT | DOUBLE> (same as input z-coordinate column/expression)

    x

    X-coordinate column or expression

    Column<FLOAT | DOUBLE>

    y

    Y-coordinate column or expression

    Column<FLOAT | DOUBLE>

    x

    The x-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input x-coordinate column/expression)

    y

    The y-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input y-coordinate column/expression)

Heavy Immerse dashboard showing a raw Tallahassee LiDAR dataset on the left, and a smoothed version on the right using the min z values for each 1-meter binned cell of the LiDAR data, Gaussian-smoothed over the neighboring 100 cells.

    z

    z

    Pre-Installation Updates

Upgrade the system and the kernel, then reboot the machine if needed.

    hashtag
    Install Kernel Headers

    Install kernel headers and development packages.

    Install the extra packages.

    hashtag
    Installing Vulkan Library

    The rendering engine of HEAVY.AI (present in Enterprise Editions) requires a Vulkan-enabled driver and the Vulkan library. Without these components, the database itself may not be able to start.

    Install the Vulkan library and its dependencies using apt.

    For more information about troubleshooting Vulkan, see the Vulkan Renderer section.

    hashtag
    Installing NVIDIA Drivers

    Installing NVIDIA drivers with support for the CUDA platform is required to run GPU-enabled versions of HEAVY.AI.

    Each version of HEAVY.AI has a minimum required driver version, which is documented in the Software Requirements page. You can generally install NVIDIA drivers newer than the minimum required version, but the version listed in our Software Requirements page reflects the NVIDIA driver used for software testing.

You can install NVIDIA drivers in multiple ways. We've outlined three options below; we recommend Option 1.

    • Option 1: Install NVIDIA drivers with CUDA toolkit from NVIDIA Website

    • Option 2: Install NVIDIA drivers via .run file using the NVIDIA Website

    • Option 3: Install NVIDIA drivers using APT package manager

    circle-info

Keep a record of the installation method used; upgrading NVIDIA drivers later requires using the same method.

    hashtag
    What is CUDA? What is the CUDA toolkit?

CUDA is a parallel computing platform and application programming interface (API) model. It uses a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The CUDA platform provides direct access to the GPU virtual instruction set and parallel computation elements. For more information on CUDA unrelated to installing HEAVY.AI, see https://developer.nvidia.com/cuda-zone.

The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications. The CUDA Toolkit includes GPU-accelerated libraries, a compiler, development tools, and the CUDA runtime. The CUDA Toolkit is not required to run HEAVY.AI, but you must install it if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities.

    hashtag
    Option 1: Install NVIDIA Drivers with CUDA Toolkit from NVIDIA Website

Open https://developer.nvidia.com/cuda-toolkit-archive and select the desired CUDA Toolkit version to install.

    circle-info

    The minimum CUDA version supported by HEAVY.AI is 11.4. We recommend using a release that has been available for at least two months.

    In the "Target Platform" section, follow these steps:

    1. For "Operating System" select Linux

2. For "Architecture" select x86_64

    3. For "Distribution" select Ubuntu

    4. For "Version" select the version of your operating system (20.04)

    5. For "Installer Type" choose deb (network) **

    6. One by one, run the presented commands in the Installer Instructions section on your server.

    circle-info

    ** You may optionally use any of the "Installer Type" options available.

If you choose the .run file option, before running the installer you must install build-essential using apt and change the permissions of the downloaded .run file to allow execution.

    hashtag
    Option 2: Install NVIDIA Drivers via .run file using the NVIDIA Website

Install the CUDA package for your platform and operating system according to the instructions on the NVIDIA website (https://www.nvidia.com/download/index.aspx).

If you don't know the exact GPU model in your system, run this command:

You'll get output in the format Product Type, Series, and Model.

In this example, the Product Type is Tesla, the Series is T (for Turing), and the Model is T4.

    1. Select the Product Type as the one you got with the command.

    2. Select the correct Product Series and Product Type for your installation.

    3. In the Operating System dropdown list, select Linux 64-bit.

    4. In the CUDA Toolkit dropdown list, click a supported version (11.4 or higher).

    5. Click Search.

    6. On the resulting page, verify the download information and click Download

    7. On the subsequent page, if you agree to the terms, right click on "Agree and Download" and select "Copy Link Address". You may also manually download and transfer to your server, skipping the next step.

8. On your server, type wget and paste the URL you copied in the previous step. Press Enter to download.

    circle-info

Check that the driver version you are downloading meets the HEAVY.AI minimum requirements.

    Install the tools needed for installation.

    Change the permissions of the downloaded .run file to allow execution, and run the installation.

    hashtag
    Option 3: Install NVIDIA drivers using APT

    Install a specific version of the driver for your GPU by installing the NVIDIA repository and using the apt package manager.

    circle-info

Be careful when choosing the driver version to install. Ensure that your GPU model is supported and that the driver meets the HEAVY.AI minimum requirements.

Run the following command to get a list of the available driver versions:

Install the needed driver version with apt:

    hashtag
    NVIDIA Driver Post-Installation steps

Reboot your system to ensure the new version of the driver is loaded.

    hashtag
    Verify Successful NVIDIA driver installation

    Run nvidia-smi to verify that your drivers are installed correctly and recognize the GPUs in your environment. Depending on your environment, you should see something like this to confirm that your NVIDIA GPUs and drivers are present.

    circle-info

    If you see an error like the following, the NVIDIA drivers are probably installed incorrectly:

    Review the installation instructions, specifically checking for completion of install prerequisites, and correct any errors.

    hashtag
    Install Vulkan library

The rendering engine of HEAVY.AI requires a Vulkan-enabled driver and the Vulkan library. Without these components, the database cannot start unless the back-end renderer is disabled.

    Install the Vulkan library and its dependencies using apt.

    For more information about troubleshooting Vulkan, see the Vulkan Renderer section.

    hashtag
    Advanced Installation

You must install the CUDA toolkit and Clang if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities.

    hashtag
    Install CUDA Toolkit ᴼᴾᵀᴵᴼᴺᴬᴸ

    circle-check

    If you installed NVIDIA drivers using Option 1 above, the CUDA toolkit is already installed; you may proceed to the verification step below.

    Install the NVIDIA public repository GPG key.

    Add the repository.

List the available CUDA toolkit versions.

    Install the CUDA toolkit using apt.

    hashtag
    Verification

    Check that everything is working and the toolkit has been installed.

    hashtag
    Install Clang ᴼᴾᵀᴵᴼᴺᴬᴸ

You must install Clang if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities. Install Clang and LLVM dependencies using apt.

    hashtag
    Verification

    Check that the software is installed and in the execution path.

    For more information, see C++ User-Defined Functions.

    Hardware Requirements
    hashtag
    Supported Window Functions

    Function

    Description

    BACKWARD_FILL(value)

Replaces a null value with the nearest non-null value of the value column, using a backward search.

    For example, for column x, with the current row r at the index K having a NULL value, and assuming column x has N rows (where K < N):

BACKWARD_FILL(x) searches for the first non-NULL value among rows with index K+1 and higher.
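A sketch of BACKWARD_FILL over a time-ordered partition; the table and column names (sensor_data, sensor_id, ts, reading) are hypothetical:

```sql
SELECT
  ts,
  BACKWARD_FILL(reading) OVER (PARTITION BY sensor_id ORDER BY ts) AS filled_reading
FROM sensor_data;
```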

    CONDITIONAL_CHANGE_EVENT(expr)

    For each partition, a zero-initialized counter is incremented every time the result of expr changes as the expression is evaluated over the partition. Requires an ORDER BY clause for the window.

    COUNT_IF(condition_expr)

    Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the number of rows satisfying the given condition_expr, which must evaluate to a Boolean value (TRUE/FALSE) like x IS NULL or x > 1.
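A sketch of COUNT_IF used as a window aggregate over a nonframed partition; the table and column names are hypothetical:

```sql
SELECT
  carrier_name,
  dep_delay,
  COUNT_IF(dep_delay > 15) OVER (PARTITION BY carrier_name) AS num_late_departures
FROM flights_2008;
```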

    HeavyDB supports the aggregate functions AVG, MIN, MAX, SUM, and COUNT in window functions.

    Updates on window functions are supported, assuming the target table is single-fragment. Updates on multi-fragment target tables are not currently supported.

    hashtag
    Example

    This query shows the top airline carrier for each state, based on the number of departures.
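The query below is a sketch of one way to express this with ROW_NUMBER; the state and carrier column names (origin_state, carrier_name) are assumptions, not taken from a specific schema:

```sql
SELECT origin_state, carrier_name, num_departures
FROM (
  SELECT
    origin_state,
    carrier_name,
    COUNT(*) AS num_departures,
    ROW_NUMBER() OVER (
      PARTITION BY origin_state
      ORDER BY COUNT(*) DESC
    ) AS carrier_rank
  FROM flights_2008
  GROUP BY origin_state, carrier_name
) AS ranked
WHERE carrier_rank = 1;
```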

    hashtag
    Window Frames

    A window function can include a frame clause that specifies a set of neighboring rows of the current row belonging to the same partition. This allows us to compute a window aggregate function over the window frame, instead of computing it against the entire partition. Note that a window frame for the current row is computed based on either 1) the number of rows before or after the current row (called rows mode) or 2) the specified ordering column value in the frame clause (called range mode).

    For example:

    • From the starting row of the partition to the current row: Using the sum aggregate function, you can compute the running sum of the partition.

• You can construct a frame based on the position of the rows (called rows mode): for example, the 3 rows before and the 2 rows after the current row:

      • You can compute the aggregate function of the frame having up to six rows (including the current row).

    • You can organize a frame based on the value of the ordering column (called range mode): Assuming C as the current ordering column value, we can compute aggregate value of the window frame which contains rows having ordering column values between (C - 3) and (C + 2).

    Window functions that ignore the frame are evaluated on the entire partition.

    Note that we can define the window frame clause using rows mode with an ordering column.
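For instance, a rows-mode frame computing a moving average over up to six rows (the table and column names are hypothetical):

```sql
SELECT
  ts,
  val,
  AVG(val) OVER (
    PARTITION BY device_id
    ORDER BY ts
    ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING
  ) AS moving_avg
FROM readings;
```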

    You can use the following aggregate functions with the window frame clause.

    hashtag
    Supported Functions

    Category
    Supported Functions

    Frame aggregation

    MIN(val), MAX(val), COUNT(val), AVG(val), SUM(val)

    Frame navigation

    LEAD_IN_FRAME(value, offset)

    LAG_IN_FRAME(value, offset)

    FIRST_VALUE_IN_FRAME

    LAST_VALUE_IN_FRAME

    NTH_VALUE_IN_FRAME

These are window-frame-aware versions of the corresponding navigation functions (LEAD, LAG, FIRST_VALUE, LAST_VALUE, and NTH_VALUE).

    hashtag
    Syntax

<frame_mode> <frame_bound>

<frame_mode> can be one of the following:

    • rows

    • range

    hashtag
    Example

1 | 2 | 3 | 4 | 5.5 | 7.5 | 8 | 9 | 10 → value of each tuple's ORDER BY expression.

    When the current row has a value 5.5:

    • ROWS BETWEEN 3 PRECEDING and 3 FOLLOWING : 3 rows before and 3 rows after → {2, 3, 4, 5.5, 7.5, 8, 9 }

• RANGE BETWEEN 3 PRECEDING and 3 FOLLOWING: 5.5 - 3 <= x <= 5.5 + 3 → { 3, 4, 5.5, 7.5, 8 }

    <frame_bound>:

    • frame_start or

    • frame_between: between frame_start and frame_end

    frame_start and frame_end can be one of the following:

    • UNBOUNDED PRECEDING: The start row of the partition that the current row belongs to.

    • UNBOUNDED FOLLOWING: The end row of the partition that the current row belongs to.

    • CURRENT ROW

      • For rows mode: the current row.

      • For range mode: the peers of the current row. A peer is a row having the same value as the ordering column expression of the current row. Note that all null values are peers of each other.

    • expr PRECEDING

  • For rows mode: the row expr rows before the current row.

      • For range mode: rows with the current row’s ordering expression value minus expr.

    • expr FOLLOWING

  • For rows mode: the row expr rows after the current row.

      • For range mode: rows with the current row’s ordering expression value plus expr.

    UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING have the same meaning in both rows and range mode.

    When the query has no window frame bound, the window aggregate function is computed differently depending on the existence of the ORDER BY clause:

    • Has ORDER BY clause: The window function is computed with the default frame bound, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

• No ORDER BY clause: The window function is computed over the entire partition.

    hashtag
    Named Window Function Clause

    You can refer to the same window clause in multiple window aggregate functions by defining it with a unique name in the query definition.

    For example, you can define the named window clauses W1 and W2 as follows:
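A minimal sketch of such a definition (the table and column names are illustrative):

```sql
SELECT
  val,
  AVG(val) OVER w1 AS partition_avg,
  SUM(val) OVER w2 AS running_sum
FROM t
WINDOW
  w1 AS (PARTITION BY grp),
  w2 AS (PARTITION BY grp ORDER BY ts
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```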

    Named window function clause w1 refers to a window function clause without a window frame clause, and w2 refers to a named window frame clause.

    hashtag
    Notes and Restrictions

    • To use window framing, you may need an ORDER BY clause in the window definition. Depending on the framing mode used, the constraint varies:

  • Rows mode: no restriction on the ordering column; the window can also include multiple ordering columns.

  • Range mode: exactly one ordering column is required (multi-column ordering is not supported).

• Currently, all window functions, including aggregation over a window frame, are computed in CPU mode.

    • For window frame bound expressions, only non-negative integer literals are supported.

    • GROUPING mode and EXCLUDING are not currently supported.

    Sort order defaults to ascending (ASC).
  • Sorts null values after non-null values by default in an ascending sort, before non-null values in a descending sort. For any query, you can use NULLS FIRST to sort null values to the top of the results or NULLS LAST to sort null values to the bottom of the results.

  • Allows you to use a positional reference to choose the sort column. For example, the command SELECT colA,colB FROM table1 ORDER BY 2 sorts the results on colB because it is in position 2.

  • hashtag
    Query Hints

    HEAVY.AI provides various query hints for controlling the behavior of the query execution engine.

    hashtag
    Syntax

    circle-info

SELECT hints must appear first, immediately after the SELECT keyword; otherwise, the query fails.

    By default, a hint is applied to the query step in which it is defined. If you have multiple SELECT clauses and define a query hint in one of those clauses, the hint is applied only to the specific query step; the rest of the query steps are unaffected. For example, applying the /* cpu_mode */ hint affects only the SELECT clause in which it exists.

    You can define a hint to apply to all query steps by prepending g_ to the query hint. For example, if you define /*+ g_cpu_mode */, CPU execution is applied to all query steps.
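For example, using the cpu_mode hint described above (the table name is illustrative):

```sql
/* Apply CPU execution to this query step only */
SELECT /*+ cpu_mode */ COUNT(*) FROM flights_2008;

/* Apply CPU execution to every step of a multi-step query */
SELECT /*+ g_cpu_mode */ carrier_name, COUNT(*)
FROM flights_2008
GROUP BY carrier_name;
```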

    HEAVY.AI supports the following query hints.

    The marker hint type represents a Boolean flag.

    Hint
    Details
    Example

    allow_loop_join

    Enable loop joins.

    SELECT /*+ allow_loop_join */ ...

    cpu_mode

    The key-value pair type is a hint name and its value.

    Hint
    Details
    Example

    hashtag
    Cross-Database Queries

    In Release 6.4 and higher, you can run SELECT queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases. This enables more efficient storage and memory utilization by eliminating the need for table duplication across databases, and simplifies access to shared data and tables.

    To execute queries against another database, you must have ACCESS privilege on that database, as well as SELECT privilege.

    hashtag
    Example

    Execute a join query involving a table in the current database and another table in the my_other_db database:

    SELECTarrow-up-right

    HEAVY.AI Installation on RHEL

    This is an end-to-end recipe for installing HEAVY.AI on a Red Hat Enterprise Linux 8.x machine using CPU or GPU devices.

    triangle-exclamation

    The order of these instructions is significant. To avoid problems, install each component in the order presented.

    circle-info

    The same instructions can be used to install on Rocky Linux / RHEL 9, with some minor modifications.

    hashtag
    Assumptions

    These instructions assume the following:

    • You are installing a "clean" Rocky Linux / RHEL 8 host machine with only the operating system.

    • Your HEAVY.AI host only runs the daemons and services required to support HEAVY.AI.

    • Your HEAVY.AI host is connected to the Internet.

    hashtag
    Preparation

    Prepare your machine by updating your system and optionally enabling or configuring a firewall.

    hashtag
    Update and Reboot

    Update the entire system and reboot the system if needed.

    Install the utilities needed to create HEAVY.AI repositories and download installation binaries.
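    On Rocky Linux / RHEL, these two steps can be sketched as follows (the utility package names are assumptions; adjust to your environment):

    ```shell
    sudo dnf update -y
    sudo reboot        # only needed if the kernel or core libraries were updated

    # Utilities used later in this recipe to add repositories and download archives
    sudo dnf install -y curl wget
    ```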

    hashtag
    JDK

    Follow these instructions to install a headless JDK and configure an environment variable with a path to the library. The “headless” Java Development Kit does not provide support for keyboard, mouse, or display systems. It has fewer dependencies and is best suited for a server host. For more information, see .

    1. Open a terminal on the host machine.

    2. Install the headless JDK using the following command:
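    A minimal sketch of the install command, assuming the OpenJDK 8 headless package (the exact package name may vary by release):

    ```shell
    sudo dnf install -y java-1.8.0-openjdk-headless
    ```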

    hashtag
    Create the HEAVY.AI User

    Create a group called heavyai and a user named heavyai, who will own HEAVY.AI software and data on the file system.

    You can create the group, user, and home directory using the useradd command with the --user-group and --create-home switches:

    Set a password for the user using the passwd command.

    Log in with the newly created user.
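    The three steps above can be sketched as:

    ```shell
    sudo useradd --user-group --create-home heavyai  # group and home directory created in one step
    sudo passwd heavyai                              # set the password interactively
    su - heavyai                                     # log in as the new user
    ```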

    hashtag
    Installation

    There are two ways to install the HEAVY.AI software:

    • Using the DNF package manager, which searches for and installs the software and its dependencies from a configured repository. This is a convenient and efficient way to manage software installations on your system.

    • Installing from a tarball, a compressed archive obtained from the software's official source or repository. After downloading the tarball, you extract its contents and follow the installation instructions provided by the software developers. This method allows for manual installation and customization of the software.

    circle-info

    Using the DNF package manager is highly recommended because it handles dependencies and streamlines the installation process.

    hashtag
    Install NVIDIA Drivers ᴳᴾᵁ ᴼᴾᵀᴵᴼᴺ

    If your system includes NVIDIA GPUs but the drivers are not installed, it is advisable to install them before proceeding with the suite installation.

    See the instructions for installing NVIDIA drivers for details.

    hashtag
    Installing with DNF

    Create a DNF repository depending on the edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you will use.

    Add the GPG-key to the newly added repository.

    Use DNF to install the latest version of HEAVY.AI.
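    A sketch of the repository setup and install follows. The baseurl and GPG key locations shown are placeholders; substitute the URLs published for your edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU).

    ```shell
    # Hypothetical repository definition -- replace baseurl and gpgkey with the
    # URLs provided for your edition and device.
    sudo tee /etc/yum.repos.d/heavyai.repo <<EOF
    [heavyai]
    name=heavyai
    baseurl=https://releases.heavy.ai/ee/rpm/
    enabled=1
    gpgcheck=1
    gpgkey=https://releases.heavy.ai/GPG-KEY-heavyai
    EOF

    sudo dnf install -y heavyai
    ```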

    circle-info

    You can use the DNF package manager to list the available packages when installing a specific version of HEAVY.AI, such as when a multistep upgrade is necessary or a specific version is needed for any other reason: sudo dnf --showduplicates list heavyai

    Select the version needed from the list (e.g., 7.0.0) and install it using the command sudo dnf install heavyai-<version>.

    hashtag
    Installing with a Tarball

    Let's begin by creating the installation directory.

    Download the archive and install the latest version of the software. The appropriate archive is downloaded based on the edition (Enterprise, Free, or Open Source) and the device used for runtime.

    hashtag
    Configuration

    Follow these steps to configure your HEAVY.AI environment.

    hashtag
    Set Environment Variables

    For your convenience, you can update .bashrc with these environment variables

    circle-exclamation

    Although this step is optional, you will find references to the HEAVYAI_BASE and HEAVYAI_PATH variables. These variables contain the paths where configuration, license, and data files are stored and the location of the software installation. It is strongly recommended that you set them up.
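    For example, using the conventional default paths (adjust them to match your installation):

    ```shell
    # Persist the variables for future logins and set them for the current shell.
    # /opt/heavyai and /var/lib/heavyai are the conventional defaults.
    echo 'export HEAVYAI_PATH=/opt/heavyai' >> ~/.bashrc
    echo 'export HEAVYAI_BASE=/var/lib/heavyai' >> ~/.bashrc
    export HEAVYAI_PATH=/opt/heavyai
    export HEAVYAI_BASE=/var/lib/heavyai
    ```
    
    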

    hashtag
    Initialization

    Run the script, located in the systemd folder, that initializes the HEAVY.AI services and database storage.

    Accept the default values provided or make changes as needed.
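    The invocation can be sketched as follows (the script name is an assumption; check the contents of your installation's systemd folder):

    ```shell
    cd $HEAVYAI_PATH/systemd
    ./install_heavy_systemd.sh   # prompts for storage locations and service defaults
    ```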

    circle-info

    This step will take a few minutes if you are installing a CUDA-enabled version of the software because the shaders must be compiled.

    The script creates a data directory in $HEAVYAI_BASE/storage (typically /var/lib/heavyai) with the directories catalogs, data, and log, which contain the database metadata, the data of the database tables, and the log files from Immerse's web server and the database. The log folder is particularly important for database administrators; it contains data about the system's health, performance, and user activities.

    hashtag
    Activation

    The first step to activate the system is starting HeavyDB and the Web Server service that Heavy Immerse needs.

    circle-info

    Heavy Immerse is not available in the OS Edition.

    Start the HEAVY.AI services and enable them to start automatically at reboot.
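    Assuming the default service names heavydb and heavy_web_server, this can be sketched as:

    ```shell
    # --now starts the services immediately; enable makes them start at boot.
    sudo systemctl enable --now heavydb heavy_web_server
    ```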

    hashtag
    Configure the Firewall ᴼᴾᵀᴵᴼᴺᴬᴸ

    If a firewall is not already installed and you want to harden your system, install and start firewalld.

    To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access:
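    With firewalld, opening the default Heavy Immerse (6273) and HeavyDB (6274) ports can be sketched as:

    ```shell
    sudo dnf install -y firewalld
    sudo systemctl enable --now firewalld
    sudo firewall-cmd --permanent --add-port=6273/tcp   # Heavy Immerse web server
    sudo firewall-cmd --permanent --add-port=6274/tcp   # HeavyDB
    sudo firewall-cmd --reload
    ```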

    circle-info

    Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.

    For more information, see .

    hashtag
    Licensing HEAVY.AI ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    If you are on Enterprise or Free Edition, you need to validate your HEAVY.AI instance with your license key. You can skip this section if you are using Open Source Edition.

    1. Copy your license key from the registration email message. If you have not received your license key, contact your Sales Representative or register for your 30-day trial .

    2. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    3. When prompted, paste your license key in the text box and click Apply.

    The $HEAVYAI_BASE directory must be dedicated to HEAVYAI; do not set it to a directory shared by other packages.

    hashtag
    Final Checks

    To verify that everything is working, load some sample data, perform a heavysql query, and generate a Pointmap using Heavy Immerse.

    hashtag
    Load Sample Data and Run a Simple Query

    HEAVY.AI ships with two sample datasets: airline flight information collected in 2008 and a census of New York City trees. To install the sample data, run the following command.

    Connect to HeavyDB by entering the following command in a terminal on the host machine (default password is HyperInteractive):

    Enter a SQL query such as the following:
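    For example, against the flights sample table (the table name may differ by version):

    ```sql
    -- Average airtime for short hops in the 2008 flights sample data.
    SELECT origin_city AS "Origin", dest_city AS "Destination",
           ROUND(AVG(airtime), 1) AS "Average Airtime"
    FROM flights_2008_10k
    WHERE distance < 175
    GROUP BY origin_city, dest_city;
    ```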

    The results should be similar to the results below.

    hashtag
    Create a Dashboard Using Heavy Immerse ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    After installing Enterprise or Free Edition, check if Heavy Immerse is running as intended.

    1. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

    Create a new dashboard and a Scatter Plot to verify that backend rendering is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click SCATTER

    hashtag
    ¹ In the OS Edition, Heavy Immerse is unavailable.

    hashtag
    ² The OS Edition does not require a license key.

    Connecting Using SAML

    Security Assertion Markup Language (SAML) is used for exchanging authentication and authorization data between security domains. SAML uses security tokens containing assertions (statements that service providers use to make decisions about access control) to pass information about a principal (usually an end user) between a SAML authority, named an Identity Provider (IdP), and a SAML consumer, named a Service Provider (SP). SAML enables web-based, cross-domain, single sign-on (SSO), which helps reduce the administrative overhead of sending multiple authentication tokens to the user.

    circle-info

    If you use SAML for authentication to HEAVY.AI, and SAML login fails, HEAVY.AI automatically falls back to log in using LDAP if it is configured.

    If both SAML and LDAP authentication fail, you are authenticated against a locally stored password, but only if the allow-local-auth-fallback flag is set.

    These instructions use Okta as the IdP and HEAVY.AI as the SP in an SP-initiated workflow, similar to the following:

    1. A user uses a login page to connect to HEAVY.AI.

    2. The HEAVY.AI login page redirects the user to the Okta login page.

    3. The user signs in using an Okta account. (This step is skipped if the user is already logged in to Okta.)

    In addition to Okta, the following SAML providers are also supported:

    hashtag
    Registering Your SAML Application in Okta

    Begin by adding your SAML application in Okta. If you do not have an Okta account, you can sign up on the .

    1) Log into your Okta account and click the Admin button.

    2) From the Applications menu, select Applications.

    3) Click the Add Application button.

    4) On the Add Application screen, click Create New App.

    5) On the Create a New Application Integration page, set the following details:

    • Platform: Web

    • Sign on Method: SAML 2.0

      And then, click Create.

    6) On the Create SAML Integration page, in the App name field, type Heavyai and click Next.

    7) In the SAML Settings page, enter the following information:

    • Single sign on URL: Your Heavy Immerse web URL with the suffix saml-post; for example, . Select the Use this for Recipient URL and Destination URL checkbox.

    • Audience URI (SP Entity ID): Your Heavy Immerse web URL with the suffix saml-post.

    • Default RelayState

    Leave other settings at their default values, or change as required for your specific installation.

    After making your selections, click Next.

    8) In the Help Okta Support... page, click I'm an Okta customer adding an internal app. All other questions on this page are optional.

    After making your selections, click Finish.

    Your application is now registered and displayed, and the Sign On tab is selected.

    hashtag
    Configuring SAML for Your HEAVY.AI Application

    circle-exclamation

    Before configuring SAML, make sure that HTTPS is enabled on your web server.

    On the Sign On tab, configure SAML settings for your application:

    1) On the Settings page, click View Setup Instructions.

    2) On the How to Configure SAML 2.0 for HEAVY.AI Application page, scroll to the bottom, copy the XML fragment in the Provide the following IDP metadata to your SP provider box, and save it as a raw text file called idp.xml.

    3) Upload idp.xml to your HEAVY.AI server in $HEAVYAI_STORAGE.

    4) Edit heavy.conf and add the following configuration parameters:

    • saml-metadata-file: Path to the idp.xml file you created.

    • saml-sp-target-url: Web URL to your Heavy Immerse saml-post endpoint.

    • saml-signed-assertion

    5) On the How to Configure SAML 2.0 for HEAVY.AI Application page, copy the Identity Provider Single Sign-On URL, which looks similar to this:

    6) If the servers.json file you identified in the [web] section of heavy.conf does not exist, create it. In servers.json, include the SAMLurl property, using the same value you copied in Identify Provider Single Sign-On URL. For example:

    7) Restart the heavyai_server and heavyai_web_server services.

    hashtag
    Auto-Creating Users with SAML

    Users can be automatically created in HEAVY.AI based on group membership:

    1) Go to the Application Configuration page for the HEAVY.AI application in Okta.

    2) On the General tab, scroll to the SAML Settings section and click the Edit button.

    3) Click the Next button, and then in the Group Attribute Statements section, set the following:

    • Name: Groups

    • Filter: Set to the desired filter type to determine the set of groups delivered to HEAVY.AI through the SAML response. In the text box next to the Filter type drop-down box, enter the text that defines the filter.

    • Click Next, and then click Finish.

    Any group that requires access to HEAVY.AI must be created in HEAVY.AI before users can log in.

    1. Modify your heavyai.conf file by adding the following parameter:

      The heavyai.conf entries now look like this:

    2. Restart the heavyai_server and heavyai_web_server processes.

    Users whose group membership in Okta contains a group name that exists in HeavyDB can log in and have the privileges assigned to their groups.

    hashtag
    Creating Users Manually

    1) On the Okta website, on the Assignments tab, click Assign > Assign to People.

    2) On the Assign HEAVY.AI to People panel, click the Assign button next to users that you want to provide access to HEAVY.AI.

    3) Click Save and Go Back to assign HEAVY.AI to the user.

    4) Repeat steps 2 and 3 for all users to whom you want to grant access. Click Done when you are finished.

    circle-info

    User accounts assigned to the HEAVY.AI application in Okta must exist in HEAVY.AI before a user can log in. To have users created automatically based on their group membership, see .

    hashtag
    Verifying SAML Configuration

    Verify that the SAML is configured correctly by opening your Heavy Immerse login page. You should be automatically redirected to the Okta login page, and then back to Immerse, without entering credentials.

    When you log out of Immerse, you see the following screen:

    circle-info

    Logging out of Immerse does not log you out of Okta. If you log back in to Immerse and are still logged in to Okta, you do not need to reauthenticate.

    If authentication fails, you see this error message when you attempt to log in through Okta:

    To resolve the authentication error:

    1. Add the license information by either:

      • Adding heavyai.license to your HEAVY.AI data directory.

      • Logging in to HeavyDB and running the following command:

    Information about authentication errors can be found in the log files.

    Configuration Parameters for HEAVY.AI Web Server

    Following are the parameters for runtime settings on HeavyAI Web Server. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.

    Flag
    Description
    Default
    SELECT * FROM TABLE(
      tf_geo_rasterize(
          raster => CURSOR(
            SELECT 
               x, y, z FROM table
          ),
          agg_type => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'>,
          /* fill_agg_type is optional */
          [<fill_agg_type> => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'|'GAUSS_AVG'|'BOX_AVG'>,] 
          bin_dim_meters => <meters>, 
          geographic_coords => <true/false>, 
          neighborhood_fill_radius => <radius in bins>,
          fill_only_nulls => <true/false> [,
          <x_min> => <minimum output x-coordinate>,
          <x_max> => <maximum output x-coordinate>,
          <y_min> => <minimum output y-coordinate>,
          <y_max> => <maximum output y-coordinate>]
        ) 
      )...
    /* Bin 10cm USGS LiDAR from Tallahassee to 1 meter, taking the minimum z-value
    for each xy-bin. Then for each xy-bin, perform a Gaussian-average over the neighboring
    100 xy-bins. This query yields the approximate terrain for an area after removing human-made
    structures (due to the wide 100-bin Gaussian-average window), as can be seen in the 
    right-hand render result in the screenshot below. Note that the LIMIT was only
    applied to this SQL query and is not used in the rendered-screenshot below. */
    
    SELECT
      x,
      y,
      z
    FROM
      TABLE(
        tf_geo_rasterize(
          raster => CURSOR(
            SELECT
              ST_X(pt),
              ST_Y(pt),
              z
            FROM
              USGS_LPC_FL_LeonCo_2018_049377_N_LAS_2019
          ),
          bin_dim_meters => 1,
          geographic_coords => TRUE,
          neighborhood_fill_radius => 100,
          fill_only_nulls => FALSE,
          agg_type => 'MIN',
          fill_agg_type => 'GAUSS_AVG'
        )
      ) limit 20;
      
    x|y|z
    -84.29857764791747|30.40240526206634|-15.30264
    -84.29086331121893|30.40264801040913|-17.25718
    -84.29856722313815|30.40240526206634|-15.31047
    -84.29855679835883|30.40240526206634|-15.31835
    -84.29085288643959|30.40264801040913|-17.25859
    -84.2985463735795|30.40240526206634|-15.32627
    -84.30278925876371|30.402198476441|-17.09047
    -84.29084246166028|30.40264801040913|-17.25993
    -84.30277883398438|30.402198476441|-17.10194
    -84.29853594880018|30.40240526206634|-15.33422
    -84.30276840920506|30.402198476441|-17.11329
    -84.29083203688096|30.40264801040913|-17.26122
    -84.30275798442574|30.402198476441|-17.12446
    -84.29852552402086|30.40240526206634|-15.34223
    -84.30274755964642|30.402198476441|-17.1354
    -84.29878614350392|30.40263002905041|-14.74146
    -84.29119690415723|30.40236030866953|-17.22919
    -84.30449892257258|30.40238728070761|-15.9867
    -84.29328186002171|30.40223443915845|-17.63177
    -84.29432433795395|30.40263901972977|-17.85748  
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. 
    Make sure that the latest NVIDIA driver is installed and running.
    lspci -v | egrep "NVIDIA"
    sudo apt update
    sudo apt upgrade -y
    sudo reboot
    sudo apt install linux-headers-$(uname -r)
    sudo apt install pciutils
    sudo apt install libvulkan1
    lspci -v | egrep "3D|VGA*.NVIDIA" | awk -F '\[|\]' ' { print $2 } '
    Tesla T4
    sudo apt install build-essential
    chmod +x NVIDIA-Linux-x86_64-*.run
sudo ./NVIDIA-Linux-x86_64-*.run
    apt list nvidia-driver-*
    Listing... Done
    
    nvidia-driver-450/bionic-updates,bionic-security 460.91.03-0ubuntu0.18.04.1 amd64
    nvidia-driver-450-server/bionic-updates,bionic-security 450.172.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-455/bionic-updates,bionic-security 460.91.03-0ubuntu0.18.04.1 amd64
    nvidia-driver-460/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-465/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-470/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-470-server/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-495/bionic-updates,bionic-security 510.60.02-0ubuntu0.18.04.1 amd64
    nvidia-driver-510/bionic-updates,bionic-security 510.60.02-0ubuntu0.18.04.1 amd64
    nvidia-driver-510-server/bionic-updates,bionic-security 510.47.03-0ubuntu0.18.04.1 amd64
    sudo apt install nvidia-driver-<version>
    sudo reboot
    sudo apt install libvulkan1
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
    sudo apt-key adv --fetch-keys \
    https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
    echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" \
    | sudo tee /etc/apt/sources.list.d/cuda.list
    apt update
    apt list cuda-toolkit-* | grep -v config
    
    Listing...
    cuda-toolkit-10-0/unknown 10.0.130-1 amd64
    cuda-toolkit-10-1/unknown 10.1.243-1 amd64
    cuda-toolkit-10-2/unknown 10.2.89-1 amd64
    cuda-toolkit-11-0/unknown 11.0.3-1 amd64
    cuda-toolkit-11-1/unknown 11.1.1-1 amd64
    cuda-toolkit-11-2/unknown 11.2.2-1 amd64
    cuda-toolkit-11-3/unknown 11.3.1-1 amd64
    cuda-toolkit-11-4/unknown 11.4.4-1 amd64
    cuda-toolkit-11-5/unknown 11.5.2-1 amd64
    cuda-toolkit-11-6/unknown 11.6.2-1 amd64
    cuda-toolkit-11-7/unknown 11.7.0-1 amd64
    sudo apt install cuda-toolkit-<version>
    /usr/local/cuda/bin/nvcc --version
    
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Nov_30_19:08:53_PST_2020
    Cuda compilation tools, release 11.2, V11.2.67
    Build cuda_11.2.r11.2/compiler.29373293_0
    sudo apt install clang
    clang --version
    clang version 10.0.0-4ubuntu1 
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    select origin_state, carrier_name, n 
       from (select origin_state, carrier_name, row_number() over(
          partition by origin_state order by n desc) as rownum, n 
             from (select origin_state, carrier_name, count(*) as n 
                from flights_2008_7M where extract(year 
                   from dep_timestamp) = 2008 
       group by origin_state, carrier_name )) where rownum = 1
    select min(x) over w1, max(x) over w2 from test window w1 as (order by y), 
      w2 as (partition by y order by z rows between 2 preceding and 2 following);
    query:
      |   WITH withItem [ , withItem ]* query
      |   {
              select
          }
          [ ORDER BY orderItem [, orderItem ]* ]
          [ LIMIT [ start, ] { count | ALL } ]
          [ OFFSET start { ROW | ROWS } ]
    
    withItem:
          name
          [ '(' column [, column ]* ')' ]
          AS '(' query ')'
    
    orderItem:
          expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]
    
    select:
          SELECT [ DISTINCT ] [/*+ hints */]
              { * | projectItem [, projectItem ]* }    
          FROM tableExpression
          [ WHERE booleanExpression ]
          [ GROUP BY { groupItem [, groupItem ]* } ]
          [ HAVING booleanExpression ]
          [ WINDOW window_name AS ( window_definition ) [, ...] ]
    
    projectItem:
          expression [ [ AS ] columnAlias ]
      |   tableAlias . *
    
    tableExpression:
          tableReference [, tableReference ]*
      |   tableExpression [ ( LEFT ) [ OUTER ] ] JOIN tableExpression [ joinCondition ]
    
    joinCondition:
          ON booleanExpression
      |   USING '(' column [, column ]* ')'
    
    tableReference:
          tablePrimary
          [ [ AS ] alias ]
    
    tablePrimary:
          [ catalogName . ] tableName
      |   '(' query ')'
    
    groupItem:
          expression
      |   '(' expression [, expression ]* ')'
    SELECT /*+ hint */ FROM ...;
    SELECT name, saleamt, saledate FROM my_other_db.customers AS c, sales AS s 
      WHERE c.id = s.customerid;
    The default fill_agg_type, if not specified, is GAUSS_AVG.

    Set the maximum number of rows to 100: SELECT /*+ loop_join_inner_table_max_num_rows(100) */ ...

    max_join_hash_table_size

    Set the maximum size of the hash table.

    • Value type: INT

    • Range: 0 < x

    Set the maximum size of the join hash table to 100:

    SELECT /*+ max_join_hash_table_size(100) */ ...

    overlaps_bucket_threshold

    Set the overlaps bucket threshold.

    • Value type: DOUBLE

    • Range: 0-90

    Set the overlaps threshold to 10:

    SELECT /*+ overlaps_bucket_threshold(10.0) */ ...

    overlaps_max_size

    Set the maximum overlaps size.

    • Value type: INTEGER

    • Range: >=0

    Set the maximum overlap to 10: SELECT /*+ overlaps_max_size(10.0) */ ...

    overlaps_keys_per_bin

    Set the number of overlaps keys per bin.

    • Value type: DOUBLE

    • Range: 0.0 < x < double::max

    SELECT /*+ overlaps_keys_per_bin(0.1) */ ...

    query_time_limit

    Set the maximum time for the query to run.

    • Value type: INTEGER

    • Range: >=0

    SELECT /*+ query_time_limit(1000) */ ...

    Force CPU execution mode.

    SELECT /*+ cpu_mode */ ...

    columnar_output

    Enable columnar output for the input query.

    SELECT /*+ columnar_output */ ...

    disable_loop_join

    Disable loop joins.

    SELECT /*+ disable_loop_join */ ...

    dynamic_watchdog

    Enable dynamic watchdog.

    SELECT /*+ dynamic_watchdog */ ...

    dynamic_watchdog_off

    Disable dynamic watchdog.

    SELECT /*+ dynamic_watchdog_off */ ...

    force_baseline_hash_join

    Use the baseline hash join scheme by skipping the perfect hash join scheme, which is used by default.

    SELECT /*+ force_baseline_hash_join */ ...

    force_one_to_many_hash_join

    Deploy a one-to-many hash join by skipping one-to-one hash join, which is used by default.

    SELECT /*+ force_one_to_many_hash_join */ ...

    keep_result

    Add result set of the input query to the result set cache.

    SELECT /*+ keep_result */ ...

    keep_table_function_result

    Add result set of the table function query to the result set cache.

    SELECT /*+ keep_table_function_result */ ...

    overlaps_allow_gpu_build

    Use GPU (if available) to build an overlaps join hash table. (CPU is used by default.)

    SELECT /*+ overlaps_allow_gpu_build */ ...

    overlaps_no_cache

    Skip adding an overlaps join hash table to the hash table cache.

    SELECT /*+ overlaps_no_cache */ ...

    rowwise_output

    Enable row-wise output for the input query.

    SELECT /*+ rowwise_output */ ...

    watchdog

    Enable watchdog.

    SELECT /*+ watchdog */ ...

    watchdog_off

    Disable watchdog.

    SELECT /*+ watchdog_off */ ...

    aggregate_tree_fanout

    Defines the fanout of the tree used to compute window aggregation over a frame. Depending on the frame size, the tree fanout affects the performance of the aggregation and of the tree construction for each window function with a frame clause.

    • Value type: INT

    • Range: 0-1024

    SELECT /*+ aggregate_tree_fanout(32) */ SUM(y) OVER (ORDER BY x ROWS BETWEEN ...) ...

    loop_join_inner_table_max_num_rows

    Set the maximum number of rows available for a loop join.

    • Value type: INT

    • Range: 0 < x

    For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:

    • TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR

    • TIME type: SECOND, MINUTE, and HOUR

    • DATE type: DAY, MONTH, and YEAR

      For example: RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING

  • Currently, only literal expressions (such as 1 PRECEDING and 100 PRECEDING) are supported as expr.

  • For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:

    • TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR

    • TIME type: SECOND, MINUTE, and HOUR

    • DATE type: DAY, MONTH, and YEAR

      For example: RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING

  • Currently, only literal expressions (such as 1 FOLLOWING and 100 FOLLOWING) are supported as expr.
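    Putting range mode and the INTERVAL bounds together, a sketch over a hypothetical rides table:

    ```sql
    -- 7-day trailing sum of fares per driver.
    -- Range mode requires exactly one ordering column (ride_date here).
    SELECT driver_id, ride_date,
           SUM(fare) OVER (
             PARTITION BY driver_id
             ORDER BY ride_date
             RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
           ) AS fare_7d
    FROM rides;
    ```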

  • to N. The NULL value is replaced with the first non-NULL value found.

    At least one ordering column must be defined in the window clause.

    NULLS FIRST ordering of the input value is added automatically for any user-defined ordering of the input value. For example:

    BACKWARD_FILL(x) OVER (PARTITION BY c ORDER BY x) - No ordering is added; ordering already exists on x. BACKWARD_FILL(x) OVER (PARTITION BY c ORDER BY o) - Ordering is added internally for a consistent query result.

    LEAD, LAG, FIRST_VALUE, LAST_VALUE, and NTH_VALUE functions.

    CUME_DIST()

    Cumulative distribution value of the current row: (number of rows preceding or peers of the current row)/(total rows). Window framing is ignored.

    DENSE_RANK()

    Rank of the current row without gaps. This function counts peer groups. Window framing is ignored.

    FIRST_VALUE(value)

    Returns the value from the first row of the window frame (the rows from the start of the partition to the last peer of the current row).

    FORWARD_FILL(value)

    Replace the null value by using the nearest non-null value of the value column, using forward search. For example, for column x, with the current row r at the index K having a NULL value, and assuming column x has N rows (where K < N): FORWARD_FILL(x) searches for the non-NULL value by searching rows with the index starting from K-1 to 1. The NULL value is replaced with the first non-NULL value found. At least one ordering column must be defined in the window clause.

    NULLS FIRST ordering of the input value is added automatically for any user-defined ordering of the input value. For example: FORWARD_FILL(x) OVER (PARTITION BY c ORDER BY x) - No ordering is added; ordering already exists on x. FORWARD_FILL(x) OVER (PARTITION BY c ORDER BY o) - Ordering is added internally for a consistent query result.
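    A sketch with a hypothetical sensor table:

    ```sql
    -- Carry the last non-NULL reading forward within each device.
    SELECT device_id, ts,
           FORWARD_FILL(reading) OVER (PARTITION BY device_id ORDER BY ts) AS reading_filled
    FROM sensor_readings;
    ```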

    LAG(value, offset)

    Returns the value at the row that is offset rows before the current row within the partition. LAG_IN_FRAME is the window-frame-aware version.

    LAST_VALUE(value)

    Returns the value from the last row of the window frame.

    LEAD(value, offset)

    Returns the value at the row that is offset rows after the current row within the partition. LEAD_IN_FRAME is the window-frame-aware version.

    NTH_VALUE(expr,N)

    Returns a value of expr at row N of the window partition.

    NTILE(num_buckets)

    Subdivide the partition into buckets. If the total number of rows is divisible by num_buckets, each bucket has an equal number of rows. If the total is not divisible by num_buckets, the function returns groups of two sizes that differ by 1. Window framing is ignored.

    PERCENT_RANK()

    Relative rank of the current row: (rank-1)/(total rows-1). Window framing is ignored.

    RANK()

    Rank of the current row with gaps. Equal to the row_number of its first peer.

    ROW_NUMBER()

    Number of the current row within the partition, counting from 1. Window framing is ignored.

    SUM_IF(condition_expr)

    Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the sum of all expression values satisfying the given condition_expr. Applies to numeric data types.

    LEAD
    LAG
    FIRST_VALUE
    LAST_VALUE
    Okta returns a base64-encoded SAML Response to the user, which contains a SAML Assertion that the user is allowed to use HEAVY.AI. If configured, it also returns a list of SAML Groups assigned to the user.
  • Okta redirects the user to the HEAVY.AI login page together with the SAML response (a token).

  • HEAVY.AI verifies the token, and retrieves the user name and groups. Authentication and authorization is complete.

  • Oracle Access Managementarrow-up-right

  • Default RelayState: Forward slash (/).
  • Application username: HEAVY.AI recommends using the email address you used to log in to Okta.

  • saml-signed-assertion: Boolean value that determines whether Okta signs the assertion; true by default.
  • saml-signed-response: Boolean value that determines whether Okta signs the response; true by default.

    For example:

  • In the web section, add the full physical path to the servers.json file; for example:

  • Reattempt login through Okta.

    Oktaarrow-up-right
    auth0arrow-up-right
    Ping Identityarrow-up-right
    Keycloakarrow-up-right
    Okta web pagearrow-up-right
    https://tonysingle.com:6273/saml-postarrow-up-right
    Auto-Creating Users with SAML
    saml-metadata-file = "/heavyai-storage/idp.xml"
    saml-sp-target-url = "https://tonysingle.com:6273/saml-post"
    saml-signed-assertion = true
    saml-signed-response = true
    [web]
    enable-https = true
    cert = "/heavyai-storage/ssl/server.crt"
    key = "/heavyai-storage/ssl/server.key"
    servers-json = "/heavyai-storage/servers.json"
    https://heavyai-tony.okta.com/app/heavyaiorg969324_heavyai_2/exk1p0m4blWiBsFiU357/sso/saml
    [
      {
        "enableJupyter": true,
        "url": "tonysingle.com",
        "port": "6273",
        "SAMLurl": "https://heavyai-tony.okta.com/app/heavyaiorg969324_heavyai_2/exk1p0m4blWiBsFiU357/sso/saml"
      }
    ]
    saml-sync-roles = true
    saml-metadata-file = "/heavyai-storage/idp.xml"
    saml-sp-target-url = "https://tonysingle.com:6273/saml-post"
    saml-sync-roles = true
    heavysql> \set_license
    dnf install heavyai-7.0.0_20230501_be4f51b048-1.x86_64

    Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

  • Click Add Data Source.

  • Choose the flights_2008_10k table as the data source.

  • Click X Axis +Add Measure.

  • Choose depdelay.

  • Click Y Axis +Add Measure.

  • Choose arrdelay.

  • Click Size +Add Measure.

  • Choose airtime.

  • Click Color +Add Measure.

  • Choose dest_state.

  • The resulting chart clearly demonstrates a direct correlation between departure delay and arrival delay. This insight can help identify areas for improvement and strategies to minimize delays and improve overall efficiency.

    GPU-Rendered Scatterplot

    Create a new dashboard and a Bubble chart to verify that Heavy Immerse is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click Bubble.

    4. Click Select Data Source.

    5. Choose the flights_2008_10k table as the data source

    6. Click Add Dimension.

    7. Choose carrier_name.

    8. Click Add Measure.

    9. Choose depdelay.

    10. Click Add Measure.

    11. Choose arrdelay.

    12. Click Add Measure.

    13. Choose #Records.

    The resulting chart shows, unsurprisingly, that average departure delay is also correlated with average arrival delay, while the averages differ considerably between carriers.

    https://openjdk.java.netarrow-up-right
    DNF Installation
    Tarball Installation
    Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL
    ¹
    https://fedoraproject.org/wiki/Firewalld?rd=FirewallDarrow-up-right
    ²
    herearrow-up-right

    Allows for a CORS exception to the same-origin policy. Required to be true if Immerse is hosted on a domain or subdomain different from the one hosting heavy_web_server and heavydb.

    Allowing any origin is a less secure mode than what heavy_web_server requires by default.

    --allow-any-origin = false

    -b | backend-url <string>

    URL to http-port on heavydb. Change to avoid collisions with other services.

    http://localhost:6278

    -B | binary-backend-url <string>

    URL to http-binary-port on heavydb.

    http://localhost:6276

    cert string

    Certificate file for HTTPS. Change for testing and debugging.

    cert.pem

    -c | config <string>

    Path to HeavyDB configuration file. Change for testing and debugging.

    -d | data <string>

    Path to HeavyDB data directory. Change for testing and debugging.

    data

    data-catalog <string>

    Path to data catalog directory.

    n/a

    docs string

    Path to documentation directory. Change if you move your documentation files to another directory.

    docs

    enable-binary-thrift

    Use the binary thrift protocol.

    TRUE[1]

    enable-browser-logs [=arg]

    Enable access to current log files via web browser. Only super users (while logged in) can access log files.

    Log files are available at http[s]://host:port/logs/log_name.

    The web server log files: ACCESS - http[s]://host:port/logs/access ALL - http[s]://host:port/logs/all

    HeavyDB log files: INFO - http[s]://host:port/logs/info WARNING - http[s]://host:port/logs/warning ERROR - http[s]://host:port/logs/

    FALSE[0]

    enable-cert-verification

    TLS certificate verification is a security measure that can be disabled when TLS certificates are not issued by a trusted certificate authority. If using a locally or unofficially generated TLS certificate to secure the connection between heavydb and heavy_web_server, this parameter must be set to false. heavy_web_server expects a trusted certificate authority by default.

    --enable-cert-verification = true

    enable-cross-domain [=arg]

    Enable frontend cross-domain authentication. Cross-domain session cookies require the SameSite = None; Secure headers. Can only be used with HTTPS domains; requires enable-https to be true.

    FALSE[0]

    enable-https

    Enable HTTPS support. Change to enable secure HTTP.

    enable-https-authentication

    Enable PKI authentication.

    enable-https-redirect [=arg]

    Enable a new port that heavy_web_server listens on for incoming HTTP requests. When received, it returns a redirect response to the HTTPS port and protocol, so that browsers are immediately and transparently redirected. Use to provide a HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80, and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default HTTPS port 443, and have requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE and http-to-https-redirect-port = 80

    FALSE[0]

    enable-non-kernel-time-query-interrupt

    Enable non-kernel-time query interrupt.

    TRUE[1]

    enable-runtime-query-interrupt

    Enable runtime query interrupt.

    TRUE[1]

    enable-upload-extension-check

    Enables a restrictive file-extension check for uploaded files.

    encryption-key-file-path <string>

    Path to the file containing the credential payload cipher key. Key must be 256 bits in length.

    -f | frontend string

    Path to frontend directory. Change if you move the location of your frontend UI files.

    frontend

    http-to-https-redirect-port = arg

    Configures the http (incoming) port used by enable-https-redirect. The port option specifies the redirect port number. Use to provide a HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80, and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default HTTPS port 443, and have requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE and http-to-https-redirect-port = 80

    6280

    idle-session-duration = arg

    Idle session default, in minutes.

    60

    jupyter-prefix-string <string>

    Jupyter Hub base_url for Jupyter integration.

    /jupyter

    jupyter-url-string <string>

    URL for Jupyter integration.

    -j |jwt-key-file

    Path to a key file for client session encryption.

    The file is expected to be a PEM-formatted ( .pem ) certificate file containing the unencrypted private key in PKCS #1, PKCS #8, or ASN.1 DER form.

    Example PEM file creation using OpenSSL.
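A minimal sketch of such an OpenSSL invocation, wrapped in Python here for illustration (the file name jwt-key.pem and the 2048-bit key size are arbitrary choices, not requirements; the openssl CLI must be on PATH):

```python
# Generate an unencrypted RSA private key in PEM form,
# suitable for use with --jwt-key-file.
import subprocess

subprocess.run(
    ["openssl", "genrsa", "-out", "jwt-key.pem", "2048"],
    check=True,
)
```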

    Required only if using a high-availability server configuration or another server configuration that requires an instance of Immerse to talk to multiple heavy_web_server instances.

    Each heavy_web_server instance needs to use the same encryption key to encrypt and decrypt client session information which is used for session persistence ("sessionization") in Immerse.

    key <string>

    Key file for HTTPS. Change for testing and debugging.

    key.pem

    max-tls-version

    Refers to the version of TLS encryption used to secure web protocol connections. Specifies a maximum TLS version.

    min-tls-version

    Refers to the version of TLS encryption used to secure web protocol connections. Specifies a minimum TLS version.

    --min-tls-version = VersionTLS12

    peer-cert <string>

    Peer CA certificate PKI authentication.

    peercert.pem

    -p | port int

    Frontend server port. Change to avoid collisions with other services.

    6273

    -r | read-only

    Enable read-only mode. Prevent changes to the data.

    secure-acao-uri

    If set, ensures that all Access-Control-Allow-Origin headers are set to the value provided.

    servers-json <string>

    Path to servers.json. Change for testing and debugging.

    session-id-header <string>

    Session ID header.

    immersesid

    ssl-cert <string>

    SSL validated public certificate.

    sslcert.pem

    ssl-private-key <string>

    SSL private key file.

    sslprivate.key

    strip-x-headers <strings>

    List of custom X http request headers to be removed from incoming requests. Use --strip-x-headers="" to allow all X headers through.

    [X-HeavyDB-Username]

    timeout duration

    Maximum request duration in #h#m#s format. For example 0h30m0s represents a duration of 30 minutes. Controls the maximum duration of individual HTTP requests. Used to manage resource exhaustion caused by improperly closed connections. This also limits the execution time of queries made over the Thrift HTTP transport. Increase the duration if queries are expected to take longer than the default duration of one hour; for example, if you COPY FROM a large file when using heavysql with the HTTP transport.

    1h0m0s

    tls-cipher-suites <strings>

    Refers to the combination of algorithms used in TLS encryption to secure web protocol connections.

    All available TLS cipher suites compatible with HTTP/2:

    • TLS_RSA_WITH_RC4_128_SHA

    • TLS_RSA_WITH_AES_128_CBC_SHA

    The following cipher suites are accepted by default:

    • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

    • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256

    tls-curves <strings>

    Refers to the types of Elliptic Curve Cryptography (ECC) used in TLS encryption to secure web protocol connections.

    All available TLS elliptic Curve IDs:

    • secp256r1 (Curve ID P256)

    • CurveP256 (Curve ID P256)

    The following TLS curves are accepted by default:

    • CurveP521

    • CurveP384

    tmpdir string

    Path for temporary file storage. Used as a staging location for file uploads. Consider locating this directory on the same file system as the HEAVY.AI data directory. If not specified on the command line, heavy_web_server recognizes the standard TMPDIR environment variable as well as a specific HEAVYAI_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command-line argument nor one of the environment variables, the default, /tmp/, is used.

    /tmp

    ultra-secure-mode

    Enables secure mode that sets Access-Control-Allow-Origin headers to --secure-acao-uri and sets security headers like X-Frame-Options, Content-Security-Policy, and Strict-Transport-Security.

    -v | verbose

    Enable verbose logging. Adds log messages for debugging purposes.

    version

    Return version.

    additional-file-upload-extensions <string>

    Denote additional file extensions for uploads. Has no effect if --enable-upload-extension-check is not set.

    allow-any-origin

    HEAVY.AI Installation on Ubuntu

    This is an end-to-end recipe for installing HEAVY.AI on a Ubuntu 22.04 machine using CPU and GPU devices.

    circle-exclamation

    The order of these instructions is significant. To avoid problems, install each component in the order presented.

    hashtag
    Assumptions

    These instructions assume the following:

    • You are installing on a “clean” Ubuntu 22.04 host machine with only the operating system installed.

    • Your HEAVY.AI host only runs the daemons and services required to support HEAVY.AI.

    • Your HEAVY.AI host is connected to the Internet.

    hashtag
    Preparation

    Prepare your Ubuntu machine by updating your system, creating the HEAVY.AI user (named heavyai), installing kernel headers, installing CUDA drivers, and optionally enabling the firewall.

    hashtag
    Update and Reboot

    1. Update the entire system:

    2. Install the utilities needed to create Heavy.ai repositories and download archives:

    3. Install the headless JDK and the utility apt-transport-https:

    4. Reboot to activate the latest kernel:

    hashtag
    Create the HEAVY.AI User

    Create a group called heavyai and a user named heavyai, who will be the owner of the HEAVY.AI software and data on the filesystem.

    1. Create the group, user, and home directory using the useradd command with the --user-group and --create-home switches.

    2. Set a password for the user:

    3. Log in with the newly created user:

    hashtag
    Installation

    Install HEAVY.AI using APT or a tarball.

    circle-info

    Installation using the APT package manager is recommended for those who want a more automated install and upgrade procedure.

    hashtag
    Install NVIDIA Drivers ᴳᴾᵁ ᴼᴾᵀᴵᴼᴺ

    If your system uses NVIDIA GPUs but the drivers are not installed, install them now. See for details.

    hashtag
    Installing with APT

    Download and add a GPG key to APT.

    Add an APT source depending on the edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you are going to use.

    Use apt to install the latest version of HEAVY.AI.

    circle-info

    If you need to install a specific version of HEAVY.AI, for example because you are upgrading from OmniSci, run the following command:

    hashtag
    Installing with a Tarball

    First create the installation directory.

    Download the archive and install the software. A different archive is downloaded depending on the Edition (Enterprise, Free, or Open Source) and the device used for runtime (GPU or CPU).

    hashtag
    Configuration

    Follow these steps to prepare your HEAVY.AI environment.

    hashtag
    Set Environment Variables

    For convenience, you can update .bashrc with these environment variables

    circle-exclamation

    Although this step is optional, you will find references to the HEAVYAI_BASE and HEAVYAI_PATH variables. These variables contain, respectively, the path where configuration, license, and data files are stored and the path where the software is installed. Setting them is strongly recommended.

    hashtag
    Initialization

    Run the systemd installer to create heavyai services, a minimal config file, and initialize the data storage.

    Accept the default values provided or make changes as needed.

    The script creates a data directory in $HEAVYAI_BASE/storage (default /var/lib/heavyai/storage) with the directories catalogs, data, export, and log. The import directory is created the first time you insert data. If you are the HEAVY.AI administrator, the log directory is of particular interest.

    hashtag
    Activation

    Start and use HeavyDB and Heavy Immerse.

    Heavy Immerse is not available in the OSS Edition, so if you are running the OSS Edition, the systemctl commands for heavy_web_server have no effect.

    Enable the automatic startup of the service at reboot and start the HEAVY.AI services.

    hashtag
    Configure Firewall ᴼᴾᵀᴵᴼᴺᴬᴸ

    If a firewall is not already installed and you want to harden your system, install the ufw package.

    To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access.

    circle-info

    Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.

    For more information, see .

    hashtag
    Licensing HEAVY.AI ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    If you are using Enterprise or Free Edition, you need to validate your HEAVY.AI instance with your license key.

    circle-exclamation

    Skip this section if you are on Open Source Edition

    1. Copy your Enterprise or Free Edition license key from the registration email message. If you do not have a license and want to evaluate HEAVY.AI in an unlimited enterprise environment, contact your Sales Representative or register for a 30-day trial of Enterprise Edition. If you need a Free license, you can get one.

    2. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    hashtag
    Final Checks

    To verify that everything is working, load some sample data, perform a heavysql query, and generate a Pointmap using Heavy Immerse.

    hashtag
    Load Sample Data and Run a Simple Query

    HEAVY.AI ships with sample datasets: two of airline flight information collected in 2008, and a 2015 census of New York City trees. To install sample data, run the following command.

    Connect to HeavyDB by entering the following command in a terminal on the host machine (default password is HyperInteractive):

    Enter a SQL query such as the following

    The results should be similar to the results below.

    hashtag
    Create a Dashboard Using Heavy Immerse ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    After installing Enterprise or Free Edition, check if Heavy Immerse is running as intended.

    1. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

    Create a new dashboard and a Scatter Plot to verify that backend rendering is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    hashtag
    ¹ In the OS Edition, Heavy Immerse is unavailable.

    hashtag
    ² The OS Edition does not require a license key.

    Release Notes

    Release notes for currently supported releases

    circle-info

    Use of HEAVY.AI is subject to the terms of the .

    hashtag

    Importing Geospatial Data

    circle-info

    If there is a potential for duplicate entries and you want to avoid loading duplicate rows, see on the Troubleshooting page.

    hashtag
    Importing Geospatial Data Using Heavy Immerse

    System Table Functions

    HEAVY.AI provides access to a set of system-provided table functions, also known as table-valued functions (TVFs). System table functions, like user-defined table functions, support execution of queries on both CPU and GPU over one or more SQL result-set inputs. Table function support in HEAVY.AI can be split into two broad categories: system table functions and user-defined table functions (UDTFs). System table functions are built into the HEAVY.AI server, while UDTFs can be declared dynamically at run time by specifying them in a subset of the Python language. For more information on UDTFs, see .

    To improve performance, table functions can be declared to enable filter pushdown optimization, which allows the Calcite optimizer to "push down" filters on the output(s) of a table function to its input(s) when the inputs and outputs are declared to be semantically equivalent (for example, a longitude variable that is input and output from a table function). This can significantly increase performance in cases where only a small portion of one or more input tables is required to compute the filtered output of a table function.

    Whether system- or user-provided, table functions can execute over one or more result sets specified by subqueries, and can also take any number of additional constant literal arguments specified in the function definition. SQL subquery inputs can consist of any SQL expression (including multiple subqueries, joins, and so on) allowed by HeavyDB, and the output can be filtered, grouped by, joined, and so on like a normal SQL subquery, including being input into additional table functions by wrapping it in a CURSOR.

    sudo dnf -y update
    sudo reboot
    sudo dnf -y install dnf-utils curl libldap2-dev
    sudo dnf -y install java-1.8.0-openjdk-headless
    sudo useradd --user-group --create-home --groups wheel heavyai
    sudo passwd heavyai
    sudo su - heavyai
    sudo dnf config-manager --add-repo \
    https://releases.heavy.ai/ee/yum/stable/cuda
    sudo yum-config-manager --add-repo \
    https://releases.heavy.ai/ee/yum/stable/cpu
    sudo yum-config-manager --add-repo \
    https://releases.heavy.ai/os/yum/stable/cuda
    sudo yum-config-manager --add-repo \
    https://releases.heavy.ai/os/yum/stable/cpu
    sudo dnf config-manager --save \
    --setopt="releases.heavy*.gpgkey=https://releases.heavy.ai/GPG-KEY-heavyai"
    sudo dnf -y install heavyai.x86_64
    sudo mkdir /opt/heavyai && sudo chown $USER /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-render.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    echo "# HEAVY.AI variable and paths
    export HEAVYAI_PATH=/opt/heavyai
    export HEAVYAI_BASE=/var/lib/heavyai
    export HEAVYAI_LOG=\$HEAVYAI_BASE/storage/log
    export PATH=\$HEAVYAI_PATH/bin:$PATH" \
    >> ~/.bashrc
    source ~/.bashrc
    cd $HEAVYAI_PATH/systemd
    ./install_heavy_systemd.sh
    sudo systemctl enable heavydb --now
    sudo systemctl enable heavy_web_server --now
    sudo systemctl enable heavydb --now
    sudo dnf -y install firewalld
    sudo systemctl start firewalld
    sudo systemctl enable firewalld
    sudo systemctl status firewalld
    sudo firewall-cmd --zone=public --add-port=6273-6274/tcp --add-port=6278/tcp --permanent
    sudo firewall-cmd --reload
    cd $HEAVYAI_PATH
    sudo ./insert_sample_data --data /var/lib/heavyai/storage
    #     Enter dataset number to download, or 'q' to quit:
    Dataset           Rows    Table Name          File Name
    1)    Flights (2008)    7M      flights_2008_7M     flights_2008_7M.tar.gz
    2)    Flights (2008)    10k     flights_2008_10k    flights_2008_10k.tar.gz
    3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
    $HEAVYAI_PATH/bin/heavysql -p HyperInteractive
    SELECT origin_city AS "Origin", 
    dest_city AS "Destination", 
    AVG(airtime) AS "Average Airtime" 
    FROM flights_2008_10k WHERE distance < 175 
    GROUP BY origin_city, dest_city;
    Origin|Destination|Average Airtime
    Austin|Houston|33.055556
    Norfolk|Baltimore|36.071429
    Ft. Myers|Orlando|28.666667
    Orlando|Ft. Myers|32.583333
    Houston|Austin|29.611111
    Baltimore|Norfolk|31.714286

    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

  • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256

  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

  • TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

  • TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305

  • TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305

  • TLS_AES_128_GCM_SHA256

  • TLS_AES_256_GCM_SHA384

  • TLS_CHACHA20_POLY1305_SHA256

  • TLS_FALLBACK_SCSV


    Limit security vulnerabilities by specifying the allowed TLS ciphers in the encryption used to secure web protocol connections.

  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

  • TLS_RSA_WITH_AES_256_GCM_SHA384

  • secp384r1 (Curve ID P384)

  • CurveP384 (Curve ID P384)

  • secp521r1 (Curve ID P521)

  • CurveP521 (Curve ID P521)

  • x25519 (Curve ID X25519)

  • X25519 (Curve ID X25519)

    Limit security vulnerabilities by specifying the allowed TLS cipher suites in the encryption used to secure web protocol connections.

  • CurveP256
    http-to-https-redirect-portarrow-up-right
    enable-https-redirectarrow-up-right
    http-to-https-redirect-portarrow-up-right
    Currently Supported Releases

    8.5.2 | 8.5.1 | 8.5.0 | 8.4.0 | 8.3.2 | 8.3.1 | 8.3.0 | 8.2.0 | 8.1.2 & 8.1.3 | 8.1.1 | 8.0.2 | 8.0.1 | 8.0.0 | 7.2.4 | 7.2.3 | 7.2.2 | 7.2.1 | 7.2.0 | 7.1.2 | 7.1.1 | 7.1.0 | 7.0.2 | 7.0.1 | 7.0.0

    For release notes for releases that are no longer supported, as well as links to documentation for those releases, see Archived Release Notes.

    circle-exclamation

    As with any software upgrade, it is important to back up your data before you upgrade HEAVY.AI. In addition, we recommend testing new releases before deploying in a production environment.

    For assistance during the upgrade process, contact HEAVY.AI Support by logging a request through the HEAVY.AI Support Portalarrow-up-right.

    hashtag
    Release 8.x.x

    circle-exclamation

    IMPORTANT - In HeavyDB Release 8.x.x, the system catalog is automatically migrated to support the new Column Level Security feature. This migration occurs regardless of whether you intend to use the feature. Once the system catalog has been migrated in this manner, it is not backward-compatible with earlier versions of HeavyDB. If you revert to an earlier version in this state, the system will be unstable and manual intervention will be required. We recommend backing up your data before you upgrade HEAVY.AI.

    circle-exclamation

    8.x.x introduces a new licensing version that features two new types of licenses: Node Locked Licenses and Floating Licenses. Enterprise customers upgrading to the 8.x release of HEAVY.AI must contact the HEAVY.AI Customer Success team for a new license before attempting to upgrade.

    hashtag
    Release 8.5.2

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a race condition that could occur when concurrently accessing the memory_summary and memory_details system tables.

    • Fixes an intermittent crash that could occur for join queries with row-level security enabled.

    • Fixes a crash that could occur for certain query patterns involving the IN operator and subqueries with text projections.

    • Fixes a crash that could occur when the APPROX_PERCENTILE function is called with invalid arguments.

    hashtag
    Heavy Render - New Features and Improvements

    • Improved logging around renderer device initialization and clean up

    hashtag
    Heavy Render - Fixed Issues

    • Fix for intermittent server crash when restarting the renderer after a CPU out-of-memory event

    hashtag
    HeavyImmerse - Fixed Issues

    • Combo chart with group by dimension displaying incorrect values in legend

    hashtag
    HeavyIQ / HeavyLM Fixed Issues

    • Fix chromadb index creation bug and added support for importing faiss module on master process

    hashtag
    Release 8.5.1

    hashtag
    HeavyImmerse - Fixed Issues

    • Crossfiltering a chart with a non-joined data source did not refresh a chart with a joined datasource

    • Adjusting chart-level filters via the notch menu on a dashboard caused the chart to crash

    hashtag
    Release 8.5.0

    hashtag
    HeavyImmerse - New Features and Improvements

    • Add ruler control to raster map charts and Esc key to close ruler control

    • Color all values - Colors an unlimited number of categorical values in raster charts using a hash function. Defaults to on, but can be turned off to access legacy coloring mode (coloring by top k)

    • Column names as export headers - Updates export logic to use column names as headers in the export file instead of names like measure0, dimension1; for the following charts: table, combo, pointmap, scatterchart, bubble chart, pie chart, heat

    • Scatter chart export - Implements data export for scatter plots and uses column names as export headers

    • Log scale axis - Implements log scale option for combo and scatter chart Y-axis

    • Infinite scrolling legend - Only active when “Color All Values” is on, allows the user to continually scroll the legend, automatically loading more values at the end of the scroll.

    hashtag
    HeavyImmerse - Fixed Issues

    • Parameter change (in custom SQL filter) not updating table chart

    hashtag
    Heavy Render - New Features and Improvements

    • Improve logged statistics whenever rendering errors occur

    hashtag
    Release 8.4.0

    hashtag
    General

    • ARM support (including NVIDIA Grace Hopper) - Docker install only

    • RHEL/Rocky 8 support - bare metal install only

    hashtag
    HeavyDB - New Features and Improvements

    • Adds new Uber H3 functions and removes existing deprecated functions

    • Change how the GEOs library path is specified for use by some geospatial functions

    • Update CPU join hash tables to use the "cpu-buffer-mem-bytes" configured CPU memory buffer pool for memory allocations

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when a subquery within an IN clause references a non-existent column

    • Fixes a race condition that could occur when disk level caching is enabled for non-foreign tables

    hashtag
    Release 8.3.2

    hashtag
    HeavyImmerse - New Features and Improvements

    • RTL Text support in mapbox basemap labels

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix to show the palette selector in the combo chart when “# Records” is selected as a color measure

    hashtag
    Release 8.3.1

    hashtag
    HeavyImmerse - Fixed Issues

    • Fixes a crash when loading older dashboards, or instances and dashboards with customized/removed color palettes with new color consistency feature.

    • Fix for render error on multilayer raster charts that use new color mapping features.

    • Fix for render error when switching chart types while using new color mapping features.

    • Fix for bubble chart crash when using numeric dimension.

    • Fix for legends not showing in combo chart for continuous measures with no color measure selected.

    hashtag
    Release 8.3.0

    hashtag
    HeavyDB - New Features and Improvements

    • Optimizes concurrent access and caching of table data during query execution.

    • Optimizes string dictionary memory allocations

    • Adds a validation to ensure that raster import/HeavyConnect with floating point coordinate types are either world or file space transformed.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when attempting to import or HeavyConnect to Apache Spark generated empty Parquet files.

    • Fixes a crash that could occur when attempting to import or HeavyConnect to Parquet files containing Null columns.

    • Fixes a potential race condition that could occur when the Executor Resource Manager is enabled.

    • Fixes an exception that could occur for log based system tables when the log directory contains unexpected files.

    hashtag
    HeavyImmerse - New Features and Improvements

    • Support for customizable color mappings across charts, including Combo, Pointmap, Linemap, Choropleth, Scatterplot, Pie, and Bubble, enabling consistent and tailored color schemes for enhanced visual control. Users can create, import, manage, and apply mappings, with options to save changes, delete mappings, or reset to default hash coloring as needed.

    • Uses Hash Coloring to ensure consistent color assignment for the same column values across all charts and dashboards.

    • Unified, default Categorical Colors palette is now used across all supported charts, ensuring out-of-the-box consistency in visualizations.

    • The previous "Color Set 2" palette has been integrated into the Categorical Colors palette set to streamline options and reduce confusion.

    hashtag
    Release 8.2.0

    hashtag
    HeavyDB - New Features and Improvements

    • ARM support

    • Security updates and fixes

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when executing queries against ODBC backed foreign tables with negative decimal column values.

    hashtag
    HeavyImmerse - Fixed Issues

    • Updated web server dependencies to improve security

    hashtag
    Release 8.1.2 (RHEL) & 8.1.3 (Ubuntu)

    hashtag
    HeavyDB - New Features and Improvements

    • Improves memory utilization for queries with APPROX_COUNT_DISTINCT function calls

    • Significantly faster columnarization of "lazy fetched" result set data for multi-step queries

    • Significantly faster string operations, particularly for high-cardinality string inputs (3X+ speedups in some cases)

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an error where users without column level privileges could not access views

    • Fixes a crash that could occur for queries that order by none-encoded text expressions

    • Fixes an issue where wrong results could be returned for repeated queries with joins on encoded text column expressions

    • Fixes an error that could occur when large integer values are used in row level security policies

    • Fixes a CUDA 700 crash that could occur for left join queries with geospatial function predicates

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix side panel overlay issue for time-picking filter.

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Allow support for Guidance without an embedding server.

    • Support FAISS as an alternative to ChromaDB for embeddings.

    • Various minor improvements and enhancements

    hashtag
    Release 8.1.1

    hashtag
    HeavyDB - New Features and Improvements

    • Improves memory management for import requests

    • Improves performance of group by queries through expanded use of shared memory

    • Improves performance of low cardinality group by queries

    • Improves performance of queries with sort on encoded string column expressions

    • Improves performance of result set reductions

    • Optimizes memory utilization and performance for sort queries

    • Adds more instrumentation around memory utilization

    • Adds support for ST_Centroid function calls with MULTILINESTRING column type arguments

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an issue where S3 imports would not roll back correctly in certain error cases

    • Fixes an error where repetition of certain geo join queries could result in excessive memory utilization

    hashtag
    HeavyImmerse - New Features and Improvements

    • Make Dashboard configuration panel resizeable.

    • Add UI customization options for Cloud Edition.

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix scrolling bug with Guidance Snippet list.

    • Improve handling for loading a database with no tables.

    • Fix map points display issue on first render.

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Add support for a vLLM Embedding server.

    hashtag
    HeavyIQ / HeavyLM - Fixed Issues

    • Fix ChromaDB configuration issues.

    hashtag
    Release 8.1.0

    hashtag
    HeavyDB - New Features and Improvements

    • Added LLM_TRANSFORM operator, allowing access to large language model inference within SQL

    • Adds support for bounding box clipping when importing geospatial files

    • Retries estimation queries on CPU if the initial query execution fails on GPU due to out of memory errors

    • Improves the error message that is logged when table data file reads result in errors
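
    A minimal sketch of the new LLM_TRANSFORM operator, assuming a hypothetical incidents table; the prompt style and any optional arguments are assumptions to check against the SQL reference:

    ```sql
    -- Hypothetical example: classify free-text incident descriptions with the
    -- in-database LLM. Table and column names are illustrative only.
    SELECT incident_id,
           LLM_TRANSFORM(description,
                         'Classify this incident as one of: fire, flood, theft, other.')
               AS incident_category
    FROM incidents
    LIMIT 10;
    ```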

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when Parquet backed foreign tables reference incompatible files

    • Fixes an issue where spatial transforms were not applied correctly for some raster files containing sub-datasets

    • Fixes an issue where some string functions would result in error responses when used in predicates and update and delete queries

    hashtag
    HeavyRender - New Features and Improvements

    • Significantly reduced GPU render memory usage and improved performance when rendering complex polygon data

    • Internal restructuring laying groundwork for improved concurrency between HeavyDB and Heavy Render

    hashtag
    HeavyRender - Fixed Issues

    • Fixed an issue where some GPUs would fail to identify a useful GPU memory heap reporting 0 available memory

    • Fixed a crash that could occur if a table involved in a query is removed between a render and a later hit-test query (mouse rollover in Immerse)

    • Fixed a crash that could occur during renderer shutdown and restart if an out of GPU memory error occurs during certain buffer allocation operations

    hashtag
    HeavyImmerse - New Features and Improvements

    • Manage Table and Column Comments in Data Manager

    • Immerse theming control panel

    • Add alternative geocoder support using custom location list from a CSV

    • Add support for custom AWS Regions

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix layer ordering label change after layer switching

    • Fixed the addition of the last categorical color not being used in color by measure

    • Make SQL Notebook line charts use GMT instead of local timezone

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Added Column Preview summarizing for columns in SQL Notebook source list.

    • New Log File viewing for IQ Startup and IQ Build log tiles

    • Improvements to SQL Notebook UI

    • BETA: Added the new Guidance feature in the HeavyIQ SQL Notebook. Powered by HeavyLM's advanced LLM service, it provides contextual recommendations and real-time insights for optimized query generation and analysis.

    • Improved the SQL tool with optimized top-k calculations for string columns, adjusted default settings for cardinality thresholds, and added support for multiline comments in schema caching.

    hashtag
    Release 8.0.2

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Improve runtime support for Python versions > 3.10.

    hashtag
    HeavyIQ / HeavyLM - Fixed Issues

    • Various performance improvements and fixes.

    hashtag
    Release 8.0.1

    hashtag
    HeavyDB - New Features and Improvements

    • Relaxes Coordinate Reference System validations that occur on geospatial data import.

    • Adds support for a DOT_PRODUCT function with array parameters, which can be used to compute the similarity between arrays (vectors)
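
    A minimal sketch of the new function, assuming a hypothetical documents table with a float array column; the array-literal syntax shown is illustrative:

    ```sql
    -- Rank rows by similarity between a stored embedding and a query vector.
    SELECT doc_id,
           DOT_PRODUCT(embedding, ARRAY[0.12, 0.48, 0.33, 0.07]) AS similarity
    FROM documents
    ORDER BY similarity DESC
    LIMIT 5;
    ```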

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an issue where cross database queries could result in use of incorrect string dictionaries.

    • Fixes an issue where Host IDs could not be generated on OCI instances.

    • Fixes an issue where database switching resulted in errors on certain server configurations.

    • Fixes a CUDA 700 crash that could occur due to incorrect internal caching.

    • Fixes a parsing error that could occur when projecting array literals with the same element data types.

    • Fixes an issue where updates via subqueries with non unique rows could result in the wrong error message.

    • Fixes a crash that could occur when using the CARDINALITY function in certain cases.

    • Fixes a crash that could occur for certain query patterns involving casts to a REAL data type.

    • Fixes a crash that could occur for queries with certain case expression patterns involving DATE data types.

    • Fixes a crash that could occur in certain cases where resultset caching is enabled.

    • Fixes a crash that could occur for queries that specify LIMIT clauses with floating point values.

    • Fixes an issue where sql_validate requests with queries containing WIDTH_BUCKET function calls with subquery arguments can result in an error.

    • Fixes a crash that could occur for certain query patterns with subqueries that include LIMIT and/or OFFSET clauses.

    hashtag
    HeavyRender - Fixed Issues

    • Fix to return null hit-test result if not found.

    • Fixed an 8.0.0 issue with render gpu selection that would trigger a crash when using the start-gpu program option

    • Fixed a rare 8.0.0 issue with free GPU memory detection on systems with resizable BAR enabled

    hashtag
    Heavy Immerse - New Features and Improvements

    • Enable HeavyIQ SQL Notebook for Free Edition.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fix to show “Column not found” error text in a raster chart pop-up instead of crashing chart when hit-test column is not found.

    • Fix for crossfiltering of Number chart when filtering on a Combo chart with a custom SQL base dimension.

    • Change feature flag from “ui/hide_charts_headers” to “ui/hide_text_chart_headers” and only hide headers on Text charts.

    • Fix to set priority colors in Pointmap charts.

    • Update categorical legend domain on map chart zoom.

    • Fixes regression with prioritized color layers.

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Enable HeavyIQ for Free Edition.

    • Improved caching and processing of table and column metadata.

    hashtag
    Release 8.0.0

    circle-exclamation

    8.0.0 introduces a new licensing version that features two new types of licenses: Node Locked Licenses and Floating Licenses. Enterprise customers upgrading to the 8.0.0 release of HEAVY.AI must contact the HEAVY.AI Customer Success team for a new license before attempting to upgrade.

    hashtag
    HeavyDB - New Features and Improvements

    • SELECT privileges are now required in order to execute UPDATE or DELETE commands.

    • Adds a new "columns" system table that contains information about all table columns across all databases.

    • Adds support for foreign tables that are backed by raster files.

    • Adds a new raster import mechanism that stores raster data in a tiled format. This can be enabled by setting the "enable-legacy-raster-import" HeavyDB server configuration parameter to false.

    • Adds support for column level SELECT privileges.

    • Adds support for comments on tables.

    • Adds support for comments on columns.

    • Adds support for null value filtering on raster import.

    • Adds MULTIPOLYGON to MULTIPOLYGON ST_Contains function support.

    • Improves performance for certain query patterns with BETWEEN predicate clauses.

    • Enables the "use-cpu-mem-pool-for-output-buffers" HeavyDB server configuration parameter by default.

    • Updates the default value for the "ndv-group-estimator-multiplier" HeavyDB server configuration parameter.
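
    The privilege and comment additions above can be sketched as follows; object names are hypothetical, and the exact grammar should be confirmed against the SQL reference:

    ```sql
    -- Column-level SELECT privilege: expose only specific columns to a role.
    GRANT SELECT (customer_id, region) ON TABLE sales TO analyst_role;

    -- Comments on tables and columns.
    COMMENT ON TABLE sales IS 'Daily sales fact table, loaded nightly';
    COMMENT ON COLUMN sales.region IS 'Two-letter sales region code';
    ```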

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur in the load_table_binary_columnar Thrift API when request payloads are malformed.

    • Fixes a crash that could occur in the detect_column_types Thrift API when an uneven number of levels across columns are read for Parquet files.

    • Fixes a crash that could occur in certain cases where Parquet files are imported from S3 with debug timers enabled.

    • Fixes a race condition that could occur with foreign table scheduled refreshes.

    • Fixes a crash that could occur when the ST_NPoints function is called with literal arguments.

    • Fixes a hang that could occur when a "clear CPU" or "clear GPU" request is made after the Executor Resource Manager returns an error response for a previous query.

    • Fixes an issue where certain sorted join query patterns could return wrong results.

    • Fixes an intermittent "Ran out of slots in the query output buffer" error that could occur due to incorrect reuse of the cardinality cache for similar queries.

    • Fixes a crash that could occur in certain cases where the null value results from the ST_Centroid function are passed in as arguments to other ST functions.

    • Fixes use cases where the Executor Resource Manager did not account for result set buffers that are allocated through the CPU buffer pool.

    • Fixes an issue where the use of the approx_count_distinct function could cause a slowdown for certain query patterns in distributed configurations.

    • Fixes an issue where wrong results could be returned due to an overflow when dividing two decimals.

    • Fixes a crash that could occur for certain query patterns with Common Table Expressions that project point columns.

    • Fixes a crash that could occur for queries that specify LIMIT clauses with floating point values.

    • Fixes a crash that could occur for certain cases where the output of an ST function is passed in as an argument to the ST_Contains function.

    • Fixes an issue where the ST_Distance function can return wrong results for cases where a polygon encloses another polygon.

    hashtag
    Heavy Render - New Features and Improvements

    • Shader compiler optimizations.

    • Increased the default res-gpu-mem value from 385MB to 768MB.

    hashtag
    Heavy Render - Fixed Issues

    • Fixed shader compiler error when combining symbol rotation and accumulation.

    • Catch failed render result buffer allocations and throw an out of GPU memory error.

    • Fixed an issue that could cause a Cuda Error 700.

    • Fixes a crash that could occur for render requests containing certain sort query patterns.

    hashtag
    Heavy Immerse - New Features and Improvements

    • Introducing the SQL Notebook in Immerse, which in addition to supporting a notebook view of query history, allows inline visualization of query results using bar charts, line charts, scatterplots, heatmaps, choropleths, and pointmaps. The SQL notebook also integrates directly with the HeavyIQ conversational analytics module, allowing you to ask questions of your data in natural language, returning visualizations, natural language answers, and tabular results along with the generated SQL. The new SQL notebook is off by default, but it can be enabled by setting "dev/enable_notebook_ui_sql_editor": true in the servers.json file.

    • Viridis, Turbo, and Plasma added as new color palettes.
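
    The SQL Notebook flag mentioned above can be enabled with a servers.json entry along these lines (the surrounding structure is assumed; merge the flag into your existing configuration):

    ```json
    [
      {
        "database": "heavyai",
        "feature_flags": {
          "dev/enable_notebook_ui_sql_editor": true
        }
      }
    ]
    ```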

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixes dropping of filter clauses from SQL on exiting dash panel edit mode.

    • Fixes cross-filtering on Number chart when filtering on a Vega Combo chart with a custom SQL base dimension.

    • Fixed regression for enabling feature flag “ui/enable_chart_filter_view".

    • Fixed dashboard loading error when switching tabs with a HeavyDB error present.

    • Fixed joined table referenced in a size measure on the Line Map chart.

    • Fixed issue with non-integer operands in modulo operation error for Contour chart.

    • Fixed issue with color range display of filtered values for the Contour chart.

    • Fixed code generation for order of evaluation in map charts using SAMPLE_RATIO.

    • Fixed issue with Pointmap rendering of filtered data.

    • Fixed various minor display and styling issues.

    hashtag
    HeavyIQ Conversational Analytics

    • HeavyIQ Conversational Analytics allows users to ask questions of their data using natural language by leveraging a custom Large Language Model (LLM) that has been fine-tuned on more than 60,000 training pairs to provide state-of-the-art performance on text-to-SQL and other data analytics tasks. HeavyIQ requires no external API calls and guarantees data privacy by virtue of being a fully offline model.

    • The primary interface for the new HeavyIQ conversational analytics capabilities is the new SQL editor (see above), but HeavyIQ can also be used via API if desired.

    hashtag
    Release 7.x.x - Important Information

    circle-exclamation

    IMPORTANT - In HeavyDB Release 7.x.x, the “render groups” mechanism, part of the previous implementation of polygon rendering, has been removed. When you upgrade to HeavyDB Release 7.x.x, all existing tables that have a POLYGON or MULTIPOLYGON geo column are automatically migrated to remove a hidden column containing "render groups" metadata.

    This operation is performed on all tables in all catalogs at first startup, and the results are recorded in the INFO log.

    Once a table has been migrated in this manner, it is not backwards-compatible with earlier versions of HeavyDB. If you revert to an earlier version, the table may appear to have missing columns and behavior will be undefined. Attempting to query or render the POLYGON or MULTIPOLYGON data with the earlier version may fail or cause a server crash.

    As always, HEAVY.AI strongly recommends that all databases be backed up, or at the very least, dumps are made of tables with POLYGON or MULTIPOLYGON columns using the existing HeavyDB version, before upgrading to HeavyDB Release 7.x.x.

    Dumps of POLYGON and MULTIPOLYGON tables made with earlier versions can still be restored into HeavyDB Release 7.x.x. The superfluous metadata is automatically discarded. However, dumps of POLYGON and MULTIPOLYGON tables made with HeavyDB Release 7.x.x are not backwards-compatible with earlier versions.

    This applies only to tables with POLYGON or MULTIPOLYGON columns. Tables that contain other geo column types (POINT, LINESTRING, etc.), or only non-geo column types, do not require migration and remain backwards-compatible with earlier releases.

    circle-info

    For Ubuntu installations, install libncurses5 with the following command:

    sudo apt install libncurses5

    hashtag
    Release 7.2.4 - March 20, 2024

    hashtag
    HeavyDB - Fixed Issues

    • Adds a new option for enabling or disabling the use of virtual addressing when accessing an S3 compatible endpoint for import or HeavyConnect.

    • Improves logging related to system locks.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixes issue with SAML authentication.

    hashtag
    Release 7.2.3 - February 5, 2024

    hashtag
    HeavyDB - New Features and Improvements

    • Improves performance of foreign tables that are backed by Parquet files in AWS S3.

    • Improves logging related to GPU memory allocations and data transfers.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur for certain query patterns with intermediate geometry projections.

    • Fixes a crash that could occur for certain query patterns containing IN operators with string function operands.

    • Fixes a crash that could occur for equi join queries that use functions as operands.

    • Fixes an intermittent error that could occur in distributed configurations when executing count distinct queries.

    • Fixes an issue where certain query patterns with LIMIT and OFFSET clauses could return wrong results.

    • Fixes a crash that could occur for certain query patterns with left joins on Common Table Expressions.

    • Fixes a crash that could occur for certain queries with window functions containing repeated window frames.

    hashtag
    Heavy Render - Fixed Issues

    • Fix several crashes that could occur during out-of-gpu memory error recovery

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixed dashboard load error when switching tabs.

    • Fixed table reference in size measure of a client-side join data source for linemap chart.

    • Fixed client-side join name reference.

    hashtag
    Release 7.2.2 - December 15, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Adds support for output/result set buffer allocations via the "cpu-buffer-mem-bytes" configured CPU memory buffer pool. This feature can be enabled using the "use-cpu-mem-pool-for-output-buffers" server configuration parameter.

    • Adds a "ndv-group-estimator-multiplier" server configuration parameter that determines how the number of unique groups are estimated for specific query patterns.

    • Adds "default-cpu-slab-size" and "default-gpu-slab-size" server configuration parameters that are used to determine the default slab allocation size. The default size was previously based on the "max-cpu-slab-size" and "max-gpu-slab-size" configuration parameters.

    • Improves memory utilization when querying the "dashboards" system table.

    • Improves memory utilization in certain cases where queries are retried on CPU.

    • Improves error messages that are returned for some unsupported correlated subquery use cases.
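
    Taken together, the new parameters can be set in heavy.conf along these lines; the values shown are illustrative only (sizes are in bytes):

    ```ini
    # heavy.conf excerpt -- illustrative values only
    use-cpu-mem-pool-for-output-buffers = true
    ndv-group-estimator-multiplier = 2.0
    # 2 GB default slab allocations
    default-cpu-slab-size = 2147483648
    default-gpu-slab-size = 2147483648
    ```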

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an issue where allocations could go beyond the configured "cpu-buffer-mem-bytes" value when fetching table chunks.

    • Fixes a crash that could occur when executing concurrent sort queries.

    • Fixes a crash that could occur when invalid geometry literals are passed to ST functions.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fix for rendering a gauge chart using a parameterized source (join sources, custom sources).

    hashtag
    Release 7.2.1 - December 4, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Improves instrumentation around Parquet import and HeavyConnect.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur for join queries that result in many bounding box overlaps.

    • Fixes a crash that could occur in certain cases for queries containing an IN operator with a subquery parameter.

    • Fixes an issue where the ST_POINTN function could return wrong results when called with negative indexes.

    • Fixes an issue where a hang could occur while parsing a complex query.

    hashtag
    Heavy Render - Fixed Issues

    • Fixed error when setting render-mem-bytes greater than 4gb.

    hashtag
    Heavy Immerse - Fixed Issues

    • Clamp contour interval size on the Contour Chart to prevent a modulo operation error.

    • Filter outlier values in the Contour Chart that skew color range.

    • Fixed sample ratio query ordering to address a pointmap rendering issue.

    • Fixed layer naming in the Hide Layer menu.

    hashtag
    Release 7.2.0 - November 16, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Adds support for URL_ENCODE, URL_DECODE, REGEXP_COUNT, and HASH string functions.

    • Enables log based system tables by default.

    • Adds support for log based system tables auto refresh behind a flag (Beta).

    • Improves the pre-flight query row count estimation process for projection queries without filters.

    • Improves the performance of the LIKE operator.
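
    Illustrative uses of the new string functions; the feedback table and its columns are hypothetical:

    ```sql
    -- Percent-encode reserved characters, and reverse the encoding.
    SELECT URL_ENCODE('a b&c');
    SELECT URL_DECODE('a%20b%26c');

    -- Count regex matches per row, and compute a deterministic string hash.
    SELECT REGEXP_COUNT(comment_text, '[0-9]+') FROM feedback;
    SELECT HASH(user_email) FROM feedback;
    ```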

    hashtag
    HeavyDB - Fixed Issues

    hashtag
    General

    • Fixes errors that could occur when the REPLACE clause is applied to SQL DDL commands that do not support it.

    • Fixes an issue where the HeavyDB startup script could ignore command line arguments in certain cases.

    • Fixes a crash that could occur when requests were made to the detect_column_types API for Parquet files containing list columns.

    • Fixes a crash that could occur in heavysql when the \detect command is executed for Parquet files containing string list columns.

    • Fixes a crash that could occur when attempting to cast to text column types in SELECT queries.

    • Fixes a crash that could occur in certain cases where window functions were called with literal arguments.

    • Fixes a crash that could occur when executing the ENCODE_TEXT function on NULL values.

    • Fixes an issue where queries involving temporary tables could return wrong results due to incorrect cache invalidation.

    hashtag
    Geo

    • Fixes an issue where the ST_Distance function could return wrong results when at least one of its arguments is NULL.

    • Fixes an issue where the ST_Point function could return wrong results when the "y" argument is NULL.

    • Fixes an issue where the ST_NPoints function could return wrong results for NULL geometries.

    • Fixes a crash that could occur when the ST_PointN function is called with out-of-bounds index values.

    • Fixes an issue where the ST_Intersects and ST_Contains functions could incorrectly result in loop joins based on table order.

    • Fixes an issue where the ST_Transform function could return wrong results for NULL geometries.

    • Fixes an error that could occur for tables with polygon columns created from the output of user-defined table functions.

    hashtag
    Heavy Immerse - New Features and Improvements

    • [Beta] Geo Joins - Immerse now supports “contains” and “intersects” conditions for common geometry combinations when creating a join datasource in the no-code join editor.

    • Join datasource crossfilter support: Charts that use single table data sources will now crossfilter and be crossfiltered by charts that use join data sources.

    • Layer Drawer - Layered map charts in Immerse now have a quick-access Layer Drawer, which provides layer toggling, reordering, renaming, opacity, and zoom-visibility controls.

    • Zoom to filters - Map charts in Immerse now support “zoom to filters” functionality, either on an individual chart layer (via the Layer Drawer) or on the whole chart.

    • Image support in map rollovers - URLs pointing to images will automatically be rendered as a scaled image, with clickthrough support to the full size image.

    hashtag
    Heavy Immerse - Fixed Issues

    • Choropleth/Line Map join datasource support - Significantly improves performance in Choropleth and Line Map charts when using join data sources. Auto aggregates measures on geometry.

    • Fixes issue where sql editor will horizontally scroll with long query strings

    hashtag
    Release 7.1.2 - October 4, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Improves how memory is allocated for the APPROX_MEDIAN aggregate function.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when the DISTINCT qualifier is specified for aggregate functions that do not support the distinct operation.

    • Fixes an issue where wrong results could be returned for queries with window functions that return null values.

    • Fixes a crash that could occur in certain cases where queries have multiple aggregate functions.

    • Fixes a crash that could occur when tables are created with invalid options.

    • Fixes a potential data race that could occur when logging cache sizes.

    hashtag
    Release 7.1.1 - September 15, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Adds an EXPLAIN CALCITE DETAILED command that displays more details about referenced columns in the query plan.

    • Improved logging around system memory utilization for each query.

    • Adds an option to SQLImporter for disabling logging of connection strings.

    • Adds a "gpu-code-cache-max-size-in-bytes" server configuration parameter for limiting the amount of memory that can be used by the GPU code cache.

    • Improves column name representation in Parquet validation error messages.
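
    The new command is used like the existing EXPLAIN CALCITE; the table below is hypothetical:

    ```sql
    -- Logical plan only:
    EXPLAIN CALCITE SELECT region, COUNT(*) FROM sales GROUP BY region;

    -- Logical plan plus additional detail about referenced columns:
    EXPLAIN CALCITE DETAILED SELECT region, COUNT(*) FROM sales GROUP BY region;
    ```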

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a parser error that could occur for queries containing a NOT ILIKE clause.

    • Fixes a multiplication overflow error that could occur when retrying queries on CPU.

    • Fixes an issue where table dumps do not preserve quoted column names.

    • Fixes a "cannot start a transaction within a transaction" error that could occur in certain cases.

    • Fixes a crash that could occur for certain query patterns involving division by COUNT aggregation functions.

    • Removes a warning that is displayed on server startup when HeavyIQ is not configured.

    • Removes spurious warnings for CURSOR type checks when there are both cursor and scalar overloads for a user-defined table function.

    hashtag
    Heavy Render - New Features and Improvements

    • Adds hit testing support for custom measures that reference multiple tables.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixes SAML authentication regression in 7.1.0

    • Fixes chart export regression in 7.1.0

    hashtag
    Release 7.1.0 - August 22, 2023

    hashtag
    HeavyDB - New Features and Improvements

    hashtag
    Geospatial

    • Exposes new geo overlaps function ST_INTERSECTSBOX for very fast bounding box intersection detections.

    • Adds support for the max_reject COPY FROM option when importing raster files. This ensures that imports from large multi-file raster datasets continue after minor errors, but provides adjustable notification upon major ones.

    • Adds a new ST_AsBinary (also aliased as ST_AsWKB) function that returns the Well-Known Binary (WKB) representation of geometry values. This highly efficient format is used by PostGIS and newer versions of GeoPandas.

    • Adds a new ST_AsText (also aliased as ST_AsWKT) function that returns the Well-Known Text (WKT) representation of geometry values. This is less efficient than WKB but compatible even with non-spatial databases.

    • Adds support for loading geometry values using the load_table_binary_arrow Thrift API.

    • New version of HeavyAI python library with direct Geopandas support.

    • New version of rbc-project with geo column support allowing extensions which input or output any geometric type.
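
    A few of the geospatial additions above, sketched with hypothetical tables; the COPY options shown should be checked against the import reference:

    ```sql
    -- Fast bounding-box intersection test between two geometry columns.
    SELECT COUNT(*)
    FROM buildings b, flood_zones f
    WHERE ST_INTERSECTSBOX(b.geom, f.geom);

    -- Export geometries as WKB and WKT.
    SELECT ST_AsBinary(geom), ST_AsText(geom) FROM buildings LIMIT 1;

    -- Continue a large multi-file raster import past minor errors.
    COPY rasters FROM '/data/tiles/*.tif' WITH (source_type='raster_file', max_reject=100);
    ```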

    hashtag
    Core SQL

    • New JAROWINKLER_SIMILARITY string operator for fuzzy matching between string columns and values. This is a case-insensitive measure that accounts for character transpositions and is (slightly) sensitive to white space.

    • New LEVENSHTEIN_DISTANCE string operator for fuzzy matching between string columns and values. This is case-insensitive and represents the number of edits needed to make two strings identical, where an “edit” is an insertion, deletion, or replacement of a character.

    • Extends the ALTER COLUMN TYPE command to support string dictionary encoding size reduction.

    • Improves the error message returned when out of bound values are inserted into FLOAT and DOUBLE columns.

    • Adds a "watchdog-max-projected-rows-per-device" server configuration parameter and query hint that determines the maximum number of rows that can be projected by each GPU and CPU device.

    • Adds a "preflight-count-query-threshold" server configuration parameter and query hint that determines the threshold at which the preflight count query optimization should be executed.

    • Optimizes memory utilization for projection queries on instances with multiple GPUs.
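
    The fuzzy-matching operators and the extended ALTER COLUMN command might be used as follows; tables, columns, and the distance threshold are illustrative, and the exact ALTER COLUMN grammar should be confirmed against the DDL reference:

    ```sql
    -- Fuzzy match names across two tables.
    SELECT a.name, b.name,
           JAROWINKLER_SIMILARITY(a.name, b.name) AS jw_score,
           LEVENSHTEIN_DISTANCE(a.name, b.name)  AS edit_distance
    FROM customers a, leads b
    WHERE LEVENSHTEIN_DISTANCE(a.name, b.name) <= 2;

    -- Shrink a dictionary-encoded text column's encoding size.
    ALTER TABLE customers ALTER COLUMN name TYPE TEXT ENCODING DICT(16);
    ```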

    hashtag
    Predictive Modeling with HeavyML

    • Support for PCA models and PCA_PROJECT operator.

    • Support SHOW MODEL FEATURE DETAILS to show per-feature info for models, including regression coefficients and variable importance scores, if applicable.

    • Support for TRAIN_FRACTION option to specify proportion of the input data to a CREATE MODEL statement that should be trained on.

    • Support creation of models with only categorical predictors.

    • Enable categorical and numeric predictors to be specified in any order for CREATE MODEL statements and subsequent inference operations.

    • Enable Torch table functions (requires client to specify libtorch.so).

    • Add tf_torch_raster_object_detect for raster object detections (requires client to specify libtorch.so and provide trained model in torchscript format).
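    A rough sketch of how these pieces compose in SQL; the model type token and option syntax are assumptions based on the features listed above, so consult the HeavyML documentation for the exact grammar:

```sql
-- Train a regression model on 80% of the input rows
-- (hypothetical table and columns; model type token is an assumption)
CREATE MODEL delay_model OF TYPE random_forest_reg AS
SELECT arrdelay, depdelay, carrier_name FROM flights
WITH (train_fraction=0.8);

-- Inspect per-feature details such as variable importance
SHOW MODEL FEATURE DETAILS delay_model;
```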

    Extensions Framework

    • Allow Array literals as arguments to scalar UDFs

    • Support table function (UDTF) output row sizes up to 16 trillion rows

    • Adds support for Column<TextEncodingNone> and ColumnList<TextEncodingNone> table function inputs and outputs.

    Performance Optimizations

    • SQL projections are now sized per GPU/CPU core instead of globally, meaning that projections are more memory efficient as a function of the number of GPUs/CPU threads used for a query. In particular, various forms of in-situ rendering (for example, non-grouped pointmaps) can scale to N times more points on N GPUs, or use N times less memory, depending on the configuration.

    • Better parallelizes construction of metadata for subquery results, improving performance.

    • Enables result set caching for queries with LIMIT clauses.

    • Enables the bounding box intersection optimization for certain spatial join operators and geometry types by default.

    HeavyDB - Fixed Issues

    • Fix potential crash when concatenating strings with the output of a UDF.

    • Fixes an issue where deleted rows with malformed data can prevent ALTER COLUMN TYPE command execution.

    • Fixes an error that could occur when parsing odbcinst.ini configuration files containing only one installed driver entry.

    • Fixes a table data corruption issue that could occur when the server crashes multiple times while executing write queries.

    • Fixes a crash that could occur when attempting to do a union of a string dictionary encoded text column and a none encoded text column.

    • Fixes a crash that could occur when the output of a table function is used as an argument to the strtok_to_array function.

    • Fixes a crash that could occur for queries involving projections of both geometry columns and geometry function expressions.

    • Fixes an issue where wrong results could be returned when the output of the DATE_TRUNC function is used as an argument to the count distinct function.

    • Fixes an issue where an error occurs if the COUNT_IF function is used in an arithmetic expression.

    • Fixes a crash that could occur when the WIDTH_BUCKET function is called with decimal columns.

    • Fixes an issue where the WIDTH_BUCKET function could return wrong results when called with decimal values close to the upper and lower boundary values.

    • Fixes a crash that could occur for queries with redundant projection steps in the query plan.

    Heavy Render - Fixed Issues

    • Fixes a crash that could occur on multi-gpu systems while handling an out of GPU memory error.

    Heavy Immerse - New Features and Improvements

    • Zoom to filters, setting map bounding box to extent of current filter set.

    • Image preview in map chart popups where image URLs are present.

    Heavy Immerse - Fixed Issues

    • Fixed error thrown by choropleth chart on polygon hover.

    Release 7.0.2 - June 28, 2023

    HeavyDB - New Features and Improvements

    • Adds support for nested window function expressions.

    • Adds support for exception propagation from table functions.

    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when accessing 8-bit or 16-bit string dictionary encoded text columns on ODBC backed foreign tables.

    • Fixes unexpected GPU execution and memory allocations that could occur when executing sort queries with the CPU mode query hint.

    • Fixes an issue that could occur when inserting empty strings for geometry columns.

    • Fixes an issue that could occur when out of bounds fragment sizes are specified on table creation.

    • Fixes an issue where system dashboards could contain unexpected cached data.

    • Fixes a crash that could occur when executing aggregate functions over the result of join operations on scalar subqueries.

    • Fixes a server hang that could occur when GPU code compilation errors occur for user-defined table functions.

    • Fixes a data race that could occur when logging query plan cache size.

    Heavy Render - New Features and Improvements

    • Add support for rendering 1D “terrain” cross-section overlays.

    • Rewrite 2D cross-section mesh generation as a table function.

    • Further improvements to system state logging when a render out of memory error occurs, and move it to the ERROR log for guaranteed visibility.

    • Enable auto-clear-render-mem by default for any render-vega call taking < 10 seconds.

    Heavy Render - Fixed Issues

    • Render requests with 0 width or height could lead to a CHECK failure in encodePNG. Invalid image sizes now throw a non-fatal error during vega parsing.

    Heavy Immerse - New Features and Improvements

    • Visualize terrain at the base of atmospheric cross sections in the Cross Section chart with the new Base Terrain chart layer type.

    Heavy Immerse - Fixed Issues

    • Fixed local timezone issue with Chart Animation using cross filter replay.

    Release 7.0.1 - June 8, 2023

    HeavyDB - New Features and Improvements

    • Improves instrumentation around CPU and GPU memory utilization and certain crash scenarios.

    HeavyDB - Fixed Issues

    • Fixes a crash that could occur for GPU executed join queries on dictionary encoded text columns with NULL values.

    Heavy Render - New Features and Improvements

    • Improve instrumentation and logging related to gpu memory utilization, particularly with polygon rendering, as well as command timeout issues

    Heavy Render - Fixed Issues

    • Fix a potential segfault when a Vulkan device lost error occurs

    Release 7.0.0 - May 1, 2023

    HeavyDB - New Features and Improvements


    IMPORTANT - In HeavyDB Release 7.0, the “render groups” mechanism, part of the previous implementation of polygon rendering, has been removed. When you upgrade to HeavyDB Release 7.0, all existing tables that have a POLYGON or MULTIPOLYGON geo column are automatically migrated to remove a hidden column containing "render groups" metadata.

    This operation is performed on all tables in all catalogs at first startup, and the results are recorded in the INFO log.

    Once a table has been migrated in this manner, it is not backwards-compatible with earlier versions of HeavyDB. If you revert to an earlier version, the table may appear to have missing columns and behavior will be undefined. Attempting to query or render the POLYGON or MULTIPOLYGON data with the earlier version may fail or cause a server crash.

    As always, HEAVY.AI strongly recommends that all databases be backed up, or at the very least, dumps are made of tables with POLYGON or MULTIPOLYGON columns using the existing HeavyDB version, before upgrading to HeavyDB Release 7.0.

    Dumps of POLYGON and MULTIPOLYGON tables made with earlier versions can still be restored into HeavyDB Release 7.0. The superfluous metadata is automatically discarded. However, dumps of POLYGON and MULTIPOLYGON tables made with HeavyDB Release 7.0 are not backwards-compatible with earlier versions.

    This applies only to tables with POLYGON or MULTIPOLYGON columns. Tables that contain other geo column types (POINT, LINESTRING, etc.), or only non-geo column types, do not require migration and remain backwards-compatible with earlier releases.


    For Ubuntu installations, install libncurses5 with the following command:

    sudo apt install libncurses5

    • Adds a new Executor Resource Manager, enabling parallel CPU and CPU-GPU query execution, and support for CPU execution on data inputs larger than available memory.

    • Adds HeavyML, a suite of machine learning capabilities accessible directly in SQL, including support for linear regression, random forest, gradient boosted trees, and decision tree regression models, and KMeans and DBScan clustering methods. (BETA)

    • Adds HeavyConnect support for MULTIPOINT and MULTILINESTRING columns.

    • Adds ALTER COLUMN TYPE support for text columns.

    • Adds a REASSIGN ALL OWNED command that allows for object ownership change across all databases.

    • Adds an option for validating POLYGON and MULTIPOLYGON columns when importing using the COPY FROM command or when using HeavyConnect.

    • Adds support for CONDITIONAL_CHANGE_EVENT window function.

    • Adds support for automatic casting of table function CURSOR arguments.

    • Adds support for Column<GeoMultiPolygon>, Column<GeoMultiLineString>, and Column<GeoMultiPoint> table function inputs and outputs.

    • Adds support for none encoded text column, geometry column, and array column projections from the right table in left join queries.

    • Adds support for literal text scalar subqueries.

    • Adds support for ST_X and ST_Y function output cast to text.

    • Improves concurrent execution of DDL and SHOW commands.

    • Improves error messaging for when the storage directory is missing.

    • Optimizes memory utilization for auto-vacuuming after delete queries.

    HeavyDB - Fixed Issues

    • Fixes an issue where the root user could be deleted in certain cases.

    • Fixes an issue where staging directories for S3 import could remain when imports failed.

    • Fixes a crash that could occur when accessing the "tables" system table on instances containing tables with many columns.

    • Fixes a crash that could occur when accessing CSV and regex parsed file foreign tables that previously errored out during cache recovery.

    • Fixes an issue where dumping foreign tables would produce an empty table.

    • Fixes an intermittent crash that could occur when accessing CSV and regex parsed file foreign tables that are backed by large files.

    • Fixes a "Ran out of slots in the query output buffer" exception that could occur when using stale cached cardinality values.

    • Fixes an issue where user defined table functions are erroneously categorized as ambiguous.

    • Fixes an error that could occur when a group by clause includes an alias that matches a column name.

    • Fixes a crash that could occur on GPUs with the Pascal architecture when executing join queries with case expression projections.

    • Fixes a crash that could occur when using the LAG_IN_FRAME window function.

    • Fixes a crash that could occur when projecting geospatial columns from the tf_raster_contour_polygons table function.

    • Fixes an issue that could occur when calling window functions on encoded date columns.

    • Fixes a crash that could occur when the coalesce function is called with geospatial or array columns.

    • Fixes a crash that could occur when projecting case expressions with geospatial or array columns.

    • Fixes a crash that could occur due to rounding error when using the WIDTH_BUCKET function.

    • Fixes a crash that could occur in certain cases where left join queries are executed on GPU.

    • Fixes a crash that could occur for queries with joins on encoded date columns.

    • Fixes a crash that could occur when using the SAMPLE function on a geospatial column.

    • Fixes a crash that could occur for table functions with cursor arguments that specify no field type.

    • Fixes an issue where automatic casting does not work correctly for table function calls with ColumnList input arguments.

    • Fixes an issue where table function argument types are not correctly inferred when arithmetic operations are applied.

    • Fixes an intermittent crash that could occur for join queries due to a race condition when changing hash table layouts.

    • Fixes an out of CPU memory error that could occur when executing a query with a count distinct function call on a high cardinality column.

    • Fixes a crash that could occur when running a HeavyDB instance in read-only mode after previously executing write queries on tables.

    • Fixes an issue where the auto-vacuuming process does not immediately evict chunks that were pulled in for vacuuming.

    • Fixes a crash that could occur in certain cases when HeavyConnect is used with Parquet files containing null string values.

    • Fixes potentially inaccurate calculation of vertical attenuation from antenna patterns in HeavyRF.

    Heavy Render - New Features and Improvements

    • Add support for rendering a 1d cross-section as a line

    • Package the Vulkan loader libVulkan1 alongside heavydb

    Heavy Render - Fixed Issues

    • Fix a device lost error that could occur with complex polygon renders

    Heavy Immerse - New Features and Improvements

    • Data source Joins as a new custom data source type. (BETA)

    • Adds improved query performance defaults for the Contour Chart.

    • Adds access to new control panel to users with role "immerse_control_panel", even if the user is not a superuser.

    • Adds custom naming of map layers.

    • Adds custom map layer limit option using flag “ui/max_map_layers” which can be set explicitly (defaults to 8) or to -1 to remove the limit.

    Heavy Immerse - Fixed Issues

    • Renames role from “immerse_trial_mode” to “immerse_export_disabled” and renames corresponding flag from “ui/enable_trial_mode” to “ui/user_export_disabled”.

    • Various minor UI fixes and polishing.

    • Fixes an issue where changing parameter value causes Choropleth popup to lose selected popup columns.

    • Fixes an issue where changing parameter value causes Pointmap to lose selected popup columns.

    • Fixes an issue where building a Skew-T chart results in a blank browser page.

    • Fixes an issue where Skew-T chart did not display wind barbs.

    • Fixes an issue with default date and time formatting.

    • Fixes an issue where setting flag "ui/enable_map_exports" to false unexpectedly disabled table chart export.

    • Fixes an issue with date filter presets.

    • Fixes an issue where filters "Does Not Contain" or "Does not equal" did not work on Crosslinked Columns.

    • Fixes an issue where charts were not redrawing to show the current bounding box filter set by the Linemap chart.

    CPU-rendered Bubble chart

    When prompted, paste your license key in the text box and click Apply.

  • Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.


  • Click SCATTER.
  • Click Add Data Source.

  • Choose the flights_2008_10k table as the data source.

  • Click X Axis +Add Measure.

  • Choose depdelay.

  • Click Y Axis +Add Measure.

  • Choose arrdelay.

  • Click Size +Add Measure.

  • Choose airtime.

  • Click Color +Add Measure.

  • Choose dest_state.

  • The resulting chart shows, unsurprisingly, that there is a correlation between departure delay and arrival delay.

    GPU-rendered Scatterplot

    Create a new dashboard and a Bubble chart to verify that Heavy Immerse is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click Bubble.

    4. Click Select Data Source.

    5. Choose the flights_2008_10k table as the data source.

    6. Click Add Dimension.

    7. Choose carrier_name.

    8. Click Add Measure.

    9. Choose depdelay.

    10. Click Add Measure.

    11. Choose arrdelay.

    12. Click Add Measure.

    13. Choose #Records.

    The resulting chart shows, unsurprisingly, that average departure delay also correlates with average arrival delay, though with considerable differences between carriers.

    Install NVIDIA Drivers and Vulkan on Ubuntu
    You can use Heavy Immerse to import geospatial data into HeavyDB.

    Supported formats include:

    • Keyhole Markup Language (.kml)

    • GeoJSON (.geojson)

    • Shapefiles (.shp)

    • FlatGeobuf (.fgb)

    Shapefiles include four mandatory files: .shp, .shx, .dbf, and .prj. If you do not import the .prj file, the coordinate system will be incorrect and you cannot render the shapes on a map.

    To import geospatial definition data:

    1. Open Heavy Immerse.

    2. Click Data Manager.

    3. Click Import Data.

    4. Choose whether to import from a local file or an Amazon S3 instance. For details, see the documentation on importing from Amazon S3.

    5. Click the large + icon to select files for upload, or drag and drop the files to the Data Importer screen.

      When importing shapefiles, upload all required file types at the same time. If you upload them separately, Heavy Immerse issues an error message.

    6. Wait for the uploads to complete (indicated by green checkmarks on the file icons), then click Preview.

    7. On the Data Preview screen:

      • Edit the column headers (if needed).

      • Enter a name for the table in the field at the bottom of the screen.

    8. On the Successfully Imported Table screen, verify the rows and columns that compose your data table.

    Importing Well-Known Text

    You can import spatial representations in Well-known Text (WKT) format. WKT is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects, and transformations between spatial reference systems.


    When representing longitude and latitude in HEAVY.AI geospatial primitives, the first coordinate is assumed to be longitude by default.

    WKT Data Supported in Geospatial Columns

    You can use heavysql to define tables with columns that store WKT geospatial objects.

    Insert

    You can use heavysql to insert data as WKT string values.
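    A minimal heavysql sketch covering both the table definition and a WKT insert (table name, column names, and geometry type are illustrative):

```sql
CREATE TABLE wkt_geo (
  name TEXT ENCODING DICT(32),
  poly GEOMETRY(POLYGON, 4326)
);

-- Insert a geometry value as a WKT string
INSERT INTO wkt_geo VALUES ('triangle', 'POLYGON((0 0, 1 0, 0 1, 0 0))');
```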

    Importing Delimited Files

    You can insert data from CSV/TSV files containing WKT strings. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

    You can use your own custom delimiter in your data files.
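    For example, a pipe-delimited file could be loaded as follows (table and file names are hypothetical):

```sql
COPY wkt_geo FROM '/data/shapes.psv' WITH (delimiter='|');
```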

    Importing Legacy CSV/TSV Files

    Storing Geo Data

    You can import CSV and TSV files for tables that store longitude and latitude as either:

    • Separate consecutive scalar columns

    • A POINT field.

    If the data is stored as a POINT, you can use spatial functions like ST_Distance and ST_Contains. When location data are stored as a POINT column, they are displayed as such when querying the table:

    HEAVY.AI accepts data with any SRID, or with no SRID. HEAVY.AI supports SRID 4326 (WGS 84), and allows projections from SRID 4326 to SRID 900913 (Google Web Mercator). Geometries declared with SRID 4326 are compressed by default, and can be rendered and used to calculate geodesic distance. Geometries declared with any other SRID, or no SRID, are treated as planar geometries; the SRIDs are ignored.


    If two geometries are used in one operation (for example, in ST_Distance), the SRID values need to match.

    Importing the Data

    If you are using heavysql, create the table in HEAVY.AI with the POINT field defined as below:

    Then, import the file using COPY FROM in heavysql. By default, the two columns are consumed as longitude x and then latitude y. If the order of the coordinates in the CSV file is reversed, load the data using the WITH option lonlat='false':

    Columns can exist on either side of the point field; the lon/lat pair in the source file does not have to be at the beginning or end of the target table.

    If the imported coordinates are not in SRID 4326 (for example, SRID 2263), you can transform them to 4326 on the fly:
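    The steps above can be sketched as follows (table, column, and file names are hypothetical):

```sql
CREATE TABLE dest_points (
  name     TEXT ENCODING DICT(32),
  location GEOMETRY(POINT, 4326)
);

COPY dest_points FROM '/data/dest.csv';                                -- lon, lat order
COPY dest_points FROM '/data/dest_latlon.csv' WITH (lonlat='false');   -- lat, lon order
COPY dest_points FROM '/data/dest_2263.csv' WITH (source_srid=2263);   -- transform SRID 2263 to 4326
```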

    Importing CSV, TSV, and TXT Files in Immerse

    In Immerse, you define the table when loading the data instead of predefining it before import. Immerse supports appending data to a table by loading one or more files.

    Longitude and latitude can be imported as separate columns.

    Importing Geospatial Files

    You can create geo tables by importing specific geo file formats. HEAVY.AI supports the following types:

    • ESRI shapefile (.shp and associated files)

    • GeoJSON (.geojson or .json)

    • KML (.kml or .kmz)

    • ESRI file geodatabase (.gdb)


    An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table. See Importing an ESRI File Geodatabase for more information.

    You import geo files using the COPY FROM command with the geo option:
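    For example (table and file names are hypothetical):

```sql
COPY us_states FROM '/data/us_states.shp' WITH (geo='true');
```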

    The geo file import process automatically creates the table by detecting the column names and types explicitly described in the geo file header. It then creates a single geo column (always called heavyai_geo) that is of one of the supported types (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON).


    In Release 6.2 and higher, polygon render metadata assignment is disabled by default. This data is no longer required by the new polygon rendering algorithm introduced in Release 6.0. The new default results in significantly faster import for polygon table imports, particularly high-cardinality tables.

    If you need to revert to the legacy polygon rendering algorithm, polygons from tables imported in Release 6.2 may not render correctly. Those tables must be re-imported after setting the server configuration flag enable-assign-render-groups to true.

    The legacy polygon rendering algorithm and polygon render metadata server config will be removed completely in an upcoming release.


    Due to the prevalence of mixed POLYGON/MULTIPOLYGON geo files (and CSVs), if HEAVY.AI detects a POLYGON type geo file, HEAVY.AI creates a MULTIPOLYGON column and imports the data as single polygons.

    If the table does not already exist, it is created automatically.

    If the table already exists, and the data in the geo file has exactly the same column structure, the new file is appended to the existing table. This enables import of large geo data sets split across multiple files. The new file is rejected if it does not have the same column structure.

    By default, geo data is stored as GEOMETRY.

    You can also create tables with coordinates in SRID 3857 or SRID 900913 (Google Web Mercator). Importing data from shapefiles using SRID 3857 or 900913 is supported; importing data from delimited files into tables with these SRIDs is not supported at this time. To explicitly store in other formats, use the following WITH options in addition to geo='true':

    Compression used:

    • COMPRESSED(32) - 50% compression (default)

    • None - No compression

    Spatial reference identifier (SRID) type:

    • 4326 - EPSG:4326 (default)

    • 900913 - Google Web Mercator

    • 3857

    For example, the following explicitly sets the default values for encoding and SRID:
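    A sketch of such a command; the geo_coords_encoding and geo_coords_srid option names are assumptions, so verify them against the COPY FROM reference:

```sql
COPY geo_table FROM '/data/shapes.geojson'
WITH (geo='true', geo_coords_encoding='COMPRESSED(32)', geo_coords_srid=4326);
```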


    Rendering of geo LINESTRING, MULTILINESTRING, POLYGON and MULTIPOLYGON is possible only with data stored in the default lon/lat WGS84 (SRID 4326) format, although the type and encoding are flexible. Unless compression is explicitly disabled (NONE), all SRID 4326 geometries are compressed. For more information, see WGS84 Coordinate Compression.

    Note that rendering of geo MULTIPOINT is not yet supported.

    Importing an ESRI File Geodatabase

    An ESRI file geodatabase (.gdb) provides a method of storing GIS information in one large file that can have one or more "layers", with each layer containing disparate but related data. The data in each layer can be of different types. Importing a .gdb file results in the creation of one table for each layer in the file. You import an ESRI file geodatabase the same way that you import other geo file formats, using the COPY FROM command with the geo option:
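    For example (file path is hypothetical):

```sql
COPY mydata FROM '/data/mydata.gdb' WITH (geo='true');
```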

    The layers in the file are scanned and defined by name and contents. Contents are classified as EMPTY, GEO, NON_GEO or UNSUPPORTED_GEO:

    • EMPTY layers are skipped because they contain no useful data.

    • GEO layers contain one or more geo columns of a supported type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON) and one or more regular columns, and can be imported to a single table in the same way as the other geo file formats.

    • NON_GEO layers contain no geo columns and one or more regular columns, and can be imported to a regular table. Although the data comes from a geo file, data in this layer does not result in a geo table.

    • UNSUPPORTED_GEO layers contain geo columns of a type not currently supported (for example, GEOMETRYCOLLECTION). These layers are skipped because they cannot be imported completely.

    A single COPY FROM command can result in multiple tables, one for each layer in the file. The table names are automatically generated by appending the layer name to the provided table name.

    For example, consider the geodatabase file mydata.gdb which contains two importable layers with names A and B. Running COPY FROM creates two tables, mydata_A and mydata_B, with the data from layers A and B, respectively. The layer names are appended to the provided table name. If the geodatabase file only contains one layer, the layer name is not appended.

    You can load one specific layer from the geodatabase file by using the geo_layer_name option:
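    For example, to load only layer A from mydata.gdb:

```sql
COPY mydata FROM '/data/mydata.gdb' WITH (geo='true', geo_layer_name='A');
```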

    This loads only layer A, if it is importable. The resulting table is called mydata, and the layer name is not appended. Use this import method if you want to set a different name for each table. If the layer name from the geodatabase file would result in an illegal table name when appended, the name is sanitized by removing any illegal characters.

    Importing Geo Files from Archives or Non-Local Storage

    You can import geo files directly from archive files (for example, .zip .tar .tgz .tar.gz) without unpacking the archive. You can directly import individual geo files compressed with Zip or GZip (GeoJSON and KML only). The server opens the archive header and loads the first candidate file it finds (.shp .geojson .json .kml), along with any associated files (in the case of an ESRI Shapefile, the associated files must be siblings of the first).

    You can import geo files or archives directly from an Amazon S3 bucket.

    You can provide Amazon S3 credentials, if required, by setting variables in the environment of the heavysql process…

    You can also provide your credentials explicitly in the COPY FROM command.
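    A sketch of an S3 import with inline credentials; the s3_* option names are assumptions based on the usual COPY FROM S3 options, and the key values are placeholders:

```sql
COPY geo_table FROM 's3://my-bucket/shapes.geojson'
WITH (geo='true',
      s3_region='us-east-1',
      s3_access_key='XXXXXXXX',
      s3_secret_key='XXXXXXXX');
```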

    You can import geo files or archives directly from an HTTP/HTTPS website.


    WGS84 Coordinate Compression

    You can extend a column type specification to include spatial reference (SRID) and compression mode information.

    Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32). In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.

    COMPRESSED(32) compression maps lon/lat degree ranges to 32-bit integers, providing a smaller memory footprint and faster query execution. The effect on precision is small, approximately 4 inches at the equator.

    You can disable compression by explicitly choosing ENCODING NONE.
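    A sketch of such a table definition (the poly2 and mpoly2 columns are named in the text; the remaining column is illustrative):

```sql
CREATE TABLE geo2 (
  poly2  GEOMETRY(POLYGON, 4326),                 -- ENCODING COMPRESSED(32) by default
  mpoly2 GEOMETRY(MULTIPOLYGON, 4326),            -- ENCODING COMPRESSED(32) by default
  poly3  GEOMETRY(POLYGON, 4326) ENCODING NONE    -- compression explicitly disabled
);
```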

    Table functions can accept the results of one or more SQL queries as input, each wrapped in a CURSOR argument. The number and types of input arguments, as well as the number and types of output arguments, are specified in the table function definition itself.

    Table functions allow for the efficient execution of advanced algorithms that may be difficult or impossible to express in canonical SQL. By allowing execution of code directly over SQL result sets, leveraging the same hardware parallelism used for fast SQL execution and visualization rendering, HEAVY.AI provides orders-of-magnitude speed increases over the alternative of transporting large result sets to other systems for post-processing and then returning to HEAVY.AI for storage or downstream manipulation. You can easily invoke system-provided or user-defined algorithms directly inline with SQL and rendering calls, making prototyping and deployment of advanced analytics capabilities easier and more streamlined.

    Concepts

    CURSOR Subquery Inputs

    Table functions can take as input arguments both constant literals (including scalar results of subqueries) as well as results of other SQL queries (consisting of one or more rows). The latter (SQL query inputs), per the SQL standard, must be wrapped in the keyword CURSOR. Depending on the table function, there can be 0, 1, or multiple CURSOR inputs. For example:
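    For example (my_udtf and its signature are hypothetical):

```sql
SELECT *
FROM TABLE(
  my_udtf(
    CURSOR(SELECT x, y FROM points),   -- one SQL query input, wrapped in CURSOR
    10                                 -- one constant literal input
  )
);
```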

    ColumnList Inputs

    Certain table functions can take one or more columns of a specified type or types as inputs, denoted as ColumnList<TYPE1 | TYPE2 ... TYPEN>. Even if a function allows a ColumnList input of multiple types, the arguments must all be of one type; types cannot be mixed. For example, if a function allows ColumnList<INT | TEXT ENCODING DICT>, one or more columns of either INTEGER or TEXT ENCODING DICT can be used as inputs, but all must be either INT columns or TEXT ENCODING DICT columns.

    Named Arguments

    All HEAVY.AI system table functions allow you to specify arguments either in conventional comma-separated form in the order specified by the table function signature, or alternatively via a key-value map where input argument names are mapped to argument values using the => token. For example, the following two calls are equivalent:
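    For example, using generate_series (the named-argument identifiers shown are assumptions; verify them with SHOW TABLE FUNCTIONS DETAILS):

```sql
SELECT * FROM TABLE(generate_series(1, 10, 2));
SELECT * FROM TABLE(generate_series(series_start => 1, series_stop => 10, series_step => 2));
```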

    Filter Push-Down

    For performance reasons, particularly when table functions are used as actual tables in a client like Heavy Immerse, many system table functions in HEAVY.AI automatically "push down" filters on certain output columns in the query onto the inputs. For example, if a table function does some computation over an x and y range such that x and y appear in both the input and output of the table function, filter push-down would likely be enabled, so that a query filtering on the x and y outputs automatically pushes that filter down to the x and y inputs. This can increase query performance significantly.
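    For example, a filter such as the following could be pushed down from the outputs to the CURSOR inputs (my_spatial_udtf is hypothetical):

```sql
SELECT *
FROM TABLE(my_spatial_udtf(CURSOR(SELECT x, y, val FROM points)))
WHERE x BETWEEN 0 AND 100
  AND y BETWEEN 0 AND 100;
```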

    To determine whether filter push-down is used, you can check the Boolean value of the filter_table_transpose column from the query:


    Currently for system table functions, you cannot change push-down behavior.

    Querying Registered Table Functions

    You can query which table functions are available using SHOW TABLE FUNCTIONS:

    hashtag
    Query Metadata for a Specific Table Function

Information about the expected input and output argument names and types, as well as other information such as whether the function can run on CPU, GPU, or both, and whether filter push-down is enabled, can be queried via SHOW TABLE FUNCTIONS DETAILS <table_function_name>;

    hashtag
    System Table Functions

The following system table functions are available in HEAVY.AI. The table provides a summary of and links to more information about each function.

    Function
    Purpose

    Generates random string data.

    Generates a series of integer values.

    Generates a series of timestamp values from start_timestamp to end_timestamp.

    circle-info

    For information about the HeavyRF radio frequency propagation simulation and HeavyRF table functions, see HeavyRFarrow-up-right.

    circle-info

    The TABLE command is required to wrap a table function clause; for example: select * from TABLE(generate_series(1, 10));

    The CURSOR command is required to wrap any subquery inputs.

    Numbaarrow-up-right
    User-Defined Table Functions

    HEAVY.AI Installation using Docker on Ubuntu

Follow these steps to install HEAVY.AI as a Docker container on a machine running on CPU only or with supported NVIDIA GPU cards, using Ubuntu as the host OS.

    hashtag
    Preparation

Prepare your host by installing Docker and, if needed for your configuration, the NVIDIA drivers and NVIDIA runtime.

    hashtag
    Install Docker

Remove any existing Docker installs and, if on GPU, the legacy NVIDIA Docker runtime.

Add Docker's GPG key using curl and ca-certificates.

    Add Docker to your Apt repository.

    Update your repository.

    Install Docker, the command line interface, and the container runtime.

Run the following usermod command so that Docker command execution does not require sudo privileges (recommended). Log out and log back in for the change to take effect.

    Verify your Docker installation.
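The steps above can be sketched as follows. This is one common sequence based on Docker's official Ubuntu instructions; the repository URL and package names are Docker's, not HEAVY.AI-specific, and may differ on your Ubuntu release:

```shell
# Remove legacy Docker packages (and, on GPU hosts, the old nvidia-docker2 runtime)
sudo apt-get remove -y docker docker-engine docker.io containerd runc nvidia-docker2

# Add Docker's GPG key and apt repository
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine, the command line interface, and the container runtime
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Allow docker without sudo (log out and back in afterwards), then verify
sudo usermod -aG docker $USER
docker run --rm hello-world
```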

For more information on Docker installation, see the Docker Installation Guide.

    hashtag
    Install NVIDIA Drivers and NVIDIA Container ᴳᴾᵁ ᴼᴾᵀᴵᴼᴺ

    hashtag
    Install NVIDIA Drivers

Install the NVIDIA driver and CUDA Toolkit by following the instructions in Install NVIDIA Drivers and Vulkan on Ubuntu.

    hashtag
    Install NVIDIA Docker Runtime

Use curl to add NVIDIA's GPG key:

    Update your sources list:

    Update apt-get and install nvidia-container-runtime:

    Edit /etc/docker/daemon.json to add the following, and save the changes:

    Restart the Docker daemon:
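The daemon.json edit and daemon restart can look like the following sketch. The runtime path shown is the one installed by the nvidia-container-runtime package; setting default-runtime to nvidia is a common but optional choice:

```shell
# Register the NVIDIA runtime with the Docker daemon
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# Restart the Docker daemon to pick up the change
sudo systemctl restart docker
```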

    hashtag
    Check NVIDIA Drivers

Verify that Docker and the NVIDIA runtime work together.

If everything is working, you should see the output of the nvidia-smi command showing the installed GPUs in the system.
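A quick way to verify is to run nvidia-smi inside a throwaway CUDA container. The image tag below is an assumption for illustration; use a CUDA version supported by your installed driver:

```shell
# Should print the nvidia-smi table listing your GPUs
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```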

    hashtag
    HEAVY.AI Installation

Create a directory to store data and configuration files.

Then create a minimal configuration file for the Docker installation.

    circle-exclamation

Ensure that you have sufficient storage on the drive you choose for your storage directory by running a command such as df -h.
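A minimal sketch of these steps, assuming the default /var/lib/heavyai storage location and the standard HeavyDB ports (6274/6278 for the server, 6273 for the web server); adjust paths and ports for your environment:

```shell
# Create the storage directory
sudo mkdir -p /var/lib/heavyai

# Write a minimal heavy.conf for a Docker install
sudo tee /var/lib/heavyai/heavy.conf > /dev/null <<'EOF'
port = 6274
http-port = 6278
data = "/var/lib/heavyai/storage"

[web]
port = 6273
frontend = "/opt/heavyai/frontend"
EOF

# Check free space on the drive backing the storage directory
df -h /var/lib/heavyai
```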

    Optional: Download HEAVY.AI from Release Website

The subsequent section downloads and installs an image from DockerHub. However, if you want to avoid pulling from DockerHub and instead download and prepare a specific image, follow the instructions in this section. To download a specific version, visit one of the following websites, choose the version that you wish to install, right-click, and select "COPY URL".

    Enterprise/Free Editions:

    Open Source Editions:

    circle-info

Use files ending in -render-docker.tar.gz to install GPU editions and -cpu-docker.tar.gz to install CPU editions.

Then, on the server where you want to install HEAVY.AI, run the following command (replacing $DOWNLOAD_URL with the URL from your clipboard).

    wget $DOWNLOAD_URL

Await the successful download, then run ls | grep heavy to see the filename of the package you just downloaded. Copy the filename to your clipboard, and then run the next command, replacing $DOWNLOADED_FILENAME with the contents of your clipboard.

    docker load < $DOWNLOADED_FILENAME

    The command will return a Docker image name. Replace heavyai/heavyai-(...):latest with the image you just loaded.

    Download HEAVY.AI from DockerHub and Start HEAVY.AI in Docker.

    Select the tab depending on the Edition (Enterprise, Free, or Open Source) and execution Device (GPU or CPU) you are going to use.

    circle-info

Replace ":latest" with ":vX.Y.Z" to pull a specific Docker version (e.g., heavyai-ee-cuda:v8.0.1).
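As a sketch, pulling and starting the Enterprise Edition GPU image (heavyai/heavyai-ee-cuda, as named in the note above) might look like this. The volume and port mappings assume the default storage directory and port range described elsewhere in this guide:

```shell
# Start HeavyDB and Heavy Immerse in a detached container
docker run -d --gpus all \
  -v /var/lib/heavyai:/var/lib/heavyai \
  -p 6273-6278:6273-6278 \
  --name heavyai \
  heavyai/heavyai-ee-cuda:latest

# Confirm the container is up
docker ps
```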

Check that the container is up and running using a docker ps command:

    You should see an output similar to the following.

See also the note regarding the CUDA JIT Cache in Optimizing Performance.

    hashtag
    Configure Firewall ᴼᴾᵀᴵᴼᴺᴬᴸ

If a firewall is not already installed and you want to harden your system, install the ufw package.

    To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access.
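Mirroring the ufw commands used in the bare-metal installation recipe, opening the HEAVY.AI port range can be sketched as:

```shell
# Install and enable a basic firewall, keeping SSH and the HEAVY.AI ports open
sudo apt install ufw
sudo ufw allow ssh
sudo ufw allow 6273:6278/tcp
sudo ufw enable
```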

    circle-info

    Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.

    For more information, see .

    hashtag
    Licensing HEAVY.AI ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

If you are on Enterprise or Free Edition, you need to validate your HEAVY.AI instance using your license key. Skip this section if you are on Open Source Edition.

1. Copy your Enterprise or Free Edition license key from the registration email message. If you don't have a license and you want to evaluate HEAVY.AI in an enterprise environment, contact your Sales Representative or register for your 30-day trial of Enterprise Edition here. If you need a Free license, you can get one here.

2. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    hashtag
    Command-Line Access

    You can access the command line in the Docker image to perform configuration and run HEAVY.AI utilities.

    You need to know the container-id to access the command line. Use the command below to list the running containers.

    You see output similar to the following.

    Once you have your container ID, in the example 9e01e520c30c, you can access the command line using the Docker exec command. For example, here is the command to start a Bash session in the Docker instance listed above. The -it switch makes the session interactive.

    You can end the Bash session with the exit command.
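Putting the steps together, using the example container ID shown above:

```shell
docker ps                          # note the CONTAINER ID column
docker exec -it 9e01e520c30c bash  # start an interactive Bash session in the container
exit                               # end the Bash session
```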

    hashtag
    Final Checks

To verify that everything is working, load some sample data, perform a heavysql query, and generate a Scatter Plot or a Bubble Chart using Heavy Immerse.

    hashtag
    Load Sample Data and Run a Simple Query

HEAVY.AI ships with three sample datasets: two of airline flight information collected in 2008, and one from a census of New York City trees. To install the sample data, run the following command.

    Where <container-id> is the container in which HEAVY.AI is running.
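Based on the insert_sample_data script shipped in the HEAVY.AI install directory (see the bare-metal recipe later in this guide), the command can be sketched as follows; the --data path assumes the default storage location:

```shell
# Replace <container-id> with the ID of your running HEAVY.AI container
docker exec -it <container-id> ./insert_sample_data --data /var/lib/heavyai/storage
```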

    When prompted, choose whether to insert dataset 1 (7,000,000 rows), dataset 2 (10,000 rows), or dataset 3 (683,000 rows). The examples below use dataset 2.

Connect to HeavyDB by entering the following command (you will be prompted for a password; the default password is HyperInteractive):

    Enter a SQL query such as the following:

The results should be similar to the following.

    hashtag
    Create a Dashboard Using Heavy Immerse ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

If you installed Enterprise or Free Edition, check that Heavy Immerse is running as intended.

    1. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

    Create a new dashboard and a Scatter Plot to verify that backend rendering is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    hashtag
    ¹ In the OS Edition, Heavy Immerse Service is unavailable.

    hashtag
    ² The OS Edition does not require a license key.

    Upgrading from Omnisci to HEAVY.AI 6.0

This section provides a recipe for upgrading from the Omnisci platform 5.5+ to HEAVY.AI 6.0.

    circle-exclamation

If the version of Omnisci is older than 5.5, an intermediate upgrade to version 5.5 is needed. Check the docs on how to perform the intermediate upgrade.

    sudo apt update
    sudo apt upgrade
    sudo apt install curl
    sudo apt install libncurses5
    sudo apt install default-jre-headless apt-transport-https
    sudo reboot
    sudo useradd --user-group --create-home --group sudo heavyai
    sudo passwd heavyai
    sudo su - heavyai
    curl https://releases.heavy.ai/GPG-KEY-heavyai | sudo apt-key add -
    echo "deb https://releases.heavy.ai/ee/apt/ stable cuda" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    echo "deb https://releases.heavy.ai/ee/apt/ stable cpu" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    echo "deb https://releases.heavy.ai/os/apt/ stable cuda" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    echo "deb https://releases.heavy.ai/os/apt/ stable cpu" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    sudo apt update
    sudo apt install heavyai
    hai_version="6.0.0"
    sudo apt install heavyai=$(apt-cache madison heavyai | grep $hai_version | cut -f 2 -d '|' | xargs)
    sudo mkdir /opt/heavyai && sudo chown $USER /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-render.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
echo "# HEAVY.AI variable and paths
export HEAVYAI_PATH=/opt/heavyai
export HEAVYAI_BASE=/var/lib/heavyai
export HEAVYAI_LOG=\$HEAVYAI_BASE/storage/log
export PATH=\$HEAVYAI_PATH/bin:\$PATH" \
    >> ~/.bashrc
    source ~/.bashrc
    cd $HEAVYAI_PATH/systemd
    ./install_heavy_systemd.sh
    sudo systemctl enable heavydb --now
    sudo systemctl enable heavy_web_server --now
    sudo systemctl enable heavydb --now
    sudo apt install ufw
    sudo ufw allow ssh
    sudo ufw disable
    sudo ufw allow 6273:6278/tcp
    sudo ufw enable
    cd $HEAVYAI_PATH
    sudo ./insert_sample_data --data /var/lib/heavyai/storage
    #     Enter dataset number to download, or 'q' to quit:
    Dataset           Rows    Table Name          File Name
    1)    Flights (2008)    7M      flights_2008_7M     flights_2008_7M.tar.gz
    2)    Flights (2008)    10k     flights_2008_10k    flights_2008_10k.tar.gz
    3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
    $HEAVYAI_PATH/bin/heavysql
    password: ••••••••••••••••
    SELECT origin_city AS "Origin", 
    dest_city AS "Destination", 
    AVG(airtime) AS "Average Airtime" 
    FROM flights_2008_10k WHERE distance < 175 
    GROUP BY origin_city, dest_city;
    Origin|Destination|Average Airtime
    Austin|Houston|33.055556
    Norfolk|Baltimore|36.071429
    Ft. Myers|Orlando|28.666667
    Orlando|Ft. Myers|32.583333
    Houston|Austin|29.611111
    Baltimore|Norfolk|31.714286
    heavysql> \d geo
    CREATE TABLE geo (
    p POINT,
    l LINESTRING,
    poly POLYGON)
    heavysql> INSERT INTO geo values('POINT(20 20)', 'LINESTRING(40 0, 40 40)', 
    'POLYGON(( 0 0, 40 0, 40 40, 0 40, 0 0 ))');
    > cat geo.csv
    "p", "l", "poly"
    "POINT(1 1)", "LINESTRING( 2 0,  2  2)", "POLYGON(( 1 0,  0 1, 1 1 ))"
    "POINT(2 2)", "LINESTRING( 4 0,  4  4)", "POLYGON(( 2 0,  0 2, 2 2 ))"
    "POINT(3 3)", "LINESTRING( 6 0,  6  6)", "POLYGON(( 3 0,  0 3, 3 3 ))"
    "POINT(4 4)", "LINESTRING( 8 0,  8  8)", "POLYGON(( 4 0,  0 4, 4 4 ))"
    heavysql> COPY geo FROM 'geo.csv';
    Result
    Loaded: 4 recs, Rejected: 0 recs in 0.356000 secs
    > cat geo1.csv
    "p", "l", "poly"
    POINT(5 5); LINESTRING(10 0, 10 10); POLYGON(( 5 0, 0 5, 5 5 ))
    heavysql> COPY geo FROM 'geo1.csv' WITH (delimiter=';', quoted='false');
    Result
    Loaded: 1 recs, Rejected: 0 recs in 0.148000 secs
    select * from destination_points;
    name|pt
    Just Fishing Around|POINT (-85.499999999727588 44.6929999755849)
    Moonlight Cove Waterfront|POINT (-85.5046011346879 44.6758447935227)
    CREATE TABLE new_geo (p GEOMETRY(POINT,4326))
    heavysql> COPY new_geo FROM 'legacy_geo.csv' WITH (lonlat='false');
    heavysql> COPY new_geo FROM 'legacy_geo_2263.csv' WITH (source_srid=2263, lonlat='false');
    heavysql> COPY states FROM 'states.shp' WITH (geo='true');
    heavysql> COPY zipcodes FROM 'zipcodes.geojson' WITH (geo='true');
    heavysql> COPY cell_towers FROM 'cell_towers.kml' WITH (geo='true');
    geo_coords_encoding='COMPRESSED(32)'
    geo_coords_srid=4326
    heavysql> COPY counties FROM 'counties.gdb' WITH (geo='true');
    COPY mydata FROM 'mydata.gdb' WITH (geo='true', geo_layer_name='A');
    $ unzip -l states.zip
    Archive:  states.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  2018-02-13 11:09   states/
       446116  2017-11-06 12:15   states/cb_2014_us_state_20m.shp
         8434  2017-11-06 12:15   states/cb_2014_us_state_20m.dbf
            9  2017-11-06 12:15   states/cb_2014_us_state_20m.cpg
          165  2017-11-06 12:15   states/cb_2014_us_state_20m.prj
          516  2017-11-06 12:15   states/cb_2014_us_state_20m.shx
    ---------                     -------
       491525                     6 files
    
    heavysql> COPY states FROM 'states.zip' with (geo='true');
    heavysql> COPY zipcodes FROM 'zipcodes.geojson.gz' with (geo='true');
    heavysql> COPY zipcodes FROM 'zipcodes.geojson.zip' with (geo='true');
    heavysql> COPY cell_towers FROM 'cell_towers.kml.gz' with (geo='true');
    heavysql> COPY states FROM 's3://mybucket/myfolder/states.shp' with (geo='true');
    heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' with (geo='true');
    heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.gz' with (geo='true');
    heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.zip' with (geo='true');
    AWS_REGION=us-west-1
    AWS_ACCESS_KEY_ID=********************
    AWS_SECRET_ACCESS_KEY=****************************************
    heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' WITH (geo='true', s3_region='us-west-1', s3_access_key='********************', s3_secret_key='****************************************');  
    heavysql> COPY states FROM 'http://www.mysite.com/myfolder/states.zip' with (geo='true');
    CREATE TABLE geo2 (
    p2 GEOMETRY(POINT, 4326) ENCODING NONE,
    l2 GEOMETRY(LINESTRING, 900913),
    poly2 GEOMETRY(POLYGON, 4326),
    mpoly2 GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));
SELECT * FROM TABLE(my_table_function /* This is only an example! */ (
     CURSOR(SELECT arg1, arg2, arg3 FROM input_1 WHERE x > 10) /* First CURSOR 
     argument consisting of 3 columns */,
 CURSOR(SELECT arg1, AVG(arg2) FROM input_2 WHERE y < 40 GROUP BY arg1)
 /* Second CURSOR argument consisting of 2 columns. This could be from the same
     table as the first CURSOR, or as is the case here, a completely different table
     (or even joined table or logical value expression) */,
     'Fred' /* TEXT constant literal argument */,
     true /* BOOLEAN constant literal argument */,
 (SELECT COUNT(*) FROM another_table) /* scalar subquery results do not need
 to be wrapped in a CURSOR */,
     27.3 /* FLOAT constant literal argument */))
    WHERE output1 BETWEEN 32.2 AND 81.8;
    /* The following two table function calls, the first with unnamed
     signature-ordered arguments, and the second with named arguments,
     are equivalent */
    
    select
      *
    from
      table(
        tf_compute_dwell_times(
          /* Without the use of named arguments, input arguments must
          be ordered as specified by the table function signature */
          cursor(
            select
              user_id,
              movie_id,
              ts
            from
              netflix_audience_behavior
          ),
          3,
          600,
          10800
        )
      )
    order by
      num_dwell_points desc
    limit
      10;
    
    
    select
      *
    from
      table(
        tf_compute_dwell_times(
         /* Using named arguments, input arguments can be
         ordered in any order, as long as all arguments are named */
     min_dwell_seconds => 600,
     max_inactive_seconds => 10800,
      data => cursor(
            select
              user_id,
              movie_id,
              ts
            from
              netflix_audience_behavior
          ),
          min_dwell_points => 3
        )
      )
    order by
      num_dwell_points desc
    limit
      10;
    SELECT
      *
    FROM
      TABLE(
        my_spatial_table_function(
          CURSOR(
            SELECT
              x,
              y
            from
              spatial_data_table
              /* Presuming filter push down is enabled for 
              my_spatial_table_function, the filter applied to 
              x and y will be applied here to the table function
              input CURSOR */
          )
        )
      )
    WHERE
      x BETWEEN 38.2
      AND 39.1
  and y BETWEEN -121.4
      and -120.1;
    SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
    SHOW TABLE FUNCTIONS;
    
    Table UDF
    
    tf_feature_similarity
    tf_feature_self_similarity
    tf_geo_rasterize_slope
    ...
    SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
    
    name|signature|input_names|input_types|output_names|output_types|CPU|GPU|Runtime|filter_table_transpose
    generate_series|(i64 series_start, i64 series_stop, i64 series_step) -> Column<i64>|[series_start, series_stop, series_step]|[i64, i64, i64]|[generate_series]|[Column<i64>]|true|false|false|false
    generate_series|(i64 series_start, i64 series_stop) -> Column<i64>|[series_start, series_stop]|[i64, i64]|[generate_series]|[Column<i64>]|true|false|false|false

    If you are loading the data files into a distributed system, verify under Import Settings that the Replicate Table checkbox is selected.

  • Click Import Data.

  • - EPSG:3857
    Importing Data from Amazon S3

    Given a query input with entity keys and timestamps, and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session.

    tf_feature_self_similarity

    Given a query input of entity keys/IDs, a set of feature columns, and a metric column, scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.

    tf_feature_similarity

Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) for the search vector, which can optionally be TF/IDF weighted.

    tf_geo_rasterize

Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing a value for each bin from the z values of the points that fall in it. The aggregate performed to compute the value for each bin is specified by agg_type, with allowed aggregate types of AVG, COUNT, SUM, MIN, and MAX.

    tf_geo_rasterize_slope

    Similar to tf_geo_rasterize, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin.

    tf_graph_shortest_path

Given a distance-weighted directed graph, consisting of a query CURSOR input with the starting and ending node for each edge and a distance, and a specified origin and destination node, computes the shortest distance-weighted path through the graph between origin_node and destination_node.

    tf_graph_shortest_paths_distances

Given a distance-weighted directed graph, consisting of a query CURSOR input with the starting and ending node for each edge and a distance, and a specified origin node, computes the shortest distance-weighted path distance between the origin_node and every other node in the graph.

    tf_load_point_cloud

Loads one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs. If not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs.

    tf_mandelbrot tf_mandelbrot_cuda tf_mandelbrot_cuda_float tf_mandelbrot_float

    Computes the Mandelbrot setarrow-up-right over the complex domain [x_min, x_max), [y_min, y_max), discretizing the xy-space into an output of dimensions x_pixels X y_pixels.

    tf_point_cloud_metadata

    Returns metadata for one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min, x_max, y_min, y_max arguments.

    tf_raster_contour_lines tf_raster_contour_polygons

Processes a raster input to derive contour lines or regions and outputs them as LINESTRING or POLYGON for rendering or further processing.

    tf_raster_graph_shortest_slope_weighted_path

Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin, and then computes the slope-weighted shortest path over the resulting grid between a specified origin and destination.

    tf_rf_prop

    Used for generating top-k signals where 'k' represents the maximum number of antennas to consider at each geographic location. The full relevant parameter name is strongest_k_sources_per_terrain_bin.

    tf_rf_prop_max_signal (Directional Antennas) tf_rf_prop_max_signal (Isotropic Antennas)

    Taking a set of point elevations and a set of signal source locations as input, tf_rf_prop_max_signal executes line-of-sight 2.5D RF signal propagation from the provided sources over a binned 2.5D elevation grid derived from the provided point locations, calculating the max signal in dBm at each grid cell, using the formula for free-space power loss.

    generate_random_strings
    generate_series (Integers)
    generate_series (Timestamps)
    tf_compute_dwell_times
CPU-rendered Bubble Chart

    Reserved Words

    Following is a list of HEAVY.AI keywords.

    ABS
    ACCESS
    ADD
    ALL
    ALLOCATE
    ALLOW
    ALTER
    AMMSC
    AND
    ANY
    ARCHIVE
    ARE
    ARRAY_MAX_CARDINALITY
    ARRAY
    AS
    ASC
    ASENSITIVE
    ASYMMETRIC
    AT
    ATOMIC
    AUTHORIZATION
    AVG
    BEGIN
    BEGIN_FRAME
    BEGIN_PARTITION
    BETWEEN
    BIGINT
    BINARY
    BIT
    BLOB
    BOOLEAN
    BOTH
    BY
    CALL
    CALLED
    CARDINALITY
    CASCADED
    CASE
    CAST
    CEIL
    CEILING
    CHAR
    CHARACTER
    CHARACTER_LENGTH
    CHAR_LENGTH
    CHECK
    CLASSIFIER
    CLOB
    CLOSE
    COALESCE
    COLLATE
    COLLECT
    COLUMN
    COMMIT
    CONDITION
    CONNECT
    CONSTRAINT
    CONTAINS
    CONTINUE
    CONVERT
    COPY
    CORR
    CORRESPONDING
    COUNT
    COVAR_POP
    COVAR_SAMP
    CREATE
    CROSS
    CUBE
    CUME_DIST
    CURRENT
    CURRENT_CATALOG
    CURRENT_DATE
    CURRENT_DEFAULT_TRANSFORM_GROUP
    CURRENT_PATH
    CURRENT_ROLE
    CURRENT_ROW
    CURRENT_SCHEMA
    CURRENT_TIME
    CURRENT_TIMESTAMP
    CURRENT_TRANSFORM_GROUP_FOR_TYPE
    CURRENT_USER
    CURSOR
    CYCLE
    DASHBOARD
    DATABASE
    DATE
    DATE_TRUNC
    DATETIME
    DAY
    DEALLOCATE
    DEC
    DECIMAL
    DECLARE
    DEFAULT
    DEFINE
    DELETE
    DENSE_RANK
    DEREF
    DESC
    DESCRIBE
    DETERMINISTIC
    DISALLOW
    DISCONNECT
    DISTINCT
    DOUBLE
    DROP
    DUMP
    DYNAMIC
    EACH
    EDIT
    EDITOR
    ELEMENT
    ELSE
    EMPTY
    END
    END-EXEC
    END_FRAME
    END_PARTITION
    EQUALS
    ESCAPE
    EVERY
    EXCEPT
    EXEC
    EXECUTE
    EXISTS
    EXP
    EXPLAIN
    EXTEND
    EXTERNAL
    EXTRACT
    FALSE
    FETCH
    FILTER
    FIRST
    FIRST_VALUE
    FLOAT
    FLOOR
    FOR
    FOREIGN
    FOUND
    FRAME_ROW
    FREE
    FROM
    FULL
    FUNCTION
    FUSION
    GEOGRAPHY 
    GEOMETRY 
    GET
    GLOBAL
    GRANT
    GROUP
    GROUPING
    GROUPS
    HAVING
    HOLD
    HOUR
    IDENTITY
    IF
    ILIKE
    IMPORT
    IN
    INDICATOR
    INITIAL
    INNER
    INOUT
    INSENSITIVE
    INSERT
    INT
    INTEGER
    INTERSECT
    INTERSECTION
    INTERVAL
    INTO
    IS
    JOIN
    LAG
    LANGUAGE
    LARGE
    LAST_VALUE
    LAST
    LATERAL
    LEAD
    LEADING
    LEFT
    LENGTH
    LIKE
    LIKE_REGEX
    LIMIT
    LINESTRING 
    LN
    LOCAL
    LOCALTIME
    LOCALTIMESTAMP
    LOWER
    MATCH
    MATCH_NUMBER
    MATCH_RECOGNIZE
    MATCHES
    MAX
    MEASURES
    MEMBER
    MERGE
    METHOD
    MIN
    MINUS
    MINUTE
    MOD
    MODIFIES
    MODULE
    MONTH
    MULTIPOLYGON 
    MULTISET
    NATIONAL
    NATURAL
    NCHAR
    NCLOB
    NEW
    NEXT
    NO
    NONE
    NORMALIZE
    NOT
    NOW
    NTH_VALUE
    NTILE
    NULL
    NULLIF
    NULLX
    NUMERIC
    OCCURRENCES_REGEX
    OCTET_LENGTH
    OF
    OFFSET
    OLD
    OMIT
    ON
    ONE
    ONLY
    OPEN
    OPTIMIZE
    OPTION
    OR
    ORDER
    OUT
    OUTER
    OVER
    OVERLAPS
    OVERLAY
    PARAMETER
    PARTITION
    PATTERN
    PER
    PERCENT
    PERCENT_RANK
    PERCENTILE_CONT
    PERCENTILE_DISC
    PERIOD
    PERMUTE
    POINT 
    POLYGON 
    PORTION
    POSITION
    POSITION_REGEX
    POWER
    PRECEDES
    PRECISION
    PREPARE
    PREV
    PRIMARY
    PRIVILEGES
    PROCEDURE
    PUBLIC
    RANGE
    RANK
    READS
    REAL
    RECURSIVE
    REF
    REFERENCES
    REFERENCING
    REGR_AVGX
    REGR_AVGY
    REGR_COUNT
    REGR_INTERCEPT
    REGR_R2
    REGR_SLOPE
    REGR_SXX
    REGR_SXY
    REGR_SYY
    RELEASE
    RENAME
    RESET
    RESULT
    RESTORE
    RETURN
    RETURNS
    REVOKE
    RIGHT
    ROLE 
    ROLLBACK
    ROLLUP
    ROW
    ROW_NUMBER
    ROWS
    ROWID 
    RUNNING
    SAVEPOINT
    SCHEMA
    SCOPE
    SCROLL
    SEARCH
    SECOND
    SEEK
    SELECT
    SENSITIVE
    SESSION_USER
    SET
    SHOW
    SIMILAR
    SKIP
    SMALLINT
    SOME
    SPECIFIC
    SPECIFICTYPE
    SQL
    SQLEXCEPTION
    SQLSTATE
    SQLWARNING
    SQRT
    START
    STATIC
    STDDEV_POP
    STDDEV_SAMP
    STREAM
    SUBMULTISET
    SUBSET
    SUBSTRING
    SUBSTRING_REGEX
    SUCCEEDS
    SUM
    SYMMETRIC
    SYSTEM
    SYSTEM_TIME
    SYSTEM_USER
    TABLE
    TABLESAMPLE
    TEMPORARY
    TEXT
    THEN
    TIME
    TIMESTAMP
    TIMEZONE_HOUR
    TIMEZONE_MINUTE
    TINYINT
    TO
    TRAILING
    TRANSLATE
    TRANSLATE_REGEX
    TRANSLATION
    TREAT
    TRIGGER
    TRIM
    TRIM_ARRAY
    TRUE
    TRUNCATE
    UESCAPE
    UNION
    UNIQUE
    UNKNOWN
    UNNEST
    UPDATE
    UPPER
    UPSERT
    USER
    USING
    VALUE
    VALUE_OF
    VALUES
    VARBINARY
    VARCHAR
    VAR_POP
    VAR_SAMP
    VARYING
    VERSIONING
    VIEW
    WHEN
    WHENEVER
    WHERE
    WIDTH_BUCKET
    WINDOW
    WITH
    WITHIN
    WITHOUT
    WORK
    YEAR
    When prompted, paste your license key in the text box and click Apply.
  • Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

  • Click SCATTER.
  • Click Add Data Source.

  • Choose the flights_2008_10k table as the data source.

  • Click X Axis +Add Measure.

  • Choose depdelay.

  • Click Y Axis +Add Measure.

  • Choose arrdelay.

  • Click Size +Add Measure.

  • Choose airtime.

  • Click Color +Add Measure.

  • Choose dest_state.

The resulting chart shows, unsurprisingly, that there is a correlation between departure delay and arrival delay.

GPU-rendered Scatter Plot

    Create a new dashboard and a Table chart to verify that Heavy Immerse is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click Bubble.

    4. Click Select Data Source.

5. Choose the flights_2008_10k table as the data source.

    6. Click Add Dimension.

    7. Choose carrier_name.

    8. Click Add Measure.

    9. Choose depdelay.

    10. Click Add Measure.

    11. Choose arrdelay.

    12. Click Add Measure.

    13. Choose #Records.

The resulting chart shows, unsurprisingly, that the average departure delay is also correlated with the average arrival delay, while there are notable differences between carriers.

    Docker Installation Guidearrow-up-right
    Install NVIDIA Drivers and Vulkan on Ubuntu
    https://releases.heavy.ai/ee/tar/arrow-up-right
    https://releases.heavy.ai/os/tar/arrow-up-right
    CUDA JIT Cachearrow-up-right
    https://help.ubuntu.com/lts/serverguide/firewall.htmlarrow-up-right
    ²
    herearrow-up-right
    herearrow-up-right
    ¹
    ¹
    Standard NVIDIA-SMI output shows the GPU visible in your container.
    hashtag
    Considerations when Upgrading from Omnisci to HEAVY.AI Platform

If you are upgrading from Omnisci to HEAVY.AI, there are many additional steps compared to a simple sub-version upgrade.

    hashtag
    Before Upgrading to Release 6.0

    circle-exclamation

IMPORTANT - Before you begin, stop all running services / Docker images of your Omnisci installation and create a backup of your $OMNISCI_STORAGE folder (typically /var/lib/omnisci). A backup is essential for recoverability; do not proceed with the upgrade without confirming that a full and consistent backup is available and ready to be restored.

The omnisci database is not automatically renamed to the new default name heavyai. This must be done manually, as documented in the upgrade steps.

    circle-exclamation

Dumps created with the DUMP command on Omnisci cannot be restored after the database is upgraded to this version.

    hashtag
    Essential Changes for release 6.0 of HEAVY.AI compared to Omnisci

    The following table describes the changes to environment variables, storage locations, and filenames in Release 6.0 compared to Release 5.x. Except where noted, revised storage subfolders, symlinks for old folder names, and filenames are created automatically on server start.

    Change descriptions in bold require user intervention.

    Description
    Omnisci 5.x
    HEAVY.AI 6.0

    Environmental variable for storage location

    $OMNISCI_STORAGE

    $HEAVYAI_BASE

    Default location for $HEAVYAI_BASE / $OMNISCI_STORAGE

    /var/lib/omnisci

    /var/lib/heavyai

    hashtag
    Upgrade Instructions

    circle-exclamation

The order of these instructions is significant. To avoid problems, follow the instructions in the order provided and do not skip any steps.

    hashtag
    Assumptions

This upgrade procedure assumes that you are using the default storage location for both Omnisci and HEAVY.AI.

    $OMNISCI_STORAGE

    $HEAVYAI_BASE

    /var/lib/omnisci

    /var/lib/heavyai

    hashtag
    Upgrading Using Docker

    Stop all containers running Omnisci services.

    In a terminal window, get the Docker container IDs:

    You should see an output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c:

    Stop the HEAVY.AI Docker container. For example:

    Back up the Omnisci data directory (typically /var/lib/omnisci).

    Rename the Omnisci data directory to reflect the HEAVY.AI naming scheme.

    Create a new configuration file for heavydb changing the data parameter to point to the renamed data directory.

    Rename the Omnisci license file (EE and FREE only).

    Download and run the 6.0 version of the HEAVY.AI Docker image.

    Select the tab depending on the Edition (Enterprise, Free, or Open Source) and execution Device (GPU or CPU) you are upgrading.

    Check that Docker is up and running using a docker ps command:

    You should see output similar to the following:

    Using the new container ID, rename the default omnisci database to heavyai:

    Check that everything is running as expected.

    hashtag
    Upgrading to HEAVY.AI Using Package Managers or Tarball

    Use the following commands to upgrade an existing system installed with package managers or a tarball. The commands upgrade HEAVY.AI in place without disturbing your configuration or stored data.

    hashtag
    Back up the Omnisci Database

    Stop the Omnisci services.

    Back up the Omnisci data directory (typically /var/lib/omnisci).

    Create a user named heavyai who will be the owner of the HEAVY.AI software and data on the filesystem.

    Set a password for the user; it is needed when running commands with sudo.

    Log in as the newly created user.

    Rename the Omnisci data directory to reflect the HEAVY.AI naming scheme and change the ownership to heavyai user.

    Create the "semaphore" catalogs directory; it will be removed later in the procedure.

    Check that everything is in order and that the "semaphore" directory has been created.

    All the directories must belong to the heavyai user, and the catalogs directory must be present.
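    This check can be scripted. A sketch using a scratch directory as a stand-in for /var/lib/heavyai/storage (against the real path you would compare ownership to the heavyai user, as noted in the comments):

```shell
# Scratch stand-in for /var/lib/heavyai/storage; in a real upgrade,
# point STORAGE at that path and compare ownership against the
# heavyai user instead of the current one.
STORAGE=$(mktemp -d)
mkdir -p "$STORAGE/catalogs" "$STORAGE/mapd_catalogs" "$STORAGE/mapd_data"

# The catalogs "semaphore" directory must exist before heavydb starts.
[ -d "$STORAGE/catalogs" ] && echo "catalogs directory present"

# List anything NOT owned by the expected user; empty output means
# ownership is correct. Against the real path this would be:
#   find /var/lib/heavyai ! -user heavyai
find "$STORAGE" ! -user "$(id -un)"
```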

    Rename the license file. (EE and FREE only)

    hashtag
    Install the HEAVY.AI Software

    Install the HEAVY.AI software following all the instructions for your operating system: CentOS/RHEL or Ubuntu.

    circle-exclamation

    Please follow all the installation and configuration steps until the Initialization step.

    hashtag
    Update the configuration file and rename the default database

    Log in with the heavyai user and ensure the heavyai services are stopped.

    Create a new configuration file for heavydb, changing the data parameter to point to the /var/lib/heavyai/storage directory and the frontend to the new install directory.

    All the settings of the upgraded database will be moved to the new configuration file.
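    The rewrite can be seen in isolation on a sample file. The sed expressions below are the same ones used in this procedure; the input and output here are scratch files, whereas in the real procedure the input is /var/lib/heavyai/omnisci.conf and the output is /var/lib/heavyai/heavy.conf:

```shell
# Sample of the old-style configuration (stand-in for
# /var/lib/heavyai/omnisci.conf).
OLD=$(mktemp)
NEW=$(mktemp)
cat > "$OLD" <<'EOF'
port = 6274
data = "/var/lib/omnisci/data"
null-div-by-zero = true

[web]
port = 6273
frontend = "/opt/omnisci/frontend"
EOF

# Comment out the old data/frontend entries and add the new
# HEAVY.AI paths right below them.
sed 's/^\(data.*=.*\)/#\1\ndata = "\/var\/lib\/heavyai\/storage"/' "$OLD" | \
sed 's/^\(frontend.*=.*\)/#\1\nfrontend = "\/opt\/heavyai\/frontend"/' \
> "$NEW"

cat "$NEW"
```

    Keeping the old entries commented out makes it easy to compare the two configurations if something does not start correctly.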

    Now we have to complete the database migration.

    Remove the "semaphore" directory created earlier. (This is an essential step for the Omnisci-to-heavydb upgrade.)

    To complete the upgrade, start the HEAVY.AI servers.

    Check that the database migrated by running the following command and looking for the Rebrand migration complete message.
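    The check amounts to searching the server INFO log for that message. A sketch using a stand-in log file (in a real installation, the log is heavydb.INFO under /var/lib/heavyai/storage/log, and the line below only mimics the message of interest):

```shell
# Stand-in log file created only for this sketch; in a real
# installation, grep /var/lib/heavyai/storage/log/heavydb.INFO instead.
LOG=$(mktemp)
echo "Rebrand migration complete" > "$LOG"

if grep -q "Rebrand migration complete" "$LOG"; then
  echo "rebrand migration finished"
else
  echo "message not found - inspect the heavydb service status"
fi
```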

    Rename the default omnisci database to heavyai. Run the command as an administrative user (typically admin) with its password (default HyperInteractive).

    Restart the database service and check that everything is running as expected.

    hashtag
    Remove Omnisci Software from the System

    After all checks confirm that the upgraded system is stable, clean up the system by removing the Omnisci installation and related system configuration. Permanently remove the service configuration.

    Remove the installed software.

    Delete the YUM or APT repositories.


    SHOW

    Use SHOW commands to get information about databases, tables, and user sessions.

    hashtag
    SHOW CREATE SERVER

    Shows the CREATE SERVER statement that could have been used to create the server.

    hashtag
    Syntax

    hashtag
    Example

    hashtag
    SHOW CREATE TABLE

    Shows the CREATE TABLE statement that could have been used to create the table.

    hashtag
    Syntax

    hashtag
    Example

    hashtag
    SHOW DATABASES

    Retrieve the databases accessible for the current user, showing the database name and owner.

    hashtag
    Example

    hashtag
    SHOW FUNCTIONS

    Show registered compile-time UDFs and extension functions in the system and their arguments.

    hashtag
    Syntax

    hashtag
    Example

    hashtag
    SHOW POLICIES

    Displays a list of all row-level security (RLS) policies that exist for a user or role; admin rights are required. If EFFECTIVE is used, the list also includes any policies that exist for all roles that apply to the requested user or role.
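    A hypothetical example, assuming a role named analyst_role exists:

```
SHOW POLICIES analyst_role;
SHOW EFFECTIVE POLICIES analyst_role;
```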

    hashtag
    Syntax

    hashtag
    SHOW QUERIES

    Returns a list of queued queries in the system; information includes session ID, status, query string, account login name, client address, database name, and device type (CPU or GPU).

    hashtag
    Example

    Admin users can see and interrupt all queries; non-admin users can see and interrupt only their own queries.

    NOTE: SHOW QUERIES is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.

    To interrupt a query in the queue, see KILL QUERY.

    hashtag
    SHOW ROLES

    If included with a name, lists the roles granted directly to a user or role. SHOW EFFECTIVE ROLES with a name lists the roles directly granted to a user or role, and also lists the roles indirectly inherited through the directly granted roles.

    hashtag
    Syntax

    If the user name or role name is omitted, then a regular user sees their own roles, and a superuser sees a list of all roles existing in the system.

    hashtag
    SHOW RUNTIME FUNCTIONS

    Show user-defined runtime functions and table functions.

    hashtag
    Syntax

    hashtag
    SHOW SUPPORTED DATA SOURCES

    Show data connectors.

    hashtag
    Syntax

    hashtag
    SHOW TABLE DETAILS

    Displays storage-related information for a table, such as the table ID/name, number of data/metadata files used by the table, total size of data/metadata files, and table epoch values.

    You can see table details for all tables that you have access to in the current database, or for only those tables you specify.

    hashtag
    Syntax

    hashtag
    Examples

    Show details for all tables you have access to:

    Show details for table omnisci_states:

    circle-info

    The number of columns returned includes system columns. As a result, the number of columns in column_count can be up to two greater than the number of columns created by the user.

    hashtag
    SHOW TABLE FUNCTIONS

    Displays the list of available system (built-in) table functions.

    For more information, see System Table Functions.

    hashtag
    SHOW TABLE FUNCTIONS DETAILS

    Show detailed output information for the specified table function. Output details vary depending on the table function specified.

    hashtag
    Syntax

    hashtag
    Example - generate_series

    View SHOW output for the generate_series table function:

    Output Header
    Output Details

    hashtag
    SHOW SERVERS

    Retrieve the servers accessible for the current user.

    hashtag
    Example

    hashtag
    SHOW TABLES

    Retrieve the tables accessible for the current user.

    hashtag
    Example

    hashtag
    SHOW USER DETAILS

    Lists name, ID, and default database for all or specified users for the current database. If the command is issued by a superuser, login permission status is also shown. Only superusers see users who do not have permission to log in.

    hashtag
    Example

    SHOW [ALL] USER DETAILS lists name, ID, superuser status, default database, and login permission status for all users across the HeavyDB instance. This variant of the command is available only to superusers. Regular users who run the SHOW ALL USER DETAILS command receive an error message.

    hashtag
    Superuser Output

    Show all user details for all users:

    Show all user details for specified users ue, ud, ua, and uf:

    If a specified user is not found, the superuser sees an error message:

    Show user details for specified users ue, ud, and uf:

    Show user details for all users:

    hashtag
    Non-Superuser Output

    Running SHOW ALL USER DETAILS results in an error message:

    Show user details for all users:

    If a specified user is not found, the user sees an error message:

    Show user details for user ua:

    hashtag
    SHOW USER SESSIONS

    Retrieve all persisted user sessions, showing the session ID, user login name, client address, and database name. Admin or superuser privileges required.

    hashtag
    KILL QUERY

    Interrupt a queued query. Specify the query by using its session ID.

    To see the queries in the queue, use the command:

    To interrupt the last query in the list (ID 946-ooNP):

    Showing the queries again indicates that 946-ooNP has been deleted:

    circle-info
    • KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.

    • Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.

    sudo apt-get purge nvidia-docker
    for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
    sudo apt-get install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    sudo usermod  --append --groups docker $USER
    sudo docker run hello-world
    curl --silent --location https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
    sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl --silent --location https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
    sudo apt-get update
    sudo apt-get install -y nvidia-container-runtime
    {
      "default-runtime": "nvidia",
      "runtimes": {
         "nvidia": {
             "path": "/usr/bin/nvidia-container-runtime",
             "runtimeArgs": []
         }
     }
    }
    sudo pkill -SIGHUP dockerd
    sudo docker run --gpus=all \
    --rm nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi
    sudo mkdir -p /var/lib/heavyai && sudo chown $USER /var/lib/heavyai
    echo "port = 6274
    http-port = 6278
    calcite-port = 6279
    data = \"/var/lib/heavyai\"
    null-div-by-zero = true
    
    [web]
    port = 6273
    frontend = \"/opt/heavyai/frontend\"" \
    >/var/lib/heavyai/heavy.conf
    if test -d /var/lib/heavyai; then echo "There is $(df -kh /var/lib/heavyai --output="avail" | sed 1d) available space in your storage dir"; else echo "There was a problem with the creation of the storage dir";  fi;
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cuda:latest
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cpu:latest
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cuda:latest
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cpu:latest
    sudo docker container ps --format "{{.Image}} {{.Status}}" \
    -f status=running | grep heavyai\/
    heavyai/heavyai-ee-cuda Up 48 seconds ago 
    sudo apt install ufw
    sudo ufw allow ssh
    sudo ufw disable
    sudo ufw allow 6273:6278/tcp
    sudo ufw enable
    sudo docker container ps
    CONTAINER ID        IMAGE                     COMMAND                     CREATED             STATUS              PORTS                                            NAMES
    9e01e520c30c        heavyai/heavyai-ee-gpu    "/bin/sh -c '/heavyai..."   50 seconds ago      Up 48 seconds ago   0.0.0.0:6273-6280->6273-6280/tcp                 confident_neumann
    sudo docker exec -it 9e01e520c30c bash
    sudo docker exec -it <container-id> \
    ./insert_sample_data --data /var/lib/heavyai/storage
    Enter dataset number to download, or 'q' to quit:
    #     Dataset                   Rows    Table Name             File Name
    1)    Flights (2008)            7M      flights_2008_7M        flights_2008_7M.tar.gz
    2)    Flights (2008)            10k     flights_2008_10k       flights_2008_10k.tar.gz
    3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
    sudo docker exec -it <container-id> bin/heavysql 
    SELECT origin_city AS "Origin", 
    dest_city AS "Destination", 
    ROUND(AVG(airtime),1) AS "Average Airtime" 
    FROM flights_2008_10k 
    WHERE distance < 175 GROUP BY origin_city,
    dest_city;
    Origin|Destination|Average Airtime
    West Palm Beach|Tampa|33.8
    Norfolk|Baltimore|36.1
    Ft. Myers|Orlando|28.7
    Indianapolis|Chicago|39.5
    Tampa|West Palm Beach|33.3
    Orlando|Ft. Myers|32.6
    Austin|Houston|33.1
    Chicago|Indianapolis|32.7
    Baltimore|Norfolk|31.7
    Houston|Austin|29.6
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cuda:v6.0.0
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cpu:v6.0.0
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cuda:v6.0.0
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cpu:v6.0.0
    sudo useradd --shell /bin/bash --user-group --create-home --groups wheel heavyai
    sudo useradd --shell /bin/bash --user-group --create-home --groups sudo heavyai
    sudo rm /etc/yum.repos.d/omnisci.repo
    sudo rm /etc/apt/sources.list.d/omnisci.list
    sudo docker container ps --format "{{.Id}} {{.Image}}" \
    -f status=running | grep omnisci\/
    9e01e520c30c omnisci/omnisci-ee-gpu
    sudo docker container stop 9e01e520c3
    tar zcvf /backup_dir/omnisci_storage_backup.tar.gz /var/lib/omnisci
    sudo mv /var/lib/omnisci /var/lib/heavyai
    sudo mv /var/lib/heavyai/data /var/lib/heavyai/storage
    cat /var/lib/heavyai/omnisci.conf | \
    sed "s/^\(data.*=.*\)/#\1\\ndata = \"\/var\/lib\/heavyai\/storage\"/" | \
    sed "s/^\(frontend.*=.*\)/#\1\\nfrontend = \"\/opt\/heavyai\/frontend\"/" \
    >/var/lib/heavyai/heavy.conf
    mv /var/lib/heavyai/storage/omnisci.license \
    /var/lib/heavyai/storage/heavyai.license
    sudo docker container ps --format "{{.Id}} {{.Image}} {{.Status}}" \
    -f status=running | grep heavyai\/
    9e01e520c30c heavyai/heavyai-ee-cuda Up 48 seconds ago 
    sudo docker exec -i 9e01e520c30c \
    bash -c 'echo "alter database omnisci rename to heavyai;" | bin/heavysql omnisci'
    sudo systemctl stop omnisci_web_server omnisci_server
    tar zcvf /backup_dir/omnisci_storage_backup.tar.gz /var/lib/omnisci
    sudo passwd heavyai
    sudo su - heavyai
    sudo chown -R heavyai:heavyai /var/lib/omnisci
    sudo mv /var/lib/omnisci /var/lib/heavyai
    mv /var/lib/heavyai/data /var/lib/heavyai/storage
    mkdir /var/lib/heavyai/storage/catalogs
    ls -la /var/lib/heavyai/storage/
    total 32
    drwxr-xr-x  8 heavyai heavyai 4096 lug 15 16:03 .
    drwxr-xr-x  4 heavyai heavyai 4096 lug 15 16:02 ..
    drwxrwxr-x  2 heavyai heavyai 4096 lug 15 16:03 catalogs
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 mapd_catalogs
    drwxr-xr-x 52 heavyai heavyai 4096 lug 15 15:54 mapd_data
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 mapd_export
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 mapd_log
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 omnisci_disk_cache
    -rw-r--r--  1 heavyai heavyai 1229 lug 15 16:07 omnisci.license
    mv /var/lib/heavyai/storage/omnisci.license \
    /var/lib/heavyai/storage/heavyai.license
    sudo systemctl stop heavy_web_server heavydb
    cat /var/lib/heavyai/omnisci.conf | \
    sed "s/^\(data.*=.*\)/#\1\\ndata = \"\/var\/lib\/heavyai\/storage\"/" | \
    sed "s/^\(frontend.*=.*\)/#\1\\nfrontend = \"\/opt\/heavyai\/frontend\"/" \
    >/var/lib/heavyai/heavy.conf
    rmdir /var/lib/heavyai/storage/catalogs
    sudo systemctl start heavydb heavy_web_server
    sudo systemctl status heavydb
    echo "alter database omnisci rename to heavyai;" \
    | /opt/heavyai/bin/heavysql -p HyperInteractive -u admin omnisci 
    sudo rm /lib/systemd/system/omnisci_server*.service
    sudo rm /lib/systemd/system/omnisci_web_server*.service
    sudo systemctl daemon-reload
    sudo systemctl reset-failed
    sudo rm -Rf /opt/omnisci

    Fixed location for Docker $HEAVYAI_BASE / $OMNISCI_STORAGE

    /omnisci-storage

    /var/lib/heavyai

    The folder containing catalogs for $HEAVYAI_BASE / $OMNISCI_STORAGE

    data/

    storage/

    Storage subfolder - data

    data/mapd_data

    storage/data

    Storage subfolder - catalog

    data/mapd_catalogs

    storage/catalogs

    Storage subfolder - import

    data/mapd_import

    storage/import

    Storage subfolder - export

    data/mapd_export

    storage/export

    Storage subfolder - logs

    data/mapd_log

    storage/log

    Server INFO logs

    omnisci_server.INFO

    heavydb.INFO

    Server ERROR logs

    omnisci_server.ERROR

    heavydb.ERROR

    Server WARNING logs

    omnisci_server.WARNING

    heavydb.WARNING

    Web Server ACCESS logs

    omnisci_web_server.ACCESS

    heavy_web_server.ACCESS

    Web Server ALL logs

    omnisci_web_server.ALL

    heavy_web_server.ALL

    Install directory

    /omnisci (Docker) /opt/omnisci (bare metal)

    /opt/heavyai/ (Docker and bare metal)

    Binary file - core server (located in install directory)

    bin/omnisci_server

    bin/heavydb

    Binary file - web server (located in install directory)

    bin/omnisci_web_server

    bin/heavy_web_server

    Binary file - command-line SQL utility

    bin/omnisql

    bin/heavysql

    Binary file - JDBC jar

    bin/omnisci-jdbc-5.10.2-SNAPSHOT.jar

    bin/heavydb-jdbc-6.0.0-SNAPSHOT.jar

    Binary file - Utilities (SqlImporter) jar

    bin/omnisci-utility-5.10.2-SNAPSHOT.jar

    bin/heavydb-utility-6.0.0-SNAPSHOT.jar

    HEAVY.AI Server service (for bare metal install)

    omnisci_server

    heavydb

    HEAVY.AI Web Server service (for bare metal install)

    omnisci_web_server

    heavy_web_server

    Default configuration file

    omnisci.conf

    heavy.conf

    name

    generate_series

    signature

    (i64 series_start, i64 series_stop, i64 series_step) -> Column<i64>
    (i64 series_start, i64 series_stop) -> Column<i64>

    input_names

    series_start, series_stop, series_step
    series_start, series_stop

    input_types

    i64

    output_names

    generate_series

    output_types

    Column<i64>

    CPU

    true

    GPU

    true

    runtime

    false

    filter_table_transpose

    false

    To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt to TRUE. (It is enabled by default.)

    SHOW CREATE SERVER <servername>
    SHOW CREATE SERVER default_local_delimited;
    create_server_sql
    CREATE SERVER default_local_delimited FOREIGN DATA WRAPPER DELIMITED_FILE
    WITH (STORAGE_TYPE='LOCAL_FILE');
    SHOW CREATE TABLE <tablename>
    SHOW CREATE TABLE heavyai_states;
    CREATE TABLE heavyai_states (
     id TEXT ENCODING DICT(32),
     abbr TEXT ENCODING DICT(32),
     name TEXT ENCODING DICT(32),
     omnisci_geo GEOMETRY(MULTIPOLYGON, 4326) NOT NULL);
    SHOW DATABASES
    Database         Owner
    omnisci          admin
    2004_zipcodes    admin
    game_results     jane
    signals          jason
    ...
    SHOW FUNCTIONS [DETAILS]
    SHOW FUNCTIONS
    Scalar UDF
    distance_point_line
    ST_DWithin_Polygon_Polygon
    ST_Distance_Point_ClosedLineString
    Truncate
    ct_device_selection_udf_any
    area_triangle
    _h3RotatePent60cw
    ST_Intersects_Polygon_Point
    ST_DWithin_LineString_Polygon
    ST_Intersects_Point_Polygon
    box_contains_box
    SHOW [EFFECTIVE] POLICIES <name>;
    show queries;
    query_session_id|current_status|submitted          |query_str                                                   |login_name|client_address     |db_name   |exec_device_type
    834-8VAA        |Pending       |2020-05-06 08:21:15|select d_date_sk, count(1) from date_dim group by d_date_sk;|admin     |tcp:localhost:48596|tpcds_sf10|CPU
    826-CLKk        |Running       |2020-05-06 08:20:57|select count(1) from store_sales, store_returns;            |admin     |tcp:localhost:48592|tpcds_sf10|CPU
    828-V6s7        |Pending       |2020-05-06 08:21:13|select count(1) from store_sales;                           |admin     |tcp:localhost:48594|tpcds_sf10|GPU
    946-rtJ7        |Pending       |2020-05-06 08:20:58|select count(1) from item;                                  |admin     |tcp:localhost:48610|tpcds_sf10|GPU
    SHOW [EFFECTIVE] ROLES <name>
    SHOW RUNTIME [TABLE] FUNCTIONS
    SHOW RUNTIME [TABLE] FUNCTION DETAILS
    show supported data sources
    SHOW TABLE DETAILS [<table-name>, <table-name>, ...]
    heavysql> show table details;
    table_id|table_name       |column_count|is_sharded_table|shard_count|max_rows           |fragment_size|max_rollback_epochs|min_epoch|max_epoch|min_epoch_floor|max_epoch_floor|metadata_file_count|total_metadata_file_size|total_metadata_page_count|total_free_metadata_page_count|data_file_count|total_data_file_size|total_data_page_count|total_free_data_page_count
    1       |heavyai_states   |11          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |4082                          |1              |536870912           |256                  |242
    2       |heavyai_counties |13          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |NULL                          |1              |536870912           |256                  |NULL
    3       |heavyai_countries|71          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |4022                          |1              |536870912           |256                  |182
    heavysql> show table details heavyai_states;
    table_id|table_name    |column_count|is_sharded_table|shard_count|max_rows           |fragment_size|max_rollback_epochs|min_epoch|max_epoch|min_epoch_floor|max_epoch_floor|metadata_file_count|total_metadata_file_size|total_metadata_page_count|total_free_metadata_page_count|data_file_count|total_data_file_size|total_data_page_count|total_free_data_page_count
    1       |heavyai_states|11          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |4082                          |1              |536870912           |256                  |242
    SHOW TABLE FUNCTIONS;
    tf_compute_dwell_times
    tf_feature_self_similarity
    tf_feature_similarity
    tf_rf_prop
    tf_rf_prop_max_signal
    tf_geo_rasterize_slope
    tf_geo_rasterize
    generate_random_strings
    generate_series
    tf_mandelbrot_cuda_float
    tf_mandelbrot_cuda
    tf_mandelbrot_float
    tf_mandelbrot
    SHOW TABLE FUNCTIONS DETAILS <function_name>
    SHOW SERVERS;
    server_name|data_wrapper|created_at|options
    default_local_delimited|DELIMITED_FILE|2022-03-15 10:06:05|{"STORAGE_TYPE":"LOCAL_FILE"}
    default_local_parquet|PARQUET_FILE|2022-03-15 10:06:05|{"STORAGE_TYPE":"LOCAL_FILE"}
    default_local_regex_parsed|REGEX_PARSED_FILE|2022-03-15 10:06:05|{"STORAGE_TYPE":"LOCAL_FILE"}
    ...
    SHOW TABLES;
    table_name
    ----------
    omnisci_states
    omnisci_counties
    omnisci_countries
    streets_nyc
    streets_miami
    ...
    SHOW USER DETAILS
    NAME            ID         DEFAULT_DB 
    mike.nuumann    191        mondale
    Dale            184        churchill
    Editor_Test     141        mondale
    Jerry.wong      181        alluvial
    AA_superuser    139        
    BB_superuser    2140
    PlinyTheElder   183        windsor
    aaron.tyre      241        db1
    achristie       243        sid
    eve.mandela     202        nancy
    ...
    heavysql> show all user details;
    NAME|ID|IS_SUPER|DEFAULT_DB|CAN_LOGIN
    admin|0|true|(-1)|true
    ua|2|false|db1(2)|true
    ub|3|false|db1(2)|true
    uc|4|false|db1(2)|false
    ud|5|false|db2(3)|true
    ue|6|false|db2(3)|true
    uf|7|false|db2(3)|false
    heavysql> \db db2
    User admin switched to database db2
    
    heavysql> show all user details ue, ud, uf, ua;
    NAME|ID|IS_SUPER|DEFAULT_DB|CAN_LOGIN
    ua|2|false|db1(2)|true
    ud|5|false|db2(3)|true
    ue|6|false|db2(3)|true
    uf|7|false|db2(3)|false
    heavysql> show user details ue, ud, uf, ua;
    User "ua" not found. 
    heavysql> show user details ue, ud, uf;
    NAME|ID|DEFAULT_DB|CAN_LOGIN
    ud|5|db2(3)|true
    ue|6|db2(3)|true
    uf|7|db2(3)|false
    heavysql> show user details;
    NAME|ID|DEFAULT_DB|CAN_LOGIN
    ud|5|db2(3)|true
    ue|6|db2(3)|true
    uf|7|db2(3)|false
    heavysql> \db
    User ua is using database db1
    heavysql> show all user details;
    SHOW ALL USER DETAILS is only available to superusers. (Try SHOW USER DETAILS instead?)
    heavysql> show user details;
    NAME|ID|DEFAULT_DB
    ua|2|db1
    ub|3|db1
    heavysql> show user details ua, ub, uc;
    User "uc" not found.
    heavysql> show user details ua;
    NAME|ID|DEFAULT_DB
    ua|2|db1
    SHOW USER SESSIONS;
    session_id   login_name   client_address         db_name
    453-X6ds     mike         http:198.51.100.1      game_results
    453-0t2r     erin         http:198.51.100.11     game_results
    421-B64s     shauna       http:198.51.100.43     game_results
    213-06dw     ahmed        http:198.51.100.12     signals
    333-R28d     cat          http:198.51.100.233    signals
    497-Xyz6     inez         http:198.51.100.5      ships
    ...
    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU
    946-ooNP        |RUNNING_IMPORTER    |0          |2021-08-03 ...|IMPORT_GEO_TABLE|Rio       |tcp:::ffff:127.0.0.1:47314|omnisci|CPU
    kill query '946-ooNP'
    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU

    Using HeavyImmerse Data Manager

    HeavyImmerse supports file upload for .csv, .tsv, and .txt files, and supports comma, tab, and pipe delimiters.

    HeavyImmerse also supports upload of compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

    You can import data to HeavyDB using the Immerse import wizard. You can upload data from a local delimited file, from an Amazon S3 data source, or from the Data Catalog.

    For methods specific to geospatial data, see also Importing Geospatial Data Using Immerse.

    circle-info
    • If there is a potential for duplicate entries, and you prefer to avoid loading duplicate rows, see .

    • If a source file uses a reserved word, HEAVY.AI automatically adds an underscore at the end of the reserved word. For example, year is converted to year_.

    • If you click the Back button (or accidentally two-finger swipe your mousepad) before your data load is complete, HeavyDB stops the data load and invalidates any records that had transferred.

    hashtag
    Importing Non-Geospatial Data from a Local File

    Follow these steps to import your data:

    1. Click DATA MANAGER.

    2. Click Import Data.

    3. Click Import data from a local file.

    You can also import locally stored shape files in a variety of formats. See .

    hashtag
    Importing Data from Amazon S3

    To import data from your Amazon S3 instance, you need:

    • The Region and Path for the file in your S3 bucket, or the direct URL to the file (S3 Link).

    • If importing private data, your Access Key and Secret Key for your personal IAM account in S3.

    hashtag
    Locating the Data File S3 Region, Path, and URL

    For information on opening and reviewing items in your S3 instance, see

    In an S3 bucket, the Region is in the upper-right corner of the screen – US West (N. California) in this case:

    Click the file you want to import. To load your S3 file to HEAVY.AI using the steps for S3 Region | Bucket | Path, below, click Copy path to copy to your clipboard the path to your file within your S3 bucket. Alternatively, you can copy the link to your file. The Link in this example is https://s3-us-west-1.amazonaws.com/my-company-bucket/trip_data.7z.

    hashtag
    Obtaining Your S3 Access Key and Secret Key

    To learn about creating your S3 Access Key and Secret Key, see

    If the data you want to copy is publicly available, you do not need to provide an Access Key and Secret Key.

    You can import any file you can see using your IAM account with your Access Key and Secret Key.

    Your Secret Key is created with your Access Key, and cannot be retrieved afterward. If you lose your Secret Key, you must create a new Access Key and Secret Key.

    hashtag
    Loading Your S3 Data to HEAVY.AI

    Follow these steps to import your S3 data:

    1. Click DATA MANAGER.

    2. Click Import Data.

    3. Click Import data from Amazon S3.

    Importing from the Data Catalog

    The Data Catalog provides access to sample datasets you can use to exercise data visualization features in Heavy Immerse. The selection of datasets continually changes, independent of product releases.

    To import from the data catalog:

    1. Open the Data Manager.

    2. Click Data Catalog.

3. Use the Search box to locate a specific dataset, or scroll to find the dataset you want to use. The Contains Geo toggle filters for datasets that contain geographical information.

    Appending Data to a Table

    You can append additional data to an existing table.

    To append data to a table:

    1. Open Data Manager.

    2. Select the table you want to append.

    3. Click Append Data.

To append data from AWS, click Append Data, then follow the instructions for Loading S3 Data to HEAVY.AI.

    Truncating a Table

    Sometimes you might want to remove or replace the data in a table without losing the table definition itself.

    To remove all data from a table:

    1. Open Data Manager.

    2. Select the table you want to truncate.

    3. Click Delete All Rows.

    Deleting a Table

    You can drop a table entirely using Data Manager.

    To delete a table:

    1. Open Data Manager.

    2. Select the table you want to delete.

    3. Click DELETE TABLE.


    Either click the plus sign (+) or drag your file(s) for upload. If you are uploading multiple files, the column names and data types must match. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

  • Choose Import Settings:

• Null string: If, instead of using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.

    • Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.

    • Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.

    • Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.

• Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.

  • Click Import Files.

  • The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. Immerse defaults to second precision for all timestamp columns. You can reset the precision to second, millisecond, nanosecond, or microsecond. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.

  • Name the table, and click Save Table.
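
The header-row inference described in the import settings above can be sketched as follows. This is a simplified illustration, not HEAVY.AI's actual importer logic; `infer_header` and `looks_numeric` are hypothetical helpers.

```python
def looks_numeric(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def infer_header(rows):
    # If the first row is all non-numeric strings but later rows contain
    # numeric values, treat row 0 as a header row.
    first, rest = rows[0], rows[1:]
    if any(looks_numeric(cell) for cell in first):
        return False
    return any(looks_numeric(cell) for row in rest for cell in row)

print(infer_header([["name", "fare"], ["cab_1", "12.50"], ["cab_2", "8.00"]]))  # → True
print(infer_header([["1.5", "2"], ["3", "4"]]))  # → False
```

When every row is purely textual the guess is ambiguous, which is why Immerse lets you override the inference manually.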

  • Choose whether to import using the S3 Region | Bucket | Path or a direct full link URL to the file (S3 Link).

    1. To import data using S3 Region | Bucket | Path:

      1. Select your Region from the pop-up menu.

      2. Enter the unique name of your S3 Bucket.

      3. Enter or paste the Path to the file stored in your S3 bucket.

    2. To import data using S3 link:

      1. Copy the Link URL from the file Overview in your S3 bucket.

      2. Paste the link in the Full Link URL field of the HEAVY.AI Table Importer.

  • If the data is publicly available, you can disable the Private Data checkbox. If you are importing Private Data, enter your credentials:

    1. Enable the Private Data checkbox.

    2. Enter your S3 Access Key.

    3. Enter your S3 Secret Key.

  • Choose the appropriate Import Settings. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV.

    1. Null string: If you have substituted a string such as NULL for null values in your upload document, enter that string in the Null String field. The values are treated as null values on upload.

    2. Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma or pipe.

    3. Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.

    4. Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.

  • Click Import Files.

  • The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.

  • Name the table, and click Save Table.
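
The Null String import setting behaves roughly like the sketch below. This is illustrative only; `load_with_null_string` is a hypothetical helper, not HEAVY.AI code.

```python
import csv
import io

def load_with_null_string(text, null_string="NULL", delimiter=","):
    # Cells equal to the configured null string are loaded as nulls.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[None if cell == null_string else cell for cell in row]
            for row in reader]

sample = "city,population\nBerlin,3600000\nAtlantis,NULL\n"
print(load_with_null_string(sample))
# → [['city', 'population'], ['Berlin', '3600000'], ['Atlantis', None]]
```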

  • Click the Import button beneath the dataset you want to use.

  • Verify the table and column names in the Data Preview screen.

  • Click Import Data.

  • Click Import data from a local file.
  • Either click the plus sign (+) or drag your file(s) for upload. The column names and data types of the files you select must match the existing table. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

  • Click Preview.

  • Click Import Settings

  • Choose Import Settings:

• Null string: If, instead of using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.

    • Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.

    • Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.

    • Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.

• Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.

  • Close Import Settings.

  • The Data Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance.

    If your data contains column headers, verify they match the existing headers.

  • Click Import Data.

  • A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE ROWS.

    Immerse displays the table information with a row count of 0.

    A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE.

    Immerse deletes the table and returns you to the Data Manager TABLES list.

How can I avoid creating duplicate rows?
Importing Geospatial Data Using Immerse
https://docs.aws.amazon.com/AmazonS3/latest/gsg/OpeningAnObject.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey
Loading S3 Data to HEAVY.AI

    Datatypes

    Datatypes and Fixed Encoding

    This topic describes standard datatypes and space-saving variations for values stored in HEAVY.AI.

    Datatypes

Each HEAVY.AI datatype uses space in memory and on disk. For certain datatypes, you can use fixed encoding for a more compact representation of these values. You can set a default value for a column by using the DEFAULT constraint; for more information, see CREATE TABLE.

    Datatypes, variations, and sizes are described in the following table.
    Datatype
    Size (bytes)
    Notes

    BIGINT

    8

    Minimum value: -9,223,372,036,854,775,807; maximum value: 9,223,372,036,854,775,807.

    BIGINT ENCODING FIXED(8)

    1

    Minimum value: -127; maximum value: 127

    [1] - In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING DAYS(16)).

    [2] - See Storage and Compression below for information about geospatial datatype sizes.

    • HEAVY.AI does not support geometry arrays.

• Timestamp values are always stored in 8 bytes. The greater the sub-second precision, the smaller the range of dates that can be represented.

    Geospatial Datatypes

    HEAVY.AI supports the LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON, POINT, and MULTIPOINT geospatial datatypes.

    In the following example:

    • p0, p1, ls0, and poly0 are simple (planar) geometries.

• p4 is a point geometry with Web Mercator (SRID 900913) coordinates.

    • p2, p3, mp, ls1, ls2, mls1, mls2, poly1, and mpoly0 are geometries using WGS84 SRID=4326 longitude/latitude coordinates.

    Storage

    Geometry storage requirements are largely dependent on coordinate data. Coordinates are normally stored as 8-byte doubles, two coordinates per point, for all points that form a geometry. Each POINT geometry in the p1 column, for example, requires 16 bytes.

    Compression

    WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This sacrifices some precision but reduces storage requirements by half.

    For example, columns p2, mp, ls1, mls1, poly1, and mpoly0 in the table defined above are compressed. Each geometry in the p2 column requires 8 bytes, compared to 16 bytes for p0.

You can explicitly disable compression. WGS84 columns p3, ls2, and mls2 are not compressed and continue using 8-byte doubles. Simple (planar) columns p0, p1, ls0, and poly0, and the non-4326 column p4, are never compressed.
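
The storage arithmetic above can be made concrete with a small sketch based on the sizes stated in this topic; `coord_bytes` is a hypothetical helper.

```python
def coord_bytes(num_points, compressed):
    # Two coordinates per point: 8-byte doubles normally, 4-byte values
    # when SRID 4326 COMPRESSED(32) encoding applies.
    per_coord = 4 if compressed else 8
    return num_points * 2 * per_coord

print(coord_bytes(1, compressed=False))  # uncompressed POINT (like p0) → 16
print(coord_bytes(1, compressed=True))   # compressed 4326 POINT (like p2) → 8
print(coord_bytes(3, compressed=True))   # 3-vertex compressed LINESTRING → 24
```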

    For more information about geospatial datatypes and functions, see Geospatial Capabilities.

    Defining Arrays

    Define datatype arrays by appending square brackets, as shown in the arrayexamples DDL sample.

    You can also define fixed-length arrays. For example:

    Fixed-length arrays require less storage space than variable-length arrays.

    Fixed Encoding

    To use fixed-length fields, the range of the data must fit into the constraints as described. Understanding your schema and the scope of potential values in each field helps you to apply fixed encoding types and save significant storage space.

    These encodings are most effective on low-cardinality TEXT fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07. If a TEXT ENCODING field does not match the defined cardinality, HEAVY.AI substitutes a NULL value and logs the change.

    For DATE types, you can use the terms FIXED and DAYS interchangeably. Both are synonymous for the DATE type in HEAVY.AI.

Some of the INTEGER options overlap. For example, INTEGER ENCODING FIXED(8) and SMALLINT ENCODING FIXED(8) are essentially identical.
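
The TIMESTAMP range quoted above (1901-12-13 20:45:53 to 2038-01-19 03:14:07) can be checked with plain date arithmetic. The stated minimum is one second above the signed 32-bit floor, which is consistent with the fixed encodings apparently reserving the lowest value (as in the -127 to 127 integer ranges).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# TIMESTAMP ENCODING FIXED(32) stores seconds as a 32-bit value around
# the Unix epoch; the lowest 32-bit value appears to be reserved.
lo = EPOCH + timedelta(seconds=-(2**31 - 1))
hi = EPOCH + timedelta(seconds=2**31 - 1)
print(lo)  # → 1901-12-13 20:45:53+00:00
print(hi)  # → 2038-01-19 03:14:07+00:00
```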

    Shared Dictionaries

    You can improve performance of string operations and optimize storage using shared dictionaries. You can share dictionaries within a table or between different tables in the same database. The table with which you want to share dictionaries must exist when you create the table that references the TEXT ENCODING DICT field, and the column that you are referencing in that table must also exist. The following small DDL shows the basic structure:

    In the table definition, make sure that referenced columns appear before the referencing columns.

    For example, this DDL is a portion of the schema for the flights database. Because airports are both origin and destination locations, it makes sense to reuse the same dictionaries for name, city, state, and country values.

    To share a dictionary in a different existing table, replace the table name in the REFERENCES instruction. For example, if you have an existing table called us_geography, you can share the dictionary by following the pattern in the DDL fragment below.


    The referencing column cannot specify the encoding of the dictionary, because it uses the encoding from the referenced column.
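
The idea behind dictionary encoding and shared dictionaries can be sketched as a toy example. This is conceptual only, not HEAVY.AI's implementation; `SharedDict` is a hypothetical class.

```python
class SharedDict:
    """Toy string dictionary: each distinct string is stored once, and the
    columns store small integer IDs instead of repeated strings."""
    def __init__(self):
        self.ids = {}
        self.strings = []

    def encode(self, s):
        if s not in self.ids:
            self.ids[s] = len(self.strings)
            self.strings.append(s)
        return self.ids[s]

d = SharedDict()
origin_city = [d.encode(s) for s in ["San Jose", "Austin", "San Jose"]]
dest_city = [d.encode(s) for s in ["Austin", "Denver"]]  # same dictionary reused
print(origin_city)  # → [0, 1, 0]
print(dest_city)    # → [1, 2]
print(d.strings)    # → ['San Jose', 'Austin', 'Denver']
```

Because both columns reference one dictionary, each distinct city name is stored only once, and equality comparisons between the columns reduce to integer comparisons.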

    System Tables

    HeavyDB system tables provide a way to access information about database objects, database object permissions, and system resource (storage, CPU, and GPU memory) utilization. These system tables can be found in the information_schema database that is available by default on server startup. You can query system tables in the same way as regular tables, and you can use the SHOW CREATE TABLE command to view the table schemas.

Users

    CREATE TABLE geo ( name TEXT ENCODING DICT(32),
                       p0 POINT,
                       p1 GEOMETRY(POINT),
                       p2 GEOMETRY(POINT, 4326),
                       p3 GEOMETRY(POINT, 4326) ENCODING NONE,
                       p4 GEOMETRY(POINT, 900913),
                       mp GEOMETRY(MULTIPOINT, 4326),
                       ls0  LINESTRING,
                       ls1 GEOMETRY(LINESTRING, 4326) ENCODING COMPRESSED(32),
                       ls2 GEOMETRY(LINESTRING, 4326) ENCODING NONE,
                       mls1 GEOMETRY(MULTILINESTRING, 4326) ENCODING COMPRESSED(32),
                       mls2 GEOMETRY(MULTILINESTRING, 4326) ENCODING NONE,
                       poly0 POLYGON,
                       poly1 GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32),
                       mpoly0 GEOMETRY(MULTIPOLYGON, 4326)
                      );
    CREATE TABLE arrayexamples (
      tiny_int_array TINYINT[],
      int_array INTEGER[],
      big_int_array BIGINT[],
      text_array TEXT[] ENCODING DICT(32), --OmniSci supports only DICT(32) TEXT arrays.
      float_array FLOAT[],
      double_array DOUBLE[],
      decimal_array DECIMAL(18,6)[],
      boolean_array BOOLEAN[],
      date_array DATE[],
      time_array TIME[],
  timestamp_array TIMESTAMP[]);
CREATE TABLE fixedarrayexamples (
  float_array3 FLOAT[3],
  date_array4 DATE[4]);
    CREATE TABLE text_shard (
    i TEXT ENCODING DICT(32),
    s TEXT ENCODING DICT(32),
    SHARD KEY (i))
    WITH (SHARD_COUNT = 2);
    
    CREATE TABLE text_shard1 (
    i TEXT,
    s TEXT ENCODING DICT(32),
    SHARD KEY (i),
    SHARED DICTIONARY (i) REFERENCES text_shard(i))
    WITH (SHARD_COUNT = 2);
    create table flights (
    *
    *
    *
    dest_name TEXT ENCODING DICT,
    dest_city TEXT ENCODING DICT,
    dest_state TEXT ENCODING DICT,
    dest_country TEXT ENCODING DICT,
    
    *
    *
    *
    origin_name TEXT,
    origin_city TEXT,
    origin_state TEXT,
    origin_country TEXT,
    *
    *
    *
    
    SHARED DICTIONARY (origin_name) REFERENCES flights(dest_name),
    SHARED DICTIONARY (origin_city) REFERENCES flights(dest_city),
    SHARED DICTIONARY (origin_state) REFERENCES flights(dest_state),
    SHARED DICTIONARY (origin_country) REFERENCES flights(dest_country),
    *
    *
    *
    )
    WITH(
    *
    *
    *
    )
    create table flights (
    
    *
    *
    *
    
    SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
    SHARED DICTIONARY (origin_state) REFERENCES us_geography(state),
    SHARED DICTIONARY (origin_country) REFERENCES us_geography(country),
    SHARED DICTIONARY (dest_city) REFERENCES us_geography(city),
    SHARED DICTIONARY (dest_state) REFERENCES us_geography(state),
    SHARED DICTIONARY (dest_country) REFERENCES us_geography(country),
    
    *
    *
    *
    )
    WITH(
    *
    *
    *
    );

    BIGINT ENCODING FIXED(16)

    2

    Same as SMALLINT.

    BIGINT ENCODING FIXED(32)

    4

    Same as INTEGER.

    BOOLEAN

    1

    TRUE: 'true', '1', 't'. FALSE: 'false', '0', 'f'. Text values are not case-sensitive.

    DATE[1]

    4

    Same as DATE ENCODING DAYS(32).

    DATE ENCODING DAYS(16)

    2

Range in days: -32,768 - 32,767. Range in years: +/-90 around epoch (April 14, 1880 - September 9, 2059). Minimum value: -2,831,155,200; maximum value: 2,831,068,800. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING DAYS(32)

    4

    Range in years: +/-5,883,517 around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: -2,147,483,648; maximum value: 2,147,483,647. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING FIXED(16)

    2

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DATE ENCODING FIXED(32)

    4

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DECIMAL

    2, 4, or 8

    Takes precision and scale parameters: DECIMAL(precision,scale)

    Size depends on precision:

    • Up to 4: 2 bytes

    • 5 to 9: 4 bytes

    • 10 to 18 (maximum): 8 bytes

    Scale must be less than precision.
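
The precision-to-size rule for DECIMAL can be expressed directly; `decimal_bytes` is a hypothetical helper that encodes the rule stated above.

```python
def decimal_bytes(precision, scale):
    # Size depends only on precision; scale must be less than precision.
    if not 0 < precision <= 18:
        raise ValueError("precision must be between 1 and 18")
    if scale >= precision:
        raise ValueError("scale must be less than precision")
    if precision <= 4:
        return 2
    if precision <= 9:
        return 4
    return 8

print(decimal_bytes(4, 2), decimal_bytes(9, 3), decimal_bytes(18, 6))  # → 2 4 8
```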

    DOUBLE

    8

    Variable precision. Minimum value: -1.79e308; maximum value: 1.79e308

    EPOCH

    8

    Seconds ranging from -30610224000 (1/1/1000 00:00:00) through 185542587100800 (1/1/5885487 23:59:59).

    FLOAT

    4

    Variable precision. Minimum value: -3.4e38; maximum value: 3.4e38.

    INTEGER

    4

    Minimum value: -2,147,483,647; maximum value: 2,147,483,647.

    INTEGER ENCODING FIXED(8)

    1

Minimum value: -127; maximum value: 127.

    INTEGER ENCODING FIXED(16)

    2

    Same as SMALLINT.

    LINESTRING

    Variable**[2]**

    Geospatial datatype. A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2)

    MULTILINESTRING

    Variable**[2]**

    Geospatial datatype. A set of associated lines. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1))

    MULTIPOINT

    Variable**[2]**

    Geospatial datatype. A set of points. For example: MULTIPOINT((0 0), (1 0), (2 0))

    MULTIPOLYGON

    Variable**[2]**

    Geospatial datatype. A set of one or more polygons. For example:MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))

    POINT

    Variable**[2]**

    Geospatial datatype. A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0)

    POLYGON

    Variable**[2]**

    Geospatial datatype. A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))

    SMALLINT

    2

    Minimum value: -32,767; maximum value: 32,767.

    SMALLINT ENCODING FIXED(8)

    1

Minimum value: -127; maximum value: 127.

    TEXT ENCODING DICT

    4

    Max cardinality 2 billion distinct string values. Maximum string length is 32,767.

    TEXT ENCODING DICT(8)

    1

    Max cardinality 255 distinct string values.

    TEXT ENCODING DICT(16)

    2

    Max cardinality 64 K distinct string values.

    TEXT ENCODING NONE

    Variable

    Size of the string + 6 bytes. Maximum string length is 32,767.

    • Note: Importing TEXT ENCODING NONE fields using the Data Manager has limitations for Immerse. When you use string instead of string [dict. encode] for a column when importing, you cannot use that column in Immerse dashboards.

    TIME

    8

    Minimum value: 00:00:00; maximum value: 23:59:59.

    TIME ENCODING FIXED(32)

    4

    Minimum value: 00:00:00; maximum value: 23:59:59.

    TIMESTAMP(0)

    8

    Linux timestamp from -30610224000 (1/1/1000 00:00:00) through 29379542399 (12/31/2900 23:59:59). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS or YYYY-MM-DDTHH:MM:SS (the T is dropped when the field is populated).

    TIMESTAMP(3) (milliseconds)

    8

    Linux timestamp from -30610224000000 (1/1/1000 00:00:00.000) through 29379542399999 (12/31/2900 23:59:59.999). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fff or YYYY-MM-DDTHH:MM:SS.fff (the T is dropped when the field is populated).

    TIMESTAMP(6) (microseconds)

    8

    Linux timestamp from -30610224000000000 (1/1/1000 00:00:00.000000) through 29379542399999999 (12/31/2900 23:59:59.999999). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.ffffff or YYYY-MM-DDTHH:MM:SS.ffffff (the T is dropped when the field is populated).

    TIMESTAMP(9) (nanoseconds)

    8

    Linux timestamp from -9223372036854775807 (09/21/1677 00:12:43.145224193) through 9223372036854775807 (11/04/2262 23:47:16.854775807). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fffffffff or YYYY-MM-DDTHH:MM:SS.fffffffff (the T is dropped when the field is populated).

    TIMESTAMP ENCODING FIXED(32)

    4

    Range: 1901-12-13 20:45:53 - 2038-01-19 03:14:07. Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS or YYYY-MM-DDTHH:MM:SS (the T is dropped when the field is populated).

    TINYINT

    1

    Minimum value: -127; maximum value: 127.
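
The DATE ENCODING DAYS(16) range in the table above (a signed 16-bit day count relative to the Unix epoch) can be checked with ordinary date arithmetic; this is illustrative only.

```python
from datetime import date, timedelta

# -32,768 days from the Unix epoch (1970-01-01) should give the stated
# minimum date of April 14, 1880.
print(date(1970, 1, 1) + timedelta(days=-32768))  # → 1880-04-14
```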

The users system table provides information about all database users and contains the following columns:

    Column Name

    Column Type

    Description

    user_id

    INTEGER

    ID of database user.

    user_name

    TEXT

    Username of database user.

    Databases

    The databases system table provides information about all created databases on the server and contains the following columns:

    Column Name

    Column Type

    Description

    database_id

    INTEGER

    ID of database.

    database_name

    TEXT

    Name of database.

    Permissions

    The permissions system table provides information about all user/role permissions for all database objects and contains the following columns:

    Column Name

    Column Type

    Description

    role_name

    TEXT

    Username or role name associated with permission.

    is_user_role

    BOOLEAN

Boolean indicating whether the role_name column identifies a user or a role.

    Roles

    The roles system table lists all created database roles and contains the following columns:

    Column Name

    Column Type

    Description

    role_name

    TEXT

    Role name.

    Tables

    The tables system table provides information about all database tables and contains the following columns:

    Column Name

    Column Type

    Description

    database_id

    INTEGER

    ID of database that contains the table.

    database_name

    TEXT

    Name of database that contains the table.

    Dashboards

    The dashboards system table provides information about created dashboards (enterprise edition only) and contains the following columns:

    Column Name

    Column Type

    Description

    database_id

    INTEGER

    ID of database that contains the dashboard.

    database_name

    TEXT

    Name of database that contains the dashboard.

    Role Assignments

    The role_assignments system table provides information about database roles that have been assigned to users and contains the following columns:

    Column Name

    Column Type

    Description

    role_name

    TEXT

    Name of assigned role.

    user_name

    TEXT

    Username of user that was assigned the role.

    Memory Summary

    The memory_summary system table provides high level information about utilized memory across CPU and GPU devices and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node from which memory information is fetched.

    device_id

    INTEGER

    Device ID.

    Memory Details

    The memory_details system table provides detailed information about allocated memory segments across CPU and GPU devices and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node from which memory information is fetched.

    database_id

    INTEGER

    ID of database that contains the table that memory was allocated for.

    Storage Details

    The storage_details system table provides detailed information about utilized storage per table and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node from which storage information is fetched.

    database_id

    INTEGER

    ID of database that contains the table.

    Log-Based System Tables


    Log-based system tables are considered beta functionality in Release 6.1.0 and are disabled by default.

    Request Logs

    The request_logs system table provides information about HeavyDB Thrift API requests and contains the following columns:

    Column Name

    Column Type

    Description

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    severity

    TEXT

    Severity level of log entry. Possible values are F (fatal), E (error), W (warning), and I (info).

    Server Logs

    The server_logs system table provides HeavyDB server logs in tabular form and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node containing logs.

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    Web Server Logs

    The web_server_logs system table provides HEAVY.AI Web Server logs in tabular form and contains the following columns (Enterprise Edition only):

    Column Name

    Column Type

    Description

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    severity

    TEXT

    Severity level of log entry. Possible values are fatal, error, warning, and info.

    Web Server Access Logs

The web_server_access_logs system table provides information about requests made to the HEAVY.AI Web Server. The table contains the following columns:

    Column Name

    Column Type

    Description

    ip_address

    TEXT

    IP address of client making the web server request.

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    Refreshing Logs System Tables

    The logs system tables must be refreshed manually to view new log entries. You can run the REFRESH FOREIGN TABLES SQL command (for example, REFRESH FOREIGN TABLES server_logs, request_logs; ), or click the Refresh Data Now button on the table’s Data Manager page in Heavy Immerse.

    Request Logs and Monitoring System Dashboard

    The Request Logs and Monitoring system dashboard is built on the log-based system tables and provides visualization of request counts, performance, and errors over time, along with the server logs.

    System Dashboards

    Preconfigured system dashboards are built on various system tables. Specifically, two dashboards named System Resources and User Roles and Permissions are available by default. The Request Logs and Monitoring system dashboard is considered beta functionality and disabled by default. These dashboards can be found in the information_schema database, along with the system tables that they use.


    Access to system dashboards is controlled using Heavy Immerse privileges; only users with Admin privileges or users/roles with access to the information_schema database can access the system dashboards.

    Cross-linking must be enabled to allow cross-filtering across charts that use different system tables. Enable cross-linking by adding "ui/enable_crosslink_panel": true to the feature_flags section of the servers.json file.

    Tables

    These functions are used to create and modify data tables in HEAVY.AI.

    Nomenclature Constraints

Table names must use the NAME format, described in regex notation as:

    Table and column names can include quotes, spaces, and the underscore character. Other special characters are permitted if the name of the table or column is enclosed in double quotes (" ").

    • Spaces and special characters other than underscore (_) cannot be used in Heavy Immerse.

• Column and table names enclosed in double quotes cannot be used in Heavy Immerse.
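
A name check along these lines can be sketched as follows. The exact NAME pattern is not reproduced above, so the regex here is an assumption based on common SQL identifier rules, and `needs_quoting` is a hypothetical helper.

```python
import re

# Assumed identifier pattern: a letter or underscore followed by letters,
# digits, '$', or '_'. Names that do not match must be double-quoted,
# subject to the Heavy Immerse limitations noted above.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9$_]*")

def needs_quoting(identifier):
    return NAME.fullmatch(identifier) is None

print(needs_quoting("flights_2024"))  # → False
print(needs_quoting("2024 flights"))  # → True
```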

    CREATE TABLE

    Create a table named <table> specifying <columns> and table properties.

    Supported Datatypes

    Datatype
    Size (bytes)
    Notes

* In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING DAYS(16)).

For more information, see Datatypes and Fixed Encoding.

For geospatial datatypes, see Geospatial Capabilities.

    Examples

    Create a table named tweets and specify the columns, including type, in the table.

    Create a table named delta and assign a default value San Francisco to column city.


    Default values currently have the following limitations:

    • Only literals can be used for column DEFAULT values; expressions are not supported.

    • You cannot define a DEFAULT value for a shard key. For example, the following does not parse:

    Supported Encoding

    Encoding
    Descriptions

    WITH Clause Properties

    Property
    Description

    Sharding

    Sharding partitions a database table across multiple servers so each server has a part of the table with the same columns but with different rows. Partitioning is based on a sharding key defined when you create the table.

    Without sharding, the dimension tables involved in a join are replicated and sent to each GPU, which is not feasible for dimension tables with many rows. Specifying a shard key makes it possible for the query to execute efficiently on large dimension tables.

    Currently, specifying a shard key is useful only for joins:

    • If two tables specify a shard key with the same type and the same number of shards, a join on that key only sends a part of the dimension table column data to each GPU.

    • For multi-node installs, the dimension table does not need to be replicated and the join executes locally on each leaf.

    Constraints

    • A shard key must specify a single column to shard on. There is no support for sharding by a combination of keys.

    • One shard key can be specified for a table.

    • Data are partitioned according to the shard key and the number of shards (shard_count).

    Recommendations

    • Set shard_count to the number of GPUs you eventually want to distribute the data table across.

    • Referenced tables must also be shard_count-aligned.

    • Minimize the use of sharding, because it can introduce load skew across resources compared to unsharded tables.

    Examples

    Basic sharding:

    Sharding with shared dictionary:

    Temporary Tables

    Using the TEMPORARY argument creates a table that persists only while the server is live. Temporary tables are useful for storing intermediate result sets that you access more than once.

    Note:

    Adding or dropping a column from a temporary table is not supported.

    Example

    CREATE TABLE AS SELECT

    Create a table with the specified columns, copying any data that meet SELECT statement criteria.

    WITH Clause Properties

    Property
    Description

    Examples

    Create the table newTable. Populate the table with all information from the table oldTable, effectively creating a duplicate of the original table.

    Create a table named trousers. Populate it with data from the columns name, waist, and inseam from the table wardrobe.

    Create a table named cosmos. Populate it with data from the columns star and planet from the table universe where planet has the class M.

    ALTER TABLE

    Examples

    Rename the table tweets to retweets.

    Rename the column source to device in the table retweets.

    Add the column pt_dropoff to table tweets with a default value point(0,0).

    Add multiple columns a, b, and c to table table_one with a default value of 15 for column b.

    Note:

    Default values currently have the following limitations:

    • Only literals can be used for column DEFAULT values; expressions are not supported.

    • For arrays, use the following syntax:

    Add the column lang to the table tweets using a TEXT ENCODING DICTIONARY.

    Add the columns lang and encode to the table tweets using a TEXT ENCODING DICTIONARY for each.

    Drop the column pt_dropoff from table tweets.

    Limit on-disk data growth by setting the number of allowed epoch rollbacks to 50:

    Note:
    • You cannot add a dictionary-encoded string column with a shared dictionary when using ALTER TABLE ADD COLUMN.

    • Currently, HEAVY.AI does not support adding a geo column type (POINT, LINESTRING, POLYGON, or MULTIPOLYGON) to a table.

    Change a text column “id” to an integer column:

    Change text columns “id” and “location” to big integer and point columns respectively:

    Note:

    Currently, only text column types (dictionary encoded and none encoded text columns) can be altered.

    DROP TABLE

    Deletes the table structure, all data from the table, and any dictionary content unless it is a shared dictionary. (See the note regarding disk space reclamation.)

    Example

    DUMP TABLE

    Archives data and dictionary files of the table <table> to file <filepath>.

    Valid values for <compression_program> include:

    • gzip (default)

    • pigz

    • lz4

    If you do not choose a compression option, the system uses gzip if it is available. If gzip is not installed, the file is not compressed.

    The file path must be enclosed in single quotes.

    Note:
    • Dumping a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being dumped.

    • The DUMP command is not supported on distributed configurations.

    Example

    RENAME TABLE

    Rename a table or multiple tables at once.

    Examples

    Rename a single table:

    Swap table names:

    Swap table names multiple times:

    RESTORE TABLE

    Restores data and dictionary files of table <table> from the file at <filepath>. If you specified a compression program when you used the DUMP TABLE command, you must specify the same compression method during RESTORE.

    Restoring a table decompresses and then reimports the table. You must have enough disk space for both the new table and the archived table, as well as enough scratch space to decompress the archive and reimport it.

    The file path must be enclosed in single quotes.

    You can also restore a table from archives stored in S3-compatible endpoints:

    s3_region is required. All features discussed in the S3 import documentation, such as custom S3 endpoints and server privileges, are supported.

    Note:
    • Restoring a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being restored.

    • The RESTORE command is not supported on distributed configurations.

    Important:

    Do not attempt to use RESTORE TABLE with a table dump created using a release of HEAVY.AI that is higher than the release running on the server where you will restore the table.

    Examples

    Restore table tweets from /opt/archive/tweetsBackup.gz:

    Restore table tweets from a public S3 file or using server privileges (with the allow-s3-server-privileges server flag enabled):

    Restore table tweets from a private S3 file using AWS access keys:

    Restore table tweets from a private S3 file using temporary AWS access keys/session token:

    Restore table tweets from an S3-compatible endpoint:

    TRUNCATE TABLE

    Use the TRUNCATE TABLE statement to remove all rows from a table without deleting the table structure.

    This releases table on-disk and memory storage and removes dictionary content unless it is a shared dictionary. (See the note regarding disk space reclamation.)

    Removing rows with TRUNCATE is more efficient than using DROP TABLE. Dropping and then recreating the table invalidates dependent objects of the table, requiring you to regrant object privileges. Truncating has none of these effects.

    Example

    Note:

    When you DROP or TRUNCATE, the command returns almost immediately. The directories to be purged are marked with the suffix _DELETE_ME_. The files are automatically removed asynchronously.

    In practical terms, this means that you will not see a reduction in disk usage until the automatic task runs, which might not start for up to five minutes.

    You might also see directory names appended with _DELETE_ME_. You can ignore these; they are deleted automatically over time.

    OPTIMIZE TABLE

    Use this statement to remove rows from storage that have been marked as deleted via DELETE statements.

    When run without the vacuum option, the column-level metadata is recomputed for each column in the specified table. HeavyDB makes heavy use of metadata to optimize query plans, so optimizing table metadata can increase query performance after metadata-widening operations such as updates or deletes. If the configuration parameter enable-auto-metadata-update is not set, HeavyDB does not narrow metadata during an update or delete; metadata is only widened to cover a new range.

    When run with the vacuum option, it removes any rows marked "deleted" from the data stored on disk. Vacuum is a checkpointing operation, so new copies of any vacuum records are deleted. Using OPTIMIZE with the VACUUM option compacts pages and deletes unused data files that have not been repopulated.
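    As a sketch of the two modes, assuming a table named tweets:

    ```sql
    -- Recompute column-level metadata only:
    OPTIMIZE TABLE tweets;

    -- Also remove rows marked as deleted from the data stored on disk:
    OPTIMIZE TABLE tweets WITH (VACUUM='true');
    ```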

    Note:

    Beginning with Release 5.6.0, OPTIMIZE should be used infrequently, because UPDATE, DELETE, and IMPORT queries manage space more effectively.

    VALIDATE

    Performs checks for negative and inconsistent epochs across table shards for single-node configurations.

    If VALIDATE detects epoch-related issues, it returns a report similar to the following:

    If no issues are detected, it reports as follows:

    VALIDATE CLUSTER

    Performs checks and reports discovered issues on a running HEAVY.AI cluster. Compares metadata between the aggregator and leaves to verify that the logical components between the processes are identical.

    VALIDATE CLUSTER also detects and reports issues related to table epochs. It reports when epochs are negative or when table epochs across leaf nodes or shards are inconsistent.

    Examples

    If VALIDATE CLUSTER detects issues, it returns a report similar to the following:

    If no issues are detected, it reports as follows:

    You can include the WITH(REPAIR_TYPE) argument. (REPAIR_TYPE='NONE') is the same as running the command with no argument. (REPAIR_TYPE='REMOVE') removes any leaf objects that have issues. For example:

    Epoch Issue Example

    This example output from the VALIDATE CLUSTER command on a distributed setup shows epoch-related issues:


    is_super_user

    BOOLEAN

    Indicates whether or not the database user is a super user.

    default_db_id

    INTEGER

    ID of user’s default database on login.

    default_db_name

    TEXT

    Name of user’s default database on login.

    can_login

    BOOLEAN

    Indicates whether or not the database user account is activated and can log in.

    owner_id

    INTEGER

    User ID of database owner.

    owner_user_name

    TEXT

    Username of database owner.

    database_id

    INTEGER

    ID of database that contains the database object for which permission was granted.

    database_name

    TEXT

    Name of database that contains the database object on which permission was granted.

    object_name

    TEXT

    Name of database object on which permission was granted.

    object_id

    INTEGER

    ID of database object on which permission was granted.

    object_owner_id

    INTEGER

    User id of the owner of the database object on which permission was granted.

    object_owner_user_name

    TEXT

    Username of the owner of the database object on which permission was granted.

    object_permission_type

    TEXT

    Type of database object on which permission was granted.

    object_permissions

    TEXT[]

    List of permissions that were granted on database object.

    table_id

    INTEGER

    Table ID.

    table_name

    TEXT

    Table name.

    owner_id

    INTEGER

    User ID of table owner.

    owner_user_name

    TEXT

    Username of table owner.

    column_count

    INTEGER

    Number of table columns. Note that internal system columns are included in this count.

    table_type

    TEXT

    Type of table. Possible values are DEFAULT, VIEW, TEMPORARY, and FOREIGN.

    view_sql

    TEXT

    For views, SQL statement used in the view.

    max_fragment_size

    INTEGER

    Number of rows per fragment used by the table.

    max_chunk_size

    BIGINT

    Maximum size (in bytes) of table chunks.

    fragment_page_size

    INTEGER

    Size (in bytes) of table data pages.

    max_rows

    BIGINT

    Maximum number of rows allowed by table.

    max_rollback_epochs

    INTEGER

    Maximum number of epochs a table can be rolled back to.

    shard_count

    INTEGER

    Number of shards that exist for the table.

    ddl_statement

    TEXT

    CREATE TABLE DDL statement for table.

    dashboard_id

    INTEGER

    Dashboard ID.

    dashboard_name

    TEXT

    Dashboard name.

    owner_id

    INTEGER

    User ID of dashboard owner.

    owner_user_name

    TEXT

    Username of dashboard owner.

    last_updated_at

    TIMESTAMP

    Timestamp of last dashboard update.

    data_sources

    TEXT[]

    List of data sources/tables used by the dashboard.

    device_type

    TEXT

    Type of device. Possible values are CPU and GPU.

    max_page_count

    BIGINT

    Maximum number of memory pages that can be allocated on the device.

    page_size

    BIGINT

    Size (in bytes) of a memory page on the device.

    allocated_page_count

    BIGINT

    Number of allocated memory pages on the device.

    used_page_count

    BIGINT

    Number of used allocated memory pages on the device.

    free_page_count

    BIGINT

    Number of free allocated memory pages on the device.

    database_name

    TEXT

    Name of database that contains the table that memory was allocated for.

    table_id

    INTEGER

    ID of table that memory was allocated for.

    table_name

    TEXT

    Name of table that memory was allocated for.

    column_id

    INTEGER

    ID of column that memory was allocated for.

    column_name

    TEXT

    Name of column that memory was allocated for.

    chunk_key

    INTEGER[]

    ID of cached table chunk.

    device_id

    INTEGER

    Device ID.

    device_type

    TEXT

    Type of device. Possible values are CPU and GPU.

    memory_status

    TEXT

    Memory segment use status. Possible values are FREE and USED.

    page_count

    BIGINT

    Number of pages in the segment.

    page_size

    BIGINT

    Size (in bytes) of a memory page on the device.

    slab_id

    INTEGER

    ID of slab containing memory segment.

    start_page

    BIGINT

    Page number of the first memory page in the segment.

    last_touched_epoch

    BIGINT

    Epoch at which the segment was last accessed.

    database_name

    TEXT

    Name of database that contains the table.

    table_id

    INTEGER

    Table ID.

    table_name

    TEXT

    Table Name.

    epoch

    INTEGER

    Current table epoch.

    epoch_floor

    INTEGER

    Minimum epoch table can be rolled back to.

    fragment_count

    INTEGER

    Number of table fragments.

    shard_id

    INTEGER

    Table shard ID. This value is only set for sharded tables.

    data_file_count

    INTEGER

    Number of data files created for table.

    metadata_file_count

    INTEGER

    Number of metadata files created for table.

    total_data_file_size

    BIGINT

    Total size (in bytes) of data files.

    total_data_page_count

    BIGINT

    Total number of pages across all data files.

    total_free_data_page_count

    BIGINT

    Total number of free pages across all data files.

    total_metadata_file_size

    BIGINT

    Total size (in bytes) of metadata files.

    total_metadata_page_count

    BIGINT

    Total number of pages across all metadata files.

    total_free_metadata_page_count

    BIGINT

    Total number of free pages across all metadata files.

    total_dictionary_data_file_size

    BIGINT

    Total size (in bytes) of string dictionary files.

    process_id

    INTEGER

    Process ID of the HeavyDB instance that generated the log entry.

    query_id

    INTEGER

    ID associated with a SQL query. A value of 0 indicates that either the log entry is unrelated to a SQL query or no query ID has been set for the log entry.

    thread_id

    INTEGER

    ID of thread that generated the log entry.

    file_location

    TEXT

    Source file name and line number where the log entry was generated.

    api_name

    TEXT

    Name of Thrift API that the request was sent to.

    request_duration_ms

    BIGINT

    Thrift API request duration in milliseconds.

    database_name

    TEXT

    Request session database name.

    user_name

    TEXT

    Request session username.

    public_session_id

    TEXT

    Request session ID.

    query_string

    TEXT

    Query string for SQL query requests.

    client

    TEXT

    Protocol and IP address of client making the request.

    dashboard_id

    INTEGER

    Dashboard ID for SQL query requests coming from Immerse dashboards.

    dashboard_name

    TEXT

    Dashboard name for SQL query requests coming from Immerse dashboards.

    chart_id

    INTEGER

    Chart ID for SQL query requests coming from Immerse dashboards.

    execution_time_ms

    BIGINT

    Execution time in milliseconds for SQL query requests.

    total_time_ms

    BIGINT

    Total execution time (execution_time_ms + serialization time) in milliseconds for SQL query requests.

    severity

    TEXT

    Severity level of log entry. Possible values are F (fatal), E (error), W (warning), and I (info).

    process_id

    INTEGER

    Process ID of the HeavyDB instance that generated the log entry.

    query_id

    INTEGER

    ID associated with a SQL query. A value of 0 indicates that either the log entry is unrelated to a SQL query or no query ID has been set for the log entry.

    thread_id

    INTEGER

    ID of thread that generated the log entry.

    file_location

    TEXT

    Source file name and line number where the log entry was generated.

    message

    TEXT

    Log message.

    message

    TEXT

    Log message.

    http_method

    TEXT

    HTTP request method.

    endpoint

    TEXT

    Web server request endpoint.

    http_status

    SMALLINT

    HTTP response status code.

    response_size

    BIGINT

    Response payload size in bytes.

    DATE*

    4

    Same as DATE ENCODING DAYS(32).

    DATE ENCODING DAYS(32)

    4

    Range in years: +/-5,883,517 around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: -2,147,483,648; maximum value: 2,147,483,647. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING DAYS(16)

    2

    Range in days: -32,768 to 32,767. Range in years: +/-90 around epoch (April 14, 1880 - September 9, 2059). Minimum value: -2,831,155,200; maximum value: 2,831,068,800. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING FIXED(32)

    4

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DATE ENCODING FIXED(16)

    2

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DECIMAL

    2, 4, or 8

    Takes precision and scale parameters: DECIMAL(precision,scale).

    Size depends on precision:

    • Up to 4: 2 bytes

    • 5 to 9: 4 bytes

    • 10 to 18 (maximum): 8 bytes

    Scale must be less than precision.

    DOUBLE

    8

    Variable precision. Minimum value: -1.79 x e^308; maximum value: 1.79 x e^308.

    FLOAT

    4

    Variable precision. Minimum value: -3.4 x e^38; maximum value: 3.4 x e^38.

    INTEGER

    4

    Minimum value: -2,147,483,647; maximum value: 2,147,483,647.

    SMALLINT

    2

    Minimum value: -32,767; maximum value: 32,767.

    TEXT ENCODING DICT

    4

    Maximum cardinality: 2 billion distinct string values.

    TEXT ENCODING NONE

    Variable

    Size of the string + 6 bytes

    TIME

    8

    Minimum value: 00:00:00; maximum value: 23:59:59.

    TIMESTAMP

    8

    Linux timestamp from -30610224000 (1/1/1000 00:00:00.000) through 29379542399 (12/31/2900 23:59:59.999).

    Can also be inserted and stored in human-readable format:

    • YYYY-MM-DD HH:MM:SS

    TINYINT

    1

    Minimum value: -127; maximum value: 127.
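    As an illustrative sketch (the table and column names here are hypothetical), several of the datatypes and encodings above can be combined in a single definition:

    ```sql
    CREATE TABLE type_demo (
       flag     BOOLEAN,                  -- 1 byte
       amount   DECIMAL(10,2),            -- precision 10: stored in 8 bytes
       event_d  DATE ENCODING DAYS(16),   -- 2-byte date, +/-90 years around epoch
       label    TEXT ENCODING DICT,       -- dictionary-encoded string
       raw_note TEXT ENCODING NONE);      -- uncompressed string, no aggregates
    ```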

    CREATE TABLE tbl (id INTEGER NOT NULL DEFAULT 0, name TEXT, shard key (id)) with (shard_count = 2);
  • For arrays, use the following syntax: ARRAY[A, B, C, ..., N]

    The syntax {A, B, C, ... N} is not supported.

  • Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but when you try to insert a row with a default value, it throws an error.

  • partitions

    Partition strategy option:

    • SHARDED: Partition table using sharding.

    • REPLICATED: Partition table using replication.

    shard_count

    Number of shards to create, typically equal to the number of GPUs across which the data table is distributed.

    sort_column

    Name of the column on which to sort during bulk import.

    A value in the column specified as a shard key is always sent to the same partition.

  • The number of shards should be equal to the number of GPUs in the cluster.

  • Sharding is allowed on the following column types:

    • DATE

    • INT

    • TEXT ENCODING DICT

    • TIME

    • TIMESTAMP

  • Tables must share the dictionary for the column to be involved in sharded joins. If the dictionary is not specified as shared, the join does not take advantage of sharding. Dictionaries are reference-counted and only dropped when the last reference drops.

  • partitions

    Partition strategy option:

    • SHARDED: Partition table using sharding.

    • REPLICATED: Partition table using replication.

    use_shared_dictionaries

    Controls whether the created table creates its own dictionaries for text columns, or instead shares the dictionaries of its source table. Uses shared dictionaries by default (true), which increases the speed of table creation.

    Setting to false shrinks the dictionaries if SELECT for the created table has a narrow filter; for example: CREATE TABLE new_table AS SELECT * FROM old_table WITH (USE_SHARED_DICTIONARIES='false');

    vacuum

    Formats the table to more efficiently handle DELETE requests. The only parameter available is delayed. Rather than immediately remove deleted rows, vacuum marks items to be deleted, and they are removed at an optimal time.
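    For example, a hypothetical table created with delayed vacuuming might look like:

    ```sql
    CREATE TABLE events (
       id INTEGER,
       payload TEXT ENCODING DICT)
      WITH (VACUUM='delayed');
    ```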

    ARRAY[A, B, C, ..., N]. The syntax {A, B, C, ..., N} is not supported.
  • Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but when you try to insert a row with a default value, it throws an error.

  • HEAVY.AI supports ALTER TABLE RENAME TABLE and ALTER TABLE RENAME COLUMN for temporary tables. HEAVY.AI does not support ALTER TABLE ADD COLUMN to modify a temporary table.

    none

    You must have at least GRANT CREATE ON DATABASE privilege level to use the DUMP command.

    You must have at least GRANT CREATE ON DATABASE privilege level to use the RESTORE command.
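    For example, a superuser could grant this privilege to a hypothetical backup_user on a database named heavyai:

    ```sql
    GRANT CREATE ON DATABASE heavyai TO backup_user;
    ```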

    BIGINT

    8

    Minimum value: -9,223,372,036,854,775,807; maximum value: 9,223,372,036,854,775,807.

    BOOLEAN

    1

    TRUE: 'true', '1', 't'. FALSE: 'false', '0', 'f'. Text values are not case-sensitive.

    DICT

    Dictionary encoding on string columns (default for TEXT columns). Limit of 2 billion unique string values.

    FIXED (bits)

    Fixed length encoding of integer or timestamp columns. See Datatypes and Fixed Encoding.

    NONE

    No encoding. Valid only on TEXT columns. No Dictionary is created. Aggregate operations are not possible on this column type.

    fragment_size

    Number of rows per fragment, which is a unit of the table used for query processing. Default: 32 million rows; you typically do not need to change this.

    max_rollback_epochs

    Limit the number of epochs a table can be rolled back to. Limiting the number of epochs helps to limit the amount of on-disk data and prevent unmanaged data growth.

    Limiting the number of rollback epochs also can increase system startup speed, especially for systems on which data is added in small batches or singleton inserts. Default: 3.

    The following example creates the table test_table and sets the maximum epoch rollback number to 50:

    CREATE TABLE test_table(a int) WITH (MAX_ROLLBACK_EPOCHS = 50);

    max_rows

    Used primarily for streaming datasets to limit the number of rows in a table, to avoid running out of memory or impeding performance. When the max_rows limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the max_rows setting. If you attempt to load more rows at one time than the max_rows setting defines, the records up to the max_rows limit are removed, leaving only the additional rows. Default: 2^62. In a distributed system, the maximum number of rows is calculated as max_rows * leaf_count. In a sharded distributed system, the maximum number of rows is calculated as max_rows * shard_count.
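    For example, a hypothetical streaming table capped at 100 million rows:

    ```sql
    CREATE TABLE sensor_stream (
       ts      TIMESTAMP,
       reading FLOAT)
      WITH (MAX_ROWS=100000000);
    ```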

    page_size

    fragment_size

    Number of rows per fragment, which is a unit of the table used for query processing. Default: 32 million rows; you typically do not need to change this.

    max_chunk_size

    Size of a chunk, which is a unit of the table used for query processing. Default: 1073741824 bytes (1 GB); you typically do not need to change this.

    max_rows

    Used primarily for streaming datasets to limit the number of rows in a table. When the max_rows limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the max_rows setting. If you attempt to load more rows at one time than the max_rows setting defines, the records up to the max_rows limit are removed, leaving only the additional rows. Default: 2^62.

    page_size

    DATE ENCODING FIXED(16)
    Datatypes and Fixed Encoding
    Geospatial Primitives
    disk space reclamation
    the S3 import documentation
    disk space reclamation
    DROP TABLE

    DATE*

    Number of I/O page bytes. Default: 1 MB; you typically do not need to change this.

    Number of I/O page bytes. Default: 1 MB; you typically do not need to change this.

    Configuration Parameters for HeavyDB

    Following are the parameters for runtime settings on HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.

    For example, consider allow-loop-joins [=arg(=1)] (=0).

    • If you do not use this flag, loop joins are not allowed by default.

    CREATE [TEMPORARY] TABLE [IF NOT EXISTS] <table>
      (<column> <type> [NOT NULL] [DEFAULT <value>] [ENCODING <encodingSpec>],
      [SHARD KEY (<column>)],
      [SHARED DICTIONARY (<column>) REFERENCES <table>(<column>)], ...)
      [WITH (<property> = value, ...)];
    CREATE TABLE IF NOT EXISTS tweets (
       tweet_id BIGINT NOT NULL,
       tweet_time TIMESTAMP NOT NULL ENCODING FIXED(32),
       lat FLOAT,
       lon FLOAT,
       sender_id BIGINT NOT NULL,
       sender_name TEXT NOT NULL ENCODING DICT,
       location TEXT ENCODING  DICT,
       source TEXT ENCODING DICT,
       reply_to_user_id BIGINT,
       reply_to_tweet_id BIGINT,
       lang TEXT ENCODING  DICT,
       followers INT,
       followees INT,
       tweet_count INT,
       join_time TIMESTAMP ENCODING  FIXED(32),
       tweet_text TEXT,
       state TEXT ENCODING  DICT,
       county TEXT ENCODING DICT,
       place_name TEXT,
       state_abbr TEXT ENCODING DICT,
       county_state TEXT ENCODING DICT,
       origin TEXT ENCODING DICT,
       phone_numbers bigint);
    CREATE TABLE delta (
       id INTEGER NOT NULL, 
       name TEXT NOT NULL, 
       city TEXT NOT NULL DEFAULT 'San Francisco' ENCODING DICT(16));
    CREATE TABLE  customers(
       accountId text,
       name text,
       SHARD KEY (accountId))
      WITH (shard_count = 4);
    CREATE TABLE transactions(
       accountId text,
       action text,
       SHARD KEY (accountId),
       SHARED DICTIONARY (accountId) REFERENCES customers(accountId))
      WITH (shard_count = 4);
    CREATE TEMPORARY TABLE customers(
       accountId TEXT,
       name TEXT,
       timeCreated TIMESTAMP)
    CREATE TABLE [IF NOT EXISTS] <newTableName> AS (<SELECT statement>) [WITH (<property> = value, ...)];
    CREATE TABLE newTable AS (SELECT * FROM oldTable);
    CREATE TABLE trousers AS (SELECT name, waist, inseam FROM wardrobe);
    CREATE TABLE IF NOT EXISTS cosmos AS (SELECT star, planet FROM universe WHERE class='M');
    ALTER TABLE <table> RENAME TO <table>;
    ALTER TABLE <table> RENAME COLUMN <column> TO <column>;
    ALTER TABLE <table> ADD [COLUMN] <column> <type> [NOT NULL] [ENCODING <encodingSpec>];
    ALTER TABLE <table> ADD (<column> <type> [NOT NULL] [ENCODING <encodingSpec>], ...);
    ALTER TABLE <table> ADD (<column> <type> DEFAULT <value>);
    ALTER TABLE <table> DROP COLUMN <column_1>[, <column_2>, ...];
    ALTER TABLE <table> SET MAX_ROLLBACK_EPOCHS=<value>;
    ALTER TABLE <table> ALTER COLUMN <column> TYPE <type>, ALTER COLUMN <column> TYPE <type>, ...;
    ALTER TABLE tweets RENAME TO retweets;
    ALTER TABLE retweets RENAME COLUMN source TO device;
    ALTER TABLE tweets ADD COLUMN pt_dropoff POINT DEFAULT 'point(0 0)';
    ALTER TABLE table_one ADD a INTEGER, b INTEGER NOT NULL DEFAULT 15, c TEXT;
    ALTER TABLE tweets ADD COLUMN lang TEXT ENCODING DICT;
    ALTER TABLE tweets ADD (lang TEXT ENCODING DICT, encode TEXT ENCODING DICT);
    ALTER TABLE tweets DROP COLUMN pt_dropoff;
    ALTER TABLE test_table SET MAX_ROLLBACK_EPOCHS=50;
    ALTER TABLE my_table ALTER COLUMN id TYPE INTEGER;
    ALTER TABLE my_table ALTER COLUMN id TYPE BIGINT, ALTER COLUMN location TYPE GEOMETRY(POINT, 4326);
    DROP TABLE [IF EXISTS] <table>;
    DROP TABLE IF EXISTS tweets;
    DUMP TABLE <table> TO '<filepath>' [WITH (COMPRESSION='<compression_program>')];
    DUMP TABLE tweets TO '/opt/archive/tweetsBackup.gz' WITH (COMPRESSION='gzip');
    RENAME TABLE <table> TO <table>[, <table> TO <table>, <table> TO <table>...];
    RENAME TABLE table_A TO table_B;
    RENAME TABLE table_A TO table_B, table_B TO table_A;
    
    RENAME TABLE table_A TO table_B, table_B TO table_C, table_C TO table_A;
    RENAME TABLE table_A TO table_A_stale, table_B TO table_B_stale, table_A_new TO table_A, table_B_new TO table_B;
    RESTORE TABLE <table> FROM '<filepath>' [WITH (COMPRESSION='<compression_program>')];
    RESTORE TABLE <table> FROM '<S3_file_URL>' 
      WITH (compression = '<compression_program>', 
            s3_region = '<region>', 
            s3_access_key = '<access_key>', 
            s3_secret_key = '<secret_key>', 
            s3_session_token = '<session_token>');
    RESTORE TABLE tweets FROM '/opt/archive/tweetsBackup.gz' 
       WITH (COMPRESSION='gzip');
    RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz'
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1');
    RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz'
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1', 
          s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy');
    RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz' 
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1', 
          s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy',
          s3_session_token = 'zzzzzzzz');
    RESTORE TABLE tweets FROM 's3://my-gcp-bucket/archive/tweetsBackup.gz'
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1', 
          s3_endpoint = 'storage.googleapis.com');
    TRUNCATE TABLE <table>;
    TRUNCATE TABLE tweets;
    OPTIMIZE TABLE [<table>] [WITH (VACUUM='true')]
    VALIDATE
    heavysql> validate;
    Result
    
    Negative epoch value found for table "my_table". Epoch: -1.
    Epoch values for table "my_table_2" are inconsistent:
    Table Id  Epoch     
    ========= ========= 
    4         1         
    5         2
    Instance OK
    VALIDATE CLUSTER [WITH (REPAIR_TYPE = ['NONE' | 'REMOVE'])];
    [mapd@thing3 ~]$ /mnt/gluster/dist_mapd/mapd-sw2/bin/mapdql -p HyperInteractive
    User admin connected to database heavyai
    heavysql> validate cluster;
    Result
     Node          Table Count 
     ===========   =========== 
     Aggregator     1116
     Leaf 0         1114
     Leaf 1         1114
    No matching table on Leaf 0 for Table cities_dtl_POINTS table id 56
    No matching table on Leaf 1 for Table cities_dtl_POINTS table id 56
    No matching table on Leaf 0 for Table cities_dtl table id 80
    No matching table on Leaf 1 for Table cities_dtl table id 80
    Table details don't match on Leaf 0 for Table view_geo table id 95
    Table details don't match on Leaf 1 for Table view_geo table id 95
    Cluster OK
    VALIDATE CLUSTER WITH (REPAIR_TYPE = 'REMOVE');
    heavysql> validate cluster;
    Result
    
    Negative epoch value found for table "my_table". Epoch: -16777216.
    Epoch values for table "my_table_2" are inconsistent:
    Node      Table Id  Epoch     
    ========= ========= ========= 
    Leaf 0    4         1         
    Leaf 1    4         2
  • 1 to 9: 4 bytes
  • 10 to 18 (maximum): 8 bytes

  • Scale must be less than precision.

    YYYY-MM-DDTHH:MM:SS (The T is dropped when the field is populated.)

  • If you provide no arguments, the implied value is 1 (true) (allow-loop-joins).

  • If you provide the argument 0, that is the same as the default (allow-loop-joins=0).

  • If you provide the argument 1, that is the same as the implied value (allow-loop-joins=1).
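Concretely, the three cases above correspond to these command-line forms (an illustrative sketch of the implied/default-value syntax; the trailing comments are annotations, not part of the flag):

```
--allow-loop-joins       # no argument: implied value, allow-loop-joins=1
--allow-loop-joins=0     # argument 0: same as the default
--allow-loop-joins=1     # argument 1: same as the implied value
```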

  • Flag

    Description

    Default Value

    allow-cpu-retry [=arg]

    Allow queries that fail on GPU to retry on CPU, even when the watchdog is enabled. The watchdog is enabled by default, and by default most queries that throw a watchdog exception or run out of memory on GPU simply fail. Turn this flag on to allow those queries to retry on CPU instead.

    TRUE[1]

    allow-cpu-kernel-concurrency

    Allow for multiple queries to run execution kernels concurrently on CPU.

    Example: In a system with 4 executors (controlled by the num-executors parameter), 3+1 queries can run concurrently on CPU (the +1 depends on allow-cpu-gpu-kernel-concurrency).

    DEFAULT: ON

    Additional Enterprise Edition Parameters

    Following are additional parameters for runtime settings for the Enterprise Edition of HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.

    Flag

    Description

    Default Value

    cluster arg

    Path to data leaves list JSON file. Indicates that the HEAVY.AI server instance is an aggregator node, and where to find the rest of its cluster. Change for testing and debugging.

    $HEAVYAI_BASE

    compression-limit-bytes [=arg(=536870912)] (=536870912)

    Compress result sets that are transferred between leaves. Minimum length of payload above which data is compressed.

    536870912

    allow-cpu-gpu-kernel-concurrency

    Allow multiple queries to run execution kernels concurrently on CPU while a GPU query is executing.

    Example: In a system with 4 executors (controlled by the num-executors parameter), one of the 4 slots can be used to run a GPU query while the other 3 run on CPU.

    DEFAULT: ON

    allow-local-auth-fallback [=arg(=1)] (=0)

    If SAML or LDAP logins are enabled, and the logins fail, this setting enables authentication based on internally stored login credentials. Command-line tools or other tools that do not support SAML might reject those users from logging in unless this feature is enabled. This allows a user to log in using credentials on the local database.

    FALSE[0]

    allow-loop-joins [=arg(=1)] (=0)

    Enables all join queries to fall back to the loop join implementation. During a loop join, queries loop over all rows from all tables involved in the join and evaluate the join condition. By default, loop joins are allowed only if the number of rows in the inner table is fewer than the trivial-loop-join-threshold, because loop joins are computationally expensive and run for an extended period. Modifying the trivial-loop-join-threshold is a safer alternative to globally enabling loop joins. You might choose to globally enable loop joins when you have many small tables for which loop join performance is acceptable but modifying the trivial loop join threshold would be tedious.

    FALSE[0]

    allowed-export-paths = ["root_path_1", "root_path_2", ...]

    Specify a list of allowed root paths that can be used in export operations, such as the COPY TO command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine. For example:

    allowed-export-paths = ["/heavyai-storage/data/heavyai_export", "/home/centos"] The list of paths must be on the same line as the configuration parameter.

    Allowed file paths are enforced by default. The default export path (<data directory>/heavyai_export) is allowed by default, and all child paths of that path are allowed.

    When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY TO command, an error response is returned.

    N/A

    allow-s3-server-privileges

    Allow S3 server privileges if IAM user credentials are not provided. Credentials can be specified with environment variables (such as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and so on), an AWS credentials file, or when running on an EC2 instance, with an IAM role that is attached to the instance.

    FALSE[0]

    allowed-import-paths = ["root_path_1", "root_path_2", ...]

    Specify a list of allowed root paths that can be used in import operations, such as the COPY FROM command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine.

    For example:

    allowed-import-paths = ["/heavyai-storage/data/heavyai_import", "/home/centos"] The list of paths must be on the same line as the configuration parameter.

    Allowed file paths are enforced by default. The default import path (<data directory>/heavyai_import) is allowed by default, and all child paths of that allowed path are allowed.

    When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY FROM command, an error response is returned.

    N/A

    approx_quantile_buffer arg

    Size of a temporary buffer that is used to copy in the data for APPROX_MEDIAN calculation. When full, is sorted before being merged into the internal distribution buffer configured in approx_quantile_centroids.

    1000

    approx_quantile_centroids arg

    Size of the internal buffer used to approximate the distribution of the data for which the APPROX_MEDIAN calculation is taken. The larger the value, the greater the accuracy of the answer.

    300

    auth-cookie-name arg

    Configure the authentication cookie name. If not explicitly set, the default name is oat.

    oat

    bigint-count [=arg]

    Use 64-bit count. Disabled by default because 64-bit integer atomics are slow on GPUs. Enable this setting if you see negative values for a count, indicating overflow. In addition, if your data set has more than 4 billion records, you likely need to enable this setting.

    FALSE[0]

    bitmap-memory-limit arg

    Set the maximum amount of memory (in GB) allocated for APPROX_COUNT_DISTINCT bitmaps per execution kernel (thread or GPU).

    8

    calcite-max-mem arg

    Max memory available to calcite JVM. Change if Calcite reports out-of-memory errors.

    1024

    calcite-port arg

    Calcite port number. Change to avoid collisions with ports already in use.

    6279

    calcite-service-timeout

    Service timeout value, in milliseconds, for communications with Calcite. On databases with large numbers of tables, large numbers of concurrent queries, or many parallel updates and deletes, Calcite might return less quickly. Increasing the timeout value can prevent THRIFT_EAGAIN timeout errors.

    5000

    columnar-large-projections [=arg]

    Sets automatic use of columnar output, instead of row-wise output, for large projections.

    TRUE

    columnar-large-projections-threshold arg

    Set the row-number threshold size for columnar output instead of row-wise output.

    1000000

    config arg

    Path to heavy.conf. Change for testing and debugging.

    $HEAVYAI_STORAGE/ heavy.conf

    cpu-only

    Run in CPU-only mode. Set this flag to force HeavyDB to run in CPU mode, even when GPUs are available. Useful for debugging and on shared-tenancy systems where the current HeavyDB instance does not need to run on GPUs.

    FALSE

    cpu-buffer-mem-bytes arg

    Size (in bytes) of memory reserved for CPU buffers. Change to restrict the amount of CPU/system memory HeavyDB can consume. A default value of 0 indicates that 80% of total CPU memory will be used by the HeavyDB server.

    0
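As a back-of-envelope illustration of the default behavior described above (the 80% figure comes from that description; the function is a hypothetical sketch, not HeavyDB code):

```python
def effective_cpu_buffer_bytes(configured: int, total_system_bytes: int) -> float:
    # A configured value of 0 means 80% of total CPU memory is used,
    # per the cpu-buffer-mem-bytes description above.
    if configured == 0:
        return total_system_bytes * 0.8
    return float(configured)

# 256 GiB of system memory with the default setting of 0:
print(round(effective_cpu_buffer_bytes(0, 256 * 2**30) / 2**30, 1))  # -> 204.8
```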

    cuda-block-size arg

    Size of block to use on GPU. GPU performance tuning: Number of threads per block. Default of 0 means use all threads per block.

    0

    cuda-grid-size arg

    Size of grid to use on GPU. GPU performance tuning: Number of blocks per device. Default of 0 means use all available blocks per device.

    0

    data arg

    Directory path to HEAVY.AI catalogs. Change for testing and debugging.

    $HEAVYAI_STORAGE

    db-query-list arg

    Path to file containing HEAVY.AI queries. Use a query list to autoload data to GPU memory on startup to speed performance. See Preloading Data.

    N/A

    dynamic-watchdog-time-limit [=arg]

    Dynamic watchdog time limit, in milliseconds. Change if dynamic watchdog is stopping queries expected to take longer than this limit.

    100000

    enable-auto-clear-render-mem [=arg]

    Enable/disable automatic clearing of render GPU memory on out-of-memory errors during rendering. If an out-of-gpu-memory exception is thrown while rendering, many users respond by running \clear_gpu via the heavysql command-line interface to refresh/defrag the memory heap. Enabling this flag automates that process. At present, only GPU memory in the renderer is cleared automatically.

    TRUE[1]

    enable-auto-metadata-update [=arg]

    Enable automatic metadata updates on UPDATE queries. Automatic metadata updates are turned on by default. Disabling may result in stale metadata and reductions in query performance.

    TRUE[1]

    enable-columnar-output [=arg]

    Allows HEAVY.AI Core to directly materialize intermediate projections and the final ResultSet in Columnar format where appropriate. Columnar output is an internal performance enhancement that projects the results of an intermediate processing step in columnar format. Consider disabling this feature if you see unexpected performance regressions in your queries.

    TRUE[1]

    enable-data-recycler [=arg]

    Set to TRUE to enable the data recycler. Enabling the recycler enables the following:

    • Hashtable recycler, which is the cache storage.

    • Hashing scheme recycler, which preserves a hashtable layout (such as perfect hashing and keyed hashing).

    • Overlaps hashtable tuning parameter recycler. Each overlap hashtable has its own parameters used during hashtable building.

    TRUE[1]

    enable-debug-timer [=arg]

    Enable fine-grained query execution timers for debug. For debugging, logs verbose timing information for query execution (time to load data, time to compile code, and so on).

    FALSE[0]

    enable-direct-columnarization [=arg(=1)](=0)

    Columnarization organizes intermediate results in a multi-step query in the most efficient way for the next step in the process. If you see an unexpected performance regression, you can try setting this value to false, enabling the earlier HEAVY.AI columnarization behavior.

    TRUE[1]

    enable-dynamic-watchdog [=arg]

    Enable dynamic watchdog.

    FALSE[0]

    enable-executor-resource-mgr [=arg]

    Enable the executor resource manager.

    TRUE[1]

    enable-filter-push-down [=arg(=1)] (=0)

    Enable filter push-down through joins. Evaluates filters in the query expression for selectivity and pushes down highly selective filters into the join according to selectivity parameters. See also What is Predicate Pushdown?

    FALSE[0]

    enable-foreign-table-scheduled-refresh [=arg]

    Enable scheduled refreshes of foreign tables. Automatically refreshes foreign tables that have the "REFRESH_TIMING_TYPE" option set to "SCHEDULED", based on the specified refresh schedule.

    TRUE[1]

    enable-geo-ops-on-uncompressed-coords [=arg(=1)] (=0)

    Allow geospatial operations ST_Contains and ST_Intersects to process uncompressed coordinates where possible to increase execution speed. Provides control over the selection of ST_Contains and ST_Intersects implementations. By default, for certain combinations of compressed geospatial arguments, such as ST_Contains(POLYGON, POINT), the implementation can process uncompressed coordinate values. This can result in much faster execution but could decrease precision. Disabling this option enables full decompression, which is slower but more precise.

    TRUE[1]

    enable-logs-system-tables [=arg(=1)] (=0)

    Enable use of logs system tables. Also enables the Request Logs and Monitoring system dashboard (Enterprise Edition only).

    FALSE[0]

    enable-overlaps-hashjoin [=arg(=1)] (=0)

    Enable the overlaps hash join framework allowing for range join (for example, spatial overlaps) computation using a hash table.

    TRUE[1]

    enable-runtime-query-interrupt [=arg(=1)] (=0)

    Enable the runtime query interrupt. Setting to TRUE can reduce performance slightly. Use with running-query-interrupt-freq to set the interrupt frequency.

    FALSE[0]

    enable-runtime-udf

    Enable runtime registration of user-defined functions. This functionality is turned off unless you specifically request it, to prevent unintentional inclusion of nonstandard code. This setting is a precursor to more advanced object permissions planned in future releases.

    FALSE[0]

    enable-string-dict-hash-cache [=arg(=1)] (=0)

    When importing a large table with low cardinality, set the flag to TRUE and leave it on to assist with bulk queries. If using String Dictionary Server, set the flag to FALSE if the String Dictionary server uses more memory than the physical system can support.

    TRUE[1]

    enable-thrift-logs [=arg(=1)] (=0)

    Enable writing messages directly from Thrift to stdout/stderr. Change to enable verbose Thrift messages on the console.

    FALSE[0]

    enable-watchdog [arg]

    Enable watchdog.

    TRUE[1]

    executor-cpu-result-mem-ratio

    Set executor resource manager reserved memory for query result sets as a ratio greater than 0, representing the fraction of the system memory not allocatable for the CPU buffer pool. Values greater than 1.0 are permitted, to allow over-subscription when warranted, but too high a value can cause out-of-memory errors.

    Example: In a system with 256 GB of RAM, the default CPU buffer size is 204.8 GB, so this ratio is calculated on the remaining 51.2 GB, limiting the maximum result-set memory for a single query to about 41 GB.
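The arithmetic in the example above can be sketched as follows (the 0.8 buffer fraction and 0.8 result-memory ratio are assumptions inferred from the surrounding text, not values read from a running server):

```python
def max_result_set_mem_gb(system_ram_gb: float,
                          cpu_buffer_fraction: float = 0.8,
                          result_mem_ratio: float = 0.8) -> float:
    # The ratio applies to the memory left over after the
    # CPU buffer pool reservation.
    leftover = system_ram_gb * (1.0 - cpu_buffer_fraction)  # 51.2 GB for 256 GB RAM
    return leftover * result_mem_ratio

print(round(max_result_set_mem_gb(256), 1))  # -> 41.0
```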

    executor-cpu-result-mem-bytes

    Set executor resource manager reserved memory for query result sets in bytes. This overrides the default reservation of 80% the size of the system memory that is not allocated for the CPU buffer pool. Use 0 for auto.

    DEFAULT: None (result memory size is controlled via the ratio setting above)

    executor-per-query-max-cpu-threads-ratio

    Set max fraction of executor resource manager total CPU slots/threads that can be allocated for a single query.

    Values greater than 1 are allowed, permitting over-subscription of threads when warranted, because the estimate of kernel core occupation can be overly pessimistic for some classes of queries. Take care not to set this value too high, however, as thrashing and thread starvation can result. Example: on a physical server with 24 logical CPUs, or in a VM with 24 vCPUs, the executor thread count is doubled to 48, so a value of 0.9 allows up to 43 threads for a single query. Lower this value to reduce the memory requirements of single queries.

    DEFAULT: 0.9
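The worked example above amounts to the following calculation (a sketch mirroring the text; the "executor threads = 2 x logical CPUs" doubling is taken from the example, not from HeavyDB internals):

```python
import math

def per_query_max_threads(logical_cpus: int, ratio: float = 0.9) -> int:
    executor_threads = logical_cpus * 2           # 24 logical CPUs -> 48 threads
    return math.floor(ratio * executor_threads)   # 0.9 * 48 = 43.2 -> 43

print(per_query_max_threads(24))  # -> 43
```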

    executor-per-query-max-cpu-result-mem-ratio

    Set max fraction of executor resource manager total CPU result memory reservation that can be allocated for a single query.

    Values greater than 1 are allowed, permitting over-subscription of memory when warranted, but be careful: too high a value can cause out-of-memory errors.

    Default: 0.8

    filter-push-down-low-frac

    Lower threshold for selectivity of filters that are pushed down. Filters with selectivity lower than this threshold are considered for push down.

    filter-push-down-passing-row-ubound

    Upper bound on the number of rows that should pass the filter if the selectivity is less than the high fraction threshold.

    flush-log [arg]

    Immediately flush logs to disk. Set to FALSE if this is a performance bottleneck.

    TRUE[1]

    from-table-reordering [=arg(=1)] (=1)

    Enable automatic table reordering in FROM clause. Reorders the sequence of a join to place large tables on the inside of the join clause and smaller tables on the outside. HEAVY.AI also reorders tables between join clauses to prefer hash joins over loop joins. Change this value only in consultation with a HEAVY.AI engineer.

    TRUE[1]

    gpu-buffer-mem-bytes [=arg]

    Size of memory reserved for GPU buffers in bytes per GPU. Change to restrict the amount of GPU memory HeavyDB can consume per GPU. A default value of 0 indicates no limit on GPU memory use (HeavyDB uses all available GPU memory across all active GPUs on the system).

    0

    Maximum amount of memory in bytes that can be used for the GPU code cache.

    134217728 (128MB)

    gpu-input-mem-limit arg

    Force query to CPU when input data memory usage exceeds this percentage of available GPU memory. HeavyDB loads data to GPU incrementally until data exceeds GPU memory, at which point the system retries on CPU. Loading data to GPU evicts any resident data already loaded or any query results that are cached. Use this limit to avoid attempting to load datasets to GPU when they obviously will not fit, preserving cached data on GPU and increasing query performance. If watchdog is enabled and allow-cpu-retry is not enabled, the query fails instead of re-running on CPU.

    0.9

    hashtable-cache-total-bytes [=arg]

    The total size of the cache storage for hashtable recycler, in bytes. Increase the cache size to store more hashtables. Must be larger than or equal to the value defined in max-cacheable-hashtable-size-bytes.

    4294967296 (4GB)

    hll-precision-bits [=arg]

    Number of bits used from the hash value used to specify the bucket number. Change to increase or decrease approx_count_distinct() precision. Increased precision decreases performance.

    11
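For intuition about the precision/performance trade-off, the standard HyperLogLog relative-error estimate is roughly 1.04/√m with m = 2^precision-bits buckets; this formula comes from the HyperLogLog literature, not from HEAVY.AI documentation:

```python
import math

def hll_relative_error(precision_bits: int) -> float:
    buckets = 2 ** precision_bits
    return 1.04 / math.sqrt(buckets)

# The default of 11 bits gives 2048 buckets and roughly 2.3% typical error;
# each extra bit doubles the buckets and shrinks the error by ~sqrt(2).
print(f"{hll_relative_error(11):.3%}")  # prints 2.298%
```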

    http-port arg

    HTTP port number. Change to avoid collisions with ports already in use.

    6278

    idle-session-duration arg

    Maximum duration of an idle session, in minutes. Change to increase or decrease duration of an idle session before timeout.

    60

    inner-join-fragment-skipping [=arg(=1)] (=0)

    Enable or disable inner join fragment skipping. Enables skipping fragments for improved performance during inner join operations.

    FALSE[0]

    license arg

    Path to the file containing the license key. Change if your license file is in a different location or has a different name.

    log-auto-flush

    Flush logging buffer to file after each message. Changing to false can improve performance, but log lines might not appear in the log for a very long time. HEAVY.AI does not recommend changing this setting.

    TRUE[1]

    log-directory arg

    Path to the log directory. Can be either a relative path to the $HEAVYAI_STORAGE/data directory or an absolute path. Use this flag to control the location of your HEAVY.AI log files. If the directory does not exist, HEAVY.AI creates the top level directory. For example, a/b/c/logdir is created only if the directory path a/b/c already exists.

    /var/lib/heavyai/ data/heavyai_log

    log-file-name

    Boilerplate for the name of the HEAVY.AI log files. You can customize the name of your HEAVY.AI log files. {SEVERITY} is the only braced token recognized. It allows you to create separate files for each type of error message greater than or equal to the log-severity configuration option.

    heavydb.{SEVERITY}. %Y%m%d-%H%M%S.log

    log-max-files

    Maximum number of log files to keep. When the number of log files exceeds this number, HEAVY.AI automatically deletes the oldest files.

    100

    log-min-free-space

    Minimum number of bytes left on device before oldest log files are deleted. This is a safety feature to be sure the disk drive of the log directory does not fill up, and guarantees that at least this many bytes are free.

    20971520

    log-rotation-size

    Maximum file size in bytes before new log files are started. Change to increase/decrease size of files. If log files fill quickly, you might want to increase this number so that there are fewer log files.

    10485760

    log-rotate-daily

    Start new log files at midnight. Set to false to write to log files until they are full, rather than restarting each day.

    TRUE[1]

    log-severity

    Log to file severity levels:

    DEBUG4

    DEBUG3

    DEBUG2

    DEBUG1

    INFO

    WARNING

    ERROR

    FATAL

    All levels after your chosen base severity level are listed. For example, if you set the severity level to WARNING, HEAVY.AI only logs WARNING, ERROR, and FATAL messages.

    INFO

    log-severity-clog

    Log to console severity level: INFO WARNING ERROR FATAL. Output chosen severity messages to STDERR from running process.

    WARNING

    log-symlink

    Symbolic link to the active log. Creates a symbolic link for every severity greater than or equal to the log-severity configuration option.

    heavydb. {SEVERITY}.log

    log-user-id

    Log internal numeric user IDs instead of textual user names.

    log-user-origin

    Look up the origin of inbound connections by IP address and DNS name and print this information as part of stdlog. Some systems throttle DNS requests or have other network constraints that preclude timely return of user origin information. Set to FALSE to improve performance on those networks or when large numbers of users from different locations make rapid connect/disconnect requests to the server.

    TRUE[1]

    logs-system-tables-max-files-count [=arg]

    Maximum number of log files that can be processed by each logs system table.

    100

    max-cacheable-hashtable-size-bytes [=arg]

    Maximum size of the hashtable that the hashtable recycler can store. Limiting the size can enable more hashtables to be stored. Must be less than or equal to the value defined in hashtable-cache-total-bytes.

    2147483648 (2GB)
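The two hashtable-cache settings are coupled: max-cacheable-hashtable-size-bytes must be less than or equal to hashtable-cache-total-bytes. A small validation sketch (a hypothetical helper, not part of HeavyDB):

```python
def validate_hashtable_cache(total_bytes: int, max_cacheable_bytes: int) -> None:
    # Per the descriptions above, an individual cacheable hashtable
    # may not be larger than the total cache storage.
    if max_cacheable_bytes > total_bytes:
        raise ValueError("max-cacheable-hashtable-size-bytes must be "
                         "<= hashtable-cache-total-bytes")

# The documented defaults satisfy the constraint: 2 GB cacheable, 4 GB total.
validate_hashtable_cache(4 * 2**30, 2 * 2**30)
```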

    max-session-duration arg

    Maximum duration of the active session, in minutes. Change to increase or decrease session duration before timeout.

    43200 (30 days)

    ndv-group-estimator-multiplier

    A value that determines how the result of the Number of Distinct Values (NDV) group estimator is scaled. This value must be between 1.0 and 2.0.

    1.5

    null-div-by-zero [=arg]

    Allows processing to complete when the dataset would cause a divide-by-zero error. Set to TRUE to return null when dividing by zero, or FALSE to throw an exception.

    FALSE[0]

    num-executors arg

    Beta functionality in Release 5.7. Set the number of executors.

    num-gpus arg

    Number of GPUs to use. In a shared environment, you can assign the number of GPUs to a particular application. The default, -1, uses all available GPUs. Use in conjunction with start-gpu.

    -1

    num-reader-threads arg

    Number of reader threads to use. Drop the number of reader threads to prevent imports from using all available CPU power. Default is to use all threads.

    0

    overlaps-bucket-threshold arg

    The minimum size of a bucket corresponding to a given inner table range for the overlaps hash join.

    -p | port int

    HeavyDB server port. Change to avoid collisions with other services if 6274 is already in use.

    6274

    pending-query-interrupt-freq=arg

    Frequency with which to check the interrupt status of pending queries, in milliseconds. Values larger than 0 are valid. If you set pending-query-interrupt-freq=100, each session's interrupt status is checked every 100 ms.

    For example, assume you have three sessions (S1, S2, and S3) in your queue, where S1 contains a running query, and S2 and S3 hold pending queries. If you set pending-query-interrupt-freq=1000, both S2 and S3 are interrupted every 1000 ms (1 sec). See running-query-interrupt-freq for information about interrupting running queries. Decreasing the value increases the speed with which pending queries are removed, but also increases resource usage.

    1000 (1 sec)

    pki-db-client-auth [=arg]

    Attempt authentication of users through a PKI certificate. Set to TRUE for the server to attempt PKI authentication.

    FALSE[0]

    read-only [=arg(=1)]

    Enable read-only mode. Prevents changes to the dataset.

    FALSE[0]

    render-mem-bytes arg

    Specifies the size of a per-GPU buffer that render query results are written to; allocated at the first rendering call. Persists while the server is running unless you run \clear_gpu_memory. Increase if rendering a large number of points or symbols and you get the following out-of-memory exception: Not enough OpenGL memory to render the query results.

    Default is 500 MB.

    500000000

    render-oom-retry-threshold = arg

    A render execution time limit, in milliseconds, for retrying a render request after an out-of-gpu-memory error. Requires enable-auto-clear-render-mem = true. A retry occurs only if the first run took less than this threshold, and is attempted after the render GPU memory is automatically cleared; clearing the memory might allow the request to succeed. Setting a reasonable threshold can add stability on memory-constrained servers with rendering enabled. Only a single retry is attempted. A value of 0 disables retries.

    rendering [=arg]

    Enable or disable backend rendering. Disable rendering when not in use, freeing up memory reserved by render-mem-bytes. To reenable rendering, you must restart HEAVY.AI Server.

    TRUE[1]

    res-gpu-mem =arg

    Reserved memory for GPU. Reserves extra memory for your system (for example, if the GPU is also driving your display, such as on a laptop or single-card desktop). HEAVY.AI uses all the memory on the GPU except for render-mem-bytes + res-gpu-mem. Also useful if other processes, such as a machine-learning pipeline, share the GPU with HEAVY.AI. In advanced rendering scenarios or distributed setups, increase to free up additional memory for the renderer, or for aggregating results for the renderer from multiple leaf nodes. HEAVY.AI recommends always setting res-gpu-mem when using backend rendering.

    134217728

    running-query-interrupt-freq arg

    Controls the frequency of interruption status checking for running queries. Range: 0.0 (less frequently) to 1.0 (more frequently).

    For example, if you have 10 threads evaluating a query over a table of 1000 rows, each thread advances its thread index up to 10 times. If you set the flag close to 1.0, the session's interrupt status is checked on every increment of the thread index.

    If you set the flag close to 0.0, the interrupt status is checked only when the index increment is close to 10. The default checks at roughly half of the maximum increment of the thread index.

    Frequent interrupt status checking reduces latency for the interrupt but also can decrease query performance.

    seek-kafka-commit = <N>

    Set the offset of the last Kafka message to be committed from a Kafka data stream, so that Kafka does not resend those messages. After the Kafka server commits messages through the number N, it resends messages starting at message N+1. This is particularly useful when you want to create a replica of the HEAVY.AI server from an existing data directory.

    N/A

    ssl-cert path

    Path to the server's public PKI certificate (.crt file). Define the path to the .crt file. Used to establish an encrypted binary connection.

    ssl-keystore path

    Path to the server keystore: the Java trust store containing the server's public PKI key. Used by HeavyDB to connect to the encrypted Calcite server port over an encrypted binary connection.

    ssl-keystore-password password

    The password for the SSL keystore. Used to create a binary encrypted connection to the Calcite server.

    ssl-private-key path

    Path to the server's private PKI key. Define the path to the HEAVY.AI server PKI key. Used to establish an encrypted binary connection.

    ssl-trust-ca path

    Enable use of CA-signed certificates presented by Calcite. Defines the file that contains trusted CA certificates. This information enables the server to validate the TCP/IP Thrift connections it makes as a client to the Calcite server. The certificate presented by the Calcite server is the same as the certificate used to identify the database server to its clients.

    ssl-trust-ca-server path

    Path to the file containing trusted CA certificates; for PKI authentication. Used to validate certificates submitted by clients. If the certificate provided by the client (in the password field of the connect command) was not signed by one of the certificates in the trusted file, then the connection fails. PKI authentication works only if the server is configured to encrypt connections via TLS. The common name extracted from the client certificate is used as the name of the user to connect. If this name does not already exist, the connection fails. If LDAP or SAML are also enabled, the servers fall back to these authentication methods if PKI authentication fails. Currently works only with JDBCarrow-up-right clients. To allow connection from other clients, set allow-local-auth-fallback or add LDAP/SAML authentication.

    ssl-trust-password password

    The password for the SSL trust store containing the server's public PKI key. Used to establish an encrypted binary connection.

    ssl-trust-store path

    The path to the Java trustStore containing the server's public PKI key. Used by the Calcite server to connect to the encrypted HeavyDB server port, to establish an encrypted binary connection.

    start-gpu arg

    First GPU to use. Used in shared environments in which the first assigned GPU is not GPU 0. Use in conjunction with num-gpus.

    0

    trivial-loop-join-threshold [=arg]

    The maximum number of rows in the inner table of a loop join considered to be trivially small.

    1000

    use-cpu-mem-pool-for-output-buffers

    Use the CPU memory buffer pool (whose capacity is determined by the cpu-buffer-mem-bytes configuration parameter) for output buffer allocations. When this configuration parameter is set to false, output (e.g. result set) buffer allocations will use heap memory outside the cpu-buffer-mem-bytes based memory buffer pool.

    TRUE[1]

    use-hashtable-cache

    Set to TRUE to enable the hashtable recycler. Supports complex scenarios, such as hashtable recycling for queries that have subqueries.

    TRUE[1]

    vacuum-min-selectivity [=arg]

    Specify the percentage (with a value of 0 implying 0% and a value of 1 implying 100%) of deleted rows in a fragment at which to perform automatic vacuuming.

    Automatic vacuuming occurs when deletes or updates on variable-length columns result in a percentage of deleted rows in a fragment exceeding the specified threshold. The default threshold is 10% of deleted rows in a fragment.

    When changing this value, consider the most common types of queries run on the system. In general, if you have infrequent updates and deletes, set vacuum-min-selectivity to a low value. Set it higher if you have frequent updates and deletes, because vacuuming adds overhead to affected UPDATE and DELETE queries.
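The threshold semantics can be illustrated with a minimal sketch (not the server implementation; the function name and signature are hypothetical):

```python
def needs_vacuum(deleted_rows: int, fragment_rows: int,
                 vacuum_min_selectivity: float = 0.1) -> bool:
    """Illustrative sketch: a fragment qualifies for automatic
    vacuuming once its fraction of deleted rows exceeds the
    vacuum-min-selectivity threshold (default 0.1, i.e. 10%)."""
    if fragment_rows == 0:
        return False
    return deleted_rows / fragment_rows > vacuum_min_selectivity
```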

    watchdog-none-encoded-string-translation-limit [=arg]

The maximum number of strings that can be cast using the ENCODED_TEXT string operator.

    1,000,000

    window-function-frame-aggregation-tree-fanout [=arg]

Fan-out of the aggregation tree used to compute aggregations over a window frame.

    8

    compressor arg (=lz4hc)

Compression algorithm used by the server to compress data transferred between servers. See Data Compression for compression algorithm options.

    lz4hc

    ldap-dn arg

    LDAP Distinguished Name.

    ldap-role-query-regex arg

    RegEx to use to extract role from role query result.

    ldap-role-query-url arg

    LDAP query role URL.

    ldap-superuser-role arg

    The role name to identify a superuser.

    ldap-uri arg

    LDAP server URI.

    leaf-conn-timeout [=arg]

    Leaf connect timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if a connection cannot be established.

    20000

    leaf-recv-timeout [=arg]

    Leaf receive timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not received in the time allotted.

    300000

    leaf-send-timeout [=arg]

    Leaf send timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not sent in the time allotted.

    300000

    saml-metadata-file arg

    Path to identity provider metadata file.

    Required for running SAML. An identity provider (like Okta) supplies a metadata file. From this file, HEAVY.AI uses:

    1. Public key of the identity provider to verify that the SAML response comes from it and not from somewhere else.

    2. URL of the SSO login page used to obtain a SAML token.

    saml-sp-target-url arg

    URL of the service provider for which SAML assertions should be generated. Required for running SAML. Used to verify that a SAML token was issued for HEAVY.AI and not for some other service.

    saml-sync-roles arg (=0)

    Enable mapping of SAML groups to HEAVY.AI roles. The SAML Identity provider (for example, Okta) automatically creates users at login and assigns them roles they already have as groups in SAML.

    saml-sync-roles [=0]

    string-servers arg

    Path to string servers list JSON file. Indicates that HeavyDB is running in distributed mode and is required to designate a leaf server when running in distributed mode.

    gpu-code-cache-max-size-in-bytes [=arg]

    Geospatial Capabilities

    HEAVY.AI supports a subset of object types and functions for storing and writing queries for geospatial definitions.

    Geospatial Datatypes

    Type
    Size
    Example

For information about geospatial datatype sizes, see Storage and Compression in Datatypes.

For more information on WKT primitives, see Wikipedia: Well-known Text: Geometric objects.

HEAVY.AI supports SRID 4326 (WGS 84), 900913 (Google Web Mercator), and 32601-32660 and 32701-32760 (Universal Transverse Mercator (UTM) zones). When using geospatial fields, you set the SRID to determine which reference system to use. HEAVY.AI does not assign a default SRID.

    If you do not set the SRID of the geo field in the table, you can set it in a SQL query using ST_SETSRID(column_name, SRID). For example, ST_SETSRID(a.pt,4326).


    When representing longitude and latitude, the first coordinate is assumed to be longitude in HEAVY.AI geospatial primitives.

    You create geospatial objects as geometries (planar spatial data types), which are supported by the planar geometry engine at run time. When you call ST_DISTANCE on two geometry objects, the engine returns the shortest straight-line planar distance, in degrees, between those points. For example, the following query returns the shortest distance between the point(s) in p1 and the polygon(s) in poly1:

For information about importing data, see Importing Geospatial Data.
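Planar ST_DISTANCE between two points is plain Euclidean distance in the coordinate units (degrees for SRID 4326 geometries). A sketch of that calculation outside the database (illustrative only, not the engine's code):

```python
import math

def planar_distance(p1, p2):
    """Euclidean distance between two (x, y) points, in the same
    units as the input coordinates (e.g., degrees for SRID 4326)."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])
```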

    Geospatial Literals

    Geospatial functions that expect geospatial object arguments accept geospatial columns, geospatial objects returned by other functions, or string literals containing WKT representations of geospatial objects. Supplying a WKT string is equivalent to calling a geometry constructor. For example, these two queries are identical:

    You can create geospatial literals with a specific SRID. For example:

    Support for Geography

    HEAVY.AI provides support for geography objects and geodesic distance calculations, with some limitations.

    Exporting Coordinates from Immerse

    HeavyDB supports import from any coordinate system supported by the Geospatial Data Abstraction Library (GDAL). On import, HeavyDB will convert to and store in WGS84 encoding, and rendering is accurate in Immerse.

    However, no built-in way to reference the original coordinates currently exists in Immerse, and coordinates exported from Immerse will be WGS84 coordinates. You can work around this limitation by adding to the dataset a column or columns in non-geo format that could be included for display in Immerse (for example, in a popup) or on export.

    Distance Calculation

    Currently, HEAVY.AI supports spheroidal distance calculation between:

    • Two points using either SRID 4326 or 900913.

    • A point and a polygon/multipolygon using SRID 900913.


    Using SRID 900913 results in variance compared to SRID 4326 as polygons approach the North and South Poles.

    The following query returns the points and polygons within 1,000 meters of each other:

See the tables in Geospatial Functions below for examples.
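For intuition, geodesic point-to-point distance of the kind described above can be approximated with the haversine formula. This is a spherical-Earth sketch, not HeavyDB's actual geodesic implementation, which may use a more precise spheroidal model:

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two WGS84 points,
    using a spherical-Earth approximation (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))
```

For example, Los Angeles (-118.4079, 33.9434) and Paris (2.5559, 49.0083) come out a little over 9,000 km apart, consistent with the ST_DWITHIN 10,000 km check shown later in this topic.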

    Geospatial Functions

    HEAVY.AI supports the functions listed.

    Geometry Constructors

    Function
    Description

    Geometry to String Conversion

    Function
    Description

    Geometry Processing

    Function
    Description


    Geometry Editors

    Function
    Description

    Geometry Accessors

    Function
    Description

    Overlay Functions

    Function
    Description

    Spatial Relationships and Measurements

    Function
    Description

    Additional Geo Notes

    • You can use SQL code similar to the examples in this topic as global filters in Immerse.

    • CREATE TABLE AS SELECT is not currently supported for geo data types in distributed mode.

• GROUP BY is not supported for geo types (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON).

    MULTIPOINT

    Variable

    A set of one or more points. For example: MULTIPOINT((0 0), (1 1), (2 2))

    MULTILINESTRING

    Variable

    A set of one or more associated lines, each of two or more points. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1))

    ST_GeogFromText(WKT, SRID)

    Return a specified geography value from Well-known Text representation and an SRID.

    ST_Point(double lon, double lat)

    Return a point constructed on the fly from the provided coordinate values. Constant coordinates result in construction of a POINT literal.

    Example: ST_Contains(poly4326, ST_SetSRID(ST_Point(lon, lat), 4326))


ST_YMIN

    Returns Y minima of a geometry.

    ST_YMAX

    Returns Y maxima of a geometry.

    ST_STARTPOINT

    Returns the first point of a LINESTRING as a POINT.

    ST_ENDPOINT

    Returns the last point of a LINESTRING as a POINT.

    ST_POINTN

    Return the Nth point of a LINESTRING as a POINT.

    ST_NPOINTS

    Returns the number of points in a geometry.

    ST_NRINGS

    Returns the number of rings in a POLYGON or a MULTIPOLYGON.

    ST_SRID

    Returns the spatial reference identifier for the underlying object.

    ST_NUMGEOMETRIES

Returns the number of component geometries in a MULTIPOINT, MULTILINESTRING, or MULTIPOLYGON. Returns 1 for non-MULTI geometries.

    ST_INTERSECTS

    Returns true if two geometries intersect spatially, false if they do not share space. For example:

    SELECT ST_INTERSECTS( 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))', 'POINT(1 1)' ) FROM tbl;

    ST_AREA

    Returns the area of planar areas covered by POLYGON and MULTIPOLYGON geometries. For example:

    SELECT ST_AREA( 'POLYGON((1 0, 0 1, -1 0, 0 -1, 1 0),(0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0))' ) FROM tbl;

    ST_AREA does not support calculation of geographic areas, but rather uses planar coordinates. Geographies must first be projected in order to use ST_AREA. You can do this ahead of time before import or at runtime, ideally using an equal area projection (for example, a national equal-area Lambert projection). The area is calculated in the projection's units. For example, you might use Web Mercator runtime projection to get the area of a polygon in square meters:

ST_AREA(
  ST_TRANSFORM(
   ST_GeomFromText(
    'POLYGON((-76.6168198439371 39.9703199555959,
              -80.5189990254673 40.6493554919257,
              -82.5189990254673 42.6493554919257,
              -76.6168198439371 39.9703199555959))', 4326
   ),
   900913
  )
 )

    ST_PERIMETER

Returns the Cartesian perimeter of POLYGON and MULTIPOLYGON geometries. For example: SELECT ST_PERIMETER('POLYGON( (1 0, 0 1, -1 0, 0 -1, 1 0), (0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0) )') FROM tbl; It also returns the geodesic perimeter of POLYGON and MULTIPOLYGON geographies. For example:

SELECT ST_PERIMETER(
  ST_GeogFromText(
   'POLYGON((-76.6168198439371 39.9703199555959,
             -80.5189990254673 40.6493554919257,
             -82.5189990254673 42.6493554919257,
             -76.6168198439371 39.9703199555959))',
   4326)
 ) FROM tbl;

    ST_LENGTH

Returns the Cartesian length of LINESTRING geometries. For example: SELECT ST_LENGTH('LINESTRING(1 0, 0 1, -1 0, 0 -1, 1 0)') FROM tbl; It also returns the geodesic length of LINESTRING geographies. For example:

    SELECT ST_LENGTH( ST_GeogFromText('LINESTRING( -76.6168198439371 39.9703199555959, -80.5189990254673 40.6493554919257, -82.5189990254673 42.6493554919257)', 4326) ) FROM tbl;

    ST_WITHIN

    Returns true if geometry A is completely within geometry B. For example the following SELECT statement returns true:

    SELECT ST_WITHIN( 'POLYGON ((1 1, 1 2, 2 2, 2 1))', 'POLYGON ((0 0, 0 3, 3 3, 3 0))' ) FROM tbl;

    ST_DWITHIN

Returns true if the geometries are within the specified distance of one another. Distance is specified in units defined by the spatial reference system of the geometries. For example: SELECT ST_DWITHIN( 'POINT(1 1)', 'LINESTRING (1 2,10 10,3 3)', 2.0 ) FROM tbl; ST_DWITHIN supports geodesic distances between geographies, currently limited to geographic points. For example, you can check whether Los Angeles and Paris, specified as WGS84 geographic point literals, are within 10,000 km of one another.

SELECT ST_DWITHIN(
  ST_GeogFromText('POINT(-118.4079 33.9434)', 4326),
  ST_GeogFromText('POINT(2.5559 49.0083)', 4326),
  10000000.0) FROM tbl;

    ST_DFULLYWITHIN

    Returns true if the geometries are fully within the specified distance of one another. Distance is specified in units defined by the spatial reference system of the geometries. For example: SELECT ST_DFULLYWITHIN( 'POINT(1 1)', 'LINESTRING (1 2,10 10,3 3)', 10.0) FROM tbl; This function supports:

    ST_DFULLYWITHIN(POINT, LINESTRING, distance) ST_DFULLYWITHIN(LINESTRING, POINT, distance)

    ST_DISJOINT

Returns true if the geometries are spatially disjoint (that is, the geometries do not overlap or touch). For example:

    SELECT ST_DISJOINT( 'POINT(1 1)', 'LINESTRING (0 0,3 3)' ) FROM tbl;

  • You can use \d table_name to determine if the SRID is set for the geo field:

    If no SRID is returned, you can set the SRID using ST_SETSRID(column_name, SRID). For example, ST_SETSRID(myPoint, 4326).

LINESTRING

    Variable

    A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2)

    MULTIPOLYGON

    Variable

    A set of one or more polygons. For example:MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))

    POINT

    Variable

    A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0)

    POLYGON

    Variable

A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))

    ST_GeomFromText(WKT)

    Return a specified geometry value from Well-known Text representation.

    ST_GeomFromText(WKT, SRID)

    Return a specified geometry value from Well-known Text representation and an SRID.

ST_GeogFromText(WKT)

Return a specified geography value from Well-known Text representation.

    ST_AsText(geom) | ST_AsWKT(geom)

    Converts a geometry input to a Well-Known-Text (WKT) string

    ST_AsBinary(geom) | ST_AsWKB(geom)

    Converts a geometry input to a Well-Known-Binary (WKB) string

    ST_Buffer

    Returns a geometry covering all points within a specified distance from the input geometry. Performed by the GEOS module. The output is currently limited to the MULTIPOLYGON type.

    Calculations are in the units of the input geometry’s SRID. Buffer distance is expressed in the same units. Example:

    SELECT ST_Buffer('LINESTRING(0 0, 10 0, 10 10)', 1.0);

    Special processing is automatically applied to WGS84 input geometries (SRID=4326) to limit buffer distortion:

    • Implementation first determines the best planar SRID to which to project the 4326 input geometry.

    • Preferred SRIDs are UTM and Lambert (LAEA) North/South zones, with Mercator used as a fallback.

    • Buffer distance is interpreted as distance in meters (units of all planar SRIDs being considered).

    • The input geometry is transformed to the best planar SRID and handed to GEOS, along with buffer distance.

    • The buffer geometry built by GEOS is then transformed back to SRID=4326 and returned.

    Example: Build 10-meter buffer geometries (SRID=4326) with limited distortion:

    SELECT ST_Buffer(poly4326, 10.0) FROM tbl;

    ST_Centroid

    Computes the geometric center of a geometry as a POINT.

    ST_TRANSFORM

Returns a geometry with its coordinates transformed to a different spatial reference. Currently, WGS84 to Web Mercator transform is supported. For example: ST_DISTANCE( ST_TRANSFORM(ST_GeomFromText('POINT(-71.064544 42.28787)', 4326), 900913), ST_GeomFromText('POINT(-13189665.9329505 3960189.38265416)', 900913) )

    ST_TRANSFORM is not currently supported in projections. It can be used only to transform geo inputs to other functions, such as ST_DISTANCE.

    ST_SETSRID

Set the SRID to a specific integer value. For example:

    ST_TRANSFORM(

    ST_SETSRID(ST_GeomFromText('POINT(-71.064544 42.28787)'), 4326), 900913 )

    ST_X

    Returns the X value from a POINT column.

    ST_Y

    Returns the Y value from a POINT column.

    ST_XMIN

    Returns X minima of a geometry.

ST_XMAX

Returns X maxima of a geometry.

    ST_INTERSECTION

    Returns a geometry representing an intersection of two geometries; that is, the section that is shared between the two input geometries. Performed by the GEOS module.

    The output is currently limited to MULTIPOLYGON type, because HEAVY.AI does not support mixed geometry types within a geometry column, and ST_INTERSECTION can potentially return points, lines, and polygons from a single intersection operation. Lower-dimension intersecting features such as points and line strings are returned as very small buffers around those features. If needed, true points can be recovered by applying the ST_CENTROID method to point intersection results. In addition, ST_PERIMETER/2 of resulting line intersection polygons can be used to approximate line length. Empty/NULL geometry outputs are not currently supported.

    Examples: SELECT ST_Intersection('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))'); SELECT ST_Area(ST_Intersection(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;

    ST_DIFFERENCE

    Returns a geometry representing the portion of the first input geometry that does not intersect with the second input geometry. Performed by the GEOS module. Input order is important; the return geometry is always a section of the first input geometry.

    The output is currently limited to MULTIPOLYGON type, for the same reasons described in ST_INTERSECTION. Similar post-processing methods can be applied if needed. Empty/NULL geometry outputs are not currently supported.

    Examples: SELECT ST_Difference('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))'); SELECT ST_Area(ST_Difference(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;

ST_UNION

Returns a geometry representing the union (or combination) of the two input geometries. Performed by the GEOS module.

The output is currently limited to MULTIPOLYGON type for the same reasons described in ST_INTERSECTION. Similar post-processing methods can be applied if needed. Empty/NULL geometry outputs are not currently supported.

Examples: SELECT ST_UNION('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))'); SELECT ST_AREA(ST_UNION(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;

    ST_DISTANCE

    Returns shortest planar distance between geometries. For example: ST_DISTANCE(poly1, ST_GeomFromText('POINT(0 0)')) Returns shortest geodesic distance between two points, in meters, if given two point geographies. Point geographies can be specified through casts from point geometries or as literals. For example: ST_DISTANCE( CastToGeography(p2), ST_GeogFromText('POINT(2.5559 49.0083)', 4326) )

    SELECT a.name, ST_DISTANCE( CAST(a.pt AS GEOGRAPHY), CAST(b.pt AS GEOGRAPHY) ) AS dist_meters FROM starting_point a, destination_points b;

    You can also calculate the distance between a POLYGON and a POINT. If both fields use SRID 4326, then the calculated distance is in 4326 units (degrees). If both fields use SRID 4326, and both are transformed into 900913, then the results are in 900913 units (meters).

    The following SQL code returns the names of polygons where the distance between the point and polygon is less than 1,000 meters.

    SELECT a.poly_name FROM poly a, point b WHERE ST_DISTANCE( ST_TRANSFORM(b.location,900913), ST_TRANSFORM(a.heavyai_geo,900913) ) < 1000;

    ST_EQUALS

    Returns TRUE if the first input geometry and the second input geometry are spatially equal; that is, they occupy the same space. Different orderings of points can be accepted as equal if they represent the same geometry structure.

    POINTs comparison is performed natively. All other geometry comparisons are performed by GEOS.

If the input geometries are both uncompressed or both compressed, all equality comparisons are precise. For mixed combinations, comparisons are performed with a compression-specific tolerance that allows recognition of equality despite subtle precision losses that compression may introduce. Note: Geo columns and literals with SRID=4326 are compressed by default.

    Examples: SELECT COUNT(*) FROM tbl WHERE ST_EQUALS('POINT(2 2)', pt); SELECT ST_EQUALS('POLYGON ((0 0,1 0,0 1))', 'POLYGON ((0 0,0 0.5,0 1,1 0,0 0))');

    ST_MAXDISTANCE

Returns longest planar distance between geometries. In effect, this is the diameter of a circle that encloses both geometries. For example:

    Currently supported variants:

    ST_CONTAINS






    Returns true if the first geometry object contains the second object. For example:

    You can also use ST_CONTAINS to:

    • Return the count of polys that contain the point (here as WKT): SELECT count(*) FROM geo1 WHERE ST_CONTAINS(poly1, 'POINT(0 0)');

    • Return names from a polys table that contain points in a points table: SELECT a.name FROM polys a, points b WHERE ST_CONTAINS(a.heavyai_geo, b.location);

    heavysql> \d starting_point
    CREATE TABLE starting_point (
                                   name TEXT ENCODING DICT(32),
                                   myPoint GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32)
                                 )
    CREATE TABLE simple_geo (
                              name TEXT ENCODING DICT(32), 
                              location GEOMETRY(POINT,4326)
                             );
    SELECT ST_DISTANCE(p1, poly1) FROM geo1;
SELECT COUNT(*) FROM geo1 WHERE ST_DISTANCE(p1, 'POINT(1 2)') < 1.0;
    SELECT COUNT(*) FROM geo1 WHERE ST_DISTANCE(p1, ST_GeomFromText('POINT(1 2)')) < 1.0;
    SELECT ST_CONTAINS(
                         mpoly2, 
                         ST_GeomFromText('POINT(-71.064544 42.28787)', 4326)
                       )
                       FROM geo2;
SELECT a.poly_name, b.pt_name FROM poly a, pt b 
WHERE ST_Distance(
   ST_Transform(a.heavyai_geo, 900913),
   ST_Transform(b.location, 900913)) < 1000;
  • Return names from a polys table that contain points in a points table, using a single point in WKT instead of a field in another table: SELECT name FROM poly WHERE ST_CONTAINS( heavyai_geo, ST_GeomFromText('POINT(-98.4886935 29.4260508)', 4326) );


    Web Mercator is not an equal area projection, however. Unless compensated by a scaling factor, Web Mercator areas can vary considerably by latitude.


    Functions and Operators

    Functions and Operators (DML)

    Basic Mathematical Operators

    Operator
    Description

    +numeric

    Mathematical Operator Precedence

    1. Parenthesization

    2. Multiplication and division

    3. Addition and subtraction

    Comparison Operators

    Operator
    Description

    Mathematical Functions

    Function
    Description

    Trigonometric Functions

    Function
    Description

    Geometric Functions

    Function
    Description

    String Functions

    Function
    Description

    Pattern-Matching Functions

    Name
    Example
    Description

    Usage Notes

    The following wildcard characters are supported by LIKE and ILIKE:

    • % matches any number of characters, including zero characters.

    • _ matches exactly one character.
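The wildcard semantics above can be mimicked outside the database by translating a LIKE pattern into an anchored regular expression. This is an illustrative sketch, not HeavyDB's matcher (it ignores escape sequences):

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern to an anchored regex:
    % -> .* (any run of characters), _ -> . (exactly one)."""
    out = []
    for ch in pattern:
        if ch == '%':
            out.append('.*')
        elif ch == '_':
            out.append('.')
        else:
            out.append(re.escape(ch))
    return '^' + ''.join(out) + '$'

def sql_like(s: str, pattern: str, case_insensitive: bool = False) -> bool:
    """LIKE when case_insensitive is False, ILIKE when True."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.match(like_to_regex(pattern), s, flags) is not None
```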

    Date/Time Functions

    Function
    Description

    Supported Types

    Supported date_part types:

    Supported interval types:

    Accepted Date, Time, and Timestamp Formats

    Datatype
    Formats
    Examples

    Usage Notes

• For two-digit years, years 69-99 are assumed to be in the previous century (for example, 1969), and 0-68 are assumed to be in the current century (for example, 2016).

    • For four-digit years, negative years (BC) are not supported.

    • Hours are expressed in 24-hour format.
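The two-digit-year rule above can be sketched as:

```python
def expand_two_digit_year(yy: int) -> int:
    """Map a two-digit year per the parsing rule: 69-99 fall in the
    1900s, 0-68 fall in the 2000s."""
    return 1900 + yy if yy >= 69 else 2000 + yy
```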

    Statistical and Aggregate Functions

    Both double-precision (standard) and single-precision floating point statistical functions are provided. Single-precision functions run faster on GPUs but might cause overflow errors.

    Usage Notes

    • COUNT(DISTINCT x), especially when used in conjunction with GROUP BY, can require a very large amount of memory to keep track of all distinct values in large tables with large cardinalities. To avoid this large overhead, use APPROX_COUNT_DISTINCT.

    • APPROX_COUNT_DISTINCT(x,

    Miscellaneous Functions

    Function
    Description

    User-Defined Functions

    You can create your own C++ functions and use them in your SQL queries.

    • User-defined Functions (UDFs) require clang++ version 9. You can verify the version installed using the command clang++ --version.

• UDFs currently allow any authenticated user to register and execute a runtime function. By default, runtime UDFs are globally disabled but can be enabled with the runtime flag enable-runtime-udf.

    1. Create your function and save it in a .cpp file; for example, /var/lib/omnisci/udf_myFunction.cpp.

    2. Add the UDF configuration flag to omnisci.conf. For example:

    3. Use your function in a SQL query. For example:

    Sample User-Defined Function

    This function, udf_diff.cpp, returns the difference of two values from a table.

    Code Commentary

    Include the standard integer library, which supports the following datatypes:

    • bool

    • int8_t (cstdint), char

    • int16_t (cstdint), short

    The next four lines are boilerplate code that allows OmniSci to determine whether the server is running with GPUs. OmniSci chooses whether it should compile the function inline to achieve the best possible performance.

    The next line is the actual user-defined function, which returns the difference between INTEGER values x and y.

    To run the udf_diff function, add this line to your /var/lib/omnisci/omnisci.conf file (in this example, the .cpp file is stored at /var/lib/omnisci/udf_diff.cpp):

    Restart the OmniSci server.

    Use your command from an OmniSci SQL client to query, for example, a table named myTable that contains the INTEGER columns myInt1 and myInt2.

    OmniSci returns the difference as an INTEGER value.

    <

    Less than

    <=

    Less than or equal to

    BETWEEN x AND y

    Is a value within a range

    NOT BETWEEN x AND y

    Is a value not within a range

    IS NULL

    Is a value that is null

    IS NOT NULL

    Is a value that is not null

    NULLIF(x, y)

    Compare expressions x and y. If different, return x. If they are the same, return null. For example, if a dataset uses ‘NA’ for null values, you can use this statement to return null using SELECT NULLIF(field_name,'NA').
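NULLIF's behavior can be sketched in Python, with None standing in for SQL NULL:

```python
def nullif(x, y):
    """SQL NULLIF: None (SQL NULL) when the two values are equal,
    otherwise the first value unchanged."""
    return None if x == y else x
```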

    IS TRUE

    True if a value resolves to TRUE.

    IS NOT TRUE

    True if a value resolves to FALSE.

    FLOOR(x)

    Returns the largest integer not greater than the argument

    LN(x)

    Returns the natural logarithm of x

    LOG(x)

    Returns the natural logarithm of x

    LOG10(x)

    Returns the base-10 logarithm of the specified float expression x

    MOD(x,y)

    Returns the remainder of int x divided by int y

    PI()

    Returns the value of pi

    POWER(x,y)

    Returns the value of x raised to the power of y

    RADIANS(x)

    Converts degrees to radians

    ROUND(x)

    Rounds x to the nearest integer value, but does not change the data type. For example, the double value 4.1 rounds to the double value 4.

ROUND_TO_DIGIT(x,y)

    Rounds x to y decimal places

    SIGN(x)

    Returns the sign of x as -1, 0, 1 if x is negative, zero, or positive

    SQRT(x)

    Returns the square root of x.

    TRUNCATE(x,y)

    Truncates x to y decimal places

    WIDTH_BUCKET(target,lower-boundary,upper-boundary,bucket-count)

Defines equal-width intervals (buckets) in a range between the lower boundary and the upper boundary, and returns the bucket number to which the target expression is assigned.

    • target - A constant, column variable, or general expression for which a bucket number is returned.

    • lower-boundary - Lower boundary for the range of values to be partitioned equally.

    COS(x)

    Returns the cosine of x

    COT(x)

    Returns the cotangent of x

    SIN(x)

    Returns the sine of x

    TAN(x)

    Returns the tangent of x

    ENCODE_TEXT(none_encoded_str)

Converts a none-encoded string to a transient dictionary-encoded string to allow for operations like group-by on top. When the watchdog is enabled, the number of strings that can be cast using this operator is capped by the value set with the watchdog-none-encoded-string-translation-limit flag (1,000,000 by default).

    HASH(str)

Deterministically hashes a string input to a BIGINT output using a pseudo-random function. Can be useful for bucketing string values or deterministically coloring by string values for a high-cardinality TEXT column. Note that currently HASH only accepts TEXT inputs, but in the future may also accept other data types. Note also that NULL values always hash to NULL outputs.

    INITCAP(str)

    Returns the string with initial caps after any of the defined delimiter characters, with the remainder of the characters lowercased. Valid delimiter characters are !, ?, @, ", ^, #, $, &, ~,

    JAROWINKLER_SIMILARITY( str1, str2 )

    Computes the Jaro-Winkler similarity score between two input strings. The output will be an integer between 0 and 100, with 0 representing completely dissimilar strings, and 100 representing exactly matching strings.

    JSON_VALUE(json_str, path)

Returns the string of a field given by path in str. Paths start with the $ character, with sub-fields split by . and array members indexed by [], with array indices starting at 0. For example, JSON_VALUE('{"name": "Brenda", "scores": [89, 98, 94]}', '$.scores[1]') would yield a TEXT return field of '98'. Note that currently LAX parsing mode (any unmatched path returns null rather than errors) is the default, and STRICT parsing mode is not supported.
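The LAX path semantics can be sketched with Python's standard json module. This is illustrative only, not the server's parser, and it handles just simple '$.field[idx]' paths:

```python
import json
import re

def json_value(json_str: str, path: str):
    """LAX-style lookup for '$.field[idx]' paths: returns the matched
    value as text, or None on any unmatched step (instead of raising)."""
    try:
        node = json.loads(json_str)
        # Each step is either ".field" (group 1) or "[index]" (group 2).
        for field, idx in re.findall(r'\.([^.\[\]]+)|\[(\d+)\]', path):
            node = node[int(idx)] if idx else node[field]
        return str(node)
    except (ValueError, KeyError, IndexError, TypeError):
        return None
```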

    KEY_FOR_STRING(str)

    Returns the dictionary key of a dictionary-encoded string column.

    LCASE(str)

    Returns the string in all lower case. Only ASCII character set is currently supported. Same as LOWER.

    LEFT(str, num)

    Returns the left-most number (num) of characters in the string (str).

    LENGTH(str)

    Returns the length of a string in bytes. Only works with unencoded fields (ENCODING set to none).

    LEVENSHTEIN_DISTANCE( str1, str2 )

    Computes the edit distance, or number of single-character insertions, deletions, or substitutions, that must be made to make the first string equal the second. It returns an integer greater than or equal to 0, with 0 meaning the strings are equal. The higher the return value, the more the two strings can be thought of as dissimilar.
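The edit-distance definition above is the classic Levenshtein dynamic program, sketched here for reference (not HeavyDB's implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```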

    LOWER(str)

    Returns the string in all lower case. Only ASCII character set is currently supported. Same as LCASE.

LPAD(str, len, [lpad_str])

Left-pads the string with the string defined in lpad_str to a total length of len. If the optional lpad_str is not specified, the space character is used to pad. If the length of str is greater than len, then characters from the end of str are truncated to the length of len. Characters are added from lpad_str successively until the target length len is met. If lpad_str concatenated with str is not long enough to equal the target len, lpad_str is repeated, partially if necessary, until the target length is met.

LTRIM(str, chars)

    Removes any leading characters specified in chars from the string. Alias for TRIM.

OVERLAY(str PLACING replacement_str FROM start [FOR len])

Replaces in str the number of characters defined in len with characters defined in replacement_str at the location start. Regardless of the length of replacement_str, len characters are removed from str unless start + replacement_str is greater than the length of str, in which case all characters from start to the end of str are replaced. If start is negative, it specifies the number of characters from the end of str.

POSITION(search_str IN str [FROM start_position])

    Returns the position of the first character in search_str if found in str, optionally starting the search at start_position. If search_str is not found, 0 is returned. If search_str or str are null, null is returned.
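For example, the following returns 2, the 1-based position at which 'or' first occurs:

SELECT POSITION('or' IN 'corporate floor');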

REGEXP_COUNT(str, pattern [, position, [flags]])

Returns the number of times that the provided pattern occurs in the search string str. position specifies the starting position in str at which the search for pattern starts (all matches before position are ignored). If position is negative, the search starts that many characters from the end of the string str. Use the following optional flags to control the matching behavior: c - case-sensitive matching; i - case-insensitive matching.

REGEXP_REPLACE(str, pattern [, new_str, position, occurrence, [flags]])

Replace one or all matches of a substring in string str that matches pattern, which is a regular expression in POSIX regex syntax.

new_str (optional) is the string that replaces the string matching the pattern. If new_str is empty or not supplied, all found matches are removed.

The occurrence integer argument (optional) specifies the single match occurrence of the pattern to replace, starting from the beginning of str; 0 (replace all) is the default. Use a negative occurrence argument to signify the nth-to-last occurrence to be replaced.

REGEXP_SUBSTR(str, pattern [, position, occurrence, flags, group_num])

Search string str for pattern, which is a regular expression in POSIX syntax, and return the matching substring.

    Use position to set the character position to begin searching. Use occurrence to specify the occurrence of the pattern to match.

Use a positive position argument to indicate the number of characters from the beginning of str. Use a negative position argument to indicate the number of characters from the end of str.

REPEAT(str, num)

    Repeats the string the number of times defined in num.

REPLACE(str, from_str, new_str)

    Replaces all occurrences of substring from_str within a string, with a new substring new_str.

    REVERSE(str)

    Reverses the string.

    RIGHT(str, num)

    Returns the right-most number (num) of characters in the string (str).

RPAD(str, len, rpad_str)

Right-pads the string with the string defined in rpad_str to a total length of len. If the optional rpad_str is not specified, the space character is used to pad. If the length of str is greater than len, then characters from the beginning of str are truncated to the length of len. Characters are added from rpad_str successively until the target length len is met. If rpad_str concatenated with str is not long enough to equal the target len, rpad_str is repeated, partially if necessary, until the target length is met.

    RTRIM(str)

    Removes any trailing spaces from the string.

SPLIT_PART(str, delim, field_num)

    Split the string based on a delimiter delim and return the field identified by field_num. Fields are numbered from left to right.
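For example, the following returns '10', the second field delimited by '-':

SELECT SPLIT_PART('2013-10-31', '-', 2);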

    STRTOK_TO_ARRAY(str, [delim])

    Tokenizes the string str using optional delimiter(s) delim and returns an array of tokens. An empty array is returned if no tokens are produced in tokenization. NULL is returned if either parameter is a NULL.

SUBSTR(str, start, [len])

    Alias for SUBSTRING.

    SUBSTRING(str FROM start [ FOR len])

    Returns a substring of str starting at index start for len characters.

    The start position is 1-based (that is, the first character of str is at index 1, not 0). However, start 0 aliases to start 1.

    If start is negative, it is considered to be

TRIM([BOTH | LEADING | TRAILING] [trim_str FROM] str)

    Removes characters defined in trim_str from the beginning, end, or both of str. If trim_str is not specified, the space character is the default. If the trim location is not specified, defined characters are trimmed from both the beginning and end of str.

    TRY_CAST( str AS type)

    Attempts to cast/convert a string type to any valid numeric, timestamp, date, or time type. If the conversion cannot be performed, null is returned. Note that TRY_CAST is not valid for non-string input types.
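For example, the first expression below succeeds and the second returns null:

SELECT TRY_CAST('123' AS INTEGER), TRY_CAST('abc' AS INTEGER);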

    UCASE(str)

    Returns the string in uppercase format. Only ASCII character set is currently supported. Same as UPPER.

    UPPER(str)

    Returns the string in uppercase format. Only ASCII character set is currently supported. Same as UCASE.

    URL_DECODE( str )

    Decode a url-encoded string. This is the inverse of the URL_ENCODE function.

    URL_ENCODE( str )

    Url-encode a string. Alphanumeric and the 4 characters: _-.~ are untranslated. The space character is translated to +. All other characters are translated into a 3-character sequence %XX where XX is the 2-digit hexadecimal ASCII value of the character.
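For example, in the following the space is translated to + and the ampersand to %26:

SELECT URL_ENCODE('a b&c');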

    'AB' ILIKE 'ab'

Returns true if the string matches the pattern (case-insensitive). Supported only when the right side is a string literal; for example, colors.name ILIKE 'b%'.

    str REGEXP POSIX pattern

    '^[a-z]+r$'

    Lowercase string ending with r

    REGEXP_LIKE ( str , POSIX pattern )

    '^[hc]at'

    cat or hat

    DATEDIFF('date_part', date, date)

    Returns the difference between two dates, calculated to the lowest level of the date_part you specify. For example, if you set the date_part as DAY, only the year, month, and day are used to calculate the result. Other fields, such as hour and minute, are ignored.

    Example:

    SELECT DATEDIFF('YEAR', plane_issue_date, now()) Years_In_Service FROM flights_2008_10k LIMIT 10;

    DATEPART('interval', date | timestamp)

    Returns a specified part of a given date or timestamp as an integer value. Note that 'interval' must be enclosed in single quotes.

    Example:

    SELECT DATEPART('YEAR', plane_issue_date) Year_Issued FROM flights_2008_10k LIMIT 10;

    DATE_TRUNC(date_part, timestamp)

    Truncates the timestamp to the specified date_part. DATE_TRUNC(week,...) starts on Monday (ISO), which is different than EXTRACT(dow,...), which starts on Sunday.

    Example:

    SELECT DATE_TRUNC(MINUTE, arr_timestamp) Arrival FROM flights_2008_10k LIMIT 10;

    EXTRACT(date_part FROM timestamp)

    Returns the specified date_part from timestamp.

    Example:

    SELECT EXTRACT(HOUR FROM arr_timestamp) Arrival_Hour FROM flights_2008_10k LIMIT 10;

    INTERVAL 'count' date_part

    Adds or Subtracts count date_part units from a timestamp. Note that 'count' is enclosed in single quotes.

    Example:

    SELECT arr_timestamp + INTERVAL '10' YEAR FROM flights_2008_10k LIMIT 10;

    NOW()

Returns the current timestamp in the GMT time zone. Same as CURRENT_TIMESTAMP().

    Example:

    NOW();

    TIMESTAMPADD(date_part, count, timestamp | date)

Adds an interval of count date_part units to timestamp or date and returns the resulting timestamp or date.

    Example:

    SELECT TIMESTAMPADD(DAY, 14, arr_timestamp) Fortnight FROM flights_2008_10k LIMIT 10;

    TIMESTAMPDIFF(date_part, timestamp1, timestamp2)

    Subtracts timestamp1 from timestamp2 and returns the result in signed date_part units.

    Example:

    SELECT TIMESTAMPDIFF(MINUTE, arr_timestamp, dep_timestamp) Flight_Time FROM flights_2008_10k LIMIT 10;

    DD-MON-YY

    31-Oct-13

    DATE

    DD/Mon/YYYY

    31/Oct/2013

    EPOCH

    1383262225

    TIME

    HH:MM

    23:49

    TIME

    HHMMSS

    234901

    TIME

    HH:MM:SS

    23:49:01

    TIMESTAMP

    DATE TIME

    31-Oct-13 23:49:01

    TIMESTAMP

    DATETTIME

    31-Oct-13T23:49:01

    TIMESTAMP

    DATE:TIME

10/31/2013:234901

    TIMESTAMP

    DATE TIME ZONE

    31-Oct-13 11:30:25 -0800

    TIMESTAMP

    DATE HH.MM.SS PM

    31-Oct-13 11.30.25pm

    TIMESTAMP

    DATE HH:MM:SS PM

    31-Oct-13 11:30:25pm

    TIMESTAMP

    1383262225

    When time components are separated by colons, you can write them as one or two digits.

  • Months are case insensitive. You can spell them out or abbreviate to three characters.

  • For timestamps, decimal seconds are ignored. Time zone offsets are written as +/-HHMM.

  • For timestamps, a numeric string is converted to +/- seconds since January 1, 1970. Supported timestamps range from -30610224000 (January 1, 1000) through 29379456000 (December 31, 2900).

  • On output, dates are formatted as YYYY-MM-DD. Times are formatted as HH:MM:SS.

  • Linux EPOCH values range from -30610224000 (1/1/1000) through 185542587100800 (1/1/5885487). Complete range in years: +/-5,883,517 around epoch.

COUNT(DISTINCT x)

    Returns the count of distinct values of x

    APPROX_COUNT_DISTINCT(x, e)

Returns the approximate count of distinct values of x with defined expected error rate e, where e is an integer from 1 to 100. If no value is set for e, the approximate count is calculated using the system-wide hll-precision-bits configuration parameter.

    APPROX_MEDIAN(x)

    Returns the approximate median of x. Two server configuration parameters affect memory usage:

• approx_quantile_buffer

    • approx_quantile_centroids

    Accuracy of APPROX_MEDIAN depends on the distribution of data; see the Usage Notes below. Query steps with this operator will run on CPU.

    APPROX_PERCENTILE(x,y)

    Returns the approximate quantile of x, where y is the value between 0 and 1.

    For example, y=0 returns MIN(x), y=1 returns MAX(x), and y=0.5 returns APPROX_MEDIAN(x). Query steps with this operator will run on CPU.

    MAX(x)

    Returns the maximum value of x

    MIN(x)

    Returns the minimum value of x

    MODE(x)

    Returns the most common value of x. Query steps with this operator will run on CPU.

    SINGLE_VALUE

    Returns the input value if there is only one distinct value in the input; otherwise, the query fails.

    SUM(x)

    Returns the sum of the values of x

    SAMPLE(x)

    Returns one sample value from aggregated column x. For example, the following query returns population grouped by city, along with one value from the state column for each group:

    Note: This was previously LAST_SAMPLE, which is now deprecated.

    CORRELATION(x, y)

    CORRELATION_FLOAT(x, y)

    Alias of CORR. Returns the coefficient of correlation of a set of number pairs.

    CORR(x, y)

    CORR_FLOAT(x, y)

    Returns the coefficient of correlation of a set of number pairs.

    COUNT_IF(conditional_expr)

    Returns the number of rows satisfying the given condition_expr.
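For example, using the flights table referenced elsewhere in this section (the arrdelay column is assumed here for illustration):

SELECT COUNT_IF(arrdelay > 0) Delayed_Flights FROM flights_2008_10k;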

    COVAR_POP(x, y)

    COVAR_POP_FLOAT(x, y)

    Returns the population covariance of a set of number pairs.

    COVAR_SAMP(x, y)

    COVAR_SAMP_FLOAT(x, y)

    Returns the sample covariance of a set of number pairs.

    STDDEV(x)

    STDDEV_FLOAT(x)

    Alias of STDDEV_SAMP. Returns sample standard deviation of the value.

    STDDEV_POP(x)

    STDDEV_POP_FLOAT(x)

Returns the population standard deviation of the value.

    STDDEV_SAMP(x)

    STDDEV_SAMP_FLOAT(x)

    Returns the sample standard deviation of the value.

    SUM_IF(conditional_expr)

    Returns the sum of all expression values satisfying the given condition_expr.

    VARIANCE(x)

    VARIANCE_FLOAT(x)

    Alias of VAR_SAMP. Returns the sample variance of the value.

    VAR_POP(x)

    VAR_POP_FLOAT(x)

Returns the population variance of the value.

    VAR_SAMP(x)

    VAR_SAMP_FLOAT(x)

    Returns the sample variance of the value.

• APPROX_COUNT_DISTINCT(x, e) gives an approximate count of the value x, based on an expected error rate defined in e. The error rate is an integer value from 1 to 100. The lower the value of e, the higher the precision, and the higher the memory cost. Select a value for e based on the level of precision required. On large tables with large cardinalities, consider using APPROX_COUNT_DISTINCT when possible to preserve memory. When data cardinalities permit, OmniSci uses the precise implementation of COUNT(DISTINCT x) for APPROX_COUNT_DISTINCT. Set the default error rate using the hll-precision-bits configuration parameter.
• The accuracy of APPROX_MEDIAN(x) depends upon the distribution of the data. For example:

• For 100,000,000 integers (1, 2, 3, ... 100M) in random order, APPROX_MEDIAN can provide a highly accurate answer, to 5+ significant digits.

• For 100,000,001 integers, where 50,000,000 have a value of 0 and 50,000,001 have a value of 1, APPROX_MEDIAN returns a value close to 0.5, even though the median is 1.

  • Currently, OmniSci does not support grouping by non-dictionary-encoded strings. However, with the SAMPLE aggregate function, you can select non-dictionary-encoded strings that are presumed to be unique in a group. For example:

    If the aggregated column (user_description in the example above) is not unique within a group, SAMPLE selects a value that might be nondeterministic because of the parallel nature of OmniSci query execution.

• int32_t (cstdint), int
  • int64_t (cstdint), size_t

  • float

  • double

  • void

  • Returns numeric

-numeric

    Returns negative value of numeric

    numeric1 + numeric2

    Sum of numeric1 and numeric2

numeric1 - numeric2

    Difference of numeric1 and numeric2

    numeric1 * numeric2

    Product of numeric1 and numeric2

    numeric1 / numeric2

    Quotient (numeric1 divided by numeric2)

    =

    Equals

    <>

    Not equals

    >

    Greater than

>=

    Greater than or equal to

    ABS(x)

    Returns the absolute value of x

    CEIL(x)

    Returns the smallest integer not less than the argument

    DEGREES(x)

    Converts radians to degrees

EXP(x)

    Returns the value of e to the power of x

    ACOS(x)

    Returns the arc cosine of x

    ASIN(x)

    Returns the arc sine of x

    ATAN(x)

    Returns the arc tangent of x

ATAN2(y,x)

    Returns the arc tangent of (x, y) in the range (-π,π]. Equal to ATAN(y/x) for x > 0.

    DISTANCE_IN_METERS(fromLon, fromLat, toLon, toLat)

    Calculates distance in meters between two WGS84 positions.

    CONV_4326_900913_X(x)

Converts WGS84 longitude to WGS84 Web Mercator x coordinate.

    CONV_4326_900913_Y(y)

Converts WGS84 latitude to WGS84 Web Mercator y coordinate.

    BASE64_DECODE(str)

    Decodes a BASE64-encoded string.

    BASE64_ENCODE(str)

    Encodes a string to a BASE64-encoded string.

    CHAR_LENGTH(str)

    Returns the number of characters in a string. Only works with unencoded fields (ENCODING set to none).

    str1 || str2 [ || str3... ]

    str LIKE pattern

    'ab' LIKE 'ab'

    Returns true if the string matches the pattern (case-sensitive)

    str NOT LIKE pattern

    'ab' NOT LIKE 'cd'

    Returns true if the string does not match the pattern

    CURRENT_DATE

    CURRENT_DATE()

    Returns the current date in the GMT time zone.

    Example:

    SELECT CURRENT_DATE();

    CURRENT_TIME

    CURRENT_TIME()

    Returns the current time of day in the GMT time zone.

    Example:

    SELECT CURRENT_TIME();

    CURRENT_TIMESTAMP

    CURRENT_TIMESTAMP()

Returns the current timestamp in the GMT time zone. Same as NOW().

    Example:

    SELECT CURRENT_TIMESTAMP();

    DATEADD('date_part', interval, date | timestamp)

    DATE

    YYYY-MM-DD

    2013-10-31

    DATE

    MM/DD/YYYY

    10/31/2013

    Double-precision FP Function

    Single-precision FP Function

    Description

    AVG(x)

    Returns the average value of x

    COUNT()

Returns the number of rows returned

    SAMPLE_RATIO(x)

Returns a Boolean value, with the probability of True being returned for a row equal to the input argument. The input argument is a numeric value between 0.0 and 1.0. Negative input values return False, input values greater than 1.0 return True, and null input values return False.

The result of the function is deterministic per row; that is, all calls of the operator for a given row return the same result. The sample ratio is probabilistic, but generally falls within a thousandth of a percent of the requested ratio when the underlying dataset is millions of records or larger.

    The following example filters approximately 50% of the rows from t and returns a count that is approximately half the number of rows in t:

    SELECT COUNT(*) FROM t WHERE SAMPLE_RATIO(0.5)




Returns the string that results from concatenating the strings specified. Note that numeric, date, timestamp, and time types are implicitly cast to strings as necessary, so explicit casts of non-string types to string types are not required for inputs to the concatenation operator. Note that concatenating a variable string with a string literal, i.e. county_name

    str ILIKE pattern

    Returns a date after a specified time/date interval has been added.

    Example:

    SELECT DATEADD('MINUTE', 6000, dep_timestamp) Arrival_Estimate FROM flights_2008_10k LIMIT 10;

    DATE


SELECT user_name, SAMPLE(user_description) FROM tweets GROUP BY user_name;
    DATE_TRUNC [YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, 
                MICROSECOND, NANOSECOND, MILLENNIUM, CENTURY, DECADE, WEEK, 
                WEEK_SUNDAY, QUARTERDAY]
    EXTRACT    [YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, 
                MICROSECOND, NANOSECOND, DOW, ISODOW, DOY, EPOCH, QUARTERDAY, 
                WEEK, WEEK_SUNDAY, DATEEPOCH]
    DATEDIFF   [YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, 
                MICROSECOND, NANOSECOND, WEEK]
    DATEADD       [DECADE, YEAR, QUARTER, MONTH, WEEK, WEEKDAY, DAY, 
                   HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, NANOSECOND]
    TIMESTAMPADD  [YEAR, QUARTER, MONTH, WEEKDAY, DAY, HOUR, MINUTE,
                   SECOND, MILLISECOND, MICROSECOND, NANOSECOND]
    DATEPART      [YEAR, QUARTER, MONTH, DAYOFYEAR, QUARTERDAY, WEEKDAY, DAY, HOUR,
                   MINUTE, SECOND, MILLISECOND, MICROSECOND, NANOSECOND]
    udf = "/var/lib/omnisci/udf_myFunction.cpp"
    SELECT udf_myFunction FROM myTable
    #include <cstdint>
    #if defined(__CUDA_ARCH__) && defined(__CUDACC__) && defined(__clang__)
    #define DEVICE __device__
    #define NEVER_INLINE
    #define ALWAYS_INLINE
    #else
    #define DEVICE
    #define NEVER_INLINE __attribute__((noinline))
    #define ALWAYS_INLINE __attribute__((always_inline))
    #endif
    #define EXTENSION_NOINLINE extern "C" NEVER_INLINE DEVICE
    EXTENSION_NOINLINE int32_t udf_diff(const int32_t x, const int32_t y) { return x - y; }
    udf = "/var/lib/omnisci/udf_diff.cpp"
    SELECT udf_diff(myInt1, myInt2) FROM myTable LIMIT 1;

    upper-boundary - Upper boundary for the range of values to be partitioned equally.

  • partition_count - Number of equal-width buckets in the range defined by the lower and upper boundaries.

  • Expressions can be constants, column variables, or general expressions.

Example: Create 10 age buckets of equal size, with lower bound 0 and upper bound 100 ([0,10], [10,20] ... [90,100]), and classify the age of a customer accordingly:

    SELECT WIDTH_BUCKET(age, 0, 100, 10) FROM customer;

    For example, a customer of age 34 is assigned to bucket 3 ([30,40]) and the function returns the value 3.


The pattern uses POSIX regular expression syntax.

    Use a positive position argument to indicate the number of characters from the beginning of str. Use a negative position argument to indicate the number of characters from the end of str.

    Back-references/capture groups can be used to capture and replace specific sub-expressions.

    Use the following optional flags to control the matching behavior: c - case-sensitive matching; i - case-insensitive matching. If not specified, REGEXP_REPLACE defaults to case-sensitive search.

The occurrence integer argument (optional) specifies the single match occurrence of the pattern to return, with 0 being mapped to the first (1) occurrence. Use a negative occurrence argument to signify that the nth-to-last occurrence is returned.

    Use optional flags to control the matching behavior: c - case-sensitive matching; e - extract submatches; i - case-insensitive matching.

    The c and i flags cannot be used together; e can be used with either. If neither c nor i is specified, or if pattern is not provided, REGEXP_SUBSTR defaults to case-sensitive search.

    If the e flag is used, REGEXP_SUBSTR returns the capture group group_num of pattern matched in str. If the e flag is used but no capture groups are provided in pattern, REGEXP_SUBSTR returns the entire matching pattern, regardless of group_num. If the e flag is used but no group_num is provided, a value of 1 for group_num is assumed, so the first capture group is returned.


    Roles and Privileges

    HEAVY.AI supports data security using a set of database object access privileges granted to users or roles.

    Users and Privileges

    When you create a database, the admin superuser is created by default. The admin superuser is granted all privileges on all database objects. Superusers can create new users that, by default, have no database object privileges.

    Superusers can grant users selective access privileges on multiple database objects using two mechanisms: role-based privileges and user-based privileges.

    Role-based Privileges

    1. Grant roles access privileges on database objects.

    2. Grant roles to users.

    3. Grant roles to other roles.

    User-based Privileges

    When a user has privilege requirements that differ from role privileges, you can grant privileges directly to the user. These mechanisms provide data security for many users and classes of users to access the database.

    You have the following options for granting privileges:

    • Each object privilege can be granted to one or many roles, or to one or many users.

    • A role and/or user can be granted privileges on one or many objects.

    • A role can be granted to one or many users or other roles.

    This supports the following many-to-many relationships:

    • Objects and roles

    • Objects and users

    • Roles and users

    These relationships provide flexibility and convenience when granting/revoking privileges to and from users.

    Granting object privileges to roles and users, and granting roles to users, has a cumulative effect. The result of several grant commands is a combination of all individual grant commands. This applies to all database object types and to privileges inherited by objects. For example, object privileges granted to the object of database type are propagated to all table-type objects of that database object.

    Who Can Grant Object Privileges?

Only a superuser or an object owner can grant privileges on an object.

    • A superuser has all privileges on all database objects.

    • A non-superuser user has only those privileges on a database object that are granted by a superuser.

    • A non-superuser user has ALL privileges on a table created by that user.

    Roles and Privileges Persistence

    • Roles can be created and dropped at any time.

    • Object privileges and roles can be granted or revoked at any time, and the action takes effect immediately.

    • Privilege state is persistent and restored if the HEAVY.AI session is interrupted.

    Database Object Privileges

    There are five database object types, each with its own privileges.

    ACCESS - Connect to the database. The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    ALL - Allow all privileges on this database except issuing grants and dropping the database.

    SELECT, INSERT, TRUNCATE, UPDATE, DELETE - Allow these operations on any table in the database.

    ALTER SERVER - Alter servers in the current database.

    CREATE SERVER - Create servers in the current database.

    CREATE TABLE - Create a table in the current database. (Also CREATE.)

    Privileges granted on a database-type object are inherited by all tables of that database.

    Privilege Commands

    SQL
    Description

    Example

    The following example shows a valid sequence for granting access privileges to non-superuser user1 by granting a role to user1 and by directly granting a privilege. This example presumes that table1 and user1 already exist, and that user1 has ACCESS privileges on the database where table1 exists.

    1. Create the r_select role.

    2. Grant the SELECT privilege on table1 to the r_select role. Any user granted the r_select role gains the SELECT privilege.

    See for a more complete example.

    CREATE ROLE

    Create a role. Roles are granted to users for role-based database object access.

    This clause requires superuser privilege and <roleName> must not exist.

    Synopsis

    Parameters

    <roleName>

    Name of the role to create.

    Example

    Create a payroll department role called payrollDept.
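In SQL, the command for this example would be:

CREATE ROLE payrollDept;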

    See Also

    DROP ROLE

    Remove a role.

    This clause requires superuser privilege and <roleName> must exist.

    Synopsis

    Parameters

    <roleName>

    Name of the role to drop.

    Example

    Remove the payrollDept role.
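In SQL, the command for this example would be:

DROP ROLE payrollDept;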

    See Also

    GRANT

    Grant role privileges to users and to other roles.


    The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    This clause requires superuser privilege. The specified <roleNames> and <userNames> must exist.

    Synopsis

    Parameters

    <roleNames>

    Names of roles to grant to users and other roles. Use commas to separate multiple role names.

    <userNames>

    Names of users. Use commas to separate multiple user names.

    Examples

    Assign payrollDept role privileges to user dennis.

    Grant payrollDept and accountsPayableDept role privileges to users dennis and mike and role hrDept.
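In SQL, the commands for these two examples would be:

GRANT payrollDept TO dennis;

GRANT payrollDept, accountsPayableDept TO dennis, mike, hrDept;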

    See Also

    REVOKE

Remove role privileges from users or from other roles. This removes database object access privileges granted with the role.

    This clause requires superuser privilege. The specified <roleNames> and <userNames> must exist.

    Synopsis

    Parameters

    <roleNames>

    Names of roles to remove from users and other roles. Use commas to separate multiple role names.

    <userName>

    Names of the users. Use commas to separate multiple user names.

    Example

    Remove payrollDept role privileges from user dennis.

    Revoke payrollDept and accountsPayableDept role privileges from users dennis and fred and role hrDept.
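In SQL, the commands for these two examples would be:

REVOKE payrollDept FROM dennis;

REVOKE payrollDept, accountsPayableDept FROM dennis, fred, hrDept;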

    See Also

    GRANT ON TABLE

    Define the privilege(s) a role or user has on the specified table. You can specify any combination of the INSERT, SELECT, DELETE, UPDATE, DROP, or TRUNCATE privilege or specify all privileges.


    The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    This clause requires superuser privilege, or <tableName> must have been created by the user invoking this command. The specified <tableName> and users or roles defined in <entityList> must exist.

    Synopsis

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <tableName>

    Name of the database table.

    <entityList>

    Name of entity or entities to be granted the privilege(s).

    Parameter Value
    Descriptions

    Examples

    Permit all privileges on the employees table for the payrollDept role.

    Permit SELECT-only privilege on the employees table for user chris.

    Permit INSERT-only privilege on the employees table for the hrdept and accountsPayableDept roles.

    Permit INSERT, SELECT, and TRUNCATE privileges on the employees table for the role hrDept and for users dennis and mike.
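In SQL, the commands for these examples would be (table, role, and user names are from the examples above):

GRANT ALL ON TABLE employees TO payrollDept;

GRANT SELECT ON TABLE employees TO chris;

GRANT INSERT ON TABLE employees TO hrdept, accountsPayableDept;

GRANT INSERT, SELECT, TRUNCATE ON TABLE employees TO hrDept, dennis, mike;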

    See Also

    REVOKE ON TABLE

    Remove the privilege(s) a role or user has on the specified table. You can remove any combination of the INSERT, SELECT, DELETE, UPDATE, or TRUNCATE privileges, or remove all privileges.

    This clause requires superuser privilege or <tableName> must have been created by the user invoking this command. The specified <tableName> and users or roles in <entityList> must exist.

    Synopsis

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <tableName>

    Name of the database table.

    <entityList>

    Name of entities to be denied the privilege(s).

    Parameter Value
    Descriptions

    Example

    Prohibit SELECT and INSERT operations on the employees table for the nonemployee role.

    Prohibit SELECT operations on the directors table for the employee role.

    Prohibit INSERT operations on the directors table for role employee and user laura.

    Prohibit INSERT, SELECT, and TRUNCATE privileges on the employees table for the role nonemployee and for users dennis and mike.
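In SQL, the commands for these examples would be (table, role, and user names are from the examples above):

REVOKE SELECT, INSERT ON TABLE employees FROM nonemployee;

REVOKE SELECT ON TABLE directors FROM employee;

REVOKE INSERT ON TABLE directors FROM employee, laura;

REVOKE INSERT, SELECT, TRUNCATE ON TABLE employees FROM nonemployee, dennis, mike;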

See Also

    GRANT ON VIEW

    Define the privileges a role or user has on the specified view. You can specify any combination of the SELECT, INSERT, or DROP privileges, or specify all privileges.

    This clause requires superuser privileges, or <viewName> must have been created by the user invoking this command. The specified <viewName> and users or roles in <entityList> must exist.

    Synopsis
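The general form of the command is:

```sql
GRANT <privilegeList> ON VIEW <viewName> TO <entityList>;
```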

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <viewName>

    Name of the database view.

    <entityList>

    Name of entities to be granted the privileges.

    Parameter Value
    Descriptions

    Examples

    Permit SELECT, INSERT, and DROP privileges on the employees view for the payrollDept role.

    Permit SELECT-only privilege on the employees view for the employee role and user venkat.

    Permit INSERT and DROP privileges on the employees view for the hrDept and acctPayableDept roles and users simon and dmitri.
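In SQL, these examples are (for views, ALL is equivalent to SELECT, INSERT, and DROP):

```sql
GRANT ALL ON VIEW employees TO payrollDept;
GRANT SELECT ON VIEW employees TO employee, venkat;
GRANT INSERT, DROP ON VIEW employees TO hrDept, acctPayableDept, simon, dmitri;
```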

See Also

    REVOKE ON VIEW

    Remove the privileges a role or user has on the specified view. You can remove any combination of the INSERT, DROP, or SELECT privileges, or remove all privileges.

    This clause requires superuser privilege, or <viewName> must have been created by the user invoking this command. The specified <viewName> and users or roles in <entityList> must exist.

    Synopsis
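The general form of the command is:

```sql
REVOKE <privilegeList> ON VIEW <viewName> FROM <entityList>;
```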

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <viewName>

    Name of the database view.

    <entityList>

    Name of entity to be denied the privilege(s).

    Parameter Value
    Descriptions

    Example

    Prohibit SELECT, DROP, and INSERT operations on the employees view for the nonemployee role.

    Prohibit SELECT operations on the directors view for the employee role.

    Prohibit INSERT and DROP operations on the directors view for the employee and manager role and for users ashish and lindsey.
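In SQL, these examples are:

```sql
REVOKE ALL ON VIEW employees FROM nonemployee;
REVOKE SELECT ON VIEW directors FROM employee;
REVOKE INSERT, DROP ON VIEW directors FROM employee, manager, ashish, lindsey;
```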

See Also

    GRANT ON DATABASE

    Define the valid privileges a role or user has on the specified database. You can specify any combination of privileges, or specify all privileges.


    The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    This clause requires superuser privileges.

    Synopsis
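The general form of the command is:

```sql
GRANT <privilegeList> ON DATABASE <dbName> TO <entityList>;
```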

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dbName>

    Name of the database, which must exist, created by CREATE DATABASE.

    <entityList>

    Name of the entity to be granted the privilege.

    Parameter Value
    Descriptions

    Examples

    Permit all operations on the companydb database for the payrollDept role and user david.

    Permit SELECT-only operations on the companydb database for the employee role.

    Permit INSERT, UPDATE, and DROP operations on the companydb database for the hrdept and manager role and for users irene and stephen.
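In SQL, these examples are:

```sql
GRANT ALL ON DATABASE companydb TO payrollDept, david;
GRANT ACCESS, SELECT ON DATABASE companydb TO employee;
GRANT ACCESS, INSERT, UPDATE, DROP ON DATABASE companydb TO hrdept, manager, irene, stephen;
```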

See Also

    REVOKE ON DATABASE

    Remove the operations a role or user can perform on the specified database. You can specify privileges individually or specify all privileges.

    This clause requires superuser privilege or the user must own the database object. The specified <dbName> and roles or users in <entityList> must exist.

    Synopsis
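The general form of the command is:

```sql
REVOKE <privilegeList> ON DATABASE <dbName> FROM <entityList>;
```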

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dbName>

    Name of the database.

    <entityList>

    Parameter Value
    Descriptions

    Example

    Prohibit all operations on the employees database for the nonemployee role.

    Prohibit SELECT operations on the directors database for the employee role and for user monica.

    Prohibit INSERT, DROP, CREATE, and DELETE operations on the directors database for employee role and for users max and alex.
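In SQL, these examples are:

```sql
REVOKE ALL ON DATABASE employees FROM nonemployee;
REVOKE SELECT ON DATABASE directors FROM employee, monica;
REVOKE INSERT, DROP, CREATE, DELETE ON DATABASE directors FROM employee, max, alex;
```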

See Also

    GRANT ON SERVER

    Define the valid privileges a role or user has for working with servers. You can specify any combination of privileges or specify all privileges.

    This clause requires superuser privileges, or <serverName> must have been created by the user invoking the command.

    Synopsis
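The general form of the command is:

```sql
GRANT <privilegeList> ON SERVER <serverName> TO <entityList>;
```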

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <serverName>

    Name of the server, which must exist on the current database, created by CREATE SERVER ON DATABASE.

    <entityList>

    Parameter Value
    Descriptions

    Examples

    Grant DROP privilege on server parquet_s3_server to user fred:

    Grant ALTER privilege on server parquet_s3_server to role payrollDept:

    Grant USAGE and ALTER privileges on server parquet_s3_server to role payrollDept and user jamie:
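In SQL, these examples are:

```sql
GRANT DROP ON SERVER parquet_s3_server TO fred;
GRANT ALTER ON SERVER parquet_s3_server TO payrollDept;
GRANT USAGE, ALTER ON SERVER parquet_s3_server TO payrollDept, jamie;
```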

See Also

    REVOKE ON SERVER

    Remove privileges a role or user has for working with servers. You can specify any combination of privileges or specify all privileges.

    This clause requires superuser privileges, or <serverName> must have been created by the user invoking the command.

    Synopsis
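The general form of the command is:

```sql
REVOKE <privilegeList> ON SERVER <serverName> FROM <entityList>;
```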

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <serverName>

    Name of the server, which must exist on the current database, created by CREATE SERVER ON DATABASE.

    <entityList>

    Parameter Value
    Descriptions

    Examples

    Revoke DROP privilege on server parquet_s3_server for user inga:

Revoke ALTER privilege on server parquet_s3_server for role payrollDept:

Revoke USAGE and ALTER privileges on server parquet_s3_server for role payrollDept and user marvin:
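In SQL, these examples are:

```sql
REVOKE DROP ON SERVER parquet_s3_server FROM inga;
REVOKE ALTER ON SERVER parquet_s3_server FROM payrollDept;
REVOKE USAGE, ALTER ON SERVER parquet_s3_server FROM payrollDept, marvin;
```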

See Also

    GRANT ON DASHBOARD

    Define the valid privileges a role or user has for working with dashboards. You can specify any combination of privileges or specify all privileges.

    This clause requires superuser privileges.

    Synopsis
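The general form of the command is:

```sql
GRANT <privilegeList> [ON DASHBOARD <dashboardId>] TO <entityList>;
```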

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dashboardId>

    ID of the dashboard, which must exist, created by CREATE DASHBOARD. To show a list of all dashboards and IDs in heavysql, run the \dash command when logged in as superuser.

    <entityList>

    Parameter Value
    Descriptions

    Examples

    Permit all privileges on the dashboard ID 740 for the payrollDept role.

    Permit VIEW-only privilege on dashboard 730 for the hrDept role and user dennis.

    Permit EDIT and DELETE privileges on dashboard 740 for the hrDept and accountsPayableDept roles and for user pavan.
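In SQL, these examples are:

```sql
GRANT ALL ON DASHBOARD 740 TO payrollDept;
GRANT VIEW ON DASHBOARD 730 TO hrDept, dennis;
GRANT EDIT, DELETE ON DASHBOARD 740 TO hrDept, accountsPayableDept, pavan;
```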

See Also

    REVOKE ON DASHBOARD

    Remove privileges a role or user has for working with dashboards. You can specify any combination of privileges, or all privileges.

    This clause requires superuser privileges.

    Synopsis
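The general form of the command is:

```sql
REVOKE <privilegeList> [ON DASHBOARD <dashboardId>] FROM <entityList>;
```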

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dashboardId>

    ID of the dashboard, which must exist, created by CREATE DASHBOARD.

    <entityList>

    Parameter Value
    Descriptions

Examples

Revoke DELETE privileges on dashboard 740 for the payrollDept role.

    Revoke all privileges on dashboard 730 for hrDept role and users dennis and mike.

Revoke EDIT and DELETE privileges on dashboard 740 for the hrDept and accountsPayableDept roles and for users dante and jonathan.
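In SQL, these examples are:

```sql
REVOKE DELETE ON DASHBOARD 740 FROM payrollDept;
REVOKE ALL ON DASHBOARD 730 FROM hrDept, dennis, mike;
REVOKE EDIT, DELETE ON DASHBOARD 740 FROM hrDept, accountsPayableDept, dante, jonathan;
```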

See Also

    Common Privilege Levels for Non-Superusers

    The following privilege levels are typically recommended for non-superusers in Immerse. Privileges assigned for users in your organization may vary depending on access requirements.

    Privilege
    Command Syntax to Grant Privilege

    Example: Roles and Privileges

    These examples assume that tables table1 through table4 are created as needed:
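```sql
create table table1 (id smallint);
create table table2 (id smallint);
create table table3 (id smallint);
create table table4 (id smallint);
```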

    The following examples show how to work with users, roles, tables, and dashboards.

    Create User Accounts

    Grant Access to Users on Database

    Create Marketing Department Roles

    Grant Marketing Department Roles to Marketing Department Employees

Grant Privilege to Marketing Department Roles

    Create Sales Department Roles

    Grant Sales Department Roles to Sales Department Employees

    Grant Privilege to Sales Department Roles

    Grant All Sales Roles to Sales Department Manager and Marketing Department Manager

    Grant View on Dashboards

    Use the \dash command to list all dashboards and their unique IDs in HEAVY.AI:
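```
heavysql> \dash
Dashboard ID | Dashboard Name    | Owner
1            | Marketing_Summary | heavyai
```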

    Here, the Marketing_Summary dashboard uses table2 as a data source. The role marketingDeptRole2 has select privileges on that table. Grant view access on the Marketing_Summary dashboard to marketingDeptRole2:
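```sql
grant view on dashboard 1 to marketingDeptRole2;
```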

    Relationships Between Users, Roles, and Tables

    The following table shows the roles and privileges for each user created in the previous example.

    User
    Roles Granted
    Table Privileges

    Commands to Report Roles and Privileges

    Use the following commands to list current roles and assigned privileges. If you have superuser access, you can see privileges for all users. Otherwise, you can see only those roles and privileges for which you have access.


    Results for users, roles, privileges, and object privileges are returned in creation order.

    \dash

    Lists all dashboards and dashboard IDs in HEAVY.AI. Requires superuser privileges. Dashboard privileges are assigned by dashboard ID because dashboard names may not be unique.

    Example

heavysql> \dash database heavyai
Dashboard ID | Dashboard Name    | Owner
1            | Marketing_Summary | heavyai

\object_privileges objectType objectName

    Reports all privileges granted to the specified object for all roles and users. If the specified objectName does not exist, no results are reported. Used for databases and tables only.

    Example

    \privileges roleName | userName

    Reports all object privileges granted to the specified role or user. The roleName or userName specified must exist.

    Example

    \role_list userName

    Reports all roles granted to the given user. The userName specified must exist.

    Example

    \roles

    Reports all roles.

    Example

    \u

    Lists all users.

    Example

    Example: Data Security

    The following example demonstrates field-level security using two views:

    • view_users_limited, in which users only see three of seven fields: userid, First_Name, and Department.

    • view_users_full, in which users see all seven fields.
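The two views are defined over the users table as:

```sql
create view view_users_limited as select userid, First_Name, Department from users;
create view view_users_full as select userid, First_Name, Department, Address, City, State, Zip from users;
```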

    Source Data

    Create Views

    Create Users

    Grant Access to Users on Database

    Create Roles

    Grant Roles to Users

    Grant Privilege to View Roles

    Verify Views

    User readonly1 sees no tables, only the specific view granted, and only the three specific columns returned in the view:

    User readonly2 sees no tables, only the specific view granted, and all seven columns returned in the view:

    A user can be granted one or many roles.
    CREATE VIEW - Create a view for the current database.

    CREATE DASHBOARD - Create a dashboard for the current database.

    DELETE DASHBOARD - Delete a dashboard for this database.

    DROP SERVER - Drop servers from the current database.

    DROP - Drop a table from the database.

    DROP VIEW - Drop a view for this database.

    EDIT DASHBOARD - Edit a dashboard for this database.

    SELECT VIEW - Select a view for this database.

    SERVER USAGE - Use servers (through foreign tables) in the current database.

    VIEW DASHBOARD - View a dashboard for this database.

    VIEW SQL EDITOR - Access the SQL Editor in Immerse for this database.


    Users with SELECT privilege on views do not require SELECT privilege on underlying tables referenced by the view to retrieve the data queried by the view. View queries work without error whether or not users have direct access to referenced tables. This also applies to views that query tables in other databases.

    To create views, users must have SELECT privilege on queried tables in addition to the CREATE VIEW privilege.

    SELECT, INSERT, TRUNCATE, UPDATE, DELETE - Allow these SQL statements on this table.

    DROP - Drop this table.


    SELECT - Select from this view. Users do not need privileges on objects referenced by this view.

    DROP - Drop this view.


    VIEW - View this dashboard.

    EDIT - Edit this dashboard.

    DELETE - Delete this dashboard.

    DROP - Drop this server from the current database.

    ALTER - Alter this server in the current database.

    USAGE - Use this server (through foreign tables) in the current database.

    Grant role privilege(s) on a database table to a role or user.

    Revoke role privilege(s) on database table from a role or user.

    Grant role privilege(s) on a database view to a role or user.

    Revoke role privilege(s) on database view from a role or user.

    Grant role privilege(s) on database to a role or user.

    Revoke role privilege(s) on database from a role or user.

    Grant role privilege(s) on server to a role or user.

    Revoke role privilege(s) on server from a role or user.

    Grant role privilege(s) on dashboard to a role or user.

    Revoke role privilege(s) on dashboard from a role or user.

    Grant the r_select role to user1, giving user1 the SELECT privilege on table1.

  • Directly grant user1 the INSERT privilege on table1.

  • GRANT ON DATABASE

    INSERT

    Grant INSERT privilege on <tableName> to <entityList>.

    SELECT

    Grant SELECT privilege on <tableName> to <entityList>.

    TRUNCATE

    Grant TRUNCATE privilege on <tableName> to <entityList>.

    UPDATE

    Grant UPDATE privilege on <tableName> to <entityList>.

    GRANT ON DATABASE

    INSERT

    Remove INSERT privilege for <entityList> on <tableName>.

    SELECT

    Remove SELECT privilege for <entityList> on <tableName>.

    TRUNCATE

    Remove TRUNCATE privilege for <entityList> on <tableName>.

    UPDATE

    Remove UPDATE privilege for <entityList> on <tableName>.

    GRANT ON DATABASE

    CREATE SERVER

    Grant CREATE SERVER privilege on <dbName> to <entityList>;

    CREATE TABLE

    Grant CREATE TABLE privilege on <dbName> to <entityList>. Previously CREATE.

    CREATE VIEW

    Grant CREATE VIEW privilege on <dbName> to <entityList>.

    CREATE DASHBOARD

    Grant CREATE DASHBOARD privilege on <dbName> to <entityList>.

    CREATE

    Grant CREATE privilege on <dbName> to <entityList>.

    DELETE

    Grant DELETE privilege on <dbName> to <entityList>.

    DELETE DASHBOARD

    Grant DELETE DASHBOARD privilege on <dbName> to <entityList>.

    DROP

    Grant DROP privilege on <dbName> to <entityList>.

    DROP SERVER

    Grant DROP privilege on <dbName> to <entityList>.

    DROP VIEW

    Grant DROP VIEW privilege on <dbName> to <entityList>.

    EDIT DASHBOARD

    Grant EDIT DASHBOARD privilege on <dbName> to <entityList>.

    INSERT

    Grant INSERT privilege on <dbName> to <entityList>.

    SELECT

    Grant SELECT privilege on <dbName> to <entityList>.

    SELECT VIEW

    Grant SELECT VIEW privilege on <dbName> to <entityList>.

    SERVER USAGE

    Grant SERVER USAGE privilege on <dbName> to <entityList>.

    TRUNCATE

    Grant TRUNCATE privilege on <dbName> to <entityList>.

    UPDATE

    Grant UPDATE privilege on <dbName> to <entityList>.

    VIEW DASHBOARD

    Grant VIEW DASHBOARD privilege on <dbName> to <entityList>.

    VIEW SQL EDITOR

    Grant VIEW SQL EDITOR privilege in Immerse on <dbName> to <entityList>.

    CREATE TABLE

    Remove CREATE TABLE privilege on <dbName> from <entityList>. Previously CREATE.

    CREATE VIEW

    Remove CREATE VIEW privilege on <dbName> from <entityList>.

    CREATE DASHBOARD

    Remove CREATE DASHBOARD privilege on <dbName> from <entityList>.

    CREATE

    Remove CREATE privilege on <dbName> from <entityList>.

    CREATE SERVER

    Remove CREATE SERVER privilege on <dbName> from <entityList>.

    DELETE

    Remove DELETE privilege on <dbName> from <entityList>.

    DELETE DASHBOARD

    Remove DELETE DASHBOARD privilege on <dbName> from <entityList>.

    DROP

    Remove DROP privilege on <dbName> from <entityList>.

    DROP SERVER

    Remove DROP SERVER privilege on <dbName> from <entityList>.

    DROP VIEW

    Remove DROP VIEW privilege on <dbName> from <entityList>.

    EDIT DASHBOARD

    Remove EDIT DASHBOARD privilege on <dbName> from <entityList>.

    INSERT

    Remove INSERT privilege on <dbName> from <entityList>.

    SELECT

    Remove SELECT privilege on <dbName> from <entityList>.

    SELECT VIEW

    Remove SELECT VIEW privilege on <dbName> from <entityList>.

    SERVER USAGE

    Remove SERVER USAGE privilege on <dbName> from <entityList>.

    TRUNCATE

    Remove TRUNCATE privilege on <dbName> from <entityList>.

    UPDATE

    Remove UPDATE privilege on <dbName> from <entityList>.

    VIEW DASHBOARD

    Remove VIEW DASHBOARD privilege on <dbName> from <entityList>.

    VIEW SQL EDITOR

    Remove VIEW SQL EDITOR privilege in Immerse on <dbName> from <entityList>.

    VIEW

    Grant VIEW privilege on <dashboardId> to <entityList>.

    VIEW

    Revoke VIEW privilege on <dashboardId> for <entityList>.

    GRANT VIEW ON DASHBOARD <dashboardId> TO <entityList>;

    Create a dashboard

    GRANT CREATE DASHBOARD ON DATABASE <dbName> TO <entityList>;

    Edit a dashboard

GRANT EDIT ON DASHBOARD <dashboardId> TO <entityList>;

    Delete a dashboard

    GRANT DELETE DASHBOARD ON DATABASE <dbName> TO <entityList>;

    salesDeptRole2

    SELECT on Table 3

    salesDeptEmployee4

    salesDeptRole3

    SELECT on Table 4

    salesDeptManagerEmployee5

    salesDeptRole1, salesDeptRole2, salesDeptRole3

    SELECT on Tables 1, 3, 4

    marketingDeptEmployee1

    marketingDeptRole1

    SELECT on Tables 1, 2

    marketingDeptEmployee2

    marketingDeptRole2

    SELECT on Table 2

    marketingDeptManagerEmployee3

    marketingDeptRole1, marketingDeptRole2, salesDeptRole1, salesDeptRole2, salesDeptRole3

    SELECT on Tables 1, 2, 3, 4

    CREATE ROLE

    Create role.

    DROP ROLE

    Drop role.

    GRANT

    Grant role to user or to another role.

    REVOKE

    ALL

    Grant all possible access privileges on <tableName> to <entityList>.

    ALTER TABLE

    Grant ALTER TABLE privilege on <tableName> to <entityList>.

    DELETE

    Grant DELETE privilege on <tableName> to <entityList>.

    DROP

    role

    Name of role.

    user

    Name of user.

    ALL

    Remove all access privilege for <entityList> on <tableName>.

    ALTER TABLE

    Remove ALTER TABLE privilege for <entityList> on <tableName>.

    DELETE

    Remove DELETE privilege for <entityList> on <tableName>.

    DROP

    role

    Name of role.

    user

    Name of user.

    ALL

    Grant all possible access privileges on <viewName> to <entityList>.

    DROP

    Grant DROP privilege on <viewName> to <entityList>.

    INSERT

    Grant INSERT privilege on <viewName> to <entityList>.

    SELECT

    role

    Name of role.

    user

    Name of user.

    ALL

    Remove all access privilege for <entityList> on <viewName>.

    DROP

    Remove DROP privilege for <entityList> on <viewName>.

    INSERT

    Remove INSERT privilege for <entityList> on <viewName>.

    SELECT

    role

    Name of role.

    user

    Name of user.

    ACCESS

    Grant ACCESS (connection) privilege on <dbName> to <entityList>.

    ALL

    Grant all possible access privileges on <dbName> to <entityList>.

    ALTER TABLE

    Grant ALTER TABLE privilege on <dbName> to <entityList>.

    ALTER SERVER

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    ACCESS

    Remove ACCESS (connection) privilege on <dbName> from <entityList>.

    ALL

    Remove all possible privileges on <dbName> from <entityList>.

    ALTER SERVER

    Remove ALTER SERVER privilege on <dbName> from <entityList>

    ALTER TABLE

    role

    Name of role.

    user

    Name of user.

    DROP

    Grant DROP privileges on <serverName> on current database to <entityList>.

    ALTER

    Grant ALTER privilege on <serverName> on current database to <entityList>.

    USAGE

    Grant USAGE privilege (through foreign tables) on <serverName> on current database to <entityList>.

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    DROP

    Remove DROP privileges on <serverName> on current database for <entityList>.

    ALTER

    Remove ALTER privilege on <serverName> on current database for <entityList>.

    USAGE

    Remove USAGE privilege (through foreign tables) on <serverName> on current database for <entityList>.

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    ALL

    Grant all possible access privileges on <dashboardId> to <entityList>.

    CREATE

    Grant CREATE privilege to <entityList>.

    DELETE

    Grant DELETE privilege on <dashboardId> to <entityList>.

    EDIT

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    ALL

    Revoke all possible access privileges on <dashboardId> for <entityList>.

    CREATE

    Revoke CREATE privilege for <entityList>.

    DELETE

    Revoke DELETE privilege on <dashboardId> for <entityList>.

    EDIT

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    Access a database

    GRANT ACCESS ON DATABASE <dbName> TO <entityList>;

    Create a table

    GRANT CREATE TABLE ON DATABASE <dbName> TO <entityList>;

    Select a table

    GRANT SELECT ON TABLE <tableName> TO <entityList>;

    salesDeptEmployee1

    salesDeptRole1

    SELECT on Tables 1, 3

    salesDeptEmployee2

    salesDeptRole2

    SELECT on Table 3

    Example Roles and Privileges Session
    Users and Databases
    DROP ROLE
    Users and Databases
    CREATE ROLE
    Users and Databases
    CREATE ROLE
    GRANT ON TABLE
    Users and Databases
    CREATE ROLE
    GRANT
    REVOKE ON TABLE
    Tables
    CREATE ROLE
    GRANT ON TABLE
    GRANT ON DATABASE
    REVOKE ON VIEW
    DDL-VIEWS
    CREATE ROLE
    GRANT ON VIEW
    GRANT ON DATABASE
REVOKE ON DATABASE
    GRANT ON TABLE
    CREATE ROLE
    GRANT ON DATABASE
    GRANT ON TABLE
    REVOKE ON SERVER
    GRANT ON SERVER
    REVOKE ON DASHBOARD
    GRANT ON DASHBOARD

    Revoke role from user or from another role.

    Grant DROP privilege on <tableName> to <entityList>.

    Remove DROP privilege for <entityList> on <tableName>.

    Grant SELECT privilege on <viewName> to <entityList>.

    Remove SELECT privilege for <entityList> on <viewName>.

    Grant ALTER SERVER privilege on <dbName> to <entityList>.

    Remove ALTER TABLE privilege on <dbName> from <entityList>.

    Grant EDIT privilege on <dashboardId> to <entityList>.

    Revoke EDIT privilege on <dashboardId> for <entityList>.

    View a dashboard

    salesDeptEmployee3

    GRANT INSERT ON TABLE table1 TO user1;
    GRANT SELECT ON TABLE table1 TO r_select;
    CREATE ROLE <roleName>;
    CREATE ROLE payrollDept;
    DROP ROLE [IF EXISTS] <roleName>;
    DROP ROLE payrollDept;
    GRANT <roleNames> TO <userNames>, <roleNames>;
    GRANT payrollDept TO dennis;
    GRANT payrollDept, accountsPayableDept TO dennis, mike, hrDept;
    REVOKE <roleNames> FROM <userNames>, <roleNames>;
    REVOKE payrollDept FROM dennis;
    REVOKE payrollDept, accountsPayableDept FROM dennis, fred, hrDept;
    GRANT <privilegeList> ON TABLE <tableName> TO <entityList>;
    GRANT ALL ON TABLE employees TO payrollDept;
    GRANT SELECT ON TABLE employees TO chris;
    GRANT INSERT ON TABLE employees TO hrDept, accountsPayableDept;
    GRANT INSERT, SELECT, TRUNCATE ON TABLE employees TO hrDept, dennis, mike;
    REVOKE <privilegeList> ON TABLE <tableName> FROM <entityList>;
    REVOKE ALL ON TABLE employees FROM nonemployee;
    REVOKE SELECT ON TABLE directors FROM employee;
    REVOKE INSERT ON TABLE directors FROM employee, laura;
    REVOKE INSERT, SELECT, TRUNCATE ON TABLE employees FROM nonemployee, dennis, mike;
    GRANT <privilegeList> ON VIEW <viewName> TO <entityList>;
    GRANT ALL ON VIEW employees TO payrollDept;
    GRANT SELECT ON VIEW employees TO employee, venkat;
    GRANT INSERT, DROP ON VIEW employees TO hrDept, acctPayableDept, simon, dmitri;
    REVOKE <privilegeList> ON VIEW <viewName> FROM <entityList>;
    REVOKE ALL ON VIEW employees FROM nonemployee;
    REVOKE SELECT ON VIEW directors FROM employee;
    REVOKE INSERT, DROP ON VIEW directors FROM employee, manager, ashish, lindsey;
    GRANT <privilegeList> ON DATABASE <dbName> TO <entityList>;
    GRANT ALL ON DATABASE companydb TO payrollDept, david;
    GRANT ACCESS, SELECT ON DATABASE companydb TO employee;
    GRANT ACCESS, INSERT, UPDATE, DROP ON DATABASE companydb TO hrdept, manager, irene, stephen;
    REVOKE <privilegeList> ON DATABASE <dbName> FROM <entityList>;
    REVOKE ALL ON DATABASE employees FROM nonemployee;
    REVOKE SELECT ON DATABASE directors FROM employee;
    REVOKE INSERT, DROP, CREATE, DELETE ON DATABASE directors FROM employee;
    GRANT <privilegeList> ON SERVER <serverName> TO <entityList>;
GRANT DROP ON SERVER parquet_s3_server TO fred;
    GRANT ALTER ON SERVER parquet_s3_server TO payrollDept;
    GRANT USAGE, ALTER ON SERVER parquet_s3_server TO payrollDept, jamie;
    REVOKE <privilegeList> ON SERVER <serverName> FROM <entityList>;
REVOKE DROP ON SERVER parquet_s3_server FROM inga;
    REVOKE ALTER ON SERVER parquet_s3_server FROM payrollDept;
    REVOKE USAGE, ALTER ON SERVER parquet_s3_server FROM payrollDept, marvin;
    GRANT <privilegeList> [ON DASHBOARD <dashboardId>] TO <entityList>;
    GRANT ALL ON DASHBOARD 740 TO payrollDept;
    GRANT VIEW ON DASHBOARD 730 TO hrDept, dennis;
    GRANT EDIT, DELETE ON DASHBOARD 740 TO hrdept, accountsPayableDept, pavan;
    REVOKE <privilegeList> [ON DASHBOARD <dashboardId>] FROM <entityList>;
    REVOKE DELETE ON DASHBOARD 740 FROM payrollDept;
    REVOKE ALL ON DASHBOARD 730 FROM hrDept, dennis, mike;
    REVOKE EDIT, DELETE ON DASHBOARD 740 FROM hrdept, accountsPayableDept, dante, jonathan;
    create table table1 (id smallint);
    create table table2 (id smallint);
    create table table3 (id smallint);
    create table table4 (id smallint);
    create user marketingDeptEmployee1 (password = 'md1');
    create user marketingDeptEmployee2 (password = 'md2');
    create user marketingDeptManagerEmployee3 (password = 'md3');
    
    create user salesDeptEmployee1 (password = 'sd1');
    create user salesDeptEmployee2 (password = 'sd2');
    create user salesDeptEmployee3 (password = 'sd3');
    create user salesDeptEmployee4 (password = 'sd4');
    create user salesDeptManagerEmployee5 (password = 'sd5');
    grant access on database heavyai to marketingDeptEmployee1, marketingDeptEmployee2, marketingDeptManagerEmployee3;
    grant access on database heavyai to salesDeptEmployee1, salesDeptEmployee2, salesDeptEmployee3, salesDeptEmployee4, salesDeptManagerEmployee5;
    create role marketingDeptRole1;
    create role marketingDeptRole2;
    grant marketingDeptRole1 to marketingDeptEmployee1, marketingDeptManagerEmployee3;
    grant marketingDeptRole2 to marketingDeptEmployee2, marketingDeptManagerEmployee3;
    grant select on table table1 to marketingDeptRole1;
    grant select on table table2 to marketingDeptRole1;
    grant select on table table2 to marketingDeptRole2;
    create role salesDeptRole1;
    create role salesDeptRole2;
    create role salesDeptRole3;
    grant salesDeptRole1 to salesDeptEmployee1;
    grant salesDeptRole2 to salesDeptEmployee2, salesDeptEmployee3;
    grant salesDeptRole3 to salesDeptEmployee4;
    grant select on table table1 to salesDeptRole1;
    grant select on table table3 to salesDeptRole1, salesDeptRole2;
    grant select on table table4 to salesDeptRole3;
    grant salesDeptRole1, salesDeptRole2, salesDeptRole3 to salesDeptManagerEmployee5, marketingDeptManagerEmployee3;
    heavysql> \dash 
    Dashboard ID | Dashboard Name    | Owner 
    1            | Marketing_Summary | heavyai
    grant view on dashboard 1 to marketingDeptRole2;
    heavysql> \dash database heavyai 
    Dashboard ID | Dashboard Name    | Owner 
    1            | Marketing_Summary | heavyai
    heavysql> \object_privileges database heavyai 
    marketingDeptEmployee1 privileges: login-access 
marketingDeptEmployee2 privileges: login-access
marketingDeptManagerEmployee3 privileges: login-access
    salesDeptEmployee1 privileges: login-access 
    salesDeptEmployee2 privileges: login-access 
    salesDeptEmployee3 privileges: login-access 
    salesDeptEmployee4 privileges: login-access 
    salesDeptManagerEmployee5 privileges: login-access
    heavysql> \privileges salesDeptRole1 
    table1 (table): select 
    table3 (table): select
    heavysql> \privileges salesDeptManagerEmployee5 
heavyai (database): login-access
    
    heavysql> \privileges marketingdeptrole2 
    table2 (table): select
    Marketing_Summary (dashboard): view
    heavysql> \role_list salesDeptManagerEmployee5
    salesDeptRole3 
    salesDeptRole2 
    salesDeptRole1
    heavysql> \roles
    marketingDeptRole1 
    marketingDeptRole2 
    salesDeptRole1 
    salesDeptRole2 
    salesDeptRole3
    heavysql> \u 
    heavyai 
    marketingDeptEmployee1 
    marketingDeptEmployee2 
    salesDeptEmployee1 
    salesDeptEmployee2 
    salesDeptEmployee3 
    salesDeptEmployee4 
    salesDeptManagerEmployee5 
    marketingDeptManagerEmployee3
    create view view_users_limited as select userid, First_Name, Department from users;
    create view view_users_full as select userid, First_Name, Department, Address, City, State, Zip from users;
    create user readonly1 (password = 'rr1');
    create user readonly2 (password = 'rr2');
    grant access on database heavyai to readonly1, readonly2;
    create role limited_readonly;
    create role full_readonly;
    grant limited_readonly to readonly1;
    grant full_readonly to readonly2;
    grant select on view view_users_limited to limited_readonly;
    grant select on view view_users_full TO full_readonly;
    heavysql> \t
    heavysql> \v
    view_users_limited
    heavysql> select * from view_users_limited;
    userid|First_Name|Department
    1|Todd|C Suite
    2|Don|Sales
    3|Mike|Customer Success
    heavysql> \t
    heavysql> \v
    view_users_full
    heavysql> select * from view_users_full;
    userid|First_Name|Department|Address|City|State|Zip
    1|Todd|C Suite|1 Front Street|San Francisco|CA|94111
    2|Don|Sales|1 5th Avenue|New York|NY|10001
    3|Mike|Customer Success|100 Main Street|Reston|VA|20191
    GRANT ON TABLE
    REVOKE ON TABLE
    GRANT ON VIEW
    REVOKE ON VIEW
    GRANT ON DATABASE
    REVOKE ON DATABASE
    GRANT ON SERVER
    REVOKE ON SERVER
    GRANT ON DASHBOARD
    REVOKE ON DASHBOARD

    Loading Data with SQL

    This topic describes several ways to load data to HEAVY.AI using SQL commands.

    circle-info
    • If there is a potential for duplicate entries, and you want to avoid loading duplicate rows, see How can I avoid creating duplicate rows? on the Troubleshooting page.

    • If a source file uses a reserved word, HEAVY.AI automatically adds an underscore at the end of the reserved word. For example, year is converted to year_.

    hashtag
    COPY FROM

    hashtag
    CSV/TSV Import

    Use the following syntax for CSV and TSV files:

    <file pattern> must be local on the server. The file pattern can contain wildcards if you want to load multiple files. In addition to CSV, TSV, and TXT files, you can import compressed files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
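    For example, a typical CSV load into an existing table might look like this (the table and file names are illustrative):

    ```sql
    -- Load every file matching the wildcard; values are comma-delimited
    -- and the first line of each file is treated as a header and skipped
    COPY tweets FROM '/tmp/tweets_*.csv' WITH (delimiter = ',', header = 'true');
    ```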

    circle-info

    COPY FROM appends data from the source into the target table. It does not truncate the table or overwrite existing data.

    You can import client-side files using the \copy command in heavysql, but it is significantly slower. For large files, HEAVY.AI recommends that you first scp the file to the server, and then issue the COPY command.

    HEAVY.AI supports Latin-1 ASCII and UTF-8 encodings. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

    Available properties in the optional WITH clause are described in the following table.

    Parameter
    Description
    Default Value
    circle-info

    By default, the CSV parser assumes one row per line. To import a file with multiple lines in a single field, specify threads = 1 in the WITH clause.

    hashtag
    Examples
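    As a sketch of the threads = 1 case described above (the file path and table name are assumed), importing a CSV whose quoted fields contain embedded newlines:

    ```sql
    -- A single import thread lets quoted fields that span multiple lines parse correctly
    COPY events FROM '/data/events.csv' WITH (threads = 1, quoted = 'true');
    ```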

    hashtag
    Geo Import

    You can use COPY FROM to import geo files. You can create the table based on the source file and then load the data:

    You can also append data to an existing, predefined table:
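    Hedged sketches of both cases, using an illustrative GeoJSON file:

    ```sql
    -- Create the table from the source file's metadata, then load the data
    COPY counties FROM '/data/counties.geojson' WITH (source_type = 'geo_file');

    -- Append the same file to an existing, predefined table
    COPY existing_counties FROM '/data/counties.geojson' WITH (source_type = 'geo_file');
    ```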

    Use the following syntax, depending on the file source.

    circle-info
    • If you are using COPY FROM to load to an existing table, the field type must match the metadata of the source file. If it does not, COPY FROM throws an error and does not load the data.

    The following WITH options are available for geo file imports from all sources.

    circle-info

    Currently, a manually created geo table can have only one geo column. If it has more than one, import is not performed.

    Any GDAL-supported file type can be imported; for an unsupported file type, GDAL throws an error.

    circle-exclamation

    An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table.

    circle-info

    The first compatible file in the bundle is loaded; subfolders are traversed until a compatible file is found. The rest of the contents in the bundle are ignored. If the bundle contains multiple filesets, unpack the file manually and specify it for import.


    CSV files containing WKT strings are not considered geo files and should not be parsed with the source_type='geo' option. When importing WKT strings from CSV files, you must create the table first. The geo column type and encoding are specified as part of the DDL. For example, for a polygon with no encoding, try the following:
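    A minimal sketch, assuming a two-column CSV containing an id and a WKT polygon string:

    ```sql
    -- The geo column type is declared in the DDL; no encoding is specified
    CREATE TABLE wkt_polygons (id INTEGER, poly GEOMETRY(POLYGON));
    COPY wkt_polygons FROM '/data/polygons.csv';
    ```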

    hashtag
    Raster Import

    You can use COPY FROM to import raster files supported by GDAL as one row per pixel, where a pixel may consist of one or more data bands, with optional corresponding pixel or world-space coordinate columns. This allows the data to be rendered as a point/symbol cloud that approximates a 2D image.

    Use the same syntax that you would for geo import, depending on the file source.

    The following WITH options are available for raster file imports from all sources.

    Parameter
    Description
    Default Value
    circle-info

    Illegal combinations of raster_point_type and raster_point_transform are rejected. For example, world transform can only be performed on raster files that have a geospatial coordinate system in their metadata, and cannot be performed if <type> is an integer format (which cannot represent world-space coordinate values).

    Any GDAL-supported file type can be imported; for an unsupported file type, GDAL throws an error.

    circle-exclamation

    HDF5 and possibly other GDAL drivers may not be thread-safe, so use WITH (threads=1) when importing.

    circle-info

    Archive file import (.zip, .tar, .tar.gz) is not currently supported for raster files.

    Band and Column Names

    The following raster file formats contain the metadata required to derive sensible names for the bands, which are then used for their corresponding columns:

    • GRIB2 - geospatial/meteorological format

    • OME TIFF - an OpenMicroscopy format

    The band names from the file are sanitized (illegal characters and spaces removed) and de-duplicated (addition of a suffix in cases where the same band name is repeated within the file or across datasets).

    For other formats, the columns are named band_1_1, band_1_2, and so on.

    The sanitized and de-duplicated names must be used for the raster_import_bands option.
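    For example, to import only selected bands (the band names below are hypothetical, and a comma-separated list is assumed):

    ```sql
    -- Import only two named bands from a GRIB2 file
    COPY weather FROM '/data/forecast.grib2'
    WITH (source_type = 'raster_file', raster_import_bands = 'temperature_1,humidity_1');
    ```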

    Band and Column Data Types

    Raster files can have bands in the following data types:

    • Signed or unsigned 8-, 16-, or 32-bit integer

    • 32- or 64-bit floating point

    • Complex number formats (not supported)

    Signed data is stored in the directly corresponding column type, as follows:

    int8 -> TINYINT
    int16 -> SMALLINT
    int32 -> INT
    float32 -> FLOAT
    float64 -> DOUBLE

    Unsigned integer column types are not currently supported, so any data of those types is converted to the next size up signed column type:

    uint8 -> SMALLINT
    uint16 -> INT
    uint32 -> BIGINT

    Column types cannot currently be overridden.

    hashtag
    Raster Tiled Import

    circle-info

    Tiled Raster Import is currently a beta feature and does not yet support some of the features and flags supported by the existing Raster Import. For best performance, we currently suggest specifying point data as raster_lon/raster_lat DOUBLE values.

    Raster import now supports data tiling, enabled by starting the server with the flag enable-legacy-raster-import=false. Tiling organizes raster data by geospatial coordinates to better optimize data access.

    Once the server flag is set, the functionality is invoked using the COPY FROM SQL command. A notable difference from the legacy import is that the tiled import requires the table to be created in advance with columns specified, and it uses some new WITH options.

    The following additional WITH options are available for raster files using the new tiled importer:

    Parameter
    Description
    Default Value

    hashtag
    ODBC Import

    circle-info

    ODBC import is currently a beta feature.

    You can use COPY FROM to import data from a relational database management system (RDBMS) or data warehouse using the Open Database Connectivity (ODBC) interface.

    The following WITH options are available for ODBC import.

    hashtag
    Examples

    Using a data source name:

    Using a connection string:
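    Hedged sketches of both forms; the DSN, driver, credentials, and remote query are placeholders, and the username/password option names in the DSN form are assumptions:

    ```sql
    -- Using a data source name (username/password option names assumed)
    COPY target_table FROM '(SELECT * FROM remote_table)'
    WITH (source_type = 'odbc', data_source_name = 'my_dsn',
          username = 'user', password = 'pass');

    -- Using a connection string with a credential string
    COPY target_table FROM '(SELECT * FROM remote_table)'
    WITH (source_type = 'odbc',
          connection_string = 'Driver=PostgreSQL;Server=db.example.com;Port=5432;Database=mydb',
          credential_string = 'Username=user;Password=pass');
    ```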

    For information about using ODBC HeavyConnect, see .

    hashtag
    Globbing, Filtering, and Sorting Parquet and CSV Files

    These examples assume the following folder and file structure:

    hashtag
    Globbing

    Local Parquet/CSV files can now be globbed by specifying either a path name with a wildcard or a folder name.

    Globbing a folder recursively returns all files under the specified folder. For example,

    COPY table_1 FROM ".../subdir";

    returns file_3, file_4, file_5.

    Globbing with a wildcard returns any file paths matching the expanded file path. So

    COPY table_1 FROM ".../subdir/file*";

    returns file_3, file_4.

    Globbing does not apply to S3, because file paths specified for S3 always use prefix matching.

    hashtag
    Filtering

    Use file filtering to filter out unwanted files that have been globbed. To use filtering, specify the REGEX_PATH_FILTER option. Files not matching this pattern are not included on import. This behavior is consistent across local and S3 use cases.

    The following regex expression:

    COPY table_1 from ".../" WITH (REGEX_PATH_FILTER=".*file_[4-5]");

    returns file_4, file_5.

    hashtag
    Sorting

    Use the FILE_SORT_ORDER_BY option to specify the order in which files are imported.

    FILE_SORT_ORDER_BY Options

    • pathname (default)

    • date_modified

    • regex

    *FILE_SORT_REGEX option required

    Using FILE_SORT_ORDER_BY

    COPY table_1 from ".../" WITH (FILE_SORT_ORDER_BY="date_modified");

    Using FILE_SORT_ORDER_BY with FILE_SORT_REGEX

    Regex sort keys are formed by the concatenation of all capture groups from the FILE_SORT_REGEX expression. Regex sort keys are strings but can be converted to dates or FLOAT64 with the appropriate FILE_SORT_ORDER_BY option. File paths that do not match the provided capture groups or that cannot be converted to the appropriate date or FLOAT64 are treated as NULLs and sorted to the front in a deterministic order.

    Multiple Capture Groups:

    FILE_SORT_REGEX=".*/data_(.*)_(.*)_"
    /root/dir/unmatchedFile → <NULL>
    /root/dir/data_andrew_54321_ → andrew54321
    /root/dir2/data_brian_Josef_ → brianJosef

    Dates:

    FILE_SORT_REGEX=".*data_(.*)"
    /root/data_222 → <NULL> (invalid date conversion)
    /root/data_2020-12-31 → 2020-12-31
    /root/dir/data_2021-01-01 → 2021-01-01

    Import:

    COPY table_1 from ".../" WITH (FILE_SORT_ORDER_BY="regex", FILE_SORT_REGEX=".*file_(.)");

    hashtag
    Geo and Raster File Globbing

    Limited filename globbing is supported for both geo and raster import. For example, to import a sequence of same-format GeoTIFF files into a single table, you can run the following:

    COPY table FROM '/path/path/something_*.tiff' WITH (source_type='raster_file')

    The files are imported in alphanumeric sort order, per regular glob rules, and all appended to the same table. This may fail if the files are not all of the same format (band count, names, and types).

    circle-info

    For non-geo/raster files (CSV and Parquet), you can provide just the path to the directory OR a wildcard; for example:

    /path/to/directory/
    /path/to/directory
    /path/to/directory/*

    For geo/raster files, a wildcard is required, as shown in the last example.

    hashtag
    SQLImporter

    SQLImporter is a Java utility run at the command line. It runs a SELECT statement on another database through JDBC and loads the result set into HeavyDB.

    hashtag
    Usage

    hashtag
    Flags

    HEAVY.AI recommends that you use a service account with read-only permissions when accessing data from a remote database.

    circle-info

    In release 4.6 and higher, the user ID (-u) and password (-p) flags are required. If your password includes a special character, you must escape the character using a backslash (\).

    If the table does not exist in HeavyDB, SQLImporter creates it. If the target table in HeavyDB does not match the SELECT statement metadata, SQLImporter fails.

    If the truncate flag is used, SQLImporter truncates the table in HeavyDB before transferring the data. If the truncate flag is not used, SQLImporter appends the results of the SQL statement to the target table in HeavyDB.

    The -i argument provides a path to an initialization file. Each line of the file is sent as a SQL statement to the remote database. You can use -i to set additional custom parameters before the data is loaded.

    circle-info

    The SQLImporter string is case-sensitive. Incorrect case returns the following:

    Error: Could not find or load main class com.mapd.utility.SQLimporter

    hashtag
    PostgreSQL/PostGIS Support

    You can migrate geo data types from a PostgreSQL database. The following table shows the correlation between PostgreSQL/PostGIS geo types and HEAVY.AI geo types.

    point → POINT

    Other PostgreSQL types, including circle, box, and path, are not supported.

    hashtag
    HeavyDB Example

    circle-info

    By default, 100,000 records are selected from HeavyDB. To select a larger number of records, use a LIMIT clause in the SELECT statement.
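    For example, the SELECT statement passed to SQLImporter can raise that ceiling explicitly (the table name is illustrative):

    ```sql
    -- Select up to one million records instead of the 100,000 default
    SELECT * FROM source_table LIMIT 1000000;
    ```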

    hashtag
    Hive Example

    hashtag
    Google BigQuery Example

    hashtag
    PostgreSQL Example

    hashtag
    SQLServer Example

    hashtag
    MySQL Example

    hashtag
    StreamInsert

    Stream data into HeavyDB by attaching the StreamInsert program to the end of a data stream. The data stream can be another program printing to standard out, a Kafka endpoint, or any other real-time stream output. You can specify the appropriate batch size, according to the expected stream rates and your insert frequency. The target table must exist before you attempt to stream data into the table.

    Setting
    Default
    Description


    hashtag
    Example

    hashtag
    Importing AWS S3 Files

    You can use the SQL COPY FROM statement to import files stored on Amazon Web Services Simple Storage Service (AWS S3) into a HEAVY.AI table, in much the same way you would with local files. In the WITH clause, specify the S3 credentials and region information of the bucket being accessed.

    Access key and secret key, or session token if using temporary credentials, and region are required.

    circle-info

    HEAVY.AI does not support the use of asterisks (*) in URL strings to import items. To import multiple files, pass in an S3 path instead of a file name, and COPY FROM imports all items in that path and any subpath.

    hashtag
    Custom S3 Endpoints

    HEAVY.AI supports custom S3 endpoints, which allows you to import data from S3-compatible services, such as Google Cloud Storage.

    To use custom S3 endpoints, add s3_endpoint to the WITH clause of a COPY FROM statement; for example, to set the S3 endpoint to point to Google Cloud Services:
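    A sketch of such a COPY statement; the bucket, table, and credential values are placeholders, and the s3_access_key/s3_secret_key option names are assumptions:

    ```sql
    COPY trips FROM 's3://my-gcs-bucket/trips.csv'
    WITH (s3_endpoint = 'storage.googleapis.com',
          s3_region = 'us-west-1',
          s3_access_key = '<access-key>',   -- option name assumed
          s3_secret_key = '<secret-key>');  -- option name assumed
    ```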


    circle-info

    You can also configure custom S3 endpoints by passing the s3_endpoint field to Thrift import_table.

    hashtag
    Examples

    The following examples show failed and successful attempts to copy the table from AWS S3.

    The following example imports all the files in the trip.compressed directory.
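    A sketch of a path-based import; the bucket path and credential values are placeholders (the s3_access_key/s3_secret_key option names are assumptions):

    ```sql
    -- Importing an S3 path loads all items in that path and any subpath
    COPY trips FROM 's3://my-bucket/trip.compressed/'
    WITH (s3_region = 'us-west-1',
          s3_access_key = '<access-key>',   -- option name assumed
          s3_secret_key = '<secret-key>');  -- option name assumed
    ```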

    hashtag
    trips Table

    The table trips is created with the following statement:

    hashtag
    Using Server Privileges to Access AWS S3

    You can configure the HEAVY.AI server to provide AWS credentials, which allows S3 queries to be run without specifying AWS credentials. S3 regions are not configured by the server and must be passed in either as a client-side environment variable or as an option with the request.

    Example Commands

    • \detect:
      $ export AWS_REGION=us-west-1
      heavysql> \detect <s3-bucket-uri>

    • import_table:
      $ ./Heavyai-remote -h localhost:6274 import_table "'<session-id>'" "<table-name>" '<s3-bucket-uri>' 'TCopyParams(s3_region="'us-west-1'")'

    hashtag
    Configuring AWS Credentials

    1. Enable server privileges in the server configuration file heavy.conf:
       allow-s3-server-privileges = true

    2. For bare metal installations, set the following environment variables and restart the HeavyDB service:
       AWS_ACCESS_KEY_ID=xxx
       AWS_SECRET_ACCESS_KEY=xxx
       AWS_SESSION_TOKEN=xxx

    hashtag
    KafkaImporter

    You can ingest data from an existing Kafka producer to an existing table in HEAVY.AI using KafkaImporter on the command line:

    circle-info

    KafkaImporter requires a functioning Kafka cluster.

    hashtag
    KafkaImporter Options

    Setting
    Default
    Description

    hashtag
    KafkaImporter Logging Options


    Configure KafkaImporter to use your target table. KafkaImporter listens to a pre-defined Kafka topic associated with your table. You must create the table before using the KafkaImporter utility. For example, you might have a table named customer_site_visit_events that listens to a topic named customer_site_visit_events_topic.

    The data format must be a record-level format supported by HEAVY.AI.

    KafkaImporter listens to the topic, validates records against the target schema, and ingests topic batches of your designated size to the target table. Rejected records use the existing reject reporting mechanism. You can start, shut down, and configure KafkaImporter independent of the HeavyDB engine. If KafkaImporter is running and the database shuts down, KafkaImporter shuts down as well. Reads from the topic are nondestructive.

    KafkaImporter is not responsible for event ordering; a streaming platform outside HEAVY.AI (for example, Spark Streaming or Flink) should handle the stream processing. HEAVY.AI ingests the end-state stream of post-processed events.

    KafkaImporter does not handle dynamic schema creation on first ingest, but must be configured with a specific target table (and its schema) as the basis. There is a 1:1 correspondence between target table and topic.

    hashtag
    StreamImporter

    StreamImporter is an updated version of the StreamInsert utility used for streaming reads from delimited files into HeavyDB. StreamImporter uses a binary columnar load path, providing improved performance compared to StreamInsert.

    You can ingest data from a data stream to an existing table in HEAVY.AI using StreamImporter on the command line.

    hashtag
    StreamImporter Options

    Setting
    Default
    Description

    hashtag
    StreamImporter Logging Options

    Setting
    Default
    Description

    Configure StreamImporter to use your target table. StreamImporter listens to a pre-defined data stream associated with your table. You must create the table before using the StreamImporter utility.

    The data format must be a record-level format supported by HEAVY.AI.

    StreamImporter listens to the stream, validates records against the target schema, and ingests batches of your designated size to the target table. Rejected records use the existing reject reporting mechanism. You can start, shut down, and configure StreamImporter independent of the HeavyDB engine. If StreamImporter is running but the database shuts down, StreamImporter shuts down as well. Reads from the stream are non-destructive.

    StreamImporter is not responsible for event ordering; a first-class streaming platform outside HEAVY.AI (for example, Spark Streaming or Flink) should handle the stream processing. HEAVY.AI ingests the end-state stream of post-processed events.

    StreamImporter does not handle dynamic schema creation on first ingest, but must be configured with a specific target table (and its schema) as the basis. There is a 1:1 correspondence between the target table and the stream.

    hashtag
    Importing Data from HDFS with Sqoop

    You can consume a CSV or Parquet file residing in HDFS (Hadoop Distributed File System) into HeavyDB.

    Copy the HEAVY.AI JDBC driver into the Apache Sqoop library, normally found at /usr/lib/sqoop/lib/.

    hashtag
    Example

    The following is a straightforward import command. For more information on options and parameters for using Apache Sqoop, see the Apache Sqoop user guide.

    The --connect parameter is the address of a valid JDBC port on your HEAVY.AI instance.

    hashtag
    Troubleshooting: Avoiding Duplicate Rows

    To detect duplication prior to loading data into HeavyDB, you can perform the following steps. For this example, the files are labeled A, B, C, ... Z.

    1. Load file A into table MYTABLE.

    2. Run the following query.

      There should be no rows returned; if rows are returned, your first A file is not unique.
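    One way to express that check, assuming MYTABLE has a column (or combination of columns) expected to be unique, here called record_id:

    ```sql
    -- Any row returned indicates a duplicate record_id within file A
    SELECT record_id, COUNT(*) AS copies
    FROM MYTABLE
    GROUP BY record_id
    HAVING COUNT(*) > 1;
    ```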

    Size of the input file buffer, in bytes.

    8388608

    delimiter

    A single-character string for the delimiter between input fields; most commonly:

    • , for CSV files

    • \t for tab-delimited files

    Other delimiters include ~, ^, and ;.

    Note: HEAVY.AI does not use file extensions to determine the delimiter.

    escape

    A single-character string for escaping quotes.

    '"' (double quote)

    geo

    Import geo data. Deprecated and scheduled for removal in a future release.

    'false'

    header

    Either 'true' or 'false', indicating whether the input file has a header line in Line 1 that should be skipped.

    'true'

    line_delimiter

    A single-character string for terminating each line.

    '\n'

    lonlat

    In HEAVY.AI, POINT fields require longitude before latitude. Use this parameter based on the order of longitude and latitude in your source data.

    'true'

    max_reject

    Number of records that the COPY statement allows to be rejected before terminating the COPY command. Records can be rejected for a number of reasons, including invalid content in a field, or an incorrect number of columns. The details of the rejected records are reported in the ERROR log. COPY returns a message identifying how many records are rejected. The records that are not rejected are inserted into the table, even if the COPY stops because the max_reject count is reached.

    Note: If you run the COPY command from Heavy Immerse, the COPY command does not return messages to Immerse once the SQL is verified. Immerse does not show messages about data loading, or about data-quality issues that result in max_reject triggers.

    100,000

    nulls

    A string pattern indicating that a field is NULL.

    An empty string, 'NA', or

    parquet

    Import data in Parquet format. Parquet files can be compressed using Snappy. Other archives such as .gz or .zip must be unarchived before you import the data. Deprecated and scheduled for removal in a future release.

    'false'

    plain_text

    Indicates that the input file is plain text so that it bypasses the libarchive decompression utility.

    CSV, TSV, and TXT are handled as plain text.

    quote

    A single-character string for quoting a field.

    " (double quote). All characters inside quotes are imported “as is,” except for line delimiters.

    quoted

    Either 'true' or 'false', indicating whether the input file contains quoted fields.

    'true'

    source_srid

    When importing into GEOMETRY(*, 4326) columns, specifies the SRID of the incoming geometries, all of which are transformed on the fly. For example, to import from a file that contains EPSG:2263 (NAD83 / New York Long Island) geometries, run the COPY command and include WITH (source_srid=2263). Data targeted at non-4326 geometry columns is not affected.

    0

    source_type='<type>'

    Type can be one of the following:

    delimited_file - Import as CSV.

    geo_file - Import as Geo file. Use for shapefiles, GeoJSON, and other geo files. Equivalent to deprecated geo='true'.

    raster_file - Import as a raster file.

    parquet_file - Import as a Parquet file. Equivalent to deprecated parquet='true'.

    delimited_file

    threads

    Number of threads for performing the data import.

    Number of CPU cores on the system

    trim_spaces

    Indicate whether to trim side spaces ('true') or not ('false').

    'false'

    COPY FROM appends data from the source into the target table. It does not truncate the table or overwrite existing data.
  • Supported DATE formats when using COPY FROM include mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, and dd/mmm/yyyy.

  • COPY FROM fails for records with latitude or longitude values that have more than 4 decimal places.

  • Explodes MULTIPOLYGON, MULTILINESTRING, or MULTIPOINT geo data into multiple rows in a POLYGON, LINESTRING, or POINT column, with all other columns duplicated.

    When importing from a WKT CSV with a MULTIPOLYGON column, the table must have been manually created with a POLYGON column.

    When importing from a geo file, the table is automatically created with the correct type of column.

    When the input column contains a mixture of MULTI and single geo, the MULTI geo are exploded, but the singles are imported normally. For example, a column containing five two-polygon MULTIPOLYGON rows and five POLYGON rows imports as a POLYGON column of fifteen rows.

    false

    geo_validate_geometry

    Boolean. If enabled, the importer passes any incoming POLYGON or MULTIPOLYGON data through a validation process. If the geo is considered invalid by OGC (PostGIS) standards (for example, self-intersecting polygons), then the row or feature that contains it is rejected.

    This option is available only if the optional is installed; otherwise invoking the option throws an error.

    Specifies the required type for the additional pixel coordinate columns: auto - Create columns based on raster file type (double for geo, int or smallint for non-geo, dependent on size).

    none - Do not create pixel coordinate columns.

    smallint or int - Create integer columns named raster_x and raster_y.

    auto

    Raster point type is no longer a necessary option when using the tiled importer, because this import process requires the column types to be specified before import; the point type can therefore be deduced from the column types.

    Password credential for the RDBMS. This option applies only when data_source_name is used.

    credential_string

    A set of semicolon-separated “key=value” pairs, which define the access credential parameters for an RDBMS. For example:

    Username=username;Password=password

    Applies only when connection_string is used.

  • regex_date *

  • regex_number *

  • -u

    n/a

    User name

    -p

    n/a

    User password

    --host

    n/a

    Name of HEAVY.AI host

    --delim

    comma (,)

    Field delimiter, in single quotes

    --line

    newline (\n)

    Line delimiter, in single quotes

    --batch

    10000

    Number of records in a batch

    --retry_count

    10

    Number of attempts before job fails

    --retry_wait

    5

    Wait time in seconds after server connection failure

    --null

    n/a

    String that represents null values

    --port

    6274

    Port number for HeavyDB on localhost

    -t | --transform

    n/a

    Regex transformation

    --print_error

    False

    Print error messages

    --print_transform

    False

    Print description of transform.

    --help

    n/a

    List options

    • COPY FROM:
      heavysql> COPY <table-name> FROM <s3-bucket-uri> WITH (s3_region='us-west-1');

    (required only for AWS STS credentials)
    • For HeavyDB Docker images, start a new container mounted with the configuration file using the option:
      -v <dirname-containing-heavy.conf>:/var/lib/heavyai
      and set the following environment options:
      -e AWS_ACCESS_KEY_ID=xxx
      -e AWS_SECRET_ACCESS_KEY=xxx
      -e AWS_SESSION_TOKEN=xxx (required only for AWS STS credentials)

    1. Enable server privileges in the server configuration file heavy.conf:
       allow-s3-server-privileges = true

    2. For bare metal installations, specify a shared AWS credentials file and profile with the following environment variables and restart the HeavyDB service:
       AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
       AWS_PROFILE=default

    3. For HeavyDB Docker images, start a new container mounted with the configuration file and AWS shared credentials file using the following options:
       -v <dirname-containing-/heavy.conf>:/var/lib/heavyai
       -v <dirname-containing-/credentials>:/<container-credential-path>
       and set the following environment options:
       -e AWS_SHARED_CREDENTIALS_FILE=<container-credential-path>
       -e AWS_PROFILE=<active-profile>

    Prerequisites

    1. An IAM Policy that has sufficient access to the S3 bucket.

    2. An IAM AWS Service Role of type Amazon EC2, which is assigned the IAM Policy from step 1.

    Setting Up an EC2 Instance with Roles

    For a new EC2 Instance:

    1. AWS Management Console > Services > Compute > EC2 > Launch Instance.

    2. Select desired Amazon Machine Image (AMI) > Select.

    3. Select desired Instance Type > Next: Configure Instance Details.

    For an existing EC2 Instance:

    1. AWS Management Console > Services > Compute > EC2 > Instances.

    2. Mark desired instance(s) > Actions > Security > Modify IAM Role.

    3. Select desired IAM Role > Save.

    -u <username>

    n/a

    User name

    -p <password>

    n/a

    User password

    --host <hostname>

    localhost

    Name of HEAVY.AI host

    --port <port_number>

    6274

    Port number for HeavyDB on localhost

    --http

    n/a

    Use HTTP transport

    --https

    n/a

    Use HTTPS transport

    --skip-verify

    n/a

    Do not verify validity of SSL certificate

    --ca-cert <path>

    n/a

    Path to the trusted server certificate; initiates an encrypted connection

    --delim <delimiter>

    comma (,)

    Field delimiter, in single quotes

    --line <delimiter>

    newline (\n)

    Line delimiter, in single quotes

    --batch <batch_size>

    10000

    Number of records in a batch

    --retry_count <retry_number>

    10

    Number of attempts before job fails

    --retry_wait <seconds>

    5

    Wait time in seconds after server connection failure

    --null <string>

    n/a

    String that represents null values

    --quoted <boolean>

    false

    Whether the source contains quoted fields

    -t | --transform

    n/a

    Regex transformation

    --print_error

    false

    Print error messages

    --print_transform

    false

    Print description of transform

    --help

    n/a

    List options

    --group-id <id>

    n/a

    Kafka group ID

    --topic <topic>

    n/a

    The Kafka topic to be ingested

    --brokers <broker_name:broker_port>

    localhost:9092

    One or more brokers

    n/a

    Log filename relative to logging directory; has format KafkaImporter.{SEVERITY}.%Y%m%d-%H%M%S.log

    --log-symlink <symlink>

    n/a

    Symlink to active log; has format KafkaImporter.{SEVERITY}

    --log-severity <level>

    INFO

    Log-to-file severity level: INFO, WARNING, ERROR, or FATAL

    --log-severity-clog <level>

    ERROR

    Log-to-console severity level: INFO, WARNING, ERROR, or FATAL

    --log-channels

    n/a

    Log channel debug info

    --log-auto-flush

    n/a

    Flush logging buffer to file after each message

    --log-max-files <files_number>

    100

    Maximum number of log files to keep

    --log-min-free-space <bytes>

    20,971,520

    Minimum number of bytes available on the device before oldest log files are deleted

    --log-rotate-daily

    1

    Start new log files at midnight

    --log-rotation-size <bytes>

    10485760

    Maximum file size, in bytes, before new log files are created

    -u <username>

    n/a

    User name

    -p <password>

    n/a

    User password

    --host <hostname>

    n/a

    Name of OmniSci host

    --port <port>

    6274

    Port number for OmniSciDB on localhost

    --http

    n/a

    Use HTTP transport

    --https

    n/a

    Use HTTPS transport

    --skip-verify

    n/a

    Do not verify validity of SSL certificate

    --ca-cert <path>

    n/a

    Path to the trusted server certificate; initiates an encrypted connection

    | Setting | Default | Description |
    | --- | --- | --- |
    | `--delim <delimiter>` | comma (`,`) | Field delimiter, in single quotes |
    | `--null <string>` | n/a | String that represents null values |
    | `--line <delimiter>` | newline (`\n`) | Line delimiter, in single quotes |
    | `--quoted <boolean>` | true | Either true or false, indicating whether the input file contains quoted fields |
    | `--batch <number>` | 10000 | Number of records in a batch |
    | `--retry_count <retry_number>` | 10 | Number of attempts before job fails |
    | `--retry_wait <seconds>` | 5 | Wait time in seconds after server connection failure |
    | `-t`, `--transform` | n/a | Regex transformation |
    | `--print_error` | false | Print error messages |
    | `--print_transform` | false | Print description of transform |
    | `--help` | n/a | List options |
    | `--log-symlink <symlink>` | n/a | Symlink to active log; has format `StreamImporter.{SEVERITY}` |
    | `--log-severity <level>` | INFO | Log-to-file severity level: INFO, WARNING, ERROR, or FATAL |
    | `--log-severity-clog <level>` | ERROR | Log-to-console severity level: INFO, WARNING, ERROR, or FATAL |
    | `--log-channels` | n/a | Log channel debug info |
    | `--log-auto-flush` | n/a | Flush logging buffer to file after each message |
    | `--log-max-files <files_number>` | 100 | Maximum number of log files to keep |
    | `--log-min-free-space <bytes>` | 20,971,520 | Minimum number of bytes available on the device before oldest log files are deleted |
    | `--log-rotate-daily` | 1 | Start new log files at midnight |
    | `--log-rotation-size <bytes>` | 10485760 | Maximum file size, in bytes, before new log files are created |

  • Load file B into table TEMPTABLE.
  • Run the following query.

    There should be no rows returned if file B is unique. If the information is not unique, fix file B using details from the selection.

  • Load the fixed B file into MYTABLE.

  • Drop table TEMPTABLE.

  • Repeat steps 3-6 for each remaining file in the set before loading the data into the real MYTABLE instance.
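    The staging-table check above can be sketched end to end. The sketch below is purely illustrative: it uses an in-memory SQLite database in place of the server, the placeholder names TEMPTABLE and uCol from the steps above, and a GROUP BY/HAVING query as one way to surface duplicate keys.

    ```python
    import sqlite3

    # In-memory database stands in for the real server; TEMPTABLE and uCol
    # are the placeholder names used in the steps above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEMPTABLE (uCol TEXT)")

    # Simulate loading "file B", which contains one duplicated key.
    conn.executemany("INSERT INTO TEMPTABLE (uCol) VALUES (?)",
                     [("a",), ("b",), ("b",), ("c",)])

    # Group on the supposedly unique column and keep only groups with
    # more than one row; any result means file B must be fixed.
    dups = conn.execute(
        "SELECT uCol, COUNT(*) FROM TEMPTABLE "
        "GROUP BY uCol HAVING COUNT(*) > 1"
    ).fetchall()

    print(dups)  # [('b', 2)] -> fix file B before loading it into MYTABLE
    ```

    An empty result here corresponds to the "no rows returned" condition in the steps above.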

    | Parameter | Description | Default |
    | --- | --- | --- |
    | `array_delimiter` | A single-character string for the delimiter between input values contained within an array. | `,` (comma) |
    | `array_marker` | A two-character string consisting of the start and end characters surrounding an array. | `{ }` (curly brackets). For example, data to be inserted into a table with a string array in the second column (for example, BOOLEAN, STRING[], INTEGER) can be written as `true,{value1,value2,value3},3` |

    | Source | Syntax |
    | --- | --- |
    | Local server | `COPY [tableName] FROM '/filepath' WITH (source_type='geo_file', ...);` |
    | Web site | `COPY [tableName] FROM '[http\|https]://website/filepath' WITH (source_type='geo_file', ...);` |
    | Amazon S3 | `COPY [tableName] FROM 's3://bucket/filepath' WITH (source_type='geo_file', s3_region='region', s3_access_key='accesskey', s3_secret_key='secretkey', ...);` |

    | Option | Description | Default |
    | --- | --- | --- |
    | `geo_coords_type` | Coordinate type used; must be `geography`. | N/A |
    | `geo_coords_encoding` | Coordinates encoding; can be `geoint(32)` or `none`. | `geoint(32)` |
    | `geo_coords_srid` | Coordinates spatial reference; must be 4326 (WGS84 longitude/latitude). | N/A |

    | Option | Description | Default |
    | --- | --- | --- |
    | `raster_import_bands='<bandname>[,<bandname>,...]'` | Allows specification of one or more band names to selectively import; useful in the context of large raster files where not all the bands are relevant. Bands are imported in the order provided, regardless of order in the file. You can rename bands using `<bandname>=<newname>[,<bandname>=<newname>,...]`. Names must be those discovered by the detection process, including any suffixes for de-duplication. | An empty string, indicating to import all bands from all datasets found in the file. |
    | `raster_point_transform='<transform>'` | Specifies the processing for floating-point coordinate values: `auto` - transform based on raster file type (`world` for geo, `none` for non-geo); `none` - no affine or world-space conversion, values are equivalent to the integer pixel coordinates; `file` - file-space affine transform only, values are in the file's coordinate system, if any (e.g. geospatial); `world` - world-space geospatial transform, values are projected to WGS84 lon/lat (if the file has a geospatial SRID). | `auto` |
    | `raster_tile_width` | Specifies the file/block width by which the raster data should be grouped. If none is specified, defaults to the block size provided by the file. | auto |
    | `raster_tile_height` | Specifies the file/block height by which the raster data should be grouped. If none is specified, defaults to the block size provided by the file. | auto |

    | Option | Description |
    | --- | --- |
    | `data_source_name` | Data source name (DSN) configured in the `odbc.ini` file. Only one of `data_source_name` or `connection_string` can be specified. |
    | `connection_string` | A set of semicolon-separated key=value pairs that define the connection parameters for an RDMS. For example: `Driver=DriverName;Database=DatabaseName;Servername=HostName;Port=1234`. Only one of `data_source_name` or `connection_string` can be specified. |
    | `sql_order_by` | Comma-separated list of column names that provide a unique ordering for the result set returned by the specified SQL SELECT statement. |
    | `username` | Username on the RDMS. Applies only when `data_source_name` is used. |
    | `password` | Password on the RDMS. Applies only when `data_source_name` is used. |

    | Source type | HeavyDB type |
    | --- | --- |
    | lseg | linestring |
    | linestring | linestring |
    | polygon | polygon |
    | multipolygon | multipolygon |

    | Setting | Default | Description |
    | --- | --- | --- |
    | `<table_name>` | n/a | Name of the target table in OmniSci |
    | `<database_name>` | n/a | Name of the target database in OmniSci |
    | `--log-directory <directory>` | mapd_log | Logging directory; can be relative to data directory or absolute |
    | `--log-file-name <filename>` | n/a | Log filename relative to logging directory; has format `StreamImporter.{SEVERITY}.%Y%m%d-%H%M%S.log` |

    Importing an ESRI File Geodatabase
    Importing Geospatial Files
    geo files
    ODBC Data Wrapper Reference
    RegEx Replace
    https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
    Cloud Storage Interoperability
    trips
    Kafka website
    Confluent schema registry documentation
    sqoop.apache.org


    select t1.uniqueCol from MYTABLE t1 join TEMPTABLE t2 on t1.uniqueCol = t2.uniqueCol;
    COPY <table> FROM '<file pattern>' [WITH (<property> = value, ...)];
    COPY tweets FROM '/tmp/tweets.csv' WITH (nulls = 'NA'); 
    COPY tweets FROM '/tmp/tweets.tsv' WITH (delimiter = '\t', quoted = 'false'); 
    COPY tweets FROM '/tmp/*' WITH (header='false'); 
    COPY trips FROM '/mnt/trip/trip.parquet/part-00000-0284f745-1595-4743-b5c4-3aa0262e4de3-c000.snappy.parquet' with (parquet='true');
    COPY FROM 'source' WITH (source_type='geo_file', ...);
    COPY tableName FROM 'source' WITH (source_type='geo_file', ...);
    ggpoly GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32)
    COPY FROM 'source' WITH (source_type='raster_file', ...);
    CREATE TABLE <table_name> (raster_lon DOUBLE, raster_lat DOUBLE, <expected_band_columns>);
    
    COPY <table_name> FROM 'source' WITH (source_type='raster_file', ...);
    COPY <table_name> FROM '<select_query>' WITH (source_type = 'odbc', ...);
    COPY example_table
      FROM 'SELECT * FROM remote_postgres_table WHERE event_timestamp > ''2020-01-01'';'
      WITH 
        (source_type = 'odbc', 
         sql_order_by = 'event_timestamp',
         data_source_name = 'postgres_db_1',
         username = 'my_username',
         password = 'my_password');
    COPY example_table
      FROM 'SELECT * FROM remote_postgres_table WHERE event_timestamp > ''2020-01-01'';'
      WITH 
        (source_type = 'odbc',
         sql_order_by = 'event_timestamp',
         connection_string = 'Driver=PostgreSQL;Database=my_postgres_db;Servername=my_postgres.example.com;Port=1234',
         credential_string = 'Username=my_username;Password=my_password');
    java -cp [HEAVY.AI utility jar file]:[3rd party JDBC driver]
    SQLImporter
    -u <userid> -p <password> [(--binary|--http|--https [--insecure])]
    -s <heavyai server host> -db <heavyai db> --port <heavyai server port>
    [-d <other database JDBC driver class>] -c <other database JDBC connection string>
    -su <other database user> -sp <other database user password> -ss <other database sql statement>
    -t <HEAVY.AI target table> -b <transfer buffer size> -f <table fragment size>
    [-tr] [-nprg] [-adtf] [-nlj] -i <init commands file>
    -r <arg>                               Row load limit 
    -h,--help                              Help message 
    -u,--user <arg>                        HEAVY.AI user 
    -p,--passwd <arg>                      HEAVY.AI password 
    --binary                               Use binary transport to connect to HEAVY.AI 
    --http                                 Use http transport to connect to HEAVY.AI 
    --https                                Use https transport to connect to HEAVY.AI 
    -s,--server <arg>                      HEAVY.AI Server 
    -db,--database <arg>                   HEAVY.AI Database 
    --port <arg>                           HEAVY.AI Port 
    --ca-trust-store <arg>                 CA certificate trust store 
    --ca-trust-store-passwd <arg>          CA certificate trust store password 
    --insecure <arg>                       Insecure TLS - Do not validate HEAVY.AI 
                                           server certificates 
    -d,--driver <arg>                      JDBC driver class 
    -c,--jdbcConnect <arg>                 JDBC connection string 
    -su,--sourceUser <arg>                 Source user 
    -sp,--sourcePasswd <arg>               Source password 
    -ss,--sqlStmt <arg>                    SQL Select statement 
    -t,--targetTable <arg>                 HEAVY.AI Target Table 
    -b,--bufferSize <arg>                  Transfer buffer size 
    -f,--fragmentSize <arg>                Table fragment size 
    -tr,--truncate                         Truncate table if it exists 
    -nprg,--noPolyRenderGroups             Disable render group assignment  
    -adtf,--allowDoubleToFloat             Allow narrow casting
    -nlj,--no-log-jdbc-connection-string   Omit JDBC connection string from logs   
    -i,--initializeFile <arg>              File containing init command for DB
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar 
    com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db heavyai --port 6274 
    -t mytable -su admin -sp HyperInteractive -c "jdbc:heavyai:myhost:6274:heavyai" 
    -ss "select * from mytable limit 1000000000"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:/hive-jdbc-1.2.1000.2.6.1.0-129-standalone.jar
    com.mapd.utility.SQLImporter
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password
    -c "jdbc:hive2://server_address:port_number/database_name"
    -ss "select * from source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:./GoogleBigQueryJDBC42.jar:
    ./google-oauth-client-1.22.0.jar:./google-http-client-jackson2-1.22.0.jar:./google-http-client-1.22.0.jar:./google-api-client-1.22.0.jar:
    ./google-api-services-bigquery-v2-rev355-1.22.0.jar 
    com.mapd.utility.SQLImporter
    -d com.simba.googlebigquery.jdbc42.Driver 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=project-id;OAuthType=0;
    [email protected];OAuthPvtKeyPath=/home/simba/myproject.json;"
    -ss "select * from schema.source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:/tmp/postgresql-42.2.5.jar 
    com.mapd.utility.SQLImporter 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:postgresql://127.0.0.1/postgres"
    -ss "select * from schema_name.source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:/path/sqljdbc4.jar
    com.mapd.utility.SQLImporter
    -d com.microsoft.sqlserver.jdbc.SQLServerDriver 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:sqlserver://server:port;DatabaseName=database_name"
    -ss "select top 10 * from dbo.source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:mysql/mysql-connector-java-5.1.38-bin.jar
    com.mapd.utility.SQLImporter 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:mysql://server:port/database_name"
    -ss "select * from schema_name.source_table_name"
    <data stream> | StreamInsert <table name> <database name> \
    {-u|--user} <user> {-p|--passwd} <password> [{--host} <hostname>] \
    [--port <port number>][--delim <delimiter>][--null <null string>] \
    [--line <line delimiter>][--batch <batch size>][{-t|--transform} \
    transformation ...][--retry_count <num_of_retries>] \
    [--retry_wait <wait in secs>][--print_error][--print_transform]
    cat file.tsv | /path/to/heavyai/SampleCode/StreamInsert stream_example \
    heavyai --host localhost --port 6274 -u imauser -p imapassword \
    --delim '\t' --batch 1000
    COPY <table> FROM '<S3_file_URL>' WITH ([[s3_access_key = '<key_name>', s3_secret_key = '<key_secret>',] | [s3_session_token = '<AWS_session_token>',]] s3_region = '<region>');
    COPY trips FROM 's3://heavyai-importtest-data/trip-data/trip_data_9.gz' WITH (header='true', s3_endpoint='storage.googleapis.com');
    heavysql> COPY trips FROM 's3://heavyai-s3-no-access/trip_data_9.gz';
    Exception: failed to list objects of s3 url 's3://heavyai-s3-no-access/trip_data_9.gz': AccessDenied: Access Denied
    heavysql> COPY trips FROM 's3://heavyai-s3-no-access/trip_data_9.gz' with (s3_access_key='xxxxxxxxxx',s3_secret_key='yyyyyyyyy');
    Exception: failed to list objects of s3 url 's3://heavyai-s3-no-access/trip_data_9.gz': AuthorizationHeaderMalformed: Unable to parse ExceptionName: AuthorizationHeaderMalformed Message: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-1'
    heavysql> COPY trips FROM 's3://heavyai-testdata/trip.compressed/trip_data_9.csv' with (s3_access_key='xxxxxxxx',s3_secret_key='yyyyyyyy',s3_region='us-west-1');
    Result
    Loaded: 100 recs, Rejected: 0 recs in 0.361000 secs
    heavysql> copy trips from 's3://heavyai-testdata/trip.compressed/' with (s3_access_key='xxxxxxxx',s3_secret_key='yyyyyyyy',s3_region='us-west-1');
    Result
    Loaded: 105200 recs, Rejected: 0 recs in 1.890000 secs
    heavysql> \d trips
            CREATE TABLE trips (
            medallion TEXT ENCODING DICT(32),
            hack_license TEXT ENCODING DICT(32),
            vendor_id TEXT ENCODING DICT(32),
            rate_code_id SMALLINT,
            store_and_fwd_flag TEXT ENCODING DICT(32),
            pickup_datetime TIMESTAMP,
            dropoff_datetime TIMESTAMP,
            passenger_count SMALLINT,
            trip_time_in_secs INTEGER,
            trip_distance DECIMAL(14,2),
            pickup_longitude DECIMAL(14,2),
            pickup_latitude DECIMAL(14,2),
            dropoff_longitude DECIMAL(14,2),
            dropoff_latitude DECIMAL(14,2))
    WITH (FRAGMENT_SIZE = 75000000);
    KafkaImporter <table_name> <database_name> {-u|--user} <user_name> \
    {-p|--passwd} <user_password> [{--host} <hostname>] \
    [--port <HeavyDB_port>] [--http] [--https] [--skip-verify] \
    [--ca-cert <path>] [--delim <delimiter>] [--batch <batch_size>] \
    [{-t|--transform} transformation ...] [--retry_count <retry_number>] \
    [--retry_wait <delay_in_seconds>] --null <null_value_string> [--quoted true|false] \
    [--line <line_delimiter>] --brokers=<broker_name:broker_port> \ 
    --group-id=<kafka_group_id> --topic=<topic_type> [--print_error] [--print_transform]
    cat tweets.tsv | ./KafkaImporter tweets_small heavyai -u imauser -p imapassword --delim '\t' --batch 100000 --retry_count 360 --retry_wait 10 --null null --port 9999 --brokers=localhost:9092 --group-id=testImport1 --topic=tweet
    cat tweets.tsv | ./KafkaImporter tweets_small heavyai
    -u imauser
    -p imapassword
    --delim '\t'
    --batch 100000
    --retry_count 360
    --retry_wait 10
    --null null
    --port 9999
    --brokers=localhost:9092
    --group-id=testImport1
    --topic=tweet
    StreamImporter <table_name> <database_name> {-u|--user} <user_name> \
    {-p|--passwd} <user_password> [{--host} <hostname>] [--port <HeavyDB_port>] \
    [--http] [--https] [--skip-verify] [--ca-cert <path>] [--delim <delimiter>] \
    [--null <null string>] [--line <line delimiter>] [--quoted <boolean>] \
    [--batch <batch_size>] [{-t|--transform} transformation ...] \
    [--retry_count <number_of_retries>] [--retry_wait <delay_in_seconds>] \
    [--print_error] [--print_transform]
    cat tweets.tsv | ./StreamImporter tweets_small heavyai
    -u imauser
    -p imapassword
    --delim '\t'
    --batch 100000
    --retry_count 360
    --retry_wait 10
    --null null
    --port 9999
    sqoop-export --table iAmATable \
    --export-dir /user/cloudera/ \
    --connect "jdbc:heavyai:000.000.000.0:6274:heavyai" \
    --driver com.heavyai.jdbc.HeavyaiDriver \
    --username imauser \
    --password imapassword \
    --direct \
    --batch
    select t1.uniqueCol from MYTABLE t1 join TEMPTABLE t2 on t1.uniqueCol = t2.uniqueCol;
  • IAM Role > Select desired IAM Role > Review and Launch.
  • Review other options > Launch.

  • Restart the EC2 Instance.
    raster_point_type='<type>'

    smallint or int - Create integer columns of names raster_x and raster_y and fill with the raw pixel coordinates from the file.

    float or double - Create floating-point columns of names raster_x and raster_y (or raster_lon and raster_lat) and fill with file-space or world-space projected coordinates.

    point - Create a POINT column of name raster_point and fill with file-space or world-space projected coordinates.
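    As an illustration of these values, a raster import that stores each pixel's coordinates as a single geographic POINT column might look like the following sketch (the table name `elevation` and file path are hypothetical):

    ```sql
    -- Hypothetical table and file names; raster_point_type='point' requests
    -- a single POINT column named raster_point instead of separate x/y columns.
    COPY elevation FROM '/data/dem.tif'
      WITH (source_type='raster_file', raster_point_type='point');
    ```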

    GEOS library