v8.3.0

Contents

Overview
Installation and Configuration
Loading and Exporting Data
SQL
System Requirements

Installing on Rocky Linux / RHEL

In this section you will find a recipe to install the HEAVY.AI platform on Red Hat and derivatives such as Rocky Linux.

Installing on Ubuntu

In this section, you will find recipes to install the HEAVY.AI platform and NVIDIA drivers using a package manager such as apt, or from a tarball.

Command Line

Security

Configuration Parameters

Installing on Docker

Installing HEAVY.AI on Docker

In this section you will find recipes to install the HEAVY.AI platform using Docker. We provide instructions for installing Docker and HEAVY.AI on an Ubuntu host machine. However, advanced users can install the CPU version of HEAVY.AI (EE/Free or Open Source) on any host system running Docker, or the GPU version (EE/Free or Open Source) on any Linux-based system with supported NVIDIA drivers and nvidia-docker packages (see Software Requirements).

Services and Utilities

Data Definition (DDL)

Data Manipulation (DML)

Configuration Parameters for HeavyIQ

Following are the configuration parameters for runtime settings for HeavyIQ.

Flag | Description | Default | Example
heavydb_host | The hostname/IP of the HeavyDB instance (optional) | localhost | http://heavydb.example.com

Supported Data Sources

SQL Capabilities

ALTER SYSTEM CLEAR

Clear CPU, GPU, or RENDER memory. Available to superusers only.

ALTER SYSTEM CLEAR (CPU|GPU|RENDER) MEMORY

Examples

ALTER SYSTEM CLEAR CPU MEMORY
ALTER SYSTEM CLEAR GPU MEMORY
ALTER SYSTEM CLEAR RENDER MEMORY

Note: Generally, the server handles memory management, and you do not need to use this command. If you are having unexpected memory issues, try clearing the memory to see if performance improves.

Software Requirements

  • Operating Systems

Supported operating systems and minimum NVIDIA driver versions for each HEAVY.AI version are listed in the tables later in this section.

  • Additional Components

    • OpenJDK version 8 or higher

    • EPEL

    • Up to date Vulkan drivers

  • Supported web browsers (Enterprise Edition, Immerse). Latest stable release of:

    • Chrome

Installation

You can download the latest version of HEAVY.AI for your preferred platform from our support site or by using the links provided in the instructions.

The CPU (no GPUs) install does not support backend rendering. For example, Pointmap and Scatterplot charts are not available. The GPU install supports all chart types.

The Open Source options do not require a license, and omit HeavyImmerse.

  • Docker

DELETE

Deletes rows that satisfy the WHERE clause from the specified table. If the WHERE clause is absent, all rows in the table are deleted, resulting in a valid but empty table.

DELETE FROM table_name [ * ] [ [ AS ] alias ]
[ WHERE condition ]

Cross-Database Queries

In Release 6.4 and higher, you can run DELETE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.

To execute queries against another database, you must have ACCESS privilege on that database, as well as DELETE privilege.

Example

Delete rows from a table in the my_other_db database:

Overview

HEAVY.AI is an analytics platform designed to handle very large datasets. It leverages the processing power of GPUs alongside traditional CPUs to achieve very high performance. HEAVY.AI combines an open-source SQL engine (HeavyDB), server-side rendering (HeavyRender), and web-based data visualization (Heavy Immerse) to provide a comprehensive platform for data analysis.

HeavyDB

The foundation of the platform is HeavyDB, an open-source, GPU-accelerated database. HeavyDB harnesses GPU processing power and returns SQL query results in milliseconds, even on tables with billions of rows. HeavyDB delivers high performance with rapid query compilation, query vectorization, and advanced memory management.

Free Version

HEAVY.AI Free is a full-featured version of the HEAVY.AI platform available at no cost for non-hosted commercial use.

HEAVY.AI Free includes access to the following:

  • Up to 32 GB of RAM

  • Support for 1 GPU

Uninstalling

This is a recipe to permanently remove the HEAVY.AI software, services, and data from your system.

Uninstalling HEAVY.AI from Docker

To uninstall HEAVY.AI in Docker, stop and delete the current Docker container.

In a terminal window, get the Docker container ID:

Ports

HEAVY.AI uses the following ports.

Port
Service
Use

Upgrading

In this section, you will find recipes to upgrade from OmniSci to the HEAVY.AI platform and to upgrade between versions of the HEAVY.AI platform.

Supported Upgrade Path

The following table shows the steps needed to move from one version to a later one.

Initial Version

Views

DDL - Views

A view is a virtual table based on the result set of a SQL statement. It derives its fields from a SELECT statement. You can do anything with a HEAVY.AI view query that you can do in a non-view HEAVY.AI query.

Nomenclature Constraints

View object names must use the NAME format, described in regular-expression notation as:

LIKELY/UNLIKELY

Usage Notes

KILL QUERY

Interrupt a queued query. Specify the query by using its session ID.

To see the queries in the queue, use the SHOW QUERIES command.

To interrupt the last query in the list (ID 947-ooNP):

Showing the queries again indicates that 947-ooNP has been deleted:


Comment

Adds a comment or removes an existing comment for an existing table or column object.

COMMENT

Create or remove a comment for a TABLE or COLUMN object of name object_name. The comment must be a string literal or NULL.

Logical Operators and Conditional and Subquery Expressions

Logical Operator Support

Operator
Description

UPDATE

Changes the values of the specified columns based on the assign argument (identifier=expression) in all rows that satisfy the condition in the WHERE clause.

Example

Policies

You can use policies to provide row-level security (RLS) in HEAVY.AI.

CREATE POLICY

Create an RLS policy for a user or role (<name>); admin rights are required. All queries on the table for the user or role are automatically filtered to include only rows where the column contains any one of the values from the VALUES clause.

RLS filtering works similarly to the way a WHERE column = value clause, appended to every query or subquery on the table, would work.

ALTER SESSION SET

Change a parameter value for the current session.

Parameter name
Values

Up to 3 active users

  • Advanced Analytics

  • Rendering Engine

  • Immerse Dashboards

  • HeavyDB

  • Sharing

  • Support access through the HEAVY.AI Community

  • To get started with HEAVY.AI Free:

    1. Go to Get Started with HEAVY.AI, and in the HEAVY.AI Free section, click Free License.

    2. On the Get HEAVY.AI Free page, enter the requested information, agree to the HEAVY.AI EULA and the HeavyIQ EULA Addendum, then click I Agree.

    3. Open the Your HEAVY.AI Free License Key email to view and download the free edition license key. You will need this license key to run HEAVY.AI after you install it.

    4. In the Download HEAVY.AI section of the Free License Key email, click See Install Options to select the best version of HEAVY.AI for your hardware and software configuration. Follow the instructions for the download or cloud version you choose.

    5. Under the Install / Configure section of the Free License Key email, click the Installation Guide link and follow our documentation to install HEAVY.AI and get your environment up and running.

    Add Users

    You can create additional HEAVY.AI users to collaborate with.

    1. Connect to Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Open the SQL Editor.

    3. Use the CREATE USER command to create a new user. For information on syntax and options, see CREATE USER.
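    As a sketch, a minimal statement might look like the following; the user name and password are placeholders, and the is_super option is not required:

    ```sql
    -- Placeholder user name and password; see CREATE USER for all options
    CREATE USER collaborator (password = 'changeme', is_super = 'false');
    ```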

    wget or curl

  • Kernel headers

  • Kernel development packages

  • log4j 2.15.0 or higher

  • NVIDIA hardware and software (for GPU installs only)

    • Hardware: Ampere, Turing, Volta, or Pascal series GPU cards. HEAVY.AI recommends that each GPU card in a server or distributed environment be of the same series.

    • NOTE: The recommended NVIDIA driver for all supported HEAVY.AI versions is 535.

    • Software (Run nvidia-smi to determine the currently running driver):

  • Firefox

  • Safari version 15.x or higher

HEAVY Version | Bare Metal | Hosts / Docker | Information
8.x | Ubuntu 22.04 | Redhat, CentOS, Ubuntu / Ubuntu 22.04 | Ubuntu 22.04 - EOS: June 2027 (https://wiki.ubuntu.com/Releases)
7.x | Ubuntu 20.04 | Redhat, CentOS, Ubuntu / Ubuntu 20.04 | Ubuntu 20.04 - EOS: April 2025 (https://wiki.ubuntu.com/Releases); CentOS 7 - EOL: June 30, 2024 (https://www.redhat.com/en/topics/linux/centos-linux-eol)
6.x | Ubuntu 18.04, CentOS 7 | Redhat, CentOS, Ubuntu / Ubuntu 18.04 | Ubuntu 18.04 - EOS: June 2023 (https://wiki.ubuntu.com/Releases); CentOS 8 - EOL: December 31, 2021 (https://www.redhat.com/en/events/webinar/centos-linux-reaching-its-end-life-now-what)

HEAVY.AI Version | Minimum NVIDIA Driver
8.x | 535
7.x | 520

  • Ubuntu
  • Rocky Linux / RHEL
  • AWS
  • GCP
  • Azure
  • Jupyter
  • Upgrading
  • Uninstalling

Port | Service | Use
6273 | heavy_web_server | Used to access Heavy Immerse.
6274 | heavydb tcp | Used by connectors (heavyai, omnisql, odbc, and jdbc) to access the more efficient Thrift API.
6276 | heavy_web_server | Used to access the HTTP/JSON Thrift API.
6278 | heavydb http | Used to directly access the HTTP/binary Thrift API, without having to proxy through heavy_web_server. Recommended for debugging use only.

NOT | Negates value
OR | Logical OR

Conditional Expression Support

Expression | Description
CASE WHEN condition THEN result ELSE default END | Case operator
COALESCE(val1, val2, ..) | Returns the first non-null value in the list

Note: Geospatial and array column projections are not supported in the COALESCE function and CASE expressions.

Subquery Expression Support

Expression | Description
expr IN (subquery or list of values) | Evaluates whether expr equals any value of the IN list.
expr NOT IN (subquery or list of values) | Evaluates whether expr does not equal any value of the IN list.

    Usage Notes

    • You can use a subquery anywhere an expression can be used, subject to any runtime constraints of that expression. For example, a subquery in a CASE statement must return exactly one row, but a subquery can return multiple values to an IN expression.

    • You can use a subquery anywhere a table is allowed (for example, FROM subquery), using aliases to name any reference to the table and columns returned by the subquery.
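    To sketch the second point, a subquery can take the place of a table in FROM, with an alias naming the derived table; this example assumes the ratings table used in the CREATE VIEW example elsewhere in this guide:

    ```sql
    -- Derived-table subquery in FROM, aliased as "t"
    SELECT t.movieId, t.avg_rating
    FROM (SELECT movieId, AVG(rating) AS avg_rating
          FROM ratings
          GROUP BY movieId) AS t
    WHERE t.avg_rating > 4;
    ```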

AND | Logical AND

    DELETE FROM my_other_db.customers WHERE id > 100;
    • KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.

    • Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.

    • To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt to TRUE. (It is enabled by default.)

    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU
    947-ooNP        |RUNNING_IMPORTER    |0          |2021-08-03 ...|IMPORT_GEO_TABLE|Rio       |tcp:::ffff:127.0.0.1:47314|omnisci|CPU
    SHOW QUERIES
    You should see an output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c:
Note: To see all containers, both running and stopped, use the following command:

    Stop the HEAVY.AI Docker container. For example:

    Remove the HEAVY.AI Docker container to save disk space. For example:

Uninstalling HEAVY.AI on Redhat and Ubuntu

To uninstall an existing system installed with Yum, Apt, or a tarball, connect as the user that runs the platform (typically heavyai).

Disable and stop all HEAVY.AI services.

Remove the HEAVY.AI installation files ($HEAVYAI_PATH defaults to /opt/heavyai).

Delete the configuration files and storage by removing the $HEAVYAI_BASE directory (defaults to /var/lib/heavyai).

Permanently remove the service configuration.

    CREATE VIEW

    Creates a view based on a SQL statement.

    Example

    You can describe the view as you would a table.

    You can query the view as you would a table.

    DROP VIEW

    Removes a view created by the CREATE VIEW statement. The view definition is removed from the database schema, but no actual data in the underlying base tables is modified.

    Example

    SQL normally assumes that terms in the WHERE clause that cannot be used by indices are usually true. If this assumption is incorrect, it could lead to a suboptimal query plan. Use the LIKELY(X) and UNLIKELY(X) SQL functions to provide hints to the query planner about clause terms that are probably not true, which helps the query planner to select the best possible plan.

    Use LIKELY/UNLIKELY to optimize evaluation of OR/AND logical expressions. LIKELY/UNLIKELY causes the left side of an expression to be evaluated first. This allows the right side of the query to be skipped when possible. For example, in the clause UNLIKELY(A) AND B, if A evaluates to FALSE, B does not need to be evaluated.

    Consider the following:

    If x is one of the values 7, 8, 9, or 10, the filter y > 42 is applied. If x is not one of those values, the filter y > 42 is not applied.

Expression | Description
LIKELY(X) | Provides a hint to the query planner that argument X is a Boolean value that is usually true. The planner can prioritize filters on the value X earlier in the execution cycle and return results more efficiently.
UNLIKELY(X) | Provides a hint to the query planner that argument X is a Boolean value that is usually not true. The planner can prioritize filters on the value X later in the execution cycle and return results more efficiently.

If NULL, the comment is removed. Only superusers or owners of the object can modify comments on the object.

    Column and table comments can be viewed either in the information_schema system tables, or in the result of the SHOW CREATE TABLE command run on the relevant table.

Note: Currently, comments are not supported with the CREATE TABLE command; COMMENT ON is the canonical means to set or unset comments.

    Examples

    1. Create a table and add comments to it.

    Note: When specifying the name of the COLUMN object, it must be of the form <TABLE>.<COLUMN> to uniquely identify it.

    2. Show the comments and the DDL of the table.

    Note: Currently, COMMENT ON is supported only on tables and their columns. Other objects, such as views, are not currently supported.

    3. View the table and column comments in the respective system tables.

    WHERE column = value
    clause, appended to every query or subquery on the table, would work. If policies on multiple columns in the same table are defined for a user or role, then a row is visible to that user or role if any one or more of the policies matches that row.

    DROP POLICY

    Drop an RLS policy for a user or role (<name>); admin rights are required. All values specified for the column by the policy are dropped. Effective values from another policy on an inherited role are not dropped.

    SHOW POLICIES

    Displays a list of all RLS policies that exist for a user or role. If EFFECTIVE is used, the list also includes any policies that exist for all roles that apply to the requested user or role.

    CREATE POLICY ON COLUMN table.column TO <name> VALUES ('string', 123, ...);
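    For example, a policy like the following (the table, column, role, and values here are hypothetical) would limit the sales_west role to rows whose region column contains one of the listed values:

    ```sql
    -- Hypothetical table, column, role, and values
    CREATE POLICY ON COLUMN customers.region TO sales_west VALUES ('US-West', 'CA-West');
    ```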
    Alter Session Examples

    CURRENT_DATABASE

    Switch to another database without needing to log in again.

    Your session will silently switch to the requested database.

    The database exists, but the user does not have access to it:

    The database does not exist:

    EXECUTOR_DEVICE

    Force the session to run the subsequent SQL commands in CPU mode:

    Switch the session back to GPU mode:

    EXECUTOR_DEVICE

    CPU - Set the session to CPU execution mode:

    ALTER SESSION SET EXECUTOR_DEVICE='CPU';

    GPU - Set the session to GPU execution mode:

    ALTER SESSION SET EXECUTOR_DEVICE='GPU';

    NOTE: These parameter values have the same effect as the \cpu and \gpu commands in heavysql, but can be used with any tool capable of running SQL commands.

    CURRENT_DATABASE

    Can be set to any string value.

    If the value is a valid database name and the current user has access to it, the session switches to the new database. If the user does not have access, or the database does not exist, an error is returned and the session falls back to the starting database.

    kill query '947-ooNP'
    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU
    sudo docker container ps -a
    sudo yum remove heavyai.x86_64
    sudo apt remove heavyai
    sudo rm -r $(readlink $HEAVYAI_PATH) $HEAVYAI_PATH
    sudo docker container ps --format "{{.Id}} {{.Image}}" \
    -f status=running | grep heavyai\/
    9e01e520c30c omnisci/omnisci-ee-gpu
    sudo docker container stop 9e01e520c30c
    sudo docker container rm 9e01e520c30c
    sudo systemctl disable heavy_web_server --now
    sudo systemctl disable heavydb --now
    sudo rm  -r $HEAVYAI_BASE
    sudo rm /lib/systemd/heavydb*.service
    sudo rm /lib/systemd/heavy_web_server*.service
    sudo systemctl daemon-reload
    sudo systemctl reset-failed
    [A-Za-z_][A-Za-z0-9\$_]*
    CREATE VIEW view_movies
    AS SELECT movies.movieId, movies.title, movies.genres, avg(ratings.rating)
    FROM ratings
    JOIN movies on ratings.movieId=movies.movieId
    GROUP BY movies.title, movies.movieId, movies.genres;
    \d view_movies
    VIEW defined AS: SELECT  movies.movieId, movies.title, movies.genres,
    avg(ratings.rating) FROM ratings JOIN movies ON ratings.movieId=movies.movieId
    GROUP BY movies.title, movies.movieId, movies.genres
    Column types:
        movieId INTEGER,
        title TEXT ENCODING DICT(32),
        genres TEXT ENCODING DICT(32),
        EXPR$3 DOUBLE
    SELECT title, EXPR$3 from view_movies where movieId=260;
    Star Wars: Episode IV - A New Hope (1977)|4.048937
    DROP VIEW IF EXISTS v_reviews;
    SELECT COUNT(*) FROM test WHERE UNLIKELY(x IN (7, 8, 9, 10)) AND y > 42;
    COMMENT ON (TABLE | COLUMN) <object_name> IS (<string_literal> | NULL);
    CREATE TABLE employees (id INT, salary BIGINT);
    -- Add a comment to the 'employees' table
    COMMENT ON TABLE employees IS 'This table stores employee information';
    -- Add a comment to the 'salary' column
    COMMENT ON COLUMN employees.salary IS 'Stores the salary of the employee';
    SHOW CREATE TABLE employees;
    
    CREATE TABLE employees /* This table stores employee information */ (
      id INTEGER,
      salary BIGINT /* Stores the salary of the employee */);
    1 rows returned.
    -- Connect to information_schema database
    \c information_schema admin XXXXXXXX
    
    -- Select subset of columns from the tables system table
    SELECT table_id,table_name,"comment" FROM tables where table_name = 'employees';
    
    -- Returns one result for the table comment
    table_id|table_name|comment
    5|employees|This table stores employee information
    1 rows returned.
    
    -- Select subset of columns from the columns system table
    SELECT table_id,table_name,column_id,column_name,"comment" FROM columns where table_name = 'employees';
    
    -- Returns two results, one of the columns has no comment.
    table_id|table_name|column_id|column_name|comment
    5|employees|1|id|NULL
    5|employees|2|salary|Stores the salary of the employee
    2 rows returned.
    DROP POLICY ON COLUMN table.column FROM <name>;
    SHOW [EFFECTIVE] POLICIES <name>;
    ALTER SESSION SET <parameter_name>=<parameter_value>
    ALTER SESSION SET CURRENT_DATABASE='owned_database'; 
    ALTER SESSION SET CURRENT_DATABASE='information_schema';
    TException - service has thrown: TDBException(error_msg=Unauthorized access: 
    user test is not allowed to access database information_schema.)
    ALTER SESSION SET CURRENT_DATABASE='not_existent_db'; 
    TException - service has thrown: TDBException(error_msg=Database name 
    not_existent_db does not exist.)
    ALTER SESSION SET EXECUTOR_DEVICE='CPU';
    ALTER SESSION SET EXECUTOR_DEVICE='GPU';

    Native SQL

    With native SQL support, HeavyDB returns query results hundreds of times faster than CPU-only analytical database platforms. Use your existing SQL knowledge to query data. You can use the standalone SQL engine with the command line, or the SQL editor that is part of the Heavy Immerse visual analytics interface. Your SQL query results can output to Heavy Immerse or to third-party software such as Birst, Power BI, Qlik, or Tableau.

    Geospatial Data

    HeavyDB can store and query data using native Open Geospatial Consortium (OGC) types, including POINT, LINESTRING, POLYGON, and MULTIPOLYGON. With geo type support, you can query geo data at scale using special geospatial functions. Using the power of GPU processing, you can quickly and interactively calculate distances between two points and intersections between objects.

    Open Source

    HeavyDB is open source and encourages contribution and innovation from a global community of users. It is available on GitHub under the Apache 2.0 license, along with components like a Python interface (heavyai) and JavaScript infrastructure (mapd-connector, mapd-charting), making HEAVY.AI the leader in open-source analytics.

    HeavyRender

    HeavyRender works on the server side, using GPU buffer caching, graphics APIs, and a Vega-based interface to generate custom pointmaps, heatmaps, choropleths, scatterplots, and other visualizations. HEAVY.AI enables data exploration by creating and sending lightweight PNG images to the web browser, avoiding high-volume data transfers. Fast SQL queries make metadata in the visualizations appear as if the data exists on the browser side.

    Network bandwidth is a bottleneck for complex chart data, so HEAVY.AI uses in-situ rendering of on-GPU query results to accelerate visual rendering. This differentiates HEAVY.AI from systems that execute queries quickly but then transfer the results to the client for rendering, which slows performance.

    Geospatial Analysis

    Efficient geospatial analysis requires fast data-rendering of complex shapes on a map. HEAVY.AI can import and display millions of lines or polygons on a geo chart with minimal lag time. Server-side rendering technology prevents slowdowns associated with transferring data over the network to the client. You can select location shapes down to a local level, like census tracts or building footprints, and cross-filter interactively.

    Visualize with Vega

    Complex server-side visualizations are specified using an adaptation of the Vega Visualization Grammar. Heavy Immerse generates Vega rendering specifications behind the scenes; however, you can also generate custom visualizations using the same API. This customizable visualization system combines the agility of a lightweight frontend with the power of a GPU engine.

    Heavy Immerse

    Heavy Immerse is a web-based data visualization interface that uses HeavyDB and HeavyRender for visual interaction. Intuitive and easy to use, Heavy Immerse provides standard visualizations, such as line, bar, and pie charts, as well as complex data visualizations, such as geo point maps, geo heat maps, choropleths, and scatter plots. Heavy Immerse provides quick insights and makes them easy to recognize.

    Dashboards

    Use dashboards to create and organize your charts. Dashboards automatically cross-filter when interacting with data, and refresh with zero latency. You can create dashboards and interact with conventional charts and data tables, as well as scatterplots and geo charts created by HeavyRender. You can also create your own queries in the SQL editor.

    Charts

    Heavy Immerse lets you create a variety of different chart types. You can display pointmaps, heatmaps, and choropleths alongside non-geographic charts, graphs, and tables. When you zoom into any map, visualizations refresh immediately to show data filtered by that geographic context. Multiple sources of geographic data can be rendered as different layers on the same map, making it easy to find the spatial relationships between them.

    Create geo charts with multiple layers of data to visualize the relationship between factors within a geographic area. Each layer represents a distinct metric overlaid on the same map. Those different metrics can come from the same or a different underlying dataset. You can manipulate the layers in various ways, including reorder, show or hide, adjust opacity, or add or remove legends.

    Use Multiple Sources

    Heavy Immerse can visually display dozens of datasets in the same dashboard, allowing you to find multi-factor relationships that you might not otherwise consider. Each chart (or groups of charts) in a dashboard can point to a different table, and filters are applied at the dataset level. Multisource dashboards make it easier to quickly compare across datasets, without merging the underlying tables.

    Streaming Data

    Heavy Immerse is ideal for high-velocity data that is constantly streaming; for example, sensor, clickstream, telematics, or network data. You can see the latest data to spot anomalies and trend variances rapidly. Immerse auto-refresh automatically updates dashboards at flexible intervals that you can tailor to your use case.

    Ready to Get Started?

    I want to...

      • Install HEAVY.AI
      • Upgrade to the latest version
      • Configure HEAVY.AI

    [Figure: HEAVY.AI Platform Architecture Diagram, showing HeavyDB, HeavyRender, and Heavy Immerse]
    Initial Version | Final Version | Upgrade Steps
    OmniSci less than 5.5 | HEAVY.AI 7.0 | Upgrade to 5.5 --> 6.0 --> 7.0
    OmniSci 5.5 - 5.10 | HEAVY.AI 7.0 | Upgrade to 6.0 --> 7.0
    HEAVY.AI 6.0 | HEAVY.AI 7.0 | Upgrade to 7.0

    Note: Versions 5.x and 6.0.0 are no longer supported; use them only as needed to facilitate an upgrade to a supported version.

    Example: if you are running an OmniSci version older than 5.5, you must first upgrade to 5.5, then to 6.0, and then to 7.0. If you are running 6.0 - 6.4, you can upgrade directly to 7.0 in a single step.

    Note: Currently, HEAVY.AI does not support updating a geo column type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON) in a table.

    Update Via Subquery

    You can update a table via subquery, which allows you to update based on calculations performed on another table.

    Examples

    UPDATE test_facts SET lookup_id = (SELECT SAMPLE(test_lookup.id
    
    UPDATE test_facts SET val = val+1, lookup_id = (SELECT SAMPLE
    
    UPDATE test_facts SET lookup_id = (SELECT SAMPLE(test_lookup.id
    

    Cross-Database Queries

    In Release 6.4 and higher, you can run UPDATE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.

    To execute queries against another database, you must have ACCESS privilege on that database, as well as UPDATE privilege.

    Example

    Update a row in a table in the my_other_db database:
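    A sketch of such a statement, reusing the my_other_db.customers table from the DELETE example (the column and values are hypothetical):

    ```sql
    -- Hypothetical column and values
    UPDATE my_other_db.customers SET status = 'inactive' WHERE id = 100;
    ```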

    Overview

    HEAVY.AI has minimal configuration requirements with a number of additional configuration options. This topic describes the required and optional configuration changes you can use in your HEAVY.AI instance.

    Important: In release 4.5.0 and higher, HEAVY.AI requires that all configuration flags used at startup match a flag on the HEAVY.AI server. If any flag is misspelled or invalid, the server does not start. This helps ensure that all settings are intentional and do not have an unexpected impact on performance or data integrity.

    Storage Directory

    Before starting the HEAVY.AI server, you must initialize the persistent storage directory. To do so, create an empty directory at the desired path, such as /var/lib/heavyai.

    1. Create the environment variable $HEAVYAI_BASE.

    2. Then, change the owner of the directory to the user that the server will run as ($HEAVYAI_USER):

    where $HEAVYAI_USER is the system user account that the server runs as, such as heavyai, and $HEAVYAI_BASE is the path to the parent of the HEAVY.AI server storage directory.

    3. Run $HEAVYAI_PATH/bin/initheavy with the storage directory path as the argument:

    hashtag
    Configuring a Custom Heavy Immerse Subdirectory

    Immerse serves the application from the root path (/) by default. To serve the application from a sub-path, you must modify the $HEAVYAI_PATH/frontend/app-config.js file to change the IMMERSE_PATH_PREFIX value. The Heavy Immerse path must start with a forward slash (/).
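As an illustrative sketch only (the exact contents of app-config.js vary by release, and the /immerse sub-path here is hypothetical), the change amounts to editing the prefix value so that it begins with a forward slash:

```javascript
// $HEAVYAI_PATH/frontend/app-config.js (fragment; other settings omitted)
// Hypothetical sub-path; the value must begin with "/"
IMMERSE_PATH_PREFIX: "/immerse",
```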

    hashtag
    Configuration File

    The configuration file stores runtime options for your HEAVY.AI servers. You can use the file to change the default behavior.

    The heavy.conf file is stored in the $HEAVYAI_BASE directory. The configuration settings are picked up automatically by the sudo systemctl start heavydb and sudo systemctl start heavy_web_server commands.

    Set the flags in the configuration file using the format <flag> = <value>. Strings must be enclosed in quotes.

    The following is a sample configuration file. The entry for data path is a string and must be in quotes. The last entry in the first section, for null-div-by-zero, is the Boolean value true and does not require quotes.

    To comment out a line in heavy.conf, prepend the line with the pound sign (#) character.

    triangle-exclamation

For encrypted backend connections, if you do not use a configuration file to start the database, Calcite expects passwords to be supplied on the command line, where they are visible in the process table. If a configuration file is supplied, passwords must be supplied in the file; if they are not, Calcite will fail.

    CUDA Compatibility Drivers

    circle-exclamation

    This procedure is considered experimental.

    In some situations, you might not be able to upgrade NVIDIA CUDA drivers on a regular basis. To work around this issue, NVIDIA provides compatibility drivers that allow users to use newer features without requiring a full upgrade. For information about compatibility drivers, see https://docs.nvidia.com/deploy/cuda-compatibility/index.htmlarrow-up-right.

    hashtag
    Installing the Drivers

    Use the following commands to install the CUDA 11 compatibility drivers on Ubuntu:

    After the last nvidia-smi, ensure that CUDA shows the correct version.

    circle-info

    The driver version will still show as the old version.

    hashtag
    Updating systemd Files

    After installing the drivers, update the systemd files in /lib/systemd/system/heavydb.service.

In the [Service] section, add or update the Environment property.

The file should look like this:

Then force a reload of the systemd configuration:

    Using Utilities

    HeavyDB includes the utilities initdb for database initialization and generate_cert for generating certificates and private keys for an HTTPS server.

    hashtag
    initdb

    Before using HeavyDB, initialize the data directory using initdb:

This creates the following subdirectories:

    • catalogs: Stores HeavyDB catalogs

    • data: Stores HeavyDB data

    • log

    The -f flag forces initdb to overwrite existing data and catalogs in the specified directory.

    By default, initdb adds a sample table of geospatial data. Use the --skip-geo flag if you prefer not to load sample geospatial data.

    hashtag
    generate_cert

    This command generates certificates and private keys for an HTTPS server. The options are:

    • [{-ca} <bool>]: Whether this certificate should be its own Certificate Authority. The default is false.

    • [{-duration} <duration>]: Duration that the certificate is valid for. The default is 8760h0m0s.

    Kafka

    Apache Kafkaarrow-up-right is a distributed streaming platform. It allows you to create publishers, which create data streams, and consumers, which subscribe to and ingest the data streams produced by publishers.

You can use the HeavyDB KafkaImporterarrow-up-right C++ program to consume a topic created by running Kafka shell scripts from the command line. Follow the procedure below to use a Kafka producer to send data, and a Kafka consumer to store the data, in HeavyDB.

    circle-info

    This example assumes you have already installed and configured Apache Kafka. See the Kafka websitearrow-up-right.

    hashtag
    Creating a Topic

    Create a sample topic for your Kafka producer.

    1. Run the kafka-topics.sh script with the following arguments:

    2. Create a file named myfile that consists of comma-separated data. For example:

3. Use heavysql to create a table to store the stream.

    hashtag
    Using the Producer

    Load your file into the Kafka producer.

    1. Create and start a producer using the following command.

    hashtag
    Using the Consumer

    Load the data to HeavyDB using the Kafka console consumer and the KafkaImporter program.

    1. Pull the data from Kafka into the KafkaImporter program.

    2. Verify that the data arrived using heavysql.

    Getting Started on Azure

    Getting Started with HEAVY.AI on Microsoft Azure

    Follow these instructions to get started with HEAVY.AI on Microsoft Azure.

    hashtag
    Prerequisites

You must have a Microsoft Azure account. If you do not have an account, go to the Microsoft Azure home pagearrow-up-right to sign up for one.

    hashtag
    Configure Your HEAVY.AI Instance

    To launch HEAVY.AI on Microsoft Azure, you configure a GPU-enabled instance.

1) Log in to your Microsoft Azure portal.

    2) On the left side menu, create a Resource group, or use one that your organization has created.

    3) On the left side menu, click Virtual machines, and then click Add.

    4) Create your virtual machine:

    • On the Basics tab:

      • In Project Details, specify the Resource group.

      • Specify the Instance Details:

    5) Click Review + create. Azure reviews your entries, creates the required services, deploys them, and starts the VM.

    6) Once the VM is running, select the VM you just created and click the Networking tab.

    7) Click the Add inbound button and configure security rules to allow any source, any destination, and destination port 6273 so you can access Heavy Immerse from a browser on that port. Consider renaming the rule to 6273-Immerse or something similar so that the default name makes sense.

    8) Click Add and verify that your new rule appears.

Azure-specific configuration is complete. Now, follow the standard installation instructions for your Linux distribution and installation method.

    INSERT

    Use INSERT for both single- and multi-row ad hoc inserts. (When inserting many rows, use the more efficient COPY command.)

    hashtag
    Examples

    You can also insert into a table as SELECT, as shown in the following examples:

    INSERT INTO destination_table SELECT * FROM source_table;
    INSERT INTO destination_table (id, name, age, gender) SELECT * FROM source_table;

    You can insert array literals into array columns. The inserts in the following example each have three array values, and demonstrate how you can:

    • Create a table with variable-length and fixed-length array columns.

• Insert NULL arrays into these columns.

• Specify and insert array literals using {...} or ARRAY[...] syntax.

    hashtag
    Default Values

If you create a table with a column that has a default value, or alter a table to add a column with a default value, using the INSERT command creates a record that includes the default value if it is omitted from the INSERT. For example, assume a table created as follows:

    If you omit the name column from an INSERT or INSERT FROM SELECT statement, the missing value for column name is set to 'John Doe'.

    INSERT INTO tbl (id, age) VALUES (1, 36); creates the record 1|'John Doe'|36 .

    INSERT INTO tbl (id, age) SELECT id, age FROM old_tbl; also sets all the name values to John Doe .

    generate_random_strings

    Generates random string data.

SELECT * FROM TABLE(generate_random_strings(<num_strings>, <string_length>));

    hashtag
    Input Arguments

    Parameter
    Description
    Data Type

    hashtag
    Output Columns

    Name
    Description
    Data Type

    Example

    Getting Started on GCP

    Getting Started with HEAVY.AI on Google Cloud Platform

    Follow these instructions to get started with HEAVY.AI on Google Cloud Platform (GCP).

    hashtag
    Prerequisites

You must have a Google Cloud Platform account. If you do not have an account, follow these instructionsarrow-up-right to sign up for one.

    To launch HEAVY.AI on Google Cloud Platform, you select and configure an instance.

    Encrypted Credentials in Custom Applications

    HEAVY.AI can accept a set of encrypted credentials for secure authentication of a custom application. This topic provides a method for providing an encryption key to generate encrypted credentials and configuration options for enabling decryption of those encrypted credentials.

    hashtag
    Generating an Encryption Key

Generate a 128- or 256-bit encryption key and save it to a file. You can use https://acte.ltd/utils/randomkeygenarrow-up-right to generate a suitable encryption key.
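If you prefer to generate the key locally instead, any cryptographically secure source works. For example, with OpenSSL (a sketch; heavyai.key is a placeholder filename):

```shell
# 256-bit key written as 64 hex characters; use "openssl rand -hex 16" for a 128-bit key
openssl rand -hex 32 > heavyai.key
```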

    Column-Level Security

Grant or revoke SELECT privileges on columns in a table. These privileges can be managed separately from table-level privileges, allowing SELECT operations on a subset of columns.

    circle-exclamation
    • Column privileges are only enabled for tables.

    Executor Resource Manager

    hashtag
    Overview

    To enable concurrent execution of queries, we introduce the concept of an Executor Resource Manager (ERM). This keeps track of compute and memory resources to gate query execution and ensures that compute resources are not over-subscribed. As of version 7.0, ERM is enabled by default.

    The ERM evaluates several kinds of resources required by a query. Currently this includes CPU cores, GPUs, buffer and result set memory. It will leverage all available resources unless policy limits have been established, such as for maximum memory use or query time. It determines both the ideal/maximum amount of resources desirable for optimal performance and the minimum required. For example, a CPU query scanning 8 fragments could run with up to 8 threads, but could execute with as little as a single CPU thread with correspondingly less memory if needed.

    UPDATE table_name SET assign [, assign ]* [ WHERE booleanExpression ]
    UPDATE UFOs SET shape='ovate' where shape='eggish';
    UPDATE my_other_db.customers SET name = 'Joe' WHERE id = 10;
    initdb [-f | --skip-geo] $HEAVYAI_BASE/storage
    INSERT INTO <table> (column1, ...) VALUES (row_1_value_1, ...), ..., (row_n_value_1, ...);
    CREATE TABLE ar (ai INT[], af FLOAT[], ad2 DOUBLE[2]); 
    INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}); 
    INSERT INTO ar VALUES (ARRAY[NULL,2],NULL,NULL); 
    INSERT INTO ar VALUES (NULL,{},{2.0,NULL});
    -- or a multi-row insert equivalent
    INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}), (ARRAY[NULL,2],NULL,NULL), (NULL,{},{2.0,NULL});
    The ERM establishes a request queue. On every new request, as well as every time an existing request is completed, it checks available resources and picks the next resource request to grant. It currently always gives preference to earlier requests if resources permit launching them (first in, first out, or “FIFO”).
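The FIFO grant policy described above can be sketched as a toy model (illustrative only; this is not HeavyDB code, and the names and resource units are invented for the example):

```python
from collections import deque

# Toy model of a FIFO resource-grant queue: requests are granted in arrival
# order whenever enough resource units (e.g., CPU threads) are free.
class ResourceQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.available = capacity
        self.pending = deque()        # FIFO of (name, units) requests
        self.running = set()

    def request(self, name, units):
        self.pending.append((name, units))
        self._grant()

    def complete(self, name, units):
        self.running.discard(name)
        self.available += units
        self._grant()                 # re-check the queue on every completion

    def _grant(self):
        # Always prefer the earliest request; stop at the first that cannot fit.
        while self.pending and self.pending[0][1] <= self.available:
            name, units = self.pending.popleft()
            self.available -= units
            self.running.add(name)

q = ResourceQueue(capacity=8)
q.request("q1", 6)   # granted: 6 of 8 units
q.request("q2", 4)   # waits: only 2 units free
q.request("q3", 2)   # waits behind q2 (strict FIFO, even though it would fit)
print(sorted(q.running))   # ['q1']
q.complete("q1", 6)
print(sorted(q.running))   # ['q2', 'q3']
```

Note that q3 waits even though 2 units are free: a strict FIFO policy never grants a later request ahead of an earlier one.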

If the system-level multi-executor flag is enabled, the ERM will allow multiple queries to execute at once so long as resources are available. Currently, multiple execution is allowed for CPU queries (and multiple CPU queries and a single GPU query). This supports significant throughput gains by allowing inter-query-kernel concurrency, in addition to the major win of not having a long-running CPU query block the queue for other CPU queries or interactive GPU queries. The number of queries that can be run in parallel is limited by the number of executors.

    hashtag
    Use of CPU and GPU

    By default, if HeavyDB is compiled to run on GPUs and if GPUs are available, query steps/kernels will execute on GPU UNLESS:

    1. Some operations in the query step cannot run on GPU. Operators like MODE, APPROX_MEDIAN/PERCENTILE, and certain string functions are examples.

    2. Update and delete queries currently run on CPU.

    3. The query step requires more memory than available on GPU, but less than available on CPU.

    4. A user explicitly requests their query run on CPU, either via setting a session flag or via a query hint.

At the instance level, this behavior can be configured with system flags on startup. For example, a system with GPUs can be configured to use only CPU with the cpu-only flag, or the system's use of CPU RAM can be controlled using cpu-buffer-mem-bytes. Execution can also be routed to different device types with query hints such as “SELECT /*+ cpu_mode */ …”. These controls do not require the ERM, but are platform-wide.

    hashtag
    Example Use Cases

    hashtag
    Example 1: (no tuning required)

In a scenario where the system does not have enough memory available for the CPU cache, or the cache itself is too fragmented to accommodate all of the columns' chunks, the ERM, instead of failing the query with an OOM error, will:

1. Run the query reading a single chunk at a time, moving data to the GPU caches for GPU execution.

2. If there is not enough GPU memory, run the query chunk by chunk in CPU mode. In this case the query runs slower, but this frees up the GPU executor for less memory-demanding queries.

    hashtag
    Example 2: (minimal tuning required)

    You are deploying a new dashboard or chart which doesn’t require big data or high performance, and so you prefer to run it just on CPU. This way it doesn’t interfere with other performance-critical dashboards or charts.

1. Set the dashboard chart execution to CPU using query hints. Instead of referencing data directly, set a new “custom data source.” For example, if your data is in a table called ‘mydata’, then in the custom source, after the SELECT keyword, add the CPU query hint: SELECT /*+ g_cpu_mode */ * FROM mydata. You can repeat this for a data source supporting any number of charts, including all charts.

2. Bump up the number of executors (default 4) to 6-8. With more executors free, the dashboard will perform better without impacting the performance of the other dashboards.

    hashtag
    Example 3: (some tuning required)

    Improving performance of memory-intensive operations like high cardinality aggregates.

A user running exact “count distinct” operations on large, high-cardinality datasets that are likely to run on CPU, on a server with many CPU cores, might employ the following strategy:

    1. Increase the number of executors (default 4) to 8-16. --num-executors=16

2. Limit total CPU memory use with --cpu-buffer-mem-bytes (lowering it from the default 80%) to make room for large result sets, which are now limited by executor-cpu-result-mem-ratio.

Queries with sparse values or high cardinality that use a wide count distinct buffer will be pushed to CPU execution. Lower the executor-per-query-max-cpu-threads-ratio parameter to reduce the number of cores that run a single query; the group-by buffers are then built faster, lowering the memory footprint and speeding up query runtime.
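Put together in heavy.conf, the tuning in this example might look like the following sketch (values are illustrative and depend on your hardware):

```
num-executors = 16
# Lower the CPU buffer pool from its default to leave headroom for result sets
cpu-buffer-mem-bytes = 103079215104
# Fewer cores per query keeps per-thread group-by buffers small
executor-per-query-max-cpu-threads-ratio = 0.5
```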

  • Virtual machine name

  • Region

  • Image (Ubuntu 16.04 or higher, or CentOS/RHEL 7.0 or higher)

  • Size. Click Change size and use the Family filter to filter on GPU, based on your use case and requirements. Not all GPU VM variants are available in all regions.

  • For Username, add any user name other than admin.

  • In Inbound Port Rules, click Allow selected ports and select one or more of the following:

    • HTTP (80)

    • HTTPS (443)

    • SSH (22)

  • On the Disks tab, select Premium or Standard SSD, depending on your needs.

  • For the rest of the tabs and sections, use the default values.

  • HEAVY.AI installation instructions
: Contains all HeavyDB log files.
  • disk_cache: Stores the data cached by HEAVY Connect.

  • [{-ecdsa-curve} <string>]: ECDSA curve to use to generate a key. Valid values are P224, P256, P384, P521.

  • [{-host} <string>]: Comma-separated hostnames and IPs to generate a certificate for.

  • [{-rsa-bits} <int>]: Size of RSA key to generate. Ignored if –ecdsa-curve is set. The default is 2048.

  • [{-start-date} <string>]: Start date formatted as Jan 1 15:04:05 2011

  • Insert empty variable-length arrays using {} and ARRAY[] syntax.

  • Insert array values that contain NULL elements.

    INSERT INTO destination_table (name, gender, age, id) SELECT name, gender, age, id  FROM source_table;
INSERT INTO votes_summary (vote_id, vote_count) SELECT vote_id, COUNT(*) FROM votes GROUP BY vote_id;

    <num_strings>

    The number of strings to randomly generate.

    BIGINT

    <string_length>

    Length of the generated strings.

    BIGINT

    id

    Integer id of output, starting at 0 and increasing monotonically

    Column<BIGINT>

    rand_str

    Random String

    Column<TEXT ENCODING DICT>

    hashtag
    Configuring the Web Server

    Set the file path of the encryption key file to the encryption-key-file-path web server parameter in heavyai.conf:

    Alternatively, you can set the path using the --encryption-key-file-path=path/to/file command-line argument.

    hashtag
    Generating Encrypted Credentials

    Generate encrypted credentials for a custom application by running the following Go program, replacing the example key and credentials strings with an actual key and actual credentials. You can also run the program in a web browser at https://play.golang.org/p/nNBsZ8dhqr0arrow-up-right.

    [web]
encryption-key-file-path = "path/to/file"

• Column privileges other than SELECT, such as UPDATE and DELETE, are currently unsupported.

  • Column-level security is not supported on queries that use one or more views.

hashtag
Synopsis

    circle-info

    The <entity> referred to above can either be a role or user.

    circle-info

    The above GRANT and REVOKE commands can be compounded with other privileges. For example

    grants the SELECT column privilege on the table employees to test_user as well as UPDATE privileges.

    triangle-exclamation

When using UPDATE or DELETE on a table, any columns used in the WHERE condition must allow SELECT. That is, the entity issuing the command must have sufficient SELECT privileges on all columns in use. For example, SELECT privilege on the table being operated on is sufficient.

    triangle-exclamation

    Currently, when a query utilizes a view, column-level privileges are disabled. In such cases, only table-level privileges are considered. Consequently, queries that might have adequate column-level privileges but also involve a view will result in an insufficient privileges error.

    hashtag
    Examples

1. Grant SELECT on a single column.

2. Revoke SELECT on a single column.

The following also revokes column privileges.

3. Grant SELECT on multiple columns.

4. Revoke SELECT on multiple columns.

5. Granting SELECT on any column allows access to metadata.

6. Allowing SELECT privilege on a subset of columns enables certain queries and disables others.

7. Any subqueries used within a query enforce similar column-level security.

8. Table-level privileges supersede column-level privileges. Revoking column privileges does not affect table-level privileges.

    export HEAVYAI_BASE=/var/lib/heavyai
sudo mkdir -p $HEAVYAI_BASE
sudo chown -R $HEAVYAI_USER $HEAVYAI_BASE
    $HEAVYAI_PATH/bin/initheavy $HEAVYAI_BASE/storage
    port = 6274 
    http-port = 6278
    data = "/var/lib/heavyai/storage"
    null-div-by-zero = true
    
    [web]
    port = 6273
    frontend = "/opt/heavyai/frontend"
    servers-json = "/var/lib/heavyai/servers.json"
    enable-https = true
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin

mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub

add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
    
    apt update 
    
    nvidia-smi 
    
    apt install cuda-compat-11-0 
    
    nvidia-smi 
    
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/compat/ 
    
    nvidia-smi
Environment=LD_LIBRARY_PATH=/usr/local/cuda-11.0/compat:$LD_LIBRARY_PATH
    [Unit] 
    Description=HEAVY.AI database server 
    After=network.target remote-fs.target
    
    [Service] 
    Environment=LD_LIBRARY_PATH=/usr/local/cuda-11.0/compat:$LD_LIBRARY_PATH
    User=heavyai 
    Group=heavyai 
    WorkingDirectory=/opt/heavyai
    ExecStart=/opt/heavyai/bin/heavydb --config /var/lib/heavyai/heavy.conf 
    KillMode=control-group 
    SuccessExitStatus=143 
    LimitNOFILE=65536 
    Restart=always
    
    [Install] 
    WantedBy=multi-user.target
    generate_cert [{-ca} <bool>]
                  [{-duration} <duration>]
                  [{-ecdsa-curve} <string>]
                  [{-host} <host1,host2>]
                  [{-rsa-bits} <int>]
                  [{-start-date} <string>]
    CREATE TABLE ar (ai INT[], af FLOAT[], ad2 DOUBLE[2]); 
    INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}); 
    INSERT INTO ar VALUES (ARRAY[NULL,2],NULL,NULL); 
    INSERT INTO ar VALUES (NULL,{},{2.0,NULL});
    CREATE TABLE tbl (
       id INTEGER NOT NULL, 
       name TEXT NOT NULL DEFAULT 'John Doe', 
       age SMALLINT NOT NULL);
heavysql> SELECT * FROM TABLE(generate_random_strings(10, 20));
    id|rand_str
    0 |He9UeknrGYIOxHzh5OZC
    1 |Simnx7WQl1xRihLiH56u
    2 |m5H1lBTOErpS8is00YJ
    3 |eeDiNHfKzVQsSg0qHFS0
    4 |JwOhUoQEI6Z0L78mj8jo
    5 |kBTbSIMm25dvf64VMi
    6 |W3lUUvC5ajm0W24JML
    7 |XdtSQfdXQ85nvaIoyYUY
    8 |iUTfGN5Jaj25LjGJhiRN
    9 |72GUoTK2BzcBJVTgTGW
    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1
    --partitions 1 --topic matstream
    michael,1
    andrew,2
    ralph,3
    sandhya,4
    cat myfile | bin/kafka-console-producer.sh --broker-list localhost:9097
    --topic matstream
    /home/heavyai/build/bin/KafkaImporter stream1 heavyai -p HyperInteractive -u heavyai --port 6274 --batch 1 --brokers localhost:6283  
    --topic matstream --group-id 1
    
    Field Delimiter: ,
    Line Delimiter: \n
    Null String: \N
    Insert Batch Size: 1
    1 Rows Inserted, 0 rows skipped.
    2 Rows Inserted, 0 rows skipped.
    3 Rows Inserted, 0 rows skipped.
    4 Rows Inserted, 0 rows skipped.
    heavysql> select * from stream1;
    name|id
    michael|1
    andrew|2
    ralph|3
    sandhya|4
    create table stream1(name text, id int);
    package main
    
import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
    "io"
)
        
    // 1. Replace example key with encryption string
    var key = "v9y$B&E(H+MbQeThWmZq4t7w!z%C*F-J"
    
// 2. Replace strings "username", "password", "dbName" with credentials
    var stringsToBeEncrypted = []string{
        "username",
        "password",
        "dbName",
    }
    
    // 3. Run program to see encrypted credentials in console
    func main() {
        for i := range stringsToBeEncrypted {
            encrypted, err := EncryptString(stringsToBeEncrypted[i])
            if err != nil {
                panic(err)
            }
        fmt.Printf("%s => %s\n", stringsToBeEncrypted[i], encrypted)
        }
    }
    
func EncryptString(str string) (encrypted string, err error) {
        keyBytes := []byte(key)
        
        block, err := aes.NewCipher(keyBytes)
        if err != nil {
            panic(err.Error())
        }
        aesGCM, err := cipher.NewGCM(block)
        if err != nil {
            panic(err.Error())
        }
        nonce := make([]byte, aesGCM.NonceSize())
    if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
            panic(err.Error())
        }
        strBytes := []byte(str)
        
    cipherBytes := aesGCM.Seal(nonce, nonce, strBytes, nil)
        
        return fmt.Sprintf("%x", cipherBytes), err
    }
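For reference, decryption reverses this process: the nonce occupies the first NonceSize() bytes of the hex-decoded payload. The following sketch is not part of the shipped tooling; it assumes the same 32-byte example key and simply round-trips a value:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
)

// Same 32-byte example key as in the encryption program above.
var key = "v9y$B&E(H+MbQeThWmZq4t7w!z%C*F-J"

// encrypt mirrors EncryptString: a random nonce is prepended to the ciphertext.
func encrypt(str string) (string, error) {
	block, err := aes.NewCipher([]byte(key))
	if err != nil {
		return "", err
	}
	aesGCM, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, aesGCM.NonceSize())
	if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", aesGCM.Seal(nonce, nonce, []byte(str), nil)), nil
}

// decrypt reverses the process: split off the nonce, then open the ciphertext.
func decrypt(encHex string) (string, error) {
	data, err := hex.DecodeString(encHex)
	if err != nil {
		return "", err
	}
	block, err := aes.NewCipher([]byte(key))
	if err != nil {
		return "", err
	}
	aesGCM, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	n := aesGCM.NonceSize()
	if len(data) < n {
		return "", fmt.Errorf("ciphertext too short")
	}
	plain, err := aesGCM.Open(nil, data[:n], data[n:], nil)
	return string(plain), err
}

func main() {
	enc, err := encrypt("username")
	if err != nil {
		panic(err)
	}
	dec, err := decrypt(enc)
	if err != nil || dec != "username" {
		panic("round trip failed")
	}
	fmt.Println(dec)
}
```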
    GRANT SELECT (salary), UPDATE ON TABLE employees TO test_user;
    GRANT SELECT (<column1>,<column2>,...<columnN>) ON TABLE <table> TO <entity>;
    
    REVOKE SELECT (<column1>,<column2>,...<columnN>) ON TABLE <table> FROM <entity>;
    CREATE USER test_user (PASSWORD='test');
    CREATE TABLE employees (id INT, salary BIGINT);
    GRANT SELECT(id) ON TABLE employees TO test_user;
    REVOKE SELECT(id) ON TABLE employees FROM test_user;
    REVOKE ALL ON TABLE employees FROM test_user;
    GRANT SELECT (id,salary) ON TABLE employees TO test_user;
    REVOKE SELECT (id,salary) ON TABLE employees FROM test_user;
    -- Without privilege, the following exception will occur for test_user.
    -- "Violation of access privileges: user test_user has no proper privileges for object employees"
    SELECT count(*) FROM employees;
    
-- The following is run as a super-user or administrator.
    GRANT SELECT(id) ON TABLE employees TO test_user;
    -- The following works without issue for test_user.
    SELECT count(*) FROM employees; 
-- The following is run as a super-user or administrator.
    GRANT SELECT(id) ON TABLE employees TO test_user;
    
    -- The following query completes without error for test_user.
    SELECT id FROM employees;
    -- The following query does not complete and reports no proper privileges for test_user.
    SELECT id, salary FROM employees;
    -- The following query completes without error for test_user.
    SELECT * FROM (SELECT id FROM employees);
    
    -- The following query does not complete and reports no proper privileges for test_user.
    SELECT * FROM (SELECT id, salary FROM employees);
-- The following is run as a super-user or administrator.
    GRANT SELECT ON TABLE employees TO test_user;
    GRANT SELECT(id) ON TABLE employees TO test_user;
    
    -- The following query completes without error for test_user.
    SELECT id FROM employees;
    -- The following query completes without error for test_user.
    SELECT id, salary FROM employees;
    
-- The following is run as a super-user or administrator.
    REVOKE SELECT(id) ON TABLE employees FROM test_user;
    -- The following query completes without error for test_user. The user still has table-level privileges.
    SELECT id, salary FROM employees;
    hashtag
    Launching Your HEAVY.AI Instance

    On the solution Launcher Page, click Launch on Compute Engine to begin configuring your deployment.

    circle-exclamation

    Before deploying a solution with a GPU machine type, avoid potential deployment failure by checking your available quota for a projectarrow-up-right to make sure that you have not exceeded your limit.

    To launch HEAVY.AI on Google Cloud Platform, you select and configure a GPU-enabled instance.

    1. Search for HEAVY.AI on the heavyai-launcher-public project on Google Cloud Platformarrow-up-right, and select a solution. HEAVY.AI has four instance types:

      • HEAVY.AI Enterprise Edition (BYOL)arrow-up-right.

      • HEAVY.AI Enterprise Edition for CPU (BYOL)arrow-up-right.

      • .

      • .

    2. On the solution Launcher Page, click Launch to begin configuring your deployment.

    3. On the new deployment page, configure the following:

      • Deployment name

      • Zone

    4. Accept the GCP Marketplace Terms of Service and click Deploy.

    5. In the Deployment Manager, click the instance that you deployed.

    6. Launch the Heavy Immerse client:

      • Record the Admin password (Temporary).

      • Click the Site address link to go to the Heavy Immerse login page. Enter the password you recorded, and click Connect.


    See some tutorials and demos to help get up and running

    • Loading Data

    • Using OmniSci Immerse

    • Vega Tutorials

    Learn more about charts in Heavy Immerse

    • Heavy Immerse Chart Types

    Use HEAVY.AI in the cloud

    • Try HEAVY.AI Cloudarrow-up-right

    • Getting Started with AWS AMI

    • Getting Started with Microsoft Azure

    See what APIs work with HEAVY.AI

    • Vega Rendering API Overview

    • omnisql

    • Thrift

    Learn about features and resolved issues for each release

    • Release Notes

    Know what issues and limitations to look out for

    • Known Issues, Limitations, and Changes to Default Behavior

    See answers to frequently asked questions

    • FAQarrow-up-right

    Installation Recipes
    Upgrading OmniSci
    Configuration Flags and Runtime Settings

    Implementing a Secure Binary Interface

Follow these instructions to start a HEAVY.AI server with an encrypted main port.

    hashtag
    Required PKI Components

    You need the following PKI (Public Key Infrastructure) components to implement a Secure Binary Interface.

    • A CRT (short for certificate) file containing the server's PKI certificate. This file must be shared with the clients that connect using encrypted communications. Ideally, this file is signed by a recognized certificate issuing agency.

    • A key file containing the server's private key. Keep this file secret and secure.

    • A Java TrustStore containing the server's PKI certificate. The password for the trust store is also required.

    circle-info

    Although in this instance the trust store contains only information that can be shared, the Java TrustStore program requires it to be password protected.

    • A Java KeyStore and password.

    • In a distributed system, add the configuration parameters to the heavyai.conf file on the aggregator and all leaf nodes in your HeavyDB cluster.

    hashtag
    Demonstration Script to Create "Mock/Test" PKI Components

You can use OpenSSL utilities to create the various PKI elements. The server certificate in this instance is self-signed and should not be used in a production system.

    1. Generate a new private key.

    2. Use the private key to generate a certificate signing request.

    3. Self sign the certificate signing request to create a public certificate.

    To generate a keystore file from your server key:

    1. Copy server.key to server.txt. Concatenate it with server.crt.

    2. Use server.txt to create a PKCS12 file.

    3. Use server.p12 to create a keystore.
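The steps above can be sketched with OpenSSL and the JDK's keytool (filenames, the certificate subject, and the changeit password are placeholders for this mock setup, not production values):

```shell
# 1. Generate a new private key
openssl genrsa -out server.key 2048
# 2. Use the private key to generate a certificate signing request
openssl req -new -key server.key -out server.csr -subj "/CN=localhost"
# 3. Self-sign the request to create a public certificate (not for production)
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
# Concatenate key and certificate, then build a PKCS12 file and a keystore
cat server.key server.crt > server.txt
openssl pkcs12 -export -in server.txt -out server.p12 -name heavyai -passout pass:changeit
keytool -importkeystore -srckeystore server.p12 -srcstoretype pkcs12 -srcstorepass changeit \
        -destkeystore server.jks -deststorepass changeit
```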

    hashtag
    Start the Server in Encrypted Mode with PKI Client Authentication

    Start the server using the following options.

    hashtag
    Example

    hashtag
    Configuring heavyai.conf for Encrypted Connection

    Alternatively, you can add the following configuration parameters to heavyai.conf to establish a Secure Binary Interface. The following configuration flags implement the same encryption shown in the runtime example above:

    circle-info

    Passwords for the SSL truststore and keystore can be enclosed in single (') or double (") quotes.
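Such a heavy.conf section might look like the following sketch (verify each flag name against the Configuration Parameters reference; paths and passwords are placeholders):

```
ssl-cert = "/opt/heavyai/server.crt"
ssl-private-key = "/opt/heavyai/server.key"
ssl-trust-store = "/opt/heavyai/server.jks"
ssl-trust-password = 'changeit'
ssl-keystore = "/opt/heavyai/server.jks"
ssl-keystore-password = 'changeit'
```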

    hashtag
    Why Use Both server.crt and a Java TrustStore?

The server.crt file and the Java truststore contain the same public key information in different formats. The server requires both: it uses them to secure client communication with its various interfaces and to secure communication with its Calcite server. At startup, the Java truststore is passed to the Calcite server for authentication and to encrypt its traffic with the HEAVY.AI server.

    Using Services

    HEAVY.AI features two system services: heavydb and heavy_web_server. You can start these services individually using systemd.

    hashtag
    Starting and Stopping HeavyDB Using systemd

    For permanent installations of HeavyDB, HEAVY.AI recommends that you use systemd to manage HeavyDB services. systemd automatically handles tasks such as log management, starting the services on restart, and restarting the services if there is a problem.

In addition, systemd manages the open-file limit in Linux. Some cloud providers and distributions set this limit too low, which can result in errors as your HEAVY.AI environment and usage grow. For more information about adjusting the limits on open files, see the Troubleshooting and Monitoring Solutions section of our knowledge base.

    hashtag
    Initial Setup

    You use the install_heavy_systemd.sh script to prepare systemd to run HEAVY.AI services. The script asks questions about your environment, then installs the systemd service files in the correct location. You must run the script as the root user so that the script can perform tasks such as creating directories and changing ownership.

    The install_heavy_systemd.sh script asks for the information described in the following table.

    Variable
    Use
    Default
    Notes

    hashtag
    Starting HeavyDB Using systemd

    To manually start HeavyDB using systemd, run:

    hashtag
    Restarting HeavyDB Using systemd

    You can use systemd to restart HeavyDB — for example, after making configuration changes:

    hashtag
    Stopping HeavyDB Using systemd

    To manually stop HeavyDB using systemd, run:

    hashtag
    Enabling HeavyDB on Startup

    To enable the HeavyDB services to start on restart, run:

    hashtag
    Using Configuration Parameters

You can customize the behavior of your HEAVY.AI servers by modifying your heavy.conf configuration file. See Configuration Parameters.

    Licensing

    The Enterprise Edition of HEAVY.AI contains a rich set of analytical and location intelligence features at various capacity levels. A license provides the ability to use these features at specific capacities (e.g. specific number of GPUs) for a specified period of time, typically based on an end user license agreement.

A new license version and mechanism was implemented in the HEAVY.AI 8.0 release. This is a breaking change: all enterprise customers must request a new license before upgrading to the 8.0 release. This document describes the new licensing options, provides guidance on choosing the best option for your organization, and explains how to request and apply an appropriate license.

Starting with the 8.0 release, HEAVY.AI supports two types of licenses: Node Locked Licenses and Floating Licenses.


    Upgrading HEAVY.AI

This section provides a recipe for upgrading between fully compatible product versions.


    circle-exclamation

As with any software upgrade, it is important that you back up your data before upgrading. Each release introduces efficiencies that are not necessarily compatible with earlier releases of the platform. HEAVY.AI releases are never expected to be backward compatible.

    Back up the contents of your $HEAVYAI_STORAGE directory.

    Exporting Data

    hashtag
    COPY TO

    <file path> must be a path on the server. This command exports the results of any SELECT statement to the file. There is a special mode when <file path> is empty. In that case, the server automatically generates a file in <HEAVY.AI Directory>/export that is the client session id with the suffix .txt.

    generate_series

    hashtag
    generate_series (Integers)

    Generate a series of integer values.

    hashtag

    Table Expression and Join Support

    If a join column name or alias is not unique, it must be prefixed by its table name.

You can use BIGINT, INTEGER, SMALLINT, TINYINT, DATE, TIME, TIMESTAMP, or TEXT ENCODING DICT data types. TEXT ENCODING DICT is the most efficient because corresponding dictionary IDs are sequential and span a smaller range than, for example, the 65,535 values supported in a SMALLINT field. Depending on the number of values in your field, you can use TEXT ENCODING DICT(32) (up to approximately 2,150,000,000 distinct values), TEXT ENCODING DICT(16) (up to 64,000 distinct values), or TEXT ENCODING DICT(8) (up to 255 distinct values). For more information, see Data Types and Fixed Encoding.

    hashtag
    Geospatial Joins

    SQL Extensions

    HEAVY.AI implements a number of custom extension functions to SQL.

    hashtag
    Rendering

    The following table describes SQL extensions available for the HEAVY.AI implementation of Vega.

    Uber H3 Hexagonal Modeling

    hashtag
    Uber H3 Functions

    hashtag
    Overview

    Uber H3 is an open-source geospatial system created by Uber Technologies. H3 provides a hierarchical grid system that divides the Earth's surface into hexagons of varying sizes, allowing for easy location-based indexing, search, and analysis.


HEAVYAI_STORAGE

Path to the top-level storage directory

/var/lib/heavyai

Must be dedicated to HEAVY.AI. The installation script creates the directory $HEAVYAI_STORAGE/data, generates an appropriate configuration file, and saves the file as $HEAVYAI_STORAGE/heavy.conf.

    HEAVYAI_USER

    User HeavyDB is run as

    Current user

    User must exist before you run the script.

    HEAVYAI_GROUP

    Group HeavyDB is run as

    Current user's primary group

    Group must exist before you run the script.

    HEAVYAI_PATH

    Path to HeavyDB installation directory

    Current install directory

    HEAVY.AI recommends heavyai as the install directory.

    HEAVYAI_BASE

    Path to the storage directory for HeavyDB data and configuration files

    Why am I seeing the error "Too many open files...erno24"arrow-up-right
    Troubleshooting and Monitoring Solutionsarrow-up-right
    Configuration Parameters

    heavyai

    When possible, joins involving a geospatial operator (such as ST_Contains) build a binned spatial hash table (overlaps hash join), falling back to a Cartesian loop join if a spatial hash join cannot be constructed.

    The enable-overlaps-hashjoin flag controls whether the system attempts to use the overlaps spatial join strategy (true by default). If enable-overlaps-hashjoin is set to false, or if the system cannot build an overlaps hash join table for a geospatial join operator, the system attempts to fall back to a loop join. Loop joins can be performant in situations where one or both join tables have a small number of rows. When both tables grow large, loop join performance decreases.

Two flags control whether the system allows loop joins for a query (geospatial or not): allow-loop-joins and trivial-loop-join-threshold. By default, allow-loop-joins is set to false and trivial-loop-join-threshold to 1,000 (rows). If allow-loop-joins is set to true, the system allows any query with a loop join, regardless of table cardinalities (measured in number of rows). If left at the implicit default of false, or set explicitly to false, the system allows loop join queries as long as the inner table (right-side table) has fewer rows than the threshold specified by trivial-loop-join-threshold.
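As a sketch, the defaults described above correspond to the following heavy.conf fragment; include a flag only if you want to change its value:

```
# Attempt the binned spatial hash (overlaps) join strategy first.
enable-overlaps-hashjoin = true
# Permit loop joins only when the inner table is under the threshold.
allow-loop-joins = false
trivial-loop-join-threshold = 1000
```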

    For optimal performance, the system should utilize overlaps hash joins whenever possible. Use the following guidelines to maximize the use of the overlaps hash join framework and minimize fallback to loop joins when conducting geospatial joins:

    • The inner (right-side) table should always be the more complicated primitive. For example, for ST_Contains(polygon, point), the point table should be the outer (left) table and the polygon table should be the inner (right) table.

    • Currently, ST_CONTAINS and ST_INTERSECTS joins between point and polygon/multipolygon tables, and ST_DISTANCE < {distance} joins between two point tables, are supported for accelerated overlaps hash join queries.

    • For pointwise-distance joins, only the pattern WHERE ST_DISTANCE(table_a.point_col, table_b.point_col) < distance_in_degrees supports overlaps hash joins. Patterns like the following fall back to loop joins:

      • WHERE ST_DWITHIN(table_a.point_col, table_b.point_col, distance_in_degrees)

      • WHERE ST_DISTANCE(ST_TRANSFORM(table_a.point_col, 900913), ST_TRANSFORM(table_b.point_col, 900913)) < 100

    hashtag
    Using Joins in a Distributed Environment

    You can create joins in a distributed environment in two ways:

    • Replicate small dimension tables that are used in the join.

    • Create a shard key on the column used in the join (note that there is a limit of one shard key per table). If the column involved in the join is a TEXT ENCODED field, you must create a SHARED DICTIONARY that references the FACT table key you are using to make the join.

    circle-info

    The join order for one small table and one large table matters. If you swap the sales and customer tables on the join, it throws an exception stating that table "sales" must be replicated.

    Data Types and Fixed Encodingarrow-up-right
    Use the Java tools to create a key store from the public certificate.
    Node Locked Licenses

Node locked licenses restrict the use of the HEAVY.AI platform to a single machine. A node locked license is generated for a specific machine that an organization has set up to run their HEAVY.AI instance and is appropriate for organizations with single-instance deployment agreements and dedicated machines for HEAVY.AI use.

To get started with a node locked license, a unique machine identifier called a "Host ID" must be derived from the machine on which HEAVY.AI will be running. The Host ID can be displayed using the heavysql "\status" command. For example:

The HEAVY.AI instance that the heavysql client connects to must be launched on the machine in question, and with privileged access (for example, using "sudo"), in order to view the Host ID and use a node locked license. See the "Using systemd to Launch HeavyDB with Privileged Access" section for more details about how to launch HeavyDB with privileged access using systemd. For HEAVY.AI Docker deployments, no additional deployment steps are required.

    triangle-exclamation

    Note that changing fundamental hardware components of the machine can result in a change in the Host ID. If this happens, please reach out to the Customer Success team for assistance.

The Host ID is only meaningful within the context of HEAVY.AI and cannot be used to derive any information about the machine on which HEAVY.AI is running.

If you intend to get a node locked license, create a new Node Locked License Request support ticket and provide the Host ID in the ticket description.

    hashtag
    Floating Licenses

Floating licenses allow for the deployment of the HEAVY.AI platform on multiple machines on the same network. Floating licenses require the deployment of another type of server called a "License Server". The License Server monitors the use of HEAVY.AI licenses and resources across deployments and determines whether a deployment should proceed based on license restrictions related to the number of allowed deployments and total resources consumed across all deployments. When all license-specified resources are used to the limit, no additional HEAVY.AI deployments are allowed. A floating license is appropriate for organizations that have multiple instance deployment agreements, or that have single instance deployment agreements but cannot rely on a dedicated machine for HEAVY.AI use (for example, cloud deployments on non-dedicated hosts).

Similar to the process for node locked license deployments, a Host ID must be derived for the machine on which the License Server will run (that is, the License Server is node locked). The steps for getting the Host ID are the same as those in the "Node Locked Licenses" section above. If you intend to get a floating license, create a new Floating License Request support ticket and provide the following information in the ticket description:

    1. Host ID of machine on which the License Server will be running.

    2. Hostname of License Server (as seen by the HeavyDB instances).

    3. License Server port number (as seen by the HeavyDB instances).

    hashtag
    License Server Deployment

    The License Server can be deployed using the heavyai/heavyai-license-server docker image. A directory containing an initially empty storage directory and license_server.conf file should be bound to the /var/lib/heavyai path. Example deployment:

    circle-info

Organizations that have multiple teams, each using a different HEAVY.AI floating license, can use the same License Server for all HEAVY.AI deployments across all teams. The License Server is a lightweight application that can be deployed on a small CPU-only machine.

    hashtag
    Upgrade Scenarios

    hashtag
No Existing HEAVY.AI Deployments

Customers with no existing HEAVY.AI deployments can follow the above steps for getting either a node locked or floating license using the latest HEAVY.AI release.

    hashtag
Existing License on a HEAVY.AI 7.x.x Deployment

Customers with existing 7.x.x HEAVY.AI deployments should follow the above steps for getting either a node locked or floating license using their existing deployment. Upgrade to the latest release (or any 8.0 or later release) only after getting a new license from the Customer Success team. The old heavy.license file should be deleted or renamed before attempting to upgrade.

    hashtag
Existing License on a HEAVY.AI 6.x.x Deployment

Customers with existing 6.x.x HEAVY.AI deployments should first upgrade to a 7.0.x release and then follow the steps in the "Existing License on a HEAVY.AI 7.x.x Deployment" section above.

    hashtag
    Using systemd to Launch HeavyDB with Privileged Access

To launch HeavyDB with privileged access using systemd, remove the [Service] User and [Service] Group fields from the /lib/systemd/system/heavydb.service file (see CUDA Compatibility Drivers for more details). The updated file should look like:

    After updating the service file, force reload of the systemd configuration by executing the following command:

    hashtag
    Upgrading from Omnisci

If you need to upgrade from OmniSci to HEAVY.AI 6.0 or later, refer to the specific recipe.

    triangle-exclamation

Direct upgrades from OmniSci to HEAVY.AI versions later than 6.0 are not allowed or supported.

    hashtag
    Upgrading Using Docker

To upgrade HEAVY.AI in place in Docker:

    In a terminal window, get the Docker container ID.

    You should see output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c:

    Stop the HEAVY.AI Docker container. For example:

    Optionally, remove the HEAVY.AI Docker container. This removes unused Docker containers on your system and saves disk space.

Back up the OmniSci data directory (typically /var/lib/omnisci).

Download the latest version of the HEAVY.AI Docker image for your edition and device. Select the tab for the Edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you are upgrading.

    circle-info

If you don't want to upgrade to the latest version but want to upgrade to a specific version, change the latest tag to the version needed.

For example, if the version needed is 6.0, use v6.0.0 as the version tag in the image name:

    heavyai/heavyai-ee-cuda:v6.0.0

Check that the container is up and running with a docker ps command:

    You should see an output similar to the following.

    This runs both the HEAVY.AI database and Immerse in the same container.

    circle-info

    You can optionally add --rm to the Docker run command so that the container is removed when it is stopped.

    See also the note regarding the CUDA JIT Cachearrow-up-right in Optimizing Performance.

    hashtag
    Upgrading HEAVY.AI Using Package Managers and Tarball

To upgrade an existing system installed with package managers or a tarball, use the following commands. The commands upgrade HEAVY.AI in place without disturbing your configuration or stored data.

    Stop the HEAVY.AI services.

    Back up your $HEAVYAI_STORAGE directory (the default location is /var/lib/heavyai).

    Run the appropriate set of commands depending on the method used to install the previous version of the software.

Make a backup of your current installation.

Download and install the latest version, following the installation documentation for your operating system (CentOS/RHEL or Ubuntu).

    When the upgrade is complete, start the HEAVY.AI services.

Upgrading from Omnisci to HEAVY.AI 6.0
    cd $HEAVYAI_PATH/systemd
    sudo ./install_heavy_systemd.sh
    sudo systemctl start heavydb
    sudo systemctl start heavy_web_server
    sudo systemctl restart heavydb
    sudo systemctl restart heavy_web_server
    sudo systemctl stop heavydb
    sudo systemctl stop heavy_web_server
    sudo systemctl enable heavydb
    sudo systemctl enable heavy_web_server
-- Table customers is very small
    CREATE TABLE sales (
    id INTEGER,
    customerid TEXT ENCODING DICT(32),
    saledate DATE ENCODING DAYS(32),
    saleamt DOUBLE);
    
    CREATE TABLE customers (
    id TEXT ENCODING DICT(32),
    someid INTEGER,
    name TEXT ENCODING DICT(32))
WITH (partitions = 'replicated'); -- this causes the entire contents of this table to be replicated to each leaf node. Only recommended for small dimension tables.
    SELECT c.id, c.name from sales s inner join customers c on c.id = s.customerid limit 10;
    CREATE TABLE sales (
    id INTEGER,
customerid BIGINT, -- note the numeric datatype, so we don't need to specify a shared dictionary on the customer table
    saledate DATE ENCODING DAYS(32),
    saleamt DOUBLE,
    SHARD KEY (customerid))
WITH (SHARD_COUNT = <num gpus in cluster>);
    
    CREATE TABLE customers (
id BIGINT,
    someid INTEGER,
    name TEXT ENCODING DICT(32)
    SHARD KEY (id))
    WITH (SHARD_COUNT=<num gpus in cluster>);
    
    SELECT c.id, c.name FROM sales s INNER JOIN customers c ON c.id = s.customerid LIMIT 10;
    CREATE TABLE sales (
    id INTEGER,
    customerid TEXT ENCODING DICT(32),
    saledate DATE ENCODING DAYS(32),
    saleamt DOUBLE,
    SHARD KEY (customerid))
WITH (SHARD_COUNT = <num gpus in cluster>);
    
-- note the difference when customerid is a text encoded field:
    
    CREATE TABLE customers (
    id TEXT,
    someid INTEGER,
    name TEXT ENCODING DICT(32),
    SHARD KEY (id),
    SHARED DICTIONARY (id) REFERENCES sales(customerid))
WITH (SHARD_COUNT = <num gpus in cluster>);
    
    SELECT c.id, c.name FROM sales s INNER JOIN customers c ON c.id = s.customerid LIMIT 10;
    <table> , <table> WHERE <column> = <column>
    <table> [ LEFT ] JOIN <table> ON <column> = <column>
    openssl genrsa -out server.key 2048
    openssl req -new -key server.key -out server.csr
    openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
    cp server.key server.txt
    cat server.crt >> server.txt
    openssl pkcs12 -export -in server.txt -out server.p12
    keytool -importkeystore -v -srckeystore server.p12  -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype pkcs12
    --pki-db-client-auth true
    --ssl-cert 
    --ssl-private-key 
    --ssl-trust-store 
    --ssl-trust-password 
    --ssl-keystore 
    --ssl-keystore-password 
    --ssl-trust-ca 
    --ssl-trust-ca-server 
    sudo start heavyai_server --port 6274 --data /data --pki-db-client-auth true  
    --ssl-cert /tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem 
    --ssl-private-key /tls_certs/self_signed_server.example.com_self_signed/private/self_signed_server.example.com_key.pem 
    --ssl-trust-store /tls_certs/self_signed_server.example.com_self_signed/trust_store_self_signed_server.example.com.jks 
    --ssl-trust-password truststore_password 
--ssl-keystore /tls_certs/self_signed_server.example.com_self_signed/key_store_self_signed_server.example.com.jks
    --ssl-keystore-password keystore_password 
--ssl-trust-ca /tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem 
    --ssl-trust-ca-server /tls_certs/ca_primary/ca_primary_cert.pem
    # Start pki authentication 
    pki-db-client-auth = true 
    ssl-cert = "/tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem" 
    ssl-private-key = "/tls_certs/self_signed_server.example.com_self_signed/private/self_signed_server.example.com_key.pem" 
    ssl-trust-store = "/tls_certs/self_signed_server.example.com_self_signed/trust_store_self_signed_server.example.com.jks" 
    ssl-trust-password = "truststore_password"  
    ssl-keystore = "/tls_certs/self_signed_server.example.com_self_signed/key_store_self_signed_server.example.com.jks" 
    ssl-keystore-password = "keystore_password" 
    ssl-trust-ca = "/tls_certs/self_signed_server.example.com_self_signed/self_signed_server.example.com.pem" 
    ssl-trust-ca-server = "/tls_certs/ca_primary/ca_primary_cert.pem" 
    keytool -importcert  -file server.crt -keystore server.jks
    $ echo "\status" | ./bin/heavysql -p HyperInteractive
    License invalid
    Not connected to any database. See \h for help.
    Server Version                      : 7.2.4 Enterprise Edition
    Host ID                             : 1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqr
    --------------------------------------------------
    Server Name                         : example.com
    Server Start Time                   : 2024-03-01 : 00:00:00
    $ mkdir -p ~/heavyai-license-server/storage
    $ echo "port = 6278" > ~/heavyai-license-server/license_server.conf
    
    $ docker run --rm -p 6280:6280 \
      -v ~/heavyai-license-server:/var/lib/heavyai \
      heavyai/heavyai-license-server
    [Unit] 
    Description=HEAVY.AI database server 
    After=network.target remote-fs.target
    
    [Service] 
    Environment=LD_LIBRARY_PATH=/usr/local/cuda-11.0/compat:$LD_LIBRARY_PATH
    WorkingDirectory=/opt/heavyai
    ExecStart=/opt/heavyai/bin/heavydb --config /var/lib/heavyai/heavy.conf 
    KillMode=control-group 
    SuccessExitStatus=143 
    LimitNOFILE=65536 
    Restart=always
    
    [Install] 
    WantedBy=multi-user.target
    sudo systemctl daemon-reload
    sudo docker run -d --gpus=all \
      -v /var/lib/heavyai:/var/lib/heavyai \
      -p 6273-6278:6273-6278 \
      heavyai/heavyai-ee-cuda:latest
    sudo docker run -d -v \
    /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cpu:latest
sudo docker run -d --gpus=all \
      -v /var/lib/heavyai:/var/lib/heavyai \
      -p 6273-6278:6273-6278 \
      heavyai/core-os-cuda:latest
    sudo docker run -d -v \
    /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cpu:latest
    sudo yum update heavyai.x86_64
    sudo apt update
    sudo apt upgrade heavyai
    sudo mv /opt/heavyai /opt/heavyai_backup
sudo docker container ps --format "{{.ID}} {{.Image}}" \
    -f status=running | grep omnisci\/
    9e01e520c30c omnisci/omnisci-ee-gpu
    docker container stop 9e01e520c30c
    docker container rm 9e01e520c30c
    tar zcvf /backup_dir/omnisci_storage_backup.tar.gz /var/lib/omnisci
    sudo docker container ps --format "{{.Image}} {{.Status}}" \
    -f status=running | grep heavyai\/
    heavyai/heavyai-ee-cuda Up 48 seconds ago 
    sudo systemctl stop heavydb heavy_web_server
    sudo systemctl start heavydb heavy_web_server
  • Machine type - Click Customize and configure Cores and Memory, and select Extend memory if necessary.
  • GPU type. (Not applicable for CPU configurations.)

  • Number of GPUs - (Not applicable for CPU configurations.) Select the number of GPUs; subject to quota and GPU type by region. For more information about GPU-equipped instances and associated resources, see GPU Models for Compute Enginearrow-up-right.

  • Boot disk type

  • Boot disk size in GB

  • Networking - Set the Network, Subnetwork, and External IP.

  • Firewall - Select the required ports to allow TCP-based connectivity to HEAVY.AI. Click More to set IP ranges for port traffic and IP forwarding.

  • Copy your license key from the registration email message. If you have not received your license key, contact your Sales Representative or register for your 30-day trial herearrow-up-right.

  • Connect to Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

  • When prompted, paste your license key in the text box and click Apply.

  • Click Connect to start using HEAVY.AI.

  • On successful login, you see a list of sample dashboards loaded into your instance.

    HEAVY.AI Open Source Editionarrow-up-right
    HEAVY.AI for CPU (Open Source)arrow-up-right
    Available properties in the optional WITH clause are described in the following table.
    Parameter
    Description
    Default Value

    array_null_handling

    Define how to export with arrays that have null elements:

    • 'abort' - Abort the export. Default.

    • 'raw' - Export null elements as raw values.

    'abort'

    delimiter

    A single-character string for the delimiter between column values; most commonly:

    • , for CSV files

    • \t (tab character) for tab-delimited files

    Other delimiters include ~, ^, and ;.

    Applies to only CSV and tab-delimited files.

    Note: HEAVY.AI does not use file extensions to determine the delimiter.

    circle-info

    When using the COPY TO command, you might encounter the following error:

    To avoid this error, use the heavysql command \cpu to put your HEAVY.AI server in CPU mode before using the COPY TO command. See Configurationarrow-up-right.

    hashtag
    Example
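A minimal sketch of COPY TO; the sales table, its columns, and the output path here are hypothetical:

```sql
-- Export a query result to a server-side CSV file.
COPY (SELECT id, saledate, saleamt FROM sales)
  TO '/tmp/sales_export.csv' WITH (header = 'true', delimiter = ',');
```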

    Input Arguments
    Parameter
    Description
    Data Types

    <series_start>

    Starting integer value, inclusive.

    BIGINT

    <series_end>

    Ending integer value, inclusive.

    BIGINT

    hashtag
    Output Columns

    Name
    Description
    Data Types

    generate_series

    The integer series specified by the input arguments.

    Column<BIGINT>

    Example
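A minimal sketch; as with other HeavyDB table functions, generate_series is invoked through the TABLE() operator:

```sql
-- Produce the integers 1 through 5 in a column named generate_series.
-- Both bounds are inclusive, per the argument descriptions above.
SELECT * FROM TABLE(generate_series(1, 5));
```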

    hashtag
    generate_series (Timestamps)

Generate a series of timestamp values from start_timestamp to end_timestamp.

    Input Arguments

    Parameter
    Description
    Data Types

    series_start

    Starting timestamp value, inclusive.

TIMESTAMP(9) (timestamp literals with other precisions are auto-cast to TIMESTAMP(9))

    series_end

    Ending timestamp value, inclusive.

TIMESTAMP(9) (timestamp literals with other precisions are auto-cast to TIMESTAMP(9))

    Output Columns

    Name
    Description
    Output Types

    generate_series

    The timestamp series specified by the input arguments.

    COLUMN<TIMESTAMP(9)>

    Example
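A sketch, assuming a step interval is passed as a third argument (the argument table above lists only the start and end bounds):

```sql
-- One timestamp per day from Jan 1 through Jan 4, 2021, inclusive.
SELECT * FROM TABLE(
  generate_series(
    TIMESTAMP(9) '2021-01-01 00:00:00.000000000',
    TIMESTAMP(9) '2021-01-04 00:00:00.000000000',
    INTERVAL '1' DAY));
```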

    hashtag
    SQL SELECT
    Function
    Arguments and Return

    convert_meters_to_merc_pixel_width(meters, lon, lat, min_lon, max_lon, img_width, min_width)

    Converts a distance in meters in a longitudinal direction from a latitude/longitude coordinate to a pixel size using mercator projection:

    • meters: Distance in meters in a longitudinal direction to convert to pixel units.

    • lon: Longitude coordinate of the center point to size from.

    convert_meters_to_merc_pixel_height(meters, lon, lat, min_lat, max_lat, img_height, min_height)

    Converts a distance in meters in a latitudinal direction from a latitude/longitude coordinate to a pixel size, using mercator projection:

    • meters: Distance in meters in a latitudinal direction to convert to pixel units.

    • lon: Longitude coordinate of the center point to size from.

    convert_meters_to_pixel_width(meters, pt, min_lon, max_lon, img_width, min_width)

    Converts a distance in meters in a longitudinal direction from a latitude/longitude POINT to a pixel size. Supports only mercator-projected points.

    • meters: Distance in meters in a longitudinal direction to convert to pixel units.

    • pt: The center POINT to size from. The point must be defined in the EPSG:4326 spatial reference system.

    convert_meters_to_pixel_height(meters, pt, min_lat, max_lat, img_height, min_height)

    Hexagons can be created at a single scale, for instance to fill an arbitrary polygon at one resolution (see below). They can also be used to generate a much-smaller number of hexagons at multiple scales. In general, operating on H3 hexagons is much faster than on raw arbitrary geometries, at a cost of some precision. Because each hexagon is exactly the same size, this is particularly advantageous for GPU-accelerated workflows.

    A single-scale tessellation of California into Uber H3 hexagons
    A multi-scale tessellation of California into Uber H3 hexagons

    hashtag
    Advantages

A principal advantage of the system is that, for a given scale, hexagons are of approximately equal area. This stands in contrast to other subdivision schemes based on longitudes and latitudes or web Mercator map projections.

    A second advantage is that with hexagons, neighbors in all directions are equidistant. This is not true for rectangular subdivisions like pixels, whose 8 neighbors are at different distances.

    Pixel neighbors vary in distance
    Hexagon neighbors are equidistant

The exact amount of precision lost can be tightly bounded, with the smallest supported hexagons being about 1 m². That is more accurate than most currently available data sources, short of survey data.

    hashtag
    Disadvantages

There are some disadvantages to be aware of. The first is that the world cannot actually be divided up completely cleanly into hexagons. It turns out that a few pentagons are needed, and this introduces discontinuities. However, the system has cleverly placed those pentagons far away from any land masses, so in practice this is a concern only for specific maritime operations.

    The second issue is that hexagons at adjacent scales do not nest exactly:

    Hexagons at varying scales do not nest cleanly

This doesn't much affect practical operations at any single given scale. But if you look carefully at the California multi-scale plot above, you will discover tiny discontinuities in the form of gaps or overlaps. These don't amount to a large percentage of the total area, but they do mean this method is not appropriate when exact counts are required.

    hashtag
    Supported Methods

    hashtag
    Coordinates to H3 Indices

    H3_PointToCell(POINT p, INTEGER resolution) -> BIGINT

    H3_LonLatToCell(DOUBLE lon, DOUBLE lat, INTEGER resolution) -> BIGINT

These functions take a world-space coordinate (in WGS84/SRID 4326) as either a POINT or a pair of DOUBLE values, and a resolution value as INTEGER (which must be in the range 0 to 15), and return an H3 index as a BIGINT corresponding to the cell at that resolution whose center point is nearest to the given point. The index value may be projected, stored in a table, used as a join key, or passed as input to one of the other functions.

    Note: H3 indices in this form are intended to be immutable values and not human-readable. Any manipulation of the value may destroy its content.

    Note: The H3_PointToCell function only accepts POINT projections from columns. It does not accept temporary values or literals.
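The calling pattern can be sketched as follows (the table and column names here are hypothetical):

```sql
-- Derive H3 cells at resolution 10 from raw lon/lat columns and persist
-- them for later joins or aggregation.
CREATE TABLE trip_cells AS
SELECT trip_id, H3_LonLatToCell(pickup_lon, pickup_lat, 10) AS h3_cell
FROM taxi_trips;
```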

H3 Indices to Coordinates

    H3_CellToLon(BIGINT cell) -> DOUBLE

    H3_CellToLat(BIGINT cell) -> DOUBLE

These functions take an H3 Index value as BIGINT and convert it back to a world-space coordinate (in WGS84/SRID4326), returning the longitude or latitude value, respectively, of the center point of the cell represented by the input index.
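For example, the cell-center coordinates of stored indices can be recovered like this (table and column names hypothetical):

```sql
-- Project the center point of each stored H3 cell.
SELECT h3_cell,
       H3_CellToLon(h3_cell) AS center_lon,
       H3_CellToLat(h3_cell) AS center_lat
FROM trip_cells
LIMIT 5;
```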

Conversions to and from Hex String Representation

    H3_CellToString_TEXT(BIGINT cell) -> TEXT ENCODING DICT

    H3_CellToString_TEXT_NONE(BIGINT cell) -> TEXT ENCODING NONE

    These functions take an H3 Index value as BIGINT and convert it to the corresponding H3 Hex String representation. There are two variants of the function depending on whether the output needs to be a Dictionary-Encoded or None-Encoded TEXT value. The latter is fine for projection for display. The former should be used if required as a Join Key. Note that heavy use of the Dictionary-Encoded version to generate many unique values may result in very large string dictionaries in the database.

    H3_StringToCell([TEXT ENCODING DICT|TEXT ENCODING NONE]) -> BIGINT

This function performs the reverse, taking an H3 Hex String as either a Dictionary-Encoded or None-Encoded TEXT value and returning the corresponding numeric representation as BIGINT. It must be used to convert H3 Indices stored as TEXT columns into numeric values before passing them to the other functions.
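A round trip through the hex-string form should return the original BIGINT value (table and column names hypothetical):

```sql
-- Convert a stored index to hex and back; h3_cell and round_trip should match.
SELECT h3_cell,
       H3_CellToString_TEXT_NONE(h3_cell) AS cell_hex,
       H3_StringToCell(H3_CellToString_TEXT_NONE(h3_cell)) AS round_trip
FROM trip_cells
LIMIT 1;
```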

Getting the Index Parent

    H3_CellToParent(BIGINT cell, INTEGER resolution) -> BIGINT

This function takes an H3 Index value as BIGINT and returns the H3 Index of the parent cell containing the input cell at the specified resolution, which must be coarser (and hence corresponds to a larger cell).
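This is useful for rolling fine-grained cells up to a coarser level, for example (table and column names hypothetical):

```sql
-- Count members of each resolution-6 parent of fine-grained cells.
SELECT H3_CellToParent(h3_cell, 6) AS parent_cell, COUNT(*) AS n
FROM trip_cells
GROUP BY parent_cell;
```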

Checking Index Validity

    H3_IsValidCell(BIGINT cell) -> BOOL

    This function checks an H3 Index value for validity. Invalid input values result in a FALSE result.

Converting Index to Geometry

    H3_CellToBoundary_WKT(BIGINT cell) -> TEXT ENCODING DICT

This function takes an H3 Index as BIGINT and returns a WKT (Well-Known Text) representation of the geo POLYGON of the cell boundary as a Dictionary-Encoded TEXT value. Note that the polygon may be a hexagon or a pentagon, depending on the cell location. See the H3 documentation for more details.
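For example, projecting the boundary polygon of stored cells (table and column names hypothetical):

```sql
-- Return the cell boundary as a WKT POLYGON string.
SELECT H3_CellToBoundary_WKT(h3_cell) AS boundary_wkt
FROM trip_cells
LIMIT 1;
```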

Query Execution

    As the implementation of these functions is provided by the open-source H3 SDK, query stages containing them will run on CPU only.

H3 Usage Notes

Uber's H3 Python library provides a wider range of functions than those available above (although at significantly slower performance). The library defaults to generating H3 codes as hexadecimal strings, but can be configured to produce BIGINT codes. See the H3 documentation for more details.

    H3 codes can be used in regular joins, including joins in Immerse. They can also be used as aggregators, such as in Immerse custom dimensions. For points which are exactly aligned, such as imports from raster data bands of the same source, aggregating on H3 codes is faster than the exact geographic overlaps function ST_EQUALS.
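A join on shared H3 cells can be sketched as follows (table and column names hypothetical):

```sql
-- Join two point datasets on a shared H3 cell at resolution 9.
SELECT COUNT(*)
FROM (SELECT H3_LonLatToCell(lon, lat, 9) AS cell FROM sensors_a) a
JOIN (SELECT H3_LonLatToCell(lon, lat, 9) AS cell FROM sensors_b) b
  ON a.cell = b.cell;
```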

    Welcome to HEAVY.AI Documentation


Use of HEAVY.AI is subject to the terms of the HEAVY.AI End User License Agreement (EULA) and its Addendum.

What Will I Learn?

For Analysts

Learn how to use Heavy Immerse to gain new insights into your data with fast, responsive graphics and visualizations.

For Administrators

Learn how to install and configure your HEAVY.AI instance, then load your data for analysis.

For Developers and Data Scientists

    Learn how to extend HEAVY.AI with an integrated and custom . Contribute to the HEAVY.AI Core Open Source project.

Release Highlights

For more complete release information, see the Release Notes.

Release 8.2

With the 8.2 release, we now support the NVIDIA Grace Hopper Superchip, which combines an NVIDIA GPU with an Arm-based CPU in a single package. This architecture supports NVIDIA's NVLink interconnect, which provides high-speed bandwidth between the GPU and CPU.

Release 8.0

    We are pleased to introduce HeavyIQ, a custom LLM embedded within a brand new visual notebook interface. This combination of custom model and user experience represents our vision for the future of analytics. It supports the capabilities you’d expect, including English to SQL, English to SQL-backed answer and English to graphics. We think you will be very pleased with the “out of the box” results.

While HeavyIQ is certainly the headline, there are as always a number of additional features in this release. One not yet fully apparent to a casual user is support for table- and column-level metadata. This is available at 8.0 in SQL, and at release is already used by HeavyIQ to help in table and column selection. In cases where table or column names are ambiguous, we’ve found that adding a clarifying metadata comment is a simple way to improve HeavyIQ accuracy.

At 8.0, we’ve also significantly improved our support for raster and multidimensional array datasets. Since most raster data is available on huge external data stores, we’ve added raster to HeavyConnect. Now, rather than importing these datasets, you have the option to link to them on the fly as needed. We’ve also changed the internal storage of rasters to use a tile-oriented approach aligned with fragments. This lowers memory requirements and improves performance by allowing fragment skipping. What we’ve not changed is our unified syntax for raster and vector processing. That continues to make use of raster data significantly easier than on systems with entirely different internal languages for raster and vector data processing.

    Finally, this release includes major dependency updates and a more flexible license management system. The dependency updates should be transparent to most users, but are an important part of maintaining system security. The new licensing system deliberately mirrors those of our peers, now supporting “floating” as well as “node locked” licenses. As more of our customers deploy in the cloud, these new capabilities support more flexibility in resource management.

We hope you enjoy this major new release, and we look forward to seeing how you use these new capabilities to expand the power and accessibility of visual analytics within your organization.

Release 7.0

Overview

    We are also pleased to announce the general availability of our new backend Executor Resource Manager with CPU / GPU parallelism and query policy controls such as executor type, memory and time limits. We can also now support CPU queries larger than available CPU memory.

    This release also features the debut of a user interface for joins in Immerse (beta), supporting inner and left joins which are named and persisted in dashboards. This provides analytic and visualization access to joined columns, complementing the prior table linking function supporting cross-filtering.

    Powerful machine learning (beta) and statistical methods (beta) are now available in the database, supporting high performance predictive analytics workflows. For example you can now perform clustering or run linear regression or random forest models on large datasets with interactive inferencing.

    Immerse also gains a large set of dashboard refinements, including an optional ‘minimalist’ style with hidden chart titles, and an optional new text chart with full HTML and font controls.

    There are several major external dependency updates in this release. With Ubuntu 18 reaching its end of life we now require Ubuntu 20.04. For similar reasons, we now support NVIDIA CUDA version 11.8, which deprecates support for Kepler GPUs. Last but not least, we are formally retiring polygon ‘render groups’ within the database, a change which is not backwards compatible. So full database backups are required as part of this upgrade.

Heavy Immerse

    New Features and Improvements

    • BETA: Joins in Immerse

    • BETA: Enhanced text chart. The flag `ui/enable_new_text_chart` adds a “text2” chart type, with additional features:

      • font family (e.g. arial)

HeavyML (BETA)

    7.0 marks the beta release of HeavyML, a new set of capabilities to execute accelerated machine learning workflows directly from SQL.

    General Capabilities and Methods

    • Named model creation is supported via a new CREATE MODEL statement (see the release notes and documentation for more details)

    • Row-wise inference (GPU-accelerated for GPU queries) can be performed via a new ML_PREDICT row-wise operator. This can be used as an Immerse custom measure and persisted into dashboards, allowing end-users to consume models without needing to know how to create or administer them.

    • An EVALUATE model function is provided to test models against metrics (such as r2).
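The workflow above can be sketched as follows; the model, table, and column names are hypothetical, and the exact statement syntax may vary by version (see the release notes and documentation):

```sql
-- Create a named linear-regression model: the first projected column is the
-- target, the remaining columns are predictors.
CREATE MODEL fare_model OF TYPE linear_reg AS
SELECT fare_amount, trip_distance, passenger_count FROM taxi_trips;

-- Score rows with the ML_PREDICT row-wise operator.
SELECT ML_PREDICT('fare_model', trip_distance, passenger_count) AS predicted_fare
FROM taxi_trips
LIMIT 10;
```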

    Regression Algorithms

    • Four regression algorithms are supported initially: linear regression, random forest regression, decision trees, and Gradient Boosted Trees (GBT).

    • Both categorical text and continuous numeric regressors/predictors are supported. Categorical inputs are automatically one-hot-encoded.

• Continuous variable prediction is supported initially; categorical classification is planned for a later release.

    Clustering Algorithms

    • Two clustering algorithms are supported in this initial release: KMeans and DBScan.

    • Clustering algorithms can be called via associated table functions (more detail can be found in the relevant documentation), and currently support continuous numeric inputs only.

    Performance and Administration

    • A new Executor Resource Manager (ERM) framework is provided

    • The ERM allows for CPU queries to run fully in parallel, and one or more CPU queries to run in parallel while a GPU query is executing (parallel GPU query kernel execution is not supported yet).

    • It also allows execution of CPU queries where the input datasets do not fit into the CPU buffer pool by executing on a fragment-by-fragment basis, paging from storage.

HeavyRF

    New Features and Improvements

    A new “cell editor” is provided. This supports multi-band antennas mounted within various sites within a cell. Various antenna attributes such as horizontal and vertical falloff can be easily applied based on an extensible library of antenna types.

    Vegetation and building envelope attenuation can now be directly or indirectly specified. For example, typical values can be provided as scalar constants, or clutter object-specific attributes can be derived from normal SQL cursor queries. Vegetation attenuation can be tied to measurements of canopy moisture content from remote sensing based on seasonal statistics, or for individual dates to match drive test data. Building attenuation can be driven by various known or inferred characteristics, such as from parcels databases.

    The right-hand information panel has been extended to better support targeting of large numbers of buildings. This can be done directly by searching and filtering on building attributes in the HeavyRF application, such as building type or size. But it can also be combined with analyses in Immerse extending to multiple arbitrary tags. For example, a set of locations with high customer value and high potential for churn can be identified in Immerse and tagged with attributes searchable in HeavyRF.

    Last but not least, the HeavyRF platform will soon be available on NVIDIA’s LaunchPad. This facilitates initial evaluation of the software by making it immediately available together with appropriate supporting GPU hardware.

Release 6.4

HEAVY.AI continues to refine and extend the data connectors ecosystem. This release features general availability of data connectors for PostgreSQL, beta Immerse connectors for Snowflake and Redshift, and SQL support for Google BigQuery and Hive (beta). These managed data connections let you use HEAVY.AI as an acceleration platform, wherever your source data lives. Scheduling and automated caching ensure that from an end-user perspective, fast analytics are always running on the latest available data.

    Immerse features four new chart types: Contour, Cross-section, Wind barb and Skew-t. While these are especially useful for atmospheric and geotechnical data visualization, Contour and Cross-section also have more general application.

    Major improvements for time series analysis have been added. This includes time series comparison via window functions, and a large number of SQL window function additions and performance enhancements.

    This release also includes two major architectural improvements:

    • The ability to perform cross-database queries in SQL, increasing flexibility across the board.

    • Render queries no longer block other GPU queries. In many use cases, renders can be significantly slower than other common queries. This should result in significant performance gains, particularly in map-heavy dashboards.

Release 6.2

Heavy Immerse

    • Chart animation through cross filter replay, allowing controlled playback of time-based data such as weather maps or GPS tracks.

    • You can now directly export your charts and dashboards as image files.

    • New control panel enables administrators to view the configuration of the system and easily access logs and system tables.

General Analytics

    • Numerous improvements to core SQL and geoSQL capabilities.

• Support for casting strings to numeric, timestamp, date, and time types with the new TRY_CAST operator.

    • Explicit and implicit cast support for numeric, timestamp, date, and time types.
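The TRY_CAST behavior can be sketched as follows:

```sql
-- TRY_CAST yields NULL instead of an error when a conversion fails.
SELECT TRY_CAST('42' AS INTEGER) AS ok_value,       -- expected: 42
       TRY_CAST('abc' AS INTEGER) AS failed_value;  -- expected: NULL
```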

Advanced Analytics

• Two new functions now support direct loading of LiDAR data: tf_point_cloud_metadata quickly searches tile metadata and helps you find data to import, and tf_load_point_cloud performs the actual import.

    • Network graph analytics functions have been added. These can work on networks alone, including non-geographic networks, or can find the least-cost path along a geographic network.

Release 6.1

Release 6.1.0 features more granular administrative monitoring dashboards based on logs. These have been accessible in an open format on the server side, and now they are available in Immerse, broken down by specific dashboards, users, or queries. Intermediate and advanced SQL support continues to mature, with INSERT, window functions, and UNION ALL.

    This release contains a number of user interface polish items requested by customers. Cartography now supports polygons with colorful borders and transparent fills. Table presentation has been enhanced in various ways, from alignment to zebra striping. And dashboard saving reminders have been scaled back, based on customer feedback.

    The extension framework now features an enhanced “custom source” dialog, as well as new SQL commands to see installed extensions and their parameters. We introduce three new extensions. The first, tf_compute_dwell_times, reduces GPS event stream data volumes considerably while keeping relevant information. The others compute feature similarity scores and are very general.

    This release also includes initial public betas of our PostgreSQL Immerse connector, and SQL support for COPY FROM ODBC database connections, making it easier to connect to your enterprise data.

Release 6.0

    This release features large advances in data access, including intelligent linking to enterprise data (HeavyConnect) and support for raster geodata. SQL support includes high-performance string functions, as well as enhancements to window functions and table unions. Performance improvements are noticeable across the product, including fundamental advances in rendering, query compilation, and data transport. Our system administration tools have been expanded with a new Admin Portal, as well as additional system tables supporting detailed diagnostics. Major strides in extensibility include new charting options and a new extensions framework (beta).

Name Changes

    • Rebranded platform from OmniSci to HEAVY.AI, with OmniSciDB now HeavyDB, OmniSci Render now HeavyRender, and OmniSci Immerse now Heavy Immerse.

HeavyConnect and Data Import

    • HeavyConnect allows the HEAVY.AI platform to work seamlessly as an accelerator for data in other data lakes and data warehouses. For Release 6.0, CSV and Parquet files on local file systems and in S3 buckets can be linked or imported. Other SQL databases are also supported via ODBC (beta).

    • HeavyConnect enables users to specify a data refresh schedule, which ensures access to up-to-date data.

• Heavy Immerse now supports import of dozens of raster data formats, including GeoTIFF, GeoJPEG, and PNG. HeavySQL now supports almost any vector GIS file format.

Other Immerse Enhancements

    • New Gauge chart for easy visualization of key metrics relative to target thresholds.

    • New landing page and Help Center.

    • Enhanced mapping workflows with automated column picking.

SQL Enhancements

    • Support for a wide range of performant string operations using a new string dictionary translation framework, as well as the ability to on-the-fly dictionary encode none-encoded strings with a new ENCODE_TEXT operator.

    • Support for UNION ALL is now enabled by default, with significant performance improvements from the previous release (where it was beta flagged).

    • Significant functionality and performance improvements for window functions, including the ability to support expressions in PARTITION and ORDER clauses.

Performance

    • Parallel compilation of queries and a new multi-executor shared code cache provide up to 20% throughput/concurrency gains for interactive usage scenarios.

    • 10X+ performance improvements in many cases for initial join queries via optimized Join Hash Table framework.

    • New result set recycler allows for expensive query sub-steps to be cached via the SQL hint /*+ keep_result */, which can significantly increase performance when a subquery is used across multiple queries.
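The hint is placed directly in the SELECT clause; a minimal sketch (table and column names hypothetical):

```sql
-- Ask the result set recycler to cache this query's result for reuse
-- by subsequent queries that share the same sub-step.
SELECT /*+ keep_result */ carrier, COUNT(*) AS flight_count
FROM flights
GROUP BY carrier;
```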

System Administration

    • A new Admin Portal provides information on system resources usage and users.

    • System table support under a new information_schema database, containing 10 new system tables providing system statistics and memory and storage utilization.

Extensibility

    • New system and user-defined UDF framework (beta), comprising both row (scalar) and table (UDTF) functions, including the ability to define fast UDFs via Numba Python using the RBC framework, which are then inlined into the HeavyDB compiled query code for performant CPU and GPU execution.

    • System-provided table functions include generate_series for easy numeric series generation, tf_geo_rasterize_slope for fast geospatial binning and slope/aspect computation over elevation data, and others, with more capabilities planned for future releases.

• Leveraging the new table function framework, a new HeavyRF module (licensed separately) includes tf_rf_prop and tf_rf_prop_max_signal table functions for fast radio frequency signal propagation analysis and visualization.
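Table functions are invoked with the TABLE() wrapper; for example, using the system-provided generate_series:

```sql
-- Generate a numeric series from 1 to 5, one row per value.
SELECT * FROM TABLE(generate_series(1, 5));
```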

Release 5.10

    • Row-level security (RLS) can be used by an administrator to apply security filtering to queries run as a user or with a role.

    • Support for import from dozens of image and raster file types, such as jpeg, png, geotiff, and ESRI grid, including remote files.

    • Significantly more performant, parallelized window functions, executing up to 10X faster than in Release 5.9.

Release 5.9

    • Significant speedup for POINT and fixed-length array imports and CTAS/ITAS, generally 5-20X faster.

    • The PNG encoding step of a render request is no longer a blocking step, providing improvement to render concurrency.

    • Adds support to hide legacy chart types from add/edit chart menu in preparation for future deprecation (defaults to off).

Release 5.8

    • Parallel execution framework is on by default. Running with multiple executors allows parts of query evaluation, such as code generation and intermediate reductions, to be executed concurrently. Currently available for single-node deployments.

    • Spatial joins between geospatial point types using the ST_Distance operator are accelerated using the overlaps hash join framework, with speedups up to 100x compared to Release 5.7.1.

    • Significant performance gains for many query patterns through optimization of query code generation, particularly benefitting CPU queries.

Release 5.7

    • Extensive enhancements to Immerse support for parameters. Parameters can now be used in chart column selectors, chart filters, chart titles, global filters, and dashboard titles. Dashboards can have parameter widgets embedded on them, side-by-side with charts. Parameter values are visible in chart axes/labels, legends, and tooltips, and you can toggle parameter visibility.

• In Immerse Pointmap charts, you can specify which color-by attribute always renders on top, which is useful for highlighting anomalies in data.

    • Significantly faster and more accurate "lasso" tool filters geospatial data on Immerse Pointmap charts, leveraging native geospatial intersection operations.

Release 5.6

    • Custom SQL dimensions, measures, and filters can now be parameterized in Immerse, enabling more flexible and powerful scenario analysis, projections, and comparison use cases.

    • New angle measure added to Pointmap and Scatter charts, allowing orientation data to be visualized with wedge and arrow icons.

    • Custom SQL modal with validation and column name display now enabled across all charts in Immerse.

Release 5.5

    • Ability to set annotations on New Combo charts for different dimension/measure combinations.

    • New ‘Arrow-over-the-wire’ capability to deliver result sets in Apache Arrow format, with ~3x performance improvement over Thrift-based result set serialization.

• Support for concurrent SELECT and UPDATE/DELETE queries for single-node installations.

Release 5.4

    • Added initial compilation support for NVIDIA Ampere GPUs.

    • Improved performance for UPDATE and DELETE queries.

• Improved the performance of filtered group-by queries on large-cardinality string columns.

Release 5.3

    • New Combo chart type in Immerse provides increased configurability and flexibility.

    • Immerse chart-specific filters and quick filters add increased flexibility and speed.

    • Updated Immerse Filter panel provides a Simple mode and Advanced mode for viewing and creating filters.

Release 5.2

    • NULL support for geospatial types, including in ALTER TABLE ADD COLUMN.

• New SHOW commands: SHOW TABLES, SHOW DATABASES, SHOW CREATE TABLE, and SHOW USER SESSIONS.

    • Ability to perform updates and deletes on temporary tables.

Release 5.1

    • Added support for UPDATE via JOIN with a subquery in the WHERE clause.

• Initial support for temporary (that is, non-persistent) tables.

    • Improved performance for multi-column GROUP BY queries, as well as single column GROUP BY queries with high cardinality. Performance improvement varies depending on data volume and available hardware, but most use cases can expect a 1.5 to 2x performance increase over OmniSciDB 5.0.

Release 5.0

    • The new filter panel in Immerse enables the ability to toggle filters on and off, and introduces Filter Sets to provide quick access to different sets of filters in one dashboard.

    • Immerse now supports using global and cross-filters to interactively build cohorts of interest, and the ability to apply a cohort as a dashboard filter, either within the existing filter set or in a new filter set.

    • Data Catalog, located within Data Import, is a repository of datasets that users can use to enhance existing analyses.


    Getting Started on AWS

    Getting Started with AWS AMI

    You can use the HEAVY.AI AWS AMI (Amazon Web Services Amazon Machine Image) to try HeavyDB and Heavy Immerse in the cloud. Perform visual analytics with the included New York Taxi database, or import and explore your own data.

    Many options are available when deploying an AWS AMI. These instructions skip to the specific tasks you must perform to deploy a sample environment.

Prerequisite

    You need a security key pair when you launch your HEAVY.AI instance. If you do not have one, create one before you continue.

    1. Go to the EC2 Dashboard.

    2. Select Key Pairs under Network & Security.

    3. Click Create Key Pair.

Launching Your Instance

1. Go to the HEAVY.AI listing in the AWS Marketplace and select the version you want to use. You can get overview information about the product, see pricing, and get usage and support information.

    2. Click Continue to Subscribe to subscribe.

    3. Read the Terms and Conditions, and then click Continue to Configuration.

Using HEAVY.AI Immerse on Your AWS Instance

    To connect to Heavy Immerse, you need your Public IP address and Instance ID for the instance you created. You can find these values on the Description tab for your instance.

    To connect to Heavy Immerse:

    1. Point your Internet browser to the public IP address for your instance, on port 6273. For example, for public IP 54.83.211.182, you would use the URL https://54.83.211.182:6273.

    2. If you receive an error message stating that the connection is not private, follow the prompts onscreen to click through to the unsecured website. To secure your site, see .

    For more information on Heavy Immerse features, see .

Importing Your Own Data

    Working with your own familiar dataset makes it easier to see the advantages of HEAVY.AI processing speed and data visualization.

    To import your own data to Heavy Immerse:

    1. Export your data from your current datastore as a comma-separated value (CSV) or tab-separated value (TSV) file. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

    2. Point your Internet browser to the public IP address for your instance, on port 6273. For example, for public IP 54.83.211.182, you would use the URL https://54.83.211.182:6273.

    For more information, see .

Accessing Your HEAVY.AI Instance Using SSH

    Follow these instructions to connect to your instance using SSH from MacOS or Linux. For information on connecting from Windows, see .

    1. Open a terminal window.

    2. Locate your private key file (for example, MyKey.pem). The wizard automatically detects the key you used to launch the instance.

    3. Your key must not be publicly viewable for SSH to work. Use this command to change permissions, if needed:

    Getting Started on Kubernetes (BETA)

    Using HEAVY.AI's Helm Chart on Kubernetes

This documentation outlines how to use HEAVY.AI’s Helm Chart within a Kubernetes environment. It assumes you are a network administrator within your organization and an experienced Kubernetes administrator. This is not a beginner guide and does not cover Kubernetes installation or administration. You may well require additional manifest files for your environment.

Overview

The HEAVY.AI Helm Chart is a template for configuring deployment of the HEAVY.AI platform. The following files must be updated or created to reflect your deployment environment:

    • values.yml

    • <customer_created>-pv.yml

    • <customer_created>-pvc.yml

    Once the files are updated/created, follow the installation instructions below to install the Helm Chart into your Kubernetes environment.

Where to get the Helm Chart?

    The Helm Chart is located in the HEAVY.AI github repository. It can be found here:

What’s included?

    File Name
    Description

How to install?

1. Before installing, create a PV/PVC for the deployment to use. Save these files in the regular PV/PVC location used in your environment. Reference the README.pdf file found in the Helm Chart under templates, and the example PV/PVC manifests in the misc folder of the Helm Chart. The PVC name is then provided to the helm install command.

2. In your current directory, copy the values.yml file from the HEAVY.AI Helm Chart and customize it for your needs.

How to uninstall?

    To uninstall the helm installed HEAVY.AI instance:

    $ helm uninstall heavyai


    The PVC and PV space defined for the HEAVY.AI instance is not removed. The retained space must be manually deleted.

Example: values.yml

Example: example-heavyai-pvc.yml

Example: example-heavyai-pv.yml

    Hardware Reference

The amount of data you can process with the HEAVY.AI database depends primarily on the amount of GPU RAM and CPU RAM available across HEAVY.AI cluster servers. For zero-latency queries, the system caches compressed versions of the queried rows and columns in GPU RAM. This is called hot data. Semi-hot data uses CPU RAM for certain parts of the data.

Example configurations are provided to help you configure your system.

    Optimal GPUs on which to run the HEAVY.AI platform include:

    • NVIDIA Tesla A100

    Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL

Install Prerequisites

    Install the Extra Packages for Enterprise Linux (EPEL) repository and other packages before installing NVIDIA drivers.

    RHEL-based distributions require Dynamic Kernel Module Support (DKMS) to build the GPU driver kernel modules. For more information, see . Upgrade the kernel and restart the machine.

    tf_feature_similarity

Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) of the search vector, optionally TF/IDF weighted.

Input Arguments

    Parameter
    Description

    tf_compute_dwell_times

    Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session (dwell time).

Syntax

    Arrays

HEAVY.AI supports arrays in dictionary-encoded text and number fields (TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, and DOUBLE). Data stored in arrays are not normalized. For example, {green,yellow} is not the same as {yellow,green}. As with many SQL-based systems, HEAVY.AI array indexes are 1-based.

    HEAVY.AI supports NULL variable-length arrays for all integer and floating-point data types, including dictionary-encoded string arrays. For example, you can insert NULL into BIGINT[ ], DOUBLE[ ], or TEXT[ ] columns. HEAVY.AI supports NULL fixed-length arrays for all integer and floating-point data types, but not for dictionary-encoded string arrays. For example, you can insert NULL into BIGINT[2] or DOUBLE[3], but not into TEXT[2] columns.
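    For illustration, the sketch below shows 1-based indexing and the NULL rules described above; the table and column names are made up:

```sql
-- Illustrative only; ex_arrays and its columns are hypothetical.
CREATE TABLE ex_arrays (
  tags TEXT[] ENCODING DICT(32),  -- variable-length; NULL array allowed
  pair BIGINT[2]                  -- fixed-length numeric; NULL array allowed
);
INSERT INTO ex_arrays VALUES ({'green','yellow'}, {1, 2});
INSERT INTO ex_arrays VALUES (NULL, NULL);  -- both columns accept NULL arrays
SELECT tags[1] FROM ex_arrays;              -- indexes are 1-based
```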

    Expression
    Description

    EXPLAIN

    Shows generated Intermediate Representation (IR) code, identifying whether it is executed on GPU or CPU. This is primarily used internally by HEAVY.AI to monitor behavior.

    For example, when you use the EXPLAIN command on a basic statement, the utility returns 90 lines of IR code that is not meant to be human readable. However, at the top of the listing, a heading indicates whether it is IR for the CPU or IR for the GPU, which can be useful to know in some situations.
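    For example (the flights table here is a placeholder for any table in your database):

```sql
-- The first line of the output identifies the target device
-- ("IR for the GPU:" or "IR for the CPU:"), followed by generated
-- IR code that is not meant to be human readable.
EXPLAIN SELECT COUNT(*) FROM flights;
```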


    tf_raster_graph_shortest_slope_weighted_path

    Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally only filling in null-valued bins if fill_only_nulls is set to true.

    The graph shortest path is then computed between an origin point on the grid specified by origin_x and origin_y and a destination point on the grid specified by destination_x and destination_y.

    tf_load_point_cloud

    Loads one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs (if not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs).

    If use_cache is set to true, an internal point cloud-specific cache will be used to hold the results per input file, and if queried again will significantly speed up the query time, allowing for interactive querying of a point cloud source. If the results of tf_load_point_cloud will only be consumed once (for example, as part of a CREATE TABLE
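    A hedged invocation sketch: the path is hypothetical and the parameter names follow the description above; the exact signature may differ.

```sql
-- Hedged sketch; /data/lidar/tile_1234.laz is a hypothetical path.
SELECT * FROM TABLE(
  tf_load_point_cloud(
    path => '/data/lidar/tile_1234.laz',
    out_srs => 'EPSG:4326',
    use_cache => true
  )
);
```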

    tf_mandelbrot*

    Computes the Mandelbrot set over the complex domain [x_min, x_max), [y_min, y_max), discretizing the xy-space into an output of dimensions x_pixels X y_pixels. The output for each cell is the number of iterations needed to escape to infinity, up to and including the specified max_iterations.
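    A hedged sketch over the classic view of the set; the argument names mirror the description above and may not match the exact signature:

```sql
-- Hedged sketch: computes escape-iteration counts on a 1024 x 1024 grid.
SELECT * FROM TABLE(
  tf_mandelbrot(
    x_pixels => 1024, y_pixels => 1024,
    x_min => -2.5, x_max => 1.0,
    y_min => -1.5, y_max => 1.5,
    max_iterations => 256
  )
);
```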

    Query couldn’t keep the entire working set of columns in GPU Memory.
    COPY ( <SELECT statement> ) TO '<file path>' [WITH (<property> = value, ...)];
    COPY (SELECT * FROM tweets) TO '/tmp/tweets.csv';
    COPY (SELECT * FROM tweets ORDER BY tweet_time LIMIT 10000) TO
      '/tmp/tweets.tsv' WITH (delimiter = '\t', quoted = 'true', header = 'false');
    SELECT * FROM TABLE(
        generate_series(
            <series_start>,
            <series_end>
            [, <increment>]
        )
    )
    heavysql> select * from table(generate_series(2, 10, 2)); 
    series 
    2 
    4 
    6 
    8 
    10 
    5 rows returned.
    
    heavysql> select * from table(generate_series(8, -4, -3)); 
    series 
    8 
    5 
    2 
    -1 
    -4
    5 rows returned.
    SELECT * FROM TABLE(
        generate_series(
            <series_start>,
            <series_end>,
            <series_step>
        )
    )
    SELECT
      generate_series AS ts
    FROM
      TABLE(
        generate_series(
          TIMESTAMP(0) '2021-01-01 00:00:00',
          TIMESTAMP(0) '2021-09-04 00:00:00',
          INTERVAL '1' MONTH
        )
      )
      ORDER BY ts;
      
    ts
    2021-01-01 00:00:00.000000000
    2021-02-01 00:00:00.000000000
    2021-03-01 00:00:00.000000000
    2021-04-01 00:00:00.000000000
    2021-05-01 00:00:00.000000000
    2021-06-01 00:00:00.000000000
    2021-07-01 00:00:00.000000000
    2021-08-01 00:00:00.000000000
    2021-09-01 00:00:00.000000000
    'zero' - Export null elements as zero (or an empty string).
  • 'nullfield' - Set the entire array column field to null for that row.

  • Applies only to GeoJSON and GeoJSONL files.

    escape

    A single-character string for escaping quotes. Applies to only CSV and tab-delimited files.

    ' (quote)

    file_compression

    File compression; can be one of the following:

    • 'none'

    • 'gzip'

    • 'zip'

    For GeoJSON and GeoJSONL files, using GZip results in a compressed single file with a .gz extension. No other compression options are currently available.

    'none'

    file_type

    Type of file to export; can be one of the following:

    • 'csv' - Comma-separated values file.

    • 'geojson' - FeatureCollection GeoJSON file.

    • 'geojsonl' - Multiline GeoJSONL file.

    • 'shapefile' - Geospatial shapefile.

    For all file types except CSV, exactly one geo column (POINT, LINESTRING, POLYGON or MULTIPOLYGON) must be projected in the query. CSV exports can contain zero or any number of geo columns, exported as WKT strings.

    Export of array columns to shapefiles is not supported.

    'csv'

    header

    Either 'true' or 'false', indicating whether to output a header line for all the column names. Applies to only CSV and tab-delimited files.

    'true'

    layer_name

    A layer name for the geo layer in the file. If unspecified, the stem of the given filename is used, without path or extension.

    Applies to all file types except CSV.

    Stem of the filename, if unspecified

    line_delimiter

    A single-character string for terminating each line. Applies to only CSV and tab-delimited files.

    '\n'

    nulls

    A string pattern indicating that a field is NULL. Applies to only CSV and tab-delimited files.

    An empty string, 'NA', or

    quote

    A single-character string for quoting a column value. Applies to only CSV and tab-delimited files.

    " (double quote)

    quoted

    Either 'true' or 'false', indicating whether all the column values should be output in quotes. Applies to only CSV and tab-delimited files.

    'true'
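    Several of the options above can be combined in a single export. In this hedged example, the trips table and its pickup_point geo column are illustrative:

```sql
-- Export a single geo column as gzip-compressed GeoJSON with a named layer.
COPY (SELECT pickup_point FROM trips)
  TO '/tmp/pickups.geojson'
  WITH (file_type = 'geojson',
        file_compression = 'gzip',
        layer_name = 'pickups');
```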

    <series_step> (optional, defaults to 1)

    Increment to increase or decrease the values that follow. Integer.

    BIGINT

    series_step

    Time/Date interval signifying step between each element in the returned series.

    INTERVAL

    lat: Latitude coordinate of the center point to size from.
  • min_lon: Minimum longitude coordinate of the mercator-projected view.

  • max_lon: Maximum longitude coordinate of the mercator-projected view.

  • img_width: The width in pixels of the view.

  • min_width: Clamps the returned pixel size to be at least this width.

  • Returns: Floating-point value in pixel units. Can be used for the width of a symbol or a point in Vega.

    lat: Latitude coordinate of the center point to size from.
  • min_lat: Minimum latitude coordinate of the mercator-projected view.

  • max_lat: Maximum latitude coordinate of the mercator-projected view.

  • img_height: The height in pixels of the view.

  • min_height: Clamps the returned pixel size to be at least this height.

  • Returns: Floating-point value in pixel units. Can be used for the height of a symbol or a point in Vega.

    min_lon: Minimum longitude coordinate of the mercator-projected view.

  • max_lon: Maximum longitude coordinate of the mercator-projected view.

  • img_width: The width in pixels of the view.

  • min_width: Clamps the returned pixel size to be at least this width.

  • Returns: Floating-point value in pixel units. Can be used for the width of a symbol or a point in Vega.

    Converts a distance in meters in a latitudinal direction from an EPSG:4326 POINT to a pixel size. Currently only supports mercator-projected points:

    • meters: Distance in meters in a latitudinal direction to convert to pixel units.

    • pt: The center POINT to size from. The point must be defined in the EPSG:4326 spatial reference system.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    • img_height: The height in pixels of the view.

    • min_height: Clamps the returned pixel size to be at least this height.

    Returns: Floating-point value in pixel units. Can be used for the height of a symbol or a point in Vega.

    is_point_in_merc_view(lon, lat, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude coordinate is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • lon: Longitude coordinate of the point.

    • lat: Latitude coordinate of the point.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if the point is within the view defined by the min_lon/max_lon, min_lat/max_lat; otherwise, false.

    is_point_size_in_merc_view(lon, lat, meters, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude coordinate, offset by a distance in meters, is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • lon: Longitude coordinate of the point.

    • lat: Latitude coordinate of the point.

    • meters: Distance in meters to offset the point by, in any direction.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if the point is within the view defined by the min_lon/max_lon, min_lat/max_lat; otherwise, false.

    is_point_in_view(pt, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude POINT defined in EPSG:4326 is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • pt: The POINT to check. Must be defined in EPSG:4326 spatial reference system.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if the point is within the view defined by min_lon/max_lon, min_lat/max_lat; otherwise, false.

    is_point_size_in_view(pt, meters, min_lon, max_lon, min_lat, max_lat)

    Returns true if a latitude/longitude POINT defined in EPSG:4326, offset by a distance in meters, is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.

    • pt: The POINT to check. Must be defined in EPSG:4326 spatial reference system.

    • meters: Distance in meters to offset the point by, in any direction.

    • min_lon: Minimum longitude coordinate of the mercator-projected view.

    • max_lon: Maximum longitude coordinate of the mercator-projected view.

    • min_lat: Minimum latitude coordinate of the mercator-projected view.

    • max_lat: Maximum latitude coordinate of the mercator-projected view.

    Returns: True if a latitude/longitude POINT defined in EPSG:4326, offset by a distance in meters, is within the view defined by min_lon/max_lon, min_lat/max_lat; otherwise, false.
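    These helpers are typically used in the SQL of a Vega data transform to cull geometry to the current viewport. A hedged sketch, in which the tweets table, its columns, and the view bounds are illustrative:

```sql
-- Keep only points inside a mercator-projected view of the
-- continental United States (approximate bounds, for illustration).
SELECT lon, lat
FROM tweets
WHERE is_point_in_merc_view(lon, lat, -124.8, -66.9, 24.4, 49.4);
```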

    font sizes, line height
  • colors populated from dashboard palette

  • html table

  • undo/redo

  • separator line with styles

  • full html support

  • Added a new “minimal” style mode in which chart titles are hidden by default but appear on rollover. Controlled by the feature flag `ui/minimize_chart_size`, which defaults to off.

  • Within the map chart editor, geo layers can now be renamed.

  • Role-based access to control panel UX that previously required admin access.

  • Table functions are provided to access linear regression coefficients for linear regression models and variable importance scores for random forest models.

  • A new “SHOW MODELS” SQL command allows end users to determine which models are available.

  • More-detailed model metadata can be accessed by admins with SHOW MODEL DETAILS and in a new ml_models system table in the information_schema database.

  • The Executor Resource Manager takes into account the resources needed for each query to schedule them in the most efficient manner.
  • It is enabled by default, but can be turned off with the flag --enable-executor-resource-mgr=0, which causes query kernel execution to follow the same serial, pre-7.0 path.

  • HeavyConnect now provides graphical Heavy Immerse support for Redshift, Snowflake, and PostGIS connections.
  • For CPU-only systems, mapping capabilities are improved with the introduction of multilayer CPU-rendered geo.

  • Advanced string functions facilitate extraction of data from JSON and externally encoded string formats.
  • Improvements to COUNT DISTINCT reduces memory requirements considerably in cases with very large cardinalities or highly skewed data distributions.

  • Added MULTIPOINT and MULTILINESTRING geo types.

  • Convex and concave hull operators, allowing generation of polygons from points and multipoints. For example, you could generate polygons from clusters of GPS points.

  • Syntax and performance optimizations across all geometry types, table orderings, and commonly nested functions.

  • Significant functionality extension of window functions; define windows directly in temporal terms, which is particularly important in time series with missing observations. Window frame support allows improved control at the edges of windows.

  • New spatial aggregation and smoothing functions. Aggregations work particularly well with LiDAR data; for example, to pass through only the highest point within an area to create building or canopy height maps. Smoothing helps with noisy datasets and can reveal larger-scale patterns while minimizing visual distractions.

    Support is included for multidimensional arrays common in the sciences, including GRIB2, NetCDF, and HDF5.

  • Immerse now supports linking or import of files on the server filesystem (local or mounted). This helps prevent slow data transfers when client bandwidth is limited.

  • File globbing and filtering allow import of thousands of files at once.

  • Arrow execution endpoints now leverage the parallel execution framework, and Arrow performance has been significantly improved when high-cardinality dictionary-encoded text columns are returned

  • Introduces a novel polygon rendering algorithm that does not require pre-triangulated or pre-grouped polygons and can render dynamically generated geometry on the fly (via ST_Buffer). The new algorithm is comparable to its predecessor in terms of both performance and memory and enables optimizations and enhancements in future releases.

  • New binary transport protocol to Heavy Immerse that significantly increases performance and interactivity for large result sets

  • New Iframe chart type in Heavy Immerse to allow easier addition of custom chart types. (BETA)

    Automatic use of columnar output (instead of the default row-wise output) for large projections, reducing query times by 5-10X in some cases.
  • Support for full set of ST_TRANSFORM SRIDs supported by geos/proj4 library.

  • Support for numerous vector GIS files (100+ formats supported by current GDAL release).

  • Support for multidimensional array import from formats common in science and meteorology.

  • Improved Table chart export to access all data represented by a Table chart.

  • Introduced dashboard-level named custom SQL.

  • BETA - Adds custom expressions to table columns, allowing for reusable custom dimensions and measures within a single dashboard (defaults to off).
  • BETA - Adds Crosslink feature with Crosslink Panel UI, allowing crossfilters to fire across different data sources within the same dashboard (defaults to off).

  • BETA - Adds Custom SQL Source support and Custom SQL Source Manager, allowing the creation of a data source as a SQL statement (defaults to off)

  • Window functions can now be executed without a partition clause being specified (to signify a partition encompassing all rows in the table).

  • Window functions can now execute over tables with multiple fragments and/or shards.

  • Native support for ST_Transform between all UTM Zones and EPSG:4326 (Lon/Lat) and EPSG:900913 (Web Mercator).

  • ST_Equals support for geospatial columns.

  • Support for the ANSI SQL WIDTH_BUCKET operator for easier and more performant numeric binning, now also used in Immerse for all numeric histogram visualizations

  • The Vulkan backend renderer is now enabled by default. The legacy OpenGL renderer is still available as a fallback if there are blocking issues with Vulkan. You can disable the Vulkan renderer using the renderer-use-vulkan-driver=false configuration flag.

    • Vulkan provides improved performance, memory efficiency, and concurrency.

    • You are likely to see some performance and memory footprint improvements with Vulkan in Release 5.8, most significantly in multi-GPU systems.

  • Support for file path regex filter and sort order when executing the COPY FROM command.

  • New ALTER SYSTEM CLEAR commands that enable clearing CPU or GPU memory from Immerse SQL Editor or any other SQL client.

  • Immerse 3D Pointmap chart and HTML support in text charts are available as a beta feature.

  • Airplane symbol shape has been added as a built-in mark type for the Vega rendering API.

  • Vega symbol and multi-GPU polygon renders have been made significantly faster.

  • User-interrupt of query kernels is now on by default. Queries can be interrupted using Ctrl + C in omnisql, or by calling the interrupt API.

  • Parallel executors are in public beta (set with the --num-executors flag).

  • Support for APPROX_QUANTILE aggregate.

  • Support for default column values when creating a table and across all append endpoints, including COPY FROM, INSERT INTO TABLE SELECT, INSERT, and binary load APIs.

  • Faster and more robust ability to return result sets in Apache Arrow format when queried from a remote client (i.e. non-IPC).

  • More performant and robust high-cardinality group-by queries.

  • ODBC driver now supports Geospatial data types.

  • Significantly faster point-in-polygon joins through a new range join hash framework.
  • Approximate Median function support.

  • INSERT and INSERT FROM SELECT now support specification of a subset of columns.

  • Automatic metadata updates and vacuuming for optimizing space usage.

  • Significantly improved OmniSciDB startup time, as well as a number of significant load and performance improvements.

  • Improvements to line and polygon stroke rendering and point/symbol rendering.

  • Initial OmniSci Render support for CPU-only query execution ("Query on CPU, render on GPU"), allowing for a wider set of deployment infrastructure choices.
  • Cap metadata stored on previous states of a table by using MAX_ROLLBACK_EPOCHS, improving performance for streaming and small batch load use cases and modulating table size on disk

  • Added SQL function SAMPLE_RATIO, which takes a proportion between 0 and 1 as an input argument and filters rows to obtain a sampling of a dataset.
  • Added support for exporting geo data in GeoJSON format.

  • Dashboard filter functionality is expanded, and filters can be saved as views.

  • You can perform bulk actions on the dashboard list.

  • New UI Setting panel in Immerse for customizing charts.

  • Tabbed dashboards.

  • SQL Editor now handles Vega JSON requests.

  • On multilayer charts, layer visibility can be set by zoom level.
  • Different map charts can be synced together for pan and zoom actions, regardless of data source.

  • Array support for the Array type over JDBC.

  • SELECT DISTINCT in UNION ALL is supported. (UNION ALL is prerelease and must be explicitly enabled.)

  • Support for joins on DECIMAL types.

  • Performance improvements on CUDA GPUs, particularly Volta and Turing.

  • Updates to JDBC driver, including escape syntax handling for the fn keyword and added support to get table metadata.
  • Notable performance improvements, particularly for join queries, projection queries with order by and/or limit, queries with scalar subqueries, and multicolumn group-by queries.

  • Query interrupt capability improved to allow canceling long-running queries, also supports JDBC now.

  • Completely overhauled SQL Editor, including query formatting, snippets, history and more.

  • Database switching from within Immerse, as well as dashboard URLs that contain the database name.

  • Over 50% reduction in load times for the dashboards list initial load and search.

  • Cohort builder now supports count (# records) in aggregate filter.

  • Improved error handling and more meaningful error messages.

  • Custom logos can now be configured separately for light and dark themes.

  • Logos can be configured to deep-link to a specific URL.

  • Improved support for EXISTS and NOT EXISTS subqueries.

  • Added support for LINESTRING, POLYGON, and MULTIPOLYGON in user defined functions.

  • Immerse log-ins are fully sessionized and persist across page refreshes.

  • Pie chart now supports "All Others" and percentage labels.

  • Cohorts can now be built with aggregation-based filters.

  • New filter sets can be created through duplicating existing filter sets.

  • Dashboard URLs now link to individual filter sets.

  • To see these new features in action, please watch this video from Converge 2019, where Rachel Wang demonstrates how you can use them.

  • Added support for binary dump and restore of database tables.

  • Added support for compile-time registered user-defined functions in C++, and experimental support for runtime user-defined SQL functions and table functions in Python via the Remote Backend Compiler.

  • Support for some forms of correlated subqueries.

  • Support for update via subquery, to allow for updating a table based on calculations performed on another table.

  • Multistep queries that generate large, intermediate result sets now execute up to 2.5x faster by leveraging new JIT code generator for reductions and optimized columnarization of intermediate query results.

  • Frontend-rendered choropleths now support the selection of base map layers.


    Enter a name for your key pair. For example, MyKey.

  • Click Create. The key pair PEM file downloads to your local machine. For example, you would find MyKey.pem in your Downloads directory.

  • Select the Fulfillment Option, Software Version, and Region.

  • Click Continue to Launch.

  • On the Launch this software page, select Launch through EC2, and then click Launch.

  • From the Choose an Instance Type page, select an available EC2 instance type, and click Review and Launch.

  • Review the instance launch details, and click Launch.

  • Select a key pair, or click Create a key pair to create a new key pair and download it, and then click Launch Instances.

  • On the Launch Status page, click the instance name to see it on your EC2 Dashboard Instances page.

  • Enter the USERNAME (admin), PASSWORD ( {Instance ID} ), and DATABASE (heavyai). If you are using the BYOL version, enter your license key in the key field and click Apply.
  • Click Connect.

  • On the Dashboards page, click NYC Taxi Rides. Explore and filter the chart information on the NYC Taxis Dashboard.

  • Enter the USERNAME (admin) and PASSWORD ( {instance ID} ). If you are using the BYOL version, enter your license key in the key field and click Apply.
  • Click Connect.

  • Click Data Manager, and then click Import Data.

  • Drag your data file onto the table importer page, or use the directory selector.

  • Click Import Files.

  • Verify the column names and datatypes. Edit them if needed.

  • Enter a Name for your table.

  • Click Save Table.

  • Click Connect to Table.

  • On the New Dashboard page, click Add Chart.

  • Choose a chart type.

  • Add dimensions and measures as required.

  • Click Apply.

  • Enter a Name for your dashboard.

  • Click Save.

  • Connect to your instance using its Public DNS. The default user name is centos or ubuntu, depending on the version you are using. For example:

  • Use the following command to run the heavysql SQL command-line utility on HeavyDB. The default user is admin and the default password is { Instance ID }:

    For more information, see heavysql.

  • AWS Marketplace page for HEAVY.AI
    Tips for Securing Your EC2 Instance
    Introduction to Heavy Immerse
    Loading Data
    Connecting to Your Linux Instance from Windows Using PuTTY

    example-heavyai-pv.yml

    Example PV file.

    example-heavyai-pvc.yml

    Example PVC file.

    Run the helm install command with the desired deployment name and Helm Chart.

    1. When using a values.yml file:

      $ helm install heavyai --values values.yml heavyaihelmchart-1.0.0.tgz

    2. When not using a values.yml file:

      If you only need to change a value or two from the default values.yml file, you can use --set instead of a custom values.yml file.

      For example:

      $ helm install heavyai --set pvcName=MyPVCName heavyaihelmchart-1.0.0.tgz

    Chart.yml

    HEAVY.AI Helm Chart. Contains version and contact information.

    values.yml

    Copy this file and edit values specific to your HEAVY.AI deployment. This is where to note the PVC name. This file is annotated to identify typical customizations and is pre-populated with default values.

    README.pdf

    These instructions.

    deployment.yml

    https://releases.heavy.ai/ee/helm/heavyai-1.0.0.tgz

    HEAVY.AI platform deployment template. DO NOT EDIT

    ssh -i MyKey.pem [email protected]
    $HEAVYAI_PATH/bin/heavysql
     Helm-workspace
          ↳heavyai
               ↳Chart.yml
               ↳values.yml
               ↳templates
                    ↳README.pdf
                    ↳deployment.yml
          ↳misc
               ↳example-heavyai-pv.yml
               ↳example-heavyai-pvc.yml
    # Default values for heavyai.
    # This is a YAML-formatted file.
    # Declare variables to be passed into your templates.
    #
    # Version of heavyai to install in the format 'v7.0.0' or 'latest' for the latest version released.
    version: v7.0.0
    # Persistent volume claim name to use with heavyai.
    pvcName: heavyai-pvc
    # Namespace to install heavyai in.
    nameSpace: heavyai
    # Number of GPUs to assign to heavyai, or 0 to run the CPU version of heavyai.
    gpuNumber: 1
    # NodeName to install heavyai on; leave blank to let Kubernetes schedule a host.
    nodeName: heavyai-node
    # Immerse port redirect of 6273.
    hostPortImmerse: 9273
    # TCP port redirect of 6274.
    hostPortTCP: 9274
    # HTTP port redirect of 6278.
    hostPortHTTP: 9278
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
     name: heavyai-pvc
     namespace: heavyai
    spec:
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 100Gi
     storageClassName: heavyai
    apiVersion: v1
    kind: PersistentVolume
    metadata:
     name: heavyai-pv
    spec:
     capacity:
       storage: 100Gi
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     persistentVolumeReclaimPolicy: Retain
     storageClassName: heavyai
     mountOptions:
       - hard
       - nfsvers=4.1
     nfs:
       path: {your nfs path goes here }
       server: { your nfs server name goes here }

    NVIDIA Tesla V100 v2

  • NVIDIA Tesla V100 v1

  • NVIDIA Tesla P100

  • NVIDIA Tesla P40

  • NVIDIA Tesla T4

    The following configurations are valid for systems using any of these GPUs as building blocks. For production systems, use Tesla enterprise-grade cards. Avoid mixing card types in the same system; use a consistent card model across your environment.

    Primary factors to consider when choosing GPU cards are:

    • The amount of GPU RAM available on each card

    • The number of GPU cores

    • Memory bandwidth

    Newer cards like the Tesla V100 have higher double-precision compute performance, which is important in geospatial analytics. The Tesla V100 models support the NVLink interconnect, which can provide a significant speed increase for some query workloads.

    GPU
    Memory/GPU
    Cores
    Memory Bandwidth
    NVLink

    A100

    40 to 80 GB

    6912

    1134 GB/sec

    For advice on optimal GPU hardware for your particular use case, ask your HEAVY.AI sales representative.

    HeavyDB Architecture

    Before considering hardware details, this topic describes the HeavyDB architecture.

    HeavyDB is a hybrid compute architecture that utilizes GPU, CPU, and storage. GPU and CPU are the Compute Layer, and SSD storage is the Storage Layer.

    When determining the optimal hardware, make sure to consider the storage and compute layers separately.

    Loading raw data into HeavyDB ingests data onto disk, so you can load as much data as you have disk space available, allowing some overhead.

    When queries are executed, HeavyDB optimizer utilizes GPU RAM first if it is available. You can view GPU RAM as an L1 cache conceptually similar to modern CPU architectures. HeavyDB attempts to cache the hot data. If GPU RAM is unavailable or filled, HeavyDB optimizer utilizes CPU RAM (L2). If both L1 and L2 are filled, query records overflow to disk (L3). To minimize latency, use SSDs for the Storage Layer.

    You can run a query on a record set that spans both GPU RAM and CPU RAM as shown in the diagram above, which also shows the relative performance improvement you can expect based on whether the records all fit into L1, a mix of L1 and L2, only L2, or some combination of L1, L2, and L3.

    Hot Records and Columns

    The Hardware Sizing Schedule table refers to hot records, which are the number of records that you want to put into GPU RAM to get zero-lag performance when querying and interacting with the data. The Hardware Sizing Schedule assumes 16 hot columns, which is the number of columns involved in the predicate or computed projections (such as column1 / column2) of any one of your queries. A 15 percent GPU RAM overhead is reserved for rendering buffering and intermediate results. If your queries involve more columns, the number of records you can put in GPU RAM decreases accordingly.


    The server is not limited to any number of hot records. You can store as much data on disk as you want. The system can also store and query records in CPU RAM, but with higher latency. The hot records represent the number of records on which you can perform zero-latency queries.

    Projection-only Columns

    HeavyDB does not require all queried columns to be processed on the GPU. Non-aggregate projection columns, such as SELECT x, y FROM table, do not need to be processed on the GPU, so they can be stored in CPU RAM. The Hardware Sizing Schedule CPU RAM sizing assumes that up to 24 columns are used in only non-computed projections, in addition to the Hot Records and Columns.

    hashtag
    CPU RAM

    The amount of CPU RAM should equal four to eight times the amount of total available GPU memory. Each NVIDIA Tesla P40 has 24 GB of onboard RAM available, so if you determine that your application requires four NVIDIA P40 cards, you need between 4 x 24 GB x 4 (384 GB) and 4 x 24 GB x 8 (768 GB) of CPU RAM. This correlation between GPU RAM and CPU RAM exists because HeavyDB uses CPU RAM in certain operations for columns that are not filtered or aggregated.
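The 4x-8x rule above can be expressed as a quick sizing helper:

```python
# CPU RAM sizing rule from the text: 4x to 8x total GPU RAM.
def cpu_ram_range_gb(gpu_count, gpu_ram_gb):
    total_gpu_ram = gpu_count * gpu_ram_gb
    return 4 * total_gpu_ram, 8 * total_gpu_ram

# Four NVIDIA P40 cards at 24 GB each:
low, high = cpu_ram_range_gb(4, 24)
print(low, high)  # 384 768
```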

    hashtag
    SSD Storage

    A HEAVY.AI deployment should be provisioned with enough SSD storage to reliably store the required data on disk, both in compressed format and in HEAVY.AI itself. HEAVY.AI requires 30% overhead beyond compressed data volumes. HEAVY.AI recommends drives such as the Intel® SSD DC S3610 Series, or similar, in any size that meets your requirements.

    circle-info
    • For maximum ingestion speed, HEAVY.AI recommends ingesting data from files stored on the HEAVY.AI instance.

    • Most public cloud environments’ default storage is too small for the data volume HEAVY.AI ingests. Estimate your storage requirements and provision accordingly.
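The 30% overhead rule from the SSD Storage section translates to a simple estimate (a sketch; the data volume used is illustrative):

```python
# SSD capacity estimate: compressed data volume plus the 30% overhead
# HEAVY.AI requires beyond it.
def ssd_requirement_gb(compressed_data_gb, overhead=0.30):
    return compressed_data_gb * (1 + overhead)

# 1 TB of compressed data needs about 1.3 TB of SSD:
print(ssd_requirement_gb(1000))  # 1300.0
```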

    hashtag
    Hardware Sizing Schedule

    This schedule estimates the number of records you can process based on GPU RAM and CPU RAM sizes, assuming up to 16 hot columns (see Hot Records and Columns). This applies to the compute layer. For the storage layer, provision your application according to SSD Storage guidelines.

    GPU Count (NVIDIA P40)
    GPU RAM (GB)
    CPU RAM (GB, 8x GPU RAM)
    “Hot” Records (L1)

    1

    24

    If you already have your data in a database, you can look at the largest fact table, get a count of those records, and compare that with this schedule.

    If you have a .csv file, you need to get a count of the number of lines and compare it with this schedule.

    hashtag
    CPU Cores

    HEAVY.AI uses the CPU in addition to the GPU for some database operations. GPUs are the primary performance driver; CPUs are utilized secondarily. More cores provide better performance but increase the cost. Intel CPUs with 10 cores offer good performance for the price. For example, you could configure your system with a single NVIDIA P40 GPU and two 10-core CPUs. Similarly, you can configure a server with eight P40s and two 10-core CPUs.

    Suggested CPUs:

    • Intel® Xeon® E5-2650 v3 2.3GHz, 10 cores

    • Intel® Xeon® E5-2660 v3 2.6GHz, 10 cores

    • Intel® Xeon® E5-2687 v3 3.1GHz, 10 cores

    • Intel® Xeon® E5-2667 v3 3.2GHz, 8 cores

    hashtag
    PCI Express (PCIe)

    GPUs are typically connected to the motherboard using PCIe slots. The PCIe connection is based on the concept of a lane, which is a single-bit, full-duplex, high-speed serial communication channel. The most common numbers of lanes are x4, x8, and x16. The current PCIe 3.0 version with an x16 connection has a bandwidth of 16 GB/s. PCIe 2.0 bandwidth is half the PCIe 3.0 bandwidth, and PCIe 1.0 is half the PCIe 2.0 bandwidth. Use a motherboard that supports the highest bandwidth, preferably, PCIe 3.0. To achieve maximum performance, the GPU and the PCIe controller should have the same version number.

    The PCIe specification permits slots with different physical sizes, depending on the number of lanes connected to the slot. For example, a slot with an x1 connection uses a smaller slot, saving space on the motherboard. However, bigger slots can actually have fewer lanes than their physical designation. For example, motherboards can have x16 slots connected to x8, x4, or even x1 lanes. With bigger slots, check to see if their physical sizes correspond to the number of lanes. Additionally, some slots downgrade speeds when lanes are shared. This occurs most commonly on motherboards with two or more x16 slots. Some motherboards have only 16 lanes connecting the first two x16 slots to the PCIe controller. This means that when you install a single GPU, it has the full x16 bandwidth available, but two installed GPUs each have x8 bandwidth.
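The lane and generation arithmetic above can be sketched as follows; per-lane throughput is approximated from the stated 16 GB/s for a PCIe 3.0 x16 link:

```python
# Per-lane throughput approximated from the stated 16 GB/s for a
# PCIe 3.0 x16 link; each older generation halves the rate, and
# bandwidth scales linearly with lane count.
def pcie_bandwidth_gbs(version, lanes):
    per_lane_gen3 = 16 / 16  # ~1 GB/s per lane at PCIe 3.0
    return lanes * per_lane_gen3 / (2 ** (3 - version))

print(pcie_bandwidth_gbs(3, 16))  # 16.0 - one GPU with full x16
print(pcie_bandwidth_gbs(3, 8))   # 8.0  - two GPUs sharing 16 lanes
print(pcie_bandwidth_gbs(2, 16))  # 8.0  - PCIe 2.0 halves the rate
```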

    HEAVY.AI recommends installing GPUs in motherboards with support for as much PCIe bandwidth as possible. On modern Intel chip sets, each socket (CPU) offers 40 lanes, so with the correct motherboards, each GPU can receive x8 of bandwidth. All recommended System Examples have motherboards designed for maximizing PCIe bandwidth to the GPUs.

    HEAVY.AI does not recommend adding GPUs to a system that is not certified to support the cards. For example, to run eight GPU cards in a machine, the BIOS must register the additional address space required for the number of cards. Other considerations include power routing, power supply rating, and air movement through the chassis and cards for temperature control.

    For an emerging alternative to PCIe, see NVLink.

    hashtag
    NVLink

    NVLink is a bus technology developed by NVIDIA. Compared to PCIe, NVLink offers higher bandwidth between host CPU and GPU and between the GPU processors. NVLink-enabled servers, such as the IBM S822LC Minsky server, can provide up to 160 GB/sec bidirectional bandwidth to the GPUs, a significant increase over PCIe. Because Intel does not currently support NVLink, the technology is available only on IBM Power servers. Servers like the NVIDIA-manufactured DGX-1 offer NVLink between the GPUs but not between the host and the GPUs.

    hashtag
    System Examples

    A variety of hardware manufacturers make suitable GPU systems. For more information, follow these links to their product specifications.

    • Dell 2 GPU 2U Serverarrow-up-right

    • NVIDIA DGX Workstationarrow-up-right

    • System 76 Ibex Pro GPU Workstationarrow-up-right

    hashtag
    Install Kernel Headers

    Install kernel headers and development packages:

    circle-exclamation

    If installing kernel headers does not work correctly, follow these steps instead:

    1. Identify the Linux kernel you are using by issuing the uname -r command.

    2. Use the name of the kernel (4.18.0-553.el8_10.x86_64 in the following code example) to install kernel headers and development packages:

    Install the dependencies and extra packages:

    hashtag
    Install NVIDIA Drivers and Vulkan

    CUDA is a parallel computing platform and application programming interface (API) model. It uses a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The CUDA platform provides direct access to the GPU virtual instruction set and parallel computation elements. For more information on CUDA unrelated to installing HEAVY.AI, see https://developer.nvidia.com/cuda-zonearrow-up-right. You can install drivers in multiple ways. This section provides installation information using the NVIDIA website or using dnf.

    circle-info

    Although using the NVIDIA website is more time-consuming and less automated, you are assured that the driver is certified for your GPU. Use this method if you are not sure which driver to install. If you prefer a more automated method and are confident that the driver is certified, you can use the DNF package manager method.

    circle-info

    Please check that the version of the driver you download meets the HEAVY.AI minimum requirementsarrow-up-right.

    hashtag
    Install NVIDIA Drivers Using the NVIDIA Website

    Install the CUDA package for your platform and operating system according to the instructions on the NVIDIA website (https://developer.nvidia.com/cuda-downloadsarrow-up-right).

    If you do not know the GPU model installed on your system, run this command:

    The output shows the product type, series, and model. In this example, the product type is Tesla, the series is T (as Turing), and the model is T4.

    1. Select the product type shown after running the command above.

    2. Select the correct product series and model for your installation.

    3. In the Operating System dropdown list, select Linux 64-bit.

    4. In the CUDA Toolkit dropdown list, click a supported version (11.4 or higher).

    5. Click Search.

    6. On the resulting page, verify the download information and click Download.

    circle-info

    Please check that the version of the driver you download meets the HEAVY.AI minimum requirementsarrow-up-right.

    Move the downloaded file to the server, change the permissions, and run the installation.

    circle-info

    You might receive the following error during installation:

    ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.

    If you receive this error, blacklist the Nouveau driver by editing the /etc/modprobe.d/blacklist-nouveau.conf file, adding the following lines at the end:

    blacklist nouveau
    blacklist lbm-nouveau
    options nouveau modeset=0
    alias nouveau off
    alias lbm-nouveau off

    hashtag
    Install NVIDIA Drivers Using DNF

    Install a specific version of the driver for your GPU by installing the NVIDIA repository and using the DNF package manager.

    circle-info

    When installing the driver, ensure your GPU model is supported and meets the HEAVY.AI minimum requirementsarrow-up-right.

    Add the NVIDIA network repository to your system.

    Install the driver version needed with dnf. For 8.0, the minimum version is 535.

    To load the installed driver, run the sudo modprobe nvidia or nvidia-smi command. In the case of a driver upgrade, reboot your system with sudo reboot to ensure that the new version of the driver is loaded.

    hashtag
    Check NVIDIA Driver Installation

    Run the specified command to verify that your drivers are installed correctly and recognize the GPUs in your environment. Depending on your environment, you should see output confirming the presence of your NVIDIA GPUs and drivers. This verification step ensures that your system can identify and utilize the GPUs as intended.

    Output of nvidia-smi on a system with a correctly working driver
    circle-info

    If you encounter an error similar to the following, the NVIDIA drivers are likely installed incorrectly: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Please ensure that the latest NVIDIA driver is installed and running.

    Please review the Install NVIDIA Drivers section and correct any errors.

    hashtag
    Install Vulkan

    The back-end renderer requires a Vulkan-enabled driver and the Vulkan library to work correctly. Without these components, the database cannot start unless the back-end renderer is disabled.

    To ensure the Vulkan library and its dependencies are installed, use the DNF package manager:

    For more information about troubleshooting Vulkan, see the Vulkan Renderer section.

    hashtag
    Install CUDA Toolkit ᴼᴾᵀᴵᴼᴺᴬᴸ

    You must install the CUDA Toolkit if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities.

    1. Add the NVIDIA network repository to your system:

    2. List the available CUDA Toolkit versions using the dnf list command:

    3. Install the CUDA Toolkit version using DNF.

    4. Check that everything is working correctly:

    https://fedoraproject.org/wiki/EPELarrow-up-right
    Data Type

    primary_key

    Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function will compute the similarity to the search vector specified by the comparison_features cursor. Examples include countries, census block groups, user IDs of website visitors, and aircraft call signs.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    pivot_features

    One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities are compared only by the census block groups visited, regardless of time overlap.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    metric

    Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences.

    Column<INT | BIGINT | FLOAT | DOUBLE>

    comparison_pivot_features

    hashtag
    Output Columns

    Name
    Description
    Data Types

    class

    ID of the primary key being compared against the search vector.

    Column<TEXT ENCODING DICT | INT | BIGINT> (type will be the same as the primary_key input column)

    similarity_score

    Computed cosine similarity score between each primary_key pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).

    Column<FLOAT>

    hashtag
    Example

    hashtag
    Input Arguments
    Parameter
    Description
    Data Type

    entity_id

    Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes.

    Column<TEXT ENCODING DICT | BIGINT>

    site_id

    Column containing keys/IDs of dwell “sites” or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned h3 hex IDs for geographic location.

    Column<TEXT ENCODING DICT | BIGINT>

    hashtag
    Output Columns

    Name
    Description
    Data Type

    entity_id

    The ID of the entity for the output dwell time, identical to the corresponding entity_id column in the input.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the entity_id input column type)

    site_id

    The site ID for the output dwell time, identical to the corresponding site_id column in the input.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)

    Example

    ArrayCol[n] ...

    Returns value(s) from specific location n in the array.

    UNNEST(ArrayCol)

    Extracts the values in the array to a set of rows. Requires GROUP BY; projecting UNNEST is not currently supported.

    test = ANY ArrayCol

    ANY compares a scalar value with a single row or set of values in an array, returning results in which at least one item in the array matches. ANY must be preceded by a comparison operator.

    test = ALL ArrayCol

    ALL compares a scalar value with a single row or set of values in an array, returning results in which all records in the array field are compared to the scalar value. ALL must be preceded by a comparison operator.

    CARDINALITY()

    Returns the number of elements in an array. For example:

    hashtag
    Examples

    The following examples show query results based on the table test_array created with the following statement:

    The following queries use arrays in an INTEGER field:

    EXPLAIN CALCITE

    Returns a relational algebra tree describing the high-level plan to execute the statement.

    The table below lists the relational algebra classes used to describe the execution plan for a SQL statement.

    Method

    Description

    LogicalAggregate

    Operator that eliminates duplicates and computes totals.

    LogicalCalc

    Expression that computes project expressions and also filters.

    LogicalChi

    Operator that converts a stream to a relation.

    For example, a SELECT statement is described as a table scan and projection.

    If you add a sort order, the table projection is folded under a LogicalSort procedure.

    When the SQL statement is simple, the EXPLAIN CALCITE version is actually less “human readable.” EXPLAIN CALCITE is more useful when you work with more complex SQL statements, like the one that follows. This query performs a scan on the BOOK table before scanning the BOOK_ORDER table.

    Revising the original SQL command results in a more natural selection order and a more performant query.

    hashtag
    EXPLAIN CALCITE DETAILED

    Augments the EXPLAIN CALCITE command by adding details about referenced columns in the query plan.

    For example, for the following EXPLAIN CALCITE command execution:

    EXPLAIN CALCITE DETAILED adds more column details as seen below:

    , where the shortest path is weighted by the nth exponent of the computed slope between a bin and its neighbors, with the nth exponent being specified by
    slope_weighted_exponent
    . A max allowed traversable slope can be specified by
    slope_pct_max
    , such that no traversal is considered or allowed between bins with absolute computed slopes greater than the percentage specified by
    slope_pct_max
    .

    Input Arguments

    Parameter
    Description
    Data Types

    x

    Input x-coordinate column or expression of the data to be rasterized.

    Column <FLOAT | DOUBLE>

    y

    Input y-coordinate column or expression of the data to be rasterized.

    Column <FLOAT | DOUBLE> (must be the same type as x)

    Output Columns

    Result of the example query above, showing the shortest slope-weighted path between the Nepali plains and the peak of Mt. Everest. The path closely mirrors the actual climbing route used.
    statement), it is highly recommended that use_cache is set to false or left unspecified (it defaults to false) to avoid the performance and memory overhead incurred by use of the cache.

    The bounds of the data retrieved can be optionally specified with the x_min, x_max, y_min, y_max arguments. These arguments can be useful when the user desires to retrieve a small geographic area from a large point-cloud file set, as files containing data outside the bounds of the specified bounding box are quickly skipped by tf_load_point_cloud, requiring only a quick read of the spatial metadata for each file.

    Input Arguments

    Parameter
    Description
    Data Types

    path

    The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths.

    TEXT ENCODING NONE

    out_srs (optional)

    EPSG code of the output SRID. If not specified, output points are automatically converted to lon/lat (EPSG 4326).

    TEXT ENCODING NONE

    Output Columns

    Name
    Description
    Data Types

    x

    Point x-coordinate

    Column<DOUBLE>

    y

    Point y-coordinate

    Column<DOUBLE>

    Example A

    Example B

    LiDAR data from downtown Tallahassee, FL, colored by Z-value

    tf_mandelbrot

  • tf_mandelbrot_float

  • tf_mandelbrot_cuda

  • tf_mandelbrot_cuda_float

  • hashtag
    tf_mandelbrot

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    Example

    Computed Mandelbrot set using the HEAVY.AI Vega demo

    hashtag
    tf_mandelbrot_cuda

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    hashtag
    tf_mandelbrot_float

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    hashtag
    tf_mandelbrot_cuda_float

    Parameter
    Data Type

    x_pixels

    32-bit integer

    y_pixels

    32-bit integer

    x_min

    DOUBLE

    x_max

    Mandelbrot setarrow-up-right

    Users and Databases

    HEAVY.AI has a default superuser named admin with default password HyperInteractive.

    When you create or alter a user, you can grant superuser privileges by setting the is_super property.

    You can also specify a default database when you create or alter a user by using the default_db property. During login, if a database is not specified, the server uses the default database assigned to that user. If no default database is assigned to the user and no database is specified during login, the heavyai database is used.

    circle-info

    When an administrator, superuser, or owner drops or renames a database, all current active sessions for users logged in to that database are invalidated. The users must log in again.

    Similarly, when an administrator or superuser drops or renames a user, all active sessions for that user are immediately invalidated.

    circle-info

    If a password includes characters that are nonalphanumeric, it must be enclosed in single quotes when logging in to heavysql. For example: $HEAVYAI_PATH/bin/heavysql heavyai -u admin -p '77Heavy!9Ai'

    For more information about users, roles, and privileges, see .

    hashtag
    Nomenclature Constraints

    The following are naming convention requirements for HEAVY.AI objects, described in regular-expression notation:

    • A NAME is [A-Za-z_][A-Za-z0-9\$_]*

    • A DASHEDNAME is [A-Za-z_][A-Za-z0-9\$_\-]*

    • An EMAIL is ([^[:space:]\"]+|\".+\")@[A-Za-z0-9][A-Za-z0-9\-\.]*\.[A-Za-z]+

    User objects can use NAME, DASHEDNAME, or EMAIL format.

    Role objects must use either NAME or DASHEDNAME format.

    Database and column objects must use NAME format.
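The three formats can be checked with ordinary regular expressions. This sketch transliterates the patterns above into Python (the POSIX [:space:] class is rendered as \s; the helper names are illustrative, not part of HEAVY.AI):

```python
import re

# The NAME, DASHEDNAME, and EMAIL patterns from the nomenclature rules,
# with fullmatch enforcing that the whole identifier conforms.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9\$_]*")
DASHEDNAME = re.compile(r"[A-Za-z_][A-Za-z0-9\$_\-]*")
EMAIL = re.compile(r'([^\s"]+|".+")@[A-Za-z0-9][A-Za-z0-9\-\.]*\.[A-Za-z]+')

def is_valid_db_name(s):    # databases and columns: NAME only
    return NAME.fullmatch(s) is not None

def is_valid_role_name(s):  # roles: NAME or DASHEDNAME
    return DASHEDNAME.fullmatch(s) is not None

def is_valid_user_name(s):  # users: NAME, DASHEDNAME, or EMAIL
    return is_valid_role_name(s) or EMAIL.fullmatch(s) is not None

print(is_valid_db_name("sales_2024"))              # True
print(is_valid_db_name("sales-2024"))              # False: dash not allowed
print(is_valid_user_name("jane.doe@example.com"))  # True
```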

    hashtag
    CREATE USER

    HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the user name.

    Property
    Value

    Examples:

    hashtag
    DROP USER

    Example:

    hashtag
    ALTER USER

    HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the old or new user name.

    Property
    Value

    Example:

    hashtag
    CREATE DATABASE

    Database names cannot include quotes, spaces, or special characters.

    circle-info

    In Release 6.3.0 and later, database names are case insensitive. Duplicate database names will cause a failure when attempting to start HeavyDB 6.3.0 or higher. Check database names and revise as necessary to avoid duplicate names.

    Property
    Value

    Example:

    hashtag
    DROP DATABASE

    Example:

    hashtag
    ALTER DATABASE

    To alter a database, you must be the owner of the database or a HeavyDB superuser.

    Example:

    hashtag
    ALTER DATABASE OWNER TO

    Enables superusers to change the owner of a database.

    hashtag
    Example

    Change the owner of my_database to user Joe:

    circle-info

    Only superusers can run the ALTER DATABASE OWNER TO command.

    hashtag
    REASSIGN OWNED

    Changes ownership of database objects (tables, views, dashboards, and so on) from a user or set of users to a different user. When the ALL keyword is specified, the ownership change applies to database objects across all databases. Otherwise, it applies only to database objects in the current database.

    Example: Reassign database objects owned by jason and mike in the current database to joe.

    Example: Reassign database objects owned by jason and mike across all databases to joe.

    Database object ownership changes only for objects within the database; ownership of the database itself is not affected. You must be a superuser to run this command.

    hashtag
    Database Security Example

    See in for a database security example.

    tf_graph_shortest_paths_distances

    Given a distance-weighted directed graph, specified as a query CURSOR input containing the starting and ending node for each edge and a distance, and a specified origin node, tf_graph_shortest_paths_distances computes the shortest distance-weighted path distance between the origin_node and every other node in the graph. It returns a row for each node in the graph, with output columns consisting of the input origin_node, the given destination_node, the distance for the shortest path between the two nodes, and the number of edges or graph "hops" between the two nodes. If origin_node does not exist in the node1 column of the edge_list CURSOR, an error is returned.
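The traversal described above can be sketched with a standard Dijkstra search. This is an illustrative stand-in, not the server-side implementation; the edge-row shape mirrors the edge_list CURSOR described in the text:

```python
import heapq
from collections import defaultdict

# Illustrative Dijkstra sketch: edges holds (node1, node2, distance)
# rows; returns {destination_node: (shortest distance, number of hops)}.
def shortest_path_distances(edges, origin_node):
    graph = defaultdict(list)
    for n1, n2, d in edges:
        graph[n1].append((n2, d))
    if origin_node not in graph:
        raise ValueError("origin_node not found in node1 column")
    best = {origin_node: (0, 0)}       # node -> (distance, num_edges)
    queue = [(0, 0, origin_node)]
    while queue:
        dist, hops, node = heapq.heappop(queue)
        if (dist, hops) > best.get(node, (float("inf"), 0)):
            continue  # stale queue entry
        for nbr, w in graph[node]:
            if dist + w < best.get(nbr, (float("inf"), 0))[0]:
                best[nbr] = (dist + w, hops + 1)
                heapq.heappush(queue, (dist + w, hops + 1, nbr))
    return best

edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
print(shortest_path_distances(edges, "a")["c"])  # (3.0, 2)
```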

    Input Arguments

    Parameter
    Description
    Data Types

    Output Columns

    Name
    Description
    Data Types

    Example A

    Example B

    LDAP Integration

    HEAVY.AI supports LDAP authentication using an IPA Server or Microsoft Active Directory.

    You can configure HEAVY.AI Enterprise edition to map LDAP roles 1-to-1 to HEAVY.AI roles. When you enable this mapping, LDAP becomes the main authority controlling user roles in HEAVY.AI.

    circle-info

    LDAP mapping is available only in HEAVY.AI Enterprise edition.

    HEAVY.AI supports five configuration settings that allow you to integrate with your LDAP server.

    Parameter
    Description
    Example

    hashtag
    Obtaining Credential Information

    To find the ldap-role-query-url and ldap-role-query-regex to use, query your user roles. For example, if there is a user named kiran on the IPA LDAP server ldap://myldapserver.mycompany.com, you could use the following curl command to get the role information:

    When successful, it returns information similar to the following:

    • ldap-dn matches the DN, which is uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com.

    • ldap-role-query-url includes the LDAP URI + the DN + the LDAP attribute that represents the role/group the member belongs to, such as memberOf.

    circle-exclamation

    Make sure that LDAP configuration appears before the [web] section of heavy.conf.

    circle-info

    Double quotes are not required for LDAP properties in heavy.conf. For example, both of the following are valid:

    ldap-uri = "ldap://myldapserver.mycompany.com" ldap-uri = ldap://myldapserver.mycompany.com

    hashtag
    Setting Up LDAP with HEAVY.AI

    To integrate LDAP with HEAVY.AI, you need the following:

    • A functional LDAP server, with all users/roles/groups created (ldap-uri, ldap-dn, ldap-role-query-url, ldap-role-query-regex, and ldap-superuser-role) to be used by HEAVY.AI. You can use the curl command to test and find the filters.

    Once you have your server information, you can configure HEAVY.AI to use LDAP authentication.

    1. Locate the heavy.conf file and edit it to include the LDAP parameter. For example:

    2. Restart the HEAVY.AI server:

    3. Log on to heavysql as MyCompany user, or any user who belongs to one of the roles/groups that match the filter.

    circle-exclamation

    When you use LDAP authentication, the default admin user and password HyperInteractive do not work unless you create the admin user with the same password on the LDAP server.

    If your login fails, inspect $HEAVYAI_STORAGE/mapd_log/heavyai_server.INFO to check for any obvious errors about LDAP authentication.

    Once you log in, you can create a new role name in heavysql, and then apply GRANT/REVOKE privileges to the role. Log in as another user with that role and confirm that GRANT/REVOKE works.

    circle-info

    If you refresh the browser window, you are required to log in and reauthenticate.

    hashtag
    Using LDAPS

    To use LDAPS, HEAVY.AI must trust the LDAP server's SSL certificate. To achieve this, you must have the CA for the server's certificate, or the server certificate itself. Install the certificate as a trusted certificate.

    hashtag
    IPA on CentOS

    To use IPA as your LDAP server with HEAVY.AI running on CentOS 7:

    1. Copy the IPA server CA certificate to your local machine.

    2. Update the PKI certificates.

    3. Edit /etc/openldap/ldap.conf to add the following line.

    hashtag
    IPA on Ubuntu

    To use IPA as your LDAP server with HEAVY.AI running on Ubuntu:

    1. Copy the IPA server CA certificate to your local machine.

    2. Rename ipa-ca.crm to ipa-ca.crt so that the certificates bundle update script can find it:

    3. Update the PKI certificates:

    hashtag
    Active Directory

    1. Locate the heavy.conf file and edit it to include the LDAP parameter.

    Example 1:

    Example 2:

    2. Restart the HEAVY.AI server:

    circle-info

    Other LDAP user authentication attributes, such as userPrincipalName, are not currently supported.

    tf_graph_shortest_path

    Given a distance-weighted directed graph, specified as a query CURSOR input containing the starting and ending node for each edge and a distance, and a specified origin and destination node, tf_graph_shortest_path computes the shortest distance-weighted path through the graph between origin_node and destination_node. It returns a row for each node along the computed shortest path, with the traversal-ordered index of that node and the cumulative distance from the origin_node to that node. If either origin_node or destination_node does not exist, an error is returned.

    Input Arguments

    Parameter
    Description
    Data Types

    Output Columns

    Name
    Description
    Data Types

    Example A

    Example B

    tf_geo_rasterize_slope

    Similar to tf_geo_rasterize, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally only filling in null-valued bins if fill_only_nulls is set to true. The slope and aspect are then computed for every bin, based on the z values of that bin and its neighboring bins. The slope can be returned in degrees or as a fraction between 0 and 1, depending on the boolean argument to compute_slope_in_degrees.

    Note that the bounds of the spatial output grid are determined by the x/y range of the input query, and if SQL filters are applied to the output of the tf_geo_rasterize_slope table function, those filters also constrain the output range.
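As an illustration of slope derivation, a single bin's slope can be computed from the z values of its axis-aligned neighbors using central differences. This is a sketch of the general technique; the exact per-bin formula tf_geo_rasterize_slope uses is not specified in this text:

```python
import math

# Central-difference slope for one bin from its west/east and
# south/north neighbors; bin_dim_meters is the bin size in meters.
def bin_slope(z_west, z_east, z_south, z_north, bin_dim_meters,
              compute_slope_in_degrees=True):
    dz_dx = (z_east - z_west) / (2 * bin_dim_meters)
    dz_dy = (z_north - z_south) / (2 * bin_dim_meters)
    rise_over_run = math.hypot(dz_dx, dz_dy)  # slope as a fraction
    if compute_slope_in_degrees:
        return math.degrees(math.atan(rise_over_run))
    return rise_over_run

# 10 m bins; the eastern neighbor sits 10 m above the western one:
print(bin_slope(0.0, 10.0, 0.0, 0.0, 10.0))  # ~26.57 degrees
```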

    hashtag
    Input Arguments

    Parameter
    Description
    Data Types

    hashtag
    Output Columns

    Name
    Description
    Data Types

    Example

    tf_feature_self_similarity

    Given a query input of entity keys/IDs (for example, airplane tail numbers), a set of feature columns (for example, airports visited), and a metric column (for example number of times each airport was visited), scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
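The underlying score is plain cosine similarity over each entity pair's feature vector. A minimal sketch, shown without the optional TF/IDF weighting; the airport example follows the text:

```python
import math

# Cosine similarity between two feature vectors: 1.0 for identical
# direction, 0.0 for no overlap.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Two airplanes' visit counts over the same three airports:
print(round(cosine_similarity([4, 0, 1], [2, 0, 2]), 4))  # 0.8575
```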

    hashtag
    Input Arguments

    Parameter
    Description
    Data Type

    hashtag
    Output Columns

    Name
    Description
    Data Types

    Example

    tf_point_cloud_metadata

    Returns metadata for one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min, x_max, y_min, y_max arguments.

    Note: the specified path must be contained in the global allowed-import-paths; otherwise, an error is returned.

    Input Arguments

    Parameter
    Description
    Data Types

    Output Columns

    Name
    Description
    Data Types

    Example

    Distributed Configuration

    circle-exclamation

    When installing a distributed cluster, you must run initdb --skip-geo to avoid the automatic creation of the sample geospatial data table. Otherwise, metadata across the cluster falls out of synchronization and can put the server in an unusable state.

    HEAVY.AI supports distributed configuration, which allows single queries to span more than one physical host when the scale of the data is too large to fit on a single machine.

    In addition to increased capacity, distributed configuration has other advantages:

    tf_raster_contour_lines; tf_raster_contour_polygons

    Process a raster input to derive contour lines or regions and output as LINESTRING or POLYGON for rendering or further processing. Each has two variants:

    • One that re-rasterizes the input points

    • One that accepts raw raster points directly

    Use the rasterizing variants if the raster table rows are not already sorted in row-major order (for example, if they represent an arbitrary 2D point cloud), or if filtering or binning is required, either to reduce the input data to a manageable count (to speed up the contour processing) or to smooth the input data before contour processing. If the input rows do not already form a rectilinear region, the output region is their 2D bounding box. Many of the parameters of the rasterizing variant are directly equivalent to those of
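As a minimal illustration of the re-rasterization step, the sketch below bins arbitrary 2D points into a row-major grid with an AVG aggregate (a Python sketch under assumed semantics, not the HEAVY.AI implementation; `rasterize_avg` is hypothetical):

```python
# Minimal sketch of binning arbitrary (x, y, z) points into a regular
# grid keyed by (col, row), aggregating z with AVG -- the kind of
# re-rasterization the rasterizing contour variants perform first.
def rasterize_avg(points, bin_dim):
    """points: list of (x, y, z). Returns dict (col, row) -> mean z."""
    sums, counts = {}, {}
    for x, y, z in points:
        key = (int(x // bin_dim), int(y // bin_dim))
        sums[key] = sums.get(key, 0.0) + z
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

grid = rasterize_avg([(0.5, 0.5, 10.0), (0.9, 0.2, 20.0), (3.1, 0.4, 5.0)], 1.0)
print(grid)  # two points fall in bin (0, 0) and average to 15.0
```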

    Type Casts

    Expression
    Example
    Description
    sudo dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    sudo dnf -y upgrade kernel
    sudo reboot now
    sudo dnf -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
    sudo dnf install -y kernel-devel kernel-headers pciutils dkms
    lspci -v | egrep "3D|VGA*.NVIDIA" | awk -F '\[|\]' ' { print $2 } '
    Tesla T4
    chmod +x NVIDIA-Linux-x86_64-*.run
    sudo ./NVIDIA-Linux-x86_64-*.run
    You might get the following error during installation:
    sudo dnf config-manager --add-repo \
    http://developer.download.nvidia.com/compute/cuda/repos/rhel8/$(uname -i)/cuda-rhel8.repo
    sudo dnf -y module install nvidia-driver:535-dkms
    sudo dnf -y install vulkan
    sudo dnf config-manager --add-repo \
    http://developer.download.nvidia.com/compute/cuda/repos/rhel8/$(uname -i)/cuda-rhel8.repo
    dnf list cuda-toolkit-* | egrep -v config
    
    Available Packages
    cuda-toolkit-10-1.x86_64                     10.1.243-1        cuda-rhel8-x86_64
    cuda-toolkit-10-2.x86_64                     10.2.89-1         cuda-rhel8-x86_64
    cuda-toolkit-11-0.x86_64                     11.0.3-1          cuda-rhel8-x86_64
    cuda-toolkit-11-1.x86_64                     11.1.1-1          cuda-rhel8-x86_64
    cuda-toolkit-11-2.x86_64                     11.2.2-1          cuda-rhel8-x86_64
    cuda-toolkit-11-3.x86_64                     11.3.1-1          cuda-rhel8-x86_64
    cuda-toolkit-11-4.x86_64                     11.4.4-1          cuda-rhel8-x86_64
    cuda-toolkit-11-5.x86_64                     11.5.2-1          cuda-rhel8-x86_64
    cuda-toolkit-11-6.x86_64                     11.6.2-1          cuda-rhel8-x86_64
    cuda-toolkit-11-7.x86_64                     11.7.1-1          cuda-rhel8-x86_64
    cuda-toolkit-11-8.x86_64                     11.8.0-1          cuda-rhel8-x86_64
    cuda-toolkit-12.x86_64                       12.5.0-1          cuda-rhel8-x86_64
    cuda-toolkit-12-0.x86_64                     12.0.1-1          cuda-rhel8-x86_64
    cuda-toolkit-12-1.x86_64                     12.1.1-1          cuda-rhel8-x86_64
    cuda-toolkit-12-2.x86_64                     12.2.2-1          cuda-rhel8-x86_64
    cuda-toolkit-12-3.x86_64                     12.3.2-1          cuda-rhel8-x86_64
    cuda-toolkit-12-4.x86_64                     12.4.1-1          cuda-rhel8-x86_64
    cuda-toolkit-12-5.x86_64                     12.5.0-1          cuda-rhel8-x86_64
    
    sudo dnf -y install cuda-toolkit-<version>.x86_64
    nvcc --version
    
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Nov_30_19:08:53_PST_2020
    Cuda compilation tools, release 11.2, V11.2.67
    Build cuda_11.2.r11.2/compiler.29373293_0
    SELECT
      *
    FROM
      TABLE(
        tf_feature_similarity(
          primary_features => CURSOR(
            SELECT
              primary_key,
              pivot_features,
              metric
            from
              table
            where
              ...
            group by
              primary_key,
              pivot_features
          ),
          comparison_features => CURSOR(
            SELECT
              comparison_metric
            from
              table
            where
              ...
            group by <column>
          ),
          use_tf_idf => <boolean>
        )
      )
    /* Compute the similarity of US airline flight nums to a particular
    Delta flight (DAL795), based on the cosine similarity of the overlap of
    flight paths binned to an H3 hex at zoom level 7 (roughly 5 sq km),
    and return the top 10 most similar flight nums */
    
    SELECT
      *
    FROM
      TABLE(
        tf_feature_similarity(
          primary_features => CURSOR(
            SELECT
              callsign,
              geotoh3(st_x(location), st_y(location), 7) as h3,
              count(*) as n
            from
              adsb_2021_03_01
            where
              operator in (
                'Delta Air Lines',
                'Alaska Airlines',
                'Southwest Airlines',
                'American Airlines',
                'United Airlines'
              )
              and altitude >= 1000
            group by
              callsign,
              h3
          ),
          comparison_features => CURSOR(
            SELECT
              geotoh3(st_x(location), st_y(location), 7) as h3,
              COUNT(*) as n
            from
              adsb_2021_03_01
            where
              callsign = 'DAL795'
              and altitude >= 1000
            group by
              h3
          ),
          use_tf_idf => false
        )
      )
    ORDER BY
      similarity_score desc
    limit
      10;
      
    class|similarity_score
    DAL795|1
    DAL538|0.610889
    DAL1192|0.3419932
    DAL1185|0.3391671
    SWA4346|0.3206964
    DAL365|0.3037131
    SWA953|0.2912168
    UAL1559|0.2747431
    SWA2098|0.2511763
    DAL526|0.2473387
    select * from table(
      tf_compute_dwell_times(
        data => CURSOR(
          select
            entity_id,
            site_id,
            ts
          from
            <table>
          where
            ...
        ),
        min_dwell_seconds => <seconds>,
        min_dwell_points => <points>,
        max_inactive_seconds => <seconds>
      )
    );
    /* Data from https://www.kaggle.com/datasets/vodclickstream/netflix-audience-behaviour-uk-movies */
    
    select
      *
    from
      table(
        tf_compute_dwell_times(
          data => cursor(
            select
              user_id,
              movie_id,
              ts
            from
              netflix_audience_behavior
          ),
          min_dwell_points => 3,
          min_dwell_seconds => 600,
          max_inactive_seconds => 10800
        )
      )
    order by
      num_dwell_points desc
    limit
      10;
    
    entity_id|site_id|prev_site_id|next_site_id|session_id|start_seq_id|ts|dwell_time_sec|num_dwell_points
    59416738c3|cbdf9820bc|d058594d1c|863b39bbe8|2|19|2017-02-21 15:12:11.000000000|4391|54
    16d994f6dd|1bae944666|4f1cf3c2dc|NULL|5|61|2017-11-11 20:27:02.000000000|9570|36
    3675d9ba4a|948f2b5bf6|948f2b5bf6|69cb38018a|2|11|2018-11-26 18:42:52.000000000|3600|34
    da01959c0b|fd711679f9|1f579d43c3|NULL|5|90|2019-03-21 05:37:22.000000000|7189|31
    23c52f9b50|df00041e47|df00041e47|NULL|2|39|2019-01-21 15:53:33.000000000|1227|29
    da01959c0b|8ab46a0cb1|f1fffa6ff4|1f579d43c3|3|29|2019-03-12 04:33:01.000000000|6026|29
    23c52f9b50|df00041e47|NULL|df00041e47|1|10|2019-01-21 15:33:39.000000000|1194|28
    da01959c0b|1f579d43c3|8ab46a0cb1|fd711679f9|4|63|2019-03-17 02:01:49.000000000|7240|27
    3261cb81a5|1cb40406ae|NULL|NULL|1|2|2019-04-28 20:48:24.000000000|11240|27
    dbed64ce9e|c5830185ca|NULL|NULL|1|3|2019-03-01 06:43:32.000000000|7261|25
    omnisql> SELECT * FROM test_array;
    name|colors|qty
    Banana|{green, yellow}|{1, 2}
    Cherry|{red, black}|{1, 1}
    Olive|{green, black}|{1, 0}
    Onion|{red, white}|{1, 1}
    Pepper|{red, green, yellow}|{1, 2, 3}
    Radish|{red, white}|{}
    Rutabaga|NULL|{}
    Zucchini|{green, yellow}|{NULL}
    omnisql> SELECT UNNEST(colors) AS c FROM test_array;
    Exception: UNNEST not supported in the projection list yet.
    omnisql> SELECT UNNEST(colors) AS c, count(*) FROM test_array group by c;
    c|EXPR$1
    green|4
    yellow|3
    red|4
    black|2
    white|2
    omnisql> SELECT name, colors [2] FROM test_array;
    name|EXPR$1
    Banana|yellow
    Cherry|black
    Olive|black
    Onion|white
    Pepper|green
    Radish|white
    Rutabaga|NULL
    Zucchini|yellow
    omnisql> SELECT name, colors FROM test_array WHERE colors[1]='green';
    name|colors
    Banana|{green, yellow}
    Olive|{green, black}
    Zucchini|{green, yellow}
    omnisql> SELECT * FROM test_array WHERE colors IS NULL;
    name|colors|qty
    Rutabaga|NULL|{}
    omnisql> SELECT name, qty FROM test_array WHERE qty[2] >1;
    name|qty
    Banana|{1, 2}
    Pepper|{1, 2, 3}
    omnisql> SELECT name, qty FROM test_array WHERE 15 < ALL qty;
    No rows returned.
    omnisql> SELECT name, qty FROM test_array WHERE 2 = ANY qty;
    name|qty
    Banana|{1, 2}
    Pepper|{1, 2, 3}
    omnisql> SELECT COUNT(*) FROM test_array WHERE qty IS NOT NULL;
    EXPR$0
    8
    omnisql> SELECT COUNT(*) FROM test_array WHERE CARDINALITY(qty) > 0;
    EXPR$0
    6
    CREATE TABLE test_array (name TEXT ENCODING DICT(32),colors TEXT[] ENCODING DICT(32), qty INT[]);
    EXPLAIN <STMT>
    EXPLAIN CALCITE <STMT>
    heavysql> EXPLAIN CALCITE (SELECT * FROM movies);
    Explanation
    LogicalProject(movieId=[$0], title=[$1], genres=[$2])
       LogicalTableScan(TABLE=[[CATALOG, heavyai, MOVIES]])
    heavysql> EXPLAIN calcite (SELECT * FROM movies ORDER BY title);
    Explanation
    LogicalSort(sort0=[$1], dir0=[ASC])
       LogicalProject(movieId=[$0], title=[$1], genres=[$2])
          LogicalTableScan(TABLE=[[CATALOG, omnisci, MOVIES]])
    heavysql> EXPLAIN calcite SELECT bc.firstname, bc.lastname, b.title, bo.orderdate, s.name
    FROM book b, book_customer bc, book_order bo, shipper s
    WHERE bo.cust_id = bc.cust_id AND b.book_id = bo.book_id AND bo.shipper_id = s.shipper_id
    AND s.name = 'UPS';
    Explanation
    LogicalProject(firstname=[$5], lastname=[$6], title=[$2], orderdate=[$11], name=[$14])
        LogicalFilter(condition=[AND(=($9, $4), =($0, $8), =($10, $13), =($14, 'UPS'))])
            LogicalJoin(condition=[true], joinType=[INNER])
                LogicalJoin(condition=[true], joinType=[INNER])
                    LogicalJoin(condition=[true], joinType=[INNER])
                        LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK]])
                        LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_CUSTOMER]])
                    LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_ORDER]])
                LogicalTableScan(TABLE=[[CATALOG, omnisci, SHIPPER]])
    heavysql> EXPLAIN calcite SELECT bc.firstname, bc.lastname, b.title, bo.orderdate, s.name
    FROM book_order bo, book_customer bc, book b, shipper s
    WHERE bo.cust_id = bc.cust_id AND bo.book_id = b.book_id AND bo.shipper_id = s.shipper_id
    AND s.name = 'UPS';
    Explanation
    LogicalProject(firstname=[$10], lastname=[$11], title=[$7], orderdate=[$3], name=[$14])
        LogicalFilter(condition=[AND(=($1, $9), =($5, $0), =($2, $13), =($14, 'UPS'))])
            LogicalJoin(condition=[true], joinType=[INNER])
                LogicalJoin(condition=[true], joinType=[INNER])
                    LogicalJoin(condition=[true], joinType=[INNER])
                      LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_ORDER]])
                      LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_CUSTOMER]])
                    LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK]])
                LogicalTableScan(TABLE=[[CATALOG, omnisci, SHIPPER]])
    heavysql> EXPLAIN CALCITE SELECT x, SUM(y) FROM test GROUP BY x;
    Explanation
    LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
      LogicalProject(x=[$0], y=[$2])
        LogicalTableScan(table=[[testDB, test]])
    heavysql> EXPLAIN CALCITE DETAILED SELECT x, SUM(y) FROM test GROUP BY x;
    Explanation
    LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])	{[$1->db:testDB,tableName:test,colName:y]}
      LogicalProject(x=[$0], y=[$2])	{[$2->db:testDB,tableName:test,colName:y], [$0->db:testDB,tableName:test,colName:x]}
        LogicalTableScan(table=[[testDB, test]])
    SELECT * FROM TABLE(
        tf_raster_graph_shortest_slope_weighted_path(
            raster => CURSOR(
                SELECT x, y, z FROM table
            ),
            agg_type => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'>,
            bin_dim => <meters>,
            geographic_coords => <true/false>,
            neighborhood_fill_radius => <num bins>,
            fill_only_nulls => <true/false>,
            origin_x => <origin x coordinate>,
            origin_y => <origin y coordinate>,
            destination_x => <destination x coordinate>,
            destination_y => <destination y coordinate>,
            slope_weighted_exponent => <exponent>,
            slope_pct_max => <max pct slope>
        )
    )
    /* Compute the shortest slope-weighted path over a 30m Copernicus
    Digital Elevation Model (DEM) input raster covering the area around Mt. Everest,
    from the plains of Nepal to the peak */
    
    create table mt_everest_climb as
    select
      path_step,
      st_setsrid(st_point(x, y), 4326) as path_pt
    from
      table(
        tf_raster_graph_shortest_slope_weighted_path(
          raster => cursor(
            select
              st_x(raster_point),
              st_y(raster_point),
              z
            from
              copernicus_30m_mt_everest
          ),
          agg_type => 'AVG',
          bin_dim => 30,
          geographic_coords => TRUE,
          neighborhood_fill_radius => 1,
          fill_only_nulls => FALSE,
          origin_x => 86.01,
          origin_y => 27.01,
          destination_x => 86.9250,
          destination_y => 27.9881,
          slope_weighted_exponent => 4,
          slope_pct_max => 50
        )
      );
    SELECT * FROM TABLE(
        tf_load_point_cloud(
            path => <path>,
            [out_srs => <out_srs>,
            use_cache => <use_cache>,
            x_min => <x_min>,
            x_max => <x_max>,
            y_min => <y_min>,
            y_max => <y_max>]
        )
    )    
    CREATE TABLE wake_co_lidar_test AS
    SELECT
      *
    FROM
      TABLE(
        tf_load_point_cloud(
          path => '/path/to/20150118_LA_37_20066601.laz'
        )
      );
    SELECT
      x, y, z, classification
    FROM
      TABLE(
        tf_load_point_cloud(
          path => '/path/to/las_files/*.las',
          out_srs => 'EPSG:4326',
          use_cache => true,
          y_min => 37.0,
          y_max => 38.0,
          x_min => -123.0,
          x_max => -122.0
        )
      )
    SELECT * FROM TABLE(
      tf_mandelbrot( 
        x_pixels => <x_pixels>,
        y_pixels => <y_pixels>,
        x_min => <x_min>,
        x_max => <x_max>,
        y_min => <y_min>,
        y_max => <y_max>,
        max_iterations => <max_iterations>
      )
    )  
    SELECT * FROM TABLE(
      tf_mandelbrot_cuda( <x_pixels>, <y_pixels>, <x_min>, <x_max>, <y_min>, <y_max>, <max_iterations>
      )
    )
    SELECT * FROM TABLE(
      tf_mandelbrot_float(<x_pixels>, <y_pixels>, <x_min>, <x_max>, <y_min>, <y_max>, <max_iterations>
      )
    )
    SELECT * FROM TABLE(
      tf_mandelbrot_cuda_float( <x_pixels>, <y_pixels>, <x_min>, <x_max>, <y_min>, <y_max>, <max_iterations>
      )
    )
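All four variants compute the same per-pixel escape-time iteration, differing only in precision (double vs. float) and execution target (CPU vs. CUDA). A minimal Python sketch of that iteration (illustrative; `mandelbrot_iterations` is a hypothetical helper):

```python
# Escape-time iteration per pixel: iterate z -> z^2 + c up to
# max_iterations and record how many steps it takes |z| to exceed 2.
def mandelbrot_iterations(cx, cy, max_iterations):
    z = complex(0, 0)
    c = complex(cx, cy)
    for i in range(max_iterations):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iterations  # never escaped: point is in the set

print(mandelbrot_iterations(0.0, 0.0, 100))  # in the set -> 100
print(mandelbrot_iterations(2.0, 2.0, 100))  # escapes after one step
```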
    SELECT * FROM TABLE(
        tf_graph_shortest_path(
            edge_list => CURSOR(
                SELECT node1, node2, distance FROM table
            ),
            origin_node => <origin node>,
            destination_node => <destination node>
        )
    )
    select * from table(
      tf_feature_self_similarity(
        primary_features => cursor(
          select
            primary_key,
            pivot_features,
            metric
          from
            table
          group by
            primary_key,
            pivot_features
        ),
        use_tf_idf => <boolean>))
    SELECT * FROM TABLE(
        tf_point_cloud_metadata(
            path => <path>,
            [x_min => <x_min>,
            x_max => <x_max>,
            y_min => <y_min>,
            y_max => <y_max>]
        )
    )

    One or more columns constituting a compound feature for the search vector. These should match the pivot features in number of sub-features, types, and semantics.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    comparison_metric

    Column denoting the values used as input for the cosine similarity metric computation from the search vector. In many cases, this is simply COUNT(*) such that feature overlaps are weighted by the number of co-occurrences.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    use_tf_idf

    Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation.

    BOOLEAN

    ts

    Column denoting the time at which an event occurred.

    Column<TIMESTAMP(0|3|6|9)>

    min_dwell_seconds

    Constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered record for an entity_id at a site_id to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapse between an entity’s first and last ordered timestamp records at a site, these records are not considered a valid session and a dwell time for that session is not calculated.

    BIGINT (other integer types are automatically cast to BIGINT)

    min_dwell_points

    A constant integer value specifying the minimum number of successive observations (in ts timestamp order) required to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3, but only two consecutive records exist for a user at a site before they move to a new site, no dwell time is calculated for the user.

    BIGINT (other integer types are automatically cast to BIGINT)

    max_inactive_seconds

    A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity id at a given site id is 86500 seconds, the session is considered ended at the first timestamp-ordered record, and a new session is started at the timestamp of the second record.

    BIGINT (other integer types are automatically cast to BIGINT)
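The interaction of the three thresholds above can be illustrated with a small Python sketch (illustrative only, not HEAVY.AI's implementation; the `sessions` helper is hypothetical): records are ordered by timestamp, a gap larger than max_inactive_seconds starts a new session, and only sessions meeting min_dwell_points and min_dwell_seconds are kept.

```python
# Sketch of the sessionization rules for one entity at one site.
def sessions(timestamps, min_dwell_seconds, min_dwell_points, max_inactive_seconds):
    ts = sorted(timestamps)
    runs, current = [], [ts[0]]
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev > max_inactive_seconds:
            runs.append(current)  # gap too large: close the session
            current = []
        current.append(cur)
    runs.append(current)
    return [
        (r[-1] - r[0], len(r))  # (dwell_time_sec, num_dwell_points)
        for r in runs
        if len(r) >= min_dwell_points and r[-1] - r[0] >= min_dwell_seconds
    ]

# A 7200s gap splits the visit into two sessions; only the first one
# has enough points (3) and enough elapsed time (1300s >= 600s).
print(sessions([0, 600, 1300, 8500, 8600], 600, 3, 3600))  # [(1300, 3)]
```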

    prev_site_id

    The site ID for the session preceding the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the last site_id visited was null.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)

    next_site_id

    The site id for the session after the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the next site_id visited was null.

    Column<TEXT ENCODING DICT> | Column<BIGINT> (type will be the same as the site_id input column type)

    session_id

    An auto-incrementing session ID specific/relative to the current entity_id, starting from 1 (first session) up to the total number of valid sessions for an entity_id, such that each valid session dwell time increments the session_id for an entity by 1.

    Column<INT>

    start_seq_id

    The index of the nth timestamp (ts-ordered) record for a given entity denoting the start of the current output row's session.

    Column<INT>

    dwell_time_sec

    The duration in seconds for the session.

    Column<INT>

    num_dwell_points

    The number of records/observations constituting the current output row's session.

    Column<INT>

    LogicalCorrelate

    Operator that performs nested-loop joins.

    LogicalDelta

    Operator that converts a relation to a stream.

    LogicalExchange

    Expression that imposes a particular distribution on its input without otherwise changing its content.

    LogicalFilter

    Expression that iterates over its input and returns elements for which a condition evaluates to true.

    LogicalIntersect

    Expression that returns the intersection of the rows of its inputs.

    LogicalJoin

    Expression that combines two relational expressions according to some condition.

    LogicalMatch

    Expression that represents a MATCH_RECOGNIZE node.

    LogicalMinus

    Expression that returns the rows of its first input minus any matching rows from its other inputs. Corresponds to the SQL EXCEPT operator.

    LogicalProject

    Expression that computes a set of ‘select expressions’ from its input relational expression.

    LogicalSort

    Expression that imposes a particular sort order on its input without otherwise changing its content.

    LogicalTableFunctionScan

    Expression that calls a table-valued function.

    LogicalTableModify

    Expression that modifies a table. Similar to TableScan, but represents a request to modify a table instead of read from it.

    LogicalTableScan

    Reads all the rows from a RelOptTable.

    LogicalUnion

    Expression that returns the union of the rows of its inputs, optionally eliminating duplicates.

    LogicalValues

    Expression for which the value is a sequence of zero or more literal row values.

    LogicalWindow

    Expression representing a set of window aggregates. See Window Functions.

    V100 v2|32 GB|5120|900 GB/sec|Yes
    V100|16 GB|5120|900 GB/sec|Yes
    P100|16 GB|3584|732 GB/sec|Yes
    P40|24 GB|3840|346 GB/sec|No
    T4|16 GB|2560|320 GB/sec

    192|417M
    2|48|384|834M
    3|72|576|1.25B
    4|96|768|1.67B
    5|120|960|2.09B
    6|144|1,152|2.50B
    7|168|1,344|2.92B
    8|192|1,536|3.33B
    12|288|2,304|5.00B
    16|384|3,456|6.67B
    20|480|3,840|8.34B
    24|576|4,608|10.01B
    28|672|5,376|11.68B
    32|768|6,144|13.34B
    40|960|7,680|16.68B
    48|1,152|9,216|20.02B
    56|1,344|10,752|23.35B
    64|1,536|12,288|26.69B
    128|3,072|24,576|53.38B
    256|6,144|49,152|106.68B

    HPE ProLiant DL580 Gen10 Server
    Penguin Computers NVIDIA DGX Workstations
    Thinkmate NVIDIA Tesla GPU Servers
    Colfax NVIDIA DGX Workstations

    z

    Input z-coordinate column or expression of the data to be rasterized.

    Column <FLOAT | DOUBLE>

    agg_type

    The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.

    TEXT ENCODING NONE

    bin_dim

    The width and height of each x/y bin. If geographic_coords is true, the input x/y units are translated to meters according to a local coordinate transform appropriate for the x/y bounds of the data.

    DOUBLE

    geographic_coords

    If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.

    BOOLEAN

    neighborhood_fill_radius

    The radius in bins over which to compute the gaussian blur/filter, such that each output bin is the average value of all bins within neighborhood_fill_radius bins.

    BIGINT

    fill_only_nulls

    Specifies that the gaussian blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).

    BOOLEAN

    origin_x

    The x-coordinate for the starting point for the graph traversal, in input (not bin) units.

    DOUBLE

    origin_y

    The y-coordinate for the starting point for the graph traversal, in input (not bin) units.

    DOUBLE

    destination_x

    The x-coordinate for the destination point for the graph traversal, in input (not bin) units.

    DOUBLE

    destination_y

    The y-coordinate for the destination point for the graph traversal, in input (not bin) units.

    DOUBLE

    slope_weighted_exponent

    The slope between neighboring raster cells is raised to the slope_weighted_exponent power to form the traversal weight. A value of 1 signifies that the raw slopes between neighboring cells should be used; increasing the value above 1 more heavily penalizes paths that traverse steep slopes.

    DOUBLE

    slope_pct_max

    The maximum absolute value of slope (measured as a percentage) between neighboring raster cells that is considered for traversal. A neighboring graph cell with an absolute slope greater than this amount is not considered in the shortest slope-weighted path graph traversal.

    DOUBLE
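How these two parameters interact can be sketched as follows, under the assumed semantics that percent slope between neighboring cells is 100 × rise/run (a Python sketch for illustration, not HEAVY.AI's code; `edge_weight` is hypothetical):

```python
# Sketch: traversal weight between two neighboring cells is the percent
# slope raised to slope_weighted_exponent; cells steeper than
# slope_pct_max are excluded from the graph entirely.
def edge_weight(z_from, z_to, run_m, slope_weighted_exponent, slope_pct_max):
    slope_pct = abs(z_to - z_from) / run_m * 100.0
    if slope_pct > slope_pct_max:
        return None  # too steep: edge not considered for traversal
    return slope_pct ** slope_weighted_exponent

print(edge_weight(100.0, 106.0, 30.0, 4, 50))  # 20% slope, weight 20**4
print(edge_weight(100.0, 130.0, 30.0, 4, 50))  # 100% slope -> excluded
```

With an exponent of 4 as in the Mt. Everest example, a cell twice as steep costs 16 times more to cross, strongly steering the path around steep terrain.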

    use_cache (optional)

    If true, use the internal point cloud cache. Useful for inline querying of the output of tf_load_point_cloud. Turn it off for one-shot queries or when creating a table from the output, because adding data to the cache incurs performance and memory-usage overhead. Defaults to false/off if not specified.

    BOOLEAN

    x_min (optional)

    Min x-coordinate value (in degrees) for the output data.

    DOUBLE

    x_max (optional)

    Max x-coordinate value (in degrees) for the output data.

    DOUBLE

    y_min(optional)

    Min y-coordinate value (in degrees) for the output data.

    DOUBLE

    y_max (optional)

    Max y-coordinate value (in degrees) for the output data.

    DOUBLE

    z

    Point z-coordinate

    Column<DOUBLE>

    intensity

    Point intensity

    Column<INT>

    return_num

    The ordered number of the return for a given LiDAR pulse. The first returns (lowest return numbers) are generally associated with the highest-elevation points for a LiDAR pulse, i.e. the forest canopy will generally have a lower return_num than the ground beneath it.

    Column<TINYINT>

    num_returns

    The total number of returns for a LiDAR pulse. Multiple returns occur when there are multiple objects between the LiDAR source and the lowest ground or water elevation for a location.

    Column<TINYINT>

    scan_direction_flag

    From the ASPRS LiDAR Data Exchange Format Standard: "The scan direction flag denotes the direction at which the scanner mirror was traveling at the time of the output pulse. A bit value of 1 is a positive scan direction, and a bit value of 0 is a negative scan direction."

    Column<TINYINT>

    edge_of_flight_line_flag

    From the ASPRS LiDAR Data Exchange Format Standard: "The edge of flight line data bit has a value of 1 only when the point is at the end of a scan. It is the last point on a given scan line before it changes direction."

    Column<TINYINT>

    classification

    From the ASPRS LiDAR Data Exchange Format Standard: "The classification field is a number to signify a given classification during filter processing. The ASPRS standard has a public list of classifications which shall be used when mixing vendor specific user software."

    Column<SMALLINT>

    scan_angle_rank

    From the ASPRS LiDAR Data Exchange Format Standard: "The angle at which the laser point was output from the laser system, including the roll of the aircraft... The scan angle is an angle based on 0 degrees being NADIR, and –90 degrees to the left side of the aircraft in the direction of flight."

    Column<TINYINT>

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    DOUBLE

    y_min

    DOUBLE

    y_max

    DOUBLE

    max_iterations

    32-bit integer

    password

    User's password.

    is_super

    Set to true if the user is a superuser. Default is false.

    default_db

    User's default database on login.

    can_login

    password

    User's password.

    is_super

    Set to true if the user is a superuser. Default is false.

    default_db

    User's default database on login.

    can_login

    owner

    User name of the database owner.

    DDL - Roles and Privileges
    regex
    Example: Data Security
    DDL - Roles and Privileges

    Set to true (default/implicit) to activate a user.

    When false, the user still retains all defined privileges and configuration settings, but cannot log in to HEAVY.AI. Deactivated users who try to log in receive the error message "Unauthorized Access: User is deactivated."

    Set to true (default/implicit) to activate a user.

    When false, the user still retains all defined privileges and configuration settings, but cannot log in to HEAVY.AI. Deactivated users who try to log in receive the error message "Unauthorized Access: User is deactivated."

    Distance between origin and destination node in directed edge list CURSOR

    Column<INT | BIGINT | FLOAT | DOUBLE>

    origin_node

    The origin node from which to start graph traversal. If the value is not present in edge_list.node1, an empty result set is returned.

    BIGINT | TEXT ENCODED DICT

    Cumulative distance between origin and destination node for shortest path graph traversal.

    Column<INT | BIGINT | FLOAT | DOUBLE> (same type as the distance input column)

    num_edges_traversed

    Number of edges (or "hops") traversed in the graph to arrive at destination_node from origin_node for the shortest path graph traversal between these two nodes.

    Column <INT>

    node1

    Origin node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT>

    node2

    Destination node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT> (must be the same type as node1)

    origin_node

    Starting node in graph traversal. Always equal to input origin_node.

    Column <INT | BIGINT | TEXT ENCODED DICT> (same type as the node1 and node2 input columns)

    destination_node

    Final node in graph traversal. Will be equal to one of the values of the node2 input column.

    Column <INT | BIGINT | TEXT ENCODED DICT> (same type as the node1 and node2 input columns)
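The traversal these functions perform can be sketched with a standard Dijkstra search over the directed edge list (an illustrative Python sketch, not HEAVY.AI's implementation; `shortest_path` is a hypothetical helper):

```python
import heapq

# Dijkstra over a directed edge list of (node1, node2, distance) rows,
# returning (cumulative distance, num_edges_traversed) to the destination.
def shortest_path(edges, origin, destination):
    adj = {}
    for n1, n2, d in edges:
        adj.setdefault(n1, []).append((n2, d))
    heap, best = [(0, 0, origin)], {}
    while heap:
        dist, hops, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a shorter distance
        best[node] = (dist, hops)
        for nxt, d in adj.get(node, []):
            if nxt not in best:
                heapq.heappush(heap, (dist + d, hops + 1, nxt))
    return best.get(destination)  # None if destination is unreachable

edges = [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0)]
print(shortest_path(edges, "A", "C"))  # (3.0, 2): A -> B -> C beats A -> C
```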

    Rendered chart of the output of tf_graph_shortest_paths_distances along an Eastern US time-traversal weighted edge graph. The shortest travel destinations are rendered in blue, and the furthest travel destinations in yellow.

    distance

    distance

    ldap://myServer.myCompany.com/uid=$USERNAME, cn=users, cn=accounts,dc=myCompany,dc=com?memberOf

    ldap-role-query-regex

    Applies a regex filter to find matching roles from the roles in the LDAP server.

    (MyCompany_.*?),

    ldap-superuser-role

    Identifies one of the filtered roles as a superuser role. If a user has this filtered ldap role, the user is marked as a superuser.

    MyCompany_SuperUser

• ldap-role-query-regex is a regular expression that matches the role names. The matching role names are used to grant and revoke privileges in HEAVY.AI. For example, if we created some roles on an IPA LDAP server where the role names begin with MyCompany_ (for example, MyCompany_Engineering, MyCompany_Sales, MyCompany_SuperUser), the regular expression can filter the role names using MyCompany_.
  • ldap-superuser-role is the role/group name for HEAVY.AI users who are superusers once they log on to the HEAVY.AI database. In this example, the superuser role name is MyCompany_SuperUser.

  • A functional HEAVY.AI server, version 4.1 or higher.

    Locate the heavy.conf file and edit it to include the LDAP parameter. For example:

  • Restart the HEAVY.AI server:

  • Edit /etc/openldap/ldap.conf to add the following line:

  • Locate the heavy.conf file and edit it to include the LDAP parameter. For example:

  • Restart the HEAVY.AI server:

  • ldap-uri

    LDAP server host or server URI.

    ldap://myLdapServer.myCompany.com

    ldap-dn

    LDAP distinguished name (DN).

uid=$USERNAME,cn=users,cn=accounts,dc=myCompany,dc=com

    ldap-role-query-url

    Returns the role names a user belongs to in the LDAP.

node1

    Origin node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT>

    node2

    Destination node column in directed edge list CURSOR

    Column<INT | BIGINT | TEXT ENCODED DICT> (must be the same type as node1)

    distance

    Distance between origin and destination node in directed edge list CURSOR

    Column<INT | BIGINT | FLOAT | DOUBLE>

    origin_node

    The origin node to start graph traversal from. If the value is not present in edge_list.node1, an empty result set is returned.

    BIGINT | TEXT ENCODED DICT

    destination_node

    The destination node to finish graph traversal at. If the value is not present in edge_list.node1, an empty result set is returned.

    BIGINT | TEXT ENCODED DICT

    path_step

    The index of this node along the path traversal from origin_node to destination_node, with the first node (the origin_node) indexed as 1.

    Column<INT>

    node

    The current node along the path traversal from origin_node to destination_node. The first node (as denoted by path_step = 1) will always be the input origin_node, and the final node (as denoted by MAX(path_step)) will always be the input destination_node.

    Column<INT | BIGINT | TEXT ENCODED DICT> (same type as the node1 and node2 input columns)

    cume_distance

    The cumulative distance adding all input distance values from the origin_node to the current node.

    Column<INT | BIGINT | FLOAT | DOUBLE> (same type as the distance input column)

    The computed shortest path along a time-traversal weighted road edge graph for the eastern US.

x

    Input x-coordinate column or expression.

    Column<FLOAT | DOUBLE>

    y

    Input y-coordinate column or expression.

    Column<FLOAT | DOUBLE>

    z

    Input z-coordinate column or expression. The output bin value is computed by applying agg_type to the z-values of all points falling in each bin.

    Column<FLOAT | DOUBLE>

    agg_type

    The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.

    TEXT ENCODING NONE

    bin_dim_meters

    The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are already assumed to be in meters.

    DOUBLE

    geographic_coords

    If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.

    BOOLEAN

    neighborhood_fill_radius

    The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius bins.

    BIGINT

    fill_only_nulls

    Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null z-values).

    BOOLEAN

    compute_slope_in_degrees

    If true, specifies the slope should be computed in degrees (with 0 degrees perfectly flat and 90 degrees perfectly vertical). If false, specifies the slope should be computed as a fraction from 0 (flat) to 1 (vertical). In a future release, we are planning to move the default output to percentage slope.

    BOOLEAN

    x

    The x-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input x column/expression)

    y

    The y-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input y column/expression)

    z

    The aggregated z-value (per agg_type) of all input data assigned to a given spatial bin.

    Column<FLOAT | DOUBLE> (same as input z column/expression)

    slope

    The average slope of an output grid cell (in degrees or a fraction between 0 and 1, depending on the argument to compute_slope_in_degrees).

    Column<FLOAT | DOUBLE> (same as input z column/expression)

    aspect

    The direction from 0 to 360 degrees pointing towards the maximum downhill gradient, with 0 degrees being due north and moving clockwise from N (0°) -> NE (45°) -> E (90°) -> SE (135°) -> S (180°) -> SW (225°) -> W (270°) -> NW (315°).

    Column<FLOAT | DOUBLE> (same as input z column/expression)

    Inline generation of slope-field using the above example query, showing the computed slopes over 90-meter binned Copernicus 30m DEM data. Note that this can be done in Immerse using a custom source, and optionally parametrized if desired. The direction of the slope (aspect) is indicated by the direction of the arrows.

path

    The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths.

    TEXT ENCODING NONE

    x_min (optional)

    Min x-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    x_max (optional)

    Max x-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    y_min (optional)

    Min y-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    y_max (optional)

    Max y-coordinate value for point cloud files to retrieve metadata from.

    DOUBLE

    file_path

    Full path for the las or laz file

    Column<TEXT ENCODING DICT>

    file_name

    Filename for the las or laz file

    Column<TEXT ENCODING DICT>

    file_source_id

    File source id per file metadata

    Column<SMALLINT>

    version_major

    LAS version major number

    Column<SMALLINT>

    version_minor

    LAS version minor number

    Column<SMALLINT>

    creation_year

    Data creation year

    Column<SMALLINT>

    is_compressed

    Whether data is compressed, i.e. LAZ format

    Column<BOOLEAN>

    num_points

    Number of points in this file

    Column<BIGINT>

    num_dims

    Number of data dimensions for this file

    Column<SMALLINT>

    point_len

    Not currently used

    Column<SMALLINT>

    has_time

    Whether data has time value

    Column<BOOLEAN>

    has_color

    Whether data contains rgb color value

    Column<BOOLEAN>

    has_wave

    Whether data contains wave info

    Column<BOOLEAN>

    has_infrared

    Whether data contains infrared value

    Column<BOOLEAN>

    has_14_point_format

    Data adheres to 14-attribute standard

    Column<BOOLEAN>

    specified_utm_zone

    UTM zone of data

    Column<INT>

    x_min_source

    Minimum x-coordinate in source projection

    Column<DOUBLE>

    x_max_source

    Maximum x-coordinate in source projection

    Column<DOUBLE>

    y_min_source

    Minimum y-coordinate in source projection

    Column<DOUBLE>

    y_max_source

    Maximum y-coordinate in source projection

    Column<DOUBLE>

    z_min_source

    Minimum z-coordinate in source projection

    Column<DOUBLE>

    z_max_source

    Maximum z-coordinate in source projection

    Column<DOUBLE>

    x_min_4326

    Minimum x-coordinate in lon/lat degrees

    Column<DOUBLE>

    x_max_4326

    Maximum x-coordinate in lon/lat degrees

    Column<DOUBLE>

    y_min_4326

    Minimum y-coordinate in lon/lat degrees

    Column<DOUBLE>

    y_max_4326

    Maximum y-coordinate in lon/lat degrees

    Column<DOUBLE>

    z_min_4326

    Minimum z-coordinate in meters above sea level (AMSL)

    Column<DOUBLE>

    z_max_4326

    Maximum z-coordinate in meters above sea level (AMSL)

    Column<DOUBLE>

    CREATE TABLE arr (

    sia SMALLINT[])

    omnisql> select sia, CARDINALITY(sia) from arr;

sia|EXPR$0
    NULL|NULL
    {}|0
    {NULL}|1
    {1}|1
    {2,2}|2
    {3,3,3}|3

    DOT_PRODUCT(array_col_1, array_col_2)

    Computes the dot product between two arrays of the same length, returning a scalar floating point value. If the input arrays (vectors) are of unit length, the computed dot product will represent the angular similarity of the two vectors.
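As a sketch, a query like the following uses DOT_PRODUCT to rank pairs by angular similarity; the table and column names (doc_embeddings, doc_id, vec_a, vec_b) are illustrative and not from this document:

```sql
-- Illustrative only: a table doc_embeddings with two array columns.
-- If vec_a and vec_b are unit-length, the dot product equals the
-- cosine (angular) similarity of the two vectors.
SELECT
  doc_id,
  DOT_PRODUCT(vec_a, vec_b) AS similarity
FROM doc_embeddings
ORDER BY similarity DESC
LIMIT 10;
```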

    sudo dnf -y install \
    kernel-devel-4.18.0-553.el8_10.x86_64 \
    kernel-headers-4.18.0-553.el8_10.x86_64
    CREATE USER ["]<name>["] (<property> = value,...);
    CREATE USER jason (password = 'HeavyaiRocks!', is_super = 'true', default_db='tweets');
    CREATE USER "pembroke.q.aloysius" (password= 'HeavyaiRolls!', default_db='heavyai');
    DROP USER [IF EXISTS] ["]<name>["];
    DROP USER [IF EXISTS] jason;
DROP USER "pembroke.q.aloysius";
    ALTER USER ["]<name>["] (<property> = value, ...);
    ALTER USER ["]<oldUserName>["] RENAME TO ["]<newUserName>["];
    ALTER USER admin (password = 'HeavyaiIsFast!');
    ALTER USER jason (is_super = 'false', password = 'SilkySmooth', default_db='traffic');
    ALTER USER methuselah RENAME TO aurora;
    ALTER USER "pembroke.q.aloysius" RENAME TO "pembroke.q.murgatroyd";
    ALTER USER chumley (can_login='false');
    CREATE DATABASE [IF NOT EXISTS] <name> (<property> = value, ...);
    CREATE DATABASE test (owner = 'jason');
DROP DATABASE [IF EXISTS] <name>;
    DROP DATABASE IF EXISTS test;
ALTER DATABASE <old_name> RENAME TO <new_name>;
    ALTER DATABASE curmudgeonlyOldDatabase RENAME TO ingenuousNewDatabase;
    ALTER DATABASE <database name> OWNER TO <new_owner>;
    ALTER DATABASE my_database OWNER TO Joe;
REASSIGN [ALL] OWNED BY <old_owner>[, <old_owner>, ...] TO <new_owner>;
    REASSIGN OWNED BY jason, mike TO joe;
    REASSIGN ALL OWNED BY jason, mike TO joe;
SELECT * FROM TABLE(
        tf_graph_shortest_paths_distances(
            edge_list => CURSOR(
                SELECT node1, node2, distance FROM table
            ),
            origin_node => <origin node>
        )
    )
    /* Compute the 10 furthest destination airports as measured by average travel-time
    when departing origin airport 'RDU' (Raleigh-Durham, NC) on United Airlines for the
year 2008, adding 60 minutes for each leg to account for boarding/plane change time
    costs. */
    
    SELECT
      *
    FROM
      TABLE(
        tf_graph_shortest_paths_distances(
          edge_list => CURSOR(
            SELECT
              origin,
              dest,
              /* Add 60 minutes to each leg to account for boarding/plane change costs */
              AVG(airtime) + 60 as avg_airtime
            FROM
              flights_2008
            WHERE
              carrier_name = 'United Air Lines'
            GROUP by
              origin,
              dest
          ),
          origin_node => 'RDU'
        )
      )
    ORDER BY
      distance DESC
    LIMIT
      10;
      
    origin_node|destination_node|distance|num_edges_traversed
    RDU|JFK|803|3
    RDU|LIH|757|2
    RDU|KOA|746|2
    RDU|HNL|735|2
    RDU|OGG|728|2
    RDU|EUG|595|3
    RDU|ANC|586|2
    RDU|SJC|468|2
    RDU|SFO|468|2
    RDU|OAK|468|2
    /* Compute the all-destinations path distances along a time-traversal weighted
edge graph of roads in the Eastern United States from a location in North Carolina, joining to a node locations table to output the lon/lat pairs
    of each destination node. */
    
    select
      destination_node,
      lon,
lat,
      distance,
      num_edges_traversed
    from
      table(
        tf_graph_shortest_paths_distances(
          cursor(
            select
              node1,
              node2,
              traversal_time
            from
              usa_roads_east_time
          ),
          1561955
        )
      ),
      USA_roads_east_coords
    where
      destination_node = node_id
    order by
      distance desc
    limit
      20;
      
    destination_node|lon|lat|distance|num_edges_traversed
    2228153|-69.74701|46.941648|22021532|5387
    324156|-69.67822799999999|46.990543|21916494|5386
    324151|-69.687833|46.933106|21906798|5386
    1372661|-69.64962799999999|46.942144|21830101|5385
    320610|-69.47672399999999|46.967413|21807384|5379
    324152|-69.637714|46.958516|21798959|5385
    1372667|-69.633437|46.95189999999999|21793379|5385
    1372662|-69.63483099999999|46.954334|21786119|5384
    2228156|-69.622767|46.949534|21768541|5383
    1372670|-69.58720599999999|46.942504|21759257|5382
    1372663|-69.62387099999999|46.968569|21741445|5383
    2226724|-69.557773|46.969276|21714682|5381
    324159|-69.607209|46.967823|21709789|5382
    324160|-69.59385999999999|46.967445|21691648|5382
    2228155|-69.59575599999999|46.967461|21688053|5381
    320578|-69.57176699999999|47.067628|21683322|5377
    1372669|-69.58906999999999|46.977104|21675010|5382
    2226740|-69.582106|46.991048|21673764|5379
    320609|-69.55000199999999|46.966089|21668411|5378
    324158|-69.585776|46.973521|21663260|5381
    ldap-uri = "ldaps://myldapserver.mycompany.com"
    ldap-dn = "uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com"
    ldap-role-query-url = "ldaps://myldapserver.mycompany.com/uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(MyCompany_.*?),"
    ldap-superuser-role = "MyCompany_SuperUser"
    sudo systemctl restart heavyaidb
    sudo systemctl restart heavyai_web_server
    TLS_CACERT      /etc/ssl/certs/ca-certificates.crt
    ldap-uri = "ldaps://myldapserver.mycompany.com"
    ldap-dn = "uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com"
    ldap-role-query-url = "ldaps://myldapserver.mycompany.com/uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(MyCompany_.*?),"
    ldap-superuser-role = "MyCompany_SuperUser"
    sudo systemctl restart heavydb
    sudo systemctl restart heavyai_web_server
$ curl --user "uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com" \
    "ldap://myldapserver.mycompany.com/uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    DN: uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com
    memberOf: cn=ipausers,cn=groups,cn=accounts,dc=mycompany,dc=com
    memberOf: cn=MyCompany_SuperUser,cn=roles,cn=accounts,dc=mycompany,dc=com
    memberOf: cn=test,cn=groups,cn=accounts,dc=mycompany,dc=com
    ldap-uri = "ldap://myldapserver.mycompany.com"
    ldap-dn = "uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com"
    ldap-role-query-url = "ldap://myldapserver.mycompany.com/uid=$USERNAME,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(MyCompany_.*?),"
    ldap-superuser-role = "MyCompany_SuperUser"
    sudo systemctl restart heavyai_server
    sudo systemctl restart heavyai_web_server
    scp root@myldapserver:/etc/ipa/ca.crt /etc/pki/ca-trust/source/anchors/ipa-ca.pem
    update-ca-trust
    TLS_CACERT      /etc/pki/tls/certs/ca-bundle.crt
    mkdir /usr/local/share/ca-certificates/ipa
    scp root@myldapserver:/etc/ipa/ca.crt /usr/local/share/ca-certificates/ipa/ipa-ca.pem
    mv /usr/local/share/ca-certificates/ipa/ipa-ca.pem /usr/local/share/ca-certificates/ipa/ipa-ca.crt
    update-ca-certificates
    ldap-uri = "ldap://myldapserver.mycompany.com"
    ldap-dn = "cn=$USERNAME,cn=users,dc=qa-mycompany,dc=com"
    ldap-role-query-url = "ldap:///myldapserver.mycompany.com/cn=$USERNAME,cn=users,dc=qa-mycompany,dc=com?memberOf"
    ldap-role-query-regex = "(HEAVYAI_.*?),"
    ldap-superuser-role = "HEAVYAI_SuperUser"
    ldap-uri = "ldap://myldapserver.mycompany.com"
    ldap-dn = "[email protected]"
    ldap-role-query-url = "ldap:///myldapserver.mycompany.com/OU=MyCompany Users,dc=MyCompany,DC=com?memberOf?sub?(sAMAccountName=$USERNAME)"
    ldap-role-query-regex = "(HEAVYAI_.*?),"
    ldap-superuser-role = "HEAVYAI_SuperUser"
    sudo systemctl restart heavyai_server
    sudo systemctl restart heavyai_web_server
    /* Compute the shortest flight route on United Airlines for the year 2008 as measured
    by flight time between origin airport 'RDU' (Raleigh-Durham, NC) and destination 
    airport 'SAT' (San Antonio, TX), adding 60 minutes for each leg to account for 
    boarding/plane change time costs, and only counting routes that were flown at least
    300 times during the year. */
     
    SELECT
      *
    FROM
      TABLE(
        tf_graph_shortest_path(
          edge_list => CURSOR(
            SELECT
              origin,
              dest,
              /* Add 60 minutes to each leg to account
              for boarding/plane change costs */
              AVG(airtime) + 60 as avg_airtime
            FROM
              flights_2008
            WHERE
              carrier_name = 'United Air Lines'
            GROUP by
              origin,
              dest
            HAVING
              COUNT(*) > 300
          ),
          origin_node => 'RDU',
          destination_node => 'SAT'
        )
      )
    ORDER BY
      path_step
     
    path_step|node|cume_distance
    1|RDU|0
    2|ORD|167
    3|DEN|354
    4|SAT|519
/* Compute the shortest path along a time-traversal weighted
    edge graph of roads in the Eastern United States between a location in North Carolina and
    a location in Maine, joining to a node locations table to output the lon/lat pairs 
    of each node. */
    
    select
      path_step,
      node,
      lon,
      lat,
      cume_distance
    from
      table(
        tf_graph_shortest_path(
          cursor(
            select
              node1,
              node2,
              traversal_time
            from
              usa_roads_east_time
          ),
          1561955,
          1591319
        )
      ),
      USA_roads_east_coords
    where
      node = node_id 
    order by 
      cume_distance desc
    limit 20;
    
    path_step|node|lon|lat|cume_distance
    4380|1591319|-71.55136299999999|43.75256|13442017
    4379|1591989|-71.55174099999999|43.75245|13441199
    4378|1589348|-71.554147|43.752464|13436371
    4377|2315795|-71.554867|43.752489|13434924
    4376|1589286|-71.55497099999999|43.752113|13434214
    4375|1589285|-71.555049|43.751833|13433685
    4374|2315785|-71.555999|43.750704|13431238
    4373|2315973|-71.55798799999999|43.748622|13426553
    4372|2315950|-71.56366299999999|43.746268|13417798
    4371|1589788|-71.56476599999999|43.745765|13416053
    4370|1591997|-71.56484|43.745691|13415884
    4369|1589787|-71.564886|43.745645|13415779
    4368|2315951|-71.56517599999999|43.745353|13415113
    4367|2315952|-71.56659499999999|43.744599|13412756
    4366|1591999|-71.56685899999999|43.744565|13412397
    4365|543394|-71.567357|43.744335|13411606
    4364|543393|-71.567832|43.744116|13410852
    4363|543392|-71.571827|43.743673|13405444
    4362|541181|-71.57268499999999|43.743802|13404271
    4361|1589786|-71.572964|43.743844|13403890
    SELECT * FROM TABLE(
      tf_geo_rasterize_slope(
          raster => CURSOR(
            SELECT 
               x, y, z FROM table
          ),
          agg_type => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'>,
          bin_dim_meters => <meters>, 
          geographic_coords => <true/false>, 
          neighborhood_fill_radius => <radius in bins>,
          fill_only_nulls => <true/false>,
          compute_slope_in_degrees => <true/false>
        )
     ) 
    /* Compute the slope and aspect ratio for a 30-meter Copernicus 
    Digital Elevation Model (DEM) raster, binned to 90-meters */
    
    select
      *
    from
      table(
        tf_geo_rasterize_slope(
          raster => cursor(
            select
              st_x(raster_point),
              st_y(raster_point),
              CAST(z AS float)
            from
              copernicus_30m_mt_everest
          ),
          agg_type => 'AVG',
          bin_dim_meters => 90.0,
          geographic_coords => true,
          neighborhood_fill_radius => 1,
          fill_only_nulls => false,
          compute_slope_in_degrees => true
        )
      )
    order by
      slope desc nulls last
    limit
      20;
      
    x|y|z|slope|aspect
    86.96533511629579|27.96534132281817|6212.096|78.37033|18.09232
    87.23751907091268|27.78489838800869|3793.584|78.17864|125.03
    87.23660262662104|27.78408922686605|3929.989|78.06877|127.629
    86.96625156058742|27.96534132281817|6041.277|78.00574|19.00616
    87.2356861823294|27.78328006572341|3981.662|77.53327|127.3175
    86.96441867200414|27.96615048396082|5869.373|77.3751|20.82031
    86.95800356196267|27.96857796738875|6083.791|77.13709|29.89468
    86.96350222771251|27.96615048396082|6081.35|77.08266|21.6792
    87.23843551520432|27.78570754915134|3630.32|77.04676|125.2154
    86.96441867200414|27.96534132281817|6378.94|76.95021|17.77107
    87.22468885082972|27.81321902800121|4771.554|76.71017|253.2764
    87.2356861823294|27.78247090458076|3520.049|76.63997|113.6511
    87.23660262662104|27.78328006572341|3445.282|76.38319|127.2889
    86.96716800487906|27.96534132281817|5864.711|76.16835|19.27573
    87.23476973803776|27.78166174343812|3945.683|76.13519|102.7789
    86.95708711767104|27.96857796738875|6336.072|76.13168|24.90349
    87.22468885082972|27.81240986685857|4732.937|76.07494|264.7046
    87.23751907091268|27.78408922686605|3367.659|76.0099|126.7463
    86.9589200062543|27.9677688062461|6223.083|75.46346|26.85898
    87.22377240653809|27.81402818914385|4704.619|75.41299|205.3219
    SELECT
      file_name,
      num_points,
      specified_utm_zone,
      x_min_4326,
      x_max_4326,
      y_min_4326,
      y_max_4326
    FROM
      TABLE(
        tf_point_cloud_metadata(
          path => '/home/todd/data/lidar/las_files/*2010_00000*.las'
        )
      )
    ORDER BY
      file_name;
      
    file_name|num_points|specified_utm_zone|x_min_4326|x_max_4326|y_min_4326|y_max_4326
    ARRA-CA_GoldenGate_2010_000001.las|2063102|10|-122.9943066785969|-122.9772226614453|37.97913478250298|37.99265200734278
    ARRA-CA_GoldenGate_2010_000002.las|4755131|10|-122.9943056338411|-122.9772184796481|37.99265416515848|38.00617135784082
    ARRA-CA_GoldenGate_2010_000003.las|4833631|10|-122.9943045883859|-122.9772142950517|38.00617351665583|38.01969067717678
    ARRA-CA_GoldenGate_2010_000004.las|6518715|10|-122.9943035422309|-122.9772101076538|38.01969283699149|38.03320996534712
    ARRA-CA_GoldenGate_2010_000005.las|7508919|10|-122.9943024953755|-122.9772059174526|38.03321212616189|38.04672922234828
    ARRA-CA_GoldenGate_2010_000006.las|7442130|10|-122.9943014478193|-122.977201724446|38.04673138416345|38.06024844817669
    ARRA-CA_GoldenGate_2010_000007.las|5610772|10|-122.9943003995618|-122.9771975286321|38.06025061099263|38.07376764282882
    ARRA-CA_GoldenGate_2010_000008.las|3515095|10|-122.9942993506024|-122.9771933300088|38.07376980664591|38.08728680630115
    ARRA-CA_GoldenGate_2010_000009.las|1689283|10|-122.9942898783015|-122.9771554156435|38.19544116402802|38.20895787388029
    heavysql> \d arr

primary_key

    Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function computes co-similarity. Examples include countries, census block groups, user IDs of website visitors, and aircraft callsigns.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    pivot_features

    One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key entities would be compared only by the census block groups visited, regardless of time overlap.

    Column<TEXT ENCODING DICT | INT | BIGINT>

    metric

    Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is COUNT(*) such that feature overlaps are weighted by the number of co-occurrences.

    Column<INT | BIGINT | FLOAT | DOUBLE>

    use_tf_idf

    Boolean constant denoting whether weighting should be used in the cosine similarity score computation.

    BOOLEAN

    class1

    ID of the first primary key in the pair-wise comparison.

    Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key input column)

    class2

    ID of the second primary key in the pair-wise comparison. Because the computed similarity score for a pair of primary keys is order-invariant, results are output only for orderings such that class1 <= class2. For primary keys of type TEXT ENCODING DICT, the order is based on the internal integer IDs for each string value and not lexicographic ordering.

    Column<TEXT ENCODING DICT | INT | BIGINT> (same type as the primary_key input column)

    similarity_score

    Computed cosine similarity score between each primary_key pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).

    Column<FLOAT>

    Computed similarity score for US airlines for 2008, where similarity is computed by the cosine similarity of the airports each airline departs from, weighted by the number of flights from that airport (using the first example query above, sans LIMIT). Dataset courtesy of the FAA.

  • Writes to the database can be distributed across the nodes, thereby speeding up import.

  • Reads from disk are accelerated.

  • Additional GPUs in a distributed cluster can significantly increase read performance in many usage scenarios. Performance scales linearly, or near linearly, with the number of GPUs, for simple queries requiring little communication between servers.

  • Multiple GPUs across the cluster query data on their local hosts. This allows processing of larger datasets, distributed across multiple servers.

    HEAVY.AI Distributed Cluster Components

    A HEAVY.AI distributed database consists of three components:

    • An aggregator, which is a specialized HeavyDB instance for managing the cluster

    • One or more leaf nodes, each being a complete HeavyDB instance for storing and querying data

    • A String Dictionary Server, which is a centralized repository for all dictionary-encoded items

    Conceptually, a HEAVY.AI distributed database is horizontally sharded across n leaf nodes. Each leaf node holds one nth of the total dataset. Sharding currently is round-robin only. Queries and responses are orchestrated by a HEAVY.AI Aggregator server.

    The HEAVY.AI Aggregator

    Clients interact with the aggregator. The aggregator orchestrates execution of a query across the appropriate leaf nodes. The aggregator composes the steps of the query execution plan to send to each leaf node, and manages their results. The full query execution might require multiple iterations between the aggregator and leaf nodes before returning a result to the client.

    A core feature of the HeavyDB is back-end, GPU-based rendering for data-rich charts such as point maps. When running as a distributed cluster, the backend rendering is distributed across all leaf nodes, and the aggregator composes the final image.

    String Dictionary Server

    The String Dictionary Server manages and allocates IDs for dictionary-encoded fields, ensuring that these IDs are consistent across the entire cluster.

    The server creates a new ID for each new encoded value. For queries returning results from encoded fields, the IDs are automatically converted to the original values by the aggregator. Leaf nodes use the string dictionary for processing joins on encoded columns.

    For moderately sized configurations, the String Dictionary Server can share a host with a leaf node. For larger clusters, this service can be configured to run on a small, separate CPU-only server.

    Replicated Tables

A table is split by default so that each leaf node holds 1/nth of the complete dataset. When you create a table used to provide dimension information, you can improve performance by replicating its contents onto every leaf node using the partitions property. For example:
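A minimal sketch of such a replicated dimension table follows; the table and column names are illustrative, and WITH (partitions = 'REPLICATED') is the property referred to above:

```sql
-- Illustrative small dimension table replicated to every leaf node.
CREATE TABLE airports (
  airport_code TEXT ENCODING DICT(32),
  airport_name TEXT ENCODING DICT(32))
WITH (partitions = 'REPLICATED');
```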

    This reduces the distribution overhead during query execution in cases where sharding is not possible or appropriate. This is most useful for relatively small, heavily used dimension tables.

    Data Loading

    You can load data to a HEAVY.AI distributed cluster using a COPY FROM statement to load data to the aggregator, exactly as with HEAVY.AI single-node processing. The aggregator distributes data evenly across the leaf nodes.
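As a sketch, assuming a CSV file at an illustrative path on the aggregator host:

```sql
-- Run against the aggregator exactly as on a single node;
-- rows are distributed evenly across the leaf nodes.
COPY flights_2008 FROM '/path/to/flights_2008.csv' WITH (header = 'true');
```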

    Data Compression

    Records transferred between systems in a HEAVY.AI cluster are compressed to improve performance. HEAVY.AI uses the LZ4_HC compressor by default. It is the fastest compressor, but has the lowest compression rate of the available algorithms. The time required to compress each buffer is directly proportional to the final compressed size of the data. A better compression rate will likely require more time to process.

    You can specify another compressor on server startup using the runtime flag compressor. Compressor choices include:

    • blosclz

    • lz4

    • lz4hc

    • snappy

    • zlib

    • zstd

    For more information on the compressors used with HEAVY.AI, see also:

• http://blosc.org/pages/synthetic-benchmarks/

    • https://quixdb.github.io/squash-benchmark/

    • https://lz4.github.io/lz4/

    HEAVY.AI does not compress the payload until it reaches a certain size. The default size limit is 512MB. You can change the size using the runtime flag compression-limit-bytes.
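For example, a heavy.conf fragment selecting a different compressor and a lower compression threshold might look like the following sketch; the flag names come from the text above, and the values are illustrative:

```
compressor = "zstd"
compression-limit-bytes = 268435456
```

Here 268435456 bytes corresponds to 256MB, half the default limit.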

    HEAVY.AI Distributed Cluster Example

    This example uses four GPU-based machines, each with a combination of one or more CPUs and GPUs.

    Hostname
    IP
    Role(s)

    Node1

    10.10.10.1

    Leaf, Aggregator

    Node2

    10.10.10.2

    Leaf, String Dictionary Server

Install the HEAVY.AI server on each node. For larger deployments, you can place the installation on a shared drive.

    Set up the configuration file for the entire cluster. This file is the same for all nodes.

    In the cluster.conf file, the location of each leaf node is identified as well as the location of the String Dictionary server.

    Here, dbleaf is a leaf node, and string is the String Dictionary Server. The port each node is listening on is also identified. These ports must match the ports configured on the individual server.

    Each leaf node requires a heavy.conf configuration file.

    The parameter string-servers identifies the file containing the cluster configuration, to tell the leaf node where the String Dictionary Server is.

    The aggregator node requires a slightly different heavy.conf. The file is named heavy-agg.conf in this example.

    heavy-agg.conf

    The parameter cluster tells the HeavyDB instance that it is an aggregator node, and where to find the rest of its cluster.

    If your aggregator node is sharing a machine with a leaf node, there might be a conflict on the calcite-port. Consider changing the port number of the aggregator node to another that is not in use.

    hashtag
    Implementing a HEAVY.AI Distributed Cluster

    Contact HEAVY.AI support for assistance with HEAVY.AI Distributed Cluster implementation.

    ; see that function for details.

    The direct variants require that the input rows represent a rectilinear region of pixels in nonsparse row-major order. The dimensions must also be provided, and (raster_width * raster_height) must match the input row count. The contour processing is then performed directly on the raster values with no preprocessing.

    The line variants generate LINESTRING geometries that represent the contour lines of the raster space at the given interval with the optional given offset. For example, a raster space representing a height field with a range of 0.0 to 1000.0 will likely result in 10 or 11 lines, each with a corresponding contour_values value, 0.0, 100.0, 200.0 etc. If contour_offset is set to 50.0, then the lines are generated at 50.0, 150.0, 250.0, and so on. The lines can be open or closed and can form rings or terminate at the edges of the raster space.

The polygon variants generate POLYGON geometries that represent regions between contour lines (for example, from 0.0 to 100.0, and from 100.0 to 200.0). If the raster space has multiple regions within a given value range, a POLYGON row is output for each of those regions. The corresponding contour_values value for each is the lower bound of the range for that region.

    hashtag
    Rasterizing Variant

    hashtag
    Direct Variant

    hashtag
    Input Arguments

    Parameter
    Description
    Data Types

    lon

    Longitude value of raster point (degrees, SRID 4326).

    Column<FLOAT | DOUBLE>

    lat

    Latitude value of raster point (degrees, SRID 4326).

    Column<FLOAT | DOUBLE> (must be the same as <lon>)

    hashtag
    Output Columns

    Name
    Description
    Data Types

    contour_[lines|polygons]

    Output geometries.

    Column<LINESTRING | POLYGON>

    contour_values

    Raster values associated with each contour geometry.

    Column<FLOAT | DOUBLE> (will be the same type as value)

    hashtag
    Examples

    rasterizing variant
    direct variant
    tf_geo_rasterize

    ENCODE_TEXT(none_encoded_str)

    ENCODE_TEXT(long_str)

    Converts a none-encoded text type to a dictionary-encoded text type.

    The following table shows cast type conversion support.

    FROM/TO:

    TINYINT

    SMALLINT

    INTEGER

    BIGINT

    FLOAT

    DOUBLE

    DECIMAL

    TEXT

    BOOLEAN

    hashtag

    CAST(expr AS type)

    CAST(1.25 AS FLOAT)

Converts an expression to another data type. For conversions from a TEXT type, use TRY_CAST.

    TRY_CAST(text_expr AS type)

TRY_CAST('1.25' AS FLOAT)

Converts text to a non-text type, returning NULL if the conversion cannot be performed.
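As a sketch of the difference, TRY_CAST yields NULL on a failed conversion where CAST would raise an error:

```sql
SELECT TRY_CAST('1.25' AS FLOAT);          /* 1.25 */
SELECT TRY_CAST('not a number' AS FLOAT);  /* NULL */
```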

    tf_geo_rasterize

Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, with the output value for each bin computed by aggregating the z values of all points in that bin. The aggregate performed to compute the value for each bin is specified by agg_type, with allowed aggregate types of AVG, COUNT, SUM, MIN, and MAX. If neighborhood_fill_radius is set greater than 0, a blur pass/kernel is computed on top of the results according to the optionally specified fill_agg_type, with allowed types of GAUSS_AVG, BOX_AVG, COUNT, SUM, MIN, and MAX (if not specified, defaults to GAUSS_AVG, a Gaussian-average kernel). If fill_only_nulls is set to true, only null bins from the first aggregate step have final output values computed from the blur pass; otherwise, all values are affected by the blur pass.

Note that the arguments bounding the spatial output grid (x_min, x_max, y_min, y_max) are optional; however, either all or none of them must be supplied. If they are not supplied, the spatial output grid is bounded by the x/y range of the input query, and if SQL filters are applied to the output of the tf_geo_rasterize table function, those filters also constrain the output range.

    hashtag
    Input Arguments

    Parameter
    Description
    Data Types

    hashtag
    Output Columns

    Name
    Description
    Data Types

    Example
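A sketch of an invocation following the parameter descriptions above; the table and column names (elevation_table, lon, lat, elevation) are illustrative, and the parameter values are arbitrary:

```sql
SELECT x, y, z
FROM TABLE(
  tf_geo_rasterize(
    raster => CURSOR(SELECT lon, lat, elevation FROM elevation_table),
    agg_type => 'MAX',
    bin_dim_meters => 1.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 0,
    fill_only_nulls => FALSE
  )
);
```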

    Install NVIDIA Drivers and Vulkan on Ubuntu

    hashtag
    Installation Prerequisites

    Hardware Check

Check that your system has a compatible NVIDIA GPU. If you don't know the exact GPU model in your system, run this command:
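One common way to list NVIDIA devices on the PCI bus (this is a generic Linux command, not HEAVY.AI-specific, and assumes the pciutils package is installed):

```shell
# Print one line per NVIDIA device found on the PCI bus;
# fall back to a notice when none is present or lspci is unavailable.
lspci 2>/dev/null | grep -i nvidia || echo "No NVIDIA GPU detected"
```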

    This command should output one row per installed NVIDIA GPU. Check that your hardware is compatible on our page. Note that you can still install the CPU Edition of HEAVY.AI on machines that do not have an NVIDIA GPU.

    Window Functions

    Window functions allow you to work with a subset of rows related to the currently selected row. For a given dimension, you can find the most associated dimension by some other measure (for example, number of records or sum of revenue).

    Window functions must always contain an OVER clause. The OVER clause splits up the rows of the query for processing by the window function.

    The PARTITION BY list divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed using all rows in the same partition as the current row.

    Rows that have the same value in the ORDER BY clause are considered peers. The ranking functions give the same answer for any two peer rows.

    SELECT

    The SELECT command returns a set of records from one or more tables.

    For more information, see .

    hashtag
    ORDER BY

    /* Compute similarity of airlines by the airports they fly from */
    
    select
      *
    from
      table(
        tf_feature_self_similarity(
          primary_features => cursor(
            select
              carrier_name,
              origin,
              count(*) as num_flights
            from
              flights_2008
            group by
              carrier_name,
              origin
          ),
          use_tf_idf => false
        )
      )
    where
      similarity_score <= 0.99
    order by
      similarity_score desc
    limit
      20;
      
    class1|class2|similarity_score
    Expressjet Airlines|Continental Air Lines|0.9564615
    Delta Air Lines|Atlantic Southeast Airlines|0.9436753
    Delta Air Lines|AirTran Airways Corporation|0.9379856
    Atlantic Southeast Airlines|AirTran Airways Corporation|0.9326661
    American Eagle Airlines|American Airlines|0.8906327
    Northwest Airlines|Pinnacle Airlines|0.8222722
    Skywest Airlines|United Air Lines|0.6857293
    Mesa Airlines|US Airways|0.6116939
    United Air Lines|Frontier Airlines|0.5921053
    Mesa Airlines|United Air Lines|0.5686765
    United Air Lines|American Eagle Airlines|0.5272493
    Skywest Airlines|Frontier Airlines|0.4684323
    Southwest Airlines|US Airways|0.4166781
    United Air Lines|American Airlines|0.397027
    Comair|JetBlue Airways|0.3631534
    Mesa Airlines|American Eagle Airlines|0.3379275
    Skywest Airlines|American Eagle Airlines|0.3331468
    Mesa Airlines|Skywest Airlines|0.3235496
    Comair|Delta Air Lines|0.3075919
    Southwest Airlines|Mesa Airlines|0.2901711
    
    /* Compute the similarity of US States by the TF-IDF
     weighted cosine similarity of the words tweeted in each state */
     
     select
      *
    from
      table(
        tf_feature_self_similarity(
          primary_features => cursor(
            select
              state_abbr,
              unnest(tweet_tokens),
              count(*)
            from
              tweets_2022_06
            where country = 'US'
            group by
              state_abbr,
              unnest(tweet_tokens)
          ),
          use_tf_idf => TRUE
        )
      )
    where
      class1 <> class2
    order by
      similarity_score desc;
      
    TX|GA|0.9928479
    IL|TN|0.9920474
    IL|NC|0.9920027
    TX|IL|0.9917723
    IN|OH|0.9916649
    TN|NC|0.9915619
    CA|TX|0.9910875
    IN|VA|0.9909871
    CA|IL|0.9909689
    IL|OH|0.9909481
    TX|NC|0.9908867
    IL|MO|0.9907863
    IN|MI|0.990751
    TN|OH|0.9907123
    IL|MD|0.9907106
    OH|NC|0.9905779
    VA|OH|0.990536
    IN|IL|0.9904549
    IN|MO|0.9903805
    TX|TN|0.9903381
    CREATE TABLE flights … WITH (PARTITIONS='REPLICATED')
    [
      {
        "host": "node1",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "node2",
        "port": 16274,
        "role": "dbleaf"
      },
     {
        "host": "node3",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "node4",
        "port": 16274,
        "role": "dbleaf"
      },
    
      {
        "host": "node2",
        "port": 6277,
        "role": "string"
      }
    ]
    port = 16274
    http-port = 16278
    calcite-port = 16279
    data = "<location>/heavyai-storage/nodeLocal/data"
    read-only = false
    string-servers = "<location>/heavyai-storage/cluster.conf"
    port = 6274
    http-port = 6278
    calcite-port = 6279
    data = "<location>/heavyai-storage/nodeLocalAggregator/data"
    read-only = false
    num-gpus = 1
    cluster = "<location>/heavyai-storage/cluster.conf"
    
    [web]
    port = 6273
    frontend = "<location>/prod/heavyai/frontend"
    SELECT
      contour_[lines|polygons],
      contour_values
    FROM TABLE(
      tf_raster_contour_[lines|polygons](
        raster => CURSOR(
          <lon>,
          <lat>,
          <value>
        ),
    agg_type => '<agg_type>',
        bin_dim_meters => <bin_dim_meters>,
        neighborhood_fill_radius => <neighborhood_fill_radius>,
        fill_only_nulls => <fill_only_nulls>,
    fill_agg_type => '<fill_agg_type>',
        flip_latitude => <flip_latitude>,
        contour_interval => <contour_interval>,
        contour_offset => <contour_offset>
      )
    );
    SELECT
      contour_[lines|polygons],
      contour_values
    FROM TABLE(
      tf_raster_contour_[lines|polygons](
        raster => CURSOR(
          <lon>,
          <lat>,
          <value>
        ),
        raster_width => <raster_width>,
        raster_height => <raster_height>,
        flip_latitude => <flip_latitude>,
        contour_interval => <contour_interval>,
        contour_offset => <contour_offset>
      )
    );
    SELECT
      contour_lines,
      contour_values
    FROM TABLE(
      tf_raster_contour_lines(
        raster => CURSOR(
          SELECT
            lon,
            lat,
            elevation
          FROM
            elevation_table
        ),
    agg_type => 'AVG',
        bin_dim_meters => 10.0,
        neighborhood_fill_radius => 0,
        fill_only_nulls => FALSE,
    fill_agg_type => 'AVG',
        flip_latitude => FALSE,
        contour_interval => 100.0,
        contour_offset => 0.0
      )
    );
    SELECT
      contour_polygons,
      contour_values
    FROM TABLE(
      tf_raster_contour_polygons(
        raster => CURSOR(
          SELECT
            lon,
            lat,
            elevation
          FROM
            elevation_table
        ),
        raster_width => 1024,
        raster_height => 1024,
        flip_latitude => FALSE,
        contour_interval => 100.0,
        contour_offset => 0.0
      )
    );

    value

    Raster band value from which to derive contours.

    Column<FLOAT | DOUBLE>

    agg_type

    See tf_geo_rasterize

    bin_dim_meters

    See tf_geo_rasterize

    neighborhood_fill_radius

    See tf_geo_rasterize

    fill_only_nulls

    See tf_geo_rasterize

    fill_agg_type

    See tf_geo_rasterize

    flip_latitude

    Optionally flip resulting geometries in latitude (default FALSE).

    (This parameter may be removed in future releases)

    BOOLEAN

    contour_interval

    Desired contour interval. The function will generate a line at each interval, or a polygon region that covers that interval.

    FLOAT/DOUBLE (must be same type as value)

    contour_offset

    Optional offset for resulting intervals.

    FLOAT/DOUBLE (must be same type as value)

    raster_width

    Pixel width (stride) of the raster data.

    INTEGER

    raster_height

    Pixel height of the raster data.

    INTEGER


    Node3

    10.10.10.3

    Leaf

    Node4

    10.10.10.4

    Leaf

    DATE

    TIME

    TIMESTAMP

    TINYINT

    -

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    No

    No

    No

    SMALLINT

    Yes

    -

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    No

    No

    No

    INTEGER

    Yes

    Yes

    -

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    No

    No

    BIGINT

    Yes

    Yes

    Yes

    -

    Yes

    Yes

    Yes

    Yes

    No

    No

    No

    FLOAT

    Yes

    Yes

    Yes

    Yes

    -

    Yes

    No

    Yes

    No

    No

    No

    DOUBLE

    Yes

    Yes

    Yes

    Yes

    Yes

    -

    No

    Yes

    No

    No

    No

    DECIMAL

    Yes

    Yes

    Yes

    Yes

    Yes

    Yes

    -

    Yes

    No

    No

    No

    TEXT

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    -

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    Yes (Use TRY_CAST)

    BOOLEAN

    No

    No

    Yes

    No

    No

    No

    No

    Yes

    -

    n/a

    n/a

    DATE

    No

    No

    No

    No

    No

    No

    No

    Yes

    n/a

    -

    No

    TIME

    No

    No

    No

    No

    No

    No

    No

    Yes

    n/a

    No

    -

    TIMESTAMP

    No

    No

    No

    No

    No

    No

    No

    Yes

    n/a

    Yes

    No

    n/a

    n/a

    No

    No

    No

    n/a

    n/a

    Yes (Use TRY_CAST)

    n/a

    Yes

    n/a

    -

Z-coordinate column or expression. The output value for each bin is computed by applying agg_type to the z-values of all points falling in that bin.

    Column<FLOAT | DOUBLE>

    agg_type

    The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.

    TEXT ENCODING NONE

    fill_agg_type (optional)

The aggregate to be performed when computing the blur pass on the output bins. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', 'MAX', 'GAUSS_AVG', or 'BOX_AVG'. Note that 'AVG' is synonymous with 'GAUSS_AVG' in this context, and the default fill_agg_type is 'GAUSS_AVG'.

    TEXT ENCODING NONE

    bin_dim_meters

    The width and height of each x/y bin in meters. If geographic_coords is not set to true, the input x/y units are already assumed to be in meters.

    DOUBLE

    geographic_coords

    If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.

    BOOLEAN

    neighborhood_fill_radius

    The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius bins.

    DOUBLE

    fill_only_nulls

    Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).

    BOOLEAN

    x_min (optional)

    Min x-coordinate value (in input units) for the spatial output grid.

    DOUBLE

    x_max (optional)

    Max x-coordinate value (in input units) for the spatial output grid.

    DOUBLE

    y_min (optional)

    Min y-coordinate value (in input units) for the spatial output grid.

    DOUBLE

    y_max (optional)

    Max y-coordinate value (in input units) for the spatial output grid.

    DOUBLE

The aggregate (per agg_type) of the z-coordinates of all input data assigned to a given spatial bin.

    Column<FLOAT | DOUBLE> (same as input z-coordinate column/expression)

    x

    X-coordinate column or expression

    Column<FLOAT | DOUBLE>

    y

    Y-coordinate column or expression

    Column<FLOAT | DOUBLE>

    x

    The x-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input x-coordinate column/expression)

    y

    The y-coordinates for the centroids of the output spatial bins.

    Column<FLOAT | DOUBLE> (same as input y-coordinate column/expression)

Heavy Immerse dashboard showing a raw Tallahassee LiDAR dataset on the left, and a smoothed version on the right using the min z values for each 1-meter binned cell of the LiDAR data, Gaussian-smoothed over the neighboring 100 cells.

    z

    z

    Pre-Installation Updates

Upgrade the system and the kernel, then reboot the machine if needed.

    hashtag
    Install Kernel Headers

    Install kernel headers and development packages.

    Install the extra packages.

    hashtag
    Installing Vulkan Library

    The rendering engine of HEAVY.AI (present in Enterprise Editions) requires a Vulkan-enabled driver and the Vulkan library. Without these components, the database itself may not be able to start.

    Install the Vulkan library and its dependencies using apt.

    For more information about troubleshooting Vulkan, see the Vulkan Renderer section.

    hashtag
    Installing NVIDIA Drivers

    Installing NVIDIA drivers with support for the CUDA platform is required to run GPU-enabled versions of HEAVY.AI.

    Each version of HEAVY.AI has a minimum required driver version, which is documented in the Software Requirements page. You can generally install NVIDIA drivers newer than the minimum required version, but the version listed in our Software Requirements page reflects the NVIDIA driver used for software testing.

You can install NVIDIA drivers in multiple ways. We've outlined three options below; we recommend Option 1.

    • Option 1: Install NVIDIA drivers with CUDA toolkit from NVIDIA Website

    • Option 2: Install NVIDIA drivers via .run file using the NVIDIA Website

    • Option 3: Install NVIDIA drivers using APT package manager

    circle-info

Keep a record of the installation method used; upgrading NVIDIA drivers later requires using the same method.

    hashtag
    What is CUDA? What is the CUDA toolkit?

CUDA is a parallel computing platform and application programming interface (API) model. It uses a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The CUDA platform provides direct access to the GPU virtual instruction set and parallel computation elements. For more information on CUDA unrelated to installing HEAVY.AI, see https://developer.nvidia.com/cuda-zone.

The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications. The CUDA Toolkit includes GPU-accelerated libraries, a compiler, development tools, and the CUDA runtime. The CUDA Toolkit is not required to run HEAVY.AI, but you must install it if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities.

    hashtag
    Option 1: Install NVIDIA Drivers with CUDA Toolkit from NVIDIA Website

Open https://developer.nvidia.com/cuda-toolkit-archive and select the desired CUDA Toolkit version to install.

    circle-info

    The minimum CUDA version supported by HEAVY.AI is 11.4. We recommend using a release that has been available for at least two months.

    In the "Target Platform" section, follow these steps:

    1. For "Operating System" select Linux

2. For "Architecture" select x86_64

    3. For "Distribution" select Ubuntu

    4. For "Version" select the version of your operating system (20.04)

    5. For "Installer Type" choose deb (network) **

    6. One by one, run the presented commands in the Installer Instructions section on your server.

    circle-info

    ** You may optionally use any of the "Installer Type" options available.

If you choose the .run file option, before running the installer you must install build-essential using apt and change the permissions of the downloaded .run file to allow execution.

    hashtag
    Option 2: Install NVIDIA Drivers via .run file using the NVIDIA Website

Install the CUDA package for your platform and operating system according to the instructions on the NVIDIA website (https://www.nvidia.com/download/index.aspx).

If you don't know the exact GPU model in your system, run this command:

You'll get output in the format Product Type, Series, and Model.

In this example, the Product Type is Tesla, the Series is T (for Turing), and the Model is T4.

    1. Select the Product Type as the one you got with the command.

    2. Select the correct Product Series and Product Type for your installation.

    3. In the Operating System dropdown list, select Linux 64-bit.

    4. In the CUDA Toolkit dropdown list, click a supported version (11.4 or higher).

    5. Click Search.

    6. On the resulting page, verify the download information and click Download

    7. On the subsequent page, if you agree to the terms, right click on "Agree and Download" and select "Copy Link Address". You may also manually download and transfer to your server, skipping the next step.

8. On your server, type wget and paste the URL you copied in the previous step. Press Enter to download.

    circle-info

Check that the driver version you are downloading meets the HEAVY.AI minimum requirements.

    Install the tools needed for installation.

    Change the permissions of the downloaded .run file to allow execution, and run the installation.

    hashtag
    Option 3: Install NVIDIA drivers using APT

    Install a specific version of the driver for your GPU by installing the NVIDIA repository and using the apt package manager.

    circle-info

Be careful when choosing the driver version to install. Ensure that your GPU model is supported and that the driver meets the HEAVY.AI minimum requirements.

Run the following command to get a list of the available driver versions:

Install the needed driver version with apt:

    hashtag
    NVIDIA Driver Post-Installation steps

Reboot your system to ensure the new version of the driver is loaded.

    hashtag
    Verify Successful NVIDIA driver installation

    Run nvidia-smi to verify that your drivers are installed correctly and recognize the GPUs in your environment. Depending on your environment, you should see something like this to confirm that your NVIDIA GPUs and drivers are present.

    circle-info

    If you see an error like the following, the NVIDIA drivers are probably installed incorrectly:

    Review the installation instructions, specifically checking for completion of install prerequisites, and correct any errors.

    hashtag
    Install Vulkan library

The rendering engine of HEAVY.AI requires a Vulkan-enabled driver and the Vulkan library. Without these components, the database cannot start unless the back-end renderer is disabled.

    Install the Vulkan library and its dependencies using apt.

    For more information about troubleshooting Vulkan, see the Vulkan Renderer section.

    hashtag
    Advanced Installation

You must install the CUDA toolkit and Clang if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities.

    hashtag
    Install CUDA Toolkit ᴼᴾᵀᴵᴼᴺᴬᴸ

    circle-check

    If you installed NVIDIA drivers using Option 1 above, the CUDA toolkit is already installed; you may proceed to the verification step below.

    Install the NVIDIA public repository GPG key.

    Add the repository.

List the available CUDA toolkit versions.

    Install the CUDA toolkit using apt.

    hashtag
    Verification

    Check that everything is working and the toolkit has been installed.

    hashtag
    Install Clang ᴼᴾᵀᴵᴼᴺᴬᴸ

You must install Clang if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities. Install Clang and LLVM dependencies using apt.

    hashtag
    Verification

    Check that the software is installed and in the execution path.

    For more information, see C++ User-Defined Functions.

    Hardware Requirements
    hashtag
    Supported Window Functions

    Function

    Description

    BACKWARD_FILL(value)

Replaces a null value with the nearest non-null value of the value column, using a backward search.

    For example, for column x, with the current row r at the index K having a NULL value, and assuming column x has N rows (where K < N):

BACKWARD_FILL(x) searches for the first non-NULL value among rows with index K+1 and higher.
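A sketch of BACKWARD_FILL over a time-ordered partition; the table and column names (sensor_data, sensor_id, ts, reading) are hypothetical:

```sql
SELECT
  ts,
  BACKWARD_FILL(reading) OVER (PARTITION BY sensor_id ORDER BY ts) AS filled_reading
FROM sensor_data;
```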

    CONDITIONAL_CHANGE_EVENT(expr)

    For each partition, a zero-initialized counter is incremented every time the result of expr changes as the expression is evaluated over the partition. Requires an ORDER BY clause for the window.

    COUNT_IF(condition_expr)

    Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the number of rows satisfying the given condition_expr, which must evaluate to a Boolean value (TRUE/FALSE) like x IS NULL or x > 1.
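A sketch of COUNT_IF used as a window aggregate over a nonframed partition; the table and column names are hypothetical:

```sql
SELECT
  carrier_name,
  dep_delay,
  COUNT_IF(dep_delay > 15) OVER (PARTITION BY carrier_name) AS num_late_departures
FROM flights_2008;
```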

    HeavyDB supports the aggregate functions AVG, MIN, MAX, SUM, and COUNT in window functions.

    Updates on window functions are supported, assuming the target table is single-fragment. Updates on multi-fragment target tables are not currently supported.

    hashtag
    Example

    This query shows the top airline carrier for each state, based on the number of departures.
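The query below is a sketch of one way to express this with ROW_NUMBER; the state and carrier column names (origin_state, carrier_name) are assumptions, not taken from a specific schema:

```sql
SELECT origin_state, carrier_name, num_departures
FROM (
  SELECT
    origin_state,
    carrier_name,
    COUNT(*) AS num_departures,
    ROW_NUMBER() OVER (
      PARTITION BY origin_state
      ORDER BY COUNT(*) DESC
    ) AS carrier_rank
  FROM flights_2008
  GROUP BY origin_state, carrier_name
) AS ranked
WHERE carrier_rank = 1;
```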

    hashtag
    Window Frames

    A window function can include a frame clause that specifies a set of neighboring rows of the current row belonging to the same partition. This allows us to compute a window aggregate function over the window frame, instead of computing it against the entire partition. Note that a window frame for the current row is computed based on either 1) the number of rows before or after the current row (called rows mode) or 2) the specified ordering column value in the frame clause (called range mode).

    For example:

    • From the starting row of the partition to the current row: Using the sum aggregate function, you can compute the running sum of the partition.

• You can construct a frame based on the position of the rows (called rows mode): for example, the 3 rows before and the 2 rows after the current row:

      • You can compute the aggregate function of the frame having up to six rows (including the current row).

    • You can organize a frame based on the value of the ordering column (called range mode): Assuming C as the current ordering column value, we can compute aggregate value of the window frame which contains rows having ordering column values between (C - 3) and (C + 2).

    Window functions that ignore the frame are evaluated on the entire partition.

    Note that we can define the window frame clause using rows mode with an ordering column.
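For instance, a rows-mode frame computing a moving average over up to six rows (the table and column names are hypothetical):

```sql
SELECT
  ts,
  val,
  AVG(val) OVER (
    PARTITION BY device_id
    ORDER BY ts
    ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING
  ) AS moving_avg
FROM readings;
```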

    You can use the following aggregate functions with the window frame clause.

    hashtag
    Supported Functions

    Category
    Supported Functions

    Frame aggregation

    MIN(val), MAX(val), COUNT(val), AVG(val), SUM(val)

    Frame navigation

    LEAD_IN_FRAME(value, offset)

    LAG_IN_FRAME(value, offset)

    FIRST_VALUE_IN_FRAME

    LAST_VALUE_IN_FRAME

    NTH_VALUE_IN_FRAME

These are window-frame-aware versions of the corresponding navigation functions (LEAD, LAG, FIRST_VALUE, LAST_VALUE, and NTH_VALUE).

    hashtag
    Syntax

<frame_mode> <frame_bound>

<frame_mode> can be one of the following:

    • rows

    • range

    hashtag
    Example

1 | 2 | 3 | 4 | 5.5 | 7.5 | 8 | 9 | 10 → value of each tuple's ORDER BY expression.

    When the current row has a value 5.5:

    • ROWS BETWEEN 3 PRECEDING and 3 FOLLOWING : 3 rows before and 3 rows after → {2, 3, 4, 5.5, 7.5, 8, 9 }

• RANGE BETWEEN 3 PRECEDING and 3 FOLLOWING: 5.5 - 3 <= x <= 5.5 + 3 → { 3, 4, 5.5, 7.5, 8 }

    <frame_bound>:

    • frame_start or

    • frame_between: between frame_start and frame_end

    frame_start and frame_end can be one of the following:

    • UNBOUNDED PRECEDING: The start row of the partition that the current row belongs to.

    • UNBOUNDED FOLLOWING: The end row of the partition that the current row belongs to.

    • CURRENT ROW

      • For rows mode: the current row.

      • For range mode: the peers of the current row. A peer is a row having the same value as the ordering column expression of the current row. Note that all null values are peers of each other.

    • expr PRECEDING

  • For rows mode: the row expr rows before the current row.

      • For range mode: rows with the current row’s ordering expression value minus expr.

    • expr FOLLOWING

  • For rows mode: the row expr rows after the current row.

      • For range mode: rows with the current row’s ordering expression value plus expr.

    UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING have the same meaning in both rows and range mode.

    When the query has no window frame bound, the window aggregate function is computed differently depending on the existence of the ORDER BY clause:

    • Has ORDER BY clause: The window function is computed with the default frame bound, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

• No ORDER BY clause: The window function is computed over the entire partition.

    hashtag
    Named Window Function Clause

    You can refer to the same window clause in multiple window aggregate functions by defining it with a unique name in the query definition.

    For example, you can define the named window clauses W1 and W2 as follows:
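A minimal sketch of such a definition (the table and column names are illustrative):

```sql
SELECT
  val,
  AVG(val) OVER w1 AS partition_avg,
  SUM(val) OVER w2 AS running_sum
FROM t
WINDOW
  w1 AS (PARTITION BY grp),
  w2 AS (PARTITION BY grp ORDER BY ts
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```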

    Named window function clause w1 refers to a window function clause without a window frame clause, and w2 refers to a named window frame clause.

    hashtag
    Notes and Restrictions

    • To use window framing, you may need an ORDER BY clause in the window definition. Depending on the framing mode used, the constraint varies:

  • Rows mode: no restriction on the ordering column; the window can also include multiple ordering columns.

  • Range mode: exactly one ordering column is required (multi-column ordering is not supported).

• Currently, all window functions, including aggregation over a window frame, are computed in CPU mode.

    • For window frame bound expressions, only non-negative integer literals are supported.

    • GROUPING mode and EXCLUDING are not currently supported.

    Sort order defaults to ascending (ASC).
  • Sorts null values after non-null values by default in an ascending sort, before non-null values in a descending sort. For any query, you can use NULLS FIRST to sort null values to the top of the results or NULLS LAST to sort null values to the bottom of the results.

  • Allows you to use a positional reference to choose the sort column. For example, the command SELECT colA,colB FROM table1 ORDER BY 2 sorts the results on colB because it is in position 2.

  • hashtag
    Query Hints

    HEAVY.AI provides various query hints for controlling the behavior of the query execution engine.

    hashtag
    Syntax

    circle-info

SELECT hints must appear first, immediately after the SELECT keyword; otherwise, the query fails.

    By default, a hint is applied to the query step in which it is defined. If you have multiple SELECT clauses and define a query hint in one of those clauses, the hint is applied only to the specific query step; the rest of the query steps are unaffected. For example, applying the /* cpu_mode */ hint affects only the SELECT clause in which it exists.

    You can define a hint to apply to all query steps by prepending g_ to the query hint. For example, if you define /*+ g_cpu_mode */, CPU execution is applied to all query steps.
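For example, using the cpu_mode hint described above (the table name is illustrative):

```sql
/* Apply CPU execution to this query step only */
SELECT /*+ cpu_mode */ COUNT(*) FROM flights_2008;

/* Apply CPU execution to every step of a multi-step query */
SELECT /*+ g_cpu_mode */ carrier_name, COUNT(*)
FROM flights_2008
GROUP BY carrier_name;
```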

    HEAVY.AI supports the following query hints.

    The marker hint type represents a Boolean flag.

    Hint
    Details
    Example

    allow_loop_join

    Enable loop joins.

    SELECT /*+ allow_loop_join */ ...

    cpu_mode

    The key-value pair type is a hint name and its value.

    Hint
    Details
    Example

    hashtag
    Cross-Database Queries

    In Release 6.4 and higher, you can run SELECT queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases. This enables more efficient storage and memory utilization by eliminating the need for table duplication across databases, and simplifies access to shared data and tables.

    To execute queries against another database, you must have ACCESS privilege on that database, as well as SELECT privilege.

    hashtag
    Example

    Execute a join query involving a table in the current database and another table in the my_other_db database:

    SELECTarrow-up-right

    HEAVY.AI Installation on RHEL

    This is an end-to-end recipe for installing HEAVY.AI on a Red Hat Enterprise Linux 8.x machine using CPU or GPU devices.

    triangle-exclamation

    The order of these instructions is significant. To avoid problems, install each component in the order presented.

    circle-info

    The same instructions can be used to install on Rocky Linux / RHEL 9, with some minor modifications.

    hashtag
    Assumptions

    These instructions assume the following:

    • You are installing a "clean" Rocky Linux / RHEL 8 host machine with only the operating system.

    • Your HEAVY.AI host only runs the daemons and services required to support HEAVY.AI.

    • Your HEAVY.AI host is connected to the Internet.

    hashtag
    Preparation

    Prepare your machine by updating your system and optionally enabling or configuring a firewall.

    hashtag
    Update and Reboot

    Update the entire system and reboot the system if needed.

    Install the utilities needed to create HEAVY.AI repositories and download installation binaries.
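    On Rocky Linux / RHEL, these two steps can be sketched as follows (the utility package names are assumptions; adjust to your environment):

    ```shell
    sudo dnf update -y
    sudo reboot        # only needed if the kernel or core libraries were updated

    # Utilities used later in this recipe to add repositories and download archives
    sudo dnf install -y curl wget
    ```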

    hashtag
    JDK

    Follow these instructions to install a headless JDK and configure an environment variable with a path to the library. The “headless” Java Development Kit does not provide support for keyboard, mouse, or display systems. It has fewer dependencies and is best suited for a server host. For more information, see .

    1. Open a terminal on the host machine.

    2. Install the headless JDK using the following command:
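    A minimal sketch of the install command, assuming the OpenJDK 8 headless package (the exact package name may vary by release):

    ```shell
    sudo dnf install -y java-1.8.0-openjdk-headless
    ```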

    hashtag
    Create the HEAVY.AI User

    Create a group called heavyai and a user named heavyai, who will own HEAVY.AI software and data on the file system.

    You can create the group, user, and home directory using the useradd command with the --user-group and --create-home switches:

    Set a password for the user using the passwd command.

    Log in with the newly created user.
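    The three steps above can be sketched as:

    ```shell
    sudo useradd --user-group --create-home heavyai  # group and home directory created in one step
    sudo passwd heavyai                              # set the password interactively
    su - heavyai                                     # log in as the new user
    ```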

    hashtag
    Installation

    There are two ways to install the HEAVY.AI software:

    • Using the DNF package manager, which searches for and installs the software and its dependencies from a configured repository. This is a convenient and efficient way to manage software installations on your system.

    • Installing from a tarball, a compressed archive obtained from the software's official source or repository. After downloading the tarball, you extract its contents and follow the installation instructions provided by the software developers. This method allows for manual installation and customization of the software.

    circle-info

    Using the DNF package manager is highly recommended because it handles dependencies and streamlines the installation process.

    hashtag
    Install NVIDIA Drivers ᴳᴾᵁ ᴼᴾᵀᴵᴼᴺ

    If your system includes NVIDIA GPUs but the drivers are not installed, it is advisable to install them before proceeding with the suite installation.

    See the instructions for installing NVIDIA drivers for details.

    hashtag
    Installing with DNF

    Create a DNF repository depending on the edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you will use.

    Add the GPG-key to the newly added repository.

    Use DNF to install the latest version of HEAVY.AI.
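    A sketch of the repository setup and install follows. The baseurl and GPG key locations shown are placeholders; substitute the URLs published for your edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU).

    ```shell
    # Hypothetical repository definition -- replace baseurl and gpgkey with the
    # URLs provided for your edition and device.
    sudo tee /etc/yum.repos.d/heavyai.repo <<EOF
    [heavyai]
    name=heavyai
    baseurl=https://releases.heavy.ai/ee/rpm/
    enabled=1
    gpgcheck=1
    gpgkey=https://releases.heavy.ai/GPG-KEY-heavyai
    EOF

    sudo dnf install -y heavyai
    ```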

    circle-info

    You can use the DNF package manager to list the available packages when installing a specific version of HEAVY.AI, such as when a multistep upgrade is necessary or a specific version is needed for any other reason: sudo dnf --showduplicates list heavyai

    Select the version needed from the list (e.g., 7.0.0) and install it using the command sudo dnf install heavyai-<version>.

    hashtag
    Installing with a Tarball

    Let's begin by creating the installation directory.

    Download the archive and install the latest version of the software. The appropriate archive is downloaded based on the edition (Enterprise, Free, or Open Source) and the device used for runtime.

    hashtag
    Configuration

    Follow these steps to configure your HEAVY.AI environment.

    hashtag
    Set Environment Variables

    For your convenience, you can update .bashrc with these environment variables

    circle-exclamation

    Although this step is optional, you will find references to the HEAVYAI_BASE and HEAVYAI_PATH variables. These variables contain the paths where configuration, license, and data files are stored and the location of the software installation. It is strongly recommended that you set them up.
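    For example, using the conventional default paths (adjust them to match your installation):

    ```shell
    # Persist the variables for future logins and set them for the current shell.
    # /opt/heavyai and /var/lib/heavyai are the conventional defaults.
    echo 'export HEAVYAI_PATH=/opt/heavyai' >> ~/.bashrc
    echo 'export HEAVYAI_BASE=/var/lib/heavyai' >> ~/.bashrc
    export HEAVYAI_PATH=/opt/heavyai
    export HEAVYAI_BASE=/var/lib/heavyai
    ```
    
    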

    hashtag
    Initialization

    Run the script, located in the systemd folder, that initializes the HEAVY.AI services and database storage.

    Accept the default values provided or make changes as needed.
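    The invocation can be sketched as follows (the script name is an assumption; check the contents of your installation's systemd folder):

    ```shell
    cd $HEAVYAI_PATH/systemd
    ./install_heavy_systemd.sh   # prompts for storage locations and service defaults
    ```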

    circle-info

    This step will take a few minutes if you are installing a CUDA-enabled version of the software because the shaders must be compiled.

    The script creates a data directory in $HEAVYAI_BASE/storage (typically /var/lib/heavyai) with the directories catalogs, data, and log, which contain the database metadata, the data of the database tables, and the log files from Immerse's web server and the database. The log folder is particularly important for database administrators; it contains data about the system's health, performance, and user activities.

    hashtag
    Activation

    The first step to activate the system is starting HeavyDB and the Web Server service that Heavy Immerse needs.

    circle-info

    Heavy Immerse is not available in the OS Edition.

    Start the HEAVY.AI services and enable them to start automatically at reboot.
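    Assuming the default service names heavydb and heavy_web_server, this can be sketched as:

    ```shell
    # --now starts the services immediately; enable makes them start at boot.
    sudo systemctl enable --now heavydb heavy_web_server
    ```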

    hashtag
    Configure the Firewall ᴼᴾᵀᴵᴼᴺᴬᴸ

    If a firewall is not already installed and you want to harden your system, install and start firewalld.

    To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access:
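    With firewalld, opening the default Heavy Immerse (6273) and HeavyDB (6274) ports can be sketched as:

    ```shell
    sudo dnf install -y firewalld
    sudo systemctl enable --now firewalld
    sudo firewall-cmd --permanent --add-port=6273/tcp   # Heavy Immerse web server
    sudo firewall-cmd --permanent --add-port=6274/tcp   # HeavyDB
    sudo firewall-cmd --reload
    ```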

    circle-info

    Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.

    For more information, see .

    hashtag
    Licensing HEAVY.AI ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    If you are on Enterprise or Free Edition, you need to validate your HEAVY.AI instance with your license key. You can skip this section if you are using Open Source Edition.

    1. Copy your license key from the registration email message. If you have not received your license key, contact your Sales Representative or register for your 30-day trial .

    2. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    3. When prompted, paste your license key in the text box and click Apply.

    The $HEAVYAI_BASE directory must be dedicated to HEAVYAI; do not set it to a directory shared by other packages.

    hashtag
    Final Checks

    To verify that everything is working, load some sample data, perform a heavysql query, and generate a Pointmap using Heavy Immerse.

    hashtag
    Load Sample Data and Run a Simple Query

    HEAVY.AI ships with two sample datasets: airline flight information collected in 2008 and a census of New York City trees. To install the sample data, run the following command.

    Connect to HeavyDB by entering the following command in a terminal on the host machine (default password is HyperInteractive):

    Enter a SQL query such as the following:
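    For example, against the flights sample table (the table name may differ by version):

    ```sql
    -- Average airtime for short hops in the 2008 flights sample data.
    SELECT origin_city AS "Origin", dest_city AS "Destination",
           ROUND(AVG(airtime), 1) AS "Average Airtime"
    FROM flights_2008_10k
    WHERE distance < 175
    GROUP BY origin_city, dest_city;
    ```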

    The results should be similar to the results below.

    hashtag
    Create a Dashboard Using Heavy Immerse ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    After installing Enterprise or Free Edition, check if Heavy Immerse is running as intended.

    1. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

    Create a new dashboard and a Scatter Plot to verify that backend rendering is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click SCATTER

    hashtag
    ¹ In the OS Edition, Heavy Immerse is unavailable.

    hashtag
    ² The OS Edition does not require a license key.

    Connecting Using SAML

    Security Assertion Markup Language (SAML) is used for exchanging authentication and authorization data between security domains. SAML uses security tokens containing assertions (statements that service providers use to make decisions about access control) to pass information about a principal (usually an end user) between a SAML authority, named an Identity Provider (IdP), and a SAML consumer, named a Service Provider (SP). SAML enables web-based, cross-domain, single sign-on (SSO), which helps reduce the administrative overhead of sending multiple authentication tokens to the user.

    circle-info

    If you use SAML for authentication to HEAVY.AI, and SAML login fails, HEAVY.AI automatically falls back to log in using LDAP if it is configured.

    If both SAML and LDAP authentication fail, you are authenticated against a locally stored password, but only if the allow-local-auth-fallback flag is set.

    These instructions use Okta as the IdP and HEAVY.AI as the SP in an SP-initiated workflow, similar to the following:

    1. A user uses a login page to connect to HEAVY.AI.

    2. The HEAVY.AI login page redirects the user to the Okta login page.

    3. The user signs in using an Okta account. (This step is skipped if the user is already logged in to Okta.)

    In addition to Okta, the following SAML providers are also supported:

    hashtag
    Registering Your SAML Application in Okta

    Begin by adding your SAML application in Okta. If you do not have an Okta account, you can sign up on the .

    1) Log into your Okta account and click the Admin button.

    2) From the Applications menu, select Applications.

    3) Click the Add Application button.

    4) On the Add Application screen, click Create New App.

    5) On the Create a New Application Integration page, set the following details:

    • Platform: Web

    • Sign on Method: SAML 2.0

      And then, click Create.

    6) On the Create SAML Integration page, in the App name field, type Heavyai and click Next.

    7) In the SAML Settings page, enter the following information:

    • Single sign on URL: Your Heavy Immerse web URL with the suffix saml-post; for example, . Select the Use this for Recipient URL and Destination URL checkbox.

    • Audience URI (SP Entity ID): Your Heavy Immerse web URL with the suffix saml-post.

    • Default RelayState

    Leave other settings at their default values, or change as required for your specific installation.

    After making your selections, click Next.

    8) In the Help Okta Support... page, click I'm an Okta customer adding an internal app. All other questions on this page are optional.

    After making your selections, click Finish.

    Your application is now registered and displayed, and the Sign On tab is selected.

    hashtag
    Configuring SAML for Your HEAVY.AI Application

    circle-exclamation

    Before configuring SAML, make sure that HTTPS is enabled on your web server.

    On the Sign On tab, configure SAML settings for your application:

    1) On the Settings page, click View Setup Instructions.

    2) On the How to Configure SAML 2.0 for HEAVY.AI Application page, scroll to the bottom, copy the XML fragment in the Provide the following IDP metadata to your SP provider box, and save it as a raw text file called idp.xml.

    3) Upload idp.xml to your HEAVY.AI server in $HEAVYAI_STORAGE.

    4) Edit heavy.conf and add the following configuration parameters:

    • saml-metadata-file: Path to the idp.xml file you created.

    • saml-sp-target-url: Web URL to your Heavy Immerse saml-post endpoint.

    • saml-signed-assertion

    5) On the How to Configure SAML 2.0 for HEAVY.AI Application page, copy the Identity Provider Single Sign-On URL, which looks similar to this:

    6) If the servers.json file you identified in the [web] section of heavy.conf does not exist, create it. In servers.json, include the SAMLurl property, using the same value you copied in Identify Provider Single Sign-On URL. For example:

    7) Restart the heavyai_server and heavyai_web_server services.

    hashtag
    Auto-Creating Users with SAML

    Users can be automatically created in HEAVY.AI based on group membership:

    1) Go to the Application Configuration page for the HEAVY.AI application in Okta.

    2) On the General tab, scroll to the SAML Settings section and click the Edit button.

    3) Click the Next button, and then in the Group Attribute Statements section, set the following:

    • Name: Groups

    • Filter: Set to the desired filter type to determine the set of groups delivered to HEAVY.AI through the SAML response. In the text box next to the Filter type drop-down box, enter the text that defines the filter.

    • Click Next, and then click Finish.

    Any group that requires access to HEAVY.AI must be created in HEAVY.AI before users can log in.

    1. Modify your heavyai.conf file by adding the following parameter:

      The heavyai.conf entries now look like this:

    2. Restart the heavyai_server and heavyai_web_server processes.

    Users whose group membership in Okta contains a group name that exists in HeavyDB can log in and have the privileges assigned to their groups.

    hashtag
    Creating Users Manually

    1) On the Okta website, on the Assignments tab, click Assign > Assign to People.

    2) On the Assign HEAVY.AI to People panel, click the Assign button next to users that you want to provide access to HEAVY.AI.

    3) Click Save and Go Back to assign HEAVY.AI to the user.

    4) Repeat steps 2 and 3 for all users to whom you want to grant access. Click Done when you are finished.

    circle-info

    User accounts assigned to the HEAVY.AI application in Okta must exist in HEAVY.AI before a user can log in. To have users created automatically based on their group membership, see .

    hashtag
    Verifying SAML Configuration

    Verify that the SAML is configured correctly by opening your Heavy Immerse login page. You should be automatically redirected to the Okta login page, and then back to Immerse, without entering credentials.

    When you log out of Immerse, you see the following screen:

    circle-info

    Logging out of Immerse does not log you out of Okta. If you log back in to Immerse and are still logged in to Okta, you do not need to reauthenticate.

    If authentication fails, you see this error message when you attempt to log in through Okta:

    To resolve the authentication error:

    1. Add the license information by either:

      • Adding heavyai.license to your HEAVY.AI data directory.

      • Logging in to HeavyDB and running the following command:

    Information about authentication errors can be found in the log files.

    Configuration Parameters for HEAVY.AI Web Server

    Following are the parameters for runtime settings on HeavyAI Web Server. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.

    Flag
    Description
    Default
    SELECT * FROM TABLE(
      tf_geo_rasterize(
          raster => CURSOR(
            SELECT 
               x, y, z FROM table
          ),
          agg_type => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'>,
          /* fill_agg_type is optional */
          [<fill_agg_type> => <'AVG'|'COUNT'|'SUM'|'MIN'|'MAX'|'GAUSS_AVG'|'BOX_AVG'>,] 
          bin_dim_meters => <meters>, 
          geographic_coords => <true/false>, 
          neighborhood_fill_radius => <radius in bins>,
          fill_only_nulls => <true/false> [,
          <x_min> => <minimum output x-coordinate>,
          <x_max> => <maximum output x-coordinate>,
          <y_min> => <minimum output y-coordinate>,
          <y_max> => <maximum output y-coordinate>]
        ) 
      )...
    /* Bin 10cm USGS LiDAR from Tallahassee to 1 meter, taking the minimum z-value
    for each xy-bin. Then for each xy-bin, perform a Gaussian-average over the neighboring
    100 xy-bins. This query yields the approximate terrain for an area after removing human-made
    structures (due to the wide 100-bin Gaussian-average window), as can be seen in the 
    right-hand render result in the screenshot below. Note that the LIMIT was only
    applied to this SQL query and is not used in the rendered-screenshot below. */
    
    SELECT
      x,
      y,
      z
    FROM
      TABLE(
        tf_geo_rasterize(
          raster => CURSOR(
            SELECT
              ST_X(pt),
              ST_Y(pt),
              z
            FROM
              USGS_LPC_FL_LeonCo_2018_049377_N_LAS_2019
          ),
          bin_dim_meters => 1,
          geographic_coords => TRUE,
          neighborhood_fill_radius => 100,
          fill_only_nulls => FALSE,
          agg_type => 'MIN',
          fill_agg_type => 'GAUSS_AVG'
        )
      ) limit 20;
      
    x|y|z
    -84.29857764791747|30.40240526206634|-15.30264
    -84.29086331121893|30.40264801040913|-17.25718
    -84.29856722313815|30.40240526206634|-15.31047
    -84.29855679835883|30.40240526206634|-15.31835
    -84.29085288643959|30.40264801040913|-17.25859
    -84.2985463735795|30.40240526206634|-15.32627
    -84.30278925876371|30.402198476441|-17.09047
    -84.29084246166028|30.40264801040913|-17.25993
    -84.30277883398438|30.402198476441|-17.10194
    -84.29853594880018|30.40240526206634|-15.33422
    -84.30276840920506|30.402198476441|-17.11329
    -84.29083203688096|30.40264801040913|-17.26122
    -84.30275798442574|30.402198476441|-17.12446
    -84.29852552402086|30.40240526206634|-15.34223
    -84.30274755964642|30.402198476441|-17.1354
    -84.29878614350392|30.40263002905041|-14.74146
    -84.29119690415723|30.40236030866953|-17.22919
    -84.30449892257258|30.40238728070761|-15.9867
    -84.29328186002171|30.40223443915845|-17.63177
    -84.29432433795395|30.40263901972977|-17.85748  
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. 
    Make sure that the latest NVIDIA driver is installed and running.
    lspci -v | egrep "NVIDIA"
    sudo apt update
    sudo apt upgrade -y
    sudo reboot
    sudo apt install linux-headers-$(uname -r)
    sudo apt install pciutils
    sudo apt install libvulkan1
    lspci -v | egrep "3D|VGA*.NVIDIA" | awk -F '\[|\]' ' { print $2 } '
    Tesla T4
    sudo apt install build-essential
    chmod +x NVIDIA-Linux-x86_64-*.run
sudo ./NVIDIA-Linux-x86_64-*.run
    apt list nvidia-driver-*
    Listing... Done
    
    nvidia-driver-450/bionic-updates,bionic-security 460.91.03-0ubuntu0.18.04.1 amd64
    nvidia-driver-450-server/bionic-updates,bionic-security 450.172.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-455/bionic-updates,bionic-security 460.91.03-0ubuntu0.18.04.1 amd64
    nvidia-driver-460/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-465/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-470/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-470-server/bionic-updates,bionic-security 470.103.01-0ubuntu0.18.04.1 amd64
    nvidia-driver-495/bionic-updates,bionic-security 510.60.02-0ubuntu0.18.04.1 amd64
    nvidia-driver-510/bionic-updates,bionic-security 510.60.02-0ubuntu0.18.04.1 amd64
    nvidia-driver-510-server/bionic-updates,bionic-security 510.47.03-0ubuntu0.18.04.1 amd64
    sudo apt install nvidia-driver-<version>
    sudo reboot
    sudo apt install libvulkan1
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
    sudo apt-key adv --fetch-keys \
    https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
    echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" \
    | sudo tee /etc/apt/sources.list.d/cuda.list
    apt update
    apt list cuda-toolkit-* | grep -v config
    
    Listing...
    cuda-toolkit-10-0/unknown 10.0.130-1 amd64
    cuda-toolkit-10-1/unknown 10.1.243-1 amd64
    cuda-toolkit-10-2/unknown 10.2.89-1 amd64
    cuda-toolkit-11-0/unknown 11.0.3-1 amd64
    cuda-toolkit-11-1/unknown 11.1.1-1 amd64
    cuda-toolkit-11-2/unknown 11.2.2-1 amd64
    cuda-toolkit-11-3/unknown 11.3.1-1 amd64
    cuda-toolkit-11-4/unknown 11.4.4-1 amd64
    cuda-toolkit-11-5/unknown 11.5.2-1 amd64
    cuda-toolkit-11-6/unknown 11.6.2-1 amd64
    cuda-toolkit-11-7/unknown 11.7.0-1 amd64
    sudo apt install cuda-toolkit-<version>
    /usr/local/cuda/bin/nvcc --version
    
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Nov_30_19:08:53_PST_2020
    Cuda compilation tools, release 11.2, V11.2.67
    Build cuda_11.2.r11.2/compiler.29373293_0
    sudo apt install clang
    clang --version
    clang version 10.0.0-4ubuntu1 
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    select origin_state, carrier_name, n 
       from (select origin_state, carrier_name, row_number() over(
          partition by origin_state order by n desc) as rownum, n 
             from (select origin_state, carrier_name, count(*) as n 
                from flights_2008_7M where extract(year 
                   from dep_timestamp) = 2008 
       group by origin_state, carrier_name )) where rownum = 1
    select min(x) over w1, max(x) over w2 from test window w1 as (order by y), 
      w2 as (partition by y order by z rows between 2 preceding and 2 following);
    query:
      |   WITH withItem [ , withItem ]* query
      |   {
              select
          }
          [ ORDER BY orderItem [, orderItem ]* ]
          [ LIMIT [ start, ] { count | ALL } ]
          [ OFFSET start { ROW | ROWS } ]
    
    withItem:
          name
          [ '(' column [, column ]* ')' ]
          AS '(' query ')'
    
    orderItem:
          expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]
    
    select:
          SELECT [ DISTINCT ] [/*+ hints */]
              { * | projectItem [, projectItem ]* }    
          FROM tableExpression
          [ WHERE booleanExpression ]
          [ GROUP BY { groupItem [, groupItem ]* } ]
          [ HAVING booleanExpression ]
          [ WINDOW window_name AS ( window_definition ) [, ...] ]
    
    projectItem:
          expression [ [ AS ] columnAlias ]
      |   tableAlias . *
    
    tableExpression:
          tableReference [, tableReference ]*
      |   tableExpression [ ( LEFT ) [ OUTER ] ] JOIN tableExpression [ joinCondition ]
    
    joinCondition:
          ON booleanExpression
      |   USING '(' column [, column ]* ')'
    
    tableReference:
          tablePrimary
          [ [ AS ] alias ]
    
    tablePrimary:
          [ catalogName . ] tableName
      |   '(' query ')'
    
    groupItem:
          expression
      |   '(' expression [, expression ]* ')'
    SELECT /*+ hint */ FROM ...;
    SELECT name, saleamt, saledate FROM my_other_db.customers AS c, sales AS s 
      WHERE c.id = s.customerid;
    The default fill_agg_type, if not specified, is GAUSS_AVG.

    Set the maximum number of rows to 100: SELECT /*+ loop_join_inner_table_max_num_rows(100) */ ...

    max_join_hash_table_size

    Set the maximum size of the hash table.

    • Value type: INT

    • Range: 0 < x

    Set the maximum size of the join hash table to 100:

    SELECT /*+ max_join_hash_table_size(100) */ ...

    overlaps_bucket_threshold

    Set the overlaps bucket threshold.

    • Value type: DOUBLE

    • Range: 0-90

    Set the overlaps threshold to 10:

    SELECT /*+ overlaps_bucket_threshold(10.0) */ ...

    overlaps_max_size

    Set the maximum overlaps size.

    • Value type: INTEGER

    • Range: >=0

    Set the maximum overlap to 10: SELECT /*+ overlaps_max_size(10.0) */ ...

    overlaps_keys_per_bin

    Set the number of overlaps keys per bin.

    • Value type: DOUBLE

    • Range: 0.0 < x < double::max

    SELECT /*+ overlaps_keys_per_bin(0.1) */ ...

    query_time_limit

    Set the maximum time for the query to run.

    • Value type: INTEGER

    • Range: >=0

    SELECT /*+ query_time_limit(1000) */ ...

    Force CPU execution mode.

    SELECT /*+ cpu_mode */ ...

    columnar_output

    Enable columnar output for the input query.

    SELECT /*+ columnar_output */ ...

    disable_loop_join

    Disable loop joins.

    SELECT /*+ disable_loop_join */ ...

    dynamic_watchdog

    Enable dynamic watchdog.

    SELECT /*+ dynamic_watchdog */ ...

    dynamic_watchdog_off

    Disable dynamic watchdog.

    SELECT /*+ dynamic_watchdog_off */ ...

    force_baseline_hash_join

    Use the baseline hash join scheme by skipping the perfect hash join scheme, which is used by default.

    SELECT /*+ force_baseline_hash_join */ ...

    force_one_to_many_hash_join

    Deploy a one-to-many hash join by skipping one-to-one hash join, which is used by default.

    SELECT /*+ force_one_to_many_hash_join */ ...

    keep_result

    Add result set of the input query to the result set cache.

    SELECT /*+ keep_result */ ...

    keep_table_function_result

    Add result set of the table function query to the result set cache.

    SELECT /*+ keep_table_function_result */ ...

    overlaps_allow_gpu_build

    Use GPU (if available) to build an overlaps join hash table. (CPU is used by default.)

    SELECT /*+ overlaps_allow_gpu_build */ ...

    overlaps_no_cache

    Skip adding an overlaps join hash table to the hash table cache.

    SELECT /*+ overlaps_no_cache */ ...

    rowwise_output

    Enable row-wise output for the input query.

    SELECT /*+ rowwise_output */ ...

    watchdog

    Enable watchdog.

    SELECT /*+ watchdog */ ...

    watchdog_off

    Disable watchdog.

    SELECT /*+ watchdog_off */ ...

    aggregate_tree_fanout

    Defines the fanout of the tree used to compute window aggregation over a frame. Depending on the frame size, the tree fanout affects the performance of the aggregation and of the tree construction for each window function with a frame clause.

    • Value type: INT

    • Range: 0-1024

    SELECT /*+ aggregate_tree_fanout(32) */ SUM(y) OVER (ORDER BY x ROWS BETWEEN ...) ...

    loop_join_inner_table_max_num_rows

    Set the maximum number of rows available for a loop join.

    • Value type: INT

    • Range: 0 < x

    For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:

    • TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR

    • TIME type: SECOND, MINUTE, and HOUR

    • DATE type: DAY, MONTH, and YEAR

      For example: RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING

  • Currently, only literal expressions (such as 1 PRECEDING and 100 PRECEDING) are supported as expr.

  • For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:

    • TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR

    • TIME type: SECOND, MINUTE, and HOUR

    • DATE type: DAY, MONTH, and YEAR

      For example: RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING

  • Currently, only literal expressions (such as 1 FOLLOWING and 100 FOLLOWING) are supported as expr.
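    Putting range mode and the INTERVAL bounds together, a sketch over a hypothetical rides table:

    ```sql
    -- 7-day trailing sum of fares per driver.
    -- Range mode requires exactly one ordering column (ride_date here).
    SELECT driver_id, ride_date,
           SUM(fare) OVER (
             PARTITION BY driver_id
             ORDER BY ride_date
             RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
           ) AS fare_7d
    FROM rides;
    ```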

  • to N. The NULL value is replaced with the first non-NULL value found.

    At least one ordering column must be defined in the window clause.

    NULLS FIRST ordering of the input value is added automatically for any user-defined ordering of the input value. For example:

    BACKWARD_FILL(x) OVER (PARTITION BY c ORDER BY x) - No ordering is added; ordering already exists on x. BACKWARD_FILL(x) OVER (PARTITION BY c ORDER BY o) - Ordering is added internally for a consistent query result.

    LEAD, LAG, FIRST_VALUE, LAST_VALUE, and NTH_VALUE functions.

    CUME_DIST()

    Cumulative distribution value of the current row: (number of rows preceding or peers of the current row)/(total rows). Window framing is ignored.

    DENSE_RANK()

    Rank of the current row without gaps. This function counts peer groups. Window framing is ignored.

    FIRST_VALUE(value)

    Returns the value from the first row of the window frame (the rows from the start of the partition to the last peer of the current row).

    FORWARD_FILL(value)

    Replace the null value by using the nearest non-null value of the value column, using forward search. For example, for column x, with the current row r at the index K having a NULL value, and assuming column x has N rows (where K < N): FORWARD_FILL(x) searches for the non-NULL value by searching rows with the index starting from K-1 to 1. The NULL value is replaced with the first non-NULL value found. At least one ordering column must be defined in the window clause.

    NULLS FIRST ordering of the input value is added automatically for any user-defined ordering of the input value. For example: FORWARD_FILL(x) OVER (PARTITION BY c ORDER BY x) - No ordering is added; ordering already exists on x. FORWARD_FILL(x) OVER (PARTITION BY c ORDER BY o) - Ordering is added internally for a consistent query result.
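    A sketch with a hypothetical sensor table:

    ```sql
    -- Carry the last non-NULL reading forward within each device.
    SELECT device_id, ts,
           FORWARD_FILL(reading) OVER (PARTITION BY device_id ORDER BY ts) AS reading_filled
    FROM sensor_readings;
    ```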

    LAG(value, offset)

    Returns the value at the row that is offset rows before the current row within the partition. LAG_IN_FRAME is the window-frame-aware version.

    LAST_VALUE(value)

    Returns the value from the last row of the window frame.

    LEAD(value, offset)

    Returns the value at the row that is offset rows after the current row within the partition. LEAD_IN_FRAME is the window-frame-aware version.

    NTH_VALUE(expr,N)

    Returns a value of expr at row N of the window partition.

    NTILE(num_buckets)

    Subdivide the partition into buckets. If the total number of rows is divisible by num_buckets, each bucket has an equal number of rows. If the total is not divisible by num_buckets, the function returns groups of two sizes that differ by 1. Window framing is ignored.

    PERCENT_RANK()

    Relative rank of the current row: (rank-1)/(total rows-1). Window framing is ignored.

    RANK()

    Rank of the current row with gaps. Equal to the row_number of its first peer.

    ROW_NUMBER()

    Number of the current row within the partition, counting from 1. Window framing is ignored.

    SUM_IF(condition_expr)

    Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the sum of all expression values satisfying the given condition_expr. Applies to numeric data types.

    LEAD
    LAG
    FIRST_VALUE
    LAST_VALUE
    Okta returns a base64-encoded SAML Response to the user, which contains a SAML Assertion that the user is allowed to use HEAVY.AI. If configured, it also returns a list of SAML Groups assigned to the user.
  • Okta redirects the user to the HEAVY.AI login page together with the SAML response (a token).

  • HEAVY.AI verifies the token, and retrieves the user name and groups. Authentication and authorization is complete.

  • Oracle Access Managementarrow-up-right

  • Default RelayState: Forward slash (/).
  • Application username: HEAVY.AI recommends using the email address you used to log in to Okta.

  • saml-signed-assertion: Boolean value that determines whether Okta signs the assertion; true by default.
  • saml-signed-response: Boolean value that determines whether Okta signs the response; true by default.

    For example:

  • In the web section, add the full physical path to the servers.json file; for example:

  • Reattempt login through Okta.

    Oktaarrow-up-right
    auth0arrow-up-right
    Ping Identityarrow-up-right
    Keycloakarrow-up-right
    Okta web pagearrow-up-right
    https://tonysingle.com:6273/saml-postarrow-up-right
    Auto-Creating Users with SAML
    saml-metadata-file = "/heavyai-storage/idp.xml"
    saml-sp-target-url = "https://tonysingle.com:6273/saml-post"
    saml-signed-assertion = true
    saml-signed-response = true
    [web]
    enable-https = true
    cert = "/heavyai-storage/ssl/server.crt"
    key = "/heavyai-storage/ssl/server.key"
    servers-json = "/heavyai-storage/servers.json"
    https://heavyai-tony.okta.com/app/heavyaiorg969324_heavyai_2/exk1p0m4blWiBsFiU357/sso/saml
    [
      {
        "enableJupyter": true,
        "url": "tonysingle.com",
        "port": "6273",
        "SAMLurl": "https://heavyai-tony.okta.com/app/heavyaiorg969324_heavyai_2/exk1p0m4blWiBsFiU357/sso/saml"
      }
    ]
    saml-sync-roles = true
    saml-metadata-file = "/heavyai-storage/idp.xml"
    saml-sp-target-url = "https://tonysingle.com:6273/saml-post"
    saml-sync-roles = true
    heavysql> \set_license
    dnf install heavyai-7.0.0_20230501_be4f51b048-1.x86_64

    Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

  • Click Add Data Source.

  • Choose the flights_2008_10k table as the data source.

  • Click X Axis +Add Measure.

  • Choose depdelay.

  • Click Y Axis +Add Measure.

  • Choose arrdelay.

  • Click Size +Add Measure.

  • Choose airtime.

  • Click Color +Add Measure.

  • Choose dest_state.

  • The resulting chart clearly demonstrates a direct correlation between departure delay and arrival delay. This insight can help identify areas for improvement and strategies to minimize delays and improve overall efficiency.

    GPU-Rendered Scatterplot

    Create a new dashboard and a Bubble chart to verify that Heavy Immerse is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click Bubble.

    4. Click Select Data Source.

    5. Choose the flights_2008_10k table as the data source

    6. Click Add Dimension.

    7. Choose carrier_name.

    8. Click Add Measure.

    9. Choose depdelay.

    10. Click Add Measure.

    11. Choose arrdelay.

    12. Click Add Measure.

    13. Choose #Records.

    The resulting chart shows, unsurprisingly, that average departure delay is also correlated with average arrival delay, while the averages differ considerably between carriers.

    https://openjdk.java.netarrow-up-right
    DNF Installation
    Tarball Installation
    Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL
    ¹
    https://fedoraproject.org/wiki/Firewalld?rd=FirewallDarrow-up-right
    ²
    herearrow-up-right

    Allows for a CORS exception to the same-origin policy. Required to be true if Immerse is hosted on a domain or subdomain different from the one hosting heavy_web_server and heavydb.

    Allowing any origin is a less secure mode than what heavy_web_server requires by default.

    --allow-any-origin = false

    -b | backend-url <string>

    URL to http-port on heavydb. Change to avoid collisions with other services.

    http://localhost:6278

    -B | binary-backend-url <string>

    URL to http-binary-port on heavydb.

    http://localhost:6276

    cert string

    Certificate file for HTTPS. Change for testing and debugging.

    cert.pem

    -c | config <string>

    Path to HeavyDB configuration file. Change for testing and debugging.

    -d | data <string>

    Path to HeavyDB data directory. Change for testing and debugging.

    data

    data-catalog <string>

    Path to data catalog directory.

    n/a

    docs string

    Path to documentation directory. Change if you move your documentation files to another directory.

    docs

    enable-binary-thrift

    Use the binary thrift protocol.

    TRUE[1]

    enable-browser-logs [=arg]

    Enable access to current log files via web browser. Only super users (while logged in) can access log files.

    Log files are available at http[s]://host:port/logs/log_name.

    The web server log files: ACCESS - http[s]://host:port/logs/access ALL - http[s]://host:port/logs/all

    HeavyDB log files: INFO - http[s]://host:port/logs/info WARNING - http[s]://host:port/logs/warning ERROR - http[s]://host:port/logs/

    FALSE[0]

    enable-cert-verification

    TLS certificate verification is a security measure that can be disabled when TLS certificates are not issued by a trusted certificate authority. If using a locally or unofficially generated TLS certificate to secure the connection between heavydb and heavy_web_server, this parameter must be set to false. heavy_web_server expects a trusted certificate authority by default.

    --enable-cert-verification = true

    enable-cross-domain [=arg]

    Enable frontend cross-domain authentication. Cross-domain session cookies require the SameSite = None; Secure headers. Can only be used with HTTPS domains; requires enable-https to be true.

    FALSE[0]

    enable-https

    Enable HTTPS support. Change to enable secure HTTP.

    enable-https-authentication

    Enable PKI authentication.

    enable-https-redirect [=arg]

    Enable a new port that heavy_web_server listens on for incoming HTTP requests. When received, it returns a redirect response to the HTTPS port and protocol, so that browsers are immediately and transparently redirected. Use to provide a HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80, and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default HTTPS port 443, and have requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE and http-to-https-redirect-port = 80

    FALSE[0]

    enable-non-kernel-time-query-interrupt

    Enable non-kernel-time query interrupt.

    TRUE[1]

    enable-runtime-query-interrupt

    Enable runtime query interrupt.

    TRUE[1]

    enable-upload-extension-check

    Enables a restrictive file-extension check for uploaded files.

    encryption-key-file-path <string>

    Path to the file containing the credential payload cipher key. Key must be 256 bits in length.

    -f | frontend string

    Path to frontend directory. Change if you move the location of your frontend UI files.

    frontend

    http-to-https-redirect-port = arg

    Configures the http (incoming) port used by enable-https-redirect. The port option specifies the redirect port number. Use to provide a HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80, and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default HTTPS port 443, and have requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE and http-to-https-redirect-port = 80

    6280

    idle-session-duration = arg

    Idle session default, in minutes.

    60

    jupyter-prefix-string <string>

    Jupyter Hub base_url for Jupyter integration.

    /jupyter

    jupyter-url-string <string>

    URL for Jupyter integration.

    -j |jwt-key-file

    Path to a key file for client session encryption.

    The file is expected to be a PEM-formatted ( .pem ) certificate file containing the unencrypted private key in PKCS #1, PKCS #8, or ASN.1 DER form.

    Example PEM file creation using OpenSSL.
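A minimal sketch of such an OpenSSL invocation, wrapped in Python here for illustration (the file name jwt-key.pem and the 2048-bit key size are arbitrary choices, not requirements; the openssl CLI must be on PATH):

```python
# Generate an unencrypted RSA private key in PEM form,
# suitable for use with --jwt-key-file.
import subprocess

subprocess.run(
    ["openssl", "genrsa", "-out", "jwt-key.pem", "2048"],
    check=True,
)
```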

    Required only if using a high-availability server configuration or another server configuration that requires an instance of Immerse to talk to multiple heavy_web_server instances.

    Each heavy_web_server instance needs to use the same encryption key to encrypt and decrypt client session information which is used for session persistence ("sessionization") in Immerse.

    key <string>

    Key file for HTTPS. Change for testing and debugging.

    key.pem

    max-tls-version

    Refers to the version of TLS encryption used to secure web protocol connections. Specifies a maximum TLS version.

    min-tls-version

    Refers to the version of TLS encryption used to secure web protocol connections. Specifies a minimum TLS version.

    --min-tls-version = VersionTLS12

    peer-cert <string>

    Peer CA certificate PKI authentication.

    peercert.pem

    -p | port int

    Frontend server port. Change to avoid collisions with other services.

    6273

    -r | read-only

    Enable read-only mode. Prevent changes to the data.

    secure-acao-uri

    If set, ensures that all Access-Control-Allow-Origin headers are set to the value provided.

    servers-json <string>

    Path to servers.json. Change for testing and debugging.

    session-id-header <string>

    Session ID header.

    immersesid

    ssl-cert <string>

    SSL validated public certificate.

    sslcert.pem

    ssl-private-key <string>

    SSL private key file.

    sslprivate.key

    strip-x-headers <strings>

    List of custom X http request headers to be removed from incoming requests. Use --strip-x-headers="" to allow all X headers through.

    [X-HeavyDB-Username]

    timeout duration

    Maximum request duration in #h#m#s format. For example 0h30m0s represents a duration of 30 minutes. Controls the maximum duration of individual HTTP requests. Used to manage resource exhaustion caused by improperly closed connections. This also limits the execution time of queries made over the Thrift HTTP transport. Increase the duration if queries are expected to take longer than the default duration of one hour; for example, if you COPY FROM a large file when using heavysql with the HTTP transport.

    1h0m0s

    tls-cipher-suites <strings>

    Refers to the combination of algorithms used in TLS encryption to secure web protocol connections.

    All available TLS cipher suites compatible with HTTP/2:

    • TLS_RSA_WITH_RC4_128_SHA

    • TLS_RSA_WITH_AES_128_CBC_SHA

    The following cipher suites are accepted by default:

    • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

    • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256

    tls-curves <strings>

    Refers to the types of Elliptic Curve Cryptography (ECC) used in TLS encryption to secure web protocol connections.

    All available TLS elliptic Curve IDs:

    • secp256r1 (Curve ID P256)

    • CurveP256 (Curve ID P256)

    The following TLS curves are accepted by default:

    • CurveP521

    • CurveP384

    tmpdir string

    Path for temporary file storage. Used as a staging location for file uploads. Consider locating this directory on the same file system as the HEAVY.AI data directory. If not specified on the command line, heavy_web_server recognizes the standard TMPDIR environment variable as well as a specific HEAVYAI_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command-line argument nor one of the environment variables, the default, /tmp/, is used.

    /tmp

    ultra-secure-mode

    Enables secure mode that sets Access-Control-Allow-Origin headers to --secure-acao-uri and sets security headers like X-Frame-Options, Content-Security-Policy, and Strict-Transport-Security.

    -v | verbose

    Enable verbose logging. Adds log messages for debugging purposes.

    version

    Return version.

    additional-file-upload-extensions <string>

    Denote additional file extensions for uploads. Has no effect if --enable-upload-extension-check is not set.

    allow-any-origin

    HEAVY.AI Installation on Ubuntu

    This is an end-to-end recipe for installing HEAVY.AI on a Ubuntu 22.04 machine using CPU and GPU devices.

    circle-exclamation

    The order of these instructions is significant. To avoid problems, install each component in the order presented.

    hashtag
    Assumptions

    These instructions assume the following:

    • You are installing on a “clean” Ubuntu 22.04 host machine with only the operating system installed.

    • Your HEAVY.AI host only runs the daemons and services required to support HEAVY.AI.

    • Your HEAVY.AI host is connected to the Internet.

    hashtag
    Preparation

    Prepare your Ubuntu machine by updating your system, creating the HEAVY.AI user (named heavyai), installing kernel headers, installing CUDA drivers, and optionally enabling the firewall.

    hashtag
    Update and Reboot

    1. Update the entire system:

    2. Install the utilities needed to create Heavy.ai repositories and download archives:

    3. Install the headless JDK and the utility apt-transport-https:

    4. Reboot to activate the latest kernel:

    hashtag
    Create the HEAVY.AI User

    Create a group called heavyai and a user named heavyai, who will be the owner of the HEAVY.AI software and data on the filesystem.

    1. Create the group, user, and home directory using the useradd command with the --user-group and --create-home switches.

    2. Set a password for the user:

    3. Log in with the newly created user:

    hashtag
    Installation

    Install HEAVY.AI using APT or a tarball.

    circle-info

    Installation using the APT package manager is recommended for those who want a more automated install and upgrade procedure.

    hashtag
    Install NVIDIA Drivers ᴳᴾᵁ ᴼᴾᵀᴵᴼᴺ

    If your system uses NVIDIA GPUs but the drivers are not installed, install them now. See for details.

    hashtag
    Installing with APT

    Download and add a GPG key to APT.

    Add an APT source depending on the edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you are going to use.

    Use apt to install the latest version of HEAVY.AI.

    circle-info

    If you need to install a specific version of HEAVY.AI, for example because you are upgrading from OmniSci, run the following command:

    hashtag
    Installing with a Tarball

    First create the installation directory.

    Download the archive and install the software. A different archive is downloaded depending on the Edition (Enterprise, Free, or Open Source) and the device used for runtime (GPU or CPU).

    hashtag
    Configuration

    Follow these steps to prepare your HEAVY.AI environment.

    hashtag
    Set Environment Variables

    For convenience, you can update .bashrc with these environment variables

    circle-exclamation

    Although this step is optional, you will find references to the HEAVYAI_BASE and HEAVYAI_PATH variables. These variables contain, respectively, the path where configuration, license, and data files are stored and the path where the software is installed. Setting them is strongly recommended.

    hashtag
    Initialization

    Run the systemd installer to create heavyai services, a minimal config file, and initialize the data storage.

    Accept the default values provided or make changes as needed.

    The script creates a data directory in $HEAVYAI_BASE/storage (default /var/lib/heavyai/storage) with the directories catalogs, data, export, and log. The import directory is created the first time you insert data. If you are the HEAVY.AI administrator, the log directory is of particular interest.

    hashtag
    Activation

    Start and use HeavyDB and Heavy Immerse.

    Heavy Immerse is not available in the OSS Edition, so if you are running the OSS Edition, the systemctl commands for heavy_web_server have no effect.

    Enable the automatic startup of the service at reboot and start the HEAVY.AI services.

    hashtag
    Configure Firewall ᴼᴾᵀᴵᴼᴺᴬᴸ

    If a firewall is not already installed and you want to harden your system, install the ufw package.

    To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access.

    circle-info

    Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.

    For more information, see .

    hashtag
    Licensing HEAVY.AI ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    If you are using Enterprise or Free Edition, you need to validate your HEAVY.AI instance with your license key.

    circle-exclamation

    Skip this section if you are on Open Source Edition

    1. Copy your Enterprise or Free Edition license key from the registration email message. If you do not have a license and want to evaluate HEAVY.AI in an unlimited enterprise environment, contact your Sales Representative or register for a 30-day trial of Enterprise Edition. If you need a Free license, you can get one.

    2. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    hashtag
    Final Checks

    To verify that everything is working, load some sample data, perform a heavysql query, and generate a Pointmap using Heavy Immerse.

    hashtag
    Load Sample Data and Run a Simple Query

    HEAVY.AI ships with sample datasets: two of airline flight information collected in 2008, and a 2015 census of New York City trees. To install sample data, run the following command.

    Connect to HeavyDB by entering the following command in a terminal on the host machine (default password is HyperInteractive):

    Enter a SQL query such as the following

    The results should be similar to the results below.

    hashtag
    Create a Dashboard Using Heavy Immerse ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

    After installing Enterprise or Free Edition, check if Heavy Immerse is running as intended.

    1. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

    Create a new dashboard and a Scatter Plot to verify that backend rendering is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    hashtag
    ¹ In the OS Edition, Heavy Immerse is unavailable.

    hashtag
    ² The OS Edition does not require a license key.

    Release Notes

    Release notes for currently supported releases

    circle-info

    Use of HEAVY.AI is subject to the terms of the .

    hashtag

    Importing Geospatial Data

    circle-info

    If there is a potential for duplicate entries and you want to avoid loading duplicate rows, see on the Troubleshooting page.

    hashtag
    Importing Geospatial Data Using Heavy Immerse

    System Table Functions

    HEAVY.AI provides access to a set of system-provided table functions, also known as table-valued functions (TVFs). System table functions, like user-defined table functions, support execution of queries on both CPU and GPU over one or more SQL result-set inputs. Table function support in HEAVY.AI can be split into two broad categories: system table functions and user-defined table functions (UDTFs). System table functions are built into the HEAVY.AI server, while UDTFs can be declared dynamically at run time by specifying them in a subset of the Python language. For more information on UDTFs, see .

    To improve performance, table functions can be declared to enable filter pushdown optimization, which allows the Calcite optimizer to "push down" filters on the output(s) of a table function to its input(s) when the inputs and outputs are declared to be semantically equivalent (for example, a longitude variable that is input and output from a table function). This can significantly increase performance in cases where only a small portion of one or more input tables is required to compute the filtered output of a table function.

    Whether system- or user-provided, table functions can execute over one or more result sets specified by subqueries, and can also take any number of additional constant literal arguments specified in the function definition. SQL subquery inputs can consist of any SQL expression (including multiple subqueries, joins, and so on) allowed by HeavyDB, and the output can be filtered, grouped by, joined, and so on like a normal SQL subquery, including being input into additional table functions by wrapping it in a CURSOR.

    sudo dnf -y update
    sudo reboot
    sudo dnf -y install dnf-utils curl libldap2-dev
    sudo dnf -y install java-1.8.0-openjdk-headless
    sudo useradd --user-group --create-home --groups wheel heavyai
    sudo passwd heavyai
    sudo su - heavyai
    sudo dnf config-manager --add-repo \
    https://releases.heavy.ai/ee/yum/stable/cuda
    sudo yum-config-manager --add-repo \
    https://releases.heavy.ai/ee/yum/stable/cpu
    sudo yum-config-manager --add-repo \
    https://releases.heavy.ai/os/yum/stable/cuda
    sudo yum-config-manager --add-repo \
    https://releases.heavy.ai/os/yum/stable/cpu
    sudo dnf config-manager --save \
    --setopt="releases.heavy*.gpgkey=https://releases.heavy.ai/GPG-KEY-heavyai"
    sudo dnf -y install heavyai.x86_64
    sudo mkdir /opt/heavyai && sudo chown $USER /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-render.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    echo "# HEAVY.AI variable and paths
    export HEAVYAI_PATH=/opt/heavyai
    export HEAVYAI_BASE=/var/lib/heavyai
    export HEAVYAI_LOG=\$HEAVYAI_BASE/storage/log
    export PATH=\$HEAVYAI_PATH/bin:$PATH" \
    >> ~/.bashrc
    source ~/.bashrc
    cd $HEAVYAI_PATH/systemd
    ./install_heavy_systemd.sh
    sudo systemctl enable heavydb --now
    sudo systemctl enable heavy_web_server --now
    sudo systemctl enable heavydb --now
    sudo dnf -y install firewalld
    sudo systemctl start firewalld
    sudo systemctl enable firewalld
    sudo systemctl status firewalld
    sudo firewall-cmd --zone=public --add-port=6273-6274/tcp --add-port=6278/tcp --permanent
    sudo firewall-cmd --reload
    cd $HEAVYAI_PATH
    sudo ./insert_sample_data --data /var/lib/heavyai/storage
    #     Enter dataset number to download, or 'q' to quit:
    Dataset           Rows    Table Name          File Name
    1)    Flights (2008)    7M      flights_2008_7M     flights_2008_7M.tar.gz
    2)    Flights (2008)    10k     flights_2008_10k    flights_2008_10k.tar.gz
    3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
    $HEAVYAI_PATH/bin/heavysql -p HyperInteractive
    SELECT origin_city AS "Origin", 
    dest_city AS "Destination", 
    AVG(airtime) AS "Average Airtime" 
    FROM flights_2008_10k WHERE distance < 175 
    GROUP BY origin_city, dest_city;
    Origin|Destination|Average Airtime
    Austin|Houston|33.055556
    Norfolk|Baltimore|36.071429
    Ft. Myers|Orlando|28.666667
    Orlando|Ft. Myers|32.583333
    Houston|Austin|29.611111
    Baltimore|Norfolk|31.714286

    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

  • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256

  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

  • TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

  • TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305

  • TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305

  • TLS_AES_128_GCM_SHA256

  • TLS_AES_256_GCM_SHA384

  • TLS_CHACHA20_POLY1305_SHA256

  • TLS_FALLBACK_SCSV


    Limit security vulnerabilities by specifying the allowed TLS ciphers in the encryption used to secure web protocol connections.

  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

  • TLS_RSA_WITH_AES_256_GCM_SHA384

  • secp384r1 (Curve ID P384)

  • CurveP384 (Curve ID P384)

  • secp521r1 (Curve ID P521)

  • CurveP521 (Curve ID P521)

  • x25519 (Curve ID X25519)

  • X25519 (Curve ID X25519)

    Limit security vulnerabilities by specifying the allowed TLS cipher suites in the encryption used to secure web protocol connections.

  • CurveP256
    http-to-https-redirect-portarrow-up-right
    enable-https-redirectarrow-up-right
    http-to-https-redirect-portarrow-up-right
    Currently Supported Releases

    8.5.2 | 8.5.1 | 8.5.0 | 8.4.0 | 8.3.2 | 8.3.1 | 8.3.0 | 8.2.0 | 8.1.2 & 8.1.3 | 8.1.1 | 8.0.2 | 8.0.1 | 8.0.0 | 7.2.4 | 7.2.3 | 7.2.2 | 7.2.1 | 7.2.0 | 7.1.2 | 7.1.1 | 7.1.0 | 7.0.2 | 7.0.1 | 7.0.0

    For release notes for releases that are no longer supported, as well as links to documentation for those releases, see Archived Release Notes.

    circle-exclamation

    As with any software upgrade, it is important to back up your data before you upgrade HEAVY.AI. In addition, we recommend testing new releases before deploying in a production environment.

    For assistance during the upgrade process, contact HEAVY.AI Support by logging a request through the HEAVY.AI Support Portalarrow-up-right.

    hashtag
    Release 8.x.x

    circle-exclamation

    IMPORTANT - In HeavyDB Release 8.x.x, the system catalog is automatically migrated to support the new Column Level Security feature. This migration occurs regardless of whether you intend to use the feature. Once the system catalog has been migrated in this manner, it is not backward-compatible with earlier versions of HeavyDB. If you revert to an earlier version in this state, the system will be unstable and manual intervention will be required. We recommend backing up your data before you upgrade HEAVY.AI.

    circle-exclamation

    8.x.x introduces a new licensing version that features two new types of licenses: Node Locked Licenses and Floating Licenses. Enterprise customers upgrading to the 8.x release of HEAVY.AI must contact the HEAVY.AI Customer Success team for a new license before attempting to upgrade.

    hashtag
    Release 8.5.2

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a race condition that could occur when concurrently accessing the memory_summary and memory_details system tables.

    • Fixes an intermittent crash that could occur for join queries with row-level security enabled.

    • Fixes a crash that could occur for certain query patterns involving the IN operator and subqueries with text projections.

    • Fixes a crash that could occur when the APPROX_PERCENTILE function is called with invalid arguments.

    hashtag
    Heavy Render - New Features and Improvements

    • Improved logging around renderer device initialization and clean up

    hashtag
    Heavy Render - Fixed Issues

    • Fix for intermittent server crash when restarting the renderer after a CPU out-of-memory event

    hashtag
    HeavyImmerse - Fixed Issues

    • Combo chart with group by dimension displaying incorrect values in legend

    hashtag
    HeavyIQ / HeavyLM Fixed Issues

    • Fix chromadb index creation bug and added support for importing faiss module on master process

    hashtag
    Release 8.5.1

    hashtag
    HeavyImmerse - Fixed Issues

    • Crossfiltering a chart with a non-joined data source did not refresh a chart with a joined datasource

    • Adjusting chart-level filters via the notch menu on a dashboard caused the chart to crash

    hashtag
    Release 8.5.0

    hashtag
    HeavyImmerse - New Features and Improvements

    • Add ruler control to raster map charts and Esc key to close ruler control

    • Color all values - Colors an unlimited number of categorical values in raster charts using a hash function. Defaults to on, but can be turned off to access legacy coloring mode (coloring by top k)

    • Column names as export headers - Updates export logic to use column names as headers in the export file instead of names like measure0, dimension1; for the following charts: table, combo, pointmap, scatterchart, bubble chart, pie chart, heat

    • Scatter chart export - Implements data export for scatter plots and uses column names as export headers

    • Log scale axis - Implements log scale option for combo and scatter chart Y-axis

    • Infinite scrolling legend - Only active when “Color All Values” is on, allows the user to continually scroll the legend, automatically loading more values at the end of the scroll.

    hashtag
    HeavyImmerse - Fixed Issues

    • Parameter change (in custom SQL filter) not updating table chart

    hashtag
    Heavy Render - New Features and Improvements

    • Improve logged statistics whenever rendering errors occur

    hashtag
    Release 8.4.0

    hashtag
    General

    • ARM support (including NVIDIA Grace Hopper) - Docker install only

    • RHEL/Rocky 8 support - bare metal install only

    hashtag
    HeavyDB - New Features and Improvements

    • Adds new Uber H3 functions and removes existing deprecated functions

    • Change how the GEOs library path is specified for use by some geospatial functions

    • Update CPU join hash tables to use the "cpu-buffer-mem-bytes" configured CPU memory buffer pool for memory allocations

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when a subquery within an IN clause references a non-existent column

    • Fixes a race condition that could occur when disk level caching is enabled for non-foreign tables

    hashtag
    Release 8.3.2

    hashtag
    HeavyImmerse - New Features and Improvements

    • RTL Text support in mapbox basemap labels

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix to show the palette selector in the combo chart when “# Records” is selected as a color measure

    hashtag
    Release 8.3.1

    hashtag
    HeavyImmerse - Fixed Issues

    • Fixes a crash when loading older dashboards, or instances and dashboards with customized/removed color palettes with new color consistency feature.

    • Fix for render error on multilayer raster charts that use new color mapping features.

    • Fix for render error when switching chart types while using new color mapping features.

    • Fix for bubble chart crash when using numeric dimension.

    • Fix for legends not showing in combo chart for continuous measures with no color measure selected.

    hashtag
    Release 8.3.0

    hashtag
    HeavyDB - New Features and Improvements

    • Optimizes concurrent access and caching of table data during query execution.

    • Optimizes string dictionary memory allocations

    • Adds a validation to ensure that raster import/HeavyConnect with floating point coordinate types are either world or file space transformed.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when attempting to import or HeavyConnect to Apache Spark generated empty Parquet files.

    • Fixes a crash that could occur when attempting to import or HeavyConnect to Parquet files containing Null columns.

    • Fixes a potential race condition that could occur when the Executor Resource Manager is enabled.

    • Fixes an exception that could occur for log based system tables when the log directory contains unexpected files.

    hashtag
    HeavyImmerse - New Features and Improvements

    • Support for customizable color mappings across charts, including Combo, Pointmap, Linemap, Choropleth, Scatterplot, Pie, and Bubble, enabling consistent and tailored color schemes for enhanced visual control. Users can create, import, manage, and apply mappings, with options to save changes, delete mappings, or reset to default hash coloring as needed.

    • Uses Hash Coloring to ensure consistent color assignment for the same column values across all charts and dashboards.

    • Unified, default Categorical Colors palette is now used across all supported charts, ensuring out-of-the-box consistency in visualizations.

    • The previous "Color Set 2" palette has been integrated into the Categorical Colors palette set to streamline options and reduce confusion.

    hashtag
    Release 8.2.0

    hashtag
    HeavyDB - New Features and Improvements

    • ARM support

    • Security updates and fixes

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when executing queries against ODBC backed foreign tables with negative decimal column values.

    hashtag
    HeavyImmerse - Fixed Issues

    • Updated web server dependencies to improve security

    hashtag
    Release 8.1.2 (RHEL) & 8.1.3 (Ubuntu)

    hashtag
    HeavyDB - New Features and Improvements

    • Improves memory utilization for queries with APPROX_COUNT_DISTINCT function calls

    • Significantly faster columnarization of "lazy fetched" result set data for multi-step queries

    • Significantly faster string operations, particularly for high-cardinality string inputs (3X+ speedups in some cases)

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an error where users without column level privileges could not access views

    • Fixes a crash that could occur for queries that order by none-encoded text expressions

    • Fixes an issue where wrong results could be returned for repeated queries with joins on encoded text column expressions

    • Fixes an error that could occur when large integer values are used in row level security policies

    • Fixes a CUDA 700 crash that could occur for left join queries with geospatial function predicates

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix side panel overlay issue for time-picking filter.

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Allow support for Guidance without an embedding server.

    • Support FAISS as an alternative to ChromaDB for embeddings.

    • Various minor improvements and enhancements

    hashtag
    Release 8.1.1

    hashtag
    HeavyDB - New Features and Improvements

    • Improves memory management for import requests

    • Improves performance of group by queries through expanded use of shared memory

    • Improves performance of low cardinality group by queries

    • Improves performance of queries with sort on encoded string column expressions

    • Improves performance of result set reductions

    • Optimizes memory utilization and performance for sort queries

    • Adds more instrumentation around memory utilization

    • Adds support for ST_Centroid function calls with MULTILINESTRING column type arguments

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an issue where S3 imports would not roll back correctly in certain error cases

    • Fixes an error where repetition of certain geo join queries could result in excessive memory utilization

    hashtag
    HeavyImmerse - New Features and Improvements

    • Make Dashboard configuration panel resizeable.

    • Add UI customization options for Cloud Edition.

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix scrolling bug with Guidance Snippet list.

    • Improve handling for loading a database with no tables.

    • Fix map points display issue on first render.

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Add support for a vLLM Embedding server.

    hashtag
    HeavyIQ / HeavyLM - Fixed Issues

    • Fix ChromaDB configuration issues.

    hashtag
    Release 8.1.0

    hashtag
    HeavyDB - New Features and Improvements

    • Added LLM_TRANSFORM operator, allowing access to large language model inference within SQL

    • Adds support for bounding box clipping when importing geospatial files

    • Retries estimation queries on CPU if the initial query execution fails on GPU due to out of memory errors

    • Improves the error message that is logged when table data file reads result in errors
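
    A minimal sketch of the new LLM_TRANSFORM operator, assuming a hypothetical incidents table; the prompt style and any optional arguments are assumptions to check against the SQL reference:

    ```sql
    -- Hypothetical example: classify free-text incident descriptions with the
    -- in-database LLM. Table and column names are illustrative only.
    SELECT incident_id,
           LLM_TRANSFORM(description,
                         'Classify this incident as one of: fire, flood, theft, other.')
               AS incident_category
    FROM incidents
    LIMIT 10;
    ```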

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when Parquet backed foreign tables reference incompatible files

    • Fixes an issue where spatial transforms were not applied correctly for some raster files containing sub-datasets

    • Fixes an issue where some string functions would result in error responses when used in predicates and update and delete queries

    hashtag
    HeavyRender - New Features and Improvements

    • Significantly reduced GPU render memory usage and improved performance when rendering complex polygon data

    • Internal restructuring laying groundwork for improved concurrency between HeavyDB and Heavy Render

    hashtag
    HeavyRender - Fixed Issues

    • Fixed an issue where some GPUs would fail to identify a useful GPU memory heap reporting 0 available memory

    • Fixed a crash that could occur if a table involved in a query is removed between a render and a later hit-test query (mouse rollover in Immerse)

    • Fixed a crash that could occur during renderer shutdown and restart if an out of GPU memory error occurs during certain buffer allocation operations

    hashtag
    HeavyImmerse - New Features and Improvements

    • Manage Table and Column Comments in Data Manager

    • Immerse theming control panel

    • Add alternative geocoder support using custom location list from a CSV

    • Add support for custom AWS Regions

    hashtag
    HeavyImmerse - Fixed Issues

    • Fix layer ordering label change after layer switching

    • Fixed the addition of the last categorical color not being used in color by measure

    • Make SQL Notebook line charts use GMT instead of local timezone

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Added Column Preview summarizing for columns in SQL Notebook source list.

    • New Log File viewing for IQ Startup and IQ Build log tiles

    • Improvements to SQL Notebook UI

    • BETA: Added the new Guidance feature in the HeavyIQ SQL Notebook. Powered by HeavyLM's advanced LLM service, it provides contextual recommendations and real-time insights for optimized query generation and analysis.

    • Improved the SQL tool with optimized top-k calculations for string columns, adjusted default settings for cardinality thresholds, and added support for multiline comments in schema caching.

    hashtag
    Release 8.0.2

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Improve runtime support for Python versions > 3.10.

    hashtag
    HeavyIQ / HeavyLM - Fixed Issues

    • Various performance improvements and fixes.

    hashtag
    Release 8.0.1

    hashtag
    HeavyDB - New Features and Improvements

    • Relaxes Coordinate Reference System validations that occur on geospatial data import.

    • Adds support for a DOT_PRODUCT function with array parameters, which can be used to compute the similarity between arrays (vectors)
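
    A minimal sketch of the new function, assuming a hypothetical documents table with a float array column; the array-literal syntax shown is illustrative:

    ```sql
    -- Rank rows by similarity between a stored embedding and a query vector.
    SELECT doc_id,
           DOT_PRODUCT(embedding, ARRAY[0.12, 0.48, 0.33, 0.07]) AS similarity
    FROM documents
    ORDER BY similarity DESC
    LIMIT 5;
    ```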

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an issue where cross database queries could result in use of incorrect string dictionaries.

    • Fixes an issue where Host IDs could not be generated on OCI instances.

    • Fixes an issue where database switching resulted in errors on certain server configurations.

    • Fixes a CUDA 700 crash that could occur due to incorrect internal caching.

    • Fixes a parsing error that could occur when projecting array literals with the same element data types.

    • Fixes an issue where updates via subqueries with non unique rows could result in the wrong error message.

    • Fixes a crash that could occur when using the CARDINALITY function in certain cases.

    • Fixes a crash that could occur for certain query patterns involving casts to a REAL data type.

    • Fixes a crash that could occur for queries with certain case expression patterns involving DATE data types.

    • Fixes a crash that could occur in certain cases where resultset caching is enabled.

    • Fixes a crash that could occur for queries that specify LIMIT clauses with floating point values.

    • Fixes an issue where sql_validate requests with queries containing WIDTH_BUCKET function calls with subquery arguments can result in an error.

    • Fixes a crash that could occur for certain query patterns with subqueries that include LIMIT and/or OFFSET clauses.

    hashtag
    HeavyRender - Fixed Issues

    • Fix to return null hit-test result if not found.

    • Fixed an 8.0.0 issue with render gpu selection that would trigger a crash when using the start-gpu program option

    • Fixed a rare 8.0.0 issue with free GPU memory detection on systems with resizable BAR enabled

    hashtag
    Heavy Immerse - New Features and Improvements

    • Enable HeavyIQ SQL Notebook for Free Edition.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fix to show “Column not found” error text in a raster chart pop-up instead of crashing chart when hit-test column is not found.

    • Fix for crossfiltering of Number chart when filtering on a Combo chart with a custom SQL base dimension.

    • Change feature flag from “ui/hide_charts_headers” to “ui/hide_text_chart_headers” and only hide headers on Text charts.

    • Fix to set priority colors in Pointmap charts.

    • Update categorical legend domain on map chart zoom.

    • Fixes regression with prioritized color layers.

    hashtag
    HeavyIQ / HeavyLM - New Features and Improvements

    • Enable HeavyIQ for Free Edition.

    • Improved caching and processing of table and column metadata.

    hashtag
    Release 8.0.0

    circle-exclamation

    8.0.0 introduces a new licensing version that features two new types of licenses: Node Locked Licenses and Floating Licenses. Enterprise customers upgrading to the 8.0.0 release of HEAVY.AI must contact the HEAVY.AI Customer Success team for a new license before attempting to upgrade.

    hashtag
    HeavyDB - New Features and Improvements

    • SELECT privileges are now required in order to execute UPDATE or DELETE commands.

    • Adds a new "columns" system table that contains information about all table columns across all databases.

    • Adds support for foreign tables that are backed by raster files.

    • Adds a new raster import mechanism that stores raster data in a tiled format. This can be enabled by setting the "enable-legacy-raster-import" HeavyDB server configuration parameter to false.

    • Adds support for column level SELECT privileges.

    • Adds support for comments on tables.

    • Adds support for comments on columns.

    • Adds support for null value filtering on raster import.

    • Adds MULTIPOLYGON to MULTIPOLYGON ST_Contains function support.

    • Improves performance for certain query patterns with BETWEEN predicate clauses.

    • Enables the "use-cpu-mem-pool-for-output-buffers" HeavyDB server configuration parameter by default.

    • Updates the default value for the "ndv-group-estimator-multiplier" HeavyDB server configuration parameter.
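
    The privilege and comment additions above can be sketched as follows; object names are hypothetical, and the exact grammar should be confirmed against the SQL reference:

    ```sql
    -- Column-level SELECT privilege: expose only specific columns to a role.
    GRANT SELECT (customer_id, region) ON TABLE sales TO analyst_role;

    -- Comments on tables and columns.
    COMMENT ON TABLE sales IS 'Daily sales fact table, loaded nightly';
    COMMENT ON COLUMN sales.region IS 'Two-letter sales region code';
    ```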

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur in the load_table_binary_columnar Thrift API when request payloads are malformed.

    • Fixes a crash that could occur in the detect_column_types Thrift API when an uneven number of levels across columns are read for Parquet files.

    • Fixes a crash that could occur in certain cases where Parquet files are imported from S3 with debug timers enabled.

    • Fixes a race condition that could occur with foreign table scheduled refreshes.

    • Fixes a crash that could occur when the ST_NPoints function is called with literal arguments.

    • Fixes a hang that could occur when a "clear CPU" or "clear GPU" request is made after the Executor Resource Manager returns an error response for a previous query.

    • Fixes an issue where certain sorted join query patterns could return wrong results.

    • Fixes an intermittent "Ran out of slots in the query output buffer" error that could occur due to incorrect reuse of the cardinality cache for similar queries.

    • Fixes a crash that could occur in certain cases where the null value results from the ST_Centroid function are passed in as arguments to other ST functions.

    • Fixes use cases where the Executor Resource Manager did not account for result set buffers that are allocated through the CPU buffer pool.

    • Fixes an issue where the use of the approx_count_distinct function could cause a slowdown for certain query patterns in distributed configurations.

    • Fixes an issue where wrong results could be returned due to an overflow when dividing two decimals.

    • Fixes a crash that could occur for certain query patterns with Common Table Expressions that project point columns.

    • Fixes a crash that could occur for queries that specify LIMIT clauses with floating point values.

    • Fixes a crash that could occur for certain cases where the output of an ST function is passed in as an argument to the ST_Contains function.

    • Fixes an issue where the ST_Distance function can return wrong results for cases where a polygon encloses another polygon.

    hashtag
    Heavy Render - New Features and Improvements

    • Shader compiler optimizations.

    • Increased the default res-gpu-mem value from 385MB to 768MB.

    hashtag
    Heavy Render - Fixed Issues

    • Fixed shader compiler error when combining symbol rotation and accumulation.

    • Catch failed render result buffer allocations and throw an out of GPU memory error.

    • Fixed an issue that could cause a Cuda Error 700.

    • Fixes a crash that could occur for render requests containing certain sort query patterns.

    hashtag
    Heavy Immerse - New Features and Improvements

    • Introducing the SQL Notebook in Immerse, which in addition to supporting a notebook view of query history, allows inline visualization of query results using bar charts, line charts, scatterplots, heatmaps, choropleths, and pointmaps. The SQL notebook also integrates directly with the HeavyIQ conversational analytics module, allowing you to ask questions of your data in natural language, returning visualizations, natural language answers, and tabular results along with the generated SQL. The new SQL notebook is off by default, but it can be enabled by setting "dev/enable_notebook_ui_sql_editor": true in the servers.json file.

    • Viridis, Turbo, and Plasma added as new color palettes.
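
    The SQL Notebook flag mentioned above can be enabled with a servers.json entry along these lines (the surrounding structure is assumed; merge the flag into your existing configuration):

    ```json
    [
      {
        "database": "heavyai",
        "feature_flags": {
          "dev/enable_notebook_ui_sql_editor": true
        }
      }
    ]
    ```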

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixes dropping of filter clauses from SQL on exiting dash panel edit mode.

    • Fixes cross-filtering on Number chart when filtering on a Vega Combo chart with a custom SQL base dimension.

    • Fixed regression for enabling feature flag “ui/enable_chart_filter_view".

    • Fixed dashboard loading error when switching tabs with a HeavyDB error present.

    • Fixed joined table referenced in a size measure on the Line Map chart.

    • Fixed issue with non-integer operands in modulo operation error for Contour chart.

    • Fixed issue with color range display of filtered values for the Contour chart.

    • Fixed code generation for order of evaluation in map charts using SAMPLE_RATIO.

    • Fixed issue with Pointmap rendering of filtered data.

    • Fixed various minor display and styling issues.

    hashtag
    HeavyIQ Conversational Analytics

    • HeavyIQ Conversational Analytics allows users to ask questions of their data using natural language by leveraging a custom Large Language Model (LLM) that has been fine-tuned on more than 60,000 training pairs to provide state-of-the-art performance on text-to-SQL and other data analytics tasks. HeavyIQ requires no external API calls and guarantees data privacy by virtue of being a fully offline model.

    • The primary interface for the new HeavyIQ conversational analytics capabilities is the new SQL editor (see above), but HeavyIQ can also be used via API if desired.

    hashtag
    Release 7.x.x - Important Information

    circle-exclamation

    IMPORTANT - In HeavyDB Release 7.x.x, the “render groups” mechanism, part of the previous implementation of polygon rendering, has been removed. When you upgrade to HeavyDB Release 7.x.x, all existing tables that have a POLYGON or MULTIPOLYGON geo column are automatically migrated to remove a hidden column containing "render groups" metadata.

    This operation is performed on all tables in all catalogs at first startup, and the results are recorded in the INFO log.

    Once a table has been migrated in this manner, it is not backwards-compatible with earlier versions of HeavyDB. If you revert to an earlier version, the table may appear to have missing columns and behavior will be undefined. Attempting to query or render the POLYGON or MULTIPOLYGON data with the earlier version may fail or cause a server crash.

    As always, HEAVY.AI strongly recommends that all databases be backed up, or at the very least, dumps are made of tables with POLYGON or MULTIPOLYGON columns using the existing HeavyDB version, before upgrading to HeavyDB Release 7.x.x.

    Dumps of POLYGON and MULTIPOLYGON tables made with earlier versions can still be restored into HeavyDB Release 7.x.x. The superfluous metadata is automatically discarded. However, dumps of POLYGON and MULTIPOLYGON tables made with HeavyDB Release 7.x.x are not backwards-compatible with earlier versions.

    This applies only to tables with POLYGON or MULTIPOLYGON columns. Tables that contain other geo column types (POINT, LINESTRING, etc.), or only non-geo column types, do not require migration and remain backwards-compatible with earlier releases.

    circle-info

    For Ubuntu installations, install libncurses5 with the following command:

    sudo apt install libncurses5

    hashtag
    Release 7.2.4 - March 20, 2024

    hashtag
    HeavyDB - Fixed Issues

    • Adds a new option for enabling or disabling the use of virtual addressing when accessing an S3 compatible endpoint for import or HeavyConnect.

    • Improves logging related to system locks.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixes issue with SAML authentication.

    hashtag
    Release 7.2.3 - February 5, 2024

    hashtag
    HeavyDB - New Features and Improvements

    • Improves performance of foreign tables that are backed by Parquet files in AWS S3.

    • Improves logging related to GPU memory allocations and data transfers.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur for certain query patterns with intermediate geometry projections.

    • Fixes a crash that could occur for certain query patterns containing IN operators with string function operands.

    • Fixes a crash that could occur for equi join queries that use functions as operands.

    • Fixes an intermittent error that could occur in distributed configurations when executing count distinct queries.

    • Fixes an issue where certain query patterns with LIMIT and OFFSET clauses could return wrong results.

    • Fixes a crash that could occur for certain query patterns with left joins on Common Table Expressions.

    • Fixes a crash that could occur for certain queries with window functions containing repeated window frames.

    hashtag
    Heavy Render - Fixed Issues

    • Fix several crashes that could occur during out-of-gpu memory error recovery

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixed dashboard load error when switching tabs.

    • Fixed table reference in size measure of a client-side join data source for linemap chart.

    • Fixed client-side join name reference.

    hashtag
    Release 7.2.2 - December 15, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Adds support for output/result set buffer allocations via the "cpu-buffer-mem-bytes" configured CPU memory buffer pool. This feature can be enabled using the "use-cpu-mem-pool-for-output-buffers" server configuration parameter.

    • Adds a "ndv-group-estimator-multiplier" server configuration parameter that determines how the number of unique groups are estimated for specific query patterns.

    • Adds "default-cpu-slab-size" and "default-gpu-slab-size" server configuration parameters that are used to determine the default slab allocation size. The default size was previously based on the "max-cpu-slab-size" and "max-gpu-slab-size" configuration parameters.

    • Improves memory utilization when querying the "dashboards" system table.

    • Improves memory utilization in certain cases where queries are retried on CPU.

    • Improves error messages that are returned for some unsupported correlated subquery use cases.
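
    Taken together, the new parameters can be set in heavy.conf along these lines; the values shown are illustrative only (sizes are in bytes):

    ```ini
    # heavy.conf excerpt -- illustrative values only
    use-cpu-mem-pool-for-output-buffers = true
    ndv-group-estimator-multiplier = 2.0
    # 2 GB default slab allocations
    default-cpu-slab-size = 2147483648
    default-gpu-slab-size = 2147483648
    ```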

    hashtag
    HeavyDB - Fixed Issues

    • Fixes an issue where allocations could go beyond the configured "cpu-buffer-mem-bytes" value when fetching table chunks.

    • Fixes a crash that could occur when executing concurrent sort queries.

    • Fixes a crash that could occur when invalid geometry literals are passed to ST functions.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fix for rendering a gauge chart using a parameterized source (join sources, custom sources).

    hashtag
    Release 7.2.1 - December 4, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Improves instrumentation around Parquet import and HeavyConnect.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur for join queries that result in many bounding box overlaps.

    • Fixes a crash that could occur in certain cases for queries containing an IN operator with a subquery parameter.

    • Fixes an issue where the ST_POINTN function could return wrong results when called with negative indexes.

    • Fixes an issue where a hang could occur while parsing a complex query.

    hashtag
    Heavy Render - Fixed Issues

    • Fixed error when setting render-mem-bytes greater than 4gb.

    hashtag
    Heavy Immerse - Fixed Issues

    • Clamp contour interval size on the Contour Chart to prevent a modulo operation error.

    • Filter outlier values in the Contour Chart that skew color range.

    • Fixed sample ratio query ordering to address a pointmap rendering issue.

    • Fixed layer naming in the Hide Layer menu.

    hashtag
    Release 7.2.0 - November 16, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Adds support for URL_ENCODE, URL_DECODE, REGEXP_COUNT, and HASH string functions.

    • Enables log based system tables by default.

    • Adds support for log based system tables auto refresh behind a flag (Beta).

    • Improves the pre-flight query row count estimation process for projection queries without filters.

    • Improves the performance of the LIKE operator.
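
    Illustrative uses of the new string functions; the feedback table and its columns are hypothetical:

    ```sql
    -- Percent-encode reserved characters, and reverse the encoding.
    SELECT URL_ENCODE('a b&c');
    SELECT URL_DECODE('a%20b%26c');

    -- Count regex matches per row, and compute a deterministic string hash.
    SELECT REGEXP_COUNT(comment_text, '[0-9]+') FROM feedback;
    SELECT HASH(user_email) FROM feedback;
    ```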

    hashtag
    HeavyDB - Fixed Issues

    hashtag
    General

    • Fixes errors that could occur when the REPLACE clause is applied to SQL DDL commands that do not support it.

    • Fixes an issue where the HeavyDB startup script could ignore command line arguments in certain cases.

    • Fixes a crash that could occur when requests were made to the detect_column_types API for Parquet files containing list columns.

    • Fixes a crash that could occur in heavysql when the \detect command is executed for Parquet files containing string list columns.

    • Fixes a crash that could occur when attempting to cast to text column types in SELECT queries.

    • Fixes a crash that could occur in certain cases where window functions were called with literal arguments.

    • Fixes a crash that could occur when executing the ENCODE_TEXT function on NULL values.

    • Fixes an issue where queries involving temporary tables could return wrong results due to incorrect cache invalidation.

    hashtag
    Geo

    • Fixes an issue where the ST_Distance function could return wrong results when at least one of its arguments is NULL.

    • Fixes an issue where the ST_Point function could return wrong results when the "y" argument is NULL.

    • Fixes an issue where the ST_NPoints function could return wrong results for NULL geometries.

    • Fixes a crash that could occur when the ST_PointN function is called with out-of-bounds index values.

    • Fixes an issue where the ST_Intersects and ST_Contains functions could incorrectly result in loop joins based on table order.

    • Fixes an issue where the ST_Transform function could return wrong results for NULL geometries.

    • Fixes an error that could occur for tables with polygon columns created from the output of user-defined table functions.

    hashtag
    Heavy Immerse - New Features and Improvements

    • [Beta] Geo Joins - Immerse now supports “contains” and “intersects” conditions for common geometry combinations when creating a join datasource in the no-code join editor.

    • Join datasource crossfilter support: Charts that use single table data sources will now crossfilter and be crossfiltered by charts that use join data sources.

    • Layer Drawer - Layered map charts in Immerse now have a quick-access Layer Drawer, which provides layer toggling, reordering, renaming, opacity, and zoom-visibility controls.

    • Zoom to filters - Map charts in Immerse now support “zoom to filters” functionality, either on an individual chart layer (via the Layer Drawer) or on the whole chart.

    • Image support in map rollovers - URLs pointing to images will automatically be rendered as a scaled image, with clickthrough support to the full size image.

    hashtag
    Heavy Immerse - Fixed Issues

    • Choropleth/Line Map join datasource support - Significantly improves performance in Choropleth and Line Map charts when using join data sources. Auto aggregates measures on geometry.

    • Fixes issue where sql editor will horizontally scroll with long query strings

    hashtag
    Release 7.1.2 - October 4, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Improves how memory is allocated for the APPROX_MEDIAN aggregate function.

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when the DISTINCT qualifier is specified for aggregate functions that do not support the distinct operation.

    • Fixes an issue where wrong results could be returned for queries with window functions that return null values.

    • Fixes a crash that could occur in certain cases where queries have multiple aggregate functions.

    • Fixes a crash that could occur when tables are created with invalid options.

    • Fixes a potential data race that could occur when logging cache sizes.

    hashtag
    Release 7.1.1 - September 15, 2023

    hashtag
    HeavyDB - New Features and Improvements

    • Adds an EXPLAIN CALCITE DETAILED command that displays more details about referenced columns in the query plan.

    • Improved logging around system memory utilization for each query.

    • Adds an option to SQLImporter for disabling logging of connection strings.

    • Adds a "gpu-code-cache-max-size-in-bytes" server configuration parameter for limiting the amount of memory that can be used by the GPU code cache.

    • Improves column name representation in Parquet validation error messages.
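
    The new command is used like the existing EXPLAIN CALCITE; the table below is hypothetical:

    ```sql
    -- Logical plan only:
    EXPLAIN CALCITE SELECT region, COUNT(*) FROM sales GROUP BY region;

    -- Logical plan plus additional detail about referenced columns:
    EXPLAIN CALCITE DETAILED SELECT region, COUNT(*) FROM sales GROUP BY region;
    ```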

    hashtag
    HeavyDB - Fixed Issues

    • Fixes a parser error that could occur for queries containing a NOT ILIKE clause.

    • Fixes a multiplication overflow error that could occur when retrying queries on CPU.

    • Fixes an issue where table dumps do not preserve quoted column names.

    • Fixes a "cannot start a transaction within a transaction" error that could occur in certain cases.

    • Fixes a crash that could occur for certain query patterns involving division by COUNT aggregation functions.

    • Removes a warning that is displayed on server startup when HeavyIQ is not configured.

    • Removes spurious warnings for CURSOR type checks when there are both cursor and scalar overloads for a user-defined table function.

    hashtag
    Heavy Render - New Features and Improvements

    • Adds hit testing support for custom measures that reference multiple tables.

    hashtag
    Heavy Immerse - Fixed Issues

    • Fixes SAML authentication regression in 7.1.0

    • Fixes chart export regression in 7.1.0

    hashtag
    Release 7.1.0 - August 22, 2023

    hashtag
    HeavyDB - New Features and Improvements

    hashtag
    Geospatial

    • Exposes new geo overlaps function ST_INTERSECTSBOX for very fast bounding box intersection detections.

    • Adds support for the max_reject COPY FROM option when importing raster files. This ensures that imports from large multi-file raster datasets continue after minor errors, but provides adjustable notification upon major ones.

    • Adds a new ST_AsBinary (also aliased as ST_AsWKB) function that returns the Well-Known Binary (WKB) representation of geometry values. This highly efficient format is used by PostGIS and newer versions of GeoPandas.

    • Adds a new ST_AsText (also aliased as ST_AsWKT) function that returns the Well-Known Text (WKT) representation of geometry values. This is less efficient than WKB but compatible even with non-spatial databases.

    • Adds support for loading geometry values using the load_table_binary_arrow Thrift API.

    • New version of HeavyAI python library with direct Geopandas support.

    • New version of rbc-project with geo column support allowing extensions which input or output any geometric type.
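
    A few of the geospatial additions above, sketched with hypothetical tables; the COPY options shown should be checked against the import reference:

    ```sql
    -- Fast bounding-box intersection test between two geometry columns.
    SELECT COUNT(*)
    FROM buildings b, flood_zones f
    WHERE ST_INTERSECTSBOX(b.geom, f.geom);

    -- Export geometries as WKB and WKT.
    SELECT ST_AsBinary(geom), ST_AsText(geom) FROM buildings LIMIT 1;

    -- Continue a large multi-file raster import past minor errors.
    COPY rasters FROM '/data/tiles/*.tif' WITH (source_type='raster_file', max_reject=100);
    ```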

    hashtag
    Core SQL

    • New JAROWINKLER_SIMILARITY string operator for fuzzy matching between string columns and values. This is a case-insensitive measure that accounts for character transpositions and is (slightly) sensitive to white space.

    • New LEVENSHTEIN_DISTANCE string operator for fuzzy matching between string columns and values. This is case-insensitive and represents the number of edits needed to make two strings identical, where an “edit” is an insertion, deletion, or replacement of a character.

    • Extends the ALTER COLUMN TYPE command to support string dictionary encoding size reduction.

    • Improves the error message returned when out of bound values are inserted into FLOAT and DOUBLE columns.

    • Adds a "watchdog-max-projected-rows-per-device" server configuration parameter and query hint that determines the maximum number of rows that can be projected by each GPU and CPU device.

    • Adds a "preflight-count-query-threshold" server configuration parameter and query hint that determines the threshold at which the preflight count query optimization should be executed.

    • Optimizes memory utilization for projection queries on instances with multiple GPUs.
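
    The fuzzy-matching operators and the extended ALTER COLUMN command might be used as follows; tables, columns, and the distance threshold are illustrative, and the exact ALTER COLUMN grammar should be confirmed against the DDL reference:

    ```sql
    -- Fuzzy match names across two tables.
    SELECT a.name, b.name,
           JAROWINKLER_SIMILARITY(a.name, b.name) AS jw_score,
           LEVENSHTEIN_DISTANCE(a.name, b.name)  AS edit_distance
    FROM customers a, leads b
    WHERE LEVENSHTEIN_DISTANCE(a.name, b.name) <= 2;

    -- Shrink a dictionary-encoded text column's encoding size.
    ALTER TABLE customers ALTER COLUMN name TYPE TEXT ENCODING DICT(16);
    ```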

    hashtag
    Predictive Modeling with HeavyML

    • Support for PCA models and PCA_PROJECT operator.

    • Support SHOW MODEL FEATURE DETAILS to show per-feature info for models, including regression coefficients and variable importance scores, if applicable.

    • Support for TRAIN_FRACTION option to specify proportion of the input data to a CREATE MODEL statement that should be trained on.

    • Support creation of models with only categorical predictors.

    • Enable categorical and numeric predictors to be specified in any order for CREATE MODEL statements and subsequent inference operations.

    • Enable Torch table functions (requires client to specify libtorch.so).

    • Add tf_torch_raster_object_detect for raster object detections (requires client to specify libtorch.so and provide trained model in torchscript format).
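    A rough sketch of how these pieces compose in SQL; the model type token and option syntax are assumptions based on the features listed above, so consult the HeavyML documentation for the exact grammar:

```sql
-- Train a regression model on 80% of the input rows
-- (hypothetical table and columns; model type token is an assumption)
CREATE MODEL delay_model OF TYPE random_forest_reg AS
SELECT arrdelay, depdelay, carrier_name FROM flights
WITH (train_fraction=0.8);

-- Inspect per-feature details such as variable importance
SHOW MODEL FEATURE DETAILS delay_model;
```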

    Extensions Framework

    • Allow Array literals as arguments to scalar UDFs

    • Support table function (UDTF) output row sizes up to 16 trillion rows

    • Adds support for Column<TextEncodingNone> and ColumnList<TextEncodingNone> table function inputs and outputs.

    Performance Optimizations

    • SQL projections are now sized per GPU/CPU core instead of globally, meaning that projections are more memory efficient as a function of the number of GPUs/CPU threads used for a query. In particular, various forms of in-situ rendering (for example, non-grouped pointmaps) can scale to N times more points on N GPUs, or use N times less memory, depending on the configuration.

    • Better parallelizes construction of metadata for subquery results, improving performance.

    • Enables result set caching for queries with LIMIT clauses.

    • Enables the bounding box intersection optimization for certain spatial join operators and geometry types by default.

    HeavyDB - Fixed Issues

    • Fix potential crash when concatenating strings with the output of a UDF.

    • Fixes an issue where deleted rows with malformed data can prevent ALTER COLUMN TYPE command execution.

    • Fixes an error that could occur when parsing odbcinst.ini configuration files containing only one installed driver entry.

    • Fixes a table data corruption issue that could occur when the server crashes multiple times while executing write queries.

    • Fixes a crash that could occur when attempting to do a union of a string dictionary encoded text column and a none encoded text column.

    • Fixes a crash that could occur when the output of a table function is used as an argument to the strtok_to_array function.

    • Fixes a crash that could occur for queries involving projections of both geometry columns and geometry function expressions.

    • Fixes an issue where wrong results could be returned when the output of the DATE_TRUNC function is used as an argument to the count distinct function.

    • Fixes an issue where an error occurs if the COUNT_IF function is used in an arithmetic expression.

    • Fixes a crash that could occur when the WIDTH_BUCKET function is called with decimal columns.

    • Fixes an issue where the WIDTH_BUCKET function could return wrong results when called with decimal values close to the upper and lower boundary values.

    • Fixes a crash that could occur for queries with redundant projection steps in the query plan.

    Heavy Render - Fixed Issues

    • Fixes a crash that could occur on multi-gpu systems while handling an out of GPU memory error.

    Heavy Immerse - New Features and Improvements

    • Zoom to filters, setting map bounding box to extent of current filter set.

    • Image preview in map chart popups where image URLs are present.

    Heavy Immerse - Fixed Issues

    • Fixed error thrown by choropleth chart on polygon hover.

    Release 7.0.2 - June 28, 2023

    HeavyDB - New Features and Improvements

    • Adds support for nested window function expressions.

    • Adds support for exception propagation from table functions.

    HeavyDB - Fixed Issues

    • Fixes a crash that could occur when accessing 8-bit or 16-bit string dictionary encoded text columns on ODBC backed foreign tables.

    • Fixes unexpected GPU execution and memory allocations that could occur when executing sort queries with the CPU mode query hint.

    • Fixes an issue that could occur when inserting empty strings for geometry columns.

    • Fixes an issue that could occur when out of bounds fragment sizes are specified on table creation.

    • Fixes an issue where system dashboards could contain unexpected cached data.

    • Fixes a crash that could occur when executing aggregate functions over the result of join operations on scalar subqueries.

    • Fixes a server hang that could occur when GPU code compilation errors occur for user-defined table functions.

    • Fixes a data race that could occur when logging query plan cache size.

    Heavy Render - New Features and Improvements

    • Add support for rendering 1D “terrain” cross-section overlays.

    • Rewrite 2D cross-section mesh generation as a table function.

    • Further improvements to system state logging when a render out of memory error occurs, and move it to the ERROR log for guaranteed visibility.

    • Enable auto-clear-render-mem by default for any render-vega call taking < 10 seconds.

    Heavy Render - Fixed Issues

    • Render requests with 0 width or height could lead to a CHECK failure in encodePNG. Invalid image sizes now throw a non-fatal error during vega parsing.

    Heavy Immerse - New Features and Improvements

    • Visualize terrain at the base of atmospheric cross sections in the Cross Section chart with the new Base Terrain chart layer type.

    Heavy Immerse - Fixed Issues

    • Fixed local timezone issue with Chart Animation using cross filter replay.

    Release 7.0.1 - June 8, 2023

    HeavyDB - New Features and Improvements

    • Improves instrumentation around CPU and GPU memory utilization and certain crash scenarios.

    HeavyDB - Fixed Issues

    • Fixes a crash that could occur for GPU executed join queries on dictionary encoded text columns with NULL values.

    Heavy Render - New Features and Improvements

    • Improve instrumentation and logging related to gpu memory utilization, particularly with polygon rendering, as well as command timeout issues

    Heavy Render - Fixed Issues

    • Fix a potential segfault when a Vulkan device lost error occurs

    Release 7.0.0 - May 1, 2023

    HeavyDB - New Features and Improvements


    IMPORTANT - In HeavyDB Release 7.0, the “render groups” mechanism, part of the previous implementation of polygon rendering, has been removed. When you upgrade to HeavyDB Release 7.0, all existing tables that have a POLYGON or MULTIPOLYGON geo column are automatically migrated to remove a hidden column containing "render groups" metadata.

    This operation is performed on all tables in all catalogs at first startup, and the results are recorded in the INFO log.

    Once a table has been migrated in this manner, it is not backwards-compatible with earlier versions of HeavyDB. If you revert to an earlier version, the table may appear to have missing columns and behavior will be undefined. Attempting to query or render the POLYGON or MULTIPOLYGON data with the earlier version may fail or cause a server crash.

    As always, HEAVY.AI strongly recommends that all databases be backed up, or at the very least, dumps are made of tables with POLYGON or MULTIPOLYGON columns using the existing HeavyDB version, before upgrading to HeavyDB Release 7.0.

    Dumps of POLYGON and MULTIPOLYGON tables made with earlier versions can still be restored into HeavyDB Release 7.0. The superfluous metadata is automatically discarded. However, dumps of POLYGON and MULTIPOLYGON tables made with HeavyDB Release 7.0 are not backwards-compatible with earlier versions.

    This applies only to tables with POLYGON or MULTIPOLYGON columns. Tables that contain other geo column types (POINT, LINESTRING, etc.), or only non-geo column types, do not require migration and remain backwards-compatible with earlier releases.


    For Ubuntu installations, install libncurses5 with the following command:

    sudo apt install libncurses5

    • Adds a new Executor Resource Manager, enabling parallel CPU and CPU-GPU query execution, and support for CPU execution on data inputs larger than available memory.

    • Adds HeavyML, a suite of machine learning capabilities accessible directly in SQL, including support for linear regression, random forest, gradient boosted trees, and decision tree regression models, and KMeans and DBScan clustering methods. (BETA)

    • Adds HeavyConnect support for MULTIPOINT and MULTILINESTRING columns.

    • Adds ALTER COLUMN TYPE support for text columns.

    • Adds a REASSIGN ALL OWNED command that allows for object ownership change across all databases.

    • Adds an option for validating POLYGON and MULTIPOLYGON columns when importing using the COPY FROM command or when using HeavyConnect.

    • Adds support for CONDITIONAL_CHANGE_EVENT window function.

    • Adds support for automatic casting of table function CURSOR arguments.

    • Adds support for Column<GeoMultiPolygon>, Column<GeoMultiLineString>, and Column<GeoMultiPoint> table function inputs and outputs.

    • Adds support for none encoded text column, geometry column, and array column projections from the right table in left join queries.

    • Adds support for literal text scalar subqueries.

    • Adds support for ST_X and ST_Y function output cast to text.

    • Improves concurrent execution of DDL and SHOW commands.

    • Improves error messaging for when the storage directory is missing.

    • Optimizes memory utilization for auto-vacuuming after delete queries.

    HeavyDB - Fixed Issues

    • Fixes an issue where the root user could be deleted in certain cases.

    • Fixes an issue where staging directories for S3 import could remain when imports failed.

    • Fixes a crash that could occur when accessing the "tables" system table on instances containing tables with many columns.

    • Fixes a crash that could occur when accessing CSV and regex parsed file foreign tables that previously errored out during cache recovery.

    • Fixes an issue where dumping foreign tables would produce an empty table.

    • Fixes an intermittent crash that could occur when accessing CSV and regex parsed file foreign tables that are backed by large files.

    • Fixes a "Ran out of slots in the query output buffer" exception that could occur when using stale cached cardinality values.

    • Fixes an issue where user defined table functions are erroneously categorized as ambiguous.

    • Fixes an error that could occur when a group by clause includes an alias that matches a column name.

    • Fixes a crash that could occur on GPUs with the Pascal architecture when executing join queries with case expression projections.

    • Fixes a crash that could occur when using the LAG_IN_FRAME window function.

    • Fixes a crash that could occur when projecting geospatial columns from the tf_raster_contour_polygons table function.

    • Fixes an issue that could occur when calling window functions on encoded date columns.

    • Fixes a crash that could occur when the coalesce function is called with geospatial or array columns.

    • Fixes a crash that could occur when projecting case expressions with geospatial or array columns.

    • Fixes a crash that could occur due to rounding error when using the WIDTH_BUCKET function.

    • Fixes a crash that could occur in certain cases where left join queries are executed on GPU.

    • Fixes a crash that could occur for queries with joins on encoded date columns.

    • Fixes a crash that could occur when using the SAMPLE function on a geospatial column.

    • Fixes a crash that could occur for table functions with cursor arguments that specify no field type.

    • Fixes an issue where automatic casting does not work correctly for table function calls with ColumnList input arguments.

    • Fixes an issue where table function argument types are not correctly inferred when arithmetic operations are applied.

    • Fixes an intermittent crash that could occur for join queries due to a race condition when changing hash table layouts.

    • Fixes an out of CPU memory error that could occur when executing a query with a count distinct function call on a high cardinality column.

    • Fixes a crash that could occur when running a HeavyDB instance in read-only mode after previously executing write queries on tables.

    • Fixes an issue where the auto-vacuuming process does not immediately evict chunks that were pulled in for vacuuming.

    • Fixes a crash that could occur in certain cases when HeavyConnect is used with Parquet files containing null string values.

    • Fixes potentially inaccurate calculation of vertical attenuation from antenna patterns in HeavyRF.

    Heavy Render - New Features and Improvements

    • Add support for rendering a 1d cross-section as a line

    • Package the Vulkan loader libVulkan1 alongside heavydb

    Heavy Render - Fixed Issues

    • Fix a device lost error that could occur with complex polygon renders

    Heavy Immerse - New Features and Improvements

    • Data source Joins as a new custom data source type. (BETA)

    • Adds improved query performance defaults for the Contour Chart.

    • Adds access to new control panel to users with role "immerse_control_panel", even if the user is not a superuser.

    • Adds custom naming of map layers.

    • Adds custom map layer limit option using flag “ui/max_map_layers” which can be set explicitly (defaults to 8) or to -1 to remove the limit.

    Heavy Immerse - Fixed Issues

    • Renames role from “immerse_trial_mode” to “immerse_export_disabled” and renames corresponding flag from “ui/enable_trial_mode” to “ui/user_export_disabled”.

    • Various minor UI fixes and polishing.

    • Fixes an issue where changing parameter value causes Choropleth popup to lose selected popup columns.

    • Fixes an issue where changing parameter value causes Pointmap to lose selected popup columns.

    • Fixes an issue where building a Skew-T chart results in a blank browser page.

    • Fixes an issue where Skew-T chart did not display wind barbs.

    • Fixes an issue with default date and time formatting.

    • Fixes an issue where setting flag "ui/enable_map_exports" to false unexpectedly disabled table chart export.

    • Fixes an issue with date filter presets.

    • Fixes an issue where filters "Does Not Contain" or "Does not equal" did not work on Crosslinked Columns.

    • Fixes an issue where charts were not redrawing to show the current bounding box filter set by the Linemap chart.

    CPU-rendered Bubble chart

    When prompted, paste your license key in the text box and click Apply.

  • Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.


  • Click SCATTER.
  • Click Add Data Source.

  • Choose the flights_2008_10k table as the data source.

  • Click X Axis +Add Measure.

  • Choose depdelay.

  • Click Y Axis +Add Measure.

  • Choose arrdelay.

  • Click Size +Add Measure.

  • Choose airtime.

  • Click Color +Add Measure.

  • Choose dest_state.

  • The resulting chart shows, unsurprisingly, that there is a correlation between departure delay and arrival delay.

    GPU-rendered Scatterplot

    Create a new dashboard and a Bubble chart to verify that Heavy Immerse is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click Bubble.

    4. Click Select Data Source.

    5. Choose the flights_2008_10k table as the data source.

    6. Click Add Dimension.

    7. Choose carrier_name.

    8. Click Add Measure.

    9. Choose depdelay.

    10. Click Add Measure.

    11. Choose arrdelay.

    12. Click Add Measure.

    13. Choose #Records.

    The resulting chart shows, unsurprisingly, that average departure delay also correlates with average arrival delay, though with considerable differences between carriers.

    Install NVIDIA Drivers and Vulkan on Ubuntu
    You can use Heavy Immerse to import geospatial data into HeavyDB.

    Supported formats include:

    • Keyhole Markup Language (.kml)

    • GeoJSON (.geojson)

    • Shapefiles (.shp)

    • FlatGeobuf (.fgb)

    Shapefiles include four mandatory files: .shp, .shx, .dbf, and .prj. If you do not import the .prj file, the coordinate system will be incorrect and you cannot render the shapes on a map.

    To import geospatial definition data:

    1. Open Heavy Immerse.

    2. Click Data Manager.

    3. Click Import Data.

    4. Choose whether to import from a local file or an Amazon S3 instance. For details, see the documentation on importing from Amazon S3.

    5. Click the large + icon to select files for upload, or drag and drop the files to the Data Importer screen.

      When importing shapefiles, upload all required file types at the same time. If you upload them separately, Heavy Immerse issues an error message.

    6. Wait for the uploads to complete (indicated by green checkmarks on the file icons), then click Preview.

    7. On the Data Preview screen:

      • Edit the column headers (if needed).

      • Enter a name for the table in the field at the bottom of the screen.

    8. On the Successfully Imported Table screen, verify the rows and columns that compose your data table.

    Importing Well-Known Text

    You can import spatial representations in Well-known Text (WKT) format. WKT is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects, and transformations between spatial reference systems.


    When representing longitude and latitude in HEAVY.AI geospatial primitives, the first coordinate is assumed to be longitude by default.

    WKT Data Supported in Geospatial Columns

    You can use heavysql to define tables with columns that store WKT geospatial objects.

    Insert

    You can use heavysql to insert data as WKT string values.
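    A minimal heavysql sketch covering both the table definition and a WKT insert (table name, column names, and geometry type are illustrative):

```sql
CREATE TABLE wkt_geo (
  name TEXT ENCODING DICT(32),
  poly GEOMETRY(POLYGON, 4326)
);

-- Insert a geometry value as a WKT string
INSERT INTO wkt_geo VALUES ('triangle', 'POLYGON((0 0, 1 0, 0 1, 0 0))');
```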

    Importing Delimited Files

    You can insert data from CSV/TSV files containing WKT strings. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

    You can use your own custom delimiter in your data files.
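    For example, a pipe-delimited file could be loaded as follows (table and file names are hypothetical):

```sql
COPY wkt_geo FROM '/data/shapes.psv' WITH (delimiter='|');
```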

    Importing Legacy CSV/TSV Files

    Storing Geo Data

    You can import CSV and TSV files for tables that store longitude and latitude as either:

    • Separate consecutive scalar columns

    • A POINT field.

    If the data is stored as a POINT, you can use spatial functions like ST_Distance and ST_Contains. When location data are stored as a POINT column, they are displayed as such when querying the table:

    HEAVY.AI accepts data with any SRID, or with no SRID. HEAVY.AI supports SRID 4326 (WGS 84), and allows projections from SRID 4326 to SRID 900913 (Google Web Mercator). Geometries declared with SRID 4326 are compressed by default, and can be rendered and used to calculate geodesic distance. Geometries declared with any other SRID, or no SRID, are treated as planar geometries; the SRIDs are ignored.


    If two geometries are used in one operation (for example, in ST_Distance), the SRID values need to match.

    Importing the Data

    If you are using heavysql, create the table in HEAVY.AI with the POINT field defined as below:

    Then, import the file using COPY FROM in heavysql. By default, the two columns are consumed as longitude x and then latitude y. If the order of the coordinates in the CSV file is reversed, load the data using the WITH option lonlat='false':

    Columns can exist on either side of the point field; the lon/lat pair in the source file does not have to be at the beginning or end of the target table.

    If the imported coordinates are not in SRID 4326 (for example, SRID 2263), you can transform them to 4326 on the fly:
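    The steps above can be sketched as follows (table, column, and file names are hypothetical):

```sql
CREATE TABLE dest_points (
  name     TEXT ENCODING DICT(32),
  location GEOMETRY(POINT, 4326)
);

COPY dest_points FROM '/data/dest.csv';                                -- lon, lat order
COPY dest_points FROM '/data/dest_latlon.csv' WITH (lonlat='false');   -- lat, lon order
COPY dest_points FROM '/data/dest_2263.csv' WITH (source_srid=2263);   -- transform SRID 2263 to 4326
```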

    Importing CSV, TSV, and TXT Files in Immerse

    In Immerse, you define the table when loading the data instead of predefining it before import. Immerse supports appending data to a table by loading one or more files.

    Longitude and latitude can be imported as separate columns.

    Importing Geospatial Files

    You can create geo tables by importing specific geo file formats. HEAVY.AI supports the following types:

    • ESRI shapefile (.shp and associated files)

    • GeoJSON (.geojson or .json)

    • KML (.kml or .kmz)

    • ESRI file geodatabase (.gdb)


    An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table. See Importing an ESRI File Geodatabase for more information.

    You import geo files using the COPY FROM command with the geo option:
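    For example (table and file names are hypothetical):

```sql
COPY us_states FROM '/data/us_states.shp' WITH (geo='true');
```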

    The geo file import process automatically creates the table by detecting the column names and types explicitly described in the geo file header. It then creates a single geo column (always called heavyai_geo) that is of one of the supported types (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON).


    In Release 6.2 and higher, polygon render metadata assignment is disabled by default. This data is no longer required by the new polygon rendering algorithm introduced in Release 6.0. The new default results in significantly faster import for polygon table imports, particularly high-cardinality tables.

    If you need to revert to the legacy polygon rendering algorithm, polygons from tables imported in Release 6.2 may not render correctly. Those tables must be re-imported after setting the server configuration flag enable-assign-render-groups to true.

    The legacy polygon rendering algorithm and polygon render metadata server config will be removed completely in an upcoming release.


    Due to the prevalence of mixed POLYGON/MULTIPOLYGON geo files (and CSVs), if HEAVY.AI detects a POLYGON type geo file, HEAVY.AI creates a MULTIPOLYGON column and imports the data as single polygons.

    If the table does not already exist, it is created automatically.

    If the table already exists, and the data in the geo file has exactly the same column structure, the new file is appended to the existing table. This enables import of large geo data sets split across multiple files. The new file is rejected if it does not have the same column structure.

    By default, geo data is stored as GEOMETRY.

    You can also create tables with coordinates in SRID 3857 or SRID 900913 (Google Web Mercator). Importing data from shapefiles using SRID 3857 or 900913 is supported; importing data from delimited files into tables with these SRIDs is not supported at this time. To explicitly store in other formats, use the following WITH options in addition to geo='true':

    Compression used:

    • COMPRESSED(32) - 50% compression (default)

    • None - No compression

    Spatial reference identifier (SRID) type:

    • 4326 - EPSG:4326 (default)

    • 900913 - Google Web Mercator

    • 3857

    For example, the following explicitly sets the default values for encoding and SRID:
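    A sketch of such a command; the geo_coords_encoding and geo_coords_srid option names are assumptions, so verify them against the COPY FROM reference:

```sql
COPY geo_table FROM '/data/shapes.geojson'
WITH (geo='true', geo_coords_encoding='COMPRESSED(32)', geo_coords_srid=4326);
```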


    Rendering of geo LINESTRING, MULTILINESTRING, POLYGON and MULTIPOLYGON is possible only with data stored in the default lon/lat WGS84 (SRID 4326) format, although the type and encoding are flexible. Unless compression is explicitly disabled (NONE), all SRID 4326 geometries are compressed. For more information, see WGS84 Coordinate Compression.

    Note that rendering of geo MULTIPOINT is not yet supported.

    Importing an ESRI File Geodatabase

    An ESRI file geodatabase (.gdb) provides a method of storing GIS information in one large file that can have one or more "layers", with each layer containing disparate but related data. The data in each layer can be of different types. Importing a .gdb file results in the creation of one table for each layer in the file. You import an ESRI file geodatabase the same way that you import other geo file formats, using the COPY FROM command with the geo option:
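    For example (file path is hypothetical):

```sql
COPY mydata FROM '/data/mydata.gdb' WITH (geo='true');
```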

    The layers in the file are scanned and defined by name and contents. Contents are classified as EMPTY, GEO, NON_GEO or UNSUPPORTED_GEO:

    • EMPTY layers are skipped because they contain no useful data.

    • GEO layers contain one or more geo columns of a supported type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON) and one or more regular columns, and can be imported to a single table in the same way as the other geo file formats.

    • NON_GEO layers contain no geo columns and one or more regular columns, and can be imported to a regular table. Although the data comes from a geo file, data in this layer does not result in a geo table.

    • UNSUPPORTED_GEO layers contain geo columns of a type not currently supported (for example, GEOMETRYCOLLECTION). These layers are skipped because they cannot be imported completely.

    A single COPY FROM command can result in multiple tables, one for each layer in the file. The table names are automatically generated by appending the layer name to the provided table name.

    For example, consider the geodatabase file mydata.gdb which contains two importable layers with names A and B. Running COPY FROM creates two tables, mydata_A and mydata_B, with the data from layers A and B, respectively. The layer names are appended to the provided table name. If the geodatabase file only contains one layer, the layer name is not appended.

    You can load one specific layer from the geodatabase file by using the geo_layer_name option:
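    For example, to load only layer A from mydata.gdb:

```sql
COPY mydata FROM '/data/mydata.gdb' WITH (geo='true', geo_layer_name='A');
```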

    This loads only layer A, if it is importable. The resulting table is called mydata, and the layer name is not appended. Use this import method if you want to set a different name for each table. If the layer name from the geodatabase file would result in an illegal table name when appended, the name is sanitized by removing any illegal characters.

    Importing Geo Files from Archives or Non-Local Storage

    You can import geo files directly from archive files (for example, .zip .tar .tgz .tar.gz) without unpacking the archive. You can directly import individual geo files compressed with Zip or GZip (GeoJSON and KML only). The server opens the archive header and loads the first candidate file it finds (.shp .geojson .json .kml), along with any associated files (in the case of an ESRI Shapefile, the associated files must be siblings of the first).

    You can import geo files or archives directly from an Amazon S3 bucket.

    You can provide Amazon S3 credentials, if required, by setting variables in the environment of the heavysql process…

    You can also provide your credentials explicitly in the COPY FROM command.
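    A sketch of an S3 import with inline credentials; the s3_* option names are assumptions based on the usual COPY FROM S3 options, and the key values are placeholders:

```sql
COPY geo_table FROM 's3://my-bucket/shapes.geojson'
WITH (geo='true',
      s3_region='us-east-1',
      s3_access_key='XXXXXXXX',
      s3_secret_key='XXXXXXXX');
```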

    You can import geo files or archives directly from an HTTP/HTTPS website.


    WGS84 Coordinate Compression

    You can extend a column type specification to include spatial reference (SRID) and compression mode information.

    Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32). In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.

    COMPRESSED(32) compression maps lon/lat degree ranges to 32-bit integers, providing a smaller memory footprint and faster query execution. The effect on precision is small, approximately 4 inches at the equator.

    You can disable compression by explicitly choosing ENCODING NONE.
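    A sketch of such a table definition (the poly2 and mpoly2 columns are named in the text; the remaining column is illustrative):

```sql
CREATE TABLE geo2 (
  poly2  GEOMETRY(POLYGON, 4326),                 -- ENCODING COMPRESSED(32) by default
  mpoly2 GEOMETRY(MULTIPOLYGON, 4326),            -- ENCODING COMPRESSED(32) by default
  poly3  GEOMETRY(POLYGON, 4326) ENCODING NONE    -- compression explicitly disabled
);
```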

    Table functions can accept the results of one or more SQL queries as input, each wrapped in a CURSOR argument. The number and types of input arguments, as well as the number and types of output arguments, are specified in the table function definition itself.

    Table functions allow for the efficient execution of advanced algorithms that may be difficult or impossible to express in canonical SQL. By allowing execution of code directly over SQL result sets, leveraging the same hardware parallelism used for fast SQL execution and visualization rendering, HEAVY.AI provides orders-of-magnitude speed increases over the alternative of transporting large result sets to other systems for post-processing and then returning to HEAVY.AI for storage or downstream manipulation. You can easily invoke system-provided or user-defined algorithms directly inline with SQL and rendering calls, making prototyping and deployment of advanced analytics capabilities easier and more streamlined.

    Concepts

    CURSOR Subquery Inputs

    Table functions can take as input arguments both constant literals (including scalar results of subqueries) as well as results of other SQL queries (consisting of one or more rows). The latter (SQL query inputs), per the SQL standard, must be wrapped in the keyword CURSOR. Depending on the table function, there can be 0, 1, or multiple CURSOR inputs. For example:
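    For example (my_udtf and its signature are hypothetical):

```sql
SELECT *
FROM TABLE(
  my_udtf(
    CURSOR(SELECT x, y FROM points),   -- one SQL query input, wrapped in CURSOR
    10                                 -- one constant literal input
  )
);
```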

    ColumnList Inputs

    Certain table functions can take one or more columns of a specified type or types as inputs, denoted as ColumnList<TYPE1 | TYPE2 ... TYPEN>. Even if a function allows a ColumnList input of multiple types, the arguments must all be of one type; types cannot be mixed. For example, if a function allows ColumnList<INT | TEXT ENCODING DICT>, one or more columns of either INTEGER or TEXT ENCODING DICT can be used as inputs, but all must be either INT columns or TEXT ENCODING DICT columns.

    Named Arguments

    All HEAVY.AI system table functions allow you to specify arguments either in conventional comma-separated form in the order specified by the table function signature, or alternatively via a key-value map where input argument names are mapped to argument values using the => token. For example, the following two calls are equivalent:
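    For example, using generate_series (the named-argument identifiers shown are assumptions; verify them with SHOW TABLE FUNCTIONS DETAILS):

```sql
SELECT * FROM TABLE(generate_series(1, 10, 2));
SELECT * FROM TABLE(generate_series(series_start => 1, series_stop => 10, series_step => 2));
```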

    Filter Push-Down

    For performance reasons, particularly when table functions are used as actual tables in a client like Heavy Immerse, many system table functions in HEAVY.AI automatically "push down" filters on certain output columns in the query onto the inputs. For example, if a table function does some computation over an x and y range such that x and y appear in both the input and output of the table function, filter push-down would likely be enabled, so that a query filtering on the x and y outputs automatically pushes that filter down to the x and y inputs. This can increase query performance significantly.
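    For example, a filter such as the following could be pushed down from the outputs to the CURSOR inputs (my_spatial_udtf is hypothetical):

```sql
SELECT *
FROM TABLE(my_spatial_udtf(CURSOR(SELECT x, y, val FROM points)))
WHERE x BETWEEN 0 AND 100
  AND y BETWEEN 0 AND 100;
```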

    To determine whether filter push-down is used, you can check the Boolean value of the filter_table_transpose column from the query:


    Currently for system table functions, you cannot change push-down behavior.

    Querying Registered Table Functions

    You can query which table functions are available using SHOW TABLE FUNCTIONS:

    hashtag
    Query Metadata for a Specific Table Function

Information about the expected input and output argument names and types, as well as other information such as whether the function can run on CPU, GPU, or both, and whether filter push-down is enabled, can be queried via SHOW TABLE FUNCTIONS DETAILS <table_function_name>;

    hashtag
    System Table Functions

The following system table functions are available in HEAVY.AI. The table provides a summary of and links to more information about each function.

    Function
    Purpose

    Generates random string data.

    Generates a series of integer values.

    Generates a series of timestamp values from start_timestamp to end_timestamp.

    circle-info

    For information about the HeavyRF radio frequency propagation simulation and HeavyRF table functions, see HeavyRFarrow-up-right.

    circle-info

    The TABLE command is required to wrap a table function clause; for example: select * from TABLE(generate_series(1, 10));

    The CURSOR command is required to wrap any subquery inputs.

    Numbaarrow-up-right
    User-Defined Table Functions

    HEAVY.AI Installation using Docker on Ubuntu

Follow these steps to install HEAVY.AI as a Docker container on a machine running on CPU only or with supported NVIDIA GPU cards, using Ubuntu as the host OS.

    hashtag
    Preparation

Prepare your host by installing Docker and, if needed for your configuration, the NVIDIA drivers and NVIDIA runtime.

    hashtag
    Install Docker

Remove any existing Docker installs and, if on GPU, the legacy NVIDIA Docker runtime.

Add Docker's GPG key using curl and ca-certificates.

    Add Docker to your Apt repository.

    Update your repository.

    Install Docker, the command line interface, and the container runtime.

Run the following usermod command so that Docker command execution does not require sudo privileges (recommended). Log out and log back in for the change to take effect.

    Verify your Docker installation.
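The steps above can be sketched as follows. This is one common sequence based on Docker's official Ubuntu instructions; the repository URL and package names are Docker's, not HEAVY.AI-specific, and may differ on your Ubuntu release:

```shell
# Remove legacy Docker packages (and, on GPU hosts, the old nvidia-docker2 runtime)
sudo apt-get remove -y docker docker-engine docker.io containerd runc nvidia-docker2

# Add Docker's GPG key and apt repository
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine, the command line interface, and the container runtime
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Allow docker without sudo (log out and back in afterwards), then verify
sudo usermod -aG docker $USER
docker run --rm hello-world
```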

For more information on Docker installation, see the Docker Installation Guide.

    hashtag
    Install NVIDIA Drivers and NVIDIA Container ᴳᴾᵁ ᴼᴾᵀᴵᴼᴺ

    hashtag
    Install NVIDIA Drivers

Install the NVIDIA driver and CUDA Toolkit by following the instructions in Install NVIDIA Drivers and Vulkan on Ubuntu.

    hashtag
    Install NVIDIA Docker Runtime

Use curl to add NVIDIA's GPG key:

    Update your sources list:

    Update apt-get and install nvidia-container-runtime:

    Edit /etc/docker/daemon.json to add the following, and save the changes:

    Restart the Docker daemon:
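The daemon.json edit and daemon restart can look like the following sketch. The runtime path shown is the one installed by the nvidia-container-runtime package; setting default-runtime to nvidia is a common but optional choice:

```shell
# Register the NVIDIA runtime with the Docker daemon
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# Restart the Docker daemon to pick up the change
sudo systemctl restart docker
```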

    hashtag
    Check NVIDIA Drivers

Verify that Docker and the NVIDIA runtime work together.

If everything is working, you should see the output of the nvidia-smi command showing the installed GPUs in the system.
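A quick way to verify is to run nvidia-smi inside a throwaway CUDA container. The image tag below is an assumption for illustration; use a CUDA version supported by your installed driver:

```shell
# Should print the nvidia-smi table listing your GPUs
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```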

    hashtag
    HEAVY.AI Installation

Create a directory to store data and configuration files.

Then create a minimal configuration file for the Docker installation.

    circle-exclamation

Ensure that you have sufficient storage on the drive you choose for your storage directory by running a command such as df -h.
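A minimal sketch of these steps, assuming the default /var/lib/heavyai storage location and the standard HeavyDB ports (6274/6278 for the server, 6273 for the web server); adjust paths and ports for your environment:

```shell
# Create the storage directory
sudo mkdir -p /var/lib/heavyai

# Write a minimal heavy.conf for a Docker install
sudo tee /var/lib/heavyai/heavy.conf > /dev/null <<'EOF'
port = 6274
http-port = 6278
data = "/var/lib/heavyai/storage"

[web]
port = 6273
frontend = "/opt/heavyai/frontend"
EOF

# Check free space on the drive backing the storage directory
df -h /var/lib/heavyai
```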

    Optional: Download HEAVY.AI from Release Website

The subsequent section downloads and installs an image from DockerHub. However, if you want to avoid pulling from DockerHub and instead download and prepare a specific image, follow the instructions in this section. To download a specific version, visit one of the following websites, choose the version that you wish to install, right-click, and select "COPY URL".

    Enterprise/Free Editions:

    Open Source Editions:

    circle-info

Use files ending in -render-docker.tar.gz to install GPU editions and -cpu-docker.tar.gz to install CPU editions.

Then, on the server where you want to install HEAVY.AI, run the following command (replacing $DOWNLOAD_URL with the URL from your clipboard).

    wget $DOWNLOAD_URL

Await the successful download, then run ls | grep heavy to see the filename of the package you just downloaded. Copy the filename to your clipboard, and then run the next command, replacing $DOWNLOADED_FILENAME with the contents of your clipboard.

    docker load < $DOWNLOADED_FILENAME

    The command will return a Docker image name. Replace heavyai/heavyai-(...):latest with the image you just loaded.

    Download HEAVY.AI from DockerHub and Start HEAVY.AI in Docker.

    Select the tab depending on the Edition (Enterprise, Free, or Open Source) and execution Device (GPU or CPU) you are going to use.

    circle-info

Replace ":latest" with ":vX.Y.Z" to pull a specific Docker version (e.g., heavyai-ee-cuda:v8.0.1).
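As a sketch, pulling and starting the Enterprise Edition GPU image (heavyai/heavyai-ee-cuda, as named in the note above) might look like this. The volume and port mappings assume the default storage directory and port range described elsewhere in this guide:

```shell
# Start HeavyDB and Heavy Immerse in a detached container
docker run -d --gpus all \
  -v /var/lib/heavyai:/var/lib/heavyai \
  -p 6273-6278:6273-6278 \
  --name heavyai \
  heavyai/heavyai-ee-cuda:latest

# Confirm the container is up
docker ps
```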

Check that the container is up and running using a docker ps command:

    You should see an output similar to the following.

See also the note regarding the CUDA JIT Cache in Optimizing Performance.

    hashtag
    Configure Firewall ᴼᴾᵀᴵᴼᴺᴬᴸ

If a firewall is not already installed and you want to harden your system, install the ufw package.

    To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access.
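Mirroring the ufw commands used in the bare-metal installation recipe, opening the HEAVY.AI port range can be sketched as:

```shell
# Install and enable a basic firewall, keeping SSH and the HEAVY.AI ports open
sudo apt install ufw
sudo ufw allow ssh
sudo ufw allow 6273:6278/tcp
sudo ufw enable
```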

    circle-info

    Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.

    For more information, see .

    hashtag
    Licensing HEAVY.AI ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

If you are on Enterprise or Free Edition, you need to validate your HEAVY.AI instance using your license key. Skip this section if you are on Open Source Edition.

1. Copy your Enterprise or Free Edition license key from the registration email message. If you don't have a license and you want to evaluate HEAVY.AI in an enterprise environment, contact your Sales Representative or register for your 30-day trial of Enterprise Edition here. If you need a Free license, you can get one here.

2. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    hashtag
    Command-Line Access

    You can access the command line in the Docker image to perform configuration and run HEAVY.AI utilities.

    You need to know the container-id to access the command line. Use the command below to list the running containers.

    You see output similar to the following.

    Once you have your container ID, in the example 9e01e520c30c, you can access the command line using the Docker exec command. For example, here is the command to start a Bash session in the Docker instance listed above. The -it switch makes the session interactive.

    You can end the Bash session with the exit command.
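Putting the steps together, using the example container ID shown above:

```shell
docker ps                          # note the CONTAINER ID column
docker exec -it 9e01e520c30c bash  # start an interactive Bash session in the container
exit                               # end the Bash session
```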

    hashtag
    Final Checks

To verify that everything is working, load some sample data, perform a heavysql query, and generate a Scatter Plot or a Bubble Chart using Heavy Immerse.

    hashtag
    Load Sample Data and Run a Simple Query

HEAVY.AI ships with three sample datasets: two of airline flight information collected in 2008, and one from a census of New York City trees. To install the sample data, run the following command.

    Where <container-id> is the container in which HEAVY.AI is running.
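Based on the insert_sample_data script shipped in the HEAVY.AI install directory (see the bare-metal recipe later in this guide), the command can be sketched as follows; the --data path assumes the default storage location:

```shell
# Replace <container-id> with the ID of your running HEAVY.AI container
docker exec -it <container-id> ./insert_sample_data --data /var/lib/heavyai/storage
```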

    When prompted, choose whether to insert dataset 1 (7,000,000 rows), dataset 2 (10,000 rows), or dataset 3 (683,000 rows). The examples below use dataset 2.

Connect to HeavyDB by entering the following command (you will be prompted for a password; the default password is HyperInteractive):

    Enter a SQL query such as the following:

The results should be similar to the following.

    hashtag
    Create a Dashboard Using Heavy Immerse ᵉᵉ⁻ᶠʳᵉᵉ ᵒⁿˡʸ

If you installed Enterprise or Free Edition, check that Heavy Immerse is running as intended.

    1. Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.

    2. Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

    Create a new dashboard and a Scatter Plot to verify that backend rendering is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    hashtag
    ¹ In the OS Edition, Heavy Immerse Service is unavailable.

    hashtag
    ² The OS Edition does not require a license key.

    Upgrading from Omnisci to HEAVY.AI 6.0

This section provides a recipe for upgrading from the Omnisci platform 5.5+ to HEAVY.AI 6.0.

    circle-exclamation

If the version of Omnisci is older than 5.5, an intermediate upgrade to version 5.5 is needed. Check the docs on how to perform the intermediate upgrade.

    sudo apt update
    sudo apt upgrade
    sudo apt install curl
    sudo apt install libncurses5
    sudo apt install default-jre-headless apt-transport-https
    sudo reboot
    sudo useradd --user-group --create-home --group sudo heavyai
    sudo passwd heavyai
    sudo su - heavyai
    curl https://releases.heavy.ai/GPG-KEY-heavyai | sudo apt-key add -
    echo "deb https://releases.heavy.ai/ee/apt/ stable cuda" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    echo "deb https://releases.heavy.ai/ee/apt/ stable cpu" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    echo "deb https://releases.heavy.ai/os/apt/ stable cuda" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    echo "deb https://releases.heavy.ai/os/apt/ stable cpu" \
    | sudo tee /etc/apt/sources.list.d/heavyai.list
    sudo apt update
    sudo apt install heavyai
    hai_version="6.0.0"
    sudo apt install heavyai=$(apt-cache madison heavyai | grep $hai_version | cut -f 2 -d '|' | xargs)
    sudo mkdir /opt/heavyai && sudo chown $USER /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-render.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/ee/tar/heavyai-ee-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
    curl \
    https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz \
    | sudo tar zxf - --strip-components=1 -C /opt/heavyai
echo "# HEAVY.AI variable and paths
export HEAVYAI_PATH=/opt/heavyai
export HEAVYAI_BASE=/var/lib/heavyai
export HEAVYAI_LOG=\$HEAVYAI_BASE/storage/log
export PATH=\$HEAVYAI_PATH/bin:\$PATH" \
    >> ~/.bashrc
    source ~/.bashrc
    cd $HEAVYAI_PATH/systemd
    ./install_heavy_systemd.sh
    sudo systemctl enable heavydb --now
    sudo systemctl enable heavy_web_server --now
    sudo systemctl enable heavydb --now
    sudo apt install ufw
    sudo ufw allow ssh
    sudo ufw disable
    sudo ufw allow 6273:6278/tcp
    sudo ufw enable
    cd $HEAVYAI_PATH
    sudo ./insert_sample_data --data /var/lib/heavyai/storage
    #     Enter dataset number to download, or 'q' to quit:
    Dataset           Rows    Table Name          File Name
    1)    Flights (2008)    7M      flights_2008_7M     flights_2008_7M.tar.gz
    2)    Flights (2008)    10k     flights_2008_10k    flights_2008_10k.tar.gz
    3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
    $HEAVYAI_PATH/bin/heavysql
    password: ••••••••••••••••
    SELECT origin_city AS "Origin", 
    dest_city AS "Destination", 
    AVG(airtime) AS "Average Airtime" 
    FROM flights_2008_10k WHERE distance < 175 
    GROUP BY origin_city, dest_city;
    Origin|Destination|Average Airtime
    Austin|Houston|33.055556
    Norfolk|Baltimore|36.071429
    Ft. Myers|Orlando|28.666667
    Orlando|Ft. Myers|32.583333
    Houston|Austin|29.611111
    Baltimore|Norfolk|31.714286
    heavysql> \d geo
    CREATE TABLE geo (
    p POINT,
    l LINESTRING,
    poly POLYGON)
    heavysql> INSERT INTO geo values('POINT(20 20)', 'LINESTRING(40 0, 40 40)', 
    'POLYGON(( 0 0, 40 0, 40 40, 0 40, 0 0 ))');
    > cat geo.csv
    "p", "l", "poly"
    "POINT(1 1)", "LINESTRING( 2 0,  2  2)", "POLYGON(( 1 0,  0 1, 1 1 ))"
    "POINT(2 2)", "LINESTRING( 4 0,  4  4)", "POLYGON(( 2 0,  0 2, 2 2 ))"
    "POINT(3 3)", "LINESTRING( 6 0,  6  6)", "POLYGON(( 3 0,  0 3, 3 3 ))"
    "POINT(4 4)", "LINESTRING( 8 0,  8  8)", "POLYGON(( 4 0,  0 4, 4 4 ))"
    heavysql> COPY geo FROM 'geo.csv';
    Result
    Loaded: 4 recs, Rejected: 0 recs in 0.356000 secs
    > cat geo1.csv
    "p", "l", "poly"
    POINT(5 5); LINESTRING(10 0, 10 10); POLYGON(( 5 0, 0 5, 5 5 ))
    heavysql> COPY geo FROM 'geo1.csv' WITH (delimiter=';', quoted='false');
    Result
    Loaded: 1 recs, Rejected: 0 recs in 0.148000 secs
    select * from destination_points;
    name|pt
    Just Fishing Around|POINT (-85.499999999727588 44.6929999755849)
    Moonlight Cove Waterfront|POINT (-85.5046011346879 44.6758447935227)
    CREATE TABLE new_geo (p GEOMETRY(POINT,4326))
    heavysql> COPY new_geo FROM 'legacy_geo.csv' WITH (lonlat='false');
    heavysql> COPY new_geo FROM 'legacy_geo_2263.csv' WITH (source_srid=2263, lonlat='false');
    heavysql> COPY states FROM 'states.shp' WITH (geo='true');
    heavysql> COPY zipcodes FROM 'zipcodes.geojson' WITH (geo='true');
    heavysql> COPY cell_towers FROM 'cell_towers.kml' WITH (geo='true');
    geo_coords_encoding='COMPRESSED(32)'
    geo_coords_srid=4326
    heavysql> COPY counties FROM 'counties.gdb' WITH (geo='true');
    COPY mydata FROM 'mydata.gdb' WITH (geo='true', geo_layer_name='A');
    $ unzip -l states.zip
    Archive:  states.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  2018-02-13 11:09   states/
       446116  2017-11-06 12:15   states/cb_2014_us_state_20m.shp
         8434  2017-11-06 12:15   states/cb_2014_us_state_20m.dbf
            9  2017-11-06 12:15   states/cb_2014_us_state_20m.cpg
          165  2017-11-06 12:15   states/cb_2014_us_state_20m.prj
          516  2017-11-06 12:15   states/cb_2014_us_state_20m.shx
    ---------                     -------
       491525                     6 files
    
    heavysql> COPY states FROM 'states.zip' with (geo='true');
    heavysql> COPY zipcodes FROM 'zipcodes.geojson.gz' with (geo='true');
    heavysql> COPY zipcodes FROM 'zipcodes.geojson.zip' with (geo='true');
    heavysql> COPY cell_towers FROM 'cell_towers.kml.gz' with (geo='true');
    heavysql> COPY states FROM 's3://mybucket/myfolder/states.shp' with (geo='true');
    heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' with (geo='true');
    heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.gz' with (geo='true');
    heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.zip' with (geo='true');
    AWS_REGION=us-west-1
    AWS_ACCESS_KEY_ID=********************
    AWS_SECRET_ACCESS_KEY=****************************************
    heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' WITH (geo='true', s3_region='us-west-1', s3_access_key='********************', s3_secret_key='****************************************');  
    heavysql> COPY states FROM 'http://www.mysite.com/myfolder/states.zip' with (geo='true');
    CREATE TABLE geo2 (
    p2 GEOMETRY(POINT, 4326) ENCODING NONE,
    l2 GEOMETRY(LINESTRING, 900913),
    poly2 GEOMETRY(POLYGON, 4326),
    mpoly2 GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));
SELECT * FROM TABLE(my_table_function /* This is only an example! */ (
     CURSOR(SELECT arg1, arg2, arg3 FROM input_1 WHERE x > 10) /* First CURSOR 
     argument consisting of 3 columns */,
 CURSOR(SELECT arg1, AVG(arg2) FROM input_2 WHERE y < 40 GROUP BY arg1)
 /* Second CURSOR argument consisting of 2 columns. This could be from the same
     table as the first CURSOR, or as is the case here, a completely different table
     (or even joined table or logical value expression) */,
     'Fred' /* TEXT constant literal argument */,
     true /* BOOLEAN constant literal argument */,
 (SELECT COUNT(*) FROM another_table) /* scalar subquery results do not need
 to be wrapped in a CURSOR */,
     27.3 /* FLOAT constant literal argument */))
    WHERE output1 BETWEEN 32.2 AND 81.8;
    /* The following two table function calls, the first with unnamed
     signature-ordered arguments, and the second with named arguments,
     are equivalent */
    
    select
      *
    from
      table(
        tf_compute_dwell_times(
          /* Without the use of named arguments, input arguments must
          be ordered as specified by the table function signature */
          cursor(
            select
              user_id,
              movie_id,
              ts
            from
              netflix_audience_behavior
          ),
          3,
          600,
          10800
        )
      )
    order by
      num_dwell_points desc
    limit
      10;
    
    
    select
      *
    from
      table(
        tf_compute_dwell_times(
         /* Using named arguments, input arguments can be
         ordered in any order, as long as all arguments are named */
     min_dwell_seconds => 600,
     max_inactive_seconds => 10800,
      data => cursor(
            select
              user_id,
              movie_id,
              ts
            from
              netflix_audience_behavior
          ),
          min_dwell_points => 3
        )
      )
    order by
      num_dwell_points desc
    limit
      10;
    SELECT
      *
    FROM
      TABLE(
        my_spatial_table_function(
          CURSOR(
            SELECT
              x,
              y
            from
              spatial_data_table
              /* Presuming filter push down is enabled for 
              my_spatial_table_function, the filter applied to 
              x and y will be applied here to the table function
              input CURSOR */
          )
        )
      )
    WHERE
      x BETWEEN 38.2
      AND 39.1
  and y BETWEEN -121.4
      and -120.1;
    SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
    SHOW TABLE FUNCTIONS;
    
    Table UDF
    
    tf_feature_similarity
    tf_feature_self_similarity
    tf_geo_rasterize_slope
    ...
    SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
    
    name|signature|input_names|input_types|output_names|output_types|CPU|GPU|Runtime|filter_table_transpose
    generate_series|(i64 series_start, i64 series_stop, i64 series_step) -> Column<i64>|[series_start, series_stop, series_step]|[i64, i64, i64]|[generate_series]|[Column<i64>]|true|false|false|false
    generate_series|(i64 series_start, i64 series_stop) -> Column<i64>|[series_start, series_stop]|[i64, i64]|[generate_series]|[Column<i64>]|true|false|false|false

    If you are loading the data files into a distributed system, verify under Import Settings that the Replicate Table checkbox is selected.

  • Click Import Data.

  • - EPSG:3857
    Importing Data from Amazon S3

    Given a query input with entity keys and timestamps, and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session.

    tf_feature_self_similarity

    Given a query input of entity keys/IDs, a set of feature columns, and a metric column, scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.

    tf_feature_similarity

Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and metric, computes the similarity of each entity in the first input to the search vector. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) for the search vector, which can optionally be TF/IDF weighted.

    tf_geo_rasterize

Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing a value for each bin from the z values of the points that fall in it. The aggregate performed to compute the value for each bin is specified by agg_type, with allowed aggregate types of AVG, COUNT, SUM, MIN, and MAX.

    tf_geo_rasterize_slope

    Similar to tf_geo_rasterize, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin.

    tf_graph_shortest_path

Given a distance-weighted directed graph, consisting of a query CURSOR input with the starting and ending node for each edge and a distance, and a specified origin and destination node, computes the shortest distance-weighted path through the graph between origin_node and destination_node.

    tf_graph_shortest_paths_distances

Given a distance-weighted directed graph, consisting of a query CURSOR input with the starting and ending node for each edge and a distance, and a specified origin node, computes the shortest distance-weighted path distance between the origin_node and every other node in the graph.

    tf_load_point_cloud

Loads one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs. If not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs.

    tf_mandelbrot tf_mandelbrot_cuda tf_mandelbrot_cuda_float tf_mandelbrot_float

    Computes the Mandelbrot setarrow-up-right over the complex domain [x_min, x_max), [y_min, y_max), discretizing the xy-space into an output of dimensions x_pixels X y_pixels.

    tf_point_cloud_metadata

    Returns metadata for one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min, x_max, y_min, y_max arguments.

    tf_raster_contour_lines tf_raster_contour_polygons

Processes a raster input to derive contour lines or regions and outputs them as LINESTRING or POLYGON for rendering or further processing.

    tf_raster_graph_shortest_slope_weighted_path

Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin, and then computes the slope-weighted shortest path over the resulting grid between a specified origin and destination.

    tf_rf_prop

    Used for generating top-k signals where 'k' represents the maximum number of antennas to consider at each geographic location. The full relevant parameter name is strongest_k_sources_per_terrain_bin.

    tf_rf_prop_max_signal (Directional Antennas) tf_rf_prop_max_signal (Isotropic Antennas)

    Taking a set of point elevations and a set of signal source locations as input, tf_rf_prop_max_signal executes line-of-sight 2.5D RF signal propagation from the provided sources over a binned 2.5D elevation grid derived from the provided point locations, calculating the max signal in dBm at each grid cell, using the formula for free-space power loss.

    generate_random_strings
    generate_series (Integers)
    generate_series (Timestamps)
    tf_compute_dwell_times
CPU-rendered Bubble Chart

    Reserved Words

    Following is a list of HEAVY.AI keywords.

    ABS
    ACCESS
    ADD
    ALL
    ALLOCATE
    ALLOW
    ALTER
    AMMSC
    AND
    ANY
    ARCHIVE
    ARE
    ARRAY_MAX_CARDINALITY
    ARRAY
    AS
    ASC
    ASENSITIVE
    ASYMMETRIC
    AT
    ATOMIC
    AUTHORIZATION
    AVG
    BEGIN
    BEGIN_FRAME
    BEGIN_PARTITION
    BETWEEN
    BIGINT
    BINARY
    BIT
    BLOB
    BOOLEAN
    BOTH
    BY
    CALL
    CALLED
    CARDINALITY
    CASCADED
    CASE
    CAST
    CEIL
    CEILING
    CHAR
    CHARACTER
    CHARACTER_LENGTH
    CHAR_LENGTH
    CHECK
    CLASSIFIER
    CLOB
    CLOSE
    COALESCE
    COLLATE
    COLLECT
    COLUMN
    COMMIT
    CONDITION
    CONNECT
    CONSTRAINT
    CONTAINS
    CONTINUE
    CONVERT
    COPY
    CORR
    CORRESPONDING
    COUNT
    COVAR_POP
    COVAR_SAMP
    CREATE
    CROSS
    CUBE
    CUME_DIST
    CURRENT
    CURRENT_CATALOG
    CURRENT_DATE
    CURRENT_DEFAULT_TRANSFORM_GROUP
    CURRENT_PATH
    CURRENT_ROLE
    CURRENT_ROW
    CURRENT_SCHEMA
    CURRENT_TIME
    CURRENT_TIMESTAMP
    CURRENT_TRANSFORM_GROUP_FOR_TYPE
    CURRENT_USER
    CURSOR
    CYCLE
    DASHBOARD
    DATABASE
    DATE
    DATE_TRUNC
    DATETIME
    DAY
    DEALLOCATE
    DEC
    DECIMAL
    DECLARE
    DEFAULT
    DEFINE
    DELETE
    DENSE_RANK
    DEREF
    DESC
    DESCRIBE
    DETERMINISTIC
    DISALLOW
    DISCONNECT
    DISTINCT
    DOUBLE
    DROP
    DUMP
    DYNAMIC
    EACH
    EDIT
    EDITOR
    ELEMENT
    ELSE
    EMPTY
    END
    END-EXEC
    END_FRAME
    END_PARTITION
    EQUALS
    ESCAPE
    EVERY
    EXCEPT
    EXEC
    EXECUTE
    EXISTS
    EXP
    EXPLAIN
    EXTEND
    EXTERNAL
    EXTRACT
    FALSE
    FETCH
    FILTER
    FIRST
    FIRST_VALUE
    FLOAT
    FLOOR
    FOR
    FOREIGN
    FOUND
    FRAME_ROW
    FREE
    FROM
    FULL
    FUNCTION
    FUSION
    GEOGRAPHY 
    GEOMETRY 
    GET
    GLOBAL
    GRANT
    GROUP
    GROUPING
    GROUPS
    HAVING
    HOLD
    HOUR
    IDENTITY
    IF
    ILIKE
    IMPORT
    IN
    INDICATOR
    INITIAL
    INNER
    INOUT
    INSENSITIVE
    INSERT
    INT
    INTEGER
    INTERSECT
    INTERSECTION
    INTERVAL
    INTO
    IS
    JOIN
    LAG
    LANGUAGE
    LARGE
    LAST_VALUE
    LAST
    LATERAL
    LEAD
    LEADING
    LEFT
    LENGTH
    LIKE
    LIKE_REGEX
    LIMIT
    LINESTRING 
    LN
    LOCAL
    LOCALTIME
    LOCALTIMESTAMP
    LOWER
    MATCH
    MATCH_NUMBER
    MATCH_RECOGNIZE
    MATCHES
    MAX
    MEASURES
    MEMBER
    MERGE
    METHOD
    MIN
    MINUS
    MINUTE
    MOD
    MODIFIES
    MODULE
    MONTH
    MULTIPOLYGON 
    MULTISET
    NATIONAL
    NATURAL
    NCHAR
    NCLOB
    NEW
    NEXT
    NO
    NONE
    NORMALIZE
    NOT
    NOW
    NTH_VALUE
    NTILE
    NULL
    NULLIF
    NULLX
    NUMERIC
    OCCURRENCES_REGEX
    OCTET_LENGTH
    OF
    OFFSET
    OLD
    OMIT
    ON
    ONE
    ONLY
    OPEN
    OPTIMIZE
    OPTION
    OR
    ORDER
    OUT
    OUTER
    OVER
    OVERLAPS
    OVERLAY
    PARAMETER
    PARTITION
    PATTERN
    PER
    PERCENT
    PERCENT_RANK
    PERCENTILE_CONT
    PERCENTILE_DISC
    PERIOD
    PERMUTE
    POINT 
    POLYGON 
    PORTION
    POSITION
    POSITION_REGEX
    POWER
    PRECEDES
    PRECISION
    PREPARE
    PREV
    PRIMARY
    PRIVILEGES
    PROCEDURE
    PUBLIC
    RANGE
    RANK
    READS
    REAL
    RECURSIVE
    REF
    REFERENCES
    REFERENCING
    REGR_AVGX
    REGR_AVGY
    REGR_COUNT
    REGR_INTERCEPT
    REGR_R2
    REGR_SLOPE
    REGR_SXX
    REGR_SXY
    REGR_SYY
    RELEASE
    RENAME
    RESET
    RESULT
    RESTORE
    RETURN
    RETURNS
    REVOKE
    RIGHT
    ROLE 
    ROLLBACK
    ROLLUP
    ROW
    ROW_NUMBER
    ROWS
    ROWID 
    RUNNING
    SAVEPOINT
    SCHEMA
    SCOPE
    SCROLL
    SEARCH
    SECOND
    SEEK
    SELECT
    SENSITIVE
    SESSION_USER
    SET
    SHOW
    SIMILAR
    SKIP
    SMALLINT
    SOME
    SPECIFIC
    SPECIFICTYPE
    SQL
    SQLEXCEPTION
    SQLSTATE
    SQLWARNING
    SQRT
    START
    STATIC
    STDDEV_POP
    STDDEV_SAMP
    STREAM
    SUBMULTISET
    SUBSET
    SUBSTRING
    SUBSTRING_REGEX
    SUCCEEDS
    SUM
    SYMMETRIC
    SYSTEM
    SYSTEM_TIME
    SYSTEM_USER
    TABLE
    TABLESAMPLE
    TEMPORARY
    TEXT
    THEN
    TIME
    TIMESTAMP
    TIMEZONE_HOUR
    TIMEZONE_MINUTE
    TINYINT
    TO
    TRAILING
    TRANSLATE
    TRANSLATE_REGEX
    TRANSLATION
    TREAT
    TRIGGER
    TRIM
    TRIM_ARRAY
    TRUE
    TRUNCATE
    UESCAPE
    UNION
    UNIQUE
    UNKNOWN
    UNNEST
    UPDATE
    UPPER
    UPSERT
    USER
    USING
    VALUE
    VALUE_OF
    VALUES
    VARBINARY
    VARCHAR
    VAR_POP
    VAR_SAMP
    VARYING
    VERSIONING
    VIEW
    WHEN
    WHENEVER
    WHERE
    WIDTH_BUCKET
    WINDOW
    WITH
    WITHIN
    WITHOUT
    WORK
    YEAR
    When prompted, paste your license key in the text box and click Apply.
  • Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.

  • Click SCATTER.
  • Click Add Data Source.

  • Choose the flights_2008_10k table as the data source.

  • Click X Axis +Add Measure.

  • Choose depdelay.

  • Click Y Axis +Add Measure.

  • Choose arrdelay.

  • Click Size +Add Measure.

  • Choose airtime.

  • Click Color +Add Measure.

  • Choose dest_state.

The resulting chart shows, unsurprisingly, that there is a correlation between departure delay and arrival delay.

GPU-rendered Scatter Plot

    Create a new dashboard and a Table chart to verify that Heavy Immerse is working.

    1. Click New Dashboard.

    2. Click Add Chart.

    3. Click Bubble.

    4. Click Select Data Source.

5. Choose the flights_2008_10k table as the data source.

    6. Click Add Dimension.

    7. Choose carrier_name.

    8. Click Add Measure.

    9. Choose depdelay.

    10. Click Add Measure.

    11. Choose arrdelay.

    12. Click Add Measure.

    13. Choose #Records.

The resulting chart shows, unsurprisingly, that the average departure delay is also correlated with the average arrival delay, while there are notable differences between carriers.

    Docker Installation Guidearrow-up-right
    Install NVIDIA Drivers and Vulkan on Ubuntu
    https://releases.heavy.ai/ee/tar/arrow-up-right
    https://releases.heavy.ai/os/tar/arrow-up-right
    CUDA JIT Cachearrow-up-right
    https://help.ubuntu.com/lts/serverguide/firewall.htmlarrow-up-right
    ²
    herearrow-up-right
    herearrow-up-right
    ¹
    ¹
    Standard NVIDIA-SMI output shows the GPU visible in your container.
    hashtag
    Considerations when Upgrading from Omnisci to HEAVY.AI Platform

If you are upgrading from Omnisci to HEAVY.AI, there are many additional steps compared to a simple sub-version upgrade.

    hashtag
    Before Upgrading to Release 6.0

    circle-exclamation

IMPORTANT - Before you begin, stop all running services / Docker images of your Omnisci installation and create a backup of your $OMNISCI_STORAGE folder (typically /var/lib/omnisci). A backup is essential for recoverability; do not proceed with the upgrade without confirming that a full and consistent backup is available and ready to be restored.

The omnisci database is not automatically renamed to the new default name heavyai. This must be done manually, as documented in the upgrade steps.

    circle-exclamation

Dumps created with the DUMP command on Omnisci cannot be restored after the database is upgraded to this version.

    hashtag
    Essential Changes for release 6.0 of HEAVY.AI compared to Omnisci

    The following table describes the changes to environment variables, storage locations, and filenames in Release 6.0 compared to Release 5.x. Except where noted, revised storage subfolders, symlinks for old folder names, and filenames are created automatically on server start.

    Change descriptions in bold require user intervention.

    Description
    Omnisci 5.x
    HEAVY.AI 6.0

    Environmental variable for storage location

    $OMNISCI_STORAGE

    $HEAVYAI_BASE

    Default location for $HEAVYAI_BASE / $OMNISCI_STORAGE

    /var/lib/omnisci

    /var/lib/heavyai

    hashtag
    Upgrade Instructions

    circle-exclamation

The order of these instructions is significant. To avoid problems, follow the instructions in the order provided and do not skip any steps.

    hashtag
    Assumptions

This upgrade procedure assumes that you are using the default storage location for both Omnisci and HEAVY.AI.

    $OMNISCI_STORAGE

    $HEAVYAI_BASE

    /var/lib/omnisci

    /var/lib/heavyai

    hashtag
    Upgrading Using Docker

    Stop all containers running Omnisci services.

    In a terminal window, get the Docker container IDs:

    You should see an output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c:

    Stop the HEAVY.AI Docker container. For example:

    Back up the Omnisci data directory (typically /var/lib/omnisci).

    Rename the Omnisci data directory to reflect the HEAVY.AI naming scheme.

    Create a new configuration file for heavydb changing the data parameter to point to the renamed data directory.

    Rename the Omnisci license file (EE and FREE only).

    Download and run the 6.0 version of the HEAVY.AI Docker image.

    Select the tab depending on the Edition (Enterprise, Free, or Open Source) and execution Device (GPU or CPU) you are upgrading.

    Check that Docker is up and running using a docker ps command:

    You should see output similar to the following:

    Using the new container ID, rename the default omnisci database to heavyai:

    Check that everything is running as expected.

    hashtag
    Upgrading to HEAVY.AI Using Package Managers or Tarball

    Use the following commands to upgrade an existing system installed with package managers or a tarball. The commands upgrade HEAVY.AI in place without disturbing your configuration or stored data.

    hashtag
    Back up the Omnisci Database

    Stop the Omnisci services.

    Back up the Omnisci data directory (typically /var/lib/omnisci).

    Create a user named heavyai who will be the owner of the HEAVY.AI software and data on the filesystem.

    Set a password for the user; it is needed when running commands with sudo.

    Log in as the newly created user.

    Rename the Omnisci data directory to reflect the HEAVY.AI naming scheme and change the ownership to heavyai user.

    Create the "semaphore" catalogs directory; it will be removed later in the procedure.

    Check that everything is in order and that the "semaphore" directory has been created.

    All the directories must belong to the heavyai user, and the catalogs directory must be present.
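    This check can be scripted. A sketch using a scratch directory as a stand-in for /var/lib/heavyai/storage (against the real path you would compare ownership to the heavyai user, as noted in the comments):

```shell
# Scratch stand-in for /var/lib/heavyai/storage; in a real upgrade,
# point STORAGE at that path and compare ownership against the
# heavyai user instead of the current one.
STORAGE=$(mktemp -d)
mkdir -p "$STORAGE/catalogs" "$STORAGE/mapd_catalogs" "$STORAGE/mapd_data"

# The catalogs "semaphore" directory must exist before heavydb starts.
[ -d "$STORAGE/catalogs" ] && echo "catalogs directory present"

# List anything NOT owned by the expected user; empty output means
# ownership is correct. Against the real path this would be:
#   find /var/lib/heavyai ! -user heavyai
find "$STORAGE" ! -user "$(id -un)"
```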

    Rename the license file. (EE and FREE only)

    hashtag
    Install the HEAVY.AI Software

    Install the HEAVY.AI software following all the instructions for your operating system: CentOS/RHEL or Ubuntu.

    circle-exclamation

    Please follow all the installation and configuration steps until the Initialization step.

    hashtag
    Update the configuration file and rename the default database

    Log in with the heavyai user and ensure the heavyai services are stopped.

    Create a new configuration file for heavydb, changing the data parameter to point to the /var/lib/heavyai/storage directory and the frontend to the new install directory.

    All the settings of the upgraded database will be moved to the new configuration file.
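    The rewrite can be seen in isolation on a sample file. The sed expressions below are the same ones used in this procedure; the input and output here are scratch files, whereas in the real procedure the input is /var/lib/heavyai/omnisci.conf and the output is /var/lib/heavyai/heavy.conf:

```shell
# Sample of the old-style configuration (stand-in for
# /var/lib/heavyai/omnisci.conf).
OLD=$(mktemp)
NEW=$(mktemp)
cat > "$OLD" <<'EOF'
port = 6274
data = "/var/lib/omnisci/data"
null-div-by-zero = true

[web]
port = 6273
frontend = "/opt/omnisci/frontend"
EOF

# Comment out the old data/frontend entries and add the new
# HEAVY.AI paths right below them.
sed 's/^\(data.*=.*\)/#\1\ndata = "\/var\/lib\/heavyai\/storage"/' "$OLD" | \
sed 's/^\(frontend.*=.*\)/#\1\nfrontend = "\/opt\/heavyai\/frontend"/' \
> "$NEW"

cat "$NEW"
```

    Keeping the old entries commented out makes it easy to compare the two configurations if something does not start correctly.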

    Now we have to complete the database migration.

    Remove the "semaphore" directory created earlier. (This is an essential step for the Omnisci-to-heavydb upgrade.)

    To complete the upgrade, start the HEAVY.AI servers.

    Check that the database migrated by running the following command and looking for the Rebrand migration complete message.
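    The check amounts to searching the server INFO log for that message. A sketch using a stand-in log file (in a real installation, the log is heavydb.INFO under /var/lib/heavyai/storage/log, and the line below only mimics the message of interest):

```shell
# Stand-in log file created only for this sketch; in a real
# installation, grep /var/lib/heavyai/storage/log/heavydb.INFO instead.
LOG=$(mktemp)
echo "Rebrand migration complete" > "$LOG"

if grep -q "Rebrand migration complete" "$LOG"; then
  echo "rebrand migration finished"
else
  echo "message not found - inspect the heavydb service status"
fi
```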

    Rename the default omnisci database to heavyai. Run the command as an administrative user (typically admin) with its password (default HyperInteractive).

    Restart the database service and check that everything is running as expected.

    hashtag
    Remove Omnisci Software from the System

    After all checks confirm that the upgraded system is stable, clean up the system by removing the Omnisci installation and related system configuration. Permanently remove the service configuration.

    Remove the installed software.

    Delete the YUM or APT repositories.


    SHOW

    Use SHOW commands to get information about databases, tables, and user sessions.

    hashtag
    SHOW CREATE SERVER

    Shows the CREATE SERVER statement that could have been used to create the server.

    hashtag
    Syntax

    hashtag
    Example

    hashtag
    SHOW CREATE TABLE

    Shows the CREATE TABLE statement that could have been used to create the table.

    hashtag
    Syntax

    hashtag
    Example

    hashtag
    SHOW DATABASES

    Retrieve the databases accessible for the current user, showing the database name and owner.

    hashtag
    Example

    hashtag
    SHOW FUNCTIONS

    Show registered compile-time UDFs and extension functions in the system and their arguments.

    hashtag
    Syntax

    hashtag
    Example

    hashtag
    SHOW POLICIES

    Displays a list of all row-level security (RLS) policies that exist for a user or role; admin rights are required. If EFFECTIVE is used, the list also includes any policies that exist for all roles that apply to the requested user or role.
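    A hypothetical example, assuming a role named analyst_role exists:

```
SHOW POLICIES analyst_role;
SHOW EFFECTIVE POLICIES analyst_role;
```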

    hashtag
    Syntax

    hashtag
    SHOW QUERIES

    Returns a list of queued queries in the system; information includes session ID, status, query string, account login name, client address, database name, and device type (CPU or GPU).

    hashtag
    Example

    Admin users can see and interrupt all queries; non-admin users can see and interrupt only their own queries.

    NOTE: SHOW QUERIES is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.

    To interrupt a query in the queue, see KILL QUERY.

    hashtag
    SHOW ROLES

    If included with a name, lists the roles granted directly to a user or role. SHOW EFFECTIVE ROLES with a name lists the roles directly granted to a user or role, and also lists the roles indirectly inherited through the directly granted roles.

    hashtag
    Syntax

    If the user name or role name is omitted, then a regular user sees their own roles, and a superuser sees a list of all roles existing in the system.

    hashtag
    SHOW RUNTIME FUNCTIONS

    Show user-defined runtime functions and table functions.

    hashtag
    Syntax

    hashtag
    SHOW SUPPORTED DATA SOURCES

    Show data connectors.

    hashtag
    Syntax

    hashtag
    SHOW TABLE DETAILS

    Displays storage-related information for a table, such as the table ID/name, number of data/metadata files used by the table, total size of data/metadata files, and table epoch values.

    You can see table details for all tables that you have access to in the current database, or for only those tables you specify.

    hashtag
    Syntax

    hashtag
    Examples

    Show details for all tables you have access to:

    Show details for table omnisci_states:

    circle-info

    The number of columns returned includes system columns. As a result, the number of columns in column_count can be up to two greater than the number of columns created by the user.

    hashtag
    SHOW TABLE FUNCTIONS

    Displays the list of available system (built-in) table functions.

    For more information, see System Table Functions.

    hashtag
    SHOW TABLE FUNCTIONS DETAILS

    Show detailed output information for the specified table function. Output details vary depending on the table function specified.

    hashtag
    Syntax

    hashtag
    Example - generate_series

    View SHOW output for the generate_series table function:

    Output Header
    Output Details

    hashtag
    SHOW SERVERS

    Retrieve the servers accessible for the current user.

    hashtag
    Example

    hashtag
    SHOW TABLES

    Retrieve the tables accessible for the current user.

    hashtag
    Example

    hashtag
    SHOW USER DETAILS

    Lists name, ID, and default database for all or specified users for the current database. If the command is issued by a superuser, login permission status is also shown. Only superusers see users who do not have permission to log in.

    hashtag
    Example

    SHOW [ALL] USER DETAILS lists name, ID, superuser status, default database, and login permission status for all users across the HeavyDB instance. This variant of the command is available only to superusers. Regular users who run the SHOW ALL USER DETAILS command receive an error message.

    hashtag
    Superuser Output

    Show all user details for all users:

    Show all user details for specified users ue, ud, ua, and uf:

    If a specified user is not found, the superuser sees an error message:

    Show user details for specified users ue, ud, and uf:

    Show user details for all users:

    hashtag
    Non-Superuser Output

    Running SHOW ALL USER DETAILS results in an error message:

    Show user details for all users:

    If a specified user is not found, the user sees an error message:

    Show user details for user ua:

    hashtag
    SHOW USER SESSIONS

    Retrieve all persisted user sessions, showing the session ID, user login name, client address, and database name. Admin or superuser privileges required.

    hashtag
    KILL QUERY

    Interrupt a queued query. Specify the query by using its session ID.

    To see the queries in the queue, use the command:

    To interrupt the last query in the list (ID 946-ooNP):

    Showing the queries again indicates that 946-ooNP has been deleted:

    circle-info
    • KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.

    • Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.

    sudo apt-get purge nvidia-docker
    for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
    sudo apt-get install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    sudo usermod  --append --groups docker $USER
    sudo docker run hello-world
    curl --silent --location https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
    sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl --silent --location https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
    sudo apt-get update
    sudo apt-get install -y nvidia-container-runtime
    {
      "default-runtime": "nvidia",
      "runtimes": {
         "nvidia": {
             "path": "/usr/bin/nvidia-container-runtime",
             "runtimeArgs": []
         }
     }
    }
    sudo pkill -SIGHUP dockerd
    sudo docker run --gpus=all \
    --rm nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi
    sudo mkdir -p /var/lib/heavyai && sudo chown $USER /var/lib/heavyai
    echo "port = 6274
    http-port = 6278
    calcite-port = 6279
    data = \"/var/lib/heavyai\"
    null-div-by-zero = true
    
    [web]
    port = 6273
    frontend = \"/opt/heavyai/frontend\"" \
    >/var/lib/heavyai/heavy.conf
    if test -d /var/lib/heavyai; then echo "There is $(df -kh /var/lib/heavyai --output="avail" | sed 1d) available space in your storage dir"; else echo "There was a problem with the creation of the storage dir";  fi;
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cuda:latest
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cpu:latest
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cuda:latest
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cpu:latest
    sudo docker container ps --format "{{.Image}} {{.Status}}" \
    -f status=running | grep heavyai\/
    heavyai/heavyai-ee-cuda Up 48 seconds ago 
    sudo apt install ufw
    sudo ufw allow ssh
    sudo ufw disable
    sudo ufw allow 6273:6278/tcp
    sudo ufw enable
    sudo docker container ps
    CONTAINER ID        IMAGE                     COMMAND                     CREATED             STATUS              PORTS                                            NAMES
    9e01e520c30c        heavyai/heavyai-ee-gpu    "/bin/sh -c '/heavyai..."   50 seconds ago      Up 48 seconds ago   0.0.0.0:6273-6280->6273-6280/tcp                 confident_neumann
    sudo docker exec -it 9e01e520c30c bash
    sudo docker exec -it <container-id> \
    ./insert_sample_data --data /var/lib/heavyai/storage
    Enter dataset number to download, or 'q' to quit:
    #     Dataset                   Rows    Table Name             File Name
    1)    Flights (2008)            7M      flights_2008_7M        flights_2008_7M.tar.gz
    2)    Flights (2008)            10k     flights_2008_10k       flights_2008_10k.tar.gz
    3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
    sudo docker exec -it <container-id> bin/heavysql 
    SELECT origin_city AS "Origin", 
    dest_city AS "Destination", 
    ROUND(AVG(airtime),1) AS "Average Airtime" 
    FROM flights_2008_10k 
    WHERE distance < 175 GROUP BY origin_city,
    dest_city;
    Origin|Destination|Average Airtime
    West Palm Beach|Tampa|33.8
    Norfolk|Baltimore|36.1
    Ft. Myers|Orlando|28.7
    Indianapolis|Chicago|39.5
    Tampa|West Palm Beach|33.3
    Orlando|Ft. Myers|32.6
    Austin|Houston|33.1
    Chicago|Indianapolis|32.7
    Baltimore|Norfolk|31.7
    Houston|Austin|29.6
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cuda:v6.0.0
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/heavyai-ee-cpu:v6.0.0
    sudo docker run -d --gpus=all \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cuda:v6.0.0
    sudo docker run -d \
    -v /var/lib/heavyai:/var/lib/heavyai \
    -p 6273-6278:6273-6278 \
    heavyai/core-os-cpu:v6.0.0
    sudo useradd --shell /bin/bash --user-group --create-home --groups wheel heavyai
    sudo useradd --shell /bin/bash --user-group --create-home --groups sudo heavyai
    sudo rm /etc/yum.repos.d/omnisci.repo
    sudo rm /etc/apt/sources.list.d/omnisci.list
    sudo docker container ps --format "{{.Id}} {{.Image}}" \
    -f status=running | grep omnisci\/
    9e01e520c30c omnisci/omnisci-ee-gpu
    sudo docker container stop 9e01e520c3
    tar zcvf /backup_dir/omnisci_storage_backup.tar.gz /var/lib/omnisci
    sudo mv /var/lib/omnisci /var/lib/heavyai
    sudo mv /var/lib/heavyai/data /var/lib/heavyai/storage
    cat /var/lib/heavyai/omnisci.conf | \
    sed "s/^\(data.*=.*\)/#\1\\ndata = \"\/var\/lib\/heavyai\/storage\"/" | \
    sed "s/^\(frontend.*=.*\)/#\1\\nfrontend = \"\/opt\/heavyai\/frontend\"/" \
    >/var/lib/heavyai/heavy.conf
    mv /var/lib/heavyai/storage/omnisci.license \
    /var/lib/heavyai/storage/heavyai.license
    sudo docker container ps --format "{{.Id}} {{.Image}} {{.Status}}" \
    -f status=running | grep heavyai\/
    9e01e520c30c heavyai/heavyai-ee-cuda Up 48 seconds ago 
    sudo docker exec -i 9e01e520c30c \
    bash -c 'echo "alter database omnisci rename to heavyai;" | bin/heavysql omnisci'
    sudo systemctl stop omnisci_web_server omnisci_server
    tar zcvf /backup_dir/omnisci_storage_backup.tar.gz /var/lib/omnisci
    sudo passwd heavyai
    sudo su - heavyai
    sudo chown -R heavyai:heavyai /var/lib/omnisci
    sudo mv /var/lib/omnisci /var/lib/heavyai
    mv /var/lib/heavyai/data /var/lib/heavyai/storage
    mkdir /var/lib/heavyai/storage/catalogs
    ls -la /var/lib/heavyai/storage/
    total 32
    drwxr-xr-x  8 heavyai heavyai 4096 lug 15 16:03 .
    drwxr-xr-x  4 heavyai heavyai 4096 lug 15 16:02 ..
    drwxrwxr-x  2 heavyai heavyai 4096 lug 15 16:03 catalogs
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 mapd_catalogs
    drwxr-xr-x 52 heavyai heavyai 4096 lug 15 15:54 mapd_data
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 mapd_export
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 mapd_log
    drwxr-xr-x  2 heavyai heavyai 4096 lug 15 15:54 omnisci_disk_cache
    -rw-r--r--  1 heavyai heavyai 1229 lug 15 16:07 omnisci.license
    mv /var/lib/heavyai/storage/omnisci.license \
    /var/lib/heavyai/storage/heavyai.license
    sudo systemctl stop heavy_web_server heavydb
    cat /var/lib/heavyai/omnisci.conf | \
    sed "s/^\(data.*=.*\)/#\1\\ndata = \"\/var\/lib\/heavyai\/storage\"/" | \
    sed "s/^\(frontend.*=.*\)/#\1\\nfrontend = \"\/opt\/heavyai\/frontend\"/" \
    >/var/lib/heavyai/heavy.conf
    rmdir /var/lib/heavyai/storage/catalogs
    sudo systemctl start heavydb heavy_web_server
    sudo systemctl status heavydb
    echo "alter database omnisci rename to heavyai;" \
    | /opt/heavyai/bin/heavysql -p HyperInteractive -u admin omnisci 
    sudo rm /lib/systemd/system/omnisci_server*.service
    sudo rm /lib/systemd/system/omnisci_web_server*.service
    sudo systemctl daemon-reload
    sudo systemctl reset-failed
    sudo rm -Rf /opt/omnisci

    Fixed location for Docker $HEAVYAI_BASE / $OMNISCI_STORAGE

    /omnisci-storage

    /var/lib/heavyai

    The folder containing catalogs for $HEAVYAI_BASE / $OMNISCI_STORAGE

    data/

    storage/

    Storage subfolder - data

    data/mapd_data

    storage/data

    Storage subfolder - catalog

    data/mapd_catalogs

    storage/catalogs

    Storage subfolder - import

    data/mapd_import

    storage/import

    Storage subfolder - export

    data/mapd_export

    storage/export

    Storage subfolder - logs

    data/mapd_log

    storage/log

    Server INFO logs

    omnisci_server.INFO

    heavydb.INFO

    Server ERROR logs

    omnisci_server.ERROR

    heavydb.ERROR

    Server WARNING logs

    omnisci_server.WARNING

    heavydb.WARNING

    Web Server ACCESS logs

    omnisci_web_server.ACCESS

    heavy_web_server.ACCESS

    Web Server ALL logs

    omnisci_web_server.ALL

    heavy_web_server.ALL

    Install directory

    /omnisci (Docker) /opt/omnisci (bare metal)

    /opt/heavyai/ (Docker and bare metal)

    Binary file - core server (located in install directory)

    bin/omnisci_server

    bin/heavydb

    Binary file - web server (located in install directory)

    bin/omnisci_web_server

    bin/heavy_web_server

    Binary file - command-line SQL utility

    bin/omnisql

    bin/heavysql

    Binary file - JDBC jar

    bin/omnisci-jdbc-5.10.2-SNAPSHOT.jar

    bin/heavydb-jdbc-6.0.0-SNAPSHOT.jar

    Binary file - Utilities (SqlImporter) jar

    bin/omnisci-utility-5.10.2-SNAPSHOT.jar

    bin/heavydb-utility-6.0.0-SNAPSHOT.jar

    HEAVY.AI Server service (for bare metal install)

    omnisci_server

    heavydb

    HEAVY.AI Web Server service (for bare metal install)

    omnisci_web_server

    heavy_web_server

    Default configuration file

    omnisci.conf

    heavy.conf

    name

    generate_series

    signature

    (i64 series_start, i64 series_stop, i64 series_step) -> Column<i64>
    (i64 series_start, i64 series_stop) -> Column<i64>

    input_names

    series_start, series_stop, series_step
    series_start, series_stop

    input_types

    i64

    output_names

    generate_series

    output_types

    Column<i64>

    CPU

    true

    GPU

    true

    runtime

    false

    filter_table_transpose

    false

    To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt to TRUE. (It is enabled by default.)

    SHOW CREATE SERVER <servername>
    SHOW CREATE SERVER default_local_delimited;
    create_server_sql
    CREATE SERVER default_local_delimited FOREIGN DATA WRAPPER DELIMITED_FILE
    WITH (STORAGE_TYPE='LOCAL_FILE');
    SHOW CREATE TABLE <tablename>
    SHOW CREATE TABLE heavyai_states;
    CREATE TABLE heavyai_states (
     id TEXT ENCODING DICT(32),
     abbr TEXT ENCODING DICT(32),
     name TEXT ENCODING DICT(32),
     omnisci_geo GEOMETRY(MULTIPOLYGON, 4326) NOT NULL);
    SHOW DATABASES
    Database         Owner
    omnisci          admin
    2004_zipcodes    admin
    game_results     jane
    signals          jason
    ...
    SHOW FUNCTIONS [DETAILS]
    SHOW FUNCTIONS
    Scalar UDF
    distance_point_line
    ST_DWithin_Polygon_Polygon
    ST_Distance_Point_ClosedLineString
    Truncate
    ct_device_selection_udf_any
    area_triangle
    _h3RotatePent60cw
    ST_Intersects_Polygon_Point
    ST_DWithin_LineString_Polygon
    ST_Intersects_Point_Polygon
    box_contains_box
    SHOW [EFFECTIVE] POLICIES <name>;
    show queries;
    query_session_id|current_status|submitted          |query_str                                                   |login_name|client_address     |db_name   |exec_device_type
    834-8VAA        |Pending       |2020-05-06 08:21:15|select d_date_sk, count(1) from date_dim group by d_date_sk;|admin     |tcp:localhost:48596|tpcds_sf10|CPU
    826-CLKk        |Running       |2020-05-06 08:20:57|select count(1) from store_sales, store_returns;            |admin     |tcp:localhost:48592|tpcds_sf10|CPU
    828-V6s7        |Pending       |2020-05-06 08:21:13|select count(1) from store_sales;                           |admin     |tcp:localhost:48594|tpcds_sf10|GPU
    946-rtJ7        |Pending       |2020-05-06 08:20:58|select count(1) from item;                                  |admin     |tcp:localhost:48610|tpcds_sf10|GPU
    SHOW [EFFECTIVE] ROLES <name>
    SHOW RUNTIME [TABLE] FUNCTIONS
    SHOW RUNTIME [TABLE] FUNCTION DETAILS
    show supported data sources
    SHOW TABLE DETAILS [<table-name>, <table-name>, ...]
    heavysql> show table details;
    table_id|table_name       |column_count|is_sharded_table|shard_count|max_rows           |fragment_size|max_rollback_epochs|min_epoch|max_epoch|min_epoch_floor|max_epoch_floor|metadata_file_count|total_metadata_file_size|total_metadata_page_count|total_free_metadata_page_count|data_file_count|total_data_file_size|total_data_page_count|total_free_data_page_count
    1       |heavyai_states   |11          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |4082                          |1              |536870912           |256                  |242
    2       |heavyai_counties |13          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |NULL                          |1              |536870912           |256                  |NULL
    3       |heavyai_countries|71          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |4022                          |1              |536870912           |256                  |182
    heavysql> show table details heavyai_states;
    table_id|table_name    |column_count|is_sharded_table|shard_count|max_rows           |fragment_size|max_rollback_epochs|min_epoch|max_epoch|min_epoch_floor|max_epoch_floor|metadata_file_count|total_metadata_file_size|total_metadata_page_count|total_free_metadata_page_count|data_file_count|total_data_file_size|total_data_page_count|total_free_data_page_count
    1       |heavyai_states|11          |false           |0          |4611686018427387904|32000000     |-1                 |1        |1        |0              |0              |1                  |16777216                |4096                     |4082                          |1              |536870912           |256                  |242
    SHOW TABLE FUNCTIONS;
    tf_compute_dwell_times
    tf_feature_self_similarity
    tf_feature_similarity
    tf_rf_prop
    tf_rf_prop_max_signal
    tf_geo_rasterize_slope
    tf_geo_rasterize
    generate_random_strings
    generate_series
    tf_mandelbrot_cuda_float
    tf_mandelbrot_cuda
    tf_mandelbrot_float
    tf_mandelbrot
    SHOW TABLE FUNCTIONS DETAILS <function_name>
    SHOW SERVERS;
    server_name|data_wrapper|created_at|options
    default_local_delimited|DELIMITED_FILE|2022-03-15 10:06:05|{"STORAGE_TYPE":"LOCAL_FILE"}
    default_local_parquet|PARQUET_FILE|2022-03-15 10:06:05|{"STORAGE_TYPE":"LOCAL_FILE"}
    default_local_regex_parsed|REGEX_PARSED_FILE|2022-03-15 10:06:05|{"STORAGE_TYPE":"LOCAL_FILE"}
    ...
    SHOW TABLES;
    table_name
    ----------
    omnisci_states
    omnisci_counties
    omnisci_countries
    streets_nyc
    streets_miami
    ...
    SHOW USER DETAILS
    NAME            ID         DEFAULT_DB 
    mike.nuumann    191        mondale
    Dale            184        churchill
    Editor_Test     141        mondale
    Jerry.wong      181        alluvial
    AA_superuser    139        
    BB_superuser    2140
    PlinyTheElder   183        windsor
    aaron.tyre      241        db1
    achristie       243        sid
    eve.mandela     202        nancy
    ...
    heavysql> show all user details;
    NAME|ID|IS_SUPER|DEFAULT_DB|CAN_LOGIN
    admin|0|true|(-1)|true
    ua|2|false|db1(2)|true
    ub|3|false|db1(2)|true
    uc|4|false|db1(2)|false
    ud|5|false|db2(3)|true
    ue|6|false|db2(3)|true
    uf|7|false|db2(3)|false
    heavysql> \db db2
    User admin switched to database db2
    
    heavysql> show all user details ue, ud, uf, ua;
    NAME|ID|IS_SUPER|DEFAULT_DB|CAN_LOGIN
    ua|2|false|db1(2)|true
    ud|5|false|db2(3)|true
    ue|6|false|db2(3)|true
    uf|7|false|db2(3)|false
    heavysql> show user details ue, ud, uf, ua;
    User "ua" not found. 
    heavysql> show user details ue, ud, uf;
    NAME|ID|DEFAULT_DB|CAN_LOGIN
    ud|5|db2(3)|true
    ue|6|db2(3)|true
    uf|7|db2(3)|false
    heavysql> show user details;
    NAME|ID|DEFAULT_DB|CAN_LOGIN
    ud|5|db2(3)|true
    ue|6|db2(3)|true
    uf|7|db2(3)|false
    heavysql> \db
    User ua is using database db1
    heavysql> show all user details;
    SHOW ALL USER DETAILS is only available to superusers. (Try SHOW USER DETAILS instead?)
    heavysql> show user details;
    NAME|ID|DEFAULT_DB
    ua|2|db1
    ub|3|db1
    heavysql> show user details ua, ub, uc;
    User "uc" not found.
    heavysql> show user details ua;
    NAME|ID|DEFAULT_DB
    ua|2|db1
    SHOW USER SESSIONS;
    session_id   login_name   client_address         db_name
    453-X6ds     mike         http:198.51.100.1      game_results
    453-0t2r     erin         http:198.51.100.11     game_results
    421-B64s     shauna       http:198.51.100.43     game_results
    213-06dw     ahmed        http:198.51.100.12     signals
    333-R28d     cat          http:198.51.100.233    signals
    497-Xyz6     inez         http:198.51.100.5      ships
    ...
    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU
    946-ooNP        |RUNNING_IMPORTER    |0          |2021-08-03 ...|IMPORT_GEO_TABLE|Rio       |tcp:::ffff:127.0.0.1:47314|omnisci|CPU
    kill query '946-ooNP'
    show queries;
    query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
    713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
    451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
    190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
    720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU

    Using HeavyImmerse Data Manager

    HeavyImmerse supports file upload for .csv, .tsv, and .txt files, and supports comma, tab, and pipe delimiters.

    HeavyImmerse also supports upload of compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

    You can import data to HeavyDB using the Immerse import wizard. You can upload data from a local delimited file, from an Amazon S3 data source, or from the Data Catalog.

    For methods specific to geospatial data, see also Importing Geospatial Data Using Immerse.

    circle-info
    • If there is a potential for duplicate entries, and you prefer to avoid loading duplicate rows, see .

    • If a source file uses a reserved word, HEAVY.AI automatically adds an underscore at the end of the reserved word. For example, year is converted to year_.

    • If you click the Back button (or accidentally two-finger swipe your mousepad) before your data load is complete, HeavyDB stops the data load and invalidates any records that had transferred.

    hashtag
    Importing Non-Geospatial Data from a Local File

    Follow these steps to import your data:

    1. Click DATA MANAGER.

    2. Click Import Data.

    3. Click Import data from a local file.

    You can also import locally stored shape files in a variety of formats. See .

    hashtag
    Importing Data from Amazon S3

    To import data from your Amazon S3 instance, you need:

    • The Region and Path for the file in your S3 bucket, or the direct URL to the file (S3 Link).

    • If importing private data, your Access Key and Secret Key for your personal IAM account in S3.

    hashtag
    Locating the Data File S3 Region, Path, and URL

    For information on opening and reviewing items in your S3 instance, see

    In an S3 bucket, the Region is in the upper-right corner of the screen – US West (N. California) in this case:

    Click the file you want to import. To load your S3 file to HEAVY.AI using the steps for S3 Region | Bucket | Path, below, click Copy path to copy to your clipboard the path to your file within your S3 bucket. Alternatively, you can copy the link to your file. The Link in this example is https://s3-us-west-1.amazonaws.com/my-company-bucket/trip_data.7z.

    hashtag
    Obtaining Your S3 Access Key and Secret Key

    To learn about creating your S3 Access Key and Secret Key, see

    If the data you want to copy is publicly available, you do not need to provide an Access Key and Secret Key.

    You can import any file you can see using your IAM account with your Access Key and Secret Key.

    Your Secret Key is created with your Access Key, and cannot be retrieved afterward. If you lose your Secret Key, you must create a new Access Key and Secret Key.

    hashtag
    Loading Your S3 Data to HEAVY.AI

    Follow these steps to import your S3 data:

    1. Click DATA MANAGER.

    2. Click Import Data.

    3. Click Import data from Amazon S3.

    Importing from the Data Catalog

    The Data Catalog provides access to sample datasets you can use to exercise data visualization features in Heavy Immerse. The selection of datasets continually changes, independent of product releases.

    To import from the data catalog:

    1. Open the Data Manager.

    2. Click Data Catalog.

3. Use the Search box to locate a specific dataset, or scroll to find the dataset you want to use. The Contains Geo toggle filters for datasets that contain geographical information.

    Appending Data to a Table

    You can append additional data to an existing table.

    To append data to a table:

    1. Open Data Manager.

    2. Select the table you want to append.

    3. Click Append Data.

To append data from AWS, click Append Data, then follow the instructions for Loading S3 Data to HEAVY.AI.

    Truncating a Table

    Sometimes you might want to remove or replace the data in a table without losing the table definition itself.

    To remove all data from a table:

    1. Open Data Manager.

    2. Select the table you want to truncate.

    3. Click Delete All Rows.

    Deleting a Table

    You can drop a table entirely using Data Manager.

    To delete a table:

    1. Open Data Manager.

    2. Select the table you want to delete.

    3. Click DELETE TABLE.


    Either click the plus sign (+) or drag your file(s) for upload. If you are uploading multiple files, the column names and data types must match. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

  • Choose Import Settings:

• Null string: If, instead of using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.

    • Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.

    • Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.

    • Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.

• Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.

  • Click Import Files.

  • The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. Immerse defaults to second precision for all timestamp columns. You can reset the precision to second, millisecond, nanosecond, or microsecond. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.

  • Name the table, and click Save Table.
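
The header-row inference described in the import settings above can be sketched as follows. This is a simplified illustration, not HEAVY.AI's actual importer logic; `infer_header` and `looks_numeric` are hypothetical helpers.

```python
def looks_numeric(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def infer_header(rows):
    # If the first row is all non-numeric strings but later rows contain
    # numeric values, treat row 0 as a header row.
    first, rest = rows[0], rows[1:]
    if any(looks_numeric(cell) for cell in first):
        return False
    return any(looks_numeric(cell) for row in rest for cell in row)

print(infer_header([["name", "fare"], ["cab_1", "12.50"], ["cab_2", "8.00"]]))  # → True
print(infer_header([["1.5", "2"], ["3", "4"]]))  # → False
```

When every row is purely textual the guess is ambiguous, which is why Immerse lets you override the inference manually.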

  • Choose whether to import using the S3 Region | Bucket | Path or a direct full link URL to the file (S3 Link).

    1. To import data using S3 Region | Bucket | Path:

      1. Select your Region from the pop-up menu.

      2. Enter the unique name of your S3 Bucket.

      3. Enter or paste the Path to the file stored in your S3 bucket.

    2. To import data using S3 link:

      1. Copy the Link URL from the file Overview in your S3 bucket.

      2. Paste the link in the Full Link URL field of the HEAVY.AI Table Importer.

  • If the data is publicly available, you can disable the Private Data checkbox. If you are importing Private Data, enter your credentials:

    1. Enable the Private Data checkbox.

    2. Enter your S3 Access Key.

    3. Enter your S3 Secret Key.

  • Choose the appropriate Import Settings. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV.

    1. Null string: If you have substituted a string such as NULL for null values in your upload document, enter that string in the Null String field. The values are treated as null values on upload.

    2. Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma or pipe.

    3. Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.

    4. Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.

  • Click Import Files.

  • The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.

  • Name the table, and click Save Table.
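
The Null String import setting behaves roughly like the sketch below. This is illustrative only; `load_with_null_string` is a hypothetical helper, not HEAVY.AI code.

```python
import csv
import io

def load_with_null_string(text, null_string="NULL", delimiter=","):
    # Cells equal to the configured null string are loaded as nulls.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[None if cell == null_string else cell for cell in row]
            for row in reader]

sample = "city,population\nBerlin,3600000\nAtlantis,NULL\n"
print(load_with_null_string(sample))
# → [['city', 'population'], ['Berlin', '3600000'], ['Atlantis', None]]
```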

  • Click the Import button beneath the dataset you want to use.

  • Verify the table and column names in the Data Preview screen.

  • Click Import Data.

  • Click Import data from a local file.
  • Either click the plus sign (+) or drag your file(s) for upload. The column names and data types of the files you select must match the existing table. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

  • Click Preview.

  • Click Import Settings

  • Choose Import Settings:

• Null string: If, instead of using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.

    • Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.

    • Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.

    • Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.

• Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.

  • Close Import Settings.

  • The Data Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance.

    If your data contains column headers, verify they match the existing headers.

  • Click Import Data.

  • A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE ROWS.

    Immerse displays the table information with a row count of 0.

    A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE.

    Immerse deletes the table and returns you to the Data Manager TABLES list.

How can I avoid creating duplicate rows?
Importing Geospatial Data Using Immerse
https://docs.aws.amazon.com/AmazonS3/latest/gsg/OpeningAnObject.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey
Loading S3 Data to HEAVY.AI

    Datatypes

    Datatypes and Fixed Encoding

    This topic describes standard datatypes and space-saving variations for values stored in HEAVY.AI.

    Datatypes

Each HEAVY.AI datatype uses space in memory and on disk. For certain datatypes, you can use fixed encoding for a more compact representation of these values. You can set a default value for a column by using the DEFAULT constraint; for more information, see CREATE TABLE.

    Datatypes, variations, and sizes are described in the following table.
    Datatype
    Size (bytes)
    Notes

    BIGINT

    8

    Minimum value: -9,223,372,036,854,775,807; maximum value: 9,223,372,036,854,775,807.

    BIGINT ENCODING FIXED(8)

    1

    Minimum value: -127; maximum value: 127

    [1] - In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING DAYS(16)).

    [2] - See Storage and Compression below for information about geospatial datatype sizes.

    • HEAVY.AI does not support geometry arrays.

• Timestamp values are always stored in 8 bytes. The greater the sub-second precision, the smaller the range of dates that can be represented.

    Geospatial Datatypes

    HEAVY.AI supports the LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON, POINT, and MULTIPOINT geospatial datatypes.

    In the following example:

    • p0, p1, ls0, and poly0 are simple (planar) geometries.

• p4 is a point geometry with Web Mercator (SRID 900913) coordinates.

    • p2, p3, mp, ls1, ls2, mls1, mls2, poly1, and mpoly0 are geometries using WGS84 SRID=4326 longitude/latitude coordinates.

    Storage

    Geometry storage requirements are largely dependent on coordinate data. Coordinates are normally stored as 8-byte doubles, two coordinates per point, for all points that form a geometry. Each POINT geometry in the p1 column, for example, requires 16 bytes.

    Compression

    WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This sacrifices some precision but reduces storage requirements by half.

    For example, columns p2, mp, ls1, mls1, poly1, and mpoly0 in the table defined above are compressed. Each geometry in the p2 column requires 8 bytes, compared to 16 bytes for p0.

You can explicitly disable compression. WGS84 columns p3, ls2, and mls2 are not compressed and continue using 8-byte doubles. Simple (planar) columns p0, p1, ls0, and poly0, and the non-4326 column p4, are never compressed.
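
The storage arithmetic above can be made concrete with a small sketch based on the sizes stated in this topic; `coord_bytes` is a hypothetical helper.

```python
def coord_bytes(num_points, compressed):
    # Two coordinates per point: 8-byte doubles normally, 4-byte values
    # when SRID 4326 COMPRESSED(32) encoding applies.
    per_coord = 4 if compressed else 8
    return num_points * 2 * per_coord

print(coord_bytes(1, compressed=False))  # uncompressed POINT (like p0) → 16
print(coord_bytes(1, compressed=True))   # compressed 4326 POINT (like p2) → 8
print(coord_bytes(3, compressed=True))   # 3-vertex compressed LINESTRING → 24
```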

    For more information about geospatial datatypes and functions, see Geospatial Capabilities.

    Defining Arrays

    Define datatype arrays by appending square brackets, as shown in the arrayexamples DDL sample.

    You can also define fixed-length arrays. For example:

    Fixed-length arrays require less storage space than variable-length arrays.

    Fixed Encoding

    To use fixed-length fields, the range of the data must fit into the constraints as described. Understanding your schema and the scope of potential values in each field helps you to apply fixed encoding types and save significant storage space.

    These encodings are most effective on low-cardinality TEXT fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07. If a TEXT ENCODING field does not match the defined cardinality, HEAVY.AI substitutes a NULL value and logs the change.

    For DATE types, you can use the terms FIXED and DAYS interchangeably. Both are synonymous for the DATE type in HEAVY.AI.

Some of the INTEGER options overlap. For example, INTEGER ENCODING FIXED(8) and SMALLINT ENCODING FIXED(8) are essentially identical.
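
The TIMESTAMP range quoted above (1901-12-13 20:45:53 to 2038-01-19 03:14:07) can be checked with plain date arithmetic. The stated minimum is one second above the signed 32-bit floor, which is consistent with the fixed encodings apparently reserving the lowest value (as in the -127 to 127 integer ranges).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# TIMESTAMP ENCODING FIXED(32) stores seconds as a 32-bit value around
# the Unix epoch; the lowest 32-bit value appears to be reserved.
lo = EPOCH + timedelta(seconds=-(2**31 - 1))
hi = EPOCH + timedelta(seconds=2**31 - 1)
print(lo)  # → 1901-12-13 20:45:53+00:00
print(hi)  # → 2038-01-19 03:14:07+00:00
```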

    Shared Dictionaries

    You can improve performance of string operations and optimize storage using shared dictionaries. You can share dictionaries within a table or between different tables in the same database. The table with which you want to share dictionaries must exist when you create the table that references the TEXT ENCODING DICT field, and the column that you are referencing in that table must also exist. The following small DDL shows the basic structure:

    In the table definition, make sure that referenced columns appear before the referencing columns.

    For example, this DDL is a portion of the schema for the flights database. Because airports are both origin and destination locations, it makes sense to reuse the same dictionaries for name, city, state, and country values.

    To share a dictionary in a different existing table, replace the table name in the REFERENCES instruction. For example, if you have an existing table called us_geography, you can share the dictionary by following the pattern in the DDL fragment below.


    The referencing column cannot specify the encoding of the dictionary, because it uses the encoding from the referenced column.
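
The idea behind dictionary encoding and shared dictionaries can be sketched as a toy example. This is conceptual only, not HEAVY.AI's implementation; `SharedDict` is a hypothetical class.

```python
class SharedDict:
    """Toy string dictionary: each distinct string is stored once, and the
    columns store small integer IDs instead of repeated strings."""
    def __init__(self):
        self.ids = {}
        self.strings = []

    def encode(self, s):
        if s not in self.ids:
            self.ids[s] = len(self.strings)
            self.strings.append(s)
        return self.ids[s]

d = SharedDict()
origin_city = [d.encode(s) for s in ["San Jose", "Austin", "San Jose"]]
dest_city = [d.encode(s) for s in ["Austin", "Denver"]]  # same dictionary reused
print(origin_city)  # → [0, 1, 0]
print(dest_city)    # → [1, 2]
print(d.strings)    # → ['San Jose', 'Austin', 'Denver']
```

Because both columns reference one dictionary, each distinct city name is stored only once, and equality comparisons between the columns reduce to integer comparisons.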

    System Tables

    HeavyDB system tables provide a way to access information about database objects, database object permissions, and system resource (storage, CPU, and GPU memory) utilization. These system tables can be found in the information_schema database that is available by default on server startup. You can query system tables in the same way as regular tables, and you can use the SHOW CREATE TABLE command to view the table schemas.

Users

    CREATE TABLE geo ( name TEXT ENCODING DICT(32),
                       p0 POINT,
                       p1 GEOMETRY(POINT),
                       p2 GEOMETRY(POINT, 4326),
                       p3 GEOMETRY(POINT, 4326) ENCODING NONE,
                       p4 GEOMETRY(POINT, 900913),
                       mp GEOMETRY(MULTIPOINT, 4326),
                       ls0  LINESTRING,
                       ls1 GEOMETRY(LINESTRING, 4326) ENCODING COMPRESSED(32),
                       ls2 GEOMETRY(LINESTRING, 4326) ENCODING NONE,
                       mls1 GEOMETRY(MULTILINESTRING, 4326) ENCODING COMPRESSED(32),
                       mls2 GEOMETRY(MULTILINESTRING, 4326) ENCODING NONE,
                       poly0 POLYGON,
                       poly1 GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32),
                       mpoly0 GEOMETRY(MULTIPOLYGON, 4326)
                      );
    CREATE TABLE arrayexamples (
      tiny_int_array TINYINT[],
      int_array INTEGER[],
      big_int_array BIGINT[],
      text_array TEXT[] ENCODING DICT(32), --OmniSci supports only DICT(32) TEXT arrays.
      float_array FLOAT[],
      double_array DOUBLE[],
      decimal_array DECIMAL(18,6)[],
      boolean_array BOOLEAN[],
      date_array DATE[],
      time_array TIME[],
  timestamp_array TIMESTAMP[]);
CREATE TABLE fixedarrayexamples (
  float_array3 FLOAT[3],
  date_array4 DATE[4]);
    CREATE TABLE text_shard (
    i TEXT ENCODING DICT(32),
    s TEXT ENCODING DICT(32),
    SHARD KEY (i))
    WITH (SHARD_COUNT = 2);
    
    CREATE TABLE text_shard1 (
    i TEXT,
    s TEXT ENCODING DICT(32),
    SHARD KEY (i),
    SHARED DICTIONARY (i) REFERENCES text_shard(i))
    WITH (SHARD_COUNT = 2);
    create table flights (
    *
    *
    *
    dest_name TEXT ENCODING DICT,
    dest_city TEXT ENCODING DICT,
    dest_state TEXT ENCODING DICT,
    dest_country TEXT ENCODING DICT,
    
    *
    *
    *
    origin_name TEXT,
    origin_city TEXT,
    origin_state TEXT,
    origin_country TEXT,
    *
    *
    *
    
    SHARED DICTIONARY (origin_name) REFERENCES flights(dest_name),
    SHARED DICTIONARY (origin_city) REFERENCES flights(dest_city),
    SHARED DICTIONARY (origin_state) REFERENCES flights(dest_state),
    SHARED DICTIONARY (origin_country) REFERENCES flights(dest_country),
    *
    *
    *
    )
    WITH(
    *
    *
    *
    )
    create table flights (
    
    *
    *
    *
    
    SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
    SHARED DICTIONARY (origin_state) REFERENCES us_geography(state),
    SHARED DICTIONARY (origin_country) REFERENCES us_geography(country),
    SHARED DICTIONARY (dest_city) REFERENCES us_geography(city),
    SHARED DICTIONARY (dest_state) REFERENCES us_geography(state),
    SHARED DICTIONARY (dest_country) REFERENCES us_geography(country),
    
    *
    *
    *
    )
    WITH(
    *
    *
    *
    );

    BIGINT ENCODING FIXED(16)

    2

    Same as SMALLINT.

    BIGINT ENCODING FIXED(32)

    4

    Same as INTEGER.

    BOOLEAN

    1

    TRUE: 'true', '1', 't'. FALSE: 'false', '0', 'f'. Text values are not case-sensitive.

    DATE[1]

    4

    Same as DATE ENCODING DAYS(32).

    DATE ENCODING DAYS(16)

    2

Range in days: -32,768 - 32,767. Range in years: +/-90 around epoch (April 14, 1880 - September 9, 2059). Minimum value: -2,831,155,200; maximum value: 2,831,068,800. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING DAYS(32)

    4

    Range in years: +/-5,883,517 around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: -2,147,483,648; maximum value: 2,147,483,647. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING FIXED(16)

    2

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DATE ENCODING FIXED(32)

    4

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DECIMAL

    2, 4, or 8

    Takes precision and scale parameters: DECIMAL(precision,scale)

    Size depends on precision:

    • Up to 4: 2 bytes

    • 5 to 9: 4 bytes

    • 10 to 18 (maximum): 8 bytes

    Scale must be less than precision.
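
The precision-to-size rule for DECIMAL can be expressed directly; `decimal_bytes` is a hypothetical helper that encodes the rule stated above.

```python
def decimal_bytes(precision, scale):
    # Size depends only on precision; scale must be less than precision.
    if not 0 < precision <= 18:
        raise ValueError("precision must be between 1 and 18")
    if scale >= precision:
        raise ValueError("scale must be less than precision")
    if precision <= 4:
        return 2
    if precision <= 9:
        return 4
    return 8

print(decimal_bytes(4, 2), decimal_bytes(9, 3), decimal_bytes(18, 6))  # → 2 4 8
```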

    DOUBLE

    8

    Variable precision. Minimum value: -1.79e308; maximum value: 1.79e308

    EPOCH

    8

    Seconds ranging from -30610224000 (1/1/1000 00:00:00) through 185542587100800 (1/1/5885487 23:59:59).

    FLOAT

    4

    Variable precision. Minimum value: -3.4e38; maximum value: 3.4e38.

    INTEGER

    4

    Minimum value: -2,147,483,647; maximum value: 2,147,483,647.

    INTEGER ENCODING FIXED(8)

    1

Minimum value: -127; maximum value: 127.

    INTEGER ENCODING FIXED(16)

    2

    Same as SMALLINT.

    LINESTRING

    Variable**[2]**

    Geospatial datatype. A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2)

    MULTILINESTRING

    Variable**[2]**

    Geospatial datatype. A set of associated lines. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1))

    MULTIPOINT

    Variable**[2]**

    Geospatial datatype. A set of points. For example: MULTIPOINT((0 0), (1 0), (2 0))

    MULTIPOLYGON

    Variable**[2]**

    Geospatial datatype. A set of one or more polygons. For example:MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))

    POINT

    Variable**[2]**

    Geospatial datatype. A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0)

    POLYGON

    Variable**[2]**

    Geospatial datatype. A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))

    SMALLINT

    2

    Minimum value: -32,767; maximum value: 32,767.

    SMALLINT ENCODING FIXED(8)

    1

Minimum value: -127; maximum value: 127.

    TEXT ENCODING DICT

    4

    Max cardinality 2 billion distinct string values. Maximum string length is 32,767.

    TEXT ENCODING DICT(8)

    1

    Max cardinality 255 distinct string values.

    TEXT ENCODING DICT(16)

    2

    Max cardinality 64 K distinct string values.

    TEXT ENCODING NONE

    Variable

    Size of the string + 6 bytes. Maximum string length is 32,767.

    • Note: Importing TEXT ENCODING NONE fields using the Data Manager has limitations for Immerse. When you use string instead of string [dict. encode] for a column when importing, you cannot use that column in Immerse dashboards.

    TIME

    8

    Minimum value: 00:00:00; maximum value: 23:59:59.

    TIME ENCODING FIXED(32)

    4

    Minimum value: 00:00:00; maximum value: 23:59:59.

    TIMESTAMP(0)

    8

    Linux timestamp from -30610224000 (1/1/1000 00:00:00) through 29379542399 (12/31/2900 23:59:59). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS or YYYY-MM-DDTHH:MM:SS (the T is dropped when the field is populated).

    TIMESTAMP(3) (milliseconds)

    8

    Linux timestamp from -30610224000000 (1/1/1000 00:00:00.000) through 29379542399999 (12/31/2900 23:59:59.999). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fff or YYYY-MM-DDTHH:MM:SS.fff (the T is dropped when the field is populated).

    TIMESTAMP(6) (microseconds)

    8

    Linux timestamp from -30610224000000000 (1/1/1000 00:00:00.000000) through 29379542399999999 (12/31/2900 23:59:59.999999). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.ffffff or YYYY-MM-DDTHH:MM:SS.ffffff (the T is dropped when the field is populated).

    TIMESTAMP(9) (nanoseconds)

    8

    Linux timestamp from -9223372036854775807 (09/21/1677 00:12:43.145224193) through 9223372036854775807 (11/04/2262 23:47:16.854775807). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fffffffff or YYYY-MM-DDTHH:MM:SS.fffffffff (the T is dropped when the field is populated).

    TIMESTAMP ENCODING FIXED(32)

    4

    Range: 1901-12-13 20:45:53 - 2038-01-19 03:14:07. Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS or YYYY-MM-DDTHH:MM:SS (the T is dropped when the field is populated).

    TINYINT

    1

    Minimum value: -127; maximum value: 127.
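
The DATE ENCODING DAYS(16) range in the table above (a signed 16-bit day count relative to the Unix epoch) can be checked with ordinary date arithmetic; this is illustrative only.

```python
from datetime import date, timedelta

# -32,768 days from the Unix epoch (1970-01-01) should give the stated
# minimum date of April 14, 1880.
print(date(1970, 1, 1) + timedelta(days=-32768))  # → 1880-04-14
```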

The users system table provides information about all database users and contains the following columns:

    Column Name

    Column Type

    Description

    user_id

    INTEGER

    ID of database user.

    user_name

    TEXT

    Username of database user.

    Databases

    The databases system table provides information about all created databases on the server and contains the following columns:

    Column Name

    Column Type

    Description

    database_id

    INTEGER

    ID of database.

    database_name

    TEXT

    Name of database.

    Permissions

    The permissions system table provides information about all user/role permissions for all database objects and contains the following columns:

    Column Name

    Column Type

    Description

    role_name

    TEXT

    Username or role name associated with permission.

    is_user_role

    BOOLEAN

Boolean indicating whether the role_name column identifies a user or a role.

    Roles

    The roles system table lists all created database roles and contains the following columns:

    Column Name

    Column Type

    Description

    role_name

    TEXT

    Role name.

    Tables

    The tables system table provides information about all database tables and contains the following columns:

    Column Name

    Column Type

    Description

    database_id

    INTEGER

    ID of database that contains the table.

    database_name

    TEXT

    Name of database that contains the table.

    Dashboards

    The dashboards system table provides information about created dashboards (enterprise edition only) and contains the following columns:

    Column Name

    Column Type

    Description

    database_id

    INTEGER

    ID of database that contains the dashboard.

    database_name

    TEXT

    Name of database that contains the dashboard.

    Role Assignments

    The role_assignments system table provides information about database roles that have been assigned to users and contains the following columns:

    Column Name

    Column Type

    Description

    role_name

    TEXT

    Name of assigned role.

    user_name

    TEXT

    Username of user that was assigned the role.

    Memory Summary

    The memory_summary system table provides high level information about utilized memory across CPU and GPU devices and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node from which memory information is fetched.

    device_id

    INTEGER

    Device ID.

    Memory Details

    The memory_details system table provides detailed information about allocated memory segments across CPU and GPU devices and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node from which memory information is fetched.

    database_id

    INTEGER

    ID of database that contains the table that memory was allocated for.

    Storage Details

    The storage_details system table provides detailed information about utilized storage per table and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node from which storage information is fetched.

    database_id

    INTEGER

    ID of database that contains the table.

    Log-Based System Tables


    Log-based system tables are considered beta functionality in Release 6.1.0 and are disabled by default.

    Request Logs

    The request_logs system table provides information about HeavyDB Thrift API requests and contains the following columns:

    Column Name

    Column Type

    Description

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    severity

    TEXT

    Severity level of log entry. Possible values are F (fatal), E (error), W (warning), and I (info).

    Server Logs

    The server_logs system table provides HeavyDB server logs in tabular form and contains the following columns:

    Column Name

    Column Type

    Description

    node

    TEXT

    Node containing logs.

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    Web Server Logs

    The web_server_logs system table provides HEAVY.AI Web Server logs in tabular form and contains the following columns (Enterprise Edition only):

    Column Name

    Column Type

    Description

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    severity

    TEXT

    Severity level of log entry. Possible values are fatal, error, warning, and info.

    Web Server Access Logs

The web_server_access_logs system table provides information about requests made to the HEAVY.AI Web Server. The table contains the following columns:

    Column Name

    Column Type

    Description

    ip_address

    TEXT

    IP address of client making the web server request.

    log_timestamp

    TIMESTAMP

    Timestamp of log entry.

    Refreshing Logs System Tables

    The logs system tables must be refreshed manually to view new log entries. You can run the REFRESH FOREIGN TABLES SQL command (for example, REFRESH FOREIGN TABLES server_logs, request_logs; ), or click the Refresh Data Now button on the table’s Data Manager page in Heavy Immerse.

    Request Logs and Monitoring System Dashboard

    The Request Logs and Monitoring system dashboard is built on the log-based system tables and provides visualization of request counts, performance, and errors over time, along with the server logs.

    System Dashboards

    Preconfigured system dashboards are built on various system tables. Specifically, two dashboards named System Resources and User Roles and Permissions are available by default. The Request Logs and Monitoring system dashboard is considered beta functionality and disabled by default. These dashboards can be found in the information_schema database, along with the system tables that they use.


    Access to system dashboards is controlled using Heavy Immerse privileges; only users with Admin privileges or users/roles with access to the information_schema database can access the system dashboards.

    Cross-linking must be enabled to allow cross-filtering across charts that use different system tables. Enable cross-linking by adding "ui/enable_crosslink_panel": true to the feature_flags section of the servers.json file.

    Tables

    These functions are used to create and modify data tables in HEAVY.AI.

    Nomenclature Constraints

Table names must use the NAME format, described in regex notation as:

    Table and column names can include quotes, spaces, and the underscore character. Other special characters are permitted if the name of the table or column is enclosed in double quotes (" ").

    • Spaces and special characters other than underscore (_) cannot be used in Heavy Immerse.

• Column and table names enclosed in double quotes cannot be used in Heavy Immerse.
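
A name check along these lines can be sketched as follows. The exact NAME pattern is not reproduced above, so the regex here is an assumption based on common SQL identifier rules, and `needs_quoting` is a hypothetical helper.

```python
import re

# Assumed identifier pattern: a letter or underscore followed by letters,
# digits, '$', or '_'. Names that do not match must be double-quoted,
# subject to the Heavy Immerse limitations noted above.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9$_]*")

def needs_quoting(identifier):
    return NAME.fullmatch(identifier) is None

print(needs_quoting("flights_2024"))  # → False
print(needs_quoting("2024 flights"))  # → True
```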

    CREATE TABLE

    Create a table named <table> specifying <columns> and table properties.

    Supported Datatypes

    Datatype
    Size (bytes)
    Notes

* In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING DAYS(16)).

For more information, see Datatypes and Fixed Encoding.

For geospatial datatypes, see Geospatial Capabilities.

    Examples

    Create a table named tweets and specify the columns, including type, in the table.

    Create a table named delta and assign a default value San Francisco to column city.


    Default values currently have the following limitations:

    • Only literals can be used for column DEFAULT values; expressions are not supported.

    • You cannot define a DEFAULT value for a shard key. For example, the following does not parse:

    Supported Encoding

    Encoding
    Descriptions

    WITH Clause Properties

    Property
    Description

    Sharding

    Sharding partitions a database table across multiple servers so each server has a part of the table with the same columns but with different rows. Partitioning is based on a sharding key defined when you create the table.

    Without sharding, the dimension tables involved in a join are replicated and sent to each GPU, which is not feasible for dimension tables with many rows. Specifying a shard key makes it possible for the query to execute efficiently on large dimension tables.

    Currently, specifying a shard key is useful only for joins:

    • If two tables specify a shard key with the same type and the same number of shards, a join on that key only sends a part of the dimension table column data to each GPU.

    • For multi-node installs, the dimension table does not need to be replicated and the join executes locally on each leaf.

    Constraints

    • A shard key must specify a single column to shard on. There is no support for sharding by a combination of keys.

    • One shard key can be specified for a table.

    • Data are partitioned according to the shard key and the number of shards (shard_count).

    Recommendations

    • Set shard_count to the number of GPUs you eventually want to distribute the data table across.

    • Referenced tables must also be shard_count-aligned.

    • Minimize the use of sharding, because it can introduce load skew across resources compared to unsharded tables.

    Examples

    Basic sharding:

    Sharding with shared dictionary:

    Temporary Tables

    Using the TEMPORARY argument creates a table that persists only while the server is live. Temporary tables are useful for storing intermediate result sets that you access more than once.

    Note:

    Adding or dropping a column from a temporary table is not supported.

    Example

    CREATE TABLE AS SELECT

    Create a table with the specified columns, copying any data that meet SELECT statement criteria.

    WITH Clause Properties

    Property
    Description

    Examples

    Create the table newTable. Populate the table with all information from the table oldTable, effectively creating a duplicate of the original table.

    Create a table named trousers. Populate it with data from the columns name, waist, and inseam from the table wardrobe.

    Create a table named cosmos. Populate it with data from the columns star and planet from the table universe where planet has the class M.

    ALTER TABLE

    Examples

    Rename the table tweets to retweets.

    Rename the column source to device in the table retweets.

    Add the column pt_dropoff to table tweets with a default value point(0,0).

    Add multiple columns a, b, and c to table table_one with a default value of 15 for column b.

    Note:

    Default values currently have the following limitations:

    • Only literals can be used for column DEFAULT values; expressions are not supported.

    • For arrays, use the following syntax:

    Add the column lang to the table tweets using a TEXT ENCODING DICTIONARY.

    Add the columns lang and encode to the table tweets using a TEXT ENCODING DICTIONARY for each.

    Drop the column pt_dropoff from table tweets.

    Limit on-disk data growth by setting the number of allowed epoch rollbacks to 50:

    Note:
    • You cannot add a dictionary-encoded string column with a shared dictionary when using ALTER TABLE ADD COLUMN.

    • Currently, HEAVY.AI does not support adding a geo column type (POINT, LINESTRING, POLYGON, or MULTIPOLYGON) to a table.

    Change a text column “id” to an integer column:

    Change text columns “id” and “location” to big integer and point columns respectively:

    Note:

    Currently, only text column types (dictionary encoded and none encoded text columns) can be altered.

    DROP TABLE

    Deletes the table structure, all data from the table, and any dictionary content unless it is a shared dictionary. (See the note regarding disk space reclamation.)

    Example

    DUMP TABLE

    Archives data and dictionary files of the table <table> to file <filepath>.

    Valid values for <compression_program> include:

    • gzip (default)

    • pigz

    • lz4

    If you do not choose a compression option, the system uses gzip if it is available. If gzip is not installed, the file is not compressed.

    The file path must be enclosed in single quotes.

    Note:
    • Dumping a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being dumped.

    • The DUMP command is not supported on distributed configurations.

    Example

    RENAME TABLE

    Rename a table or multiple tables at once.

    Examples

    Rename a single table:

    Swap table names:

    Swap table names multiple times:

    RESTORE TABLE

    Restores data and dictionary files of table <table> from the file at <filepath>. If you specified a compression program when you used the DUMP TABLE command, you must specify the same compression method during RESTORE.

    Restoring a table decompresses and then reimports the table. You must have enough disk space for both the new table and the archived table, as well as enough scratch space to decompress the archive and reimport it.

    The file path must be enclosed in single quotes.

    You can also restore a table from archives stored in S3-compatible endpoints:

    s3_region is required. All features discussed in the S3 import documentation, such as custom S3 endpoints and server privileges, are supported.

    Note:
    • Restoring a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being restored.

    • The RESTORE command is not supported on distributed configurations.

    Important:

    Do not attempt to use RESTORE TABLE with a table dump created using a release of HEAVY.AI that is higher than the release running on the server where you will restore the table.

    Examples

    Restore table tweets from /opt/archive/tweetsBackup.gz:

    Restore table tweets from a public S3 file or using server privileges (with the allow-s3-server-privileges server flag enabled):

    Restore table tweets from a private S3 file using AWS access keys:

    Restore table tweets from a private S3 file using temporary AWS access keys/session token:

    Restore table tweets from an S3-compatible endpoint:

    TRUNCATE TABLE

    Use the TRUNCATE TABLE statement to remove all rows from a table without deleting the table structure.

    This releases table on-disk and memory storage and removes dictionary content unless it is a shared dictionary. (See the note regarding disk space reclamation.)

    Removing rows with TRUNCATE is more efficient than using DROP TABLE. Dropping and then recreating the table invalidates dependent objects of the table, requiring you to regrant object privileges. Truncating has none of these effects.

    Example

    Note:

    When you DROP or TRUNCATE, the command returns almost immediately. The directories to be purged are marked with the suffix _DELETE_ME_. The files are automatically removed asynchronously.

    In practical terms, this means that you will not see a reduction in disk usage until the automatic task runs, which might not start for up to five minutes.

    You might also see directory names appended with _DELETE_ME_. You can ignore these; they are deleted automatically over time.

    OPTIMIZE TABLE

    Use this statement to remove rows from storage that have been marked as deleted via DELETE statements.

    When run without the vacuum option, the column-level metadata is recomputed for each column in the specified table. HeavyDB makes heavy use of metadata to optimize query plans, so optimizing table metadata can increase query performance after metadata-widening operations such as updates or deletes. If the configuration parameter enable-auto-metadata-update is not set, HeavyDB does not narrow metadata during an update or delete; metadata is only widened to cover a new range.

    When run with the vacuum option, it removes any rows marked "deleted" from the data stored on disk. Vacuum is a checkpointing operation, so new copies of any vacuum records are deleted. Using OPTIMIZE with the VACUUM option compacts pages and deletes unused data files that have not been repopulated.
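    As a sketch of the two modes, assuming a table named tweets:

    ```sql
    -- Recompute column-level metadata only:
    OPTIMIZE TABLE tweets;

    -- Also remove rows marked as deleted from the data stored on disk:
    OPTIMIZE TABLE tweets WITH (VACUUM='true');
    ```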

    Note:

    Beginning with Release 5.6.0, OPTIMIZE should be used infrequently, because UPDATE, DELETE, and IMPORT queries manage space more effectively.

    VALIDATE

    Performs checks for negative and inconsistent epochs across table shards for single-node configurations.

    If VALIDATE detects epoch-related issues, it returns a report similar to the following:

    If no issues are detected, it reports as follows:

    VALIDATE CLUSTER

    Performs checks and reports discovered issues on a running HEAVY.AI cluster. Compares metadata between the aggregator and leaves to verify that the logical components between the processes are identical.

    VALIDATE CLUSTER also detects and reports issues related to table epochs. It reports when epochs are negative or when table epochs across leaf nodes or shards are inconsistent.

    Examples

    If VALIDATE CLUSTER detects issues, it returns a report similar to the following:

    If no issues are detected, it reports as follows:

    You can include the WITH(REPAIR_TYPE) argument. (REPAIR_TYPE='NONE') is the same as running the command with no argument. (REPAIR_TYPE='REMOVE') removes any leaf objects that have issues. For example:

    Epoch Issue Example

    This example output from the VALIDATE CLUSTER command on a distributed setup shows epoch-related issues:


    is_super_user

    BOOLEAN

    Indicates whether or not the database user is a super user.

    default_db_id

    INTEGER

    ID of user’s default database on login.

    default_db_name

    TEXT

    Name of user’s default database on login.

    can_login

    BOOLEAN

    Indicates whether or not the database user account is activated and can log in.

    owner_id

    INTEGER

    User ID of database owner.

    owner_user_name

    TEXT

    Username of database owner.

    database_id

    INTEGER

    ID of database that contains the database object for which permission was granted.

    database_name

    TEXT

    Name of database that contains the database object on which permission was granted.

    object_name

    TEXT

    Name of database object on which permission was granted.

    object_id

    INTEGER

    ID of database object on which permission was granted.

    object_owner_id

    INTEGER

    User id of the owner of the database object on which permission was granted.

    object_owner_user_name

    TEXT

    Username of the owner of the database object on which permission was granted.

    object_permission_type

    TEXT

    Type of database object on which permission was granted.

    object_permissions

    TEXT[]

    List of permissions that were granted on database object.

    table_id

    INTEGER

    Table ID.

    table_name

    TEXT

    Table name.

    owner_id

    INTEGER

    User ID of table owner.

    owner_user_name

    TEXT

    Username of table owner.

    column_count

    INTEGER

    Number of table columns. Note that internal system columns are included in this count.

    table_type

    TEXT

    Type of table. Possible values are DEFAULT, VIEW, TEMPORARY, and FOREIGN.

    view_sql

    TEXT

    For views, SQL statement used in the view.

    max_fragment_size

    INTEGER

    Number of rows per fragment used by the table.

    max_chunk_size

    BIGINT

    Maximum size (in bytes) of table chunks.

    fragment_page_size

    INTEGER

    Size (in bytes) of table data pages.

    max_rows

    BIGINT

    Maximum number of rows allowed by table.

    max_rollback_epochs

    INTEGER

    Maximum number of epochs a table can be rolled back to.

    shard_count

    INTEGER

    Number of shards that exist for the table.

    ddl_statement

    TEXT

    CREATE TABLE DDL statement for table.

    dashboard_id

    INTEGER

    Dashboard ID.

    dashboard_name

    TEXT

    Dashboard name.

    owner_id

    INTEGER

    User ID of dashboard owner.

    owner_user_name

    TEXT

    Username of dashboard owner.

    last_updated_at

    TIMESTAMP

    Timestamp of last dashboard update.

    data_sources

    TEXT[]

    List of data sources/tables used by the dashboard.

    device_type

    TEXT

    Type of device. Possible values are CPU and GPU.

    max_page_count

    BIGINT

    Maximum number of memory pages that can be allocated on the device.

    page_size

    BIGINT

    Size (in bytes) of a memory page on the device.

    allocated_page_count

    BIGINT

    Number of allocated memory pages on the device.

    used_page_count

    BIGINT

    Number of used allocated memory pages on the device.

    free_page_count

    BIGINT

    Number of free allocated memory pages on the device.

    database_name

    TEXT

    Name of database that contains the table that memory was allocated for.

    table_id

    INTEGER

    ID of table that memory was allocated for.

    table_name

    TEXT

    Name of table that memory was allocated for.

    column_id

    INTEGER

    ID of column that memory was allocated for.

    column_name

    TEXT

    Name of column that memory was allocated for.

    chunk_key

    INTEGER[]

    ID of cached table chunk.

    device_id

    INTEGER

    Device ID.

    device_type

    TEXT

    Type of device. Possible values are CPU and GPU.

    memory_status

    TEXT

    Memory segment use status. Possible values are FREE and USED.

    page_count

    BIGINT

    Number of pages in the segment.

    page_size

    BIGINT

    Size (in bytes) of a memory page on the device.

    slab_id

    INTEGER

    ID of slab containing memory segment.

    start_page

    BIGINT

    Page number of the first memory page in the segment.

    last_touched_epoch

    BIGINT

    Epoch at which the segment was last accessed.

    database_name

    TEXT

    Name of database that contains the table.

    table_id

    INTEGER

    Table ID.

    table_name

    TEXT

    Table Name.

    epoch

    INTEGER

    Current table epoch.

    epoch_floor

    INTEGER

    Minimum epoch table can be rolled back to.

    fragment_count

    INTEGER

    Number of table fragments.

    shard_id

    INTEGER

    Table shard ID. This value is only set for sharded tables.

    data_file_count

    INTEGER

    Number of data files created for table.

    metadata_file_count

    INTEGER

    Number of metadata files created for table.

    total_data_file_size

    BIGINT

    Total size (in bytes) of data files.

    total_data_page_count

    BIGINT

    Total number of pages across all data files.

    total_free_data_page_count

    BIGINT

    Total number of free pages across all data files.

    total_metadata_file_size

    BIGINT

    Total size (in bytes) of metadata files.

    total_metadata_page_count

    BIGINT

    Total number of pages across all metadata files.

    total_free_metadata_page_count

    BIGINT

    Total number of free pages across all metadata files.

    total_dictionary_data_file_size

    BIGINT

    Total size (in bytes) of string dictionary files.

    process_id

    INTEGER

    Process ID of the HeavyDB instance that generated the log entry.

    query_id

    INTEGER

    ID associated with a SQL query. A value of 0 indicates that either the log entry is unrelated to a SQL query or no query ID has been set for the log entry.

    thread_id

    INTEGER

    ID of thread that generated the log entry.

    file_location

    TEXT

    Source file name and line number where the log entry was generated.

    api_name

    TEXT

    Name of Thrift API that the request was sent to.

    request_duration_ms

    BIGINT

    Thrift API request duration in milliseconds.

    database_name

    TEXT

    Request session database name.

    user_name

    TEXT

    Request session username.

    public_session_id

    TEXT

    Request session ID.

    query_string

    TEXT

    Query string for SQL query requests.

    client

    TEXT

    Protocol and IP address of client making the request.

    dashboard_id

    INTEGER

    Dashboard ID for SQL query requests coming from Immerse dashboards.

    dashboard_name

    TEXT

    Dashboard name for SQL query requests coming from Immerse dashboards.

    chart_id

    INTEGER

    Chart ID for SQL query requests coming from Immerse dashboards.

    execution_time_ms

    BIGINT

    Execution time in milliseconds for SQL query requests.

    total_time_ms

    BIGINT

    Total execution time (execution_time_ms + serialization time) in milliseconds for SQL query requests.

    severity

    TEXT

    Severity level of log entry. Possible values are F (fatal), E (error), W (warning), and I (info).

    process_id

    INTEGER

    Process ID of the HeavyDB instance that generated the log entry.

    query_id

    INTEGER

    ID associated with a SQL query. A value of 0 indicates that either the log entry is unrelated to a SQL query or no query ID has been set for the log entry.

    thread_id

    INTEGER

    ID of thread that generated the log entry.

    file_location

    TEXT

    Source file name and line number where the log entry was generated.

    message

    TEXT

    Log message.

    message

    TEXT

    Log message.

    http_method

    TEXT

    HTTP request method.

    endpoint

    TEXT

    Web server request endpoint.

    http_status

    SMALLINT

    HTTP response status code.

    response_size

    BIGINT

    Response payload size in bytes.

    DATE*

    4

    Same as DATE ENCODING DAYS(32).

    DATE ENCODING DAYS(32)

    4

    Range in years: +/-5,883,517 around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: -2,147,483,648; maximum value: 2,147,483,647. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING DAYS(16)

    2

    Range in days: -32,768 to 32,767. Range in years: +/-90 around epoch (April 14, 1880 - September 9, 2059). Minimum value: -2,831,155,200; maximum value: 2,831,068,800. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

    DATE ENCODING FIXED(32)

    4

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DATE ENCODING FIXED(16)

    2

    In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

    DECIMAL

    2, 4, or 8

    Takes precision and scale parameters: DECIMAL(precision,scale).

    Size depends on precision:

    • Up to 4: 2 bytes

    • 5 to 9: 4 bytes

    • 10 to 18 (maximum): 8 bytes

    Scale must be less than precision.

    DOUBLE

    8

    Variable precision. Minimum value: -1.79 x e^308; maximum value: 1.79 x e^308.

    FLOAT

    4

    Variable precision. Minimum value: -3.4 x e^38; maximum value: 3.4 x e^38.

    INTEGER

    4

    Minimum value: -2,147,483,647; maximum value: 2,147,483,647.

    SMALLINT

    2

    Minimum value: -32,767; maximum value: 32,767.

    TEXT ENCODING DICT

    4

    Maximum cardinality: 2 billion distinct string values.

    TEXT ENCODING NONE

    Variable

    Size of the string + 6 bytes

    TIME

    8

    Minimum value: 00:00:00; maximum value: 23:59:59.

    TIMESTAMP

    8

    Linux timestamp from -30610224000 (1/1/1000 00:00:00.000) through 29379542399 (12/31/2900 23:59:59.999).

    Can also be inserted and stored in human-readable format:

    • YYYY-MM-DD HH:MM:SS

    TINYINT

    1

    Minimum value: -127; maximum value: 127.
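    As an illustrative sketch (the table and column names here are hypothetical), several of the datatypes and encodings above can be combined in a single definition:

    ```sql
    CREATE TABLE type_demo (
       flag     BOOLEAN,                  -- 1 byte
       amount   DECIMAL(10,2),            -- precision 10: stored in 8 bytes
       event_d  DATE ENCODING DAYS(16),   -- 2-byte date, +/-90 years around epoch
       label    TEXT ENCODING DICT,       -- dictionary-encoded string
       raw_note TEXT ENCODING NONE);      -- uncompressed string, no aggregates
    ```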

    CREATE TABLE tbl (id INTEGER NOT NULL DEFAULT 0, name TEXT, shard key (id)) with (shard_count = 2);
  • For arrays, use the following syntax: ARRAY[A, B, C, ..., N]

    The syntax {A, B, C, ... N} is not supported.

  • Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but when you try to insert a row with a default value, it throws an error.

  • partitions

    Partition strategy option:

    • SHARDED: Partition table using sharding.

    • REPLICATED: Partition table using replication.

    shard_count

    Number of shards to create, typically equal to the number of GPUs across which the data table is distributed.

    sort_column

    Name of the column on which to sort during bulk import.

    A value in the column specified as a shard key is always sent to the same partition.

  • The number of shards should be equal to the number of GPUs in the cluster.

  • Sharding is allowed on the following column types:

    • DATE

    • INT

    • TEXT ENCODING DICT

    • TIME

    • TIMESTAMP

  • Tables must share the dictionary for the column to be involved in sharded joins. If the dictionary is not specified as shared, the join does not take advantage of sharding. Dictionaries are reference-counted and only dropped when the last reference drops.

  • partitions

    Partition strategy option:

    • SHARDED: Partition table using sharding.

    • REPLICATED: Partition table using replication.

    use_shared_dictionaries

    Controls whether the created table creates its own dictionaries for text columns, or instead shares the dictionaries of its source table. Uses shared dictionaries by default (true), which increases the speed of table creation.

    Setting to false shrinks the dictionaries if SELECT for the created table has a narrow filter; for example: CREATE TABLE new_table AS SELECT * FROM old_table WITH (USE_SHARED_DICTIONARIES='false');

    vacuum

    Formats the table to more efficiently handle DELETE requests. The only parameter available is delayed. Rather than immediately remove deleted rows, vacuum marks items to be deleted, and they are removed at an optimal time.
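    For example, a hypothetical table created with delayed vacuuming might look like:

    ```sql
    CREATE TABLE events (
       id INTEGER,
       payload TEXT ENCODING DICT)
      WITH (VACUUM='delayed');
    ```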

    ARRAY[A, B, C, ..., N]. The syntax {A, B, C, ..., N} is not supported.
  • Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but when you try to insert a row with a default value, it throws an error.

  • HEAVY.AI supports ALTER TABLE RENAME TABLE and ALTER TABLE RENAME COLUMN for temporary tables. HEAVY.AI does not support ALTER TABLE ADD COLUMN to modify a temporary table.

    none

    You must have at least GRANT CREATE ON DATABASE privilege level to use the DUMP command.

    You must have at least GRANT CREATE ON DATABASE privilege level to use the RESTORE command.
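    For example, a superuser could grant this privilege to a hypothetical backup_user on a database named heavyai:

    ```sql
    GRANT CREATE ON DATABASE heavyai TO backup_user;
    ```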

    BIGINT

    8

    Minimum value: -9,223,372,036,854,775,807; maximum value: 9,223,372,036,854,775,807.

    BOOLEAN

    1

    TRUE: 'true', '1', 't'. FALSE: 'false', '0', 'f'. Text values are not case-sensitive.

    DICT

    Dictionary encoding on string columns (default for TEXT columns). Limit of 2 billion unique string values.

    FIXED (bits)

    Fixed length encoding of integer or timestamp columns. See Datatypes and Fixed Encoding.

    NONE

    No encoding. Valid only on TEXT columns. No Dictionary is created. Aggregate operations are not possible on this column type.

    fragment_size

    Number of rows per fragment, which is a unit of the table used for query processing. Default: 32 million rows; you typically do not need to change this.

    max_rollback_epochs

    Limit the number of epochs a table can be rolled back to. Limiting the number of epochs helps to limit the amount of on-disk data and prevent unmanaged data growth.

    Limiting the number of rollback epochs also can increase system startup speed, especially for systems on which data is added in small batches or singleton inserts. Default: 3.

    The following example creates the table test_table and sets the maximum epoch rollback number to 50:

    CREATE TABLE test_table(a int) WITH (MAX_ROLLBACK_EPOCHS = 50);

    max_rows

    Used primarily for streaming datasets to limit the number of rows in a table, to avoid running out of memory or impeding performance. When the max_rows limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the max_rows setting. If you attempt to load more rows at one time than the max_rows setting defines, the records up to the max_rows limit are removed, leaving only the additional rows. Default: 2^62. In a distributed system, the maximum number of rows is calculated as max_rows * leaf_count. In a sharded distributed system, the maximum number of rows is calculated as max_rows * shard_count.
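    For example, a hypothetical streaming table capped at 100 million rows:

    ```sql
    CREATE TABLE sensor_stream (
       ts      TIMESTAMP,
       reading FLOAT)
      WITH (MAX_ROWS=100000000);
    ```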

    page_size

    fragment_size

    Number of rows per fragment, which is a unit of the table used for query processing. Default: 32 million rows; you typically do not need to change this.

    max_chunk_size

    Size of a chunk, which is a unit of the table used for query processing. Default: 1073741824 bytes (1 GB); you typically do not need to change this.

    max_rows

    Used primarily for streaming datasets to limit the number of rows in a table. When the max_rows limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the max_rows setting. If you attempt to load more rows at one time than the max_rows setting defines, the records up to the max_rows limit are removed, leaving only the additional rows. Default: 2^62.

    page_size

    DATE ENCODING FIXED(16)
    Datatypes and Fixed Encoding
    Geospatial Primitives
    disk space reclamation
    the S3 import documentation
    disk space reclamation
    DROP TABLE

    DATE*

    Number of I/O page bytes. Default: 1 MB; you typically do not need to change this.

    Number of I/O page bytes. Default: 1 MB; you typically do not need to change this.

    Configuration Parameters for HeavyDB

    Following are the parameters for runtime settings on HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.

    For example, consider allow-loop-joins [=arg(=1)] (=0).

    • If you do not use this flag, loop joins are not allowed by default.

    CREATE [TEMPORARY] TABLE [IF NOT EXISTS] <table>
      (<column> <type> [NOT NULL] [DEFAULT <value>] [ENCODING <encodingSpec>],
      [SHARD KEY (<column>)],
      [SHARED DICTIONARY (<column>) REFERENCES <table>(<column>)], ...)
      [WITH (<property> = value, ...)];
    CREATE TABLE IF NOT EXISTS tweets (
       tweet_id BIGINT NOT NULL,
       tweet_time TIMESTAMP NOT NULL ENCODING FIXED(32),
       lat FLOAT,
       lon FLOAT,
       sender_id BIGINT NOT NULL,
       sender_name TEXT NOT NULL ENCODING DICT,
       location TEXT ENCODING  DICT,
       source TEXT ENCODING DICT,
       reply_to_user_id BIGINT,
       reply_to_tweet_id BIGINT,
       lang TEXT ENCODING  DICT,
       followers INT,
       followees INT,
       tweet_count INT,
       join_time TIMESTAMP ENCODING  FIXED(32),
       tweet_text TEXT,
       state TEXT ENCODING  DICT,
       county TEXT ENCODING DICT,
       place_name TEXT,
       state_abbr TEXT ENCODING DICT,
       county_state TEXT ENCODING DICT,
       origin TEXT ENCODING DICT,
       phone_numbers bigint);
    CREATE TABLE delta (
       id INTEGER NOT NULL, 
       name TEXT NOT NULL, 
       city TEXT NOT NULL DEFAULT 'San Francisco' ENCODING DICT(16));
    CREATE TABLE  customers(
       accountId text,
       name text,
       SHARD KEY (accountId))
      WITH (shard_count = 4);
    CREATE TABLE transactions(
       accountId text,
       action text,
       SHARD KEY (accountId),
       SHARED DICTIONARY (accountId) REFERENCES customers(accountId))
      WITH (shard_count = 4);
    CREATE TEMPORARY TABLE customers(
       accountId TEXT,
       name TEXT,
       timeCreated TIMESTAMP)
    CREATE TABLE [IF NOT EXISTS] <newTableName> AS (<SELECT statement>) [WITH (<property> = value, ...)];
    CREATE TABLE newTable AS (SELECT * FROM oldTable);
    CREATE TABLE trousers AS (SELECT name, waist, inseam FROM wardrobe);
    CREATE TABLE IF NOT EXISTS cosmos AS (SELECT star, planet FROM universe WHERE class='M');
    ALTER TABLE <table> RENAME TO <table>;
    ALTER TABLE <table> RENAME COLUMN <column> TO <column>;
    ALTER TABLE <table> ADD [COLUMN] <column> <type> [NOT NULL] [ENCODING <encodingSpec>];
    ALTER TABLE <table> ADD (<column> <type> [NOT NULL] [ENCODING <encodingSpec>], ...);
    ALTER TABLE <table> ADD (<column> <type> DEFAULT <value>);
    ALTER TABLE <table> DROP COLUMN <column_1>[, <column_2>, ...];
    ALTER TABLE <table> SET MAX_ROLLBACK_EPOCHS=<value>;
    ALTER TABLE <table> ALTER COLUMN <column> TYPE <type>, ALTER COLUMN <column> TYPE <type>, ...;
    ALTER TABLE tweets RENAME TO retweets;
    ALTER TABLE retweets RENAME COLUMN source TO device;
    ALTER TABLE tweets ADD COLUMN pt_dropoff POINT DEFAULT 'point(0 0)';
    ALTER TABLE table_one ADD a INTEGER, b INTEGER NOT NULL DEFAULT 15, c TEXT;
    ALTER TABLE tweets ADD COLUMN lang TEXT ENCODING DICT;
    ALTER TABLE tweets ADD (lang TEXT ENCODING DICT, encode TEXT ENCODING DICT);
    ALTER TABLE tweets DROP COLUMN pt_dropoff;
    ALTER TABLE test_table SET MAX_ROLLBACK_EPOCHS=50;
    ALTER TABLE my_table ALTER COLUMN id TYPE INTEGER;
    ALTER TABLE my_table ALTER COLUMN id TYPE BIGINT, ALTER COLUMN location TYPE GEOMETRY(POINT, 4326);
    DROP TABLE [IF EXISTS] <table>;
    DROP TABLE IF EXISTS tweets;
    DUMP TABLE <table> TO '<filepath>' [WITH (COMPRESSION='<compression_program>')];
    DUMP TABLE tweets TO '/opt/archive/tweetsBackup.gz' WITH (COMPRESSION='gzip');
    RENAME TABLE <table> TO <table>[, <table> TO <table>, <table> TO <table>...];
    RENAME TABLE table_A TO table_B;
    RENAME TABLE table_A TO table_B, table_B TO table_A;
    
    RENAME TABLE table_A TO table_B, table_B TO table_C, table_C TO table_A;
    RENAME TABLE table_A TO table_A_stale, table_B TO table_B_stale, table_A_new TO table_A, table_B_new TO table_B;
    RESTORE TABLE <table> FROM '<filepath>' [WITH (COMPRESSION='<compression_program>')];
    RESTORE TABLE <table> FROM '<S3_file_URL>' 
      WITH (compression = '<compression_program>', 
            s3_region = '<region>', 
            s3_access_key = '<access_key>', 
            s3_secret_key = '<secret_key>', 
            s3_session_token = '<session_token>');
    RESTORE TABLE tweets FROM '/opt/archive/tweetsBackup.gz' 
       WITH (COMPRESSION='gzip');
    RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz'
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1');
    RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz'
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1', 
          s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy');
    RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz' 
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1', 
          s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy',
          s3_session_token = 'zzzzzzzz');
    RESTORE TABLE tweets FROM 's3://my-gcp-bucket/archive/tweetsBackup.gz'
       WITH (compression = 'gzip', 
          s3_region = 'us-east-1', 
          s3_endpoint = 'storage.googleapis.com');
    TRUNCATE TABLE <table>;
    TRUNCATE TABLE tweets;
    OPTIMIZE TABLE [<table>] [WITH (VACUUM='true')]
    VALIDATE
    heavysql> validate;
    Result
    
    Negative epoch value found for table "my_table". Epoch: -1.
    Epoch values for table "my_table_2" are inconsistent:
    Table Id  Epoch     
    ========= ========= 
    4         1         
    5         2
    Instance OK
    VALIDATE CLUSTER [WITH (REPAIR_TYPE = ['NONE' | 'REMOVE'])];
    [mapd@thing3 ~]$ /mnt/gluster/dist_mapd/mapd-sw2/bin/mapdql -p HyperInteractive
    User admin connected to database heavyai
    heavysql> validate cluster;
    Result
     Node          Table Count 
     ===========   =========== 
     Aggregator     1116
     Leaf 0         1114
     Leaf 1         1114
    No matching table on Leaf 0 for Table cities_dtl_POINTS table id 56
    No matching table on Leaf 1 for Table cities_dtl_POINTS table id 56
    No matching table on Leaf 0 for Table cities_dtl table id 80
    No matching table on Leaf 1 for Table cities_dtl table id 80
    Table details don't match on Leaf 0 for Table view_geo table id 95
    Table details don't match on Leaf 1 for Table view_geo table id 95
    Cluster OK
    VALIDATE CLUSTER WITH (REPAIR_TYPE = 'REMOVE');
    heavysql> validate cluster;
    Result
    
    Negative epoch value found for table "my_table". Epoch: -16777216.
    Epoch values for table "my_table_2" are inconsistent:
    Node      Table Id  Epoch     
    ========= ========= ========= 
    Leaf 0    4         1         
    Leaf 1    4         2
  • 1 to 9: 4 bytes
  • 10 to 18 (maximum): 8 bytes

  • Scale must be less than precision.

    YYYY-MM-DDTHH:MM:SS (The T is dropped when the field is populated.)

  • If you provide no arguments, the implied value is 1 (true) (allow-loop-joins).

  • If you provide the argument 0, that is the same as the default (allow-loop-joins=0).

  • If you provide the argument 1, that is the same as the implied value (allow-loop-joins=1).
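Concretely, the three cases above correspond to these command-line forms (an illustrative sketch of the implied/default-value syntax; the trailing comments are annotations, not part of the flag):

```
--allow-loop-joins       # no argument: implied value, allow-loop-joins=1
--allow-loop-joins=0     # argument 0: same as the default
--allow-loop-joins=1     # argument 1: same as the implied value
```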

  • Flag

    Description

    Default Value

    allow-cpu-retry [=arg]

    Allow queries that fail on GPU to retry on CPU, even when the watchdog is enabled. The watchdog is enabled by default, and by default most queries that throw a watchdog exception or run out of memory on GPU simply fail. Turn this flag on to allow those queries to retry on CPU instead.

    TRUE[1]

    allow-cpu-kernel-concurrency

    Allow for multiple queries to run execution kernels concurrently on CPU.

    Example: In a system with 4 executors (controlled by the num-executors parameter), 3+1 queries can run concurrently on CPU (the +1 depends on allow-cpu-gpu-kernel-concurrency).

    DEFAULT: ON

    Additional Enterprise Edition Parameters

    Following are additional parameters for runtime settings for the Enterprise Edition of HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.

    Flag

    Description

    Default Value

    cluster arg

    Path to data leaves list JSON file. Indicates that the HEAVY.AI server instance is an aggregator node, and where to find the rest of its cluster. Change for testing and debugging.

    $HEAVYAI_BASE

    compression-limit-bytes [=arg(=536870912)] (=536870912)

    Compress result sets that are transferred between leaves. Minimum length of payload above which data is compressed.

    536870912

    allow-cpu-gpu-kernel-concurrency

    Allow multiple queries to run execution kernels concurrently on CPU while a GPU query is executing.

    Example: In a system with 4 executors (controlled by the num-executors parameter), one of the 4 slots can be used to run a GPU query while the other 3 run on CPU.

    DEFAULT: ON

    allow-local-auth-fallback [=arg(=1)] (=0)

    If SAML or LDAP logins are enabled, and the logins fail, this setting enables authentication based on internally stored login credentials. Command-line tools or other tools that do not support SAML might reject those users from logging in unless this feature is enabled. This allows a user to log in using credentials on the local database.

    FALSE[0]

    allow-loop-joins [=arg(=1)] (=0)

    Enables all join queries to fall back to the loop join implementation. During a loop join, queries loop over all rows from all tables involved in the join and evaluate the join condition. By default, loop joins are allowed only if the number of rows in the inner table is fewer than the trivial-loop-join-threshold, because loop joins are computationally expensive and run for an extended period. Modifying the trivial-loop-join-threshold is a safer alternative to globally enabling loop joins. You might choose to globally enable loop joins when you have many small tables for which loop join performance is acceptable but modifying the trivial loop join threshold would be tedious.

    FALSE[0]

    allowed-export-paths = ["root_path_1", "root_path_2", ...]

    Specify a list of allowed root paths that can be used in export operations, such as the COPY TO command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine. For example:

    allowed-export-paths = ["/heavyai-storage/data/heavyai_export", "/home/centos"] The list of paths must be on the same line as the configuration parameter.

    Allowed file paths are enforced by default. The default export path (<data directory>/heavyai_export) is allowed by default, and all child paths of that path are allowed.

    When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY TO command, an error response is returned.

    N/A

    allow-s3-server-privileges

    Allow S3 server privileges if IAM user credentials are not provided. Credentials can be specified with environment variables (such as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and so on), an AWS credentials file, or when running on an EC2 instance, with an IAM role that is attached to the instance.

    FALSE[0]

    allowed-import-paths = ["root_path_1", "root_path_2", ...]

    Specify a list of allowed root paths that can be used in import operations, such as the COPY FROM command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine.

    For example:

    allowed-import-paths = ["/heavyai-storage/data/heavyai_import", "/home/centos"] The list of paths must be on the same line as the configuration parameter.

    Allowed file paths are enforced by default. The default import path (<data directory>/heavyai_import) is allowed by default, and all child paths of that allowed path are allowed.

    When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY FROM command, an error response is returned.

    N/A

    approx_quantile_buffer arg

    Size of a temporary buffer that is used to copy in the data for APPROX_MEDIAN calculation. When full, is sorted before being merged into the internal distribution buffer configured in approx_quantile_centroids.

    1000

    approx_quantile_centroids arg

    Size of the internal buffer used to approximate the distribution of the data for which the APPROX_MEDIAN calculation is taken. The larger the value, the greater the accuracy of the answer.

    300

    auth-cookie-name arg

    Configure the authentication cookie name. If not explicitly set, the default name is oat.

    oat

    bigint-count [=arg]

    Use 64-bit count. Disabled by default because 64-bit integer atomics are slow on GPUs. Enable this setting if you see negative values for a count, indicating overflow. In addition, if your data set has more than 4 billion records, you likely need to enable this setting.

    FALSE[0]

    bitmap-memory-limit arg

    Set the maximum amount of memory (in GB) allocated for APPROX_COUNT_DISTINCT bitmaps per execution kernel (thread or GPU).

    8

    calcite-max-mem arg

    Max memory available to calcite JVM. Change if Calcite reports out-of-memory errors.

    1024

    calcite-port arg

    Calcite port number. Change to avoid collisions with ports already in use.

    6279

    calcite-service-timeout

    Service timeout value, in milliseconds, for communications with Calcite. On databases with large numbers of tables, large numbers of concurrent queries, or many parallel updates and deletes, Calcite might return less quickly. Increasing the timeout value can prevent THRIFT_EAGAIN timeout errors.

    5000

    columnar-large-projections [=arg]

    Sets automatic use of columnar output, instead of row-wise output, for large projections.

    TRUE

    columnar-large-projections-threshold arg

    Set the row-number threshold size for columnar output instead of row-wise output.

    1000000

    config arg

    Path to heavy.conf. Change for testing and debugging.

    $HEAVYAI_STORAGE/ heavy.conf

    cpu-only

    Run in CPU-only mode. Set this flag to force HeavyDB to run in CPU mode, even when GPUs are available. Useful for debugging and on shared-tenancy systems where the current HeavyDB instance does not need to run on GPUs.

    FALSE

    cpu-buffer-mem-bytes arg

    Size (in bytes) of memory reserved for CPU buffers. Change to restrict the amount of CPU/system memory HeavyDB can consume. A default value of 0 indicates that 80% of total CPU memory will be used by the HeavyDB server.

    0
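As a back-of-envelope illustration of the default behavior described above (the 80% figure comes from that description; the function is a hypothetical sketch, not HeavyDB code):

```python
def effective_cpu_buffer_bytes(configured: int, total_system_bytes: int) -> float:
    # A configured value of 0 means 80% of total CPU memory is used,
    # per the cpu-buffer-mem-bytes description above.
    if configured == 0:
        return total_system_bytes * 0.8
    return float(configured)

# 256 GiB of system memory with the default setting of 0:
print(round(effective_cpu_buffer_bytes(0, 256 * 2**30) / 2**30, 1))  # -> 204.8
```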

    cuda-block-size arg

    Size of block to use on GPU. GPU performance tuning: Number of threads per block. Default of 0 means use all threads per block.

    0

    cuda-grid-size arg

    Size of grid to use on GPU. GPU performance tuning: Number of blocks per device. Default of 0 means use all available blocks per device.

    0

    data arg

    Directory path to HEAVY.AI catalogs. Change for testing and debugging.

    $HEAVYAI_STORAGE

    db-query-list arg

    Path to file containing HEAVY.AI queries. Use a query list to autoload data to GPU memory on startup to speed performance. See Preloading Data.

    N/A

    dynamic-watchdog-time-limit [=arg]

    Dynamic watchdog time limit, in milliseconds. Change if dynamic watchdog is stopping queries expected to take longer than this limit.

    100000

    enable-auto-clear-render-mem [=arg]

    Enable/disable automatic clearing of render GPU memory on out-of-memory errors during rendering. If an out-of-gpu-memory exception is thrown while rendering, many users respond by running \clear_gpu via the heavysql command-line interface to refresh/defrag the memory heap. Enabling this flag automates that process. At present, only GPU memory in the renderer is cleared automatically.

    TRUE[1]

    enable-auto-metadata-update [=arg]

    Enable automatic metadata updates on UPDATE queries. Automatic metadata updates are turned on by default. Disabling may result in stale metadata and reductions in query performance.

    TRUE[1]

    enable-columnar-output [=arg]

    Allows HEAVY.AI Core to directly materialize intermediate projections and the final ResultSet in Columnar format where appropriate. Columnar output is an internal performance enhancement that projects the results of an intermediate processing step in columnar format. Consider disabling this feature if you see unexpected performance regressions in your queries.

    TRUE[1]

    enable-data-recycler [=arg]

    Set to TRUE to enable the data recycler. Enabling the recycler enables the following:

    • Hashtable recycler, which is the cache storage.

    • Hashing scheme recycler, which preserves a hashtable layout (such as perfect hashing and keyed hashing).

    • Overlaps hashtable tuning parameter recycler. Each overlap hashtable has its own parameters used during hashtable building.

    TRUE[1]

    enable-debug-timer [=arg]

    Enable fine-grained query execution timers for debug. For debugging, logs verbose timing information for query execution (time to load data, time to compile code, and so on).

    FALSE[0]

    enable-direct-columnarization [=arg(=1)](=0)

    Columnarization organizes intermediate results in a multi-step query in the most efficient way for the next step in the process. If you see an unexpected performance regression, you can try setting this value to false, enabling the earlier HEAVY.AI columnarization behavior.

    TRUE[1]

    enable-dynamic-watchdog [=arg]

    Enable dynamic watchdog.

    FALSE[0]

    enable-executor-resource-mgr [=arg]

    Enable the executor resource manager.

    TRUE[1]

    enable-filter-push-down [=arg(=1)] (=0)

    Enable filter push-down through joins. Evaluates filters in the query expression for selectivity and pushes down highly selective filters into the join according to selectivity parameters. See also What is Predicate Pushdown?

    FALSE[0]

    enable-foreign-table-scheduled-refresh [=arg]

    Enable scheduled refreshes of foreign tables. Automatically refreshes foreign tables that have the "REFRESH_TIMING_TYPE" option set to "SCHEDULED", based on the specified refresh schedule.

    TRUE[1]

    enable-geo-ops-on-uncompressed-coords [=arg(=1)] (=0)

    Allow geospatial operations ST_Contains and ST_Intersects to process uncompressed coordinates where possible to increase execution speed. Provides control over the selection of ST_Contains and ST_Intersects implementations. By default, for certain combinations of compressed geospatial arguments, such as ST_Contains(POLYGON, POINT), the implementation can process uncompressed coordinate values. This can result in much faster execution but could decrease precision. Disabling this option enables full decompression, which is slower but more precise.

    TRUE[1]

    enable-logs-system-tables [=arg(=1)] (=0)

    Enable use of logs system tables. Also enables the Request Logs and Monitoring system dashboard (Enterprise Edition only).

    FALSE[0]

    enable-overlaps-hashjoin [=arg(=1)] (=0)

    Enable the overlaps hash join framework allowing for range join (for example, spatial overlaps) computation using a hash table.

    TRUE[1]

    enable-runtime-query-interrupt [=arg(=1)] (=0)

    Enable the runtime query interrupt. Setting to TRUE can reduce performance slightly. Use with running-query-interrupt-freq to set the interrupt frequency.

    FALSE[0]

    enable-runtime-udf

    Enable runtime registration of user-defined functions. This functionality is turned off unless you specifically request it, to prevent unintentional inclusion of nonstandard code. This setting is a precursor to more advanced object permissions planned in future releases.

    FALSE[0]

    enable-string-dict-hash-cache [=arg(=1)] (=0)

    When importing a large table with low cardinality, set the flag to TRUE and leave it on to assist with bulk queries. If using String Dictionary Server, set the flag to FALSE if the String Dictionary server uses more memory than the physical system can support.

    TRUE[1]

    enable-thrift-logs [=arg(=1)] (=0)

    Enable writing messages directly from Thrift to stdout/stderr. Change to enable verbose Thrift messages on the console.

    FALSE[0]

    enable-watchdog [arg]

    Enable watchdog.

    TRUE[1]

    executor-cpu-result-mem-ratio

    Set executor resource manager reserved memory for query result sets as a ratio greater than 0, representing the fraction of the system memory not allocatable for the CPU buffer pool. Values greater than 1.0 are permitted, to allow over-subscription when warranted, but too high a value can cause out-of-memory errors.

    Example: In a system with 256 GB of RAM, the default CPU buffer size is 204.8 GB, so this ratio is calculated on the remaining 51.2 GB, limiting the maximum result-set memory for a single query to about 41 GB.
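The arithmetic in the example above can be sketched as follows (the 0.8 buffer fraction and 0.8 result-memory ratio are assumptions inferred from the surrounding text, not values read from a running server):

```python
def max_result_set_mem_gb(system_ram_gb: float,
                          cpu_buffer_fraction: float = 0.8,
                          result_mem_ratio: float = 0.8) -> float:
    # The ratio applies to the memory left over after the
    # CPU buffer pool reservation.
    leftover = system_ram_gb * (1.0 - cpu_buffer_fraction)  # 51.2 GB for 256 GB RAM
    return leftover * result_mem_ratio

print(round(max_result_set_mem_gb(256), 1))  # -> 41.0
```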

    executor-cpu-result-mem-bytes

    Set executor resource manager reserved memory for query result sets in bytes. This overrides the default reservation of 80% the size of the system memory that is not allocated for the CPU buffer pool. Use 0 for auto.

    DEFAULT: None (result memory size is controlled via the ratio setting above)

    executor-per-query-max-cpu-threads-ratio

    Set max fraction of executor resource manager total CPU slots/threads that can be allocated for a single query.

    Values greater than 1 are allowed, permitting over-subscription of threads when warranted, because the estimate of kernel core occupation can be overly pessimistic for some classes of queries. Take care not to set this value too high, however, as thrashing and thread starvation can result. Example: on a physical server with 24 logical CPUs, or in a VM with 24 vCPUs, the executor thread count is doubled to 48, so a value of 0.9 allows up to 43 threads for a single query. Lower this value to reduce the memory requirements of single queries.

    DEFAULT: 0.9
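The worked example above amounts to the following calculation (a sketch mirroring the text; the "executor threads = 2 x logical CPUs" doubling is taken from the example, not from HeavyDB internals):

```python
import math

def per_query_max_threads(logical_cpus: int, ratio: float = 0.9) -> int:
    executor_threads = logical_cpus * 2           # 24 logical CPUs -> 48 threads
    return math.floor(ratio * executor_threads)   # 0.9 * 48 = 43.2 -> 43

print(per_query_max_threads(24))  # -> 43
```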

    executor-per-query-max-cpu-result-mem-ratio

    Set max fraction of executor resource manager total CPU result memory reservation that can be allocated for a single query.

    Values greater than 1 are allowed, permitting over-subscription of memory when warranted, but be careful: too high a value can cause out-of-memory errors.

    Default: 0.8

    filter-push-down-low-frac

    Lower threshold for selectivity of filters that are pushed down. Filters with selectivity lower than this threshold are considered for push down.

    filter-push-down-passing-row-ubound

    Upper bound on the number of rows that should pass the filter if the selectivity is less than the high fraction threshold.

    flush-log [arg]

    Immediately flush logs to disk. Set to FALSE if this is a performance bottleneck.

    TRUE[1]

    from-table-reordering [=arg(=1)] (=1)

    Enable automatic table reordering in FROM clause. Reorders the sequence of a join to place large tables on the inside of the join clause and smaller tables on the outside. HEAVY.AI also reorders tables between join clauses to prefer hash joins over loop joins. Change this value only in consultation with a HEAVY.AI engineer.

    TRUE[1]

    gpu-buffer-mem-bytes [=arg]

    Size of memory reserved for GPU buffers in bytes per GPU. Change to restrict the amount of GPU memory HeavyDB can consume per GPU. A default value of 0 indicates no limit on GPU memory use (HeavyDB uses all available GPU memory across all active GPUs on the system).

    0

    Maximum amount of memory in bytes that can be used for the GPU code cache.

    134217728 (128MB)

    gpu-input-mem-limit arg

    Force query to CPU when input data memory usage exceeds this percentage of available GPU memory. HeavyDB loads data to GPU incrementally until data exceeds GPU memory, at which point the system retries on CPU. Loading data to GPU evicts any resident data already loaded or any query results that are cached. Use this limit to avoid attempting to load datasets to GPU when they obviously will not fit, preserving cached data on GPU and increasing query performance. If watchdog is enabled and allow-cpu-retry is not enabled, the query fails instead of re-running on CPU.

    0.9

    hashtable-cache-total-bytes [=arg]

    The total size of the cache storage for hashtable recycler, in bytes. Increase the cache size to store more hashtables. Must be larger than or equal to the value defined in max-cacheable-hashtable-size-bytes.

    4294967296 (4GB)

    hll-precision-bits [=arg]

    Number of bits used from the hash value used to specify the bucket number. Change to increase or decrease approx_count_distinct() precision. Increased precision decreases performance.

    11
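For intuition about the precision/performance trade-off, the standard HyperLogLog relative-error estimate is roughly 1.04/√m with m = 2^precision-bits buckets; this formula comes from the HyperLogLog literature, not from HEAVY.AI documentation:

```python
import math

def hll_relative_error(precision_bits: int) -> float:
    buckets = 2 ** precision_bits
    return 1.04 / math.sqrt(buckets)

# The default of 11 bits gives 2048 buckets and roughly 2.3% typical error;
# each extra bit doubles the buckets and shrinks the error by ~sqrt(2).
print(f"{hll_relative_error(11):.3%}")  # prints 2.298%
```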

    http-port arg

    HTTP port number. Change to avoid collisions with ports already in use.

    6278

    idle-session-duration arg

    Maximum duration of an idle session, in minutes. Change to increase or decrease duration of an idle session before timeout.

    60

    inner-join-fragment-skipping [=arg(=1)] (=0)

    Enable or disable inner join fragment skipping. Enables skipping fragments for improved performance during inner join operations.

    FALSE[0]

    license arg

    Path to the file containing the license key. Change if your license file is in a different location or has a different name.

    log-auto-flush

    Flush logging buffer to file after each message. Changing to false can improve performance, but log lines might not appear in the log for a very long time. HEAVY.AI does not recommend changing this setting.

    TRUE[1]

    log-directory arg

    Path to the log directory. Can be either a relative path to the $HEAVYAI_STORAGE/data directory or an absolute path. Use this flag to control the location of your HEAVY.AI log files. If the directory does not exist, HEAVY.AI creates the top level directory. For example, a/b/c/logdir is created only if the directory path a/b/c already exists.

    /var/lib/heavyai/ data/heavyai_log

    log-file-name

    Boilerplate for the name of the HEAVY.AI log files. You can customize the name of your HEAVY.AI log files. {SEVERITY} is the only braced token recognized. It allows you to create separate files for each type of error message greater than or equal to the log-severity configuration option.

    heavydb.{SEVERITY}. %Y%m%d-%H%M%S.log

    log-max-files

    Maximum number of log files to keep. When the number of log files exceeds this number, HEAVY.AI automatically deletes the oldest files.

    100

    log-min-free-space

    Minimum number of bytes left on device before oldest log files are deleted. This is a safety feature to be sure the disk drive of the log directory does not fill up, and guarantees that at least this many bytes are free.

    20971520

    log-rotation-size

    Maximum file size in bytes before new log files are started. Change to increase/decrease size of files. If log files fill quickly, you might want to increase this number so that there are fewer log files.

    10485760

    log-rotate-daily

    Start new log files at midnight. Set to false to write to log files until they are full, rather than restarting each day.

    TRUE[1]

    log-severity

    Log to file severity levels:

    DEBUG4

    DEBUG3

    DEBUG2

    DEBUG1

    INFO

    WARNING

    ERROR

    FATAL

    All levels after your chosen base severity level are listed. For example, if you set the severity level to WARNING, HEAVY.AI only logs WARNING, ERROR, and FATAL messages.

    INFO

    log-severity-clog

    Log to console severity level: INFO WARNING ERROR FATAL. Output chosen severity messages to STDERR from running process.

    WARNING

    log-symlink

    Symbolic link to the active log. Creates a symbolic link for every severity greater than or equal to the log-severity configuration option.

    heavydb. {SEVERITY}.log

    log-user-id

    Log internal numeric user IDs instead of textual user names.

    log-user-origin

    Look up the origin of inbound connections by IP address and DNS name and print this information as part of stdlog. Some systems throttle DNS requests or have other network constraints that preclude timely return of user origin information. Set to FALSE to improve performance on those networks or when large numbers of users from different locations make rapid connect/disconnect requests to the server.

    TRUE[1]

    logs-system-tables-max-files-count [=arg]

    Maximum number of log files that can be processed by each logs system table.

    100

    max-cacheable-hashtable-size-bytes [=arg]

    Maximum size of the hashtable that the hashtable recycler can store. Limiting the size can enable more hashtables to be stored. Must be less than or equal to the value defined in hashtable-cache-total-bytes.

    2147483648 (2GB)
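The two hashtable-cache settings are coupled: max-cacheable-hashtable-size-bytes must be less than or equal to hashtable-cache-total-bytes. A small validation sketch (a hypothetical helper, not part of HeavyDB):

```python
def validate_hashtable_cache(total_bytes: int, max_cacheable_bytes: int) -> None:
    # Per the descriptions above, an individual cacheable hashtable
    # may not be larger than the total cache storage.
    if max_cacheable_bytes > total_bytes:
        raise ValueError("max-cacheable-hashtable-size-bytes must be "
                         "<= hashtable-cache-total-bytes")

# The documented defaults satisfy the constraint: 2 GB cacheable, 4 GB total.
validate_hashtable_cache(4 * 2**30, 2 * 2**30)
```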

    max-session-duration arg

    Maximum duration of the active session, in minutes. Change to increase or decrease session duration before timeout.

    43200 (30 days)

    ndv-group-estimator-multiplier

    A value that determines how the result of the Number of Distinct Values (NDV) group estimator is scaled. This value must be between 1.0 and 2.0.

    1.5

    null-div-by-zero [=arg]

    Allows processing to complete when the dataset would cause a divide-by-zero error. Set to TRUE to return null when dividing by zero, or FALSE to throw an exception.

    FALSE[0]

    num-executors arg

    Beta functionality in Release 5.7. Set the number of executors.

    num-gpus arg

    Number of GPUs to use. In a shared environment, you can assign the number of GPUs to a particular application. The default, -1, uses all available GPUs. Use in conjunction with start-gpu.

    -1

    num-reader-threads arg

    Number of reader threads to use. Drop the number of reader threads to prevent imports from using all available CPU power. Default is to use all threads.

    0

    overlaps-bucket-threshold arg

    The minimum size of a bucket corresponding to a given inner table range for the overlaps hash join.

    -p | port int

    HeavyDB server port. Change to avoid collisions with other services if 6274 is already in use.

    6274

    pending-query-interrupt-freq=arg

    Frequency with which to check the interrupt status of pending queries, in milliseconds. Values larger than 0 are valid. If you set pending-query-interrupt-freq=100, each session's interrupt status is checked every 100 ms.

    For example, assume you have three sessions (S1, S2, and S3) in your queue, where S1 contains a running query, and S2 and S3 hold pending queries. If you set pending-query-interrupt-freq=1000, both S2 and S3 are interrupted every 1000 ms (1 sec). See running-query-interrupt-freq for information about interrupting running queries. Decreasing the value increases the speed with which pending queries are removed, but also increases resource usage.

    1000 (1 sec)

    pki-db-client-auth [=arg]

    Attempt authentication of users through a PKI certificate. Set to TRUE for the server to attempt PKI authentication.

    FALSE[0]

    read-only [=arg(=1)]

    Enable read-only mode. Prevents changes to the dataset.

    FALSE[0]

    render-mem-bytes arg

    Specifies the size of a per-GPU buffer that render query results are written to; allocated at the first rendering call. Persists while the server is running unless you run \clear_gpu_memory. Increase if rendering a large number of points or symbols and you get the following out-of-memory exception: Not enough OpenGL memory to render the query results.

    Default is 500 MB.

    500000000

    render-oom-retry-threshold = arg

    A render execution time limit, in milliseconds, for retrying a render request after an out-of-gpu-memory error. Requires enable-auto-clear-render-mem = true. A retry occurs only if the first run took less than this threshold, and is attempted after the render GPU memory is automatically cleared; clearing the memory might allow the request to succeed. Setting a reasonable threshold can add stability on memory-constrained servers with rendering enabled. Only a single retry is attempted. A value of 0 disables retries.

    rendering [=arg]

    Enable or disable backend rendering. Disable rendering when not in use, freeing up memory reserved by render-mem-bytes. To reenable rendering, you must restart HEAVY.AI Server.

    TRUE[1]

    res-gpu-mem =arg

    Reserved memory for GPU. Reserves extra memory for your system (for example, if the GPU is also driving your display, such as on a laptop or single-card desktop). HEAVY.AI uses all the memory on the GPU except for render-mem-bytes + res-gpu-mem. Also useful if other processes, such as a machine-learning pipeline, share the GPU with HEAVY.AI. In advanced rendering scenarios or distributed setups, increase to free up additional memory for the renderer, or for aggregating results for the renderer from multiple leaf nodes. HEAVY.AI recommends always setting res-gpu-mem when using backend rendering.

    134217728

    running-query-interrupt-freq arg

    Controls the frequency of interruption status checking for running queries. Range: 0.0 (less frequently) to 1.0 (more frequently).

    For example, if you have 10 threads evaluating a query over a table of 1000 rows, each thread advances its thread index up to 10 times. If you set the flag close to 1.0, the session's interrupt status is checked on every increment of the thread index.

    If you set the flag close to 0.0, the interrupt status is checked only when the index increment is close to 10. The default checks at roughly half of the maximum increment of the thread index.

    Frequent interrupt status checking reduces latency for the interrupt but also can decrease query performance.

    seek-kafka-commit = <N>

    Set the offset of the last Kafka message to be committed from a Kafka data stream, so that Kafka does not resend those messages. After the Kafka server commits messages through the number N, it resends messages starting at message N+1. This is particularly useful when you want to create a replica of the HEAVY.AI server from an existing data directory.

    N/A

    ssl-cert path

    Path to the server's public PKI certificate (.crt file). Define the path to the .crt file. Used to establish an encrypted binary connection.

    ssl-keystore path

    Path to the server keystore: the Java trust store containing the server's public PKI key. Used by HeavyDB to connect to the encrypted Calcite server port over an encrypted binary connection.

    ssl-keystore-password password

    The password for the SSL keystore. Used to create a binary encrypted connection to the Calcite server.

    ssl-private-key path

    Path to the server's private PKI key. Define the path to the HEAVY.AI server PKI key. Used to establish an encrypted binary connection.

    ssl-trust-ca path

    Enable use of CA-signed certificates presented by Calcite. Defines the file that contains trusted CA certificates. This information enables the server to validate the TCP/IP Thrift connections it makes as a client to the Calcite server. The certificate presented by the Calcite server is the same as the certificate used to identify the database server to its clients.

    ssl-trust-ca-server path

    Path to the file containing trusted CA certificates; for PKI authentication. Used to validate certificates submitted by clients. If the certificate provided by the client (in the password field of the connect command) was not signed by one of the certificates in the trusted file, then the connection fails. PKI authentication works only if the server is configured to encrypt connections via TLS. The common name extracted from the client certificate is used as the name of the user to connect. If this name does not already exist, the connection fails. If LDAP or SAML are also enabled, the servers fall back to these authentication methods if PKI authentication fails. Currently works only with JDBCarrow-up-right clients. To allow connection from other clients, set allow-local-auth-fallback or add LDAP/SAML authentication.

    ssl-trust-password password

    The password for the SSL trust store containing the server's public PKI key. Used to establish an encrypted binary connection.

    ssl-trust-store path

    The path to the Java trustStore containing the server's public PKI key. Used by the Calcite server to connect to the encrypted HeavyDB server port, to establish an encrypted binary connection.

    start-gpu arg

    First GPU to use. Used in shared environments in which the first assigned GPU is not GPU 0. Use in conjunction with num-gpus.

    0

    trivial-loop-join-threshold [=arg]

    The maximum number of rows in the inner table of a loop join considered to be trivially small.

    1000

    use-cpu-mem-pool-for-output-buffers

    Use the CPU memory buffer pool (whose capacity is determined by the cpu-buffer-mem-bytes configuration parameter) for output buffer allocations. When this configuration parameter is set to false, output (e.g. result set) buffer allocations will use heap memory outside the cpu-buffer-mem-bytes based memory buffer pool.

    TRUE[1]

    use-hashtable-cache

    Set to TRUE to enable the hashtable recycler. Supports complex scenarios, such as hashtable recycling for queries that have subqueries.

    TRUE[1]

    vacuum-min-selectivity [=arg]

    Specify the percentage (with a value of 0 implying 0% and a value of 1 implying 100%) of deleted rows in a fragment at which to perform automatic vacuuming.

    Automatic vacuuming occurs when deletes or updates on variable-length columns result in a percentage of deleted rows in a fragment exceeding the specified threshold. The default threshold is 10% of deleted rows in a fragment.

    When changing this value, consider the most common types of queries run on the system. In general, if you have infrequent updates and deletes, set vacuum-min-selectivity to a low value. Set it higher if you have frequent updates and deletes, because vacuuming adds overhead to affected UPDATE and DELETE queries.
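The threshold semantics can be illustrated with a minimal sketch (not the server implementation; the function name and signature are hypothetical):

```python
def needs_vacuum(deleted_rows: int, fragment_rows: int,
                 vacuum_min_selectivity: float = 0.1) -> bool:
    """Illustrative sketch: a fragment qualifies for automatic
    vacuuming once its fraction of deleted rows exceeds the
    vacuum-min-selectivity threshold (default 0.1, i.e. 10%)."""
    if fragment_rows == 0:
        return False
    return deleted_rows / fragment_rows > vacuum_min_selectivity
```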

    watchdog-none-encoded-string-translation-limit [=arg]

The maximum number of strings that can be cast using the ENCODED_TEXT string operator.

    1,000,000

    window-function-frame-aggregation-tree-fanout [=arg]

Fan-out of the aggregation tree used to compute aggregations over a window frame.

    8

    compressor arg (=lz4hc)

Compression algorithm used by the server to compress data transferred between servers. See Data Compression for compression algorithm options.

    lz4hc

    ldap-dn arg

    LDAP Distinguished Name.

    ldap-role-query-regex arg

    RegEx to use to extract role from role query result.

    ldap-role-query-url arg

    LDAP query role URL.

    ldap-superuser-role arg

    The role name to identify a superuser.

    ldap-uri arg

    LDAP server URI.

    leaf-conn-timeout [=arg]

    Leaf connect timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if a connection cannot be established.

    20000

    leaf-recv-timeout [=arg]

    Leaf receive timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not received in the time allotted.

    300000

    leaf-send-timeout [=arg]

    Leaf send timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not sent in the time allotted.

    300000

    saml-metadata-file arg

    Path to identity provider metadata file.

    Required for running SAML. An identity provider (like Okta) supplies a metadata file. From this file, HEAVY.AI uses:

    1. Public key of the identity provider to verify that the SAML response comes from it and not from somewhere else.

    2. URL of the SSO login page used to obtain a SAML token.

    saml-sp-target-url arg

    URL of the service provider for which SAML assertions should be generated. Required for running SAML. Used to verify that a SAML token was issued for HEAVY.AI and not for some other service.

    saml-sync-roles arg (=0)

    Enable mapping of SAML groups to HEAVY.AI roles. The SAML Identity provider (for example, Okta) automatically creates users at login and assigns them roles they already have as groups in SAML.

    saml-sync-roles [=0]

    string-servers arg

    Path to string servers list JSON file. Indicates that HeavyDB is running in distributed mode and is required to designate a leaf server when running in distributed mode.

    gpu-code-cache-max-size-in-bytes [=arg]

    Geospatial Capabilities

    HEAVY.AI supports a subset of object types and functions for storing and writing queries for geospatial definitions.

    Geospatial Datatypes

    Type
    Size
    Example

For information about geospatial datatype sizes, see Storage and Compression in Datatypes.

For more information on WKT primitives, see Wikipedia: Well-known Text: Geometric objects.

HEAVY.AI supports SRID 4326 (WGS 84), 900913 (Google Web Mercator), and 32601-32660 and 32701-32760 (Universal Transverse Mercator (UTM) zones). When using geospatial fields, you set the SRID to determine which reference system to use. HEAVY.AI does not assign a default SRID.

    If you do not set the SRID of the geo field in the table, you can set it in a SQL query using ST_SETSRID(column_name, SRID). For example, ST_SETSRID(a.pt,4326).


    When representing longitude and latitude, the first coordinate is assumed to be longitude in HEAVY.AI geospatial primitives.

    You create geospatial objects as geometries (planar spatial data types), which are supported by the planar geometry engine at run time. When you call ST_DISTANCE on two geometry objects, the engine returns the shortest straight-line planar distance, in degrees, between those points. For example, the following query returns the shortest distance between the point(s) in p1 and the polygon(s) in poly1:

For information about importing data, see Importing Geospatial Data.
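Planar ST_DISTANCE between two points is plain Euclidean distance in the coordinate units (degrees for SRID 4326 geometries). A sketch of that calculation outside the database (illustrative only, not the engine's code):

```python
import math

def planar_distance(p1, p2):
    """Euclidean distance between two (x, y) points, in the same
    units as the input coordinates (e.g., degrees for SRID 4326)."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])
```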

    Geospatial Literals

    Geospatial functions that expect geospatial object arguments accept geospatial columns, geospatial objects returned by other functions, or string literals containing WKT representations of geospatial objects. Supplying a WKT string is equivalent to calling a geometry constructor. For example, these two queries are identical:

    You can create geospatial literals with a specific SRID. For example:

    Support for Geography

    HEAVY.AI provides support for geography objects and geodesic distance calculations, with some limitations.

    Exporting Coordinates from Immerse

    HeavyDB supports import from any coordinate system supported by the Geospatial Data Abstraction Library (GDAL). On import, HeavyDB will convert to and store in WGS84 encoding, and rendering is accurate in Immerse.

    However, no built-in way to reference the original coordinates currently exists in Immerse, and coordinates exported from Immerse will be WGS84 coordinates. You can work around this limitation by adding to the dataset a column or columns in non-geo format that could be included for display in Immerse (for example, in a popup) or on export.

    Distance Calculation

    Currently, HEAVY.AI supports spheroidal distance calculation between:

    • Two points using either SRID 4326 or 900913.

    • A point and a polygon/multipolygon using SRID 900913.


    Using SRID 900913 results in variance compared to SRID 4326 as polygons approach the North and South Poles.

    The following query returns the points and polygons within 1,000 meters of each other:

See the tables in Geospatial Functions below for examples.
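For intuition, geodesic point-to-point distance of the kind described above can be approximated with the haversine formula. This is a spherical-Earth sketch, not HeavyDB's actual geodesic implementation, which may use a more precise spheroidal model:

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two WGS84 points,
    using a spherical-Earth approximation (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))
```

For example, Los Angeles (-118.4079, 33.9434) and Paris (2.5559, 49.0083) come out a little over 9,000 km apart, consistent with the ST_DWITHIN 10,000 km check shown later in this topic.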

    Geospatial Functions

    HEAVY.AI supports the functions listed.

    Geometry Constructors

    Function
    Description

    Geometry to String Conversion

    Function
    Description

    Geometry Processing

    Function
    Description


    Geometry Editors

    Function
    Description

    Geometry Accessors

    Function
    Description

    Overlay Functions

    Function
    Description

    Spatial Relationships and Measurements

    Function
    Description

    Additional Geo Notes

    • You can use SQL code similar to the examples in this topic as global filters in Immerse.

    • CREATE TABLE AS SELECT is not currently supported for geo data types in distributed mode.

• GROUP BY is not supported for geo types (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON).

    MULTIPOINT

    Variable

    A set of one or more points. For example: MULTIPOINT((0 0), (1 1), (2 2))

    MULTILINESTRING

    Variable

    A set of one or more associated lines, each of two or more points. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1))

    ST_GeogFromText(WKT, SRID)

    Return a specified geography value from Well-known Text representation and an SRID.

    ST_Point(double lon, double lat)

    Return a point constructed on the fly from the provided coordinate values. Constant coordinates result in construction of a POINT literal.

    Example: ST_Contains(poly4326, ST_SetSRID(ST_Point(lon, lat), 4326))


ST_YMIN

    Returns Y minima of a geometry.

    ST_YMAX

    Returns Y maxima of a geometry.

    ST_STARTPOINT

    Returns the first point of a LINESTRING as a POINT.

    ST_ENDPOINT

    Returns the last point of a LINESTRING as a POINT.

    ST_POINTN

    Return the Nth point of a LINESTRING as a POINT.

    ST_NPOINTS

    Returns the number of points in a geometry.

    ST_NRINGS

    Returns the number of rings in a POLYGON or a MULTIPOLYGON.

    ST_SRID

    Returns the spatial reference identifier for the underlying object.

    ST_NUMGEOMETRIES

Returns the number of component geometries in a MULTIPOINT, MULTILINESTRING, or MULTIPOLYGON. Returns 1 for non-MULTI geometries.

    ST_INTERSECTS

    Returns true if two geometries intersect spatially, false if they do not share space. For example:

    SELECT ST_INTERSECTS( 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))', 'POINT(1 1)' ) FROM tbl;

    ST_AREA

    Returns the area of planar areas covered by POLYGON and MULTIPOLYGON geometries. For example:

    SELECT ST_AREA( 'POLYGON((1 0, 0 1, -1 0, 0 -1, 1 0),(0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0))' ) FROM tbl;

    ST_AREA does not support calculation of geographic areas, but rather uses planar coordinates. Geographies must first be projected in order to use ST_AREA. You can do this ahead of time before import or at runtime, ideally using an equal area projection (for example, a national equal-area Lambert projection). The area is calculated in the projection's units. For example, you might use Web Mercator runtime projection to get the area of a polygon in square meters:

ST_AREA(
  ST_TRANSFORM(
   ST_GeomFromText(
    'POLYGON((-76.6168198439371 39.9703199555959,
              -80.5189990254673 40.6493554919257,
              -82.5189990254673 42.6493554919257,
              -76.6168198439371 39.9703199555959))', 4326
   ),
   900913
  )
 )

    ST_PERIMETER

Returns the Cartesian perimeter of POLYGON and MULTIPOLYGON geometries. For example: SELECT ST_PERIMETER('POLYGON( (1 0, 0 1, -1 0, 0 -1, 1 0), (0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0) )') FROM tbl; It also returns the geodesic perimeter of POLYGON and MULTIPOLYGON geographies. For example:

SELECT ST_PERIMETER(
  ST_GeogFromText(
   'POLYGON((-76.6168198439371 39.9703199555959,
             -80.5189990254673 40.6493554919257,
             -82.5189990254673 42.6493554919257,
             -76.6168198439371 39.9703199555959))',
   4326)
 ) FROM tbl;

    ST_LENGTH

Returns the Cartesian length of LINESTRING geometries. For example: SELECT ST_LENGTH('LINESTRING(1 0, 0 1, -1 0, 0 -1, 1 0)') FROM tbl; It also returns the geodesic length of LINESTRING geographies. For example:

    SELECT ST_LENGTH( ST_GeogFromText('LINESTRING( -76.6168198439371 39.9703199555959, -80.5189990254673 40.6493554919257, -82.5189990254673 42.6493554919257)', 4326) ) FROM tbl;

    ST_WITHIN

    Returns true if geometry A is completely within geometry B. For example the following SELECT statement returns true:

    SELECT ST_WITHIN( 'POLYGON ((1 1, 1 2, 2 2, 2 1))', 'POLYGON ((0 0, 0 3, 3 3, 3 0))' ) FROM tbl;

    ST_DWITHIN

Returns true if the geometries are within the specified distance of one another. Distance is specified in units defined by the spatial reference system of the geometries. For example: SELECT ST_DWITHIN( 'POINT(1 1)', 'LINESTRING (1 2,10 10,3 3)', 2.0 ) FROM tbl; ST_DWITHIN supports geodesic distances between geographies, currently limited to geographic points. For example, you can check whether Los Angeles and Paris, specified as WGS84 geographic point literals, are within 10,000 km of one another.

SELECT ST_DWITHIN(
  ST_GeogFromText('POINT(-118.4079 33.9434)', 4326),
  ST_GeogFromText('POINT(2.5559 49.0083)', 4326),
  10000000.0) FROM tbl;

    ST_DFULLYWITHIN

    Returns true if the geometries are fully within the specified distance of one another. Distance is specified in units defined by the spatial reference system of the geometries. For example: SELECT ST_DFULLYWITHIN( 'POINT(1 1)', 'LINESTRING (1 2,10 10,3 3)', 10.0) FROM tbl; This function supports:

    ST_DFULLYWITHIN(POINT, LINESTRING, distance) ST_DFULLYWITHIN(LINESTRING, POINT, distance)

    ST_DISJOINT

Returns true if the geometries are spatially disjoint (that is, the geometries do not overlap or touch). For example:

    SELECT ST_DISJOINT( 'POINT(1 1)', 'LINESTRING (0 0,3 3)' ) FROM tbl;

  • You can use \d table_name to determine if the SRID is set for the geo field:

    If no SRID is returned, you can set the SRID using ST_SETSRID(column_name, SRID). For example, ST_SETSRID(myPoint, 4326).

LINESTRING

    Variable

    A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2)

    MULTIPOLYGON

    Variable

    A set of one or more polygons. For example:MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))

    POINT

    Variable

    A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0)

    POLYGON

    Variable

A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))

    ST_GeomFromText(WKT)

    Return a specified geometry value from Well-known Text representation.

    ST_GeomFromText(WKT, SRID)

    Return a specified geometry value from Well-known Text representation and an SRID.

ST_GeogFromText(WKT)

Return a specified geography value from Well-known Text representation.

    ST_AsText(geom) | ST_AsWKT(geom)

    Converts a geometry input to a Well-Known-Text (WKT) string

    ST_AsBinary(geom) | ST_AsWKB(geom)

    Converts a geometry input to a Well-Known-Binary (WKB) string

    ST_Buffer

    Returns a geometry covering all points within a specified distance from the input geometry. Performed by the GEOS module. The output is currently limited to the MULTIPOLYGON type.

    Calculations are in the units of the input geometry’s SRID. Buffer distance is expressed in the same units. Example:

    SELECT ST_Buffer('LINESTRING(0 0, 10 0, 10 10)', 1.0);

    Special processing is automatically applied to WGS84 input geometries (SRID=4326) to limit buffer distortion:

    • Implementation first determines the best planar SRID to which to project the 4326 input geometry.

    • Preferred SRIDs are UTM and Lambert (LAEA) North/South zones, with Mercator used as a fallback.

    • Buffer distance is interpreted as distance in meters (units of all planar SRIDs being considered).

    • The input geometry is transformed to the best planar SRID and handed to GEOS, along with buffer distance.

    • The buffer geometry built by GEOS is then transformed back to SRID=4326 and returned.

    Example: Build 10-meter buffer geometries (SRID=4326) with limited distortion:

    SELECT ST_Buffer(poly4326, 10.0) FROM tbl;

    ST_Centroid

    Computes the geometric center of a geometry as a POINT.

    ST_TRANSFORM

Returns a geometry with its coordinates transformed to a different spatial reference. Currently, WGS84 to Web Mercator transform is supported. For example: ST_DISTANCE( ST_TRANSFORM(ST_GeomFromText('POINT(-71.064544 42.28787)', 4326), 900913), ST_GeomFromText('POINT(-13189665.9329505 3960189.38265416)', 900913) )

    ST_TRANSFORM is not currently supported in projections. It can be used only to transform geo inputs to other functions, such as ST_DISTANCE.

    ST_SETSRID

Set the SRID to a specific integer value. For example:

    ST_TRANSFORM(

    ST_SETSRID(ST_GeomFromText('POINT(-71.064544 42.28787)'), 4326), 900913 )

    ST_X

    Returns the X value from a POINT column.

    ST_Y

    Returns the Y value from a POINT column.

    ST_XMIN

    Returns X minima of a geometry.

ST_XMAX

Returns X maxima of a geometry.

    ST_INTERSECTION

    Returns a geometry representing an intersection of two geometries; that is, the section that is shared between the two input geometries. Performed by the GEOS module.

    The output is currently limited to MULTIPOLYGON type, because HEAVY.AI does not support mixed geometry types within a geometry column, and ST_INTERSECTION can potentially return points, lines, and polygons from a single intersection operation. Lower-dimension intersecting features such as points and line strings are returned as very small buffers around those features. If needed, true points can be recovered by applying the ST_CENTROID method to point intersection results. In addition, ST_PERIMETER/2 of resulting line intersection polygons can be used to approximate line length. Empty/NULL geometry outputs are not currently supported.

    Examples: SELECT ST_Intersection('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))'); SELECT ST_Area(ST_Intersection(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;

    ST_DIFFERENCE

    Returns a geometry representing the portion of the first input geometry that does not intersect with the second input geometry. Performed by the GEOS module. Input order is important; the return geometry is always a section of the first input geometry.

    The output is currently limited to MULTIPOLYGON type, for the same reasons described in ST_INTERSECTION. Similar post-processing methods can be applied if needed. Empty/NULL geometry outputs are not currently supported.

    Examples: SELECT ST_Difference('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))'); SELECT ST_Area(ST_Difference(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;

ST_UNION

Returns a geometry representing the union (or combination) of the two input geometries. Performed by the GEOS module.

The output is currently limited to MULTIPOLYGON type for the same reasons described in ST_INTERSECTION. Similar post-processing methods can be applied if needed. Empty/NULL geometry outputs are not currently supported.

Examples: SELECT ST_UNION('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))'); SELECT ST_AREA(ST_UNION(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;

    ST_DISTANCE

    Returns shortest planar distance between geometries. For example: ST_DISTANCE(poly1, ST_GeomFromText('POINT(0 0)')) Returns shortest geodesic distance between two points, in meters, if given two point geographies. Point geographies can be specified through casts from point geometries or as literals. For example: ST_DISTANCE( CastToGeography(p2), ST_GeogFromText('POINT(2.5559 49.0083)', 4326) )

    SELECT a.name, ST_DISTANCE( CAST(a.pt AS GEOGRAPHY), CAST(b.pt AS GEOGRAPHY) ) AS dist_meters FROM starting_point a, destination_points b;

    You can also calculate the distance between a POLYGON and a POINT. If both fields use SRID 4326, then the calculated distance is in 4326 units (degrees). If both fields use SRID 4326, and both are transformed into 900913, then the results are in 900913 units (meters).

    The following SQL code returns the names of polygons where the distance between the point and polygon is less than 1,000 meters.

    SELECT a.poly_name FROM poly a, point b WHERE ST_DISTANCE( ST_TRANSFORM(b.location,900913), ST_TRANSFORM(a.heavyai_geo,900913) ) < 1000;

    ST_EQUALS

    Returns TRUE if the first input geometry and the second input geometry are spatially equal; that is, they occupy the same space. Different orderings of points can be accepted as equal if they represent the same geometry structure.

    POINTs comparison is performed natively. All other geometry comparisons are performed by GEOS.

If the input geometries are both uncompressed or both compressed, all equality comparisons are precise. For mixed combinations, comparisons are performed with a compression-specific tolerance that allows recognition of equality despite subtle precision losses that compression may introduce. Note: Geo columns and literals with SRID=4326 are compressed by default.

    Examples: SELECT COUNT(*) FROM tbl WHERE ST_EQUALS('POINT(2 2)', pt); SELECT ST_EQUALS('POLYGON ((0 0,1 0,0 1))', 'POLYGON ((0 0,0 0.5,0 1,1 0,0 0))');

    ST_MAXDISTANCE

Returns longest planar distance between geometries. In effect, this is the diameter of a circle that encloses both geometries. For example:

    Currently supported variants:

    ST_CONTAINS






    Returns true if the first geometry object contains the second object. For example:

    You can also use ST_CONTAINS to:

    • Return the count of polys that contain the point (here as WKT): SELECT count(*) FROM geo1 WHERE ST_CONTAINS(poly1, 'POINT(0 0)');

    • Return names from a polys table that contain points in a points table: SELECT a.name FROM polys a, points b WHERE ST_CONTAINS(a.heavyai_geo, b.location);

    heavysql> \d starting_point
    CREATE TABLE starting_point (
                                   name TEXT ENCODING DICT(32),
                                   myPoint GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32)
                                 )
    CREATE TABLE simple_geo (
                              name TEXT ENCODING DICT(32), 
                              location GEOMETRY(POINT,4326)
                             );
    SELECT ST_DISTANCE(p1, poly1) FROM geo1;
SELECT COUNT(*) FROM geo1 WHERE ST_DISTANCE(p1, 'POINT(1 2)') < 1.0;
    SELECT COUNT(*) FROM geo1 WHERE ST_DISTANCE(p1, ST_GeomFromText('POINT(1 2)')) < 1.0;
    SELECT ST_CONTAINS(
                         mpoly2, 
                         ST_GeomFromText('POINT(-71.064544 42.28787)', 4326)
                       )
                       FROM geo2;
SELECT a.poly_name, b.pt_name FROM poly a, pt b 
WHERE ST_Distance(
   ST_Transform(a.heavyai_geo, 900913),
   ST_Transform(b.location, 900913)) < 1000;
  • Return names from a polys table that contain points in a points table, using a single point in WKT instead of a field in another table: SELECT name FROM poly WHERE ST_CONTAINS( heavyai_geo, ST_GeomFromText('POINT(-98.4886935 29.4260508)', 4326) );


    Web Mercator is not an equal area projection, however. Unless compensated by a scaling factor, Web Mercator areas can vary considerably by latitude.


    Functions and Operators

    Functions and Operators (DML)

    Basic Mathematical Operators

    Operator
    Description

    +numeric

    Mathematical Operator Precedence

    1. Parenthesization

    2. Multiplication and division

    3. Addition and subtraction

    Comparison Operators

    Operator
    Description

    Mathematical Functions

    Function
    Description

    Trigonometric Functions

    Function
    Description

    Geometric Functions

    Function
    Description

    String Functions

    Function
    Description

    Pattern-Matching Functions

    Name
    Example
    Description

    Usage Notes

    The following wildcard characters are supported by LIKE and ILIKE:

    • % matches any number of characters, including zero characters.

    • _ matches exactly one character.
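The wildcard semantics above can be mimicked outside the database by translating a LIKE pattern into an anchored regular expression. This is an illustrative sketch, not HeavyDB's matcher (it ignores escape sequences):

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern to an anchored regex:
    % -> .* (any run of characters), _ -> . (exactly one)."""
    out = []
    for ch in pattern:
        if ch == '%':
            out.append('.*')
        elif ch == '_':
            out.append('.')
        else:
            out.append(re.escape(ch))
    return '^' + ''.join(out) + '$'

def sql_like(s: str, pattern: str, case_insensitive: bool = False) -> bool:
    """LIKE when case_insensitive is False, ILIKE when True."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.match(like_to_regex(pattern), s, flags) is not None
```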

    Date/Time Functions

    Function
    Description

    Supported Types

    Supported date_part types:

    Supported interval types:

    Accepted Date, Time, and Timestamp Formats

    Datatype
    Formats
    Examples

    Usage Notes

• For two-digit years, years 69-99 are assumed to be in the previous century (for example, 1969), and 0-68 are assumed to be in the current century (for example, 2016).

    • For four-digit years, negative years (BC) are not supported.

    • Hours are expressed in 24-hour format.
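The two-digit-year rule above can be sketched as:

```python
def expand_two_digit_year(yy: int) -> int:
    """Map a two-digit year per the parsing rule: 69-99 fall in the
    1900s, 0-68 fall in the 2000s."""
    return 1900 + yy if yy >= 69 else 2000 + yy
```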

    Statistical and Aggregate Functions

    Both double-precision (standard) and single-precision floating point statistical functions are provided. Single-precision functions run faster on GPUs but might cause overflow errors.

    Usage Notes

    • COUNT(DISTINCT x), especially when used in conjunction with GROUP BY, can require a very large amount of memory to keep track of all distinct values in large tables with large cardinalities. To avoid this large overhead, use APPROX_COUNT_DISTINCT.

    • APPROX_COUNT_DISTINCT(x,

    Miscellaneous Functions

    Function
    Description

    User-Defined Functions

    You can create your own C++ functions and use them in your SQL queries.

    • User-defined Functions (UDFs) require clang++ version 9. You can verify the version installed using the command clang++ --version.

• UDFs currently allow any authenticated user to register and execute a runtime function. By default, runtime UDFs are globally disabled but can be enabled with the runtime flag enable-runtime-udf.

    1. Create your function and save it in a .cpp file; for example, /var/lib/omnisci/udf_myFunction.cpp.

    2. Add the UDF configuration flag to omnisci.conf. For example:

    3. Use your function in a SQL query. For example:

    Sample User-Defined Function

    This function, udf_diff.cpp, returns the difference of two values from a table.

    Code Commentary

    Include the standard integer library, which supports the following datatypes:

    • bool

    • int8_t (cstdint), char

    • int16_t (cstdint), short

    The next four lines are boilerplate code that allows OmniSci to determine whether the server is running with GPUs. OmniSci chooses whether it should compile the function inline to achieve the best possible performance.

    The next line is the actual user-defined function, which returns the difference between INTEGER values x and y.

    To run the udf_diff function, add this line to your /var/lib/omnisci/omnisci.conf file (in this example, the .cpp file is stored at /var/lib/omnisci/udf_diff.cpp):

    Restart the OmniSci server.

    Use your command from an OmniSci SQL client to query, for example, a table named myTable that contains the INTEGER columns myInt1 and myInt2.

    OmniSci returns the difference as an INTEGER value.

    <

    Less than

    <=

    Less than or equal to

    BETWEEN x AND y

    Is a value within a range

    NOT BETWEEN x AND y

    Is a value not within a range

    IS NULL

    Is a value that is null

    IS NOT NULL

    Is a value that is not null

    NULLIF(x, y)

    Compare expressions x and y. If different, return x. If they are the same, return null. For example, if a dataset uses ‘NA’ for null values, you can use this statement to return null using SELECT NULLIF(field_name,'NA').
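NULLIF's behavior can be sketched in Python, with None standing in for SQL NULL:

```python
def nullif(x, y):
    """SQL NULLIF: None (SQL NULL) when the two values are equal,
    otherwise the first value unchanged."""
    return None if x == y else x
```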

    IS TRUE

    True if a value resolves to TRUE.

    IS NOT TRUE

    True if a value resolves to FALSE.

    FLOOR(x)

    Returns the largest integer not greater than the argument

    LN(x)

    Returns the natural logarithm of x

    LOG(x)

    Returns the natural logarithm of x

    LOG10(x)

    Returns the base-10 logarithm of the specified float expression x

    MOD(x,y)

    Returns the remainder of int x divided by int y

    PI()

    Returns the value of pi

    POWER(x,y)

    Returns the value of x raised to the power of y

    RADIANS(x)

    Converts degrees to radians

    ROUND(x)

    Rounds x to the nearest integer value, but does not change the data type. For example, the double value 4.1 rounds to the double value 4.

ROUND_TO_DIGIT(x,y)

    Rounds x to y decimal places

    SIGN(x)

    Returns the sign of x as -1, 0, 1 if x is negative, zero, or positive

    SQRT(x)

    Returns the square root of x.

    TRUNCATE(x,y)

    Truncates x to y decimal places

    WIDTH_BUCKET(target,lower-boundary,upper-boundary,bucket-count)

Defines equal-width intervals (buckets) in a range between the lower boundary and the upper boundary, and returns the bucket number to which the target expression is assigned.

    • target - A constant, column variable, or general expression for which a bucket number is returned.

    • lower-boundary - Lower boundary for the range of values to be partitioned equally.

    COS(x)

    Returns the cosine of x

    COT(x)

    Returns the cotangent of x

    SIN(x)

    Returns the sine of x

    TAN(x)

    Returns the tangent of x

    ENCODE_TEXT(none_encoded_str)

Converts a none-encoded string to a transient dictionary-encoded string to allow for operations like group-by on top. When the watchdog is enabled, the number of strings that can be cast using this operator is capped by the value set with the watchdog-none-encoded-string-translation-limit flag (1,000,000 by default).

    HASH(str)

Deterministically hashes a string input to a BIGINT output using a pseudo-random function. Can be useful for bucketing string values or deterministically coloring by string values for a high-cardinality TEXT column. Note that currently HASH only accepts TEXT inputs, but in the future may also accept other data types. Note also that NULL values always hash to NULL outputs.

    INITCAP(str)

    Returns the string with initial caps after any of the defined delimiter characters, with the remainder of the characters lowercased. Valid delimiter characters are !, ?, @, ", ^, #, $, &, ~,

    JAROWINKLER_SIMILARITY( str1, str2 )

    Computes the Jaro-Winkler similarity score between two input strings. The output will be an integer between 0 and 100, with 0 representing completely dissimilar strings, and 100 representing exactly matching strings.

    JSON_VALUE(json_str, path)

Returns the string of a field given by path in str. Paths start with the $ character, with sub-fields split by . and array members indexed by [], with array indices starting at 0. For example, JSON_VALUE('{"name": "Brenda", "scores": [89, 98, 94]}', '$.scores[1]') would yield a TEXT return field of '98'. Note that currently LAX parsing mode (any unmatched path returns null rather than errors) is the default, and STRICT parsing mode is not supported.
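The LAX path semantics can be sketched with Python's standard json module. This is illustrative only, not the server's parser, and it handles just simple '$.field[idx]' paths:

```python
import json
import re

def json_value(json_str: str, path: str):
    """LAX-style lookup for '$.field[idx]' paths: returns the matched
    value as text, or None on any unmatched step (instead of raising)."""
    try:
        node = json.loads(json_str)
        # Each step is either ".field" (group 1) or "[index]" (group 2).
        for field, idx in re.findall(r'\.([^.\[\]]+)|\[(\d+)\]', path):
            node = node[int(idx)] if idx else node[field]
        return str(node)
    except (ValueError, KeyError, IndexError, TypeError):
        return None
```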

    KEY_FOR_STRING(str)

    Returns the dictionary key of a dictionary-encoded string column.

    LCASE(str)

    Returns the string in all lower case. Only ASCII character set is currently supported. Same as LOWER.

    LEFT(str, num)

    Returns the left-most number (num) of characters in the string (str).

    LENGTH(str)

    Returns the length of a string in bytes. Only works with unencoded fields (ENCODING set to none).

    LEVENSHTEIN_DISTANCE( str1, str2 )

    Computes the edit distance, or number of single-character insertions, deletions, or substitutions, that must be made to make the first string equal the second. It returns an integer greater than or equal to 0, with 0 meaning the strings are equal. The higher the return value, the more the two strings can be thought of as dissimilar.
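The edit-distance definition above is the classic Levenshtein dynamic program, sketched here for reference (not HeavyDB's implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```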

    LOWER(str)

    Returns the string in all lower case. Only ASCII character set is currently supported. Same as LCASE.

LPAD(str, len, [lpad_str])

Left-pads the string with the string defined in lpad_str to a total length of len. If the optional lpad_str is not specified, the space character is used to pad. If the length of str is greater than len, then characters from the end of str are truncated to the length of len. Characters are added from lpad_str successively until the target length len is met. If lpad_str concatenated with str is not long enough to equal the target len, lpad_str is repeated, partially if necessary, until the target length is met.

LTRIM(str, chars)

    Removes any leading characters specified in chars from the string. Alias for TRIM.

OVERLAY(str PLACING replacement_str FROM start [FOR len])

Replaces in str the number of characters defined in len with characters defined in replacement_str at the location start. Regardless of the length of replacement_str, len characters are removed from str unless start + replacement_str is greater than the length of str, in which case all characters from start to the end of str are replaced. If start is negative, it specifies the number of characters from the end of str.

POSITION(search_str IN str [FROM start_position])

    Returns the position of the first character in search_str if found in str, optionally starting the search at start_position. If search_str is not found, 0 is returned. If search_str or str are null, null is returned.
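For example, the following returns 2, the 1-based position at which 'or' first occurs:

SELECT POSITION('or' IN 'corporate floor');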

REGEXP_COUNT(str, pattern [, position, [flags]])

Returns the number of times that the provided pattern occurs in the search string str. position specifies the starting position in str at which the search for pattern starts (all matches before position are ignored). If position is negative, the search starts that many characters from the end of the string str. Use the following optional flags to control the matching behavior: c - case-sensitive matching; i - case-insensitive matching.

REGEXP_REPLACE(str, pattern [, new_str, position, occurrence, [flags]])

Replace one or all matches of a substring in string str that matches pattern, which is a regular expression in POSIX regex syntax.

new_str (optional) is the string that replaces the string matching the pattern. If new_str is empty or not supplied, all found matches are removed.

The occurrence integer argument (optional) specifies the single match occurrence of the pattern to replace, starting from the beginning of str; 0 (replace all) is the default. Use a negative occurrence argument to signify the nth-to-last occurrence to be replaced.

REGEXP_SUBSTR(str, pattern [, position, occurrence, flags, group_num])

Search string str for pattern, which is a regular expression in POSIX syntax, and return the matching substring.

    Use position to set the character position to begin searching. Use occurrence to specify the occurrence of the pattern to match.

Use a positive position argument to indicate the number of characters from the beginning of str. Use a negative position argument to indicate the number of characters from the end of str.

REPEAT(str, num)

    Repeats the string the number of times defined in num.

REPLACE(str, from_str, new_str)

    Replaces all occurrences of substring from_str within a string, with a new substring new_str.

    REVERSE(str)

    Reverses the string.

    RIGHT(str, num)

    Returns the right-most number (num) of characters in the string (str).

RPAD(str, len, rpad_str)

Right-pads the string with the string defined in rpad_str to a total length of len. If the optional rpad_str is not specified, the space character is used to pad. If the length of str is greater than len, then characters from the beginning of str are truncated to the length of len. Characters are added from rpad_str successively until the target length len is met. If rpad_str concatenated with str is not long enough to equal the target len, rpad_str is repeated, partially if necessary, until the target length is met.

    RTRIM(str)

    Removes any trailing spaces from the string.

SPLIT_PART(str, delim, field_num)

    Split the string based on a delimiter delim and return the field identified by field_num. Fields are numbered from left to right.
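For example, the following returns '10', the second field delimited by '-':

SELECT SPLIT_PART('2013-10-31', '-', 2);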

    STRTOK_TO_ARRAY(str, [delim])

    Tokenizes the string str using optional delimiter(s) delim and returns an array of tokens. An empty array is returned if no tokens are produced in tokenization. NULL is returned if either parameter is a NULL.

SUBSTR(str, start, [len])

    Alias for SUBSTRING.

    SUBSTRING(str FROM start [ FOR len])

    Returns a substring of str starting at index start for len characters.

    The start position is 1-based (that is, the first character of str is at index 1, not 0). However, start 0 aliases to start 1.

    If start is negative, it is considered to be

TRIM([BOTH | LEADING | TRAILING] [trim_str FROM] str)

    Removes characters defined in trim_str from the beginning, end, or both of str. If trim_str is not specified, the space character is the default. If the trim location is not specified, defined characters are trimmed from both the beginning and end of str.

    TRY_CAST( str AS type)

    Attempts to cast/convert a string type to any valid numeric, timestamp, date, or time type. If the conversion cannot be performed, null is returned. Note that TRY_CAST is not valid for non-string input types.
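For example, the first expression below succeeds and the second returns null:

SELECT TRY_CAST('123' AS INTEGER), TRY_CAST('abc' AS INTEGER);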

    UCASE(str)

    Returns the string in uppercase format. Only ASCII character set is currently supported. Same as UPPER.

    UPPER(str)

    Returns the string in uppercase format. Only ASCII character set is currently supported. Same as UCASE.

    URL_DECODE( str )

    Decode a url-encoded string. This is the inverse of the URL_ENCODE function.

    URL_ENCODE( str )

    Url-encode a string. Alphanumeric and the 4 characters: _-.~ are untranslated. The space character is translated to +. All other characters are translated into a 3-character sequence %XX where XX is the 2-digit hexadecimal ASCII value of the character.
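For example, in the following the space is translated to + and the ampersand to %26:

SELECT URL_ENCODE('a b&c');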

    'AB' ILIKE 'ab'

Returns true if the string matches the pattern (case-insensitive). Supported only when the right side is a string literal; for example, colors.name ILIKE 'b%'.

    str REGEXP POSIX pattern

    '^[a-z]+r$'

    Lowercase string ending with r

    REGEXP_LIKE ( str , POSIX pattern )

    '^[hc]at'

    cat or hat

    DATEDIFF('date_part', date, date)

    Returns the difference between two dates, calculated to the lowest level of the date_part you specify. For example, if you set the date_part as DAY, only the year, month, and day are used to calculate the result. Other fields, such as hour and minute, are ignored.

    Example:

    SELECT DATEDIFF('YEAR', plane_issue_date, now()) Years_In_Service FROM flights_2008_10k LIMIT 10;

    DATEPART('interval', date | timestamp)

    Returns a specified part of a given date or timestamp as an integer value. Note that 'interval' must be enclosed in single quotes.

    Example:

    SELECT DATEPART('YEAR', plane_issue_date) Year_Issued FROM flights_2008_10k LIMIT 10;

    DATE_TRUNC(date_part, timestamp)

    Truncates the timestamp to the specified date_part. DATE_TRUNC(week,...) starts on Monday (ISO), which is different than EXTRACT(dow,...), which starts on Sunday.

    Example:

    SELECT DATE_TRUNC(MINUTE, arr_timestamp) Arrival FROM flights_2008_10k LIMIT 10;

    EXTRACT(date_part FROM timestamp)

    Returns the specified date_part from timestamp.

    Example:

    SELECT EXTRACT(HOUR FROM arr_timestamp) Arrival_Hour FROM flights_2008_10k LIMIT 10;

    INTERVAL 'count' date_part

    Adds or Subtracts count date_part units from a timestamp. Note that 'count' is enclosed in single quotes.

    Example:

    SELECT arr_timestamp + INTERVAL '10' YEAR FROM flights_2008_10k LIMIT 10;

    NOW()

Returns the current timestamp in the GMT time zone. Same as CURRENT_TIMESTAMP().

    Example:

    NOW();

    TIMESTAMPADD(date_part, count, timestamp | date)

Adds an interval of count date_part units to timestamp or date and returns the resulting timestamp or date.

    Example:

    SELECT TIMESTAMPADD(DAY, 14, arr_timestamp) Fortnight FROM flights_2008_10k LIMIT 10;

    TIMESTAMPDIFF(date_part, timestamp1, timestamp2)

    Subtracts timestamp1 from timestamp2 and returns the result in signed date_part units.

    Example:

    SELECT TIMESTAMPDIFF(MINUTE, arr_timestamp, dep_timestamp) Flight_Time FROM flights_2008_10k LIMIT 10;

    DD-MON-YY

    31-Oct-13

    DATE

    DD/Mon/YYYY

    31/Oct/2013

    EPOCH

    1383262225

    TIME

    HH:MM

    23:49

    TIME

    HHMMSS

    234901

    TIME

    HH:MM:SS

    23:49:01

    TIMESTAMP

    DATE TIME

    31-Oct-13 23:49:01

    TIMESTAMP

    DATETTIME

    31-Oct-13T23:49:01

    TIMESTAMP

    DATE:TIME

10/31/2013:234901

    TIMESTAMP

    DATE TIME ZONE

    31-Oct-13 11:30:25 -0800

    TIMESTAMP

    DATE HH.MM.SS PM

    31-Oct-13 11.30.25pm

    TIMESTAMP

    DATE HH:MM:SS PM

    31-Oct-13 11:30:25pm

    TIMESTAMP

    1383262225

    When time components are separated by colons, you can write them as one or two digits.

  • Months are case insensitive. You can spell them out or abbreviate to three characters.

  • For timestamps, decimal seconds are ignored. Time zone offsets are written as +/-HHMM.

  • For timestamps, a numeric string is converted to +/- seconds since January 1, 1970. Supported timestamps range from -30610224000 (January 1, 1000) through 29379456000 (December 31, 2900).

  • On output, dates are formatted as YYYY-MM-DD. Times are formatted as HH:MM:SS.

  • Linux EPOCH values range from -30610224000 (1/1/1000) through 185542587100800 (1/1/5885487). Complete range in years: +/-5,883,517 around epoch.

COUNT(DISTINCT x)

    Returns the count of distinct values of x

    APPROX_COUNT_DISTINCT(x, e)

Returns the approximate count of distinct values of x with defined expected error rate e, where e is an integer from 1 to 100. If no value is set for e, the approximate count is calculated using the system-wide hll-precision-bits configuration parameter.

    APPROX_MEDIAN(x)

    Returns the approximate median of x. Two server configuration parameters affect memory usage:

• approx_quantile_buffer

    • approx_quantile_centroids

    Accuracy of APPROX_MEDIAN depends on the distribution of data; see the Usage Notes below. Query steps with this operator will run on CPU.

    APPROX_PERCENTILE(x,y)

    Returns the approximate quantile of x, where y is the value between 0 and 1.

    For example, y=0 returns MIN(x), y=1 returns MAX(x), and y=0.5 returns APPROX_MEDIAN(x). Query steps with this operator will run on CPU.

    MAX(x)

    Returns the maximum value of x

    MIN(x)

    Returns the minimum value of x

    MODE(x)

    Returns the most common value of x. Query steps with this operator will run on CPU.

    SINGLE_VALUE

    Returns the input value if there is only one distinct value in the input; otherwise, the query fails.

    SUM(x)

    Returns the sum of the values of x

    SAMPLE(x)

    Returns one sample value from aggregated column x. For example, the following query returns population grouped by city, along with one value from the state column for each group:

    Note: This was previously LAST_SAMPLE, which is now deprecated.

    CORRELATION(x, y)

    CORRELATION_FLOAT(x, y)

    Alias of CORR. Returns the coefficient of correlation of a set of number pairs.

    CORR(x, y)

    CORR_FLOAT(x, y)

    Returns the coefficient of correlation of a set of number pairs.

    COUNT_IF(conditional_expr)

    Returns the number of rows satisfying the given condition_expr.
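For example, using the flights table referenced elsewhere in this section (the arrdelay column is assumed here for illustration):

SELECT COUNT_IF(arrdelay > 0) Delayed_Flights FROM flights_2008_10k;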

    COVAR_POP(x, y)

    COVAR_POP_FLOAT(x, y)

    Returns the population covariance of a set of number pairs.

    COVAR_SAMP(x, y)

    COVAR_SAMP_FLOAT(x, y)

    Returns the sample covariance of a set of number pairs.

    STDDEV(x)

    STDDEV_FLOAT(x)

    Alias of STDDEV_SAMP. Returns sample standard deviation of the value.

    STDDEV_POP(x)

    STDDEV_POP_FLOAT(x)

Returns the population standard deviation of the value.

    STDDEV_SAMP(x)

    STDDEV_SAMP_FLOAT(x)

    Returns the sample standard deviation of the value.

    SUM_IF(conditional_expr)

    Returns the sum of all expression values satisfying the given condition_expr.

    VARIANCE(x)

    VARIANCE_FLOAT(x)

    Alias of VAR_SAMP. Returns the sample variance of the value.

    VAR_POP(x)

    VAR_POP_FLOAT(x)

Returns the population variance of the value.

    VAR_SAMP(x)

    VAR_SAMP_FLOAT(x)

    Returns the sample variance of the value.

• APPROX_COUNT_DISTINCT(x, e) gives an approximate count of the value x, based on an expected error rate defined in e. The error rate is an integer value from 1 to 100. The lower the value of e, the higher the precision, and the higher the memory cost. Select a value for e based on the level of precision required. On large tables with large cardinalities, consider using APPROX_COUNT_DISTINCT when possible to preserve memory. When data cardinalities permit, OmniSci uses the precise implementation of COUNT(DISTINCT x) for APPROX_COUNT_DISTINCT. Set the default error rate using the hll-precision-bits configuration parameter.
• The accuracy of APPROX_MEDIAN(x) depends upon the distribution of the data. For example:

• For 100,000,000 integers (1, 2, 3, ... 100M) in random order, APPROX_MEDIAN can provide a highly accurate answer, to 5+ significant digits.

• For 100,000,001 integers, where 50,000,000 have a value of 0 and 50,000,001 have a value of 1, APPROX_MEDIAN returns a value close to 0.5, even though the median is 1.

  • Currently, OmniSci does not support grouping by non-dictionary-encoded strings. However, with the SAMPLE aggregate function, you can select non-dictionary-encoded strings that are presumed to be unique in a group. For example:

    If the aggregated column (user_description in the example above) is not unique within a group, SAMPLE selects a value that might be nondeterministic because of the parallel nature of OmniSci query execution.

• int32_t (cstdint), int
  • int64_t (cstdint), size_t

  • float

  • double

  • void

  • Returns numeric

-numeric

    Returns negative value of numeric

    numeric1 + numeric2

    Sum of numeric1 and numeric2

numeric1 - numeric2

    Difference of numeric1 and numeric2

    numeric1 * numeric2

    Product of numeric1 and numeric2

    numeric1 / numeric2

    Quotient (numeric1 divided by numeric2)

    =

    Equals

    <>

    Not equals

    >

    Greater than

>=

    Greater than or equal to

    ABS(x)

    Returns the absolute value of x

    CEIL(x)

    Returns the smallest integer not less than the argument

    DEGREES(x)

    Converts radians to degrees

EXP(x)

    Returns the value of e to the power of x

    ACOS(x)

    Returns the arc cosine of x

    ASIN(x)

    Returns the arc sine of x

    ATAN(x)

    Returns the arc tangent of x

ATAN2(y,x)

    Returns the arc tangent of (x, y) in the range (-π,π]. Equal to ATAN(y/x) for x > 0.

    DISTANCE_IN_METERS(fromLon, fromLat, toLon, toLat)

    Calculates distance in meters between two WGS84 positions.

    CONV_4326_900913_X(x)

Converts WGS84 longitude to WGS84 Web Mercator x coordinate.

    CONV_4326_900913_Y(y)

Converts WGS84 latitude to WGS84 Web Mercator y coordinate.

    BASE64_DECODE(str)

    Decodes a BASE64-encoded string.

    BASE64_ENCODE(str)

    Encodes a string to a BASE64-encoded string.

    CHAR_LENGTH(str)

    Returns the number of characters in a string. Only works with unencoded fields (ENCODING set to none).

    str1 || str2 [ || str3... ]

    str LIKE pattern

    'ab' LIKE 'ab'

    Returns true if the string matches the pattern (case-sensitive)

    str NOT LIKE pattern

    'ab' NOT LIKE 'cd'

    Returns true if the string does not match the pattern

    CURRENT_DATE

    CURRENT_DATE()

    Returns the current date in the GMT time zone.

    Example:

    SELECT CURRENT_DATE();

    CURRENT_TIME

    CURRENT_TIME()

    Returns the current time of day in the GMT time zone.

    Example:

    SELECT CURRENT_TIME();

    CURRENT_TIMESTAMP

    CURRENT_TIMESTAMP()

Returns the current timestamp in the GMT time zone. Same as NOW().

    Example:

    SELECT CURRENT_TIMESTAMP();

    DATEADD('date_part', interval, date | timestamp)

    DATE

    YYYY-MM-DD

    2013-10-31

    DATE

    MM/DD/YYYY

    10/31/2013

    Double-precision FP Function

    Single-precision FP Function

    Description

    AVG(x)

    Returns the average value of x

    COUNT()

Returns the number of rows returned

    SAMPLE_RATIO(x)

Returns a Boolean value, with the probability of True being returned for a row equal to the input argument. The input argument is a numeric value between 0.0 and 1.0. Negative input values return False, input values greater than 1.0 return True, and null input values return False.

The result of the function is deterministic per row; that is, all calls of the operator for a given row return the same result. The sample ratio is probabilistic, but generally falls within a thousandth of a percent of the requested ratio when the underlying dataset is millions of records or larger.

    The following example filters approximately 50% of the rows from t and returns a count that is approximately half the number of rows in t:

    SELECT COUNT(*) FROM t WHERE SAMPLE_RATIO(0.5)




Returns the string that results from concatenating the strings specified. Note that numeric, date, timestamp, and time types are implicitly cast to strings as necessary, so explicit casts of non-string types to string types are not required for inputs to the concatenation operator. Note that concatenating a variable string with a string literal, i.e. county_name

    str ILIKE pattern

    Returns a date after a specified time/date interval has been added.

    Example:

    SELECT DATEADD('MINUTE', 6000, dep_timestamp) Arrival_Estimate FROM flights_2008_10k LIMIT 10;

    DATE


SELECT user_name, SAMPLE(user_description) FROM tweets GROUP BY user_name;
    DATE_TRUNC [YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, 
                MICROSECOND, NANOSECOND, MILLENNIUM, CENTURY, DECADE, WEEK, 
                WEEK_SUNDAY, QUARTERDAY]
    EXTRACT    [YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, 
                MICROSECOND, NANOSECOND, DOW, ISODOW, DOY, EPOCH, QUARTERDAY, 
                WEEK, WEEK_SUNDAY, DATEEPOCH]
    DATEDIFF   [YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, 
                MICROSECOND, NANOSECOND, WEEK]
    DATEADD       [DECADE, YEAR, QUARTER, MONTH, WEEK, WEEKDAY, DAY, 
                   HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, NANOSECOND]
    TIMESTAMPADD  [YEAR, QUARTER, MONTH, WEEKDAY, DAY, HOUR, MINUTE,
                   SECOND, MILLISECOND, MICROSECOND, NANOSECOND]
    DATEPART      [YEAR, QUARTER, MONTH, DAYOFYEAR, QUARTERDAY, WEEKDAY, DAY, HOUR,
                   MINUTE, SECOND, MILLISECOND, MICROSECOND, NANOSECOND]
    udf = "/var/lib/omnisci/udf_myFunction.cpp"
    SELECT udf_myFunction FROM myTable
    #include <cstdint>
    #if defined(__CUDA_ARCH__) && defined(__CUDACC__) && defined(__clang__)
    #define DEVICE __device__
    #define NEVER_INLINE
    #define ALWAYS_INLINE
    #else
    #define DEVICE
    #define NEVER_INLINE __attribute__((noinline))
    #define ALWAYS_INLINE __attribute__((always_inline))
    #endif
    #define EXTENSION_NOINLINE extern "C" NEVER_INLINE DEVICE
    EXTENSION_NOINLINE int32_t udf_diff(const int32_t x, const int32_t y) { return x - y; }
    udf = "/var/lib/omnisci/udf_diff.cpp"
    SELECT udf_diff(myInt1, myInt2) FROM myTable LIMIT 1;

    upper-boundary - Upper boundary for the range of values to be partitioned equally.

  • partition_count - Number of equal-width buckets in the range defined by the lower and upper boundaries.

  • Expressions can be constants, column variables, or general expressions.

Example: Create 10 age buckets of equal size, with lower bound 0 and upper bound 100 ([0,10], [10,20] ... [90,100]), and classify the age of a customer accordingly:

    SELECT WIDTH_BUCKET(age, 0, 100, 10) FROM customer;

    For example, a customer of age 34 is assigned to bucket 3 ([30,40]) and the function returns the value 3.


The pattern uses POSIX regular expression syntax.

    Use a positive position argument to indicate the number of characters from the beginning of str. Use a negative position argument to indicate the number of characters from the end of str.

    Back-references/capture groups can be used to capture and replace specific sub-expressions.

    Use the following optional flags to control the matching behavior: c - case-sensitive matching; i - case-insensitive matching. If not specified, REGEXP_REPLACE defaults to case-sensitive search.

The occurrence integer argument (optional) specifies the single match occurrence of the pattern to return, with 0 being mapped to the first (1) occurrence. Use a negative occurrence argument to signify that the nth-to-last occurrence is returned.

    Use optional flags to control the matching behavior: c - case-sensitive matching; e - extract submatches; i - case-insensitive matching.

    The c and i flags cannot be used together; e can be used with either. If neither c nor i is specified, or if pattern is not provided, REGEXP_SUBSTR defaults to case-sensitive search.

    If the e flag is used, REGEXP_SUBSTR returns the capture group group_num of pattern matched in str. If the e flag is used but no capture groups are provided in pattern, REGEXP_SUBSTR returns the entire matching pattern, regardless of group_num. If the e flag is used but no group_num is provided, a value of 1 for group_num is assumed, so the first capture group is returned.


    Roles and Privileges

    HEAVY.AI supports data security using a set of database object access privileges granted to users or roles.

    Users and Privileges

    When you create a database, the admin superuser is created by default. The admin superuser is granted all privileges on all database objects. Superusers can create new users that, by default, have no database object privileges.

    Superusers can grant users selective access privileges on multiple database objects using two mechanisms: role-based privileges and user-based privileges.

    Role-based Privileges

    1. Grant roles access privileges on database objects.

    2. Grant roles to users.

    3. Grant roles to other roles.

    User-based Privileges

    When a user has privilege requirements that differ from role privileges, you can grant privileges directly to the user. These mechanisms provide data security for many users and classes of users to access the database.

    You have the following options for granting privileges:

    • Each object privilege can be granted to one or many roles, or to one or many users.

    • A role and/or user can be granted privileges on one or many objects.

    • A role can be granted to one or many users or other roles.

    This supports the following many-to-many relationships:

    • Objects and roles

    • Objects and users

    • Roles and users

    These relationships provide flexibility and convenience when granting/revoking privileges to and from users.

    Granting object privileges to roles and users, and granting roles to users, has a cumulative effect. The result of several grant commands is a combination of all individual grant commands. This applies to all database object types and to privileges inherited by objects. For example, object privileges granted to the object of database type are propagated to all table-type objects of that database object.

    Who Can Grant Object Privileges?

Only a superuser or an object owner can grant privileges on an object.

    • A superuser has all privileges on all database objects.

    • A non-superuser user has only those privileges on a database object that are granted by a superuser.

    • A non-superuser user has ALL privileges on a table created by that user.

    Roles and Privileges Persistence

    • Roles can be created and dropped at any time.

    • Object privileges and roles can be granted or revoked at any time, and the action takes effect immediately.

    • Privilege state is persistent and restored if the HEAVY.AI session is interrupted.

    Database Object Privileges

    There are five database object types, each with its own privileges.

    ACCESS - Connect to the database. The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    ALL - Allow all privileges on this database except issuing grants and dropping the database.

    SELECT, INSERT, TRUNCATE, UPDATE, DELETE - Allow these operations on any table in the database.

    ALTER SERVER - Alter servers in the current database.

    CREATE SERVER - Create servers in the current database.

    CREATE TABLE - Create a table in the current database. (Also CREATE.)

    Privileges granted on a database-type object are inherited by all tables of that database.

    Privilege Commands

    SQL
    Description

    Example

    The following example shows a valid sequence for granting access privileges to non-superuser user1 by granting a role to user1 and by directly granting a privilege. This example presumes that table1 and user1 already exist, and that user1 has ACCESS privileges on the database where table1 exists.

    1. Create the r_select role.

    2. Grant the SELECT privilege on table1 to the r_select role. Any user granted the r_select role gains the SELECT privilege.

    See for a more complete example.

    CREATE ROLE

    Create a role. Roles are granted to users for role-based database object access.

    This clause requires superuser privilege and <roleName> must not exist.

    Synopsis

    Parameters

    <roleName>

    Name of the role to create.

    Example

    Create a payroll department role called payrollDept.
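In SQL, the command for this example would be:

CREATE ROLE payrollDept;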

    See Also

    DROP ROLE

    Remove a role.

    This clause requires superuser privilege and <roleName> must exist.

    Synopsis

    Parameters

    <roleName>

    Name of the role to drop.

    Example

    Remove the payrollDept role.
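In SQL, the command for this example would be:

DROP ROLE payrollDept;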

    See Also

    GRANT

    Grant role privileges to users and to other roles.


    The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    This clause requires superuser privilege. The specified <roleNames> and <userNames> must exist.

    Synopsis

    Parameters

    <roleNames>

    Names of roles to grant to users and other roles. Use commas to separate multiple role names.

    <userNames>

    Names of users. Use commas to separate multiple user names.

    Examples

    Assign payrollDept role privileges to user dennis.

    Grant payrollDept and accountsPayableDept role privileges to users dennis and mike and role hrDept.
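In SQL, the commands for these two examples would be:

GRANT payrollDept TO dennis;

GRANT payrollDept, accountsPayableDept TO dennis, mike, hrDept;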

    See Also

    REVOKE

Remove role privileges from users or from other roles. This removes database object access privileges granted with the role.

    This clause requires superuser privilege. The specified <roleNames> and <userNames> must exist.

    Synopsis

    Parameters

    <roleNames>

    Names of roles to remove from users and other roles. Use commas to separate multiple role names.

    <userName>

    Names of the users. Use commas to separate multiple user names.

    Example

    Remove payrollDept role privileges from user dennis.

    Revoke payrollDept and accountsPayableDept role privileges from users dennis and fred and role hrDept.
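In SQL, the commands for these two examples would be:

REVOKE payrollDept FROM dennis;

REVOKE payrollDept, accountsPayableDept FROM dennis, fred, hrDept;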

    See Also

    GRANT ON TABLE

    Define the privilege(s) a role or user has on the specified table. You can specify any combination of the INSERT, SELECT, DELETE, UPDATE, DROP, or TRUNCATE privilege or specify all privileges.


    The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    This clause requires superuser privilege, or <tableName> must have been created by the user invoking this command. The specified <tableName> and users or roles defined in <entityList> must exist.

    Synopsis

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <tableName>

    Name of the database table.

    <entityList>

    Name of entity or entities to be granted the privilege(s).

    Parameter Value
    Descriptions

    Examples

    Permit all privileges on the employees table for the payrollDept role.

    Permit SELECT-only privilege on the employees table for user chris.

    Permit INSERT-only privilege on the employees table for the hrdept and accountsPayableDept roles.

    Permit INSERT, SELECT, and TRUNCATE privileges on the employees table for the role hrDept and for users dennis and mike.
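In SQL, the commands for these examples would be (table, role, and user names are from the examples above):

GRANT ALL ON TABLE employees TO payrollDept;

GRANT SELECT ON TABLE employees TO chris;

GRANT INSERT ON TABLE employees TO hrdept, accountsPayableDept;

GRANT INSERT, SELECT, TRUNCATE ON TABLE employees TO hrDept, dennis, mike;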

    See Also

    REVOKE ON TABLE

    Remove the privilege(s) a role or user has on the specified table. You can remove any combination of the INSERT, SELECT, DELETE, UPDATE, or TRUNCATE privileges, or remove all privileges.

    This clause requires superuser privilege or <tableName> must have been created by the user invoking this command. The specified <tableName> and users or roles in <entityList> must exist.

    Synopsis

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <tableName>

    Name of the database table.

    <entityList>

    Name of entities to be denied the privilege(s).

    Parameter Value
    Descriptions

    Example

    Prohibit SELECT and INSERT operations on the employees table for the nonemployee role.

    Prohibit SELECT operations on the directors table for the employee role.

    Prohibit INSERT operations on the directors table for role employee and user laura.

    Prohibit INSERT, SELECT, and TRUNCATE privileges on the employees table for the role nonemployee and for users dennis and mike.
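In SQL, the commands for these examples would be (table, role, and user names are from the examples above):

REVOKE SELECT, INSERT ON TABLE employees FROM nonemployee;

REVOKE SELECT ON TABLE directors FROM employee;

REVOKE INSERT ON TABLE directors FROM employee, laura;

REVOKE INSERT, SELECT, TRUNCATE ON TABLE employees FROM nonemployee, dennis, mike;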

See Also

    GRANT ON VIEW

    Define the privileges a role or user has on the specified view. You can specify any combination of the SELECT, INSERT, or DROP privileges, or specify all privileges.

    This clause requires superuser privileges, or <viewName> must have been created by the user invoking this command. The specified <viewName> and users or roles in <entityList> must exist.

    Synopsis
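The general form of the command is:

```sql
GRANT <privilegeList> ON VIEW <viewName> TO <entityList>;
```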

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <viewName>

    Name of the database view.

    <entityList>

    Name of entities to be granted the privileges.

    Parameter Value
    Descriptions

    Examples

    Permit SELECT, INSERT, and DROP privileges on the employees view for the payrollDept role.

    Permit SELECT-only privilege on the employees view for the employee role and user venkat.

    Permit INSERT and DROP privileges on the employees view for the hrDept and acctPayableDept roles and users simon and dmitri.
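In SQL, these examples are (for views, ALL is equivalent to SELECT, INSERT, and DROP):

```sql
GRANT ALL ON VIEW employees TO payrollDept;
GRANT SELECT ON VIEW employees TO employee, venkat;
GRANT INSERT, DROP ON VIEW employees TO hrDept, acctPayableDept, simon, dmitri;
```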

See Also

    REVOKE ON VIEW

    Remove the privileges a role or user has on the specified view. You can remove any combination of the INSERT, DROP, or SELECT privileges, or remove all privileges.

    This clause requires superuser privilege, or <viewName> must have been created by the user invoking this command. The specified <viewName> and users or roles in <entityList> must exist.

    Synopsis
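The general form of the command is:

```sql
REVOKE <privilegeList> ON VIEW <viewName> FROM <entityList>;
```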

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <viewName>

    Name of the database view.

    <entityList>

    Name of entity to be denied the privilege(s).

    Parameter Value
    Descriptions

    Example

    Prohibit SELECT, DROP, and INSERT operations on the employees view for the nonemployee role.

    Prohibit SELECT operations on the directors view for the employee role.

    Prohibit INSERT and DROP operations on the directors view for the employee and manager role and for users ashish and lindsey.
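In SQL, these examples are:

```sql
REVOKE ALL ON VIEW employees FROM nonemployee;
REVOKE SELECT ON VIEW directors FROM employee;
REVOKE INSERT, DROP ON VIEW directors FROM employee, manager, ashish, lindsey;
```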

See Also

    GRANT ON DATABASE

    Define the valid privileges a role or user has on the specified database. You can specify any combination of privileges, or specify all privileges.


    The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.

    This clause requires superuser privileges.

    Synopsis
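The general form of the command is:

```sql
GRANT <privilegeList> ON DATABASE <dbName> TO <entityList>;
```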

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dbName>

    Name of the database, which must exist, created by CREATE DATABASE.

    <entityList>

    Name of the entity to be granted the privilege.

    Parameter Value
    Descriptions

    Examples

    Permit all operations on the companydb database for the payrollDept role and user david.

    Permit SELECT-only operations on the companydb database for the employee role.

    Permit INSERT, UPDATE, and DROP operations on the companydb database for the hrdept and manager role and for users irene and stephen.
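In SQL, these examples are:

```sql
GRANT ALL ON DATABASE companydb TO payrollDept, david;
GRANT ACCESS, SELECT ON DATABASE companydb TO employee;
GRANT ACCESS, INSERT, UPDATE, DROP ON DATABASE companydb TO hrdept, manager, irene, stephen;
```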

See Also

    REVOKE ON DATABASE

    Remove the operations a role or user can perform on the specified database. You can specify privileges individually or specify all privileges.

    This clause requires superuser privilege or the user must own the database object. The specified <dbName> and roles or users in <entityList> must exist.

    Synopsis
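The general form of the command is:

```sql
REVOKE <privilegeList> ON DATABASE <dbName> FROM <entityList>;
```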

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dbName>

    Name of the database.

    <entityList>

    Parameter Value
    Descriptions

    Example

    Prohibit all operations on the employees database for the nonemployee role.

    Prohibit SELECT operations on the directors database for the employee role and for user monica.

    Prohibit INSERT, DROP, CREATE, and DELETE operations on the directors database for employee role and for users max and alex.
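In SQL, these examples are:

```sql
REVOKE ALL ON DATABASE employees FROM nonemployee;
REVOKE SELECT ON DATABASE directors FROM employee, monica;
REVOKE INSERT, DROP, CREATE, DELETE ON DATABASE directors FROM employee, max, alex;
```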

See Also

    GRANT ON SERVER

    Define the valid privileges a role or user has for working with servers. You can specify any combination of privileges or specify all privileges.

    This clause requires superuser privileges, or <serverName> must have been created by the user invoking the command.

    Synopsis
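The general form of the command is:

```sql
GRANT <privilegeList> ON SERVER <serverName> TO <entityList>;
```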

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <serverName>

    Name of the server, which must exist on the current database, created by CREATE SERVER ON DATABASE.

    <entityList>

    Parameter Value
    Descriptions

    Examples

    Grant DROP privilege on server parquet_s3_server to user fred:

    Grant ALTER privilege on server parquet_s3_server to role payrollDept:

    Grant USAGE and ALTER privileges on server parquet_s3_server to role payrollDept and user jamie:
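In SQL, these examples are:

```sql
GRANT DROP ON SERVER parquet_s3_server TO fred;
GRANT ALTER ON SERVER parquet_s3_server TO payrollDept;
GRANT USAGE, ALTER ON SERVER parquet_s3_server TO payrollDept, jamie;
```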

See Also

    REVOKE ON SERVER

    Remove privileges a role or user has for working with servers. You can specify any combination of privileges or specify all privileges.

    This clause requires superuser privileges, or <serverName> must have been created by the user invoking the command.

    Synopsis
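The general form of the command is:

```sql
REVOKE <privilegeList> ON SERVER <serverName> FROM <entityList>;
```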

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <serverName>

    Name of the server, which must exist on the current database, created by CREATE SERVER ON DATABASE.

    <entityList>

    Parameter Value
    Descriptions

    Examples

    Revoke DROP privilege on server parquet_s3_server for user inga:

Revoke ALTER privilege on server parquet_s3_server for role payrollDept:

Revoke USAGE and ALTER privileges on server parquet_s3_server for role payrollDept and user marvin:
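In SQL, these examples are:

```sql
REVOKE DROP ON SERVER parquet_s3_server FROM inga;
REVOKE ALTER ON SERVER parquet_s3_server FROM payrollDept;
REVOKE USAGE, ALTER ON SERVER parquet_s3_server FROM payrollDept, marvin;
```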

See Also

    GRANT ON DASHBOARD

    Define the valid privileges a role or user has for working with dashboards. You can specify any combination of privileges or specify all privileges.

    This clause requires superuser privileges.

    Synopsis
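The general form of the command is:

```sql
GRANT <privilegeList> [ON DASHBOARD <dashboardId>] TO <entityList>;
```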

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dashboardId>

    ID of the dashboard, which must exist, created by CREATE DASHBOARD. To show a list of all dashboards and IDs in heavysql, run the \dash command when logged in as superuser.

    <entityList>

    Parameter Value
    Descriptions

    Examples

    Permit all privileges on the dashboard ID 740 for the payrollDept role.

    Permit VIEW-only privilege on dashboard 730 for the hrDept role and user dennis.

    Permit EDIT and DELETE privileges on dashboard 740 for the hrDept and accountsPayableDept roles and for user pavan.
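In SQL, these examples are:

```sql
GRANT ALL ON DASHBOARD 740 TO payrollDept;
GRANT VIEW ON DASHBOARD 730 TO hrDept, dennis;
GRANT EDIT, DELETE ON DASHBOARD 740 TO hrDept, accountsPayableDept, pavan;
```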

See Also

    REVOKE ON DASHBOARD

    Remove privileges a role or user has for working with dashboards. You can specify any combination of privileges, or all privileges.

    This clause requires superuser privileges.

    Synopsis
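The general form of the command is:

```sql
REVOKE <privilegeList> [ON DASHBOARD <dashboardId>] FROM <entityList>;
```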

    Parameters

    <privilegeList>

    Parameter Value
    Descriptions

    <dashboardId>

    ID of the dashboard, which must exist, created by CREATE DASHBOARD.

    <entityList>

    Parameter Value
    Descriptions

Examples

Revoke DELETE privileges on dashboard 740 for the payrollDept role.

    Revoke all privileges on dashboard 730 for hrDept role and users dennis and mike.

Revoke EDIT and DELETE privileges on dashboard 740 for the hrDept and accountsPayableDept roles and for users dante and jonathan.
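In SQL, these examples are:

```sql
REVOKE DELETE ON DASHBOARD 740 FROM payrollDept;
REVOKE ALL ON DASHBOARD 730 FROM hrDept, dennis, mike;
REVOKE EDIT, DELETE ON DASHBOARD 740 FROM hrDept, accountsPayableDept, dante, jonathan;
```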

See Also

    Common Privilege Levels for Non-Superusers

    The following privilege levels are typically recommended for non-superusers in Immerse. Privileges assigned for users in your organization may vary depending on access requirements.

    Privilege
    Command Syntax to Grant Privilege

    Example: Roles and Privileges

    These examples assume that tables table1 through table4 are created as needed:
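```sql
create table table1 (id smallint);
create table table2 (id smallint);
create table table3 (id smallint);
create table table4 (id smallint);
```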

    The following examples show how to work with users, roles, tables, and dashboards.

    Create User Accounts

    Grant Access to Users on Database

    Create Marketing Department Roles

    Grant Marketing Department Roles to Marketing Department Employees

Grant Privilege to Marketing Department Roles

    Create Sales Department Roles

    Grant Sales Department Roles to Sales Department Employees

    Grant Privilege to Sales Department Roles

    Grant All Sales Roles to Sales Department Manager and Marketing Department Manager

    Grant View on Dashboards

    Use the \dash command to list all dashboards and their unique IDs in HEAVY.AI:
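```
heavysql> \dash
Dashboard ID | Dashboard Name    | Owner
1            | Marketing_Summary | heavyai
```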

    Here, the Marketing_Summary dashboard uses table2 as a data source. The role marketingDeptRole2 has select privileges on that table. Grant view access on the Marketing_Summary dashboard to marketingDeptRole2:
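```sql
grant view on dashboard 1 to marketingDeptRole2;
```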

    Relationships Between Users, Roles, and Tables

    The following table shows the roles and privileges for each user created in the previous example.

    User
    Roles Granted
    Table Privileges

    Commands to Report Roles and Privileges

    Use the following commands to list current roles and assigned privileges. If you have superuser access, you can see privileges for all users. Otherwise, you can see only those roles and privileges for which you have access.


    Results for users, roles, privileges, and object privileges are returned in creation order.

    \dash

    Lists all dashboards and dashboard IDs in HEAVY.AI. Requires superuser privileges. Dashboard privileges are assigned by dashboard ID because dashboard names may not be unique.

    Example

heavysql> \dash database heavyai
Dashboard ID | Dashboard Name    | Owner
1            | Marketing_Summary | heavyai

\object_privileges objectType objectName

    Reports all privileges granted to the specified object for all roles and users. If the specified objectName does not exist, no results are reported. Used for databases and tables only.

    Example

    \privileges roleName | userName

    Reports all object privileges granted to the specified role or user. The roleName or userName specified must exist.

    Example

    \role_list userName

    Reports all roles granted to the given user. The userName specified must exist.

    Example

    \roles

    Reports all roles.

    Example

    \u

    Lists all users.

    Example

    Example: Data Security

    The following example demonstrates field-level security using two views:

    • view_users_limited, in which users only see three of seven fields: userid, First_Name, and Department.

    • view_users_full, in which users see all seven fields.
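The two views are defined over the users table as:

```sql
create view view_users_limited as select userid, First_Name, Department from users;
create view view_users_full as select userid, First_Name, Department, Address, City, State, Zip from users;
```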

    Source Data

    Create Views

    Create Users

    Grant Access to Users on Database

    Create Roles

    Grant Roles to Users

    Grant Privilege to View Roles

    Verify Views

    User readonly1 sees no tables, only the specific view granted, and only the three specific columns returned in the view:

    User readonly2 sees no tables, only the specific view granted, and all seven columns returned in the view:

    A user can be granted one or many roles.
    CREATE VIEW - Create a view for the current database.

    CREATE DASHBOARD - Create a dashboard for the current database.

    DELETE DASHBOARD - Delete a dashboard for this database.

    DROP SERVER - Drop servers from the current database.

    DROP - Drop a table from the database.

    DROP VIEW - Drop a view for this database.

    EDIT DASHBOARD - Edit a dashboard for this database.

    SELECT VIEW - Select a view for this database.

    SERVER USAGE - Use servers (through foreign tables) in the current database.

    VIEW DASHBOARD - View a dashboard for this database.

    VIEW SQL EDITOR - Access the SQL Editor in Immerse for this database.


    Users with SELECT privilege on views do not require SELECT privilege on underlying tables referenced by the view to retrieve the data queried by the view. View queries work without error whether or not users have direct access to referenced tables. This also applies to views that query tables in other databases.

    To create views, users must have SELECT privilege on queried tables in addition to the CREATE VIEW privilege.

    SELECT, INSERT, TRUNCATE, UPDATE, DELETE - Allow these SQL statements on this table.

    DROP - Drop this table.


    SELECT - Select from this view. Users do not need privileges on objects referenced by this view.

    DROP - Drop this view.


    VIEW - View this dashboard.

    EDIT - Edit this dashboard.

    DELETE - Delete this dashboard.

    DROP - Drop this server from the current database.

    ALTER - Alter this server in the current database.

    USAGE - Use this server (through foreign tables) in the current database.

    Grant role privilege(s) on a database table to a role or user.

    Revoke role privilege(s) on database table from a role or user.

    Grant role privilege(s) on a database view to a role or user.

    Revoke role privilege(s) on database view from a role or user.

    Grant role privilege(s) on database to a role or user.

    Revoke role privilege(s) on database from a role or user.

    Grant role privilege(s) on server to a role or user.

    Revoke role privilege(s) on server from a role or user.

    Grant role privilege(s) on dashboard to a role or user.

    Revoke role privilege(s) on dashboard from a role or user.

    Grant the r_select role to user1, giving user1 the SELECT privilege on table1.

  • Directly grant user1 the INSERT privilege on table1.

  • GRANT ON DATABASE

    INSERT

    Grant INSERT privilege on <tableName> to <entityList>.

    SELECT

    Grant SELECT privilege on <tableName> to <entityList>.

    TRUNCATE

    Grant TRUNCATE privilege on <tableName> to <entityList>.

    UPDATE

    Grant UPDATE privilege on <tableName> to <entityList>.

    GRANT ON DATABASE

    INSERT

    Remove INSERT privilege for <entityList> on <tableName>.

    SELECT

    Remove SELECT privilege for <entityList> on <tableName>.

    TRUNCATE

    Remove TRUNCATE privilege for <entityList> on <tableName>.

    UPDATE

    Remove UPDATE privilege for <entityList> on <tableName>.

    GRANT ON DATABASE

    CREATE SERVER

    Grant CREATE SERVER privilege on <dbName> to <entityList>;

    CREATE TABLE

    Grant CREATE TABLE privilege on <dbName> to <entityList>. Previously CREATE.

    CREATE VIEW

    Grant CREATE VIEW privilege on <dbName> to <entityList>.

    CREATE DASHBOARD

    Grant CREATE DASHBOARD privilege on <dbName> to <entityList>.

    CREATE

    Grant CREATE privilege on <dbName> to <entityList>.

    DELETE

    Grant DELETE privilege on <dbName> to <entityList>.

    DELETE DASHBOARD

    Grant DELETE DASHBOARD privilege on <dbName> to <entityList>.

    DROP

    Grant DROP privilege on <dbName> to <entityList>.

    DROP SERVER

    Grant DROP privilege on <dbName> to <entityList>.

    DROP VIEW

    Grant DROP VIEW privilege on <dbName> to <entityList>.

    EDIT DASHBOARD

    Grant EDIT DASHBOARD privilege on <dbName> to <entityList>.

    INSERT

    Grant INSERT privilege on <dbName> to <entityList>.

    SELECT

    Grant SELECT privilege on <dbName> to <entityList>.

    SELECT VIEW

    Grant SELECT VIEW privilege on <dbName> to <entityList>.

    SERVER USAGE

    Grant SERVER USAGE privilege on <dbName> to <entityList>.

    TRUNCATE

    Grant TRUNCATE privilege on <dbName> to <entityList>.

    UPDATE

    Grant UPDATE privilege on <dbName> to <entityList>.

    VIEW DASHBOARD

    Grant VIEW DASHBOARD privilege on <dbName> to <entityList>.

    VIEW SQL EDITOR

    Grant VIEW SQL EDITOR privilege in Immerse on <dbName> to <entityList>.

    CREATE TABLE

    Remove CREATE TABLE privilege on <dbName> from <entityList>. Previously CREATE.

    CREATE VIEW

    Remove CREATE VIEW privilege on <dbName> from <entityList>.

    CREATE DASHBOARD

    Remove CREATE DASHBOARD privilege on <dbName> from <entityList>.

    CREATE

    Remove CREATE privilege on <dbName> from <entityList>.

    CREATE SERVER

    Remove CREATE SERVER privilege on <dbName> from <entityList>.

    DELETE

    Remove DELETE privilege on <dbName> from <entityList>.

    DELETE DASHBOARD

    Remove DELETE DASHBOARD privilege on <dbName> from <entityList>.

    DROP

    Remove DROP privilege on <dbName> from <entityList>.

    DROP SERVER

    Remove DROP SERVER privilege on <dbName> from <entityList>.

    DROP VIEW

    Remove DROP VIEW privilege on <dbName> from <entityList>.

    EDIT DASHBOARD

    Remove EDIT DASHBOARD privilege on <dbName> from <entityList>.

    INSERT

    Remove INSERT privilege on <dbName> from <entityList>.

    SELECT

    Remove SELECT privilege on <dbName> from <entityList>.

    SELECT VIEW

    Remove SELECT VIEW privilege on <dbName> from <entityList>.

    SERVER USAGE

    Remove SERVER USAGE privilege on <dbName> from <entityList>.

    TRUNCATE

    Remove TRUNCATE privilege on <dbName> from <entityList>.

    UPDATE

    Remove UPDATE privilege on <dbName> from <entityList>.

    VIEW DASHBOARD

    Remove VIEW DASHBOARD privilege on <dbName> from <entityList>.

    VIEW SQL EDITOR

    Remove VIEW SQL EDITOR privilege in Immerse on <dbName> from <entityList>.

    VIEW

    Grant VIEW privilege on <dashboardId> to <entityList>.

    VIEW

    Revoke VIEW privilege on <dashboardId> for <entityList>.

    GRANT VIEW ON DASHBOARD <dashboardId> TO <entityList>;

    Create a dashboard

    GRANT CREATE DASHBOARD ON DATABASE <dbName> TO <entityList>;

    Edit a dashboard

GRANT EDIT ON DASHBOARD <dashboardId> TO <entityList>;

    Delete a dashboard

    GRANT DELETE DASHBOARD ON DATABASE <dbName> TO <entityList>;

    salesDeptRole2

    SELECT on Table 3

    salesDeptEmployee4

    salesDeptRole3

    SELECT on Table 4

    salesDeptManagerEmployee5

    salesDeptRole1, salesDeptRole2, salesDeptRole3

    SELECT on Tables 1, 3, 4

    marketingDeptEmployee1

    marketingDeptRole1

    SELECT on Tables 1, 2

    marketingDeptEmployee2

    marketingDeptRole2

    SELECT on Table 2

    marketingDeptManagerEmployee3

    marketingDeptRole1, marketingDeptRole2, salesDeptRole1, salesDeptRole2, salesDeptRole3

    SELECT on Tables 1, 2, 3, 4

    CREATE ROLE

    Create role.

    DROP ROLE

    Drop role.

    GRANT

    Grant role to user or to another role.

    REVOKE

    ALL

    Grant all possible access privileges on <tableName> to <entityList>.

    ALTER TABLE

    Grant ALTER TABLE privilege on <tableName> to <entityList>.

    DELETE

    Grant DELETE privilege on <tableName> to <entityList>.

    DROP

    role

    Name of role.

    user

    Name of user.

    ALL

    Remove all access privilege for <entityList> on <tableName>.

    ALTER TABLE

    Remove ALTER TABLE privilege for <entityList> on <tableName>.

    DELETE

    Remove DELETE privilege for <entityList> on <tableName>.

    DROP

    role

    Name of role.

    user

    Name of user.

    ALL

    Grant all possible access privileges on <viewName> to <entityList>.

    DROP

    Grant DROP privilege on <viewName> to <entityList>.

    INSERT

    Grant INSERT privilege on <viewName> to <entityList>.

    SELECT

    role

    Name of role.

    user

    Name of user.

    ALL

    Remove all access privilege for <entityList> on <viewName>.

    DROP

    Remove DROP privilege for <entityList> on <viewName>.

    INSERT

    Remove INSERT privilege for <entityList> on <viewName>.

    SELECT

    role

    Name of role.

    user

    Name of user.

    ACCESS

    Grant ACCESS (connection) privilege on <dbName> to <entityList>.

    ALL

    Grant all possible access privileges on <dbName> to <entityList>.

    ALTER TABLE

    Grant ALTER TABLE privilege on <dbName> to <entityList>.

    ALTER SERVER

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    ACCESS

    Remove ACCESS (connection) privilege on <dbName> from <entityList>.

    ALL

    Remove all possible privileges on <dbName> from <entityList>.

    ALTER SERVER

    Remove ALTER SERVER privilege on <dbName> from <entityList>

    ALTER TABLE

    role

    Name of role.

    user

    Name of user.

    DROP

    Grant DROP privileges on <serverName> on current database to <entityList>.

    ALTER

    Grant ALTER privilege on <serverName> on current database to <entityList>.

    USAGE

    Grant USAGE privilege (through foreign tables) on <serverName> on current database to <entityList>.

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    DROP

    Remove DROP privileges on <serverName> on current database for <entityList>.

    ALTER

    Remove ALTER privilege on <serverName> on current database for <entityList>.

    USAGE

    Remove USAGE privilege (through foreign tables) on <serverName> on current database for <entityList>.

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    ALL

    Grant all possible access privileges on <dashboardId> to <entityList>.

    CREATE

    Grant CREATE privilege to <entityList>.

    DELETE

    Grant DELETE privilege on <dashboardId> to <entityList>.

    EDIT

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    ALL

    Revoke all possible access privileges on <dashboardId> for <entityList>.

    CREATE

    Revoke CREATE privilege for <entityList>.

    DELETE

    Revoke DELETE privilege on <dashboardId> for <entityList>.

    EDIT

    role

    Name of role, which must exist.

    user

Name of user, which must exist. See Users and Databases.

    Access a database

    GRANT ACCESS ON DATABASE <dbName> TO <entityList>;

    Create a table

    GRANT CREATE TABLE ON DATABASE <dbName> TO <entityList>;

    Select a table

    GRANT SELECT ON TABLE <tableName> TO <entityList>;

    salesDeptEmployee1

    salesDeptRole1

    SELECT on Tables 1, 3

    salesDeptEmployee2

    salesDeptRole2

    SELECT on Table 3

    Example Roles and Privileges Session
    Users and Databases
    DROP ROLE
    Users and Databases
    CREATE ROLE
    Users and Databases
    CREATE ROLE
    GRANT ON TABLE
    Users and Databases
    CREATE ROLE
    GRANT
    REVOKE ON TABLE
    Tables
    CREATE ROLE
    GRANT ON TABLE
    GRANT ON DATABASE
    REVOKE ON VIEW
    DDL-VIEWS
    CREATE ROLE
    GRANT ON VIEW
    GRANT ON DATABASE
REVOKE ON DATABASE
    GRANT ON TABLE
    CREATE ROLE
    GRANT ON DATABASE
    GRANT ON TABLE
    REVOKE ON SERVER
    GRANT ON SERVER
    REVOKE ON DASHBOARD
    GRANT ON DASHBOARD

    Revoke role from user or from another role.

    Grant DROP privilege on <tableName> to <entityList>.

    Remove DROP privilege for <entityList> on <tableName>.

    Grant SELECT privilege on <viewName> to <entityList>.

    Remove SELECT privilege for <entityList> on <viewName>.

    Grant ALTER SERVER privilege on <dbName> to <entityList>.

    Remove ALTER TABLE privilege on <dbName> from <entityList>.

    Grant EDIT privilege on <dashboardId> to <entityList>.

    Revoke EDIT privilege on <dashboardId> for <entityList>.

    View a dashboard

    salesDeptEmployee3

    GRANT INSERT ON TABLE table1 TO user1;
    GRANT SELECT ON TABLE table1 TO r_select;
    CREATE ROLE <roleName>;
    CREATE ROLE payrollDept;
    DROP ROLE [IF EXISTS] <roleName>;
    DROP ROLE payrollDept;
    GRANT <roleNames> TO <userNames>, <roleNames>;
    GRANT payrollDept TO dennis;
    GRANT payrollDept, accountsPayableDept TO dennis, mike, hrDept;
    REVOKE <roleNames> FROM <userNames>, <roleNames>;
    REVOKE payrollDept FROM dennis;
    REVOKE payrollDept, accountsPayableDept FROM dennis, fred, hrDept;
    GRANT <privilegeList> ON TABLE <tableName> TO <entityList>;
    GRANT ALL ON TABLE employees TO payrollDept;
    GRANT SELECT ON TABLE employees TO chris;
    GRANT INSERT ON TABLE employees TO hrDept, accountsPayableDept;
    GRANT INSERT, SELECT, TRUNCATE ON TABLE employees TO hrDept, dennis, mike;
    REVOKE <privilegeList> ON TABLE <tableName> FROM <entityList>;
    REVOKE ALL ON TABLE employees FROM nonemployee;
    REVOKE SELECT ON TABLE directors FROM employee;
    REVOKE INSERT ON TABLE directors FROM employee, laura;
    REVOKE INSERT, SELECT, TRUNCATE ON TABLE employees FROM nonemployee, dennis, mike;
    GRANT <privilegeList> ON VIEW <viewName> TO <entityList>;
    GRANT ALL ON VIEW employees TO payrollDept;
    GRANT SELECT ON VIEW employees TO employee, venkat;
    GRANT INSERT, DROP ON VIEW employees TO hrDept, acctPayableDept, simon, dmitri;
    REVOKE <privilegeList> ON VIEW <viewName> FROM <entityList>;
    REVOKE ALL ON VIEW employees FROM nonemployee;
    REVOKE SELECT ON VIEW directors FROM employee;
    REVOKE INSERT, DROP ON VIEW directors FROM employee, manager, ashish, lindsey;
    GRANT <privilegeList> ON DATABASE <dbName> TO <entityList>;
    GRANT ALL ON DATABASE companydb TO payrollDept, david;
    GRANT ACCESS, SELECT ON DATABASE companydb TO employee;
    GRANT ACCESS, INSERT, UPDATE, DROP ON DATABASE companydb TO hrdept, manager, irene, stephen;
    REVOKE <privilegeList> ON DATABASE <dbName> FROM <entityList>;
    REVOKE ALL ON DATABASE employees FROM nonemployee;
    REVOKE SELECT ON DATABASE directors FROM employee;
    REVOKE INSERT, DROP, CREATE, DELETE ON DATABASE directors FROM employee;
    GRANT <privilegeList> ON SERVER <serverName> TO <entityList>;
GRANT DROP ON SERVER parquet_s3_server TO fred;
    GRANT ALTER ON SERVER parquet_s3_server TO payrollDept;
    GRANT USAGE, ALTER ON SERVER parquet_s3_server TO payrollDept, jamie;
    REVOKE <privilegeList> ON SERVER <serverName> FROM <entityList>;
REVOKE DROP ON SERVER parquet_s3_server FROM inga;
    REVOKE ALTER ON SERVER parquet_s3_server FROM payrollDept;
    REVOKE USAGE, ALTER ON SERVER parquet_s3_server FROM payrollDept, marvin;
    GRANT <privilegeList> [ON DASHBOARD <dashboardId>] TO <entityList>;
    GRANT ALL ON DASHBOARD 740 TO payrollDept;
    GRANT VIEW ON DASHBOARD 730 TO hrDept, dennis;
    GRANT EDIT, DELETE ON DASHBOARD 740 TO hrdept, accountsPayableDept, pavan;
    REVOKE <privilegeList> [ON DASHBOARD <dashboardId>] FROM <entityList>;
    REVOKE DELETE ON DASHBOARD 740 FROM payrollDept;
    REVOKE ALL ON DASHBOARD 730 FROM hrDept, dennis, mike;
    REVOKE EDIT, DELETE ON DASHBOARD 740 FROM hrdept, accountsPayableDept, dante, jonathan;
    create table table1 (id smallint);
    create table table2 (id smallint);
    create table table3 (id smallint);
    create table table4 (id smallint);
    create user marketingDeptEmployee1 (password = 'md1');
    create user marketingDeptEmployee2 (password = 'md2');
    create user marketingDeptManagerEmployee3 (password = 'md3');
    
    create user salesDeptEmployee1 (password = 'sd1');
    create user salesDeptEmployee2 (password = 'sd2');
    create user salesDeptEmployee3 (password = 'sd3');
    create user salesDeptEmployee4 (password = 'sd4');
    create user salesDeptManagerEmployee5 (password = 'sd5');
    grant access on database heavyai to marketingDeptEmployee1, marketingDeptEmployee2, marketingDeptManagerEmployee3;
    grant access on database heavyai to salesDeptEmployee1, salesDeptEmployee2, salesDeptEmployee3, salesDeptEmployee4, salesDeptManagerEmployee5;
    create role marketingDeptRole1;
    create role marketingDeptRole2;
    grant marketingDeptRole1 to marketingDeptEmployee1, marketingDeptManagerEmployee3;
    grant marketingDeptRole2 to marketingDeptEmployee2, marketingDeptManagerEmployee3;
    grant select on table table1 to marketingDeptRole1;
    grant select on table table2 to marketingDeptRole1;
    grant select on table table2 to marketingDeptRole2;
    create role salesDeptRole1;
    create role salesDeptRole2;
    create role salesDeptRole3;
    grant salesDeptRole1 to salesDeptEmployee1;
    grant salesDeptRole2 to salesDeptEmployee2, salesDeptEmployee3;
    grant salesDeptRole3 to salesDeptEmployee4;
    grant select on table table1 to salesDeptRole1;
    grant select on table table3 to salesDeptRole1, salesDeptRole2;
    grant select on table table4 to salesDeptRole3;
    grant salesDeptRole1, salesDeptRole2, salesDeptRole3 to salesDeptManagerEmployee5, marketingDeptManagerEmployee3;
    heavysql> \dash 
    Dashboard ID | Dashboard Name    | Owner 
    1            | Marketing_Summary | heavyai
    grant view on dashboard 1 to marketingDeptRole2;
    heavysql> \dash database heavyai 
    Dashboard ID | Dashboard Name    | Owner 
    1            | Marketing_Summary | heavyai
    heavysql> \object_privileges database heavyai 
    marketingDeptEmployee1 privileges: login-access 
marketingDeptEmployee2 privileges: login-access
marketingDeptManagerEmployee3 privileges: login-access
    salesDeptEmployee1 privileges: login-access 
    salesDeptEmployee2 privileges: login-access 
    salesDeptEmployee3 privileges: login-access 
    salesDeptEmployee4 privileges: login-access 
    salesDeptManagerEmployee5 privileges: login-access
    heavysql> \privileges salesDeptRole1 
    table1 (table): select 
    table3 (table): select
    heavysql> \privileges salesDeptManagerEmployee5 
heavyai (database): login-access
    
    heavysql> \privileges marketingdeptrole2 
    table2 (table): select
    Marketing_Summary (dashboard): view
    heavysql> \role_list salesDeptManagerEmployee5
    salesDeptRole3 
    salesDeptRole2 
    salesDeptRole1
    heavysql> \roles
    marketingDeptRole1 
    marketingDeptRole2 
    salesDeptRole1 
    salesDeptRole2 
    salesDeptRole3
    heavysql> \u 
    heavyai 
    marketingDeptEmployee1 
    marketingDeptEmployee2 
    salesDeptEmployee1 
    salesDeptEmployee2 
    salesDeptEmployee3 
    salesDeptEmployee4 
    salesDeptManagerEmployee5 
    marketingDeptManagerEmployee3
    create view view_users_limited as select userid, First_Name, Department from users;
    create view view_users_full as select userid, First_Name, Department, Address, City, State, Zip from users;
    create user readonly1 (password = 'rr1');
    create user readonly2 (password = 'rr2');
    grant access on database heavyai to readonly1, readonly2;
    create role limited_readonly;
    create role full_readonly;
    grant limited_readonly to readonly1;
    grant full_readonly to readonly2;
    grant select on view view_users_limited to limited_readonly;
    grant select on view view_users_full TO full_readonly;
    heavysql> \t
    heavysql> \v
    view_users_limited
    heavysql> select * from view_users_limited;
    userid|First_Name|Department
    1|Todd|C Suite
    2|Don|Sales
    3|Mike|Customer Success
    heavysql> \t
    heavysql> \v
    view_users_full
    heavysql> select * from view_users_full;
    userid|First_Name|Department|Address|City|State|Zip
    1|Todd|C Suite|1 Front Street|San Francisco|CA|94111
    2|Don|Sales|1 5th Avenue|New York|NY|10001
    3|Mike|Customer Success|100 Main Street|Reston|VA|20191
    GRANT ON TABLE
    REVOKE ON TABLE
    GRANT ON VIEW
    REVOKE ON VIEW
    GRANT ON DATABASE
    REVOKE ON DATABASE
    GRANT ON SERVER
    REVOKE ON SERVER
    GRANT ON DASHBOARD
    REVOKE ON DASHBOARD

    Loading Data with SQL

    This topic describes several ways to load data to HEAVY.AI using SQL commands.

    circle-info
    • If there is a potential for duplicate entries, and you want to avoid loading duplicate rows, see How can I avoid creating duplicate rows? on the Troubleshooting page.

    • If a source file uses a reserved word, HEAVY.AI automatically adds an underscore at the end of the reserved word. For example, year is converted to year_.

    hashtag
    COPY FROM

    hashtag
    CSV/TSV Import

    Use the following syntax for CSV and TSV files:

    <file pattern> must be local on the server. The file pattern can contain wildcards if you want to load multiple files. In addition to CSV, TSV, and TXT files, you can import compressed files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
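    For example, a typical CSV load into an existing table might look like this (the table and file names are illustrative):

    ```sql
    -- Load every file matching the wildcard; values are comma-delimited
    -- and the first line of each file is treated as a header and skipped
    COPY tweets FROM '/tmp/tweets_*.csv' WITH (delimiter = ',', header = 'true');
    ```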

    circle-info

    COPY FROM appends data from the source into the target table. It does not truncate the table or overwrite existing data.

    You can import client-side files using the \copy command in heavysql, but it is significantly slower. For large files, HEAVY.AI recommends that you first scp the file to the server, and then issue the COPY command.

    HEAVY.AI supports Latin-1 ASCII and UTF-8 encodings. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

    Available properties in the optional WITH clause are described in the following table.

    Parameter
    Description
    Default Value
    circle-info

    By default, the CSV parser assumes one row per line. To import a file with multiple lines in a single field, specify threads = 1 in the WITH clause.

    hashtag
    Examples
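    As a sketch of the threads = 1 case described above (the file path and table name are assumed), importing a CSV whose quoted fields contain embedded newlines:

    ```sql
    -- A single import thread lets quoted fields that span multiple lines parse correctly
    COPY events FROM '/data/events.csv' WITH (threads = 1, quoted = 'true');
    ```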

    hashtag
    Geo Import

    You can use COPY FROM to import geo files. You can create the table based on the source file and then load the data:

    You can also append data to an existing, predefined table:
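    Hedged sketches of both cases, using an illustrative GeoJSON file:

    ```sql
    -- Create the table from the source file's metadata, then load the data
    COPY counties FROM '/data/counties.geojson' WITH (source_type = 'geo_file');

    -- Append the same file to an existing, predefined table
    COPY existing_counties FROM '/data/counties.geojson' WITH (source_type = 'geo_file');
    ```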

    Use the following syntax, depending on the file source.

    circle-info
    • If you are using COPY FROM to load to an existing table, the field type must match the metadata of the source file. If it does not, COPY FROM throws an error and does not load the data.

    The following WITH options are available for geo file imports from all sources.

    circle-info

    Currently, a manually created geo table can have only one geo column. If it has more than one, import is not performed.

    Any GDAL-supported file type can be imported; for an unsupported file type, GDAL throws an error.

    circle-exclamation

    An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table.

    circle-info

    The first compatible file in the bundle is loaded; subfolders are traversed until a compatible file is found. The rest of the contents in the bundle are ignored. If the bundle contains multiple filesets, unpack the file manually and specify it for import.


    CSV files containing WKT strings are not considered geo files and should not be parsed with the source_type='geo' option. When importing WKT strings from CSV files, you must create the table first. The geo column type and encoding are specified as part of the DDL. For example, for a polygon with no encoding, try the following:
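    A minimal sketch, assuming a two-column CSV containing an id and a WKT polygon string:

    ```sql
    -- The geo column type is declared in the DDL; no encoding is specified
    CREATE TABLE wkt_polygons (id INTEGER, poly GEOMETRY(POLYGON));
    COPY wkt_polygons FROM '/data/polygons.csv';
    ```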

    hashtag
    Raster Import

    You can use COPY FROM to import raster files supported by GDAL as one row per pixel, where a pixel may consist of one or more data bands, with optional corresponding pixel or world-space coordinate columns. This allows the data to be rendered as a point/symbol cloud that approximates a 2D image.

    Use the same syntax that you would for geo import, depending on the file source.

    The following WITH options are available for raster file imports from all sources.

    Parameter
    Description
    Default Value
    circle-info

    Illegal combinations of raster_point_type and raster_point_transform are rejected. For example, world transform can only be performed on raster files that have a geospatial coordinate system in their metadata, and cannot be performed if <type> is an integer format (which cannot represent world-space coordinate values).

    Any GDAL-supported file type can be imported; for an unsupported file type, GDAL throws an error.

    circle-exclamation

    HDF5 and possibly other GDAL drivers may not be thread-safe, so use WITH (threads=1) when importing.

    circle-info

    Archive file import (.zip, .tar, .tar.gz) is not currently supported for raster files.

    Band and Column Names

    The following raster file formats contain the metadata required to derive sensible names for the bands, which are then used for their corresponding columns:

    • GRIB2 - geospatial/meteorological format

    • OME TIFF - an OpenMicroscopy format

    The band names from the file are sanitized (illegal characters and spaces removed) and de-duplicated (addition of a suffix in cases where the same band name is repeated within the file or across datasets).

    For other formats, the columns are named band_1_1, band_1_2, and so on.

    The sanitized and de-duplicated names must be used for the raster_import_bands option.
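    For example, to import only selected bands (the band names below are hypothetical, and a comma-separated list is assumed):

    ```sql
    -- Import only two named bands from a GRIB2 file
    COPY weather FROM '/data/forecast.grib2'
    WITH (source_type = 'raster_file', raster_import_bands = 'temperature_1,humidity_1');
    ```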

    Band and Column Data Types

    Raster files can have bands in the following data types:

    • Signed or unsigned 8-, 16-, or 32-bit integer

    • 32- or 64-bit floating point

    • Complex number formats (not supported)

    Signed data is stored in the directly corresponding column type, as follows:

    int8 -> TINYINT
    int16 -> SMALLINT
    int32 -> INT
    float32 -> FLOAT
    float64 -> DOUBLE

    Unsigned integer column types are not currently supported, so any data of those types is converted to the next size up signed column type:

    uint8 -> SMALLINT
    uint16 -> INT
    uint32 -> BIGINT

    Column types cannot currently be overridden.

    hashtag
    Raster Tiled Import

    circle-info

    Tiled Raster Import is currently a beta feature and does not yet support some of the features and flags supported by the existing Raster Import. For best performance, we currently suggest specifying point data as raster_lon/raster_lat DOUBLE values.

    Raster import now supports data tiling, enabled by starting the server with the flag enable-legacy-raster-import=false. Tiling organizes raster data by geospatial coordinates to better optimize data access.

    Once the server flag is set, the functionality is invoked using the COPY FROM SQL command. A notable difference from the legacy import is that the tiled import requires the table to be created in advance with columns specified, and it uses some new WITH options.

    The following additional WITH options are available for raster files using the new tiled importer:

    Parameter
    Description
    Default Value

    hashtag
    ODBC Import

    circle-info

    ODBC import is currently a beta feature.

    You can use COPY FROM to import data from a relational database management system (RDBMS) or data warehouse using the Open Database Connectivity (ODBC) interface.

    The following WITH options are available for ODBC import.

    hashtag
    Examples

    Using a data source name:

    Using a connection string:
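    Hedged sketches of both forms; the DSN, driver, credentials, and remote query are placeholders, and the username/password option names in the DSN form are assumptions:

    ```sql
    -- Using a data source name (username/password option names assumed)
    COPY target_table FROM '(SELECT * FROM remote_table)'
    WITH (source_type = 'odbc', data_source_name = 'my_dsn',
          username = 'user', password = 'pass');

    -- Using a connection string with a credential string
    COPY target_table FROM '(SELECT * FROM remote_table)'
    WITH (source_type = 'odbc',
          connection_string = 'Driver=PostgreSQL;Server=db.example.com;Port=5432;Database=mydb',
          credential_string = 'Username=user;Password=pass');
    ```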

    For information about using ODBC HeavyConnect, see .

    hashtag
    Globbing, Filtering, and Sorting Parquet and CSV Files

    These examples assume the following folder and file structure:

    hashtag
    Globbing

    Local Parquet/CSV files can now be globbed by specifying either a path name with a wildcard or a folder name.

    Globbing a folder recursively returns all files under the specified folder. For example,

    COPY table_1 FROM ".../subdir";

    returns file_3, file_4, file_5.

    Globbing with a wildcard returns any file paths matching the expanded file path. So

    COPY table_1 FROM ".../subdir/file*";

    returns file_3, file_4.

    Globbing does not apply to S3, because file paths specified for S3 always use prefix matching.

    hashtag
    Filtering

    Use file filtering to filter out unwanted files that have been globbed. To use filtering, specify the REGEX_PATH_FILTER option. Files not matching this pattern are not included on import. This behavior is consistent across local and S3 use cases.

    The following regex expression:

    COPY table_1 from ".../" WITH (REGEX_PATH_FILTER=".*file_[4-5]");

    returns file_4, file_5.

    hashtag
    Sorting

    Use the FILE_SORT_ORDER_BY option to specify the order in which files are imported.

    FILE_SORT_ORDER_BY Options

    • pathname (default)

    • date_modified

    • regex

    *FILE_SORT_REGEX option required

    Using FILE_SORT_ORDER_BY

    COPY table_1 from ".../" WITH (FILE_SORT_ORDER_BY="date_modified");

    Using FILE_SORT_ORDER_BY with FILE_SORT_REGEX

    Regex sort keys are formed by the concatenation of all capture groups from the FILE_SORT_REGEX expression. Regex sort keys are strings but can be converted to dates or FLOAT64 with the appropriate FILE_SORT_ORDER_BY option. File paths that do not match the provided capture groups or that cannot be converted to the appropriate date or FLOAT64 are treated as NULLs and sorted to the front in a deterministic order.

    Multiple Capture Groups:

    FILE_SORT_REGEX=".*/data_(.*)_(.*)_"
    /root/dir/unmatchedFile → <NULL>
    /root/dir/data_andrew_54321_ → andrew54321
    /root/dir2/data_brian_Josef_ → brianJosef

    Dates:

    FILE_SORT_REGEX=".*data_(.*)"
    /root/data_222 → <NULL> (invalid date conversion)
    /root/data_2020-12-31 → 2020-12-31
    /root/dir/data_2021-01-01 → 2021-01-01

    Import:

    COPY table_1 from ".../" WITH (FILE_SORT_ORDER_BY="regex", FILE_SORT_REGEX=".*file_(.)");

    hashtag
    Geo and Raster File Globbing

    Limited filename globbing is supported for both geo and raster import. For example, to import a sequence of same-format GeoTIFF files into a single table, you can run the following:

    COPY table FROM '/path/path/something_*.tiff' WITH (source_type='raster_file')

    The files are imported in alphanumeric sort order, per regular glob rules, and all appended to the same table. This may fail if the files are not all of the same format (band count, names, and types).

    circle-info

    For non-geo/raster files (CSV and Parquet), you can provide just the path to the directory OR a wildcard; for example:

    /path/to/directory/
    /path/to/directory
    /path/to/directory/*

    For geo/raster files, a wildcard is required, as shown in the last example.

    hashtag
    SQLImporter

    SQLImporter is a Java utility run at the command line. It runs a SELECT statement on another database through JDBC and loads the result set into HeavyDB.

    hashtag
    Usage

    hashtag
    Flags

    HEAVY.AI recommends that you use a service account with read-only permissions when accessing data from a remote database.

    circle-info

    In release 4.6 and higher, the user ID (-u) and password (-p) flags are required. If your password includes a special character, you must escape the character using a backslash (\).

    If the table does not exist in HeavyDB, SQLImporter creates it. If the target table in HeavyDB does not match the SELECT statement metadata, SQLImporter fails.

    If the truncate flag is used, SQLImporter truncates the table in HeavyDB before transferring the data. If the truncate flag is not used, SQLImporter appends the results of the SQL statement to the target table in HeavyDB.

    The -i argument provides a path to an initialization file. Each line of the file is sent as a SQL statement to the remote database. You can use -i to set additional custom parameters before the data is loaded.

    circle-info

    The SQLImporter string is case-sensitive. Incorrect case returns the following:

    Error: Could not find or load main class com.mapd.utility.SQLimporter

    hashtag
    PostgreSQL/PostGIS Support

    You can migrate geo data types from a PostgreSQL database. The following table shows the correlation between PostgreSQL/PostGIS geo types and HEAVY.AI geo types.

    point → POINT

    Other PostgreSQL types, including circle, box, and path, are not supported.

    hashtag
    HeavyDB Example

    circle-info

    By default, 100,000 records are selected from HeavyDB. To select a larger number of records, use a LIMIT clause in the SELECT statement.
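    For example, the SELECT statement passed to SQLImporter can raise that ceiling explicitly (the table name is illustrative):

    ```sql
    -- Select up to one million records instead of the 100,000 default
    SELECT * FROM source_table LIMIT 1000000;
    ```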

    hashtag
    Hive Example

    hashtag
    Google BigQuery Example

    hashtag
    PostgreSQL Example

    hashtag
    SQLServer Example

    hashtag
    MySQL Example

    hashtag
    StreamInsert

    Stream data into HeavyDB by attaching the StreamInsert program to the end of a data stream. The data stream can be another program printing to standard out, a Kafka endpoint, or any other real-time stream output. You can specify the appropriate batch size, according to the expected stream rates and your insert frequency. The target table must exist before you attempt to stream data into the table.

    Setting
    Default
    Description


    hashtag
    Example

    hashtag
    Importing AWS S3 Files

    You can use the SQL COPY FROM statement to import files stored on Amazon Web Services Simple Storage Service (AWS S3) into a HEAVY.AI table, in much the same way you would with local files. In the WITH clause, specify the S3 credentials and region information of the bucket being accessed.

    Access key and secret key, or session token if using temporary credentials, and region are required.

    circle-info

    HEAVY.AI does not support the use of asterisks (*) in URL strings to import items. To import multiple files, pass in an S3 path instead of a file name, and COPY FROM imports all items in that path and any subpath.

    hashtag
    Custom S3 Endpoints

    HEAVY.AI supports custom S3 endpoints, which allows you to import data from S3-compatible services, such as Google Cloud Storage.

    To use custom S3 endpoints, add s3_endpoint to the WITH clause of a COPY FROM statement; for example, to set the S3 endpoint to point to Google Cloud Services:
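    A sketch of such a COPY statement; the bucket, table, and credential values are placeholders, and the s3_access_key/s3_secret_key option names are assumptions:

    ```sql
    COPY trips FROM 's3://my-gcs-bucket/trips.csv'
    WITH (s3_endpoint = 'storage.googleapis.com',
          s3_region = 'us-west-1',
          s3_access_key = '<access-key>',   -- option name assumed
          s3_secret_key = '<secret-key>');  -- option name assumed
    ```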


    circle-info

    You can also configure custom S3 endpoints by passing the s3_endpoint field to Thrift import_table.

    hashtag
    Examples

    The following examples show failed and successful attempts to copy the table from AWS S3.

    The following example imports all the files in the trip.compressed directory.
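    A sketch of a path-based import; the bucket path and credential values are placeholders (the s3_access_key/s3_secret_key option names are assumptions):

    ```sql
    -- Importing an S3 path loads all items in that path and any subpath
    COPY trips FROM 's3://my-bucket/trip.compressed/'
    WITH (s3_region = 'us-west-1',
          s3_access_key = '<access-key>',   -- option name assumed
          s3_secret_key = '<secret-key>');  -- option name assumed
    ```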

    hashtag
    trips Table

    The table trips is created with the following statement:

    hashtag
    Using Server Privileges to Access AWS S3

    You can configure the HEAVY.AI server to provide AWS credentials, which allows S3 queries to be run without specifying AWS credentials. S3 regions are not configured by the server and must be passed in either as a client-side environment variable or as an option with the request.

    Example Commands

    • \detect:
      $ export AWS_REGION=us-west-1
      heavysql> \detect <s3-bucket-uri>

    • import_table:
      $ ./Heavyai-remote -h localhost:6274 import_table "'<session-id>'" "<table-name>" '<s3-bucket-uri>' 'TCopyParams(s3_region="'us-west-1'")'

    hashtag
    Configuring AWS Credentials

    1. Enable server privileges in the server configuration file heavy.conf:
       allow-s3-server-privileges = true

    2. For bare metal installations, set the following environment variables and restart the HeavyDB service:
       AWS_ACCESS_KEY_ID=xxx
       AWS_SECRET_ACCESS_KEY=xxx
       AWS_SESSION_TOKEN=xxx

    hashtag
    KafkaImporter

    You can ingest data from an existing Kafka producer to an existing table in HEAVY.AI using KafkaImporter on the command line:

    circle-info

    KafkaImporter requires a functioning Kafka cluster.

    hashtag
    KafkaImporter Options

    Setting
    Default
    Description

    hashtag
    KafkaImporter Logging Options


    Configure KafkaImporter to use your target table. KafkaImporter listens to a pre-defined Kafka topic associated with your table. You must create the table before using the KafkaImporter utility. For example, you might have a table named customer_site_visit_events that listens to a topic named customer_site_visit_events_topic.

    The data format must be a record-level format supported by HEAVY.AI.

    KafkaImporter listens to the topic, validates records against the target schema, and ingests topic batches of your designated size to the target table. Rejected records use the existing reject reporting mechanism. You can start, shut down, and configure KafkaImporter independent of the HeavyDB engine. If KafkaImporter is running and the database shuts down, KafkaImporter shuts down as well. Reads from the topic are nondestructive.

    KafkaImporter is not responsible for event ordering; a streaming platform outside HEAVY.AI (for example, Spark Streaming or Flink) should handle the stream processing. HEAVY.AI ingests the end-state stream of post-processed events.

    KafkaImporter does not handle dynamic schema creation on first ingest, but must be configured with a specific target table (and its schema) as the basis. There is a 1:1 correspondence between target table and topic.

    hashtag
    StreamImporter

    StreamImporter is an updated version of the StreamInsert utility used for streaming reads from delimited files into HeavyDB. StreamImporter uses a binary columnar load path, providing improved performance compared to StreamInsert.

    You can ingest data from a data stream to an existing table in HEAVY.AI using StreamImporter on the command line.

    hashtag
    StreamImporter Options

    Setting
    Default
    Description

    hashtag
    StreamImporter Logging Options

    Setting
    Default
    Description

    Configure StreamImporter to use your target table. StreamImporter listens to a pre-defined data stream associated with your table. You must create the table before using the StreamImporter utility.

    The data format must be a record-level format supported by HEAVY.AI.

    StreamImporter listens to the stream, validates records against the target schema, and ingests batches of your designated size to the target table. Rejected records use the existing reject reporting mechanism. You can start, shut down, and configure StreamImporter independent of the HeavyDB engine. If StreamImporter is running but the database shuts down, StreamImporter shuts down as well. Reads from the stream are non-destructive.

    StreamImporter is not responsible for event ordering; a first-class streaming platform outside HEAVY.AI (for example, Spark Streaming or Flink) should handle the stream processing. HEAVY.AI ingests the end-state stream of post-processed events.

    StreamImporter does not handle dynamic schema creation on first ingest, but must be configured with a specific target table (and its schema) as the basis. There is a 1:1 correspondence between the target table and the stream.

    hashtag
    Importing Data from HDFS with Sqoop

    You can consume a CSV or Parquet file residing in HDFS (Hadoop Distributed File System) into HeavyDB.

    Copy the HEAVY.AI JDBC driver into the Apache Sqoop library, normally found at /usr/lib/sqoop/lib/.

    hashtag
    Example

    The following is a straightforward import command. For more information on options and parameters for using Apache Sqoop, see the Apache Sqoop user guide.

    The --connect parameter is the address of a valid JDBC port on your HEAVY.AI instance.

    hashtag
    Troubleshooting: Avoiding Duplicate Rows

    To detect duplication prior to loading data into HeavyDB, you can perform the following steps. For this example, the files are labeled A, B, C, ... Z.

    1. Load file A into table MYTABLE.

    2. Run the following query.

      There should be no rows returned; if rows are returned, your first A file is not unique.
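    One way to express that check, assuming MYTABLE has a column (or combination of columns) expected to be unique, here called record_id:

    ```sql
    -- Any row returned indicates a duplicate record_id within file A
    SELECT record_id, COUNT(*) AS copies
    FROM MYTABLE
    GROUP BY record_id
    HAVING COUNT(*) > 1;
    ```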

    Size of the input file buffer, in bytes.

    8388608

    delimiter

    A single-character string for the delimiter between input fields; most commonly:

    • , for CSV files

    • \t for tab-delimited files

    Other delimiters include ~, ^, and ;.

    Note: HEAVY.AI does not use file extensions to determine the delimiter.

    escape

    A single-character string for escaping quotes.

    '"' (double quote)

    geo

    Import geo data. Deprecated and scheduled for removal in a future release.

    'false'

    header

    Either 'true' or 'false', indicating whether the input file has a header line in Line 1 that should be skipped.

    'true'

    line_delimiter

    A single-character string for terminating each line.

    '\n'

    lonlat

    In HEAVY.AI, POINT fields require longitude before latitude. Use this parameter based on the order of longitude and latitude in your source data.

    'true'

    max_reject

    Number of records that the COPY statement allows to be rejected before terminating the COPY command. Records can be rejected for a number of reasons, including invalid content in a field, or an incorrect number of columns. The details of the rejected records are reported in the ERROR log. COPY returns a message identifying how many records are rejected. The records that are not rejected are inserted into the table, even if the COPY stops because the max_reject count is reached.

    Note: If you run the COPY command from Heavy Immerse, the COPY command does not return messages to Immerse once the SQL is verified. Immerse does not show messages about data loading, or about data-quality issues that result in max_reject triggers.

    100,000

    nulls

    A string pattern indicating that a field is NULL.

    An empty string, 'NA', or

    parquet

    Import data in Parquet format. Parquet files can be compressed using Snappy. Other archives such as .gz or .zip must be unarchived before you import the data. Deprecated and scheduled for removal in a future release.

    'false'

    plain_text

    Indicates that the input file is plain text so that it bypasses the libarchive decompression utility.

    CSV, TSV, and TXT are handled as plain text.

    quote

    A single-character string for quoting a field.

    " (double quote). All characters inside quotes are imported “as is,” except for line delimiters.

    quoted

    Either 'true' or 'false', indicating whether the input file contains quoted fields.

    'true'

    source_srid

    When importing into GEOMETRY(*, 4326) columns, specifies the SRID of the incoming geometries, all of which are transformed on the fly. For example, to import from a file that contains EPSG:2263 (NAD83 / New York Long Island) geometries, run the COPY command and include WITH (source_srid=2263). Data targeted at non-4326 geometry columns is not affected.

    0

    source_type='<type>'

    Type can be one of the following:

    delimited_file - Import as CSV.

    geo_file - Import as Geo file. Use for shapefiles, GeoJSON, and other geo files. Equivalent to deprecated geo='true'.

    raster_file - Import as a raster file.

    parquet_file - Import as a Parquet file. Equivalent to deprecated parquet='true'.

    delimited_file

    threads

    Number of threads for performing the data import.

    Number of CPU cores on the system

    trim_spaces

    Indicate whether to trim side spaces ('true') or not ('false').

    'false'

    COPY FROM appends data from the source into the target table. It does not truncate the table or overwrite existing data.
  • Supported DATE formats when using COPY FROM include mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, and dd/mmm/yyyy.

  • COPY FROM fails for records with latitude or longitude values that have more than 4 decimal places.

  • Explodes MULTIPOLYGON, MULTILINESTRING, or MULTIPOINT geo data into multiple rows in a POLYGON, LINESTRING, or POINT column, with all other columns duplicated.

    When importing from a WKT CSV with a MULTIPOLYGON column, the table must have been manually created with a POLYGON column.

    When importing from a geo file, the table is automatically created with the correct type of column.

    When the input column contains a mixture of MULTI and single geo, the MULTI geo are exploded, but the singles are imported normally. For example, a column containing five two-polygon MULTIPOLYGON rows and five POLYGON rows imports as a POLYGON column of fifteen rows.

    false

    geo_validate_geometry

    Boolean. If enabled, the importer passes any incoming POLYGON or MULTIPOLYGON data through a validation process. If the geo is considered invalid by OGC (PostGIS) standards (for example, self-intersecting polygons), then the row or feature that contains it is rejected.

    This option is available only if the optional is installed; otherwise invoking the option throws an error.

    Specifies the required type for the additional pixel coordinate columns: auto - Create columns based on raster file type (double for geo, int or smallint for non-geo, dependent on size).

    none - Do not create pixel coordinate columns.

    smallint or int - Create integer columns named raster_x and raster_y.

    auto

    Raster point type is no longer a necessary option when using the tiled importer, because this import process requires the column types to be specified before import; the point type can therefore be deduced from the column types.

    Password credential for the RDBMS. This option applies only when data_source_name is used.

    credential_string

    A set of semicolon-separated “key=value” pairs, which define the access credential parameters for an RDBMS. For example:

    Username=username;Password=password

    Applies only when connection_string is used.

  • regex_date *

  • regex_number *

  • -u

    n/a

    User name

    -p

    n/a

    User password

    --host

    n/a

    Name of HEAVY.AI host

    --delim

    comma (,)

    Field delimiter, in single quotes

    --line

    newline (\n)

    Line delimiter, in single quotes

    --batch

    10000

    Number of records in a batch

    --retry_count

    10

    Number of attempts before job fails

    --retry_wait

    5

    Wait time in seconds after server connection failure

    --null

    n/a

    String that represents null values

    --port

    6274

    Port number for HeavyDB on localhost

    -t | --transform

    n/a

    Regex transformation

    --print_error

    False

    Print error messages

    --print_transform

    False

    Print description of transform.

    --help

    n/a

    List options

    • COPY FROM:
      heavysql> COPY <table-name> FROM <s3-bucket-uri> WITH (s3_region='us-west-1');

    (required only for AWS STS credentials)
    • For HeavyDB Docker images, start a new container mounted with the configuration file using the option:
      -v <dirname-containing-heavy.conf>:/var/lib/heavyai
      and set the following environment options:
      -e AWS_ACCESS_KEY_ID=xxx
      -e AWS_SECRET_ACCESS_KEY=xxx
      -e AWS_SESSION_TOKEN=xxx (required only for AWS STS credentials)

    1. Enable server privileges in the server configuration file heavy.conf:
       allow-s3-server-privileges = true

    2. For bare metal installations, specify a shared AWS credentials file and profile with the following environment variables and restart the HeavyDB service:
       AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
       AWS_PROFILE=default

    3. For HeavyDB Docker images, start a new container mounted with the configuration file and AWS shared credentials file using the following options:
       -v <dirname-containing-/heavy.conf>:/var/lib/heavyai
       -v <dirname-containing-/credentials>:/<container-credential-path>
       and set the following environment options:
       -e AWS_SHARED_CREDENTIALS_FILE=<container-credential-path>
       -e AWS_PROFILE=<active-profile>

    Prerequisites

    1. An IAM Policy that has sufficient access to the S3 bucket.

    2. An IAM AWS Service Role of type Amazon EC2, which is assigned the IAM Policy from step 1.

    Setting Up an EC2 Instance with Roles

    For a new EC2 Instance:

    1. AWS Management Console > Services > Compute > EC2 > Launch Instance.

    2. Select desired Amazon Machine Image (AMI) > Select.

    3. Select desired Instance Type > Next: Configure Instance Details.

    For an existing EC2 Instance:

    1. AWS Management Console > Services > Compute > EC2 > Instances.

    2. Mark desired instance(s) > Actions > Security > Modify IAM Role.

    3. Select desired IAM Role > Save.

    -u <username>

    n/a

    User name

    -p <password>

    n/a

    User password

    --host <hostname>

    localhost

    Name of HEAVY.AI host

    --port <port_number>

    6274

    Port number for HeavyDB on localhost

    --http

    n/a

    Use HTTP transport

    --https

    n/a

    Use HTTPS transport

    --skip-verify

    n/a

    Do not verify validity of SSL certificate

    --ca-cert <path>

    n/a

    Path to the trusted server certificate; initiates an encrypted connection

    --delim <delimiter>

    comma (,)

    Field delimiter, in single quotes

    --line <delimiter>

    newline (\n)

    Line delimiter, in single quotes

    --batch <batch_size>

    10000

    Number of records in a batch

    --retry_count <retry_number>

    10

    Number of attempts before job fails

    --retry_wait <seconds>

    5

    Wait time in seconds after server connection failure

    --null <string>

    n/a

    String that represents null values

    --quoted <boolean>

    false

    Whether the source contains quoted fields

    -t | --transform

    n/a

    Regex transformation

    --print_error

    false

    Print error messages

    --print_transform

    false

    Print description of transform

    --help

    n/a

    List options

    --group-id <id>

    n/a

    Kafka group ID

    --topic <topic>

    n/a

    The Kafka topic to be ingested

    --brokers <broker_name:broker_port>

    localhost:9092

    One or more brokers

    n/a

    Log filename relative to logging directory; has format KafkaImporter.{SEVERITY}.%Y%m%d-%H%M%S.log

    --log-symlink <symlink>

    n/a

    Symlink to active log; has format KafkaImporter.{SEVERITY}

    --log-severity <level>

    INFO

    Log-to-file severity level: INFO, WARNING, ERROR, or FATAL

    --log-severity-clog <level>

    ERROR

    Log-to-console severity level: INFO, WARNING, ERROR, or FATAL

    --log-channels

    n/a

    Log channel debug info

    --log-auto-flush

    n/a

    Flush logging buffer to file after each message

    --log-max-files <files_number>

    100

    Maximum number of log files to keep

    --log-min-free-space <bytes>

    20,971,520

    Minimum number of bytes available on the device before oldest log files are deleted

    --log-rotate-daily

    1

    Start new log files at midnight

    --log-rotation-size <bytes>

    10485760

    Maximum file size, in bytes, before new log files are created

    -u <username>

    n/a

    User name

    -p <password>

    n/a

    User password

    --host <hostname>

    n/a

    Name of OmniSci host

    --port <port>

    6274

    Port number for OmniSciDB on localhost

    --http

    n/a

    Use HTTP transport

    --https

    n/a

    Use HTTPS transport

    --skip-verify

    n/a

    Do not verify validity of SSL certificate

    --ca-cert <path>

    n/a

    Path to the trusted server certificate; initiates an encrypted connection

    | Setting | Default | Description |
    | --- | --- | --- |
    | `--delim <delimiter>` | comma (`,`) | Field delimiter, in single quotes |
    | `--null <string>` | n/a | String that represents null values |
    | `--line <delimiter>` | newline (`\n`) | Line delimiter, in single quotes |
    | `--quoted <boolean>` | true | Either true or false, indicating whether the input file contains quoted fields |
    | `--batch <number>` | 10000 | Number of records in a batch |
    | `--retry_count <retry_number>` | 10 | Number of attempts before job fails |
    | `--retry_wait <seconds>` | 5 | Wait time in seconds after server connection failure |
    | `-t`, `--transform` | n/a | Regex transformation |
    | `--print_error` | false | Print error messages |
    | `--print_transform` | false | Print description of transform |
    | `--help` | n/a | List options |
    | `--log-symlink <symlink>` | n/a | Symlink to active log; has format `StreamImporter.{SEVERITY}` |
    | `--log-severity <level>` | INFO | Log-to-file severity level: INFO, WARNING, ERROR, or FATAL |
    | `--log-severity-clog <level>` | ERROR | Log-to-console severity level: INFO, WARNING, ERROR, or FATAL |
    | `--log-channels` | n/a | Log channel debug info |
    | `--log-auto-flush` | n/a | Flush logging buffer to file after each message |
    | `--log-max-files <files_number>` | 100 | Maximum number of log files to keep |
    | `--log-min-free-space <bytes>` | 20,971,520 | Minimum number of bytes available on the device before oldest log files are deleted |
    | `--log-rotate-daily` | 1 | Start new log files at midnight |
    | `--log-rotation-size <bytes>` | 10485760 | Maximum file size, in bytes, before new log files are created |

  • Load file B into table TEMPTABLE.
  • Run the following query.

    There should be no rows returned if file B is unique. If the information is not unique, fix file B using details from the selection.

  • Load the fixed B file into MYTABLE.

  • Drop table TEMPTABLE.

  • Repeat steps 3-6 for each remaining file in the set before loading the data into the real MYTABLE instance.
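    The staging-table check above can be sketched end to end. The sketch below is purely illustrative: it uses an in-memory SQLite database in place of the server, the placeholder names TEMPTABLE and uCol from the steps above, and a GROUP BY/HAVING query as one way to surface duplicate keys.

    ```python
    import sqlite3

    # In-memory database stands in for the real server; TEMPTABLE and uCol
    # are the placeholder names used in the steps above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEMPTABLE (uCol TEXT)")

    # Simulate loading "file B", which contains one duplicated key.
    conn.executemany("INSERT INTO TEMPTABLE (uCol) VALUES (?)",
                     [("a",), ("b",), ("b",), ("c",)])

    # Group on the supposedly unique column and keep only groups with
    # more than one row; any result means file B must be fixed.
    dups = conn.execute(
        "SELECT uCol, COUNT(*) FROM TEMPTABLE "
        "GROUP BY uCol HAVING COUNT(*) > 1"
    ).fetchall()

    print(dups)  # [('b', 2)] -> fix file B before loading it into MYTABLE
    ```

    An empty result here corresponds to the "no rows returned" condition in the steps above.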

    | Parameter | Description | Default |
    | --- | --- | --- |
    | `array_delimiter` | A single-character string for the delimiter between input values contained within an array. | `,` (comma) |
    | `array_marker` | A two-character string consisting of the start and end characters surrounding an array. | `{ }` (curly brackets). For example, data to be inserted into a table with a string array in the second column (for example, BOOLEAN, STRING[], INTEGER) can be written as `true,{value1,value2,value3},3` |

    | Source | Syntax |
    | --- | --- |
    | Local server | `COPY [tableName] FROM '/filepath' WITH (source_type='geo_file', ...);` |
    | Web site | `COPY [tableName] FROM '[http\|https]://website/filepath' WITH (source_type='geo_file', ...);` |
    | Amazon S3 | `COPY [tableName] FROM 's3://bucket/filepath' WITH (source_type='geo_file', s3_region='region', s3_access_key='accesskey', s3_secret_key='secretkey', ...);` |

    | Option | Description | Default |
    | --- | --- | --- |
    | `geo_coords_type` | Coordinate type used; must be `geography`. | N/A |
    | `geo_coords_encoding` | Coordinates encoding; can be `geoint(32)` or `none`. | `geoint(32)` |
    | `geo_coords_srid` | Coordinates spatial reference; must be 4326 (WGS84 longitude/latitude). | N/A |

    | Option | Description | Default |
    | --- | --- | --- |
    | `raster_import_bands='<bandname>[,<bandname>,...]'` | Allows specification of one or more band names to selectively import; useful in the context of large raster files where not all the bands are relevant. Bands are imported in the order provided, regardless of order in the file. You can rename bands using `<bandname>=<newname>[,<bandname>=<newname>,...]`. Names must be those discovered by the detection process, including any suffixes for de-duplication. | An empty string, indicating to import all bands from all datasets found in the file. |
    | `raster_point_transform='<transform>'` | Specifies the processing for floating-point coordinate values: `auto` - transform based on raster file type (`world` for geo, `none` for non-geo); `none` - no affine or world-space conversion, values are equivalent to the integer pixel coordinates; `file` - file-space affine transform only, values are in the file's coordinate system, if any (e.g. geospatial); `world` - world-space geospatial transform, values are projected to WGS84 lon/lat (if the file has a geospatial SRID). | `auto` |
    | `raster_tile_width` | Specifies the file/block width by which the raster data should be grouped. If none is specified, defaults to the block size provided by the file. | auto |
    | `raster_tile_height` | Specifies the file/block height by which the raster data should be grouped. If none is specified, defaults to the block size provided by the file. | auto |

    | Option | Description |
    | --- | --- |
    | `data_source_name` | Data source name (DSN) configured in the `odbc.ini` file. Only one of `data_source_name` or `connection_string` can be specified. |
    | `connection_string` | A set of semicolon-separated key=value pairs that define the connection parameters for an RDMS. For example: `Driver=DriverName;Database=DatabaseName;Servername=HostName;Port=1234`. Only one of `data_source_name` or `connection_string` can be specified. |
    | `sql_order_by` | Comma-separated list of column names that provide a unique ordering for the result set returned by the specified SQL SELECT statement. |
    | `username` | Username on the RDMS. Applies only when `data_source_name` is used. |
    | `password` | Password on the RDMS. Applies only when `data_source_name` is used. |

    | Source type | HeavyDB type |
    | --- | --- |
    | lseg | linestring |
    | linestring | linestring |
    | polygon | polygon |
    | multipolygon | multipolygon |

    | Setting | Default | Description |
    | --- | --- | --- |
    | `<table_name>` | n/a | Name of the target table in OmniSci |
    | `<database_name>` | n/a | Name of the target database in OmniSci |
    | `--log-directory <directory>` | mapd_log | Logging directory; can be relative to data directory or absolute |
    | `--log-file-name <filename>` | n/a | Log filename relative to logging directory; has format `StreamImporter.{SEVERITY}.%Y%m%d-%H%M%S.log` |

    Importing an ESRI File Geodatabase
    Importing Geospatial Files
    geo files
    ODBC Data Wrapper Reference
    RegEx Replace
    https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
    Cloud Storage Interoperability
    trips
    Kafka website
    Confluent schema registry documentation
    sqoop.apache.org


    select t1.uniqueCol from MYTABLE t1 join TEMPTABLE t2 on t1.uniqueCol = t2.uniqueCol;
    COPY <table> FROM '<file pattern>' [WITH (<property> = value, ...)];
    COPY tweets FROM '/tmp/tweets.csv' WITH (nulls = 'NA'); 
    COPY tweets FROM '/tmp/tweets.tsv' WITH (delimiter = '\t', quoted = 'false'); 
    COPY tweets FROM '/tmp/*' WITH (header='false'); 
    COPY trips FROM '/mnt/trip/trip.parquet/part-00000-0284f745-1595-4743-b5c4-3aa0262e4de3-c000.snappy.parquet' with (parquet='true');
    COPY FROM 'source' WITH (source_type='geo_file', ...);
    COPY tableName FROM 'source' WITH (source_type='geo_file', ...);
    ggpoly GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32)
    COPY FROM 'source' WITH (source_type='raster_file', ...);
    CREATE TABLE <table_name> (raster_lon DOUBLE, raster_lat DOUBLE, <expected_band_columns>);
    
    COPY <table_name> FROM 'source' WITH (source_type='raster_file', ...);
    COPY <table_name> FROM '<select_query>' WITH (source_type = 'odbc', ...);
    COPY example_table
      FROM 'SELECT * FROM remote_postgres_table WHERE event_timestamp > ''2020-01-01'';'
      WITH 
        (source_type = 'odbc', 
         sql_order_by = 'event_timestamp',
         data_source_name = 'postgres_db_1',
         username = 'my_username',
         password = 'my_password');
    COPY example_table
      FROM 'SELECT * FROM remote_postgres_table WHERE event_timestamp > ''2020-01-01'';'
      WITH 
        (source_type = 'odbc',
         sql_order_by = 'event_timestamp',
         connection_string = 'Driver=PostgreSQL;Database=my_postgres_db;Servername=my_postgres.example.com;Port=1234',
         credential_string = 'Username=my_username;Password=my_password');
    java -cp [HEAVY.AI utility jar file]:[3rd party JDBC driver]
    SQLImporter
    -u <userid> -p <password> [(--binary|--http|--https [--insecure])]
    -s <heavyai server host> -db <heavyai db> --port <heavyai server port>
    [-d <other database JDBC driver class>] -c <other database JDBC connection string>
    -su <other database user> -sp <other database user password> -ss <other database sql statement>
    -t <HEAVY.AI target table> -b <transfer buffer size> -f <table fragment size>
    [-tr] [-nprg] [-adtf] [-nlj] -i <init commands file>
    -r <arg>                               Row load limit 
    -h,--help                              Help message 
    -u,--user <arg>                        HEAVY.AI user 
    -p,--passwd <arg>                      HEAVY.AI password 
    --binary                               Use binary transport to connect to HEAVY.AI 
    --http                                 Use http transport to connect to HEAVY.AI 
    --https                                Use https transport to connect to HEAVY.AI 
    -s,--server <arg>                      HEAVY.AI Server 
    -db,--database <arg>                   HEAVY.AI Database 
    --port <arg>                           HEAVY.AI Port 
    --ca-trust-store <arg>                 CA certificate trust store 
    --ca-trust-store-passwd <arg>          CA certificate trust store password 
    --insecure <arg>                       Insecure TLS - Do not validate HEAVY.AI 
                                           server certificates 
    -d,--driver <arg>                      JDBC driver class 
    -c,--jdbcConnect <arg>                 JDBC connection string 
    -su,--sourceUser <arg>                 Source user 
    -sp,--sourcePasswd <arg>               Source password 
    -ss,--sqlStmt <arg>                    SQL Select statement 
    -t,--targetTable <arg>                 HEAVY.AI Target Table 
    -b,--bufferSize <arg>                  Transfer buffer size 
    -f,--fragmentSize <arg>                Table fragment size 
    -tr,--truncate                         Truncate table if it exists 
    -nprg,--noPolyRenderGroups             Disable render group assignment  
    -adtf,--allowDoubleToFloat             Allow narrow casting
    -nlj,--no-log-jdbc-connection-string   Omit JDBC connection string from logs   
    -i,--initializeFile <arg>              File containing init command for DB
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar 
    com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db heavyai --port 6274 
    -t mytable -su admin -sp HyperInteractive -c "jdbc:heavyai:myhost:6274:heavyai" 
    -ss "select * from mytable limit 1000000000"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:/hive-jdbc-1.2.1000.2.6.1.0-129-standalone.jar
    com.mapd.utility.SQLImporter
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password
    -c "jdbc:hive2://server_address:port_number/database_name"
    -ss "select * from source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:./GoogleBigQueryJDBC42.jar:
    ./google-oauth-client-1.22.0.jar:./google-http-client-jackson2-1.22.0.jar:./google-http-client-1.22.0.jar:./google-api-client-1.22.0.jar:
    ./google-api-services-bigquery-v2-rev355-1.22.0.jar 
    com.mapd.utility.SQLImporter
    -d com.simba.googlebigquery.jdbc42.Driver 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=project-id;OAuthType=0;
    [email protected];OAuthPvtKeyPath=/home/simba/myproject.json;"
    -ss "select * from schema.source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:/tmp/postgresql-42.2.5.jar 
    com.mapd.utility.SQLImporter 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:postgresql://127.0.0.1/postgres"
    -ss "select * from schema_name.source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:/path/sqljdbc4.jar
    com.mapd.utility.SQLImporter
    -d com.microsoft.sqlserver.jdbc.SQLServerDriver 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:sqlserver://server:port;DatabaseName=database_name"
    -ss "select top 10 * from dbo.source_table_name"
    java -cp /opt/heavyai/bin/heavyai-utility-<db-version>.jar:mysql/mysql-connector-java-5.1.38-bin.jar
    com.mapd.utility.SQLImporter 
    -u user -p password
    -db Heavyai_database_name --port 6274 -t Heavyai_table_name
    -su source_user -sp source_password 
    -c "jdbc:mysql://server:port/database_name"
    -ss "select * from schema_name.source_table_name"
    <data stream> | StreamInsert <table name> <database name> \
    {-u|--user} <user> {-p|--passwd} <password> [{--host} <hostname>] \
    [--port <port number>][--delim <delimiter>][--null <null string>] \
    [--line <line delimiter>][--batch <batch size>][{-t|--transform} \
    transformation ...][--retry_count <num_of_retries>] \
    [--retry_wait <wait in secs>][--print_error][--print_transform]
    cat file.tsv | /path/to/heavyai/SampleCode/StreamInsert stream_example \
    heavyai --host localhost --port 6274 -u imauser -p imapassword \
    --delim '\t' --batch 1000
    COPY <table> FROM '<S3_file_URL>' WITH ([[s3_access_key = '<key_name>', s3_secret_key = '<key_secret>',] | [s3_session_token = '<AWS_session_token>',]] s3_region = '<region>');
    COPY trips FROM 's3://heavyai-importtest-data/trip-data/trip_data_9.gz' WITH (header='true', s3_endpoint='storage.googleapis.com');
    heavysql> COPY trips FROM 's3://heavyai-s3-no-access/trip_data_9.gz';
    Exception: failed to list objects of s3 url 's3://heavyai-s3-no-access/trip_data_9.gz': AccessDenied: Access Denied
    heavysql> COPY trips FROM 's3://heavyai-s3-no-access/trip_data_9.gz' with (s3_access_key='xxxxxxxxxx',s3_secret_key='yyyyyyyyy');
    Exception: failed to list objects of s3 url 's3://heavyai-s3-no-access/trip_data_9.gz': AuthorizationHeaderMalformed: Unable to parse ExceptionName: AuthorizationHeaderMalformed Message: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-1'
    heavysql> COPY trips FROM 's3://heavyai-testdata/trip.compressed/trip_data_9.csv' with (s3_access_key='xxxxxxxx',s3_secret_key='yyyyyyyy',s3_region='us-west-1');
    Result
    Loaded: 100 recs, Rejected: 0 recs in 0.361000 secs
    heavysql> copy trips from 's3://heavyai-testdata/trip.compressed/' with (s3_access_key='xxxxxxxx',s3_secret_key='yyyyyyyy',s3_region='us-west-1');
    Result
    Loaded: 105200 recs, Rejected: 0 recs in 1.890000 secs
    heavysql> \d trips
            CREATE TABLE trips (
            medallion TEXT ENCODING DICT(32),
            hack_license TEXT ENCODING DICT(32),
            vendor_id TEXT ENCODING DICT(32),
            rate_code_id SMALLINT,
            store_and_fwd_flag TEXT ENCODING DICT(32),
            pickup_datetime TIMESTAMP,
            dropoff_datetime TIMESTAMP,
            passenger_count SMALLINT,
            trip_time_in_secs INTEGER,
            trip_distance DECIMAL(14,2),
            pickup_longitude DECIMAL(14,2),
            pickup_latitude DECIMAL(14,2),
            dropoff_longitude DECIMAL(14,2),
            dropoff_latitude DECIMAL(14,2))
    WITH (FRAGMENT_SIZE = 75000000);
    KafkaImporter <table_name> <database_name> {-u|--user} <user_name> \
    {-p|--passwd} <user_password> [{--host} <hostname>] \
    [--port <HeavyDB_port>] [--http] [--https] [--skip-verify] \
    [--ca-cert <path>] [--delim <delimiter>] [--batch <batch_size>] \
    [{-t|--transform} transformation ...] [--retry_count <retry_number>] \
    [--retry_wait <delay_in_seconds>] --null <null_value_string> [--quoted true|false] \
    [--line <line_delimiter>] --brokers=<broker_name:broker_port> \ 
    --group-id=<kafka_group_id> --topic=<topic_type> [--print_error] [--print_transform]
    cat tweets.tsv | ./KafkaImporter tweets_small heavyai -u imauser -p imapassword --delim '\t' --batch 100000 --retry_count 360 --retry_wait 10 --null null --port 9999 --brokers=localhost:9092 --group-id=testImport1 --topic=tweet
    cat tweets.tsv | ./KafkaImporter tweets_small heavyai
    -u imauser
    -p imapassword
    --delim '\t'
    --batch 100000
    --retry_count 360
    --retry_wait 10
    --null null
    --port 9999
    --brokers=localhost:9092
    --group-id=testImport1
    --topic=tweet
    StreamImporter <table_name> <database_name> {-u|--user} <user_name> \
    {-p|--passwd} <user_password> [{--host} <hostname>] [--port <HeavyDB_port>] \
    [--http] [--https] [--skip-verify] [--ca-cert <path>] [--delim <delimiter>] \
    [--null <null string>] [--line <line delimiter>] [--quoted <boolean>] \
    [--batch <batch_size>] [{-t|--transform} transformation ...] \
    [--retry_count <number_of_retries>] [--retry_wait <delay_in_seconds>] \
    [--print_error] [--print_transform]
    cat tweets.tsv | ./StreamImporter tweets_small heavyai
    -u imauser
    -p imapassword
    --delim '\t'
    --batch 100000
    --retry_count 360
    --retry_wait 10
    --null null
    --port 9999
    sqoop-export --table iAmATable \
    --export-dir /user/cloudera/ \
    --connect "jdbc:heavyai:000.000.000.0:6274:heavyai" \
    --driver com.heavyai.jdbc.HeavyaiDriver \
    --username imauser \
    --password imapassword \
    --direct \
    --batch
    select t1.uniqueCol from MYTABLE t1 join TEMPTABLE t2 on t1.uniqueCol = t2.uniqueCol;
  • IAM Role > Select desired IAM Role > Review and Launch.
  • Review other options > Launch.

  • Restart the EC2 Instance.
    raster_point_type='<type>'

    smallint or int - Create integer columns of names raster_x and raster_y and fill with the raw pixel coordinates from the file.

    float or double - Create floating-point columns of names raster_x and raster_y (or raster_lon and raster_lat) and fill with file-space or world-space projected coordinates.

    point - Create a POINT column of name raster_point and fill with file-space or world-space projected coordinates.
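    As an illustration of these values, a raster import that stores each pixel's coordinates as a single geographic POINT column might look like the following sketch (the table name `elevation` and file path are hypothetical):

    ```sql
    -- Hypothetical table and file names; raster_point_type='point' requests
    -- a single POINT column named raster_point instead of separate x/y columns.
    COPY elevation FROM '/data/dem.tif'
      WITH (source_type='raster_file', raster_point_type='point');
    ```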

    GEOS library