1 of 7

Data Definition (DDL)

Datatypes

Datatypes and Fixed Encoding

This topic describes standard datatypes and space-saving variations for values stored in HEAVY.AI.

Datatypes

Each HEAVY.AI datatype uses space in memory and on disk. For certain datatypes, you can use fixed encoding for a more compact representation of these values. You can set a default value for a column by using the DEFAULT constraint; for more information, see CREATE TABLE.

Datatypes, variations, and sizes are described in the following table.

[1] - In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING DAYS(16)).

[2] - See Storage and Compression below for information about geospatial datatype sizes.

HEAVY.AI does not support geometry arrays.
Timestamp values are always stored in 8 bytes. The greater the precision, the lower the fidelity.

Geospatial Datatypes

HEAVY.AI supports the LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON, POINT, and MULTIPOINT geospatial datatypes.

In the following example:

p0, p1, ls0, and poly0 are simple (planar) geometries.
p4 is point geometry with Web Mercator longitude/latitude coordinates.
p2, p3, mp, ls1, ls2, mls1, mls2, poly1, and mpoly0 are geometries using WGS84 SRID=4326 longitude/latitude coordinates.

CREATE TABLE geo ( name TEXT ENCODING DICT(32),
                   p0 POINT,
                   p1 GEOMETRY(POINT),
                   p2 GEOMETRY(POINT, 4326),
                   p3 GEOMETRY(POINT, 4326) ENCODING NONE,
                   p4 GEOMETRY(POINT, 900913),
                   mp GEOMETRY(MULTIPOINT, 4326),
                   ls0  LINESTRING,
                   ls1 GEOMETRY(LINESTRING, 4326) ENCODING COMPRESSED(32),
                   ls2 GEOMETRY(LINESTRING, 4326) ENCODING NONE,
                   mls1 GEOMETRY(MULTILINESTRING, 4326) ENCODING COMPRESSED(32),
                   mls2 GEOMETRY(MULTILINESTRING, 4326) ENCODING NONE,
                   poly0 POLYGON,
                   poly1 GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32),
                   mpoly0 GEOMETRY(MULTIPOLYGON, 4326)
                  );

Storage

Geometry storage requirements are largely dependent on coordinate data. Coordinates are normally stored as 8-byte doubles, two coordinates per point, for all points that form a geometry. Each POINT geometry in the p1 column, for example, requires 16 bytes.

Compression

WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This sacrifices some precision but reduces storage requirements by half.

For example, columns p2, mp, ls1, mls1, poly1, and mpoly0 in the table defined above are compressed. Each geometry in the p2 column requires 8 bytes, compared to 16 bytes for p0.

You can explicitly disable compression. WGS84 columns p3, ls2, mls2 are not compressed and continue using doubles. Simple (planar) columns p0, p1, ls0, poly1 and non-4326 column p4 are not compressed.

For more information about geospatial datatypes and functions, see Geospatial Capabilities.

Defining Arrays

Define datatype arrays by appending square brackets, as shown in the arrayexamples DDL sample.

CREATE TABLE arrayexamples (
  tiny_int_array TINYINT[],
  int_array INTEGER[],
  big_int_array BIGINT[],
  text_array TEXT[] ENCODING DICT(32), --OmniSci supports only DICT(32) TEXT arrays.
  float_array FLOAT[],
  double_array DOUBLE[],
  decimal_array DECIMAL(18,6)[],
  boolean_array BOOLEAN[],
  date_array DATE[],
  time_array TIME[],
  timestamp_array TIMESTAMP[])

You can also define fixed-length arrays. For example:

CREATE TABLE arrayexamples (
  float_array3 FLOAT[3],
  date_array4 DATE[4]

Fixed-length arrays require less storage space than variable-length arrays.

Fixed Encoding

To use fixed-length fields, the range of the data must fit into the constraints as described. Understanding your schema and the scope of potential values in each field helps you to apply fixed encoding types and save significant storage space.

These encodings are most effective on low-cardinality TEXT fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07. If a TEXT ENCODING field does not match the defined cardinality, HEAVY.AI substitutes a NULL value and logs the change.

For DATE types, you can use the terms FIXED and DAYS interchangeably. Both are synonymous for the DATE type in HEAVY.AI.

Some of the INTEGER options overlap. For example, INTEGER ENCODINGFIXED(8) and SMALLINT ENCODINGFIXED(8) are essentially identical.

Shared Dictionaries

You can improve performance of string operations and optimize storage using shared dictionaries. You can share dictionaries within a table or between different tables in the same database. The table with which you want to share dictionaries must exist when you create the table that references the TEXT ENCODING DICT field, and the column that you are referencing in that table must also exist. The following small DDL shows the basic structure:

CREATE TABLE text_shard (
i TEXT ENCODING DICT(32),
s TEXT ENCODING DICT(32),
SHARD KEY (i))
WITH (SHARD_COUNT = 2);

CREATE TABLE text_shard1 (
i TEXT,
s TEXT ENCODING DICT(32),
SHARD KEY (i),
SHARED DICTIONARY (i) REFERENCES text_shard(i))
WITH (SHARD_COUNT = 2);

In the table definition, make sure that referenced columns appear before the referencing columns.

For example, this DDL is a portion of the schema for the flights database. Because airports are both origin and destination locations, it makes sense to reuse the same dictionaries for name, city, state, and country values.

create table flights (
*
*
*
dest_name TEXT ENCODING DICT,
dest_city TEXT ENCODING DICT,
dest_state TEXT ENCODING DICT,
dest_country TEXT ENCODING DICT,

*
*
*
origin_name TEXT,
origin_city TEXT,
origin_state TEXT,
origin_country TEXT,
*
*
*

SHARED DICTIONARY (origin_name) REFERENCES flights(dest_name),
SHARED DICTIONARY (origin_city) REFERENCES flights(dest_city),
SHARED DICTIONARY (origin_state) REFERENCES flights(dest_state),
SHARED DICTIONARY (origin_country) REFERENCES flights(dest_country),
*
*
*
)
WITH(
*
*
*
)

To share a dictionary in a different existing table, replace the table name in the REFERENCES instruction. For example, if you have an existing table called us_geography, you can share the dictionary by following the pattern in the DDL fragment below.

create table flights (

*
*
*

SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
SHARED DICTIONARY (origin_state) REFERENCES us_geography(state),
SHARED DICTIONARY (origin_country) REFERENCES us_geography(country),
SHARED DICTIONARY (dest_city) REFERENCES us_geography(city),
SHARED DICTIONARY (dest_state) REFERENCES us_geography(state),
SHARED DICTIONARY (dest_country) REFERENCES us_geography(country),

*
*
*
)
WITH(
*
*
*
);

The referencing column cannot specify the encoding of the dictionary, because it uses the encoding from the referenced column.

Users and Databases

DDL - Users and Databases

HEAVY.AI has a default superuser named admin with default password HyperInteractive.

When you create or alter a user, you can grant superuser privileges by setting the is_super property.

You can also specify a default database when you create or alter a user by using the default_db property. During login, if a database is not specified, the server uses the default database assigned to that user. If no default database is assigned to the user and no database is specified during login, the heavyai database is used.

When an administrator, superuser, or owner drops or renames a database, all current active sessions for users logged in to that database are invalidated. The users must log in again.

Similarly, when an administrator or superuser drops or renames a user, all active sessions for that user are immediately invalidated.

If a password includes characters that are nonalphanumeric, it must be enclosed in single quotes when logging in to heavysql. For example: $HEAVYAI_PATH/bin/heavysql heavyai -u admin -p '77Heavy!9Ai'

For more information about users, roles, and privileges, see DDL - Roles and Privileges.

Nomenclature Constraints

The following are naming convention requirements for HEAVY.AI objects, described in regex notation:

A NAME is [A-Za-z_][A-Za-z0-9\$_]*
A DASHEDNAME is [A-Za-z_][A-Za-z0-9\$_\-]*
An EMAIL is ([^[:space:]\"]+|\".+\")@[A-Za-z0-9][A-Za-z0-9\-\.]*\.[A-Za-z]+

User objects can use NAME, DASHEDNAME, or EMAIL format.

Role objects must use either NAME or DASHEDNAME format.

Database and column objects must use NAME format.

CREATE USER

CREATE USER ["]<name>["] (<property> = value,...);

HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the user name.

Examples:

CREATE USER jason (password = 'HeavyaiRocks!', is_super = 'true', default_db='tweets');
CREATE USER "pembroke.q.aloysius" (password= 'HeavyaiRolls!', default_db='heavyai');

DROP USER

DROP USER [IF EXISTS] ["]<name>["];

Example:

DROP USER [IF EXISTS] jason;
DROP USER "pemboke.q.aloysius";

ALTER USER

ALTER USER ["]<name>["] (<property> = value, ...);
ALTER USER ["]<oldUserName>["] RENAME TO ["]<newUserName>["];

HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the old or new user name.

Example:

ALTER USER admin (password = 'HeavyaiIsFast!');
ALTER USER jason (is_super = 'false', password = 'SilkySmooth', default_db='traffic');
ALTER USER methuselah RENAME TO aurora;
ALTER USER "pembroke.q.aloysius" RENAME TO "pembroke.q.murgatroyd";
ALTER USER chumley (can_login='false');

CREATE DATABASE

CREATE DATABASE [IF NOT EXISTS] <name> (<property> = value, ...);

Database names cannot include quotes, spaces, or special characters.

In Release 6.3.0 and later, database names are case insensitive. Duplicate database names will cause a failure when attempting to start HeavyDB 6.3.0 or higher. Check database names and revise as necessary to avoid duplicate names.

Example:

CREATE DATABASE test (owner = 'jason');

DROP DATABASE

DROP DATABASE [IF EXISTS] ;

Example:

DROP DATABASE IF EXISTS test;

ALTER DATABASE

ALTER DATABASE <name> RENAME TO <name>;

To alter a database, you must be the owner of the database or an HeavyDB superuser.

Example:

ALTER DATABASE curmudgeonlyOldDatabase RENAME TO ingenuousNewDatabase;

ALTER DATABASE OWNER TO

Enable super users to change the owner of a database.

ALTER DATABASE <database name> OWNER TO <new_owner>;

Example

Change the owner of my_database to user Joe:

ALTER DATABASE my_database OWNER TO Joe;

Only superusers can run the ALTER DATABASE OWNER TO command.

REASSIGN OWNED

REASSIGN OWNED BY <old_owner>, <old_owner>, ... TO <new_owner>

Changes ownership of database objects (tables, views, dashboards, etc.) from a user or set of users in the current database to a different user.

Example: Reassign database objects owned by jason and mike to joe.

REASSIGN OWNED BY jason, mike TO joe;

Database object ownership changes only for the currently connected database; objects in other databases are not affected. Ownership of the database itself is not affected. You must be a superuser to run this command.

Database Security Example

See Example: Data Security in DDL - Roles and Privileges for a database security example.

Tables

DDL - Tables

These functions are used to create and modify data tables in HEAVY.AI.

Nomenclature Constraints

Table names must use the NAME format, described in regex notation as:

[A-Za-z_][A-Za-z0-9\$_]*

Table and column names can include quotes, spaces, and the underscore character. Other special characters are permitted if the name of the table or column is enclosed in double quotes (" ").

Spaces and special characters other than underscore (_) cannot be used in Heavy Immerse.
Column and table names enclosed in double quotes cannot be used in Heavy Immerse

CREATE TABLE

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] <table>
  (<column> <type> [NOT NULL] [DEFAULT <value>] [ENCODING <encodingSpec>],
  [SHARD KEY (<column>)],
  [SHARED DICTIONARY (<column>) REFERENCES <table>(<column>)], ...)
  [WITH (<property> = value, ...)];

Create a table named <table> specifying <columns> and table properties.

Supported Datatypes

* In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING FIXED(16)).

For more information, see Datatypes and Fixed Encoding.

For geospatial datatypes, see Geospatial Primitives.

Examples

Create a table named tweets and specify the columns, including type, in the table.

CREATE TABLE IF NOT EXISTS tweets (
   tweet_id BIGINT NOT NULL,
   tweet_time TIMESTAMP NOT NULL ENCODING FIXED(32),
   lat FLOAT,
   lon FLOAT,
   sender_id BIGINT NOT NULL,
   sender_name TEXT NOT NULL ENCODING DICT,
   location TEXT ENCODING  DICT,
   source TEXT ENCODING DICT,
   reply_to_user_id BIGINT,
   reply_to_tweet_id BIGINT,
   lang TEXT ENCODING  DICT,
   followers INT,
   followees INT,
   tweet_count INT,
   join_time TIMESTAMP ENCODING  FIXED(32),
   tweet_text TEXT,
   state TEXT ENCODING  DICT,
   county TEXT ENCODING DICT,
   place_name TEXT,
   state_abbr TEXT ENCODING DICT,
   county_state TEXT ENCODING DICT,
   origin TEXT ENCODING DICT,
   phone_numbers bigint);

Create a table named delta and assign a default value San Francisco to column city.

CREATE TABLE delta (
   id INTEGER NOT NULL, 
   name TEXT NOT NULL, 
   city TEXT NOT NULL DEFAULT 'San Francisco' ENCODING DICT(16));

Default values currently have the following limitations:

Only literals can be used for column DEFAULT values; expressions are not supported.
You cannot define a DEFAULT value for a shard key. For example, the following does not parse: CREATE TABLE tbl (id INTEGER NOT NULL DEFAULT 0, name TEXT, shard key (id)) with (shard_count = 2);
For arrays, use the following syntax: ARRAY[A, B, C, …. N]
The syntax {A, B, C, ... N} is not supported.
Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with malformed literal as a default value, but when you try to insert a row with a default value, it will throw an error.

Supported Encoding

WITH Clause Properties

Sharding

Sharding partitions a database table across multiple servers so each server has a part of the table with the same columns but with different rows. Partitioning is based on a sharding key defined when you create the table.

Without sharding, the dimension tables involved in a join are replicated and sent to each GPU, which is not feasible for dimension tables with many rows. Specifying a shard key makes it possible for the query to execute efficiently on large dimension tables.

Currently, specifying a shard key is useful for joins, only:

If two tables specify a shard key with the same type and the same number of shards, a join on that key only sends a part of the dimension table column data to each GPU.
For multi-node installs, the dimension table does not need to be replicated and the join executes locally on each leaf.

Constraints

A shard key must specify a single column to shard on. There is no support for sharding by a combination of keys.
One shard key can be specified for a table.
Data are partitioned according to the shard key and the number of shards (shard_count).
A value in the column specified as a shard key is always sent to the same partition.
The number of shards should be equal to the number of GPUs in the cluster.
Sharding is allowed on the following column types:
- DATE
- INT
- TEXT ENCODING DICT
- TIME
- TIMESTAMP
Tables must share the dictionary for the column to be involved in sharded joins. If the dictionary is not specified as shared, the join does not take advantage of sharding. Dictionaries are reference-counted and only dropped when the last reference drops.

Recommendations

Set shard_count to the number of GPUs you eventually want to distribute the data table across.
Referenced tables must also be shard_count -aligned.
Sharding should be minimized because it can introduce load skew accross resources, compared to when sharding is not used.

Examples

Basic sharding:

CREATE TABLE  customers(
   accountId text,
   name text,
   SHARD KEY (accountId))
  WITH (shard_count = 4);

Sharding with shared dictionary:

CREATE TABLE transactions(
   accountId text,
   action text,
   SHARD KEY (accountId),
   SHARED DICTIONARY (accountId) REFERENCES customers(accountId))
  WITH (shard_count = 4);

Temporary Tables

Using the TEMPORARY argument creates a table that persists only while the server is live. They are useful for storing intermediate result sets that you access more than once.

Adding or dropping a column from a temporary table is not supported.

Example

CREATE TEMPORARY TABLE customers(
   accountId TEXT,
   name TEXT,
   timeCreated TIMESTAMP)

CREATE TABLE AS SELECT

CREATE TABLE [IF NOT EXISTS] <newTableName> AS (<SELECT statement>) [WITH (<property> = value, ...)];

Create a table with the specified columns, copying any data that meet SELECT statement criteria.

WITH Clause Properties

Examples

Create the table newTable. Populate the table with all information from the table oldTable, effectively creating a duplicate of the original table.

CREATE TABLE newTable AS (SELECT * FROM oldTable);

Create a table named trousers. Populate it with data from the columns name, waist, and inseam from the table wardrobe.

CREATE TABLE trousers AS (SELECT name, waist, inseam FROM wardrobe);

Create a table named cosmos. Populate it with data from the columns star and planet from the table universe where planet has the class M.

CREATE TABLE IF NOT EXISTS cosmos AS (SELECT star, planet FROM universe WHERE class='M');

ALTER TABLE

ALTER TABLE <table> RENAME TO <table>;
ALTER TABLE <table> RENAME COLUMN <column> TO <column>;
ALTER TABLE <table> ADD [COLUMN] <column> <type> [NOT NULL] [ENCODING <encodingSpec>];
ALTER TABLE <table> ADD (<column> <type> [NOT NULL] [ENCODING <encodingSpec>], ...);
ALTER TABLE <table> ADD (<column> <type> DEFAULT <value>);
ALTER TABLE <table> DROP COLUMN <column_1>[, <column_2>, ...];
ALTER TABLE <table> SET MAX_ROLLBACK_EPOCHS=<value>

Examples

Rename the table tweets to retweets.

ALTER TABLE tweets RENAME TO retweets;

Rename the column source to device in the table retweets.

ALTER TABLE retweets RENAME COLUMN source TO device;

Add the column pt_dropoff to table tweets with a default value point(0,0).

ALTER TABLE tweets ADD COLUMN pt_dropoff POINT DEFAULT 'point(0 0)';

Add multiple columns a, b, and c to table table_one with a default value of 15 for column b.

ALTER TABLE table_one ADD a INTEGER, b INTEGER NOT NULL DEFAULT 15, c TEXT;

Default values currently have the following limitations:

Only literals can be used for column DEFAULT values; expressions are not supported.
For arrays, use the following syntax: ARRAY[A, B, C, …. N]. The syntax {A, B, C, ... N} is not supported.
Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but when you try to insert a row with a default value, it throws an error.

Add the column lang to the table tweets using a TEXT ENCODING DICTIONARY.

ALTER TABLE tweets ADD COLUMN lang TEXT ENCODING DICT;

Add the columns lang and encode to the table tweets using a TEXT ENCODING DICTIONARY for each.

ALTER TABLE tweets ADD (lang TEXT ENCODING DICT, encode TEXT ENCODING DICT);

Drop the column pt_dropoff from table tweets.

ALTER TABLE tweets DROP COLUMN pt_dropoff;

Limit on-disk data growth by setting the number of allowed epoch rollbacks to 50:

ALTER TABLE test_table SET MAX_ROLLBACK_EPOCHS=50;

You cannot add a dictionary-encoded string column with a shared dictionary when using ALTER TABLE ADD COLUMN.
Currently, HEAVY.AI does not support adding a geo column type (POINT, LINESTRING, POLYGON, or MULTIPOLYGON) to a table.
HEAVY.AI supports ALTER TABLE RENAME TABLE and ALTER TABLE RENAME COLUMN for temporary tables. HEAVY.AI does not support ALTER TABLE ADD COLUMN to modify a temporary table.

DROP TABLE

DROP TABLE [IF EXISTS] <table>;

Deletes the table structure, all data from the table, and any dictionary content unless it is a shared dictionary. (See the Note regarding disk space reclamation.)

Example

DROP TABLE IF EXISTS tweets;

DUMP TABLE

DUMP TABLE <table> TO '<filepath>' [WITH (COMPRESSION='<compression_program>')];

Archives data and dictionary files of the table <table> to file <filepath>.

Valid values for <compression_program> include:

gzip (default)
pigz
lz4
none

If you do not choose a compression option, the system uses gzip if it is available. If gzip is not installed, the file is not compressed.

The file path must be enclosed in single quotes.

Dumping a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being dumped.
The DUMP command is not supported on distributed configurations.
You must have a least GRANT CREATE ON DATABASE privilege level to use the DUMP command.

Example

DUMP TABLE tweets TO '/opt/archive/tweetsBackup.gz' WITH (COMPRESSION='gzip');

RENAME TABLE

RENAME TABLE <table> TO <table>[, <table> TO <table>, <table> TO <table>...];

Rename a table or multiple tables at once.

Examples

Rename a single table:

RENAME TABLE table_A TO table_B;

Swap table names:

RENAME TABLE table_A TO table_B, table_B TO table_A;

RENAME TABLE table_A TO table_B, table_B TO table_C, table_C TO table_A;

Swap table names multiple times:

RENAME TABLE table_A TO table_A_stale, table_B TO table_B_stale, table_A_new TO table_A, table_B_new TO table_B;

RESTORE TABLE

RESTORE TABLE <table> FROM '<filepath>' [WITH (COMPRESSION='<compression_program>')];

Restores data and dictionary files of table <table> from the file at <filepath>. If you specified a compression program when you used the DUMP TABLE command, you must specify the same compression method during RESTORE.

Restoring a table decompresses and then reimports the table. You must have enough disk space for both the new table and the archived table, as well as enough scratch space to decompress the archive and reimport it.

The file path must be enclosed in single quotes.

You can also restore a table from archives stored in S3-compatible endpoints:

RESTORE TABLE <table> FROM '<S3_file_URL>' 
  WITH (compression = '<compression_program>', 
        s3_region = '<region>', 
        s3_access_key = '<access_key>', 
        s3_secret_key = '<secret_key>', 
        s3_session_token = '<session_token>');

s3_region is required. All features discussed in the S3 import documentation, such as custom S3 endpoints and server privileges, are supported.

Restoring a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being restored.
The RESTORE command is not supported on distributed configurations.
You must have a least GRANT CREATE ON DATABASE privilege level to use the RESTORE command.

Do not attempt to use RESTORE TABLE with a table dump created using a release of HEAVY.AI that is higher than the release running on the server where you will restore the table.

Examples

Restore table tweets from /opt/archive/tweetsBackup.gz:

RESTORE TABLE tweets FROM '/opt/archive/tweetsBackup.gz' 
   WITH (COMPRESSION='gzip');

Restore table tweets from a public S3 file or using server privileges (with the allow-s3-server-privileges server flag enabled):

RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz'
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1');

Restore table tweets from a private S3 file using AWS access keys:

RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz' 2 
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1', 
      s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy');

Restore table tweets from a private S3 file using temporary AWS access keys/session token:

RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz' 
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1', 
      s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy',
      s3_session_token = 'zzzzzzzz');

Restore table tweets from an S3-compatible endpoint:

RESTORE TABLE tweets FROM 's3://my-gcp-bucket/archive/tweetsBackup.gz' 2 
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1', 
      s3_endpoint = 'storage.googleapis.com');

TRUNCATE TABLE

TRUNCATE TABLE <table>;

Use the TRUNCATE TABLE statement to remove all rows from a table without deleting the table structure.

This releases table on-disk and memory storage and removes dictionary content unless it is a shared dictionary. (See the note regarding disk space reclamation.)

Removing rows is more efficient than using DROP TABLE. Dropping followed by recreating the table invalidates dependent objects of the table requiring you to regrant object privileges. Truncating has none of these effects.

Example

TRUNCATE TABLE tweets;

When you DROP or TRUNCATE, the command returns almost immediately. The directories to be purged are marked with the suffix \_DELETE_ME_. The files are automatically removed asynchronously.

In practical terms, this means that you will not see a reduction in disk usage until the automatic task runs, which might not start for up to five minutes.

You might also see directory names appended with \_DELETE_ME_. You can ignore these, with the expectation that they will be deleted automatically over time.

OPTIMIZE TABLE

OPTIMIZE TABLE [<table>] [WITH (VACUUM='true')]

Use this statement to remove rows from storage that have been marked as deleted via DELETE statements.

When run without the vacuum option, the column-level metadata is recomputed for each column in the specified table. HeavyDB makes heavy use of metadata to optimize query plans, so optimizing table metadata can increase query performance after metadata widening operations such as updates or deletes. If the configuration parameter enable-auto-metadata-update is not set, HeavyDB does not narrow metadata during an update or delete — metadata is only widened to cover a new range.

When run with the vacuum option, it removes any rows marked "deleted" from the data stored on disk. Vacuum is a checkpointing operation, so new copies of any vacuum records are deleted. Using OPTIMIZE with the VACUUM option compacts pages and deletes unused data files that have not been repopulated.

Beginning with Release 5.6.0, OPTIMIZE should be used infrequently, because UPDATE, DELETE, and IMPORT queries manage space more effectively.

VALIDATE

VALIDATE

Performs checks for negative and inconsistent epochs across table shards for single-node configurations.

If VALIDATE detects epoch-related issues, it returns a report similar to the following:

heavysql> validate;
Result

Negative epoch value found for table "my_table". Epoch: -1.
Epoch values for table "my_table_2" are inconsistent:
Table Id  Epoch     
========= ========= 
4         1         
5         2

If no issues are detected, it reports as follows:

Instance OK

VALIDATE CLUSTER

VALIDATE CLUSTER [WITH (REPAIR_TYPE = ['NONE' | 'REMOVE'])];

Perform checks and report discovered issues on a running HEAVY.AI cluster. Compare metadata between the aggregator and leaves to verify that the logical components between the processes are identical.

VALIDATE CLUSTER also detects and reports issues related to table epochs. It reports when epochs are negative or when table epochs across leaf nodes or shards are inconsistent.

Examples

If VALIDATE CLUSTER detects issues, it returns a report similar to the following:

mapd@thing3 ~]$ /mnt/gluster/dist_mapd/mapd-sw2/bin/mapdql -p HyperInteractive
User admin connected to database heavyai
heavysql> validate cluster;
Result
 Node          Table Count 
 ===========   =========== 
 Aggregator     1116
 Leaf 0         1114
 Leaf 1         1114
No matching table on Leaf 0 for Table cities_dtl_POINTS table id 56
No matching table on Leaf 1 for Table cities_dtl_POINTS table id 56
No matching table on Leaf 0 for Table cities_dtl table id 80
No matching table on Leaf 1 for Table cities_dtl table id 80
Table details don't match on Leaf 0 for Table view_geo table id 95
Table details don't match on Leaf 1 for Table view_geo table id 95

If no issues are detected, it will report as follows:

Cluster OK

You can include the WITH(REPAIR_TYPE) argument. (REPAIR_TYPE='NONE') is the same as running the command with no argument. (REPAIR_TYPE='REMOVE') removes any leaf objects that have issues. For example:

VALIDATE CLUSTER WITH (REPAIR_TYPE = 'REMOVE');

Epoch Issue Example

This example output from the VALIDATE CLUSTER command on a distributed setup shows epoch-related issues:

heavysql> validate cluster;
Result

Negative epoch value found for table "my_table". Epoch: -16777216.
Epoch values for table "my_table_2" are inconsistent:
Node      Table Id  Epoch     
========= ========= ========= 
Leaf 0    4         1         
Leaf 1    4         2

System Tables

HeavyDB system tables provide a way to access information about database objects, database object permissions, and system resource (storage, CPU, and GPU memory) utilization. These system tables can be found in the information_schema database that is available by default on server startup. You can query system tables in the same way as regular tables, and you can use the SHOW CREATE TABLE command to view the table schemas.

Users

The users system table provides information about all database users and contains the following columns:

Databases

The databases system table provides information about all created databases on the server and contains the following columns:

Permissions

The permissions system table provides information about all user/role permissions for all database objects and contains the following columns:

Roles

The roles system table lists all created database roles and contains the following columns:

Tables

The tables system table provides information about all database tables and contains the following columns:

Dashboards

The dashboards system table provides information about created dashboards (enterprise edition only) and contains the following columns:

Role Assignments

The role_assignments system table provides information about database roles that have been assigned to users and contains the following columns:

Memory Summary

The memory_summary system table provides high level information about utilized memory across CPU and GPU devices and contains the following columns:

Memory Details

The memory_details system table provides detailed information about allocated memory segments across CPU and GPU devices and contains the following columns:

Storage Details

The storage_details system table provides detailed information about utilized storage per table and contains the following columns:

Log-Based System Tables

Log-based system tables are considered beta functionality in Release 6.1.0 and are disabled by default.

Request Logs

The request_logs system table provides information about HeavyDB Thrift API requests and contains the following columns:

Server Logs

The server_logs system table provides HeavyDB server logs in tabular form and contains the following columns:

Web Server Logs

The web_server_logs system table provides HEAVY.AI Web Server logs in tabular form and contains the following columns (Enterprise Edition only):

Web Server Access Logs

The web_server_access_logs system table provides information about requests made to the HEAVY.AI Web Server. The table contains the following columns:

Refreshing Logs System Tables

The logs system tables must be refreshed manually to view new log entries. You can run the REFRESH FOREIGN TABLES SQL command (for example, REFRESH FOREIGN TABLES server_logs, request_logs; ), or click the Refresh Data Now button on the table’s Data Manager page in Heavy Immerse.

Request Logs and Monitoring System Dashboard

The Request Logs and Monitoring system dashboard is built on the log-based system tables and provides visualization of request counts, performance, and errors over time, along with the server logs.

System Dashboards

Preconfigured system dashboards are built on various system tables. Specifically, two dashboards named System Resources and User Roles and Permissions are available by default. The Request Logs and Monitoring system dashboard is considered beta functionality and disabled by default. These dashboards can be found in the information_schema database, along with the system tables that they use.

Access to system dashboards is controlled using Heavy Immerse privileges; only users with Admin privileges or users/roles with access to the information_schema database can access the system dashboards.

Cross-linking must be enabled to allow cross-filtering across charts that use different system tables. Enable cross-linking by adding "ui/enable_crosslink_panel": true to the feature_flags section of the servers.json file.

Views

DDL - Views

A view is a virtual table based on the result set of a SQL statement. It derives its fields from a SELECT statement. You can do anything with a HEAVY.AI view query that you can do in a non-view HEAVY.AI query.

Nomenclature Constraints

View object names must use the NAME format, described in regex notation as:

[A-Za-z_][A-Za-z0-9\$_]*

CREATE VIEW

Creates a view based on a SQL statement.

Example

CREATE VIEW view_movies
AS SELECT movies.movieId, movies.title, movies.genres, avg(ratings.rating)
FROM ratings
JOIN movies on ratings.movieId=movies.movieId
GROUP BY movies.title, movies.movieId, movies.genres;

You can describe the view as you would a table.

\d view_movies
VIEW defined AS: SELECT  movies.movieId, movies.title, movies.genres,
avg(ratings.rating) FROM ratings JOIN movies ON ratings.movieId=movies.movieId
GROUP BY movies.title, movies.movieId, movies.genres
Column types:
    movieId INTEGER,
    title TEXT ENCODING DICT(32),
    genres TEXT ENCODING DICT(32),
    EXPR$3 DOUBLE

You can query the view as you would a table.

SELECT title, EXPR$3 from view_movies where movieId=260;
Star Wars: Episode IV - A New Hope (1977)|4.048937

DROP VIEW

Removes a view created by the CREATE VIEW statement. The view definition is removed from the database schema, but no actual data in the underlying base tables is modified.

Example

DROP VIEW IF EXISTS v_reviews;

Policies

You can use policies to provide row-level security (RLS) in HEAVY.AI.

CREATE POLICY

CREATE POLICY ON COLUMN table.column TO <name> VALUES ('string', 123, ...);

Create an RLS policy for a user or role (<name>); admin rights are required. All queries on the table for the user or role are automatically filtered to include only rows where the column contains any one of the values from the VALUES clause.

RLS filtering works similarly to a WHERE column = value clause, appended to every query or subquery on the table, would work. If policies on multiple columns in the same table are defined for a user or role, then a row is visible to that user or role if any one or more of the policies matches that row.

DROP POLICY

DROP POLICY ON COLUMN table.column FROM <name>;

Drop an RLS policy for a user or role (<name>); admin rights are required. All values specified for the column by the policy are dropped. Effective values from another policy on an inherited role are not dropped.

SHOW POLICIES

SHOW [EFFECTIVE] POLICIES <name>;

Displays a list of all RLS policies that exist for a user or role. If EFFECTIVE is used, the list also include any policies that exist for all roles that apply to the requested user or role.

Tables

DDL - Tables

These functions are used to create and modify data tables in HEAVY.AI.

Nomenclature Constraints

Table names must use the NAME format, described in regex notation as:

[A-Za-z_][A-Za-z0-9\$_]*

Table and column names can include quotes, spaces, and the underscore character. Other special characters are permitted if the name of the table or column is enclosed in double quotes (" ").

Spaces and special characters other than underscore (_) cannot be used in Heavy Immerse.
Column and table names enclosed in double quotes cannot be used in Heavy Immerse

CREATE TABLE

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] <table>
  (<column> <type> [NOT NULL] [DEFAULT <value>] [ENCODING <encodingSpec>],
  [SHARD KEY (<column>)],
  [SHARED DICTIONARY (<column>) REFERENCES <table>(<column>)], ...)
  [WITH (<property> = value, ...)];

Create a table named <table> specifying <columns> and table properties.

Supported Datatypes

* In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING FIXED(16)).

For more information, see Datatypes and Fixed Encoding.

For geospatial datatypes, see Geospatial Primitives.

Examples

Create a table named tweets and specify the columns, including type, in the table.

CREATE TABLE IF NOT EXISTS tweets (
   tweet_id BIGINT NOT NULL,
   tweet_time TIMESTAMP NOT NULL ENCODING FIXED(32),
   lat FLOAT,
   lon FLOAT,
   sender_id BIGINT NOT NULL,
   sender_name TEXT NOT NULL ENCODING DICT,
   location TEXT ENCODING  DICT,
   source TEXT ENCODING DICT,
   reply_to_user_id BIGINT,
   reply_to_tweet_id BIGINT,
   lang TEXT ENCODING  DICT,
   followers INT,
   followees INT,
   tweet_count INT,
   join_time TIMESTAMP ENCODING  FIXED(32),
   tweet_text TEXT,
   state TEXT ENCODING  DICT,
   county TEXT ENCODING DICT,
   place_name TEXT,
   state_abbr TEXT ENCODING DICT,
   county_state TEXT ENCODING DICT,
   origin TEXT ENCODING DICT,
   phone_numbers bigint);

Create a table named delta and assign a default value San Francisco to column city.

CREATE TABLE delta (
   id INTEGER NOT NULL, 
   name TEXT NOT NULL, 
   city TEXT NOT NULL DEFAULT 'San Francisco' ENCODING DICT(16));

Default values currently have the following limitations:

Only literals can be used for column DEFAULT values; expressions are not supported.
You cannot define a DEFAULT value for a shard key. For example, the following does not parse: CREATE TABLE tbl (id INTEGER NOT NULL DEFAULT 0, name TEXT, shard key (id)) with (shard_count = 2);
For arrays, use the following syntax: ARRAY[A, B, C, …. N]
The syntax {A, B, C, ... N} is not supported.
Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with malformed literal as a default value, but when you try to insert a row with a default value, it will throw an error.

Supported Encoding

WITH Clause Properties

Sharding

Currently, specifying a shard key is useful for joins, only:

If two tables specify a shard key with the same type and the same number of shards, a join on that key only sends a part of the dimension table column data to each GPU.
For multi-node installs, the dimension table does not need to be replicated and the join executes locally on each leaf.

Constraints

A shard key must specify a single column to shard on. There is no support for sharding by a combination of keys.
One shard key can be specified for a table.
Data are partitioned according to the shard key and the number of shards (shard_count).
A value in the column specified as a shard key is always sent to the same partition.
The number of shards should be equal to the number of GPUs in the cluster.
Sharding is allowed on the following column types:
- DATE
- INT
- TEXT ENCODING DICT
- TIME
- TIMESTAMP
Tables must share the dictionary for the column to be involved in sharded joins. If the dictionary is not specified as shared, the join does not take advantage of sharding. Dictionaries are reference-counted and only dropped when the last reference drops.

Recommendations

Set shard_count to the number of GPUs you eventually want to distribute the data table across.
Referenced tables must also be shard_count -aligned.
Sharding should be minimized because it can introduce load skew accross resources, compared to when sharding is not used.

Examples

Basic sharding:

CREATE TABLE  customers(
   accountId text,
   name text,
   SHARD KEY (accountId))
  WITH (shard_count = 4);

Sharding with shared dictionary:

CREATE TABLE transactions(
   accountId text,
   action text,
   SHARD KEY (accountId),
   SHARED DICTIONARY (accountId) REFERENCES customers(accountId))
  WITH (shard_count = 4);

Temporary Tables

Using the TEMPORARY argument creates a table that persists only while the server is live. They are useful for storing intermediate result sets that you access more than once.

Adding or dropping a column from a temporary table is not supported.

Example

CREATE TEMPORARY TABLE customers(
   accountId TEXT,
   name TEXT,
   timeCreated TIMESTAMP)

CREATE TABLE AS SELECT

CREATE TABLE [IF NOT EXISTS] <newTableName> AS (<SELECT statement>) [WITH (<property> = value, ...)];

Create a table with the specified columns, copying any data that meet SELECT statement criteria.

WITH Clause Properties

Examples

Create the table newTable. Populate the table with all information from the table oldTable, effectively creating a duplicate of the original table.

CREATE TABLE newTable AS (SELECT * FROM oldTable);

Create a table named trousers. Populate it with data from the columns name, waist, and inseam from the table wardrobe.

CREATE TABLE trousers AS (SELECT name, waist, inseam FROM wardrobe);

Create a table named cosmos. Populate it with data from the columns star and planet from the table universe where planet has the class M.

CREATE TABLE IF NOT EXISTS cosmos AS (SELECT star, planet FROM universe WHERE class='M');

ALTER TABLE

ALTER TABLE <table> RENAME TO <table>;
ALTER TABLE <table> RENAME COLUMN <column> TO <column>;
ALTER TABLE <table> ADD [COLUMN] <column> <type> [NOT NULL] [ENCODING <encodingSpec>];
ALTER TABLE <table> ADD (<column> <type> [NOT NULL] [ENCODING <encodingSpec>], ...);
ALTER TABLE <table> ADD (<column> <type> DEFAULT <value>);
ALTER TABLE <table> DROP COLUMN <column_1>[, <column_2>, ...];
ALTER TABLE <table> SET MAX_ROLLBACK_EPOCHS=<value>

Examples

Rename the table tweets to retweets.

ALTER TABLE tweets RENAME TO retweets;

Rename the column source to device in the table retweets.

ALTER TABLE retweets RENAME COLUMN source TO device;

Add the column pt_dropoff to table tweets with a default value point(0,0).

ALTER TABLE tweets ADD COLUMN pt_dropoff POINT DEFAULT 'point(0 0)';

Add multiple columns a, b, and c to table table_one with a default value of 15 for column b.

ALTER TABLE table_one ADD a INTEGER, b INTEGER NOT NULL DEFAULT 15, c TEXT;

Default values currently have the following limitations:

Only literals can be used for column DEFAULT values; expressions are not supported.
For arrays, use the following syntax: ARRAY[A, B, C, …. N]. The syntax {A, B, C, ... N} is not supported.
Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but when you try to insert a row with a default value, it throws an error.

Add the column lang to the table tweets using a TEXT ENCODING DICTIONARY.

ALTER TABLE tweets ADD COLUMN lang TEXT ENCODING DICT;

Add the columns lang and encode to the table tweets using a TEXT ENCODING DICTIONARY for each.

ALTER TABLE tweets ADD (lang TEXT ENCODING DICT, encode TEXT ENCODING DICT);

Drop the column pt_dropoff from table tweets.

ALTER TABLE tweets DROP COLUMN pt_dropoff;

Limit on-disk data growth by setting the number of allowed epoch rollbacks to 50:

ALTER TABLE test_table SET MAX_ROLLBACK_EPOCHS=50;

You cannot add a dictionary-encoded string column with a shared dictionary when using ALTER TABLE ADD COLUMN.
Currently, HEAVY.AI does not support adding a geo column type (POINT, LINESTRING, POLYGON, or MULTIPOLYGON) to a table.
HEAVY.AI supports ALTER TABLE RENAME TABLE and ALTER TABLE RENAME COLUMN for temporary tables. HEAVY.AI does not support ALTER TABLE ADD COLUMN to modify a temporary table.

DROP TABLE

DROP TABLE [IF EXISTS] <table>;

Deletes the table structure, all data from the table, and any dictionary content unless it is a shared dictionary. (See the Note regarding disk space reclamation.)

Example

DROP TABLE IF EXISTS tweets;

DUMP TABLE

DUMP TABLE <table> TO '<filepath>' [WITH (COMPRESSION='<compression_program>')];

Archives data and dictionary files of the table <table> to file <filepath>.

Valid values for <compression_program> include:

gzip (default)
pigz
lz4
none

If you do not choose a compression option, the system uses gzip if it is available. If gzip is not installed, the file is not compressed.

The file path must be enclosed in single quotes.

Dumping a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being dumped.
The DUMP command is not supported on distributed configurations.
You must have a least GRANT CREATE ON DATABASE privilege level to use the DUMP command.

Example

DUMP TABLE tweets TO '/opt/archive/tweetsBackup.gz' WITH (COMPRESSION='gzip');

RENAME TABLE

RENAME TABLE <table> TO <table>[, <table> TO <table>, <table> TO <table>...];

Rename a table or multiple tables at once.

Examples

Rename a single table:

RENAME TABLE table_A TO table_B;

Swap table names:

RENAME TABLE table_A TO table_B, table_B TO table_A;

RENAME TABLE table_A TO table_B, table_B TO table_C, table_C TO table_A;

Swap table names multiple times:

RENAME TABLE table_A TO table_A_stale, table_B TO table_B_stale, table_A_new TO table_A, table_B_new TO table_B;

RESTORE TABLE

RESTORE TABLE <table> FROM '<filepath>' [WITH (COMPRESSION='<compression_program>')];

The file path must be enclosed in single quotes.

You can also restore a table from archives stored in S3-compatible endpoints:

RESTORE TABLE <table> FROM '<S3_file_URL>' 
  WITH (compression = '<compression_program>', 
        s3_region = '<region>', 
        s3_access_key = '<access_key>', 
        s3_secret_key = '<secret_key>', 
        s3_session_token = '<session_token>');

s3_region is required. All features discussed in the S3 import documentation, such as custom S3 endpoints and server privileges, are supported.

Restoring a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being restored.
The RESTORE command is not supported on distributed configurations.
You must have a least GRANT CREATE ON DATABASE privilege level to use the RESTORE command.

Do not attempt to use RESTORE TABLE with a table dump created using a release of HEAVY.AI that is higher than the release running on the server where you will restore the table.

Examples

Restore table tweets from /opt/archive/tweetsBackup.gz:

RESTORE TABLE tweets FROM '/opt/archive/tweetsBackup.gz' 
   WITH (COMPRESSION='gzip');

Restore table tweets from a public S3 file or using server privileges (with the allow-s3-server-privileges server flag enabled):

RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz'
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1');

Restore table tweets from a private S3 file using AWS access keys:

RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz' 2 
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1', 
      s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy');

Restore table tweets from a private S3 file using temporary AWS access keys/session token:

RESTORE TABLE tweets FROM 's3://my-s3-bucket/archive/tweetsBackup.gz' 
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1', 
      s3_access_key = 'xxxxxxxxxx', s3_secret_key = 'yyyyyyyyy',
      s3_session_token = 'zzzzzzzz');

Restore table tweets from an S3-compatible endpoint:

RESTORE TABLE tweets FROM 's3://my-gcp-bucket/archive/tweetsBackup.gz' 2 
   WITH (compression = 'gzip', 
      s3_region = 'us-east-1', 
      s3_endpoint = 'storage.googleapis.com');

TRUNCATE TABLE

TRUNCATE TABLE <table>;

Use the TRUNCATE TABLE statement to remove all rows from a table without deleting the table structure.

This releases table on-disk and memory storage and removes dictionary content unless it is a shared dictionary. (See the note regarding disk space reclamation.)

Example

TRUNCATE TABLE tweets;

When you DROP or TRUNCATE, the command returns almost immediately. The directories to be purged are marked with the suffix \_DELETE_ME_. The files are automatically removed asynchronously.

In practical terms, this means that you will not see a reduction in disk usage until the automatic task runs, which might not start for up to five minutes.

You might also see directory names appended with \_DELETE_ME_. You can ignore these, with the expectation that they will be deleted automatically over time.

OPTIMIZE TABLE

OPTIMIZE TABLE [<table>] [WITH (VACUUM='true')]

Use this statement to remove rows from storage that have been marked as deleted via DELETE statements.

Beginning with Release 5.6.0, OPTIMIZE should be used infrequently, because UPDATE, DELETE, and IMPORT queries manage space more effectively.

VALIDATE

VALIDATE

Performs checks for negative and inconsistent epochs across table shards for single-node configurations.

If VALIDATE detects epoch-related issues, it returns a report similar to the following:

heavysql> validate;
Result

Negative epoch value found for table "my_table". Epoch: -1.
Epoch values for table "my_table_2" are inconsistent:
Table Id  Epoch     
========= ========= 
4         1         
5         2

If no issues are detected, it reports as follows:

Instance OK

VALIDATE CLUSTER

VALIDATE CLUSTER [WITH (REPAIR_TYPE = ['NONE' | 'REMOVE'])];

VALIDATE CLUSTER also detects and reports issues related to table epochs. It reports when epochs are negative or when table epochs across leaf nodes or shards are inconsistent.

Examples

If VALIDATE CLUSTER detects issues, it returns a report similar to the following:

mapd@thing3 ~]$ /mnt/gluster/dist_mapd/mapd-sw2/bin/mapdql -p HyperInteractive
User admin connected to database heavyai
heavysql> validate cluster;
Result
 Node          Table Count 
 ===========   =========== 
 Aggregator     1116
 Leaf 0         1114
 Leaf 1         1114
No matching table on Leaf 0 for Table cities_dtl_POINTS table id 56
No matching table on Leaf 1 for Table cities_dtl_POINTS table id 56
No matching table on Leaf 0 for Table cities_dtl table id 80
No matching table on Leaf 1 for Table cities_dtl table id 80
Table details don't match on Leaf 0 for Table view_geo table id 95
Table details don't match on Leaf 1 for Table view_geo table id 95

If no issues are detected, it will report as follows:

Cluster OK

VALIDATE CLUSTER WITH (REPAIR_TYPE = 'REMOVE');

Epoch Issue Example

This example output from the VALIDATE CLUSTER command on a distributed setup shows epoch-related issues:

heavysql> validate cluster;
Result

Negative epoch value found for table "my_table". Epoch: -16777216.
Epoch values for table "my_table_2" are inconsistent:
Node      Table Id  Epoch     
========= ========= ========= 
Leaf 0    4         1         
Leaf 1    4         2

Datatypes

Datatypes and Fixed Encoding

This topic describes standard datatypes and space-saving variations for values stored in HEAVY.AI.

Datatypes

Datatypes, variations, and sizes are described in the following table.

[2] - See Storage and Compression below for information about geospatial datatype sizes.

HEAVY.AI does not support geometry arrays.
Timestamp values are always stored in 8 bytes. The greater the precision, the lower the fidelity.

Geospatial Datatypes

HEAVY.AI supports the LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON, POINT, and MULTIPOINT geospatial datatypes.

In the following example:

p0, p1, ls0, and poly0 are simple (planar) geometries.
p4 is point geometry with Web Mercator longitude/latitude coordinates.
p2, p3, mp, ls1, ls2, mls1, mls2, poly1, and mpoly0 are geometries using WGS84 SRID=4326 longitude/latitude coordinates.

CREATE TABLE geo ( name TEXT ENCODING DICT(32),
                   p0 POINT,
                   p1 GEOMETRY(POINT),
                   p2 GEOMETRY(POINT, 4326),
                   p3 GEOMETRY(POINT, 4326) ENCODING NONE,
                   p4 GEOMETRY(POINT, 900913),
                   mp GEOMETRY(MULTIPOINT, 4326),
                   ls0  LINESTRING,
                   ls1 GEOMETRY(LINESTRING, 4326) ENCODING COMPRESSED(32),
                   ls2 GEOMETRY(LINESTRING, 4326) ENCODING NONE,
                   mls1 GEOMETRY(MULTILINESTRING, 4326) ENCODING COMPRESSED(32),
                   mls2 GEOMETRY(MULTILINESTRING, 4326) ENCODING NONE,
                   poly0 POLYGON,
                   poly1 GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32),
                   mpoly0 GEOMETRY(MULTIPOLYGON, 4326)
                  );

Storage

Compression

WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This sacrifices some precision but reduces storage requirements by half.

For example, columns p2, mp, ls1, mls1, poly1, and mpoly0 in the table defined above are compressed. Each geometry in the p2 column requires 8 bytes, compared to 16 bytes for p0.

For more information about geospatial datatypes and functions, see Geospatial Capabilities.

Defining Arrays

Define datatype arrays by appending square brackets, as shown in the arrayexamples DDL sample.

CREATE TABLE arrayexamples (
  tiny_int_array TINYINT[],
  int_array INTEGER[],
  big_int_array BIGINT[],
  text_array TEXT[] ENCODING DICT(32), --OmniSci supports only DICT(32) TEXT arrays.
  float_array FLOAT[],
  double_array DOUBLE[],
  decimal_array DECIMAL(18,6)[],
  boolean_array BOOLEAN[],
  date_array DATE[],
  time_array TIME[],
  timestamp_array TIMESTAMP[])

You can also define fixed-length arrays. For example:

CREATE TABLE arrayexamples (
  float_array3 FLOAT[3],
  date_array4 DATE[4]

Fixed-length arrays require less storage space than variable-length arrays.

Fixed Encoding

For DATE types, you can use the terms FIXED and DAYS interchangeably. Both are synonymous for the DATE type in HEAVY.AI.

Some of the INTEGER options overlap. For example, INTEGER ENCODINGFIXED(8) and SMALLINT ENCODINGFIXED(8) are essentially identical.

Shared Dictionaries

CREATE TABLE text_shard (
i TEXT ENCODING DICT(32),
s TEXT ENCODING DICT(32),
SHARD KEY (i))
WITH (SHARD_COUNT = 2);

CREATE TABLE text_shard1 (
i TEXT,
s TEXT ENCODING DICT(32),
SHARD KEY (i),
SHARED DICTIONARY (i) REFERENCES text_shard(i))
WITH (SHARD_COUNT = 2);

In the table definition, make sure that referenced columns appear before the referencing columns.

create table flights (
*
*
*
dest_name TEXT ENCODING DICT,
dest_city TEXT ENCODING DICT,
dest_state TEXT ENCODING DICT,
dest_country TEXT ENCODING DICT,

*
*
*
origin_name TEXT,
origin_city TEXT,
origin_state TEXT,
origin_country TEXT,
*
*
*

SHARED DICTIONARY (origin_name) REFERENCES flights(dest_name),
SHARED DICTIONARY (origin_city) REFERENCES flights(dest_city),
SHARED DICTIONARY (origin_state) REFERENCES flights(dest_state),
SHARED DICTIONARY (origin_country) REFERENCES flights(dest_country),
*
*
*
)
WITH(
*
*
*
)

create table flights (

*
*
*

SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
SHARED DICTIONARY (origin_state) REFERENCES us_geography(state),
SHARED DICTIONARY (origin_country) REFERENCES us_geography(country),
SHARED DICTIONARY (dest_city) REFERENCES us_geography(city),
SHARED DICTIONARY (dest_state) REFERENCES us_geography(state),
SHARED DICTIONARY (dest_country) REFERENCES us_geography(country),

*
*
*
)
WITH(
*
*
*
);

The referencing column cannot specify the encoding of the dictionary, because it uses the encoding from the referenced column.