HEAVY.AI Docs
v8.1.0
v8.1.0
  • Welcome to HEAVY.AI Documentation
  • Overview
    • Overview
    • Release Notes
  • Installation and Configuration
    • System Requirements
      • Hardware Reference
      • Software Requirements
      • Licensing
    • Installation
      • Free Version
      • Installing on Docker
        • HEAVY.AI Installation using Docker on Ubuntu
      • Installing on Ubuntu
        • HEAVY.AI Installation on Ubuntu
        • Install NVIDIA Drivers and Vulkan on Ubuntu
      • Installing on Rocky Linux / RHEL
        • HEAVY.AI Installation on RHEL
        • Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL
      • Getting Started on AWS
      • Getting Started on GCP
      • Getting Started on Azure
      • Getting Started on Kubernetes (BETA)
      • Upgrading
        • Upgrading HEAVY.AI
        • Upgrading from Omnisci to HEAVY.AI 6.0
        • CUDA Compatibility Drivers
      • Uninstalling
      • Ports
    • Services and Utilities
      • Using Services
      • Using Utilities
    • Executor Resource Manager
    • Configuration Parameters
      • Overview
      • Configuration Parameters for HeavyDB
      • Configuration Parameters for HEAVY.AI Web Server
      • Configuration Parameters for HeavyIQ
    • Security
      • Roles and Privileges
        • Column-Level Security
      • Connecting Using SAML
      • Implementing a Secure Binary Interface
      • Encrypted Credentials in Custom Applications
      • LDAP Integration
    • Distributed Configuration
  • Loading and Exporting Data
    • Supported Data Sources
      • Kafka
      • Using HeavyImmerse Data Manager
      • Importing Geospatial Data
    • Command Line
      • Loading Data with SQL
      • Exporting Data
  • SQL
    • Data Definition (DDL)
      • Datatypes
      • Users and Databases
      • Tables
      • System Tables
      • Views
      • Policies
      • Comment
    • Data Manipulation (DML)
      • SQL Capabilities
        • ALTER SESSION SET
        • ALTER SYSTEM CLEAR
        • DELETE
        • EXPLAIN
        • INSERT
        • KILL QUERY
        • LIKELY/UNLIKELY
        • SELECT
        • SHOW
        • UPDATE
        • Arrays
        • Logical Operators and Conditional and Subquery Expressions
        • Table Expression and Join Support
        • Type Casts
      • Geospatial Capabilities
        • Uber H3 Hexagonal Modeling
      • Functions and Operators
      • System Table Functions
        • generate_random_strings
        • generate_series
        • tf_compute_dwell_times
        • tf_feature_self_similarity
        • tf_feature_similarity
        • tf_geo_rasterize
        • tf_geo_rasterize_slope
        • tf_graph_shortest_path
        • tf_graph_shortest_paths_distances
        • tf_load_point_cloud
        • tf_mandelbrot*
        • tf_point_cloud_metadata
        • tf_raster_contour_lines; tf_raster_contour_polygons
        • tf_raster_graph_shortest_slope_weighted_path
        • tf_rf_prop_max_signal (Directional Antennas)
        • ts_rf_prop_max_signal (Isotropic Antennas)
        • tf_rf_prop
      • Window Functions
      • Reserved Words
      • SQL Extensions
      • HeavyIQ LLM_TRANSFORM
  • HeavyImmerse
    • Introduction to HeavyImmerse
    • Admin Portal
    • Control Panel
    • Working with Dashboards
      • Dashboard List
      • Creating a Dashboard
      • Configuring a Dashboard
      • Duplicating and Sharing Dashboards
    • Measures and Dimensions
    • Using Parameters
    • Using Filters
    • Using Cross-link
    • Chart Animation
    • Multilayer Charts
    • SQL Editor
    • Customization
    • Joins (Beta)
    • Chart Types
      • Overview
      • Bubble
      • Choropleth
      • Combo
      • Contour
      • Cross-Section
      • Gauge
      • Geo Heatmap
      • Heatmap
      • Linemap
      • Number
      • Pie
      • Pointmap
      • Scatter Plot
      • Skew-T
      • Table
      • Text Widget
      • Wind Barb
    • Deprecated Charts
      • Bar
      • Combo - Original
      • Histogram
      • Line
      • Stacked Bar
    • HeavyIQ SQL Notebook
  • HEAVYIQ Conversational Analytics
    • HeavyIQ Overview
      • HeavyIQ Guidance
  • HeavyRF
    • Introduction to HeavyRF
    • Getting Started
    • HeavyRF Table Functions
  • HeavyConnect
    • HeavyConnect Release Overview
    • Getting Started
    • Best Practices
    • Examples
    • Command Reference
    • Parquet Data Wrapper Reference
    • ODBC Data Wrapper Reference
    • Raster Data Wrapper Reference
  • HeavyML (BETA)
    • HeavyML Overview
    • Clustering Algorithms
    • Regression Algorithms
      • Linear Regression
      • Random Forest Regression
      • Decision Tree Regression
      • Gradient Boosting Tree Regression
    • Principal Components Analysis
  • Python / Data Science
    • Data Science Foundation
    • JupyterLab Installation and Configuration
    • Using HEAVY.AI with JupyterLab
    • Python User-Defined Functions (UDFs) with the Remote Backend Compiler (RBC)
      • Installation
      • Registering and Using a Function
      • User-Defined Table Functions
      • RBC UDF/UDTF Example Notebooks
      • General UDF/UDTF Tutorial Notebooks
      • RBC API Reference
    • Ibis
    • Interactive Data Exploration with Altair
    • Additional Examples
      • Forecasting with HEAVY.AI and Prophet
  • APIs and Interfaces
    • Overview
    • heavysql
    • Thrift
    • JDBC
    • ODBC
    • Vega
      • Vega Tutorials
        • Vega at a Glance
        • Getting Started with Vega
        • Getting More from Your Data
        • Creating More Advanced Charts
        • Using Polys Marks Type
        • Vega Accumulator
        • Using Transform Aggregation
        • Improving Rendering with SQL Extensions
      • Vega Reference Overview
        • data Property
        • projections Property
        • scales Property
        • marks Property
      • Migration
        • Migrating Vega Code to Dynamic Poly Rendering
      • Try Vega
    • RJDBC
    • SQuirreL SQL
    • heavyai-connector
  • Tutorials and Demos
    • Loading Data
    • Using Heavy Immerse
    • Hello World
    • Creating a Kafka Streaming Application
    • Getting Started with Open Source
    • Try Vega
  • Troubleshooting and Special Topics
    • FAQs
    • Troubleshooting
    • Vulkan Renderer
    • Optimizing
    • Known Issues and Limitations
    • Logs and Monitoring
    • Archived Release Notes
      • Release 6.x
      • Release 5.x
      • Release 4.x
      • Release 3.x
Powered by GitBook
On this page
  • Datatypes
  • Geospatial Datatypes
  • Storage
  • Compression
  • Defining Arrays
  • Fixed Encoding
  • Shared Dictionaries
Export as PDF
  1. SQL
  2. Data Definition (DDL)

Datatypes

Datatypes and Fixed Encoding

PreviousData Definition (DDL)NextUsers and Databases

Last updated 1 year ago

This topic describes standard datatypes and space-saving variations for values stored in HEAVY.AI.

Datatypes

Each HEAVY.AI datatype uses space in memory and on disk. For certain datatypes, you can use fixed encoding for a more compact representation of these values. You can set a default value for a column by using the DEFAULT constraint; for more information, see .

Datatypes, variations, and sizes are described in the following table.

Datatype

Size (bytes)

Notes

BIGINT

8

Minimum value: -9,223,372,036,854,775,807; maximum value: 9,223,372,036,854,775,807.

BIGINT ENCODING FIXED(8)

1

Minimum value: -127; maximum value: 127

BIGINT ENCODING FIXED(16)

2

Same as SMALLINT.

BIGINT ENCODING FIXED(32)

4

Same as INTEGER.

BOOLEAN

1

TRUE: 'true', '1', 't'. FALSE: 'false', '0', 'f'. Text values are not case-sensitive.

4

Same as DATE ENCODING DAYS(32).

DATE ENCODING DAYS(16)

2

Range in days: -32,768 - 32,767 Range in years: +/-90 around epoch, April 14, 1880 - September 9, 2059. Minumum value: -2,831,155,200; maximum value: 2,831,068,800. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

DATE ENCODING DAYS(32)

4

Range in years: +/-5,883,517 around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: -2,147,483,648; maximum value: 2,147,483,647. Supported formats when using COPY FROM: mm/dd/yyyy, dd-mmm-yy, yyyy-mm-dd, dd/mmm/yyyy.

DATE ENCODING FIXED(16)

2

In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

DATE ENCODING FIXED(32)

4

In DDL statements defaults to DATE ENCODING DAYS(16). Deprecated.

DECIMAL

2, 4, or 8

Takes precision and scale parameters: DECIMAL(precision,scale)

Size depends on precision:

  • Up to 4: 2 bytes

  • 5 to 9: 4 bytes

  • 10 to 18 (maximum): 8 bytes

Scale must be less than precision.

DOUBLE

8

Variable precision. Minimum value: -1.79e308; maximum value: 1.79e308

EPOCH

8

Seconds ranging from -30610224000 (1/1/1000 00:00:00) through 185542587100800 (1/1/5885487 23:59:59).

FLOAT

4

Variable precision. Minimum value: -3.4e38; maximum value: 3.4e38.

INTEGER

4

Minimum value: -2,147,483,647; maximum value: 2,147,483,647.

INTEGER ENCODING FIXED(8)

1

Minumum value: -127; maximum value: 127.

INTEGER ENCODING FIXED(16)

2

Same as SMALLINT.

LINESTRING

Variable[2]

Geospatial datatype. A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2)

MULTILINESTRING

Variable[2]

Geospatial datatype. A set of associated lines. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1))

MULTIPOINT

Variable[2]

Geospatial datatype. A set of points. For example: MULTIPOINT((0 0), (1 0), (2 0))

MULTIPOLYGON

Variable[2]

Geospatial datatype. A set of one or more polygons. For example:MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))

POINT

Variable[2]

Geospatial datatype. A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0)

POLYGON

Variable[2]

Geospatial datatype. A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))

SMALLINT

2

Minimum value: -32,767; maximum value: 32,767.

SMALLINT ENCODING FIXED(8)

1

Minumum value: -127 ; maximum value: 127.

TEXT ENCODING DICT

4

Max cardinality 2 billion distinct string values. Maximum string length is 32,767.

TEXT ENCODING DICT(8)

1

Max cardinality 255 distinct string values.

TEXT ENCODING DICT(16)

2

Max cardinality 64 K distinct string values.

TEXT ENCODING NONE

Variable

Size of the string + 6 bytes. Maximum string length is 32,767.

TIME

8

Minimum value: 00:00:00; maximum value: 23:59:59.

TIME ENCODING FIXED(32)

4

Minimum value: 00:00:00; maximum value: 23:59:59.

TIMESTAMP(0)

8

Linux timestamp from -30610224000 (1/1/1000 00:00:00) through 29379542399 (12/31/2900 23:59:59). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS or YYYY-MM-DDTHH:MM:SS (the T is dropped when the field is populated).

TIMESTAMP(3) (milliseconds)

8

Linux timestamp from -30610224000000 (1/1/1000 00:00:00.000) through 29379542399999 (12/31/2900 23:59:59.999). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fff or YYYY-MM-DDTHH:MM:SS.fff (the T is dropped when the field is populated).

TIMESTAMP(6) (microseconds)

8

Linux timestamp from -30610224000000000 (1/1/1000 00:00:00.000000) through 29379542399999999 (12/31/2900 23:59:59.999999). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.ffffff or YYYY-MM-DDTHH:MM:SS.ffffff (the T is dropped when the field is populated).

TIMESTAMP(9) (nanoseconds)

8

Linux timestamp from -9223372036854775807 (09/21/1677 00:12:43.145224193) through 9223372036854775807 (11/04/2262 23:47:16.854775807). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fffffffff or YYYY-MM-DDTHH:MM:SS.fffffffff (the T is dropped when the field is populated).

TIMESTAMP ENCODING FIXED(32)

4

Range: 1901-12-13 20:45:53 - 2038-01-19 03:14:07. Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS or YYYY-MM-DDTHH:MM:SS (the T is dropped when the field is populated).

TINYINT

1

Minimum value: -127; maximum value: 127.

[1] - In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE columns, but you can create only 4-byte DATE columns (default) and 2-byte DATE columns (see DATE ENCODING DAYS(16)).

  • HEAVY.AI does not support geometry arrays.

  • Timestamp values are always stored in 8 bytes. The greater the precision, the lower the fidelity.

Geospatial Datatypes

HEAVY.AI supports the LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON, POINT, and MULTIPOINT geospatial datatypes.

In the following example:

  • p0, p1, ls0, and poly0 are simple (planar) geometries.

  • p4 is point geometry with Web Mercator longitude/latitude coordinates.

  • p2, p3, mp, ls1, ls2, mls1, mls2, poly1, and mpoly0 are geometries using WGS84 SRID=4326 longitude/latitude coordinates.

CREATE TABLE geo ( name TEXT ENCODING DICT(32),
                   p0 POINT,
                   p1 GEOMETRY(POINT),
                   p2 GEOMETRY(POINT, 4326),
                   p3 GEOMETRY(POINT, 4326) ENCODING NONE,
                   p4 GEOMETRY(POINT, 900913),
                   mp GEOMETRY(MULTIPOINT, 4326),
                   ls0  LINESTRING,
                   ls1 GEOMETRY(LINESTRING, 4326) ENCODING COMPRESSED(32),
                   ls2 GEOMETRY(LINESTRING, 4326) ENCODING NONE,
                   mls1 GEOMETRY(MULTILINESTRING, 4326) ENCODING COMPRESSED(32),
                   mls2 GEOMETRY(MULTILINESTRING, 4326) ENCODING NONE,
                   poly0 POLYGON,
                   poly1 GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32),
                   mpoly0 GEOMETRY(MULTIPOLYGON, 4326)
                  );

Storage

Geometry storage requirements are largely dependent on coordinate data. Coordinates are normally stored as 8-byte doubles, two coordinates per point, for all points that form a geometry. Each POINT geometry in the p1 column, for example, requires 16 bytes.

Compression

WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This sacrifices some precision but reduces storage requirements by half.

For example, columns p2, mp, ls1, mls1, poly1, and mpoly0 in the table defined above are compressed. Each geometry in the p2 column requires 8 bytes, compared to 16 bytes for p0.

You can explicitly disable compression. WGS84 columns p3, ls2, mls2 are not compressed and continue using doubles. Simple (planar) columns p0, p1, ls0, poly1 and non-4326 column p4 are not compressed.

Defining Arrays

Define datatype arrays by appending square brackets, as shown in the arrayexamples DDL sample.

CREATE TABLE arrayexamples (
  tiny_int_array TINYINT[],
  int_array INTEGER[],
  big_int_array BIGINT[],
  text_array TEXT[] ENCODING DICT(32), --OmniSci supports only DICT(32) TEXT arrays.
  float_array FLOAT[],
  double_array DOUBLE[],
  decimal_array DECIMAL(18,6)[],
  boolean_array BOOLEAN[],
  date_array DATE[],
  time_array TIME[],
  timestamp_array TIMESTAMP[])

You can also define fixed-length arrays. For example:

CREATE TABLE arrayexamples (
  float_array3 FLOAT[3],
  date_array4 DATE[4]

Fixed-length arrays require less storage space than variable-length arrays.

Fixed Encoding

To use fixed-length fields, the range of the data must fit into the constraints as described. Understanding your schema and the scope of potential values in each field helps you to apply fixed encoding types and save significant storage space.

These encodings are most effective on low-cardinality TEXT fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07. If a TEXT ENCODING field does not match the defined cardinality, HEAVY.AI substitutes a NULL value and logs the change.

For DATE types, you can use the terms FIXED and DAYS interchangeably. Both are synonymous for the DATE type in HEAVY.AI.

Some of the INTEGER options overlap. For example, INTEGER ENCODINGFIXED(8) and SMALLINT ENCODINGFIXED(8) are essentially identical.

Shared Dictionaries

You can improve performance of string operations and optimize storage using shared dictionaries. You can share dictionaries within a table or between different tables in the same database. The table with which you want to share dictionaries must exist when you create the table that references the TEXT ENCODING DICT field, and the column that you are referencing in that table must also exist. The following small DDL shows the basic structure:

CREATE TABLE text_shard (
i TEXT ENCODING DICT(32),
s TEXT ENCODING DICT(32),
SHARD KEY (i))
WITH (SHARD_COUNT = 2);

CREATE TABLE text_shard1 (
i TEXT,
s TEXT ENCODING DICT(32),
SHARD KEY (i),
SHARED DICTIONARY (i) REFERENCES text_shard(i))
WITH (SHARD_COUNT = 2);

In the table definition, make sure that referenced columns appear before the referencing columns.

For example, this DDL is a portion of the schema for the flights database. Because airports are both origin and destination locations, it makes sense to reuse the same dictionaries for name, city, state, and country values.

create table flights (
*
*
*
dest_name TEXT ENCODING DICT,
dest_city TEXT ENCODING DICT,
dest_state TEXT ENCODING DICT,
dest_country TEXT ENCODING DICT,

*
*
*
origin_name TEXT,
origin_city TEXT,
origin_state TEXT,
origin_country TEXT,
*
*
*

SHARED DICTIONARY (origin_name) REFERENCES flights(dest_name),
SHARED DICTIONARY (origin_city) REFERENCES flights(dest_city),
SHARED DICTIONARY (origin_state) REFERENCES flights(dest_state),
SHARED DICTIONARY (origin_country) REFERENCES flights(dest_country),
*
*
*
)
WITH(
*
*
*
)

To share a dictionary in a different existing table, replace the table name in the REFERENCES instruction. For example, if you have an existing table called us_geography, you can share the dictionary by following the pattern in the DDL fragment below.

create table flights (

*
*
*

SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
SHARED DICTIONARY (origin_state) REFERENCES us_geography(state),
SHARED DICTIONARY (origin_country) REFERENCES us_geography(country),
SHARED DICTIONARY (dest_city) REFERENCES us_geography(city),
SHARED DICTIONARY (dest_state) REFERENCES us_geography(state),
SHARED DICTIONARY (dest_country) REFERENCES us_geography(country),

*
*
*
)
WITH(
*
*
*
);

The referencing column cannot specify the encoding of the dictionary, because it uses the encoding from the referenced column.

DATE]

Note: Importing TEXT ENCODING NONE fields using the has limitations for Immerse. When you use string instead of string [dict. encode] for a column when importing, you cannot use that column in Immerse dashboards.

[2] - See and below for information about geospatial datatype sizes.

For more information about geospatial datatypes and functions, see .

Geospatial Capabilities
Storage
Compression
Data Manager
[1
CREATE TABLE