Datatypes
Datatypes and Fixed Encoding
This topic describes standard datatypes and space-saving variations for values stored in HEAVY.AI.
Datatypes
Each HEAVY.AI datatype uses space in memory and on disk. For certain datatypes, you can use fixed encoding for a more compact representation of these values. You can set a default value for a column by using the DEFAULT
constraint; for more information, see CREATE TABLE.
Datatypes, variations, and sizes are described in the following table.
Datatype | Size (bytes) | Notes |
| 8 | Minimum value: |
| 1 | Minimum value: |
| 2 | Same as |
| 4 | Same as |
| 1 | TRUE: |
| 4 | Same as |
| 2 | Range in days: |
| 4 | Range in years: |
| 2 | In DDL statements defaults to |
| 4 | In DDL statements defaults to |
| 2, 4, or 8 | Takes precision and scale parameters: Size depends on precision:
Scale must be less than precision. |
| 8 | Variable precision. Minimum value: |
| 8 | Seconds ranging from |
| 4 | Variable precision. Minimum value: |
| 4 | Minimum value: |
| 1 | Minumum value: |
| 2 | Same as |
| Variable[2] | Geospatial datatype. A sequence of 2 or more points and the lines that connect them. For example: |
MULTILINESTRING | Variable[2] | Geospatial datatype. A set of associated lines. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1)) |
MULTIPOINT | Variable[2] | Geospatial datatype. A set of points. For example: MULTIPOINT((0 0), (1 0), (2 0)) |
| Variable[2] | Geospatial datatype. A set of one or more polygons. For example: |
| Variable[2] | Geospatial datatype. A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: |
| Variable[2] | Geospatial datatype. A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: |
| 2 | Minimum value: |
| 1 | Minumum value: |
| 4 | Max cardinality 2 billion distinct string values. Maximum string length is 32,767. |
| 1 | Max cardinality 255 distinct string values. |
| 2 | Max cardinality 64 K distinct string values. |
| Variable | Size of the string + 6 bytes. Maximum string length is 32,767.
|
| 8 | Minimum value: |
| 4 | Minimum value: |
| 8 | Linux timestamp from |
| 8 | Linux timestamp from |
| 8 | Linux timestamp from |
| 8 | Linux timestamp from |
| 4 | Range: |
| 1 | Minimum value: |
[1] - In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE
columns, but you can create only 4-byte DATE
columns (default) and 2-byte DATE
columns (see DATE ENCODING DAYS(16)
).
[2] - See Storage and Compression below for information about geospatial datatype sizes.
HEAVY.AI does not support geometry arrays.
Timestamp values are always stored in 8 bytes. The greater the precision, the lower the fidelity.
Geospatial Datatypes
HEAVY.AI supports the LINESTRING, MULTILINESTRING
, POLYGON, MULTIPOLYGON
, POINT
, and MULTIPOINT
geospatial datatypes.
In the following example:
p0
,p1
,ls0
, andpoly0
are simple (planar) geometries.p4
is point geometry with Web Mercator longitude/latitude coordinates.p2
,p3
,mp, ls1
,ls2, mls1, mls2
,poly1
, andmpoly0
are geometries using WGS84 SRID=4326 longitude/latitude coordinates.
Storage
Geometry storage requirements are largely dependent on coordinate data. Coordinates are normally stored as 8-byte doubles, two coordinates per point, for all points that form a geometry. Each POINT geometry in the p1 column, for example, requires 16 bytes.
Compression
WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This sacrifices some precision but reduces storage requirements by half.
For example, columns p2, mp, ls1, mls1, poly1, and mpoly0 in the table defined above are compressed. Each geometry in the p2 column requires 8 bytes, compared to 16 bytes for p0.
You can explicitly disable compression. WGS84 columns p3, ls2, mls2 are not compressed and continue using doubles. Simple (planar) columns p0, p1, ls0, poly1 and non-4326 column p4 are not compressed.
For more information about geospatial datatypes and functions, see Geospatial Capabilities.
Defining Arrays
Define datatype arrays by appending square brackets, as shown in the arrayexamples
DDL sample.
You can also define fixed-length arrays. For example:
Fixed-length arrays require less storage space than variable-length arrays.
Fixed Encoding
To use fixed-length fields, the range of the data must fit into the constraints as described. Understanding your schema and the scope of potential values in each field helps you to apply fixed encoding types and save significant storage space.
These encodings are most effective on low-cardinality TEXT
fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP
fields where the timestamps range between 1901-12-13 20:45:53
and 2038-01-19 03:14:07
. If a TEXT ENCODING
field does not match the defined cardinality, HEAVY.AI substitutes a NULL
value and logs the change.
For DATE
types, you can use the terms FIXED
and DAYS
interchangeably. Both are synonymous for the DATE
type in HEAVY.AI.
Some of the INTEGER
options overlap. For example, INTEGER ENCODINGFIXED(8)
and SMALLINT ENCODINGFIXED(8)
are essentially identical.
Shared Dictionaries
You can improve performance of string operations and optimize storage using shared dictionaries. You can share dictionaries within a table or between different tables in the same database. The table with which you want to share dictionaries must exist when you create the table that references the TEXT ENCODING DICT
field, and the column that you are referencing in that table must also exist. The following small DDL shows the basic structure:
In the table definition, make sure that referenced columns appear before the referencing columns.
For example, this DDL is a portion of the schema for the flights database. Because airports are both origin and destination locations, it makes sense to reuse the same dictionaries for name, city, state, and country values.
To share a dictionary in a different existing table, replace the table name in the REFERENCES
instruction. For example, if you have an existing table called us_geography
, you can share the dictionary by following the pattern in the DDL fragment below.
The referencing column cannot specify the encoding of the dictionary, because it uses the encoding from the referenced column.
Last updated