Parquet Data Wrapper Reference

This reference describes the data types and configuration supported when using HeavyConnect with the Parquet data format. It covers prerequisites, restrictions, and the supported mappings from HeavyDB column types to Parquet column types.

Parquet Data Wrapper Prerequisites & Restrictions

HeavyDB to Parquet Data Type Mapping

Numeric and Boolean Types: Table 1

Numeric and Boolean Types: Table 2

Numeric and Boolean Types: Table 3

[1] Unsigned Parquet types must map to the next larger signed HeavyDB type. For example, Parquet’s unsigned INT(32) must map to HeavyDB’s BIGINT. This ensures that no information is lost.
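The reason for the widening rule in footnote [1] can be checked with plain integer arithmetic. The sketch below compares the largest value of each unsigned width with the largest value of a signed type of the same width and of the next larger width. Only the UINT32 to BIGINT row is stated in this reference; the UINT8 and UINT16 rows apply the same rule and are assumptions.

```python
# Sketch: the maximum of an N-bit unsigned integer never fits in an N-bit
# signed integer, but always fits in the next larger signed integer.

def signed_max(bits: int) -> int:
    """Largest value of a two's-complement signed integer with `bits` bits."""
    return 2 ** (bits - 1) - 1


def unsigned_max(bits: int) -> int:
    """Largest value of an unsigned integer with `bits` bits."""
    return 2 ** bits - 1


# Unsigned width -> (HeavyDB type, its width). Only 32 -> BIGINT comes from
# the reference; the 8- and 16-bit rows extend the same rule (assumption).
WIDENING = {8: ("SMALLINT", 16), 16: ("INTEGER", 32), 32: ("BIGINT", 64)}

for bits, (heavydb_type, wider_bits) in WIDENING.items():
    same_width_ok = unsigned_max(bits) <= signed_max(bits)
    wider_ok = unsigned_max(bits) <= signed_max(wider_bits)
    print(f"UINT{bits} max {unsigned_max(bits)}: "
          f"fits signed {bits}-bit={same_width_ok}, fits {heavydb_type}={wider_ok}")
```

For UINT32, the maximum value 4294967295 exceeds the signed 32-bit maximum of 2147483647, which is why INTEGER cannot hold it but BIGINT can.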

[2] Per the IEEE 754 standard, FLOAT values use a 32-bit representation and DOUBLE values use a 64-bit representation. When coercion from DOUBLE to FLOAT is requested, no check is made for loss of precision. However, each value is checked to ensure that it falls within the range FLOAT can represent.
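The behavior in footnote [2] (range checked, precision not checked) can be sketched in a few lines. This is an illustration of the described semantics, not HeavyConnect's implementation: the function name is hypothetical, and passing infinities and NaN through unchanged is an assumption, since the reference does not say how non-finite values are treated.

```python
import math
import struct

# Largest finite IEEE 754 single-precision value (FLT_MAX).
FLT_MAX = 3.4028234663852886e38


def coerce_double_to_float(value: float) -> float:
    """Hypothetical DOUBLE -> FLOAT coercion following footnote [2].

    Finite values outside the FLOAT range are rejected. In-range values are
    rounded to 32-bit precision with no check for precision loss. Non-finite
    values (inf, NaN) pass through unchanged here; that is an assumption.
    """
    if math.isfinite(value) and abs(value) > FLT_MAX:
        raise ValueError(f"{value!r} is outside the FLOAT range")
    # Round-trip through a 4-byte float to apply single-precision rounding.
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

With this sketch, coerce_double_to_float(0.1) silently returns a value close to, but not equal to, 0.1, while coerce_double_to_float(1e39) raises ValueError because 1e39 is larger than FLT_MAX.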

Date and Time Types: Table 1

Date and Time Types: Table 2

[1] No Parquet type can be mapped to DATE ENCODING DAYS (16) without a potential loss of information, so no direct mapping is supported. Some mappings are allowed through coercion.
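To see why footnote [1] rules out a direct mapping, compare ranges. Parquet’s DATE is a signed 32-bit count of days since 1970-01-01, while a 16-bit day count covers only about ±89 years around that date. The sketch assumes DATE ENCODING DAYS (16) stores a signed 16-bit day count relative to the Unix epoch, and it uses the raw 16-bit limits; the exact bounds HeavyDB enforces may differ slightly. The helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

EPOCH = date(1970, 1, 1)

# Assumption: a signed 16-bit count of days relative to the Unix epoch.
DAYS16_MIN, DAYS16_MAX = -(2 ** 15), 2 ** 15 - 1


def days16_range() -> Tuple[date, date]:
    """Earliest and latest dates representable by a signed 16-bit day count."""
    return EPOCH + timedelta(days=DAYS16_MIN), EPOCH + timedelta(days=DAYS16_MAX)


def fits_days16(d: date) -> bool:
    """True if `d` (for example a Parquet DATE value) fits the 16-bit encoding."""
    return DAYS16_MIN <= (d - EPOCH).days <= DAYS16_MAX


print(days16_range())
```

Any Parquet DATE outside roughly April 1880 to September 2059 cannot be stored in 16 bits, which is why only coercion, not a direct mapping, is offered.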

[2] The HeavyDB TIME type represents the number of seconds elapsed in a 24-hour period, while the Parquet TIME type represents the same quantity in milliseconds, microseconds, or nanoseconds. So that such Parquet columns can be used, this mapping is allowed even though it breaks the convention that only direct mappings are supported. An intermediate transform computes the number of elapsed seconds from the number of elapsed milliseconds, microseconds, or nanoseconds.
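The intermediate transform in footnote [2] amounts to dividing by the number of units per second. The sketch below uses Parquet’s TimeUnit names (MILLIS, MICROS, NANOS) as keys. Dropping the sub-second remainder (truncation) is an assumption; the reference does not say whether values are truncated or rounded. The function name is hypothetical.

```python
# Units per second for each Parquet TimeUnit.
UNITS_PER_SECOND = {"MILLIS": 1_000, "MICROS": 1_000_000, "NANOS": 1_000_000_000}


def parquet_time_to_seconds(value: int, unit: str) -> int:
    """Convert a Parquet TIME value (time of day in `unit`) to whole seconds.

    Time-of-day values are non-negative, so integer division truncates the
    sub-second part (truncation rather than rounding is an assumption).
    """
    return value // UNITS_PER_SECOND[unit]
```

For example, 12:34:56 is 45296 seconds after midnight, so parquet_time_to_seconds(45_296_000, "MILLIS") returns 45296.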

[3] Like the HeavyDB TIME type, TIMESTAMP (0) represents a date and time in seconds, a unit that none of the Parquet TIMESTAMP types use. An exception is therefore made for this case: mapping is supported from Parquet TIMESTAMPs in milliseconds, microseconds, or nanoseconds.
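Unlike a time of day, a timestamp can be negative (before 1970), so the direction of rounding matters when converting to seconds for TIMESTAMP (0). The sketch below uses floor division, which rounds pre-1970 values toward the earlier second; this choice is an assumption, as the reference does not specify the rounding. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Units per second for each Parquet TIMESTAMP unit.
UNITS_PER_SECOND = {"MILLIS": 1_000, "MICROS": 1_000_000, "NANOS": 1_000_000_000}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parquet_timestamp_to_seconds(value: int, unit: str) -> int:
    """Convert a Parquet TIMESTAMP (units since the Unix epoch) to whole seconds.

    Floor division (an assumption) maps -1 ms to -1 s, not to 0 s.
    """
    return value // UNITS_PER_SECOND[unit]


def to_utc(seconds: int) -> datetime:
    """Render a second-precision timestamp as a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)
```

Under this choice, one nanosecond before the epoch converts to 1969-12-31 23:59:59 UTC; a truncating conversion would instead give 1970-01-01 00:00:00 UTC.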

[4] Because Parquet’s TIMESTAMP stores data in a 64-bit representation and ENCODING FIXED (32) uses a 32-bit representation, information loss is possible. In these cases no direct mapping is supported; however, coercion is supported.
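The information loss described in footnote [4] is easiest to see by comparing a naive narrowing cast, which keeps only the low 32 bits, with a checked coercion that rejects out-of-range values. The sketch uses the plain signed 32-bit limits; the exact bounds HeavyDB enforces may differ by one (compare footnote [5]), and raising an error for out-of-range values is an assumption about how a coercion would report them. Both function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def wrap_to_int32(value: int) -> int:
    """Naive narrowing: keep the low 32 bits, interpreted as two's complement."""
    return (value + 2 ** 31) % 2 ** 32 - 2 ** 31


def coerce_to_int32(value: int) -> int:
    """Checked narrowing: reject values outside the signed 32-bit range."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in 32 bits")
    return value
```

One second past 3:14:07 am UTC on January 19, 2038 is 2**31 seconds after the epoch; wrap_to_int32 silently turns it into -2**31, a date in December 1901, whereas coerce_to_int32 refuses the value.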

[5] Timestamps with a 32-bit representation and a precision other than seconds have a very limited range and are not supported. Second-precision timestamps with a 32-bit representation have a range of 8:45:53 pm UTC, Friday, December 13, 1901 to 3:14:07 am UTC, Tuesday, January 19, 2038.
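The range in footnote [5] follows from the limits of a signed 32-bit count of seconds since the Unix epoch. The sketch converts those limits to UTC dates. The documented lower bound, 8:45:53 pm, is one second after the int32 minimum; a plausible explanation is that the minimum value is reserved (for example as a NULL sentinel), but that is an assumption, not something this reference states.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_to_utc(seconds: int) -> datetime:
    """Epoch seconds -> UTC datetime (timedelta also handles pre-1970 values)."""
    return EPOCH + timedelta(seconds=seconds)


latest = seconds_to_utc(2 ** 31 - 1)              # 2038-01-19 03:14:07 UTC
int32_min = seconds_to_utc(-(2 ** 31))            # 1901-12-13 20:45:52 UTC
documented_min = seconds_to_utc(-(2 ** 31) + 1)   # 1901-12-13 20:45:53 UTC

print(documented_min.strftime("%A"), documented_min, "to", latest.strftime("%A"), latest)
```

Adding seconds to an aware epoch datetime, rather than calling datetime.fromtimestamp, avoids platform limits on negative timestamps.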

String Types

GeoTypes

Only geometry types in WKT format are currently supported.
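Because only WKT is supported, it can help to pre-check geometry values before exposing a Parquet file through HeavyConnect. The sketch below is a shallow prefix check, not a WKT parser. It rejects binary values (such as WKB stored as bytes) and accepts strings whose leading keyword is in SUPPORTED_WKT_TYPES. That set is an assumption for illustration; confirm the geometry types your HeavyDB version supports. The function name is hypothetical.

```python
from typing import Optional

# Assumed set of supported WKT geometry keywords (illustrative only).
SUPPORTED_WKT_TYPES = {"POINT", "LINESTRING", "POLYGON", "MULTIPOLYGON"}


def wkt_geometry_type(value: object) -> Optional[str]:
    """Return the geometry keyword if `value` looks like WKT text, else None.

    Binary values (for example WKB bytes) return None. Forms such as
    "POINT EMPTY" or "POINT Z (...)" are not recognized by this shallow check.
    """
    if not isinstance(value, str):
        return None
    keyword = value.strip().split("(", 1)[0].strip().upper()
    return keyword if keyword in SUPPORTED_WKT_TYPES else None
```

For example, wkt_geometry_type("POINT (30 10)") returns "POINT", while a WKB byte string returns None.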

Array Types

The HeavyDB array data type maps to the Parquet list data type.
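One way to reason about a Parquet list column is to map its element type as if it were a scalar column and then append HeavyDB’s array suffix. The sketch assumes that element types follow the scalar mappings in this reference. ELEMENT_TYPE_MAP is a hypothetical, illustrative subset keyed by Parquet physical type names, not the full set of supported mappings.

```python
from typing import Optional

# Hypothetical subset of scalar mappings (Parquet physical type -> HeavyDB type).
ELEMENT_TYPE_MAP = {
    "BOOLEAN": "BOOLEAN",
    "INT32": "INTEGER",
    "INT64": "BIGINT",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
}


def heavydb_array_type(parquet_element_type: str) -> Optional[str]:
    """Map a Parquet LIST<element> column to a HeavyDB array type, if known."""
    element = ELEMENT_TYPE_MAP.get(parquet_element_type)
    return f"{element}[]" if element is not None else None
```

Under these assumptions, a Parquet list of INT64 values corresponds to a HeavyDB BIGINT[] column, and element types outside the table return None.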
