1 of 4

Supported Data Sources

Kafka

Apache Kafka is a distributed streaming platform. It allows you to create publishers, which create data streams, and consumers, which subscribe to and ingest the data streams produced by publishers.

You can use HeavyDB KafkaImporter C++ program to consume a topic created by running Kafka shell scripts from the command line. Follow the procedure below to use a Kafka producer to send data, and a Kafka consumer to store the data, in HeavyDB.

This example assumes you have already installed and configured Apache Kafka. See the Kafka website.

Creating a Topic

Create a sample topic for your Kafka producer.

Run the kafka-topics.sh script with the following arguments:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1
--partitions 1 --topic matstream

Create a file named myfile that consists of comma-separated data. For example:
```
michael,1
andrew,2
ralph,3
sandhya,4
```
Use heavysql to create a table to store the stream.
```
create table stream1(name text, id int);
```

Using the Producer

Load your file into the Kafka producer.

Create and start a producer using the following command.

cat myfile | bin/kafka-console-producer.sh --broker-list localhost:9097
--topic matstream

Using the Consumer

Load the data to HeavyDB using the Kafka console consumer and the KafkaImporter program.

Pull the data from Kafka into the KafkaImporter program.

/home/heavyai/build/bin/KafkaImporter stream1 heavyai -p HyperInteractive -u heavyai --port 6274 --batch 1 --brokers localhost:6283  
--topic matstream --group-id 1

Field Delimiter: ,
Line Delimiter: \n
Null String: \N
Insert Batch Size: 1
1 Rows Inserted, 0 rows skipped.
2 Rows Inserted, 0 rows skipped.
3 Rows Inserted, 0 rows skipped.
4 Rows Inserted, 0 rows skipped.

Verify that the data arrived using heavysql.

heavysql> select * from stream1;
name|id
michael|1
andrew|2
ralph|3
sandhya|4

Using Heavy Immerse Data Manager

Heavy Immerse supports file upload for .csv, .tsv, and .txt files, and supports comma, tab, and pipe delimiters.

Heavy Immerse also supports upload of compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

You can import data to HeavyDB using the Immerse import wizard. You can upload data from a local delimited file, from an Amazon S3 data source, or from the Data Catalog.

For methods specific to geospatial data, see also Importing Geospatial Data Using Immerse.

If there is a potential for duplicate entries, and you prefer to avoid loading duplicate rows, see How can I avoid creating duplicate rows?.
If a source file uses a reserved word, OmniSci automatically adds an underscore at the end of the reserved word. For example, year is converted to year_.
If you click the Back button (or accidentally two-finger swipe your mousepad) before your data load is complete, OmniSciDB stops the data load and any records that had transferred are invalidated.

Importing Non-Geospatial Data from a Local File

Follow these steps to import your data:

Click DATA MANAGER.
Click Import Data.
Click Import data from a local file.
Either click the plus sign (+) or drag your file(s) for upload. If you are uploading multiple files, the column names and data types must match. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
Choose Import Settings:
- Null string: If, instead using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.
- Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.
- Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
- Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
- Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.
Click Import Files.
The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. Immerse defaults to second precision for all timestamp columns. You can reset the precision to second, millisecond, nanosecond, or microsecond. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.
Name the table, and click Save Table.

You can also import locally stored shape files in a variety of formats. See Importing Geospatial Data Using Immerse.

Importing Data from Amazon S3

To import data from your Amazon S3 instance, you need:

The Region and Path for the file in your S3 bucket, or the direct URL to the file (S3 Link).
If importing private data, your Access Key and Secret Key for your personal IAM account in S3.

Locating the Data File S3 Region, Path, and URL

For information on opening and reviewing items in your S3 instance, see https://docs.aws.amazon.com/AmazonS3/latest/gsg/OpeningAnObject.html

In an S3 bucket, the Region is in the upper-right corner of the screen – US West (N. California) in this case:

Click the file you want to import. To load your S3 file to HEAVY.AI using the steps for S3 Region | Bucket | Path, below, click Copy path to copy to your clipboard the path to your file within your S3 bucket. Alternatively, you can copy the link to your file. The Link in this example is https://s3-us-west-1.amazonaws.com/my-company-bucket/trip_data.7z.

Obtaining Your S3 Access Key and Secret Key

To learn about creating your S3 Access Key and Secret Key, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey

If the data you want to copy is publicly available, you do not need to provide an Access Key and Secret Key.

You can import any file you can see using your IAM account with your Access Key and Secret Key.

Your Secret Key is created with your Access Key, and cannot be retrieved afterward. If you lose your Secret Key, you must create a new Access Key and Secret Key.

Loading Your S3 Data to HEAVY.AI

Follow these steps to import your S3 data:

Click DATA MANAGER.
Click Import Data.
Click Import data from Amazon S3.
Choose whether to import using the S3 Region | Bucket | Path or a direct full link URL to the file (S3 Link).
1. To import data using S3 Region | Bucket | Path:
  1. Select your Region from the pop-up menu.
  2. Enter the unique name of your S3 Bucket.
  3. Enter or paste the Path to the file stored in your S3 bucket.
2. To import data using S3 link:
  1. Copy the Link URL from the file Overview in your S3 bucket.
  2. Paste the link in the Full Link URL field of the HEAVY.AI Table Importer.
If the data is publicly available, you can disable the Private Data checkbox. If you are importing Private Data, enter your credentials:
1. Enable the Private Data checkbox.
2. Enter your S3 Access Key.
3. Enter your S3 Secret Key.
Choose the appropriate Import Settings. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV.
1. Null string: If you have substituted a string such as NULL for null values in your upload document, enter that string in the Null String field. The values are treated as null values on upload.
2. Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma or pipe.
3. Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
4. Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
Click Import Files.
The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.
Name the table, and click Save Table.

Importing from the Data Catalog

The Data Catalog provides access to sample datasets you can use to exercise data visualization features in Heavy Immerse. The selection of datasets continually changes, independent of product releases.

To import from the data catalog:

Open the Data Manager.
Click Data Catalog.
Use the Search box to locate a specific data set, or scroll to find the dataset you want to use. The Contains Geo toggle filters for data sets that contain Geographical information.
Click the Import button beneath the dataset you want to use.
Verify the table and column names in the Data Preview screen.
Click Import Data.

Appending Data to a Table

You can append additional data to an existing table.

To append data to a table:

Open Data Manager.
Select the table you want to append.
Click Append Data.
Click Import data from a local file.
Either click the plus sign (+) or drag your file(s) for upload. The column names and data types of the files you select must match the existing table. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
Click Preview.
Click Import Settings
Choose Import Settings:
- Null string: If, instead using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.
- Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.
- Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
- Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
- Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.
Close Import Settings.
The Data Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance.
If your data contains column headers, verify they match the existing headers.
Click Import Data.

To append data from AWS, click Append Data, then follow the instructions for Loading S3 Data to HEAVY.AI.

Truncating a Table

Sometimes you might want to remove or replace the data in a table without losing the table definition itself.

To remove all data from a table:

Open Data Manager.
Select the table you want to truncate.
Click Delete All Rows.
A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE ROWS.
Immerse displays the table information with a row count of 0.

Deleting a Table

You can drop a table entirely using Data Manager.

To delete a table:

Open Data Manager.
Select the table you want to delete.
Click DELETE TABLE.
A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE.
Immerse deletes the table and returns you to the Data Manager TABLES list.

Importing Geospatial Data

If there is a potential for duplicate entries and you want to avoid loading duplicate rows, see How can I avoid creating duplicate rows? on the Troubleshooting page.

Importing Geospatial Data Using Heavy Immerse

You can use Heavy Immerse to import geospatial data into HeavyDB.

Supported formats include:

Keyhole Markup Language (.kml)
GeoJSON (.geojson)
Shapefiles (.shp)
FlatGeobuf (.fgb)

Shapefiles include four mandatory files: .shp, .shx, .dbf, and .prj. If you do not import the .prj file, the coordinate system will be incorrect and you cannot render the shapes on a map.

To import geospatial definition data:

Open Heavy Immerse.
Click Data Manager.
Click Import Data.
Choose whether to import from a local file or an Amazon S3 instance. For details on importing from Amazon S3, see Importing Data from Amazon S3.
Click the large + icon to select files for upload, or drag and drop the files to the Data Importer screen.
When importing shapefiles, upload all required file types at the same time. If you upload them separately, Heavy Immerse issues an error message.
Wait for the uploads to complete (indicated by green checkmarks on the file icons), then click Preview.
On the Data Preview screen:
- Edit the column headers (if needed).
- Enter a name for the table in the field at the bottom of the screen.
- If you are loading the data files into a distributed system, verify under Import Settings that the Replicate Table checkbox is selected.
- Click Import Data.
On the Successfully Imported Table screen, verify the rows and columns that compose your data table.

Importing Well-Known Text

You can import spatial representations in Well-known Text (WKT) format. WKT is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects, and transformations between spatial reference systems.

When representing longitude and latitude in HEAVY.AI geospatial primitives, the first coordinate is assumed to be longitude by default.

WKT Data Supported in Geospatial Columns

You can use heavysql to define tables with columns that store WKT geospatial objects.

heavysql> \d geo
CREATE TABLE geo (
p POINT,
l LINESTRING,
poly POLYGON)

Insert

You can use heavysql to insert data as WKT string values.

heavysql> INSERT INTO geo values('POINT(20 20)', 'LINESTRING(40 0, 40 40)', 
'POLYGON(( 0 0, 40 0, 40 40, 0 40, 0 0 ))');

Importing Delimited Files

You can insert data from CSV/TSV files containing WKT strings. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.

> cat geo.csv
"p", "l", "poly"
"POINT(1 1)", "LINESTRING( 2 0,  2  2)", "POLYGON(( 1 0,  0 1, 1 1 ))"
"POINT(2 2)", "LINESTRING( 4 0,  4  4)", "POLYGON(( 2 0,  0 2, 2 2 ))"
"POINT(3 3)", "LINESTRING( 6 0,  6  6)", "POLYGON(( 3 0,  0 3, 3 3 ))"
"POINT(4 4)", "LINESTRING( 8 0,  8  8)", "POLYGON(( 4 0,  0 4, 4 4 ))"
heavysql> COPY geo FROM 'geo.csv';
Result
Loaded: 4 recs, Rejected: 0 recs in 0.356000 secs

You can use your own custom delimiter in your data files.

> cat geo1.csv
"p", "l", "poly"
POINT(5 5); LINESTRING(10 0, 10 10); POLYGON(( 5 0, 0 5, 5 5 ))
heavysql> COPY geo FROM 'geo1.csv' WITH (delimiter=';', quoted='false');
Result
Loaded: 1 recs, Rejected: 0 recs in 0.148000 secs

Importing Legacy CSV/TSV Files

Storing Geo Data

You can import CSV and TSV files for tables that store longitude and latitude as either:

Separate consecutive scalar columns
A POINT field.

If the data is stored as a POINT, you can use spatial functions like ST_Distance and ST_Contains. When location data are stored as a POINT column, they are displayed as such when querying the table:

select * from destination_points;
name|pt
Just Fishing Around|POINT (-85.499999999727588 44.6929999755849)
Moonlight Cove Waterfront|POINT (-85.5046011346879 44.6758447935227)

HEAVY.AI accepts data with any SRID, or with no SRID. HEAVY.AI supports SRID 4326 (WGS 84), and allows projections from SRID 4326 to SRID 900913 (Google Web Mercator). Geometries declared with SRID 4326 are compressed by default, and can be rendered and used to calculate geodesic distance. Geometries declared with any other SRID, or no SRID, are treated as planar geometries; the SRIDs are ignored.

If two geometries are used in one operation (for example, in ST_Distance), the SRID values need to match.

Importing the Data

If you are using heavysql, create the table in HEAVY.AI with the POINT field defined as below:

CREATE TABLE new_geo (p GEOMETRY(POINT,4326))

Then, import the file using COPY FROM in heavysql. By default, the two columns as consumed as longitude x and then latitude y. If the order of the coordinates in the CSV file is reversed, load the data using the WITH option lonlat='false':

heavysql> COPY new_geo FROM 'legacy_geo.csv' WITH (lonlat='false');

Columns can exist on either side of the point field; the lon/lat in the source file does not have to be at the beginning or end of the target table. Fields can exist on either side of the lon/lat pair.

If the imported coordinates are not 4326---for example, 2263---you can transform them to 4326 on the fly:

heavysql> COPY new_geo FROM 'legacy_geo_2263.csv' WITH (source_srid=2263, lonlat='false');

Importing CSV, TSV, and TXT Files in Immerse

In Immerse, you define the table when loading the data instead of predefining it before import. Immerse supports appending data to a table by loading one or more files.

Longitude and latitude can be imported as separate columns.

Importing Geospatial Files

You can create geo tables by importing specific geo file formats. HEAVY.AI supports the following types:

ESRI shapefile (.shp and associated files)
GeoJSON (.geojson or .json)
KML (.kml or .kmz)
ESRI file geodatabase (.gdb)

An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table. See Importing an ESRI File Geodatabase for more information.

You import geo files using the COPY FROM command with the geo option:

heavysql> COPY states FROM 'states.shp' WITH (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson' WITH (geo='true');
heavysql> COPY cell_towers FROM 'cell_towers.kml' WITH (geo='true');

The geo file import process automatically creates the table by detecting the column names and types explicitly described in the geo file header. It then creates a single geo column (always called heavyai_geo) that is of one of the supported types (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON).

In Release 6.2 and higher, polygon render metadata assignment is disabled by default. This data is no longer required by the new polygon rendering algorithm introduced in Release 6.0. The new default results in significantly faster import for polygon table imports, particularly high-cardinality tables.

If you need to revert to the legacy polygon rendering algorithm, polygons from tables imported in Release 6.2 may not render correctly. Those tables must be re-imported after setting the server configuration flag enable-assign-render-groups to true.

The legacy polygon rendering algorithm and polygon render metadata server config will be removed completely in an upcoming release.

Due to the prevalence of mixed POLYGON/MULTIPOLYGON geo files (and CSVs), if HEAVY.AI detects a POLYGON type geo file, HEAVY.AI creates a MULTIPOLYGON column and imports the data as single polygons.

If the table does not already exist, it is created automatically.

If the table already exists, and the data in the geo file has exactly the same column structure, the new file is appended to the existing table. This enables import of large geo data sets split across multiple files. The new file is rejected if it does not have the same column structure.

By default, geo data is stored as GEOMETRY.

You can also create tables with coordinates in SRID 3857 or SRID 900913 (Google Web Mercator). Importing data from shapefiles using SRID 3857 or 900913 is supported; importing data from delimited files into tables with these SRIDs is not supported at this time. To explicitly store in other formats, use the following WITH options in addition to geo='true':

Compression used:

COMPRESSED(32) - 50% compression (default)
None - No compression

Spatial reference identifier (SRID) type:

4326 - EPSG:4326 (default)
900913 - Google Web Mercator
3857 - EPSG:3857

For example, the following explicitly sets the default values for encoding and SRID:

geo_coords_encoding='COMPRESSED(32)'
geo_coords_srid=4326

Rendering of geo LINESTRING, MULTILINESTRING, POLYGON and MULTIPOLYGON is possible only with data stored in the default lon/lat WGS84 (SRID 4326) format, although the type and encoding are flexible. Unless compression is explictly disabled (NONE), all SRID 4326 geometries are compressed. For more information, see WSG84 Coordinate Compression.

Note that rendering of geo MULTIPOINT is not yet supported.

Importing an ESRI File Geodatabase

An ESRI file geodatabase (.gdb) provides a method of storing GIS information in one large file that can have one or more "layers", with each layer containing disparate but related data. The data in each layer can be of different types. Importing a .gdb file results in the creation of one table for each layer in the file. You import an ESRI file geodatabase the same way that you import other geo file formats, using the COPY FROM command with the geo option:

heavysql> COPY counties FROM 'counties.gdb' WITH (geo='true');

The layers in the file are scanned and defined by name and contents. Contents are classified as EMPTY, GEO, NON_GEO or UNSUPPORTED_GEO:

EMPTY layers are skipped because they contain no useful data.
GEO layers contain one or more geo columns of a supported type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON) and one or more regular columns, and can be imported to a single table in the same way as the other geo file formats.
NON_GEO layers contain no geo columns and one or more regular columns, and can be imported to a regular table. Although the data comes from a geo file, data in this layer does not result in a geo table.
UNSUPPORTED_GEO layers contain geo columns of a type not currently supported (for example, GEOMETRYCOLLECTION). These layers are skipped because they cannot be imported completely.

A single COPY FROM command can result in multiple tables, one for each layer in the file. The table names are automatically generated by appending the layer name to the provided table name.

For example, consider the geodatabase file mydata.gdb which contains two importable layers with names A and B. Running COPY FROM creates two tables, mydata_A and mydata_B, with the data from layers A and B, respectively. The layer names are appended to the provided table name. If the geodatabase file only contains one layer, the layer name is not appended.

You can load one specific layer from the geodatabase file by using the geo_layer_name option:

COPY mydata FROM 'mydata.gdb' WITH (geo='true', geo_layer_name='A');

This loads only layer A, if it is importable. The resulting table is called mydata, and the layer name is not appended. Use this import method if you want to set a different name for each table. If the layer name from the geodatabase file would result in an illegal table name when appended, the name is sanitized by removing any illegal characters.

Importing Geo Files from Archives or Non-Local Storage

You can import geo files directly from archive files (for example, .zip .tar .tgz .tar.gz) without unpacking the archive. You can directly import individual geo files compressed with Zip or GZip (GeoJSON and KML only). The server opens the archive header and loads the first candidate file it finds (.shp .geojson .json .kml), along with any associated files (in the case of an ESRI Shapefile, the associated files must be siblings of the first).

$ unzip -l states.zip
Archive:  states.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2018-02-13 11:09   states/
   446116  2017-11-06 12:15   states/cb_2014_us_state_20m.shp
     8434  2017-11-06 12:15   states/cb_2014_us_state_20m.dbf
        9  2017-11-06 12:15   states/cb_2014_us_state_20m.cpg
      165  2017-11-06 12:15   states/cb_2014_us_state_20m.prj
      516  2017-11-06 12:15   states/cb_2014_us_state_20m.shx
---------                     -------
   491525                     6 files

heavysql> COPY states FROM 'states.zip' with (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson.gz' with (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson.zip' with (geo='true');
heavysql> COPY cell_towers FROM 'cell_towers.kml.gz' with (geo='true');

You can import geo files or archives directly from an Amazon S3 bucket.

heavysql> COPY states FROM 's3://mybucket/myfolder/states.shp' with (geo='true');
heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' with (geo='true');
heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.gz' with (geo='true');
heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.zip' with (geo='true');

You can provide Amazon S3 credentials, if required, by setting variables in the environment of the heavysql process…

AWS_REGION=us-west-1
AWS_ACCESS_KEY_ID=********************
AWS_SECRET_ACCESS_KEY=****************************************

You can also provide your credentials explicitly in the COPY FROM command.

heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' WITH (geo='true', s3_region='us-west-1', s3_access_key='********************', s3_secret_key='****************************************');

You can import geo files or archives directly from an HTTP/HTTPS website.

heavysql> COPY states FROM 'http://www.mysite.com/myfolder/states.zip' with (geo='true');

You can extend a column type specification to include spatial reference (SRID) and compression mode information.

Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32). In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.

COMPRESSED(32) compression maps lon/lat degree ranges to 32-bit integers, providing a smaller memory footprint and faster query execution. The effect on precision is small, approximately 4 inches at the equator.

You can disable compression by explicitly choosing ENCODING NONE.

WGS84 Coordinate Compression

You can extend a column type specification to include spatial reference (SRID) and compression mode information.

Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32). In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.

CREATE TABLE geo2 (
p2 GEOMETRY(POINT, 4326) ENCODING NONE,
l2 GEOMETRY(LINESTRING, 900913),
poly2 GEOMETRY(POLYGON, 4326),
mpoly2 GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));

You can disable compression by explicitly choosing ENCODING NONE.

Importing Geospatial Data

If there is a potential for duplicate entries and you want to avoid loading duplicate rows, see How can I avoid creating duplicate rows? on the Troubleshooting page.

Importing Geospatial Data Using Heavy Immerse

You can use Heavy Immerse to import geospatial data into HeavyDB.

Supported formats include:

Keyhole Markup Language (.kml)
GeoJSON (.geojson)
Shapefiles (.shp)
FlatGeobuf (.fgb)

Shapefiles include four mandatory files: .shp, .shx, .dbf, and .prj. If you do not import the .prj file, the coordinate system will be incorrect and you cannot render the shapes on a map.

To import geospatial definition data:

Open Heavy Immerse.
Click Data Manager.
Click Import Data.
Choose whether to import from a local file or an Amazon S3 instance. For details on importing from Amazon S3, see Importing Data from Amazon S3.
Click the large + icon to select files for upload, or drag and drop the files to the Data Importer screen.
When importing shapefiles, upload all required file types at the same time. If you upload them separately, Heavy Immerse issues an error message.
Wait for the uploads to complete (indicated by green checkmarks on the file icons), then click Preview.
On the Data Preview screen:
- Edit the column headers (if needed).
- Enter a name for the table in the field at the bottom of the screen.
- If you are loading the data files into a distributed system, verify under Import Settings that the Replicate Table checkbox is selected.
- Click Import Data.
On the Successfully Imported Table screen, verify the rows and columns that compose your data table.

Importing Well-Known Text

When representing longitude and latitude in HEAVY.AI geospatial primitives, the first coordinate is assumed to be longitude by default.

WKT Data Supported in Geospatial Columns

You can use heavysql to define tables with columns that store WKT geospatial objects.

heavysql> \d geo
CREATE TABLE geo (
p POINT,
l LINESTRING,
poly POLYGON)

Insert

You can use heavysql to insert data as WKT string values.

heavysql> INSERT INTO geo values('POINT(20 20)', 'LINESTRING(40 0, 40 40)', 
'POLYGON(( 0 0, 40 0, 40 40, 0 40, 0 0 ))');

Importing Delimited Files

> cat geo.csv
"p", "l", "poly"
"POINT(1 1)", "LINESTRING( 2 0,  2  2)", "POLYGON(( 1 0,  0 1, 1 1 ))"
"POINT(2 2)", "LINESTRING( 4 0,  4  4)", "POLYGON(( 2 0,  0 2, 2 2 ))"
"POINT(3 3)", "LINESTRING( 6 0,  6  6)", "POLYGON(( 3 0,  0 3, 3 3 ))"
"POINT(4 4)", "LINESTRING( 8 0,  8  8)", "POLYGON(( 4 0,  0 4, 4 4 ))"
heavysql> COPY geo FROM 'geo.csv';
Result
Loaded: 4 recs, Rejected: 0 recs in 0.356000 secs

You can use your own custom delimiter in your data files.

> cat geo1.csv
"p", "l", "poly"
POINT(5 5); LINESTRING(10 0, 10 10); POLYGON(( 5 0, 0 5, 5 5 ))
heavysql> COPY geo FROM 'geo1.csv' WITH (delimiter=';', quoted='false');
Result
Loaded: 1 recs, Rejected: 0 recs in 0.148000 secs

Importing Legacy CSV/TSV Files

Storing Geo Data

You can import CSV and TSV files for tables that store longitude and latitude as either:

Separate consecutive scalar columns
A POINT field.

select * from destination_points;
name|pt
Just Fishing Around|POINT (-85.499999999727588 44.6929999755849)
Moonlight Cove Waterfront|POINT (-85.5046011346879 44.6758447935227)

If two geometries are used in one operation (for example, in ST_Distance), the SRID values need to match.

Importing the Data

If you are using heavysql, create the table in HEAVY.AI with the POINT field defined as below:

CREATE TABLE new_geo (p GEOMETRY(POINT,4326))

heavysql> COPY new_geo FROM 'legacy_geo.csv' WITH (lonlat='false');

If the imported coordinates are not 4326---for example, 2263---you can transform them to 4326 on the fly:

heavysql> COPY new_geo FROM 'legacy_geo_2263.csv' WITH (source_srid=2263, lonlat='false');

Importing CSV, TSV, and TXT Files in Immerse

In Immerse, you define the table when loading the data instead of predefining it before import. Immerse supports appending data to a table by loading one or more files.

Longitude and latitude can be imported as separate columns.

Importing Geospatial Files

You can create geo tables by importing specific geo file formats. HEAVY.AI supports the following types:

ESRI shapefile (.shp and associated files)
GeoJSON (.geojson or .json)
KML (.kml or .kmz)
ESRI file geodatabase (.gdb)

You import geo files using the COPY FROM command with the geo option:

heavysql> COPY states FROM 'states.shp' WITH (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson' WITH (geo='true');
heavysql> COPY cell_towers FROM 'cell_towers.kml' WITH (geo='true');

The legacy polygon rendering algorithm and polygon render metadata server config will be removed completely in an upcoming release.

If the table does not already exist, it is created automatically.

By default, geo data is stored as GEOMETRY.

Compression used:

COMPRESSED(32) - 50% compression (default)
None - No compression

Spatial reference identifier (SRID) type:

4326 - EPSG:4326 (default)
900913 - Google Web Mercator
3857 - EPSG:3857

For example, the following explicitly sets the default values for encoding and SRID:

geo_coords_encoding='COMPRESSED(32)'
geo_coords_srid=4326

Note that rendering of geo MULTIPOINT is not yet supported.

Importing an ESRI File Geodatabase

heavysql> COPY counties FROM 'counties.gdb' WITH (geo='true');

The layers in the file are scanned and defined by name and contents. Contents are classified as EMPTY, GEO, NON_GEO or UNSUPPORTED_GEO:

EMPTY layers are skipped because they contain no useful data.
GEO layers contain one or more geo columns of a supported type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON) and one or more regular columns, and can be imported to a single table in the same way as the other geo file formats.
NON_GEO layers contain no geo columns and one or more regular columns, and can be imported to a regular table. Although the data comes from a geo file, data in this layer does not result in a geo table.
UNSUPPORTED_GEO layers contain geo columns of a type not currently supported (for example, GEOMETRYCOLLECTION). These layers are skipped because they cannot be imported completely.

A single COPY FROM command can result in multiple tables, one for each layer in the file. The table names are automatically generated by appending the layer name to the provided table name.

You can load one specific layer from the geodatabase file by using the geo_layer_name option:

COPY mydata FROM 'mydata.gdb' WITH (geo='true', geo_layer_name='A');

Importing Geo Files from Archives or Non-Local Storage

$ unzip -l states.zip
Archive:  states.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2018-02-13 11:09   states/
   446116  2017-11-06 12:15   states/cb_2014_us_state_20m.shp
     8434  2017-11-06 12:15   states/cb_2014_us_state_20m.dbf
        9  2017-11-06 12:15   states/cb_2014_us_state_20m.cpg
      165  2017-11-06 12:15   states/cb_2014_us_state_20m.prj
      516  2017-11-06 12:15   states/cb_2014_us_state_20m.shx
---------                     -------
   491525                     6 files

heavysql> COPY states FROM 'states.zip' with (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson.gz' with (geo='true');
heavysql> COPY zipcodes FROM 'zipcodes.geojson.zip' with (geo='true');
heavysql> COPY cell_towers FROM 'cell_towers.kml.gz' with (geo='true');

You can import geo files or archives directly from an Amazon S3 bucket.

heavysql> COPY states FROM 's3://mybucket/myfolder/states.shp' with (geo='true');
heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' with (geo='true');
heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.gz' with (geo='true');
heavysql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.zip' with (geo='true');

You can provide Amazon S3 credentials, if required, by setting variables in the environment of the heavysql process…

AWS_REGION=us-west-1
AWS_ACCESS_KEY_ID=********************
AWS_SECRET_ACCESS_KEY=****************************************

You can also provide your credentials explicitly in the COPY FROM command.

heavysql> COPY states FROM 's3://mybucket/myfolder/states.zip' WITH (geo='true', s3_region='us-west-1', s3_access_key='********************', s3_secret_key='****************************************');

You can import geo files or archives directly from an HTTP/HTTPS website.

heavysql> COPY states FROM 'http://www.mysite.com/myfolder/states.zip' with (geo='true');

You can extend a column type specification to include spatial reference (SRID) and compression mode information.

Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32). In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.

You can disable compression by explicitly choosing ENCODING NONE.

WGS84 Coordinate Compression

You can extend a column type specification to include spatial reference (SRID) and compression mode information.

Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32). In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.

CREATE TABLE geo2 (
p2 GEOMETRY(POINT, 4326) ENCODING NONE,
l2 GEOMETRY(LINESTRING, 900913),
poly2 GEOMETRY(POLYGON, 4326),
mpoly2 GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));

You can disable compression by explicitly choosing ENCODING NONE.

Using Heavy Immerse Data Manager

Heavy Immerse supports file upload for .csv, .tsv, and .txt files, and supports comma, tab, and pipe delimiters.

Heavy Immerse also supports upload of compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.

You can import data to HeavyDB using the Immerse import wizard. You can upload data from a local delimited file, from an Amazon S3 data source, or from the Data Catalog.

For methods specific to geospatial data, see also Importing Geospatial Data Using Immerse.

If there is a potential for duplicate entries, and you prefer to avoid loading duplicate rows, see How can I avoid creating duplicate rows?.
If a source file uses a reserved word, OmniSci automatically adds an underscore at the end of the reserved word. For example, year is converted to year_.
If you click the Back button (or accidentally two-finger swipe your mousepad) before your data load is complete, OmniSciDB stops the data load and any records that had transferred are invalidated.

Importing Non-Geospatial Data from a Local File

Follow these steps to import your data:

Click DATA MANAGER.
Click Import Data.
Click Import data from a local file.
Either click the plus sign (+) or drag your file(s) for upload. If you are uploading multiple files, the column names and data types must match. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
Choose Import Settings:
- Null string: If, instead using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.
- Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.
- Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
- Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
- Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.
Click Import Files.
The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. Immerse defaults to second precision for all timestamp columns. You can reset the precision to second, millisecond, nanosecond, or microsecond. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.
Name the table, and click Save Table.

You can also import locally stored shape files in a variety of formats. See Importing Geospatial Data Using Immerse.

Importing Data from Amazon S3

To import data from your Amazon S3 instance, you need:

The Region and Path for the file in your S3 bucket, or the direct URL to the file (S3 Link).
If importing private data, your Access Key and Secret Key for your personal IAM account in S3.

Locating the Data File S3 Region, Path, and URL

For information on opening and reviewing items in your S3 instance, see https://docs.aws.amazon.com/AmazonS3/latest/gsg/OpeningAnObject.html

In an S3 bucket, the Region is in the upper-right corner of the screen – US West (N. California) in this case:

Obtaining Your S3 Access Key and Secret Key

To learn about creating your S3 Access Key and Secret Key, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey

If the data you want to copy is publicly available, you do not need to provide an Access Key and Secret Key.

You can import any file you can see using your IAM account with your Access Key and Secret Key.

Your Secret Key is created with your Access Key, and cannot be retrieved afterward. If you lose your Secret Key, you must create a new Access Key and Secret Key.

Loading Your S3 Data to HEAVY.AI

Follow these steps to import your S3 data:

Click DATA MANAGER.
Click Import Data.
Click Import data from Amazon S3.
Choose whether to import using the S3 Region | Bucket | Path or a direct full link URL to the file (S3 Link).
1. To import data using S3 Region | Bucket | Path:
  1. Select your Region from the pop-up menu.
  2. Enter the unique name of your S3 Bucket.
  3. Enter or paste the Path to the file stored in your S3 bucket.
2. To import data using S3 link:
  1. Copy the Link URL from the file Overview in your S3 bucket.
  2. Paste the link in the Full Link URL field of the HEAVY.AI Table Importer.
If the data is publicly available, you can disable the Private Data checkbox. If you are importing Private Data, enter your credentials:
1. Enable the Private Data checkbox.
2. Enter your S3 Access Key.
3. Enter your S3 Secret Key.
Choose the appropriate Import Settings. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV.
1. Null string: If you have substituted a string such as NULL for null values in your upload document, enter that string in the Null String field. The values are treated as null values on upload.
2. Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma or pipe.
3. Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
4. Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
Click Import Files.
The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.
Name the table, and click Save Table.

Importing from the Data Catalog

To import from the data catalog:

Open the Data Manager.
Click Data Catalog.
Use the Search box to locate a specific data set, or scroll to find the dataset you want to use. The Contains Geo toggle filters for data sets that contain Geographical information.
Click the Import button beneath the dataset you want to use.
Verify the table and column names in the Data Preview screen.
Click Import Data.

Appending Data to a Table

You can append additional data to an existing table.

To append data to a table:

Open Data Manager.
Select the table you want to append.
Click Append Data.
Click Import data from a local file.
Either click the plus sign (+) or drag your file(s) for upload. The column names and data types of the files you select must match the existing table. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
Click Preview.
Click Import Settings
Choose Import Settings:
- Null string: If, instead using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.
- Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.
- Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
- Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
- Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.
Close Import Settings.
The Data Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance.
If your data contains column headers, verify they match the existing headers.
Click Import Data.

To append data from AWS, click Append Data, then follow the instructions for Loading S3 Data to HEAVY.AI.

Truncating a Table

Sometimes you might want to remove or replace the data in a table without losing the table definition itself.

To remove all data from a table:

Open Data Manager.
Select the table you want to truncate.
Click Delete All Rows.
A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE ROWS.
Immerse displays the table information with a row count of 0.

Deleting a Table

You can drop a table entirely using Data Manager.

To delete a table:

Open Data Manager.
Select the table you want to delete.
Click DELETE TABLE.
A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE.
Immerse deletes the table and returns you to the Data Manager TABLES list.