Create, save, search, modify, and manipulate dashboards.
Operating Systems
CentOS/RHEL 7.0 or later
Ubuntu 18.04 or later
Ubuntu 22.04 is not currently supported.
Additional Components
OpenJDK version 8 or higher
EPEL
wget or curl
Kernel headers
Kernel development packages
log4j 2.15.0 or higher
NVIDIA hardware and software (for GPU installs only)
Hardware: Ampere, Turing, Volta, or Pascal series GPU cards. HEAVY.AI recommends that each GPU card in a server or distributed environment be of the same series.
Software:
NVIDIA CUDA drivers, version 470 or higher. Run nvidia-smi to determine the currently running driver.
Up-to-date Vulkan drivers.
Supported web browsers (Enterprise Edition, Immerse). Latest stable release of:
Chrome
Firefox
Safari version 15.x or higher
Some features in Heavy Immerse are not supported in the Internet Explorer browser due to performance issues in IE. HEAVY.AI recommends that you use a different browser to experience the latest Immerse features.
Use of HEAVY.AI is subject to the terms of the HEAVY.AI End User License Agreement (EULA).
Learn how to use Immerse to gain new insights into your data with fast, responsive graphics and SQL queries.
Learn how to install and configure your HEAVY.AI instance, then load data for analysis.
Learn how to extend HEAVY.AI with an integrated data science foundation and custom charts and interfaces. Contribute to the HEAVY.AI Core Open Source project.
For more complete release information, see the Release Notes.
HEAVY.AI continues to refine and extend the data connectors ecosystem. This release features general availability of data connectors for PostgreSQL, beta Immerse connectors for Snowflake and Redshift, and SQL support for Google BigQuery and Hive (beta). These managed data connections let you use HEAVY.AI as an acceleration platform, wherever your source data lives. Scheduling and automated caching ensure that from an end-user perspective, fast analytics are always running on the latest available data.
Immerse features four new chart types: Contour, Cross-section, Wind barb and Skew-t. While these are especially useful for atmospheric and geotechnical data visualization, Contour and Cross-section also have more general application.
Major improvements for time series analysis have been added. This includes time series comparison via window functions, and a large number of SQL window function additions and performance enhancements.
This release also includes two major architectural improvements:
The ability to perform cross-database queries in SQL, increasing flexibility across the board.
Render queries no longer block other GPU queries. In many use cases, renders can be significantly slower than other common queries. This should result in significant performance gains, particularly in map-heavy dashboards.
Chart animation through cross filter replay, allowing controlled playback of time-based data such as weather maps or GPS tracks.
You can now directly export your charts and dashboards as image files.
New control panel enables administrators to view the configuration of the system and easily access logs and system tables.
HeavyConnect now provides graphical Heavy Immerse support for Redshift, Snowflake, and PostGIS connections.
For CPU-only systems, mapping capabilities are improved with the introduction of multilayer CPU-rendered geo.
Numerous improvements to core SQL and geoSQL capabilities.
Support for string to numeric, timestamp, date, and time types with the new TRY_CAST operator.
Explicit and implicit cast support for numeric, timestamp, date, and time types.
Advanced string functions facilitate extraction of data from JSON and externally encoded string formats.
Improvements to COUNT DISTINCT reduce memory requirements considerably in cases with very large cardinalities or highly skewed data distributions.
Added MULTIPOINT and MULTILINESTRING geo types.
Convex and concave hull operators, allowing generation of polygons from points and multipoints. For example, you could generate polygons from clusters of GPS points.
Syntax and performance optimizations across all geometry types, table orderings, and commonly nested functions.
Significant functionality extension of window functions; define windows directly in temporal terms, which is particularly important in time series with missing observations. Window frame support allows improved control at the edges of windows.
Two new functions now support direct loading of LiDAR data: tf_point_cloud_metadata quickly searches tile metadata and helps you find data to import, and tf_load_point_cloud performs the actual import.
Network graph analytics functions have been added. These can work on networks alone, including non-geographic networks, or can find the least-cost path along a geographic network.
New spatial aggregation and smoothing functions. Aggregations work particularly well with LiDAR data; for example, you can pass through only the highest point within an area to create building or canopy height maps. Smoothing helps with noisy datasets and can reveal larger-scale patterns while minimizing visual distractions.
Release 6.1.0 features more granular administrative monitoring dashboards based on logs. These have been accessible in an open format on the server side, and now they are available in Immerse, by specific dashboards, users, or queries. Intermediate and advanced SQL support continues to mature, with INSERT, window functions, and UNION ALL.
This release contains a number of user interface polish items requested by customers. Cartography now supports polygons with colorful borders and transparent fills. Table presentation has been enhanced in various ways, from alignment to zebra striping. And dashboard saving reminders have been scaled back, based on customer feedback.
The extension framework now features an enhanced “custom source” dialog, as well as new SQL commands to see installed extensions and their parameters. We introduce three new extensions. The first, tf_compute_dwell_times, reduces GPS event stream data volumes considerably while keeping relevant information. The others compute feature similarity scores and are very general.
This release also includes initial public betas of our PostgreSQL Immerse connector, and SQL support for COPY FROM ODBC database connections, making it easier to connect to your enterprise data.
This release features large advances in data access, including intelligent linking to enterprise data (HeavyConnect) and support for raster geodata. SQL support includes high-performance string functions, as well as enhancements to window functions and table unions. Performance improvements are noticeable across the product, including fundamental advances in rendering, query compilation, and data transport. Our system administration tools have been expanded with a new Admin Portal, as well as additional system tables supporting detailed diagnostics. Major strides in extensibility include new charting options and a new extensions framework (beta).
Rebranded platform from OmniSci to HEAVY.AI, with OmniSciDB now HeavyDB, OmniSci Render now HeavyRender, and OmniSci Immerse now Heavy Immerse.
HeavyConnect allows the HEAVY.AI platform to work seamlessly as an accelerator for data in other data lakes and data warehouses. For Release 6.0, CSV and Parquet files on local file systems and in S3 buckets can be linked or imported. Other SQL databases are also supported via ODBC (beta).
HeavyConnect enables users to specify a data refresh schedule, which ensures access to up-to-date data.
Heavy Immerse now supports import of dozens of raster data formats, including GeoTIFF, GeoJPEG, and PNG. HeavySQL now supports nearly any vector GIS file format.
Support is included for multidimensional arrays common in the sciences, including GRIB2, NetCDF, and HDF5.
Immerse now supports linking or import of files on the server filesystem (local or mounted). This helps prevent slow data transfers when client bandwidth is limited.
File globbing and filtering allow import of thousands of files at once.
New Gauge chart for easy visualization of key metrics relative to target thresholds.
New landing page and Help Center.
Enhanced mapping workflows with automated column picking.
Support for a wide range of performant string operations using a new string dictionary translation framework, as well as the ability to on-the-fly dictionary encode none-encoded strings with a new ENCODE_TEXT operator.
Support for UNION ALL is now enabled by default, with significant performance improvements from the previous release (where it was beta flagged).
Significant functionality and performance improvements for window functions, including the ability to support expressions in PARTITION and ORDER clauses.
Parallel compilation of queries and a new multi-executor shared code cache provide up to 20% throughput/concurrency gains for interactive usage scenarios.
10X+ performance improvements in many cases for initial join queries via optimized Join Hash Table framework.
New result set recycler allows for expensive query sub-steps to be cached via the SQL hint /*+ keep_result */, which can significantly increase performance when a subquery is used across multiple queries.
Arrow execution endpoints now leverage the parallel execution framework, and Arrow performance has been significantly improved when high-cardinality dictionary-encoded text columns are returned.
Introduces a novel polygon rendering algorithm that does not require pre-triangulated or pre-grouped polygons and can render dynamically generated geometry on the fly (via ST_Buffer). The new algorithm is comparable to its predecessor in terms of both performance and memory and enables optimizations and enhancements in future releases.
New binary transport protocol to Heavy Immerse that significantly increases performance and interactivity for large result sets.
A new Admin Portal provides information on system resources usage and users.
System table support under a new information_schema database, containing 10 new system tables providing system statistics and memory and storage utilization.
New system and user-defined UDF framework (beta), comprising both row (scalar) and table (UDTF) functions, including the ability to define fast UDFs via Numba Python using the RBC framework, which are then inlined into the HeavyDB compiled query code for performant CPU and GPU execution.
System-provided table functions include generate_series for easy numeric series generation, tf_geo_rasterize_slope for fast geospatial binning and slope/aspect computation over elevation data, and others, with more capabilities planned for future releases.
Leveraging the new table function framework, a new HeavyRF module (licensed separately) includes tf_rf_prop and tf_rf_prop_max_signal table functions for fast radio frequency signal propagation analysis and visualization.
New Iframe chart type in Heavy Immerse to allow easier addition of custom chart types. (BETA)
Row-level security (RLS) can be used by an administrator to apply security filtering to queries run as a user or with a role.
Support for import from dozens of image and raster file types, such as jpeg, png, geotiff, and ESRI grid, including remote files.
Significantly more performant, parallelized window functions, executing up to 10X faster than in Release 5.9.
Automatic use of columnar output (instead of the default row-wise output) for large projections, reducing query times by 5-10X in some cases.
Support for the full set of ST_TRANSFORM SRIDs supported by the geos/proj4 library.
Support for numerous vector GIS files (100+ formats supported by current GDAL release).
Support for multidimensional array import from formats common in science and meteorology.
Improved Table chart export to access all data represented by a Table chart.
Introduced dashboard-level named custom SQL.
Significant speedup for POINT and fixed-length array imports and CTAS/ITAS, generally 5-20X faster.
The PNG encoding step of a render request is no longer a blocking step, providing improvement to render concurrency.
Adds support to hide legacy chart types from add/edit chart menu in preparation for future deprecation (defaults to off).
BETA - Adds custom expressions to table columns, allowing for reusable custom dimensions and measures within a single dashboard (defaults to off).
BETA - Adds Crosslink feature with Crosslink Panel UI, allowing crossfilters to fire across different data sources within the same dashboard (defaults to off).
BETA - Adds Custom SQL Source support and Custom SQL Source Manager, allowing the creation of a data source as a SQL statement (defaults to off).
Parallel execution framework is on by default. Running with multiple executors allows parts of query evaluation, such as code generation and intermediate reductions, to be executed concurrently. Currently available for single-node deployments.
Spatial joins between geospatial point types using the ST_Distance operator are accelerated using the overlaps hash join framework, with speedups up to 100x compared to Release 5.7.1.
Significant performance gains for many query patterns through optimization of query code generation, particularly benefitting CPU queries.
Window functions can now be executed without a partition clause being specified (to signify a partition encompassing all rows in the table).
Window functions can now execute over tables with multiple fragments and/or shards.
Native support for ST_Transform between all UTM Zones and EPSG:4326 (Lon/Lat) and EPSG:900913 (Web Mercator).
ST_Equals support for geospatial columns.
Support for the ANSI SQL WIDTH_BUCKET operator for easier and more performant numeric binning, now also used in Immerse for all numeric histogram visualizations.
The Vulkan backend renderer is now enabled by default. The legacy OpenGL renderer is still available as a fallback if there are blocking issues with Vulkan. You can disable the Vulkan renderer using the renderer-use-vulkan-driver=false configuration flag.
Vulkan provides improved performance, memory efficiency, and concurrency.
You are likely to see some performance and memory footprint improvements with Vulkan in Release 5.8, most significantly in multi-GPU systems.
Support for file path regex filter and sort order when executing the COPY FROM command.
New ALTER SYSTEM CLEAR commands that enable clearing CPU or GPU memory from Immerse SQL Editor or any other SQL client.
Extensive enhancements to Immerse support for parameters. Parameters can now be used in chart column selectors, chart filters, chart titles, global filters, and dashboard titles. Dashboards can have parameter widgets embedded on them, side-by-side with charts. Parameter values are visible in chart axes/labels, legends, and tooltips, and you can toggle parameter visibility.
In Immerse Pointmap charts, you can specify which color-by attribute always renders on top, which is useful for highlighting anomalies in data.
Significantly faster and more accurate "lasso" tool filters geospatial data on Immerse Pointmap charts, leveraging native geospatial intersection operations.
Immerse 3D Pointmap chart and HTML support in text charts are available as a beta feature.
Airplane symbol shape has been added as a built-in mark type for the Vega rendering API.
Vega symbol and multi-GPU polygon renders have been made significantly faster.
User-interrupt of query kernels is now on by default. Queries can be interrupted using Ctrl + C in omnisql, or by calling the interrupt API.
Parallel executors are in public beta (set with the --num-executors flag).
Support for APPROX_QUANTILE aggregate.
Support for default column values when creating a table and across all append endpoints, including COPY FROM, INSERT INTO TABLE SELECT, INSERT, and binary load APIs.
Faster and more robust ability to return result sets in Apache Arrow format when queried from a remote client (i.e. non-IPC).
More performant and robust high-cardinality group-by queries.
ODBC driver now supports Geospatial data types.
Custom SQL dimensions, measures, and filters can now be parameterized in Immerse, enabling more flexible and powerful scenario analysis, projections, and comparison use cases.
New angle measure added to Pointmap and Scatter charts, allowing orientation data to be visualized with wedge and arrow icons.
Custom SQL modal with validation and column name display now enabled across all charts in Immerse.
Significantly faster point-in-polygon joins through a new range join hash framework.
Approximate Median function support.
INSERT and INSERT FROM SELECT now support specification of a subset of columns.
Automatic metadata updates and vacuuming for optimizing space usage.
Significantly improved OmniSciDB startup time, as well as a number of significant load and performance improvements.
Improvements to line and polygon stroke rendering and point/symbol rendering.
Ability to set annotations on New Combo charts for different dimension/measure combinations.
New ‘Arrow-over-the-wire’ capability to deliver result sets in Apache Arrow format, with ~3x performance improvement over Thrift-based result set serialization.
Support for concurrent SELECT and UPDATE/DELETE queries for single-node installations.
Initial OmniSci Render support for CPU-only query execution ("Query on CPU, render on GPU"), allowing for a wider set of deployment infrastructure choices.
Cap metadata stored on previous states of a table by using MAX_ROLLBACK_EPOCHS, improving performance for streaming and small batch load use cases and modulating table size on disk.
Added initial compilation support for NVIDIA Ampere GPUs.
Improved performance for UPDATE and DELETE queries.
Improved the performance of filtered group-by queries on large-cardinality string columns.
Added SQL function SAMPLE_RATIO, which takes a proportion between 0 and 1 as an input argument and filters rows to obtain a sampling of a dataset.
Added support for exporting geo data in GeoJSON format.
Dashboard filter functionality is expanded, and filters can be saved as views.
You can perform bulk actions on the dashboard list.
New UI Setting panel in Immerse for customizing charts.
Tabbed dashboards.
SQL Editor now handles Vega JSON requests.
New Combo chart type in Immerse provides increased configurability and flexibility.
Immerse chart-specific filters and quick filters add increased flexibility and speed.
Updated Immerse Filter panel provides a Simple mode and Advanced mode for viewing and creating filters.
On multilayer charts, layer visibility can be set by zoom level.
Different map charts can be synced together for pan and zoom actions, regardless of data source.
Support for the Array type over JDBC.
SELECT DISTINCT in UNION ALL is supported. (UNION ALL is prerelease and must be explicitly enabled.)
Support for joins on DECIMAL types.
Performance improvements on CUDA GPUs, particularly Volta and Turing.
NULL support for geospatial types, including in ALTER TABLE ADD COLUMN.
SQL SHOW commands: SHOW TABLES, SHOW DATABASES, SHOW CREATE TABLE, and SHOW USER SESSIONS.
Ability to perform updates and deletes on temporary tables.
Updates to JDBC driver, including escape syntax handling for the fn keyword and added support to get table metadata.
Notable performance improvements, particularly for join queries, projection queries with order by and/or limit, queries with scalar subqueries, and multicolumn group-by queries.
Query interrupt capability improved to allow canceling long-running queries; JDBC is now also supported.
Completely overhauled SQL Editor, including query formatting, snippets, history, and more.
Database switching from within Immerse, as well as dashboard URLs that contain the database name.
Over 50% reduction in load times for the dashboards list initial load and search.
Cohort builder now supports count (# records) in aggregate filter.
Improved error handling and more meaningful error messages.
Custom logos can now be configured separately for light and dark themes.
Logos can be configured to deep-link to a specific URL.
Added support for UPDATE via JOIN with a subquery in the WHERE clause.
Initial support for TEMPORARY (that is, non-persistent) tables.
Improved performance for multi-column GROUP BY queries, as well as single column GROUP BY queries with high cardinality. Performance improvement varies depending on data volume and available hardware, but most use cases can expect a 1.5 to 2x performance increase over OmniSciDB 5.0.
Improved support for EXISTS and NOT EXISTS subqueries.
Added support for LINESTRING, POLYGON, and MULTIPOLYGON in user defined functions.
Immerse log-ins are fully sessionized and persist across page refreshes.
Pie chart now supports "All Others" and percentage labels.
Cohorts can now be built with aggregation-based filters.
New filter sets can be created through duplicating existing filter sets.
Dashboard URLs now link to individual filter sets.
The new filter panel in Immerse enables the ability to toggle filters on and off, and introduces Filter Sets to provide quick access to different sets of filters in one dashboard.
Immerse now supports using global and cross-filters to interactively build cohorts of interest, and the ability to apply a cohort as a dashboard filter, either within the existing filter set or in a new filter set.
Data Catalog, located within Data Import, is a repository of datasets that users can use to enhance existing analyses.
To see these new features in action, please watch this video from Converge 2019, where Rachel Wang demonstrates how you can use them.
Added support for binary dump and restore of database tables.
Added support for compile-time registered user-defined functions in C++, and experimental support for runtime user-defined SQL functions and table functions in Python via the Remote Backend Compiler.
Support for some forms of correlated subqueries.
Support for update via subquery, to allow for updating a table based on calculations performed on another table.
Multistep queries that generate large, intermediate result sets now execute up to 2.5x faster by leveraging new JIT code generator for reductions and optimized columnarization of intermediate query results.
Frontend-rendered choropleths now support the selection of base map layers.
In this section, you will find recipes to install the HEAVY.AI platform and NVIDIA drivers using a package manager such as apt or dnf, or from a tarball.
This is an end-to-end recipe for installing HEAVY.AI on a Red Hat Enterprise Linux (RHEL) or Rocky Linux 8.x machine using CPU and GPU devices.
The order of these instructions is significant. To avoid problems, install each component in the order presented.
The same instructions can be used to install on Rocky Linux / RHEL 9, with some minor modifications.
These instructions assume the following:
You are installing on a "clean" Rocky Linux / RHEL 8 host machine with only the operating system installed.
Your HEAVY.AI host only runs the daemons and services required to support HEAVY.AI.
Your HEAVY.AI host is connected to the Internet.
Prepare your machine by updating your system and optionally enabling or configuring a firewall.
Update the entire system and reboot the system if needed.
Install the utilities needed to create HEAVY.AI repositories and download installation binaries.
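A minimal sketch of these two preparation steps; the package names are from the standard Rocky Linux / RHEL repositories:
sudo dnf update -y
# Reboot if a new kernel was installed
sudo dnf install -y curl wget dnf-plugins-core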
Follow these instructions to install a headless JDK and configure an environment variable with a path to the library. The “headless” Java Development Kit does not provide support for keyboard, mouse, or display systems. It has fewer dependencies and is best suited for a server host. For more information, see https://openjdk.java.net.
Open a terminal on the host machine.
Install the headless JDK using the following command:
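For example, assuming the OpenJDK 8 headless package from the distribution repositories:
sudo dnf install -y java-1.8.0-openjdk-headless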
Create a group called heavyai and a user named heavyai, who will own HEAVY.AI software and data on the file system. You can create the group, user, and home directory using the useradd command with the --user-group and --create-home switches:
Set a password for the user using the passwd command.
Log in with the newly created user.
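A combined sketch of the user-creation steps above:
sudo useradd --user-group --create-home heavyai
sudo passwd heavyai
su - heavyai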
There are two ways to install the HEAVY.AI software:
DNF Installation: Use DNF's package management capabilities to search for and install the software. This method provides a convenient and efficient way to manage software installations and dependencies on your system.
Tarball Installation: Obtain a compressed archive file (tarball) from the software's official source or repository, extract its contents, and follow the installation instructions provided. This method allows for manual installation and customization of the software.
Using the DNF package manager for installation is highly recommended due to its ability to handle dependencies and streamline the installation process, making it a preferred choice for many users.
If your system includes NVIDIA GPUs but the drivers are not installed, it is advisable to install them before proceeding with the suite installation.
See Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL for details.
Create a DNF repository depending on the edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you will use.
Add the GPG-key to the newly added repository.
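An illustrative sketch only; the repository base URL and GPG key URL are edition- and device-specific values supplied by HEAVY.AI and are shown here as placeholders:
sudo tee /etc/yum.repos.d/heavyai.repo <<'EOF'
[heavyai]
name=HEAVY.AI
baseurl=<HEAVY.AI repository URL for your edition and device>
enabled=1
gpgcheck=1
EOF
sudo rpm --import <HEAVY.AI GPG key URL>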
Use DNF to install the latest version of HEAVY.AI.
You can use the DNF package manager to list the available packages when installing a specific version of HEAVY.AI, such as when a multistep upgrade is necessary, or a specific version is needed for any other reason.
sudo dnf --showduplicates list heavyai
Select the version needed from the list (for example, 7.0.0) and install it using the following command:
sudo dnf install heavyai-7.0.0_20230501_be4f51b048-1.x86_64
Let's begin by creating the installation directory.
Download the archive and install the latest version of the software. The appropriate archive is downloaded based on the edition (Enterprise, Free, or Open Source) and the device used for runtime.
Follow these steps to configure your HEAVY.AI environment.
For your convenience, you can update .bashrc with these environment variables.
Although this step is optional, you will find references to the HEAVYAI_BASE and HEAVYAI_PATH variables. These variables contain the paths where configuration, license, and data files are stored and the location of the software installation. It is strongly recommended that you set them up.
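A sketch of the variables in ~/.bashrc, assuming the software is installed in /opt/heavyai (adjust the paths to your installation):
echo 'export HEAVYAI_PATH=/opt/heavyai' >> ~/.bashrc
echo 'export HEAVYAI_BASE=/var/lib/heavyai' >> ~/.bashrc
echo 'export PATH=$HEAVYAI_PATH/bin:$PATH' >> ~/.bashrc
source ~/.bashrc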
Run the initialization script, located in the systemd folder, to create the HEAVY.AI services and initialize the database storage.
Accept the default values provided or make changes as needed.
This step will take a few minutes if you are installing a CUDA-enabled version of the software because the shaders must be compiled.
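A sketch of this step, assuming the installer script shipped in the systemd folder is named install_heavy_systemd.sh (the name may vary by release):
cd $HEAVYAI_PATH/systemd
# Run as the heavyai user; the script prompts for the values described above
./install_heavy_systemd.sh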
The script creates a data directory in $HEAVYAI_BASE/storage (typically /var/lib/heavyai) with the directories catalogs, data, and log, which will contain the metadata, the data of the database tables, and the log files from Immerse's web server and the database.
The log folder is particularly important for database administrators. It contains data about the system's health, performance, and user activities.
The first step to activate the system is starting HeavyDB and the Web Server service that Heavy Immerse needs. ¹
Heavy Immerse is not available in the OS Edition.
Start the HEAVY.AI services and enable them to start automatically at reboot.
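A sketch, assuming the heavydb and heavy_web_server services created by the installer:
sudo systemctl enable --now heavydb
sudo systemctl enable --now heavy_web_server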
If a firewall is not already installed and you want to harden your system, install and start firewalld.
To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access:
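For example, assuming the default Heavy Immerse web port of 6273:
sudo dnf install -y firewalld
sudo systemctl enable --now firewalld
sudo firewall-cmd --permanent --add-port=6273/tcp
sudo firewall-cmd --reload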
Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.
For more information, see https://fedoraproject.org/wiki/Firewalld?rd=FirewallD.
If you are on Enterprise or Free Edition, you need to validate your HEAVY.AI instance with your license key. You can skip this section if you are using Open Source Edition.
Copy your license key from the registration email message. If you have not received your license key, contact your Sales Representative or register for your 30-day trial here.
Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.
When prompted, paste your license key in the text box and click Apply.
Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.
The $HEAVYAI_BASE directory must be dedicated to HEAVY.AI; do not set it to a directory shared by other packages.
To verify that everything is working, load some sample data, perform a heavysql query, and generate a Pointmap using Heavy Immerse.
HEAVY.AI ships with two sample datasets of airline flight information collected in 2008, and a census of New York City trees. To install sample data, run the following command.
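A sketch, assuming the sample-data loader script shipped in $HEAVYAI_PATH is named insert_sample_data:
cd $HEAVYAI_PATH
sudo ./insert_sample_data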
Connect to HeavyDB by entering the following command in a terminal on the host machine (default password is HyperInteractive):
Enter a SQL query such as the following:
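A sketch that connects with heavysql and runs a sample aggregation; the column names assume the bundled flights_2008_10k sample dataset:
$HEAVYAI_PATH/bin/heavysql -p HyperInteractive <<'SQL'
SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime"
FROM flights_2008_10k
WHERE distance < 175
GROUP BY origin_city, dest_city;
SQL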
The results should be similar to the results below.
After installing Enterprise or Free Edition, check if Heavy Immerse is running as intended.
Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.
Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.
Create a new dashboard and a Scatter Plot to verify that backend rendering is working.
Click New Dashboard.
Click Add Chart.
Click SCATTER.
Click Add Data Source.
Choose the flights_2008_10k table as the data source.
Click X Axis +Add Measure.
Choose depdelay.
Click Y Axis +Add Measure.
Choose arrdelay.
Click Size +Add Measure.
Choose airtime.
Click Color +Add Measure.
Choose dest_state.
The resulting chart clearly shows a correlation between departure delay and arrival delay. This insight can help identify areas for improvement and strategies to minimize delays and enhance overall efficiency.
Create a new dashboard and a Bubble chart to verify that Heavy Immerse is working.
Click New Dashboard.
Click Add Chart.
Click Bubble.
Click Select Data Source.
Choose the flights_2008_10k table as the data source.
Click Add Dimension.
Choose carrier_name.
Click Add Measure.
Choose depdelay.
Click Add Measure.
Choose arrdelay.
Click Add Measure.
Choose #Records.
The resulting chart shows, unsurprisingly, that average departure delay is also correlated with average arrival delay, although there are notable differences between carriers.
Install the Extra Packages for Enterprise Linux (EPEL) repository and other packages before installing NVIDIA drivers.
RHEL-based distributions require Dynamic Kernel Module Support (DKMS) to build the GPU driver kernel modules. For more information, see https://fedoraproject.org/wiki/EPEL. Upgrade the kernel and restart the machine.
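A sketch of these steps for RHEL/Rocky 8:
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo dnf install -y dkms
sudo dnf upgrade -y kernel
sudo reboot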
Install kernel headers and development packages:
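For example:
sudo dnf install -y kernel-headers kernel-devel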
If installing kernel headers does not work correctly, follow these steps instead:
Identify the Linux kernel you are using by issuing the uname -r command.
Use the name of the kernel (4.18.0-553.el8_10.x86_64 in the following code example) to install kernel headers and development packages:
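A sketch using the example kernel release above; substitute the value reported by uname -r:
sudo dnf install -y kernel-headers-4.18.0-553.el8_10.x86_64 kernel-devel-4.18.0-553.el8_10.x86_64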
Install the dependencies and extra packages:
CUDA is a parallel computing platform and application programming interface (API) model. It uses a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The CUDA platform provides direct access to the GPU virtual instruction set and parallel computation elements. For more information on CUDA unrelated to installing HEAVY.AI, see https://developer.nvidia.com/cuda-zone. You can install drivers in multiple ways. This section provides installation information using the NVIDIA website or using dnf.
Although using the NVIDIA website is more time-consuming and less automated, you are assured that the driver is certified for your GPU. Use this method if you are not sure which driver to install. If you prefer a more automated method and are confident that the driver is certified, you can use the DNF package manager method.
Install the CUDA package for your platform and operating system according to the instructions on the NVIDIA website (https://developer.nvidia.com/cuda-downloads).
If you do not know the GPU model installed on your system, run this command:
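One way to list the NVIDIA devices before a driver is installed is with lspci:
lspci | grep -i nvidia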
The output shows the product type, series, and model. In this example, the product type is Tesla, the series is T (as Turing), and the model is T4.
Select the product type shown after running the command above.
Select the correct product series and model for your installation.
In the Operating System dropdown list, select Linux 64-bit.
In the CUDA Toolkit dropdown list, click a supported version (11.4 or higher).
Click Search.
On the resulting page, verify the download information and click Download.
Check that the driver version you download meets the HEAVY.AI minimum requirements.
Move the downloaded file to the server, change the permissions, and run the installation.
You might receive the following error during installation:
ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
If you receive this error, blacklist the Nouveau driver by editing the /etc/modprobe.d/blacklist-nouveau.conf
file, adding the following lines at the end:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
Install a specific version of the driver for your GPU by installing the NVIDIA repository and using the DNF package manager.
When installing the driver, ensure your GPU model is supported and meets the HEAVY.AI minimum requirements.
Add the NVIDIA network repository to your system.
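A sketch for RHEL/Rocky 8 using NVIDIA's CUDA network repository:
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo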
Install the driver version needed with dnf. For HEAVY.AI 8.0, the minimum version is 535.
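A sketch, assuming the DKMS module stream for the 535 series from the NVIDIA repository:
sudo dnf module install -y nvidia-driver:535-dkms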
To load the installed driver, run the sudo modprobe nvidia or nvidia-smi command. In the case of a driver upgrade, you can instead reboot your system with sudo reboot to ensure that the new version of the driver is loaded.
Run the specified command to verify that your drivers are installed correctly and recognize the GPUs in your environment. Depending on your environment, you should see output confirming the presence of your NVIDIA GPUs and drivers. This verification step ensures that your system can identify and utilize the GPUs as intended.
If you encounter an error similar to the following, the NVIDIA drivers are likely installed incorrectly:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Please ensure that the latest NVIDIA driver is installed and running.
Please review the Install NVIDIA Drivers section and correct any errors.
The back-end renderer requires a Vulkan-enabled driver and the Vulkan library to work correctly. Without these components, the database cannot start unless the back-end renderer is disabled.
To ensure the Vulkan library and its dependencies are installed, use DNF.
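A sketch; package names can vary slightly between releases:
sudo dnf install -y vulkan-loader vulkan-tools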
For more information about troubleshooting Vulkan, see the Vulkan Renderer section.
You must install the CUDA Toolkit if you use advanced features like C++ User-Defined Functions or User-Defined Table Functions to extend the database capabilities.
1. Add the NVIDIA network repository to your system:
2. List the available CUDA Toolkit versions using the DNF list command.
3. Install the CUDA Toolkit version using DNF.
4. Check that everything is working correctly:
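A sketch of steps 2 through 4; the toolkit version shown is illustrative:
sudo dnf list "cuda-toolkit-*"
sudo dnf install -y cuda-toolkit-12-2
/usr/local/cuda/bin/nvcc --version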
HEAVY.AI is an analytics platform designed to handle very large datasets. It leverages the processing power of GPUs alongside traditional CPUs to achieve very high performance. HEAVY.AI combines an open-source SQL engine (HeavyDB), server-side rendering (HeavyRender), and web-based data visualization (Heavy Immerse) to provide a comprehensive platform for data analysis.
The foundation of the platform is HeavyDB, an open-source, GPU-accelerated database. HeavyDB harnesses GPU processing power and returns SQL query results in milliseconds, even on tables with billions of rows. HeavyDB delivers high performance with rapid query compilation, query vectorization, and advanced memory management.
With native SQL support, HeavyDB returns query results hundreds of times faster than CPU-only analytical database platforms. Use your existing SQL knowledge to query data. You can use the standalone SQL engine with the command line, or the SQL editor that is part of the Heavy Immerse visual analytics interface. Your SQL query results can output to Heavy Immerse or to third-party software such as Birst, Power BI, Qlik, or Tableau.
HeavyDB can store and query data using native Open Geospatial Consortium (OGC) types, including POINT, LINESTRING, POLYGON, and MULTIPOLYGON. With geo type support, you can query geo data at scale using special geospatial functions. Using the power of GPU processing, you can quickly and interactively calculate distances between two points and intersections between objects.
HeavyDB is open source and encourages contribution and innovation from a global community of users. It is available on Github under the Apache 2.0 license, along with components like a Python interface (heavyai) and JavaScript infrastructure (mapd-connector, mapd-charting), making HEAVY.AI the leader in open-source analytics.
HeavyRender works on the server side, using GPU buffer caching, graphics APIs, and a Vega-based interface to generate custom pointmaps, heatmaps, choropleths, scatterplots, and other visualizations. HEAVY.AI enables data exploration by creating and sending lightweight PNG images to the web browser, avoiding high-volume data transfers. Fast SQL queries make metadata in the visualizations appear as if the data exists on the browser side.
Network bandwidth is a bottleneck for complex chart data, so HEAVY.AI uses in-situ rendering of on-GPU query results to accelerate visual rendering. This differentiates HEAVY.AI from systems that execute queries quickly but then transfer the results to the client for rendering, which slows performance.
Efficient geospatial analysis requires fast data-rendering of complex shapes on a map. HEAVY.AI can import and display millions of lines or polygons on a geo chart with minimal lag time. Server-side rendering technology prevents slowdowns associated with transferring data over the network to the client. You can select location shapes down to a local level, like census tracts or building footprints, and cross-filter interactively.
Complex server-side visualizations are specified using an adaptation of the Vega Visualization Grammar. Heavy Immerse generates Vega rendering specifications behind the scenes; however, you can also generate custom visualizations using the same API. This customizable visualization system combines the agility of a lightweight frontend with the power of a GPU engine.
Heavy Immerse is a web-based data visualization interface that uses HeavyDB and HeavyRender for visual interaction. Intuitive and easy to use, Heavy Immerse provides standard visualizations, such as line, bar, and pie charts, as well as complex data visualizations, such as geo point maps, geo heat maps, choropleths, and scatter plots. Heavy Immerse provides quick insights and makes them easy to recognize.
Use dashboards to create and organize your charts. Dashboards automatically cross-filter when interacting with data, and refresh with zero latency. You can create dashboards and interact with conventional charts and data tables, as well as scatterplots and geo charts created by HeavyRender. You can also create your own queries in the SQL editor.
Heavy Immerse lets you create a variety of different chart types. You can display pointmaps, heatmaps, and choropleths alongside non-geographic charts, graphs, and tables. When you zoom into any map, visualizations refresh immediately to show data filtered by that geographic context. Multiple sources of geographic data can be rendered as different layers on the same map, making it easy to find the spatial relationships between them.
Create geo charts with multiple layers of data to visualize the relationship between factors within a geographic area. Each layer represents a distinct metric overlaid on the same map. Those different metrics can come from the same or a different underlying dataset. You can manipulate the layers in various ways, including reorder, show or hide, adjust opacity, or add or remove legends.
Heavy Immerse can visually display dozens of datasets in the same dashboard, allowing you to find multi-factor relationships that you might not otherwise consider. Each chart (or groups of charts) in a dashboard can point to a different table, and filters are applied at the dataset level. Multisource dashboards make it easier to quickly compare across datasets, without merging the underlying tables.
Heavy Immerse is ideal for high-velocity data that is constantly streaming; for example, sensor, clickstream, telematics, or network data. You can see the latest data to spot anomalies and trend variances rapidly. Immerse auto-refresh automatically updates dashboards at flexible intervals that you can tailor to your use case.
I want to...
See...
Install HEAVY.AI
Upgrade to the latest version
Configure HEAVY.AI
See some tutorials and demos to help get up and running
Learn more about charts in Heavy Immerse
Use HEAVY.AI in the cloud
See what APIs work with HEAVY.AI
Learn about features and resolved issues for each release
Know what issues and limitations to look out for
See answers to frequently asked questions
This is an end-to-end recipe for installing HEAVY.AI on an Ubuntu 18.04/20.04 machine using CPU and GPU devices.
The order of these instructions is significant. To avoid problems, install each component in the order presented.
These instructions assume the following:
You are installing on a “clean” Ubuntu 18.04/20.04 host machine with only the operating system installed.
Your HEAVY.AI host only runs the daemons and services required to support HEAVY.AI.
Your HEAVY.AI host is connected to the Internet.
Prepare your Ubuntu machine by updating your system, creating the HEAVY.AI user (named heavyai), installing kernel headers, installing CUDA drivers, and optionally enabling the firewall.
1. Update the entire system:
2. Install the utilities needed to create HEAVY.AI repositories and download archives:
3. Install the headless JDK and the apt-transport-https utility:
4. Reboot to activate the latest kernel:
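A combined sketch of the four preparation steps above; the JDK package shown is one OpenJDK option that satisfies the version requirement:
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget gnupg
sudo apt install -y default-jre-headless apt-transport-https
sudo reboot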
Create a group called heavyai and a user named heavyai, who will be the owner of the HEAVY.AI software and data on the filesystem.
1. Create the group, user, and home directory using the useradd command with the --user-group and --create-home switches.
2. Set a password for the user:
3. Log in with the newly created user:
Install HEAVY.AI using APT or a tarball.
Installation using the APT package manager is recommended for those who want a more automated install and upgrade procedure.
Download and add a GPG key to APT.
Add an APT source depending on the edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you are going to use.
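An illustrative sketch only; the repository URL, suite, component, and GPG key URL are edition- and device-specific values supplied by HEAVY.AI and are shown here as placeholders:
curl -fsSL <HEAVY.AI APT GPG key URL> | sudo gpg --dearmor -o /usr/share/keyrings/heavyai.gpg
echo "deb [signed-by=/usr/share/keyrings/heavyai.gpg] <HEAVY.AI APT repository URL> <suite> <component>" | sudo tee /etc/apt/sources.list.d/heavyai.list
sudo apt update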
Use apt to install the latest version of HEAVY.AI.
If you need to install a specific version of HEAVY.AI, because you are upgrading from OmniSci or for other reasons, run the following command:
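A sketch; the exact version string comes from the repository listing:
sudo apt list -a heavyai
sudo apt install heavyai=<version string from the listing>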
First create the installation directory.
Download the archive and install the software. A different archive is downloaded depending on the Edition (Enterprise, Free, or Open Source) and the device used for runtime (GPU or CPU).
Follow these steps to prepare your HEAVY.AI environment.
For convenience, you can update .bashrc with these environment variables.
Although this step is optional, you will find references to the HEAVYAI_BASE and HEAVYAI_PATH variables. These variables contain, respectively, the path where configuration, license, and data files are stored and the location of the software installation. Setting them is strongly recommended.
Run the systemd installer to create the heavyai services and a minimal config file, and to initialize the data storage.
Accept the default values provided or make changes as needed.
The script creates a data directory in $HEAVYAI_BASE/storage (default /var/lib/heavyai/storage) with the directories catalogs, data, export, and log. The import directory is created when you insert data for the first time. If you are a HEAVY.AI administrator, the log directory is of particular interest.
Heavy Immerse is not available in the OS Edition, so the systemctl command using heavy_web_server has no effect.
Enable the automatic startup of the service at reboot and start the HEAVY.AI services.
If a firewall is not already installed and you want to harden your system, install ufw.
To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access.
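A minimal sketch that allows SSH and the Heavy Immerse web port (6273 by default):
sudo apt install -y ufw
sudo ufw allow ssh
sudo ufw allow 6273/tcp
sudo ufw enable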
Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.
If you are using Enterprise or Free Edition, you need to validate your HEAVY.AI instance with your license key.
Copy your Enterprise or Free Edition license key from the registration email message. If you do not have a license and want to evaluate HEAVY.AI, you can register for a 30-day trial.
Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.
When prompted, paste your license key in the text box and click Apply.
Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.
HEAVY.AI ships with two sample datasets of airline flight information collected in 2008, and a census of New York City trees. To install sample data, run the following command.
Connect to HeavyDB by entering the following command in a terminal on the host machine (default password is HyperInteractive):
Enter a SQL query such as the following:
The results should be similar to the results below.
After installing Enterprise or Free Edition, check if Heavy Immerse is running as intended.
Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273.
Log into Heavy Immerse by entering the default username (admin) and password (HyperInteractive), and then click Connect.
Create a new dashboard and a Scatter Plot to verify that backend rendering is working.
Click New Dashboard.
Click Add Chart.
Click SCATTER.
Click Add Data Source.
Choose the flights_2008_10k table as the data source.
Click X Axis +Add Measure.
Choose depdelay.
Click Y Axis +Add Measure.
Choose arrdelay.
Click Size +Add Measure.
Choose airtime.
Click Color +Add Measure.
Choose dest_state.
The resulting chart shows, unsurprisingly, that there is a correlation between departure delay and arrival delay.
Create a new dashboard and a Bubble chart to verify that Heavy Immerse is working.
Click New Dashboard.
Click Add Chart.
Click Bubble.
Click Select Data Source.
Choose the flights_2008_10k table as the data source.
Click Add Dimension.
Choose carrier_name.
Click Add Measure.
Choose depdelay.
Click Add Measure.
Choose arrdelay.
Click Add Measure.
Choose #Records.
The resulting chart shows, unsurprisingly, that average departure delay is also correlated with average arrival delay, although there are notable differences between carriers.
The amount of data you can process with the HEAVY.AI database depends primarily on the amount of GPU RAM and CPU RAM available across HEAVY.AI cluster servers. For zero-latency queries, the system caches compressed versions of the queried rows and columns into GPU RAM. This is called hot data. Semi-hot data utilizes CPU RAM for certain parts of the data.
The example configurations below can help you configure your system.
Optimal GPUs on which to run the HEAVY.AI platform include:
NVIDIA Tesla A100
NVIDIA Tesla V100 v2
NVIDIA Tesla V100 v1
NVIDIA Tesla P100
NVIDIA Tesla P40
NVIDIA Tesla T4
The following configurations are valid for systems using any of these GPUs as the building blocks of your system. For production systems, use Tesla enterprise-grade cards. Avoid mixing card types in the same system; use a consistent card model across your environment.
Primary factors to consider when choosing GPU cards are:
The amount of GPU RAM available on each card
The number of GPU cores
Memory bandwidth
Newer cards like the Tesla V100 have higher double-precision compute performance, which is important in geospatial analytics. The Tesla V100 models support the NVLink interconnect, which can provide a significant speed increase for some query workloads.
For advice on optimal GPU hardware for your particular use case, ask your HEAVY.AI sales representative.
Before considering hardware details, this topic describes the HeavyDB architecture.
HeavyDB is a hybrid compute architecture that utilizes GPU, CPU, and storage. GPU and CPU are the Compute Layer, and SSD storage is the Storage Layer.
When determining the optimal hardware, make sure to consider the storage and compute layers separately.
Loading raw data into HeavyDB ingests data onto disk, so you can load as much data as you have disk space available, allowing some overhead.
When queries are executed, HeavyDB optimizer utilizes GPU RAM first if it is available. You can view GPU RAM as an L1 cache conceptually similar to modern CPU architectures. HeavyDB attempts to cache the hot data. If GPU RAM is unavailable or filled, HeavyDB optimizer utilizes CPU RAM (L2). If both L1 and L2 are filled, query records overflow to disk (L3). To minimize latency, use SSDs for the Storage Layer.
You can run a query on a record set that spans both GPU RAM and CPU RAM as shown in the diagram above, which also shows the relative performance improvement you can expect based on whether the records all fit into L1, a mix of L1 and L2, only L2, or some combination of L1, L2, and L3.
The server is not limited to any number of hot records. You can store as much data on disk as you want. The system can also store and query records in CPU RAM, but with higher latency. The hot records represent the number of records on which you can perform zero-latency queries.
The amount of CPU RAM should equal four to eight times the amount of total available GPU memory. Each NVIDIA Tesla P40 has 24 GB of onboard RAM available, so if you determine that your application requires four NVIDIA P40 cards, you need between 4 x 24 GB x 4 (384 GB) and 4 x 24 GB x 8 (768 GB) of CPU RAM. This correlation between GPU RAM and CPU RAM exists because HeavyDB uses CPU RAM in certain operations for columns that are not filtered or aggregated.
A HEAVY.AI deployment should be provisioned with enough SSD storage to reliably store the required data on disk, both in compressed format and in HEAVY.AI itself. HEAVY.AI requires 30% overhead beyond compressed data volumes. HEAVY.AI recommends drives such as the Intel® SSD DC S3610 Series, or similar, in any size that meets your requirements.
For maximum ingestion speed, HEAVY.AI recommends ingesting data from files stored on the HEAVY.AI instance.
Most public cloud environments’ default storage is too small for the data volume HEAVY.AI ingests. Estimate your storage requirements and provision accordingly.
If you already have your data in a database, you can look at the largest fact table, get a count of those records, and compare that with this schedule.
If you have a .csv file, you need to get a count of the number of lines and compare it with this schedule.
HEAVY.AI uses the CPU in addition to the GPU for some database operations. GPUs are the primary performance driver; CPUs are utilized secondarily. More cores provide better performance but increase the cost. Intel CPUs with 10 cores offer good performance for the price. For example, you could configure your system with a single NVIDIA P40 GPU and two 10-core CPUs. Similarly, you can configure a server with eight P40s and two 10-core CPUs.
Suggested CPUs:
Intel® Xeon® E5-2650 v3 2.3GHz, 10 cores
Intel® Xeon® E5-2660 v3 2.6GHz, 10 cores
Intel® Xeon® E5-2687 v3 3.1GHz, 10 cores
Intel® Xeon® E5-2667 v3 3.2GHz, 8 cores
GPUs are typically connected to the motherboard using PCIe slots. The PCIe connection is based on the concept of a lane, which is a single-bit, full-duplex, high-speed serial communication channel. The most common numbers of lanes are x4, x8, and x16. The current PCIe 3.0 version with an x16 connection has a bandwidth of 16 GB/s. PCIe 2.0 bandwidth is half the PCIe 3.0 bandwidth, and PCIe 1.0 is half the PCIe 2.0 bandwidth. Use a motherboard that supports the highest bandwidth, preferably, PCIe 3.0. To achieve maximum performance, the GPU and the PCIe controller should have the same version number.
The PCIe specification permits slots with different physical sizes, depending on the number of lanes connected to the slot. For example, a slot with an x1 connection uses a smaller slot, saving space on the motherboard. However, bigger slots can actually have fewer lanes than their physical designation. For example, motherboards can have x16 slots connected to x8, x4, or even x1 lanes. With bigger slots, check to see if their physical sizes correspond to the number of lanes. Additionally, some slots downgrade speeds when lanes are shared. This occurs most commonly on motherboards with two or more x16 slots. Some motherboards have only 16 lanes connecting the first two x16 slots to the PCIe controller. This means that when you install a single GPU, it has the full x16 bandwidth available, but two installed GPUs each have x8 bandwidth.
HEAVY.AI does not recommend adding GPUs to a system that is not certified to support the cards. For example, to run eight GPU cards in a machine, the BIOS must register the additional address space required for the number of cards. Other considerations include power routing, power supply rating, and air movement through the chassis and cards for temperature control.
NVLink is a bus technology developed by NVIDIA. Compared to PCIe, NVLink offers higher bandwidth between host CPU and GPU and between the GPU processors. NVLink-enabled servers, such as the IBM S822LC Minsky server, can provide up to 160 GB/sec bidirectional bandwidth to the GPUs, a significant increase over PCIe. Because Intel does not currently support NVLink, the technology is available only on IBM Power servers. Servers like the NVIDIA-manufactured DGX-1 offer NVLink between the GPUs but not between the host and the GPUs.
A variety of hardware manufacturers make suitable GPU systems. For more information, follow these links to their product specifications.
Upgrade the system and the kernel, then reboot the machine if needed.
Install kernel headers and development packages.
Install the extra packages.
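A combined sketch of these steps; build-essential is shown as an example of commonly required extra build packages:
sudo apt update && sudo apt full-upgrade -y    # reboot afterwards if a new kernel was installed
sudo apt install -y "linux-headers-$(uname -r)"
sudo apt install -y build-essential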
The rendering engine of HEAVY.AI (present in Enterprise Editions) requires a Vulkan-enabled driver and the Vulkan library. Without these components, the database itself may not be able to start.
Install the Vulkan library and its dependencies using apt.
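For example:
sudo apt install -y libvulkan1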
For more information about troubleshooting Vulkan, see the Vulkan Renderer section.
Installing NVIDIA drivers with support for the CUDA platform is required to run GPU-enabled versions of HEAVY.AI.
You can install NVIDIA drivers in multiple ways; three available options are outlined below. If you prefer not to decide, we recommend Option 1.
Keep a record of the installation method you used; upgrading NVIDIA drivers later requires using the same method.
The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications, including GPU-accelerated libraries, a compiler, development tools, and the CUDA runtime. The CUDA Toolkit is not required to run HEAVY.AI, but you must install it if you use advanced features like C++ user-defined functions or user-defined table functions to extend the database capabilities.
The minimum CUDA version supported by HEAVY.AI is 11.4. We recommend using a release that has been available for at least two months.
In the "Target Platform" section, follow these steps:
For "Operating System" select Linux
For "Architecture" select x86_64
For "Distribution" select Ubuntu
For "Version" select the version of your operating system (18.04 or 20.04)
For "Installer Type" choose deb (network) **
One by one, run the presented commands in the Installer Instructions section on your server.
** You may optionally use any of the "Installer Type" options available.
If you choose to use the .run file option, prior to running the installer you need to manually install build-essential
using apt
and change permissions of the downloaded .run file to allow execution.
If you don't know the exact GPU model in your system, run this command:
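The command itself was not preserved in this extraction; one common way to identify the card, with or without an NVIDIA driver already installed, is:
# If an NVIDIA driver is already loaded, list the GPUs it sees
nvidia-smi -L
# Otherwise, query the PCI bus for NVIDIA devices
lspci | grep -i nvidia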
The output shows the Product Type, Series, and Model.
In this example, the Product Type is Tesla, the Series is T (Turing), and the Model is T4.
Select the Product Type as the one you got with the command.
Select the correct Product Series and Product for your installation.
In the Operating System dropdown list, select Linux 64-bit.
In the CUDA Toolkit dropdown list, click a supported version (11.4 or higher).
Click Search.
On the resulting page, verify the download information and click Download
On the subsequent page, if you agree to the terms, right-click "Agree and Download" and select "Copy Link Address". You can also manually download the file and transfer it to your server, skipping the next step.
On your server, type wget
and paste the URL you copied in the previous step. Press enter to download.
Install the tools needed for installation.
Change the permissions of the downloaded .run file to allow execution, and run the installation.
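A sketch of these steps (the installer file name is a placeholder for the .run file you downloaded):
# Install the build tools the .run installer needs
sudo apt install -y build-essential
# Make the installer executable and run it
chmod +x NVIDIA-Linux-x86_64-<version>.run
sudo ./NVIDIA-Linux-x86_64-<version>.run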
Install a specific version of the driver for your GPU by installing the NVIDIA repository and using the apt
package manager.
Run this command to get a list of the available driver versions (see the sketch after these steps):
Install the driver version needed with apt
Reboot your system to ensure the new version of the driver is loaded
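The commands for these three steps were not preserved here; the following sketch assumes the NVIDIA repository has already been added and uses an example driver version:
# List the driver versions available from the configured repositories
apt list 'nvidia-driver-*' 2>/dev/null
# Install the chosen version (525 is an example; pick one that meets HEAVY.AI's minimum requirements)
sudo apt install -y nvidia-driver-525
# Reboot so the new driver is loaded
sudo reboot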
Run nvidia-smi
to verify that your drivers are installed correctly and recognize the GPUs in your environment. Depending on your environment, you should see something like this to confirm that your NVIDIA GPUs and drivers are present.
If you see an error like the following, the NVIDIA drivers are probably installed incorrectly:
Review the installation instructions, specifically checking for completion of install prerequisites, and correct any errors.
The rendering engine of HEAVY.AI requires a Vulkan-enabled driver and the Vulkan library. Without these components, the database cannot start unless the back-end renderer is disabled.
Install the Vulkan library and its dependencies using apt
.
You must install the CUDA toolkit and Clang if you use advanced features like C++ user-defined functions or user-defined table functions to extend the database capabilities.
Install the NVIDIA public repository GPG key.
Add the repository.
List the available Cuda toolkit versions.
Install the CUDA toolkit using apt
.
Check that everything is working and the toolkit has been installed.
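The individual commands were not preserved here; the following sketch uses NVIDIA's cuda-keyring package, which installs the repository GPG key and the repository definition in one step (the URL and version numbers are examples for Ubuntu 20.04):
# Install the NVIDIA repository GPG key and repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
# List the available CUDA toolkit versions
apt list 'cuda-toolkit-*' 2>/dev/null
# Install a supported toolkit version (11.4 or higher)
sudo apt install -y cuda-toolkit-11-4
# Check that the toolkit is installed
/usr/local/cuda/bin/nvcc --version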
You must install Clang if you use advanced features like C++ user-defined functions or user-defined table functions to extend the database capabilities. Install Clang and LLVM dependencies using apt
.
Check that the software is installed and in the execution path.
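For example, on Ubuntu:
# Install Clang and the LLVM dependencies
sudo apt install -y clang llvm
# Check that the compiler is installed and in the execution path
clang --version
which clang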
Follow these steps to install HEAVY.AI as a Docker container on a machine running on CPU only or with supported NVIDIA GPU cards, using Ubuntu as the host OS.
Prepare your host by installing Docker and, if needed for your configuration, NVIDIA drivers and the NVIDIA container runtime.
Remove any existing Docker installs and, if on GPU, the legacy NVIDIA Docker runtime.
Use curl
to add Docker's GPG key.
Add Docker to your Apt repository.
Update your repository.
Install Docker, the command line interface, and the container runtime.
Run the following usermod
command so that docker command execution does not require sudo privilege (recommended). Log out and log back in for the changes to take effect.
Verify your Docker installation.
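The individual commands were not preserved in this extraction; the following condensed sketch follows Docker's standard Ubuntu installation procedure for the steps above:
# Remove legacy Docker packages (and, on GPU hosts, any legacy nvidia-docker packages)
sudo apt remove -y docker docker-engine docker.io containerd runc
# Add Docker's GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Add the Docker APT repository and refresh the package index
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
# Install Docker, the command-line interface, and the container runtime
sudo apt install -y docker-ce docker-ce-cli containerd.io
# Allow the current user to run docker without sudo (log out and back in afterward)
sudo usermod -aG docker $USER
# Verify the installation
docker --version
sudo docker run hello-world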
Use curl
to add NVIDIA's GPG key:
Update your sources list:
Update apt-get and install nvidia-container-runtime:
Edit /etc/docker/daemon.json to add the following, and save the changes:
Restart the Docker daemon:
Verify that docker and NVIDIA runtime work together.
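A condensed sketch of these steps (repository URLs follow NVIDIA's published instructions for nvidia-container-runtime; the CUDA image tag in the final check is an example):
# Add NVIDIA's GPG key and the nvidia-container-runtime repository
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$(. /etc/os-release; echo $ID$VERSION_ID)/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
# Install the runtime
sudo apt-get update
sudo apt-get install -y nvidia-container-runtime
# Register the runtime in /etc/docker/daemon.json
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
# Restart the Docker daemon and verify that containers can see the GPUs
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi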
If everything is working, you should see the output of the nvidia-smi command showing the GPUs installed in the system.
Create a directory to store data and configuration files.
Then create a minimal configuration file for the Docker installation.
Ensure that you have sufficient storage on the drive you choose for your storage directory by running this command:
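For example, the three steps together (paths and settings shown are the conventional defaults, not requirements):
# Create the storage directory
sudo mkdir -p /var/lib/heavyai
# Write a minimal heavy.conf for the Docker installation
sudo tee /var/lib/heavyai/heavy.conf > /dev/null <<'EOF'
port = 6274
http-port = 6278
data = "/var/lib/heavyai/storage"
null-div-by-zero = true

[web]
port = 6273
frontend = "/opt/heavyai/frontend"
EOF
# Check free space on the drive that hosts the storage directory
df -h /var/lib/heavyai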
Download HEAVY.AI from DockerHub and Start HEAVY.AI in Docker. Select the tab depending on the Edition (Enterprise, Free, or Open Source) and execution Device (GPU or CPU) you are going to use.
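As an illustration, a typical run command for the Enterprise Edition GPU image looks like this (the image tag and port range are examples; CPU and Open Source editions use different image names):
docker run -d --gpus all \
  -v /var/lib/heavyai:/var/lib/heavyai \
  -p 6273-6278:6273-6278 \
  --name heavyai \
  heavyai/heavyai-ee-cuda:latest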
Check that the container is up and running by using a docker ps command:
You should see an output similar to the following.
If a firewall is not already installed and you want to harden your system, install the ufw package.
To use Heavy Immerse or other third-party tools, you must prepare your host machine to accept incoming HTTP(S) connections. Configure your firewall for external access.
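One possible configuration on Ubuntu, opening the ports listed later in this guide (adjust the rules to your environment):
sudo apt install -y ufw
sudo ufw allow ssh
sudo ufw allow 6273/tcp   # Heavy Immerse
sudo ufw allow 6274/tcp   # HeavyDB Thrift API
sudo ufw enable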
Most cloud providers use a different mechanism for firewall configuration. The commands above might not run in cloud deployments.
Connect to Heavy Immerse using a web browser to your host on port 6273. For example, http://heavyai.mycompany.com:6273
.
When prompted, paste your license key in the text box and click Apply.
Log into Heavy Immerse by entering the default username (admin
) and password (HyperInteractive
), and then click Connect.
You can access the command line in the Docker image to perform configuration and run HEAVY.AI utilities.
You need to know the container-id
to access the command line. Use the command below to list the running containers.
You see output similar to the following.
Once you have your container ID, in the example 9e01e520c30c, you can access the command line using the Docker exec command. For example, here is the command to start a Bash session in the Docker instance listed above. The -it
switch makes the session interactive.
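For example, using the container ID shown above:
docker exec -it 9e01e520c30c bash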
You can end the Bash session with the exit
command.
HEAVY.AI ships with two sample datasets of airline flight information collected in 2008, and a census of New York City trees. To install sample data, run the following command.
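The command itself was not preserved here; it follows the standard install layout inside the container:
docker exec -it <container-id> ./insert_sample_data
# If the working directory differs in your image, use the full path, for example /opt/heavyai/insert_sample_data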
Where <container-id> is the container in which HEAVY.AI is running.
When prompted, choose whether to insert dataset 1 (7,000,000 rows), dataset 2 (10,000 rows), or dataset 3 (683,000 rows). The examples below use dataset 2.
Connect to HeavyDB by entering the following command (you are prompted for a password; the default password is HyperInteractive):
Enter a SQL query such as the following:
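The query was not preserved in this extraction; the following is an illustrative query against the flights_2008_10k sample table, using only columns referenced elsewhere in this guide:
SELECT carrier_name,
       AVG(depdelay) AS avg_departure_delay,
       AVG(arrdelay) AS avg_arrival_delay,
       COUNT(*) AS num_flights
FROM flights_2008_10k
GROUP BY carrier_name
ORDER BY avg_departure_delay DESC
LIMIT 10;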
The results should be similar to the results below.
If you installed Enterprise or Free Edition, check that Heavy Immerse is running as intended.
Connect to Heavy Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273
.
Log into Heavy Immerse by entering the default username (admin
) and password (HyperInteractive
), and then click Connect.
Create a new dashboard and a Scatter Plot to verify that backend rendering is working.
Click New Dashboard.
Click Add Chart.
Click SCATTER.
Click Add Data Source.
Choose the flights_2008_10k table as the data source.
Click X Axis +Add Measure.
Choose depdelay.
Click Y Axis +Add Measure.
Choose arrdelay.
Click Size +Add Measure.
Choose airtime.
Click Color +Add Measure.
Choose dest_state.
The resulting chart shows, unsurprisingly, that there is a correlation between departure delay and arrival delay.
Create a new dashboard and a Bubble chart to verify that Heavy Immerse is working.
Click New Dashboard.
Click Add Chart.
Click Bubble.
Click Select Data Source.
Choose the flights_2008_10k table as the data source.
Click Add Dimension.
Choose carrier_name.
Click Add Measure.
Choose depdelay.
Click Add Measure.
Choose arrdelay.
Click Add Measure.
Choose #Records.
The resulting chart shows, unsurprisingly, that average departure delay is also correlated with average arrival delay, with noticeable differences between carriers.
If your system uses NVIDIA GPUs but the drivers are not installed, install them now. See the NVIDIA driver installation section for details.
Start and use HeavyDB and Heavy Immerse.
For more information, see .
Skip this section if you are on Open Source Edition
If you don't have a license and you want to evaluate HEAVY.AI in an enterprise environment, contact your Sales Representative or register for your 30-day trial of Enterprise Edition. If you need a Free License, you can get one.
To verify that everything is working, load some sample data, perform a heavysql
query, and generate a Pointmap using Heavy Immerse
The table refers to hot records, which are the number of records that you want to put into GPU RAM to get zero-lag performance when querying and interacting with the data. The Hardware Sizing Schedule assumes 16 hot columns, which is the number of columns involved in the predicate or computed projections (such as column1 / column2) of any one of your queries. A 15 percent GPU RAM overhead is reserved for rendering buffering and intermediate results. If your queries involve more columns, the number of records you can put in GPU RAM decreases accordingly.
HeavyDB does not require all queried columns to be processed on the GPU. Non-aggregate projection columns, such as SELECT x, y FROM table, do not need to be processed on the GPU and so can be stored in CPU RAM. The CPU RAM sizing assumes that up to 24 columns are used only in non-computed projections, in addition to the hot columns.
This schedule estimates the number of records you can process based on GPU RAM and CPU RAM sizes, assuming up to 16 hot columns (see ). This applies to the compute layer. For the storage layer, provision your application according to guidelines.
HEAVY.AI recommends installing GPUs in motherboards with support for as much PCIe bandwidth as possible. On modern Intel chip sets, each socket (CPU) offers 40 lanes, so with the correct motherboards, each GPU can receive x8 of bandwidth. All recommended systems have motherboards designed to maximize PCIe bandwidth to the GPUs.
For an emerging alternative to PCIe, see NVLink.
Option 1: Install NVIDIA drivers with the CUDA toolkit from the NVIDIA website
Option 2: Install NVIDIA drivers via .run file from the NVIDIA website
Option 3: Install NVIDIA drivers using the APT package manager
CUDA is a parallel computing platform and application programming interface (API) model. It uses a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The CUDA platform provides direct access to the GPU virtual instruction set and parallel computation elements. For more information on CUDA unrelated to installing HEAVY.AI, see .
Open the CUDA Toolkit download page and select the desired CUDA Toolkit version to install.
Install the CUDA package for your platform and operating system according to the instructions on the NVIDIA website.
Check that the driver version you are downloading meets the HEAVY.AI minimum requirements.
Be careful when choosing the driver version to install. Ensure that your GPU model is supported and that the driver meets the HEAVY.AI minimum requirements.
For more information about troubleshooting Vulkan, see the section.
If you installed NVIDIA drivers using Option 1 above, the CUDA toolkit is already installed; you can proceed to the verification step below.
For more information, see C++ User-Defined Functions.
For more information on Docker installation, see the .
Install the NVIDIA driver and CUDA Toolkit using one of the options described above.
See also the note regarding the CUDA JIT Cache in Optimizing Performance.
For more information, see .
If you are on Enterprise or Free Edition, you need to validate your HEAVY.AI instance using your license key. You must skip this section if you are on Open Source Edition
Copy your Enterprise or Free Edition license key from the registration email message. If you don't have a license and you want to evaluate HEAVY.AI in an enterprise environment, contact your Sales Representative or register for your 30-day trial of Enterprise Edition. If you need a Free License, you can get one.
To verify that everything is working, load some sample data, perform a heavysql
query, and generate a Scatter Plot or a Bubble Chart using Heavy Immerse
GPU Count | GPU RAM (GB) (NVIDIA P40) | CPU RAM (GB) (8x GPU RAM) | "Hot" Records
1 | 24 | 192 | 417M
2 | 48 | 384 | 834M
3 | 72 | 576 | 1.25B
4 | 96 | 768 | 1.67B
5 | 120 | 960 | 2.09B
6 | 144 | 1,152 | 2.50B
7 | 168 | 1,344 | 2.92B
8 | 192 | 1,536 | 3.33B
12 | 288 | 2,304 | 5.00B
16 | 384 | 3,456 | 6.67B
20 | 480 | 3,840 | 8.34B
24 | 576 | 4,608 | 10.01B
28 | 672 | 5,376 | 11.68B
32 | 768 | 6,144 | 13.34B
40 | 960 | 7,680 | 16.68B
48 | 1,152 | 9,216 | 20.02B
56 | 1,344 | 10,752 | 23.35B
64 | 1,536 | 12,288 | 26.69B
128 | 3,072 | 24,576 | 53.38B
256 | 6,144 | 49,152 | 106.68B
GPU | Memory/GPU | Cores | Memory Bandwidth | NVLink
A100 | 40 to 80 GB | 6912 | 1134 GB/sec | Yes
V100 v2 | 32 GB | 5120 | 900 GB/sec | Yes
V100 | 16 GB | 5120 | 900 GB/sec | Yes
P100 | 16 GB | 3584 | 732 GB/sec | Yes
P40 | 24 GB | 3840 | 346 GB/sec | No
T4 | 16 GB | 2560 | 320 GB/sec | No
This procedure is considered experimental.
In some situations, you might not be able to upgrade NVIDIA CUDA drivers on a regular basis. To work around this issue, NVIDIA provides compatibility drivers that allow users to use newer features without requiring a full upgrade. For information about compatibility drivers, see https://docs.nvidia.com/deploy/cuda-compatibility/index.html.
Use the following commands to install the CUDA 11 compatibility drivers on Ubuntu:
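The commands were not preserved here; a sketch, assuming CUDA 11.4 and that the NVIDIA repository is already configured (the package name is an example):
sudo apt-get update
sudo apt-get install -y cuda-compat-11-4
# Confirm the CUDA version reported after installing the compatibility package
nvidia-smi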
After the last nvidia-smi
, ensure that CUDA shows the correct version.
The driver version will still show as the old version.
After installing the drivers, update the systemd files in /lib/systemd/system/heavydb.service.
In the [Service] section, add or update the Environment property.
The file should look like this:
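A minimal excerpt, assuming the compatibility libraries were installed under /usr/local/cuda/compat:
[Service]
Environment="LD_LIBRARY_PATH=/usr/local/cuda/compat"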
Then force the reload of the systemd configuration
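For example:
sudo systemctl daemon-reload
sudo systemctl restart heavydb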
Getting Started with AWS AMI
You can use the HEAVY.AI AWS AMI (Amazon Web Services Amazon Machine Image) to try HeavyDB and Heavy Immerse in the cloud. Perform visual analytics with the included New York Taxi database, or import and explore your own data.
Many options are available when deploying an AWS AMI. These instructions skip to the specific tasks you must perform to deploy a sample environment.
You need a security key pair when you launch your HEAVY.AI instance. If you do not have one, create one before you continue.
Go to the EC2 Dashboard.
Select Key Pairs under Network & Security.
Click Create Key Pair.
Enter a name for your key pair. For example, MyKey
.
Click Create. The key pair PEM file downloads to your local machine. For example, you would find MyKey.pem
in your Downloads
directory.
Go to the AWS Marketplace page for HEAVY.AI and select the version you want to use. You can get overview information about the product, see pricing, and get usage and support information.
Click Continue to Subscribe to subscribe.
Read the Terms and Conditions, and then click Continue to Configuration.
Select the Fulfillment Option, Software Version, and Region.
Click Continue to Launch.
On the Launch this software page, select Launch through EC2, and then click Launch.
From the Choose an Instance Type page, select an available EC2 instance type, and click Review and Launch.
Review the instance launch details, and click Launch.
Select a key pair, or click Create a key pair to create a new key pair and download it, and then click Launch Instances.
On the Launch Status page, click the instance name to see it on your EC2 Dashboard Instances page.
To connect to Heavy Immerse, you need your Public IP address and Instance ID for the instance you created. You can find these values on the Description tab for your instance.
To connect to Heavy Immerse:
Point your Internet browser to the public IP address for your instance, on port 6273. For example, for public IP 54.83.211.182
, you would use the URL https://54.83.211.182:6273
.
If you receive an error message stating that the connection is not private, follow the prompts onscreen to click through to the unsecured website. To secure your site, see Tips for Securing Your EC2 Instance.
Enter the USERNAME (admin), PASSWORD ( {Instance ID} ), and DATABASE (heavyai). If you are using the BYOL version, enter your license key in the key field and click Apply.
Click Connect.
On the Dashboards page, click NYC Taxi Rides. Explore and filter the chart information on the NYC Taxis Dashboard.
For more information on Heavy Immerse features, see Introduction to Heavy Immerse.
Working with your own familiar dataset makes it easier to see the advantages of HEAVY.AI processing speed and data visualization.
To import your own data to Heavy Immerse:
Export your data from your current datastore as a comma-separated value (CSV) or tab-separated value (TSV) file. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.
Point your Internet browser to the public IP address for your instance, on port 6273. For example, for public IP 54.83.211.182
, you would use the URL https://54.83.211.182:6273
.
Enter the USERNAME (admin) and PASSWORD ( {instance ID} ). If you are using the BYOL version, enter your license key in the key field and click Apply.
Click Connect.
Click Data Manager, and then click Import Data.
Drag your data file onto the table importer page, or use the directory selector.
Click Import Files.
Verify the column names and datatypes. Edit them if needed.
Enter a Name for your table.
Click Save Table.
Click Connect to Table.
On the New Dashboard page, click Add Chart.
Choose a chart type.
Add dimensions and measures as required.
Click Apply.
Enter a Name for your dashboard.
Click Save.
For more information, see Loading Data.
Follow these instructions to connect to your instance using SSH from MacOS or Linux. For information on connecting from Windows, see Connecting to Your Linux Instance from Windows Using PuTTY.
Open a terminal window.
Locate your private key file (for example, MyKey.pem). The wizard automatically detects the key you used to launch the instance.
Your key must not be publicly viewable for SSH to work. Use this command to change permissions, if needed:
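For example, using the key file name from the earlier step:
chmod 400 MyKey.pem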
Connect to your instance using its Public DNS. The default user name is centos
or ubuntu
, depending on the version you are using. For example:
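The host name below is a placeholder for your instance's Public DNS; use centos instead of ubuntu if your AMI is CentOS-based:
ssh -i MyKey.pem ubuntu@ec2-54-83-211-182.compute-1.amazonaws.com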
Use the following command to run the heavysql SQL command-line utility on HeavyDB. The default user is admin
and the default password is { Instance ID }:
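A typical invocation, assuming the standard install path (replace the password placeholder with your instance ID):
/opt/heavyai/bin/heavysql heavyai -u admin -p '<instance-id>'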
For more information, see heavysql.
This section provides a recipe for upgrading from the OmniSci platform 5.5+ to HEAVY.AI 6.0.
If your OmniSci version is older than 5.5, an intermediate upgrade step to version 5.5 is needed. Check the docs on how to do that upgrade.
If you are upgrading from OmniSci to HEAVY.AI, there are many additional steps compared to a simple sub-version upgrade.
IMPORTANT - Before you begin, stop all running services / Docker images of your OmniSci installation and create a backup of your $OMNISCI_STORAGE folder (typically /var/lib/omnisci). A backup is essential for recoverability; do not proceed with the upgrade without confirming that a full and consistent backup is available and ready to be restored.
The omnisci database is not automatically renamed to the new default name heavyai. This must be done manually, as documented in the upgrade steps.
Dumps created with the dump command on OmniSci cannot be restored after the database is upgraded to this version.
The following table describes the changes to environment variables, storage locations, and filenames in Release 6.0 compared to Release 5.x. Except where noted, revised storage subfolders, symlinks for old folder names, and filenames are created automatically on server start.
Change descriptions in bold require user intervention.
Change | Release 5.x | Release 6.0
Environment variable for storage location | $OMNISCI_STORAGE | $HEAVYAI_BASE
Default location for $HEAVYAI_BASE / $OMNISCI_STORAGE | /var/lib/omnisci | /var/lib/heavyai
Fixed location for Docker $HEAVYAI_BASE / $OMNISCI_STORAGE | /omnisci-storage | /var/lib/heavyai
The folder containing catalogs for $HEAVYAI_BASE / $OMNISCI_STORAGE | data/ | storage/
Storage subfolder - data | data/mapd_data | storage/data
Storage subfolder - catalog | data/mapd_catalogs | storage/catalogs
Storage subfolder - import | data/mapd_import | storage/import
Storage subfolder - export | data/mapd_export | storage/export
Storage subfolder - logs | data/mapd_log | storage/log
Server INFO logs | omnisci_server.INFO | heavydb.INFO
Server ERROR logs | omnisci_server.ERROR | heavydb.ERROR
Server WARNING logs | omnisci_server.WARNING | heavydb.WARNING
Web Server ACCESS logs | omnisci_web_server.ACCESS | heavy_web_server.ACCESS
Web Server ALL logs | omnisci_web_server.ALL | heavy_web_server.ALL
Install directory | /omnisci (Docker), /opt/omnisci (bare metal) | /opt/heavyai/ (Docker and bare metal)
Binary file - core server (located in install directory) | bin/omnisci_server | bin/heavydb
Binary file - web server (located in install directory) | bin/omnisci_web_server | bin/heavy_web_server
Binary file - command-line SQL utility | bin/omnisql | bin/heavysql
Binary file - JDBC jar | bin/omnisci-jdbc-5.10.2-SNAPSHOT.jar | bin/heavydb-jdbc-6.0.0-SNAPSHOT.jar
Binary file - Utilities (SqlImporter) jar | bin/omnisci-utility-5.10.2-SNAPSHOT.jar | bin/heavydb-utility-6.0.0-SNAPSHOT.jar
HEAVY.AI Server service (for bare metal install) | omnisci_server | heavydb
HEAVY.AI Web Server service (for bare metal install) | omnisci_web_server | heavy_web_server
Default configuration file | omnisci.conf | heavy.conf
The order of these instructions is significant. To avoid problems, follow the order of the instructions provided and do not skip any steps.
This upgrade procedure assumes that you are using the default storage locations for both OmniSci and HEAVY.AI.
$OMNISCI_STORAGE | $HEAVYAI_BASE
/var/lib/omnisci | /var/lib/heavyai
Stop all containers running Omnisci services.
In a terminal window, get the Docker container IDs:
You should see an output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c
:
Stop the HEAVY.AI Docker container. For example:
Backup the Omnisci data directory (typically /var/lib/omnisci
).
Rename the Omnisci data directory to reflect the HEAVY.AI naming scheme.
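For example, assuming the default locations (the data subfolder is also renamed to storage, per the folder mapping table above):
sudo mv /var/lib/omnisci /var/lib/heavyai
sudo mv /var/lib/heavyai/data /var/lib/heavyai/storage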
Create a new configuration file for heavydb changing the data parameter to point to the renamed data directory.
Rename the Omnisci license file (EE and FREE only).
Download and run the 6.0 version of the HEAVY.AI Docker image.
Select the tab depending on the Edition (Enterprise, Free, or Open Source) and execution Device (GPU or CPU) you are upgrading.
Check that Docker is up and running using a docker ps
command:
You should see output similar to the following:
Using the new container ID, rename the default omnisci database to heavyai:
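A sketch of one way to do this; the exact invocation may differ in your image, and the container ID is the example ID used above:
echo "ALTER DATABASE omnisci RENAME TO heavyai;" | \
  docker exec -i 9e01e520c30c /opt/heavyai/bin/heavysql omnisci -u admin -p HyperInteractive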
Check that everything is running as expected.
To upgrade an existing system installed with package managers or a tarball, use the following commands. They upgrade HEAVY.AI in place without disturbing your configuration or stored data.
Stop the Omnisci services.
Backup the Omnisci data directory (typically /var/lib/omnisci
).
Create a user named heavyai
who will be the owner of the HEAVY.AI software and data on the filesystem.
Set a password for the user; you will need it when using sudo.
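One way to do this:
sudo useradd -m -U -s /bin/bash heavyai
sudo passwd heavyai
# Optionally allow the user to run administrative commands
sudo usermod -aG sudo heavyai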
Log in with the newly created user.
Rename the Omnisci data directory to reflect the HEAVY.AI naming scheme and change the ownership to heavyai user.
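For example, assuming the default locations:
sudo mv /var/lib/omnisci /var/lib/heavyai
sudo mv /var/lib/heavyai/data /var/lib/heavyai/storage
sudo chown -R heavyai:heavyai /var/lib/heavyai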
Check that everything is in order and that the semaphore directory has been created. All directories must belong to the heavyai user, and the catalogs directory must be present.
Rename the license file. (EE and FREE only)
Install the HEAVY.AI software, following all the instructions for your operating system (CentOS/RHEL or Ubuntu).
Follow all the steps in Installation and Configuration up to the Initialization step.
Log in with the heavyai user and be sure that the heavyai services are stopped.
Create a new configuration file for heavydb changing the data
parameter to point to the /var/lib/heavyai/storage
directory and the frontend
to the new install directory.
All the settings of the upgraded database will be moved to the new configuration file.
Now we have to complete the database migration.
Remove the semaphore directory.
To complete the upgrade, start the HEAVY.AI servers.
Check that the database migrated by running this command and checking for the Rebrand migration complete message.
Rename the default omnisci
database to heavyai.
Run the command using an administrative user (typically admin) with its password (default HyperInteractive).
Restart the database service and check that everything is running as expected.
After the checks confirm that the upgraded system is stable, clean up the system by removing the OmniSci installation and related system configuration. Permanently remove the service configuration.
Remove the installed software.
Delete the YUM or APT repositories.
This section provides a recipe for upgrading between fully compatible product versions.
As with any software upgrade, it is important that you back up your data before upgrading. Each release introduces efficiencies that are not necessarily compatible with earlier releases of the platform. HEAVY.AI is not expected to be backward compatible.
Back up the contents of your $HEAVYAI_STORAGE directory.
If you need to upgrade from Omnisci to HEAVY.AI 6.0 or later, please refer to the specific recipe.
Direct upgrades from OmniSci to HEAVY.AI versions later than 6.0 are not supported.
To upgrade HEAVY.AI in place in Docker
In a terminal window, get the Docker container ID.
You should see output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c
:
Stop the HEAVY.AI Docker container. For example:
Optionally, remove the HEAVY.AI Docker container. This removes unused Docker containers on your system and saves disk space.
Back up the HEAVY.AI storage directory (typically /var/lib/heavyai).
Download the latest version of the HEAVY.AI Docker image for the Edition and device you are currently using. Select the tab depending on the Edition (Enterprise, Free, or Open Source) and execution device (GPU or CPU) you are upgrading.
If you don't want to upgrade to the latest version but to a specific one, change the latest tag to the version needed. For example, if the version needed is 6.0, use v6.0.0 as the version tag in the image name: heavyai/heavyai-ee-cuda:v6.0.0
Check that the container is up and running by using a docker ps command:
You should see an output similar to the following.
This runs both the HEAVY.AI database and Immerse in the same container.
You can optionally add --rm
to the Docker run
command so that the container is removed when it is stopped.
See also the note regarding the CUDA JIT Cache in Optimizing Performance.
To upgrade an existing system installed with package managers or a tarball, use the following commands. They upgrade HEAVY.AI in place without disturbing your configuration or stored data.
Stop the HEAVY.AI services.
Back up your $HEAVYAI_STORAGE directory (the default location is /var/lib/heavyai
).
Run the appropriate set of commands depending on the method used to install the previous version of the software.
Make a backup of your current installation.
Download and install the latest version, following the installation documentation for your operating system (CentOS/RHEL or Ubuntu).
When the upgrade is complete, start the HEAVY.AI services.
In this section, you will find recipes to upgrade from the OmniSci to the HEAVY.AI platform and upgrade between versions of the HEAVY.AI platform.
It is not always possible to upgrade directly from your current product version to the latest one; sometimes one or more intermediate upgrade steps are needed.
The following table shows the steps needed to move from one software version to another.
From | To | Upgrade path
OmniSci less than 5.5 | HEAVY.AI 7.0 | Upgrade to 5.5 --> 6.0 --> 7.0
OmniSci 5.5-5.10 | HEAVY.AI 7.0 | Upgrade to 6.0 --> 7.0
HEAVY.AI 6.0-6.4 | HEAVY.AI 7.0 | Upgrade to 7.0
Versions 5.x and 6.0.0 are not currently supported; use these only as needed to facilitate an upgrade to a supported version.
For example, if you are using an OmniSci version older than 5.5, you must upgrade to 5.5, then to 6.0, and then to 7.0. If you are using 6.0-6.4.4, you can upgrade to 7.0.0 in a single step.
HeavyDB includes the utilities for database initialization and for generating certificates and private keys for an HTTPS server.
Before using HeavyDB, initialize the data directory using initdb
:
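For example, assuming the default install and storage paths (add -f to overwrite an existing directory, or --skip-geo to omit the sample geospatial table):
/opt/heavyai/bin/initdb /var/lib/heavyai/storage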
This creates the following subdirectories:
catalogs
: Stores HeavyDB catalogs
data
: Stores HeavyDB data
log
: Contains all HeavyDB log files.
disk_cache
: Stores the data cached by HeavyConnect.
The -f
flag forces initdb
to overwrite existing data and catalogs in the specified directory.
By default, initdb
adds a sample table of geospatial data. Use the --skip-geo
flag if you prefer not to load sample geospatial data.
This command generates certificates and private keys for an HTTPS server. The options are:
[{-ca} <bool>]
: Whether this certificate should be its own Certificate Authority. The default is false
.
[{-duration} <duration>]
: Duration that the certificate is valid for. The default is 8760h0m0s
.
[{-ecdsa-curve} <string>]
: ECDSA curve to use to generate a key. Valid values are P224
, P256
, P384
, P521
.
[{-host} <string>]
: Comma-separated hostnames and IPs to generate a certificate for.
[{-rsa-bits} <int>]
: Size of RSA key to generate. Ignored if –ecdsa-curve is set. The default is 2048
.
[{-start-date} <string>]
: Start date formatted as Jan 1 15:04:05 2011
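A sketch of a typical invocation; the binary name and location are assumed from the standard install layout and are not spelled out in this extraction:
/opt/heavyai/bin/generate_cert -host "heavyai.mycompany.com" -duration 8760h0m0s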
This is a recipe to permanently remove HEAVY.AI Software, services, and data from your system.
To uninstall HEAVY.AI in Docker, stop and delete the current Docker container.
In a terminal window, get the Docker container ID:
You should see an output similar to the following. The first entry is the container ID. In this example, it is 9e01e520c30c
:
To see all containers, both running and stopped, use the following command:
Stop the HEAVY.AI Docker container. For example:
Remove the HEAVY.AI Docker container to save disk space. For example:
To uninstall an existing system installed with YUM, APT, or tarball, connect as the user that runs the platform, typically heavyai.
Disable and stop all HEAVY.AI services.
Remove the HEAVY.AI installation files ($HEAVYAI_PATH defaults to /opt/heavyai).
Delete the configuration files and the storage by removing the $HEAVYAI_BASE directory (defaults to /var/lib/heavyai).
Permanently remove the service configuration.
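A consolidated sketch of these steps, using the default paths and service names from this guide (the unit file names are assumptions):
sudo systemctl stop heavydb heavy_web_server
sudo systemctl disable heavydb heavy_web_server
sudo rm -rf /opt/heavyai
sudo rm -rf /var/lib/heavyai
sudo rm -f /lib/systemd/system/heavydb.service /lib/systemd/system/heavy_web_server.service
sudo systemctl daemon-reload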
HEAVY.AI uses the following ports.
HEAVY.AI features two system services: heavydb
and heavy_web_server
. You can start these services individually using systemd
.
Managing HeavyDB with systemd
For permanent installations of HeavyDB, HEAVY.AI recommends that you use systemd
to manage HeavyDB services. systemd
automatically handles tasks such as log management, starting the services on restart, and restarting the services if there is a problem.
In addition, systemd
manages the open-file limit in Linux. Some cloud providers and distributions set this limit too low, which can result in errors as your HEAVY.AI environment and usage grow. For more information about adjusting the limits on open files, see in .
You use the install_heavy_systemd.sh
script to prepare systemd
to run HEAVY.AI services. The script asks questions about your environment, then installs the systemd
service files in the correct location. You must run the script as the root user so that the script can perform tasks such as creating directories and changing ownership.
The install_heavy_systemd.sh
script asks for the information described in the following table.
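For example, assuming the script lives in the systemd folder of the install directory:
cd /opt/heavyai/systemd
sudo ./install_heavy_systemd.sh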
Starting HeavyDB Using systemd
To manually start HeavyDB using systemd
, run:
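For example, for the heavydb service (heavy_web_server is managed the same way):
sudo systemctl start heavydb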
Restarting HeavyDB Using systemd
You can use systemd
to restart HeavyDB — for example, after making configuration changes:
Stopping HeavyDB Using systemd
To manually stop HeavyDB using systemd
, run:
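For example:
sudo systemctl stop heavydb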
To enable the HeavyDB services to start on restart, run:
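For example:
sudo systemctl enable heavydb heavy_web_server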
You can customize the behavior of your HEAVY.AI servers by modifying your heavy.conf configuration file. See .
Port | Service | Use
6273 | heavy_web_server | Used to access Heavy Immerse.
6274 | heavydb tcp | Used by connectors (heavyai, omnisql, odbc, and jdbc) to access the more efficient Thrift API.
6276 | heavy_web_server | Used to access the HTTP/JSON Thrift API.
6278 | heavydb http | Used to directly access the HTTP/binary Thrift API, without having to proxy through heavy_web_server. Recommended for debugging use only.
Variable | Use | Default | Notes
HEAVYAI_PATH | Path to HeavyDB installation directory | Current install directory | HEAVY.AI recommends heavyai as the install directory.
HEAVYAI_BASE | Path to the storage directory for HeavyDB data and configuration files | heavyai | Must be dedicated to HEAVY.AI. The installation script creates the directory $HEAVYAI_STORAGE/data, generates an appropriate configuration file, and saves the file as $HEAVYAI_STORAGE/heavy.conf.
HEAVYAI_USER | User HeavyDB is run as | Current user | User must exist before you run the script.
HEAVYAI_GROUP | Group HeavyDB is run as | Current user's primary group | Group must exist before you run the script.
HEAVY.AI has minimal configuration requirements with a number of additional configuration options. This topic describes the required and optional configuration changes you can use in your HEAVY.AI instance.
In release 4.5.0 and higher, HEAVY.AI requires that all configuration flags used at startup match a flag on the HEAVY.AI server. If any flag is misspelled or invalid, the server does not start. This helps ensure that all settings are intentional and will not have an unexpected impact on performance or data integrity.
Before starting the HEAVY.AI server, you must initialize the persistent storage directory.
1. Create an empty directory at the desired path, such as /var/lib/heavyai, and create the environment variable $HEAVYAI_BASE:
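For example:
sudo mkdir -p /var/lib/heavyai
export HEAVYAI_BASE=/var/lib/heavyai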
2. Then, change the owner of the directory to the user that the server will run as ($HEAVYAI_USER):
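For example (a typical invocation; adjust if your user or path differs):
sudo chown -R $HEAVYAI_USER $HEAVYAI_BASE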
where $HEAVYAI_USER is the system user account that the server runs as, such as heavyai
, and $HEAVYAI_BASE is the path to the parent of the HEAVY.AI server storage directory.
3. Run $HEAVYAI_PATH/bin/initheavy with the storage directory path as the argument:
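For example, assuming the storage subfolder under $HEAVYAI_BASE:
sudo -u $HEAVYAI_USER $HEAVYAI_PATH/bin/initheavy $HEAVYAI_BASE/storage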
Immerse serves the application from the root path (/) by default. To serve the application from a sub-path, you must modify the $HEAVYAI_PATH/frontend/app-config.js file to change the IMMERSE_PATH_PREFIX value. The Heavy Immerse path must start with a forward slash (/).
The configuration file stores runtime options for your HEAVY.AI servers. You can use the file to change the default behavior.
The heavy.conf file is stored in the $HEAVYAI_BASE directory. The configuration settings are picked up automatically by the sudo systemctl start heavydb
and sudo systemctl start heavy_web_server
commands.
Set the flags in the configuration file using the format <flag> = <value>
. Strings must be enclosed in quotes.
The following is a sample configuration file. The entry for data
path is a string and must be in quotes. The last entry in the first section, for null-div-by-zero
, is the Boolean value true
and does not require quotes.
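A short illustrative example (the data entry and null-div-by-zero follow the description above; the remaining values are common defaults, shown only for context):
port = 6274
http-port = 6278
data = "/var/lib/heavyai/storage"
null-div-by-zero = true

[web]
port = 6273
frontend = "/opt/heavyai/frontend"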
To comment out a line in heavy.conf, prepend the line with the pound sign (#) character.
For encrypted backend connections, if you do not use a configuration file to start the database, Calcite expects passwords to be supplied through the command line, and calcite passwords will be visible in the processes table. If a configuration file is supplied, then passwords must be supplied in the file. If they are not, Calcite will fail.
Following are the parameters for runtime settings on HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.
For example, consider allow-loop-joins [=arg(=1)] (=0)
.
If you do not use this flag, loop joins are not allowed by default.
If you provide no arguments, the implied value is 1 (true) (allow-loop-joins
).
If you provide the argument 0, that is the same as the default (allow-loop-joins=0
).
If you provide the argument 1, that is the same as the implied value (allow-loop-joins=1
).
Flag
Description
Default Value
allow-cpu-retry [=arg]
Allow the queries that failed on GPU to retry on CPU, even when watchdog is enabled. When watchdog is enabled, most queries that run on GPU and throw a watchdog exception fail. Turn this on to allow queries that fail the watchdog on GPU to retry on CPU. The default behavior is for queries that run out of memory on GPU to throw an error if watchdog is enabled. Watchdog is enabled by default.
TRUE[1]
allow-local-auth-fallback
[=arg(=1)] (=0)
If SAML or LDAP logins are enabled, and the logins fail, this setting enables authentication based on internally stored login credentials. Command-line tools or other tools that do not support SAML might reject those users from logging in unless this feature is enabled. This allows a user to log in using credentials on the local database.
FALSE[0]
allow-loop-joins [=arg(=1)] (=0)
FALSE[0]
allowed-export-paths = ["root_path_1", root_path_2", ...]
Specify a list of allowed root paths that can be used in export operations, such as the COPY TO command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine. For example:
allowed-export-paths = ["/heavyai-storage/data/heavyai_export", "/home/centos"]
The list of paths must be on the same line as the configuration parameter.
Allowed file paths are enforced by default. The default export path (<data directory>/heavyai_export
) is allowed by default, and all child paths of that path are allowed.
When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY TO command, an error response is returned.
N/A
allow-s3-server-privileges
Allow S3 server privileges if IAM user credentials are not provided. Credentials can be specified with environment variables (such as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and so on), an AWS credentials file, or when running on an EC2 instance, with an IAM role that is attached to the instance.
FALSE[0]
allowed-import-paths = ["root_path_1", "root_path_2", ...]
Specify a list of allowed root paths that can be used in import operations, such as the COPY FROM command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine.
For example:
allowed-import-paths = ["/heavyai-storage/data/heavyai_import", "/home/centos"]
The list of paths must be on the same line as the configuration parameter.
Allowed file paths are enforced by default. The default import path (<data directory>/heavyai_import
) is allowed by default, and all child paths of that allowed path are allowed.
When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY FROM command, an error response is returned.
N/A
approx_quantile_buffer
arg
Size of a temporary buffer that is used to copy in the data for APPROX_MEDIAN calculation. When full, is sorted before being merged into the internal distribution buffer configured in approx_quantile_centroids
.
1000
approx_quantile_centroids
arg
Size of the internal buffer used to approximate the distribution of the data for which the APPOX_MEDIAN calculation is taken. The larger the value, the greater the accuracy of the answer.
300
auth-cookie-name
arg
Configure the authentication cookie name. If not explicitly set, the default name is oat
.
oat
bigint-count [=arg]
Use 64-bit count. Disabled by default because 64-bit integer atomics are slow on GPUs. Enable this setting if you see negative values for a count, indicating overflow. In addition, if your data set has more than 4 billion records, you likely need to enable this setting.
FALSE[0]
bitmap-memory-limit
arg
Set the maximum amount of memory (in GB) allocated for APPROX_COUNT_DISTINCT bitmaps per execution kernel (thread or GPU).
8
calcite-max-mem arg
Max memory available to calcite JVM. Change if Calcite reports out-of-memory errors.
1024
calcite-port arg
Calcite port number. Change to avoid collisions with ports already in use.
6279
calcite-service-timeout
Service timeout value, in milliseconds, for communications with Calcite. On databases with large numbers of tables, large numbers of concurrent queries, or many parallel updates and deletes, Calcite might return less quickly. Increasing the timeout value can prevent THRIFT_EAGAIN timeout errors.
5000
columnar-large-projections[=arg]
Sets automatic use of columnar output, instead of row-wise output, for large projections.
TRUE
columnar-large-projections-threshold arg
Set the row-number threshold size for columnar output instead of row-wise output.
1000000
config arg
Path to heavy.conf. Change for testing and debugging.
$HEAVYAI_STORAGE/ heavy.conf
cpu-only
Run in CPU-only mode. Set this flag to force HeavyDB to run in CPU mode, even when GPUs are available. Useful for debugging and on shared-tenancy systems where the current HeavyDB instance does not need to run on GPUs.
FALSE
cpu-buffer-
mem-bytes arg
Size of memory reserved for CPU buffers [bytes]. Change to restrict the amount of CPU/system memory HeavyDB can consume. A default value of 0 indicates no limit on CPU memory use. (HEAVY.AI Server uses all available CPU memory on the system.)
0
cuda-block-size arg
Size of block to use on GPU. GPU performance tuning: Number of threads per block. Default of 0 means use all threads per block.
0
cuda-grid-size arg
Size of grid to use on GPU. GPU performance tuning: Number of blocks per device. Default of 0 means use all available blocks per device.
0
data arg
Directory path to HEAVY.AI catalogs. Change for testing and debugging.
$HEAVYAI_STORAGE
db-query-list arg
N/A
dynamic-watchdog-time-limit [=arg]
Dynamic watchdog time limit, in milliseconds. Change if dynamic watchdog is stopping queries expected to take longer than this limit.
100000
enable-auto-clear-render-mem [=arg]
Enable/disable clear render gpu memory on out-of-memory errors during rendering. If an out-of-gpu-memory exception is thrown while rendering, many users respond by running \clear_gpu
via the heavysql command-line interface to refresh/defrag the memory heap. This process can be automated with this flag enabled. At present, only GPU memory in the renderer is cleared automatically.
TRUE[1]
enable-auto-metadata-update [=arg]
Enable automatic metadata updates on UPDATE queries. Automatic metadata updates are turned on by default. Disabling may result in stale metadata and reductions in query performance.
TRUE[1]
enable-columnar-output [=arg]
Allows HEAVY.AI Core to directly materialize intermediate projections and the final ResultSet in Columnar format where appropriate. Columnar output is an internal performance enhancement that projects the results of an intermediate processing step in columnar format. Consider disabling this feature if you see unexpected performance regressions in your queries.
TRUE[1]
enable-data-recycler [=arg]
Set to TRUE to enable the data recycler. Enabling the recycler enables the following:
Hashtable recycler, which is the cache storage.
Hashing scheme recycler, which preserves a hashtable layout (such as perfect hashing and keyed hashing).
Overlaps hashtable tuning parameter recycler. Each overlap hashtable has its own parameters used during hashtable building.
TRUE[1]
enable-debug-timer [=arg]
Enable fine-grained query execution timers for debug. For debugging, logs verbose timing information for query execution (time to load data, time to compile code, and so on).
FALSE[0]
enable-direct-columnarization
[=arg(=1)](=0)
Columnarization organizes intermediate results in a multi-step query in the most efficient way for the next step in the process. If you see an unexpected performance regression, you can try setting this value to false, enabling the earlier HEAVY.AI columnarization behavior.
TRUE[1]
enable-dynamic-watchdog [=arg]
Enable dynamic watchdog.
FALSE[0]
enable-filter-push-down [=arg(=1)] (=0)
FALSE[0]
enable-foreign-table-scheduled-refresh
[=arg]
Enable scheduled refreshes of foreign tables. Enables automated refresh of foreign tables with "REFRESH_TIMING_TYPE" option of "SCHEDULED" based on the specified refresh schedule.
TRUE[1]
enable-geo-ops-on-uncompressed-coords [=arg(=1)] (=0)
Allow geospatial operations ST_Contains
and ST_Intersects
to process uncompressed coordinates where possible to increase execution speed.
Provides control over the selection of ST_Contains
and ST_Intersects
implementations. By default, for certain combinations of compressed geospatial arguments, such as ST_Contains(POLYGON, POINT)
, the implementation can process uncompressed coordinate values. This can result in much faster execution but could decrease precision. Disabling this option enables full decompression, which is slower but more precise.
TRUE[1]
enable-logs-system-tables [=arg(=1)] (=0)
Enable use of logs system tables. Also enables the Request Logs and Monitoring system dashboard (Enterprise Edition only).
FALSE[0]
enable-overlaps-hashjoin [=arg(=1)] (=0)
Enable the overlaps hash join framework allowing for range join (for example, spatial overlaps) computation using a hash table.
TRUE[1]
enable-runtime-query-interrupt [=arg(=1)] (=0)
FALSE[0]
enable-runtime-udf
Enable runtime user defined function registration. Enables runtime registration of user defined functions. This functionality is turned off unless you specifically request it, to prevent unintentional inclusion of nonstandard code. This setting is a precursor to more advanced object permissions planned in future releases.
FALSE[0]
enable-string-dict-hash-cache[=arg(=1)] (=0)
When importing a large table with low cardinality, set the flag to TRUE and leave it on to assist with bulk queries. If using String Dictionary Server, set the flag to FALSE if the String Dictionary server uses more memory than the physical system can support.
TRUE[1]
enable-thrift-logs [=arg(=1)] (=0)
Enable writing messages directly from Thrift to stdout/stderr. Change to enable verbose Thrift messages on the console.
FALSE[0]
enable-watchdog [arg]
Enable watchdog.
TRUE[1]
filter-push-down-low-frac
Higher threshold for selectivity of filters which are pushed down. Filters with selectivity lower than this threshold are considered for a push down.
filter-push-down-passing-row-ubound
Upper bound on the number of rows that should pass the filter if the selectivity is less than the high fraction threshold.
flush-log [arg]
Immediately flush logs to disk. Set to FALSE if this is a performance bottleneck.
TRUE[1]
from-table-reordering [=arg(=1)] (=1)
Enable automatic table reordering in FROM clause. Reorders the sequence of a join to place large tables on the inside of the join clause and smaller tables on the outside. HEAVY.AI also reorders tables between join clauses to prefer hash joins over loop joins. Change this value only in consultation with an HEAVY.AI engineer.
TRUE[1]
gpu-buffer-mem-bytes [=arg]
Size of memory reserved for GPU buffers in bytes per GPU. Change to restrict the amount of GPU memory HeavyDB can consume per GPU. A default value of 0 indicates no limit on GPU memory use (HeavyDB uses all available GPU memory across all active GPUs on the system).
0
gpu-input-mem-limit arg
Force query to CPU when input data memory usage exceeds this percentage of available GPU memory. HeavyDB loads data to GPU incrementally until data exceeds GPU memory, at which point the system retries on CPU. Loading data to GPU evicts any resident data already loaded or any query results that are cached. Use this limit to avoid attempting to load datasets to GPU when they obviously will not fit, preserving cached data on GPU and increasing query performance.
If watchdog is enabled and allow-cpu-retry
is not enabled, the query fails instead of re-running on CPU.
0.9
hashtable-cache-total-bytes [=arg]
The total size of the cache storage for hashtable recycler, in bytes. Increase the cache size to store more hashtables. Must be larger than or equal to the value defined in max-cacheable-hashtable-size-bytes
.
4294967296 (4GB)
hll-precision-bits [=arg]
Number of bits used from the hash value used to specify the bucket number. Change to increase or decrease approx_count_distinct()
precision. Increased precision decreases performance.
11
http-port arg
HTTP port number. Change to avoid collisions with ports already in use.
6278
idle-session-duration arg
Maximum duration of an idle session, in minutes. Change to increase or decrease duration of an idle session before timeout.
60
inner-join-fragment-skipping [=arg(=1)] (=0)
Enable or disable inner join fragment skipping. Enables skipping fragments for improved performance during inner join operations.
FALSE[0]
license arg
Path to the file containing the license key. Change if your license file is in a different location or has a different name.
log-auto-flush
Flush logging buffer to file after each message. Changing to false can improve performance, but log lines might not appear in the log for a very long time. HEAVY.AI does not recommend changing this setting.
TRUE[1]
log-directory arg
Path to the log directory. Can be either a relative path to the $HEAVYAI_STORAGE/data directory or an absolute path. Use this flag to control the location of your HEAVY.AI log files. If the directory does not exist, HEAVY.AI creates the top level directory. For example, a/b/c/logdir is created only if the directory path a/b/c already exists.
/var/lib/heavyai/ data/heavyai_log
log-file-name
Boilerplate for the name of the HEAVY.AI log files. You can customize the name of your HEAVY.AI log files. {SEVERITY} is the only braced token recognized. It allows you to create separate files for each type of error message greater than or equal to the log-severity configuration option.
heavydb.{SEVERITY}. %Y%m%d-%H%M%S.log
log-max-files
Maximum number of log files to keep. When the number of log files exceeds this number, HEAVY.AI automatically deletes the oldest files.
100
log-min-free-space
Minimum number of bytes left on device before oldest log files are deleted. This is a safety feature to be sure the disk drive of the log directory does not fill up, and guarantees that at least this many bytes are free.
20971520
log-rotation-size
Maximum file size in bytes before new log files are started. Change to increase/decrease size of files. If log files fill quickly, you might want to increase this number so that there are fewer log files.
10485760
log-rotate-daily
Start new log files at midnight. Set to false to write to log files until they are full, rather than restarting each day.
TRUE[1]
log-severity
Log to file severity levels:
DEBUG4
DEBUG3
DEBUG2
DEBUG1
INFO
WARNING
ERROR
FATAL
All levels at and above your chosen base severity level are logged. For example, if you set the severity level to WARNING, HEAVY.AI only logs WARNING, ERROR, and FATAL messages.
INFO
log-severity-clog
Log to console severity level: INFO WARNING ERROR FATAL. Output chosen severity messages to STDERR from running process.
WARNING
log-symlink
heavydb. {SEVERITY}.log
log-user-id
Log internal numeric user IDs instead of textual user names.
log-user-origin
Look up the origin of inbound connections by IP address and DNS name and print this information as part of stdlog. Some systems throttle DNS requests or have other network constraints that preclude timely return of user origin information. Set to FALSE to improve performance on those networks or when large numbers of users from different locations make rapid connect/disconnect requests to the server.
TRUE[1]
logs-system-tables-max-files-count [=arg]
Maximum number of log files that can be processed by each logs system table.
100
max-cacheable-hashtable-size-bytes [=arg]
Maximum size of the hashtable that the hashtable recycler can store. Limiting the size can enable more hashtables to be stored. Must be lesser than or equal to the value defined in hashtable-cache-total-bytes
.
2147483648 (2GB)
max-session-duration arg
Maximum duration of the active session, in minutes. Change to increase or decrease session duration before timeout.
43200 (30 days)
null-div-by-zero [=arg]
Allows processing to complete when the dataset would cause a divide-by-zero error. Set to TRUE if you prefer to return null when dividing by zero, and set to FALSE to throw an exception.
FALSE[0]
num-executors
arg
Beta functionality in Release 5.7. Set the number of executors.
num-gpus
arg
-1
num-reader-threads
arg
Number of reader threads to use. Drop the number of reader threads to prevent imports from using all available CPU power. Default is to use all threads.
0
overlaps-bucket-
threshold arg
The minimum size of a bucket corresponding to a given inner table range for the overlaps hash join.
-p | port int
HeavyDB server port. Change to avoid collisions with other services if 6274 is already in use.
6274
pending-query-interrupt-freq=
arg
Frequency with which to check the interrupt status of pending queries, in milliseconds. Values larger than 0 are valid. If you set pending-query-interrupt-freq=100
, each session's interrupt status is checked every 100 ms.
For example, assume you have three sessions (S1, S2, and S3) in your queue, and assume S1 contains a running query, and S2 and S3 hold pending queries. If you setpending-query-interrupt-freq=1000
both S2 and S3 are interrupted every 1000 ms (1 sec). See running-query-interrupt-freq
for information about interrupting running queries.
Decreasing the value increases the speed with which pending queries are removed, but also increases resource usage.
1000 (1 sec)
pki-db-client-auth [=
arg
]
Attempt authentication of users through a PKI certificate. Set to TRUE for the server to attempt PKI authentication.
FALSE[0]
read-only [=arg(=1)]
Enable read-only mode. Prevents changes to the dataset.
FALSE[0]
render-mem-bytes arg
Specifies the size of a per-GPU buffer that render query results are written to; allocated at the first rendering call. Persists while the server is running unless you run \clear_gpu_memory
. Increase if rendering a large number of points or symbols and you get the following out-of-memory exception: Not enough OpenGL memory to render the query results.
Default is 500 MB.
500000000
render-oom-retry-threshold = arg
A render execution time limit in milliseconds to retry a render request if an out-of-gpu-memory error is thrown. Requires enable-auto-clear-render-mem = true.
If enable-auto-clear-render-mem
= true, a retry of the render request can be performed after an out-of-gpu-memory exception. A retry only occurs if the first run took less than the threshold set here (in milliseconds). The retry is attempted after the render gpu memory is automatically cleared. If an OOM exception occurs, clearing the memory might get the request to succeed. Providing a reasonable threshold might give more stability to memory-constrained servers w/ rendering enabled. Only a single retry is attempted. A value of 0 disables retries.
rendering [=arg]
Enable or disable backend rendering. Disable rendering when not in use, freeing up memory reserved by render-mem-bytes
. To reenable rendering, you must restart HEAVY.AI Server.
TRUE[1]
res-gpu-mem =arg
Reserved memory for GPU. Reserves extra memory for your system (for example, if the GPU is also driving your display, such as on a laptop or single-card desktop). HEAVY.AI uses all the memory on the GPU except for render-mem-bytes
+ res-gpu-mem
. Also useful if other processes, such as a machine-learning pipeline, share the GPU with HEAVY.AI. In advanced rendering scenarios or distributed setups, increase to free up additional memory for the renderer, or for aggregating results for the renderer from multiple leaf nodes. HEAVY.AI recommends always setting res-gpu-mem
when using backend rendering.
134217728
running-query-interrupt-freq
arg
Controls the frequency of interruption status checking for running queries. Range: 0.0 (less frequently) to 1.0 (more frequently).
For example, if you have 10 threads evaluating a query on a table that has 1000 rows, each thread advances its thread index up to 10 times. In this case, if you set the flag close to 1.0, a session's interrupt status is checked on every increment of the thread index.
If you set the flag close to 0.0, the session's interrupt status is checked only when the index increment is close to 10. By default, the interrupt check occurs at about half of the maximum increment of the thread index.
Frequent interrupt status checking reduces interrupt latency but can also decrease query performance.
seek-kafka-commit = <N>
Set the offset of the last Kafka message to be committed from a Kafka data stream so that Kafka does not resend those messages. After the Kafka server commits messages through number N, it resends messages starting at message N+1. This is particularly useful when you want to create a replica of the HEAVY.AI server from an existing data directory.
N/A
ssl-cert path
Path to the server's public PKI certificate (.crt file). Used to establish an encrypted binary connection.
ssl-keystore path
Path to the server keystore, a Java trust store containing the server's public PKI key. Used by HeavyDB to establish an encrypted binary connection to the encrypted Calcite server port.
ssl-keystore-password password
The password for the SSL keystore. Used to create a binary encrypted connection to the Calcite server.
ssl-private-key path
Path to the server's private PKI key. Used to establish an encrypted binary connection.
ssl-trust-ca path
Enable use of CA-signed certificates presented by Calcite. Defines the file that contains trusted CA certificates. This information enables the server to validate the TCP/IP Thrift connections it makes as a client to the Calcite server. The certificate presented by the Calcite server is the same as the certificate used to identify the database server to its clients.
ssl-trust-ca-server path
ssl-trust-password password
Password for the SSL trust store containing the server's public PKI key. Used to establish an encrypted binary connection.
ssl-trust-store path
The path to the Java trust store containing the server's public PKI key. Used by the Calcite server to connect to the encrypted HeavyDB server port and establish an encrypted binary connection.
start-gpu arg
FALSE[0]
trivial-loop-join-threshold [=arg]
The maximum number of rows in the inner table of a loop join considered to be trivially small.
1000
use-hashtable-cache
Set to TRUE to enable the hashtable recycler. Supports complex scenarios, such as hashtable recycling for queries that have subqueries.
TRUE[1]
vacuum-min-selectivity [=arg]
Specify the percentage (with a value of 0 implying 0% and a value of 1 implying 100%) of deleted rows in a fragment at which to perform automatic vacuuming.
Automatic vacuuming occurs when deletes or updates on variable-length columns result in a percentage of deleted rows in a fragment exceeding the specified threshold. The default threshold is 10% of deleted rows in a fragment.
When changing this value, consider the most common types of queries run on the system. In general, if you have infrequent updates and deletes, set vacuum-min-selectivity
to a low value. Set it higher if you have frequent updates and deletes, because vacuuming adds overhead to affected UPDATE and DELETE queries.
watchdog-none-encoded-string-translation-limit [=arg]
The number of strings that can be cast using the ENCODED_TEXT string operator.
1,000,000
window-function-frame-aggregation-tree-fanout [=arg]
Fan-out of the aggregation tree used to compute aggregations over the window frame.
8
Following are additional parameters for runtime settings for the Enterprise Edition of HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.
Flag
Description
Default Value
cluster arg
Path to data leaves list JSON file. Indicates that the HEAVY.AI server instance is an aggregator node, and where to find the rest of its cluster. Change for testing and debugging.
$HEAVYAI_BASE
compression-limit-bytes [=arg(=536870912)] (=536870912)
Compress result sets that are transferred between leaves. Minimum length of payload above which data is compressed.
536870912
compressor arg (=lz4hc)
lz4hc
ldap-dn arg
LDAP Distinguished Name.
ldap-role-query-regex arg
RegEx to use to extract role from role query result.
ldap-role-query-url arg
LDAP query role URL.
ldap-superuser-role arg
The role name to identify a superuser.
ldap-uri arg
LDAP server URI.
leaf-conn-timeout [=arg]
Leaf connect timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if a connection cannot be established.
20000
leaf-recv-timeout [=arg]
Leaf receive timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not received in the time allotted.
300000
leaf-send-timeout [=arg]
Leaf send timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not sent in the time allotted.
300000
saml-metadata-file arg
Path to identity provider metadata file.
Required for running SAML. An identity provider (like Okta) supplies a metadata file. From this file, HEAVY.AI uses:
Public key of the identity provider to verify that the SAML response comes from it and not from somewhere else.
URL of the SSO login page used to obtain a SAML token.
saml-sp-target-url arg
URL of the service provider for which SAML assertions should be generated. Required for running SAML. Used to verify that a SAML token was issued for HEAVY.AI and not for some other service.
saml-sync-roles arg (=0)
Enable mapping of SAML groups to HEAVY.AI roles. The SAML Identity provider (for example, Okta) automatically creates users at login and assigns them roles they already have as groups in SAML.
saml-sync-roles [=0]
string-servers arg
Path to the string servers list JSON file. Required to designate a leaf server when HeavyDB is running in distributed mode.
HEAVY.AI supports data security using a set of database object access privileges granted to users or roles.
When you create a database, the admin
superuser is created by default. The admin
superuser is granted all privileges on all database objects. Superusers can create new users that, by default, have no database object privileges.
Superusers can grant users selective access privileges on multiple database objects using two mechanisms: role-based privileges and user-based privileges.
Grant roles access privileges on database objects.
Grant roles to users.
Grant roles to other roles.
When a user has privilege requirements that differ from role privileges, you can grant privileges directly to the user. These mechanisms provide data security for many users and classes of users to access the database.
You have the following options for granting privileges:
Each object privilege can be granted to one or many roles, or to one or many users.
A role and/or user can be granted privileges on one or many objects.
A role can be granted to one or many users or other roles.
A user can be granted one or many roles.
This supports the following many-to-many relationships:
Objects and roles
Objects and users
Roles and users
These relationships provide flexibility and convenience when granting/revoking privileges to and from users.
Granting object privileges to roles and users, and granting roles to users, has a cumulative effect. The result of several grant commands is a combination of all individual grant commands. This applies to all database object types and to privileges inherited by objects. For example, object privileges granted to the object of database type are propagated to all table-type objects of that database object.
Only a superuser or an object owner can grant privileges on an object.
A superuser has all privileges on all database objects.
A non-superuser user has only those privileges on a database object that are granted by a superuser.
A non-superuser user has ALL
privileges on a table created by that user.
Roles can be created and dropped at any time.
Object privileges and roles can be granted or revoked at any time, and the action takes effect immediately.
Privilege state is persistent and restored if the HEAVY.AI session is interrupted.
There are five database object types, each with its own privileges.
ACCESS - Connect to the database. The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.
ALL - Allow all privileges on this database except issuing grants and dropping the database.
SELECT, INSERT, TRUNCATE, UPDATE, DELETE - Allow these operations on any table in the database.
ALTER SERVER - Alter servers in the current database.
CREATE SERVER - Create servers in the current database.
CREATE TABLE - Create a table in the current database. (Also CREATE.)
CREATE VIEW - Create a view for the current database.
CREATE DASHBOARD - Create a dashboard for the current database.
DELETE DASHBOARD - Delete a dashboard for this database.
DROP SERVER - Drop servers from the current database.
DROP - Drop a table from the database.
DROP VIEW - Drop a view for this database.
EDIT DASHBOARD - Edit a dashboard for this database.
SELECT VIEW - Select a view for this database.
SERVER USAGE - Use servers (through foreign tables) in the current database.
VIEW DASHBOARD - View a dashboard for this database.
VIEW SQL EDITOR - Access the SQL Editor in Immerse for this database.
Users with SELECT privilege on views do not require SELECT privilege on underlying tables referenced by the view to retrieve the data queried by the view. View queries work without error whether or not users have direct access to referenced tables. This also applies to views that query tables in other databases.
To create views, users must have SELECT privilege on queried tables in addition to the CREATE VIEW privilege.
SELECT, INSERT, TRUNCATE, UPDATE, DELETE - Allow these SQL statements on this table.
DROP - Drop this table.
Users with SELECT privilege on views do not require SELECT privilege on underlying tables referenced by the view to retrieve the data queried by the view. View queries work without error whether or not users have direct access to referenced tables. This also applies to views that query tables in other databases.
To create views, users must have SELECT privilege on queried tables in addition to the CREATE VIEW privilege.
SELECT - Select from this view. Users do not need privileges on objects referenced by this view.
DROP - Drop this view.
Users with SELECT privilege on views do not require SELECT privilege on underlying tables referenced by the view to retrieve the data queried by the view. View queries work without error whether or not users have direct access to referenced tables. This also applies to views that query tables in other databases.
To create views, users must have SELECT privilege on queried tables in addition to the CREATE VIEW privilege.
VIEW - View this dashboard.
EDIT - Edit this dashboard.
DELETE - Delete this dashboard.
DROP - Drop this server from the current database.
ALTER - Alter this server in the current database.
USAGE - Use this server (through foreign tables) in the current database.
Privileges granted on a database-type object are inherited by all tables of that database.
The following example shows a valid sequence for granting access privileges to non-superuser user1
by granting a role to user1
and by directly granting a privilege. This example presumes that table1
and user1
already exist, and that user1
has ACCESS privileges on the database where table1
exists.
Create the r_select
role.
Grant the SELECT privilege on table1
to the r_select
role. Any user granted the r_select
role gains the SELECT privilege.
Grant the r_select
role to user1
, giving user1
the SELECT privilege on table1
.
Directly grant user1
the INSERT privilege on table1
.
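A sketch of this sequence in heavysql, using the names from the example above:
CREATE ROLE r_select;
GRANT SELECT ON TABLE table1 TO r_select;
GRANT r_select TO user1;
GRANT INSERT ON TABLE table1 TO user1;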
Create a role. Roles are granted to users for role-based database object access.
This clause requires superuser privilege and <roleName> must not exist.
<roleName>
Name of the role to create.
Create a payroll department role called payrollDept.
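For example, the command might look like this:
CREATE ROLE payrollDept;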
Remove a role.
This clause requires superuser privilege and <roleName> must exist.
<roleName>
Name of the role to drop.
Remove the payrollDept role.
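For example, the command might look like this:
DROP ROLE payrollDept;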
Grant role privileges to users and to other roles.
The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.
This clause requires superuser privilege. The specified <roleNames> and <userNames> must exist.
<roleNames>
Names of roles to grant to users and other roles. Use commas to separate multiple role names.
<userNames>
Names of users. Use commas to separate multiple user names.
Assign payrollDept role privileges to user dennis.
Grant payrollDept and accountsPayableDept role privileges to users dennis and mike and role hrDept.
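A sketch of the preceding examples in heavysql:
GRANT payrollDept TO dennis;
GRANT payrollDept, accountsPayableDept TO dennis, mike, hrDept;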
Remove role privilege from users or from other roles. This removes database object access privileges granted with the role.
This clause requires superuser privilege. The specified <roleNames> and <userNames> must exist.
<roleNames>
Names of roles to remove from users and other roles. Use commas to separate multiple role names.
<userName>
Names of the users. Use commas to separate multiple user names.
Remove payrollDept role privileges from user dennis.
Revoke payrollDept and accountsPayableDept role privileges from users dennis and fred and role hrDept.
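A sketch of the preceding examples in heavysql:
REVOKE payrollDept FROM dennis;
REVOKE payrollDept, accountsPayableDept FROM dennis, fred, hrDept;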
Define the privilege(s) a role or user has on the specified table. You can specify any combination of the INSERT
, SELECT
, DELETE
, UPDATE
, DROP
, or TRUNCATE
privilege or specify all privileges.
The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.
This clause requires superuser privilege, or <tableName> must have been created by the user invoking this command. The specified <tableName> and users or roles defined in <entityList> must exist.
<privilegeList>
<tableName>
Name of the database table.
<entityList>
Name of entity or entities to be granted the privilege(s).
Permit all privileges on the employees
table for the payrollDept role.
Permit SELECT-only privilege on the employees
table for user chris.
Permit INSERT-only privilege on the employees
table for the hrdept and accountsPayableDept roles.
Permit INSERT, SELECT, and TRUNCATE privileges on the employees
table for the role hrDept and for users dennis and mike.
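A sketch of the preceding examples, using the GRANT ... ON TABLE syntax described later in this topic:
GRANT ALL ON TABLE employees TO payrollDept;
GRANT SELECT ON TABLE employees TO chris;
GRANT INSERT ON TABLE employees TO hrdept, accountsPayableDept;
GRANT INSERT, SELECT, TRUNCATE ON TABLE employees TO hrDept, dennis, mike;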
Remove the privilege(s) a role or user has on the specified table. You can remove any combination of the INSERT
, SELECT
, DELETE
, UPDATE
, or TRUNCATE
privileges, or remove all privileges.
This clause requires superuser privilege or <tableName> must have been created by the user invoking this command. The specified <tableName> and users or roles in <entityList> must exist.
<privilegeList>
<tableName>
Name of the database table.
<entityList>
Name of entities to be denied the privilege(s).
Prohibit SELECT and INSERT operations on the employees
table for the nonemployee role.
Prohibit SELECT operations on the directors
table for the employee role.
Prohibit INSERT operations on the directors
table for role employee and user laura.
Prohibit INSERT, SELECT, and TRUNCATE privileges on the employees
table for the role nonemployee and for users dennis and mike.
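A sketch of the preceding examples in heavysql:
REVOKE SELECT, INSERT ON TABLE employees FROM nonemployee;
REVOKE SELECT ON TABLE directors FROM employee;
REVOKE INSERT ON TABLE directors FROM employee, laura;
REVOKE INSERT, SELECT, TRUNCATE ON TABLE employees FROM nonemployee, dennis, mike;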
Define the privileges a role or user has on the specified view. You can specify any combination of the SELECT
, INSERT
, or DROP
privileges, or specify all privileges.
This clause requires superuser privileges, or <viewName> must have been created by the user invoking this command. The specified <viewName> and users or roles in <entityList> must exist.
<privilegeList>
<viewName>
Name of the database view.
<entityList>
Name of entities to be granted the privileges.
Permit SELECT, INSERT, and DROP privileges on the employees
view for the payrollDept role.
Permit SELECT-only privilege on the employees
view for the employee role and user venkat.
Permit INSERT and DROP privileges on the employees
view for the hrDept and acctPayableDept roles and users simon and dmitri.
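A sketch of the preceding examples, using the GRANT ... ON VIEW form parallel to the table syntax:
GRANT SELECT, INSERT, DROP ON VIEW employees TO payrollDept;
GRANT SELECT ON VIEW employees TO employee, venkat;
GRANT INSERT, DROP ON VIEW employees TO hrDept, acctPayableDept, simon, dmitri;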
Remove the privileges a role or user has on the specified view. You can remove any combination of the INSERT
, DROP
, or SELECT
privileges, or remove all privileges.
This clause requires superuser privilege, or <viewName> must have been created by the user invoking this command. The specified <viewName> and users or roles in <entityList> must exist.
<privilegeList>
<viewName>
Name of the database view.
<entityList>
Name of entity to be denied the privilege(s).
Prohibit SELECT, DROP, and INSERT operations on the employees
view for the nonemployee role.
Prohibit SELECT operations on the directors
view for the employee role.
Prohibit INSERT and DROP operations on the directors
view for the employee and manager role and for users ashish and lindsey.
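A sketch of the preceding examples in heavysql:
REVOKE SELECT, DROP, INSERT ON VIEW employees FROM nonemployee;
REVOKE SELECT ON VIEW directors FROM employee;
REVOKE INSERT, DROP ON VIEW directors FROM employee, manager, ashish, lindsey;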
Define the valid privileges a role or user has on the specified database. You can specify any combination of privileges, or specify all privileges.
The ACCESS privilege is a prerequisite for all other privileges at the database level. Without the ACCESS privilege, a user or role cannot perform tasks on any other database objects.
This clause requires superuser privileges.
<privilegeList>
<dbName>
Name of the database, which must exist, created by CREATE DATABASE.
<entityList>
Name of the entity to be granted the privilege.
Permit all operations on the companydb
database for the payrollDept role and user david.
Permit SELECT-only operations on the companydb
database for the employee role.
Permit INSERT, UPDATE, and DROP operations on the companydb
database for the hrdept and manager role and for users irene and stephen.
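A sketch of the preceding examples, using the GRANT ... ON DATABASE syntax described later in this topic:
GRANT ALL ON DATABASE companydb TO payrollDept, david;
GRANT SELECT ON DATABASE companydb TO employee;
GRANT INSERT, UPDATE, DROP ON DATABASE companydb TO hrdept, manager, irene, stephen;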
Remove the operations a role or user can perform on the specified database. You can specify privileges individually or specify all privileges.
This clause requires superuser privilege or the user must own the database object. The specified <dbName> and roles or users in <entityList> must exist.
<privilegeList>
<dbName>
Name of the database.
<entityList>
Prohibit all operations on the employees
database for the nonemployee role.
Prohibit SELECT operations on the directors
database for the employee role and for user monica.
Prohibit INSERT, DROP, CREATE, and DELETE operations on the directors
database for employee role and for users max and alex.
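A sketch of the preceding examples in heavysql:
REVOKE ALL ON DATABASE employees FROM nonemployee;
REVOKE SELECT ON DATABASE directors FROM employee, monica;
REVOKE INSERT, DROP, CREATE, DELETE ON DATABASE directors FROM employee, max, alex;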
Define the valid privileges a role or user has for working with servers. You can specify any combination of privileges or specify all privileges.
This clause requires superuser privileges, or <serverName> must have been created by the user invoking the command.
<privilegeList>
<serverName>
Name of the server, which must exist on the current database, created by CREATE SERVER ON DATABASE.
<entityList>
Grant DROP privilege on server parquet_s3_server
to user fred:
Grant ALTER privilege on server parquet_s3_server
to role payrollDept:
Grant USAGE and ALTER privileges on server parquet_s3_server
to role payrollDept and user jamie:
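A sketch of the preceding examples, using the GRANT ... ON SERVER form parallel to the other object types:
GRANT DROP ON SERVER parquet_s3_server TO fred;
GRANT ALTER ON SERVER parquet_s3_server TO payrollDept;
GRANT USAGE, ALTER ON SERVER parquet_s3_server TO payrollDept, jamie;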
Remove privileges a role or user has for working with servers. You can specify any combination of privileges or specify all privileges.
This clause requires superuser privileges, or <serverName> must have been created by the user invoking the command.
<privilegeList>
<serverName>
Name of the server, which must exist on the current database, created by CREATE SERVER ON DATABASE.
<entityList>
Revoke DROP privilege on server parquet_s3_server
for user inga:
Revoke ALTER privilege on server parquet_s3_server
for role payrollDept:
Revoke USAGE and ALTER privileges on server parquet_s3_server
for role payrollDept and user marvin:
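A sketch of the preceding examples in heavysql:
REVOKE DROP ON SERVER parquet_s3_server FROM inga;
REVOKE ALTER ON SERVER parquet_s3_server FROM payrollDept;
REVOKE USAGE, ALTER ON SERVER parquet_s3_server FROM payrollDept, marvin;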
Define the valid privileges a role or user has for working with dashboards. You can specify any combination of privileges or specify all privileges.
This clause requires superuser privileges.
<privilegeList>
<dashboardId>
ID of the dashboard, which must exist, created by CREATE DASHBOARD. To show a list of all dashboards and IDs in heavysql, run the \dash
command when logged in as superuser.
<entityList>
Permit all privileges on the dashboard ID 740
for the payrollDept role.
Permit VIEW-only privilege on dashboard 730
for the hrDept role and user dennis.
Permit EDIT and DELETE privileges on dashboard 740
for the hrDept and accountsPayableDept roles and for user pavan.
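A sketch of the preceding examples, using the GRANT ... ON DASHBOARD syntax described later in this topic:
GRANT ALL ON DASHBOARD 740 TO payrollDept;
GRANT VIEW ON DASHBOARD 730 TO hrDept, dennis;
GRANT EDIT, DELETE ON DASHBOARD 740 TO hrDept, accountsPayableDept, pavan;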
Remove privileges a role or user has for working with dashboards. You can specify any combination of privileges, or all privileges.
This clause requires superuser privileges.
<privilegeList>
<dashboardId>
ID of the dashboard, which must exist, created by CREATE DASHBOARD.
<entityList>
Revoke DELETE privileges on dashboard 740
for the payrollDept role.
Revoke all privileges on dashboard 730
for hrDept role and users dennis and mike.
Revoke EDIT and DELETE of dashboard 740
for the hrDept and accountsPayableDept roles and for users dante and jonathan.
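A sketch of the preceding examples in heavysql:
REVOKE DELETE ON DASHBOARD 740 FROM payrollDept;
REVOKE ALL ON DASHBOARD 730 FROM hrDept, dennis, mike;
REVOKE EDIT, DELETE ON DASHBOARD 740 FROM hrDept, accountsPayableDept, dante, jonathan;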
The following privilege levels are typically recommended for non-superusers in Immerse. Privileges assigned for users in your organization may vary depending on access requirements.
These examples assume that tables table1
through table4
are created as needed:
The following examples show how to work with users, roles, tables, and dashboards.
Use the \dash
command to list all dashboards and their unique IDs in HEAVY.AI:
Here, the Marketing_Summary
dashboard uses table2
as a data source. The role marketingDeptRole2
has select privileges on that table. Grant view access on the Marketing_Summary
dashboard to marketingDeptRole2
:
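For example, if the \dash output shows that Marketing_Summary has dashboard ID 1 (the ID shown here is illustrative), the grant might look like this:
GRANT VIEW ON DASHBOARD 1 TO marketingDeptRole2;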
The following table shows the roles and privileges for each user created in the previous example.
Use the following commands to list current roles and assigned privileges. If you have superuser access, you can see privileges for all users. Otherwise, you can see only those roles and privileges for which you have access.
Results for users, roles, privileges, and object privileges are returned in creation order.
Lists all dashboards and dashboard IDs in HEAVY.AI. Requires superuser privileges. Dashboard privileges are assigned by dashboard ID because dashboard names may not be unique.
Example
heavysql> \dash database heavyai
Dashboard ID | Dashboard Name | Owner
1 | Marketing_Summary | heavyai
Reports all privileges granted to the specified object for all roles and users. If the specified objectName does not exist, no results are reported. Used for databases and tables only.
Example
Reports all object privileges granted to the specified role or user. The roleName or userName specified must exist.
Example
Reports all roles granted to the given user. The userName specified must exist.
Example
Reports all roles.
Example
Lists all users.
Example
The following example demonstrates field-level security using two views:
view_users_limited
, in which users only see three of seven fields: userid
, First_Name
, and Department
.
view_users_full
, in which users see all seven fields.
User readonly1
sees no tables, only the specific view granted, and only the three specific columns returned in the view:
User readonly2
sees no tables, only the specific view granted, and all seven columns returned in the view:
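A sketch of how such views and grants might be defined; the underlying table name users is an assumption for illustration:
CREATE VIEW view_users_limited AS SELECT userid, First_Name, Department FROM users;
CREATE VIEW view_users_full AS SELECT * FROM users;
GRANT SELECT ON VIEW view_users_limited TO readonly1;
GRANT SELECT ON VIEW view_users_full TO readonly2;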
Apache Kafka is a distributed streaming platform. It allows you to create publishers, which create data streams, and consumers, which subscribe to and ingest the data streams produced by publishers.
You can use the KafkaImporter C++ program included with HeavyDB to consume a topic created by running Kafka shell scripts from the command line. Follow the procedure below to use a Kafka producer to send data, and a Kafka consumer to store the data, in HeavyDB.
This example assumes you have already installed and configured Apache Kafka. See the Apache Kafka documentation.
Create a sample topic for your Kafka producer.
Run the kafka-topics.sh
script with the following arguments:
Create a file named myfile
that consists of comma-separated data. For example:
Use heavysql
to create a table to store the stream.
Load your file into the Kafka producer.
Create and start a producer using the following command.
Load the data to HeavyDB using the Kafka console consumer and the KafkaImporter
program.
Pull the data from Kafka into the KafkaImporter
program.
Verify that the data arrived using heavysql
.
HEAVY.AI can accept a set of encrypted credentials for secure authentication of a custom application. This topic provides a method for providing an encryption key to generate encrypted credentials and configuration options for enabling decryption of those encrypted credentials.
Generate a 128- or 256-bit encryption key and save it to a file. You can use to generate a suitable encryption key.
Set the file path of the encryption key file to the encryption-key-file-path
web server parameter in heavyai.conf:
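A minimal sketch of the entry, assuming the key was saved to an illustrative path; adjust the path to match your environment:
[web]
encryption-key-file-path = "/home/heavyai/encryption_key.txt"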
Alternatively, you can set the path using the --encryption-key-file-path=path/to/file
command-line argument.
Generate encrypted credentials for a custom application by running the following Go program, replacing the example key and credentials strings with an actual key and actual credentials. You can also run the program in a web browser at .
Follow these instructions to start an HEAVY.AI server with an encrypted main port.
You need the following PKI (Public Key Infrastructure) components to implement a Secure Binary Interface.
A CRT (short for certificate) file containing the server's PKI certificate. This file must be shared with the clients that connect using encrypted communications. Ideally, this file is signed by a recognized certificate issuing agency.
A key file containing the server's private key. Keep this file secret and secure.
A Java TrustStore containing the server's PKI certificate. The password for the trust store is also required.
Although in this instance the trust store contains only information that can be shared, the Java TrustStore program requires it to be password protected.
A Java KeyStore and password.
In a distributed system, add the configuration parameters to the heavyai.conf file on the aggregator and all leaf nodes in your HeavyDB cluster.
You can use OpenSSL utilities to create the various PKI elements. The server certificate in this instance is self-signed and should not be used in a production system.
Generate a new private key.
Use the private key to generate a certificate signing request.
Self sign the certificate signing request to create a public certificate.
Use the Java tools to create a key store from the public certificate.
To generate a keystore file from your server key:
Copy server.key to server.txt. Concatenate it with server.crt.
Use server.txt to create a PKCS12 file.
Use server.p12 to create a keystore.
Start the server using the following options.
Alternatively, you can add the following configuration parameters to heavyai.conf to establish a Secure Binary Interface. The following configuration flags implement the same encryption shown in the runtime example above:
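A sketch of the heavyai.conf entries, using illustrative file names based on the steps above and an illustrative password:
ssl-cert = "server.crt"
ssl-private-key = "server.key"
ssl-trust-store = "truststore.jks"
ssl-trust-password = "mypassword"
ssl-keystore = "server.jks"
ssl-keystore-password = "mypassword"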
Passwords for the SSL truststore and keystore can be enclosed in single (') or double (") quotes.
The server.crt
file and the Java truststore contain the same public key information in different formats. Both are required by the server to establish secure client communication with its various interfaces and with its Calcite server. At startup, the Java truststore is passed to the Calcite server for authentication and to encrypt its traffic with the HEAVY.AI server.
HEAVY.AI supports LDAP authentication using an IPA Server or Microsoft Active Directory.
You can configure HEAVY.AI Enterprise edition to map LDAP roles 1-to-1 to HEAVY.AI roles. When you enable this mapping, LDAP becomes the main authority controlling user roles in HEAVY.AI.
LDAP mapping is available only in HEAVY.AI Enterprise edition.
HEAVY.AI supports five configuration settings that allow you to integrate with your LDAP server.
To find the ldap-role-query-url
and ldap-role-query-regex
to use, query your user roles. For example, if there is a user named kiran on the IPA LDAP server ldap://myldapserver.mycompany.com
, you could use the following curl command to get the role information:
When successful, it returns information similar to the following:
ldap-dn
matches the DN, which is uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com
.
ldap-role-query-url
includes the LDAP URI + the DN + the LDAP attribute that represents the role/group the member belongs to, such as memberOf.
ldap-role-query-regex
is a regular expression that matches the role names. The matching role names are used to grant and revoke privileges in HEAVY.AI. For example, if we created some roles on an IPA LDAP server where the role names begin with MyCompany_ (for example, MyCompany_Engineering, MyCompany_Sales, MyCompany_SuperUser), the regular expression can filter the role names using MyCompany_.
ldap-superuser-role
is the role/group name for HEAVY.AI users who are superusers once they log on to the HEAVY.AI database. In this example, the superuser role name is MyCompany_SuperUser.
Make sure that LDAP configuration appears before the [web]
section of heavy.conf
.
Double quotes are not required for LDAP properties in heavy.conf
. For example, both of the following are valid:
ldap-uri = "ldap://myldapserver.mycompany.com"
ldap-uri = ldap://myldapserver.mycompany.com
To integrate LDAP with HEAVY.AI, you need the following:
A functional LDAP server, with all users/roles/groups created (ldap-uri
, ldap-dn
, ldap-role-query-url
, ldap-role-query-regex
, and ldap-superuser-role
) to be used by HEAVY.AI. You can use the curl
command to test and find the filters.
A functional HEAVY.AI server, version 4.1 or higher.
Once you have your server information, you can configure HEAVY.AI to use LDAP authentication.
Locate the heavy.conf
file and edit it to include the LDAP parameter. For example:
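A sketch of the LDAP entries for the example IPA server described above; the DN, query URL, and role names are illustrative and must match your directory layout:
ldap-uri = ldap://myldapserver.mycompany.com
ldap-dn = uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com
ldap-role-query-url = ldap://myldapserver.mycompany.com/uid=kiran,cn=users,cn=accounts,dc=mycompany,dc=com?memberOf
ldap-role-query-regex = MyCompany_
ldap-superuser-role = MyCompany_SuperUser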
Restart the HEAVY.AI server:
Log on to heavysql
as MyCompany user, or any user who belongs to one of the roles/groups that match the filter.
When you use LDAP authentication, the default admin user and password HyperInteractive do not work unless you create the admin user with the same password on the LDAP server.
If your login fails, inspect $HEAVYAI_STORAGE/mapd_log/heavyai_server.INFO
to check for any obvious errors about LDAP authentication.
Once you log in, you can create a new role name in heavysql
, and then apply GRANT/REVOKE privileges to the role. Log in as another user with that role and confirm that GRANT/REVOKE works.
If you refresh the browser window, you are required to log in and reauthenticate.
To use LDAPS, HEAVY.AI must trust the LDAP server's SSL certificate. To achieve this, you must have the CA for the server's certificate, or the server certificate itself. Install the certificate as a trusted certificate.
To use IPA as your LDAP server with HEAVY.AI running on CentOS 7:
Copy the IPA server CA certificate to your local machine.
Update the PKI certificates.
Edit /etc/openldap/ldap.conf
to add the following line.
Locate the heavy.conf
file and edit it to include the LDAP parameter. For example:
Restart the HEAVY.AI server:
To use IPA as your LDAP server with HEAVY.AI running on Ubuntu:
Copy the IPA server CA certificate to your local machine.
Rename ipa-ca.crm
to ipa-ca.crt
so that the certificates bundle update script can find it:
Update the PKI certificates:
Edit /etc/openldap/ldap.conf
to add the following line:
Locate the heavy.conf
file and edit it to include the LDAP parameter. For example:
Restart the HEAVY.AI server:
1. Locate the heavy.conf
file and edit it to include the LDAP parameter.
Example 1:
Example 2:
2. Restart the HEAVY.AI server:
Other LDAP user authentication attributes, such as userPrincipalName, are not currently supported.
Security Assertion Markup Language (SAML) is used for exchanging authentication and authorization data between security domains. SAML uses security tokens containing assertions (statements that service providers use to make decisions about access control) to pass information about a principal (usually an end user) between a SAML authority, named an Identity Provider (IdP), and a SAML consumer, named a Service Provider (SP). SAML enables web-based, cross-domain, single sign-on (SSO), which helps reduce the administrative overhead of sending multiple authentication tokens to the user.
If you use SAML for authentication to HEAVY.AI, and SAML login fails, HEAVY.AI automatically falls back to log in using LDAP if it is configured.
If both SAML and LDAP authentication fail, you are authenticated against a locally stored password, but only if the allow-local-auth-fallback
flag is set.
These instructions use Okta as the IdP and HEAVY.AI as the SP in an SP-initiated workflow, similar to the following:
A user uses a login page to connect to HEAVY.AI.
The HEAVY.AI login page redirects the user to the Okta login page.
The user signs in using an Okta account. (This step is skipped if the user is already logged in to Okta.)
Okta returns a base64-encoded SAML Response to the user, which contains a SAML Assertion that the user is allowed to use HEAVY.AI. If configured, it also returns a list of SAML Groups assigned to the user.
Okta redirects the user to the HEAVY.AI login page together with the SAML response (a token).
HEAVY.AI verifies the token and retrieves the user name and groups. Authentication and authorization are complete.
In addition to Okta, the following SAML providers are also supported:
1) Log into your Okta account and click the Admin button.
2) From the Applications menu, select Applications.
3) Click the Add Application button.
4) On the Add Application screen, click Create New App.
5) On the Create a New Application Integration page, set the following details:
Platform: Web
Sign on Method: SAML 2.0
And then, click Create.
6) On the Create SAML Integration page, in the App name field, type Heavyai and click Next.
7) In the SAML Settings page, enter the following information:
Audience URI (SP Entity ID): Your Heavy Immerse web URL with the suffix saml-post.
Default RelayState: Forward slash (/).
Application username: HEAVY.AI recommends using the email address you used to log in to Okta.
Leave other settings at their default values, or change as required for your specific installation.
After making your selections, click Next.
8) In the Help Okta Support... page, click I'm an Okta customer adding an internal app. All other questions on this page are optional.
After making your selections, click Finish.
Your application is now registered and displayed, and the Sign On tab is selected.
Before configuring SAML, make sure that HTTPS is enabled on your web server.
On the Sign On tab, configure SAML settings for your application:
1) On the Settings page, click View Setup Instructions.
2) On the How to Configure SAML 2.0 for HEAVY.AI Application page, scroll to the bottom, copy the XML fragment in the Provide the following IDP metadata to your SP provider box, and save it as a raw text file called idp.xml.
3) Upload idp.xml to your HEAVY.AI server in $HEAVYAI_STORAGE.
4) Edit heavy.conf and add the following configuration parameters:
saml-metadata-file
: Path to the idp.xml file you created.
saml-sp-target-url
: Web URL to your Heavy Immerse saml-post endpoint.
saml-signed-assertion
: Boolean value that determines whether Okta signs the assertion; true by default.
saml-signed-response
: Boolean value that determines whether Okta signs the response; true by default.
For example:
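A sketch with illustrative paths and URLs; substitute the actual location of idp.xml and your Heavy Immerse URL:
saml-metadata-file = "/var/lib/heavyai/storage/idp.xml"
saml-sp-target-url = "https://heavyai.mycompany.com/saml-post"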
In the web section, add the full physical path to the servers.json file; for example:
5) On the How to Configure SAML 2.0 for HEAVY.AI Application page, copy the Identity Provider Single Sign-On URL, which looks similar to this:
6) If the servers.json file you identified in the [web] section of heavy.conf does not exist, create it. In servers.json, include the SAMLurl property, using the same value you copied in Identity Provider Single Sign-On URL. For example:
7) Restart the heavyai_server and heavyai_web_server services.
Users can be automatically created in HEAVY.AI based on group membership:
1) Go to the Application Configuration page for the HEAVY.AI application in Okta.
2) On the General tab, scroll to the SAML Settings section and click the Edit button.
3) Click the Next button, and then in the Group Attribute Statements section, set the following:
Name: Groups
Filter: Set to the desired filter type to determine the set of groups delivered to HEAVY.AI through the SAML response. In the text box next to the Filter type drop-down box, enter the text that defines the filter.
Click Next, and then click Finish.
Any group that requires access to HEAVY.AI must be created in HEAVY.AI before users can log in.
Modify your heavyai.conf file by adding the following parameter:
The heavyai.conf entries now look like this:
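For example, continuing the illustrative values above:
saml-metadata-file = "/var/lib/heavyai/storage/idp.xml"
saml-sp-target-url = "https://heavyai.mycompany.com/saml-post"
saml-sync-roles = true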
Restart the heavyai_server and heavyai_web_server processes.
Users whose group membership in Okta contains a group name that exists in HeavyDB can log in and have the privileges assigned to their groups.
1) On the Okta website, on the Assignments tab, click Assign > Assign to People.
2) On the Assign HEAVY.AI to People panel, click the Assign button next to users that you want to provide access to HEAVY.AI.
3) Click Save and Go Back to assign HEAVY.AI to the user.
4) Repeat steps 2 and 3 for all users to whom you want to grant access. Click Done when you are finished.
Verify that the SAML is configured correctly by opening your Heavy Immerse login page. You should be automatically redirected to the Okta login page, and then back to Immerse, without entering credentials.
When you log out of Immerse, you see the following screen:
Logging out of Immerse does not log you out of Okta. If you log back in to Immerse and are still logged in to Okta, you do not need to reauthenticate.
If authentication fails, you see this error message when you attempt to log in through Okta:
To resolve the authentication error:
Add the license information by either:
Adding heavyai.license to your HEAVY.AI data directory.
Logging in to HeavyDB and running the following command:
Reattempt login through Okta.
Information about authentication errors can be found in the log files.
Following are the parameters for runtime settings on HeavyAI Web Server. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.
Enables all join queries to fall back to the loop join implementation. During a loop join, queries loop over all rows from all tables involved in the join and evaluate the join condition. By default, loop joins are allowed only if the number of rows in the inner table is fewer than the trivial-loop-join-threshold, because loop joins are computationally expensive and can run for an extended period. Modifying trivial-loop-join-threshold is a safer alternative to globally enabling loop joins. You might choose to globally enable loop joins when you have many small tables for which loop join performance has been determined to be acceptable, but for which modifying the threshold would be tedious.
Path to file containing HEAVY.AI queries. Use a query list to autoload data to GPU memory on startup to speed performance. See .
Enable filter push-down through joins. Evaluates filters in the query expression for selectivity and pushes down highly selective filters into the join according to selectivity parameters. See also
Enable the runtime query interrupt. Setting to TRUE can reduce performance slightly. Use with running-query-interrupt-freq to set the interrupt frequency.
Symbolic link to the active log. Creates a symbolic link for every severity greater than or equal to the configuration option.
Number of GPUs to use. In a shared environment, you can assign the number of GPUs to a particular application. The default, -1, uses all available GPUs. Use in conjunction with start-gpu.
Path to the file containing trusted CA certificates; for PKI authentication. Used to validate certificates submitted by clients. If the certificate provided by the client (in the password
field of the connect
command) was not signed by one of the certificates in the trusted file, then the connection fails.
PKI authentication works only if the server is configured to encrypt connections via TLS. The common name extracted from the client certificate is used as the name of the user to connect. If this name does not already exist, the connection fails. If LDAP or SAML are also enabled, the servers fall back to these authentication methods if PKI authentication fails.
Currently works only with clients. To allow connection from other clients, set allow-local-auth-fallback
or add LDAP/SAML authentication.
First GPU to use. Used in shared environments in which the first assigned GPU is not GPU 0. Use in conjunction with num-gpus.
Compressor algorithm to be used by the server to compress data being transferred between servers. See for compression algorithm options.
See for a more complete example.
Begin by adding your SAML application in Okta. If you do not have an Okta account, you can sign up on the .
Single sign on URL: Your Heavy Immerse web URL with the suffix saml-post; for example, . Select the Use this for Recipient URL and Destination URL checkbox.
User accounts assigned to the HEAVY.AI application in Okta must exist in HEAVY.AI before a user can log in. To have users created automatically based on their group membership, see .
SQL
Description
Create role.
Drop role.
Grant role to user or to another role.
Revoke role from user or from another role.
Grant role privilege(s) on a database table to a role or user.
Revoke role privilege(s) on database table from a role or user.
Grant role privilege(s) on a database view to a role or user.
Revoke role privilege(s) on database view from a role or user.
Grant role privilege(s) on database to a role or user.
Revoke role privilege(s) on database from a role or user.
Grant role privilege(s) on server to a role or user.
Revoke role privilege(s) on server from a role or user.
Grant role privilege(s) on dashboard to a role or user.
Revoke role privilege(s) on dashboard from a role or user.
Parameter Value
Descriptions
ALL
Grant all possible access privileges on <tableName> to <entityList>.
ALTER TABLE
Grant ALTER TABLE privilege on <tableName> to <entityList>.
DELETE
Grant DELETE privilege on <tableName> to <entityList>.
DROP
Grant DROP privilege on <tableName> to <entityList>.
INSERT
Grant INSERT privilege on <tableName> to <entityList>.
SELECT
Grant SELECT privilege on <tableName> to <entityList>.
TRUNCATE
Grant TRUNCATE privilege on <tableName> to <entityList>.
UPDATE
Grant UPDATE privilege on <tableName> to <entityList>.
Parameter Value
Descriptions
role
Name of role.
user
Name of user.
Parameter Value
Descriptions
ALL
Remove all access privilege for <entityList> on <tableName>.
ALTER TABLE
Remove ALTER TABLE privilege for <entityList> on <tableName>.
DELETE
Remove DELETE privilege for <entityList> on <tableName>.
DROP
Remove DROP privilege for <entityList> on <tableName>.
INSERT
Remove INSERT privilege for <entityList> on <tableName>.
SELECT
Remove SELECT privilege for <entityList> on <tableName>.
TRUNCATE
Remove TRUNCATE privilege for <entityList> on <tableName>.
UPDATE
Remove UPDATE privilege for <entityList> on <tableName>.
Parameter Value
Descriptions
role
Name of role.
user
Name of user.
Parameter Value
Descriptions
ALL
Grant all possible access privileges on <viewName> to <entityList>.
DROP
Grant DROP privilege on <viewName> to <entityList>.
INSERT
Grant INSERT privilege on <viewName> to <entityList>.
SELECT
Grant SELECT privilege on <viewName> to <entityList>.
Parameter Value
Descriptions
role
Name of role.
user
Name of user.
Parameter Value
Descriptions
ALL
Remove all access privilege for <entityList> on <viewName>.
DROP
Remove DROP privilege for <entityList> on <viewName>.
INSERT
Remove INSERT privilege for <entityList> on <viewName>.
SELECT
Remove SELECT privilege for <entityList> on <viewName>.
Parameter Value
Descriptions
role
Name of role.
user
Name of user.
Parameter Value
Descriptions
ACCESS
Grant ACCESS (connection) privilege on <dbName> to <entityList>.
ALL
Grant all possible access privileges on <dbName> to <entityList>.
ALTER TABLE
Grant ALTER TABLE privilege on <dbName> to <entityList>.
ALTER SERVER
Grant ALTER SERVER privilege on <dbName> to <entityList>.
CREATE SERVER
Grant CREATE SERVER privilege on <dbName> to <entityList>;
CREATE TABLE
Grant CREATE TABLE privilege on <dbName> to <entityList>. Previously CREATE
.
CREATE VIEW
Grant CREATE VIEW privilege on <dbName> to <entityList>.
CREATE DASHBOARD
Grant CREATE DASHBOARD privilege on <dbName> to <entityList>.
CREATE
Grant CREATE privilege on <dbName> to <entityList>.
DELETE
Grant DELETE privilege on <dbName> to <entityList>.
DELETE DASHBOARD
Grant DELETE DASHBOARD privilege on <dbName> to <entityList>.
DROP
Grant DROP privilege on <dbName> to <entityList>.
DROP SERVER
Grant DROP SERVER privilege on <dbName> to <entityList>.
DROP VIEW
Grant DROP VIEW privilege on <dbName> to <entityList>.
EDIT DASHBOARD
Grant EDIT DASHBOARD privilege on <dbName> to <entityList>.
INSERT
Grant INSERT privilege on <dbName> to <entityList>.
SELECT
Grant SELECT privilege on <dbName> to <entityList>.
SELECT VIEW
Grant SELECT VIEW privilege on <dbName> to <entityList>.
SERVER USAGE
Grant SERVER USAGE privilege on <dbName> to <entityList>.
TRUNCATE
Grant TRUNCATE privilege on <dbName> to <entityList>.
UPDATE
Grant UPDATE privilege on <dbName> to <entityList>.
VIEW DASHBOARD
Grant VIEW DASHBOARD privilege on <dbName> to <entityList>.
VIEW SQL EDITOR
Grant VIEW SQL EDITOR privilege in Immerse on <dbName> to <entityList>.
Parameter Value
Descriptions
role
Name of role, which must exist.
user
Name of user, which must exist. See Users and Databases.
Parameter Value
Descriptions
ACCESS
Remove ACCESS (connection) privilege on <dbName> from <entityList>.
ALL
Remove all possible privileges on <dbName> from <entityList>.
ALTER SERVER
Remove ALTER SERVER privilege on <dbName> from <entityList>
ALTER TABLE
Remove ALTER TABLE privilege on <dbName> from <entityList>.
CREATE TABLE
Remove CREATE TABLE privilege on <dbName> from <entityList>. Previously CREATE
.
CREATE VIEW
Remove CREATE VIEW privilege on <dbName> from <entityList>.
CREATE DASHBOARD
Remove CREATE DASHBOARD privilege on <dbName> from <entityList>.
CREATE
Remove CREATE privilege on <dbName> from <entityList>.
CREATE SERVER
Remove CREATE SERVER privilege on <dbName> from <entityList>.
DELETE
Remove DELETE privilege on <dbName> from <entityList>.
DELETE DASHBOARD
Remove DELETE DASHBOARD privilege on <dbName> from <entityList>.
DROP
Remove DROP privilege on <dbName> from <entityList>.
DROP SERVER
Remove DROP SERVER privilege on <dbName> from <entityList>.
DROP VIEW
Remove DROP VIEW privilege on <dbName> from <entityList>.
EDIT DASHBOARD
Remove EDIT DASHBOARD privilege on <dbName> from <entityList>.
INSERT
Remove INSERT privilege on <dbName> from <entityList>.
SELECT
Remove SELECT privilege on <dbName> from <entityList>.
SELECT VIEW
Remove SELECT VIEW privilege on <dbName> from <entityList>.
SERVER USAGE
Remove SERVER USAGE privilege on <dbName> from <entityList>.
TRUNCATE
Remove TRUNCATE privilege on <dbName> from <entityList>.
UPDATE
Remove UPDATE privilege on <dbName> from <entityList>.
VIEW DASHBOARD
Remove VIEW DASHBOARD privilege on <dbName> from <entityList>.
VIEW SQL EDITOR
Remove VIEW SQL EDITOR privilege in Immerse on <dbName> from <entityList>.
Parameter Value
Descriptions
role
Name of role.
user
Name of user.
Parameter Value
Descriptions
DROP
Grant DROP privileges on <serverName> on current database to <entityList>.
ALTER
Grant ALTER privilege on <serverName> on current database to <entityList>.
USAGE
Grant USAGE privilege (through foreign tables) on <serverName> on current database to <entityList>.
Parameter Value
Descriptions
role
Name of role, which must exist.
user
Name of user, which must exist. See Users and Databases.
Parameter Value
Descriptions
DROP
Remove DROP privileges on <serverName> on current database for <entityList>.
ALTER
Remove ALTER privilege on <serverName> on current database for <entityList>.
USAGE
Remove USAGE privilege (through foreign tables) on <serverName> on current database for <entityList>.
Parameter Value
Descriptions
role
Name of role, which must exist.
user
Name of user, which must exist. See Users and Databases.
Parameter Value
Descriptions
ALL
Grant all possible access privileges on <dashboardId> to <entityList>.
CREATE
Grant CREATE privilege to <entityList>.
DELETE
Grant DELETE privilege on <dashboardId> to <entityList>.
EDIT
Grant EDIT privilege on <dashboardId> to <entityList>.
VIEW
Grant VIEW privilege on <dashboardId> to <entityList>.
Parameter Value
Descriptions
role
Name of role, which must exist.
user
Name of user, which must exist. See Users and Databases.
Parameter Value
Descriptions
ALL
Revoke all possible access privileges on <dashboardId> for <entityList>.
CREATE
Revoke CREATE privilege for <entityList>.
DELETE
Revoke DELETE privilege on <dashboardId> for <entityList>.
EDIT
Revoke EDIT privilege on <dashboardId> for <entityList>.
VIEW
Revoke VIEW privilege on <dashboardId> for <entityList>.
Parameter Value
Descriptions
role
Name of role, which must exist.
user
Name of user, which must exist. See Users and Databases.
Privilege
Command Syntax to Grant Privilege
Access a database
GRANT ACCESS ON DATABASE <dbName> TO <entityList>;
Create a table
GRANT CREATE TABLE ON DATABASE <dbName> TO <entityList>;
Select a table
GRANT SELECT ON TABLE <tableName> TO <entityList>;
View a dashboard
GRANT VIEW ON DASHBOARD <dashboardId> TO <entityList>;
Create a dashboard
GRANT CREATE DASHBOARD ON DATABASE <dbName> TO <entityList>;
Edit a dashboard
GRANT EDIT ON DASHBOARD <dashboardId> TO <entityList>;
Delete a dashboard
GRANT DELETE DASHBOARD ON DATABASE <dbName> TO <entityList>;
User
Roles Granted
Table Privileges
salesDeptEmployee1
salesDeptRole1
SELECT on Tables 1, 3
salesDeptEmployee2
salesDeptRole2
SELECT on Table 3
salesDeptEmployee3
salesDeptRole2
SELECT on Table 3
salesDeptEmployee4
salesDeptRole3
SELECT on Table 4
salesDeptManagerEmployee5
salesDeptRole1, salesDeptRole2, salesDeptRole3
SELECT on Tables 1, 3, 4
marketingDeptEmployee1
marketingDeptRole1
SELECT on Tables 1, 2
marketingDeptEmployee2
marketingDeptRole2
SELECT on Table 2
marketingDeptManagerEmployee3
marketingDeptRole1, marketingDeptRole2, salesDeptRole1, salesDeptRole2, salesDeptRole3
SELECT on Tables 1, 2, 3, 4
Flag
Description
Default
additional-file-upload-extensions <string>
Denote additional file extensions for uploads. Has no effect if --enable-upload-extension-check
is not set.
allow-any-origin
Allows for a CORS exception to the same-origin policy. Required to be true if Immerse is hosted on a domain or subdomain different from the one hosting heavy_web_server and heavydb.
Allowing any origin is a less secure mode than what heavy_web_server requires by default.
--allow-any-origin = false
-b | backend-url <string>
URL to http-port on heavydb. Change to avoid collisions with other services.
http://localhost:6278
-B | binary-backend-url <string>
URL to http-binary-port on heavydb.
http://localhost:6276
cert string
Certificate file for HTTPS. Change for testing and debugging.
cert.pem
-c | config <string>
Path to HeavyDB configuration file. Change for testing and debugging.
-d | data <string>
Path to HeavyDB data directory. Change for testing and debugging.
data
data-catalog <string>
Path to data catalog directory.
n/a
docs string
Path to documentation directory. Change if you move your documentation files to another directory.
docs
enable-binary-thrift
Use the binary thrift protocol.
TRUE[1]
enable-browser-logs [=arg]
Enable access to current log files via web browser. Only superusers (while logged in) can access log files.
Log files are available at http[s]://host:port/logs/log_name.
Web server log files: ACCESS - http[s]://host:port/logs/access; ALL - http[s]://host:port/logs/all.
HeavyDB log files: INFO - http[s]://host:port/logs/info; WARNING - http[s]://host:port/logs/warning; ERROR - http[s]://host:port/logs/error.
FALSE[0]
enable-cert-verification
TLS certificate verification is a security measure that can be disabled for cases where TLS certificates are not issued by a trusted certificate authority. If you use a locally or unofficially generated TLS certificate to secure the connection between heavydb and heavy_web_server, this parameter must be set to false. heavy_web_server expects a trusted certificate authority by default.
--enable-cert-verification = true
enable-cross-domain [=arg]
Enable frontend cross-domain authentication. Cross-domain session cookies require the SameSite = None; Secure
headers. Can only be used with HTTPS domains; requires enable-https
to be true.
FALSE[0]
enable-https
Enable HTTPS support. Change to enable secure HTTP.
enable-https-authentication
Enable PKI authentication.
enable-https-redirect [=arg]
Enable a new port that heavy_web_server listens on for incoming HTTP requests. When received, it returns a redirect response to the HTTPS port and protocol, so that browsers are immediately and transparently redirected. Use to provide an HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80, and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default https port 443, and have requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE http-to-https-redirect-port = 80
FALSE[0]
enable-non-kernel-time-query-interrupt
Enable non-kernel-time query interrupt.
TRUE[1]
enable-runtime-query-interrupt
Enable runtime query interrupt.
TRUE[1]
enable-upload-extension-check
Disables restrictive file extension upload check.
encryption-key-file-path <string>
Path to the file containing the credential payload cipher key. Key must be 256 bits in length.
-f | frontend string
Path to frontend directory. Change if you move the location of your frontend UI files.
frontend
http-to-https-redirect-port = arg
Configures the http (incoming) port used by enable-https-redirect. The port option specifies the redirect port number. Use to provide an HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80, and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default https port 443, and have requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE http-to-https-redirect-port = 80
6280
idle-session-duration = arg
Idle session default, in minutes.
60
jupyter-prefix-string <string>
Jupyter Hub base_url for Jupyter integration.
/jupyter
jupyter-url-string <string>
URL for Jupyter integration.
-j |jwt-key-file
Path to a key file for client session encryption.
The file is expected to be a PEM-formatted (.pem) certificate file containing the unencrypted private key in PKCS #1, PKCS #8, or ASN.1 DER form.
Example PEM file creation using OpenSSL.
Required only if using a high-availability server configuration or another server configuration that requires an instance of Immerse to talk to multiple heavy_web_server instances.
Each heavy_web_server instance needs to use the same encryption key to encrypt and decrypt client session information which is used for session persistence ("sessionization") in Immerse.
key <string>
Key file for HTTPS. Change for testing and debugging.
key.pem
max-tls-version
Refers to the version of TLS encryption used to secure web protocol connections. Specifies a maximum TLS version.
min-tls-version
Refers to the version of TLS encryption used to secure web protocol connections. Specifies a minimum TLS version.
--min-tls-version = VersionTLS12
peer-cert <string>
Peer CA certificate PKI authentication.
peercert.pem
-p | port int
Frontend server port. Change to avoid collisions with other services.
6273
-r | read-only
Enable read-only mode. Prevent changes to the data.
secure-acao-uri
If set, ensures that all Access-Control-Allow-Origin headers are set to the value provided.
servers-json <string>
Path to servers.json. Change for testing and debugging.
session-id-header <string>
Session ID header.
immersesid
ssl-cert <string>
SSL validated public certificate.
sslcert.pem
ssl-private-key <string>
SSL private key file.
sslprivate.key
strip-x-headers <strings>
List of custom X http request headers to be removed from incoming requests. Use --strip-x-headers=""
to allow all X headers through.
[X-HeavyDB-Username]
timeout duration
Maximum request duration in #h#m#s
format. For example 0h30m0s
represents a duration of 30 minutes. Controls the maximum duration of individual HTTP requests. Used to manage resource exhaustion caused by improperly closed connections.
This also limits the execution time of queries made over the Thrift HTTP transport. Increase the duration if queries are expected to take longer than the default duration of one hour; for example, if you COPY FROM a large file when using heavysql with the HTTP transport.
1h0m0s
tls-cipher-suites <strings>
Refers to the combination of algorithms used in TLS encryption to secure web protocol connections.
All available TLS cipher suites compatible with HTTP/2:
TLS_RSA_WITH_RC4_128_SHA
TLS_RSA_WITH_AES_128_CBC_SHA
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
TLS_AES_128_GCM_SHA256
TLS_AES_256_GCM_SHA384
TLS_CHACHA20_POLY1305_SHA256
TLS_FALLBACK_SCSV
Limit security vulnerabilities by specifying the allowed TLS ciphers in the encryption used to secure web protocol connections.
The following cipher suites are accepted by default:
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TLS_RSA_WITH_AES_256_GCM_SHA384
tls-curves <strings>
Refers to the types of Elliptic Curve Cryptography (ECC) used in TLS encryption to secure web protocol connections.
All available TLS elliptic Curve IDs:
secp256r1 (Curve ID P256)
CurveP256 (Curve ID P256)
secp384r1 (Curve ID P384)
CurveP384 (Curve ID P384)
secp521r1 (Curve ID P521)
CurveP521 (Curve ID P521)
x25519 (Curve ID X25519)
X25519 (Curve ID X25519)
Limit security vulnerabilities by specifying the allowed TLS curves in the encryption used to secure web protocol connections.
The following TLS curves are accepted by default:
CurveP521
CurveP384
CurveP256
tmpdir string
Path for temporary file storage. Used as a staging location for file uploads. Consider locating this directory on the same file system as the HEAVY.AI data directory. If not specified on the command line, heavy_web_server recognizes the standard TMPDIR environment variable as well as a specific HEAVYAI_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command-line argument nor one of the environment variables, the default, /tmp/, is used.
/tmp
ultra-secure-mode
Enables secure mode, which sets Access-Control-Allow-Origin headers to --secure-acao-uri and sets security headers such as X-Frame-Options, Content-Security-Policy, and Strict-Transport-Security.
-v | verbose
Enable verbose logging. Adds log messages for debugging purposes.
version
Return version.
Parameter
Description
Example
ldap-uri
LDAP server host or server URI.
ldap://myLdapServer.myCompany.com
ldap-dn
LDAP distinguished name (DN).
uid=$USERNAME,cn=users,cn=accounts, dc=myCompany,dc=com
ldap-role-query-url
Returns the role names a user belongs to in the LDAP.
ldap://myServer.myCompany.com/uid=$USERNAME, cn=users, cn=accounts,dc=myCompany,dc=com?memberOf
ldap-role-query-regex
Applies a regex filter to find matching roles from the roles in the LDAP server.
(MyCompany_.*?),
ldap-superuser-role
Identifies one of the filtered roles as a superuser role. If a user has this filtered ldap role, the user is marked as a superuser.
MyCompany_SuperUser
<file path> must be a path on the server. This command exports the results of any SELECT statement to the file. There is a special mode when <file path> is empty: in that case, the server automatically generates a file in <HEAVY.AI Directory>/export named after the client session ID, with the suffix .txt.
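For illustration, a minimal sketch of an export, assuming a hypothetical flights table and output path:
COPY (SELECT * FROM flights WHERE dep_delay > 60) TO '/tmp/delayed_flights.csv' WITH (header='true', delimiter=',');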
Available properties in the optional WITH clause are described in the following table.
Parameter
Description
Default Value
array_null_handling
Defines how to export arrays that have null elements:
'abort'
- Abort the export. Default.
'raw'
- Export null elements as raw values.
'zero'
- Export null elements as zero (or an empty string).
'nullfield'
- Set the entire array column field to null for that row.
Applies only to GeoJSON and GeoJSONL files.
'abort'
delimiter
A single-character string for the delimiter between column values; most commonly:
,
for CSV files
\t
for tab-delimited files
Other delimiters include |, ~, ^, and ;.
Applies to only CSV and tab-delimited files.
Note: HEAVY.AI does not use file extensions to determine the delimiter.
','
(CSV file)
escape
A single-character string for escaping quotes. Applies to only CSV and tab-delimited files.
'
(quote)
file_compression
File compression; can be one of the following:
'none'
'gzip'
'zip'
For GeoJSON and GeoJSONL files, using GZip results in a compressed single file with a .gz extension. No other compression options are currently available.
'none'
file_type
Type of file to export; can be one of the following:
'csv'
- Comma-separated values file.
'geojson'
- FeatureCollection GeoJSON file.
'geojsonl'
- Multiline GeoJSONL file.
'shapefile'
- Geospatial shapefile.
For all file types except CSV, exactly one geo column (POINT, LINESTRING, POLYGON or MULTIPOLYGON) must be projected in the query. CSV exports can contain zero or any number of geo columns, exported as WKT strings.
Export of array columns to shapefiles is not supported.
'csv'
header
Either 'true'
or 'false'
, indicating whether to output a header line for all the column names. Applies to only CSV and tab-delimited files.
'true'
layer_name
A layer name for the geo layer in the file. If unspecified, the stem of the given filename is used, without path or extension.
Applies to all file types except CSV.
Stem of the filename, if unspecified
line_delimiter
A single-character string for terminating each line. Applies to only CSV and tab-delimited files.
'\n'
nulls
A string pattern indicating that a field is NULL. Applies to only CSV and tab-delimited files.
An empty string, 'NA'
, or \N
quote
A single-character string for quoting a column value. Applies to only CSV and tab-delimited files.
"
(double quote)
quoted
Either 'true'
or 'false'
, indicating whether all the column values should be output in quotes. Applies to only CSV and tab-delimited files.
'true'
When using the COPY TO
command, you might encounter the following error:
To avoid this error, use the heavysql
command \cpu
to put your HEAVY.AI server in CPU mode before using the COPY TO
command. See Configuration.
This topic describes several ways to load data to HEAVY.AI using SQL commands.
If there is a potential for duplicate entries, and you want to avoid loading duplicate rows, see How can I avoid creating duplicate rows? on the Troubleshooting page.
If a source file uses a reserved word, HEAVY.AI automatically adds an underscore at the end of the reserved word. For example, year
is converted to year_
.
Use the following syntax for CSV and TSV files:
<file pattern>
must be local on the server. The file pattern can contain wildcards if you want to load multiple files. In addition to CSV, TSV, and TXT files, you can import compressed files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
COPY FROM
appends data from the source into the target table. It does not truncate the table or overwrite existing data.
You can import client-side files (\copy
command in heavysql
) but it is significantly slower. For large files, HEAVY.AI recommends that you first scp
the file to the server, and then issue the COPY command.
HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.
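For illustration, a minimal COPY FROM for delimited files might look like the following sketch (the tweets table and file path are hypothetical):
COPY tweets FROM '/data/tweets_*.csv' WITH (header='true', max_reject=1000);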
Available properties in the optional WITH clause are described in the following table.
Parameter
Description
Default Value
array_delimiter
A single-character string for the delimiter between input values contained within an array.
,
(comma)
array_marker
A two-character string consisting of the start and end characters surrounding an array.
{ }
(curly brackets). For example, data to be inserted into a table with a string array in the second column (for example, BOOLEAN, STRING[], INTEGER
) can be written as true,{value1,value2,value3},3
buffer_size
Size of the input file buffer, in bytes.
8388608
delimiter
A single-character string for the delimiter between input fields; most commonly:
,
for CSV files
\t
for tab-delimited files
Other delimiters include |, ~, ^, and ;.
Note: HEAVY.AI does not use file extensions to determine the delimiter.
','
(CSV file)
escape
A single-character string for escaping quotes.
'"'
(double quote)
geo
Import geo data. Deprecated and scheduled for removal in a future release.
'false'
header
Either 'true'
or 'false'
, indicating whether the input file has a header line in Line 1 that should be skipped.
'true'
line_delimiter
A single-character string for terminating each line.
'\n'
lonlat
In HEAVY.AI, POINT fields require longitude before latitude. Use this parameter based on the order of longitude and latitude in your source data.
'true'
max_reject
Number of records that the COPY statement allows to be rejected before terminating the COPY command. Records can be rejected for a number of reasons, including invalid content in a field, or an incorrect number of columns. The details of the rejected records are reported in the ERROR log. COPY returns a message identifying how many records are rejected. The records that are not rejected are inserted into the table, even if the COPY stops because the max_reject
count is reached.
Note: If you run the COPY command from Heavy Immerse, the COPY command does not return messages to Immerse once the SQL is verified. Immerse does not show messages about data loading, or about data-quality issues that result in max_reject
triggers.
100,000
nulls
A string pattern indicating that a field is NULL.
An empty string, 'NA'
, or \N
parquet
Import data in Parquet format. Parquet files can be compressed using Snappy. Other archives such as .gz or .zip must be unarchived before you import the data. Deprecated and scheduled for removal in a future release.
'false'
plain_text
Indicates that the input file is plain text so that it bypasses the libarchive
decompression utility.
CSV, TSV, and TXT are handled as plain text.
quote
A single-character string for quoting a field.
"
(double quote). All characters inside quotes are imported “as is,” except for line delimiters.
quoted
Either 'true'
or 'false'
, indicating whether the input file contains quoted fields.
'true'
source_srid
When importing into GEOMETRY(*, 4326) columns, specifies the SRID of the incoming geometries, all of which are transformed on the fly. For example, to import from a file that contains EPSG:2263 (NAD83 / New York Long Island) geometries, run the COPY command and include WITH (source_srid=2263). Data targeted at non-4326 geometry columns is not affected.
0
source_type='<type>'
Type can be one of the following:
delimited_file
- Import as CSV.
geo_file
- Import as Geo file. Use for shapefiles, GeoJSON, and other geo files. Equivalent to deprecated geo='true'
.
raster_file
- Import as a raster file.
parquet_file
- Import as a Parquet file. Equivalent to deprecated parquet='true'
.
delimited_file
threads
Number of threads for performing the data import.
Number of CPU cores on the system
trim_spaces
Indicate whether to trim side spaces ('true'
) or not ('false'
).
'false'
By default, the CSV parser assumes one row per line. To import a file with multiple lines in a single field, specify threads = 1
in the WITH
clause.
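A hedged example, assuming a hypothetical notes table whose text field spans multiple lines in the source file:
COPY notes FROM '/data/multiline_notes.csv' WITH (threads = 1);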
You can use COPY FROM
to import geo files. You can create the table based on the source file and then load the data:
You can also append data to an existing, predefined table:
Use the following syntax, depending on the file source.
Local server
COPY [tableName] FROM '/
filepath
' WITH (source_type='geo_file', ...)
;
Web site
COPY [tableName] FROM '[
http
_https_]://_website/filepath_' WITH (source_type='geo_file', ...);
Amazon S3
COPY [tableName] FROM 's3://
bucket/filepath
' WITH (source_type='geo_file', s3_region='
region
', s3_access_key='
accesskey
', s3_secret_key='
secretkey
', ... );
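For instance, a local geo import might look like the following sketch (table and file names are hypothetical):
COPY county_boundaries FROM '/data/counties.geojson' WITH (source_type='geo_file');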
If you are using COPY FROM
to load to an existing table, the field type must match the metadata of the source file. If it does not, COPY FROM
throws an error and does not load the data.
COPY FROM
appends data from the source into the target table. It does not truncate the table or overwrite existing data.
Supported DATE
formats when using COPY FROM
include mm/dd/yyyy
, dd-mmm-yy
, yyyy-mm-dd
, and dd/mmm/yyyy
.
COPY FROM
fails for records with latitude or longitude values that have more than 4 decimal places.
The following WITH
options are available for geo file imports from all sources.
geo_assign_render_groups
Enable or disable automatic render group assignment for polygon imports; can be true
or false
. If polygons are not needed for rendering, set this to false
to speed up import.
true
geo_coords_type
Coordinate type used; must be geography
.
N/A
geo_coords_encoding
Coordinates encoding; can be geoint(32)
or none
.
geoint(32)
geo_coords_srid
Coordinates spatial reference; must be 4326
(WGS84 longitude/latitude).
N/A
geo_explode_collections
Explodes MULTIPOLYGON, MULTILINESTRING, or MULTIPOINT geo data into multiple rows in a POLYGON, LINESTRING, or POINT column, with all other columns duplicated.
When importing from a WKT CSV with a MULTIPOLYGON column, the table must have been manually created with a POLYGON column.
When importing from a geo file, the table is automatically created with the correct type of column.
When the input column contains a mixture of MULTI and single geo, the MULTI geo are exploded, but the singles are imported normally. For example, a column containing five two-polygon MULTIPOLYGON rows and five POLYGON rows imports as a POLYGON column of fifteen rows.
false
Currently, a manually created geo table can have only one geo column. If it has more than one, import is not performed.
Any GDAL-supported file type can be imported. If it is not supported, GDAL throws an error.
An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table. For more information, see Importing an ESRI File Geodatabase.
The first compatible file in the bundle is loaded; subfolders are traversed until a compatible file is found. The rest of the contents in the bundle are ignored. If the bundle contains multiple filesets, unpack the file manually and specify it for import.
For more information about importing specific geo file formats, see Importing Geospatial Files.
CSV files containing WKT strings are not considered geo files and should not be imported with the source_type='geo_file' option. When importing WKT strings from CSV files, you must create the table first. The geo column type and encoding are specified as part of the DDL. For example, for a polygon with no encoding, try the following:
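A minimal sketch, assuming a hypothetical zones table with one uncompressed POLYGON column:
CREATE TABLE zones (zone_name TEXT, zone_poly GEOMETRY(POLYGON, 4326) ENCODING NONE);
COPY zones FROM '/data/zones.csv' WITH (header='true');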
You can use COPY FROM
to import raster files supported by GDAL as one row per pixel, where a pixel may consist of one or more data bands, with optional corresponding pixel or world-space coordinate columns. This allows the data to be rendered as a point/symbol cloud that approximates a 2D image.
Use the same syntax that you would for geo files, depending on the file source.
The following WITH
options are available for raster file imports from all sources.
raster_import_bands='<bandname>[,<bandname>,...]'
Specifies which bands to import, as a comma-separated list of band names. Default: an empty string, which imports all bands from all datasets found in the file.
raster_point_transform='<transform>'
Specifies the processing for floating-point coordinate values:
auto
- Transform based on raster file type (world
for geo, none
for non-geo).
none
- No affine or world-space conversion. Values will be equivalent to the integer pixel coordinates.
file
- File-space affine transform only. Values will be in the file's coordinate system, if any (e.g. geospatial).
world
- World-space geospatial transform. Values will be projected to WGS84 lon/lat (if the file has a geospatial SRID).
auto
raster_point_type='<type>'
Specifies the required type for the additional pixel coordinate columns:
auto
- Create columns based on raster file type (double
for geo, int
or smallint
for non-geo, dependent on size).
none
- Do not create pixel coordinate columns.
smallint
or int
- Create integer columns named raster_x
and raster_y
and fill with the raw pixel coordinates from the file.
float
or double
- Create floating-point columns named raster_x
and raster_y
(or raster_lon
and raster_lat
) and fill with file-space or world-space projected coordinates.
point
- Create a POINT
column of name raster_point
and fill with file-space or world-space projected coordinates.
auto
Illegal combinations of raster_point_type
and raster_point_transform
are rejected. For example, world
transform can only be performed on raster files that have a geospatial coordinate system in their metadata, and cannot be performed if <type>
is an integer format (which cannot represent world-space coordinate values).
Any GDAL-supported file type can be imported. If it is not supported, GDAL throws an error.
HDF5 and possibly other GDAL drivers may not be thread-safe, so use WITH (threads=1)
when importing.
Archive file import (.zip, .tar, .tar.gz) is not currently supported for raster files.
Band and Column Names
The following raster file formats contain the metadata required to derive sensible names for the bands, which are then used for their corresponding columns:
GRIB2 - geospatial/meteorological format
OME TIFF - an OpenMicroscopy format
The band names from the file are sanitized (illegal characters and spaces removed) and de-duplicated (addition of a suffix in cases where the same band name is repeated within the file or across datasets).
For other formats, the columns are named band_1_1
, band_1_2
, and so on.
The sanitized and de-duplicated names must be used for the raster_import_bands
option.
Band and Column Data Types
Raster files can have bands in the following data types:
Signed or unsigned 8-, 16-, or 32-bit integer
32- or 64-bit floating point
Complex number formats (not supported)
Signed data is stored in the directly corresponding column type, as follows:
int8
-> TINYINT
int16
-> SMALLINT
int32
-> INT
float32
-> FLOAT
float64
-> DOUBLE
Unsigned integer column types are not currently supported, so any data of those types is converted to the next larger signed column type:
uint8
-> SMALLINT
uint16
-> INT
uint32
-> BIGINT
Column types cannot currently be overridden.
ODBC import is currently a beta feature.
You can use COPY FROM
to import data from a Relational Database Management System (RDBMS) or data warehouse using the Open Database Connectivity (ODBC) interface.
The following WITH options are available for ODBC import.
data_source_name
Data source name (DSN) configured in the odbc.ini file. Only one of data_source_name
or connection_string
can be specified.
connection_string
A set of semicolon-separated key=value pairs that define the connection parameters for an RDBMS. For example:
Driver=DriverName;Database=DatabaseName;Servername=HostName;Port=1234
Only one of data_source_name
or connection_string
can be specified.
sql_order_by
Comma-separated list of column names that provide a unique ordering for the result set returned by the specified SQL SELECT statement.
username
Username on the RDBMS. Applies only when data_source_name
is used.
password
Password credential for the RDBMS. This option applies only when data_source_name
is used.
credential_string
A set of semicolon-separated key=value pairs that define the access credential parameters for an RDBMS. For example:
Username=username;Password=password
Applies only when connection_string
is used.
Using a data source name:
Using a connection string:
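The sketches below are illustrative only; they assume that the remote SELECT statement is passed as the COPY FROM source string and that source_type='odbc' selects the ODBC path, and the table, DSN, server, and credential values are hypothetical:
COPY remote_orders FROM 'SELECT id, amount FROM orders' WITH (source_type='odbc', data_source_name='postgres_dsn', username='reader', password='secret', sql_order_by='id');
COPY remote_orders FROM 'SELECT id, amount FROM orders' WITH (source_type='odbc', connection_string='Driver=PostgreSQL;Database=sales;Servername=db.example.com;Port=5432', credential_string='Username=reader;Password=secret', sql_order_by='id');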
For information about using ODBC HeavyConnect, see ODBC Data Wrapper Reference.
These examples assume the following folder and file structure:
Local Parquet/CSV files can now be globbed by specifying either a path name with a wildcard or a folder name.
Globbing a folder recursively returns all files under the specified folder. For example,
COPY table_1 FROM ".../subdir";
returns file_3
, file_4
, file_5
.
Globbing with a wildcard returns any file paths matching the expanded file path. So
COPY table_1 FROM ".../subdir/file*";
returns file_3
, file_4
.
Does not apply to S3 cases, because file paths specified for S3 always use prefix matching.
Use file filtering to filter out unwanted files that have been globbed. To use filtering, specify the REGEX_PATH_FILTER option. Files not matching this pattern are not included on import. This behavior is consistent across local and S3 use cases.
The following regex expression:
COPY table_1 from ".../" WITH (REGEX_PATH_FILTER=".*file_[4-5]");
returns file_4
, file_5
.
Use the FILE_SORT_ORDER_BY
option to specify the order in which files are imported.
FILE_SORT_ORDER_BY Options
pathname
(default)
date_modified
regex
*
regex_date
*
regex_number
*
*FILE_SORT_REGEX option required
Using FILE_SORT_ORDER_BY
COPY table_1 from ".../" WITH (FILE_SORT_ORDER_BY="date_modified");
Using FILE_SORT_ORDER_BY with FILE_SORT_REGEX
Regex sort keys are formed by the concatenation of all capture groups from the FILE_SORT_REGEX
expression. Regex sort keys are strings but can be converted to dates or FLOAT64 with the appropriate FILE_SORT_ORDER_BY
option. File paths that do not match the provided capture groups or that cannot be converted to the appropriate date or FLOAT64 are treated as NULLs and sorted to the front in a deterministic order.
Multiple Capture Groups:
FILE_SORT_REGEX=".*/data_(.*)_(.*)_"
/root/dir/unmatchedFile
→ <NULL>
/root/dir/data_andrew_54321_
→ andrew54321
/root/dir2/data_brian_Josef_
→ brianJosef
Dates:
FILE_SORT_REGEX=".*data_(.*)
/root/data_222
→ <NULL> (invalid date conversion)
/root/data_2020-12-31
→ 2020-12-31
/root/dir/data_2021-01-01
→ 2021-01-01
Import:
COPY table_1 from ".../" WITH (FILE_SORT_ORDER_BY="regex", FILE_SORT_REGEX=".*file_(.)");
Limited filename globbing is supported for both geo and raster import. For example, to import a sequence of same-format GeoTIFF files into a single table, you can run the following:
COPY table FROM '/path/path/something_*.tiff' WITH (source_type='raster_file')
The files are imported in alphanumeric sort order, per regular glob rules, and all appended to the same table. This may fail if the files are not all of the same format (band count, names, and types).
For non-geo/raster files (CSV and Parquet), you can provide just the path to the directory OR a wildcard; for example:
/path/to/directory/
/path/to/directory
/path/to/directory/*
For geo/raster files, a wildcard is required, as shown in the last example.
SQLImporter is a Java utility run at the command line. It runs a SELECT statement on another database through JDBC and loads the result set into HeavyDB.
HEAVY.AI recommends that you use a service account with read-only permissions when accessing data from a remote database.
In release 4.6 and higher, the user ID (-u
) and password (-p
) flags are required. If your password includes a special character, you must escape the character using a backslash (\).
If the table does not exist in HeavyDB, SQLImporter
creates it. If the target table in HeavyDB does not match the SELECT statement metadata, SQLImporter
fails.
If the truncate flag is used, SQLImporter
truncates the table in HeavyDB before transferring the data. If the truncate flag is not used, SQLImporter
appends the results of the SQL statement to the target table in HeavyDB.
The -i
argument provides a path to an initialization file. Each line of the file is sent as a SQL statement to the remote database. You can use -i
to set additional custom parameters before the data is loaded.
The SQLImporter
string is case-sensitive. Incorrect case returns the following:
Error: Could not find or load main class com.mapd.utility.SQLimporter
You can migrate geo data types from a PostgreSQL database. The following table shows the correlation between PostgreSQL/PostGIS geo types and HEAVY.AI geo types.
point
point
lseg
linestring
linestring
linestring
polygon
polygon
multipolygon
multipolygon
Other PostgreSQL types, including circle, box, and path, are not supported.
By default, 100,000 records are selected from HeavyDB. To select a larger number of records, use the LIMIT statement.
Stream data into HeavyDB by attaching the StreamInsert program to the end of a data stream. The data stream can be another program printing to standard out, a Kafka endpoint, or any other real-time stream output. You can specify the appropriate batch size, according to the expected stream rates and your insert frequency. The target table must exist before you attempt to stream data into the table.
Setting
Default
Description
<table_name>
n/a
Name of the target table in HeavyDB
<database_name>
n/a
Name of the target database in HeavyDB
-u
n/a
User name
-p
n/a
User password
--host
n/a
Name of HEAVY.AI host
--delim
comma (,)
Field delimiter, in single quotes
--line
newline (\n)
Line delimiter, in single quotes
--batch
10000
Number of records in a batch
--retry_count
10
Number of attempts before job fails
--retry_wait
5
Wait time in seconds after server connection failure
--null
n/a
String that represents null values
--port
6274
Port number for HeavyDB on localhost
-t | --transform
n/a
Regex transformation
--print_error
False
Print error messages
--print_transform
False
Print description of transform.
--help
n/a
List options
For more information on creating regex transformation statements, see RegEx Replace.
You can use the SQL COPY FROM
statement to import files stored on Amazon Web Services Simple Storage Service (AWS S3) into an HEAVY.AI table, in much the same way you would with local files. In the WITH
clause, specify the S3 credentials and region information of the bucket accessed.
Access key and secret key, or session token if using temporary credentials, and region are required. For information about AWS S3 credentials, see https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.
HEAVY.AI does not support the use of asterisks (*) in URL strings to import items. To import multiple files, pass in an S3 path instead of a file name, and COPY FROM
imports all items in that path and any subpath.
HEAVY.AI supports custom S3 endpoints, which allows you to import data from S3-compatible services, such as Google Cloud Storage.
To use custom S3 endpoints, add s3_endpoint
to the WITH
clause of a COPY FROM
statement; for example, to set the S3 endpoint to point to Google Cloud Services:
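A hedged sketch (the bucket name and credentials are placeholders; storage.googleapis.com is the Google Cloud Storage interoperability endpoint):
COPY trips FROM 's3://my-gcs-bucket/trips/' WITH (s3_endpoint='storage.googleapis.com', s3_region='us-east-1', s3_access_key='<HMAC access key>', s3_secret_key='<HMAC secret>');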
For information about interoperability and setup for Google Cloud Services, see Cloud Storage Interoperability.
You can also configure custom S3 endpoints by passing the s3_endpoint
field to Thrift import_table
.
The following examples show failed and successful attempts to copy the table trips from AWS S3.
The following example imports all the files in the trip.compressed
directory.
The table trips
is created with the following statement:
You can configure the HEAVY.AI server to provide AWS credentials, which allows S3 queries to be run without specifying AWS credentials. S3 regions are not configured by the server and must be passed in either as a client-side environment variable or as an option with the request.
Example Commands
\detect
:
$ export AWS_REGION=us-west-1
heavysql > \detect <s3-bucket-uri>
import_table
:
$ ./Heavyai-remote -h localhost:6274 import_table "'<session-id>'" "<table-name>" '<s3-bucket-uri>' 'TCopyParams(s3_region="'us-west-1'")'
COPY FROM
:
heavysql > COPY <table-name> FROM <s3-bucket-uri> WITH(s3_region='us-west-1');
Enable server privileges in the server configuration file heavy.conf
allow-s3-server-privileges = true
For bare metal installations, set the following environment variables and restart the HeavyDB service:
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_SESSION_TOKEN=xxx
(required only for AWS STS credentials)
For HeavyDB docker images, start a new container mounted with the configuration file using the option:
-v <dirname-containing-heavy.conf>:/var/lib/heavyai
and set the following environment options:
-e AWS_ACCESS_KEY_ID=xxx
-e AWS_SECRET_ACCESS_KEY=xxx
-e AWS_SESSION_TOKEN=xxx
(required only for AWS STS credentials)
Enable server privileges in the server configuration file heavy.conf
allow-s3-server-privileges = true
For bare metal installations, specify a shared AWS credentials file and profile with the following environment variables, and restart the HeavyDB service:
AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
AWS_PROFILE=default
For HeavyDB docker images, start a new container mounted with the configuration file and AWS shared credentials file using the following options:
-v <dirname-containing-/heavy.conf>:/var/lib/heavyai
-v <dirname-containing-/credentials>:/<container-credential-path>
and set the following environment options:
-e AWS_SHARED_CREDENTIALS_FILE=<container-credential-path>
-e AWS_PROFILE=<active-profile>
Prerequisites
An IAM Policy that has sufficient access to the S3 bucket.
An IAM AWS Service Role of type Amazon EC2
, which is assigned the IAM Policy from (1).
Setting Up an EC2 Instance with Roles
For a new EC2 Instance:
AWS Management Console > Services > Compute > EC2 > Launch Instance.
Select desired Amazon Machine Image (AMI) > Select.
Select desired Instance Type > Next: Configure Instance Details.
IAM Role > Select desired IAM Role > Review and Launch.
Review other options > Launch.
For an existing EC2 Instance:
AWS Management Console > Services > Compute > EC2 > Instances.
Mark desired instance(s) > Actions > Security > Modify IAM Role.
Select desired IAM Role > Save.
Restart the EC2 Instance.
You can ingest data from an existing Kafka producer to an existing table in HEAVY.AI using KafkaImporter
on the command line:
KafkaImporter
requires a functioning Kafka cluster. See the Kafka website and the Confluent schema registry documentation.
Setting
Default
Description
<table_name>
n/a
Name of the target table in HeavyDB
<database_name>
n/a
Name of the target database in HeavyDB
-u <username>
n/a
User name
-p <password>
n/a
User password
--host <hostname>
localhost
Name of HEAVY.AI host
--port <port_number>
6274
Port number for HeavyDB on localhost
--http
n/a
Use HTTP transport
--https
n/a
Use HTTPS transport
--skip-verify
n/a
Do not verify validity of SSL certificate
--ca-cert <path>
n/a
Path to the trusted server certificate; initiates an encrypted connection
--delim <delimiter>
comma (,)
Field delimiter, in single quotes
--line <delimiter>
newline (\n)
Line delimiter, in single quotes
--batch <batch_size>
10000
Number of records in a batch
--retry_count <retry_number>
10
Number of attempts before job fails
--retry_wait <seconds>
5
Wait time in seconds after server connection failure
--null <string>
n/a
String that represents null values
--quoted <boolean>
false
Whether the source contains quoted fields
-t | --transform
n/a
Regex transformation
--print_error
false
Print error messages
--print_transform
false
Print description of transform
--help
n/a
List options
--group-id <id>
n/a
Kafka group ID
--topic <topic>
n/a
The Kafka topic to be ingested
--brokers <broker_name:broker_port>
localhost:9092
One or more brokers
KafkaImporter Logging Options
Setting
Default
Description
--log-directory <directory>
mapd_log
Logging directory; can be relative to data directory or absolute
--log-file-name <filename>
n/a
Log filename relative to logging directory; has format KafkaImporter.{SEVERITY}.%Y%m%d-%H%M%S.log
--log-symlink <symlink>
n/a
Symlink to active log; has format KafkaImporter.{SEVERITY}
--log-severity <level>
INFO
Log-to-file severity level: INFO, WARNING, ERROR, or FATAL
--log-severity-clog <level>
ERROR
Log-to-console severity level: INFO, WARNING, ERROR, or FATAL
--log-channels
n/a
Log channel debug info
--log-auto-flush
n/a
Flush logging buffer to file after each message
--log-max-files <files_number>
100
Maximum number of log files to keep
--log-min-free-space <bytes>
20,971,520
Minimum number of bytes available on the device before oldest log files are deleted
--log-rotate-daily
1
Start new log files at midnight
--log-rotation-size <bytes>
10485760
Maximum file size, in bytes, before new log files are created
Configure KafkaImporter
to use your target table. KafkaImporter
listens to a pre-defined Kafka topic associated with your table. You must create the table before using the KafkaImporter
utility. For example, you might have a table named customer_site_visit_events
that listens to a topic named customer_site_visit_events_topic
.
The data format must be a record-level format supported by HEAVY.AI.
KafkaImporter
listens to the topic, validates records against the target schema, and ingests topic batches of your designated size to the target table. Rejected records use the existing reject reporting mechanism. You can start, shut down, and configure KafkaImporter
independent of the HeavyDB engine. If KafkaImporter is running and the database shuts down, KafkaImporter shuts down as well. Reads from the topic are nondestructive.
KafkaImporter
is not responsible for event ordering; a streaming platform outside HEAVY.AI (for example, Spark Streaming or Flink) should handle the stream processing. HEAVY.AI ingests the end-state stream of post-processed events.
KafkaImporter
does not handle dynamic schema creation on first ingest, but must be configured with a specific target table (and its schema) as the basis. There is a 1:1 correspondence between target table and topic.
StreamImporter is an updated version of the StreamInsert utility used for streaming reads from delimited files into HeavyDB. StreamImporter uses a binary columnar load path, providing improved performance compared to StreamInsert.
You can ingest data from a data stream to an existing table in HEAVY.AI using StreamImporter
on the command line.
Setting
Default
Description
<table_name>
n/a
Name of the target table in HeavyDB
<database_name>
n/a
Name of the target database in HeavyDB
-u <username>
n/a
User name
-p <password>
n/a
User password
--host <hostname>
n/a
Name of HEAVY.AI host
--port <port>
6274
Port number for HeavyDB on localhost
--http
n/a
Use HTTP transport
--https
n/a
Use HTTPS transport
--skip-verify
n/a
Do not verify validity of SSL certificate
--ca-cert <path>
n/a
Path to the trusted server certificate; initiates an encrypted connection
--delim <delimiter>
comma (,)
Field delimiter, in single quotes
--null <string>
n/a
String that represents null values
--line <delimiter>
newline (\n)
Line delimiter, in single quotes
--quoted <boolean>
true
Either true
or false
, indicating whether the input file contains quoted fields.
--batch <number>
10000
Number of records in a batch
--retry_count <retry_number>
10
Number of attempts before job fails
--retry_wait <seconds>
5
Wait time in seconds after server connection failure
-t | --transform
n/a
Regex transformation
--print_error
false
Print error messages
--print_transform
false
Print description of transform
--help
n/a
List options
Setting
Default
Description
--log-directory <directory>
mapd_log
Logging directory; can be relative to data directory or absolute
--log-file-name <filename>
n/a
Log filename relative to logging directory; has format StreamImporter.{SEVERITY}.%Y%m%d-%H%M%S.log
--log-symlink <symlink>
n/a
Symlink to active log; has format StreamImporter.{SEVERITY}
--log-severity <level>
INFO
Log-to-file severity level: INFO, WARNING, ERROR, or FATAL
--log-severity-clog <level>
ERROR
Log-to-console severity level: INFO, WARNING, ERROR, or FATAL
--log-channels
n/a
Log channel debug info
--log-auto-flush
n/a
Flush logging buffer to file after each message
--log-max-files <files_number>
100
Maximum number of log files to keep
--log-min-free-space <bytes>
20,971,520
Minimum number of bytes available on the device before oldest log files are deleted
--log-rotate-daily
1
Start new log files at midnight
--log-rotation-size <bytes>
10485760
Maximum file size, in bytes, before new log files are created
Configure StreamImporter
to use your target table. StreamImporter
listens to a pre-defined data stream associated with your table. You must create the table before using the StreamImporter
utility.
The data format must be a record-level format supported by HEAVY.AI.
StreamImporter
listens to the stream, validates records against the target schema, and ingests batches of your designated size to the target table. Rejected records use the existing reject reporting mechanism. You can start, shut down, and configure StreamImporter
independent of the HeavyDB engine. If StreamImporter is running but the database shuts down, StreamImporter shuts down as well. Reads from the stream are non-destructive.
StreamImporter
is not responsible for event ordering; a first-class streaming platform outside HEAVY.AI (for example, Spark Streaming or Flink) should handle the stream processing. HEAVY.AI ingests the end-state stream of post-processed events.
StreamImporter
does not handle dynamic schema creation on first ingest, but must be configured with a specific target table (and its schema) as the basis.
There is a 1:1 correspondence between the target table and the stream.
You can consume a CSV or Parquet file residing in HDFS (Hadoop Distributed File System) into HeavyDB.
Copy the HEAVY.AI JDBC driver into the Apache Sqoop library, normally found at /usr/lib/sqoop/lib/.
The following is a straightforward import command. For more information on options and parameters for using Apache Sqoop, see the user guide at sqoop.apache.org.
The --connect
parameter is the address of a valid JDBC port on your HEAVY.AI instance.
To detect duplication prior to loading data into HeavyDB, you can perform the following steps. For this example, the files are labeled A,B,C...Z.
Load file A into table MYTABLE
.
Run a duplicate-check query (a hedged example follows these steps). No rows should be returned; if rows are returned, your first A file is not unique.
Load file B into table TEMPTABLE
.
Run a cross-check query between TEMPTABLE and MYTABLE (a hedged example follows these steps). No rows should be returned if file B is unique; if rows are returned, use the details from the selection to fix file B.
Load the fixed B file into MYTABLE
.
Drop table TEMPTABLE
.
Repeat steps 3-6 for each remaining file in the set before loading the data into the real MYTABLE instance.
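A minimal sketch of the two checks, assuming the files share a unique key column named uniqueCol (the column name is hypothetical):
-- Step 2: verify that file A contains no duplicate keys within MYTABLE
SELECT uniqueCol, COUNT(*) FROM MYTABLE GROUP BY uniqueCol HAVING COUNT(*) > 1;
-- Step 4: verify that file B (loaded into TEMPTABLE) adds no keys already present in MYTABLE
SELECT t.uniqueCol FROM TEMPTABLE t JOIN MYTABLE m ON t.uniqueCol = m.uniqueCol;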
DDL - Tables
These functions are used to create and modify data tables in HEAVY.AI.
Table names must use the NAME format, described in regex notation as:
Table and column names can include quotes, spaces, and the underscore character. Other special characters are permitted if the name of the table or column is enclosed in double quotes (" ").
Spaces and special characters other than underscore (_) cannot be used in Heavy Immerse.
Column and table names enclosed in double quotes cannot be used in Heavy Immerse.
Create a table named <table>
specifying <columns>
and table properties.
Datatype
Size (bytes)
Notes
BIGINT
8
Minimum value: -9,223,372,036,854,775,807
; maximum value: 9,223,372,036,854,775,807
.
BOOLEAN
1
TRUE: 'true'
, '1'
, 't'
. FALSE: 'false'
, '0'
, 'f'
. Text values are not case-sensitive.
DATE
*
4
Same as DATE ENCODING DAYS(32)
.
DATE ENCODING DAYS(32)
4
Range in years: +/-5,883,517
around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: -2,147,483,648
; maximum value: 2,147,483,647
. Supported formats when using COPY FROM
: mm/dd/yyyy
, dd-mmm-yy
, yyyy-mm-dd
, dd/mmm/yyyy
.
DATE ENCODING DAYS(16)
2
Range in days: -32,768
- 32,767
Range in years: +/-90
around epoch, April 14, 1880 - September 9, 2059.
Minimum value: -2,831,155,200
; maximum value: 2,831,068,800
.
Supported formats when using COPY FROM
: mm/dd/yyyy
, dd-mmm-yy
, yyyy-mm-dd
, dd/mmm/yyyy
.
DATE ENCODING FIXED(32)
4
In DDL statements defaults to DATE ENCODING DAYS(16)
. Deprecated.
DATE ENCODING FIXED(16)
2
In DDL statements defaults to DATE ENCODING DAYS(16)
. Deprecated.
DECIMAL
2, 4, or 8
Takes precision and scale parameters: DECIMAL(precision,scale)
.
Size depends on precision:
Up to 4
: 2 bytes
5
to 9
: 4 bytes
10
to 18
(maximum): 8 bytes
Scale must be less than precision.
DOUBLE
8
Variable precision. Minimum value: -1.79 x e^308
; maximum value: 1.79 x e^308
.
FLOAT
4
Variable precision. Minimum value: -3.4 x e^38
; maximum value: 3.4 x e^38
.
INTEGER
4
Minimum value: -2,147,483,647
; maximum value: 2,147,483,647
.
SMALLINT
2
Minimum value: -32,767
; maximum value: 32,767
.
TEXT ENCODING DICT
4
Max cardinality 2 billion distinct string values
TEXT ENCODING NONE
Variable
Size of the string + 6 bytes
TIME
8
Minimum value: 00:00:00
; maximum value: 23:59:59
.
TIMESTAMP
8
Linux timestamp from -30610224000
(1/1/1000 00:00:00.000
) through 29379542399
(12/31/2900 23:59:59.999
).
Can also be inserted and stored in human-readable format:
YYYY-MM-DD HH:MM:SS
YYYY-MM-DDTHH:MM:SS
(The T
is dropped when the field is populated.)
TINYINT
1
Minimum value: -127
; maximum value: 127
.
* In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE
columns, but you can create only 4-byte DATE
columns (default) and 2-byte DATE
columns (see DATE ENCODING FIXED(16)
).
For more information, see Datatypes and Fixed Encoding.
For geospatial datatypes, see Geospatial Primitives.
Create a table named tweets
and specify the columns, including type, in the table.
Create a table named delta and assign a default value San Francisco
to column city.
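A minimal sketch of the delta example, assuming an additional hypothetical id column:
CREATE TABLE delta (id INTEGER, city TEXT DEFAULT 'San Francisco');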
Default values currently have the following limitations:
Only literals can be used for column DEFAULT values; expressions are not supported.
You cannot define a DEFAULT value for a shard key. For example, the following does not parse: CREATE TABLE tbl (id INTEGER NOT NULL DEFAULT 0, name TEXT, shard key (id)) with (shard_count = 2);
For arrays, use the following syntax: ARRAY[A, B, C, ..., N]. The syntax {A, B, C, ..., N} is not supported.
Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with malformed literal as a default value, but when you try to insert a row with a default value, it will throw an error.
Encoding
Descriptions
DICT
Dictionary encoding on string columns (default for TEXT
columns). Limit of 2 billion unique string values.
FIXED
(bits)
NONE
No encoding. Valid only on TEXT
columns. No Dictionary is created. Aggregate operations are not possible on this column type.
Property
Description
fragment_size
Number of rows per fragment that is a unit of the table for query processing. Default: 32 million rows, which is not expected to be changed.
max_rollback_epochs
Limit the number of epochs a table can be rolled back to. Limiting the number of epochs helps to limit the amount of on-disk data and prevent unmanaged data growth.
Limiting the number of rollback epochs also can increase system startup speed, especially for systems on which data is added in small batches or singleton inserts. Default: 3.
The following example creates the table test_table
and sets the maximum epoch rollback number to 50:
CREATE TABLE test_table(a int) WITH (MAX_ROLLBACK_EPOCHS = 50);
max_rows
Used primarily for streaming datasets to limit the number of rows in a table, to avoid running out of memory or impeding performance. When the max_rows
limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the max_rows
setting. If you attempt to load more rows at one time than the max_rows
setting defines, the records up to the max_rows
limit are removed, leaving only the additional rows. Default: 2^62.
In a distributed system, the maximum number of rows is calculated as max_rows * leaf_count
. In a sharded distributed system, the maximum number of rows is calculated as max_rows * shard_count
.
page_size
Number of I/O page bytes. Default: 1MB, which does not need to be changed.
partitions
Partition strategy option:
SHARDED
: Partition table using sharding.
REPLICATED
: Partition table using replication.
shard_count
Number of shards to create, typically equal to the number of GPUs across which the data table is distributed.
sort_column
Name of the column on which to sort during bulk import.
Sharding partitions a database table across multiple servers so each server has a part of the table with the same columns but with different rows. Partitioning is based on a sharding key defined when you create the table.
Without sharding, the dimension tables involved in a join are replicated and sent to each GPU, which is not feasible for dimension tables with many rows. Specifying a shard key makes it possible for the query to execute efficiently on large dimension tables.
Currently, specifying a shard key is useful for joins, only:
If two tables specify a shard key with the same type and the same number of shards, a join on that key only sends a part of the dimension table column data to each GPU.
For multi-node installs, the dimension table does not need to be replicated and the join executes locally on each leaf.
A shard key must specify a single column to shard on. There is no support for sharding by a combination of keys.
One shard key can be specified for a table.
Data are partitioned according to the shard key and the number of shards (shard_count
).
A value in the column specified as a shard key is always sent to the same partition.
The number of shards should be equal to the number of GPUs in the cluster.
Sharding is allowed on the following column types:
DATE
INT
TEXT ENCODING DICT
TIME
TIMESTAMP
Tables must share the dictionary for the column to be involved in sharded joins. If the dictionary is not specified as shared, the join does not take advantage of sharding. Dictionaries are reference-counted and only dropped when the last reference drops.
Set shard_count
to the number of GPUs you eventually want to distribute the data table across.
Referenced tables must also be shard_count
-aligned.
Sharding should be minimized because it can introduce load skew across resources, compared to when sharding is not used.
Examples
Basic sharding:
Sharding with shared dictionary:
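Hedged sketches of both forms, assuming hypothetical customers and transactions tables sharded on a shared account_id key:
CREATE TABLE customers (account_id TEXT, name TEXT, SHARD KEY (account_id)) WITH (shard_count = 4);
CREATE TABLE transactions (account_id TEXT, amount DOUBLE, SHARD KEY (account_id), SHARED DICTIONARY (account_id) REFERENCES customers(account_id)) WITH (shard_count = 4);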
Using the TEMPORARY argument creates a table that persists only while the server is live. Temporary tables are useful for storing intermediate result sets that you access more than once.
Adding or dropping a column from a temporary table is not supported.
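A minimal sketch, with a hypothetical table name and columns:
CREATE TEMPORARY TABLE session_results (query_id INTEGER, result_count BIGINT);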
Create a table with the specified columns, copying any data that meet SELECT statement criteria.
Property
Description
fragment_size
Number of rows per fragment that is a unit of the table for query processing. Default = 32 million rows, which is not expected to be changed.
max_chunk_size
Size of chunk that is a unit of the table for query processing. Default: 1073741824 bytes (1 GB), which is not expected to be changed.
max_rows
Used primarily for streaming datasets to limit the number of rows in a table. When the max_rows
limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the max_rows
setting. If you attempt to load more rows at one time than the max_rows
setting defines, the records up to the max_rows
limit are removed, leaving only the additional rows. Default = 2^62.
page_size
Number of I/O page bytes. Default = 1MB, which does not need to be changed.
partitions
Partition strategy option:
SHARDED
: Partition table using sharding.
REPLICATED
: Partition table using replication.
use_shared_dictionaries
Controls whether the created table creates its own dictionaries for text columns, or instead shares the dictionaries of its source table. Uses shared dictionaries by default (true
), which increases the speed of table creation.
Setting it to false shrinks the dictionaries if the SELECT statement for the created table has a narrow filter; for example:
CREATE TABLE new_table AS SELECT * FROM old_table WITH (USE_SHARED_DICTIONARIES='false');
vacuum
Formats the table to more efficiently handle DELETE
requests. The only parameter available is delayed
. Rather than immediately remove deleted rows, vacuum marks items to be deleted, and they are removed at an optimal time.
Create the table newTable
. Populate the table with all information from the table oldTable
, effectively creating a duplicate of the original table.
Create a table named trousers
. Populate it with data from the columns name
, waist
, and inseam
from the table wardrobe
.
Create a table named cosmos
. Populate it with data from the columns star
and planet
from the table universe where planet has the class M.
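Hedged sketches of the three examples above (the class column name in the last statement is an assumption):
CREATE TABLE newTable AS (SELECT * FROM oldTable);
CREATE TABLE trousers AS (SELECT name, waist, inseam FROM wardrobe);
CREATE TABLE cosmos AS (SELECT star, planet FROM universe WHERE class = 'M');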
Rename the table tweets to retweets.
Rename the column source to device in the table retweets.
Add the column pt_dropoff to table tweets with a default value point(0,0).
Add multiple columns a, b, and c to table table_one with a default value of 15
for column b.
Default values currently have the following limitations:
Only literals can be used for column DEFAULT values; expressions are not supported.
For arrays, use the following syntax: ARRAY[A, B, C, ..., N]. The syntax {A, B, C, ..., N} is not supported.
Some literals, like NUMERIC and GEO types, are not checked at parse time. As a result, you can define and create a table with a malformed literal as a default value, but when you try to insert a row with a default value, it throws an error.
Add the column lang to the table tweets using a TEXT ENCODING DICTIONARY.
Add the columns lang and encode to the table tweets using a TEXT ENCODING DICTIONARY for each.
Drop the column pt_dropoff from table tweets.
Limit on-disk data growth by setting the number of allowed epoch rollbacks to 50:
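Hedged sketches of several of the ALTER TABLE examples above (the geo-column example is omitted, and the dictionary width is an assumption):
ALTER TABLE tweets RENAME TO retweets;
ALTER TABLE retweets RENAME COLUMN source TO device;
ALTER TABLE table_one ADD (a INTEGER, b INTEGER DEFAULT 15, c TEXT);
ALTER TABLE tweets ADD COLUMN lang TEXT ENCODING DICT(32);
ALTER TABLE tweets DROP COLUMN pt_dropoff;
ALTER TABLE tweets SET MAX_ROLLBACK_EPOCHS = 50;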
You cannot add a dictionary-encoded string column with a shared dictionary when using ALTER TABLE ADD COLUMN.
Currently, HEAVY.AI does not support adding a geo column type (POINT, LINESTRING, POLYGON, or MULTIPOLYGON) to a table.
HEAVY.AI supports ALTER TABLE RENAME TABLE and ALTER TABLE RENAME COLUMN for temporary tables. HEAVY.AI does not support ALTER TABLE ADD COLUMN to modify a temporary table.
Deletes the table structure, all data from the table, and any dictionary content unless it is a shared dictionary. (See the Note regarding disk space reclamation.)
Archives data and dictionary files of the table <table>
to file <filepath>
.
Valid values for <compression_program>
include:
gzip (default)
pigz
lz4
none
If you do not choose a compression option, the system uses gzip if it is available. If gzip is not installed, the file is not compressed.
The file path must be enclosed in single quotes.
Dumping a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being dumped.
The DUMP
command is not supported on distributed configurations.
You must have at least the GRANT CREATE ON DATABASE privilege level to use the DUMP
command.
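A hedged example (the table name and archive path are illustrative):
DUMP TABLE tweets TO '/opt/archive/tweetsBackup.gz' WITH (compression='gzip');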
Rename a table or multiple tables at once.
Rename a single table:
Swap table names:
Swap table names multiple times:
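Hedged sketches of the rename and swap forms (table names are hypothetical):
RENAME TABLE table_A TO table_B;
RENAME TABLE table_A TO table_B, table_B TO table_A;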
Restores data and dictionary files of table <table>
from the file at <filepath>
. If you specified a compression program when you used the DUMP TABLE
command, you must specify the same compression method during RESTORE
.
Restoring a table decompresses and then reimports the table. You must have enough disk space for both the new table and the archived table, as well as enough scratch space to decompress the archive and reimport it.
The file path must be enclosed in single quotes.
You can also restore a table from archives stored in S3-compatible endpoints:
s3_region
is required. All features discussed in the S3 import documentation, such as custom S3 endpoints and server privileges, are supported.
Restoring a table locks writes to that table. Concurrent reads are supported, but you cannot import to a table that is being restored.
The RESTORE
command is not supported on distributed configurations.
You must have at least the GRANT CREATE ON DATABASE privilege level to use the RESTORE
command.
Do not attempt to use RESTORE TABLE with a table dump created using a release of HEAVY.AI that is higher than the release running on the server where you will restore the table.
Restore table tweets
from /opt/archive/tweetsBackup.gz:
Restore table tweets
from a public S3 file or using server privileges (with the allow-s3-server-privileges
server flag enabled):
Restore table tweets
from a private S3 file using AWS access keys:
Restore table tweets
from a private S3 file using temporary AWS access keys/session token:
Restore table tweets
from an S3-compatible endpoint:
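Hedged sketches of the local and S3 cases (paths, bucket, and region are placeholders; credential options would follow the same pattern as S3 import):
RESTORE TABLE tweets FROM '/opt/archive/tweetsBackup.gz' WITH (compression='gzip');
RESTORE TABLE tweets FROM 's3://my-bucket/archive/tweetsBackup.gz' WITH (compression='gzip', s3_region='us-west-1');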
Use the TRUNCATE TABLE
statement to remove all rows from a table without deleting the table structure.
This releases table on-disk and memory storage and removes dictionary content unless it is a shared dictionary. (See the note regarding disk space reclamation.)
Removing rows is more efficient than using DROP TABLE. Dropping followed by recreating the table invalidates dependent objects of the table requiring you to regrant object privileges. Truncating has none of these effects.
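For example, using the tweets table from earlier examples:
TRUNCATE TABLE tweets;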
When you DROP or TRUNCATE, the command returns almost immediately. The directories to be purged are marked with the suffix _DELETE_ME_. The files are automatically removed asynchronously.
In practical terms, this means that you will not see a reduction in disk usage until the automatic task runs, which might not start for up to five minutes.
You might also see directory names appended with _DELETE_ME_. You can ignore these, with the expectation that they will be deleted automatically over time.
Use this statement to remove rows from storage that have been marked as deleted via DELETE
statements.
When run without the vacuum option, the column-level metadata is recomputed for each column in the specified table. HeavyDB makes heavy use of metadata to optimize query plans, so optimizing table metadata can increase query performance after metadata widening operations such as updates or deletes. If the configuration parameter enable-auto-metadata-update
is not set, HeavyDB does not narrow metadata during an update or delete — metadata is only widened to cover a new range.
When run with the vacuum option, it removes any rows marked "deleted" from the data stored on disk. Vacuum is a checkpointing operation, so new copies of any vacuum records are deleted. Using OPTIMIZE with the VACUUM option compacts pages and deletes unused data files that have not been repopulated.
Beginning with Release 5.6.0, OPTIMIZE should be used infrequently, because UPDATE, DELETE, and IMPORT queries manage space more effectively.
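A hedged example, reusing the hypothetical tweets table:
OPTIMIZE TABLE tweets WITH (VACUUM = 'true');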
Performs checks for negative and inconsistent epochs across table shards for single-node configurations.
If VALIDATE
detects epoch-related issues, it returns a report similar to the following:
If no issues are detected, it reports as follows:
Perform checks and report discovered issues on a running HEAVY.AI cluster. Compare metadata between the aggregator and leaves to verify that the logical components between the processes are identical.
VALIDATE CLUSTER
also detects and reports issues related to table epochs. It reports when epochs are negative or when table epochs across leaf nodes or shards are inconsistent.
If VALIDATE CLUSTER
detects issues, it returns a report similar to the following:
If no issues are detected, it will report as follows:
You can include the WITH(REPAIR_TYPE)
argument. (REPAIR_TYPE='NONE')
is the same as running the command with no argument. (REPAIR_TYPE='REMOVE')
removes any leaf objects that have issues. For example:
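A sketch of the repair form described above:
VALIDATE CLUSTER WITH (REPAIR_TYPE = 'REMOVE');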
This example output from the VALIDATE CLUSTER
command on a distributed setup shows epoch-related issues:
Heavy Immerse supports file upload for .csv, .tsv, and .txt files, and supports comma, tab, and pipe delimiters.
Heavy Immerse also supports upload of compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
You can import data to HeavyDB using the Immerse import wizard. You can upload data from a local delimited file, from an Amazon S3 data source, or from the Data Catalog.
For methods specific to geospatial data, see also Importing Geospatial Data Using Immerse.
If there is a potential for duplicate entries, and you prefer to avoid loading duplicate rows, see How can I avoid creating duplicate rows?.
If a source file uses a reserved word, HEAVY.AI automatically adds an underscore at the end of the reserved word. For example, year
is converted to year_
.
If you click the Back button (or accidentally two-finger swipe your mousepad) before your data load is complete, HeavyDB stops the data load and any records that had transferred are invalidated.
Follow these steps to import your data:
Click DATA MANAGER.
Click Import Data.
Click Import data from a local file.
Either click the plus sign (+) or drag your file(s) for upload. If you are uploading multiple files, the column names and data types must match. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
Choose Import Settings:
Null string: If, instead of using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.
Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.
Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.
Click Import Files.
The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. Immerse defaults to second precision for all timestamp columns. You can reset the precision to second, millisecond, nanosecond, or microsecond. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.
Name the table, and click Save Table.
You can also import locally stored shape files in a variety of formats. See Importing Geospatial Data Using Immerse.
To import data from your Amazon S3 instance, you need:
The Region and Path for the file in your S3 bucket, or the direct URL to the file (S3 Link).
If importing private data, your Access Key and Secret Key for your personal IAM account in S3.
For information on opening and reviewing items in your S3 instance, see https://docs.aws.amazon.com/AmazonS3/latest/gsg/OpeningAnObject.html
In an S3 bucket, the Region is in the upper-right corner of the screen – US West (N. California) in this case:
Click the file you want to import. To load your S3 file to HEAVY.AI using the steps for S3 Region | Bucket | Path, below, click Copy path to copy to your clipboard the path to your file within your S3 bucket. Alternatively, you can copy the link to your file. The Link in this example is https://s3-us-west-1.amazonaws.com/my-company-bucket/trip_data.7z
.
To learn about creating your S3 Access Key and Secret Key, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey
If the data you want to copy is publicly available, you do not need to provide an Access Key and Secret Key.
You can import any file you can see using your IAM account with your Access Key and Secret Key.
Your Secret Key is created with your Access Key, and cannot be retrieved afterward. If you lose your Secret Key, you must create a new Access Key and Secret Key.
Follow these steps to import your S3 data:
Click DATA MANAGER.
Click Import Data.
Click Import data from Amazon S3.
Choose whether to import using the S3 Region | Bucket | Path or a direct full link URL to the file (S3 Link).
To import data using S3 Region | Bucket | Path:
Select your Region from the pop-up menu.
Enter the unique name of your S3 Bucket.
Enter or paste the Path to the file stored in your S3 bucket.
To import data using S3 link:
Copy the Link URL from the file Overview in your S3 bucket.
Paste the link in the Full Link URL field of the HEAVY.AI Table Importer.
If the data is publicly available, you can disable the Private Data checkbox. If you are importing Private Data, enter your credentials:
Enable the Private Data checkbox.
Enter your S3 Access Key.
Enter your S3 Secret Key.
Choose the appropriate Import Settings. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV.
Null string: If you have substituted a string such as NULL for null values in your upload document, enter that string in the Null String field. The values are treated as null values on upload.
Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma or pipe.
Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
Click Import Files.
The Table Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance. If your column headers contain SQL reserved words, reserved characters (for example, year, /, or #), or spaces, the importer alters the characters to make them safe and notifies you of the changes. You can also change the column labels.
Name the table, and click Save Table.
The Data Catalog provides access to sample datasets you can use to exercise data visualization features in Heavy Immerse. The selection of datasets continually changes, independent of product releases.
To import from the data catalog:
Open the Data Manager.
Click Data Catalog.
Use the Search box to locate a specific data set, or scroll to find the dataset you want to use. The Contains Geo toggle filters for data sets that contain Geographical information.
Click the Import button beneath the dataset you want to use.
Verify the table and column names in the Data Preview screen.
Click Import Data.
You can append additional data to an existing table.
To append data to a table:
Open Data Manager.
Select the table you want to append.
Click Append Data.
Click Import data from a local file.
Either click the plus sign (+) or drag your file(s) for upload. The column names and data types of the files you select must match the existing table. HEAVY.AI supports only delimiter-separated formats such as CSV and TSV. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI. In addition to CSV, TSV, and TXT files, you can import compressed delimited files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ format.
Click Preview.
Click Import Settings
Choose Import Settings:
Null string: If, instead of using a blank for null cells in your upload document, you have substituted strings such as NULL, enter that string in the Null String field. The values are treated as null values on upload.
Delimiter Type: Delimiters are detected automatically. You can choose a specific delimiter, such as a comma, tab, or pipe.
Quoted String: Indicate whether your string fields are enclosed by quotes. Delimiter characters inside quotes are ignored.
Includes Header Row: HEAVY.AI tries to infer whether the first row contains headers or data (for example, if the first row has only strings and the rest of the table contains number values, the first row is inferred to be headers). If HEAVY.AI infers incorrectly, you have the option of manually indicating whether or not the first row contains headers.
Replicate Table: If you are importing non-geospatial data to a distributed database with more than one node, select this checkbox to replicate the table to all nodes in the cluster. This effectively adds the PARTITIONS='REPLICATED' option to the create table statement. See Replicated Tables.
Close Import Settings.
The Data Preview screen presents sample rows of imported data. The importer assigns a data type based on sampling, but you should examine and modify the selections as appropriate. Assign the correct data type to ensure optimal performance.
If your data contains column headers, verify they match the existing headers.
Click Import Data.
To append data from AWS, click Append Data, then follow the instructions for Loading S3 Data to HEAVY.AI.
Sometimes you might want to remove or replace the data in a table without losing the table definition itself.
To remove all data from a table:
Open Data Manager.
Select the table you want to truncate.
Click Delete All Rows.
A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE ROWS.
Immerse displays the table information with a row count of 0.
You can drop a table entirely using Data Manager.
To delete a table:
Open Data Manager.
Select the table you want to delete.
Click DELETE TABLE.
A very scary red dialog box reminds you that the operation cannot be undone. Click DELETE TABLE.
Immerse deletes the table and returns you to the Data Manager TABLES list.
DDL - Users and Databases
HEAVY.AI has a default superuser named admin
with default password HyperInteractive
.
When you create or alter a user, you can grant superuser privileges by setting the is_super
property.
You can also specify a default database when you create or alter a user by using the default_db
property. During login, if a database is not specified, the server uses the default database assigned to that user. If no default database is assigned to the user and no database is specified during login, the heavyai
database is used.
When an administrator, superuser, or owner drops or renames a database, all current active sessions for users logged in to that database are invalidated. The users must log in again.
Similarly, when an administrator or superuser drops or renames a user, all active sessions for that user are immediately invalidated.
If a password includes characters that are nonalphanumeric, it must be enclosed in single quotes when logging in to heavysql. For example:
$HEAVYAI_PATH/bin/heavysql heavyai -u admin -p '77Heavy!9Ai'
For more information about users, roles, and privileges, see .
The following are naming convention requirements for HEAVY.AI objects, described in notation:
A NAME is [A-Za-z_][A-Za-z0-9\$_]*
A DASHEDNAME is [A-Za-z_][A-Za-z0-9\$_\-]*
An EMAIL is ([^[:space:]\"]+|\".+\")@[A-Za-z0-9][A-Za-z0-9\-\.]*\.[A-Za-z]+
User objects can use NAME, DASHEDNAME, or EMAIL format.
Role objects must use either NAME or DASHEDNAME format.
Database and column objects must use NAME format.
HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the user name.
Examples:
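As a hedged sketch (the user names and property values here are hypothetical; the properties are those described above):
CREATE USER jason (password = 'HyperInteractive', is_super = 'false', default_db = 'heavyai');
CREATE USER "analyst@example.com" (password = 'StrongPassw0rd!', can_login = 'true');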
Example:
HEAVY.AI accepts (almost) any string enclosed in optional double quotation marks as the old or new user name.
Example:
Database names cannot include quotes, spaces, or special characters.
In Release 6.3.0 and later, database names are case insensitive. Duplicate database names will cause a failure when attempting to start HeavyDB 6.3.0 or higher. Check database names and revise as necessary to avoid duplicate names.
Example:
Example:
To alter a database, you must be the owner of the database or a HeavyDB superuser.
Example:
Enable super users to change the owner of a database.
Change the owner of my_database
to user Joe
:
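In sketch form:
ALTER DATABASE my_database OWNER TO Joe;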
Only superusers can run the ALTER DATABASE OWNER TO command.
Changes ownership of database objects (tables, views, dashboards, etc.) from a user or set of users in the current database to a different user.
Example: Reassign database objects owned by jason
and mike
to joe
.
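A hedged sketch, assuming the REASSIGN OWNED BY ... TO ... form of the command:
REASSIGN OWNED BY jason, mike TO joe;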
Database object ownership changes only for the currently connected database; objects in other databases are not affected. Ownership of the database itself is not affected. You must be a superuser to run this command.
You can use policies to provide row-level security (RLS) in HEAVY.AI.
Create an RLS policy for a user or role (<name>
); admin rights are required. All queries on the table for the user or role are automatically filtered to include only rows where the column contains any one of the values from the VALUES clause.
RLS filtering works the same way that a WHERE column = value clause appended to every query or subquery on the table would work. If policies on multiple columns in the same table are defined for a user or role, a row is visible to that user or role if any one or more of those policies matches the row.
Drop an RLS policy for a user or role (<name>
); admin rights are required. All values specified for the column by the policy are dropped. Effective values from another policy on an inherited role are not dropped.
Displays a list of all RLS policies that exist for a user or role. If EFFECTIVE is used, the list also includes any policies that exist for all roles that apply to the requested user or role.
Datatypes and Fixed Encoding
This topic describes standard datatypes and space-saving variations for values stored in HEAVY.AI.
Each HEAVY.AI datatype uses space in memory and on disk. For certain datatypes, you can use fixed encoding for a more compact representation of these values. You can set a default value for a column by using the DEFAULT
constraint; for more information, see .
Datatypes, variations, and sizes are described in the following table.
[1] - In OmniSci release 4.4.0 and higher, you can use existing 8-byte DATE
columns, but you can create only 4-byte DATE
columns (default) and 2-byte DATE
columns (see DATE ENCODING DAYS(16)
).
HEAVY.AI does not support geometry arrays.
Timestamp values are always stored in 8 bytes. The greater the precision, the narrower the range of representable dates.
HEAVY.AI supports the LINESTRING, MULTILINESTRING
, POLYGON, MULTIPOLYGON
, POINT
, and MULTIPOINT
geospatial datatypes.
In the following example:
p0, p1, ls0, and poly0 are simple (planar) geometries.
p4 is a point geometry with Web Mercator coordinates.
p2, p3, mp, ls1, ls2, mls1, mls2, poly1, and mpoly0 are geometries using WGS84 SRID=4326 longitude/latitude coordinates.
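A sketch of the kind of table definition being described; the column names come from the text above, while the exact type and encoding clauses are assumptions:
CREATE TABLE geo_example (
  p0 POINT,
  p1 POINT,
  ls0 LINESTRING,
  poly0 POLYGON,
  p4 GEOMETRY(POINT, 900913),
  p2 GEOMETRY(POINT, 4326),
  p3 GEOMETRY(POINT, 4326) ENCODING NONE,
  mp GEOMETRY(MULTIPOINT, 4326),
  ls1 GEOMETRY(LINESTRING, 4326),
  ls2 GEOMETRY(LINESTRING, 4326) ENCODING NONE,
  mls1 GEOMETRY(MULTILINESTRING, 4326),
  mls2 GEOMETRY(MULTILINESTRING, 4326) ENCODING NONE,
  poly1 GEOMETRY(POLYGON, 4326),
  mpoly0 GEOMETRY(MULTIPOLYGON, 4326));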
Geometry storage requirements are largely dependent on coordinate data. Coordinates are normally stored as 8-byte doubles, two coordinates per point, for all points that form a geometry. Each POINT geometry in the p1 column, for example, requires 16 bytes.
WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This sacrifices some precision but reduces storage requirements by half.
For example, columns p2, mp, ls1, mls1, poly1, and mpoly0 in the table defined above are compressed. Each geometry in the p2 column requires 8 bytes, compared to 16 bytes for p0.
You can explicitly disable compression. WGS84 columns p3, ls2, and mls2 are not compressed and continue using doubles. Simple (planar) columns p0, p1, ls0, and poly0, and the non-4326 column p4, are not compressed.
Define datatype arrays by appending square brackets, as shown in the arrayexamples
DDL sample.
You can also define fixed-length arrays. For example:
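A minimal sketch (table and column names hypothetical):
CREATE TABLE fixed_array_example (
  pt_coords DOUBLE[2],   -- fixed-length array of exactly two doubles
  flags INTEGER[3]);     -- fixed-length array of exactly three integers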
Fixed-length arrays require less storage space than variable-length arrays.
To use fixed-length fields, the range of the data must fit into the constraints as described. Understanding your schema and the scope of potential values in each field helps you to apply fixed encoding types and save significant storage space.
These encodings are most effective on low-cardinality TEXT
fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP
fields where the timestamps range between 1901-12-13 20:45:53
and 2038-01-19 03:14:07
. If a TEXT ENCODING
field does not match the defined cardinality, HEAVY.AI substitutes a NULL
value and logs the change.
For DATE
types, you can use the terms FIXED
and DAYS
interchangeably. Both are synonymous for the DATE
type in HEAVY.AI.
Some of the INTEGER options overlap. For example, INTEGER ENCODING FIXED(8) and SMALLINT ENCODING FIXED(8) are essentially identical.
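A hedged sketch that combines several of the encodings discussed here (table and column names hypothetical):
CREATE TABLE encoding_example (
  carrier TEXT ENCODING DICT(8),           -- low-cardinality text
  dep_time TIMESTAMP ENCODING FIXED(32),   -- timestamps between 1901-12-13 and 2038-01-19
  flight_date DATE ENCODING DAYS(16),      -- dates within roughly +/-90 years of the epoch
  distance INTEGER ENCODING FIXED(16));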
You can improve performance of string operations and optimize storage using shared dictionaries. You can share dictionaries within a table or between different tables in the same database. The table with which you want to share dictionaries must exist when you create the table that references the TEXT ENCODING DICT
field, and the column that you are referencing in that table must also exist. The following small DDL shows the basic structure:
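A minimal sketch of that structure (table and column names hypothetical):
CREATE TABLE lookup_table (city TEXT ENCODING DICT(32));
CREATE TABLE fact_table (
  id BIGINT,
  city TEXT,
  SHARED DICTIONARY (city) REFERENCES lookup_table(city));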
In the table definition, make sure that referenced columns appear before the referencing columns.
For example, this DDL is a portion of the schema for the flights database. Because airports are both origin and destination locations, it makes sense to reuse the same dictionaries for name, city, state, and country values.
To share a dictionary in a different existing table, replace the table name in the REFERENCES
instruction. For example, if you have an existing table called us_geography
, you can share the dictionary by following the pattern in the DDL fragment below.
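A hedged sketch of that pattern; the column names state and country in us_geography are assumptions:
CREATE TABLE site_visits (
  visit_id BIGINT,
  state TEXT,
  country TEXT,
  SHARED DICTIONARY (state) REFERENCES us_geography(state),
  SHARED DICTIONARY (country) REFERENCES us_geography(country));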
The referencing column cannot specify the encoding of the dictionary, because it uses the encoding from the referenced column.
Change a parameter value for the current session.
Switch to another database without needing to log in again.
The session silently switches to the requested database.
The database exists, but the user does not have access to it:
The database does not exist:
Force the session to run the subsequent SQL commands in CPU mode:
Switch the session back to GPU mode:
DDL - Views
A view is a virtual table based on the result set of a SQL statement. It derives its fields from a SELECT
statement. You can do anything with a HEAVY.AI view query that you can do in a non-view HEAVY.AI query.
View object names must use the NAME format, described in notation as:
Creates a view based on a SQL statement.
You can describe the view as you would a table.
You can query the view as you would a table.
Removes a view created by the CREATE VIEW statement. The view definition is removed from the database schema, but no actual data in the underlying base tables is modified.
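A minimal sketch of the view lifecycle (view, table, and column names hypothetical):
CREATE VIEW delayed_flights AS SELECT * FROM flights WHERE dep_delay > 60;
SELECT carrier_name, COUNT(*) FROM delayed_flights GROUP BY carrier_name;
DROP VIEW delayed_flights;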
HeavyDB system tables provide a way to access information about database objects, database object permissions, and system resource (storage, CPU, and GPU memory) utilization. These system tables can be found in the information_schema
database that is available by default on server startup. You can query system tables in the same way as regular tables, and you can use the SHOW CREATE TABLE
command to view the table schemas.
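For example, after connecting to the information_schema database, sketches of the kinds of queries you can run (column names as documented below):
SELECT user_name, is_super_user, default_db_name FROM users;
SHOW CREATE TABLE tables;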
The users
system table provides information about all database users and contains the following columns:
The databases
system table provides information about all created databases on the server and contains the following columns:
The permissions
system table provides information about all user/role permissions for all database objects and contains the following columns:
The roles
system table lists all created database roles and contains the following columns:
The tables
system table provides information about all database tables and contains the following columns:
The dashboards
system table provides information about created dashboards (enterprise edition only) and contains the following columns:
The role_assignments
system table provides information about database roles that have been assigned to users and contains the following columns:
The memory_summary
system table provides high level information about utilized memory across CPU and GPU devices and contains the following columns:
The memory_details
system table provides detailed information about allocated memory segments across CPU and GPU devices and contains the following columns:
The storage_details
system table provides detailed information about utilized storage per table and contains the following columns:
Log-based system tables are considered beta functionality in Release 6.1.0 and are disabled by default.
The request_logs
system table provides information about HeavyDB Thrift API requests and contains the following columns:
The server_logs
system table provides HeavyDB server logs in tabular form and contains the following columns:
The web_server_logs
system table provides HEAVY.AI Web Server logs in tabular form and contains the following columns (Enterprise Edition only):
The logs system tables must be refreshed manually to view new log entries. You can run the REFRESH FOREIGN TABLES
SQL command (for example, REFRESH FOREIGN TABLES server_logs, request_logs;
), or click the Refresh Data Now button on the table’s Data Manager page in Heavy Immerse.
The Request Logs and Monitoring system dashboard is built on the log-based system tables and provides visualization of request counts, performance, and errors over time, along with the server logs.
Access to system dashboards is controlled using Heavy Immerse privileges; only users with Admin privileges or users/roles with access to the information_schema
database can access the system dashboards.
Cross-linking must be enabled to allow cross-filtering across charts that use different system tables. Enable cross-linking by adding "ui/enable_crosslink_panel": true
to the feature_flags
section of the servers.json file.
Allows specification of one or more band names to selectively import; useful in the context of large raster files where not all the bands are relevant.
Bands are imported in the order provided, regardless of order in the file.
You can rename bands using <bandname>=<newname>[,<bandname>=<newname>,...]
Names must be those discovered by the , including any suffixes for de-duplication.
Fixed length encoding of integer or timestamp columns. See .
See in for a database security example.
[2] - See and below for information about geospatial datatype sizes.
For more information about geospatial datatypes and functions, see .
The web_server_access_logs
system table provides information about requests made to the Web Server. The table contains the following columns:
Preconfigured system dashboards are built on various system tables. Specifically, two dashboards named System Resources and User Roles and Permissions are available by default. The Request Logs and Monitoring system dashboard is considered beta functionality and is disabled by default. These dashboards can be found in the information_schema
database, along with the system tables that they use.
Property
Value
password
User's password.
is_super
Set to true if user is a superuser. Default is false.
default_db
User's default database on login.
can_login
Set to true (default/implicit) to activate a user.
When false, the user still retains all defined privileges and configuration settings, but cannot log in to HEAVY.AI. Deactivated users who try to log in receive the error message "Unauthorized Access: User is deactivated."
Property
Value
password
User's password.
is_super
Set to true if user is a superuser. Default is false.
default_db
User's default database on login.
can_login
Set to true (default/implicit) to activate a user.
When false, the user still retains all defined privileges and configuration settings, but cannot log in to HEAVY.AI. Deactivated users who try to log in receive the error message "Unauthorized Access: User is deactivated."
Property
Value
owner
User name of the database owner.
Column Name
Column Type
Description
user_id
INTEGER
ID of database user.
user_name
TEXT
Username of database user.
is_super_user
BOOLEAN
Indicates whether or not the database user is a super user.
default_db_id
INTEGER
ID of user’s default database on login.
default_db_name
TEXT
Name of user’s default database on login.
can_login
BOOLEAN
Indicates whether or not the database user account is activated and can log in.
Column Name
Column Type
Description
database_id
INTEGER
ID of database.
database_name
TEXT
Name of database.
owner_id
INTEGER
User ID of database owner.
owner_user_name
TEXT
Username of database owner.
Column Name
Column Type
Description
role_name
TEXT
Username or role name associated with permission.
is_user_role
BOOLEAN
Boolean indicating whether or not the role_name
column identifies a user or a role.
database_id
INTEGER
ID of database that contains the database object for which permission was granted.
database_name
TEXT
Name of database that contains the database object on which permission was granted.
object_name
TEXT
Name of database object on which permission was granted.
object_id
INTEGER
ID of database object on which permission was granted.
object_owner_id
INTEGER
User id of the owner of the database object on which permission was granted.
object_owner_user_name
TEXT
Username of the owner of the database object on which permission was granted.
object_permission_type
TEXT
Type of database object on which permission was granted.
object_permissions
TEXT[]
List of permissions that were granted on database object.
Column Name
Column Type
Description
role_name
TEXT
Role name.
Column Name
Column Type
Description
database_id
INTEGER
ID of database that contains the table.
database_name
TEXT
Name of database that contains the table.
table_id
INTEGER
Table ID.
table_name
TEXT
Table name.
owner_id
INTEGER
User ID of table owner.
owner_user_name
TEXT
Username of table owner.
column_count
INTEGER
Number of table columns. Note that internal system columns are included in this count.
table_type
TEXT
Type of table. Possible values are DEFAULT
, VIEW
, TEMPORARY
, and FOREIGN
.
view_sql
TEXT
For views, SQL statement used in the view.
max_fragment_size
INTEGER
Number of rows per fragment used by the table.
max_chunk_size
BIGINT
Maximum size (in bytes) of table chunks.
fragment_page_size
INTEGER
Size (in bytes) of table data pages.
max_rows
BIGINT
Maximum number of rows allowed by table.
max_rollback_epochs
INTEGER
Maximum number of epochs a table can be rolled back to.
shard_count
INTEGER
Number of shards that exists for table.
ddl_statement
TEXT
CREATE TABLE
DDL statement for table.
Column Name
Column Type
Description
database_id
INTEGER
ID of database that contains the dashboard.
database_name
TEXT
Name of database that contains the dashboard.
dashboard_id
INTEGER
Dashboard ID.
dashboard_name
TEXT
Dashboard name.
owner_id
INTEGER
User ID of dashboard owner.
owner_user_name
TEXT
Username of dashboard owner.
last_updated_at
TIMESTAMP
Timestamp of last dashboard update.
data_sources
TEXT[]
List of data sources/tables used by the dashboard.
Column Name
Column Type
Description
role_name
TEXT
Name of assigned role.
user_name
TEXT
Username of user that was assigned the role.
Column Name
Column Type
Description
node
TEXT
Node from which memory information is fetched.
device_id
INTEGER
Device ID.
device_type
TEXT
Type of device. Possible values are CPU
and GPU
.
max_page_count
BIGINT
Maximum number of memory pages that can be allocated on the device.
page_size
BIGINT
Size (in bytes) of a memory page on the device.
allocated_page_count
BIGINT
Number of allocated memory pages on the device.
used_page_count
BIGINT
Number of used allocated memory pages on the device.
free_page_count
BIGINT
Number of free allocated memory pages on the device.
Column Name
Column Type
Description
node
TEXT
Node from which memory information is fetched.
database_id
INTEGER
ID of database that contains the table that memory was allocated for.
database_name
TEXT
Name of database that contains the table that memory was allocated for.
table_id
INTEGER
ID of table that memory was allocated for.
table_name
TEXT
Name of table that memory was allocated for.
column_id
INTEGER
ID of column that memory was allocated for.
column_name
TEXT
Name of column that memory was allocated for.
chunk_key
INTEGER[]
ID of cached table chunk.
device_id
INTEGER
Device ID.
device_type
TEXT
Type of device. Possible values are CPU
and GPU
.
memory_status
TEXT
Memory segment use status. Possible values are FREE
and USED
.
page_count
BIGINT
Number of pages in the segment.
page_size
BIGINT
Size (in bytes) of a memory page on the device.
slab_id
INTEGER
ID of slab containing memory segment.
start_page
BIGINT
Page number of the first memory page in the segment.
last_touched_epoch
BIGINT
Epoch at which the segment was last accessed.
Column Name
Column Type
Description
node
TEXT
Node from which storage information is fetched.
database_id
INTEGER
ID of database that contains the table.
database_name
TEXT
Name of database that contains the table.
table_id
INTEGER
Table ID.
table_name
TEXT
Table Name.
epoch
INTEGER
Current table epoch.
epoch_floor
INTEGER
Minimum epoch table can be rolled back to.
fragment_count
INTEGER
Number of table fragments.
shard_id
INTEGER
Table shard ID. This value is only set for sharded tables.
data_file_count
INTEGER
Number of data files created for table.
metadata_file_count
INTEGER
Number of metadata files created for table.
total_data_file_size
BIGINT
Total size (in bytes) of data files.
total_data_page_count
BIGINT
Total number of pages across all data files.
total_free_data_page_count
BIGINT
Total number of free pages across all data files.
total_metadata_file_size
BIGINT
Total size (in bytes) of metadata files.
total_metadata_page_count
BIGINT
Total number of pages across all metadata files.
total_free_metadata_page_count
BIGINT
Total number of free pages across all metadata files.
total_dictionary_data_file_size
BIGINT
Total size (in bytes) of string dictionary files.
Column Name
Column Type
Description
log_timestamp
TIMESTAMP
Timestamp of log entry.
severity
TEXT
Severity level of log entry. Possible values are F (fatal), E (error), W (warning), and I (info).
process_id
INTEGER
Process ID of the HeavyDB instance that generated the log entry.
query_id
INTEGER
ID associated with a SQL query. A value of 0 indicates that either the log entry is unrelated to a SQL query or no query ID has been set for the log entry.
thread_id
INTEGER
ID of thread that generated the log entry.
file_location
TEXT
Source file name and line number where the log entry was generated.
api_name
TEXT
Name of Thrift API that the request was sent to.
request_duration_ms
BIGINT
Thrift API request duration in milliseconds.
database_name
TEXT
Request session database name.
user_name
TEXT
Request session username.
public_session_id
TEXT
Request session ID.
query_string
TEXT
Query string for SQL query requests.
client
TEXT
Protocol and IP address of client making the request.
dashboard_id
INTEGER
Dashboard ID for SQL query requests coming from Immerse dashboards.
dashboard_name
TEXT
Dashboard name for SQL query requests coming from Immerse dashboards.
chart_id
INTEGER
Chart ID for SQL query requests coming from Immerse dashboards.
execution_time_ms
BIGINT
Execution time in milliseconds for SQL query requests.
total_time_ms
BIGINT
Total execution time (execution_time_ms + serialization time) in milliseconds for SQL query requests.
Column Name
Column Type
Description
node
TEXT
Node containing logs.
log_timestamp
TIMESTAMP
Timestamp of log entry.
severity
TEXT
Severity level of log entry. Possible values are F (fatal), E (error), W (warning), and I (info).
process_id
INTEGER
Process ID of the HeavyDB instance that generated the log entry.
query_id
INTEGER
ID associated with a SQL query. A value of 0 indicates that either the log entry is unrelated to a SQL query or no query ID has been set for the log entry.
thread_id
INTEGER
ID of thread that generated the log entry.
file_location
TEXT
Source file name and line number where the log entry was generated.
message
TEXT
Log message.
Column Name
Column Type
Description
log_timestamp
TIMESTAMP
Timestamp of log entry.
severity
TEXT
Severity level of log entry. Possible values are fatal, error, warning, and info.
message
TEXT
Log message.
Column Name
Column Type
Description
ip_address
TEXT
IP address of client making the web server request.
log_timestamp
TIMESTAMP
Timestamp of log entry.
http_method
TEXT
HTTP request method.
endpoint
TEXT
Web server request endpoint.
http_status
SMALLINT
HTTP response status code.
response_size
BIGINT
Response payload size in bytes.
Datatype
Size (bytes)
Notes
BIGINT
8
Minimum value: -9,223,372,036,854,775,807
; maximum value: 9,223,372,036,854,775,807
.
BIGINT ENCODING FIXED(8)
1
Minimum value: -127
; maximum value: 127
BIGINT ENCODING FIXED(16)
2
Same as SMALLINT
.
BIGINT ENCODING FIXED(32)
4
Same as INTEGER
.
BOOLEAN
1
TRUE: 'true'
, '1'
, 't'
. FALSE: 'false'
, '0'
, 'f'
. Text values are not case-sensitive.
DATE
[1]
4
Same as DATE ENCODING DAYS(32)
.
DATE ENCODING DAYS(16)
2
Range in days: -32,768
- 32,767
Range in years: +/-90
around epoch, April 14, 1880 - September 9, 2059.
Minimum value: -2,831,155,200
; maximum value: 2,831,068,800
.
Supported formats when using COPY FROM
: mm/dd/yyyy
, dd-mmm-yy
, yyyy-mm-dd
, dd/mmm/yyyy
.
DATE ENCODING DAYS(32)
4
Range in years: +/-5,883,517
around epoch. Maximum date January 1, 5885487 (approximately). Minimum value: -2,147,483,648
; maximum value: 2,147,483,647
. Supported formats when using COPY FROM
: mm/dd/yyyy
, dd-mmm-yy
, yyyy-mm-dd
, dd/mmm/yyyy
.
DATE ENCODING FIXED(16)
2
In DDL statements defaults to DATE ENCODING DAYS(16)
. Deprecated.
DATE ENCODING FIXED(32)
4
In DDL statements defaults to DATE ENCODING DAYS(32)
. Deprecated.
DECIMAL
2, 4, or 8
Takes precision and scale parameters: DECIMAL(precision,scale)
Size depends on precision:
Up to 4
: 2 bytes
5
to 9
: 4 bytes
10
to 18
(maximum): 8 bytes
Scale must be less than precision.
DOUBLE
8
Variable precision. Minimum value: -1.79e308
; maximum value: 1.79e308
EPOCH
8
Seconds ranging from -30610224000
(1/1/1000 00:00:00
) through 185542587100800
(1/1/5885487 23:59:59
).
FLOAT
4
Variable precision. Minimum value: -3.4e38
; maximum value: 3.4e38
.
INTEGER
4
Minimum value: -2,147,483,647
; maximum value: 2,147,483,647
.
INTEGER ENCODING FIXED(8)
1
Minimum value: -127
; maximum value: 127
.
INTEGER ENCODING FIXED(16)
2
Same as SMALLINT
.
LINESTRING
Variable[2]
Geospatial datatype. A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2)
MULTILINESTRING
Variable[2]
Geospatial datatype. A set of associated lines. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1))
MULTIPOINT
Variable[2]
Geospatial datatype. A set of points. For example: MULTIPOINT((0 0), (1 0), (2 0))
MULTIPOLYGON
Variable[2]
Geospatial datatype. A set of one or more polygons. For example:MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))
POINT
Variable[2]
Geospatial datatype. A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0)
POLYGON
Variable[2]
Geospatial datatype. A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))
SMALLINT
2
Minimum value: -32,767
; maximum value: 32,767
.
SMALLINT ENCODING FIXED(8)
1
Minimum value: -127
; maximum value: 127
.
TEXT ENCODING DICT
4
Max cardinality 2 billion distinct string values. Maximum string length is 32,767.
TEXT ENCODING DICT(8)
1
Max cardinality 255 distinct string values.
TEXT ENCODING DICT(16)
2
Max cardinality 64 K distinct string values.
TEXT ENCODING NONE
Variable
Size of the string + 6 bytes. Maximum string length is 32,767.
Note: Importing TEXT ENCODING NONE
fields using the Data Manager has limitations for Immerse. When you use string instead of string [dict. encode] for a column when importing, you cannot use that column in Immerse dashboards.
TIME
8
Minimum value: 00:00:00
; maximum value: 23:59:59
.
TIME ENCODING FIXED(32)
4
Minimum value: 00:00:00
; maximum value: 23:59:59
.
TIMESTAMP(0)
8
Linux timestamp from -30610224000
(1/1/1000 00:00:00
) through 29379542399
(12/31/2900 23:59:59
). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS
or YYYY-MM-DDTHH:MM:SS
(the T
is dropped when the field is populated).
TIMESTAMP(3) (milliseconds)
8
Linux timestamp from -30610224000000
(1/1/1000 00:00:00.000
) through 29379542399999
(12/31/2900 23:59:59.999
). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fff
or YYYY-MM-DDTHH:MM:SS.fff
(the T
is dropped when the field is populated).
TIMESTAMP(6) (microseconds)
8
Linux timestamp from -30610224000000000
(1/1/1000 00:00:00.000000
) through 29379542399999999
(12/31/2900 23:59:59.999999
). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.ffffff
or YYYY-MM-DDTHH:MM:SS.ffffff
(the T
is dropped when the field is populated).
TIMESTAMP(9) (nanoseconds)
8
Linux timestamp from -9223372036854775807
(09/21/1677 00:12:43.145224193
) through 9223372036854775807
(11/04/2262 23:47:16.854775807
). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fffffffff
or YYYY-MM-DDTHH:MM:SS.fffffffff
(the T
is dropped when the field is populated).
TIMESTAMP ENCODING FIXED(32)
4
Range: 1901-12-13 20:45:53
- 2038-01-19 03:14:07
. Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS
or YYYY-MM-DDTHH:MM:SS
(the T
is dropped when the field is populated).
TINYINT
1
Minimum value: -127
; maximum value: 127
.
EXECUTOR_DEVICE
CPU - Set the session to CPU execution mode:
ALTER SESSION SET EXECUTOR_DEVICE='CPU';
GPU - Set the session to GPU execution mode:
ALTER SESSION SET EXECUTOR_DEVICE='GPU';
NOTE: These parameter values have the same effect as the \cpu
and \gpu
commands in heavysql, but can be used with any tool capable of running sql commands.
CURRENT_DATABASE
Can be set to any string value.
If the value is a valid database name, and the current user has access to it, the session switches to the new database. If the user does not have access or the database does not exist, an error is returned and the session will fall back to the starting database.
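For example, to switch the session to a database named my_other_db (name hypothetical):
ALTER SESSION SET CURRENT_DATABASE='my_other_db';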
Deletes rows that satisfy the WHERE
clause from the specified table. If the WHERE clause is absent, all rows in the table are deleted, resulting in a valid but empty table.
In Release 6.4 and higher, you can run DELETE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.
To execute queries against another database, you must have ACCESS privilege on that database, as well as DELETE privilege.
Delete rows from a table in the my_other_db
database:
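A sketch of the form (table and filter hypothetical):
DELETE FROM my_other_db.customers WHERE status = 'inactive';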
Interrupt a queued query. Specify the query by using its session ID.
To see the queries in the queue, use the SHOW QUERIES command:
To interrupt the last query in the list (ID 946-ooNP
):
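In sketch form, using the session ID shown above:
KILL QUERY '946-ooNP';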
Showing the queries again indicates that 946-ooNP
has been deleted:
KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt
) is set.
Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.
To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt
to TRUE. (It is enabled by default.)
Use INSERT for both single- and multi-row ad hoc inserts. (When inserting many rows, use the more efficient COPY command.)
You can also insert into a table as SELECT, as shown in the following examples:
You can insert array literals into array columns. The inserts in the following example each have three array values, and demonstrate how you can:
Create a table with variable-length and fixed-length array columns.
Insert NULL
arrays into these colums.
Specify and insert array literals using {...}
or ARRAY[...]
syntax.
Insert empty variable-length arrays using {}
and ARRAY[]
syntax.
Insert array values that contain NULL
elements.
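A hedged sketch that exercises each of the points above (table and column names hypothetical):
CREATE TABLE array_example (
  vals INTEGER[],       -- variable-length array
  tags TEXT[],          -- variable-length dictionary-encoded text array
  triplet INTEGER[3]);  -- fixed-length array
-- NULL arrays:
INSERT INTO array_example VALUES (NULL, NULL, NULL);
-- Array literals using {...} and ARRAY[...] syntax:
INSERT INTO array_example VALUES ({1, 2, 3}, ARRAY['a', 'b'], ARRAY[10, 20, 30]);
-- Empty variable-length arrays:
INSERT INTO array_example VALUES ({}, ARRAY[], {4, 5, 6});
-- Array values that contain NULL elements:
INSERT INTO array_example VALUES ({7, NULL, 9}, {'x', NULL}, {NULL, 1, 2});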
If you create a table with column that has a default value, or alter a table to add a column with a default value, using the INSERT command creates a record that includes the default value if it is omitted from the INSERT. For example, assume a table created as follows:
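A sketch of such a table (column types are assumptions; the column names and default value come from the example that follows):
CREATE TABLE tbl (id INTEGER, name TEXT DEFAULT 'John Doe', age SMALLINT);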
If you omit the name column from an INSERT or INSERT FROM SELECT statement, the missing value for column name
is set to 'John Doe'
.
INSERT INTO tbl (id, age) VALUES (1, 36);
creates the record 1|'John Doe'|36
.
INSERT INTO tbl (id, age) SELECT id, age FROM old_tbl;
also sets all the name values to John Doe
.
Expression
Description
LIKELY(X)
Provides a hint to the query planner that argument X
is a Boolean value that is usually true. The planner can prioritize filters on the value X
earlier in the execution cycle and return results more efficiently.
UNLIKELY(X)
Provides a hint to the query planner that argument X
is a Boolean value that is usually not true. The planner can prioritize filters on the value X
later in the execution cycle and return results more efficiently.
Usage Notes
SQL normally assumes that terms in the WHERE
clause that cannot be used by indices are usually true. If this assumption is incorrect, it could lead to a suboptimal query plan. Use the LIKELY(X)
and UNLIKELY(X)
SQL functions to provide hints to the query planner about clause terms that are probably not true, which helps the query planner to select the best possible plan.
Use LIKELY
/UNLIKELY
to optimize evaluation of OR
/AND
logical expressions. LIKELY
/UNLIKELY
causes the left side of an expression to be evaluated first. This allows the right side of the query to be skipped when possible. For example, in the clause UNLIKELY(A) AND B
, if A
evaluates to FALSE
, B
does not need to be evaluated.
Consider the following:
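A sketch matching the description that follows (table and column names hypothetical):
SELECT COUNT(*) FROM test WHERE UNLIKELY(x IN (7, 8, 9, 10)) AND y > 42;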
If x
is one of the values 7
, 8
, 9
, or 10
, the filter y > 42
is applied. If x
is not one of those values, the filter y > 42
is not applied.
If a join column name or alias is not unique, it must be prefixed by its table name.
You can use BIGINT, INTEGER, SMALLINT, TINYINT, DATE, TIME, TIMESTAMP, or TEXT ENCODING DICT data types. TEXT ENCODING DICT is the most efficient because corresponding dictionary IDs are sequential and span a smaller range than, for example, the 65,535 values supported in a SMALLINT field. Depending on the number of values in your field, you can use TEXT ENCODING DICT(32) (up to approximately 2,150,000,000 distinct values), TEXT ENCODING DICT(16) (up to 64,000 distinct values), or TEXT ENCODING DICT(8) (up to 255 distinct values). For more information, see Data Types and Fixed Encoding.
When possible, joins involving a geospatial operator (such as ST_Contains
) build a binned spatial hash table (overlaps hash join), falling back to a Cartesian loop join if a spatial hash join cannot be constructed.
The enable-overlaps-hashjoin
flag controls whether the system attempts to use the overlaps spatial join strategy (true
by default). If enable-overlaps-hashjoin
is set to false, or if the system cannot build an overlaps hash join table for a geospatial join operator, the system attempts to fall back to a loop join. Loop joins can be performant in situations where one or both join tables have a small number of rows. When both tables grow large, loop join performance decreases.
Two flags control whether or not the system allows loop joins for a query (geospatial or not): allow-loop-joins
and trivial-loop-join-threshold
. By default, allow-loop-joins
is set to false
and trivial-loop-join-threshold
to 1,000 (rows). If allow-loop-joins
is set to true
, the system allows any query with a loop join, regardless of table cardinalities (measured in number of rows). If left to the implicit default of false
or set explicitly to false
, the system allows loop join queries as long as the inner table (right-side table) has fewer rows than the threshold specified by trivial-loop-join-threshold
.
For optimal performance, the system should utilize overlaps hash joins whenever possible. Use the following guidelines to maximize the use of the overlaps hash join framework and minimize fallback to loop joins when conducting geospatial joins:
The inner (right-side) table should always be the more complicated primitive. For example, for ST_Contains(polygon, point)
, the point table should be the outer (left) table and the polygon table should be the inner (right) table.
Currently, ST_CONTAINS
and ST_INTERSECTS
joins between point and polygons/multi-polygon tables, and ST_DISTANCE < {distance}
between two point tables are supported for accelerated overlaps hash join queries.
For pointwise-distance joins, only the pattern WHERE ST_DISTANCE(table_a.point_col, table_b.point_col) < distance_in_degrees
supports overlaps hash joins. Patterns like the following fall back to loop joins:
WHERE ST_DWITHIN(table_a.point_col, table_b.point_col, distance_in_degrees)
WHERE ST_DISTANCE(ST_TRANSFORM(table_a.point_col, 900913), ST_TRANSFORM(table_b.point_col, 900913)) < 100
You can create joins in a distributed environment in two ways:
Replicate small dimension tables that are used in the join.
Create a shard key on the column used in the join (note that there is a limit of one shard key per table). If the column involved in the join is a TEXT ENCODED field, you must create a SHARED DICTIONARY that references the FACT table key you are using to make the join.
The join order for one small table and one large table matters. If you swap the sales and customer tables on the join, it throws an exception stating that table "sales" must be replicated.
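A hedged sketch of the shard-key approach using the sales and customer tables mentioned above (column definitions are assumptions):
CREATE TABLE customer (
  customer_id BIGINT,
  customer_name TEXT ENCODING DICT(32),
  SHARD KEY (customer_id))
WITH (SHARD_COUNT = 4);
CREATE TABLE sales (
  sale_id BIGINT,
  customer_id BIGINT,
  amount DOUBLE,
  SHARD KEY (customer_id))
WITH (SHARD_COUNT = 4);
-- List the large fact table (sales) first; swapping the join order can raise
-- an exception stating that table "sales" must be replicated:
SELECT c.customer_name, SUM(s.amount)
FROM sales s JOIN customer c ON s.customer_id = c.customer_id
GROUP BY c.customer_name;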
Operator
Description
AND
Logical AND
NOT
Negates value
OR
Logical OR
Expression
Description
CASE WHEN condition THEN result ELSE default END
Case operator
COALESCE(val1, val2, ..)
Returns the first non-null value in the list
Geospatial and array column projections are not supported in the COALESCE
function and CASE expressions.
Expression
Description
expr IN (subquery or list of values)
Evaluates whether expr equals any value of the IN list.
expr NOT IN (subquery or list of values)
Evaluates whether expr does not equal any value of the IN list.
You can use a subquery anywhere an expression can be used, subject to any runtime constraints of that expression. For example, a subquery in a CASE statement must return exactly one row, but a subquery can return multiple values to an IN expression.
You can use a subquery anywhere a table is allowed (for example, FROM
subquery), using aliases to name any reference to the table and columns returned by the subquery.
The SELECT command returns a set of records from one or more tables.
For more information, see SELECT.
Sort order defaults to ascending (ASC).
Sorts null values after non-null values by default in an ascending sort, before non-null values in a descending sort. For any query, you can use NULLS FIRST to sort null values to the top of the results or NULLS LAST to sort null values to the bottom of the results.
Allows you to use a positional reference to choose the sort column. For example, the command SELECT colA,colB FROM table1 ORDER BY 2
sorts the results on colB
because it is in position 2.
HEAVY.AI provides various query hints for controlling the behavior of the query execution engine.
SELECT hints must appear first, immediately after the SELECT statement; otherwise, the query fails.
By default, a hint is applied to the query step in which it is defined. If you have multiple SELECT clauses and define a query hint in one of those clauses, the hint is applied only to that specific query step; the rest of the query steps are unaffected. For example, applying the /*+ cpu_mode */
hint affects only the SELECT clause in which it exists.
You can define a hint to apply to all query steps by prepending g_
to the query hint. For example, if you define /*+ g_cpu_mode */
, CPU execution is applied to all query steps.
HEAVY.AI supports the following query hints.
The marker hint type represents a Boolean flag.
allow_loop_join
Enable loop joins.
SELECT /*+ allow_loop_join */ ...
cpu_mode
Force CPU execution mode.
SELECT /*+ cpu_mode */ ...
columnar_output
Enable columnar output for the input query.
SELECT /*+ columnar_output */ ...
disable_loop_join
Disable loop joins.
SELECT /*+ disable_loop_join */ ...
dynamic_watchdog
Enable dynamic watchdog.
SELECT /*+ dynamic_watchdog */ ...
dynamic_watchdog_off
Disable dynamic watchdog.
SELECT /*+ dynamic_watchdog_off */ ...
keep_result
Add result set of the input query to the result set cache.
SELECT /*+ keep_result */ ...
keep_table_function_result
Add result set of the table function query to the result set cache.
SELECT /*+ keep_table_function_result */ ...
overlaps_allow_gpu_build
Use GPU (if available) to build an overlaps join hash table. (CPU is used by default.)
SELECT /*+ overlaps_allow_gpu_build */ ...
overlaps_no_cache
Skip adding an overlaps join hash table to the hash table cache.
SELECT /*+ overlaps_no_cache */ ...
rowwise_output
Enable row-wise output for the input query.
SELECT /*+ rowwise_output */ ...
watchdog
Enable watchdog.
SELECT /*+ watchdog */ ...
watchdog_off
Disable watchdog.
SELECT /*+ watchdog_off */ ...
The key-value pair type is a hint name and its value.
aggregate_tree_fanout
Defines the fanout of the tree used to compute window aggregations over a frame. Depending on the frame size, the tree fanout affects the performance of both the aggregation and the tree construction for each window function with a frame clause.
Value type: INT
Range: 0-1024
SELECT /*+ aggregate_tree_fanout(32) */ SUM(y) OVER (ORDER BY x ROWS BETWEEN ...) ...
loop_join_inner_table_max_num_rows
Set the maximum number of rows available for a loop join.
Value type: INT
Range: 0 < x
Set the maximum number of rows to 100:
SELECT /*+ loop_join_inner_table_max_num_rows(100) */ ...
max_join_hash_table_size
Set the maximum size of the hash table.
Value type: INT
Range: 0 < x
Set the maximum size of the join hash table to 100:
SELECT /*+ max_join_hash_table_size(100) */ ...
overlaps_bucket_threshold
Set the overlaps bucket threshold.
Value type: DOUBLE
Range: 0-90
Set the overlaps threshold to 10:
SELECT /*+ overlaps_bucket_threshold(10.0) */ ...
overlaps_max_size
Set the maximum overlaps size.
Value type: INTEGER
Range: >=0
Set the maximum overlap to 10:
SELECT /*+ overlaps_max_size(10) */ ...
overlaps_keys_per_bin
Set the number of overlaps keys per bin.
Value type: DOUBLE
Range: 0.0 < x < double::max
SELECT /*+ overlaps_keys_per_bin(0.1) */ ...
query_time_limit
Set the maximum time for the query to run.
Value type: INTEGER
Range: >=0
SELECT /*+ query_time_limit(1000) */ ...
In Release 6.4 and higher, you can run SELECT queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases. This enables more efficient storage and memory utilization by eliminating the need for table duplication across databases, and simplifies access to shared data and tables.
To execute queries against another database, you must have ACCESS privilege on that database, as well as SELECT privilege.
Execute a join query involving a table in the current database and another table in the my_other_db
database:
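A sketch of the form (table and column names hypothetical):
SELECT t1.order_id, t2.status
FROM local_orders t1
JOIN my_other_db.remote_customers t2 ON t1.customer_id = t2.customer_id;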
Use SHOW
commands to get information about databases, tables, and user sessions.
Shows the CREATE SERVER statement that could have been used to create the server.
Shows the CREATE TABLE statement that could have been used to create the table.
Retrieve the databases accessible for the current user, showing the database name and owner.
Show registered compile-time UDFs and extension functions in the system and their arguments.
Displays a list of all row-level security (RLS) policies that exist for a user or role; admin rights are required. If EFFECTIVE is used, the list also includes any policies that exist for all roles that apply to the requested user or role.
Returns a list of queued queries in the system; information includes session ID, status, query string, account login name, client address, database name, and device type (CPU or GPU).
Admin users can see and interrupt all queries; non-admin users can see and interrupt only their own queries.
NOTE: SHOW QUERIES is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt
) is set.
To interrupt a query in the queue, see KILL QUERY.
If included with a name, lists the role granted directly to a user or role. SHOW EFFECTIVE ROLES with a name lists the roles directly granted to a user or role, and also lists the roles indirectly inherited through the directly granted roles.
If the user name or role name is omitted, then a regular user sees their own roles, and a superuser sees a list of all roles existing in the system.
Show user-defined runtime functions and table functions.
Show data connectors.
Displays storage-related information for a table, such as the table ID/name, number of data/metadata files used by the table, total size of data/metadata files, and table epoch values.
You can see table details for all tables that you have access to in the current database, or for only those tables you specify.
Show details for all tables you have access to:
Show details for table omnisci_states
:
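In sketch form:
SHOW TABLE DETAILS;
SHOW TABLE DETAILS omnisci_states;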
The number of columns returned includes system columns. As a result, the number of columns in column_count
can be up to two greater than the number of columns created by the user.
Displays the list of available system (built-in) table functions.
For more information, see System Table Functions.
Show detailed output information for the specified table function. Output details vary depending on the table function specified.
View SHOW output for the generate_series
table function:
name
generate_series
signature
(i64 series_start, i64 series_stop, i64 series_step)
(i64 series_start, i64 series_stop) -> Column
input_names
series_start, series_stop, series_step
series_start, series_stop
input_types
i64
output_names
generate_series
output_types
Column i64
CPU
true
GPU
true
runtime
false
filter_table_transpose
false
Retrieve the servers accessible for the current user.
Retrieve the tables accessible for the current user.
Lists name, ID, and default database for all or specified users for the current database. If the command is issued by a superuser, login permission status is also shown. Only superusers see users who do not have permission to log in.
SHOW [ALL] USER DETAILS lists name, ID, superuser status, default database, and login permission status for all users across the HeavyDB instance. This variant of the command is available only to superusers. Regular users who run the SHOW ALL USER DETAILS command receive an error message.
Show all user details for all users:
Show all user details for specified users ue, ud, ua, and uf:
If a specified user is not found, the superuser sees an error message:
Show user details for specified users ue, ud, and uf:
Show user details for all users:
Running SHOW ALL USER DETAILS results in an error message:
Show user details for all users:
If a specified user is not found, the user sees an error message:
Show user details for user ua:
Retrieve all persisted user sessions, showing the session ID, user login name, client address, and database name. Admin or superuser privileges required.
Interrupt a queued query. Specify the query by using its session ID.
To see the queries in the queue, use the SHOW QUERIES command:
To interrupt the last query in the list (ID 946-ooNP
):
Showing the queries again indicates that 946-ooNP
has been deleted:
KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt
) is set.
Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.
To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt
to TRUE. (It is enabled by default.)
HEAVY.AI supports arrays in dictionary-encoded text and number fields (TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, and DOUBLE). Data stored in arrays are not normalized. For example, {green,yellow} is not the same as {yellow,green}. As with many SQL-based services, HEAVY.AI array indexes are 1-based.
HEAVY.AI supports NULL variable-length arrays for all integer and floating-point data types, including dictionary-encoded string arrays. For example, you can insert NULL
into BIGINT[ ], DOUBLE[ ], or TEXT[ ] columns. HEAVY.AI supports NULL fixed-length arrays for all integer and floating-point data types, but not for dictionary-encoded string arrays. For example, you can insert NULL
into BIGINT[2] or DOUBLE[3] columns, but not into TEXT[2] columns.
ArrayCol[n] ...
Returns value(s) from specific location n
in the array.
UNNEST(ArrayCol)
Extract the values in the array to a set of rows. Requires GROUP BY
; projecting UNNEST
is not currently supported.
test = ANY ArrayCol
ANY
compares a scalar value with a single row or set of values in an array, returning results in which at least one item in the array matches. ANY
must be preceded by a comparison operator.
test = ALL ArrayCol
ALL
compares a scalar value with a single row or set of values in an array, returning results in which all records in the array field are compared to the scalar value. ALL
must be preceded by a comparison operator.
CARDINALITY()
Returns the number of elements in an array. For example:
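A minimal sketch (the column name arr_int is an assumption):
SELECT CARDINALITY(arr_int) FROM test_array;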
The following examples show query results based on the table test_array
created with the following statement:
The following queries use arrays in an INTEGER field:
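The original statements are not reproduced here; a comparable sketch, with assumed column names and values, might look like the following:
CREATE TABLE test_array (name TEXT ENCODING DICT(32), arr_int INTEGER[]);
INSERT INTO test_array VALUES ('a', {1, 2, 3});
INSERT INTO test_array VALUES ('b', {4, 5, 6});
SELECT name FROM test_array WHERE arr_int[1] = 1;
SELECT name FROM test_array WHERE 5 = ANY arr_int;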
Functions and Operators (DML)
Parenthesization
Multiplication and division
Addition and subtraction
Usage Notes
The following wildcard characters are supported by LIKE
and ILIKE
:
%
matches any number of characters, including zero characters.
_
matches exactly one character.
Supported date_part types:
Supported interval types:
For two-digit years, years 69-99 are assumed to be previous century (for example, 1969), and 0-68 are assumed to be current century (for example, 2016).
For four-digit years, negative years (BC) are not supported.
Hours are expressed in 24-hour format.
When time components are separated by colons, you can write them as one or two digits.
Months are case insensitive. You can spell them out or abbreviate to three characters.
For timestamps, decimal seconds are ignored. Time zone offsets are written as +/-HHMM.
For timestamps, a numeric string is converted to +/- seconds since January 1, 1970. Supported timestamps range from -30610224000 (January 1, 1000) through 29379456000 (December 31, 2900).
On output, dates are formatted as YYYY-MM-DD. Times are formatted as HH:MM:SS.
Linux EPOCH values range from -30610224000 (1/1/1000) through 185542587100800 (1/1/5885487). Complete range in years: +/-5,883,517 around epoch.
Both double-precision (standard) and single-precision floating point statistical functions are provided. Single-precision functions run faster on GPUs but might cause overflow errors.
COUNT(DISTINCT
x
)
, especially when used in conjunction with GROUP BY, can require a very large amount of memory to keep track of all distinct values in large tables with large cardinalities. To avoid this large overhead, use APPROX_COUNT_DISTINCT.
APPROX_COUNT_DISTINCT(
x
,
e
)
gives an approximate count of the value x, based on an expected error rate defined in e. The error rate is an integer value from 1 to 100. The lower the value of e, the higher the precision, and the higher the memory cost. Select a value for e based on the level of precision required. On large tables with large cardinalities, consider using APPROX_COUNT_DISTINCT
when possible to preserve memory. When data cardinalities permit, OmniSci uses the precise implementation of COUNT(DISTINCT
x
)
for APPROX_COUNT_DISTINCT
. Set the default error rate using the -hll-precision-bits
configuration parameter.
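For example, a sketch using the flights table referenced elsewhere in this documentation (the column name is an assumption):
SELECT APPROX_COUNT_DISTINCT(tail_num, 10) FROM flights_2008_10k;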
The accuracy of APPROX_MEDIAN(x) depends upon the distribution of data. For example:
For 100,000,000 integers (1, 2, 3, ... 100M) in random order, APPROX_MEDIAN can provide a highly accurate answer to 5+ significant digits.
For 100,000,001 integers, where 50,000,000 have value of 0 and 50,000,001 have value of 1, APPROX_MEDIAN returns a value close to 0.5, even though the median is 1.
Currently, OmniSci does not support grouping by non-dictionary-encoded strings. However, with the SAMPLE
aggregate function, you can select non-dictionary-encoded strings that are presumed to be unique in a group. For example:
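A sketch of such a query (the table name tweets and the grouping column user_name are assumptions; user_description is the none-encoded string column discussed below):
SELECT user_name, SAMPLE(user_description) FROM tweets GROUP BY user_name;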
If the aggregated column (user_description in the example above) is not unique within a group, SAMPLE
selects a value that might be nondeterministic because of the parallel nature of OmniSci query execution.
You can create your own C++ functions and use them in your SQL queries.
User-defined Functions (UDFs) require clang++ version 9. You can verify the version installed using the command clang++ --version
.
UDFs currently allow any authenticated user to register and execute a runtime function. By default, runtime UDFs are globally disabled but can be enabled with the runtime flag enable-runtime-udf
.
Create your function and save it in a .cpp file; for example, /var/lib/omnisci/udf_myFunction.cpp.
Add the UDF configuration flag to omnisci.conf. For example:
Use your function in a SQL query. For example:
This function, udf_diff.cpp, returns the difference of two values from a table.
Include the standard integer library, which supports the following datatypes:
bool
int8_t (cstdint), char
int16_t (cstdint), short
int32_t (cstdint), int
int64_t (cstdint), size_t
float
double
void
The next four lines are boilerplate code that allows OmniSci to determine whether the server is running with GPUs. OmniSci chooses whether it should compile the function inline to achieve the best possible performance.
The next line is the actual user-defined function, which returns the difference between INTEGER values x and y.
To run the udf_diff
function, add this line to your /var/lib/omnisci/omnisci.conf file (in this example, the .cpp file is stored at /var/lib/omnisci/udf_diff.cpp):
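Assuming the configuration parameter for registering a UDF source file is named udf (an assumption; verify against the configuration reference for your server version), the line would look like:
udf = "/var/lib/omnisci/udf_diff.cpp"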
Restart the OmniSci server.
Use your command from an OmniSci SQL client to query, for example, a table named myTable that contains the INTEGER columns myInt1
and myInt2
.
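For example:
SELECT udf_diff(myInt1, myInt2) FROM myTable;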
OmniSci returns the difference as an INTEGER value.
HEAVY.AI provides access to a set of system-provided table functions, also known as table-valued functions. System table functions, like user-defined table functions, support execution of queries on both CPU and GPU over one or more SQL result-set inputs. Table function support in HEAVY.AI can be split into two broad categories: system table functions and user-defined table functions (UDTFs). System table functions are built into the HEAVY.AI server, while UDTFs can be declared dynamically at runtime by specifying them in a subset of the Python language. For more information, see the documentation on user-defined table functions.
To improve performance, table functions can be declared to enable filter pushdown optimization, which allows the Calcite optimizer to "push down" filters on the output(s) of a table function to its input(s) when the inputs and outputs are declared to be semantically equivalent (for example, a longitude variable that is input to and output from a table function). This can significantly increase performance in cases where only a small portion of one or more input tables is required to compute the filtered output of a table function.
Whether system- or user-provided, table functions can execute over one or more result sets specified by subqueries, and can also take any number of additional constant literal arguments specified in the function definition. SQL subquery inputs can consist of any SQL expression (including multiple subqueries, joins, and so on) allowed by HeavyDB, and the output can be filtered, grouped by, joined, and so on like a normal SQL subquery, including being input into additional table functions by wrapping it in a CURSOR
argument. The number and types of input arguments, as well as the number and types of output arguments, are specified in the table function definition itself.
Table functions allow for the efficient execution of advanced algorithms that may be difficult or impossible to express in canonical SQL. By allowing execution of code directly over SQL result sets, leveraging the same hardware parallelism used for fast SQL execution and visualization rendering, HEAVY.AI provides orders-of-magnitude speed increases over the alternative of transporting large result sets to other systems for post-processing and then returning to HEAVY.AI for storage or downstream manipulation. You can easily invoke system-provided or user-defined algorithms directly inline with SQL and rendering calls, making prototyping and deployment of advanced analytics capabilities easier and more streamlined.
Table functions can take as input arguments both constant literals (including scalar results of subqueries) as well as results of other SQL queries (consisting of one or more rows). The latter (SQL query inputs), per the SQL standard, must be wrapped in the keyword CURSOR
. Depending on the table function, there can be 0, 1, or multiple CURSOR inputs. For example:
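For example, a call with one CURSOR input and one literal argument might look like the following (the function name and arguments here are hypothetical):
SELECT * FROM TABLE(my_table_function(CURSOR(SELECT x, y FROM input_table), 10));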
Certain table functions can take 1 or more columns of a specified type or types as inputs, denoted as ColumnList<TYPE1 | Type2... TypeN>
. Even if a function allows a ColumnList
input of multiple types, the arguments must be all of one type; types cannot be mixed. For example, if a function allows ColumnList<INT | TEXT ENCODING DICT>
, one or more columns of either INTEGER or TEXT ENCODING DICT can be used as inputs, but all must be either INT columns or TEXT ENCODING DICT columns.
All HEAVY.AI system table functions allow you to specify arguments either in conventional comma-separated form in the order specified by the table function signature, or alternatively via a key-value map where input argument names are mapped to argument values using the =>
token. For example, the following two calls are equivalent:
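For example, using the generate_series signature shown earlier:
SELECT * FROM TABLE(generate_series(1, 10, 2));
SELECT * FROM TABLE(generate_series(series_start => 1, series_stop => 10, series_step => 2));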
For performance reasons, particularly when table functions are used as actual tables in a client like Heavy Immerse, many system table functions in HEAVY.AI automatically "push down" filters on certain output columns in the query onto the inputs. For example, if a table does some computation over an x
and y
range such that x
and y
are in both the input and output for the table function, filter push-down would likely be enabled so that a query like the following would automatically push down the filter on the x and y outputs to the x and y inputs. This potentially increases query performance significantly.
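A sketch of such a query, using a hypothetical table function my_spatial_tf whose x and y columns appear in both its input and output:
SELECT * FROM TABLE(my_spatial_tf(CURSOR(SELECT x, y, val FROM points_table))) WHERE x BETWEEN -120.0 AND -119.0 AND y BETWEEN 38.0 AND 39.0;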
To determine whether filter push-down is used, you can check the Boolean value of the filter_table_transpose
column from the query:
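For example:
SHOW TABLE FUNCTIONS DETAILS generate_series;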
Currently for system table functions, you cannot change push-down behavior.
You can query which table functions are available using SHOW TABLE FUNCTIONS
:
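SHOW TABLE FUNCTIONS;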
Information about the expected input and output argument names and types, as well as other information such as whether the function can run on CPU, GPU, or both, and whether filter push-down is enabled, can be queried via SHOW TABLE FUNCTIONS DETAILS <table_function_name>;
The following system table functions are available in HEAVY.AI. The table provides a summary and links to more information about each function.
The TABLE
command is required to wrap a table function clause; for example:
select * from TABLE(generate_series(1, 10));
The CURSOR
command is required to wrap any subquery inputs.
HEAVY.AI supports a subset of object types and functions for storing and writing queries for geospatial definitions.
For information about geospatial datatype sizes, see the datatypes documentation.
For more information on WKT primitives, see the Well-Known Text (WKT) documentation.
HEAVY.AI supports SRID 4326 (WGS 84), 900913 (Google Web Mercator), and 32601-32660/32701-32760 (Universal Transverse Mercator (UTM) zones). When using geospatial fields, you set the SRID to determine which reference system to use. HEAVY.AI does not assign a default SRID.
If you do not set the SRID of the geo field in the table, you can set it in a SQL query using ST_SETSRID(column_name, SRID)
. For example, ST_SETSRID(a.pt,4326)
.
When representing longitude and latitude, the first coordinate is assumed to be longitude in HEAVY.AI geospatial primitives.
You create geospatial objects as geometries (planar spatial data types), which are supported by the planar geometry engine at run time. When you call ST_DISTANCE
on two geometry objects, the engine returns the shortest straight-line planar distance, in degrees, between those points. For example, the following query returns the shortest distance between the point(s) in p1
and the polygon(s) in poly1
:
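A sketch of the query (the table name geo1 is an assumption; p1 and poly1 are the columns referenced above):
SELECT ST_DISTANCE(p1, poly1) FROM geo1;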
Geospatial functions that expect geospatial object arguments accept geospatial columns, geospatial objects returned by other functions, or string literals containing WKT representations of geospatial objects. Supplying a WKT string is equivalent to calling a geometry constructor. For example, these two queries are identical:
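For example, the following two queries are identical (table and column names are assumptions):
SELECT ST_DISTANCE('POINT(0 0)', p1) FROM geo1;
SELECT ST_DISTANCE(ST_GeomFromText('POINT(0 0)'), p1) FROM geo1;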
You can create geospatial literals with a specific SRID. For example:
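For example, a point literal with SRID 4326 (the coordinates are illustrative):
ST_GeomFromText('POINT(-71.064544 42.28787)', 4326)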
HEAVY.AI provides support for geography objects and geodesic distance calculations, with some limitations.
HeavyDB supports import from any coordinate system supported by the Geospatial Data Abstraction Library (GDAL). On import, HeavyDB will convert to and store in WGS84 encoding, and rendering is accurate in Immerse.
However, no built-in way to reference the original coordinates currently exists in Immerse, and coordinates exported from Immerse will be WGS84 coordinates. You can work around this limitation by adding to the dataset a column or columns in non-geo format that could be included for display in Immerse (for example, in a popup) or on export.
Currently, HEAVY.AI supports spheroidal distance calculation between:
Two points using either SRID 4326 or 900913.
A point and a polygon/multipolygon using SRID 900913.
Using SRID 900913 results in variance compared to SRID 4326 as polygons approach the North and South Poles.
The following query returns the points and polygons within 1,000 meters of each other:
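A sketch of such a query, modeled on the ST_DISTANCE/ST_TRANSFORM example that appears later in this section (table and column names are assumptions):
SELECT a.pt, b.poly FROM points a, polys b WHERE ST_DISTANCE(ST_TRANSFORM(a.pt, 900913), ST_TRANSFORM(b.poly, 900913)) < 1000;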
HEAVY.AI supports the functions listed.
You can use SQL code similar to the examples in this topic as global filters in Immerse.
CREATE TABLE AS SELECT
is not currently supported for geo data types in distributed mode.
GROUP BY
is not supported for geo types (POINT
, MULTIPOINT
, LINESTRING
, MULTILINESTRING
, POLYGON
, or MULTIPOLYGON
).
You can use \d table_name
to determine if the SRID is set for the geo field:
If no SRID is returned, you can set the SRID using ST_SETSRID(column_name, SRID)
. For example, ST_SETSRID(myPoint, 4326)
.
HEAVY.AI Free is a full-featured version of the HEAVY.AI platform available at no cost for non-hosted commercial use.
To get started with HEAVY.AI Free:
Go to the HEAVY.AI website, and in the HEAVY.AI Free section, click Get Free License.
On the Get HEAVY.AI Free page, enter your email address and click I Agree.
Open the HEAVY.AI Free Edition Activation Link email that you receive from HEAVY.AI, and click Click Here to view and download the free edition license. You will need this license to run HEAVY.AI after you install it. A copy of the license is also sent to your email.
In the What's Next section, click to select the best version of HEAVY.AI for your hardware and software configuration. Follow the instructions for the download or cloud version you choose.
Install HEAVY.AI, using the instructions for your platform.
Verify that OmniSci is working correctly by following the instructions in the Checkpoint section at the end of the installation instructions for your platform (for example, the Checkpoint section for the CentOS CPU with Tarball installation).
You can create additional HEAVY.AI users to collaborate with.
Connect to Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273
.
Open the Immerse SQL Editor.
Use the CREATE USER command to create a new user. For information on syntax and options, see the CREATE USER documentation.
You can download HEAVY.AI for your preferred platform from the HEAVY.AI downloads page.
The CPU (no GPUs) install does not support backend rendering. For example, Pointmap and Scatterplot charts are not available. The GPU install supports all chart types.
The Open Source options do not require a license, and do not include Heavy Immerse.
For information about the HeavyRF radio frequency propagation simulation and HeavyRF table functions, see the HeavyRF documentation.
For information about importing data, see the data import documentation.
See the tables below for examples.
Operator
Description
+ numeric
Returns numeric
- numeric
Returns negative value of numeric
numeric1 + numeric2
Sum of numeric1 and numeric2
numeric1 - numeric2
Difference of numeric1 and numeric2
numeric1 * numeric2
Product of numeric1 and numeric2
numeric1 / numeric2
Quotient (numeric1 divided by numeric2)
Operator
Description
=
Equals
<>
Not equals
>
Greater than
>=
Greater than or equal to
<
Less than
<=
Less than or equal to
BETWEEN x AND y
Is a value within a range
NOT BETWEEN x AND y
Is a value not within a range
IS NULL
Is a value that is null
IS NOT NULL
Is a value that is not null
NULLIF(x, y)
Compares expressions x and y. If they differ, returns x; if they are the same, returns null. For example, if a dataset uses 'NA' for null values, you can return null using SELECT NULLIF(field_name, 'NA').
IS TRUE
True if a value resolves to TRUE.
IS NOT TRUE
True if a value resolves to FALSE.
Function
Description
ABS(x)
Returns the absolute value of x
CEIL(x)
Returns the smallest integer not less than the argument
DEGREES(x)
Converts radians to degrees
EXP(x)
Returns the value of e to the power of x
FLOOR(x)
Returns the largest integer not greater than the argument
LN(x)
Returns the natural logarithm of x
LOG(x)
Returns the natural logarithm of x
LOG10(x)
Returns the base-10 logarithm of the specified float expression x
MOD(x, y)
Returns the remainder of int x divided by int y
PI()
Returns the value of pi
POWER(x, y)
Returns the value of x raised to the power of y
RADIANS(x)
Converts degrees to radians
ROUND(x)
Rounds x to the nearest integer value, but does not change the data type. For example, the double value 4.1 rounds to the double value 4.
ROUND_TO_DIGIT(x, y)
Rounds x to y decimal places
SIGN(x)
Returns the sign of x as -1, 0, or 1 if x is negative, zero, or positive
SQRT(x)
Returns the square root of x.
TRUNCATE(x, y)
Truncates x to y decimal places
WIDTH_BUCKET(target, lower-boundary, upper-boundary, bucket-count)
Defines equal-width intervals (buckets) in a range between the lower boundary and the upper boundary, and returns the bucket number to which the target expression is assigned.
target
- A constant, column variable, or general expression for which a bucket number is returned.
lower-boundary
- Lower boundary for the range of values to be partitioned equally.
upper-boundary
- Upper boundary for the range of values to be partitioned equally.
bucket-count
- Number of equal-width buckets in the range defined by the lower and upper boundaries.
Expressions can be constants, column variables, or general expressions.
Example: Create 10 age buckets of equal size, with lower bound 0 and upper bound 100 ([0,10], [10,20] ... [90,100]), and classify the age of a customer accordingly:
SELECT WIDTH_BUCKET(age, 0, 100, 10) FROM customer;
For example, a customer of age 34 is assigned to bucket 3 ([30,40]) and the function returns the value 3.
Function
Description
ACOS(x)
Returns the arc cosine of x
ASIN(x)
Returns the arc sine of x
ATAN(x)
Returns the arc tangent of x
ATAN2(y, x)
Returns the arc tangent of (x, y) in the range (-π, π]. Equal to ATAN(y/x) for x > 0.
COS(x)
Returns the cosine of x
COT(x)
Returns the cotangent of x
SIN(x)
Returns the sine of x
TAN(x)
Returns the tangent of x
Function
Description
DISTANCE_IN_METERS(fromLon, fromLat, toLon, toLat)
Calculates distance in meters between two WGS84 positions.
CONV_4326_900913_X(x)
Converts a WGS84 longitude to a WGS84 Web Mercator x coordinate.
CONV_4326_900913_Y(y)
Converts a WGS84 latitude to a WGS84 Web Mercator y coordinate.
Function
Description
BASE64_DECODE(
str
)
Decodes a BASE64-encoded string.
BASE64_ENCODE(
str
)
Encodes a string to a BASE64-encoded string.
CHAR_LENGTH(
str
)
Returns the number of characters in a string. Only works with unencoded fields (ENCODING set to none
).
str1
|| str2
[ || str3
... ]
Returns the string that results from concatenating the strings specified. Note that numeric, date, timestamp, and time types will be implicitly casted to strings as necessary, so explicit casts of non-string types to string types is not required for inputs to the concatenation operator.
Note that concatenating a variable string with a string literal (for example, county_name || ' County') is significantly more performant than concatenating two or more variable strings (for example, county_name || ', ' || state_name). Hence, for multi-variable string concatenation, it is recommended to use an UPDATE statement to materialize the concatenated output rather than performing the concatenation inline when such operations are expected to be routinely repeated.
ENCODE_TEXT(
none_encoded_str
)
Converts a none-encoded string to a transient dictionary-encoded string to allow for operations like group-by on top. When the watchdog is enabled, the number of strings that can be casted using this operator is capped by the value set with the watchdog-none-encoded-string-translation-limit
flag (1,000,000 by default).
INITCAP(
str
)
Returns the string with initial caps after any of the defined delimiter characters, with the remainder of the characters lowercased. Valid delimiter characters are: ! ? @ " ^ # $ & ~ _ , . : ; + - * % / | \ [ ] ( ) { } < >
JSON_VALUE(
json_str, path
)
Returns the string of the field given by path in json_str. Paths start with the $
character, with sub-fields split by .
and array members indexed by []
, with array indices starting at 0. For example, JSON_VALUE('{"name": "Brenda", "scores": [89, 98, 94]}', '$.scores[1]')
would yield a TEXT return field of '98'
.
Note that currently LAX
parsing mode (any unmatched path returns null rather than errors) is the default, and STRICT
parsing mode is not supported.
KEY_FOR_STRING(
str
)
Returns the dictionary key of a dictionary-encoded string column.
LCASE(
str
)
Returns the string in all lower case. Only ASCII character set is currently supported. Same as LOWER
.
LEFT(
str, num
)
Returns the left-most number (num
) of characters in the string (str
).
LENGTH(
str
)
Returns the length of a string in bytes. Only works with unencoded fields (ENCODING set to none
).
LOWER(
str
)
Returns the string in all lower case. Only ASCII character set is currently supported. Same as LCASE
.
LPAD(
str
,
len
, [
lpad_str
])
Left-pads the string with the string defined in lpad_str
to a total length of len
. If the optional lpad_str
is not specified, the space character is used to pad.
If the length of str
is greater than len
, then characters from the end of str
are truncated to the length of len
.
Characters are added from lpad_str
successively until the target length len
is met. If lpad_str
concatenated with str
is not long enough to equal the target len
, lpad_str
is repeated, partially if necessary, until the target length is met.
LTRIM(
str
,
chars
)
Removes any leading characters specified in chars
from the string. Alias for TRIM
.
OVERLAY(
str
PLACING
replacement_str
FROM
start
[FOR
len
])
Replaces in str
the number of characters defined in len
with characters defined in replacement_str
at the location start
.
Regardless of the length of replacement_str
, len
characters are removed from str
unless start
+ replacement_str
is greater than the length of str
, in which case all characters from start
to the end of str
are replaced.
If start
is negative, it specifies the number of characters from the end of str
.
POSITION (
search_str
IN
str
[FROM
start_position
])
Returns the position of the first character in search_str
if found in str
, optionally starting the search at start_position
.
If search_str
is not found, 0 is returned. If search_str
or str
are null, null is returned.
REGEXP_REPLACE(
str
,
pattern
[,
new_str
,
position
,
occurrence
, [
flags
]])
Replace one or all matches of a substring in string str
that matches pattern
, which is a regular expression in POSIX regex syntax.
new_str
(optional) is the string that replaces the string matching the pattern. If new_str
is empty or not supplied, all found matches are removed.
The occurrence
integer argument (optional) specifies the single match occurrence of the pattern to replace, starting from the beginning of str
; 0 (replace all) is the default. Use a negative occurrence
argument to signify the nth-to-last occurrence to be replaced.
pattern
uses POSIX regular expression syntax.
Use a positive position
argument to indicate the number of characters from the beginning of str
. Use a negative position
argument to indicate the number of characters from the end of str
.
Back-references/capture groups can be used to capture and replace specific sub-expressions.
Use the following optional flags
to control the matching behavior:
c
- Case-sensitive matching.
i
- Case-insensitive matching.
If not specified, REGEXP_REPLACE defaults to case sensitive search.
REGEXP_SUBSTR(
str
,
pattern
[,
position
,
occurrence
,
flags
, group_num
])
Search string str
for pattern
, which is a regular expression in POSIX syntax, and return the matching substring.
Use position
to set the character position to begin searching. Use occurrence
to specify the occurrence of the pattern to match.
Use a positive position
argument to indicate the number of characters from the beginning of str
. Use a negative position
argument to indicate the number of characters from the end of str
.
The occurrence
integer argument (optional) specifies the single match occurrence of the pattern to replace, with 0 being mapped to the first (1) occurrence. Use a negative occurrence
argument to signify the nth-to-last group in pattern
is returned.
Use optional flags
to control the matching behavior:
c
- Case-sensitive matching.
e
- Extract submatches.
i
- Case-insensitive matching.
The c
and i
flags cannot be used together; e
can be used with either. If neither c
nor i
are specified, or if pattern
is not provided, REGEXP_SUBSTR defaults to case-sensitive search.
If the e
flag is used, REGEXP_SUBSTR returns the capture group group_num
of pattern
matched in str
. If the e
flag is used, but no capture groups are provided in pattern
, REGEXP_SUBSTR returns the entire matching pattern
, regardless of group_num
. If the e flag is used but no group_num
is provided, a value of 1 for group_num
is assumed, so the first capture group is returned.
REPEAT(
str
,
num
)
Repeats the string the number of times defined in num
.
REPLACE(
str
,
from_str
,
new_str
)
Replaces all occurrences of substring from_str
within a string, with a new substring new_str
.
REVERSE(
str
)
Reverses the string.
RIGHT(
str, num
)
Returns the right-most number (num
) of characters in the string (str
).
RPAD(
str
,
len
,
rpad_str
)
Right-pads the string with the string defined in rpad_str
to a total length of len
. If the optional rpad_str
is not specified, the space character is used to pad.
If the length of str
is greater than len
, then characters from the beginning of str
are truncated to the length of len
.
Characters are added from rpad_str
successively until the target length len
is met. If rpad_str
concatenated with str
is not long enough to equal the target len
, rpad_str
is repeated, partially if necessary, until the target length is met.
RTRIM(
str
)
Removes any trailing spaces from the string.
SPLIT_PART(
str
,
delim
,
field_num
)
Split the string based on a delimiter delim
and return the field identified by field_num
. Fields are numbered from left to right.
STRTOK_TO_ARRAY(
str
, [
delim
])
Tokenizes the string str
using optional delimiter(s) delim
and returns an array of tokens.
An empty array is returned if no tokens are produced in tokenization. NULL is returned if either parameter is a NULL.
SUBSTR(
str
,
start
, [
len
])
Alias for SUBSTRING
.
SUBSTRING(
str FROM
start [ FOR
len
])
Returns a substring of str
starting at index start
for len
characters.
The start position is 1-based (that is, the first character of str
is at index 1, not 0). However, start
0 aliases to start
1.
If start
is negative, it is considered to be |start|
characters from the end of the string.
If len
is not specified, then the substring from start
to the end of str
is returned.
If start
+ len
is greater than the length of str
, then the characters in str
from start
to the end of the string are returned.
TRIM([BOTH | LEADING | TRAILING] [
trim_str
FROM
str
])
Removes characters defined in trim_str
from the beginning, end, or both of str
. If trim_str
is not specified, the space character is the default.
If the trim location is not specified, defined characters are trimmed from both the beginning and end of str
.
TRY_CAST( str AS type)
Attempts to cast/convert a string type to any valid numeric, timestamp, date, or time type. If the conversion cannot be performed, null is returned.
Note that TRY_CAST
is not valid for non-string input types.
UCASE(
str
)
Returns the string in uppercase format. Only ASCII character set is currently supported. Same as UPPER
.
UPPER(
str
)
Returns the string in uppercase format. Only ASCII character set is currently supported. Same as UCASE
.
Name
Example
Description
str
LIKE
pattern
'ab' LIKE 'ab'
Returns true if the string matches the pattern (case-sensitive)
str
NOT LIKE
pattern
'ab' NOT LIKE 'cd'
Returns true if the string does not match the pattern
str
ILIKE
pattern
'AB' ILIKE 'ab'
Returns true if the string matches the pattern (case-insensitive). Supported only when the right side is a string literal; for example, colors.name ILIKE 'b%'
str
REGEXP
POSIX pattern
'^[a-z]+r$'
Lowercase string ending with r
REGEXP_LIKE (
str
,
POSIX pattern
)
'^[hc]at'
cat or hat
Function
Description
CURRENT_DATE
CURRENT_DATE()
Returns the current date in the GMT time zone.
Example:
SELECT CURRENT_DATE();
CURRENT_TIME
CURRENT_TIME()
Returns the current time of day in the GMT time zone.
Example:
SELECT CURRENT_TIME();
CURRENT_TIMESTAMP
CURRENT_TIMESTAMP()
Return the current timestamp in the GMT time zone. Same as NOW()
.
Example:
SELECT CURRENT_TIMESTAMP();
DATEADD(
'date_part'
,
interval
,
date
|
timestamp
)
Returns a date after a specified time/date interval has been added.
Example:
SELECT DATEADD('MINUTE', 6000, dep_timestamp) Arrival_Estimate FROM flights_2008_10k LIMIT 10;
DATEDIFF(
'date_part'
,
date
,
date
)
Returns the difference between two dates, calculated to the lowest level of the date_part you specify. For example, if you set the date_part as DAY, only the year, month, and day are used to calculate the result. Other fields, such as hour and minute, are ignored.
Example:
SELECT DATEDIFF('YEAR', plane_issue_date, now()) Years_In_Service FROM flights_2008_10k LIMIT 10;
DATEPART(
'interval'
,
date
|
timestamp
)
Returns a specified part of a given date or timestamp as an integer value. Note that 'interval' must be enclosed in single quotes.
Example:
SELECT DATEPART('YEAR', plane_issue_date) Year_Issued FROM flights_2008_10k LIMIT 10;
DATE_TRUNC(
date_part
,
timestamp
)
Truncates the timestamp to the specified date_part. DATE_TRUNC(week,...)
starts on Monday (ISO), which is different than EXTRACT(dow,...)
, which starts on Sunday.
Example:
SELECT DATE_TRUNC(MINUTE, arr_timestamp) Arrival FROM flights_2008_10k LIMIT 10;
EXTRACT(
date_part
FROM
timestamp
)
Returns the specified date_part from timestamp.
Example:
SELECT EXTRACT(HOUR FROM arr_timestamp) Arrival_Hour FROM flights_2008_10k LIMIT 10;
INTERVAL
'count'
date_part
Adds or subtracts count date_part units from a timestamp. Note that 'count' is enclosed in single quotes.
Example:
SELECT arr_timestamp + INTERVAL '10' YEAR FROM flights_2008_10k LIMIT 10;
NOW()
Return the current timestamp in the GMT time zone. Same as CURRENT_TIMESTAMP().
Example:
NOW();
TIMESTAMPADD(
date_part
,
count
,
timestamp
|
date
)
Adds an interval of count date_part units to the provided timestamp or date and returns the result as a timestamp or date.
Example:
SELECT TIMESTAMPADD(DAY, 14, arr_timestamp) Fortnight FROM flights_2008_10k LIMIT 10;
TIMESTAMPDIFF(
date_part
,
timestamp1
,
timestamp2
)
Subtracts timestamp1 from timestamp2 and returns the result in signed date_part units.
Example:
SELECT TIMESTAMPDIFF(MINUTE, arr_timestamp, dep_timestamp) Flight_Time FROM flights_2008_10k LIMIT 10;
Datatype
Formats
Examples
DATE
YYYY-MM-DD
2013-10-31
DATE
MM/DD/YYYY
10/31/2013
DATE
DD-MON-YY
31-Oct-13
DATE
DD/Mon/YYYY
31/Oct/2013
EPOCH
1383262225
TIME
HH:MM
23:49
TIME
HHMMSS
234901
TIME
HH:MM:SS
23:49:01
TIMESTAMP
DATE TIME
31-Oct-13 23:49:01
TIMESTAMP
DATETTIME
31-Oct-13T23:49:01
TIMESTAMP
DATE:TIME
11/31/2013:234901
TIMESTAMP
DATE TIME ZONE
31-Oct-13 11:30:25 -0800
TIMESTAMP
DATE HH.MM.SS PM
31-Oct-13 11.30.25pm
TIMESTAMP
DATE HH:MM:SS PM
31-Oct-13 11:30:25pm
TIMESTAMP
1383262225
Double-precision FP Function
Single-precision FP Function
Description
AVG(
x
)
Returns the average value of x
COUNT()
Returns the count of the number of rows returned
COUNT(DISTINCT
x
)
Returns the count of distinct values of x
APPROX_COUNT_DISTINCT(
x
,
e
)
Returns the approximate count of distinct values of x with defined expected error rate e, where e is an integer from 1 to 100. If no value is set for e, the approximate count is calculated using the system-wide hll-precision-bits
configuration parameter.
APPROX_MEDIAN(
x
)
Returns the approximate median of x. Two server configuration parameters affect memory usage:
approx_quantile_centroids and approx_quantile_buffer
Accuracy of APPROX_MEDIAN depends on the distribution of data; see Usage Notes.
APPROX_PERCENTILE(
x
,
y
)
Returns the approximate quantile of x
, where y
is the value between 0 and 1.
For example, y=0
returns MIN(x)
, y=1
returns MAX(x)
, and y=0.5
returns APPROX_MEDIAN(x)
.
MAX(
x
)
Returns the maximum value of x
MIN(
x
)
Returns the minimum value of x
SINGLE_VALUE
Returns the input value if there is only one distinct value in the input; otherwise, the query fails.
SUM(
x
)
Returns the sum of the values of x
SAMPLE(
x
)
Returns one sample value from aggregated column x. For example, the following query returns population grouped by city, along with one value from the state column for each group:
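A minimal sketch (table and column names are assumptions):
SELECT city, SAMPLE(state), SUM(population) AS total_pop FROM census GROUP BY city;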
Note: This was previously LAST_SAMPLE
, which is now deprecated.
CORRELATION(x, y)
CORRELATION_FLOAT(x, y)
Alias of CORR. Returns the coefficient of correlation of a set of number pairs.
CORR(x, y)
CORR_FLOAT(x, y)
Returns the coefficient of correlation of a set of number pairs.
COUNT_IF(conditional_expr)
Returns the number of rows satisfying the given conditional_expr
.
COVAR_POP(x, y)
COVAR_POP_FLOAT(x, y)
Returns the population covariance of a set of number pairs.
COVAR_SAMP(x, y)
COVAR_SAMP_FLOAT(x, y)
Returns the sample covariance of a set of number pairs.
STDDEV(x)
STDDEV_FLOAT(x)
Alias of STDDEV_SAMP. Returns sample standard deviation of the value.
STDDEV_POP(x)
STDDEV_POP_FLOAT(x)
Returns the population standard deviation of the value.
STDDEV_SAMP(x)
STDDEV_SAMP_FLOAT(x)
Returns the sample standard deviation of the value.
SUM_IF(conditional_expr)
Returns the sum of all expression values satisfying the given conditional_expr
.
VARIANCE(x)
VARIANCE_FLOAT(x)
Alias of VAR_SAMP. Returns the sample variance of the value.
VAR_POP(x)
VAR_POP_FLOAT(x)
Returns the population variance of the value.
VAR_SAMP(x)
VAR_SAMP_FLOAT(x)
Returns the sample variance of the value.
Function
Description
SAMPLE_RATIO(
x
)
Returns a Boolean value, with the probability of True
being returned for a row equal to the input argument. The input argument is a numeric value between 0.0 and 1.0. Negative input values return False, input values greater than 1.0 return True, and null input values return False.
The result of the function is deterministic per row; that is, all calls of the operator for a given row return the same result. The sample ratio is probabilistic, but is generally within a thousandth of a percentile of the actual range when the underlying dataset is millions of records or larger.
The following example filters approximately 50% of the rows from t
and returns a count that is approximately half the number of rows in t
:
SELECT COUNT(*) FROM t WHERE SAMPLE_RATIO(0.5)
Expression
Example
Description
CAST(expr AS type
)
CAST(1.25 AS FLOAT)
Converts an expression to another data type. For conversions from a TEXT type, use TRY_CAST.
TRY_CAST(text_expr AS type
)
TRY_CAST('1.25' AS FLOAT)
Converts a text to a non-text type, returning null if the conversion could not be successfully performed.
ENCODE_TEXT(none_encoded_str
)
ENCODE_TEXT(long_str)
Converts a none-encoded text type to a dictionary-encoded text type.
FROM/TO: | TINYINT | SMALLINT | INTEGER | BIGINT | FLOAT | DOUBLE | DECIMAL | TEXT | BOOLEAN | DATE | TIME | TIMESTAMP
TINYINT | - | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | n/a
SMALLINT | Yes | - | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | n/a
INTEGER | Yes | Yes | - | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No
BIGINT | Yes | Yes | Yes | - | Yes | Yes | Yes | Yes | No | No | No | No
FLOAT | Yes | Yes | Yes | Yes | - | Yes | No | Yes | No | No | No | No
DOUBLE | Yes | Yes | Yes | Yes | Yes | - | No | Yes | No | No | No | n/a
DECIMAL | Yes | Yes | Yes | Yes | Yes | Yes | - | Yes | No | No | No | n/a
TEXT | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | - | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST) | Yes (Use TRY_CAST)
BOOLEAN | No | No | Yes | No | No | No | No | Yes | - | n/a | n/a | n/a
DATE | No | No | No | No | No | No | No | Yes | n/a | - | No | Yes
TIME | No | No | No | No | No | No | No | Yes | n/a | No | - | n/a
TIMESTAMP | No | No | No | No | No | No | No | Yes | n/a | Yes | No | -
Generates random string data.
Generates a series of integer values.
Generates a series of timestamp values from start_timestamp
to end_timestamp
.
Given a query input with entity keys and timestamps, and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session.
Given a query input of entity keys/IDs, a set of feature columns, and a metric column, scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and metric, computes the similarity of each entity in the first input to the search vector based on their similarity. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) for the search vector, which can optionally be TF/IDF weighted.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, with the value for each bin computed by applying the aggregate specified by agg_type to the z values of all points in the bin. Allowed aggregate types are AVG, COUNT, SUM, MIN, and MAX.
Similar to tf_geo_rasterize
, but also computes the slope and aspect per output bin. Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Given a distance-weighted directed graph, specified as a query CURSOR input consisting of the starting and ending node for each edge and a distance, and a specified origin and destination node, computes the shortest distance-weighted path through the graph between origin_node and destination_node.
Given a distance-weighted directed graph, specified as a query CURSOR input consisting of the starting and ending node for each edge and a distance, and a specified origin node, computes the shortest distance-weighted path distance between the origin_node and every other node in the graph.
Loads one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs. If not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs.
Computes the Mandelbrot set over the complex domain [x_min
, x_max
), [y_min
, y_max
), discretizing the xy-space into an output of dimensions x_pixels
X y_pixels
.
Returns metadata for one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min
, x_max
, y_min
, y_max
arguments.
Process a raster input to derive contour lines or regions and output as LINESTRING or POLYGON for rendering or further processing.
Aggregate point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type
) across all points in each bin as the output value for the bin.
Used for generating top-k signals where 'k' represents the maximum number of antennas to consider at each geographic location. The full relevant parameter name is strongest_k_sources_per_terrain_bin.
Taking a set of point elevations and a set of signal source locations as input, tf_rf_prop_max_signal
executes line-of-sight 2.5D RF signal propagation from the provided sources over a binned 2.5D elevation grid derived from the provided point locations, calculating the max signal in dBm at each grid cell, using the formula for free-space power loss.
Function
Description
ST_Centroid
Computes the geometric center of a geometry as a POINT.
ST_GeomFromText(WKT)
Return a specified geometry value from Well-known Text representation.
ST_GeomFromText(WKT, SRID)
Return a specified geometry value from Well-known Text representation and an SRID.
ST_GeogFromText(WKT)
Return a specified geography value from Well-known Text representation.
ST_GeogFromText(WKT, SRID)
Return a specified geography value from Well-known Text representation and an SRID.
ST_Point(double lon, double lat)
Return a point constructed on the fly from the provided coordinate values. Constant coordinates result in construction of a POINT literal.
Example: ST_Contains(poly4326, ST_SetSRID(ST_Point(lon, lat), 4326))
ST_Buffer
Returns a geometry covering all points within a specified distance from the input geometry. Performed by the GEOS module. The output is currently limited to the MULTIPOLYGON type.
Calculations are in the units of the input geometry’s SRID. Buffer distance is expressed in the same units. Example:
SELECT ST_Buffer('LINESTRING(0 0, 10 0, 10 10)', 1.0);
Special processing is automatically applied to WGS84 input geometries (SRID=4326) to limit buffer distortion:
Implementation first determines the best planar SRID to which to project the 4326 input geometry.
Preferred SRIDs are UTM and Lambert (LAEA) North/South zones, with Mercator used as a fallback.
Buffer distance is interpreted as distance in meters (units of all planar SRIDs being considered).
The input geometry is transformed to the best planar SRID and handed to GEOS, along with buffer distance.
The buffer geometry built by GEOS is then transformed back to SRID=4326 and returned.
Example: Build 10-meter buffer geometries (SRID=4326) with limited distortion:
SELECT ST_Buffer(poly4326, 10.0) FROM tbl;
Function
Description
ST_TRANSFORM
Returns a geometry with its coordinates transformed to a different spatial reference. Currently, WGS84 to Web Mercator transform is supported. For example:
ST_DISTANCE(
ST_TRANSFORM(ST_GeomFromText('POINT(-71.064544 42.28787)', 4326), 900913),
ST_GeomFromText('POINT(-13189665.9329505 3960189.38265416)', 900913)
)
ST_TRANSFORM
is not currently supported in projections. It can be used only to transform geo inputs to other functions, such as ST_DISTANCE.
ST_SETSRID
Set the SRID to a specific integer value. For example:
ST_TRANSFORM(
ST_SETSRID(ST_GeomFromText('POINT(-71.064544 42.28787)'), 4326), 900913 )
Function
Description
ST_X
Returns the X value from a POINT column.
ST_Y
Returns the Y value from a POINT column.
ST_XMIN
Returns X minima of a geometry.
ST_XMAX
Returns X maxima of a geometry.
ST_YMIN
Returns Y minima of a geometry.
ST_YMAX
Returns Y maxima of a geometry.
ST_STARTPOINT
Returns the first point of a LINESTRING as a POINT.
ST_ENDPOINT
Returns the last point of a LINESTRING as a POINT.
ST_POINTN
Return the Nth point of a LINESTRING as a POINT.
ST_NPOINTS
Returns the number of points in a geometry.
ST_NRINGS
Returns the number of rings in a POLYGON or a MULTIPOLYGON.
ST_SRID
Returns the spatial reference identifier for the underlying object.
ST_NUMGEOMETRIES
Returns the MULTI count of MULTIPOINT, MULTILINESTRING or MULTIPOLYGON. Returns 1 for non-MULTI geometry.
Function
Description
ST_INTERSECTION
Returns a geometry representing an intersection of two geometries; that is, the section that is shared between the two input geometries. Performed by the GEOS module.
The output is currently limited to MULTIPOLYGON type, because HEAVY.AI does not support mixed geometry types within a geometry column, and ST_INTERSECTION
can potentially return points, lines, and polygons from a single intersection operation.
Lower-dimension intersecting features such as points and line strings are returned as very small buffers around those features. If needed, true points can be recovered by applying the ST_CENTROID method to point intersection results. In addition, ST_PERIMETER/2 of resulting line intersection polygons can be used to approximate line length.
Empty/NULL geometry outputs are not currently supported.
Examples:
SELECT ST_Intersection('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))');
SELECT ST_Area(ST_Intersection(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;
ST_DIFFERENCE
Returns a geometry representing the portion of the first input geometry that does not intersect with the second input geometry. Performed by the GEOS module. Input order is important; the return geometry is always a section of the first input geometry.
The output is currently limited to MULTIPOLYGON type, for the same reasons described in ST_INTERSECTION
. Similar post-processing methods can be applied if needed.
Empty/NULL geometry outputs are not currently supported.
Examples:
SELECT ST_Difference('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))');
SELECT ST_Area(ST_Difference(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;
ST_UNION
Returns a geometry representing the union (or combination) of the two input geometries. Performed by the GEOS module.
The output is currently limited to MULTIPOLYGON type for the same reasons described in ST_INTERSECTION
. Similar post-processing methods can be applied if needed.
Empty/NULL geometry outputs are not currently supported.
Examples:
SELECT ST_UNION('POLYGON((0 0,3 0,3 3,0 3))', 'POLYGON((1 1,4 1,4 4,1 4))');
SELECT ST_AREA(ST_UNION(poly, 'POLYGON((1 1,3 1,3 3,1 3,1 1))')) FROM tbl;
Function
Description
ST_DISTANCE
Returns shortest planar distance between geometries. For example:
ST_DISTANCE(poly1, ST_GeomFromText('POINT(0 0)'))
Returns shortest geodesic distance between two points, in meters, if given two point geographies. Point geographies can be specified through casts from point geometries or as literals. For example:
ST_DISTANCE(
CastToGeography(p2),
ST_GeogFromText('POINT(2.5559 49.0083)', 4326)
)
SELECT a.name,
ST_DISTANCE(
CAST(a.pt AS GEOGRAPHY),
CAST(b.pt AS GEOGRAPHY)
) AS dist_meters
FROM starting_point a, destination_points b;
You can also calculate the distance between a POLYGON and a POINT. If both fields use SRID 4326, then the calculated distance is in 4326 units (degrees). If both fields use SRID 4326, and both are transformed into 900913, then the results are in 900913 units (meters).
The following SQL code returns the names of polygons where the distance between the point and polygon is less than 1,000 meters.
SELECT a.poly_name FROM poly a, point b WHERE ST_DISTANCE(
ST_TRANSFORM(b.location,900913),
ST_TRANSFORM(a.heavyai_geo,900913)
) < 1000;
ST_EQUALS
Returns TRUE if the first input geometry and the second input geometry are spatially equal; that is, they occupy the same space. Different orderings of points can be accepted as equal if they represent the same geometry structure.
POINTs comparison is performed natively. All other geometry comparisons are performed by GEOS.
If input geometries are both uncompressed or compressed, all comparisons to identify equality are precise. For mixed combinations, the comparisons are performed with a compression-specific tolerance that allows recognition of equality despite subtle precision losses that the compression may introduce. Note: Geo columns and literals with SRID=4326
are compressed by default.
Examples:
SELECT COUNT(*) FROM tbl WHERE ST_EQUALS('POINT(2 2)', pt);
SELECT ST_EQUALS('POLYGON ((0 0,1 0,0 1))', 'POLYGON ((0 0,0 0.5,0 1,1 0,0 0))');
ST_MAXDISTANCE
Returns the longest planar distance between geometries. In effect, this is the diameter of a circle that encloses both geometries. For example:
Currently supported variants:
ST_CONTAINS
Returns true if the first geometry object contains the second object. For example:
You can also use ST_CONTAINS
to:
Return the count of polys that contain the point (here as WKT):
SELECT count(*) FROM geo1 WHERE ST_CONTAINS(poly1, 'POINT(0 0)');
Return names from a polys table that contain points in a points table:
SELECT a.name FROM polys a, points b WHERE ST_CONTAINS(a.heavyai_geo, b.location);
Return names from a polys table that contain points in a points table, using a single point in WKT instead of a field in another table:
SELECT name FROM poly WHERE ST_CONTAINS(
heavyai_geo, ST_GeomFromText('POINT(-98.4886935 29.4260508)', 4326)
);
ST_INTERSECTS
Returns true if two geometries intersect spatially, false if they do not share space. For example:
SELECT ST_INTERSECTS(
'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))',
'POINT(1 1)'
) FROM tbl;
ST_AREA
Returns the area of planar areas covered by POLYGON and MULTIPOLYGON geometries. For example:
SELECT ST_AREA(
'POLYGON((1 0, 0 1, -1 0, 0 -1, 1 0),(0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0))'
) FROM tbl;
ST_AREA
does not support calculation of geographic areas, but rather uses planar coordinates. Geographies must first be projected in order to use ST_AREA
. You can do this ahead of time before import or at runtime, ideally using an equal area projection (for example, a national equal-area Lambert projection). The area is calculated in the projection's units. For example, you might use Web Mercator runtime projection to get the area of a polygon in square meters:
ST_AREA(
ST_TRANSFORM(
ST_GeomFromText(
'POLYGON((-76.6168198439371 39.9703199555959,
-80.5189990254673 40.6493554919257,
-82.5189990254673 42.6493554919257,
-76.6168198439371 39.9703199555959)
)', 4326
),
900913)
)
Web Mercator is not an equal area projection, however. Unless compensated by a scaling factor, Web Mercator areas can vary considerably by latitude.
ST_PERIMETER
Returns the cartesian perimeter of POLYGON and MULTIPOLYGON geometries. For example:
SELECT ST_PERIMETER('POLYGON(
(1 0, 0 1, -1 0, 0 -1, 1 0),
(0.1 0, 0 0.1, -0.1 0, 0 -0.1, 0.1 0)
)'
)
from tbl;
It also returns the geodesic perimeter of POLYGON and MULTIPOLYGON geographies. For example:
SELECT ST_PERIMETER(
ST_GeogFromText(
'POLYGON(
(-76.6168198439371 39.9703199555959,
-80.5189990254673 40.6493554919257,
-82.5189990254673 42.6493554919257,
-76.6168198439371 39.9703199555959)
)',
4326)
)
from tbl;
ST_LENGTH
Returns the cartesian length of LINESTRING geometries. For example:
SELECT ST_LENGTH('LINESTRING(1 0, 0 1, -1 0, 0 -1, 1 0)') FROM tbl;
It also returns the geodesic length of LINESTRING geographies. For example:
SELECT ST_LENGTH(
ST_GeogFromText('LINESTRING(
-76.6168198439371 39.9703199555959,
-80.5189990254673 40.6493554919257,
-82.5189990254673 42.6493554919257)',
4326)
) FROM tbl;
ST_WITHIN
Returns true if geometry A is completely within geometry B. For example the following SELECT
statement returns true:
SELECT ST_WITHIN(
'POLYGON ((1 1, 1 2, 2 2, 2 1))',
'POLYGON ((0 0, 0 3, 3 3, 3 0))'
) FROM tbl;
ST_DWITHIN
Returns true if the geometries are within the specified distance of each one another. Distance is specified in units defined by the spatial reference system of the geometries. For example:
SELECT ST_DWITHIN(
'POINT(1 1)',
'LINESTRING (1 2,10 10,3 3)', 2.0
) FROM tbl;
ST_DWITHIN
supports geodesic distances between geographies, currently limited to geographic points. For example, you can check whether Los Angeles and Paris, specified as WGS84 geographic point literals, are within 10,000km of one another.
SELECT ST_DWITHIN(
ST_GeogFromText(
'POINT(-118.4079 33.9434)', 4326),
ST_GeogFromText('POINT(2.5559 49.0083)',
4326 ),
10000000.0) FROM tbl;
ST_DFULLYWITHIN
Returns true if the geometries are fully within the specified distance of one another. Distance is specified in units defined by the spatial reference system of the geometries. For example:
SELECT ST_DFULLYWITHIN(
'POINT(1 1)',
'LINESTRING (1 2,10 10,3 3)',
10.0) FROM tbl;
This function supports:
ST_DFULLYWITHIN(POINT, LINESTRING, distance)
ST_DFULLYWITHIN(LINESTRING, POINT, distance)
ST_DISJOINT
Returns true if the geometries are spatially disjoint (that is, the geometries do not overlap or touch). For example:
SELECT ST_DISJOINT(
'POINT(1 1)',
'LINESTRING (0 0,3 3)'
) FROM tbl;
<num_strings>
The number of strings to randomly generate.
BIGINT
<string_length>
Length of the generated strings.
BIGINT
id
Integer id of output, starting at 0 and increasing monotonically
Column<BIGINT>
rand_str
Random String
Column<TEXT ENCODING DICT>
Type
Size
Example
LINESTRING
Variable
A sequence of 2 or more points and the lines that connect them. For example: LINESTRING(0 0,1 1,1 2)
MULTIPOLYGON
Variable
A set of one or more polygons. For example:MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))
POINT
Variable
A point described by two coordinates. When the coordinates are longitude and latitude, HEAVY.AI stores longitude first, and then latitude. For example: POINT(0 0)
POLYGON
Variable
A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). For example: POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))
MULTIPOINT
Variable
A set of one or more points. For example: MULTIPOINT((0 0), (1 1), (2 2))
MULTILINESTRING
Variable
A set of one or more associated lines, each of two or more points. For example: MULTILINESTRING((0 0, 1 0, 2 0), (0 1, 1 1, 2 1))
Release notes for currently supported releases
Use of HEAVY.AI is subject to the terms of the HEAVY.AI End User License Agreement (EULA).
The latest release of HEAVY.AI is 6.4.3.
6.4.3 | 6.4.2 | 6.4.1 | 6.4.0 | 6.2.7 | 6.2.5 | 6.2.4 | 6.2.1 | 6.2.0 | 6.1.1 | 6.1.0 | 6.0.0
For release notes for releases that are no longer supported, as well as links to documentation for those releases, see Archived Release Notes.
As with any software upgrade, it is important that you back up your data before you upgrade HEAVY.AI. Each release introduces efficiencies that are not necessarily compatible with earlier releases of the platform. HEAVY.AI is never expected to be backward compatible.
For assistance during the upgrade process, contact HEAVY.AI support at support@heavy.ai before you upgrade your system.
Added feature flag ui/session_create_timeout
with a default value of 10000 (10 seconds) for modifying login request timeout.
Adds the HeavyDB server configuration parameter enable-foreign-table-scheduled-refresh
for enabling or disabling automated foreign table scheduled refreshes.
Fixes a crash that could occur when S3 CSV-backed foreign tables with append refreshes are refreshed multiple times.
Fixes a crash that could occur when foreign tables with geospatial columns are refreshed after cache evictions.
Fixes a crash that could occur when querying foreign tables backed by Parquet files with empty row groups.
Fixes an error that could occur when select queries used in ODBC foreign tables reference case-sensitive column names.
Fixes a crash that could occur when CSV backed foreign tables with geospatial columns are refreshed without updates to the underlying CSV files.
Fixes a crash that could occur in heavysql when executing the \detect command with geospatial files.
Fixes a casting error that could occur when executing left join queries.
Fixes a crash that could occur when accessing the disk cache on HeavyDB servers with the read-only configuration parameter enabled.
Fixes an error that could occur when executing queries that project geospatial columns.
Fixes a crash that could occur when executing the EXTRACT function with the ISODOW date_part
parameter on GPUs.
Fixes an error that could occur when importing CSV or Parquet files with text columns containing more than 32,767 characters into HeavyDB NONE ENCODED text columns.
Fixes a Vulkan Device Lost error that could occur when rendering complex polygon data with thousands of polygons in a single pixel.
Optimizes result set buffer allocations for CPU group by queries.
Enables trimming of white spaces in quoted fields during CSV file imports, when both the trim_spaces
and quoted
options are set.
Fixes an error that could occur when importing CSV files with quoted fields that are surrounded by white spaces.
Fixes a crash that could occur when tables are reordered for range join queries.
Fixes a crash that could occur for join queries with intermediate projections.
Fixes a crash that could occur for queries with geospatial join predicate functions that use literal parameters.
Fixes an issue where queries could intermittently and incorrectly return error responses.
Fixes an issue where queries could return incorrect results when filter push-down through joins is enabled.
Fixes a crash that could occur for queries with join predicates that compare string dictionary encoded and nonencoded text columns.
Fixes an issue where hash table optimizations could ignore the max-cacheable-hashtable-size-bytes
and hashtable-cache-total-bytes
server configuration parameters.
Fixes an issue where sharded table join queries that are executed on multiple GPUs could return incorrect results.
Fixes a crash that could occur when sharded table join queries are executed on multiple GPUs with the from-table-reordering
server configuration parameter enabled.
Multilayer support for Contour and Windbarb charts.
Enable Contour charts by default (feature flag: ui/enable_contour_chart
).
Support custom SQL measures in Contour charts.
Restrict export from Heavy Immerse by enabling trial mode (feature flag: ui/enable_trial_mode
). Trial mode enables a super user to restrict export capabilities for users who have the immerse_trial_mode
role.
Allow MULTILINESTRING to be used in selectors for Linemap charts.
Allow MULTILINESTRING to be used in Immerse SQL Editor.
This release features general availability of data connectors for PostgreSQL, beta Immerse connectors for Snowflake and Redshift, and SQL support for Google BigQuery and Hive (beta). These managed data connections let you use HEAVY.AI as an acceleration platform wherever your source data may live. Scheduling and automated caching ensure that fast analytics are always running on the latest available data.
Immerse features four new chart types: Contour, Cross-section, Wind barb, and Skew-t. While especially useful for atmospheric and geotechnical data visualization, Contour and Cross-section also have more general application.
Major improvements for time series analysis have been added. This includes an Immerse user interface for time series, and a large number of SQL window function additions and performance enhancements.
The release also includes two major architectural improvements:
The ability to perform cross-database queries, both in SQL and in Immerse, increasing flexibility across the board. For example, you can now easily build an Immerse dashboard showing system usage combined with business data. You might also make a read-only database of data shared across a set of users.
Render queries no longer block other GPU queries. In many use cases, renders can be significantly slower than other common queries. This should result in significant performance gains, particularly in map-heavy dashboards.
Adds support for cross-database SELECT, UPDATE, and DELETE queries (see the example following this list).
Support for MODE SQL aggregate.
Add support for strtok_to_array.
Support for ST_NumGeometries().
Support ST_TRANSFORM applied to literal geo types.
Enhanced query tracing ensures all child operations for a query_id are properly logged with that ID.
Adds support for BigQuery and Hive HeavyConnect and import.
Adds support for table restore from S3 archive files.
Improves integer column type detection in Snowflake import/HeavyConnect data preview.
Adds HeavyConnect and import support for Parquet required scalar fields.
Improves import status error message when an invalid request is made.
Support POINT, LINESTRING, and POLYGON input and output types in table functions.
Support default values for scalar table function arguments.
Add tf_raster_contour table function to generate contours given x, y, and z arguments. This function is exposed in Immerse, but has additional capabilities available in SQL, such as supporting floating point contour intervals.
Return file path and file name from tf_point_cloud_metadata table function.
The previous length limit of 32K characters per value for none-encoded text columns has been lifted; none-encoded text values can now be up to 2^31 - 1 characters (approximately 2.1 billion characters).
Support array column outputs from table functions.
Add TEXT ENCODING DICT and Array<TEXT ENCODING DICT> type support for runtime functions/UDFs.
Allow transient TEXT ENCODING DICT column inputs into table functions.
Support COUNT_IF function.
Support SUM_IF function.
Support NTH_VALUE window function.
Support NTH_VALUE_IN_FRAME window function.
Support FIRST_VALUE_IN_FRAME and LAST_VALUE_IN_FRAME window functions.
Support CONDITIONAL_TRUE_EVENT.
Support ForwardFill and BackwardFill window functions to fill in missing (null) values based on previous non-null values in window.
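A minimal sketch of the cross-database support referenced above: a table in another database is addressed by qualifying it with the database name. The sales_db database and orders table here are hypothetical:
SELECT region, SUM(revenue) AS total_revenue
FROM sales_db.orders
GROUP BY region;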
Fixes an issue where databases with duplicate names but different capitalization could be created.
Fixes an issue where raster imports could fail due to inconsistent band names.
Fixes an issue that could occur when DUMP/RESTORE commands were executed concurrently.
Fixes an issue where certain session updates do not occur when licenses are updated.
Fixes an issue where import/HeavyConnect data preview could return unsupported decimal types.
Fixes an issue where import/HeavyConnect data preview for PostgreSQL queries involving variable length columns could result in an error.
Fixes an issue where NULL elements in array columns with the NOT NULL constraint were not projected correctly.
Fixes a crash that could occur in certain scenarios where UPDATE and DELETE queries contain subqueries.
Fixes an issue where ingesting ODBC unsigned SQL_BIGINT into HeavyDB BIGINT columns using HeavyConnect or import could result in storage of incorrect data.
Fixes a crash that could occur in distributed configurations, when switching databases and accessing log based system tables with rolled off logs.
Fixes an error that occurred when importing Parquet files that did not contain statistics metadata.
Ensure query hint is propagated to subqueries.
Fix crash that could occur when LAG_IN_FRAME or LEAD_IN_FRAME were missing order-by or frame clause.
Fix bug where LAST_VALUE window function could return wrong results.
Fix issue where “Cannot use fast path for COUNT DISTINCT” could be reported from a count distinct operation.
Various bug fixes for support of VALUES() clause.
Improve handling of generic input expressions for window aggregate functions.
Fix bug where COUNT(*) and COUNT(1) over window frame could cause crash.
Fix wrong coordinate used for origin_y_bin in tf_raster_graph_shortest_slope_weighted_path.
Speed up table function binding in cases with no ColumnList arguments.
Support arrays of transient encoded strings into table functions.
Render queries no longer block parallel execution queue for other queries.
The Immerse PostgreSQL connector is now generally available, and is joined by public betas of Redshift and Snowflake.
New chart types:
Contour chart. Contours can be applied to any geo point data, but are especially useful when applied to smoothly-varying pressure and elevation data. They can help reveal general patterns even in noisy primary data. Contours can be based on any point data, including that from regular raster grids like a temperature surface, or from sparse points like LiDAR data.
Cross-section chart. As the name suggests, this allows a new view on 2.5D or 3D datasets, where a selected data dimension is plotted on the vertical axis for a slice of geographic data. In addition to looking in profile at parts of the atmosphere in weather modeling, this can also be used to look at geological sections below terrain.
Representing vector force fields takes a step forward with the Wind barb plot. Wind barbs are multidimensional symbols which convey at a glance both strength and direction.
Skew-T is a highly specialized multidimensional chart used primarily by meteorologists. Skew-Ts are heavily used in weather modeling and can help predict, for example, where thunderstorms or dry lightning are likely to occur.
Initial support for window functions in Immerse, enabling time lag analysis in charts. For example, you can now plot month-over-month or quarter-over-quarter sales or web traffic volume.
For categorical data, in addition to supporting aggregations based on the number of unique values, MODE is now supported. This supports the creation of groups based on the most-common value.
Fixed an issue where a restarted server can potentially deadlock if the first two queries are executed at the same time and use different executors.
Fixed an issue where COUNT DISTINCT or APPROX_COUNT_DISTINCT, when run on a CASE statement that outputs literal strings, could cause a crash.
Fixes a crash when using COUNT(*) or COUNT(1) as a window function, for example, COUNT(*) OVER (PARTITION BY x).
Fixes an incorrect result when using a date column as a partition key, like SUM(x) OVER (PARTITION BY DATE_COL).
Improves the performance of window functions when a literal expression is used as one of the input expressions of window functions like LAG(x, 1).
Improves query execution preparation phase by preventing redundant processing of the same nodes, especially when a complex input query is evaluated.
Fixes geometry type checking for range join operator that could cause a crash in some cases.
Resolves an issue where a query with many projection expressions (for example, more than 50 8-byte output expressions) could return an incorrect result when using a window function expression.
Fixes an issue where the Resultset recycler ignores the server configuration size metrics.
Fixes a race condition where multiple catalogs could be created on initialization, resulting in possible deadlocks, server hangs, increased memory pressure, and slow performance.
Fixes a crash encountered during some SQL queries when the read-only setting was enabled.
Fixes an issue in tf_raster_graph_shortest_slope_weighted_path
table function that would lead some inputs to be incorrectly rejected.
In Release 6.2.0, Heavy Immerse adds animation and a control panel system. HeavyConnect now includes connectors for Redshift, Snowflake, and PostGIS. The SQL system is extended with support for casting and time-based window functions. GeoSQL gets direct LiDAR import, multipoints, and multilinestrings, as well as graph network algorithms. Other enhancements include performance improvements and reduced memory requirements across the product.
TRY_CAST support for string to numeric, timestamp, date, and time casts (see the example at the end of this list).
Implicit and explicit CAST support for numeric, timestamp, date, and time to TEXT type.
CAST support from Timestamp(0|3|6|9) types to Time(0) type.
Concat (||) operator now supports multiple nonliteral inputs.
JSON_VALUE operator to extract fields from JSON string columns.
BASE64_ENCODE and BASE64_DECODE operators for BASE64 encoding/decoding of string columns.
POSITION operator to extract index of search string from strings.
Add hash-based count distinct operator to better handle case of sparse columns.
Support MULTILINESTRING OGC geospatial type.
Support MULTIPOINT OGC geospatial type.
Support ST_NumGeometries.
Support ST_ConvexHull and ST_ConcaveHull.
Improved table reordering to maximize invocation of accelerated geo joins.
Support ST_POINT, ST_TRANSFORM and ST_SETSRID as expressions for probing columns in point-to-point distance joins.
Support accelerated overlaps hash join for ST_DWITHIN clause comparing two POINT columns.
Support for POLYGON to MULTIPOLYGON promotion in SQLImporter.
RANGE window function FRAME support for Time, Date, and Timestamp types.
Support LEAD_IN_FRAME / LAG_IN_FRAME window functions that compute LEAD / LAG in reference to a window frame.
Add TextEncodingNone support for scalar UDF and extension functions.
Support array inputs and outputs to table functions.
Support literal interval types for UDTFs.
Add support for range annotations on literal inputs to table functions.
Make max CPU threads configurable via a startup flag.
Support array types for Arrow/select_ipc endpoints.
Add support for query hint to control dynamic watchdog.
Add query hint to control Cuda block and grid size for query.
Adds an echo all
option to heavysql that prints all executed commands and queries.
Improved decimal precision error messages during table creation.
Add support for file roll offs to HeavyConnect local and S3 file use cases.
Add HeavyConnect support for non-AWS S3-compatible endpoints.
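As referenced in the TRY_CAST item above, a short sketch of fault-tolerant casting against a hypothetical raw_orders table; TRY_CAST returns NULL instead of raising an error when a value cannot be converted:
SELECT
  TRY_CAST(price_str AS DOUBLE) AS price,
  TRY_CAST(order_date_str AS DATE) AS order_date
FROM raw_orders;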
LiDAR
Add tf_point_cloud_metadata
table function to read metadata from one or more LiDAR/point cloud files, optionally filtered by a bounding box.
Add tf_load_point_cloud
table function to load data from one or more LiDAR/point cloud files, optionally filtered by bounding box and optionally cached in memory for subsequent queries.
Graph and Path Functions
Add tf_graph_shortest_path
table function to compute the shortest edge-weighted path between two points in a graph constructed from an input edge list.
Add tf_graph_shortest_paths_distances
table function to compute the shortest edge-weighted distances between a starting point and all other points in a graph constructed from an input edge list.
Add tf_grid_graph_shortest_slope_weighted_path
table function to compute the shortest slope-weighted path between two points along rasterized data.
Enhanced Spatial Aggregations
Support configurable aggregation types for tf_geo_rasterize
and tf_geo_rasterize_slope
table functions, allowing for AVG, MIN, MAX, SUM, and COUNT aggregations.
Support two-pass gaussian blur aggregation post-processing for tf_geo_rasterize
and tf_geo_rasterize_slope
table functions.
RF Propagation Extension Improvements
Add dynamic ray splitting to tf_rf_prop_max_signal
table function for improved performance and terrain coverage.
Add variant of tf_rf_prop_max_signal
table function that takes per-RF source/tower transmission power (watts) and frequency (MHz).
Add variant of generate_series
table function that generates series of timestamps between a start and end timestamp at specified time intervals.
ST_Centroid now automatically picks up SRID of underlying geometry.
Fixed a crash that occurred when ST_DISTANCE had an ST_POINT input for its hash table probe column.
Fixed an issue where a query hint would not propagate to a subquery.
Improved overloaded table function type deduction eliminates type mismatches when table function outputs are used downstream.
Properly handle cases of RF sources outside of terrain bounding box for tf_rf_prop_max_signal
.
Fixed an issue where specification of unsupported GEOMETRY column type during table creation could lead to a crash.
Fixed a crash that could occur due to execution of concurrent create and drop table commands.
Fixed a crash that could occur when accessing the Dashboards system table.
Fixed a crash that could occur as a result of type mismatches in ITAS queries.
Fixed an issue that could occur due to band name sanitization during raster imports.
Fixed a memory leak that could occur when dropping temporary tables.
Fixed a crash that could occur due to concurrent execution of a select query and long-running write query on the same table.
Disables render group assignment by default.
Supports rendering of MULTILINESTRING geometries.
Memory footprint required for compositing renders on multi-GPU systems is significantly reduced. Any multi-GPU system will see improvements, but the change is most noticeable on systems with 4 or more GPUs. For example, rendering a 1400 x 1400 image saves approximately 450 MB of memory when using 8 GPUs for a query. Multi-GPU system configurations should be able to set the res-gpu-mem
configuration flag value lower as a result, freeing memory for other subsystems.
Adds INFO logging of peak render memory usage for the lifetime of the server process. The render memory logged is peak render query output buffer size (controlled with the render-mem-bytes
configuration flag) and peak render buffer usage (controlled with the res-gpu-mem
configuration flag). These peaks are logged in the INFO log on server shutdown, when GPU memory is cleared via clear_gpu_memory
endpoint, or when a new peak is reached. These logged peaks can be useful to adjust the render-mem-bytes
and res-gpu-mem
configuration flags to improve memory utilization by avoiding reserving memory that might go unused. Examples of the log messages:
When a new peak render-mem-bytes
is reached: New peak render buffer usage (render-mem-bytes):37206200 of 1000000000
When a new peak res-gpu-mem
is reached: New peak render memory usage (res-gpu-mem): 166033024
Peaks logged on server shutdown or on clear_gpu_memory
:
Render memory peak utilization:
Query result buffer (render-mem-bytes): 37206200 of 1000000000
Images and buffers (res-gpu_mem): 660330240
Total allocated: 1660330240
Fixed an issue that occurred when trying to hit-test a multiline SQL expression.
Dashboard and chart image export
Crossfilter replay
Improved popup support in the base 3D chart
New Multilayer CPU rendered Geo charts: Pointmap, Linemap, and Choropleth (Beta)
Control Panel (Beta)
Redshift, Snowflake, and PostGIS HeavyConnect support (Beta)
Skew-T chart (Beta)
Support for limiting the number of charts in a dashboard through the ui/limit_charts_per_dashboard
feature flag. The default value is 0 (no limit).
Fixed an importer error related to duplicate column names.
Various bug fixes and user-interface improvements.
Adds support for POLYGON to MULTIPOLYGON promotion in the load table Thrift APIs and SQLImporter.
Fixes an issue that caused an intermittent KafkaImporter crash on CentOS 7.9.
Fixes an issue that caused incorrect results in multiple aggregations of date columns that include COUNT DISTINCT.
Adds support for limiting the number of charts in a dashboard through the ui/limit_charts_per_dashboard
feature flag. The default value is 0 (no limit).
Adds a new set of log-based (request_logs, server_logs, web_server_logs, and web_server_access_logs) system tables.
Adds a new Request Logs and Monitoring dashboard.
Adds a new SHOW CREATE SERVER command, which displays the create server DDL for a specified foreign server.
Adds support for non-super-user execution of SHOW CREATE TABLE on views.
Adds a new ALTER SESSION SET EXECUTOR_DEVICE command, which updates the type of executor device (CPU or GPU) for the current session.
Adds a new ALTER SESSION SET CURRENT_DATABASE command, which updates the connected database for the current session.
Adds a new ALTER DATABASE OWNER TO command, which allows super users to change the owner of a database.
Extends the INSERT command to support inserting multiple rows at once/batch insert.
Add support for default values on shard key columns.
Add initial support for window function framing, including support for the BETWEEN ROWS clause for all numeric and date/time types, and the BETWEEN RANGE clause for numeric types (see the example below).
Enable group-by push down for UNION ALL such that group-by and aggregate operations applied to the output of UNION ALL are evaluated on the UNION ALL inputs, improving performance.
Add support for LCASE (alias for LOWER), UCASE (alias for UPPER), LEFT, and RIGHT string functions
Adds a new trim_spaces option for delimited file import.
(BETA) Adds data import/COPY FROM support from Relational Database Management Systems and Data Warehouses using the Open Database Connectivity (ODBC) interface.
Initial support for CUDA streams to parallelize GPU computation and memory transfers.
Increase per-GPU projection limit with watchdog enabled from 32M to 128M rows to take advantage of improvements in large projection support in recent releases.
Add new SHOW FUNCTIONS and SHOW FUNCTIONS DETAILS commands to show registered compile-time UDFs and extension functions in the system and their arguments, and SHOW RUNTIME FUNCTIONS [DETAILS] and SHOW RUNTIME TABLE FUNCTIONS [DETAILS] to show user-defined runtime functions/table functions.
Support timestamp inputs and outputs for table functions.
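As referenced in the window function framing item above, a sketch of a ROWS-based frame computing a trailing four-row moving average; the events table and its columns are hypothetical:
SELECT
  entity_id,
  ts,
  AVG(val) OVER (
    PARTITION BY entity_id
    ORDER BY ts
    ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
  ) AS moving_avg
FROM events;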
Advanced Analytics
Add tf_compute_dwell_times table function, which, given a query input with entity keys and timestamps, and parameters specifying the minimum session time, minimum number of session records, and max inactive seconds, outputs all unique sessions found in the data with the duration of each session (dwell time).
Add tf_feature_self_similarity table function, that given a query input of entity keys/IDs, a set of feature columns, and a metric column, scores each pair of entities based on their similarity, computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
Add tf_feature_similarity table function, which, given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, scores the similarity of each entity in the first input to the search vector, computed as the cosine similarity of the feature column(s) for each entity with those of the search vector, optionally TF/IDF weighted.
Fixed an issue where some join queries on ODBC-backed foreign tables can return empty result sets for the first query.
Fixed an issue where append refreshes on foreign tables backed by delimited or regex-parsed files ignore file-path filter and sort options.
Fixed a crash that can occur when very large dates are specified for the refresh_start_date_time foreign table option.
Fixed a crash that can occur when a foreign table’s data source is updated within a refresh window.
Fixed an issue where databases owned by deleted user accounts are not visible, and adds a restriction that prevents dropping users who own databases.
Fixed an issue where joins on string dictionary-encoded columns would trigger spurious none-encoded string translation.
Fixed issue with certain UNION ALL query patterns, such as UNION ALL containing logical values.
Disabled KEY_FOR_STRING for UNNEST operations on string dictionary-encoded columns, to prevent a crash.
Fixed an issue where logged stats for raster imports could overflow.
Fixed an issue where joins on synthetic tables (for example, created with a VALUES statement or table function without an underlying table) could crash.
Fixed an issue where require checks used on string dictionary inputs to a table function could crash.
Fixed a crash and/or wrong query results that can occur when a decimal literal is used in a nested query.
Fixed a potential crash when attempting to auto-retry a render immediately after an OutOfGpuMemory exception is thrown. This crash can occur only if the render-oom-retry-threshold
configuration option is set.
Fixed a regression where polygons with transparent colors are rendered opaque.
Corrects an issue with point/symbol rendering with explicit Vega projections where the projection was not being updated when panned/zoomed if the query did not change.
Significant improvements in hit-testing consistency and stability when rendering queries with subqueries, window functions, or table functions.
Font size controls.
Borders and Zebra Striping in Table charts.
Justify content in Table charts.
Customizable polygon border control.
Allow measure date formatting for table charts.
Extend y-axis on Vega combo charts to end at the next whole value past the highest data point.
Add layer visibility toggle to kebab dropdown on multi-layer raster charts.
Made unsaved changes modal less aggressive.
Custom Source Table Functions Browser
Don’t show unsaved warning modal after adding default filter set.
(BETA) PostgreSQL connector.
Allows maxBounds to be set in servers.json.
Toggle dashboard unsaved when updating annotations.
Dashboard save state behavior fixes.
Table Chart order by group keys when present.
Use key_for_string when ordering by known dictionary measures/dimensions.
Add default formatting for date/time on table chart.
Add admin feature flag to hide key manager.
Customizable polygon border color and existing border bug fixes.
Cannot append to table using PostgreSQL connector.
Building a raster chart with the layer visibility toggle feature flag enabled causes a crash.
Support for fast string functions on dictionary-encoded text columns (the default), including LOWER, UPPER, INITCAP, TRIM/LTRIM/RTRIM, LPAD/RPAD, REVERSE, REPEAT, SUBSTRING/SUBSTR, REPLACE, OVERLAY, SPLIT_PART, REGEXP_REPLACE, REGEXP_SUBSTR, and CONCAT (||). The output of these expressions can be chained, grouped-by, and used in both the left and right side of join predicates.
Support for fast string equality/inequality operations without the previous requirement of watchdog disablement when the two columns do not share dictionaries.
Support for fast case statements with multiple text column inputs that do not share dictionary-encoded strings.
Support for ENCODE_TEXT to encode none-encoded strings, which can then be grouped on and manipulated like dictionary-encoded strings. This operator is not intended for interactive use at scale but instead for ELT-like scenarios. Use the new server flag watchdog-none-encoded-string-translation-limit
to set the upper cardinality allowed for such operations (1,000,000 by default).
Support for UNION ALL is enabled by default, and now works across string columns that do not share dictionaries with significantly better performance than in the previous release.
Window functions now support expressions in the PARTITION BY and ORDER BY clauses.
Support for subqueries in CASE statement clauses.
SHOW USER DETAILS is changed to only list those users with access to the currently-selected database. Previously, all users on the HeavyDB instance would be listed; this is still available to superusers with SHOW ALL USER DETAILS.
10X improvements in initial join performance (including geo joins) through faster, parallelized hash table construction, removing redundant inter-thread hash table computation.
Improved join ordering to avoid loop joins in certain scenarios.
Parallel compilation of queries as well as inter-executor generated code increases concurrency and throughput in common, Immerse-driven scenarios by up to 20%. Also decreases latency for a single user interacting with dashboards or issuing SQL queries in a way that required new plans to be code-generated.
New result set recycler allows query substeps (expensive in subqueries) to be cached using SQL hints (/*+ keep_result */), dramatically improving performance where the subquery is reused across multiple queries (for example, in Immerse) and only outer steps of the query vary.
The default for the header option of COPY TO
to a CSV/TSV file has been changed from 'false'
to 'true'
.
Faster dictionary map in StringDictionaryProxy, accelerating various string operations involving transient entries.
Arrow execution endpoints now use multiple executors and can run concurrently like queries issued to the Thrift endpoints.
Addition of sparse dictionary output capability for Arrow queries, which automatically creates a subset of a string dictionary to send via Arrow when it detects that it is faster than sending the full, unfiltered dictionary. This provides orders-of-magnitude better server- and client-side performance and scalability for common cases where large dictionary-encoded text columns are filtered or top-k sorted such that only a small subset of dictionary entries are needed in the result set.
ST_INTERSECTS now can operate directly on top of compressed (the default) coordinates, leading to 2-3X increase in speed.
New table function framework allows for both system and user-defined table functions. Table functions can run on both CPU and GPU and are designed for efficient, scalable execution of custom algorithms in-situ on data that might be hard or impossible to implement in SQL.
Support for generate_series table function (similar to Postgres) for easy and fast integer series generation, particularly useful for left joins against binned tables to fill in gaps, whether for visualization or downstream operations like window functions, and generate_random_strings for generation of string columns of a user-defined size and cardinality.
Support for geo_rasterize and geo_rasterize_slope table functions to efficiently bin vector data into gap-free bins, with the optional ability to fill in null values, apply box blur, and compute slope and aspect ratios
Initial support for HeavyRF, a module that allows for real-time, ray-based computation of signal propagation, taking inputs of both terrain data and real or hypothetical signal sources.
Beta support for Python-defined scalar (row-level) and tabular User Defined Functions (UDFs and UDTFs), using the RBC library to translate Numba python code into LLVM IR that is JITed into query execution code for fast, scalable, custom user-defined capabilities.
Complete redesign and rewrite of Parquet import to one that is more robust, efficient, and performant.
Adds support for import from regex parsed files on either the server file system or S3 using the COPY FROM command.
The geo
and parquet
WITH
options of COPY FROM
have been deprecated and replaced by source_type
. Using the deprecated syntax generates the following:
Deprecation Warning: COPY FROM WITH (geo='true') is deprecated. Use WITH (source_type='geo_file') instead.
Update any scripts you have to replace the deprecated syntax with the new syntax. For more information, see CSV/TSV Import.
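For example, a script using the deprecated option could be updated as follows (the table name and file path are placeholders):
-- Deprecated syntax:
COPY my_geo_table FROM '/path/to/data.geojson' WITH (geo='true');

-- Replacement syntax:
COPY my_geo_table FROM '/path/to/data.geojson' WITH (source_type='geo_file');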
(BETA) Adds support for import from RDMS/data warehouses using the COPY FROM command.
Adds system table support.
A new default information_schema database contains 10 new system tables that provide information regarding CPU/GPU memory utilization, storage space utilization, database objects, and database object permissions.
New system dashboards that enable intuitive visualization of system resource utilization and user roles and permissions.
Support for Zarr and NetCDF raster file import.
You can now import raster files with ground control points geospatial references.
Support for file path filtering, globbing, and sorting when importing geo and raster files.
Improved error messaging when attempting to save a dashboard that uses a duplicate dashboard name.
Support for connections to delimited files on either the server file system or S3. S3 support includes an option to use the S3 Select API, which provides better performance but with limitations on supported column types.
Support for connections to Parquet files on either the server file system or S3. HeavyConnect leverages Parquet metadata to provide efficient data access and row group-level filter push down.
Parquet column type coercion. Convert Parquet column types to more memory-efficient HeavyDB column types for use cases that guarantee no loss of information.
Connections to regex parsed files on either the server file system or S3. This enables you to query unstructured text files, such as logs, by specifying regular expression patterns that extract components of the text files into table columns.
(BETA) Support for connections to Relational Database Management Systems and Data Warehouses, leveraging the Open Database Connectivity (ODBC) interface to provide seamless access to data.
(BETA) ODBC column type coercion. Use HeavyConnect to convert ODBC column types to more memory-efficient HeavyDB column types for use cases that guarantee no loss of information.
Support for scheduled data refreshes. Specify a start date time and interval at which connected data gets refreshed.
Adds support for disk-level caching. By default, data fetched by HeavyConnect is cached at the disk level in addition to normal CPU/GPU-level caching. This provides better overall query performance for network-based connections, such as S3, and for systems with limited CPU/GPU memory capacity. Disk cache size and level can be set through the HeavyDB server configuration.
Adds support for file path filtering, globbing, and sorting for Parquet, delimited, and regex parsed file use cases.
Complete redesign and rewrite of the Parquet detect_column_types Thrift API. The Parquet detect/data preview feature is now more robust, efficient, and performant.
Change to query interrupt mechanism allowing certain classes of queries such as loop joins to be easily and quickly interrupted.
Fixed a crash that could occur with joins on predicates that had functions on the left-hand side expression (for example, geoToH3).
Fix crash that could occur with Arrow queries that did not return results.
Avoid building metadata for empty result sets.
Fixes a crash that can occur when executing queries on GPU that involve a baseline group by and variable length column projections.
Fixes some table query concurrency bottlenecks. Previously, queries such as INSERT, TRUNCATE, and DROP TABLE required system wide locks to execute and would therefore block execution of other unrelated queries. These kinds of queries can now be executed concurrently.
Fixes a crash that can occur on server restart when the disk cache is enabled and tables with cached data are deleted.
Fixes a crash that can occur when the max_rows table option is altered for an empty table.
Fixes an issue in the JDBC driver where tables from multiple databases are listed even when a single database is specified.
Fixes an issue where raster POINT column type import would incorrectly throw an exception.
Fixes a crash that can occur when restoring a dump for a table with previously deleted columns.
Updates the export COPY TO command to include headers by default.
Removes the file_type parameter from the create_table Thrift API. This parameter was not used.
Fixes a crash that can occur when executing SQL commands containing comments.
Fixed the setting for default database (DEFAULT_DB) being ignored in a SAML login for a user who already exists.
The OpenGL renderer driver has been fully removed as of this release. Vulkan is the only available driver and enables a more modern, flexible API. As a result, the renderer-use-vulkan-driver
program option has been removed. Remove any references to that program option from your configuration files. For more on the move to the Vulkan driver, see Vulkan Renderer.
A novel polygon rendering algorithm is now used as the default when rendering polygons. This algorithm does no triangulation nor does it require “render groups” (a hidden column to assist the old polygon rendering algorithm). However, the render groups column is still added on import as a fallback. See Importing Geospatial Data for more on render group deprecation.
You can now hit-test certain render queries with subqueries more effectively. For example, if the subquery is only used for filter predicates, renders should now be sped up and hit-testing more flexible.
Render times are now logged correctly (“render_vega-COMPLETED nonce:2 Total Execution: (ms), Total Render: (ms)”). The execution time and render time were incorrectly logged as 0 in Releases 5.9 and 5.10.
Fixes a regression introduced in Release 5.10.0 when hit-testing an Immerse cohort-generated query. The hit-test would result in an error such as the following: “Cannot find column in hit-test cache for query …”
Resolves a crash when trying to hit-test render queries with window functions or cursorless table functions.
Fixes an issue where a multi-layer, multi-GPU render with a poly or line mark as the first layer can result in ghosting artifacts if the query associated with that layer resulted in 0 rows.
Fixes an issue when switching from a density accumulation scale with an auto-computed range (via min/max/+-1stStdDev/+-2ndStdDev) to a scale with an explicitly defined range. In this case, the explicitly defined range was not reflected.
Removes a legacy constraint that prevented you from rendering a query that referenced one or more tables with more than one polygon/multipolygon column.
Improved speed of server interface using the Thrift binary protocol.
Data Manager has been redesigned to support HeavyConnect via S3, server file uploads, and expanded raster file support.
Introduced the new Gauge chart type.
Introduced a Welcome Panel and Help Center menu.
Rebranded interface for HEAVY.AI. Updated styles for the default dark and light themes.
Added option to toggle the legend on the New Combo chart.
Added configuration option for setting the default chart type.
Added configuration option for hiding specified chart types.
Added auto-selection of geo columns and measures on geo chart types.
Adjusted maximum bins for larger Top-N groups.
Added support for cross-domain configuration without SSL.
BETA: Added filter support for global custom expressions.
BETA: Introduced the new iframe chart type.
BETA: Introduced Arrow transport protocol for a limited number of chart types.
Fixed various minor UI and performance issues.
Fixed parameter creation from dashboard title in Safari browser.
Fixed displaying of the Jupyter logo when integration is unavailable.
Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session (dwell time).
entity_id
Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes.
Column<TEXT ENCODING DICT | BIGINT>
site_id
Column containing keys/IDs of dwell “sites” or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned h3 hex IDs for geographic location.
Column<TEXT ENCODING DICT | BIGINT>
ts
Column denoting the time at which an event occurred.
Column<TIMESTAMP(0|3|6|9)>
min_dwell_seconds
Constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered record for an entity_id at a site_id to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapses between an entity’s first and last ordered timestamp records at a site, these records are not considered a valid session and a dwell time for that session is not calculated.
BIGINT (other integer types are automatically casted to BIGINT)
min_dwell_points
A constant integer value specifying the minimum number of successive observations (in ts
timestamp order) required to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3, but only two consecutive records exist for a user at a site before they move to a new site, no dwell time is calculated for the user.
BIGINT (other integer types are automatically casted to BIGINT)
max_inactive_seconds
A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity id at a given site id is 86500 seconds, the session is considered ended at the first timestamp-ordered record, and a new session is started at the timestamp of the second record.
BIGINT (other integer types are automatically casted to BIGINT)
entity_id
The ID of the entity for the output dwell time, identical to the corresponding entity_id
column in the input.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the entity_id
input column type)
site_id
The site ID for the output dwell time, identical to the corresponding site_id
column in the input.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id
input column type)
prev_site_id
The site ID for the session preceding the current session, which might be a different site_id
, the same site_id
(if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds
threshold was exceeded), or null
if the last site_id
visited was null
.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id
input column type)
next_site_id
The site id for the session after the current session, which might be a different site_id
, the same site_id
(if successive records for an entity at the same site were split into multiple sessions due to exceeding the max_inactive_seconds
threshold), or null
if the next site_id
visited was null
.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type will be the same as the site_id
input column type)
session_id
An auto-incrementing session ID specific/relative to the current entity_id
, starting from 1 (first session) up to the total number of valid sessions for an entity_id
, such that each valid session dwell time increments the session_id
for an entity by 1.
Column<INT>
start_seq_id
The index of the nth timestamp (ts
-ordered) record for a given entity denoting the start of the current output row's session.
Column<INT>
dwell_time_sec
The duration in seconds for the session.
Column<INT>
num_dwell_points
The number of records/observations constituting the current output row's session.
Column<INT>
Example
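A minimal sketch of calling tf_compute_dwell_times against a hypothetical web_events table. The three column arguments are supplied through a single CURSOR subquery in the order listed above, followed by the scalar parameters:
SELECT * FROM TABLE(
  tf_compute_dwell_times(
    CURSOR(SELECT user_ip, page_id, visit_ts FROM web_events),
    3600,   -- min_dwell_seconds: sessions must span at least one hour
    3,      -- min_dwell_points: sessions must contain at least 3 records
    86400   -- max_inactive_seconds: split sessions after a day of inactivity
  )
)
ORDER BY entity_id, session_id;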
Installing OmniSci on Docker
In this section, you will find the recipes to install the HEAVY.AI platform using Docker.
In this section, you will find a recipe to install the HEAVY.AI platform on Red Hat and derivatives like Rocky Linux.
Returns metadata for one or more las
or laz
point cloud/LiDAR files from a local file or directory source, optionally constraining the bounding box for metadata retrieved to the lon/lat bounding box specified by the x_min
, x_max
, y_min
, y_max
arguments.
Note: The specified path must be contained in the global allowed-import-paths setting; otherwise, an error is returned.
Input Arguments
path
The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths
.
TEXT ENCODING NONE
x_min
(optional)
Min x-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
x_max
(optional)
Max x-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
y_min
(optional)
Min y-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
y_max
(optional)
Max y-coordinate value for point cloud files to retrieve metadata from.
DOUBLE
Output Columns
file_path
Full path for the las or laz file
Column<TEXT ENCODING DICT>
file_name
Filename for the las or laz file
Column<TEXT ENCODING DICT>
file_source_id
File source id per file metadata
Column<SMALLINT>
version_major
LAS version major number
Column<SMALLINT>
version_minor
LAS version minor number
Column<SMALLINT>
creation_year
Data creation year
Column<SMALLINT>
is_compressed
Whether data is compressed, i.e. LAZ format
Column<BOOLEAN>
num_points
Number of points in this file
Column<BIGINT>
num_dims
Number of data dimensions for this file
Column<SMALLINT>
point_len
Not currently used
Column<SMALLINT>
has_time
Whether data has time value
COLUMN<BOOLEAN>
has_color
Whether data contains rgb color value
COLUMN<BOOLEAN>
has_wave
Whether data contains wave info
COLUMN<BOOLEAN>
has_infrared
Whether data contains infrared value
COLUMN<BOOLEAN>
has_14_point_format
Data adheres to 14-attribute standard
COLUMN<BOOLEAN>
specified_utm_zone
UTM zone of data
Column<INT>
x_min_source
Minimum x-coordinate in source projection
Column<DOUBLE>
x_max_source
Maximum x-coordinate in source projection
Column<DOUBLE>
y_min_source
Minimum y-coordinate in source projection
Column<DOUBLE>
y_max_source
Maximum y-coordinate in source projection
Column<DOUBLE>
z_min_source
Minimum z-coordinate in source projection
Column<DOUBLE>
z_max_source
Maximum z-coordinate in source projection
Column<DOUBLE>
x_min_4326
Minimum x-coordinate in lon/lat degrees
Column<DOUBLE>
x_max_4326
Maximum x-coordinate in lon/lat degrees
Column<DOUBLE>
y_min_4326
Minimum y-coordinate in lon/lat degrees
Column<DOUBLE>
y_max_4326
Maximum y-coordinate in lon/lat degrees
Column<DOUBLE>
z_min_4326
Minimum z-coordinate in meters above sea level (AMSL)
Column<DOUBLE>
z_max_4326
Maximum z-coordinate in meters above sea level (AMSL)
Column<DOUBLE>
Example
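A minimal sketch, using a placeholder directory (which must fall under allowed-import-paths) and an optional lon/lat bounding box supplied positionally in the order listed above:
SELECT file_name, num_points, x_min_4326, x_max_4326, y_min_4326, y_max_4326
FROM TABLE(
  tf_point_cloud_metadata(
    '/var/lib/heavyai/import/lidar/*.laz',  -- path (placeholder)
    -122.6, -122.3,                         -- x_min, x_max (longitude)
    37.6, 37.9                              -- y_min, y_max (latitude)
  )
);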
Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and metric, computes the similarity of each entity in the first input to the search vector based on their similarity. The score is computed as the cosine similarity of the feature column(s) for each entity with the feature column(s) for the search vector, which can optionally be TF/IDF weighted.
primary_key
Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function will compute the similarity to the search vector specified by the comparison_features
cursor. Examples include countries, census block groups, user IDs of website visitors, and aircraft call signs.
Column<TEXT ENCODING DICT | INT | BIGINT>
pivot_features
One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key
based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key
entities are compared only by the census block groups visited, regardless of time overlap.
Column<TEXT ENCODING DICT | INT | BIGINT>
metric
Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is simply COUNT(*)
such that feature overlaps are weighted by the number of co-occurrences.
Column<INT | BIGINT | FLOAT | DOUBLE>
comparison_pivot_features
One or more columns constituting a compound feature for the search vector. These should match the pivot_features columns in number of sub-features, types, and semantics.
Column<TEXT ENCODING DICT | INT | BIGINT>
comparison_metric
Column denoting the values used as input for the cosine similarity metric computation from the search vector. In many cases, this is simply COUNT(*)
such that feature overlaps are weighted by the number of co-occurrences.
Column<TEXT ENCODING DICT | INT | BIGINT>
use_tf_idf
Boolean constant specifying whether TF/IDF weighting should be used in the cosine similarity computation.
BOOLEAN
class
ID of the primary key
being compared against the search vector.
Column<TEXT ENCODING DICT | INT | BIGINT> (type is the same as the primary_key
input column)
similarity_score
Computed cosine similarity score between each primary_key
pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).
Column<FLOAT>
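A sketch of a tf_feature_similarity call against a hypothetical flights table, comparing every aircraft to the airport-visit profile of one aircraft. The first CURSOR supplies primary_key, pivot_features, and metric; the second supplies the comparison pivot features and metric:
SELECT * FROM TABLE(
  tf_feature_similarity(
    CURSOR(SELECT tail_number, dest_airport, COUNT(*)
           FROM flights
           GROUP BY tail_number, dest_airport),
    CURSOR(SELECT dest_airport, COUNT(*)
           FROM flights
           WHERE tail_number = 'N12345'
           GROUP BY dest_airport),
    false   -- use_tf_idf
  )
)
ORDER BY similarity_score DESC;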
Given a query input of entity keys/IDs (for example, airplane tail numbers), a set of feature columns (for example, airports visited), and a metric column (for example number of times each airport was visited), scores each pair of entities based on their similarity. The score is computed as the cosine similarity of the feature column(s) between each entity pair, which can optionally be TF/IDF weighted.
primary_key
Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function computes co-similarity. Examples include countries, census block groups, user IDs of website visitors, and aircraft callsigns.
Column<TEXT ENCODING DICT | INT | BIGINT>
pivot_features
One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by primary_key
based on whether they visited the same census block group in the same hour. If a single census block group feature column is used, the primary_key
entities would be compared only by the census block groups visited, regardless of time overlap.
Column<TEXT ENCODING DICT | INT | BIGINT>
metric
Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is COUNT(*)
such that feature overlaps are weighted by the number of co-occurrences.
Column<INT | BIGINT | FLOAT | DOUBLE>
use_tf_idf
Boolean constant specifying whether TF/IDF weighting should be used in the cosine similarity computation.
BOOLEAN
class1
ID of the first primary key
in the pair-wise comparison.
Column<TEXT ENCODING DICT | INT | BIGINT> (type is the same as the primary_key
input column)
class2
ID of the second primary key
in the pair-wise comparison. Because the computed similarity score for a pair of primary keys
is order-invariant, results are output only for ordering such that class1
<= class2
. For primary keys of type TextEncodingDict
, the order is based on the internal integer IDs for each string value and not lexicographic ordering.
Column<TEXT ENCODING DICT | INT | BIGINT> (type is the same as the primary_key
input column)
similarity_score
Computed cosine similarity score between each primary_key
pair, with values falling between 0 (completely dissimilar) and 1 (completely similar).
Column<Float>
Example
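A sketch of a tf_feature_self_similarity call against the same hypothetical flights table, scoring every pair of aircraft by the similarity of the airports they visit:
SELECT * FROM TABLE(
  tf_feature_self_similarity(
    CURSOR(SELECT tail_number, dest_airport, COUNT(*)
           FROM flights
           GROUP BY tail_number, dest_airport),
    true    -- use_tf_idf: down-weight very common airports
  )
)
ORDER BY similarity_score DESC
LIMIT 100;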
Given a distance-weighted directed graph, specified as a query CURSOR
input containing the starting and ending node for each edge and a distance, and a specified origin and destination node, tf_graph_shortest_path
computes the shortest distance-weighted path through the graph between origin_node
and destination_node
, returning a row for each node along the computed shortest path, with the traversal-ordered index of that node and the cumulative distance from the origin_node
to that node. If either origin_node
or destination_node
do not exist, an error is returned.
Input Arguments
node1
Origin node column in directed edge list CURSOR
Column< INT | BIGINT | TEXT ENCODED DICT>
node2
Destination node column in directed edge list CURSOR
Column< INT | BIGINT | TEXT ENCODED DICT> (must be the same type as node1
)
distance
Distance between origin and destination node in directed edge list CURSOR
Column< INT | BIGINT | FLOAT | DOUBLE >
origin_node
The origin node to start graph traversal from. If not a value present in edge_list.node1
, an empty result set is returned.
BIGINT | TEXT ENCODED DICT
destination_node
The destination node to finish graph traversal at. If not a value present in edge_list.node1
, an empty result set is returned.
BIGINT | TEXT ENCODED DICT
Output Columns
path_step
The index of this node along the path traversal from origin_node
to destination_node
, with the first node (the origin_node)
indexed as 1.
Column< INT >
node
The current node along the path traversal from origin_node
to destination_node
. The first node (as denoted by path_step
= 1) will always be the input origin_node
, and the final node (as denoted by MAX(path_step)
) will always be the input destination_node
.
Column < INT | BIGINT | TEXT ENCODED DICT> (same type as the node1
and node2
input columns)
cume_distance
The cumulative distance adding all input distance
values from the origin_node
to the current node.
Column < INT | BIGINT | FLOAT | DOUBLE> (same type as the distance
input column)
Example A
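A sketch using a hypothetical routes table with airport codes as node IDs. The edge list is supplied as a CURSOR of node1, node2, and distance, followed by the origin and destination nodes:
SELECT * FROM TABLE(
  tf_graph_shortest_path(
    CURSOR(SELECT origin_airport, dest_airport, distance_km FROM routes),
    'LAX',   -- origin_node
    'JFK'    -- destination_node
  )
)
ORDER BY path_step;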
Example B
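A variant of the same sketch with BIGINT node IDs from a hypothetical road_edges table:
SELECT * FROM TABLE(
  tf_graph_shortest_path(
    CURSOR(SELECT from_node_id, to_node_id, length_m FROM road_edges),
    1001,    -- origin_node
    2002     -- destination_node
  )
)
ORDER BY path_step;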
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid. The aggregate performed to compute the output value for each bin is specified by agg_type
, with allowed aggregate types of AVG
, COUNT
, SUM
, MIN
, and MAX
. If neighborhood_fill_radius
is set greater than 0, a blur pass/kernel will be computed on top of the results according to the optionally-specified fill_agg_type
, with allowed types of GAUSS_AVG, BOX_AVG
, COUNT
, SUM
, MIN
, and MAX
(if not specified, defaults to GAUSS_AVG
, or a Gaussian-average kernel). If fill_only_nulls
is set to true, only null bins from the first aggregate step have final output values computed from the blur pass; otherwise, all values are affected by the blur pass.
Note that the arguments to bound the spatial output grid (x_min, x_max, y_min, y_max) are optional; however, either all or none of these arguments must be supplied. If the arguments are not supplied, the spatial output grid is bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize
table function, these filters will also constrain the output range.
x
X-coordinate column or expression
Column<FLOAT | DOUBLE>
y
Y-coordinate column or expression
Column<FLOAT | DOUBLE>
z
Z-coordinate column or expression. The output value for each bin is computed by applying the specified aggregate (agg_type) to the z-values of all points falling in that bin.
Column<FLOAT | DOUBLE>
agg_type
The aggregate to be performed to compute the output z-column. Should be one of 'AVG'
, 'COUNT'
, 'SUM',
'MIN'
, or 'MAX'.
TEXT ENCODING NONE
fill_agg_type
(optional)
The aggregate to be performed when computing the blur pass on the output bins. Should be one of 'AVG'
, 'COUNT'
, 'SUM'
, 'MIN'
, 'MAX'
,
'GAUSS_AVG'
, or 'BOX_AVG'
. Note that AVG
is synonymous with GAUSS_AVG
in this context, and the default fill_agg_type
if not specified is GAUSS_AVG
.
TEXT ENCODING NONE
bin_dim_meters
The width and height of each x/y bin in meters. If geographic_coords
is not set to true, the input x/y units are already assumed to be in meters.
DOUBLE
geographic_coords
If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.
BOOLEAN
neighborhood_fill_radius
The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius
bins.
DOUBLE
fill_only_nulls
Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).
BOOLEAN
x_min
(optional)
Min x-coordinate value (in input units) for the spatial output grid.
DOUBLE
x_max
(optional)
Max x-coordinate value (in input units) for the spatial output grid.
DOUBLE
y_min
(optional)
Min y-coordinate value (in input units) for the spatial output grid.
DOUBLE
y_max
(optional)
Max y-coordinate value (in input units) for the spatial output grid.
DOUBLE
x
The x-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input x-coordinate column/expression)
y
The y-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input y-coordinate column/expression)
z
The aggregated z-value (per agg_type) of all input data assigned to a given spatial bin.
Column<FLOAT | DOUBLE> (same as input z-coordinate column/expression)
Example
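A sketch of a tf_geo_rasterize call over a hypothetical lidar_points table. The x, y, and z columns are supplied through a single CURSOR, followed by the scalar arguments in the order listed above (optional arguments omitted):
SELECT * FROM TABLE(
  tf_geo_rasterize(
    CURSOR(SELECT lon, lat, elevation FROM lidar_points),
    'MAX',   -- agg_type
    10.0,    -- bin_dim_meters
    true,    -- geographic_coords: inputs are lon/lat degrees
    0,       -- neighborhood_fill_radius: no blur pass
    false    -- fill_only_nulls
  )
);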
Similar to tf_geo_rasterize
, but also computes the slope and aspect per output bin.
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally only filling in null-valued bins if fill_only_nulls is set to true. The slope and aspect are then computed for every bin, based on the z-values of that bin and its neighboring bins. The slope can be returned in degrees or as a fraction between 0 and 1, depending on the boolean argument to compute_slope_in_degrees.
Note that the bounds of the spatial output grid will be bounded by the x/y range of the input query, and if SQL filters are applied on the output of the tf_geo_rasterize_slope
table function, these filters will also constrain the output range.
x
Input x-coordinate column or expression.
Column<FLOAT | DOUBLE>
y
Input y-coordinate column or expression.
Column<FLOAT | DOUBLE>
z
Input z-coordinate column or expression. The output bin is computed as the maximum z-value for all points falling in each bin.
Column<FLOAT | DOUBLE>
agg_type
The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.
TEXT ENCODING NONE
bin_dim_meters
The width and height of each x/y bin in meters. If geographic_coords
is not set to true, the input x/y units are already assumed to be in meters.
DOUBLE
geographic_coords
If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.
BOOLEAN
neighborhood_fill_radius
The radius in bins to compute the box blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius
bins.
BIGINT
fill_only_nulls
Specifies that the box blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).
BOOLEAN
compute_slope_in_degrees
If true, specifies the slope should be computed in degrees (with 0 degrees perfectly flat and 90 degrees perfectly vertical). If false, specifies the slope should be computed as a fraction from 0 (flat) to 1 (vertical). In a future release, we are planning to move the default output to percentage slope.
BOOLEAN
x
The x-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input x column/expression)
y
The y-coordinates for the centroids of the output spatial bins.
Column<FLOAT | DOUBLE> (same as input y column/expression)
z
The maximum z-coordinate of all input data assigned to a given spatial bin.
Column<FLOAT | DOUBLE> (same as input z column/expression)
slope
The average slope of an output grid cell (in degrees or a fraction between 0 and 1, depending on the argument to compute_slope_in_degrees
).
Column<FLOAT | DOUBLE> (same as input z column/expression)
aspect
The direction from 0 to 360 degrees pointing towards the maximum downhill gradient, with 0 degrees being due north and moving clockwise from N (0°) -> NE (45°) -> E (90°) -> SE (135°) -> S (180°) -> SW (225°) -> W (270°) -> NW (315°).
Column<FLOAT | DOUBLE> (same as input z column/expression)
Example
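A minimal sketch of a tf_geo_rasterize_slope call, under the same assumptions as the previous example (hypothetical raster_points table, named-argument syntax, assumed raster cursor name); it keeps only the steeper bins:
SELECT x, y, z, slope, aspect
FROM TABLE(
  tf_geo_rasterize_slope(
    raster => CURSOR(SELECT lon, lat, elevation FROM raster_points),
    agg_type => 'MAX',
    bin_dim_meters => 30.0,
    geographic_coords => TRUE,
    neighborhood_fill_radius => 1,
    fill_only_nulls => FALSE,
    compute_slope_in_degrees => TRUE
  )
)
WHERE slope > 45.0;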
Aggregates point data into x/y bins of a given size in meters to form a dense spatial grid, computing the specified aggregate (using agg_type) across all points in each bin as the output value for the bin. A Gaussian average is then taken over the neighboring bins, with the number of bins specified by neighborhood_fill_radius, optionally only filling in null-valued bins if fill_only_nulls is set to true.
The graph shortest path is then computed between an origin point on the grid specified by origin_x and origin_y and a destination point on the grid specified by destination_x and destination_y, where the shortest path is weighted by the nth exponent of the computed slope between a bin and its neighbors, with the nth exponent specified by slope_weighted_exponent. A maximum allowed traversable slope can be specified by slope_pct_max, such that no traversal is considered or allowed between bins with absolute computed slopes greater than the percentage specified by slope_pct_max.
Input Arguments
x
Input x-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE>
y
Input y-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE> (must be the same type as x
)
z
Input z-coordinate column or expression of the data to be rasterized.
Column <FLOAT | DOUBLE>
agg_type
The aggregate to be performed to compute the output z-column. Should be one of 'AVG', 'COUNT', 'SUM', 'MIN', or 'MAX'.
TEXT ENCODING NONE
bin_dim
The width and height of each x/y bin. If geographic_coords is true, the input x/y units will be translated to meters according to a local coordinate transform appropriate for the x/y bounds of the data.
DOUBLE
geographic_coords
If true, specifies that the input x/y coordinates are in lon/lat degrees. The function will then compute a mapping of degrees to meters based on the center coordinate between x_min/x_max and y_min/y_max.
BOOLEAN
neighborhood_bin_radius
The radius in bins to compute the gaussian blur/filter over, such that each output bin will be the average value of all bins within neighborhood_fill_radius
bins.
BIGINT
fill_only_nulls
Specifies that the gaussian blur should only be used to provide output values for null output bins (i.e. bins that contained no data points or had only data points with null Z-values).
BOOLEAN
origin_x
The x-coordinate for the starting point for the graph traversal, in input (not bin) units.
DOUBLE
origin_y
The y-coordinate for the starting point for the graph traversal, in input (not bin) units.
DOUBLE
destination_x
The x-coordinate for the destination point for the graph traversal, in input (not bin) units.
DOUBLE
destination_y
The y-coordinate for the destination point for the graph traversal, in input (not bin) units.
DOUBLE
slope_weighted_exponent
The slope weight between neighboring raster cells will be weighted by the slope_weighted_exponent
power. A value of 1 signifies that the raw slopes between neighboring cells should be used; increasing this value above 1 more heavily penalizes paths that traverse steep slopes.
DOUBLE
slope_pct_max
The max absolute value of slopes (measured in percentages) between neighboring raster cells that will be considered for traversal. A neighboring graph cell with an absolute slope greater than this amount is not considered in the shortest slope-weighted path graph traversal.
DOUBLE
Output Columns
Computes the Mandelbrot set over the complex domain [x_min, x_max), [y_min, y_max), discretizing the xy-space into an output of dimensions x_pixels x y_pixels. The output for each cell is the number of iterations needed to escape to infinity, up to and including the specified max_iterations.
x_pixels
32-bit integer
y_pixels
32-bit integer
x_min
DOUBLE
x_max
DOUBLE
y_min
DOUBLE
y_max
DOUBLE
max_iterations
32-bit integer
Example
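A minimal sketch of a call that computes the classic view of the set on a 1024 x 1024 grid; the named-argument syntax and the exact function-variant name used are assumptions:
SELECT * FROM TABLE(
  tf_mandelbrot(
    x_pixels => 1024,
    y_pixels => 1024,
    x_min => -2.5,
    x_max => 1.0,
    y_min => -1.0,
    y_max => 1.0,
    max_iterations => 256
  )
);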
x_pixels
32-bit integer
y_pixels
32-bit integer
x_min
DOUBLE
x_max
DOUBLE
y_min
DOUBLE
y_max
DOUBLE
max_iterations
32-bit integer
x_pixels
32-bit integer
y_pixels
32-bit integer
x_min
DOUBLE
x_max
DOUBLE
y_min
DOUBLE
y_max
DOUBLE
max_iterations
32-bit integer
x_pixels
32-bit integer
y_pixels
32-bit integer
x_min
DOUBLE
x_max
DOUBLE
y_min
DOUBLE
y_max
DOUBLE
max_iterations
32-bit integer
Given a distance-weighted directed graph, specified as a query CURSOR input containing the starting and ending node for each edge and a distance, and a specified origin node, tf_graph_shortest_paths_distances computes the shortest distance-weighted path distance between the origin_node and every other node in the graph. It returns a row for each node in the graph, with output columns consisting of the input origin_node, the given destination_node, the distance for the shortest path between the two nodes, and the number of edges or graph "hops" between the two nodes. If origin_node does not exist in the node1 column of the edge_list CURSOR, an error is returned.
Input Arguments
node1
Origin node column in directed edge list CURSOR
Column<INT | BIGINT | TEXT ENCODED DICT>
node2
Destination node column in directed edge list CURSOR
Column<INT | BIGINT | TEXT ENCODED DICT> (must be the same type as node1
)
distance
Distance between origin and destination node in directed edge list CURSOR
Column<INT | BIGINT | FLOAT | DOUBLE>
origin_node
The origin node to start graph traversal from. If the value is not present in edge_list.node1, an empty result set is returned.
BIGINT | TEXT ENCODED DICT
Output Columns
origin_node
Starting node in graph traversal. Always equal to input origin_node
.
Column <INT | BIGINT | TEXT ENCODED DICT> (same type as the node1
and node2
input columns)
destination_node
Final node in graph traversal. Will be equal to one of values of node2
input column.
Column <INT | BIGINT | TEXT ENCODED DICT> (same type as the node1
and node2
input columns)
distance
Cumulative distance between origin and destination node for shortest path graph traversal.
Column<INT | BIGINT | FLOAT | DOUBLE> (same type as the distance
input column)
num_edges_traversed
Number of edges (or "hops") traversed in the graph to arrive at destination_node
from origin_node
for the shortest path graph traversal between these two nodes.
Column <INT>
Example A
Example B
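A minimal sketch, assuming a hypothetical flight_edges table whose columns map to node1, node2, and distance, and using the named-argument syntax:
SELECT origin_node, destination_node, distance, num_edges_traversed
FROM TABLE(
  tf_graph_shortest_paths_distances(
    edge_list => CURSOR(SELECT origin_airport, dest_airport, distance_miles
                        FROM flight_edges),
    origin_node => 'DEN'
  )
)
ORDER BY distance
LIMIT 10;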
Loads one or more las or laz point cloud/LiDAR files from a local file or directory source, optionally transforming the output SRID to out_srs (if not specified, output points are automatically transformed to EPSG:4326 lon/lat pairs).
If use_cache is set to true, an internal point cloud-specific cache is used to hold the results per input file; if the same source is queried again, the cache significantly speeds up the query, allowing for interactive querying of a point cloud source. If the results of tf_load_point_cloud will only be consumed once (for example, as part of a CREATE TABLE statement), it is highly recommended that use_cache is set to false or left unspecified (it defaults to false) to avoid the performance and memory overhead incurred by use of the cache.
The bounds of the data retrieved can optionally be specified with the x_min, x_max, y_min, y_max arguments. These arguments can be useful when you want to retrieve a small geographic area from a large point-cloud file set, because files containing data outside the specified bounding box are quickly skipped by tf_load_point_cloud, requiring only a quick read of the spatial metadata for each file.
Input Arguments
path
The path of the file or directory containing the las/laz file or files. Can contain globs. Path must be in allowed-import-paths
.
TEXT ENCODING NONE
out_srs
(optional)
EPSG code of the output SRID. If not specified, output points are automatically converted to lon/lat (EPSG 4326).
TEXT ENCODING NONE
use_cache
(optional)
If true, use internal point cloud cache. Useful for inline querying of the output of tf_load_point_cloud
. Should turn off for one-shot queries or when creating a table from the output, as adding data to the cache incurs performance and memory usage overhead. If not specified, is defaulted to false
/off.
BOOLEAN
x_min
(optional)
Min x-coordinate value (in degrees) for the output data.
DOUBLE
x_max
(optional)
Max x-coordinate value (in degrees) for the output data.
DOUBLE
y_min
(optional)
Min y-coordinate value (in degrees) for the output data.
DOUBLE
y_max
(optional)
Max y-coordinate value (in degrees) for the output data.
DOUBLE
Output Columns
x
Point x-coordinate
Column<DOUBLE>
y
Point y-coordinate
Column<DOUBLE>
z
Point z-coordinate
Column<DOUBLE>
intensity
Point intensity
Column<INT>
return_num
The ordered number of the return for a given LiDAR pulse. The first returns (lowest return numbers) are generally associated with the highest-elevation points for a LiDAR pulse, i.e. the forest canopy will generally have a lower return_num
than the ground beneath it.
Column<TINYINT>
num_returns
The total number of returns for a LiDAR pulse. Multiple returns occur when there are multiple objects between the LiDAR source and the lowest ground or water elevation for a location.
Column<TINYINT>
scan_direction_flag
Column<TINYINT>
edge_of_flight_line_flag
Column<TINYINT>
classification
Column<SMALLINT>
scan_angle_rank
Column<TINYINT>
Example A
Example B
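A minimal sketch that loads a hypothetical directory of laz tiles into a table in a single pass; use_cache is left off because the result is consumed only once:
CREATE TABLE lidar_points AS
SELECT x, y, z, intensity, classification
FROM TABLE(
  tf_load_point_cloud(
    path => '/data/lidar/tiles/*.laz',
    use_cache => FALSE
  )
);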
Getting Started with HEAVY.AI on Microsoft Azure
Follow these instructions to get started with HEAVY.AI on Microsoft Azure.
You must have a Microsoft Azure account. If you do not have an account, go to the Microsoft Azure home page to sign up for one.
To launch HEAVY.AI on Microsoft Azure, you configure a GPU-enabled instance.
1) Log in to your Microsoft Azure portal.
2) On the left side menu, create a Resource group, or use one that your organization has created.
3) On the left side menu, click Virtual machines, and then click Add.
4) Create your virtual machine:
On the Basics tab:
In Project Details, specify the Resource group.
Specify the Instance Details:
Virtual machine name
Region
Image (Ubuntu 16.04 or higher, or CentOS/RHEL 7.0 or higher)
Size. Click Change size and use the Family filter to filter on GPU, based on your use case and requirements. Not all GPU VM variants are available in all regions.
For Username, add any user name other than admin.
In Inbound Port Rules, click Allow selected ports and select one or more of the following:
HTTP (80)
HTTPS (443)
SSH (22)
On the Disks tab, select Premium or Standard SSD, depending on your needs.
For the rest of the tabs and sections, use the default values.
5) Click Review + create. Azure reviews your entries, creates the required services, deploys them, and starts the VM.
6) Once the VM is running, select the VM you just created and click the Networking tab.
7) Click the Add inbound button and configure security rules to allow any source, any destination, and destination port 6273 so you can access Heavy Immerse from a browser on that port. Consider renaming the rule to 6273-Immerse or something similar so that the default name makes sense.
8) Click Add and verify that your new rule appears.
Azure-specific configuration is complete. Now, follow standard HEAVY.AI installation instructions for your Linux distribution and installation method.
Window functions allow you to work with a subset of rows related to the currently selected row. For a given dimension, you can find the most associated dimension by some other measure (for example, number of records or sum of revenue).
Window functions must always contain an OVER clause. The OVER clause splits up the rows of the query for processing by the window function.
The PARTITION BY list divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed using all rows in the same partition as the current row.
Rows that have the same value in the ORDER BY clause are considered peers. The ranking functions give the same answer for any two peer rows.
HeavyDB supports the aggregate functions AVG
, MIN
, MAX
, SUM
, and COUNT
in window functions.
Updates on window functions are supported, assuming the target table is single-fragment. Updates on multi-fragment target tables are not currently supported.
This query shows the top airline carrier for each state, based on the number of departures.
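A sketch of that query, assuming a hypothetical flights table with origin_state and carrier_name columns; the top carrier is selected by ranking carriers within each state with ROW_NUMBER:
SELECT origin_state, carrier_name, departures
FROM (
  SELECT origin_state,
         carrier_name,
         COUNT(*) AS departures,
         ROW_NUMBER() OVER (PARTITION BY origin_state
                            ORDER BY COUNT(*) DESC) AS carrier_rank
  FROM flights
  GROUP BY origin_state, carrier_name
) t
WHERE carrier_rank = 1;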
A window function can include a frame clause that specifies a set of neighboring rows of the current row belonging to the same partition. This allows us to compute a window aggregate function over the window frame, instead of computing it against the entire partition. Note that a window frame for the current row is computed based on either 1) the number of rows before or after the current row (called rows mode) or 2) the specified ordering column value in the frame clause (called range mode).
For example:
From the starting row of the partition to the current row: Using the sum
aggregate function, you can compute the running sum of the partition.
You can construct a frame based on the position of the rows (called rows mode): for example, a frame that spans the 3 rows before and the 2 rows after the current row (see the sketch following this list):
You can compute the aggregate function of the frame having up to six rows (including the current row).
You can organize a frame based on the value of the ordering column (called range mode): Assuming C as the current ordering column value, we can compute aggregate value of the window frame which contains rows having ordering column values between (C - 3) and (C + 2).
Window functions that ignore the frame are evaluated on the entire partition.
Note that we can define the window frame clause using rows mode with an ordering column.
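For example, a sketch of a rows-mode frame covering the 3 rows before and 2 rows after the current row; the transactions table and its columns are hypothetical:
SELECT account_id, ts, amount,
       SUM(amount) OVER (PARTITION BY account_id
                         ORDER BY ts
                         ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING) AS rolling_sum
FROM transactions;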
You can use the following aggregate functions with the window frame clause.
<frame_mode>
| <frame_bound>
<frame_mode>
can be one of the following:
rows
range
1 | 2 | 3 | 4 | 5.5 | 7.5 | 8 | 9 | 10 → value of each tuple's ORDER BY expression.
When the current row has a value 5.5:
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING: 3 rows before and 3 rows after → { 2, 3, 4, 5.5, 7.5, 8, 9 }
RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING: 5.5 - 3 <= x <= 5.5 + 3 → { 3, 4, 5.5, 7.5, 8 }
<frame_bound>:
frame_start or
frame_between: between frame_start and frame_end
frame_start and frame_end can be one of the following:
UNBOUNDED PRECEDING: The start row of the partition that the current row belongs to.
UNBOUNDED FOLLOWING: The end row of the partition that the current row belongs to.
CURRENT ROW
For rows mode: the current row.
For range mode: the peers of the current row. A peer is a row having the same value as the ordering column expression of the current row. Note that all null values are peers of each other.
expr PRECEDING
For rows mode: expr row before the current row.
For range mode: rows with the current row’s ordering expression value minus expr.
For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:
TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR
TIME type: SECOND, MINUTE, and HOUR
DATE type: DAY, MONTH, and YEAR
For example:
RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING
Currently, only literal expressions as expr such as 1 PRECEDING and 100 PRECEDING are supported.
expr FOLLOWING
For rows mode: expr row after the current row.
For range mode: rows with the current row’s ordering expression value plus expr.
For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:
TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR
TIME type: SECOND, MINUTE, and HOUR
DATE type: DAY, MONTH, and YEAR
For example:
RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING
Currently, only literal expressions as expr, such as 1 FOLLOWING and 100 FOLLOWING, are supported.
UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING have the same meaning in both rows and range mode.
When the query has no window frame bound, the window aggregate function is computed differently depending on the existence of the ORDER BY clause:
Has ORDER BY clause: The window function is computed with the default frame bound, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
No ORDER BY clause: The window function is computed over the entire partition.
You can refer to the same window clause in multiple window aggregate functions by defining it with a unique name in the query definition.
For example, you can define the named window clauses W1 and W2 as follows:
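A sketch of such a query, assuming a hypothetical table t with columns x, c, and o; w1 is defined without a frame clause and w2 with one, matching the description that follows:
SELECT x,
       SUM(x) OVER w1 AS partition_sum,
       AVG(x) OVER w2 AS frame_avg
FROM t
WINDOW w1 AS (PARTITION BY c ORDER BY o),
       w2 AS (PARTITION BY c ORDER BY o ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING);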
Named window function clause w1
refers to a window function clause without a window frame clause, and w2
refers to a named window frame clause.
To use window framing, you may need an ORDER BY clause in the window definition. Depending on the framing mode used, the constraint varies:
Rows mode: No ordering column is required, and multiple ordering columns can be included.
Range mode: Exactly one ordering column is required (multi-column ordering is not supported).
Currently, all window functions, including aggregation over a window frame, are computed in CPU mode.
For window frame bound expressions, only non-negative integer literals are supported.
GROUPING mode and EXCLUDING are not currently supported.
Process a raster input to derive contour lines or regions and output as LINESTRING or POLYGON for rendering or further processing. Each has two variants:
One that re-rasterizes the input points ()
One that accepts raw raster points directly ()
Use the rasterizing variants if the raster table rows are not already sorted in row-major order (for example, if they represent an arbitrary 2D point cloud), or if filtering or binning is required to reduce the input data to a manageable count (to speed up the contour processing) or to smooth the input data before contour processing. If the input rows do not already form a rectilinear region, the output region will be their 2D bounding box. Many of the parameters of the rasterizing variant are directly equivalent to those of ; see that function for details.
The direct variants require that the input rows represent a rectilinear region of pixels in nonsparse row-major order. The dimensions must also be provided, and (raster_width * raster_height) must match the input row count. The contour processing is then performed directly on the raster values with no preprocessing.
The line variants generate LINESTRING geometries that represent the contour lines of the raster space at the given interval, with the optional given offset. For example, a raster space representing a height field with a range of 0.0 to 1000.0 will likely result in 10 or 11 lines, each with a corresponding contour_values value (0.0, 100.0, 200.0, and so on). If contour_offset is set to 50.0, the lines are generated at 50.0, 150.0, 250.0, and so on. The lines can be open or closed and can form rings or terminate at the edges of the raster space.
The polygon variants generate POLYGON geometries that represent regions between contour lines (for example, from 0.0 to 100.0, and from 100.0 to 200.0). If the raster space has multiple regions with that value range, then a POLYGON row is output for each of those regions. The corresponding contour_values value for each is the lower bound of the range for that region.
Heavy Immerse is a browser-based data visualization client that runs on top of the GPU-powered HeavyDB. It provides instantaneous representations of your data, from basic charts to rich and complex visualizations.
Immerse is installed with HEAVY.AI Enterprise Edition.
To create dashboards and data visualizations, click DASHBOARDS. You can search for dashboards, and list them by most recent or alphabetically.
Click DATA to import and manipulate data.
Click SQL EDITOR to perform Data Definition and Data Manipulation tasks on the command line.
When you navigate between the three utilities, you can:
Hold the command (ctrl) key as you click a link to open the utility in a new tab/window in the background.
Hold shift+command (ctrl) as you click a link to open the utility in a new tab/window in the foreground.
Hold no keys as you click a link to replace the contents of the current window.
HELP CENTER provides access to Immerse version information, tutorials, demos, and documentation. It also includes a link for sending email to HEAVY.AI.
Clicking the user icon at the far right opens a drop-down box where you can select a different database, change your UI theme, or log out of Immerse:
The Control Panel gives super users visibility into roles and users of the current database, as well as feature flags, system table dashboards, and log files for the current HeavyDB instance.
To open the Control Panel, click the Account icon and then click Control Panel.
The Control Panel is considered beta functionality. Currently, you cannot add, delete, or edit roles or users in the Control Panel. Feature flags cannot be modified through the Control Panel.
Only super users have access to the Control Panel.
To see which feature flags are currently set in Immerse, click Feature Flags under Customization.
Currently, feature flags can only be viewed in Immerse; they cannot be set or removed.
Links to the following System Table dashboards are available on the Control Panel:
Links to the following log files are available on the Control Panel:
Connect to Heavy Immerse by pointing a web browser to port 6273 on your HeavyDB server. When you launch Immerse, the landing page shows a list of saved dashboards. The number to the right of the Search box shows the number of dashboards displayed (left side of the slash) and the number of total dashboards (right side of the slash). Because no filters are applied here, these numbers are the same (5446).
You can:
Search for dashboards by name, source, or owner by entering a string in the search box.
Sort dashboards by name, source, modified date, owner, and whether the dashboard is shared. By default, dashboards are sorted by last modified. Click a column heading to sort by the information defined by that column, and click again to toggle the sorting order.
Filter dashboards by name, source, last modified date, owner, and shared status.
Create and save dashboard views that show a particular set of dashboards based on filter criteria you set.
Delete, download (export), and duplicate individual dashboards.
Perform bulk actions on selected dashboards.
You can use filters to define the dashboards that are displayed, and then save the view defined by any filters you apply. To create a filter, click the plus icon (+) to the left of the Search box.
Use one or more filters to define the view. For example, the following view shows dashboards with source flights and owner mapd:
You can then sort that filtered view the same way you would an unfiltered view.
To save the view, in the View name box, click the pencil icon and enter the name of the view, and then click the Save icon. Here, the view will be saved as flights - mapd.
Click the down arrow in the View name box to see the list of available filters. You can also toggle the selected filter on and off.
You can change the filters in a specified view and update it. For example, here the flights - mapd view has been updated to include only dashboards modified in 2019, so the Save icon is highlighted; click it to update the view definition.
Or, you can click the down arrow and then click + Add filter view to create a new filter view based on the updated filters. You can also duplicate or remove filter views.
A filter view is available only to the Immerse user who creates the view. If you log out of Immerse and then restart, you start with the view that you were using when you logged out.
You can select dashboards and perform the following actions. Select dashboards by clicking the box to the left of the dashboard name.
Export - Export (download) all selected dashboards as individual .json files.
Share/Unshare - Share or unshare selected dashboards with specific users or roles. If a user has been assigned the restricted_sharing role, sharing dashboards is unavailable.
Delete - Delete selected dashboards.
Getting Started with HEAVY.AI on Google Cloud Platform
Follow these instructions to get started with HEAVY.AI on Google Cloud Platform (GCP).
You must have a Google Cloud Platform account. If you do not have an account, follow to sign up for one.
To launch HEAVY.AI on Google Cloud Platform, you select and configure an instance.
On the solution Launcher Page, click Launch on Compute Engine to begin configuring your deployment.
Before deploying a solution with a GPU machine type, avoid potential deployment failure by checking your quotas to make sure that you have not exceeded your limit.
To launch HEAVY.AI on Google Cloud Platform, you select and configure a GPU-enabled instance.
Search for HEAVY.AI on the Google Cloud Marketplace, and select a solution. HEAVY.AI has four instance types:
.
.
.
.
On the solution Launcher Page, click Launch to begin configuring your deployment.
On the new deployment page, configure the following:
Deployment name
Zone
Machine type - Click Customize and configure Cores and Memory, and select Extend memory if necessary.
GPU type. (Not applicable for CPU configurations.)
Number of GPUs - (Not applicable for CPU configurations.) Select the number of GPUs; subject to quota and GPU type by region. For more information about GPU-equipped instances and associated resources, see .
Boot disk type
Boot disk size in GB
Networking - Set the Network, Subnetwork, and External IP.
Firewall - Select the required ports to allow TCP-based connectivity to HEAVY.AI. Click More to set IP ranges for port traffic and IP forwarding.
Accept the GCP Marketplace Terms of Service and click Deploy.
In the Deployment Manager, click the instance that you deployed.
Launch the Heavy Immerse client:
Record the Admin password (Temporary).
Click the Site address link to go to the Heavy Immerse login page. Enter the password you recorded, and click Connect.
Copy your license key from the registration email message. If you have not received your license key, contact your Sales Representative or register for your 30-day trial .
Connect to Immerse using a web browser connected to your host machine on port 6273. For example, http://heavyai.mycompany.com:6273
.
When prompted, paste your license key in the text box and click Apply.
Click Connect to start using HEAVY.AI.
On successful login, you see a list of sample dashboards loaded into your instance.
When installing a distributed cluster, you must run initdb --skip-geo
to avoid the automatic creation of the sample geospatial data table. Otherwise, metadata across the cluster falls out of synchronization and can put the server in an unusable state.
HEAVY.AI supports distributed configuration, which allows single queries to span more than one physical host when the scale of the data is too large to fit on a single machine.
In addition to increased capacity, distributed configuration has other advantages:
Writes to the database can be distributed across the nodes, thereby speeding up import.
Reads from disk are accelerated.
Additional GPUs in a distributed cluster can significantly increase read performance in many usage scenarios. Performance scales linearly, or near linearly, with the number of GPUs, for simple queries requiring little communication between servers.
Multiple GPUs across the cluster query data on their local hosts. This allows processing of larger datasets, distributed across multiple servers.
A HEAVY.AI distributed database consists of three components:
An aggregator, which is a specialized HeavyDB instance for managing the cluster
One or more leaf nodes, each being a complete HeavyDB instance for storing and querying data
A String Dictionary Server, which is a centralized repository for all dictionary-encoded items
Conceptually, a HEAVY.AI distributed database is horizontally sharded across n leaf nodes. Each leaf node holds one nth of the total dataset. Sharding currently is round-robin only. Queries and responses are orchestrated by a HEAVY.AI Aggregator server.
Clients interact with the aggregator. The aggregator orchestrates execution of a query across the appropriate leaf nodes. The aggregator composes the steps of the query execution plan to send to each leaf node, and manages their results. The full query execution might require multiple iterations between the aggregator and leaf nodes before returning a result to the client.
A core feature of the HeavyDB is back-end, GPU-based rendering for data-rich charts such as point maps. When running as a distributed cluster, the backend rendering is distributed across all leaf nodes, and the aggregator composes the final image.
The String Dictionary Server manages and allocates IDs for dictionary-encoded fields, ensuring that these IDs are consistent across the entire cluster.
The server creates a new ID for each new encoded value. For queries returning results from encoded fields, the IDs are automatically converted to the original values by the aggregator. Leaf nodes use the string dictionary for processing joins on encoded columns.
For moderately sized configurations, the String Dictionary Server can share a host with a leaf node. For larger clusters, this service can be configured to run on a small, separate CPU-only server.
A table is split by default to 1/nth of the complete dataset. When you create a table used to provide dimension information, you can improve performance by replicating its contents onto every leaf node using the partitions property. For example:
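A sketch of a small dimension table created with the replicated partitions property; the column definitions are illustrative:
CREATE TABLE airports (
  airport_code TEXT ENCODING DICT(32),
  airport_name TEXT ENCODING DICT(32),
  state TEXT ENCODING DICT(32))
WITH (PARTITIONS = 'REPLICATED');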
This reduces the distribution overhead during query execution in cases where sharding is not possible or appropriate. This is most useful for relatively small, heavily used dimension tables.
You can load data to a HEAVY.AI distributed cluster using a COPY FROM statement to load data to the aggregator, exactly as with HEAVY.AI single-node processing. The aggregator distributes data evenly across the leaf nodes.
Records transferred between systems in a HEAVY.AI cluster are compressed to improve performance. HEAVY.AI uses the LZ4_HC compressor by default. It is the fastest compressor, but has the lowest compression rate of the available algorithms. The time required to compress each buffer is directly proportional to the final compressed size of the data. A better compression rate will likely require more time to process.
You can specify another compressor on server startup using the runtime flag compressor
. Compressor choices include:
blosclz
lz4
lz4hc
snappy
zlib
zstd
For more information on the compressors used with HEAVY.AI, see also:
HEAVY.AI does not compress the payload until it reaches a certain size. The default size limit is 512MB. You can change the size using the runtime flag compression-limit-bytes.
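For example, a sketch of the corresponding heavy.conf entries; the values are placeholders, and the flags can equally be passed on the server command line:
compressor = "zstd"
compression-limit-bytes = 268435456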
This example uses four GPU-based machines, each with a combination of one or more CPUs and GPUs.
Install HEAVY.AI server on each node. For larger deployments, you can have the install on a shared drive.
Set up the configuration file for the entire cluster. This file is the same for all nodes.
In the cluster.conf
file, the location of each leaf node is identified as well as the location of the String Dictionary server.
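A minimal cluster.conf sketch for the four-node layout in this example; hostnames, ports, and the string-server placement are placeholders:
[
  {"host": "node1", "port": 16274, "role": "dbleaf"},
  {"host": "node2", "port": 16274, "role": "dbleaf"},
  {"host": "node3", "port": 16274, "role": "dbleaf"},
  {"host": "node4", "port": 16274, "role": "dbleaf"},
  {"host": "node2", "port": 10301, "role": "string"}
]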
Here, dbleaf is a leaf node, and string is the String Dictionary Server. The port each node is listening on is also identified. These ports must match the ports configured on the individual server.
Each leaf node requires a heavy.conf
configuration file.
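A minimal heavy.conf sketch for a leaf node; the port, storage path, and cluster.conf location are placeholders:
port = 16274
data = "/var/lib/heavyai/storage"
string-servers = "/etc/heavyai/cluster.conf"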
The parameter string-servers
identifies the file containing the cluster configuration, to tell the leaf node where the String Dictionary Server is.
The aggregator node requires a slightly different heavy.conf
. The file is named heavy-agg.conf
in this example.
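A minimal heavy-agg.conf sketch for the aggregator; the port, storage path, and cluster.conf location are placeholders:
port = 6274
data = "/var/lib/heavyai/storage_agg"
cluster = "/etc/heavyai/cluster.conf"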
The parameter cluster tells the HeavyDB instance that it is an aggregator node, and where to find the rest of its cluster.
If your aggregator node is sharing a machine with a leaf node, there might be a conflict on the calcite-port
. Consider changing the port number of the aggregator node to another that is not in use.
Contact HEAVY.AI support for assistance with HEAVY.AI Distributed Cluster implementation.
If there is a potential for duplicate entries and you want to avoid loading duplicate rows, see on the Troubleshooting page.
You can use Heavy Immerse to import geospatial data into HeavyDB.
Supported formats include:
Keyhole Markup Language (.kml
)
GeoJSON (.geojson
)
Shapefiles (.shp
)
FlatGeobuf (.fgb
)
Shapefiles include four mandatory files: .shp
, .shx
, .dbf
, and .prj
. If you do not import the .prj
file, the coordinate system will be incorrect and you cannot render the shapes on a map.
To import geospatial definition data:
Open Heavy Immerse.
Click Data Manager.
Click Import Data.
Choose whether to import from a local file or an Amazon S3 instance. For details on importing from Amazon S3, see .
Click the large +
icon to select files for upload, or drag and drop the files to the Data Importer screen.
When importing shapefiles, upload all required file types at the same time. If you upload them separately, Heavy Immerse issues an error message.
Wait for the uploads to complete (indicated by green checkmarks on the file icons), then click Preview.
On the Data Preview screen:
Edit the column headers (if needed).
Enter a name for the table in the field at the bottom of the screen.
If you are loading the data files into a distributed system, verify under Import Settings that the Replicate Table checkbox is selected.
Click Import Data.
On the Successfully Imported Table screen, verify the rows and columns that compose your data table.
When representing longitude and latitude in HEAVY.AI geospatial primitives, the first coordinate is assumed to be longitude by default.
You can use heavysql
to define tables with columns that store WKT geospatial objects.
You can use heavysql
to insert data as WKT string values.
You can insert data from CSV/TSV files containing WKT strings. HEAVY.AI supports Latin-1 ASCII format and UTF-8. If you want to load data with another encoding (for example, UTF-16), convert the data to UTF-8 before loading it to HEAVY.AI.
You can use your own custom delimiter in your data files.
You can import CSV and TSV files for tables that store longitude and latitude as either:
Separate consecutive scalar columns
A POINT field.
If the data is stored as a POINT, you can use spatial functions like ST_Distance
and ST_Contains
. When location data are stored as a POINT column, they are displayed as such when querying the table:
If two geometries are used in one operation (for example, in ST_Distance
), the SRID values need to match.
If you are using heavysql, create the table in HEAVY.AI with the POINT field defined as below:
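For example, a sketch of such a table; the surrounding attribute columns are illustrative:
CREATE TABLE destinations (
  name TEXT ENCODING DICT(32),
  pt GEOMETRY(POINT, 4326),
  visits INTEGER);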
Then, import the file using COPY FROM
in heavysql. By default, the two columns are consumed as longitude x and then latitude y. If the order of the coordinates in the CSV file is reversed, load the data using the WITH option lonlat='false':
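For example, using the hypothetical destinations table above; the file paths are placeholders:
COPY destinations FROM '/data/destinations_lonlat.csv';
COPY destinations FROM '/data/destinations_latlon.csv' WITH (lonlat = 'false');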
Columns can exist on either side of the point field; the lon/lat pair in the source file does not have to be at the beginning or end of the target table.
If the imported coordinates are not 4326---for example, 2263---you can transform them to 4326 on the fly:
In Immerse, you define the table when loading the data instead of predefining it before import. Immerse supports appending data to a table by loading one or more files.
Longitude and latitude can be imported as separate columns.
You can create geo tables by importing specific geo file formats. HEAVY.AI supports the following types:
ESRI shapefile (.shp
and associated files)
GeoJSON (.geojson
or .json
)
KML (.kml
or .kmz
)
ESRI file geodatabase (.gdb
)
You import geo files using the COPY FROM command with the geo option:
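For example, a sketch that imports a hypothetical shapefile:
COPY us_states FROM '/data/us_states.shp' WITH (geo = 'true');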
The geo file import process automatically creates the table by detecting the column names and types explicitly described in the geo file header. It then creates a single geo column (always called heavyai_geo) that is of one of the supported types (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON).
In Release 6.2 and higher, polygon render metadata assignment is disabled by default. This data is no longer required by the new polygon rendering algorithm introduced in Release 6.0. The new default results in significantly faster import for polygon table imports, particularly high-cardinality tables.
If you need to revert to the legacy polygon rendering algorithm, polygons from tables imported in Release 6.2 may not render correctly. Those tables must be re-imported after setting the server configuration flag enable-assign-render-groups
to true
.
The legacy polygon rendering algorithm and polygon render metadata server config will be removed completely in an upcoming release.
Due to the prevalence of mixed POLYGON/MULTIPOLYGON geo files (and CSVs), if HEAVY.AI detects a POLYGON type geo file, HEAVY.AI creates a MULTIPOLYGON column and imports the data as single polygons.
If the table does not already exist, it is created automatically.
If the table already exists, and the data in the geo file has exactly the same column structure, the new file is appended to the existing table. This enables import of large geo data sets split across multiple files. The new file is rejected if it does not have the same column structure.
By default, geo data is stored as GEOMETRY.
You can also create tables with coordinates in SRID 3857 or SRID 900913 (Google Web Mercator). Importing data from shapefiles using SRID 3857 or 900913 is supported; importing data from delimited files into tables with these SRIDs is not supported at this time. To explicitly store in other formats, use the following WITH options in addition to geo='true':
Compression used:
COMPRESSED(32) - 50% compression (default)
None - No compression
Spatial reference identifier (SRID) type:
4326 - EPSG:4326 (default)
900913 - Google Web Mercator
3857 - EPSG:3857
For example, the following explicitly sets the default values for encoding and SRID:
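A sketch of such a command; the geo_coords_encoding and geo_coords_srid option names are assumptions, and the path is a placeholder:
COPY geo_table FROM '/data/geo_data.shp'
WITH (geo = 'true',
      geo_coords_encoding = 'COMPRESSED(32)',
      geo_coords_srid = 4326);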
Note that rendering of geo MULTIPOINT is not yet supported.
An ESRI file geodatabase (.gdb
) provides a method of storing GIS information in one large file that can have one or more "layers", with each layer containing disparate but related data. The data in each layer can be of different types. Importing a .gdb
file results in the creation of one table for each layer in the file. You import an ESRI file geodatabase the same way that you import other geo file formats, using the COPY FROM command with the geo option:
The layers in the file are scanned and defined by name and contents. Contents are classified as EMPTY, GEO, NON_GEO, or UNSUPPORTED_GEO:
EMPTY layers are skipped because they contain no useful data.
GEO layers contain one or more geo columns of a supported type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON) and one or more regular columns, and can be imported to a single table in the same way as the other geo file formats.
NON_GEO layers contain no geo columns and one or more regular columns, and can be imported to a regular table. Although the data comes from a geo file, data in this layer does not result in a geo table.
UNSUPPORTED_GEO layers contain geo columns of a type not currently supported (for example, GEOMETRYCOLLECTION). These layers are skipped because they cannot be imported completely.
A single COPY FROM
command can result in multiple tables, one for each layer in the file. The table names are automatically generated by appending the layer name to the provided table name.
For example, consider the geodatabase file mydata.gdb, which contains two importable layers with names A and B. Running COPY FROM creates two tables, mydata_A and mydata_B, with the data from layers A and B, respectively. The layer names are appended to the provided table name. If the geodatabase file only contains one layer, the layer name is not appended.
You can load one specific layer from the geodatabase file by using the geo_layer_name
option:
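A sketch of such a command, continuing the mydata.gdb example above; the file path is a placeholder:
COPY mydata FROM '/data/mydata.gdb' WITH (geo = 'true', geo_layer_name = 'A');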
This loads only layer A, if it is importable. The resulting table is called mydata
, and the layer name is not appended. Use this import method if you want to set a different name for each table. If the layer name from the geodatabase file would result in an illegal table name when appended, the name is sanitized by removing any illegal characters.
You can import geo files directly from archive files (for example, .zip .tar .tgz .tar.gz) without unpacking the archive. You can directly import individual geo files compressed with Zip or GZip (GeoJSON and KML only). The server opens the archive header and loads the first candidate file it finds (.shp .geojson .json .kml), along with any associated files (in the case of an ESRI Shapefile, the associated files must be siblings of the first).
You can import geo files or archives directly from an Amazon S3 bucket.
You can provide Amazon S3 credentials, if required, by setting variables in the environment of the heavysql
process…
You can also provide your credentials explicitly in the COPY FROM command.
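A sketch of an explicit-credentials import; the s3_region, s3_access_key, and s3_secret_key option names are assumptions, and the bucket and path are placeholders:
COPY rivers FROM 's3://my-geo-bucket/rivers.geojson'
WITH (geo = 'true',
      s3_region = 'us-west-1',
      s3_access_key = '<access key>',
      s3_secret_key = '<secret key>');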
You can import geo files or archives directly from an HTTP/HTTPS website.
You can extend a column type specification to include spatial reference (SRID) and compression mode information.
Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32)
. In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.
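A sketch of that definition; the id column and the explicitly uncompressed poly_uncompressed column are illustrative additions:
CREATE TABLE geo2 (
  id INTEGER,
  poly2 GEOMETRY(POLYGON, 4326),
  mpoly2 GEOMETRY(MULTIPOLYGON, 4326),
  poly_uncompressed GEOMETRY(POLYGON, 4326) ENCODING NONE);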
COMPRESSED(32)
compression maps lon/lat degree ranges to 32-bit integers, providing a smaller memory footprint and faster query execution. The effect on precision is small, approximately 4 inches at the equator.
You can disable compression by explicitly choosing ENCODING NONE
.
Boolean constant denoting whether weighting should be used in the cosine similarity score computation.
Boolean constant denoting whether weighting should be used in the cosine similarity score computation.
From the : "The scan direction flag denotes the direction at which the scanner mirror was traveling at the time of the output pulse. A bit value of 1 is a positive scan direction, and a bit value of 0 is a negative scan direction."
From the : "The edge of flight line data bit has a value of 1 only when the point is at the end of a scan. It is the last point on a given scan line before it changes direction."
From the : "The classification field is a number to signify a given classification during filter processing. The ASPRS standard has a public list of classifications which shall be used when mixing vendor specific user software."
From the : "The angle at which the laser point was output from the laser system, including the roll of the aircraft... The scan angle is an angle based on 0 degrees being NADIR, and –90 degrees to the left side of the aircraft in the direction of flight."
Super users can restrict users who have the immerse_trial_mode role from downloading (exporting) dashboards by enabling Trial Mode. To enable trial mode, set the to TRUE.
You can import spatial representations in WKT (Well-Known Text) format. WKT is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects, and transformations between spatial reference systems.
HEAVY.AI accepts data with any SRID, or with no SRID. HEAVY.AI supports SRID 4326 (WGS 84), and allows projections from SRID 4326 to SRID 900913 (Google Web Mercator). Geometries declared with SRID 4326 are compressed by default, and can be rendered and used to calculate geodesic distance. Geometries declared with any other SRID, or no SRID, are treated as planar geometries; the SRIDs are ignored.
An ESRI file geodatabase can have multiple layers, and importing it results in the creation of one table for each layer in the file. This behavior differs from that of importing shapefiles, GeoJSON, or KML files, which results in a single table. See for more information.
Rendering of geo LINESTRING, MULTILINESTRING, POLYGON, and MULTIPOLYGON is possible only with data stored in the default lon/lat WGS84 (SRID 4326) format, although the type and encoding are flexible. Unless compression is explicitly disabled (NONE), all SRID 4326 geometries are compressed. For more information, see.
Function
Description
BACKWARD_FILL(value)
Replace the null value by using the nearest non-null value of the value column, using backward search.
For example, for column x, with the current row r at index K having a NULL value, and assuming column x has N rows (where K < N): BACKWARD_FILL(x) searches for the non-NULL value by searching rows with the index starting from K+1 to N. The NULL value is replaced with the first non-NULL value found.
At least one ordering column must be defined in the window clause.
NULLS FIRST ordering of the input value is added automatically for any user-defined ordering of the input value. For example:
BACKWARD_FILL(x) OVER (PARTITION BY c ORDER BY x) - No ordering is added; ordering already exists on x.
BACKWARD_FILL(x) OVER (PARTITION BY c ORDER BY o) - Ordering is added internally for a consistent query result.
COUNT_IF(condition_expr)
Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the number of rows satisfying the given condition_expr, which must evaluate to a Boolean value (TRUE/FALSE) like x IS NULL or x > 1.
CUME_DIST()
Cumulative distribution value of the current row: (number of rows preceding or peers of the current row)/(total rows). Window framing is ignored.
DENSE_RANK()
Rank of the current row without gaps. This function counts peer groups. Window framing is ignored.
FIRST_VALUE(value)
Returns the value from the first row of the window frame (the rows from the start of the partition to the last peer of the current row).
FORWARD_FILL(value)
Replace the null value by using the nearest non-null value of the value column, using forward search.
For example, for column x, with the current row r at index K having a NULL value, and assuming column x has N rows (where K < N): FORWARD_FILL(x) searches for the non-NULL value by searching rows with the index starting from K-1 to 1. The NULL value is replaced with the first non-NULL value found.
At least one ordering column must be defined in the window clause.
NULLS FIRST ordering of the input value is added automatically for any user-defined ordering of the input value. For example:
FORWARD_FILL(x) OVER (PARTITION BY c ORDER BY x) - No ordering is added; ordering already exists on x.
FORWARD_FILL(x) OVER (PARTITION BY c ORDER BY o) - Ordering is added internally for a consistent query result.
LAG(value, offset)
Returns the value at the row that is offset rows before the current row within the partition. LAG_IN_FRAME
is the window-frame-aware version.
LAST_VALUE(value)
Returns the value from the last row of the window frame.
LEAD(value, offset)
Returns the value at the row that is offset rows after the current row within the partition. LEAD_IN_FRAME
is the window-frame-aware version.
NTH_VALUE(expr,N)
Returns a value of expr
at row N
of the window partition.
NTILE(num_buckets)
Subdivide the partition into buckets. If the total number of rows is divisible by num_buckets, each bucket has an equal number of rows. If the total is not divisible by num_buckets, the function returns groups of two sizes with a difference of 1. Window framing is ignored.
PERCENT_RANK()
Relative rank of the current row: (rank-1)/(total rows-1). Window framing is ignored.
RANK()
Rank of the current row with gaps. Equal to the row_number
of its first peer.
ROW_NUMBER()
Number of the current row within the partition, counting from 1. Window framing is ignored.
SUM_IF(condition_expr)
Aggregate function that can be used as a window function for both a nonframed window partition and a window frame. Returns the sum of all expression values satisfying the given condition_expr
. Applies to numeric data types.
Frame aggregation
MIN(val)
, MAX(val)
, COUNT(val)
, AVG(val)
, SUM(val)
Frame navigation
LEAD_IN_FRAME(value, offset)
LAG_IN_FRAME(value, offset)
FIRST_VALUE_IN_FRAME
LAST_VALUE_IN_FRAME
NTH_VALUE_IN_FRAME
These are window-frame-aware versions of the LEAD, LAG, FIRST_VALUE, LAST_VALUE, and NTH_VALUE functions.
Function
Arguments and Return
convert_meters_to_merc_pixel_width(meters, lon, lat, min_lon, max_lon, img_width, min_width)
Converts a distance in meters in a longitudinal direction from a latitude/longitude coordinate to a pixel size using mercator projection:
meters
: Distance in meters in a longitudinal direction to convert to pixel units.
lon
: Longitude coordinate of the center point to size from.
lat
: Latitude coordinate of the center point to size from.
min_lon
: Minimum longitude coordinate of the mercator-projected view.
max_lon
: Maximum longitude coordinate of the mercator-projected view.
img_width
: The width in pixels of the view.
min_width
: Clamps the returned pixel size to be at least this width.
Returns: Floating-point value in pixel units. Can be used for the width of a symbol or a point in Vega.
convert_meters_to_merc_pixel_height(meters, lon, lat, min_lat, max_lat, img_height, min_height)
Converts a distance in meters in a latitudinal direction from a latitude/longitude coordinate to a pixel size, using mercator projection:
meters
: Distance in meters in a longitudinal direction to convert to pixel units.
lon
: Longitude coordinate of the center point to size from.
lat
: Latitude coordinate of the center point to size from.
min_lat
: Minimum latitude coordinate of the mercator-projected view.
max_lat
: Maximum latitude coordinate of the mercator-projected view.
img_height
: The height in pixels of the view.
min_height
: Clamps the returned pixel size to be at least this height.
Returns: Floating-point value in pixel units. Can be used for the height of a symbol or a point in Vega.
convert_meters_to_pixel_width(meters, pt, min_lon, max_lon, img_width, min_width)
Converts a distance in meters in a longitudinal direction from a latitude/longitude POINT to a pixel size. Supports only mercator-projected points.
meters
: Distance in meters in a latitudinal direction to convert to pixel units.
pt
: The center POINT to size from. The point must be defined in the EPSG:4326 spatial reference system.
min_lon
: Minimum longitude coordinate of the mercator-projected view.
max_lon
: Maximum longitude coordinate of the mercator-projected view.
img_width
: The width in pixels of the view.
min_width
: Clamps the returned pixel size to be at least this width.
Returns: Floating-point value in pixel units. Can be used for the width of a symbol or a point in Vega.
convert_meters_to_pixel_height(meters, pt, min_lat, max_lat, img_height, min_height)
Converts a distance in meters in a latitudinal direction from an EPSG:4326 POINT to a pixel size. Currently only supports mercator-projected points:
meters
: Distance in meters in a longitudinal direction to convert to pixel units.
pt
: The center POINT to size from. The point must be defined in the EPSG:4326 spatial reference system.
min_lat
: Minimum latitude coordinate of the mercator-projected view.
max_lat
: Maximum latitude coordinate of the mercator-projected view.
img_height
: The height in pixels of the view.
min_height
: Clamps the returned pixel size to be at least this height.
Returns: Floating-point value in pixel units. Can be used for the height of a symbol or a point in Vega.
is_point_in_merc_view(lon, lat, min_lon, max_lon, min_lat, max_lat)
Returns true if a latitude/longitude coordinate is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.
lon
: Longitude coordinate of the point.
lat
: Latitude coordinate of the point.
min_lon
: Minimum longitude coordinate of the mercator-projected view.
max_lon
: Maximum longitude coordinate of the mercator-projected view.
min_lat
: Minimum latitude coordinate of the mercator-projected view.
max_lat
: Maximum latitude coordinate of the mercator-projected view.
Returns: True if the point is within the view defined by the min_lon/max_lon, min_lat/max_lat; otherwise, false.
is_point_size_in_merc_view(lon, lat, meters, min_lon, max_lon, min_lat, max_lat)
Returns true if a latitude/longitude coordinate, offset by a distance in meters, is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.
lon
: Longitude coordinate of the point.
lat
: Latitude coordinate of the point.
meters
: Distance in meters to offset the point by, in any direction.
min_lon
: Minimum longitude coordinate of the mercator-projected view.
max_lon
: Maximum longitude coordinate of the mercator-projected view.
min_lat
: Minimum latitude coordinate of the mercator-projected view.
max_lat
: Maximum latitude coordinate of the mercator-projected view.
Returns: True if the point is within the view defined by the min_lon/max_lon, min_lat/max_lat; otherwise, false.
is_point_in_view(pt, min_lon, max_lon, min_lat, max_lat)
Returns true if a latitude/longitude POINT defined in EPSG:4326 is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.
pt
: The POINT to check. Must be defined in EPSG:4326 spatial reference system.
min_lon
: Minimum longitude coordinate of the mercator-projected view.
max_lon
: Maximum longitude coordinate of the mercator-projected view.
min_lat
: Minimum latitude coordinate of the mercator-projected view.
max_lat
: Maximum latitude coordinate of the mercator-projected view.
Returns: True if the point is within the view defined by min_lon
/max_lon
, min_lat
/max_lat
; otherwise, false.
is_point_size_in_view(pt, meters, min_lon, max_lon, min_lat, max_lat)
Returns true if a latitude/longitude POINT defined in EPSG:4326, offset by a distance in meters, is within a mercator-projected view defined by min_lon/max_lon, min_lat/max_lat.
pt: The POINT to check. Must be defined in the EPSG:4326 spatial reference system.
meters: Distance in meters to offset the point by, in any direction.
min_lon: Minimum longitude coordinate of the mercator-projected view.
max_lon: Maximum longitude coordinate of the mercator-projected view.
min_lat: Minimum latitude coordinate of the mercator-projected view.
max_lat: Maximum latitude coordinate of the mercator-projected view.
Returns: True if the point, offset by the distance in meters, is within the view defined by min_lon/max_lon, min_lat/max_lat; otherwise, false.
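The POINT-based variants behave like the lon/lat versions. A minimal sketch, again with a hypothetical table (buildings), POINT column (location), and view bounds:

-- Hypothetical table and bounds; location is an EPSG:4326 POINT column.
SELECT location
FROM buildings
WHERE is_point_size_in_view(location, 250.0, -124.0, -66.0, 25.0, 49.0);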
lon: Longitude value of raster point (degrees, SRID 4326). Column<FLOAT | DOUBLE>
lat: Latitude value of raster point (degrees, SRID 4326). Column<FLOAT | DOUBLE> (must be the same type as lon)
value: Raster band value from which to derive contours. Column<FLOAT | DOUBLE>
agg_type: See tf_geo_rasterize.
bin_dim_meters: See tf_geo_rasterize.
neighborhood_fill_radius: See tf_geo_rasterize.
fill_only_nulls: See tf_geo_rasterize.
fill_agg_type: See tf_geo_rasterize.
flip_latitude: Optionally flip resulting geometries in latitude (default FALSE). This parameter may be removed in future releases. BOOLEAN
contour_interval: Desired contour interval. The function will generate a line at each interval, or a polygon region that covers that interval. FLOAT/DOUBLE (must be the same type as value)
contour_offset: Optional offset for resulting intervals. FLOAT/DOUBLE (must be the same type as value)
raster_width: Pixel width (stride) of the raster data. INTEGER
raster_height: Pixel height of the raster data. INTEGER
contour_[lines|polygons]: Output geometries. Column<LINESTRING | POLYGON>
contour_values: Raster values associated with each contour geometry. Column<FLOAT | DOUBLE> (same type as value)
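As a sketch only: the parameters above correspond to the raster contour table functions (for example, a contour-lines variant such as tf_raster_contour_lines), which are invoked through TABLE() with a CURSOR over the lon, lat, and value columns. The table name, column names, and argument values below are hypothetical, the raster_width/raster_height parameters belong to a separate variant for pre-rasterized input and are not used here, and the exact signature should be confirmed in the table function reference before use.

-- Hypothetical raster table (terrain); arguments follow the parameter order listed above.
SELECT contour_lines, contour_values
FROM TABLE(
  tf_raster_contour_lines(
    CURSOR(SELECT lon, lat, elevation FROM terrain),
    'AVG',   -- agg_type
    90.0,    -- bin_dim_meters
    2,       -- neighborhood_fill_radius
    TRUE,    -- fill_only_nulls
    'AVG',   -- fill_agg_type
    FALSE,   -- flip_latitude
    100.0,   -- contour_interval
    0.0      -- contour_offset
  )
);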
Hostname   IP           Role(s)
Node1      10.10.10.1   Leaf, Aggregator
Node2      10.10.10.2   Leaf, String Dictionary Server
Node3      10.10.10.3   Leaf
Node4      10.10.10.4   Leaf
You can construct a Heavy Immerse dashboard by following the steps below. Once you save the dashboard, you can share it with other HEAVY.AI users.
Connect to Immerse by pointing a web browser to port 6273 on your HeavyDB server. When you launch Immerse, the landing page lists your saved dashboards. Click New Dashboard in the upper right to configure a custom dashboard.
To add a chart, click Add Chart, choose a chart type, set dimensions and measures, and then click Apply. For more information on creating charts, see Heavy Immerse Chart Types.
To create a chart:
Click Add Chart.
Choose a Data Source. For example, UFO_Sightings.
Choose a chart type. For example, Bar.
Set the Dimension. For example, country.
Set the Measure. For example, COUNT # Records.
Click Apply.
To remove a chart:
Hover the mouse over the chart.
In the upper-right corner of the chart, click the More Options icon, and then click Remove Chart.
To title and save a dashboard:
Click the title area.
Type a title.
Click Save.
Dashboard tabs enable you to add multiple pages to a dashboard. Using tabs can reduce the number of charts on a dashboard page and make it easier to find the chart you want.
By default, dashboard tabs are disabled. To enable tabs, in your server.json file, set "ui/dashboard-tabs" to "true".
Dashboard tabs are located at the bottom left of the dashboard. The dashboard shown below has three tabs: Config UI (selected tab), Locked axis on scatter, and New Combo improvements:
Click a tab to open it, or use the right arrow icon to move to the next tab. Hovering on a tab reveals the three-dot menu, which you can use to duplicate, rename, or delete a tab.
Using a tabbed dashboard affects some dashboard actions you take. Refresh and Add Chart affect only the tab that you are currently viewing. Share, Duplicate, and Save affect all tabs on the dashboard.
To delete a dashboard:
Click Dashboards.
Mouse over the dashboard you want to delete.
Click X at the end of the dashboard row.
Shows generated Intermediate Representation (IR) code, identifying whether it is executed on GPU or CPU. This is primarily used internally by HEAVY.AI to monitor behavior.
For example, when you use the EXPLAIN command on a basic statement, the utility returns 90 lines of IR code that is not meant to be human readable. However, at the top of the listing, a heading indicates whether it is IR for the CPU or IR for the GPU, which can be useful to know in some situations.
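A minimal sketch of the invocation (the table and filter are hypothetical; the IR output itself is omitted because it is not meant to be read directly):

-- Prefix any SELECT with EXPLAIN to see the generated IR and whether it targets CPU or GPU.
EXPLAIN SELECT COUNT(*) FROM flights WHERE dest_state = 'CA';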
Returns a relational algebra tree describing the high-level plan to execute the statement.
The table below lists the relational algebra classes used to describe the execution plan for a SQL statement.
Method: Description
LogicalAggregate: Operator that eliminates duplicates and computes totals.
LogicalCalc: Expression that computes project expressions and also filters.
LogicalChi: Operator that converts a stream to a relation.
LogicalCorrelate: Operator that performs nested-loop joins.
LogicalDelta: Operator that converts a relation to a stream.
LogicalExchange: Expression that imposes a particular distribution on its input without otherwise changing its content.
LogicalFilter: Expression that iterates over its input and returns elements for which a condition evaluates to true.
LogicalIntersect: Expression that returns the intersection of the rows of its inputs.
LogicalJoin: Expression that combines two relational expressions according to some condition.
LogicalMatch: Expression that represents a MATCH_RECOGNIZE node.
LogicalMinus: Expression that returns the rows of its first input minus any matching rows from its other inputs. Corresponds to the SQL EXCEPT operator.
LogicalProject: Expression that computes a set of ‘select expressions’ from its input relational expression.
LogicalSort: Expression that imposes a particular sort order on its input without otherwise changing its content.
LogicalTableFunctionScan: Expression that calls a table-valued function.
LogicalTableModify: Expression that modifies a table. Similar to TableScan, but represents a request to modify a table instead of read from it.
LogicalTableScan: Reads all the rows from a RelOptTable.
LogicalUnion: Expression that returns the union of the rows of its inputs, optionally eliminating duplicates.
LogicalValues: Expression for which the value is a sequence of zero or more literal row values.
LogicalWindow: Expression representing a set of window aggregates.
For example, a SELECT statement is described as a table scan and projection. If you add a sort order, the table projection is folded under a LogicalSort procedure.
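For instance, a sketch with a hypothetical table and columns (the exact plan text returned by EXPLAIN CALCITE may differ):

-- A simple projection: the plan shows a LogicalProject over a LogicalTableScan.
EXPLAIN CALCITE SELECT name, population FROM cities;

-- Adding ORDER BY folds the projection under a LogicalSort node.
EXPLAIN CALCITE SELECT name, population FROM cities ORDER BY population DESC;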
When the SQL statement is simple, the EXPLAIN CALCITE version is actually less “human readable.” EXPLAIN CALCITE is more useful when you work with more complex SQL statements, like the one that follows. This query performs a scan on the BOOK table before scanning the BOOK_ORDER table.
Revising the original SQL command results in a more natural selection order and a more performant query.
Changes the values of the specified columns based on the assign argument (identifier=expression) in all rows that satisfy the condition in the WHERE clause.
Currently, HEAVY.AI does not support updating a geo column type (POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, or MULTIPOLYGON) in a table.
You can update a table via subquery, which allows you to update based on calculations performed on another table.
Examples
In Release 6.4 and higher, you can run UPDATE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.
To execute queries against another database, you must have ACCESS privilege on that database, as well as UPDATE privilege.
Update a row in a table in the my_other_db database:
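A minimal sketch; the table (customers) and its columns are hypothetical, and the table is qualified with the database name as described above:

-- Qualify the table with the database name to update it without switching connections.
UPDATE my_other_db.customers
SET status = 'inactive'
WHERE customer_id = 42;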
Generate a series of integer values.
Input Arguments
series_start: Starting integer value, inclusive. BIGINT
series_end: Ending integer value, inclusive. BIGINT
series_step (optional, defaults to 1): Increment to increase or decrease each subsequent value in the series. BIGINT
Output Columns
generate_series: The integer series specified by the input arguments. Column<BIGINT>
Example
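A minimal sketch (the values are arbitrary):

-- Returns 1, 3, 5, 7, 9 as a single-column result named generate_series.
SELECT * FROM TABLE(generate_series(1, 10, 2));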
Generate a series of timestamp values from start_timestamp to end_timestamp.
Input Arguments
series_start: Starting timestamp value, inclusive. TIMESTAMP(9) (timestamp literals with other precisions are automatically cast to TIMESTAMP(9))
series_end: Ending timestamp value, inclusive. TIMESTAMP(9) (timestamp literals with other precisions are automatically cast to TIMESTAMP(9))
series_step: Time/date interval specifying the step between each element in the returned series. INTERVAL
Output Columns
generate_series: The timestamp series specified by the input arguments. Column<TIMESTAMP(9)>
Example
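A minimal sketch (the values are arbitrary; the literals are automatically cast to TIMESTAMP(9)):

-- Returns one row per hour from midnight through noon on 2021-01-01, inclusive.
SELECT * FROM TABLE(
  generate_series(
    TIMESTAMP '2021-01-01 00:00:00',
    TIMESTAMP '2021-01-01 12:00:00',
    INTERVAL '1' HOUR
  )
);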
The Admin Portal is a collection of dashboards available in the included information_schema database in Heavy Immerse. The dashboards display point-in-time information about HEAVY.AI platform resources and users of the system.
Access to system dashboards is controlled using Immerse privileges; only users with Admin privileges or users/roles with access to the information_schema database can access the system dashboards.
The information_schema database, Admin Portal dashboards, and system tables are installed when you install or upgrade to HEAVY.AI 6.0. For more detailed information on the tables available in the Admin Portal, see System Tables.
With the Admin Portal, you can see:
Database monitoring and database and web server logs.
Real-time data reporting for the system.
Point-in-time resource metrics and user engagement dashboards.
When you log in to the information_schema database, you see the Request Logs and Monitoring, System Resources, and User Roles and Permissions dashboards.
By default, the Request Logs and Monitoring dashboard does not appear in the Admin Portal. To turn on the dashboard, set the enable-logs-system-tables parameter to TRUE in heavy.conf and restart the database.
The Request Logs and Monitoring dashboard includes the following charts on three tabs:
Number of Requests
Number of Fatals and Errors
Number of Unique Users
Avg Request Time (ms)
Max Request Time (ms)
Number of Requests per Dashboard
Number of Requests per API
Number of Requests per User
Database Server Logs - Sortable by log timestamp, severity level, message, file location, process ID, query ID, thread ID, and node.
Database Queries - Sortable by log timestamp, query string, execution time, and total time.
Web Server Logs - Sortable by log timestamp, severity, and message.
Web Server Access Logs - Sortable by log timestamp, endpoint, HTTP status, HTTP method, IP address, and response size.
The System Resources dashboard includes the following charts on three tabs:
Databases - Names of all available databases
# of Tables - Total number of tables
# of Dashboards - Total number of dashboards
# of Tables Per Database
# of Dashboards Per Database
Tables - Sortable name, column count, and owner information for all tables.
Dashboards - Sortable name, last update time, and owner information for all dashboards.
CPU Memory Utilization - Free, used, and unallocated
GPU Memory Utilization - Free, used, and unallocated
Tables with Highest CPU Memory Utilization
Tables with Highest GPU Memory Utilization
Columns with Highest CPU Memory Utilization
Columns with Highest GPU Memory Utilization
Tables with Highest Storage Utilization
Total Used Storage
The User Roles and Permissions dashboard includes the following charts:
# of Users - Total number of users on the system
# of Roles - Total number of roles on the system
# of Table Owners - Total number of table owners on the system
# of Dashboard Owners - Total number of dashboard owners on the system
Users - Sortable list of users on the system
User-Role Assignments - Mapping of role names to user names, sortable by role or user
Roles - Sortable list of roles on the system
Databases - Sortable list of databases on the system
User Permissions - Mapping of user or role name, permission type, and database, sortable by any column.
Following is a list of HEAVY.AI keywords.