Examples
The following examples demonstrate how to use HeavyConnect with foreign data wrappers, foreign servers, foreign tables, and user mappings. The examples use the following directory structure and underlying Parquet files. The highlighted numbers on the graphic correspond to the numbered examples that follow.
Details of the commands used in the examples are available in the Command Reference.

Example 1: Directory structures and refresh periods

HeavyConnect to the Parquet data organized in the year 2020. Users expect data that is updated monthly, with new data appended to the existing data.
1. Create a custom foreign server. This server is reused in Examples 2 and 3.
CREATE SERVER example_parquet_server FOREIGN DATA WRAPPER parquet_file WITH (
  storage_type = 'LOCAL_FILE',
  base_path = '/2020'
);
2. Create the foreign table to use. In the CREATE FOREIGN TABLE statement, specify the update type and scheduling.
CREATE FOREIGN TABLE example_year_2020 (
  id INTEGER,
  vendor_id TEXT ENCODING DICT(32),
  year_datetime TIMESTAMP(0),
  year_activities_longitude DECIMAL(14,2),
  year_activities_latitude DECIMAL(14,2),
  precipitation SMALLINT,
  snow_depth SMALLINT,
  snowfall SMALLINT,
  max_temperature SMALLINT,
  min_temperature SMALLINT,
  average_wind_speed SMALLINT)
SERVER example_parquet_server
WITH (REFRESH_TIMING_TYPE='SCHEDULED',
      REFRESH_UPDATE_TYPE='APPEND',
      REFRESH_START_DATE_TIME='2020-01-31T22:30:00Z',
      REFRESH_INTERVAL='31D');
You can now query and build dashboards using the foreign table example_year_2020.
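The scheduling options in Example 1 can be sanity-checked with a short Python sketch. It assumes that scheduled refreshes occur at REFRESH_START_DATE_TIME plus whole multiples of REFRESH_INTERVAL, which this page does not state explicitly. The function name `refresh_times` is hypothetical, and the sketch only handles the day-based `D` suffix used in these examples.

```python
import re
from datetime import datetime, timedelta, timezone


def refresh_times(start, interval, count):
    """Return the first `count` refresh instants: start, start + interval, ..."""
    # Parse the ISO-style UTC timestamp format used in Example 1.
    first = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Assumption: only day-based intervals such as '31D' or '7D' are handled here.
    match = re.fullmatch(r"(\d+)D", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    step = timedelta(days=int(match.group(1)))
    return [first + i * step for i in range(count)]


for moment in refresh_times("2020-01-31T22:30:00Z", "31D", 4):
    print(moment.strftime("%Y-%m-%d %H:%M"))
```

Under that assumption, a fixed 31-day interval drifts away from calendar month ends: the second refresh falls on 2020-03-02 rather than on the last day of February.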

Example 2: Directory structures and refresh periods

HeavyConnect to the Parquet data organized in the month of March. Users expect data that is updated weekly, with new data appended to the existing data.
1. Reuse the foreign server created in Example 1.
2. Create the foreign table for use. In the CREATE FOREIGN TABLE statement, specify the update type, scheduling, and the additional file path to the month of March.
CREATE FOREIGN TABLE example_month_march (
  id INTEGER,
  vendor_id TEXT ENCODING DICT(32),
  year_datetime TIMESTAMP(0),
  year_activities_longitude DECIMAL(14,2),
  year_activities_latitude DECIMAL(14,2),
  precipitation SMALLINT,
  snow_depth SMALLINT,
  snowfall SMALLINT,
  max_temperature SMALLINT,
  min_temperature SMALLINT,
  average_wind_speed SMALLINT)
SERVER example_parquet_server
WITH (REFRESH_TIMING_TYPE='SCHEDULED',
      REFRESH_UPDATE_TYPE='APPEND',
      REFRESH_START_DATE_TIME='2020-03-01 22:30',
      REFRESH_INTERVAL='7D',
      FILE_PATH='Mar');
When specifying a file path in the foreign table creation statement, the path is additive to the base path specified in the foreign server.
You can now query and build dashboards using the foreign table example_month_march.
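The additive path rule noted above can be illustrated with a small Python sketch. The helper `resolve_path` is hypothetical, and joining the two parts with a single slash is an assumption; this page does not describe how HeavyDB normalizes the combined path.

```python
def resolve_path(base_path, file_path=""):
    """Append a table-level file path to a server-level base path."""
    if not file_path:
        # No FILE_PATH on the table: the server's base path is used as is.
        return base_path
    # Assumption: the two parts are joined with exactly one slash.
    return base_path.rstrip("/") + "/" + file_path.lstrip("/")


print(resolve_path("/2020"))                             # Example 1: /2020
print(resolve_path("/2020", "Mar"))                      # Example 2: /2020/Mar
print(resolve_path("/2020", "Mar/Mar_01_2020.parquet"))  # Example 3: /2020/Mar/Mar_01_2020.parquet
```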

Example 3: Directory structures and refresh periods

HeavyConnect to the Parquet data Mar_01_2020. Users expect daily updated data that may have been edited throughout the file.
1. Use the foreign server you created in Example 1.
2. Create the foreign table for use. In the CREATE FOREIGN TABLE statement, specify the update type, scheduling, and the additional file path to the Mar_01_2020.parquet file.
CREATE FOREIGN TABLE example_day_march_01 (
  id INTEGER,
  vendor_id TEXT ENCODING DICT(32),
  year_datetime TIMESTAMP(0),
  year_activities_longitude DECIMAL(14,2),
  year_activities_latitude DECIMAL(14,2),
  precipitation SMALLINT,
  snow_depth SMALLINT,
  snowfall SMALLINT,
  max_temperature SMALLINT,
  min_temperature SMALLINT,
  average_wind_speed SMALLINT)
SERVER example_parquet_server
WITH (REFRESH_TIMING_TYPE='SCHEDULED',
      REFRESH_UPDATE_TYPE='ALL',
      REFRESH_START_DATE_TIME='2020-03-01 23:30',
      REFRESH_INTERVAL='1D',
      FILE_PATH='Mar/Mar_01_2020.parquet');
REFRESH_UPDATE_TYPE='ALL' instructs the system to update all metadata and takes additional time for larger datasets. This can have an impact on performance when used with a short REFRESH_INTERVAL.
You can now query and build dashboards using the foreign table example_day_march_01.

Example 4: Directory structures and refresh periods

HeavyConnect to the Parquet data of year-over-year (YoY) January data. Users manually refresh the data that has been updated throughout the file.
1. Create the YoY Jan directory with symlinks to the January directories.
2. Use the foreign server provided by HeavyDB. You do not need to create a new one.
3. Create the foreign table for use. In the CREATE FOREIGN TABLE statement, specify the update type, scheduling, the additional file path, and the default HeavyDB server.
CREATE FOREIGN TABLE example_yoy_january (
  id INTEGER,
  vendor_id TEXT ENCODING DICT(32),
  year_datetime TIMESTAMP(0),
  year_activities_longitude DECIMAL(14,2),
  year_activities_latitude DECIMAL(14,2),
  precipitation SMALLINT,
  snow_depth SMALLINT,
  snowfall SMALLINT,
  max_temperature SMALLINT,
  min_temperature SMALLINT,
  average_wind_speed SMALLINT)
SERVER default_local_parquet
WITH (REFRESH_TIMING_TYPE='manual',
      REFRESH_UPDATE_TYPE='ALL',
      FILE_PATH='/yoy_january');
To refresh the data, issue the REFRESH FOREIGN TABLES example_yoy_january command.
You can now query and build dashboards using the foreign table example_yoy_january.

Example 5: AWS S3 Datastore

This example provides the full workflow required to HeavyConnect to a private S3 Parquet datastore.
See the AWS documentation for information on organizing your data and how folders are represented in S3.
1. Create a custom S3 foreign server for the private S3 bucket.
CREATE SERVER example_S3_parquet_server FOREIGN DATA WRAPPER parquet_file WITH (
  storage_type = 'AWS_S3',
  base_path = '/2020',
  s3_bucket = 'my-s3-bucket',
  aws_region = 'us-west-1'
);
2. Set up credentials for the foreign server. In the CREATE USER MAPPING statement, set the S3 access key and S3 secret key.
CREATE USER MAPPING FOR PUBLIC SERVER example_S3_parquet_server WITH (
  s3_access_key = 'xxxx',
  s3_secret_key = 'xxxx'
);
3. Create the foreign table to use. In the CREATE FOREIGN TABLE statement, specify the update type and scheduling.
CREATE FOREIGN TABLE example_year_2020 (
  id INTEGER,
  vendor_id TEXT ENCODING DICT(32),
  year_datetime TIMESTAMP(0),
  year_activities_longitude DECIMAL(14,2),
  year_activities_latitude DECIMAL(14,2),
  precipitation SMALLINT,
  snow_depth SMALLINT,
  snowfall SMALLINT,
  max_temperature SMALLINT,
  min_temperature SMALLINT,
  average_wind_speed SMALLINT)
SERVER example_S3_parquet_server
WITH (REFRESH_TIMING_TYPE='SCHEDULED',
      REFRESH_UPDATE_TYPE='APPEND',
      REFRESH_START_DATE_TIME='2020-01-31 22:30',
      REFRESH_INTERVAL='31D');
You can now query and build dashboards using the foreign table example_year_2020.

Example 6: AWS S3 Datastore

This example provides the full workflow required to HeavyConnect to a public S3 CSV datastore that uses the S3 Select access type.
1. Create a custom S3 foreign server for the public S3 bucket.
CREATE SERVER example_S3_csv_server FOREIGN DATA WRAPPER delimited_file WITH (
  storage_type = 'AWS_S3',
  base_path = '/2020',
  s3_bucket = 'my-s3-bucket',
  aws_region = 'us-west-1'
);
2. Set the public credentials for the foreign server, which uses the S3 Select access type. In the CREATE USER MAPPING statement, set the S3 access key and S3 secret key.
CREATE USER MAPPING FOR PUBLIC SERVER example_S3_csv_server WITH (
  s3_access_key = 'xxxx',
  s3_secret_key = 'xxxx'
);
3. Create the foreign table using the S3 Select access type. In the CREATE FOREIGN TABLE statement, specify the update type and scheduling.
CREATE FOREIGN TABLE example_year_2020 (
  id INTEGER,
  vendor_id TEXT ENCODING DICT(32),
  year_datetime TIMESTAMP(0),
  year_activities_longitude DECIMAL(14,2),
  year_activities_latitude DECIMAL(14,2),
  precipitation SMALLINT,
  snow_depth SMALLINT,
  snowfall SMALLINT,
  max_temperature SMALLINT,
  min_temperature SMALLINT,
  average_wind_speed SMALLINT)
SERVER example_S3_csv_server
WITH (REFRESH_TIMING_TYPE='SCHEDULED',
      REFRESH_UPDATE_TYPE='APPEND',
      REFRESH_START_DATE_TIME='2020-01-31 22:30',
      REFRESH_INTERVAL='31D',
      S3_ACCESS_TYPE = 'S3_SELECT');
You can now query and build dashboards using the foreign table example_year_2020.

Example 7: Processing Access Logs

This example illustrates how the regex parsed file data wrapper can be used to query a local log file, which uses the Common Log Format standard. Assume that the log file has the following content:
20.182.146.93 - joe [17/Nov/2021:13:00:00 -0800] "GET /posts HTTP/1.1" 200 1000
182.226.45.18 - bob [17/Nov/2021:13:05:00 -0800] "GET /home HTTP/1.1" 200 402
0.230.116.14 - sue [17/Nov/2021:13:20:00 -0800] "GET /profile HTTP/1.1" 200 550
20.182.146.93 - joe [17/Nov/2021:13:20:00 -0800] "POST /posts/1234/comments HTTP/1.1" 500 334
20.182.146.93 - joe [17/Nov/2021:13:21:00 -0800] "POST /posts/1234/comments HTTP/1.1" 200 120
20.182.146.93 - joe [17/Nov/2021:13:30:00 -0800] "POST /posts/1235/comments HTTP/1.1" 200 89
182.226.45.18 - bob [17/Nov/2021:13:31:00 -0800] "GET /posts HTTP/1.1" 200 1000
0.230.116.14 - sue [17/Nov/2021:13:31:00 -0800] "GET /posts HTTP/1.1" 200 1000
0.230.116.14 - sue [17/Nov/2021:13:35:00 -0800] "POST /posts/1234/comments HTTP/1.1" 200 49
20.182.146.93 - joe [17/Nov/2021:13:40:00 -0800] "POST /posts/1234/comments HTTP/1.1" 200 100
Create a foreign table that extracts all the fields in the logs:
CREATE FOREIGN TABLE access_logs (
  ip_address TEXT,
  user_id TEXT,
  log_timestamp TIMESTAMP,
  http_method TEXT,
  endpoint TEXT,
  http_status SMALLINT,
  response_size BIGINT
) SERVER default_local_regex_parsed
WITH (file_path = '/logs/sample.log',
      line_regex = '^(\d+\.\d+\.\d+\.\d+)\s+\-\s+(\w+)\s+\[([^\]]+)\]\s+\"(\w+)\s+([^\s]+)\s+HTTP\/1\.1"\s+(\d+)\s+(\d+)$');
Tip: Use a regex visualizer tool with a sample of the text file when determining the "line_regex" string.
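As an alternative to a visualizer, the pattern can be tried against a sample line in a few lines of Python. This is a sketch only: it uses Python's `re` module as a stand-in, and the regex engine used by HeavyDB may differ in edge cases. The names `LINE_REGEX`, `COLUMNS`, and `parse_line` are hypothetical; the pattern is a copy of the line_regex value, and the column names come from the CREATE FOREIGN TABLE statement above.

```python
import re

# Copy of the line_regex option, written as a Python raw string.
LINE_REGEX = re.compile(
    r'^(\d+\.\d+\.\d+\.\d+)\s+\-\s+(\w+)\s+\[([^\]]+)\]\s+\"(\w+)\s+([^\s]+)'
    r'\s+HTTP\/1\.1"\s+(\d+)\s+(\d+)$'
)

# One column per capture group, in the order the groups appear in the pattern.
COLUMNS = ["ip_address", "user_id", "log_timestamp", "http_method",
           "endpoint", "http_status", "response_size"]


def parse_line(line):
    """Return a column-to-text mapping for one log line, or None if it does not match."""
    match = LINE_REGEX.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


sample = ('20.182.146.93 - joe [17/Nov/2021:13:20:00 -0800] '
          '"POST /posts/1234/comments HTTP/1.1" 500 334')
print(parse_line(sample))
print(parse_line("not a log line"))
```

Each capture group lines up with one table column, so a count mismatch between groups and columns is easy to spot before creating the table.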
The table can now be queried using a HeavyDB client as normal:
SELECT * FROM access_logs WHERE http_status != 200;

ip_address   |user_id|log_timestamp      |http_method|endpoint            |http_status|response_size
20.182.146.93|joe    |2021-11-17 21:20:00|POST       |/posts/1234/comments|500        |334
The previous example uses the default provided "default_local_regex_parsed" server, which can be used to access files on the local file system without having to create a separate server object. Similar default servers exist for the CSV (default_local_delimited) and Parquet (default_local_parquet) data wrappers.
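In the query output for this example, log_timestamp reads 21:20:00, while the matching log line reads 13:20:00 -0800. That difference is consistent with the timestamp being normalized to UTC. This page does not state that explicitly, so treat it as an assumption. The following Python sketch reproduces the arithmetic; `to_utc_text` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc_text(raw):
    """Convert a Common Log Format timestamp to 'YYYY-MM-DD HH:MM:SS' in UTC."""
    # %z consumes the numeric UTC offset (for example -0800).
    local = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(to_utc_text("17/Nov/2021:13:20:00 -0800"))  # 2021-11-17 21:20:00
```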

Example 8: Processing Multi-Line Access Logs

The above example shows how a log file can be queried using the regex parsed file data wrapper. However, in certain cases, log messages may span multiple lines. In such a case, a "line_start_regex" option can be used to indicate the start of a new entry.
Assume that the log file has the following content with some entries spanning multiple lines:
20.182.146.93 - joe [17/Nov/2021:13:00:00 -0800] "GET /posts
HTTP/1.1" 200 1000
182.226.45.18 - bob
[17/Nov/2021:13:05:00 -0800] "GET /home HTTP/1.1" 200 402
0.230.116.14 - sue [17/Nov/2021:13:20:00 -0800] "GET /profile HTTP/1.1" 200 550
20.182.146.93 - joe [17/Nov/2021:13:20:00 -0800] "POST /posts/1234/comments HTTP/1.1" 500 334
20.182.146.93 - joe [17/Nov/2021:13:21:00 -0800] "POST
/posts/1234/comments HTTP/1.1" 200 120
20.182.146.93 - joe
[17/Nov/2021:13:30:00 -0800]
"POST /posts/1235/comments HTTP/1.1"
200
89
182.226.45.18 - bob [17/Nov/2021:13:31:00 -0800] "GET /posts HTTP/1.1" 200 1000
0.230.116.14 - sue [17/Nov/2021:13:31:00 -0800]
"GET /posts HTTP/1.1" 200
1000
0.230.116.14 - sue [17/Nov/2021:13:35:00 -0800] "POST /posts/1234/comments HTTP/1.1" 200 49
20.182.146.93 - joe [17/Nov/2021:13:40:00 -0800] "POST /posts/1234/comments HTTP/1.1"
200 100
Create a foreign table that extracts all the fields in the logs:
CREATE FOREIGN TABLE access_logs (
  ip_address TEXT,
  user_id TEXT,
  log_timestamp TIMESTAMP,
  http_method TEXT,
  endpoint TEXT,
  http_status SMALLINT,
  response_size BIGINT
) SERVER default_local_regex_parsed
WITH (file_path = '/logs/sample.log',
      line_regex = '^(\d+\.\d+\.\d+\.\d+)\s+\-\s+(\w+)\s+\[([^\]]+)\]\s+\"(\w+)\s+([^\s]+)\s+HTTP\/1\.1"\s+(\d+)\s+(\d+)$',
      line_start_regex = '^(\d+\.\d+\.\d+\.\d+)');
The table can now be queried using a HeavyDB client as normal:
SELECT * FROM access_logs WHERE http_status != 200;

ip_address   |user_id|log_timestamp      |http_method|endpoint            |http_status|response_size
20.182.146.93|joe    |2021-11-17 21:20:00|POST       |/posts/1234/comments|500        |334
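The role of line_start_regex in this example can be approximated in Python. The sketch groups physical lines into logical entries, starting a new entry whenever a line matches the start pattern, and then applies the full pattern to each entry. Joining continuation lines with a newline is an assumption about how entries are assembled, Python's `re` module stands in for the regex engine used by HeavyDB, and the function name `group_entries` is hypothetical.

```python
import re

LINE_START_REGEX = re.compile(r'^(\d+\.\d+\.\d+\.\d+)')
LINE_REGEX = re.compile(
    r'^(\d+\.\d+\.\d+\.\d+)\s+\-\s+(\w+)\s+\[([^\]]+)\]\s+\"(\w+)\s+([^\s]+)'
    r'\s+HTTP\/1\.1"\s+(\d+)\s+(\d+)$'
)


def group_entries(lines):
    """Merge physical lines into logical entries, one per line_start_regex match."""
    entries = []
    for line in lines:
        if LINE_START_REGEX.match(line) or not entries:
            # A line starting with an IP address opens a new entry.
            entries.append(line)
        else:
            # Any other line continues the current entry.
            entries[-1] += "\n" + line
    return entries


raw_lines = [
    '20.182.146.93 - joe [17/Nov/2021:13:00:00 -0800] "GET /posts',
    'HTTP/1.1" 200 1000',
    '182.226.45.18 - bob',
    '[17/Nov/2021:13:05:00 -0800] "GET /home HTTP/1.1" 200 402',
    '0.230.116.14 - sue [17/Nov/2021:13:20:00 -0800] "GET /profile HTTP/1.1" 200 550',
]

for entry in group_entries(raw_lines):
    print(LINE_REGEX.match(entry).groups())
```

Because every separator in the pattern is `\s+`, a line break inside an entry is matched the same way as a space, which is why the same line_regex works for both the single-line and multi-line files.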

Example 9: PostgreSQL Access Using an ODBC DSN (Beta)

ODBC HeavyConnect is currently in beta.
This example illustrates how the ODBC data wrapper can be used to access data residing in a PostgreSQL RDBMS.
1. Download the PostgreSQL ODBC driver (see https://odbc.postgresql.org/ for details). Alternatively, install the driver using an operating system package manager.
2. Add an /etc/odbc.ini file with the following configuration:
[ODBC Data Sources]
postgres_db_1=postgres_db_1

[postgres_db_1]
Description=Local PostgreSQL database
Driver=/usr/lib/odbc-drivers/libpgodbc.so ; Symlink to downloaded driver .so file
Database=postgres
Servername=localhost
Port=5432
Alternatively, the above configuration can be put in an .odbc.ini file in the home directory (i.e. ~/.odbc.ini), if the server process is started under a specific user account.
3. Create a custom ODBC foreign server for the PostgreSQL database:
CREATE SERVER my_postgres_server FOREIGN DATA WRAPPER odbc
WITH (data_source_name = 'postgres_db_1');
data_source_name is set to the data source name that is configured in the odbc.ini file (postgres_db_1 in this example).
4. Set the credentials for the foreign server using a user mapping:
CREATE USER MAPPING FOR PUBLIC SERVER my_postgres_server
WITH (username = 'username', password = 'password');
5. Create a foreign table that references the above server:
CREATE FOREIGN TABLE example_table (device_id INTEGER, message TEXT, event_timestamp TIMESTAMP)
SERVER my_postgres_server
WITH (sql_select = 'SELECT * FROM remote_postgres_table WHERE event_timestamp > $$2020-01-01$$;',
      sql_order_by = 'event_timestamp');
You can now query the foreign table as normal.
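Before creating the foreign server, it can help to confirm that the value planned for data_source_name matches a section in the odbc.ini file. The following Python sketch does this with the standard `configparser` module. It is an approximation: `configparser` is not the parser used by the ODBC driver manager, and the helper name `dsn_settings` is hypothetical. The configuration text is the one from step 2.

```python
import configparser

ODBC_INI = """\
[ODBC Data Sources]
postgres_db_1=postgres_db_1

[postgres_db_1]
Description=Local PostgreSQL database
Driver=/usr/lib/odbc-drivers/libpgodbc.so ; Symlink to downloaded driver .so file
Database=postgres
Servername=localhost
Port=5432
"""


def dsn_settings(ini_text, data_source_name):
    """Return the settings of one DSN section, or raise KeyError if it is missing."""
    # Treat ' ; ...' as an inline comment, as in the Driver line above.
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(ini_text)
    if not parser.has_section(data_source_name):
        raise KeyError(f"DSN {data_source_name!r} is not defined")
    # configparser lowercases option names by default.
    return dict(parser[data_source_name])


settings = dsn_settings(ODBC_INI, "postgres_db_1")
print(settings["driver"], settings["servername"], settings["port"])
```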

Example 10: PostgreSQL Access Using ODBC Connection String (Beta)

ODBC HeavyConnect is currently in beta.
The previous example shows how an RDBMS can be accessed using an ODBC configuration file that resides on the server. However, a system administrator might not know about, or be able to configure, every RDBMS database that users want to access. As an alternative, the ODBC data wrapper allows the configuration for a remote RDBMS database to be set using a connection string.
In this example, assume that data needs to be fetched from another PostgreSQL database called "my_postgres_db", which is on a server with hostname "my_postgres.example.com" running on port 1234.
1. Download the PostgreSQL ODBC driver, along with drivers for every other type of RDBMS that users are expected to use.
2. Add an /etc/odbcinst.ini file with configurations for all drivers:
[ODBC Drivers]
PostgreSQL=Installed
Redshift=Installed
Snowflake=Installed

[PostgreSQL]
Description=PostgreSQL ODBC driver
Driver=/usr/lib/odbc-drivers/libpgodbc.so ; Symlink to downloaded driver .so file

[Redshift]
Description=Redshift ODBC driver
Driver=/usr/lib/odbc-drivers/libredshiftodbc.so ; Symlink to downloaded driver .so file

[Snowflake]
Description=Snowflake ODBC Driver
Driver=/usr/lib/odbc-drivers/libsnowflakeodbc.so ; Symlink to downloaded driver .so file
Use an intuitive name for the driver. Ideally, this should be the official name of the RDBMS, so that users can easily figure out the driver name to use when creating the foreign server object.
3. Create a custom ODBC foreign server for the PostgreSQL database:
CREATE SERVER my_postgres_server FOREIGN DATA WRAPPER odbc
WITH (connection_string = 'Driver=PostgreSQL;Database=my_postgres_db;Servername=my_postgres.example.com;Port=1234');
Users can create foreign server objects using their RDBMS database details and the name of the installed driver.
4. Set the credentials for the foreign server using a user mapping:
CREATE USER MAPPING FOR PUBLIC SERVER my_postgres_server
WITH (credential_string = 'Username=username;Password=password');
When the "connection_string" option is used in the foreign server definition, the corresponding user mapping has to use a "credential_string" option, which contains the username and password.
5. Create a foreign table that references the above server:
CREATE FOREIGN TABLE example_table (device_id INTEGER, message TEXT, event_timestamp TIMESTAMP)
SERVER my_postgres_server
WITH (sql_select = 'SELECT * FROM remote_postgres_table WHERE event_timestamp > $$2020-01-01$$;',
      sql_order_by = 'event_timestamp');
You can now query the foreign table as normal.
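The connection_string and credential_string values in this example are semicolon-separated Key=value pairs. The following Python sketch assembles them from their parts, which can help avoid typos when the values are generated by a script. The helper name `join_pairs` is hypothetical, and the sketch assumes that no value contains a semicolon, because it does not escape values.

```python
def join_pairs(pairs):
    """Join (key, value) pairs as 'Key=value;Key=value'."""
    return ";".join(f"{key}={value}" for key, value in pairs)


# The Driver value must match a section name in /etc/odbcinst.ini.
connection_string = join_pairs([
    ("Driver", "PostgreSQL"),
    ("Database", "my_postgres_db"),
    ("Servername", "my_postgres.example.com"),
    ("Port", 1234),
])
credential_string = join_pairs([
    ("Username", "username"),
    ("Password", "password"),
])

print(connection_string)
print(credential_string)
```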