HEAVY.AI has minimal required configuration and a number of optional configuration settings. This topic describes the required and optional configuration changes you can make in your HEAVY.AI instance.
In release 4.5.0 and higher, HEAVY.AI requires that all configuration flags used at startup match a flag on the HEAVY.AI server. If any flag is misspelled or invalid, the server does not start. This helps ensure that all settings are intentional and will not have an unexpected impact on performance or data integrity.
Before starting the HEAVY.AI server, you must initialize the persistent storage directory:
1. Create an empty directory at the desired path, such as /var/lib/heavyai, and create the environment variable $HEAVYAI_BASE.
2. Change the owner of the directory to the user that the server will run as ($HEAVYAI_USER), where $HEAVYAI_USER is the system user account that the server runs as, such as heavyai, and $HEAVYAI_BASE is the path to the parent of the HEAVY.AI server storage directory.
3. Run $HEAVYAI_PATH/bin/initheavy with the storage directory path as the argument.
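A minimal sketch of these steps on a Linux system, assuming the storage directory lives at $HEAVYAI_BASE/storage and the service account is named heavyai (adjust paths and the user name for your installation):

    # Step 1: create the storage parent directory and the environment variable
    sudo mkdir -p /var/lib/heavyai
    export HEAVYAI_BASE=/var/lib/heavyai
    # Step 2: give ownership to the service account ($HEAVYAI_USER)
    sudo chown -R heavyai $HEAVYAI_BASE
    # Step 3: initialize the storage directory
    sudo $HEAVYAI_PATH/bin/initheavy $HEAVYAI_BASE/storage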
Immerse serves the application from the root path (/) by default. To serve the application from a sub-path, you must modify the $HEAVYAI_PATH/frontend/app-config.js file to change the IMMERSE_PATH_PREFIX value. The Heavy Immerse path must start with a forward slash (/).
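For example, to serve Immerse from a /heavyai sub-path, the relevant setting in app-config.js might look like the following sketch (the exact surrounding structure of your app-config.js version may differ):

    IMMERSE_PATH_PREFIX: "/heavyai",  // must start with a forward slash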
The configuration file stores runtime options for your HEAVY.AI servers. You can use the file to change the default behavior.
The heavy.conf file is stored in the $HEAVYAI_BASE directory. The configuration settings are picked up automatically by the sudo systemctl start heavydb and sudo systemctl start heavy_web_server commands.
Set the flags in the configuration file using the format <flag> = <value>. Strings must be enclosed in quotes.
The following is a sample configuration file. The entry for the data path is a string and must be in quotes. The last entry in the first section, for null-div-by-zero, is the Boolean value true and does not require quotes.
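A minimal sketch of such a configuration file (port values and paths are illustrative):

    port = 6274
    http-port = 6278
    data = "/var/lib/heavyai/storage"
    null-div-by-zero = true

    [web]
    port = 6273
    frontend = "/opt/heavyai/frontend"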
To comment out a line in heavy.conf, prepend the line with the pound sign (#) character.
For encrypted backend connections, if you do not use a configuration file to start the database, Calcite expects passwords to be supplied on the command line, where they are visible in the process table. If a configuration file is supplied, passwords must be supplied in the file; if they are not, Calcite fails.
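For example, when backend connections are encrypted, the keystore and trust store passwords belong in heavy.conf rather than on the command line (paths and passwords below are placeholders):

    ssl-keystore = "/path/to/server.jks"
    ssl-keystore-password = "keystore_password"
    ssl-trust-store = "/path/to/truststore.jks"
    ssl-trust-password = "truststore_password"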
Following are the parameters for runtime settings on HeavyAI Web Server. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.
Following are the parameters for runtime settings on HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.
For example, consider allow-loop-joins [=arg(=1)] (=0).
If you do not use this flag, loop joins are not allowed by default.
If you provide no arguments, the implied value is 1 (true) (allow-loop-joins).
If you provide the argument 0, that is the same as the default (allow-loop-joins=0).
If you provide the argument 1, that is the same as the implied value (allow-loop-joins=1).
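On the heavydb command line, the three cases above look like this:

    --allow-loop-joins       # no argument: implied value 1 (true)
    --allow-loop-joins=0     # explicit argument 0: same as the default
    --allow-loop-joins=1     # explicit argument 1: same as the implied value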
Following are additional parameters for runtime settings for the Enterprise Edition of HeavyDB. The parameter syntax provides both the implied value and the default value as appropriate. Optional arguments are in square brackets, while implied and default values are in parentheses.
Flag
Description
Default
additional-file-upload-extensions <string>
Denote additional file extensions for uploads. Has no effect if --enable-upload-extension-check is not set.
allow-any-origin
Allows a CORS exception to the same-origin policy. Required to be true if Immerse is hosted on a domain or subdomain different from the one hosting heavy_web_server and heavydb.
Allowing any origin is a less secure mode than what heavy_web_server requires by default.
--allow-any-origin = false
-b | backend-url <string>
URL to http-port on heavydb. Change to avoid collisions with other services.
http://localhost:6278
-B | binary-backend-url <string>
URL to http-binary-port on heavydb.
http://localhost:6276
cert string
Certificate file for HTTPS. Change for testing and debugging.
cert.pem
-c | config <string>
Path to HeavyDB configuration file. Change for testing and debugging.
-d | data <string>
Path to HeavyDB data directory. Change for testing and debugging.
data
data-catalog <string>
Path to data catalog directory.
n/a
docs string
Path to documentation directory. Change if you move your documentation files to another directory.
docs
enable-binary-thrift
Use the binary thrift protocol.
TRUE[1]
enable-browser-logs [=arg]
Enable access to current log files via web browser. Only super users (while logged in) can access log files.
Log files are available at http[s]://host:port/logs/log_name.
Web server log files: ACCESS - http[s]://host:port/logs/access; ALL - http[s]://host:port/logs/all.
HeavyDB log files: INFO - http[s]://host:port/logs/info; WARNING - http[s]://host:port/logs/warning; ERROR - http[s]://host:port/logs/error.
FALSE[0]
enable-cert-verification
TLS certificate verification is a security measure that can be disabled when TLS certificates are not issued by a trusted certificate authority. If you use a locally or unofficially generated TLS certificate to secure the connection between heavydb and heavy_web_server, this parameter must be set to false. By default, heavy_web_server expects a certificate from a trusted certificate authority.
--enable-cert-verification = true
enable-cross-domain [=arg]
Enable frontend cross-domain authentication. Cross-domain session cookies require the SameSite = None; Secure headers. Can only be used with HTTPS domains; requires enable-https to be true.
FALSE[0]
enable-https
Enable HTTPS support. Change to enable secure HTTP.
enable-https-authentication
Enable PKI authentication.
enable-https-redirect [=arg]
Enable a new port that heavy_web_server listens on for incoming HTTP requests. When a request is received, the server returns a redirect response to the HTTPS port and protocol, so that browsers are immediately and transparently redirected. Use to provide a HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80 and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default HTTPS port 443, with requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE and http-to-https-redirect-port = 80.
FALSE[0]
enable-non-kernel-time-query-interrupt
Enable non-kernel-time query interrupt.
TRUE[1]
enable-runtime-query-interrupt
Enable runtime query interrupt.
TRUE[1]
enable-upload-extension-check
Enables the restrictive file extension check for uploads.
encryption-key-file-path <string>
Path to the file containing the credential payload cipher key. Key must be 256 bits in length.
-f | frontend string
Path to frontend directory. Change if you move the location of your frontend UI files.
frontend
http-to-https-redirect-port = arg
Configures the HTTP (incoming) port used by enable-https-redirect. The port option specifies the redirect port number. Use to provide a HEAVY.AI front end that can run on both the HTTP protocol (http://my-heavyai-frontend.com) on default HTTP port 80 and on the primary HTTPS protocol (https://my-heavyai-frontend.com) on default HTTPS port 443, with requests to the HTTP protocol automatically redirected to HTTPS. Without this, requests to HTTP fail. Assuming heavy_web_server can attach to ports below 1024, the configuration would be: enable-https-redirect = TRUE and http-to-https-redirect-port = 80.
6280
idle-session-duration = arg
Idle session duration, in minutes.
60
jupyter-prefix-string <string>
Jupyter Hub base_url for Jupyter integration.
/jupyter
jupyter-url-string <string>
URL for Jupyter integration.
-j | jwt-key-file
Path to a key file for client session encryption.
The file is expected to be a PEM-formatted (.pem) certificate file containing the unencrypted private key in PKCS #1, PKCS #8, or ASN.1 DER form. An example of PEM file creation using OpenSSL is shown below.
Required only if using a high-availability server configuration or another server configuration that requires an instance of Immerse to talk to multiple heavy_web_server instances.
Each heavy_web_server instance needs to use the same encryption key to encrypt and decrypt client session information, which is used for session persistence ("sessionization") in Immerse.
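As an example of the PEM file creation mentioned above, OpenSSL can generate an unencrypted RSA private key in PKCS #1 PEM form (file name and key size are illustrative):

    openssl genrsa -out jwt-session-key.pem 2048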
key <string>
Key file for HTTPS. Change for testing and debugging.
key.pem
max-tls-version
Refers to the version of TLS encryption used to secure web protocol connections. Specifies a maximum TLS version.
min-tls-version
Refers to the version of TLS encryption used to secure web protocol connections. Specifies a minimum TLS version.
--min-tls-version = VersionTLS12
peer-cert <string>
Peer CA certificate PKI authentication.
peercert.pem
-p | port int
Frontend server port. Change to avoid collisions with other services.
6273
-r | read-only
Enable read-only mode. Prevent changes to the data.
secure-acao-uri
If set, ensures that all Access-Control-Allow-Origin headers are set to the value provided.
servers-json <string>
Path to servers.json. Change for testing and debugging.
session-id-header <string>
Session ID header.
immersesid
ssl-cert <string>
SSL validated public certificate.
sslcert.pem
ssl-private-key <string>
SSL private key file.
sslprivate.key
strip-x-headers <strings>
List of custom X HTTP request headers to be removed from incoming requests. Use --strip-x-headers="" to allow all X headers through.
[X-HeavyDB-Username]
timeout duration
Maximum request duration in #h#m#s format; for example, 0h30m0s represents a duration of 30 minutes. Controls the maximum duration of individual HTTP requests. Used to manage resource exhaustion caused by improperly closed connections.
This also limits the execution time of queries made over the Thrift HTTP transport. Increase the duration if queries are expected to take longer than the default duration of one hour; for example, if you COPY FROM a large file when using heavysql with the HTTP transport.
1h0m0s
tls-cipher-suites <strings>
Refers to the combination of algorithms used in TLS encryption to secure web protocol connections.
All available TLS cipher suites compatible with HTTP/2:
TLS_RSA_WITH_RC4_128_SHA
TLS_RSA_WITH_AES_128_CBC_SHA
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
TLS_AES_128_GCM_SHA256
TLS_AES_256_GCM_SHA384
TLS_CHACHA20_POLY1305_SHA256
TLS_FALLBACK_SCSV
Limit security vulnerabilities by specifying the allowed TLS ciphers in the encryption used to secure web protocol connections.
The following cipher suites are accepted by default:
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TLS_RSA_WITH_AES_256_GCM_SHA384
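For example, to restrict heavy_web_server to a single modern cipher suite, heavy.conf might contain the following sketch (whether your version also accepts a comma-separated list is an assumption to verify):

    [web]
    tls-cipher-suites = "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"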
tls-curves <strings>
Refers to the types of Elliptic Curve Cryptography (ECC) used in TLS encryption to secure web protocol connections.
All available TLS elliptic curve IDs:
secp256r1 (Curve ID P256)
CurveP256 (Curve ID P256)
secp384r1 (Curve ID P384)
CurveP384 (Curve ID P384)
secp521r1 (Curve ID P521)
CurveP521 (Curve ID P521)
x25519 (Curve ID X25519)
X25519 (Curve ID X25519)
Limit security vulnerabilities by specifying the allowed TLS curves used in the encryption that secures web protocol connections.
The following TLS curves are accepted by default:
CurveP521
CurveP384
CurveP256
tmpdir string
Path for temporary file storage. Used as a staging location for file uploads. Consider locating this directory on the same file system as the HEAVY.AI data directory. If not specified on the command line, heavy_web_server recognizes the standard TMPDIR environment variable as well as a specific HEAVYAI_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command-line argument nor one of the environment variables, the default /tmp is used.
/tmp
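For example, to stage uploads on the same file system as the data directory without using the command-line flag (path is illustrative):

    export HEAVYAI_TMPDIR=/var/lib/heavyai/tmp   # takes precedence over TMPDIR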
ultra-secure-mode
Enables secure mode that sets Access-Control-Allow-Origin headers to --secure-acao-uri and sets security headers such as X-Frame-Options, Content-Security-Policy, and Strict-Transport-Security.
-v | verbose
Enable verbose logging. Adds log messages for debugging purposes.
version
Return version.
Flag
Description
Default Value
allow-cpu-retry [=arg]
Allow queries that fail on GPU to retry on CPU, even when the watchdog is enabled. When the watchdog is enabled, most queries that run on GPU and throw a watchdog exception fail; turn this on to allow those queries to retry on CPU. The default behavior is for queries that run out of memory on GPU to throw an error if the watchdog is enabled. The watchdog is enabled by default.
TRUE[1]
allow-cpu-kernel-concurrency
Allow multiple queries to run execution kernels concurrently on CPU.
Example: In a system with 4 executors (controlled by the num-executors parameter), 3+1 queries can run concurrently on CPU (the +1 depends on allow-cpu-gpu-kernel-concurrency).
DEFAULT: ON
allow-cpu-gpu-kernel-concurrency
Allow multiple queries to run execution kernels concurrently on CPU while a GPU query is executing.
Example: In a system with 4 executors (controlled by the num-executors parameter), one of the 4 slots can be used to run a GPU query while the other 3 run queries on CPU.
DEFAULT: ON
allow-local-auth-fallback [=arg(=1)] (=0)
If SAML or LDAP logins are enabled, and the logins fail, this setting enables authentication based on internally stored login credentials. Command-line tools or other tools that do not support SAML might reject those users from logging in unless this feature is enabled. This allows a user to log in using credentials on the local database.
FALSE[0]
allow-loop-joins [=arg(=1)] (=0)
Enables all join queries to fall back to the loop join implementation. During a loop join, queries loop over all rows from all tables involved in the join, and evaluate the join condition. By default, loop joins are only allowed if the number of rows in the inner table is fewer than the trivial-loop-join-threshold, since loop joins are computationally expensive and run for an extended period. Modifying the trivial-loop-join-threshold is a safer alternative to globally enabling loop joins. You might choose to globally enable loop joins when you have many small tables for which loop join performance has been determined to be acceptable but modifying the trivial join loop threshold would be tedious.
FALSE[0]
allowed-export-paths = ["root_path_1", "root_path_2", ...]
Specify a list of allowed root paths that can be used in export operations, such as the COPY TO command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine. For example:
allowed-export-paths = ["/heavyai-storage/data/heavyai_export", "/home/centos"]
The list of paths must be on the same line as the configuration parameter.
Allowed file paths are enforced by default. The default export path (<data directory>/heavyai_export) is allowed by default, and all child paths of that path are allowed.
When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY TO command, an error response is returned.
N/A
allow-s3-server-privileges
Allow S3 server privileges if IAM user credentials are not provided. Credentials can be specified with environment variables (such as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and so on), an AWS credentials file, or when running on an EC2 instance, with an IAM role that is attached to the instance.
FALSE[0]
allowed-import-paths = ["root_path_1", "root_path_2", ...]
Specify a list of allowed root paths that can be used in import operations, such as the COPY FROM command. Helps prevent exploitation of security vulnerabilities and prevent server crashes, data breaches, and full remote control of the host machine.
For example:
allowed-import-paths = ["/heavyai-storage/data/heavyai_import", "/home/centos"]
The list of paths must be on the same line as the configuration parameter.
Allowed file paths are enforced by default. The default import path (<data directory>/heavyai_import) is allowed by default, and all child paths of that allowed path are allowed.
When using commands with other paths, the provided paths must be under an allowed root path. If you try to use a nonallowed path in a COPY FROM command, an error response is returned.
N/A
approx_quantile_buffer arg
Size of a temporary buffer that is used to copy in the data for the APPROX_MEDIAN calculation. When full, it is sorted before being merged into the internal distribution buffer configured by approx_quantile_centroids.
1000
approx_quantile_centroids arg
Size of the internal buffer used to approximate the distribution of the data for which the APPROX_MEDIAN calculation is taken. The larger the value, the greater the accuracy of the answer.
300
auth-cookie-name arg
Configure the authentication cookie name. If not explicitly set, the default name is oat.
oat
bigint-count [=arg]
Use 64-bit count. Disabled by default because 64-bit integer atomics are slow on GPUs. Enable this setting if you see negative values for a count, indicating overflow. In addition, if your data set has more than 4 billion records, you likely need to enable this setting.
FALSE[0]
bitmap-memory-limit arg
Set the maximum amount of memory (in GB) allocated for APPROX_COUNT_DISTINCT bitmaps per execution kernel (thread or GPU).
8
calcite-max-mem arg
Max memory available to calcite JVM. Change if Calcite reports out-of-memory errors.
1024
calcite-port arg
Calcite port number. Change to avoid collisions with ports already in use.
6279
calcite-service-timeout
Service timeout value, in milliseconds, for communications with Calcite. On databases with large numbers of tables, large numbers of concurrent queries, or many parallel updates and deletes, Calcite might return less quickly. Increasing the timeout value can prevent THRIFT_EAGAIN timeout errors.
5000
columnar-large-projections[=arg]
Sets automatic use of columnar output, instead of row-wise output, for large projections.
TRUE
columnar-large-projections-threshold arg
Set the row-number threshold size for columnar output instead of row-wise output.
1000000
config arg
Path to heavy.conf. Change for testing and debugging.
$HEAVYAI_STORAGE/heavy.conf
cpu-only
Run in CPU-only mode. Set this flag to force HeavyDB to run in CPU mode, even when GPUs are available. Useful for debugging and on shared-tenancy systems where the current HeavyDB instance does not need to run on GPUs.
FALSE
cpu-buffer-mem-bytes arg
Size of memory reserved for CPU buffers [bytes]. Change to restrict the amount of CPU/system memory HeavyDB can consume. A default value of 0 indicates no limit on CPU memory use. (HEAVY.AI Server uses all available CPU memory on the system.)
0
cuda-block-size arg
Size of block to use on GPU. GPU performance tuning: Number of threads per block. Default of 0 means use all threads per block.
0
cuda-grid-size arg
Size of grid to use on GPU. GPU performance tuning: Number of blocks per device. Default of 0 means use all available blocks per device.
0
data arg
Directory path to HEAVY.AI catalogs. Change for testing and debugging.
$HEAVYAI_STORAGE
db-query-list arg
Path to file containing HEAVY.AI queries. Use a query list to autoload data to GPU memory on startup to speed performance. See Preloading Data.
N/A
dynamic-watchdog-time-limit [=arg]
Dynamic watchdog time limit, in milliseconds. Change if dynamic watchdog is stopping queries expected to take longer than this limit.
100000
enable-auto-clear-render-mem [=arg]
Enable/disable automatic clearing of render GPU memory on out-of-memory errors during rendering. If an out-of-gpu-memory exception is thrown while rendering, many users respond by running \clear_gpu via the heavysql command-line interface to refresh/defrag the memory heap. Enabling this flag automates that process. At present, only GPU memory in the renderer is cleared automatically.
TRUE[1]
enable-auto-metadata-update [=arg]
Enable automatic metadata updates on UPDATE queries. Automatic metadata updates are turned on by default. Disabling may result in stale metadata and reductions in query performance.
TRUE[1]
enable-columnar-output [=arg]
Allows HEAVY.AI Core to directly materialize intermediate projections and the final ResultSet in Columnar format where appropriate. Columnar output is an internal performance enhancement that projects the results of an intermediate processing step in columnar format. Consider disabling this feature if you see unexpected performance regressions in your queries.
TRUE[1]
enable-data-recycler [=arg]
Set to TRUE to enable the data recycler. Enabling the recycler enables the following:
Hashtable recycler, which is the cache storage.
Hashing scheme recycler, which preserves a hashtable layout (such as perfect hashing and keyed hashing).
Overlaps hashtable tuning parameter recycler. Each overlap hashtable has its own parameters used during hashtable building.
TRUE[1]
enable-debug-timer [=arg]
Enable fine-grained query execution timers for debug. For debugging, logs verbose timing information for query execution (time to load data, time to compile code, and so on).
FALSE[0]
enable-direct-columnarization [=arg(=1)] (=0)
Columnarization organizes intermediate results in a multi-step query in the most efficient way for the next step in the process. If you see an unexpected performance regression, you can try setting this value to false, enabling the earlier HEAVY.AI columnarization behavior.
TRUE[1]
enable-dynamic-watchdog [=arg]
Enable dynamic watchdog.
FALSE[0]
enable-executor-resource-mgr [=arg]
Enable the executor resource manager. Set to FALSE to disable it.
TRUE[1]
enable-filter-push-down [=arg(=1)] (=0)
Enable filter push-down through joins. Evaluates filters in the query expression for selectivity and pushes down highly selective filters into the join according to selectivity parameters. See also What is Predicate Pushdown?
FALSE[0]
enable-foreign-table-scheduled-refresh [=arg]
Enable scheduled refreshes of foreign tables. Enables automated refresh of foreign tables with "REFRESH_TIMING_TYPE" option of "SCHEDULED" based on the specified refresh schedule.
TRUE[1]
enable-geo-ops-on-uncompressed-coords [=arg(=1)] (=0)
Allow geospatial operations ST_Contains and ST_Intersects to process uncompressed coordinates where possible to increase execution speed.
Provides control over the selection of ST_Contains and ST_Intersects implementations. By default, for certain combinations of compressed geospatial arguments, such as ST_Contains(POLYGON, POINT), the implementation can process uncompressed coordinate values. This can result in much faster execution but could decrease precision. Disabling this option enables full decompression, which is slower but more precise.
TRUE[1]
enable-logs-system-tables [=arg(=1)] (=0)
Enable use of logs system tables. Also enables the Request Logs and Monitoring system dashboard (Enterprise Edition only).
FALSE[0]
enable-overlaps-hashjoin [=arg(=1)] (=0)
Enable the overlaps hash join framework allowing for range join (for example, spatial overlaps) computation using a hash table.
TRUE[1]
enable-runtime-query-interrupt [=arg(=1)] (=0)
Enable the runtime query interrupt. Setting to TRUE can reduce performance slightly. Use with runtime-query-interrupt-frequency to set the interrupt frequency.
FALSE[0]
enable-runtime-udf
Enable runtime user defined function registration. Enables runtime registration of user defined functions. This functionality is turned off unless you specifically request it, to prevent unintentional inclusion of nonstandard code. This setting is a precursor to more advanced object permissions planned in future releases.
FALSE[0]
enable-string-dict-hash-cache[=arg(=1)] (=0)
When importing a large table with low cardinality, set the flag to TRUE and leave it on to assist with bulk queries. If using String Dictionary Server, set the flag to FALSE if the String Dictionary server uses more memory than the physical system can support.
TRUE[1]
enable-thrift-logs [=arg(=1)] (=0)
Enable writing messages directly from Thrift to stdout/stderr. Change to enable verbose Thrift messages on the console.
FALSE[0]
enable-watchdog [arg]
Enable watchdog.
TRUE[1]
executor-cpu-result-mem-ratio
Set executor resource manager reserved memory for query result sets as a ratio greater than 0, representing the fraction of the system memory not allocatable for the CPU buffer pool. Values greater than 1.0 are permitted to allow over-subscription when warranted, but too high a value can cause out-of-memory errors.
Example: In a system with 256 GB of RAM, the default CPU buffer pool size is 204.8 GB, so this ratio is calculated on the remaining 51.2 GB; a ratio of 0.8 limits the maximum result-set memory for a single query to about 41 GB.
executor-cpu-result-mem-bytes
Set executor resource manager reserved memory for query result sets in bytes. This overrides the default reservation of 80% the size of the system memory that is not allocated for the CPU buffer pool. Use 0 for auto.
DEFAULT: None (result memory size is controlled via the ratio setting above)
executor-per-query-max-cpu-threads-ratio
Set max fraction of executor resource manager total CPU slots/threads that can be allocated for a single query.
Values of executor-per-query-max-cpu-threads-ratio greater than 1 are allowed, permitting over-subscription of threads when warranted, because kernel core-occupation estimates can be overly pessimistic for some classes of queries. Take care not to set this value too high, however, because thrashing and thread starvation can result. Example: On a physical server with 24 logical CPUs, or in a VM with 24 vCPUs, the executor thread count is doubled to 48, so a value of 0.9 allows up to 43 threads for a single query. Lower this value to reduce the memory requirements of single queries.
DEFAULT: 0.9
executor-per-query-max-cpu-result-mem-ratio
Set max fraction of executor resource manager total CPU result memory reservation that can be allocated for a single query.
Values of executor-per-query-max-cpu-result-mem-ratio greater than 1 are allowed, permitting over-subscription of memory when warranted, but be careful: too high a value can cause out-of-memory errors.
Default: 0.8
filter-push-down-low-frac
Threshold for the selectivity of filters that are pushed down. Filters with selectivity lower than this threshold are considered for push-down.
filter-push-down-passing-row-ubound
Upper bound on the number of rows that should pass the filter if the selectivity is less than the high fraction threshold.
flush-log [arg]
Immediately flush logs to disk. Set to FALSE if this is a performance bottleneck.
TRUE[1]
from-table-reordering [=arg(=1)] (=1)
Enable automatic table reordering in FROM clause. Reorders the sequence of a join to place large tables on the inside of the join clause and smaller tables on the outside. HEAVY.AI also reorders tables between join clauses to prefer hash joins over loop joins. Change this value only in consultation with an HEAVY.AI engineer.
TRUE[1]
gpu-buffer-mem-bytes [=arg]
Size of memory reserved for GPU buffers in bytes per GPU. Change to restrict the amount of GPU memory HeavyDB can consume per GPU. A default value of 0 indicates no limit on GPU memory use (HeavyDB uses all available GPU memory across all active GPUs on the system).
0
Maximum amount of memory in bytes that can be used for the GPU code cache.
134217728 (128MB)
gpu-input-mem-limit arg
Force query to CPU when input data memory usage exceeds this percentage of available GPU memory. HeavyDB loads data to GPU incrementally until data exceeds GPU memory, at which point the system retries on CPU. Loading data to GPU evicts any resident data already loaded or any query results that are cached. Use this limit to avoid attempting to load datasets to GPU when they obviously will not fit, preserving cached data on GPU and increasing query performance.
If watchdog is enabled and allow-cpu-retry is not enabled, the query fails instead of re-running on CPU.
0.9
hashtable-cache-total-bytes [=arg]
The total size of the cache storage for the hashtable recycler, in bytes. Increase the cache size to store more hashtables. Must be larger than or equal to the value defined in max-cacheable-hashtable-size-bytes.
4294967296 (4GB)
hll-precision-bits [=arg]
Number of bits from the hash value used to specify the bucket number. Change to increase or decrease approx_count_distinct() precision. Increased precision decreases performance.
11
http-port arg
HTTP port number. Change to avoid collisions with ports already in use.
6278
idle-session-duration arg
Maximum duration of an idle session, in minutes. Change to increase or decrease duration of an idle session before timeout.
60
inner-join-fragment-skipping [=arg(=1)] (=0)
Enable or disable inner join fragment skipping. Enables skipping fragments for improved performance during inner join operations.
FALSE[0]
license arg
Path to the file containing the license key. Change if your license file is in a different location or has a different name.
log-auto-flush
Flush logging buffer to file after each message. Changing to false can improve performance, but log lines might not appear in the log for a very long time. HEAVY.AI does not recommend changing this setting.
TRUE[1]
log-directory arg
Path to the log directory. Can be either a relative path to the $HEAVYAI_STORAGE/data directory or an absolute path. Use this flag to control the location of your HEAVY.AI log files. If the directory does not exist, HEAVY.AI creates the top level directory. For example, a/b/c/logdir is created only if the directory path a/b/c already exists.
/var/lib/heavyai/data/heavyai_log
log-file-name
Boilerplate for the name of the HEAVY.AI log files. You can customize the name of your HEAVY.AI log files. {SEVERITY} is the only braced token recognized. It allows you to create separate files for each type of error message greater than or equal to the log-severity configuration option.
heavydb.{SEVERITY}.%Y%m%d-%H%M%S.log
log-max-files
Maximum number of log files to keep. When the number of log files exceeds this number, HEAVY.AI automatically deletes the oldest files.
100
log-min-free-space
Minimum number of bytes left on device before oldest log files are deleted. This is a safety feature to be sure the disk drive of the log directory does not fill up, and guarantees that at least this many bytes are free.
20971520
log-rotation-size
Maximum file size in bytes before new log files are started. Change to increase/decrease size of files. If log files fill quickly, you might want to increase this number so that there are fewer log files.
10485760
log-rotate-daily
Start new log files at midnight. Set to false to write to log files until they are full, rather than restarting each day.
TRUE[1]
log-severity
Log to file severity levels:
DEBUG4
DEBUG3
DEBUG2
DEBUG1
INFO
WARNING
ERROR
FATAL
Messages at and above your chosen base severity level are logged. For example, if you set the severity level to WARNING, HEAVY.AI logs only WARNING, ERROR, and FATAL messages.
INFO
log-severity-clog
Log to console severity level: INFO WARNING ERROR FATAL. Output chosen severity messages to STDERR from running process.
WARNING
log-symlink
Symbolic link to the active log. Creates a symbolic link for every severity greater than or equal to the log-severity configuration option.
heavydb.{SEVERITY}.log
log-user-id
Log internal numeric user IDs instead of textual user names.
log-user-origin
Look up the origin of inbound connections by IP address and DNS name and print this information as part of stdlog. Some systems throttle DNS requests or have other network constraints that preclude timely return of user origin information. Set to FALSE to improve performance on those networks or when large numbers of users from different locations make rapid connect/disconnect requests to the server.
TRUE[1]
logs-system-tables-max-files-count [=arg]
Maximum number of log files that can be processed by each logs system table.
100
max-cacheable-hashtable-size-bytes [=arg]
Maximum size of the hashtable that the hashtable recycler can store. Limiting the size can enable more hashtables to be stored. Must be less than or equal to the value defined in hashtable-cache-total-bytes.
2147483648 (2GB)
max-session-duration arg
Maximum duration of the active session, in minutes. Change to increase or decrease session duration before timeout.
43200 (30 days)
null-div-by-zero [=arg]
Allows processing to complete when the dataset would cause a divide-by-zero error. Set to TRUE if you prefer to return null when dividing by zero, and set to FALSE to throw an exception.
FALSE[0]
num-executors arg
Beta functionality in Release 5.7. Set the number of executors.
num-gpus arg
Number of GPUs to use. In a shared environment, you can assign the number of GPUs to a particular application. The default, -1, uses all available GPUs. Use in conjunction with start-gpu.
-1
num-reader-threads arg
Number of reader threads to use. Drop the number of reader threads to prevent imports from using all available CPU power. Default is to use all threads.
0
overlaps-bucket-threshold arg
The minimum size of a bucket corresponding to a given inner table range for the overlaps hash join.
-p | port int
HeavyDB server port. Change to avoid collisions with other services if 6274 is already in use.
6274
pending-query-interrupt-freq = arg
Frequency with which to check the interrupt status of pending queries, in milliseconds. Values larger than 0 are valid. If you set pending-query-interrupt-freq=100, each session's interrupt status is checked every 100 ms.
For example, assume you have three sessions (S1, S2, and S3) in your queue, where S1 contains a running query, and S2 and S3 hold pending queries. If you set pending-query-interrupt-freq=1000, both S2 and S3 are interrupted every 1000 ms (1 sec). See running-query-interrupt-freq for information about interrupting running queries.
Decreasing the value increases the speed with which pending queries are removed, but also increases resource usage.
1000 (1 sec)
pki-db-client-auth [=arg]
Attempt authentication of users through a PKI certificate. Set to TRUE for the server to attempt PKI authentication.
FALSE[0]
read-only [=arg(=1)]
Enable read-only mode. Prevents changes to the dataset.
FALSE[0]
render-mem-bytes arg
Specifies the size of a per-GPU buffer that render query results are written to; allocated at the first rendering call. Persists while the server is running unless you run \clear_gpu_memory. Increase if rendering a large number of points or symbols and you get the following out-of-memory exception: Not enough OpenGL memory to render the query results.
Default is 500 MB.
500000000
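For example, to double the render buffer to 1 GB on a server that renders many points or symbols, heavy.conf might contain (value is illustrative):

    render-mem-bytes = 1000000000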
render-oom-retry-threshold = arg
A render execution time limit in milliseconds to retry a render request if an out-of-gpu-memory error is thrown. Requires enable-auto-clear-render-mem = true.
If enable-auto-clear-render-mem = true, a retry of the render request can be performed after an out-of-gpu-memory exception. A retry only occurs if the first run took less than the threshold set here (in milliseconds). The retry is attempted after the render GPU memory is automatically cleared. If an OOM exception occurs, clearing the memory might allow the request to succeed. Providing a reasonable threshold might give more stability to memory-constrained servers with rendering enabled. Only a single retry is attempted. A value of 0 disables retries.
rendering [=arg]
Enable or disable backend rendering. Disable rendering when not in use, freeing up memory reserved by render-mem-bytes. To reenable rendering, you must restart HEAVY.AI Server.
TRUE[1]
res-gpu-mem =arg
Reserved memory for GPU. Reserves extra memory for your system (for example, if the GPU is also driving your display, such as on a laptop or single-card desktop). HEAVY.AI uses all the memory on the GPU except for render-mem-bytes + res-gpu-mem. Also useful if other processes, such as a machine-learning pipeline, share the GPU with HEAVY.AI. In advanced rendering scenarios or distributed setups, increase to free up additional memory for the renderer, or for aggregating results for the renderer from multiple leaf nodes. HEAVY.AI recommends always setting res-gpu-mem when using backend rendering.
134217728
running-query-interrupt-freq arg
Controls the frequency of interrupt-status checking for running queries. Range: 0.0 (less frequently) to 1.0 (more frequently).
For example, if you have 10 threads that evaluate a query of a table that has 1000 rows, each thread advances its thread index up to 10 times. In this case, if you set the flag close to 1.0, a session's interrupt status is checked on every increment of the thread index. If you set the flag close to 0.0, the session's interrupt status is checked only when the index increment is close to 10. The default checking frequency is close to half of the maximum increment of the thread index.
Frequent interrupt status checking reduces latency for the interrupt but also can decrease query performance.
seek-kafka-commit = <N>
Set the offset of the last Kafka message to be committed from a Kafka data stream so that Kafka does not resend those messages. After the Kafka server commits messages through the number N, it resends messages starting at message N+1. This is particularly useful when you want to create a replica of the HEAVY.AI server from an existing data directory.
N/A
ssl-cert path
Path to the server's public PKI certificate (.crt file). Defines the path to the .crt file. Used to establish an encrypted binary connection.
ssl-keystore path
Path to the server keystore. Used for an encrypted binary connection. The path to Java trust store containing the server's public PKI key. Used by HeavyDB to connect to the encrypted Calcite server port.
ssl-keystore-password password
The password for the SSL keystore. Used to create a binary encrypted connection to the Calcite server.
ssl-private-key path
Path to the server's private PKI key. Define the path to the HEAVY.AI server PKI key. Used to establish an encrypted binary connection.
ssl-trust-ca path
Enable use of CA-signed certificates presented by Calcite. Defines the file that contains trusted CA certificates. This information enables the server to validate the TCP/IP Thrift connections it makes as a client to the Calcite server. The certificate presented by the Calcite server is the same as the certificate used to identify the database server to its clients.
ssl-trust-ca-server path
Path to the file containing trusted CA certificates; for PKI authentication. Used to validate certificates submitted by clients. If the certificate provided by the client (in the password field of the connect command) was not signed by one of the certificates in the trusted file, then the connection fails.
PKI authentication works only if the server is configured to encrypt connections via TLS. The common name extracted from the client certificate is used as the name of the user to connect. If this name does not already exist, the connection fails. If LDAP or SAML are also enabled, the servers fall back to these authentication methods if PKI authentication fails.
Currently works only with JDBC clients. To allow connections from other clients, set allow-local-auth-fallback or add LDAP/SAML authentication.
ssl-trust-password password
The password for the SSL trust store. Password to the SSL trust store containing the server's public PKI key. Used to establish an encrypted binary connection.
ssl-trust-store path
The path to the Java trustStore containing the server's public PKI key. Used by the Calcite server to connect to the encrypted HeavyDB server port, to establish an encrypted binary connection.
start-gpu arg
First GPU to use. Used in shared environments in which the first assigned GPU is not GPU 0. Use in conjunction with num-gpus.
FALSE[0]
trivial-loop-join-threshold [=arg]
The maximum number of rows in the inner table of a loop join considered to be trivially small.
1000
use-hashtable-cache
Set to TRUE to enable the hashtable recycler. Supports complex scenarios, such as hashtable recycling for queries that have subqueries.
TRUE[1]
vacuum-min-selectivity [=arg]
Specify the percentage (with a value of 0 implying 0% and a value of 1 implying 100%) of deleted rows in a fragment at which to perform automatic vacuuming.
Automatic vacuuming occurs when deletes or updates on variable-length columns result in a percentage of deleted rows in a fragment exceeding the specified threshold. The default threshold is 10% of deleted rows in a fragment.
When changing this value, consider the most common types of queries run on the system. In general, if you have infrequent updates and deletes, set vacuum-min-selectivity to a low value. Set it higher if you have frequent updates and deletes, because vacuuming adds overhead to affected UPDATE and DELETE queries.
watchdog-none-encoded-string-translation-limit [=arg]
The number of strings that can be cast using the ENCODED_TEXT string operator.
1,000,000
window-function-frame-aggregation-tree-fanout [=arg]
Fan-out of the aggregation tree used to compute aggregations over the window frame.
8
Flag
Description
Default Value
cluster arg
Path to data leaves list JSON file. Indicates that the HEAVY.AI server instance is an aggregator node, and where to find the rest of its cluster. Change for testing and debugging.
$HEAVYAI_BASE
compression-limit-bytes [=arg(=536870912)] (=536870912)
Compress result sets that are transferred between leaves. Minimum length of payload above which data is compressed.
536870912
compressor arg (=lz4hc)
Compressor algorithm to be used by the server to compress data being transferred between servers. See Data Compression for compression algorithm options.
lz4hc
ldap-dn arg
LDAP Distinguished Name.
ldap-role-query-regex arg
RegEx to use to extract role from role query result.
ldap-role-query-url arg
LDAP query role URL.
ldap-superuser-role arg
The role name to identify a superuser.
ldap-uri arg
LDAP server URI.
leaf-conn-timeout [=arg]
Leaf connect timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if a connection cannot be established.
20000
leaf-recv-timeout [=arg]
Leaf receive timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not received in the time allotted.
300000
leaf-send-timeout [=arg]
Leaf send timeout, in milliseconds. Increase or decrease to fail Thrift connections between HeavyDB instances more or less quickly if data is not sent in the time allotted.
300000
saml-metadata-file arg
Path to identity provider metadata file.
Required for running SAML. An identity provider (like Okta) supplies a metadata file. From this file, HEAVY.AI uses:
Public key of the identity provider to verify that the SAML response comes from it and not from somewhere else.
URL of the SSO login page used to obtain a SAML token.
saml-sp-target-url arg
URL of the service provider for which SAML assertions should be generated. Required for running SAML. Used to verify that a SAML token was issued for HEAVY.AI and not for some other service.
saml-sync-roles arg (=0)
Enable mapping of SAML groups to HEAVY.AI roles. The SAML Identity provider (for example, Okta) automatically creates users at login and assigns them roles they already have as groups in SAML.
saml-sync-roles [=0]
string-servers arg
Path to string servers list JSON file. Indicates that HeavyDB is running in distributed mode; required to designate a leaf server when running in distributed mode.