Executor Resource Manager

Overview

To enable concurrent execution of queries, we introduce the concept of an Executor Resource Manager (ERM). The ERM keeps track of compute and memory resources and gates query execution so that those resources are not over-subscribed. As of version 7.0, the ERM is enabled by default.

The ERM evaluates several kinds of resources required by a query; currently these include CPU cores, GPUs, buffer memory, and result set memory. It will leverage all available resources unless policy limits have been established, such as a maximum memory use or query time. For each query it determines both the ideal/maximum amount of resources desirable for optimal performance and the minimum required. For example, a CPU query scanning 8 fragments could run with up to 8 threads, but could execute with as few as a single CPU thread, using correspondingly less memory, if needed.

The ERM establishes a request queue. On every new request, as well as every time an existing request is completed, it checks available resources and picks the next resource request to grant. It currently always gives preference to earlier requests if resources permit launching them (first in, first out, or “FIFO”).

If the system-level multi-executor flag is enabled, the ERM will allow multiple queries to execute at once so long as resources are available. Currently, concurrent execution is allowed for multiple CPU queries, or for multiple CPU queries plus a single GPU query. This supports significant throughput gains by allowing inter-query-kernel concurrency, in addition to the major win of not having a long-running CPU query block the queue for other CPU queries or interactive GPU queries. The number of queries that can run in parallel is limited by the number of executors.
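As a minimal illustration, the executor count can be raised with a startup flag (the default is 4, as noted in the examples below; the value shown here is purely illustrative):

    --num-executors=8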

Use of CPU and GPU

By default, if HeavyDB is compiled to run on GPUs and if GPUs are available, query steps/kernels will execute on GPU UNLESS:

  1. Some operations in the query step cannot run on GPU. Operators such as MODE, APPROX_MEDIAN/APPROX_PERCENTILE, and certain string functions are examples.

  2. The query is an UPDATE or DELETE; these currently run on CPU.

  3. The query step requires more memory than is available on GPU, but less than is available on CPU.

  4. A user explicitly requests that the query run on CPU, either by setting a session flag or by using a query hint.

At the instance level, this behavior can be configured with system flags at startup. For example, a system with GPUs can be configured to use only CPU via the cpu-only flag, and the system's use of CPU RAM can be controlled with cpu-buffer-mem-bytes. Execution can also be routed to a particular device type with query hints such as “SELECT /*+ cpu_mode */ …”. These controls do not require the ERM and apply platform-wide.
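A brief sketch of these instance-level controls; the flag values below are illustrative only:

    --cpu-only
    --cpu-buffer-mem-bytes=68719476736

Per-query routing uses a hint in the SQL itself; the table name here is hypothetical:

    SELECT /*+ cpu_mode */ COUNT(*) FROM flights;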

Example Use Cases

Example 1 (no tuning required)

In a scenario where the system does not have enough memory available for the CPU cache, or the cache itself is too fragmented to accommodate all of the columns' chunks, the ERM, instead of failing the query with an out-of-memory (OOM) error, will:

  1. run the query reading a single chunk at a time, moving data to the GPU caches for GPU execution; or

  2. if there is not enough GPU memory, run the query chunk by chunk in CPU mode. In this case the query will run more slowly, but it frees up the GPU executor for less memory-demanding queries.

Example 2 (minimal tuning required)

You are deploying a new dashboard or chart that does not require big data or high performance, so you prefer to run it on CPU only. This way it does not interfere with other, performance-critical dashboards or charts.

  1. Set the dashboard chart execution to CPU using query hints. Instead of referencing the data directly, set a new “custom data source.” For example, if your data is in a table called ‘mydata’, then in the custom source, after your SELECT keyword, add the CPU query hint, as shown in the sketch after this list. You can repeat this for a data source supporting any number of charts desired, including all charts.

  2. Bump up the number of executors (default 4) to 6-8. With more executors free, the dashboard will perform better, without impacting the performance of the other dashboards.
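A sketch of the custom data source from step 1; the table name ‘mydata’ comes from the example above, and the column list is illustrative:

    SELECT /*+ cpu_mode */ * FROM mydata

And the executor count from step 2, set at server startup (value illustrative):

    --num-executors=8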

Example 3 (some tuning required)

Improving the performance of memory-intensive operations such as high-cardinality aggregates.

A user running exact “count distinct” operations on large, high-cardinality datasets (queries that are likely to run on CPU), on a server with many CPU cores, might employ the following strategy:

  1. Increase the number of executors (default 4) to 8-16, e.g. --num-executors=16.

  2. Limit total CPU buffer memory using --cpu-buffer-mem-bytes, lowering it from the default of 80% to make some room for large result sets, which are now limited by executor-cpu-result-mem-ratio.

If those queries have sparse values or high cardinality and use a wide count-distinct buffer, they will be pushed to CPU execution. Change the executor-per-query-max-cpu-threads-ratio parameter to lower the number of cores that run a single query; with fewer threads, the group-by buffers are built faster, lowering the memory footprint and speeding up the query.
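Putting this together, a startup configuration for this scenario might look like the following sketch; all values are illustrative and should be sized to the actual hardware and workload:

    # More executors for concurrent CPU queries
    --num-executors=16
    # Cap the CPU buffer pool to leave headroom for large result sets (96 GB shown)
    --cpu-buffer-mem-bytes=103079215104
    # Result-set memory budget, expressed as a ratio (illustrative value)
    --executor-cpu-result-mem-ratio=0.8
    # Lower the share of CPU threads a single query may use (illustrative value)
    --executor-per-query-max-cpu-threads-ratio=0.5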