
Data Manipulation (DML)

SQL Capabilities

ALTER SESSION SET

Change a parameter value for the current session.

ALTER SESSION SET <parameter_name>=<parameter_value>
Parameter name
Values

EXECUTOR_DEVICE

CPU - Set the session to CPU execution mode:

ALTER SESSION SET EXECUTOR_DEVICE='CPU';

GPU - Set the session to GPU execution mode:

ALTER SESSION SET EXECUTOR_DEVICE='GPU';

NOTE: These parameter values have the same effect as the \cpu and \gpu commands in heavysql, but can be used with any tool capable of running SQL commands.

CURRENT_DATABASE

Can be set to any string value.

If the value is a valid database name, and the current user has access to it, the session switches to the new database. If the user does not have access or the database does not exist, an error is returned and the session will fall back to the starting database.

Alter Session Examples

CURRENT_DATABASE

Switch to another database without logging in again:

ALTER SESSION SET CURRENT_DATABASE='owned_database'; 

Your session will silently switch to the requested database.

The database exists, but the user does not have access to it:

ALTER SESSION SET CURRENT_DATABASE='information_schema';
TException - service has thrown: TDBException(error_msg=Unauthorized access: 
user test is not allowed to access database information_schema.)

The database does not exist:

ALTER SESSION SET CURRENT_DATABASE='not_existent_db'; 
TException - service has thrown: TDBException(error_msg=Database name 
not_existent_db does not exist.)

EXECUTOR_DEVICE

Force the session to run the subsequent SQL commands in CPU mode:

ALTER SESSION SET EXECUTOR_DEVICE='CPU';

Switch the session back to GPU mode:

ALTER SESSION SET EXECUTOR_DEVICE='GPU';

ALTER SYSTEM CLEAR

Clear CPU, GPU, or RENDER memory. Available to superusers only.

ALTER SYSTEM CLEAR (CPU|GPU|RENDER) MEMORY

Examples

ALTER SYSTEM CLEAR CPU MEMORY
ALTER SYSTEM CLEAR GPU MEMORY
ALTER SYSTEM CLEAR RENDER MEMORY

Generally, the server handles memory management, and you do not need to use this command. If you are having unexpected memory issues, try clearing the memory to see if performance improves.

DELETE

Deletes rows that satisfy the WHERE clause from the specified table. If the WHERE clause is absent, all rows in the table are deleted, resulting in a valid but empty table.

DELETE FROM table_name [ * ] [ [ AS ] alias ]
[ WHERE condition ]
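
As a minimal sketch of both forms, assuming a hypothetical table named customers with an id column:

```sql
-- Hypothetical table and column names, for illustration only.
-- Delete only the rows that satisfy the WHERE condition.
DELETE FROM customers WHERE id > 100;

-- With no WHERE clause, every row is removed; the table itself remains.
DELETE FROM customers;
```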

Cross-Database Queries

In Release 6.4 and higher, you can run DELETE queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases.

To execute queries against another database, you must have ACCESS privilege on that database, as well as DELETE privilege.

Example

Delete rows from a table in the my_other_db database:

DELETE FROM my_other_db.customers WHERE id > 100;

EXPLAIN

Shows generated Intermediate Representation (IR) code, identifying whether it is executed on GPU or CPU. This is primarily used internally by HEAVY.AI to monitor behavior.

EXPLAIN <STMT>

For example, when you use the EXPLAIN command on a basic statement, the utility returns 90 lines of IR code that is not meant to be human readable. However, at the top of the listing, a heading indicates whether it is IR for the CPU or IR for the GPU, which can be useful to know in some situations.
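
For instance, a sketch of inspecting the IR for a simple aggregate, assuming a table named movies exists in the current database:

```sql
-- Assumes a table named movies exists; any valid statement can follow EXPLAIN.
-- The heading at the top of the output indicates whether the IR targets CPU or GPU.
EXPLAIN SELECT COUNT(*) FROM movies;
```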

EXPLAIN CALCITE

Returns a relational algebra tree describing the high-level plan to execute the statement.

EXPLAIN CALCITE <STMT>

The table below lists the relational algebra classes used to describe the execution plan for a SQL statement.

Method

Description

LogicalAggregate

Operator that eliminates duplicates and computes totals.

LogicalCalc

Expression that computes project expressions and also filters.

LogicalChi

Operator that converts a stream to a relation.

LogicalCorrelate

Operator that performs nested-loop joins.

LogicalDelta

Operator that converts a relation to a stream.

LogicalExchange

Expression that imposes a particular distribution on its input without otherwise changing its content.

LogicalFilter

Expression that iterates over its input and returns elements for which a condition evaluates to true.

LogicalIntersect

Expression that returns the intersection of the rows of its inputs.

LogicalJoin

Expression that combines two relational expressions according to some condition.

LogicalMatch

Expression that represents a MATCH_RECOGNIZE node.

LogicalMinus

Expression that returns the rows of its first input minus any matching rows from its other inputs. Corresponds to the SQL EXCEPT operator.

LogicalProject

Expression that computes a set of ‘select expressions’ from its input relational expression.

LogicalSort

Expression that imposes a particular sort order on its input without otherwise changing its content.

LogicalTableFunctionScan

Expression that calls a table-valued function.

LogicalTableModify

Expression that modifies a table. Similar to TableScan, but represents a request to modify a table instead of read from it.

LogicalTableScan

Reads all the rows from a RelOptTable.

LogicalUnion

Expression that returns the union of the rows of its inputs, optionally eliminating duplicates.

LogicalValues

Expression for which the value is a sequence of zero or more literal row values.

LogicalWindow

Expression representing a set of window aggregates. See Window Functions.

For example, a SELECT statement is described as a table scan and projection.

heavysql> EXPLAIN CALCITE (SELECT * FROM movies);
Explanation
LogicalProject(movieId=[$0], title=[$1], genres=[$2])
   LogicalTableScan(TABLE=[[CATALOG, heavyai, MOVIES]])

If you add a sort order, the table projection is nested under a LogicalSort operator.

heavysql> EXPLAIN calcite (SELECT * FROM movies ORDER BY title);
Explanation
LogicalSort(sort0=[$1], dir0=[ASC])
   LogicalProject(movieId=[$0], title=[$1], genres=[$2])
      LogicalTableScan(TABLE=[[CATALOG, omnisci, MOVIES]])

When the SQL statement is simple, the EXPLAIN CALCITE version is actually less “human readable.” EXPLAIN CALCITE is more useful when you work with more complex SQL statements, like the one that follows. This query performs a scan on the BOOK table before scanning the BOOK_ORDER table.

heavysql> EXPLAIN calcite SELECT bc.firstname, bc.lastname, b.title, bo.orderdate, s.name
FROM book b, book_customer bc, book_order bo, shipper s
WHERE bo.cust_id = bc.cust_id AND b.book_id = bo.book_id AND bo.shipper_id = s.shipper_id
AND s.name = 'UPS';
Explanation
LogicalProject(firstname=[$5], lastname=[$6], title=[$2], orderdate=[$11], name=[$14])
    LogicalFilter(condition=[AND(=($9, $4), =($0, $8), =($10, $13), =($14, 'UPS'))])
        LogicalJoin(condition=[true], joinType=[INNER])
            LogicalJoin(condition=[true], joinType=[INNER])
                LogicalJoin(condition=[true], joinType=[INNER])
                    LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK]])
                    LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_CUSTOMER]])
                LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_ORDER]])
            LogicalTableScan(TABLE=[[CATALOG, omnisci, SHIPPER]])

Revising the original SQL command results in a more natural selection order and a more performant query.

heavysql> EXPLAIN calcite SELECT bc.firstname, bc.lastname, b.title, bo.orderdate, s.name
FROM book_order bo, book_customer bc, book b, shipper s
WHERE bo.cust_id = bc.cust_id AND bo.book_id = b.book_id AND bo.shipper_id = s.shipper_id
AND s.name = 'UPS';
Explanation
LogicalProject(firstname=[$10], lastname=[$11], title=[$7], orderdate=[$3], name=[$14])
    LogicalFilter(condition=[AND(=($1, $9), =($5, $0), =($2, $13), =($14, 'UPS'))])
        LogicalJoin(condition=[true], joinType=[INNER])
            LogicalJoin(condition=[true], joinType=[INNER])
                LogicalJoin(condition=[true], joinType=[INNER])
                  LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_ORDER]])
                  LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK_CUSTOMER]])
                LogicalTableScan(TABLE=[[CATALOG, omnisci, BOOK]])
            LogicalTableScan(TABLE=[[CATALOG, omnisci, SHIPPER]])

EXPLAIN CALCITE DETAILED

Augments the EXPLAIN CALCITE command by adding details about referenced columns in the query plan.

For example, for the following EXPLAIN CALCITE command execution:

heavysql> EXPLAIN CALCITE SELECT x, SUM(y) FROM test GROUP BY x;
Explanation
LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
  LogicalProject(x=[$0], y=[$2])
    LogicalTableScan(table=[[testDB, test]])

EXPLAIN CALCITE DETAILED adds more column details as seen below:

heavysql> EXPLAIN CALCITE DETAILED SELECT x, SUM(y) FROM test GROUP BY x;
Explanation
LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])	{[$1->db:testDB,tableName:test,colName:y]}
  LogicalProject(x=[$0], y=[$2])	{[$2->db:testDB,tableName:test,colName:y], [$0->db:testDB,tableName:test,colName:x]}
    LogicalTableScan(table=[[testDB, test]])

INSERT

INSERT INTO <table> (column1, ...) VALUES (row_1_value_1, ...), ..., (row_n_value_1, ...);

Examples

CREATE TABLE ar (ai INT[], af FLOAT[], ad2 DOUBLE[2]); 
INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}); 
INSERT INTO ar VALUES (ARRAY[NULL,2],NULL,NULL); 
INSERT INTO ar VALUES (NULL,{},{2.0,NULL});
-- or a multi-row insert equivalent
INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}), (ARRAY[NULL,2],NULL,NULL), (NULL,{},{2.0,NULL});

You can also insert into a table as SELECT, as shown in the following examples:

INSERT INTO destination_table SELECT * FROM source_table;
INSERT INTO destination_table (id, name, age, gender) SELECT * FROM source_table;
INSERT INTO destination_table (name, gender, age, id) SELECT name, gender, age, id  FROM source_table;
INSERT INTO votes_summary (vote_id, vote_count) SELECT vote_id, COUNT(*) FROM votes GROUP BY vote_id;

You can insert array literals into array columns. The inserts in the following example each have three array values, and demonstrate how you can:

  • Create a table with variable-length and fixed-length array columns.

  • Insert NULL arrays into these columns.

  • Specify and insert array literals using {...} or ARRAY[...] syntax.

  • Insert empty variable-length arrays using {} and ARRAY[] syntax.

  • Insert array values that contain NULL elements.

CREATE TABLE ar (ai INT[], af FLOAT[], ad2 DOUBLE[2]); 
INSERT INTO ar VALUES ({1,2,3},{4.0,5.0},{1.2,3.4}); 
INSERT INTO ar VALUES (ARRAY[NULL,2],NULL,NULL); 
INSERT INTO ar VALUES (NULL,{},{2.0,NULL});

Default Values

CREATE TABLE tbl (
   id INTEGER NOT NULL, 
   name TEXT NOT NULL DEFAULT 'John Doe', 
   age SMALLINT NOT NULL);

If you omit the name column from an INSERT or INSERT FROM SELECT statement, the missing value for column name is set to 'John Doe'.

INSERT INTO tbl (id, age) VALUES (1, 36); creates the record 1|'John Doe'|36.

INSERT INTO tbl (id, age) SELECT id, age FROM old_tbl; also sets all the name values to 'John Doe'.
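
To override the default, name the column explicitly in the INSERT. A short sketch against the tbl definition above, with made-up values:

```sql
-- name is supplied, so the DEFAULT 'John Doe' is not used.
INSERT INTO tbl (id, name, age) VALUES (2, 'Jane Roe', 41);

-- name is omitted, so it is filled with 'John Doe'.
INSERT INTO tbl (id, age) VALUES (3, 52);
```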

KILL QUERY

Interrupt a queued query. Specify the query by using its session ID.

To see the queries in the queue, use the SHOW QUERIES command:

show queries;
query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU
946-ooNP        |RUNNING_IMPORTER    |0          |2021-08-03 ...|IMPORT_GEO_TABLE|Rio       |tcp:::ffff:127.0.0.1:47314|omnisci|CPU

To interrupt the last query in the list (ID 946-ooNP):

kill query '946-ooNP'

Showing the queries again indicates that 946-ooNP has been deleted:

show queries;
query_session_id|current_status      |executor_id|submitted     |query_str       |login_name|client_address            |db_name|exec_device_type
713-t1ax        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
491-xpfb        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |Patrick   |http:::1                  |omnisci|GPU
451-gp2c        |PENDING_QUEUE       |0          |2021-08-03 ...|SELECT ...      |John      |http:::1                  |omnisci|GPU
190-5pax        |PENDING_EXECUTOR    |1          |2021-08-03 ...|SELECT ...      |Cavin     |http:::1                  |omnisci|GPU
720-nQtV        |RUNNING_QUERY_KERNEL|2          |2021-08-03 ...|SELECT ...      |Cavin     |tcp:::ffff:127.0.0.1:50142|omnisci|GPU

  • KILL QUERY is only available if the runtime query interrupt parameter (enable-runtime-query-interrupt) is set.

  • Interrupting a query in ‘PENDING_QUEUE’ status is supported in both distributed and single-server mode.

  • To enable query interrupt for tables imported from data files in local storage, set enable_non_kernel_time_query_interrupt to TRUE. (It is enabled by default.)

LIKELY/UNLIKELY

Expression

Description

LIKELY(X)

Provides a hint to the query planner that argument X is a Boolean value that is usually true. The planner can prioritize filters on the value X earlier in the execution cycle and return results more efficiently.

UNLIKELY(X)

Provides a hint to the query planner that argument X is a Boolean value that is usually not true. The planner can prioritize filters on the value X later in the execution cycle and return results more efficiently.

Usage Notes

SQL normally assumes that terms in the WHERE clause that cannot be used by indices are usually true. If this assumption is incorrect, it can lead to a suboptimal query plan. Use the LIKELY(X) and UNLIKELY(X) SQL functions to tell the query planner which clause terms are probably true and which are probably not true, which helps the planner select the best possible plan.

Use LIKELY/UNLIKELY to optimize evaluation of OR/AND logical expressions. LIKELY/UNLIKELY causes the left side of an expression to be evaluated first. This allows the right side of the expression to be skipped when possible. For example, in the clause UNLIKELY(A) AND B, if A evaluates to FALSE, B does not need to be evaluated.

Consider the following:

SELECT COUNT(*) FROM test WHERE UNLIKELY(x IN (7, 8, 9, 10)) AND y > 42;

If x is one of the values 7, 8, 9, or 10, the filter y > 42 is applied. If x is not one of those values, the filter y > 42 is not applied.
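
The same pattern applies to LIKELY. The following sketch reuses the test table from the example above; the predicate on x is hypothetical and assumes that most rows have a positive x:

```sql
-- Hint that x > 0 holds for most rows (an assumption about the data).
SELECT COUNT(*) FROM test WHERE LIKELY(x > 0) AND y > 42;
```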

SELECT

The SELECT command returns a set of records from one or more tables.

ORDER BY

  • Sort order defaults to ascending (ASC).

  • Sorts null values after non-null values by default in an ascending sort, before non-null values in a descending sort. For any query, you can use NULLS FIRST to sort null values to the top of the results or NULLS LAST to sort null values to the bottom of the results.

  • Allows you to use a positional reference to choose the sort column. For example, the command SELECT colA,colB FROM table1 ORDER BY 2 sorts the results on colB because it is in position 2.
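
A few sketches of these rules, reusing the table1, colA, and colB names from the positional example:

```sql
-- Ascending by default; nulls sort after non-null values.
SELECT colA, colB FROM table1 ORDER BY colB;

-- Descending sort, with nulls forced to the bottom of the results.
SELECT colA, colB FROM table1 ORDER BY colB DESC NULLS LAST;

-- Positional reference: sorts on colB (position 2), with nulls at the top.
SELECT colA, colB FROM table1 ORDER BY 2 NULLS FIRST;
```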

Query Hints

HEAVY.AI provides various query hints for controlling the behavior of the query execution engine.

Syntax

SELECT hints must appear first, immediately after the SELECT keyword; otherwise, the query fails.

By default, a hint is applied to the query step in which it is defined. If you have multiple SELECT clauses and define a query hint in one of those clauses, the hint is applied only to that query step; the other query steps are unaffected. For example, applying the /*+ cpu_mode */ hint affects only the SELECT clause in which it appears.

You can define a hint to apply to all query steps by prepending g_ to the query hint. For example, if you define /*+ g_cpu_mode */, CPU execution is applied to all query steps.
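
As an illustration of the two scopes, the following sketch applies the cpu_mode hint to a hypothetical nested query; the movies table and its columns are assumed to exist:

```sql
-- Local hint: only the outer query step is forced to CPU execution.
SELECT /*+ cpu_mode */ m.genres, COUNT(*)
FROM (SELECT genres FROM movies WHERE title IS NOT NULL) AS m
GROUP BY m.genres;

-- Global hint: the g_ prefix applies CPU execution to every query step.
SELECT /*+ g_cpu_mode */ m.genres, COUNT(*)
FROM (SELECT genres FROM movies WHERE title IS NOT NULL) AS m
GROUP BY m.genres;
```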

HEAVY.AI supports the following query hints.

The marker hint type represents a Boolean flag.

The key-value pair type is a hint name and its value.

Cross-Database Queries

In Release 6.4 and higher, you can run SELECT queries across tables in different databases on the same HEAVY.AI cluster without having to first connect to those databases. This enables more efficient storage and memory utilization by eliminating the need for table duplication across databases, and simplifies access to shared data and tables.

To execute queries against another database, you must have ACCESS privilege on that database, as well as SELECT privilege.

Example

Execute a join query involving a table in the current database and another table in the my_other_db database:
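
A minimal sketch of such a query, assuming a hypothetical sales table in the current database and a customers table in my_other_db; all column names other than id are illustrative:

```sql
-- Qualify the remote table with its database name; the local table needs no prefix.
SELECT c.name, s.saleamt, s.saledate
FROM my_other_db.customers AS c
JOIN sales AS s ON c.id = s.customerid;
```

Running this requires ACCESS privilege on my_other_db and SELECT privilege on its customers table, as described above.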
