Window Functions

Window functions allow you to work with a subset of rows related to the currently selected row. For a given dimension, you can find the most associated dimension by some other measure (for example, number of records or sum of revenue).

Window functions must always contain an OVER clause. The OVER clause splits up the rows of the query for processing by the window function.

The PARTITION BY list divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed using all rows in the same partition as the current row.

Rows that have the same value in the ORDER BY clause are considered peers. The ranking functions give the same answer for any two peer rows.

Supported Window Functions

HeavyDB supports the aggregate functions AVG, MIN, MAX, SUM, and COUNT in window functions.

Updates on window functions are supported, assuming the target table is single-fragment. Updates on multi-fragment target tables are not currently supported.

Example

This query shows the top airline carrier for each state, based on the number of departures.

select origin_state, carrier_name, n 
   from (select origin_state, carrier_name, row_number() over(
      partition by origin_state order by n desc) as rownum, n 
         from (select origin_state, carrier_name, count(*) as n 
            from flights_2008_7M where extract(year 
               from dep_timestamp) = 2008 
   group by origin_state, carrier_name )) where rownum = 1

Window Frames

A window function can include a frame clause that specifies a set of neighboring rows of the current row belonging to the same partition. This allows us to compute a window aggregate function over the window frame, instead of computing it against the entire partition. Note that a window frame for the current row is computed based on either 1) the number of rows before or after the current row (called rows mode) or 2) the specified ordering column value in the frame clause (called range mode).

For example:

  • From the starting row of the partition to the current row: Using the sum aggregate function, you can compute the running sum of the partition.

  • You can construct a frame based on the position of the rows (called rows mode): For example, a row before 3 rows and after 2 rows:

    • You can compute the aggregate function of the frame having up to six rows (including the current row).

  • You can organize a frame based on the value of the ordering column (called range mode): Assuming C as the current ordering column value, we can compute aggregate value of the window frame which contains rows having ordering column values between (C - 3) and (C + 2).

Window functions that ignore the frame are evaluated on the entire partition.

Note that we can define the window frame clause using rows mode with an ordering column.

You can use the following aggregate functions with the window frame clause.

Supported Functions

Syntax

<frame_mode> | <frame_bound> <frame_mode>can be one of the following:

  • rows

  • range

Example

1 | 2 | 3 | 4 | 5.5 | 7.5 | 8 | 9 | 10 → value of a each tuple’s order by expression.

When the current row has a value 5.5:

  • ROWS BETWEEN 3 PRECEDING and 3 FOLLOWING : 3 rows before and 3 rows after → {2, 3, 4, 5.5, 7.5, 8, 9 }

  • RANGE BETWEEN 3 PRECEDING and 3 FOLLOWING: 5.5 - 3 <= x <= 5.5 + 3 → { 3, 4, 5.5, 8 }

<frame_bound>:

  • frame_start or

  • frame_between: between frame_start and frame_end

frame_start and frame_end can be one of the following:

  • UNBOUNDED PRECEDING: The start row of the partition that the current row belongs to.

  • UNBOUNDED FOLLOWING: The end row of the partition that the current row belongs to.

  • CURRENT ROW

    • For rows mode: the current row.

    • For range mode: the peers of the current row. A peer is a row having the same value as the ordering column expression of the current row. Note that all null values are peers of each other.

  • expr PRECEDING

    • For rows mode: expr row before the current row.

    • For range mode: rows with the current row’s ordering expression value minus expr.

    • For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:

      • TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR

      • TIME type: SECOND, MINUTE, and HOUR

      • DATE type: DAY, MONTH, and YEAR

        For example: RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING

    • Currently, only literal expressions as expr such as 1 PRECEDING and 100 PRECEDING are supported.

  • expr FOLLOWING

    • For rows mode: expr row after the current row.

    • For range mode: rows with the current row’s ordering expression value plus expr.

    • For DATE, TIME, and TIMESTAMP: Use the INTERVAL keyword with a specific time unit, depending on a data type:

      • TIMESTAMP type: NANOSECOND, MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR

      • TIME type: SECOND, MINUTE, and HOUR

      • DATE type: DAY, MONTH, and YEAR

        For example: RANGE BETWEEN INTERVAL 1 DAY PRECEDING and INTERVAL 3 DAY FOLLOWING

    • Currently, only support literal expression as expr such as 1 FOLLOWING and 100 FOLLOWING are supported.

UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING have the same meaning in both rows and range mode.

When the query has no window frame bound, the window aggregate function is computed differently depending on the existence of the ORDER BY clause:

  • Has ORDER BY clause: The window function is computed with the default frame bound, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

  • No Order BY clause: The window function is computed over the entire partition.

Named Window Function Clause

You can refer to the same window clause in multiple window aggregate functions by defining it with a unique name in the query definition.

For example, you can define the named window clauses W1 and W2 as follows:

select min(x) over w1, max(x) over w2 from test window w1 as (order by y), 
  w2 as (partition by y order by z rows between 2 preceding and 2 following);

Named window function clause w1 refers to a window function clause without a window frame clause, and w2 refers to a named window frame clause.

Notes and Restrictions

  • To use window framing, you may need an ORDER BY clause in the window definition. Depending on the framing mode used, the constraint varies:

    • Row mode: no restriction of the existence of the ordering column. It also can include multiple ordering columns.

    • Range mode: only a single ordering column is required (not multi-column ordering).

  • Currently, all window functions including aggregation over window frame are computed via CPU-mode.

  • For window frame bound expressions, only non-negative integer literals are supported.

  • GROUPING mode and EXCLUDING are not currently supported.