tf_compute_dwell_times

Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), and parameters specifying the minimum session time, the minimum number of session records, and the max inactive seconds, outputs all unique sessions found in the data with the duration of the session (dwell time).

Syntax

select * from table( 
  tf_compute_dwell_times( 
    data => CURSOR( 
      select 
        entity_id, 
        site_id, 
        ts, 
      from 
        <table> 
      where 
        ... 
        ), 
        min_dwell_seconds => <seconds>, 
        min_dwell_points => <points>, 
        max_inactive_seconds => <seconds> 
        ) 
      );

Input Arguments

Parameter
Description
Data Type

entity_id

Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes.

Column<TEXT ENCODING DICT | BIGINT>

site_id

Column containing keys/IDs of dwell “sites” or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned h3 hex IDs for geographic location.

Column<TEXT ENCODING DICT | BIGINT>

ts

Column denoting the time at which an event occurred.

Column<TIMESTAMP(0|3|6|0)>

min_dwell_seconds

Constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered record for an entity_id at a site_id to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapses between an entity’s first and last ordered timestamp records at a site, these records are not considered a valid session and a dwell time for that session is not calculated.

BIGINT (other integer types are automatically casted to BIGINT)

min_dwell_points

A constant integer value specifying the minimum number of successive observations (in ts timestamp order) required to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3, but only two consecutive records exist for a user at a site before they move to a new site, no dwell time is calculated for the user.

BIGINT (other integer types are automatically casted to BIGINT)

max_inactive_seconds

A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity id at a given site id is 86500 seconds, the session is considered ended at the first timestamp-ordered record, and a new session is started at the timestamp of the second record.

BIGINT (other integer types are automatically casted to BIGINT)

Output Columns

Name
Description
Data Type

entity_id

The ID of the entity for the output dwell time, identical to the corresponding entity_id column in the input.

Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the entity_id input column type)

site_id

The site ID for the output dwell time, identical to the corresponding site_id column in the input.

Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)

prev_site_id

The site ID for the session preceding the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the last site_id visited was null.

Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)

next_site_id

The site id for the session after the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions due to exceeding the max_inactive_seconds threshold, or null if the next site_id visited was null.

Column<TEXT ENCODING DICT> | Column<BIGINT> (type will be the same as the site_id input column type)

session_id

An auto-incrementing session ID specific/relative to the current entity_id, starting from 1 (first session) up to the total number of valid sessions for an entity_id, such that each valid session dwell time increments the session_id for an entity by 1.

Column<INT>

start_seq_id

The index of the nth timestamp (ts-ordered) record for a given entity denoting the start of the current output row's session.

Column<INT>

dwell_time_sec

The duration in seconds for the session.

Column<INT>

num_dwell_points

The number of records/observations constituting the current output row's session.

Column<INT>

Example

/* Data from https://www.kaggle.com/datasets/vodclickstream/netflix-audience-behaviour-uk-movies */

select
  *
from
  table(
    tf_compute_dwell_times(
      data => cursor(
        select
          user_id,
          movie_id,
          ts
        from
          netflix_audience_behavior
      ),
      min_dwell_points => 3,
      min_dwell_seconds => 600,
      max_inactive_seconds => 10800
    )
  )
order by
  num_dwell_points desc
limit
  10;

entity_id|site_id|prev_site_id|next_site_id|session_id|start_seq_id|ts|dwell_time_sec|num_dwell_points
59416738c3|cbdf9820bc|d058594d1c|863b39bbe8|2|19|2017-02-21 15:12:11.000000000|4391|54
16d994f6dd|1bae944666|4f1cf3c2dc|NULL|5|61|2017-11-11 20:27:02.000000000|9570|36
3675d9ba4a|948f2b5bf6|948f2b5bf6|69cb38018a|2|11|2018-11-26 18:42:52.000000000|3600|34
da01959c0b|fd711679f9|1f579d43c3|NULL|5|90|2019-03-21 05:37:22.000000000|7189|31
23c52f9b50|df00041e47|df00041e47|NULL|2|39|2019-01-21 15:53:33.000000000|1227|29
da01959c0b|8ab46a0cb1|f1fffa6ff4|1f579d43c3|3|29|2019-03-12 04:33:01.000000000|6026|29
23c52f9b50|df00041e47|NULL|df00041e47|1|10|2019-01-21 15:33:39.000000000|1194|28
da01959c0b|1f579d43c3|8ab46a0cb1|fd711679f9|4|63|2019-03-17 02:01:49.000000000|7240|27
3261cb81a5|1cb40406ae|NULL|NULL|1|2|2019-04-28 20:48:24.000000000|11240|27
dbed64ce9e|c5830185ca|NULL|NULL|1|3|2019-03-01 06:43:32.000000000|7261|25