tf_compute_dwell_times
Given a query input with entity keys (for example, user IP addresses) and timestamps (for example, page visit timestamps), along with parameters specifying the minimum session time, the minimum number of session records, and the maximum inactive seconds, this function outputs every unique session found in the data together with the duration of that session (dwell time).
Syntax
Input Arguments
entity_id
Column containing keys/IDs used to identify the entities for which dwell/session times are to be computed. Examples include IP addresses of clients visiting a website, login IDs of database users, MMSIs of ships, and call signs of airplanes.
Column<TEXT ENCODING DICT | BIGINT>
site_id
Column containing keys/IDs of dwell “sites” or locations that entities visit. Examples include website pages, database session IDs, ports, airport names, or binned H3 hex IDs for geographic location.
Column<TEXT ENCODING DICT | BIGINT>
ts
Column denoting the time at which an event occurred.
Column<TIMESTAMP(0|3|6|9)>
min_dwell_seconds
A constant integer value specifying the minimum number of seconds required between the first and last timestamp-ordered records for an entity_id at a site_id for those records to constitute a valid session, so that the entity’s dwell time at the site is computed and returned. For example, if this variable is set to 3600 (one hour), but only 1800 seconds elapse between an entity’s first and last ordered timestamp records at a site, these records are not considered a valid session and no dwell time is calculated for them.
BIGINT (other integer types are automatically cast to BIGINT)
min_dwell_points
A constant integer value specifying the minimum number of successive observations (in ts timestamp order) required to constitute a valid session and compute and return an entity’s dwell time at a site. For example, if this variable is set to 3, but only two consecutive records exist for a user at a site before they move to a new site, no dwell time is calculated for the user.
BIGINT (other integer types are automatically cast to BIGINT)
max_inactive_seconds
A constant integer value specifying the maximum time in seconds between two successive observations for an entity at a given site before the current session/dwell time is considered finished and a new session/dwell time is started. For example, if this variable is set to 86400 seconds (one day), and the time gap between two successive records for an entity_id at a given site_id is 86500 seconds, the session is considered to have ended at the first of the two timestamp-ordered records, and a new session is started at the timestamp of the second record.
BIGINT (other integer types are automatically cast to BIGINT)
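The interaction of the three thresholds can be illustrated with a minimal Python sketch. It is not the function's actual implementation: the helper names split_sessions and valid_sessions are hypothetical, timestamps are plain integer seconds, and the comparisons are assumed to be inclusive (a session that exactly meets a minimum counts as valid, and a gap exactly equal to max_inactive_seconds does not split a session).

```python
def split_sessions(records, max_inactive_seconds):
    """Split ONE entity's (site_id, ts_seconds) records into candidate sessions.

    A new session starts when the site changes or when the gap to the
    previous record exceeds max_inactive_seconds (assumed strict '>').
    """
    ordered = sorted(records, key=lambda r: r[1])  # order by timestamp
    sessions = []
    for site_id, ts in ordered:
        last = sessions[-1] if sessions else None
        same_site = last is not None and last["site_id"] == site_id
        if same_site and ts - last["end_ts"] <= max_inactive_seconds:
            # Still inside the current session: extend it.
            last["end_ts"] = ts
            last["num_points"] += 1
        else:
            # Site changed or inactivity gap exceeded: open a new session.
            sessions.append(
                {"site_id": site_id, "start_ts": ts, "end_ts": ts, "num_points": 1}
            )
    return sessions


def valid_sessions(sessions, min_dwell_seconds, min_dwell_points):
    """Keep sessions meeting both minimums (assumed inclusive '>=')."""
    return [
        s
        for s in sessions
        if s["end_ts"] - s["start_ts"] >= min_dwell_seconds
        and s["num_points"] >= min_dwell_points
    ]


# Illustrative records for a single entity (site names are made up).
records = [
    ("home", 0),
    ("home", 1800),
    ("home", 3600),
    ("home", 90100),  # 86500 s after the previous record: exceeds one day
    ("cart", 90200),
    ("cart", 90300),
]

sessions = split_sessions(records, max_inactive_seconds=86400)
kept = valid_sessions(sessions, min_dwell_seconds=3600, min_dwell_points=3)
```

With these inputs, the 86500-second gap splits the "home" records into two candidate sessions, and only the first one (three points spanning 3600 seconds) satisfies both minimums.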
Output Columns
entity_id
The ID of the entity for the output dwell time, identical to the corresponding entity_id column in the input.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the entity_id input column type)
site_id
The site ID for the output dwell time, identical to the corresponding site_id column in the input.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)
prev_site_id
The site ID for the session preceding the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the last site_id visited was null.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)
next_site_id
The site ID for the session after the current session, which might be a different site_id, the same site_id (if successive records for an entity at the same site were split into multiple sessions because the max_inactive_seconds threshold was exceeded), or null if the next site_id visited was null.
Column<TEXT ENCODING DICT> | Column<BIGINT> (type is the same as the site_id input column type)
session_id
An auto-incrementing session ID specific/relative to the current entity_id, starting from 1 (first session) up to the total number of valid sessions for an entity_id, such that each valid session dwell time increments the session_id for an entity by 1.
Column<INT>
start_seq_id
The index of the nth timestamp (ts-ordered) record for a given entity denoting the start of the current output row's session.
Column<INT>
dwell_time_sec
The duration in seconds for the session.
Column<INT>
num_dwell_points
The number of records/observations constituting the current output row's session.
Column<INT>
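How the output columns relate to one another can be sketched in Python. This is an illustration under stated assumptions, not the function's actual implementation: the helper to_output_rows is hypothetical, the input is a list of already-split candidate sessions for one entity, prev_site_id and next_site_id are assumed to refer to the adjacent candidate sessions whether or not those sessions are themselves valid, None is assumed when no adjacent session exists, start_seq_id is assumed to be 1-based, and the minimum thresholds are assumed to be inclusive.

```python
def to_output_rows(entity_id, sessions, min_dwell_seconds, min_dwell_points):
    """Build one output row per valid session for a single entity.

    `sessions` is a timestamp-ordered list of dicts with keys
    site_id, start_seq, start_ts, end_ts, num_points.
    """
    rows = []
    session_id = 0  # incremented only for valid sessions, so IDs start at 1
    for i, s in enumerate(sessions):
        dwell = s["end_ts"] - s["start_ts"]
        if dwell < min_dwell_seconds or s["num_points"] < min_dwell_points:
            continue  # not a valid session: no output row
        session_id += 1
        rows.append(
            {
                "entity_id": entity_id,
                "site_id": s["site_id"],
                # Assumption: neighbors are taken from the candidate list.
                "prev_site_id": sessions[i - 1]["site_id"] if i > 0 else None,
                "next_site_id": (
                    sessions[i + 1]["site_id"] if i + 1 < len(sessions) else None
                ),
                "session_id": session_id,
                "start_seq_id": s["start_seq"],
                "dwell_time_sec": dwell,
                "num_dwell_points": s["num_points"],
            }
        )
    return rows


# Illustrative candidate sessions for one entity (values are made up).
candidate_sessions = [
    {"site_id": "home", "start_seq": 1, "start_ts": 0, "end_ts": 3600, "num_points": 3},
    {"site_id": "home", "start_seq": 4, "start_ts": 90100, "end_ts": 90100, "num_points": 1},
    {"site_id": "cart", "start_seq": 5, "start_ts": 90200, "end_ts": 94000, "num_points": 3},
]

rows = to_output_rows(
    "10.0.0.1", candidate_sessions, min_dwell_seconds=3600, min_dwell_points=3
)
```

In this sketch the single-record "home" session produces no row, but it still appears as the next_site_id of the first row and the prev_site_id of the "cart" row, which is the case described above where the same site_id shows up as a neighbor after a max_inactive_seconds split.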
Example