tf_feature_self_similarity
Given a query input of entity keys/IDs (for example, airplane tail numbers), a set of feature columns (for example, airports visited), and a metric column (for example, the number of times each airport was visited), scores each pair of entities based on their similarity. The score is the cosine similarity between the feature vectors of the two entities in each pair, and can optionally be TF-IDF weighted.
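The computation described above can be sketched in plain Python. This is only an illustrative sketch, not the function's actual implementation: the name `feature_self_similarity`, the `(key, feature, metric)` row layout, the `metric * log(N / df)` TF-IDF variant, and the choice to emit each unordered pair once are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations


def feature_self_similarity(rows, use_tf_idf=False):
    """Score every pair of entities by the cosine similarity of their features.

    rows: iterable of (key, feature, metric) tuples. A compound feature can be
    passed as a tuple of several column values.
    Returns a list of (first_key, second_key, score) tuples.
    """
    # Pivot the rows into one sparse vector per entity: {key: {feature: metric}}.
    vectors = defaultdict(lambda: defaultdict(float))
    for key, feature, metric in rows:
        vectors[key][feature] += metric

    if use_tf_idf:
        # Assumed weighting (hypothetical): metric * log(N / df), where N is the
        # number of entities and df the number of entities having the feature.
        n_entities = len(vectors)
        doc_freq = defaultdict(int)
        for vec in vectors.values():
            for feature in vec:
                doc_freq[feature] += 1
        for vec in vectors.values():
            for feature in vec:
                vec[feature] *= math.log(n_entities / doc_freq[feature])

    # Precompute the Euclidean norm of each entity's vector.
    norms = {
        key: math.sqrt(sum(w * w for w in vec.values()))
        for key, vec in vectors.items()
    }

    results = []
    for a, b in combinations(sorted(vectors), 2):
        # Dot product over the features the two entities have in common.
        dot = sum(w * vectors[b][f] for f, w in vectors[a].items() if f in vectors[b])
        denom = norms[a] * norms[b]
        # An all-zero vector has no direction; score such pairs as 0.0.
        results.append((a, b, dot / denom if denom else 0.0))
    return results


# Hypothetical input: (tail number, airport, number of visits).
rows = [
    ("N100", "SFO", 3), ("N100", "LAX", 1),
    ("N200", "SFO", 3), ("N200", "LAX", 1),
    ("N300", "JFK", 2),
]
for first, second, score in feature_self_similarity(rows):
    print(first, second, round(score, 3))
```

In this sketch, entities with identical visit profiles score 1.0 (up to floating-point rounding) and entities with no shared features score 0.0. Under the assumed weighting, TF-IDF reduces the contribution of features that many entities share.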
Input Arguments
Parameter | Description | Data Type |
---|---|---|
| Column containing keys/entity IDs that can be used to uniquely identify the entities for which the function computes co-similarity. Examples include countries, census block groups, user IDs of website visitors, and aircraft callsigns. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare entities specified by the key column according to the combinations of visit hour and census block group they share. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is a count of occurrences, such as the number of times each airport was visited. | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
| Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation. | BOOLEAN |
Output Columns
Name | Description | Data Types |
---|---|---|
| ID of the first entity in the pair | Column<TEXT ENCODING DICT \| INT \| BIGINT> (same type as the key column) |
| ID of the second entity in the pair | Column<TEXT ENCODING DICT \| INT \| BIGINT> (same type as the key column) |
| Computed cosine similarity score between each pair of entities | Column<FLOAT> |
Example