# tf_feature_similarity

Given a query input of entity keys, feature columns, and a metric column, and a second query input specifying a search vector of feature columns and a metric, computes the similarity of each entity in the first input to the search vector. The score is the cosine similarity between the feature column(s) of each entity and the feature column(s) of the search vector, optionally TF-IDF weighted.
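
The scoring can be illustrated with a minimal Python sketch. It assumes each query input has already been reduced to rows of `(key, feature, metric)` and `(feature, metric)`; it is not HeavyDB's implementation, and the names `build_vectors`, `cosine_similarity`, and `feature_similarity` are hypothetical. A compound feature (for example, visit hour plus census block group) is modeled as a tuple, so two rows share a feature only when every sub-feature matches.

```python
import math
from collections import defaultdict


def build_vectors(rows):
    """Group (key, feature, metric) rows into one sparse {feature: value} vector per key.

    Duplicate (key, feature) rows are summed; this is an assumption of the sketch.
    """
    vectors = defaultdict(dict)
    for key, feature, metric in rows:
        vec = vectors[key]
        vec[feature] = vec.get(feature, 0.0) + metric
    return dict(vectors)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(value * b[f] for f, value in a.items() if f in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def feature_similarity(entity_rows, search_rows):
    """Score every entity key against the search vector built from search_rows."""
    entities = build_vectors(entity_rows)
    search = {}
    for feature, metric in search_rows:
        search[feature] = search.get(feature, 0.0) + metric
    return {key: cosine_similarity(vec, search) for key, vec in entities.items()}


# Compound features as (visit_hour, census_block_group) tuples; metric = visit count.
entity_rows = [
    ("alice", (9, "cbg_1"), 3),
    ("alice", (10, "cbg_2"), 1),
    ("bob", (9, "cbg_1"), 1),
    ("bob", (18, "cbg_3"), 4),
]
search_rows = [((9, "cbg_1"), 2), ((10, "cbg_2"), 1)]

scores = feature_similarity(entity_rows, search_rows)
print(scores)  # alice about 0.990, bob about 0.217
```

Here `alice` scores higher because her visits overlap the search vector on both (hour, block group) pairs, while `bob` shares only one pair and spends most of his weight elsewhere.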

#### Input Arguments

Parameter | Description | Data Type |
---|---|---|
| | Column containing keys/entity IDs that uniquely identify the entities for which the function computes the similarity to the search vector specified by the second query input. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| | One or more columns constituting a compound feature. For example, two columns of visit hour and census block group would compare the entities identified by the key column based on whether they visited the same census block group in the same hour. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| | Column denoting the values used as input for the cosine similarity metric computation. In many cases, this is simply … | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
| | One or more columns constituting a compound feature for the search vector. These should match the feature columns of the first input in number of sub-features, types, and semantics. | Column<TEXT ENCODING DICT \| INT \| BIGINT> |
| | Column denoting the values used as input for the cosine similarity metric computation from the search vector. In many cases, this is simply … | Column<INT \| BIGINT \| FLOAT \| DOUBLE> |
| | Boolean constant denoting whether TF-IDF weighting should be used in the cosine similarity score computation. | BOOLEAN |
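
The effect of the TF-IDF flag can be sketched in Python as well. The exact weighting HeavyDB applies is not stated on this page; the sketch assumes the common variant idf(f) = log(N / df(f)), where N is the number of entities in the first input and df(f) is how many of them have feature f, and it applies the same weights to both the entity vectors and the search vector. All function names are hypothetical.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {feature: value}."""
    dot = sum(v * b[f] for f, v in a.items() if f in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def idf_weights(entity_vectors: dict) -> dict:
    """Assumed variant: idf(f) = log(N / df(f)), df(f) = number of entities having f."""
    n = len(entity_vectors)
    df = {}
    for vec in entity_vectors.values():
        for feature in vec:
            df[feature] = df.get(feature, 0) + 1
    return {f: math.log(n / count) for f, count in df.items()}


def apply_idf(vec: dict, idf: dict) -> dict:
    # Search features that no entity has get weight 0 (an assumption of this sketch).
    return {f: v * idf.get(f, 0.0) for f, v in vec.items()}


def tfidf_similarity(entity_vectors: dict, search_vector: dict) -> dict:
    """Reweight entity and search vectors by idf, then take cosine similarity."""
    idf = idf_weights(entity_vectors)
    weighted_search = apply_idf(search_vector, idf)
    return {
        key: cosine(apply_idf(vec, idf), weighted_search)
        for key, vec in entity_vectors.items()
    }


entities = {
    "a": {"x": 1.0, "common": 5.0},
    "b": {"y": 1.0, "common": 5.0},
    "c": {"common": 1.0},
}
search = {"x": 1.0, "common": 5.0}

plain = {k: cosine(v, search) for k, v in entities.items()}
weighted = tfidf_similarity(entities, search)
print(plain)     # "b" scores about 0.96 only because of the shared "common" feature
print(weighted)  # "common" is in every entity, so its idf is 0 and "b" drops to 0.0
```

Because cosine similarity is unchanged when a vector is scaled by a positive constant, normalizing term frequencies per entity would not alter these scores; the IDF factor is what shifts them, down-weighting features that nearly every entity shares.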

#### Output Columns

Name | Description | Data Types |
---|---|---|
| | ID of the entity being compared against the search vector. | Column<TEXT ENCODING DICT \| INT \| BIGINT> (same type as the input key column) |
| | Computed cosine similarity score between each entity and the search vector. | Column<FLOAT> |

#### Example
