5. Distance
Once we have comparison pairs, we compute the distance between the corresponding attributes of each pair.
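As a rough illustration of this step, the sketch below scores one comparison pair attribute by attribute using a normalized Levenshtein similarity. The record layout, the field names, and the helper names (`levenshtein`, `normalized_similarity`, `compare_pair`) are assumptions made for this example; it is not how pg_similarity implements its functions.

```python
from typing import Dict, Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def compare_pair(left: Dict[str, str], right: Dict[str, str],
                 fields: Iterable[str]) -> Dict[str, float]:
    """Return one similarity score per attribute of a comparison pair."""
    return {f: normalized_similarity(left[f], right[f]) for f in fields}


# Hypothetical records, used only to show the shape of the output.
left = {"name": "Jon Smith", "city": "Boston"}
right = {"name": "John Smith", "city": "Boston"}
scores = compare_pair(left, right, ["name", "city"])
```

Each pair then yields a small vector of per-attribute scores, which a later step can threshold or combine.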
From https://github.com/eulerto/pg_similarity:
L1 Distance (also known as City Block or Manhattan Distance);
Cosine Distance;
Dice Coefficient;
Euclidean Distance;
Hamming Distance;
Jaccard Coefficient;
Jaro Distance;
Jaro-Winkler Distance;
Levenshtein Distance;
Matching Coefficient;
Monge-Elkan Coefficient;
Needleman-Wunsch Coefficient;
Overlap Coefficient;
Q-Gram Distance;
Smith-Waterman Coefficient;
Smith-Waterman-Gotoh Coefficient;
Soundex Distance.
| Algorithm | Function | Operator | Use Index? | Parameters |
|---|---|---|---|---|
| L1 Distance | block(text, text) returns float8 | ~++ | yes | pg_similarity.block_tokenizer (enum), pg_similarity.block_threshold (float8), pg_similarity.block_is_normalized (bool) |
| Cosine Distance | cosine(text, text) returns float8 | ~## | yes | pg_similarity.cosine_tokenizer (enum), pg_similarity.cosine_threshold (float8), pg_similarity.cosine_is_normalized (bool) |
| Dice Coefficient | dice(text, text) returns float8 | ~-~ | yes | pg_similarity.dice_tokenizer (enum), pg_similarity.dice_threshold (float8), pg_similarity.dice_is_normalized (bool) |
| Euclidean Distance | euclidean(text, text) returns float8 | ~!! | yes | pg_similarity.euclidean_tokenizer (enum), pg_similarity.euclidean_threshold (float8), pg_similarity.euclidean_is_normalized (bool) |
| Hamming Distance | hamming(bit varying, bit varying) returns float8; hamming_text(text, text) returns float8 | ~@~ | no | pg_similarity.hamming_threshold (float8), pg_similarity.hamming_is_normalized (bool) |
| Jaccard Coefficient | jaccard(text, text) returns float8 | ~?? | yes | pg_similarity.jaccard_tokenizer (enum), pg_similarity.jaccard_threshold (float8), pg_similarity.jaccard_is_normalized (bool) |
| Jaro Distance | jaro(text, text) returns float8 | ~%% | no | pg_similarity.jaro_threshold (float8), pg_similarity.jaro_is_normalized (bool) |
| Jaro-Winkler Distance | jarowinkler(text, text) returns float8 | ~@@ | no | pg_similarity.jarowinkler_threshold (float8), pg_similarity.jarowinkler_is_normalized (bool) |
| Levenshtein Distance | lev(text, text) returns float8 | ~== | no | pg_similarity.levenshtein_threshold (float8), pg_similarity.levenshtein_is_normalized (bool) |
| Matching Coefficient | matchingcoefficient(text, text) returns float8 | ~^^ | yes | pg_similarity.matching_tokenizer (enum), pg_similarity.matching_threshold (float8), pg_similarity.matching_is_normalized (bool) |
| Monge-Elkan Coefficient | mongeelkan(text, text) returns float8 | ~\|\| | no | pg_similarity.mongeelkan_tokenizer (enum), pg_similarity.mongeelkan_threshold (float8), pg_similarity.mongeelkan_is_normalized (bool) |
| Needleman-Wunsch Coefficient | needlemanwunsch(text, text) returns float8 | ~#~ | no | pg_similarity.nw_threshold (float8), pg_similarity.nw_is_normalized (bool) |
| Overlap Coefficient | overlapcoefficient(text, text) returns float8 | ~** | yes | pg_similarity.overlap_tokenizer (enum), pg_similarity.overlap_threshold (float8), pg_similarity.overlap_is_normalized (bool) |
| Q-Gram Distance | qgram(text, text) returns float8 | ~~~ | yes | pg_similarity.qgram_threshold (float8), pg_similarity.qgram_is_normalized (bool) |
| Smith-Waterman Coefficient | smithwaterman(text, text) returns float8 | ~=~ | no | pg_similarity.sw_threshold (float8), pg_similarity.sw_is_normalized (bool) |
| Smith-Waterman-Gotoh Coefficient | smithwatermangotoh(text, text) returns float8 | ~!~ | no | pg_similarity.swg_threshold (float8), pg_similarity.swg_is_normalized (bool) |
| Soundex Distance | soundex(text, text) returns float8 | ~*~ | no | |
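
To give a feel for how a tokenizer and a threshold interact for the token-based measures in the table, here is a minimal Python sketch of a Jaccard coefficient over character trigrams with a cut-off test. The trigram tokenizer, the 0.7 threshold, and the function names (`qgrams`, `jaccard`, `is_match`) are assumptions chosen for illustration; they are not taken from pg_similarity, whose actual tokenizers and operator semantics should be checked in its README.

```python
def qgrams(text: str, q: int = 3) -> set:
    """Split a string into its set of overlapping q-character tokens."""
    lowered = text.lower()
    if len(lowered) < q:
        # Too short to slide a window: treat the whole string as one token.
        return {lowered} if lowered else set()
    return {lowered[i:i + q] for i in range(len(lowered) - q + 1)}


def jaccard(a: str, b: str, q: int = 3) -> float:
    """Jaccard coefficient: shared tokens divided by all distinct tokens."""
    tokens_a, tokens_b = qgrams(a, q), qgrams(b, q)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Hypothetical cut-off test: a match when the score reaches the threshold."""
    return jaccard(a, b) >= threshold
```

For example, `jaccard("abcd", "bcde")` shares one trigram (`bcd`) out of three distinct ones, so it scores 1/3 and `is_match` rejects the pair at the assumed 0.7 threshold.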