5. Distance

Once we have comparison pairs, we compute distances between each pairs’ attributes.

From https://github.com/eulerto/pg_similarity:

  • L1 Distance (as known as City Block or Manhattan Distance);

  • Cosine Distance;

  • Dice Coefficient;

  • Euclidean Distance;

  • Hamming Distance;

  • Jaccard Coefficient;

  • Jaro Distance;

  • Jaro-Winkler Distance;

  • Levenshtein Distance;

  • Matching Coefficient;

  • Monge-Elkan Coefficient;

  • Needleman-Wunsch Coefficient;

  • Overlap Coefficient;

  • Q-Gram Distance;

  • Smith-Waterman Coefficient;

  • Smith-Waterman-Gotoh Coefficient;

  • Soundex Distance.

Algorithm Function Operator Use Index? Parameters
L1 Distance block(text, text) returns float8 ~++ yes pg_similarity.block_tokenizer (enum)
pg_similarity.block_threshold (float8)
pg_similarity.block_is_normalized (bool)
Cosine Distance cosine(text, text) returns float8 ~## yes pg_similarity.cosine_tokenizer (enum)
pg_similarity.cosine_threshold (float8)
pg_similarity.cosine_is_normalized (bool)
Dice Coefficient dice(text, text) returns float8 ~-~ yes pg_similarity.dice_tokenizer (enum)
pg_similarity.dice_threshold (float8)
pg_similarity.dice_is_normalized (bool)
Euclidean Distance euclidean(text, text) returns float8 ~!! yes pg_similarity.euclidean_tokenizer (enum)
pg_similarity.euclidean_threshold (float8)
pg_similarity.euclidean_is_normalized (bool)
Hamming Distance hamming(bit varying, bit varying) returns float8
hamming_text(text, text) returns float8
~@~ no pg_similarity.hamming_threshold (float8)
pg_similarity.hamming_is_normalized (bool)
Jaccard Coefficient jaccard(text, text) returns float8 ~?? yes pg_similarity.jaccard_tokenizer (enum)
pg_similarity.jaccard_threshold (float8)
pg_similarity.jaccard_is_normalized (bool)
Jaro Distance jaro(text, text) returns float8 ~%% no pg_similarity.jaro_threshold (float8)
pg_similarity.jaro_is_normalized (bool)
Jaro-Winkler Distance jarowinkler(text, text) returns float8 ~@@ no pg_similarity.jarowinkler_threshold (float8)
pg_similarity.jarowinkler_is_normalized (bool)
Levenshtein Distance lev(text, text) returns float8 ~== no pg_similarity.levenshtein_threshold (float8)
pg_similarity.levenshtein_is_normalized (bool)
Matching Coefficient matchingcoefficient(text, text) returns float8 ~^^ yes pg_similarity.matching_tokenizer (enum)
pg_similarity.matching_threshold (float8)
pg_similarity.matching_is_normalized (bool)
Monge-Elkan Coefficient mongeelkan(text, text) returns float8 ~|| no pg_similarity.mongeelkan_tokenizer (enum)
pg_similarity.mongeelkan_threshold (float8)
pg_similarity.mongeelkan_is_normalized (bool)
Needleman-Wunsch Coefficient needlemanwunsch(text, text) returns float8 ~#~ no pg_similarity.nw_threshold (float8)
pg_similarity.nw_is_normalized (bool)
Overlap Coefficient overlapcoefficient(text, text) returns float8 ~** yes pg_similarity.overlap_tokenizer (enum)
pg_similarity.overlap_threshold (float8)
pg_similarity.overlap_is_normalized (bool)
Q-Gram Distance qgram(text, text) returns float8 ~~~ yes pg_similarity.qgram_threshold (float8)
pg_similarity.qgram_is_normalized (bool)
Smith-Waterman Coefficient smithwaterman(text, text) returns float8 ~=~ no pg_similarity.sw_threshold (float8)
pg_similarity.sw_is_normalized (bool)
Smith-Waterman-Gotoh Coefficient smithwatermangotoh(text, text) returns float8 ~!~ no pg_similarity.swg_threshold (float8)
pg_similarity.swg_is_normalized (bool)
Soundex Distance soundex(text, text) returns float8 ~*~ no