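I recently studied some common ways of computing similarity, and while looking for material I came across a good blog post, notable mainly for its diagrams. I am borrowing it here for a brief review.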
Euclidean distance
Euclidean distance (https://www.biaodianfu.com/eucliden-distance.html) is simply the straight-line distance between two points in space.
Python implementation:

    from math import sqrt

    def euclidean_distance(x, y):
        # Square root of the summed squared per-axis differences.
        return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    print(euclidean_distance([0, 3, 4, 5], [7, 6, 3, -1]))
Manhattan distance
Manhattan distance (https://www.biaodianfu.com/manhattan-distance.html) is simply the sum of the absolute differences along each axis.
Python implementation:

    def manhattan_distance(x, y):
        # Sum of the absolute per-axis differences.
        return sum(abs(a - b) for a, b in zip(x, y))

    print(manhattan_distance([10, 20, 10], [10, 20, 20]))
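Minkowski distance
Minkowski distance (https://www.biaodianfu.com/minkowski-distance.html) is regarded as a generalization of Euclidean distance (https://www.biaodianfu.com/eucliden-distance.html) and Manhattan distance (https://www.biaodianfu.com/manhattan-distance.html). Its formula covers Manhattan distance (p = 1), Euclidean distance (p = 2), and Chebyshev distance (https://www.biaodianfu.com/chebyshev-distance.html), which is the limit as p grows without bound.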
Python implementation:

    from decimal import Decimal

    def nth_root(value, n_root):
        # Compute the n-th root with Decimal arithmetic, rounded to 3 places.
        root_value = 1 / float(n_root)
        return round(Decimal(value) ** Decimal(root_value), 3)

    def minkowski_distance(x, y, p_value):
        # p_value = 1 gives Manhattan distance, p_value = 2 gives Euclidean.
        return nth_root(sum(pow(abs(a - b), p_value) for a, b in zip(x, y)), p_value)

    print(minkowski_distance([0, 3, 4, 5], [7, 6, 3, -1], 3))
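The claim that Minkowski distance contains the other three can be checked numerically. The following is a minimal sketch, not the original blog's code: the names `minkowski` and `chebyshev` are illustrative, plain floats are used instead of `Decimal`, and p = 50 is an arbitrary stand-in for "large p".

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Chebyshev distance: the largest per-axis difference."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = [0, 3, 4, 5], [7, 6, 3, -1]

manhattan = minkowski(x, y, 1)   # p = 1: sum of per-axis differences
euclidean = minkowski(x, y, 2)   # p = 2: straight-line distance
large_p = minkowski(x, y, 50)    # large p: approaches the Chebyshev distance

print(manhattan, euclidean, large_p, chebyshev(x, y))
```

As p increases, the largest per-axis difference dominates the sum, which is why the result drifts toward the Chebyshev value.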
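Cosine similarity
Cosine similarity (https://www.biaodianfu.com/cosine-similarity.html) is fairly easy to understand: it measures how much two vectors differ in direction, regardless of their lengths.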
Python implementation:

    from math import sqrt

    def square_rooted(x):
        # Euclidean norm of x; not rounded here, so no error leaks into the ratio.
        return sqrt(sum(a * a for a in x))

    def cosine_similarity(x, y):
        numerator = sum(a * b for a, b in zip(x, y))
        denominator = square_rooted(x) * square_rooted(y)
        return round(numerator / denominator, 3)

    print(cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))
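Because cosine similarity depends only on direction, rescaling a vector should leave it unchanged. The sketch below illustrates this with a few hand-picked vectors. The helper name `cosine` and the test vectors are illustrative assumptions, and the function assumes non-zero inputs.

```python
from math import sqrt


def cosine(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


v = [3, 45, 7, 2]
scaled = [10 * a for a in v]   # same direction, ten times the length
opposite = [-a for a in v]     # exactly the opposite direction

print(cosine(v, scaled))          # close to 1: identical direction
print(cosine(v, opposite))        # close to -1: opposite direction
print(cosine([1, 0], [0, 1]))     # 0: perpendicular vectors
```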
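Jaccard similarity
Jaccard similarity (https://www.biaodianfu.com/jaccard-tanimoto.html) is very easy to understand: it is the size of the intersection of two sets divided by the size of their union.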
Python implementation:

    def jaccard_similarity(x, y):
        # Duplicates are dropped because both inputs are converted to sets.
        set_x, set_y = set(x), set(y)
        intersection_cardinality = len(set_x & set_y)
        union_cardinality = len(set_x | set_y)
        return intersection_cardinality / union_cardinality

    print(jaccard_similarity([0, 1, 2, 5, 6], [0, 2, 3, 5, 7, 9]))
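To make the ratio concrete, the sketch below spells out the intersection and union for the two lists used above. The name `jaccard` is illustrative, and the function assumes at least one input is non-empty so that the union is not empty.

```python
def jaccard(x, y):
    """Size of the intersection divided by size of the union, on sets."""
    set_x, set_y = set(x), set(y)
    return len(set_x & set_y) / len(set_x | set_y)


a = [0, 1, 2, 5, 6]
b = [0, 2, 3, 5, 7, 9]

shared = set(a) & set(b)      # {0, 2, 5}: 3 elements
combined = set(a) | set(b)    # {0, 1, 2, 3, 5, 6, 7, 9}: 8 elements
print(len(shared), len(combined), jaccard(a, b))

# Repeated elements and ordering make no difference once inputs are sets.
print(jaccard([1, 1, 2], [2, 1]))
```

Three shared elements out of eight distinct ones gives 3/8 = 0.375 for the two example lists.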
Source: https://juejin.im/entry/5b1ba0cb6fb9a01e5f3e22dc