Commit 97b7b786 by Rico Sennrich

fix cosine_distance for KMeansClusterer

See issue 709: http://code.google.com/p/nltk/issues/detail?id=709
In short, cosine_distance was implemented as a similarity metric,
while KMeansClusterer expects a distance metric.
parent f091002d
@@ -116,10 +116,10 @@ def euclidean_distance(u, v):
 def cosine_distance(u, v):
     """
-    Returns the cosine of the angle between vectors v and u. This is equal to
-    u.v / |u||v|.
+    Returns 1 minus the cosine of the angle between vectors v and u. This is equal to
+    1 - (u.v / |u||v|).
     """
-    return numpy.dot(u, v) / (math.sqrt(numpy.dot(u, u)) * math.sqrt(numpy.dot(v, v)))
+    return 1 - (numpy.dot(u, v) / (math.sqrt(numpy.dot(u, u)) * math.sqrt(numpy.dot(v, v))))
 class _DendrogramNode(object):
     """ Tree node of a dendrogram. """
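
To illustrate why the sign of the metric matters, here is a minimal sketch, assuming the clusterer assigns each vector to the centroid with the smallest metric value. It uses plain Python instead of numpy, and the helpers `dot`, `cosine_similarity`, and `nearest` are hypothetical names introduced for this example; they are not part of NLTK.

```python
import math


def dot(u, v):
    # Plain-Python stand-in for numpy.dot on 1-D vectors (assumption for this sketch).
    return sum(x * y for x, y in zip(u, v))


def cosine_similarity(u, v):
    # Behaviour before the fix: larger values mean "more alike".
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def cosine_distance(u, v):
    # Behaviour after the fix: smaller values mean "more alike".
    return 1 - cosine_similarity(u, v)


def nearest(point, centroids, metric):
    # Hypothetical helper: index of the centroid with the smallest metric value.
    return min(range(len(centroids)), key=lambda i: metric(point, centroids[i]))


point = (1.0, 0.1)
centroids = [(1.0, 0.0), (0.0, 1.0)]

# Minimising a similarity picks the centroid pointing in the most different direction.
print(nearest(point, centroids, cosine_similarity))  # prints 1
# Minimising the distance picks the centroid pointing in nearly the same direction.
print(nearest(point, centroids, cosine_distance))  # prints 0
```

With the similarity version, a minimum-based assignment would send `point` to the nearly orthogonal centroid, which is the kind of mismatch the commit message describes.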