Commit 97b7b786 by Rico Sennrich

fix cosine_distance for KMeansClusterer

see issue 709 http://code.google.com/p/nltk/issues/detail?id=709
in short, cosine_distance is implemented as a similarity metric,
while KMeansClusterer expects a distance metric.
parent f091002d
@@ -116,10 +116,10 @@ def euclidean_distance(u, v):
 def cosine_distance(u, v):
     """
-    Returns the cosine of the angle between vectors v and u. This is equal to
-    u.v / |u||v|.
+    Returns 1 minus the cosine of the angle between vectors v and u. This is equal to
+    1 - (u.v / |u||v|).
     """
-    return numpy.dot(u, v) / (math.sqrt(numpy.dot(u, u)) * math.sqrt(numpy.dot(v, v)))
+    return 1 - (numpy.dot(u, v) / (math.sqrt(numpy.dot(u, u)) * math.sqrt(numpy.dot(v, v))))
 
 
 class _DendrogramNode(object):
     """ Tree node of a dendrogram. """
......
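The difference between the two versions can be illustrated with a small sketch. It assumes only what the commit message states: that the clusterer treats the metric as a distance, so smaller values mean "closer". The helper `nearest_mean` and the sample vectors are hypothetical and are not taken from NLTK's code; `cosine_similarity` here stands in for the pre-fix behaviour of `cosine_distance`.

```python
import math
import numpy


def cosine_similarity(u, v):
    """Pre-fix behaviour: the cosine of the angle between u and v."""
    return numpy.dot(u, v) / (math.sqrt(numpy.dot(u, u)) * math.sqrt(numpy.dot(v, v)))


def cosine_distance(u, v):
    """Post-fix behaviour: 1 minus the cosine of the angle between u and v."""
    return 1 - cosine_similarity(u, v)


def nearest_mean(vector, means, distance):
    """Hypothetical helper: index of the mean with the smallest metric value."""
    return min(range(len(means)), key=lambda i: distance(vector, means[i]))


# Two illustrative cluster means and a point that clearly points towards the first.
means = [numpy.array([1.0, 0.0]), numpy.array([0.0, 1.0])]
point = numpy.array([0.9, 0.1])

# Minimising a similarity picks the mean that is least like the point.
old_choice = nearest_mean(point, means, cosine_similarity)

# Minimising a true distance picks the mean that is most like the point.
new_choice = nearest_mean(point, means, cosine_distance)

print(old_choice, new_choice)
```

With the similarity version, a vector compared with itself scores 1 rather than 0, so any code that minimises the metric ends up preferring the least similar mean. Subtracting from 1 restores the expected ordering: identical directions give 0 and orthogonal directions give 1.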