Commit 4b802fc7 (edx/nltk)
Authored Apr 11, 2011 by Steven Bird
* updated changelog
* fixed lots of doctest-related issues svn/trunk@8784
Parent: 392f099c
Showing 9 changed files with 58 additions and 52 deletions:
ChangeLog                               +10   -4
nltk/corpus/reader/bracket_parse.py      +4   -3
nltk/sem/chat80.py                       +3   -3
nltk/test/ccg.doctest                    +4   -4
nltk/test/chat80.doctest                 +1   -2
nltk/test/probability.doctest            +1   -1
nltk/test/tree.doctest                  +11  -11
nltk/test/treetransforms.doctest         +1   -1
nltk/test/wordnet.doctest               +23  -23
ChangeLog  (View file @ 4b802fc7)

-Version 2.0.1 2011-04-??
+Version 2.0.1 (rc1) 2011-04-11
 NLTK:
 * added interface to the Stanford POS Tagger
...
@@ -12,7 +12,7 @@ NLTK:
 * fixed issue with NLTK's tokenize module colliding with the Python tokenize module
 * fixed issue with stemming Unicode strings
 * changed ViterbiParser.nbest_parse to parse
-* KNBC Japanese corpus reader
+* ChaSen and KNBC Japanese corpus readers
 * preserve case in concordance display
 * fixed bug in simplification of Brown tags
 * a version of IBM Model 1 as described in Koehn 2010
...
@@ -28,9 +28,15 @@ NLTK:
 * simplifications and corrections of Earley Chart Parser rules
 * several changes to the feature chart parsers for correct unification
 * bugfixes: FreqDist.plot, FreqDist.max, NgramModel.entropy, CategorizedCorpusReader, DecisionTreeClassifier
 * removal of Python >2.4 language features for 2.4 compatibility
 * removal of deprecated functions and associated warnings
+* added semantic domains to wordnet corpus reader
+* changed wordnet similarity functions to include instance hyponyms
+* updated to use latest version of Boxer
 Data:
+* Japanese corpora...
+* JEITA Public Morphologically Tagged Corpus (in ChaSen format)
+* KNB Annotated corpus of Japanese blog posts
 * Fixed some minor bugs in alvey.fcfg, and added number of parse trees in alvey_sentences.txt
 * added more comtrans data
...
@@ -39,7 +45,7 @@ Documentation:
 * NLTK Japanese book (chapter 12) by Masato Hagiwara
 NLTK-Contrib:
-* Contribute a version of the Viethen and Dale referring expression algorithms
+* Viethen and Dale referring expression algorithms
 Thanks to the following contributors to 2.0.1 (since 2.0b9, July 2010)
 Yonatan Becker, Steven Bethard, David Coles, Dan Garrette,
...
nltk/corpus/reader/bracket_parse.py  (View file @ 4b802fc7)
...
@@ -8,7 +8,7 @@
 import sys
-from nltk.tree import bracket_parse, Tree
+from nltk.tree import Tree
 from util import *
 from api import *
...
@@ -75,14 +75,15 @@ class BracketParseCorpusReader(SyntaxCorpusReader):
     def _parse(self, t):
         try:
-            return bracket_parse(self._normalize(t))
+            return Tree.parse(self._normalize(t))
         except ValueError, e:
             sys.stderr.write("Bad tree detected; trying to recover...\n")
             # Try to recover, if we can:
             if e.args == ('mismatched parens',):
                 for n in range(1, 5):
                     try:
-                        v = bracket_parse(self._normalize(t + ')' * n))
+                        v = Tree.parse(self._normalize(t + ')' * n))
                         sys.stderr.write("  Recovered by adding %d close "
                                          "paren(s)\n" % n)
                         return v
...
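For context, this change swaps the removed module-level bracket_parse() helper for the Tree.parse() classmethod. A minimal usage sketch, assuming the NLTK 2.x API current at the time of this commit (where a tree's root label is the .node attribute):

# Sketch only: Tree.parse() replaces the removed bracket_parse() helper.
from nltk.tree import Tree

tree = Tree.parse("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")
print tree        # round-trips the bracketed string
print tree.node   # 'S' -- root label attribute in NLTK 2.x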
nltk/sem/chat80.py  (View file @ 4b802fc7)
...
@@ -403,7 +403,7 @@ def cities2table(filename, rel_name, dbname, verbose=False, setup=False):
         cur.close()
     except ImportError:
         import warnings
-        warnings.warn("To run this function, first install pysqlite.")
+        warnings.warn("To run this function, first install pysqlite, or else use Python 2.5 or later.")

 def sql_query(dbname, query):
     """
...
@@ -423,7 +423,7 @@ def sql_query(dbname, query):
         return cur.execute(query)
     except ImportError:
         import warnings
-        warnings.warn("To run this function, first install pysqlite.")
+        warnings.warn("To run this function, first install pysqlite, or else use Python 2.5 or later.")
         raise

 def _str2records(filename, rel):
...
@@ -780,7 +780,7 @@ def sql_demo():
             print row
     except ImportError:
         import warnings
-        warnings.warn("To run the SQL demo, first install pysqlite.")
+        warnings.warn("To run the SQL demo, first install pysqlite, or else use Python 2.5 or later.")

 if __name__ == '__main__':
...
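The reworded warnings reflect that Python 2.5+ ships the sqlite3 module in the standard library, so pysqlite is only needed on older interpreters. A hedged sketch of the guarded-import pattern this implies; the module and package names (sqlite3, pysqlite2.dbapi2) are real, but the exact fallback structure here is illustrative, not the commit's code:

# Illustrative sketch, not the code in this commit.
try:
    import sqlite3                               # bundled with Python 2.5+
except ImportError:
    try:
        from pysqlite2 import dbapi2 as sqlite3  # pysqlite on older Pythons
    except ImportError:
        import warnings
        warnings.warn("To run this function, first install pysqlite, "
                      "or else use Python 2.5 or later.")
        raise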
nltk/test/ccg.doctest  (View file @ 4b802fc7)
...
@@ -196,12 +196,12 @@ Note that while the two derivations are different, they are semantically equivalent.
 (((S\NP)/NP)\.,((S\NP)/NP))
 -----------------------------------------------------------------------<
 ((S\NP)/NP)
 ------------------------------------------------------------------------------->B
 ((S\NP)/N)
 ------------------------------------->
 (N\.,N)
 ------------------------------------------------<
 N
 -------------------------------------------------------->
 NP
 ------------------------------------------------------------------------------------------------------------------------------->
 (S\NP)
 -----------------------------------------------------------------------------------------------------------------------------------<
...
@@ -216,12 +216,12 @@ Note that while the two derivations are different, they are semantically equivalent.
 (((S\NP)/NP)\.,((S\NP)/NP))
 -----------------------------------------------------------------------<
 ((S\NP)/NP)
 ------------------------------------------------------------------------------->B
 ((S\NP)/N)
 ------------------------------------->
 (N\.,N)
 ------------------------------------------------<
 N
 -------------------------------------------------------->
 NP
 ------------------------------------------------------------------------------------------------------------------------------->
 (S\NP)
 -----------------------------------------------------------------------------------------------------------------------------------<
...
nltk/test/chat80.doctest  (View file @ 4b802fc7)
...
@@ -199,9 +199,8 @@ to SQL:
 Given this grammar, we can express, and then execute, queries in English.
->>> from nltk.parse import load_earley
 >>> from string import join
->>> cp = load_earley('grammars/book_grammars/sql0.fcfg')
+>>> cp = nltk.data.load('grammars/book_grammars/sql0.fcfg')
 >>> query = 'What cities are in China'
 >>> trees = cp.nbest_parse(query.split())
 >>> answer = trees[0].node['SEM']
...
nltk/test/probability.doctest  (View file @ 4b802fc7)
...
@@ -65,7 +65,7 @@ from the whole corpus, not just the training corpus
 >>> symbols = list(set([word for sent in corpus for (word,tag) in sent]))
 >>> print len(symbols)
 1464
->>> trainer = nltk.HiddenMarkovModelTrainer(tag_set, symbols)
+>>> trainer = nltk.tag.HiddenMarkovModelTrainer(tag_set, symbols)

 We divide the corpus into 90% training and 10% testing
...
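The fix here is just the fully qualified name, but for readers following along, the surrounding workflow is: construct the trainer with the full tag set and vocabulary, then fit it on the training split. A minimal sketch assuming the NLTK 2.x nltk.tag API; train_corpus and test_corpus stand in for the 90%/10% splits the doctest describes:

# Sketch of the HMM training workflow the doctest exercises (NLTK 2.x).
import nltk

trainer = nltk.tag.HiddenMarkovModelTrainer(tag_set, symbols)
hmm = trainer.train_supervised(train_corpus)   # fit on the 90% training split
print hmm.evaluate(test_corpus)                # accuracy on the 10% test split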
nltk/test/tree.doctest  (View file @ 4b802fc7)
...
@@ -158,26 +158,26 @@ then it simply delegates to `Tree.parse()`.
 Trees can be compared for equality:
->>> tree == bracket_parse(str(tree))
+>>> tree == Tree.parse(str(tree))
 True
->>> tree2 == bracket_parse(str(tree2))
+>>> tree2 == Tree.parse(str(tree2))
 True
 >>> tree == tree2
 False
->>> tree == bracket_parse(str(tree2))
+>>> tree == Tree.parse(str(tree2))
 False
->>> tree2 == bracket_parse(str(tree))
+>>> tree2 == Tree.parse(str(tree))
 False
->>> tree != bracket_parse(str(tree))
+>>> tree != Tree.parse(str(tree))
 False
->>> tree2 != bracket_parse(str(tree2))
+>>> tree2 != Tree.parse(str(tree2))
 False
 >>> tree != tree2
 True
->>> tree != bracket_parse(str(tree2))
+>>> tree != Tree.parse(str(tree2))
 True
->>> tree2 != bracket_parse(str(tree))
+>>> tree2 != Tree.parse(str(tree))
 True
 >>> tree < tree2 or tree > tree2
...
@@ -567,7 +567,7 @@ variable:
 Define a helper funciton to create new parented trees:
 >>> def make_ptree(s):
-...     ptree = ParentedTree.convert(bracket_parse(s))
+...     ptree = ParentedTree.convert(Tree.parse(s))
 ...     all_ptrees.extend(t for t in ptree.subtrees()
 ...                       if isinstance(t, Tree))
 ...     return ptree
...
@@ -838,7 +838,7 @@ variable:
 Define a helper funciton to create new parented trees:
 >>> def make_mptree(s):
-...     mptree = MultiParentedTree.convert(bracket_parse(s))
+...     mptree = MultiParentedTree.convert(Tree.parse(s))
 ...     all_mptrees.extend(t for t in mptree.subtrees()
 ...                        if isinstance(t, Tree))
 ...     return mptree
...
@@ -1126,6 +1126,6 @@ This used to cause an infinite loop (fixed in svn 6269):
 This used to discard the ``(B b)`` subtree (fixed in svn 6270):
->>> print bracket_parse('((A a) (B b))')
+>>> print Tree.parse('((A a) (B b))')
 ( (A a) (B b))
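The equality block above checks a round-trip property: serialising a tree with str() and re-parsing it yields an equal tree, while distinct trees stay unequal. A compact sketch of the same invariant, assuming the NLTK 2.x Tree.parse() API:

# Round-trip invariant exercised by the doctest above (NLTK 2.x).
from nltk.tree import Tree

tree = Tree.parse("(S (NP I) (VP (V saw) (NP him)))")
assert tree == Tree.parse(str(tree))      # str() then parse() is lossless
assert not (tree != Tree.parse(str(tree)))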
nltk/test/treetransforms.doctest  (View file @ 4b802fc7)
...
@@ -11,7 +11,7 @@ Unit tests for the TreeTransformation class
 >>> sentence = "(TOP (S (S (VP (VBN Turned) (ADVP (RB loose)) (PP (IN in) (NP (NP (NNP Shane) (NNP Longman) (POS 's)) (NN trading) (NN room))))) (, ,) (NP (DT the) (NN yuppie) (NNS dealers)) (VP (AUX do) (NP (NP (RB little)) (ADJP (RB right)))) (. .)))"
->>> tree = bracket_parse(sentence)
+>>> tree = Tree.parse(sentence)
 >>> print tree
 (TOP
   (S
...
nltk/test/wordnet.doctest  (View file @ 4b802fc7)
...
@@ -171,13 +171,13 @@ The old behavior can be achieved by setting simulate_root to be False.
 A score of 1 represents identity i.e. comparing a sense with itself
 will return 1.
->>> dog.path_similarity(cat)
+>>> dog.path_similarity(cat) # doctest: +ELLIPSIS
 0.2...
->>> hit.path_similarity(slap)
+>>> hit.path_similarity(slap) # doctest: +ELLIPSIS
 0.142...
->>> wn.path_similarity(hit, slap)
+>>> wn.path_similarity(hit, slap) # doctest: +ELLIPSIS
 0.142...
 >>> print hit.path_similarity(slap, simulate_root=False)
...
@@ -194,13 +194,13 @@ of the taxonomy in which the senses occur. The relationship is given
 as -log(p/2d) where p is the shortest path length and d the taxonomy
 depth.
->>> dog.lch_similarity(cat)
+>>> dog.lch_similarity(cat) # doctest: +ELLIPSIS
 2.028...
->>> hit.lch_similarity(slap)
+>>> hit.lch_similarity(slap) # doctest: +ELLIPSIS
 1.312...
->>> wn.lch_similarity(hit, slap)
+>>> wn.lch_similarity(hit, slap) # doctest: +ELLIPSIS
 1.312...
 >>> print hit.lch_similarity(slap, simulate_root=False)
...
@@ -225,7 +225,7 @@ shortest path to the root node is the longest will be selected. Where
 the LCS has multiple paths to the root, the longer path is used for
 the purposes of the calculation.
->>> dog.wup_similarity(cat)
+>>> dog.wup_similarity(cat) # doctest: +ELLIPSIS
 0.857...
 >>> hit.wup_similarity(slap)
...
@@ -263,9 +263,9 @@ information content, the result is dependent on the corpus used to
 generate the information content and the specifics of how the
 information content was created.
->>> dog.res_similarity(cat, brown_ic)
+>>> dog.res_similarity(cat, brown_ic) # doctest: +ELLIPSIS
 7.911...
->>> dog.res_similarity(cat, genesis_ic)
+>>> dog.res_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
 7.204...
 ``synset1.jcn_similarity(synset2, ic):``
...
@@ -275,9 +275,9 @@ Information Content (IC) of the Least Common Subsumer (most specific
 ancestor node) and that of the two input Synsets. The relationship is
 given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
->>> dog.jcn_similarity(cat, brown_ic)
+>>> dog.jcn_similarity(cat, brown_ic) # doctest: +ELLIPSIS
 0.449...
->>> dog.jcn_similarity(cat, genesis_ic)
+>>> dog.jcn_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
 0.285...
 ``synset1.lin_similarity(synset2, ic):``
...
@@ -287,7 +287,7 @@ Information Content (IC) of the Least Common Subsumer (most specific
 ancestor node) and that of the two input Synsets. The relationship is
 given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).
->>> dog.lin_similarity(cat, semcor_ic)
+>>> dog.lin_similarity(cat, semcor_ic) # doctest: +ELLIPSIS
 0.886...
...
@@ -405,7 +405,7 @@ Bug 160: wup_similarity breaks when the two synsets have no common hypernym
 >>> t = wn.synsets('picasso')[0]
 >>> m = wn.synsets('male')[1]
->>> t.wup_similarity(m)
+>>> t.wup_similarity(m) # doctest: +ELLIPSIS
 0.631...
 >>> t = wn.synsets('titan')[1]
...
@@ -418,14 +418,14 @@ Bug 21: "instance of" not included in LCS (very similar to bug 160)
 >>> a = wn.synsets("writings")[0]
 >>> b = wn.synsets("scripture")[0]
 >>> brown_ic = wordnet_ic.ic('ic-brown.dat')
->>> a.jcn_similarity(b, brown_ic)
+>>> a.jcn_similarity(b, brown_ic) # doctest: +ELLIPSIS
 0.175...

 Bug 221: Verb root IC is zero
 >>> from nltk.corpus.reader.wordnet import information_content
 >>> s = wn.synsets('say', wn.VERB)[0]
->>> information_content(s, brown_ic)
+>>> information_content(s, brown_ic) # doctest: +ELLIPSIS
 4.623...

 Bug 161: Comparison between WN keys/lemmas should not be case sensitive
...
@@ -451,7 +451,7 @@ Bug 382: JCN Division by zero error
 >>> shlep = wn.synset('shlep.v.02')
 >>> from nltk.corpus import wordnet_ic
 >>> brown_ic = wordnet_ic.ic('ic-brown.dat')
->>> tow.jcn_similarity(shlep, brown_ic)
+>>> tow.jcn_similarity(shlep, brown_ic) # doctest: +ELLIPSIS
 1...e+300

 Bug 428: Depth is zero for instance nouns
...
@@ -473,7 +473,7 @@ Bug 470: shortest_path_distance ignored instance hypernyms
 >>> google = wordnet.synsets("google")[0]
 >>> earth = wordnet.synsets("earth")[0]
->>> google.wup_similarity(earth)
+>>> google.wup_similarity(earth) # doctest: +ELLIPSIS
 0.1...

 Bug 484: similarity metrics returned -1 instead of None for no LCS
...
@@ -505,17 +505,17 @@ Bug 482: Some nouns not being lemmatised by WordNetLemmatizer().lemmatize
 Bug 284: instance hypernyms not used in similarity calculations
->>> wn.synset('john.n.02').lch_similarity(wn.synset('dog.n.01'))
+>>> wn.synset('john.n.02').lch_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
 1.335...
->>> wn.synset('john.n.02').wup_similarity(wn.synset('dog.n.01'))
+>>> wn.synset('john.n.02').wup_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
 0.571...
->>> wn.synset('john.n.02').res_similarity(wn.synset('dog.n.01'), brown_ic)
+>>> wn.synset('john.n.02').res_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
 2.224...
->>> wn.synset('john.n.02').jcn_similarity(wn.synset('dog.n.01'), brown_ic)
+>>> wn.synset('john.n.02').jcn_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
 0.075...
->>> wn.synset('john.n.02').lin_similarity(wn.synset('dog.n.01'), brown_ic)
+>>> wn.synset('john.n.02').lin_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
 0.252...
->>> wn.synset('john.n.02').hypernym_paths()
+>>> wn.synset('john.n.02').hypernym_paths() # doctest: +ELLIPSIS
 [[Synset('entity.n.01'), ..., Synset('john.n.02')]]

 Issue 541: add domains to wordnet
...
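A note on why every similarity call in this file gained a # doctest: +ELLIPSIS directive: the expected outputs are written with trailing "..." (e.g. 0.142...), which only matches the platform-dependent tail of a float when the ELLIPSIS option is enabled for that example. A self-contained sketch of the mechanism; the path_sim_demo function is hypothetical, not from this commit:

# Hypothetical demo of the doctest ELLIPSIS option.
def path_sim_demo():
    """
    >>> path_sim_demo()  # doctest: +ELLIPSIS
    0.142...
    """
    print 1.0 / 7   # prints 0.142857142857; '...' absorbs the tail

if __name__ == '__main__':
    import doctest
    doctest.testmod()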