Commit 3449a126 by Ewan Klein

preliminary changes towards using fs.py

svn/trunk@4666
parent 8a3e49a7
...@@ -4,10 +4,10 @@ tested and revised. Assuming we have saved feat0cfg_ as a file named ...@@ -4,10 +4,10 @@ tested and revised. Assuming we have saved feat0cfg_ as a file named
``'feat0.cfg'``, the function ``GrammarFile.read_file()`` allows us to ``'feat0.cfg'``, the function ``GrammarFile.read_file()`` allows us to
read the grammar into NLTK, ready for use in parsing. read the grammar into NLTK, ready for use in parsing.
>>> from nltk.parse import * >>> from nltk import parse, tokenize
>>> from nltk.parse.featurechart import * >>> from nltk import fs as featstruct
>>> import nltk.tokenize >>> from nltk.fs import FS
>>> cp = load_earley('feat0.cfg', trace=2) >>> cp = parse.load_earley('feat0.cfg', trace=2)
>>> sent = 'Kim likes children' >>> sent = 'Kim likes children'
>>> tokens = list(tokenize.whitespace(sent)) >>> tokens = list(tokenize.whitespace(sent))
>>> tokens >>> tokens
...@@ -49,25 +49,31 @@ read the grammar into NLTK, ready for use in parsing. ...@@ -49,25 +49,31 @@ read the grammar into NLTK, ready for use in parsing.
Feature structures in NLTK are ... Atomic feature values can be strings or Feature structures in NLTK are ... Atomic feature values can be strings or
integers. integers.
>>> fs1 = dict(TENSE='past', NUM='sg') >>> fs1 = FS(dict(TENSE='past', NUM='sg'))
>>> print fs1 >>> print fs1
{'NUM': 'sg', 'TENSE': 'past'} {'NUM': 'sg', 'TENSE': 'past'}
We can think of a feature structure as being like a Python dictionary, We can think of a feature structure as being like a Python dictionary,
and access its values by indexing in the usual way. and access its values by indexing in the usual way.
>>> fs1 = dict(PER=3, NUM='pl', GND='fem') >>> fs1 = FS(dict(PER=3, NUM='pl', GND='fem'))
>>> print fs1['GND'] >>> print fs1['GND']
fem fem
We can also define feature structures which have complex values, as We can also define feature structures which have complex values, as
discussed earlier. discussed earlier.
>>> fs2 = dict(POS='N', AGR=fs1) >>> fs2 = FS(dict(POS='N', AGR=fs1))
>>> print fs2 >>> print fs2
{'AGR': {'NUM': 'pl', 'GND': 'fem', 'PER': 3}, 'POS': 'N'} AGR:
GND: fem
NUM: pl
PER: 3
POS: N
>>> print fs2['AGR'] >>> print fs2['AGR']
{'NUM': 'pl', 'GND': 'fem', 'PER': 3} GND: fem
NUM: pl
PER: 3
>>> print fs2['AGR']['PER'] >>> print fs2['AGR']['PER']
3 3
...@@ -80,23 +86,23 @@ Representing dictionaries in YAML form is useful for making feature ...@@ -80,23 +86,23 @@ Representing dictionaries in YAML form is useful for making feature
structures readable: structures readable:
>>> from nltk.parse.featurelite import * >>> from nltk.parse.featurelite import *
>>> f1 = yaml.load("NUMBER: SINGULAR") >>> f1 = featstruct.parse("NUMBER: SINGULAR")
>>> f2 = yaml.load("PERSON: 3") >>> f2 = featstruct.parse("PERSON: 3")
>>> print show(unify(f1, f2)) >>> print unify(f1, f2)
NUMBER: SINGULAR NUMBER: SINGULAR
PERSON: 3 PERSON: 3
>>> f1 = yaml.load(''' >>> f1 = featstruct.parse('''
... A: ... A:
... B: b ... B: b
... D: d ... D: d
... ''') ... ''')
>>> f2 = yaml.load(''' >>> f2 = featstruct.parse('''
... A: ... A:
... C: c ... C: c
... D: d ... D: d
... ''') ... ''')
>>> print show(unify(f1, f2)) >>> print unify(f1, f2)
A: A:
B: b B: b
C: c C: c
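The merge performed by `unify` in this hunk can be approximated with plain nested dictionaries. The following is a minimal sketch in Python 3 syntax, not the implementation in `fs.py` or `featurelite`: the function below is hypothetical, it ignores variables and structure sharing, and it assumes that `D` is a top-level feature in both inputs (the indentation of the YAML is lost in this view).

```python
def unify(fs1, fs2):
    """Unify two feature structures given as nested dicts.

    Returns a new dict, or None if the two structures conflict.
    Hypothetical helper for illustration; not NLTK's unify().
    """
    result = dict(fs1)  # shallow copy so fs1 itself is not modified
    for feature, value2 in fs2.items():
        if feature not in result:
            # Feature only present in fs2: carry it over unchanged.
            result[feature] = value2
            continue
        value1 = result[feature]
        if isinstance(value1, dict) and isinstance(value2, dict):
            # Both values are complex: unify them recursively.
            merged = unify(value1, value2)
            if merged is None:
                return None
            result[feature] = merged
        elif value1 != value2:
            # Atomic clash, or an atom paired with a complex value.
            return None
    return result


f1 = {"A": {"B": "b"}, "D": "d"}
f2 = {"A": {"C": "c"}, "D": "d"}
print(unify(f1, f2))  # {'A': {'B': 'b', 'C': 'c'}, 'D': 'd'}
```

Returning `None` on a clash is a design choice of this sketch; it keeps failure distinguishable from the empty feature structure `{}`.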
...@@ -110,12 +116,12 @@ Feature structures are not inherently tied to linguistic objects; they are ...@@ -110,12 +116,12 @@ Feature structures are not inherently tied to linguistic objects; they are
general purpose structures for representing knowledge. For example, we general purpose structures for representing knowledge. For example, we
could encode information about a person in a feature structure: could encode information about a person in a feature structure:
>>> person01 = yaml.load(''' >>> person01 = featstruct.parse('''
... NAME: 'Lee' ... NAME: 'Lee'
... TELNO: '01 27 86 42 96' ... TELNO: '01 27 86 42 96'
... AGE: 33 ... AGE: 33
... ''') ... ''')
>>> print show(person01) >>> print person01
AGE: 33 AGE: 33
NAME: Lee NAME: Lee
TELNO: 01 27 86 42 96 TELNO: 01 27 86 42 96
...@@ -127,7 +133,7 @@ is prefixed with an integer in parentheses, such as ``(1)``, and any ...@@ -127,7 +133,7 @@ is prefixed with an integer in parentheses, such as ``(1)``, and any
subsequent reference to that structure uses the notation subsequent reference to that structure uses the notation
``->(1)``, as shown below. ``->(1)``, as shown below.
>>> fs=yaml.load(""" >>> fs=featstruct.parse("""
... NAME: 'Lee' ... NAME: 'Lee'
... ADDRESS: &1 ... ADDRESS: &1
... NUMBER: 74 ... NUMBER: 74
...@@ -136,7 +142,7 @@ subsequent reference to that structure uses the notation ...@@ -136,7 +142,7 @@ subsequent reference to that structure uses the notation
... NAME: 'Kim' ... NAME: 'Kim'
... ADDRESS: *1 ... ADDRESS: *1
... """) ... """)
>>> print show(fs) >>> print fs
ADDRESS: &id001 ADDRESS: &id001
NUMBER: 74 NUMBER: 74
STREET: rue Pascal STREET: rue Pascal
...@@ -147,14 +153,14 @@ subsequent reference to that structure uses the notation ...@@ -147,14 +153,14 @@ subsequent reference to that structure uses the notation
There can be any number of tags within a single feature structure. There can be any number of tags within a single feature structure.
>>> fs3 = yaml.load(""" >>> fs3 = featstruct.parse("""
... A: 'a' ... A: 'a'
... B: &1 ... B: &1
... C: 'c' ... C: 'c'
... D: *1 ... D: *1
... E: *1 ... E: *1
... """) ... """)
>>> print show(fs3) >>> print fs3
A: a A: a
B: &id001 B: &id001
C: c C: c
...@@ -162,12 +168,12 @@ There can be any number of tags within a single feature structure. ...@@ -162,12 +168,12 @@ There can be any number of tags within a single feature structure.
E: *id001 E: *id001
>>> fs1 = yaml.load(""" >>> fs1 = featstruct.parse("""
... NUMBER: 74 ... NUMBER: 74
... STREET: 'rue Pascal' ... STREET: 'rue Pascal'
... """) ... """)
>>> fs2 = yaml.load("CITY: Paris") >>> fs2 = featstruct.parse("CITY: Paris")
>>> print show(unify(fs1, fs2)) >>> print unify(fs1, fs2)
CITY: Paris CITY: Paris
NUMBER: 74 NUMBER: 74
STREET: rue Pascal STREET: rue Pascal
...@@ -179,7 +185,7 @@ Unification is symmetric: ...@@ -179,7 +185,7 @@ Unification is symmetric:
Unification is associative: Unification is associative:
>>> fs3 = yaml.load("TELNO: 01 27 86 42 96") >>> fs3 = featstruct.parse("TELNO: 01 27 86 42 96")
>>> unify(unify(fs1, fs2), fs3) == unify(fs1, unify(fs2, fs3)) >>> unify(unify(fs1, fs2), fs3) == unify(fs1, unify(fs2, fs3))
True True
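The doctest in this hunk regroups the arguments of nested `unify` calls, which is the associativity property; independence from argument order is the symmetry property named in the hunk header. Both can be checked on flat structures with a small stand-in, sketched here in Python 3 syntax. `unify_flat` is a hypothetical helper that handles only atomic values and is not part of NLTK.

```python
def unify_flat(fs1, fs2):
    """Unify two flat feature structures (dicts with atomic values only).

    Returns the merged dict, or None when a shared feature has
    different values. Hypothetical helper for illustration.
    """
    for feature in fs1.keys() & fs2.keys():
        if fs1[feature] != fs2[feature]:
            return None
    return {**fs1, **fs2}


fs1 = {"NUMBER": 74, "STREET": "rue Pascal"}
fs2 = {"CITY": "Paris"}
fs3 = {"TELNO": "01 27 86 42 96"}

# Symmetry: the order of the two arguments does not matter.
symmetric = unify_flat(fs1, fs2) == unify_flat(fs2, fs1)

# Associativity: the grouping of successive unifications does not matter.
associative = (unify_flat(unify_flat(fs1, fs2), fs3)
               == unify_flat(fs1, unify_flat(fs2, fs3)))

print(symmetric, associative)  # True True
```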
...@@ -200,7 +206,7 @@ this is implemented by setting the result of unification to be ...@@ -200,7 +206,7 @@ this is implemented by setting the result of unification to be
Now, if we look at how unification interacts with structure-sharing, Now, if we look at how unification interacts with structure-sharing,
things become really interesting. things become really interesting.
>>> fs0 = yaml.load(""" >>> fs0 = featstruct.parse("""
... NAME: Lee ... NAME: Lee
... ADDRESS: ... ADDRESS:
... NUMBER: 74 ... NUMBER: 74
...@@ -211,7 +217,7 @@ things become really interesting. ...@@ -211,7 +217,7 @@ things become really interesting.
... NUMBER: 74 ... NUMBER: 74
... STREET: 'rue Pascal' ... STREET: 'rue Pascal'
... """) ... """)
>>> print show(fs0) >>> print fs0
ADDRESS: ADDRESS:
NUMBER: 74 NUMBER: 74
STREET: rue Pascal STREET: rue Pascal
...@@ -222,12 +228,12 @@ things become really interesting. ...@@ -222,12 +228,12 @@ things become really interesting.
STREET: rue Pascal STREET: rue Pascal
NAME: Kim NAME: Kim
>>> fs1 = yaml.load(""" >>> fs1 = featstruct.parse("""
... SPOUSE: ... SPOUSE:
... ADDRESS: ... ADDRESS:
... CITY: Paris ... CITY: Paris
... """) ... """)
>>> print show(unify(fs0, fs1)) >>> print unify(fs0, fs1)
ADDRESS: ADDRESS:
NUMBER: 74 NUMBER: 74
STREET: rue Pascal STREET: rue Pascal
...@@ -239,7 +245,7 @@ things become really interesting. ...@@ -239,7 +245,7 @@ things become really interesting.
STREET: rue Pascal STREET: rue Pascal
NAME: Kim NAME: Kim
>>> fs0 = yaml.load(""" >>> fs0 = featstruct.parse("""
... NAME: Lee ... NAME: Lee
... ADDRESS: &1 ... ADDRESS: &1
... NUMBER: 74 ... NUMBER: 74
...@@ -248,7 +254,7 @@ things become really interesting. ...@@ -248,7 +254,7 @@ things become really interesting.
... NAME: Kim ... NAME: Kim
... ADDRESS: *1 ... ADDRESS: *1
... """) ... """)
>>> print show(fs0) >>> print fs0
ADDRESS: &id001 ADDRESS: &id001
NUMBER: 74 NUMBER: 74
STREET: rue Pascal STREET: rue Pascal
...@@ -257,7 +263,7 @@ things become really interesting. ...@@ -257,7 +263,7 @@ things become really interesting.
ADDRESS: *id001 ADDRESS: *id001
NAME: Kim NAME: Kim
>>> print show(unify(fs0, fs1)) >>> print unify(fs0, fs1)
ADDRESS: &id001 ADDRESS: &id001
CITY: Paris CITY: Paris
NUMBER: 74 NUMBER: 74
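The difference between the two `fs0` examples (separate copies of the address versus one address tagged `&1` and referenced with `*1`) corresponds to the difference between equal objects and identical objects in Python. The sketch below, in Python 3 syntax, uses plain dicts and an in-place update as an analogy only; it does not call `unify`, and the variable names are chosen for illustration.

```python
address = {"NUMBER": 74, "STREET": "rue Pascal"}

# Structure sharing: both paths point at the same dict object.
shared = {
    "NAME": "Lee",
    "ADDRESS": address,
    "SPOUSE": {"NAME": "Kim", "ADDRESS": address},
}

# No sharing: each path gets its own copy with equal content.
copied = {
    "NAME": "Lee",
    "ADDRESS": dict(address),
    "SPOUSE": {"NAME": "Kim", "ADDRESS": dict(address)},
}

# Add CITY information along the SPOUSE path only.
for fs in (shared, copied):
    fs["SPOUSE"]["ADDRESS"]["CITY"] = "Paris"

print("CITY" in shared["ADDRESS"])  # True: the update is visible via both paths
print("CITY" in copied["ADDRESS"])  # False: only the spouse's copy changed
```

This mirrors the outputs in the surrounding hunks: with a shared address, `CITY: Paris` appears under the tagged `ADDRESS`, whereas with separate copies it appears only under `SPOUSE`.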
...@@ -267,18 +273,18 @@ things become really interesting. ...@@ -267,18 +273,18 @@ things become really interesting.
ADDRESS: *id001 ADDRESS: *id001
NAME: Kim NAME: Kim
>>> fs1 = yaml.load(""" >>> fs1 = featstruct.parse("""
... ADDRESS1: ... ADDRESS1:
... NUMBER: 74 ... NUMBER: 74
... STREET: 'rue Pascal' ... STREET: 'rue Pascal'
... """) ... """)
>>> fs2 = yaml.load(""" >>> fs2 = featstruct.parse("""
... ADDRESS1: ?x ... ADDRESS1: ?x
... ADDRESS2: ?x ... ADDRESS2: ?x
... """) ... """)
>>> print show(unify(fs1, fs2)) >>> print unify(fs1, fs2)
ADDRESS1: &id001 ADDRESS1: &id001
NUMBER: 74 NUMBER: 74
STREET: rue Pascal STREET: rue Pascal
...@@ -287,7 +293,7 @@ things become really interesting. ...@@ -287,7 +293,7 @@ things become really interesting.
>>> sent = 'who do you claim that you like' >>> sent = 'who do you claim that you like'
>>> tokens = list(tokenize.whitespace(sent)) >>> tokens = list(tokenize.whitespace(sent))
>>> cp = load_earley('feat1.cfg', trace=1) >>> cp = parse.load_earley('feat1.cfg', trace=1)
>>> trees = cp.parse(tokens) >>> trees = cp.parse(tokens)
|.w.d.y.c.t.y.l.| |.w.d.y.c.t.y.l.|
Scanner |[-] . . . . . .| NP[+WH] -> who * {} Scanner |[-] . . . . . .| NP[+WH] -> who * {}
...@@ -332,7 +338,7 @@ things become really interesting. ...@@ -332,7 +338,7 @@ things become really interesting.
Let's load a German grammar: Let's load a German grammar:
>>> cp = load_earley('german0.cfg', trace=0) >>> cp = parse.load_earley('german0.cfg', trace=0)
>>> sent = 'die katze sieht den hund' >>> sent = 'die katze sieht den hund'
>>> tokens = list(tokenize.whitespace(sent)) >>> tokens = list(tokenize.whitespace(sent))
>>> trees = cp.parse(tokens) >>> trees = cp.parse(tokens)
...@@ -349,7 +355,7 @@ Let's load a German grammar: ...@@ -349,7 +355,7 @@ Let's load a German grammar:
First attempt at doing semantics with features: First attempt at doing semantics with features:
>>> cp = load_earley('sem3.cfg', trace=2) >>> cp = parse.load_earley('sem3.cfg', trace=2)
>>> sent = 'Kim barks' >>> sent = 'Kim barks'
>>> tokens = list(tokenize.whitespace(sent)) >>> tokens = list(tokenize.whitespace(sent))
>>> trees = cp.parse(tokens) >>> trees = cp.parse(tokens)