edx / edx-ora2
Commit 9ae1a8a3
Authored Jun 16, 2014 by Will Daly
More regex patterns, add a stemmer
Parent: 4b77c5c0
Showing 1 changed file with 14 additions and 2 deletions

openassessment/assessment/worker/classy.py (+14, -2)
"""
"""
import pickle
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import FeatureUnion
...
@@ -23,6 +24,8 @@ WORD_PATTERNS = [
    (r'.*ic$', 'JJ'),
    (r'.*est$', 'JJ'),
    (r'^a$', 'PREP'),
    (r'.*s$', 'NNS'),
    (r'.*', 'NN')
]
...
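The hunk above does not show how WORD_PATTERNS is consumed. A minimal sketch of first-match regex tagging follows, assuming the list is applied in order to each word, which is how regex-based taggers typically work. It uses only the patterns visible in the hunk; the helper names `tag_word` and `tag_words` are hypothetical and are not part of the commit.

```python
import re

# Only the patterns visible in the hunk; the real list has earlier entries.
# Order matters: the first matching pattern wins, and the final catch-all
# r'.*' tags every remaining word as a noun.
WORD_PATTERNS = [
    (r'.*ic$', 'JJ'),
    (r'.*est$', 'JJ'),
    (r'^a$', 'PREP'),
    (r'.*s$', 'NNS'),
    (r'.*', 'NN'),
]

# Compile once so repeated tagging does not re-parse the patterns.
COMPILED_PATTERNS = [(re.compile(pattern), tag) for pattern, tag in WORD_PATTERNS]


def tag_word(word):
    """Return the tag of the first pattern that matches `word` (hypothetical helper)."""
    for regex, tag in COMPILED_PATTERNS:
        if regex.match(word):
            return tag
    return None  # Unreachable while the catch-all pattern is present.


def tag_words(words):
    """Tag each word independently; no sentence context is used."""
    return [(word, tag_word(word)) for word in words]


print(tag_words(['a', 'classic', 'fastest', 'dogs', 'dog']))
```

Because matching is purely suffix-based, a noun such as "test" is also tagged 'JJ' by the r'.*est$' rule. That is the trade-off of a cheap regex tagger compared with a trained one.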
@@ -34,6 +37,11 @@ def tokenizer(text):
    ]


def stemmer(text):
    stemmer = nltk.PorterStemmer()
    return [stemmer.stem(token) for token in nltk.word_tokenize(text)]


class ClassyAlgorithm(AIAlgorithm):
    """
    A super-classy text classification algorithm :)
...
@@ -54,11 +62,15 @@ class ClassyAlgorithm(AIAlgorithm):
        """
        pipeline = FeatureUnion([
            ('tfid', TfidfVectorizer(min_df=1, ngram_range=(1, 2), stop_words='english')),
            ('tfid', TfidfVectorizer(tokenizer=stemmer, min_df=1, ngram_range=(1, 2), stop_words='english')),
            ('pos', CountVectorizer(tokenizer=tokenizer, ngram_range=(1, 2)))
        ])
        transformed = pipeline.fit_transform([example.text for example in examples])
        scores = [example.score for example in examples]
        classifier = SVC()
        classifier.fit(transformed, scores)
...
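The training step in this hunk can be tried in isolation. The sketch below is a rough approximation under stated assumptions, not the project's implementation: `toy_stem_tokenizer` is a hypothetical stand-in for the nltk-based `stemmer()`, a plain `CountVectorizer` replaces the POS-tag vectorizer, and the texts and scores are made up. The step names are kept distinct because newer scikit-learn versions validate that transformer names in a FeatureUnion are unique.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.svm import SVC


def toy_stem_tokenizer(text):
    """Hypothetical stand-in for the commit's stemmer(); not a Porter stemmer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Strip a trailing "s" from longer tokens so "cats" and "cat" share a feature.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


# Made-up training data: response text paired with a score.
texts = [
    "the cats sat on the mat",
    "a cat sits on mats",
    "stocks fell sharply today",
    "the stock market fell",
]
scores = [1, 1, 0, 0]

# Each transformer produces its own feature matrix; FeatureUnion
# concatenates them column-wise into one sparse matrix.
features = FeatureUnion([
    ("tfidf_words", TfidfVectorizer(ngram_range=(1, 2))),
    ("tfidf_stems", TfidfVectorizer(tokenizer=toy_stem_tokenizer,
                                    token_pattern=None,
                                    ngram_range=(1, 2))),
    ("counts", CountVectorizer(ngram_range=(1, 2))),
])

transformed = features.fit_transform(texts)
classifier = SVC()
classifier.fit(transformed, scores)

# New text must go through the already-fitted union (transform, not fit_transform).
predictions = classifier.predict(features.transform(texts))
print(transformed.shape, list(predictions))
```

Scoring a new response therefore needs both the fitted feature union and the fitted classifier, which is presumably why the module imports `pickle`.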