edx / insights
Commit 20e1fcd9, authored Jul 22, 2013 by Steve Komarov
Merge branch 'master' into stkomarov-fixes
Parents: 3eaa178f, 76ab893e
Showing 8 changed files with 194 additions and 55 deletions
README.md  +138 -43
docs/app_store.png  +0 -0
docs/grade_histogram.png  +0 -0
src/edinsights/core/decorators.py  +4 -0
src/edinsights/core/registry.py  +15 -5
src/edinsights/modules/testmodule/__init__.py  +12 -0
src/edinsights/modules/tests.py  +5 -0
src/edinsights/settings.py  +20 -7
README.md
@@ -24,18 +24,99 @@ to on-line batched analytics (e.g. for an instructor dashboard), to
 on-line realtime analytics (e.g. for the system to react to an event
 the analytics detects).
The model behind Insights is the app store model:

As with an app store (Android shown above), we provide a runtime. This
runtime provides a fixed set of technologies (Python, numpy, scipy,
pylab, pandas, mongo, a cache, etc.). If you restrict yourself to this
runtime, anyone running Insights can host your analytic. If you'd like
to move outside this set of tools, you can do that too, but then you
may have to host your own analytics server.
Comparison to other systems:
* Tincan is an SOA and a format for streaming analytics. Insights is
  an API and runtime for handling those events. The two are very
  complementary.
* Twitter Storm is a framework for sending events around. Insights is
  an API and runtime which would benefit from moving to something like
  storm.
* Hadoop is a distributed computation engine. For most learning
  analytics, hadoop is overkill, but it could be embedded in an
  analytics module if desired.
Examples
--------
Views show up in the dashboards. To define an analytic which just
shows "Hello World" in the analytics dashboard:
    @view()
    def hello_world():
        return "<html>Hello world!</html>"
Queries return data for use in other parts of the system. If you would
like to define a new analytic which shows a histogram of grades, the
first step would be to define a query which will return grades. How
this is done depends on your LMS, but it is often convenient to define
a dummy one which does not rely on having a functioning LMS
present. This is convenient for off-line development without live
student data:
    @query()
    def get_grades(course):
        ''' Dummy data module. Returns grades
        '''
        grades = 3*numpy.random.randn(1000,4)+ \
            12*numpy.random.binomial(1,0.3,(1000,4))+40
        return grades
Once this is in place, you can define a view which will call this query:
    @view()
    def plot_grades(fs, query, course):
        grades = query.get_grades(course)
        filename = course+"_"+str(time.time())+".png"
        title("Histogram of course grades")
        hist(grades)
        f = fs.open(filename, "w")
        savefig(f)
        f.close()
        fs.expire(filename, 5*60)
        return "<img src="+fs.get_url(filename)+">"
At this point, the following will show up in the instructor dashboard:

Note that the query and the view don't have to live on the same
machine. If someone wants to reuse your grade histogram in a different
LMS, all they need to do is define a new get_grades query.
To build a module which takes all incoming events and dumps them into
a database:
    @event_handler()
    def dump_to_db(mongodb, events):
        collection = mongodb['event_log']
        collection.insert([e.event for e in events])
Except for imports, that's all that's required.
 Architecture
 ------------
-A block diagram of the overall system is:
+A block diagram of where the analytics might fit into an overall
+learning system is:
-The learning management streams events to the analytics framework. In
-addition, the modules in the framework will typically have access to
-read replicas of production databases. In practice, a lot of analytics
-can be performed directly from the LMS databases with a lot less
-effort than processing events.
+The learning management system (and potentially other sources) stream
+events to the analytics framework. In addition, the modules in the
+framework will typically have access to read replicas of production
+databases. In practice, a lot of analytics can be performed directly
+from the LMS databases with a lot less effort than processing events.
+A single module
 A rough diagram of a single analytics module is:
@@ -65,11 +146,48 @@ The views and queries are automatically inspect for parameters, and
 the system will do the right thing. If you would like to have a
 per-module database, simply take a db parameter. Etc.
-To understand the system, the best place to start is by reading the
-module which defines testcases -- the file
+To understand how to build modules in more detail, the best place to
+start is by reading the module which defines testcases -- the file
 modules/testmodule/__init__.py. Next place is to look at the code for
 the decorators. Final place is for the main views and dashboard.
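The parameter inspection described in this hunk (a module receives `fs`, `db`, or `query` simply by naming it as a parameter) can be illustrated with a small sketch. This is not the edinsights implementation: `call_with_available` and the `resources` dictionary are hypothetical names, and the sketch is written for Python 3 rather than the Python 2 of the commit.

```python
import inspect

def call_with_available(func, available, **explicit):
    """Call func, passing only the parameters its signature names.

    `available` maps injectable names (e.g. 'fs', 'db', 'query') to objects.
    `explicit` holds caller-supplied arguments such as a course id.
    """
    kwargs = {}
    for name in inspect.signature(func).parameters:
        if name in explicit:
            kwargs[name] = explicit[name]
        elif name in available:
            kwargs[name] = available[name]
    return func(**kwargs)

# Hypothetical stand-ins for the resources a runtime would provide.
resources = {"db": {"event_log": []}, "fs": object(), "query": object()}

def count_events(db):
    # Names only `db`, so only `db` is injected.
    return len(db["event_log"])

def greet(course):
    # Names no runtime resource; `course` comes from the caller.
    return "Course: " + course

print(call_with_available(count_events, resources))
print(call_with_available(greet, resources, course="6.002x"))
```

A function that names a parameter which is neither available nor supplied fails with the usual `TypeError` for a missing argument, which keeps the sketch short at the cost of a less helpful error message.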
Using with other LMSes
The architecture is designed to be usable with common analytics shared
between multiple LMSes. The structure for this is:

Here, each instance has a data layer module. This module translates
the data generated by the particular LMS into a common
representation. Higher-level analytics are built on top of that common
representation. We're trying to come up with a process for creating this
data layer, but it's not essential we get it 100% right. In most
cases, it is relatively easy to include backwards-compatibility
queries.
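As a rough illustration of the data layer idea, the sketch below translates two made-up LMS grade formats into one common representation and runs a single analytic on top of it. Everything here is an assumption for illustration: the record shapes, the `lms_a_grades_to_common` and `lms_b_grades_to_common` functions, and the `student`/`fraction` keys do not come from the Insights code.

```python
def lms_a_grades_to_common(rows):
    """Hypothetical data layer for LMS A: rows are (username, score, max_score)."""
    return [{"student": u, "fraction": s / m} for (u, s, m) in rows]

def lms_b_grades_to_common(records):
    """Hypothetical data layer for LMS B: records are dicts holding a percent."""
    return [{"student": r["user"], "fraction": r["percent"] / 100.0}
            for r in records]

def mean_grade(common_rows):
    """Higher-level analytic, written once against the common representation."""
    return sum(r["fraction"] for r in common_rows) / len(common_rows)

a = lms_a_grades_to_common([("alice", 1, 2), ("bob", 3, 4)])
b = lms_b_grades_to_common([{"user": "carol", "percent": 50},
                            {"user": "dave", "percent": 75}])
print(mean_grade(a), mean_grade(b))
```

Only the two translation functions differ per LMS; `mean_grade` is shared, which is the property the paragraph above describes.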
Structuring servers
The system is transparent to how analytics are split across
servers. There are several models for how this might be used.
First, we might have production-grade code on e.g. a critical server
which keeps student profile/grading/etc. information, while still
maintaining prototype analytics servers, which may be on-line more
intermittently:

A second way to use this might be by function. For example, we might
embed analytics in the LMS, in the forums, in the wiki, in the student
registration system, and in other parts of the system. Those would
provide access to data from those subsystems. We may also wish to have
specialized runtimes providing access to additional tools like Hadoop
or R. A single computer can query across all of these servers from the
Insights API:

 Installing
 ----------
@@ -145,11 +263,14 @@ per-course/per-student. An instructor of that course might want to
 have that fixed to the course (so it transforms into a per-student
 analytic). djobject's transform_embed defines a DSL for restricting
 permissions to analytics, as well as for fixing specific commandline
-parameters.
+parameters. This DSL should be cleaned up, but it's good enough for
+now.
+Multiple analytics servers can be merged into one djobject.
 There is an issue of network reliability and timeouts when access
-remotely. This is planned to be handled by being able to set timeouts
-on djembed objects.
+remotely. You can set timeouts on djembed objects to manage those
+issues.
 Shortcuts/invariants
 --------------------
@@ -218,8 +339,10 @@ Gotchas
 * For events to flow in, a decorator in core.views must be
   called. This must be imported from the main application.
-* Number 1 bug: Python path issues if you have this installed and are
-  developing from source.
+* Sometimes, the network transparency isn't quite right. This is a
+  bug.
+* Are there still any Python path issues if you have this installed
+  and are developing from source?
 Product Backlog
 ---------------
@@ -289,44 +412,16 @@ students, instructors, researchers, marketers, etc.
 Architecture Expansions
 =======================
+This section lists some long-term architectural design goals of the
+system.
-The architecture is explicitly designed to eventually scale to running
-different analytics on different servers. edinsights.core.djobject
-(TODO: change to insights.core) provides a query object and a view
-object, which can be used to access queries and views in an identical
-way, regardless of whether or not there is a network in between. In
-the future, we would like to support an architecture where we have
-multiple analytics servers:
-This way, we can have production-grade code on e.g. a critical server
-which keeps student profile/grading/etc. information, while still
-maintaining prototype analytics servers, which may be on-line more
-intermittently. In order to support this, the djobject abstraction
-would have to be extended to support multiple servers. In addition,
-the current way the analytics embed in the courseware would have to
-change substantially.
-In addition, the architecture is designed to scale to sharing
-analytics between LMSes. A potential structure for this is:
-Here, each instance would have a data layer module. This module would
-translate the data generate by the particular LMS into a common
-representation. Analytics would be built on top of that common
-representation.
 We would like to also support FERPA-compliance. This could be built in
 one of two ways. Per-school stacks, including analytics:
 Split analytics:
 The API supports either. Building out back-end support for either
-would be substantial.
+would be substantial work.
 Other edX Code
 ==============
docs/app_store.png (new file, mode 0 → 100644, 470 KB)
docs/grade_histogram.png (new file, mode 0 → 100644, 18.5 KB)
src/edinsights/core/decorators.py
@@ -67,6 +67,10 @@ def view(category = None, name = None, description = None, args = None):
     args: Optional argspec for the function. This is generally better
       omitted.
+    TODO: human_name: Name without Python name restrictions -- e.g.
+      "Daily uploads" instead of "daily_uploads" -- for use in
+      human-usable dashboards.
     '''
     def view_factory(f):
         registry.register_handler('view', category, name, description, f, args)
src/edinsights/core/registry.py
@@ -33,11 +33,21 @@ def register_handler(cls, category, name, description, f, args):
         category += "+"
     if cls not in request_handlers:
         request_handlers[cls] = {}
-    if name in request_handlers[cls]:
-        # We used to have this be an error.
-        # We changed to a warning for the way we handle dummy values.
-        log.warn("{0} already in {1}".format(name, category))
-        # raise KeyError(name+" already in "+category)
-    request_handlers[cls][name] = {'function': f, 'name': name, 'doc': description, 'category': category}
+    # We may want to register under multiple names. E.g.
+    # edx.get_grades and (once adopted globally) generic
+    # get_grades
+    if isinstance(name, list):
+        names = name
+    else:
+        names = [name]
+    for n in names:
+        if n in request_handlers[cls]:
+            # We used to have this be an error.
+            # We changed to a warning for the way we handle dummy values.
+            log.warn("{0} already in {1}".format(n, category))
+            # raise KeyError(name+" already in "+category)
+        request_handlers[cls][n] = {'function': f, 'name': n, 'doc': description, 'category': category}
 class StreamingEvent:
     ''' Event object. Behaves like the normal JSON event dictionary,
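The registry change above lets one handler be registered under several names. A self-contained sketch of that behaviour follows. It is a simplification under stated assumptions rather than the project's code: the `args` parameter is dropped, `setdefault` replaces the explicit membership check, and `log.warning` is used in place of the `log.warn` alias. The `three_name` function and the `'test'` category are made up for the example.

```python
import logging

log = logging.getLogger(__name__)
request_handlers = {}

def register_handler(cls, category, name, description, f):
    """Register f under one name, or under every name in a list."""
    names = name if isinstance(name, list) else [name]
    handlers = request_handlers.setdefault(cls, {})
    for n in names:
        if n in handlers:
            # Re-registration is a warning, not an error, as in the diff.
            log.warning("%s already in %s", n, category)
        handlers[n] = {'function': f, 'name': n,
                       'doc': description, 'category': category}

def three_name():
    return "I have three names"

register_handler('query', 'test',
                 ['djt_three_name', 'edx_djt_three_name', 'edx.djt_three_name'],
                 None, three_name)

# Every alias resolves to the same function.
for alias in request_handlers['query']:
    print(alias, request_handlers['query'][alias]['function']())
```

Each alias gets its own dictionary entry with its own `'name'` value, so a lookup by any of the names returns the same function.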
src/edinsights/modules/testmodule/__init__.py
@@ -164,3 +164,15 @@ def djt_fake_user_count(query):
     the network, as well as optional parameters like fs, db, etc.
     '''
     return "<html>Users: {uc}</html>".format(uc = query.djt_fake_user_count())
+@query(name=['djt_three_name', 'edx_djt_three_name', 'edx.djt_three_name'])
+def djt_three_name():
+    return "I have three names"
+@query(name='djt_check_three_name')
+def check_three_name(query):
+    if query.djt_three_name() != "I have three names":
+        raise Exception("oops")
+    if query.edx_djt_three_name() != "I have three names":
+        raise Exception("oops")
+    return "Works"
src/edinsights/modules/tests.py
@@ -151,3 +151,8 @@ class SimpleTest(TestCase):
         c = Client()
         response = c.get('/view/djt_fake_user_count').content
         self.assertEqual(response, "<html>Users: 2</html>")
+    def test_multiname(self):
+        c = Client()
+        response = c.get('/query/djt_check_three_name').content
+        self.assertEqual(response, "Works")
src/edinsights/settings.py
@@ -196,9 +196,23 @@ LOGGING = {
 #initialize celery
 import djcelery
 djcelery.setup_loader()
-#import the settings for celery from the edinsights module
-from edinsights.celerysettings_dev import *
-# import django cache settings
-from edinsights.djangocachesettings_dev import *
\ No newline at end of file
+#import the settings for celery from the edinsights module and for cache
+try:
+    from celerysettings_dev import *
+    from djangocachesettings_dev import *
+except:
+    # The code had the imports below. These fail when running test
+    # cases stand-alone. I think the above fixes this, but I'm
+    # leaving this in for now in case there are configurations I
+    # haven't thought of. If the exception is raised, remove this
+    # comment, remove the exception, and add a comment explaining
+    # when the second set of imports is necessary.
+    #
+    # If it's, say, October, and no one has run into the exception,
+    # we should kill the extra code.
+    #
+    # pmitros -- 21/July/2013.
+    raise Exception("Import failed. See instructions in settings.py")
+    from edinsights.djangocachesettings_dev import *
+    from edinsights.celerysettings_dev import *
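The settings change tries one pair of import paths and raises if they fail. A related pattern, sketched below, tries a list of candidate module paths in order and catches only `ImportError`, so unrelated errors inside a settings module are not hidden. This is an illustration, not what the commit does: `import_first` and `public_names` are hypothetical helpers, `no_such_settings_module_xyz` is a deliberately missing name, and `json` stands in for a settings module so the sketch runs anywhere.

```python
import importlib

def import_first(candidates):
    """Return the first module in `candidates` that imports cleanly."""
    errors = []
    for modname in candidates:
        try:
            return importlib.import_module(modname)
        except ImportError as exc:
            errors.append("{0}: {1}".format(modname, exc))
    raise ImportError("No candidate could be imported: " + "; ".join(errors))

def public_names(module):
    """Names without a leading underscore, roughly what `import *`
    copies when the module defines no __all__."""
    return {k: v for k, v in vars(module).items() if not k.startswith("_")}

# The first candidate does not exist, so the second one is returned.
settings_module = import_first(["no_such_settings_module_xyz", "json"])
print(settings_module.__name__)
```

In a real settings file the result of `public_names(settings_module)` could be merged with `globals().update(...)`, though whether that is preferable to the explicit star-imports in the diff is a judgment call for the maintainers.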