Commit 484258bb by Piotr Mitros

Expiration logic and tests now work

parent 2adab303
@@ -2,7 +2,8 @@ analytics-experiments
 =====================
 This is a development version of an analytics framework for the edX
-infrastructure. The goal of this framework is to define an
+infrastructure. It will replace the ad-hoc analytics used in the
+software previously. The goal of this framework is to define an
 architecture for simple, pluggable analytics modules. The architecture
 must have the following properties:
@@ -19,18 +20,39 @@ each others' work.
 Architecture
 ------------
-The analytics framework has access to the event stream from the
-tracking architecture, a read-replica of the main database, as well
-as, in the future, course definition database, and read-replicas of
-auxilliary databases, such as discussion forums.
+edX currently has many sources of data:
+1. User tracking events. The software has a middleware layer which
+   captures all data (within reason) that the user sends to the
+   server. In addition, the server is instrumented, where necessary,
+   to capture context for this (e.g. for a problem submission, we also
+   need to capture the random number seed used to generate that
+   problem). The JavaScript is instrumented to capture most of what
+   the user does client-side (e.g. textbook page turns or video
+   scrubs). These events are captured in a Python logger, and streamed
+   into this framework.
+2. Databases. The application-layer of the analytics framework has or
+   will have access to these through read replicas. While virtually
+   all of the information is in the events, in practice, most
+   analytics can be performed from just the databases. This is
+   generally both much easier, and less sensitive to breaking when the
+   software changes.
+3. External services used for surveys and mailings. This is not
+   currently integrated.
+4. Course data. Most of this is in the read replica databases, but for
+   some courses, this lives in github repositories of XML files. This
+   is not currently integrated.
+5. Course-specific services (e.g. CS50 forums, Berkeley graders, etc.)
+6. E-mails to course staff.
+7. Anecdotal interactions.
 A block diagram of the overall system is:
 ![System structure](docs/system_structure.png)
-Each module in the analytics framework has its own Mongo database. In
-addition, in the near future, it should have read-only access to the
-DBs associated with other modules.
+Each module in the analytics framework has its own Mongo database, as
+well as a filesystem abstraction. In addition, in the near future, it
+should have read-only access to the DBs associated with other modules.
 The module consists of a set of functions which can be decorated as:
 * Event handlers. These receive tracking events.
......
@@ -33,8 +33,8 @@ def expire_objects():
     module = None
     for o in objects:
         if module != o.module:
-            fs = get_filesystem(module)
             module = o.module
+            fs = get_filesystem(module)
         if fs.exists(o.filename):
             fs.remove(o.filename)
         o.delete()
@@ -43,14 +43,14 @@ def patch_fs(fs, namespace, url_method):
     ''' Patch a filesystem object to add get_url method and
     expire method.
     '''
-    def expire(self, filename, seconds, days=0, expire = True):
+    def expire(self, filename, seconds, days=0, expires = True):
         ''' Set the lifespan of a file on the filesystem.
         filename: Name of file
         expire: False means the file will never be removed
         seconds and days give time to expiration.
         '''
-        FSExpirations.create_expiration(cls, namespace, filename, expires, seconds, days=days)
+        FSExpirations.create_expiration(namespace, filename, seconds, days=days, expires = expires)
     fs.expire = types.MethodType(expire, fs)
     fs.get_url = types.MethodType(url_method, fs)
......
@@ -32,7 +32,7 @@ def clear_database(db):
     return "Database clear"
 @event_handler()
-def event(db, events):
+def event_count_event(db, events):
     for evt in events:
         if 'user' in evt:
             collection = db['user_event_count']
@@ -51,7 +51,42 @@ def event(db, events):
     return 0
 @event_handler()
-def event(fs, events):
+def python_fs_forgets(fs, events):
+    ''' Test case for checking whether the file system properly forgets.
+    To write a file:
+    { 'fs_forgets_contents' : "hello world!",
+      'filename' : "foo.txt"}
+    To set or change expiry on a file:
+    { 'fs_forgets_expiry' : -5,
+      'filename' : "foo.txt"}
+    The two may be combined into one operation.
+    '''
+    def checkfile(filename, contents):
+        if not fs.exists(filename):
+            return False
+        if fs.open(filename).read() == contents:
+            return True
+        raise Exception("File contents do not match")
+    for evt in events:
+        if 'fs_forgets_contents' in evt:
+            f=fs.open(evt['filename'], 'w')
+            f.write(evt['fs_forgets_contents'])
+            f.close()
+        if 'fs_forgets_expiry' in evt:
+            try:
+                fs.expire(evt['filename'], evt['fs_forgets_expiry'])
+            except:
+                print "Failed"
+                import traceback
+                traceback.print_exc()
+    return 0
+@event_handler()
+def python_fs_event(fs, events):
     for evt in events:
         if 'event' in evt and evt['event'] == 'pyfstest':
             if 'create' in evt:
......
@@ -60,7 +60,6 @@ class SimpleTest(TestCase):
         response = c.get('/event?msg=%7B%22user%22:%22alice%22%7D')
         response = c.get('/query/user/user_event_count?user=alice')
         self.assertEqual(response.content, "2")
-        print response
     def test_osfs_works(self):
         ''' Make sure there is no file. Create a file. Read it. Erase it. Confirm it is gone.
@@ -72,3 +71,25 @@ class SimpleTest(TestCase):
         response = c.get('/query/filename/readfile?filename=foo.txt')
         self.assertEqual(self.send_event(c, {'event':'pyfstest', 'delete' : 'foo.txt'}).content, "Success")
         response = c.get('/query/filename/readfile?filename=foo.txt')
+    def test_osfs_forgets(self):
+        c = Client()
+        def verify(d):
+            for key in d:
+                r = json.loads(c.get('/query/filename/readfile?filename='+key).content)
+                if d[key]:
+                    self.assertEqual(r, "hello world!")
+                else:
+                    self.assertEqual(r, "File not found")
+        self.send_event(c, { 'fs_forgets_contents' : "hello world!", 'filename' : "foo1.txt", 'fs_forgets_expiry' : -5})
+        self.send_event(c, { 'fs_forgets_contents' : "hello world!", 'filename' : "foo2.txt", 'fs_forgets_expiry' : -5})
+        self.send_event(c, { 'fs_forgets_contents' : "hello world!", 'filename' : "foo3.txt", 'fs_forgets_expiry' : 15})
+        self.send_event(c, { 'fs_forgets_contents' : "hello world!", 'filename' : "foo4.txt", 'fs_forgets_expiry' : 15})
+        verify({"foo1.txt":True, "foo2.txt":True, "foo3.txt":True, "foo4.txt":True})
+        from modulefs.modulefs import expire_objects
+        expire_objects()
+        verify({"foo1.txt":False, "foo2.txt":False, "foo3.txt":True, "foo4.txt":True})
+        self.send_event(c, { 'filename' : "foo3.txt", 'fs_forgets_expiry' : -15})
+        self.send_event(c, { 'filename' : "foo4.txt", 'fs_forgets_expiry' : -15})
+        expire_objects()
+        verify({"foo1.txt":False, "foo2.txt":False, "foo3.txt":False, "foo4.txt":False})