edx / edx-platform: commit 8d827332

AN-167 New chapter on role & skills of data czar and research team

Authored by Alison Hodges on Mar 24, 2014; committed by Mark Hoeber on Mar 28, 2014.
Parent commit: 5471bab8

3 changed files with 170 additions and 0 deletions:

* docs/en_us/data/source/index.rst (+1, -0)
* docs/en_us/data/source/internal_data_formats/change_log.rst (+2, -0)
* docs/en_us/data/source/internal_data_formats/data_czar.rst (+167, -0)
docs/en_us/data/source/index.rst

@@ -51,6 +51,7 @@ These documents describe how we store course structure, student state/progress,
    :maxdepth: 2
 
    internal_data_formats/change_log.rst
+   internal_data_formats/data_czar.rst
    internal_data_formats/sql_schema.rst
    internal_data_formats/discussion_data.rst
    internal_data_formats/wiki_data.rst
docs/en_us/data/source/internal_data_formats/change_log.rst

@@ -10,6 +10,8 @@ Change Log
    * - Date
      - Change
+   * - 28 Mar 2014
+     - Added the :ref:`Data_Czar` chapter.
    * - 24 Mar 2014
      - Added the ``user_api_usercoursetag`` table to the :ref:`Student_Info` chapter and the ``assigned_user_to_partition`` and ``child_render`` event types to the :ref:`Tracking Logs` chapter.
    * - 19 Mar 2014
docs/en_us/data/source/internal_data_formats/data_czar.rst
New file (mode 100644)

.. _Data_Czar:

####################################################
Data Czar/Data Team Selection and Responsibilities
####################################################

A data czar is the single representative at a partner institution who has the
credentials to download and decrypt edX data packages. The data czar is
responsible for transferring data securely to researchers and other interested
parties after it is received. Due to the sensitivity of this data, the
responsibility for these activities is restricted to one individual. At each
partner institution, the data czar is the primary point of contact for
information about edX data.

* :ref:`Skills_Experience_Data_Czar`
* :ref:`Getting_Credentials_Data_Czar`
* :ref:`Resources_Information`

At some institutions, only the data czar works on research projects that use
the course data in edX data packages. At other institutions, the data czar
works with a team of additional contributors, or is responsible only for
making a secure transfer of the data to the research team. Typically, the data
team includes members in the following roles (or a data czar with these skill
sets):

* Database administrators work with the SQL and NoSQL data files and write
  queries on the data.
* Statisticians and data analysts mine the data.
* Educational researchers pose questions and interpret the results of queries
  on the data.

See :ref:`Skills_Experience_Contributors`.

All of the individuals who are permitted to access the data should be trained
in, and comply with, their institution's secure data handling protocols.

.. _Skills_Experience_Data_Czar:

**************************************
Skills and Experience of Data Czars
**************************************

The individuals who are selected by a partner institution to be edX data czars
typically have experience working with sensitive student data, are familiar
with encryption, decryption, and file transfer protocols, and can validate,
copy, move, and store large files. The data czar is responsible for ensuring
compliance with the institution's and country's regulations with respect to
the sharing of this data.

=====================
General Skills
=====================

- Ability to set up and manage data access.
- Knowledge of general data privacy and security best practices.
- Experience managing sensitive student data.
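
On a shared server, one concrete part of setting up data access is making sure
that decrypted files are readable only by the account cleared to use them. The
following Python sketch assumes a POSIX file system and a hypothetical
directory holding one decrypted package; it restricts the whole tree to its
owning account and offers a check that no group or other bits remain.

```python
from __future__ import annotations

import os
import stat
from pathlib import Path


def restrict_to_owner(directory: str) -> list[str]:
    """Give only the owning account access to a tree of decrypted files.

    Directories get mode 0o700 (owner may list and enter them) and files
    get mode 0o600 (owner may read and write them). Returns every path
    below `directory` whose mode was set. Assumes a POSIX file system.
    """
    root = Path(directory)
    root.chmod(0o700)
    changed = []
    # Materialize the walk first so chmod calls do not interleave with it.
    for path in sorted(root.rglob("*")):
        path.chmod(0o700 if path.is_dir() else 0o600)
        changed.append(str(path))
    return changed


def is_private(path: str) -> bool:
    """True if no group or other permission bits are set on `path`."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```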

=====================
Technical Skills
=====================

- Familiarity with PGP and GPG encryption and decryption.
- Ability to download large files from Amazon Web Services (AWS) Simple
  Storage Service (S3).
- Experience working with archive files in TAR, GZ, and ZIP formats.
- Familiarity with SQL and NoSQL (MongoDB) databases.
- Familiarity with CSV and JSON file formats.
- Experience copying, moving, and storing large files in bulk.
- Ability to validate the data and files received and distributed.
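
Validating large files usually means comparing checksums computed before and
after a transfer, and inspecting archives before extracting them. The following
Python sketch assumes that a SHA-256 checksum is recorded for each file (for
example, in a hypothetical manifest that travels with the data); this chapter
does not say whether edX publishes checksums for its packages. Reading in 1 MiB
chunks keeps memory use flat regardless of file size.

```python
from __future__ import annotations

import hashlib
import tarfile
import zipfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so that multi-gigabyte files
    never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a checksum recorded before transfer."""
    return sha256_of(path) == expected_hex.strip().lower()


def list_members(path: str) -> list[str]:
    """List an archive's contents without extracting anything.

    zipfile handles .zip; tarfile handles plain and compressed TAR
    (.tar, .tar.gz, .tgz). A single .gz file that is not a TAR archive
    is not handled here; open it with gzip.open instead.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        with tarfile.open(path, mode="r:*") as archive:
            return archive.getnames()
    raise ValueError(f"unrecognized archive format: {path}")
```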

.. _Getting_Credentials_Data_Czar:

**************************************
Getting Credentials for Data Czars
**************************************

The designated data czar at each institution works with an edX Program Manager
to set up a public/private key pair for GNU Privacy Guard (GnuPG).

* The edX Analytics team creates an account on Amazon Web Services (AWS)
  Simple Storage Service (S3) and provides the Program Manager with the
  public key for account access.
* When a data package is available, the data czar downloads it from S3 and
  decrypts it using the private key.

For detailed information on this procedure, see the `How Do I get my Research
Data Package?`_ article on the Open edX Analytics wiki.

.. _How Do I get my Research Data Package?: https://edx-wiki.atlassian.net/wiki/pages/viewpage.action?pageId=36044863
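
The download-and-decrypt step can be scripted. The following Python sketch
assumes that the AWS command line interface (``aws``) and GnuPG (``gpg``) are
installed and configured with the credentials and private key described above,
that the bucket and key names (placeholders here) come from edX, and that
encrypted files end in ``.gpg``. It builds each command as a list first so the
commands can be inspected before anything runs; ``gpg`` prompts for the private
key's passphrase when it decrypts.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def s3_download_command(bucket: str, key: str, local_path: str) -> list[str]:
    """AWS CLI command that copies one S3 object to `local_path`.

    `bucket` and `key` are placeholders for the location edX provides.
    """
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", local_path]


def gpg_decrypt_command(encrypted_path: str) -> list[str]:
    """GnuPG command that decrypts a file with the private key in the
    local keyring. The output name drops a trailing .gpg suffix, or
    appends .decrypted if there is none, so the input is never the output.
    """
    source = Path(encrypted_path)
    if source.suffix == ".gpg":
        output = source.with_suffix("")
    else:
        output = source.with_name(source.name + ".decrypted")
    return ["gpg", "--output", str(output), "--decrypt", str(source)]


def fetch_and_decrypt(bucket: str, key: str, dest_dir: str) -> None:
    """Download one package and decrypt it. check=True raises
    CalledProcessError as soon as either command fails."""
    local_path = str(Path(dest_dir) / Path(key).name)
    subprocess.run(s3_download_command(bucket, key, local_path), check=True)
    subprocess.run(gpg_decrypt_command(local_path), check=True)
```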

.. _Resources_Information:

**************************************
Resources and Information
**************************************

The edX Analytics team adds every data czar to a Google Group and mailing
list_ called course-data.

.. _list: http://groups.google.com/a/edx.org/forum/#!forum/course-data

EdX also hosts an **Open edX Analytics** wiki_ that is available to the
public. The wiki provides links to the engineering roadmap, information about
operational issues, and release notes describing past releases.

.. _wiki: http://edx-wiki.atlassian.net/wiki/display/OA/Open+edX+Analytics+Home

.. _Skills_Experience_Contributors:

*************************************************
Skills and Experience of Other Contributors
*************************************************

In addition to the data czar, a partner institution typically assembles a team
of contributors to its research projects. This team can include database
administrators, software engineers, data specialists, and educational
researchers. The team can be large or small, but collectively its members need
to be able to work with SQL and NoSQL databases, write queries, and convert
the data from raw formats into standard research packages, such as CSV files,
spreadsheets, or other desired formats.
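
As an illustration of querying the data and converting the result into a
research-friendly format, the following Python sketch loads a tab-separated
export into an in-memory SQLite database, aggregates it with SQL, and writes
the result as CSV. The table name ``enrollment`` and its columns (``user_id``,
``course_id``, ``is_active``) are hypothetical stand-ins, not the actual schema
of an edX data package.

```python
import csv
import io
import sqlite3


def load_tsv(conn: sqlite3.Connection, table: str, tsv_text: str) -> None:
    """Create `table` from tab-separated text whose first row is a header.

    Every column is stored as TEXT. The table and column names are
    interpolated into SQL, so they must come from a trusted source.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t",
                           quoting=csv.QUOTE_NONE))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


def active_enrollments_csv(conn: sqlite3.Connection) -> str:
    """Count active enrollments per course and return the result as CSV."""
    cursor = conn.execute(
        """
        SELECT course_id, COUNT(*) AS active_enrollments
        FROM enrollment
        WHERE is_active = '1'
        GROUP BY course_id
        ORDER BY course_id
        """
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()
```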

=====================
General Skills
=====================

- Attention to detail.
- Experience setting up and testing a data conversion pipeline.
- Ability to identify interesting features in a complex and rich data set.
- Familiarity with anonymization and obfuscation techniques.
- Familiarity with data privacy and security best practices.
- Experience managing sensitive student data.
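
One common obfuscation technique is pseudonymization with a keyed hash: each
username or user ID is replaced with a stable token, so records can still be
joined across files, but the mapping cannot be rebuilt without the secret key.
The following Python sketch uses HMAC-SHA256 from the standard library; the
16-character truncation is an arbitrary choice for readability. Pseudonymized
data is not fully anonymous, because free-text fields and rare combinations of
attributes can still identify people, so this is one step of de-identification
rather than the whole of it.

```python
import hashlib
import hmac
import secrets
from typing import Callable


def make_pseudonymizer(key: bytes) -> Callable[[str], str]:
    """Return a function that maps an identifier to a stable pseudonym.

    HMAC-SHA256 with a secret key: the same identifier always yields the
    same pseudonym, so joins across files still work, but without the key
    nobody can recompute the mapping from a list of candidate usernames.
    """
    def pseudonym(identifier: str) -> str:
        mac = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
        return mac.hexdigest()[:16]  # first 64 bits, for readability
    return pseudonym


# Generate the key once per study and guard it like the raw data itself.
study_key = secrets.token_bytes(32)
pseudonymize = make_pseudonymizer(study_key)
```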

=====================
Technical Skills
=====================

- Familiarity with the CSV, JSON, XML, and HTML formats, with MongoDB, and with
  Unicode.
- Ability to set up, query, and administer both SQL and NoSQL databases.
- Experience with console/bash scripts.
- Basic or advanced scripting (for example, using Python or Ruby) to convert,
  join, and aggregate data from different data sources.
- Experience with data mining and data aggregation across a rich, varied data
  set.
- Ability to write parsing scripts that properly handle JSON serialization and
  Unicode.
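
Parsing JSON robustly at scale mostly comes down to decoding bytes explicitly
as UTF-8 and tolerating the occasional bad record. The following Python sketch
counts records by type in newline-delimited JSON; it assumes one JSON object
per line with an ``event_type`` field, in the style of edX tracking log events,
and that assumption should be checked against the actual files before use.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[bytes]) -> tuple[Counter, int]:
    """Count records per event_type in newline-delimited JSON.

    Returns (counts, malformed). Blank lines are skipped; lines that are
    not valid UTF-8, not valid JSON, or not a JSON object are counted as
    malformed instead of aborting the run. The field name event_type is
    an assumption about the input.
    """
    counts: Counter = Counter()
    malformed = 0
    for raw in lines:
        if not raw.strip():
            continue
        try:
            # Decode explicitly: never rely on the platform default encoding.
            record = json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            malformed += 1
            continue
        if not isinstance(record, dict):
            malformed += 1
            continue
        counts[record.get("event_type", "<missing>")] += 1
    return counts, malformed
```

For a gzip-compressed log, pass the file object returned by
``gzip.open(path, "rb")``, which yields one line of bytes at a time.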