Merge pull request #3038 from edx/ahodges/documentation/dataczar
AN-167 New chapter on role & skills of data czar and research team
@@ -51,6 +51,7 @@ These documents describe how we store course structure, student state/progress,
   :maxdepth: 2

   internal_data_formats/change_log.rst
   internal_data_formats/data_czar.rst
   internal_data_formats/sql_schema.rst
   internal_data_formats/discussion_data.rst
   internal_data_formats/wiki_data.rst

@@ -10,6 +10,8 @@ Change Log

   * - Date
     - Change
   * - 28 Mar 2014
     - Added the :ref:`Data_Czar` chapter.
   * - 24 Mar 2014
     - Added the ``user_api_usercoursetag`` table to the :ref:`Student_Info` chapter and the ``assigned_user_to_partition`` and ``child_render`` event types to the :ref:`Tracking Logs` chapter.
   * - 19 Mar 2014

167
docs/en_us/data/source/internal_data_formats/data_czar.rst
Normal file
@@ -0,0 +1,167 @@

.. _Data_Czar:

####################################################
Data Czar/Data Team Selection and Responsibilities
####################################################

A data czar is the single representative at a partner institution who has the
credentials to download and decrypt edX data packages. After receiving a data
package, the data czar is responsible for transferring the data securely to
researchers and other interested parties. Because this data is sensitive,
responsibility for these activities is restricted to one individual. At each
partner institution, the data czar is the primary point of contact for
information about edX data.

* :ref:`Skills_Experience_Data_Czar`

* :ref:`Getting_Credentials_Data_Czar`

* :ref:`Resources_Information`

At some institutions, only the data czar works on research projects that use
the course data in edX data packages. At other institutions, the data czar
works with a team of additional contributors, or is responsible only for
making a secure transfer of the data to the research team. Typically, the data
team includes members in the following roles (or a data czar with these skill
sets):

* Database administrators work with the SQL and NoSQL data files and write
  queries on the data.

* Statisticians and data analysts mine the data.

* Educational researchers pose questions and interpret the results of queries
  on the data.

See :ref:`Skills_Experience_Contributors`.

All of the individuals who are permitted to access the data should be trained
in, and comply with, their institution's secure data handling protocols.

.. _Skills_Experience_Data_Czar:

**************************************
Skills and Experience of Data Czars
**************************************

The individuals who are selected by a partner institution to be edX data czars
typically have experience working with sensitive student data, are familiar
with encryption/decryption and file transfer protocols, and can validate,
copy, move, and store large files. The data czar is responsible for ensuring
compliance with the institution's and country's regulations on sharing this
data.

=====================
General Skills
=====================

- Ability to set up and manage data access.

- Knowledge of general data privacy and security best practices.

- Experience managing sensitive student data.

=====================
Technical Skills
=====================

- Familiarity with PGP and GPG encryption and decryption.

- Ability to download large files from Amazon Web Services (AWS) Simple
  Storage Service (S3).

- Experience working with archive files in TAR, GZ, and ZIP formats.

- Familiarity with SQL and NoSQL (MongoDB) databases.

- Familiarity with CSV and JSON file formats.

- Experience copying, moving, and storing large files in bulk.

- Ability to validate the data and files received and distributed.
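
As one illustration of validating large files, the following Python sketch
computes a SHA-256 digest in fixed-size chunks, so that a multi-gigabyte
package never has to fit in memory. The function name ``sha256_of_file`` and
the chunk size are illustrative choices. The sketch also assumes that you have
a trusted digest to compare against; this chapter does not state that edX
publishes checksums for its data packages.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at ``path``.

    The file is read in ``chunk_size``-byte pieces so that large
    packages are never loaded into memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest of a file before and after it is copied or transferred is
one way to confirm that the copy is byte-for-byte identical.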

.. _Getting_Credentials_Data_Czar:

**************************************
Getting Credentials for Data Czars
**************************************

The designated data czar at each institution works with an edX Program Manager
to set up a public/private key pair for GNU Privacy Guard (GnuPG).

* The edX Analytics team creates an account on the Amazon Web Services (AWS)
  Simple Storage Service (S3) and provides the Program Manager with the
  public key for account access.

* When a data package is available, the data czar downloads it from S3 and
  decrypts it using the private key.

For detailed information on this procedure, see the *How Do I get my Research
Data Package?* article on the Open edX Analytics wiki_.

.. _wiki: https://edx-wiki.atlassian.net/wiki/pages/viewpage.action?pageId=36044863
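
The decryption step can be scripted. The following is a minimal Python sketch,
not the documented edX procedure: it assumes that the ``gpg`` command-line tool
is installed and that the data czar's private key is already in the local
keyring. The function names are hypothetical. The download step is omitted
because bucket names and AWS credentials are specific to each institution.

```python
import subprocess


def build_decrypt_command(encrypted_path, output_path):
    """Build the gpg argument list that decrypts one file.

    ``--output`` and ``--decrypt`` are standard gpg options; gpg selects
    the matching private key from the local keyring.
    """
    return ["gpg", "--output", output_path, "--decrypt", encrypted_path]


def decrypt_package(encrypted_path, output_path):
    """Run gpg and raise CalledProcessError if decryption fails."""
    subprocess.run(build_decrypt_command(encrypted_path, output_path), check=True)
```

Passing the arguments as a list, rather than as a single shell string, avoids
quoting problems when file names contain spaces.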

.. _Resources_Information:

**************************************
Resources and Information
**************************************

The edX Analytics team adds every data czar to a Google Group and mailing
list_ called course-data.

.. _list: http://groups.google.com/a/edx.org/forum/#!forum/course-data

EdX also hosts an **Open edX Analytics**
`wiki <http://edx-wiki.atlassian.net/wiki/display/OA/Open+edX+Analytics+Home>`__
that is available to the public. The wiki provides links to the engineering
roadmap, information about operational issues, and release notes describing
past releases.

.. _Skills_Experience_Contributors:

*************************************************
Skills and Experience of Other Contributors
*************************************************

In addition to the data czar, each partner institution assembles a team of
contributors to its research projects. This team can include database
administrators, software engineers, data specialists, and educational
researchers. The team can be large or small, but collectively its members need
to be able to work with SQL and NoSQL databases, write queries, and convert
the data from raw formats into standard research packages, such as CSV files,
spreadsheets, or other desired formats.

=====================
General Skills
=====================

- Attention to detail.

- Experience setting up and testing a data conversion pipeline.

- Ability to identify interesting features in a complex and rich data set.

- Familiarity with anonymization and obfuscation techniques.

- Familiarity with data privacy and security best practices.

- Experience managing sensitive student data.
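
As a small illustration of an obfuscation technique, the following Python
sketch replaces a user ID with a keyed hash (HMAC-SHA256). This is one common
approach and not a method that edX prescribes. The function name and the key
are hypothetical. A keyed hash on its own does not fully anonymize a data set,
because other fields can still identify a student.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Map a user ID to a stable, keyed pseudonym (hex HMAC-SHA256).

    The same ID and key always give the same pseudonym, so records can
    still be joined across files without exposing the original ID.
    """
    message = str(user_id).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

The key must be stored separately from the shared data. Anyone who holds the
key can recompute the pseudonym for a known ID.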

=====================
Technical Skills
=====================

- Familiarity with CSV, MongoDB, JSON, Unicode, XML, and HTML.

- Ability to set up, query, and administer both SQL and NoSQL databases.

- Experience with console/bash scripts.

- Basic or advanced scripting (for example, using Python or Ruby) to convert,
  join, and aggregate data from different data sources, and to handle JSON
  serialization and Unicode-specific issues.

- Experience with data mining and data aggregation across a rich, varied data
  set.

- Ability to write parsing scripts that properly handle JSON serialization and
  Unicode.
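
A parsing script of the kind described above might look like the following
Python sketch. It assumes that the input holds one JSON object per line, which
this chapter does not specify. The function name and the field names in the
usage note are illustrative only.

```python
import csv
import json


def json_lines_to_csv(lines, fieldnames, out):
    """Write one CSV row per JSON object found in ``lines``.

    Lines that are blank, are not valid JSON, or do not hold a JSON
    object are skipped. Returns the number of skipped non-blank lines.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            skipped += 1
            continue
        if not isinstance(record, dict):
            skipped += 1
            continue
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return skipped
```

When working with real files, open the input with ``encoding="utf-8"`` and the
output with ``encoding="utf-8", newline=""`` so that non-ASCII names survive
the conversion and the ``csv`` module controls the line endings. For example,
``json_lines_to_csv(source, ["username", "event_type"], target)`` writes only
those two columns.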