diff --git a/docs/en_us/data/source/Images/AWS_Credentials.png b/docs/en_us/data/source/Images/AWS_Credentials.png new file mode 100644 index 0000000000..31fa3df1fd Binary files /dev/null and b/docs/en_us/data/source/Images/AWS_Credentials.png differ diff --git a/docs/en_us/data/source/Images/Data_Czar_Initialization.png b/docs/en_us/data/source/Images/Data_Czar_Initialization.png new file mode 100644 index 0000000000..d5ace3d687 Binary files /dev/null and b/docs/en_us/data/source/Images/Data_Czar_Initialization.png differ diff --git a/docs/en_us/data/source/internal_data_formats/credentials.rst b/docs/en_us/data/source/internal_data_formats/credentials.rst index 43d16c2271..1609594303 100644 --- a/docs/en_us/data/source/internal_data_formats/credentials.rst +++ b/docs/en_us/data/source/internal_data_formats/credentials.rst @@ -1,87 +1,89 @@ .. _Getting_Credentials_Data_Czar: #################################################### -Data Transfers +Keys and Credentials for Data Transfers #################################################### EdX transfers course data to the data czars at our partner institutions in regularly generated data packages. Data packages are delivered to a single contact at each university, referred to as the "data czar". -When a data czar is selected at each institution, he or she works with an edX -Program Manager to set up credentials for securely transferring course data -from edX to the partner institution. When this initial step is complete, the -data czar can download data packages when they are available. +The data czar who is selected at each institution sets up encryption "keys" +for securely transferring files from edX to the partner institution. Meanwhile, +the Analytics team at edX sets up credentials so that the data czar can log in +to the site where data packages are stored. + + .. 
image:: ../Images/Data_Czar_Initialization.png + :alt: Flowchart of data czar creating public and private keys and sending the + public key to edX, and of edX creating data storage credentials and + encrypting those credentials with the public key before sending them to + the data czar + +After these steps for setting up credentials are complete, the data czar can +download data packages. **************************************************************** -Credentials Created By Data Czars for Encryption and Decryption +Keys Created by Data Czars for Encryption and Decryption **************************************************************** -To assure the security of data packages, edX encrypts course data before -transferring it to a partner institution. As a result, when you receive a data -package you must decrypt the data before it can be added to a database, -queried, or used in any other way. +To assure the security of data packages, the edX Analytics team encrypts all +files before transferring them to a partner institution. As a result, when you +receive a data package (or any other file from the edX Analytics team) you must +decrypt the data before it can be used in any way. -To set up the credentials needed for this encryption and decryption process, -edX has selected GNU Privacy Guard (GnuPG). Each data czar sets up the -credentials that will be used for their institution's data packages. This -entails the creation of a public/private key pair. Essentially, you install an -application on your local computer and then supply your email address and a -secret passphrase (a password). The application uses this information to create -a private key for *decrypting* your data packages and also the unique public -key that you send to edX to use in *encrypting* your data packages. +To create the keys needed for this encryption and decryption process, you use +GNU Privacy Guard (GnuPG or GPG). 
Essentially, you install a cryptographic +application on your local computer and supply your email address and a secret +passphrase (a password). The application uses this information to create both a +private key for you to use for *decrypting* files from edX and also the unique +public key that you send to edX to use in *encrypting* your data packages and +files. Each data czar creates his or her own private and public key pair to use +with edX files. -.. note:: The domain in the email address you supply when you create your credentials must be for your official email address for your edX partner institution. +.. note:: The email address that you supply when you create your keys must be your official email address at your edX partner institution. -Creating these credentials is a one-time process that you coordinate with your -edX Program Manager. Instructions for creating the credentials on Windows and -Macintosh follow. +Creating these keys is a one-time process that you coordinate with your edX +Program Manager. Instructions for creating the keys on Windows or Macintosh +follow. -For more information about the key pairs, see `chapter 3 in the Gpg4win -Compendium`_. +For more information about GPG encryption and creating key pairs, see the +`Gpg4win Compendium`_. -.. _chapter 3 in the Gpg4win Compendium: http://www.gpg4win.org/doc/en/gpg4win-compendium_8.html +.. _Gpg4win Compendium: http://www.gpg4win.org/doc/en/gpg4win-compendium.html -Create Credentials: Windows +Create Keys: Windows ----------------------------------------- #. Go to the Gpg4win_ website and download the most recent version of Gpg4win. - For detailed installation instructions, see `chapter 6 in the Gpg4win Compendium`_. - -2. Install Gpg4win and then open the Kleopatra Gpg4win application. A wizard +#. Install Gpg4win and then open the Kleopatra Gpg4win application. 
A wizard presents a series of dialog boxes to collect information from you and generate your public key (called a certificate in Kleopatra). - * When you are prompted to specify the type of key pair you want, click - **Create personal OpenPGP key pair**. + a. When you are prompted to specify the type of key pair you want, click + **Create personal OpenPGP key pair**. - * When you are prompted for your email address, be sure to enter your - official university or institution email address. EdX cannot use public - keys that are based on personal or other non-official email addresses to - encrypt data. + b. When you are prompted for your email address, be sure to enter your + official university or institution email address. EdX cannot use public + keys that are based on personal or other non-official email addresses to + encrypt data. - * When you are prompted for a passphrase, enter a strong passphrase. For - information about passphrases, see `chapter 4 in the Gpg4win - Compendium`_. Be sure to remember your secret passphrase: you will use it - to decrypt your data packages. + c. When you are prompted for a passphrase, enter a strong passphrase. Be + sure to remember your passphrase: you will use it to decrypt your data + packages. - For detailed instructions, see `chapter 7 of the Gpg4win Compendium`_. - -3. When Kleopatra presents the "Key Pair Successfully Created" dialog box, +#. When Kleopatra presents the Key Pair Successfully Created dialog box, click **Send Certificate by EMail** to send the public key (and only the public key) to your edX Program Manager. -#. You can also click **Make a Backup Copy of Your Key Pair** to store both of - the keys on a removable data storage device. Do not reveal your secret - passphrase, or share your private key, with anyone else. +#. Optionally, click **Make a Backup Copy of Your Key Pair** to store both of + the keys on a removable data storage device. + +.. 
important:: Do not reveal your passphrase, or share your private key, with anyone else. .. _Gpg4win: http://gpg4win.org/ -.. _chapter 4 in the Gpg4win Compendium: http://www.gpg4win.org/doc/en/gpg4win-compendium_9.html -.. _chapter 6 in the Gpg4win Compendium: http://www.gpg4win.org/doc/en/gpg4win-compendium_11.html -.. _chapter 7 in the Gpg4win Compendium: http://www.gpg4win.org/doc/en/gpg4win-compendium_12.html -Create Credentials: Macintosh +Create Keys: Macintosh -------------------------------------------- #. Go to the `GPG Tools`_ website. Scroll down to the **GPG Suite** section of @@ -90,7 +92,7 @@ Create Credentials: Macintosh #. When the download is complete, click the .dmg file to begin the installation. -#. When installation is complete, GPG Keychain Access opens both a web page of +#. When installation is complete, GPG Keychain Access opens a web page with `First Steps`_ and a dialog box. #. Enter your name and email address. Be sure to enter your official university @@ -99,59 +101,109 @@ Create Credentials: Macintosh #. Click **Generate key**. A dialog box opens to prompt you for a passphrase. -#. Enter a strong passphrase. For information about passphrases, see `chapter 4 - in the Gpg4win Compendium`_. Be sure to remember your secret passphrase: you - will use it to decrypt your data packages. +#. Enter a strong passphrase. Be sure to remember your passphrase: you will use + it to decrypt your data packages. #. To send only your public key to your edX Program Manager, click the key and then click **Export**. A dialog box opens. -#. Specify a name and a place to save the file. Accept the default value in the - **Format** of ASCII and leave **Allow secret key export** cleared. + a. Specify a file name and location to save the file. + + b. Make sure that **Format** is ASCII. + + c. Make sure that **Allow secret key export** is cleared. -#. Compose a new e-mail message to your edX program manager. 
Attach the .asc - file that you saved in the previous step to the message, and then send the +#. Compose an email message to your edX Program Manager. Attach the .asc + file that you saved in the previous step to the message, and then send the message. .. _GPG Tools: https://gpgtools.org/ .. _First Steps: http://support.gpgtools.org/kb/how-to/first-steps-where-do-i-start-where-do-i-begin#setupkey +**************************************************************** +Credentials Created by edX for Accessing Data Storage +**************************************************************** +The data packages that edX prepares for each partner organization are uploaded +to the Amazon Web Services (AWS) Simple Storage Service (S3). The edX Analytics +team creates an individual account to access this storage service for each data +czar. The credentials for accessing this account are called an Access Key +and a Secret Key. +After the edX Analytics team creates these access credentials for you, they are +encrypted (using the public encryption key that you sent your Program Manager) +into a **credentials.csv.gpg** file. This file is then sent to you, securely, +as an email attachment. +The **credentials.csv.gpg** file is likely to be the first file that you +decrypt with your private GPG key. You use the same process to decrypt the data +package files that you retrieve from Amazon S3. +.. image +.. _Decrypt an Encrypted File: +Decrypt an Encrypted File +-------------------------- +To work with an encrypted .gpg file, you use the same GNU Privacy Guard program +that you used to create your public/private key pair. You use your private key +to decrypt the Amazon S3 credentials file and the files in your data packages. +#. Save the encrypted file in an accessible location. -* The edX Analytics team creates an account on the Amazon Web Service (AWS) - Simple Storage Service (S3), and provides the Program Manager with the - public key for account access. +#. 
On a Windows computer, open Windows Explorer. On a Macintosh, open Finder. -* When a data package is available, the data czar downloads it from S3 and - decrypts it using the private key. +#. Navigate to the file and right-click on it. + +#. On a Windows computer, select **Decrypt and verify** and then click + **Decrypt/Verify**. On a Macintosh, select **Services** and then click + **OpenPGP: Decrypt File**. -.. xref to this chapter from the How Do I Get My Research Data Package? article on the Open edX Analytics wiki. +#. Enter your passphrase. The GNU Privacy Guard program decrypts the file. + +For example, when you decrypt the credentials.csv.gpg file the result is a +credentials.csv file. When you open the credentials.csv file it contains your +email address, your Access Key, and your Secret Key. + .. image:: ../Images/AWS_Credentials.png + :alt: A csv file, open in Notepad, with the access key value and the secret key value underlined +Access Amazon S3 and Download Data Packages +-------------------------------------------- +To connect to Amazon S3, you must have your decrypted credentials. You may want +to have a third-party tool that gives you a user interface for managing files +and transferring them from Amazon S3 to your network. Some data czars use +applications like CloudBerry Explorer for Amazon S3, Bucket Explorer, or S3 +Browser. Alternatively, you can use the `AWS Command Line Interface`_. +#. Select and install a third-party tool or interface to manage your S3 + account. -EdX stores the data packages in a secure bucket on the Amazon Web Services (AWS) Simple Storage Service (Amazon S3). Only the data czar is given access credentials (a user name and password) to the AWS S3 account. +#. Open your decrypted credentials.csv file. This file contains your AWS Access + Key and your AWS Secret Key. -To gain access to the AWS S3 account, the data czar must complete these steps: +#. Open the third-party tool. 
In most tools, you set up information about the + S3 account and then supply your Access Key and your Secret Key to connect to + that account. For more information, refer to the documentation for the tool + that you selected. - Create the encryption keys. - Receive an email message from edX. +#. Access Amazon S3 and navigate to the edX **course-data** bucket. For each + period that a data package is prepared for your organization, two files are + available. + Event tracking data is in a file named {date}-{organization}-tracking.tar. + Database data files are in a file named {organization}-{date}.zip. +#. Download the files. These files can become very large, sometimes several + gigabytes in size. +#. Extract the files from the compressed .tar and the .zip files. All of the + files that you extract are .gpg files. -Then, to retrieve the first (and each subsequent) data package, the data czar must complete these steps: +#. Use your private key to decrypt the .gpg files. See `Decrypt an Encrypted + File`_. - Access the AWS S3 account. - Download the data package. - Decrypt the data package. +.. _AWS Command Line Interface: http://aws.amazon.com/cli/ -Details for each of these tasks follow for the data czar at your institution. For more information about the responsibilities of a data czar, see the edX Data Documentation. \ No newline at end of file
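
The decryption and download steps that this change documents can also be scripted. The following is a minimal sketch, not part of the patch, that only builds the command lines involved and does not run them. It assumes that the ``gpg`` and ``aws`` command-line tools are installed and configured with your keys, that package files sit at the top level of the **course-data** bucket, and that the date string is copied from the real file names, because the patch does not specify its format. The function names and the sample organization and date are hypothetical.

```python
# Sketch only: builds command lines for the documented steps without running them.
# Assumptions (not confirmed by the patch): files live at the top level of the
# bucket, and the date string is whatever appears in the real file names.

BUCKET = "course-data"  # bucket name given in the documentation


def package_file_names(organization, date):
    """Return the two file names prepared for one data package period."""
    return [
        f"{date}-{organization}-tracking.tar",  # event tracking data
        f"{organization}-{date}.zip",           # database data files
    ]


def s3_download_command(file_name, destination="."):
    """Build an AWS CLI command that copies one file from the bucket."""
    return ["aws", "s3", "cp", f"s3://{BUCKET}/{file_name}", destination]


def gpg_decrypt_command(encrypted_path):
    """Build a gpg command that decrypts name.gpg into name."""
    suffix = ".gpg"
    if not encrypted_path.endswith(suffix):
        raise ValueError("expected a .gpg file: " + encrypted_path)
    decrypted_path = encrypted_path[: -len(suffix)]
    return ["gpg", "--output", decrypted_path, "--decrypt", encrypted_path]


if __name__ == "__main__":
    # Hypothetical organization and date, for illustration only.
    for name in package_file_names("exampleu", "2014-06-01"):
        print(" ".join(s3_download_command(name)))
    print(" ".join(gpg_decrypt_command("credentials.csv.gpg")))
```

Each function returns an argument list that could be passed to ``subprocess.run`` once the credentials are in place. ``gpg`` still prompts for the passphrase when it decrypts, so the private key and passphrase never appear in the script.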