Move user-retirement scripts docs near code (#34695)

* chore: Move user-retirement scripts docs near code

---------

Co-authored-by: Feanil Patel <feanil@axim.org>
Muhammad Farhan Khan
2024-05-07 23:45:04 +05:00
committed by GitHub
parent cf78c27730
commit d3d3225fa6
7 changed files with 558 additions and 3 deletions


@@ -58,6 +58,7 @@ extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.coverage',
'sphinx.ext.doctest',
'sphinx.ext.graphviz',
'sphinx.ext.ifconfig',
'sphinx.ext.intersphinx',
'sphinx.ext.mathjax',


@@ -1,7 +1,7 @@
User Retirement Scripts
=======================
`This <https://github.com/openedx/edx-platform/tree/master/scripts/user_retirement>`_ directory contains python scripts which are migrated from the `tubular <https://github.com/openedx/tubular/tree/master/scripts>`_ respository.
`This <https://github.com/openedx/edx-platform/tree/master/scripts/user_retirement>`_ directory contains python scripts which are migrated from the `tubular <https://github.com/openedx/tubular/tree/master/scripts>`_ respository.
These scripts are intended to drive the user retirement workflow which involves handling the deactivation or removal of user accounts as part of the platform's management process.
These scripts could be called from any automation/CD framework.
@@ -49,9 +49,9 @@ In-depth Documentation and Configuration Steps
For in-depth documentation and essential configurations follow these docs
`Documentation <https://edx.readthedocs.io/projects/edx-installing-configuring-and-running/en/latest/configuration/user_retire/index.html>`_
`Documentation <https://docs.openedx.org/projects/edx-platform/en/latest/references/docs/scripts/user_retirement/docs/index.html>`_
`Configuration Docs <https://edx.readthedocs.io/projects/edx-installing-configuring-and-running/en/latest/configuration/user_retire/driver_setup.html>`_
`Configuration Docs <https://docs.openedx.org/projects/edx-platform/en/latest/references/docs/scripts/user_retirement/docs/driver_setup.html>`_
Execute Script


@@ -0,0 +1,134 @@
.. _driver-setup:
#############################################
Setting Up the User Retirement Driver Scripts
#############################################
`scripts/user_retirement <https://github.com/openedx/edx-platform/tree/master/scripts/user_retirement>`_
is a directory of Python scripts designed to plug into various automation
tooling. It also contains a README file with details on how to run the scripts.
Included in this directory are two scripts intended to drive the user
retirement workflow.
``get_learners_to_retire.py``
Generates a list of users that are ready for immediate retirement. Users
are "ready" after a certain number of days spent in the ``PENDING`` state,
specified by the ``--cool_off_days`` argument. Produces an output intended
for consumption by Jenkins in order to spawn separate downstream builds for
each user.
``retire_one_learner.py``
Retires the user specified by the ``--username`` argument.
These two scripts share a required ``--config_file`` argument, which specifies
the driver configuration file for your environment (for example, production).
This configuration file is a YAML file that contains LMS auth secrets, API URLs,
and retirement pipeline stages specific to that environment. Here is an example
of a driver configuration file.
.. code-block:: yaml

   client_id: <client ID for the retirement service user>
   client_secret: <client secret for the retirement service user>

   base_urls:
       lms: https://courses.example.com/
       ecommerce: https://ecommerce.example.com/
       credentials: https://credentials.example.com/

   retirement_pipeline:
       - ['RETIRING_EMAIL_LISTS', 'EMAIL_LISTS_COMPLETE', 'LMS', 'retirement_retire_mailings']
       - ['RETIRING_ENROLLMENTS', 'ENROLLMENTS_COMPLETE', 'LMS', 'retirement_unenroll']
       - ['RETIRING_LMS_MISC', 'LMS_MISC_COMPLETE', 'LMS', 'retirement_lms_retire_misc']
       - ['RETIRING_LMS', 'LMS_COMPLETE', 'LMS', 'retirement_lms_retire']

The ``client_id`` and ``client_secret`` keys contain the OAuth client credentials.
These credentials are simply copied from the output of the
``create_dot_application`` management command described in
:ref:`retirement-service-user`.
The ``base_urls`` section in the configuration file defines the mappings of
IDA to base URLs used by the scripts to construct API URLs. Only the LMS is
mandatory here, but if any of your pipeline states contain API calls to other
services, those services must also be present in the ``base_urls`` section.
The ``retirement_pipeline`` section defines the steps, state names, and order
of execution for each environment. Each item is a list in the form of:
#. Start state name
#. End state name
#. IDA to call against (LMS, ECOMMERCE, or CREDENTIALS currently)
#. Method name to call in
`edx_api.py <https://github.com/openedx/edx-platform/blob/master/scripts/user_retirement/utils/edx_api.py>`_
For example: ``['RETIRING_CREDENTIALS', 'CREDENTIALS_COMPLETE', 'CREDENTIALS',
'retire_learner']`` will set the user's state to ``RETIRING_CREDENTIALS``, call
a pre-instantiated ``retire_learner`` method in the ``CredentialsApi``, then set
the user's state to ``CREDENTIALS_COMPLETE``.
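The rules above can be checked before a run. The following is a minimal
sketch, not part of the driver scripts: ``validate_driver_config`` and
``KNOWN_SERVICES`` are hypothetical names, and the configuration is assumed to
have already been parsed into a dictionary (for example with PyYAML's
``yaml.safe_load``). Matching ``base_urls`` keys to pipeline service names
case-insensitively is also an assumption.

.. code-block:: python

   # Hypothetical helper; the driver scripts perform their own validation.
   KNOWN_SERVICES = {"LMS", "ECOMMERCE", "CREDENTIALS"}


   def validate_driver_config(config):
       """Return a list of problems found in a parsed driver configuration."""
       problems = []
       for key in ("client_id", "client_secret", "base_urls", "retirement_pipeline"):
           if key not in config:
               problems.append(f"missing required key: {key}")

       # Assumption: base_urls keys map to service names case-insensitively.
       base_urls = {name.upper() for name in (config.get("base_urls") or {})}
       if "LMS" not in base_urls:
           problems.append("base_urls must include lms")

       for index, stage in enumerate(config.get("retirement_pipeline") or []):
           if len(stage) != 4:
               problems.append(f"stage {index} must have 4 items, got {len(stage)}")
               continue
           _start_state, _end_state, service, _method = stage
           if service not in KNOWN_SERVICES:
               problems.append(f"stage {index} names unknown service {service}")
           elif service not in base_urls:
               problems.append(
                   f"stage {index} calls {service}, which has no entry in base_urls"
               )
       return problems

An empty list means the configuration satisfies the structural rules described
in this section; it does not confirm that the credentials or URLs are valid.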
********
Examples
********
The following are some examples of how to use the driver scripts.
==================
Set Up Environment
==================
Follow this `readme <https://github.com/openedx/edx-platform/tree/master/scripts/user_retirement#readme>`_ to set up your execution environment.
=========================
List of Targeted Learners
=========================
Generate a list of learners that are ready for retirement (those learners who
have selected and confirmed account deletion and have been in the ``PENDING``
state for the number of days specified by ``--cool_off_days``).
.. code-block:: bash
mkdir learners_to_retire
get_learners_to_retire.py \
--config_file=path/to/config.yml \
--output_dir=learners_to_retire \
--cool_off_days=5
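The cool-off rule itself is simple date arithmetic. The sketch below only
illustrates that rule; the real script obtains the list from the LMS.
``is_ready_for_retirement`` is a hypothetical name, and treating the boundary
as inclusive is an assumption.

.. code-block:: python

   from datetime import datetime, timedelta, timezone


   def is_ready_for_retirement(requested_at, cool_off_days, now=None):
       """Return True once the learner has been PENDING for cool_off_days."""
       if now is None:
           now = datetime.now(timezone.utc)
       # Assumption: a learner becomes ready exactly at the end of the cool-off.
       return now - requested_at >= timedelta(days=cool_off_days)

With ``--cool_off_days=5``, a request made on 1 May would be picked up by a run
on 6 May or later under this reading.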
=====================
Run Retirement Script
=====================
After running these commands, the ``learners_to_retire`` directory contains
several INI files, each containing a single line in the form of
``USERNAME=<username-of-learner>``. Iterate over these files, executing the
``retire_one_learner.py`` script for each learner with a command like the following.
.. code-block:: bash
retire_one_learner.py \
--config_file=path/to/config.yml \
--username=<username-of-learner-to-retire>
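If you are not using Jenkins, the iteration can be scripted. The following is
a sketch under stated assumptions: ``read_username`` and
``build_retire_commands`` are hypothetical helpers, every regular file in the
output directory is assumed to be one of the generated INI files, and
``retire_one_learner.py`` is assumed to be on your ``PATH``.

.. code-block:: python

   from pathlib import Path


   def read_username(ini_path):
       """Return the value of the USERNAME line in one generated file."""
       for line in Path(ini_path).read_text().splitlines():
           key, separator, value = line.partition("=")
           if separator and key.strip() == "USERNAME":
               return value.strip()
       raise ValueError(f"no USERNAME line found in {ini_path}")


   def build_retire_commands(learners_dir, config_file):
       """Build one retire_one_learner.py command line per learner file."""
       commands = []
       for ini_path in sorted(Path(learners_dir).iterdir()):
           if not ini_path.is_file():
               continue
           username = read_username(ini_path)
           commands.append([
               "retire_one_learner.py",
               f"--config_file={config_file}",
               f"--username={username}",
           ])
       return commands

Each returned list can be passed to ``subprocess.run(command, check=True)``.
Running learners one at a time keeps a failure for one learner from hiding the
results for the others.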
**************************************************
Using the Driver Scripts in an Automated Framework
**************************************************
At edX, we call the user retirement scripts from
`Jenkins <https://jenkins.io/>`_ jobs on one of our internal Jenkins
services. The user retirement driver scripts are intended to be agnostic
about which automation framework you use, but they were only fully tested
from Jenkins.
For more information about how we execute these scripts at edX, see the
following wiki articles:
* `User Retirement Jenkins Implementation <https://openedx.atlassian.net/wiki/spaces/PLAT/pages/704872737/User+Retirement+Jenkins+Implementation>`_
* `How to: retirement Jenkins jobs development and testing <https://openedx.atlassian.net/wiki/spaces/PLAT/pages/698221444/How+to+retirement+Jenkins+jobs+development+and+testing>`_
And check out the Groovy DSL files we use to seed these jobs:
* `platform/jobs/RetirementJobs.groovy in edx/jenkins-job-dsl <https://github.com/edx/jenkins-job-dsl/blob/master/platform/jobs/RetirementJobs.groovy>`_
* `platform/jobs/RetirementJobEdxTriggers.groovy in edx/jenkins-job-dsl <https://github.com/edx/jenkins-job-dsl/blob/master/platform/jobs/RetirementJobEdxTriggers.groovy>`_
.. include:: ../../../../links/links.rst


@@ -0,0 +1,117 @@
.. _Implmentation:
#######################
Implementation Overview
#######################
In the Open edX platform, the user experience is enabled by several
services, such as LMS, Studio, ecommerce, credentials, discovery, and more.
Personally Identifiable Information (PII) about a user can exist in many of
these services. As a consequence, to remove a user's PII, you must be able
to request each service containing PII to remove, delete, or unlink the
data for that user in that service.
In the user retirement feature, a centralized process (the *driver* scripts)
orchestrates all of these requests. For information about how to configure the
driver scripts, see :ref:`driver-setup`.
****************************
The User Retirement Workflow
****************************
The user retirement workflow is a configurable pipeline of building-block
APIs. These APIs are used to:
* "Forget" a retired user's PII
* Prevent a retired user from logging back in
* Prevent re-use of the username or email address of a retired user
Depending on which third parties a given Open edX instance integrates with,
the user retirement process may need to call out to external services or to
generate reports for later processing. Any such reports must subsequently be
destroyed.
Configurability and adaptability were design goals from the beginning, so this
user retirement tooling should be able to accommodate a wide range of Open edX
sites and custom use cases.
The workflow is designed to be linear and rerunnable, allowing recovery and
continuation in cases where a particular stage fails. Each user who has
requested retirement will be individually processed through this workflow, so
multiple users could be in the same state simultaneously. The LMS is the
authoritative source of information about the state of each user in the
retirement process, and the arbiter of state progressions, using the
``UserRetirementStatus`` model and associated APIs. The LMS also holds a
table of the states themselves (the ``RetirementState`` model), rather than
hard-coding the states. This was done because we cannot predict all the
possible states required by all members of the Open edX community.
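The linear, rerunnable design can be pictured with a small sketch. This is an
illustration only, not the driver's actual logic: ``stages_remaining`` is a
hypothetical function, and each stage is assumed to be a four-item sequence of
start state, end state, service, and method name, as in the driver
configuration file.

.. code-block:: python

   def stages_remaining(current_state, pipeline):
       """Return the stages still to run for a user in current_state."""
       if current_state == "PENDING":
           return list(pipeline)
       for index, (_start, end_state, _service, _method) in enumerate(pipeline):
           if current_state == end_state:
               # Resume with the stage that follows the last completed one.
               return list(pipeline[index + 1:])
       raise ValueError(f"cannot resume from state {current_state!r}")

Because the LMS records each user's state, a run that stops partway can pick up
from the last completed stage instead of starting over.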
This example state diagram outlines the pathways users follow throughout the
workflow:
.. digraph:: retirement_states_example
:align: center
ranksep = "0.3";
node[fontname=Courier,fontsize=12,shape=box,group=main]
{ rank = same INIT[style=invis] PENDING }
INIT -> PENDING;
"..."[shape=none]
PENDING -> RETIRING_ENROLLMENTS -> ENROLLMENTS_COMPLETE -> RETIRING_FORUMS -> FORUMS_COMPLETE -> "..." -> COMPLETE;
node[group=""];
RETIRING_ENROLLMENTS -> ERRORED;
RETIRING_FORUMS -> ERRORED;
PENDING -> ABORTED;
subgraph cluster_terminal_states {
label = "Terminal States";
labelloc = b // put label at bottom
{rank = same ERRORED COMPLETE ABORTED}
}
Unless an error occurs internal to the user retirement tooling, a user's
retirement state should always land in one of the terminal states. At that
point, either their entry should be cleaned up from the
``UserRetirementStatus`` table or, if the state is ``ERRORED``, the
administrator needs to examine the error and resolve it. For more information,
see :ref:`recovering-from-errored`.
*******************
The User Experience
*******************
From the learner's perspective, most of this process is hidden.
The Account page contains a new section titled **Delete My Account**. In this
section, a learner may click the **Delete My Account** button and enter
their password to confirm their request. Subsequently, all of the learner's
browser sessions are logged off, and they become locked out of their account.
An informational email is immediately sent to the learner to confirm the
deletion of their account. After this email is sent, the learner has a limited
amount of time (defined by the ``--cool_off_days`` argument described in
:ref:`driver-setup`) to contact the site administrators and rescind their
request.
At this point, the learner's account has been deactivated, but *not* retired.
An entry in the ``UserRetirementStatus`` table is added, and their state set to
``PENDING``.
By default, the **Delete My Account** section is visible and the button is
enabled, allowing account deletions to queue up. The
``ENABLE_ACCOUNT_DELETION`` feature flag in Django settings toggles the visibility
of this section. See :ref:`django-settings`.
================
Third Party Auth
================
Learners who registered using social authentication must first unlink their
LMS account from their third-party account. For those learners, the **Delete
My Account** button will be disabled until they do so; meanwhile, they will be
instructed to follow the procedure in this help center article: `How do I link
or unlink my edX account to a social media
account? <https://support.edx.org/hc/en-us/articles/207206067>`_.
.. include:: ../../../../links/links.rst


@@ -0,0 +1,38 @@
.. _Enabling User Retirement:
####################################
Enabling the User Retirement Feature
####################################
There have been many changes to privacy laws (for example, GDPR or the
European Union General Data Protection Regulation) intended to change the way
that businesses think about and handle Personally Identifiable Information
(PII).
As a step toward enabling Open edX to support some of the key updates in privacy
laws, edX has implemented APIs and tooling that enable Open edX instances to
retire registered users. When you implement this user retirement feature, your
Open edX instance can automatically erase PII for a given user from systems that
are internal to Open edX (for example, the LMS, forums, credentials, and other
independently deployable applications (IDAs)), as well as external systems, such
as third-party marketing services.
This section is intended not only to instruct Open edX admins in performing
the basic setup, but also to offer some insight into the implementation of the
user retirement feature in order to help the Open edX community build
additional APIs and states that meet their special needs. Custom code,
plugins, packages, or XBlocks in your Open edX instance might store PII, but
this feature will not magically find and clean up that PII. You may need to
create your own custom code to include PII that is not covered by the user
retirement feature.
.. toctree::
:maxdepth: 1
implementation_overview
service_setup
driver_setup
special_cases
.. include:: ../../../../links/links.rst


@@ -0,0 +1,179 @@
.. _Service Setup:
#####################################
Setting Up User Retirement in the LMS
#####################################
This section describes how to set up and configure the user retirement feature
in the Open edX LMS.
.. _django-settings:
***************
Django Settings
***************
The following Django settings control the behavior of the user retirement
feature. Note that some of these setting values are lambda functions rather
than standard string literals. This is intentional; it is a pattern for
defining *derived* settings specific to Open edX. Read more about it in
`openedx/core/lib/derived.py
<https://github.com/openedx/edx-platform/blob/fdc50c3/openedx/core/lib/derived.py>`_.
.. list-table::
:header-rows: 1
* - Setting Name
- Default
- Description
* - RETIRED_USERNAME_PREFIX
- ``'retired__user_'``
- The prefix part of hashed usernames. Used in ``RETIRED_USERNAME_FMT``.
* - RETIRED_EMAIL_PREFIX
- ``'retired__user_'``
- The prefix part of hashed emails. Used in ``RETIRED_EMAIL_FMT``.
* - RETIRED_EMAIL_DOMAIN
- ``'retired.invalid'``
- The domain part of hashed emails. Used in ``RETIRED_EMAIL_FMT``.
* - RETIRED_USERNAME_FMT
- ``lambda settings:
settings.RETIRED_USERNAME_PREFIX + '{}'``
- The username field for a retired user gets transformed into this format,
where ``{}`` is replaced with the hash of their username.
* - RETIRED_EMAIL_FMT
- ``lambda settings:
settings.RETIRED_EMAIL_PREFIX + '{}@' +
settings.RETIRED_EMAIL_DOMAIN``
- The email field for a retired user gets transformed into this format, where
``{}`` is replaced with the hash of their email.
* - RETIRED_USER_SALTS
- None
- A list of salts used for hashing usernames and emails. Only the last item in this list is used as a salt for all new retirements, but historical salts are preserved in order to guarantee that all hashed usernames and emails can still be checked. The default value **MUST** be overridden!
* - RETIREMENT_SERVICE_WORKER_USERNAME
- ``'RETIREMENT_SERVICE_USER'``
- The username of the retirement service worker.
* - RETIREMENT_STATES
- See `lms/envs/common.py <https://github.com/openedx/edx-platform/blob/fe82954/lms/envs/common.py#L3421-L3449>`_
in the ``RETIREMENT_STATES`` setting
- A list that defines the name and order of states for the retirement
workflow. See `Retirement States`_ for details.
* - FEATURES['ENABLE_ACCOUNT_DELETION']
- True
- Whether to display the "Delete My Account" section on the account settings page.
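To see how the derived format settings combine with a hash, consider the
following sketch. It is not the platform's implementation:
``illustrative_hash``, ``retired_username``, and ``retired_email`` are
hypothetical names, and the hash shown (SHA-256 over the salt plus the
lowercased value) is an assumption chosen only for illustration.

.. code-block:: python

   import hashlib
   from types import SimpleNamespace

   settings = SimpleNamespace(
       RETIRED_USERNAME_PREFIX="retired__user_",
       RETIRED_EMAIL_PREFIX="retired__user_",
       RETIRED_EMAIL_DOMAIN="retired.invalid",
   )

   # A derived setting is a callable that is resolved against the settings.
   RETIRED_USERNAME_FMT = (
       lambda settings: settings.RETIRED_USERNAME_PREFIX + "{}"
   )(settings)
   RETIRED_EMAIL_FMT = (
       lambda settings: settings.RETIRED_EMAIL_PREFIX + "{}@"
       + settings.RETIRED_EMAIL_DOMAIN
   )(settings)


   def illustrative_hash(value, salt):
       # Assumption: the real hashing scheme may differ from this one.
       return hashlib.sha256((salt + value.lower()).encode("utf-8")).hexdigest()


   def retired_username(username, salts):
       # Only the last salt in the list is used for new retirements.
       return RETIRED_USERNAME_FMT.format(illustrative_hash(username, salts[-1]))


   def retired_email(email, salts):
       return RETIRED_EMAIL_FMT.format(illustrative_hash(email, salts[-1]))

Older salts stay in the list so that values hashed with them can still be
recognized later.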
=================
Retirement States
=================
The state of each user's retirement is stored in the LMS database, and the
state list itself is also separately stored in the database. We expect the
list of states will be variable over time and across different Open edX
installations, so it is the responsibility of the administrator to populate
the state list.
The default states are defined in `lms/envs/common.py
<https://github.com/openedx/edx-platform/blob/fe82954/lms/envs/common.py#L3421-L3449>`_
in the ``RETIREMENT_STATES`` setting. There must be, at minimum, a ``PENDING``
state at the beginning, and ``COMPLETE``, ``ERRORED``, and ``ABORTED`` states
at the end of the list. Also, for every ``RETIRING_foo`` state, there must be
a corresponding ``foo_COMPLETE`` state.
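These constraints can be checked mechanically before you populate the table.
The sketch below uses a hypothetical ``check_retirement_states`` function and
the ``COMPLETE`` state name that appears in the management command output; it
is not the validation that the platform itself performs.

.. code-block:: python

   TERMINAL_STATES = {"ERRORED", "ABORTED", "COMPLETE"}


   def check_retirement_states(states):
       """Return a list of problems found in a RETIREMENT_STATES list."""
       problems = []
       if not states or states[0] != "PENDING":
           problems.append("first state must be PENDING")
       if set(states[-3:]) != TERMINAL_STATES:
           problems.append("last three states must be ERRORED, ABORTED, COMPLETE")
       for state in states:
           if state.startswith("RETIRING_"):
               partner = state[len("RETIRING_"):] + "_COMPLETE"
               if partner not in states:
                   problems.append(f"{state} has no matching {partner} state")
       return problems

An empty result means the list meets the minimum structure described above.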
Override these states if you need to add any states. Typically, these
settings are set in ``lms.yml``.
After you have defined any custom states, populate the states table with the
following management command:
.. code-block:: bash
$ ./manage.py lms --settings=<your-settings> populate_retirement_states
All states removed and new states added. Differences:
Added: set([u'RETIRING_ENROLLMENTS', u'RETIRING_LMS', u'LMS_MISC_COMPLETE', u'RETIRING_LMS_MISC', u'ENROLLMENTS_COMPLETE', u'LMS_COMPLETE'])
Removed: set([])
Remaining: set([u'ERRORED', u'PENDING', u'ABORTED', u'COMPLETE'])
States updated successfully. Current states:
PENDING (step 1)
RETIRING_ENROLLMENTS (step 11)
ENROLLMENTS_COMPLETE (step 21)
RETIRING_LMS_MISC (step 31)
LMS_MISC_COMPLETE (step 41)
RETIRING_LMS (step 51)
LMS_COMPLETE (step 61)
ERRORED (step 71)
ABORTED (step 81)
COMPLETE (step 91)
In this example, some states specified in settings were already present, so
they were listed under ``Remaining`` and were not re-added. The command output
also prints the ``Current states``; this represents all the states in the
states table. The ``populate_retirement_states`` command is idempotent, and
always attempts to make the states table reflect the ``RETIREMENT_STATES``
list in settings.
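The bookkeeping in that output can be sketched as plain set arithmetic.
``plan_state_update`` is a hypothetical function, and the step numbering
(1, 11, 21, and so on) is inferred from the sample output above rather than
taken from the command's source.

.. code-block:: python

   def plan_state_update(existing, desired):
       """Describe what changes when the table is made to match desired."""
       existing_set, desired_set = set(existing), set(desired)
       return {
           "added": desired_set - existing_set,
           "removed": existing_set - desired_set,
           "remaining": existing_set & desired_set,
           # Assumption: steps are spaced by 10, starting at 1.
           "steps": [(name, 1 + 10 * index) for index, name in enumerate(desired)],
       }

Running the plan a second time with the table already matching the settings
yields empty ``added`` and ``removed`` sets, which is what idempotent means
here.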
.. _retirement-service-user:
***********************
Retirement Service User
***********************
The user retirement driver scripts authenticate with the LMS and IDAs as the
retirement service user with OAuth client credentials. Therefore, to use the
driver scripts, you must create a retirement service user, and generate a
Django OAuth Toolkit (DOT) application and client credentials, as in the
following commands.
.. code-block:: bash
app_name=retirement
user_name=retirement_service_worker
./manage.py lms --settings=<your-settings> manage_user $user_name $user_name@example.com --staff --superuser
./manage.py lms --settings=<your-settings> create_dot_application $app_name $user_name
.. note::
The client credentials (client ID and client secret) will be printed to the
terminal, so take this opportunity to copy them for future reference. You
will use these credentials to configure the driver scripts. For more
information, see :ref:`driver-setup`.
The retirement service user needs permission to perform retirement tasks.
Grant it by setting the ``RETIREMENT_SERVICE_WORKER_USERNAME`` variable
in Django settings:
.. code-block:: python
RETIREMENT_SERVICE_WORKER_USERNAME = 'retirement_service_worker'
************
Django Admin
************
The Django admin interface contains the following models under ``USER_API``
that relate to user retirement.
.. list-table::
:widths: 15 30 55
:header-rows: 1
* - Name
- URI
- Description
* - Retirement States
- ``/admin/user_api/retirementstate/``
- Represents the table of states defined in ``RETIREMENT_STATES`` and
populated with ``populate_retirement_states``.
* - User Retirement Requests
- ``/admin/user_api/userretirementrequest/``
- Represents the table that tracks the user IDs of every learner who
has ever requested account deletion. This table is primarily used for
internal bookkeeping, and normally isn't useful for administrators.
* - User Retirement Statuses
- ``/admin/user_api/userretirementstatus/``
- Model for managing the retirement state for each individual learner.
In special cases where you may need to manually intervene with the pipeline,
you can use the User Retirement Statuses management page to change the
state for an individual user. For more information about how to handle these
cases, see :ref:`handling-special-cases`.
.. include:: ../../../../links/links.rst


@@ -0,0 +1,86 @@
.. _handling-special-cases:
######################
Handling Special Cases
######################
.. _recovering-from-errored:
Recovering from ERRORED
***********************
If a retirement API indicates failure (4xx or 5xx status code), the driver
immediately sets the user's state to ``ERRORED``. To debug this error state,
check the ``responses`` field in the user's row in
``user_api_userretirementstatus`` (User Retirement Status) for any relevant
logging. Once the issue is resolved, you need to manually set the user's
``current_state`` to the state immediately prior to the state which should be
re-tried. You can do this using the Django admin. In this example, a user
retirement errored during forums retirement, so we manually reset their state
from ``ERRORED`` to ``ENROLLMENTS_COMPLETE``.
.. digraph:: retirement_states_example
:align: center
//rankdir=LR; // Rank Direction Left to Right
ranksep = "0.3";
edge[color=grey]
node[fontname=Courier,fontsize=12,shape=box,group=main]
{ rank = same INIT[style=invis] PENDING }
{
edge[style=bold,color=black]
INIT -> PENDING;
"..."[shape=none]
PENDING -> RETIRING_ENROLLMENTS -> ENROLLMENTS_COMPLETE -> RETIRING_FORUMS;
}
RETIRING_FORUMS -> FORUMS_COMPLETE -> "..." -> COMPLETE
node[group=""];
RETIRING_ENROLLMENTS -> ERRORED;
RETIRING_FORUMS -> ERRORED[style=bold,color=black];
PENDING -> ABORTED;
subgraph cluster_terminal_states {
label = "Terminal States";
labelloc = b // put label at bottom
{rank = same ERRORED COMPLETE ABORTED}
}
ERRORED -> ENROLLMENTS_COMPLETE[style="bold,dashed",color=black,label=" via django\nadmin"]
Now, the user retirement driver scripts will automatically resume this user's
retirement the next time they are executed.
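Choosing the state to reset to amounts to stepping back one entry in the
ordered state list. The following sketch assumes you know the state in which
the failure occurred; ``state_to_reset_to`` is a hypothetical helper, not a
platform API.

.. code-block:: python

   def state_to_reset_to(states, failed_state):
       """Return the state immediately before the one that should be retried."""
       index = states.index(failed_state)
       if index == 0:
           raise ValueError(f"{failed_state} has no prior state")
       return states[index - 1]

For the example above, a failure in ``RETIRING_FORUMS`` maps back to
``ENROLLMENTS_COMPLETE``, which is the value to enter in the Django admin.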
Rerunning some or all states
*****************************
If you decide you want to rerun all retirements from the beginning, set
``current_state`` to ``PENDING`` for all retirements with ``current_state`` ==
``COMPLETE``. This would be useful in the case where a new stage in the user
retirement workflow is added after running all retirements (but before the
retirement queue is cleaned up), and you want to run all the retirements
through the new stage. Or, perhaps you were developing a stage/API that
didn't work correctly but still indicated success, so the pipeline progressed
all users into ``COMPLETE``. Retirement APIs are designed to be idempotent,
so this should be a no-op for stages already run for a given user.
Cancelling a retirement
***********************
Users who have recently requested account deletion but are still in the
``PENDING`` retirement state may request to rescind their account deletion by
emailing or otherwise contacting the administrators directly. edx-platform
offers a Django management command that administrators can invoke manually to
cancel a retirement, given the user's email address. It restores a given
user's login capabilities and removes them from all retirement queues. The
syntax is as follows:
.. code-block:: bash
$ ./manage.py lms --settings=<your-settings> cancel_user_retirement_request <email-of-user-to-cancel-retirement>
Keep in mind, this will only work for users whose retirement states have not
advanced beyond ``PENDING``. Additionally, the user will need to reset
their password in order to restore access to their account.