David Ormsbee 4c3fd76904 fix: add index to courseware_studentmodule for report performance.
= IMPORTANT WARNING =

This can be a VERY EXPENSIVE MIGRATION which may take hours or
days to run depending on the size of the courseware_studentmodule
table on your site. Depending on your database, it may also lock
this table, causing courseware to be non-functional during that
time.

If you want to run this migration manually in a more controlled
way (separate from your release pipeline), the SQL needed is:

  CREATE INDEX `courseware_stats` ON `courseware_studentmodule`
     (`module_id`, `grade`, `student_id`);

You can then fake the migration:
  https://docs.djangoproject.com/en/2.2/ref/django-admin/#cmdoption-migrate-fake

= Motivation and Background =

TLDR: This adds an index that will speed up reports like the
Problem Grade Report. This fixes a performance regression that
was unintentionally introduced in 25da206c.

I'm capturing the entire saga below, in case Open edX operators
need to dig into it.

The tale begins in November of 2012 (yes, seriously). We had an
inline analytics feature that would display a histogram to course
staff by each problem in the LMS, detailing how students did on
that problem (e.g. 80% got 2 points, 10% got 1 point, 10% got 0
points). The courseware_studentmodule table already had an index
on the module_id (a.k.a. module_state_key), but because there
were 100K+ students that had student state for some problems,
the generation of those histograms was still extremely expensive.
During U.S. Thanksgiving weekend in late November of 2012, that
load started causing operational failures on edx.org.

As an emergency measure, I manually added a composite index for
(module_id, grade, student_id) on courseware_studentmodule in
order to stabilize the courseware on edx.org. I did _not_ follow
up properly and add it in a migration file. Later on, the inline
analytics feature was removed entirely, so the index was considered
redundant (but again, it was not properly cleaned up).

Various reports were created over the years, some of which
relied on having an index for module_id. These ran fine because
there had long been an index for that field specifically.

In 2018, the courseware_studentmodule table for edx.org ran into
the 2 TB size limit that our old RDS instance had. We had a fair
amount of monitoring for various limits that we thought we might
run into, but the per-table limit took us by surprise. The Devops/
SRE person fielding that issue needed to free up space in a hurry
in order to make the courseware functional again. Examining the
database itself, he noticed that we had a module_id index that was
technically redundant because the composite index of (module_id,
grade, student_id) would cover queries that would otherwise use it.
Again, as an emergency measure, he dropped the index on module_id
in order to free up a little space and buy enough time to do a
proper move of the database to Aurora.

Devops-of-2018 being more disciplined than me-of-2012, the index
on module_id was removed in 25da206c. The intention was to make it
so that the state of the code would match what was live on edx.org.
But because the composite index was added in an ad hoc way, what
that really meant was that now queries involving module_id were
_only_ indexed by the (module_id, grade, student_id) composite
index that existed only on edx.org and no other Open edX instances.

We didn't realize this issue until months later. @blarghmatey
created an index to re-add the index for module_id:

  https://github.com/edx/edx-platform/pull/20885

The reason why we didn't accept this immediately is because
migrations for this table are very operationally risky and take
days to run. Faking this migration would have put edx.org even
more out of sync with the Open edX repo. Complicating this
somewhat was the fact that some folks still seem to be running a
variant of the inline analytics on their fork.

So in the end, we're going forward with this migration that brings
the code fully into sync with indexes on edx.org and covers the
obscure inline analytics histogram use case, while still covering
the module_id index needed for the fast generation of certain
reports that focus on a single problem.

Sorry folks.
2021-03-09 17:28:12 -05:00
2021-03-10 00:47:49 +05:00
2021-03-10 00:47:49 +05:00
2018-07-18 00:37:25 +05:30
2018-04-13 14:10:40 -04:00
2017-03-20 15:46:39 -04:00
2016-09-27 17:30:35 -04:00
2021-02-10 16:40:06 +05:00
2020-10-29 08:22:23 +00:00
2019-12-30 10:35:30 -05:00
2021-01-06 11:39:25 +05:00
2021-01-06 11:39:25 +05:00
2021-02-10 16:40:06 +05:00
2021-02-01 16:25:19 +05:00

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This is the core repository of the Open edX software. It includes the LMS
(student-facing, delivering courseware), and Studio (course authoring)
components.

Installation
------------

Installing and running an Open edX instance is not simple.  We strongly
recommend that you use a service provider to run the software for you.  They
have free trials that make it easy to get started:
https://openedx.org/get-started/

If you will be modifying edx-platform code, the `Open edX Developer Stack`_ is
a Docker-based development environment.

If you want to run your own Open edX server and have the technical skills to do
so, `Open edX Installation Options`_ explains your options.

.. _Open edX Developer Stack: https://github.com/edx/devstack
.. _Open edX Installation Options:  https://openedx.atlassian.net/wiki/spaces/OpenOPS/pages/60227779/Open+edX+Installation+Options

License
-------

The code in this repository is licensed under version 3 of the AGPL
unless otherwise noted. Please see the `LICENSE`_ file for details.

.. _LICENSE: https://github.com/edx/edx-platform/blob/master/LICENSE


More about Open edX
-------------------

See the `Open edX site`_ to learn more about the Open edX world. You can find
information about hosting, extending, and contributing to Open edX software. In
addition, the Open edX site provides product announcements, the Open edX blog,
and other rich community resources.

.. _Open edX site: https://openedx.org

Documentation
-------------

Documentation can be found at https://docs.edx.org.


Getting Help
------------

If you're having trouble, we have discussion forums at
https://discuss.openedx.org where you can connect with others in the community.

Our real-time conversations are on Slack. You can request a `Slack
invitation`_, then join our `community Slack team`_.

For more information about these options, see the `Getting Help`_ page.

.. _Slack invitation: https://openedx-slack-invite.herokuapp.com/
.. _community Slack team: http://openedx.slack.com/
.. _Getting Help: https://openedx.org/getting-help


Issue Tracker
-------------

We use JIRA for our issue tracker, not GitHub issues. You can search
`previously reported issues`_.  If you need to report a problem,
please make a free account on our JIRA and `create a new issue`_.

.. _previously reported issues: https://openedx.atlassian.net/projects/CRI/issues
.. _create a new issue: https://openedx.atlassian.net/secure/CreateIssue.jspa?issuetype=1&pid=11900


How to Contribute
-----------------

Contributions are welcome! The first step is to submit a signed
`individual contributor agreement`_.  See our `CONTRIBUTING`_ file for more
information  it also contains guidelines for how to maintain high code
quality, which will make your contribution more likely to be accepted.


Reporting Security Issues
-------------------------

Please do not report security issues in public. Please email
security@edx.org.

.. _individual contributor agreement: https://openedx.org/cla
.. _CONTRIBUTING: https://github.com/edx/edx-platform/blob/master/CONTRIBUTING.rst
Description
No description provided
Readme AGPL-3.0 2.2 GiB
Languages
Python 73.7%
JavaScript 15.4%
HTML 7.1%
SCSS 3.2%
CSS 0.5%