= IMPORTANT WARNING =
This can be a VERY EXPENSIVE MIGRATION which may take hours or
days to run depending on the size of the courseware_studentmodule
table on your site. Depending on your database, it may also lock
this table, causing courseware to be non-functional during that
time.
If you want to run this migration manually in a more controlled
way (separate from your release pipeline), the SQL needed is:
CREATE INDEX `courseware_stats` ON `courseware_studentmodule`
(`module_id`, `grade`, `student_id`);
You can then fake the migration:
https://docs.djangoproject.com/en/2.2/ref/django-admin/#cmdoption-migrate-fake
= Motivation and Background =
TLDR: This adds an index that will speed up reports like the
Problem Grade Report. This fixes a performance regression that
was unintentionally introduced in 25da206c.
I'm capturing the entire saga below, in case Open edX operators
need to dig into it.
The tale begins in November of 2012 (yes, seriously). We had an
inline analytics feature that would display a histogram to course
staff by each problem in the LMS, detailing how students did on
that problem (e.g. 80% got 2 points, 10% got 1 point, 10% got 0
points). The courseware_studentmodule table already had an index
on the module_id (a.k.a. module_state_key), but because there
were 100K+ students that had student state for some problems,
the generation of those histograms was still extremely expensive.
During U.S. Thanksgiving weekend in late November of 2012, that
load started causing operational failures on edx.org.
As an emergency measure, I manually added a composite index for
(module_id, grade, student_id) on courseware_studentmodule in
order to stabilize the courseware on edx.org. I did _not_ follow
up properly and add it in a migration file. Later on, the inline
analytics feature was removed entirely, so the index was considered
redundant (but again, it was not properly cleaned up).
Various reports were created over the years, some of which
relied on having an index for module_id. These ran fine because
there had long been an index for that field specifically.
In 2018, the courseware_studentmodule table for edx.org ran into
the 2 TB size limit that our old RDS instance had. We had a fair
amount of monitoring for various limits that we thought we might
run into, but the per-table limit took us by surprise. The Devops/
SRE person fielding that issue needed to free up space in a hurry
in order to make the courseware functional again. Examining the
database itself, he noticed that we had a module_id index that was
technically redundant because the composite index of (module_id,
grade, student_id) would cover queries that would otherwise use it.
Again, as an emergency measure, he dropped the index on module_id
in order to free up a little space and buy enough time to do a
proper move of the database to Aurora.
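
To illustrate why the composite index can stand in for the single-column
one, here is a minimal sketch that uses Python's built-in sqlite3 module in
place of MySQL. That substitution is an assumption made only so the example
is self-contained; the two query planners differ in detail. The table is cut
down to a few columns and the module_id value is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down stand-in for the real table (assumed columns, not the full schema).
conn.execute(
    """
    CREATE TABLE courseware_studentmodule (
        id INTEGER PRIMARY KEY,
        module_id TEXT NOT NULL,
        student_id INTEGER NOT NULL,
        grade REAL,
        state TEXT
    )
    """
)

# Only the composite index exists; there is no single-column module_id index.
conn.execute(
    "CREATE INDEX courseware_stats ON courseware_studentmodule "
    "(module_id, grade, student_id)"
)


def plan_for(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Filter on the leftmost indexed column.
by_module_plan = plan_for(
    "SELECT student_id, grade FROM courseware_studentmodule WHERE module_id = ?",
    ("hypothetical-problem-id",),
)

# Filter on a column that is not a leading prefix of the index.
by_student_plan = plan_for(
    "SELECT module_id FROM courseware_studentmodule WHERE student_id = ?",
    (42,),
)

print(by_module_plan)
print(by_student_plan)
```

In this sketch the lookup by module_id is reported as a SEARCH that uses
courseware_stats, because module_id is the leftmost column of the index. The
lookup by student_id alone cannot seek into the index and falls back to a
scan.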
Devops-of-2018 being more disciplined than me-of-2012, the index
on module_id was removed in 25da206c. The intention was to make it
so that the state of the code would match what was live on edx.org.
But because the composite index was added in an ad hoc way, what
that really meant was that now queries involving module_id were
_only_ indexed by the (module_id, grade, student_id) composite
index that existed only on edx.org and no other Open edX instances.
We didn't realize this issue until months later. @blarghmatey
opened a pull request to re-add the index for module_id:
https://github.com/edx/edx-platform/pull/20885
We didn't accept this immediately because
migrations for this table are very operationally risky and take
days to run. Faking this migration would have put edx.org even
more out of sync with the Open edX repo. Complicating this
somewhat was the fact that some folks still seem to be running a
variant of the inline analytics on their fork.
So in the end, we're going forward with this migration that brings
the code fully into sync with indexes on edx.org and covers the
obscure inline analytics histogram use case, while still covering
the module_id index needed for the fast generation of certain
reports that focus on a single problem.
Sorry folks.
When the extended courseware module history feature is disabled
(ENABLE_CSMH_EXTENDED=false), the coursewarehistoryextended application
cannot be added to INSTALLED_APPS. Otherwise, the
StudentModuleHistoryExtended model is loaded in the project: it contains
signal receivers that automatically save objects to the student history
table. This table does not exist because the CSMH flag is disabled and
there is no student_module_history database.
So the feature flag is disabled and coursewarehistoryextended is not
part of INSTALLED_APPS: this was the default behaviour in Ironwood. To
make sure that this behaviour keeps working, we also need to make sure
that the migrations do not depend on the coursewarehistoryextended app
when the feature flag is disabled.
The VideoBlock `handle_ajax` handler allowed NaN values for the speed
key, which prevented videos from loading. This change also adds a data
migration to repair the affected learner data.
PROD-1148
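
As a rough illustration of the kind of guard this implies, the sketch below
rejects NaN and other unusable speed values before they are stored. The
helper name `clean_speed` and the fallback of 1.0 are assumptions for the
example; this is not the actual VideoBlock code.

```python
import math

DEFAULT_SPEED = 1.0  # assumed fallback, not taken from the original commit


def clean_speed(raw_value, default=DEFAULT_SPEED):
    """Return a finite, positive float speed, or the default if unusable."""
    try:
        speed = float(raw_value)  # float("NaN") succeeds, so check below
    except (TypeError, ValueError):
        return default
    if not math.isfinite(speed) or speed <= 0:
        return default
    return speed


print(clean_speed("1.5"))
print(clean_speed("NaN"))
```

Note that `float("NaN")` does not raise, so a plain type conversion is not
enough on its own; the explicit `math.isfinite` check is what filters it out.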
* Fix type mismatches in courseware
* Fix type mismatch in credit migrations
* Fix type mismatch in status migrations
* Fix type mismatch in user_api migrations
* Review Fixes
This commit introduces the changes needed for XBlocks in Blockstore to save
their user state into CSM. Before this commit, all student state for Blockstore
blocks was ephemeral (in-process dict store).
Notes:
* The main risk factor of this PR is that it adds non-course keys to the
course_id field in CSM. If any code (like analytics?) reads course keys
directly out of CSM and doesn't have graceful handling for key types it
doesn't recognize, it could cause an issue. With the included changes to
opaque-keys, calling CourseKey.from_string(...) on these values will raise
InvalidKeyError since they're not CourseKeys. (But calling
LearningContextKey.from_string(...) will work for both course and library
keys.)
* This commit introduces a slight regression for the Studio view of XBlocks in
Blockstore content libraries: their state is now lost from request to request.
I have a follow up PR to give them a proper studio-appropriate state store,
but I want to review it separately so it doesn't hold up this PR and we can
test this PR on its own.
Django 2.0 will make the `on_delete` argument required for `ForeignKey` and `OneToOneField`.
In previous versions the option defaulted to `models.CASCADE` when not
specified. This change should make the deprecation warnings in the current
Django version go away.
The migrations were also modified, but the changes should not cause a change in
the database schema since `models.CASCADE` was already the old default.
The verified seat upgrade deadline for self-paced course runs is now
dependent on when the learner was first able to access the content--the
latest of enrollment date and course run start date.
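
A minimal sketch of the "latest of" rule described above. The function names
and the `upgrade_window` parameter are hypothetical; the message only says
the deadline depends on the content availability date, so the way the window
is applied here is an assumption.

```python
from datetime import datetime, timedelta, timezone


def content_availability_date(enrollment_date, course_start):
    """First moment the learner could access content: the later of the two."""
    return max(enrollment_date, course_start)


def upgrade_deadline(enrollment_date, course_start, upgrade_window):
    """Assumed rule: deadline is a fixed window after content availability."""
    return content_availability_date(enrollment_date, course_start) + upgrade_window


course_start = datetime(2017, 6, 1, tzinfo=timezone.utc)
early_enrollment = datetime(2017, 5, 1, tzinfo=timezone.utc)
late_enrollment = datetime(2017, 7, 15, tzinfo=timezone.utc)
window = timedelta(days=21)  # made-up value for illustration

print(upgrade_deadline(early_enrollment, course_start, window))
print(upgrade_deadline(late_enrollment, course_start, window))
```

A learner who enrolls before the run starts is measured from the start date,
while a learner who enrolls later is measured from their own enrollment date.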
Original Commit Messages:
use edx's own get_parent method, rather than our own.
add field to unique constraint to avoid MultipleObjectsReturned in case of multiple browser clicks on submit
fix 0011 migration, inherit from TimeStampedField and add composite index (migration only)
fix bug where adding an already registered user to a ccx would cause a crash due to an undefined variable
add assertNumQueries tests to test modules where override field providers are used
remove unnecessary teardown
implement recommended style for checking empty list
import utility methods rather than use duplicate code
added comment explaining date conversion to string for json
add logging for invalid users or emails when enrolling students
add comment about xmodule user state
avoid using get_or_create, which seems to be causing a race condition on schedule change save
relocate badly placed event handlers to fix multiple submit problem
individual students, and a reimplementation of the individual due date
feature.
This work introduces an architecture, used with the 'authored_data'
portion of LmsFieldData, which allows arbitrary field overrides to be
made for fields that are part of the course content or settings (Mongo
data). The basic architecture is extensible by means of writing and
configuring arbitrary field override providers.
One concrete implementation of a field override provider is provided
which allows overrides to be made for individual students. This provider
is then used as a basis for reimplementing the individual due date
extensions feature as a proof of concept for the design.
One can imagine writing override providers that provide overrides based
on a student's membership in a cohort or other similar idea. This work
is being done, in fact, to pave the way for the Personal Online Courses
feature being developed by MIT, which will use an override provider very
much along those lines.