PII Annotations are very out of date, this commit adds most that were
missing in edx-platform, and some additional annotations to the
safelist. It is not comprehensive, several other upstream Open edX
packages also need to be updated. It also does not include removing
annotations that have been moved upstream, or been removed entirely.
Those are separate follow-on tasks.
When a student submits a problem answer, the state is stored in a
StudentModule record containing answer, score, correctness, etc. The
record, though, is updated in multiple steps within the single request
(first the grade is updated, then the state is updated separately).
Each partial save would trigger a separate StudentModuleHistory record
to be stored resulting in duplicate and inaccurate historical records.
This solution uses the RequestCache to track within a request thread
which StudentModules are updated and a single corresponding
StudentModuleHistory id. If multiple update actions occur within the
request cycle, then modify the history record that was already
generated to ensure that each submission only results in one
StudentModuleHistory record.
This issue and its solution were discussed in:
https://discuss.openedx.org/t/extra-history-record-stored-on-each-problem-submission/8081
Clearing the RequestCache was intended to address memory leaks in the
celery workers. Celery worker processes will process many tasks before
they are terminated. RequestCache cleanup typically happens in the
RequestCacheMiddleware class, and middleware never executes for celery.
To get around that issue, the CLEAR_REQUEST_CACHE_ON_TASK_COMPLETION
setting was created to clear the RequestCache after every task was
successfully completed.
This works fine when celery is running as a separate process, as it's
set up to do in production. But during development, the
CELERY_ALWAYS_EAGER setting variable is set to True, meaning that
celery tasks are run in the same thread as the Django Request. This is
meant to make debugging easier, as task failures run as part of the
request cycle and will raise exceptions that are visible to the
browser.
However, celery tasks are triggered from many different actions. That
means that the RequestCache was being cleared many times during the
course of processing a request. This led to behavior that was
potentially slower, but also incorrect–the RequestCache was getting
flushed in a way that wouldn't happen in any deployed environment
because celery would be running in separate processes there. This came
up when trying to fix an issue around extra history records being
created during problem submissions:
https://discuss.openedx.org/t/extra-history-record-stored-on-each-problem-submission/8081
Furthermore, it's not necessary to prevent RequestCache memory leaks
when running in CELERY_AWLAYS_EAGER mode in development because the
middleware cleanup happens automatically–as everything is running as
part of the request/response cycle.
There are times in which we may want to run celery eagerly and still
clear the cache, such as testing. I have set
CLEAR_REQUEST_CACHE_ON_TASK_COMPLETION = False in all dev and test
environments that already have CELERY_ALWAYS_EAGER = True. The unit
test that specifically tests whether the request cache is getting
cleared upon completion of a celery task then overrides
CLEAR_REQUEST_CACHE_ON_TASK_COMPLETION = True even though
CELERY_ALWAYS_EAGER = True for the sake of that specific testing
purpose.
= IMPORTANT WARNING =
This can be a VERY EXPENSIVE MIGRATION which may take hours or
days to run depending on the size of the courseware_studentmodule
table on your site. Depending on your database, it may also lock
this table, causing courseware to be non-functional during that
time.
If you want to run this migration manually in a more controlled
way (separate from your release pipeline), the SQL needed is:
CREATE INDEX `courseware_stats` ON `courseware_studentmodule`
(`module_id`, `grade`, `student_id`);
You can then fake the migration:
https://docs.djangoproject.com/en/2.2/ref/django-admin/#cmdoption-migrate-fake
= Motivation and Background =
TLDR: This adds an index that will speed up reports like the
Problem Grade Report. This fixes a performance regression that
was unintentionally introduced in 25da206c.
I'm capturing the entire saga below, in case Open edX operators
need to dig into it.
The tale begins in November of 2012 (yes, seriously). We had an
inline analytics feature that would display a histogram to course
staff by each problem in the LMS, detailing how students did on
that problem (e.g. 80% got 2 points, 10% got 1 point, 10% got 0
points). The courseware_studentmodule table already had an index
on the module_id (a.k.a. module_state_key), but because there
were 100K+ students that had student state for some problems,
the generation of those histograms was still extremely expensive.
During U.S. Thanksgiving weekend in late November of 2012, that
load started causing operational failures on edx.org.
As an emergency measure, I manually added a composite index for
(module_id, grade, student_id) on courseware_studentmodule in
order to stabilize the courseware on edx.org. I did _not_ follow
up properly and add it in a migration file. Later on, the inline
analytics feature was removed entirely, so the index was considered
redundant (but again, it was not properly cleaned up).
Various reports were created over the years, some of which
relied on having an index for module_id. These ran fine because
there had long been an index for that field specifically.
In 2018, the courseware_studentmodule table for edx.org ran into
the 2 TB size limit that our old RDS instance had. We had a fair
amount of monitoring for various limits that we thought we might
run into, but the per-table limit took us by surprise. The Devops/
SRE person fielding that issue needed to free up space in a hurry
in order to make the courseware functional again. Examining the
database itself, he noticed that we had a module_id index that was
technically redundant because the composite index of (module_id,
grade, student_id) would cover queries that would otherwise use it.
Again, as an emergency measure, he dropped the index on module_id
in order to free up a little space and buy enough time to do a
proper move of the database to Aurora.
Devops-of-2018 being more disciplined than me-of-2012, the index
on module_id was removed in 25da206c. The intention was to make it
so that the state of the code would match what was live on edx.org.
But because the composite index was added in an ad hoc way, what
that really meant was that now queries involving module_id were
_only_ indexed by the (module_id, grade, student_id) composite
index that existed only on edx.org and no other Open edX instances.
We didn't realize this issue until months later. @blarghmatey
created an index to re-add the index for module_id:
https://github.com/edx/edx-platform/pull/20885
The reason why we didn't accept this immediately is because
migrations for this table are very operationally risky and take
days to run. Faking this migration would have put edx.org even
more out of sync with the Open edX repo. Complicating this
somewhat was the fact that some folks still seem to be running a
variant of the inline analytics on their fork.
So in the end, we're going forward with this migration that brings
the code fully into sync with indexes on edx.org and covers the
obscure inline analytics histogram use case, while still covering
the module_id index needed for the fast generation of certain
reports that focus on a single problem.
Sorry folks.
* Fix type mismatches in coursewaqre
* Fix type mismatch in credit migrations
* Fix type mismatch in status migrations
* Fix type mismatch in user_api migrations
* Review Fixes
This commit introduces the changes needed for XBlocks in Blockstore to save
their user state into CSM. Before this commit, all student state for Blockstore
blocks was ephemeral (in-process dict store).
Notes:
* The main risk factor of this PR is that it adds non-course keys to the
course_id field in CSM. If any code (like analytics?) reads course keys
directly out of CSM and doesn't have graceful handling for key types it
doesn't recognize, it could cause an issue. With the included changes to
opaque-keys, calling CourseKey.from_string(...) on these values will raise
InvalidKeyError since they're not CourseKeys. (But calling
LearningContextKey.from_string(...) will work for both course and library
keys.)
* This commit introduces a slight regression for the Studio view of XBlocks in
Blockstore content libraries: their state is now lost from request to request.
I have a follow up PR to give them a proper studio-appropriate state store,
but I want to review it separately so it doesn't hold up this PR and we can
test this PR on its own.
This PR is based on #19284 and is part of the
series of work related to the proposal #18134.
This PR avoids the assignment of
anonymous/unenrolled users to any cohort when
course is public. Anonymous or unenrolled users
will only see content that does not have a
content group assigned.
The "View Course" link to the course outline
is shown on the course about page for a course
marked public/public outline.
It also makes course handouts available for
public courses (not for public_outline).
This PR also hides the different warnings and
messages asking the user to sign-in and enroll
in the course, when the course is marked public.
It modifies the default public_view text to
include the component display_name when
unenrolled access is not available.
Django 2.0 will make this field required for `ForeignKey` and `OneToOneFields`.
In previous versions the option defaulted to `models.CASCADE` when not
specified. This change should make the deprecation warnings in the current
Django version go away.
The migrations where also modified, but the changes should not cause a change in
the database schema since `models.CASCADE` was already the old default.
If a learner has not accessed/attempted the score override functionality
didn't work. This PR is intended to fix this behaviour and the
override should work regardless of learner has accessed or attempted
a problem or not.
The verified seat upgrade deadline for self-paced course runs is now
dependent on when the learner was first able to access the content--the
latest of enrollment date and course run start date.
This abstract class contains most of the fields (aside from the id and
foreign key to StudentModule that the subclasses need to manage). It
also provides a get_history method that abstracts searching across
multiple backends.
Move router code to openedx/core
We need to use it from cms and lms.
Ensure aws_migrate can be used for migrating both the lms and cms.
Handle queries directed to student_module_history vs default and the
extra queries generated by Django 1.8 (SAVEPOINTS, etc).
Additionally, flag testing classes as multi_db so that Django will
flush the non-default database between unit tests.
Further decouple the foreignkey relation between csm and csmhe
When calling StudentModule().delete() Django will try to delete CSMHE
objects, but naively does so in the database, not by consulting the
database router.
Instead, we disable django cascading deletes and listen for post_delete
signals and clean up CSMHE by hand.
Add feature flags for CSMHE
One to turn it on/off so we can control the deploy.
The other will control whether or not we read from two database tables
or one when searching.
Update tests to explicitly use this get_history method rather than
looking directly into StudentModuleHistory or
StudentModuleHistoryExtended.
Inform lettuce to avoid the coursewarehistoryextended app
Otherwise it fails when it can't find features/ in that app.
Add Pg support, this is not tested automatically.
This is a clone (copy) of CSMH's declaration and methods with an added
id of UnsignedBigInAutoField
We should be able to delete the save_history code, but needs testing.
Add error logging when capa failures happen
Put StudentModuleHistory into its own database
Bump out the primary key on CSMHE
This gives us a gap to backfill as needed.
Since the new table's pk is an unsigned bigint, even for people who don't
consolidate CSMH into CSMHE, the lost rows are unlikely to matter.
Remove StudentModuleHistory cleaner