This commit introduces the changes needed for XBlocks in Blockstore to save
their user state into CSM. Before this commit, all student state for Blockstore
blocks was ephemeral (in-process dict store).
Notes:
* The main risk factor of this PR is that it adds non-course keys to the
course_id field in CSM. If any code (like analytics?) reads course keys
directly out of CSM and doesn't have graceful handling for key types it
doesn't recognize, it could cause an issue. With the included changes to
opaque-keys, calling CourseKey.from_string(...) on these values will raise
InvalidKeyError since they're not CourseKeys. (But calling
LearningContextKey.from_string(...) will work for both course and library
keys.)
* This commit introduces a slight regression for the Studio view of XBlocks in
Blockstore content libraries: their state is now lost from request to request.
I have a follow up PR to give them a proper studio-appropriate state store,
but I want to review it separately so it doesn't hold up this PR and we can
test this PR on its own.
When crawlers like edX-downloader make requests on courseware, they are
often concurrently loading many units in the same sequence. This causes
contention for the rows in courseware_studentmodule that store the
student's state for various XBlocks/XModules, most notably for the
sequence, chapter, and course -- all of which record and update user
position information when loaded.
It would be nice if we could actually remove these writes altogether
and come up with a cleaner way of keeping track of the user's position.
In general, GETs should be side-effect free. However, any such change
would break backwards compatibility, and would require close
coordination with research teams to make sure they weren't negatively
affected.
This commit identifies crawlers by user agent (CrawlersConfig model),
and blocks student state writes if a crawler is detected. FieldDataCache
writes simply become no-ops. It doesn't actually alter the rendering
of the courseware in any way -- the main impact is that the blocks
won't record your most recent position, which is meaningless for
crawlers anyway.
This can also be used as a building block for other policy we want to
define around crawlers. We just have to be mindful that this only works
with "nice" crawlers who are honest in their user agents, and that
significantly more sophisticated (and costly) measures would be
necessary to prevent crawlers that try to be even trivially sneaky.
[PERF-403]
This required the following changes to the DjangoXBlockUserStateClient
semantics:
1) Changes get/get_many to return XBlockUserState tuples, rather
than state dictionaries or (block_key, state) tuples.
2) Raises DoesNotExist if get_history is called on an XBlock that has
had no data saved to it.
3) Returns XBlockUserState tuples as the results of get_history.
The progress page did a number of things that make performance terrible for
courses with large numbers of problems, particularly if those problems are
customresponse CapaModule problems that need to be executed via codejail.
The grading code takes pains to not instantiate student state and execute the
problem code. If a student has answered the question, the max score is stored
in StudentModule. However, if the student hasn't attempted the question yet, we
have to run the problem code just to call .max_score() on it. This is necessary
in grade() if the student has answered other problems in the assignment (so we
can know what to divide by). This is always necessary to know in
progress_summary() because we list out every problem there. Code execution can
be especially slow if the problems need to invoke codejail.
To address this, we create a MaxScoresCache that will cache the max raw score
possible for every problem. We select the cache keys so that it will
automatically become invalidated when a new version of the course is published.
The fundamental assumption here is that a problem cannot have two different
max score values for two unscored students. A problem *can* score two students
differently such that they have different max scores. So Carlos can have 2/3 on
a problem, while Lyla gets 3/4. But if neither Carlos nor Lyla has ever
interacted with the problem (i.e. they're just seeing it on their progress
page), they must both see 0/4 -- it cannot be the case that Carlos sees 0/3 and
Lyla sees 0/4.
We used to load all student state into two separate FieldDataCache instances,
after which we do a bunch of individual queries for scored items. Part of this
split-up was done because of locking problems, but I think we might have gotten
overzealous with our manual transaction hammer.
In this commit, we consolidate all state access in grade() and progress()
to use one shared FieldDataCache. We also use a filter so that we only pull
back StudentModule state for things that might possibly affect the grade --
items that either have scores or have children.
Because some older XModules do work in their __init__() methods (like Video),
instantiating them takes time, particularly on large courses. This commit also
changes the code that fetches the grading_context to filter out children that
can't possibly affect the grade.
Finally, we introduce a ScoresClient that also tries to fetch score
information all at once, instead of in separate queries. Technically, we are
fetching this information redundantly, but that's because the state and score
interfaces are being teased apart as we move forward. Still, this only
amounts to one extra SQL query, and has very little impact on performance
overall.
Much thanks to @adampalay -- his hackathon work in #7168 formed the basis of
this.
https://openedx.atlassian.net/browse/CSM-17
The only consumer of that functionality (the XQueue callback) already
does retries, so newly introduce integrity errors (due to multiple
commiters trying to update a StudentModule) won't break the XQueue
processing pipeline.