The Course Info Blocks API endpoint has been known to be rather slow
to return the response. Previous investigation showed that the major
time sink was the get_course_blocks function, which is called three
times in a single request. This commit aims to improve the response
times by reducing the number of times that this function is called.
Solution Summary
The first time the function get_course_blocks is called, the result
(transformed course blocks) is stored in the current WSGI request
object. Later in the same request, before the second get_course_blocks
call is triggered, the already transformed course blocks are taken
from the request object, and if they are available, get_course_blocks
is not called (if not, it is called as a fallback). Later in the
request, the function is called again as before (see Optimization
Strategy and Difficulties).
Optimization Strategy and Difficulties
The original idea was to fetch and transform the course blocks once
and reuse them in all three cases, which would reduce get_course_blocks
call count to 1. However, this did not turn out to be a viable solution
because of the arguments passed to get_course_blocks. Notably, the
allow_start_dates_in_future boolean flag affects the behavior of
StartDateTransformer, which is a filtering transformer modifying the
block structure returned.
The first two times allow_start_dates_in_future is False, the third
time it is True. Setting it to True in all three cases would mean that
some blocks would be incorrectly included in the response.
This left us with one option - optimize the first two calls. The
difference between the first two calls is the non-filtering
transformers, however the second call applies a subset of transformers
from the first call, so it was safe to apply the superset of
transformers in both cases. This allowed to reduce the number of
function calls to 2. However, the cached structure may be further
mutated by filters downstream, which means we need to cache a copy of
the course structure (not the structure itself). The copy method itself
is quite heavy (it calls deepcopy three times), making the benefits of
this solution much less tangible. In fact, another potential
optimization that was considered was to reuse the collected block
structure (pre-transformation), but since calling copy on a collected
structure proved to be more time-consuming than calling get_collected,
this change was discarded, considering that the goal is to improve
performance.
Revised Solution
To achieve a more tangible performance improvement, it was decided to
modify the previous strategy as follows:
* Pass a for_blocks_view parameter to the get_blocks function to make
sure the new caching logic only affects the blocks view.
* Collect and cache course blocks with future dates included.
* Include start key in requested fields.
* Reuse the cached blocks in the third call, which is in
get_course_assignments
* Before returning the response, filter out any blocks with a future
start date, and also remove the start key if it was not in requested
fields
There was problem in filter_discussion_xblocks_from_response(). This
function was breaking the list response for the BlocksInCourseView by
returning a dict instead of list.
Currently, the sidebar relies only on the XBlock's `category` class attribute
(called `type` in the transformers). This behavior is inconsistent with the
legacy subsection navigation, which relies on the `XModuleMixin.get_icon_class`
method. This commit adds the `icon_class` to the fields collected by the
transformers and uses it to determine whether the "problem" or "video" icon
should be displayed for a unit in the sidebar.
Implements the connection from the teams feature to the content groups feature. This implementation uses the dynamic partition generator extension point to associate content groups with the users that belong to a Team.
This implementation was heavily inspired by the enrollment tracks dynamic partitions.
Adds new api to return block metadata which includes index_dictionary.
Reason for new api instead of adding it to course blocks API: data like
index_dictionary are too large for the cache used by course/blocks
transformers API.
The bug is explained in https://openedx.atlassian.net/browse/CRI-233. Only is missing add the `VisibilityTransformer` in `get_blocks()` when the user is not enrolled to the course.
On the test, `html_block` is visible only for staff and `vertical_block` is a normal block. The new behaviour hides the `html_block` and show the `vertical_block` to anonymous users
This is a first stage for removing the LegacyWaffle* classes.
LegacyWaffleFlag usage replaced with WaffleFlag;
LegacyWaffleSwitche usage replaced with WaffleSwitch;
New CourseWaffleFlag added to the temporary module __future__ as FutureCourseWaffleFlag;
Updated all the imports to use CourseWaffleFlag from the __future__ module;
BREAKING CHANGE: A number of toggle related constants (e.g. ENABLE_ACCESSIBILITY_POLICY_PAGE)
changed types. They were strings, and are now toggle instances (e.g. WaffleSwitch). Although the entire
refactor should be self-contained in edx-platform, if any plugins or dependencies were directly
using these constants, they will break. If this is the case, try to find a better publicized way of
exposing those toggles.
Description
This is a follow up to #29058 and #29413. This is the next step in moving part of the modulestore data (the course indexes / "active versions" table) from MongoDB to MySQL.
There are four steps planned in moving course index data to MySQL:
Step 1: create the tables in MySQL, start writing to MySQL + MongoDB ✅ done
Step 2: migrate all remaining courses to MySQL ✅ done
Step 3: switch reads from MongoDB to MySQL (this PR)
Step 4 (much later, once we know this is working well): stop writing to MongoDB altogether.
Supporting information
OpenCraft Jira ticket: MNG-2557
Status
✅ Tested with a large Open edX instance is in progress.
Testing instructions
Try making changes in Studio and verify that they work fine.
Deadline
None
It's long past time that the default test modulestore was Split,
instead of Old Mongo. This commit switches the default store and
fixes some tests that now fail:
- Tests that didn't expect MFE to be enabled (because we don't
enable MFE for Old Mongo) - opt out of MFE for those
- Tests that hardcoded old key string formats
- Lots of other random little differences
In many places, I didn't spend much time trying to figure out how to
properly fix the test, and instead just set the modulestore to Old
Mongo.
For those tests that I didn't spend time investigating, I've set
the modulestore to TEST_DATA_MONGO_AMNESTY_MODULESTORE - search for
that string to find further work.
Split modulestore persists data in three MongoDB "collections": course_index (list of courses and the current version of each), structure (outline of the courses, and some XBlock fields), and definition (other XBlock fields). While "structure" and "definition" data can get very large, which is one of the reasons MongoDB was chosen for modulestore, the course index data is very small.
This commit starts writing course indexes (active_versions) to both MySQL and Mongo, but continues to read from MongoDB only.
By moving course index data to MySQL / a django model, we get these advantages:
* Full history of changes to the course index data is now preserved
* Includes a django admin view to inspect the list of courses and libraries
* It's much easier to "reset" a corrupted course to a known working state, by using the simple-history revert tools from the django admin.
* The remaining MongoDB collections (structure and definition) are essentially just used as key-value stores of large JSON data structures. This paves the way for future changes that allow migrating courses one at a time from MongoDB to S3, and thus eliminating any use of MongoDB by split modulestore, simplifying the stack.