The cache in the previous version of this code was unwittingly being
shared among all threads, and an occasional race condition would result
in the .children field of some XBlocks containing duplicate entries.
I tried to find other ways to keep the existing cache design and let it
be shared among all the threads (which would be more efficient), but I
couldn't find any clean way of doing it (and even then, this code was
not written with the intention of being used in a multi-threaded way).
So to keep the fix simple, I made the block data cache thread-local
instead of process-local. That eliminated the bug.
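
As a rough illustration of the thread-local approach (a minimal sketch only; `_thread_state` and `get_block_data_cache` are hypothetical names, not the actual runtime code), each thread can lazily create its own cache via `threading.local`:

```python
import threading

# Hypothetical names for illustration; not the actual runtime code.
_thread_state = threading.local()


def get_block_data_cache():
    """Return the calling thread's private cache dict, creating it on first use."""
    cache = getattr(_thread_state, "block_data_cache", None)
    if cache is None:
        cache = {}
        _thread_state.block_data_cache = cache
    return cache


def _worker(name, results):
    # Each thread appends to its own "children" list, so entries never mix.
    cache = get_block_data_cache()
    cache.setdefault("children", []).append(name)
    results[name] = list(cache["children"])


results = {}
threads = [threading.Thread(target=_worker, args=(n, results)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off is the one noted above: every thread rebuilds its own copy of the cache, which costs some memory and repeated work but removes the shared mutable state.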
Technical details:
The big challenge with this code in the first place comes from the
parse_xml API of XBlocks, which instantiates the XBlock instance and
_then_ sets field data on it and returns it. There is no mechanism
available to distinguish that from the case of instantiating an XBlock
(using previously loaded/cached field data) and then deliberately
changing its field values. In particular, parse_xml sets the 'children'
field just by calling self.children.append(...) but never explicitly
initializes self.children to [] first. The field data store therefore
needs a mechanism for setting self.children = [] when parsing begins,
but it's hard to know "when parsing begins" in general.
This is made more challenging by two aspects of the BlockstoreFieldData
design. First, it ties field data changes to the XBlock instance itself
(using a weak key), but it's impossible to get a reference to the XBlock
instance before calling parse_xml. Second, it uses a pass-through
approach where fields that aren't being actively changed are read from
the cache. Since it doesn't know when children is being initialized vs.
being modified, it would sometimes pass through and start one thread's
changes with the final result from another thread.
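
The symptom can be reproduced in a simplified, single-threaded form. This is only a sketch: `SharedFieldData` and `parse_children` are hypothetical stand-ins, and the real race involved two threads rather than two sequential calls. It shows why appending to a passed-through list without first resetting it yields duplicates:

```python
# Hypothetical stand-in for a pass-through field data store; names are
# illustrative only and do not match the real BlockstoreFieldData.
class SharedFieldData:
    def __init__(self):
        self.cache = {}  # shared by every "parse" that uses this store

    def get_children(self, block_key):
        # Pass-through: return whatever list is already cached for this block.
        return self.cache.setdefault(block_key, [])


def parse_children(field_data, block_key, child_ids, reset_first):
    """Mimic parse_xml: append each child, optionally starting from []."""
    if reset_first:
        field_data.cache[block_key] = []
    children = field_data.get_children(block_key)
    for child_id in child_ids:
        children.append(child_id)
    return list(children)


store = SharedFieldData()
first = parse_children(store, "unit1", ["a", "b"], reset_first=False)
# A second parse starts from the first parse's final result.
second = parse_children(store, "unit1", ["a", "b"], reset_first=False)
# Resetting at the start of parsing avoids the duplicates.
fixed = parse_children(store, "unit1", ["a", "b"], reset_first=True)
```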
Anyhow, the bottom line is that avoiding unintentional multithreading
here solves the problem.
If we want the field data cache to be shared among threads, it might as
well be rewritten to use memcached and shared among all processes too.
That would be a very good performance boost but would take up a lot more
memory in memcached. Also the rewrite may be challenging due to the
aforementioned nuances of the XBlock parse_xml / construct_xblock /
add_node_as_child APIs. Perhaps modifying the runtime to use a
completely separate fielddata implementation for parsing vs. loading a
parsed+cached XBlock would do it.
Without this PR, there is no [reasonable] way to get the following data
for any XBlocks in the new runtime; now there is :)
* index_dictionary: data about the block content, for search indexing
* student_view_data: data-only equivalent of student_view, for use in
custom UIs/mobile
* children: list of child IDs
* editable_children: list of child IDs in the same bundle (use case:
when showing an OLX editor you want to allow editing the OLX of
children in the same bundle but not linked children)
Also improve usefulness of some blockstore runtime logs for debugging
Context:
Sometimes when trying to load an XBlock's XML file from Amazon S3, AWS will return a 4xx or 5xx response along with error XML like:
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>foo/bar</Key>...</Error>
A bug in the get_bundle_file_data_with_cache method would cause this XML to be returned to the runtime anyway, as if it were the expected OLX. This would then (obviously) lead to strange parsing bugs, e.g. when trying to interpret <Code> as an <xblock-include>.
This fixes the bug and improves the logging, both to make this sort of issue easier to debug in the future and to return whatever detailed error code S3 provides (or Blockstore, if S3 is not being used).
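
The general shape of such a check might look like the following. This is a sketch under assumptions, not the actual get_bundle_file_data_with_cache code: `BundleFileError` and `olx_from_response` are hypothetical names, and the function takes an already-fetched status code and body rather than performing the request itself.

```python
import xml.etree.ElementTree as ET


class BundleFileError(Exception):
    """Hypothetical exception for a failed bundle file fetch."""


def olx_from_response(status_code, body):
    """Return body as OLX only if the response was successful."""
    if 200 <= status_code < 300:
        return body
    # Try to pull a detailed code out of an S3-style <Error> document so the
    # log/exception message says more than just the HTTP status.
    error_code = "unknown"
    try:
        root = ET.fromstring(body)
        if root.tag == "Error":
            error_code = root.findtext("Code", default="unknown")
    except ET.ParseError:
        pass  # Body was not XML; keep the generic code.
    raise BundleFileError(f"HTTP {status_code}: {error_code}")
```

With this shape, an error body can never reach the OLX parser, and the raised message carries the S3 code (e.g. NoSuchKey) when one is present.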
I needed this change because I found a bug:
1. Create a block with children in a content library
2. Delete that content library
3. Create a new identical block with children in a new content library.
4. If the OLX is identical to the original block, this new block will not load in the LMS.
The reason for the bug is that the .children field contains usage keys (which encode the library, for example), but the values were being stored in BlockstoreFieldData which caches really aggressively and caches based on the hash of the OLX. Since the OLX is identical, it assumes the .children values should be identical as well.
The fix was to move children to a children-specific field data store, and only store the part of the child data that is encoded by the OLX (the <xblock-include> data) in BlockstoreFieldData. This is a better match for the way the caching works and cleaned up a hacky part of the runtime (at least it's slightly less hacky now).
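
A toy model of the problem and the fix is shown below. It is a sketch only: the function names and the `library_key:include_id` key format are invented for illustration, and the real usage keys and field data stores are more involved.

```python
import hashlib

# Buggy variant: full usage keys cached under nothing but the OLX hash.
_olx_cache = {}


def load_children_buggy(olx, library_key, include_ids):
    """Cache library-specific keys under the OLX hash (the problematic approach)."""
    digest = hashlib.sha1(olx.encode("utf-8")).hexdigest()
    if digest not in _olx_cache:
        _olx_cache[digest] = [f"{library_key}:{i}" for i in include_ids]
    return _olx_cache[digest]


# Fixed variant: cache only what the OLX itself encodes (the include data).
_include_cache = {}


def load_children_fixed(olx, library_key, include_ids):
    """Cache OLX-derived include data; build usage keys per library on each call."""
    digest = hashlib.sha1(olx.encode("utf-8")).hexdigest()
    includes = _include_cache.setdefault(digest, list(include_ids))
    return [f"{library_key}:{i}" for i in includes]


olx = '<unit><xblock-include definition="html/intro"/></unit>'
old = load_children_buggy(olx, "lib:Org:old", ["html/intro"])
stale = load_children_buggy(olx, "lib:Org:new", ["html/intro"])  # still points at the old library
fresh = load_children_fixed(olx, "lib:Org:new", ["html/intro"])
```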
Specifying a namespace in django.conf.urls.include() without providing an app_name is deprecated.
This adds the app_name attribute to the included module.
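
For reference, the included URLconf module ends up looking roughly like this. The module and namespace names are placeholders, not the ones touched by this change, and the Django imports and real URL patterns are left out so the fragment stands alone:

```python
# Hypothetical contents of the included URLconf module (placeholder names).
app_name = "my_app"  # application namespace read by include() from this module
urlpatterns = []  # the module's real URL patterns would be listed here

# In the including URLconf, the call would then look something like:
#     url(r"^my-app/", include("my_app.urls", namespace="my_app"))
```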
Implementation details:
* Anonymous users are assigned a unique ID (like
`anon42c08f9996194e2a9339`) which gets stored in the django session.
`block.scope_ids.user_id` and `block.runtime.anonymous_student_id`
will both return this value.
* User state for anonymous users is stored in the django cache and
automatically expires as the cache gets pruned. Because user state is
stored, anonymous users can use interactive blocks like capa problems.
* There is no mechanism for upgrading to a registered account and
keeping user state since the user state store for anonymous users
(EphemeralKeyValueStore) is completely different than the one for
registered users (DjangoKeyValueStore/"CSM"), and has no "list all
keys" functionality.
* "User State Summary" field values are shared among [recently active]
anonymous users but are not shared with registered users.
* Anonymous users can only access the `public_view` of XBlocks, not the
regular `student_view`.
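
The ID assignment described in the first bullet could be sketched as follows. The session key name and the exact ID format ("anon" plus 20 hex characters, matching the example above) are assumptions for illustration, and `session` is any dict-like object such as a Django session:

```python
import uuid


def get_anonymous_user_id(session):
    """Return the session's anonymous ID, creating one if needed.

    The key name and ID format below are assumptions for illustration.
    """
    anon_id = session.get("xblock_anon_user_id")
    if anon_id is None:
        # "anon" followed by 20 random hex characters.
        anon_id = "anon" + uuid.uuid4().hex[:20]
        session["xblock_anon_user_id"] = anon_id
    return anon_id
```

Because the ID lives in the session, repeated requests from the same browser session see the same value, while a new session gets a fresh one.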
This PR introduces some backend python + REST APIs for storing static
asset files along with an XBlock in a content library. It also updates
the new runtime to be able to load such static asset files.
Example use cases:
* Store an image file with an HTML block and then use the image inline
in the HTML block.
* Store a PDF file with an HTML block and provide a link in the HTML for
the learner to download the PDF.
* Store .srt files or even video .mp4 files that belong to a video
XBlock.
Within the bundle, these static asset files are stored in a "static/"
subfolder of the folder that contains the OLX file. Extending an
existing LMS/Studio convention, a static asset file such as "image.png"
is referenced within the OLX as "/static/image.png" and the URL will be
rewritten by the runtime.
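
A minimal sketch of that kind of URL rewriting is shown below. The function name, the regular expression, and the base URL are all assumptions for illustration; the actual runtime may resolve asset URLs differently.

```python
import re

# Matches "/static/<path>" when it appears as a whole quoted value.
_STATIC_RE = re.compile(r"""(?P<quote>["'])/static/(?P<path>[^"']+)(?P=quote)""")


def rewrite_static_urls(html, base_url):
    """Replace quoted "/static/..." references with absolute URLs under base_url."""
    def _replace(match):
        quote = match.group("quote")
        return f"{quote}{base_url.rstrip('/')}/{match.group('path')}{quote}"
    return _STATIC_RE.sub(_replace, html)


rewritten = rewrite_static_urls(
    '<img src="/static/image.png">',
    "https://cdn.example.com/bundle/abc/static/",  # hypothetical asset base URL
)
```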
This commit introduces the changes needed for XBlocks in Blockstore to save
their user state into CSM. Before this commit, all student state for Blockstore
blocks was ephemeral (in-process dict store).
Notes:
* The main risk factor of this PR is that it adds non-course keys to the
course_id field in CSM. If any code (like analytics?) reads course keys
directly out of CSM and doesn't have graceful handling for key types it
doesn't recognize, it could cause an issue. With the included changes to
opaque-keys, calling CourseKey.from_string(...) on these values will raise
InvalidKeyError since they're not CourseKeys. (But calling
LearningContextKey.from_string(...) will work for both course and library
keys.)
* This commit introduces a slight regression for the Studio view of XBlocks in
Blockstore content libraries: their state is now lost from request to request.
  I have a follow-up PR to give them a proper studio-appropriate state store,
but I want to review it separately so it doesn't hold up this PR and we can
test this PR on its own.
https://github.com/edx/edx-platform/pull/20645
This introduces:
* A new XBlock runtime that can read and write XBlocks that are persisted using
Blockstore instead of Modulestore. The new runtime is currently isolated so
that it can be tested without risk to the current courseware/runtime.
* Content Libraries v2, which store XBlocks in Blockstore, not Modulestore
* An API Client for Blockstore
* "Learning Context" plugin API. A learning context is a more abstract concept
than a course; it's a collection of XBlocks that serves some learning purpose.