Files
edx-platform/docs/decisions/0020-upstream-downstream.rst
Kyle McCormick 834cb9482d refactor: rename ModuleStore runtimes now that XModules are gone (#35523)
* Consolidates and renames the runtime used as a base for all the others:
  * Before: `xmodule.x_module:DescriptorSystem` and
            `xmodule.mako_block:MakoDescriptorSystem`.
  * After:  `xmodule.x_module:ModuleStoreRuntime`.

* Co-locates and renames the runtimes for importing course OLX:
  * Before: `xmodule.x_module:XMLParsingSystem` and
            `xmodule.modulestore.xml:ImportSystem`.
  * After:  `xmodule.modulestore.xml:XMLParsingModuleStoreRuntime` and
            `xmodule.modulestore.xml:XMLImportingModuleStoreRuntime`.
  * Note: I would have liked to consolidate these, but it would have
          involved nontrivial test refactoring.

* Renames the stub Old Mongo runtime:
  * Before: `xmodule.modulestore.mongo.base:CachingDescriptorSystem`.
  * After: `xmodule.modulestore.mongo.base:OldModuleStoreRuntime`.

* Renames the Split Mongo runtime, the which is what runs courses in LMS and CMS:
  * Before: `xmodule.modulestore.split_mongo.caching_descriptor_system:CachingDescriptorSystem`.
  * After: `xmodule.modulestore.split_mongo.runtime:SplitModuleStoreRuntime`.

* Renames some of the dummy runtimes used only in unit tests.
2025-10-29 15:46:07 -04:00

407 lines
20 KiB
ReStructuredText

4. Upstream and downstream content
##################################
Status
******
Accepted.
Implementation in progress as of 2024-09-03.
Context
*******
We are replacing the existing Legacy ("V1") Content Libraries system, based on
ModuleStore, with a Relaunched ("V2") Content Libraries system, based on
Learning Core. V1 and V2 libraries will coexist for at least one release to
allow for migration; eventually, V1 libraries will be removed entirely.
Content from V1 libraries can only be included into courses using the
LibraryContentBlock (called "Randomized Content Module" in Studio), which works
like this:
* Course authors add a LibraryContentBlock to a Unit and configure it with a
library key and a count of N library blocks to select (or `-1` for "all
blocks").
* For each block in the chosen library, its *content definition* is copied into
the course as a child of the LibraryContentBlock, whereas its *settings* are
copied into a special "default" settings dictionary in the course's structure
document--this distinction will matter later. The usage key of each copied
block is derived from a hash of the original library block's usage key plus
the LibraryContentBlock's own usage key--this will also matter
later.
* The course author is free to override the content and settings of the
course-local copies of each library block.
* When any update is made to the library, the course author is prompted to
update the LibraryContentBlock. This involves re-copying the library blocks'
content definitions and default settings, which clobbers any overrides they
have made to content, but preserves any overrides they have made to settings.
Furthermore, any blocks that were added to the library are newly copied into
the course, and any blocks that were removed from the library are deleted
from the course. For all blocks, usage keys are recalculated using the same
hash derivation described above; for existing blocks, it is important that
this recalculation yields the same usage key so that student state is not
lost.
* Over in the LMS, when a learner loads LibraryContentBlock, they are shown a
list of N randomly-picked blocks from the library. Subsequent visits show
them the same list, *unless* children were added, children were removed, or N
changed. In those cases, the LibraryContentBlock tries to make the smallest
possible adjustment to their personal list of blocks while respecting N and
the updated list of children.
This system has several issues:
#. **Missing defaults after import:** When a course with a LibraryContentBlock
is imported into an Open edX instance *without* the referenced library, the
blocks' *content* will remain intact as will course-local *settings
overrides*. However, any *default settings* defined in the library will be
missing. This can result in content that is completely broken, especially
since critical fields like video URLs and LTI URLs are considered
"settings". For a detailed scenario, see `LibraryContentBlock Curveball 1`_.
#. **Strange behavior when duplicating content:** Typically, when a
block is duplicated or copy-pasted, the new block's usage key and its
children's usage keys are randomly generated. However, recall that when a
LibraryContentBlock is updated, its children's usage keys are rederived
using a hash function. That would cause the children's usage keys to change,
thus destroying any student state. So, we must work around this with a hack:
upon duplicating or pasting a LibraryContentBlock, we immediately update the
LibraryContentBlock, thus discarding the problematic randomly-generated keys
in favor of hash-derived keys. This works, but:
* it involves weird code hacks,
* it unexpectedly discards any content overrides the course author made to
the copied LibraryContentBlock's children,
* it unexpectedly uses the latest version of library content, regardless of
which version the copied LibraryContentBlock was using, and
* it fails if the library does not exist on the Open edX instance, which
can happen if the course was imported from another instance.
#. **Conflation of reference and randomization:** The LibraryContentBlock does
two things: it connects courses to library content, and it shows users a
random subset of content. There is no reason that those two features need to
be coupled together. A course author may want to randomize course-defined
content, or they may want to randomize content from multiple different
libraries. Or, they may want to use content from libraries without
randomizing it at all. While it is feasible to support all these things in a
single XBlock, trying to do so led to a `very complicated XBlock concept`_
which difficult to explain to product managers and other engineers.
#. **Unpredictable preservation of overrides:** Recall that *content
definitions* and *settings* are handled differently. This distinction is
defined in the code: every authorable XBlock field is either defined with
`Scope.content` or `Scope.settings`. In theory, XBlock developers would use
the content scope for fields that are core to the meaning of piece of
content, and they would only use the settings scope for fields that would be
reasonable to configure in a local copy of the piece of content. In
practice, though, XBlock developers almost always use `Scope.settings`. The
result of this is that customizations to blocks *almost always* survive
through library updates, except when they don't. Course authors have no way
to know (or even guess) when their customizations they will and won't
survive updates.
#. **General pain and suffering:** The relationship between courses and V1
libraries is confusing to content authors, site admins, and developers
alike. The behaviors above toe the line between "quirks" and "known bugs",
and they are not all documented. Past attempts to improve the system have
`triggered series of bugs`_, some of which led to permanent loss of learner
state. In other cases, past Content Libraries improvement efforts have
slowed or completely stalled out in code review due to the overwhelming
amount of context and edge cases that must be understood to safely make any
changes.
.. _LibraryContentBlock Curveball 1: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3966795804/Fun+with+LibraryContentBlock+export+import+and+duplication#Curveball-1%3A-Import%2FExport
.. _LibraryContentBlock Curveball 2: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3966795804/Fun+with+LibraryContentBlock+export+import+and+duplication#Curveball-2:-Duplication
.. _very complicated XBlock concept: https://github.com/openedx/edx-platform/blob/master/xmodule/docs/decisions/0003-library-content-block-schema.rst
.. _triggered series of bugs: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3858661405/Bugs+from+Content+Libraries+V1
We are keen to use the Library Relaunch project to address all of these
problems. So, V2 libraries will interop with courses using a completely
different data model.
Decision
********
We will create a framework where a *downstream* piece of content (e.g. a course
block) can be *linked* to an *upstream* piece of content (e.g., a library
block) with the following properties:
* **Portable:** Links can refer to certain content on the current Open edX
instance, and in the future they may be able to refer to content on other
Open edX instances or sites. Links will never include information that is
internal to a particular Open edX instance, such as foreign keys.
* **Flat:** The *link* is a not a wrapper (like the LibraryContentBlock),
but simply a piece of metadata directly on the downstream content which
points to the upstream content. We will no longer rely on precarious
hash-derived usage keys to establish connection to upstream blocks;
like any other block, an upstream-linked blocks can be granted whatever block
ID that the authoring environment assigns it, whether random or
human-readable.
* **Forwards-compatible:** If downstream content is created in a course on
an Open edX site that supports upstream and downstreams (e.g., a Teak
instance), and then it is exported and imported into a site that doesn't
(e.g., a Quince instance), the downstream content will simply act like
regular course content.
* **Independent:** Upstream content and downstream content exist separately
from one another:
* Modifying upstream content does not affect any downstream content (unless a
sync happens, more on that later).
* Deleting upstream content does not impact its downstream content. By
corollary, pieces of downstream content can completely and correctly render
on Open edX instances that are missing their linked upstream content.
* (Preserving a positive feature of the V1 LibraryContentBlock) The link
persists through export-import and copy-paste, regardless of whether the
upstream content actually exists. A "broken" link to upstream content is
seamlessly "repaired" if the upstream content becomes available again.
* **Customizable:** On an OLX level, authors can still override the value
of any field for a piece of downstream content. However, we will empower
Studio to be more prescriptive about what authors *can* override versus what
they *should* override:
* We define a set of *customizable* fields, with platform-level defaults
like display_name and a max_attempts, plus the ability for external
XBlocks to opt their own fields into customizability.
* Studio may use this list to provide an interface for customizing
downstream blocks, separate from the usual "Edit" interface that would
permit them to make unsafe overrides.
* Furthermore, downstream content will record which fields the user has
customized...
* even if the customization is to simply clear the value of the fields...
* and even if the customization is made redundant in a future version of
the upstream content. For example, if max_attempts is customized from 3
to 5 in the downstream content, but the next version of the upstream
content also changes max_attempts to 5, the downstream would still
consider max_attempts to be customized. If the following version of the
upstream content again changed max_attempts to 6, the downstream would
retain max_attempts to be 5.
* Finally, the downstream content will locally save the upstream value of
customizable fields, allowing the author to *revert* back to them
regardless of whether the upstream content is actually available.
* **Synchronizable, without surprises:** Downstream content can be *synced*
with updates that have been made to its linked upstream. This means that the
latest available upstream content field values will entirely replace all of
the downstream field values, *except* those which were customized, as
described in the previous item.
* **Concrete, but flexible:** The internal implementation of upstream-downstream
syncing will assume that:
* upstream content belongs to a V2 content library,
* downstream content belongs to a course on the same instance, and
* the link is the stringified usage key of the upstream library content.
This will allow us to keep the implementation straightforward. However, we
will *not* expose these assumptions in the Python APIs, the HTTP APIs, or in
the persisted fields, allowing us in the future to generalize to other
upstreams (such as externally-hosted libraries) and other downstreams (such
as a standalone enrollable sequence without a course).
If any of these assumptions are violated, we will raise an exception or log a
warning, as appropriate. Particularly, if these assumptions are violated at
the OLX level via a course import, then we will probably show a warning at
import time and refuse to sync from the unsupported upstream; however, we
will *not* fail the entire import or mangle the value of upstream link, since
we want to remain forwards-compatible with potential future forms of syncing.
As a concrete example: if a course block has *another course block's usage
key* as an upstream, then we will faithfully keep that value through the
import and export process, but we will not prompt the user to sync updates
for that block.
* **Decoupled:** Upstream-downstream linking is not tied up with any other
courseware feature; in particular, it is unrelated to content randomization.
Randomized library content will be supported, but it will be a *synthesis* of
two features: (1) a RandomizationBlock that randomly selects a subset of its
children, where (2) some or all of those children are linked to upstream
blocks.
Consequences
************
To support the Libraries Relaunch in Sumac:
* For every XBlock in CMS, we will use XBlock fields to persist the upstream
link, its versions, its customizable fields, and its set of downstream
overrides.
* We will avoid exposing these fields to LMS code.
* We will define an initial set of customizable fields for Problem, Text, and
Video blocks.
* We will define method(s) for syncing update on the XBlock runtime so that
they are available in the SplitModuleStoreRuntime.
* Either in the initial implementation or in a later implementation, it may
make sense to declare abstract versions of the syncing method(s) higher up
in XBlock Runtime inheritance hierarchy.
* We will expose a CMS HTTP API for syncing updates to blocks from their
upstreams.
* We will avoid exposing this API from the LMS.
For reference, here are some excerpts of a potential implementation. This may
change through development and code review.
(UPDATE: When implementing, we ended up factoring this code differently.
Particularly, we opted to use regular functions rather than add new
XBlock Runtime methods, allowing us to avoid mucking with the complicated
inheritance hierarchy of CachingDescriptorSystem and SplitModuleStoreRuntime.)
.. code-block:: python
###########################################################################
# cms/lib/xblock/upstream_sync.py
###########################################################################
class UpstreamSyncMixin(XBlockMixin):
"""
Allows an XBlock in the CMS to be associated & synced with an upstream.
Mixed into CMS's XBLOCK_MIXINS, but not LMS's.
"""
# Metadata related to upstream synchronization
upstream = String(
help=("""
The usage key of a block (generally within a content library)
which serves as a source of upstream updates for this block,
or None if there is no such upstream. Please note: It is valid
for this field to hold a usage key for an upstream block
that does not exist (or does not *yet* exist) on this instance,
particularly if this downstream block was imported from a
different instance.
"""),
default=None, scope=Scope.settings, hidden=True, enforce_type=True
)
upstream_version = Integer(
help=("""
Record of the upstream block's version number at the time this
block was created from it. If upstream_version is smaller
than the upstream block's latest version, then the user will be
able to sync updates into this downstream block.
"""),
default=None, scope=Scope.settings, hidden=True, enforce_type=True,
)
downstream_customized = Set(
help=("""
Names of the fields which have values set on the upstream
block yet have been explicitly overridden on this downstream
block. Unless explicitly cleared by the user, these
customizations will persist even when updates are synced from
the upstream.
"""),
default=[], scope=Scope.settings, hidden=True, enforce_type=True,
)
# Store upstream defaults for customizable fields.
upstream_display_name = String(...)
upstream_max_attempts = List(...)
... # We will probably want to pre-define several more of these.
def get_upstream_field_names(cls) -> dict[str, str]:
"""
Mapping from each customizable field to field which stores its upstream default.
XBlocks outside of edx-platform can override this in order to set
up their own customizable fields.
"""
return {
"display_name": "upstream_display_name",
"max_attempts": "upstream_max_attempts",
}
def save(self, *args, **kwargs):
"""
Update `downstream_customized` when a customizable field is modified.
Uses `get_upstream_field_names` keys as the list of fields that are
customizable.
"""
...
@dataclass(frozen=True)
class UpstreamInfo:
"""
Metadata about a block's relationship with an upstream.
"""
usage_key: UsageKey
current_version: int
latest_version: int | None
sync_url: str
error: str | None
@property
def sync_available(self) -> bool:
"""
Should the user be prompted to sync this block with upstream?
"""
return (
self.latest_version
and self.current_version < self.latest_version
and not self.error
)
###########################################################################
# xmodule/modulestore/split_mongo/runtime.py
###########################################################################
class SplitModuleStoreRuntime(...):
def validate_upstream_key(self, usage_key: UsageKey | str) -> UsageKey:
"""
Raise an error if the provided key is not a valid upstream reference.
Instead of explicitly checking whether a key is a LibraryLocatorV2,
callers should validate using this function, and use an `except` clause
to handle the case where the key is not a valid upstream.
Raises: InvalidKeyError, UnsupportedUpstreamKeyType
"""
...
def sync_from_upstream(self, *, downstream_key: UsageKey, apply_updates: bool) -> None:
"""
Python API for loading updates from upstream block.
Can choose whether or not to actually apply those updates...
apply_updates=False: Think "get fetch".
Use case: course import.
apply_updates=True: Think "git pull".
Use case: sync_updates handler.
Raises: InvalidKeyError, UnsupportedUpstreamKeyType, XBlockNotFoundError
"""
...
def get_upstream_info(self, downstream_key: UsageKey) -> UpstreamInfo | None:
"""
Python API for upstream metadata, or None.
Raises: InvalidKeyError, XBlockNotFoundError
"""
...
Finally, here is what the OLX for a library-sourced Problem XBlock in a course
might look like:
.. code-block:: xml
<problem
display_name="A title that has been customized in the course"
max_attempts="2"
upstream="lb:myorg:mylib:problem:p1"
upstream_version="12"
downstream_customized="[&quot;display_name&quot;,&quot;max_attempts&quot;]"
upstream_display_name="The title that was defined in the library block"
upstream_max_attempts="3"
>
<!-- problem content would go here -->
</problem>