Split modulestore persists data in three MongoDB "collections": course_index (list of courses and the current version of each), structure (outline of the courses, and some XBlock fields), and definition (other XBlock fields). While "structure" and "definition" data can get very large, which is one of the reasons MongoDB was chosen for modulestore, the course index data is very small.
This commit starts writing course indexes (active_versions) to both MySQL and Mongo, but continues to read from MongoDB only.
By moving course index data to MySQL / a django model, we get these advantages:
* Full history of changes to the course index data is now preserved
* Includes a django admin view to inspect the list of courses and libraries
* It's much easier to "reset" a corrupted course to a known working state, by using the simple-history revert tools from the django admin.
* The remaining MongoDB collections (structure and definition) are essentially just used as key-value stores of large JSON data structures. This paves the way for future changes that allow migrating courses one at a time from MongoDB to S3, and thus eliminating any use of MongoDB by split modulestore, simplifying the stack.
Split modulestore persists data in three MongoDB "collections": course_index (list of courses and the current version of each), structure (outline of the courses, and some XBlock fields), and definition (other XBlock fields). While "structure" and "definition" data can get very large, which is one of the reasons MongoDB was chosen for modulestore, the course index data is very small.
By moving course index data to MySQL / a django model, we get these advantages:
* Full history of changes to the course index data is now preserved
* Includes a django admin view to inspect the list of courses and libraries
* It's much easier to "reset" a corrupted course to a known working state, by using the simple-history revert tools from the django admin.
* The remaining MongoDB collections (structure and definition) are essentially just used as key-value stores of large JSON data structures. This paves the way for future changes that allow migrating courses one at a time from MongoDB to S3, and thus eliminating any use of MongoDB by split modulestore, simplifying the stack.
- Update migration instructions
- Changes regarding redirect URLs and cookie domain are to permit the
site to run on multiple domains.
- Set LOGIN_URL in common so that it can be unset in environment overrides
This bypasses the "redirect to LMS" login/signup code, but does not yet
remove it; removal is covered by DEPR-166 so that this remains a
configuration-only change for now.
There should have no user-visible effect.
ref: ARCHBOM-1890
Current State (before this commit):
Studio, as of today doesn't have a way to restrict a user to
create a course in a particular organization. What Studio
provides right now is a CourseCreator permission which gives
an Admin the power to grant a user the permission to create
a course.
For example: If the Admin has given a user Spiderman the
permission to create courses, Spiderman can now create courses
in any organization i.e Marvel as well as DC.
There is no way to restrict Spiderman from creating courses
under DC.
Purpose of this commit:
The changes done here gives Admin the ability to restrict a
user on an Organization level from creating courses via the
Course Creators section of the Studio Django administration
panel.
For example: Now, the Admin can give the user Spiderman the
privilege of creating courses only under Marvel organization.
The moment Spiderman tries to create a course under some
other organization(i.e DC), Studio will show an error message.
This change is available to all Studio instances that
enable the FEATURES['ENABLE_CREATOR_GROUP'] flag.
Regardless of the flag, it will not affect any instances that choose
not to use it.
BB-3622
Does 3 things:
(1) Use django for modulestore tests
(2) Use normal LMS settings for modulestore tests instead of openedx/tests/settings.py
(3) Simplify some TestCase subclasses by converting them to use ModuleStoreTestCase
Details and rationale:
(1) Currently parts of the modulestore test suite are designed to run "without django", although there is still a lot of django functionality imported at times, and many of the tests do in fact use django. But for the upcoming PR #27565 (moving split's course indexes from MongoDB to MySQL), we will need to always have Django enabled. So this commit paves the way for that change.
(2) The previous tests that did use Django used a special settings file, openedx/tests/settings.py which made some debugging confusing because those tests had quite different django settings than other tests. This change deletes that file and runs the tests using the LMS test settings.
(3) The test suite also contains many different ways of initializing and testing a modulestore, with significant differences in their configuration, and also a lot of repetition. I find this makes understanding, debugging and writing tests more difficult. So this commit also reduces the number of different "test case using modulestore" base classes:
* Simplifies MixedWithOptionsTestCase and MixedSplitTestCase by making them simple subclasses of ModuleStoreTestCase.
* Removes PureModulestoreTestCase.
Some actions in split modulestore record the user ID who requested the action. Currently, split modulestore doesn't care what data type you use for those user IDs. Most of the codebase uses integers, but some tests used username or email address strings.
My upcoming PR #27565 will move split modulestore's "course index" data from MongoDB into MySQL. In doing so, it requires that user IDs are always numeric. So this PR paves the way for that one by using numeric IDs consistently in all test cases. I believe the actual non-test code was already consistently using integer IDs.
Instead of having json errors in transcript acquisition and conversion cause errors, have transcription conversion and acquisition simply return an error message in the transcription which can prompt a change from the user.
Although not uploading a transcript is handled, transcripts can often cause errors in edit, export, and other activities due to json errors. These errors block the entire use of these features, so to allow for reupload, etc, we add an error message instead of transcript and log the event.
In response to [TNL-8539](https://openedx.atlassian.net/secure/RapidBoard.jspa?rapidView=580&projectKey=TNL&modal=detail&selectedIssue=TNL-8539)
Testing: Unit tests coverage is included in the PR. Upload, import, and export of courses with transcriptions is also easily hand-testable. Just create a video in studio, add an irrelevant transcript. Then try to import, export, and edit the problem. Expected behavior is success.
This PR fixes Course Outline DAGs with duplicate sequences. Previously
when a course outline had duplicate sequences, the outline would not generate
and raise a ValueError. There were no checks for duplicate sequences
before the generation of the course outline because it is not possible
to create duplicate sequences in Studio, but is possible when a Course
Author imports a course. Now before the course outline is generated, it
will be checked for duplicate sequences. If a duplicate sequence is
found an error will be logged for Partner Support to see in the Django
Admin and the duplicate will be deleted. This change will impact the
Course Author.
Code in ./common/lib/xmodule/xmodule should
be imported as `from xmodule`, since `xmodule`
is a locally-installed package.
This is weird, but as long as it is the case,
we should be consistent.
(In BOM-2584, I propose moving the files to
./xmodule, which would quell this confusion.)
This commit addresses MST-793. In MST-757, exam registration was moved to an asychronous task to improve the performance of saving content in Studio. This feature was gated behind a waffle flag, ENABLE_ASYNC_REGISTER_EXAMS, to avoid interrupting course teams during the testing and roll out of this feature.
This feature has been tested and rolled out, and so the aforementioned waffle flag is ready to be removed.
The effect of this change is that asynchronous exam registration will no longer be gated by the aforementioned waffle flag, and this feature will be rolled out across all of the edX platform.
MST-362. Previously instructors were unable to modify the zendesk ticket field, although that field should change based on the proctoring provider. This backend change allows course instructors to modify the field so that it is appropriate for whateverproctoring provider they choose.
Modify the existing login api in a way that
it will allow the user to login via username as well.
currently it is only allowing email to log the user in.
VAN-445
In dfb80acc1, we allowed group_access settings to bubble up from
Units to Sequences in the case where all Units had the same group
settings for a particular user partition. This caused an issue
where the group being bubbled up could be empty if the Unit was
misconfigured in OLX (this is already caught if done directly on
the Sequence). This commit fixes things so that empty partition
group settings never bubble up.
We don't want empty partition groups to bubble up because it would
mean that no groups are allowed to access the Sequence, which is
probably an error. (There is a separate visible_to_staff_only flag
when you want to hide content from all non-staff users.)
NOTE: This will require a forced backfill of course outlines to update
the course content data in learning_sequences:
python manage.py cms backfill_course_outlines --force
Without this backfill, the learning_sequences API will continue to serve
stale content data that has no user partition group data. It won't cause
errors, but it won't do the exclusions properly.
Commit summary:
* Created EnrollmentTrackPartitionGroupsOutlineProcessor to process the
enrollment_track User Partition Group, allowing Sequences and Sections
to be removed based on their group_access settings.
* Added user_partition_groups attribute to CourseLearningSequenceData
and CourseSectionData in learning_sequences/data.py, along with
backing model data.
* get_outline_from_modulestore now extracts group_access settings from
Sections and Sequences. It also bubbles up group_access settings from
Units, meaning that if a Sequence with no group_access setting has
Units that are all set to show only to the Verified enrollment track,
then the Sequence will only show to the Verified enrollment track.
This commit adds model-level support for all user partition groups by
capturing all the content group associations (group_access), but it only
implements the code checks for the enrollment track partition. It's not
clear that we want to generalize, since there's only one other partition
type (A/B testing) that is applicable at the outline level.
It's important to note that there is no way to set the group_access for
a Section or Sequence in Studio today. It's only possible by direct
editing of the OLX for import. That being said, the block structures
framework supports applying course groups at this level, and this commit
moves learning_sequences closer to feature parity.
The bubbling up from Units to the parent Sequence was done to mitigate
confusion when a Sequence is entirely composed of Units that are not
visible to the user because of content group restrictions. It's not
clear whether this is something we want to do in the long term, since it
would simplify the code to always specify group_access at the Sequence
level. This first pass is done partially to collect better data about
places in our courses where this kind of usage is already happening.
Most of the EnrollmentTrackPartitionGroupsOutlineProcessor code and its
tests were written by @schenedx.
Synchronously registering proctored exams while saving content to studio is causing a significant slow down. The function that registers the exams has been moved to an async task. In addition, a signal handler on_course_publish has also been moved to the async task, as it relies on exam registration being complete before being executed.
Modify the existing login api in a way that
it will allow the user to login via username as well.
currently it is only allowing email to log the user in.
VAN-445
* Enable import failure and email with Errors/Warnings
This PR enables course import failure in case of olx validation errors. Here is the flow.
* Course Import tries to import foo.tar.gz into their course shell
* Course olx contains validation errors
* During course import, olx is validated and import is failed with the error message "Course olx validation failed. Please check your email."
* System generates an email with ERRORs & WARNINGs in the body of the email.
This PR also adds a waffle flag contentstore.bypass_olx_failure. The purpose of this test flag is to allow course teams to unblock by enabling them to bypass the
the olx-validation failure.
The workaround is shared on the ticket TNL8214.
* Disable olx validation out of the box.
* Use config settings for olxcleaner
Use config settings instead of hardcoded values for olx validation. This would help in adding a great deal of control when you want to change these settings in the future. With this approch we would not need a redeploy.
* Use configs and deprecate waffleflag and also add / update tests
Video SJSON transcripts are supposed to be UTF-8 encoded, but SJSON
is an ad hoc thing we made up to make it easier to build the
transcripts viewer in the VideoBlock, and it's not well specified.
Prior to this commit, if you had an SJSON file with Latin-1 encoded
text outside the standard ASCII range (e.g. û), then we'd error out
while trying to export it.
This was blocking an effort to export some Old Mongo courses (TNL-8007).
tests are failings and complaining related objects doest not exists in
User table. Create object in test setup to fix it.
In another fix article id was giving integrity error.
Fixing task test.
tests are failings and complaining related objects doest not exists in User table. Create object in test setup to fix it.
In another fix article id was giving integrity error.
Fixing task test.
* Introduces the idea of content errors into the learning_sequences
public API, accessible using get_content_errors().
* Makes course outline generation much more resilient to unusual
structures (e.g. Section -> Unit with no Sequence in between),
with the understanding that anything that doesn't conform to the
standard structure will simply be skipped.
* Improves the Django Admin for learning_sequences to display
content errors and improve sequence data browsing within a course.
* Switches the main table viewed in the Django admin from
LearningContext to CourseContext, which is appropriate since only
course runs generate outlines.
This was done as part of TNL-8057, with the end goal of making
course outline generation resilient enough to switch over apps
to using the learning_sequences outline API. The types of course
structure errors that this PR addresses cause display issues even
in the current Outline Page experience, but would break the outline
generation for learning_sequences altogether.
The approach for error messages here is very generic, to keep
modulestore concepts from seeping into learning_sequences (which is
not aware of the modulestore/contentstore). We may need to address
this later, with a more normalized content error data model.
While the Django admin page is backwards compatible with the old
versions of the models, we should run the backfill_course_outlines
management command after deploying this change, to get the full
benefits.
In https://github.com/edx/edx-platform/pull/25955 `HiddenDescriptor`
(which was a subclass of `RawDescriptor` with a custom `student_view()`)
was converted to an XBlock. It is used as the `default_class` by the
`CachingDescriptorSystem` classes. However `RawDescriptor` is still
being used by `XMLModuleStore`. This has been replaced by
`HiddenDescriptor` as well.
The Studio UI prevents you from creating a Section or Subsection
with no title (display_name). But OLX import allows you to bypass
these checks and create Sections ("chapter" tag) and Subsections
("sequential" tag) without display_name information specified in
the XML. When this happens, Studio and the LMS fall back to using
the url_name (the last part of the UsageKey) as a title, using the
display_name_with_default method.
This usually works, because url_names are derived from the import
file name, and if you're hand-editing a course in XML, your file
names are probably more intelligible than Mongo object IDs. In any
case, this commit updates get_outline_from_modulestore to match the
behavior of Studio and the LMS with respect to this situation.
This is part of the course outlines backfill rollout. TNL-8056
The learning_sequences app has its own model for Course Outlines.
Prior to this commit, these course outlines were only populated by
a management command in the learning_sequences app that queried
modulestore. This commit does a few things:
1. Move the update_course_outline command to live in contentstore
(i.e. Studio). This makes learning_sequences unaware of
modulestore, and makes it easier for us to extract it from
edx-platform (or to plug in different kinds of course outlines).
2. Add tests.
3. Add performance and debug logging to course outline creation.
4. Make course outline creation happen every time a course publish
happens.
This will allow us to start collecting data about how long building
course outlines takes, and get error reporting around any content
edge cases that break the course outline code.
* Install `organizations` app into LMS and Studio non-optionally.
* Add toggle `ORGANIZATIONS_AUTOCREATE` to Studio.
* Remove the `FEATURES["ORGANIZATIONS_APP"]` toggle.
* Use the new `organizations.api.ensure_organization` function to
either validate or get-or-create organizations, depending
on the value of `ORGANIZATIONS_AUTOCREATE`,
when creating course runs and V2 content libraries.
We'll soon use it for V1 content libraries as well.
* Remove the `util.organizations_helpers` wrapper layer
that had to exist because `organizations` was an optional app.
* Add `.get_library_keys()` method to the Split modulestore.
* Add Studio management command for backfilling organizations tables
(`backfill_orgs_and_org_courses`).
For full details, see
https://github.com/edx/edx-organizations/blob/master/docs/decisions/0001-phase-in-db-backed-organizations-to-all.rst
TNL-7646
* Generate common/djangoapps import shims for LMS
* Generate common/djangoapps import shims for Studio
* Stop appending project root to sys.path
* Stop appending common/djangoapps to sys.path
* Import from common.djangoapps.course_action_state instead of course_action_state
* Import from common.djangoapps.course_modes instead of course_modes
* Import from common.djangoapps.database_fixups instead of database_fixups
* Import from common.djangoapps.edxmako instead of edxmako
* Import from common.djangoapps.entitlements instead of entitlements
* Import from common.djangoapps.pipline_mako instead of pipeline_mako
* Import from common.djangoapps.static_replace instead of static_replace
* Import from common.djangoapps.student instead of student
* Import from common.djangoapps.terrain instead of terrain
* Import from common.djangoapps.third_party_auth instead of third_party_auth
* Import from common.djangoapps.track instead of track
* Import from common.djangoapps.util instead of util
* Import from common.djangoapps.xblock_django instead of xblock_django
* Add empty common/djangoapps/__init__.py to fix pytest collection
* Fix pylint formatting violations
* Exclude import_shims/ directory tree from linting