On a platform that is configured to upload video transcripts to S3
(`DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"`),
uploads from the studio fail with a TypeError: "Unicode-objects must be
encoded before hashing"
A full stacktrace of the issue can be found here:
https://sentry.overhang.io/share/issue/2249b6f67d794c7e986cc288758f4ebe/
This error is triggered by md5 hashing in the botocore library, which
itself is used by the S3Boto3Storage storage class. This error does not
occur with filesystem-based uploads because it does not perform checksum
verification. The reason why this error would not occur on edx.org is
unknown. Similar issues were already fixed from edxval.
To address this issue, we encode the transcript file content prior to
sending it to s3.
* Generate common/djangoapps import shims for LMS
* Generate common/djangoapps import shims for Studio
* Stop appending project root to sys.path
* Stop appending common/djangoapps to sys.path
* Import from common.djangoapps.course_action_state instead of course_action_state
* Import from common.djangoapps.course_modes instead of course_modes
* Import from common.djangoapps.database_fixups instead of database_fixups
* Import from common.djangoapps.edxmako instead of edxmako
* Import from common.djangoapps.entitlements instead of entitlements
* Import from common.djangoapps.pipline_mako instead of pipeline_mako
* Import from common.djangoapps.static_replace instead of static_replace
* Import from common.djangoapps.student instead of student
* Import from common.djangoapps.terrain instead of terrain
* Import from common.djangoapps.third_party_auth instead of third_party_auth
* Import from common.djangoapps.track instead of track
* Import from common.djangoapps.util instead of util
* Import from common.djangoapps.xblock_django instead of xblock_django
* Add empty common/djangoapps/__init__.py to fix pytest collection
* Fix pylint formatting violations
* Exclude import_shims/ directory tree from linting
Add a field to VEM config model that will decide the percentage of
courses allowed to go to VEM pipeline. The courses that don't meet the
criteria will go to VEDA.
PROD-1722
Treat transcript content as unicode strings and convert them at any edge
where we encounter them. One decision made here was to not update
edx-val it treats transcripts as byte but is much closer to the actual
files so it makes more sense over there. But within the platform they
are generally passed around as serialized json and so it's much better
for them to be unicode.
EDUCATOR-1335 - This Adds a dedicated app which is responsible for communication with edx-video-pipeline service. It also adds the backend for saving 3rd party transcription service credentials on edx-video-pipline and cache it in edx-val.