feat: [FC-0092] Optimize Course Info Blocks API (#37122)

The Course Info Blocks API endpoint has been known to be rather slow
to return the response. Previous investigation showed that the major
time sink was the get_course_blocks function, which is called three
times in a single request. This commit aims to improve the response
times by reducing the number of times that this function is called.

Solution Summary

The first time the function get_course_blocks is called, the result
(transformed course blocks) is stored in the current WSGI request
object. Later in the same request, before the second get_course_blocks
call is triggered, the already transformed course blocks are taken
from the request object, and if they are available, get_course_blocks
is not called (if not, it is called as a fallback). Later in the
request, the function is called again as before (see Optimization
Strategy and Difficulties).

Optimization Strategy and Difficulties

The original idea was to fetch and transform the course blocks once
and reuse them in all three cases, which would reduce get_course_blocks
call count to 1. However, this did not turn out to be a viable solution
because of the arguments passed to get_course_blocks. Notably, the
allow_start_dates_in_future boolean flag affects the behavior of
StartDateTransformer, which is a filtering transformer modifying the
block structure returned.

The first two times allow_start_dates_in_future is False, the third
time it is True. Setting it to True in all three cases would mean that
some blocks would be incorrectly included in the response.

This left us with one option - optimize the first two calls. The
difference between the first two calls is the non-filtering
transformers, however the second call applies a subset of transformers
from the first call, so it was safe to apply the superset of
transformers in both cases. This allowed to reduce the number of
function calls to 2. However, the cached structure may be further
mutated by filters downstream, which means we need to cache a copy of
the course structure (not the structure itself). The copy method itself
is quite heavy (it calls deepcopy three times), making the benefits of
this solution much less tangible. In fact, another potential
optimization that was considered was to reuse the collected block
structure (pre-transformation), but since calling copy on a collected
structure proved to be more time-consuming than calling get_collected,
this change was discarded, considering that the goal is to improve
performance.

Revised Solution

To achieve a more tangible performance improvement, it was decided to
modify the previous strategy as follows:

* Pass a for_blocks_view parameter to the get_blocks function to make
  sure the new caching logic only affects the blocks view.
* Collect and cache course blocks with future dates included.
* Include start key in requested fields.
* Reuse the cached blocks in the third call, which is in
  get_course_assignments
* Before returning the response, filter out any blocks with a future
  start date, and also remove the start key if it was not in requested
  fields
This commit is contained in:
Serhiii Nanai
2025-10-30 23:23:49 +02:00
committed by GitHub
parent ec1d9593d9
commit 7cd4170ca7
6 changed files with 168 additions and 5 deletions

View File

@@ -1,7 +1,7 @@
"""
API function for retrieving course blocks data
"""
from edx_django_utils.cache import RequestCache
import lms.djangoapps.course_blocks.api as course_blocks_api
from lms.djangoapps.course_blocks.transformers.access_denied_filter import AccessDeniedMessageFilterTransformer
@@ -14,6 +14,7 @@ from .serializers import BlockDictSerializer, BlockSerializer
from .toggles import HIDE_ACCESS_DENIALS_FLAG
from .transformers.blocks_api import BlocksAPITransformer
from .transformers.milestones import MilestonesAndSpecialExamsTransformer
from .utils import COURSE_API_REQUEST_CACHE_NAMESPACE, REUSABLE_BLOCKS_CACHE_KEY
def get_blocks(
@@ -29,6 +30,7 @@ def get_blocks(
block_types_filter=None,
hide_access_denials=False,
allow_start_dates_in_future=False,
cache_with_future_dates=False,
):
"""
Return a serialized representation of the course blocks.
@@ -61,6 +63,7 @@ def get_blocks(
allow_start_dates_in_future (bool): When True, will allow blocks to be
returned that can bypass the StartDateTransformer's filter to show
blocks with start dates in the future.
cache_with_future_dates (bool): When True, will use the block caching logic using RequestCache
"""
if HIDE_ACCESS_DENIALS_FLAG.is_enabled():
@@ -118,6 +121,10 @@ def get_blocks(
),
]
if cache_with_future_dates:
# Include future dates such that get_course_assignments can reuse the block structure from RequestCache
allow_start_dates_in_future = True
# transform
blocks = course_blocks_api.get_course_blocks(
user,
@@ -128,6 +135,19 @@ def get_blocks(
include_has_scheduled_content=include_has_scheduled_content
)
if cache_with_future_dates:
# Store a copy of the transformed, but still unfiltered, course blocks in RequestCache to be reused
# wherever possible for optimization. Copying is required to make sure the cached structure is not mutated
# by the filtering below.
request_cache = RequestCache(COURSE_API_REQUEST_CACHE_NAMESPACE)
request_cache.set(REUSABLE_BLOCKS_CACHE_KEY, blocks.copy())
# Since we included blocks with future start dates in our block structure,
# we need to include the 'start' field to filter out such blocks before returning the response.
# If 'start' field is not requested, it will be removed from the response.
requested_fields = set(requested_fields)
requested_fields.add('start')
# filter blocks by types
if block_types_filter:
block_keys_to_remove = []
@@ -142,7 +162,7 @@ def get_blocks(
serializer_context = {
'request': request,
'block_structure': blocks,
'requested_fields': requested_fields or [],
'requested_fields': requested_fields,
}
if return_type == 'dict':

View File

@@ -1,6 +1,7 @@
"""
Utils for Blocks
"""
from edx_django_utils.cache import RequestCache
from rest_framework.utils.serializer_helpers import ReturnList
from openedx.core.djangoapps.discussions.models import (
@@ -9,6 +10,10 @@ from openedx.core.djangoapps.discussions.models import (
)
COURSE_API_REQUEST_CACHE_NAMESPACE = "course_api"
REUSABLE_BLOCKS_CACHE_KEY = "reusable_transformed_blocks"
def filter_discussion_xblocks_from_response(response, course_key):
"""
Removes discussion xblocks if discussion provider is openedx.
@@ -63,3 +68,18 @@ def filter_discussion_xblocks_from_response(response, course_key):
response.data['blocks'] = filtered_blocks
return response
def get_cached_transformed_blocks():
"""
Helper function to get an unfiltered course structure from RequestCache,
including blocks with start dates in the future.
Caution: For performance reasons, the function returns the structure object itself, not its copy.
This means the retrieved structure is supposed to be read-only and should not be mutated by consumers.
"""
request_cache = RequestCache(COURSE_API_REQUEST_CACHE_NAMESPACE)
cached_response = request_cache.get_cached_response(REUSABLE_BLOCKS_CACHE_KEY)
reusable_transformed_blocks = cached_response.value if cached_response.is_found else None
return reusable_transformed_blocks

View File

@@ -2,6 +2,7 @@
CourseBlocks API views
"""
from datetime import datetime, timezone
from django.core.exceptions import ValidationError
from django.db import transaction
@@ -237,6 +238,7 @@ class BlocksView(DeveloperErrorViewMixin, ListAPIView):
params.cleaned_data['return_type'],
params.cleaned_data.get('block_types_filter', None),
hide_access_denials=hide_access_denials,
cache_with_future_dates=True
)
)
# If the username is an empty string, and not None, then we are requesting
@@ -339,9 +341,50 @@ class BlocksInCourseView(BlocksView):
if not root:
raise ValidationError(f"Unable to find course block in '{course_key_string}'")
# Earlier we included blocks with future start dates in the collected/cached block structure.
# Now we need to emulate allow_start_dates_in_future=False by removing any such blocks.
include_start = "start" in request.query_params['requested_fields']
self.remove_future_blocks(course_blocks, include_start)
recurse_mark_complete(root, course_blocks)
return response
@staticmethod
def remove_future_blocks(course_blocks, include_start: bool):
"""
Mutates course_blocks in place:
- removes blocks whose 'start' is in the future
- also removes references to them from parents' 'children' lists
- removes 'start' key from all blocks if it wasn't requested
"""
if not course_blocks:
return course_blocks
now = datetime.now(timezone.utc)
# 1. Collect IDs of blocks to remove
to_remove = set()
for block_id, block in course_blocks.items():
get_field = block.get if include_start else block.pop
start = get_field("start")
if start and start > now:
to_remove.add(block_id)
if not to_remove:
return course_blocks
# 2. Remove the blocks themselves
for block_id in to_remove:
course_blocks.pop(block_id, None)
# 3. Clean up children lists
for block in course_blocks.values():
children = block.get("children")
if children:
block["children"] = [cid for cid in children if cid not in to_remove]
return course_blocks
@method_decorator(transaction.non_atomic_requests, name='dispatch')
@view_auth_classes(is_authenticated=False)

View File

@@ -26,6 +26,7 @@ from common.djangoapps.edxmako.shortcuts import render_to_string
from common.djangoapps.static_replace import replace_static_urls
from common.djangoapps.util.date_utils import strftime_localized
from lms.djangoapps import branding
from lms.djangoapps.course_api.blocks.utils import get_cached_transformed_blocks
from lms.djangoapps.course_blocks.api import get_course_blocks
from lms.djangoapps.courseware.access import has_access
from lms.djangoapps.courseware.access_response import (
@@ -636,7 +637,10 @@ def get_course_assignments(course_key, user, include_access=False, include_witho
store = modulestore()
course_usage_key = store.make_course_usage_key(course_key)
block_data = get_course_blocks(user, course_usage_key, allow_start_dates_in_future=True, include_completion=True)
block_data = get_cached_transformed_blocks() or get_course_blocks(
user, course_usage_key, allow_start_dates_in_future=True, include_completion=True
)
now = datetime.now(pytz.UTC)
assignments = []

View File

@@ -2,12 +2,12 @@
Code used to get and cache the requested course-data
"""
from lms.djangoapps.course_blocks.api import get_course_blocks
from openedx.core.djangoapps.content.block_structure.api import get_block_structure_manager
from xmodule.modulestore.django import modulestore # lint-amnesty, pylint: disable=wrong-import-order
from .transformer import GradesTransformer
from ..course_api.blocks.utils import get_cached_transformed_blocks
class CourseData:
@@ -56,7 +56,11 @@ class CourseData:
@property
def structure(self): # lint-amnesty, pylint: disable=missing-function-docstring
if self._structure is None:
self._structure = get_course_blocks(
# The get_course_blocks function proved to be a major time sink during a request at "blocks/".
# This caching logic helps improve the response time by getting a copy of the already transformed, but still
# unfiltered, course blocks from RequestCache and thus reducing the number of times that
# the get_course_blocks function is called.
self._structure = get_cached_transformed_blocks() or get_course_blocks(
self.user,
self.location,
collected_block_structure=self._collected_block_structure,

View File

@@ -432,6 +432,78 @@ class TestBlocksInfoInCourseView(TestBlocksInCourseView, MilestonesTestCaseMixin
for block_info in response.data['blocks'].values():
self.assertNotEqual('assignment_progress', block_info)
def test_response_keys(self):
response = self.verify_response(url=self.url)
data = response.data
expected_top_level_keys = {
'blocks',
'certificate',
'course_about',
'course_access_details',
'course_handouts',
'course_modes',
'course_progress',
'course_sharing_utm_parameters',
'course_updates',
'deprecate_youtube',
'discussion_url',
'end',
'enrollment_details',
'id',
'is_self_paced',
'media',
'name',
'number',
'org',
'org_logo',
'root',
'start',
'start_display',
'start_type'
}
expected_course_access_keys = {
"has_unmet_prerequisites",
"is_too_early",
"is_staff",
"audit_access_expires",
"courseware_access"
}
expected_courseware_access_keys = {
"has_access",
"error_code",
"developer_message",
"user_message",
"additional_context_user_message",
"user_fragment"
}
expected_enrollment_details_keys = {"created", "mode", "is_active", "upgrade_deadline"}
expected_media_keys = {"image"}
expected_image_keys = {"raw", "small", "large"}
expected_course_sharing_keys = {"facebook", "twitter"}
expected_course_modes_keys = {"slug", "sku", "android_sku", "ios_sku", "min_price"}
expected_course_progress_keys = {"total_assignments_count", "assignments_completed"}
self.assertSetEqual(set(data), expected_top_level_keys)
self.assertSetEqual(set(data["course_access_details"]), expected_course_access_keys)
self.assertSetEqual(set(data["course_access_details"]["courseware_access"]), expected_courseware_access_keys)
self.assertSetEqual(set(data["enrollment_details"]), expected_enrollment_details_keys)
self.assertSetEqual(set(data["media"]), expected_media_keys)
self.assertSetEqual(set(data["media"]["image"]), expected_image_keys)
self.assertSetEqual(set(data["course_sharing_utm_parameters"]), expected_course_sharing_keys)
self.assertSetEqual(set(data["course_modes"][0]), expected_course_modes_keys)
self.assertSetEqual(set(data["course_progress"]), expected_course_progress_keys)
def test_block_count_depends_on_depth_in_request_params(self):
response_depth_zero = self.verify_response(url=self.url, params={'depth': 0})
response_depth_one = self.verify_response(url=self.url, params={'depth': 1})
blocks_depth_zero = [block for block in self.store.get_items(self.course_key) if block.category == "course"]
blocks_depth_one = [
block for block in self.store.get_items(self.course_key) if block.category in ("course", "chapter")
]
self.assertEqual(len(response_depth_zero.data["blocks"]), len(blocks_depth_zero))
self.assertEqual(len(response_depth_one.data["blocks"]), len(blocks_depth_one))
class TestCourseEnrollmentDetailsView(MobileAPITestCase, MilestonesTestCaseMixin): # lint-amnesty, pylint: disable=test-inherits-tests
"""