feat: [FC-0092] Optimize Course Info Blocks API (#37122)

The Course Info Blocks API endpoint has been known to be rather slow to return the response. Previous investigation showed that the major time sink was the get_course_blocks function, which is called three times in a single request. This commit aims to improve the response times by reducing the number of times that this function is called. Solution Summary The first time the function get_course_blocks is called, the result (transformed course blocks) is stored in the current WSGI request object. Later in the same request, before the second get_course_blocks call is triggered, the already transformed course blocks are taken from the request object, and if they are available, get_course_blocks is not called (if not, it is called as a fallback). Later in the request, the function is called again as before (see Optimization Strategy and Difficulties). Optimization Strategy and Difficulties The original idea was to fetch and transform the course blocks once and reuse them in all three cases, which would reduce get_course_blocks call count to 1. However, this did not turn out to be a viable solution because of the arguments passed to get_course_blocks. Notably, the allow_start_dates_in_future boolean flag affects the behavior of StartDateTransformer, which is a filtering transformer modifying the block structure returned. The first two times allow_start_dates_in_future is False, the third time it is True. Setting it to True in all three cases would mean that some blocks would be incorrectly included in the response. This left us with one option - optimize the first two calls. The difference between the first two calls is the non-filtering transformers, however the second call applies a subset of transformers from the first call, so it was safe to apply the superset of transformers in both cases. This allowed to reduce the number of function calls to 2. However, the cached structure may be further mutated by filters downstream, which means we need to cache a copy of the course structure (not the structure itself). The copy method itself is quite heavy (it calls deepcopy three times), making the benefits of this solution much less tangible. In fact, another potential optimization that was considered was to reuse the collected block structure (pre-transformation), but since calling copy on a collected structure proved to be more time-consuming than calling get_collected, this change was discarded, considering that the goal is to improve performance. Revised Solution To achieve a more tangible performance improvement, it was decided to modify the previous strategy as follows: * Pass a for_blocks_view parameter to the get_blocks function to make sure the new caching logic only affects the blocks view. * Collect and cache course blocks with future dates included. * Include start key in requested fields. * Reuse the cached blocks in the third call, which is in get_course_assignments * Before returning the response, filter out any blocks with a future start date, and also remove the start key if it was not in requested fields
2025-10-30 23:23:49 +02:00
parent ec1d9593d9
commit 7cd4170ca7
6 changed files with 168 additions and 5 deletions
--- a/lms/djangoapps/course_api/blocks/api.py
+++ b/lms/djangoapps/course_api/blocks/api.py
@@ -1,7 +1,7 @@
 """
 API function for retrieving course blocks data
 """
-
+from edx_django_utils.cache import RequestCache

 import lms.djangoapps.course_blocks.api as course_blocks_api
 from lms.djangoapps.course_blocks.transformers.access_denied_filter import AccessDeniedMessageFilterTransformer
@@ -14,6 +14,7 @@ from .serializers import BlockDictSerializer, BlockSerializer
 from .toggles import HIDE_ACCESS_DENIALS_FLAG
 from .transformers.blocks_api import BlocksAPITransformer
 from .transformers.milestones import MilestonesAndSpecialExamsTransformer
+from .utils import COURSE_API_REQUEST_CACHE_NAMESPACE, REUSABLE_BLOCKS_CACHE_KEY


 def get_blocks(
@@ -29,6 +30,7 @@ def get_blocks(
        block_types_filter=None,
        hide_access_denials=False,
        allow_start_dates_in_future=False,
+        cache_with_future_dates=False,
 ):
    """
    Return a serialized representation of the course blocks.
@@ -61,6 +63,7 @@ def get_blocks(
        allow_start_dates_in_future (bool): When True, will allow blocks to be
            returned that can bypass the StartDateTransformer's filter to show
            blocks with start dates in the future.
+        cache_with_future_dates (bool): When True, will use the block caching logic using RequestCache
    """

    if HIDE_ACCESS_DENIALS_FLAG.is_enabled():
@@ -118,6 +121,10 @@ def get_blocks(
        ),
    ]

+    if cache_with_future_dates:
+        # Include future dates such that get_course_assignments can reuse the block structure from RequestCache
+        allow_start_dates_in_future = True
+
    # transform
    blocks = course_blocks_api.get_course_blocks(
        user,
@@ -128,6 +135,19 @@ def get_blocks(
        include_has_scheduled_content=include_has_scheduled_content
    )

+    if cache_with_future_dates:
+        # Store a copy of the transformed, but still unfiltered, course blocks in RequestCache to be reused
+        # wherever possible for optimization. Copying is required to make sure the cached structure is not mutated
+        # by the filtering below.
+        request_cache = RequestCache(COURSE_API_REQUEST_CACHE_NAMESPACE)
+        request_cache.set(REUSABLE_BLOCKS_CACHE_KEY, blocks.copy())
+
+        # Since we included blocks with future start dates in our block structure,
+        # we need to include the 'start' field to filter out such blocks before returning the response.
+        # If 'start' field is not requested, it will be removed from the response.
+        requested_fields = set(requested_fields)
+        requested_fields.add('start')
+
    # filter blocks by types
    if block_types_filter:
        block_keys_to_remove = []
@@ -142,7 +162,7 @@ def get_blocks(
    serializer_context = {
        'request': request,
        'block_structure': blocks,
-        'requested_fields': requested_fields or [],
+        'requested_fields': requested_fields,
    }

    if return_type == 'dict':
--- a/lms/djangoapps/course_api/blocks/utils.py
+++ b/lms/djangoapps/course_api/blocks/utils.py
@@ -1,6 +1,7 @@
 """
   Utils for Blocks
 """
+from edx_django_utils.cache import RequestCache
 from rest_framework.utils.serializer_helpers import ReturnList

 from openedx.core.djangoapps.discussions.models import (
@@ -9,6 +10,10 @@ from openedx.core.djangoapps.discussions.models import (
 )


+COURSE_API_REQUEST_CACHE_NAMESPACE = "course_api"
+REUSABLE_BLOCKS_CACHE_KEY = "reusable_transformed_blocks"
+
+
 def filter_discussion_xblocks_from_response(response, course_key):
    """
    Removes discussion xblocks if discussion provider is openedx.
@@ -63,3 +68,18 @@ def filter_discussion_xblocks_from_response(response, course_key):
        response.data['blocks'] = filtered_blocks

    return response
+
+
+def get_cached_transformed_blocks():
+    """
+    Helper function to get an unfiltered course structure from RequestCache,
+    including blocks with start dates in the future.
+
+    Caution: For performance reasons, the function returns the structure object itself, not its copy.
+    This means the retrieved structure is supposed to be read-only and should not be mutated by consumers.
+    """
+    request_cache = RequestCache(COURSE_API_REQUEST_CACHE_NAMESPACE)
+    cached_response = request_cache.get_cached_response(REUSABLE_BLOCKS_CACHE_KEY)
+    reusable_transformed_blocks = cached_response.value if cached_response.is_found else None
+
+    return reusable_transformed_blocks
--- a/lms/djangoapps/course_api/blocks/views.py
+++ b/lms/djangoapps/course_api/blocks/views.py
@@ -2,6 +2,7 @@
 CourseBlocks API views
 """

+from datetime import datetime, timezone

 from django.core.exceptions import ValidationError
 from django.db import transaction
@@ -237,6 +238,7 @@ class BlocksView(DeveloperErrorViewMixin, ListAPIView):
                    params.cleaned_data['return_type'],
                    params.cleaned_data.get('block_types_filter', None),
                    hide_access_denials=hide_access_denials,
+                    cache_with_future_dates=True
                )
            )
            # If the username is an empty string, and not None, then we are requesting
@@ -339,9 +341,50 @@ class BlocksInCourseView(BlocksView):
        if not root:
            raise ValidationError(f"Unable to find course block in '{course_key_string}'")

+        # Earlier we included blocks with future start dates in the collected/cached block structure.
+        # Now we need to emulate allow_start_dates_in_future=False by removing any such blocks.
+        include_start = "start" in request.query_params['requested_fields']
+        self.remove_future_blocks(course_blocks, include_start)
+
        recurse_mark_complete(root, course_blocks)
        return response

+    @staticmethod
+    def remove_future_blocks(course_blocks, include_start: bool):
+        """
+        Mutates course_blocks in place:
+        - removes blocks whose 'start' is in the future
+        - also removes references to them from parents' 'children' lists
+        - removes 'start' key from all blocks if it wasn't requested
+        """
+        if not course_blocks:
+            return course_blocks
+
+        now = datetime.now(timezone.utc)
+
+        # 1. Collect IDs of blocks to remove
+        to_remove = set()
+        for block_id, block in course_blocks.items():
+            get_field = block.get if include_start else block.pop
+            start = get_field("start")
+            if start and start > now:
+                to_remove.add(block_id)
+
+        if not to_remove:
+            return course_blocks
+
+        # 2. Remove the blocks themselves
+        for block_id in to_remove:
+            course_blocks.pop(block_id, None)
+
+        # 3. Clean up children lists
+        for block in course_blocks.values():
+            children = block.get("children")
+            if children:
+                block["children"] = [cid for cid in children if cid not in to_remove]
+
+        return course_blocks
+

@method_decorator(transaction.non_atomic_requests, name='dispatch')
@view_auth_classes(is_authenticated=False)
--- a/lms/djangoapps/courseware/courses.py
+++ b/lms/djangoapps/courseware/courses.py
@@ -26,6 +26,7 @@ from common.djangoapps.edxmako.shortcuts import render_to_string
 from common.djangoapps.static_replace import replace_static_urls
 from common.djangoapps.util.date_utils import strftime_localized
 from lms.djangoapps import branding
+from lms.djangoapps.course_api.blocks.utils import get_cached_transformed_blocks
 from lms.djangoapps.course_blocks.api import get_course_blocks
 from lms.djangoapps.courseware.access import has_access
 from lms.djangoapps.courseware.access_response import (
@@ -636,7 +637,10 @@ def get_course_assignments(course_key, user, include_access=False, include_witho

    store = modulestore()
    course_usage_key = store.make_course_usage_key(course_key)
-    block_data = get_course_blocks(user, course_usage_key, allow_start_dates_in_future=True, include_completion=True)
+
+    block_data = get_cached_transformed_blocks() or get_course_blocks(
+        user, course_usage_key, allow_start_dates_in_future=True, include_completion=True
+    )

    now = datetime.now(pytz.UTC)
    assignments = []
--- a/lms/djangoapps/grades/course_data.py
+++ b/lms/djangoapps/grades/course_data.py
@@ -2,12 +2,12 @@
 Code used to get and cache the requested course-data
 """

-
 from lms.djangoapps.course_blocks.api import get_course_blocks
 from openedx.core.djangoapps.content.block_structure.api import get_block_structure_manager
 from xmodule.modulestore.django import modulestore  # lint-amnesty, pylint: disable=wrong-import-order

 from .transformer import GradesTransformer
+from ..course_api.blocks.utils import get_cached_transformed_blocks


 class CourseData:
@@ -56,7 +56,11 @@ class CourseData:
    @property
    def structure(self):  # lint-amnesty, pylint: disable=missing-function-docstring
        if self._structure is None:
-            self._structure = get_course_blocks(
+            # The get_course_blocks function proved to be a major time sink during a request at "blocks/".
+            # This caching logic helps improve the response time by getting a copy of the already transformed, but still
+            # unfiltered, course blocks from RequestCache and thus reducing the number of times that
+            # the get_course_blocks function is called.
+            self._structure = get_cached_transformed_blocks() or get_course_blocks(
                self.user,
                self.location,
                collected_block_structure=self._collected_block_structure,
--- a/lms/djangoapps/mobile_api/tests/test_course_info_views.py
+++ b/lms/djangoapps/mobile_api/tests/test_course_info_views.py
@@ -432,6 +432,78 @@ class TestBlocksInfoInCourseView(TestBlocksInCourseView, MilestonesTestCaseMixin
        for block_info in response.data['blocks'].values():
            self.assertNotEqual('assignment_progress', block_info)

+    def test_response_keys(self):
+        response = self.verify_response(url=self.url)
+        data = response.data
+
+        expected_top_level_keys = {
+            'blocks',
+            'certificate',
+            'course_about',
+            'course_access_details',
+            'course_handouts',
+            'course_modes',
+            'course_progress',
+            'course_sharing_utm_parameters',
+            'course_updates',
+            'deprecate_youtube',
+            'discussion_url',
+            'end',
+            'enrollment_details',
+            'id',
+            'is_self_paced',
+            'media',
+            'name',
+            'number',
+            'org',
+            'org_logo',
+            'root',
+            'start',
+            'start_display',
+            'start_type'
+        }
+        expected_course_access_keys = {
+            "has_unmet_prerequisites",
+            "is_too_early",
+            "is_staff",
+            "audit_access_expires",
+            "courseware_access"
+        }
+        expected_courseware_access_keys = {
+            "has_access",
+            "error_code",
+            "developer_message",
+            "user_message",
+            "additional_context_user_message",
+            "user_fragment"
+        }
+        expected_enrollment_details_keys = {"created", "mode", "is_active", "upgrade_deadline"}
+        expected_media_keys = {"image"}
+        expected_image_keys = {"raw", "small", "large"}
+        expected_course_sharing_keys = {"facebook", "twitter"}
+        expected_course_modes_keys = {"slug", "sku", "android_sku", "ios_sku", "min_price"}
+        expected_course_progress_keys = {"total_assignments_count", "assignments_completed"}
+
+        self.assertSetEqual(set(data), expected_top_level_keys)
+        self.assertSetEqual(set(data["course_access_details"]), expected_course_access_keys)
+        self.assertSetEqual(set(data["course_access_details"]["courseware_access"]), expected_courseware_access_keys)
+        self.assertSetEqual(set(data["enrollment_details"]), expected_enrollment_details_keys)
+        self.assertSetEqual(set(data["media"]), expected_media_keys)
+        self.assertSetEqual(set(data["media"]["image"]), expected_image_keys)
+        self.assertSetEqual(set(data["course_sharing_utm_parameters"]), expected_course_sharing_keys)
+        self.assertSetEqual(set(data["course_modes"][0]), expected_course_modes_keys)
+        self.assertSetEqual(set(data["course_progress"]), expected_course_progress_keys)
+
+    def test_block_count_depends_on_depth_in_request_params(self):
+        response_depth_zero = self.verify_response(url=self.url, params={'depth': 0})
+        response_depth_one = self.verify_response(url=self.url, params={'depth': 1})
+        blocks_depth_zero = [block for block in self.store.get_items(self.course_key) if block.category == "course"]
+        blocks_depth_one = [
+            block for block in self.store.get_items(self.course_key) if block.category in ("course", "chapter")
+        ]
+        self.assertEqual(len(response_depth_zero.data["blocks"]), len(blocks_depth_zero))
+        self.assertEqual(len(response_depth_one.data["blocks"]), len(blocks_depth_one))
+

 class TestCourseEnrollmentDetailsView(MobileAPITestCase, MilestonesTestCaseMixin):  # lint-amnesty, pylint: disable=test-inherits-tests
    """