Files
edx-platform/lms/djangoapps/grades/course_data.py
Serhiii Nanai 7cd4170ca7 feat: [FC-0092] Optimize Course Info Blocks API (#37122)
The Course Info Blocks API endpoint has been known to be rather slow
to return the response. Previous investigation showed that the major
time sink was the get_course_blocks function, which is called three
times in a single request. This commit aims to improve the response
times by reducing the number of times that this function is called.

Solution Summary

The first time the function get_course_blocks is called, the result
(transformed course blocks) is stored in the current WSGI request
object. Later in the same request, before the second get_course_blocks
call is triggered, the already transformed course blocks are taken
from the request object, and if they are available, get_course_blocks
is not called (if not, it is called as a fallback). Later in the
request, the function is called again as before (see Optimization
Strategy and Difficulties).

Optimization Strategy and Difficulties

The original idea was to fetch and transform the course blocks once
and reuse them in all three cases, which would reduce get_course_blocks
call count to 1. However, this did not turn out to be a viable solution
because of the arguments passed to get_course_blocks. Notably, the
allow_start_dates_in_future boolean flag affects the behavior of
StartDateTransformer, which is a filtering transformer modifying the
block structure returned.

The first two times allow_start_dates_in_future is False, the third
time it is True. Setting it to True in all three cases would mean that
some blocks would be incorrectly included in the response.

This left us with one option - optimize the first two calls. The
difference between the first two calls is the non-filtering
transformers, however the second call applies a subset of transformers
from the first call, so it was safe to apply the superset of
transformers in both cases. This allowed to reduce the number of
function calls to 2. However, the cached structure may be further
mutated by filters downstream, which means we need to cache a copy of
the course structure (not the structure itself). The copy method itself
is quite heavy (it calls deepcopy three times), making the benefits of
this solution much less tangible. In fact, another potential
optimization that was considered was to reuse the collected block
structure (pre-transformation), but since calling copy on a collected
structure proved to be more time-consuming than calling get_collected,
this change was discarded, considering that the goal is to improve
performance.

Revised Solution

To achieve a more tangible performance improvement, it was decided to
modify the previous strategy as follows:

* Pass a for_blocks_view parameter to the get_blocks function to make
  sure the new caching logic only affects the blocks view.
* Collect and cache course blocks with future dates included.
* Include start key in requested fields.
* Reuse the cached blocks in the third call, which is in
  get_course_assignments
* Before returning the response, filter out any blocks with a future
  start date, and also remove the start key if it was not in requested
  fields
2025-10-30 17:23:49 -04:00

137 lines
5.7 KiB
Python

"""
Code used to get and cache the requested course-data
"""
from lms.djangoapps.course_blocks.api import get_course_blocks
from openedx.core.djangoapps.content.block_structure.api import get_block_structure_manager
from xmodule.modulestore.django import modulestore # lint-amnesty, pylint: disable=wrong-import-order
from .transformer import GradesTransformer
from ..course_api.blocks.utils import get_cached_transformed_blocks
class CourseData:
"""
Utility access layer to intelligently get and cache the
requested course data as long as at least one property is
provided upon initialization.
This is an in-memory object that maintains its own internal
cache during its lifecycle.
"""
def __init__(self, user, course=None, collected_block_structure=None, structure=None, course_key=None):
if not any((course, collected_block_structure, structure, course_key)):
raise ValueError(
"You must specify one of course, collected_block_structure, structure, or course_key to this method."
)
self.user = user
self._collected_block_structure = collected_block_structure
self._structure = structure
self._course = course
self._course_key = course_key
self._location = None
@property
def course_key(self): # lint-amnesty, pylint: disable=missing-function-docstring
if not self._course_key:
if self._course:
self._course_key = self._course.id
else:
structure = self.effective_structure
self._course_key = structure.root_block_usage_key.course_key
return self._course_key
@property
def location(self): # lint-amnesty, pylint: disable=missing-function-docstring
if not self._location:
structure = self.effective_structure
if structure:
self._location = structure.root_block_usage_key
elif self._course:
self._location = self._course.location
else:
self._location = modulestore().make_course_usage_key(self.course_key)
return self._location
@property
def structure(self): # lint-amnesty, pylint: disable=missing-function-docstring
if self._structure is None:
# The get_course_blocks function proved to be a major time sink during a request at "blocks/".
# This caching logic helps improve the response time by getting a copy of the already transformed, but still
# unfiltered, course blocks from RequestCache and thus reducing the number of times that
# the get_course_blocks function is called.
self._structure = get_cached_transformed_blocks() or get_course_blocks(
self.user,
self.location,
collected_block_structure=self._collected_block_structure,
)
return self._structure
@property
def collected_structure(self):
if self._collected_block_structure is None:
self._collected_block_structure = get_block_structure_manager(self.course_key).get_collected()
return self._collected_block_structure
@property
def course(self):
if not self._course:
self._course = modulestore().get_course(self.course_key)
return self._course
@property
def grading_policy_hash(self): # lint-amnesty, pylint: disable=missing-function-docstring
structure = self.effective_structure
if structure:
return structure.get_transformer_block_field(
structure.root_block_usage_key,
GradesTransformer,
'grading_policy_hash',
)
else:
return GradesTransformer.grading_policy_hash(self.course)
@property
def version(self):
structure = self.effective_structure
course_block = structure[self.location] if structure else self.course
return getattr(course_block, 'course_version', None)
@property
def edited_on(self): # lint-amnesty, pylint: disable=missing-function-docstring
# get course block from structure only; subtree_edited_on field on modulestore's course block isn't optimized.
structure = self.effective_structure
if structure:
course_block = structure[self.location]
return getattr(course_block, 'subtree_edited_on', None)
def __str__(self):
"""
Return human-readable string representation.
"""
return f'Course: course_key: {self.course_key}'
def full_string(self): # lint-amnesty, pylint: disable=missing-function-docstring
if self.effective_structure:
return 'Course: course_key: {}, version: {}, edited_on: {}, grading_policy: {}'.format(
self.course_key, self.version, self.edited_on, self.grading_policy_hash,
)
else:
return f'Course: course_key: {self.course_key}, empty course structure'
@property
def effective_structure(self):
"""
Get whichever course block structure is already loaded, if any.
This may give either the user-specific course structure or the generic
structure, depending on which is cached at the moment. Because of that,
this should only be used for queries related to the root block of the
course, which will always exist in either structure.
For anything else, such as queries involving course sections or blocks,
use either .structure or .collected_structure to explicitly state
whether you want the user-specific version of the course or not.
"""
return self._structure or self._collected_block_structure