Commit Graph

4 Commits

Author SHA1 Message Date
Feanil Patel
9cf2f9f298 Run 2to3 -f future . -w
This will remove imports from __future__ that are no longer needed.

https://docs.python.org/3.5/library/2to3.html#2to3fixer-future
2019-12-30 10:35:30 -05:00
Amit
5d7e58c3cb INCR-152: Run python-modernize on openedx/core/djangoapps/crawlers (#20487) 2019-05-09 10:52:31 -04:00
Andy Armstrong
93235d118d Reorder imports using isort (except lms and cms) 2017-05-30 16:04:54 -04:00
David Ormsbee
5ef1e08050 Disable student state writes for crawlers.
When crawlers like edX-downloader make requests on courseware, they are
often concurrently loading many units in the same sequence. This causes
contention for the rows in courseware_studentmodule that store the
student's state for various XBlocks/XModules, most notably for the
sequence, chapter, and course -- all of which record and update user
position information when loaded.

It would be nice if we could actually remove these writes altogether
and come up with a cleaner way of keeping track of the user's position.
In general, GETs should be side-effect free. However, any such change
would break backwards compatibility, and would require close
coordination with research teams to make sure they weren't negatively
affected.

This commit identifies crawlers by user agent (CrawlersConfig model),
and blocks student state writes if a crawler is detected. FieldDataCache
writes simply become no-ops. It doesn't actually alter the rendering
of the courseware in any way -- the main impact is that the blocks
won't record your most recent position, which is meaningless for
crawlers anyway.

This can also be used as a building block for other policy we want to
define around crawlers. We just have to be mindful that this only works
with "nice" crawlers who are honest in their user agents, and that
significantly more sophisticated (and costly) measures would be
necessary to prevent crawlers that try to be even trivially sneaky.

[PERF-403]
2017-01-26 09:36:53 -05:00