Removes expected part of EXPECTED_ERRORS with a variety of changes.
- In many placed in the code, "expected" was used to mean
"ignored and expected", and all such instances are renamed to "ignored".
- The setting ``EXPECTED_ERRORS`` is renamed to ``IGNORED_ERRORS``,
which better matches how it was being used in the first place.
- The setting ``EXPECTED_ERRORS[REASON_EXPECTED]`` is renamed to ``IGNORED_ERRORS[REASON_IGNORED]``.
- The setting toggle ``EXPECTED_ERRORS[IS_IGNORED]`` is removed,
because it will now always be True.
- The how-to will is renamed to how_tos/logging-and-monitoring-ignored-errors.rst.
See 0002-logging-and-monitoring-expected-errors-removed.rst for more details.
Implements DEPR: https://github.com/openedx/edx-platform/issues/32405
**BREAKING CHANGE:** The rename of the setting ``EXPECTED_ERRORS`` to
``IGNORED_ERRORS``, and ``REASON_EXPECTED`` to ``REASON_IGNORED``,
was implemented without backward compatibility. Simply copy the old settings
with the new name as an expand phase before deleting the old names in the
contract phase.
Processing cookies at response time included cookies
that were temporary, like the JWT cookie that is
created by the server by combining the JWT header-payload
and JWT signature cookies. Since we are trying to monitor
the cookie header, we do not want to process this cookie.
However, since we want to include the user id in the logging
message, we delay the logging until response time.
Also, fixed docstring which mislabeled a custom attribute.
ARCHBOM-2055
In case of unusual cookie headers containing "Cookie ",
add custom attributes for monitoring:
- cookies.header.corrupt_count
- cookies.header.corrupt_key_count
See annotation documentation for more details.
Separately, updated to skip cookie log sampling for
0 size cookie header.
ARCHBOM-2055
Contains a number of cookie monitoring changes.
Enhancements:
- Add sampling capability for cookie logging on headers
smaller than the threshold. For details, see
COOKIE_SAMPLING_REQUEST_COUNT.
- Add cookie header size to log message.
- Sort logged cookies starting with largest cookie.
- Move logging from Middleware request processing
to response processing to ensure the user id is
available for logging for authenticated calls.
- Added cookies.header.size.computed to check
if there are any large hidden duplicate cookies.
Can be compared against the cookies.header.size
custom attribute.
- Add delimiters into logs to make it simpler to parse
when the logging tools accidentally exports multiple
log lines together.
Removed:
- Legacy cookie capture code. This code was dangerous to
to enable and provided more limited insight than the
newer logging, so this was removed to simplify the code.
Other refactors:
- Switched Middleware to use new Django format, rather
than the Mixin.
- Moved tests to its own test class. Note: this
middleware is likely to move to a separate
library.
ARCHBOM-2055
This should really be all we need for most cases, and we don't want to
emit sensitive data more than necessary, even encrypted. If we need to
inspect one cookie in particular, we can add special logging for that.
Also, change to greater-than-or-equal for threshold to match setting docs.
ref: ARCHBOM-2042
The learning MFE paths include /course/{course_id} which doesn't match /courses/{course_id} which is what the regex expects. This causes issues with the Wiki when when accessed from the learning MFE doesn't detect that course it's related to in the middleware.
* fix: adding more parameters for cookie monitoring
Added: cookies_total_num, cookies_unaccounted_size.
-Both are to help us gauge how many cookies we are not collecting data for.
Increased: # of cookies data collected
The `error_expected` custom attribute used to contain
both the class name and the error message. This had
the following issues:
* Combining data in the same custom attribute limits
the ability to query.
* The additional error class and message data is only
needed for ignored errors, since this data isn't
available elsewhere.
The following changes were made:
* `error_expected` will always have the value True
if present.
* `error_ignored` no longer exists.
* `error_ignored_class` will contain the error module
and class for ignored errors.
* `error_ignored_message` will contain the error message
for ignored errors.
ARCHBOM-1708
Adds logging and monitoring capabilities for expected
errors. See the ADR and how-to documentation for
details of how to configure and use the EXPECTED_ERRORS
setting and new monitoring and logging.
ARCHBOM-1708
Co-authored-by: Tim McCormack <tmccormack@edx.org>
This reverts commit 0517603b6d.
This was masking a LabXchange error by blowing up with:
"Stack trace builtins:AttributeError: 'NoneType' object has no attribute 'status_code'"
The mobile app is getting unexpected 403s from
/oauth2/exchange_access_token/, but we have been unable
to pinpoint from where they are coming. This commit
introduces a temporary exception handler to provide stack info
for 403s on this endpoint to try to track down the source.
Requires the ENABLE_403_MONITORING setting to be set to
True to enable the logging.
ARCHBOM-1667
Instead of adding new attributes for each cookie name we create
consistent attribute names. This should prevent any issues where we
have too many different unique attribute names because the cookie names
are unique to the user.
We added two new settings to make the number of cookies and groups
capture configurable:
- TOP_N_COOKIES_CAPTURED
- TOP_N_COOKIE_GROUPS_CAPTURED
Setting a new metric per cookie name resulted in a lot of metrics
getting added to New Relic. In some cases, this was causing other
more important metrics to not get registered.
We want to be able to easily figure out what our biggest cookies are and we
want to also group cookies by prefix because certain services create multiple
cookies and then put unique identifiers in the cookie name.
For example braze cookie names use the following pattern:
ab.storage.<userId>
ab.storage.<deviceId>
ab.storage.<sessionId>
In this case we want to group all the `ab` cookies together so we can see
their total size.
New attributes:
cookies.<group_prefix>.group.size: The size of a group of cookies. For example
the sum of the size of all braze cookies would be the value of the
`cookies.ab.group.size` attribute.
cookies.max.name: The name of the largest cookie sent by the user.
cookies.max.size: The size of the largest cookie sent by the user.
cookies.max.group.name: The name of the largest group of cookies. A single cookie
counts as a group of one for this calculation.
cookies.max.group.size: The sum total size of all the cookies in the largest group.
By explicitly importing the legacy namespace classes, we make it clear
that we are using soon-to-be-deprecated classes. We will then be able to
start removing the legacy classes, one module at a time.
This fixes a misuse of New Relic terminology. Here we are in fact using
custom attributes; custom metrics are a different thing that we may start
using in the future.
Instead of going up the stacktrace to find the module names of waffle
flags and switches, we manually pass the module __name__ whenever the
flag is created. This is similar to `logging.getLogger(__name__)`
standard behaviour.
As the waffle classes are used outside of edx-platform, we make the new
module_name argument an optional keyword argument. This will change once
we pull waffle_utils outside of edx-platform.
Note that the module name is normally only required to view the list of
existing waffle flags and switches. The module name should not be
necessary to verify if a flag is enabled. Thus, maybe it would make
sense to create a `add` class methor similar to:
class WaffleFlag:
@classmethod
def add(cls, namespace, flag, module):
instance = cls(namespace, flag)
cls._class_instances.add((instance, module))
This adds middleware that will create custom parameter metrics in
New Relic to track the size of all the cookies being received for
our domain. The custom fields are "cookies_total_size" and a
separate named parameter for every cookie size, e.g.
"cookies.csrftoken.size".
This is intended to help us track cookie growth and better diagnose
issues where users lose their sessions. It is toggled by the
'request_utils.capture_cookie_sizes' Waffle Flag.