Instead of optionally not logging usernames and emails, do so by
default. This mostly removes some complexity from the app and is makes
it so that it's more secure by default.
I considered the question of allowing people to log usernames and
e-mails if they wanted to but opted not to for a couple of reasons:
* It would involve adding a new feature flag that would be the opposite
of the SQUELCH_PII_IN_LOGS which would be a bit confusing. When do you
use which one? or do you need both? etc.
* There is still a way to correlate the messages to eachother and in
most cases also to a specific user(email being the exception).
The size of commons.js has gradually grown until it is now 4 MB in
dev mode. This change brings it back down to 880 KB. This does
cause the size of some other JS assets to increase, some by as much
as 500 KB. This still seemed like a worthwhile tradeoff.
Now that we always return an existing value from the DB rather than trusting that ID generation is deterministic and constant over time, we're free to change the generation algorithm.
Our long term goal is to switch to random IDs, but we need to first investigate the uses of save=False. In the meantime, this is a good opportunity to move away from MD5, which has a number of cryptographic weaknesses. None of the known vulnerabilities are considered exploitable in this location, given the limited ability to control the input to the hash, but we should generally be moving away from it everywhere for consistency.
This change should not be breaking even for save=False callers, since those calls are extremely rare (1 in 100,000) and should only occur after a save=True call, at which point they'll use the stored value. Even if this were not true, for a save=False/True pair of calls to result in a mismatch in output, the first of the calls would have to occur around the time of the deploy of this code.
Co-authored-by: Tim McCormack <tmccormack@edx.org>
Co-authored-by: Tim McCormack <tmccormack@edx.org>
This deprecates `save=False` for several functions and removes all known
usages of the parameter but does not actually remove the parameter.
Instead, it will emit a deprecation warning if the parameter is used.
We can remove the parameter as soon as we feel sure nothing is using it.
Now that we have refactored `anonymous_id_for_user` to always prefer
retrieving an existing ID from the database -- and observed that only a
small fraction of calls pass save=False -- we can stop respecting
save=False. This opens the door for future improvements, such as generating
random IDs or switching to the external user ID system.
Metrics: I observe that 1 in 16 requests for new, non-request-cached
anon user IDs are made with save=False. But 71% of all calls are served
from the request cache, and 99.7% of the misses are served from the DB.
save=False only appear to come from intermittent spikes as reports are
generated and are low in absolute number.
Also document usage/risk/rotation of secret in anonymous user ID
generation as indicated by `docs/decisions/0008-secret-key-usage.rst`
ADR on `SECRET_KEY` usage.
ref: ARCHBOM-1683
Currently it's hard to see the content of an error without knowing how
to cause an existing view to make that error in production. Adding
these default paths should make that a lot easier.
See context here: https://django-ratelimit.readthedocs.io/en/latest/cookbook/429.html#context
For now we continue to fall back to django's default 403 handler for 403
but provide a new 429 template that we use for ratelimit exceptions.
This commit also updates a logistration test that relied on the old 403
behavior of django-ratelimit instead of the newly added 429 behavior.