Previously, our rate-limiting code trusted the entire `X-Forwarded-For` header, allowing a malicious client to spoof that header and evade rate-limiting. This commit introduces a new module and setting allowing us to make a more conservative choice of IPs.

- Create new `openedx.core.djangoapps.util.ip` module for producing the IP "external chain" for requests based on the XFF header and the REMOTE_ADDR.
- Include a function that gives the safest choice of IPs.
- Add new setting `CLOSEST_CLIENT_IP_FROM_HEADERS` for configuring how the external chain is derived (i.e. setting the trust boundary). Currently has a default, but we may want to make it mandatory in the future.
- Change `django-ratelimit` code to use the proximate IP in the external chain -- the one just outside the trust boundary.

Also:

- Change `XForwardedForMiddleware` to use more conservative choice for its `REMOTE_ADDR` override
- Other adjustments to `XForwardedForMiddleware` as needed in order to initialize new module and support code that needs the real `REMOTE_ADDR` value
- Metrics for observability into the change (and XFF composition)
- Feature switch to restore legacy mode if needed

This also gives us a path forward to removing use of the django-ipware package, which is no longer maintained and has a handful of bugs that make it difficult to use safely.

Internal ticket: ARCHBOM-2056
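To make the spoofing problem concrete, here is a minimal standalone sketch (not the module's own code) that compares the legacy leftmost choice with a rightmost-public-IP choice. The helper names `leftmost_ip` and `rightmost_public_ip` and the sample addresses are assumptions made purely for illustration, and the fallback when no public IP is found is simplified compared to the module.

```python
import ipaddress


def leftmost_ip(xff, remote_addr):
    """Legacy-style choice: trust the first X-Forwarded-For entry."""
    entries = [s.strip() for s in xff.split(',') if s.strip()]
    return entries[0] if entries else remote_addr


def rightmost_public_ip(xff, remote_addr):
    """Conservative choice: walk the chain from the right, skipping private IPs."""
    chain = [s.strip() for s in xff.split(',') if s.strip()] + [remote_addr]
    for ip_str in reversed(chain):
        try:
            ip = ipaddress.ip_address(ip_str)
        except ValueError:
            break  # Nothing to the left of an unparseable entry is usable.
        if ip.is_global:
            return ip_str
    # Simplified fallback for this sketch: no public IP was found.
    return remote_addr


# Hypothetical request: the client (really 8.8.4.4) sent the header
# "X-Forwarded-For: 1.1.1.1". A load balancer at 10.0.0.5 appended the
# address it actually saw and relayed the request to the application.
xff = "1.1.1.1, 8.8.4.4"
remote_addr = "10.0.0.5"

spoofable = leftmost_ip(xff, remote_addr)      # the value the client invented
safer = rightmost_public_ip(xff, remote_addr)  # the address the load balancer saw
```

A rate limiter keyed on `spoofable` can be evaded by changing the header on every request, while `safer` only changes if the client actually connects from a different address.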
480 lines
20 KiB
Python
"""
Utilities for determining the IP address of a request.


Summary
=======

For developers:

- Call ``get_safest_client_ip`` whenever you want to know the caller's IP address
- Make sure ``init_client_ips`` is called as early as possible in the middleware stack
- See the "Guidance for developers" section for more advanced usage

For site operators:

- See the "Configuration" section for important information and guidance

For everyone:

- Background information is available in the "Concepts" section


Concepts
========

- The *IP chain* is the list of IPs in the ``X-Forwarded-For`` (XFF) header followed
  by the ``REMOTE_ADDR`` value. If all involved parties are telling the truth, this
  is the list of IP addresses that have relayed the HTTP request. However, due to
  the possibility of spoofing, this raw data cannot be used directly for all
  purposes:

  - The rightmost IP in the chain is the IP that has directly connected with the
    server and sent or relayed the request. In most deployments, this is likely
    to be a reverse proxy such as nginx. In any case it is the "closest" IP (in
    the sense of the request chain, not in terms of geographic proximity).
  - The next closest IP, if present, is the one that the closest IP *claims*
    sent the request to it. Each IP in the chain can only vouch for the
    correctness of the IP immediately to its left in the list.
  - In a normal, unspoofed request, the leftmost IP is the "real" client IP, the
    IP of the computer that made the original request.
  - However, clients can send a fake XFF header, so the leftmost IP in the chain
    cannot be trusted in the general case. In fact, the only IP that can be
    trusted absolutely is the rightmost one.
  - The challenge is to determine what the leftmost *trusted* IP is, as this is
    the most accurate choice we can make without compromising on security.

- The *external chain* is some prefix of the IP chain that stops before the
  (recognized) edge of the deployment's infrastructure. That is, the external
  chain is the portion of the IP chain that is to the left of some trust
  boundary, as determined by configuration or some fallback method. This is the
  list of IPs that can all plausibly be considered the "real" IP of the client.
  If the server is configured correctly this may contain, in order: any IPs
  spoofed by the client, the client's own IP, IPs of any forwarding HTTP proxies
  specified by the client, and then IPs of any reverse HTTP proxies the
  request passed through *before* reaching the deployment's own infrastructure
  (CDN, load balancer, etc.)

  - Caveat: In the case where the request is being sent through an anonymizing
    proxy such as a VPN, the VPN's exit node IP is considered the "real" client
    IP.
  - Despite the name, this chain may contain private-range IP addresses, in
    particular if a request originates from another server in the same
    datacenter.


Guidance for developers
=======================

Almost anywhere you care about the caller's IP address, just call ``get_safest_client_ip``.
This will get you the *rightmost* IP of the external chain (defined above).
Because it cannot be easily spoofed by the caller, it is suitable for adversarial
use-cases such as:

- Rate-limiting
- Only allowing certain IPs to access a resource (or alternatively, blocking them)

In some less common situations where you need the entire external chain, you
can call ``get_all_client_ips``. This returns a list of IP addresses, although for
the great majority of normal requests this will be a list of length 1. This list is
appropriate for when you're recording IPs for manual review or need to make a
decision based on all of the IPs (no matter which one is the "real" one). This might
include:

- Audit logs
- Telling a user about other active sessions on their account
- Georestriction

In some very rare cases you might want just a single IP that isn't rightmost. In
that case you can ask for the entire external chain and then take the leftmost
IP. This should only be used in non-adversarial situations, and is usually the wrong
choice, but may be appropriate for:

- Localization (if other HTTP headers aren't sufficient)
- Analytics


Configuration
=============

Configuration is via ``CLOSEST_CLIENT_IP_FROM_HEADERS``, which allows specifying
an HTTP header that will be trusted to report the rightmost IP in the external chain.
See setting annotation for details, but guidance on common configurations is provided
here:

- If you use a CDN as your outermost proxy:

  - Find what header your CDN sends to its origin that indicates the remote address it
    sees on inbound connections. For example, with Cloudflare this is ``CF-Connecting-IP``.
  - Ensure that your CDN always overrides this header if it exists in the inbound request,
    and never accepts a value provided by the client. Some CDNs are better than others
    about this.
  - Recommended setting, using Cloudflare as the example::

      CLOSEST_CLIENT_IP_FROM_HEADERS:
      - name: CF-Connecting-IP
        index: 0

    It would be equivalent to use ``-1`` as the index since there is always one and only
    one IP in this header, and Python list indexing rules are used here.
  - As a general rule, you should also ensure that traffic cannot bypass the CDN and reach
    your origin directly, since otherwise attackers will be able to spoof their IP address
    (and bypass protections your CDN provides). You may need to arrange for your CDN to set
    a header containing a shared secret.

- If your outermost proxy is an AWS ELB or other proxy on the same local network as your
  server, or you have any other configuration in which your proxies and application speak
  to each other using private-range IP addresses:

  - You can rely on the rightmost public IP in the IP chain to be the safest client IP.
    To do this, set your configuration for zero trusted headers::

      CLOSEST_CLIENT_IP_FROM_HEADERS: []

  - This assumes that 1) your outermost proxy always appends to ``X-Forwarded-For``, and
    2) any further proxies between that one and your application either append to it
    (ideal) or pass it along unchanged (not ideal, but workable). This is true by default
    for most proxy software.

- If you have any reverse proxy that will be seen by the next proxy or your application as
  having a public IP:

  - You'll need to rely on having a consistent *number* of proxies in front of your
    application, and you'll need to know which ones append to the ``X-Forwarded-For``
    header instead of just passing it unchanged.
  - Once you know the number of your proxies in the chain that append, you can use this
    count to say that the Nth-from-last IP in the ``X-Forwarded-For`` is the closest client
    IP. For example, if you had two, you would use ``-2`` (note the negative sign) to
    indicate the second-from-last IP::

      CLOSEST_CLIENT_IP_FROM_HEADERS:
      - name: X-Forwarded-For
        index: -2

  - This is fragile in the face of network configuration changes, so having your outermost
    proxy set a special header is preferred.
  - Configuring the proxy count too low will result in rate-limiting your own proxies;
    configuring it too high will allow attackers to bypass rate-limiting.
  - Side note: Even if you don't use it for ``CLOSEST_CLIENT_IP_FROM_HEADERS``, this
    proxy-counting approach will be required for configuring django-rest-framework's
    ``NUM_PROXIES`` setting.

- If your application is directly exposed to the public internet, without even a local proxy:

  - This is an unusual configuration, but simple to configure; with no proxies, just indicate
    that there are no trusted headers and therefore the closest public IP should be used::

      CLOSEST_CLIENT_IP_FROM_HEADERS: []
"""

import ipaddress
import warnings

from django.conf import settings
from edx_toggles.toggles import WaffleSwitch

# .. toggle_name: ip.legacy
# .. toggle_implementation: WaffleSwitch
# .. toggle_default: False
# .. toggle_description: Emergency switch to revert to use the older, less secure method for
#   IP determination. When enabled, instructs switch's callers to revert to using the *leftmost*
#   IP from the X-Forwarded-For header. When disabled (the default), callers should use the new
#   code path for IP determination, which has callers retrieve the entire external chain or pick
#   the leftmost or rightmost IP from it. The construction of the external chain is configurable
#   via ``CLOSEST_CLIENT_IP_FROM_HEADERS``.
#   This toggle, as well as any other legacy IP references, should be deleted (in the off
#   position) when the new IP code is well-tested and all IP-reliant code has been switched over.
# .. toggle_warning: This switch does not control the behavior of this module. Callers must
#   opt into querying this switch, and can call ``get_legacy_ip`` if the switch is enabled.
# .. toggle_use_cases: temporary
# .. toggle_creation_date: 2022-03-24
# .. toggle_target_removal_date: 2022-07-01
# .. toggle_tickets: https://openedx.atlassian.net/browse/ARCHBOM-2056 (internal only)
USE_LEGACY_IP = WaffleSwitch('ip.legacy', module_name=__name__)


def get_legacy_ip(request):
    """
    Return a client IP selected using an old, insecure method.

    Always picks the leftmost IP in the X-Forwarded-For header, if present,
    otherwise returns the original REMOTE_ADDR.
    """
    if xff := request.META.get('HTTP_X_FORWARDED_FOR'):
        return xff.split(',')[0].strip()
    else:
        # Might run before or after XForwardedForMiddleware.
        return request.META.get('ORIGINAL_REMOTE_ADDR', request.META['REMOTE_ADDR'])


def _get_meta_ip_strs(request, header_name):
    """
    Get a list of IPs from a header in the given request.

    Return the list of IPs the request is carrying on this header, which is
    expected to be comma-delimited if it contains more than one. Response
    may be an empty list for missing or empty header. List items may not be
    valid IPs.
    """
    if not header_name:
        return []

    field_name = 'HTTP_' + header_name.replace('-', '_').upper()
    header_value = request.META.get(field_name, '').strip()

    if header_value:
        return [s.strip() for s in header_value.split(',')]
    else:
        return []


def get_raw_ip_chain(request):
    """
    Retrieve the full IP chain from this request, as list of raw strings.

    This is uninterpreted and unparsed, except for splitting on commas and
    removing extraneous whitespace.
    """
    return _get_meta_ip_strs(request, 'X-Forwarded-For') + [request.META['REMOTE_ADDR']]


def _get_usable_ip_chain(request):
    """
    Retrieve the full IP chain from this request, as parsed addresses.

    The IP chain is the X-Forwarded-For header, followed by the REMOTE_ADDR.
    This list is then narrowed to the largest suffix that can be parsed as
    IP addresses.
    """
    parsed = []
    for ip_str in reversed(get_raw_ip_chain(request)):
        try:
            parsed.append(ipaddress.ip_address(ip_str))
        except ValueError:
            break
    return list(reversed(parsed))


def _remove_tail(elements, f_discard):
    """
    Remove items from the tail of the given list until f_discard returns false.

    - elements is a list
    - f_discard is a function that accepts an item from the list and returns
      true if it should be discarded from the tail

    Returns a new list that is a possibly-empty prefix of the input list.

    (This is basically itertools.dropwhile on a reversed list.)
    """
    prefix = elements[:]
    while prefix and f_discard(prefix[-1]):
        prefix.pop()
    return prefix


def _get_client_ips_via_xff(request):
    """
    Get the external chain of the request by discarding private IPs.

    This is a strategy used by ``get_all_client_ips`` and should not be used
    directly.

    Returns a list of *parsed* IP addresses, one of:

    - A list ending in a publicly routable IP
    - A list with a single, private-range IP
    - An empty list, if REMOTE_ADDR was unparseable as an IP address. This
      would be very unusual but could possibly happen if a local reverse proxy
      used a domain socket rather than a TCP connection.
    """
    ip_chain = _get_usable_ip_chain(request)
    external_chain = _remove_tail(ip_chain, lambda ip: not ip.is_global)

    # If the external_chain is in fact all private, everything will have been
    # removed. In that case, just return the leftmost IP it would have
    # considered, even though it must be a private IP.
    return external_chain or ip_chain[:1]


# .. setting_name: CLOSEST_CLIENT_IP_FROM_HEADERS
# .. setting_default: []
# .. setting_description: A list of header/index pairs to use for determining the IP in the
#   IP chain that is just outside of this deployment's infrastructure boundary -- that is,
#   the rightmost address in the IP chain that is *not* owned by the deployment. (See module
#   docstring for background and definitions, as well as guidance on configuration.)
#   Each list entry is a dict containing a header name and an index into that header. This will
#   control how the client's IP addresses are determined for attribution, tracking, rate-limiting,
#   or other general-purpose needs.
#   The named header must contain a list of IP addresses separated by commas, with whitespace
#   tolerated around each address. The index is used for a Python list lookup, e.g. 0 is the first
#   element and -2 is the second from the end.
#   Header/index pairs will be tried in turn until the first one that yields a usable IP, which
#   will then be used to determine the end of the external chain.
#   If the setting is an empty list, or if none of the entries yields a usable IP (header is
#   missing, index out of range, IP not in IP chain), then a fallback strategy will be used
#   instead: Private-range IPs will be discarded from the right of the IP chain until a public
#   IP is found, or the chain shrinks to one IP. This entry will then be considered the rightmost
#   end of the external chain.
#   Migrations from one network configuration to another may be accomplished by first adding the
#   new header to the list, making the networking change, and then removing the old one.
# .. setting_warnings: Changes to the networking configuration that are not coordinated with
#   this setting may allow callers to spoof their IP address.


def _get_trusted_header_ip(request, header_name, index):
    """
    Read a parsed IP address from a header at the specified position.

    Helper function for ``_get_client_ips_via_trusted_header``.

    Returns None if header is missing, index is out of range, or the located
    entry can't be parsed as an IP address.
    """
    ip_strs = _get_meta_ip_strs(request, header_name)

    if not ip_strs:
        warnings.warn(f"Configured IP address header was missing: {header_name!r}", UserWarning)
        return None

    try:
        trusted_ip_str = ip_strs[index]
    except IndexError:
        warnings.warn(
            "Configured index into IP address header is out of range: "
            f"{header_name!r}:{index!r} "
            f"(actual length {len(ip_strs)})",
            UserWarning
        )
        return None

    try:
        return ipaddress.ip_address(trusted_ip_str)
    except ValueError:
        warnings.warn(
            "Configured trusted IP address header contained invalid IP: "
            f"{header_name!r}:{index!r}",
            UserWarning
        )
        return None


def _get_client_ips_via_trusted_header(request):
    """
    Get the external chain by reading the trust boundary from a header.

    This is a strategy used by ``get_all_client_ips`` and should not be used
    directly. It does not implement any fallback in case of misconfiguration.

    Uses ``CLOSEST_CLIENT_IP_FROM_HEADERS`` to identify the IP just outside of
    the deployment's infrastructure boundary, and uses the rightmost position
    of this to determine where the external chain stops. See setting docs for
    more details.

    Returns one of the following:

    - A non-empty list of *parsed* IP addresses, where the rightmost IP is the
      same as the one identified in the trusted header.
    - An empty list if no headers are configured or all headers are unusable.

    A configured header can be unusable if it's missing from the request, the
    index is out of range, the indicated entry in the header can't be parsed
    as an IP address, or the IP in the header can't be found in the IP chain.
    """
    header_entries = getattr(settings, 'CLOSEST_CLIENT_IP_FROM_HEADERS', [])

    full_chain = _get_usable_ip_chain(request)
    external_chain = []

    for entry in header_entries:
        header_name = entry['name']
        index = entry['index']
        if closest_client_ip := _get_trusted_header_ip(request, header_name, index):
            # The equality check in this predicate is why we use parsed IP
            # addresses -- ::1 should compare as equal to 0:0:0:0:0:0:0:1.
            external_chain = _remove_tail(full_chain, lambda ip: ip != closest_client_ip)  # pylint: disable=cell-var-from-loop
            if external_chain:
                break
            else:
                warnings.warn(
                    f"Ignoring trusted header IP {header_name!r}:{index!r} "
                    "because it was not found in the actual IP chain.",
                    UserWarning
                )

    return external_chain


def _compute_client_ips(request):
    """
    Get the request's external chain, a non-empty list of IP address strings.

    Warning: should only be called once and cached by ``init_client_ips``.

    Prefer to use ``get_all_client_ips`` to retrieve the value stored on the
    request, unless you are sure that later middleware has not modified
    the REMOTE_ADDR in-place.

    This function will attempt several strategies to determine the external chain:

    - If ``CLOSEST_CLIENT_IP_FROM_HEADERS`` is configured and usable, it will be
      used to determine the rightmost end of the external chain (by reading a
      trusted HTTP header).
    - If that does not yield a result, fall back to assuming that the rightmost
      public IP address in the IP chain is the end of the external chain. (For an
      in-datacenter HTTP request, may instead yield a list with a private IP.)
    """
    # In practice the fallback to REMOTE_ADDR should never happen, since that
    # would require that value to be present and malformed but with no XFF
    # present.
    ips = _get_client_ips_via_trusted_header(request) \
        or _get_client_ips_via_xff(request) \
        or [request.META['REMOTE_ADDR']]

    return [str(ip) for ip in ips]


def init_client_ips(request):
    """
    Compute the request's external chain and store it in the request.

    This should be called early in the middleware stack in order to avoid
    being called after another middleware that overwrites ``REMOTE_ADDR``,
    which is a pattern some apps use.

    If called multiple times or if ``CLIENT_IPS`` is already present in
    ``request.META``, will just warn.
    """
    if 'CLIENT_IPS' in request.META:
        warnings.warn("init_client_ips refusing to overwrite existing CLIENT_IPS")
    else:
        request.META['CLIENT_IPS'] = _compute_client_ips(request)


def get_all_client_ips(request):
    """
    Get the request's external chain, a non-empty list of IP address strings.

    Most consumers of IP addresses should just use ``get_safest_client_ip``.

    Calls ``init_client_ips`` if needed.
    """
    if 'CLIENT_IPS' not in request.META:
        init_client_ips(request)

    return request.META['CLIENT_IPS']


def get_safest_client_ip(request):
    """
    Get the safest choice of client IP.

    Returns a single string containing the IP address that most likely
    represents the originator of the HTTP call, without compromising on
    safety.

    This is always the rightmost value in the external IP chain that
    is returned by ``get_all_client_ips``. See module docstring for
    more details.
    """
    return get_all_client_ips(request)[-1]
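The trusted-header strategy above can be tried out without Django. The following is a rough standalone sketch under stated assumptions: `external_chain_from_header` is a hypothetical name, a plain dict stands in for ``request.META``, the sample addresses are invented, and every ``X-Forwarded-For`` entry is assumed to parse as an IP address. It is not a replacement for the module's functions, which also handle warnings, multiple configured headers, and the private-IP fallback.

```python
import ipaddress


def external_chain_from_header(meta, header_name, index):
    """Trim the IP chain so that it ends at the IP named by a trusted header."""
    def header_ips(name):
        # Same META key convention as Django: "CF-Connecting-IP" -> "HTTP_CF_CONNECTING_IP".
        value = meta.get('HTTP_' + name.replace('-', '_').upper(), '').strip()
        return [s.strip() for s in value.split(',')] if value else []

    # Assumption for this sketch: every entry in the chain is a valid IP.
    chain = [
        ipaddress.ip_address(s)
        for s in header_ips('X-Forwarded-For') + [meta['REMOTE_ADDR']]
    ]

    try:
        closest = ipaddress.ip_address(header_ips(header_name)[index])
    except (IndexError, ValueError):
        return []  # Header missing, index out of range, or entry is not an IP.

    # Drop IPs from the right until the trusted IP is the last one left.
    while chain and chain[-1] != closest:
        chain.pop()
    return [str(ip) for ip in chain]


# Hypothetical request that passed through a CDN (9.9.9.9) and a local proxy (10.0.0.5).
meta = {
    'HTTP_X_FORWARDED_FOR': '1.1.1.1, 8.8.4.4, 9.9.9.9',
    'HTTP_CF_CONNECTING_IP': '8.8.4.4',
    'REMOTE_ADDR': '10.0.0.5',
}

via_cdn_header = external_chain_from_header(meta, 'CF-Connecting-IP', 0)
via_proxy_count = external_chain_from_header(meta, 'X-Forwarded-For', -2)
via_missing_header = external_chain_from_header(meta, 'X-Real-IP', 0)
```

In this example both configurations locate the same trust boundary, so the last element of either result is what ``get_safest_client_ip`` would be expected to report; the missing header yields an empty list, which is the case where the module falls back to discarding private IPs.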