""" Utilities for determining the IP address of a request. Summary ======= For developers: - Call ``get_safest_client_ip`` whenever you want to know the caller's IP address - Make sure ``init_client_ips`` is called as early as possible in the middleware stack - See the "Guidance for developers" section for more advanced usage For site operators: - See the "Configuration" section for important information and guidance For everyone: - Background information is available in the "Concepts" section Concepts ======== - The *IP chain* is the list of IPs in the ``X-Forwarded-For`` (XFF) header followed by the ``REMOTE_ADDR`` value. If all involved parties are telling the truth, this is the list of IP addresses that have relayed the HTTP request. However, due to the possibility of spoofing, this raw data cannot be used directly for all purposes: - The rightmost IP in the chain is the IP that has directly connected with the server and sent or relayed the request. In most deployments, this is likely to be a reverse proxy such as nginx. In any case it is the "closest" IP (in the sense of the request chain, not in terms of geographic proximity.) - The next closest IP, if present, is the one that the closest IP *claims* sent the request to it. Each IP in the chain can only vouch for the correctness of the IP immediately to its left in the list. - In a normal, unspoofed request, the leftmost IP is the "real" client IP, the IP of the computer that made the original request. - However, clients can send a fake XFF header, so the leftmost IP in the chain cannot be trusted in the general case. In fact, the only IP that can be trusted absolutely is the rightmost one. - The challenge is to determine what the leftmost *trusted* IP is, as this is the most accurate we can get without compromising on security. - The *external chain* is some prefix of the IP chain that stops before the (recognized) edge of the deployment's infrastructure. That is, the external chain is the portion of the IP chain that is to the left of some trust boundary, as determined by configuration or some fallback method. This is the list of IPs that can all plausibly be considered the "real" IP of the client. If the server is configured correctly this may contain, in order: Any IPs spoofed by the client, the client's own IP, IPs of any forwarding HTTP proxies specified by the client, and then IPs of any reverse HTTP proxies the request passed through *before* reaching the deployment's own infrastructure (CDN, load balancer, etc.) - Caveat: In the case where the request is being sent through an anonymizing proxy such as a VPN, the VPN's exit node IP is considered the "real" client IP. - Despite the name, this chain may contain private-range IP addresses, in particular if a request originates from another server in the same datacenter. Guidance for developers ======================= Almost anywhere you care about IP address, just call ``get_safest_client_ip``. This will get you the *rightmost* IP of the external chain (defined above). Because it cannot be easily spoofed by the caller, it is suitable for adversarial use-cases such as: - Rate-limiting - Only allowing certain IPs to access a resource (or alternatively, blocking them) In some less common situations where you need the entire external chain, you can call ``get_all_client_ips`. This returns a list of IP addresses, although for the great majority of normal requests this will be a list of length 1. This list is appropriate for when you're recording IPs for manual review or need to make a decision based on all of the IPs (no matter which one is the "real" one. This might include: - Audit logs - Telling a user about other active sessions on their account - Georestriction In some very rare cases you might want just a single IP that isn't rightmost. In some cases you might ask for the entire external chain and then take the leftmost IP. This should only be used in non-adversarial situations, and is usually the wrong choice, but may be appropriate for: - Localization (if other HTTP headers aren't sufficient) - Analytics Configuration ============= Configuration is via ``CLOSEST_CLIENT_IP_FROM_HEADERS``, which allows specifying an HTTP header that will be trusted to report the rightmost IP in the external chain. See setting annotation for details, but guidance on common configurations is provided here: - If you use a CDN as your outermost proxy: - Find what header your CDN sends to its origin that indicates the remote address it sees on inbound connections. For example, with Cloudflare this is ``CF-Connecting-IP``. - Ensure that your CDN always overrides this header if it exists in the inbound request, and never accepts a value provided by the client. Some CDNs are better than others about this. - Recommended setting, using Cloudflare as the example:: CLOSEST_CLIENT_IP_FROM_HEADERS: - name: CF-Connecting-IP index: 0 It would be equivalent to use ``-1`` as the index since there is always one and only one IP in this header, and Python list indexing rules are used here. - As a general rule, you should also ensure that traffic cannot bypass the CDN and reach your origin directly, since otherwise attackers will be able to spoof their IP address (and bypass protections your CDN provides). You may need to arrange for your CDN to set a header containing a shared secret. - If your outermost proxy is an AWS ELB or other proxy on the same local network as your server, or you have any other configuration in which your proxies and application speak to each other using private-range IP addresses: - You can rely on the rightmost public IP in the IP chain to be the safest client IP. To do this, set your configuration for zero trusted headers:: CLOSEST_CLIENT_IP_FROM_HEADERS: [] - This assumes that 1) your outermost proxy always appends to ``X-Forwarded-For``, and 2) any further proxies between that one and your application either append to it (ideal) or pass it along unchanged (not ideal, but workable). This is true by default for most proxy software. - If you have any reverse proxy that will be seen by the next proxy or your application as having a public IP: - You'll need to rely on having a consistent *number* of proxies in front of your application, and you'll need to know which ones append to the ``X-Forwarded-For`` header instead of just passing it unchanged. - Once you know the number of your proxies in the chain that append, you can use this count to say that the Nth-from-last IP in the ``X-Forwarded-For`` is the closest client IP. For example, if you had two, you would use ``-2`` (note the negative sign) to indicate the second-from-last IP:: CLOSEST_CLIENT_IP_FROM_HEADERS: - name: X-Forwarded-For index: -2 - This is fragile in the face of network configuration changes, so having your outermost proxy set a special header is preferred. - Configuring the proxy count too low will result in rate-limiting your own proxies; configuring it too high will allow attackers to bypass rate-limiting. - Side note: Even if you don't use it for ``CLOSEST_CLIENT_IP_FROM_HEADERS``, this proxy-counting approach will be required for configuring django-rest-framework's ``NUM_PROXIES`` setting. - If your application is directly exposed to the public internet, without even a local proxy: - This is an unusual configuration, but simple to configure; with no proxies, just indicate that there are no trusted headers and therefore the closest public IP should be used:: CLOSEST_CLIENT_IP_FROM_HEADERS: [] """ import ipaddress import warnings from django.conf import settings from edx_toggles.toggles import WaffleSwitch # .. toggle_name: ip.legacy # .. toggle_implementation: WaffleSwitch # .. toggle_default: False # .. toggle_description: Emergency switch to revert to use the older, less secure method for # IP determination. When enabled, instructs switch's callers to revert to using the *leftmost* # IP from the X-Forwarded-For header. When disabled (the default), callers should use the new # code path for IP determination, which has callers retrieve the entire external chain or pick # the leftmost or rightmost IP from it. The construction of the external chain is configurable # via ``CLOSEST_CLIENT_IP_FROM_HEADERS``. # This toggle, as well as any other legacy IP references, should be deleted (in the off # position) when the new IP code is well-tested and all IP-reliant code has been switched over. # .. toggle_warning: This switch does not control the behavior of this module. Callers must # opt into querying this switch, and can call ``get_legacy_ip`` if the switch is enabled. # .. toggle_use_cases: temporary # .. toggle_creation_date: 2022-03-24 # .. toggle_target_removal_date: 2022-07-01 # .. toggle_tickets: https://openedx.atlassian.net/browse/ARCHBOM-2056 (internal only) USE_LEGACY_IP = WaffleSwitch('ip.legacy', module_name=__name__) def get_legacy_ip(request): """ Return a client IP selected using an old, insecure method. Always picks the leftmost IP in the X-Forwarded-For header, if present, otherwise returns the original REMOTE_ADDR. """ if xff := request.META.get('HTTP_X_FORWARDED_FOR'): return xff.split(',')[0].strip() else: # Might run before or after XForwardedForMiddleware. return request.META.get('ORIGINAL_REMOTE_ADDR', request.META['REMOTE_ADDR']) def _get_meta_ip_strs(request, header_name): """ Get a list of IPs from a header in the given request. Return the list of IPs the request is carrying on this header, which is expected to be comma-delimited if it contains more than one. Response may be an empty list for missing or empty header. List items may not be valid IPs. """ if not header_name: return [] field_name = 'HTTP_' + header_name.replace('-', '_').upper() header_value = request.META.get(field_name, '').strip() if header_value: return [s.strip() for s in header_value.split(',')] else: return [] def get_raw_ip_chain(request): """ Retrieve the full IP chain from this request, as list of raw strings. This is uninterpreted and unparsed, except for splitting on commas and removing extraneous whitespace. """ return _get_meta_ip_strs(request, 'X-Forwarded-For') + [request.META['REMOTE_ADDR']] def _get_usable_ip_chain(request): """ Retrieve the full IP chain from this request, as parsed addresses. The IP chain is the X-Forwarded-For header, followed by the REMOTE_ADDR. This list is then narrowed to the largest suffix that can be parsed as IP addresses. """ parsed = [] for ip_str in reversed(get_raw_ip_chain(request)): try: parsed.append(ipaddress.ip_address(ip_str)) except ValueError: break return list(reversed(parsed)) def _remove_tail(elements, f_discard): """ Remove items from the tail of the given list until f_discard returns false. - elements is a list - f_discard is a function that accepts an item from the list and returns true if it should be discarded from the tail Returns a new list that is a possibly-empty prefix of the input list. (This is basically itertools.dropwhile on a reversed list.) """ prefix = elements[:] while prefix and f_discard(prefix[-1]): prefix.pop() return prefix def _get_client_ips_via_xff(request): """ Get the external chain of the request by discarding private IPs. This is a strategy used by ``get_all_client_ips`` and should not be used directly. Returns a list of *parsed* IP addresses, one of: - A list ending in a publicly routable IP - A list with a single, private-range IP - An empty list, if REMOTE_ADDR was unparseable as an IP address. This would be very unusual but could possibly happen if a local reverse proxy used a domain socket rather than a TCP connection. """ ip_chain = _get_usable_ip_chain(request) external_chain = _remove_tail(ip_chain, lambda ip: not ip.is_global) # If the external_chain is in fact all private, everything will have been # removed. In that case, just return the leftmost IP it would have # considered, even though it must be a private IP. return external_chain or ip_chain[:1] # .. setting_name: CLOSEST_CLIENT_IP_FROM_HEADERS # .. setting_default: [] # .. setting_description: A list of header/index pairs to use for determining the IP in the # IP chain that is just outside of this deployment's infrastructure boundary -- that is, # the rightmost address in the IP chain that is *not* owned by the deployment. (See module # docstring for background and definitions, as well as guidance on configuration.) # Each list entry is a dict containing a header name and an index into that header. This will # control how the client's IP addresses are determined for attribution, tracking, rate-limiting, # or other general-purpose needs. # The named header must contain a list of IP addresses separated by commas, with whitespace # tolerated around each address. The index is used for a Python list lookup, e.g. 0 is the first # element and -2 is the second from the end. # Header/index pairs will be tried in turn until the first one that yields a usable IP, which # will then be used to determine the end of the external chain. # If the setting is an empty list, or if none of the entries yields a usable IP (header is # missing, index out of range, IP not in IP chain), then a fallback strategy will be used # instead: Private-range IPs will be discarded from the right of the IP chain until a public # IP is found, or the chain shrinks to one IP. This entry will then be considered the rightmost # end of the external chain. # Migrations from one network configuration to another may be accomplished by first adding the # new header to the list, making the networking change, and then removing the old one. # .. setting_warnings: Changes to the networking configuration that are not coordinated with # this setting may allow callers to spoof their IP address. def _get_trusted_header_ip(request, header_name, index): """ Read a parsed IP address from a header at the specified position. Helper function for ``_get_client_ips_via_trusted_header``. Returns None if header is missing, index is out of range, or the located entry can't be parsed as an IP address. """ ip_strs = _get_meta_ip_strs(request, header_name) if not ip_strs: warnings.warn(f"Configured IP address header was missing: {header_name!r}", UserWarning) return None try: trusted_ip_str = ip_strs[index] except IndexError: warnings.warn( "Configured index into IP address header is out of range: " f"{header_name!r}:{index!r} " f"(actual length {len(ip_strs)})", UserWarning ) return None try: return ipaddress.ip_address(trusted_ip_str) except ValueError: warnings.warn( "Configured trusted IP address header contained invalid IP: " f"{header_name!r}:{index!r}", UserWarning ) def _get_client_ips_via_trusted_header(request): """ Get the external chain by reading the trust boundary from a header. This is a strategy used by ``get_all_client_ips`` and should not be used directly. It does not implement any fallback in case of misconfiguration. Uses ``CLOSEST_CLIENT_IP_FROM_HEADERS`` to identify the IP just outside of the deployment's infrastructure boundary, and uses the rightmost position of this to determine where the external chain stops. See setting docs for more details. Returns one of the following: - A non-empty list of *parsed* IP addresses, where the rightmost IP is the same as the one identified in the trusted header. - None if no headers configured or all headers are unusable. A configured header can be unusable if it's missing from the request, the index is out of range, the indicated entry in the header can't be parsed as an IP address, or the IP in the header can't be found in the IP chain. """ header_entries = getattr(settings, 'CLOSEST_CLIENT_IP_FROM_HEADERS', []) full_chain = _get_usable_ip_chain(request) external_chain = [] for entry in header_entries: header_name = entry['name'] index = entry['index'] if closest_client_ip := _get_trusted_header_ip(request, header_name, index): # The equality check in this predicate is why we use parsed IP # addresses -- ::1 should compare as equal to 0:0:0:0:0:0:0:1. external_chain = _remove_tail(full_chain, lambda ip: ip != closest_client_ip) # pylint: disable=cell-var-from-loop if external_chain: break else: warnings.warn( f"Ignoring trusted header IP {header_name!r}:{index!r} " "because it was not found in the actual IP chain.", UserWarning ) return external_chain def _compute_client_ips(request): """ Get the request's external chain, a non-empty list of IP address strings. Warning: should only be called once and cached by ``init_client_ips``. Prefer to use ``get_all_client_ips`` to retrieve the value stored on the request, unless you are sure that later middleware has not modified the REMOTE_ADDR in-place. This function will attempt several strategies to determine the external chain: - If ``CLOSEST_CLIENT_IP_FROM_HEADERS`` is configured and usable, it will be used to determine the rightmost end of the external chain (by reading a trusted HTTP header). - If that does not yield a result, fall back to assuming that the rightmost public IP address in the IP chain is the end of the external chain. (For an in-datacenter HTTP request, may instead yield a list with a private IP.) """ # In practice the fallback to REMOTE_ADDR should never happen, since that # would require that value to be present and malformed but with no XFF # present. ips = _get_client_ips_via_trusted_header(request) \ or _get_client_ips_via_xff(request) \ or [request.META['REMOTE_ADDR']] return [str(ip) for ip in ips] def init_client_ips(request): """ Compute the request's external chain and store it in the request. This should be called early in the middleware stack in order to avoid being called after another middleware that overwrites ``REMOTE_ADDR``, which is a pattern some apps use. If called multiple times or if ``CLIENT_IPS`` is already present in ``request.META``, will just warn. """ if 'CLIENT_IPS' in request.META: warnings.warn("init_client_ips refusing to overwrite existing CLIENT_IPS") else: request.META['CLIENT_IPS'] = _compute_client_ips(request) def get_all_client_ips(request): """ Get the request's external chain, a non-empty list of IP address strings. Most consumers of IP addresses should just use ``get_safest_client_ip``. Calls ``init_client_ips`` if needed. """ if 'CLIENT_IPS' not in request.META: init_client_ips(request) return request.META['CLIENT_IPS'] def get_safest_client_ip(request): """ Get the safest choice of client IP. Returns a single string containing the IP address that most likely represents the originator of the HTTP call, without compromising on safety. This is always the rightmost value in the external IP chain that is returned by ``get_all_client_ips``. See module docstring for more details. """ return get_all_client_ips(request)[-1]