
IP Address Localisation via Latency

April 18, 2024 · 7 min read

The state of current IP location solutions is bad. How can they do better?
Tim Armstrong, Founder, Consultant Engineer, Expert in Performance & Security

The state of current IP location solutions is bad.

Practically all of the commercially available “GeoIP” databases are continually out of date. Even the expensive, “accurate” solutions that are “updated daily” are still frequently, and shockingly, wrong.

Introducing: Stochastic-Progressive Latency Localisation.

Using globally distributed nodes with well-known locations (such as RIPE Atlas probes), we present a method for localising an IP address to the city level.

This new method results in a massive improvement in accuracy, and it will continue to improve over time, since we’re not playing a constant game of catch-up or manually reviewing location-update notifications from network owners.

Why not Triangulation?

When approaching the problem of localising something from measurements, the obvious thing that comes to mind is triangulation.

With triangulation (or, more precisely, trilateration, since we’re working with distances rather than angles), you take three points (probes) whose physical locations you know. If you can measure the time it takes a signal to get to the target and back from each point, and you have some means of converting that time into a distance, then you can compute the target’s location with some very simple maths. If you want more accuracy, simply take more measurements from other known locations and average out the differences.
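As a toy illustration, here is roughly what that computation looks like on a flat plane (real implementations work on a sphere; the coordinates and distances below are made up):

```python
import math

def trilaterate(p1, r1, p2, r2, p3, r3):
    """Locate a target on a flat plane from three probes (x, y)
    and their measured distances to the target."""
    # Distance between the first two probes.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    d = math.hypot(dx, dy)
    # How far along the p1->p2 axis the intersection chord sits.
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    h = math.sqrt(max(r1**2 - a**2, 0.0))  # half-chord height
    # Midpoint of the chord, then the two candidate intersections.
    mx, my = p1[0] + a * dx / d, p1[1] + a * dy / d
    cand1 = (mx + h * dy / d, my - h * dx / d)
    cand2 = (mx - h * dy / d, my + h * dx / d)
    # The third probe disambiguates: pick the candidate whose distance
    # to p3 best matches r3.
    err = lambda c: abs(math.hypot(c[0] - p3[0], c[1] - p3[1]) - r3)
    return min((cand1, cand2), key=err)

# Probes at known positions, distances derived from (idealised) latency:
print(trilaterate((0, 0), 5.0, (8, 0), 5.0, (4, 8), 5.0))  # -> (4.0, 3.0)
```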

This, in essence, is also how GPS works.

We know the precise positions of the GPS satellites at any given moment (they fly in well-known orbits and each broadcasts its own orbital data), and we know the precise conversion between time and distance for radio signals travelling in a straight line.

Using these elements, along with timestamps sent periodically by each satellite, a relatively inexpensive device is able to compute its location very precisely.

So why, then, won’t this work for the internet?

The speed of light through fibre is well known, so surely we can measure distance by taking the latency reported by a common ping and dividing it by two (since ping measures the round trip, not a single direction). If we know the locations of two probes and the time it takes each of them to ping the target, then there are exactly two locations the target could be in. Adding a third probe tells us which of the two is correct, by its proximity (or lack thereof) to that probe.
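In code, that naive conversion looks something like the following sketch (the ~2/3 c propagation speed in fibre is a standard rule of thumb, not a measured constant):

```python
# Light in fibre travels at roughly 2/3 of c, i.e. ~200,000 km/s,
# so each millisecond of one-way latency covers at most ~200 km.
SPEED_IN_FIBRE_KM_PER_MS = 200.0

def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on the probe-to-target distance implied by a ping.

    Ping measures the round trip, so halve it first. Real paths are
    never straight lines, so this is a ceiling, not an estimate.
    """
    return (rtt_ms / 2) * SPEED_IN_FIBRE_KM_PER_MS

print(max_distance_km(30.0))  # a 30 ms RTT puts the target within ~3,000 km
```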

In theory, this works. In practice, however, the realities of the internet get in the way. The fundamental issue is that the internet is not (and will never be) line-of-sight. This is due in part to the realities of laying fibre, but also because some parts of the world rely on satellite internet and/or radio links. And that’s all before we even start taking into account that it’s not one big homogeneous entity managed for the common good.

The reality, perhaps unsurprisingly, is that it is actually a duct-taped-together mess of ISPs that are in direct competition with each other while simultaneously having to work together for any of it to function at all. And all of this is held together by a bunch of overworked and underpaid engineers who don’t really get a break even when they’re on holiday (unless they’re junior engineers; those folks are still being indoctrinated, so we try to hide this from them for a while).

Stochastic-Progressive algorithms

So let’s talk about Stochastic-Progressive algorithms. These algorithms are commonly found in the photorealistic rendering engines used by the likes of Framestore or Industrial Light & Magic.

They work by taking a random selection of points (probes) and evaluating how “good” each point is. The best points are then used as the centre of a circle (the search radius) that bounds where the next selection of points must fall. This process continues until the search radius can’t be meaningfully reduced any further.

This path is then coloured (shaded) and saved to the “film”.
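Stripped of the rendering specifics, the core loop is small enough to sketch. The following is a generic illustration (not any particular renderer’s code); sample() and score() stand in for whatever the problem domain provides:

```python
def progressive_search(sample, score, centre, radius, min_radius):
    """Generic Stochastic-Progressive refinement (illustrative sketch).

    sample(centre, radius) -> a random candidate point within the circle.
    score(point)           -> how "good" the candidate is (higher is better).
    """
    while radius > min_radius:
        # Evaluate a random batch of candidates inside the search circle.
        candidates = [sample(centre, radius) for _ in range(16)]
        # Recentre on the best point seen so far...
        centre = max(candidates + [centre], key=score)
        # ...and progressively tighten the search area.
        radius *= 0.5
    return centre
```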

In the case of Stochastic-Progressive-Photon-Mapping this looks somewhat like this:

Ray Pass-1: The light “A” sends out light-rays randomly. The camera sends out eye-rays randomly.

Selection-1: A search radius is drawn around each of the eye-rays. Any light-ray that falls within a search radius is painted onto the film, assigned to the radius whose centre it is closest to.

Ray Pass-2: A larger radius is then drawn around each of the light-rays and eye-rays. The light and camera then send out a random selection of rays that fall within their respective sets of radii.

Selection-2: As with Selection-1, a search radius is drawn around each of the eye-rays (although the radius is now reduced a little). Light-rays that fall within these search areas are then painted onto the film as before.

The Ray Pass and Selection loop then continues until some artificial halting condition is met (common halting conditions include the number of passes, elapsed time, or the dimension of the search radius). Once this “direct illumination” pass is complete, it’s common to iteratively increase the “path-depth”, allowing light-rays to bounce randomly off a number of surfaces before being required to match with an eye-ray.

Incidentally, back in 2010–2011 I was responsible for creating some of the tests in a paper by researchers at Tsinghua University in Beijing that expanded upon this technique by introducing Metropolis sampling into the light-ray generation, reducing the mean relative error while increasing photon density in areas lit by indirect sources.

Interesting, but how does this apply to IP localisation?

Rather than using light-rays, our technique uses round-trip time (RTT), as measured from a random selection of probes spread around the globe.

We then draw a search area around the best probes, from which we select new probes. We then repeat this until there are insufficient probes to reasonably improve the accuracy further.

Latency Pass-1: Uses RTT with a global selection of probes.

Selection-1: Finds the midpoint of the best probes and constructs a search radius around it.

Latency Pass-2: Takes a random selection of new probes within the search radius and evaluates the RTT from these new probes towards the destination.

This process continues until there are no longer sufficient probes within the search radius to reliably reduce the radius further. The more probes we have, the more accurate the data gets.
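Putting the pieces together, a minimal sketch of the whole loop might look like this. It is an illustration, not production code: probe_rtt() stands in for whatever the measurement platform (e.g. RIPE Atlas) provides, and the recentring and radius rules are plausible choices rather than the exact ones we use:

```python
import math
import random

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def midpoint(points):
    """Naive centroid of (lat, lon) points; adequate at city-search scale."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def localise(target_ip, all_probes, probe_rtt,
             min_probes=4, batch=16, max_passes=10):
    """Stochastic-Progressive latency localisation (illustrative sketch).

    all_probes: list of (lat, lon) probe locations.
    probe_rtt(probe, ip): measured RTT in ms, or None if unreachable.
    Returns the centre and radius (km) of the final search circle.
    """
    speed_km_per_ms = 200.0  # ~2/3 c in fibre, rule of thumb
    # Latency Pass-1: a random global selection of probes.
    candidates = random.sample(all_probes, min(batch, len(all_probes)))
    centre = radius = None
    for _ in range(max_passes):
        # Measure RTT from the current selection; drop unresponsive probes.
        measured = sorted((rtt, p) for p in candidates
                          if (rtt := probe_rtt(p, target_ip)) is not None)
        if len(measured) < min_probes:
            break
        best = [p for _, p in measured[:min_probes]]
        # Selection: recentre on the best probes; the lowest RTT caps how
        # far away the target can possibly be (see max_distance_km above).
        centre = midpoint(best)
        radius = (measured[0][0] / 2) * speed_km_per_ms
        # Next latency pass: fresh random probes inside the search radius.
        inside = [p for p in all_probes if distance_km(p, centre) <= radius]
        if len(inside) < min_probes:
            break  # halting condition: too few probes to shrink further
        candidates = random.sample(inside, min(batch, len(inside)))
    return centre, radius
```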

Cool, so how can it be improved further?

Indications of physical location gathered from sources like the Internet Routing Registry (IRR), the Routing Information Service (RIS), and DNS can be used as hints to pre-optimise the probe selection, biasing it towards the predicted country to reduce the number of probes and iterations needed.

For example: in the IRR, it is not uncommon for ISPs to register the country in which they intend to use the IP addresses. This isn’t always kept up to date, but it’s generally a good start; even when the country is no longer accurate, the continent within which the block is used is likely to remain largely unchanged.

To improve the efficiency of the Stochastic process we can attempt to bootstrap the search by using these hints.

This is done by first validating the accuracy of the hints by testing the assumption with 3 random probes within the region and 3 outside the region.

The hints pass this validation if the mean RTT of the probes within the region is lower than the mean RTT of the probes outside the region.
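A sketch of that validation check (probe_rtt() is the same stand-in as before, and for simplicity we assume the six sampled probes all respond):

```python
import random
import statistics

def hints_pass_validation(target_ip, probes_in_region, probes_outside,
                          probe_rtt, samples=3):
    """Sanity-check a location hint (e.g. an IRR country) before using it
    to bias probe selection: probes inside the hinted region should, on
    average, see a lower RTT to the target than probes outside it."""
    inside = random.sample(probes_in_region, samples)
    outside = random.sample(probes_outside, samples)
    mean_rtt = lambda ps: statistics.mean(probe_rtt(p, target_ip) for p in ps)
    return mean_rtt(inside) < mean_rtt(outside)
```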

This biasing can significantly reduce the number of iterations, which in turn reduces bandwidth consumption and total measurement time.

What happens when there are no probes near the actual location?

This technique is obviously heavily reliant on the existence of, and proximity to, suitable probes.

This doesn’t mean we have no data for such locations, but it does mean that the accuracy is limited to the radius at which this halting condition is met.

In this scenario, it may be more accurate to use the hints directly if they are of a finer resolution. For this purpose, the hints, radius, and centre of the circle are provided in the body of the response from the API.
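The response shape below is hypothetical, invented purely for illustration; the article only says that the hints, radius, and centre are included in the response body:

```python
# Hypothetical response body - field names are invented for illustration.
response = {
    "centre": {"lat": 52.37, "lon": 4.90},  # centre of the final circle
    "radius_km": 650,                        # final search radius
    "hints": {"irr_country": "NL"},          # e.g. IRR/RIS/DNS hints
}

# If the search never converged tightly, a finer-grained hint may be the
# more useful answer than the (large) circle:
if response["radius_km"] > 500 and "irr_country" in response["hints"]:
    best_guess = response["hints"]["irr_country"]
else:
    best_guess = response["centre"]
```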

So if ICMP is blocked, then it’s not going to work, right?

If ICMP is blocked for the specific target, then we assume the location is within the same approximate region as the rest of that block of 256 IP addresses (the enclosing /24), hoping that at least one IP in the block responds to ping.
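A sketch of that fallback; ping() and localise() here refer to the measurement primitive and the loop sketched earlier:

```python
import ipaddress

def fallback_targets(target_ip: str):
    """Yield the /24 neighbours of a target that won't answer ICMP; the
    first responsive one is used as a location proxy for the block."""
    net = ipaddress.ip_network(f"{target_ip}/24", strict=False)
    return (str(ip) for ip in net.hosts() if str(ip) != target_ip)

# Usage sketch:
# for neighbour in fallback_targets("192.0.2.57"):
#     if ping(neighbour) is not None:        # found a responsive proxy
#         centre, radius = localise(neighbour, all_probes, probe_rtt)
#         break
```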

Final thoughts

It’s not perfect, and it’ll never get you street-level accuracy, but it does enable real-time updates of localisation data, removing the primary limitation of commercial databases, which may claim street-level accuracy but only update once a year (if that).
