© 2019 heuristify - Building Products for the AI First World

To calculate the bill for a ride, the ride sharing service must account for the time taken and the distance travelled. The distance travelled concern appears to be a trivial one: simply take GPS points at regular time intervals during the ride and then use them to calculate the distance travelled. That is indeed what we have done, but it's not quite as simple as that.

Figure 1: Detecting GPS points over the course of a ride

The issue lies with the uncertainty in the GPS mechanism itself. Despite all the advances in technology, weather conditions, nearby tall buildings and even cheap phones with not-the-best-tech can introduce errors in the recorded position of a GPS-enabled device. These fluctuations from the true position, though often minor, can build up to introduce superfluous “distance travelled”, which will result in the customer being unfairly charged higher than they should be. The consequences in terms of customer satisfaction can hardly be overstated.

Figure 2: Fluctuations are marked in black.

The problem of GPS fluctuations, therefore, is one that is imperative for any aspiring ride sharing service to resolve. Consider Figure 2 shown above. In what appears to be an innocuous set of GPS points, multiple fluctuations become apparent upon closer examination, fluctuations which add several tens of meters of superfluous distance. The problem is thus clear, and we must now form a solution.

Once again, a trivial solution presents itself. The issue with GPS fluctuation as shown in Figure 2 may be restated as follows: the error introduced by GPS fluctuations produces points that lie off-road. If there was a way to simply snap all points to a road based on the context of the entire route, this issue could very easily be circumvented.

Enter Google’s Snap to Road API. This API will, given a sequential set of GPS points representing a full route, return a corresponding set of points snapped to the roads the route most probably included. While it has some limitations like a maximum receiving limit of 100 points (which is fulfilled for long rides by sampling at intervals), it appears for the moment to solve all our problems. In the figure below, the original raw points are shown in red as previously, with the route returned by Snap to Road shown in blue.

Enter Google’s Snap to Road API. This API will, given a sequential set of GPS points representing a full route, return a corresponding set of points snapped to the roads the route most probably included. While it has some limitations like a maximum receiving limit of 100 points (which is fulfilled for long rides by sampling at intervals), it appears for the moment to solve all our problems. In the figure below, the original raw points are shown in red as previously, with the route returned by Snap to Road shown in blue.

Figure 3: Snap to Road deals with the fluctuations in raw points

All our problems are then solved, albeit at a non-trivial monetary cost (Snap to Road is obviously not free), and we congratulate ourselves on a job well done.

But all is not as rosy as it may seem. While Snap to Road performs exceptionally well for minor fluctuations, GPS inaccuracies are not quite so obliging as to always be minor. What happens when a fluctuation’s magnitude is not of a few tens of meters, but a few hundred?

But all is not as rosy as it may seem. While Snap to Road performs exceptionally well for minor fluctuations, GPS inaccuracies are not quite so obliging as to always be minor. What happens when a fluctuation’s magnitude is not of a few tens of meters, but a few hundred?

Figure 4: When the magnitude and concentration of fluctuations rises, Snap to Road is unable to plot the corrected route for us

Consider Figure 4 above. Once again, the raw points are shown in red, and the route returned by Snap to Road in blue. Consider now the central part of the figure marked out in a black ovaloid. The red points at the left extreme of the ovaloid are multiple fluctuations from the right extreme of the ovaloid. These are multiple fluctuation that introduce a few hundred meters of superfluous distance each.

Although Snap to Road is clearly able to reduce the damage, it remains quite significant, with an error of over 1500m from what the measurement should be. In fact, in some cases, despite Snap to Road usage, the error introduced is upwards of 2.5km. It seems the problem is not so trivial, and something else must be done.

Although Snap to Road is clearly able to reduce the damage, it remains quite significant, with an error of over 1500m from what the measurement should be. In fact, in some cases, despite Snap to Road usage, the error introduced is upwards of 2.5km. It seems the problem is not so trivial, and something else must be done.

In order to formulate a solution, we constructed a dataset of 16,508 rides, recorded over a period of a few months. The dataset consisted of the original raw points and the corresponding Snap to Road route. Two major obstacles are apparent: we have no true reference data to compare our eventual solution to, and it is infeasible to manually mark out reference data for such a large dataset.

We solve the first one by using loosely the Snap to Road data as reference: although it does underperform for major fluctuations, it shows excellent results for most rides as demonstrated above. The second problem we solve by that best of methods: we ignore it. Callous though this may seem, we argue that it is irrelevant in the first place: Snap to Road provides a sufficient reference for rides with minor fluctuations and those with major fluctuations are reasonably few in number as to allow empirical observation of any potential solution to suffice as a test.

We come at last to the thing you have been waiting for: the solution we propose for this very important problem. Before the algorithm itself is presented, we must make two very important observations: (a) most points will be correctly recorded with fluctuations being a very small minority in absolute terms, and (b) a fluctuated point between two correctly recorded points will subtend an acute angle, since all sequential triples of correctly detected points (with the exception of U-turns) will tend towards a straight line.

We solve the first one by using loosely the Snap to Road data as reference: although it does underperform for major fluctuations, it shows excellent results for most rides as demonstrated above. The second problem we solve by that best of methods: we ignore it. Callous though this may seem, we argue that it is irrelevant in the first place: Snap to Road provides a sufficient reference for rides with minor fluctuations and those with major fluctuations are reasonably few in number as to allow empirical observation of any potential solution to suffice as a test.

We come at last to the thing you have been waiting for: the solution we propose for this very important problem. Before the algorithm itself is presented, we must make two very important observations: (a) most points will be correctly recorded with fluctuations being a very small minority in absolute terms, and (b) a fluctuated point between two correctly recorded points will subtend an acute angle, since all sequential triples of correctly detected points (with the exception of U-turns) will tend towards a straight line.

Figure 5: A fluctuation (in red) will tend to subtend an acute angle. The correct path is marked by dotted lines.

Using these observations, we now suggest a solution:

1- Traverse the points in the path.

2- For each non-edge point:

a. Calculate the angle it subtends between the point immediately preceding it and the point immediately succeeding it.

b. Remove the point if the angle it subtends is smaller than 90 degrees.

1- Traverse the points in the path.

2- For each non-edge point:

a. Calculate the angle it subtends between the point immediately preceding it and the point immediately succeeding it.

b. Remove the point if the angle it subtends is smaller than 90 degrees.

Armed with our deceptively simple solution, we proceed to the testing phase. Once again, the results appear promising, but once again we discover an oversight. The algorithm proposed makes a fatal assumption: it assumes that each fluctuation is located between two correctly recorded points. This is not guaranteed to be the case, however, since consecutive points may be erroneously recorded.

Figure 6: Results (in green) appear to be promising.

Consider Figure 7 below. Once again, the results of the algorithm are shown in green. At the bottom of the figure, we see multiple consecutive fluctuations, and we see clearly that none of them are removed.

Figure 7: Consecutive fluctuations cause the algorithm problems.

To solve this problem, we tweak our algorithm ever so slightly. The modified algorithm is shown below.

1- Traverse the points in the path.

2- For each non-edge point:

a. Calculate the angle it subtends between the last correctly recorded point preceding it and the point immediately succeeding it.

b. Remove the point if the angle it subtends is smaller than 90 degrees.

This modification ensures that if multiple fluctuations exist consecutively, we simply ignore any fluctuations immediately preceding a point under observation and use instead the last known point that was not marked by the algorithm as a fluctuation to calculate the angle our current point subtends. The remaining algorithm remains the same. For individual fluctuations, therefore, the results also remain the same while for consecutive ones, they are significantly improved.

1- Traverse the points in the path.

2- For each non-edge point:

a. Calculate the angle it subtends between the last correctly recorded point preceding it and the point immediately succeeding it.

b. Remove the point if the angle it subtends is smaller than 90 degrees.

This modification ensures that if multiple fluctuations exist consecutively, we simply ignore any fluctuations immediately preceding a point under observation and use instead the last known point that was not marked by the algorithm as a fluctuation to calculate the angle our current point subtends. The remaining algorithm remains the same. For individual fluctuations, therefore, the results also remain the same while for consecutive ones, they are significantly improved.

Figure 8: Results of the modified algorithm are shown in turquoise. Consecutive fluctuations are now correctly handled.

Our algorithm has now reached maturity and no oversight is to be seen. Figure 9 also shows its performance.

Figure 9: Results of the modified algorithm are shown in turquoise. Compare with the performance of Snap to Road in Figure 4.

We present now some concrete proof of the good performance of our final algorithm beyond some cherry-picked examples. Recall that we had decided to use Snap to Road loosely as our reference data. In the following histogram, the x-axis shows absolute delta from Snap to Road distance in meters and the y-axis shows frequency of rides. Each bin on the x-axis has a width of 50m.

Figure 10: The deltas for our proposed solution are shown in blue. Deltas for raw data points are shown in orange for reference.

82.3% of rides have a delta of less 500m between our proposed solution and Snap to Road. Fewer than 5% have a delta of greater than 1200m. Rides with deltas of over 500m between our proposed solution and Snap to Road, upon empirical observation, show almost invariably that it is Snap to Road at fault to due high fluctuation in the raw data (refer to the comparison between Figures 4 and 9). Remember, Snap to Road data is itself erroneous and cannot be used as an absolute metric to judge the correctness of our proposed algorithm.

All in all, the problem is resolved to a satisfactory ending.

All in all, the problem is resolved to a satisfactory ending.

© 2019 heuristify - Building Products for the AI First World