# Line Simplification with Ramer–Douglas–Peucker

The other day I needed to do a fairly routine graphical operation, to “simplify” a polyline with many points into a simpler polyline which has roughly the same shape plus or minus some tolerance factor.

My actual use case was in sending linear movements to a CNC machine. Drawings are defined using floating point numbers and can be “accurate” to about 7-15 decimal places (depending on if you use floats or doubles) but when you take the machine’s mechanical tolerances and material effects into account the final cut is only really accurate to about 1 decimal place (0.1 mm). If I were to simplify the path with a tolerance of, say, 0.05 mm I could massively reduce the number of points sent to the machine (which reduces the amount of data sent, buffer sizes, communications overhead, etc.) with minimal effect on the accuracy.

Other places where this operation can be useful are:

• Cleaning up paths from from “noisy” data sources (imagine getting pixel locations from an edge detection algorithm)
• When you just need a general shape and more points would have a large negative effect on performance for the consumer (e.g. a nesting algorithm).

My go-to tool for this sort of operation is the Ramer–Douglas–Peucker algorithm, and I thought this would make a nice addition to the arcs library I’ve been working on over the last couple months.

The algorithm itself uses a remarkably simple recursive algorithm,

1. Mark the first and last points as kept
2. Find the point, p that is the farthest from the first-last line segment. If there are no points between first and last we are done (the base case)
3. If p is closer than tolerance units to the line segment then everything between first and last can be discarded
4. Otherwise, mark p as kept and repeat steps 1-4 using the points between first and p and between p and last (the call to recursion)

This animation from the Wikipedia article can help wrap your head around how it works.

Like most divide-and-conquer algorithms, in the ideal case this completes in O(n log n) time. However, if you hit an edge case where the “furthest” point is right next to the endpoints this can blow out to O(n^2).

I don’t normally worry about computational complexity too often (computers are fast), but because it’s quite common for my application to work with drawings containing hundreds of thousands of points it’s something to keep an eye on.

To start, let’s add a line_simplification module to arcs::algorithms.

 // arcs/src/algorithms/mod.rs

mod length;
+mod line_simplification;
mod scale;

...

pub use length::Length;
+pub use line_simplification::simplify;
pub use scale::Scale;


I’ve also stubbed out a simplify function.

// arcs/src/algorithms/line_simplification.rs

use euclid::{Length, Point2D};

/// Decimate a curve composed of line segments to a *"simpler"* curve with fewer
/// points.
///
/// The algorithm defines *"simpler"* based on the maximum distance
/// (tolerance) between the original curve and the simplified curve.
///
/// You may want to research the [Ramer–Douglas–Peucker algorithm][wiki] for
/// the exact details and assumptions that can be made.
///
/// [wiki]: https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm
pub fn simplify<Space>(
points: &[Point2D<f64, Space>],
tolerance: Length<f64, Space>,
) -> Vec<Point2D<f64, Space>> {
unimplemented!()
}


To implement this I’m going to procedurally build up a new Vec of points, passing a &mut Vec<_> to the function doing the actual recursion.

// arcs/src/algorithms/line_simplification.rs

pub fn simplify<Space>(
points: &[Point2D<f64, Space>],
tolerance: Length<f64, Space>,
) -> Vec<Point2D<f64, Space>> {
if points.len() <= 2 {
return points.to_vec();
}

let mut buffer = Vec::new();

// push the first point
buffer.push(points);
// then simplify every point in between the start and end
simplify_points(&points[..], tolerance, &mut buffer);
// and finally the last one
buffer.push(*points.last().unwrap());

buffer
}


Next we need to implement this simplify_points() function.

We can use if let and the really handy slice pattern feature (stabilised in Rust 1.42) to extract the first, last, and rest. This gives us everything we need to create a Line from first and last.

// arcs/src/algorithms/line_simplification.rs

fn simplify_points<Space>(
points: &[Point2D<f64, Space>],
tolerance: Length<f64, Space>,
buffer: &mut Vec<Point2D<f64, Space>>,
) {
if let [first, rest @ .., last] = points {
let line_segment = Line::new(*first, *last);

...
}
}


Next we can try to find the point whose perpendicular distance is furthest from line_segment.

Ideally I’d like to use the Iterator::max_by_key() method to find the index of the furthest point where our “key” function uses Line::perpendicular_distance_to(), but that returns a reference to the item and not its index… So to make the code cleaner I ended up rolling my own max_by_key() function.

// arcs/src/algorithms/line_simplification.rs

fn simplify_points<Space>(
points: &[Point2D<f64, Space>],
tolerance: Length<f64, Space>,
buffer: &mut Vec<Point2D<f64, Space>>,
) {
if let [first, rest @ .., last] = points {
let line_segment = Line::new(*first, *last);

if let Some((ix, distance)) =
max_by_key(rest, |p| line_segment.perpendicular_distance_to(*p))
{
...
}
}
}

fn max_by_key<T, F, K>(items: &[T], mut key_func: F) -> Option<(usize, K)>
where
F: FnMut(&T) -> K,
K: PartialOrd,
{
let mut best_so_far = None;

for (i, item) in items.iter().enumerate() {
let key = key_func(item);

let is_better = match best_so_far {
Some((_, ref best_key)) => key > *best_key,
None => true,
};

if is_better {
best_so_far = Some((i, key));
}
}

best_so_far
}


If you’re keeping track we’ve completed step 2 from the algorithm section.

Now if the distance is greater than our tolerance we need to recurse and add the furthest point to our buffer.

The only real difficulty here is that the ix returned by max_by_key() is an index into rest, not points… I originally forgot this bit and had an off-by-one error that resulted in infinite recursion and blowing the stack 😊

// arcs/src/algorithms/line_simplification.rs

fn simplify_points<Space>(
points: &[Point2D<f64, Space>],
tolerance: Length<f64, Space>,
buffer: &mut Vec<Point2D<f64, Space>>,
) {
if let [first, rest @ .., last] = points {
let line_segment = Line::new(*first, *last);

if let Some((ix, distance)) =
max_by_key(rest, |p| line_segment.perpendicular_distance_to(*p))
{
if distance > tolerance {
// note: index is the index into rest, but we want it relative
// to point
let ix = ix + 1;

simplify_points(&points[..=ix], tolerance, buffer);
buffer.push(points[ix]);
simplify_points(&points[ix..], tolerance, buffer);
}
}
}
}


… And that’s pretty much it. We’ve implemented the full Ramer-Douglas-Peucker algorithm in about 50 lines or Rust.

At this point we know our code compiles, but is it actually correct?

We can start off with lines of 0, 1, or 2 points, because they’re already as simple as they’re going to get.

// arcs/src/algorithms/line_simplification.rs

#[cfg(test)]
mod tests {
use super::*;
use crate::Point;

#[test]
fn empty_line() {
let points: Vec<Point> = Vec::new();

let got = simplify(&points, Length::new(1.0));

assert!(got.is_empty());
}

#[test]
fn line_with_one_point() {
let points = vec![Point::new(0.0, 0.0)];

let got = simplify(&points, Length::new(1.0));

assert_eq!(got, points);
}

#[test]
fn line_with_two_points() {
let points = vec![Point::new(0.0, 0.0), Point::new(10.0, 2.0)];

let got = simplify(&points, Length::new(1.0));

assert_eq!(got, points);
}
}


What about a perfectly straight line containing 100 points? The simplified version should only contain the start and end points.

// arcs/src/algorithms/line_simplification.rs

#[cfg(test)]
mod tests {
...

#[test]
fn simplify_a_straight_line_to_two_points() {
let points: Vec<Point> =
(0..100).map(|i| Point::new(i as f64, 0.0)).collect();
let should_be = &[points, points];

let got = simplify(&points, Length::new(0.1));

assert_eq!(got, should_be);
}
}


Next, let’s add a bit of movement to the various points in this line. I’m going to use sin to add a bit of “randomness” to each point’s vertical component. As long as the vertical movement is within our threshold all points between the start and end should be simplified out.

// arcs/src/algorithms/line_simplification.rs

#[cfg(test)]
mod tests {
...

#[test]
fn simplify_a_horizontal_line_with_small_amounts_of_vertical_jitter() {
let max_jitter = 0.1;

let points: Vec<Point> = (0..100)
.map(|i| {
let jitter = max_jitter * (i as f64 / 100.0 * PI).sin();
Point::new(i as f64, jitter)
})
.collect();

let should_be = &[points, points];

let got = simplify(&points, Length::new(max_jitter * 2.0));

assert_eq!(got, should_be);
}
}


As a fun fact, if you were to graph this you’d see a sine wave between 0 and 99 with a period of 50 and amplitude of 0.1.

Finally I thought I’d try a more realistic curve to make sure the tests so far haven’t added some bias due to their contrived nature.

For this, I needed to pull out the most sophisticated tool in my mathematical toolbox. … Pen and paper.

I’ve drawn a series of points on a set of cartesian coordinates, and circled the points (blue) that would be kept. By tracing around my ruler (red) I can emulate the tolerance area, with anything inside the ruler boundary being discarded.

By measuring the location of each point we can write one last test.

// arcs/src/algorithms/line_simplification.rs

#[cfg(test)]
mod tests {
...

#[test]
fn simplify_more_realistic_line() {
// Found by drawing it out on paper and using a ruler to determine
// point coordinates
let line = vec![
Point::new(-43.0, 8.0),
Point::new(-24.0, 19.0),
Point::new(-13.0, 23.0),
Point::new(-8.0, 36.0),
Point::new(7.0, 40.0),
Point::new(24.0, 12.0),
Point::new(44.0, -6.0),
Point::new(57.0, 2.0),
Point::new(70.0, 7.0),
];
let should_be = vec![line, line, line, line];
let ruler_width = Length::new(20.0);

let got = simplify(&line, ruler_width / 2.0);

assert_eq!(got, should_be);
}
}