The Smallest Circle (#157)

Matthew_D_Moss · February 17, 2008, 8:07pm

I had planned to improve my solution by taking an approach
similar to Justin E.'s. . . . By iterating and “moving the
circle center towards the outlier as the radius is reduced,
while all points still are within the radius.”

I may still code that if I have time… but then again, it’s
already been done.

Ah, what the heck… Here 'tis…

http://tastyspleen.net/~billk/ruby/quiz/157-smallest-circle/157_smallest_circle.rb

Algorithm used: First enclose the points in a simple

axis-aligned bounding box, and choose the center of the

box as the initial center of the circle. Then, binary

search to determine how far we can move the center point

toward the furthest outlying point, while keeping the

rest of the points still inside the circle.

Regards,

Bill

Matthew_D_Moss · February 17, 2008, 8:14pm

From: “Lionel B.” [email protected]

Alex S. wrote:

http://tastyspleen.net/~billk/ruby/quiz/157-smallest-circle/157_small…

I’m trying to utilize gnuplot for visualization. With any luck I can
post the visual part without spoiling the quiz.

Glad to see that I’m not the sole old-school minded

Hehe… if Wikipedia is to be believed, gnuplot is 22 years old, and
OpenGL is 16 years old. Practically siblings.

Regards,

Bill

Matthew_D_Moss · February 17, 2008, 8:17pm

Without looking for literature (on purpose), I thought that an exact
solution could perhaps be computed from the convex hull of the set of
points. I’ve written a self-made algorithm to compute the convex hull,
but run out of time to continue with the problem.

Anyone?

– fxn

Matthew_D_Moss · February 17, 2008, 8:18pm

Lionel B. wrote:

get the best solution by gradient walking (I don’t think there are
local maximums) but it’s not the general case. Unfortunately it can be
proven that such a circle can encircle all points, but not that it’s
the smallest (and my experiments proved that it isn’t).

Update : it seems I was wrong and probably had a bug in this
implementation as I just read this algorithm described as the best known
solution until 1983 On the paper referenced by Douglas :

http://www.cs.mcgill.ca/~cs507/projects/1998/jacob/

On a purely random set, my gradient walking is 27 times faster on 10000
points than Meggido’s algorithm (but certainly slower in some cases,
Douglas’ solution gets the result in a fairly constant time for a given
set size).

Hum, after testing, modifying the initial step for the gradient walking
from

max width of the set / point_number
to
max width of the set / 4

makes it competitive in all cases (12s on 10000 points instead of 270
for Meggido’s) and not balanced data sets (which random ones are) only.

Lionel

Matthew_D_Moss · February 17, 2008, 9:03pm

From: “Frank F.” [email protected]

points are far away. In this case the center is near that mass of points
and the line-search may fail to find an optimal solution. For example:

puts encircle( ([[0.0,0.0]] * 100 + [[0.0,1.0], [1.0,0.0]]).map{|p|
Point.new(*p)})

(100 points at (0,0), one at (1,0) and one at (0,1)) gives a solution of
{center:(0.00980392156862745, 0.00980392156862745), radius:0.990244611507173}
but optimal would be (1/2, 1/2) and radius 1/sqrt(2).

My variation using an axis-aligned bounding box to pick the
initial center point appears to handle the above case.

However I would not be surprised if it still might find some
sub-optimal solutions… :-/

Regards,

Bill

Matthew_D_Moss · February 17, 2008, 9:40pm

And again I reply to my own post.

This is getting a bad habit. Sorry. This slightly modified version
uses the center of the two points with maximum distance as starting
point. For random sets, the results fit quite well most of the time –
on certain sets it’s farther off than the mass center version though.

Regards,
Thomas.

def encircle(points) # takes array of Point objects
points = points.uniq
return if points.nil? or points.empty?

return Circle.new(points[0], 0) if points.size == 1

m, n = points.combination(2).sort_by {|a, b| -distance(a, b)}[0]
a    = Point.new(m.x / 2 + n.x / 2, m.y / 2 + n.y / 2)
return Circle.new(a, distance(a, m)) unless points.size > 2

points = points.sort_by {|p| -distance(a, p)}
f  = points[0]
df = distance(f, a)
b  = med(f, a)
e  = 1e-10
1000.times do
    db = distance(f, b)
    if points.all? {|p| distance(b, p) <= db + e}
        da = distance(f, a)
        if (da - db).abs <= e
            return Circle.new(b, db)
        else
            a, b = b, med(f, b)
        end
    else
        b = med(a, b)
    end
end
raise RuntimeError

end

def med(a, b)
Point.new((b.x - a.x) / 2.0 + a.x, (b.y - a.y) / 2.0 + a.y)
end

def distance(p1, p2)
Math.sqrt((p1.x - p2.x) ** 2 + (p1.y - p2.y) ** 2)
end

Matthew_D_Moss · February 17, 2008, 10:28pm

On Feb 15, 3:19 pm, Matthew D Moss [email protected] wrote:

As mentioned, this one may be more difficult than it seems. Feel free
to estimate the smallest circle, if you get stuck on getting the
exact solution.

This is really way more difficult than it seemed to me…

I was approaching some geometrical solution just to be shocked that
there are cases there not both ends of the longest chord lay on the
smallest circle!

Check out these (found by pure luck and help of random):

-0.385628 -0.435444
-0.319182 +0.478400
+0.257012 +0.333913
-0.243603 -0.496925

The longest chord is from points[0] to points[2] (of length 1.002445),
but correct solution appears to be the circle around a triangle [[1],
[2], [3]]. It includes second to longest chord [1]->[3] of length
0.978249.

Matthew_D_Moss · February 17, 2008, 8:21pm

Bill K. wrote:

Hehe… if Wikipedia is to be believed, gnuplot is 22 years old, and
OpenGL is 16 years old. Practically siblings.

True, but using OpenGL for rendering data sets was only recently made
easy thanks to Ruby

Lionel

Matthew_D_Moss · February 19, 2008, 12:28am

Here is my solution. Although it uses a brute force approach if the
first attempts (driven by obvious constraints) of determining the
circle fail, it scales on larger sets, because the given equal
distibution of random points form an approximate square on these
sets. In that case the convex hull which is used for the brute force
has very few points.

I also did some visualization with RMagick.

g phil

Matthew_D_Moss · February 19, 2008, 1:33am

Philipp H. wrote:

Here is my solution. Although it uses a brute force approach if the
first attempts (driven by obvious constraints) of determining the
circle fail, it scales on larger sets, because the given equal
distibution of random points form an approximate square on these
sets.

Wow, that was awfully fast for the random distribution, it takes more
time to display the result than computing it even on 10000 points !

On badly shaped point clouds it suffers greatly, I couldn’t get it to
compute the circle for 1000 points if I feed it with :

class Point < Struct.new(:x, :y)
def self.random
ro = rand / 2
theta = Math::PI * 2 * rand
Point.new(ro * Math::cos(theta) + 0.5, ro * Math::sin(theta) + 0.5)
end
[…]
end

The ruby process happily took more than 800MB and I had to kill it while
I could still get some feedback from my poor laptop

Note :
I found it suspicious that I could get better performance than the
accepted best mathematical solution in the general case and I reflected
a while on that.
I think the reason why the gradient walking is faster is because it is
inherently inaccurate. Megiddo’s algorithm defines the exact position of
the center using a geometrical process. Unfortunately computers must
manipulate floating point data when describing points, segments,
vectors, circles, … So any computer result is inherently inaccurate.
The gradient walking method profits from this inaccuracy: it stops when
the step becomes too small to move the current best position because of
rounding errors. This happens in o(n) steps where n is the number of
bits used to represent the floating point mantissa which is a huge win
(I don’t remember the actual number of bits, but it must be < 64)
compared to the o(n) of Megiddo’s where n is the number of points.

I benched the hull computing method and I believe that we can get the
best of both worlds : on any distribution of points I threw at it
(random in a square, in a disk or on a circle) computing the convex hull
took 0.4s for 10000 points on my laptop. To be as fast as possible, we
could run your algorithm up to the point where it switches on the brute
force method. Then, instead of looking for the best circle based on
three points of the hull, we’d use the gradient walking method. Using
only the hull points saves time in the most common cases and the
filtering is quick enough to be negligible in the worst case when no
point can be filtered out.

My tests only filtering the points using your ConvexHull class before
applying gradient walking returns in 0.5s for the initial distribution
(on a square) or on a disk and fall back to 12-13s on the worst case:
distribution on a circle.

Lionel.

Matthew_D_Moss · February 19, 2008, 11:21pm

Philipp H. wrote:

What you call a ‘badly shaped point clouds’ is in fact an approximate
circle.

In fact initially I tested with a disk (which is an intermediate level
before the circle for the number of points on the convex hull), the
worst shape is indeed a circle (when you replace ro by a constant in my
random method for people still following) : the number of polygone
vertices is then the total number of points.

I confident it doesn’t do any harm to your poor laptop this time.

The little thing survived ! Clarkson’s is good for the worst case (every
point on the same circle) which fits common sense (any circle should
encircle all points, so the first iteration should win minus rounding
errors in floating point computations), it’s nearly 3x as fast as the
gradient walking in this case. The speed doesn’t change too much given
various random shape (between 2 to 4s on my laptop). The gradient
walking still wins when a small portion of the points are on the convex
hull (random on a disk for example: 0.5s vs 4s).

I believe gradient walking should be faster when less than one third of
the points are on the convex hull, slower in the other case, but it’s a
rough estimation based on observed behavior.

One thing that I don’t like though is that the total time is difficult
to predict. Probabilistic algorithms often have good performance but I’m
always uneasy when I’m not sure how much time they can spend before
returning and even when they will exhibit bad behaviour. It’s a matter
of constraints in which the code must fit and I mostly have to return
results in a timely manner in most of my projects

Is there any public paper describing the performance behavior of the
algorithm? All scientific papers returned by Google point to
paid-subscription materials.

Lionel

PS: is it me or is this becoming more an algorithm-Quiz than a
Ruby-Quiz? It’s my first Quiz and I was expecting to see some clever use
of Ruby, but in this kind of Quiz I don’t see much place for elegant
code.
Not that I complain: I love to study the performance of algorithms too

Matthew_D_Moss · February 19, 2008, 10:06pm

On Tue, Feb 19, 2008 at 09:32:16AM +0900, Lionel B. wrote:

On badly shaped point clouds it suffers greatly, I couldn’t get it to
compute the circle for 1000 points if I feed it with :

What you call a ‘badly shaped point clouds’ is in fact an approximate
circle. It is true that this is the worst case for convex hull based
approaches because in this case the convex hull has lots of points.

It would be nice to have an algorithm that produces random
distibutions that form arbitrary polygons. But I guess that’s a task
for another quiz.

As I mentioned in one of my comments in the code, right before it goes
into brute force, I prefer using Clarkson’s probabilistic aproach here.

So here is my improved version, still based on convex hulls but also
able to deal with circular shaped distributions.

I confident it doesn’t do any harm to your poor laptop this time.

…

g phil

Matthew_D_Moss · February 20, 2008, 2:02am

On Wed, Feb 20, 2008 at 07:20:06AM +0900, Lionel B. wrote:

paid-subscription materials.
sorry, I haven’t come across a public paper of clarkson’s algorithm,
either. But it’s proven that the algorithm has O(log(n)), n of course
being the number of given points.

I’m not going to do a full proof here, but it’s an outline.

You need at max 3 points to describe the smallest enclosing circle. If
you haven’t found it yet you increase the chance of every point that is
outside a circle determined by 13 random points each turn, by
doubleing their relative frequencies.

After k iterations at least one of the (at max) three points you are
looking for has a relative frequency of 2**(k/3)

In average the sum of relative frequencies is increased by a factor of
1+3/13 each iteration.

For an increasing k this leads to a paradox: Because at some point the
relative frequency of at least one point 2**(k/3) would exceed the
overall sum of relative frequencies (n*(1+3/13))**k, if the algorithm
hadn’t come to an end, yet.

The only reference I found on the net where it is at least partially
explained and where I got the outline for the proof from is this (but
be warned, it’s in german):

http://www-i1.informatik.rwth-aachen.de/~algorithmus/algo42.php

PS: is it me or is this becoming more an algorithm-Quiz than a
Ruby-Quiz? It’s my first Quiz and I was expecting to see some clever use
of Ruby, but in this kind of Quiz I don’t see much place for elegant
code.
Not that I complain: I love to study the performance of algorithms too

Even if this one seems to be more about the algorithm, we still do it
in Ruby, right? Anyway we’ll see what our brand new quizmaster will
focus
on.

g phil

Matthew_D_Moss · February 20, 2008, 4:48am

From: “Matthew M.” [email protected]

I admit I wanted to see what folks not familiar with the problem
might do to estimate a solution.

Hmm, yeah - I deliberately avoided researching existing algorithms.

Would be interesting to calculate how much my crude hack tends to
deviate from the perfect solution(s).

Regards,

Bill

Matthew_D_Moss · February 20, 2008, 2:52am

PS: is it me or is this becoming more an algorithm-Quiz than a
Ruby-Quiz? It’s my first Quiz and I was expecting to see some clever use
of Ruby, but in this kind of Quiz I don’t see much place for elegant code.
Not that I complain: I love to study the performance of algorithms too

I aim to have a mix… Things that deal with algorithms to see how
people will approach them with Ruby, and things that focus more on
aspects particular aspects of Ruby. And then just some fun things.

Because of the nature of the problem, this quiz probably leaned very
heavily towards the algorithm side… I admit I wanted to see what
folks not familiar with the problem might do to estimate a solution.

Matthew_D_Moss · February 20, 2008, 8:01am

@bill

Would be interesting to calculate how much my crude hack tends to
deviate from the perfect solution(s).

You might be more interested in speed.

@matthew

I aim to have a mix…

I hope (most of) the quizzes will leave some room for “creativity”.
Maybe there shouldn’t be a proven set of best solutions. But a
comparison of well thought-out approaches is interesting of course.
I’m looking forward to the write-up.

Regards,
Thomas.

Matthew_D_Moss · February 20, 2008, 10:22am

(Benchmark code is attached…)

For completeness’ sake, I’ve uploaded the benchmark script, plus
all the solutions… I had to tweak a few slightly to remove
printouts and such:

http://tastyspleen.net/~billk/ruby/quiz/157-smallest-circle/benchmark/

Regards,

Bill

Matthew_D_Moss · February 20, 2008, 10:08am

From: “ThoML” [email protected]

@bill

Would be interesting to calculate how much my crude hack tends to
deviate from the perfect solution(s).

You might be more interested in speed.

Hi Thomas,

I wasn’t able to benchmark your solution because I couldn’t
find the #combination method?

These are the times I get running the other solutions,
10 runs of 1000 points each. Each run, every solution is
fed the exact same point data.

$ ruby 157_benchmark.rb
user system total real
FRANK 23.469000 0.297000 23.766000 ( 23.781000)
JUSTIN 6.765000 0.000000 6.765000 ( 6.812000)
LIONEL 5.704000 0.000000 5.704000 ( 5.703000)
DOUG 3.797000 0.031000 3.828000 ( 3.829000)
PHILIPP 1.390000 0.016000 1.406000 ( 1.422000)
BILL 0.281000 0.000000 0.281000 ( 0.281000)

Looks like my speed is good… Still curious how my accuracy
compares though…

(Benchmark code is attached…)

Regards,

Bill

Matthew_D_Moss · February 20, 2008, 11:14am

These are the times I get running the other solutions,
10 runs of 1000 points each. Each run, every solution is
fed the exact same point data.

Auuughhh!!

Sorry for another self-reply… :-/

The benchmark script I attached to the earlier post did not,
in fact, pass the same point data to each solution per run.
(Maybe that’ll teach me to make last minute changes without
verifying…)

The benchmark script at
http://tastyspleen.net/~billk/ruby/quiz/157-smallest-circle/benchmark/
now does correctly pass the same coordinate data to each script
per run.

. . .

I had hoped by averaging multiple runs (10) - each run having
different point data - that the rankings would have been pretty
stable.

However, JUSTIN and LIONEL still tend to trade places on
different invocations:

         user     system      total        real

FRANK 24.421000 0.297000 24.718000 ( 24.734000)
JUSTIN 6.110000 0.000000 6.110000 ( 6.109000)
LIONEL 6.078000 0.000000 6.078000 ( 6.094000)
DOUG 3.750000 0.000000 3.750000 ( 3.766000)
PHILIPP 1.609000 0.000000 1.609000 ( 1.609000)
BILL 0.328000 0.000000 0.328000 ( 0.328000)

         user     system      total        real

FRANK 24.656000 0.156000 24.812000 ( 24.828000)
LIONEL 6.047000 0.000000 6.047000 ( 6.047000)
JUSTIN 4.375000 0.000000 4.375000 ( 4.375000)
DOUG 3.281000 0.016000 3.297000 ( 3.297000)
PHILIPP 1.500000 0.000000 1.500000 ( 1.500000)
BILL 0.297000 0.000000 0.297000 ( 0.297000)

         user     system      total        real

FRANK 24.594000 0.172000 24.766000 ( 24.781000)
LIONEL 6.109000 0.000000 6.109000 ( 6.109000)
JUSTIN 5.922000 0.000000 5.922000 ( 5.922000)
DOUG 4.031000 0.016000 4.047000 ( 4.047000)
PHILIPP 1.344000 0.000000 1.344000 ( 1.344000)
BILL 0.281000 0.000000 0.281000 ( 0.282000)

         user     system      total        real

FRANK 24.203000 0.391000 24.594000 ( 24.672000)
LIONEL 6.109000 0.000000 6.109000 ( 6.125000)
JUSTIN 5.500000 0.000000 5.500000 ( 5.500000)
DOUG 4.188000 0.016000 4.204000 ( 4.219000)
PHILIPP 1.500000 0.000000 1.500000 ( 1.500000)
BILL 0.344000 0.000000 0.344000 ( 0.344000)

         user     system      total        real

FRANK 24.546000 0.219000 24.765000 ( 24.765000)
JUSTIN 6.047000 0.015000 6.062000 ( 6.062000)
LIONEL 5.828000 0.000000 5.828000 ( 5.828000)
DOUG 3.641000 0.000000 3.641000 ( 3.641000)
PHILIPP 1.688000 0.000000 1.688000 ( 1.688000)
BILL 0.296000 0.000000 0.296000 ( 0.296000)

(I have verified the rankings are stable if I hardwire
the random seed, though.)

Regards,

Bill

Matthew_D_Moss · February 20, 2008, 10:48am

Bill K. wrote:

(Benchmark code is attached…)

For completeness’ sake, I’ve uploaded the benchmark script, plus
all the solutions… I had to tweak a few slightly to remove
printouts and such:

Index of /~billk/ruby/quiz/157-smallest-circle/benchmark

Here’s my code updated with the convex hull method from Philipp which
doesn’t change the result but gives a huge speedup in the most common
cases.

Lionel