Finding blocks in black-and-white images (efficiently)

Dear all,

I have a number of black-and-white scanned pages. To prepare them for OCR,
I have to split them into columns and rows. In addition, there are pictures
in between, which also need to be separated.

So, in a page that might look like this:

Text1 Text4 Text6

Text2 Pict1 Text7

Text3 Text5 Pict2

I’d like to find the largest blocks of white which separate the texts
and pictures, both horizontally and vertically.

Right now, I would use RMagick with export_pixels_to_str and then
regular expressions to find the zeros, but I am not sure whether there’s
a more efficient way to do this…

Do you have any suggestions?

Thank you very much,

Best regards,

Axel
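
As an illustration of the search described in the post above, here is a minimal sketch in plain Ruby of one way to locate fully white rows and columns (a projection-profile approach, which is not necessarily what any participant in this thread had in mind). It assumes the page has already been converted to an array of rows containing 0 for white and 1 for ink; that encoding and the names runs_of_true and white_bands are made up for this example, and pixel data exported from RMagick would need to be thresholded into this form first.

```ruby
# Find maximal runs of consecutive true values in a list of flags.
# Returns an array of [start_index, length] pairs.
def runs_of_true(flags)
  runs = []
  start = nil
  flags.each_with_index do |flag, i|
    if flag
      start ||= i                  # a new run begins here
    elsif start
      runs << [start, i - start]   # the run ended just before i
      start = nil
    end
  end
  runs << [start, flags.length - start] if start  # run reaching the end
  runs
end

# grid: array of rows, each an array of 0 (white) / 1 (ink).
# This encoding is an assumption made for the example.
# Returns the all-white row bands and column bands of the whole page.
def white_bands(grid)
  row_flags = grid.map { |row| row.all?(&:zero?) }
  col_flags = grid.transpose.map { |col| col.all?(&:zero?) }
  { rows: runs_of_true(row_flags), cols: runs_of_true(col_flags) }
end

# Tiny hypothetical page: two blocks on top, two below, white in between.
page = [
  [1, 1, 0, 0, 1],
  [1, 1, 0, 0, 1],
  [0, 0, 0, 0, 0],
  [1, 0, 0, 0, 1],
]
bands = white_bands(page)
widest_col_band = bands[:cols].max_by { |_start, length| length }
```

Picking the longest run from each list, as in the last line, gives the widest horizontal and vertical separators. Note that this only finds bands that are white across the whole page, so a layout like the one above, where the columns break at different heights, would need the search repeated inside each column.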

I took the liberty of posting your question to the ImageMagick forum
[Find the white blocks between text and pictures - Legacy ImageMagick Discussions Archive].
There are some pretty good IM users on that forum, and it’s usually not
hard to convert IM commands and options to RMagick code. If they have
any suggestions, I’ll let you know.

On Sep 2, 2008, at 5:52 AM, Axel E. wrote:

Text1 Text4 Text6
regular expressions to find the


You are attempting to roll your own image segmentation. Google for
‘computer vision’. Some helpful links:

http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/

http://www.itk.org/

http://camellia.sourceforge.net/

It can be quite a different domain from normal image processing.

a @ http://codeforpeople.com/
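
One classic technique from the document-analysis side of that field is the recursive X-Y cut: split the page at its widest interior white band, then repeat inside each piece. The following is a rough sketch in plain Ruby, not something proposed by anyone in this thread. It assumes a grid of rows holding 0 for white and 1 for ink; the names widest_gap and xy_cut, the region layout and the min_gap parameter are all hypothetical.

```ruby
# Widest run of blank lines that touches neither edge of the region.
# blank: array of booleans, one per row (or column) of the region.
# Returns [start_offset, length], or nil if there is no interior gap.
def widest_gap(blank)
  best = nil
  start = nil
  blank.each_with_index do |is_blank, i|
    if is_blank
      start ||= i
    elsif start
      # Only closed runs get here, so trailing margins are ignored;
      # start > 0 excludes the leading margin.
      if start > 0 && (best.nil? || i - start > best[1])
        best = [start, i - start]
      end
      start = nil
    end
  end
  best
end

# Recursively split a region at its widest interior white band.
# grid:   array of rows of 0 (white) / 1 (ink), an assumed encoding.
# region: [top, left, bottom, right], bottom and right exclusive.
# Returns the list of regions that could not be split any further.
def xy_cut(grid, region = [0, 0, grid.length, grid[0].length], min_gap = 1)
  top, left, bottom, right = region
  blank_rows = (top...bottom).map { |y| (left...right).all? { |x| grid[y][x].zero? } }
  blank_cols = (left...right).map { |x| (top...bottom).all? { |y| grid[y][x].zero? } }
  row_gap = widest_gap(blank_rows)
  col_gap = widest_gap(blank_cols)

  # Prefer whichever gap is wider; ties go to the horizontal cut.
  if row_gap && (col_gap.nil? || row_gap[1] >= col_gap[1])
    gap = row_gap
    horizontal = true
  else
    gap = col_gap
    horizontal = false
  end
  return [region] if gap.nil? || gap[1] < min_gap

  start, length = gap
  if horizontal
    first  = [top, left, top + start, right]
    second = [top + start + length, left, bottom, right]
  else
    first  = [top, left, bottom, left + start]
    second = [top, left + start + length, bottom, right]
  end
  xy_cut(grid, first, min_gap) + xy_cut(grid, second, min_gap)
end

# Tiny hypothetical page with four blocks of ink.
page = [
  [1, 1, 0, 0, 1],
  [1, 1, 0, 0, 1],
  [0, 0, 0, 0, 0],
  [1, 0, 0, 0, 1],
]
blocks = xy_cut(page)
```

On a real scan, min_gap would have to be well above the spacing between lines and letters, otherwise every text line and character would end up as its own block. Scanner noise in the white areas would also need cleaning up first, since a single stray black pixel stops a row or column from counting as blank.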

Anthony, one of the IM team, has a suggestion you can read here:
Find the white blocks between text and pictures - Legacy ImageMagick Discussions Archive.
If this is something you want to pursue let me know and we can work on
converting his shell script to Ruby.

Axel E. wrote:

Thank you very much for your help. This script does indeed look very interesting, and very heroic!
It would be very nice to have it in RMagick, as far as I am concerned. I fear that my shell scripting
skills and knowledge of RMagick will not suffice to get it done in a short time, so I’d appreciate some help converting it to
Ruby. Also, more generally, how do you wrap ImageMagick functions in RMagick? Do you call C functions?
When installing, I was lazy and took the gem option ;-)

Okay, I’ll see what I can do. I’ll follow up with you directly. I’m
going out of town tomorrow so it may be a couple of days.

ImageMagick is essentially a library with a C-level API. (Actually there
are two APIs, MagickCore and MagickWand, but that’s neither here nor
there.) The ImageMagick utilities (convert, mogrify, etc.) are
stand-alone programs that call into the library via the API. RMagick
uses the library, too.

Of course, since RMagick is Ruby, you get much more use out of the
ImageMagick library (access to individual pixels, for example) than
you can via the utilities, and Ruby makes it easier to use the API than
a shell scripting language does.

This page http://studio.imagemagick.org/RMagick/doc/optequiv.html
describes some of the RMagick API that corresponds to the ImageMagick
commands and options.

-------- Original Message --------

Date: Fri, 5 Sep 2008 06:23:54 +0900
From: Tim H. [email protected]
To: [email protected]
Subject: Re: finding blocks in black-and-white images (efficiently)

RMagick: http://rmagick.rubyforge.org/

Tim,

Thank you very much for the pointers!

Best regards,

Axel

-------- Original Message --------

Date: Fri, 5 Sep 2008 03:25:04 +0900
From: Tim H. [email protected]
To: [email protected]
Subject: Re: finding blocks in black-and-white images (efficiently)

Dear Tim,

thank you very much for your help. This script does indeed look very
interesting, and very heroic!
It would be very nice to have it in RMagick, as far as I am concerned. I
fear that my shell scripting skills and knowledge of RMagick will not
suffice to get it done in a short time, so I’d appreciate some help
converting it to Ruby. Also, more generally, how do you wrap ImageMagick
functions in RMagick? Do you call C functions?
When installing, I was lazy and took the gem option ;-)

Thanks again and looking forward to your answer!

Best regards,

Axel