Re: flatulent-0.0.1 ascii captcha for the masses

stickstone · July 3, 2007, 4:53pm

the flatulent gem provides brain dead simple ascii art captcha for
ruby.

Hmm, maybe I’m missing the point, but aren’t these really easy for bots to decode?

Looking at the examples at http://drawohara.tumblr.com/post/4791838
I see that random ASCII characters have been scattered around. But these can be removed trivially, e.g.

gsub!(/[^\/|\\_()\n]/, ' ')

Only minimal damage has been done to the original characters, which are now easy to pattern-match. It therefore seems that the randomly-strewn characters have the effect of making it more difficult for visually-impaired users to access the site, but cause very little impediment to bots.
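A minimal sketch of the stripping idea, assuming the noise is drawn from characters outside the stroke alphabet (`/ | \ _ ( )`). The method name `strip_noise` and the sample art are made up for illustration. If the noise reuses the stroke characters themselves, this approach would blank nothing useful.

```ruby
# Anything that is not a stroke character, a space, or a newline counts as noise.
# Assumption: letters are drawn only with / | \ _ ( ).
NOISE = %r{[^/|\\_()\n ]}

# Replace each noise character with a space so the column layout is preserved.
def strip_noise(art)
  art.gsub(NOISE, ' ')
end

# Hypothetical noisy sample (an "f" with stray characters around it).
noisy = [
  ' __ x  q',
  '/ _|  # ',
  '| |_ z  ',
  '|  _|   ',
  '|_|  k  '
].join("\n")

puts strip_noise(noisy)
```

Replacing with a space rather than deleting keeps every stroke in its original column, which matters for any later pattern matching.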

Now, if this ASCII art were turned into a PNG I guess that would make it a bit harder - although not much, since it’s pretty trivial to OCR a clean grid of ASCII characters back to ASCII, albeit more computationally expensive.

Perhaps this captcha will be useful if very few sites use it, so the spammers don’t bother writing a decoder. But in that case you don’t want it used by “the masses”.

Regards,

Brian.

stickstone · July 3, 2007, 7:09pm

On Jul 3, 2007, at 8:44 AM, Brian C. wrote:

be removed trivially, e.g.

gsub!(/[^\/|\\_()\n]/, ' ')

that noise was just an example - the built-in noise is now based on
the characters used to draw the words so doing that would remove the
word itself!

Only minimal damage has been done to the original characters, which are now easy to pattern-match. It therefore seems that the randomly-strewn characters have the effect of making it more difficult for visually-impaired users to access the site, but cause very little impediment to bots

the point is that there aren’t any characters! think about it -
there are only random chars strewn about - not a single character
from the word exists in the captcha. for instance, the
Flatulent.element('foobar') produces this captcha:

   __     &nbsp
;           &nbsp
;    _       \           &
nbsp;  
  / _|
           _        | |
            
         
 |
 |
_     ___      __
_    |
 |__  |   __ _    _ __ <
br>||  _|
   / _ \    /
 _ \   | '_ 
\    / _| |  | '__|
 |
 |    | (_) |  | (_)
 | /|
 |_) |  | (_|||  | |
   
 |_|     |___/
    \___|   |_._\/
    \__,_|  |_|
   
        &n
bsp;        _      _           &
nbsp;           &
nbsp; 
   _           &
nbsp;           <
span 'style=color:#ccc;font-style:oblique'>|           &
nbsp;  |     |

have fun putting that together. to do it you need to render, not
just parse, html! no, let’s just say you drive firefox to render the
html, and then clip out the image, saving it as a tiff. this is the
result http://ocrnow.com gives for the above ascii captcha

l_  "l    1  l_

_ J / l_ ’ / / _l
/ ó
iM=,

pretty close huh?

where i’m heading now is using css and javascript to position the image and the characters within the image. this means you would actually need to first ocr the entire screen to find the captcha, then clip that out, then ocr that.

i’ve been working with ocr’ing several million satellite scans for the past three years and, let me tell you, it’s hard even when the text is clear if the position of said text cannot be known with certainty.

and, backing up a bit, with an ascii captcha you are several difficult steps away from having an image - even in the simple case.

Now, if this ASCII art were turned into a PNG I guess that would
make it a
bit harder - although not much, since it’s pretty trivial to OCR a
clean
grid of ASCII characters back to ASCII, albeit more computationally
expensive.

see above.

Perhaps this captcha will be useful if very few sites use it, so the
spammers don’t bother writing a decoder. But in that case you don’t
want it
used by “the masses”

i’ll give 100 bucks to the first person on the list to crack it when
i reach 1.0

two other factors in favour of ascii art

there are tons of ocr programs out there available for free.
there are no ascii art recognition programs that i am aware of.
captcha is not supposed to be ‘secure’ anyhow - it’s supposed to make it ‘harder’ for bots, that is all.
the porn industry is actively moving to ascii art as we speak. this is precisely because it is impossible to filter using traditional or ocr based spam filters. i think we can all agree that, when it comes to beating the system, they are the ones to follow.

-a

stickstone · July 3, 2007, 7:28pm

On Wed, Jul 04, 2007 at 02:07:07AM +0900, ara.t.howard wrote:

have fun putting that together. to do it you need to render, not
just parse, html! no, let’s just say you drive firefox to render the
html, and then clip out the image, saving it as a tiff. this is the
result http://ocrnow.com gives for the above ascii captcha
l_  "l    1  l_  
_ J / l_ ’ / / _l
/ ?
iM=,

Is that supposed to be readable?

stickstone · July 3, 2007, 7:50pm

On Jul 3, 2007, at 12:07 PM, ara.t.howard wrote:

the porn industry is actively moving to ascii art as we speak.

We will be nice and not ask how you came by this statistic, Ara.

James Edward G. II

stickstone · July 3, 2007, 7:38pm

On Jul 3, 2007, at 11:25 AM, Chad P. wrote:

Is that supposed to be readable?

that is exactly the result of ocr’ing (without training of
course) the image.

attached is result file and image.

stickstone · July 3, 2007, 8:07pm

On Jul 3, 2007, at 11:48 AM, James Edward G. II wrote:

On Jul 3, 2007, at 12:07 PM, ara.t.howard wrote:

the porn industry is actively moving to ascii art as we speak.

We will be nice and not ask how you came by this statistic, Ara.

James Edward G. II

lol!

http://drawohara.tumblr.com/post/4803694

-a

stickstone · July 5, 2007, 12:33am

On Jul 4, 2007, at 12:47 PM, Brian C. wrote:

Anyway, once your code is deployed on a real live site, by someone
other
than you, it becomes much harder to change. And the source is going
to be
available to the attacker too.

the latest version addresses all these issues and more. check out

http://drawohara.tumblr.com/post/4944987
http://fortytwo.merseine.nu:3000/flatulent/ajax

key points:

noise is image chars
no color diff between noise and image chars
image is not visible without running gecko or otherwise rendering javascript
image has an encoded timebomb in it: attacker has only 60s for post. this just rules out brute force attacks.

i think this bumps it up into a new league of attacks - maybe not though, people are creative

However, perhaps ASCII-art generation (as a form of unusual and
disjointed
character set) combined with server-side rendering to a PNG would
get around
that issue, save you a lot of work in obfuscating the HTML itself,
and also
be harder to parse.

true. i’m not too worried about that though.

contains
something that looks like ASCII art could probably be classified as
spam on
that basis alone.

the problem is that ascii art can contain any chars. ← ascii art

ASCII art is, I believe, much more suited to machine reading than a
scanned
printout.

i thought so too until i started playing with ocr’ing it - the
results are absolutely terrible. no doubt someone could train it -
but that’s true of all captchas: a sufficiently trained one will win.

Most importantly, the characters will be on an exact
horizontal/vertical grid alignment, not rotated by a few degrees.

version 0.0.3 adds vertical and horizontal displacement. the next one will introduce rotation.

And also I
suspect there will probably only be a handful of legible ASCII art
character
sets to choose from.

but the ‘pixel’ charset is large. version 0.0.3 works on that angle
too.

Anyway, time will tell. If your captcha isn’t widely used, then it may remain strong enough for a reasonable time. (That’s apart from the usual attacks on captchas, such as redirecting them to other humans who are in search of porn.)

right. and this is the key point: attackers can beat you with that strategy every time (not with my timebomb though - at least not as often). the only goal for a captcha is that it is not easily beaten by average coders - it’s not securing something after all - it’s a filter (not wall) for bots.

i’ll await your next attack!

cheers.

-a

stickstone · July 4, 2007, 8:56pm

have fun putting that together. to do it you need to render, not
just parse, html!

It looks pretty easy to me. You’ve conveniently put all the noise characters in a different colour.

Here’s my two-minute solution:

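$ cat reader.rb
src = File.read("test.html")

# blank out every noise span (the ones styled with colour #ccc)
src.gsub!(/<span [^>]*#ccc[^>]*>([^<]*)<\/span>/i) { " " * $1.size }
# turn the remaining markup back into plain text
src.gsub!(/&nbsp;/, ' ')
src.gsub!(/<br>/i, "\n")
src.gsub!(/<\/?pre[^>]*>/, '')
puts src
$ ruby reader.rb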

/ | | |
| | ___ ___ |__ __ _ _ __
| | / _ \ / _ \ | ’ \ / _ | | '__|
| | | () | | () | | |) | | (| | | |
|| / _ |. / _,| |_|

Of course you can keep changing your code, and I can keep changing mine. But someone who took more than two minutes over this could come up with a much more robust solution (e.g. dynamically working out the contrast between foreground and background).

Anyway, once your code is deployed on a real live site, by someone other than you, it becomes much harder to change. And the source is going to be available to the attacker too.

now, where i’m heading now, is using css and javascript so to
position the image and characters within the image.

Hmm - this risks making the captcha visible by fewer and fewer browsers. OK, so lynx wouldn’t be able to view a PNG captcha either; but you risk locking out a lot of mobile devices, set-top boxes and other embedded web browsers (which could otherwise display a PNG quite happily).

However, perhaps ASCII-art generation (as a form of unusual and disjointed character set) combined with server-side rendering to a PNG would get around that issue, save you a lot of work in obfuscating the HTML itself, and also be harder to parse.

two other factors in favour of ascii art

there are tons of ocr programs out there available for free.
there are no ascii art recognition programs that i am aware of.

That’s not because it’s hard - it’s because it’s been totally pointless, until now that is. If spammers start using ASCII art text, then there’s an incentive to make a reader. On the other hand, any E-mail which contains something that looks like ASCII art could probably be classified as spam on that basis alone.

ASCII art is, I believe, much more suited to machine reading than a scanned printout. Most importantly, the characters will be on an exact horizontal/vertical grid alignment, not rotated by a few degrees. And also I suspect there will probably only be a handful of legible ASCII art character sets to choose from.

Anyway, time will tell. If your captcha isn’t widely used, then it may remain strong enough for a reasonable time. (That’s apart from the usual attacks on captchas, such as redirecting them to other humans who are in search of porn.)

Regards,

Brian.

stickstone · July 5, 2007, 1:54am

ara.t.howard wrote:

image has an encoded timebomb in it: attacker has only 60s for
post. this just rules out brute force attacks.

From when does it start counting? If I’ve read a blog post and then
try to comment, it’s likely I’ve already used more than 60 seconds. In
fact, probably most of the time I take more than 60 seconds to comment
by itself.

I think a good protection scheme will take into account several factors,
assign them points for failure (or passing), and once a threshold has
been reached, fail the entire thing (or pass it, if you chose that
route).
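The point-scoring idea could be sketched roughly like this. The signal names, weights, and threshold are all invented for illustration; none of them come from flatulent or any real spam filter.

```ruby
# Hypothetical signals and weights (assumptions, tune to taste).
WEIGHTS = {
  captcha_wrong:   5,  # the answer did not match
  too_fast:        3,  # form posted implausibly quickly
  expired:         2,  # posted after the time limit
  repeat_offender: 4   # earlier failures from the same client
}.freeze

THRESHOLD = 5

# Sum the weights of the signals that fired and compare to the threshold.
def suspicious?(signals)
  score = signals.sum { |name| WEIGHTS.fetch(name) }
  score >= THRESHOLD
end

suspicious?([:expired])             # => false (score 2)
suspicious?([:too_fast, :expired])  # => true  (score 5)
```

With this shape a slow but otherwise well-behaved commenter only picks up the small `expired` penalty instead of being rejected outright.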

Sam

stickstone · July 5, 2007, 4:54am

On Jul 4, 2007, at 5:52 PM, Sammy L. wrote:

From when does it start counting?

from when the response is served. a timebomb is encoded in the form with a server-side key.

to set the key

Flatulent.key = 'hostname or something'

to set the timebomb threshold

Flatulent.ttl = 120 # seconds

If I’ve read a blog post and then try to comment, it’s likely
I’ve already used more than 60 seconds. In fact, probably most of
the time I take more than 60 seconds to comment by itself.

I think a good protection scheme will take into account several
factors, assign them points for failure (or passing), and once a
threshold has been reached, fail the entire thing (or pass it, if
you chose that route).

indeed, internally we will also track ip and greylist after n attempts.
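As a rough illustration of greylisting after n attempts (not flatulent's actual code): the class name, the in-memory Hash, and the limit of 3 are all assumptions, and a real deployment would want shared storage with expiry.

```ruby
# In-memory count of failed captcha attempts per IP address.
# Hypothetical sketch: state is lost on restart and never expires.
class Greylist
  MAX_FAILURES = 3 # assumed value for "n"

  def initialize
    @failures = Hash.new(0)
  end

  # Call this whenever a captcha answer from this IP is wrong.
  def record_failure(ip)
    @failures[ip] += 1
  end

  # True once the IP has failed MAX_FAILURES times or more.
  def greylisted?(ip)
    @failures[ip] >= MAX_FAILURES
  end
end

list = Greylist.new
3.times { list.record_failure('192.0.2.1') }
list.greylisted?('192.0.2.1')  # => true
list.greylisted?('192.0.2.2')  # => false
```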

cheers.

-a

stickstone · July 5, 2007, 2:19am

On Jul 4, 2007, at 6:52 PM, Sammy L. wrote:

I think a good protection scheme will take into account several
factors, assign them points for failure (or passing), and once a
threshold has been reached, fail the entire thing (or pass it, if
you chose that route).

Sam

you should do like blogger (blogspot) and others, allow writing and
after clicking on ‘submit’ or ‘post’ or whatever to submit the form
info, you then redirect to a page with the captcha and a submit.
after the captcha page is sent, begin the count. 60 seconds seems a
bit short for a whole post, but with a separate redirect to the
captcha page, it’s totally reasonable. If it takes longer, redirect
again to a new captcha. After 3 or 4 failed attempts, save it in a
log, kill that cookie and require a fresh start or a harder captcha.

Don’t put the count in JavaScript EVER. Client side code is totally
spoof-able.
All you need is the session data in the cookie to identify the user
and check to see if the response came quick enough.
60 seconds might not be long enough, but a browser will time out
during that long of a wait for a request’s response. Still a little
longer might be appropriate from an accessibility standpoint.

stickstone · July 5, 2007, 9:32am

On Thu, Jul 05, 2007 at 07:31:43AM +0900, ara.t.howard wrote:

However, perhaps ASCII-art generation (as a form of unusual and
disjointed
character set) combined with server-side rendering to a PNG would
get around
that issue, save you a lot of work in obfuscating the HTML itself,
and also
be harder to parse.

true. i’m not too worried about that though.

I’d be worried about the JavaScript and CSS requirements. In fact, I
won’t use a system for validating humanity that doesn’t work in Lynx,
unless some other necessary functionality of the website absolutely
cannot work in Lynx (such as Flash animations). Even then, I’d probably
avoid something that won’t work in Lynx, since (for example) a Lynx user
could navigate to YouTube and do a search to find a particular video,
then use youtube-dl to download it to the computer and play it using
MPlayer. No in-browser support for Flash video needed. YouTube can be a
useful website for a Lynx user – so mine should be, too, since I don’t
even provide Flash videos as the main content of any of my websites.

stickstone · July 5, 2007, 9:35am

On Thu, Jul 05, 2007 at 09:18:01AM +0900, John J. wrote:

you should do like blogger (blogspot) and others, allow writing and
after clicking on ‘submit’ or ‘post’ or whatever to submit the form
info, you then redirect to a page with the captcha and a submit.
after the captcha page is sent, begin the count. 60 seconds seems a
bit short for a whole post, but with a separate redirect to the
captcha page, it’s totally reasonable. If it takes longer, redirect
again to a new captcha. After 3 or 4 failed attempts, save it in a
log, kill that cookie and require a fresh start or a harder captcha.

Avoid reliance on cookies. For one thing, cookies can be forged. For
another, you’ll lose a lot of people with requirements for cookies.
Modern browsers tend to provide a means for selectively refusing cookies,
and a lot of people use those features.

Don’t put the count in JavaScript EVER. Client side code is totally
spoof-able.
All you need is the session data in the cookie to identify the user
and check to see if the response came quick enough.

Session data need not be stored in a cookie. There are other ways to do
it as well – allow for those who won’t (or can’t) accept cookies.

stickstone · July 5, 2007, 11:14am

ara.t.howard wrote:

image has an encoded timebomb in it: attacker has only 60s for post.
this just rules out brute force attacks.

Only naive ones… The brute force attack that still works is a
birthday attack. Using that they can try attacks as fast as they can
generate new captchas - you expect a collision every 1.2*sqrt(n)
attempts, where n is the size of your keyspace. That’s probably OK for
“captcha per comment” sites, but it’s dangerous for “captcha per
account” sites.
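To put a number on that estimate: for n equally likely values, the expected number of random draws before the first repeat is about sqrt(pi * n / 2), i.e. roughly 1.25 * sqrt(n), which is close to the 1.2 figure above. The keyspace below (six lowercase letters) is only an assumed example, not flatulent's real one.

```ruby
# Expected number of random draws before the first repeated value,
# for n equally likely values: sqrt(pi * n / 2).
def expected_draws_until_collision(n)
  Math.sqrt(Math::PI * n / 2.0)
end

# Hypothetical keyspace: captcha words of 6 lowercase letters.
keyspace = 26**6
puts "keyspace:   #{keyspace}"
puts "collisions: about every #{expected_draws_until_collision(keyspace).round} captchas"
```

The takeaway is the square root: a keyspace in the hundreds of millions yields a repeat after only tens of thousands of generated captchas.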

stickstone · July 5, 2007, 4:04pm

Chad P. wrote:

However, perhaps ASCII-art generation (as a form of unusual and
unless some other necessary functionality of the website absolutely
cannot work in Lynx (such as Flash animations). Even then, I’d probably
avoid something that won’t work in Lynx, since (for example) a Lynx user
could navigate to YouTube and do a search to find a particular video,
then use youtube-dl to download it to the computer and play it using
MPlayer. No in-browser support for Flash video needed. YouTube can be a
useful website for a Lynx user – so mine should be, too, since I don’t
even provide Flash videos as the main content of any of my websites.

I was once a proud member of the “This Web Site Best Viewed With Lynx”
club. Ah, the good old days, when a dollar was worth a dime, Netscape
was more popular than Internet Explorer and nobody’s cat had a web page.
;). I learned HTML by editing that web page by hand and generating code
with Perl 4 on an HP100 Pocket PC.

Well, maybe somebody’s cat had a web page …

stickstone · July 5, 2007, 4:56am

On Jul 4, 2007, at 6:18 PM, John J. wrote:

spoof-able.
All you need is the session data in the cookie to identify the user
and check to see if the response came quick enough.
60 seconds might not be long enough, but a browser will time out
during that long of a wait for a request’s response. Still a little
longer might be appropriate from an accessibility standpoint.

all good ideas - for now i’m just trying to get something working.
fyi all the stuff is client side, however the captcha and timebomb
have been blowfish encoded into hidden fields with a key known only
to the server. one could make guesses, but that’s about all.
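For readers wondering how a timebomb in a hidden field can be trusted when everything is client side, here is a minimal sketch. It swaps in an HMAC signature from Ruby's standard `openssl` library rather than the blowfish encoding described above, it covers only the timestamp and not the captcha answer, and the names `issue_token`, `token_valid?`, `SERVER_KEY`, and `TTL` are invented for the example.

```ruby
require 'openssl'

SERVER_KEY = 'hostname or something' # placeholder secret, known only to the server
TTL = 60                             # seconds the form stays valid (assumed)

# Hidden-field value: "<issued_at>.<hmac of issued_at>".
def issue_token(now = Time.now.to_i)
  sig = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), SERVER_KEY, now.to_s)
  "#{now}.#{sig}"
end

# Accept only if the signature matches and the token is no older than TTL.
def token_valid?(token, now = Time.now.to_i)
  issued_at, sig = token.to_s.split('.', 2)
  return false unless issued_at && sig

  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), SERVER_KEY, issued_at)
  # Plain == keeps the sketch short; a constant-time compare is preferable.
  return false unless sig == expected

  now - issued_at.to_i <= TTL
end

token = issue_token
token_valid?(token)  # => true while the token is fresh
```

A client can read the timestamp but cannot move it forward without the key, since any change invalidates the signature.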

cheers.

-a

stickstone · July 5, 2007, 8:34pm

On Thu, Jul 05, 2007 at 11:02:21PM +0900, M. Edward (Ed) Borasky wrote:

useful website for a Lynx user – so mine should be, too, since I don’t
even provide Flash videos as the main content of any of my websites.

I was once a proud member of the “This Web Site Best Viewed With Lynx”
club. Ah, the good old days, when a dollar was worth a dime, Netscape
was more popular than Internet Explorer and nobody’s cat had a web page.
;). I learned HTML by editing that web page by hand and generating code
with Perl 4 on an HP100 Pocket PC.

I don’t think any website I’ve put together in the last six or seven
years is best viewed in Lynx, but they degrade gracefully enough so a
Lynx user can still get at the content without too much trouble,
generally. I use CSS pretty exclusively for styling these days, and
sometimes use JavaScript too, but I don’t use either for core site
functionality without an alternative means of getting at content (like
the way Google Maps has a static version in case your browser can’t
handle AJAX).

stickstone · July 5, 2007, 4:50pm

On Jul 5, 2007, at 3:12 AM, Alex Y. wrote:

Only naive ones… The brute force attack that still works is a birthday attack. Using that they can try attacks as fast as they can generate new captchas - you expect a collision every 1.2*sqrt(n) attempts, where n is the size of your keyspace. That’s probably OK for “captcha per comment” sites, but it’s dangerous for “captcha per account” sites.

yes. although flatulent does work without sessions, it’s about three lines to save the data into the session and validate against that as well, so a cautious person would be wise to do so. i’ll be adding automatic session validation for rails - but the api should make it super easy anywhere.
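A sketch of what that session check might look like, with a plain Hash standing in for a framework session. The method names and the `:captcha_answer` key are hypothetical and not part of flatulent's api.

```ruby
# `session` is a plain Hash here; a web framework would supply its own.

# Store the expected answer server side when the captcha is rendered.
def remember_captcha(session, answer)
  session[:captcha_answer] = answer
end

# Hash#delete returns the stored value and removes it, so each captcha
# can be answered at most once, whether the answer is right or wrong.
def captcha_passed?(session, submitted)
  expected = session.delete(:captcha_answer)
  !expected.nil? && expected == submitted
end

session = {}
remember_captcha(session, 'foobar')
captcha_passed?(session, 'foobar')  # => true
captcha_passed?(session, 'foobar')  # => false, already used
```

Making the answer single-use is what blunts replaying one solved captcha many times within the time window.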

cheers.

-a