Usenet.com

Comp Thread Archive from Usenet.com

Re: prevent OCR







Felix Deutsch wrote:

> Impossible. As long as a human reader is able to comfortably read it,
> any state-of-the-art OCR software should be able to handle it.

I'm not sure where the requirement of reading the text comfortably comes from?

  This is where the adversary modelling comes in. How badly does he
want to get OCR done? If it's a reasonably short text, why not touch-type
it from the screen? If it's long, why not send it to a typing sweat-shop
off-shore? Timing could be an issue, in which case it's a question of
delaying the adversary as much as possible: to make it a huge job.

  An average user will be stopped by the OCR program complaining that
the resolution is too low -- there have been several posts on that earlier.

  Even if the adversary does have a top-of-the-line OCR program, does it
handle 8x5-sized letters well? Especially if sampling problems are introduced
so that an 'a' will produce thirty different bit patterns, and will also run
into the immediately preceding and following glyphs?

  (Try scanning an ordinary text at 60 dpi b/w -- that's approximately
the effect I'm going for. A human can read it, with some difficulty,
but an ordinary OCR program doesn't get enough information about segments to
work properly.)
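To make the sampling point concrete, here is a minimal sketch, not a model of any particular scanner or OCR program. The bitmap GLYPH (a crude 'a' with 2-pixel strokes), the 3x reduction factor and the majority-vote rule are all assumptions chosen for illustration. It shifts the glyph by a fraction of a coarse pixel before reducing it, and collects the distinct coarse bit patterns that result.

```python
# Hypothetical fine-resolution glyph: a crude 'a' with 2-pixel strokes.
GLYPH = [
    "..######..",
    "..######..",
    "........##",
    "........##",
    "..########",
    "..########",
    "##......##",
    "##......##",
    "##......##",
    "##......##",
    "..########",
    "..########",
]


def resample(glyph, factor, dx, dy):
    """Shift the glyph by (dx, dy) fine pixels, then reduce each
    factor x factor block to one coarse pixel by majority vote."""
    h, w = len(glyph), len(glyph[0])
    rows = -(-(h + factor - 1) // factor)  # ceiling division
    cols = -(-(w + factor - 1) // factor)

    def ink(y, x):
        # Fine pixel of the shifted glyph; anything outside it is white.
        y, x = y - dy, x - dx
        return 0 <= y < h and 0 <= x < w and glyph[y][x] == "#"

    out = []
    for r in range(rows):
        line = ""
        for c in range(cols):
            count = sum(ink(r * factor + i, c * factor + j)
                        for i in range(factor) for j in range(factor))
            line += "#" if 2 * count >= factor * factor else "."
        out.append(line)
    return tuple(out)


def distinct_patterns(glyph, factor):
    """All coarse patterns obtained over every sub-pixel offset."""
    return {resample(glyph, factor, dx, dy)
            for dx in range(factor) for dy in range(factor)}


if __name__ == "__main__":
    for pattern in sorted(distinct_patterns(GLYPH, 3)):
        print("\n".join(pattern), end="\n\n")
```

The same letter yields several different coarse bitmaps depending only on where it falls on the sampling grid, which is the kind of variation a template-based matcher would have to cope with.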

> I think it's quite possible to do OCR on a lot less dpi than 200.

Try the scenario I've just suggested: 60 dpi b/w. Please note: the question is not whether it can be done at any cost, but whether some other process would be cheaper (in time or work). If typing by hand is cheaper and gives a lower error rate, there's no point in using OCR.
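The cost comparison can be put as back-of-the-envelope arithmetic. This is only a sketch: the function cost_minutes and every figure fed to it (typing speed, error rates, time per correction, setup time) are made-up assumptions, not measurements.

```python
def cost_minutes(chars, setup_min, chars_per_min, error_rate, fix_sec_per_error):
    """Total effort in minutes: fixed setup + raw capture + correcting errors."""
    capture = chars / chars_per_min
    fixing = chars * error_rate * fix_sec_per_error / 60
    return setup_min + capture + fixing


TEXT_LENGTH = 10_000  # characters; hypothetical document size

# All numbers below are illustrative guesses.
typing = cost_minutes(TEXT_LENGTH, setup_min=0, chars_per_min=200,
                      error_rate=0.01, fix_sec_per_error=5)
ocr_clean = cost_minutes(TEXT_LENGTH, setup_min=30, chars_per_min=100_000,
                         error_rate=0.005, fix_sec_per_error=5)
ocr_degraded = cost_minutes(TEXT_LENGTH, setup_min=30, chars_per_min=100_000,
                            error_rate=0.20, fix_sec_per_error=5)

if __name__ == "__main__":
    print(f"typing by hand : {typing:.0f} min")
    print(f"OCR, clean scan: {ocr_clean:.0f} min")
    print(f"OCR, degraded  : {ocr_degraded:.0f} min")
```

Under these assumed figures OCR wins on a clean scan, but once the error rate climbs, the time spent correcting its output dominates and typing by hand becomes the cheaper route.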

  At 60 dpi automatic despeckling becomes a serious problem,
unless you can turn it off. Another of those things the
average user will be stopped by. And even without despeckling,
letters run together in a way that makes them very difficult to
split automatically.
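Both effects can be shown on a toy bitmap. The sketch below assumes a simple despeckler that drops every 4-connected ink blob smaller than min_size pixels; real OCR packages may well use something different. LOW_RES is a hypothetical low-resolution rendering of an 'i' followed by an 'r' and an 'n' that touch.

```python
# Hypothetical low-resolution bitmap: 'i' (dot + stem), then 'r' touching 'n'.
LOW_RES = [
    "#.........",
    "..........",
    "#.#######.",
    "#.#..#..#.",
    "#.#..#..#.",
]


def components(bitmap):
    """Yield each 4-connected group of ink pixels as a list of (row, col)."""
    h, w = len(bitmap), len(bitmap[0])
    seen = set()
    for r in range(h):
        for c in range(w):
            if bitmap[r][c] != "#" or (r, c) in seen:
                continue
            stack, group = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                group.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and bitmap[ny][nx] == "#" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            yield group


def despeckle(bitmap, min_size):
    """Remove every ink blob with fewer than min_size pixels."""
    keep = {p for g in components(bitmap) if len(g) >= min_size for p in g}
    return ["".join("#" if (r, c) in keep else "." for c in range(len(row)))
            for r, row in enumerate(bitmap)]


if __name__ == "__main__":
    print(sorted(len(g) for g in components(LOW_RES)))
    print("\n".join(despeckle(LOW_RES, min_size=2)))
```

The dot of the 'i' is a single pixel, so the despeckler treats it as noise, and the 'r' and 'n' form one blob that a segmenter working on connected components cannot split without further heuristics.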

> Just load the picture into any graphics program and juggle the palette
> for maximum contrast.

So why can't the adversary unjuggle it? Again, a general user would probably stop in confusion when the OCR program produces nothing. A more experienced user would understand the problem, and just change the colours into something that could be easily distinguished after greyscale conversion. If the colours are 'flat', it's trivial. If the colours are grainy -- as after adding noise -- it will again take a bit of work to do it.
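For the 'flat' colour case, a small sketch of both the trick and the counter-move. It assumes the greyscale conversion uses the common Rec. 601 luma weights (0.299, 0.587, 0.114); whether a given OCR program does is an assumption. The two colours FG and BG are hypothetical, picked so that they land on the same grey level.

```python
def luma(rgb):
    """Grey level of an (R, G, B) colour using Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def best_channel(fg, bg):
    """Index (0=R, 1=G, 2=B) of the channel that separates fg and bg most."""
    return max(range(3), key=lambda i: abs(fg[i] - bg[i]))


# Hypothetical text and background colours with (almost) identical luma.
FG = (200, 0, 0)
BG = (0, 100, 10)

if __name__ == "__main__":
    print(round(luma(FG)), round(luma(BG)))  # same grey after conversion
    ch = best_channel(FG, BG)
    print("RGB"[ch], abs(FG[ch] - BG[ch]))   # channel with the largest gap
```

After a plain greyscale conversion the text vanishes into the background, but looking at a single colour channel instead brings it straight back, which is why flat colours offer little protection.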

  I made a few tests, and in each I could distinguish foreground
and background -- not by eye, but with histogram stretching and
similar techniques. The question becomes: just how much
effort is protection worth? How much time will an adversary be
prepared to spend? And at what point is OCR not the best solution
anymore? I'm just trying to push it over that point.
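Histogram stretching in its simplest linear form can be sketched as follows. The pixel values are invented for illustration (background around 120-122, text around 125-127, far too close to tell apart by eye), and the function name stretch is my own; this is not the exact procedure used in the tests mentioned above.

```python
def stretch(pixels, out_max=255):
    """Linearly map the darkest value to 0 and the brightest to out_max."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0 for _ in pixels]  # flat image: nothing to stretch
    return [(v - lo) * out_max // (hi - lo) for v in pixels]


# Hypothetical grey levels: background and text differ by only a few steps.
BACKGROUND = [120, 121, 122, 121, 120]
TEXT = [125, 126, 127, 126]

if __name__ == "__main__":
    stretched = stretch(BACKGROUND + TEXT)
    bg, fg = stretched[:len(BACKGROUND)], stretched[len(BACKGROUND):]
    print("background:", bg)
    print("text      :", fg)
    # A fixed threshold in the middle of the range now separates the two.
    print(all(v < 128 for v in bg) and all(v >= 128 for v in fg))
```

With grainy colours the two value ranges start to overlap, so a single stretch and threshold is no longer enough, and that extra work is where the delay for the adversary comes from.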

--
Anders Thulin     [EMAIL PROTECTED]     http://www.algonet.se/~ath



