reCAPTCHA: Using Captchas To Digitize Books

CAPTCHAs are well known for keeping automated spammers out and letting humans in. reCAPTCHA, however, is a clever service that also uses them to help digitize books scanned into the Internet Archive. It’s a project from the School of Computer Science at Carnegie Mellon.

The Internet Archive is home to over 200,000 scanned copies of classic books. Some of them are gorgeously crafted, like this children’s book, but fancy styling can make it difficult for computers to translate the books into indexable digital text. Much like a Mechanical Turk application, reCAPTCHA uses humans to transcribe images of scanned words that a computer couldn’t read.

The scanned words are placed alongside a normal captcha widget so users decode both words at the same time. Each scanned word can be shown to multiple people to cut down on errors. Because so many captchas are solved every day, this approach can also convert a very large number of words.
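To make the two-word idea concrete, here is a minimal Python sketch of how answers might be tallied. It is not reCAPTCHA's actual code: the function names, the rule that a vote only counts when the known word is typed correctly, and the thresholds (`min_votes`, `min_share`) are all assumptions made for illustration.

```python
from collections import Counter


def normalize(text):
    # Compare answers case-insensitively and ignore stray whitespace.
    return text.strip().lower()


def record_answer(votes, control_answer, control_truth, unknown_answer):
    """Count a vote for the scanned word only if the known word was typed correctly.

    Assumption: passing the known (control) word is what earns trust for the
    answer to the unknown scanned word.
    """
    if normalize(control_answer) != normalize(control_truth):
        return False  # failed the normal captcha, so the vote is ignored
    votes[normalize(unknown_answer)] += 1
    return True


def consensus(votes, min_votes=3, min_share=0.6):
    """Return the agreed reading of the scanned word, or None if undecided.

    min_votes and min_share are hypothetical thresholds.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None
```

Each scanned word would keep its own `Counter` of votes. Once enough people who passed the known word agree on a reading, `consensus` returns it; until then the word stays in rotation.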

reCAPTCHA’s founders, Luis von Ahn and Ben Maurer, estimate that about 60 million CAPTCHAs are solved every day. Assuming that each CAPTCHA takes 10 seconds to solve, this is over 160,000 human hours per day (that’s about 19 years).
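The arithmetic behind that estimate is easy to check. The short Python sketch below uses only the two figures quoted above (60 million CAPTCHAs per day, 10 seconds each); the function name and the 365-day year are assumptions for illustration.

```python
def solving_time(captchas_per_day, seconds_each):
    """Return (hours, years) of human effort spent on CAPTCHAs in one day."""
    total_seconds = captchas_per_day * seconds_each
    hours = total_seconds / 3600      # 3600 seconds in an hour
    years = hours / 24 / 365          # assumes a 365-day year
    return hours, years


hours, years = solving_time(60_000_000, 10)
print(f"{hours:,.0f} hours, about {years:.1f} years")
# prints "166,667 hours, about 19.0 years"
```

So the "over 160,000 hours" and "about 19 years" figures are consistent with each other.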

It’s great to see projects like this harnessing just a bit of our time to solve some important and complex problems.
