reCAPTCHA: Using Captchas To Digitize Books


Captchas are well known for keeping automated spammers out and letting humans in.
However, ReCaptcha is a rather clever service that also uses them to help digitize books scanned into the Internet Archive. It’s a project from the School of Computer Science at Carnegie Mellon.

The Internet Archive is home to over 200,000 scanned copies of classic books. Some of them are gorgeously crafted, like this children’s book, but fancy styling can make it difficult for computers to convert the books into indexable digital text. Much like a Mechanical Turk application, ReCaptcha uses humans to transcribe images of scanned words that a computer couldn’t read.

The scanned words are placed alongside a normal captcha widget so users decode both words at the same time. The same word can be shown to multiple people to cut down on errors. Because so many captchas are solved, this approach also offers the opportunity to convert a lot of words.
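To make the idea concrete, here is a minimal sketch of how answers from several people might be combined. It is not ReCaptcha's actual algorithm: the function names, the vote threshold, and the agreement ratio are all assumptions made up for illustration. An answer for the unknown word only counts if the same user got the known captcha word right, and a transcription is accepted once enough users agree.

```python
from collections import Counter

# Assumed thresholds, chosen only for illustration.
MIN_VOTES = 3          # how many trusted answers we want before deciding
MIN_AGREEMENT = 0.75   # share of votes the top answer must reach


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def record_answer(votes, control_answer, control_truth, unknown_answer):
    """Count the unknown-word answer only if the known word was solved."""
    if normalize(control_answer) != normalize(control_truth):
        return False  # failed the normal captcha, so the answer is ignored
    votes[normalize(unknown_answer)] += 1
    return True


def consensus(votes):
    """Return the agreed transcription, or None if there is no agreement yet."""
    total = sum(votes.values())
    if total < MIN_VOTES:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= MIN_AGREEMENT else None


# Hypothetical answers for one scanned word.
votes = Counter()
record_answer(votes, "morning", "morning", "upon")
record_answer(votes, "mornin", "morning", "upom")   # wrong control word, ignored
record_answer(votes, "Morning", "morning", "Upon")
record_answer(votes, "morning", "morning", "upon")
result = consensus(votes)
print(result)
```

Requiring the known word to be correct first is what lets the unknown word ride along on an ordinary captcha: a bot or careless user who fails the control never influences the transcription.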

ReCaptcha’s founders, Luis von Ahn and Ben Maurer, estimate that about 60 million CAPTCHAs are solved every day. Assuming that each CAPTCHA takes 10 seconds to solve, this is over 160,000 human hours per day (that’s about 19 years).
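The arithmetic behind that estimate is easy to check. The short calculation below uses only the two figures quoted above (60 million CAPTCHAs a day, 10 seconds each); the variable names are just for illustration.

```python
CAPTCHAS_PER_DAY = 60_000_000   # estimate quoted above
SECONDS_PER_CAPTCHA = 10        # assumed solving time quoted above

total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
human_hours = total_seconds / 3600        # seconds -> hours
human_years = human_hours / 24 / 365      # hours -> days -> years

print(f"{human_hours:,.0f} human hours per day")
print(f"{human_years:.1f} years of continuous effort")
```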

It’s great to see projects like this harnessing just a bit of our time to solve some important and complex problems.

