Friday, 2 December 2011

Do Your Bit for old books: reCAPTCHA

Have you heard of reCAPTCHA? It's a free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows.

A CAPTCHA is a program that can tell whether its user is a human or a computer, which helps to stop spam. You see them on lots of websites: those squiggly words or numbers you have to type in.
About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. reCAPTCHA puts this human effort towards digitizing text printed before the digital age.
The books are scanned, and then transformed into text using "Optical Character Recognition" (OCR). The problem is that OCR is not perfect. reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher.

But, you ask, if a computer can't read such a CAPTCHA, how does the system know the correct answer? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
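For the curious, the two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the function names, the case-insensitive comparison and the three-vote threshold are all assumptions made for the example.

```python
from collections import Counter


def check_submission(known_answer, known_guess, unknown_guess, votes):
    """Check one user's attempt at a two-word puzzle.

    The user passes only if they get the already-known word right.
    When they pass, their reading of the new word is recorded as a vote.
    (Ignoring case and surrounding spaces is an assumption of this sketch.)
    """
    passed = known_guess.strip().lower() == known_answer.strip().lower()
    if passed:
        votes[unknown_guess.strip().lower()] += 1
    return passed


def resolve(votes, min_votes=3):
    """Return the agreed reading of the new word, or None if undecided.

    min_votes is a hypothetical threshold: the most popular reading is
    accepted once that many trusted users have typed it.
    """
    if not votes:
        return None
    answer, count = votes.most_common(1)[0]
    return answer if count >= min_votes else None


# Example: "morning" is the known word; the scanned word is still unknown.
votes = Counter()
check_submission("morning", "morning", "overlooks", votes)   # passes, vote counted
check_submission("morning", "mourning", "overbooks", votes)  # fails, vote ignored
check_submission("morning", "Morning", "overlooks", votes)   # passes, vote counted
check_submission("morning", "morning", "overlooks", votes)   # passes, vote counted
print(resolve(votes))
```

A guess for the new word only counts when the same person got the known word right, and the new word is only treated as solved once enough of those trusted guesses agree.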

And you, yes YOU, can help. If you have a website, as we do, you can add reCAPTCHA to it (we are in the process of doing this ourselves). Or, if you have a little spare time, just go to the reCAPTCHA website and solve a couple. Just think what five minutes of your spare time can do for the massive number of vintage books, newspapers and radio shows that are floating around.

This post is part of our Do Your Bit series, which is all about lending a hand with the little things to make the big things happen. Find out more about this series and our others on our Overview blog, or find more posts in the same series by pressing the "Do Your Bit" tab at the top of the page.