Google Technology

Google Using ReCAPTCHA To Decode Street Addresses

smolloy writes "Apparently some users of reCAPTCHA have recently begun seeing photographs appear in their CAPTCHA puzzles: photos that look very much like zoomed-in house numbers taken from Google Street View. It appears that Google has decided to put the reCAPTCHA system to work helping clean up Google Street View images, and 'according to a Google spokesperson, the system isn't limited to street addresses, but also involves street names and even traffic signs.' A large collection of these has appeared on the Blackhatworld website."
This discussion has been archived. No new comments can be posted.

  • Re:Eyebleed site (Score:4, Informative)

    by bertoelcon ( 1557907 ) on Thursday March 29, 2012 @05:20PM (#39515599)

    Wow that site is so terrible looking that it makes Geocities and myspace look decent. The only thing it's missing is cosmic cursors.

    Yeah, Techcrunch is really ugly isn't it.

  • by Gen-GNU ( 36980 ) on Thursday March 29, 2012 @05:28PM (#39515723)

    I have read the quote from Google about what they are doing several times, and I don't see what everyone else sees. It appears to me that they are using the already known street names and numbers as possible ReCAPTCHA images. What they are NOT doing is using the results given by people to define what the image says. The point of the experiment is to determine whether these images are sufficient to separate people from web-bots. I imagine that they will look at the number of 'wrong' answers from both sides of the test, and see if bots are able to parse the street view images significantly more often than the standard test images.

    So... can anyone point to something in the Google quote to show me where I went wrong? From TFA, here is the quote:

    We’re currently running an experiment in which characters from Street View images are appearing in CAPTCHAs. We often extract data such as street names and traffic signs from Street View imagery to improve Google Maps with useful information like business addresses and locations. Based on the data and results of these reCaptcha tests, we’ll determine if using imagery might also be an effective way to further refine our tools for fighting machine and bot-related abuse online.

  • by Baloroth ( 2370816 ) on Thursday March 29, 2012 @05:54PM (#39516037)

    Yet Google would have to know what the address number really was in order to validate the reCAPTCHA, so that can hardly be why they are doing it. They don't need to crowdsource an answer that they already know.

    No they don't. They also add an altered text image alongside the picture (which presumably they generated), and can use that to validate the CAPTCHA. The street number can be validated by numerical probability (if 70% of them say it is "257", and the numbers "2,5,7" appear frequently in the rest, it is probably "257") even if they don't already know what it is.
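The "numerical probability" idea above could be read in several ways. One possible reading, sketched here purely as an illustration, is a per-position vote over the digits people typed. The function name `digit_vote` and the rule of keeping only answers of the most common length are assumptions made for this sketch, not anything Google has described.

```python
from collections import Counter


def digit_vote(answers):
    """Return the most common digit at each position, or None if no usable answers.

    Hypothetical helper: only all-digit answers of the most common length are counted.
    """
    numeric = [a for a in answers if a.isdigit()]
    if not numeric:
        return None
    # Keep only answers whose length matches the most frequently seen length.
    modal_len = Counter(len(a) for a in numeric).most_common(1)[0][0]
    same_len = [a for a in numeric if len(a) == modal_len]
    # zip(*same_len) yields one tuple of characters per digit position.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*same_len))


# Seven people type "257"; three others each get one digit wrong.
guesses = ["257"] * 7 + ["251", "267", "357"]
print(digit_vote(guesses))
```

In this made-up sample, 70% of the answers agree outright and the dissenting answers still agree on two of three digits, so every position has a clear majority.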

  • by cforciea ( 1926392 ) on Thursday March 29, 2012 @05:54PM (#39516039)
    I don't think you know how reCAPTCHA works. You are always presented with two different items to decode. One of them is always a known answer, and the other they are less sure about, but become more sure after they show it to enough people and get a crowd sourced answer. They don't give you two prompts just to be double sure you are human.
  • by eldorel ( 828471 ) on Thursday March 29, 2012 @05:57PM (#39516081)
    reCAPTCHA works by pairing a known value with an unknown one; that's why you have to type two words.

    One of the two words is considered solved and is the actual CAPTCHA; the second word is using you as an OCR engine.

    After enough people provide the same solution for the second word, it goes into the solved category and is used for validation.

    They don't have to pay people to validate the addresses; we're doing it for free.
  • by eldorel ( 828471 ) on Thursday March 29, 2012 @06:38PM (#39516491)
    Not exactly, but pretty close.

    They give you two words: one is an already solved, known value, and the other is an unknown word.
    If you get the first word correct, they take the value you typed for the second word and add it to the "possible solutions" list.

    After 2000 or so people have solved the word, they examine the results for a statistically distinct answer. If there is no outlier (say, only 65% have the same answer), it goes back into the unknown pile.

    Once they find a statistically significant answer, it's considered "solved" and is used as one of the initial validation words.

    Rinse, repeat.
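The loop described above can be sketched roughly as follows. This is a minimal illustration, not reCAPTCHA's actual implementation: the function names `record_answer` and `consensus` are hypothetical, `min_votes=2000` echoes the "2000 or so" figure in the comment, and `min_share=0.75` is an arbitrary placeholder chosen only so that the 65% example counts as "no clear answer".

```python
from collections import Counter


def normalize(text):
    """Compare answers case-insensitively, ignoring surrounding whitespace."""
    return text.strip().lower()


def record_answer(votes, control_answer, control_guess, unknown_guess):
    """Count the guess for the unknown word only if the control word was right.

    Returns True when the user passes the CAPTCHA (control word matched).
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    votes[normalize(unknown_guess)] += 1
    return True


def consensus(votes, min_votes=2000, min_share=0.75):
    """Return the agreed answer, or None if the word stays in the unknown pile.

    Both thresholds are assumptions made for this sketch.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    answer, count = votes.most_common(1)[0]
    return answer if count / total >= min_share else None


votes = Counter()
record_answer(votes, "street", "Street", "257")   # control word right: vote counted
record_answer(votes, "street", "stret", "999")    # control word wrong: vote ignored
print(votes)
```

Once `consensus` returns an answer, the word would move to the solved pool and could serve as a control word for later users.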
  • by LanMan04 ( 790429 ) on Thursday March 29, 2012 @06:49PM (#39516619)

    What they are NOT doing is using the results given by people to define what the image says.

    Um, no, that's exactly what ReCaptcha is for! The standard ReCaptcha images are all from old books that were scanned in (and presumably had trouble being OCRed with high confidence), and Google used ReCaptcha to "read" the words.

    For heaven's sake, ReCaptcha's MOTTO is: "reCAPTCHA: Stop Spam, Read Books"

    I read how it works. Multiple users are shown the same image, and once a few people have identified a given image as the same word, it's treated as the "correct" answer, and then later users have to match that answer to get past the ReCaptcha. This is why they show you more than one word....one word has a "known" answer, the other word is one they're still trying to figure out the "right" answer to.
