Google Security Spam Technology

Google Fires Back About Search Engine Spam 270

coondoggie writes "The folks at Google are taking issue over spam and the quality of Google searches, which some claim has gone down in recent months. Today on Google's official blog, Principal Engineer Matt Cutts said, 'January brought a spate of stories about Google’s search quality. Reading through some of these recent articles, you might ask whether our search quality has gotten worse. The short answer is that according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness. Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English.' Cutts also explained that the company has made a few significant changes to their method of indexing."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday January 21, 2011 @04:35PM (#34958424)

    A typical problem companies have is measuring the quality of their products: By their metric, it's great! But per the user experience it's not. The users must be wrong.

    The metric doesn't always capture the things that the users care about. Also, expectations can change. Better than five years ago may not be good enough.

    Based on my experience, Google's search quality is insufficient to make it useful for most purposes. It's plan B now. No search engine is much better, but plan A is to use better resources: Wikipedia, knowledge written or compiled by an expert, etc.

  • hmm (Score:4, Insightful)

    by edxwelch ( 600979 ) on Friday January 21, 2011 @04:37PM (#34958458)

    "spam in most other languages is even lower than in English."

    This is definitely not true for Spanish. There has always been a higher level of spam results for Spanish.

  • by Anonymous Coward on Friday January 21, 2011 @04:37PM (#34958460)

    Bottom line is that their 'metrics' are faulty. Who gives a damn about freshness when the content is irrelevant? Bottom line is that in recent memory it's actually more difficult to find good results using Google.

    PS. No one cares about forum postings that barely scratch the surface of a subject, contain incomprehensible grammar, or just contain questions about your topic rather than relevant information. But if Google doesn't even want to recognize that it is doing things that customers don't like, it will eventually go the way of the dodo bird as well.

  • by FrankSchwab ( 675585 ) on Friday January 21, 2011 @04:40PM (#34958510) Journal

    "according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness."

    And thus begins the downfall of Google. Once you start drinking your own lemonade and stop listening to the people who use your product, you're on a greased downhill slope.

  • by ChaoticCoyote ( 195677 ) on Friday January 21, 2011 @04:49PM (#34958620) Homepage

    I've switched to other search engines; from my experience, Google provides too many tangential and corporate references when I do research.

    Also, how does Google "know" that their search results were valid? I'll often do a Google search, click a couple of links, and after being disappointed, I'll go to another search engine where I get more useful results.

    What bugs me the most are searches on technical or medical topics, where Google gives me a dozen "harvester" results -- e.g., I get sites that have stolen conversations from other message boards, and reported them along with tons of ads. Yuck! There must be dozens or hundreds of sites, all with broken answers to questions about JavaScript and/or medicines.

    Just because evidence is anecdotal doesn't mean it should be blithely discounted. If I say "Ouch" at being cut, that means the injury hurt me; the pain is quite real even if no one else has felt it.

  • Re:Pshaw (Score:5, Insightful)

    by martin-boundary ( 547041 ) on Friday January 21, 2011 @04:50PM (#34958626)
    There's no way to know what kind of empirical tests they do. So your anecdotal evidence may well measure a different aspect that is being ignored in their tests.

    Empiricism is all about saying "Here's what I did, and these are the results." It's not empirical to say "Trust me, I did something I can't tell you about, and the results are really good".

  • Whew! (Score:2, Insightful)

    by bonch ( 38532 ) on Friday January 21, 2011 @04:51PM (#34958646)

    "Our tests say we're better than what our customers are saying!"

  • Re:Sorry Google (Score:5, Insightful)

    by bonch ( 38532 ) on Friday January 21, 2011 @04:55PM (#34958702)

    My favorite part is how searching for something that happens to appear in a Stackoverflow question returns dozens of sites that copy and paste the Stackoverflow content surrounded by ads.

  • Re:I call no-way (Score:5, Insightful)

    by Actually, I do RTFA ( 1058596 ) on Friday January 21, 2011 @04:55PM (#34958704)

    And I'm out of moderator points. Between the "oh, you're looking for something obscure... here's something that's spelled similarly" mentality, and constantly returning pages from 2003 about technical subjects, it's pretty hard to find anything on Google that I care about. Except for using them to find large corporate sites.

    Add the fact that spam copies are constantly higher than the original, and I see no solution.

  • FUD (Score:2, Insightful)

    by wiredlogic ( 135348 ) on Friday January 21, 2011 @04:57PM (#34958728)

    I'm seeing less spam than a few years ago, when link farms and Wikipedia clones were showing up everywhere on the top results pages. This smells like Microsoft-funded FUD.

  • In the last few years, I've found search results have been dominated more and more by content mills like Associated Content, eHow, HubPages, About.com, and others; or some low-quality Q&A page, like Yahoo Answers. The pages are hastily written and edited, and low on content. The articles are also typically written by someone without any relevant knowledge or experience - so the information is common knowledge or wrong.

    If Google's metrics say quality is up, but their users think quality is down, then Google's metrics need to be revised to match user experience more closely. I've started using DuckDuckGo [duckduckgo.com] because they block content mills, and thus I think their results are as good as or better than Google's, even without the complicated algorithms and all the data Google has accumulated.
  • Re:FUD (Score:4, Insightful)

    by I8TheWorm ( 645702 ) * on Friday January 21, 2011 @05:08PM (#34958872) Journal

    I don't think it is. I (and apparently quite a few responders here) am seeing worse results now than ever before. Anything remotely close to what I search for tends to start around the third or fourth result (not including sponsored results).

  • Re:I call no-way (Score:4, Insightful)

    by synthesizerpatel ( 1210598 ) on Friday January 21, 2011 @05:11PM (#34958928)

    You bring up other great points. Spam copies are maddening.

    There needs to be a 'never show me results from this domain' button to blacklist this garbage and keep people from gaming the system.

  • by DragonWriter ( 970822 ) on Friday January 21, 2011 @07:34PM (#34960986)

    Actually, if you read the blog post from Google linked in TFS, they aren't saying that "there is no problem" (as parent post's title suggested) or that "it's great" (as parent post's text suggested).

    They did say that their own metrics don't show the trend that various, mostly anecdotal, critics have claimed. But they also said that they view the spam that does exist as a problem, and they announced several steps to address it:

    As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content. We’ll continue to explore ways to reduce spam, including new ways for users to give more explicit feedback about spammy and low-quality sites.

    As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better.

    This is not a company denying that there is a problem because their internal metrics don't match the problems being reported. It is a company acknowledging that there is a problem and committing to take action on it, even though their own internal metrics don't agree with their critics on the size of or trend in the problem.
