
Google Flu Trends Gets It Wrong Three Years Running

Posted by timothy
from the coughedword-coughedword dept.
wabrandsma writes with this story from NewScientist: "Google may be a master at data wrangling, but one of its products has been making bogus data-driven predictions. A study of Google's much-hyped flu tracker shows that it has consistently overestimated flu cases in the US for years. It's a failure that highlights the danger of relying on big data technologies.

Evan Selinger, a technology ethicist at Rochester Institute of Technology in New York, says Google Flu's failures hint at a larger problem with the algorithmic approach taken by technology companies to deliver services we all want to use. The problem is with the assumption that either the data that is gathered about us, or the algorithms used to process it, are neutral. Google Flu Trends has been discussed on Slashdot before: When Google Got Flu Wrong."

  • Google Flu: That unit is defective. Its thinking is chaotic. Absorbing it unsettled me.
  • Big Data Fail (Score:2, Insightful)

    by Anonymous Coward

    Not surprising. Most analysis on huge data sets is incorrect, which is why the NSA thing is scary! They get it wrong and you end up with a missile through your window! Oops...

    • An excellent example is Li's copula [wired.com], widely credited with triggering the '08 financial crisis.

      • by u38cg (607297)
        No, neither Li's copula nor any other bogeyman formula (B-S, etc.) was to blame. Failure to understand and interpret models, assumptions and limits was.
        • by BitZtream (692029)

          The failure was the morons who didn't stop to think, 'If we're all making so much money, who is losing it?'

          And then someone did. And that's when all hell broke loose, when someone realized that they were about to default on a bond, which uncovered the whole mountain of failure across the industry and a few others that it took out along the way!

        • Exactly. The same as abuse of big data. My point.

  • action into place far showing the data.
    You can see a trend and make a forecast. Then take action to slow the trend based on the forecast and then the prediction will be wrong.

    • by rmdingler (1955220) on Thursday March 13, 2014 @05:47PM (#46478289)

      You can see a trend and make a forecast.

      Agreed. Very similar to a weather forecast, but without the hundred-odd years of daily data to study and build predictive models on.

      It is, however, necessary and noble research... they'll just need more flu seasons under their belt to tweak the variables.

      • Weather forecasts are NOT based on trends found in any data set; they are based on the laws of physics and chemistry. They use the same "finite element analysis" techniques found in numerical wind tunnels and other engineering models that are used to build everything from bridges to aircraft. Archival data is used to test the "skill" of the model by making "hindcasts" and comparing them to the instrumental record.

        Climate is basically the long-term statistics of weather - meaning a hundred-year trend in t
        • by khchung (462899) on Thursday March 13, 2014 @11:52PM (#46480083) Journal

          Exactly, the correct comparison should be "technical analysis" in stock markets, which can be applied to any stock you like with the same level of (un)success.

          Without an underlying theory of how things work, which also needs to be somewhat correct, trying to predict future trends simply by using past data is just dumb curve fitting - with a curve of enough degrees of freedom, you can fit any data, but that doesn't mean its prediction would be any better than a random guess.
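
A minimal sketch of the curve-fitting point above, not anything Google actually ran. The weekly "activity" numbers are made up for illustration, and Lagrange interpolation stands in for "a curve with enough degrees of freedom": with 8 free coefficients it passes through all 8 points exactly, yet its forecast two weeks ahead lands far outside anything observed, while a plain average does not.

```python
def lagrange_fit(xs, ys):
    """Return the unique polynomial of degree len(xs) - 1 through all points."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

# Hypothetical data: activity hovering around 10 with small noise.
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
activity = [10.2, 9.7, 10.4, 9.9, 10.1, 9.6, 10.3, 9.8]

curve = lagrange_fit(weeks, activity)

# The fit to past data looks perfect...
in_sample_error = max(abs(curve(x) - y) for x, y in zip(weeks, activity))

# ...but the extrapolation is driven entirely by the noise.
forecast_week_9 = curve(9)
naive_forecast = sum(activity) / len(activity)

print("worst in-sample error:", in_sample_error)
print("degree-7 forecast for week 9:", forecast_week_9)
print("plain average forecast:", naive_forecast)
```

The perfect in-sample fit says nothing about predictive skill here; the polynomial has simply memorized the noise.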

          • by akozakie (633875)

            ...which hasn't stopped anyone from using it - rationality is for the weak. We're wired for "eureka" moments - the curve fits so well, it MUST be right!

            OTOH, technical analysis is also not a very good model of this, because the economy is not a good model of anything in the real world due to an exceptionally strong positive feedback loop between the model and the modeled. A successful technical analysis "method" (meaning it worked for someone, that's statistically probable no matter how stupid the method is) ma

        • It would seem to me that you underestimate how complex weather systems are.

          A farmer who says "when this happened last, the weather did this next" is more likely to be right than the guy who tries to model atmospheric pressure changes in a chaotic system.

          Sure, some people rely entirely on models, but good weather forecasting often involves both. How do we know what a strong north-west pressure system will do? The best answer is "what did it do last time" not "let's model it to death."

          PS, this is also how me

      • While I don't think they use Monte Carlo modeling for the weather, you do have a point about it being a young thing. The gist of TFA is a little silly though... sure, someone (Google) is using a faulty algorithm to grok epidemiology data... and? That's about the only conclusion we can draw. It doesn't mean 'the technology' itself (i.e. big data platforms and the techniques applied to them, such as graph traversal or map reduce) is actually a 'bad thing.'
  • by kajong0007 (3558601) on Thursday March 13, 2014 @05:34PM (#46478149)

    Learn from nature! Google needs a genetic algorithm that modifies itself every flu season.

    The fittest algorithm will survive to infect thousands.

  • by lucm (889690) on Thursday March 13, 2014 @05:35PM (#46478157)

    With big data, when you actively look for patterns you always find them; this is how hedge funds have been operating for years. The purpose of the technology is not to make predictions, but rather to confirm existing trends and possibly identify new ones.

    Proper way to utilize big data in this case would be:
    1) to assist the CDC in confirming or refuting trends observed in the field
    2) to offer additional correlations (such as: are people living closer to highways more sensitive to specific strains of flu)
    3) to provide long-term indicators facilitating the assessment of medication and other flu containment factors

    Big data is not a magic eight ball but it's not a piece of shit either.

  • ..well, so pretty much have all the FUD-spreaders in the CDC, government, and NGOs who've all been telling us that "any moment" we could get a "deadly flu" since the (ha ha ha) SARS "epidemic".

    All I've ever gotten is the "Cry Wolf" heebie jeebies.

    • They started doing it before the SARS "epidemic". I remember them talking about how the swine flu epidemic in 1976 was going to be like the 1918 flu pandemic because we were "due". It was just a matter of time til another flu pandemic like the one in 1918 happened.
  • Is it still in Beta? They should get this "right" and maybe look at other large-scale models like weather modeling and add culture (how close people tend to get to each other, how much they are inside in the immediate vicinity of other humans) to the algorithms. It took Google years to get Gmail out of beta, but it was pretty good while they were calling it "beta"... Slashdot, on the other hand....
  • by wonkey_monkey (2592601) on Thursday March 13, 2014 @06:09PM (#46478475) Homepage

    but one of its products has been making bogus data-driven predictions. A study of Google's much-hyped flu tracker shows that it has consistently overestimated flu cases in the US for years.

    Bogus? Are you sure they weren't just... wrong?

    It's a prediction.

  • In addition to "all of the above", the other contribution is that of the philosophical equivalent of Heisenberg: the predictions of outbreaks may have increased vaccination usage in the areas involved, which of course will have an effect of downplaying the outbreaks in those areas.

    Not saying I have any evidence for that (and I will wager it unlikely, considering the number of people who vaccinate is still far lower than it should be), but a correlation study may be interesting to see.

    If the point of knowledge of a possible outcome is to act to deter it, then shouldn't the actions that attempt to deter it be taken into account?

    • It should, but only after Google News picks up reporting on it. Then the modelers can say how much impact reports of the prediction had.
      Next year, no one may report on it other than in mockery, and you can't predict reporting that doesn't happen, so they can't start off with reporting taken into account.

  • In Australia recently they've been pushing people to get vaccinated against influenza for the coming winter because of the reported rise in flu cases during the recent North American winter, especially for the 18 - 60 age bracket. I hope they weren't using Google as their source.
  • All this time I thought the only reason "big data" mattered was to provide motivation for companies to invade my online privacy and better target advertising. I can tell you all those male enhancement product ad placements really hurt my self image.
  • So does this mean that all those shiny blue racks of gleaming hardware in the Google Cloud adverts around Slashdot don't actually work??? I really feel sorry for the guys at Google who installed it all and thought they were actually on to something. Only to find that it comes out with the wrong answer every time.
  • Not all flu cases are discovered, and not all persons with the flu are knocked out by it, so the missing numbers are probably mild cases where the people actually continue to go to work, or rather, study.
  • The headline is that the prediction was overestimating three times in the past three years. So what?

    Google's Flu Trend plots don't have uncertainties on them, so they'll never be exactly right. So they either have to be overestimates or underestimates. In any three years, you are going to get at least *two* under or over estimates. So post-hoc, saying "ZOMG! There's three overestimates in three years!! #EPICFAIL LOL!" isn't very meaningful.
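
The counting behind that argument can be sketched as follows. This assumes a toy model that is not from the study: each year's estimate is independently either "over" or "under" with equal probability, and an exact hit never happens.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of error directions over three years (2**3 = 8).
outcomes = list(product(["over", "under"], repeat=3))
p = Fraction(1, len(outcomes))  # assumption: all sequences equally likely

# Three overestimates in a row.
p_three_over = sum(p for o in outcomes if o.count("over") == 3)

# Three misses in the same direction, whichever direction it is.
p_all_same = sum(p for o in outcomes if len(set(o)) == 1)

# At least two misses in the same direction: guaranteed with three years.
p_at_least_two_same = sum(
    p for o in outcomes if max(o.count("over"), o.count("under")) >= 2
)

print(p_three_over, p_all_same, p_at_least_two_same)
```

Under these assumptions, an unbiased estimator would still miss in the same direction three years running one time in four, so the streak alone is weak evidence of bias. The size of the errors is a separate question that this sketch does not address.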

    Until Big Data People understand statistical uncertainty and are hap

  • I think the error just shows how many take a flu-day without being actually sick.
  • "Flu virus predicted to take US congress in 2014 with 96.34% certainty."
