Books Google

How Badly is Google Books Search Broken, and Why? (blogspot.com) 106

An anonymous reader shares a blog post: It appears that when you use a year constraint on book search, the search index has dramatically constricted to the point of being, essentially, broken. Here's an example. While writing something, I became interested in the etymology of the phrase 'set in stone.' Online essays seem to generally give the phrase an absurd antiquity -- they talk about Hammurabi and Moses, as if it had been translated from language to language for decades. I thought that it must be more recent -- possibly dating from printers working with lithography in the 19th century.

So I put it into Google Ngrams. As it often is, the results were quite surprising; about 8,700 total uses in about 8,000 different books before 2002, the majority of which are after 1985. Hammurabi is out, but lithography doesn't look like a likely origin for widespread popularity either. That's much more modern than I would have thought -- this was not a pat phrase until the 1990s. That's interesting, so I turned to Google Books to find the results. Of those 8,000 books published before 2002, how many show up in the Google Books search result with a date filter before 2002? Just five. Two books that have "set in stone" in their titles (and thus wouldn't need a working full-text index), one book from 2001, and two volumes of the Congressional record. 99.95% of the books that should be returned in this search -- many of which, in my experience, were generally returned four years ago or so -- have vanished.
Further reading: How Google Book Search Got Lost; Whatever Happened To Google Books?; and Google's New Book Search Deals in Ideas, Not Keywords.

  • Set in Stone (Score:5, Insightful)

    by AlanObject ( 3603453 ) on Monday February 18, 2019 @02:28PM (#58141054)

    I always thought that "set in stone" refers to the condition where you have carved words into stone and they can't (easily) be undone.

    Is there any other possible origin of that phrase?

    • It's that thing where instead of reporting a bug you write a blog post and send it out to news aggregators.

      • by Anonymous Coward

        I would not call a completely incorrect database a bug. And besides, everyone knows Google just ignores bug reports and modifies its documentation to explain away the bug.

        • How do you know it's an incorrect database when the only way you know the results are wrong is by looking at results of a search done in the same engine?

          A bug in handling the year parameter is much more likely than having the dataset change between searches.

          • by Anonymous Coward

            Oh well look at the article. It is all nonsense junk. And no credible explanation for it. After all the time they have had to credibly explain they have been unable and unwilling. All baloney.

      • It's that thing where instead of reporting a bug you write a blog post and send it out to news aggregators.

        That’s a fairly recent practice - I don’t think it’s set in stone.

    • by tgibson ( 131396 )

      I wonder if the meaning is more literal? Rather than "set" being a synonym for "carved", something set into place or inlaid, like a tile?

    • by Calydor ( 739835 )

      That is what it MEANS, yes. But etymology refers to where words and phrases originally come from; it is entirely possible that such a phrase only becomes popular as archaeologists start indexing ancient clay and stone tablets and the like rather than being passed down through the ages since Babylon.

    • by nasch ( 598556 )

      That's the meaning of the phrase, not the origin.

    • by zlives ( 2009072 )

      I always assumed it was a reference to the 10 commandments, as "gods law" set in stone.

  • by jellomizer ( 103300 ) on Monday February 18, 2019 @02:28PM (#58141064)

    However, shouldn't you have gone to a library, and perhaps worked with a librarian to help guide your research?
    Google is a good search tool, but it isn't a research tool.

    • by Anonymous Coward on Monday February 18, 2019 @02:46PM (#58141146)

      Google WAS a good search tool. Nowadays it's damn bad. My most hated anti-feature right now is the impossibility to force a word to appear on the results. Back in the day you only needed to add a '+' in front of it, but it no longer honors it so they can brag about how many millions of "results" they give you, even if you don't want them.

      • Wrap the word in quotes.

        There's a basic list of search syntax at the below link, however you can find far more if you search "google search syntax".
        https://support.google.com/web... [google.com]

        • by Anonymous Coward

          That doesn't force the search - it simply 'suggests' those results get a higher ranking. Google search hasn't followed its own published syntax in many years.

          • Using google to search for fantasy books you have read as a child is amusing.
            Half of the stuff it manages to bring up in the search results is 'you must read 10 best books' lists with no relevance to the keywords. The rest is randomly high-ranked search results. What is hilarious is that it also generates links to inactive forums where people did collective mindwork to do the same thing.

            Some keywords like 'rat' or 'mouse' also draw in extremely weird search results that have nothing to do with the query.

          • You are wrong.

      • Google WAS a good search tool. Nowadays it's damn bad.

        I totally agree.

        My most hated anti-feature right now is the impossibility to force a word to appear on the results. Back in the day you only needed to add a '+' in front of it, but it no longer honors it...

        I never used that. I've always put "allintext:" in front of search terms that I insist on, and I put double quotes around words and phrases that I want exact matches for. Although the effectiveness of both of these has diminished over time as Google has dumbed itself down, it sounds as though they're still much more effective than the plus sign you're currently using.

    • by thomst ( 1640045 )

      jellomizer opined:

      Google is a good search tool, but it isn't a research tool.

      Google begs to differ [google.com] ...

      • by jellomizer ( 103300 ) on Monday February 18, 2019 @02:53PM (#58141176)

        Well Google's Marketing would say most anything to keep the company in good graces.

        However, Google and similar services are part of the solution, but not the full solution.

        Searching is an important part of research, and Google is a good tool for researchers, but it only helps them search. A modern librarian can help you use Google to get more context out of your searches, direct you to non-Google tools, and often the library will have access to data that is behind a paywall.

    • Google is a good search tool

      Is it? Just yesterday I was looking for "PC Mag 1997 january Pentium MMX" and Google refused to return results from the PC Mag 7 Jan 1997 issue. What's even weirder, clicking Google Books' "browse all issues" returns

      The requested URL /books/serial/ISSN:08888507?rview=1 was not found on this server

      but "About this magazine" will happily give you a list of all scanned issues :o and opening the January one will let you search it and will return positive results.

    • Google is a good search tool, but it isn't a research tool.

      I disagree that Google is a good search tool - it has become pretty mediocre. I also disagree that it isn't a research tool. It's only ONE tool in a researcher's toolbox, but it can be a very powerful one if it's well made and properly used. Unfortunately, Google has regressed from Snap-On wrench set to cheap dollar-store adjustable POS. Consequently it's getting harder and harder to use properly, as it rounds off corners by design and breaks easily.

    • by AmiMoJo ( 196126 )

      Who wants to put in hours of research flipping through physical books just to answer a simple question about the entomology of a common phrase?

      Google had the right idea. Scan all the books, let people search the text directly. What broke it was copyright laws and lobbying. That's why you can't see every page in the book, if it hasn't been purged entirely by the publisher.

      • Who wants to put in hours of research flipping through physical books just to answer a simple question about the entomology of a common phrase?

        A more serious question than "cast in stone" would be "what is the etymology of the use of the word 'entomology' to refer to 'etymology'?"

        I feel sorry for whoever made this wonderful rant. Somehow I doubt the value of using "Google Books" to find references to language that almost certainly pre-dates the Gutenberg press. Or should we ignore languages that meant the same thing as the English phrase we're looking for, even if they appeared centuries prior to the English translation?

  • by nospam007 ( 722110 ) * on Monday February 18, 2019 @02:32PM (#58141082)

    It just now thinks you didn't _mean_ what you entered.
    Join the line.

    • That's why I'm starting to do searches for things I'm not searching for. I figure by process of elimination google will return the right thing eventually.

  • that's just the fabric of reality, steadily unraveling....

  • Prior to the 1940's? (Score:5, Informative)

    by HiThere ( 15173 ) <[ten.knilhtrae] [ta] [nsxihselrahc]> on Monday February 18, 2019 @02:46PM (#58141144)

    I'm rather certain that I've read the phrase in something that was written either in the 1940's or the early 1950's, and it didn't seem a unique turn of phrase in the place where I read it.

    FWIW, James Joyce says, in "Portrait of the Artist"

    It is peopled by the images of fabulous kings, set in stone. Their

    I don't know why Google didn't find that for you. OTOH, I haven't enough google-fu to use Google search to search for a range of dates.

  • by Flexagon ( 740643 ) on Monday February 18, 2019 @02:46PM (#58141148)

    Maybe, just maybe, Google Books is a poor choice for a tool. As big as it is, it's going to be spotty, and weighted toward more recent, digital, texts, and ones that are sufficiently available for scanning.

    Better to use something that represents actual research. If English is your focus, which it seems to be given your current line of attack, it might be better to look in the full Oxford English Dictionary [wikipedia.org], readily available in and through your local library, even digitally.

  • by careysub ( 976506 ) on Monday February 18, 2019 @02:54PM (#58141180)

    Like most of its projects, Google has lost interest in Google Books and has not bothered to maintain it, much less continue developing it. This has been going on for more than a decade now. NGram search, for example, stopped adding new texts to the index in 2008.

    Google fought and won a court case to put 25 million more orphan books which it had already scanned, out of print and largely unavailable, into Google Books. But decided it wouldn't bother. Because out of print books cannot be monetized, it would seem and thus are of no interest to Alphabet, which has over $100 billion in cash on hand. Spending a few million to support Books would shave a small fraction of a percentage off the growth of its investment wealth which is unacceptable to the company that has officially retired the "Don't be evil" slogan.

    At least they haven't pulled the plug on it entirely. I guess there is still some monetization to be had from in-print books.

    • by H3lldr0p ( 40304 ) on Monday February 18, 2019 @03:05PM (#58141238) Homepage

      At least they haven't pulled the plug on it entirely.

      AFAIK Alphabet has put the "good" version in universities where the library admin does all the heavy lifting of scanning in books and such. That was part of the suit settlement. The public doesn't get the access researchers have.

      If I were the Fine Author, I'd head over to one of the unis that signed up with Google and use it there before declaring any sort of hard result.

      • My local University Library even has banks of automatic book scanners in case somebody wants to add a book from the shelves to the digital collection. Instead of the old "photocopy the whole text" strategy that was in common use in the past.

        Unfortunately, it is only available to staff and students; I have a library card that lets me check out books, but I don't have access to the digital copies or the book scanners.

        • Wait - tell me more. Have any links to this kind of service?

          I know of an old family history book that was put through a vanity publisher. I've been trying to track down a copy for a decade. Amazon says it exists, about the only online presence. One of these days I'll get my hands on a copy, and once I do, I want to get it online for future generations. The author is a direct descendant of mine, and I know that nobody has or cares about the rights. It's public domain now .

          The question then is, what's the

          • *direct ancestor. Great grandpa was not a time traveler, best I can tell.

          • The best way for your situation is to just scan the pages by hand, using a consumer scanning tool, and then upload it to archive.org for preservation.

    • Because out of print books cannot be monetized, it would seem and thus are of no interest to Alphabet, which has over $100 billion in cash on hand.

      People who spend money merely because it is sitting around rarely have $100 in cash on hand (much less $100 billion).

      Ostrich leather ad hominem: attack not on the man, but the man's bulging pocketbook.

      • Pointing out that Google does not need to monetize everything given its enormous revenue stream and wealth is hardly an "ad hominem" attack.

        Decent companies do pro bono work all the time (I am working on such a project for my employer right now) using some part of their revenue to subsidize it. Making the vast body of literature available to the public would be a pro bono project that is literally in keeping with the goal Google set for itself (making the world's knowledge accessible).

        • Decent companies do pro bono work all the time (I am working on such a project for my employer right now)

          Your employer considers posting to /. discussions to be pro-bono work? How enlightened! Mine thinks it's time wasting.

    • Because out of print books cannot be monetized
      Depends on copyright. If it is expired you can resell them as eBooks.

      • Due to the complexities U.S. Congress has thrown into the copyright legislation, with retroactive term extension, etc., many of these works are, as I said, orphaned, with no way to clearly resell them. That is what the term means.

        • by tepples ( 727027 )

          Due to the complexities U.S. Congress has thrown into the copyright legislation, with retroactive term extension, etc.

          Though the United States has extended the term of copyright in the past, term extensions do not restore U.S. copyright to works whose copyright has already expired. Anything* published before 1924 is in the public domain. In addition, the Authors Guild opposes the next extension that Disney might beg for and in fact wants the 1998 extension repealed [arstechnica.com].

          * Except sound recordings, which were subject to a patchwork of state copyright laws with a flat expiry in 2067 but are now subject to the CLASSICS Act.

    • ...out of print books cannot be monetized, it would seem and thus are of no interest to Alphabet...

      The Alphabet has no interest in books? Who'd 've thunk it? :)

  • Maybe master writing intelligible sentences before worrying about entomology. ... "How badly broken is... " reads way better than "How badly is ... broken" . Holy crap. I am a native english speaker and this whole article was tedious to read.
    • "How badly broken is... " reads way better than "How badly is ... broken"

      This is just your opinion. I disagree.

      Holy crap. I am a native english speaker

      Then why don't you know that names of countries, along with words derived from them, should be capitalized?

    • Maybe master writing intelligible sentences before worrying about entomology

      Um...

      • entomology is the study of insects
      • etymology is the study of word origins
    • Maybe master writing intelligible sentences before worrying about entomology. ... "How badly broken is... " reads way better than "How badly is ... broken" . Holy crap. I am a native english speaker and this whole article was tedious to read.

      If you're a native English speaker and can't understand it, golly, how badly is your parser broken?!

      That is exactly the sort of nonsense up with which decent people do not put.

      And if you thought entomology was tedious, try etymology! At least with entomology you have basic instincts about avoiding bites to keep you awake. Also, the pictures are more interesting.

      • I love how you purposely avoided ending your sentence with a preposition or splitting infinitives .... That being said, I should avoid posting with autocorrect turned on - it didn't like etymology.
    • Maybe master writing intelligible sentences before worrying about entomology. ... "How badly broken is... " reads way better than "How badly is ... broken" . Holy crap. I am a native english speaker and this whole article was tedious to read.

      Oh, the irony! I stumbled over your first sentence and had to read it three times before I figured out what you were trying to say. Maybe (you should) master writing intelligible sentences before worrying about... the study of insects? Perhaps you should also learn the difference between 'etymology' and 'entomology' while you're on your own quest for intelligible English, (not "english"), self-expression.

      • I was going to write that, but you beat me to it.
        You said it much better than I would have.

        But -
        perhaps they are all referring to amber and fossils, in which case "entomology" was correct - insects set in stone.

  • This is a kind of inverse Middle Ages philosophy: the assumption by the author is that anything coming far before the modern era has no relevance to scholarship, and it's "absurd" to attribute any origination of thought to something several thousand years old. Prior to a few hundred years ago, it would have been "absurd" to suggest that any phrase was recently invented rather than derived from several thousand year old classical or biblical sources.

    Online essays seem to generally give the phrase an absurd antiquity -- they talk about Hammurabi and Moses, as if it had been translated from language to language for decades.

    I would hope we're all aware that this is exactly what happ

    • by 93 Escort Wagon ( 326346 ) on Monday February 18, 2019 @04:07PM (#58141604)

      Wasn’t that one of the early signs of civilization’s decay in Asimov’s Foundation universe? Scholars no longer did original research on their own; they’d just study what previous researchers had already written on a subject, and re-summarize it?

      Sometimes it’s scary how prescient that dude was.

    • If actual study of history is going to be replaced by what is convenient to search on Google, then we are limiting ourselves to a "history" that starts in the 20th century. Maybe this is just the way things will be in the future.

      That is probably the most insightful observation I've read here so far, and it's certainly the scariest. I'm thinking not only of the millennia of accumulated knowledge and wisdom that we stand to lose, but also the mistakes we're more likely to repeat. We could be looking at the modern-day equivalent of the Dark Ages falling upon us over the next century.

    • by DingerX ( 847589 )
      Give some credit to medieval philosophers: they had an interest in contemporary as well as ancient thought, and the game was to present something with enough novelty to be racy, but not too much to be dangerous. That part hasn't really changed, I'm afraid. Of course, then they got libraries and paper, and then they discovered the magic of copypasta.
      Using Google for research at least clues you in to the popular and subtle memes in the field.
      On the other hand, googling for results returns:
      Did you mean i
  • ...is the translation of "set in stone" into Italian. Well, maybe on the other side of the atlantic you don't have many buildings made out of stone, you jumped directly from log shacks to steel skyscrapers... But here in old Europe we know the meaning of "set in stone" and I can assure you that it has nothing to do with lithography...
    • Well, maybe on the other side of the atlantic you don't have many buildings made out of stone, you jumped directly from log shacks to steel skyscrapers...

      Hope to hell you were just being cute. If not, good gods, you just displayed a grand European ignorance. For instance, in KC, where I used to live, the entire downtown was constructed from limestone, and most of those buildings are still standing. Hell, there are entire neighborhoods of houses made of gray or brown limestone (Westport, Hyde Park).

  • Looking for answers in Google may not be the best approach. It's just a start, as many already pointed out. A slightly more sophisticated search gets you into the world of dictionaries. Yes, I am old enough to recall dead-tree versions of those; in fact I still have a massive unabridged Webster, old enough not to have a PC in it, and for sure gender mainstreaming is not there! One more reason to keep it! But I digress. I looked up Webster online - not much there, although there is a hint that this is a more general expression '
  • With this late rise of artificial stupidity, it becomes harder and harder to find /anything/ online.

    Google is less and less a global grep and more and more of an expert system -- second-guessing (always wrongly) what I'm really after based on shitty models and trends.

    Someone asked me not long ago about some detail of the limits in a well-known piece of software. All online searches were giving just crap blog posts and other garbage; not a single source code or doc hit. In desperation, I git cloned the source, di

  • by az-saguaro ( 1231754 ) on Monday February 18, 2019 @10:55PM (#58143228)

    I assume you searched in Google Ngram.
    https://books.google.com/ngram... [google.com]

    If you search "set in stone", it appears that usage is a latter-day idiom.
    But, here is the secret to this conundrum. Language and idioms change - shift, migrate, morph - similar but slightly evolved words to express the same idea.

    The inherent idea is that something is immutable, indelible, unerasable, uneditable, irrevocable. It is predicated on the idea that you can write, sketch, mockup, proof all you want and still make corrections, like hitting the preview and edit buttons on a Slashdot post, but once you hit submit, your words are eternal, just like when the stone carver finally etches the words into a stele or tombstone.

    Writers write. Typographers set. Artists etch. Stone carvers carve. Through history, all such variations have been used. But since carvers carve, one might think that the classical idiom is carved in stone, with the other variations being corrupted forms based on more modern communication paradigms.

    So, do what I did. Too bad I cannot post a screen capture, but you can do this yourself.
    Go to Google Ngram Viewer.
    Enter (copy-paste) the following line in the search box:

    carved in stone,written in stone,set in stone,etched in stone

    "Carved in stone" is abundant, going back well before 1800.
    The other three have arisen just in recent decades.
    So, prior generations used the idiom correctly. Recent generations have used analogous but technically incorrect variants.
    Collectively, "written, etched, set" were originally just a tiny fraction of the whole, but recently their usage is rising. This means that current generations have either forgotten the true idiom, have gotten sloppy, or have fallen into a wave of rhetorical monkey-see monkey-do copycat-ism or faddism.

    Something else interesting.
    The "written, etched, set" curves are quite congruent, all showing a rapid rise starting in 1970, then an inflection circa 1990, and now topping out, with "set in stone" becoming asymptotic with or equal to "carved in stone", thus the dominant modern transmigration of the idiom. The "written, etched, set" curves are the classical sigmoidal curves of the Verhulst equation of population dynamics. These curves imply that usage of these variant terms is reaching population saturation, each term in its own camp, with non-traditional verbiage having overtaken classical verbiage.
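
For anyone who wants to see the shape being described, here is a minimal sketch of a Verhulst (logistic) curve in Python. The parameter values (a saturation level of 1.0, a growth rate of 0.25 per year, a midpoint of 1990) are assumptions picked only to mimic "takes off around 1970, inflects around 1990, tops out later"; they are not fitted to any Ngram data, and the function names are made up for illustration.

```python
import math


def logistic(t, capacity, rate, midpoint):
    """Closed-form solution of the Verhulst equation dN/dt = r*N*(1 - N/K).

    capacity is the saturation level K, rate is r, and midpoint is the time
    of the inflection, where the curve reaches K/2.
    """
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical parameters, chosen only to resemble the described shape.
CAPACITY = 1.0   # relative usage at saturation (assumed)
RATE = 0.25      # growth rate per year (assumed)
MIDPOINT = 1990  # inflection year (assumed)


def curve(years):
    """Evaluate the logistic curve at each year in the iterable."""
    return [logistic(y, CAPACITY, RATE, MIDPOINT) for y in years]


if __name__ == "__main__":
    for year in range(1960, 2021, 10):
        value = logistic(year, CAPACITY, RATE, MIDPOINT)
        print(year, round(value, 3))
```

With these made-up numbers the curve stays near zero through the 1960s, passes half of its ceiling at the midpoint year, and flattens out after that, which is the sigmoid shape the comment is pointing at.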

    So, Google is not broken, and in fact could be a rather clever historical research tool.

    https://books.google.com/ngram... [google.com]

  • How terribly sad that the only material they have access to is the stuff Google ripped as part of their industrialised copyright infringement project.

    If this goes on, he might have to actually get off his fat arse and do some work himself. Of course, it would help if he had enough brains to work out that "set in stone" probably does go back to an "absurdly" distant time when people set important text in stone - you know, like you can find in almost any ancient Greek ruin. Or that Hammurabi and Moses were m

  • In case you still care about the original question:

    The German equivalent can be found in "Politische, kirchliche und literarische Zustände in Deutschland" by Franz Chassot von Florencourt, published in 1840:

    "stark und unzweideutig treten die Züge wie in Stein gemeißelt hervor, und unzerstörbar für alle Ewigkeit, so sehr auch eine spätere Zeit daran herumpfuscht und den Charakter zu verwischen sucht"

    • The German equivalent ... published in 1840:

      The Hebrew equivalent, written several millennia BCE, Exodus 34.

  • by Rambo Tribble ( 1273454 ) on Tuesday February 19, 2019 @09:28AM (#58144790) Homepage
    What is likely the original phrase, which I remember hearing as early as my childhood in the 1950s, was "carved in stone". An equivalent phrase, most likely newer in origin, is "set in concrete". You can see the evolution. It's kinda like "irregardless".
  • I wonder if in earlier meanings of the phrase you might find some crossover with "lapidary."

  • And has been for better than five years now. Their marketing dept has long since taken over the search algorithms. The signal-to-noise ratio is *waaay* down from, say, 2012. Lots of garbage results. And then there's the "REALLY STOOOPID", like the time last year I was searching for some computer-related terms (this was at work), and though I don't remember what I was searching for, I put "those terms" in quotes, and it returned "there were no results for that exact phrase, but here are the results without

  • If a text is not shown, that contains the words searched for and can be seen by another means in the same database, then the search is most certainly bugged!

    But internet searches have been broken since they changed them to show the maximum number of results... 8-P
