Slashdot
Google URL Index Hits 1 Trillion
Posted by Soulskill on Saturday July 26, @12:03AM
from the orders-of-magnitude dept.
mytrip points out news that Google's index of unique URLs has reached a milestone: one trillion. Google's blog provides some more information, noting,
"The first Google index in 1998 already had 26 million pages, and by 2000 the Google index reached the one billion mark. Over the last eight years, we've seen a lot of big numbers about how much content is really out there. To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google's index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day."
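For readers wondering what "computing the PageRank graph" in a batch looks like, here is a minimal sketch of the textbook power-iteration form of PageRank on a toy link graph. It is not Google's production code; the function name, the damping factor of 0.85, the iteration count, and the four-page `toy_web` are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to.
    All names and defaults here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked from three pages, "d" from none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

A batch run like this has to be redone from scratch whenever the link graph changes, which is the contrast the blog post draws with re-processing the graph continuously.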
Screenshot. (Score:5, Funny)
Or it didn't happen.
Re:Odd (Score:5, Funny)
So unless there is a screenshot showing the 1,000,000,000,000 site count, Google's index didn't reach that milestone? Even if it now shows 1,000,000,000,001?
The 1,000,000,000,000th page had only one word on it:
"woosh"
Re:Screenshot. (Score:5, Funny)
That can be arranged.
How long till.. (Score:5, Funny)
Once the index reaches a google (or rather a googol), the universe explodes.
Re:How long till.. (Score:5, Funny)
Re:How long till.. (Score:5, Insightful)
I'm more interested in when Google starts returning relevant results to my queries.
I can't believe that I'm the only one that finds Google's quality of service somewhat below par. I guess they're better than randomly stabbing in the dark, and there certainly isn't any alternative that's obviously better, but Google sure isn't everything they think they are.
I know--stop trying to compete with Wikipedia and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
Re:How long till.. (Score:5, Informative)
... and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
Perhaps you should try scrolling to the bottom of the page... :)
Re:How long till.. (Score:5, Informative)
It took me a while to realize it, but if you scroll clear to the bottom of an expert exchange post, you'll find the comments unhidden and relevant.
Re:How long till.. (Score:5, Informative)
...and cut out Experts-Exchange.com from your search results since their pages don't actually return the information you think they do.
If you block cookies from experts-exchange.com you can actually see the answers on any e-e page - after you visit the first time, it normally sets a cookie to not show results next visit, which is how they get Google to index their pages anyway. With cookies from them blocked, you can then see the answers - you just have to scroll 7/8s of the way down the page past all the fake "Please sign up to see this result" boxes. :)
(First AC post in years... tee hee.)
Re:How long till.. (Score:5, Interesting)
"I'm more interested in when Google starts returning relevant results to my queries.
I can't believe that I'm the only one that finds Google's quality of service somewhat below par."
You're not the only one, but for the most part it is better than most other search engines out there. The real problem is spammers and paid advertising. I think spammers have really made search frustrating for a lot of companies, and ad companies pay other people to promote their sites for them (digg, slashdot, etc). I've noticed the increase in spam-vertised websites in search results for a lot of things.
Personally, I think what's needed is sharding: search that is more specific to what you're looking for. I'd like to see a google with 'tags' and a delicious interface, where things like educational institutions and universities get lumped into their own search engine space, for instance. This would help narrow down what one is looking for, although it would take time and feedback to design something that works well for other areas. The fact is that search results get diluted as you put more and more stuff online (numbers and geometric scale).
For fun, I've noticed StumbleUpon and del.icio.us are not bad alternatives when looking for new and interesting sites without having to use search.
Wow, that's a lot of porn. (Score:5, Funny)
Seriously, the web is something like 42% porn. (Yes, that is the ultimate answer.) So that's, on average, 60-70 pages of each person in the world naked.
Re:Wow, that's a lot of porn. (Score:5, Interesting)
"the web is something like 42% porn"
That probably stopped being the case after namespace speculators started buying up expired domains in large numbers just to put up a mildly useless index on *each* and *every* site to collect ad revenue or marketing statistics off of unwary visitors. I would also include typosquatters in that category, and maybe someone else can name a few other examples of utter namespace hogging uselessness.
Whatever it is, you can rest assured that it's mostly repetitive trash... no need to stand in awe of it.
1 trillion URLs (Score:5, Funny)
How many of those are automatically generated rank-spoofers, 80%?
My favorite spoof pages were the ones that randomly substituted search terms into porno stories.
"Yes!" she screamed as he thrust his SAMSUNG CD PLAYER deep into her. "I want you balls-deep in my CHEAP HARD DRIVES!" The smell of DISCOUNT SOFTWARE filled the room.
Some numbers (Score:5, Interesting)
Counts of words:
the: 18.3 billion pages
a: 23.9B
0: 12.7B
1: 25.4B
in: 17.1B
I: 10.2B
I know these numbers aren't exact, but you'd think one of them would be over 100B if Google is really indexing a trillion pages. What's on them? Anyone find any keywords that produce more?
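To put those counts next to the headline figure, here is a quick sketch that works out each term's share of one trillion. The counts are the ones listed above; the 1,000,000,000,000 denominator is the headline claim, and the variable names are just for illustration.

```python
# Headline claim: one trillion URLs.
INDEX_CLAIM = 1_000_000_000_000

# Reported result counts from the comment above (approximate).
counts = {
    "the": 18.3e9,
    "a": 23.9e9,
    "0": 12.7e9,
    "1": 25.4e9,
    "in": 17.1e9,
    "I": 10.2e9,
}

# Fraction of the claimed index that each term's count represents.
shares = {term: hits / INDEX_CLAIM for term, hits in counts.items()}

top_term = max(counts, key=counts.get)
top_share = shares[top_term]

for term, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{term!r}: {share:.2%} of one trillion")
```

Even the largest count listed comes to only a few percent of a trillion, well short of the 100B mark mentioned above.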
Re:Some numbers (Score:4, Funny)
My hobby:
Getting the fewest possible google results above 0 with a quoted string.
"interspecies gangbang": 6
"hot topic meets disney world": 2
"died in a blogging accident": 15,300
"can boys make babies": 4
"why does it hurt when I read": 1
Re:Some numbers (Score:5, Interesting)
My Hobby
Attributing my sources: http://xkcd.com/369/ [xkcd.com]
In [xkcd.com] , my [xkcd.com] humble [xkcd.com] opinion [xkcd.com] my [xkcd.com] usage [xkcd.com] of [xkcd.com] "My [xkcd.com] Hobby" [xkcd.com] was [xkcd.com] sufficient [xkcd.com] attribution [xkcd.com], all [xkcd.com] by [xkcd.com] itself. [xkcd.com]
What's going on with the founders' studies? (Score:5, Interesting)
No, it didn't. (Score:5, Informative)
They have identified that there are 1T pages out there, somewhere. They have indexed 40 billion pages. Read the entire Google post. It says it right there.
Bad on Google for the misleading post. Bad on the submitter for not reading the misleading post. Bad on Slashdot for further descending into mindless repetition of mindless submissions of mindless PR announcements.
Dynamic pages pollute count (Score:5, Informative)
Google tries to detect a dynamic page by looking for ampersands and equal signs in the URL, as well as by looking at the content of the page. It is really quite easy to fool.
e.g.: http://somesite.com/itemlist.php?listmode=1&category=beds&orderby=7 [somesite.com]
when 'rewritten' shows up as
http://somesite.com/items/1/beds/7.html
So 1 billion web pages could really be just a few hundred thousand dynamic pages (and I know of a few thousand pages like this). Not that the pages don't have relevant information, but some of it can be redundant. For instance, when the spider crawls across "Records per page = 10" > "Records per page = 20" > "Records per page = 30" etc., or when lazy programmers don't use cookies and databases to store information but instead append the user's selections to the URL. Thank god for that GET limit [boutell.com]. People need to use POST!
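A rough sketch of the kind of check described above, and of how a rewritten URL slips past it. This is only a naive heuristic based on the description in this comment, not Google's actual logic; `looks_dynamic` and `rewrite` are hypothetical helpers, and the `/items/` path layout is taken from the example URLs above.

```python
from urllib.parse import parse_qsl, urlsplit


def looks_dynamic(url):
    """Naive check: a query string containing '=' or '&' suggests a dynamic page."""
    query = urlsplit(url).query
    return bool(query) and ("=" in query or "&" in query)


def rewrite(url):
    """Hypothetical rewrite: turn the query values into path segments."""
    parts = urlsplit(url)
    # Keep only the values, in the order they appear in the query string.
    values = [value for _, value in parse_qsl(parts.query)]
    return f"{parts.scheme}://{parts.netloc}/items/" + "/".join(values) + ".html"


original = "http://somesite.com/itemlist.php?listmode=1&category=beds&orderby=7"
rewritten = rewrite(original)

print(original, looks_dynamic(original))
print(rewritten, looks_dynamic(rewritten))
```

Both URLs serve the same generated listing, but only the first one carries the telltale query string, so a check that looks at the URL alone treats the second as an ordinary static page.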
If someone knows how to stop this message board from creating links out of false URLs please, let me know.
Re:Amazing (Score:5, Insightful)
Re:Amazing (Score:5, Informative)
I couldn't agree more.
Many of the clients I support are constantly asking me "Is there a program that does this? or Can you find me a program to do this" etc etc.
I used to be able to just use google to help me get started but these days the top level searches are all those bloody link farms peddling "free" software, even when typing in the word review you come up with link farms that offer no reviews.
Re:Amazing (Score:5, Informative)
Many of the clients I support are constantly asking me "Is there a program that does this? or Can you find me a program to do this" etc etc.
I used to be able to just use google to help me get started but these days the top level searches are all those bloody link farms peddling "free" software
Have you tried SourceForge [sourceforge.net]? That's what it's there for, you know.
Re:Try "Live" search (Score:5, Funny)
Re:No concern for the foreign readers? (Score:5, Funny)
Re:First Post (Score:5, Funny)
Also, I believe there's about 1.5 million different users.
yeah but if you take out Twitter and all his sock-puppets you'll just be left with 500K unique users...