Google Indexing In Near-Realtime
krou writes "ReadWriteWeb is covering Google's embrace of a system that would enable any Web publisher to 'automatically submit new content to Google for indexing within seconds of that content being published.' Google's Brett Slatkin is lead developer of PuSH, or PubSubHubbub, a real-time syndication protocol based on ATOM, where 'a publisher tells the world about a Hub that it will notify every time new content is published.' Subscribers then wait for the hub to notify them of the new content. Says RWW: 'If Google can implement an Indexing by PuSH program, it would ask every website to implement the technology and declare which Hub they push to at the top of each document, just like they declare where the RSS feeds they publish can be found. Then Google would subscribe to those PuSH feeds to discover new content when it's published. PuSH wouldn't likely replace crawling, in fact a crawl would be needed to discover PuSH feeds to subscribe to, but the real-time format would be used to augment Google's existing index.' PuSH is an open protocol, and Slatkin says that 'I am being told by my engineering bosses to openly promote this open approach even to our competitors.'"
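As a rough illustration of the "declare which Hub they push to" step: to my understanding of the PubSubHubbub spec, an Atom feed advertises its hub with a feed-level link element whose rel is "hub". The sketch below pulls that link out of a feed using only the standard library. The feed contents and both URLs are made up for the example, and find_links is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed for illustration; the hub and feed URLs are invented.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="hub" href="https://hub.example.com/"/>
  <link rel="self" href="https://blog.example.com/feed.atom"/>
</feed>"""


def find_links(feed_xml, rel):
    """Return the href of every feed-level <link> element with the given rel."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.findall(ATOM_NS + "link")
        if link.get("rel") == rel
    ]


hubs = find_links(SAMPLE_FEED, "hub")     # where to subscribe
topics = find_links(SAMPLE_FEED, "self")  # the feed (topic) URL itself
print(hubs, topics)
```

A crawler that finds a hub link this way would still have had to fetch the feed once, which matches the point above that crawling is needed to discover what to subscribe to.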
Maybe I'm just a noob, but... (Score:4, Interesting)
...someone help me out here. People can still find my articles through google before I see the googlebot hit any new articles I post...how is that possible? How would my pages show up on google before the bot actually crawls them?
Re:Maybe I'm just a noob, but... (Score:5, Funny)
Re: (Score:1, Insightful)
Oh, wow:
http://www.google.com/search?q=NovTest+(909599)+test
Re:Maybe I'm just a noob, but... (Score:4, Interesting)
My site is by no means high-traffic, but Googlebot indexes my pages (and shows them in search results) within three minutes:
crawl-66-249-65-232.googlebot.com - - [04/Mar/2010:10:33:34 -0600] "GET /current-crime-decline-to-cause-public-safety-cuts HTTP/1.1" 200 47330 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I really don't see a need for something to be any more "real time" than that for someone's blog. Do you?
Re: (Score:2)
Absolutely. With this breakthrough technology, a cutting-edge new media purveyor can ensure that their reportage, opinions and commentary are easily accessible to the general public with a minimal delay. In today's fast-paced Internet, a few minutes' delay can make the difference between being on the breaking edge of news and being a Johnny-come-lately.
(To be more succinct, PuSH lets bloggers make sure they have the first post.)
Re: (Score:2)
Some blog engines will automatically notify search engines of an updated site map upon publishing new content.
Re: (Score:2)
I really don't see a need for something to be any more "real time" than that for someone's blog. Do you?
Not really...I generally update my site between 4-6 times per week, but when I update it I'm only posting one article a day with the odd site announcement every so often...maybe I just suck, I don't know, but it seems like it takes a week or two before people start really reading what I write, they always seem to read what I wrote a week or so ago instead of the new content. This happens even if they land on my main page (linked in my sig) rather than on an actual article. ::shrug:: whatever. I average be
Re: (Score:3, Interesting)
maybe I just suck, I don't know, but it seems like it takes a week or two before people start really reading what I write, they always seem to read what I wrote a week or so ago instead of the new content.
As you write more often (say on a specific time schedule and daily) the people who don't read via RSS (which in my case is the majority of my readers) will learn to make going to your site a part of their daily routine and thus your visits on new material will go up.
I watched visiting trends, by hour, over
Re: (Score:2)
Cool, thank you! I'll definitely have a look at that.
Re: (Score:2)
I watched visiting trends, by hour, over the last two years in Google Analytics and picked 7:30 AM and 10:30 AM as the times to post material. It seemed as if most people were checking once in the morning when they got to the office and once at breaktime/lunchtime around 11 AM. To account for some of the time variance seen across those two years I went with 15 minutes earlier than the stats showed. Seems to work for me.
Odd that everyone who reads your content is in your timezone. Do you primarily post artic
Re: (Score:2)
95% of my content isn't just local, it's hyperlocal. Thanks for asking about this, as I did limit the analysis to those I put into an "Advanced Segment" where the visitors' region was Minnesota.
Re: (Score:2)
I've noticed that when I post a new blog entry on Livejournal, it appears in Google's results within 2-3 minutes. I know that Livejournal has a public feed for all new blog entries across the site, so I assume Google must be indexing this (and presumably others).
Re: (Score:2)
I really don't see a need for something to be any more "real time" than that for someone's blog. Do you?
In rare cases like the swine flu panic, 3 minutes can be the difference between fame and obscurity.
Assume Google makes a new site queue. (Score:2)
The result: your content scanned in seconds, not hours or days.
Re: (Score:2)
It's like an RSS feed for Google. Just like you'd use an RSS feed to keep up with various blogs instead of visiting constantly.
Re: (Score:2)
Google's "webmaster tools" already let you set an RSS feed as the sitemap source.
kinda done now (Score:5, Informative)
If Google notices your site/blog updates frequently, the bot will come around more often, especially if it's a high page rank site.
Re: (Score:2)
That is still slower, not to mention far less efficient for both parties, than event-driven updates.
Re: (Score:2)
1. Go to 4chan/b and post a unique sentence.
2. Observe how quickly stuff gets posted to that site.
3. Search for that sentence through Google.
4. Be amazed that Google actually indexes this site.
Re: (Score:2)
There is no such thing as a high Page Rank site. The name Page Rank is a play on words: for one, it comes from the inventor's last name (Larry Page); for another, it is computed on a per-page basis.
Sitemaps? (Score:2)
Re: (Score:2)
that involves the googlebot hitting the site map, or you submitting it manually...
this is all automatic.
However, how is this any different from RSS? (except this is designed to be viewed by a machine rather than a human?)
Re: (Score:1)
However, how is this any different from RSS? (except this is designed to be viewed by a machine rather than a human?)
RSS is a pull technology. I update my blog, which updates my RSS feed and the googlebot goes out and pulls my sitemap (which is my RSS feed on Blogger) and indexes any new pages. This technology sounds like I can ping Google when my site is updated and they can know there is new data for them to pull.
Re: (Score:1)
PubSubHubbub is push technology. So when you make a change, you submit it to a hub which in turn knows the interested parties that have asked to know about your site and then distributes it to them.
So it is more efficient since there isn't constant polling, and it is faster since there isn't a poll lag.
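For the curious, the publisher's half of this can be sketched in a few lines. As far as I can tell from the 0.3 spec, the "I have new content" ping is a form-encoded HTTP POST to the hub carrying hub.mode=publish and hub.url set to the feed (topic) URL. The snippet below only builds that request so it can be inspected without touching the network. build_publish_ping is a hypothetical helper and both URLs are invented.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) the POST telling a hub that topic_url has changed."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical hub and feed URLs, for illustration only.
ping = build_publish_ping("https://hub.example.com/",
                          "https://blog.example.com/feed.atom")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
# Sending it for real would be urllib.request.urlopen(ping).
```

The hub then fetches the feed itself and fans the new entries out to subscribers, which is why the publisher only pays for one small request per update.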
Submit, check your page rank, edit (Score:3, Interesting)
Google indexing in near realtime (Score:1)
twitter (Score:2)
This sounds a bit like Twitter. Put your content in one hole and it comes out lots of places.
Re: (Score:2)
or like...
zen saying: (Score:4, Funny)
"If a tree falls in the forest and no one is around to hear it, does it make a noise?"
internet era update:
"If a webpage is published on the web and no google spider notices it, does it exist?"
near future update:
"If a thought enters your mind that is not already indexed by google, is it real?"
Re: (Score:2)
Yes.
Very yes. There are many other channels of communication you can use to give the link to someone else that Google doesn't index. IRC, IM, email, paper (remember that stuff?), and so on. Even if you don't give the link to anyone, it's still not even close to analogous to the original, as a person is still around to see it, namely the s
dear Virak: (Score:2)
please drink more vodak
k thx
Re: (Score:2)
vodak? Is that like Zima?
KNOW YOUR RETARDED INTERNET MEMES (Score:2)
http://www.urbandictionary.com/define.php?term=vodak [urbandictionary.com]
Re: (Score:2)
So it's even worse than Zima! Thanks. :-)
Re: (Score:2)
I fail to see the merits of doing so, or the relevance to the topic at hand.
Re: (Score:1)
Yes.
Actually no, a noise is something heard by a person or animal. It makes a sound, but not a noise.
Re: (Score:2)
And it's only censorship if a government does it, right? Excessive pedantry is bad enough, but excessive pedantry with absolutely no basis in reality is particularly annoying.
I just noticed it yesterday. (Score:4, Interesting)
As usual I tried to make a tongue in cheek remark and ended up chewing my tongue. I meant Google’s indexer is so fast. Original posting was made at March 3, 2010 2:09 PM. It was in the index by March 3, 2010 5:08 PM. And it was not even from news.google.com, it is the general web search. Pretty soon Google will tell me that I’m out of milk even before I open the fridge door.
Re: (Score:2, Funny)
Pretty soon Google will tell me that I'm out of milk even before I open the fridge door.
It also knows what you did last summer. *ominous look towards the laptop in the corner*
Re: (Score:2)
Hope it isn't too far away, having my google apps account telling me what I need to restock in the fridge (or even the apartment) would be friggin awesome. Then when cookingwithgoogle.com starts up, just writing the recipe I want could give me a grocery list, instant win.
Re: (Score:1)
I'd like to put together a kitchen computer with a camera/barcode reader to keep track of what's in my fridge.
If food came RFID tagged, it would work even better. Of course RFID & food don't mix too well.
Re: (Score:2)
We should be able to build contraptions where you scan every empty carton you throw in the garbage, and it updates the inventory and emails a shopping list, sorted by the aisle for my local grocery store, thank you, to your cell phone.
Yeah, if I can think about it, I am sure someone has already done it. I am not exactly t
Re: (Score:1)
I seriously thought about this once, and realised that the supermarkets will NOT cooperate.
Ever notice how supermarkets are forever changing the location of your favourite product? They want you to walk through the whole store because that way you are likely to make additional/unplanned purchases. Having a shopping list sorted by store aisle would defeat their nefarious marketing plans.
I thought of using user-generated data to create the store maps (i.e., scan the barcode when you grab an item off the shel
Re: (Score:2)
This is a very fantastic idea. I would love to have something like this as, when I typically go to the grocery store, I find myself buying the same stinking food again and again (it's tough to have a good imagination when you're in a rush).
Any Google engineers out there with a penchant for cooking - this would be a great 20% time project.
Re: (Score:2)
Hope it isn't too far away, having my google apps account telling me what I need to restock in the fridge (or even the apartment) would be friggin awesome. Then when cookingwithgoogle.com starts up, just writing the recipe I want could give me a grocery list, instant win.
Some of these services annoy me because I don't want to be a creature of habit in everything I do. I personally want some variety from time to time, and being able to predict individual whims is so far out in the future it's not even sci-fi, it's plain fantasy. Or maybe there is an overall pattern there, something that says routine for 4 weeks, then 75% chance of a random choice of ingredients from Wed to Fri and 95% on weekends. But if there is, I don't want to know about it and, more importantly, I don't want
This makes... (Score:1, Troll)
Keep in mind Google is quickly becoming an all controlling entity.
I have concerns that this technology could expose users to additional threats.
Likely I see it as one more way for Google to corner the search market.
Lastly I ponder the legal implications of a direct tying to a web site's content. What if there is a copyright violation.
Generally I find this to be a dud tech.
Long ago we had to publish to search engines then the crawlers came and life was good.
Again automation is what made things better.
Diving
Re: (Score:1)
This was a triumph... I'm making a note here: HUGE SUCCESS!
(For the uninitiated: read the first letter of each sentence downwards.)
I can suz google? (Score:1)
It's a pull, not a push (Score:2)
Amusingly, since this is based on Atom, the client still has to poll. It just has to poll fewer sources. The connection between the original source and the "pushsubhub" server really is a "push" connection, but the hub to client connection is not.
Also, the "pushsubhub" caches and redistributes the feeds, which means the feed operator no longer sees their own clients.
They don't seem to have addressed the general RSS problem of "server timestamp/ID changed, but content did not". Some RSS feeds get this
No really, it's push (Score:2)
The connection between the original source and the "pushsubhub" server really is a "push" connection, but the hub to client connection is not.
This isn't right. You can see in section 7.3 of the spec that the hub sends an HTTP POST to each client (subscriber) for each update; there's no polling.
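To make the subscriber side concrete, here is a minimal sketch of the two things a callback endpoint has to do, as I read the 0.3 spec: answer the hub's verification GET by echoing hub.challenge, and accept the Atom body the hub POSTs on each update. It is deliberately not wired to a real web server. Both function names are hypothetical, the sample data is invented, and the 404 on a topic mismatch is my reading of the spec rather than something confirmed in this thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def answer_verification(query_string, wanted_topic):
    """Handle the hub's verification GET.

    Returns (status, body): echo hub.challenge if the topic is one we
    actually asked to subscribe to, otherwise refuse.
    """
    params = parse_qs(query_string)
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if topic == wanted_topic and challenge:
        return 200, challenge
    return 404, ""


def entries_from_push(atom_body):
    """Extract (id, title) pairs from the Atom document the hub POSTs to us."""
    root = ET.fromstring(atom_body)
    return [
        (entry.findtext(ATOM_NS + "id"), entry.findtext(ATOM_NS + "title"))
        for entry in root.findall(ATOM_NS + "entry")
    ]


# Invented example data.
TOPIC = "https://blog.example.com/feed.atom"
QUERY = ("hub.mode=subscribe"
         "&hub.topic=https%3A%2F%2Fblog.example.com%2Ffeed.atom"
         "&hub.challenge=abc123")
PUSHED = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>tag:blog.example.com,2010:1</id><title>New post</title></entry>
</feed>"""

print(answer_verification(QUERY, TOPIC))
print(entries_from_push(PUSHED))
```

Since the hub initiates both requests, the subscriber never polls; the cost is that it must be reachable over HTTP at the callback URL.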
Re: (Score:2)
This isn't right. You can see in section 7.3 of the spec that the hub sends an HTTP POST to each client (subscriber) for each update; there's no polling.
You're right. Which implies that the subscriber has to have a web server. Somebody will probably try a "web server in the browser" thing for browser-type subscribers.
To some extent, they've re-invented Usenet.
not that fast for me (Score:1)
Spammers delight! (Score:2)
Blog Ping (Score:2)