
 



Google The Internet

Google Indexing In Near-Realtime

krou writes "ReadWriteWeb is covering Google's embrace of a system that would enable any Web publisher to 'automatically submit new content to Google for indexing within seconds of that content being published.' Google's Brett Slatkin is lead developer of PuSH, or PubSubHubbub, a real-time syndication protocol based on Atom, where 'a publisher tells the world about a Hub that it will notify every time new content is published.' Subscribers then wait for the hub to notify them of the new content. Says RWW: 'If Google can implement an Indexing by PuSH program, it would ask every website to implement the technology and declare which Hub they push to at the top of each document, just like they declare where the RSS feeds they publish can be found. Then Google would subscribe to those PuSH feeds to discover new content when it's published. PuSH wouldn't likely replace crawling, in fact a crawl would be needed to discover PuSH feeds to subscribe to, but the real-time format would be used to augment Google's existing index.' PuSH is an open protocol, and Slatkin says that 'I am being told by my engineering bosses to openly promote this open approach even to our competitors.'"
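
For readers who want to see the moving parts, here is a minimal sketch of the publisher/hub/subscriber exchange the summary describes. It only builds the form-encoded request bodies and reads the hub declaration out of an Atom feed; nothing is sent over the network. The feed, hub, and callback URLs are made-up examples, the function names are illustrative, and the `hub.verify` field comes from early drafts of the PubSubHubbub spec, so treat it as an assumption rather than a requirement of whatever hub you talk to.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical Atom feed that declares its hub with <link rel="hub">.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="self" href="http://example.com/feed.atom"/>
  <link rel="hub" href="http://hub.example.com/"/>
</feed>"""


def find_hubs(atom_xml):
    """Return the href of every top-level <link rel="hub"> in the feed."""
    root = ET.fromstring(atom_xml)
    return [
        link.get("href")
        for link in root.findall(ATOM_NS + "link")
        if link.get("rel") == "hub"
    ]


def publish_ping_body(topic_url):
    """Body the publisher POSTs to the hub after publishing new content."""
    return urlencode({"hub.mode": "publish", "hub.url": topic_url})


def subscribe_body(topic_url, callback_url):
    """Body a subscriber (e.g. an indexer) POSTs to the hub to subscribe."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
        "hub.verify": "sync",  # assumption: field from early spec drafts
    })


hubs = find_hubs(FEED)
ping = publish_ping_body("http://example.com/feed.atom")
sub = subscribe_body("http://example.com/feed.atom",
                     "http://indexer.example.net/push-callback")
```

In this picture the crawler's remaining job is the `find_hubs` step: it still has to fetch the feed once to learn which hub to subscribe to, which matches RWW's point that PuSH would augment crawling rather than replace it.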

  • by Pojut ( 1027544 ) on Thursday March 04, 2010 @01:02PM (#31359520) Homepage

    ...someone help me out here. People can still find my articles through Google before I see Googlebot hit any new articles I post. How is that possible? How would my pages show up on Google before the bot actually crawls them?

  • by Rogerborg ( 306625 ) on Thursday March 04, 2010 @01:10PM (#31359672) Homepage
    GOTO Subject
  • by garcia ( 6573 ) on Thursday March 04, 2010 @01:17PM (#31359758)

    My site is by no means high-traffic, but Googlebot indexes my pages (and shows them in search results) within three minutes:

    crawl-66-249-65-232.googlebot.com - - [04/Mar/2010:10:33:34 -0600] "GET /current-crime-decline-to-cause-public-safety-cuts HTTP/1.1" 200 47330 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    I really don't see a need for something to be any more "real time" than that for someone's blog. Do you?
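
A rough way to measure this from your own access log is sketched below. It assumes the Apache "combined" log format shown in the line above, and the publish time is a hypothetical value picked for illustration, not something taken from the log. The hostname check only works if the server logs reverse-DNS names, and a user-agent string can be spoofed, so treat a match as a hint rather than proof that the hit came from Google.

```python
import re
from datetime import datetime

# Apache "combined" log format: host, identity, user, [time], "request",
# status, size, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_hit(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    hit = match.groupdict()
    hit["time"] = datetime.strptime(hit["time"], TIME_FORMAT)
    return hit


def looks_like_googlebot(hit):
    """Heuristic only: both fields can be absent or forged."""
    return "Googlebot" in hit["agent"] and hit["host"].endswith(".googlebot.com")


LINE = (
    'crawl-66-249-65-232.googlebot.com - - [04/Mar/2010:10:33:34 -0600] '
    '"GET /current-crime-decline-to-cause-public-safety-cuts HTTP/1.1" '
    '200 47330 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"'
)

# Hypothetical publish time, for illustration only.
PUBLISHED = datetime.strptime("04/Mar/2010:10:30:00 -0600", TIME_FORMAT)

hit = parse_hit(LINE)
delay = hit["time"] - PUBLISHED  # time between publishing and the crawl
```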

  • by 140Mandak262Jamuna ( 970587 ) on Thursday March 04, 2010 @01:34PM (#31360018) Journal
    Funny, I just posted this yesterday on Panda's Thumb [pandasthumb.org]:

    As usual, I tried to make a tongue-in-cheek remark and ended up chewing my tongue. I meant that Google’s indexer is so fast. The original posting was made at March 3, 2010 2:09 PM. It was in the index by March 3, 2010 5:08 PM. And it was not even from news.google.com; it was the general web search. Pretty soon Google will tell me that I’m out of milk even before I open the fridge door.

  • by garcia ( 6573 ) on Thursday March 04, 2010 @01:37PM (#31360054)

    "Maybe I just suck, I don't know, but it seems like it takes a week or two before people start really reading what I write; they always seem to read what I wrote a week or so ago instead of the new content."

    As you write more often (say, daily and on a fixed schedule), the people who don't read via RSS (which in my case is the majority of my readers) will learn to make visiting your site part of their daily routine, and your visits on new material will go up.

    I watched visiting trends, by hour, over the last two years in Google Analytics and picked 7:30 AM and 10:30 AM as the times to post material. It seemed as if most people were checking once in the morning when they got to the office and once at breaktime/lunchtime around 11 AM. To account for some of the time variance seen across those two years, I went with 15 minutes earlier than the stats showed. Seems to work for me.

    Good luck.
