Google Now Searches JavaScript
mikejuk writes "Google has been improving the way that its Googlebot searches dynamic web pages for some time, but it seems to be attracting some added interest just at the moment. In the past Google has encouraged developers to avoid using JavaScript to deliver content or links to content because of the difficulty of indexing dynamic content. Over time, however, the Googlebot has incorporated ways of searching content that is provided via JavaScript. Now it seems that it has got so good at the task that Google is asking us to allow the Googlebot to scan the JavaScript used by our sites. Working with JavaScript means that the Googlebot has to actually download and run the scripts, and this is more complicated than you might think. This has led to speculation about whether it might be possible to include JavaScript on a site that could use the Google cloud to compute something. For example, imagine that you set up a JavaScript program to compute n digits of pi, or a Bitcoin miner, and had the result formed into a custom URL, which the Googlebot would then try to access as part of its crawl. By looking at, say, the query part of the URL in the log you might be able to get back a useful result."
Really? (Score:5, Insightful)
Googlebot will have a very quick timeout on scripts and probably won't be more powerful than a standard home computer. How would that be useful for calculating digits of pi or bitcoin mining? It would take far longer than doing it the conventional way.
Incremental and/or parallel computing? (Score:5, Interesting)
Re:Incremental and/or parallel computing? (Score:5, Funny)
I already do this using a system of CNAMEs in a .xxx domain.
Re: (Score:1)
The same reason why 72 hours of video is uploaded to YouTube every minute.
Re: (Score:2)
I realise that the kind of idiots who like Bitcoins will be the same fools who drool over Google, and that these same monkeys won't see any problem with providing an algorithm which generates a secret to a third party for execution,
Bitcoin mining doesn't involve any secret information.
I'm not sure why you're slagging "idiots who like Bitcoins" so much, either. Sure, Bitcoin has attracted some cranks, anarchists, people who don't trust government-issued money, and speculators who will say all manner of things in attempts to influence the price of Bitcoins (both up and down), but have you actually looked at the crypto and the system of incentives built into the Bitcoin system? It's brilliant, and it's basically the micropayment system
Re: (Score:1)
It seems to me though that there's no reason to limit this to Googlebot; any JavaScript interpreter will do. I'd be surprised if nobody from the blackhat community has this up and running for
Re: (Score:1, Interesting)
Re: (Score:1)
Re: (Score:2)
Intent.
Prove it. Seriously. You wouldn't be able to.
Re:Incremental and/or parallel computing? (Score:5, Interesting)
Anyone wanting to do this would be doing it on a dedicated website. They won't care about the domain or IP address being blacklisted from Google. And good luck with the theft of service charge, they never asked Google to index them. They did not even agree to any terms of service from Google. As I said, good luck.
Re:Incremental and/or parallel computing? (Score:5, Informative)
Stop trying to teach what you don't understand (Score:1)
Right. That is exactly what I said. The standard for the internet is well defined. You should read about it [wikipedia.org]. If you make a web page available to the internet without a password, captcha or firewall, etc. you are making it available to all. You have already purposely accepted the condition ahead of time. This is opting in [thefreedictionary.com]. The robots.txt allows you to opt-out instead. If you opt in by placing it on the internet available to web crawlers and
Re: (Score:2)
Re: (Score:1)
Why do you keep reiterating my point for me and then saying I didn't make my point? If you don't create a mecha
Re: (Score:1)
If you don't create a mechanism to keep Google out (e.g. robots.txt) then - by your own admission - you have opted to allow Googlebot to read what you publish to the world.
Allowing Google to do something does not mean asking Google to do it. Allowing does not involve "service".
Re: (Score:1)
Re: (Score:1)
You said "theft of service". If you had read the second sentence, it said allowing does not involve service.
For example, you "allowed" me to pray for you. I pray for a fee. You are hereby charged with theft of service.
Re: (Score:1)
Re: (Score:1)
I have "studied" it quite well. So you are saying you cannot be charged with theft of service until you arrange with god to benefit from my prayers.
Google is doing what it wants to do. It doesn't become theft of service just because someone benefits from it.
Re: (Score:2)
No. I am saying that you shouldn't try to make analogies, because you suck at it.
Also, don't waste your time studying things if your definition of 'quite well' results in the level of complete misunderstanding you have managed to achieve. Just accept that you aren't smart enough
Re: (Score:1)
Please try to be funny when you troll.
Re: (Score:2)
Re: (Score:1)
You are demonstrating the act of trolling quite well. Only thing left is for the observer to know the name of this internet behavior. For an experienced internet user like me, it was very simple, thank you. Your posts can go into textbooks to illustrate trolling to help people less well informed than me. Thanks for community service.
Re: (Score:2)
Re: (Score:1)
Sorry, I am not writing the book I mentioned. I just hoped someone would. Though you can add your above post as an illustration in your own book, as I hadn't mentioned I intend to write any book like mentioned but you concluded it anyway. You are quite the person to write a book on "clueless morons". Being one yourself is quite a help, I am sure.
Re: (Score:2)
That's OK. It's probably for the best. I've seen your writing.
It is always good to have hopes and dreams, even if they are phenomenally unrealistic. For example, I hope you get a clue someday.
You really are a dim bulb there, Sherlock. Have a nice life in fantasy lan
Re: (Score:1)
You really are a dim bulb there
Even though it was you that drew the wrong conclusions?
Anyway, don't worry. This is the best you can come up with, at the moment, but next year you are sure to think of a witty reply. Keep trying.
Re: (Score:2)
I didn't draw the wrong conclusion. I was making the point that the only way a book will be published that uses my post as an example of trolling is if you write one yourself, and the only way any book you write would be published is if you pay someone to publish it. Alas, you are too dim to figure these things out, so:
PLONK [netlingo.com]
Re: (Score:1)
the only way a book will be published that uses my post as an example of trolling is if you write one yourself
Unsubstantiated
the only way any book you write would be published is if you pay someone to publish it
Ditto. Also, a "book" need not be "paid published" to be called a book in these days of e-books.
Anyway, I was just making fun of your stupidity. Alas, the same quality of yours makes you unable to understand it.
Re: (Score:2)
If they then download my JavaScript experiment and run it at their cost, that's their problem.
When I can trust crawlers to not ignore my robots.txt I'll stop using fail2ban on my apache logs.
Re: (Score:1)
There is no reason to believe, as the research is scant at best, that Google even respects a robots.txt file. They are a vacuum hose attached to an analytic engine, easily metaphorized to Stephen King's Langoliers.
Here's your sign (Score:2)
From the preceding link: "Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler. Visit http://code.google.com/web/controlcrawlindex/docs/faq.html [google.com] to learn how to instruct robots when they visit your site. You can test your robots.txt file
Re: (Score:2)
Research is scant? It's ridiculously easy for anyone with a webserver to verify if Google respects robots.txt.
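One way to run that check, as a rough sketch only: read your own robots.txt, then look through the access log for requests from a client calling itself Googlebot to a disallowed path. The function names are made up for illustration, the log lines are assumed to be in Apache combined format, and the robots.txt handling is deliberately simplified (only `User-agent: *` groups and plain path prefixes, no wildcards or Allow rules). Keep in mind that the user-agent string can be forged, so a hit is a lead to investigate, not proof.

```javascript
// Collect Disallow prefixes that apply to all crawlers ("User-agent: *").
// Simplification (assumption): no wildcard, Allow, or per-bot group handling.
function disallowedPrefixes(robotsTxt) {
  const prefixes = [];
  let applies = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.split('#')[0].trim(); // drop comments and whitespace
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === 'user-agent') {
      applies = value === '*';
    } else if (field === 'disallow' && applies && value !== '') {
      prefixes.push(value);
    }
  }
  return prefixes;
}

// Return the log lines where a client identifying itself as Googlebot
// requested a path that starts with one of the disallowed prefixes.
function violations(logLines, prefixes) {
  const requestRe = /"(?:GET|HEAD|POST) (\S+) [^"]*"/;
  return logLines.filter((line) => {
    if (!/Googlebot/i.test(line)) return false;
    const m = requestRe.exec(line);
    return m !== null && prefixes.some((p) => m[1].startsWith(p));
  });
}
```

An empty result over a few weeks of logs is reasonable evidence that the crawler is honoring the file; any lines returned are the ones worth a closer look.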
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
So you are saying that someone would go through all the trouble of registering the domain, creating the code, and getting (or waiting for) Google to index it, then wouldn't care that Google would cease to execute the actual code before the desired results are obtained? Re-read what I wrote. I merely said it would be blacklisted quickly. I didn't say that it woul
Re: (Score:2)
So far blacklisting has worked pretty well for Google. Google has used it well to punish black hat SEO techniques.
In this case though, if I don't care about my page rank, I would simply create tons of long domain names for pennies (+ICANN fees). I would use a few at a time and wouldn't care if Google blacklisted a few at a time (I would be storing partial results, just like one of the parents mentioned, and the takeover should be seamless). It doesn't take a lot to recoup your domain name fees if your task is
Re: (Score:1)
Dedicated hardware is cheap, and designing software costs a lot of money and time. What you are proposing would be ridiculously convoluted and costly, even disregarding the legal ramifications. We software engineers often talk about using the right tool for the right job. Your outlandish proposal ignores numerous sound engineering principles, not the least of which is adhering to this simple maxim.
Re: (Score:2)
Maybe not. But if someone wanted to do it just for the heck of it, it can be done. It may not scale very well; otherwise I don't see any issues with it.
Re: (Score:1)
Re: (Score:3)
I think you missed the "just for the heck of it". I understand my approach is not the practical one, and any sane person would just use their resources to do what little can be done and implement it on their own hardware. But that doesn't mean it cannot be done in a no-loss way. Say I want to calculate the last 100 digits of Graham's number: it can be split into multiple calculations, and a sub-result calculation can take less than a second (which is what I assume Google will limit the runtime to). The bandwid
Re: (Score:1)
Re:Incremental and/or parallel computing? (Score:4, Insightful)
Your JS would generate HTML on the client side. Just generate a link that your server can understand. Googlebot, doing what it does, will try to load this URL. When it does, the server stores the result and generates a new problem for Googlebot to solve. This is the basis for the article and the entire comment thread.
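The loop described in this comment can be sketched roughly as follows. Everything here is hypothetical: the function names, the `/next?n=...&result=...` URL layout, and the toy "work unit" (squaring a number) are stand-ins chosen for illustration, and nothing in the thread confirms that any crawler would actually follow such links or run such scripts to completion.

```javascript
// Server side: build the page for one work unit. The inline script does a
// small computation and appends a link whose query string carries the answer.
function buildPage(n) {
  return [
    '<html><body><script>',
    '  var n = ' + Number(n) + ';',
    '  var result = n * n; // stand-in for one short slice of real work',
    '  var a = document.createElement("a");',
    '  a.href = "/next?n=" + n + "&result=" + result;',
    '  a.textContent = "next";',
    '  document.body.appendChild(a);',
    '</script></body></html>'
  ].join('\n');
}

// Server side: when a client requests the generated link, recover the answer
// from the query string, store it, and hand out the next work unit.
// Note that the "result" is untrusted input: anyone can request this URL.
function handleRequest(requestUrl, store) {
  const params = new URL(requestUrl, 'http://example.invalid').searchParams;
  if (!params.has('n') || !params.has('result')) return buildPage(0);
  const n = Number(params.get('n'));
  const result = Number(params.get('result'));
  if (!Number.isInteger(n) || !Number.isFinite(result)) return buildPage(0);
  store.set(n, result); // partial results survive even if the crawler stops
  return buildPage(n + 1);
}
```

Because each request carries one finished sub-result, the server keeps whatever was completed if the crawler times out or the domain gets dropped, which is the incremental approach several posters mention.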
Re: (Score:1)
Like I said, you are making assumptions about Googlebot. You seem to think that they have no idea how to sanitize an input and will just execute whatever you send them byte for byte. That's not going to happen.
Re: (Score:3)
Er, they are looking for JS that generates HTML (So this is not an assumption). The purpose of GoogleBot is to index. If they run the JS and dont even index the results, it is makes no sense.
Would you mind specifically mentioning what I assumption I am making. And there is no way to Sanitize JS (JS is a turing complete language, there is no way (atleast as far as present day research) to santize it in any reasonable way)
Re: (Score:2)
Soory about the typos, I guess I need to get some sleep.
Re: (Score:2)
I feel honored to have been considered a Google employee. Well, not really. Is there something wrong with my point, that it sounds Fanboish or Employeeish?
Re: (Score:1)
Re: (Score:2)
Wait a minute, are you suggesting that having spiders run my javascript x86 emulator which runs jruby scripts which mines bitcoins, isn't practical?
Simply another example (Score:1)
why having other parties fetch your arbitrary code and execute it is such a wonderful idea.
Re:Simply another example (Score:5, Funny)
A much more likely application (Score:5, Interesting)
Send Google JavaScript which generates different results for Google than for normal visitors, in order to rank up the site.
Re: (Score:1)
That's an interesting idea and much more insidious than mine, which was to simply send nothing to Google and fuck 'em.
Not allow your site to be indexed by Google? Yeah, that'd really fuck Google up good, wouldn't it?
Re:A much more likely application (Score:5, Funny)
What is this method you have written, "sudo_mod_me_up?"
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
You don't need JavaScript for that. A lot of servers serve different HTML to Google than to us. It's especially noticeable when searching for a rare term; Google will show you results that appear to contain the term, but without relevant context (only mystifying unrelated terms) and when you open it the page turns out to have some completely different subject.
Re: (Score:1)
I noticed this in a PHP attack script earlier this year. It installs a script pointing to a Russian malware domain, but only inserts it in the page if the user agent is not GoogleBot or a few other spiders. It also checks for some Google IP ranges. Surely Google must be combating this by doing some stealth spidering, otherwise SEO and malware providers will game them if they stick to their classic robot rules.
Re: (Score:2)
Re: (Score:1)
The point is, with Google executing JavaScript you could make it less obvious, by just having the JavaScript depend on some difference between the Google and the Browser JavaScript execution (maybe timings of certain rendering operations).
Also, it might be used through XSS, to have competitors delisted.
Re: (Score:1)
Re: (Score:2)
I would be surprised if the Googlebot didn't try everything to appear to the server like a normal user browser. Even better would be to crawl a site while in disguise, then again while not disguised. Differences would affect the site's ranking negatively.
Re: (Score:1)
Serving different content based on IP or self-identification is possible even without JavaScript. However if the detection makes use of peculiar behavior of the JavaScript implementation (and the JavaScript implementation will have to have some differences, or else it won't find content which is initially hidden, but unhidden by a user interaction), just fetching from a different IP or with a different browser/spider identification doesn't work.
And BTW, the spider will certainly expose itself from the very
Re: (Score:3)
By "gracefully degrading" do you mean "if (useragent == 'googlebot') { random-spamwords(); paywalled-content(); links-to-every-parsable-uri(); }"?
Re: (Score:2)
Re: (Score:2)
I noticed this already some time ago. (Score:1)
Re: (Score:2)
Re: (Score:1)
Re: (Score:3, Funny)
Also, the dry cleaning that you dropped off on Thursday is ready for pick-up and your driver's license expires in three months.
Sincerely,
The Slashdot Citizens Brigade
Re: (Score:3)
Now that you said it. The preview Google shows of one of my sites has all the CSS applied, including some that is applied by JavaScript after the page load.
so much for (Score:5, Insightful)
using javascript to hide or obfuscate email addresses to help protect them from spammers, scammers and bots.
thanks fer nuttin, google.
Re: (Score:3)
Re: (Score:3)
Do you think spammers scraping the web for email addresses respect robots.txt?
Re: (Score:3)
Uhm, years ago one could already do that using SpiderMonkey and some Perl. It's what I used to report nasty redirects in Blogspot/Blogger to Google (thousands and thousands). It took me some time, but Google did see the light and the problem was resolved.
Why do people keep thinking that spammers are retards? If it can be abused, it will be. And spammers/cybercriminals are among the first to do so.
Re: (Score:1)
Evaluate JavaScript on the client (Score:1)
Now that Google controls the client, the search engine and the analytics, it should not be too difficult for them to see how traffic is flowing between sites. Pages need not even be physically linked for Google to see a connection. E.g. reading an article on the BBC may cause people to search for a company. With people signing into Chrome, Google must have some very rich logs.
Google has been doing this for quite some time (Score:2, Interesting)
Although maybe not quite in the same context. Google used to display javascript-munged email addresses in their search results until some of the larger sites involved, such as Rootsweb, complained.
GET vs POST (Score:1)
I really hope website developers and web application developers know the difference between GET and POST requests.
Else, this could turn ugly.
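To make the concern concrete: crawlers discover pages by following links, and following a link is a GET request, so any action that changes data should not be reachable by GET alone. The sketch below is a minimal illustration with a made-up in-memory "site"; the `/items/<id>/delete` path and the `handle` function are assumptions, not anything from a real framework.

```javascript
// Hypothetical request handler: state changes are only honored on POST.
// `items` is a Set of numeric ids standing in for real stored data.
function handle(method, path, items) {
  const m = /^\/items\/(\d+)\/delete$/.exec(path);
  if (m === null) return { status: 404 };
  if (method !== 'POST') {
    // A crawler that follows a link to this URL lands here and changes nothing.
    return { status: 405, headers: { Allow: 'POST' } };
  }
  items.delete(Number(m[1]));
  // Redirect after the change so a page reload does not repeat the POST.
  return { status: 303, headers: { Location: '/items' } };
}
```

If the delete were wired to a plain link instead, a crawler that executes scripts and follows every URL it finds could trigger it simply by indexing the page.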
Re: (Score:2)
Re: (Score:2)
Google adding potential security holes in its bot? (Score:1)
I can already picture hackers drooling at the idea of turning Google's cloud into the ultimate zombie network.
Chrome (Score:3)
I for one welcome the Javascript spamming. (Score:1)
Re: (Score:2)
Re: (Score:2)
They don't need to run the scripts (Score:3)
You don't need to actually run the scripts, most of the time it's enough to just scrape the strings and links out of them.
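A minimal sketch of that scrape-without-running approach, assuming a simple regular expression over the script source is good enough (it is not a real JavaScript parser, and `extractLinks` is a name invented here): pull out quoted string literals and keep the ones that look like URLs or site-relative paths.

```javascript
// Pull quoted string literals out of script source without executing it,
// then keep the ones that look like absolute URLs or site-relative paths.
function extractLinks(source) {
  // Matches '...' or "..." on one line, allowing backslash escapes inside.
  const literalRe = /(["'])((?:\\.|(?!\1)[^\\\n])*)\1/g;
  const links = [];
  let m;
  while ((m = literalRe.exec(source)) !== null) {
    const text = m[2];
    if (/^(https?:\/\/|\/)[^\s]+$/.test(text)) links.push(text);
  }
  return links;
}
```

The limitation shows up as soon as a URL is assembled at run time: for a line such as `a.href = "/next?n=" + n;` this only recovers the fragment `/next?n=`, which is where actually executing the script becomes necessary.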
WTF? (Score:2)
Oh yeah, fuck accessibility. Fuck the web in general. "It's better for everybody". That's literally all you need to know. "Just go ahead and remove that from your robots.txt".
I'm not saying there may not be good reasons (e.g. having the CSS and Javascript actually makes it possible to detect invisible text and whatnot, without that search engines may not even have a chance), but I really would appreciate some good reasoning, not being talked to like a fucking 5 year old.
Or hey, how about adding that "of cou
Spammers! (Score:4, Informative)
They've been testing this for a while - We've already had the first complaints against someone spamming an email that only exists in exactly one place: Online as the result of some (trivial) javascript. Turned out that if you Googled the page, the result snapshot included the javascript generated email... In other words - it's already there and this will effectively kill javascript as a way of hiding functioning mailto links. Okay it would be fairly simple to add a condition based on the User Agent as GoogleBot is easily identified but it will make things a bit more complicated for the average user.
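For readers who haven't seen the trick, here is a rough sketch of why it stops working once the crawler runs scripts. The page script, the address `webmaster@example.org`, and both function names are invented for illustration; `new Function` is used only as a stand-in for a crawler's script engine and runs nothing but the string defined in the sketch (running untrusted script would need a proper sandbox).

```javascript
// What the page ships: the address never appears as one literal string.
const pageScript =
  'var user = "webmaster"; var host = "example" + "." + "org"; ' +
  'var address = user + "@" + host;';

const emailRe = /[\w.+-]+@[\w-]+(\.[\w-]+)+/;

// A scraper that only reads the source text finds nothing address-shaped.
function scrapeSource(source) {
  const m = emailRe.exec(source);
  return m ? m[0] : null;
}

// A client that executes the script sees the assembled result.
function scrapeAfterRunning(source) {
  const address = new Function(source + ' return address;')();
  const m = emailRe.exec(String(address));
  return m ? m[0] : null;
}
```

With these definitions `scrapeSource(pageScript)` comes back empty while `scrapeAfterRunning(pageScript)` recovers the full address, which matches what the result snapshot described above was showing.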
Re: (Score:2)