Google The Internet

SVG and the Indexing of Web Standards 98

wombatmobile writes "The world's most popular search engine company is a leading supporter of open standards. It pours money and people into initiatives that promote, assist, support and implement Web standards. As a core foundation of its mission statement, all web assets should ideally be of a kind that it can work with. Strange then, that the world's most popular search engine doesn't index all of the current important Web standards formats. Doug Schepers of W3C blogs about how Scalable Vector Graphics content is recognized and not recognized by search engines, currently and historically." Readability really helps out on this site.
This discussion has been archived. No new comments can be posted.

  • Standards? (Score:5, Funny)

    by girlintraining ( 1395911 ) on Saturday July 10, 2010 @10:35PM (#32863956)

    This is the great thing about standards: There's so many to choose from!

    • Re: (Score:1, Informative)

      by Anonymous Coward
      because almost nobody uses svg except the people who care and are aware of it and those people are able to find it. next fucking question, please.
  • by Stan Vassilev ( 939229 ) on Saturday July 10, 2010 @10:45PM (#32864004)

    I don't know why this guy is using filetype Google searches to find out how common SVG and Flash content is.

    SVG content makes up just 0.106% of all Web content, by my rough estimation. Flash is almost 5 times as common as SVG. That's pretty grim for SVG. ... But wait, let's put that into perspective. Flash is about 4.8 times more common than SVG. HTML is roughly 838 times more common than SVG. 838 times. Flash content comprises approximately 0.52% of all Web content, and HTML is roughly 189 times more common than Flash.

    Let *me* put that into perspective. Most Flash content is deployed via JavaScript, so it won't show in a Google filetype search. None of the sites with Flash I've worked on would pop SWF filetype results in Google. Saying that Flash to SVG are 5 to 1 is hilarious, given the-still-leading browser on the market, IE, supports zero SVG content (to change with IE9 which is in alpha right now).

    Saying that Flash is 0.52% of the content of the web is also hilarious. Even just counting the countless embedded YouTube players in blogs would change those numbers drastically.

    • Re: (Score:3, Insightful)

      by digitalunity ( 19107 )

      If they're counting Flash by simply comparing number of SWF files to HTML files, that might be right.

      If you were to compare traffic from flash content and compare that to HTML or other web traffic, I think you'd see that a very high percentage of bandwidth is consumed by flash video.

    • by I'm Schepers ( 900611 ) on Sunday July 11, 2010 @01:05AM (#32864534)
      Hi, Stan-

      You raise a good point, but I'm not actually talking about the actual amount of content on the web, I'm talking about how it is indexed and searchable (in this case, by Google). I'm sure that there is a lot more Flash content than my rough study indicates, and I could be clearer about that in my blog post, but for the purposes of discussing the relative representation in search results, I think it's fair to say that the presence (or lack of presence) of content is distorted by how easy it is to find it through the search engine.

      Ultimately, it doesn't matter how much Flash or SVG content is on the web... both should be indexed and represented in search results. How we get to that point, and how we can make it fruitful for people searching for the content, is the interesting question.

      • by Stan Vassilev ( 939229 ) on Sunday July 11, 2010 @07:33AM (#32865602)

        You raise a good point, but I'm not actually talking about the actual amount of content on the web, I'm talking about how it is indexed and searchable (in this case, by Google). I'm sure that there is a lot more Flash content than my rough study indicates, and I could be clearer about that in my blog post, but for the purposes of discussing the relative representation in search results, I think it's fair to say that the presence (or lack of presence) of content is distorted by how easy it is to find it through the search engine.

        Ultimately, it doesn't matter how much Flash or SVG content is on the web... both should be indexed and represented in search results. How we get to that point, and how we can make it fruitful for people searching for the content, is the interesting question.

        This has been attempted before, which, in the case of Flash, resulted in pages and pages of SERPs like these [google.com].

        It's probably understandable why Google lowered the "rank" of Flash content in their SERP.

        Indexing SVG is also of dubious benefits. Flat images may be a nice addition to the images section, if search engines have a good way of recognizing those from SVG-based interactive apps, but that's about it.

        However, not all SVG files work outside the page they are embedded in, especially if they depend on related scripts. This is even more so the case with Flash, which often has its data sources loaded externally, based on parameters passed in-page. That's one more reason why people use JS for Flash embedding: it doesn't produce naked SWF files in search results, which rarely work anyway.

        Searching is about keywords and phrases, so it works best with HTML, where the majority of text is. Image search is based on the text around the image, and SVG static image search will likely work best that way as well, so there's no pressing need to try to find a couple of irrelevant words in an SVG file lost among thousands of vector/color data items.

        In other words, indexing Flash/SVG seems to be a solution in search of a problem.

        • Okay, I'm not in a position to defend Flash (I was trying to be nice by giving it the benefit of the doubt), but SVG was designed from the bottom up as a text format, where visible text is exactly that, and while some SVGs don't contain any text or metadata, many do have a significant percentage of relevant textual data. You presume (I believe incorrectly) that the text in an SVG file is irrelevant, because you're comparing it with Flash. SVG is not Flash, and it's not necessarily used in the same way.

          But

          • I disagree.

            SVG, Flash and HTML don't have a meaningful difference when it comes to their abilities or structure. All three are hierarchies of content nodes, changeable via dynamic scripting. That HTML and SVG use text to demarcate the nodes and flash uses binary blobs is of no consequence, because parsers don't care whether the symbols are text or binary. The fact that HTML is easier to index than flash is not due to any inherent quality of the technology, it's due to the fact that HTML is used for static c

            • I agree, for some value of the word "dynamic".

              Scripted dynamic content often injects new content into the DOM, or replaces or removes existing content. In extreme (but common) cases, the majority of content is dynamically inserted into the DOM clientside, which makes it a very poor candidate for indexing. Without existing structure, hyperlinks, and textual content (text, metadata, titles, descriptions, etc.), search engines have little to go on.

              The kind of dynamic interactive content you're talking about,

          • SVG was designed from the bottom up as a text format, where visible text is exactly that, and while some SVGs don't contain any text or metadata, many do have a significant percentage of relevant textual data.

            Flash not being readable in Notepad, because it's binary, isn't the primary reason it's less indexable than HTML.

            Yes, SVG's official serialization format is text-based (XML-based), but that's just an implementation detail. The internal structure of Flash is actually quite simple, and also a tree of (binary) tags, and you can easily serialize a SWF to XML and back with no loss, and you'll notice they are very similar, the only difference - Flash is somewhat wider in scope than SVG currently is (audio, t

            • You seem to be trying to pit SVG against Flash, though I'm not sure why. My point about indexing SVG has nothing to do with whether Flash or SVG is "better", or even the differences between the formats; in fact, the only reason I mentioned Flash is by way of comparison with relative usage on the Web, and by any metric, Flash is far more widely used currently. If you prefer Flash, by all means use Flash. My blog post was about technical details of indexing SVG in search engines... in that context, I'm onl

    • Google does follow links in Javascript (yes, Googlebot can execute JS these days) and the JS on these sites presumably does contain links to .swf files.

      • Re: (Score:1, Informative)

        by Anonymous Coward

        Google does follow links in Javascript (yes, Googlebot can execute JS these days) and the JS on these sites presumably does contain links to .swf files.

        Google follows JS links when it's a string literal that looks like a URL. I.e. Google will follow this:

        var gohere = 'http://example.com/123.pdf'; ... ...

        But it won't dynamically execute and detect urls constructed at runtime, for example:

        function openDoc(domain, docid) { document.location = 'http://' + domain + '/' + docid + '.pdf'; }
        openDoc('example.com', '123');

        A version of the Googlebot can execute JavaScript, that's correct, but that's not used for basic indexing, but for other purposes (aiding spam fi
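The distinction drawn above, between URLs that appear as string literals and URLs assembled at runtime, can be sketched with a purely static scan. This is only an illustration under stated assumptions: `findLiteralUrls` is a made-up helper, the regular expression is a guess at what "looks like a URL" might mean, and nothing here describes how Googlebot is actually implemented.

```javascript
// Hypothetical helper (not any real crawler's code): scan JavaScript source
// for quoted string literals that look like complete http(s) URLs.
function findLiteralUrls(source) {
  // A quote, then http:// or https:// followed by at least one more
  // character, then the same kind of quote again.
  const pattern = /(['"])(https?:\/\/[^'"\s]+)\1/g;
  const urls = [];
  let match;
  while ((match = pattern.exec(source)) !== null) {
    urls.push(match[2]);
  }
  return urls;
}

// The two cases from the comment above.
const literalCase = "var gohere = 'http://example.com/123.pdf';";
const runtimeCase =
  "function openDoc(domain, docid) { document.location = 'http://' + domain + '/' + docid + '.pdf'; }\n" +
  "openDoc('example.com', '123');";

console.log(findLiteralUrls(literalCase)); // finds the literal PDF URL
console.log(findLiteralUrls(runtimeCase)); // finds nothing: no literal is a full URL
```

A scanner like this never runs the script, so the pieces `'http://'`, `'example.com'` and `'.pdf'` are never joined into something it could follow.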

    • by l0xin ( 1774918 )

      None of the sites with Flash I've worked on would pop SWF filetype results in Google.

      There's something wrong there then, if you want the greatest exposure possible for those sites.

      • There's something wrong there then, if you want the greatest exposure possible for those sites.

        Exposing dynamic SWFs which just have "loading... loading..." repeated 50 times in them is hardly good exposure.

        The correct way is to have all relevant content in your HTML fallback, which Google will find and index, and thus expose you. Flash is for making the same info more functional, and (let's face it) prettier. Works as it should.

  • From TFA... (Score:3, Insightful)

    by Darkness404 ( 1287218 ) on Saturday July 10, 2010 @10:46PM (#32864010)

    SVG content makes up just 0.106% of all Web content, by my rough estimation. Flash is almost 5 times as common as SVG. That's pretty grim for SVG.

    How is that "grim" for SVG? Flash isn't just used for vector graphics and animations anymore....

    • Speaking of which, it would be just lovely if you could import SVGs into Flash. You would think it would be something obvious for the developers to add. Maybe they really aren't the same thing? *Cue Twilight Zone music*
  • Poorly rendered (Score:2, Offtopic)

    by Lord Grey ( 463613 ) *

    That site [schepers.cc] serves as a poster child for why Safari 5.0's "Reader" feature exists.

    • Re: (Score:3, Interesting)

      That site [schepers.cc] serves as a poster child for why Safari 5.0's "Reader" feature exists.

      And why's that? It's not especially long/tall and even when viewing a 500 pixels rendition in FF there is always a landmark, so to speak, which means one rarely loses their place. Reader is definitely handy; I think it's great for reading short stories posted online*, but I fail to see what's wrong with the site in question besides being another damn blog...

      *Preferably as .txts, but anything is better than a 10-page ad fest. One of the good things about adjustable font size is that I can decide how much tex

  • How does one search for scalable vector graphics that are in *.xml files?

    • Re: (Score:3, Interesting)

      by abulafia ( 7826 )

      I think the answer is, you don't, without a schema that tells you what to expect, at least if you want to do it right.

      More generally, you can guess how people embed it, and probably even be right a lot. But considering what a clusterfuck "XML", things-that-look-like-XML-but-aren't, and things-that-don't-even-come-close-except-for-containing->-and-< are, I'd hate to try for any general purpose use that anyone cared about.

      I think life would be better for everyone if they stopped thinking about XML as som

  • canvas (Score:3, Informative)

    by thoughtsatthemoment ( 1687848 ) on Saturday July 10, 2010 @11:33PM (#32864214) Journal
    With Apple and others pushing canvas, the future of SVG isn't looking bright. I am not sure which one is better, but SVG seems to be the one on the way out, especially after Adobe stopped supporting SVG.
    • And what, pray tell, do you think <canvas> is? Look at the spec [whatwg.org]. It's SVG written in JavaScript.

      • Well, no. (Score:3, Interesting)

        by abulafia ( 7826 )

        SVG is one displayable object type within a canvas.

        SVG is something that I wish had taken off, but I think it is doomed. It is a wonderful format - I've written a few SVG generators for various purposes, and it is clean, easy and beautiful. I think it was missing a champion - Adobe dabbled with it, but it seems like it was a hedged bet for them - they always prefer things over which they have more control, which I guess should be expected behavior by now.

        Or maybe the problem is that it is too general-purpos

        • by Miseph ( 979059 )

          I'm pretty sure the real explanation is that porn has no use for SVG.

          I'm serious. If the porn industry had any desire to deal in line art, SVG would be all over the place. But they don't, so they only really got on board with JPEG, and that's the only image type most people are even aware exists.

          I actually had to argue with other members of a group I'm in about not making our logo a JPEG, and using SVG instead. It was line art and basic text, with a potential need to make it either very small or very big, y

        • SVG is one displayable object type within a canvas.

          You don't know anything about SVG then. Read the spec for canvas, read the spec for SVG, then you'll understand what I'm saying. The native drawing commands in canvas like lineto, moveto, and bezierCurveTo are straight from SVG, which, in turn, is straight from PostScript.

          I know. I'm writing some code that converts 2D sketches in a CAD tool to and from SVG.

          • And you could do to learn how to behave like a grownup. I'm not going to dickwave about it, but I have written rather a lot of code against the relevant specs; if you're working in this space, you're probably using some of my code.

            I think we're suffering from a difference of semantics. I wouldn't call PDF Postscript. Because it isn't. Once upon a time, one could be forgiven for confusing the two, and there were even PoC valid-PDF documents that you could pipe to a printer, but they're quite divergent now. Sim

            • Sorry to sound snippy; I had just been arguing with a 15-year old Apple fanboy when I posted that ... ;)

              Similarly, I don't think that, because they share imperatives, calling Canvas the same as SVG is right, because they are different things and will become more different over time.

              No doubt, but at this point the similarities are close enough that you could just about write an SVG-to-canvas converter that simply does a little text massaging. My original point is that pretending that HTML 5 canvas is some amazing new technology that we should be worshipping at the altars of Apple rather than an incremental improvement over SVG is rather silly.
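To make the "little text massaging" claim concrete, here is a minimal sketch that maps a small subset of SVG path data onto the corresponding canvas 2D calls (`moveTo`, `lineTo`, `bezierCurveTo`, `quadraticCurveTo`, `closePath`). Assumptions, all mine rather than the poster's: only absolute `M`, `L`, `C`, `Q` and `Z` commands are handled, each command letter must be written out (no implicit repeats), and `drawPath` and `makeRecorder` are hypothetical names. The recorder stands in for a real canvas context so the sketch runs outside a browser.

```javascript
// Sketch: translate a subset of SVG path data into canvas 2D context calls.
const ARG_COUNTS = { M: 2, L: 2, C: 6, Q: 4, Z: 0 };
const METHODS = {
  M: 'moveTo',
  L: 'lineTo',
  C: 'bezierCurveTo',
  Q: 'quadraticCurveTo',
  Z: 'closePath',
};

function drawPath(ctx, d) {
  // Split into command letters and numbers; commas and spaces are skipped.
  const tokens = d.match(/[A-Za-z]|-?(?:\d+\.?\d*|\.\d+)/g) || [];
  let i = 0;
  ctx.beginPath();
  while (i < tokens.length) {
    const cmd = tokens[i++];
    const method = METHODS[cmd];
    if (!method) throw new Error(`Unsupported path command: ${cmd}`);
    const count = ARG_COUNTS[cmd];
    const args = tokens.slice(i, i + count).map(Number);
    if (args.length !== count || args.some(Number.isNaN)) {
      throw new Error(`Bad arguments for ${cmd}`);
    }
    i += count;
    ctx[method](...args);
  }
}

// Stand-in for a canvas context: records each call as a string.
function makeRecorder() {
  const calls = [];
  const ctx = {};
  for (const name of ['beginPath', ...Object.values(METHODS)]) {
    ctx[name] = (...args) => calls.push(`${name}(${args.join(', ')})`);
  }
  return { ctx, calls };
}

const recorder = makeRecorder();
drawPath(recorder.ctx, 'M10 80 C 40 10, 65 10, 95 80 Z');
console.log(recorder.calls.join('\n'));
```

In a browser, passing the object returned by `canvas.getContext('2d')` instead of the recorder should issue the same calls against a real canvas. Relative commands, arcs, transforms, styling and text are all left out, which is where a real converter would need more than text massaging.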

          • I wrote an actionscript app 5 years ago that rendered SVG to screen using flash's drawing API, which looks a lot like canvas. None of this stuff is particularly novel, but then open standards aren't supposed to innovate, they're supposed to standardize the best practices.

      • I meant http://www.w3.org/TR/2010/WD-SVG11-20100622/ [w3.org], not SVG in general.
  • by Mabbo ( 1337229 ) on Sunday July 11, 2010 @12:10AM (#32864358)
    We *need* to get full support for SVG going. Not as a replacement for flash, or any of that (though really, they could), but just as a basic image format for non-photographic images in computers. Vector graphics scale beautifully, work well with screen magnifiers for the visually impaired, are lightweight, easy to make and edit by hand (it's xml!).

    You could implement whole web-apps as a single SVG file if you so desired. That is, if all browsers had full support of SVGs- and as my job this summer is in part to work on WebKit SVG support, let me assure you, nobody is fully compliant yet. But we're getting there. (Damn you Sub-resource loading!)

    • Vector graphics scale beautifully

      I was of the understanding that scaling was the singular purpose.

      Fish swim beautifully.

      • Re: (Score:3, Insightful)

        by BasilBrush ( 643681 )

        Not quite the singular purpose. For images that suit line art:
        2) ...media size tends to be lower with a vector representation.
        3) ...the media remains editable at the line/object level rather than at the pixel level.

    • easy to make and edit by hand (it's xml!).

      That is so 1970. Get a draw program.

    • We *need* to get full support for SVG going.

      Unfortunately, it's not going to happen any time soon. What client wants a website with a feature that doesn't work in IE? Never mind IE 9, the websites I build still have to support IE 6! So even assuming IE 9 has good SVG support, SVG won't have a chance on a typical corporate site until IE 9 is in the position that IE 6 is now: the minimum that anyone worries about, with IE 6, 7 and 8 just distant memories with a trivial amount of users who can be ignored the way

      • by kanto ( 1851816 )

        Sometimes you have to leave people behind... I nominate the ones using IE.

        Back in the day you couldn't use it because just visiting a site would replace your bookmarks with pages of ladies with negotiable virtue. Nowadays, last I looked at it, it's an opera clone; but hell, I guess at least they're mimicking the first class look and feel.

      • Re: (Score:3, Insightful)

        by BasilBrush ( 643681 )

        IE8 and earlier are about 50% of browser use out there. So something approaching 50% of browsers in use support SVG. And those that don't can use plugins.

        Flash managed to get accepted purely through the plugin route.

        • And those that don't can use plugins.

          The sorts of PCs that have Internet Explorer as the only installed web browser, especially versions of IE before 8, are also the sorts of PCs where the user lacks the credentials to become an administrator. So even if you do use the Google Chrome Frame plug-in, users are only going to see a password prompt when trying to install it.

          • Again, Flash managed to get accepted as a de-facto standard entirely through the plugin route. So the hurdle you mention clearly doesn't prevent adoption of a standard.

            • Re: (Score:2, Interesting)

              by BitZtream ( 692029 )

              Yea, when you come bundled by default on 95% of the PCs in the world, the fact that it's a plugin means it's just like any other plugin you'd try to get installed later.

              Flash comes with Windows; all users think is that it's being upgraded, which is perceptually different to an end user than installing a new plugin, regardless of the fact that the one included with the OS is more or less useless nearly 10 years down the road.

              • You fail to realise that most corporate environments (you know, the ones running IE6) have standard desktop builds they load onto their desktops after they get the computer from the manufacturer.

      • So, why exactly are you still developing sites for IE? Do you also think that Homo sapiens should be concerned with Homo erectus, Cro-Magnon, and Neanderthal? Should we reserve seats in the UN for them, or what?

        • It's not my decision, it's the client's. They just want sites to work in all common versions of IE, automatically. I don't like it, but they don't want to hear lectures on web standards and progress.
    • I completely agree that SVG is a great standard that should be preserved and used more often. However, functionality takes a back seat to status quo and corporate sponsorship. Browsers and viewers tend to spend more energy supporting what people already use unless they see corporations pushing some new flashy thing, at which point they will support that too. People tend to only use those things which are well supported.

      It's really a chicken vs. egg problem. There may be the occasional technology that finds

    • Re: (Score:3, Informative)

      by pjt33 ( 739471 )

      easy to make and edit by hand (it's xml!)

      Assuming that you can look at a series of coordinates for Bezier control points and visualise the result. If you can, I venture to suggest that you're exceptional.

      • If you've ever studied Bezier splines*, that's really not so hard. Catmull-Rom splines or B-Splines, on the other hand...

        * The reason I've studied them is because we have a horribly outdated computer graphics course, at my uni.

        • The reason I've studied them is because we have a horribly outdated computer graphics course, at my uni.

          Ditto here, just last year.

      • Assuming that you can look at a series of coordinates for Bezier control points and visualise the result.

        It's not that hard. Put your on-curve points at horizontal, vertical, or 45-degree slope, and then put the off-curve control points at the intersection of the tangent lines. This involves a bunch of zeroes and the occasional subtraction, but one can get the hang of it.

        Exercise: Draw a circle with eight Bezier curve segments. (Answer [wikimedia.org])
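One way to work the exercise, sketched under the assumption that "Bezier curve segments" here means quadratic segments with a single off-curve control point each, which is what the tangent-intersection recipe above describes. The on-curve points sit every 45 degrees, and each control point sits where the tangents at the two neighbouring on-curve points cross, at radius r / cos(22.5°). The function names are made up for this example, and the linked answer may be constructed differently.

```javascript
// Sketch: approximate a circle with eight quadratic Bezier segments.
function circleSegments(cx, cy, r) {
  const step = Math.PI / 4; // eight segments of 45 degrees each
  // Tangents at neighbouring on-curve points cross at this radius.
  const controlRadius = r / Math.cos(step / 2);
  const at = (radius, angle) => ({
    x: cx + radius * Math.cos(angle),
    y: cy + radius * Math.sin(angle),
  });
  const segments = [];
  for (let k = 0; k < 8; k++) {
    segments.push({
      start: at(r, k * step),
      control: at(controlRadius, (k + 0.5) * step),
      end: at(r, (k + 1) * step),
    });
  }
  return segments;
}

// Serialize as SVG path data: one M, then a Q per segment, then Z.
function toPathData(segments) {
  const fmt = (n) => String(Number(n.toFixed(3))); // round, drop trailing zeros
  const first = segments[0].start;
  const parts = [`M ${fmt(first.x)} ${fmt(first.y)}`];
  for (const s of segments) {
    parts.push(`Q ${fmt(s.control.x)} ${fmt(s.control.y)} ${fmt(s.end.x)} ${fmt(s.end.y)}`);
  }
  parts.push('Z');
  return parts.join(' ');
}

// Distance from the center to a segment's midpoint (curve parameter t = 0.5).
function midpointRadius(seg, cx, cy) {
  const x = 0.25 * seg.start.x + 0.5 * seg.control.x + 0.25 * seg.end.x;
  const y = 0.25 * seg.start.y + 0.5 * seg.control.y + 0.25 * seg.end.y;
  return Math.hypot(x - cx, y - cy);
}

const circle = circleSegments(0, 0, 100);
console.log(toPathData(circle));
console.log(midpointRadius(circle[0], 0, 0));
```

For a circle centered at the origin the first control point comes out at (r, r(√2 − 1)), so the coordinates are the zeroes and simple subtractions the parent comment mentions. The approximation is not exact: the midpoint of each segment lands roughly 0.3% outside the true radius.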

        • A few years back I had to port an "ellipse segment" drawing command to a set of quadratic curves because that was all that actionscript supported and the segment was part of a longer path that had to be rendered as a whole path (filled). It took me quite a while to get the hang of it, and I documented the crap out of that code to never forget how it worked. But yes, the resulting code just chops everything up into smaller ellipse segments until they get small enough to turn into a quadratic arc.

    • Webkit already supports canvas [wikipedia.org] (which WebGL [wikipedia.org] builds on), why does it need two drawing methods?

      • It does not necessarily need JavaScript, for one? And is Canvas in a CSS background such a good idea?

      • You might argue that webkit doesn't need HTML anymore either (except for the canvas tag). Canvas and javascript can render anything, like the bespin web-based code editor, which uses only canvas.

        Canvas and SVG serve very different purposes.

      • Sorry to go slightly offtopic, but I think we should provide for each type of general use a separate subspecification, more suited to the venture at hand. For instance, I've always wondered where I should raise the question of static type inference and OpenCL type extensions to JavaScript, in order to accommodate in-browser codecs.
    • ... the lack of support for SVG is one of the reasons why the GNU/Linux distro timeline [futurist.se] keeps PNGs in addition to SVGs even though they are inferior (file size and hyperlinks).

    • by PJ6 ( 1151747 )
      I agree that SVG is a good thing, but the ubiquitous attitude of "it's in XML, we don't need to have a proper application with a UI that writes it for the user" needs to go straight to hell. HTML can go there, too.
  • Err, sorry, there is no God-given right for all of your pages to be indexed. Google only indexes pages it considers important and that will (in Google's opinion) add value to a searcher.

    What information in a SVG would be useful to a searcher? And SVG and flash are difficult for Google to spider and extract useful information from.

    For such low-quality pages, why should any search engine waste resources on them? It's just wasting their resources, and an SVG page also counts towards a site's page cap, so it's not go
    • Excuse my ignorance, but what is "a site's page cap"?

      • The max number of pages that Google will index on a given site. It only really applies to the really big sites.
    • What information in a SVG would be useful to a searcher?

      Text.

    • Re: (Score:2, Insightful)

      by AxeTheMax ( 1163705 )

      What information in a SVG would be useful to a searcher?

      Think of maps and the text in them for a start.
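As a rough illustration of the kind of text an indexer could pull out of a map, the sketch below collects the contents of `<title>`, `<desc>` and `<text>` elements from an SVG string. Everything specific is an assumption made for the example: the map is invented, `extractSvgText` is a hypothetical helper, and a regular expression stands in for a real XML parser (so entities, CDATA and self-nested elements are not handled).

```javascript
// Sketch: collect human-readable text from an SVG document given as a string.
function extractSvgText(svgSource) {
  // Match <text>, <title> or <desc> elements and capture their contents.
  const pattern = /<(text|title|desc)\b[^>]*>([\s\S]*?)<\/\1>/g;
  const found = [];
  let match;
  while ((match = pattern.exec(svgSource)) !== null) {
    // Drop nested markup such as <tspan> and collapse whitespace.
    const text = match[2].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
    if (text) found.push({ element: match[1], text });
  }
  return found;
}

// An invented map: one shoreline path plus labels.
const mapSvg = `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 100">
  <title>Hypothetical lake map</title>
  <desc>Two towns on a lake shore</desc>
  <path d="M10 80 C 40 10, 65 10, 95 80"/>
  <text x="20" y="30">Northport</text>
  <text x="120" y="60">Lake <tspan font-weight="bold">Example</tspan></text>
</svg>`;

for (const item of extractSvgText(mapSvg)) {
  console.log(`${item.element}: ${item.text}`);
}
```

The path coordinates are ignored entirely; only the labels and metadata survive, which is the part a keyword search could use.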

      • A bit difficult to get at that data in a meaningful way. I suppose marking it up using RDF might help a bit. But Google can buy that data or get at it in another way, Google LBC for one.
  • SVG content makes up just 0.106% of all Web content, by my rough estimation.

    Wow, that guy's rough estimates are up at the permille level. Impressive.

  • The world's most popular search engine company is a leading supporter of open standards.

    Bing uses open standards?! o_O I guess Microsoft has changed....

  • Comment removed based on user account deletion
