
Ask Slashdot: What Happened To Semantic Publishing?

An anonymous reader writes There has always been a demand for semantically enriched content, even long before the digital era. Take a look at the New York Times Index, which has been continuously published since 1913. Nowadays, technology can meet the high demands for "clever" content, and big publishers like the BBC and the NY Times are opening their data and also making good use of it.

In this post, the author argues that Semantic Publishing is the future and talks about articles enriched with relevant facts and infoboxes with related content. Yet his example dates back to 2010, and today arguably every news website suggests related articles and provides links to external sources. This raises several questions: Why is there not much noise on this topic lately? Does this mean that we are already in the future of Online (Semantic) Publishing? Do we have all the tools now (e.g. Linked Data, fast NoSQL/Graph/RDF datastores, etc.) and what remains to be done is simply refinement and evolution? What is the difference in "cleverness" of content from different providers?
  • We have these newspaper boxes in NYC as well, and they hold a lot of local and foreign-language newspapers where people advertise local contractor services, as well as rooms in their not-so-legally modified homes that were meant for one family.

    Stuff that people usually don't advertise through your internet ad agencies.

  • No (Score:5, Funny)

    by bhcompy ( 1877290 ) on Tuesday March 24, 2015 @12:17PM (#49328757)
    I don't want Symantec publishing. Costs too much to renew every year while hogging all my available CPU and RAM
  • The publishers are (slowly) moving from simply copying plain-text, which they used to print (on dead trees), to web-sites, where hyper-linking is possible.

    That's all you need — usually there is no reason to corral the links into a separate "info-box".

    As the print-magazines wane [medialifemagazine.com] and digital ones rise [stateofthemedia.org], this realization will come to the (still) technically-illiterate journalists and even their editors.

    Meanwhile here on Slashdot (and other forums, where links are allowed), there is simply no excuse for

  • Not always clever. (Score:5, Insightful)

    by wcrowe ( 94389 ) on Tuesday March 24, 2015 @12:22PM (#49328827)

    There is a fine line between "clever" and "annoying". Very often, what gets considered "related" content is only tangentially related, and sometimes the way it is displayed makes it indistinguishable from the content of the current article. Add to that all of the surrounding clickbait, and it just becomes a confusing mess.

    • Or it's "related" in some obscure way, but entirely unhelpful. When a journalist writes a science/tech related article, the "infobox" should contain the references consulted. When the journalist is writing about an incident that occurred, I'd like to see transcripts, reports from investigators, etc. that the journalist drew from to write the story.

      More often than not it seems like they make stuff up or attempt to assemble things they don't understand into a narrative that "seems" plausible but may not be su

      • by wcrowe ( 94389 )

        Bingo. That is also a problem. Too often the article raises more questions than it answers.

    • Well, if it was actual semantic content provided as such to an aware browser, then it would decrease annoyance by giving the user more control.

      Unfortunately for the summary, links are not in fact semantic content. You can have more links or fewer, and you haven't done anything with regard to semantic content. What you need is computer-understood metadata, including links, that is separate from the main content, follows standard conventions, and can be used by the client software to give semantic informat

    • Wasn't that a big problem with semantic ads when they first came out?
      People would look at a news article about someone who had been burned to death and get ads for BBQs.
      If it's the same thing, that is.
    • by Anonymous Coward

      That's just to keep you on their site, so they get more ad impressions. It has nothing to do with the linked material being related, although I assume that would help.

      Although more clickbait is also effective, and in that case it doesn't even matter if you read the article, just as long as you keep clicking on links and generate ad impressions.

      On-line journalism isn't even about content. Just about getting people to load a page on your site, and ideally keep them loading more pages on your site.

  • by TuballoyThunder ( 534063 ) on Tuesday March 24, 2015 @12:26PM (#49328867)
    I hate, hate, hate, hate web pages that have hot-linked words with popups. It is even worse when it is an advertisement. And those "recommended articles" at the end are just as bad. Click-bait links to content that is of no value.
    • by gstoddart ( 321705 ) on Tuesday March 24, 2015 @12:39PM (#49329023) Homepage

      Sadly, almost all new "innovations" on the web are almost immediately co-opted by advertising, which more or less renders the technology as crap to be blocked.

      It's all about monetizing, and nothing to do with an improved experience.

      The internet has more or less been ruined by marketing.

      • by ceoyoyo ( 59147 )

        It's our fault. We abhor anything on the Internet that's not free. Where people are in the habit of paying for things, the providers of those things worry about quality.

        • by qpqp ( 1969898 ) on Tuesday March 24, 2015 @02:59PM (#49330489)

          It's our fault.

          It's Eternal September all the way down.

          Where people are in the habit of paying for things, the providers of those things worry about quality.

          Bullshit. The Internet was a fine place before youtube and google and continues to be so now. It just became more convenient, for everyone. Including the parasites.
          Go look at other segments of the Internet: email, ftp, irc, jabber, torrents... dominated by quality-oriented mentality!
          Look at linux (the systemd debacle notwithstanding;) ), BSD, the open source community in general... Sure, a lot is paid for, but even more is driven by enthusiasm first and foremost.

          • Go look at other segments of the Internet: email, ftp, irc, jabber, torrents... dominated by quality-oriented mentality!

            Technically email has become dominated by spam, but other than that......

          • by ceoyoyo ( 59147 )

            I'm not sure I really follow your argument, but the open source community seems like a reasonable example. Linux is paid for - big companies sink billions of actual dollars into it, and contributors put in even more value in time. Quality, in the things that are important to the people contributing to it, is high. Quality in the things that are not important to contributors, but are important to many of the people who do not contribute? Not so high.

            Quality is also high in ad encrusted click bait sites -

            • by qpqp ( 1969898 )

              I'm not sure I really follow your argument

              Well the other services (except for email, obviously) are largely run by volunteers and don't even have ads (spam notwithstanding).

              Quality in the things that are not important to contributors, but are important to many of the people who do not contribute? Not so high.

              Now I'm not sure that I follow. Sure, there's lots of stuff that lacks the polish of countless missing man-hours, but we've all come a really long way since the 80s/90s. I'm sure we'll get there if we don't fuck up before that.
              I've also seen lots of examples of features that were unimportant to the contributors, but since there was an itch to scratch e.g. in getting recognition

        • It's our fault. We abhor anything on the Internet that's not free.

          Think about how much of the free internet you would be unwilling to pay for. Now imagine how much your life would be improved if all that were gone.

          Most of the internet is now just click bait, and would only be improved by removal.

          • by ceoyoyo ( 59147 )

            I agree. Now turn it around. Think of all the things on the Internet you WOULD miss if they were gone. Now think of how many of them you would be willing to pay for. Think of the number of times you've seen the term "paywall" used on Slashdot.

            • Think of all the things on the Internet you WOULD miss if they were gone. Now think of how many of them you would be willing to pay for.

              Most of them, actually (and I have, from time to time). I think a lot of people would be willing to, when you consider that the average family pays $90 for cable (not including internet).

              The primary difficulty would be finding out about new interesting things that you might be willing to pay for if you knew about them.

    • If it was really semantic content, then your client (browser) could walk the graph of related (advertised) documents from those links and provide all sorts of information. For the advertising to be semantic, it would need to be wrapped in some sort of standard API or descriptive (semantic) access method that flagged it as advertising. You could then, in a good client, turn off all the advertising links, and even substitute dictionary entries with the same keyword.

      Semantic access is exactly that; providing t

    • As far as I can tell from the article linked to, it means "auto-generated content." For example, a page that shows all the scores in the college orange-hoop-ball finals might be auto-updated when a team gets a score.

      It should be obvious that auto-generated content can't replace human-generated content (unless we invent AI), because humans want to see new things that lead to deeper understanding. It should be obvious but "you won't believe what happens next when Selena auto-generated this tweet!" kin
    • by Megane ( 129182 )
      Usually those ad links are inserted by a script after the page loads. If you can find out which script is doing that, it's not hard to tell Adblock Plus to block it. Stuff like that gets a whole-domain block from me because the domain usually belongs to a company that does nothing but web ads, so nothing of value is lost.
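A comment earlier in this thread suggests that advertising should be semantically flagged so a good client can simply switch those links off. A minimal Python sketch of that client-side step, using only the standard `html.parser`, is below. It rests on an assumption: the flag is taken to be a `sponsored` token in the link's `rel` attribute, and `LinkCollector` and `split_links` are hypothetical names, not part of any browser or ad blocker.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Separates ordinary links from links flagged as advertising.

    Assumption: ads carry a "sponsored" token in the <a> tag's rel attribute.
    """

    def __init__(self, ad_flag: str = "sponsored") -> None:
        super().__init__()
        self.ad_flag = ad_flag
        self.content_links: list[str] = []
        self.ad_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel is a space-separated token list, e.g. rel="nofollow sponsored".
        rel_tokens = (attr_map.get("rel") or "").split()
        if self.ad_flag in rel_tokens:
            self.ad_links.append(href)
        else:
            self.content_links.append(href)


def split_links(html: str) -> tuple[list[str], list[str]]:
    """Return (content_links, ad_links); a client could render only the first."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.content_links, parser.ad_links


page = (
    '<p>See <a href="/house">house</a> and '
    '<a rel="nofollow sponsored" href="https://ads.example/bbq">BBQs</a>.</p>'
)
content, ads = split_links(page)
print(content)  # ['/house']
print(ads)      # ['https://ads.example/bbq']
```

The point of the sketch is that the filtering is trivial once the flag exists; the hard part, as the thread notes, is getting publishers to emit it honestly.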
  • People don't want "clever". They want "shiny".

    And if it means web pages where every other word is a hyperlink of dubious value, then I'm afraid "semantic publishing" is a buzzword for "annoying and intrusive".

    Some of us still prefer to read a single, coherent article by someone who can write in English. You want to put foot notes at the bottom, go ahead.

    But, please, don't give me the blinking and whirling semantic web whereby every move of the mouse updates your ADHD-laden site.

    • by qpqp ( 1969898 )

      But, please, don't give me a blinking and whirling semantic web whereby every move of the mouse updates your ADHD-laden site.

      FTFY. The semantic web is a vision that has little to do with what you described:

      According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries".[2] The term was coined by Tim Berners-Lee for a web of data that can be processed by machines.[3] While its critics have questioned its feasibility, proponents argue that applications in industry, biology and human sciences research have already proven the validity of the original concept.[4]

      (From the related Wikipedia [wikipedia.org] article.)

      • If the Semantic Web is so wonderful, then why did it fizzle out?
        • by qpqp ( 1969898 )

          why did it fizzle out?

          I think it's too early to say that it did. Scholar [google.com] has 10.5k hits for articles from this year alone...

        • I think it mainly didn't catch on because it meant that you had to manually add a lot of markup to make your site (machine-readably) "semantic". Nobody was willing to make that effort.

          Things have changed now that web sites are usually generated, having a separation of HTML templates and database/structured content. This makes it easier to make the structure you have in your backend available to the browser, e.g. using schema.org annotations or others. IMDB has metadata using the Open Graph Protocol (http://
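The comment above, about generated sites exposing their backend structure to the browser through schema.org or Open Graph annotations, can be illustrated with a short Python sketch. It assumes a hypothetical backend record (`title`, `year`) and hypothetical helpers `render_head` and `MetadataExtractor`. The `og:` meta properties and the `application/ld+json` script type follow the conventions those vocabularies use, but this is not any real site's markup.

```python
import html
import json
from html.parser import HTMLParser


def render_head(record: dict) -> str:
    """Turn a (hypothetical) backend record into Open Graph + JSON-LD markup."""
    ld = {
        "@context": "https://schema.org",
        "@type": "Movie",
        "name": record["title"],
        "datePublished": record["year"],
    }
    title = html.escape(record["title"])
    return (
        f'<meta property="og:title" content="{title}">'
        '<meta property="og:type" content="video.movie">'
        f'<script type="application/ld+json">{json.dumps(ld)}</script>'
    )


class MetadataExtractor(HTMLParser):
    """Collects og:* meta properties and parsed JSON-LD blocks from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.open_graph: dict[str, str] = {}
        self.json_ld: list[dict] = []
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta":
            prop = attr_map.get("property") or ""
            content = attr_map.get("content")
            if prop.startswith("og:") and content is not None:
                self.open_graph[prop] = content
        elif tag == "script" and attr_map.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive here as raw text, possibly in several chunks.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.json_ld.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


extractor = MetadataExtractor()
extractor.feed(render_head({"title": "Example Film", "year": "2015"}))
extractor.close()
print(extractor.open_graph)  # {'og:title': 'Example Film', 'og:type': 'video.movie'}
print(extractor.json_ld[0]["@type"])  # Movie
```

Because the markup is generated from structured data rather than typed by hand, the "nobody was willing to make that effort" objection mostly disappears for sites that already have a database behind them.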

  • I think they're just anti-semantic. Publishers probably think they have a superior knowledge base.

  • I think there are two reasons why RDF(S)/OWL-annotated web pages never really gained traction. First, it's hard work if you have to do it manually, though most content management systems now offer some kind of keyword-tagging feature. The second reason, IMO, is that the current Big Data and Machine Learning techniques (and more computing power / persistence media / bandwidth than 15 years ago, when the whole RDF/OWL thing took off) trump the whole categorization and knowledge extraction /
    • by qpqp ( 1969898 )

      [...] the current Big Data and Machine Learning techniques [...] trump the whole categorization and knowledge extraction / data mining process [...]

      Could you please explain how a statistical approximation can trump an exact model? I think that big data & co. is a step in the right direction with the means that we currently have available, and that we'll get there eventually. There are too many benefits that would result from doing it properly to neglect the required effort.

      • I don't think it's that computers and machine learning really trump an exact model. It's more that manually curated semantic information is difficult to do well and, even when done well, is simply the curator's interpretation of the key points. Ontologies and controlled vocabularies (necessary to make semantic solutions work) are always biased towards their creators' view of the world. Orthogonal interpretations rarely fit with the ontologies and require mapping between knowledge systems. Rather than simplifying

        • by qpqp ( 1969898 )
          I agree that the tools are currently insufficient (though quite powerful, e.g. Protégé), but I also believe that it's quite possible to achieve a high level of accuracy by combining better tools, dividing the problem space and working on killer features that require this higher level of abstraction.
          Ideally, people (at first for industrial applications) would recognize the need for a proper machine-readable representation of the different states of a specific environment, so that eventually the different o
      • To clarify, I don't think statistics (ML) would give a better model than an 'exact' (manual) model. I was speaking more in the sense of a 'good enough' system that is also scalable.
        • by qpqp ( 1969898 )
          I admit that I took that quote a bit out of context. I apologize.
          But as mentioned above [slashdot.org], I think we just lack a killer feature. And people do use semantically enriched data (also in addition to ML), mostly research, but some do actual work.
          • No you didn't :) It was a valid argument.
            However this semantic enhancement requires a couple of things: the model (ontology) must be defined by consensus. A model is by definition an incorrect representation of reality. Hence even with a manually crafted model ontology, it still won't be 'exact'. If you apply this on big medical ontologies, you're really in trouble, as they may have hundreds of thousands of concepts. So this is the ontology part. Next you have the actual semantic annotation part of the doc
            • by qpqp ( 1969898 )

              There will always be some outliers/exceptions, but it should be possible to define the rules and vocabulary of a given system with sufficient specificity, possibly by breaking it down further into facets/perspectives and then mapping the relations and constraints.
              So then you could have many ontologies, which will gradually converge over time. I'm talking long-term, of course. The annotation part could also require consensus, or vetting, by multiple recognized entities. All in all, the result would still be m

      • I am basically in agreement with rockmuelle. But to put (what I think is) his argument slightly differently, there is no such thing as an exact model, because the categories that you would want to mark in a model are inherently fuzzy. Library catalogers knew this decades (a century?) ago; they were trying to create a model, embodied in their card catalogs, of the information in books. But the inter-cataloger agreement was (from my observations) far from exact. A century later, and it's no different--and

        • by qpqp ( 1969898 )

          And if I've misrepresented rockmuelle, or misunderstood your question, qpqp, it's because I don't have an exact model of what you're saying.

          Come now, don't blame everything on me!

          What I meant by exact model is of course a predictable, and in a sense deterministic process; inasmuch as that is possible for the given case.
          Even with machine learning you create a representation of the surveyed system, but this model will (currently, and in most cases) always be an approximation.
          By mapping concepts, their (often ambiguous) meanings, usage scenarios and other relations from different areas to each other, supported by these approximations, it should

          • Well, Machine Learning doesn't exclude the use of Semantic Tools like Ontologies. You can still use them as a gazetteer for your ML indexing process, for inference over the ontology hierarchy, etc. The two aren't really mutually exclusive. However, I do think the idea of everyone annotating their webpages semantically is never going to take off. The closest thing we have successfully achieved on the interwebz in that sense is Wikipedia.
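The point above about using ontologies for inference over the hierarchy alongside ML can be made concrete with a toy subclass hierarchy. This Python sketch assumes a hand-written dictionary standing in for `rdfs:subClassOf` links (the concept names and the helpers `ancestors`, `infer_tags`, and `narrower` are invented for illustration). It only computes the transitive closure of the hierarchy; it is not an OWL reasoner.

```python
# Toy ontology: each concept maps to its direct parents, standing in for
# rdfs:subClassOf links. The concept names are invented examples.
SUBCLASS_OF: dict[str, list[str]] = {
    "Carcinoma": ["Cancer"],
    "Melanoma": ["Cancer"],
    "Cancer": ["Disease"],
    "Influenza": ["Disease"],
}


def ancestors(concept: str, subclass_of: dict[str, list[str]]) -> set[str]:
    """Every concept reachable by following subclass links upward."""
    seen: set[str] = set()
    stack = list(subclass_of.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # the seen-set also guards against cycles
            seen.add(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen


def infer_tags(tags: set[str], subclass_of: dict[str, list[str]]) -> set[str]:
    """Expand a document's tags (e.g. from an ML tagger) with broader concepts."""
    inferred = set(tags)
    for tag in tags:
        inferred |= ancestors(tag, subclass_of)
    return inferred


def narrower(concept: str, subclass_of: dict[str, list[str]]) -> set[str]:
    """All direct or indirect subclasses, e.g. to expand a search term."""
    return {c for c in subclass_of if concept in ancestors(c, subclass_of)}


print(sorted(infer_tags({"Melanoma"}, SUBCLASS_OF)))  # ['Cancer', 'Disease', 'Melanoma']
print(sorted(narrower("Cancer", SUBCLASS_OF)))  # ['Carcinoma', 'Melanoma']
```

In this division of labour, a statistical tagger only has to spot "Melanoma"; the hand-built hierarchy supplies "Cancer" and "Disease" for free, which is one way the two approaches complement each other.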
  • I remember a few years back attending a conference presentation from some university types trying to convince my company the future of the web was semantic and RDF. I found it hard to take seriously because a) RDF really sucks to read or write, b) it's a pain in the ass to imbue content with semantic information, c) it's largely irrelevant since web engines do a better job anyway.

    If someone produced an uber-simple semantic language - just plain text - that could be tossed into a page or link and utilised w

    • by mugnyte ( 203225 )

      Better yet, if a semantic derivative of any web page is built by these powerful web crawlers, building a channel for pushing a link to it back to the original web site would mean each crawler wouldn't need to start from scratch. Instead they could annotate and extend the semantic information and serve it from multiple locations, while the original site stayed largely out of the process, save for serving the link(s) or being amenable to a filtering proxy that decorates pages with the links.

      Reduced down, there would b

  • 1) Computer software cannot create clever hotlinks; it takes a very clever human to do it (not just a good writer). Paying someone to do this is expensive, but a computer CAN put a picture of side-boob and a clickbait headline on anything. Guess what we end up having...

    2) Hotlinks for things you don't want to read about are annoying and make it harder to read.

    3) People and computers can, however, easily link dictionary definitions, which a) the intended target of an article finds extremely annoying (s

    • "allow non-specialists to read specialized works (such as scientific papers and legal documents). But the specialist/intended target are the major market so this is rare." Being part of that specialist target myself, I'm afraid you're right. Google has a special search database for people like me, scholar.google.com; but they're constantly making it harder to find. It used to appear as a link at the top of a google search page, then it was relegated to a drop-down, now it's not even there any more. Gues

  • Spam, SEO, etc. People lie in meta data. Semantic publishing was clearly doomed when the meta keywords tag turned into a big spam pit.

  • by Altrag ( 195300 ) on Tuesday March 24, 2015 @02:15PM (#49330011)

    The trouble is that this is both boring (for a person) and hard (for a computer.)

    So nobody wants to do it manually, and while everybody's got an algorithm to mark up text, they're all terrible and prone to being gamed by unscrupulous advertisers.

    How many websites have you gone to and seen some random word in the middle of the text that's bolded, double-underlined, in a larger font and a completely different color to really draw your eye to it (and away from what you're actually there to read, i.e. be as annoying as fucking possible), and then you hover over it and discover it's a Wikipedia link to a house [wikipedia.org] or something equally pointless?

    This has been the problem with "the semantic X" ever since link farms were invented. They usually don't provide a whole lot of additional information (if any) and they distract from what you're trying to see.

    If you really want a semantic experience, go to basically any popular wiki. They're explicitly curated and therefore the links you find are (usually) actually both informative and relevant. Of course they do this by going the boring (manual) route and compensating for it by having a million people doing the job instead of just a handful.

    Go back and read that "mundane" Wikipedia article about the house and, if you have even the slightest amount of curiosity about anything, you can probably spend several hours link chaining. There are links to construction, history, archaeology, anthropology, etc., and they're all placed so that they're relevant to the article yet subtle enough that you can read over the ones you aren't interested in without a significant drain on attention.

  • Because doing it right is not automatable and therefore expensive. Really, really expensive. I worked for a company that effectively did nothing but take FDA data from package inserts and recode it into machine form using industry-standard codes, taxonomies, etc. Even with the slow pace of FDA approvals and insert updates, it took a team of about a dozen clinicians, another dozen bio-informaticists, another couple dozen (relatively specialized - do you know what an ALP test is and what it's used for?) code

  • for the cost of doing it right; and to whatever degree you backed off doing it right you'd end up missing the point.

    The big win of text based matching is that nobody has to prepare to be indexed in a search engine, search engine optimization notwithstanding. The big loss is that you get false matches due to polysemy (words that have more than one meaning) and false misses due to synonymous words whose equivalence the search engine doesn't know about.

    If you go to something like RDF in which concepts have
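The polysemy/synonymy trade-off described in the comment above can be shown with a few toy documents. In this Python sketch the documents and the `ex:` concept identifiers are invented for illustration (in RDF they would be URIs). A keyword search for "jaguar" meant as the animal also returns the car article, and a search for "lawyer" misses the "attorney" article; matching on concept IDs avoids both, at the price of someone having assigned the IDs by hand.

```python
import re

# Invented documents: (text, hand-assigned concept IDs). In RDF the IDs would be URIs.
DOCS: dict[str, tuple[str, set[str]]] = {
    "d1": ("The jaguar prowled the rainforest at night.", {"ex:JaguarAnimal"}),
    "d2": ("Jaguar unveiled a new sports car model.", {"ex:JaguarCarMaker"}),
    "d3": ("The attorney filed the motion with the court.", {"ex:Lawyer"}),
    "d4": ("The lawyer advised her client to settle.", {"ex:Lawyer"}),
}


def words(text: str) -> set[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(term: str) -> list[str]:
    """Plain text matching: no preparation needed, but blind to meaning."""
    return sorted(d for d, (text, _) in DOCS.items() if term.lower() in words(text))


def concept_search(concept: str) -> list[str]:
    """Matching on concept IDs: precise, but someone had to assign them."""
    return sorted(d for d, (_, concepts) in DOCS.items() if concept in concepts)


print(keyword_search("jaguar"))           # ['d1', 'd2']  polysemy: animal and car
print(keyword_search("lawyer"))           # ['d4']        synonymy: misses "attorney"
print(concept_search("ex:JaguarAnimal"))  # ['d1']
print(concept_search("ex:Lawyer"))        # ['d3', 'd4']
```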

  • I'm beginning to see why Slashdot is famous for not reading articles. The articles are often poor. This article isn't the clickbait regularly posted by certain submitters. Instead it reads like a writing assignment.

    "The Dynamic Semantic Publishing (DSP) architecture of the BBC curates and publishes content (e.g. articles or images) based on embedded Linked Data identifiers, ontologies and associated inference." This is one of those sentences that makes sense only to those who already know everything about i
  • I think part of the problem is defining what the "Semantic Web" or "Semantic Publishing" is. For me, it is being able to navigate information based on semantic content. For example, applied to web search, I'd expect the search engine to be able to present me with the topics present in my search results and allow me to re-rank/refine those results based on the presence of topics. If I search for cancer, I would expect the search engine to identify the topics within my search results (let's say: diagnosis, tre
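The topic-based navigation described in the last comment (show the topics in a result set, then refine or re-rank by them) can be sketched in Python. It assumes each search result already carries a set of topic labels; the results and topics below are invented examples, and `topic_facets`, `refine`, and `rerank` are hypothetical helpers, not any search engine's API.

```python
from collections import Counter

# Invented results for a search on "cancer", each already tagged with topics.
RESULTS = [
    {"title": "Early signs and screening", "topics": {"diagnosis"}},
    {"title": "Chemotherapy side effects", "topics": {"treatment"}},
    {"title": "Biopsy results explained", "topics": {"diagnosis", "treatment"}},
    {"title": "Living with a diagnosis", "topics": {"diagnosis", "support"}},
]


def topic_facets(results):
    """Topics present in the results, with how many results mention each."""
    return Counter(t for r in results for t in r["topics"]).most_common()


def refine(results, required):
    """Keep only results tagged with every required topic."""
    return [r for r in results if required <= r["topics"]]


def rerank(results, preferred):
    """Move results sharing more preferred topics up; ties keep their order."""
    return sorted(results, key=lambda r: len(preferred & r["topics"]), reverse=True)


print(topic_facets(RESULTS))  # [('diagnosis', 3), ('treatment', 2), ('support', 1)]
print([r["title"] for r in refine(RESULTS, {"treatment"})])
print([r["title"] for r in rerank(RESULTS, {"treatment"})])
```

The topic labels would have to come from some annotation or classification step, which is exactly the expensive part debated above and is not shown here.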
