Free Software Activists Take On Google Search 254

Posted by Soulskill
from the just-hope-the-spammers-don't-download-it dept.
alphadogg writes "Free software activists have released a peer-to-peer search engine to take on Google, Yahoo, Bing and others. The free, distributed search engine, YaCy, takes a new approach to search. Rather than using a central server, its search results come from a network of independent 'peers,' users who have downloaded the YaCy software. The aim is that no single entity gets to decide what gets listed, or in which order results appear. 'Most of what we do on the Internet involves search. It's the vital link between us and the information we're looking for. For such an essential function, we cannot rely on a few large companies and compromise our privacy in the process,' said Michael Christen, YaCy's project leader."
  • Well (Score:4, Insightful)

    by Anonymous Coward on Monday November 28, 2011 @06:07PM (#38196012)

    Result: Search results will be controlled by botnets

    • Re:Well (Score:5, Insightful)

      by Intron (870560) on Monday November 28, 2011 @06:10PM (#38196042)

      Result: Search results will be controlled by botnets

      Yes. What's to stop me from downloading the code, modifying it to put my results on top and then joining my 1000 or so servers to the pool? You only need a small advantage to get big differences in results -- the difference between 10th and 11th place is page one vs obscurity.

      • Re:Well (Score:5, Informative)

        by HFShadow (530449) on Monday November 28, 2011 @06:17PM (#38196114)

        Distributed computing solved this a long time ago: you simply get more than one worker to check the results, and if anything looks fishy you chuck away everything from that worker.

        Not that this makes this any better of an idea.
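That majority-check idea can be sketched in a few lines. This is a generic redundancy scheme, not YaCy's actual protocol; all names here are made up:

```python
from collections import Counter

def verify_by_redundancy(task, workers, quorum=3):
    """Give the same task to several independent workers and accept a
    result only if a strict majority of them agree on it."""
    results = [w(task) for w in workers[:quorum]]
    winner, votes = Counter(results).most_common(1)[0]
    return winner if votes > quorum // 2 else None

honest = lambda task: task * 2   # a well-behaved peer
cheat = lambda task: 999         # a peer pushing its own result

print(verify_by_redundancy(21, [honest, cheat, honest]))  # prints 42
```

Note the weakness the replies point out, though: if colluding cheats outnumber the honest checkers on a given task, the vote goes their way.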

        • by blair1q (305137)

          Or you could get each search server to solve a small np-hard problem in real-time before serving its results.

          You could call it "shitcoinfo" or "botsnot" or "captchayerknows" or "altacocker" or something.
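One hedge: real proof-of-work schemes use hash puzzles (costly to solve, trivial to verify) rather than literally NP-hard problems, but the effect is the same. A rough hashcash-style sketch, with all names hypothetical:

```python
import hashlib

def check_pow(payload: bytes, nonce: int, difficulty: int = 12) -> bool:
    """Verification is a single hash: accept if sha256(payload + nonce)
    starts with `difficulty` zero bits."""
    digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty))

def solve_pow(payload: bytes, difficulty: int = 12) -> int:
    """Solving requires brute force: about 2**difficulty hash attempts."""
    nonce = 0
    while not check_pow(payload, nonce, difficulty):
        nonce += 1
    return nonce

nonce = solve_pow(b"query:slashdot")
assert check_pow(b"query:slashdot", nonce)   # querier verifies in one hash
```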

          • by Issarlk (1429361)
            How is it a problem for a spammer to use his (stolen) node to solve problems? He's not paying the electricity bill.
        • by Urza9814 (883915)

          ...And if 10% of your workers are all part of the same botnet deliberately trying to skew the results, then there's about a 10% chance that the person re-checking the results will be giving you the same "error".

      • Re:Well (Score:5, Insightful)

        by Hazel Bergeron (2015538) on Monday November 28, 2011 @06:18PM (#38196116) Journal

        The great thing about centralised search engines is that they're not gamed... oh wait...

        ...is that it isn't in the provider's interest to encourage spam domains full of adverts brokered by itself... oh wait...

        ...is that there's careful control over dissemination of information so privacy is not compromised... oh wait...

        A p2p search engine will have different problems. But in the limit perhaps it'll be like a load of Google or whatever servers sitting around the Internet instead of in one or two datacentres.

        • Re: (Score:3, Interesting)

          by Anonymous Coward

          At least it actually is in the interest of search providers like Google, Yahoo and Microsoft to produce useful results in order to achieve / maintain a large userbase.
          Not so much in the interest of somebody who simply sees a distributed search engine as his chance to drive views to his blog / ad collection / malware site.

        • Re:Well (Score:5, Insightful)

          by blackraven14250 (902843) on Monday November 28, 2011 @07:18PM (#38196762)
          If it were in Google's interest to bump spam domains to the top, it wouldn't be the useful search engine with leading market share that it is today, as it would have already bumped said results.
        • But in the limit perhaps it'll be like a load of Google or whatever servers sitting around the Internet instead of in one or two datacentres.

          But Google has a lot of servers around the internet, not just in one or two datacenters, so basically your pie-in-the-sky best-case scenario for this alternative is that it might, if everything goes well, end up being just like Google.

          Which is great, but if I want something just like Google, I can, you know, just use Google.

          • Assuming you regard Google as the best possible search engine with no room for improvement.

            As was made clear at the end of the 19th century, anything that could possibly be invented already has been, so we don't need to bother trying any more.

    • by xTantrum (919048) on Monday November 28, 2011 @06:27PM (#38196216)
      ...and start coding my ideas. First itunes, then fb and now p2p search. Just goes to show ideas are a dime a dozen; it's just about who implements them first. Can't wait to see how this turns out, though. P2P is really how the internet should be structured as much as possible.
    • by kheldan (1460303)
      Another result: People who don't have unlimited bandwidth per month will use all theirs up supporting other people's searches.
    • Re:Well (Score:5, Insightful)

      by alexgieg (948359) <alexgieg@gmail.com> on Monday November 28, 2011 @07:07PM (#38196640) Homepage

      This system probably solves spam the same way Freenet managed to eliminate it from its boards: by adopting a(n anonymous) Web Of Trust model. In practice, you'll only see results coming from those you trust directly or indirectly. The fake results will be there, but buried.

      And even if they currently don't do that due to the smallness of the network, at some point they will. It's unavoidable.

      Although the problem then might become you only seeing what you like because your friends/trusted nodes all think more or less the same, hence basically shielding yourself from different views. But then, mainstream search engines already do something like this, so it won't be that different from what we already have.
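A minimal sketch of how trust might propagate in such a web. This is purely illustrative; Freenet's and YaCy's actual mechanisms differ, and every name below is invented:

```python
def trust_scores(graph, decay=0.5, depth=3):
    """Propagate trust outward from 'me': a peer trusted by one of my
    trusted peers inherits a fraction of that trust, decaying with distance."""
    scores = {"me": 1.0}
    frontier = {"me"}
    for _ in range(depth):
        nxt = set()
        for node in frontier:
            for peer in graph.get(node, []):
                inherited = scores[node] * decay
                if inherited > scores.get(peer, 0.0):
                    scores[peer] = inherited
                    nxt.add(peer)
        frontier = nxt
    return scores

graph = {"me": ["alice"], "alice": ["bob"], "bob": ["mallory"]}
print(trust_scores(graph))
# {'me': 1.0, 'alice': 0.5, 'bob': 0.25, 'mallory': 0.125}
```

Results from low-score peers aren't deleted, just ranked far down, which is the "there, but buried" behavior described above.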

      • Re:Well (Score:5, Insightful)

        by M. Baranczak (726671) on Monday November 28, 2011 @08:19PM (#38197298)

        Freenet solves the spam problem by ensuring that nobody actually uses Freenet. I think this project will apply the same solution.

        This scheme has pretty slim chances of success. Which doesn't necessarily mean it shouldn't be attempted.

    • Result: Search results will be controlled by botnets

      Nope, search results will be controlled by geeks. Result? 15K hits on Pikachu cosplay girl searches, zero on Project Runway.

  • Question (Score:4, Insightful)

    by StripedCow (776465) on Monday November 28, 2011 @06:10PM (#38196044)

    Will one client be able to view the queries of its peers?

    If yes, how is that an improvement?
    If no, how does it work?

    • Re:Question (Score:4, Interesting)

      by CanHasDIY (1672858) on Monday November 28, 2011 @06:14PM (#38196084) Homepage Journal

      Will one client be able to view the queries of its peers?

      If yes, how is that an improvement? If no, how does it work?

      From TFA: [yacy.net]

      It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

      However, that seems to be all the information there is on the process... doesn't quite assuage the ol' paranoia circuits, does it?

      • From TFA: [yacy.net]

        It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

        Provided no one modifies the open source code to log user search requests and censor queries

        • by ackthpt (218170)

          From TFA: [yacy.net]

          It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

          Provided no one modifies the open source code to log user search requests and censor queries

          I'd be more concerned with some people stacking search results with links to spoof sites or malware servers.

          Is this proofed against someone reverse engineering it and crap-flooding the results?

      • From TFA: [yacy.net]

        It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

        However, that seems to be all the information there is on the process... doesn't quite assuage the ol' paranoia circuits, does it?

        The network stores everything.

      • From TFA: [yacy.net]

        It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

        And we all know that no one will ever modify their portion of the decentralized system to do any of these awful things....

        • by adolf (21054)

          Because, you know, I'm sure that YaCy is totally and absolutely 100% efficient about things. Every peer obviously has a list of URLs that it is responsible for, and every peer is capable of censoring anything on its list, and there will never be more than 1 copy of any shred of data.[/sarcasm]

          Except it doesn't really work that way: since nobody is in charge, nobody can dictate who will index what. You can censor the data on your own node and you'll certainly be successful (it's your computer, after all

  • Great (Score:5, Funny)

    by Moheeheeko (1682914) on Monday November 28, 2011 @06:12PM (#38196062)
    Only used by neckbeards = all search results will be tentacle hentai and open source software websites.

    Awesome...

  • by alphatel (1450715) * on Monday November 28, 2011 @06:12PM (#38196064)
    It's hard to argue with "free" and "freedom", so I give it the thumbs up. But in this day and age it feels like going from a Ducati Panigale to a 1950's Triumph Bonneville.
    • It's hard to argue with "free" and "freedom", so I give it the thumbs up. But in this day and age it feels like going from a Ducati Panigale to a 1950's Triumph Bonneville.

      Lots of people said basically the same thing back when the linux kernel was still numbered 0.9x.

    • by kiwimate (458274)

      It's hard to argue with "free" and "freedom"

      I may differ from many readers in this opinion, but I happen to think it's very easy to argue with "free" and "freedom" if by doggedly sticking with dogmatic principles you end up taking giant leaps backward.

  • Ummm (Score:5, Insightful)

    by Webs 101 (798265) on Monday November 28, 2011 @06:13PM (#38196070) Homepage
    Yahoo's search engine IS Bing.
  • by DMFNR (1986182) on Monday November 28, 2011 @06:19PM (#38196126)
    Of course they decide to give it a name that doesn't even look like a word. I can't think of a single popular search engine that doesn't have a catchy name. How do these free software developers expect the word to get around about their software when nobody can pronounce it and probably won't even remember what it was called? Especially a peer-to-peer search engine, which I would imagine depends even more on a decent number of people actually using it than a regular search engine does.
    • by nurb432 (527695)

      Because most names are taken and they don't have a legal team to do research.

      • How about SLING! It is better than BING.
      • Why would you need a legal team? USPTO offers an online search for trademarks.

        • by adolf (21054)

          Trademarks don't need to be registered with the USPTO in order to be enforceable and actionable.

          A mark need only be used in trade and -- zing! -- it's a trademark. Registering just makes it easier if/when things get ugly enough that a court gets involved, and makes it easier for others to avoid infringement in the first place.

          This aspect of a trademark is a lot closer to copyright than it is to patents. Unlike patents, neither copyrights nor trademarks must be registered with a central body, although both

    • by markdavis (642305) on Monday November 28, 2011 @06:26PM (#38196204)

      +1 Mod parent up.

      Seems the geeky crowd still doesn't understand that marketing DOES play a critical role in the popularity of any type of project. "YaCy" really does suck: it is impossible to say, isn't a word, introduces strange capitalization, and isn't even easy to remember.

      • by adolf (21054) <flodadolf@gmail.com> on Monday November 28, 2011 @06:37PM (#38196318) Journal

        Seems the geeky crowd still doesn't understand that marketing DOES play a critical role in the popularity of any type of project. "YaCy" really does suck: it is impossible to say, isn't a word, introduces strange capitalization, and isn't even easy to remember.

        So fork it, changing only the name, and release it yourself under a more marketable moniker. The technical aspects of doing this are easy.

        And if you think selecting a catchy, unencumbered name is also easy, then you really shouldn't have any problem pulling it off.

        It's all GPL, so you can pretty much do what you want with it. If you really want to be in charge of marketing and distribution for a GPL project, the only thing stopping you is you.

      • by Meski (774546)
        Yay-cee was how I was saying it to myself. Rhymes with racy. Just ignore the mid-word capitalisation, it'll go away when the project is properly capitalised. :^^)
      • by bryan1945 (301828)

        I'd go with "Yucky." Self-deprecating. (What, like Yahoo! or Bing are awesome?) Or maybe Yoggyso, if they can get away with it. (What, like Google makes sense?)

    • Yahtzee (Score:5, Funny)

      by pavon (30274) on Monday November 28, 2011 @06:31PM (#38196248)

      I assumed it was intended to be pronounced like Yahtzee, which is both memorable and quite descriptive of the quality of results you can expect.

    • by raftpeople (844215) on Monday November 28, 2011 @06:55PM (#38196536)
      Other names they considered that were equally bad:
      1) FreEble
      2) !!_//[%%%
      3) Bing
      4) xkCQQT
    • by Anonymous Coward on Monday November 28, 2011 @08:46PM (#38197582)

      GIMP is another example. Great free graphics program, terrible name.

  • by 91degrees (207121) on Monday November 28, 2011 @06:20PM (#38196134) Journal
    While these things can succeed on the backs of some philanthropic individuals, it's just human nature that to get a decent community, you need to benefit the supporters in some way.

    Doesn't need to be any formal system. Free software, for example, seems to be based more on the honour system than anything else, but people do develop free software because there's something in it for them - software tailored to their needs. What is the incentive for being a search peer?
    • by TheRaven64 (641858) on Monday November 28, 2011 @07:20PM (#38196792) Journal
      I sketched out a few designs for a decentralised search engine (but didn't implement them, so kudos to these guys for actually bothering), and one of the ideas I had was to allow nodes to return sponsored links (e.g. Amazon referrals). The client would display these for the top few nodes and track the reputations of individual peers. The more users who liked the search results that you returned, the more of them would see your sponsored links. If you came up with a ranking algorithm that did a better job than existing ones, then you'd get a bigger slice of the advertising space. It's essentially the same business model as Google, just on a smaller scale.
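A back-of-the-envelope sketch of that reputation-weighted sponsorship model. Every name and number below is invented for illustration; this is nobody's shipping design:

```python
import random

class PeerReputation:
    """Track how often users accept each peer's results, and hand out
    sponsored-link exposure in proportion to that reputation."""
    def __init__(self):
        self.good = {}    # peer -> count of liked result sets
        self.total = {}   # peer -> count of all result sets served

    def record(self, peer, liked):
        self.total[peer] = self.total.get(peer, 0) + 1
        if liked:
            self.good[peer] = self.good.get(peer, 0) + 1

    def score(self, peer):
        # Laplace smoothing: unknown peers start at 0.5, not 0 or 1
        return (self.good.get(peer, 0) + 1) / (self.total.get(peer, 0) + 2)

    def pick_sponsor(self, peers):
        # Better-liked peers win the sponsored slot proportionally more often
        return random.choices(peers, weights=[self.score(p) for p in peers])[0]
```

The interesting property is the incentive: a peer's ad exposure only grows if users keep liking its organic results, so spamming the organic results directly erodes the spammer's ad revenue.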
  • Java... (Score:5, Interesting)

    by HBI (604924) <kparadine&gmail,com> on Monday November 28, 2011 @06:22PM (#38196160) Homepage Journal

    I was going to load up a peer, but there's no way I'm running Java. I've almost completely excised it from all of my computers; no going back.

    • Re:Java... (Score:5, Interesting)

      by vadim_t (324782) on Monday November 28, 2011 @06:56PM (#38196544) Homepage

      Ugh, yeah. Another cool project is going to be held back by Java.

      Way back, this happened with Freenet. I thought it was a cool idea, but the darn thing wasn't happy with all the 256MB I could give it. Even now, Java is still a considerable load on laptops with 4GB RAM.

      I think that for best adoption they should have concentrated on making it small and light. If it can be run in say, 64MB RAM then you can install it anywhere. And it's quite likely that a good part of why Freenet was so horrible when I tried it, is because it made a lot of the machines it ran on swap like crazy.

      • Re:Java... (Score:4, Interesting)

        by Lazy Jones (8403) on Monday November 28, 2011 @09:31PM (#38197972) Homepage Journal

        cool project is going to be held back by Java.

        You know, I'll take "cool projects held back by Java" any time over equally cool projects written in C that need to be patched 5 times a year for the next 10 years because of sloppy programming leading to arbitrary remote code execution vulnerabilities. Please, just let software written in C die with dignity, the language had its decades of glory before everything was accessible over the 'net ...

    • Re: (Score:3, Funny)

      That's OK, please join me in my efforts in porting this over to Flash.
    • by devent (1627873)

      That is really stupid of you.
      Let's look at the facts:
      Firefox with a few addons and 9 tabs: 180MB RAM. Eclipse with a lot of projects open: 200MB RAM.

      At least with a Java application I can just download it and run it on my Linux and Windows computers. It would be really nice if more applications left the Windows monoculture, especially those from companies that owe their very existence to open source systems, like Google (Google Sketchup is still not available for Linux and probably never will be).

  • by markdavis (642305) on Monday November 28, 2011 @06:22PM (#38196162)

    This whole concept seems quite fascinating/interesting. Ironically, two questions came to my mind immediately:

    1) How much bandwidth does this take?
    2) How much disk space does this take?

    Neither question is answered on their FAQ ( http://www.yacy-websuche.de/wiki/index.php/En:FAQ [yacy-websuche.de] ), although they addressed the disk space issue thus: "Can I limit the size of the indexes on my hard-drive? For the moment no. Automatically limiting that size would mean having to delete stored indexes, which is not suitable. "

    Yikes! I am not sure how many people will want to run a local YaCy client when there is no control over how much disk space it uses (or, apparently, bandwidth). It still has a lot of promise, though.
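For what it's worth, the FAQ's "automatically limiting that size would mean having to delete stored indexes" problem is normally handled with an eviction policy. A hypothetical sketch, not how YaCy actually stores its index:

```python
from collections import OrderedDict

class CappedIndexStore:
    """Keep index segments under a byte budget by evicting the oldest
    segments first, which is exactly the deletion the FAQ calls unsuitable."""
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.segments = OrderedDict()   # name -> size in bytes, oldest first
        self.used = 0

    def add(self, name, size):
        self.segments[name] = size
        self.used += size
        while self.used > self.max_bytes:
            victim, vsize = self.segments.popitem(last=False)   # drop oldest
            self.used -= vsize

store = CappedIndexStore(max_bytes=100)
store.add("seg-a", 60)
store.add("seg-b", 60)        # total hits 120, so "seg-a" is evicted
print(list(store.segments))   # prints ['seg-b']
```

The cost of the cap is that evicted index entries have to be re-fetched from other peers, which may be why the project chose not to offer one.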

    • Disk quotas or separate file systems are a simple solution to this problem. Just takes a little more work than a line in a config file.

      • by markdavis (642305)

        I wonder what happens when the thing runs out of space? If you can't set how much it uses, then how are we to know that it handles running out of space "gracefully"?

        Also, you (presumably) and I are Linux users- so quotas, separate file systems, loopbacks, space checking, or whatever, are not rocket science. But that could be a lot more challenging for the people doing this on MS-Windows. Some users might be thinking they are "helping the world" by installing that app, then months later not understand why

    • by nurb432 (527695) on Monday November 28, 2011 @06:25PM (#38196194) Homepage Journal

      Run it in a VM. Limit its disk space and networking in one fell swoop.

    • 3) What is to stop a malicious node in the network from getting my search history?

      All of their claims about privacy seem to be implementation details of their code (which, being open source, is trivial to modify). They don't tell me how they designed the protocol to prevent someone from modifying the code to record searches or even to inject phishing sites into the top results.

    • by KlomDark (6370)

      Yikes! I am not sure how many people will want to run a local YaCy client when there is no control over how much disk space it uses

      Hasn't stopped Microsoft - Have you SEEN the size of C:\Windows\winsxs?? (AKA Windows 7's fatal flaw) And they have no plan to do anything about it, and there's nothing you can do about it. You can't move it, you can't delete obsolete files from it, it just slowly fills any partition you put it on. (Filling your boot drive is "By Design" according to Microsoft.)

    • by wvmarle (1070040)

      The index you can keep on your own hard disk and those of your direct peers is always going to be tiny compared to the indexes Google, Bing, et al. have. That in itself is an issue. Add to that the problem of finding and ranking results from a highly fragmented database, and doing so at a good speed, and I don't see it taking off any time soon.

  • by cshark (673578) on Monday November 28, 2011 @06:33PM (#38196286)
    Haven't we learned from gnutella, and the others, that this kind of thing just doesn't work? That it'll get overwhelmed by spam, hackers, you name it? I'll try it because I always try new p2p type stuff. But I'm really hoping they have a good security team.
    • It doesn't matter what kind of security team they have.
       
      Even if you LOVE it, the name is so bad you can't even tell anybody about it. Hell, I already forgot how to spell it, and I just saw it two seconds ago.

    • by wvmarle (1070040)

      And it's likely going to be as slow, as so many servers on so many different (and often relatively slow) connections have to be queried. Sorry but I don't like waiting for search results for more than a second or so, when Google provides them almost instantly.

      Google sets the standard, that's what you have to beat. So yes the bar to get into the search engine market is really high, and not many players will be able to give it a go with much chance for success.

  • by vadim_t (324782) on Monday November 28, 2011 @06:49PM (#38196448) Homepage

    So, I tried the portal and searched for slashdot.

    1. geek.net
    2. slashdot tags
    3. ostg.com
    4. slashdot.org/favicon.ico ...
    main page nowhere to be seen.

    Second try, entirely different results:
    1. microsoft.slashdot.org
    2. slashdot.org ...

    Seems very erratic so far. Then again, maybe it needs some time to stabilize a bit.

    • by Teancum (67324)

      I installed the software and went into the local administration, with a few clicks (it isn't quite as intuitive as I'd like it to be) I was able to set up the web crawling functions to bring in my favorite site. There are several limits that can be put onto that crawl, but the main point is that you can add sites to the search, and they show up when other peers are performing queries.

      It will be interesting to see how this software performs. It seems about as good as Lycos was back in the early 1990's, so

  • What about people who want to join but don't run their own compilers? You know, those people exist.

    • by Wolfier (94144)

      The platform-specific stuff is there, but where's the .jar?

    • by Bucky24 (1943328)

      You know, those people exist.

      I think the idea is that those kind of people wouldn't be interested in this kind of a project.

  • "As is often the case in the early stages of a new technology, results are better on some topics than on others -- mainly computer-related issues."

    Uh, no. Google became a search juggernaut because it provided better results. Otherwise there would be no motivation to switch from Yahoo. And, since this solves a problem most people don't care about, it's doomed.

  • Oh my, Google is so dead. DEAD!

    It'll be just like when Diaspora totally stomped Facebook!

    • It'll be just like when Diaspora totally stomped Facebook!

      Or when GNU/Hurd started cutting into Linux's...

      Sorry, I tried, I really did - but I can't keep from laughing.

  • by hubertf (124995) on Monday November 28, 2011 @07:48PM (#38197016) Homepage Journal

    ... by the Harvest Project, which installed several local data collectors, and which then added a search engine over all those collectors. The cache system added in between is still known today: Squid.

    http://en.wikipedia.org/wiki/Harvest_project [wikipedia.org]

      - Hubert

  • I installed the server on my machine and gave it a shot. I made very classic requests, such as the names of a couple of universities and a couple of famous websites, and made a few regular queries like "chocolate mousse recipe". None of the requests actually pointed to anything even remotely close to what I was looking for. I thought it might need some bootup time, so I tried again an hour later. It was not much better. Just much slower. I'll try again in a few days. But that does not look good...

    On top of that it looks

  • I tried Yacy. I've tried it a few times since I first tried it years ago to see if it had improved or not. It has not. The main problems with it are:
    • Yacy demands a whole lot of resources. You need a powerful dedicated server just to run it.
    • It likes to crawl sites at a very rapid rate; webmasters all over the world should be happy that it has not taken off. How about waiting a few seconds between page fetches from the same server, eh? Run it and you risk people all over banning your IP. I tried to crawl my own sites with it - not a good idea. At least I could shut the thing down when I saw what it was doing.
    • It crashes, and it crashes a whole lot. Do a few searches and it will crash.
    • Do a search and Yacy will hog CPU time for quite a while. Do another search while it's eating resources and it crashes.
    • The search results are horrible. They are basically useless.
    • Yacy has absolutely no support for different languages. The whole Internet is not in one language, yet Yacy pretends it is. Just want search results in your own language? Not an option.

    I could go on, but you get the idea. I would really like to see a usable peer to peer search engine. The Internet needs it. Yacy is not it. The idea is good, the implementation can best be described as EPIC FAIL.
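The per-host crawl delay the parent asks for is a standard politeness mechanism in crawlers. A rough sketch (names hypothetical, not YaCy's actual crawler):

```python
import time
from urllib.parse import urlparse

class PoliteFetcher:
    """Enforce a minimum delay between requests to the same host, so a
    crawl never hammers one server with back-to-back fetches."""
    def __init__(self, min_delay=2.0):
        self.min_delay = min_delay
        self.last_hit = {}   # host -> monotonic time of the last request

    def wait_turn(self, url):
        host = urlparse(url).netloc
        elapsed = time.monotonic() - self.last_hit.get(host, float("-inf"))
        if elapsed < self.min_delay:
            time.sleep(self.min_delay - elapsed)   # back off politely
        self.last_hit[host] = time.monotonic()
```

Requests to different hosts pass through immediately, so a crawler can still keep its pipeline full by interleaving hosts.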

  • I met the project leader at FOSSASIA last year, and he's a very nice guy. I like the YaCy project, and even installed it on my laptop. And yet, I don't understand how this can be news on Slashdot: the project is literally YEARS old. Its major flaw? Not enough peers. Last time I tried, there were hundreds of them, when it would really work only if there were hundreds of thousands. Also, search is quite slow compared to Google.
  • by jlarocco (851450)

    I downloaded it and gave it a try, but I'm going to stick with DuckDuckGo [duckduckgo.com]. In my experience the results have been as good as or better than Google and if I don't find what I'm looking for, it also gives links to do the search in Google or Bing.
