'Google Search On Steroids' Brings Dark Web To Light
snydeq writes The government agency that brought us the Internet has now developed a powerful new search engine that is shedding light on the contents of the so-called deep Web. DARPA began work on the Memex Deep Web Search Engine a year ago, and this week unveiled its tools to Scientific American and 60 Minutes. "Memex, which is being developed by 17 different contractor teams, aims to build a better map of Internet content and uncover patterns in online data that could help law enforcement officers and others. While early trials have focused on mapping the movements of human traffickers, the technology could one day be applied to investigative efforts such as counterterrorism, missing persons, disease response, and disaster relief."
WebQL... (Score:1)
Re: (Score:2)
17 different contractor teams (Score:4, Insightful)
"... being developed by 17 different contractor teams..."
There's a recipe for failure if ever I saw one!
Re: (Score:2)
"... being developed by 17 different contractor teams..."
There's a recipe for failure if ever I saw one!
That's exactly what I came here to say. My job is now done (by you).
Re: (Score:1)
That's exactly what I came here to say. My job is now done (by you).
Outsourcing strikes again!
Re:17 different contractor teams (Score:5, Funny)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
"... being developed by 17 different contractor teams..." There's a recipe for failure if even I saw one!
Is that so bad for a search engine? If I had a bunch of people and needed to search a downtown neighborhood, I would break into teams and search different buildings all at the same time, then collect the results and organize them according to relevance. Searching different networks is not much different. You could have a team of Tor specialists working on Tor, a team working on Freenet, etc. Plus a team working on a common framework including a plugin or API system.
Re:17 different contractor teams (Score:4, Informative)
Magical Program that can cure cancer & warts! (Score:2)
Sounds like this search engine runs on magic dust and could not only find the cure for cancer, but finally get rid of those facial warts that always come back! If you give us XXXX Billions of Dollars, it will keep you safe in your sleep from the bogey man, terrorists and spam!
Ya, sounds like bullshit to me.
Re: (Score:1)
So does a shotgun. That's why you need to be specific about your use cases.
search on steroids (Score:4, Funny)
While I am sure that steroid abuse is assisted by 'the dark web', there are more dangerous drugs for sale there, not to mention actual violent crime they should crack down on.
Re: (Score:3)
Re: (Score:2)
Like murder for hire? And child exploitation rings? You do know entities involved in those things use the Dark Web to contact each other and organize, right? I mean, Dread Pirate Roberts used it to organize his hits [slashdot.org].
go Xerox yourself (Score:1)
Absolutely nothing about Google. But it's a search! Search is Google! That's why it's Google! Duh huh huh huhuh huhuhuh!
The upshot is (Score:5, Interesting)
before, criminals could keep from being caught by having a robots.txt file.
The sad thing is this isn't a joke
Exactly. That was my takeaway as well. (Score:5, Funny)
Exactly. That was my take-away as well.
(1) Get a huge government contract
(2) Ignore robots.txt
(3) Profit!
Re: (Score:2)
http://www.robotstxt.org/robot... [robotstxt.org]
There are two important considerations when using /robots.txt:
robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
So don't try to use /robots.txt to hide information.
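To make the quoted point concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the paths, and the "ExampleBot" agent name are made up for illustration. The compliance check runs entirely on the crawler's side, so a crawler that skips it is not stopped by anything, and the Disallow lines are readable by anyone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_can_fetch(path, agent="ExampleBot"):
    """What a well-behaved crawler asks itself before fetching a path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


def disallowed_paths():
    """What anyone can read straight out of the same public file."""
    return [
        line.split(":", 1)[1].strip()
        for line in ROBOTS_TXT.splitlines()
        if line.lower().startswith("disallow:")
    ]


if __name__ == "__main__":
    print(polite_can_fetch("/private/ledger.html"))
    print(polite_can_fetch("/index.html"))
    print(disallowed_paths())
```

Nothing on the server enforces the answer from polite_can_fetch; it only matters if the crawler chooses to call it.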
Re: (Score:2)
It is for your friendly neighborhood fusion center. [latimes.com] Just ask Ross Ulbricht. [schneier.com]
"...could one day be applied to...." (Score:1)
Digital Scarecrow (Score:5, Insightful)
I thought that this sounded ominous for a minute. Then I remembered that government projects like this are designed to have a chilling effect on activity that they cannot monitor, understand or enforce by their very existence and not by being actual potent tools to combat it (i.e. paper tiger). More likely this thing will become a money pit that contractors can use as a sandbox project to allow their employees to play in for implementation of IP that may be works-in-progress for future projects that may be useful, but are just lofty concepts that have no basis in reality. 17 contracting teams is about 15-16 too many hands in the cookie jar for this to be anything more than a Men In Black-wannabe training camp or a glorified propaganda project, most likely both.
Re: (Score:2)
This.
And ...
How many tools does the government have that kids circumvent every day?
This sounds a lot like the war on spam.
Re: (Score:1)
How many tools does the government have that kids circumvent every day?
Having lived through being a curious kid: all of them.
Google search on steroids (Score:1)
Argghhh!! Show me KITTENS!!!!111!!
"Dark Web" (Score:2)
Which is it, Deep Web or Darknet?
Excellent reporting there.
Re: (Score:1)
Whichever one will get us more funding, obviously.
Re: (Score:2)
Which is it, Deep Web or Darknet?
Excellent reporting there.
TFA explains that it's both:
Memex searches content typically ignored by commercial search engines, such as unstructured data, unlinked content, temporary pages that are removed before commercial search engines can crawl them, and chat forums[...]
Memex also automates the mechanism of crawling the dark, or anonymous, Web where criminals conduct business. These hidden services pages, accessible only through the TOR anonymizing browser, typically operate under the radar of law enforcement selling illicit drugs and other contraband.
"Deep" Web or "Dark" Web? (Score:4, Informative)
Point is that the headline says "Dark Web" while the excerpt says "Deep Web", but then immediately starts talking about law enforcement, which means Dark Web.
"Deep Web" and "Dark Web" are both useful concepts. We should avoid conflating them.
Re: (Score:3)
You are right, the "deep web" is not the same thing as the/a "darknet" or "dark web". They don't do a good job of keeping that clear in the headline. From TFA's own citation on Wikipedia:
"The deep web should not be confused with the dark Internet, computers that can no longer be reached via the Internet.
However the article does assert that this Memex project is indexing both unpublicized content on the general internet (the deep part) plus anonymized content on Tor and other privacy services (the dark part).
Re: (Score:1)
"Deep Web" and "Dark Web" are both useful concepts. We should avoid conflating them.
They also don't exist. People should stop believing they do. It all travels over the same wire. Each new 'encryption' protocol works exactly once, at best. The most functional you might find is Craigslist
Disappointed (Score:2)
No link to the search page... In fact it seems that there isn't a search page at all.
The only thing Memex has in common with Google is the tracking.
Please don't abuse disaster relief/response! (Score:1)
Look, we've been trained to treat anything you do for counter/anti/whatever-terrorism as an intrusion to our privacy and as a general way to screw us over and make money for your buddies. We've accepted that. And we've learned that it's bad for us and that we can't do jack about it, but at least we can ignore it.
Now you start lumping disaster relief and disaster response into it. And that's where I draw the line. We need that, ok? That's something important, not like your war on pedophiles, war on terrorism
"Memex" already has a famous meaning... (Score:2)
What does this have to do with Google? (Score:2)
Other than that this is also a search engine, that is.
So why do we need a root zone? (Score:2)
If they're this clever at finding things then let them do TLD discovery and we can dispense with that trillion dollar ICANN nonsense that doesn't do anything.
mod_doorknock? (Score:2)
Some people have been using port knocking to allow remote admin yet cut down on the ssh bots trying to login.
It would be trivial to do the same in a CGI script: if your IP address is 1.2.232.121, you have to hit /target/232 and then /target/121 to get the real data.
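A rough Python sketch of that idea, under some assumptions: the class name DoorKnock, the "decoy" and "real data" return values, and the reset-on-wrong-knock rule are all invented here, and the client address is assumed to be a valid dotted-quad IPv4 string. It only models the bookkeeping, not a real CGI or web server.

```python
class DoorKnock:
    """Tracks each client's progress through a knock sequence
    derived from the last two octets of its IPv4 address."""

    def __init__(self):
        self._progress = {}  # ip -> number of knocks matched so far

    @staticmethod
    def sequence(ip):
        # Assumes a valid dotted-quad IPv4 address such as "1.2.232.121".
        octets = ip.split(".")
        return [f"/target/{octets[2]}", f"/target/{octets[3]}"]

    def handle(self, ip, path):
        seq = self.sequence(ip)
        step = self._progress.get(ip, 0)
        if path == seq[step]:
            step += 1
            if step == len(seq):
                # Final knock matched: forget the state and serve the data.
                self._progress.pop(ip, None)
                return "real data"
            self._progress[ip] = step
            return "decoy"
        # Wrong knock: start over, counting it as knock one if it matches.
        self._progress[ip] = 1 if path == seq[0] else 0
        return "decoy"


if __name__ == "__main__":
    door = DoorKnock()
    print(door.handle("1.2.232.121", "/target/232"))
    print(door.handle("1.2.232.121", "/target/121"))
```

Clients behind a shared NAT or proxy would all present the same address, so this is obscurity against casual crawlers rather than access control.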