
Researchers Report Breakthrough In 'Distributed Deep Learning' (techxplore.com) 16

Using a divide-and-conquer approach that leverages the power of compressed sensing, computer scientists from Rice University and Amazon have shown they can slash the amount of time and computational resources it takes to train computers for product search and similar "extreme classification problems" like speech translation and answering general questions. Tech Xplore reports: In tests on an Amazon search dataset that included some 70 million queries and more than 49 million products, Shrivastava, Medini and colleagues showed their approach of using "merged-average classifiers via hashing" (MACH) required a fraction of the training resources of some state-of-the-art commercial systems. "Our training times are about 7-10 times faster, and our memory footprints are 2-4 times smaller than the best baseline performances of previously reported large-scale, distributed deep-learning systems," said Shrivastava, an assistant professor of computer science at Rice. Medini, a Ph.D. student at Rice, said product search is challenging, in part, because of the sheer number of products. "There are about 1 million English words, for example, but there are easily more than 100 million products online."

MACH takes a very different approach [from current training algorithms]. Shrivastava describes it with a thought experiment: randomly dividing the 100 million products into three classes, which take the form of buckets. "I'm mixing, let's say, iPhones with chargers and T-shirts all in the same bucket," he said. "It's a drastic reduction from 100 million to three." In the thought experiment, the 100 million products are randomly sorted into three buckets in two different worlds, which means that products can wind up in different buckets in each world. A classifier is trained to assign searches to the buckets rather than the products inside them, meaning the classifier only needs to map a search to one of three classes of product. [...] In their experiments with Amazon's training database, Shrivastava, Medini and colleagues randomly divided the 49 million products into 10,000 classes, or buckets, and repeated the process 32 times. That reduced the number of parameters in the model from around 100 billion to 6.4 billion. And training the model took less time and less memory than some of the best reported training times on models with comparable parameters, including Google's Sparsely-Gated Mixture-of-Experts (MoE) model, Medini said. He said MACH's most significant feature is that it requires no communication between parallel processors. In the thought experiment, that is what's represented by the separate, independent worlds.
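The bucketing-and-merging scheme described above can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the per-world classifiers are faked, the sizes are tiny, and the names (`world_hash`, `mach_scores`) are made up for this sketch.

```python
import random

random.seed(0)

N_PRODUCTS = 1000   # stand-in for the 49 million products
N_BUCKETS = 100     # stand-in for the 10,000 buckets
R = 4               # independent "worlds" (32 in the experiments)

# Each world independently hashes every product into a bucket.
# The R small classifiers never talk to each other during training.
world_hash = [[random.randrange(N_BUCKETS) for _ in range(N_PRODUCTS)]
              for _ in range(R)]

def mach_scores(bucket_probs):
    """Merge step: a product's score is the average, over worlds, of
    the probability assigned to the bucket it landed in for that world.
    bucket_probs[r] is the r-th classifier's distribution over buckets."""
    scores = [0.0] * N_PRODUCTS
    for r in range(R):
        for p in range(N_PRODUCTS):
            scores[p] += bucket_probs[r][world_hash[r][p]]
    return [s / R for s in scores]

# Toy check: pretend every world's classifier puts all of its mass on
# whatever bucket product 42 landed in for that world.
target = 42
bucket_probs = [[0.0] * N_BUCKETS for _ in range(R)]
for r in range(R):
    bucket_probs[r][world_hash[r][target]] = 1.0

scores = mach_scores(bucket_probs)
```

Only a product that collides with the target in every one of the R worlds can tie with it, and with independent hashes that probability falls off like (1/B)^R, which is why averaging a few coarse, independently trained classifiers can pinpoint one product out of millions.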
The research will be presented this week at the 2019 Conference on Neural Information Processing Systems (NeurIPS 2019) in Vancouver.


  • Not a new battery breakthrough? Is it over now?
  • MACH! (Score:4, Funny)

    by Hrrrg ( 565259 ) on Tuesday December 10, 2019 @09:55PM (#59506586)

    That's awesome. When they come out with the improved versions, they can say it's learning at MACH 2, MACH 3...

  • A good fit. (Score:4, Insightful)

    by Ostracus ( 1354233 ) on Tuesday December 10, 2019 @11:18PM (#59506718) Journal

    "Medini said. He said MACH's most significant feature is that it requires no communication between parallel processors."

    Could be a good fit for GPUs.

  • Doesn't this split lose accuracy (if that is the right word here) for speed?

    Unless I'm missing something (and I might well be missing everything) - the non-cross-communication is both a blessing and a curse, I would imagine.

    It seems like accuracy could suffer, especially for items that aren't well represented across all of the buckets - but maybe that dataset (being actual merchandise, much of which is similar / has similar items that are all fakes on Amazon's marketplace) doesn't have many problems like that. /end of assumption

    • Re:Accuracy? (Score:4, Insightful)

      by werepants ( 1912634 ) on Wednesday December 11, 2019 @11:25AM (#59508176)

      No, I don't think it's giving up accuracy, at least not an appreciable amount.

      The main goal is to match a search query to an item, or a set of items. If you do this directly, memory requirements get crazy very quickly. If you find a hashing algorithm such that the distance between hashes approximates the distance between unhashed items, you can train your classifier on things hashed into bins and get major memory savings, with no appreciable loss to accuracy.

      I could be very wrong about how this is working, but it sounds quite similar to MinHash, which can be used to accelerate cluster analysis and association rule mining.
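      For reference, the MinHash trick mentioned above can be sketched as follows. The hash family and names here are made up for illustration, and integer items are used so Python's built-in `hash` is deterministic across runs.

```python
import random

P = (1 << 61) - 1   # a large Mersenne prime for the hash family

def minhash_signature(items, num_hashes=64, seed=1):
    """signature[i] = min over the set of the i-th random hash h_i(x).
    Two sets agree in slot i exactly when they share the item that
    minimizes h_i, which happens with probability equal to their
    Jaccard similarity."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P))
              for _ in range(num_hashes)]
    return [min((a * hash(x) + b) % P for x in items) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates |A∩B| / |A∪B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature(set(range(100)))       # {0..99}
b = minhash_signature(set(range(50, 150)))   # {50..149}; true Jaccard = 1/3
est = estimated_jaccard(a, b)
```

      The signatures are fixed-size regardless of how big the sets are, which is where the memory savings come from: distance between hashed representations approximates distance between the original sets.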

    • by ceoyoyo ( 59147 )

      It doesn't require cross communication for *training*. I strongly suspect it does require it for inference. That's usually okay though, because the training is the thing that takes all the time.

      My impression from the summary (I haven't read the paper yet) is that instead of trying to build a model that can connect an input with a specific product, you build one that can connect the input with one of a smaller number of classes of products. The classes are basically random. You do that for a bunch of differ

  • Quick read -- this is not really an all-reduce communication / distributed deep learning innovation. It's a clever way to do classification when you have tons of classes (good innovation).
  • by Anonymous Coward
    I run a cab company. I'm going to rename it to "Distributed Deep Learning Cab Co.", see the value of the company skyrocket, sell it to gullible investors, then retire with a sack of coke and a boatload of hookers. Happy days.
  • Does this take us one step closer to Skynet?
  • In this case combining compressive sensing with deep learning makes the algorithm able to make similarly performant searches using less information, in essence compressing the search space required to get a good result. It's beautiful! The idea of compressed sensing is such a triumph of math it makes me geek out whenever it finds an application. The idea is that if you can take an image, let's say 1 megapixel, with 1 byte per pixel, so 1 MB, and compress it to 100 KB, let's take a 100 kilopixel image, and
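    The (truncated) compressed-sensing idea above, recovering a sparse signal from far fewer random measurements than its ambient dimension, can be sketched with a greedy recovery routine. This uses orthogonal matching pursuit rather than the L1 minimization usually associated with compressed sensing, purely because it fits in a few lines; all sizes and names are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 50, 30, 3          # signal length, measurements, sparsity
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.normal(size=k)

# Random Gaussian measurement matrix with far fewer rows than columns.
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x_true               # 30 measurements of a 50-dim sparse signal

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily add the column most
    correlated with the current residual, then least-squares refit
    on the chosen support."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(A, y, k)
```

    With a sufficiently sparse signal and enough random measurements, the recovery is exact (up to numerical precision), even though the system y = Ax is badly underdetermined.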
