BIOENGINEER.ORG

Method enables machine learning from unwieldy data sets

By Bioengineer
December 16, 2016
in Science News

When data sets get too big, sometimes the only way to do anything useful with them is to extract much smaller subsets and analyze those instead.

Those subsets have to preserve certain properties of the full sets, however, and one property that's useful in a wide range of applications is diversity. If, for instance, you're using your data to train a machine-learning system, you want to make sure that the subset you select represents the full range of cases that the system will have to confront.

Last week at the Conference on Neural Information Processing Systems, researchers from MIT's Computer Science and Artificial Intelligence Laboratory and its Laboratory for Information and Decision Systems presented a new algorithm that makes the selection of diverse subsets much more practical.

Whereas the running times of earlier subset-selection algorithms depended on the number of data points in the complete data set, the running time of the new algorithm depends on the number of data points in the subset. That means that if the goal is to winnow a data set with 1 million points down to one with 1,000, the new algorithm is 1 billion times faster than its predecessors.
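The billion-fold figure can be checked with a line of arithmetic. The sketch below assumes that both the older methods and the new one scale with the cube of the relevant size; the article does not state the scaling law, so the `exponent` parameter is an assumption chosen because it reproduces the quoted number.

```python
def speedup(full_size: int, subset_size: int, exponent: int = 3) -> float:
    """Ratio of full_size**exponent to subset_size**exponent.

    The cubic default is an assumption, not a figure from the article.
    """
    return full_size ** exponent / subset_size ** exponent


# 1 million points winnowed down to 1,000
print(speedup(1_000_000, 1_000))
```

Under that assumption, a thousand-fold reduction in size gives a factor of 1,000 cubed, which is 1 billion.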

"We want to pick sets that are diverse," says Stefanie Jegelka, the X-Window Consortium Career Development Assistant Professor in MIT's Department of Electrical Engineering and Computer Science and senior author on the new paper. "Why is this useful? One example is recommendation. If you recommend books or movies to someone, you maybe want to have a diverse set of items, rather than 10 little variations on the same thing. Or if you search for, say, the word 'Washington.' There's many different meanings that this word can have, and you maybe want to show a few different ones. Or if you have a large data set and you want to explore — say, a large collection of images or health records — and you want a brief synopsis of your data, you want something that is diverse, that captures all the directions of variation of the data.

"The other application where we actually use this thing is in large-scale learning. You have a large data set again, and you want to pick a small part of it from which you can learn very well."

Joining Jegelka on the paper are first author Chengtao Li, a graduate student in electrical engineering and computer science; and Suvrit Sra, a principal research scientist at MIT's Laboratory for Information and Decision Systems.

Thinking small

Traditionally, if you want to extract a diverse subset from a large data set, the first step is to create a similarity matrix — a huge table that maps every point in the data set against every other point. The intersection of the row representing one data item and the column representing another contains the points' similarity score on some standard measure.
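A minimal sketch of such a table, assuming a Gaussian (RBF) kernel on Euclidean distance as the similarity measure. The article does not say which measure the researchers use, and the names `similarity_matrix` and `bandwidth` are illustrative only.

```python
import math


def similarity_matrix(points, bandwidth=1.0):
    """Build the full n-by-n table of pairwise similarity scores.

    Entry [i][j] is close to 1 when points i and j are near each other
    and close to 0 when they are far apart (Gaussian kernel, an assumption).
    """
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            score = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            matrix[i][j] = matrix[j][i] = score  # the table is symmetric
    return matrix


points = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0)]
S = similarity_matrix(points)
for row in S:
    print(["%.4f" % value for value in row])
```

The first two points are nearly identical, so their score is close to 1, while the third point scores close to 0 against both.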

There are several standard methods to extract diverse subsets, but they all involve operations performed on the matrix as a whole. For a data set of a million points, and hence a million-by-million similarity matrix, this is prohibitively time-consuming.
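To give a sense of scale, the following sketch counts the entries in a dense similarity matrix and the memory needed just to store it. The 8 bytes per entry (a double-precision float) is an assumption; the article gives no storage figures.

```python
def dense_matrix_cost(num_points, bytes_per_entry=8):
    """Return (number of entries, bytes needed) for a dense n-by-n matrix."""
    entries = num_points * num_points
    return entries, entries * bytes_per_entry


entries, size_bytes = dense_matrix_cost(1_000_000)
print(entries, "entries,", size_bytes / 1e12, "terabytes")
```

A million points means a trillion entries, before any matrix-wide operation is even attempted.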

The MIT researchers' algorithm begins, instead, with a small subset of the data, chosen at random. Then it picks one point inside the subset and one point outside it and randomly selects one of three simple operations: swapping the points, adding the point outside the subset to the subset, or deleting the point inside the subset.
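The proposal step described above can be sketched as follows. Points are represented by integer indices, and the operation is chosen uniformly at random as a placeholder, since the article gives no formula for the selection probabilities. The function name `propose_move` is hypothetical.

```python
import random


def propose_move(subset, full_size, rng):
    """Pick one point inside the subset, one outside, and an operation.

    Returns (operation, inside, outside, candidate_subset). The uniform
    choice among operations is an assumption made for illustration.
    """
    if not subset or len(subset) >= full_size:
        raise ValueError("subset must be non-empty and smaller than the full set")
    inside = rng.choice(sorted(subset))
    # Rejection sampling: cheap when the subset is much smaller than the full set.
    outside = rng.randrange(full_size)
    while outside in subset:
        outside = rng.randrange(full_size)
    operation = rng.choice(("swap", "add", "delete"))
    if operation == "swap":
        candidate = (subset - {inside}) | {outside}
    elif operation == "add":
        candidate = subset | {outside}
    else:
        candidate = subset - {inside}
    return operation, inside, outside, candidate


rng = random.Random(0)
subset = {2, 5, 7}
operation, inside, outside, candidate = propose_move(subset, 10, rng)
print(operation, inside, outside, sorted(candidate))
```

Note that the proposal never touches the full data set beyond drawing a single index, which is what keeps each step cheap.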

The probability with which the algorithm selects one of those operations depends on both the size of the full data set and the size of the subset, so it changes slightly with every addition or deletion. But the algorithm doesn't necessarily perform the operation it selects.

Again, the decision to perform the operation or not is probabilistic, but here the probability depends on the improvement in diversity that the operation affords. For additions and deletions, the decision also depends on the size of the subset relative to that of the original data set. That is, as the subset grows, it becomes harder to add new points unless they improve diversity dramatically.

This process repeats until the diversity of the subset reflects that of the full set. Since the diversity of the full set is never calculated, however, the question is how many repetitions are enough. The researchers' chief results are a way to answer that question and a proof that the answer will be reasonable.
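The whole loop can be sketched as a Metropolis-style accept/reject procedure. This is an illustration under several assumptions, not the researchers' exact chain: diversity is measured by the determinant of the subset's similarity submatrix (suggested by the mention of DPPs in the paper title), operations are chosen uniformly, the size-dependent factor for additions and deletions is omitted because the article gives no formula, and `num_steps` is a user-chosen placeholder for the stopping rule that the paper derives. All function names are hypothetical.

```python
import math
import random


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def diversity(similarity, subset):
    """Assumed diversity score: determinant of the subset's similarity submatrix."""
    idx = sorted(subset)
    return determinant([[similarity(i, j) for j in idx] for i in idx])


def sample_diverse_subset(num_points, similarity, start_size, num_steps, seed=0):
    """Run a fixed number of propose/accept steps and return the subset."""
    assert 1 <= start_size < num_points
    rng = random.Random(seed)
    subset = set(rng.sample(range(num_points), start_size))
    current = diversity(similarity, subset)
    for _ in range(num_steps):
        inside = rng.choice(sorted(subset))
        outside = rng.randrange(num_points)
        while outside in subset:
            outside = rng.randrange(num_points)
        operation = rng.choice(("swap", "add", "delete"))
        if operation == "swap":
            candidate = (subset - {inside}) | {outside}
        elif operation == "add":
            candidate = subset | {outside}
        else:
            candidate = subset - {inside}
        # Simplification: never let the subset become empty or cover everything.
        if not candidate or len(candidate) == num_points:
            continue
        proposed = diversity(similarity, candidate)
        # Accept with probability tied to the change in diversity.
        accept_prob = 1.0 if current <= 0.0 else min(1.0, proposed / current)
        if rng.random() < accept_prob:
            subset, current = candidate, proposed
    return subset


# Toy data: 30 points on a line; similarities are computed on demand,
# so the full 30-by-30 matrix is never built.
xs = [0.5 * i for i in range(30)]


def kernel(i, j, scale=5.0):
    return scale * math.exp(-((xs[i] - xs[j]) ** 2) / 2.0)


chosen = sample_diverse_subset(len(xs), kernel, start_size=3, num_steps=2000)
print(sorted(chosen))
```

Each step only evaluates similarities among the points currently in the subset, so its cost depends on the subset size rather than on the size of the full data set. The `scale` factor in the toy kernel is an assumption that makes additions of well-separated points likely to be accepted.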

Additional background

PAPER: Fast mixing Markov chains for strongly Rayleigh measures, DPPs, and constrained sampling

ARCHIVE: Making big data manageable

ARCHIVE: "Shrinking bull's-eye" algorithm speeds up complex modeling from days to hours

ARCHIVE: To handle big data, shrink it

ARCHIVE: Collecting just the right data

Story Source: Materials provided by Scienmag
