Wednesday, August 20, 2025
BIOENGINEER.ORG

A statistical solution to processing very large datasets efficiently with memory limit

By Bioengineer
April 1, 2021
in Chemistry

Scientists develop a statistical randomness-based framework to optimally classify extremely large datasets without needing large memories

Image credit: Ryo Maezono from JAIST.

Ishikawa, Japan – Any high-performance computing system should be able to handle a vast amount of data in a short amount of time, a requirement on which entire fields such as data science and Big Data are built. Usually, the first step in managing a large amount of data is either to classify it based on well-defined attributes or, as is typical in machine learning, to “cluster” it into groups such that data points in the same group are more similar to one another than to those in another group. However, for an extremely large dataset, which can have trillions of sample points, even grouping the data points into clusters is impractical without a huge amount of memory.

“The problem can be formulated as follows: Suppose we have a clustering tool that can process up to lmax samples. The tool classifies l (input) samples into M(l) groups (as output) based on some attributes. Let the actual number of samples be L and G = M(L) be the total number of attributes we want to find. The problem is that if L is much larger than lmax, we cannot determine G owing to limitations in memory capacity,” explains Professor Ryo Maezono from the Japan Advanced Institute of Science and Technology (JAIST), who specializes in computational condensed matter theory.

Interestingly, very large sample sizes are common in materials science, where calculations involving atomic substitutions in a crystal structure often involve trillions of possible configurations. However, a mathematical theorem called “Polya’s theorem,” which exploits the symmetry of the crystal, often simplifies these calculations considerably. Unfortunately, Polya’s theorem only works for problems with symmetry and is, therefore, of limited scope.
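To make the role of symmetry concrete, here is a toy sketch in Python. It is not taken from the study: it assumes a ring of sites whose only symmetry is rotation (a stand-in for a crystal's symmetry group) and uses Burnside's lemma, the counting result that underlies Polya's theorem. The function names are hypothetical.

```python
from itertools import product
from math import gcd

def count_by_symmetry(n_sites, n_species):
    """Count distinct substitution patterns on a ring using Burnside's lemma.

    A rotation by r sites leaves n_species ** gcd(n_sites, r) patterns
    unchanged; averaging over all rotations gives the number of patterns
    that are distinct up to rotation.
    """
    fixed = sum(n_species ** gcd(n_sites, r) for r in range(n_sites))
    return fixed // n_sites

def count_by_enumeration(n_sites, n_species):
    """Brute force: list every pattern and keep one representative per
    rotation class. The cost grows as n_species ** n_sites."""
    seen = set()
    for config in product(range(n_species), repeat=n_sites):
        canonical = min(config[r:] + config[:r] for r in range(n_sites))
        seen.add(canonical)
    return len(seen)

# Small case: both approaches can be compared directly.
print(count_by_symmetry(6, 2), count_by_enumeration(6, 2))

# Larger case: the symmetry formula is still instant, while the
# enumeration would have to visit 2 ** 40 (about a trillion) patterns.
print(count_by_symmetry(40, 2))
```

The symmetry-based count never enumerates the configurations, which is why it scales to trillions of possibilities. It also depends entirely on knowing the symmetry group in advance, which is the limitation the article points out.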

In a recent study published in Advanced Theory and Simulations, a team of scientists led by Prof. Maezono and his colleague Keishu Utimula, who earned a PhD in materials science from JAIST in 2020 and is the first author of the study, proposed an approach based on statistical randomness to identify G for sample sizes much larger than lmax (on the order of a trillion). The idea, essentially, is to pick a sample of size l that is much smaller than L, identify M(l) using machine learning “clustering,” and repeat the process while varying l. As l increases, the estimated M(l) converges to M(L), that is, G, provided G is considerably smaller than lmax (which is almost always the case). However, this strategy is still computationally expensive, because it is hard to know exactly when convergence has been achieved.
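The subsampling idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the clustering tool is replaced by a hypothetical stand-in that simply counts distinct group labels, the data source is a hypothetical generator with G equally likely groups, and drawing with replacement approximates sampling from a dataset with L much larger than l.

```python
import random

def count_groups(sample):
    """Stand-in for the clustering tool: returns M(l), the number of
    distinct groups found in a sample (assumption: labels are known)."""
    return len(set(sample))

def draw_sample(l, G, rng):
    """Hypothetical data source: l points, each belonging to one of G
    equally likely groups. Sampling with replacement mimics L >> l."""
    return rng.choices(range(G), k=l)

def mean_group_count(l, G, rng, repeats=20):
    """Average M(l) over several independent samples of size l."""
    counts = [count_groups(draw_sample(l, G, rng)) for _ in range(repeats)]
    return sum(counts) / repeats

rng = random.Random(0)
G_TRUE = 50  # the quantity the method is trying to recover

curve = {l: mean_group_count(l, G_TRUE, rng) for l in (10, 50, 200, 1000)}
for l, m in curve.items():
    print(f"l = {l:5d}   mean M(l) = {m:.2f}")
```

In this toy setting M(l) rises with l and flattens out at G. The practical difficulty mentioned above is visible here too: without knowing G_TRUE in advance, it is not obvious from the curve alone at which l the plateau has really been reached.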

To address this issue, the scientists adopted another strategy: they made use of the “variance,” or degree of spread, of M(l). From simple mathematical reasoning, they showed that the variance of M(l), written V[M(l)], should peak at a sample size of approximately G. In other words, the sample size at which V[M(l)] reaches its maximum is approximately G. Furthermore, numerical simulations revealed that the peak value of the variance itself scales as 0.1 times G, and thus also provides a good estimate of G.
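The variance peak can be reproduced in an idealised setting. The Python sketch below is an assumption-laden illustration rather than the study's own derivation or code: it assumes that each sampled point falls independently into one of G equally likely groups (the classical occupancy problem), for which V[M(l)] has a closed form, and it compares that formula with a small simulation. All function names are hypothetical.

```python
import random
import statistics

def occupancy_variance(l, G):
    """Exact V[M(l)] when l points fall independently into G equally
    likely groups (idealised model, assumed here for illustration)."""
    p1 = (1 - 1 / G) ** l  # probability that one given group is missed
    p2 = (1 - 2 / G) ** l  # probability that two given groups are both missed
    return G * p1 + G * (G - 1) * p2 - (G * p1) ** 2

def simulated_variance(l, G, rng, repeats=300):
    """Sample variance of M(l) over repeated random samples of size l."""
    counts = [len(set(rng.choices(range(G), k=l))) for _ in range(repeats)]
    return statistics.variance(counts)

G_TRUE = 100
rng = random.Random(1)

# Simulated variance on a coarse grid of sample sizes.
sim = {l: simulated_variance(l, G_TRUE, rng) for l in range(25, 401, 25)}
l_peak_sim = max(sim, key=sim.get)

# Closed-form variance on every sample size up to 400.
l_peak_exact = max(range(1, 401), key=lambda l: occupancy_variance(l, G_TRUE))
v_peak_exact = occupancy_variance(l_peak_exact, G_TRUE)

print("simulated peak at l =", l_peak_sim,
      " V/G =", round(sim[l_peak_sim] / G_TRUE, 3))
print("closed-form peak at l =", l_peak_exact,
      " V/G =", round(v_peak_exact / G_TRUE, 3))
```

In this idealised model the peak sits at a sample size of the same order as G (slightly above it), and the peak height is close to 0.1 times G, which is consistent with the scaling reported in the article. Real datasets with unevenly sized groups may behave differently.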

While the results are yet to be mathematically verified, the technique shows promise of finding applications in high-performance computing and machine learning. “The method described in our work has much wider applicability than Polya’s theorem and can, therefore, handle a broader category of problems. Moreover, it only requires a machine learning clustering tool for sorting the data and does not require a large memory or whole sampling. This can make AI recognition technology feasible for larger data sizes even with small-scale recognition tools, which can improve their convenience and availability in the future,” comments Prof. Maezono excitedly.

Sometimes, statistics is nothing short of magic, and this study proves that!

###

About Japan Advanced Institute of Science and Technology, Japan

Founded in 1990 in Ishikawa prefecture, the Japan Advanced Institute of Science and Technology (JAIST) was the first independent national graduate school in Japan. Now, after 30 years of steady progress, JAIST has become one of Japan’s top-ranking universities. JAIST has multiple satellite campuses and strives to foster capable leaders with a state-of-the-art education system where diversity is key; about 40% of its alumni are international students. The university has a unique style of graduate education based on a carefully designed coursework-oriented curriculum to ensure that its students have a solid foundation on which to carry out cutting-edge research. JAIST also works closely with both local and overseas communities by promoting industry-academia collaborative research.

About Professor Ryo Maezono from Japan Advanced Institute of Science and Technology, Japan

Dr. Ryo Maezono has been a Professor at the School of Information Science at Japan Advanced Institute of Science and Technology (JAIST) since 2017. He received his Ph.D. degree from the University of Tokyo in 2000 and worked as a researcher at the National Institute for Materials Science, Ibaraki, Japan, from 2001 to 2007. His research interests include material informatics and condensed matter theory using high-performance computing. A senior researcher and professor, he has 166 publications with over 1700 citations to his credit.

Funding information

This study was funded by JAIST Research Grant (Fundamental Research) 2019, FLAGSHIP2020 (project numbers hp190169 and hp190167 at K-computer), KAKENHI grant (grant numbers 17K17762 and 19K05029), Grant-in-Aid for Scientific Research on Innovative Areas (16H06439 and 19H05169), PRESTO (JPMJPR16NA), Support Program for Starting Up Innovation Hub from Japan Science and Technology Agency (JST), MEXT-KAKENHI (grant numbers 19H04692 and 16KK0097), Toyota Motor Corporation, I-O DATA Foundation, Air Force Office of Scientific Research (AFOSR-AOARD/FA2386-17-1-4049 and FA2386-19-1-4015), and JSPS Bilateral Joint Projects (with India DST).

Media Contact
Ryo Maezono
[email protected]

Original Source

https://www.jaist.ac.jp/english/

Related Journal Article

http://dx.doi.org/10.1002/adts.202000301

Tags: Chemistry/Physics/Materials Sciences, Mathematics/Statistics, Technology/Engineering/Computer Science

Bioengineer.org © Copyright 2023 All Rights Reserved.