High-performance data processing technology through a new database partitioning method

DGIST developed a new graph-based database partitioning method, and its system implementation showed 4.2 times faster performance on average than Apache Spark SQL

Credit: ©DGIST

DGIST has developed a core technology that supports fast and efficient large-scale data analysis and could have a major impact on the field in the near future.

DGIST announced on May 21st that Professor Min-Soo Kim’s team in the Department of Information and Communication Engineering has developed a data management and processing technique for relational databases called ‘GPT (Graph-based Partitioning Table) technology.’ GPT technology delivers more than 4 times faster query performance on average than the widely used Spark SQL system and can be applied to various areas that require fast join processing.

Relational databases are widely used in various fields. As a relational database grows, multiple machines are used to store the data, with each node managing a part of it. Each part is called a “partition” and is generated by splitting the input data into a number of individual partitions. ‘Apache Spark SQL’ is a widely used parallel query processing system for relational databases. Although a number of query processing technologies have been developed, they require expensive network communication among machines to process large-scale data.
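To make the communication cost concrete, the following is a minimal Python sketch, not the team's implementation. It assumes a hypothetical `orders` table, a simple hash-modulo partitioning function, and a join on `customer_id` against a table that is already partitioned on that column with the same function. All names here (`partition_of`, `hash_partition`, `rows_to_shuffle`) are illustrative only.

```python
from collections import defaultdict


def partition_of(value, num_partitions):
    # Assumption: keys are small non-negative integers, which hash to
    # themselves in CPython, so the assignment is deterministic.
    return hash(value) % num_partitions


def hash_partition(rows, key, num_partitions):
    """Group rows by the partition that their `key` column maps to."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_of(row[key], num_partitions)].append(row)
    return partitions


def rows_to_shuffle(partitions, join_key, num_partitions):
    """Count rows stored on a different partition than their join key maps to."""
    return sum(
        1
        for pid, rows in partitions.items()
        for row in rows
        if partition_of(row[join_key], num_partitions) != pid
    )


# Hypothetical toy table: six orders placed by two customers.
orders = [
    {"order_id": i, "customer_id": c}
    for i, c in enumerate([0, 1, 1, 0, 1, 0])
]

by_order = hash_partition(orders, "order_id", 2)
by_customer = hash_partition(orders, "customer_id", 2)

# Rows that would have to cross the network for a join on customer_id.
shuffled_if_by_order = rows_to_shuffle(by_order, "customer_id", 2)
shuffled_if_by_customer = rows_to_shuffle(by_customer, "customer_id", 2)
print(shuffled_if_by_order, shuffled_if_by_customer)
```

In this toy setting, partitioning on the join column leaves no rows to move, while partitioning on an unrelated column forces most rows to be sent to another machine before the join can run.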

To overcome this performance issue, Professor Min-Soo Kim’s team studied a more efficient way to manage and process large-scale relational databases in parallel and distributed environments. The team developed GPT technology, an efficient database partitioning method for relational databases that can eliminate expensive network communication among machines during query processing, thereby resolving a critical issue in database partitioning and in parallel and distributed query processing.

GPT technology uses a graph-theoretic view to model co-partitioning relationships among relational tables. Each table to be partitioned is modeled as a vertex, and a co-partitioning relationship (or join predicate) between two tables is represented as an edge; some tables are replicated across machines. To decide which tables to partition, GPT technology exploits the concept of a hub vertex so that tables adjacent to the same hub table are co-partitioned. As a result, query processing over co-partitioned tables does not require network communication.
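The graph view described above can be illustrated with a small Python sketch. It is not the published GPT algorithm. The schema is hypothetical, and choosing the hub as the highest-degree vertex is an assumption made only for illustration, since the description above does not say how hubs are selected. The sketch also leaves open how the hub table itself is stored and which tables are replicated.

```python
from collections import defaultdict

# Hypothetical schema: each edge is (table_a, table_b, join_column).
JOIN_EDGES = [
    ("sales", "item", "item_id"),
    ("sales", "customer", "customer_id"),
    ("sales", "store", "store_id"),
    ("returns", "item", "item_id"),
]


def build_join_graph(edges):
    """Tables are vertices; each join predicate is an undirected edge."""
    graph = defaultdict(dict)
    for table_a, table_b, column in edges:
        graph[table_a][table_b] = column
        graph[table_b][table_a] = column
    return graph


def pick_hub(graph):
    # Assumption for illustration: the hub is the table with the most
    # join edges, with ties broken alphabetically.
    return min(graph, key=lambda table: (-len(graph[table]), table))


def co_partition_plan(graph, hub):
    """Map each table adjacent to the hub to the column it joins the hub on."""
    return dict(sorted(graph[hub].items()))


graph = build_join_graph(JOIN_EDGES)
hub = pick_hub(graph)
plan = co_partition_plan(graph, hub)
# Tables that are neither the hub nor adjacent to it are not covered here.
uncovered = sorted(set(graph) - {hub} - set(plan))

print("hub:", hub)
print("co-partitioned neighbors:", plan)
print("not covered by this hub:", uncovered)
```

With this toy schema, `sales` becomes the hub and `item`, `customer`, and `store` are each assigned the column they share with it, while `returns` is left for another hub or for replication.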

The GPT technology developed by Professor Min-Soo Kim’s team achieves 4.2 times faster performance on average than Apache Spark SQL on the TPC-DS database and queries, an industry-standard benchmark. In addition, GPT technology is not merely a theoretical result; it can be used as an optimization technique for large-scale data processing in the real world.

Professor Min-Soo Kim of the DGIST Department of Information and Communication Engineering explained, “As there has been huge interest in fast and efficient large-scale data processing since the 2010s, we have focused on studying this issue. We expect that the technology for processing relational data that we developed in this research will be very useful in the future as data becomes larger and more complex.”

###

This research was co-conducted with Ph.D. candidate Yoon-Min Nam in the Department of Information and Communication Engineering as the first author and was published in the April issue of ‘Information Sciences,’ a world-renowned international journal.

Media Contact
Min-Soo Kim
[email protected]

https://www.dgist.ac.kr/en/html/sub06/060202.html?mode=V&no=c678e85ac47c3981b86f080b1bf3892d&GotoPage=1

http://dx.doi.org/10.1016/j.ins.2018.12.031
