Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction

Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction
Author :
Publisher :
Total Pages : 294
Release :
ISBN-10 : OCLC:1251936666
ISBN-13 :
Rating : 4/5 ( Downloads)

Book Synopsis Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction by : Shuang Song (Ph. D.)

Download or read book Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction written by Shuang Song (Ph. D.) and published by . This book was released on 2020 with total page 294 pages. Available in PDF, EPUB and Kindle. Book excerpt: The amount of data generated every day is growing exponentially in the big data era. A significant portion of this data is stored as graphs in various domains, such as online retail and social networks. Analyzing large-scale graphs provides important insights that are highly utilized in many areas, such as recommendation systems, banking systems, and medical diagnosis. To accommodate analysis on large-scale graphs, developers from industry and academia design the distributed graph processing systems. However, processing graphs in a distributed manner suffers performance inefficiencies caused by workload imbalance and redundant computations. For instance, while data centers are trending towards a large amount of heterogeneous processing machines, graph partitioners still operate under the assumption of uniform computing resources. This leads to load imbalance that degrades the overall performance. Even with a balanced data distribution, the irregularity of graph applications can result in different amounts of dynamic operations on each machine in the cluster. Such imbalanced work distribution slows down the execution speed. Besides these, redundancy also impacts the performance of distributed graph analysis. To utilize the available parallelism of computing clusters, distributed graph systems deploy the repeated-relaxing computation model (e.g., Bellman-Ford algorithm variants) rather than in a sequential but work-optimal order. Studies performed in this dissertation show that redundant computations pervasively exist and significantly impact the performance efficiency of distributed graph processing. This dissertation explores novel techniques to reduce the workload imbalance and redundant computations of analyzing large-scale graphs in a distributed setup. It evaluates proposed techniques on both pre-processing and execution modules to enable fair data distribution, lightweight workload balancing, and redundancy optimization for future distributed graph processing systems. The first contribution of this dissertation is the Heterogeneity-aware Partitioning (HAP) that aims to balance load distribution of distributed graph processing in heterogeneous clusters. HAP proposes a number of methodologies to estimate various machines’ computational power on graph analytics. It also extends several state-of-the-art partitioning algorithms for heterogeneity-aware data distribution. Utilizing the estimation of machines’ graph processing capability to guide extended partitioning algorithms can reduce load imbalance when processing a large-scale graph in heterogeneous clusters. This results in significant performance improvement. Another contribution of the dissertation is the Hula, which optimizes the workload balance of distributed graph analytics on the fly. Hula offers a hybrid graph partitioning algorithm to split a large-scale graph in a locality-friendly manner and generate metadata for lightweight dynamic workload balancing. To track machines’ work intensity, Hula inserts hardware timers to count the time spent on the important operations (e.g., computational operations and atomic operations). This information can guide Hula’s workload scheduler to arrange work migration. With the support of metadata generated by the hybrid partitioner, Hula’s migration scheme only requires a minimal amount of data to transfer work between machines in the cluster. Hula’s workload balancing design achieves a lightweight imbalance reduction on the fly. Finally, this dissertation focuses on improving the computational efficiency of distributed graph processing. To do so, it reveals the root cause and the amount of redundant computations in distributed graph processing. SLFE is proposed as a system solution to reduce these redundant operations. SLFE develops a lightweight pre-processing technique to obtain the maximum propagation order of each vertex in a given graph. This information is defined as Redundancy Reduction Guidance (RRG) and is utilized by SLFE’s Redundancy Reduction (RR)-aware computing model to prune redundant operations on the fly. Moreover, SLFE provides RRaware APIs to maintain high promgrammablity. These techniques allow the redundancy optimizations of distributed graph processing to become transparent to users


Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction Related Books

Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction
Language: en
Pages: 294
Authors: Shuang Song (Ph. D.)
Categories:
Type: BOOK - Published: 2020 - Publisher:

DOWNLOAD EBOOK

The amount of data generated every day is growing exponentially in the big data era. A significant portion of this data is stored as graphs in various domains,
On Improving Distributed Pregel-like Graph Processing Systems
Language: en
Pages:
Authors: Minyang Han
Categories:
Type: BOOK - Published: 2015 - Publisher:

DOWNLOAD EBOOK

Load Balance For Distributed Real-time Computing Systems
Language: en
Pages: 259
Authors: Junhua Fang
Categories: Computers
Type: BOOK - Published: 2020-05-19 - Publisher: World Scientific

DOWNLOAD EBOOK

This illustrative compendium analyzes the load balancing problem in distributed stream processing systems and explores a set of high-performance real-time proce
A FRAMEWORK FOR SCALABLE DISTRIBUTED JOB PROCESSING WITH DYNAMIC LOAD BALANCING USING DECENTRALIZED APPROACH
Language: en
Pages: 97
Authors: Dr P. SrinivasaRao
Categories: Education
Type: BOOK - Published: 2017-12-30 - Publisher: Lulu.com

DOWNLOAD EBOOK

A distributed system consists of many heterogeneous processors with different processing power and all processors are interconnected with a communication channe
Optimal Load Balancing in Distributed Computer Systems
Language: en
Pages: 262
Authors: Hisao Kameda
Categories: Technology & Engineering
Type: BOOK - Published: 2012-12-06 - Publisher: Springer Science & Business Media

DOWNLOAD EBOOK

An important consideration in improving the performance of a distributed computer system is the balancing of the load between the host computers. Load balancing