검색 상세

Efficient Data Decomposition and Resource Allocation Method for TIN Parallel Processing of 3D LiDAR Data

초록/요약

The parallelization method has been introduced to solve the limitation of single computing unit in large scale data processing. There have been two common approach in applying this parallelization method i.e., batch and stream processing mechanism. As batch processing mechanism, the MapReduce become the most popular programming paradigm in order to achieve the simplicity of programming with having map and reduce procedure. In our work, we adopt MapReduce paradigm to a specific domain in the TIN parallelization of large scale LiDAR dataset. In order to enable efficient TIN parallelization of large scale dataset, we addressed the issues of data dependencies between parallel worker execution and the bottleneck in the shuffle phase of MapReduce. Here, we introduced triangulation approach with convex boundary to reduce the number of processed vertices in in each parallel worker. The convex boundary triangulation approach is a mechanism for choosing only a set of convex boundary triangles that located around the boundary area in the triangulation process. By thus, the parallel workers can work independently each other because the vertices that being processed in in the boundary area is not affected by other vertices subset areas. Hence, the better time performance can be achieved due to more efficient accessing number of vertices in the triangulation. In addition of our work, we propose resource allocation strategy to manage the usage of cache, memory, and disk in order to reduce communication bottleneck in the data transfer of shuffling process. Our strategy works by giving priority of resource usage to the task that having more fraction of data compare to the other tasks. By thus, the task that have more data in the shuffle process able to complete the task execution more efficient. In order to evaluate our work, we conduct an experiment with a number of LiDAR data set from NCALM Lab, Berkeley, USA. We evaluate the number of vertices that being processed both in our proposed method and other grid/bucket works as comparison. In addition, we measure the time improvement of our proposed method by varying the number of subset areas and scaling up the number of dataset. In the end of our evaluation, the simulation of proposed resource allocation strategy is provided in order to measure the performance improvement in the bottleneck phase.

more

목차

1. Introduction: Parallelization of large scale data processing 1
1.1. Motivation 1
1.1.1. Large scale data processing 1
1.1.2. Parallelization of large scale data processing 2
1.1.3. Data dependencies in parallel execution 6
1.1.4. Bottleneck in shuffle process 8
1.2. Thesis Summary 10
1.3. Thesis Contribution 11
1.4. Thesis Outline 11
2. Background and Related Works 12
2.1. LiDAR TIN processing for digital mapping 12
2.1.1. Triangulated Irregular Network (TIN) digital mapping 12
2.1.2. Delaunay Triangulation Properties 14
2.1.3. Iterative sequential TIN construction 16
2.1.4. Adjacent dependencies areas in TIN construction 18
2.2. TIN Parallelization method 21
2.3. Reducing network overhead in Shuffle processes 24
2.4. Cache Sharing Mechanism 25
3. Efficient Data Decomposition and Resource Allocation Method for TIN Parallel Processing of 3D LiDAR Data 28
3.1. Large scale TIN Parallelization 28
3.1.1. Regular parallelization method in MapReduce Style 28
3.1.2. Pre-processing LiDAR point cloud 31
3.1.3. Triangulation approach with convex boundary triangle 33
3.1.4. TIN Construction of parallel worker 38
3.1.5. Merging strategy in bucket/grid method 40
3.2. Resource Allocation 41
4. Implementation: Enabling TIN Parallelization of large scale data processing 48
4.1. Data dependencies task execution in parallel framework 48
4.2. Key selection for task execution 54
4.3. Task Monitoring Tools 57
5. Evaluation and Analysis 59
5.1. Experiment Setup 59
5.1.1. Experiment Environment 59
5.1.2. Experiment Data 60
5.2. Experiment Objective 61
5.3. Experiment Result 62
5.3.1. The efficiency of triangulation with convex boundary 62
5.3.2. Case Study for the proposed resource allocation methods 68
6. Conclusion and Future Work 73
6.1. Conclusion 73
6.2. Future Works 74
References 75

more