
A Fast Convolution Algorithm and Accelerator for Convolutional Neural Networks

Abstract

Recent advances in computing power, driven by the development of faster general-purpose graphics processing units (GPGPUs), have increased the complexity of convolutional neural network (CNN) models. However, because of the limited applicability of existing GPGPUs, dedicated CNN accelerators are becoming more important. Current accelerators focus on improving memory scheduling and architectures, so the number of multiplier-accumulator (MAC) operations is not reduced. In this study, a new convolution layer algorithm based on a coarse-to-fine method is proposed instead of a hardware or architectural approach. The algorithm is shown to reduce MAC operations by 33%, while Top-1 accuracy decreases by only 3% and Top-5 accuracy by only 1%. Furthermore, the proposed hardware accelerator demonstrates higher performance, lower power consumption, and higher energy efficiency than other ASIC implementations except for [45]. Compared to the hardware accelerator of [45], the proposed accelerator achieves 1.7× higher performance, a 65% smaller on-chip memory, and a 20% lower gate count. Although the proposed accelerator has a larger gate count than the accelerator of [22], it demonstrates higher performance, lower power consumption, 1.7–1.8× better energy efficiency, and a smaller on-chip memory.
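The abstract does not spell out the two-step mechanism, so the following is a minimal sketch in Python of one plausible coarse-to-fine scheme, assuming the coarse step convolves with only the high-order bits of the quantized weights and the fine (residual) step is skipped whenever the coarse result predicts that ReLU will zero the output. The function name two_step_conv2d, the parameters msb_bits and total_bits, and the sign-based skip criterion are illustrative assumptions, not the thesis's actual algorithm or interface.

    import numpy as np

    def two_step_conv2d(x, w, msb_bits=4, total_bits=8):
        # Hypothetical coarse-to-fine convolution sketch; not the thesis's
        # verified method. x: 2-D input feature map, w: K x K kernel.
        H, W = x.shape
        K = w.shape[0]
        out_h, out_w = H - K + 1, W - K + 1

        # Quantize weights to signed total_bits integers, then split them
        # into a coarse (high-order) part and a residual (low-order) part.
        scale = np.max(np.abs(w)) / (2 ** (total_bits - 1) - 1)
        wq = np.round(w / scale).astype(np.int64)
        shift = total_bits - msb_bits
        w_msb = (wq >> shift) << shift   # coarse part: MSBs only
        w_lsb = wq - w_msb               # residual part: remaining LSBs

        out = np.zeros((out_h, out_w))
        skipped = 0
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + K, j:j + K]
                coarse = np.sum(patch * w_msb)   # step 1: coarse MACs only
                if coarse <= 0.0:
                    skipped += 1                 # ReLU would zero this output,
                    continue                     # so the fine MACs are skipped
                out[i, j] = (coarse + np.sum(patch * w_lsb)) * scale  # step 2
        return np.maximum(out, 0.0), skipped

For example, two_step_conv2d(np.random.rand(8, 8), np.random.randn(3, 3)) returns the ReLU output together with the number of output pixels whose fine MACs were skipped; under this assumed scheme, those skipped fine passes are where a MAC reduction such as the 33% reported in the abstract would come from.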


Table of Contents

I. Introduction 6
II. Overview of Convolutional Neural Networks 12
A. Overall Architecture 12
B. Convolution Layer 13
C. Pooling Layer 17
D. Convolutional Neural Networks 19
1. LeNet 19
2. AlexNet 21
3. VGG-16 24
4. GoogLeNet 27
III. Acceleration for Deep Neural Networks 32
A. Quantization and Binarization 32
B. Pruning and Sharing 32
C. Low-Rank Factorization and Sparsity 34
IV. Two-Step MAC Operation for Convolutional Layer 35
V. Architecture for Two-Step MAC Operation 50
A. Modified HCCA 50
B. Reconstruction of the Output Pixel Ordering 55
C. Overall Architecture 59
D. PEG Architecture 60
E. Temporary Feature Map 66
VI. Experimental Results 68
A. Algorithm Performance 68
B. Hardware Accelerator 72
VII. Conclusions 76
Bibliography 78
