
A Fast Convolution Algorithm and Accelerator for Convolutional Neural Networks

Abstract

Recent advances in computing power, driven by the development of faster general-purpose graphics processing units (GPGPUs), have increased the complexity of convolutional neural network (CNN) models. However, because of the limited applicability of existing GPGPUs, dedicated CNN accelerators are becoming more important. Current accelerators focus on improving memory scheduling and architectures, so the number of multiplier-accumulator (MAC) operations is not reduced. In this study, a new convolution layer algorithm based on a coarse-to-fine method is proposed instead of a hardware or architectural approach. The algorithm is shown to reduce MAC operations by 33%, while Top-1 accuracy decreases by only 3% and Top-5 accuracy by only 1%. Furthermore, the proposed hardware accelerator demonstrates higher performance, lower power consumption, and higher energy efficiency than other ASIC implementations except for [45]. Compared to the hardware accelerator of [45], the proposed accelerator achieves 1.7× higher performance, a 65% smaller on-chip memory, and a 20% lower gate count. Although the proposed accelerator has a larger gate count than the accelerator of [22], it demonstrates higher performance, lower power consumption, 1.7–1.8× better energy efficiency, and a smaller on-chip memory.
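The abstract does not spell out the two-step mechanism, so the following is a minimal sketch in Python of one plausible coarse-to-fine scheme, assuming the coarse step convolves with only the high-order bits of the quantized weights and the fine (residual) step is skipped whenever the coarse result predicts that ReLU will zero the output. The function name two_step_conv2d, the parameters msb_bits and total_bits, and the sign-based skip criterion are illustrative assumptions, not the thesis's actual algorithm or interface.

    import numpy as np

    def two_step_conv2d(x, w, msb_bits=4, total_bits=8):
        # Hypothetical coarse-to-fine convolution sketch; not the thesis's
        # verified method. x: 2-D input feature map, w: K x K kernel.
        H, W = x.shape
        K = w.shape[0]
        out_h, out_w = H - K + 1, W - K + 1

        # Quantize weights to signed total_bits integers, then split them
        # into a coarse (high-order) part and a residual (low-order) part.
        scale = np.max(np.abs(w)) / (2 ** (total_bits - 1) - 1)
        wq = np.round(w / scale).astype(np.int64)
        shift = total_bits - msb_bits
        w_msb = (wq >> shift) << shift   # coarse part: MSBs only
        w_lsb = wq - w_msb               # residual part: remaining LSBs

        out = np.zeros((out_h, out_w))
        skipped = 0
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + K, j:j + K]
                coarse = np.sum(patch * w_msb)   # step 1: coarse MACs only
                if coarse <= 0.0:
                    skipped += 1                 # ReLU would zero this output,
                    continue                     # so the fine MACs are skipped
                out[i, j] = (coarse + np.sum(patch * w_lsb)) * scale  # step 2
        return np.maximum(out, 0.0), skipped

For example, two_step_conv2d(np.random.rand(8, 8), np.random.randn(3, 3)) returns the ReLU output together with the number of output pixels whose fine MACs were skipped; under this assumed scheme, those skipped fine passes are where a MAC reduction such as the 33% reported in the abstract would come from.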


Table of Contents

I. Introduction 6
II. Overview of Convolutional Neural Networks 12
A. Overall Architecture 12
B. Convolution Layer 13
C. Pooling Layer 17
D. Convolutional Neural Networks 19
1. LeNet 19
2. AlexNet 21
3. VGG-16 24
4. GoogLeNet 27
III. Acceleration for Deep Neural Networks 32
A. Quantization and Binarization 32
B. Pruning and Sharing 32
C. Low-Rank Factorization and Sparsity 34
IV. Two-Step MAC Operation for Convolutional Layer 35
V. Architecture for Two-Step MAC Operation 50
A. Modified HCCA 50
B. Reconstruction of the Output Pixel Ordering 55
C. Overall Architecture 59
D. PEG Architecture 60
E. Temporary Feature Map 66
VI. Experimental Results 68
A. Algorithm Performance 68
B. Hardware Accelerator 72
VII. Conclusions 76
Bibliography 78
