
A Study on Autonomous Flight of a Quadcopter Drone Using Sensor Fusion and Reinforcement Learning Techniques

Abstract

This paper presents research on autonomous drone flight based on sensor fusion and reinforcement learning.

1. Localization. Outdoor drones typically use an IMU and a GPS sensor for localization, and each has its pros and cons: the IMU updates quickly but is noisy, while GPS provides stable data over time but updates slowly and depends on the environment. Sensor fusion techniques such as the Kalman Filter, the Extended Kalman Filter (EKF), and the Unscented Kalman Filter (UKF) therefore improve localization performance. In traditional filters, however, the sensor covariances remain fixed at their initial values. The algorithm proposed in this study adapts the sensor covariances to the environment, using a fuzzy system to adjust the covariance values.

2. Navigation. Navigation is based on reinforcement learning. Among the many reinforcement learning algorithms, this paper uses PPO, a policy-based algorithm; environments with countless variables suit policy-based methods such as PPO, which performs strongly despite its simple logic. The overall flow is as follows. First, the current position is estimated using the dynamic EKF and 2D lidar data. Second, the drone's action is obtained from the policy neural network, which takes the estimated pose and the 2D lidar data as input. Third, data such as pose estimates and GAE values are collected until the batch is full, and then the policy network and the value network are updated. Fourth, this process is repeated until the drone reaches the goal position or the epoch limit is exceeded.

The experiments conclude that the dynamic EKF and the PPO algorithm outperform the alternatives. For pose estimation, the difference between the ground truth and the dynamic EKF estimate is the smallest among the tested filters, and the computing time is low enough for real-time use. The PPO agent reaches the goal position in fewer epochs than a DQN agent; in particular, the PPO model tuned with grid search reached the goal position in only 9 trials. This shows that reinforcement learning enables adaptive navigation in an unknown environment.
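The adaptive-covariance idea in the first part of the abstract can be illustrated in code. Below is a minimal sketch, assuming a simple constant-velocity state model and a triangular-membership fuzzy rule driven by the size of the GPS innovation; the state model, fuzzy inputs, membership functions, and noise values are illustrative assumptions, not the filter defined in the thesis.

```python
# Minimal sketch (not the thesis implementation): a Kalman-style update in
# which the GPS measurement covariance R is rescaled each step by a small
# fuzzy-like rule, so the filter trusts GPS less when its innovation is large.
import numpy as np

def fuzzy_gps_scale(innovation_norm, low=0.5, high=3.0):
    """Map GPS innovation magnitude to a covariance scale factor.

    Two triangular memberships ("small" and "large" innovation) are blended:
    small innovation -> scale ~0.5 (trust GPS), large -> scale ~4 (distrust).
    Thresholds and output levels are illustrative assumptions.
    """
    x = np.clip((innovation_norm - low) / (high - low), 0.0, 1.0)
    mu_small, mu_large = 1.0 - x, x                 # membership degrees
    return (mu_small * 0.5 + mu_large * 4.0) / (mu_small + mu_large)

class DynamicEKF:
    def __init__(self, dt=0.01):
        self.x = np.zeros(4)                        # state: [px, py, vx, vy]
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.Q = 1e-3 * np.eye(4)                   # process noise (IMU-driven)
        self.H = np.eye(2, 4)                       # GPS observes position only
        self.R0 = 2.0 * np.eye(2)                   # nominal GPS covariance

    def predict(self, accel, dt=0.01):
        """Propagate the state with IMU acceleration (linearized for brevity)."""
        self.x[2:] += accel * dt
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update_gps(self, z):
        """Correct with a GPS position fix using the fuzzy-adapted covariance."""
        y = z - self.H @ self.x                     # innovation
        R = fuzzy_gps_scale(np.linalg.norm(y)) * self.R0
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```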
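The navigation flow in the second part of the abstract (policy network over the estimated pose and 2D lidar, GAE over a collected batch, then a clipped PPO update of the policy and value networks) corresponds to the following sketch. The observation layout, network sizes, hyperparameters, and the use of PyTorch are assumptions for illustration and do not reproduce the thesis configuration.

```python
# Minimal PPO-with-GAE sketch: the observation is assumed to be the dynamic-EKF
# pose estimate [x, y, yaw] concatenated with a 360-beam 2D lidar scan, and the
# action is a discrete motion command. Requires PyTorch.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 3 + 360, 4                     # assumed shapes

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.Tanh())
        self.pi = nn.Linear(128, N_ACTIONS)         # policy head
        self.v = nn.Linear(128, 1)                  # value head

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Categorical(logits=self.pi(h)), self.v(h)

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one collected batch.

    `values` must hold len(rewards) + 1 entries (bootstrap value appended).
    """
    adv, last = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_v = 0.0 if dones[t] else values[t + 1]
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * (0.0 if dones[t] else last)
        adv[t] = last
    return torch.tensor(adv, dtype=torch.float32)

def ppo_update(net, opt, obs, actions, old_logp, adv, returns,
               clip=0.2, epochs=4):
    """Clipped-surrogate PPO update of the policy and value heads.

    `old_logp` are the (detached) log-probabilities recorded while collecting
    the batch; `opt` is e.g. torch.optim.Adam(net.parameters(), lr=3e-4).
    """
    for _ in range(epochs):
        dist, value = net(obs)
        ratio = torch.exp(dist.log_prob(actions) - old_logp)
        clipped = torch.clamp(ratio, 1 - clip, 1 + clip)
        policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
        value_loss = (returns - value.squeeze(-1)).pow(2).mean()
        opt.zero_grad()
        (policy_loss + 0.5 * value_loss).backward()
        opt.step()
```

In this sketch the outer loop described in the abstract would repeatedly roll out the policy, compute `gae`, call `ppo_update`, and stop when the agent reaches the goal or the epoch limit is hit.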


Table of Contents

Chapter 1 Introduction
1.1 Research Background
1.2 Organization of the Thesis
Chapter 2 Theory
2.1 Filter Theory for Sensor Fusion
2.1.1 Kalman Filter
2.1.2 Extended Kalman Filter
2.1.3 Unscented Kalman Filter
2.2 Reinforcement Learning
2.2.1 Markov Decision Process
2.2.2 Value-Based Reinforcement Learning
2.2.3 Policy-Based Reinforcement Learning
Chapter 3 Design and Implementation
3.1 Pose Estimation Using Sensor Fusion
3.1.1 IMU Inertial Measurement System
3.1.2 GNSS Satellite Navigation System
3.1.3 Pose Estimation Using the Extended Kalman Filter
3.1.4 Pose Estimation Using the Unscented Kalman Filter
3.1.5 Dynamic EKF Based on Fuzzy
3.2 Reinforcement-Learning-Based Autonomous Flight System
3.2.1 Implementation of Autonomous Flight Using PPO
Chapter 4 Experiments and Results
4.1 Experimental Environment
4.1.1 ROS (Robot Operating System)
4.1.2 Simulation System
4.2 Experimental Results
4.2.1 Performance Analysis of Sensor Fusion
4.2.2 Performance Analysis of the PPO Reinforcement Learning Model
Chapter 5 Conclusion and Future Work
References
