
A Study on Autonomous Flight of a Quadcopter Drone Using Sensor Fusion and Reinforcement Learning Techniques

Abstract

This paper presents research on autonomous drone flight based on sensor fusion and reinforcement learning.

1. Localization. Outdoor drones typically use an IMU and a GPS sensor for localization, and each has its pros and cons: the IMU updates quickly but is noisy, while GPS provides stable data over time but updates slowly and depends on the environment. Sensor fusion techniques such as the Kalman Filter, the Extended Kalman Filter (EKF), and the Unscented Kalman Filter (UKF) therefore improve localization performance. In traditional filters, however, the sensor covariances remain fixed at their initial values. The algorithm proposed in this study adapts the sensor covariances to the environment, using a fuzzy system to adjust the covariance values.

2. Navigation. Navigation is based on reinforcement learning. Among the many reinforcement learning algorithms, this paper uses PPO, a policy-based algorithm; environments with countless variables suit policy-based methods such as PPO, which performs strongly despite its simple logic. The overall flow is as follows. First, the current position is estimated using the dynamic EKF and 2D lidar data. Second, the drone's action is obtained from the policy neural network, which takes the estimated pose and the 2D lidar data as input. Third, data such as pose estimates and GAE values are collected until the batch is full, and then the policy network and the value network are updated. Fourth, this process is repeated until the drone reaches the goal position or the epoch limit is exceeded.

The experiments conclude that the dynamic EKF and the PPO algorithm outperform the alternatives. For pose estimation, the difference between the ground truth and the dynamic EKF estimate is the smallest among the tested filters, and the computing time is low enough for real-time use. The PPO agent reaches the goal position in fewer epochs than a DQN agent; in particular, the PPO model tuned with grid search reached the goal position in only 9 trials. This shows that reinforcement learning enables adaptive navigation in an unknown environment.
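The adaptive-covariance idea in the first part of the abstract can be illustrated in code. Below is a minimal sketch, assuming a simple constant-velocity state model and a triangular-membership fuzzy rule driven by the size of the GPS innovation; the state model, fuzzy inputs, membership functions, and noise values are illustrative assumptions, not the filter defined in the thesis.

```python
# Minimal sketch (not the thesis implementation): a Kalman-style update in
# which the GPS measurement covariance R is rescaled each step by a small
# fuzzy-like rule, so the filter trusts GPS less when its innovation is large.
import numpy as np

def fuzzy_gps_scale(innovation_norm, low=0.5, high=3.0):
    """Map GPS innovation magnitude to a covariance scale factor.

    Two triangular memberships ("small" and "large" innovation) are blended:
    small innovation -> scale ~0.5 (trust GPS), large -> scale ~4 (distrust).
    Thresholds and output levels are illustrative assumptions.
    """
    x = np.clip((innovation_norm - low) / (high - low), 0.0, 1.0)
    mu_small, mu_large = 1.0 - x, x                 # membership degrees
    return (mu_small * 0.5 + mu_large * 4.0) / (mu_small + mu_large)

class DynamicEKF:
    def __init__(self, dt=0.01):
        self.x = np.zeros(4)                        # state: [px, py, vx, vy]
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.Q = 1e-3 * np.eye(4)                   # process noise (IMU-driven)
        self.H = np.eye(2, 4)                       # GPS observes position only
        self.R0 = 2.0 * np.eye(2)                   # nominal GPS covariance

    def predict(self, accel, dt=0.01):
        """Propagate the state with IMU acceleration (linearized for brevity)."""
        self.x[2:] += accel * dt
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update_gps(self, z):
        """Correct with a GPS position fix using the fuzzy-adapted covariance."""
        y = z - self.H @ self.x                     # innovation
        R = fuzzy_gps_scale(np.linalg.norm(y)) * self.R0
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```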
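The navigation flow in the second part of the abstract (policy network over the estimated pose and 2D lidar, GAE over a collected batch, then a clipped PPO update of the policy and value networks) corresponds to the following sketch. The observation layout, network sizes, hyperparameters, and the use of PyTorch are assumptions for illustration and do not reproduce the thesis configuration.

```python
# Minimal PPO-with-GAE sketch: the observation is assumed to be the dynamic-EKF
# pose estimate [x, y, yaw] concatenated with a 360-beam 2D lidar scan, and the
# action is a discrete motion command. Requires PyTorch.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 3 + 360, 4                     # assumed shapes

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.Tanh())
        self.pi = nn.Linear(128, N_ACTIONS)         # policy head
        self.v = nn.Linear(128, 1)                  # value head

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Categorical(logits=self.pi(h)), self.v(h)

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one collected batch.

    `values` must hold len(rewards) + 1 entries (bootstrap value appended).
    """
    adv, last = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_v = 0.0 if dones[t] else values[t + 1]
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * (0.0 if dones[t] else last)
        adv[t] = last
    return torch.tensor(adv, dtype=torch.float32)

def ppo_update(net, opt, obs, actions, old_logp, adv, returns,
               clip=0.2, epochs=4):
    """Clipped-surrogate PPO update of the policy and value heads.

    `old_logp` are the (detached) log-probabilities recorded while collecting
    the batch; `opt` is e.g. torch.optim.Adam(net.parameters(), lr=3e-4).
    """
    for _ in range(epochs):
        dist, value = net(obs)
        ratio = torch.exp(dist.log_prob(actions) - old_logp)
        clipped = torch.clamp(ratio, 1 - clip, 1 + clip)
        policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
        value_loss = (returns - value.squeeze(-1)).pow(2).mean()
        opt.zero_grad()
        (policy_loss + 0.5 * value_loss).backward()
        opt.step()
```

In this sketch the outer loop described in the abstract would repeatedly roll out the policy, compute `gae`, call `ppo_update`, and stop when the agent reaches the goal or the epoch limit is hit.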


Table of Contents

Chapter 1 Introduction
1.1 Research Background
1.2 Organization of the Thesis
Chapter 2 Theory
2.1 Filter Theory for Sensor Fusion
2.1.1 Kalman Filter
2.1.2 Extended Kalman Filter
2.1.3 Unscented Kalman Filter
2.2 Reinforcement Learning
2.2.1 Markov Decision Process
2.2.2 Value-Based Reinforcement Learning
2.2.3 Policy-Based Reinforcement Learning
Chapter 3 Design and Implementation
3.1 Pose Estimation Using Sensor Fusion
3.1.1 IMU Inertial Measurement System
3.1.2 GNSS Satellite Navigation System
3.1.3 Pose Estimation Using the Extended Kalman Filter
3.1.4 Pose Estimation Using the Unscented Kalman Filter
3.1.5 Dynamic EKF Based on Fuzzy
3.2 Reinforcement-Learning-Based Autonomous Flight System
3.2.1 Implementation of Autonomous Flight Using PPO
Chapter 4 Experiments and Results
4.1 Experimental Environment
4.1.1 ROS (Robot Operating System)
4.1.2 Simulation System
4.2 Experimental Results
4.2.1 Performance Analysis of Sensor Fusion
4.2.2 Performance Analysis of the PPO Reinforcement Learning Model
Chapter 5 Conclusion and Future Work
References
