Segmentation-guided Planar Masking for Improving Diffusion-based Monocular Depth Estimation
- Subject (keywords) Monocular depth estimation, Diffusion model, Structural consistency, Pretrained guidance, Planar mask, Denoising guidance
- Subject (DDC) 006.31
- Publisher Ajou University Graduate School
- Advisor Kyung-Ah Sohn
- Publication year 2026
- Degree conferral date 2026. 2
- Degree Master's
- Department and major Department of Artificial Intelligence, Graduate School
- URI http://www.dcollection.net/handler/ajou/000000035619
- Language English
- Copyright Ajou University theses are protected by copyright.
Abstract
Diffusion-based monocular depth models such as Marigold produce depth maps with strong global structure, yet still exhibit characteristic artifacts on large planar surfaces in indoor scenes, which in turn degrade geometric layout recovery. This thesis proposes a planar-guided, training-free framework that augments a pre-trained diffusion-based depth model with an inference-time guidance energy. The energy is defined on intermediate depth predictions and uses plane masks predicted by the ZeroPlane segmentation model. It combines a plane-smoothing term and a detail-preservation term, both evaluated inside the diffusion loop; their gradients are back-propagated to the latent variables, enforcing locally consistent depth on planar regions while preserving boundaries and fine structures, without modifying model weights. Experiments on the NYU Depth V2 test split, equipped with ZeroPlane-predicted plane masks, show consistent improvements over Marigold on standard depth metrics, with relatively larger gains on planar pixels and modest but stable improvements on non-planar regions. These results demonstrate that injecting planar priors into the sampling process enhances the structural consistency of diffusion-based monocular depth estimation without retraining.
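The idea of a guidance energy with a plane-smoothing term and a detail-preservation term can be illustrated with a small sketch. This is not the thesis's formulation, which is not given in this record. It is a simplified stand-in under stated assumptions: depth is a single 1D scanline rather than an image, the plane term penalizes squared second differences inside the mask, the detail term penalizes deviation from a reference prediction outside the mask, and the gradient step is applied directly to depth values rather than back-propagated to diffusion latents. The names `guidance_energy`, `guidance_gradient`, `guided_step`, `w_plane`, `w_detail`, and `step` are all hypothetical.

```python
import random


def guidance_energy(depth, ref, mask, w_plane=1.0, w_detail=1.0):
    """Hypothetical energy on a 1D depth scanline (an assumption, not the thesis's)."""
    n = len(depth)
    e_plane = 0.0
    for i in range(1, n - 1):
        # Penalize curvature only where the whole 3-pixel stencil is planar.
        if mask[i - 1] and mask[i] and mask[i + 1]:
            e_plane += (depth[i - 1] - 2.0 * depth[i] + depth[i + 1]) ** 2
    # Outside the mask, stay close to the reference prediction.
    e_detail = sum((depth[i] - ref[i]) ** 2 for i in range(n) if not mask[i])
    return w_plane * e_plane + w_detail * e_detail


def guidance_gradient(depth, ref, mask, w_plane=1.0, w_detail=1.0):
    """Analytic gradient of guidance_energy with respect to depth."""
    n = len(depth)
    grad = [0.0] * n
    for i in range(1, n - 1):
        if mask[i - 1] and mask[i] and mask[i + 1]:
            r = depth[i - 1] - 2.0 * depth[i] + depth[i + 1]
            grad[i - 1] += 2.0 * w_plane * r
            grad[i] -= 4.0 * w_plane * r
            grad[i + 1] += 2.0 * w_plane * r
    for i in range(n):
        if not mask[i]:
            grad[i] += 2.0 * w_detail * (depth[i] - ref[i])
    return grad


def guided_step(depth, ref, mask, step=0.02):
    """One gradient-descent step on the energy (applied to depth, not latents)."""
    grad = guidance_gradient(depth, ref, mask)
    return [d - step * g for d, g in zip(depth, grad)]


# Usage: a noisy linear ramp with a planar region in the middle.
random.seed(0)
n = 32
mask = [8 <= i < 24 for i in range(n)]
ref = [1.0 + 0.05 * i + random.gauss(0.0, 0.02) for i in range(n)]
depth = list(ref)
energy_before = guidance_energy(depth, ref, mask)
for _ in range(100):
    depth = guided_step(depth, ref, mask)
energy_after = guidance_energy(depth, ref, mask)
```

In this toy setting the energy drops as curvature inside the masked region is smoothed out, while pixels outside the mask stay at their reference values. In a diffusion sampler the corresponding gradient would instead be taken with respect to the latent at each guided denoising step, which would require automatic differentiation through the decoder.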
Table of Contents
1. Introduction
2. Related Work
2.1 Monocular Depth Estimation in Indoor Scenes
2.2 Diffusion-based Approaches for Depth Estimation
2.3 Plane-Aware Depth Estimation and Planar Priors
2.4 Segmentation Models and Planar Guidance Sources
3. Methodology
3.1 Dataset
3.2 Planar-Guided Diffusion-based Depth Model
3.2.1 Planar masks and overall idea
3.2.2 Plane-smoothing term
3.2.3 Detail-preservation term
3.2.4 Time-dependent balance
3.2.5 Guidance update inside the diffusion loop
3.3 Qualitative Visualization and Error Analysis
4. Experiments
4.1 Experimental Setup
4.2 Quantitative Results
4.3 Qualitative Results
4.4 Ablation Study
4.4.1 Effect of start in the plane-focused schedule
4.4.2 Plane-focused versus balanced versus detail-focused schedules
4.4.3 Influence of guidance strength
5. Conclusion
Discussion and Future Work
Reference

