
EviDent-ML: A Collaborative Framework for Evidence-driven Data Requirements Engineering and Data Uncertainty Assessment for ML-based Critical Systems

Abstract/Summary

The recent trend of incorporating data-centric machine learning (ML) models into software-intensive systems poses new challenges for quality assurance in software engineering, especially in high-risk environments. ML experts are now focusing on explaining ML models to assure the safe behavior of ML-based systems. However, not enough attention has been paid to explaining the inherent uncertainty of the training data. As the training data is the foundation of the performance of critical ML models, we argue that, along with the functional and non-functional requirements of such models, the quality requirements of the training data deserve attention as well. The current practice of ML-based system engineering lacks a transparent, systematic process for assessing the fitness of the training data before rigorous ML model training begins. The lack of a written agreement about the training data, collaboration bottlenecks, and the lack of a data validation framework, among other issues, pose new challenges to ensuring training data fitness for safety-critical ML components. To mitigate these challenges, this research proposes EviDent-ML, a multilayered collaborative framework in which multiple experts elicit verifiable data requirements and evaluate data quality against the elicited requirements in an evidence-driven manner. The framework is designed to achieve two primary objectives. Firstly, it provides guidelines to explore the problem space (the problem to be solved) and the data space (the data that should be representative of the problem), and to derive data requirements. An evidence-based requirements specification template is proposed to ensure the verifiability of the requirements. Secondly, after the training data is collected, experts' confidence in the data quality is assessed based on evidence derived from exploratory data analysis of the collected training data. As it is unrealistic to derive a binary answer from such an assessment, Dempster-Shafer's theory of evidence is used to combine the experts' subjective opinions. Finally, a quality assurance case is prepared as an important artifact at the end of the entire process of data requirements elicitation and data quality assessment. We perform an empirical study involving 36 participants in total from industry and academia and apply the proposed framework to two domains: autonomous cars and smart grid. In the case of autonomous cars, a critical feature, 'Pedestrian detection', is explored, and a well-known benchmark dataset, CityPersons, is analyzed. In the case of smart grid, the main focus is the 'Load forecasting' feature. The results show that the proposed approach is more effective not only at eliciting data requirements but also at evaluating the experts' confidence in the quality of the training data. Moreover, with the help of the stepwise methodology based on the framework, it was possible to generate traceable artifacts more systematically than with the traditional approach. To help with the application of the Dempster-Shafer theory and confidence propagation in the assurance case, we develop a prototype of an automated support tool.
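
To make the belief-combination step concrete, the following is a minimal sketch of Dempster's rule of combination for two experts. It is an illustration only and not the thesis's tool: the frame of discernment `{"fit", "unfit"}`, the function names `combine` and `belief`, and the mass values are all assumptions chosen for the example, and none of the numbers come from the study.

```python
from itertools import product

# Hypothetical frame of discernment: is the training data fit for purpose?
FIT = frozenset({"fit"})
UNFIT = frozenset({"unfit"})
THETA = FIT | UNFIT  # the whole frame, i.e. "uncertain / don't know"


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps frozenset hypotheses to masses that sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of both hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint hypotheses contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Normalise by 1 - K so the combined masses sum to 1 again.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


def belief(m, hypothesis):
    """Belief in a hypothesis: total mass of all its non-empty subsets."""
    return sum(mass for h, mass in m.items() if h <= hypothesis)


# Illustrative (made-up) opinions of two experts about the data quality.
expert_1 = {FIT: 0.6, THETA: 0.4}
expert_2 = {FIT: 0.5, UNFIT: 0.2, THETA: 0.3}

joint = combine(expert_1, expert_2)
for hypothesis, mass in sorted(joint.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(hypothesis), round(mass, 4))
print("belief(fit) =", round(belief(joint, FIT), 4))
```

Assigning mass to the whole frame lets an expert express uncertainty explicitly instead of being forced into a binary fit/unfit answer, which is the motivation the abstract gives for using the theory of evidence.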


Table of Contents

I. Introduction
1.1 Motivation and Background
1.2 Systematic Literature Review and Gap Analysis
1.2.1 Review Method
1.2.2 Conceptual Framework for Safety-driven AI Systems Engineering
1.2.3 Discussion of the Research Questions
1.3 Problem Statement
1.4 Contribution
1.5 Scope
II. Related Work
2.1 RE & design guidelines for reliable ML
2.2 Training Data Quality Evaluation and Assurance
2.3 Data Quality Standards
2.4 Evidence-driven safety assurance
III. Preliminary Concepts
3.1 Twin Peak Models in the Days of AI
3.2 Problem Space and Data Space
3.2.1 Conceptualization
3.2.2 Blind Spots
3.3 Dempster-Shafer's Theory of Evidence
IV. EviDent-ML: The Proposed Multi-layered Collaborative Framework
4.1 Layer-1: Problem Layer
4.1.1 Step-1: Explore the operational domain (Dimension-1)
4.1.2 Step-2: Explore goals and requirements (Dimension-2)
4.1.3 Step-3: Explore risk factors (Dimension-3)
4.2 Layer-2: Data Layer
4.2.1 Step-4: Elicit and analyze data requirements in a goal-driven way
4.2.2 Step-5: Specify data requirements
4.3 Layer-3: Evidence Layer
4.3.1 Step-6: Evidence consolidation
4.3.2 Step-7: Belief Combination and Preparation of Quality Assurance Case
4.3.3 Combined Belief Mass Propagation
V. Tool Support
5.1 Download Templates
5.2 Collective Confidence Calculation
5.3 Detailed Belief Mass Analysis
VI. Theoretical Evaluation: Case Study Methodology
6.1 Case Study Design
6.2 Domain and Dataset Description
6.3 Overall evaluation
6.4 Case Study Results Analysis
6.4.1 RQ1: Addressing collaboration challenges
6.4.2 RQ2: Enhancing traceability
6.4.3 RQ3: Discovering blind spots and missing requirements
VII. Empirical Study
7.1 Study Questions and Hypothesis
7.2 Selected domain applications
7.2.1 Domain-1: Autonomous cars
7.2.2 Domain-2: Smart Grid
7.3 Feasibility Study
7.3.1 Participants
7.3.2 Study Design
7.3.3 Results
7.4 Replicated Study
7.4.1 Participants
7.4.2 Study Design
7.4.3 Results
7.5 Discussion
7.5.1 Reasons for improved outcome with proposed approach
7.5.2 Comparison with output of ChatGPT
VIII. Threats to Validity
8.1 Internal Validity
8.2 External Validity
8.3 Construct Validity
IX. Conclusions
9.1 Contributions
9.2 Future Scope
References
A. Appendix I
1.1 Systematic Literature Review
1.1.1 Search Strings Used
1.1.2 Search Strategies and Data Sources
1.1.3 Study Selection Process
1.1.4 Study Quality Assessment
1.1.5 Challenges and Research Efforts at Each Layer
B. Appendix II
2.1 Sample Results of Feasibility Study
2.1.1 Data Description
2.1.2 Traditional Approach (Group 8)
2.1.3 Proposed Approach (Group 12)
C. Appendix III
3.1 List of Publications
