
Staleness-aware semi-asynchronous federated learning

Abstract

As attempts to distribute deep learning over personal data have increased, so has the importance of federated learning (FL). Synchronous and asynchronous protocols have been used to address the core challenges of federated learning (i.e., statistical and system heterogeneity); however, stragglers reduce training efficiency in each protocol, in terms of latency and accuracy, respectively. To solve straggler issues, a semi-asynchronous protocol that combines the two can be applied to FL, but effectively handling the staleness of local models remains a difficult problem. We propose SASAFL to resolve the training inefficiency caused by staleness in semi-asynchronous FL. SASAFL enables stable training by considering the quality of the global model when synchronising the server and clients. In addition, it achieves high accuracy and low latency by adjusting the number of participating clients in response to changes in global loss and by immediately processing clients that did not participate in the previous round. An evaluation was conducted under various conditions to verify the effectiveness of SASAFL. SASAFL achieved 19.69% higher accuracy than the baseline, 2.32 times better round-to-accuracy, and 2.24 times better latency-to-accuracy. Moreover, SASAFL always reached target accuracies that the baseline could not.
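The core idea described above, down-weighting stale local updates when aggregating them into the global model, can be sketched as follows. This is an illustrative assumption, not the actual SASAFL algorithm from the thesis: the function names, the decay rule `alpha / (1 + staleness)`, and the flat-list model representation are all hypothetical.

```python
# Hypothetical sketch of staleness-aware aggregation in semi-asynchronous FL.
# Not the thesis's SASAFL algorithm; the decay rule and names are assumptions.

def staleness_weight(staleness, alpha=0.6):
    """Down-weight a client's update the more rounds its base model lags behind."""
    return alpha / (1 + staleness)

def aggregate(global_model, client_updates):
    """Merge local models into the global model, weighted by staleness.

    global_model   -- list of floats (flattened parameters)
    client_updates -- list of (local_model, staleness) pairs, where staleness
                      counts how many rounds behind the client's base model is
    """
    new_model = list(global_model)
    total_w = sum(staleness_weight(s) for _, s in client_updates)
    if total_w == 0:  # no updates arrived this round; keep the global model
        return new_model
    for local, s in client_updates:
        w = staleness_weight(s) / total_w  # normalised contribution
        for i, p in enumerate(local):
            new_model[i] += w * (p - global_model[i])
    return new_model

# A fresh update (staleness 0) dominates a three-rounds-stale one:
model = aggregate([0.0, 0.0], [([1.0, 0.0], 0), ([0.0, 1.0], 3)])
```

In this sketch a client that trained on the current global model (staleness 0) receives the full weight `alpha`, while a client three rounds behind contributes only `alpha / 4`; normalising by the weight sum keeps the aggregate a convex combination of the local updates.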


Table of Contents

1 Introduction 1
2 Background and related work 4
2.1 Synchronous FL 4
2.2 Asynchronous FL 5
2.3 Semi-asynchronous FL 8
3 Motivation 12
4 SASAFL: Staleness-Aware Semi-Asynchronous Federated Learning 16
4.1 Global model reception policy 20
4.1.1 Various types of clients 20
4.1.2 Details of global model reception policy 21
4.2 Adjusting number of participating clients 23
4.2.1 Details of adjusting number of participating clients 23
4.3 The SASAFL protocol 25
5 Experiment and evaluation 29
5.1 Experiment setup 29
5.1.1 Testbed 29
5.1.2 Benchmark 30
5.1.3 Model and dataset 30
5.1.4 Training parameters 30
5.1.5 Metrics 31
5.2 Experiment results 31
5.2.1 Lag tolerance 31
5.2.2 Training curve 32
5.2.3 Accuracy and latency performance 34
5.2.4 Limitation of SASAFL 35
6 Conclusion and future works 48
