TY - GEN
T1 - Proteus
T2 - 12th IEEE International Conference on Joint Cloud Computing, JCC 2021 and 2021 9th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, MobileCloud 2021
AU - Xiang, Yongan
AU - Shi, Bin
AU - Zhang, Chongle
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
AB - With the prevalence of machine learning applications, an increasing number of machine learning tasks are being moved to cloud computing platforms. A cloud for machine learning consists of heterogeneous resources such as GPUs, CPUs, and memory, which poses challenges for resource management. How to schedule machine learning tasks and allocate appropriate GPU resources, so that the cluster maximizes resource utilization and reduces task computing time, has therefore become a concern in both industry and academia. This paper proposes a scheduling strategy named Proteus, which is based on Lyapunov optimization. Proteus aims to minimize task turnaround time, defined as the time from the submission of a task to its completion, over a long time horizon. After a comprehensive analysis, we implement the scheduling algorithm and conduct several simulation experiments. The experimental results show that our scheduling strategy achieves significant improvements in most task scheduling environments, reducing the turnaround time of tasks to 40%-50% of the original. This shows that Proteus can improve the resource utilization and performance of cloud clusters and reduce task turnaround time.
KW - Distributed machine learning
KW - Lyapunov optimization
KW - Task scheduling
UR - https://www.scopus.com/pages/publications/85125924850
U2 - 10.1109/JCC53141.2021.00013
DO - 10.1109/JCC53141.2021.00013
M3 - Conference contribution
AN - SCOPUS:85125924850
T3 - Proceedings - 2021 IEEE International Conference on Joint Cloud Computing, JCC 2021 and 2021 9th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, MobileCloud 2021
SP - 9
EP - 15
BT - Proceedings - 2021 IEEE International Conference on Joint Cloud Computing, JCC 2021 and 2021 9th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, MobileCloud 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 August 2021 through 26 August 2021
ER -