TY - JOUR
T1 - State-Dependent Parameter Tuning of the Apparent Tardiness Cost Dispatching Rule Using Deep Reinforcement Learning
AU - Min, Byungwook
AU - Kim, Chang Ouk
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - The apparent tardiness cost (ATC) is a dispatching rule that performs excellently at minimizing the total weighted tardiness (TWT) in single-machine scheduling. The ATC rule's performance depends on the lookahead parameter of the equation that computes the job priority index. Existing studies recommend a fixed value, or a value derived from a handcrafted function, as an estimate of the lookahead parameter. However, such parameter estimation inevitably loses information by summarizing the job data and therefore yields inferior schedules. This study proposes a reinforcement learning-based ATC dispatching rule that estimates the lookahead parameter directly from raw job data (processing time, weight, and slack time). The scheduling agent learns the relationship between the raw job data and the continuous lookahead parameter while interacting with the scheduling environment via a deep deterministic policy gradient (DDPG) algorithm. We trained the DDPG model to minimize the TWT through simulation of a single-machine scheduling problem with unequal job arrival times. In a preliminary experiment, we verified that the proposed dispatching rule, ATC-DDPG, successfully performs intelligent state-dependent parameter tuning. ATC-DDPG also achieved the best performance in the main experiment, in which it was compared with five existing dispatching rules.
KW - Deep deterministic policy gradient
KW - dispatching rule
KW - dynamic scheduling
KW - parameter tuning
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85124848358&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124848358&partnerID=8YFLogxK
DO - 10.1109/ACCESS.2022.3152192
M3 - Article
AN - SCOPUS:85124848358
SN - 2169-3536
VL - 10
SP - 20187
EP - 20198
JO - IEEE Access
JF - IEEE Access
ER -