The nonparametric MKS test [15], often called the sequential Mann-Kendall-Sneyers test, has been applied to change point detection in long-term time series data (e.g., hydrological and climatic changes). According to a Centers for Disease Control and Prevention (CDC) report, both social distancing and mass gatherings can potentially lead to an abrupt change in regional COVID-19 cases, albeit in different directions [16]. We therefore evaluated the potential of the MKS test for change point detection in short-term time series data, namely weekly COVID-19 case counts.
In this section, we first describe the MKS test. Then, we use an example to demonstrate its implementation.
Method description
The MKS test applied to the COVID-19 time series data can be completed in three major steps.
Step 1: Deriving test statistics (Sk)
We treated each week's count of new cases as an independent observation in a 45-week time series. For each state, we have a time series of weekly new cases, X = {x1, x2, x3, …, xN}, where N is the total number of weeks under observation (N = 45 in our case study); the null hypothesis is that the development of new cases remains stable. For each xi (i = 1, 2, …, N), mi is the number of preceding elements xj (j < i) for which xj < xi.
Based on mi, the test statistic Sk is the cumulative sum of mi up to week k, as shown in Eq. (1).
$${S}_k=\sum_{i=1}^k{m}_i\ \left(k=1,2,3,\dots, N\right)$$
(1)
The mean of Sk can be derived by Eq. (2).
$$E\left({S}_k\right)=k\left(k-1\right)/4$$
(2)
The variance of Sk can be derived by Eq. (3).
$$VAR\left({S}_k\right)=k\left(k-1\right)\left(2k+5\right)/72$$
(3)
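Eqs. (1) through (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Excel implementation: the function name `mks_statistics` is hypothetical, and the variance is written in the standard Sneyers form k(k − 1)(2k + 5)/72.

```python
def mks_statistics(x):
    """Return (S, E, VAR) as lists indexed by k = 1..N (list index k - 1).

    Hypothetical helper: S follows Eq. (1), E follows Eq. (2), and VAR uses
    the standard Sneyers form k(k - 1)(2k + 5) / 72.
    """
    s, mean, var = [], [], []
    running = 0
    for i in range(len(x)):
        # m_i: number of preceding values x_j (j < i) with x_j < x_i
        m_i = sum(1 for j in range(i) if x[j] < x[i])
        running += m_i                      # S_k is the cumulative sum of m_i
        k = i + 1
        s.append(running)
        mean.append(k * (k - 1) / 4)
        var.append(k * (k - 1) * (2 * k + 5) / 72)
    return s, mean, var


if __name__ == "__main__":
    # Toy series, not real case counts
    s, mean, var = mks_statistics([3, 1, 2, 5, 4])
    print(s, mean, var)
```

The double loop is quadratic in N, which is negligible for a 45-week series.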
Step 2: Deriving two sequences (Uf and Ub)
Next, we derive two sequences, the forward sequence Uf and the backward sequence Ub, based on the three variables (Sk, E(Sk), and VAR(Sk)) in Eqs. (1) through (3). Specifically, the forward sequence Uf of the time series is derived by Eq. (4).
$${U}_f=\left({S}_k-E\left({S}_k\right)\right)/\sqrt{VAR\left({S}_k\right)}$$
(4)
Then, we reverse the order of the original time series X and term the result Xr. An intermediate sequence Ufr is derived by applying Eq. (4) to the reversed time series Xr. We then reverse the order of the values in Ufr (i.e., the first value becomes the last, and vice versa). Finally, we generate the backward sequence Ub by negating the reversed values.
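The construction of the two sequences can be sketched as follows. The function names are hypothetical, the variance again uses the standard Sneyers form k(k − 1)(2k + 5)/72, and setting the value at k = 1 to zero (where the variance is zero and Eq. (4) is undefined) is an assumption of this sketch, not something stated in the method.

```python
import math


def forward_sequence(x):
    """Forward sequence U_f from Eq. (4), one value per k = 1..N."""
    u, s = [], 0
    for i in range(len(x)):
        s += sum(1 for j in range(i) if x[j] < x[i])   # running S_k
        k = i + 1
        mean = k * (k - 1) / 4
        var = k * (k - 1) * (2 * k + 5) / 72
        # Assumption: U_f is set to 0 at k = 1, where VAR(S_k) = 0
        u.append((s - mean) / math.sqrt(var) if var > 0 else 0.0)
    return u


def backward_sequence(x):
    """Backward sequence U_b: apply Eq. (4) to reversed X, reverse, negate."""
    u_fr = forward_sequence(x[::-1])        # intermediate sequence U_fr
    return [-v for v in reversed(u_fr)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]                     # toy series, not real case counts
    print(forward_sequence(x))
    print(backward_sequence(x))
```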
Step 3: Deriving change points
Lastly, we identify the change points of the time series X based on the two generated sequences (Uf and Ub). We first identify the initial set of change points as the points of intersection between the two sequences. Previous studies show that not all of these points can be reliably recognized as abrupt changes, as a change point can be induced by a sudden shift of the mean value between two stable periods [17]. These outlier points could be reevaluated by using additional detection methods, such as the double mass curve [18]. To avoid miscounting the change points while making the proposed method more applicable, we employ a statistical filter: points of intersection falling beyond the 95% confidence intervals (CIs), which correspond to Z-scores = ±1.96, are rejected. This filter has been used in relevant MKS studies [19]. It is worth noting that the MKS test can also identify the monotonic trend, or direction of change: if a point of intersection lies between the Z-scores of 0 and 1.96, the change is upward; if it lies between the Z-scores of −1.96 and 0, the change is downward.
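One way to locate the points of intersection and apply the ±1.96 filter is sketched below. The function `change_points` is hypothetical. Locating a crossing by linear interpolation between adjacent weeks, and reporting it as a fractional week position, are assumptions of this sketch; the text does not specify how intersections are read off the two curves.

```python
def change_points(u_f, u_b, z_crit=1.96):
    """Return (week_position, z, direction) for retained intersections.

    Assumption: a crossing between adjacent weeks is located by linear
    interpolation of the difference U_f - U_b; weeks are numbered from 1.
    """
    d = [f - b for f, b in zip(u_f, u_b)]
    points = []
    for i in range(len(d)):
        if d[i] == 0:
            t = 0.0                          # curves meet exactly at week i + 1
        elif i + 1 < len(d) and d[i] * d[i + 1] < 0:
            t = d[i] / (d[i] - d[i + 1])     # fraction of the way to next week
        else:
            continue
        step = u_f[i + 1] - u_f[i] if t > 0 else 0.0
        z = u_f[i] + t * step                # Z-score at the intersection
        if abs(z) <= z_crit:                 # statistical filter (95% CI)
            # z == 0 is labelled "downward" here purely by convention
            direction = "upward" if z > 0 else "downward"
            points.append((i + 1 + t, z, direction))
    return points


if __name__ == "__main__":
    # Toy sequences, not derived from real data
    print(change_points([0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]))
```

Intersections whose Z-score exceeds the threshold in absolute value are dropped, matching the filter described above.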
Model implementation
In this section, we take the state of Virginia as an example to further elaborate on the model implementation. The MKS test can be implemented in Microsoft Excel using built-in functions. The datasets and code are available on GitHub (https://github.com/peterbest52/mks).
Data cleaning
Daily cumulative confirmed COVID-19 case data between March 22, 2020 and January 31, 2021 (45 weeks in total) were obtained from the USAFacts website (https://usafacts.org/data/). We then aggregated the data on a weekly basis, generating a 45-week time series of new weekly cases for each state. Lastly, to demonstrate the method, we extracted the data for Virginia as the time series X.
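The weekly aggregation can be sketched as follows, assuming the input is a plain list of daily cumulative counts for one state. The function name and the `baseline` parameter (the cumulative count on the day before the series starts) are hypothetical and do not reflect the actual layout of the USAFacts files.

```python
def weekly_new_cases(cumulative, baseline=0, days_per_week=7):
    """Convert daily cumulative counts into new cases per week.

    Hypothetical helper: `baseline` is the cumulative count on the day
    before the first entry. Trailing days that do not fill a whole week
    are dropped.
    """
    daily, prev = [], baseline
    for c in cumulative:
        daily.append(c - prev)               # new cases reported that day
        prev = c
    n_weeks = len(daily) // days_per_week
    return [
        sum(daily[w * days_per_week:(w + 1) * days_per_week])
        for w in range(n_weeks)
    ]


if __name__ == "__main__":
    # Toy cumulative counts for 14 days, not real data
    toy = [2, 4, 6, 8, 10, 12, 14, 14, 14, 14, 14, 14, 14, 15]
    print(weekly_new_cases(toy))
```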
MKS test
For time series X, we derived mi, the number of preceding weeks whose case value is smaller than that of the current week. Following this step, Sk was derived as the cumulative sum of mi (i = 1, 2, …, k), according to Eq. (1); then, the mean of Sk, E(Sk), and the variance of Sk, VAR(Sk), were derived by Eqs. (2) and (3), respectively. It is worth noting that, since k is the only independent variable in Eqs. (2) and (3), E(Sk) and VAR(Sk) are the same for all states in this study. Based on Eq. (4), we derived the forward sequence Uf for Virginia (solid line in Fig. 1).
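As a worked check of why E(Sk) and VAR(Sk) do not depend on the state, the final week (k = N = 45) can be evaluated directly. The arithmetic assumes the standard Sneyers variance form k(k − 1)(2k + 5)/72; only the observed S45 differs between states.

$$E\left({S}_{45}\right)=\frac{45\times 44}{4}=495,\qquad VAR\left({S}_{45}\right)=\frac{45\times 44\times 95}{72}=2612.5,\qquad \sqrt{VAR\left({S}_{45}\right)}\approx 51.11$$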
Then, we reversed the time series X and derived Xr. We derived the intermediate sequence Ufr by applying Eq. (4) to Xr. Lastly, we derived the backward sequence Ub (dashed line in Fig. 1) by first reversing the sequence of values in Ufr and then adding a negative sign to these values.
Change point detection
The forward sequence (Uf) and the backward sequence (Ub) were plotted as the solid line and dashed line, respectively (Fig. 1). The points of intersection between the two sequences became the initial set of the change points. The thresholds of 95% CIs (Z-scores = ± 1.96) were set as the statistical filter. Only change points within the thresholds were retained. Specifically, in the case of Virginia, three points of intersection were initially detected. Week 4 (Point A in Fig. 1) and Week 43 (Point C in Fig. 1) were identified as the final change points with statistical confidence. Week 8 (Point B in Fig. 1) was excluded (Z-score = 2.72), as it fell beyond the thresholds. Since both Point A and Point C were between Z-scores of 0 and 1.96, these changes were upward.