
Comparative effectiveness research on patients with acute ischemic stroke using Markov decision processes



Several methodological issues with non-randomized comparative clinical studies have been raised, one of which is whether the methods used can adequately identify uncertainties that evolve dynamically with time in real-world systems. The objective of this study is to compare the effectiveness of different combinations of Traditional Chinese Medicine (TCM) treatments and combinations of TCM and Western medicine interventions in patients with acute ischemic stroke (AIS) by using Markov decision process (MDP) theory. MDP theory appears to be a promising new method for use in comparative effectiveness research.


The electronic health records (EHR) of patients with AIS hospitalized at the 2nd Affiliated Hospital of Guangzhou University of Chinese Medicine between May 2005 and July 2008 were collected. Each record was partitioned into two "state-action-reward" stages divided by three time points: the first, third, and last day of hospital stay. We used the well-developed optimality technique in MDP theory with the finite horizon criterion to compare different treatment combinations dynamically.


A total of 1504 records with a primary diagnosis of AIS were identified. Only states with records from at least 10 patients were included, which left 960 records for the MDP model. Optimal combinations were obtained for 30 types of patient condition.


MDP theory makes it possible to dynamically compare the effectiveness of different combinations of treatments. However, the optimal interventions obtained by the MDP theory here require further validation in clinical practice. Further exploratory studies with MDP theory in other areas in which complex interventions are common would be worthwhile.



Comparative effectiveness research (CER) is a way of identifying what works for which patients under which circumstances [1]. CER is not a single entity; it can take many forms, including cohort studies, systematic literature reviews, observational studies, and randomized controlled trials (RCTs) [1, 2]. Non-randomized comparative clinical studies also play an important role in assessing the safety and effectiveness of medical interventions for routine practice. Recent attention to non-randomized comparative clinical studies in CER has focused on methodological issues [3, 4]. Experts realize that there are methodological challenges for non-randomized comparative clinical studies that cannot be ignored, especially with the increased requirements for data analysis driven by the demand for real-world evidence. These challenges include [4] dealing adequately with multiple therapies and possible outcomes; an extremely heterogeneous baseline in terms of patient characteristics and setting; and confounding in studies that use different kinds of health databases. Methodology researchers have made great progress in the development and application of statistical methods for the description and analysis of CER data [5–7]. Such methods include using propensity score analysis to adjust for group differences [8, 9], structural equation models and decomposition methods to identify how outcomes vary differentially with respect to patient characteristics and other factors for alternative treatment cohorts [10], and instrumental variable methods to address the problem of uncontrolled confounding [7, 11–14]. However, the uncertainties in real-world systems that evolve dynamically with time have yet to be adequately identified.

Treatment with syndrome differentiation is considered the kernel of Traditional Chinese Medicine (TCM)[15], which means that therapeutic interventions are changed dynamically according to the variation of the state of the syndrome or disease over time. There is a general impression among Chinese medicine practitioners that treatments that change dynamically with syndrome differentiation and time are superior to those that remain unchanged. However, when TCM treatments are tailored to the individual patient, as is common practice, it is more difficult to assess their effectiveness than when they are applied to all patients in a standard manner in clinical studies. Methods that allow the researcher to model the uncertainties in real-world practice, and especially those that may dynamically change with time, are needed to describe TCM treatments and compare their effectiveness.

MDP theory is a versatile and powerful tool used to analyze sequential decision problems [16], with applications in many areas, such as natural science, engineering technology, and medical care, where it can increase the utilization of medical resources and optimize methods of diagnosis or treatment. MDP theory is also important for medical decision-making, such as the administration of medical devices, admission control in hospitals, decisions on operation timing, and the adjustment of treatment strategies [17–23].

Syndrome differentiation and TCM treatments are very often interdependent and interleaved over time, principally due to uncertainty about the underlying disease, uncertainty associated with patient responses to certain treatments, and the likelihood of patient states varying within the period of treatment, such as from one pattern of TCM to another pattern. The introduction of MDP theory into CER on TCM makes dynamic comparison and evaluation possible. In this study, we show how MDP theory can be used to model integrative medicine treatments (the blending of the best of conventional medicine and complementary and alternative medicine) [24] for patients with acute ischemic stroke (AIS), and to provide an optimal solution from dynamic effectiveness comparisons in sequential clinical practice.


Data collection

The electronic health records (EHR) of patients with AIS hospitalized at the 2nd Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China, were collected. The inclusion criteria for the records were a primary diagnosis of cerebral infarction and hospital admission within 14 days of the onset of stroke. Records of patients who had thrombolysis or had undergone early anticoagulation treatment were excluded.

All of the data were collected with an information acquisition form, one form for each record, that captured the general information of the patient, TCM and Western medicine diagnosis, all applied treatments with course detail, levels of neurological function defect on the first, third, and last day of hospitalization, and the results of brain imaging (i.e., computerized X-ray tomography or magnetic resonance imaging). This study was approved by the ethics committee of the 2nd Affiliated Hospital of Guangzhou University of Chinese Medicine.

Description of patients' condition and the criterion to be optimized

To determine the key characteristics for describing the condition of patients with AIS and the criterion to be optimized by using MDP theory, an expert panel was formed that included scholars, physicians of Western medicine, TCM practitioners, and doctors in the field of integrative medicine (with an educational background in both Western medicine and TCM), and a half-day expert panel meeting was held.

Six key characteristics were selected based on the results of the panel meeting (see Additional file 1: Appendix 1): ($i_1$) age; ($i_2$) any disease history, such as diabetes, hypertension, coronary heart disease, abnormal blood lipid level, or auricular fibrillation; ($i_3$) any complication, such as pulmonary infection, urinary tract infection, or deep vein thrombosis; ($i_4$) TCM diagnosis; ($i_5$) TCM syndrome differentiation (TCM pattern); and ($i_6$) level of neurological function (with items for evaluation taken from the NIHSS [25] and the assessment standard of neurological function impairment [26]). A score was used to describe the level of neurological function defect (see Additional file 2: Appendix 2). The total scores were in the range of 0-29, where a high score indicates poor function. Patients who died were assigned a score of 29.

Duration of hospitalization for each patient was divided into two stages. Stage 1 ran from admission to the third day of hospital stay, and Stage 2 ran from the third day of hospital stay to discharge. This resulted in three time points for the state assessment: the first (timepoint 1, t1), third (timepoint 2, t2), and last day (timepoint 3, t3) of hospitalization. Each record was treated as two "state-action-reward" stages divided by the three timepoints. State refers to a patient's condition in terms of the six key characteristics; action represents the combination of treatments; and reward refers to the value of the differential between the scores for neurological function impairment [25, 26] before and after treatment (equal to the total score before treatment minus the score after treatment). According to the expert panel's advice, the total reward values for the two stages became the criteria to be optimized. In terms of the reward values, 0 represents no change in a patient's condition, values larger than 0 represent improvement in a patient's condition, and values lower than 0 mean deterioration. If the value is larger than 0, then the larger the value, the better the improvement in state. The action that maximizes the total reward value is regarded as the optimal action, that is, the optimal intervention combination for the corresponding state.
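The reward definition above can be illustrated with a minimal sketch. The function names and the example scores are hypothetical and are not taken from the study data; the code only restates the rule that a stage reward equals the impairment score before treatment minus the score after treatment.

```python
def stage_reward(score_before: int, score_after: int) -> int:
    """Reward for one stage: impairment score before minus score after.

    Positive values mean improvement, 0 means no change,
    negative values mean deterioration.
    """
    return score_before - score_after


def total_reward(score_t1: int, score_t2: int, score_t3: int) -> int:
    """Sum of the Stage 1 (t1 -> t2) and Stage 2 (t2 -> t3) rewards."""
    return stage_reward(score_t1, score_t2) + stage_reward(score_t2, score_t3)


# Hypothetical patient: score 12 at admission, 9 on day 3, 10 at discharge.
example_total = total_reward(12, 9, 10)
```

For a single observed record the two stage rewards telescope to the difference between the admission and discharge scores. The MDP criterion, however, is the expected total reward over possible transitions, not the total for one record.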

Description of interventions

Five circumstances were used to distinguish different treatment combinations (actions) at each stage (see Additional file 3: Appendix 3): ($a_1$) whether to use antiplatelet and/or anticoagulant agents; ($a_2$) whether to use TCM treatments for replenishing qi and wen yang (Yi Qi Wen Yang); ($a_3$) whether to use TCM treatments for clearing heat and extinguishing wind (Qing Re Xi Feng); ($a_4$) whether to use TCM treatments for relaxing the bowels; and ($a_5$) whether to use herbal medicine.

Treatment strategies were carried out at the request of the physician in charge of the patient under the same theory of TCM [27]. Patients with a TCM diagnosis belonging to the Yin pattern were treated by "Yi Qi Wen Yang" treatments, and those with a TCM diagnosis belonging to the Yang pattern received "Qing Re Xi Feng" treatments. Herbal medicine was prescribed according to the current symptoms of the patient. If the patient was constipated, TCM treatments to relax the bowels were used. Aspirin or Clopidogrel was taken orally by each patient within 48 hours of hospital admission, except those who were allergic to or genuinely intolerant of these agents. Anticoagulant agents, including unfractionated heparin (UFH), low-molecular-weight heparin (LMWH), or warfarin were used if the patient had any of the following conditions: atrial fibrillation, serious artery angiostenosis, or advancing stroke. Any treatment might be changed at any time if the physician thought it necessary.

For patients with a history of hypertension, diabetes, or dyslipidemia, the agents that they had been taking before admission continued to be administered during their hospital stay. However, these interventions were not included in the analysis, as they did not focus on stroke treatment.

Data management and analysis

All of the information acquisition forms were double entered with EpiData 3.1 (EpiData Association, Odense, Denmark). The final dataset was converted into SPSS format. Missing data were replaced by the median of nearby points. Data were analyzed primarily with SPSS 13.0 (SPSS, USA). The Markov decision process (MDP) programs were written in C and compiled using Dev C++.

Formulating an MDP model for the treatment of AIS

According to clinical experience and TCM theory, treatment decision-making depends on the current condition of the patient, and the corresponding TCM/integrative medicine (i.e., the combination of practices and methods of alternative medicine with conventional medicine) therapies are described as non-stationary finite horizon MDPs, in which each state variable denotes the patient's condition at a certain time. The optimality problem is solved by maximizing the non-stationary finite horizon expected total utility. For finite horizon MDPs, the state space is a set of vectors consisting of all possible conditions for a patient, the set of available actions for a state is composed of the treatments used for therapy in that state, the transition probabilities in the MDPs are determined by the records of therapeutic effectiveness, and the corresponding utility function is evaluated based on the neurological functional impairment score related to the patient's condition and the effectiveness of treatment. Thus, the optimality problem is described as a non-stationary finite horizon expected total utility MDP model, and the optimality technique already developed for MDPs can be used to solve it efficiently [16].

Formulating a model for MDPs with finite horizon reward criteria

First, it is necessary to specify the condition of the patient, which is the information known by the physician. A state i in MDPs denotes the patient's condition. As described in the previous section, a patient's condition is evaluated based on an overall consideration of various factors; for example, $i_6$ summarizes the level of consciousness, visual field defects, muscle power of the limbs, and so on. Thus, the state is denoted by a vector $i = (i_1, \ldots, i_n)$, where each component $i_k$ ($k = 1, \ldots, n$) corresponds to one aspect of the patient's condition and n is the dimension of the state vector. The state space is composed of all possible state vectors, that is, $S = \{ i = (i_1, \ldots, i_n) \mid i_k \in \{0, 1, \ldots, l_k\},\ k = 1, \ldots, n \}$, where $l_k$ denotes the highest level of the corresponding factor.

Second, a vector of treatment components $a = (a_1, \ldots, a_m)$ is regarded as the action a available to the decision-maker. As explained in the earlier "Description of interventions" section, in the treatment of AIS each component $a_i$ corresponds to a type of treatment used for therapy, and $a_i$ takes a value in $\{0, 1, \ldots, j_i\}$ ($i = 1, \ldots, m$). For example, in the case of whether to use antiplatelet agents or not, 0 denotes that an antiplatelet agent should not be used and 1 denotes that aspirin and/or clopidogrel should be chosen. Similarly, in the case of whether to use herbal medicine or not, 0 and 1 respectively denote that herbal medicine should not and should be used. $A(i)$ denotes the set of all possible actions available to the controller when the process is in state $i \in S$. In other words, $A(i)$ represents the set of all treatments available to the controller at state i.

Third, when a physician prescribes a type of treatment combination (action a) for a certain patient in state i, the corresponding effectiveness can be detected in state j of the patient at the next observable time point. Therapeutic effectiveness may differ when the same treatment combination is applied to different patients with the same condition. Thus, the dynamic evolution of the treatment process is specified using the so-called transition probability $p_t(j \mid i, a)$, which denotes the probability that the state is $j \in S$ at time t + 1 when action $a \in A(i)$ is taken at state $i \in S$ at time t. We use $\#(j, i, a)$ to denote the number of transfers from state i to the next state j under action a. For all states $i, j \in S$ and any given action $a \in A(i)$, the transition probability is given by Equation (1).

$$p_t(j \mid i, a) := \frac{\#(j, i, a)}{\sum_{j' \in S} \#(j', i, a)} \tag{1}$$
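The empirical estimate in Equation (1) can be sketched as follows. This is an illustrative sketch, not the authors' C implementation; the function name, the tuple layout of the records, and the toy state labels are assumptions made for the example.

```python
from collections import Counter, defaultdict


def estimate_transitions(records):
    """Estimate p(j | i, a) from observed transfers within one stage.

    `records` is an iterable of (state, action, next_state) tuples
    (a hypothetical layout). Each probability is the count #(j, i, a)
    divided by the total count of transfers out of (i, a).
    """
    counts = defaultdict(Counter)
    for state, action, next_state in records:
        counts[(state, action)][next_state] += 1

    probs = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[key] = {j: n / total for j, n in next_counts.items()}
    return probs


# Toy data: three patients in state "s1" received action "01011".
toy_records = [
    ("s1", "01011", "s1"),
    ("s1", "01011", "s2"),
    ("s1", "01011", "s1"),
    ("s1", "00001", "s2"),
]
toy_probs = estimate_transitions(toy_records)
```

State-action pairs that were never observed receive no estimate under this scheme, which is one reason sparsely populated states are hard to include in such a model.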

Fourth, the reward function $u_t(i, a)$, which depends on the current state $i \in S$, the chosen action $a \in A(i)$, and the decision epoch t, is expressed as

$$u_t(i, a) = \sum_{j \in S} p_t(j \mid i, a)\, u_t(j, i, a), \tag{2}$$

where $u_t(j, i, a)$ denotes the reward value when the state of the treatment process is i at stage t, an action $a \in A(i)$ is taken, and the treatment process results in state j at the next stage t + 1.
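As a small worked illustration of Equation (2), the expected reward of one state-action pair is the probability-weighted average of the rewards over next states. The function name, the next-state labels, and the numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_reward(trans_probs, rewards):
    """Expected reward u(i, a) for one fixed state-action pair.

    `trans_probs` maps next state j -> p(j | i, a);
    `rewards` maps next state j -> u(j, i, a).
    """
    return sum(p * rewards[j] for j, p in trans_probs.items())


# Hypothetical pair (i, a): half of the patients improve by 2 points,
# a quarter stay unchanged, and a quarter deteriorate by 1 point.
toy_trans = {"better": 0.5, "same": 0.25, "worse": 0.25}
toy_rewards = {"better": 2, "same": 0, "worse": -1}
toy_u = expected_reward(toy_trans, toy_rewards)  # 0.5*2 + 0.25*0 + 0.25*(-1)
```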

Finally, to complete the model, it is necessary to introduce the N-horizon expected total reward criterion. This requires defining a class of policies (i.e., all possible sequences of treatment combinations) admissible to the controller. A policy can be denoted as a sequence of functions $\pi = \{f_1, f_2, \ldots, f_N\}$, where $f_t$ ($1 \le t \le N$) acts on S and satisfies $f_t(i) \in A(i)$ for all $i \in S$. Hence, $f_t(i)$ is the treatment combination chosen at state i at stage t. Let $\Pi$ be the set of all policies. For any given policy $\pi$ and initial state i, $J(\pi, i)$ denotes the corresponding expected total reward from the initial time to the end time N.

To that end, a model is specified for non-stationary MDPs with the N-horizon expected total reward criterion for the foregoing treatment processes:

$$\{ S,\ (A(i),\ i \in S),\ p_t(j \mid i, a),\ u_t(i, a) \}, \tag{3}$$

where the state space S, the available action set $A(i)$ at state $i \in S$, the transition probability $p_t(j \mid i, a)$ with $i, j \in S$ and $a \in A(i)$, and the reward function $u_t(i, a)$ are as previously defined. To support the following arguments, some notation is introduced: for each fixed policy $\pi = \{f_1, f_2, \ldots, f_N\} \in \Pi$, a transition probability matrix $P(t, \pi)$ is defined with $(i, j)$ element $p_t(j \mid i, f_t(i))$.

For each $\pi \in \Pi$ and initial state $i \in S$, the N-horizon expected total reward to be maximized is denoted by

$$J(\pi, i) := E_i^{\pi} \left[ \sum_{t=1}^{N-1} u_t(i(t), a(t)) + u_N(i(N)) \right], \tag{4}$$

where $E_i^{\pi}$ denotes the expectation operator determined by the given $p_t(j \mid i, f_t(i))$ and the initial state $i \in S$, $i(t)$ and $a(t)$ are the state and action variables at time t, and $u_N(i(N))$ is the terminal reward associated with the state $i(N) \in S$; see [16] for details.

Finally, the corresponding optimal value function is defined as $J^*(i) = \sup_{\pi \in \Pi} J(\pi, i)$, $i \in S$. A policy $\pi^*$ in $\Pi$ is said to be optimal if $J(\pi^*, i) = J^*(i)$ for all $i \in S$.

Solutions to the optimality problem

For each $\pi \in \Pi$, $U_t(\pi, j)$ denotes the corresponding expected total utility from time t to the end time N given that the state at time t is $i(t) = j$, that is (by the well-known Markov property),

$$U_t(\pi, j) := E_i^{\pi} \left[ \sum_{n=t}^{N-1} u_n(i(n), a(n)) + u_N(i(N)) \,\Big|\, i(t) = j \right] \quad \text{for } t = N-1, \ldots, 1, \tag{5}$$

$$U_N(\pi, j) := u_N(j), \quad j \in S. \tag{6}$$

Then defining

$$J_t(i) := \sup_{\pi \in \Pi} U_t(\pi, i) \quad \text{for } t = N, \ldots, 1 \tag{7}$$

implies that $J^*(i) = \sup_{\pi \in \Pi} U_1(\pi, i) = J_1(i)$.

To obtain an optimal policy, the following algorithm is used (see Theorem 4.3.3 in [16]).

Step I: Set t = N and

$$J_N(i) := u_N(i) \quad \text{for all } i \in S. \tag{8}$$

Step II: Substitute t - 1 for t and compute $J_t(i)$ by

$$J_t(i) = \max_{a \in A(i)} \left\{ u_t(i, a) + \sum_{j \in S} p_t(j \mid i, a)\, J_{t+1}(j) \right\} \quad \text{for } t = N-1, \ldots, 1. \tag{9}$$

Obtain $f_t^*$, which attains the maximum in Eq. (9).

Step III: If t = 1, then stop. Otherwise return to Step II. The policy obtained, $\pi^* = \{f_1^*, \ldots, f_{N-1}^*\}$, is optimal (by Theorem 4.3.3 in [16]) as the control model consists of finite state and action spaces.
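Steps I to III amount to backward induction over the decision epochs. The following is a minimal sketch of that recursion, not the authors' C program; the function name, the dictionary-based data layout, and the two-state toy model are all assumptions made for illustration, and the toy numbers are not study results.

```python
def backward_induction(states, actions, p, u, terminal, n_horizon):
    """Finite-horizon backward induction (Steps I to III).

    states:    list of states
    actions:   dict state -> list of admissible actions A(i)
    p:         dict t -> dict (i, a) -> dict j -> p_t(j | i, a)
    u:         dict t -> dict (i, a) -> expected reward u_t(i, a)
    terminal:  dict state -> terminal reward u_N(i)
    n_horizon: N, the index of the terminal time point
    Returns the value functions J[t][i] and the maximizing actions policy[t][i].
    """
    J = {n_horizon: dict(terminal)}           # Step I: J_N(i) = u_N(i)
    policy = {}
    for t in range(n_horizon - 1, 0, -1):     # Step II, for t = N-1, ..., 1
        J[t], policy[t] = {}, {}
        for i in states:
            best_a, best_v = None, float("-inf")
            for a in actions[i]:
                future = sum(prob * J[t + 1][j] for j, prob in p[t][(i, a)].items())
                v = u[t][(i, a)] + future
                if v > best_v:
                    best_a, best_v = a, v
            J[t][i], policy[t][i] = best_v, best_a
    return J, policy                          # Step III: loop has reached t = 1


# Toy model with two hypothetical states, two actions, and N = 3
# (two decision stages, terminal reward 0, same dynamics at both stages).
toy_states = ["mild", "severe"]
toy_actions = {s: ["00001", "10001"] for s in toy_states}
stage_p = {
    ("mild", "00001"): {"mild": 1.0},
    ("mild", "10001"): {"mild": 1.0},
    ("severe", "00001"): {"severe": 0.5, "mild": 0.5},
    ("severe", "10001"): {"severe": 0.25, "mild": 0.75},
}
stage_u = {
    ("mild", "00001"): 1.0,
    ("mild", "10001"): 0.5,
    ("severe", "00001"): 2.0,
    ("severe", "10001"): 3.0,
}
J, policy = backward_induction(
    toy_states, toy_actions,
    p={1: stage_p, 2: stage_p}, u={1: stage_u, 2: stage_u},
    terminal={"mild": 0.0, "severe": 0.0}, n_horizon=3,
)
```

In this toy model the value of starting in "severe" at stage 1 is 3 + 0.25 × 3 + 0.75 × 1 = 4.5 under action "10001", compared with 2 + 0.5 × 3 + 0.5 × 1 = 4.0 under "00001", so the recursion selects "10001". The sketch assumes every admissible state-action pair has an estimated transition distribution and expected reward at every stage.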

Numerical implementation

All of the records from the patients with AIS were broadly classified into several groups according to the patient's condition (each of which is called a "state"), and the types of treatments were divided into two stages during which different treatment combinations were used. Information was collected to form Tables 1 and 2, which show patient condition and the corresponding treatment combination (i.e., "actions") at Stage 1 and Stage 2, respectively. Patient condition as assessed by the six key characteristics is listed in columns 2 through 7. The first column denotes the number of patients with the same condition, and columns 8 through 12 list the main treatments (sometimes more than one for each "state") used for AIS (the columns in Tables 1 and 2 have the same meaning but are for a different treatment stage.)

Table 1 The patients' conditions and treatments at Stage 1*
Table 2 The patients' conditions and treatments at Stage 2*

The elements of the MDP model can now be formulated. From Table 1 and Table 2, the state space can be expressed as S = {200111, 200112, ..., 311122, 311123}, and the corresponding sets of admissible actions are given as

A(200111) = {00001; 00101; 00111; 10001; 10101; 10111},

A(200112) = {10011; 10101; 10110; 10111; 11111; 10001; 11101}, ...

The optimality problem is considered to be within a finite time horizon from Stage 1 to Stage 2. A terminal reward of 0 is assigned to all states. Based on Tables 1 and 2 and Eq. (1), the transition probabilities $p_t(j \mid i, a)$ (t = 1, 2) are computed and listed in Additional file 4: Appendix 4 and Additional file 5: Appendix 5. From the neurological functional impairment scores in Tables 1 and 2, the reward functions $u_t(i, a)$ (t = 1, 2) can be obtained by Eq. (2), and are listed in Additional file 6: Appendix 6 and Additional file 7: Appendix 7.

Using the algorithm to solve the optimality problem, an optimal policy $\pi^* = \{f_1^*, f_2^*\}$ (corresponding to the optimal treatments) can be obtained as follows.

$f_1^*(200111) = \{00001\}$, $f_1^*(200112) = \{10101\}$, ...

$f_2^*(200111) = \{00111\}$, $f_2^*(200112) = \{10001\}$, ...

The optimal treatments with this optimal policy are shown in Table 3 and Table 4.
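The five-digit action labels can be read digit by digit against the components $a_1$ to $a_5$ described earlier. The helper below is a hypothetical sketch for decoding such labels; it assumes that each digit is binary (0 = not used, 1 = used), which matches the labels shown here, although the general model allows more levels per component.

```python
# Component names follow the order a1..a5 in "Description of interventions".
ACTION_COMPONENTS = (
    "antiplatelet and/or anticoagulant agents",                      # a1
    "TCM: replenishing qi and wen yang (Yi Qi Wen Yang)",            # a2
    "TCM: clearing heat and extinguishing wind (Qing Re Xi Feng)",   # a3
    "TCM: relaxing the bowels",                                      # a4
    "herbal medicine",                                               # a5
)


def decode_action(label: str) -> list:
    """Return the treatments switched on in a five-digit binary action label."""
    if len(label) != len(ACTION_COMPONENTS) or set(label) - {"0", "1"}:
        raise ValueError(f"expected a 5-digit binary label, got {label!r}")
    return [name for bit, name in zip(label, ACTION_COMPONENTS) if bit == "1"]


decoded = decode_action("01011")  # a2, a4, and a5 are used
```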

Table 3 Optimal combination of treatment at stage 1 (example)
Table 4 Optimal combination of treatment at stage 2 (example)


General information

A total of 1504 records with a primary diagnosis of AIS were identified for the period 1st May 2005 to 31st July 2008. Of these, 1337 met the inclusion criteria. Only states with records from at least 10 patients were included, resulting in 960 records being enrolled in the MDP model, representing 30 kinds of patient condition. Sixty-eight percent of records were from patients over 66 years old. A disease history was given for 74% of the 960 patients. Most of the records had fairly low scores for neurological function impairment, indicating that the severity of the patient's condition was minor to medium (see Table 5). The $i_6$ value for the eight patients who died in Stage 2 was 29 (the highest score for neurological functional impairment).
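The state-frequency threshold can be sketched as a simple filter. This is an illustration only: the function name and the record layout (a dict with a "state" key) are assumptions, and the text does not specify whether the threshold was applied per stage, so the sketch applies it to a single list of records.

```python
from collections import Counter


def filter_frequent_states(records, min_count=10):
    """Keep only records whose state occurs at least `min_count` times."""
    counts = Counter(r["state"] for r in records)
    return [r for r in records if counts[r["state"]] >= min_count]


# Toy data: state "A" has 10 records and is kept; state "B" has 3 and is dropped.
toy_records = [{"state": "A"}] * 10 + [{"state": "B"}] * 3
kept = filter_frequent_states(toy_records)
```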

Table 5 General information of the patients at admission

Between 0 and 1.12% of data were missing for $i_1$ to $i_5$, and between 0.07 and 18.39% for the items of $i_6$: 18.39% for ataxia, 13.80% for visual field defects, and 13.76% for sensory disturbance. The other $i_6$ items (level of consciousness, facial paralysis, muscle power of the upper and lower limbs, aphasia, and dysarthria) had between 0.07 and 7.11% missing data. For $a_1$ to $a_5$ the figure was 0 to 0.37%. All of the missing data were replaced.

Optimal combination of treatments for corresponding states

By calculating and screening with the MDP theory, the optimal combinations of treatments for the 30 states (see Table 6 and Table 7) were obtained.

Table 6 Optimal combination of treatments for a variety of states at Stage 1
Table 7 Optimal combination of treatments for a variety of states at Stage 2

The results for six states (see Table 8 and Table 9) can be used as an example to show how the effectiveness of treatments can be compared for individual patient conditions. The states in Table 8 represent patients who were older than 66 ($i_1 = 3$), had at least one kind of disease history ($i_2 = 1$), were without complications during their hospitalization ($i_3 = 0$), and had Zhong Jing Luo (apoplexy involving channels or collaterals) ($i_4 = 1$) as the TCM diagnosis and a Yin TCM pattern ($i_5 = 2$). Different levels of neurological functional impairment ($i_6$) were detected, which meant that the severity of stroke varied among patients, as represented by State 10036, State 10037, and State 10038.

Table 8 Example of states within which patient's pattern of Chinese medicine was Yin
Table 9 Examples of States within which patient's pattern of Chinese medicine was Yang

At Stage 1, 122 patients were in State 10036 and received a combination of therapeutic interventions including TCM treatments to replenish qi and wen yang (Yi Qi Wen Yang), TCM treatments to relax the bowels, and herbal medicine (labeled as 01011). Each patient was given a score for neurological functional impairment to describe their $i_6$ level. Among patients in State 10036 at Stage 1, those who had been treated with a combination of $a_2$, $a_4$, and $a_5$ (labeled as action "01011" at Stage 1) obtained the highest reward (1 unit, see Table 8) at t2 compared with other treatment combinations for patients in the same state.

One hundred and twenty-seven patients were in State 10036 at Stage 2, which implies that if the treatment combination labeled "01011" was maintained, then patients in this State at Stage 2 would obtain the highest reward (1 unit) at t3.

Similarly, for patients at Stage 1 in State 10037, who had a more severe clinical condition than those in State 10036, the results showed that if the action was "01011", then the reward value would be a maximum of 4 units. In contrast, for patients in State 10037 at Stage 2, an intervention with only herbal medicine (action labeled as "00001") resulted in the highest reward of 4 units. For patients in State 10038 at Stage 1, a "10001" action resulted in a reward of 6.28 units at t2, whereas the action "10001" at Stage 2 resulted in 4.67 units of reward at t3.

Patients in States 10031, 10032, and 10033 (see Table 9) all had a TCM pattern of Yang, whereas those in States 10036, 10037, and 10038 had a TCM pattern of Yin.

The results in the first line of Table 9 show that by combining TCM treatments for clearing heat and extinguishing wind (Qing Re Xi Feng) ($a_3$) with herbal medicine ($a_5$), the best reward value at Stage 1 for patients in State 10031 was 1 unit. At Stage 2, patients in the same State 10031 may have needed antiplatelet agents ($a_1$) together with TCM treatments to relax the bowels ($a_4$) and herbal medicine ($a_5$), forming the action "10011", to gain the maximum reward. It seems that for State 10033, in which patients tended to have a more severe clinical condition, the two actions that involved TCM therapeutic interventions achieved the best rewards.


Based on inpatient EHR, MDPs were applied to describe and analyze the dynamic process of different combinations of TCM treatments and/or integrated treatments of TCM and Western medicine for patients with AIS, and to determine the optimal treatment combination for each State by comparing the rewards gained from the corresponding actions. To the best of our knowledge, no similar topic has been previously addressed in the field of integrative medicine (IM) or in complementary and alternative medicine (CAM).

No medication has yet been confirmed to have neuroprotective effects in the management of patients with AIS [28]. Although antiplatelet agents can reduce the risk of mortality and morbidity when aspirin is administered within 48 hours after the onset of stroke, aspirin cannot be used in up to 28% of patients because of aspirin "resistance" [29]. The management of patients with AIS with heparin carries an increased risk of bleeding complications [30]. The use of intravenous recombinant tissue plasminogen activators (rt-PA) in cerebral infarctions is associated with improved outcomes, but cannot be used as a routine therapy outside special units [31].

Several commonly used and government-approved traditional Chinese patent medicines (TCPMs), such as Ginkgo biloba [32], milk vetch [33, 34], Mailuoning [35], Qingkailing [36], and Danshen [37] agents, have shown promising effects for ischemic stroke. However, no definite conclusions can be drawn from studies of these agents due to a general lack of reporting on methodology [30, 38–40]. Properly designed clinical research to study the role of traditional medicine in ischemic stroke is warranted, but a number of issues must be addressed in the design of such studies first [41]. One of these issues is complex interventions involving varying dosages and interactions. Randomized controlled trials (RCTs) are a possible approach to evaluating complex interventions as a whole compared with an appropriate alternative [42], but cannot separate the benefits of different combinations of components. The multi-component structure of treatments is closer to real-world practice, especially in therapy for stroke with complex dynamics from onset through progression [43]. Moreover, the model of applying a treatment and continuing it without any change through the whole course of acute stroke is inconsistent with the basic theory of TCM whereby treatment is altered according to syndrome differentiation [15, 44].

The results of this study indicate that the new method of MDPs may prove useful for comparative effectiveness research (CER). MDPs can be applied to dynamically compare the effectiveness of various combinations of complex treatments, and may be able to overcome the uncertainties related to individual patients' responses to certain combinations of treatments and the uncertainties concerning dynamic changes in treatment for certain patients over the course of disease [21–23, 45].

Past research implies that herbal medicine may possess neuroprotective properties [46, 47], protect against ischemic reperfusion injury [48, 49], reduce edema in the brain [48], improve cerebral microcirculation [33, 47], and inhibit apoptosis [50]. Such properties may partly explain the effectiveness of the combinations of treatments identified in this research.

This study has several limitations. First, all of the data were taken from EHR, and missing data are inevitable. The amount of missing data was less than 1.12% in most categories, although 18.39% of missing data was detected in $i_6$. As $i_6$ is a key variable in describing the rewards of actions, the results should be interpreted cautiously because of the possible bias caused by the replacement of missing data. In addition, because herbal prescriptions varied so widely, different components of herbal medicine were classified as one action. As a result, the effectiveness of different prescriptions of herbal medicine cannot be compared. Another limitation is that each patient's record was divided into two stages according to three time points, with each episode being regarded as an independent sample when modeled by MDPs. This is consistent with the Markov property (the absence of after-effects) in the basic theory of MDPs, but it may, to a certain extent, ignore potential correlations between episodes obtained from the same patient at different stages. Finally, although the key characteristics representing the patient states were based on the results of an expert panel meeting, the states of patients with acute ischemic stroke are variable, and it is likely that some characteristics that might be important for certain patients were missed.


MDPs can be used as a new method for comparative effectiveness research on TCM. This new approach makes it possible to compare the effectiveness of certain combinations of treatments dynamically by considering state, action, and reward simultaneously. The method can be applied to optimize medical intervention combinations and to support clinical decision-making. However, the optimal interventions obtained by the MDPs in this study require further validation in clinical practice. The results from the MDP model should be interpreted with caution both due to the property of the MDPs themselves and because of possible bias that may have been generated either from the data collection or the data management. Further exploratory studies with MDPs in other areas in which complex interventions involving TCM, Western medicine, or a combination of both are common would be worthwhile.





This work was supported by the Scientific Research Project of Public Welfare Industry, State Administration of Traditional Chinese Medicine of the P. R. China (No. 200707004); the Finance Department of Guangdong Province (No. [2006]143); the National Natural Science Foundation of China (NSFC); and the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (GDUPS, 2011).


Correspondence to Xianping Guo.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

DRW, XPG: study design, analysis and interpretation, drafting and revision of the article, and final approval for submission. YFC, JXC, YQZ, MZ: study design, acquisition and cleaning of the data, revision of the article, and final approval for submission. QLL, JHC, YHH, LEY: study design, analysis of the data, drafting and revision of the article, and final approval for submission. YBL: study design, drafting and revision of the article, and final approval for submission. All authors read and approved the final manuscript.


Cite this article

Wu, D., Cai, Y., Cai, J. et al. Comparative effectiveness research on patients with acute ischemic stroke using Markov decision processes. BMC Med Res Methodol 12, 23 (2012).



  • Markov decision processes
  • Acute ischemic stroke
  • Comparative effectiveness research
  • Traditional Chinese Medicine/integrative medicine