Quality of maternal obstetric and neonatal care in low-income countries: development of a composite index

Background In low-income countries, studies demonstrate greater access to and utilization of maternal and neonatal health services, yet mortality rates remain high, with poor quality increasingly scrutinized as a potential point of failure in achieving expected goals. Comprehensive measures reflecting the multi-dimensional nature of quality of care could prove useful to quality improvement. However, existing tools often lack a systematic approach reflecting all aspects of quality considered relevant to maternal and newborn care. We aim to address this gap by illustrating the development of a composite index using a step-wise approach to evaluate the quality of maternal obstetric and neonatal healthcare in low-income countries.
Methods The following steps were employed in creating a composite index: 1) developing a theoretical framework; 2) metric selection; 3) imputation of missing data; 4) initial data analysis; 5) normalization; 6) weighting and aggregating; 7) uncertainty and sensitivity analysis of the resulting composite score; and 8) deconstruction of the index into its components. Based on this approach, we developed a base composite index and tested alternatives by altering the decisions taken at different stages of the construction process to account for missing values, normalization, and aggregation. The resulting single composite scores representing overall maternal obstetric and neonatal healthcare quality were used to create facility rankings and further disaggregated into sub-composites of quality of care.
Results The resulting composite scores varied considerably in absolute values and ranges based on method choice. However, the respective coefficients produced by the Spearman rank correlations comparing facility rankings by method choice showed a high degree of correlation. Differences in method of aggregation produced the greatest variation in facility rankings compared to the base case. Z-score standardization most closely aligned with the base case, but limited comparability at disaggregated levels.
Conclusions This paper illustrates the development of a composite index reflecting the multi-dimensional nature of maternal obstetric and neonatal healthcare. We employ a step-wise process that is applicable to a wide range of obstetric quality of care assessment programs in low-income countries and adaptable to setting and context. In exploring alternative approaches, certain decisions influencing the interpretation of a given index are highlighted.
Electronic supplementary material The online version of this article (10.1186/s12874-019-0790-0) contains supplementary material, which is available to authorized users.


Background
Over the last two decades, there has been significant progress in reducing the number of maternal deaths globally, with a 45% decrease in the maternal mortality ratio (MMR) from 1990 to 2013. Despite multiple interventions to improve both maternal and neonatal healthcare services in low-income countries, great disparities remain between low- and high-income countries, with an average lifetime risk of maternal death of 1 in 38 compared to 1 in 3700, respectively [1]. The disparity persists in relation to neonatal deaths, with 99% of 2 million annual neonatal deaths occurring in low- and middle-income countries [2,3]. The majority of maternal and neonatal deaths occur during the intrapartum and immediate postpartum periods, with obstetric hemorrhage as the primary cause [3-5]. While studies demonstrate greater access to and utilization of maternal and neonatal health services in low-income countries (LICs), mortality rates remain high, with poor quality increasingly scrutinized as the potential point of failure in achieving expected goals [6-9]. Thus, the evaluation of the quality of obstetric care, especially in LICs, has garnered increasing attention [10].
Defining appropriate measurements to assess quality can be challenging due to the multi-dimensional nature of quality [11]. Although attempts have been made, there remains a lack of consensus on appropriate measurements and data sources to be used in low-income countries [6]. The vast number and complexity of existing quality indicators, while useful for monitoring specific clinical settings, can have limited utility in comprehensive monitoring due to the amount of information needed to be processed [12]. One way to simplify is through composite indicators or indices, which combine individual indicators into a single index reflecting a more complex underlying concept (e.g. quality of care) [11,12]. Composite indices allow various perspectives to be reflected simultaneously and thus facilitate the comparison of quality performance between facilities or health systems over time [11,13]. The resulting composite scores can be easily communicated to diverse stakeholders such as providers, health managers, or purchasers of healthcare [12].
With respect to obstetric care, individual quality indicators or simple indices are commonly used in tracking progress along specific sub-components of care. However, long lists of indicators can fall short of providing condensed information on quality that more easily facilitates comparisons across facilities, health programs, or countries [14]. As quality improvements in one specific area of care do not necessarily correlate with improvements in other areas, composite indices resulting in a single score can account for a multitude of relevant indicators across multiple quality dimensions [12,15]. However, to ensure a composite index accurately reflects a multidimensional concept, it is prudent to adhere to a step-wise approach based on transparent methodologies and statistical consistency, allowing for replication and application across stakeholders and environments [11,14].
There is a wide range of approaches and applications related to the evaluation of obstetric and neonatal quality of care in LIC settings. In the context of quality improvement program evaluations, process and input indicators are commonly reported individually, especially if sub-components of obstetric care are being assessed [13,16,17]. Otherwise, these individual indicators are summarized into simple indices, each representing a sub-component of obstetric care (e.g. infection prevention, third stage labor management, respectful care, etc.) [18-20]. A few evaluation studies apply more complex statistical approaches, such as standardization of scores or principal component analysis, in generating quality indices but are limited in scope [21,22]. In contexts where quality of care indicators are used as an integral part of the intervention, such as in performance-based financing, indicator weights are a common means to reflect the relative contributions of single quality aspects to the resulting quality score [23,24]. Given the complexity of obstetric care, evaluations would need to rely on a range of data sources to accurately reflect the multi-dimensionality of quality of care. Some standardized approaches (e.g. Bologna score) collect data with only patient exit interviews or other single data sources, which cannot capture all relevant dimensions of quality required for a robust comprehensive evaluation [25,26]. In recent program evaluations [19,27], quality of care has been assessed in a more multifaceted way using facility inventories, interviews, and structured observation checklists, but a standard approach to combining the indicators into one meaningful composite index has not yet emerged.
There appears to be a lack of composite indices of obstetric quality of care that can be easily applied to LICs demonstrating a multidimensional concept while following a transparent process. To address this gap, we employ an existing conceptual framework reflecting the measurement of quality of care in order to develop an index resulting in composite scores, which can then be used to compare obstetric and neonatal quality of care among facilities. This article attempts to illustrate the step-wise development of a composite index based on current standards of construction with the goal to produce a single score reflecting the multidimensional aspect of maternal obstetric and neonatal quality of care. Using a systematic approach starting from a set of quality of care indicators to form different composite indices, we further demonstrate how various methodological approaches affect the resulting score.

Data sources
To illustrate the development of this obstetric quality of care score, we used data taken from the baseline assessment of the Results Based Financing for Maternal and Newborn Health (RBF4MNH) program in Malawi [28]. This evaluation included a sample of 33 Emergency Obstetric Care (EmOC) facilities (five hospitals, 28 health centers) offering obstetric and newborn care services located in four districts: Balaka, Dedza, Mchinji, and Ntcheu. Baseline data was collected in 2013 prior to the start of the implementation of RBF4MNH and included four different data collection tools: a facility inventory, structured patient-provider observations, a structured interview with health workers, and a structured exit interview with women who recently delivered at the facility. All data was collected by trained research assistants. The facility inventory assessed the availability of equipment, essential medications, guidelines, emergency transportation, and human resources. The provider-patient sample consisted of a total of 82 direct observations of uncomplicated delivery cases and assessed birth attendants' adherence to clinical guidelines during routine obstetric care. Interviews were conducted with a total of 81 midwives and midwifery nurses, assessing health worker satisfaction in the workplace and their experiences with supervision and training. The exit interview sample consisted of 204 women who delivered at these facilities; interviews assessed women's experience receiving obstetric care at the facility and their perceptions of the quality of care received.

Composite index development approach
We employed the step-wise approach outlined by the Organisation for Economic Co-operation and Development (OECD) guidelines for composite index development [12]. Although developed for high-income countries, the identified standards are fully applicable to the context of LICs. The OECD guidelines include the following steps, which we applied with slight modifications: 1) developing a theoretical framework; 2) metric selection; 3) imputation of missing data; 4) initial data analysis; 5) normalization; 6) weighting and aggregating of selected variables; 7) uncertainty and sensitivity analysis of the resulting composite score; and 8) deconstruction of the score into its components [12,14].
Based on this approach, we developed a base composite index resulting in a composite score for each facility and tested alternatives by altering the decisions taken at different stages of the construction process [12,14]. Table 1 provides an overview of different approaches at each step to further illustrate the base and alternative index scenarios taken to formulate the composite scores.

Conceptual framework
The conceptual framework, which provided the basis for choosing single indicators to contribute to the composite index, was slightly modified from a multidimensional matrix measuring quality of care first introduced by Maxwell [29] and later refined by Profit et al. [14] (see Table 2). We consider this matrix ideal for the purpose of measuring quality of care as it incorporates two complementary approaches to measuring quality of care. This results in a quality matrix which sufficiently reflects the dynamic process of healthcare delivery [14,31]. The matrix includes the six key dimensions of quality of care as initially outlined by the Institute of Medicine (IOM) [32] and subsequently adapted by the World Health Organization (WHO): effective, efficient, accessible, acceptable/patient-centered, equitable, and safe [30]. These are complemented by the three quality of care elements of structure, process, and outcome [33]. We felt that the definition of the WHO dimensions correlated best with the contextual environment of LICs, with the aspect of timeliness included under the WHO quality dimension of accessibility, which also considers that healthcare services need to occur in a setting that is equipped with adequate resources to meet the needs of the community.

Metric selection
Guided by this conceptual framework, the indicator selection process was based on a literature review focused on obstetric and neonatal care quality indicators. The starting point was the recent WHO publication on "Standards for Improving Quality of Maternal and Newborn Care in Health Facilities" [34] with a set of quality of care indicators identified through literature review, expert consultations, and a consensus-building Delphi process representing 116 maternal health experts in 46 countries. We further examined additional sources of maternal and neonatal quality of care indicators [4,35-45] to identify any further indicators that had not been specified in the WHO document. Using multiple sources in combination with the WHO document, we identified an initial set of indicators most relevant to obstetric and neonatal care quality. Starting with this indicator selection, the content and definition of each indicator was reviewed, with duplicated indicators removed or redundant indicators combined (e.g. adequate supervision available vs. number of supervisory visits). We mapped the resulting indicators by assigning them to the cells provided by the conceptual quality of care matrix (Table 2). Generally, there was little to no overlap in assigning indicators to single matrix cells. In situations where an indicator could be assigned to more than one cell, consensus between co-authors was sought for the most appropriate indicator assignment given both the dimension definition and content suggested by the reviewed literature. For example, "availability of clean water" could conceptually fall under "accessible" to represent access to water or "safe" to highlight the importance of clean water. Ultimately, the indicator was assigned to the safe dimension to represent "sanitation and hygiene". For the following steps we transition from the literature to the existing data from Malawi as described above.
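The mapping step described above can be pictured as filling a dimension-by-element matrix in which each indicator belongs to exactly one cell. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the indicator labels are placeholders rather than the study's indicator list, and the "structure" element chosen for the clean-water example is an assumption (the text only specifies the "safe" dimension).

```python
# Dimensions and elements as named in the conceptual framework.
DIMENSIONS = ["effective", "efficient", "accessible",
              "acceptable/patient-centered", "equitable", "safe"]
ELEMENTS = ["structure", "process", "outcome"]


def empty_matrix():
    """Create one empty indicator list per (dimension, element) cell."""
    return {(d, e): [] for d in DIMENSIONS for e in ELEMENTS}


def assign(matrix, indicator, dimension, element):
    """Assign an indicator to exactly one cell; reject double assignment."""
    if (dimension, element) not in matrix:
        raise KeyError(f"unknown cell: {(dimension, element)}")
    for members in matrix.values():
        if indicator in members:
            raise ValueError(f"{indicator!r} is already assigned to a cell")
    matrix[(dimension, element)].append(indicator)


def uncovered_cells(matrix):
    """List cells that have no indicator yet."""
    return [cell for cell, members in matrix.items() if not members]


matrix = empty_matrix()
# Dimension taken from the text; the element is an illustrative assumption.
assign(matrix, "availability of clean water", "safe", "structure")
```

Calling `uncovered_cells(matrix)` after mapping gives a quick check of which parts of the framework still lack indicators.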

Imputation of missing data
Generally, data quality in terms of completeness was high. Most missing values were due to certain data collection tools not being applied at certain facilities. As our aim was to develop a composite score including information from each of the different data sources, we included only facilities where all four data collection tools were actually applied, resulting in a final sample of 26 facilities out of a total of 33 EmOC facilities. The vast majority of missing values occurred in variables stemming from direct observations, where observers were asked to enter "1" if they observed a certain task and "0" if they did not in the course of the observation. Supervision and debriefings during data collection revealed that the latter tended to be an issue, with observers not being aware of the implications of not entering zeros for non-observed behavior at the end of the observation. We are, therefore, highly confident that missing values on these variables actually reflect non-observation of behavior and replaced missing values with "0" accordingly. We are further confident that the small remaining number of missing values can be assumed to be missing at random; these were replaced with the respective sample mode (or the sample mean for the one continuous variable with missing values). Due to the nature of how the data was collected and the pattern of missing values, multiple imputation would not have been appropriate as an alternative [46]. Therefore, we searched for a proxy variable that would be a close substitute for the missing data in the original variable [47]. When it was not possible to identify an appropriate proxy variable, we used the mode for binary variables as an alternative for missing data in the direct observations. In addition, we provided a second alternative by coding the missing values in the direct observations as "1", thus providing a full range of possible outcomes (detailed information on missing data and the results of using alternative imputation methods is provided in Additional file 1).
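The imputation rules described above can be sketched in a few lines. This is an illustration only: the function names and the example values are hypothetical, and missing entries are represented as `None`.

```python
from statistics import mean, mode


def impute_constant(values, fill):
    """Replace missing (None) entries with a fixed code, e.g. 0 or 1."""
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace missing entries with the most frequent observed value."""
    most_common = mode([v for v in values if v is not None])
    return [most_common if v is None else v for v in values]


def impute_mean(values):
    """Replace missing entries with the mean of the observed values."""
    average = mean([v for v in values if v is not None])
    return [average if v is None else v for v in values]


# Hypothetical direct-observation item (1 = task observed, 0 = not observed).
observed_task = [1, None, 0, 1, None]
base_case = impute_constant(observed_task, 0)    # missing treated as not observed
upper_bound = impute_constant(observed_task, 1)  # second alternative: missing coded as 1

# Hypothetical binary inventory item and continuous variable.
guideline_available = impute_mode([1, 1, None, 0, 1])
waiting_minutes = impute_mean([10.0, None, 30.0])
```

Coding the same missing entries as 0 and as 1 brackets the range of scores that the missing observations could produce.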

Initial data analysis
As the composite index was intended to be calculated at facility level, we aggregated the data from the individual-level data collection tools (observations and interviews) to the facility level. We did this by averaging data across all individual-level observations (i.e. cases, interviews) for each variable and facility. This resulted in scores between 0 and 1. For reasons of simplicity, these proportions were then retransformed into binary variables using a 0.5 cut-off (i.e. "0" for less than 0.5, "1" for 0.5 or greater). The few continuous variables were averaged. This resulted in one observation for each variable and each facility, which was necessary to combine the data sets.
In the following step, we matched the variables contained in the available datasets with the mapped matrix indicators. Once matched, we analyzed the variables contained in each matrix cell for internal consistency by correlating each variable pair within the cells. Variables of a given cell with correlations > 0.7 were re-evaluated and were merged into one single variable, in cases where the variables measured approximately the same quality construct and were consistent with the conceptual framework.
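As a rough illustration of these two steps, the sketch below averages a binary variable per facility, applies the 0.5 cut-off, and flags variable pairs within a cell whose correlation exceeds 0.7. The function names and data are hypothetical, and the use of the Pearson coefficient is an assumption, since the text does not name the correlation measure.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def aggregate_binary(records, cutoff=0.5):
    """Average (facility, value) records per facility, then binarize."""
    by_facility = defaultdict(list)
    for facility, value in records:
        by_facility[facility].append(value)
    return {f: (1 if mean(vals) >= cutoff else 0)
            for f, vals in by_facility.items()}


def pearson(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def highly_correlated_pairs(cell_variables, threshold=0.7):
    """Return variable pairs in one cell that should be re-evaluated."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(cell_variables.items(), 2):
        r = pearson(a, b)
        if r is not None and r > threshold:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical individual-level observations of one binary item.
observations = [("F1", 1), ("F1", 0), ("F1", 1),
                ("F2", 0), ("F2", 0), ("F3", 1), ("F3", 0)]
facility_values = aggregate_binary(observations)

# Hypothetical facility-level variables within one matrix cell.
cell = {"var_a": [1, 1, 0, 0, 1],
        "var_b": [1, 1, 0, 0, 1],
        "var_c": [1, 0, 1, 0, 1]}
to_review = highly_correlated_pairs(cell)
```

Flagged pairs are only candidates for merging; as the text notes, the decision also depends on whether the variables measure the same quality construct.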

Indicators within cells: normalization, weighting and aggregation
Due to the necessity of a uniform scale for aggregation, normalization of the indicator values is required when different units of scale exist [12,14]. As the vast majority of variables were binary, in the base case we transformed the few remaining non-binary variables into binary form using cut-off values supported by standards reported in the literature. To define the number of skilled birth attendants per facility, a cut-off value of at least 3 was used based on the literature and requirements of the program [48]. For the other continuous variable, time from arrival to contact with the provider, we used the median time of 20 min as our cut-off value. The remaining variables were ordinal variables, with the median used as a cut-off value. For Alternative A (Table 1), we rescaled the few non-binary variables to a range of values between 0 and 1 (see below).
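The difference between the base case and Alternative A for a non-binary indicator can be sketched as follows. The staffing numbers are hypothetical, and rescaling by the observed minimum and maximum is an assumption about how the 0 to 1 range was obtained.

```python
def binarize(value, cutoff):
    """Base case: 1 if the value meets the cut-off, otherwise 0."""
    return 1 if value >= cutoff else 0


def rescale_min_max(values):
    """Alternative A: map values linearly onto the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical number of skilled birth attendants at four facilities.
skilled_birth_attendants = [1, 3, 5, 2]

base_case = [binarize(n, 3) for n in skilled_birth_attendants]
alternative_a = rescale_min_max(skilled_birth_attendants)
```

In the base case, facilities with 3 and 5 attendants receive the same value, whereas the rescaled version keeps the difference between them.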
To identify weights, we considered data-driven methods (e.g. principal component analysis) relatively inappropriate given that our variables mainly represented measures of adherence to universally established quality of care standards [12]. Statistically derived weights may have assigned more importance to readily measurable or easily achievable input or process measures relevant to the observed context, but independent of the defined standards, making comparability across settings difficult. Therefore, we considered weights based on the expert ratings from the WHO Delphi study [34]. However, as these indicator ratings varied only minimally and thus did not sufficiently support a clear weighting pattern for indicators identified by the matrix, we applied equal weights. Additional publications on quality of care indicator weights almost uniformly suggested the use of equal weights [12,49,50].
For the base case scenario, the indicators within each matrix cell were then aggregated using an additive approach, meaning that the values for each indicator within a cell were added together to reflect a raw sum with the maximum sum (i.e. cell score) varying between cells depending on the total number of indicators within a given cell. In the respective Alternative B (Table 1), we used geometric instead of additive aggregation (see below).
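A small sketch of the two within-cell aggregation rules, using hypothetical indicator values. The geometric variant is shown as a plain unscaled product, which is an assumption; any further scaling applied in the study is not reproduced here.

```python
from math import prod


def cell_score_additive(indicators):
    """Base case: raw sum of the indicator values in one cell."""
    return sum(indicators)


def cell_score_geometric(indicators):
    """Alternative B: product of the indicator values in one cell."""
    return prod(indicators)


# Hypothetical cell with six binary indicators, one of them not met.
indicators = [1, 1, 0, 1, 1, 1]

additive = cell_score_additive(indicators)
geometric = cell_score_geometric(indicators)

# A cell where every indicator is met scores the maximum under both rules.
all_met = [1, 1, 1, 1, 1, 1]
```

With binary indicators, a single unmet indicator sets the product to zero, which is why the geometric rule cannot be compensated by the remaining indicators.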

Cells within matrix: normalization, weighting, and aggregation
In the next step, we combined the cell scores into a single composite score. As the maximum cell scores (ranging from 6 to 19) differed depending on the number of indicators identified for each cell, we rescaled each cell score to a range from 0 to 1, except in Alternative C, where Z-score standardization was used to rescale [12]. Rescaling the cell scores ensured that each cell contributes equally to the overall composite score. These rescaled scores were subsequently aggregated using equal weights to obtain an overall composite score ranging from zero to twelve. In the respective Alternative D (Table 1), we replaced the additive aggregation of cell scores with geometric aggregation (see below).
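The cell-level step can be sketched as below for two cells and three facilities. The data are hypothetical, the min-max rescaling uses the observed minimum and maximum (an assumption about the bounds), and the Z-scores use the population standard deviation (also an assumption).

```python
from math import prod
from statistics import mean, pstdev


def min_max(scores):
    """Rescale one cell's scores across facilities to the range 0 to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_scores(scores):
    """Standardize one cell's scores to mean 0 and standard deviation 1."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sd for s in scores]


def composite(rescaled_cells, geometric=False):
    """Combine equally weighted cell scores into one score per facility."""
    combine = prod if geometric else sum
    return [combine(values) for values in zip(*rescaled_cells.values())]


# Hypothetical raw cell scores for three facilities.
raw_cells = {("safe", "structure"): [4, 6, 8],
             ("effective", "process"): [10, 19, 10]}

rescaled = {cell: min_max(s) for cell, s in raw_cells.items()}
base = composite(rescaled)                         # additive, base case
alternative_d = composite(rescaled, geometric=True)
alternative_c = composite({cell: z_scores(s) for cell, s in raw_cells.items()})
```

With twelve cells instead of two, the additive composite would range from 0 to 12, matching the scale described above.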

Uncertainty and sensitivity analysis
A number of uncertainties arising from decisions taken at various steps, such as the choice of normalization and aggregation methods, can influence the outcome of a composite score. Therefore, we calculated the outcomes under different but theoretically equally valid decisions to evaluate whether they led to a practically relevant difference [51]. Given the many steps and decisions taken in response to the underlying data, we further explored possible uncertainties introduced by not opting for an alternative approach at a given step [51]. To do so, we created a set of alternative composite indices that each differed in one decision step and compared these to our base composite index. The four alternative approaches are as follows (see also Table 1):
Alternative A Instead of transforming non-binary variables (ordinal or continuous) into a binary form, we re-scaled them to a range between 0 and 1. This alternative approach could increase distortion by extreme values, but at the same time widens the contribution of variables that have a narrow range of values across the sample, thus better reflecting the actually measured information and underlying variance of these variables [12].
Alternative B Geometric aggregation (i.e. multiplying indicator values) of indicators to obtain a cell score, rather than arithmetic aggregation. With this alternative, "0" values in single indicators can no longer be compensated by the remaining indicators, which would have a larger effect on the outcome in the case of binary measurements.
Alternative C Standardization using Z-scores in each cell to achieve normalization, which converted cell score values to a standardized scale with a mean of 0 and a standard deviation of 1. With standardization, cell indicators with extreme values will have a greater effect on the resulting composite score [12].
Alternative D Geometric aggregation to combine cell score into a composite score, decreasing the extent of compensation of low cell score values by high values.
Our sensitivity analysis consisted of a descriptive comparison of the scores and applying ranks to each studied facility using the base and alternative scores. Robustness of facility ranking using the base index compared to the alternative indices was determined using Spearman rank correlation.
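The ranking comparison can be sketched as a Spearman rank correlation computed as the Pearson correlation of the two rank vectors, with tied scores given their average rank. The function names and the four facility scores are hypothetical.

```python
from statistics import mean


def average_ranks(scores):
    """Rank scores so that 1 is the highest; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the tied positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two score vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical base and alternative composite scores for four facilities.
base_scores = [10.2, 9.1, 8.7, 7.0]
alternative_scores = [0.90, 0.95, 0.50, 0.10]

base_ranks = average_ranks(base_scores)
alternative_ranks = average_ranks(alternative_scores)
rho = spearman(base_scores, alternative_scores)
```

Because only ranks enter the calculation, the coefficient is unaffected by the very different absolute scales of the base and alternative scores.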

Deconstruction
We deconstructed the base and alternative composite scores by evaluating each cell within the matrix, comparing sample means and confidence intervals (95% confidence intervals, +/− 2 standard deviations) using base and alternative scores. Furthermore, we applied the same methods to evaluate the elements of structure, process, and outcome.
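A minimal sketch of this deconstruction is given below, summarizing each cell by its sample mean and an interval of plus or minus two standard deviations, and summing cells by element. The data are hypothetical, and both the use of the sample standard deviation and the summing of cells within an element are assumptions.

```python
from statistics import mean, stdev


def summarize(scores):
    """Sample mean with an interval of +/- 2 standard deviations."""
    m, sd = mean(scores), stdev(scores)
    return {"mean": m, "low": m - 2 * sd, "high": m + 2 * sd}


def element_totals(cell_scores, element):
    """Per-facility sum of all cell scores belonging to one element."""
    selected = [scores for (_, elem), scores in cell_scores.items()
                if elem == element]
    return [sum(values) for values in zip(*selected)]


# Hypothetical rescaled cell scores for four facilities.
cell_scores = {("safe", "structure"): [0.2, 0.4, 0.6, 0.8],
               ("accessible", "structure"): [1.0, 0.5, 0.5, 0.0],
               ("effective", "process"): [0.3, 0.3, 0.9, 0.5]}

per_cell = {cell: summarize(s) for cell, s in cell_scores.items()}
structure_totals = element_totals(cell_scores, "structure")
structure_summary = summarize(structure_totals)
```

Applying `summarize` to both the base and an alternative score for the same cell shows how a methodological choice shifts the mean and widens or narrows the interval.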

Results
We begin with presenting the results of the literature review exercise in choosing indicators, followed by the empirical results of the data analysis.

Indicator selection
From the reviewed literature, we initially identified 271 possible indicators representing the quality of obstetric and neonatal care (Additional file 2). Mapping across our conceptual matrix, at least one identified indicator covered each of the quality of care elements and dimensions. When matching our available data from Malawi to this literature-based comprehensive indicator set, we failed to sufficiently match two of the six dimensions: efficiency and equity. We considered possible indicators for efficiency and equity using our data, but insufficient numbers of variables in our dataset reflecting efficiency and equity would not properly represent these dimensions in comparison to other dimensions. Our final result yielded 85 indicators distributed among 4 quality dimensions representing structure, process, and outcome elements with the data available to us (Additional file 3). Table 3 presents the base and four alternative composite scores for each of the 26 facilities (labeled A through Z in descending order per rank of facility-specific base score). Correspondingly, the ranks of the base (line) and alternative scores (dots) are presented in Fig. 1 Facility Rankings. The scores and ranks vary considerably by method choice and noted differences are as follows:

Uncertainty and sensitivity analysis
Base vs. alternative A: Rescaled ordinal or continuous variables are compared to binary variables (base case). Alternative A scores are slightly more condensed and therefore show less variation across facilities. This method reduces the extreme ends of the scale, and extremely well-performing facilities are no longer seen as potential outliers. Despite the narrow distribution of values in the underlying data, the variation in values remains minimal. This method showed no significant outliers in facility rankings compared to the base case (Fig. 1).
Base vs. alternative B: Additive aggregation (base case) is compared to geometric aggregation of binary indicators into cell scores. With the same maximum possible points as the base case, in the absence of perfect quality, geometric aggregation leads to substantially lower scores in each cell than additive aggregation and, therefore, to lower total scores. In addition, this method created outliers in facility rankings compared to the base case. This can be seen best with Facility N (Fig. 1), which ranked 14th in the base case, but dropped to 23rd due to not meeting all indicators in 9 of the 12 cells. Although this facility met some or most of the indicators in each cell, it did not meet all indicators in the majority of cells, which caused this facility to be most affected by this method.
Base vs. alternative C: Rescaled cell scores using the min-max method (base case) are compared to standardization using Z-scores for each cell, which re-expresses the underlying scale relative to the sample mean and standard deviation. Although this allows for easier identification of exceptionally good and poor performing facilities, it is a relative metric, only allowing for comparison of facilities against each other; it does not give the user a way to easily identify how well the facilities are performing in absolute terms (e.g. against some standard, against another sample). This method also shows no significant outliers in facility rankings compared to the base case.
Base vs. alternative D: Additive aggregation of the cell scores (base case) is compared to geometric aggregation into a composite score. The maximum scale range is now 0-1, with almost all facilities identified as poor performers. This method is extremely sensitive to a low performance in a given cell score and therefore does not allow for much differentiation between facilities. This method also showed a significant difference in facility rankings compared to the base case, most notably for Facility P, which was ranked 16th in the base case, but drops to 26th when using geometric aggregation at the cell level. This facility scored "0" in the patient-centered structure cell, which then resulted in a total score of "0" using this method.
The respective coefficients produced by the Spearman rank correlations comparing the base to each alternative score ranged from 0.90 to 0.99 (Table 4), indicating that aggregation and transformation decisions had only a small impact on the resulting facility ranking. As expected, of all alternative cases, geometric aggregation at the indicator level led to the biggest discrepancies from the base case. The Z-score standardization resulted in the most similar rankings to the base case.
Lastly, we deconstructed the matrix to take a closer look at how cell scores differed depending on the method used (Table 5). With the base score, there is a greater range in the confidence interval of the outcome element, which contains fewer indicators within each cell. Alternative A is similar to the base composite index, with slightly lower score values. Alternative B shows significantly lower scores and a greater range of confidence intervals. Of particular note is the "effective" dimension along the "process" element, with no facility able to meet every indicator within that cell, as seen in Alternative B. This is also demonstrated in the dimension "accessible" along the "structure" element, which is the best performing dimension within the "structure" element in the base case. However, "accessible" becomes the worst performing dimension in Alternative B, as the majority of facilities could meet at least some indicators, but very few could meet every indicator within this cell.

Discussion
With this study, we present an approach towards developing a composite index for maternal obstetric and neonatal quality of care tailored to a low-income country context. Starting from an established conceptual framework, we illustrate a sequence of steps towards a maternal obstetric and neonatal quality of care composite index using literature and an existing data set. To highlight the transparency in our approach, we compare alternative scores representing different decision pathways. We believe this illustration provides a useful outline to be applied and adapted as necessary to other quality of care data sets.

Composite score development

Quality of care framework
Identifying the most adequate conceptual framework is critical to creating a theoretical foundation for the assessment of complex, multidimensional constructs. Ideally, this framework should be defined a priori and guide the selection of indicators, identification of appropriate data sources, and the design of evaluation tools [12]. In this case, we were unable to fully match all matrix dimensions with the data available to us. We evaluated the possibility of linking to other data sets by reviewing the Health Management Information Systems (HMIS) data and Service Provision Assessment (SPA) obtained by the Malawi Ministry of Health [52]. In addition, we examined the Demographic and Health Survey data obtained by the National Statistics Office [53,54]. Unfortunately, the data from these surveys did not cover the specific time period when our data was collected, nor was it disaggregated by facility, so it could not be incorporated into our data.
The resulting composite index was limited to those aspects of quality initially captured by the data and thus omits measures of equity and efficiency. Although efficiency and equity are considered essential components in improving maternal health in low-income countries, these measures are often not considered specifically in regard to the quality of maternal and neonatal health care services and are rarely included in quality assessment tools, as can be seen by the lack of indicators in these dimensions in our initial indicator table (Additional file 2) [45,55,56]. Despite the lack of attention to the equity and efficiency dimensions, these aspects are important to policy makers and donors who want to ensure financial assistance is provided in an effective manner while aligning their goals with the providers of care [57,58]. Further research could better identify appropriate and usable efficiency and equity indicators specifically related to quality of care and maternal health when aiming for comprehensive evaluation.
On the other hand, composite indices should also reflect user-friendliness, applicability, and reproducibility to inform benchmarking or performance evaluation across settings [14]. To this extent, comprehensiveness should be weighed against feasibility and practicability. To ensure easy reproducibility, an ideal composite index should consist of a limited, but relevant set of key indicators. This is especially true for the assessment of obstetric care in LICs and remains an ongoing pursuit, mainly limited by the availability of reliable and routinely collected quality measures [6,59]. Quality of care is a widely framed construct that tries to address a variety of perspectives, therefore a universally accepted quality of care composite is difficult to achieve. The underlying quality dimensions remain rather universal, but universally accepted indicators may be difficult to achieve or differ in relevance between settings. Therefore, most obstetric care quality indices are limited in comprehensiveness by the indicators available [25].
In this regard, a more feasible approach could be to embrace these limitations and promote the development of composite indices in response to a program's particular quality of care focus and available data, which may differ depending on location and time. While clearly limited in universality, such composite indices may still be relevant provided they are constructed following a set of standards that maintain transparency with respect to strengths and limitations. In line with this more feasibility-driven approach, we tried to illustrate how such standards and transparency could be applied to the development of a composite index built upon program-specific data related to obstetric and neonatal care [28].

Uncertainty analysis
A major pitfall in combining indicators into composite indices is the introduction of uncertainties, knowingly or unknowingly, through decisions taken in the normalization, weighting, or aggregation of indicators, which may bias the resulting score towards desired aspects of care [60]. Statistical comparison of different decisions during the development of a composite index allows these uncertainties and biases to be understood and offers the opportunity to explore how such decisions may affect the composite scores and thus facility rankings. In our illustration, all scores were relatively consistent in assigning high or low ranks to a given facility (Table 4). Thus, none of the alternative scenarios drastically affected the relative comparison of quality of care between the studied facilities. Still, given a different sample, index differences might have been more pronounced. Below we point out some strengths and weaknesses of the alternatives, especially in relation to the concept of quality of care and its communication to stakeholders.
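As an illustration of how such a comparison of rankings might be run, the following minimal Python sketch computes a Spearman rank correlation between facility scores obtained under two construction choices. The scores, variable names, and helper functions are hypothetical and are not taken from the study data; ties are assigned average ranks, which is one common convention.

```python
def average_ranks(values):
    """Rank values so that the highest score gets rank 1; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical composite scores for five facilities (not study data).
base_scores = [0.82, 0.64, 0.71, 0.40, 0.55]   # e.g. a base index on a 0-1 scale
alt_scores = [1.90, 0.35, 0.20, -1.40, -0.10]  # e.g. an alternative on a z-score scale

rho = spearman(base_scores, alt_scores)
print(f"Spearman rho between base and alternative rankings: {rho:.2f}")
```

Because only ranks enter the calculation, the two score sets can sit on entirely different scales and still be compared, which matches the situation where absolute values differ by method but rankings remain similar.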
Alternative A differs from the base score in that non-binary variables were rescaled prior to aggregation into a cell score. With these variables now contributing values between 0 and 1 (instead of the somewhat arbitrary cut-off values used in the base index), the scores for all but two cells contain decimal rather than integer values while keeping the same score range for each cell as in the base score. Overall, this led to lower absolute values for the resulting composite scores because, without a set cut-off, fewer facilities reach the extreme values of "1" or "0" for the respective indicators. This relative increase in variability of the cell scores also resulted in smaller confidence intervals when compared to the binary variables of the base score (Table 5). However, if an extreme value is not excluded, rescaling can distort the indicator by clustering the remaining smaller or larger values at one end of the range [12]. This alternative is beneficial in instances where indicator content is ordinal (e.g. patient satisfaction ratings) and/or continuous (e.g. number of staff available) and a common scale needs to be created.
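The difference between a cut-off and rescaling can be sketched as follows. This is a minimal Python illustration that assumes min-max rescaling as the rescaling method; the staff counts and the cut-off value are hypothetical and chosen only to show how a single extreme value compresses the remaining facilities toward one end of the range.

```python
def min_max_rescale(values):
    """Rescale values linearly to the 0-1 range (min-max rescaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no variation: every facility gets the same value
    return [(v - lo) / (hi - lo) for v in values]


def binarize(values, cutoff):
    """Code an indicator as 1 if it meets the cut-off, otherwise 0."""
    return [1 if v >= cutoff else 0 for v in values]


# Hypothetical number of staff available at five facilities; the last is an extreme value.
staff_counts = [2, 3, 4, 5, 20]

binary = binarize(staff_counts, cutoff=4)   # hypothetical cut-off, as in a binary base index
rescaled = min_max_rescale(staff_counts)    # continuous values between 0 and 1

print("binary:  ", binary)
print("rescaled:", [round(r, 3) for r in rescaled])
```

In this example the four facilities with 2 to 5 staff all receive rescaled values below 0.2, whereas the cut-off version separates them into two groups; excluding or capping the extreme value before rescaling would be one way to limit this clustering.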
Since the vast majority of variables in our data set were binary, transforming all variables to binary form was more feasible for the base index. Regardless, the scores and subsequent deconstruction for Alternative A can be easily read and communicated to various stakeholders.
Alternative score C differs from the base score in that cell scores were standardized prior to aggregation into the overall composite. While the rescaled cell scores in the base score range from 0 to 1, this standardization centers the resulting scores around a mean of 0. Applying this method to indicators or cell scores prior to aggregation prevents distortions that would otherwise arise from differently scaled cell means. Still, as this normalization approach does not restrict the individual cell scores to a fixed range, it allows individual facilities with extreme score values (i.e. exceptionally good or bad performance for the given score) to contribute more to their overall composite score. This might be desirable if the resulting composite is intended to reward exceptional performance on single indicators. In our illustration, this element of exceptionality compared to the base score was most pronounced in the relative distance between the top-ranked facility "A" and the next-ranking facilities. However, this approach prevents any further comparison of performance between matrix components when deconstructing the composite averaged across facilities. Because each sub-score is centered by the standardization, the resulting means will always be "0" unless additional steps (e.g. retransformation) are taken [12].
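The two properties of standardization described above can be checked with a short Python sketch. It assumes z-score standardization using the population standard deviation and an unweighted mean across cells; the cell scores are hypothetical, and only the cell labels are borrowed from the quality matrix.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical rescaled cell scores (0-1) for five facilities in two cells of the matrix.
cell_scores = {
    "effective/process": [0.9, 0.5, 0.5, 0.5, 0.6],    # first facility is exceptional here
    "accessible/structure": [0.4, 0.3, 0.5, 0.2, 0.6],
}

z_scores = {cell: standardize(vals) for cell, vals in cell_scores.items()}

# Deconstruction by cell: the mean across facilities is 0 for every standardized cell.
cell_means = {cell: mean(zs) for cell, zs in z_scores.items()}

# Facility composite: unweighted mean of the standardized cell scores (an assumption).
n_facilities = len(cell_scores["effective/process"])
composite = [mean(z_scores[cell][i] for cell in z_scores) for i in range(n_facilities)]

print({cell: round(m, 10) for cell, m in cell_means.items()})
print([round(c, 2) for c in composite])
```

The first facility's single exceptional cell score yields a large z-score that lifts its composite above the others, while the per-cell means collapse to 0 and can no longer be compared across cells.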
Alternative scores B and D differ from the base score in that geometric aggregation was applied when combining indicators into cell scores or cell scores into the overall composite. This approach limits the degree to which aggregated measures can compensate for each other, that is, the degree to which a low score in one area can be offset by better performance in other areas. In alternative B, the geometric aggregation of binary cell indicators results in an all-or-nothing situation within each cell, as a single indicator value of "0" reduces the aggregated cell score to "0" [14,15]. In our illustration, the majority of facilities did not score a value of 1 for every single indicator within a cell, especially with respect to quality of care related to accessible/structure (availability of functional equipment, supplies, drugs), effective/process (adherence to clinical standards), and patient-centered/process aspects (see Table 5). Whereas the base score represents quality of care along a continuum that allows for a gradual increase in scores, geometric aggregation does not allow for this flexibility and demands near-perfect performance, which may be less feasible in low-income environments due to a lack of supplies or equipment.
This effect was even more pronounced in alternative D once geometric aggregation was applied to the rescaled cell scores. While geometric aggregation in our example reduced the variability of the resulting scores (alternative D), it also had the strongest effect on facility ranking (alternative B). This effect on the ranking reflects how inadequate performance in one measured item is no longer compensated, thus rewarding facilities whose performance is more complete across all measured indicators. This "all-or-nothing" scenario may be desirable in implementing and evaluating health financing programs (e.g. pay for performance), but it may create perverse incentives if a facility believes it cannot meet every indicator in a particular area and therefore focuses only on areas where it can achieve all indicators [61,62]. In the context of obstetric care, where omission of single processes and lack of specific equipment or supplies might have severe implications for the birth outcome [37], a composite accounting for such non-compensable single omissions may be preferable.
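The contrast between compensatory and non-compensatory aggregation can be made concrete with a small Python sketch. It assumes equal weights, with the arithmetic mean standing in for additive aggregation and the unweighted geometric mean for geometric aggregation; all indicator values and cell scores are hypothetical.

```python
from math import prod


def arithmetic_mean(values):
    """Additive aggregation: a low value can be offset by higher values elsewhere."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric aggregation: the n-th root of the product of n non-negative values."""
    return prod(values) ** (1 / len(values))


# Hypothetical binary indicators within one cell; one item (e.g. a missing drug) is 0.
cell_indicators = [1, 1, 1, 0, 1]
additive_cell = arithmetic_mean(cell_indicators)
geometric_cell = geometric_mean(cell_indicators)

# Hypothetical rescaled cell scores (0-1) for one facility with one weak cell.
rescaled_cells = [0.9, 0.8, 0.1]
additive_composite = arithmetic_mean(rescaled_cells)
geometric_composite = geometric_mean(rescaled_cells)

print(f"binary cell:    additive={additive_cell:.2f}, geometric={geometric_cell:.2f}")
print(f"rescaled cells: additive={additive_composite:.2f}, geometric={geometric_composite:.2f}")
```

With binary indicators a single 0 drives the geometric cell score to 0 regardless of the other items, whereas with rescaled cell scores the weak cell lowers the geometric composite below the additive one without eliminating it entirely.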
Lastly, we addressed alternative methods for imputing missing data (Additional file 1). We had considered more complex imputation approaches but decided against them for two main reasons, one statistical and one conceptual. First, methods such as random or multiple imputation require larger sample sizes to work properly, and using them would therefore have risked further biasing our study in unknown ways. Second, from a policy perspective, we wished to illustrate how a large number of variables can feasibly be combined into an overall quality score usable in, for instance, monitoring and evaluation systems. In light of this, we were reluctant to use approaches that would require more in-depth statistical knowledge to replicate.

Policy implications
This article illustrates how a large variety of dimensions and elements of quality of care can be combined into a meaningful and easy-to-handle composite score useful for ranking facilities by their quality level, monitoring facilities' progress in quality improvement, and determining which specific quality areas may need more attention. Our composite index was guided by the data sets we had available and the low-income context in which the data were obtained. Therefore, the indicators comprised not only processes that are necessary for providing quality of care in any context, but also inputs such as essential medicines and basic equipment that are often not readily available in a LIC [37]. As the indicators were obtained from the literature citing standards of maternal and neonatal quality of care, this index can be applied in multiple contexts. Yet indicators may need adaptation, as is often required, to align with the local context [34]. We further hope that our example was instrumental in sensitizing readers to the implications of certain key decisions in the aggregation process.

Conclusions
Identifying and addressing gaps in the quality of maternal and neonatal healthcare is an essential function of any health system seeking to improve health outcomes. Providing condensed indicators of quality of care in the form of a composite index can be a useful adjunct but can also introduce biased information if the index is not constructed carefully. In this paper, we outline and illustrate an approach to a composite index reflecting a multi-dimensional framework of maternal obstetric and neonatal healthcare. In so doing, we provide a step-wise process applicable to a wide range of obstetric quality of care assessment programs in LICs, as it can be easily adapted and implemented in a given setting or context. A comprehensive matrix combining both elements and dimensions of care allows deconstruction of the composite into cell scores representing specific aspects of quality. In reflecting on and exploring alternative approaches, we attempted to highlight how certain decisions influence the practicability or usefulness of a given index. By integrating known quality frameworks, we were able to develop a composite index that can communicate a multidimensional quality assessment of obstetric and neonatal healthcare to multiple stakeholders, potentially informing policy changes and new interventions.

Additional files
Additional file 1: Missing Data. The first table identifies the number and percentage of missing data, followed by the base case method for imputing missing data compared to an alternative using proxy variables, which are listed where used for imputation. An additional alternative method was examined, which coded missing direct observation values as task performed (1), versus task not performed (0) in the base case. The results of these alternative imputation methods are compared in the tables containing the facility composite scores and Spearman rank correlations. (DOCX 26 kb)
Additional file 2: Initial indicator Table. This table identifies