Application of principal component analysis and logistic regression model in lupus nephritis patients with clinical hypothyroidism

Background Previous studies indicate that the prevalence of hypothyroidism is much higher in patients with lupus nephritis (LN) than in the general population, and is associated with LN activity. Principal component analysis (PCA) and logistic regression can help determine relevant risk factors and identify LN patients at high risk of hypothyroidism; as such, these tools may prove useful in managing this disease. Methods We carried out a cross-sectional study of 143 LN patients diagnosed by renal biopsy, all of whom had been admitted to Xiangya Hospital of Central South University in Changsha, China, between June 2012 and December 2016. The PCA-logistic regression model was used to determine the influential principal components for LN patients who have hypothyroidism. Results Our PCA-logistic regression analysis demonstrated that serum creatinine, blood urea nitrogen, blood uric acid, total protein, albumin, and anti-ribonucleoprotein antibody were important clinical variables for LN patients with hypothyroidism. The area under the curve of this model was 0.855. Conclusion The PCA-logistic regression model performed well in identifying important risk factors for certain clinical outcomes, and applying it in clinical research on other diseases may also be beneficial. Using this model, clinicians can identify at-risk subjects and either implement preventative strategies or manage current treatments.


Background
Systemic lupus erythematosus (SLE) is a multisystem autoimmune disease, and lupus nephritis (LN) is a frequently occurring and serious complication of SLE [1,2]. Studies indicate that the prevalence of hypothyroidism is much higher in SLE, and especially among LN patients, than in the general population [3][4][5][6]; additionally, the risk of subsequent cardiovascular events and renal impairment is higher among LN patients with thyroid dysfunction.
Accordingly, analysis of the associations between LN and hypothyroidism and a determination of relevant risk factors would greatly aid in diagnosis and disease management.
However, the pathological and physiological mechanisms underlying SLE with hypothyroidism are complex. Furthermore, the availability of multiple indicators and of large relevant datasets makes it difficult to analyse clinical data directly; therefore, the precise nature of these mechanisms remains unknown [6][7][8].
Logistic regression is widely used to analyse the relationship between individual risk/protective factors and outcomes [9]. However, if the variables therein are collinear, the regression equation will be unstable and its coefficient estimates unreliable. Principal component analysis (PCA) is a powerful method by which to explore intricate datasets that feature multiple variables. PCA uses a mathematical algorithm to determine a smaller number of new variables called principal components (PCs), which are linear functions of those in the original dataset. Hence, PCA scales down the dimensionality of a large dataset while preserving as much statistical information as possible [10,11]. As such, the current study's use of PCA helps ensure the stability of the regression equation. In fact, PCA has previously been used to analyse complex serological and immunological datasets with multiple variables in cross-sectional studies. Raymond et al. [12] used PCA to describe the dynamic interplay and influence of complex cytokines measured in serum, and to detect the cytokine groups that differed across levels of disease activity in SLE patients. Helmy et al. [13] used PCA to identify cytokine groups that accounted for the majority of the variation within the serological laboratory test data in traumatic brain injury patients.
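The statement that PCs are linear functions of the original variables can be written out as follows. This is the textbook formulation rather than anything specific to this study, and the symbols are assumptions introduced here for illustration: z_j denotes the j-th standardized variable, p the number of variables, R their correlation matrix, and a_k and λ_k the k-th eigenvector and eigenvalue of R.

```latex
\mathrm{PC}_k = \mathbf{a}_k^{\top}\mathbf{z} = \sum_{j=1}^{p} a_{kj}\, z_j,
\qquad
\mathbf{R}\,\mathbf{a}_k = \lambda_k\,\mathbf{a}_k,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0,

\operatorname{Var}(\mathrm{PC}_k) = \lambda_k,
\qquad
\operatorname{Cov}(\mathrm{PC}_k, \mathrm{PC}_l) = 0 \quad (k \ne l),
\qquad
\text{contribution of } \mathrm{PC}_k = \frac{\lambda_k}{\sum_{j=1}^{p}\lambda_j} = \frac{\lambda_k}{p}.
```

Because unrotated PCs are mutually uncorrelated, using them as regressors removes the collinearity present among the original variables. An oblique rotation, such as Oblimin, relaxes this property and may leave the rotated components correlated to some degree.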
The current study examines the laboratory test results of selected patient populations, and leverages PCA-logistic regression analysis to pinpoint key PCs. Such information may greatly assist in the prevention or management of this disease.

Patients
In our cross-sectional study, we investigated 143 LN patients diagnosed through renal biopsy who had been admitted to Xiangya Hospital of Central South University in Changsha, China, between June 2012 and December 2016. The exclusion criteria were the coexistence of another autoimmune disease or a diagnosis of thyroid disease prior to LN. All patients were informed of the objectives of this study, and each provided signed written consent prior to enrolment. As this research did not affect patient treatment, as per Central South University policies, ethics board approval was not required.

Collection of clinical data
Data on patient characteristics, clinical symptoms, and laboratory results were retrospectively collected from each patient's medical records. These included: (1) general information, including age and sex; (2) clinical symptoms, including course of disease, hypertension, fever, cutaneous manifestations, alopecia, oral ulcer, malar rash, renal dysfunction (proteinuria), and haematological disease; and (3) laboratory results, including white blood cell count, haemoglobin (Hb) concentration, concentration of total protein (TP), serum lipid, erythrocyte sedimentation rate, C-reactive protein, C3, C4, and antibodies to dsDNA, Smith, SSA, SSB, U1 ribonucleoprotein, and ribosomal P protein. Patients' SLE disease activity index (SLEDAI) scores were collected from medical records and calculated by an experienced clinician.

Statistical analysis
Values herein are expressed as mean (standard deviation), median (interquartile range), or number (percentage). We compared categorical variables by using the χ² test, and continuous variables in two independent groups by using the t-test. In cases where a variable was not normally distributed, we performed the Mann-Whitney U-test instead.
We performed PCA by using the factor analysis procedure in SPSS software, to determine the interplay of clinical variables among LN patients with and without hypothyroidism. We achieved convergence during an Oblimin rotation with Kaiser normalization. The final PCA iteration covered nine clinical variables in the patient group analysed. To be retained as a PC, a component's eigenvalue had to exceed 1, and PC 1 represents the group of variables that accounted for the greatest amount of variation in the data. We used logistic regression to further screen for clinically significant PCs and to scrutinize critical factors that affect outcomes among LN patients.
We performed the analysis in three stages. First, we performed a univariate analysis to examine differences between LN patients with and without hypothyroidism. Second, we performed PCA with regard to all the serology, immunology, and biochemistry variables of LN patients. We transformed those data by rotational reorientation to maximize variance along each new axis (i.e., PC) while concurrently preserving the relationship and order among the data points; the PCs could then be used in further classification, as they retain information from the original data. Third, we extracted the PCs that together accounted for the absolute majority of the cumulative contribution (> 2/3) and used them as independent variables, with the clinical outcome as the dependent variable, in logistic regression modelling. In this way, we were able to obtain the PCs that significantly correlated with certain clinical outcomes. We generated a receiver operating characteristic (ROC) curve to assess the PCA-logistic regression model's performance. Statistical analysis was performed using SPSS (version 19), and all p-values less than 0.05 were considered statistically significant.
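The eigenvalue-greater-than-1 retention rule described above can be sketched in a few lines. This is a minimal illustration using NumPy rather than the SPSS procedure used in the study; the function name `kaiser_retained` and the simulated data are hypothetical, and no rotation is applied.

```python
import numpy as np

def kaiser_retained(X):
    """Return the sorted eigenvalues of the correlation matrix and the
    number of components whose eigenvalue exceeds 1 (Kaiser criterion)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending order
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, n_keep

# Hypothetical data: six variables driven by two independent latent factors.
rng = np.random.default_rng(42)
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)

def noisy(factor):
    """Add independent measurement noise to a latent factor."""
    return factor + 0.5 * rng.normal(size=n)

X = np.column_stack([noisy(f1), noisy(f1), noisy(f1),
                     noisy(f2), noisy(f2), noisy(f2)])

eigvals, n_keep = kaiser_retained(X)
# Share of total variance carried by the retained components.
explained = eigvals[:n_keep].sum() / eigvals.sum()
print(n_keep, round(float(explained), 2))
```

An eigenvalue of 1 corresponds to the variance of a single standardized variable, so a component below that threshold summarizes less information than one original variable would.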
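Two of the steps above lend themselves to a short sketch: choosing how many PCs are needed for the cumulative contribution to exceed 2/3, and computing the area under the ROC curve from predicted scores. The function names and all numbers below are hypothetical and are not taken from the study; the logistic regression fit itself is omitted.

```python
def n_components_for(eigenvalues, threshold=2 / 3):
    """Smallest number of leading PCs whose cumulative contribution
    (share of total variance) exceeds the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total > threshold:
            return k
    return len(eigenvalues)

def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical eigenvalues: cumulative shares are 3/7, 5/7, 6/7, ...
k = n_components_for([3.0, 2.0, 1.0, 0.5, 0.5])    # 2
# Hypothetical outcomes (1 = hypothyroidism) and predicted probabilities.
a = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])        # 0.75
print(k, a)
```

The pairwise definition of the AUC used here is equivalent to the area under the plotted ROC curve and avoids having to choose classification thresholds explicitly.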

Patient characteristics
We compared the clinical characteristics of 48 LN patients with hypothyroidism and 94 LN patients with

Principal component analysis
To cover as many indices that affect the outcomes of LN with hypothyroidism as possible, factors with p < 0.05 were included as input variables for PCA. The Kaiser-Meyer-Olkin value was 0.7 when all the clinical variables were included; meanwhile, the p-value of the Bartlett test of the primary data was < 0.001, indicating that the data were suitable for use in PCA. We removed symptomatic variables and those whose extraction values in the communalities table were too small. The model generated nine PCs that explained 74% of the variation within the data (Table 2); the loadings represent the degree of importance of the corresponding variable. For example, the three most important variables in PC 1 were, in order, albumin (ALB) > TP > C3; likewise, the three most important variables in PC 2 were serum creatinine (SCr) > blood urea nitrogen (BUN) > blood uric acid (UA).
In focusing on the indices whose loadings were clearly higher than those of others, we could see that PC 1 mainly reflected renal function (including SCr, BUN, and UA); PC 2 was a serum protein factor (including TP and ALB); PC 3 was a leukocyte factor; and PC 4 was a globulin factor. We additionally found that PC 5 to PC 8 could not be accurately classified as factors bearing a specific meaning, and that PC 9 was an autoantibody factor.
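For readers unfamiliar with the two suitability checks mentioned above, the following sketch computes the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test of sphericity from a data matrix. It uses NumPy and simulated data, not the study's dataset or SPSS output; the function names are hypothetical. The p-value is not computed; instead the statistic is compared with the 5% critical value of the χ² distribution for the example's degrees of freedom.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's test statistic and degrees of freedom for the null
    hypothesis that the correlation matrix is the identity matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # log of det(R), numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(X):
    """Overall KMO: squared correlations relative to squared correlations
    plus squared partial correlations (off-diagonal entries only)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting off-diagonal cells
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

# Hypothetical data: four variables sharing one common factor.
rng = np.random.default_rng(1)
common = rng.normal(size=(300, 1))
X = common + rng.normal(size=(300, 4))

chi2, dof = bartlett_sphericity(X)
kmo_value = kmo(X)
# 12.59 is the 0.95 quantile of the chi-square distribution with 6 degrees of freedom.
print(dof, chi2 > 12.59, round(float(kmo_value), 2))
```

A KMO value closer to 1 indicates that the correlations are largely shared among variables rather than confined to pairs, and a large Bartlett statistic indicates that the variables are correlated enough for dimension reduction to be worthwhile.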

PCA-logistic regression analysis
We used the nine PCs as input variables and the clinical outcome (LN with or without hypothyroidism) as the dependent variable in logistic regression modelling. Our analytical results showed that PC 1, PC 2, and PC 9 had a significant influence on whether LN was combined with hypothyroidism (Table 3); that is to say, SCr, BUN, UA, TP, ALB, and anti-ribonucleoprotein (RNP) antibody might be paramount factors in treating LN with hypothyroidism. It is noteworthy that the Exp(B) values of PC 2 and PC 9 were 2.361 and 4.724, respectively; these indicate that the association between each of these two PCs and hypothyroidism in LN patients was much stronger than that of the other PCs. We also generated an ROC curve (Fig. 1) that lay close to the top-left corner of the coordinate system. The area under the ROC curve (AUC) was 0.885 (p < 0.001).
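In logistic regression output, Exp(B) is the odds ratio associated with a one-unit increase in a predictor, holding the others constant. The short sketch below works through that arithmetic for the two reported values; the function names are hypothetical, and the calculation assumes the standard logistic model rather than any detail specific to Table 3.

```python
import math

def coefficient_from_exp_b(exp_b):
    """Recover the regression coefficient B from a reported Exp(B)."""
    return math.log(exp_b)

def odds_multiplier(exp_b, delta):
    """Factor by which the odds of the outcome change when the PC score
    shifts by `delta` units, all other PCs held fixed."""
    return exp_b ** delta

b_pc2 = coefficient_from_exp_b(2.361)   # coefficient implied for PC 2
b_pc9 = coefficient_from_exp_b(4.724)   # coefficient implied for PC 9
two_unit_pc9 = odds_multiplier(4.724, 2)

print(round(b_pc2, 3), round(b_pc9, 3), round(two_unit_pc9, 2))
```

Read this way, a one-unit increase in the PC 9 score multiplies the odds of hypothyroidism by 4.724, and a two-unit increase multiplies them by 4.724 squared, because effects are multiplicative on the odds scale.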

Discussion
Using PCA-logistic regression analysis, we found that three PCs (namely, PC 1, PC 2, and PC 9), which included SCr, BUN, UA, TP, ALB, and anti-RNP antibody, were important clinical variables with respect to LN patients with hypothyroidism. The Exp(B) values of PC 2 and PC 9 were 2.361 and 4.724, respectively, indicating that the association between these two PCs and the outcome was much stronger than that of the others. Previous studies conclude that the most common kidney derangements associated with hypothyroidism are elevated SCr levels, reduced estimated glomerular filtration rate, and water-electrolyte imbalance [14,15]. Moreover, SCr levels in SLE patients with hypothyroidism were found to be elevated [3]. The current study also showed that renal function indices such as SCr, BUN, and UA are essential factors in whether LN is accompanied by hypothyroidism. Possible mechanisms might include reduced renal perfusion [16], adaptive preglomerular vasoconstriction caused by filtrate overloads [17], and decreased endothelial nitric oxide synthase activity/capacity of the renal vasculature caused by reduced secretion of insulin-like growth factor 1 and vascular endothelial growth factor [18].
Severe hypoalbuminemia has been observed in SLE patients with subclinical hypothyroidism [3]; correspondingly, we found that lower TP and ALB were influential for LN patients with hypothyroidism. Most thyroid hormones are bound to plasma proteins, including thyroxine-binding globulin (TBG), thyroxine-binding pre-albumin (TBPA), and ALB. When the kidney function of LN patients is impaired, TBG, TBPA, and ALB are significantly reduced because of severe and persistent proteinuria, and thyroid hormone synthesis is also affected [19,20]. Furthermore, the serum hormonal concentration may be altered by changes in the binding capacity of serum proteins; patients with hypoproteinemia may therefore exhibit clinical features and laboratory findings suggestive of hypothyroidism [21,22].
Additionally, in this study, a higher anti-RNP antibody level had a strong effect among LN patients with hypothyroidism, which has not been reported before. Anti-RNP antibody reacts with proteins that are associated with U1 RNA and form U1 snRNP. Autoimmunity to RNP autoantigens is frequently seen in systemic autoimmune diseases, including lupus, and may induce renal disease [23][24][25]; as mentioned earlier, impaired kidney function may in turn affect thyroid hormone synthesis. Moreover, the induction of anti-RNP autoantibodies is associated with the initial clinical manifestations of autoimmune disease; in this case, autoantibodies may lead to thyroid hormone synthesis disorders by damaging the thyroid follicular epithelium [26][27][28][29], suggesting that RNP-related immune responses may have pathogenic roles in hypothyroidism. These hypotheses deserve to be verified through further mechanistic research.

Conclusions
The principal component analysis (PCA)-logistic regression model approach used herein is a useful statistical method by which to analyse the effects of multiple clinical index interactions in lupus nephritis (LN) patients who also have hypothyroidism. Using this model, we found serum creatinine (SCr), blood urea nitrogen (BUN), blood uric acid (UA), total protein (TP), albumin (ALB), and anti-ribonucleoprotein (RNP) antibody to be particularly important factors with respect to these patients. Moreover, the impact of PC 9, which mainly involved the anti-RNP antibody, was the strongest among these patients: its Exp(B) was 4.724, the highest among the nine principal components. SCr, BUN, UA, TP, ALB, and autoantibody levels are modifiable factors that can be addressed through early treatment to improve renal function and strengthen nutritional support, in order to reduce risk among LN patients with hypothyroidism. Ultimately, PCA offers valuable insight when exploring the influence of clinical variables or identifying the important factors that affect patient outcomes.