Construction and validation of machine learning models for sepsis prediction in patients with acute pancreatitis

Background This study aimed to construct predictive models for the risk of sepsis in patients with acute pancreatitis (AP) using machine learning methods and to compare the optimal model with the logistic regression (LR) model and scoring systems. Methods In this retrospective cohort study, data were collected from the Medical Information Mart for Intensive Care III (MIMIC III) database between 2001 and 2012 and the MIMIC IV database between 2008 and 2019. Patients were randomly divided into training and test sets (8:2). The least absolute shrinkage and selection operator (LASSO) regression with 5-fold cross-validation was used to screen and confirm the predictive factors. Based on the selected predictive factors, 6 machine learning models were constructed: support vector machine (SVM), K-nearest neighbour (KNN), multi-layer perceptron (MLP), LR, gradient boosting decision tree (GBDT), and adaptive enhancement algorithm (AdaBoost). The models and scoring systems were evaluated and compared using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and the area under the curve (AUC). Results A total of 1,672 patients were eligible for participation. In the training set, 261 AP patients (19.51%) were diagnosed with sepsis. The predictive factors for the risk of sepsis in AP patients included age, insurance, vasopressors, mechanical ventilation, Glasgow Coma Scale (GCS), heart rate, respiratory rate, temperature, SpO2, platelet, red blood cell distribution width (RDW), International Normalized Ratio (INR), and blood urea nitrogen (BUN). The AUC of the GBDT model for sepsis prediction in AP patients in the testing set was 0.985. The GBDT model showed better performance in sepsis prediction than the LR model, the systemic inflammatory response syndrome (SIRS) score, the bedside index for severity in acute pancreatitis (BISAP) score, the sequential organ failure assessment (SOFA) score, quick-SOFA (qSOFA), and the simplified acute physiology score II (SAPS II). Conclusion The present findings suggest that, compared with the classical LR model and the SOFA, qSOFA, SAPS II, SIRS, and BISAP scores, the GBDT machine learning model performed better in predicting sepsis in AP patients and may be a useful tool for early identification of high-risk patients and timely clinical intervention. Supplementary Information The online version contains supplementary material available at 10.1186/s12893-023-02151-y.


Background
Acute pancreatitis (AP), an inflammatory disease of the pancreas, is the leading cause of hospital admissions for gastrointestinal diseases worldwide [1,2]. The worldwide incidence rate of AP is 33.74 per 100,000 person-years and is gradually increasing [3,4]. Approximately 10-20% of patients with AP have disease complicated by systemic inflammatory response syndrome (SIRS) and multiple organ dysfunction syndrome, which can lead to the development of severe AP with a mortality rate of 10-15% [5]. Sepsis is a life-threatening SIRS caused by the host's dysregulated response to infection, which can ultimately lead to septic shock and multiple organ failure and is a leading cause of health loss worldwide [6]. Up to 40-70% of patients with AP will develop a pancreatitis-related infection in the late stages, or sepsis in severe cases [7,8]. The progression of AP to sepsis is associated with higher mortality rates and a poor prognosis [9]. Therefore, early identification of AP patients who are likely to develop sepsis is of great significance for reducing mortality and disease burden.
Several scoring systems have been used to predict the severity and prognosis of AP and sepsis, including the SIRS score, the bedside index for severity in acute pancreatitis (BISAP) score, the sequential organ failure assessment (SOFA) score, quick-SOFA (qSOFA), and the simplified acute physiology score II (SAPS II) [10][11][12]. However, these scoring systems have performed poorly in predicting sepsis [13]. The logistic regression (LR) model based on conventional clinical indicators also showed only moderate performance in predicting sepsis among patients with AP, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.73 [9]. Advanced machine learning algorithms can model nonlinear relationships, analyze complex high-order interactions, and robustly handle multicollinearity among the predictor variables [14]. Machine learning has been widely used in the diagnosis, risk prediction, and prognosis of sepsis. A database study used a machine learning approach to predict 30-day mortality in patients with sepsis, and the AUC of the model was 0.857 [15]. A study conducted in the Chinese population used a machine learning model to predict sepsis in intensive care unit (ICU) patients, and the model showed good predictive ability, with an AUC of 0.91 [16]. In addition, machine learning models have shown excellent predictive value for severe AP and for the risk of concurrent acute kidney injury (AKI) in AP [17,18]. However, to the best of our knowledge, no study has reported the application of machine learning to predicting the risk of sepsis in patients with AP. Early detection and prediction of patients who may develop sepsis are essential to reduce the adverse consequences of AP.
Herein, this study aimed to (1) construct predictive models for the risk of sepsis in patients with AP using machine learning methods and validate their predictive performance; and (2) select the optimal machine learning model and compare it with the LR model and scoring systems. This study may help to identify the risk of sepsis in patients with AP at an early stage and assist in the clinical treatment of AP and the prevention of sepsis.

Data design and study population
This study was a retrospective cohort study based on the MIMIC III and MIMIC IV databases [20,21]. The inclusion criteria were (1) age ≥ 18 years and (2) a diagnosis of AP upon intensive care unit (ICU) admission. The exclusion criteria were (1) a length of ICU stay of less than 24 h and (2) a diagnosis of sepsis upon ICU admission. The requirement for ethical approval was waived by the Institutional Review Board of Tianjin Medical University General Hospital because the data were accessed from the publicly available MIMIC III and MIMIC IV databases. The need for written informed consent was waived by the Institutional Review Board of Tianjin Medical University General Hospital due to the retrospective nature of the study. All methods were performed in accordance with the relevant guidelines and regulations.

Variable definition
Patients with AP were identified using International Classification of Diseases (ICD) codes (ninth edition, code 577.0, or tenth edition, code K85.0). Sepsis was diagnosed according to the Sepsis-3 criteria [22]; in brief, patients with documented or suspected infection and an acute change in total SOFA score of ≥ 2 points were considered to have sepsis. Infection was identified from the ICD codes.
The SOFA score quantifies the presence and severity of dysfunction in six organ systems (respiratory, coagulation, liver, cardiovascular, kidney, and nervous systems), with a score of 0-4 for each system and a total score of 0-24 [23]. The qSOFA score was calculated from the presence of altered mental status, respiratory rate > 22 breaths per minute, and systolic blood pressure < 100 mmHg [22]. The SAPS II score (0-163) consists of 17 variables: 12 physiological variables, age, type of admission, and three underlying disease variables [24]. The components of the BISAP scoring system were BUN > 25 mg/dl, impaired mental status, SIRS, age > 60 years, and pleural effusion [25]. SIRS was defined as two or more of the following four criteria: temperature > 38.0 °C or < 36.0 °C, heart rate > 90 beats/minute, respiratory rate > 20 breaths/minute, and leukocytosis > 12,000/mm³ or leucopenia < 4,000/mm³ [26].
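As an illustration of the definitions above, the SIRS and BISAP criteria can be written as simple threshold checks. The following is a minimal sketch, not the code used in this study; the function and argument names are hypothetical, and the white blood cell count is assumed to be expressed in cells per mm³.

```python
def sirs_criteria_met(temp_c, heart_rate, resp_rate, wbc_per_mm3):
    """Count how many of the four SIRS criteria described in the text are met."""
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,                # abnormal temperature
        heart_rate > 90,                               # beats/minute
        resp_rate > 20,                                # breaths/minute
        wbc_per_mm3 > 12_000 or wbc_per_mm3 < 4_000,   # leukocytosis or leucopenia
    ]
    return sum(criteria)


def has_sirs(temp_c, heart_rate, resp_rate, wbc_per_mm3):
    """SIRS is present when two or more of the four criteria are met."""
    return sirs_criteria_met(temp_c, heart_rate, resp_rate, wbc_per_mm3) >= 2


def bisap_score(bun_mg_dl, impaired_mental_status, sirs_present, age_years, pleural_effusion):
    """One point per BISAP component, giving a score from 0 to 5."""
    return sum([
        bun_mg_dl > 25,
        bool(impaired_mental_status),
        bool(sirs_present),
        age_years > 60,
        bool(pleural_effusion),
    ])


# Hypothetical patient, for illustration only (not study data).
sirs = has_sirs(temp_c=38.5, heart_rate=110, resp_rate=18, wbc_per_mm3=9_000)
bisap = bisap_score(bun_mg_dl=30, impaired_mental_status=False,
                    sirs_present=sirs, age_years=65, pleural_effusion=False)
```

In this hypothetical case two SIRS criteria (temperature and heart rate) are met, so SIRS contributes one point to the BISAP score alongside BUN and age.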

Outcome and follow-up
The outcome of the study was the risk of sepsis. Follow-up was conducted during hospitalization in the ICU, and the end point of follow-up was sepsis or discharge from the ICU. The mean follow-up time was 3.64 (1.93-9.70) days.

Construction and performances assessment of the machine learning models
The patients were randomly divided into two groups, of which 80% were used as the training set and the remaining 20% as the testing set. Based on the selected predictive factors, 6 machine learning models were constructed: support vector machine (SVM), K-nearest neighbor (KNN), multi-layer perceptron (MLP), LR, gradient boosting decision tree (GBDT), and adaptive enhancement algorithm (AdaBoost). The models were evaluated and compared by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and the AUC of the ROC curve.
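For readers less familiar with these evaluation measures, the sketch below shows how they follow from a 2 × 2 confusion matrix, with the AUC computed as the probability that a randomly chosen patient with sepsis receives a higher predicted score than a randomly chosen patient without sepsis (ties counted as one half). It is a minimal illustration on toy data, not the evaluation code of this study; all names are hypothetical, and both classes are assumed to be present among the true labels and the predictions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for labels coded 1 (sepsis) and 0 (no sepsis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and accuracy from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not study data).
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.3, 0.2, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # 0.5 cut-off is an assumption

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, y_prob)
```

Note that sensitivity, specificity, PPV, NPV, and accuracy depend on the chosen probability cut-off, whereas the AUC summarizes discrimination across all cut-offs.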

Sample size calculation for predictive models
Our sample size calculation aimed to ensure precise estimation of model parameters while minimizing the potential for overfitting. To achieve a target mean absolute prediction error (MAPE) of 0.05, as suggested by Riley et al. [27], 478 samples would be sufficient for a risk prediction model with a maximum of 13 predictors.
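The calculation can be illustrated with the closed-form approximation for the MAPE criterion described in the sample size guidance for binary-outcome prediction models. The sketch below assumes that approximation, with its coefficients as we understand them, and assumes an outcome fraction equal to the sepsis proportion in the training set (0.1951). The outcome fraction actually used for the calculation is not stated here, and the function name is hypothetical.

```python
import math


def sample_size_for_mape(outcome_fraction, n_predictors, target_mape):
    """Approximate sample size needed to reach a target mean absolute prediction error.

    Assumed form of the approximation:
        n = exp((-0.508 + 0.259*ln(phi) + 0.504*ln(P) - ln(MAPE)) / 0.544)
    where phi is the outcome fraction and P is the number of candidate predictors.
    """
    log_n = (
        -0.508
        + 0.259 * math.log(outcome_fraction)
        + 0.504 * math.log(n_predictors)
        - math.log(target_mape)
    ) / 0.544
    return math.exp(log_n)


# Inputs: the outcome fraction is an assumption; 13 predictors and MAPE 0.05 are from the text.
required_n = sample_size_for_mape(outcome_fraction=0.1951, n_predictors=13, target_mape=0.05)
```

Under these assumptions the function returns a value close to the 478 samples quoted above, which is well below the 1,338 patients available in the training set.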

Statistical analyses
Variables with more than 20% missing values were excluded from further analysis. Random forest imputation was used to handle variables with less than 20% missing values. Random forest imputation is a nonparametric algorithm that accommodates nonlinearities and interactions and does not require the specification of a particular parametric model [28]. Supplementary Table 1 shows the variables with missing values below 20%. Sensitivity analysis was performed to compare the data before and after imputation (Supplementary Table 2). Means ± standard deviations (SD) were used to describe normally distributed measurement data, and the t-test was used to compare the differences between the two groups. Medians and quartiles were used to represent measurement data that did not conform to a normal distribution, and rank-sum tests were used for comparisons between groups. Count data were expressed as the number of cases and composition ratio (%), and the chi-square test was used for comparison between groups.
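The missing-data rule described above can be sketched as a simple filter applied before imputation. This is an illustrative example only: the variable names and values are made up, the helper names are hypothetical, and the random forest imputation step itself is not shown.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(values):
    """Fraction of entries in one variable that are missing."""
    return sum(1 for v in values if is_missing(v)) / len(values)


def split_by_missingness(columns, threshold=0.20):
    """Keep variables with at most `threshold` missing; list the rest as dropped."""
    kept, dropped = {}, []
    for name, values in columns.items():
        if missing_fraction(values) > threshold:
            dropped.append(name)      # excluded from further analysis
        else:
            kept[name] = values       # would be passed on to imputation
    return kept, dropped


# Made-up values for illustration only (not study data).
columns = {
    "bun": [18.0, None, 25.0, 31.0, 12.0, 40.0],      # 1 of 6 missing
    "lactate": [None, None, 2.1, None, 1.4, None],    # 4 of 6 missing
}
kept, dropped = split_by_missingness(columns)
```

Here the hypothetical "lactate" variable would be excluded, while "bun" would be retained and its missing entry imputed.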
The least absolute shrinkage and selection operator (LASSO) regression ("LassoCV" method in Sklearn) with 5-fold cross-validation was used to screen and confirm the predictive factors. The best alpha (0.0075) was selected using one standard error of the minimum mean squared error (MSE) as the screening criterion. DeLong's test was used to select the optimal model from the 6 machine learning models. The performance of the optimal machine learning model was then compared with that of the LR model and the scoring systems (SOFA, qSOFA, SIRS, SAPS II, and BISAP). Clinical benefit was assessed using decision curve analysis (DCA). P < 0.05 was considered statistically significant. Python 3.9.0 (Python Software Foundation) and R (version 4.2.2) were used for all analyses.
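To make the DCA step concrete, the sketch below computes the standard net benefit at a given threshold probability, together with the "treat all" reference strategy. It is a minimal illustration on toy data under the usual definition of net benefit, not the analysis code of this study; the function names are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    NB = TP/n - FP/n * (pt / (1 - pt)), where pt is the threshold probability.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # relative harm of a false positive
    return tp / n - fp / n * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data for illustration only (not study data).
y_true = [1, 0, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.7, 0.1]

nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A decision curve is obtained by repeating this calculation over a range of threshold probabilities; the "treat none" strategy has a net benefit of zero at every threshold.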

Basic characteristics of the study population
A total of 1,930 participants diagnosed with AP were screened from the MIMIC III and MIMIC IV databases; of these 1,930 patients, 256 were excluded because their length of ICU stay was less than 24 h, and 2 were excluded because they were aged < 18 years. Finally, 1,672 patients were eligible for participation, with 1,338 patients in the training set and 334 patients in the testing set. The flow chart of the participants' selection is depicted in Fig. 1. In the training set, 261 AP patients (19.51%) were diagnosed with sepsis. Among the AP patients with sepsis in the training set, the mean age was 58.43 (16.50) years, 58.2% were male, 42.9% were married, 67.8% were on vasopressors, and 95.8% were on mechanical ventilation. Their mean GCS score was 9.73 (4.48), their mean heart rate was 102.91 (23.15) bpm, and their mean respiratory rate was 22.57 (6.86) breaths/minute. There were significant differences between AP patients with and without sepsis in insurance, marital status, vasopressors, mechanical ventilation, GCS, heart rate, SBP, respiratory rate, SpO2, WBC, RDW, blood creatinine, BUN, bicarbonate, SOFA, qSOFA, SAPS II, SIRS, and BISAP (each P < 0.05). All baseline characteristics of the study population are summarized in Table 1.

Predictive factors selection for the risk of sepsis in AP patients
After LASSO regression selection with 5-fold cross-validation via minimum criteria, 13 variables remained as the predictive factors for the risk of sepsis in AP patients: age, insurance, vasopressors, mechanical ventilation, GCS, heart rate, respiratory rate, temperature, SpO2, platelet, RDW, INR, and BUN. Fig. 2 shows the loss curves for the MSE loss with different Lambda. The SHAP plot (Fig. 3) shows the relationship between the value of features and their impact on the model prediction. Each row represents the SHAP value distribution of a feature, and the x-axis refers to the SHAP value, where a SHAP value > 0 shows that the prediction favors the positive class, and a value < 0 indicates that the prediction tends toward the negative class. The color of the sample points in Fig. 3 indicates the corresponding feature value: redder points mean higher feature values, while bluer points indicate lower feature values. The features are sorted according to the sum of SHAP values over all the samples in the dataset.
Fig. 2 The loss curves for the MSE loss with different Lambda
Fig. 3 The SHAP plot of the relationship between the value of features and their impact on the model prediction
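The choice of penalty from cross-validated loss curves such as those in Fig. 2 can be sketched as follows. The example returns both the penalty with the minimum mean MSE and the largest penalty whose mean MSE lies within one standard error of that minimum. It is a minimal sketch on made-up fold errors, not the selection code of this study; the function name and the numbers are hypothetical.

```python
import math


def select_alpha(alphas, fold_mse):
    """Return (alpha at minimum mean MSE, alpha chosen by the one-standard-error rule).

    fold_mse[i] holds the per-fold MSE values obtained with alphas[i].
    """
    stats = []
    for alpha, errors in zip(alphas, fold_mse):
        k = len(errors)
        mean = sum(errors) / k
        variance = sum((e - mean) ** 2 for e in errors) / (k - 1)
        standard_error = math.sqrt(variance / k)
        stats.append((alpha, mean, standard_error))

    alpha_min, mean_min, se_min = min(stats, key=lambda s: s[1])
    cutoff = mean_min + se_min
    # Largest (most regularizing) alpha whose mean MSE is still within the cutoff.
    alpha_1se = max(alpha for alpha, mean, _ in stats if mean <= cutoff)
    return alpha_min, alpha_1se


# Made-up 5-fold errors for illustration only (not study results).
alphas = [0.001, 0.005, 0.01, 0.05]
fold_mse = [
    [0.10, 0.12, 0.11, 0.13, 0.09],
    [0.10, 0.11, 0.10, 0.12, 0.09],
    [0.11, 0.11, 0.10, 0.12, 0.10],
    [0.20, 0.22, 0.19, 0.21, 0.20],
]
alpha_min, alpha_1se = select_alpha(alphas, fold_mse)
```

The one-standard-error rule favors a sparser model than the minimum criterion, since a larger penalty shrinks more coefficients to zero at a small cost in cross-validated error.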

Discussion
In this retrospective study, we developed and validated machine learning-based models for predicting sepsis in AP patients. In the training set, 261 AP patients (19.51%) were diagnosed with sepsis. The results of this study showed that the GBDT model had an excellent performance in the prediction of sepsis in patients with AP, with an AUC of 0.985 in the testing set. Furthermore, the GBDT model achieved better predictive performance for sepsis in AP patients than the LR model and the scoring systems. Advanced machine learning methods are good at dealing with high-order interactions and fitting complex nonlinear relationships, and can be used to integrate large amounts of data from electronic health records (EHRs). The application of machine learning to data-driven analysis shows promise for improving predictive performance in healthcare [29][30][31]. A large retrospective study developed and validated a machine learning tool for prediction in patients with AP within 48 h after admission [32]. A retrospective study enrolling patients with AP from multiple centers explored a machine learning model for early identification of severe AP (SAP) among patients hospitalized for AP, and the model showed evident clinical practicability [17]. The study by Qiu et al. developed and validated three machine learning models for predicting multiple organ failure in moderately severe and severe AP [33]. A systematic review included 47 machine learning predictive models for AP, with 10 studies reporting severity prediction, 10 studies complication prediction, 3 studies mortality prediction, 2 studies recurrence prediction, and 2 studies surgery timing prediction [34]. The study by İnce et al. evaluated the success of artificial intelligence for early prediction of severe course, survival, and ICU requirements in patients with AP [35]. A meta-analysis suggested that the machine learning approach performed better than the existing sepsis scoring systems in predicting sepsis [36]. A systematic review and meta-analysis showed that individual machine learning models can accurately predict sepsis onset ahead of time [37]. A machine learning model for prediction of sepsis in ICU patients showed good predictive ability in Chinese sepsis patients [16]. However, few studies have constructed predictive models for the risk of sepsis in patients with AP using machine learning methods. This study used machine learning methods to construct predictive models for the risk of sepsis in patients with AP and validated their predictive performance.
The results of this study showed that the GBDT model had an excellent performance in predicting sepsis in AP patients. The GBDT model has been applied to diagnose and predict the outcomes of several diseases. A study that developed and assessed machine learning models for predicting recurrence risk after endovascular treatment in patients with intracranial aneurysms found that the GBDT model showed the optimal performance for predicting recurrence within 6 months of treatment [38]. Lee et al. established machine learning models for predicting the risk of end-stage renal disease among chronic kidney disease patients who survive sepsis, and the GBDT algorithm yielded an AUC as high as 0.879 [39]. Furthermore, we compared the performance of the machine learning models, the traditional LR model, and the scoring systems in predicting sepsis in AP patients at an early stage. The results showed that the GBDT model achieved the best predictive performance. Similarly, a previous study suggested that, compared with the classical LR model, machine learning models using features that can be easily obtained at admission performed better in predicting AKI in AP patients [40]. A retrospective temporal validation study suggested that an interpretable machine learning model performed significantly better than LR and outperformed conventional severity scores in predicting in-hospital mortality among sepsis patients and varying subgroups [31]. The high AUC of the GBDT model compared with traditional models and scoring systems suggests that machine learning models could be used as an adjunct to clinical decision making and to the provider's intuition regarding patient prognosis and ideal next steps in care. Early and effective identification of AP patients at high risk of sepsis can help prevent further deterioration of the patient's condition. This study may help clinicians to develop individualized treatment plans for patients, reducing the disease burden on patients and facilitating the rational allocation of medical resources.
GBDT is an ensemble algorithm widely used for regression and classification tasks. The GBDT algorithm builds multiple weak learners (individual decision trees) sequentially, with each new tree fitted to the errors of the current ensemble, and integrates their outputs to make predictions. The GBDT algorithm is considered relatively insensitive to hyperparameters, less prone to overfitting, and easy to implement. To support the practical applicability of the GBDT model in a clinical setting, an example of how SHAP can be used locally to explain an individual prediction was provided (Supplementary Fig. 1). The GBDT model is a promising approach for sepsis prediction in AP patients, but further research is still needed to evaluate its generalizability to other tasks and its computational efficiency.
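The sequential, additive structure of gradient boosting can be illustrated with a deliberately simplified sketch: one feature, one-split trees (stumps), and squared-error loss, for which the negative gradient is simply the residual. This is not the GBDT implementation used in this study, which addresses a classification task with many features; all names and the toy data are hypothetical, and the feature is assumed to take at least two distinct values.

```python
def fit_stump(x, residuals):
    """Find the single-threshold split of one feature that minimises squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the maximum so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(x, y, n_trees=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the residuals of the current ensemble."""
    base = sum(y) / len(y)              # initial prediction: the mean outcome
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        pred = [pi + learning_rate * (left_mean if xi <= threshold else right_mean)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if xi <= t else rm) for t, lm, rm in stumps)


# Toy data for illustration only (not study data).
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
model = fit_boosted_stumps(x, y)
low_risk, high_risk = predict(model, 1), predict(model, 6)
```

Each stump corrects only a fraction of the remaining error (the learning rate), which is why many small trees are combined rather than one large tree being fitted.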
This study suggested that the basic characteristics of patients (age, temperature, and insurance) and vital signs (heart rate, respiratory rate, and SpO2) were associated with the risk of sepsis in AP. A study by Hong et al. indicated that age may be useful for predicting the development of persistent organ failure in patients with AP [41]. According to the study by Miller et al., an ED-SAS score that incorporates factors including SpO2 and age provides a rapid method for predicting prognosis in AP [42]. Temperature has been reported as a predictive factor for sepsis in AP patients [9]. Heart rate has been observed to be associated with severe AP [43]. Interventions can also predict the risk of sepsis in AP. Early vasopressor use was significantly associated with increased in-hospital mortality among critically ill AP patients [44]. We found that inflammatory markers, including RDW and platelets, can predict the risk of sepsis in AP patients. As a part of routine blood tests, RDW is a quantitative measurement of the size variability of peripheral red blood cells (RBCs), which reflects the heterogeneity of RBCs. Because changes in the shape and size of circulating RBCs are often related to the occurrence and development of hematological diseases, RDW is used for the morphological classification of anemia and the differential diagnosis of microcytic anemia [45]. RDW is positively associated with AP severity and is likely a useful predictive parameter of AP severity [46]. Platelets are small pieces of cytoplasm shed by mature megakaryocytes, which participate in hemostasis. When a stress response secondary to acute and critical disease occurs, the number of platelets changes, and the degree of platelet change affects sepsis [47]. A study by Feng et al. found that a low platelet count increases the risk of sepsis in patients with AP [9]. As a simple, routine, and widely available laboratory parameter, BUN has been proposed as a marker of disease severity [48]. In this study, BUN could be used to predict the risk of sepsis in AP patients. A study by Hong et al. demonstrated that BUN could predict severe AP [49]. Farrell et al. found that persistent elevation of BUN is associated with the development of severe AP [50]. Previous studies have also suggested that BUN is strongly associated with sepsis [51,52]. The GCS was originally developed to assess the level of consciousness of patients with head injuries and has become an important part of systems for determining the severity of an injury [53]. In this study, GCS could predict the risk of sepsis in patients with AP. A retrospective analysis also demonstrated that GCS was among the predictive factors of sepsis among patients with AP [9].
Fig. 4 The ROC curve comparison between GBDT and LR models and scoring systems
Fig. 5 The net benefit of GBDT model, LR model, and scoring systems at different threshold probabilities for predicting sepsis in AP patients
Our study has several strengths. To the best of our knowledge, this is the first report of the application of machine learning models to predict the risk of sepsis in AP patients using the MIMIC database. The optimal model was screened using a variety of machine learning methods and showed significantly better predictive value than LR and the scoring systems, providing a basis for the accurate prediction of sepsis risk in AP patients. The sample size in this study is sufficient for the construction and validation of prediction models. A larger sample size is valuable for developing a more robust prediction model with good generalization ability and good statistical efficacy for a wider population. However, the study was still subject to some limitations. First, the retrospective nature of the study may have introduced unavoidable selection bias, which limits the interpretation of the results. Second, the MIMIC data were obtained from a single center in the United States, which may affect the generalizability of the prediction model to other populations. The results may not be representative of the entire population of AP patients, although we attempted to provide detailed information in our study. Third, the study included AP patients in MIMIC-III and IV, which cover hospitalized patients from 2001 to 2019. The population studied here is not consecutive, and therefore different biases may have been introduced. As treatment regimens are developed and optimized over time, consistency of treatment cannot be guaranteed, which may introduce some bias into the results. Fourth, radiological results in AP, specific chemoradiotherapy information, vasopressor dosage, and details of mechanical ventilation may have an impact on our results, but the lack of radiological data in the database prevented us from performing further analyses. Fifth, the study lacked external validation. External validation is crucial to assess the generalizability and reliability of the model, especially when using data from a single center. Therefore, it would be important to perform further validation on an independent dataset in future studies to examine the robustness and generalization ability of the proposed model, which might greatly increase the impact of the current findings. Future research will also need to explore other machine learning algorithms for predicting sepsis in AP patients.

Conclusions
This study constructed and validated machine learning models to predict sepsis in patients with AP. The GBDT model, based on 13 predictive factors, showed promising performance in predicting sepsis in AP patients. Such a prediction model may be a useful tool for the early identification of high-risk patients and timely clinical intervention.

Fig. 1 The flow chart of the participants' selection
Data were collected from the Medical Information Mart for Intensive Care III (MIMIC III) database (https://mimic.mit.edu/docs/iii/) between 2001 and 2012 and the MIMIC IV database (https://mimic.mit.edu/docs/iv/) between 2008 and 2019. MIMIC-III includes data from more than 58,000 admissions to Beth Israel Deaconess Medical Center in Boston from 2001 to 2012, including 38,645 adults and 7,875 neonates [19], and MIMIC-IV includes 524,740 admissions for 382,278 patients at this center from 2008 to 2019.

Table 2
Construction and performance validations of machine learning models Notes: SVM: Support vector machine; KNN: K-nearest neighbor; MLP: multi-layer perceptron; LR: logistic regression; GBDT: gradient boosting decision tree; AdaBoost: adaptive enhancement algorithm; PPV: Positive predictive values; NPV: Negative predictive values; AUC: Area under curve; CI: confidence interval; Ref: Reference methods to construct predictive models for the risk of sepsis in patients with AP and validated the predictive performance.

Table 3
Comparisons of the predictive performances of the GBDT model with LR model, SOFA, qSOFA, SAPS II, SIRS, and BISAP scores