Construction and validation of a predictive model of invasive adenocarcinoma in pure ground-glass nodules less than 2 cm in diameter

Objectives In this study, we aimed to develop a multiparameter prediction model to improve the diagnostic accuracy of invasive adenocarcinoma in pulmonary pure ground-glass nodules. Method We included patients with pulmonary pure ground-glass nodules who underwent lung resection and had a definitive pathological diagnosis between January 2020 and January 2022 at the Qilu Hospital of Shandong University. We collected data on the clinical characteristics of the patients as well as their preoperative biomarker results and computed tomography features. Thereafter, we performed univariate and multivariate logistic regression analyses to identify independent risk factors, which were then used to develop a prediction model and nomogram. We then evaluated the discriminatory ability of the model via receiver operating characteristic (ROC) curve analysis and assessed its calibration using the Hosmer-Lemeshow test and calibration curves. Further, to assess the clinical utility of the nomogram, we performed decision curve analysis. Result We included 563 patients, comprising 174 and 389 cases of invasive and non-invasive adenocarcinoma, respectively, and identified seven independent risk factors, namely, maximum tumor diameter, age, serum amyloid level, pleural effusion sign, bronchial sign, tumor location, and lobulation. The area under the ROC curve was 0.839 (95% CI: 0.798–0.879) for the training cohort and 0.782 (95% CI: 0.706–0.858) for the validation cohort, indicating a relatively high predictive accuracy for the nomogram. Calibration curves for the prediction model also showed good calibration for both cohorts, and decision curve analysis showed that the clinical prediction model has clinical utility. Conclusion The novel nomogram constructed for identifying invasive adenocarcinoma in patients with solitary pulmonary pure ground-glass nodules exhibited excellent discriminatory power, calibration capacity, and clinical utility.
Supplementary Information The online version contains supplementary material available at 10.1186/s12893-024-02341-2.


Introduction
Owing to advances in diagnostic imaging and the widespread use of low-dose computed tomography (LDCT) screening, an increasing number of pulmonary pure ground-glass nodules (pGGNs) are being detected, causing alarm to patients [1][2][3][4]. A ground-glass nodule (GGN) is defined as a nodule with a slightly increased density that does not obscure underlying bronchial structures or vascular margins in high-resolution CT images [5]. Depending on the presence or absence of solid components, GGNs can be classified as pure GGNs (pGGNs) or partially solid GGNs. Notably, pGGNs are defined as GGNs without solid components [6].
Reportedly, pGGNs progress slowly, and at different pathological stages, they exhibit different growth patterns and show varying degrees of invasiveness. The pathological stages of pGGNs include atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), microinvasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC) [7][8][9][10]. Currently, there are no uniform guidelines for GGN surgery. Some studies have shown that sublobar resection is acceptable for AAH, AIS, and MIA, but is unsuitable for IAC [11,12]. With adequate surgical resection, patients with AIS and MIA exhibit approximately 100% disease-free survival [13][14][15], whereas the 5-year survival rate of patients with localized IAC varies in the range of 70-90% [16,17]. Therefore, the preoperative identification of IAC is essential to help clinicians choose the correct procedure for their patients.
In clinical practice, identifying IAC manifesting as a pGGN is challenging. Attempts have been made in previous studies to distinguish between preinvasive adenocarcinomas and IACs presenting as pGGN [18][19][20][21][22][23][24][25][26]. Further, the correlation between pathological manifestations and nodules presenting as solid, partially solid, and pGGN has been investigated in other studies [27,28]. However, these previous studies did not consider a wide variety of assessment factors. Additionally, the validity of some of these studies is limited by the small number of patients included.
In this study, we aimed to retrospectively analyze a relatively large number of patients with pGGN < 2 cm and develop a multiparametric predictive model and nomogram using patients' clinical information, hematological findings, and imaging features to improve the ability of clinicians to diagnose IAC in pGGN and provide a basis for rational clinical decision-making.

Materials and methods
This study was approved by the Ethics Committee of Qilu Hospital, Shandong University (registration number: KYLL-202008-023-1), and all the patients read and signed the informed consent form prior to this study, approving the use of their clinical information.

Patient selection
In this study, we retrospectively evaluated patients who underwent minimally invasive pneumonectomy with a clear pathology of pulmonary nodules at the Qilu Hospital of Shandong University between January 2020 and January 2022. The inclusion criteria were as follows: (1) patients with a single intrapulmonary nodule based on a chest CT scan performed within 1 month before surgery; (2) patients with lung nodules with maximum diameter ≤ 2 cm; (3) patients with pGGN without any solid component as indicated via CT imaging; (4) absence of pulmonary atelectasis and active inflammation based on lung images; (5) clear pathological findings obtained after surgical resection; (6) asymptomatic at diagnosis; and (7) no preoperative treatment. The exclusion criteria were as follows: (1) age < 18 years, (2) history of thoracic surgery, (3) incomplete perioperative data, (4) history of malignant disease within the past 5 years, and (5) metastatic tumors. The patients included in our study were screened according to the inclusion and exclusion criteria (Fig. 1). Further, we used a random split-sample approach to assign all the enrolled patients to either the training cohort or validation cohort at a ratio of 7:3. The training cohort was used to develop the prediction nomogram, while the validation cohort was used to verify the performance of the nomogram.
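The 7:3 split-sample step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text only states that the split was random, so the stratification by outcome, the rounding-up rule, the seed, and the function name `stratified_split` are all assumptions. The label counts (174 IAC, 389 non-IAC) are the ones reported in the Results.

```python
import math
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx), sampling each outcome class separately.

    Assumption: the training share of each class is rounded up.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, valid = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)  # random order within the class
        k = math.ceil(train_frac * len(idx))
        train.extend(idx[:k])
        valid.extend(idx[k:])
    return sorted(train), sorted(valid)


# 1 = IAC, 0 = non-IAC, using the cohort totals reported in the Results.
labels = [1] * 174 + [0] * 389
train_idx, valid_idx = stratified_split(labels)
train_iac = sum(labels[i] for i in train_idx)
valid_iac = sum(labels[i] for i in valid_idx)
print(len(train_idx), train_iac, len(valid_idx), valid_iac)
```

Under these assumptions the sketch yields 395 training and 168 validation patients, with 122 and 52 IAC cases, respectively, which agrees with the cohort sizes reported in the Results. That agreement is suggestive only and does not establish that a stratified split was actually used.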

Data collection and variable definitions
The following data were collected from the hospital database for all the eligible patients: (1) demographic data: sex, age, smoking history, body mass index (BMI), and preoperative comorbidities [hypertension, diabetes, and chronic obstructive pulmonary disease (COPD)]; (2) preoperative assessment data: American Society of Anesthesiologists (ASA) score, percentage of the predicted forced expiratory volume in one second (FEV1% predicted), and percentage of the predicted value of maximal voluntary ventilation (MVV% predicted); (3) laboratory blood test indicators: blood type, serum complement C1q, lactate dehydrogenase (LDH), serum amyloid (SA), serum 5'-nucleotidase (5'-NT), blood sugar, albumin, neutrophil, eosinophil, basophil, monocyte, lymphocyte, erythrocyte, hemoglobin, and platelet levels, as well as the derived indices prognostic nutritional index (PNI), neutrophil-lymphocyte ratio (NLR), derived neutrophil-to-lymphocyte ratio (dNLR), platelet-lymphocyte ratio (PLR), monocyte-lymphocyte ratio (MLR), neutrophil-to-lymphocyte and platelet ratio (NLPR), systemic inflammation response index (SIRI), the aggregate index of systemic inflammation (AISI), systemic inflammation index (SII), and pan-immune-inflammation value (PIV); (4) lung cancer tumor markers: pro-gastrin-releasing peptide (pro-GRP), carcinoembryonic antigen (CEA), squamous cell carcinoma antigen (SCC), cytokeratin 19-fragment (cyfra21-1), carcinoma antigen 125 (CA125), and neuron-specific enolase (NSE) levels; (5) CT image characteristics: nodule location (central or peripheral), nodule shape (regular or irregular), spiculation (sunburst appearance), calcification, lobulation, cavitation signs, pleural adhesions, vascular penetration signs, bronchus signs, lymph node enlargement signs, pleural effusion signs, and maximum tumor diameter; and (6) postoperative pathological results. PNI, NLR, dNLR, MLR, NLPR, SIRI, AISI, SII, and PIV were calculated from the corresponding blood test values.
All chest CT examinations included the complete thorax and were performed in the supine position. Scans were obtained during deep inspiration with the patients holding their breath. The CT images were examined and interpreted by two radiologists with more than 5 years of experience in thoracic radiology. The two radiologists independently determined the characteristics of each image, while a third radiologist, with over 20 years of experience in thoracic radiology, reassessed the measurements to check for discrepancies. Any discrepancies noted were resolved through discussion among the three radiologists. Central nodules were defined as those located in the bronchi, lobular bronchi, or segmental bronchi of the lungs, while peripheral nodules were defined as those located below the tertiary bronchi. Spiculation was defined as diffusion from the nodule edge into the lungs without contact with the pleural surface. Further, cavitation was defined as the presence of spaces filled with gas and considered regions of transparency or low attenuation. Pleural adhesion was defined as the linear attenuation of pleura or fissures from the nodule. Patterns in CT images, including delamination, central nodule, diffusion, or a popcorn pattern, were considered signs of calcification. A pulmonary artery passing through the nodule, as observed in the CT images, was indicative of vascular penetration. Further, bronchial signs on the CT image showed direct bronchial engagement of the nodule. Lobulation was defined as a wavy or scalloped portion on the surface of a lesion, with strands stretching from the nodal edge to the lung parenchyma. Pleural effusion was defined based on blunting of the costophrenic angle in the CT image. Lymph node enlargement was defined as a lymph node axis > 1 cm in length in the CT image.
All pathological samples were fixed in formalin, stained with hematoxylin and eosin (H&E), and assessed by two experienced lung pathologists. The histopathological evaluation was performed by examining H&E-stained slides under a light microscope. All registered GGNs had clear pathological diagnoses. Pathological findings were divided into five groups: benign, AAH, AIS, MIA, and IAC. AAH, AIS, MIA, and IAC were diagnosed according to the International Association for the Study of Lung Cancer (IASLC)/American Thoracic Society (ATS)/European Respiratory Society (ERS) classification of lung adenocarcinoma in resected specimens [29]. Additionally, owing to the poor prognosis of patients with IAC, patients with benign lesions, AAH, AIS, and MIA were designated as the non-IAC group.
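The derived blood indices named in this section can be computed from routine blood counts as sketched below. The expressions used here are the definitions in common use in the literature and are assumptions, since the authors' own expressions are not available in this text. The function name, the input units (cell counts in 10^9/L, albumin in g/L), and the optional `wbc` argument for dNLR are likewise illustrative.

```python
def inflammatory_indices(neut, lymph, mono, plt, albumin_g_l, wbc=None):
    """Compute derived indices from blood counts (commonly used definitions).

    Cell counts are assumed to be in 10^9/L and albumin in g/L.
    """
    if lymph <= 0 or plt <= 0:
        raise ValueError("lymphocyte and platelet counts must be positive")
    out = {
        "NLR": neut / lymph,                  # neutrophil-lymphocyte ratio
        "PLR": plt / lymph,                   # platelet-lymphocyte ratio
        "MLR": mono / lymph,                  # monocyte-lymphocyte ratio
        "NLPR": neut / (lymph * plt),         # neutrophil / (lymphocyte x platelet)
        "SIRI": neut * mono / lymph,          # systemic inflammation response index
        "SII": plt * neut / lymph,            # systemic inflammation index
        "AISI": neut * mono * plt / lymph,    # aggregate index of systemic inflammation
        "PIV": neut * plt * mono / lymph,     # pan-immune-inflammation value
        "PNI": albumin_g_l + 5.0 * lymph,     # prognostic nutritional index
    }
    if wbc is not None and wbc > neut:
        # dNLR needs the total white cell count, which is an extra input here.
        out["dNLR"] = neut / (wbc - neut)
    return out


# Hypothetical patient values, for illustration only.
example = inflammatory_indices(neut=4.0, lymph=2.0, mono=0.5, plt=250.0,
                               albumin_g_l=42.0, wbc=7.0)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under these common definitions AISI and PIV are numerically identical, so a model would normally carry only one of them forward.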

Establishment of the predictive model
First, data for the training cohort were analyzed using univariate analysis to assess all the factors affecting IAC in pGGNs. Thereafter, all factors with P < 0.2 in the univariate analysis were included for further multivariate logistic regression analyses. A predictive model and nomogram were constructed using R statistical software (Windows version 4.2.1, http://www.r-project.org). The scores for each variable were calculated using a regression model, and the predicted probability of IAC in pGGNs was derived by summing the scores for each variable.

Predictive model and nomogram performance
The performance of the predictive nomogram was evaluated based on its discriminatory power, calibration ability, and clinical utility. Discrimination describes the ability of a model to properly distinguish between events and non-events. Receiver operating characteristic (ROC) curves were used to evaluate the efficiency of the predictive nomograms [30]. Calibration gauges the extent to which predicted probabilities correspond to actual results. We used the Hosmer-Lemeshow test to assess the calibration capability of the model, and P > 0.05 was indicative of a satisfactory calibration ability [31]. Further evaluation of the calibration ability of the model was performed by constructing nomogram calibration curves. Internal verification was performed by repeating the bootstrap method 1,000 times [32]. Decision curve analysis (DCA) was used to assess the clinical utility of the predictive nomogram based on the net benefit at different threshold probabilities [33]. We determined the optimal cutoff value based on ROC curve analysis results for the training cohort when the Youden index (sensitivity + specificity − 1) reached its maximum value.
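The two discrimination quantities described above, the area under the ROC curve and the cutoff that maximizes the Youden index, can be illustrated with a short sketch. The data are a hypothetical toy example, the function names are illustrative, and the rule "predict positive when the score is at or above the cutoff" is an assumption about how the cutoff is applied.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, scores):
    """Return (J, cutoff, sensitivity, specificity) for the maximal Youden index."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= c)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < c)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0  # Youden index
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best


# Hypothetical predicted probabilities and outcomes (1 = IAC).
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
auc = auc_mann_whitney(y, p)
j, cutoff, sens, spec = youden_cutoff(y, p)
print(auc, cutoff, sens, spec)
```

The pairwise AUC is quadratic in the sample size, which is acceptable for cohorts of a few hundred patients; a rank-based formula would be preferable for much larger data sets.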

Patient characteristics
The procedure for identifying and selecting eligible patients is illustrated in Fig. 1.
A total of 563 eligible patients were included in this study. Among these patients, there were 48 cases of benign nodules and AAH, 163 cases of AIS, 178 cases of MIA, and 174 cases of IAC. Further, we classified all the cases under the non-IAC (n = 389) or IAC (n = 174) group, according to the criterion, "whether the nodule had IAC characteristics." We then randomly assigned the enrolled patients to either the training cohort (n = 395) or validation cohort (n = 168) in a 7:3 ratio; no significant differences existed between the two cohorts with respect to any of the variables (Table 1). Thus, there were 273 non-IAC and 122 IAC cases in the training cohort and 116 non-IAC and 52 IAC cases in the validation cohort. The characteristics of the training and validation cohorts are presented in Table 2.

Nomogram establishment
All seven independent risk factors for pGGNs measuring ≤ 2 cm were included in the logistic regression models. Details regarding the prediction model are presented in Table 4. The probability (p) of IAC in small pGGNs was then calculated according to the following equation: ln[p/(1 − p)] = 2.41 × maximum tumor diameter + 0.053 × age + 0.049 × SA − 1.245 × tumor location (central = 0; peripheral = 1) + 0.935 × pleural adhesions (no = 0; yes = 1) + 0.979 × bronchus sign (no = 0; yes = 1) + 0.815 × lobulation (no = 0; yes = 1) − 12.759. Further, we plotted the nomogram for the probability of IAC in pGGNs of size ≤ 2 cm using R statistical software, based on the above equation (Fig. 3). The nomogram comprised 10 axes, with axes 2-8 representing the seven variables in the prediction model. By drawing a vertical line from each variable axis to the points axis at the top of the nomogram, the score of each risk factor can be read off and summed to obtain the total score, which is then used to predict, before surgery, the probability that a pGGN is IAC. Thus, appropriate treatment and surgery modalities can be selected.
As shown in this nomogram, there are a total of 10 axes, and axes 2-8 represent the seven variables in the prediction model. By plotting a line perpendicular to the highest point axis, the estimated score for each risk factor can be calculated and can be further summed to obtain a total score. The total point axis was then used to predict the probability of IAC for pGGNs measuring ≤ 2 cm before surgery.
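As a worked illustration of the regression equation given in this section, the sketch below converts the linear predictor into a probability with the inverse logit. The coefficients are those stated in the text. The function and argument names are illustrative, and the units are assumptions: the diameter is taken to be in centimetres (the nodules are ≤ 2 cm), age in years, and SA in whatever unit the laboratory reports, which the text does not specify. The patient values are hypothetical.

```python
import math

INTERCEPT = -12.759


def predicted_probability(diameter_cm, age_years, sa,
                          peripheral, pleural_adhesion, bronchus_sign, lobulation):
    """Probability of IAC from the published logistic equation.

    Binary inputs are coded 0/1; peripheral = 1 means a peripheral nodule.
    Assumed units: diameter in cm, age in years, SA in the laboratory's unit.
    """
    linear_predictor = (
        INTERCEPT
        + 2.41 * diameter_cm
        + 0.053 * age_years
        + 0.049 * sa
        - 1.245 * peripheral
        + 0.935 * pleural_adhesion
        + 0.979 * bronchus_sign
        + 0.815 * lobulation
    )
    # Inverse logit: p = 1 / (1 + exp(-lp))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical patient: 1.8 cm peripheral nodule, 65 years old, SA = 60,
# pleural adhesion and lobulation present, no bronchus sign.
prob = predicted_probability(1.8, 65, 60.0, peripheral=1,
                             pleural_adhesion=1, bronchus_sign=0, lobulation=1)
print(round(prob, 3))
```

For this hypothetical patient the linear predictor is −1.531, giving a probability of roughly 0.18, which lies below the cutoff of 0.274 reported for the training cohort.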

Predictive performance and nomogram validation
The discriminatory power of the prediction model and nomogram was assessed via ROC curve analyses. As shown in Fig. 4, the area under the ROC curve (AUC) was 0.839 (95% CI: 0.798-0.879) for the training cohort and 0.782 (95% CI: 0.706-0.858) for the validation cohort, indicating that the nomogram showed good predictive accuracy. Further, the ROC curve for the training cohort had cutoff, sensitivity, and specificity values of 0.274, 0.811, and 0.733, respectively, indicating excellent performance (Table 5).
We used the Hosmer-Lemeshow test and calibration charts to assess the calibration capability of our model. The Hosmer-Lemeshow test P-values for the training and validation cohorts were 0.1071 and 0.2595, respectively, suggesting that the difference between the predicted and actual observed probabilities was not significant. The calibration plots for the training (Fig. 5a) and validation cohorts (Fig. 5b) likewise indicated good calibration of the nomogram. Further, the bias-corrected C-indices for the training and validation cohorts were 0.840 and 0.785, respectively.
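The Hosmer-Lemeshow statistic reported above compares observed and expected event counts within groups of patients ordered by predicted risk. The sketch below computes only the statistic; the grouping into equal-sized bins, the default of 10 groups, and the function name are assumptions, and the toy data are hypothetical. The P-value would be obtained by referring the statistic to a chi-square distribution, conventionally with (groups − 2) degrees of freedom.

```python
def hosmer_lemeshow_statistic(y_true, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.

    Assumes every predicted probability lies strictly between 0 and 1.
    """
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        # Contributions from events and from non-events.
        stat += (observed - expected) ** 2 / expected
        stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return stat


# Toy data: 10 low-risk patients with 2 events, 10 high-risk patients with 8 events.
outcomes = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
well_calibrated = [0.2] * 10 + [0.8] * 10
miscalibrated = [0.5] * 10 + [0.8] * 10
hl_good = hosmer_lemeshow_statistic(outcomes, well_calibrated, groups=2)
hl_bad = hosmer_lemeshow_statistic(outcomes, miscalibrated, groups=2)
print(hl_good, hl_bad)
```

A statistic near zero, as for the well-calibrated toy predictions, corresponds to a large P-value, which is the direction the study interprets as satisfactory calibration.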
The X-axis represents the probability predicted by the nomogram and the Y-axis represents the actual probability of a pGGN ≤ 2 cm being IAC. The black dashed line represents the ideal curve, the blue solid line represents the apparent curve (uncorrected), and the red solid line represents the bias-corrected curve obtained using the bootstrap method (B = 1,000 repetitions).

Clinical utility of the predictive nomogram
We performed DCA to assess the clinical utility of the predictive nomograms. As shown in Fig. 6a and b, the nomogram provided a greater net benefit over a wider range of threshold probabilities for predicting the risk of IACs in pGGNs measuring ≤ 2 cm in both the training and validation cohorts, demonstrating its clinical utility. Additionally, our clinical impact curve (Fig. 7) showed that a high benefit ratio could be obtained within a probability threshold of 0.5-1.0. This observation suggested that the present model can be used in clinical practice to predict the probability of IAC in small pGGNs, helping surgeons make better clinical decisions.
The y-axis measures the net benefit. The black line represents the hypothesis that all pGGNs measuring ≤ 2 cm are non-IACs, and the gray line represents the hypothesis that all pGGNs measuring ≤ 2 cm are IACs. The blue line in Fig. 7A represents the training cohort, and the red line in Fig. 7B represents the validation cohort.
The horizontal coordinate is the probability threshold and the vertical coordinate is the number of people. The blue line indicates the number of people whose pGGNs were judged by the model to be IAC at different probability thresholds; the red line indicates the number of people whose pGGNs were judged by the model to be IAC and were truly IAC at different probability thresholds. At the bottom, the cost:benefit ratio is also shown, indicating the ratio of loss to benefit at different probability thresholds.
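The net benefit plotted in a decision curve can be sketched with the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The code below is a minimal illustration on hypothetical data, not the authors' analysis; the function names and the rule of treating patients whose predicted probability is at or above the threshold are assumptions.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat everyone as IAC' reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = IAC) and predicted probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
nb_model = net_benefit(y, p, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
nb_none = 0.0  # the 'treat no one' strategy always has zero net benefit
print(nb_model, nb_all, nb_none)
```

A model is considered useful at a given threshold when its net benefit exceeds that of both reference strategies, which is the comparison the decision curves display across thresholds.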

Discussion
The definitive pathological diagnosis of pGGN with diameter ≤ 2 cm is relatively challenging owing to technical limitations and the potential risk of complications when performing coarse needle aspiration biopsy for pGGN [34,35]. With the increasing number of cases of early stage lung cancer presenting as pulmonary pGGNs, it is crucial to determine whether a pulmonary pGGN is an IAC; such information is vital for selecting relevant treatment options for patients. In this study, we showed that the percentages of IAC in pGGN were 30.9%, 30.9%, and 31.0% for the total, training, and validation cohorts, respectively. Attempts have been made in previous studies to distinguish IAC from pre-infiltrative lesions; however, these studies did not include benign GGN [18][19][20][21][22][23][24][25][26]. Further, studies have also been conducted to analyze the differentiation of solid, partially solid, and ground-glass nodules [27,28]. However, clearly distinguishing IAC from non-IAC in pGGN remains challenging. Therefore, in this study, we aimed to explore the potential predictors that can be used to distinguish IAC and non-IAC in pGGN. Thus, we identified seven correlated factors, namely, maximum tumor diameter, age, SA, pleural effusion sign, bronchial signs, tumor location, and lobulation.
Nodule size is an important parameter for assessing GGN invasiveness. Several investigations have shown that increasing adenocarcinoma aggressiveness is associated with increasing pGGN lesion size [21,36]. It has also been reported that a critical nodule size of 1 cm is optimal for predicting aggressive pGGN with 100% specificity [20]. However, Wu et al. demonstrated that nodule size cannot be used to distinguish between infiltrative and pre-infiltrative lesions; the mean nodule size in their study was < 1 cm [21]. In our predictive nomogram, the maximum tumor diameter was identified as the most significant risk factor, consistent with the results of previous studies [37,38]. Notably, in China, the prevalence of IAC in pGGN is higher than that in pGGN of a similar size in Western countries (24%) [4]. These observations suggest that, owing to the high risk associated with IAC, biopsy or surgery should be considered as soon as possible when pGGN size is > 1 cm. Hu et al. indicated that age ≥ 60 years is a risk factor for IAC [39]. Consistent with their findings, the results of this study suggested that the probability of IAC in pGGN increases with age. However, this risk factor is not well recognized. Therefore, clinicians should be cautious when pGGN is observed in older patients. Further, such patients should undergo more frequent follow-up CT scans and be considered for biopsy or surgery.
Previous studies have also demonstrated that the CT features of pulmonary nodules can be used to assess their aggressiveness. These imaging features included lobulation, spiculation, bronchus signs, cavitation signs, pleural adhesion signs, and nodule shape [24,25,27,40-52]. Furuya et al. reported that 82% of lobulated nodules and 97% of acinar nodules are malignant [53]. Lobulation and spiculation of pGGN are also more common in invasive lesions than in pre-infiltrative lesions [18]. Further, Lee et al. found that lobulation is more common in IAC than in pre-infiltrative lesions [42]. However, lobulation was not included as a risk factor for malignant lung nodules in the Herder model [51]. In this study, we identified lobulation as a risk factor for the occurrence of IAC in pGGN; however, no significant differences were observed in this regard with respect to spiculation. A possible reason for this observation is the limited number of nodules with spiculations included in the study. The bronchus signs observed in this study constituted another CT feature that showed association with malignancy. Bronchial signs were more frequently observed in patients with malignant GGN than in those with benign GGN. Reportedly, patients with IAC present with bronchial signs more frequently than those with AIS [45,54]. Thus, based on our results, bronchial signs were identified as significant predictors of IAC (P = 0.009).
Fig. 3 Nomogram for predicting the probability of IAC for pGGN ≤ 2 cm. SA, serum amyloid
Pleural effusions have rarely been associated with the aggressiveness of pGGN in the literature. Our study demonstrated that pleural effusion signs on CT images can be used as a predictor of IAC in pGGN. At initial diagnosis, approximately 15% of patients with lung cancer present with pleural effusion, while 50% of patients at the advanced stages of the disease present with pleural effusion [55,56]. If the pleural effusion is malignant, the patient's prognosis is poor. Therefore, if signs of pleural effusion are observed in CT images for patients with pGGN, treatment should be considered as soon as possible.
The distinction between the centrality and peripherality of nodules is also a significant indicator of the risk of IAC in pGGN. Our results suggested that central pGGNs are more likely to be aggressive than peripheral pGGNs. However, previous studies with a focus on the relationship between centrality and nodal aggressiveness did not show any significant association in this regard [57]. Therefore, a multicenter clinical study with a large sample size is required to validate our findings.
Amylase production in lung cancer has been identified via pathological or immunohistochemical analyses, and it has also been confirmed that its serum level decreases after resection. Previous pathological and biochemical studies also support the existence of a mechanism by which lung cancer tissues produce amylase [58,59]. Further, several case reports have described high serum amylase levels in patients with lung cancer [60][61][62][63][64][65][66][67][68][69][70]. However, this study is the first to identify SA as a risk factor for predicting the probability of IAC occurrence in pGGN. In an immunohistochemical study, amylase levels in lung cancer tissues were found to be higher than those in normal lung tissues [63]. In contrast, other studies have shown that inflammatory as well as normal lung tissue can also produce amylase [58,71] and that increased positive staining for amylase in lung cancer tissue is not associated with hyperamylasemia [72]. Adenocarcinoma is the predominant histological type of amylase-producing lung cancer [59,73]. Our results revealed a positive correlation between SA concentration in vivo and nodule invasiveness, i.e., a higher SA concentration was associated with a higher probability of IAC occurrence in pGGN. Therefore, SA may be a new indicator for monitoring and evaluating patient prognosis.
Additionally, this study showed that multiple serum tumor biomarkers (CEA, CYFRA21-1, SCC, NSE, and CA125) had no significant value in distinguishing IAC from non-IAC in pGGN. The reason for their ineffectiveness may be that in early-stage lung adenocarcinomas, tumor proteins are rarely secreted into the bloodstream.
The performance of our predictive model compared favorably with that of previously published predictive models. First, we introduced benign tumors and combined them in a non-IAC cohort. This grouping … nomogram are readily available and prevalent in clinical practice. This study had some limitations. First, this was a single-center retrospective study that included only surgically resected pGGNs; therefore, validation bias was inherent in our study design. Second, this was a retrospective cross-sectional study. To date, no longitudinal study assessing nodal growth has been reported. Thus, further studies are required in this regard. Third, the definition of pGGN and the criteria for pathological diagnosis may vary among physicians. Fourth, the subjectivity of radiologists may have led to different judgments regarding the characteristics of the examined pulmonary nodules. Fifth, only internal validation was conducted. Therefore, further studies with the recruitment of more patients for external validation are necessary. Additionally, the applicability of our predictive nomogram remains to be confirmed in multicenter prospective clinical trials with large sample sizes.

Conclusion
In this study, we showed that the maximum tumor diameter, age, SA, pleural effusion sign, bronchial sign, tumor location, and lobulation were predictors of IAC in patients with pGGN in the lungs. We also developed and validated a novel, easy-to-use nomogram for predicting the risk of IAC occurrence in patients with pGGN measuring ≤ 2 cm in diameter, based on certain influencing factors. This tool may be used to guide clinicians in developing specific and individualized treatment strategies for patients.

Fig. 4 ROC curves of nomograms predicting IAC for pGGN ≤ 2 cm in the training and validation groups. ROC, receiver operating characteristic; AUC, area under the ROC curve; IAC, invasive adenocarcinoma; pGGN, pure ground-glass nodule

Table 2
Clinical characteristics of patients with IACs and Non-IACs in the training and validation cohorts

Table 5
Results of ROC curve analysis for the training cohort. TP, true positive; FP, false positive; TN, true negative; FN, false negative; TPR, true positive rate; FPR, false positive rate; TNR, true negative rate; FNR, false negative rate; PPV, positive predictive value; NPR, negative predictive value; FDR, false discovery rate