Efficient diagnosis of benign and malignant pulmonary nodules based on Nano-zero-valent iron enhanced serum metabolic fingerprinting

Citation: Qiongqiong Wan, Zhourui Zhang, Mengmeng Zhao, Xianqin Ruan, Yanhong Hao, Jiajun Deng, Yunlang She, Minglei Yang, Yongxiang Song, Feng Jin, Ailin Wei, Sheng Zhong, Jie Zheng, Dong Xie, Suming Chen. Efficient diagnosis of benign and malignant pulmonary nodules based on Nano-zero-valent iron enhanced serum metabolic fingerprinting[J]. Chinese Chemical Letters, 2025, 36(10): 110794. doi: 10.1016/j.cclet.2024.110794

a. The Institute for Advanced Studies, Wuhan University, Wuhan 430072, China
b. Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
c. Department of Thoracic Surgery, Hwa Mei Hospital, Chinese Academy of Sciences, Ningbo 315010, China
d. Department of Thoracic Surgery, Affiliated Hospital of Zunyi Medical College, Zunyi Medical College, Guizhou 563099, China
e. Shandong Key Laboratory of Infectious Respiratory Diseases, Shandong Public Health Clinical Center Affiliated to Shandong University, Ji'nan 250013, China
f. Department of Scientific Management, Guang'an People's Hospital, Guang'an 638099, China
g. TiLigh Biosciences, Shanghai 200050, China
* Corresponding authors.
E-mail addresses: kongduxd@163.com (D. Xie), sm.chen@whu.edu.cn (S. Chen).
¹ These authors contributed equally to this work.
Received Date: 11 June 2024
Accepted Date: 21 December 2024
Revised Date: 08 November 2024
Available Online: 15 October 2025

Abstract: Accurate classification of pulmonary nodules is critical for the early diagnosis of lung cancer. However, non-invasive and accurate discrimination between benign and malignant pulmonary nodules remains highly challenging. In this study, we develop a nano zero-valent iron (nZVI)-assisted laser desorption/ionization mass spectrometry (LDI MS) platform that enables ultra-high-throughput acquisition of rich serum metabolic fingerprints in negative ion mode. We further recruit a large-scale multicenter prospective cohort and collect 1099 serum samples from participants with benign and malignant nodules. Machine learning models are built and validated on the nZVI-assisted LDI MS metabolomics data to classify benign and malignant nodules efficiently. With our stacking ensemble learning model, the AUC of the ROC curve for benign versus malignant nodule classification can be as high as 0.9 and the sensitivity can reach 85.5%, which is significantly better than existing clinical models. This work provides an integrated workflow, from detection technology to diagnostic models, for biomarker-based pulmonary nodule diagnosis, and it could be widely used in rapid, large-scale screening of pulmonary nodules.

Key words: Pulmonary nodule; Nano-zero-valent iron; Laser desorption/ionization mass spectrometry; Metabolomics; Diagnosis

Lung cancer is the leading cause of cancer-related deaths worldwide, accounting for the highest mortality rates among both men and women [1]. In 2020, there were 2.22 million new cases of lung cancer and 1.79 million deaths globally [2]. The prognosis of lung cancer is very poor, and one of the main reasons is that about 75% of lung cancer patients are diagnosed at an advanced stage (stage Ⅲ–Ⅳ) [3]. For stage Ⅳ lung cancer, the five-year survival rate is <10% [4]. In contrast, the five-year survival rate increases dramatically to 70%−90% for stage Ⅰ cancer [4]. Undoubtedly, the most effective way to reduce lung cancer mortality is accurate early diagnosis [5].

Low-dose CT (LDCT) has proven to be very effective in early screening for lung cancer in high-risk populations. However, it is very difficult to judge from CT imaging data whether a pulmonary nodule is benign or malignant [6,7]. The clinical nodule assessment tools in wide use today (e.g., the Mayo Clinic and Veterans Affairs (VA) models [8] and computer-assisted diagnostic methods [9]) are limited in their effectiveness by the lack of cancer-specific biological information [10]. Although suspected lung cancer lesions identified by LDCT can be further diagnosed by invasive methods such as bronchoscopy and surgery, complications including hemorrhage, infection, pneumothorax, and even death may occur [11]. Great efforts have been made to develop a reliable, sensitive, and noninvasive diagnostic method for pulmonary nodules [2,5,11].

In recent years, liquid biopsy has been recognized as an easily accessible, more cost-effective and less invasive method for cancer diagnosis and monitoring [11]. Most noninvasive early detection methods rely on the identification of tumor-derived nucleic acids or proteins in the blood [10,12]. In contrast, metabolic profiling is closer to the disease phenotype, measures biologically active metabolites with higher sensitivity, and has been shown to be a promising method for early cancer detection [2,13]. Minimally invasive metabolic fingerprinting is therefore expected to provide great opportunities for the diagnosis of pulmonary nodules.

As a state-of-the-art metabolomics analysis tool, mass spectrometry (MS) dominates metabolic analysis [12-14] because it can provide comprehensive molecular metabolic information directly at the omics level [15-19]. However, metabolomic analysis of complex biological samples (e.g., serum, tissue) often requires time-consuming pre-processing and coupled liquid chromatography (LC)-MS analysis [17,20], which limits its application to large-scale analysis of clinical samples [2]. Matrix-assisted laser desorption/ionization (MALDI) MS can obtain metabolic fingerprint profiles of biological samples in a very short period of time (a few seconds) [21,22], but commonly used organic matrices produce a large number of interfering ions in the small-molecule mass range [23], which seriously affects the accuracy and coverage of metabolite analysis [24]. The development of nanomaterial matrices holds promise for high-throughput metabolomics analysis based on MALDI MS [2,25], but finding novel nano-matrices with ultra-low background interference remains a challenge [23,26].

Previous studies have shown that nanoscale metal oxides can be used as MALDI MS matrices to analyze small metabolic molecules in the positive ion mode [25,27]. However, a large number of alkali metal adduct peaks are usually generated, which makes the mass spectra difficult to interpret. Metabolites detected in the positive ion mode of LDI MS have previously been combined with machine learning for the diagnosis of pulmonary nodules, but the accuracy was not high enough and the approach had to be combined with other modalities [2]. In contrast, MS spectra in the negative ion mode often have the advantage of lower background signals and higher reproducibility [28]. We therefore considered using zero-valent metal nanoparticles as matrices to assist the LDI of small-molecule metabolites. Nano-zero-valent iron (nZVI) is a metallic nanomaterial that has been widely used in environmental chemistry [29,30]. It has distinctive physical and chemical properties, such as a strong reduction or complexation ability for positively charged heavy metal ions [31] and a strong adsorption capacity for organic compounds due to its high specific surface area [29]. We envisioned that these properties could make nZVI a high-performance nano-matrix for metabolomics analysis of biological samples.

In this study, we developed an nZVI-assisted LDI MS platform that can capture metabolites from complex biological samples and efficiently acquire serum metabolic fingerprint profiles in negative ion mode. Using this platform, we performed metabolomic analysis on a large set (>1000) of pulmonary nodule samples collected from multiple centers and discovered the great potential of serum metabolic fingerprinting in the diagnosis of pulmonary nodules. Different machine learning models, including ensemble learning models, were developed and optimized to classify benign and malignant pulmonary nodules accurately, with overall accuracy and sensitivity greater than 80%. Using the established stacking model, the AUC of the ROC curve for benign versus malignant nodule classification can be as high as 0.9 and the sensitivity can reach 85.5%, which is significantly superior to the clinical nodule assessment tools. In addition, a panel of metabolic biomarkers was established to efficiently discriminate malignant lung nodules from benign ones. This study provides a new strategy for the high-performance molecular diagnosis of pulmonary nodules.

The detailed experimental methods, including the instrumental setup, multicenter clinical serum sample collection for pulmonary nodules, serum metabolite extraction, characterization of nZVI nanoparticles, optimization of the analytical conditions of the nZVI matrix, investigation of the performance of the nZVI matrix using standards, acquisition of the serum metabolic fingerprints by nZVI-assisted LDI MS, mass calibration, mass spectral processing, and data analysis, are described in Supporting information.

To investigate the potential of zero-valent iron for enhancing the ionization of small molecules, we examined the performance of nZVI particles in MALDI MS analysis. We first characterized the nZVI particles by scanning electron microscopy (SEM), transmission electron microscopy (TEM) and X-ray diffraction (XRD). TEM and selected area electron diffraction (SAED) analyses demonstrated the crystalline structure of the nZVI particles (Figs. 1A and B). The sharp peaks in the XRD pattern also indicate that the nanoparticles are crystalline. Two peaks at about 45° and 65°, corresponding to the (110) and (200) planes of iron, respectively, were observed, indicating the zero-valent nature of these iron nanoparticles (Fig. 1D). The SEM image shows that the nZVI particles have a regular spherical shape with an average diameter of 73 ± 21 nm (Figs. 1C and E).
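As a rough consistency check on the XRD peak assignment, Bragg's law can be used to estimate where the (110) and (200) reflections of body-centred cubic iron should appear. The sketch below assumes Cu Kα radiation (1.5406 Å) and a lattice parameter of 2.866 Å for α-Fe; neither value is stated in the text, and the function name is illustrative only.

```python
import math

# Assumed values, not stated in the text: Cu K-alpha wavelength and the
# lattice parameter of body-centred cubic (alpha) iron, both in angstroms.
WAVELENGTH_A = 1.5406
LATTICE_A = 2.866


def two_theta_deg(h, k, l, a=LATTICE_A, wavelength=WAVELENGTH_A):
    """Return the first-order Bragg angle 2-theta (degrees) for a cubic (hkl) plane."""
    d = a / math.sqrt(h * h + k * k + l * l)   # interplanar spacing of a cubic lattice
    sin_theta = wavelength / (2.0 * d)         # Bragg's law with n = 1
    return 2.0 * math.degrees(math.asin(sin_theta))


for plane in [(1, 1, 0), (2, 0, 0)]:
    print(plane, round(two_theta_deg(*plane), 1))
```

Under these assumptions the two reflections fall close to 45° and 65°, in line with the peak positions reported for the nZVI particles.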

Figure 1. Nano-zero-valent iron (nZVI)-assisted LDI MS platform for serum fingerprinting and diagnosis of pulmonary nodules. (A) Transmission electron microscopy (TEM) image of nZVI particles. (B) Selected area electron diffraction (SAED) pattern of nZVI particles showing crystalline structure. (C) Scanning electron microscope (SEM) image of nZVI particles. (D) X-ray diffraction (XRD) pattern of nZVI particles. (E) Particle size distribution of the nZVI particles. (F) Schematic of the nZVI-assisted LDI MS analysis of metabolites in negative ion mode. (G) Workflow for the diagnosis of patients with benign and malignant pulmonary nodules using the nZVI-assisted LDI MS and machine learning.

We next investigated the performance of nZVI particles in the LDI MS analysis of small molecules. Previous studies have shown that metal nanoparticle-assisted LDI usually works well only in the positive ion mode, and matrices with superior performance in the negative ion mode are largely lacking. Thus, we focused on the performance of nZVI particles in analyzing small-molecule metabolites in negative ion mode (Fig. 1F). The experiment started with the analysis of 18 metabolites, including amino acids, organic acids and lipids (Fig. S1 in Supporting information). nZVI particles were dispersed in isopropanol, dropped onto the target plate and dried; the different small-molecule mixture solutions were then added dropwise before LDI MS analysis. The results show that the presence of nZVI allows all these small-molecule metabolites to be detected by LDI MS, and that nZVI itself produces almost no interference peaks. Notably, all of these small molecules were detected as [M−H]⁻ (Fig. S1), indicating that nZVI has a strong affinity for protons, which explains to some extent the ability of nZVI to enhance the analysis of small molecules by LDI MS in the negative ion mode. We further tested the ability of nZVI-assisted LDI MS to analyze metabolites in serum (Fig. S2 in Supporting information). In the negative ion mode, about 360 MS peaks could be detected from the extracted serum solution with an analysis time as short as 20 s. In contrast, under the same MS conditions, nZVI alone gives only a dozen or so background peaks of very low intensity. This may be related to the very high purity and good photothermal stability of the crystalline nZVI nanoparticles. It is worth mentioning that nZVI particles can be obtained very easily from commercial sources at a price as low as about $1/g. Each gram of nZVI is enough to analyze ~10⁵ serum samples, which makes it very suitable for large-scale clinical analysis. These results illustrate that the nZVI-based LDI MS platform we developed can obtain serum metabolic fingerprints reliably and with high throughput, providing a promising tool for metabolomics-based diagnostic analysis of diseases.

Having established the high-throughput nZVI-based LDI MS platform, we next aimed to acquire large-scale serum metabolic fingerprints to develop an efficient approach for the diagnosis of benign and malignant pulmonary nodules (Fig. 1G). To this end, a total of 1099 serum samples were collected from 5 hospitals in different provinces of China (Fig. S3 in Supporting information). This study was reviewed and approved by the hospital ethics committees of Shandong Provincial Chest Hospital (ethical approval number 2021XKYYEC-14). Patients/participants provided their written informed consent to participate in this study. Of the 1099 participants, 465 had benign nodules and 634 had malignant nodules (Fig. 2A). No significant age and gender differences were observed between the groups (Table S1 in Supporting information). Cases of benign nodules were those with stable evaluations on follow-up CT scans for at least two years at the time of analysis or those confirmed by pathological diagnosis. Cases of malignant nodules were collected before lung resection surgery and confirmed by pathological diagnosis. The serum metabolic fingerprints of participants with benign and malignant pulmonary nodules can be easily obtained using the developed nZVI-assisted LDI MS platform (Figs. 2B and C). We first performed a power analysis (a general method for deriving sample sizes by estimating statistical power in hypothesis testing) on a 20-sample (10/10, benign/malignant) dataset from a pilot study to calculate the minimum sample size required for a meaningful serum metabolic diagnosis. The results show that when the sample size reaches 200 (100/100, benign/malignant), the predictive power reaches nearly 0.9 at a false discovery rate (FDR) of 0.1 (Fig. S4 in Supporting information), which is a sufficient confidence level for drawing statistically meaningful conclusions according to previous studies [32,33].

Figure 2. nZVI-assisted MS analysis of serum metabolites in patients with benign and malignant pulmonary nodules. (A) Gender and age information of the benign and malignant pulmonary nodule cohorts. nZVI-assisted mass spectra of a randomly selected serum sample from a patient with (B) benign and (C) malignant pulmonary nodules. (D) Volcano plot showing metabolites with significant differences in serum samples from patients with benign and malignant pulmonary nodules.

All 1099 extracted serum samples and quality controls (QC) were then analyzed on the nZVI-based LDI MS platform to obtain the metabolic fingerprints in a high-throughput manner (Figs. S5 and S6 in Supporting information). In this study, we developed a new MS calibration method for the negative ion mode using the reagents CuI and NaBr (see Methods in Supporting information). Various ions and adducts in the range of 50–1200 Da were obtained by ionization and ion exchange in solution, which enables MALDI MS calibration over the entire metabolic mass range and thus ensures the accuracy of detection. More than 360 high-quality peaks could be extracted from the raw MS data of the QC based on the signal-to-noise ratio (S/N ≥ 4). The peaks present in >80% of the spectra of all samples were retained, and 94 m/z features were obtained after further excluding the background peaks according to the threshold. These 94 m/z features were taken as the final MS output (metabolic pattern) for the disease classifier. Possible metabolites were assigned to these peaks by matching against the Human Metabolome Database (HMDB) (Table S2 in Supporting information). The heat map of all 1099 independent metabolic patterns from patients with benign and malignant pulmonary nodules shows that the metabolite signals were uniformly distributed in the given m/z range (Fig. S5). Our data were acquired with a high degree of consistency and reproducibility: the correlation coefficient was over 88% for a total of 60 QC samples (Fig. S6). This result indicates the reliability of the serum metabolic patterns obtained with the nZVI-based LDI MS platform.
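The prevalence filter described above (keep peaks present in more than 80% of spectra, then drop background peaks) can be sketched as follows. This is a minimal illustration rather than the authors' processing code: it assumes the intensity matrix is a list of rows with 0.0 meaning "not detected", and because the background threshold is not given in the text, background peaks are simply passed in as a list of indices. All function names are hypothetical.

```python
def prevalent_features(intensity_rows, min_fraction=0.8):
    """Return column indices detected (intensity > 0) in more than min_fraction of spectra."""
    n_spectra = len(intensity_rows)
    n_features = len(intensity_rows[0])
    kept = []
    for j in range(n_features):
        detected = sum(1 for row in intensity_rows if row[j] > 0)
        if detected / n_spectra > min_fraction:
            kept.append(j)
    return kept


def drop_background(feature_indices, background_indices):
    """Remove features that were flagged as matrix background peaks."""
    background = set(background_indices)
    return [j for j in feature_indices if j not in background]


# Toy data: 5 spectra x 4 candidate peaks (0.0 means "not detected").
toy = [
    [10.0, 0.0, 3.0, 7.0],
    [12.0, 0.0, 2.5, 6.0],
    [11.0, 4.0, 0.0, 8.0],
    [9.0, 0.0, 3.1, 5.5],
    [10.5, 0.0, 2.9, 6.5],
]
kept = prevalent_features(toy)       # peak 2 is in exactly 80% of spectra, so it is dropped
final = drop_background(kept, [3])   # pretend peak 3 is a background peak
print(kept, final)
```

Note that the comparison is strict, so a peak found in exactly 80% of the spectra is not retained.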

We next investigated the possibility of identifying the malignant cases among pulmonary nodule patients based on the molecular signatures of metabolites (Table S2). By comparing the intensities of the 94 features, we found that 14 features differed significantly between the benign and malignant groups (adjusted P value < 0.05 and |log2(fold change)| > 0.25; Fig. 2D, Fig. S7 and Table S3 in Supporting information). We then examined different algorithms for discriminating malignant pulmonary nodules from benign controls. First, the unsupervised learning methods principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) were used to reduce the dimensions of the intensity matrix and compare the two groups of samples in a multidimensional space using all 94 features. The results show that the two groups of samples could not be well separated (Fig. S8 in Supporting information), which may imply that the differences between the malignant and benign groups are only subtle. Therefore, more advanced methods are required to discriminate them.
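The two-threshold selection can be illustrated with a small sketch. It assumes that raw P values from the group comparison are already available and that the adjustment is the Benjamini and Hochberg correction named in the Figure 5 caption; the P values, group means, the malignant/benign direction of the fold change, and the function names are all hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(mean_malignant, mean_benign, p_adj, p_cut=0.05, lfc_cut=0.25):
    """Apply both thresholds: adjusted P < p_cut and |log2 fold change| > lfc_cut."""
    log2_fc = math.log2(mean_malignant / mean_benign)
    return p_adj < p_cut and abs(log2_fc) > lfc_cut


# Hypothetical raw P values and (malignant, benign) mean intensities.
raw_p = [0.001, 0.04, 0.03, 0.20]
means = [(1.5, 1.0), (1.05, 1.0), (0.7, 1.0), (2.0, 1.0)]
adj = benjamini_hochberg(raw_p)
flags = [is_significant(mm, mb, p) for (mm, mb), p in zip(means, adj)]
print([round(p, 4) for p in adj], flags)
```

In this toy input only the first feature passes both thresholds; the last one has a large fold change but fails on the adjusted P value, which is why both criteria are applied together.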

We then attempted to classify these samples by applying five different machine learning algorithms: support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), multilayer perceptron (MLP) and eXtreme gradient boosting (XGBoost). Nested cross-validation, with a fivefold outer loop and a tenfold inner loop repeated ten times, was used for hyperparameter optimization and performance evaluation (Table S4 in Supporting information). The performance of each model was comprehensively evaluated by multiple metrics calculated in the outer loop, namely the receiver operating characteristic (ROC) curve, precision-recall (PR) curve, accuracy, sensitivity and specificity (Fig. 3A and Table S5 in Supporting information). True positive rate (TPR) and false positive rate (FPR) at different thresholds were compared by ROC curves, and precision and recall were compared by PR curves. Accuracy, sensitivity, and specificity measure the proportions of all samples, positive samples, and negative samples, respectively, that are correctly predicted using all 94 features. As shown in Fig. 3A, the areas under the curve (AUC) of all five machine learning models are greater than 0.80, and the AUCs of XGBoost and SVM are even higher than 0.88. Accordingly, the accuracies and sensitivities are greater than 80% except for the KNN model, although the specificities are relatively lower (0.725–0.750). Using the SVM model, 535/634 malignant pulmonary nodule samples and 359/465 benign controls were correctly diagnosed (Fig. S9 in Supporting information).
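The data partitioning behind this nested scheme can be sketched without any model code. The example below only generates index splits (fivefold outer loop, tenfold inner loop repeated ten times, stratified by class); model fitting and hyperparameter search are omitted, and the splitter, seeds, and function names are illustrative assumptions rather than the authors' implementation.

```python
import random


def stratified_folds(labels, n_folds, seed):
    """Split positions 0..len(labels)-1 into n_folds lists with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def nested_cv_splits(labels, outer_k=5, inner_k=10, inner_repeats=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits holds (inner_train, inner_val) index pairs drawn only from
    outer_train, so the outer test fold never influences tuning.
    """
    outer = stratified_folds(labels, outer_k, seed)
    for f, test_idx in enumerate(outer):
        train_idx = [i for g, fold in enumerate(outer) if g != f for i in fold]
        train_labels = [labels[i] for i in train_idx]
        inner_splits = []
        for rep in range(inner_repeats):
            inner = stratified_folds(train_labels, inner_k, seed + 1 + rep)
            for v, val_pos in enumerate(inner):
                val = [train_idx[p] for p in val_pos]
                tr = [train_idx[p] for w, fold in enumerate(inner) if w != v for p in fold]
                inner_splits.append((tr, val))
        yield train_idx, test_idx, inner_splits


# Cohort sizes from the text: 465 benign (0) and 634 malignant (1).
labels = [0] * 465 + [1] * 634
splits = list(nested_cv_splits(labels))
print(len(splits), len(splits[0][2]))
```

Hyperparameters would be chosen on the inner splits and each outer test fold scored once with the tuned model, which keeps the reported metrics separate from the tuning data.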

Figure 3. Diagnosis of benign and malignant pulmonary nodules by machine learning with serum metabolic fingerprints. (A) ROC and PR curves of five separate machine learning models. (B) Schematic workflows of the stacking and voting models with five-fold nested cross-validation, including the inner loop to tune the optimized hyperparameters of each separate classifier and the outer loop to evaluate the performance of the models. (C) ROC and PR curves of the developed stacking learning model. (D) ROC and PR curves of the developed voting learning model. (E) Performance metrics of five separate machine learning models and two ensemble learning models. (F, G) Confusion matrices of the stacking and voting models for the classification of benign and malignant pulmonary nodules.

Besides these individual machine learning models, an ensemble learning scheme called stacking [34] was used to combine multiple machine learning models and improve performance. Stacking is a general two-level framework. The first layer consists of multiple machine learning models, while the second layer is a meta-learner that takes the outputs of the first-layer classifiers as input to generate the final output of the entire model (Fig. 3B). In this case, we used SVM, XGBoost and MLP in the first layer. The stacking model with RF as the meta-learner reached an accuracy of 0.818, a sensitivity of 0.855 and a specificity of 0.768 (Fig. 3E and Table S6 in Supporting information). The AUCs of the ROC and PR curves of the stacking model were 0.902 and 0.855, respectively (Fig. 3C). The confusion matrix shows that 542/634 malignant pulmonary nodule samples and 357/465 benign controls were correctly diagnosed using this model (Fig. 3F).
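The reported stacking metrics can be reproduced from the confusion-matrix counts. The short sketch below takes the correctly classified counts quoted for the stacking model and the benign cohort size of 465 from the cohort description (634 malignant + 465 benign = 1099); the function name is illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # malignant cases correctly flagged
    specificity = tn / (tn + fp)                   # benign cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Stacking model: 542 of 634 malignant and 357 of 465 benign samples correct.
n_malignant, n_benign = 634, 465
tp, tn = 542, 357
acc, sens, spec = classification_metrics(tp, n_malignant - tp, tn, n_benign - tn)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # 0.818 0.855 0.768
```

With these counts the accuracy, sensitivity and specificity round to 0.818, 0.855 and 0.768, matching the values reported for the stacking model.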

We also proposed a voting algorithm to replace the meta-learner in the second layer, making the model more balanced between sensitivity and specificity. In the voting algorithm, a sample is predicted as malignant when more than one of the first-layer classifiers outputs malignant (Fig. 3B). When SVM, XGBoost and MLP were used in the first layer, the ensemble model with the voting algorithm achieved an accuracy, sensitivity, and specificity of 0.811, 0.806, and 0.817, respectively (Fig. 3E). The AUCs of the ROC and PR curves reached 0.880 and 0.900, respectively (Fig. 3D). Using the voting model, 511/634 malignant pulmonary nodule samples and 380/465 benign controls were correctly diagnosed (Fig. 3G). Overall, better diagnostic performance can be achieved using the ensemble learning models. The stacking model has higher sensitivity, while the voting model offers a better balance between sensitivity and specificity. For clinical use, the appropriate model can be selected according to actual needs. Clearly, the supervised machine learning models separated the two groups far better than PCA and UMAP, and the diagnostic accuracy is also superior to that of the clinically used Mayo Clinic and Veterans Affairs models [35,36]. This integrated workflow of nZVI LDI MS-based serum metabolic fingerprinting and machine learning provides an efficient diagnostic tool for pulmonary nodules.
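The voting rule itself is simple enough to state in a few lines. The sketch below assumes each first-layer classifier outputs 1 for malignant and 0 for benign; the sample predictions are hypothetical and the function name is illustrative.

```python
def vote_malignant(base_predictions, min_votes=2):
    """Return 1 (malignant) if at least min_votes base classifiers predict 1, else 0."""
    return 1 if sum(base_predictions) >= min_votes else 0


# Hypothetical outputs of the three first-layer classifiers (SVM, XGBoost, MLP)
# for four samples; 1 = malignant, 0 = benign.
samples = [(1, 1, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0)]
votes = [vote_malignant(s) for s in samples]
print(votes)  # [1, 0, 1, 0]
```

With three base classifiers, "more than one" is the same as a simple majority, so a single dissenting model cannot flip the call in either direction.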

Having demonstrated that serum metabolic fingerprints obtained by nZVI-enhanced LDI MS can discriminate well between benign and malignant pulmonary nodules, we next explored whether a panel of a small number of key metabolites could also achieve efficient classification. A benefit of the XGBoost model is that, after the boosted trees are constructed, it is relatively straightforward to retrieve an importance score for each attribute. We therefore used the XGBoost model to rank the importance of each metabolic feature and calculated the sensitivity, specificity, accuracy and AUC values obtained with different numbers of features (Fig. 4A). The results show that the machine learning model built on the 10 top-ranked features already achieves almost the highest classification performance (Fig. 4A). Sequentially adding further features of decreasing importance did not significantly improve the classification performance of the model. We further refined these 10 features by excluding the low-abundance isotopic peaks at m/z 282.26 and 90.00, as well as the last-ranked peak at m/z 225.05, which was not significantly different between the two groups. Ultimately, we used the remaining seven metabolite peaks as a biomarker panel for pulmonary nodule diagnosis (Fig. 4B).

Figure 4. Diagnosis of benign and malignant pulmonary nodules by machine learning with the serum metabolic biomarker panel. (A) Changes in the performance metrics with the number of features used in the XGBoost model for the classification of pulmonary nodules. (B) Top 10 ranked importance features for classifying benign and malignant pulmonary nodules. Features marked with red asterisks are the final seven metabolic biomarkers selected. (C) ROC curves of five separate machine learning models. (D, E) ROC curves of the stacking and voting models. (F) Performance metrics of five separate machine learning models and two ensemble learning models using the seven metabolic biomarkers.

Five individual machine learning models (KNN, RF, MLP, XGBoost, and SVM) and the proposed ensemble learning models were applied to classify all 1099 benign and malignant samples with the seven biomarkers (Figs. 4C-F and Figs. S10-S12 in Supporting information). For each model, nested cross-validation with randomized stratified splitting (fivefold outer loop; tenfold inner loop repeated ten times) was used for hyperparameter optimization and performance evaluation (Table S7 in Supporting information). The results show that the AUC for distinguishing between benign and malignant pulmonary nodules reaches 0.85 or more with either a single or an ensemble machine learning model (Fig. 4F, Tables S8 and S9 in Supporting information). Therefore, we concluded that the panel of seven biomarkers is useful for discriminating disease from control samples. The construction of the biomarker panel could simplify the analysis and facilitate the large-scale clinical use of this approach.

By matching accurate masses against the HMDB database and taking into account factors such as the concentration of the candidate in the blood and a reasonable form of ionization, we concluded that the most probable structures of these seven metabolites were tyrosine (m/z 218.02), fatty acid FA 18:1 (m/z 281.25), glutaric acid (m/z 131.04), gentisic acid (m/z 153.02), threonine (m/z 118.05), aspartic acid (m/z 114.02) and diacylglycerol DG 36:4 (m/z 615.50) (Fig. 5 and Table S10 in Supporting information). The abundance of all seven metabolites differed significantly between the serum of patients with benign and malignant pulmonary nodules. Among them, tyrosine, gentisic acid and threonine were significantly more abundant in the serum of patients with malignant pulmonary nodules than in that of patients with benign nodules, whereas FA 18:1, glutaric acid, aspartic acid and DG 36:4 were significantly lower in the serum of patients with malignant pulmonary nodules (Fig. 5).
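As an illustration of accurate-mass matching, the m/z of a deprotonated ion [M−H]⁻ can be computed from the neutral formula and compared with the reported peak. The sketch below does this for four of the seven metabolites whose reported m/z is consistent with simple deprotonation; the molecular formulas, the choice of [M−H]⁻ as the ion form, and the 0.01 Da tolerance are assumptions made for this example, and the ion forms actually assigned are those given in Table S10.

```python
# Monoisotopic masses (Da) of the elements used here, and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647


def mz_deprotonated(formula):
    """Return the m/z of the singly charged [M-H]- ion for an element-count dict."""
    neutral = sum(MONO[el] * n for el, n in formula.items())
    return neutral - PROTON


# Assumed neutral formulas; the reported m/z values are taken from the text.
candidates = {
    "threonine": ({"C": 4, "H": 9, "N": 1, "O": 3}, 118.05),
    "glutaric acid": ({"C": 5, "H": 8, "O": 4}, 131.04),
    "gentisic acid": ({"C": 7, "H": 6, "O": 4}, 153.02),
    "FA 18:1": ({"C": 18, "H": 34, "O": 2}, 281.25),
}

TOLERANCE_DA = 0.01  # illustrative matching window, not taken from the paper
for name, (formula, reported) in candidates.items():
    calc = mz_deprotonated(formula)
    matched = abs(calc - reported) < TOLERANCE_DA
    print(f"{name}: calculated {calc:.4f}, reported {reported:.2f}, match={matched}")
```

A mass match of this kind only narrows the candidates, which is why blood concentration and a plausible ionization form are also weighed in the assignment.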

Figure 5. Comparison of the relative intensities of the seven metabolic biomarkers between the benign and malignant samples. A two-tailed Student's t-test was performed for comparing groups, and adjusted P values were calculated by Benjamini and Hochberg correction. ****P < 0.0001.

To interrogate the potential metabolic pathway alterations associated with these metabolites (Fig. 6A), pathway analysis (Fig. 6B and Fig. S13 in Supporting information) was conducted in MetaboAnalyst (https://www.metaboanalyst.ca/). A total of three pathways were considered altered (P < 0.05, pathway impact value > 0.1): (1) phenylalanine, tyrosine and tryptophan biosynthesis, (2) alanine, aspartate and glutamate metabolism, and (3) tyrosine metabolism (Fig. 6B). These findings are consistent with recent studies reporting abnormal biosynthesis and metabolism of amino acids in lung carcinoma patients [37,38].

Figure 6. Fold changes of the seven biomarkers and potential pathway alterations. (A) Fold changes of four down-regulated metabolites (blue) and three up-regulated metabolites (orange) in serum of participants with malignant pulmonary nodules compared with those with benign pulmonary nodules. (B) Potential pathways differentially regulated in benign and malignant pulmonary nodule groups. The seven selected metabolites were tested to identify altered pathways. The color and size of each circle correlate with the P value and pathway impact value, respectively. A total of three pathways were considered altered (P < 0.05, pathway impact value > 0.1): (1) phenylalanine, tyrosine and tryptophan biosynthesis, (2) alanine, aspartate and glutamate metabolism, and (3) tyrosine metabolism.

The survival rate of lung adenocarcinoma depends greatly on its stage at the time of diagnosis. Pulmonary nodules are a form of early lung lesion, and accurately determining whether they are benign or malignant would bring hope for the early diagnosis of lung cancer. Non-invasive biomarkers that meet the clinical requirements of high accuracy, low cost, and ease of analysis will be of great value in the diagnosis of early lung adenocarcinoma. However, current lung cancer biomarkers, such as carcinoembryonic antigen (CEA), can only monitor the development of cancer and lack sufficient sensitivity for the early diagnosis of lung cancer.

In this study, we developed a novel nZVI-assisted LDI MS method and established an MS analysis platform for rapid, high-throughput acquisition of serum metabolic fingerprints, which allowed us to explore the feasibility of using serum metabolic fingerprints to diagnose benign and malignant pulmonary nodules. By performing large-scale metabolic analysis on serum samples from 1099 lung nodule cases collected from multiple centers, we established machine learning-based diagnostic models. Encouragingly, whether using the full metabolic fingerprint or the seven selected metabolic biomarkers, the present method is very effective in differentiating benign and malignant nodules: the AUCs of the ROC curves are above 0.85, the AUC reaches 0.90 with the stacking model, and the sensitivity is generally over 80%.

It should be noted that a pulmonary nodule is a very mild tissue lesion that triggers only a very small response and change in body function. Our results illustrate that it is feasible to classify pulmonary nodules effectively by detecting changes in the blood metabolome, and that the metabolic biomarkers analyzed by the nZVI-based LDI MS platform have very high sensitivity. This study not only provides a new technique for effectively obtaining metabolic fingerprint profiles in negative ion mode, but also provides an effective way to determine whether pulmonary nodules are benign or malignant. However, this study still has several limitations. First, the numbers of benign and malignant samples are not well balanced in some centers, owing to the difficulty of multicenter studies and differences in conditions across centers. Second, this study focused mainly on the development and validation of diagnostic methods and models, and the potential pathways and mechanisms need to be further investigated. Finally, although our method has shown better results than previous models, the accuracy for the diagnosis of malignant nodules still needs to be further improved.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Qiongqiong Wan: Writing – original draft, Investigation, Formal analysis. Zhourui Zhang: Formal analysis. Mengmeng Zhao: Resources. Xianqin Ruan: Investigation. Yanhong Hao: Formal analysis. Jiajun Deng: Resources. Yunlang She: Resources, Conceptualization. Minglei Yang: Resources. Yongxiang Song: Resources. Feng Jin: Resources. Ailin Wei: Resources. Sheng Zhong: Conceptualization. Jie Zheng: Conceptualization. Dong Xie: Resources. Suming Chen: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities (No. WHU 2042024kf0009), the National Key Research and Development Program of China (No. 2021YFC2700700) and the National Natural Science Foundation of China (Nos. 22074111, 22004093). The authors thank Tianze Wang for help with sample preparation and the MS experiments.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cclet.2024.110794.
1. [1]
  H. Sung, J. Ferlay, R.L. Siegel, et al., CA Cancer J. Clin. 71 (2021) 209–249. doi: 10.3322/caac.21660
2. [2]
  L. Wang, M. Zhang, X. Pan, et al., Adv. Sci. 9 (2022) e2203786. doi: 10.1002/advs.202203786
3. [3]
  S. Walters, C. Maringe, M.P. Coleman, et al., Thorax 68 (2013) 551–564. doi: 10.1136/thoraxjnl-2012-202297
4. [4]
  S.Blandin Knight, P.A. Crosbie, H. Balata, et al., Open Biol. 7 (2017) 170070. doi: 10.1098/rsob.170070
5. [5]
  J. He, B. Wang, J. Tao, et al., Lancet Digit. Health 5 (2023) e647–e656. doi: 10.1016/S2589-7500(23)00125-5
6. [6]
  P.J. Mazzone, L. Lam, JAMA 327 (2022) 264–273. doi: 10.1001/jama.2021.24287
7. [7]
  D.E. Ost, M.K. Gould, Am. J. Respir. Crit. Care Med. 185 (2012) 363–372. doi: 10.1164/rccm.201104-0679CI
8. [8]
  V.S. Nair, V. Sundaram, M. Desai, et al., Am. J. Respir. Crit. Care Med. 197 (2018) 1220–1223. doi: 10.1164/rccm.201708-1632le
9. [9]
  K. Kanazawa, Y. Kawata, N. Niki, et al., Comput. Med. Imaging Graph. 22 (1998) 157–167. doi: 10.1016/S0895-6111(98)00017-2
10. [10]
  P.P. Massion, S. Antic, S. Ather, et al., Am. J. Respir. Crit. Care Med. 202 (2020) 241–249. doi: 10.1164/rccm.201903-0505oc
11. [11]
  W. Liang, Z. Chen, C. Li, et al., J. Clin. Invest. 131 (2021) e145973. doi: 10.1172/JCI145973
12. [12]
  K. Dettmer, P.A. Aronov, B.D. Hammock, Mass Spectrom. Rev. 26 (2007) 51–78. doi: 10.1002/mas.20108
13. [13]
  Q. Wan, Y. Xiao, G. Feng, et al., Chin. Chem. Lett. 35 (2024) 108775. doi: 10.1016/j.cclet.2023.108775
14. [14]
  T. Zeng, Y.S. Liang, Q.Y. Dai, et al., Chin. Chem. Lett. 33 (2022) 5184–5188. doi: 10.1016/j.cclet.2022.03.020
15. [15]
  Y. Hao, Z. Zhang, G. Feng, et al., iScience 24 (2021) 102974. doi: 10.1016/j.isci.2021.102974
16. [16]
  L. Perez de Souza, S. Alseekh, F. Scossa, et al., Nat. Methods 18 (2021) 733–746. doi: 10.1038/s41592-021-01116-4
17. [17]
  Y. Yao, X.P. Wang, J. Guan, et al., Nat. Commun. 14 (2023) 2339. doi: 10.1038/s41467-023-37875-1
18. [18]
  J. Luo, Q. Wan, S. Chen, Chin. Chem. Lett. 36 (2025) 109836. doi: 10.1016/j.cclet.2024.109836
19. [19]
  A. Tan, X. Ma, Chin. Chem. Lett. 35 (2024) 109276. doi: 10.1016/j.cclet.2023.109276
20. [20]
  F. Zheng, X. Zhao, Z. Zeng, et al., Nat. Protoc. 15 (2020) 2519–2537. doi: 10.1038/s41596-020-0341-5
21. [21]
  Q. Wan, M. Chen, Z. Zhang, et al., Front. Chem. 9 (2021) 746134. doi: 10.3389/fchem.2021.746134
22. [22]
  S. Chen, C. Xiong, H. Liu, et al., Nat. Nanotechnol. 10 (2015) 176–182. doi: 10.1038/nnano.2014.282
23. [23]
  S. Chen, H. Zheng, J. Wang, et al., Anal. Chem. 85 (2013) 6646–6652. doi: 10.1021/ac401601r
24. [24]
  S. Chen, L. Chen, J. Wang, et al., Anal. Chem. 84 (2012) 10291–10297. doi: 10.1021/ac3021278
25. [25]
  L. Huang, L. Wang, X. Hu, et al., Nat. Commun. 11 (2020) 3556. doi: 10.1038/s41467-020-17347-6
26. [26]
  D. Li, J. Yi, G. Han, et al., ACS Meas. Sci. Au 2 (2022) 385–404. doi: 10.1021/acsmeasuresciau.2c00019
27. [27]
  W.H. Muller, A. Verdin, E. De Pauw, et al., Mass Spectrom. Rev. 41 (2022) 373–420. doi: 10.1002/mas.21670
28. [28]
  Z. Qiao, F. Lissel, Chem. Asian J. 16 (2021) 868–878. doi: 10.1002/asia.202100044
29. [29]
  H. Tang, J. Wang, S. Zhang, et al., J. Clean. Prod. 319 (2021) 128641. doi: 10.1016/j.jclepro.2021.128641
30. [30]
  Y. Liu, T. Wu, J.C. White, et al., Nat. Nanotechnol. 16 (2021) 197–205. doi: 10.1038/s41565-020-00803-1
31. [31]
  C. Wu, J. Tu, W. Liu, et al., Environ. Sci. Nano 4 (2017) 1544–1552. doi: 10.1039/C7EN00240H
32. [32]
  J.G. Xia, I.V. Sinelnikov, B. Han, et al., Nucleic Acids Res. 43 (2015) W251–W257. doi: 10.1093/nar/gkv380
33. [33]
  J. Chong, O. Soufan, C. Li, et al., Nucleic Acids Res. 46 (2018) W486–W494. doi: 10.1093/nar/gky310
34. [34]
  W.Z. Li, W. Miao, J.X. Cui, et al., J. Chem. Inf. Model. 59 (2019) 1849–1857. doi: 10.1021/acs.jcim.8b00878
35. [35]
  E.M. Schultz, G.D. Sanders, P.R. Trotter, et al., Thorax 63 (2008) 335–341. doi: 10.1136/thx.2007.084731
36. [36]
  A. Al-Ameri, P. Malhotra, H. Thygesen, et al., Lung Cancer 89 (2015) 27–30. doi: 10.1016/j.lungcan.2015.03.018
37. [37]
  A. Mohamed, X. Deng, F.R. Khuri, et al., Clin. Lung Cancer 15 (2014) 7–15. doi: 10.1016/j.cllc.2013.09.001
38. [38]
  M. Endicott, M. Jones, J. Hull, Amino Acids 53 (2021) 1169–1179. doi: 10.1007/s00726-021-03052-1
Figure 1 Nano-zero-valent iron (nZVI)-assisted LDI MS platform for serum fingerprinting and diagnosis of pulmonary nodules. (A) Transmission electron microscopy (TEM) image of nZVI particles. (B) Selected area electron diffraction (SAED) pattern of nZVI particles showing crystalline structure. (C) Scanning electron microscopy (SEM) image of nZVI particles. (D) X-ray diffraction (XRD) pattern of nZVI particles. (E) Particle size distribution of the nZVI particles. (F) Schematic of the nZVI-assisted LDI MS analysis of metabolites in negative ion mode. (G) Workflow for the diagnosis of patients with benign and malignant pulmonary nodules using nZVI-assisted LDI MS and machine learning.

Figure 2 nZVI-assisted MS analysis of serum metabolites in patients with benign and malignant pulmonary nodules. (A) Gender and age information of the benign and malignant pulmonary nodule cohorts. nZVI-assisted mass spectra of a randomly selected serum sample from a patient with (B) benign and (C) malignant pulmonary nodules. (D) Volcano plot showing metabolites with significant differences in serum samples from patients with benign and malignant pulmonary nodules.

Figure 3 Diagnosis of benign and malignant pulmonary nodules by machine learning with serum metabolic fingerprints. (A) ROC and PR curves of five separate machine learning models. (B) Schematic workflows of the stacking and voting models with five-fold nested cross-validation, including the inner loop to tune the optimized hyperparameters of each separate classifier and the outer loop to evaluate the performance of the models. (C) ROC and PR curves of the developed stacking learning model. (D) ROC and PR curves of the developed voting learning model. (E) Performance metrics of five separate machine learning models and two ensemble learning models. (F, G) Confusion matrices of the stacking and voting models for the classification of benign and malignant pulmonary nodules.
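
The nested cross-validation described in panel (B) can be sketched in terms of sample indices alone. The code below is a simplified illustration, not the authors' pipeline: the helper names `k_fold` and `nested_splits`, the plain (non-stratified) shuffling, the seeds, and the sample count of 100 are all assumptions. It only shows how the inner loop, used for hyperparameter tuning, is confined to the training portion of each outer fold, while the outer test fold is reserved for performance evaluation.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold(indices: List[int], k: int, seed: int) -> Iterator[Split]:
    """Yield (train, held_out) index lists for k shuffled folds."""
    shuffled = indices[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def nested_splits(n_samples: int, outer_k: int = 5, inner_k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    The inner splits partition outer_train only, so the outer test
    fold never influences hyperparameter tuning.
    """
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k, seed):
        inner = list(k_fold(outer_train, inner_k, seed + 1))
        yield outer_train, outer_test, inner


# Hypothetical sample count, chosen only to keep the example small.
splits = list(nested_splits(100))
```

In practice each inner split would be used to score candidate hyperparameters for every base classifier, and the tuned stacking or voting model would then be refit on `outer_train` and scored once on `outer_test`.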

Figure 4 Diagnosis of benign and malignant pulmonary nodules by machine learning with the serum metabolic biomarker panel. (A) Changes in the performance metrics with the number of features used in the XGBoost model for the classification of pulmonary nodules. (B) Top 10 features ranked by importance for classifying benign and malignant pulmonary nodules. Features marked with red asterisks are the seven metabolic biomarkers finally selected. (C) ROC curves of five separate machine learning models. (D, E) ROC curves of the stacking and voting models. (F) Performance metrics of five separate machine learning models and two ensemble learning models using the seven metabolic biomarkers.

Figure 5 Comparison of the relative intensities of the seven metabolic biomarkers between the benign and malignant samples. A two-tailed Student's t-test was used to compare groups, and adjusted P values were calculated with the Benjamini-Hochberg correction. ****P < 0.0001.
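
As a reminder of what the Benjamini-Hochberg adjustment does to a set of raw P values, here is a minimal sketch. The function name and the toy P values are illustrative assumptions; the t-tests that would produce the raw P values are not shown.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values invented for illustration only (not from the study).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adj = benjamini_hochberg(toy_p)
```

Each adjusted value is capped at 1 and is never smaller than the raw P value, so a feature that is significant after adjustment is also significant before it.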

Figure 6 Fold changes of the seven biomarkers and potential pathway alterations. (A) Fold changes of four down-regulated metabolites (blue) and three up-regulated metabolites (orange) in the serum of participants with malignant pulmonary nodules compared with those with benign pulmonary nodules. (B) Potential pathways differentially regulated in the benign and malignant pulmonary nodule groups. The seven selected metabolites were tested to identify altered pathways. The color and size of each circle correspond to the P value and the pathway impact value, respectively. A total of three pathways were considered altered (P < 0.05, pathway impact value > 0.1): (1) phenylalanine, tyrosine and tryptophan biosynthesis, (2) alanine, aspartate and glutamate metabolism, and (3) tyrosine metabolism.
