TY - JOUR
T1 - A metagenome-derived artificial intelligence modeling framework advances the predictive diagnosis and interpretation of petroleum-polluted groundwater
AU - Wijaya, Jonathan
AU - Park, Joonhong
AU - Yang, Yuyi
AU - Siddiqui, Sharf Ilahi
AU - Oh, Seungdae
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/7/5
Y1 - 2024/7/5
AB - Groundwater (GW) quality monitoring is vital for sustainable water resource management. The present study introduced a metagenome-derived machine learning (ML) model aimed at enhancing the predictive understanding and diagnostic interpretation of GW pollution associated with petroleum. In this framework, taxonomic and metabolic profiles derived from GW metagenomes were combined for use as the input dataset. By employing strategies that optimized data integration, model selection, and parameter tuning, we achieved a significant increase in diagnostic accuracy for petroleum-polluted GW. Explanatory artificial intelligence techniques identified petroleum degradation pathways and Rhodocyclaceae as strong predictors of a pollution diagnosis. Metagenomic analysis corroborated the presence of gene operons encoding aminobenzoate and xylene biodegradation within the de novo assembled genome of Rhodocyclaceae. Our genome-centric metagenomic analysis thus clarified the ecological interactions associated with microbiomes in breaking down petroleum contaminants, validating the ML-based diagnostic results. This metagenome-derived ML framework not only enhances the predictive diagnosis of petroleum pollution but also offers interpretable insights into the interaction between microbiomes and petroleum.
KW - Groundwater monitoring
KW - Machine learning
KW - Metagenome
KW - Microbiome
KW - Petroleum
UR - http://www.scopus.com/inward/record.url?scp=85192675423&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192675423&partnerID=8YFLogxK
U2 - 10.1016/j.jhazmat.2024.134513
DO - 10.1016/j.jhazmat.2024.134513
M3 - Article
C2 - 38735183
AN - SCOPUS:85192675423
SN - 0304-3894
VL - 472
JO - Journal of Hazardous Materials
JF - Journal of Hazardous Materials
M1 - 134513
ER -