Challenges and opportunities for validation of AI-based new approach methods

Thomas Hartung, Nicole Kleinstreuer

Abstract

The integration of artificial intelligence (AI) into new approach methods (NAMs) for toxicology represents a paradigm shift in chemical safety assessment. Harnessing AI appropriately has enormous potential to streamline validation efforts. This review explores the challenges, opportunities, and future directions for validating AI-based NAMs, highlighting their transformative potential while acknowledging the complexities involved in their implementation and acceptance. We discuss key hurdles such as data quality, model interpretability, and regulatory acceptance, alongside opportunities including enhanced predictive power and efficient data integration. The concept of e-validation, an AI-powered framework for streamlining NAM validation, is presented as a comprehensive strategy to overcome limitations of traditional validation approaches, leveraging AI-powered modules for reference chemical selection, study simulation, mechanistic validation, and model training and evaluation. We propose robust validation strategies, including tiered approaches, performance benchmarking, uncertainty quantification, and cross-validation across diverse datasets. The importance of ongoing monitoring and refinement post-implementation is emphasized, addressing the dynamic nature of AI models. We consider ethical implications and the need for human oversight in AI-driven toxicology and outline the impact of trends in AI development, research priorities, and a vision for the integration of AI-based NAMs in toxicological practice, calling for collaboration among researchers, regulators, and industry stakeholders. We describe the vision of companion AI post-validation agents to keep methods and their validity status current. By addressing these challenges and opportunities, the scientific community can harness the potential of AI to enhance predictive toxicology while reducing reliance on traditional animal testing and increasing human relevance and translational capabilities.


Plain language summary
Scientists are using artificial intelligence (AI) to develop new ways of assessing chemical safety that do not rely on animal experiments. These methods can be faster, more accurate, more human-relevant, and more ethical than traditional approaches. However, before these new methods can be widely used, we need to make sure they are reliable and trustworthy. This article discusses the challenges in validating AI-based safety testing methods, such as ensuring data quality and making AI decisions transparent and understandable, and proposes strategies for thorough validation and ongoing monitoring of these AI methods. It also explores opportunities to use AI to simulate experiments, analyze complex biological information, and support validation of diverse NAMs. We emphasize the importance of collaboration among researchers, regulators, and industry to develop responsible AI use in toxicology. By addressing these challenges, we can harness AI’s power to improve chemical safety testing while reducing animal use.

Article Details

How to Cite
Hartung, T. and Kleinstreuer, N. (2025) “Challenges and opportunities for validation of AI-based new approach methods”, ALTEX - Alternatives to animal experimentation, 42(1), pp. 3–21. doi: 10.14573/altex.2412291.
Section
Food for Thought ...