Challenges and opportunities for validation of AI-based new approach methods
Abstract
The integration of artificial intelligence (AI) into new approach methods (NAMs) for toxicology represents a paradigm shift in chemical safety assessment. Harnessing AI appropriately has enormous potential to streamline validation efforts. This review explores the challenges, opportunities, and future directions for validating AI-based NAMs, highlighting their transformative potential while acknowledging the complexities involved in their implementation and acceptance. We discuss key hurdles such as data quality, model interpretability, and regulatory acceptance, alongside opportunities including enhanced predictive power and efficient data integration. The concept of e-validation, an AI-powered framework for streamlining NAM validation, is presented as a comprehensive strategy to overcome limitations of traditional validation approaches; it leverages AI-powered modules for reference chemical selection, study simulation, mechanistic validation, and model training and evaluation. We propose robust validation strategies, including tiered approaches, performance benchmarking, uncertainty quantification, and cross-validation across diverse datasets. Because AI models are dynamic, we emphasize the importance of ongoing monitoring and refinement post-implementation. We consider ethical implications and the need for human oversight in AI-driven toxicology. We also outline the impact of trends in AI development, research priorities, and a vision for the integration of AI-based NAMs in toxicological practice, calling for collaboration among researchers, regulators, and industry stakeholders. We describe the vision of companion AI post-validation agents that keep methods and their validity status current. By addressing these challenges and opportunities, the scientific community can harness the potential of AI to enhance predictive toxicology while reducing reliance on traditional animal testing and increasing human relevance and translational capabilities.
Plain language summary
Scientists are using artificial intelligence (AI) to develop new ways of assessing chemical safety that do not rely on animal experiments. These methods, known as new approach methods (NAMs), can be faster, more accurate, more human-relevant, and more ethical than traditional approaches. However, before these new methods can be widely used, we need to make sure they are reliable and trustworthy. This article discusses the challenges in validating AI-based safety testing methods, such as ensuring data quality and making AI decisions transparent and understandable, and proposes strategies for thorough validation and ongoing monitoring of these AI methods. It also explores opportunities to use AI to simulate experiments, analyze complex biological information, and support validation of diverse NAMs. We emphasize the importance of collaboration among researchers, regulators, and industry to ensure responsible use of AI in toxicology. By addressing these challenges, we can harness AI’s power to improve chemical safety testing while reducing animal use.
Articles are distributed under the terms of the Creative Commons Attribution 4.0 International license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is appropriately cited (CC-BY). Copyright on any article in ALTEX is retained by the author(s).