Development of a Defined Approach for Eye Hazard Identification of Solid Chemicals According to the Three UN GHS Categories

Currently, two OECD-adopted defined approaches (DAs) for eye hazard identification of non-surfactant liquids exist (OECD TG 467). The purpose of the current study was to develop a DA for eye hazard identification according to the three UN GHS categories (Cat. 1, Cat. 2, No Cat.) for solid chemicals: the DAS. The DAS combines two test methods described in OECD TG 437 and TG 492. The DAS was developed based on in-depth statistical analysis of a database on solids for which in vitro and historically curated in vivo Draize eye test data exist. The performance of the DAS was assessed by comparing the predictions with the classification based on in vivo Draize eye test data, on the one hand, and with the performance criteria established by the OECD expert group, on the other hand. In the first tier of the DAS, the SkinEthic™ HCE EIT method (TG 492) is used to distinguish No Cat. from classified substances. For classified substances, the BCOP LLBO method (TG 437) is used to identify Cat. 1; the remaining solids are predicted Cat. 2. In summary, 77.4% of Cat. 1 (N=31), 52.3% of Cat. 2 (N=18) and 70.0% of No Cat. (N=60) solids were correctly identified compared to the classification based on the Draize eye test. The percentage of correct predictions met the minimum OECD established performance values of 75% Cat. 1, 50% Cat. 2, and 70% No Cat., and the percentage of mispredictions was below the established maximum values. Therefore, inclusion of the DAS in OECD TG 467 has been achieved.


Introduction
In recent decades, many efforts have been made to develop new approach methodologies (NAMs) for eye hazard identification according to the United Nations Globally Harmonized System of Classification (UN, 2023). Since 2009, a number of Test Guidelines (TGs) have been adopted by the Organisation for Economic Co-operation and Development (OECD) for the identification of test chemicals inducing serious eye damage (UN GHS Cat. 1) or for the identification of test chemicals not requiring classification for eye irritation and serious eye damage hazards (UN GHS No Cat.). In Guidance Document No. 263 on an integrated approach to testing and assessment (IATA) for serious eye damage and eye irritation, it is proposed to use data generated with these NAMs together, as well as with information sources such as physicochemical properties, in silico and read-across predictions from chemical analogues (OECD, 2019). In this context, the publication of the Draize eye test Reference Database (DRD) by Cosmetics Europe was an important step towards understanding which of the in vivo effects are responsible for driving UN GHS classification (Barroso et al., 2017). The advantage of selecting substances from the DRD is that the database contains 681 independent historical in vivo studies (634 individual chemicals) conducted according to OECD TG 405 (OECD, 2023a), covering all drivers of classification based on the observed tissue effects, relevant chemical classes and physical states. In addition, the authors proposed a number of key criteria to be considered when selecting reference chemicals from the DRD for the evaluation of defined approaches for eye hazard identification.
All these efforts are finally starting to pay off, resulting in full replacement of the in vivo Draize eye test. In June 2022, the SkinEthic™ Human Corneal Epithelium (HCE) Time-to-Toxicity (TTT) was adopted by the OECD as a full replacement of the in vivo Draize eye test for eye hazard identification according to UN GHS for both liquids and solids (TG 492B; OECD, 2022a). Likewise, two defined approaches (DAs) for eye hazard identification of non-surfactant liquids were accepted (DAL-1 and DAL-2) and integrated into a new OECD TG (TG 467 Part I and Part II; OECD, 2022b). The DAL-1 is based on the combination of a Reconstructed human Cornea-like Epithelium test method (OECD TG 492 RhCE, EpiOcular™ Eye Irritation Test or SkinEthic™ HCE EIT for liquids; OECD, 2023b) and the Bovine Corneal Opacity and Permeability (BCOP) test method using the laser light-based opacitometer (TG 437, LLBO; OECD, 2023c) as well as four physicochemical properties of the chemical (water solubility, octanol-water partition coefficient, vapor pressure, and surface tension) (Alépée et al., 2019a). The DAL-2 is based on the combination of the Short Time Exposure (STE) test method (TG 491; OECD, 2023d) and the BCOP LLBO (Alépée et al., 2019b; OECD, 2023c; SCCS, 2023¹). Next, a separate DA was developed to identify the eye hazard of liquid, semi-solid and solid chemicals having surfactant (SF) properties. The DASF is based on the combination of a RhCE test method (TG 492, EpiOcular™ EIT or SkinEthic™ HCE EIT) and a modification of the STE test method (Alépée et al., 2023). The DASF is currently under OECD consideration and was recently accepted by the Working Party of Hazard Assessment as an IATA Case Study to illustrate the use of the DASF for eye hazard identification of surfactants.
Knowing that the DAs listed above are applicable to liquids and/or surfactants only, the purpose of the current study was to develop a DA for eye hazard identification according to the three categories of the UN GHS (UN GHS Cat. 1, Cat. 2 and No Cat.) for solid chemicals (i.e., non-pipettable test chemicals). During the development phase, several OECD-adopted test methods were not considered for the DAS because of their limited applicability with respect to solids. The Isolated Chicken Eye (ICE) test method has a high false negative rate for solids when used to identify Cat. 1 (TG 438; OECD, 2023e). Non-surfactant solids are excluded from the applicability domain of the STE when used to identify No Cat. (OECD, 2023d). The Fluorescence Leakage (FL) test method can only be used for the identification of Cat. 1 water-soluble chemicals² (TG 460; OECD, 2023f), and was therefore not considered for inclusion in a DA for solids. In the Vitrigel® EIT method, test chemicals are dissolved or suspended in culture medium, and preparations that are acidic (pH ≤ 5) or show rapid phase separation are outside the applicability domain (TG 494; OECD, 2021).² The Ocular Irritection® test method is only applicable to solids whose 10% solution/dispersion pH is in the range of 4 ≤ pH ≤ 9 (OECD TG 496, 2023g).² Knowing that solids are not considered outside the applicability domain of the BCOP test method² (TG 437; OECD, 2023c) and no restrictions are known for the RhCE² (TG 492; OECD, 2023b) test methods, only these OECD-adopted NAMs were considered as potential components of a testing strategy.
The DA for solids (DAS) was developed based on a set of 71 solid chemicals. Additional solids were then selected for testing to obtain a more comprehensive set of 109 solids representing the different drivers of UN GHS classification. The performance of the DAS was assessed by comparing the predictions with the classification based on historical in vivo Draize eye test data, on the one hand, and with the performance criteria established by the OECD expert group on eye/skin irritation/corrosion and phototoxicity, on the other hand (GD 354; OECD, 2022c).

2 Materials and methods

Reference chemicals
The set of reference chemicals to support the review of the DAS was composed of 109 neat solids having high quality Draize eye test data and is listed in Tab. 1. The set covers a wide range of applications and chemical classes, includes small and large molecules, hydrophobic and hydrophilic chemicals, with a wide range of organic functional groups (112 different OFGs) defined according to OECD QSAR Toolbox analysis version 3.2.³ Summary statistics describing the chemical space of the chemicals tested are shown in Tab. 2.

Data sources
The historical Draize eye test data on solids were selected from the DRD according to the key principles described by Barroso and co-authors (Barroso et al., 2017). The criteria used for chemicals that should not be selected according to the key principles are listed in chapter 3.1.5 of the OECD supporting document (GD 354, Chapter 3; OECD, 2022c). For each Draize eye test study, detailed information was available on ocular tissue effects driving classification in vivo (Tab. 1).

Abbreviations: BCOP, bovine corneal opacity and permeability; CASRN, Chemical Abstracts Service Registry Number; Cat. 1, UN GHS classification for chemicals causing irreversible effects on the eye / serious damage to the eye; Cat. 2, UN GHS classification for chemicals causing reversible effects on the eye / eye irritation, sub-categorised in 2A (irritant to eyes, eye effects are not fully reversible within 7 days of observation) and 2B (mild irritant to eyes, eye effects fully reversible within 7 days of observation); CO, corneal opacity; DA, defined approach; DAL, defined approach for liquids; DAS, defined approach for solids; DASF, defined approach for surfactants; DRD, Draize eye test Reference Database; EIT, eye irritation test; EITS, eye irritation test solids; FP, false positive; GD, Guidance Document; HCE, human corneal epithelium; IATA, integrated approach to testing and assessment; LLBO, laser light-based opacitometer; NAM, new approach methodology; No Cat., chemicals not classified for serious eye damage or eye irritation under GHS/EU CLP; NPCM, no prediction can be made; OECD, Organisation for Economic Co-operation and Development; OFG, organic functional group; RhCE, reconstructed human cornea-like epithelium; STE, short time exposure; TG, Test Guideline; TP, true positive; TTT, time-to-toxicity; UN GHS, United Nations Globally Harmonized System of Classification and Labelling of Chemicals
¹ https://health.ec.europa.eu/system/files/2023-12/sccs_o_273_final.pdf
² Eye Irritation - PETA Science Consortium International e.V. https://www.thepsci.eu/eye-irritation-2/ (accessed 15.01.2024)
³ https://www.oecd.org/chemicalsafety/risk-assessment/oecd-qsar-toolbox.htm

[Table row fragment; remainder of table body not recovered: 4-(1,1,3,3-Tetramethylbutyl | No Cat | CO = 0 | Test]
Table legend: CO: corneal opacity; CO pers D21: CO persistence on day 21; Conj: conjunctival; mean: mean scores calculated from gradings at 24, 48, and 72 hours after instillation of the test chemical; CO > 0: CO scores > 0 in at least one animal and at least one observed time point; CO = 0: CO scores equal to 0 in all observation times in all animals. Studies marked with ** are studies for which at least one animal had a mean of the scores of days 1-3 above the classification cut-off for at least one endpoint but not in enough animals to generate a classification.

Data on the SkinEthic™ HCE EITS and the BCOP LLBO test methods were taken from several peer-reviewed publications (Alépée et al., 2016; Verstraelen et al., 2017; Van Rompay et al., 2018; Alépée et al., 2019a,b; Adriaens et al., 2021). Additional solids were tested to fill the remaining data gaps for the SkinEthic™ HCE EITS and BCOP LLBO test methods, resulting in an evaluation of 109 solids with the SkinEthic™ HCE EITS and 105 solids with the BCOP LLBO. Four solids were not tested with the BCOP LLBO because they were not commercially available or because they were very expensive (Tab. S1⁴). The solids were tested according to OECD TG 437 (BCOP) and OECD TG 492 (RhCE). The protocols of the test methods are published through the DB-ALM dataset (SkinEthic™ HCE EITS: DB-ALM Protocol n° 191)⁵ and MethodsX (BCOP LLBO: Van Rompay et al., 2020). An overview of the prediction models for the individual test methods that are part of the DAS is shown in Tab. 3.

Tab. 2: Summary of the physicochemical property ranges that describe the chemical space of the chemicals tested using the DAS
[Table body not recovered; surviving column headers: UN GHS, MW]
Since data were also available for the EpiOcular™ EIT (N=106) and the BCOP OP-KIT (N=67) test methods (Tab. S1⁴), two validated reference methods included in OECD TG 492 and 437, respectively, the performance of these methods was also assessed. Data on the EpiOcular™ EIT and the BCOP OP-KIT test methods were taken from several peer-reviewed publications (ICCVAM, 2006; Barroso et al., 2014; Verstraelen et al., 2017; Kandarova et al., 2018; Van Rompay et al., 2018; Adriaens et al., 2021). An overview of the prediction models for the individual test methods is shown in Tab. 4.

2.3. Development of the DAS
The DAS was developed based on the results of 71 neat solids (training set) that were available for the different components of the DAS. In a next step, the performance of the DAS was assessed for the test set (N=38). No changes were made to the data interpretation procedure (DIP) after assessing the performance of the test set, as no further improvement to the DIP was possible based on the performance of the training and test set results shown separately in Tab. S2⁶. The identification of the chemicals that were used in the training set and the test set is available in Tab. 1 and Tab. S1⁴.
The performance of the DAS was assessed by comparing the prediction results with the classification based on historical in vivo Draize eye test data. For each chemical, the predicted class was obtained by considering all available results of each in vitro test method. The performance of the DAS to distinguish between the three UN GHS categories was compared against minimum performance values for each UN GHS category that were accepted by the OECD Expert Group on Eye/Skin Irritation/Corrosion and Phototoxicity (GD 354; OECD, 2022c). The percentage of correct predictions, when compared to the UN GHS classification based on the Draize eye test, should be at least 75% for Cat. 1, 50% for Cat. 2 and 70% for No Cat. (Tab. 5). The balanced accuracy, which is the average of the proportion of correct predictions of each UN GHS category, was reported as the overall measure of accuracy. All analyses were performed with R version 4.3.1.⁷
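The balanced accuracy and the comparison against the minimum performance values can be illustrated with a minimal sketch. It is written in Python for illustration only (the study's analyses were performed in R), the function and variable names are hypothetical, and the input percentages are the DAS values reported in the abstract rather than a recomputation from the underlying data.

```python
# Minimum percentage of correct predictions per UN GHS category,
# as quoted in the text from GD 354 (OECD, 2022c).
MIN_CORRECT = {"Cat. 1": 75.0, "Cat. 2": 50.0, "No Cat.": 70.0}


def balanced_accuracy(correct_pct):
    """Average of the per-category percentages of correct predictions."""
    return sum(correct_pct[cat] for cat in MIN_CORRECT) / len(MIN_CORRECT)


def meets_minimum(correct_pct):
    """Per category: does the correct-prediction rate reach the minimum value?"""
    return {cat: correct_pct[cat] >= MIN_CORRECT[cat] for cat in MIN_CORRECT}


# DAS correct-prediction rates as reported in the abstract (illustrative input).
das_correct_pct = {"Cat. 1": 77.4, "Cat. 2": 52.3, "No Cat.": 70.0}

print(round(balanced_accuracy(das_correct_pct), 1))
print(meets_minimum(das_correct_pct))
```

Because each category contributes equally to the average, the measure is not dominated by the largest category (here No Cat., N=60).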

Motivation minimum performance values
The minimum performance criteria were based on the uncertainty of the in vivo Draize eye test. The low acceptance value of at least 50% concordance for Cat. 2 was based on the variability of the Draize eye test, especially for the mild to moderate range. The between-test variability for non-surfactant liquids and solids which resulted in at least a) one Cat. 1 classification among all repeat studies, b) one Cat. 2 classification among all repeat studies and c) one No Cat. classification among all studies was as follows:
a) 41.7% (5/12) of the chemicals with at least one Cat. 1 study could be equally identified as Cat. 2; therefore, the overall concordance of classifications was 58.3% (7/12) for Cat. 1.
b) 50% (5/10) of the chemicals with at least one Cat. 2 study could be equally identified as Cat. 1, and 20% (2/10) could be equally identified as No Cat.; therefore, the overall concordance of classifications was 30% (3/10) for Cat. 2.
c) 11.1% (2/18) of the chemicals with at least one No Cat. study could be equally identified as Cat. 2 or higher; therefore, the overall concordance of classifications was 88.9% (16/18) for No Cat.
Based on these in vivo observations, the minimum performance criteria values to be met were discussed and approved by the OECD Working Group of National Co-ordinators of the TGs programme and included in the supporting document (GD 354; OECD, 2022c) (Tab. 5).
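The concordance figures for the repeat Draize studies follow directly from the counts quoted above. The short Python sketch below reproduces that arithmetic; the dictionary layout and function name are hypothetical, and only the counts are taken from the text.

```python
# Counts quoted in the text: chemicals with repeat Draize studies, and how many
# of them could equally be assigned to a different UN GHS category.
repeat_studies = {
    "Cat. 1": {"total": 12, "discordant": {"Cat. 2": 5}},
    "Cat. 2": {"total": 10, "discordant": {"Cat. 1": 5, "No Cat.": 2}},
    "No Cat.": {"total": 18, "discordant": {"Cat. 2 or higher": 2}},
}


def concordance_pct(entry):
    """Percentage of chemicals whose repeat studies all gave the same category."""
    concordant = entry["total"] - sum(entry["discordant"].values())
    return 100.0 * concordant / entry["total"]


for category, entry in repeat_studies.items():
    print(category, round(concordance_pct(entry), 1))
```

The low in vivo concordance for Cat. 2 (3 of 10 chemicals) is the reason given in the text for the comparatively low minimum value of 50% for that category.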

3.1. Performance of the individual NAMs
Tab. 6 shows the performance of the individual NAMs. Based on these results, it can be concluded that the SkinEthic™ HCE EITS is the most promising test method to identify No Cat., with 70.0% (N = 60) agreement with the UN GHS classification based on the Draize eye test. The EpiOcular™ EIT has a correct prediction rate of 56.8% for identifying No Cat. Note that results for two solids were not available, but if predicted as No Cat., this would still result in a concordance of 58.2% (N = 60), which is below the minimum value of 70% correct predictions for No Cat. compared to the UN GHS classification based on the Draize eye test. The BCOP LLBO test method is best suited to identify Cat. 1, with a correct prediction rate of 77.4% (N = 31). For the same set of 23 Cat. 1 solids, the agreement between the UN GHS classification based on the Draize eye test and BCOP predictions was 66.9% for the OP-KIT, while this was 78.3% for the LLBO.
A summary of the predictions for individual solids for the suitable candidates for the DAS is given in Tab. S1⁴. The within- and between-laboratory reproducibility of both test methods was assessed during their respective multicentre studies with at least two participating laboratories (Alépée et al., 2016; Verstraelen et al., 2017; Adriaens et al., 2021; Van Rompay et al., 2020). In the context of the current study, more than one result was available for 92 out of 109 solids for the SkinEthic™ HCE EITS test method; in 95.7% (88/92) the prediction was the same. Multiple BCOP LLBO results were available for 48 solids and in 93.7% (45/48) the prediction was the same (Tab. S1⁴).
[Table notes: The values between brackets correspond to the performance for the same set of reference solids. a: OECD acceptance criteria ≥ 75% not met. b: OECD acceptance criteria ≥ 70% not met.]
The reproducibility of the DAS was evaluated on 46 solids for which multiple results were available for both test methods, resulting in concordant predictions in 89.1% (41/46). Two Cat. 1 solids (No. 15: CAS RN 108-45-2 and No. 18: CAS RN 54-21-7, Tab. S1⁴) resulted in discordant predictions based on the BCOP LLBO. Two Cat. 2 solids resulted in discordant predictions: one solid (No. 38: CAS RN 83-56-7, Tab. S1⁴) resulted in discordant predictions for the BCOP LLBO (once Cat. 1 and once NPCM) and the SkinEthic™ HCE EITS test method (7 times NPCM and 4 times No Cat.). The second Cat. 2 solid (No. 47: CAS RN 99-65-0, Tab. S1⁴) was predicted No Cat. 10 times and NPCM once with the SkinEthic™ HCE EITS. One No Cat. solid (No. 73: CAS RN 1603-02-7, Tab. S1⁴) resulted 4 times in NPCM and 5 times in No Cat. with the SkinEthic™ HCE EITS test method.
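The reproducibility percentages above count a solid as concordant when all of its repeat results lead to the same prediction. A minimal Python sketch of that calculation follows; the function names and the run-level data are hypothetical and serve only to show the counting rule, not the study's actual results.

```python
def is_concordant(run_predictions):
    """True when every repeat run of one solid gave the same prediction."""
    return len(set(run_predictions)) == 1


def reproducibility_pct(runs_by_solid):
    """Percentage of solids with more than one result whose predictions all agree."""
    repeated = [runs for runs in runs_by_solid.values() if len(runs) > 1]
    if not repeated:
        raise ValueError("no solid has more than one result")
    return 100.0 * sum(is_concordant(runs) for runs in repeated) / len(repeated)


# Hypothetical run-level predictions for one test method (illustration only).
example_runs = {
    "solid A": ["No Cat.", "No Cat.", "No Cat."],  # concordant
    "solid B": ["NPCM", "No Cat."],                # discordant
    "solid C": ["NPCM", "NPCM"],                   # concordant
    "solid D": ["NPCM"],                           # single result: not counted
}

print(round(reproducibility_pct(example_runs), 1))
```

Solids with a single result are excluded from the denominator, which matches the text's restriction to solids for which more than one result was available.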

Discussion
The goal was to develop a DA for eye hazard identification of solids based on the combination of OECD-adopted test methods. Overall, the percentage of correct predictions met the minimum performance values of 75% Cat. 1, 50% Cat. 2, and 70% No Cat. established by the OECD experts. The SkinEthic™ HCE TTT is a stand-alone method applicable to liquids and solids, adopted by the OECD as TG 492B (OECD, 2022a). The method met the OECD acceptance criteria for the predictivity of eye hazard identification according to UN GHS. The SkinEthic™ HCE TTT yielded similar results (correct prediction: 74.4% of 29 Cat. 1, 55.3% of 19 Cat. 2, and 71.7% of 33 No Cat.) compared to the DAS (77.4% of 31 Cat. 1, 52.3% of 18 Cat. 2, and 70.0% of 60 No Cat.). In total, data on 72 solids were available for both the DAS and the SkinEthic™ HCE TTT, and the performance compared with the reference classification was similar for both NAMs (Tab. S3⁶). The agreement in prediction between the NAMs was higher for UN GHS Cat. 1 (83.3%, 20/24 solids) and UN GHS No Cat. (81.3%, 26/32 solids). For UN GHS Cat. 2, 62.5% (10/16) of the solids resulted in the same prediction for both NAMs. More overpredictions were observed with the DAS, while more false negatives were observed with the SkinEthic™ HCE TTT compared to the classification based on the Draize eye test (Tab. S3⁶). This discordance in prediction between the NAMs was not related to the driver of classification but to the differences between the methods. The DAS is a combination of two different NAMs, the SkinEthic™ HCE EITS (to identify No Cat.) and the BCOP LLBO (to identify Cat. 1). The SkinEthic™ HCE TTT method uses the same tissue construct as the SkinEthic™ HCE EITS, but the protocols are different. In the SkinEthic™ HCE TTT method, the tissues are exposed for 30 minutes and 120 minutes to 80 mg of the neat solid with no post-incubation period. In the SkinEthic™ HCE EITS, the tissues are exposed for four hours to 30 mg of the neat solid, followed by an 18-hour post-exposure incubation period. As a result, five solids (No. 36, 39, 64, 103, 109) were predicted No Cat. with the SkinEthic™ HCE TTT (in all runs or in the majority of the runs) while they resulted in NPCM with the SkinEthic™ HCE EITS. In the DAS, UN GHS Cat. 1 and Cat. 2 solids were then distinguished from each other based on the organotypic BCOP LLBO, explaining a further discrepancy with the predictions based on the SkinEthic™ HCE TTT. Nevertheless, two options are available that meet the acceptance criteria to distinguish between the three UN GHS categories for the identification of eye hazard of solids.
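The two-tier logic of the DAS described in this paper can be summarised in a short Python sketch. It takes the predictions of the individual test methods as inputs, since the prediction models themselves (Tab. 3) are not reproduced here. The function name, the string labels and the error handling for missing or unexpected results are assumptions made for illustration, not part of the published data interpretation procedure.

```python
from typing import Optional

NO_CAT, CAT_1, CAT_2, NPCM = "No Cat.", "Cat. 1", "Cat. 2", "NPCM"


def das_predict(eits_result: str, bcop_llbo_result: Optional[str] = None) -> str:
    """Combine the two test method predictions into one UN GHS category.

    Tier 1: SkinEthic HCE EITS identifies No Cat.
    Tier 2: BCOP LLBO identifies Cat. 1; all remaining solids are Cat. 2.
    """
    if eits_result == NO_CAT:
        return NO_CAT
    if eits_result != NPCM:
        # Assumption: the EITS yields only "No Cat." or "NPCM".
        raise ValueError(f"unexpected EITS result: {eits_result!r}")
    if bcop_llbo_result is None:
        # Assumption: a missing BCOP LLBO result is treated as an error.
        raise ValueError("BCOP LLBO result needed when the EITS gives NPCM")
    if bcop_llbo_result == CAT_1:
        return CAT_1
    # Any other BCOP LLBO outcome (NPCM or No Cat.) leads to Cat. 2 in this sketch.
    return CAT_2


print(das_predict("No Cat."))
print(das_predict("NPCM", "Cat. 1"))
print(das_predict("NPCM", "NPCM"))
```

In this sketch the BCOP LLBO is consulted only after the SkinEthic™ HCE EITS, so a Cat. 1 result from the BCOP LLBO cannot override a No Cat. result from the first tier.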
For the Draize eye test Cat. 1 solids, 77.4% (N = 31) were predicted Cat. 1 with the DAS. The under-prediction rate for the in vivo Cat. 1 driver of classification CO mean ≥ 3 was low (11.1%), whereas this was 27.3% for the drivers CO = 4 and CO persistence D21 (Tab. 8). For the Draize eye test Cat. 2 solids, 52.3% (N = 18) were predicted Cat. 2 with the DAS. The over-prediction rate was higher for the solids that were classified Cat. 2 based on CO mean ≥ 1 (42.9%) compared to Conjunctiva mean ≥ 2 (21.1%, Tab. 8). The false negative rate for the in vivo Cat. 2 drivers was 14.3% and 20.7%, respectively. The concordance between the prediction of the DAS and the UN GHS No Cat. classification based on the Draize eye test was 70.0% (N = 60). The No Cat. identification rate for solids that induced no opacity in the Draize eye test (CO = 0) was higher (75.0%) compared to the subgroup CO > 0 (55.6%, Tab. 8). No Cat. solids from the subgroup CO > 0 induced CO scores > 0 in at least one animal and at least one observed time point. Furthermore, studies marked with ** are studies for which at least one animal had a mean of the scores of days 1-3 above the classification cut-off for at least one endpoint but not in enough animals to generate a classification. The reference set contained three solids with these characteristics, all were

Tab. 4: Prediction models of the EpiOcular™ EIT and the BCOP OP-KIT test method
[Table body not recovered; surviving column header: TG]
ᵃ Abbreviations: IVIS: in vitro irritancy score = mean opacity (read-out OP-KIT) + (15 × mean permeability OD490 value)

Tab. 5: Performance metrics for the assessment of the predictivity of a DA of non-surfactant liquid test chemicals for eye hazard identification (OECD GD 354, 2022c)
[Table body not recovered; surviving column headers: UN GHS, Defined Approach (Cat. 1, Cat. 2, No Cat.)]
The test methods included in OECD TG 492 (RhCE) and TG 437 (BCOP) were considered to identify UN GHS No Cat. and Cat. 1, respectively. The EpiOcular™ EIT did not meet the OECD acceptance criteria for eye hazard identification according to the three UN GHS categories of at least 70% correct No Cat. identification compared to the classification based on the Draize eye test. Although the LabCyte CORNEA-MODEL24 EIT and the MCTT HCE™ EIT test method are two RhCE test methods included in OECD TG 492, they were not considered as possible components of the DAS due to the limited amount of publicly available data on solids. Furthermore, the specificity of the LabCyte CORNEA-MODEL24 EIT test method (LABCYTE BRD, 2017⁸) based on the same set of 19 solids was 57.9% compared to 68.4% for SkinEthic™ HCE EIT. The MCTT HCE™ EIT test method looks promising, with a specificity of 82.4% based on 17 solids (Yang et al., 2017), of which 88.2% were predicted as No Cat. with the SkinEthic™ HCE EIT method. The sensitivity based on the same set of 20 classified solids was 95% and 88.2%, respectively. Therefore, the MCTT HCE™ EIT test method could be considered a potential me-too NAM, preferably once all data gaps have been filled to gain sufficient confidence. The BCOP OP-KIT predicted 66.9% of UN GHS Cat. 1 as Cat. 1, which is below the minimum of at least 75% concordance with the classification based on the Draize eye test. Overall, only the SkinEthic™ HCE EITS (OECD TG 492) and the BCOP LLBO (OECD TG 437) met the acceptance criteria and are therefore the two test methods included in the DAS. Furthermore, it is recommended to use the bottom-up approach described in the IATA GD 263 (start with SkinEthic™ HCE EITS) to avoid false positives with the top-down approach when compared with the UN GHS classification based on the Draize eye test (No Cat. predicted Cat. 1 with BCOP LLBO; CAS RN 2736-23-4 and 21645-51-2). Additionally, the BCOP LLBO should not be used to identify No Cat. since some in vivo eye irritants that resulted in NPCM with the SkinEthic™ HCE EITS were predicted No Cat. with the BCOP LLBO (LIS < 30;.