Leading Edge Predictors for Drug Discovery


Statistics of the CSGenoTox Predictor

Development of the CSGenoTox Predictor

The CSGenoTox predictor is based on topological structural descriptors (proprietary and published) and was developed using artificial neural networks.  Neural network analysis was applied to select descriptors and then to optimize the relationship between experimental mutagenic index values (MI; 0 = negative, 1 = positive) and the values calculated by the CSGenoTox predictor.  MI = 1 signifies a mutagen and MI = 0 a non-mutagen, as determined by Ames testing and reported as such.  The resultant predictor was cross-validated by the leave-group-out method; external validation was then performed on a large test set of compounds (new chemical entities) that were not used in either descriptor selection or predictor development.

The overall accuracy of MI (AMI) is defined as the number of correctly predicted MI values divided by the total number of experimental MI values, expressed as a percentage.

AMI = (Total MIcorr / Total MIexp) x 100

The specificity of MI (MI0) is defined as the percentage of non-mutagens in the dataset that are correctly predicted.

MI0 = (Total MI(0)corr / Total MI(0)exp) x 100

The sensitivity of MI (MI1) is defined as the percentage of mutagens in the dataset that are correctly predicted.

MI1 = (Total MI(1)corr / Total MI(1)exp) x 100

The percent false negatives:

MI0(false) = (100 – MI1)

The percent false positives:

MI1(false) = (100 – MI0).
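The definitions of AMI, MI0, MI1 and the two false rates can be sketched in a few lines of Python. This is only an illustration of the formulas as written above: the function name `mi_statistics` and the five-compound example are hypothetical and are not taken from CSGenoTox.

```python
def mi_statistics(experimental, predicted):
    """Return AMI, MI0, MI1 and the two false rates, all as percentages.

    Both arguments are sequences of 0 (non-mutagen) and 1 (mutagen).
    """
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    pairs = list(zip(experimental, predicted))
    n_non = sum(1 for e, _ in pairs if e == 0)   # Total MI(0)exp
    n_mut = sum(1 for e, _ in pairs if e == 1)   # Total MI(1)exp
    if n_non == 0 or n_mut == 0:
        raise ValueError("need at least one mutagen and one non-mutagen")
    corr_non = sum(1 for e, p in pairs if e == 0 and p == 0)  # Total MI(0)corr
    corr_mut = sum(1 for e, p in pairs if e == 1 and p == 1)  # Total MI(1)corr
    mi0 = 100.0 * corr_non / n_non
    mi1 = 100.0 * corr_mut / n_mut
    return {
        "AMI": 100.0 * (corr_non + corr_mut) / len(pairs),
        "MI0": mi0,
        "MI1": mi1,
        "MI0(false)": 100.0 - mi1,  # percent false negatives
        "MI1(false)": 100.0 - mi0,  # percent false positives
    }


# Hypothetical toy data: three mutagens (one missed) and two non-mutagens.
experimental = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 0]
stats = mi_statistics(experimental, predicted)
print(stats)
```

In the toy data, four of five compounds are predicted correctly, so AMI is 80%, while the single missed mutagen lowers MI1 only.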

CSGenoTox  Training and External Validation Sets

An overall dataset of 3363 compounds was split randomly into a 2963-compound training set for predictor development and a 400-compound external validation set used to assess the accuracy of the final model.  The training set contained 290 commercial drugs, and the external validation set contained 39.  Although the selection process was random, the ratio of mutagens to non-mutagens was maintained in both the training and external validation sets.
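A random split that preserves the class balance is usually called a stratified split. The sketch below shows one way to do it. The source does not describe the actual procedure, so the function `stratified_split`, the seed, and the 100-compound toy label list are all assumptions made for illustration.

```python
import random


def stratified_split(labels, n_validation, seed=0):
    """Randomly choose validation indices so each class keeps its overall share.

    Returns (training_indices, validation_indices). Because each class quota
    is rounded, the validation set can differ from n_validation by a compound
    or two on awkward class sizes.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    validation = []
    for indices in by_class.values():
        rng.shuffle(indices)
        quota = round(n_validation * len(indices) / len(labels))
        validation.extend(indices[:quota])

    held_out = set(validation)
    training = [i for i in range(len(labels)) if i not in held_out]
    return training, sorted(validation)


# Hypothetical toy set: 60 mutagens (1) and 40 non-mutagens (0).
labels = [1] * 60 + [0] * 40
training, validation = stratified_split(labels, n_validation=20)
n_mutagens_held_out = sum(labels[i] for i in validation)
print(len(training), len(validation), n_mutagens_held_out)
```

With the toy labels, the 20 held-out compounds contain 12 mutagens and 8 non-mutagens, the same 60/40 ratio as the full list.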

Cross-Validation of CSGenoTox

Predicted Results from CSGenoTox  Cross Validation

Cross-validation testing was conducted by setting up a series of 10 cross-validation test sets (VTS), each containing approximately 296 (~10%) of the 2963 compounds.  The test sets were disjoint: each compound appeared in exactly one VTS.  For each VTS, a new neural network-based QSAR model was developed on the remaining approximately 2667 compounds in the training set and applied to the VTS to predict MI.  The process was repeated 10 times, once for each VTS.
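The fold construction described above can be sketched as follows. This is a minimal illustration, not the CSGenoTox code: `make_folds` and `cross_val_predictions` are hypothetical names, and `fit` and `predict` are placeholder callables standing in for the neural network training and prediction steps, which the source does not detail.

```python
import random


def make_folds(n_compounds, n_folds=10, seed=0):
    """Shuffle compound indices and deal them into disjoint folds."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def cross_val_predictions(features, labels, fit, predict, n_folds=10, seed=0):
    """Predict each compound with a model that never saw it during fitting."""
    predictions = [None] * len(labels)
    for test_idx in make_folds(len(labels), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        # A fresh model is fitted for every fold on the remaining compounds.
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = predict(model, features[i])
    return predictions


# Fold sizes for a 2963-compound training set split ten ways.
folds = make_folds(2963)
fold_sizes = sorted({len(fold) for fold in folds})
print(fold_sizes)
```

Because 2963 is not divisible by 10, three folds hold 297 compounds and seven hold 296, which matches the "approximately 296" in the text.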

The 10-fold cross-validation prediction gave the following results:

     AMI = 89% (overall accuracy)

     MI(0) = 94% (accuracy for non-mutagens)

     MI(1) = 86% (accuracy for mutagens)

     MI0(false) = 8% (percentage of false negatives)

     MI1(false) = 3% (percentage of false positives)

External Validation of CSGenoTox

Predicted Results from CSGenoTox  External Validation

Validation of the CSGenoTox predictor used 400 unique compounds (NCEs, new chemical entities) that were randomly selected from the initial dataset of 3363 compounds and not used in model building. The average MW of the NCEs was 237, and the preponderance of compounds contained aromatic or heteroaromatic ring systems as well as amine, nitro, epoxy, and amide groups. The set included:

39 commercial drugs
159 non-mutagenic NCEs
241 mutagenic NCEs
31 miscellaneous compounds from various literature sources

(1) RTECS (US Government)

(2) Handbook of Carcinogenic Potency and Genotoxicity Databases, L.S. Gold and E. Zeiger (CRC Press, 1996)

The validation set was split 60/40 between mutagens and non-mutagens, the same split as in the overall 3363-compound dataset. Of the 39 commercial drugs present, four were mutagenic (positive Ames test).

External validation on 338 compounds gave the following results:

     AMI = 84% (overall accuracy)

     MI(0) = 87% (accuracy for non-mutagens)

     MI(1) = 82% (accuracy for mutagens)

     MI0(false) = 11% (percentage of false negatives)

     MI1(false) = 5% (percentage of false positives)
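As an arithmetic check, the overall accuracy should equal the class accuracies weighted by class share. The sketch below assumes a 60/40 split of mutagens to non-mutagens, which is stated for the validation set and the overall dataset. The function name `weighted_summary` is hypothetical. The "share of all compounds" reading of the false rates is an inference from the numbers, not something the source states.

```python
def weighted_summary(mi0, mi1, frac_mutagen=0.60):
    """Combine class accuracies (percent) using an assumed mutagen fraction.

    Returns (overall accuracy, false negatives as a percent of all compounds,
    false positives as a percent of all compounds).
    """
    frac_non = 1.0 - frac_mutagen
    ami = frac_mutagen * mi1 + frac_non * mi0
    false_neg_share = frac_mutagen * (100.0 - mi1)  # missed mutagens
    false_pos_share = frac_non * (100.0 - mi0)      # non-mutagens called positive
    return ami, false_neg_share, false_pos_share


# Reported class accuracies from the two result lists in this document.
cross_validation = weighted_summary(mi0=94, mi1=86)
external = weighted_summary(mi0=87, mi1=82)
print(cross_validation)
print(external)
```

Under that assumption the weighted accuracies come out near 89 and 84, in line with the reported AMI values. The false rates computed as a share of all compounds (roughly 8.4 and 2.4 for cross-validation, 10.8 and 5.2 for external validation) lie closer to the reported percentages than 100 – MI1 and 100 – MI0 do.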

These results were excellent, as seen in the chart and the ROC results below.  The most significant finding is that CSGenoTox gave a low percentage of false positives and false negatives, which is evidence of its robustness for this diverse set of NCEs.  The validation set contained 39 commercial drugs.  Of the 4 mutagenic drugs, CSGenoTox identified 3 correctly (MI1 = 75%), whereas MI0 = 100% for the non-mutagenic drugs.  These results are excellent even though the majority of the compounds are mutagens.

CSGenoTox Receiver Operating Characteristic Curve

The ROC (receiver operating characteristic) curve plots the true positive rate against the false positive rate as the decision threshold is varied.  The technique was applied to the 400-compound validation set.  The area under the curve was 0.925 for the CSGenoTox results on these 400 compounds.  This represents a 95% confidence interval.
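The area under a ROC curve equals the probability that a randomly chosen mutagen receives a higher score than a randomly chosen non-mutagen. The sketch below computes it from that identity. It assumes the predictor emits a continuous score before thresholding, which the source does not confirm. The function `roc_auc` and the five example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 0 (non-mutagen) or 1 (mutagen); ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: one mutagen (0.4) ranks below one non-mutagen (0.5).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

An area of 1.0 would mean every mutagen outranks every non-mutagen, and 0.5 corresponds to random ranking.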

CSGenoTox  Representative Compounds

Compounds from the CSGenoTox  External Validation Test Set

Follow the link below to a set of 30 representative compounds  of the 400 used in external validation testing of CSGenoTox.  Each structure is given along with a comparison of experimental and predicted MI values.

Go to: CSGenoTox  Compounds


Copyright © 2003 ChemSilico LLC All Rights Reserved


ChemSilico is a registered trademark of ChemSilico LLC, Tewksbury, MA 01876