Leading Edge Predictors for Drug Discovery





     ...Dataset Profile

CSGenoTox  Data Sources

Data Sources Used in the Development of CSGenoTox

All calculated MI values were generated using CSGenoTox and were based on the 2D drug structure. Mutagenicity data came from the sources listed below (1-5). In total, 3262 compounds with available data were selected for use in the development and validation of the CSGenoTox Predictor.

(1)  Handbook of Carcinogenic Potency and Genotoxicity Databases, L.S. Gold and E. Zeiger (CRC Press, 1996)

(2)  RTECS (US Government)

(3)  Environmental and Molecular Mutagenesis (1985-2002), Publ. Alan R. Liss Inc.

(4)  Physician Desk Reference (PDR), Publ. Thomson

(5)  Mutation Research (1964-2002), Publ. Elsevier Science.

CSGenoTox  Compound Profile

General Compound Profile of the CSGenoTox Training and Validation Sets

Since the types of substituents and major structural elements (e.g., an N-heterocyclic ring) in a compound have a direct stereochemical bearing on electrophilic and non-electrophilic interactions, the breakdown below summarizes such groups within the 3363 CSGenoTox dataset.  Of these compounds, 2963 were used for training and testing, and 400 randomly selected compounds were set aside for external validation.  Of 3262 entities, 329 were pharmaceutical drugs.  Also given are the average FW (formula weight in g/mol) and the percentage of drug-like compounds, with the latter based on the permeability matrix from Lipinski's "rule of five".

(see CSpKa Dataset Profile for an explanation of the Rule of Five)
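As a rough illustration of how a drug-likeness percentage of this kind could be computed, the sketch below applies the commonly cited Lipinski thresholds (formula weight over 500, logP over 5, more than 5 H-bond donors, more than 10 H-bond acceptors) to precomputed descriptors. The `Descriptors` class, the function names, and the "at most one violation" cutoff are assumptions made for this example; the page does not state how CSGenoTox applied the rule, and the descriptor values used are approximate.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed descriptors (not a CSGenoTox type)."""
    formula_weight: float  # g/mol
    log_p: float           # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski thresholds a compound exceeds."""
    return sum([
        d.formula_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: drug-like if at most `max_violations` thresholds are exceeded."""
    return rule_of_five_violations(d) <= max_violations


def percent_drug_like(compounds: list[Descriptors]) -> float:
    """Percentage of compounds in a non-empty list that pass the check above."""
    passing = sum(is_drug_like(c) for c in compounds)
    return 100.0 * passing / len(compounds)


# Illustrative, approximate descriptor values (not taken from the CSGenoTox dataset).
small_molecule = Descriptors(formula_weight=180.2, log_p=1.2,
                             h_bond_donors=1, h_bond_acceptors=4)
large_molecule = Descriptors(formula_weight=800.0, log_p=6.5,
                             h_bond_donors=6, h_bond_acceptors=12)
```

Calling `percent_drug_like([small_molecule, large_molecule])` would report the share of the two example compounds that pass; a stricter screen can be obtained by passing `max_violations=0` to `is_drug_like`.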

Analysis of various ring types in compounds gave the following:

   67% were aromatic

   10% were N-heterocyclics

   15% were N-heteroaromatics

    8% were non-heterocyclic

The primary substituent groups were amines and halogens, followed by nitro and amide groups.  Nineteen percent of all compounds were non-ring entities (e.g., straight or branched hydrocarbons plus heteroatoms).  The average numbers of H-bond donors and acceptors per compound were 1.1 and 3.8, respectively.
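The random hold-out described in this profile (2963 compounds for train/test and 400 set aside for external validation) can be sketched as follows. This is only a minimal illustration: the function name, the use of integer compound IDs, and the fixed seed are assumptions, and the page does not describe how the random selection was actually performed.

```python
import random


def split_holdout(compound_ids, n_holdout, seed=0):
    """Shuffle the IDs and set aside `n_holdout` of them for external validation.

    Returns (train_test_ids, external_ids). The seed is fixed only so that
    the example is reproducible; it is an assumption, not a documented setting.
    """
    rng = random.Random(seed)
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Placeholder IDs standing in for the compounds in the dataset.
all_ids = list(range(2963 + 400))
train_test_ids, external_ids = split_holdout(all_ids, n_holdout=400)
```

Keeping the external set out of all model fitting and tuning is what lets the 400 compounds serve as an independent check on predicted versus experimental values.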

CSGenoTox  Representative Compounds

Compounds from the CSGenoTox  External Validation Test Set

Follow the link below to view a set of 30 representative compounds from the 400 used in external validation testing of CSGenoTox.  Each structure is shown with a comparison of its experimental and predicted MI values.

Go to: CSGenoTox  Compounds



To contact us:

Phone: 978-501-0633

Fax: 781-275-5197

Email:  sales@chemsilico.com

Copyright © 2003 ChemSilico LLC All Rights Reserved

Terms and Conditions of Use | Privacy Policy

ChemSilico is a registered trademark of ChemSilico LLC, Tewksbury, MA 01876