Leading Edge Predictors for Drug Discovery


CSHIA Calculation and Prediction

The following information about CSHIA Calculation and Prediction is available on this page.  Please select the appropriate topic in the list below to navigate to the subject you are interested in.


    CSHIA Calculation and Prediction

      Development of the CSHIA Predictor

      Cross Validation of Human Intestinal Absorption Activity Data

      External Validation of CSHIA

      CSHIA External Validation Representative Compounds

Development of the CSHIA Predictor

Sources for Experimental Oral Absorption Values:


The values for HIA (% Oral Absorption) used in the generation of the CSHIA predictor were taken from the following sources.  There was sufficient agreement of values reported for the same compound to justify combining the data from these sources.


    (1) Zhao Y. H. et al., J. Pharm. Sci. 90, 749 (2001)

    (2) Zmuidinavicius D. et al., J. Pharm. Sci. 92, 621 (2003)

    (3) Klopman G. et al., Eur. J. Pharm. Sci. 17, 253 (2002)

    (4) PDR (Physicians' Desk Reference), Publ. Thomson (2003)

    (5) Dollery C., Ed., Therapeutic Drugs, 2nd Ed., Publ. Churchill Livingstone (1999)

CSHIA Predictor Development:


The CSHIA predictor is based on topological structure descriptors, predicted LogP, and the surface area of hydrogen bond donating and accepting atoms and their associated hydrogens.  These descriptors also encode information about molecular size, polarity, aromaticity and branching.  The model was developed with artificial neural networks, with the exception of a sub-model for compounds containing protonated amines; because that dataset was small, the protonated amine sub-model was developed by multi-linear regression.  Neural network analysis was applied to select descriptors and then to optimize the relationship between experimental HIA (%OA) values and those calculated by the CSHIA predictor.  The resulting predictor was cross-validated by the leave-group-out (10%) method, and external validation was then performed on a separate external validation test set.

(Please see "Neural Network Analysis" on our Methods page for additional information)
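The protonated amine sub-model is described only as a multi-linear regression. As a rough illustration of that technique, the sketch below fits ordinary least squares through the normal equations in plain Python. The function names and the toy descriptor values are hypothetical; the actual CSHIA descriptors and coefficients are not given on this page.

```python
# Sketch of multi-linear regression (ordinary least squares) via the normal
# equations. Hypothetical helper names; descriptors here are toy numbers.

def fit_mlr(X, y):
    """Return [b0, b1, ..., bp] minimizing sum((y - b0 - sum(bj * xj))^2)."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    m = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("descriptor matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(p):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][p] for i in range(p)]


def predict_mlr(coef, x):
    """Apply fitted coefficients (intercept first) to one descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Toy data generated from y = 10 + 2*x1 - 3*x2, so the fit should recover it.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [10 + 2 * a - 3 * b for a, b in X]
coef = fit_mlr(X, y)
```

For real descriptor sets a numerical library would normally be used instead of hand-rolled elimination; the plain-Python version is shown only to make the regression step explicit.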

Q²(10%)

gives the square of the correlation coefficient between experimental values and predicted values from a 10-fold, leave-10%-out cross validation.  The compounds that generate Q²(10%) contribute to descriptor selection, but their predicted values are calculated while those compounds are excluded from the training set used for predictor development.
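A 10-fold, leave-10%-out cross validation can be sketched as follows: the compounds are shuffled into 10 disjoint folds, each fold is predicted by a model trained on the other 90%, and every compound therefore receives exactly one out-of-fold prediction. Q² is then computed here as the squared Pearson correlation of those predictions with experiment, following this page's definitions; some groups instead define cross-validated Q² as 1 - PRESS/SS, which can give different values. The one-descriptor linear model is a hypothetical stand-in for the neural network.

```python
import random


def ten_fold_predictions(X, y, fit, predict, k=10, seed=0):
    """Out-of-fold predictions: each compound is predicted exactly once,
    by a model trained without it (k disjoint folds of about 1/k each)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [None] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = predict(model, X[i])
    return preds


def squared_correlation(obs, pred):
    """Square of the Pearson correlation coefficient."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy * sxy / (sxx * syy)


# Hypothetical stand-in model: simple linear regression on one descriptor.
def fit_line(X, y):
    xs = [x[0] for x in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    b = sum((a - mx) * (t - my) for a, t in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    return (my - b * mx, b)


def predict_line(model, x):
    return model[0] + model[1] * x[0]


X = [[i] for i in range(30)]
y = [3 + 2 * i for i in range(30)]
preds = ten_fold_predictions(X, y, fit_line, predict_line)
q2 = squared_correlation(y, preds)
```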

Q²(ExVal)

gives the square of the correlation coefficient between the predicted and experimental values for the external validation set.  The compounds used to generate the Q²(ExVal) statistic were not used for either descriptor selection or predictor development.

Additional statistics are given as a measure of predictor performance:

MAE gives the mean absolute error.

s gives the standard deviation for regression.

RMSE gives the root mean square error, used in place of s for validation, where degrees of freedom are undefined.

(Please see "Data Handling and Statistics" on our Methods page for additional information)
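The error statistics above can be computed from paired experimental and predicted %OA values as in the sketch below. It assumes the usual regression convention s = sqrt(SSE / (n - p - 1)) for p fitted descriptors plus an intercept; that is why s is returned only when the parameter count is supplied, while RMSE (dividing by n) is always available. The function name is hypothetical.

```python
import math


def error_stats(obs, pred, n_params=None):
    """MAE and RMSE for any prediction set; s only when the number of fitted
    descriptors is known, because s needs degrees of freedom."""
    n = len(obs)
    resid = [o - p for o, p in zip(obs, pred)]
    sse = sum(r * r for r in resid)
    stats = {
        "MAE": sum(abs(r) for r in resid) / n,
        "RMSE": math.sqrt(sse / n),
    }
    if n_params is not None:
        dof = n - n_params - 1  # assumes n_params coefficients plus one intercept
        stats["s"] = math.sqrt(sse / dof)
    return stats


# Toy %OA values, not CSHIA data.
stats = error_stats([90, 50, 10, 70], [80, 55, 20, 70], n_params=1)
```

With these toy values the residuals are 10, -5, -10 and 0, giving MAE = 6.25 and RMSE = 7.5.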

The CSHIA Training and External Validation Sets


The CSHIA training set consisted of 417 drugs selected from the total dataset of 612 in a randomized balancing process.  The division of drugs between the training and external validation sets was managed so that both the structure and activity spaces were as evenly populated as possible, given the available data.  Compounds were grouped by structural similarity and HIA activity, and random selection was then used to assign compounds from each group to the training and external validation sets.  Figure 1 gives the percentage of compounds in each range of %OA (percent oral absorption) for both training and validation compounds.  Although compounds were selected at random, the number of compounds with HIA values below 60% was adjusted to give them a higher percentage in the training set.
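The grouping-then-random-assignment procedure resembles a stratified split. The sketch below assumes each compound already carries a group label (for example a structural cluster id paired with an HIA range; how ChemSilico formed its groups is not stated here) and draws the training share at random within each group. The optional per-group override is a hypothetical way to mimic the extra weighting of compounds below 60% HIA.

```python
import random
from collections import defaultdict


def stratified_split(groups, train_fraction=417 / 612, overrides=None, seed=0):
    """groups: one hashable, sortable label per compound.
    Returns (train_indices, test_indices), drawn at random within each group."""
    overrides = overrides or {}
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for g in sorted(by_group):  # fixed group order keeps the split reproducible
        members = list(by_group[g])
        rng.shuffle(members)
        frac = overrides.get(g, train_fraction)  # per-group training share
        n_train = round(len(members) * frac)     # note: Python rounds halves to even
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


# Toy labels: (structural cluster, HIA range)
groups = [("c1", "high")] * 10 + [("c2", "low")] * 6
train, test = stratified_split(groups, train_fraction=0.5)
```

Because rounding happens per group, the overall training-set size can differ slightly from the target fraction; a production split would reconcile that remainder.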

Cross Validation of HIA Activity Data

Calculated Results from CSHIA Predictor Development


ChemSilico predictors are developed by training a network to optimize the fit of points in a 10-fold, leave-10%-out cross validation.  Correlating the CSHIA leave-10%-out cross validation values with the known (experimental) HIA values gave the following statistics:

Q²(10%) = 0.88

MAE = 8.5% (mean absolute error)

RMSE = 11.6% (root mean square error)

96% of predictions are within 25% of the experimental value.

The results are shown in the plot below.

Cross Validation and Number of Compounds


An excellent fit was found for the cross-validated training set.  As seen in the table on the right, removing 3% of the compounds (going from 417 to 405 compounds) changed the MAE and RMSE values very little, which indicates a good model and consistent compound end-points.


As seen in the plot above, 96% of the cross-validation HIA predictions (10-fold, with each 10% of the compounds left out once) are within 25% of the experimental value.  This holds true over the entire %OA range from 0 to 100%.  These results were obtained with only 26 descriptors, covering quaternary amines, low-MW compounds, and the full gamut of absorbers from poor to good.
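The "within 25%" figure is a hit rate over all predictions. The sketch below assumes it means an absolute difference of at most 25 percentage points of %OA, not 25% of the experimental value; the page does not say which, and the function name is hypothetical.

```python
def fraction_within(obs, pred, tol=25.0):
    """Fraction of predictions within +/- tol percentage points of experiment
    (absolute tolerance on %OA; a relative reading would differ)."""
    hits = sum(1 for o, p in zip(obs, pred) if abs(o - p) <= tol)
    return hits / len(obs)


# Toy values: differences are 20, 30, 20 and 26, so 2 of 4 are within 25.
share = fraction_within([100, 50, 10, 0], [80, 80, 30, 26])
```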

CSHIA Training Set 3-Bin Profile


Due to the general nature of oral absorption data, the results have been placed into bins for graphic presentation and additional statistics.  The cross validation HIA values from the CSHIA training set are placed into bins for three absorption levels:

Low: 0% - 20%

Medium: 21% - 69%

High: 70% - 100%

The following statistics were obtained:


92% (285/354) of all compounds are in the correct bin.

MAE of compounds in the correct bin = 7.6%   
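The 3-bin statistics can be reproduced from paired values by assigning both the experimental and the predicted %OA to a bin and comparing. In the sketch below the cut-offs at 20% and 70% are our reading of the bin table, and non-integer predictions such as 20.5% are assigned by those thresholds (an assumption). The function also returns a count for each (experimental bin, predicted bin) pair, which is the kind of tally shown in the plot's table.

```python
from collections import Counter


def hia_bin(value):
    """Low / Medium / High absorption bin for a %OA value (assumed cut-offs)."""
    if value <= 20:
        return "Low"
    if value < 70:
        return "Medium"
    return "High"


def bin_profile(obs, pred):
    """Return (fraction in correct bin, MAE of correctly binned compounds,
    Counter of (experimental bin, predicted bin) pairs)."""
    pairs = list(zip(obs, pred))
    correct = [(o, p) for o, p in pairs if hia_bin(o) == hia_bin(p)]
    frac_correct = len(correct) / len(pairs)
    mae_correct = (sum(abs(o - p) for o, p in correct) / len(correct)
                   if correct else float("nan"))
    table = Counter((hia_bin(o), hia_bin(p)) for o, p in pairs)
    return frac_correct, mae_correct, table


# Toy values, not CSHIA data.
frac, mae, table = bin_profile([10, 50, 90, 95], [15, 75, 85, 60])
```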


A plot of the 3-bin profile for cross validation is given below.  The upper right-hand table gives the percentage and number of compounds in each of the predicted and experimental bins.  Compounds in the blue bins were predicted to be in the same bin as indicated by their experimental HIA value.

External Validation of the CSHIA Predictor

External Validation on 195 Compounds

195 compounds reflecting the full range of oral absorption activity and structural diversity were set aside as an external validation test set.  The external validation dataset constitutes approximately 32% of the total HIA database.  Fig. 2 gives the frequency distribution of %OA for the 195 validation drugs.  42% of the external validation compounds had an HIA value less than or equal to 80%, and 58% had a value greater than 80%.  11% of the total (21 compounds) had a value less than or equal to 40%.  Given the skew toward good absorbers in the total dataset, the external validation set had a reasonably balanced distribution.  Neither the experimental HIA values nor the molecular structures of the external validation compounds contributed to descriptor selection or neural network modeling in the development of CSHIA.
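The breakdown above is a set of simple percentages; for example, 21 of 195 compounds is 10.8%, reported as 11%. The sketch below computes threshold shares and a frequency histogram of %OA. The bin width of 20 is an assumption, since the ranges used in Fig. 2 are not stated in the text, and the helper names and values are hypothetical.

```python
def share(values, predicate):
    """Percentage of compounds whose %OA satisfies predicate."""
    return 100.0 * sum(1 for v in values if predicate(v)) / len(values)


def histogram(values, width=20):
    """Percentage of compounds in each %OA range [0, w), [w, 2w), ..., with
    100% placed in the top range. width=20 is an assumed bin size."""
    n_bins = 100 // width
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // width), n_bins - 1)] += 1
    return [100.0 * c / len(values) for c in counts]


values = [0, 15, 45, 80, 85, 100]  # toy %OA values
at_or_below_80 = share(values, lambda v: v <= 80)
hist = histogram(values)
```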

A correlation of the CSHIA external validation values with the known HIA values gave the following statistics:

Q²(ExVal) = 0.71

MAE = 11.2% (mean absolute error)

RMSE = 15.9% (root mean square error)

92.0% of predictions are within 25% of the experimental value.

The results are shown in the plot below.

External Validation and Number of Compounds


The table on the right presents the R²(ExVal), MAE, and RMSE (root mean square error) values for the external validation set.  For a dataset of this size, the predicted results were in excellent agreement throughout the complete range of experimental values.

CSHIA External Validation Set 3-Bin Profile


The predicted HIA values for the external validation test set were placed into 3 bins with the same ranges as those used for the cross validation set described in the previous section.

The resulting statistics were:

91% (177/195) of all compounds were assigned to the correct bin.

MAE of compounds in the correct bin = 9.5%  


These results show the predictive capability of CSHIA across a MW range of 114-924.


A plot of the 3-bin profile for external validation is given below.  The upper right-hand table gives the percentage and number of compounds in each of the predicted (external validation) and experimental bins.  Compounds in the blue bins were predicted to be in the same bin as indicated by their experimental HIA value.

CSHIA Representative Compounds

Examples from the HIA External Validation Dataset:


Follow the link below to a set of 30 representative compounds from the 195 used in external validation testing of CSHIA.  Each structure is given along with its known experimental value and its predicted % oral absorption value from external validation.

Go to: CSHIA Compounds



To contact us:

Phone: (888) 636-8777

Fax: 781-275-5197

Email:  sales@chemsilico.com

Copyright © 2003 ChemSilico LLC All Rights Reserved


ChemSilico is a registered trademark of ChemSilico LLC, Tewksbury, MA 01876