Leading Edge Predictors for Drug Discovery

CSPB

     ...Calculation and Prediction

The following information about CSPB Calculation and Prediction is available on this page.


    CSPB Calculation and Prediction

      Development of the CSPB Predictor

      Calculation of Protein Binding Activity Data

      Prediction of Protein Binding Activity Data

      External Validation of CSPB

Development of the CSPB  Predictor

Sources for Experimental Protein Binding Values:


The values for binding activity (% fraction bound) used in the generation of the CSPB predictor were taken from the following sources.  There was sufficient agreement of values reported for the same compound to justify combining the data from these sources.


    The Pharmacological Basis of Therapeutics, 9th Edition

    AHFS Drug Information

    Nurse's Drug Guide

    Allgemeine Pharmakologie und Toxikologie

Treatment of Experimental Data:


The % fraction bound (%FB) of a drug is not a single-value endpoint but a composite endpoint (i.e., drug binding involves multiple unknown binding features and sites).  For this reason, treatment of the data involves three steps:

1. Fit the 345 endpoints by multiple linear regression (MLR) and report the resulting R2.
2. Cross-validate using 3-bin and 5-bin models to determine how accurately compounds are placed in a specific range of %FB (the binding level).
3. Test the accuracy of the MLR on an external validation set (40 compounds), using placement in 5 bins as the predicted binding values.

CSPB Predictor Development:

The CSPB predictor is based on topological structure descriptors and was developed using artificial neural networks and multiple linear regression.  Neural network analysis was applied to select descriptors, and the relationship between experimental PB values and those calculated by the CSPB predictor was then optimized by an all-possible-subsets linear regression algorithm.  The resulting predictor was cross-validated by the leave-group-out method, and external validation was then performed on a separate external validation test set.

(Please see "Neural Network Analysis" on our Methods page for additional information)
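The all-possible-subsets regression step can be illustrated with a minimal sketch. This is not the CSPB implementation: the function names, the fixed subset size, and the use of the residual sum of squares as the selection criterion are all assumptions made for illustration, and the neural network descriptor selection is not reproduced here.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (no collinear descriptors).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_subset(rows, y, subset):
    """Least-squares fit of y on the chosen descriptor columns plus an intercept."""
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    k = len(subset) + 1
    # Normal equations: (X^T X) coeffs = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coeffs, d)) for d in design]
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    return coeffs, ss_res


def best_subset(rows, y, subset_size):
    """Try every descriptor subset of the given size; keep the lowest residual."""
    best = None
    for subset in combinations(range(len(rows[0])), subset_size):
        coeffs, ss_res = fit_subset(rows, y, subset)
        if best is None or ss_res < best[2]:
            best = (subset, coeffs, ss_res)
    return best
```

Here `rows` holds one list of descriptor values per compound and `y` holds the experimental values. The exhaustive search grows combinatorially with the number of candidate descriptors, which is one reason a prior descriptor selection step is useful.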

In each phase of development, a correlation coefficient was calculated as a measure of the quality of the predictor.

R2 gives the correlation between calculated and experimental values for the compounds in the training set.  Every compound in the training set contributed to descriptor selection and predictor development.

Q2 gives the correlation coefficient between experimental values and predicted values from a 10-fold leave-10%-out cross-validation.  The compounds that generate Q2 contribute to descriptor selection, but the predicted values arise from calculations made when the compounds were not part of the training set for predictor development.
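The leave-10%-out procedure can be sketched as follows. This is an illustrative example only: it assumes a single-descriptor straight-line model in place of the actual CSPB regression, a random fold assignment, and Q2 computed as 1 - PRESS / SS_tot. None of these details are stated on this page, and the function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_predictions(xs, ys, n_folds=10, seed=0):
    """Predict each compound from a model fitted without its own fold."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    predictions = [None] * len(xs)
    for fold in range(n_folds):
        held_out = indices[fold::n_folds]  # roughly 10% of the compounds
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            predictions[i] = slope * xs[i] + intercept
    return predictions


def q_squared(ys, predictions):
    """Cross-validated analogue of R2 (assumed form: 1 - PRESS / SS_tot)."""
    mean_y = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

Every compound is held out exactly once, so each predicted value comes from a model that never saw that compound's experimental value during fitting.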

Additional statistics are given as a measure of predictor performance:
MAE gives the mean absolute error.
s gives the standard deviation for regression.
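As a rough illustration of these statistics, the sketch below computes R2, MAE, and s from paired experimental and calculated values. The exact formulas used for CSPB are not given on this page, so the coefficient-of-determination form of R2 and the degrees-of-freedom correction in s (n minus the number of descriptors minus one) are assumptions, and the function names are hypothetical.

```python
import math


def r_squared(experimental, calculated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    mean_exp = sum(experimental) / len(experimental)
    ss_res = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(experimental, calculated):
    """Average of |experimental - calculated| over all compounds."""
    total = sum(abs(e - c) for e, c in zip(experimental, calculated))
    return total / len(experimental)


def regression_std_dev(experimental, calculated, n_descriptors):
    """Residual standard deviation with n - n_descriptors - 1 degrees of freedom."""
    ss_res = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    dof = len(experimental) - n_descriptors - 1
    return math.sqrt(ss_res / dof)


# Made-up %FB values for four hypothetical compounds.
experimental = [10.0, 50.0, 90.0, 30.0]
calculated = [20.0, 40.0, 80.0, 40.0]
print(mean_absolute_error(experimental, calculated))  # 10.0
```

The same functions apply to cross-validated predictions; only the source of the second list changes.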

(Please see "Data Handling and Statistics" on our Methods page for additional information)

Calculation of  Protein Binding  Activity Data

Calculated Results from CSPB Predictor Development


A correlation of the CSPB (calculated) values with the known PB (experimental) values gave the following statistics:

R2 = 0.84

MAE = 9.3% (mean absolute error)

s = 12.4%

The results are shown in the plot below.

Calculated PB Results on 345 Compounds

Due to the general nature of protein binding data, the results have been placed into bins for graphic presentation and additional statistics.  The calculated protein binding values from the CSPB training set are placed into bins for three binding levels: Low (PB<45%), Medium (46%-70%), or High (PB>70%).

The following statistics were obtained:


82.6% (285/345) of all compounds are in the correct bin.

MAE of compounds in the correct bin = 11.2%   
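The 3-bin statistic can be sketched as below. This is a minimal illustration, not the CSPB code: the page leaves the interval between 45% and 46% unassigned, so the cut points used here (values up to 45% are Low, values up to 70% are Medium) are an assumption, and the function names are hypothetical.

```python
def three_bin(pb):
    """Assign a %PB value to Low, Medium, or High (assumed cut points 45 and 70)."""
    if pb <= 45:
        return "Low"
    if pb <= 70:
        return "Medium"
    return "High"


def fraction_in_correct_bin(experimental, calculated):
    """Fraction of compounds whose calculated bin matches the experimental bin."""
    hits = sum(1 for e, c in zip(experimental, calculated)
               if three_bin(e) == three_bin(c))
    return hits / len(experimental)


def mae_in_correct_bin(experimental, calculated):
    """Mean absolute error restricted to compounds placed in the correct bin."""
    errors = [abs(e - c) for e, c in zip(experimental, calculated)
              if three_bin(e) == three_bin(c)]
    return sum(errors) / len(errors)


# Made-up values for four hypothetical compounds.
experimental = [10.0, 50.0, 90.0, 60.0]
calculated = [20.0, 72.0, 85.0, 65.0]
print(fraction_in_correct_bin(experimental, calculated))  # 0.75
```

In this toy data, the second compound has an experimental value in the Medium bin but a calculated value in the High bin, so three of the four compounds count as correct.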


A plot of the 3-bin profile for calculation is given below.  The upper left-hand table gives the percent and number of compounds in each of the calculated and experimental bins.  Compounds in the blue bins were calculated to be in the same bin as that indicated by the experimental PB value.

Predicted PB Results on 345 Compounds

Cross-Validation on 345 Compounds

A correlation of the CSPB (predicted) values with the known PB (experimental) values gave the following statistics:

Q2 = 0.71

MAE = 12.7% (mean absolute error)

s = 17.0%


Cross validation was performed on the 345 compounds with a 10-fold leave-10%-out method.  The predicted PB results from the Q2 validation process were placed into 5 bins for graphic presentation.  When the predicted protein binding values from the CSPB validation sets are placed into bins, the resultant 5 levels of binding activity are:

    Low            0% - 20%

    Medium-Low     21% - 40%

    Medium         41% - 60%

    Medium-High    61% - 80%

    High           81% - 100%

The resulting statistics are:

91.0% of all compounds are either in the correct bin, or the adjacent bin (1).

MAE of compounds in the correct or adjacent bin (1) = 11.5%
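The "correct or adjacent bin" statistic for the five binding levels can be sketched as follows. The handling of values that fall between the listed ranges (for example 20.5%) and of predictions outside 0%-100% is not described on this page, so the inclusive upper edges at 20, 40, 60, and 80 are an assumption, as are the function names.

```python
from bisect import bisect_left

# Assumed inclusive upper edges of the first four bins; the fifth bin is open above.
UPPER_EDGES = [20, 40, 60, 80]
LABELS = ["Low", "Medium-Low", "Medium", "Medium-High", "High"]


def five_bin_index(pb):
    """Return 0..4 for the five binding levels; out-of-range values fall in an end bin."""
    return bisect_left(UPPER_EDGES, pb)


def within_one_bin(experimental_pb, predicted_pb):
    """True when the predicted bin is the experimental bin or a neighbor of it."""
    return abs(five_bin_index(experimental_pb) - five_bin_index(predicted_pb)) <= 1


def fraction_within_one_bin(experimental, predicted):
    hits = sum(1 for e, p in zip(experimental, predicted) if within_one_bin(e, p))
    return hits / len(experimental)


def mae_within_one_bin(experimental, predicted):
    """Mean absolute error over compounds in the correct or adjacent bin only."""
    errors = [abs(e - p) for e, p in zip(experimental, predicted)
              if within_one_bin(e, p)]
    return sum(errors) / len(errors)


# Made-up values for four hypothetical compounds.
experimental = [10.0, 30.0, 95.0, 50.0]
predicted = [25.0, 35.0, 40.0, 70.0]
print(LABELS[five_bin_index(95.0)])                      # High
print(fraction_within_one_bin(experimental, predicted))  # 0.75
```

Note that the MAE computed this way excludes the compounds that miss by two or more bins, so it is expected to be lower than the MAE over all compounds.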


A plot of the 5-bin profile for prediction is given below.  The upper left-hand table gives the percent and number of compounds in each of the calculated and experimental bins.  Compounds in the blue bins were predicted to be in the same bin as that indicated by the experimental PB value.  Compounds in the purple bins were predicted to be in a bin adjacent to the one indicated by the experimental value.

External Validation on 40 Compounds

External Validation Test Set for CSPB


Forty compounds that reflect the full range of binding activities were set aside as an external validation test set.  The known experimental protein binding values for these compounds did not contribute to either descriptor selection or neural network modeling in the development of CSPB.


The predicted PB values for the external validation test set were placed into 5 bins with the same ranges as those used for the cross-validation predictions in the previous section.

The resulting statistics were:

87.5% (35/40) of all compounds are either in the correct bin, or the adjacent bin (1).

MAE of compounds in the correct or adjacent bin (1) = 8.0%  


These results show the predictive capability of CSPB over a molecular weight (MW) range of 114-924.


A plot of the 5-bin profile for external validation is given below.  The upper left-hand table gives the percent and number of compounds in each of the calculated and experimental bins.  Compounds in the blue bins were calculated to be in the same bin as that indicated by the experimental PB value.

To contact us:

Phone: 978-501-0633

Fax: 781-275-5197

Email:  sales@chemsilico.com

Copyright © 2003 ChemSilico LLC All Rights Reserved

ChemSilico is a registered trademark of ChemSilico LLC, Tewksbury, MA 01876