Leading Edge Predictors for Drug Discovery

     ...Calculation and Prediction

The following information about CSBBB Calculation and Prediction is available on this page.

Please select the appropriate topic in the list below to navigate to the subject you are interested in.

    CSBBB Calculation and Prediction

      CSBBB Data and Development

      Cross Validation of LogBB Activity Data

      Training Set Representative Compounds

      External Validation of CSBBB

      CSBBB External Validation Representative Compounds

CSBBB  Data and Development

Experimental Data Used in the Modeling Process

Blood-brain barrier partition data in the training set came from 11 sources (1-11). In every case the measurement protocol was IV drug administration to white rats.  103 neutral compounds were selected for the study, of which 95% were drug-like according to Lipinski's rule of five.  At the time this predictor was developed, these 103 compounds constituted all of the available data measured by protocols similar enough to justify their use in a composite training set.  This presented considerable difficulty, as no data was available for external validation of the predictor.

(See the CSpKa Dataset Profile for an explanation of the "Rule of Five".)
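As a rough illustration of the drug-likeness screen mentioned above, the sketch below counts rule-of-five violations from precomputed properties. The four thresholds are Lipinski's published ones. The `Compound` record, its field names, and the choice to call a compound drug-like when it has at most one violation are assumptions made for illustration; this page does not state how the training set was actually screened.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    # Hypothetical record: property values are assumed to be computed elsewhere.
    name: str
    mol_weight: float   # molecular weight, daltons
    logp: float         # octanol/water partition coefficient, log units
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four limits the compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def is_drug_like(c: Compound, max_violations: int = 1) -> bool:
    # Assumption: at most one violation still counts as drug-like.
    return rule_of_five_violations(c) <= max_violations


def drug_like_fraction(compounds) -> float:
    """Fraction of a compound list that passes the screen."""
    return sum(is_drug_like(c) for c in compounds) / len(compounds)
```

Applied to a full training set, `drug_like_fraction` would yield the kind of percentage quoted above (95%).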

The opportunity for external validation arose from the discovery of additional published data on drug distribution to the central nervous system.  Because this data differs somewhat in nature from the rat data used in the training set, the external validation and training sets were not set up with the traditional random division of compounds.  The external validation set consisted of 74 commercial drugs with a mixture of plasma/CSF and plasma/brain data and was taken from sources 12-17, listed below.  The breakdown of the data by species and tissue type is also given in the table below.  It is generally considered that once a drug enters the CSF, it will distribute throughout the brain in a reasonably even manner.  For this reason, it was determined that plasma/CSF partition values could be used as external validation data for the CSBBB predictor.  The decision was made, however, to exclude such data from the training set.  The majority of the data in the external validation set is derived from human subjects, and it demonstrates that a model developed from data measured in the rat can successfully predict outcomes in human patients.

Training Set References

(1) R. C. Young (1988) J. Med. Chem. 31, 656;  (2) M. H. Abraham (1994) J. Pharm. Sci. 83, 1257;  (3) T. Salminen (1997) J. Pharm. Biomed. Anal. 15, 469;  (4) D. E. Clark (1999) J. Pharm. Sci. 83, 815;  (5) J. M. Luco (1999) J. Chem. Inf. Comput. Sci. 39, 396;  (6) M. Yazdanian (1998) J. Pharm. Sci. 87, 306;  (7) N. H. Greig (1995) in New Concepts of a Blood-Brain Barrier, Plenum: New York, 251;  (8) J. H. Lin (1994) J. Pharmacol. Exptl. Therapeut. 271, 1217;  (9) F. Lombardo (1996) J. Med. Chem. 39, 4750;  (10) K. Van Belle (1995) J. Pharmacol. Exptl. Therapeut. 272, 1217;  (11) J. A. D. Calder (1994) Drug Design Discovery 11, 259.

External Validation Set References

(12) Dollery C., Ed. (1999), Therapeutic Drugs, 2nd Ed., Publ. Churchill Livingstone;  (13) Bouw et al. (2001), British Journal of Pharmacology, Vol. 134, 1796-1804;  (14) Proksch et al. (2000), Drug Metabolism and Disposition, Vol. 28, Issue 7, 742-747;  (15) Rivière et al. (2000), Pharmacology, Vol. 292, Issue 3, 1042-1047;  (16) Tai et al. (2001), PNAS, March 13, 2001, Vol. 98, No. 6, 3519-3524;  (17) Marston et al. (1998), JPET, Vol. 285, Issue 3, 1023-1030.

Development of the CSBBB Predictor

The CSBBB predictor is based on topological structure descriptors and was developed by the use of artificial neural networks.  Neural network analysis was applied to select descriptors and then optimize the relationship between experimental logBB values and those calculated by the CSBBB predictor.  The resulting predictor was cross-validated by the 10-fold leave-10%-out method.

(Please see "Neural Network Analysis" on our Methods page for additional information)
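The 10-fold leave-10%-out procedure named above can be sketched as follows. This is a minimal illustration of the fold logic only: a one-descriptor least-squares line stands in for the neural network so that the example is self-contained, and the function names, the single descriptor, and the random seed are all assumptions rather than part of CSBBB.

```python
import random


def ten_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def cross_val_predict(xs, ys, k=10, seed=0):
    """Predict each compound from a model fitted without its own fold."""
    preds = [None] * len(xs)
    for fold in ten_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a * xs[i] + b
    return preds
```

Each compound is predicted exactly once, by a model fitted on the other roughly 90% of the data. With 103 compounds the folds hold 10 or 11 compounds each.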

 In each phase of development, a correlation coefficient was calculated as a measure of the quality of the predictor.


Q2 gives the square of the correlation coefficient between experimental values and predicted values from a 10-fold leave-10%-out cross validation.  The compounds that generate Q2 contribute to descriptor selection, but the predicted values arise from calculations made when the compounds were not part of the training set for predictor development.


Q2ExVal gives the square of the correlation coefficient between the external validation predictions and the experimental values.  The compounds used to generate the Q2ExVal statistic were not used for either descriptor selection or predictor development.

Several additional statistics are given as a measure of predictor performance:
MAE gives the mean absolute error.
s gives the standard deviation of regression.

RMSE gives the root mean square error, used in place of s for validation, where degrees of freedom are undefined.

(Please see "Data Handling and Statistics" on our Methods page for additional information)
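The statistics described above can be computed along these lines. This is a sketch under stated assumptions: the squared correlation is taken as the squared Pearson correlation between experimental and predicted values, and `n_params`, the number of fitted parameters used for the degrees of freedom in s, is a hypothetical input that this page does not specify.

```python
import math


def squared_correlation(exp, pred):
    """Squared Pearson correlation between experimental and predicted values."""
    n = len(exp)
    me, mp = sum(exp) / n, sum(pred) / n
    sxy = sum((e - me) * (p - mp) for e, p in zip(exp, pred))
    sxx = sum((e - me) ** 2 for e in exp)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy * sxy / (sxx * syy)


def mae(exp, pred):
    """Mean absolute error."""
    return sum(abs(e - p) for e, p in zip(exp, pred)) / len(exp)


def rmse(exp, pred):
    """Root mean square error; no degrees-of-freedom correction."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(exp, pred)) / len(exp))


def regression_std(exp, pred, n_params):
    """Standard deviation of regression, s, with n - n_params degrees of freedom."""
    sse = sum((e - p) ** 2 for e, p in zip(exp, pred))
    return math.sqrt(sse / (len(exp) - n_params))
```

The only difference between `rmse` and `regression_std` is the divisor, which is why RMSE is the natural substitute when the degrees of freedom are undefined.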

Training and External Validation Set LogBB Distribution

Because a random selection process was not used to distribute compounds between the training and external validation sets, the activity distributions are not as even as they might be.  There are, however, enough compounds in each activity bin to evaluate predictor performance across the entire activity range.  The distributions for the training and validation sets are given in Fig. 2, where the percentages of compounds in three activity ranges are represented.  The external validation set of 74 drugs constitutes approximately forty percent of the total available data.  LogBB values ranged from -2 to 1.5 log units, a roughly 3000-fold range in the ratio of compound concentrations on either side of the blood-brain barrier among these 177 compounds.
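The two figures quoted above follow from simple arithmetic, sketched here. LogBB is the base-10 logarithm of the brain/blood concentration ratio, so a span of log units converts to a fold difference by exponentiation. The function and variable names are illustrative only.

```python
def fold_difference(logbb_low, logbb_high):
    """Fold range in the brain/blood concentration ratio spanned by two logBB values."""
    return 10 ** (logbb_high - logbb_low)


# Span of the combined data set: -2 to 1.5 log units, i.e. 10**3.5.
fold = fold_difference(-2.0, 1.5)

# Share of the 177 compounds (103 training + 74 validation) held out for validation.
validation_share = 74 / (103 + 74)
```

Here `fold` comes to about 3162, which the text rounds to 3000, and `validation_share` to about 0.42, which it rounds to forty percent.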

Cross Validation of CSBBB

Calculation of LogBB Training Set

The CSBBB predictor makes use of artificial neural network modeling, employing topological, E-state, and a number of proprietary descriptors to model 103 drugs and drug-like compounds.

A correlation of the CSBBB (predicted) values with the experimental logBB values gave the following statistics:

Q2 = 0.76

s = 0.39

MAE = 0.30

The results are shown in the plot below.

CSBBB External Validation Set 3-Bin Profile

Due to the general nature of LogBB data, the results have been placed into bins for graphic presentation and additional statistics.  The LogBB values from the CSBBB external validation set are placed into bins for three logBB activity levels:


-2.0 to -1.0 log units


-1.0 to 0.3 log units


0.3 to 1.5 log units

The following statistics were obtained:

83% (61/74) of all compounds are in the correct bin.

MAE of compounds in the correct bin = 0.32
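The binning and the two statistics above can be sketched as follows. The bin boundaries come from the list above. How a value that falls exactly on a boundary is assigned, and the treatment of predictions outside the -2.0 to 1.5 range (here they fall into the lowest or highest bin), are assumptions, since the page does not say.

```python
def logbb_bin(value):
    """Return 0, 1 or 2 for the low, middle and high logBB activity bins."""
    if value < -1.0:
        return 0
    if value < 0.3:
        return 1
    return 2


def bin_matrix(exp, pred):
    """3x3 counts: rows are experimental bins, columns are predicted bins."""
    matrix = [[0, 0, 0] for _ in range(3)]
    for e, p in zip(exp, pred):
        matrix[logbb_bin(e)][logbb_bin(p)] += 1
    return matrix


def fraction_in_correct_bin(exp, pred):
    """Share of compounds whose predicted bin matches the experimental bin."""
    m = bin_matrix(exp, pred)
    return sum(m[i][i] for i in range(3)) / len(exp)


def mae_in_correct_bin(exp, pred):
    """Mean absolute error over correctly binned compounds (None if there are none)."""
    errors = [abs(e - p) for e, p in zip(exp, pred)
              if logbb_bin(e) == logbb_bin(p)]
    return sum(errors) / len(errors) if errors else None
```

The diagonal of `bin_matrix` holds the compounds predicted to be in the bin indicated by their experimental value.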

A plot of the 3-bin profile for cross validation is given below.  The upper right-hand matrix gives the percentage and number of compounds in each of the predicted and experimental bins.  Compounds in the blue bins were predicted to be in the same bin as that indicated by the experimental LogBB value.

External Validation of CSBBB

External Validation Results

A correlation of the CSBBB external validation values with the known LogBB values gave the following statistics:

Q2ExVal = 0.62

MAE = 0.39 (mean absolute error)

RMSE = 0.49

88% of predictions are within 0.75 log units of the experimental value.
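The within-tolerance figure above is a simple count, sketched here. Treating a prediction that is off by exactly 0.75 log units as "within" is an assumption, as is the function name.

```python
def fraction_within(exp, pred, tolerance=0.75):
    """Share of predictions whose absolute error is at most `tolerance` log units."""
    hits = sum(1 for e, p in zip(exp, pred) if abs(e - p) <= tolerance)
    return hits / len(exp)
```

For the 74-compound external validation set, a result of 0.88 from this function would correspond to the 88% quoted above.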

The results are shown in the plot below.

CSBBB  Training Set Compounds

Follow the link below to a set of 23 representative compounds of the 103 used in the development of CSBBB.

Each structure is given along with a comparison of known experimental values and their calculated LogBB.

Go to: CSBBB  Compounds

CSBBB  External Validation Compounds

Follow the link below to a set of 30 representative compounds of the 74 used in the validation of CSBBB.

Each structure is given along with a comparison of known experimental values and their predicted logBB.

Go to: Validation Compounds


Copyright © 2003 ChemSilico LLC All Rights Reserved


ChemSilico is a registered trademark of ChemSilico LLC, Tewksbury, MA 01876