Datasets (which are, in contrast to other dataset lists, available in a structural format)
This list will be expanded continuously. Please don't hesitate to make published datasets publicly available here.
Currently available: 44 Datasets
Note: The Briem/Lessel and Hert/Willett datasets are only available as MDDR IDs for licensing reasons. Please contact MDL for further information on the database. The datasets have nonetheless been included here because they are standard datasets for similarity searching. - Andreas
Binary (active/inactive) datasets
| Name | Activity/Property | Reference | Dataset/Linkout | Trust Level | Comment |
| Fontaine Factor Xa Data Set | 290+145 (Training/Test Set) Factor Xa Inhibitors used for binary classification (<10nM and >1microM), but real-valued Ki values are also given in the file | Fabien Fontaine, Manuel Pastor, Ismael Zamora, and Ferran Sanz. Anchor-GRIND: Filling the Gap between Standard 3D QSAR and the GRid-INdependent Descriptors. J. Med. Chem. 2005 48 (7), 2687 - 2694 - linkout | Dataset | High - Original Author Data | Datasets in SD and SMILES Format with binary classification (<1nM/>10microM Ki) and real-valued Ki data |
| Jorissen/Gilson Data Set | 50 CDK2 Inhibitors; 50 COX2 Inhibitors; 50 FXa Inhibitors; 50 PDE5 Inhibitors; 50 A1A Antagonists; plus Decoy Structures | Robert N. Jorissen and Michael K. Gilson. Virtual Screening of Molecular Databases Using a Support Vector Machine. J. Chem. Inf. Model. 2005 45 (3), 549 -561 - linkout | Dataset | High - Original Author Data | Datasets in SD Format; please read readme.txt. Courtesy of Robert N. Jorissen and Michael K. Gilson. |
| Briem/Lessel Data Set | 49 5HT3 Ligands; 40 ACE Inhibitors; 111 HMG-CoA Inhibitors; 134 PAF Antagonists; 49 TXA2 Antagonists; 574 "Inactives" (other activities) | Hans Briem and Uta F Lessel, Perspect Drug Discov. Des. 2000 (20), 231-244. - linkout | Dataset | High - Original Author Data | Courtesy of Uta Lessel. MDDR IDs only, due to license agreement; the MDDR database is needed to retrieve the structures |
| Hert/Willett Data Set | 752 5HT3 Antagonists; 827 5HT1A Agonists; 359 5HT Reuptake Inhibitors; 395 D2 Antagonists; 1130 Renin Inhibitors; 943 Angiotensin AT1 Antagonists; 803 Thrombin Inhibitors; 1246 Substance P Inhibitors; 750 HIV Protease Inhibitors; 636 Cyclooxygenase Inhibitors; 452 Protein Kinase C Inhibitors. Total Size: 102,535 Compounds | Jerome Hert, Peter Willett, David J Wilton, Pierre Acklin, Kamal Azzaoui, Edgar Jacoby and Ansgar Schuffenhauer. Comparison of Fingerprint-Based Methods for Virtual Screening Using Multiple Bioactive Reference Structures, J Chem Inf Comp Sci 2004, 44, 1177 - 1185. - linkout | Dataset | High - Original Author Data | Courtesy of Jerome Hert and Peter Willett. MDDR IDs only, due to license agreement; the MDDR database is needed to retrieve the structures |
| Jacobsson Data Set | 146 ER alpha ligands; 54 AChE ligands; 60 MMP-3 ligands; 129 Factor Xa ligands | Micael Jacobsson, Per Lidén, Eva Stjernschantz, Henrik Boström and Ulf Norinder, J. Med. Chem., 2003, 46, 5781-5789. - linkout | http://www.compumine.com/research/scoring.html | High - Original Author Data | - |
| Stahl Data Set | 128 COX2 Inhibitors; 55 Estrogen Receptor Ligands; 43 Gelatinase A and General MMP Ligands; 17 Neuraminidase Inhibitors; 25 p38 MAP Kinase Inhibitors; 67 Thrombin Inhibitors | Stahl M and Rarey M. 2001. J. Med. Chem. 44, 1035-1042. - linkout | Dataset (SMILES) | High - Original Author Data | Courtesy of Martin Stahl - negative dataset available upon request |
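Several entries in the table above pair real-valued Ki data with a binary active/inactive split. The following is a minimal sketch of how such labels could be derived from Ki values, assuming the values are in nM. The function name, the sample Ki values, and the choice to leave the intermediate range unlabelled are illustrative assumptions; the cutoffs used in the example (10 nM and 1 microM) are the ones quoted in the Activity/Property column of the Fontaine entry and should be checked against the files before use.

```python
from typing import List, Optional


def label_from_ki(ki_nm: float,
                  active_below_nm: float,
                  inactive_above_nm: float) -> Optional[int]:
    """Return 1 (active), 0 (inactive), or None for the intermediate range."""
    if active_below_nm >= inactive_above_nm:
        raise ValueError("active cutoff must lie below the inactive cutoff")
    if ki_nm < active_below_nm:
        return 1
    if ki_nm > inactive_above_nm:
        return 0
    # Compounds between the two cutoffs are left unlabelled.
    return None


# Hypothetical Ki values in nM, not taken from any dataset listed here.
ki_values: List[float] = [0.5, 8.0, 250.0, 5000.0]
labels = [label_from_ki(k, active_below_nm=10.0, inactive_above_nm=1000.0)
          for k in ki_values]
print(labels)
```

Keeping the cutoffs as explicit arguments makes it easy to rerun the split with different thresholds.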
QSAR datasets
| Name | Activity/Property | Reference | Dataset/Linkout | Trust Level | Comment |
| Guha PDGFR Inhibitors dataset | 79 Compounds with activity data | R. Guha and P. Jurs. The Development of Linear, Ensemble and Non-linear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors. J. Chem. Inf. Comput. Sci. 2004, 44 (6), 2179-2189. - linkout | Dataset (Please right click/save if you can open TGZ files directly) | High - Original Author Data | Tarball contains descriptors in desc.pickle.plasma, activities in plasma.depv and a directory hin/ containing 79 structures in Hyperchem HIN format. Courtesy of R. Guha. |
| Guha Artemisinin QSAR dataset | 179 Compounds with activity data | R. Guha and P. Jurs. The Development of QSAR Models To Predict and Interpret the Biological Activity of Artemisinin Analogues. J. Chem. Inf. Comput. Sci. 2004 (44), 1440-1449. - linkout | Dataset | High - Original Author Data | MOL format with activity data in separate text file. Courtesy of R. Guha. |
| Patterson Neighbourhood Behaviour Data Sets | 20 QSAR datasets from David Patterson's Neighbourhood Behaviour Paper with nM activity data | David E Patterson, Richard D Cramer, Allan M Ferguson, Robert D Clark, Laurence W Weinberger. Neighbourhood Behaviour: A Useful Concept for Validation of "Molecular Diversity" Descriptors. J. Med. Chem. 1996 (39) 3049 - 3059. - linkout | Dataset | High - Original Author Data | SD format with activity data. Courtesy of David Patterson. |
| Sutherland Data Set | 405 Benzodiazepine Receptor Ligands / IC50; 467 Cox2 Inhibitors / IC50; 756 DHFR inhibitors (of P. carinii DHFR) with IC50; 616 nonredundant ER ligands from National Toxicology Program of the NIH with binding affinities relative to beta-estradiol; 393 ER ligands selected from literature (see reference for details) with binding affinities relative to beta-estradiol | Jeffrey J. Sutherland, Lee A. O'Brien, and Donald F. Weaver. Spline-Fitting with a Genetic Algorithm: A Method for Developing Classification Structure-Activity Relationships. J. Chem. Inf. Comput. Sci. 2003 (43) 1906 - 1915. - linkout | http://pubs3.acs.org/acs/journals/supporting_information.page?in_manuscript=ci034143r | High - Original Author Data | - |
| Steroid Data Set | 31 Steroids with Corticosteroid Binding Globulin (CBG) receptor affinity | E.A. Coats, Perspect. Drug Discov. Des., 1998, 3, 199-213. - linkout | www2.chemie.uni-erlangen.de/services/steroids/ | High - Original Author Data (Corrected Structures!) | - |
| Benzodiazepine Data Set | 245 Benzodiazepine receptor ligands with no common substructure | F.R.Burden, M.G.Ford, D.C.Whitley and D.A.Winkler, J.Chem.Inf.Comp.Sci., 2000, 40, 1423-1430. - linkout | http://www.disat.unimib.it/chm/Datasets.htm | High - Original Author Data | - |
| Muscarinic Data Set | 162 Muscarinic M1 receptor ligands with no common substructure | B.S.Orlek, F.E.Blaney, F.Brown, M.S.G.Clark, M.S.Hadley, J.Hatcher, G.J. Riley, H.E.Rosenberg, H.J.Wadsworth and P.Wyman, J.Med.Chem., 1991, 34, 2726-2735. - linkout | http://www.disat.unimib.it/chm/Datasets.htm | High - Original Author Data | - |
| Dopamine D2 Data Set | 26 Dopamine D2 receptor agonists | - | http://www.qsar.org/resource/datasets/martin2.htm | High - Original Author Data | - |
| Bohm Serine Protease Inhibitor Data Set | Inhibitors of Thrombin/ Trypsin/ Factor Xa with activity data, Training set (72 Compounds), Test Set (16 Compounds) | M. Bohm, J. Sturzebecher, G. Klebe, J. Med. Chem., 1999, 42, 458 - 477. - linkout | Dataset (MOL/SDF) | Low - Check Before Use | Drawn Manually - Please Check for Errors Before Use |
| Benzodiazepine Inverse Agonist Data Set | 37 beta-Carbolines, Pyridodiindoles and CGS compounds binding to Benzodiazepine Inverse Agonist Site | B. D. Silverman and Daniel. E. Platt, J. Med. Chem. 1996, 39, 2129-2140 - linkout | Dataset (SD File) | Low - Check Before Use | Drawn Manually - Please Check for Errors Before Use |
| 4 QSAR Data Sets | Inhibitors of ACE, GPB, THER, THR | J.J. Sutherland, L.A. O'Brien and D. F. Weaver, J. Med. Chem. 2004, 47, 5541 - 5554 - linkout | http://pubs3.acs.org/acs/journals/supporting_information.page?in_manuscript=jm0497141 | High - Original Author Data | Original Information Supplied by Authors |
QSPR datasets
| Name | Activity/Property | Reference | Dataset/Linkout | Trust Level | Comment |
| Karthikeyan Melting Point Dataset | Melting Points for 4173 Training Set Molecules and 277 Test Set Compounds (Drug-Like) | Karthikeyan, M.; Glen, R.C.; Bender, A. General melting point prediction based on a diverse compound dataset and artificial neural networks. J. Chem. Inf. Model.; 2005; 45(3); 581-590. - linkout | Dataset (7MB; Excel File with Structures in SMILES Format, Melting Points and MOE 2D and 3D Descriptors) | High - Original Author Data | |
| Bergstrom Melting Point Dataset | Melting Point Data for 185 Training Set and 92 Test Set Compounds (Drug-Like) | Bergstrom, C. A. S.; Norinder, U.; Luthman, K.; Artursson, P. Molecular Descriptors Influencing Melting Point and Their Role in Classification of Solid Drugs. J. Chem. Inf. Comput. Sci.; (Article); 2003; 43(4); 1177-1185. - linkout | Dataset | High - Original Author Data | |
| Huuskonen Data Set | Aqueous Solubility Data for Training Set (1033 Compounds), Test Set 1 (258 Compounds) and Test Set 2 (21 Compounds) | Jarmo Huuskonen, J. Chem. Inf. Comput. Sci., 2000, 40, 773-777. - linkout | Dataset (SMILES) | High - Original Author Data | Please acknowledge Jarmo Huuskonen in publications |
| Solubility Data Set | Aqueous Solubility Data for 1144 low molecular weight compounds | John S. Delaney, J. Chem. Inf. Comput. Sci., 2004, 44, 1000 - 1005. - linkout | http://pubs.acs.org/subscribe/journals/jcisd8/suppinfo/ci034243x/ci034243xsi20040112_053635.txt | High - Original Author Data, but some problems encountered (see Comment column) | Structures in SMILES format with solubility in mol/L - some structures with a solubility of 1E-10 are missing the "<" (less than) sign! |
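The comment on the Solubility Data Set warns that entries reported as 1E-10 are really upper bounds. A minimal sketch of how such entries might be separated from exact measurements before model fitting is shown below. It assumes the file has already been parsed into (SMILES, solubility) pairs; the function name, the floor constant, and the sample records are hypothetical and not taken from the dataset.

```python
import math
from typing import List, Tuple

Record = Tuple[str, float]

# Value that the dataset comment reports as missing its "<" sign.
CENSOR_FLOOR = 1e-10


def split_censored(records: List[Record],
                   floor: float = CENSOR_FLOOR) -> Tuple[List[Record], List[Record]]:
    """Split records into exact measurements and upper-bound (censored) ones."""
    exact: List[Record] = []
    censored: List[Record] = []
    for smiles, solubility in records:
        # Compare with a tolerance, since the value comes from parsed text.
        if math.isclose(solubility, floor, rel_tol=1e-9):
            censored.append((smiles, solubility))
        else:
            exact.append((smiles, solubility))
    return exact, censored


# Hypothetical records for illustration only.
sample = [("CCO", 1.2), ("c1ccccc1", 0.023), ("CCCCCCCCCCCCCCCC", 1e-10)]
exact, censored = split_censored(sample)
print(len(exact), "exact;", len(censored), "to be treated as '< 1E-10'")
```

Whether the censored entries are dropped or handled with a method that accepts upper bounds is left to the user.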
Toxicity datasets
| Name | Activity/Property | Reference | Dataset/Linkout | Trust Level | Comment |
| Amicbase 300 Dataset | 300 Structures with Mouse oral/ip LD50 Data | http://www.informatics.indiana.edu/djwild/datasets/index.html | Access Structures | High - Original Author Data | SD format with activity data in SD tags |
| Benigni/Vari Carcinogenicity Dataset | 774 Structures with Mouse/Rat Carcinogenicity (TD50) Data | http://progetti.iss.it/binary/ampp/cont/Presentation.1105610472.pdf | Access Structures | High - Original Author Data | SMILES and SD format, carcinogenicity data in Excel file |
| Bursi Mutagenicity Dataset | 4337 Compounds with Mutagenicity (AMES) Classification | Kazius, J.; McGuire, R.; Bursi, R. Derivation and Validation of Toxicophores for Mutagenicity Prediction. J. Med. Chem. 2005, 48(1), 312-320 - linkout | Access Structures | High - Original Author Data | SD format plus binary classification |
| ACD DSSTox Databases | 11 Datasets in total: EPA Water Disinfection by-Products with Carcinogenicity Estimates (DBCAN); Fathead minnow Acute Toxicity Database; Fathead minnow Acute Toxicity database - Defined Parent structures only; CPDB Summary Table - Dogs; CPDB (Carcinogenic Potency Database) Summary Table - Hamsters; CPDB Summary Table - Non-Human Primates; CPDB Summary Table - Rats and Mice; CPDB Summary Table - Rats and Mice - Defined Organic Structures Only; CPDB Summary Table - Rats and Mice - Defined Organic Structures Only; NCTRER (National Center for Toxicological Research Estrogen Receptor Binding Database) Defined Organic Parent 3D Structures; NCTRER_ Defined Organic Parent Structures; NCTRER_ Defined Organic Parent Structures | Various - see dataset page here | http://www.acdlabs.com/download/db/weblibdb.html | High - Original Data | SD and CFD (Chem Folder) Format |
| Helma CPDB Mutagenicity Subset | 684 compounds with mutagenicity data - "cleaned" subset of CPDB | Christoph Helma, Tobias Cramer, Stefan Kramer, Luc De Raedt. J. Chem. Inf. Comput. Sci. 2004 (ASAP Article) - linkout | http://pubs3.acs.org/acs/journals/supporting_information.page?in_manuscript=ci034254q | High - Original Author Data | SMILES Format plus activity data |
| National Toxicology Program Dataset | 503 Structures with carcinogenicity data for male/female mouse and male/female rat | See the web site of the National Toxicology Program (link) for further details | http://www.predictive-toxicology.org/data/ntp/ | High - Original Data | Data in SMILES format + Activity in txt format |
| FDA's Carcinogenicity Studies with Rats and Mice | 281 SMILES structures with carcinogenicity data for male/female rat and male/female mouse | Joseph F. Contrera, Abigail C. Jacobs and Joseph J. DeGeorge, Regulatory Toxicology and Pharmacology 1997, 25(2), 130-145 - linkout | http://www.predictive-toxicology.org/data/fda/ | High - Original Data | Data in SMILES format + Activity in txt format |
| Carcinogenic Potency Database (CPDB) | 1451 chemicals with results from chronic, long-term animal cancer tests | Various - see http://potency.berkeley.edu/listofpubs.topic.html for details | http://potency.berkeley.edu/cpdb.html | High - Original Data | Data in SMILES format + Lots of additional information |
| Fathead Minnow Acute Toxicity Dataset | 617 acute LC50 values and 225 associated behavioral assessments, 72 joint toxic action experiments with the fathead minnow | Russom, C.L., S.P. Bradbury, S.J. Broderius, D.E. Hammermeister and R.A. Drummond. Predicting modes of action from chemical structure: Acute toxicity in the fathead minnow (Pimephales promelas). Environmental Toxicology and Chemistry, 1997, 16(5): 948-967. - linkout | http://www.epa.gov/med/Prods_Pubs/fathead_minnow.htm | High - Original Data | Data in SMILES format + Activities |
| hERG Data Set | Compounds with hERG K+ channel blocking activity, Training Set (31 Compounds) and Test Set (6 Compounds) | A. Cavalli, E. Poluzzi, F. De Ponti, M. Recanatini, J. Med. Chem. 2002 (45) 2844-2853. - linkout | Dataset (SMILES Format) | Low - Please Check | Drawn Manually - Please Check for Errors Before Use |
| Tox/Mutagenicity Data Set | 278 Substituted Benzenes with acute Toxicity Data; 1863 Compounds with Mutagenicity Data; 8885 Compounds from the NCI Yeast Anticancer Drug Screen data set; NCI AntiHIV Drug Screen data set | Jun Feng et al, J. Chem. Inf. Comput. Sci., 2003 (43) 1463-1470. - linkout | http://www.niss.org/publications.html | High - Original Data | - |
| Developmental Toxicity Data Set | 175 chemicals with indications of developmental toxicity (without regard to type of effect) in several species (4 classes) | F.R. Jelovsek, D.R. Mattison and J.J. Chen, Obstet. Gynecol. 74, 624-636. - linkout | http://ecb.jrc.it/cgi-bin/reframer.pl?A=ECB&B=/DOCUMENTS/QSAR/QSAR_DATASETS/ | High - Original Data | - |
| Endocrine Disruptor Data Set | 106 chemicals with endocrine disrupting potential in 4 classes | linkout | http://ecb.jrc.it/cgi-bin/reframer.pl?A=ECB&B=/DOCUMENTS/QSAR/QSAR_DATASETS/ | High - Original Data | - |
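Several of the toxicity datasets above ship as SD files with the activity stored in SD tags. The following is a minimal, dependency-free sketch of how the tagged data could be pulled out of such a file. It relies only on the general SD-file layout (records terminated by `$$$$`, data items introduced by a `> <TAG>` header line and closed by a blank line); the function name, the tag name `LD50_mouse_oral`, and the sample records are hypothetical. A cheminformatics toolkit is the better choice when the structures themselves are needed.

```python
from typing import Dict, Iterator


def read_sd_tags(text: str) -> Iterator[Dict[str, str]]:
    """Yield one {tag: value} dict per record found in SD-file text."""
    for record in text.split("$$$$"):
        if not record.strip():
            continue  # skip the empty chunk after the last delimiter
        tags: Dict[str, str] = {}
        lines = record.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                # Header looks like "> <TAG>"; take the text between < and >.
                name = line[line.index("<") + 1:line.rindex(">")]
                i += 1
                value = []
                # The value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    value.append(lines[i])
                    i += 1
                tags[name] = "\n".join(value)
            i += 1
        yield tags


# Hypothetical two-record SD text with empty mol blocks, for illustration only.
sample_sd = """mol1
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <LD50_mouse_oral>
320

$$$$
mol2
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <LD50_mouse_oral>
45

$$$$
"""

records = list(read_sd_tags(sample_sd))
print(records)
```

Values are returned as strings, so numeric activities still need converting and checking before use.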
Metabolism datasets
| Name | Activity/Property | Reference | Dataset/Linkout | Trust Level | Comment |
| Cytochrome P450 Dataset | Diverse set of 13 azole antifungal compounds with Cytochrome P450-14alphaDM inhibition constants | Tanaji T. Talele and Vithal M. Kulkarni, J. Chem. Inf. Comput. Sci. 1999 (39) 204-210. - linkout | Dataset (SD Files) | Low - Please Check | Drawn Manually - Please Check for Errors Before Use |
Permeability datasets
| Name | Activity/Property | Reference | Dataset/Linkout | Trust Level | Comment |
| Li Blood-Brain-Barrier Penetration Set | 415 molecules with Binary Blood-Brain-Barrier Penetration Data (Penetrating/Non-Penetrating) with References | Hu Li, Chun Wei Yap, Choong Yong Ung, Ying Xue, Zhi Wei Cao and Yu Zong Chen, J. Chem. Inf. Model. 2005 (ASAP Article) - linkout | Download Dataset | High - Original Author Data | 415 Structures in SMILES Format With Binary Penetration Classification (p/n) and References |
| Hou Caco-2 Data Set | 100 Structurally Diverse Molecules (77+23) with Caco-2 Permeability Data | T. J. Hou, W. Zhang, K. Xia, X. B. Qiao, and X. J. Xu, J. Chem. Inf. Comput. Sci. (ASAP Article) - linkout | http://pubs3.acs.org/acs/journals/supporting_information.page?in_manuscript=ci049884m | High - Original Author Data | - |
| Ekins Caco-2 Data Set | 28 Inhibitors of Rhinovirus Replication with Caco-2 Permeability Data | Sean Ekins, Gregory L. Durst, Robert E. Stratford, David A. Thorner, Richard Lewis, Richard J. Loncharich, and James H. Wikel, J. Chem. Inf. Comput. Sci 2001 (41) 1578 - 1586. - linkout | Dataset (SD File) | Low - Please Check | Drawn Manually - Please Check for Errors Before Use |
| Blood-Brain-Barrier Data Set | Blood-Brain-Barrier Penetration Data for Training Set (57 Compounds) and Test Set (13 Compounds) | Ruifeng Liu, Hongmao Sun, and Sung-Sau So, J. Chem. Inf Comput Sci. 2001 (41) 1623 - 1632. - linkout | Dataset | Low - Please Check | Drawn Manually - Please Check for Errors Before Use |
Docking datasets
Mechanistic datasets
Mixed/other datasets
| Name | Activity/Property | Reference | Dataset/Linkout | Trust Level | Comment |
| Epoxide-Enantioselectivity Data Set | 28 Epoxides with associated enantioselectivity ratios | S. Funar-Timofei et al., J. Chem. Inf. Comput. Sci., 2003, 43, 934-940. - linkout | Dataset | Low - Please Check | Drawn Manually - Please Check for Errors Before Use |
| NCI Data Set | 32,557 2D structures with cancer test data; 42,689 2D structures with AIDS test data; 23,031 2D structures with both cancer and AIDS test data | http://dtp.nci.nih.gov/docs/cancer/cancer_data.html | http://cactus.nci.nih.gov/ncidb2/download.html | High - Original Data | - |