The descriptor used, MOLPRINT 2D, and the method used to compare fingerprints, the Tanimoto coefficient, are briefly described here. For further details, please see the references.
The Descriptor Used
Molecular descriptors are numerical or binary values calculated from the molecular structure. These include (but are not limited to):
- One-dimensional descriptors (e.g. logP, molecular weight),
- Two-dimensional descriptors (graph-based properties) and
- Three-dimensional descriptors (e.g. three-point pharmacophores: triangles between putative pharmacophore points that are described by both the interaction types and the distances between features).
For general introductions to molecular descriptors see
- P. Willett, J.M. Barnard and G.M. Downs. Chemical similarity searching. Journal of Chemical Information and Computer Sciences, 1998, 38, 983-996. http://dx.doi.org/10.1021/ci9800211
- A. Bender and R.C. Glen. Molecular similarity: a key technique in molecular informatics. Organic and Biomolecular Chemistry, 2004, 2, 3204-3218. http://dx.doi.org/10.1039/b409813g
The descriptor used in this work is based on counts of atom types around each heavy atom of the molecule. This is illustrated here:
The descriptor is thus similar to "Augmented Atoms", which date back to the 1970s, and to Scitegic ECFP fingerprints. It differs from Scitegic fingerprints in the information that is encoded in addition to the chemical element: whereas Scitegic ECFP fingerprints use a wide range of information to type each atom (element, charge, valence, number of bonds to hydrogens and heavy atoms), the descriptor employed here simply uses mol2 atom types.
For details see
- A. Bender, H.Y. Mussa, R.C. Glen and S. Reiling. Molecular similarity searching using atom environments, information-based feature selection, and a naive bayesian classifier. Journal of Chemical Information and Computer Sciences, 2004, 44, 170-178. http://dx.doi.org/10.1021/ci034207y
- A. Bender, H.Y. Mussa, R.C. Glen and S. Reiling. Similarity searching of chemical databases using atom environment descriptors: evaluation of performance. Journal of Chemical Information and Computer Sciences, 2004, 44, 1708-1718. http://dx.doi.org/10.1021/ci0498719
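The idea of counting atom types around each heavy atom can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MOLPRINT 2D implementation: the function name atom_environments, the feature string layout, and the default cut-off of two bonds (max_depth=2) are all hypothetical choices made here, and the mol2 atom types and bonds are typed in by hand for a hydrogen-suppressed molecule rather than read from a mol2 file.

```python
from collections import Counter, deque


def atom_environments(atom_types, bonds, max_depth=2):
    """Return a set of feature strings, one environment per heavy atom.

    atom_types: list of mol2 atom types, one per heavy atom (hydrogens omitted).
    bonds: list of (i, j) index pairs between heavy atoms.
    max_depth: largest bond distance included (an assumption of this sketch).
    """
    # Build an adjacency list for the hydrogen-suppressed molecular graph.
    neighbours = {i: set() for i in range(len(atom_types))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    features = set()
    for centre in range(len(atom_types)):
        # Breadth-first search gives the bond distance from the central atom.
        distance = {centre: 0}
        queue = deque([centre])
        while queue:
            current = queue.popleft()
            if distance[current] == max_depth:
                continue  # do not expand beyond the cut-off
            for nxt in neighbours[current]:
                if nxt not in distance:
                    distance[nxt] = distance[current] + 1
                    queue.append(nxt)

        # Count atom types per distance layer, excluding the central atom.
        counts = Counter(
            (d, atom_types[i]) for i, d in distance.items() if d > 0
        )
        # Hypothetical string layout: centre type, then distance-count-type.
        parts = [atom_types[centre]]
        parts += [f"{d}-{n}-{t}" for (d, t), n in sorted(counts.items())]
        features.add(";".join(parts))
    return features


# Ethanol heavy atoms: C.3 - C.3 - O.3
ethanol = atom_environments(["C.3", "C.3", "O.3"], [(0, 1), (1, 2)])
for feature in sorted(ethanol):
    print(feature)
```

Each heavy atom contributes one feature, so a molecule is represented by the set of its atom environments. Storing the features in a set keeps only their presence or absence, which is the form needed for a set-based comparison of two structures.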
Comparison of Structures
Fingerprints are compared using the Tanimoto coefficient, Tc, which is defined as the number of features common to the two structures (AND) divided by the number of features contained in at least one of the structures (OR). In short: Tc = AND / OR.
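As a minimal sketch of the definition Tc = AND / OR, the coefficient can be computed on two feature sets as follows. The function name tanimoto and the feature labels are hypothetical, and returning 0.0 when both sets are empty is an assumption of this sketch, since the definition leaves that case open.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two collections of features (Tc = AND / OR)."""
    a, b = set(features_a), set(features_b)
    union = a | b          # features present in at least one structure (OR)
    if not union:
        return 0.0         # assumption: two empty fingerprints score 0
    common = a & b         # features present in both structures (AND)
    return len(common) / len(union)


# Two features in common out of four distinct features overall.
tc = tanimoto({"f1", "f2", "f3"}, {"f2", "f3", "f4"})
print(tc)  # 0.5
```

Tc ranges from 0 (no features in common) to 1 (identical feature sets).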