There is no common agreement on what the diversity of chemical libraries actually comprises. While an abstract definition of diversity may be given and agreed to by most chemists, no single precise definition exists that would enable the development of automated diversity assessment algorithms. Only some introductory references on diversity are given here.

  • R.D. Brown. Descriptors for Diversity Analysis. Perspectives in Drug Discovery and Design, 1997, 7-8, 31-49.
  • Y.C. Martin. Diverse viewpoints on computational aspects of molecular diversity. Journal of Combinatorial Chemistry, 2001, 3, 231-250. - http://dx.doi.org/10.1021/cc000073e
  • R.S. Pearlman and K.M. Smith. Novel software tools for chemical diversity. Perspectives in Drug Discovery and Design, 1998, 9-11, 339-353. - http://dx.doi.org/10.1023/A:1027232610247

A molecular descriptor is a numerical or binary value calculated from the molecular structure. Descriptors include (but are not limited to):

  • One-dimensional descriptors (e.g. logP, molecular weight),
  • Two-dimensional descriptors (graph-based properties) and
  • Three-dimensional descriptors (e.g. three-point pharmacophores: triangles between putative pharmacophore points, described by both the interaction types and the distances between features).

Why are these descriptors needed? Often the molecular connectivity table (or even the three-dimensional structure) is not a suitable description of a molecule, in particular when similarity or diversity is assessed rather than exact matches. In those cases other, more general descriptions of the molecule are used, e.g. to discover structures with a totally different connectivity that might still exhibit the same biological activity as the structure used to query the database. (In addition, these new structures might have improved pharmacokinetic or other properties.)
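
As a rough illustration of comparing molecules in descriptor space rather than by connectivity, the following minimal sketch ranks library entries by their similarity to a query. It assumes binary descriptors represented as sets of feature names and uses the Tanimoto coefficient as the similarity measure; both the measure and the feature names are assumptions chosen for illustration and are not taken from this page.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of binary features."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty descriptions are identical
    return len(a & b) / len(a | b)


def rank_by_similarity(query, library):
    """Return library names sorted from most to least similar to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


# Hypothetical feature sets, invented purely for illustration.
query = frozenset({"hbond_donor", "hbond_acceptor", "aromatic_ring", "carboxylic_acid"})
library = {
    "candidate_a": frozenset({"hbond_donor", "hbond_acceptor", "aromatic_ring", "tetrazole"}),
    "candidate_b": frozenset({"halogen", "aliphatic_chain"}),
}

ranking = rank_by_similarity(query, library)
```

In this toy case candidate_a shares three of the five distinct features with the query and is ranked first, although nothing is said about whether the two structures share any connectivity.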

For general introductions to molecular descriptors see
  • P. Willett, J.M. Barnard and G.M. Downs. Chemical similarity searching. Journal of Chemical Information and Computer Sciences, 1998, 38, 983-996. - http://dx.doi.org/10.1021/ci9800211
  • A. Bender and R.C. Glen. Molecular similarity: a key technique in molecular informatics. Organic and Biomolecular Chemistry, 2004, 2, 3204-3218. - http://dx.doi.org/10.1039/b409813g

The descriptor used in this work is based on counts of atom types around each heavy atom of the molecule.

[Figure: illustration of the atom environment descriptor]

For details see
  • A. Bender, H.Y. Mussa, R.C. Glen and S. Reiling. Molecular similarity searching using atom environments, information-based feature selection, and a naive bayesian classifier. Journal of Chemical Information and Computer Sciences, 2004, 44, 170-178. - http://dx.doi.org/10.1021/ci034207y
  • A. Bender, H.Y. Mussa, R.C. Glen and S. Reiling. Similarity searching of chemical databases using atom environment descriptors: evaluation of performance. Journal of Chemical Information and Computer Sciences, 2004, 44, 1708-1718. - http://dx.doi.org/10.1021/ci0498719
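
The idea of counting atom types around each heavy atom can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: element symbols stand in for atom types, the molecule is given as a list of atom types plus a list of bonded index pairs, and the neighbourhood extends to a hypothetical `max_depth` of two bonds.

```python
from collections import Counter, deque


def atom_environments(atom_types, bonds, max_depth=2):
    """Count atom types at each bond distance (0..max_depth) around every heavy atom.

    atom_types: list of type labels, one per atom (element symbols here, as an assumption).
    bonds: list of (i, j) index pairs for bonded atoms.
    Returns {centre_index: Counter({(distance, atom_type): count})}.
    Hydrogens, if present, are not used as centres but are counted as neighbours.
    """
    neighbours = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    environments = {}
    for centre, centre_type in enumerate(atom_types):
        if centre_type == "H":
            continue  # only heavy atoms act as centres
        distance = {centre: 0}
        queue = deque([centre])
        counts = Counter()
        # Breadth-first search records the shortest bond distance to each atom.
        while queue:
            atom = queue.popleft()
            d = distance[atom]
            counts[(d, atom_types[atom])] += 1
            if d == max_depth:
                continue  # do not expand beyond the chosen depth
            for nxt in neighbours[atom]:
                if nxt not in distance:
                    distance[nxt] = d + 1
                    queue.append(nxt)
        environments[centre] = counts
    return environments


# Heavy-atom skeleton of ethanol: C0-C1-O2
ethanol = atom_environments(["C", "C", "O"], [(0, 1), (1, 2)])
```

For the central carbon of this example the environment holds the atom itself at distance 0 and one carbon and one oxygen at distance 1. Collecting the environments of all heavy atoms gives a set of features that can then be compared between molecules.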

The reference for this interface

For details, please see the following article, and cite it in publications:

  • S. Fergus, A. Bender and D.R. Spring. Assessment of structural diversity in combinatorial synthesis. Current Opinion in Chemical Biology, 2005, in press. - http://dx.doi.org/10.1016/j.cbpa.2005.03.004