There is no general agreement on what the diversity of chemical libraries actually comprises. While an abstract definition of diversity may be given and accepted by most chemists, a single precise definition that would enable the development of automated diversity assessment algorithms does not exist. Only some introductory references on diversity are given here.

A molecular descriptor is a numerical or binary value calculated from the molecular structure. Descriptors include (but are not limited to):

  • One-dimensional descriptors (e.g. logP, molecular weight),
  • Two-dimensional descriptors (graph-based properties) and
  • Three-dimensional descriptors (e.g. three-point pharmacophores, i.e. triangles of putative pharmacophore points that are described by both the interaction types and the distances between the features).

Why are these descriptors needed? Often the molecular connectivity table (or even the three-dimensional structure) is not a suitable description of a molecule, in particular when similarity or diversity is assessed rather than exact matches. In those cases other, more general descriptions of the molecule are used, e.g. to discover structures with a totally different connectivity that might still exhibit the same biological activity as the structure used to query the database. (In addition, these new structures might have improved pharmacokinetic or other properties.)
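To make the idea of similarity assessment on descriptors concrete, the following minimal sketch compares two binary descriptors with the Tanimoto coefficient, a measure commonly used for this purpose. It is an assumption for illustration only; the text does not state which similarity measure is used in this work. The function name `tanimoto` and the bit positions are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two binary descriptors.

    Each descriptor is given as the collection of bit positions that are set.
    """
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen here: two empty descriptors count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical on-bits of a query structure and a database structure.
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9}

# Three shared bits out of five distinct bits.
score = tanimoto(query, candidate)
print(score)
```

Because the comparison uses only the descriptor bits, two structures with different connectivity tables can still receive a high score if their descriptors overlap.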

The descriptor used in this work is based on counts of atom types around each heavy atom of the molecule. This is illustrated here:
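As a rough sketch of the counting idea, and not the implementation used in this work, the code below counts atom types at each bond distance from every heavy atom of a small molecule. Several details are assumptions made for illustration: element symbols stand in for the atom types, the environment is split into layers by topological distance, and the names `atom_environment_counts` and `max_distance` are hypothetical.

```python
from collections import Counter, deque


def atom_environment_counts(atom_types, bonds, max_distance=2):
    """Count atom types around each heavy atom.

    atom_types: list of type labels, one per heavy atom (assumed: element symbols).
    bonds: list of (i, j) index pairs between heavy atoms.
    Returns one dict per atom, mapping (distance, atom_type) -> count.
    """
    # Build an adjacency list from the bond list.
    neighbours = {i: [] for i in range(len(atom_types))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    descriptor = []
    for start in range(len(atom_types)):
        # Breadth-first search gives the shortest bond distance to each atom.
        distance = {start: 0}
        queue = deque([start])
        counts = Counter()
        while queue:
            atom = queue.popleft()
            d = distance[atom]
            counts[(d, atom_types[atom])] += 1
            if d == max_distance:
                continue  # do not expand beyond the outermost layer
            for nxt in neighbours[atom]:
                if nxt not in distance:
                    distance[nxt] = d + 1
                    queue.append(nxt)
        descriptor.append(dict(counts))
    return descriptor


# Heavy atoms of ethanol: C0 - C1 - O2 (hydrogens omitted).
ethanol_types = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
envs = atom_environment_counts(ethanol_types, ethanol_bonds, max_distance=2)
for index, env in enumerate(envs):
    print(index, env)
```

Each heavy atom thus yields one vector of counts, and the per-atom vectors together describe the molecule without relying on an exact match of the connectivity table.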

