Seminal Papers in Cheminformatics
A list compiled by contributors from the Chemistry Development Kit (CDK) and the Blue Obelisk. © CC-0
Reference, DOI, Scholia, Notes:
Morgan, H. L. (1965) The Generation of a Unique Machine Description for Chemical Structures - A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113.
https://doi.org/10.1021/c160017a018
https://tools.wmflabs.org/scholia/doi/10.1021/c160017a018
Improves upon the previous work by D. J. Gluck, which could not handle graphs like the Lehman counter-example shown in CIDS report 3 at http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=AD0460819 .
Marsili, M., and Gasteiger, J. (1980) PI-charge Distribution from Molecular Topology and PI-Orbital Electronegativity. Croat. Chem. Acta 53, 601–614.
Gasteiger, J., Rudolph, C., and Sadowski, J. (1990) Automatic Generation of 3D-Atomic Coordinates for Organic Molecules. Tetrahedron Comp. Method. 3, 537–547.
https://doi.org/10.1016/0898-5529(90)90156-3
https://tools.wmflabs.org/scholia/doi/10.1016/0898-5529(90)90156-3
Weininger, D. (1988) SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. Journal of Chemical Information & Computer Sciences 28, 31–36.
https://doi.org/10.1021/ci00057a005
https://tools.wmflabs.org/scholia/doi/10.1021/ci00057a005
Weininger, D., Weininger, A., and Weininger, J. L. (1989) SMILES. 2. Algorithm for Generation of Unique SMILES Notation. Journal of Chemical Information & Computer Sciences 29, 97–101.
https://doi.org/10.1021/ci00062a008
https://tools.wmflabs.org/scholia/doi/10.1021/ci00062a008
While of historic interest, the description is incomplete and it does not support chirality.
Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D., and Pletnev, I. (2013) InChI - the worldwide chemical structure identifier standard. J Cheminform 5, 1–9.
https://dx.doi.org/10.1186/1758-2946-5-7
https://tools.wmflabs.org/scholia/doi/10.1186/1758-2946-5-7
Ray, Louis C., and Russell A. Kirsch. “Finding Chemical Records by Digital Computers.” Science 126, no. 3278 (October 25, 1957): 814–19.
https://doi.org/10.1126/science.126.3278.814
https://tools.wmflabs.org/scholia/doi/10.1126/science.126.3278.814
First description of a computer-based substructure search implementation at the atom and bond level. Introduces the term 'screen' as a fast filter before doing subgraph isomorphism.
Penny, Robert H. “A Connectivity Code for Use in Describing Chemical Structures.” Journal of Chemical Documentation 5, no. 2 (1965): 113–17.
https://doi.org/10.1021/c160017a019
https://tools.wmflabs.org/scholia/doi/10.1021/c160017a019
First description of what are now called circular fingerprints.
6. Fingerprints - Screening and Similarity - https://www.daylight.com/dayhtml/doc/theory/theory.finger.html
Description of Daylight fingerprints (introduced ~1990 but never published in the academic literature). Earlier fingerprint-like systems typically used a fixed set of substructure keys developed for substructure screening.
Adamson, George W., and Judith A. Bush. “A Method for the Automatic Classification of Chemical Structures.” Information Storage and Retrieval 9, no. 10 (October 1, 1973): 561–68.
https://doi.org/10.1016/0020-0271(73)90059-4
https://tools.wmflabs.org/scholia/doi/10.1016/0020-0271(73)90059-4
First paper to compare chemical structures based on what would later be known as bitstring fingerprints.
Willett, P.; Winterman, V. A Comparison of Some Measures for the Determination of Inter-Molecular Structural Similarity Measures of Inter-Molecular Structural Similarity. Quant. Struct.-Act. Relat. 1986, 5 (1), 18–25.
https://doi.org/10.1002/qsar.19860050105
https://tools.wmflabs.org/scholia/doi/10.1002/qsar.19860050105
First paper to highlight the usefulness of Tanimoto similarity compared to other similarity methods.
Wiener, H. Structural Determination of Paraffin Boiling Points. Journal of the American Chemical Society. 1947 Jan;69(1):17–20.
https://doi.org/10.1021/ja01193a005
https://tools.wmflabs.org/scholia/doi/10.1021/ja01193a005
Johnson and Maggiora (1990) refer to this as the origin of molecular topology descriptors. From page 29: "The first topological index was put forward in 1947 by Wiener [129], and since then over 120 other topological and information-theoretical indices have been described in the literature [130]."
Frear, Donald E. H. “Punch Cards in Correlation Studies.” Chemical & Engineering News Archive 23, no. 22 (November 25, 1945): 2077.
https://doi.org/10.1021/cen-v023n022.p2077
https://tools.wmflabs.org/scholia/doi/10.1021/cen-v023n022.p2077
First description of using machines (in this case, punched cards) "to make correlation studies between chemical constitution and any desired property, chemical or physical."
Tversky, Amos, "Features of Similarity." Psychological Review, Vol. 84, No. 4, p. 327 (July 1977).
https://doi.org/10.1037/0033-295X.84.4.327
https://tools.wmflabs.org/scholia/work/Q56454678
http://www.cogsci.ucsd.edu/~coulson/203/tversky-features.pdf
Tanimoto, Taffee T. (17 Nov 1958). "An Elementary Mathematical Theory of Classification and Prediction". Internal IBM Technical Report. 1957 (8?).
The Wikipedia page for this has no further online references. (I put a copy at http://dalkescientific.com/tanimoto.pdf . I think the Science reference, https://science.sciencemag.org/content/132/3434/1115 , is more germane - and definitely easier to access. -- Andrew Dalke)
Bradshaw, John, "Introduction to Tversky similarity measure", Daylight Chemical Information Systems MUG '97 meeting.
https://www.daylight.com/meetings/mug97/Bradshaw/MUG97/tv_tversky.html
Never published in an academic journal, but very influential throughout the cheminformatics community. John Bradshaw was the one who originally brought Tanimoto and Tversky similarity to Dave Weininger's attention.
Jaccard, P. (1912) The Distribution of the Flora in the Alpine Zone. New Phytologist 11, 37–50.
https://doi.org/10.1111/j.1469-8137.1912.tb05611.x
https://tools.wmflabs.org/scholia/doi/10.1111/j.1469-8137.1912.tb05611.x