|N. M. O'Boyle; R. A. Sayle||Comparing structural fingerprints using a literature-based similarity benchmark||Journal of Cheminformatics||J. Cheminf.||2016||8||36||10.1186/s13321-016-0148-0||NextMove Software||<b>Background</b>|
The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common.
Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark.
Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.
|To develop or compare measures of whether two molecules are similar, a benchmark is needed. Our benchmark is based upon molecules appearing together in published papers (from the medicinal chemistry literature).||2|
|N. M. O'Boyle; J. Boström; R. A. Sayle; A. Gill||Using matched molecular series as a predictive tool to optimize biological activity||Journal of Medicinal Chemistry||J. Med. Chem.||2014||57||2704||2713||10.1021/jm500022q||NextMove Software||A matched molecular series is the general form of a matched molecular pair and refers to a set of two or more molecules with the same scaffold but different R groups at the same position. We describe Matsy, a knowledge-based method that uses matched series to predict R groups likely to improve activity given an observed activity order for some R groups. We compare the Matsy predictions based on activity data from ChEMBLdb to the recommendations of the Topliss tree and carry out a large scale retrospective test to measure performance. We show that the basis for predictive success is preferred orders in matched series and that this preference is stronger for longer series. The Matsy algorithm allows medicinal chemists to integrate activity trends from diverse medicinal chemistry programs and apply them to problems of interest as a Topliss-like recommendation or as a hypothesis generator to aid compound design.||We have developed a method that answers the question "What compound to make next?" for medicinal chemists working on a particular series of analogues.||2||1|
|J. K. Wegner; A. Sterling; R. Guha; A. Bender; J.-L. Faulon; J. Hastings; N. O'Boyle; J. Overington; H. Van Vlijmen; E. Willighagen||Cheminformatics||Communications of the ACM||Commun. ACM||2012||55||65||75||10.1145/2366316.2366334||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||Open-source chemistry software and molecular databases broaden the research horizons of drug discovery.||High-level introduction to cheminformatics suitable for a computer-science audience.|
|A. R. Maguire; C. Daly; K. Eccles; L. M. Bateman; N. O'Boyle; S. Lawrence||Investigating the influence of the sulfur oxidation state on solid state conformation||CrystEngComm||CrystEngComm||2012||14||7848||7850||10.1039/C2CE26298C||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||Design, synthesis and structural characterization of a series of diphenylacetylene derivatives bearing organosulfur, amide and amine moieties has been achieved in which the molecular conformation is controlled through variation of the hydrogen bond properties on alteration of the oxidation level of sulfur.|
|N. M. O'Boyle||Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI||Journal of Cheminformatics||J. Cheminf.||2012||4||22||10.1186/1758-2946-4-22||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||<b>Background</b>|
There are two line notations of chemical structures that have established themselves
in the field: the SMILES string and the InChI string. The InChI aims to provide a
unique, or canonical, identifier for chemical structures, while SMILES strings are
widely used for storage and interchange of chemical structures, but no standard exists
to generate a canonical SMILES string.
I describe how to use the InChI canonicalisation to derive a canonical SMILES string
in a straightforward way, either incorporating the InChI normalisations (Inchified
SMILES) or not (Universal SMILES). This is the first description of a method to generate
canonical SMILES that takes stereochemistry into account. When tested on the 1.1 m
compounds in the ChEMBL database, and a 1 m compound subset of the PubChem Substance
database, no canonicalisation failures were found with Inchified SMILES. Using Universal
SMILES, 99.79% of the ChEMBL database was canonicalised successfully and 99.77% of the
The InChI canonicalisation algorithm can successfully be used as the basis for a common
standard for canonical SMILES. While challenges remain -- such as the development
of a standard aromatic model for SMILES -- the ability to create the same SMILES using
different toolkits will mean that for the first time it will be possible to easily
compare the chemical models used by different toolkits.
|K. S. Eccles; R. E. Morrison; S. P. Stokes; G. E. O'Mahony; J. A. Hayes; D. M. Kelly; N. M. O'Boyle; L. Fábián; H. A. Moynihan; A. R. Maguire; S. E. Lawrence||Utilizing Sulfoxide···Iodine Halogen Bonding for Cocrystallization||Crystal Growth & Design||Cryst. Growth Des.||2012||12||2969||2977||10.1021/cg300189v||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||The propensity of a range of different sulfoxides and sulfones to |
cocrystallize with either 1,2- or 1,4-diiodotetrafluorobenzene, via
I···O=S halogen bonding, was investigated. Cocrystallization occurred
exclusively with 1,4-diiodotetrafluorobenzene in either a 1:1 or 1:2
stoichiometry of the organohalide and the sulfoxide, respectively,
depending on the sulfoxide used. It was found that the stoichiometry
observed was not necessarily related to whether the oxygen acts as a
single halogen bond acceptor or if it is bifurcated; with I···π
interactions observed in two of the cocrystals synthesized. Only those
cocrystals with a 1:2 stoichiometry exhibit C–H···O hydrogen bonding in
addition to I···O=S halogen bonding. Examination of the Cambridge
Structural Database shows that (i) the I···O=S interaction is similar to
other I···O interactions, and (ii) the I···π interaction is
significant, with the distances in the two cocrystals among the shortest
|N. M. O'Boyle; R. Guha; E. L. Willighagen; S. E. Adams; J. Alvarsson; J.-C. Bradley; I. V. Filippov; R. M. Hanson; M. D. Hanwell; G. R. Hutchison; C. A. James; N. Jeliazkova; A. S. I. D. Lang; K. M. Langner; D. C. Lonie; D. M. Lowe; J. Pansanel; D. Pavlov; O. Spjuth; C. Steinbeck; A. L. Tenderholt; K. J. Theisen; P. Murray-Rust||Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on||Journal of Cheminformatics||J. Cheminf.||2011||3||37||10.1186/1758-2946-3-37||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||<b>Background</b>|
The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.
This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry.
We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.
|N. M. O'Boyle; M. Banck; C. A. James; C. Morley; T. Vandermeersch; G. R. Hutchison||Open Babel: An open chemical toolbox||Journal of Cheminformatics||J. Cheminf.||2011||3||33||10.1186/1758-2946-3-33||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||<b>Background</b>|
A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats.
We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion.
Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from <a href="http://openbabel.org">http://openbabel.org</a>.
|N. M. O'Boyle; C. M. Campbell; G. R. Hutchison||Computational design and selection of optimal organic photovoltaic materials||Journal of Physical Chemistry C||J. Phys. Chem. C||2011||115||16200||16210||10.1021/jp202765c||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||Conjugated organic polymers are key building blocks of low-cost photovoltaic materials. We have examined over 90 000 copolymers using computational predictions to solve the “inverse design” of molecular structures with optimum properties for highly efficient solar cells (specifically matching optical excitation energies and excited-state energies). Our approach, which uses a genetic algorithm to search the space of synthetically accessible copolymers of six or eight monomer units, yields hundreds of candidate copolymers with predicted efficiencies over 8% (the current experimental record), including many predicted to be over 10% efficient. We discuss trends in polymer sequences and motifs found in the most frequent monomers and dimers in these highly efficient targets and derive design rules for the selection of appropriate donor and acceptor molecules. We show how additional computationally intensive filtering steps can be used, for example, to eliminate targets likely to have poor hole mobilities. Our method effectively targets optimum electronic structure and optical properties far more efficiently than time-consuming serial experiments or computational studies and can be applied to similar problems in other areas of materials science.|
|N. M. O'Boyle; T. Vandermeersch; C. J. Flynn; A. R. Maguire; G. R. Hutchison||Confab - Systematic generation of diverse low-energy conformers||Journal of Cheminformatics||J. Cheminf.||2011||3||8||10.1186/1758-2946-3-8||Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||cheminformatics;conformation_generator||<b>Background</b>|
Many computational chemistry analyses require the generation of conformers, either on-the-fly, or in advance. We present Confab, an open source command-line application for the systematic generation of low-energy conformers according to a diversity criterion.
Confab generates conformations using the 'torsion driving approach' which involves iterating systematically through a set of allowed torsion angles for each rotatable bond. Energy is assessed using the MMFF94 forcefield. Diversity is measured using the heavy-atom root-mean-square deviation (RMSD) relative to conformers already stored. We investigated the recovery of crystal structures for a dataset of 1000 ligands from the Protein Data Bank with fewer than 1 million conformations. Confab can recover 97% of the molecules to within 1.5 Å at a diversity level of 1.5 Å and an energy cutoff of 50 kcal/mol.
Confab is available from <a href="http://confab.googlecode.com">http://confab.googlecode.com</a>.
|N. M. O'Boyle; J. W. Liebeschuetz; J. C. Cole||Testing assumptions and hypotheses for rescoring success in protein-ligand docking||Journal of Chemical Information and Modeling||J. Chem. Inf. Model.||2009||49||1871||1878||10.1021/ci900164f||Cambridge_Crystallographic_Data_Centre; UK; Analytical_and_Biological_Chemistry_Research_Facility; University_College_Cork; Ireland||docking; scoring_function||In protein−ligand docking, the scoring function is responsible for identifying the correct pose of a particular ligand as well as separating ligands from nonligands. Recently there has been considerable interest in schemes that combine results from several scoring functions in an effort to achieve improved performance in virtual screens. One such scheme is consensus scoring, which involves combining the results from several rescoring experiments. Although there have been a number of studies that have investigated factors affecting success in consensus scoring, these studies have not addressed the question of why a rescoring strategy works in the first place. Here we propose and test two alternative hypotheses for why rescoring has the potential to improve results, using GOLD 4.0. The “consensus” hypothesis is that rescoring is a way of combining results from two scoring functions such that only true positives are likely to score highly. The “complementary” hypothesis is that the two scoring functions used in rescoring have complementary strengths; one is better at ranking actives with respect to inactives while the other is better at ranking poses of actives. We find that in general it is this hypothesis that explains success in a rescoring experiment. We also test an assumption of any rescoring method, which is that the scores obtained are representative of the fitness of the docked pose. 
We find that although rescored poses tended to have slightly higher clash values than their docked equivalents, in general the scores were representative.||A common strategy for improving performance in a virtual screen is to use an alternative scoring function to rescore the results of a docking experiment. This study is the first to investigate why rescoring can improve performance, and how this should guide the choice of scoring functions for docking and rescoring.||51.892; -8.4956||0|
|L. Conboy; A. G. Foley; N. M. O'Boyle; M. Lawlor; H. C. Gallagher; K. J. Murphy; C. M. Regan||Curcumin-induced degradation of PKCδ is associated with enhanced dentate NCAM PSA expression and spatial learning in adult and aged Wistar rats||Biochemical Pharmacology||Biochem. Pharm.||2009||77||1254||1265||10.1016/j.bcp.2008.12.011||University_College_Dublin; Applied_Neurotherapeutics_Research_Group; Ciaran_Regan_Group; Conway_Institute; Dublin; Ireland||Polysialylation of the neural cell adhesion molecule (NCAM PSA) is necessary for the consolidation processes of hippocampus-based learning. Previously, we have found inhibition of protein kinase C delta (PKCδ) to be associated with increased polysialyltransferase (PST) activity, suggesting inhibitors of this kinase might ameliorate cognitive deficits. Using a rottlerin template, a drug previously considered an inhibitor of PKCδ, we searched the Compounds Available for Purchase (CAP) database with the Accelrys<sup>®</sup> Catalyst programme for structurally similar molecules and, using the available crystal structure of the phorbol-binding domain of PKCδ, found that diferuloylmethane (curcumin) docked effectively into the phorbol site. Curcumin increased NCAM PSA expression in cultured neuro-2A neuroblastoma cells and this was inversely related to PKCδ protein expression. Curcumin did not directly inhibit PKCδ activity but formed a tight complex with the enzyme. With increasing doses of curcumin, the Tyr<sup>131</sup> residue of PKCδ, which is known to direct its degradation, became progressively phosphorylated and this was associated with numerous Tyr<sup>131</sup>-phospho-PKCδ fragments. Chronic administration of curcumin <i>in vivo</i> also increased the frequency of polysialylated cells in the dentate infragranular zone and significantly improved the acquisition and consolidation of a water maze spatial learning paradigm in both adult and aged cohorts of Wistar rats. 
These results further confirm the role of PKCδ in regulating PST and NCAM PSA expression and provide evidence that drug modulation of this system enhances the process of memory consolidation.||53.3102; -6.2253||0|
|N. M. O'Boyle; G. R. Hutchison||Cinfony - combining Open Source cheminformatics toolkits behind a common interface.||Chemistry Central Journal||Chem. Cent. J.||2008||2||24||10.1186/1752-153X-2-24||Blue_Obelisk_Group||cheminformatics||<b>Background:</b>|
Open Source cheminformatics toolkits such as OpenBabel, the CDK and the RDKit share the same core functionality but support different sets of file formats and forcefields, and calculate different fingerprints and descriptors. Despite their complementary features, using these toolkits in the same program is difficult as they are implemented in different languages (C++ versus Java), have different underlying chemical models and have different application programming interfaces (APIs).
We describe Cinfony, a Python module that presents a common interface to all three of these toolkits, allowing the user to easily combine methods and results from any of the toolkits. In general, the run time of the Cinfony modules is almost as fast as accessing the underlying toolkits directly from C++ or Java, but Cinfony makes it much easier to carry out common tasks in cheminformatics such as reading file formats and calculating descriptors.
By providing a simplified interface and improving interoperability, Cinfony makes it easy to combine complementary features of OpenBabel, the CDK and the RDKit.
|N. M. O'Boyle; D. S. Palmer; F. Nigsch; J. B. O. Mitchell||Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction.||Chemistry Central Journal||Chem. Cent. J.||2008||2||21||10.1186/1752-153X-2-21||Department_of_Chemistry; John_Mitchell_Group; UK; Unilever_Centre_for_Molecular_Informatics; University_of_Cambridge||<b>Background:</b>|
We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024–1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581–590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset.
Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6°C and R<sup>2</sup> of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, ε of 0.21) and an RMSE of 45.1°C and R<sup>2</sup> of 0.54. This model outperforms a kNN model (RMSE of 48.3°C, R<sup>2</sup> of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5°C, R<sup>2</sup> of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest.
With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.
|N. M. O'Boyle; S. C. Brewerton; R. Taylor||Using buriedness to improve discrimination between actives and inactives in docking||Journal of Chemical Information and Modeling||J. Chem. Inf. Model.||2008||48||1269||1278||10.1021/ci8000452||Cambridge_Crystallographic_Data_Centre; UK||docking; GOLD; scoring_function||A continuing problem in protein−ligand docking is the correct relative ranking of active molecules versus inactives. Using the ChemScore scoring function as implemented in the GOLD docking software, we have investigated the effect of scaling hydrogen bond, metal−ligand, and lipophilic interactions based on the buriedness of the interaction. Buriedness was measured using the receptor density, the number of protein heavy atoms within 8.0 Å. Terms in the scaling functions were optimized using negative data, represented by docked poses of inactive molecules. The objective function was the mean rank of the scores of the active poses in the Astex Diverse Set (Hartshorn et al. J. Med. Chem., 2007, 50, 726) with respect to the docked poses of 99 inactives. The final four-parameter model gave a substantial improvement in the average rank from 18.6 to 12.5. Similar results were obtained for an independent test set. Receptor density scaling is available as an option in the recent GOLD release.||52.1976; 0.1265||0|
|N. M. O'Boyle; C. Morley; G. R. Hutchison||Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit||Chemistry Central Journal||Chem. Cent. J.||2008||2||5||10.1186/1752-153X-2-5||Blue_Obelisk_Group; John_Mitchell_Group; Cambridge_Crystallographic_Data_Centre; University_of_Cambridge; Unilever_Centre_for_Molecular_Informatics; UK||cheminformatics; Python||<b>Background:</b>|
Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit.
Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel.
Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.
|Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.||52.1975; 0.1252||2||1|
|N. M. O'Boyle; A. L. Tenderholt; K. M. Langner||cclib: A library for package-independent computational chemistry algorithms||Journal of Computational Chemistry||J. Comp. Chem.||2008||29||839||845||10.1002/jcc.20823||Blue_Obelisk_Group||There are now a wide variety of packages for electronic structure calculations, each of which differs in the algorithms implemented and the output format. Many computational chemistry algorithms are only available to users of a particular package despite being generally applicable to the results of calculations by any package. Here we present cclib, a platform for the development of package-independent computational chemistry algorithms. Files from several versions of multiple electronic structure packages are automatically detected, parsed, and the extracted information converted to a standard internal representation. A number of population analysis algorithms have been implemented as a proof of principle. In addition, cclib is currently used as an input filter for two GUI applications that analyze output files: PyMOlyze and GaussSum.||cclib is an open source Python module that can be used to extract information from the log files created by computational chemistry software. It can be used, for example, to extract energy levels or vibrational frequencies.||52.1975; 0.1252||0|
|N. M. O'Boyle; T. Albrecht; D. H. Murgida; L. Cassidy; J. Ulstrup; J. G. Vos||A density functional theory study of the electronic properties of Os(II) and Os(III) complexes immobilized on Au(111).||Inorganic Chemistry||Inorg. Chem.||2007||46||117||124||10.1021/ic060903e||Dublin_City_University; Han_Vos_Research_Group; Ireland; School_of_Chemical_Sciences; Dublin||We present a density functional theory (DFT) study of an osmium polypyridyl complex adsorbed on Au(111). The osmium polypyridyl complex [Os(bpy)<sub>2</sub>(P0P)Cl]<sup><i>n</i>+</sup> [bpy is 2,2′-bipyridine, P0P is 4,4′-bipyridine, <i>n</i> = 1 for osmium(II), and <i>n</i> = 2 for osmium(III)] is bound to the surface through the free nitrogen of the P0P ligand. The calculations illuminate electronic properties relevant to recent comprehensive characterization of this class of osmium complexes by electrochemistry and electrochemical scanning tunneling microscopy. The optimized structures for the compounds are in close agreement with crystallographic structures reported in the literature. Oxidation of the complex has little effect on these structural features, but there is a substantial reordering of the electronic energy levels with corresponding changes in the electron density. Significantly, the highest occupied molecular orbital shifts from the metal center to the P0P ligand. The surface is modeled by a cluster of 28 gold atoms and gives a good description of the effect of immobilization on the electronic properties of the complexes. The results show that the coupling between the immobilized complex and the gold surface involves electronic polarization at the adsorbate/substrate interface rather than the formation of a covalent bond. 
However, the cluster is too small to fully represent bulk gold with the result that, contrary to what is experimentally observed, the DFT calculation predicts that the gold surface is more easily oxidized than the osmium(II) complex.||53.3851; -6.2551||0|
|E. L. Willighagen; N. M. O'Boyle; H. Gopalakrishnan; D. Jiao; R. Guha; C. Steinbeck; D. J. Wild||Userscripts for the life sciences||BMC Bioinformatics||BMC Bioinformatics||2007||8||487||10.1186/1471-2105-8-487||Blue_Obelisk_Group||Greasemonkey; userscript; Firefox||<b>Background:</b>|
The web has seen an explosion of chemistry and biology related resources in the last 15 years: thousands of scientific journals, databases, wikis, blogs and resources are available with a wide variety of types of information. There is a huge need to aggregate and organise this information. However, the sheer number of resources makes it unrealistic to link them all in a centralised manner. Instead, search engines to find information in those resources flourish, and formal languages like Resource Description Framework and Web Ontology Language are increasingly used to allow linking of resources. A recent development is the use of userscripts to change the appearance of web pages, by on-the-fly modification of the web content. This opens possibilities to aggregate information and computational results from different web resources into the web page of one of those resources.
This paper discusses a number of userscripts that aggregate information from two or more web resources. Examples are shown that enrich web pages with information from other resources, and show how information from web pages can be used to link to, search, and process information in other resources. Due to the nature of userscripts, scientists are able to select those scripts they find useful on a daily basis, as the scripts run directly in their own web browser rather than on the web server. This flexibility allows the scientists to tune the features of web resources to optimise their productivity.
|D. S. Palmer; N. M. O'Boyle; R. C. Glen; J. B. O. Mitchell||Random Forest Models To Predict Aqueous Solubility||Journal of Chemical Information and Modeling||J. Chem. Inf. Model.||2007||47||150||158||10.1021/ci060164k||UK; Department_of_Chemistry; John_Mitchell_Group; Unilever_Centre_for_Molecular_Informatics; University_of_Cambridge||Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 °C gave an <i>r</i><sup>2</sup> = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.||52.1975; 0.1252||0|
|N. M. O'Boyle; G. L. Holliday; D. E. Almonacid; J. B. O. Mitchell||Using Reaction Mechanism to Measure Enzyme Similarity||Journal of Molecular Biology||J. Mol. Biol.||2007||368||1484||1499||10.1016/j.jmb.2007.02.065||The concept of reaction similarity has been well studied in terms of the overall transformation associated with a reaction, but not in terms of mechanism. We present the first method to give a quantitative measure of the similarity of reactions based upon their explicit mechanisms. Two approaches are presented to measure the similarity between individual steps of mechanisms: a fingerprint-based approach that incorporates relevant information on each mechanistic step; and an approach based only on bond formation, cleavage and changes in order. The overall similarity for two reaction mechanisms is then calculated using the Needleman–Wunsch alignment algorithm. An analysis of MACiE, a database of enzyme mechanisms, using our measure of similarity identifies some examples of convergent evolution of chemical mechanisms. In many cases, mechanism similarity is not reflected by similarity according to the EC system of enzyme classification. In particular, little mechanistic information is conveyed by the class level of the EC system.||52.1975; 0.1252||0|
|W. R. Browne; P. Passaniti; M. T. Gandolfi; R. Ballardini; W. Henry; A. Guckian; N. M. O'Boyle; J. J. McGarvey; J. G. Vos||Probing inter-ligand excited state interaction in homo and heteroleptic ruthenium(II) polypyridyl complexes using selective deuteriation||Inorganica Chimica Acta||Inorg. Chim. Acta||2007||360||1183||1190||10.1016/j.ica.2006.08.049||Han_Vos_Research_Group; Dublin_City_University; Dublin; Ireland||The effect of deuteriation on the photophysical properties of two series of regioselectively deuteriated Ru(II) complexes ([Ru(bipy)<sub><i>x</i></sub>(ph<sub>2</sub>phen)<sub>3−<i>x</i></sub>]<sup>2+</sup>, where <i>x</i> = 0–3 and ph<sub>2</sub>phen is 4,7-diphenyl-1,10-phenanthroline and [Ru(bipy)<sub>2</sub>(dcbipy<sup>2−</sup>)], where H<sub>2</sub>dcbipy is 4,4′-dicarboxy-2,2′-bipyridyl) is reported. Although overall, deuteriation results in an increase in emission lifetime for all complexes, the effect of substitution of hydrogen for deuterium shows strong regioselectivity both in terms of the ligand and the position on individual ligands that are exchanged.||53.3851; -6.2551||0|
|G. L. Holliday; D. E. Almonacid; G. J. Bartlett; N. M. O'Boyle; J. W. Torrance; P. Murray-Rust; J. B. O. Mitchell; J. M. Thornton||MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms||Nucleic Acids Research||Nucleic Acids Res.||2007||35||D515||D520||10.1093/nar/gkl774||John_Mitchell_Group; Unilever_Centre_for_Molecular_Informatics; Department_of_Chemistry; University_of_Cambridge; UK||MACiE||MACiE (Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms, and is publicly available as a web-based data resource. This paper presents the first release of a web-based search tool to explore enzyme reaction mechanisms in MACiE. We also present Version 2 of MACiE, which doubles the dataset available (from Version 1). MACiE can be accessed from <a href="http://www.ebi.ac.uk/thornton-srv/databases/MACiE/">http://www.ebi.ac.uk/thornton-srv/databases/MACiE/</a>.||52.1975; 0.1252||2|
|R. M. Jarvis; D. Broadhurst; H. Johnson; N. M. O'Boyle; R. Goodacre||PYCHEM: a multivariate analysis package for python||Bioinformatics||Bioinformatics||2006||22||2565||2566||10.1093/bioinformatics/btl416||The_World||We have implemented a multivariate statistical analysis toolbox, with an optional standalone graphical user interface (GUI), using the Python scripting language. This is a free and open source project that addresses the need for a multivariate analysis toolbox in Python. Although the functionality provided does not cover the full range of multivariate tools that are available, it has a broad complement of methods that are widely used in the biological sciences. In contrast to tools like MATLAB, PyChem 2.0.0 is easily accessible and free, allows for rapid extension using a range of Python modules and is part of the growing amount of complementary and interoperable scientific software in Python based upon SciPy. One of the attractions of PyChem is that it is an open source project and so there is an opportunity, through collaboration, to increase the scope of the software and to continually evolve a user-friendly platform that has applicability across a wide range of analytical and post-genomic disciplines.||52.1975; 0.1252||2|
|G. L. Holliday; G. J. Bartlett; D. E. Almonacid; N. M. O'Boyle; P. Murray-Rust; J. M. Thornton; J. B. O. Mitchell||MACiE: a database of enzyme reaction mechanisms||Bioinformatics||Bioinformatics||2005||21||4315||4316||10.1093/bioinformatics/bti693||John_Mitchell_Group; Unilever_Centre_for_Molecular_Informatics; University_of_Cambridge; UK; Department_of_Chemistry||MACiE||MACiE (mechanism, annotation and classification in enzymes) is a publicly available web-based database, held in CMLReact (an XML application), that aims to help our understanding of the evolution of enzyme catalytic mechanisms and also to create a classification system which reflects the actual chemical mechanism (catalytic steps) of an enzyme reaction, not only the overall reaction.||52.1975; 0.1252||2|
|W. Henry; W. R. Browne; K. L. Ronayne; N. M. O'Boyle; J. G. Vos; J. J. McGarvey||Ground vs. excited state interaction in ruthenium-thienyl dyads: implications for through bond interactions in multicomponent systems||Journal of Molecular Structure||J. Mol. Struct.||2005||735-736||123||134||10.1016/j.molstruc.2004.10.114||School_of_Chemical_Sciences; Dublin; Dublin_City_University; Han_Vos_Research_Group; Ireland||The vibrational and photophysical properties of mononuclear ruthenium(II) and ruthenium(III) polypyridyl complexes based on the ligands 2-(5′-(pyridin-2″-yl)-1′H-1′,2′,4′-triaz-3′-yl)-thiophene, 2-(5′-(pyrazin-2″-yl)-1′H-1′,2′,4′-triaz-3′-yl)-thiophene, are reported. The effect of the introduction of the non-innocent thiophene group on the properties of the triazole based ruthenium(II) complex is examined. The pH sensitive 1,2,4-triazole group, although influenced by the electron withdrawing nature of the thiophene group, does not facilitate excited state interaction of the thiophene and Ru(II) centre. Deuteriation and DFT calculations are employed to enable a deeper understanding of the interaction between the two redox-active centres and rationalise the difference between the extent of ground and excited state interaction in this simple dyad. The results obtained provide considerable evidence in support of earlier studies examining differences in ground and excited state interaction in multinuclear thiophene-bridged systems, in particular with respect to HOMO- and LUMO- mediated superexchange interaction processes.||53.3851; -6.2551||0|
|W. R. Browne; N. M. O'Boyle; W. Henry; A. L. Guckian; S. Horn; T. Fett; C. M. O'Connor; M. Duati; L. De Cola; C. G. Coates; K. L. Ronayne; J. J. McGarvey; J. G. Vos||Ground- and Excited-State Electronic Structure of an Emissive Pyrazine-Bridged Ruthenium(II) Dinuclear Complex||Journal of the American Chemical Society||J. Am. Chem. Soc.||2005||127||1229||1241||10.1021/ja046034e||Dublin; Dublin_City_University; Han_Vos_Research_Group; Ireland; School_of_Chemical_Sciences||The synthesis, characterization, and electrochemical, photophysical, and photochemical properties of the binuclear compounds [(Ru(H<sub>8</sub>-bpy)<sub>2</sub>)<sub>2</sub>((Metr)<sub>2</sub>Pz)](PF<sub>6</sub>)<sub>2</sub> (<b>1</b>) and [(Ru(D<sub>8</sub>-bpy)<sub>2</sub>)<sub>2</sub>((Metr)<sub>2</sub>Pz)](PF<sub>6</sub>)<sub>2</sub> (<b>2</b>), where bpy is 2,2'-bipyridine and H<sub>2</sub>(Metr)<sub>2</sub>Pz is the planar ligand 2,5-bis(5'-methyl-4'<i>H</i>-[1,2,4]triaz-3'-yl)pyrazine, are reported. Electrochemical and spectro-electrochemical investigations indicate that the ground-state interaction between each metal center is predominantly electrostatic and in the mixed-valence form only a low level of ground-state delocalization is present. Resonance Raman, transient, and time-resolved spectroscopies enable a detailed assignment to be made of the excited-state photophysical properties of the complexes. Deuteriation is employed to both facilitate spectroscopic characterization and investigate the nature of the lowest excited states.||53.3851; -6.2551||0|
|W. R. Browne; N. M. O'Boyle; J. J. McGarvey; J. G. Vos||Elucidating excited state electronic structure and intercomponent interactions in multicomponent and supramolecular systems||Chemical Society Reviews||Chem. Soc. Rev.||2005||34||641||663||10.1039/b400513a||Dublin; Dublin_City_University; Han_Vos_Research_Group; Ireland; School_of_Chemical_Sciences||Rational design of supramolecular systems for application in photonic devices requires a clear understanding of both the mechanism of energy and electron transfer processes and how these processes can be manipulated. Central to achieving these goals is a detailed picture of their electronic structure and of the interaction between the constituent components. We review several approaches that have been taken towards gaining such understanding, with particular focus on the physical techniques employed. In the discussion, case studies are introduced to illustrate the key issues under consideration.||53.3851; -6.2551||0|
|L. O'Brien; M. Duati; S. Rau; A. L. Guckian; T. E. Keyes; N. M. O'Boyle; A. Serr; H. Gorls; J. G. Vos||Synthesis and characterisation of ruthenium complexes containing a pendent catechol ring||Dalton Transactions||Dalton Trans.||2004||514||522||10.1039/b311989k||The_World||A series of [Ru(bipy)<small><sub>2</sub></small>L]<small><sup>+</sup></small> and [Ru(phen)<small><sub>2</sub></small>L]<small><sup>+</sup></small> complexes where L is 2-[5-(3,4-dimethoxyphenyl)-4<i>H</i>-1,2,4-triazol-3-yl]pyridine (<strong>HL1</strong>) and 4-(5-pyridin-2-yl-4<i>H</i>-1,2,4-triazol-3-yl)benzene-1,2-diol (<strong>HL2</strong>) are reported. The compounds obtained have been characterised using X-ray crystallography, NMR, UV/Vis and emission spectroscopies. Partial deuteriation is used to determine the nature of the emitting state and to simplify the NMR spectra. The acid-base properties of the compounds are also investigated. The electronic structures of [Ru(bipy)<small><sub>2</sub></small><strong>L1</strong>]<small><sup>+</sup></small> and [Ru(bipy)<small><sub>2</sub></small><strong>HL1</strong>]<small><sup>2+</sup></small> are examined using ZINDO. Electro and spectroelectrochemical studies on [Ru(bipy)<small><sub>2</sub></small>(L2)]<small><sup>+</sup></small> suggest that proton transfer between the catechol and triazole moieties on <strong>L2</strong> takes place upon oxidation of the <strong>L2</strong> ligand.||53.3851; -6.2551||0|
|A. L. Guckian; M. Doering; M. Ciesielski; O. Walter; J. Hjelm; N. M. O'Boyle; W. Henry; W. R. Browne; J. J. McGarvey; J. G. Vos||Assessment of intercomponent interaction in phenylene bridged dinuclear ruthenium(II) and osmium(II) polypyridyl complexes||Dalton Transactions||Dalton Trans.||2004||3943||3949||10.1039/b409189b||Dublin; Dublin_City_University; Han_Vos_Research_Group; Ireland; School_of_Chemical_Sciences||The synthesis and characterisation of [Ru(bipy)<small><sub>2</sub></small>(<strong>L1</strong>)]<small><sup>2+</sup></small> and the homodinuclear complexes [M(bipy)<small><sub>2</sub></small>(<strong>L1</strong>)M(bipy)<small><sub>2</sub></small>]<small><sup>4+</sup></small>(where M = Ru or Os), employing the ditopic ligand, 1,4-phenylene-bis(1-pyridin-2-ylimidazo[1,5-<i>a</i>]pyridine)(<strong>L1</strong>), are reported. The complexes are identified by elemental analysis, UV/Vis, emission, resonance Raman, transient resonance Raman and <small><sup>1</sup></small>H NMR spectroscopy, mass spectrometry and electrochemistry. The X-ray structure of the complex [Ru(bipy)<small><sub>2</sub></small>(<strong>L1</strong>)(bipy)<small><sub>2</sub></small>Ru](PF<small><sub>6</sub></small>)<small><sub>4</sub></small> is also reported. DFT calculations, carried out to model the electronic properties of the compounds, are in good agreement with experiment. Minimal communication between the metal centres is observed. The low level of ground state electronic interaction is rationalized in terms of the poor ability of the phenyl spacer in facilitating superexchange interactions. Using the electronic and electrochemical data a detailed picture of the electronic properties of the <strong>RuRu</strong> compound is presented.||53.3851; -6.2551||0|