2014

- R. Baron, “Molecular Simulation for Molecular Biology,” Mol. Simulation, March issue on advances in molecular simulation, 2014.
- M. G. Lopez, M. Horton, and E. Chow, “Brownian Dynamics Simulations with Hydrodynamic Interactions on GPUs”, The 3rd Conference of the Extreme Science and Engineering Discovery Environment (XSEDE ’14), July, 2014.
- J. Young, M. G. Lopez, M. Horton, R. Glassbrook, and J.S. Vetter, “Advanced Application Support for Improved GPU Utilization on Keeneland”, The 3rd Conference of the Extreme Science and Engineering Discovery Environment (XSEDE ’14), July, 2014.

2013

- J. Briggs, J. Robertson, N. Hurley, et al., “Expanding the Druggable Space of the LSD1/CoREST Epigenetic Target: New Potential Binding Regions for Drug-Like Molecules, Peptides, Protein Partners, and Chromatin,” PLoS Computational Biology, 9 / 7, 2013.
- L. M. Burko and G. Khanna, “Self-force gravitational waveforms for extreme and intermediate mass ratio inspirals. II: Importance of the second-order dissipative effect,” Phys. Rev. D, 88, 2013.
- M. Burtscher and H. Rabeti, "A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches." 27th IEEE International Parallel & Distributed Processing Symposium, 1289-1298, 2013.
- C. Cao, J. Dongarra, P. Du, M. Gates, P. Luszczek, and S. Tomov, “clMAGMA: High Performance Dense Linear Algebra with OpenCL”, The International Workshop on OpenCL (IWOCL’13), Atlanta, GA, May 13-14, 2013.
- Y. Cui, E. Poyraz, K. Olsen, J. Zhou, K. Withers, S. Callaghan, J. Larkin, C. Guest, D. Choi, A. Chourasia, Z. Shi, S. Day, P. Maechling, and T. Jordan, “Physics-based Seismic Hazard Analysis on Petascale Heterogeneous Supercomputers,” SC13, Denver, Nov 17-22, 2013, accepted.
- Y. Cui, E. Poyraz, J. Zhou, S. Callaghan, P. Maechling, T. Jordan, L. Shih, and P. Chen, “Accelerating CyberShake Calculations on the XE6/XK7 Platform of Blue Waters,” Extreme Scaling Workshop, Boulder, August 14-15, 2013 (accepted).
- E. D'Azevedo, Z. Hu, S. Su, and K. Wong, "A Performance Study of Solving a Large Dense Matrix for Radiation Heat Transfer,”
- S. Donfack, S. Tomov, and J. Dongarra, “Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” submitted to SC’13, 2013.
- T. Dong, V. Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, “Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters,” UTK CS Technical report, May, 2013.
- T. Dong, J. Dongarra, S. Tomov, and I. Yamazaki, “Tridiagonalization of a symmetric dense matrix on a GPU cluster,” AsHES’13 IPDPS workshop, Boston, USA, 2013.
- A. M. Eiring, B. D. G. Page, I. L. Kraft, T. Y. Zhang, N. A. Vellore, K. R. Reynolds, A. Senina, A. D. Pomicter, J. S. Khorashad, Z. Gu, D. J. Anderson, M. S. Zabriskie, C. C. Arpin, R. Cologouri, S. Ahmad, R. Moriggl, R. Baron, T. O’Hare, P. T. Gunning, and M. W. Deininger, “BP5-087, a Potent STAT3 Inhibitor, Combines with BCR-ABL1 Inhibition to Overcome Kinase-Independent TKI Resistance in Chronic Myeloid Leukemia,” Cancer Cell, to be submitted.
- J. Glaser, J. Qin, P. Medapuram, and D. C. Morse, "Collective and single-chain correlations in disordered diblock copolymer melts: Comparison of simulations and theory," in preparation.
- “GPU-enabled Studies of Molecular Systems on Keeneland at ORNL - On pursuing high resource utilization and coordinated simulations’ progression,” NVIDIA GPU Technology Conference, San Jose, California, March 2013.
- A. Haidar, M. Gates, S. Tomov, and J. Dongarra, “Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” ICS’2013, Eugene, Oregon, USA, 2013.
- A. Haidar, R. Solca, M. Gates, S. Tomov, T. Schulthess, and J. Dongarra, “Leading edge hybrid multi-GPU algorithms for generalized eigenproblems in electronic structure calculations,” ISC’13, Leipzig, Germany, 2013.
- C. Hall, W. Ji, and E. Blaisten-Barojas, "The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units," submitted, 2013.
- N. M. Henriksen, D. R. Roe, and T. E. Cheatham III, “Reliable Oligonucleotide Conformational Ensemble Generation in Explicit Solvent for Force Field Assessment Using Reservoir Replica Exchange Molecular Dynamics Simulations,” The Journal of Physical Chemistry B, 117, 4014-4027, 2013.
- F. Q. Hu, I. Kocaogul, and X. D. Li, “On the adjoint problem in duct acoustics and its solution by Time Domain Wave Packet Method,” AIAA paper, 2012-2247, 2013.
- F. Q. Hu, X. D. Li, M. Jiang, and X. Y. Li, “Time Domain Wave Packet method and suppression of instability wave in aeroacoustic computations,” revised and submitted to Journal of Fluid Engineering, 2013.
- F. Q. Hu, “An efficient solution of time domain boundary integral equations for acoustic scattering and its acceleration by Graphics Processing Units,” AIAA paper, 2013-2018, 2013.
- G. Khanna, “High-Precision Numerical Simulations on a CUDA GPU: Kerr Black Hole Tails,” Journal of Scientific Computing, 56, 366, 2013.
- N. Ivanova, S. Justham, J. L. Avendano Nandez, and J. C. Lombardi, "Identification of the Long-Sought Common-Envelope Events," Science, 339, 2013.
- W. Ji, C. Hall, and E. Blaisten-Barojas, "Monte Carlo study of highly oxidized oligopyrroles in condensed phases," submitted August 2013.
- R. A. Kulkarni, S. M. Stanford, N. A. Vellore, M. R. Bliss, D. Krishnamurthy, R. Baron, N. Bottini, and A. M. Barrios, “Thiuram Disulfides as Pseudo-irreversible Inhibitors of the Lymphoid Tyrosine Phosphatase,” ChemMedChem, in press.
- R. A. Kulkarni, N. A. Vellore, M. R. Bliss, S. M. Stanford, N. Bottini, R. Baron, and A. M. Barrios, “A Combined Computational and Experimental Approach Identifies Potent Lymphoid Tyrosine Phosphatase Inhibitors,” ChemBioChem, in press.
- "Language Support for Dynamic Hierarchical Data Partitioning", OOPSLA 2013 (to appear)
- T. S. Lee, “On the negative regulation and activation of JAK2: A novel hypothetical model,” Molecular Cancer Research, 2013, in press.
- T. S. Lee, B. K. Radak, A. Pabis, and D. M. York, “A New Maximum Likelihood Approach for Free Energy Profile Construction from Molecular Simulations,” Journal of chemical theory and computation, 9 (1), 153-164, 2013.
- H. Li, G. Fox, G. Laszewski, Z. Guo, and J. Qui, “Co-processing SPMD Computation on GPUs and CPUs on Shared Memory System,”
- H. Liu, J. Seo, R. Mittal, and H. H. Huang, “GPU-Accelerated Scalable Solver for Banded Linear Systems,” Proceedings of IEEE Cluster 2013, To Appear.
- “MAGMA: State-of-the-art developments in high-performance linear algebra for GPUs,” Mini-symposium at the GPU Technology Conference 2013, San Jose, CA, March 11-21, 2013.
- P. Medapuram and D. C. Morse, "Thermodynamic Integration Method of Estimating ODT in Diblock Copolymers," in preparation.
- “Numerical Linear Algebra Libraries for Emerging Architectures: Challenges and Approaches,” Presentation at Numerical Methods for PDEs: in Occasion of Raytcho Lazarov’s 70th Birthday, College Station, TX, January 25-26, 2013.
- A. Ozsoy, A. Chauhan, and M. Swany, "Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs," The 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS'13), 2013, submitted.
- J. D. Perlmutter, C. Qiao, and M. F. Hagan, “Viral genome structures are optimal for capsid assembly,” eLife, 2:e00632, 2013.
- S. Potluri, D. Bureddy, H. Wang, H. Subramoni and D. K. Panda, “Extending OpenSHMEM for GPU Computing,” Int'l Parallel and Distributed Processing Symposium (IPDPS '13), May 2013.
- T. Ruiz-Herrero and M. F. Hagan, “Virus assembly on a membrane is facilitated by membrane microdomains,” PLoS Comp. Biol. submitted.
- V. H. Rusu, V. A. C. Horta, B. A. C. Horta, R. D. Lins, and R. Baron, “MDWiZ: A Platform for the Automated Preparation and Translation of Molecular Dynamics Simulations,” J. Mol. Graph. Model., to be submitted.
- J. H. Saltz, G. Teodoro, T. Pan, L. AD Cooper, J. Kong, S. Klasky, and T. M. Kurc, "Feature-based analysis of large-scale spatio-temporal sensor data on hybrid architectures," International Journal of High Performance Computing Applications, 2013.
- S. Schlachter, S. Herbein, S. Ou, J.S. Logan, S. Patel, and M. Taufer, “Efficient SDS Simulations on Multi-GPU Nodes of XSEDE High-end Clusters,” Proceedings of the Eighth IEEE International Conference on e-Science and Grid Technologies (eScience), Beijing, China, October, 2013.
- S. Schlachter, S. Herbein, S. Ou, J.S. Logan, S. Patel, and M. Taufer, “Pursuing Resource Utilization and Coordinated Progression in GPU-enabled Molecular Simulations,” IEEE Design & Test of Computers, 2013, in review.
- P. Setny, R. Baron, P. M. Kekenes-Huskey, J. A. McCammon, and J. Dzubiella, “Solvent Fluctuations in Hydrophobic Cavity-Ligand Binding Kinetics,” Proc. Natl. Acad. Sci. USA, 110, 1197-1202, 2013.
- P. Setny, R. Baron, and J. A. McCammon, Comment on “Molecular Driving Forces of the Pocket-Ligand Hydrophobic Association,” by G. Graziano, Chem. Phys. Lett. 533, 2012; Chem. Phys. Lett., 555, 306-309, 2013.
- D. A. Sivak, J. D. Chodera, and G. E. Crooks, "Using nonequilibrium fluctuation theorems to understand and correct errors in equilibrium and nonequilibrium discrete Langevin dynamics simulations," Phys. Rev. X, 3:011007, 2013.
- S. Song, "Power, Performance and Energy Models and Systems for Emergent Architectures," PhD diss., Virginia Polytechnic Institute and State University, 2013.
- S. Song, C. Su, B. Rountree, and K. W. Cameron, "A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures," 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2013.
- W. Tang, "Parallel construction of large circular cartograms using graphics processing units," International Journal of Geographical Information Science, 1-25, 2013.
- G. Teodoro, T. Pan, T. M. Kurc, J. Kong, L. AD Cooper, and J. H. Saltz. "Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines," Parallel Computing, 2013.
- G. Teodoro, T. Pan, T. M. Kurc, J. Kong, L. AD Cooper, N. Podhorszki, S. Klasky, and J. H. Saltz, "High-throughput analysis of large microscopy image datasets on CPU-GPU cluster platforms," In Parallel & Distributed Processing (IPDPS), IEEE 27th International Symposium on, 103-114. IEEE, 2013.
- M. Tortorici, M. Borrello, M. Tardugno, et al., “Protein Recognition by Short Peptide Reversible Inhibitors of the Chromatin-Modifying LSD1/CoREST Lysine Demethylase,” ACS Chemical Biology, 2013.
- S. Treichler, M. Bauer, and A. Aiken, “Language Support for Dynamic, Hierarchical Data Partitioning,” Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2013.
- N. A. Vellore and R. Baron, “Epigenetic Molecular Recognition: A Biomolecular Modeling Perspective,” ChemMedChem, 2013.
- J. S. Vetter, R. Glassbrook, K. Schwan, S. Yalamanchili, M. Horton, A. Gavrilovska, M. Slawinska, J. Dongarra, J. S. Meredith, P. C. Roth, et al., “Keeneland: Computational Science using Heterogeneous GPU Computing,” Contemporary High Performance Computing: From Petascale Toward Exascale, Volume 1, Boca Raton, p.900, 2013.
- R. C. Walker and R. M. Betz, "An investigation of the effects of error correcting code on GPU-accelerated molecular dynamics simulations," Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery, 8, ACM, 2013.
- K. Wang, J.D. Chodera, Y. Yang, and M.R. Shirts, “Identifying ligand binding sites and poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics”, Journal of Computer-Aided Molecular Design, (27) 12, 989-1007, 2013.
- B. Wibking, "Simulating the universe with GPU-accelerated supercomputers: N-body methods, tests, and examples," 2013.
- I. Yamazaki, T. Dong, R. Solca, S. Tomov, J. Dongarra, and T. Schulthess, "Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems," Concurrency and Computation: Practice and Experience, submitted.
- J. Yan, L. Li, and C. O’Grady, “Graphics Processing Unit acceleration of the Random Phase Approximation in the projector augmented wave method,” Computer Physics Communications, 2013.
- M. S. Zabriskie, C. A. Eide, T. Lange, J. S. Khorashad, N. A. Vellore, J. C. Robertson, B. J. Druker, R. Baron, M. W. Deininger, and T. O’Hare, “Profiling BCR-ABL1 compound mutant resistance in Philadelphia chromosome-positive leukemia,” Cancer Discovery, to be submitted.
- J. Zhou, Y. Cui, E. Poyraz, D. J. Choi, and C. C. Guest, "Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers," International Conference on Computational Science, Spain, 2013.

2012

- “Acceleration of the BLAST hydro code on GPUs,” Poster at SC’12, Salt Lake City, Utah, November, 2012.
- “Achieving high performance with multiple-GPU non-symmetric eigenvalue solver,” Presentation at SIAM CSE’13, Boston, MA, February 25-March 1, 2012.
- “A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines,” Presentation at ICCS’12, Omaha, Nebraska, June 4-6, 2012.
- “A Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations,” Presentation at SIAM CSE’13, Boston, MA, February 25-March 1, 2012.
- “A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine grained memory aware tasks,” Poster at the International Conference for High Performance Computing, Networking, Storage and Analysis SC’12, Salt Lake City, Utah, November, 2012.
- H. Anzt, S. Tomov, J. Dongarra, and V. Heuveline, “A Block-Asynchronous Relaxation Method for Graphics Processing Units,” SIAM Journal on Scientific Computing, 2012.
- H. Anzt, S. Tomov, J. Dongarra, and V. Heuveline, “Weights for Block-Asynchronous Iteration on GPU-Accelerated Systems,” Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms, Best paper award, Rhodes Island, Greece, August 2012.
- H. Anzt, S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, “Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” Proc. of ICCS’12, Omaha, Nebraska, June 4-6, 2012.
- M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy, and S. Tomov, “A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines,” Proc. of ICCS’12, Omaha, Nebraska, June 4-6, 2012.
- E. Barausse, A. Buonanno, S. A. Hughes, G. Khanna, S. O'Sullivan, and Y. Pan, “Modeling multipolar gravitational-wave emission from small mass-ratio mergers,” Phys. Rev. D 85, 024046, 2012.
- R. Baron and V. Molinero, “Water-Driven Cavity–Ligand Binding: Comparison of Thermodynamic Signatures from Coarse-Grained and Atomic-Level Simulations,” Journal of Chemical Theory and Computation, 8/10, 2012.
- R. Baron, P. Setny, and F. Paesani, “Water Structure, Dynamics, and Spectral Signatures: Changes Upon Model Cavity-Ligand Recognition,” J. Phys. Chem. B., 46, 13774-13780, 2012.
- R. Baron and N. A. Vellore, “LSD1/CoREST is an Allosteric Nanoscale Clamp Regulated by H3-Histone-Tail Molecular Recognition,” Proc. Natl. Acad. Sci. USA, 109, 12509-12514, 2012.
- M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, “Legion: Expressing Locality and Independence with Logical Regions,” SC’12, Salt Lake City, Utah, November, 2012.
- “Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” Presentation at ICCS’12, Omaha, Nebraska, June 4-6, 2012.
- E. F. Bollig, N. Flyer, and G. Erlebacher, "Solution to PDEs using radial basis function finite-differences (RBF-FD) on multiple GPUs," Journal of Computational Physics, 2012.
- M. Burtscher and H. Rabeti, “A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches,” 27th IEEE International Parallel & Distributed Processing Symposium, May 2013.
- R. M. Caplan, "NLSEmagic: nonlinear Schrödinger equation multidimensional Matlab-based GPU-accelerated integrators using compact high-order schemes," Computer Physics Communications, 2012.
- M. J. Cawkwell, E. J. Sanville, S. M. Mniszewski, and A. MN Niklasson, "Computing the density matrix in electronic structure theory on graphics processing units," Journal of Chemical Theory and Computation 8, 4094-4101, 2012.
- “clMAGMA: Heterogeneous High-Performance Linear Algebra with OpenCL,” AMD Fusion Developer Summit, Bellevue, WA, June 11-14, 2012.
- “Computational Challenges for Large Scale Heterogeneous Applications,” Mini-symposiums MS 44 and MS63 at SIAM CSE’13, Boston, MA, February 25-March 1, 2012.
- K. Czechowski, C. Battaglino, C. McClanahan, K. Iyer, P-K. Yeung, and R. Vuduc, "On the communication complexity of 3D FFTs and its implications for exascale," Proceedings of the 26th ACM international conference on Supercomputing, 205-214, ACM, 2012.
- P. D'Alberto, "A Heterogeneous Accelerated Matrix Multiplication: OpenCL+ APU+ GPU+ Fast Matrix Multiply," arXiv preprint arXiv:1205.2927, 2012.
- A. Danalis, C. McCurdy, and J. S. Vetter, “Efficient Quality Threshold Clustering for Parallel Architectures,” IEEE International Parallel and Distributed Processing Symposium, Shanghai, 2012.
- H. Dashti, A. Siahpirani, J. Driver, and A. H. Assadi. "Information Surfaces in Systems Biology and Applications to Engineering Sustainable Agriculture." Technological Innovation for Value Creation, 77-84, Springer Berlin Heidelberg, 2012.
- “Dense Linear Algebra Libraries for High Performance Computing,” ISC’12 Tutorial 2, June 17, 2012.
- “Developing Numerical Algorithms on Heterogeneous Architectures with High Productivity in Mind,” Presentation at SIAM CSE’13, Boston, MA, February 25-March 1, 2012.
- S. Donfack, S. Tomov, and J. Dongarra, “Performance evaluation of LU factorization in PLASMA and CALU through hardware counter measurement,” ICL Technical report, August, 2012.
- T. Dong, T. Kolev, R. Rieben, V. Dobrev, S. Tomov, and J. Dongarra, “Acceleration of the BLAST hydro code on GPUs”, Poster at SC’12, Salt Lake City, Utah, November, 2012.
- P. Du, S. Tomov, and J. Dongarra, “Providing GPU capability to LU and QR within the ScaLAPACK framework,” LAPACK Working Note #272 (also available as UTK CS Technical report UT-CS-12-699), September, 2012.
- M. K. Elteir, "A MapReduce Framework for Heterogeneous Computing Architectures," PhD diss., Virginia Polytechnic Institute and State University, 2012.
- “Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems,” Presentation at ICS’12, 365-376, Venice, Italy, June 25-29, 2012.
- S. Gao and G. D. Peterson. "GASPRNG: GPU accelerated scalable parallel random number generator library," Computer Physics Communications, 2012.
- J. Glaser, J. Qin, P. Medapuram, M. Mueller, and D. C. Morse, "Test of a scaling hypothesis for the structure factor of disordered diblock copolymer melts," Soft Matter, 8, 11310-11317, 2012.
- S. Le Grand, A. W. Götz, and R. C. Walker, "SPFP: Speed without compromise: A mixed precision model for GPU accelerated molecular dynamics simulations," Computer Physics Communications, 2012.
- A. Haidar, S. Tomov, J. Dongarra, R. Solca, and T. Schulthess, “A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine grained memory aware tasks,” Poster at SC’12, Salt Lake City, Utah, November, 2012.
- C. Hsieh, C. Chou, T. Tsai, Y. Cheng, and S. Kuo, "NCHC's Formosa V GPU cluster enters the TOP500 ranking," Cloud Computing Technology and Science (CloudCom), IEEE 4th International Conference on, 622-624. IEEE, 2012.
- H. Huang, L. Wang, E. Lee, and P. Chen, "An MPI-CUDA implementation and optimization for parallel sparse equations and least squares (LSQR)," Procedia Computer Science, 9, 76-85, 2012.
- A. Humphrey, Q. Meng, M. Berzins, and T. Harman, “Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System," Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment (XSEDE 2012), 2012.
- D. D. Jenkins, R. J. Hinde, and G. D. Peterson, "Quantum Mechanical Simulations of Crystalline Helium Using High Performance Architectures," High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, 1468-1469, IEEE, 2012.
- F. Ji, A. M. Aji, J. Dinan, D. Buntinas, P. Balaji, W. Feng, and X. Ma, "Efficient intranode communication in GPU-accelerated systems," Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), IEEE 26th International, 1838-1847, IEEE, 2012.
- F. Ji, A. M. Aji, J. Dinan, D. Buntinas, P. Balaji, R. Thakur, W. Feng, and X. Ma, "DMA-Assisted, Intranode Communication in GPU Accelerated Systems," High Performance Computing and Communication & IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 461-468, IEEE, 2012.
- K. Kasichayanula, D. Terpstra, P. Luszczek, S. Tomov, S. Moore, and G. Peterson, "Power Aware Computing on GPUs," SAAHPC '12 (Best Paper Award), Argonne, IL, July 10-11, 2012.
- A. Khajeh-Saeed, and J. B. Perot, "Direct numerical simulation of turbulence using GPU accelerated supercomputers," Journal of Computational Physics, 2012.
- J. Kurzak, R. Nath, P. Du, and J. Dongarra, "An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs," Applied Parallel and Scientific Computing, Kristjan Jonasson, ed., 7133, 248-257, 2012.
- J. Kurzak, P. Luszczek, M. Faverge, and J. Dongarra, "Programming the LU Factorization for a Multicore System with Accelerators," Proceedings of VECPAR’12, Kobe, Japan, April, 2012.
- J. Kurzak, P. Luszczek, and J. Dongarra, "LU Factorization with Partial Pivoting for a Multicore System with Accelerators," IEEE Transactions on Parallel and Distributed Computing (accepted), August, 2012.
- S. Lee and J. S. Vetter, “Early Evaluation of Directive-Based GPU Programming Models for Productive Exascale Computing,” International Conference for High Performance Computing, Networking, Storage and Analysis SC’12, Salt Lake City, Utah, November, 2012.
- “Legion: Expressing Locality and Independence with Logical Regions,” SC’12, Salt Lake City, Utah, November, 2012.
- “Linear Algebra Libraries for High-Performance Computing: Scientific Computing with Multicore and Accelerators,” Tutorial at SC12, Salt Lake City, Utah, November, 2012.
- “MAGMA – a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures," GPU Technology Theater at SC12, Salt Lake City, Utah, November, 2012.
- “Matrices Over Runtime Systems @ Exascale,” Poster at SC’12, Salt Lake City, Utah, November, 2012.
- J. McKennon, G. Forrester, and G. Khanna, “High Accuracy Gravitational Waveforms from Black Hole Binary Inspirals Using OpenCL,” Proceedings of the XSEDE12 Conference, Chicago, IL, 2012.
- Q. Meng, A. Humphrey, and M. Berzins, "The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System," Digital Proceedings of The International Conference for High Performance Computing, Networking, Storage and Analysis, WOLFHPC Workshop, 2012.
- J. S. Meredith, R. Sisneros, D. Pugmire, and S. Ahern, "A distributed data-parallel framework for analysis and visualization algorithm development," Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 11-19, ACM, 2012.
- “Multi-GPU Tridiagonalization on Shared-and-distributed-memory systems,” Presentation at SIAM CSE’13, Boston, MA, February 25-March 1, 2012.
- “One-sided dense matrix factorizations on a multicore with multiple GPU accelerators,” Presentation at ICCS’12, Omaha, Nebraska, June 4-6, 2012.
- T. K. Samuel, S. McNally, and J. Wynkoop, "An analysis of GPU utilization trends on the Keeneland initial delivery system," Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond, 5, ACM, 2012.
- F. Song and J. Dongarra, “A scalable framework for Heterogeneous GPU-Based Clusters,” The 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012), Pittsburgh, USA, 2012.
- F. Song, S. Tomov, and J. Dongarra, “Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems,” Proc. of ICS’12, 365-376, Venice, Italy, June 25-29, 2012.
- L. S. Song, C. Su, B. Rountree, and K. W. Cameron, "Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems," High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, 1344-1345, IEEE, 2012.
- K. Spafford, J. S. Meredith, S. Lee, D. Li, P. C. Roth, and J. S. Vetter, “The Tradeoffs of Fused Memory Hierarchies in Heterogeneous Architectures,” ACM Computing Frontiers (CF), Cagliari, Italy, ACM, 2012.
- K. L. Spafford, and J. S. Vetter, "Aspen: a domain specific language for performance modeling," High Performance Computing, Networking, Storage and Analysis (SC), International Conference for, 1-11, IEEE, 2012.
- J. M. Swails and A. E. Roitberg, "Enhancing conformation and protonation state sampling of hen egg white lysozyme using pH replica exchange molecular dynamics," Journal of Chemical Theory and Computation, 8, 4393-4404, 2012.
- G. Teodoro, T. Pan, T. M. Kurc, J. Kong, L. AD Cooper, N. Podhorszki, S. Klasky, and J. H. Saltz, "CPU-GPU Cluster Platforms," Technical Report, Center for Comprehensive Informatics, 2012.
- G. Teodoro, T. M. Kurc, T. Pan, L. AD Cooper, J. Kong, P. Widener, and J. H. Saltz. "Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems,” Parallel & Distributed Processing Symposium (IPDPS), IEEE 26th International, 1093-1104, 2012.
- G. Teodoro, T. Pan, T. M. Kurc, J. Kong, L. AD Cooper, and J. H. Saltz. "High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms," arXiv preprint arXiv:1209.3332, 2012.
- V. Tipparaju and J. S. Vetter, “GA-GPU: Extending a Library-based Global Address Space Programming Model for Scalable Heterogeneous Computing Systems,” ACM Computing Frontiers (CF), 2012.
- L. Wan, C. R. Iacovella, T. D. Nguyen, H. Docherty, and P. T. Cummings, "Confined fluid and the fluid-solid transition: Evidence from absolute free energy calculations," Physical Review B 86, 214105, 2012.
- H. Wu, G. Diamos, J. Wang, H. Cadambi, S. Yalamanchili, and S. Chakradhar, “Optimizing Data Warehousing Applications for GPUs using Kernel Fusion/Fission,” Workshop on Multicore and GPU Programming Models, Languages and Compilers, May, 2012.
- J. Wu, B. Hong, T. Takeda, and J. Guo, “High performance transcription factor-DNA docking with GPU computing,” Proteome Science, 10, S17, 2012.
- J. Wu, C. Chen, and B. Hong, “A GPU-Based Approach to Accelerate Computational Protein-DNA Docking,” IEEE Computing in Science and Engineering, 14(3), 20-29, 2012.
- I. Yamazaki, S. Tomov, and J. Dongarra, “One-sided dense matrix factorizations on a multicore with multiple GPU accelerators,” Proc. of ICCS’12, Omaha, Nebraska, June 4-6, 2012.
- J. Zhou, D. Unat, D. Choi, C. Guest, and Y. Cui, “Hands-on performance tuning of 3D finite difference earthquake simulation on GPU Fermi chipset,” Proceedings of International Conference on Computational Science (ICCS’12, Omaha, Nebraska, June 4-6, 2012), 9, 976-985, 2012.

2011

- E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, and S. Tomov, “LU factorization for Accelerator-based systems,” submitted to the 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), Sharm El-Sheikh, Egypt, June 27-30, 2011.
- E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, and S. Tomov, “QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,” IPDPS 2011, Anchorage, AK, May 2011.
- C. Augonnet, "Scheduling Tasks over Multicore machines enhanced with accelerators: a Runtime System's Perspective," PhD diss., Université Bordeaux 1, 2011.
- E. Barausse, V. Cardoso, and G. Khanna, “Testing the Cosmic Censorship Conjecture with point particles: the effect of radiation reaction and the self-force,” Phys. Rev. D 84, 104006, 2011.
- G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark, “Ocelot: A Dynamic Optimizing Compiler for Bulk Synchronous Applications in Heterogeneous Systems,” IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques, September 2010.
- J. Dongarra, J. Kurzak, P. Luszczek, and S. Tomov, “Dense Linear Algebra on Accelerated Multicore Hardware,” High Performance Scientific Computing: Algorithms and Applications, Editors Michael W. Berry, Efstratios Gallopoulos, Ananth Grama, Bernard Philippe, Alex Pothen, and Yousef Saad, 2011.
- P. Du, P. Luszczek, S. Tomov, and J. Dongarra, "Soft Error Resilient QR Factorization for Hybrid System with GPGPU," Journal of Computational Science, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems at SC11, Seattle, WA, November 14, 2011.
- N. Farooqui, A. Kerr, G. Diamos, S. Yalamanchili, and K. Schwan, “A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot,” Proceedings of Fourth Workshop on General-Purpose Computation on Graphics Processing Units, March 2011.
- V. Gupta, K. Schwan, N. Tolia, V. Talwar, and P. Ranganathan, “Pegasus: Coordinated Scheduling for Virtualized Accelerator-based systems,” ATC, 2011.
- B. Hong, J. Wu, and J. Guo, “Improving Prediction Accuracy of Protein-DNA Docking with GPU Computing,” Best Paper Award, IEEE International Conference on Bioinformatics and Biomedicine, November 2011.
- M. Horton, S. Tomov, and J. Dongarra, “A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures,” submitted to 2011 Symposium on Application Accelerators in High Performance Computing, Knoxville, TN, July 19-21, 2011.
- C. R. Iacovella, W. R. French, B. G. Cook, P. RC Kent, and P. T. Cummings, "Role of Polytetrahedral Structures in the Elongation and Rupture of Gold Nanowires," ACS nano 5, no. 12, 10065-10073, 2011.
- A. Kerr, G. Diamos, and S. Yalamanchili, “GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot,” GPU Computing GEMS, vol. 2, 2011.
- R. Kim, R. Corona, B. Hong, and J. Guo, “Benchmarks for Flexible and Rigid Transcription Factor-DNA Docking,” BMC Structural Biology 11(45), 2011.
- J. Kurzak, S. Tomov, and J. Dongarra, “Autotuning GEMMs for Fermi,” submitted to SC11, November 2011.
- J. Kurzak and J. Dongarra, “Linear Algebra Libraries for High-Performance Computing: Scientific Computing with Multicore and Accelerators,” submitted to SC11, November 2011.
- J. S. Meredith, P. C. Roth, K. L. Spafford, and J. S. Vetter, "Performance implications of nonuniform device topologies in scalable heterogeneous architectures," Micro, IEEE 31, 66-75, 2011.
- A. Merritt, V. Gupta, A. Verma, A. Gavrilovska, and K. Schwan, “Shadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies,” VTDC 2011.
- "Quantifying NUMA and contention effects in multi-GPU systems," Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, 2011.
- K. L. Spafford, J. S. Meredith, and J. S. Vetter, "Quartile and outlier detection on heterogeneous clusters using distributed radix sort," In Cluster Computing (CLUSTER), IEEE International Conference on, 412-419, IEEE, 2011.
- F. Song, S. Tomov, and J. Dongarra, "Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures," University of Tennessee Computer Science Technical Report, UT-CS-11-668 (also LAWN 250), June 16, 2011.
- J. S. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili, “Keeneland: Bringing heterogeneous GPU computing to the computational science community,” IEEE Computing in Science and Engineering, 13(5), 90-95, 2011.

2010

- E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, “Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs,” Nvidia GPU Gems, 2010.
- P. Du, R. Weber, P. Luszczek, S. Tomov, G. Peterson, and J. Dongarra, “From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming,” submitted to Parallel Computing, August 2010.
- H. Ltaief, S. Tomov, R. Nath, and J. Dongarra, “Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators,” submitted to IEEE Transaction on Parallel and Distributed Computing, 2010.
- S. Tomov, R. Nath, and J. Dongarra, “Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms Through Hybrid GPU-Based Computing,” accepted in Parallel Computing, July 2010.