|Proposal Title||Workgroup/Committee||Contact Person(s)||Project Description/Elevator Pitch|
|Attribution, Phase 2||People Workgroup||Kristi Holmes <firstname.lastname@example.org>; Eichmann, David A <email@example.com>||There has been a fundamental shift that recognizes both the interdisciplinary, team-based approach to science and the more fine-grained characterization and contextualization of the hundreds and thousands of contributions of varying types and intensities that are necessary to move science forward. Unfortunately, little infrastructure exists to identify, aggregate, present, and (ultimately) assess the impact of these contributions. These significant problems are technical as well as social, and they require an approach that assimilates the cultural and social aspects of these problems in an open and community-driven manner. Here we are developing a contribution role ontology to support modeling of the multiple additional ways in which the translational workforce contributes to research. This effort also includes mining the acknowledgements sections of publications to harvest existing contributor roles, which serve as “ground truth” and demonstrate that populating the ontology with actual data is successful and drives additional development.|
|BioData Club Kit Proposal||Education Committee||Ted Laderas <firstname.lastname@example.org>||The translational and informatics workforce depends on the collaboration of individuals across multiple domains with varying levels of expertise. While many institutions provide formal data science education, these resources are not available to, and do not meet the needs of, all learners. Moreover, these formal educational resources do not facilitate community building. However, it is a challenge to identify or create community-building models, learning strategies, and content that appeal across domains and can be leveraged and adapted to implement these types of learning initiatives. At OHSU, BioData Club is a successful community-building forum and Data Science/Open Science training resource. The BioData Club Kit deliverables produced in phase 1 provide CTSA hubs and institutional stakeholders with resources and guidance for implementing similar initiatives at their institutions. Our phase 2 work will focus on piloting the implementation of the BioData Club Kit at a minimum of two CTSA hubs.|
|CD2H CIELO||Software Workgroup||Payne, Philip <email@example.com>||Historically, the CTSA hubs have collaborated minimally across sites, particularly in the development of software, tools, and algorithms that address common use cases and information needs. This has created barriers to the efficiency and impact of such work and lost economies of scale. There is an increasing need to improve the sharing and reusability of software, tools, algorithms, and reference datasets for use in health-related research. Further, Federal reporting requirements mandate that funded work be published or made available in an open access or equivalent environment. Recent efforts to create data commons that can address such needs have occurred in both the NIH and non-profit domains, but such platforms are for the most part not interoperable and face a number of usability challenges.|
|Cloud Infrastructure for Patient Centric Clinical Data Sharing||Software Workgroup||Kari A. Stephens <firstname.lastname@example.org>||CTSAs use federated networks within and between themselves with little to no ability to interoperate, and access to aggregated and raw datasets is bottlenecked by human analysts. CTSAs are struggling, both technologically and socially, to leverage cloud-based data sharing architectures that support federated ownership of datasets through proper, scalable governance solutions aimed at research use. CTSAs are missing out on capitalizing on the many innovations and efficiencies the cloud offers for both existing and future big data software tools.|
|Community Data Warehouse for Research Metadata||People Workgroup||Eichmann, David A <email@example.com>||Most, if not all, of the CTSA hubs expend substantial effort maintaining local databases of effectively the same data - people, publications, grants, etc. A shared data environment in the form of a warehouse of research data was strongly endorsed by participants in the most recent PEA Community meeting. Collaborative population and maintenance of common data would reduce local hub effort, improve data quality and serve as an exemplar of collaborative activity for the CTSA program and NIH programs overall.|
|Competitions||People Workgroup||Firas Wehbe <firstname.lastname@example.org>||Competitions is a software tool for investigators, reviewers, and administrators to run various types of NIH-style reviews and competitions, including pilot projects, research awards, and reviews. It will support CTSA consortium-wide peer-review activities through a cloud deployment that supports single sign-on, enhancing the ease and robustness of its adoption by interested institutions. Here, we will upgrade the Northwestern Competitions platform to support consortium-wide peer-review activities.|
|CTSA Hub-centric Clinical Data Sharing Governance to Enable the Learning Healthcare System||Software Workgroup||Kari A. Stephens <email@example.com>||EHR data must be tested for data quality when being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Harmonized datasets need to conform to an established standard format and vocabulary before any analysis can be done. They need to meet a bare minimum threshold of completeness (i.e., what percentage of values are null or empty). They also need to demonstrate a certain level of plausibility (i.e., do the data make sense for what is expected; are they believable and credible). To date, most data sharing networks have developed internal protocols and tools to manage data harmonization, but no publicly available tool with a standard operating procedure exists to easily assess and visualize data quality tests across institutions. Therefore, data quality remains a problem that is tackled inconsistently, and only by high-level analytic teams where such teams are available.|
|Data Harmonization||Data Workgroup||Christopher Chute <firstname.lastname@example.org>||Clinical data in CTSA hubs are not readily queryable in a federated fashion. Many efforts exist to address this, including TriNetX, ACT, PCORNet, and OHDSI among others. Unifying these with an HL7 FHIR framework is an aspiration.|
|Data Quality Methods and Tool to Support CTSA Hub Data Sharing||Software Workgroup||Kari A. Stephens <email@example.com>||EHR data must be tested for data quality when being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Harmonized datasets need to conform to an established standard format and vocabulary before any analysis can be done. They need to meet a bare minimum threshold of completeness (i.e., what percentage of values are null or empty). They also need to demonstrate a certain level of plausibility (i.e., do the data make sense for what is expected; are they believable and credible). To date, most data sharing networks have developed internal protocols and tools to manage data harmonization, but no publicly available tool with a standard operating procedure exists to easily assess and visualize data quality tests across institutions. Therefore, data quality remains a problem that is tackled inconsistently, and only by high-level analytic teams where such teams are available.|
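The completeness category described above (what percentage of values are null or empty) lends itself to a very small check. The sketch below is illustrative only, not part of any CTSA tool; the field names and the 90% threshold are assumptions chosen for demonstration.

```python
# Minimal sketch of a Kahn-style completeness check: for each field,
# compute the fraction of non-null, non-empty values and flag fields
# that fall below a threshold. Fields and threshold are illustrative.

def completeness_report(records, threshold=0.90):
    """Return ({field: fraction_populated}, [fields below threshold])."""
    fields = sorted({k for r in records for k in r})
    report, failing = {}, []
    for f in fields:
        populated = sum(1 for r in records if r.get(f) not in (None, ""))
        frac = populated / len(records)
        report[f] = frac
        if frac < threshold:
            failing.append(f)
    return report, failing

# Toy records standing in for harmonized EHR rows.
rows = [
    {"person_id": 1, "birth_date": "1980-01-02", "gender": "F"},
    {"person_id": 2, "birth_date": "", "gender": "M"},
    {"person_id": 3, "birth_date": "1999-12-31", "gender": None},
    {"person_id": 4, "birth_date": "1975-06-15", "gender": "F"},
]
report, failing = completeness_report(rows, threshold=0.90)
```

A real tool would add conformance (format/vocabulary) and plausibility checks and visualize the results across institutions, but the per-field completeness fraction is the common starting point.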
|Data Sharing & Governance Project||Data Workgroup||John Wilbanks <firstname.lastname@example.org>; Melissa Haendel <email@example.com>||Build out the DUA infrastructure in Synapse along the lines of the GA4GH work|
|DREAM challenge pilot at the University of Washington||Data Workgroup||Timothy Bergquist <firstname.lastname@example.org>; Justin Guinney <email@example.com>||DREAM challenges are an instrumental tool for harnessing the wisdom of the broader scientific community to develop computational solutions to biomedical problems. While previous DREAM challenges have worked with complex biological data as well as sensitive medical data, running DREAM Challenges with Electronic Health Records presents unique complications. While previous challenges developed model-to-data techniques to maintain the privacy of the data, ensuring that the EHR data meets a specified level of quality is also important for a challenge. EHR data is also more complicated than the data previously used in model-to-data approaches. We will be using the OMOP data standard to standardize model development, but even with a standardized and well-documented dataset, complications can arise when facilitating model development and submission from multiple parties.|
|Evaluate FHIR Based Adaptor for OMOP||Software Workgroup||Nicholas J. Dobbins <firstname.lastname@example.org>||Since FHIR and the OMOP CDM cannot be mapped one-to-one, an adaptor needs to be developed. Currently, only one institution in the nation is trying to tackle this problem, on its own EDW. We would like to create an adaptor that not only enhances our own EDW but also enables other hubs to do the same. This project supports the various data sharing efforts in place under the CD2H initiative.|
|FISMA compliant REDCap Instances in the Cloud for CTSA Hubs||Software Workgroup||Liz Zampino <email@example.com>||FISMA compliance creates a security stronghold for REDCap instances under federated ID infrastructure. We are awaiting descriptive text from Paul Harris, and our efforts here will be informed by that.|
|LOINC2HPO for CD2H||Data Workgroup||Peter Robinson <Peter.Robinson@jax.org>||The following provides a description of the planned trajectory of the LOINC2HPO project: 1. EHR semantic modeling and integration; 2. Modelling a medical encounter/patient; 3. Rare disease use case; 4. Common disease use case; 5. Data mining use case.|
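The core idea behind LOINC2HPO can be sketched very compactly: a lab observation identified by a LOINC code, together with its interpretation (low/normal/high), is mapped to a Human Phenotype Ontology term. The sketch below is a toy illustration of that lookup, not the project's actual annotation set; the HPO identifiers shown are examples and should be verified against the current HPO release.

```python
# Illustrative LOINC2HPO-style lookup: (LOINC code, interpretation) -> HPO.
# LOINC 2823-3 is serum/plasma potassium; the HPO IDs below are examples
# only -- the real project curates these annotations systematically.

LOINC2HPO = {
    # (loinc, interpretation): (hpo_id, label, negated?)
    ("2823-3", "H"): ("HP:0002153", "Hyperkalemia", False),
    ("2823-3", "L"): ("HP:0002900", "Hypokalemia", False),
    ("2823-3", "N"): ("HP:0002153", "Hyperkalemia", True),  # phenotype excluded
}

def observation_to_phenotype(loinc_code, interpretation):
    """Map one coded lab observation to an (HPO id, label, negated) tuple."""
    return LOINC2HPO.get((loinc_code, interpretation))

result = observation_to_phenotype("2823-3", "L")
```

A normal result maps to a *negated* phenotype assertion, which is what allows downstream use cases (rare disease, common disease, data mining) to distinguish "phenotype absent" from "not measured".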
|N-Lighten-CD2H Education Collaborations||Education Committee||Shannon McWeeney <firstname.lastname@example.org>||The primary goals are to enhance discoverability, support metadata and ontology development, and assist in populating N-Lighten to facilitate its use by CTSA sites for training and education.|
|Open Science for Decision Makers Course||Education Committee||Ted Laderas <email@example.com>||Key decision makers (PIs, grant reviewers, program managers, educators) need a knowledge base to make decisions about open science and data science driven proposals within their CTSAs. This knowledge base includes an understanding of how data integration, software development, and large-scale computation can be utilized in the biological and biomedical sciences. These key decision makers need a broad understanding of the factors that drive successful projects, rather than in-depth expertise.|
|Open Source Clinical Enterprise Data Warehouse (EDW) Data Browser (Leaf)||Software Workgroup||Nicholas J. Dobbins <firstname.lastname@example.org>||Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary uses of data from a multitude of sources. To that end, the UW Medicine Enterprise Data Warehouse (EDW) has emerged as a central port for all data, including clinical, research, administrative, financial, and other data types. Although EDWs have been popular and successful in providing a single stop for data, they are often not self-service and require an informatician or clinical informatics expert to access. To address this challenge, we have developed Leaf, an easy-to-use, self-service, web-based tool for querying, browsing, and extracting clinical cohorts from the UW Medicine EDW. Leaf enables querying by data dictionaries or ontologies, allows both de-identified and identified access to patient data, and grants access to these datasets in a compliant manner. In addition to basic visualizations, Leaf contains robust tools for exporting directly to REDCap projects. Leaf's users include both quality improvement and research investigators, and it has been developed using an Agile development process with a soft production rollout to identify and address software, support, and data quality concerns.|
|Personas for Clinical and Translational Science||People Workgroup||Sara Gonzales <email@example.com>||The concept of the translational workforce is an important one and plays a prominent role in the work and communication of CTSA Program hubs, NCATS, and beyond. However, to date, a defined list of translational workforce roles in CTS does not exist. Here, we aim to better define the specific roles included in the translational workforce, establish a set of personas to reflect those roles, construct a portfolio of these persona profiles, and disseminate broadly to the CTSA program and to the CD2H for use in development of use cases, communications, training materials, and more.|
|REDCap open vocabulary services||Data Workgroup||Melissa Haendel <firstname.lastname@example.org>||Currently, support for controlled vocabularies in REDCap is limited, both in terms of the available ontologies and in terms of configurability, since terms are accessed via BioPortal’s API. This artificially constrains the power of REDCap: not only does it exclude vital ontologies that are outside BioPortal, but even the ontologies inside BioPortal cannot yet be maximally useful. There are longstanding unmet ontology needs within the REDCap community; the frequently requested features include:
Configurability for vocabulary “slims” (specific ontology subsets of interest)
Specific presentation criteria (e.g., displaying synonyms with annotations)
Value sets or logical relations between fields using ontologies
Recent collaborations among the Australian Genomics group, GA4GH, and REDCap have now enabled the use of other vocabulary services, and the Australian Genomics FHIR vocabulary server has been implemented.|
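A FHIR vocabulary server like the one above is typically queried through the standard `ValueSet/$expand` operation, whose `url`, `filter`, and `count` parameters support exactly the type-ahead, subset-constrained lookups a REDCap field would need. The sketch below only builds the request URL; the server address and value-set URI are placeholders, not the Australian Genomics endpoints.

```python
# Sketch of a FHIR terminology query a REDCap integration might issue.
# ValueSet/$expand with url/filter/count is part of the FHIR terminology
# service specification; the server and value-set URI here are placeholders.
from urllib.parse import urlencode

def build_expand_url(server, valueset_uri, filter_text, count=20):
    """Build a GET URL for ValueSet/$expand with type-ahead filtering."""
    params = urlencode({"url": valueset_uri, "filter": filter_text, "count": count})
    return f"{server}/ValueSet/$expand?{params}"

url = build_expand_url(
    "https://tx.example.org/fhir",            # placeholder terminology server
    "http://example.org/fhir/ValueSet/hpo",   # placeholder value-set URI
    "sleep apnea",
)
```

Because the value set is named by URI, a "slim" (a curated ontology subset) can be published as its own ValueSet and plugged into the same call, which is one way the configurability needs listed above could be met.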
|Repository & Data Catalog||People Workgroup||Matt Carson <email@example.com>, Kristi Holmes <firstname.lastname@example.org>||Infrastructure that can be deployed and managed locally to collect, record, preserve, and disseminate a wide range of digital works across the translational workforce (e.g., datasets, protocols, educational materials, technical reports, supplemental materials, survey instruments) is critical to enhance their visibility, promote people and their expertise, support attribution of their work, aid discovery and accessibility by the international scientific community, and support open and FAIR-TLC science. Such an initiative requires a trusted framework for digital objects; good data practice workflows; incorporation of standards and persistent identifiers; incorporation of privacy considerations; and strategies to support implementation and integration, as well as to incentivize individuals participating in such an ecosystem.|
|Research IT and informatics Maturity and Deployment Improvement Survey||Data Workgroup||David Dorr <email@example.com>; Adam Wilcox <firstname.lastname@example.org>||Organizations that engage in research, especially those with Clinical and Translational Science groups, may want to self-assess their maturity of key research IT capabilities and learn to improve these capabilities. This project intends to develop an approach to help organizations through that process. It builds on other assessments by Embi, Knosp, Barnett, and Anderson by narrowing the focus to key areas related to collaborative and open science, and provides more clarity and context for the possibilities for improvement. It also intends to facilitate the process of improvement through guided vignettes and tools.|
|Reusable Data Guidebook, Data-sharing best practice portal, Metadata schema sharing portal|
|Scholar Tracking (investigation phase: landscape analysis and requirement gathering)||People Workgroup||Keith Alan Herzog <email@example.com>||Workforce development is a significant priority for the CTSA Program and the broader research community, yet the challenge of streamlined and operationalized scholar disambiguation and longitudinal data collection and tracking remains unsolved. A great amount of effort has been spent on this topic by hubs establishing priorities and developing manual and semi-automated processes, which can help guide efforts toward automation. Ultimately, we can leverage this proposed landscape analysis and other existing efforts by hubs and others to develop workflows that will result in improved data quality, process efficiency, automation, benchmarking, and more for hubs.|
|Sleep Research Data Harmonization and CDE Development||Data Workgroup||Eilis Boudreau, firstname.lastname@example.org, OHSU||The major problem we are trying to address is the lack of data harmonization across sleep and circadian research, including pragmatic research. This is not only a problem for experts in the field but also for non-sleep researchers who would like to incorporate sleep and circadian measures in their studies. Below is a summary of our next steps and a use case that was developed during the SRN meeting small group session.
Start with the canonical data dictionary in PhysioMIMI for terms that are already mapped (>900 terms are already mapped; choose a sampling of terms representative of the types of data we have in sleep medicine)
—diagnostic terms (e.g., sleep apnea syndrome)
—ultradian rhythm (as a neurologic brain marker)
*For complicated terms, deconstruct them into core and add-on components
|SPARC Instances in the Cloud for CTSA Hubs||Software Workgroup||Robert Schuff <email@example.com>; Liz Zampino <firstname.lastname@example.org>||SPARC is a web-based tool for supporting CTSA hub administration. This application supports CTSA hub services delivery, evaluation, CTMS, and other important functions. It is open source (Ruby on Rails) and widely used. Similarly, NCATS has invested heavily in cloud infrastructure and wants to make it available to hubs. Here we bring these two projects together as a pilot.|
|User interface development for discovery of CD2H/CTSA resources||People Workgroup||Eichmann, David A <email@example.com>||The 4DM Project (Drug Discovery, Development and Deployment Map) initiated by a collaboration led by Chris Austin has generated substantial interest in understanding the interdependencies of translational research and the entities involved. NCATS’ 4DM prototype currently lacks relevant backing data to display when selecting a vertex in the visualization graph.|