KC7 Tasks by Team
Common Milestones
Teams: Argon (Foster), Oxygen (Ohno-Machado), Phosphorus (White), Xenon (Davis-Dusenbery), Helium (Ahalt), Calcium (Grossman, Benedict, Philippakis)
1 Define Stage 1 KC7 metadata model

1.1 Perform a (brief) landscape analysis of existing metadata models, such as ISA, DATS, bioschemas, HCLS dataset, Dublin Core, etc.
- Participate in definition of the common metadata model.
- Lead on the definition of the common metadata model, especially reviewing and extending DATS (creating a new v3 for the DCPPC), focusing on accessibility metadata to be performed in conjunction with KC6, and bridging with the ISA and schema.org/bioschemas efforts.
- Participate in the definition of the common metadata model (DATS v3, ISA, and bioschemas).
- Participate in the selection of controlled vocabularies and/or ontologies, which will be based on the subset of metadata chosen in 2.1.
- Participate in definition of the common metadata model.
- Participate in definition of the common metadata model.
- Participate in definition of the common metadata model.
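The landscape analysis in 1.1 centers on DATS-style models. As a purely illustrative sketch (not the published Crosscut Metadata Model; every field name, identifier, and required-attribute choice below is a placeholder), a Stage 1 dataset entity and a completeness check might look like:

```python
# Hypothetical sketch of a DATS-like Stage 1 dataset record and a check for
# the attributes a Stage 1 model might require. All names are illustrative.

REQUIRED_ATTRIBUTES = {"identifier", "title", "types", "creators"}  # assumed, not normative

dataset = {
    "identifier": {"identifier": "phs000000.v1", "identifierSource": "dbGaP"},
    "title": "Example TOPMed study",
    "types": [{"information": {"value": "whole genome sequencing"}}],
    "creators": [{"name": "Example Consortium"}],
    "distributions": [{"access": {"landingPage": "https://example.org/study"}}],
}

def missing_attributes(record: dict) -> set:
    """Return the required attribute names absent from a candidate record."""
    return REQUIRED_ATTRIBUTES - record.keys()

print(missing_attributes(dataset))  # empty set when the record is complete
```

A check like this is one way milestone 1.4's "attributes and allowed values" could be enforced mechanically once the real model is published.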
1.2 Select and document entities to be used during Stage 1 of the DCPPC based on the analysis performed in milestone 1.1.

1.3 Select controlled vocabularies, values, and/or ontologies to be used to create Stage 1 attributes. Specific vocabulary or bridge terms may be used.

1.4 Define and document attributes and allowed values of the Stage 1 Metadata Model from the conventions selected in milestone 1.2.

1.5 Publish the Crosscut Metadata Model for use by all DCPPC teams.
2 Specify Exchange Format for metadata ingestion, export, and exchange

2.1 Perform requirements analysis and a brief review of implementation options.
- Participate in definition of the Common Exchange Format.
- Lead on the definition of the Common Exchange Format.
- Participate in definition of the Common Exchange Format.
- Participate in definition of the Common Exchange Format.
- Participate in the selection of the metadata subsets.
- Participate in definition of the Common Exchange Format.

2.2 Publish the Exchange Format for use by all DCPPC teams.
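The Exchange Format itself is defined by milestone 2.2 and is not specified here. As a hedged sketch of the general pattern (a versioned envelope around the metadata entities, so ingestors can reject payloads they do not understand), with all field names assumed for illustration:

```python
# Sketch of a JSON-based exchange envelope. "exchange_format_version" and
# "entities" are placeholder names, not the agreed-upon format.
import json

def to_exchange(entities, version="0.1-draft"):
    """Wrap metadata entities in a versioned envelope for export/exchange."""
    return json.dumps(
        {"exchange_format_version": version,
         "entity_count": len(entities),
         "entities": entities},
        indent=2, sort_keys=True,
    )

def from_exchange(payload):
    """Parse an envelope back into its entity list, checking the version tag."""
    doc = json.loads(payload)
    if "exchange_format_version" not in doc:
        raise ValueError("not an exchange envelope")
    return doc["entities"]

records = [{"identifier": "phs000000.v1", "title": "Example study"}]
round_tripped = from_exchange(to_exchange(records))
```

The round trip (export then re-ingest) is the property milestones 3.5, 6.4, and 8.1 all depend on.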
3 Produce Stage 1 KC7 Metadata Instance

3.1 In concert with use case refinement and activities in milestone 1 (i.e., metadata model development), define and document the subset of metadata entities from TOPMed, GTEx, and AGR that will be represented in the Metadata Instance and subsequently used to demonstrate Stage 1 MVPs.
- Participate in the selection of the metadata subsets.
- Provide phenotypic metadata currently collected by DataMed; define genetic metadata based on existing annotation tools.
- Participate in the selection of the metadata subsets.
- Participate in the selection of the metadata subsets.
- Participate in the selection of the metadata subsets.
- Either participate in the production of the Stage 1 KC7 Metadata Instance or produce a Calcium Metadata Instance.
3.2 Map Stage 1 Metadata to the Crosscut Metadata Model. Document the map.
- Contribute to mapping of metadata to the model.
- Transform original metadata to the new DATS v3; recognize important entities such as genes and diseases in textual data fields using NLP; map study variables to CDEs (common data elements) where possible.
- Participate in identification of elements from the Stage 1 metadata that are equivalent to elements in the standards chosen in 1.2. Make provision for elements that do not map to the standard.
- Contribute to mapping the metadata to the model, referring to work from 1.2 and 1.3.
- Contribute to mapping the metadata to the model, referring to work from 1.2 and 1.3.
3.3 Extract metadata defined in 3.1 from each repository.
- Build different ingestors for combinations of retrieval mode (REST, FTP) and data format (XML, CSV, JSON).
- Work on building ingestors for TOPMed and GTEx (Repositive).
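The ingestor idea in 3.3 (one retrieval step per transport, one parser per serialization, composed into a single ingest call) can be sketched as follows; the endpoints and record layouts are hypothetical:

```python
# Sketch of format-dispatching ingestors for 3.3. Retrieval (REST, FTP) is
# elided; this shows only the parser dispatch for XML, CSV, and JSON.
import csv
import io
import json
import xml.etree.ElementTree as ET

def parse_json(text):
    return json.loads(text)

def parse_csv(text):
    return list(csv.DictReader(io.StringIO(text)))

def parse_xml(text):
    # Flatten each <record> element's children into a dict.
    return [{child.tag: child.text for child in rec}
            for rec in ET.fromstring(text).findall("record")]

PARSERS = {"json": parse_json, "csv": parse_csv, "xml": parse_xml}

def ingest(raw_text, data_format):
    """Dispatch raw repository output to the parser for its format."""
    try:
        return PARSERS[data_format](raw_text)
    except KeyError:
        raise ValueError(f"no ingestor for format: {data_format}")

rows = ingest("id,title\nphs000000,Example", "csv")
```

New repository formats then only require registering one more parser, which is the point of keeping retrieval mode and data format as independent axes.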
3.4 Transform Stage 1 Metadata to conform to the model, including any metadata cleaning required to conform to allowed values of the Crosscut Metadata Model. Document the transformations performed.
- Participate in "cleaning" metadata to adhere to standards; maintain both original and cleaned metadata for indexing and search purposes.
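The "clean but keep the original" approach in 3.4 can be sketched as below; the field, synonym table, and allowed values are all illustrative, not part of the actual model:

```python
# Sketch for 3.4: normalize a raw value against an allowed-value list while
# retaining the original for indexing and search. Values are hypothetical.

ALLOWED_SEX = {"male", "female", "unknown"}            # assumed controlled values
SYNONYMS = {"m": "male", "f": "female", "n/a": "unknown"}

def clean_field(raw, allowed, synonyms):
    """Map a raw value onto an allowed value; record both forms."""
    value = synonyms.get(raw.strip().lower(), raw.strip().lower())
    return {
        "original": raw,                                  # kept for search
        "cleaned": value if value in allowed else "unknown",
        "conformant": value in allowed,                   # documents the transform
    }

result = clean_field("M", ALLOWED_SEX, SYNONYMS)
```

The `conformant` flag doubles as the documentation trail the milestone asks for: records that needed fallback cleaning are distinguishable from those that mapped directly.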
3.5 Implement the transformed Stage 1 Metadata in the Exchange Format.

3.6 Make the Stage 1 Metadata Instance available to the DCPPC.
4 Produce Metadata Matrix

4.1 Perform requirements analysis and provide a set of competency questions that can be used for benchmarking and troubleshooting.
- Contribute to competency questions.
- Participate in drafting competency questions.
- Contribute to competency questions.
- Participate in drafting competency questions.

4.2 Select the additional standards, controlled vocabularies, and ontologies to be used to create the Metadata Matrix.
- Provide expertise in medical ontologies.
- Participate in identifying additional standards that can map to those chosen in 1.2.
- Provide expertise from work with the TCGA dataset.
- Provide integrated ontologies and equivalences between IDs for diseases, phenotypes, genes, etc.

4.3 Define the mappings from Crosscut Metadata Model attributes to the additional standards, controlled vocabularies, and ontologies identified in 4.2.

4.4 Define mappings among terms defined within the additional standards, controlled vocabularies, and ontologies identified in 4.2.
- Participate in establishing indexing rules.
- Participate in establishing indexing rules.
- Participate in establishing indexing rules.

4.5 Integrate the mappings defined in 4.3 and 4.4 to produce the Metadata Matrix, and make the Metadata Matrix available to DCPPC teams.
- Contribute to mapping Stage 1 metadata to Stage 1 conventions.
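How the 4.3 mappings (model attribute to external term) and 4.4 mappings (equivalences among external terms) compose into a Metadata Matrix lookup, as described in 4.5, can be sketched as follows. The CURIEs below are illustrative examples, not vetted Matrix content:

```python
# Sketch of the Metadata Matrix composition in 4.5. All term identifiers
# here are placeholders chosen for illustration.

ATTRIBUTE_TO_TERM = {                       # 4.3: model attribute -> ontology term
    "disease": "MONDO:0005015",
}
TERM_EQUIVALENCES = {                       # 4.4: cross-vocabulary equivalences
    "MONDO:0005015": {"DOID:9351", "MESH:D003920"},
}

def matrix_lookup(attribute):
    """Return the primary term plus every equivalent term for an attribute."""
    term = ATTRIBUTE_TO_TERM.get(attribute)
    if term is None:
        return set()
    return {term} | TERM_EQUIVALENCES.get(term, set())

terms = matrix_lookup("disease")
```

A search engine consulting this lookup can match a query phrased in any of the equivalent vocabularies, which is what "search leveraging the Metadata Matrix" in 7.1.6 amounts to.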
5 Verify availability of Stage 1 Data Instance

5.1 Verify that data stewards have provided DCPPC participants with access to the data underlying the Stage 1 Metadata Instance, plus associated listings and documentation.
MVP-Specific Milestones
Teams: Argon (Foster), Oxygen (Ohno-Machado), Phosphorus (White), Xenon (Davis-Dusenbery), Helium (Ahalt)

6 Demonstrate Search Engine index and search of Stage 1 Metadata Instance
- Demonstrate index and search with complementary technologies: DERIVA and Globus Search.
6.1 Ingest Stage 1 Metadata Instance.
- Ingest Stage 1 metadata into the entity/attribute/value-based model of Globus Search and into the Crosscut Metadata Model in DERIVA.
- Ingest Stage 1 metadata into DATS using the DataMed ingestion/enhancer pipeline.
- Ingest Stage 1 metadata into OSDF or something comparable.
- Ingest Stage 1 metadata.
- Ingest Stage 1 metadata.
- Ingest Stage 1 metadata or Calcium's instance of metadata.

6.2 Index Stage 1 Metadata Instance.
- Use Globus Search and DERIVA to index Stage 1 metadata.
- Use Elasticsearch to index the ingested metadata.
- Use Elasticsearch to index ingested metadata.
- Index Stage 1 metadata.
- Index Stage 1 metadata.
- Index Stage 1 metadata or Calcium's instance of metadata.

6.3 Search Stage 1 Metadata Instance.
- Use Globus Search, based on web-scale search technology (Apache Lucene and Elasticsearch), to perform queries over Stage 1 data. Use DERIVA to perform queries over Stage 1 data.
- Extend the DataMed search engine (based on Elasticsearch) to allow searching for both phenotypic and genetic variables included in datasets.
- Use the Elasticsearch query language to search data.
- Seven Bridges Data Browser visual interface, Seven Bridges Datasets API, Repositive Search API.
- Search with Elasticsearch, Blazegraph, and ProvStore.
- Search either the Stage 1 Metadata Instance or Calcium's instance of metadata with Elasticsearch.

6.4 Export search results in the Exchange Format.
- Export search results from Globus Search and DERIVA in the agreed-upon common export/exchange format.
- Export search results in the common format; allow users to save search results online.
- Export search results in the agreed-upon format.
- Export search results in the agreed-upon format.
- Export search results to the common format.

6.5 Provide a search engine API.
- Provide the DERIVA and Globus Search APIs currently in production for users to create and manage indexes, ingest data, and query those indexes.
- Provide the DataMed Search APIs to the DCPPC.
- Seven Bridges Data Browser visual interface, Seven Bridges Datasets API, Repositive Search API.
- Provide the Elasticsearch API.
- Provide an API for search over either the Stage 1 Metadata Instance or Calcium's instance.
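The ingest/index/search pipeline in milestone 6 rests on inverted indexes (the production MVPs delegate this to Elasticsearch, Globus Search, or DERIVA). A toy in-memory version of the underlying idea, with invented records, looks like:

```python
# Sketch of milestone 6's pattern: 6.1 ingest, 6.2 index, 6.3 search,
# using a minimal inverted index instead of a real search engine.
from collections import defaultdict

index = defaultdict(set)   # token -> set of record identifiers

def index_record(rec_id, record):
    """6.2: tokenize every text field and post the record id per token."""
    for value in record.values():
        for token in str(value).lower().split():
            index[token].add(rec_id)

def search(query):
    """6.3: AND-match all query tokens; return the matching record ids."""
    postings = [index.get(tok.lower(), set()) for tok in query.split()]
    return set.intersection(*postings) if postings else set()

# 6.1: ingest two hypothetical records.
index_record("phs000000", {"title": "Whole genome sequencing of COPD cohort"})
index_record("phs000001", {"title": "RNA-seq of liver tissue"})
hits = search("genome COPD")
```

Export of `hits` in the Exchange Format would then complete the 6.4 step.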
7 Demonstrate Query Portal to Common Metadata Instance

7.1 Provide a user interface that allows for:

7.1.1 Faceted search.
- Configure the DERIVA UI to display the shared model.
- Extend DataMed faceted search.
- Use the GDC portal or a derivative to support faceted search; this work will be done jointly with Team Calcium.
- Faceted search at the dataset level through Repositive's existing search engine and at the data file level through the Seven Bridges Data Browser and Datasets API.
- Support faceted search.

7.1.2 Search enabled by metadata mapping.
- Configure DERIVA to use the shared metadata model and controlled vocabulary.
- Extend the DataMed Terminology Service for metadata mapping.
- Use OSDF to map the shared metadata model and controlled vocabulary.

7.1.3 Ability to pass data to workspaces without local download.
- Integrate BDBag export into DERIVA and Globus Search to allow data to pass to workspaces without local download.
- Develop an export function within DataMed.
- Integrate the agreed-upon export format (manifests) into the search portal.
- Pass data to workspaces without local download (a feature of the Seven Bridges core infrastructure).
- Support the ability to pass data to workspaces without download.

7.1.4 Ability to build custom data sets/virtual cohorts (e.g., run a query against TOPMed metadata, identify records based on that query, package results in a BDBag to pass to an analysis workflow).
- Deploy a versioned catalog into the MVP, create identifiers for dynamic data sets, and provide export via BDBags.
- Use the GDC portal to create virtual cohorts and save cohorts to the commons for sharing with others. We will also support a mechanism to save the query used to build a cohort so cohorts can be updated as more data become available.
- Provide the ability to build synthetic cohorts through the Repositive search engine; demonstrate the concept of passing cohorts to Seven Bridges for 7.1.3.
- Support the ability to build custom datasets.
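The 7.1.4 flow (query the metadata, save the query so the cohort can be refreshed, emit a manifest of matching records for hand-off) can be sketched as below. A real MVP would package the manifest as a BDBag; the records and manifest fields here are placeholders:

```python
# Sketch of virtual cohort building per 7.1.4. Records and manifest layout
# are hypothetical; production systems would emit a BDBag.

RECORDS = {
    "s1": {"disease": "COPD", "age": 64},
    "s2": {"disease": "COPD", "age": 41},
    "s3": {"disease": "asthma", "age": 52},
}

def build_cohort(predicate):
    """Select record ids matching the query predicate."""
    return sorted(rid for rid, rec in RECORDS.items() if predicate(rec))

def make_manifest(name, query_text, member_ids):
    """Storing query_text alongside the members is what lets a saved cohort
    be re-run and updated as more data become available."""
    return {"cohort": name, "query": query_text, "members": member_ids}

members = build_cohort(lambda r: r["disease"] == "COPD" and r["age"] >= 60)
manifest = make_manifest("older-copd", "disease == COPD AND age >= 60", members)
```

Passing `manifest` (rather than the data files themselves) to a workspace is also the mechanism behind 7.1.3's "no local download" requirement.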
7.1.5 Ability to search across data sources.
- Use Globus Search to search across data sources (API only).
- DataMed search across data repositories.
- GDC search across data sources.
- Search across data sources within and outside the commons (Repositive search engine).
- Support the ability to search across data sources.

7.1.6 Search leveraging the Metadata Matrix.
- Globus Search (API only) and DERIVA search leveraging the Metadata Matrix.
- GDC search across harmonized metadata.
- Enable search over harmonized metadata.

7.2 Demonstrate interoperability with DCPPC access control mechanisms, workspaces, cloud providers, and identifiers.
- Demonstrate interoperability of Globus Search and DERIVA with DCPPC access control mechanisms, workspaces, cloud providers, and identifiers.
- Demonstrate interoperability with data access control mechanisms, workspaces, a cloud-agnostic architecture for data storage and computation, and identifiers.
- Demonstrate interoperability with DCPPC access control, cloud providers, and identifiers.

7.3 Document and provide access to the APIs and programmatic interfaces to the different KC7 search platforms.
- Provide documentation for Globus Search, DERIVA-PY, and the DERIVA API.
- Provide documentation for GDC Search, the OSDF Query Language (OQL), and the OQL API.
- Provide documentation for the Seven Bridges Data Browser, Seven Bridges Datasets API, and Repositive Search API.
- Provide documentation for the Elasticsearch, Blazegraph SPARQL, and ProvStore APIs.
- Document and provide access to the APIs.
8 Interoperate with other KCs, full stack, data providers, and cloud providers
- Interoperate with multiple full stacks, KCs, and cloud providers.

8.1 Make search results available to the DCPPC in the Exchange Format.
- Provide export and download of Globus Search and DERIVA query results in the agreed-upon exchange format.
- Provide DataMed search results in the common format.
- Provide export and download of search results using the UI or API.
- Provide export and download of search results using the UI or API.

8.2 Interoperate with KC2 MVPs and Full Stack MVPs.
- Indexed objects in Globus Search and DERIVA will be associated with identifiers as established by KC2. Results from services (e.g., analysis results, query results) will be indexed and made available for search through Globus Search and DERIVA.
- Integrate identifiers created by KC2 in indexing and search.
- Integrate GUIDs created by KC2 in search and indexing.
- Integrate GUIDs created by KC2 in search and indexing.
- Integrate GUIDs identified by KC2 in search and indexing.
- Integrate into the Calcium full stack and support GUIDs.

8.3 Work with KC6 to protect query results for controlled data, e.g., a count query protection framework.
- Leverage Globus Auth to provide fine-grained data access that can be applied at the level of metadata fields.
- Work with KC6 to implement count query protection for controlled data.
- Use authorization metatags and the Department of Veterans Affairs Genhub Honest Broker System to protect controlled data during search.
- Work with KC6 to protect query results for controlled data.
- Support protection of controlled-access data.
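The count query protection framework in 8.3 is KC6 work; as a hedged illustration of the general rule (suppress counts below a disclosure threshold so small cohorts in controlled data cannot be singled out), with the threshold value chosen arbitrarily:

```python
# Sketch of a count query protection rule, per 8.3. The threshold is an
# illustrative policy value, not one set by KC6.

MIN_REPORTABLE_COUNT = 11   # hypothetical disclosure threshold

def protected_count(true_count, threshold=MIN_REPORTABLE_COUNT):
    """Return the exact count only when it is safely large; otherwise a
    masked value meaning "fewer than threshold" (zero is reported as-is,
    since an empty result reveals no individual)."""
    if true_count == 0:
        return "0"
    if true_count < threshold:
        return f"<{threshold}"
    return str(true_count)

print(protected_count(4))    # masked
print(protected_count(250))  # exact
```

Real frameworks add further defenses (e.g., rounding or noise) so that differences between overlapping queries cannot reconstruct a suppressed count; this sketch shows only the thresholding step.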
8.4 Demonstrate index and search capabilities on two or more cloud providers.
- Demonstrate Globus Search capabilities on data hosted by AWS and Google.
- Deploy DataMed on the cloud.
- Demonstrate search across cloud providers.
- Demonstrate search across cloud providers.
- Demonstrate on multiple clouds.
- Demonstrate working on two cloud service providers.
9 Extend and enhance common resources, search engine, and query portal

The following are example enhancements individual KC7 teams may wish to include in their MVPs.
9.1 Extend the Crosscut Metadata Model.

9.2 Enhance the Stage 1 Metadata Instance.
- Deploy deep indexing tools to enable indexing and search over schemaless metadata and unstructured files from data providers, to augment the structured Stage 1 metadata.
- Link additional information, such as publications and grants, where available.
- Index non-structured data tags for querying.

9.3 Demonstrate provenance indexing and search.
- Incorporate PROV metadata concepts into the metadata model.

9.4 Expand query capabilities.
- Query expansion using the DataMed NLP and Terminology services.
- Expand query capabilities to store queries and use a RESTful API to execute them.

9.5 Incorporate user feedback.
- Get feedback through an online feedback/question system.
- Provide tools that enable users to discuss datasets, annotate datasets, and add metadata.

9.6 Publish APIs for the mapping/ingestion process.
- Seven Bridges Datasets API and Repositive Search API.
- Identify the subset of overall APIs relevant to the Helium stack.
10 Engage the community

10.1 Provide a user manual and help pages for the query portal.
- Provide user manuals and help pages for the DERIVA UI.
- Provide manuals and an FAQ online.
- Provide documentation and a helpdesk via online forms.
- Provide documentation explaining all search capabilities on the data commons.
- Provide manuals and an FAQ.
- Provide a user manual.

10.2 Provide API documentation.
- Provide Globus Search and DERIVA API documentation.
- Provide DataMed API documentation.
- Provide OSDF API documentation.
- Provide API documentation.
- Provide API documentation.
- Provide API documentation.

10.3 Perform training and user feedback sessions.
- Provide training and support to DCPPC members who wish to use the Globus Search API and DERIVA. Collect user feedback.
- Provide training for the DataMed API; collect user feedback through surveys, GitHub issue tracking, etc.
- Provide training through webinars and videos. Get feedback through online feedback forms.
- Provide user training for search. Perform user experience feedback interviews. Send out surveys and questionnaires.