HCLS dataset description comparison
Columns: identifier, objective, comment, example, in note, priority level, Rec Rel, Rec Type, Validator, RDF / RDFS / OWL, CiTO, DC Terms / DCMI Types, DCAT, FOAF, LEXVO, IDOT, PAV, PROV, SD, SKOS, VANN, VOAF, VOAG, VOID, Bio2RDF, Biositemaps, OpenPhacts, thedatahub, BioDBCore, MIRIAM, SADI service, UniProt SPARQL service description, OMV, integbio DB catalog original item, type : itemprop (by schema.org), comment
typeTo specify the type (dataset)let us restrict our conversation to datasets[] rdf:type void:Datasetmustrdf:typedct:Dataset | void:Datasetxrdf:type [rdfs:Class]dct:type [dc:Dataset]rdf:type [dcat:Dataset][foaf:Project][prov:Collection]rdf:type [void:Dataset]rdf:type [ dcat:Dataset | void:Dataset ]rdf:type [bsm:Resource_Description]rdf:type [void:Dataset]rdf:type [sd:graph]
identifierTo provide an alphanumeric string used by the creator to identify the datasetdataset identifiermaydct:identifierxsd:stringxdct:identifier [rdfs:Literal]dct:identifier[rdfs:Literal]dct:identifier [string]dct:identifier [rdfs:Literal](not captured) NBDC ID [i.e. nbdc00061] ?? -> internal ID for the catalog entry
titleTo provide a short textual description of the datasetno restriction on languages usedUniProtmustdct:title@langxrdfs:label [rdfs:Literal]dct:title [rdfs:Literal]dct:title[rdfs:Literal]lexvo:labelrdfs:label [xsd:string] | dct:title [string]bsm:resource_name [ xsd:string ] bsm:resource_type [xsd:string]dct:titleXXxdct:titleDatabase name [EzCatDB]Thing : name (BioDB)
alternative nameTo provide an abbreviation or commonly used short label for a datasetmaydct:alternative@langxdct:alternative [rdfs:Literal]skos:preferredLabelAlternative name [A Database of Enzyme Catalytic Mechanisms]
descriptionTo provide a more elaborate textual description of the dataset.A comprehensive, high-quality and freely accessible resource of protein sequence and functional information.mustdct:description@langxrdfs:comment [rdfs:Literal]dct:description [rdfs:Literal]dct:description[rdfs:Literal]dct:description [string]bsm:description [ xsd:string ]dct:descriptionXxDatabase description *in preparationThing : description (BioDB)
date createdTo specify the date at which the dataset was created.either dct:created or dct:issued *must* be used2002shoulddct:createdxsd:dateTimexdct:created [rdfs:Literal]dct:issued[rdfs:Literal]pav:createdOn/pav:authoredOn/pav:curatedOnprov:generatedAtTimedct:created [xsd:date]bsm:release_date [xsd:string]pav:authoredOn/pav:createdOnXdct:issuedX
relevant datesrecommended to use prov event model to associate dates with people and activitiesmaypav:createdOn, pav:authoredOn or pav:curatedOnxsd:dateTimexX
creatorTo specify the agent(s) responsible for bringing the dataset into existencewho created the specific manifestation (file(s))?UniProt consortiummustdct:creator<uri> or xsd:stringxdct:creator [dct:Agent][foaf:Agent]pav:createdBy/pav:authoredBy/pav:curatedBy/pav:contributedBy[prov:Agent]dct:creator [github-script-uri]bsm:author [xsd:string];bsm:contact_person [xsd:string]pav:authoredBy/pav:createdByX (author)dct:contributorDatabase maintenance site [National Institute of Advanced Industrial Science and Technology]
contributorTo specify the agent(s) that have contributed to the dataset contentEMBL-EBI, SIB, PIRmaydct:contributor | pav:createdBy, pav:authoredBy, pav:curatedBy<uri> or xsd:stringxdct:contributorpav:createdBy/pav:authoredBy/pav:curatedBy/pav:contributedBydct:contributor [uri]X
homepageTo specify the webpage that provides information about the dataset.http://www.uniprot.orgshouldfoaf:page<url>xdcat:landingPage[foaf:Document]; foaf:page[foaf:Document]foaf:homepagefoaf:homepage [url]bsm:URL [ xsd:string ]foaf:homepage (dcat:landingPage)XXURL [http://mbs.cbrc.jp/EzCatDB/]
logoTo specify the logo for the datasetThis allows application developers to display the dataset logofoaf:logo
keywordTo provide keywords (free text or controlled vocabulary) that characterize the dataset."protein", "sequence"maydcat:theme [skos:Concept]; dcat:keyword [rdfs:Literal]skos:Concept or xsd:stringxrdfs:seeAlso [rdfs:Resource]dct:subjectdcat:theme [skos:Concept]; dcat:keyword [rdfs:Literal]foaf:primaryTopicdct:subjectbsm:keywords [ xsd:string ]; bsm:related_areas_of_research [ bsm:ResearchArea; BRO ], dct:subjectXCategory - Target [RNA, Protein, Enzyme]|Category - Information type [Structure]Creativework : keywords (BioDB)
licenseTo specify a document that describes the rights and responsibilities of the user and responsible organization in relation to the datasethttp://www.uniprot.org/help/licensemustdct:license<url>xdct:license [dct:LicenseDocument]dct:license [dctype:LicenseDocument]dct:license [url]bsm:license [ xsd:string ]dct:licenseXXdct:licenseX
rightsTo specify a simplified set of rights and obligations associated with the dataset.maydct:rightsxsd:stringxdct:rights [dct:RightsStatement]dct:rights ["use","share","modify","use-share-modify","commercial use requires licensing","restricted-by-source-license"]bsm:resource_sharable [ "yes","no" ];dct:rights [dct:RightsStatement](not captured)XX
languageTo specify the languages used in dataset literalsa list of the languages that are represented in the datasetEnglishshoulddct:languagehttp://lexvo.org/id/iso639-3/{tag}xdct:language [rdfs:Resource]dct:language [rdfs:Resource]lvont:languagedct:languagedct:language(not captured)Language(s) [Japanese, English]Language : inLanguage (BioDB)use language tag as defined by rdf allowed set: http://www.lexvo.org/linkeddata/details.html ; e.g. http://lexvo.org/id/iso639-3/fra . We need to distinguish if you can access information in different languages, or if information is only available in some languages.
literature referenceTo provide a reference to a published work that describes or reports the dataset.http://nar.oxfordjournals.org/content/40/D1/D71maycito:citesAsAuthority<url> (doi;pubmed uri/url;isbn)xcito:citesAsAuthoritydct:bibliographicCitation [Literal]bsm:publication_identifier [ xsd:string ]xxReference(s) - PubMed ID [17039546, 15608227]|Reference(s) - Other than PubMed ID [bibliographic information]
vocabularyTo specify the vocabularies used to describe the data items.http://purl.uniprot.org/core/shouldvoid:vocabulary<uri>xvoid:vocabularyvoid:vocabularyvoid:vocabularysd:vocabularyX
subsetTo indicate that the dataset has a named subsetmayvoid:subset (RDF); dct:hasPart (for all other)<uri>xdct:hasPartsd:graphvoid:subsetvoid:subsetvoid:subsetsd:graph
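The core rows above can be read as a recipe for one dataset description. The following is a minimal sketch in Python that assembles the "must" and "should" properties into a Turtle document, using the UniProt values from the example column. The subject URI `http://example.org/dataset/uniprot`, the `@en` language tags, and the helper `to_turtle` are assumptions made for illustration; they do not come from the table.

```python
# Namespace URIs for the vocabularies used below.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "void": "http://rdfs.org/ns/void#",
}

# Hypothetical subject URI: the table does not name one.
DATASET_URI = "http://example.org/dataset/uniprot"

# (predicate, object in Turtle syntax). Values are taken from the
# "example" column; the @en tags are an assumption.
STATEMENTS = [
    ("rdf:type", "void:Dataset"),
    ("dct:title", '"UniProt"@en'),
    ("dct:description",
     '"A comprehensive, high-quality and freely accessible resource '
     'of protein sequence and functional information."@en'),
    ("dct:creator", '"UniProt consortium"'),
    ("foaf:page", "<http://www.uniprot.org>"),
    ("dct:license", "<http://www.uniprot.org/help/license>"),
    ("dct:language", "<http://lexvo.org/id/iso639-3/eng>"),
    ("void:vocabulary", "<http://purl.uniprot.org/core/>"),
]


def to_turtle(subject, statements, prefixes):
    """Serialize one subject and its predicate/object pairs as Turtle."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in prefixes.items()]
    lines.append("")
    lines.append(f"<{subject}>")
    body = [f"    {pred} {obj}" for pred, obj in statements]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


turtle = to_turtle(DATASET_URI, STATEMENTS, PREFIXES)
print(turtle)
```

String concatenation keeps the sketch free of third-party dependencies; a real pipeline would more likely use an RDF library so that literals are escaped correctly.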
identifiers
preferred prefixTo specify a short, unbroken label that is commonly used to refer to the dataset and could be used as symbol for the base URI of the datasetuniprotmayvann:preferredNamespacePrefixxsd:stringidot:namespacevann:preferredNamespacePrefixvann:preferredNamespacePrefix(not captured)
base URITo specify the URI that acts as a prefix for identifying objects in that namespace.http://purl.uniprot.org/uniprot/mayvoid:uriSpacexsd:stringvann:preferredNamespaceUrivoid:uriSpace; void:uriRegexPatternvoid:uriSpacevoid:uriSpaceX
data item identifier regex patternTo specify a regular expression to validate data item identifier syntax and/or range^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(\.\d+)?$mayxsd:stringidot:idRegexPatternx
data item URI pattern^http://purl.uniprot.org/uniprot/([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(\.\d+)?$mayvoid:uriRegexPatternxsd:string
data item example identifierTo specify an example identifier in the dataset that conforms to the identifier regex patternP04637mayxsd:string
data item example URIhttp://purl.uniprot.org/uniprot/P04637mayvoid:exampleResource <uri>void:exampleResource
versioned graph URI pattern
item to dataset relationTo provide a relation to link datasets to data items. allows us to link each data item to a common dataset descriptionvoid:inDatasetshouldvoid:inDataset<uri>rdfs:isDefinedBy [rdfs:Resource]dct:isPartOf [ URI ]foaf:primaryTopicvoid:inDataset; void:exampleResourcevoid:inDatasetvoid:inDataset
provenance and change
versionTo specify an alphanumerical string that indicates the dataset versionRefSeq Release 58shouldpav:versionxsd:stringxowl:versionInfo/owl:versionIRI [rdfs:Resource]dct:hasVersion [ URI ]pav:version [literal]pav:versionbsm:version [ xsd:string ] ; development_stage [xsd:string]pav:versionXomv:currentVersion
date modifiedTo indicate the date at which the dataset was modified, especially in the absence of a versioning system.full datetime + timezoneJanuary 21, 2013 at 12:03pmmust notdct:modified [rdfs:Literal]dct:modified [rdfs:Literal]pav:lastUpdateOnprov:generatedAtTimedct:modified/pav:importedOn/pav:retrievedOn/pav:createdOnCreativeWork : dateModified
modified byTo indicate the agent that was responsible for making a modification.uriNCBImust notdct:publisher/pav:retrievedBy/pav:importedBy/pav:createdBy
source/derivationTo indicate the source from which the dataset was obtained or derived.traceback to original files used Bio2RDF RefSeq dataset from NCBI RefSeq Release 57 in XML shouldprov:wasDerivedFrom, pav:retrievedFrom<uri>xdct:source [ URI ]pav:retrievedFrom/pav:importedFrom/pav:derivedFromprov:hadPrimarySource/prov:wasDerivedFromprov:wasDerivedFrom [void:Dataset]bsm-extension:curation-information[xsd:string]pav:retrievedFrom/pav:importedFrom/pav:derivedFrom/prov:wasDerivedFromX
latest version / subsequent / superseding versionTo indicate a subsequent or superseding version of the dataset.problematic for maintenance<RefSeq Release 58><simon to check out options><uri>
frequency of changeTo indicate the frequency at which a new version of the dataset is expected to be made available.every four weeksshoulddct:accrualPeriodicitydct:Frequencydct:accrualPeriodicity [dct:Frequency]dct:accrualPeriodicity [dct:Frequency]voag:frequencyOfChangedc:accrualPeriodicitybsm-extension:release-information; dct:accrualPeriodicity [dct:Frequency]voag:frequencyOfChange
latency of changeTo indicate the latency between the release of a dataset and its availability in a specific form or from some source processing4 hoursmay
created withTo indicate the tools or approaches used to generate the datasetmay overlap with dct:creator; * suggest PROVmaypav:createdWith<uri>pav:createdWithpav:createdWith
prior versionTo indicate a prior version of the dataset.<RefSeq Release 57>shouldpav:previousVersion<uri>owl:priorVersion [rdfs:Resource]pav:previousVersionprov:wasRevisionOfpav:previousVersion
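Several date rows ask for an `xsd:dateTime` value with a full datetime and timezone, while the example column shows a free-text date ("January 21, 2013 at 12:03pm"). As a minimal sketch, the conversion could look as follows in Python. The source gives no timezone, so UTC is used here purely as an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def to_xsd_datetime(text, tz=timezone.utc):
    """Parse the free-text form used in the example column and return an
    ISO 8601 string usable as an xsd:dateTime lexical value.

    The timezone is not present in the input, so the caller must supply
    it; UTC is only a default assumption.
    """
    naive = datetime.strptime(text, "%B %d, %Y at %I:%M%p")
    return naive.replace(tzinfo=tz).isoformat()


stamp = to_xsd_datetime("January 21, 2013 at 12:03pm")
print(f'"{stamp}"^^xsd:dateTime')
```

Month names and the am/pm marker are parsed according to the current locale, so this sketch assumes an English or default C locale.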
availability
date issuedTo indicate the date at which the dataset is made publicly available, which may differ from the date it was created.either dct:created or dct:issued *must* be usedshould dct:issuedxsd:dateTimedct:issued [rdfs:Literal]dct:issued [rdfs:Literal]prov:generatedAtTimedct:issued [xsd:dateTime ](inferred from last modified)dct:issued
publisher / responsible organizationTo specify the agent(s) that is responsible for making the dataset available.person, organization, software?UniProt Consortiummustdct:publisher<uri>dct:publisher [dct:Agent]dct:publisher [foaf:Agent][foaf:Organization];[foaf:Group][prov:Organization]dct:publisher [http://bio2rdf.org]bsm:organization [ xsd:string ]dct:publisherdct:contributorDatabase maintenance site
formatTo specify the format of the dataset (e.g. csv, xml, rdf, etc)iana mimetypes; edam; biosharing sitexml, rdf/xml, fasta, gffmustdct:format [ iana;edam;biosharing; literal as last resort ]dct:formatdcat:mediaType [dct:MediaTypeOrExtent]; dct:format[dct:MediaTypeOrExtent]sd:resultFormatdct:formatbsm:implements [bsm:Resource_Description](not captured)XXsd:resultFormat
data item HTML URL templateTo provide a template URL that, when completed with the identifier for a data item, will provide an HTML description of it.http://purl.uniprot.org/uniprot/$idrdfs:seeAlso [rdfs:Resource]rdfs:seeAlsovoid:uriLookupEndpointX
data filesTo provide the URL to download a version of the dataset in some format.ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/shoulddcat:distribution [ a dcat:Distribution; dcat:downloadURL <uri>]rdfs:seeAlso [rdfs:Resource]dcat:distribution [ a dcat:Distribution; dcat:downloadURL <uri>]bsm-extension:service-endpoint[xsd:string]Thing : url (BioDB)
RDF filesTo provide the URL of an RDF-formatted version of the dataset.ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/shouldvoid:dataDump + dcat:distributionvoid:dataDumpvoid:dataDumpbsm-extension:service-endpoint[xsd:string]void:dataDumpvoid:dataDump
SPARQL endpointTo provide the URL of the SPARQL endpoint that the dataset is located in.http://beta.sparql.uniprot.org/mayvoid:sparqlEndpointvoid:sparqlEndpointvoid:sparqlEndpointbsm-extension:service-endpoint[xsd:string]void:sparqlEndpointXvoid:sparqlEndpoint
API documentation pageTo provide the URL that documents an API to access the datasetmaydcat:landingPageX
catalog/registry (e.g. thedatahub, identifiers.org, BioPortal, etc)To provide a URL to a catalog describing the dataset.URL of integbio (http://integbio.jp/dbcatalog/?lang=en)
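The "data item HTML URL template" row in this section gives `http://purl.uniprot.org/uniprot/$id` as its example. The `$id` placeholder happens to be valid `string.Template` syntax in Python, so a minimal sketch of completing the template needs no custom parsing. The function name `item_url` and the percent-encoding step are assumptions for illustration.

```python
from string import Template
from urllib.parse import quote

# Template string copied from the example column.
HTML_TEMPLATE = Template("http://purl.uniprot.org/uniprot/$id")


def item_url(identifier):
    """Fill the $id placeholder, percent-encoding the identifier so that
    reserved characters cannot alter the URL structure."""
    return HTML_TEMPLATE.substitute(id=quote(identifier, safe=""))


print(item_url("P04637"))
```

For the example identifier P04637 the result equals the value shown in the "data item example URI" row.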
statistics- reduces dataset scans; easy lookup of dataset contentwe should have code to do this.bio2rdf.org/datasets/gene [dcat:Catalog]see https://code.google.com/p/void-impl/wiki/SPARQLQueriesForStatistics
# triplesTo provide the number of statements in the dataset.shouldvoid:triplesxsd:integervoid:triplesvoid:triplesvoid:triplesXvoid:triples
# distinct entitiesTo provide the number of distinct entities in the dataset.shouldvoid:entitiesxsd:integervoid:entitiesvoid:entities
# distinct subjectsTo provide the number of distinct entities that appear as the subject in a statement.shouldvoid:distinctSubjectsxsd:integervoid:distinctSubjectsvoid:distinctSubjectsvoid:distinctSubjects
# distinct predicatesshouldvoid:propertiesxsd:integervoid:propertiesvoid:properties
# distinct objectsTo provide the number of distinct entities that appear as the object in a statement.shouldvoid:distinctObjectsxsd:integervoid:distinctObjectsvoid:distinctObjectsvoid:distinctObjects
# distinct literalsshouldvoid:classPartition [ void:class <rdfs:Literal>; void:entities ""]void:classPartition [ void:class <rdfs:Literal>; void:entities ""]
# and list of named graphsTo provide the number and list of distinct graphs in the datasetshouldsd:namedGraph
distinct predicates and their frequencyTo provide the number and list of distinct relations in the dataset.shouldvoid:propertyPartition [void:property <uri>; void:triples ""^^xsd:integer]voaf:propertyNumbervoid:propertyPartition [void:property <uri>; void:entities ""]void:propertyPartition [void:property <uri>; void:entities ""]
distinct classes and their frequencyTo provide the number and list of distinct types (i.e. classes) in the dataset.shouldvoid:classPartition[ void:class <uri>; void:entities ""^^xsd:integer]voaf:classNumbervoid:classPartition[ void:class <uri>; void:entities ""]void:classPartition[ void:class <uri>; void:entities ""]void:classes
predicate and distinct object frequencyTo provide the number and list of predicate-object pairsshould[a void:LinkSet; void:target <dataset_uri>; void:linkPredicate <uri>; void:objectsTarget[ void:class<rdfs:Class>; void:entities ""]][a void:LinkSet; void:target <dataset_uri>; void:linkPredicate <uri>; void:objectsTarget[ void:class<rdfs:Class>; void:entities ""]]
predicate and distinct literal frequencyTo provide the number and list of predicate-literal pairsshould[a void:LinkSet; void:target <dataset_uri>; void:linkPredicate <uri>; void:objectsTarget[ void:class<rdfs:Literal>; void:entities ""]][a void:LinkSet; void:target <dataset_uri>; void:linkPredicate <uri>; void:objectsTarget[ void:class<rdfs:Literal>; void:entities ""]]
distinct subject, predicate, and distinct object frequencyTo provide the number and list of unique subject, predicate, unique object tuples in the dataset.should[a void:LinkSet; void:target <dataset_uri>; void:subjectsTarget[void:class <rdfs:Class>;void:entities ""]; void:linkPredicate <uri>; void:objectsTarget [void:class <rdfs:Class>; void:entities ""]][a void:LinkSet; void:target <dataset_uri>; void:subjectsTarget[void:class <rdfs:Class>;void:entities ""]; void:linkPredicate <uri>; void:objectsTarget [void:class <rdfs:Class>; void:entities ""]]
distinct subject, predicate, and distinct literal frequencyTo provide the number and list of unique subject, predicate and unique literal tuplesshould[a void:LinkSet; void:target <dataset_uri>; void:subjectsTarget[void:class <rdfs:Class>;void:entities ""]; void:linkPredicate <uri>; void:objectsTarget[void:class<rdfs:Literal>;void:entities ""] ][a void:LinkSet; void:target <dataset_uri>; void:subjectsTarget[void:class <rdfs:Class>;void:entities ""]; void:linkPredicate <uri>; void:objectsTarget[void:class<rdfs:Literal>;void:entities ""] ]
distinct subject type, predicate, distinct object type frequenciesTo provide the number and list of unique subject type, predicate and unique object type tuplesshould[a void:LinkSet; void:target <dataset_uri>; void:linkPredicate <uri>; void:subjectsTarget [void:class <uri>; void:entities ""]; void:objectsTarget [ void:class <uri>; void:entities ""]][a void:LinkSet; void:target <dataset_uri>; void:linkPredicate <uri>; void:subjectsTarget [void:class <uri>; void:entities ""]; void:objectsTarget [ void:class <uri>; void:entities ""]]dv:has_type_relation_type_count [ dv:has_subject_type; dv:has_subject_count; dv:has_object_type; dv:has_object_count; dv:has_predicate]X
(dataset,predicate,dataset) frequenciesTo provide the number and list of relations between two datasets.should[a void:LinkSet; void:target <dataset_uri>; void:subjectsTarget :dataset1; void:objectsTarget :dataset2; void:linkPredicate <uri>; void:triples ""][a void:LinkSet; void:target <dataset_uri>; void:subjectsTarget :dataset1; void:objectsTarget :dataset2; void:linkPredicate <uri>; void:triples ""][void:LinkSet]; void:subjectsTarget; void:objectsTarget; void:linkPredicate; dul:expresses
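The statistics section notes that "we should have code to do this". As a minimal sketch of what the simpler counts mean, the Python below computes them over a small in-memory list of triples. The toy triples, the prefixed-name strings, and the function `void_statistics` are hypothetical; a real dataset would be queried through SPARQL rather than loaded into a list. Literals are included in the distinct-object count here.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Toy data, invented for illustration only.
TRIPLES = [
    ("ex:P04637", RDF_TYPE, "up:Protein"),
    ("ex:P04637", "up:organism", "taxon:9606"),
    ("ex:P04637", "rdfs:label", '"example label"'),
    ("ex:P68871", RDF_TYPE, "up:Protein"),
    ("ex:P68871", "up:organism", "taxon:9606"),
]


def void_statistics(triples):
    """Compute the basic VoID-style counts for a collection of triples."""
    unique = set(triples)  # an RDF graph is a set: duplicates count once
    return {
        "void:triples": len(unique),
        "void:distinctSubjects": len({s for s, _, _ in unique}),
        "void:properties": len({p for _, p, _ in unique}),
        "void:distinctObjects": len({o for _, _, o in unique}),
        # Number of triples per predicate.
        "void:propertyPartition": dict(Counter(p for _, p, _ in unique)),
        # Number of typed subjects per class.
        "void:classPartition": dict(
            Counter(o for _, p, o in unique if p == RDF_TYPE)
        ),
    }


stats = void_statistics(TRIPLES)
for name, value in stats.items():
    print(name, value)
```

The link-set style rows (subject type, predicate, object type) need joins against `rdf:type` statements and are left out of this sketch.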
dataset indicators
availabilityTo indicate whether the dataset is still available.last date/version of availabilityno longer available (as of X/version)Offer : availability
byte sizeTo describe the total byte size of the versioned and formatted dataset.dcat:byteSize [rdfs:Literal]void:entities
rate (size) of changeTo describe the rate at which the dataset changes (e.g. # of additions, deletions, merges, etc).specified time interval
extent of use (used in)To describe the extent to which the dataset is used.voaf:usedBy
coverage of languageTo describe the extent to which all literals are in different languagesCreativeWork : inLanguage (BioDB)
service descriptors
SPARQL 1.0-compliantTo specify whether the service is SPARQL 1.0 compliant
SPARQL 1.1 [partial|full] supportTo specify whether the service is partially or fully SPARQL 1.1 compliant.sd:SPARQL11Query
SPARQL federationTo specify whether the SPARQL endpoint allows SPARQL 1.1 query federationsd:feature [sd:basicFederation]
SPARQL default graphTo specify the default graph for a SPARQL endpointsd:defaultDataset
SADI-compliantTo specify whether the service is SADI compliant
sd:defaultDataset
service indicatorssd:feature [sd:basicFederation]
uptimeTo specify the uptime of a service
response timeTo specify the response time for a service using some parameter.
BioDBCore
guidelineTo specify the guideline that was followed in preparing the datasetmaydct:conformsTo<url>
contact emailTo specify the email for the responsible organizationattribute of organization/publisher/creator, etc
organismTo specify taxon for the dataset <http://purl.uniprot.org/core/organism> <http://identifiers.org/taxonomy/9606> maydwc:taxonIDXOrganism(s) covered *taxon id and taxon name (Homo Sapiens (9606), Mus musculus (10090), Selenastrum capricornutum (118073), Daphnia magna (35525), Oryzias latipes (8090), etc.)BiologicalDatabaseEntry : taxon (BioDB)
countryTo specify the country where the contributor is located <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan>; consider using geonamesmaydwc:scientificNameCountry/Region (Japan)Country : name
dwc:genus
funding / grant supportUniProt is mainly supported by the National Institutes of Health (NIH) grant 1U41HG006104-01. Additional support for the EBI's involvement in UniProt comes from EMBL and the NIH GO grant 2P41HG02273-07. Past funding: UniProt activities at EBI have benefited from the FP7 SLING project (2009-2012, contract number 226073) and a British Heart Foundation grant (SP/07/007/23671). UniProt activities at the SIB are additionally supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI, and by the EC grants GEN2PHEN (200754) and MICROME (222886-2). PIR's UniProt activities are also supported by the NIH grants 5R01GM080646-07, 3R01GM080646-07S1, 5G08LM010720-03, and 8P20GM103446-12, and the National Science Foundation (NSF) grant DBI-1062520.X
contact
maintainerTo specify the agent(s) responsible for maintaining the datasetCreativeWork : provider (BioDB)
To specify the agent(s) to be contacted related to the dataset<uri>
Operational statusTo specify the operational status of the websiteActive, Inactive, Closuremaydct:availableOperational status (Active, Inactive, Closed)
Link to LSDB Archive, Link to MEDALS Database list, etc.. (http://medals.jp/elist/detail/46)