Crossref schema update (draft for public comment)
The feedback period, which ran through January 15, 2020, is over. If you have additional feedback to share, please contact Patricia (pfeeney@crossref.org).
Our metadata input schema was originally created to capture basic bibliographic information and facilitate matching DOIs to citations. Over the past 20 years the bibliographic metadata we collect has deepened, and we’ve expanded our schema to include funding information, license, updates, relations, and other metadata that can be used to find, cite, link and assess scholarly content. Updates have been made with some input from our membership, but going forward we will be requesting public feedback for all new versions.
This document contains proposed changes to Crossref’s metadata input schema and is our first fully public request for feedback on that schema. The proposed changes are intended to address some long-standing issues with the schema (weak affiliation support) as well as add support for newer initiatives (CRediT, data citation). These changes don’t address all of the requests we’ve had, and we’re saving some big updates for when we’re able to do them well (Open Access indicators, anyone?). We’re looking forward to your feedback! Please leave feedback, ask questions, and make suggestions in this document, or if you prefer, send feedback via email to feedback@crossref.org.
Backwards compatibility
This proposed schema breaks backwards compatibility within the contributors section (roles and affiliations specifically), but otherwise the changes only collect additional metadata. I’m presenting this as a single schema update, but that may change as we move closer to implementation.
Proposed changes
1. Identifiers in our metadata
Proposed changes to support data citation:
Proposed @cited_item_type values
8. Additional smallish updates
9. Other updates to consider but will probably have to wait for later
We are proposing to expand support for external identifiers; read this for details. Note that this change currently impacts only contributors and organizations (publisher names, for example).
The ‘contributors’ section of our metadata deposit schema currently supports very basic contributor metadata with the intention of capturing basic bibliographic information and matching DOIs to citations. In line with our goals of enabling machine-readable metadata that both describes and connects scholarly content, we are revisiting how we handle contributors overall.
To fully and elegantly support affiliation identifiers and multiple author roles, we need to break backwards compatibility. It makes sense to take this opportunity to address other outstanding contributor-related issues as well. This change lets us align more closely with JATS contributor recommendations, making it easy for the many JATS-supporting publishers to provide full contributor metadata.
The goals of the next contributors update are to:
<contributors>
  <person_name sequence="first" contributor_role="chair">
    <given_name>Minerva</given_name>
    <surname>Housecat</surname>
    <affiliation>Crossref University</affiliation>
    <ORCID authenticated="true">https://orcid.org/0000-0002-4011-3590</ORCID>
  </person_name>
  <organization sequence="additional" contributor_role="author">Crossref</organization>
</contributors>
We currently include an alt-name section to capture alternate author names. This option was never promoted, isn’t indexed or included in our JSON output, and needs some attention. I propose we replace it with a repeatable, string-based alternate_name element.
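A group (or corporate) author is currently collected as a string in the organization element. As we are adding organization identifier support, the proposal is to replace that element with a collab element that carries the group name (collab_name) along with optional roles and identifiers (contrib_id).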
Question: should we add affiliations to collab? We don’t currently collect affiliations for group authors, but presumably some group authors have them.
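As an illustration, the sketch below shows how a group author might be migrated. It assumes the collab structure used in the full proposed contributors example (collab_name, role, and contrib_id). Whether collab will also accept affiliations is still the open question above, so none are shown.

```xml
<!-- Current markup: group author as a plain string, no identifiers -->
<organization sequence="additional" contributor_role="author">Crossref</organization>

<!-- Proposed markup (sketch): the same group author as a collab with an identifier -->
<collab sequence="additional">
  <collab_name>Crossref</collab_name>
  <role role-type="author"/>
  <contrib_id contrib_id_type="ror">https://ror.org/02twcfp32</contrib_id>
</collab>
```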
Example:
<role role-type="conceptualization"/>
Current roles:
CRediT Roles to add:
The CRediT taxonomy terms contain spaces and capital letters. In our implementation we’ll lowercase them and replace spaces with underscores (so “Data curation” becomes data_curation). This makes the outputs more machine-friendly and consistent with the values we use elsewhere.
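Here is a sketch of the resulting role values. It assumes all fourteen CRediT roles are adopted and normalized like the conceptualization and data_curation values used in this document. The two “Writing” roles are left out because this proposal does not specify how their dash and ampersand would be normalized.

```xml
<!-- Each comment gives the CRediT term; the attribute is the assumed normalized value -->
<role role-type="conceptualization"/>        <!-- Conceptualization -->
<role role-type="data_curation"/>            <!-- Data curation -->
<role role-type="formal_analysis"/>          <!-- Formal analysis -->
<role role-type="funding_acquisition"/>      <!-- Funding acquisition -->
<role role-type="investigation"/>            <!-- Investigation -->
<role role-type="methodology"/>              <!-- Methodology -->
<role role-type="project_administration"/>   <!-- Project administration -->
<role role-type="resources"/>                <!-- Resources -->
<role role-type="software"/>                 <!-- Software -->
<role role-type="supervision"/>              <!-- Supervision -->
<role role-type="validation"/>               <!-- Validation -->
<role role-type="visualization"/>            <!-- Visualization -->
```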
Also adding:
Final list:
Note that there is some overlap between the existing Crossref roles and the CRediT roles (‘author’ = ‘Writing - original draft’ and ‘editor’ = ‘Writing - review & editing’), but we’ll keep both to support members who do not use CRediT.
Affiliations are currently captured in a single repeatable tag, affiliation, intended to contain an organization name and possibly a location. The affiliation tag will be replaced with an affiliations container tag that holds one or more institution elements:
Example:
<affiliations>
  <institution>
    <institution_id institution_id_type="ror">https://ror.org/02twcfp32</institution_id>
    <institution_id institution_id_type="isni">0000000405062673</institution_id>
    <institution_id institution_id_type="wikidata">Q5188229</institution_id>
    <institution_name>Crossref</institution_name>
    <institution_acronym>CR</institution_acronym>
    <institution_place country="us">Lynnfield, MA</institution_place>
    <institution_department>Feline Outreach</institution_department>
  </institution>
  <institution>
    <institution_name>University of Somewhere Awesome</institution_name>
    <institution_acronym>USA</institution_acronym>
    <institution_id institution_id_type="ror">https://ror.org/02twcfxyz</institution_id>
    <institution_id institution_id_type="isni">0000000401234567</institution_id>
    <institution_id institution_id_type="wikidata">Q11111111</institution_id>
    <institution_place country="ca">Winnipeg</institution_place>
    <institution_department>Feline Research</institution_department>
  </institution>
</affiliations>
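The current affiliation tag holds only a string, so the simplest migration wraps that string in the new container. The sketch below assumes institution_name alone is valid. It treats institution_id, institution_acronym, institution_place, and institution_department as optional, although the proposal does not say which children are required.

```xml
<!-- Current markup: affiliation as a free-text string -->
<affiliation>Crossref University</affiliation>

<!-- Proposed markup (minimal sketch): name only, no identifiers or location yet -->
<affiliations>
  <institution>
    <institution_name>Crossref University</institution_name>
  </institution>
</affiliations>
```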
Full example of the proposed contributors markup:
<contributors>
  <person_name sequence="first">
    <given_name>Minerva</given_name>
    <family_name>Housecat</family_name>
    <role role-type="conceptualization"/>
    <role role-type="author"/>
    <affiliations>
      <institution>
        <institution_id institution_id_type="ror">https://ror.org/02twcfp32</institution_id>
        <institution_id institution_id_type="isni">0000000405062673</institution_id>
        <institution_id institution_id_type="wikidata">Q5188229</institution_id>
        <institution_name>Crossref</institution_name>
        <institution_acronym>CR</institution_acronym>
        <institution_place country="us">Lynnfield, MA</institution_place>
        <institution_department>Feline Outreach</institution_department>
      </institution>
      <institution>
        <institution_name>University of Somewhere Awesome</institution_name>
        <institution_acronym>USA</institution_acronym>
        <institution_id institution_id_type="ror">https://ror.org/02twcfxyz</institution_id>
        <institution_id institution_id_type="isni">0000000401234567</institution_id>
        <institution_id institution_id_type="wikidata">Q11111111</institution_id>
        <institution_place country="ca">Winnipeg</institution_place>
        <institution_department>Feline Research</institution_department>
      </institution>
    </affiliations>
    <contrib_id contrib_id_type="isni">0000000121032683</contrib_id>
    <contrib_id contrib_id_type="orcid" authenticated="true">https://orcid.org/0000-0002-4011-3590</contrib_id>
    <alternate_name name_style="western" xml:lang="en">Minnie H</alternate_name>
    <alternate_name name_style="eastern" xml:lang="ja">ミネルバハウスキャット</alternate_name>
  </person_name>
  <collab sequence="additional">
    <collab_name>Crossref</collab_name>
    <role role-type="data_curation"/>
    <contrib_id contrib_id_type="ror">https://ror.org/02twcfp32</contrib_id>
    <contrib_id contrib_id_type="isni">0000000405062673</contrib_id>
    <contrib_id contrib_id_type="wikidata">Q5188229</contrib_id>
  </collab>
</contributors>
Individual citations can be submitted either as a string of text (an unstructured_citation) or as a set of tags containing basic citation metadata. These tags support journals and books fairly well, but do not support other types of content, particularly data citations. We’ll be expanding our support for tagged metadata in citations, including the ability to identify a publication type for each citation. This means a data citation will clearly be a data citation, and a journal article will clearly be a journal article.
I suggest we loosely follow the JATS4R recommendations for capturing data citations (https://jats4r.org/data-citations). Our citation markup differs enough that following them exactly would require a larger overhaul, but we can adapt some of the concepts, if not the exact tags.
Current elements:
Of the above, only author, doi, and cYear neatly apply to data citations. This is not enough to identify data citations as such without a DataCite DOI. It’s also a challenge to clearly identify how to mark up citations of software, videos and other media, blogs, and other content.
The set of elements / attributes appropriate for data citations will be:
* new elements
Note that all elements for citations are optional.
Example of a data citation:
<citation cited_item_type="data" key="ref4">
  <author>Morinha F</author>
  <cYear>2017</cYear>
  <item_title>Extreme genetic structure in a social bird species despite high dispersal capacity</item_title>
  <institution>Dryad Digital Repository</institution>
  <identifier type="uri">http://www.example.org/boohooidonthaveadoi</identifier>
  <doi>10.3201/nowihaveadoi</doi>
</citation>
Example of a journal citation:
<citation cited_item_type="journal_article" key="ref5">
  <journal_title>Information Technology and Libraries</journal_title>
  <author>Park</author>
  <volume>29</volume>
  <issue>3</issue>
  <first_page>104</first_page>
  <cYear>2010</cYear>
  <doi>10.6017/ital.v29i3.3136</doi>
  <article_title>Metadata creation practices in digital repositories and collections: Schemata, selection criteria, and interoperability</article_title>
</citation>
We support a number of book types that can be applied at the title level and at the chapter (or other component) level. The types need some refining: while we don’t want to be overly granular, we want to capture
Current book types:
Book content-item types (typically chapters):
A more complete revamp of book metadata is forthcoming. For this update, I plan to eliminate the ‘book-track’ option and provide best practices for the other types.
We currently do not collect publisher metadata for anything other than books, but there is a need to distinguish the publisher of a registered item from the prefix and member information. We can do this by adding the existing publisher element to all content types, along with identifier support and a country code. We can also make publisher_name repeatable and add an @xml:lang attribute.
Example:
<publisher>
  <publisher_name xml:lang="en">Pumpkin Spice League of Hatred</publisher_name>
  <publisher_name xml:lang="ja">パンプキンスパイスリーグオブ憎しみ</publisher_name>
  <publisher_place country="us">Burlington, VT</publisher_place>
</publisher>
Ideally, I’d like this update to include the Conference ID updates as well. This will depend on timing, as we don’t want one change to hold up the other.
We have had requests to expand the dates we collect and would like feedback on adding the following types of dates:
We want to limit the scope of this update to allow us to roll out some much-needed changes, but are aware that more metadata is always needed. Things that we want to change or add include: