| Row | InvenioRDM field | # values | Procedure | CodeMeta fields | CFF fields | GitHub release JSON fields | GitHub repo JSON fields | |
|---|---|---|---|---|---|---|---|---|
2 | additional_descriptions | Multiple | Add separate items as follows: • If the CodeMeta releaseNotes is set and it’s not a URL and we didn’t use it as the value of the main description, add it with the InvenioRDM CV value “other”. • If the CodeMeta description is set and we didn’t use it as the value of the main description, add it with the InvenioRDM CV value “other”. • If the CFF abstract is set and we didn’t use it as the value of the main description, add it with the InvenioRDM CV value “other”. • If the GitHub repo description is set and we didn’t use it as the value of the main description, add it with the InvenioRDM CV value “other”. • If the CodeMeta readme is set and it’s not a URL, add it with the InvenioRDM CV value “technical-info”. (If the value is a URL, create a string of the form “Additional information is available at {URL}” and add that instead.) Deduplicate the resulting list of descriptions to avoid duplicate values. | description readme releaseNotes | abstract | (None) | description | |
3 | additional_titles | Multiple | Add separate items as follows: • If the CodeMeta name is set, add it with InvenioRDM CV type “alternate-title”. • If the CFF title is set, add it with InvenioRDM CV type “alternate-title”. Deduplicate the resulting list of titles. | name | title | (Not used here; see title below) | (Not used here; see title below) | |
4 | contributors | Multiple | Add separate items as follows: • If the CFF contact is set, add the (single) identity with an InvenioRDM role CV value of “contactperson”. • If the CodeMeta maintainer is set, add each identity in the list with an InvenioRDM role CV value of “other”. • If the CodeMeta sponsor is set, add each identity in the list with an InvenioRDM role CV value of “sponsor”. • If the CodeMeta producer is set, add each identity in the list with an InvenioRDM role CV value of “producer”. • If the CodeMeta editor is set, add each identity in the list with an InvenioRDM role CV value of “editor”. • If the CodeMeta copyrightHolder is set, add each identity in the list with an InvenioRDM role CV value of “rightsholder”. • If the CodeMeta provider is set, add each identity in the list with an InvenioRDM role CV value of “other”. • If the CodeMeta contributor is set, add the identities with role "other"; else, if CodeMeta contributor is not set, use the GitHub repo contributors field to create a list of contributors, using the GitHub API to look up people’s names, and add them with an InvenioRDM role CV value of “other”. Remove identities that have a role of “other” and are also listed in creators. | sponsor producer editor copyrightHolder maintainer provider contributor | contact contributors | (None) | contributors | |
5 | creators | Multiple | Add separate items for each identity in the list of values from CodeMeta author or CFF author (but not both) if any are present; else, use the (single) GitHub release author if present; else, use the (single) GitHub repo owner. The method uses ORCID to look up names if only ORCID iDs are given, and uses multiple NLP methods to split names into given/family name parts if names are given as single strings. | author | author | author | owner | |
6 | dates | Multiple | Add separate items as follows: • An item with InvenioRDM date CV type “created” using the value of CodeMeta dateCreated (if set) or the GitHub repo created_at. • An item with InvenioRDM date CV type “updated” using the value of CodeMeta dateModified (if set) or the GitHub repo updated_at. • An item with InvenioRDM date CV type “available” using the value of the GitHub release published_at. • If CodeMeta copyrightYear is set, an item with InvenioRDM date CV type “copyrighted” using the value of CodeMeta copyrightYear. | dateCreated dateModified copyrightYear | (None) | published_at | created_at updated_at | |
7 | description | One | If the GitHub release body is not empty, use that; else, if the CodeMeta releaseNotes is not empty and not a URL, use that; else, try CFF description, CFF abstract, and the GitHub repo’s description field, in that order. | releaseNotes | description abstract | body | description | |
8 | formats | Multiple | If the GitHub release has a value for tarball_url, add “application/x-tar-gz”. If the GitHub release has a value for zipball_url, add “application/zip”. If there are values in the GitHub release assets list, infer additional MIME types based on file extensions. | (fileFormat – not used) | (None) | If tarball_url set ⟹ tgz If zipball_url set ⟹ zip Values in assets may imply additional types. | (None) | |
9 | funding | Multiple | Use CodeMeta funding and funder values, intelligently constructing InvenioRDM funding objects with names of funders (looking up ROR identifiers in ROR.org if necessary). | funding funder | (None) | (None) | (None) | |
10 | identifiers | Multiple | For every item in CodeMeta identifier and CFF identifiers, detect recognizable identifiers of type ARXIV, DOI, GND, ISBN, ISNI, ORCID, PMCID, PMID, ROR, and SWH, and add InvenioRDM objects with scheme based on InvenioRDM identifier-types CV terms. | identifier | identifiers | (None) | (None) | |
11 | languages | Multiple | Hardwired to the value representing “English”. | (None) | (None) | (None) | (None) | |
12 | locations | Multiple | Hardwired to an empty list. | (None) | (None) | (None) | (None) | |
13 | publication_date | One | Use CodeMeta datePublished, CFF date-released, or the GitHub release published_at, tried in that order. | datePublished | date-released | published_at | (None) | |
14 | publisher | One | Set to the name of the InvenioRDM server. | (Not used) | (Not used) | (None) | (None) | |
15 | references | Multiple | Look at each item in CodeMeta referencePublication and CFF preferred-citation and references and collect identifiers of type DOI, ARXIV, ISBN, PMCID, and PMID. Use a combination of Crossref and Python’s isbnlib to get the corresponding reference metadata, then generate plain-text references in APA format, and finally add each item to the InvenioRDM references field. | referencePublication | preferred-citation references | (None) | (None) | |
16 | related_identifiers | Multiple | Add separate items as follows: • The GitHub release html_url field value with InvenioRDM relation CV term “isidenticalto” and scheme “url” • The value of one of the fields CodeMeta codeRepository, CFF repository-code, or the GitHub repo html_url (whichever has a value first) with InvenioRDM relation CV term “isderivedfrom” and scheme “url” • If the CodeMeta releaseNotes is a URL, add it with the InvenioRDM relation CV term “isdescribedby”. • The value of one of the fields CodeMeta url, CFF url, or the GitHub repo homepage field (whichever has a value first) with InvenioRDM relation CV term “isdescribedby” and scheme “url” • The value of CodeMeta sameAs with InvenioRDM relation CV term “isversionof” and scheme “url” • The value of CodeMeta downloadUrl or CFF repository-artifact (whichever has a value first) with InvenioRDM relation CV term “isvariantformof” and scheme “url” • The value of CodeMeta installUrl with InvenioRDM relation CV term “isvariantformof” and scheme “url” • If CodeMeta softwareHelp is set, or if the GitHub repo has an associated GitHub Pages URL, add one of them with InvenioRDM relation CV term “isdocumentedby” and scheme “url” • If the CodeMeta issueTracker is set, add it with the InvenioRDM relation CV term “issupplementedby”; else if the GitHub repo issues_url is set, add it instead. • The value(s) of CodeMeta relatedLink with InvenioRDM relation CV term “references” and scheme “url” • For each value in the CodeMeta referencePublication and CFF preferred-citation and references that has not already been added as a related identifier, add the identifier with InvenioRDM relation CV term “isreferencedby” and scheme according to the identifier type | codeRepository downloadUrl installUrl issueTracker referencePublication relatedLink releaseNotes sameAs softwareHelp url | preferred-citation references repository-artifact repository-code url | html_url | html_url homepage has_pages issues_url | |
17 | resource_type | One | If the CFF field type is set to “dataset”, use InvenioRDM CV value “dataset”; otherwise, use “software”. | (None) | type | (None) | (None) | |
18 | rights | Multiple | Look for CodeMeta license, CFF license, and CFF license-url in that order; if none are available, look for GitHub repo license field value; if not set, look in the GitHub repository’s files for a file named “LICENSE”, “License”, “COPYING”, or similar. If the info found includes a name or a URL, match it against known SPDX licenses and use the identifier (e.g. "bsd-1-clause") as the value of the rights object's "id" field, with the title of the license as the "title" value and the URL of the license as the "link" value. If only a license file is found in the repo, create a value of the form {"title": {"en": "License"}, "link": URL}. | license | license license-url | (None) | license | |
19 | sizes | Multiple | Set to the sizes of the file(s) uploaded to the InvenioRDM server. Value is a list of strings, with each value given in the same order as the values of the formats field. | (fileSize – not used) | (None) | tarball_url zipball_url assets | (None) | |
20 | subjects | Multiple | Create a union of all terms found in the repo topics field, CodeMeta keywords, CFF keywords, CodeMeta programmingLanguage, and the GitHub repo languages_url. | keywords programmingLanguage | keywords | (None) | topics languages_url | |
21 | title | One | Construct a string of the form “title_part – version_part”, using an en-dash instead of a colon to separate the parts in order to avoid accidentally introducing two colons into the string. • For title_part, use the CodeMeta name; if that’s not set, use the CFF title; and if that’s not set, use the GitHub repository full_name. • For version_part, use the GitHub release name, or if that’s not set, the GitHub release tag_name. | name | title | name or tag_name (if name is empty) | full_name | |
22 | version | One | Use the GitHub release tag_name, first removing any leading text of “v” or “version” if it appears as part of the tag name. | (Not used) | (Not used) | tag_name | (None) |
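
Several rows above (publication_date, title, version) share the same pattern: try a list of sources in a fixed order and use the first one that has a value. The following is a minimal Python sketch of that pattern, not the actual implementation. It assumes the CodeMeta, CFF, GitHub release, and GitHub repo data have already been parsed into plain dictionaries; the function names (`first_set`, `make_publication_date`, `make_title`, `make_version`) are hypothetical. The version rule strips a leading “v” or “version” only when a digit follows, which is an assumption made here so that a tag such as “vanilla” is left untouched.

```python
import re


def first_set(*values):
    """Return the first value that is neither None nor blank, else None."""
    for value in values:
        if value is not None and str(value).strip():
            return value
    return None


def make_publication_date(codemeta, cff, release):
    # Order from the table: CodeMeta datePublished, CFF date-released,
    # then the GitHub release published_at.
    return first_set(
        codemeta.get("datePublished"),
        cff.get("date-released"),
        release.get("published_at"),
    )


def make_title(codemeta, cff, repo, release):
    # title_part: CodeMeta name, else CFF title, else GitHub repo full_name.
    title_part = first_set(
        codemeta.get("name"), cff.get("title"), repo.get("full_name")
    )
    # version_part: GitHub release name, else the release tag_name.
    version_part = first_set(release.get("name"), release.get("tag_name"))
    # The two parts are joined with an en dash (U+2013), not a colon.
    return f"{title_part} \u2013 {version_part}"


def make_version(release):
    # Remove a leading "v" or "version" (plus an optional separator) from the
    # tag name. Assumption: strip only when a digit follows the prefix.
    tag = release.get("tag_name") or ""
    return re.sub(r"^(?:version|v)[\s._-]*(?=\d)", "", tag, flags=re.IGNORECASE)


if __name__ == "__main__":
    codemeta = {"name": "", "datePublished": None}
    cff = {"title": "My Tool", "date-released": "2023-05-01"}
    repo = {"full_name": "example-org/my-tool"}
    release = {
        "name": "",
        "tag_name": "v1.2.3",
        "published_at": "2023-05-02T10:00:00Z",
    }
    print(make_title(codemeta, cff, repo, release))
    print(make_version(release))
    print(make_publication_date(codemeta, cff, release))
```

Treating empty strings the same as missing keys keeps the fallback order intact when a metadata file contains a field with no value.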