1
guage
2
ID | Description of problem | Explanation | Example | What is in the data now | What we would like to see | Can be quantified? | Severity (pre-2022) | Remarks | Discovery Scenario Affected (Impact)
3
1. Duplicate / Redundant information (within a field, across fields and across a collection)
4
P1Systematic use of the same titleWithin the dataset/collection multiple records use the same titleExample 1
Example 2
Example 3
<dc:title>OLJEMÅLNING</dc:title>
<dc:title>OLJEMÅLNING</dc:title>
<dc:title>OLJEMÅLNING - [X]</dc:title>
<dc:title>OLJEMÅLNING - [Y]</dc:title>
yes | warning | If the data needed for completely unique titles is not available, consider appending another value that says something unique about the object. For example, for 'Rijksmonument', append the location to get "Rijksmonument Gelderland" | Basic Retrieval

Lack of differentiation in search; title uninformative; negative for SEO
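
A minimal sketch of how P1 could be quantified, assuming the titles of a dataset are already available as a list of strings. The function name `repeated_titles` and the normalisation (trimming and case-folding) are illustrative assumptions, not an existing Europeana check.

```python
from collections import Counter

def repeated_titles(titles, min_count=2):
    """Return {normalised title: count} for titles used at least min_count times."""
    # Assumption: titles that differ only in case or surrounding whitespace
    # count as "the same title".
    counts = Counter(t.strip().casefold() for t in titles if t and t.strip())
    return {title: n for title, n in counts.items() if n >= min_count}

titles = ["OLJEMÅLNING", "OLJEMÅLNING", "OLJEMÅLNING - [X]"]
print(repeated_titles(titles))
```

Raising `min_count` would restrict the report to systematic reuse rather than occasional coincidences.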
5
P2Equal title and description fieldsThe title is a repeat of the exact information in the description, or the other way aroundExample 1
Example 2
<dc:title>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:title>
<dc:description>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:description>
<dc:title>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:title>
[no mapping of "Doll, [...]" to dc:description]
yes (in SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:path dc:title; sh:equals dc:description ] .) | warning | If the information within the properties is identical, there is no need to duplicate it; either dc:title or dc:description is mandatory, so we would rather have the information in one place, as specific as possible. | Basic Retrieval

Repetition of field values does not increase searchability and it hampers visualization

Distorts search weightings
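
As an alternative to a SHACL shape, the P2 check can be sketched directly on the XML. The `<record>` wrapper element and the function name below are hypothetical; only the Dublin Core namespace is standard.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def has_equal_title_and_description(record_xml):
    """True if any dc:title value is identical to any dc:description value."""
    root = ET.fromstring(record_xml)
    titles = {(e.text or "").strip() for e in root.iter(DC + "title")}
    descriptions = {(e.text or "").strip() for e in root.iter(DC + "description")}
    # Empty values are a separate problem (P10), so ignore them here.
    titles.discard("")
    descriptions.discard("")
    return bool(titles & descriptions)

# Hypothetical wrapper element around the two properties.
record = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Doll, dressed as a nurse</dc:title>
  <dc:description>Doll, dressed as a nurse</dc:description>
</record>"""
print(has_equal_title_and_description(record))
```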
6
P3Near-identical descriptions and title fieldsThe description is nearly the same as the title, with maybe some additional information that comes from other properties[EXAMPLE MISSING]<dc:title>repeated text</dc:title>
<dc:creator>name of creator</dc:creator>
<dc:description>repeated text + name of creator</dc:description>
<dc:title>repeated text</dc:title>
<dc:creator>name of creator</dc:creator>
[no mapping of "repeated text + name of creator" to dc:description]
not easily | warning | If the information within the properties is identical, there is no need to duplicate it; either dc:title or dc:description is mandatory. Concatenating with another property that is already present is superfluous. | Basic Retrieval

Distorts search weightings; distorts completeness measurement

Repetition of field values does not increase searchability and it hampers visualization
7
P32Duplicate metadata statements The same property with the same value is repeated twiceExample 1dcterms:spatial "London" is repeated twice. dcterms:spatial "urn:rijksmuseum:thesaurus:RM0001.THESAU.4157" is repeated twice. urn:rijksmuseum:thesaurus:RM0001.THESAU.24458 is repeated twice in dc:subject. No duplicationyesNote: this duplication happens in the provider metadata not between the provider metadata and the Europeana enrichment.
The problem is only partially handled by Metis normalisation (dc:title, dcterms:alternative, dc:subject, and dc:identifier).
It can easily be fixed with normalization at ingestion time, or during Solr ingestion (cf https://europeana.atlassian.net/browse/SEAR-93).

Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.hmzmrda9pa40can
Affects user experience for display and search behaviour.
8
P33Duplicate objects within a datasetDatasets contain "repeated" objects with different identifiers (and sometimes metadata) but same image.Example 1a
Example 2a (same URL in edm:isShownBy, metadata is different)
yes (in principle)warningCf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.4ymd6yez22giBasic retrieval
Illegibility, distorts search.
9
2. Irrelevant information
10
P5Unrecognizable titlesNo descriptive information is given in the titles; only identifiers and shelfmarks[EXAMPLE MISSING] (here we have an example of shelfmark but in dc:description)<dc:title>NLD-820630-AMSTERDAM</dc:title>
(in example with dc:description: "Call Number - 0028619")
[no mapping to dc:title]<dc:title></dc:title>
<dc:identifier>NLD-820630-AMSTERDAM</dc:identifier>
Not easily. E.g. "London-1998" could be an id but it could also be a title (a painting of London in 1998). Uniqueness tests could help to recognize the titles that are actually identifiers: because they're identifiers, they're likely to be unique within our datasets.warningIdentifiers should be put in dc:identifier. Shelfmarks belong in dc:description with clarification that these are shelfmarks. dc:title is not mandatory if dc:description is present
Basic Retrieval

Bypasses completeness metrics; exposes internal messages to external users
11
P6Non-meaningful titleA standard value is put in when there is no title attached to the record: "no suitable title found", "unknown title", etc. ...in many languagesExample 1
Example 2
Example 3
Example 4
Example 5
<dc:title>No title</dc:title> | [no mapping to dc:title] | Yes, if we look for specific values in some catalogue of unwanted keywords, like "no title". However, there are objects (e.g. paintings) that really don't have a title, and it wouldn't be easy to get all possible variants (e.g. across languages) of unwanted keywords. | warning | dc:title is only mandatory when dc:description is not present. If both are unavailable or not meaningful, the record may not be findable anyway and may not be suitable for publishing on Europeana. | Basic Retrieval

Bypasses completeness metrics ("empty" being a non-empty value); exposes internal messages to external users; negative for click-through rates in-Europeana and in search engines

Non-meaningful titles do not help searchability of an item
12
3. Missing or Incomplete information
13
P7Missing description fieldsA description is available on the website of the provider, but has not been mapped to EDMExample 1 [no mapping of "information about the object" to dc:description]<dc:description>information about the object</dc:description>Not easily. For big datasets with no description we could check their websiteswarningdc:description is not mandatory if dc:title is present. However, it is a pity to not exploit existing descriptions, if it is possible.Basic Retrieval: record unlikely to be retrieved; record uninformative

If a dc:description is available, it gives the user more descriptive information and heightens the findability of the object
14
P8Missing lang tagThe language of metadata is not specified in an xml:lang tag, when data is monolingual per property

NB: This is for specific fields, since not all values are language specific.
Example 1<dc:subject>oil on canvas</dc:subject>
<dc:subject>l'huile sur toile</dc:subject>
<dc:subject xml:lang="en">oil on canvas</dc:subject>
<dc:subject xml:lang="fr">l'huile sur toile</dc:subject>
Partly. Circa 20m records have less than 25% lang attributes (metadata tier 0) | warning | The Europeana R&D team is working on an experiment to detect the language of metadata. | Basic Retrieval; Cross-language recall; Improved language facets

Language-tagging data greatly improves enrichment possibilities, and enables Europeana to give users more data that they can understand in their own language

Not language-tagging data in the Europeana context would also mean lower metadata tier
15
P9 | Very short description field | Insufficient information is provided in the description of the provided CHO. | Example 1 | <dc:description>China</dc:description> | Easy, though one needs an agreement on what is "very short". Also, dc:description is often used as the catch-all for any info that does not fit into any other field. In SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:path dc:description; sh:minLength 50 ] . | warning | An implementation could measure confidence (a description of one letter is certainly not enough; one word is not enough 99% of the time; three words 95% of the time; etc.). | Basic Retrieval: record unlikely to be retrieved; record uninformative
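
The confidence idea in the P9 remarks can be sketched as a small lookup. The figures for one letter, one word and three words come from the remarks above; treating two words like three, and returning `None` beyond three words, are assumptions made only for illustration.

```python
def too_short_confidence(description):
    """Estimate how confident we are that a description is too short.

    Returns a value between 0 and 1, or None when the remarks above give
    no figure (more than three words).
    """
    text = description.strip()
    words = text.split()
    if len(text) <= 1:
        return 1.0   # empty or a single letter: certainly not enough
    if len(words) == 1:
        return 0.99  # one word
    if len(words) <= 3:
        return 0.95  # assumption: two words fall in the three-word bucket
    return None      # no agreed figure for longer descriptions

print(too_short_confidence("China"))
```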
16
P10 | Empty literals | A property is mapped, but there is no value in the property; just an empty space or no data at all | [EXAMPLE MISSING] (not easy to find as they are removed from search) | <dc:subject></dc:subject> | [no mapping to dc:subject] | yes (in SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:path dc:subject; sh:hasValue "" ] .) | warning | These empty values can interfere with the mapping process and invalidate records. Empty literals are now removed during normalisation so they are not indexed anymore. | Basic retrieval: diminished chance of retrieval; record uninformative; potentially breaks completeness measure

If these empty values were not removed, they would skew the searchability of data
17
P34(seemingly) empty fieldA property is mapped, but the value is not a relevant valueExample 1
Example 2
Example 3 (in dcterms:spatial)
Example 4 (in dcterms:spatial)
Example 5 (in dc:subject & dc:type)
<dc:title>???</dc:title>[no mapping to dc:title]yesCf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
18
P42Lack of context to geotagging/location infoIn some cases it is not clear if the provided location information is the location of the object or the location depicted in the object or where it was made, etc.[Example missing]Hard to detect.
Method: Semantic Enrichment, with NERD + Classification or reasoning
informative | Should we try to discourage providers from using dcterms:spatial in favour of dc:subject and edm:currentLocation when these fields are more appropriate (which is of course not always the case)?
For now we plan to act on this problem by updating the mapping guidelines. cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
Basic retrieval; spatial search
19
4. Non-optimal use of fields
20
P12Extremely long titlesValues that are not the actual titles are given in the dc:title, while they would belong in dc:descriptionExample 1
Example 2
<dc:title>[transcription of complete poem]</dc:title><dc:description>[transcription of complete poem]</dc:description>Easy, though one needs an agreement on what is "extremely long". warningLonger values can be mapped to dc:description. Also note that when there is a subtitle for an object, this can be given in dcterms:alternativeBasic Retrieval
Distorts search weightings; limits legibility to user

Titles should be clear and concise to match usual practices of (web page) display. They also help searchability by lowering noise.
21
P13edm:type with same content as dc:typeInstead of using dc:type as a specification like 'poetry' for edm:type=TEXT or 'painting' for edm:type=IMAGE, the value dc:type repeats the information of edm:typeExample 1dc:type - "Image" and edm:type "IMAGE" OR
dc:type - "Text" and edm:type "TEXT"
<edm:type>TEXT</edm:type>
<dc:type>Poetry</dc:type>
OR
<edm:type>IMAGE</edm:type>
<dc:type>Photography</dc:type>
yes (queries: text, image, sound, video, 3D)warningCf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.vb938dqwyl3j

Making providers stop using image or text as a value for dc:type seems a bit exaggerated, especially when it is in a language other than English. DCMI says the scope of the dc:type element refers to various types (movie, sound, book, collection) and also to genre. The point is maybe to be informative and help providers be more specific rather than to pretend to fix it.
Basic Retrieval
Slight distortion of search weightings

The more specific the information in dc:type, the richer experience for the user (for finding and understanding the item)
22
P14Swapped thumbnail and full-size imagea thumbnail is given in edm:isShownBy, and the full sized object is given in edm:objectExample 1<edm:object rdf:resource="http://bigimage-url.jpg"/>
<edm:isShownBy rdf:resource="http://thumbnail-url.jpg"/>
<edm:object rdf:resource="http://thumbnail-url.jpg"/>
<edm:isShownBy rdf:resource="http://bigimage-url.jpg"/>
This can be checked by comparing resolutions (in the technical metadata) but we can only compare the thumbnail with the biggest resolution image.warningImpact on accessing the item:
- for images, this will cause a beautiful thumbnail to be created, but when users click to see it in full screen (and not in Europeana thumbnail mode) they see a smaller thumbnail.
- for other file types, this will result in unviewable/playable content

Lower content tier. Slow loading? Poor image display?
23
P35Unfit edm:isShownBy in edm:objectedm:isShownBy has been filled in the edm:object, while it is not an image (for example a PDF or audio file) edm:object MUST be an imageExample 1<edm:object rdf:resource="http://hdl.handle.net/11088/de-bo133:doc:140628"/> (PDF)yesCf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.hmzmrda9pa40
24
P36Generic property is used while there is a more specific appropriate oneA property is used but there is a more appropriate one (e.g. edm:hasMet instead of dcterms:spatial)Example 1<edm:hasMet>geo:16.067,108.233</edm:hasMet><dcterms:spatial>geo:16.067,108.233</dcterms:spatial>yes, partially (for example checking which items have dc:date while they may have dcterms:created or dcterms:issued instead)Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
25
P37 | There is confusion between genre and type | The DCMI specification for the scope of the type element refers to various types (movie, sound, book, collection) and also to genre. But in Europeana dc:type is generally used to record the digital type of the CHO, and the vocabularies that cover genres (of music, architecture, paintings) are often linked to subject. There's a need to review how the elements dc:type and dc:subject are actually used in Europeana and whether the semantics are clear, and then to make a recommendation on best practice which explicitly clarifies where genre should be recorded | [EXAMPLE MISSING] | Not easy. Maybe by using semantic enrichment with a list of genres and types | warning | The mapping guidelines were updated earlier this year so that they're no longer confusing. But maybe they can be enhanced. https://europeana.atlassian.net/wiki/spaces/EF/pages/2106294284/edm+ProvidedCHO#dc:type

Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
26
5. Wrong data
27
P18 | Impossible dates, e.g. spring 20011 | Due to automation or wrong values, dates that are obviously wrong are provided. This information compromises the dates we have in Europeana and makes it impossible to reliably search by year | Example 1
Example 2
Example 3
Example 4
<dc:date>3500</dc:date>
<dc:date>31 June 1954</dc:date>
<dcterms:created>30.02.1902 (Herstellung)</dcterms:created>
<dc:date>1962-11-31</dc:date>
Earlier examples include <dc:date>spring 20011</dc:date>
<dcterms:published>-44050</dcterms:published>
[no mapping] | Hard. There are so many patterns that it is very difficult even to extract the year in order to check whether it is valid. Example queries: 1, 2 | warning | In these cases the values given are inaccurate and should be mapped out. If it turns out they are wrongly mapped identifiers they can be remapped, or if a stray character garbled the date this can be corrected.

We could ask EKT if they have something about it as part of their time enrichment (https://docs.google.com/presentation/d/18itsU8-KZ4kpEMJG_LEL_GLeeW3kcj1NMTTVxN-BKY8/edit#slide=id.g80c2fee03b_0_425)

Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.azn3kbomh4ek
Basic Retrieval

Prevents creation of proper date filters (this example from Europeana Fashion), where date is interpreted as the year 11
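
A minimal sketch of a partial check for P18, assuming only two patterns are handled: a bare year and an ISO-style YYYY-MM-DD day. It flags years in the future and calendar days that do not exist. It does not try to parse free-text patterns such as "31 June 1954", and it will not catch implausibly old years such as -44050. The function name and the `max_year` parameter are illustrative.

```python
import re
from datetime import date

YEAR = re.compile(r"-?\d+")
ISO_DAY = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def check_date(value, max_year):
    """Classify a date string as 'ok', 'impossible' or 'unrecognised'."""
    value = value.strip()
    if YEAR.fullmatch(value):
        # Bare year, optionally negative for BC; only future years are flagged.
        return "ok" if int(value) <= max_year else "impossible"
    match = ISO_DAY.fullmatch(value)
    if match:
        year, month, day = (int(g) for g in match.groups())
        try:
            date(year, month, day)  # raises ValueError for e.g. 31 November
        except ValueError:
            return "impossible"
        return "ok" if year <= max_year else "impossible"
    return "unrecognised"

for sample in ["3500", "1962-11-31", "31 June 1954"]:
    print(sample, "->", check_date(sample, max_year=2024))
```

Values reported as unrecognised would still need a pattern-specific parser before they can be validated.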
28
P19 | Wrong references for a controlled vocabulary: a. wrong URI (differs from authoritative version); b. not a URI | When providing references to a vocabulary, Europeana needs the URI at which the resource data is machine-accessible (as LOD). Giving the literals from the vocabulary means giving less rich data to Europeana than possible | Example 1
Example 2
<dc:subject><iconclass=49G35(+52)> tools, instruments; laboratory equipment - scientific research</dc:subject>
<dc:creator rdf:resource="http://d-nb.info/gnd/138541442"/>
<dc:subject rdf:resource="http://iconclass.org/rkd/49G35(%2B52)"/>
<dc:creator rdf:resource="https://d-nb.info/gnd/138541442"/>
b can be automatically checked. a is less easy. There is a suggestion:
1) Make a list of domains and URIs in Europeana;
2) Analyze the outcomes for the 'bad' URIs per domain and group them;
3) Create rules per domain to extract @id var;
4) Build and output authoritative URIs by concatenating "authoritative_domain_URI + @id"

Checking for special characters (such as =) or numbers could be another option.
warningHas been renamed, cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45

More examples/stats:
2.541.297 records with GND links using HTTP protocol (instead of HTTPS): https://api.europeana.eu/record/search.json?query=%22http://d-nb%22*&wskey=
3.421.669 records with Geonames links using HTTP protocol (instead of HTTPS): https://api.europeana.eu/record/search.json?query=%22http://sws.geonames.org%22*&wskey=&profile=rich
708.154 records with Geonames links using www instead of sws: https://api.europeana.eu/record/search.json?query=*%22www.geonames.org%22*&wskey=&profile=rich
566.549 records with Wikidata links using HTTPS (instead of HTTP): https://api.europeana.eu/record/search.json?query=%22https://www.wikidata.org%22*&wskey=&profile=rich
662.028 records with Wikidata links using /wiki instead of /entity: https://api.europeana.eu/record/search.json?query=*%22www.wikidata.org/wiki/%22*&wskey=&profile=rich
59742 records with a BNE URI that starts with a whitespace: https://api.europeana.eu/record/search.json?query=%22%20http://datos.bne.es/resource/%22*&wskey=
61552 records referring to the vocabulary of the International Music Score Library Project using the HTTP version instead of the HTTPS official: https://api.europeana.eu/record/search.json?query=%22http://imslp.org/wiki/Category:%22*&wskey=
10019 records refer to the Finnish National Gallery using a URL that is no longer recognisable/resolvable (the Wikidata Property https://www.wikidata.org/wiki/Property:P4177 indicates a different URI pattern): https://api.europeana.eu/record/search.json?query=%22http://kansallisgalleria.fi/%22*&wskey=
31 records referring to catalogue.bnf.fr instead of data.bnf.fr: https://api.europeana.eu/record/search.json?query=%22http://catalogue.bnf.fr/ark:/%22*&wskey=
Basic Retrieval

Lowers metadata tier
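
The four-step suggestion for P19 (group bad URIs per domain, extract the identifier with a per-domain rule, rebuild the authoritative URI) can be sketched as follows. The three rules are inferred from the statistics listed above; the rule table, the function name and the trailing slash on the Geonames form are assumptions that should be verified against each vocabulary's documentation.

```python
import re

# Each rule: (pattern capturing the identifier, authoritative prefix, suffix).
RULES = [
    (re.compile(r"^https?://d-nb\.info/gnd/(?P<id>[^/\s]+)$"),
     "https://d-nb.info/gnd/", ""),
    (re.compile(r"^https?://(?:www|sws)\.geonames\.org/(?P<id>\d+)(?:/.*)?$"),
     "https://sws.geonames.org/", "/"),  # trailing slash is an assumption
    (re.compile(r"^https?://www\.wikidata\.org/(?:wiki|entity)/(?P<id>Q\d+)$"),
     "http://www.wikidata.org/entity/", ""),
]

def normalise_uri(uri):
    """Rebuild a known vocabulary URI as prefix + id + suffix."""
    uri = uri.strip()  # also repairs URIs that start with a whitespace
    for pattern, prefix, suffix in RULES:
        match = pattern.match(uri)
        if match:
            return prefix + match.group("id") + suffix
    return uri  # unknown domain: return the trimmed URI unchanged

print(normalise_uri("http://d-nb.info/gnd/138541442"))
print(normalise_uri("https://www.wikidata.org/wiki/Q42"))
```

URIs from domains without a rule pass through unchanged, so new rules can be added per domain as the bad patterns are grouped.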
29
P41Use of vocabulary references that are correct but not de-referencedUse of vocabulary references (without a corresponding EDM contextual class) that are correct but not de-referenced. a. the URI correspond to a vocabulary that is de-referenceable but currently not supported by Europeana
b. the URI corresponds to a vocabulary that is not de-referenceable in absolute
Example 1a
Example 1b
Not easilywarningCf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.9evs7xm4qhvw
30
6. Normalisation
31
P20Time period in specific formatting: 3200[ac]-2250[ac]Because there is data from many providers who keep different standards, having minimal formatting in values like dates is desirable.Example 1
Example 2
Example 3
<dcterms:created>3200[ac]-2250[ac]</dcterms:created> | TBD: cf best practices for dates | Hard | warning | For year spans BC there is currently no best practice. This is something the DQC and Europeana need to think about! For single years BC, '-3200' would be a good practice. It is also important to note that year spans cannot yet be used in facets.

Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.9evs7xm4qhvw

Although we initially thought that this could be merged with 'P17 - Term not fitting against a controlled list of terms' [deprecated], we now realise that P20 is not necessarily a matter of controlled vocabulary; it could rather be a matter of best practices to represent dates. It may also overlap with ongoing efforts on date normalization
Basic Retrieval;

Normalizing dates to a consistent representation would help search and visualization
32
P38Incorrect lang tagThe provided language attribute for the metadata value is not correct (e.g. 'nl' instead of 'en')Example 1
Example 2
<dc:title xml:lang="nl">Actualités britanniques 1914-1915</dc:title> <dc:title xml:lang="en">Actualités britanniques 1914-1915</dc:title>
<dc:title xml:lang="ro">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837 = Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title>
<dc:title xml:lang="fr">Actualités britanniques 1914-1915</dc:title> (only one title)
<dc:title xml:lang="de">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837</dc:title> <dc:title xml:lang="ro">Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title>
Not easy in general: language detection should be applied. But it could be easily partially detected by checking cardinality of languages in the specific case when only one value is expected by language for a property (if a language has several labels, a warning could be sent). Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.vb938dqwyl3j
33
P15 | Invalid language tags (xml:lang attributes) | xml:lang attributes are not valid according to ISO 639-1, ISO 639-2 or ISO 639-3 | Example 1 (gr, note that it's also a case of incorrect tagging)
Example 2 (mehrspr)
Cf. column C for most frequent tags
<dc:title xml:lang="gr">Golgi's votive relief inscription - image (E6 in Voskos, 1997)</dc:title> (also a case of incorrect tagging)
<dc:title xml:lang="mehrspr">Thesaurus inscriptionum Aegyptiacarum: altaegyptische Inschriften</dc:title>
<dc:title xml:lang="gr">Golgi's votive relief inscription - image (E6 in Voskos, 1997)</dc:title>
For example 2 ("Thesaurus inscriptionum Aegyptiacarum: altaegyptische Inschriften"), mul is apparently acceptable in ISO but it is hard for us to recommend it (if the portal cannot handle it properly). Maybe the value could be in the "main" language (trying to identify a primary language being used; if text in other languages is present it could be represented using quotes): Latin title could be quoted and the text marked as German. And additionally, an alternative title could be added with the Latin text only.
Yes, by checking compliance against a controlled list of terms. Some normalization can be done (and has been done), such as mapping "Spanish" to "es", but it does not catch every case. Cf stats on (failure of) language normalization at https://rnd-2.eanadev.org/share/language-normalisation/Language_Provider_xml_lang_not_normalizable.txt In 2022, the most frequent invalid language tags were nah (114866), za (67134), bh (14457), sgs (13974), ltg (10030), cel (1693), enenen (1334), xxx (390) warningMetis normalizes many invalid tags already - but not all (its normalization focuses on normalizing valid tags).

Ongoing work at https://europeana.atlassian.net/browse/RD-111 Jena RIOT was suggested as a checking method in the past
See stats at https://rnd-2.eanadev.org/share/language-normalisation/LanguageDataReport.html

What cannot be normalized could be flagged as potentially invalid. The severity in the report could be "Error" considering that this is a case of invalidity.

Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.azn3kbomh4ek
Basic Retrieval; Cross-language recall; Improved language facets
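
Checking P15 against a controlled list, with a normalisation step such as "Spanish" to "es", could look like the sketch below. `KNOWN_CODES` is a deliberately tiny sample and `NAME_TO_CODE` a hypothetical mapping; a real check would load the full ISO 639 code lists. This is not the Metis normalisation itself.

```python
# Sample of accepted codes only; not a complete ISO 639 list.
KNOWN_CODES = {"en", "fr", "de", "nl", "ro", "da", "es", "el", "la", "mul"}
# Hypothetical mapping from language names to codes.
NAME_TO_CODE = {"spanish": "es", "english": "en", "french": "fr"}

def check_lang_tag(tag):
    """Return (tag or None, status), status being 'valid', 'normalised' or 'invalid'."""
    raw = tag.strip()
    primary = raw.split("-")[0].lower()  # "en-GB" -> "en"
    if primary in KNOWN_CODES:
        return raw, "valid"
    if raw.lower() in NAME_TO_CODE:
        return NAME_TO_CODE[raw.lower()], "normalised"
    # What cannot be normalised is flagged as potentially invalid.
    return None, "invalid"

for tag in ["fr", "Spanish", "gr", "mehrspr"]:
    print(tag, "->", check_lang_tag(tag))
```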
34
7. Dependency on external resources
35
P22Swapped isShownBy and isShownAt valuesthe isShownBy and isShownAt values are mixed; this causes an error in processing files for Europeana service and blocks the user from finding the institution's websiteExample 1
Example 2

<edm:isShownAt rdf:resource="http://website.com/image.jpg"/>
<edm:isShownBy rdf:resource="http://website.com/image-information.html"/>
<edm:isShownAt rdf:resource="http://website.com/image-information.html"/>
<edm:isShownBy rdf:resource="http://website.com/image.jpg"/>
Yes if we check the edm:isShownAt for specific mime types (e.g. existence of .jpg). Example queries: jpg, jpeg, png, PDF, pdf mp3, gif, mp4...warningExcept for embedding casesLowers content tier
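
The extension-based check for P22 can be sketched as below. The extension list mirrors the example queries above, and the function name is illustrative. A match is only a hint, since embedding cases are a legitimate exception and a file extension is not a reliable MIME type.

```python
import posixpath
from urllib.parse import urlparse

# Extensions taken from the example queries above.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".mp3", ".mp4"}

def looks_like_media(url):
    """True if the URL path ends in a known media file extension."""
    path = urlparse(url).path  # drops the query string and fragment
    return posixpath.splitext(path)[1].lower() in MEDIA_EXTENSIONS

# An edm:isShownAt value pointing at a media file suggests swapped fields.
print(looks_like_media("http://website.com/image.jpg"))
print(looks_like_media("http://website.com/image-information.html"))
```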
36
8. Serialization / format / encoding
37
P25Field should be literal but is URLFor some fields we only allow literal values, even if there are URIs or URLs available on your side.Example 1 <dc:title>http://dx.doi.org/10.1080/00905990701368738</dc:title><dc:title xml:lang="en">Crystallizing and Emancipating Identities in Post-Communist Estonia</dc:title>Yes, using e.g. IRI validation in RDF validatorsErrorDatasets with http in the title: https://www.europeana.eu/api/v2/search.json?query=proxy_dc_title:(*http*)&rows=0&start=1&facet=edm_datasetName&profile=facets&f.edm_datasetName.facet.limit=1000&wskey=tbc

[AI: 12-04-2023: some issues for contextual classes come from mappings and could be fixed via them]

Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.nbmjo5zhye16
Basic Retrieval

Results in field treated as local-file URL
38
P27Many URLs in one fieldWhen multiple URLs are given in one field, it will be read as one link, which will fail. All values should be given in separate fieldsExample 1 (in edm:isShownBy)<edm:isShownBy rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_001.tif; http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_002.tif;"/><edm:isShownBy rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_001.tif"/>
<edm:hasView rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_002.tif"/>
Yes, using e.g. IRI validation in RDF validatorsErrorEach record can have only 1 isShownBy, however any additional views of the object can be given in edm:hasView, which can be repeated.

[HS: 16-03-2023: it seems that at least for media links the problem is pretty small. Do we have other fields in mind?]

Cases of duplicate URL could be corrected
Field cannot be parsed; link cannot be resolved

While not a technical problem, such URL has a low chance of being permanent
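
A minimal sketch for detecting and splitting P27 values, assuming the URLs are separated by semicolons and/or whitespace as in Example 1. Semicolons can legitimately occur inside a URL, so the result should be reviewed rather than applied blindly. The function name is illustrative.

```python
import re

def split_urls(value):
    """Split a field value on semicolons and whitespace, dropping empty parts."""
    return [part for part in re.split(r"[;\s]+", value) if part]

value = ("http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_001.tif; "
         "http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_002.tif;")
urls = split_urls(value)
# More than one part means the field needs splitting: the first URL can stay
# in edm:isShownBy and the others can move to edm:hasView.
print(len(urls), urls)
```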
39
P30HTML in fieldsThe use of html is not supported in literal fields of EDM. It makes literals largely unreadable for humans.

On the other hand, <br/> is needed for poetry or lyrics but the portal doesn't handle line breaks.
Example 1
Example 2
Example 3
<dc:description>Ta gjerne med borna på tur langs Rallarvegen. <br/> <p><b><span>Borna sine turreglar</span></b></p> </dc:description><dc:description>Ta gjerne med borna på tur langs Rallarvegen. Borna sine turreglar</dc:description>Yes (using pattern matching)WarningHTML is in many cases website specific, and may not belong in the source XML of data

Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45
Basic Retrieval

Can create false positives in search; display unreadable
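
The pattern matching mentioned for P30 could be sketched with a regular expression for tag-like substrings. The pattern is an assumption and deliberately simple: it looks for `<name ...>`, `</name>` and `<name/>` forms, and does not detect escaped HTML such as `&lt;a&gt;`.

```python
import re

# A "<", an optional "/", a tag name, optional attributes, an optional "/", then ">".
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")

def contains_html(value):
    """True if the literal contains something that looks like an HTML tag."""
    return bool(HTML_TAG.search(value))

print(contains_html("Ta gjerne med borna på tur langs Rallarvegen. <br/>"))
print(contains_html("Ta gjerne med borna på tur langs Rallarvegen."))
```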
40
P31Schematic data within fieldsDuring a mapping from one format to another, the schematic data of the original mark up is incorporated as value into EDMExample 1 (for the title in Swedish)
Example 2 (for ISBD notation)
Example 3 (for escaped HTML link)
<dc:title>{"danish"=>["Vester Sakskærsgård"]}</dc:title>
<dc:title xml:lang="ro">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837 = Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title>
<dc:publisher>&lt;a href=\u0022http://www.wydawnictwo.pk.edu.pl/\u0022 target=\u0022_blank\u0022&gt;Wydawnictwo PK&lt;/a&gt;</dc:publisher>
<dc:title xml:lang="da">Vester Sakskærsgård</dc:title>
<dc:title xml:lang="de">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837</dc:title> <dc:title xml:lang="ro">Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title>
Having only the name of the Publisher would be ok
A good way would be to count the number of non-text characters (control characters used in schemas, e.g. <>=/{}[]) and compare their percentage against that of text characters | Warning | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45 | Basic Retrieval

Distorts search metrics; relevant values are cluttered with machine-readable formatting
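
The character-counting idea for P31 can be sketched as a ratio of schema characters to all non-whitespace characters. The character set is the one listed above. Any threshold for flagging a value would be an assumption that needs tuning on real data, since legitimate titles also contain brackets such as "[...]".

```python
SCHEMA_CHARS = set("<>=/{}[]")

def schema_char_ratio(value):
    """Share of non-whitespace characters that are schema/control characters."""
    chars = [c for c in value if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in SCHEMA_CHARS for c in chars) / len(chars)

print(schema_char_ratio('{"danish"=>["Vester Sakskærsgård"]}'))
print(schema_char_ratio("Vester Sakskærsgård"))
```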
41
P39Escape and special characters in titles and descriptions The issue is mostly relevant for translations provided from our data providersExample 1
Example 2
<dc:title>Patrick\ n\ t\ n\ t Patrick. </dc:title>
<dc:title>Exotic visitors for London_x000D_ H H</dc:title>
<dc:title>Patrick Patrick.</dc:title>
<dc:title>Exotic visitors for London H H</dc:title>
yesCf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45

Should we check more fields or only titles and descriptions?
42
P40 | Special characters are not represented with the right encoding | Some characters (such as music notation) can be provided using the wrong encoding, for example using an HTML character reference like &#x266d; instead of the UTF-8 encoded character. This relates to the issue of having HTML in the value of metadata fields, though the motivation for the problem is quite different. Sometimes the special character is only represented with a normal letter ('b' for the flat sign), losing the original information. | Example 1 (for no encoding), Example 2 (for Unicode U+FFFD character �) | <dc:description>Karl Sch�nb�ck</dc:description>
<dc:description>Karl Schönböck</dc:description>
yesCf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote