| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | guage | |||||||||||||||||||||
2 | ID | Description of problem | Explanation | Example | What is in the data now | What we would like to see | Can be quantified? | Severity (pre-2022) | Remarks | Discovery Scenario Affected (Impact) | ||||||||||||
3 | 1. Duplicate / Redundant information (within a field, across fields and across a collection) | |||||||||||||||||||||
4 | P1 | Systematic use of the same title | Within the dataset/collection multiple records use the same title | Example 1 Example 2 Example 3 | <dc:title>OLJEMÅLNING</dc:title> <dc:title>OLJEMÅLNING</dc:title> | <dc:title>OLJEMÅLNING - [X]</dc:title> <dc:title>OLJEMÅLNING - [Y]</dc:title> | yes | warning | If the data is not available for completely unique titles, consider appending another value to tell something unique about the object: for example in the 'Rijksmonument' append with the location to get for example: "Rijksmonument Gelderland" | Basic Retrieval Lack of differentiation in search; title uninformative; negative for SEO | ||||||||||||
5 | P2 | Equal title and description fields | The title is a repeat of the exact information in the description, or the other way around | Example 1 Example 2 | <dc:title>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:title> <dc:description>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:description> | <dc:title>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:title> [no mapping of "Doll, [...]" to dc:description] | yes (in SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:predicate dc:title; sh:equals dc:description ] .) | warning | If the information within the properties is identical, there is no need to duplicate; either dc:title or dc:description is mandatory, so we rather have the information somewhere as specific as possible. | Basic Retrieval Repetition of field values does not increase searchability and it hampers visualization Distorts search weightings | ||||||||||||
6 | P3 | Near-identical descriptions and title fields | The description is nearly the same as the title, with maybe some additional information that comes from other properties | [EXAMPLE MISSING] | <dc:title>repeated text</dc:title> <dc:creator>name of creator</dc:creator> <dc:description>repeated text + name of creator</dc:description> | <dc:title>repeated text</dc:title> <dc:creator>name of creator</dc:creator> [no mapping of "repeated text + name of creator" to dc:description] | not easily | warning | If the information within the properties is identical, there is no need to duplicate; either dc:title or dc:description is mandatory. Concatenating with another property that is already present is superfluous. | Basic Retrieval Distorts search weightings; distorts completeness measurement Repetition of field values does not increase searchability and it hampers visualization | ||||||||||||
7 | P32 | Duplicate metadata statements | The same property with the same value is repeated twice | Example 1 | dcterms:spatial "London" is repeated twice. dcterms:spatial "urn:rijksmuseum:thesaurus:RM0001.THESAU.4157" is repeated twice. urn:rijksmuseum:thesaurus:RM0001.THESAU.24458 is repeated twice in dc:subject. | No duplication | yes | Note: this duplication happens in the provider metadata, not between the provider metadata and the Europeana enrichment. The problem is only partially handled by Metis normalisation (dc:title, dcterms:alternative, dc:subject, and dc:identifier). It can easily be fixed with normalization at ingestion time, or during Solr ingestion (cf https://europeana.atlassian.net/browse/SEAR-93). Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.hmzmrda9pa40can | Affects user experience for display and search behaviour. | |||||||||||||
8 | P33 | Duplicate objects within a dataset | Datasets contain "repeated" objects with different identifiers (and sometimes metadata) but same image. | Example 1a Example 2a (same URL in edm:isShownBy, metadata is different) | yes (in principle) | warning | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.4ymd6yez22gi | Basic retrieval Illegibility, distorts search. | ||||||||||||||
9 | 2. Irrelevant information | |||||||||||||||||||||
10 | P5 | Unrecognizable titles | No descriptive information is given in the titles; only identifiers and shelfmarks | [EXAMPLE MISSING] (here we have an example of shelfmark but in dc:description) | <dc:title>NLD-820630-AMSTERDAM</dc:title> (in example with dc:description: "Call Number - 0028619") | [no mapping to dc:title]<dc:title></dc:title> <dc:identifier>NLD-820630-AMSTERDAM</dc:identifier> | Not easily. E.g. "London-1998" could be an id but it could also be a title (a painting of London in 1998). Uniqueness tests could help to recognize the titles that are actually identifiers: because they're identifiers, they're likely to be unique within our datasets. | warning | Identifiers should be put in dc:identifier. Shelfmarks belong in dc:description with clarification that these are shelfmarks. dc:title is not mandatory if dc:description is present | Basic Retrieval Bypasses completeness metrics; exposes internal messages to external users | ||||||||||||
11 | P6 | Non-meaningful title | A standard value is put in when there is no title attached to the record: "no suitable title found", "unknown title", etc. ...in many languages | Example 1 Example 2 Example 3 Example 4 Example 5 | <dc:title>No title</dc:title> | [no mapping to dc:title] | Yes, if we look for specific values in some catalogue of unwanted keywords, like "no title". However, there are objects (e.g. paintings) that really don't have a title. And it wouldn't be easy to get all possible variants (e.g. across languages) of unwanted keywords. | warning | dc:title is only mandatory when dc:description is not present. If both are unavailable or not meaningful, the record may not be findable anyway and not suitable for publishing on Europeana. | Basic Retrieval Bypasses completeness metrics ("empty" being a non-empty value); exposes internal messages to external users; negative for click-through rates in-Europeana and in search engines Non-meaningful titles do not help searchability of an item | ||||||||||||
12 | 3. Missing or Incomplete information | |||||||||||||||||||||
13 | P7 | Missing description fields | A description is available on the website of the provider, but has not been mapped to EDM | Example 1 | [no mapping of "information about the object" to dc:description] | <dc:description>information about the object</dc:description> | Not easily. For big datasets with no description we could check their websites | warning | dc:description is not mandatory if dc:title is present. However, it is a pity to not exploit existing descriptions, if it is possible. | Basic Retrieval: record unlikely to be retrieved; record uninformative If a dc:description is available, there is more descriptive information available for the user and heightens the findability of the object | ||||||||||||
14 | P8 | Missing lang tag | The language of metadata is not specified in an xml:lang tag, when data is monolingual per property NB: This is for specific fields, since not all values are language specific. | Example 1 | <dc:subject>oil on canvas</dc:subject> <dc:subject>l'huile sur toile</dc:subject> | <dc:subject xml:lang="en">oil on canvas</dc:subject> <dc:subject xml:lang="fr">l'huile sur toile</dc:subject> | Partly. Circa 20m records have less than 25% lang attributes (metadata tier 0) | warning | The Europeana R&D team are working on an experiment to detect the language of metadata. | Basic Retrieval; Cross-language recall; Improved language facets Language-tagging data greatly improves enrichment possibilities in the data, and enables Europeana to give the user more data which they can understand in their own language Not language-tagging data in the Europeana context would also mean lower metadata tier | ||||||||||||
15 | P9 | Very short description field | Insufficient information is provided in the description of the provided CHO. | Example 1 | <dc:description>China</dc:description> | Easy, though one needs an agreement on what is "very short". (Also, dc:description is often used as a catch-all for any information that does not fit into another field.) In SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:predicate dc:description; sh:minLength 50 ] . | warning | Implementation could measure confidence (description of one letter is certainly not enough; one word is not enough 99% of the time; three words 95% of the time; etc). | Basic Retrieval: record unlikely to be retrieved; record uninformative | |||||||||||||
16 | P10 | Empty literals | A property is mapped, but there is no value in the property; just an empty space or no data at all | [EXAMPLE MISSING] (not easy to find as they are removed from search) | <dc:subject></dc:subject> | [no mapping to dc:subject] | yes (in SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:predicate dc:subject; sh:hasValue "" ] .) | warning | These empty values can interfere with the mapping process and invalidate records. Empty literals are now removed during normalisation so they are not indexed anymore. | Basic retrieval: diminished chance of retrieval; record uninformative; potentially breaks completeness measure If these empty values were not removed, they would skew the searchability of data | ||||||||||||
17 | P34 | (seemingly) empty field | A property is mapped, but the value is not a relevant value | Example 1 Example 2 Example 3 (in dcterms:spatial) Example 4 (in dcterms:spatial) Example 5 (in dc:subject & dc:type) | <dc:title>???</dc:title> | [no mapping to dc:title] | yes | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote | ||||||||||||||
18 | P42 | Lack of context to geotagging/location info | In some cases it is not clear if the provided location information is the location of the object or the location depicted in the object or where it was made, etc. | [Example missing] | Hard to detect. Method: Semantic Enrichment, with NERD + Classification or reasoning | informative | Should we try to discourage providers from using dcterms:spatial in favour of dc:subject and edm:currentLocation when these fields are more appropriate (which is of course not always the case)? For now we plan to act on this problem by updating the mapping guidelines. cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote | Basic retrieval; spatial search | ||||||||||||||
19 | 4. Non-optimal use of fields | |||||||||||||||||||||
20 | P12 | Extremely long titles | Values that are not the actual titles are given in the dc:title, while they would belong in dc:description | Example 1 Example 2 | <dc:title>[transcription of complete poem]</dc:title> | <dc:description>[transcription of complete poem]</dc:description> | Easy, though one needs an agreement on what is "extremely long". | warning | Longer values can be mapped to dc:description. Also note that when there is a subtitle for an object, this can be given in dcterms:alternative | Basic Retrieval Distorts search weightings; limits legibility to user Titles should be clear and concise to match usual practices of (web page) display. They also help searchability by lowering noise. | ||||||||||||
21 | P13 | edm:type with same content as dc:type | Instead of using dc:type as a specification like 'poetry' for edm:type=TEXT or 'painting' for edm:type=IMAGE, the value dc:type repeats the information of edm:type | Example 1 | dc:type - "Image" and edm:type "IMAGE" OR dc:type - "Text" and edm:type "TEXT" | <edm:type>TEXT</edm:type> <dc:type>Poetry</dc:type> OR <edm:type>IMAGE</edm:type> <dc:type>Photography</dc:type> | yes (queries: text, image, sound, video, 3D) | warning | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.vb938dqwyl3j Making providers stop using image or text as a value for dc:type seems a bit exaggerated, especially when it is in languages other than English. DCMI says the scope of the dc:type element refers to various types (movie, sound, book, collection) and also to genre. The point is maybe to be informative and help providers to be more specific rather than pretend to fix it. | Basic Retrieval Slight distortion of search weightings The more specific the information in dc:type, the richer experience for the user (for finding and understanding the item) | ||||||||||||
22 | P14 | Swapped thumbnail and full-size image | a thumbnail is given in edm:isShownBy, and the full sized object is given in edm:object | Example 1 | <edm:object rdf:resource="http://bigimage-url.jpg"/> <edm:isShownBy rdf:resource="http://thumbnail-url.jpg"/> | <edm:object rdf:resource="http://thumbnail-url.jpg"/> <edm:isShownBy rdf:resource="http://bigimage-url.jpg"/> | This can be checked by comparing resolutions (in the technical metadata) but we can only compare the thumbnail with the biggest resolution image. | warning | Impact on accessing the item: - for images, this will cause a beautiful thumbnail to be created, but when the user clicks to see it in full screen (and not in Europeana thumbnail mode) they see a smaller thumbnail. - for other file types, this will result in unviewable/unplayable content Lower content tier. Slow loading? Poor image display? | | | ||||||||||||
23 | P35 | Unfit edm:isShownBy in edm:object | edm:isShownBy has been filled in the edm:object, while it is not an image (for example a PDF or audio file) edm:object MUST be an image | Example 1 | <edm:object rdf:resource="http://hdl.handle.net/11088/de-bo133:doc:140628"/> (PDF) | yes | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.hmzmrda9pa40 | |||||||||||||||
24 | P36 | Generic property is used while there is a more specific appropriate one | A property is used but there is a more appropriate one (e.g. edm:hasMet instead of dcterms:spatial) | Example 1 | <edm:hasMet>geo:16.067,108.233</edm:hasMet> | <dcterms:spatial>geo:16.067,108.233</dcterms:spatial> | yes, partially (for example checking which items have dc:date while they may have dcterms:created or dcterms:issued instead) | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote | ||||||||||||||
25 | P37 | There is confusion between genre and type | The DCMI specification for the scope of the type element refers to various types (movie, sound, book, collection) and also to genre. But in Europeana dc:type is generally used to record the digital type of the CHO, and the vocabularies that cover genres (of music, architecture, paintings) are often linked to subject. There's a need to review how the elements dc:type and dc:subject are actually used in Europeana and whether the semantics are clear. Then to make a recommendation on best practice, which explicitly clarifies where genre should be recorded | [EXAMPLE MISSING] | not easy. Maybe by using semantic enrichment with a list of genres and types | warning | The mapping guidelines have been updated earlier this year so that they're no longer confusing. But maybe they can be enhanced. https://europeana.atlassian.net/wiki/spaces/EF/pages/2106294284/edm+ProvidedCHO#dc:type Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote | |||||||||||||||
26 | 5. Wrong data | |||||||||||||||||||||
27 | P18 | Impossible dates e.g. spring 20011 | Due to automation/wrong values, dates that are obviously wrong are provided. This information compromises the dates we have in Europeana and makes it impossible to reliably search by year | Example 1 Example 2 Example 3 Example 4 | <dc:date>3500</dc:date> <dc:date>31 June 1954</dc:date> <dcterms:created>30.02.1902 (Herstellung)</dcterms:created> <dc:date>1962-11-31</dc:date> Earlier examples include <dc:date>spring 20011</dc:date> <dcterms:published>-44050</dcterms:published> | [no mapping] | Hard. There are so many patterns that it is very difficult even to extract the year in order to check whether it is valid. Example queries: 1, 2 | warning | In these cases the values given are inaccurate and should be mapped out. If it turns out they are wrongly mapped identifiers they can be remapped, or if a character confused the date this can be corrected. We could ask EKT if they have something about it as part of their time enrichment (https://docs.google.com/presentation/d/18itsU8-KZ4kpEMJG_LEL_GLeeW3kcj1NMTTVxN-BKY8/edit#slide=id.g80c2fee03b_0_425) Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.azn3kbomh4ek | Basic Retrieval Prevents creation of proper date filters (this example from Europeana Fashion), where date is interpreted as the year 11 | ||||||||||||
28 | P19 | Wrong references for a controlled vocabulary. a. wrong URI (differs from authoritative version) b. not a URI | When providing references to a vocabulary Europeana needs the URI at which the resource data is machine-accessible (as LOD). Giving the literals from the vocabulary means giving less rich data to Europeana than possible | Example 1 Example 2 | <dc:subject><iconclass=49G35(+52)> tools, instruments; laboratory equipment - scientific research</dc:subject> <dc:creator rdf:resource="http://d-nb.info/gnd/138541442"/> | <dc:subject rdf:resource="http://iconclass.org/rkd/49G35(%2B52)"/> <dc:creator rdf:resource="https://d-nb.info/gnd/138541442"/> | b can be automatically checked. a is less easy. There is a suggestion: 1) Make a list of domains and URIs in Europeana; 2) Analyze the outcomes for the 'bad' URIs per domain and group them; 3) Create rules per domain to extract @id var; 4) Build and output authoritative URIs by concatenating "authoritative_domain_URI + @id" Checking for special characters (such as =) or numbers could be another option.
| warning | Has been renamed, cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45 More examples/stats: 2.541.297 records with GND links using HTTP protocol (instead of HTTPS): https://api.europeana.eu/record/search.json?query=%22http://d-nb%22*&wskey= 3.421.669 records with Geonames links using HTTP protocol (instead of HTTPS): https://api.europeana.eu/record/search.json?query=%22http://sws.geonames.org%22*&wskey=&profile=rich 708.154 records with Geonames links using www instead of sws: https://api.europeana.eu/record/search.json?query=*%22www.geonames.org%22*&wskey=&profile=rich 566.549 records with Wikidata links using HTTPS (instead of HTTP): https://api.europeana.eu/record/search.json?query=%22https://www.wikidata.org%22*&wskey=&profile=rich 662.028 records with Wikidata links using /wiki instead of /entity: https://api.europeana.eu/record/search.json?query=*%22www.wikidata.org/wiki/%22*&wskey=&profile=rich 59742 records with a BNE URI that starts with a whitespace: https://api.europeana.eu/record/search.json?query=%22%20http://datos.bne.es/resource/%22*&wskey= 61552 records referring to the vocabulary of the International Music Score Library Project using the HTTP version instead of the HTTPS official: https://api.europeana.eu/record/search.json?query=%22http://imslp.org/wiki/Category:%22*&wskey= 10019 records refer to the Finnish National Gallery using a URL that is no longer recognisable/resolvable (the Wikidata Property is https://www.wikidata.org/wiki/Property:P4177 which indicates a different URI pattern: https://api.europeana.eu/record/search.json?query=%22http://kansallisgalleria.fi/%22*&wskey=) 31 records referring to catalogue.bnf.fr instead of data.bnf.fr: https://api.europeana.eu/record/search.json?query=%22http://catalogue.bnf.fr/ark:/%22*&wskey= | Basic Retrieval Lowers metadata tier | ||||||||||||
29 | P41 | Use of vocabulary references that are correct but not de-referenced | Use of vocabulary references (without a corresponding EDM contextual class) that are correct but not de-referenced. a. the URI correspond to a vocabulary that is de-referenceable but currently not supported by Europeana b. the URI corresponds to a vocabulary that is not de-referenceable in absolute | Example 1a Example 1b | Not easily | warning | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.9evs7xm4qhvw | |||||||||||||||
30 | 6. Normalisation | |||||||||||||||||||||
31 | P20 | Time period in specific formatting: 3200[ac]-2250[ac] | Because there is data from many providers who keep different standards, having minimal formatting in values like dates is desirable. | Example 1 Example 2 Example 3 | <dcterms:created>3200[ac]-2250[ac]</dcterms:created> | TBD: cf best practices for dates | Hard | warning | For year spans BC there is currently no best practice. This is something the DQC and Europeana need to think about! For general years BC '-3200' would be a good practice. It is also important to note that year spans cannot yet be used in facets. Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.9evs7xm4qhvw Although we initially thought that this could be merged with 'P17 - Term not fitting against a controlled list of terms' [deprecated], we now realise that P20 is not necessarily a matter of controlled vocabulary; it could rather be a matter of best practices to represent dates. It may also overlap with ongoing efforts on date normalization | Basic Retrieval; Normalizing dates to a consistent representation would help search and visualization | ||||||||||||
32 | P38 | Incorrect lang tag | The provided language attribute for the metadata value is not correct (e.g. 'nl' instead of 'en') | Example 1 Example 2 | <dc:title xml:lang="nl">Actualités britanniques 1914-1915</dc:title> <dc:title xml:lang="en">Actualités britanniques 1914-1915</dc:title> <dc:title xml:lang="ro">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837 = Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title> | <dc:title xml:lang="fr">Actualités britanniques 1914-1915</dc:title> (only one title) <dc:title xml:lang="de">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837</dc:title> <dc:title xml:lang="ro">Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title> | Not easy in general: language detection should be applied. But it could be easily partially detected by checking cardinality of languages in the specific case when only one value is expected by language for a property (if a language has several labels, a warning could be sent). | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.vb938dqwyl3j | ||||||||||||||
33 | P15 | Invalid language tags (xml:lang attributes) | xml:lang attributes are not valid according to ISO 639-1, ISO 639-2 or ISO 639-3 | Example 1 (gr, note that it's also a case of incorrect tagging) Example 2 (mehrspr) Cf. column C for most frequent tags | <dc:title xml:lang="gr">Golgi's votive relief inscription - image (E6 in Voskos, 1997)</dc:title> (also a case of incorrect tagging) <dc:title xml:lang="mehrspr">Thesaurus inscriptionum Aegyptiacarum: altaegyptische Inschriften</dc:title> | <dc:title xml:lang="gr">Golgi's votive relief inscription - image (E6 in Voskos, 1997)</dc:title> For example 2 ("Thesaurus inscriptionum Aegyptiacarum: altaegyptische Inschriften"), mul is apparently acceptable in ISO but it is hard for us to recommend it (if the portal cannot handle it properly). Maybe the value could be in the "main" language (trying to identify a primary language being used; if text in other languages is present it could be represented using quotes): Latin title could be quoted and the text marked as German. And additionally, an alternative title could be added with the Latin text only. | Yes, by checking compliance against a controlled list of terms. Some normalization can be done (and has been done), such as mapping "Spanish" to "es", but it does not catch every case. Cf stats on (failure of) language normalization at https://rnd-2.eanadev.org/share/language-normalisation/Language_Provider_xml_lang_not_normalizable.txt In 2022, the most frequent invalid language tags were nah (114866), za (67134), bh (14457), sgs (13974), ltg (10030), cel (1693), enenen (1334), xxx (390) | warning | Metis normalizes many invalid tags already - but not all (its normalization focuses on normalizing valid tags).
Ongoing work at https://europeana.atlassian.net/browse/RD-111 Jena RIOT was suggested as a checking method in the past See stats at https://rnd-2.eanadev.org/share/language-normalisation/LanguageDataReport.html What cannot be normalized could be flagged as potentially invalid. The severity in the report could be "Error" considering that this is a case of invalidity. Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.azn3kbomh4ek | Basic Retrieval; Cross-language recall; Improved language facets | ||||||||||||
34 | 7. Dependency to external resources | |||||||||||||||||||||
35 | P22 | Swapped isShownBy and isShownAt values | the isShownBy and isShownAt values are mixed; this causes an error in processing files for Europeana service and blocks the user from finding the institution's website | Example 1 Example 2 | <edm:isShownAt rdf:resource="http://website.com/image.jpg"/> <edm:isShownBy rdf:resource="http://website.com/image-information.html"/> | <edm:isShownAt rdf:resource="http://website.com/image-information.html"/> <edm:isShownBy rdf:resource="http://website.com/image.jpg"/> | Yes if we check the edm:isShownAt for specific mime types (e.g. existence of .jpg). Example queries: jpg, jpeg, png, PDF, pdf mp3, gif, mp4... | warning | Except for embedding cases | Lowers content tier | ||||||||||||
36 | 8. Serialization / format / encoding | |||||||||||||||||||||
37 | P25 | Field should be literal but is URL | For some fields we only allow literal values, even if there are URIs or URLs available on your side. | Example 1 | <dc:title>http://dx.doi.org/10.1080/00905990701368738</dc:title> | <dc:title xml:lang="en">Crystallizing and Emancipating Identities in Post-Communist Estonia</dc:title> | Yes, using e.g. IRI validation in RDF validators | Error | Datasets with http in the title: https://www.europeana.eu/api/v2/search.json?query=proxy_dc_title:(*http*)&rows=0&start=1&facet=edm_datasetName&profile=facets&f.edm_datasetName.facet.limit=1000&wskey=tbc [AI: 12-04-2023: some issues for contextual classes come from mappings and could be fixed via them] Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.nbmjo5zhye16 | Basic Retrieval Results in field treated as local-file URL | ||||||||||||
38 | P27 | Many URLs in one field | When multiple URLs are given in one field, it will be read as one link, which will fail. All values should be given in separate fields | Example 1 (in edm:isShownBy) | <edm:isShownBy rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_001.tif; http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_002.tif;"/> | <edm:isShownBy rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_001.tif"/> <edm:hasView rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_002.tif"/> | Yes, using e.g. IRI validation in RDF validators | Error | Each record can have only 1 isShownBy; however, any additional views of the object can be given in edm:hasView, which can be repeated. [HS: 16-03-2023: it seems that at least for media links the problem is pretty small. Do we have other fields in mind?] Cases of duplicate URL could be corrected | Field cannot be parsed; link cannot be resolved While not a technical problem, such a URL has a low chance of being permanent | ||||||||||||
39 | P30 | HTML in fields | The use of html is not supported in literal fields of EDM. It makes literals largely unreadable for humans. On the other hand, <br/> is needed for poetry or lyrics but the portal doesn't handle line breaks. | Example 1 Example 2 Example 3 | <dc:description>Ta gjerne med borna på tur langs Rallarvegen. <br/> <p><b><span>Borna sine turreglar</span></b></p> </dc:description> | <dc:description>Ta gjerne med borna på tur langs Rallarvegen. Borna sine turreglar</dc:description> | Yes (using pattern matching) | Warning | HTML is in many cases website specific, and may not belong in the source XML of data Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45 | Basic Retrieval Can create false positives in search; display unreadable | ||||||||||||
40 | P31 | Schematic data within fields | During a mapping from one format to another, the schematic data of the original mark up is incorporated as value into EDM | Example 1 (for the title in Swedish) Example 2 (for ISBD notation) Example 3 (for escaped HTML link) | <dc:title>{"danish"=>["Vester Sakskærsgård"]}</dc:title> <dc:title xml:lang="ro">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837 = Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title> <dc:publisher><a href=\u0022http://www.wydawnictwo.pk.edu.pl/\u0022 target=\u0022_blank\u0022>Wydawnictwo PK</a></dc:publisher> | <dc:title xml:lang="da">Vester Sakskærsgård</dc:title> <dc:title xml:lang="de">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837</dc:title> <dc:title xml:lang="ro">Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title> Having only the name of the Publisher would be ok | A good way would be to count the number of non-text characters (control characters used in schemas, e.g. <>=/{}[]) and compare it, as a percentage, against the number of text characters | Warning | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45 | Basic Retrieval Distorts search metrics; relevant values are cluttered with machine-readable formatting | ||||||||||||
41 | P39 | Escape and special characters in titles and descriptions | The issue is mostly relevant for translations provided from our data providers | Example 1 Example 2 | <dc:title>Patrick\ n\ t\ n\ t Patrick. </dc:title> <dc:title>Exotic visitors for London_x000D_ H H</dc:title> | <dc:title>Patrick Patrick.</dc:title> <dc:title>Exotic visitors for London H H</dc:title> | yes | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45 Should we check more fields or only titles and descriptions? | ||||||||||||||
42 | P40 | Special characters are not represented with the right encoding | Some characters (such as music notations) can be provided using the wrong encoding, for example using an HTML character reference for ♭ instead of the UTF-8 encoded character. This relates to the issue of having HTML in the value of metadata fields, though the motivation for the problem is quite different. Sometimes the special character is only represented with a normal letter ('b' for flat sign), losing the original information. | Example 1 (for no encoding), Example 2 (for Unicode U+FFFD character �) | <dc:description>Karl Sch�nb�ck</dc:description> | <dc:description>Karl Schönböck</dc:description> | yes | Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote | ||||||||||||||
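
The SHACL-style tests mentioned for P2 (equal title and description), P9 (very short description) and P10 (empty literals) could also be approximated outside an RDF validator. The following is a minimal Python sketch, not the Metis implementation. It assumes a record has already been parsed into a plain dictionary mapping property names to lists of literal strings; the function name `check_record` and the 50-character threshold (taken from the `sh:minLength 50` example) are illustrative.

```python
def check_record(record, min_description_length=50):
    """Return a list of (problem_id, detail) tuples for one record.

    `record` is assumed to be a dict such as
    {"dc:title": ["..."], "dc:description": ["..."]} (hypothetical shape).
    """
    issues = []
    titles = [v.strip() for v in record.get("dc:title", [])]
    descriptions = [v.strip() for v in record.get("dc:description", [])]

    # P10: a property is mapped but holds only whitespace or nothing.
    for prop, values in record.items():
        if any(not v.strip() for v in values):
            issues.append(("P10", prop))

    # P2: a non-empty title is repeated verbatim as a description.
    shared = {t for t in titles if t} & {d for d in descriptions if d}
    if shared:
        issues.append(("P2", "dc:title equals dc:description"))

    # P9: description shorter than the agreed threshold.
    for d in descriptions:
        if d and len(d) < min_description_length:
            issues.append(("P9", d))

    return issues
```

The threshold would need the agreement on "very short" that the P9 row asks for; a confidence score per length band could replace the single cut-off.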
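
For P18 (impossible dates), the table notes that the variety of patterns makes a general check hard. A partial check is still possible for values that already look like ISO-style dates. The sketch below is an illustration under assumptions: the function name `classify_date`, the three result labels, and the year bounds (no year after the current one, none before -10000) are choices made here, not Europeana rules. Free-text values such as "31 June 1954" are reported as unparsed rather than guessed at.

```python
import calendar
import re
from datetime import date

# Year, optional month, optional day; a leading minus marks a BC year.
ISO_LIKE = re.compile(r"(-?\d{1,5})(?:-(\d{2})(?:-(\d{2}))?)?")
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def classify_date(value, min_year=-10000, max_year=None):
    """Return "ok", "impossible" or "unparsed" for a date literal."""
    if max_year is None:
        max_year = date.today().year  # assumption: future years are wrong
    match = ISO_LIKE.fullmatch(value.strip())
    if match is None:
        return "unparsed"
    year = int(match.group(1))
    month = int(match.group(2)) if match.group(2) else None
    day = int(match.group(3)) if match.group(3) else None
    if not min_year <= year <= max_year:
        return "impossible"
    if month is not None and not 1 <= month <= 12:
        return "impossible"
    if day is not None:
        limit = DAYS_IN_MONTH[month - 1]
        if month == 2 and calendar.isleap(year):
            limit += 1
        if not 1 <= day <= limit:
            return "impossible"
    return "ok"
```

Applied to the values in the P18 row, `3500`, `1962-11-31` and `-44050` would be flagged, while `spring 20011` and `30.02.1902 (Herstellung)` would fall into the unparsed group and need pattern-specific handling.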
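
The per-domain rewriting suggested for P19a (build authoritative URIs from rules per domain) could look roughly like the sketch below. The three rules are derived from the statistics in the P19 remarks (GND and Geonames expected over HTTPS, Geonames on `sws` rather than `www`, Wikidata expected as HTTP with `/entity`). The rule table, the function name `normalise_uri`, and the treatment of everything after the prefix as the identifier are simplifying assumptions; real Geonames `www` links, for instance, may carry a trailing page name that would need stripping.

```python
import re

# (pattern matching known non-authoritative prefixes, authoritative prefix)
RULES = [
    (re.compile(r"^http://d-nb\.info/gnd/"), "https://d-nb.info/gnd/"),
    (re.compile(r"^https?://www\.wikidata\.org/(?:wiki|entity)/"),
     "http://www.wikidata.org/entity/"),
    (re.compile(r"^https?://(?:www|sws)\.geonames\.org/"),
     "https://sws.geonames.org/"),
]


def normalise_uri(uri):
    """Rewrite a vocabulary URI to its assumed authoritative form."""
    uri = uri.strip()  # also covers URIs that start with a whitespace
    for pattern, replacement in RULES:
        if pattern.match(uri):
            return pattern.sub(replacement, uri, count=1)
    return uri  # unknown domain: leave untouched
```

URIs from domains without a rule are returned unchanged, so the table can grow one vocabulary at a time as the 'bad' patterns per domain are analysed.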
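
The detection ideas for P30 (pattern matching for HTML) and P31 (share of schema characters in a value) can be sketched together. This is an illustrative sketch only: the regular expression is a deliberately simple tag matcher, the character set is the one listed in the P31 row, and any flagging threshold for the ratio would have to be tuned on real data.

```python
import re

# Matches things shaped like <br/>, <p>, </span>, <a href="...">.
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")
# Control characters used in schemas, as listed for P31.
MARKUP_CHARS = set("<>=/{}[]")


def contains_html(value):
    """True if the literal contains something that looks like an HTML tag."""
    return bool(HTML_TAG.search(value))


def markup_ratio(value):
    """Share of schema characters among all non-whitespace characters."""
    chars = [c for c in value if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in MARKUP_CHARS for c in chars) / len(chars)
```

For the P31 example `{"danish"=>["Vester Sakskærsgård"]}` the ratio is 6 out of 34 characters, whereas the clean title has a ratio of 0. Values such as URLs or dates with slashes would also score above zero, so the ratio is better read as a hint than as a verdict.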
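
The checks described for P22 (media file extension found in edm:isShownAt) and P27 (several URLs packed into one value) are both simple string tests. The sketch below assumes the URL values have already been extracted from the record. The extension list follows the example queries in the P22 row; the helper names `looks_swapped` and `split_urls` are hypothetical. As the P22 remarks note, embedding cases are an exception that this heuristic does not recognise.

```python
import re
from urllib.parse import urlparse

MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".pdf", ".mp3", ".gif", ".mp4")


def looks_swapped(is_shown_at, is_shown_by):
    """P22 heuristic: isShownAt points at a media file, isShownBy does not."""
    at_path = urlparse(is_shown_at).path.lower()
    by_path = urlparse(is_shown_by).path.lower()
    return at_path.endswith(MEDIA_EXTENSIONS) and not by_path.endswith(MEDIA_EXTENSIONS)


def split_urls(value):
    """P27: split a value holding several URLs separated by ';' or spaces."""
    return [part for part in re.split(r"[;\s]+", value) if part]
```

When `split_urls` returns more than one entry for edm:isShownBy, the first could stay in edm:isShownBy and the rest move to edm:hasView, as the P27 row proposes.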