Problem patterns

	A	B	C	D	E	F	G	H	I	J	V
1	guage
2	ID	Description of problem	Explanation	Example	What is in the data now	What we would like to see	Can be quantified?	Severity (pre-2022)	Remarks	Discovery Scenario Affected (Impact)

3	1. Duplicate / Redundant information (within a field, across fields and across a collection)
4	P1	Systematic use of the same title	Within the dataset/collection multiple records use the same title	Example 1 Example 2 Example 3	<dc:title>OLJEMÅLNING</dc:title> <dc:title>OLJEMÅLNING</dc:title>	<dc:title>OLJEMÅLNING - [X]</dc:title> <dc:title>OLJEMÅLNING - [Y]</dc:title>	yes	warning	If the data does not allow completely unique titles, consider appending another value that says something unique about the object: for example, for 'Rijksmonument' append the location to get "Rijksmonument Gelderland"	Basic Retrieval Lack of differentiation in search; title uninformative; negative for SEO
5	P2	Equal title and description fields	The title is a repeat of the exact information in the description, or the other way around	Example 1 Example 2	<dc:title>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:title> <dc:description>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:description>	<dc:title>Doll, dressed as a nurse in costume of the Diaconessenhuis in Leeuwarden in 1934</dc:title> [no mapping of "Doll, [...]" to dc:description]	yes (in SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:path dc:title; sh:equals dc:description ] .)	warning	If the information within the properties is identical, there is no need to duplicate; either dc:title or dc:description is mandatory, so we would rather have the information somewhere as specific as possible.	Basic Retrieval Repetition of field values does not increase searchability and it hampers visualization Distorts search weightings
6	P3	Near-identical descriptions and title fields	The description is nearly the same as the title, with maybe some additional information that comes from other properties	[EXAMPLE MISSING]	<dc:title>repeated text</dc:title> <dc:creator>name of creator</dc:creator> <dc:description>repeated text + name of creator</dc:description>	<dc:title>repeated text</dc:title> <dc:creator>name of creator</dc:creator> [no mapping of "repeated text + name of creator" to dc:description]	not easily	warning	If the information within the properties is identical, there is no need to duplicate; either dc:title or dc:description is mandatory. Concatenating with another property that is already present is superfluous.	Basic Retrieval Distorts search weightings; distorts completeness measurement Repetition of field values does not increase searchability and it hampers visualization
7	P32	Duplicate metadata statements	The same property with the same value is repeated twice	Example 1	dcterms:spatial "London" is repeated twice. dcterms:spatial "urn:rijksmuseum:thesaurus:RM0001.THESAU.4157" is repeated twice. urn:rijksmuseum:thesaurus:RM0001.THESAU.24458 is repeated twice in dc:subject.	No duplication	yes		Note: this duplication happens in the provider metadata, not between the provider metadata and the Europeana enrichment. The problem is only partially handled by Metis normalisation (dc:title, dcterms:alternative, dc:subject, and dc:identifier). It can easily be fixed with normalization at ingestion time, or during solr ingestion (cf https://europeana.atlassian.net/browse/SEAR-93) . Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.hmzmrda9pa40can	Affects user experience for display and search behaviour.
8	P33	Duplicate objects within a dataset	Datasets contain "repeated" objects with different identifiers (and sometimes metadata) but same image.	Example 1a Example 2a (same URL in edm:isShownBy, metadata is different)			yes (in principle)	warning	Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.4ymd6yez22gi	Basic retrieval Illegibility, distorts search.
9	2. Irrelevant information
10	P5	Unrecognizable titles	No descriptive information is given in the titles; only identifiers and shelfmarks	[EXAMPLE MISSING] (here we have an example of shelfmark but in dc:description)	<dc:title>NLD-820630-AMSTERDAM</dc:title> (in example with dc:description: "Call Number - 0028619")	[no mapping to dc:title]<dc:title></dc:title> <dc:identifier>NLD-820630-AMSTERDAM</dc:identifier>	Not easily. E.g. "London-1998" could be an id but it could also be a title (a painting of London in 1998). Uniqueness tests could help to recognize the titles that are actually identifiers: because they're identifiers, they're likely to be unique within our datasets.	warning	Identifiers should be put in dc:identifier. Shelfmarks belong in dc:description with clarification that these are shelfmarks. dc:title is not mandatory if dc:description is present	Basic Retrieval Bypasses completeness metrics; exposes internal messages to external users
11	P6	Non-meaningful title	A standard value is put in when there is no title attached to the record: "no suitable title found", "unknown title", etc. ...in many languages	Example 1 Example 2 Example 3 Example 4 Example 5	<dc:title>No title</dc:title>	[no mapping to dc:title]	Yes, if we look for specific values in some catalogue of unwanted keywords, like "no title". However, there are objects (e.g. paintings) that really don't have a title. And it wouldn't be easy to get all possible variants (e.g. across languages) of unwanted keywords.	warning	dc:title is only mandatory when dc:description is not present. If both are unavailable or not meaningful, the record may not be findable anyway and not suitable for publishing on Europeana.	Basic Retrieval Bypasses completeness metrics ("empty" being a non-empty value); exposes internal messages to external users; negative for click-through rates in-Europeana and in search engines Non-meaningful titles do not help searchability of an item
12	3. Missing or Incomplete information
13	P7	Missing description fields	A description is available on the website of the provider, but has not been mapped to EDM	Example 1	[no mapping of "information about the object" to dc:description]	<dc:description>information about the object</dc:description>	Not easily. For big datasets with no description we could check their websites	warning	dc:description is not mandatory if dc:title is present. However, it is a pity to not exploit existing descriptions, if it is possible.	Basic Retrieval: record unlikely to be retrieved; record uninformative If a dc:description is available, there is more descriptive information available for the user and heightens the findability of the object
14	P8	Missing lang tag	The language of metadata is not specified in an xml:lang tag, when data is monolingual per property NB: This is for specific fields, since not all values are language specific.	Example 1	<dc:subject>oil on canvas</dc:subject> <dc:subject>l'huile sur toile</dc:subject>	<dc:subject xml:lang="en">oil on canvas</dc:subject> <dc:subject xml:lang="fr">l'huile sur toile</dc:subject>	Partly. Circa 20m records have less than 25% lang attributes (metadata tier 0)	warning	The Europeana R&D team is working on an experiment to detect the language of metadata.	Basic Retrieval; Cross-language recall; Improved language facets Language-tagging data greatly improves enrichment possibilities in the data, and enables Europeana to give the user more data which they can understand in their own language Not language-tagging data in the Europeana context would also mean lower metadata tier
15	P9	Very short description field	Not sufficient information is provided in the description of the provided CHO.	Example 1	<dc:description>China</dc:description>		Easy, though one needs an agreement on what is "very short". (Also, dc:description is often used as a catch-all for any info that does not fit into any other field.) In SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:path dc:description; sh:minLength 50 ] .	warning	Implementation could measure confidence (description of one letter is certainly not enough; one word is not enough 99% of the time; three words 95% of the time; etc).	Basic Retrieval: record unlikely to be retrieved; record uninformative
16	P10	Empty literals	A property is mapped, but there is no value in the property; just an empty space or no data at all	[EXAMPLE MISSING] (not easy to find as they are removed from search)	<dc:subject></dc:subject>	[no mapping to dc:subject]	yes (in SHACL this could be done with a test shape like <IssueShape> sh:property [ sh:path dc:subject; sh:hasValue "" ] .)	warning	These empty values can interfere with the mapping process and invalidate records. Empty literals are now removed during normalisation so they are not indexed anymore.	Basic retrieval: diminished chance of retrieval; record uninformative; potentially breaks completeness measure If these empty values were not removed, they would skew the searchability of data
17	P34	(seemingly) empty field	A property is mapped, but the value is not a relevant value	Example 1 Example 2 Example 3 (in dcterms:spatial) Example 4 (in dcterms:spatial) Example 5 (in dc:subject & dc:type)	<dc:title>???</dc:title>	[no mapping to dc:title]	yes		Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
18	P42	Lack of context to geotagging/location info	In some cases it is not clear if the provided location information is the location of the object or the location depicted in the object or where it was made, etc.	[Example missing]			Hard to detect. Method: Semantic Enrichment, with NERD + Classification or reasoning	informative	Should we try to discourage providers to use dcterms:spatial in favour of dc:subject and edm:currentLocation when these fields are more appropriate (which is of course not always the case)? For now we plan to act on this problem by updating the mapping guidelines. cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote	Basic retrieval; spatial search
19	4. Non-optimal use of fields
20	P12	Extremely long titles	Values that are not the actual titles are given in the dc:title, while they would belong in dc:description	Example 1 Example 2	<dc:title>[transcription of complete poem]</dc:title>	<dc:description>[transcription of complete poem]</dc:description>	Easy, though one needs an agreement on what is "extremely long".	warning	Longer values can be mapped to dc:description. Also note that when there is a subtitle for an object, this can be given in dcterms:alternative	Basic Retrieval Distorts search weightings; limits legibility to user Titles should be clear and concise to match usual practices of (web page) display. They also help searchability by lowering noise.
21	P13	edm:type with same content as dc:type	Instead of using dc:type as a specification like 'poetry' for edm:type=TEXT or 'painting' for edm:type=IMAGE, the value dc:type repeats the information of edm:type	Example 1	dc:type - "Image" and edm:type "IMAGE" OR dc:type - "Text" and edm:type "TEXT"	<edm:type>TEXT</edm:type> <dc:type>Poetry</dc:type> OR <edm:type>IMAGE</edm:type> <dc:type>Photography</dc:type>	yes (queries: text, image, sound, video, 3D)	warning	Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.vb938dqwyl3j Making providers stop using image or text as a value for dc:type seems a bit exaggerated, especially when it is in languages other than English. DCMI says the scope of the dc:type element refers to various types (movie, sound, book, collection) and also to genre. The point is maybe to be informative and help providers to be more specific rather than pretend to fix it.	Basic Retrieval Slight distortion of search weightings The more specific the information in dc:type, the richer experience for the user (for finding and understanding the item)
22	P14	Swapped thumbnail and full-size image	a thumbnail is given in edm:isShownBy, and the full sized object is given in edm:object	Example 1	<edm:object rdf:resource="http://bigimage-url.jpg"/> <edm:isShownBy rdf:resource="http://thumbnail-url.jpg"/>	<edm:object rdf:resource="http://thumbnail-url.jpg"/> <edm:isShownBy rdf:resource="http://bigimage-url.jpg"/>	This can be checked by comparing resolutions (in the technical metadata) but we can only compare the thumbnail with the biggest resolution image.	warning		Impact on accessing the item: - for images, this will cause a beautiful thumbnail to be created, but when the user clicks to see it in full screen (and not in Europeana thumbnail mode) they see a smaller thumbnail. - for other file types, this will result in unviewable/unplayable content Lower content tier. Slow loading? Poor image display?
23	P35	Unfit edm:isShownBy in edm:object	The edm:isShownBy value has also been used for edm:object although it is not an image (for example a PDF or audio file). edm:object MUST be an image	Example 1	<edm:object rdf:resource="http://hdl.handle.net/11088/de-bo133:doc:140628"/> (PDF)		yes		Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.hmzmrda9pa40
24	P36	Generic property is used while there is a more specific appropriate one	A property is used but there is a more appropriate one (e.g. edm:hasMet instead of dcterms:spatial)	Example 1	<edm:hasMet>geo:16.067,108.233</edm:hasMet>	<dcterms:spatial>geo:16.067,108.233</dcterms:spatial>	yes, partially (for example checking which items have dc:date while they may have dcterms:created or dcterms:issued instead)		Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
25	P37	There is confusion between genre and type	The DCMI specification for the scope of the type element refers to various types (movie, sound, book, collection) and also to genre. But in Europeana dc:type is generally used to record the digital type of the CHO, and the vocabularies that cover genres (of music, architecture, paintings) are often linked to subject. There's a need to review how the elements dc:type and dc:subject are actually used in Europeana and whether the semantics are clear. Then to make a recommendation on best practice, which explicitly clarifies where genre should be recorded	[EXAMPLE MISSING]			not easy. Maybe by using semantic enrichment with a list of genres and types	warning	The mapping guidelines have been updated earlier this year so that they're no longer confusing. But maybe they can be enhanced. https://europeana.atlassian.net/wiki/spaces/EF/pages/2106294284/edm+ProvidedCHO#dc:type Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
26	5. Wrong data
27	P18	Impossible dates e.g. spring 20011	Due to automatization/wrong values, dates that are obviously wrong are provided. This information compromises the dates we have in Europeana and makes it impossible to reliably search by year	Example 1 Example 2 Example 3 Example 4	<dc:date>3500</dc:date> <dc:date>31 June 1954</dc:date> <dcterms:created>30.02.1902 (Herstellung)</dcterms:created> <dc:date>1962-11-31</dc:date> Earlier examples include <dc:date>spring 20011</dc:date> <dcterms:published>-44050</dcterms:published>	[no mapping]	Hard. There are so many patterns that it is very difficult even to extract the year in order to check whether it is valid. Example queries: 1, 2	warning	In these cases the values given are inaccurate and should be mapped out. If it turns out they are wrongly mapped identifiers they can be remapped, or if a character confused the date this can be corrected. We could ask EKT if they have something about it as part of their time enrichment (https://docs.google.com/presentation/d/18itsU8-KZ4kpEMJG_LEL_GLeeW3kcj1NMTTVxN-BKY8/edit#slide=id.g80c2fee03b_0_425) Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.azn3kbomh4ek	Basic Retrieval Prevents creation of proper date filters (this example from Europeana Fashion), where date is interpreted as the year 11
28	P19	Wrong references for a controlled vocabulary. a. wrong URI (differs from authoritative version) b. not a URI	When providing references to a vocabulary Europeana needs the URI at which the resource data is machine-accessible (as LOD). Giving the literals from the vocabulary means giving less rich data to Europeana than possible	Example 1 Example 2	<dc:subject><iconclass=49G35(+52)> tools, instruments; laboratory equipment - scientific research</dc:subject> <dc:creator rdf:resource="http://d-nb.info/gnd/138541442"/>	<dc:subject rdf:resource="http://iconclass.org/rkd/49G35(%2B52)"/> <dc:creator rdf:resource="https://d-nb.info/gnd/138541442"/>	b can be automatically checked. a is less easy. There is a suggestion: 1) Make a list of domains and URIs in Europeana; 2) Analyze the outcomes for the 'bad' URIs per domain and group them; 3) Create rules per domain to extract @id var; 4) Build and output authoritative URIs by concatenating "authoritative_domain_URI + @id" Checking for special characters (such as =) or numbers could be another option.	warning	Has been renamed, cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45 More examples/stats: 2.541.297 records with GND links using HTTP protocol (instead of HTTPS): https://api.europeana.eu/record/search.json?query=%22http://d-nb%22&wskey= 3.421.669 records with Geonames links using HTTP protocol (instead of HTTPS): https://api.europeana.eu/record/search.json?query=%22http://sws.geonames.org%22&wskey=&profile=rich 708.154 records with Geonames links using www instead of sws: https://api.europeana.eu/record/search.json?query=%22www.geonames.org%22&wskey=&profile=rich 566.549 records with Wikidata links using HTTPS (instead of HTTP): https://api.europeana.eu/record/search.json?query=%22https://www.wikidata.org%22&wskey=&profile=rich 662.028 records with Wikidata links using /wiki instead of /entity: https://api.europeana.eu/record/search.json?query=%22www.wikidata.org/wiki/%22&wskey=&profile=rich 59742 records with a BNE URI that starts with a whitespace: https://api.europeana.eu/record/search.json?query=%22%20http://datos.bne.es/resource/%22&wskey= 61552 records referring to the vocabulary of the International Music Score Library Project using the HTTP version instead of the official HTTPS one: https://api.europeana.eu/record/search.json?query=%22http://imslp.org/wiki/Category:%22&wskey= 10019 records refer to the Finnish National Gallery using a URL that is no longer recognisable/resolvable (the Wikidata Property is https://www.wikidata.org/wiki/Property:P4177 which indicates a different URI pattern: https://api.europeana.eu/record/search.json?query=%22http://kansallisgalleria.fi/%22&wskey=) 31 records referring to catalogue.bnf.fr instead of data.bnf.fr: https://api.europeana.eu/record/search.json?query=%22http://catalogue.bnf.fr/ark:/%22*&wskey=	Basic Retrieval Lowers metadata tier
29	P41	Use of vocabulary references that are correct but not de-referenced	Use of vocabulary references (without a corresponding EDM contextual class) that are correct but not de-referenced. a. the URI correspond to a vocabulary that is de-referenceable but currently not supported by Europeana b. the URI corresponds to a vocabulary that is not de-referenceable in absolute	Example 1a Example 1b			Not easily	warning	Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.9evs7xm4qhvw
30	6. Normalisation
31	P20	Time period in specific formatting: 3200[ac]-2250[ac]	Because there is data from many providers who keep different standards, having minimal formatting in values like dates is desirable.	Example 1 Example 2 Example 3	<dcterms:created>3200[ac]-2250[ac]</dcterms:created>	TBD: cf best practices for dates	Hard	warning	For yearspans BC there is currently no best practice. This is something the DQC and Europeana need to think about! For general years BC '-3200' would be a good practice. It is also important to note that yearspans cannot be used in facets as of yet. Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.9evs7xm4qhvw Although we initially thought that this could be merged with 'P17 - Term not fitting against a controlled list of terms' [deprecated], we now realise that P20 is not necessarily a matter of controlled vocabulary; it could rather be a matter of best practices to represent dates. It may also overlap with ongoing efforts on date normalization	Basic Retrieval; Normalizing dates to a consistent representation would help search and visualization
32	P38	Incorrect lang tag	The provided language attribute for the metadata value is not correct (e.g. 'nl' instead of 'en')	Example 1 Example 2	<dc:title xml:lang="nl">Actualités britanniques 1914-1915</dc:title> <dc:title xml:lang="en">Actualités britanniques 1914-1915</dc:title> <dc:title xml:lang="ro">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837 = Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title>	<dc:title xml:lang="fr">Actualités britanniques 1914-1915</dc:title> (only one title) <dc:title xml:lang="de">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837</dc:title> <dc:title xml:lang="ro">Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title>	Not easy in general: language detection should be applied. But it could be easily partially detected by checking cardinality of languages in the specific case when only one value is expected by language for a property (if a language has several labels, a warning could be sent).		Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.vb938dqwyl3j
33	P15	Invalid language tags (xml:lang attributes)	xml:lang attributes are not valid according to ISO 639-1, ISO 639-2 or ISO 639-3	Example 1 (gr, note that it's also a case of incorrect tagging) Example 2 (mehrspr) Cf. column C for most frequent tags	<dc:title xml:lang="gr">Golgi's votive relief inscription - image (E6 in Voskos, 1997)</dc:title> (also a case of incorrect tagging) <dc:title xml:lang="mehrspr">Thesaurus inscriptionum Aegyptiacarum: altaegyptische Inschriften</dc:title>	<dc:title xml:lang="gr">Golgi's votive relief inscription - image (E6 in Voskos, 1997)</dc:title> For example 2 ("Thesaurus inscriptionum Aegyptiacarum: altaegyptische Inschriften"), mul is apparently acceptable in ISO but it is hard for us to recommend it (if the portal cannot handle it properly). Maybe the value could be in the "main" language (trying to identify a primary language being used; if text in other languages is present it could be represented using quotes): Latin title could be quoted and the text marked as German. And additionally, an alternative title could be added with the Latin text only.	Yes, by checking compliance against a controlled list of terms. Some normalization can be done (and has been done), such as mapping "Spanish" to "es", but it does not catch every case. Cf stats on (failure of) language normalization at https://rnd-2.eanadev.org/share/language-normalisation/Language_Provider_xml_lang_not_normalizable.txt In 2022, the most frequent invalid language tags were nah (114866), za (67134), bh (14457), sgs (13974), ltg (10030), cel (1693), enenen (1334), xxx (390)	warning	Metis normalizes many invalid tags already - but not all (its normalization focuses on normalizing valid tags). Ongoing work at https://europeana.atlassian.net/browse/RD-111 Jena RIOT was suggested as a checking method in the past See stats at https://rnd-2.eanadev.org/share/language-normalisation/LanguageDataReport.html What cannot be normalized could be flagged as potentially invalid. The severity in the report could be "Error" considering that this is a case of invalidity. Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.azn3kbomh4ek	Basic Retrieval; Cross-language recall; Improved language facets
34	7. Dependency to external resources
35	P22	Swapped isShownBy and isShownAt values	the isShownBy and isShownAt values are mixed; this causes an error in processing files for Europeana service and blocks the user from finding the institution's website	Example 1 Example 2	<edm:isShownAt rdf:resource="http://website.com/image.jpg"/> <edm:isShownBy rdf:resource="http://website.com/image-information.html"/>	<edm:isShownAt rdf:resource="http://website.com/image-information.html"/> <edm:isShownBy rdf:resource="http://website.com/image.jpg"/>	Yes if we check the edm:isShownAt for specific mime types (e.g. existence of .jpg). Example queries: jpg, jpeg, png, PDF, pdf mp3, gif, mp4...	warning	Except for embedding cases	Lowers content tier
36	8. Serialization / format / encoding
37	P25	Field should be literal but is URL	For some fields we only allow literal values, even if there are URIs or URLs available on your side.	Example 1	<dc:title>http://dx.doi.org/10.1080/00905990701368738</dc:title>	<dc:title xml:lang="en">Crystallizing and Emancipating Identities in Post-Communist Estonia</dc:title>	Yes, using e.g. IRI validation in RDF validators	Error	Datasets with http in the title: https://www.europeana.eu/api/v2/search.json?query=proxy_dc_title:(http)&rows=0&start=1&facet=edm_datasetName&profile=facets&f.edm_datasetName.facet.limit=1000&wskey=tbc [AI: 12-04-2023: some issues for contextual classes come from mappings and could be fixed via them] Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.nbmjo5zhye16	Basic Retrieval Results in field treated as local-file URL
38	P27	Many URLs in one field	When multiple URLs are given in one field, it will be read as one link, which will fail. All values should be given in separate fields	Example 1 (in edm:isShownBy)	<edm:isShownBy rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_001.tif; http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_002.tif;"/>	<edm:isShownBy rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_001.tif"/> <edm:hasView rdf:resource="http://foodanddrink.image.ntua.gr/image/DianthaOs/00010362_002.tif"/>	Yes, using e.g. IRI validation in RDF validators	Error	Each record can have only 1 isShownBy, however any additional views of the object can be given in edm:hasView, which can be repeated. [HS: 16-03-2023: it seems that at least for media links the problem is pretty small. Do we have other fields in mind?] Cases of duplicate URLs could be corrected	Field cannot be parsed; link cannot be resolved While not a technical problem, such a URL has a low chance of being permanent
39	P30	HTML in fields	The use of html is not supported in literal fields of EDM. It makes literals largely unreadable for humans. On the other hand, <br/> is needed for poetry or lyrics but the portal doesn't handle line breaks.	Example 1 Example 2 Example 3	<dc:description>Ta gjerne med borna på tur langs Rallarvegen. <br/> <p><b><span>Borna sine turreglar</span></b></p> </dc:description>	<dc:description>Ta gjerne med borna på tur langs Rallarvegen. Borna sine turreglar</dc:description>	Yes (using pattern matching)	Warning	HTML is in many cases website specific, and may not belong in the source XML of data Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45	Basic Retrieval Can create false positives in search; display unreadable
40	P31	Schematic data within fields	During a mapping from one format to another, the schematic data of the original mark up is incorporated as value into EDM	Example 1 (for the title in Swedish) Example 2 (for ISBD notation) Example 3 (for escaped HTML link)	<dc:title>{"danish"=>["Vester Sakskærsgård"]}</dc:title> <dc:title xml:lang="ro">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837 = Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title> <dc:publisher><a href=\u0022http://www.wydawnictwo.pk.edu.pl/\u0022 target=\u0022_blank\u0022>Wydawnictwo PK</a></dc:publisher>	<dc:title xml:lang="da">Vester Sakskærsgård</dc:title> <dc:title xml:lang="de">Moldau und Wallachei. Romänische oder Wallachische Sprache und Literatur [...] Berlin, 18 Januar 1837</dc:title> <dc:title xml:lang="ro">Moldova și Valahia. Limba și Literatura Românească sau Valahică [...] Berlin, 18 Ianuarie 1837</dc:title> Having only the name of the Publisher would be ok	A good way would be to count the number of non-text characters (control characters used in schemas, e.g. <>=/{}[]) and contrast their percentage against text characters	Warning	Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45	Basic Retrieval Distorts search metrics; relevant values are cluttered with machine-readable formatting
41	P39	Escape and special characters in titles and descriptions	The issue is mostly relevant for translations provided by our data providers	Example 1 Example 2	<dc:title>Patrick\ n\ t\ n\ t Patrick. </dc:title> <dc:title>Exotic visitors for London_x000D_ H H</dc:title>	<dc:title>Patrick Patrick.</dc:title> <dc:title>Exotic visitors for London H H</dc:title>	yes		Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.kmourueotf45 Should we check more fields or only titles and descriptions?
42	P40	Special characters are not represented with the right encoding	Some characters (such as music notations) can be provided using the wrong encoding, for example using HTML code (a character entity) for ♭ instead of the UTF-8 encoded character. This relates to the issue of having HTML in the value of metadata fields, though the motivation for the problem is quite different. Sometimes the special character is only represented with a normal letter ('b' for flat sign), losing the original information.	Example 1 (for no encoding), Example 2 (for Unicode U+FFFD character �)	<dc:description>Karl Sch�nb�ck</dc:description>	<dc:description>Karl Schönböck</dc:description>	yes		Cf https://docs.google.com/document/d/1Y9acb6yUAdZALUKIAMXiHWyu3H8AIDMUJRLbB5sXgPc/edit#heading=h.5b9br1gg5ote
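
Several of the patterns marked as quantifiable above (P1, P2, P6, P9, P12, P25, P30) come down to simple string tests on dc:title and dc:description. The following is a minimal sketch of such tests, not an existing Europeana or Metis tool. The record layout (a plain dict with "title" and "description" keys), the function names, the 200-character title limit and the short placeholder list are all assumptions made for illustration; only the 50-character description minimum is taken from the SHACL example in the P9 row.

```python
import re
from collections import Counter

# Assumed thresholds and keywords: the table says these still need agreement.
MIN_DESCRIPTION_LENGTH = 50   # same value as sh:minLength 50 in the P9 row
MAX_TITLE_LENGTH = 200        # hypothetical cut-off for "extremely long" (P12)
PLACEHOLDER_TITLES = {"no title", "unknown title", "no suitable title found"}


def check_record(record):
    """Return the IDs of the problem patterns that one record triggers.

    `record` is a dict with optional 'title' and 'description' strings
    (a simplified, hypothetical stand-in for dc:title and dc:description).
    """
    title = (record.get("title") or "").strip()
    description = (record.get("description") or "").strip()
    issues = []
    if title and title == description:
        issues.append("P2")   # title and description are identical
    if title.lower() in PLACEHOLDER_TITLES:
        issues.append("P6")   # placeholder such as "No title"
    if description and len(description) < MIN_DESCRIPTION_LENGTH:
        issues.append("P9")   # very short description
    if len(title) > MAX_TITLE_LENGTH:
        issues.append("P12")  # extremely long title
    if re.match(r"https?://", title):
        issues.append("P25")  # URL where a literal is expected
    if re.search(r"</?[a-zA-Z][^>]*>", description):
        issues.append("P30")  # HTML tags inside the literal
    return issues


def repeated_titles(records):
    """Return titles used by more than one record in a dataset (P1)."""
    counts = Counter(r.get("title") for r in records if r.get("title"))
    return {title: n for title, n in counts.items() if n > 1}
```

As the P6 row notes, a keyword list like this will miss variants in other languages and will also flag objects that genuinely have no title, so the output is better read as a list of candidates for review than as a verdict.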
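
For P18 and P19 the table describes checks that are harder in general but tractable for a narrow subset of values. The sketch below illustrates two of them under stated assumptions: `classify_date` only understands values written as YYYY or YYYY-MM-DD (anything else is reported as "unparsed" rather than judged), and `normalise_vocabulary_uri` only applies three of the rewrites implied by the P19 statistics (leading whitespace, HTTP GND links, and Wikidata links using HTTPS or /wiki). Both function names and the three-way return value are hypothetical.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(-?\d{1,4})(?:-(\d{2})-(\d{2}))?")
WIKIDATA_ITEM = re.compile(r"https?://www\.wikidata\.org/(?:wiki|entity)/(Q\d+)")


def classify_date(value, latest_year=None):
    """Classify a date literal as 'valid', 'impossible' or 'unparsed' (P18).

    Only YYYY and YYYY-MM-DD are understood; free-text dates such as
    "31 June 1954" are returned as 'unparsed' instead of being guessed at.
    """
    if latest_year is None:
        latest_year = date.today().year
    match = ISO_DATE.fullmatch(value.strip())
    if match is None:
        return "unparsed"
    year = int(match.group(1))
    if year > latest_year:
        return "impossible"       # e.g. 3500 lies in the future
    if match.group(2) is not None:
        if year < 1:
            return "unparsed"     # datetime.date cannot represent years before 1
        try:
            date(year, int(match.group(2)), int(match.group(3)))
        except ValueError:
            return "impossible"   # e.g. 1962-11-31: November has 30 days
    return "valid"


def normalise_vocabulary_uri(uri):
    """Rewrite a few known non-authoritative URI variants (P19, case a)."""
    uri = uri.strip()             # some BNE URIs start with a whitespace
    if uri.startswith("http://d-nb.info/gnd/"):
        return "https://" + uri[len("http://"):]
    item = WIKIDATA_ITEM.fullmatch(uri)
    if item is not None:
        return "http://www.wikidata.org/entity/" + item.group(1)
    return uri
```

Applied to the examples in the table, `classify_date("1962-11-31")` gives "impossible" and `normalise_vocabulary_uri("http://d-nb.info/gnd/138541442")` gives the HTTPS form shown in the "What we would like to see" column. Values like "spring 20011" or "30.02.1902 (Herstellung)" stay "unparsed", which matches the table's remark that extracting the year is the hard part.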