| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1  | This document presents the detailed test results from an initial round of testing the Automated Topic Detection tool, developed as a proof-of-concept by the Archives Portal Europe Foundation and King's College London from May to September 2020. You can find the following information in the different tabs of this document: | |||||||||||||||||||||||||
2  | ||||||||||||||||||||||||||
3  | 1. Tab "Overview_Searches_SearchResults" | |||||||||||||||||||||||||
4  | This tab provides the most general overview of the test searches conducted including the search terms used, the topics to which these search terms relate, the languages of the search terms, and the type of search applied (entity or concept). It then continues to provide the number of search results returned, the number of search results already tagged with the topic in question as well as the number of search results tagged with a different topic, and the number and overall percentage of differently tagged results that were found to still be relevant to the topic of interest. | |||||||||||||||||||||||||
5  | ||||||||||||||||||||||||||
6  | 2. Tab "Overview_RelevantSearchResults_Top10" | |||||||||||||||||||||||||
7  | This tab follows up on the first tab and concentrates on the differently tagged results and the checks conducted to confirm if they were also relevant for the topic of interest. While tab 1 gives the percentage of differently tagged, but relevant results in relation to the total number of search results returned, this tab gives the percentage in relation to the number of search results checked. For each topic different from the one of the search, we checked up to the first 10 results. In addition, tab 2 also provides the search terms used, the topics to which these search terms relate, the languages of the search terms, and the type of search applied (entity or concept).  | |||||||||||||||||||||||||
8  | ||||||||||||||||||||||||||
9  | 3. Tab "TopicsOfSearchResults_Entities" | |||||||||||||||||||||||||
10  | This tab looks at the entity type searches only, again providing the search terms used, the topics to which these search terms relate, and the number of search results returned. It then continues by breaking down this number by the nine topics covered by the test data set. The topic of interest is indicated by "n/a" while other topics will have a number allocated to them between 0 and 100 (depending on the type of search and the spread of the search results). If next to the topic of interest all other topics show the number "0", this means that all search results were already tagged with the topic of interest in this specific search case. If all topics show "n/a", this means the search did not return any results. Figures in bold indicate the topic that the majority of results referred to. Next to this breakdown, there is also a comparison between the number of results tagged and not tagged with the topic in question. | |||||||||||||||||||||||||
11  | ||||||||||||||||||||||||||
12  | 4. Tab "TopicsOfSearchResults_Concepts" | |||||||||||||||||||||||||
13  | This tab looks at the concept type searches only, again providing the search terms used, the topics to which these search terms relate, and the number of search results returned. It then continues by breaking down this number by the nine topics covered by the test data set. The topic of interest is indicated by "n/a" while other topics will have a number allocated to them between 0 and 100 (depending on the type of search and the spread of the search results). If next to the topic of interest all other topics show the number "0", this means that all search results were already tagged with the topic of interest in this specific search case. If all topics show "n/a", this means the search did not return any results. Figures in bold indicate the topic that the majority of results referred to. Next to this breakdown, there is also a comparison between the number of results tagged and not tagged with the topic in question. | |||||||||||||||||||||||||
14  | ||||||||||||||||||||||||||
15  | 5. Tab "LanguagesOfSearchResults" | |||||||||||||||||||||||||
16  | This tab starts with a similar structure as tab 1 providing the search terms used, the topics to which these search terms relate, the languages of the search terms, the type of search applied (entity or concept), and the number of search results returned. It then continues by breaking down this number by the seven languages used for cross-lingual references within the test data set. The language of the search term is indicated by "n/a" while other languages will have a number allocated to them between 0 and 100 (depending on the type of search and the spread of the search results). If next to the language of the search term all other languages show the number "0", this means that all search results were in the language of the search term in this specific search case. If all languages show "n/a", this means the search did not return any results. Furthermore, the language with the highest number of results for each search is indicated by being printed in bold.  | |||||||||||||||||||||||||
17  | ||||||||||||||||||||||||||
18  | 6. Tab "LanguageOfSearch2LanguageOfResults" | |||||||||||||||||||||||||
19  | This tab picks up on the information in tab 5 and again includes the search terms used, the topics to which these search terms relate, the languages of the search terms, and the type of search applied (entity or concept). It then adds the language with the most results for each search and combines both languages in an abbreviated form to indicate a possible change in language, or cross-lingual coverage (e.g. "Fr2G" represents a search conducted with a search term in French for which the majority of results returned were in German). In addition, this change is used to sort the searches into two groups: those for which the two languages are different ("diff") and those for which they are the same ("same"). The value "n/a" in this context indicates a search without results. These details have also been summarised at the bottom of the list of all searches to represent: the "Language distribution of search terms", the "Language distribution of search results (by language with most search results)", and the "Changes in language". Furthermore, there is a breakdown per language to show potential changes between the language of the search term and the language of the search results. | |||||||||||||||||||||||||
20  | ||||||||||||||||||||||||||
21  | 7. Tab "Overview_Searches_MostRelevantWords" | |||||||||||||||||||||||||
22  | This tab concentrates on the suggestions of the top 10 most relevant words that each search has generated. Similar to tab 1, it provides the search terms used, the topics to which these search terms relate, the languages of the search terms, and the type of search applied (entity or concept). It then gives the number of most relevant words (which is either "10" or "n/a", the latter indicating a search that did not return any results) and the number and percentage of these words that indeed did relate to the topic in question. | |||||||||||||||||||||||||
23  | ||||||||||||||||||||||||||
24  | 8. Tab "WordNetwork" | |||||||||||||||||||||||||
25  | This tab provides more details about the top 10 most relevant words. For each search, it gives the search term, the topic to which the search term relates, the language of the search term and the dominant language currently represented in the documents tagged with the topic in question. It then continues to list the top 10 most relevant words for each search, using colour coding to indicate to which topic each of these relevant words belongs. | |||||||||||||||||||||||||
26  | ||||||||||||||||||||||||||
27  | 9. Tab "WordNetwork_ColoursByTopic" | |||||||||||||||||||||||||
28  | This tab is based on the information in tab 8. It creates coloured representations for each of the nine topics that are part of the test data set to indicate which topics the top 10 most relevant words for each of the searches in the respective topic area refer to. This is meant to provide a direct overview of how well the relevant words match the topic in question. Any empty (or white) cells/rows indicate a search that did not return any results. | |||||||||||||||||||||||||
29  | ||||||||||||||||||||||||||
30  | 10. Tab "WordNetwork_ColoursByLanguage" | |||||||||||||||||||||||||
31  | This tab is also based on the information in tab 8. Unlike tab 9, however, it illustrates the language representation within the top 10 most relevant words and compares it with the languages currently represented in the documents tagged with each topic. Any empty (or white) cells/rows indicate a search that did not return any results. | |||||||||||||||||||||||||
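
The two percentages described for tabs 1 and 2, and the language-change labels described for tab 6, can be illustrated with a minimal sketch. The function names and the figures used below (40 results returned, 10 checked, 4 found relevant) are hypothetical and chosen only for illustration; they are not taken from the test results. Returning "n/a" for both the label and the group of a search without results is likewise an assumption about how the spreadsheet fills those cells.

```python
from typing import Optional, Tuple


def percentage(part: int, whole: int) -> Optional[float]:
    """Return part/whole as a percentage rounded to one decimal, or None if whole is 0."""
    if whole == 0:
        return None  # corresponds to a search without results ("n/a")
    return round(100 * part / whole, 1)


def language_change(search_lang: str, top_result_lang: Optional[str]) -> Tuple[str, str]:
    """Build a label such as "Fr2G" and sort the search into the "same" or "diff" group.

    Both arguments are language abbreviations; top_result_lang is None when
    the search returned no results.
    """
    if top_result_lang is None:
        return ("n/a", "n/a")  # assumption: both cells show "n/a"
    label = f"{search_lang}2{top_result_lang}"
    group = "same" if search_lang == top_result_lang else "diff"
    return (label, group)


# Hypothetical figures for one search.
results_returned = 40   # all results returned by the search
results_checked = 10    # differently tagged results that were checked
results_relevant = 4    # checked results found relevant to the topic of interest

# Tab 1 relates the relevant results to all results returned,
# tab 2 relates them to the results that were checked.
share_of_returned = percentage(results_relevant, results_returned)
share_of_checked = percentage(results_relevant, results_checked)

# Search term in French, majority of results in German (the "Fr2G" example).
label, group = language_change("Fr", "G")
```

With these hypothetical figures, the same four relevant results amount to 10.0% in the tab 1 sense and 40.0% in the tab 2 sense, which is why the two tabs can report different percentages for the same search.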