Reviewer sign up | Lit Review ID | Assessment grouping | Tool Name | Designed for | Type | URL | Abstract | Other | Tool Creator/Maintainer | Source code / download URL | Documentation URL | GUI | CLI | Free? | OSS or proprietary | Written in
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Anna Neatrour | MA-002 | Interface (GUI) tools designed for assessing metadata | DPLA Aggregation tools | assessing metadata | tools package | https://github.com/ncdhc/dpla-aggregation-tools | This set of tools provides a way of visually browsing metadata from OAI-PMH feeds, with the option to check for values in required fields. Data is displayed in grids, allowing a user to assess an entire set or collection more effectively. Particularly useful for people who would like to assess the metadata available over OAI-PMH but are not comfortable reviewing raw XML. Although the tools are set up to review simple Dublin Core and a set of required fields specific to NCDHC, the code can be modified to review a qualified Dublin Core OAI-PMH feed, and the required-fields setting can also be adjusted. At the University of Utah, we are using these tools (as modified by the Mountain West Digital Library) to assess mappings and required-field values for legacy collections. | | North Carolina Digital Heritage Center | https://github.com/ncdhc/dpla-aggregation-tools | https://github.com/ncdhc/dpla-aggregation-tools/wiki | y | | y | OSS | 
Rachel T. | MA-005 | Command-line scripts designed for assessing metadata | Metadata Breakers | assessing metadata | stand-alone script | https://github.com/vphill/metadata_breakers | This Python script allows you to parse digital library metadata exposed in an OAI-PMH repository. The data comes in as OAI-PMH responses, and Metadata Breakers provides flexible options for outputting the data in a format that other tools can easily use. A more detailed explanation and examples of how the tool can be used are given in a 2013 Code4Lib Journal article: http://journal.code4lib.org/articles/7818 (See also the minimal OAI-PMH field-checking sketch below the table.) | | Mark Phillips | | | | y | | OSS | Python
 | MA-006 | Command-line scripts designed for assessing metadata | Completeness rating in Europeana | assessing metadata | tool | https://docs.google.com/document/d/1Henbc0lQ3gerNoWUd5DcPnNq4YxOxDW5SQ7g4f26Py0/edit#heading=h.l2fg46yn5tej | This Java program assigns point-based values to "score" individual metadata records for completeness and assumed "information value" (attractiveness) to humans. The score is used to increase the visibility of the best records in the Europeana portal by boosting their ranking. The logic for the points awarded to a record is laid out in the supporting documentation. (A simplified Python sketch of this kind of point-based completeness scoring appears below the table.) | Note from Borys Omelayenko (in-line comment via GitHub): "It gives rank from 0 to 10 for a record, that consists of two parts: up to 5 points for tags with values (potentially) coming from controlled vocabularies, and up to 5 points for free-text fields." (lines 27-52) | Hugo Manguinhas (editor) | https://github.com/europeana/uim-europeana/blob/master/workflow_plugins/europeana-uim-plugin-enrichment/src/main/java/eu/europeana/uim/enrichment/utils/RecordCompletenessRanking.java | | | | | | Java
Andrea L. | MA-017 | Other | Google Analytics | business intelligence | tool | https://analytics.google.com/ | Offers event tracking to discover which links are clicked and which files are downloaded, and can track how the search feature is used. APIs and tools such as Google Tag Manager can extract data into various formats. Dimensions and metrics can be customized to collect data on specific metadata fields. Can be used with R to extract data. | | Google | | | | | | | 
 | MA-010 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | D3 | data visualization | programming language or library | https://d3js.org/ | D3 is a JavaScript library for visualizing data with HTML, SVG, and CSS. | | Mike Bostock (https://github.com/mbostock) | https://github.com/d3/d3 | https://github.com/d3/d3/wiki | | | y | OSS | JavaScript
 | MA-011 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | Plot.ly | data visualization | programming language or library | https://plot.ly/ | An online analytics and data visualization tool, with graphing libraries for Python, R, and MATLAB. | | Plotly | | | | n | y/n (free for Python, R, and MATLAB) | both (OSS for Python, R, and MATLAB) | 
Kathryn G | MA-001 | Interface (GUI) tools designed for assessing metadata | OpenRefine | efficiency and assessment across large datasets | tool | http://openrefine.org/ | OpenRefine is a free, open-source data normalization and reconciliation tool that runs locally in a web browser. It can work with large sets of data, but does best processing fewer than 100k rows at a time. Users can apply faceted search and browsing to identify similar data, or rely on the built-in clustering algorithms that suggest "clusters" of data OpenRefine thinks can be normalized to a single value (including suggesting the "best" value based on relevancies defined in the algorithm). Very useful for assessing and migrating legacy metadata from different systems, and it works with many standard data storage formats (CSV and other delimited files, RDF, XML, JSON, etc.). Advanced users can explore OpenRefine as a tool for linking existing data to external sources (e.g., Freebase) or normalizing data using programming languages for complex queries. Relatively short learning curve for a basic level of usage: common actions have built-in buttons, navigation and design are intuitive, and import/export is very simple. (A minimal Python sketch of the fingerprint-style clustering idea appears below the table.) | Documentation notes: openrefine.org provides easy-to-understand video tutorials in addition to text-based documentation; there is also a documentation wiki at https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users | community-maintained (http://openrefine.org/community.html) | https://github.com/OpenRefine/OpenRefine | https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users | y | n | y | OSS | Java
Kathryn G | MA-004 | Interface (GUI) tools designed for assessing metadata | LODrefine | efficiency and assessment across large datasets | tool | https://github.com/sparkica/LODRefine | While still operational, this tool is no longer supported or maintained. It is a LOD-enabled version of OpenRefine: it builds on OpenRefine version 2.5 with integrated extensions that streamline the transition from tabular data to Linked Data. It was built by a post-doc at the now-defunct DERI institute and makes heavy use of the DERI RDF extension. Last updated in 2013. The documentation URL is no longer active. | Information on installing LODRefine or OpenRefine with the DERI extension: https://github.com/LODLAM/LODLAMTO16/blob/master/OpenRefine_Tutorial/Installation/README.md | Mateja Verlic (@sparkica) | https://github.com/sparkica/LODRefine | [url inactive] http://code.zemanta.com/sparkica | y | n | y | OSS | Java
Sara R. | MA-012 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | Anaconda distribution of Python | efficiency and assessment across large datasets | programming language or library | https://www.python.org/ | Python is a widely-used programming language. The Anaconda distribution of Python comes bundled with packages useful for metadata assessment, including data analysis and visualization libraries (e.g., scikit-learn, pandas, NumPy, SciPy, NLTK, matplotlib), as well as the Jupyter (IPython) notebook interactive computational environment. | | Continuum Analytics | https://www.continuum.io/downloads | https://docs.continuum.io/anaconda/index | n | y | y | OSS | Python
Sara R. | MA-013 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | Python pandas | efficiency and assessment across large datasets | programming language or library | http://pandas.pydata.org/ | A Python library for analyzing data. It is available as a standalone download or as part of the Anaconda distribution of Python (see above). (A short pandas sketch for checking field completeness appears below the table.) | | Wes McKinney (creator); PyData community (maintainer) | http://pandas.pydata.org/pandas-docs/stable/ | http://pandas.pydata.org/pandas-docs/stable/ | n | n | y | OSS | Python
Laura A. (?) | MA-014 | Infrastructure tools that make data processing more efficient and allow for assessment across large datasets | Apache Spark | efficiency and assessment across large datasets | computing framework | http://spark.apache.org/ | A fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning, and graph processing. It is an open-source cluster computing framework. (A small PySpark sketch for profiling field completeness appears below the table.) | | Apache Software Foundation | | | | | y | OSS | 
 | MA-015 | Infrastructure tools that make data processing more efficient and allow for assessment across large datasets | Hadoop | efficiency and assessment across large datasets | computing framework | https://hadoop.apache.org/ | Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a virtually limitless number of concurrent tasks or jobs. (Summary from www.sas.com/en_us/insights/big-data/hadoop.html.) | | Apache Software Foundation | | | | | y | OSS | 
Conal T | MA-021 | Interface (GUI) tools designed for assessing metadata | Gadget | efficiency and assessment across large datasets | tool | https://github.com/Conal-Tuohy/SIMILE-Gadget/wiki | Gadget is an XML inspector designed to create useful summaries of large XML datasets. It generates sparklines and displays frequencies of values clustered by XPath. | | Created by Stefano Mazzocchi for the MIT SIMILE project | https://github.com/Conal-Tuohy/SIMILE-Gadget | https://github.com/Conal-Tuohy/SIMILE-Gadget/wiki | y (website) | y | y | OSS | Java
 | MA-009 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | RStudio | integrated development environment (IDE) | programming language or library | https://www.rstudio.com/ | A free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. | | RStudio, Inc. | | https://support.rstudio.com/hc/en-us | y | | y | OSS | 
 | MA-016 | Dataset | eCommons Metadata | sharing and testing | dataset | https://github.com/cmh2166/eCommonsMetadata | Review of the eCommons DSpace metadata as of Wednesday, February 3rd, 2016. | | Christina Harlow | | | | | | | 
Rachel T. | MA-018 | Dataset | Digital Public Library of America: Bulk Metadata Download Feb 2015 | sharing and testing | dataset | http://digital.library.unt.edu/ark:/67531/metadc502991/ | Dataset containing metadata (~8 million records) contributed to the Digital Public Library of America (DPLA) and normalized into their internal format. This provides an easy, ready-to-download example of DPLA data for testing and experimentation. The full DPLA dataset can also be accessed directly from DPLA. | | Mark Phillips | | | | | | | 
Laura A. | MA-019 | Dataset | UNT Libraries Metadata Edit Dataset | sharing and testing | dataset | http://digital.library.unt.edu/ark:/67531/metadc304852/ | This dataset contains data samples from metadata records (1,193,814 samples per file) extracted from the UNT Libraries' Digital Collections. It contains one sample per metadata record version in the system, with aggregate counts of fields as well as hash values for each element. Data was collected in March 2014 and covers record versions dated May 19, 2004 through February 4, 2014. | | | | | | | | | 
Kathryn G | MA-020 | Dataset | Internet Archive Dataset Collection | sharing and testing | dataset | https://archive.org/details/datasets | The Dataset Collection is an aggregation resource for large data archives from both organizations/sites and individuals. It is accessible via API (rather than bulk download). | | Internet Archive (for the collection); data sets are provided by individuals or organizations | each data set has a unique URL | each data set has a unique URL | y (website) | | y | N/A | 
Rachel T. | MA-003 | Interface (GUI) tools designed for statistical computing and which could be used for assessing metadata | SPSS | statistical computing | tool | http://www-01.ibm.com/software/analytics/spss/ | Statistical analysis tool widely used in the social sciences, commercially available from IBM. Useful for identifying meaningful relationships between variables. | | IBM | | | y | n | n | proprietary | Java
Rachel T. | MA-007 | Other | Tableau | statistical computing and graphics | tool | http://www.tableau.com/ | Tableau is a popular commercial tool for data analysis and visualization, designed to be usable by people without programming skills and often used in business settings for market analytics. It can ingest multiple types of data (spreadsheets, databases, and web data) and allows users to design a dashboard of visualizations from that data. | | Tableau Software | | | y | no, but scripts can be added | Tableau Public is free, but projects must be saved to public.tableau.com | proprietary | 
Andrea L. | MA-008 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | R | statistical computing and graphics | programming language or library | https://www.r-project.org/ | R is a free (no cost) statistical computing software "environment" that can be used for analyzing data and displaying it graphically. It runs on UNIX, FreeBSD, Linux, Windows, and macOS. | | The R Foundation | | | | | y | OSS | R
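
The OAI-PMH-oriented entries above (DPLA Aggregation tools, Metadata Breakers) both come down to harvesting records from a `ListRecords` response and checking which Dublin Core fields are populated. The sketch below is a minimal standard-library illustration of that idea, not code from either project; the endpoint URL and the list of required fields are hypothetical placeholders.

```python
"""Minimal sketch: harvest one page of an OAI-PMH feed and check required fields.

An illustration of the general approach used by tools such as the DPLA
Aggregation tools and Metadata Breakers, not code from either project.
BASE_URL and REQUIRED_FIELDS are hypothetical placeholders.
"""
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://example.org/oai"           # hypothetical OAI-PMH endpoint
REQUIRED_FIELDS = {"title", "rights", "date"}  # hypothetical required DC fields

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest_page(base_url: str) -> ET.Element:
    """Fetch a single ListRecords page of simple Dublin Core records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urlopen(f"{base_url}?{query}") as response:
        return ET.parse(response).getroot()

def check_required(root: ET.Element, required: set) -> None:
    """Report records that are missing any required Dublin Core field."""
    for record in root.findall(".//oai:record", NS):
        identifier = record.findtext(".//oai:identifier", default="(no id)", namespaces=NS)
        dc = record.find(".//oai_dc:dc", NS)
        present = set()
        if dc is not None:
            # Element tags look like "{http://purl.org/dc/elements/1.1/}title"
            present = {el.tag.split("}")[-1] for el in dc if el.text and el.text.strip()}
        missing = required - present
        if missing:
            print(f"{identifier}: missing {sorted(missing)}")

if __name__ == "__main__":
    check_required(harvest_page(BASE_URL), REQUIRED_FIELDS)
```

This sketch only inspects the first page of results; a fuller harvester would follow the `resumptionToken` returned in each response, which is what the tools listed above handle for you.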
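The Europeana completeness ranking described above awards up to 5 points for fields whose values may come from controlled vocabularies and up to 5 points for free-text fields, for a total score of 0-10. The sketch below is a deliberately simplified Python illustration of that scoring idea; the actual logic lives in the Java class linked in the table, and the field groupings and thresholds here are assumptions, not Europeana's.

```python
"""Simplified sketch of a 0-10 point-based completeness score for a metadata record.

The real Europeana logic is implemented in the Java class
RecordCompletenessRanking (linked in the table); the field groupings below
are illustrative assumptions only.
"""

# Hypothetical field groupings: which fields count toward each half of the score.
VOCABULARY_FIELDS = ["subject", "type", "language", "format", "spatial"]
FREE_TEXT_FIELDS = ["title", "description", "creator", "publisher", "source"]

def completeness_score(record: dict) -> int:
    """Return a 0-10 score: up to 5 points per group, one point per populated field."""
    def points(fields):
        populated = sum(1 for f in fields if str(record.get(f, "")).strip())
        return min(populated, 5)
    return points(VOCABULARY_FIELDS) + points(FREE_TEXT_FIELDS)

if __name__ == "__main__":
    sample = {
        "title": "Map of Utrecht",
        "description": "Hand-coloured engraving, 1649.",
        "subject": "Maps",
        "language": "nl",
    }
    print(completeness_score(sample))  # 2 vocabulary points + 2 free-text points = 4
```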
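OpenRefine's key-collision clustering groups values that reduce to the same normalized "fingerprint" (lowercased, punctuation stripped, tokens deduplicated and sorted). The sketch below re-implements that fingerprint idea in a few lines of Python as an illustration; it is not OpenRefine code and omits the other clustering methods (n-gram fingerprints, nearest-neighbour distances) that OpenRefine also offers.

```python
"""Sketch of key-collision clustering in the spirit of OpenRefine's fingerprint
method: values that normalize to the same key are candidates for merging into
a single value. Not OpenRefine code; an illustration only.
"""
import re
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Lowercase, strip accents and punctuation, then dedupe and sort tokens."""
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group raw values by fingerprint; clusters with more than one member need review."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: members for key, members in groups.items() if len(members) > 1}

if __name__ == "__main__":
    names = ["Smith, John", "John Smith", "smith john", "Jane Doe"]
    print(cluster(names))  # one cluster: the three "John Smith" variants
```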
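Several of the entries above (the Anaconda distribution, pandas) are listed because they make quick field-level profiling easy. As a concrete illustration, the sketch below loads a metadata export into pandas and reports how complete each field is; the file name and the `rights` column are hypothetical placeholders.

```python
"""Sketch: per-field completeness report for a tabular metadata export using pandas.

Assumes a CSV export with one row per record; "metadata_export.csv" and the
"rights" column are hypothetical placeholders.
"""
import pandas as pd

# Load the export as strings and treat blank/whitespace-only cells as missing.
df = pd.read_csv("metadata_export.csv", dtype=str).replace(r"^\s*$", pd.NA, regex=True)

# Percentage of records with a non-empty value in each field, worst first.
completeness = df.notna().mean().mul(100).round(1).sort_values()
print(completeness.to_string())

# Frequency of the most common values in one field, e.g. to spot inconsistent terms.
print(df["rights"].value_counts(dropna=False).head(10))
```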
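For collections too large to profile comfortably on one machine, the same completeness question can be asked of Spark. The sketch below is a minimal PySpark illustration under the assumption that the records are available as newline-delimited JSON; "records.jsonl" is a hypothetical placeholder path, and this is not code from any of the projects listed above.

```python
"""Sketch: count populated values per field across a large metadata dump with PySpark.

Assumes records are stored as newline-delimited JSON; "records.jsonl" is a
hypothetical placeholder path.
"""
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("metadata-completeness").getOrCreate()

df = spark.read.json("records.jsonl")
total = df.count()

# F.count() ignores nulls, so this counts records where each top-level field is populated.
populated = df.select(
    [F.count(F.col(name)).alias(name) for name in df.columns]
).first().asDict()

for name, count in sorted(populated.items(), key=lambda item: item[1]):
    print(f"{name}: {count}/{total} records populated")

spark.stop()
```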