DLF Metadata Assessment Working Group - Tools & Tools Documentation supplement
Reviewer sign up | Lit Review ID | Assessment grouping | Tool Name | Designed for: | Type | URL | Abstract | Other | Tool Creator/Maintainer | Source code / download URL | Documentation URL | GUI | CLI | Free? | OSS or proprietary | Written in...
Kathryn G | MA-001 | Interface (GUI) tools designed for assessing metadata | OpenRefine | efficiency and assessment across large datasets | tool | http://openrefine.org/ | OpenRefine is a free, open source data normalization and reconciliation tool that runs locally in a web browser. It can work with large sets of data, but does best processing fewer than 100k lines at a time. Users can use faceted search and browsing to identify similar data, or rely on the built-in clustering algorithms, which suggest ‘clusters’ of values that OpenRefine thinks can be normalized to a single value (including suggesting the ‘best’ value based on relevancies defined in the algorithm). It is very useful for assessing and migrating legacy metadata from different systems, and works well with many standard data storage formats (CSV and other delimited files, RDF, XML, JSON, etc.). Advanced users can explore OpenRefine as a tool for linking existing data to external sources (e.g., FreeBase) or for normalizing data using programming languages for complex queries. The learning curve for ‘basic’ usage is relatively short: common actions have built-in buttons, navigation and design are fairly intuitive, and import/export is very simple. | Documentation Notes: openrefine.org provides easy-to-understand video tutorials in addition to text-based documentation. There is also a documentation wiki here: https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users Source link: https://github.com/OpenRefine/OpenRefine | community-maintained (http://openrefine.org/community.html) | https://github.com/OpenRefine/OpenRefine | https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users | y | n | y | OSS | Java
Anna Neatrour | MA-002 | Interface (GUI) tools designed for assessing metadata | DPLA Aggregation tools | assessing metadata | tools package | https://github.com/ncdhc/dpla-aggregation-tools | This set of tools provides a way of visually browsing metadata from OAI-PMH feeds, with the option to check for values in required fields. Data is displayed in grids, allowing a user to assess an entire set or collection more effectively. It can be particularly useful for people who would like to assess the metadata available over OAI-PMH but who are not comfortable reviewing XML. The tools are set up to review simple Dublin Core and a set of required fields that applies to NCDHC, but the code can be modified to review a qualified Dublin Core OAI-PMH feed, and the setting for required fields can also be adjusted. At the University of Utah, we are using these tools (modified by the Mountain West Digital Library) to assess mappings and required field values for legacy collections. | | North Carolina Digital Heritage Center | https://github.com/ncdhc/dpla-aggregation-tools | https://github.com/ncdhc/dpla-aggregation-tools/wiki | y | | y | OSS |
Rachel T. | MA-003 | Interface (GUI) tools designed for statistical computing and which could be used for assessing metadata | SPSS | statistical computing | tool | http://www-01.ibm.com/software/analytics/spss/ | Statistical analysis tool widely used in the social sciences, commercially available from IBM. Useful for identifying meaningful relationships between variables. | | IBM | | | y | n | n | proprietary | Java
Kathryn G | MA-004 | Interface (GUI) tools designed for assessing metadata | LODrefine | efficiency and assessment across large datasets | tool | https://github.com/sparkica/LODRefine | While still operational, this tool is no longer supported or maintained. This is a LOD-enabled version of OpenRefine. It builds on OpenRefine version 2.5, with integrated extensions that streamline the transition from tabular data to Linked Data. It was built by a post-doc student at the now-defunct DERI institute, and makes heavy use of the DERI RDF extension. Last updated in 2013. Documentation URL not active. | Information on installing LODRefine or OpenRefine with the DERI extension: https://github.com/LODLAM/LODLAMTO16/blob/master/OpenRefine_Tutorial/Installation/README.md | Mateja Verlic (@sparkica) | https://github.com/sparkica/LODRefine | [url inactive] http://code.zemanta.com/sparkica | y | n | y | OSS | Java
Rachel T. | MA-005 | Command-line scripts designed for assessing metadata | Metadata Breakers | assessing metadata | stand-alone script | https://github.com/vphill/metadata_breakers | This Python script allows you to parse digital library metadata exposed in an OAI-PMH repository. The data comes in as OAI, and Metadata Breakers provides flexible options for outputting the data in a format that other tools can easily use. A more detailed explanation and examples of how the tool could be used are found in a 2013 Code4Lib article: http://journal.code4lib.org/articles/7818 | | Mark Phillips | | | | y | | OSS | Python
 | MA-006 | Command-line scripts designed for assessing metadata | Completeness rating in Europeana | assessing metadata | tool | https://docs.google.com/document/d/1Henbc0lQ3gerNoWUd5DcPnNq4YxOxDW5SQ7g4f26Py0/edit#heading=h.l2fg46yn5tej | This Java program assigns point-based values to “score” individual metadata records for completeness and assumed “information value” [attractiveness?] to humans. The score is used to increase the visibility of the best records in the Europeana portal by boosting their ranking. The logic for points awarded to a record is laid out in supporting documentation. | Notes from Borys Omelayenko (in-line comment via github): “It gives rank from 0 to 10 for a record, that consists of two parts: up to 5 points for tags with values (potentially) coming from controlled vocabularies, and up to 5 points for free-text fields.” (lines 27-52) | Hugo Manguinhas (editor) | https://github.com/europeana/uim-europeana/blob/master/workflow_plugins/europeana-uim-plugin-enrichment/src/main/java/eu/europeana/uim/enrichment/utils/RecordCompletenessRanking.java | | | | | | Java
Rachel T. | MA-007 | Other | Tableau | statistical computing and graphics | tool | http://www.tableau.com/ | Tableau is a popular commercial tool for data analysis and visualization, designed to be usable by people without programming skills and often used in business settings for market analytics. It can take in multiple types of data (spreadsheets, databases, and web data) and allows users to design a dashboard of visualizations from that data. | | Tableau Software | | | y | no, but scripts can be added | Tableau Public Free is free, but projects must be saved to public.tableau.com | proprietary |
Andrea L. | MA-008 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | R | statistical computing and graphics | programming language or library | https://www.r-project.org/ | R is a no-cost statistical computing software "environment" that can be used for analyzing data and displaying it graphically. Runs on UNIX, FreeBSD, Linux, Windows, and MacOS. | | The R Foundation | | | | | | | R
 | MA-009 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | R Studio | integrated development environment (IDE) | programming language or library | https://www.rstudio.com/ | Free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics | | | | https://support.rstudio.com/hc/en-us | y | | y | OSS |
 | MA-010 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | D3 | data visualization | programming language or library | https://d3js.org/ | D3 is a JavaScript library for visualizing data with HTML, SVG, and CSS. | | Mike Bostock https://github.com/mbostock | https://github.com/d3/d3 | https://github.com/d3/d3/wiki | | | | | JavaScript
 | MA-011 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | Plot.ly | data visualization | programming language or library | https://plot.ly/ | An online analytics and data visualization tool | | Plotly | | https://github.com/d3/d3/wiki | n | | y/n (free for Python, R, and Matlab) | both (OSS for Python, R, and Matlab) |
Sara R. | MA-012 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | Anaconda distribution of Python | efficiency and assessment across large datasets | programming language or library | https://www.python.org/ | Python is a widely-used programming language. The Anaconda distribution of Python comes bundled with packages useful for metadata assessment, including data analysis and visualization libraries (e.g., scikit-learn, pandas, NumPy, SciPy, NLTK, matplotlib), as well as the Jupyter (IPython) notebook interactive computational environment. | | Continuum Analytics | https://www.continuum.io/downloads | https://docs.continuum.io/anaconda/index | n | y | y | OSS | Python
Sara R. | MA-013 | Command-line resources (languages, libraries, and interfaces) that can be used for assessing metadata | Python pandas | efficiency and assessment across large datasets | programming language or library | http://pandas.pydata.org/ | Python library for analyzing data. It is available as a standalone download or as part of the “Anaconda distribution of Python” (see MA-012). | | Wes McKinney (creator); Python for Data community (maintainer) | http://pandas.pydata.org/pandas-docs/stable/ | http://pandas.pydata.org/pandas-docs/stable/ | n | n | y | OSS | Python
Laura A. (?) | MA-014 | Infrastructure tools that make data processing more efficient and allow for assessment across large datasets | Apache Spark | efficiency and assessment across large datasets | computing framework | http://spark.apache.org/ | A fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. It's an open source cluster computing framework. | | Apache Software Foundation | | | | | | |
 | MA-015 | Infrastructure tools that make data processing more efficient and allow for assessment across large datasets | Hadoop | efficiency and assessment across large datasets | computing framework | https://hadoop.apache.org/ | Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. (summary from www.sas.com/en_us/insights/big-data/hadoop.html) | | Apache Software Foundation | | | | | | |
 | MA-016 | | eCommons Metadata | sharing and testing | dataset | https://github.com/cmh2166/eCommonsMetadata | Review of the eCommons DSpace metadata as of Wednesday, February 3rd, 2016. | | Christina Harlow | | | | | | |
Andrea L. | MA-017 | Other | Google Analytics | business intelligence | tool | https://analytics.google.com/ | Offers event tracking to discover which links are clicked and which files are downloaded, and to track how the search feature is used. APIs, such as the Google Tag Manager, can extract data into various formats. Dimensions and metrics can be customized to collect data on specific metadata fields. Can be used with R to extract data. | | Google | | | | | | | | related question:
Rachel T. | MA-018 | Dataset | Digital Public Library of America: Bulk Metadata Download Feb 2015 | sharing and testing | dataset | http://digital.library.unt.edu/ark:/67531/metadc502991/ | Dataset containing metadata (~8 million records) contributed to the Digital Public Library of America (DPLA) and normalized into their internal format. This provides an easy, ready-to-download example of DPLA data for testing and experimentation. The full DPLA dataset can also be accessed directly from DPLA. | | Mark Phillips | | | | | | |
Laura A. | MA-019 | Dataset | UNT Libraries Metadata Edit Dataset | sharing and testing | dataset | http://digital.library.unt.edu/ark:/67531/metadc304852/ | This dataset contains data samples from metadata records (1,193,814 samples per file) extracted from the UNT Libraries' Digital Collections. It contains one sample per metadata record version in the system, with aggregate counts of fields and hash values of an element. Data was collected in March 2014, with dates from May 19, 2004 to February 4, 2014. | | | | | | | | |
Kathryn G | MA-020 | Dataset | Internet Archive Dataset Collection | sharing and testing | dataset | https://archive.org/details/datasets | The Dataset Collection is an aggregation resource for large data archives from both organizations/sites and individuals. It is accessible via API (vs. bulk download). | | Internet Archive for collection. Data sets are provided by individuals or organizations. | each data set has a unique URL | each data set has a unique URL | y (website) | | y | N/A |
Conal T | MA-021 | Interface (GUI) tools designed for assessing metadata | Gadget | efficiency and assessment across large datasets | tool | https://github.com/Conal-Tuohy/SIMILE-Gadget/wiki | Gadget is an XML inspector designed to create useful summaries of large XML datasets. It generates sparklines, and displays frequencies of values clustered by XPaths. | | Created by Stefano Mazzocchi for MIT SIMILE project | https://github.com/Conal-Tuohy/SIMILE-Gadget | https://github.com/Conal-Tuohy/SIMILE-Gadget/wiki | y (website) | y | y | OSS | Java
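
The OpenRefine entry (MA-001) mentions clustering that suggests groups of values which could be normalized to a single value. One common way to do this is key-collision clustering: reduce each value to a normalized "fingerprint" and group the values whose fingerprints match. The following is a minimal sketch of that general idea in Python, not OpenRefine's own code; the function names `fingerprint` and `cluster_values` are hypothetical, and the exact normalization steps are an assumption.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Return a normalized key for a value (hypothetical helper).

    Steps: fold accents to ASCII, lowercase, drop punctuation,
    then sort and de-duplicate the whitespace-separated tokens.
    """
    folded = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    cleaned = folded.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_values(values):
    """Group values sharing a fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {
        key: sorted(members)
        for key, members in groups.items()
        if len(members) > 1
    }


if __name__ == "__main__":
    creators = ["Smith, John", "John Smith", "john  smith", "Jane Doe"]
    for key, members in cluster_values(creators).items():
        print(key, "->", members)
```

Each cluster is only a candidate for review: values that collide on a fingerprint are not guaranteed to refer to the same thing, so a person should still confirm each merge.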
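
Several entries (MA-002 DPLA Aggregation tools, MA-005 Metadata Breakers) describe assessing simple Dublin Core metadata from OAI-PMH feeds, including checking for values in required fields. The sketch below shows one way such a check could look using only the Python standard library. It is not taken from either tool: `assess_records`, `REQUIRED_FIELDS`, and the `SAMPLE` response are hypothetical, and the required-field list is an assumption to adjust locally.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"
NS = {"oai": OAI_NS}

# Hypothetical required fields; replace with local requirements.
REQUIRED_FIELDS = ("title", "rights", "type")

# Made-up ListRecords response, used only to demonstrate the function.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Map of Raleigh</dc:title>
          <dc:rights>Public domain</dc:rights>
          <dc:type>Image</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Untitled letter</dc:title>
          <dc:rights>   </dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def assess_records(xml_text, required=REQUIRED_FIELDS):
    """Return (records-per-field counts, {identifier: missing required fields})."""
    root = ET.fromstring(xml_text.strip())
    field_counts = Counter()
    missing = {}
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="(no identifier)", namespaces=NS
        )
        # Collect Dublin Core fields that carry a non-blank value.
        present = set()
        for element in record.iter():
            if element.tag.startswith(DC_PREFIX) and (element.text or "").strip():
                present.add(element.tag[len(DC_PREFIX):])
        field_counts.update(present)
        absent = [name for name in required if name not in present]
        if absent:
            missing[identifier] = absent
    return field_counts, missing


if __name__ == "__main__":
    counts, missing = assess_records(SAMPLE)
    print(dict(counts))
    print(missing)
```

A field that is present but contains only whitespace is treated as missing, which is a design choice made here for illustration; a real assessment might also want to report blank fields separately.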