SledgeHammer: UI User Guide

User Interface Guide

http://goo.gl/LcVRm1

Metadata Technology North America Inc.

200 Prosperity Drive, Knoxville, TN 37923,

United States of America

Tel:  +1 (865) 245-4542 / Fax: +1 (865) 245-4542

mtna@mtna.us | http://www.mtna.us

Table of Contents

About this Guide

Getting Started

Download from Website

Installation

Data

Configuration

Input Tab

Text Data Tab

Metadata Tab

Statistics Tab

Scripts Tab

Forge

Support & Feedback

Licensing

Community Edition

Professional/Enterprise Edition

About this Guide

This user’s guide covers the UI (user interface) version of SledgeHammer RC1 (Release Candidate 1). It gives an overview of the UI features and of SledgeHammer’s functionality. For more technical information and the command-line version, please refer to the OpenDF SledgeHammer: Technical User Guide.

Video tutorials on these SledgeHammer topics are available on our website: http://www.mtna.us/sledgehammer

Getting Started

SledgeHammer focuses on data conversion and metadata production. It is part of a larger suite of tools called OpenDataForge (OpenDF), developed by Metadata Technology North America to facilitate the transformation and processing of statistical data and the production and use of metadata specifications such as DDI or SDMX. Other OpenDF tools include Caelum (which complements SledgeHammer by providing a free tool for generating reports and other documents from the generated metadata) and Asmurex.

The SledgeHammer software provides the following features:

  • Reading data and extracting metadata from various data sources. Supported formats include Stata, SPSS, SAS Syntax+ASCII, DDI+ASCII, and Stat/Transfer+ASCII
  • Writing data in ASCII text format (fixed, CSV, delimited), with various optimization options
  • Producing standard metadata from input files. Supported specifications include DDI-Codebook (1.0-2.1 and 2.5), DDI-Lifecycle (3.1, 3.2), and Triple-S (1.1, 2.0)
  • Computing summary statistics at the variable and category levels for inclusion in DDI or for other purposes. These include min, max, mean, standard deviation, variance, missing count, and weighted/unweighted frequencies
  • Generating scripts for reading ASCII data into various platforms. Supported packages include R, SAS, SPSS, Stata, and various flavours of SQL (MS-SQL, MySQL, Oracle, HSQL, PostgreSQL, MonetDB). The database scripts can include the creation of the database schema (for hosting the data and lookup tables) along with bulk loading of the ASCII data.

Additional features under development or planned include:

  • Creating data subsets through variable selection and record filtering
  • Computing new variables (recoded, derived)
  • Generating synthetic data
  • Supporting additional input/output formats/packages
  • Pushing data into cloud platforms such as Google Fusion
  • Statistical Disclosure Control

As a Java-based product, SledgeHammer runs on Windows, Mac, and Linux operating systems.

Download from Website

Download SledgeHammer from our website (http://www.mtna.us/sledgehammer). New users are encouraged to try the free Community Edition, which contains all the functionality of the Professional Edition but caps the number of variables and records.

Installation

For Mac, download the zip file from the website. Save and extract it to a preferred location, such as the Applications folder. Open the extracted folder and double-click the SledgeHammer icon to start.

For Windows, run the downloaded installer. It creates a desktop icon from which you can start SledgeHammer.

If you have retrieved the generic ZIP package, run the script or application of your choice.

  • Make sure Java 1.6 or higher is installed on your computer. If you are unsure whether you have Java, or do not know which version you have, see the Installation section.
  • Extract SledgeHammer to a folder of your choice. We recommend C:\DataForge\SledgeHammer under Windows and ~/dataforge/sledgehammer under Mac/Linux.
  • On Linux/Mac systems, you may need to make the sledgehammer script executable, for example by running: chmod +x ~/dataforge/sledgehammer/sledgehammer
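
To find out which Java version is installed, you can run java -version in a terminal and read the version string it reports. The Python sketch below shows one way to interpret that string against the Java 1.6 minimum. The function names are hypothetical and are not part of SledgeHammer.

```python
import re

def parse_java_version(version_output):
    """Return (major, minor) from the text printed by `java -version`.

    Handles the legacy "1.x" scheme (e.g. "1.6.0_45") and the newer
    scheme (e.g. "11.0.2"). Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor)

def meets_minimum(version_output, minimum=(1, 6)):
    """True if the reported version is at least the given minimum."""
    version = parse_java_version(version_output)
    return version is not None and version >= minimum
```

Note that java -version writes its report to the standard error stream, so a script that captures the output needs to read stderr rather than stdout.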

If you are still experiencing Java-related issues, please review these troubleshooting tips.

Data

The file used for the examples and walkthrough in this guide is the Stata version of the American National Election Study 1948. It can be found in the sledgehammer/data directory or retrieved from the ANES Data Center. The file name is NES1948.dta.

More sample data can be found in the SledgeHammer Showcase.

Configuration

The community edition of SledgeHammer is the default upon initial download and installation.

If you have registered or have purchased a license, you will receive a key file via email.

Installing your key is quite easy.

Simply go to settings and click the licensing tab.

Browse to select the license key file and click Install.

Note that the application will restart to activate the license and will prompt you to agree to the relevant terms.

The SledgeHammer application opens up to the welcome page, where users click on the icon associated with their license to continue to the input tab. Users can also click the continue button at the bottom right corner to access the input tab.

Click the ‘do not show again’ box at the bottom left corner to hide this page the next time SledgeHammer runs. You can bring it back by changing the options in the settings menu.

Input Tab

To add a single file to SledgeHammer, drag and drop it into the white box.

*(See section below for combination file import)

You can also click the ‘Select from Disk’ button to navigate to the file location.

Users can also select recently-used files by clicking on the selection button at the bottom of the page.

*Uploading ASCII Text + Syntax files: To upload an ASCII text data file together with SAS syntax, StatTransfer syntax, DDI-C, or DDI-L (.xml), select BOTH files and drag and drop them. Alternatively, when selecting from disk, select both files and press ‘open’.

Once the file or files are uploaded, the Input Tab displays the features of the input file, such as its name, format, and content.

It also displays the data in two tabular formats, either in the data dictionary style or the data preview.

Data Preview table

By clicking within a cell of the Codes column, users can see the associated code values.

If at any time you want to reset SledgeHammer, for example to upload a different file, click the “Reset” button in the top-right corner. Note that resetting SledgeHammer discards any work done so far.

Text Data Tab

To give users a way to access the data held in studies and manipulate it in a variety of ways, SledgeHammer implements an ASCII writer. The ASCII writer lets users export data to Fixed, CSV, or Tab Delimited files. In addition to basic data exports, it can export summary statistics to CSV files. For users who want to create a database schema with the script generators, the ASCII writer can also create CSV files with appropriate null values for bulk import into a specific database.
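
To illustrate how the three layouts differ, here is a minimal Python sketch that writes the same records in fixed, CSV, and tab-delimited form. It is not SledgeHammer's writer: the column names, widths, and records are invented for illustration, and the marker written for missing values is an assumption that depends on the target database.

```python
import csv
import io

# Hypothetical columns (name, width) and records, invented for illustration.
COLUMNS = [("id", 4), ("age", 3), ("region", 2)]
ROWS = [
    {"id": 1, "age": 34, "region": 2},
    {"id": 2, "age": None, "region": 4},  # None marks a missing value
]

def to_fixed(rows, columns):
    """Fixed format: each value is right-aligned in its column width."""
    lines = []
    for row in rows:
        cells = []
        for name, width in columns:
            value = row[name]
            text = "" if value is None else str(value)
            cells.append(text.rjust(width))
        lines.append("".join(cells))
    return "\n".join(lines) + "\n"

def to_delimited(rows, columns, delimiter=",", null=""):
    """CSV or tab-delimited format with a header row.

    `null` is the text written for missing values; which marker a
    bulk loader expects depends on the target database.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow([name for name, _ in columns])
    for row in rows:
        writer.writerow(
            [null if row[name] is None else row[name] for name, _ in columns]
        )
    return buffer.getvalue()
```

Calling to_delimited(ROWS, COLUMNS, delimiter="\t") produces the tab-delimited variant. In the fixed layout the column positions carry the structure, which is why the widths discussed under Optimization Options matter.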

Save As Options

Users can auto-generate the file name or specify one.

Optimization Options

Full Optimization - This will calculate the smallest possible width for each variable. If a variable has values that are too big for the specified width, the width will be altered to be large enough to contain the value. In addition to increasing the width, this option will also reduce the width if the specified width is unnecessarily large. This serves to reduce the amount of whitespace in the ASCII file which results in a smaller file. Full Optimization is the default optimization.

Auto-Adjust - This is similar to Full Optimization in that it will increase the width of a variable if the specified width is too small to contain the data values in the variable. However, it will not reduce the size of unnecessarily large widths.

Report-Only - This option trusts the widths specified in the proprietary file. If the ASCII writer comes across a value that cannot be displayed in the specified width, it replaces the value with an asterisk (*).

None - Users who do not want any widths adjusted can choose no optimization. Use this option carefully: an error occurs if a data value does not fit in its variable's width, and the program will crash if the error is not handled appropriately. SledgeHammer takes this into consideration and provides options to work around such errors.
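
The difference between the four options can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not SledgeHammer's implementation: the function name and the mode labels ("full", "auto", "report", "none") are hypothetical, and right-aligning the asterisk is an assumption.

```python
def resolve_width(values, declared_width, mode):
    """Return (width, rendered_values) for one variable.

    mode is one of "full", "auto", "report", "none"; these labels are
    shorthand for the four optimization options.
    """
    texts = [str(v) for v in values]
    needed = max((len(t) for t in texts), default=0)

    if mode == "full":
        # Smallest width that holds every value: may grow or shrink.
        width = needed
    elif mode == "auto":
        # Grow if the declared width is too small, but never shrink.
        width = max(declared_width, needed)
    elif mode == "report":
        # Trust the declared width; flag values that do not fit.
        width = declared_width
        texts = [t if len(t) <= width else "*" for t in texts]
    elif mode == "none":
        # No adjustment: an oversized value is an error.
        width = declared_width
        for t in texts:
            if len(t) > width:
                raise ValueError(f"value {t!r} exceeds width {width}")
    else:
        raise ValueError(f"unknown mode: {mode}")

    return width, [t.rjust(width) for t in texts]
```

For a variable declared with width 5 that holds the values 1 and 123456, "full" and "auto" both widen the column to 6, "report" keeps width 5 and writes an asterisk for the second value, and "none" raises an error.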

If you only want to export data files (without metadata or statistics), you can click the Forge button at this point. This is dependent on the specifics of your particular project. If you would like to continue and add metadata, statistics, and scripts, review each tab and click the Forge button at the end (as we will do for the example in this User’s Guide).

Metadata Tab

One of the core motivations behind DataForge and SledgeHammer is to provide producers and users with tools that foster and facilitate the adoption and use of international metadata standards, in particular the Data Documentation Initiative (DDI).

DDI-XML is a starting point for other operations, many of which are provided by DataForge. It is internationally recognized as a best practice for the management of statistical data and used by data producers, archives, centers, and researchers around the globe. DDI can also be used with other tools such as Nesstar, the IHSN Microdata Management Toolkit or NADA, or Colectica.

In addition to DDI, SledgeHammer provides support for Triple-S XML, a simple specification for describing surveys and variables.

Users have the option to generate DDI-Codebook and/or DDI-Lifecycle metadata depending on the project’s need. More information about the DDI standard and its two development lines can be found here: http://www.ddialliance.org/Specification/

For the purposes of this guide, we chose to generate DDI-Codebook 2.5, DDI-Lifecycle 3.2 RP, DDI-Lifecycle 3.2 SU, and Triple S 2.0.

DDI-Codebook 2.5

Users can auto-generate the output file name or specify one. To be DDI compliant, users also need to provide an agency and a version number. Statistics can be included in the XML only after they have been chosen in the Statistics Tab (see next section), where you specify the variables for which to generate summary statistics.
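
To give a sense of what variable-level metadata looks like in DDI-Codebook 2.5, the Python sketch below builds a small fragment with one variable, its label, one summary statistic, and its categories. It is not SledgeHammer's output and it is not a complete or schema-validated DDI instance (a real codebook carries more, such as the study description). The function name and the sample variable are hypothetical.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"  # DDI-Codebook 2.5 namespace

def build_variable_fragment(name, label, categories, stats):
    """Return an XML string describing one variable.

    categories: list of (code, label) pairs.
    stats: dict mapping a sumStat type (e.g. "mean") to its value.
    """
    ET.register_namespace("", DDI_NS)  # serialize as the default namespace
    code_book = ET.Element(f"{{{DDI_NS}}}codeBook")
    data_dscr = ET.SubElement(code_book, f"{{{DDI_NS}}}dataDscr")
    var = ET.SubElement(
        data_dscr, f"{{{DDI_NS}}}var", {"ID": name, "name": name}
    )
    ET.SubElement(var, f"{{{DDI_NS}}}labl").text = label
    # Summary statistics come before the category list.
    for stat_type, value in stats.items():
        stat = ET.SubElement(var, f"{{{DDI_NS}}}sumStat", {"type": stat_type})
        stat.text = str(value)
    for code, cat_label in categories:
        catgry = ET.SubElement(var, f"{{{DDI_NS}}}catgry")
        ET.SubElement(catgry, f"{{{DDI_NS}}}catValu").text = str(code)
        ET.SubElement(catgry, f"{{{DDI_NS}}}labl").text = cat_label
    return ET.tostring(code_book, encoding="unicode")
```

For example, build_variable_fragment("V1", "Example variable", [(1, "Yes"), (2, "No")], {"mean": 1.5}) yields a codeBook element containing a single var element with two catgry children.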

DDI-LifeCycle 3.2

As with DDI-Codebook above, users can auto-generate or specify the output file name and include agency and version numbers. Additional Lifecycle-specific elements are also included. Users can generate a Study Unit (SU) or a Resource Package (RP). A study unit is a DDI instance that contains information about the actual study, such as the universe. A resource package is a DDI instance that structures materials for publication that are intended to be reused by multiple studies, projects, or communities of users.

Triple S 2.0

Triple-S is a simple XML standard used to document survey and variable data.

Add More

To generate other metadata, click the ‘plus’/add icon and choose from the dropdown menu.

Statistics Tab

SledgeHammer has the ability to compute summary statistics from provided data files. These summary statistics can then optionally be stored in DDI-XML metadata.

The Professional edition of SledgeHammer also provides the options to:

  • Directly export summary statistics to ASCII CSV file for further processing, analysis, or other documentation purposes
  • Compute weighted summary statistics
  • Generate multiple sets (e.g. unweighted and weighted)

Users can select specific statistics or all of them by using the selection arrows in the middle.

Select the variables for which to generate summary statistics. To do so, click in the field and type in the variable name. Press the ‘enter’ key or click outside the field to commit. Add more variables by repeating this in the fields below.
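
As an illustration of what unweighted and weighted summary statistics mean for a single variable, here is a minimal Python sketch. It is not SledgeHammer's implementation: the function name is hypothetical, None is assumed to mark a missing value, and the variance shown is the population form (divided by the total weight), which may differ from the formula SledgeHammer uses.

```python
import math
from collections import Counter

def summarize(values, weights=None):
    """Summary statistics for one variable; None marks a missing value.

    With weights=None every record counts once (unweighted).
    """
    if weights is None:
        weights = [1.0] * len(values)
    valid = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in valid)
    if total_weight == 0:
        raise ValueError("no valid weighted observations")

    mean = sum(v * w for v, w in valid) / total_weight
    # Population variance: weighted squared deviations over total weight.
    variance = sum(w * (v - mean) ** 2 for v, w in valid) / total_weight

    frequencies = Counter()
    for v, w in valid:
        frequencies[v] += w  # weighted count per category value

    return {
        "min": min(v for v, _ in valid),
        "max": max(v for v, _ in valid),
        "mean": mean,
        "variance": variance,
        "stddev": math.sqrt(variance),
        "missing": len(values) - len(valid),
        "frequencies": dict(frequencies),
    }
```

Calling summarize(values) gives the unweighted set, and summarize(values, weights) gives the weighted set for the same variable, which corresponds to generating multiple sets of statistics.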

Scripts Tab

Some users may only have access to one statistical package to do their research. So what are they to do if they find a study they want to look at that is written in another format? SledgeHammer seeks not only to be a convenient and effective way of documenting statistical studies, but also of transferring data and metadata from one format to another. To facilitate this, SledgeHammer implements a number of different script generators. The scripts produced by the generators, when combined with ASCII data, can rebuild the study in other formats.

In our example, the study was created in Stata (NES1948.dta). Let’s say you only have access to SPSS, and you would like to create a script to recreate the study in SPSS. This is easily accomplished using SledgeHammer.

In the Scripts Tab, users can generate several types of scripts, including R, SAS, SPSS, Stata, Mathematica, StatTransfer, and various flavours of SQL (MS-SQL, MySQL, Oracle, HSQL, PostgreSQL, MonetDB, etc.). For this example, we chose to convert to SAS and SPSS, as well as generate scripts for MySQL and Google Big Query. For convenience, the logo or icon of each script type is shown.

SAS Script

Auto-generate or specify file name.

SPSS Script

Auto-generate or specify file name.

MySQL

Auto-generate or specify file name, database name, and table name. Denote columns per table.
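
To show the general shape of a database script that creates a table and bulk loads a CSV file, here is a minimal Python sketch that assembles standard MySQL statements. It is not the script SledgeHammer generates: the function name is hypothetical, the database, table, column, and file names in the usage note are invented, and the sketch does no escaping of identifiers or paths.

```python
def mysql_load_script(database, table, columns, csv_path):
    """Build CREATE TABLE and LOAD DATA statements as one script string.

    columns: list of (name, sql_type) pairs, e.g. ("age", "INT").
    The CSV file is assumed to be comma-separated with a header row.
    """
    column_lines = ",\n".join(
        f"  `{name}` {sql_type}" for name, sql_type in columns
    )
    return (
        f"CREATE DATABASE IF NOT EXISTS `{database}`;\n"
        f"USE `{database}`;\n"
        f"CREATE TABLE `{table}` (\n{column_lines}\n);\n"
        f"LOAD DATA LOCAL INFILE '{csv_path}'\n"
        f"INTO TABLE `{table}`\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES;\n"  # skip the header row
    )
```

For example, mysql_load_script("anes", "nes1948", [("id", "INT"), ("age", "INT")], "nes1948.csv") returns a script that can be saved to a .sql file and run with the mysql client. The file name, database name, and table name correspond to the settings described above.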

Google Big Query

Auto-generate or specify file name. Choose between the Big Query and Big Query Apps flavors.

Add More Scripts

To generate another script, click the ‘plus’/add icon and choose from the dropdown menu.

Forge

Once the necessary options have been set in each generator tab, click the ‘Forge’ button. A popup box will prompt you to browse for the destination directory in which to save the output files. Unless you specify otherwise, the files are saved in an output folder within the folder that holds the input file(s).

Files are saved in the designated folder.

Support & Feedback

  • Community driven support is available through the OpenMetadata Google group at https://groups.google.com/forum/?fromgroups#!forum/openmetadata 
  • We want DataForge to be a community driven program and strongly encourage you to send ideas, suggestions, or feedback to dataforge@mtna.us.
  • Commercial support is available upon request and will be included in the Professional and Enterprise editions of SledgeHammer. Contact us at dataforge@mtna.us for more information around our data/metadata management services and support options.
  • Data/Metadata management and processing services are also available through our Open Data Packaging Services (http://www.mtna.us/odps)

Licensing

Community Edition

See http://goo.gl/uNjxT

Professional/Enterprise Edition

Under development, currently in private beta testing. Contact dataforge@mtna.us for further information.