
Survey Formatting Guidelines

SuAVE takes CSV files as input. As a data file is ingested in SuAVE, the column names are converted into the survey questions (“variables”)  and the entries below each question correspond to the responses. Survey questions and responses must be properly formatted in order to be functional in the SuAVE user interface.

Note 1. These guidelines are not about designing surveys. For a survey design tutorial, see, for example, http://www.statpac.com/surveys/.

Table of Contents:

Formatting Column Names (Variables)

        Example

        Column Type Qualifiers

        Reserved column names

        Length of names

        Grouping

        Distinction within Groups

Formatting Responses

        Non-numeric Values

        Blank Answers (missing values)

Note 2. If your original Excel file contains non-English characters, the CSV file exported from Excel must be encoded in UTF-8. The simplest method is to copy the content of your Excel spreadsheet, paste it into Google Sheets (as values), and then use File - Download As - CSV to save a UTF-8 encoded file before importing it into SuAVE. More information is at http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding
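If the CSV file has already been exported in a legacy encoding, it can also be re-encoded with a short script. The following is a minimal Python sketch, not part of SuAVE: the function name reencode_to_utf8 is hypothetical, and the default source encoding cp1252 is an assumption (it is common for Excel on Western-language Windows systems, but your file may use a different one).

```python
def reencode_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Read a CSV saved in a legacy encoding and write it back as UTF-8.

    source_encoding is an assumption; change it to match how the
    file was actually exported.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

If the wrong source encoding is chosen, accented characters will come out garbled, so it is worth opening the result and checking a few values before importing it.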

FORMATTING COLUMN NAMES (VARIABLES)

Example

Here is an example of how CSV column names translate to variables within SuAVE.

CSV:

[Image: example CSV column names]

SuAVE:

[Image: the corresponding variables in SuAVE]

Column Type Qualifiers

Notice that in the CSV example above, a #number qualifier is appended to the popid column name. The supported qualifiers include:

Qualifier: #number
Usage: Values of a number variable will be shown as a histogram and a slider in the filter panel on the left.
Example: Latitude#number

Qualifier: #date
Usage: Values of a date variable will be shown to support filtering by dates in the filter panel.
Example: Date of Birth#date

Qualifier: #long
Usage: Values of long variables, such as item descriptions or responses to open-ended questions, will not be listed in the filter panel. Instead, the filter panel will include a search box for such variables at the top of the panel.
Examples: Description#long; What are your suggestions#long

Qualifier: #link
Usage: In the information panel on the right, this variable will be shown as a link to an external resource.
Examples: Company URL#link; Click to edit#link

Qualifier: #ordinal
Usage: Similar to #number; these values will be shown as a special type of histogram in the filter panel. The values are expected in the form “1 Strongly Agree”, “2 Agree”, “3 Neutral”, “4 Disagree”, etc. This will force values to be ordered in the main display.
Example: Persons in household#ordinal (e.g. with responses “1 One”, “2 2-3”, “3 3-5”, “4 6-10”, “5 11-15”, “6 16 or more”)

Qualifier: #textlocation
Usage: Expects a well-formatted address that will be geocoded on the fly.
Example: Street Address#textlocation

Qualifier: #hidden
Usage: This variable won’t appear in the filter panel or in the sorting dropdown list.
Example: Longitude#number#hidden
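Before uploading a file, it can help to check the qualifiers on each column name for typos. The following Python sketch splits a column name into its base name and qualifiers and reports any qualifier that is not in the list above. The function names are hypothetical, and the exact-match, case-sensitive comparison is an assumption rather than documented SuAVE behavior.

```python
# Qualifiers taken from the list above.
KNOWN_QUALIFIERS = {
    "number", "date", "long", "link", "ordinal", "textlocation", "hidden",
}

def split_qualifiers(column_name):
    """Split 'Longitude#number#hidden' into ('Longitude', ['number', 'hidden']).

    Names that start with '#' (for example '#name') are returned
    unchanged, since they are whole column names, not qualifiers.
    """
    if column_name.startswith("#"):
        return column_name, []
    base, *qualifiers = column_name.split("#")
    return base, qualifiers

def unknown_qualifiers(column_name):
    """Return the qualifiers on a column name that are not in the known list."""
    _, qualifiers = split_qualifiers(column_name)
    return [q for q in qualifiers if q not in KNOWN_QUALIFIERS]
```

For example, unknown_qualifiers("Age#numbr") reports the misspelled qualifier, while a correctly qualified name returns an empty list.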

Reserved column names

Column name: #name
Usage: The title of an item, as it will appear at the top of the information panel on the right. This can be defined through the authoring interface.
Example: #name

Column name: #href
Usage: URL to be invoked when the user clicks on the title.
Example: #href

Column name: #img
Usage: Name of a deepzoom image structure associated with a given record. This can be defined through the authoring interface.
Example: #img

Column name: Latitude
Usage: If any part of a variable name contains Latitude, it will be treated as a geographic coordinate (expected in decimal degrees).
Example: north.latitude

Column name: Longitude
Usage: If any part of a variable name contains Longitude, it will be treated as a geographic coordinate (expected in decimal degrees).
Example: Longitude
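Because coordinates are expected in decimal degrees, a quick pre-upload check can catch values in other notations or outside the valid range. The Python sketch below is illustrative only: the function names are hypothetical, and the case-insensitive name matching is an assumption suggested by the north.latitude example, not confirmed SuAVE behavior.

```python
# Valid absolute ranges for decimal degrees.
LIMITS = {"latitude": 90.0, "longitude": 180.0}

def coordinate_kind(column_name):
    """Return 'latitude', 'longitude', or None for a column name.

    Assumption: matching ignores case, as 'north.latitude' suggests.
    """
    lowered = column_name.lower()
    if "latitude" in lowered:
        return "latitude"
    if "longitude" in lowered:
        return "longitude"
    return None

def is_valid_decimal_degrees(value, kind):
    """Check that a CSV cell (a string) parses as a number within range."""
    try:
        number = float(value)
    except ValueError:
        return False  # e.g. degrees-minutes-seconds notation
    return abs(number) <= LIMITS[kind]
```

A value written in degrees and minutes, or a latitude of 117.23, would be flagged as invalid and could be corrected in the spreadsheet before import.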

Length of names

Variable names should fit on one line. Anything longer than 40-50 characters should be abbreviated as much as possible. While hovering over a name will reveal the full string, shortening the names is the preferred strategy.

Common abbreviations (e.g. changing “not applicable” to “N/A”) and other creative ways of shortening the variable names and values without sacrificing the meaning are very helpful.
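Overlong names are easy to find with a short script. This Python sketch flags column names above a length limit; the function name is hypothetical, the default limit of 50 comes from the upper end of the 40-50 character guideline, and ignoring everything from the first “#” onward is an assumption that qualifiers are not part of the displayed name.

```python
def overlong_names(column_names, limit=50):
    """Return (name, visible_length) pairs for names longer than limit."""
    flagged = []
    for name in column_names:
        # Assumption: qualifiers such as '#number' are not displayed,
        # so only the text before the first '#' is measured.
        visible = name.split("#", 1)[0].strip()
        if len(visible) > limit:
            flagged.append((name, len(visible)))
    return flagged
```

Running it over the header row of a CSV file gives a list of the names that are candidates for abbreviation.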

Grouping

If the dataset has questions in groups (e.g. a group for demographics questions and a group for household items questions), each question can be preceded by the number of its group.

For example, the demographics questions could be preceded by a “1” and the household items by a “2,” turning this:

Neighborhood
Gender
Has a refrigerator
Has a blender


into this:

1 Neighborhood
1 Gender
2 Has a refrigerator
2 Has a blender

The variable names should be descriptive enough to be interpretable from outside of their question group.
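The group numbering can be applied programmatically when the groups are known in advance. A minimal Python sketch, with a hypothetical function name and the example names from above:

```python
def add_group_prefixes(groups):
    """Prefix each question with the 1-based number of its group.

    groups is a list of lists: one inner list of question names per group,
    in the order the groups should be numbered.
    """
    return [
        f"{number} {name}"
        for number, names in enumerate(groups, start=1)
        for name in names
    ]

renamed = add_group_prefixes([
    ["Neighborhood", "Gender"],               # demographics
    ["Has a refrigerator", "Has a blender"],  # household items
])
```

Here renamed holds the four prefixed names in the same order as the list shown above.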

Distinction within Groups

The first 25-30 characters should be distinct from other questions of the same question group.

For example, these three variable names all start with the same series of characters:

1 If unemployed, reason for unemployment, respondent can't find work
1 If unemployed, reason for unemployment, respondent does volunteer work
1 If unemployed, reason for unemployment, respondent is a student


When users see them in the interface, they will only see the first 25-30 characters followed by an ellipsis (...) and will need to hover over each name to see the rest of the question. Selecting such questions from a dropdown would be problematic. It is better to shorten them to 40-50 characters and move the distinguishing words to the front, like the following:

1 CAN’T FIND WORK as reason for unemployment
1 VOLUNTEER WORK as reason for unemployment
1 STUDENT as reason for unemployment


Capitalization can also be used to stress key words in questions within a group.
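Names that would look identical once truncated can be detected by grouping them on their leading characters. The Python sketch below uses 25 characters, the stricter end of the 25-30 range; the function name is hypothetical, and the exact number of characters SuAVE displays may differ.

```python
from collections import defaultdict

def ambiguous_names(column_names, visible_chars=25):
    """Return groups of names that share the same first visible_chars characters."""
    by_prefix = defaultdict(list)
    for name in column_names:
        by_prefix[name[:visible_chars]].append(name)
    # Only prefixes shared by two or more names are a problem.
    return [names for names in by_prefix.values() if len(names) > 1]
```

Applied to the three original “If unemployed...” names, this returns them as a single ambiguous group; applied to the shortened versions, it returns an empty list.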

FORMATTING RESPONSES

Non-numeric Values

Non-numeric values should be written as descriptive text rather than as a bare code. For example, “1 Strongly disagree” to “5 Strongly agree” is more helpful than “1” to “5.”

To provide another example, if you examine the responses to “Difficulty doing work/activities” and one of the values is “5,” this is not very meaningful. Labeled values such as “1 No difficulty,” “2 Some difficulty,” and so on would make more sense.

Ordinal values should be preceded by a numeral. Without one, entries that answer “How would you describe your health?” would be organized alphabetically in SuAVE like so:

Good

Poor

So/so

Very Good

Very Poor

They should instead be written as:

1 Very Poor

2 Poor

3 So/so

4 Good

5 Very Good

This makes the histograms more meaningful, because the values appear in their natural order rather than in alphabetical order.
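Recoding an existing column this way amounts to a lookup from each label to its numbered form. A minimal Python sketch, using the health scale above; the function name and the sample responses are hypothetical.

```python
def number_ordinal_labels(ordered_labels):
    """Map each label to 'rank label', with ranks starting at 1."""
    return {
        label: f"{rank} {label}"
        for rank, label in enumerate(ordered_labels, start=1)
    }

# The scale must be listed in its intended order, lowest first.
scale = ["Very Poor", "Poor", "So/so", "Good", "Very Good"]
mapping = number_ordinal_labels(scale)

# Hypothetical responses; values not in the scale (such as blanks) pass through.
responses = ["Good", "Very Poor", "", "So/so"]
recoded = [mapping.get(response, response) for response in responses]
```

Sorting the recoded labels as plain text now reproduces the intended order of the scale, and the blank response is left blank.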

Blank Answers (missing values)

When there is no response, it is best to leave that entry blank. SuAVE has an option to show or hide missing values.
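When a CSV file is generated by a script, missing answers can be left blank by writing None for them: Python's csv module writes None as an empty field. The sketch below is illustrative; the function name, column names, and rows are hypothetical.

```python
import csv
import io

def to_csv_text(rows, fieldnames):
    """Serialize dict rows to CSV text; None values become blank cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [
    {"Gender": "Female", "Age#number": 34},
    {"Gender": "Male", "Age#number": None},  # no response recorded
]
csv_text = to_csv_text(rows, ["Gender", "Age#number"])
```

The second data row ends with an empty field after the comma, which is the blank entry recommended above.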