Survey Formatting Guidelines
SuAVE takes CSV files as input. When a data file is ingested into SuAVE, the column names become the survey questions (“variables”), and the entries below each column name are the responses. Survey questions and responses must be properly formatted in order to work in the SuAVE user interface.
Note 1. These guidelines are not about designing surveys. A survey design tutorial can be found, for example, at http://www.statpac.com/surveys/.
Formatting Column Names (Variables)
Column type qualifiers
Reserved column names
Length of names
Distinction within Groups
Blank Answers (missing values)
Note 2. If your original Excel file contains non-English characters, the CSV file exported from Excel must be encoded in UTF-8. The simplest method is to copy the content of your Excel spreadsheet, paste it into Google Sheets (as values), and then use File > Download As > CSV to save a UTF-8 encoded file before importing it into SuAVE. More information is at http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding
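If the CSV has already been exported in a legacy encoding, it can also be re-encoded with a short script. The following is a minimal sketch, not part of SuAVE: the function name reencode_to_utf8 and the file names are hypothetical, and the default source encoding (cp1252, common for Excel exports on Windows) is an assumption that should be checked against the actual file.

```python
from pathlib import Path


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Decode the file at src using source_encoding and write it to dst as UTF-8.

    source_encoding is an assumption; a wrong guess either raises
    UnicodeDecodeError or silently produces wrong characters.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Hypothetical usage:
# reencode_to_utf8("survey_export.csv", "survey_utf8.csv")
```

Working on raw bytes avoids any newline translation, so the output differs from the input only in its character encoding.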
Here is an example of how CSV column names translate to variables within SuAVE.
Notice that in the CSV example above, a #number qualifier is appended to the popid column name. A list of supported qualifiers includes:
Values of a number variable will be shown as a histogram and a slider in the filter panel on the left
Values of a date variable will be shown to support filtering by dates, in the filter panel
Date of Birth#date
Values of long variables, such as item descriptions or responses to open-ended questions, will not be listed in the filter panel. Instead, the filter panel will include a search box for such variables at the top of the panel.
In the information panel on the right, this variable will be shown as a link to an external resource
Similar to #number; these values will be shown as a special type of histogram in the filter panel. The values are expected in the form “1 Strongly Agree”, “2 Agree”, “3 Neutral”, “4 Disagree”, etc. This will force values to be ordered in the main display.
Persons in household#ordinal
(e.g., with responses “1 One”, “2 2-3”, “3 3-5”, “4 6-10”, “5 11-15”, “6 16 or more”)
Expects a well-formatted address that will be geocoded on the fly
Allows for multiple values in a single cell (e.g., in responses to “check all that apply” questions). Responses must be separated by the pipe symbol (“|”).
Description associated with an item, which will be shown at the top of the info panel, but not in the list of facets. There can be only one #info column.
This variable won’t appear in the filter panel or in the sorting dropdown list
The title of an item, as it will appear at the top of the information panel on the right. This can be defined through the authoring interface
URL to be opened when the user clicks on the title
Name of a deepzoom image structure associated with a given record. This can be defined through the authoring interface
If any part of a variable name contains Latitude, it will be treated as a geographic coordinate (expected in decimal degrees)
If any part of a variable name contains Longitude, it will be treated as a geographic coordinate (expected in decimal degrees)
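The naming conventions above can be checked before upload with a few helper functions. This is a rough sketch and does not reflect how SuAVE itself parses headers: the function names are hypothetical, the qualifier is assumed to be whatever follows the last “#” in a column name, and the Latitude/Longitude test is assumed to be a case-sensitive substring match, as the wording above suggests.

```python
from typing import List, Optional, Tuple


def split_qualifier(column_name: str) -> Tuple[str, Optional[str]]:
    """Split 'Date of Birth#date' into ('Date of Birth', 'date').

    Returns (name, None) when the column name has no '#' qualifier.
    """
    name, sep, qualifier = column_name.rpartition("#")
    if not sep:
        return column_name.strip(), None
    return name.strip(), qualifier.strip() or None


def is_coordinate(column_name: str) -> bool:
    """True if the name part contains 'Latitude' or 'Longitude' (case-sensitive, an assumption)."""
    name, _ = split_qualifier(column_name)
    return "Latitude" in name or "Longitude" in name


def split_multi(cell: str) -> List[str]:
    """Split a pipe-separated cell into individual responses, dropping empty parts."""
    return [part.strip() for part in cell.split("|") if part.strip()]
```

For example, split_qualifier("popid#number") yields the name “popid” and the qualifier “number”, while a header without “#” is returned unchanged with no qualifier.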
Variable names should fit on one line. Anything longer than 40-50 characters should be abbreviated as much as possible. While hovering over a name will reveal a longer string, shortening the names is the preferred strategy.
Common abbreviations (e.g., changing “not applicable” to “N/A”) and other creative ways of shortening the variable names and values without sacrificing the meaning are very helpful.
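A quick way to find names that need shortening is to scan the header row before upload. The sketch below is illustrative only: overlong_names is a hypothetical helper, the 40-character threshold is taken from the lower end of the range mentioned above, and excluding any trailing “#qualifier” from the count is an assumption.

```python
from typing import Iterable, List, Tuple


def overlong_names(column_names: Iterable[str], max_length: int = 40) -> List[Tuple[str, int]]:
    """Return (name, length) for each column name longer than max_length.

    Any trailing '#qualifier' is excluded from the length (an assumption).
    """
    flagged = []
    for column in column_names:
        display_name = column.rsplit("#", 1)[0].strip()
        if len(display_name) > max_length:
            flagged.append((display_name, len(display_name)))
    return flagged


headers = [
    "Persons in household#ordinal",
    "1 If unemployed, reason for unemployment, respondent is a student",
]
for name, length in overlong_names(headers):
    print(f"{length} characters: {name}")
```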
If the dataset has questions in groups (e.g., a group for demographics questions and a group for household items questions), each question can be preceded by its group number.
For example, the demographics questions could be preceded by a “1” and the household items by a “2,” turning this:
Has a refrigerator
Has a blender
into this:
2 Has a refrigerator
2 Has a blender
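Group prefixes can be applied in bulk rather than by hand. The following is a small sketch under the assumption that the questions of each group are already available as a list; add_group_prefix is a hypothetical helper name.

```python
from typing import List


def add_group_prefix(questions: List[str], group_number: int) -> List[str]:
    """Prepend the group number and a space to every question in the group."""
    return [f"{group_number} {question}" for question in questions]


household_items = ["Has a refrigerator", "Has a blender"]
print(add_group_prefix(household_items, 2))
```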
The variable names should be descriptive enough to be interpretable from outside of their question group.
The first 25-30 characters should be distinct from other questions of the same question group.
For example, these three variable names all start with the same series of characters:
1 If unemployed, reason for unemployment, respondent can't find work
1 If unemployed, reason for unemployment, respondent does volunteer work
1 If unemployed, reason for unemployment, respondent is a student
When users see them on the interface, they will only see the first 25-30 characters followed by an ellipsis (...) and will need to hover over the name to see the rest of the question. Selecting such questions from a dropdown would be problematic. We suggest shortening them to 40-50 characters, like the following:
1 CAN’T FIND WORK as reason for unemployment
1 VOLUNTEER WORK as reason for unemployment
1 STUDENT as reason for unemployment
Capitalization can also be used to stress key words in questions within a group.
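Names that become indistinguishable once truncated can be detected by grouping them on their leading characters. This is a minimal sketch: prefix_collisions is a hypothetical helper, and the default of 25 characters is an assumption taken from the lower end of the range mentioned above, not a documented SuAVE constant.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def prefix_collisions(names: Iterable[str], prefix_length: int = 25) -> Dict[str, List[str]]:
    """Group names by their first prefix_length characters.

    Only prefixes shared by two or more names are returned.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:prefix_length]].append(name)
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}


original = [
    "1 If unemployed, reason for unemployment, respondent can't find work",
    "1 If unemployed, reason for unemployment, respondent does volunteer work",
    "1 If unemployed, reason for unemployment, respondent is a student",
]
for prefix, members in prefix_collisions(original).items():
    print(f"{len(members)} names share the prefix {prefix!r}")
```

Running the same check on the shortened names above should report no collisions, since each one differs within its first few characters.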
Non-numeric values should have text rather than a code. For example, “1 Strongly disagree” to “5 Strongly agree” is more helpful than “1” to “5.”
To provide another example, if you examine the responses to “Difficulty doing work/activities” and one of the values is “5,” this is not very meaningful on its own. Labeled values such as “1 No difficulty,” “2 Some difficulty,” and so on would make more sense.
Ordinal values should be preceded by a numeral. For example, entries that answer “How would you describe your health?” would be organized alphabetically in SuAVE like so:
They should instead be written as:
1 Very Poor
5 Very Good
This makes the histograms more meaningful, because the numeric prefix keeps the values from being ordered alphabetically.
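The effect of the numeric prefix can be illustrated outside SuAVE. The sketch below is an illustration only, not SuAVE's own ordering logic: has_numeric_prefix and ordinal_key are hypothetical helpers that check for a leading numeral and sort by its integer value.

```python
import re

# A leading integer, whitespace, then the start of the label text.
_ORDINAL = re.compile(r"^(\d+)\s+\S")


def has_numeric_prefix(value: str) -> bool:
    """True if the value looks like '1 Very Poor' (numeral, space, label)."""
    return bool(_ORDINAL.match(value))


def ordinal_key(value: str):
    """Sort key: the leading numeral as an integer; unprefixed values sort last."""
    match = _ORDINAL.match(value)
    return (int(match.group(1)) if match else float("inf"), value)


# Without prefixes, alphabetical order puts "Very Good" before "Very Poor".
print(sorted(["Very Poor", "Very Good"]))

# With prefixes, the intended scale order is preserved.
print(sorted(["5 Very Good", "1 Very Poor"], key=ordinal_key))
```

The same has_numeric_prefix check can be run over every value in an ordinal column to catch entries that were left without a numeral.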
When there is no response, it is best to leave that entry blank. SuAVE has an option to show or hide missing values.
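If a dataset uses placeholder text for missing answers, the placeholders can be blanked before upload. This is a hedged sketch: blank_missing is a hypothetical helper, and the placeholder strings are assumptions that must be adapted to the dataset. Genuine answers such as “N/A” (not applicable) are deliberately left out of the list.

```python
from typing import Iterable, List, Sequence


def blank_missing(
    rows: Iterable[Sequence[str]],
    placeholders: Sequence[str] = ("no response", "missing", "-"),
) -> List[List[str]]:
    """Replace cells that match a placeholder (case-insensitive) with an empty string."""
    lowered = {p.lower() for p in placeholders}
    return [
        ["" if cell.strip().lower() in lowered else cell for cell in row]
        for row in rows
    ]


rows = [["1 Very Poor", "No response"], ["-", "5 Very Good"]]
print(blank_missing(rows))
```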