Survey Formatting Guidelines

SuAVE takes CSV files as input. When a data file is ingested into SuAVE, the column names become the survey questions (“variables”), and the entries below each column name are the responses to that question. Survey questions and responses must be properly formatted to work in the SuAVE user interface.

Note 1.  These guidelines are not about designing surveys. A survey design tutorial can be found, for example, at,

Table of Contents:

Formatting Column Names (Variables)
        Column Type Qualifiers
        Reserved column names
        Length of names
        Distinction within Groups
Formatting Responses
        Non-numeric Values
        Blank Answers (missing values)

Note 2. If your original Excel file contains non-English characters, the CSV file exported from Excel must be encoded in UTF-8. The simplest method is to copy the content of your Excel spreadsheet and paste it into Google Sheets (as values), then use File - Download As - CSV to save a UTF-8 encoded file before importing it into SuAVE. More information is at 
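If the Google Sheets route is not convenient, the re-encoding can also be scripted. The following is a minimal sketch, not a SuAVE tool: it assumes the Excel export uses the Windows-1252 encoding (common on Western-locale Windows, but this is an assumption you should check), and the function name reencode_to_utf8 is hypothetical.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Rewrite a text file as UTF-8.

    src_encoding is an assumption (Windows-1252); change it to match
    the encoding your spreadsheet program actually used.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

If the source encoding is guessed wrong, accented characters will come out garbled, so it is worth opening the result in a text editor before uploading.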



Here is an example of how CSV column names translate to variables within SuAVE.





Column Type Qualifiers

Notice that in the CSV example above, a #number qualifier is appended to the popid column name. The supported qualifiers include:

#number
Values of a number variable will be shown as a histogram and a slider in the filter panel on the left.
Example: popid#number

#date
Values of a date variable will be shown in the filter panel to support filtering by dates.
Example: Date of Birth#date

#long
Values of long variables, such as item descriptions or responses to open-ended questions, will not be listed in the filter panel. Instead, the filter panel will include a search box for such variables at the top of the panel.
Example: What are your suggestions#long

#link
In the information panel on the right, this variable will be shown as a link to an external resource.
Examples: Company URL#link, Click to edit#link

#ordinal
Similar to #number: these values will be shown as a special type of histogram in the filter panel. The values are expected in the form “1 Strongly Agree”, “2 Agree”, “3 Neutral”, “4 Disagree”, etc. This forces the values to be ordered in the main display.
Example: Persons in household#ordinal (e.g. with responses “1 One”, “2 2-3”, “3 3-5”, “4 6-10”, “5 11-15”, “6 16 or more”)

#textlocation
Expects a well-formatted address that will be geocoded on the fly.
Example: Street Address#textlocation

This variable won’t appear in the filter panel or in the sorting dropdown list
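When checking a CSV header before upload, it can help to separate each column name from its qualifier. The sketch below assumes the qualifier is whatever follows the last # in the name; how SuAVE itself parses names is not described in these guidelines. The function name split_qualifier and the KNOWN_QUALIFIERS set (limited to qualifiers named in this section) are illustrative.

```python
# Qualifiers named in this section; SuAVE may support others.
KNOWN_QUALIFIERS = {"number", "date", "long", "link", "ordinal", "textlocation"}

def split_qualifier(column_name):
    """Return (display_name, qualifier), with qualifier None if absent or unknown."""
    base, sep, qualifier = column_name.rpartition("#")
    if sep and qualifier in KNOWN_QUALIFIERS:
        return base.strip(), qualifier
    # No '#' found, or the suffix is not a recognized qualifier.
    return column_name.strip(), None

for name in ["popid#number", "Date of Birth#date", "Neighborhood"]:
    print(split_qualifier(name))
```

A name whose suffix is not in the set is returned unchanged, which makes misspelled qualifiers easy to spot in the output.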


Reserved column names

Column name




The title of an item, as it will appear at the top of the information panel on the right. This can be defined through the authoring interface



URL to be invoked when the user clicks on the title



Name of a deepzoom image structure associated with a given record. This can be defined through the authoring interface



If any part of a variable name contains Latitude, the variable will be treated as a geographic coordinate (expected in decimal degrees)



If any part of a variable name contains Longitude, the variable will be treated as a geographic coordinate (expected in decimal degrees)


Length of names

Variable names should fit on one line. Anything longer than 40-50 characters should be abbreviated as much as possible. While hovering over a name will reveal the full string, shortening the names is the preferred strategy.

Common abbreviations (e.g. changing “not applicable” to “N/A”) and other creative ways of shortening variable names and values without sacrificing their meaning are very helpful.
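A quick way to find names that need shortening is to scan the header row for anything over the limit. This is a minimal sketch: the 50-character threshold follows the range given above, the part after # is assumed not to count toward the displayed length, and find_long_names is a hypothetical helper name.

```python
def find_long_names(column_names, max_len=50):
    """Return the column names whose text before any '#' exceeds max_len characters."""
    long_names = []
    for name in column_names:
        display_name = name.split("#", 1)[0]  # assumption: the qualifier is not displayed
        if len(display_name) > max_len:
            long_names.append(name)
    return long_names

header = [
    "Date of Birth#date",
    "1 If unemployed, reason for unemployment, respondent can't find work",
]
print(find_long_names(header))
```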


If the dataset has questions in groups (e.g. a group for demographics questions and a group for household items questions), each question can be preceded by the number of its group.

For example, the demographics questions could be preceded by a “1” and the household items questions by a “2”, turning this:

Neighborhood
Gender
Has a refrigerator
Has a blender

into this:

1 Neighborhood
1 Gender
2 Has a refrigerator
2 Has a blender

The variable names should be descriptive enough to be interpretable from outside of their question group.
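The group numbering described above can be applied mechanically once the questions are sorted into groups. A minimal sketch, assuming the groups are supplied as an ordered list of lists; the function name prefix_with_group is hypothetical.

```python
def prefix_with_group(groups):
    """Prefix every name with the 1-based number of the group it belongs to."""
    renamed = []
    for number, names in enumerate(groups, start=1):
        for name in names:
            renamed.append(f"{number} {name}")
    return renamed

groups = [
    ["Neighborhood", "Gender"],              # demographics
    ["Has a refrigerator", "Has a blender"],  # household items
]
print(prefix_with_group(groups))
```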

Distinction within Groups

The first 25-30 characters should be distinct from other questions of the same question group.

For example, these three variable names all start with the same series of characters:

1 If unemployed, reason for unemployment, respondent can't find work
1 If unemployed, reason for unemployment, respondent does volunteer work
1 If unemployed, reason for unemployment, respondent is a student

When users see these names in the interface, they will see only the first 25-30 characters followed by an ellipsis (...) and will need to hover over each one to see the rest of the question. Selecting such questions from a dropdown would be problematic. We suggest shortening them to 40-50 characters and moving the distinguishing words to the front, like the following:

1 CAN’T FIND WORK as reason for unemployment
1 VOLUNTEER WORK as reason for unemployment
1 STUDENT as reason for unemployment

Capitalization can also be used to stress key words in questions within a group.
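Names that would look identical once truncated can be detected by grouping them on their leading characters. The sketch below uses a 25-character prefix, the lower end of the range mentioned above; the exact truncation point in the interface may differ, and find_prefix_collisions is a hypothetical helper name.

```python
from collections import defaultdict

def find_prefix_collisions(column_names, prefix_len=25):
    """Map each shared prefix to the list of names that start with it."""
    by_prefix = defaultdict(list)
    for name in column_names:
        by_prefix[name[:prefix_len]].append(name)
    # Keep only prefixes shared by two or more names.
    return {prefix: names for prefix, names in by_prefix.items() if len(names) > 1}

names = [
    "1 If unemployed, reason for unemployment, respondent can't find work",
    "1 If unemployed, reason for unemployment, respondent does volunteer work",
    "1 If unemployed, reason for unemployment, respondent is a student",
]
for prefix, clashing in find_prefix_collisions(names).items():
    print(repr(prefix), len(clashing))
```

Running the same check on the shortened names should return an empty dictionary, since each begins with a different key word.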


Non-numeric Values

Non-numeric values should have text rather than a code. For example, “1 Strongly disagree” to “5 Strongly agree” is more helpful than “1” to “5.”

To provide another example, if you examine the responses to “Difficulty doing work/activities” and one of the values is “5,” this is not very meaningful. Adding text to the numerals, as in “1 No difficulty,” “2 Some difficulty,” and so on, would make more sense.

Ordinal values should be preceded by a numeral. For example, entries that answer “How would you describe your health?” would otherwise be ordered alphabetically in SuAVE, like so:

Good

Poor

So/so

Very Good

Very Poor

They should instead be written as:

1 Very Poor

2 Poor

3 So/so

4 Good

5 Very Good

This makes the histograms more meaningful, because the numeral keeps the values in their natural order instead of alphabetical order.
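Adding the numerals by hand is error-prone for large files, so the relabeling can be scripted. A minimal sketch, assuming you supply the labels in their intended order; number_ordinal_values is a hypothetical helper, and a single-digit prefix only sorts correctly for up to nine labels.

```python
def number_ordinal_values(values, ordered_labels):
    """Replace each known label with '<rank> <label>'; leave other values unchanged."""
    numbered = {
        label: f"{rank} {label}"
        for rank, label in enumerate(ordered_labels, start=1)
    }
    return [numbered.get(value, value) for value in values]

scale = ["Very Poor", "Poor", "So/so", "Good", "Very Good"]
responses = ["Good", "Very Poor", "", "So/so"]
print(number_ordinal_values(responses, scale))
```

Values that are not in the scale, including blank entries, pass through untouched.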

Blank Answers (missing values)

When there is no response, it is best to leave that entry blank. SuAVE has an option to show or hide missing values.
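Some survey exports fill missing answers with a placeholder code instead of leaving them empty. The sketch below blanks such cells before upload. The placeholder strings are purely hypothetical examples and should be replaced with whatever your own export uses; legitimate answers such as “N/A” are deliberately not in the default list.

```python
def blank_out_placeholders(rows, placeholders=("NULL", "-999", "No response")):
    """Return a copy of rows (lists of strings) with placeholder cells replaced by ''."""
    return [
        ["" if cell.strip() in placeholders else cell for cell in row]
        for row in rows
    ]

rows = [["1 Good", "NULL"], ["-999", "N/A"]]
print(blank_out_placeholders(rows))
```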