Survey Formatting Guidelines

In version 1, SuAVE takes CSV files as input. Each row in the CSV file will represent a single respondent or a collection item, and column names will represent survey questions or dataset variables. Cells will represent individual survey responses to each question, or dataset values. The CSV file should not have any additional rows or columns beyond that. Survey questions (column names) and responses (values in each cell) must be properly formatted in order to be functional in the SuAVE user interface.
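The layout rule above (one header row, then one row per respondent or item, with no extra rows or columns) can be checked before upload. The following is a minimal sketch rather than part of SuAVE; the function name check_rectangular and the sample data are hypothetical.

```python
import csv
import io

def check_rectangular(csv_text):
    """Return a list of layout problems (an empty list means none were found)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    # Every column needs a name, because column names become variables.
    for i, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {i} has no name")
    # Every data row should have exactly as many cells as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} cells, expected {len(header)}"
            )
    return problems

# Hypothetical sample: the last row has one cell too many.
sample = "Gender,Age#number\nFemale,34\nMale,41,extra\n"
print(check_rectangular(sample))
```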

Note 1. These guidelines are not about designing surveys. A survey design tutorial can be found, for example, at http://www.statpac.com/surveys/ .

Note 3. If your dataset contains non-English characters, the CSV file must be encoded in UTF-8. If you manage the data in Excel, the simplest method is to copy the contents of your spreadsheet, paste them into Google Sheets (as values), and then use File - Download As - CSV to save a UTF-8 encoded file before importing it into SuAVE. More information is at http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding .
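As an alternative to the spreadsheet route, a file can be re-encoded with a short script. This is only a sketch: it assumes the original file was saved in Windows-1252 (cp1252), a common legacy encoding for Excel exports, which you would need to confirm for your own data. The function name to_utf8_bytes is hypothetical.

```python
def to_utf8_bytes(raw, source_encoding="cp1252"):
    """Decode bytes from the assumed source encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# Hypothetical sample standing in for the bytes of a legacy-encoded CSV file.
legacy = "Name,City\nJosé,Málaga\n".encode("cp1252")
converted = to_utf8_bytes(legacy)
print(converted.decode("utf-8"))
```

For a real file, read it with open(path, "rb"), pass the bytes through the function, and write the result with open(new_path, "wb").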

FORMATTING COLUMN NAMES (VARIABLES)

Example

Here is an example of how CSV column names translate to variables within SuAVE.

CSV: (example image not reproduced here)

SuAVE: (example image not reproduced here)

Column Type Qualifiers

Notice that in the CSV example above, a #number qualifier is appended to the popid column name. A list of supported qualifiers includes:

Qualifier	Usage	Example
#number	Values of a number variable will be shown as a histogram and a slider in the filter panel on the left	Latitude#number
#date	Values of a date variable will be shown to support filtering by dates, in the filter panel	Date of Birth#date
#long	Values of long variables, such as item descriptions or responses to open ended questions, will not be listed in the filter panel. Instead, the filter panel will include a search box for such variables at the top of the panel	Description#long; What are your suggestions#long
#link	In the information panel on the right, this variable will be shown as a link to an external resource	Company URL#link; Click to edit#link
#ordinal	Similar to #number; these values will be shown as a special type of histogram in the filter panel. The values are expected in the form “1 Strongly Agree”, “2 Agree”, “3 Neutral”, “4 Disagree”, etc. This will force values to be ordered in the main display.	Persons in household#ordinal (e.g., with responses “1 One”, “2 2-3”, “3 3-5”, “4 6-10”, “5 11-15”, “6 16 or more”)
#textlocation	Expects a well-formatted address that will be geocoded on the fly	Street Address#textlocation
#multi	Allows for multiple values in a single cell (e.g., in responses to “check all that apply” questions). Responses should be separated by the pipe symbol (“|”)	Tags#multi
#info	Description associated with an item, which will be shown at the top of the info panel, but not in the list of facets. There can be only one #info column.	Description#info
#hidden	This variable won’t appear in the filter panel or in the sorting dropdown list	Longitude#number#hidden
#hiddenmore	This variable won’t be displayed anywhere in the SuAVE interface. Typically used to hide polygon and line geometry from being shown in the info panel	geometry#hiddenmore
#sortquan	Values of the variable will be sorted by counts in the variable’s facet in the left panel	Country Name#sortquan
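
Since qualifiers are appended to the column name with “#” and can be chained (as in Longitude#number#hidden), a column name can be split into its base name and qualifiers to catch typos before upload. This is an illustrative sketch, not SuAVE's own parser; the function name is hypothetical, and it assumes the column name does not itself begin with “#”.

```python
# Qualifiers listed in the table above.
KNOWN_QUALIFIERS = {
    "number", "date", "long", "link", "ordinal", "textlocation",
    "multi", "info", "hidden", "hiddenmore", "sortquan",
}

def split_column_name(column):
    """Split 'Name#q1#q2' into (base name, qualifiers, unrecognized qualifiers)."""
    base, *qualifiers = column.split("#")
    unknown = [q for q in qualifiers if q not in KNOWN_QUALIFIERS]
    return base, qualifiers, unknown

print(split_column_name("Longitude#number#hidden"))
print(split_column_name("Age#numbr"))  # the misspelled qualifier is reported
```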

Reserved column names

Column name	Usage	Example
#name	The title of an item, as it will appear at the top of the information panel on the right. This can be defined through the authoring interface	#name
#href	URL to be invoked as user clicks on the title	#href
#img	Name of an image associated with a given record. If you define your own images to be used in SuAVE, make sure the image filenames only contain alphanumerics, the underbar, the hyphen, and the dot character. They should not include filename extensions.	#img
Latitude	If any part of a variable name contains Latitude, it will be treated as a geographic coordinate. The values are expected in decimal degrees.	north.latitude
Longitude	If any part of a variable name contains Longitude, it will be treated as a geographic coordinate. The values are expected in decimal degrees.	Longitude
geometry	Polygon or line geometry, in WKT format
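
Two of the rules in this table lend themselves to a quick pre-upload check: the allowed characters in image names, and the detection of coordinate columns by name. The sketch below is an assumption-laden illustration, not SuAVE code. In particular, it assumes the Latitude/Longitude match is case-insensitive (suggested by the north.latitude example, but not stated), and both function names are hypothetical.

```python
import re

# Alphanumerics, underscore, hyphen, and dot only, as described for #img.
IMG_NAME = re.compile(r"[A-Za-z0-9_.-]+")

def is_valid_img_name(name):
    """True if the name uses only the characters allowed for #img values."""
    return IMG_NAME.fullmatch(name) is not None

def coordinate_kind(column):
    """Return 'latitude', 'longitude', or None, based on the column name."""
    lowered = column.lower()  # assumption: matching ignores case
    if "latitude" in lowered:
        return "latitude"
    if "longitude" in lowered:
        return "longitude"
    return None

print(is_valid_img_name("item_01-a"), is_valid_img_name("my photo"))
print(coordinate_kind("north.latitude"), coordinate_kind("Gender"))
```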

Length of names

Variable names should fit on one line. Anything longer than 40-50 characters should be abbreviated as much as possible. While hovering over a variable name will reveal the full string, shortening names is the preferred strategy. For example, the variable name “What is respondent’s gender?” can simply be rewritten as “Gender”.

Common abbreviations (e.g., changing “not applicable” to “N/A”) and other creative ways of shortening variable names and values without sacrificing meaning are very helpful.
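
The length guideline can be applied mechanically to a list of column names. The following sketch flags names whose text before any “#” qualifier exceeds a limit; the limit of 50 is taken from the upper end of the range above, and the function name is hypothetical.

```python
def names_over_limit(columns, limit=50):
    """Return the column names whose base part (before any '#') exceeds limit."""
    return [c for c in columns if len(c.split("#")[0]) > limit]

columns = [
    "Gender",
    "Latitude#number",
    "1 If unemployed, reason for unemployment, respondent can't find work",
]
for name in names_over_limit(columns):
    print("Consider shortening:", name)
```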

Grouping

If the dataset has questions in groups (e.g., a group for demographics-related questions and a group for household items questions), the questions can be preceded by a number.

For example, the demographics questions could be preceded by a “1”, and questions about household items by a “2,” turning this:

Neighborhood
Gender
Has a refrigerator
Has a blender

into this:

1 Neighborhood
1 Gender
2 Has a refrigerator
2 Has a blender

The variable names should be descriptive enough to be interpretable from outside of their question group.
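
To review how prefixed names fall into groups, the leading number can be extracted from each column name. This is a small sketch under the assumption that a group prefix is one or more digits followed by whitespace; the function name is hypothetical.

```python
import re
from collections import defaultdict

def group_by_prefix(columns):
    """Map each numeric prefix (or None if absent) to its column names."""
    groups = defaultdict(list)
    for column in columns:
        match = re.match(r"(\d+)\s+", column)
        key = match.group(1) if match else None
        groups[key].append(column)
    return dict(groups)

columns = ["1 Neighborhood", "1 Gender", "2 Has a refrigerator", "2 Has a blender"]
for prefix, names in group_by_prefix(columns).items():
    print(prefix, names)
```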

Distinctions within Groups

The first 25-30 characters of every variable name should be distinct from other questions of the same question group.

For example, these three variable names all start with the same series of characters:

1 If unemployed, reason for unemployment, respondent can't find work
1 If unemployed, reason for unemployment, respondent does volunteer work
1 If unemployed, reason for unemployment, respondent is a student

When such variables are displayed in SuAVE, users will only see the first 25-30 characters followed by an ellipsis (...), and will need to mouse over the names to see the rest of the question. Selecting such questions from a dropdown would be problematic. Try to shorten variable names to 40-50 characters, as in the following:

1 CAN’T FIND WORK as reason for unemployment
1 VOLUNTEER WORK as reason for unemployment
1 STUDENT as reason for unemployment

Capitalization can also be used to stress key words in questions within a group.
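
Names that would look identical once truncated can be found by comparing their leading characters. The sketch below uses 25 characters, the lower end of the range mentioned above; the exact truncation point in SuAVE may differ, and the function name is hypothetical.

```python
from collections import defaultdict

def prefix_collisions(columns, visible=25):
    """Return {shared prefix: names} for names sharing their first `visible` characters."""
    by_prefix = defaultdict(list)
    for column in columns:
        by_prefix[column[:visible]].append(column)
    return {p: names for p, names in by_prefix.items() if len(names) > 1}

original = [
    "1 If unemployed, reason for unemployment, respondent can't find work",
    "1 If unemployed, reason for unemployment, respondent does volunteer work",
    "1 If unemployed, reason for unemployment, respondent is a student",
]
shortened = [
    "1 CAN'T FIND WORK as reason for unemployment",
    "1 VOLUNTEER WORK as reason for unemployment",
    "1 STUDENT as reason for unemployment",
]
print(prefix_collisions(original))   # all three names share one prefix
print(prefix_collisions(shortened))  # no collisions
```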

Multiple-response questions

When data are exported from common statistical packages, multiple-response questions (“check all that apply” type) are typically converted into a set of binary variables. It is often better to have them as a single variable that allows multiple responses. Use the #multi variable qualifier to show such questions in SuAVE. For example, a question “What flavors of ice cream do you prefer? Check all that apply” can be presented as a variable “Preferred ice cream#multi” with responses such as “vanilla|strawberry|chocolate”, where “vanilla”, “strawberry” and “chocolate” are the individual responses.
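
Collapsing a set of binary variables into one pipe-separated value might look like the sketch below. The binary column names, the use of “1” to mean “selected”, and the function name are all assumptions made for illustration; adjust them to match your export.

```python
def merge_binary_columns(row, option_columns, selected_value="1"):
    """Join the labels of all selected options with the pipe symbol."""
    chosen = [
        label
        for column, label in option_columns.items()
        if row.get(column) == selected_value
    ]
    return "|".join(chosen)

# Hypothetical binary columns mapped to the labels that should appear in SuAVE.
options = {
    "Likes vanilla": "vanilla",
    "Likes strawberry": "strawberry",
    "Likes chocolate": "chocolate",
}
row = {"Likes vanilla": "1", "Likes strawberry": "0", "Likes chocolate": "1"}
print(merge_binary_columns(row, options))  # value for "Preferred ice cream#multi"
```

A respondent who selected nothing gets an empty string, which leaves the cell blank.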

FORMATTING RESPONSES

Non-numeric Values

Non-numeric values should have text rather than a code. For example, “1 Strongly disagree” to “5 Strongly agree” is more helpful than “1” to “5.”

To provide another example, if you examine the responses to “Difficulty doing work/activities” and one of the values is “5,” this is not very meaningful on its own. Values that combine the numeral with a label, such as “1 No difficulty,” “2 Some difficulty,” and so on, would make more sense.

Ordinal values should be preceded by a numeral. Without numerals, entries that answer “How would you describe your health?” would be ordered alphabetically in SuAVE, like this:

Good
Poor
So/so
Very Good
Very Poor

Such a sequence is not intuitive to work with. Add preceding numbers, such as:

1 Very Poor
2 Poor
3 So/so
4 Good
5 Very Good

This makes bar charts and cross-tabs more meaningful for analysis.
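
Adding the numerals by hand is error-prone for large files, so the recoding can be scripted. This sketch assumes you supply the intended order yourself; the function name and sample responses are hypothetical.

```python
def numbered_labels(ordered_values):
    """Map each value to 'N value', numbering from 1 in the given order."""
    return {v: f"{i} {v}" for i, v in enumerate(ordered_values, start=1)}

# Intended order, from worst to best.
HEALTH_ORDER = ["Very Poor", "Poor", "So/so", "Good", "Very Good"]
labels = numbered_labels(HEALTH_ORDER)

responses = ["Good", "Very Poor", "So/so"]
recoded = [labels[r] for r in responses]
print(sorted(recoded))  # alphabetical order now matches the intended order
```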

Blank Answers (missing values)

When there is no response, it is best to leave that entry blank. SuAVE has an option to show or hide missing values.
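
Exports from statistical packages often mark missing answers with placeholder codes rather than blanks. The sketch below replaces such codes with an empty string; the codes listed are hypothetical examples and must be replaced with whatever your own dataset uses.

```python
# Hypothetical placeholder codes; check your own export for the real ones.
MISSING_CODES = {"-99", ".", "NULL"}

def blank_missing(value, missing_codes=MISSING_CODES):
    """Return an empty string for placeholder codes, otherwise the value unchanged."""
    return "" if value.strip() in missing_codes else value

row = ["Female", "-99", "3 So/so", " . "]
print([blank_missing(cell) for cell in row])
```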
