Survey Formatting Guidelines
In version 1, SuAVE takes CSV files as input. Each row in the CSV file will represent a single respondent or a collection item, and column names will represent survey questions or dataset variables. Cells will represent individual survey responses to each question, or dataset values. The CSV file should not have any additional rows or columns beyond that. Survey questions (column names) and responses (values in each cell) must be properly formatted in order to be functional in the SuAVE user interface.
Note 1. These guidelines do not cover survey design. A survey design tutorial can be found, for example, at http://www.statpac.com/surveys/.
Formatting Column Names (Variables)
Example
Column type qualifiers
Reserved column names
Length of names
Grouping
Distinctions within groups
Multiple response questions
Non-numeric Values
Blank Answers (missing values)
Note 2. Use safe characters for column names and responses: alphanumerics [0-9a-zA-Z] and the special characters "$-_.+!*'(),". If your dataset contains unsafe characters, the CSV file may still load but may not display correctly. Experiment, and reload the data as necessary.
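A quick pre-upload check for unsafe characters could look like the sketch below. It is not part of SuAVE; the function name is hypothetical, and allowing spaces and the "#" of type qualifiers is an assumption based on the column names used elsewhere in this guide.

```python
import re

# Safe set from Note 2: alphanumerics plus $-_.+!*'(),
# Assumption for this sketch: spaces and '#' (type qualifiers) are also
# accepted, since the examples in this guide use both.
SAFE_NAME = re.compile(r"[0-9a-zA-Z$\-_.+!*'(),# ]*")

def find_unsafe_names(names):
    """Return the names that contain characters outside the safe set."""
    return [name for name in names if not SAFE_NAME.fullmatch(name)]

columns = ["Latitude#number", "Date of Birth#date", "Café visits", "Income (USD)"]
print(find_unsafe_names(columns))  # only the name with the accented character
```

The same function can be applied to the response values of a column before loading the file.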
Note 3. If your dataset contains non-English characters, the CSV file must be encoded in UTF-8. If you manage the data in Excel, the simplest method is to copy the content of your spreadsheet, paste it into Google Sheets (as values), and then use File > Download As > CSV to save a UTF-8 encoded file before importing it into SuAVE. More information is at http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding .
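As an alternative to the Google Sheets route, a CSV file can be re-encoded with a few lines of Python. This is only a sketch: the function name is hypothetical, and the source encoding "cp1252" is an assumption (a common Excel default on Windows); use whatever encoding your file was actually saved in.

```python
import tempfile
from pathlib import Path

def reencode_to_utf8(src, dst, src_encoding="cp1252"):
    """Read src using src_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Demonstration with a temporary file standing in for an Excel export.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "survey_cp1252.csv"
    dst = Path(tmp) / "survey_utf8.csv"
    src.write_bytes("Name,City\nJosé,São Paulo\n".encode("cp1252"))
    reencode_to_utf8(src, dst)
    print(dst.read_text(encoding="utf-8"))
```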
Example
Here is an example of how CSV column names translate to variables within SuAVE.
CSV:
SuAVE:
Notice that in the CSV example above, a #number qualifier is appended to the popid column name. A list of supported qualifiers includes:
Qualifier | Usage | Example |
#number | Values of a number variable will be shown as a histogram and a slider in the filter panel on the left | Latitude#number |
#date | Values of a date variable will be shown to support filtering by dates, in the filter panel | Date of Birth#date |
#long | Values of long variables, such as item descriptions or responses to open-ended questions, will not be listed in the filter panel. Instead, the filter panel will include a search box for such variables at the top of the panel | Description#long |
#link | In the information panel on the right, this variable will be shown as a link to an external resource | Company URL#link |
#ordinal | Similar to #number; these values will be shown as a special type of histogram in the filter panel. The values are expected in the form “1 Strongly Agree”, “2 Agree”, “3 Neutral”, “4 Disagree”, etc. This will force values to be ordered in the main display. | Persons in household#ordinal (e.g. with responses “1 One”, “2 2-3”, “3 3-5”, “4 6-10”, “5 11-15”, “6 16 or more”) |
#textlocation | Expects a well-formatted address that will be geocoded on the fly | Street Address#textlocation |
#multi | Allows for multiple values in a single cell (e.g. in responses to “check all that apply” questions). Responses must be separated by the pipe symbol (“|”) | Tags#multi |
#info | Description associated with an item, which will be shown at the top of the info panel, but not in the list of facets. There can be only one #info column. | Description#info |
#hidden | This variable won’t appear in the filter panel or in the sorting dropdown list | Longitude#number#hidden |
#hiddenmore | This variable won’t be displayed anywhere in the SuAVE interface. Typically used to hide polygon and line geometry from being shown in the info panel | geometry#hiddenmore |
#sortquan | Values of the variable will be sorted by counts in the variable’s facet in the left panel | Country Name#sortquan |
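To see how qualifiers attach to a column name, the sketch below splits a name such as Longitude#number#hidden into its base name and qualifiers, and splits a #multi cell on the pipe symbol. Both helper names are hypothetical and this is not SuAVE's own parser; trimming whitespace around #multi values is an assumption.

```python
def split_qualifiers(column_name):
    """Split 'Longitude#number#hidden' into ('Longitude', ['number', 'hidden'])."""
    base, *qualifiers = column_name.split("#")
    return base, qualifiers

def split_multi(cell):
    """Split a #multi cell on '|' and drop empty parts.

    Assumption: whitespace around each value is not significant.
    """
    return [part.strip() for part in cell.split("|") if part.strip()]

print(split_qualifiers("Longitude#number#hidden"))
print(split_qualifiers("Gender"))
print(split_multi("Radio|Television| Internet"))
```

A name that starts with "#", such as #img, comes back with an empty base name, which makes such columns easy to tell apart from qualified ones.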
Column name | Usage | Example |
#name | The title of an item, as it will appear at the top of the information panel on the right. This can be defined through the authoring interface | #name |
#href | URL to be invoked as user clicks on the title | #href |
#img | Name of an image associated with a given record. If you define your own images to be used in SuAVE, make sure the image filenames contain only alphanumerics, underscores, hyphens, and dots. They should not include filename extensions. | #img |
Latitude | If any part of a variable name contains Latitude, it will be treated as a geographic coordinate. The values are expected in decimal degrees. | north.latitude |
Longitude | If any part of a variable name contains Longitude, it will be treated as a geographic coordinate. The values are expected in decimal degrees. | Longitude |
geometry | Polygon or line geometry, in WKT format |
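Because the Latitude and Longitude rules match on any part of a name, it can be useful to list which columns of a dataset would be picked up as coordinates. The sketch below assumes the match is case-insensitive (the north.latitude example above suggests this, but it is not stated); the function name is hypothetical.

```python
def coordinate_columns(names):
    """Return (latitude-like, longitude-like) column names.

    Assumption: the substring match ignores case.
    """
    latitudes = [name for name in names if "latitude" in name.lower()]
    longitudes = [name for name in names if "longitude" in name.lower()]
    return latitudes, longitudes

columns = ["north.latitude", "Longitude#number#hidden", "Gender"]
print(coordinate_columns(columns))
```

A column that is unintentionally matched can then be renamed before the file is loaded.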
Variable names should fit on one line. Anything longer than 40-50 characters should be abbreviated as much as possible. While hovering over a variable name will reveal the full string, shortening names is the preferred strategy. For example, the variable name “What is respondent’s gender?” can simply be rewritten as “Gender”.
Common abbreviations (e.g. changing “not applicable” to “N/A”) and other creative ways of shortening variable names and values without sacrificing meaning are very helpful.
If the dataset has questions in groups (e.g. a group of demographics-related questions and a group of household-item questions), the questions can be preceded by a number.
For example, the demographics questions could be preceded by a “1”, and questions about household items by a “2,” turning this:
Neighborhood
Gender
Has a refrigerator
Has a blender
into this:
1 Neighborhood
1 Gender
2 Has a refrigerator
2 Has a blender
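The renaming above can be scripted when a dataset has many columns. A minimal sketch, assuming the questions are already arranged as one list per group (the function name is hypothetical):

```python
def add_group_prefixes(groups):
    """Prefix every name with the 1-based number of the group it belongs to."""
    return [
        f"{number} {name}"
        for number, names in enumerate(groups, start=1)
        for name in names
    ]

groups = [
    ["Neighborhood", "Gender"],                 # demographics
    ["Has a refrigerator", "Has a blender"],    # household items
]
for name in add_group_prefixes(groups):
    print(name)
```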
The variable names should be descriptive enough to be interpretable from outside of their question group.
The first 25-30 characters of every variable name should be distinct from other questions of the same question group.
For example, these three variable names all start with the same series of characters:
1 If unemployed, reason for unemployment, respondent can't find work
1 If unemployed, reason for unemployment, respondent does volunteer work
1 If unemployed, reason for unemployment, respondent is a student
When such variables are displayed in SuAVE, users will only see the first 25-30 characters followed by an ellipsis (...), and will need to mouse over the names to see the rest of the question. Selecting such questions from a dropdown would be problematic. Try to shorten variable names to 40-50 characters, as in the following:
1 CAN’T FIND WORK as reason for unemployment
1 VOLUNTEER WORK as reason for unemployment
1 STUDENT as reason for unemployment
Capitalization can also be used to stress key words in questions within a group.
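Both rules in this section (distinct leading characters, overall length of 40-50 characters) can be checked before upload. The sketch below uses 25 and 50 characters as thresholds, taken from the ranges given above; the function names are hypothetical.

```python
from collections import defaultdict

def find_prefix_clashes(names, prefix_length=25):
    """Return groups of names whose first prefix_length characters are identical."""
    by_prefix = defaultdict(list)
    for name in names:
        by_prefix[name[:prefix_length]].append(name)
    return [group for group in by_prefix.values() if len(group) > 1]

def find_long_names(names, max_length=50):
    """Return the names that are longer than max_length characters."""
    return [name for name in names if len(name) > max_length]

original = [
    "1 If unemployed, reason for unemployment, respondent can't find work",
    "1 If unemployed, reason for unemployment, respondent does volunteer work",
    "1 If unemployed, reason for unemployment, respondent is a student",
]
shortened = [
    "1 CAN'T FIND WORK as reason for unemployment",
    "1 VOLUNTEER WORK as reason for unemployment",
    "1 STUDENT as reason for unemployment",
]
print(len(find_prefix_clashes(original)), len(find_long_names(original)))
print(len(find_prefix_clashes(shortened)), len(find_long_names(shortened)))
```

The original names form one clashing group and all exceed the length limit; the shortened names pass both checks.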
Non-numeric values should have text rather than a code. For example, “1 Strongly disagree” to “5 Strongly agree” is more helpful than “1” to “5.”
As another example, if you examine the responses to “Difficulty doing work/activities” and one of the values is “5,” the value alone is not very meaningful. Labeled values such as “1 No difficulty,” “2 Some difficulty,” and so on are easier to interpret.
Ordinal values should be preceded by a numeral. For example, entries that answer “How would you describe your health?” would be organized alphabetically in SuAVE like this:
Good
Poor
So/so
Very Good
Very Poor
Such a sequence is not intuitive to work with. Add preceding numbers, such as:
1 Very Poor
2 Poor
3 So/so
4 Good
5 Very Good
This makes bar charts and cross-tabs more meaningful for analysis.
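If the responses are stored without numerals, they can be numbered in bulk. The sketch below assumes you supply the intended order yourself; the names HEALTH_ORDER and number_ordinal_values are hypothetical. Values that are not in the order list, including blank answers, are left unchanged.

```python
HEALTH_ORDER = ["Very Poor", "Poor", "So/so", "Good", "Very Good"]

def number_ordinal_values(values, order):
    """Prefix each known value with its 1-based rank in order."""
    labels = {value: f"{rank} {value}" for rank, value in enumerate(order, start=1)}
    # Unknown values (for example blank answers) pass through untouched.
    return [labels.get(value, value) for value in values]

responses = ["Good", "Very Poor", "", "So/so"]
print(number_ordinal_values(responses, HEALTH_ORDER))
```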
When there is no response, it is best to leave that entry blank. SuAVE has an option to show or hide missing values.