[thelist] Survey answers schema

Luther, Ron Ron.Luther at hp.com
Thu Jul 13 10:01:18 CDT 2006


Bill Moseley asked about coding survey answers:


>>Another database question.  For a survey I have a table that holds
>>answers.  But a question can generate different types of answers --
>>such as free text, an enum list of options ("red", "blue"), or an
>>integer (1..10).

>>So, I'm wondering how best to handle that in the database.
>>I'm not very fond of the multi-columns.  Is there a better method?


Hi Bill,


Sorry for the late response.  (I used to do quite a bit of custom survey
design and analysis.)

As far as I can recall, we always used multi-columns.  (1) I think it
simplifies multivariate analysis where you may be trying to group or
categorize respondents based on answers to 5 or 6 questions
simultaneously.  (2) I think it simplifies variance calculations if you
are using more complicated stratified or cluster sampling strategies.
(3) Since I would generally farm out the task of keypunching the survey
results, multi-columns gave me an easy way to eyeball the data and
cross-check for completeness and gross-level errors.
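
To make point (1) and point (3) a bit more concrete, here is a minimal
sketch of a multi-column layout, assuming SQLite and a made-up table.
The table name, column names and sample rows are all hypothetical and
are not from any real survey; it is only meant to show the shape of
the idea.

```python
import sqlite3

# Hypothetical wide layout: one row per respondent, one column per question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE survey_response (
        respondent_id INTEGER PRIMARY KEY,
        q1_color      INTEGER,  -- coded enum answer
        q2_rating     INTEGER,  -- 1..10 scale
        q3_would_buy  INTEGER   -- 0 = no, 1 = yes
    )
    """
)
rows = [
    (1, 1, 8, 1),
    (2, 2, 3, 0),
    (3, 1, 9, 1),
    (4, 1, 2, 0),
    (5, 2, 7, None),  # question 3 left blank
]
conn.executemany("INSERT INTO survey_response VALUES (?, ?, ?, ?)", rows)

# Group respondents on several questions at once with a plain GROUP BY.
segments = conn.execute(
    """
    SELECT q1_color, q3_would_buy, COUNT(*) AS n, AVG(q2_rating) AS mean_rating
    FROM survey_response
    WHERE q3_would_buy IS NOT NULL
    GROUP BY q1_color, q3_would_buy
    ORDER BY q1_color, q3_would_buy
    """
).fetchall()

# Completeness check: respondents with any unanswered question.
incomplete = [
    r[0]
    for r in conn.execute(
        "SELECT respondent_id FROM survey_response "
        "WHERE q1_color IS NULL OR q2_rating IS NULL OR q3_would_buy IS NULL"
    )
]
```

With one column per question, both the grouping and the completeness
check stay single-table queries with no joins or pivots.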


We also spent a fair amount of time on ETL.  Just because the survey
answers to question #4 are "red" or "blue" doesn't necessarily mean
that is what you want entered in the db.  You may prefer to have "red"
entered as "1" and "blue" as "2" since the numerics may be easier for
analytics.

[You just need to be a little careful here and not let marketing types
transform responses of "I wouldn't use your new product to poke a dead
badger in the behind!" to "Why yes I would be willing to pay $1000 for
the new product."]  ;-)
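
As a rough illustration of that coding step, something like the
following could work.  The codebook name and the encode_answer helper
are hypothetical; only the red = 1, blue = 2 mapping comes from the
example above.  The sketch deliberately refuses to guess when an answer
isn't in the codebook, so nothing gets quietly "transformed".

```python
# Hypothetical codebook for question 4; the codes 1 and 2 follow the example above.
Q4_CODES = {"red": 1, "blue": 2}


def encode_answer(raw, codebook):
    """Return the numeric code for a raw answer, or None if it was left blank."""
    cleaned = raw.strip().lower()
    if not cleaned:
        return None
    if cleaned not in codebook:
        # Refuse to guess: an unexpected answer goes back for human review.
        raise ValueError(f"no code defined for answer {raw!r}")
    return codebook[cleaned]


coded = [encode_answer(a, Q4_CODES) for a in ["red", " Blue ", ""]]
```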


In general, I never saw much use for entering raw responses to
open-ended questions in the db.  Computationally it's pretty hard to
determine that two respondents' answers of "Because your product costs
too damn much" and "Because your product is too bloody expensive" should
be categorized together.  However, having a person read through the
open-ended responses and sort them into piles of 'similar responses'
will very quickly group these two together.  I would generally present
results off these summarized categories of open-ended responses rather
than listing all of the individual raw responses.
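
A small sketch of what that might look like once a person has done the
sorting.  The respondent ids, the category labels and the summarize
helper are all made up for illustration.  The categories are assigned
by hand; the code only tallies them and checks that nothing was left
unsorted.

```python
from collections import Counter

# Hypothetical raw answers, keyed by respondent id.
raw_responses = {
    101: "Because your product costs too damn much",
    102: "Because your product is too bloody expensive",
    103: "I already own something similar",
}
# Hypothetical category labels assigned by a human reader.
assigned_category = {101: "price", 102: "price", 103: "already_owns"}


def summarize(assigned):
    """Count respondents per human-assigned category, largest first."""
    return Counter(assigned.values()).most_common()


summary = summarize(assigned_category)
# Sanity check: list any respondent whose answer has no category yet.
uncoded = sorted(set(raw_responses) - set(assigned_category))
```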


HTH,

RonL.


