[thelist] Survey answers schema
Luther, Ron
Ron.Luther at hp.com
Thu Jul 13 10:01:18 CDT 2006
Bill Moseley asked about coding survey answers:
>>Another database question. For a survey I have a table that holds
>>answers. But a question can generate different types of answers --
>>such as free text, an enum list of options ("red", "blue"), or an
>>integer (1..10).
>>So, I'm wondering how best to handle that in the database.
>>I'm not very fond of the multi-columns. Is there a better method?
Hi Bill,
Sorry for the late response. (I used to do quite a bit of custom survey
design and analysis.)
As far as I can recall, we always used multi-columns. (1) I think it
simplifies multivariate analysis, where you may be trying to group or
categorize respondents based on answers to 5 or 6 questions
simultaneously. (2) I think it simplifies variance calculations if you
are using more complicated stratified or cluster sampling strategies.
(3) Since I would generally farm out the task of keypunching the survey
results, multi-columns gave me an easy way to eyeball the data and
cross-check for completeness and gross-level errors.
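
To make points (1) and (3) concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The table name, column names, and
sample rows are all made up for illustration; they are not from any
real survey.

```python
import sqlite3

# Hypothetical wide ("multi-column") layout: one row per respondent,
# one column per question. All names and data here are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE survey_response (
        respondent_id INTEGER PRIMARY KEY,
        q1_color      INTEGER,  -- coded enum answer
        q2_rating     INTEGER,  -- integer answer, 1..10
        q3_comment    TEXT      -- free-text answer
    )
    """
)
conn.executemany(
    "INSERT INTO survey_response VALUES (?, ?, ?, ?)",
    [
        (1, 1, 8, "too expensive"),
        (2, 2, 3, None),
        (3, 1, 8, None),
        (4, 1, 2, "love it"),
    ],
)

# Grouping respondents on several questions at once is a plain GROUP BY.
segments = conn.execute(
    """
    SELECT q1_color, q2_rating, COUNT(*)
    FROM survey_response
    GROUP BY q1_color, q2_rating
    ORDER BY q1_color, q2_rating
    """
).fetchall()

# A completeness check is just as direct: list rows missing an answer.
incomplete = conn.execute(
    "SELECT respondent_id FROM survey_response "
    "WHERE q3_comment IS NULL ORDER BY respondent_id"
).fetchall()

print(segments)
print(incomplete)
```

With one row per respondent, both the cross-question grouping and the
eyeball check stay single-table queries with no joins or pivots.
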
We also spent a fair amount of time on ETL. Just because the survey
answers to question #4 are "red" or "blue" doesn't necessarily mean
that is what you want entered in the db. You may prefer to have "red"
entered as "1" and "blue" as "2", since the numerics may be easier for
analytics.
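
As a rough sketch of that coding step, assuming a simple lookup table
is enough: the codes 1 and 2 follow the example above, while the names
Q4_CODES and encode_q4 and the sample answers are hypothetical.

```python
# Hypothetical codebook for question 4. The codes 1 and 2 follow the
# example in the text; the names and sample data are invented.
Q4_CODES = {"red": 1, "blue": 2}


def encode_q4(raw):
    """Return the numeric code for a raw answer, or None if unrecognized."""
    if raw is None:
        return None
    # Normalize whitespace and case before looking up the code.
    return Q4_CODES.get(raw.strip().lower())


raw_answers = ["red", " Blue ", "green", None]
coded = [encode_q4(answer) for answer in raw_answers]
print(coded)
```

Anything not in the codebook comes back as None instead of being
forced into a nearby category, so unexpected answers can be reviewed
by hand.
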
[You just need to be a little careful here and not let marketing types
transform responses of "I wouldn't use your new product to poke a dead
badger in the behind!" to "Why yes I would be willing to pay $1000 for
the new product."] ;-)
In general, I never saw much use for entering raw responses to
open-ended questions in the db. Computationally it's pretty hard to
determine that two respondents' answers of "Because your product costs
too damn much" and "Because your product is too bloody expensive"
should be categorized together. However, having a person read through
the open-ended responses and sort them into piles of 'similar
responses' will very quickly group these two together.
I would generally present results from these summarized categories of
open-ended responses rather than listing all of the individual raw
responses.
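
A small sketch of that last step, assuming a person has already sorted
the responses into piles. The response ids, category labels, and counts
are invented for illustration.

```python
from collections import Counter

# Categories assigned by a human reader, keyed by a hypothetical
# response id. Labels and ids are invented for illustration.
manual_category = {
    101: "price",    # "Because your product costs too damn much"
    102: "price",    # "Because your product is too bloody expensive"
    103: "no need",
    104: "price",
    105: "no need",
}

# Tally the piles and list them from largest to smallest.
summary = Counter(manual_category.values())
report = summary.most_common()
print(report)
```

Only the category assigned by the reader needs to be stored and
reported; the raw text can stay on the paper forms.
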
HTH,
RonL.