Wednesday, October 1, 2008

Data Analysis in practice

I'm not sure if you have ever felt the same way: you have been working with a professor for so long that you think you know everything s/he knows, but every time s/he impresses you with something new. This is how I feel about my advisor--DAS.

He offered a data-analysis course this semester, which covers a wide range of topics, from pre-analysis data cleaning and index-building to various analytical approaches, using the statistical software SPSS. Since I have been working with data for years, I assumed this class wouldn't be a big challenge for me. I figured that even if I learned just one new thing in each class, that wouldn't be too bad. However, what I got turned out to be more than I expected. (I hope this doesn't mean I knew little before taking this class.)

The first class I sat in on was about building a composite index--for example, a "newspaper use" variable. I know exactly the technical process of doing this. But why should we build a multiple-item index? I was once questioned by a journal reviewer about my "talk" variable, which was based on a single survey question. What is wrong with a single-item measure?

Basically, this at least partially has to do with the idea of systematic error, which refers to missing data caused by mechanisms that do not affect everyone equally. For example, when people are asked about their income level, those who are wealthy tend to be more conscious of and sensitive to the question and have a higher chance of skipping it. This is called a systematic error because it happens not to everybody, but only to those who don't feel comfortable answering the question (usually the wealthier ones).

Does setting up a yard sign, displaying a bumper sticker, or donating money each constitute a valid measure of political participation? The answer might be negative. Those who put up a yard sign need to have a lawn in the first place. By the same token, for people to display bumper stickers, they need to have a car. In addition, those who can donate are usually economically better off. All of these measures favor respondents with higher socioeconomic status (SES), so none of them individually can be considered a comprehensive measure of participation. That's why researchers usually combine all of these variables to form a composite index that gives a clearer picture of the concept of "participation."
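The simplest version of this combination is just an additive index: each item is coded 0/1 and the items are summed. The sketch below is my own illustration, not anything from the course; the item names and the example respondents are made up.

```python
# A minimal sketch of an additive composite index for political
# participation. Each item is a hypothetical survey question coded
# 0 (did not do) or 1 (did do); the index is their sum, ranging 0-3.

def participation_index(yard_sign, bumper_sticker, donated):
    """Sum three binary participation items into a 0-3 composite score."""
    items = [yard_sign, bumper_sticker, donated]
    if not all(v in (0, 1) for v in items):
        raise ValueError("items must be coded 0/1")
    return sum(items)

# Two made-up respondents: one who did two activities, one who did none.
respondents = [
    {"yard_sign": 1, "bumper_sticker": 0, "donated": 1},
    {"yard_sign": 0, "bumper_sticker": 0, "donated": 0},
]
scores = [participation_index(**r) for r in respondents]
print(scores)  # [2, 0]
```

Because each item undercounts a different group (no lawn, no car, no disposable income), the sum washes out any single item's bias better than any one item alone.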

Another source of systematic error comes from the tendency of some people to give "socially desirable" answers--people are prone to giving answers that society approves of. Large survey institutions employ several strategies to deal with this. For example, the GSS matches interviewers and interviewees on gender and race in order to solicit valid answers. One approach I found very interesting is to ask respondents about a "hypothetical policy proposal" that does not exist at all. Respondents who say they have heard of it are the ones who tend to provide socially desirable answers.
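The fictitious-policy screen is easy to operationalize once the data are coded: flag anyone who claims familiarity with the nonexistent proposal. This is my own hedged sketch, not an actual survey institution's procedure, and the variable name and data are invented for illustration.

```python
# Hedged sketch: flag respondents who claim to have heard of a
# fictitious policy proposal. "heard_of_act_x" is a made-up variable
# name for the fake-policy question, coded 1 = "yes, I've heard of it".

FAKE_POLICY_ITEM = "heard_of_act_x"

def flag_social_desirability(responses):
    """Return the indices of respondents who claim familiarity
    with the nonexistent policy (likely socially desirable responders)."""
    return [i for i, r in enumerate(responses)
            if r.get(FAKE_POLICY_ITEM) == 1]

# Made-up responses: the second respondent "knows" a policy that doesn't exist.
data = [
    {"heard_of_act_x": 0},
    {"heard_of_act_x": 1},
    {"heard_of_act_x": 0},
]
print(flag_social_desirability(data))  # [1]
```

In practice the flag could be used to drop those cases, or to include it as a control variable when modeling attitude questions that invite socially desirable answers.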
