Analysing statistics just isn’t that easy
Having a solid statistical and scientific background, I often find myself frustrated by research and data analysis in User Experience Metrics, Conversion Optimization and Google Analytics. In my opinion, doing research and analysing statistics requires proper training and an understanding of what you are doing. Am I the only one?
You need a brain to do statistics!
Just last week, I resisted several urges to throw ‘Measuring the user experience’ by Tom Tullis and Bill Albert against the wall. After reading the whole book, I found it to be a brave attempt to explain statistics, but also a total over-simplification of doing research. In my view, such simplification seriously undermines the reliability of results.
The common message in the online research community appears to be that research and statistics are easy and can be done by everyone. True, all kinds of packages like Google Analytics or Convert make doing statistics that much easier. But still… you really need a brain to do it!
Research methodology and simple descriptive statistics are not easy. In my first year of university, three quarters of the students failed their first (mainly descriptive) statistics exam. My years of teaching also made clear that mathematics and statistics are the most challenging subjects. It is hard! Doing research and analysing data without proper knowledge of both research designs and statistics can lead to serious misinterpretations of the results. I will discuss 3 pitfalls:
1. Doing statistics with small amounts of data
I am not going to argue that statistical analysis with fewer than 30 observations is impossible, because there are tests (Student's t-test, for example) specifically designed for doing just that. Still, one should be aware that small samples have limited power. This means that a difference between two small samples will only be significant if it is obvious and large. For instance, if the old design of your checkout page had an average conversion rate of 2% and the new design has a conversion rate of 20%, then the difference can appear significant even with 20 observations. But usually differences aren’t that obvious. Small differences or nuances cannot be tested with small samples.
More importantly, I really wonder whether you should do statistical analyses with a very small sample at all. I would always advise a qualitative approach if you have a sample of 15 individuals or fewer. In a qualitative research design you gather an in-depth understanding of human behaviour. Ask open questions and try to discover why visitors of your website buy your products, (dis)like your design or read your posts. Analysing these answers (in a non-statistical manner) will be of great value in increasing the conversion of your site. ‘Measuring the user experience’ actually gives a nice introduction to a more qualitative approach to user experience research.
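To put a rough number on "limited power", here is a minimal Python sketch. It rests on assumptions that are not part of this post: two groups of equal size, a difference expressed as an effect size (difference in means divided by a common standard deviation), a significance level of 0.05, and a normal approximation instead of an exact t-test. The function name `approx_power` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    effect_size is the difference between the group means divided by the
    (assumed common) standard deviation. The normal approximation used here
    is slightly optimistic for very small samples compared with a real t-test.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * sqrt(n_per_group / 2)
    # Probability that the test statistic lands beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for d in (0.2, 0.5, 1.5):
    for n in (15, 1000):
        print(f"effect size {d}, n = {n} per group: power ~ {approx_power(d, n):.2f}")
```

Under these assumptions, 15 observations per group will very likely pick up a huge difference (1.5 standard deviations), but a small difference (0.2 standard deviations) is detected in well under one in five studies. With 1000 observations per group, the same small difference is detected almost every time.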
2. Representative sample
Just as important as the sample size is the question of whether the sample is representative. Does the sample of individuals you study resemble the total population? An example:
If we were to do a User Experience study of Yoast.com and asked totally random people to visit our website, the sample would not be representative. No offence, but to visit the Yoast website, you have to be some kind of nerd. You can imagine that the User Experience of random people will probably differ greatly from that of nerds. A representative sample of our population would thus be a random sample of nerds. We would need nerds from all over the world, because our readers from the US probably differ from the ones we have in Europe, India or Australia. And maybe, because of a recent growth in our reader population, our current population also includes some non-nerds. We should definitely take into account the nerdiness of the individuals in our sample. Making a representative sample is hard, all the more so if you do not know exactly what your population looks like. Taking a large random sample takes care of most of these issues. But: especially with small samples, it is hard to make sure your sample is representative. And: a non-representative sample leads to non-representative (and thus worthless) results.
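One way to check how far a sample drifts from the population is to compare group shares. The sketch below is a rough illustration only: the regional shares in `POPULATION_SHARES` are invented numbers (not real Yoast reader statistics), and the helper names are hypothetical. It draws simple random samples of different sizes and reports the largest gap between the sample's regional mix and the population's.

```python
import random
from collections import Counter

# Hypothetical population shares per region (assumed for illustration only)
POPULATION_SHARES = {"US": 0.40, "Europe": 0.35, "India": 0.15, "Australia": 0.10}


def sample_shares(sample):
    """Return the share of each region in a sample of region labels."""
    counts = Counter(sample)
    return {region: counts[region] / len(sample) for region in POPULATION_SHARES}


def largest_gap(sample):
    """Largest absolute difference between sample and population shares."""
    shares = sample_shares(sample)
    return max(abs(shares[r] - POPULATION_SHARES[r]) for r in POPULATION_SHARES)


def draw_random_sample(size, rng):
    """Draw a simple random sample of region labels from the population."""
    regions = list(POPULATION_SHARES)
    weights = list(POPULATION_SHARES.values())
    return rng.choices(regions, weights=weights, k=size)


rng = random.Random(42)  # fixed seed so the run is repeatable
for size in (15, 150, 15000):
    gap = largest_gap(draw_random_sample(size, rng))
    print(f"sample of {size}: largest gap in regional share = {gap:.3f}")
```

Large random samples tend to land close to the population mix, while a sample of 15 can easily over- or under-represent a region by a wide margin. Note that this only checks one characteristic you already know about; it says nothing about characteristics you did not measure, such as nerdiness.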
Validity & Reliability
The validity of a measurement tool (for example a question in a survey) tells us the degree to which the tool actually measures what it claims to measure. Sometimes it is referred to as accuracy.
Reliability is the extent to which a measurement gives consistent results. So, if you pose the same question to the same person twice, will the answers be the same? A reliable measurement tool gives the same answers over and over again.
Difference between reliability and validity:
Imagine a person of 200 pounds stepping on a scale 5 times and getting readings of 15, 250, 95, 140 and 500 pounds. This scale is not reliable: the reading is different every time. If the scale consistently reads 150 pounds, the scale is reliable, because the readings are the same. However, the scale is not valid. The reading is wrong. It does not measure what you want to measure.
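The scale example can be put into numbers with a small Python sketch. It uses two simple stand-ins that are my own choice, not a formal definition: the standard deviation of the readings as a sign of (un)reliability, and the average reading minus the true weight as a sign of (in)validity. The function names `spread` and `bias` are made up for this illustration.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 200  # pounds, from the example above

erratic_scale = [15, 250, 95, 140, 500]       # readings differ every time
consistent_scale = [150, 150, 150, 150, 150]  # same reading every time


def spread(readings):
    """Sample standard deviation: low spread suggests a reliable instrument."""
    return stdev(readings)


def bias(readings, true_value):
    """Average reading minus the true value: far from zero means not valid."""
    return mean(readings) - true_value


for name, readings in (("erratic", erratic_scale), ("consistent", consistent_scale)):
    print(f"{name}: spread = {spread(readings):.1f}, bias = {bias(readings, TRUE_WEIGHT):+.1f}")
```

The erratic readings happen to average exactly 200 pounds, so that scale is right on average, yet with a spread of roughly 188 pounds any single reading is useless. The consistent scale has no spread at all but is off by 50 pounds every single time: reliable, but not valid.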
3. Validity: GIGO
Website analytics is awesome because a lot of measuring is very easy. You can just count the number of visitors on your page and the number of clicks on a button. Attitudes towards your brand and self-reported issues with usability are much more difficult to measure though. If you want to measure these kinds of things, you could do a qualitative study with a small sample. But a quantitative design with a larger number of individuals is also possible. Possible, but also challenging and difficult! Drafting the questions for a survey (especially one with a limited set of answer options) is difficult and requires proper testing. You should make sure that your questions really measure what you want to know. Measuring what you want to measure is what we call the validity of your measurements. An example:
You want to measure the extent to which people like the design of your website. You ask whether they like the colour. The answers to this question do say something about the degree to which people like your website. But design is more than colour. You would probably need more questions to really capture the degree to which people like the design of your website.
If the questions you present to people are of bad quality, the data will be of bad quality as well. So remember GIGO: Garbage In, Garbage Out!
Interpreting invalid data (however sophisticated the statistical analyses you apply) will always lead to invalid results.
Research is definitely a very powerful tool. But I think you should have some statistical and methodological background in order to interpret results and carry out proper analyses. You need to take the time to really understand what you are doing.
In this post, I have only discussed very basic methodological and statistical topics. If this is out of your league, you should definitely brush up on your statistical knowledge (only if you want to do research, otherwise please do something more fun).
That being said, I do understand the appeal of simple statistical techniques that are available to a broad public. Testing is a beautiful tool to improve your website! In the future, I expect research to become more and more important for website owners.
This is why we are currently brainstorming at Yoast about designing a tool or a service that will help people interpret test results and statistics. We will keep you posted about developments in this new project!