So, you're playing by the book. You've talked to customers, run some tests, and are ready to start building your product. Hold on! There's a chance that your results are totally random!
That's why statistics always measures how likely it is that results like yours would show up by pure chance. This measure is called the p-value, and it is calculated against the "null" hypothesis (Hypothesis 0), the assumption that nothing but chance is at work. Basically, p is a gorilla factor.
There are 2 factors that influence p:
Sample size (how many people you have talked to)
The actual results you're getting.
For instance, you've talked to 10 customers and 57% of them tell you that they have looked up solutions to their problem and are paying to solve it right now. Meaning, roughly 6 people are flagging that you have a business here. But what if you took another 6 random people from the pool of your potential customers? Would they have said that they have never looked up anything and that they have no pain to resolve at all?
When you do your research you get noisy data. What is noisy data? Let's take 10 people and ask each of them to flip a coin 10 times. Every flip has a 50/50 chance of landing heads or tails. The problem is that if you ask every one of them, they will not all have 5 heads and 5 tails. One participant would have 2 heads and 8 tails. Another would have 7 tails and 3 heads. The third, perhaps, would break a record and have 10 tails in a row. These numbers are your noisy data. But altogether the split will be somewhere around 50/50. And this is the underlying probability, your trend.
By looking at noisy data you are usually able to figure out the underlying probability: you go from right to left to see the trend. But it is the trend (from left to right) that shapes the data you see.
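If you'd like to see this noise for yourself, here is a minimal Python sketch of the coin experiment. The function name `flip_coins` and the seed are made up for illustration; the only assumption is a fair coin.

```python
import random

def flip_coins(people=10, flips=10, seed=None):
    """Return the number of heads each person gets (fair coin assumed)."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(flips)) for _ in range(people)]

heads = flip_coins(seed=42)  # the seed only makes the run repeatable
for person, h in enumerate(heads, start=1):
    print(f"Person {person}: {h} heads, {10 - h} tails")

# Individual results jump around, the overall share stays near 50%
overall = sum(heads) / (10 * 10)
print(f"Overall share of heads: {overall:.0%}")
```

Run it a few times with different seeds: the per-person numbers change every time, while the overall share keeps hovering around the middle.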
To evaluate your chance of seeing totally random results you do what is called "null hypothesis testing". How is it done?
Hypothesis 0 = results are totally random
Let's assume it's true.
P-value: what's the probability of seeing something at least as conclusive as what I actually observed? In other words, what if another 6 randomly picked customers told you the opposite story?
Here you have to use the starting point, the golden middle that represents total randomness. Why is it 50%? Because answering a YES or NO question at random is like flipping a coin. Take 100 people and ask them to answer your question YES or NO. Totally randomly, without thinking. Guess what? You'll get about 50% of YES answers and 50% of NO answers.
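For the curious, the steps above can be sketched in a few lines of Python. This is an exact one-sided binomial calculation, one common way to get a p-value for yes/no answers and not necessarily the method used in this article; the function name `p_value_at_least` is made up for illustration, and "totally random" is assumed to mean a 50% chance of YES.

```python
from math import comb

def p_value_at_least(successes, n, p_random=0.5):
    """Chance of seeing `successes` or more YES answers out of `n`
    if every answer were a coin flip (Hypothesis 0 is true)."""
    return sum(
        comb(n, k) * p_random**k * (1 - p_random) ** (n - k)
        for k in range(successes, n + 1)
    )

# 6 positive signals out of 10 interviews
p = p_value_at_least(6, 10)
print(f"p-value: {p:.3f}")  # prints "p-value: 0.377"
```

In other words, if your customers were answering completely at random, you would still see 6 or more positive answers out of 10 more than a third of the time.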
Now, let's have a closer look at the imaginary customer interview data: 57% of respondents sending you positive signals. The absolute random result will be 50%.
What we do is put a confidence interval on this. It is the number that you usually see in scientific research with +/-, and it indicates that the true result might be different from the one you measured. The confidence interval gives you a range of values your result may fall into, and it is also related to the sample size: the smaller your sample, the wider your confidence interval. There's some heavy math involved in calculating confidence intervals (t-score for small sample sizes). I will not bore you with the formulas here (if you're interested in the subject you can find the calculations here).
However, if you're talking to fewer than 15 people your confidence interval will be at least +/- 15 percentage points (you can try to calculate it on your own; those are the results of my research). At this stage, let's presume you believe me and take for granted that the interval will be no less than 15. You place it on your probability line, and you get the result: from 42% to 72%.
Getting back to our hypothesis: we want the confidence interval to not cross Hypothesis 0 (results = 50%). If it does, it means that, at the 90% confidence level, you cannot rule out that the results you see are purely random. Crossing the middle does not prove Hypothesis 0 to be true; it only means we have failed to reject it.
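If you do want to try the calculation on your own, here is a rough Python sketch. Note the assumptions: it uses the textbook normal approximation for a proportion rather than the t-score mentioned above, it assumes a 90% confidence level, and the function name `margin_of_error` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(share, n, confidence=0.90):
    """Half-width of the confidence interval for a proportion
    (normal approximation, which is rough for small samples)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    return z * sqrt(share * (1 - share) / n)

share, n = 0.57, 10
moe = margin_of_error(share, n)
print(f"+/- {moe * 100:.1f} percentage points")
print(f"interval: {share - moe:.0%} to {share + moe:.0%}")
```

Under these assumptions 10 interviews give a margin of roughly 26 points, so the "at least 15" rule of thumb is, if anything, on the optimistic side.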
When I was running my interviews with the parents of picky-eating kids, I had the following result:
As you can see, my results did not cross the magic middle, therefore I could reject Hypothesis 0 at the 90% confidence level. Meaning, it is very unlikely that my results were just random.
Huh, statistics is sooo damn hard.
But what are you supposed to do if your results are close to 50% and adding the confidence interval makes you cross the middle? Right, you increase the sample size, getting more and more data. It's not 100% scientific, but I'd say settle down only when you clearly see that at least 70% of your respondents give you the same signal (positive or, like in my case, negative). How big should the sample be? If you're on a budget I'd suggest adding 5 more respondents and seeing what the results look like. In my experience, chances are good that you'll get statistically significant results by adding half of the initial sample size (presuming that you've talked to 10 people initially).
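To get a feel for how many respondents it takes, here is a small Python sketch. It rests on the same assumptions as any back-of-the-envelope check: a normal approximation for the confidence interval, a 90% confidence level, and an observed share that stays the same as you add people. The function name `min_sample_size` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def min_sample_size(share, confidence=0.90, middle=0.5, max_n=10_000):
    """Smallest number of respondents at which the confidence interval
    around `share` no longer crosses the middle (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    for n in range(1, max_n + 1):
        margin = z * sqrt(share * (1 - share) / n)
        if abs(share - middle) > margin:
            return n
    return None  # e.g. share == middle: no sample size is ever enough

for share in (0.57, 0.70):
    print(f"{share:.0%} observed -> at least {min_sample_size(share)} respondents")
```

With 70% of respondents agreeing, this sketch lands on 15 respondents, which is exactly the 10 plus 5 more suggested above. With a result as close to the middle as 57%, it takes well over a hundred.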
So, that's the statistics course in a nutshell 😂
Hope it'll help and you'll adopt this data-driven approach in your audience discovery. Happy researching!