Data dredging

11/4/2023

Let’s Do Some Data Dredging: P-Hacking With Our Coca-Cola Example

In hypothesis testing, researchers formulate a null hypothesis, which is the idea that the variables we are testing do not affect the results. In other words, a null hypothesis is a statement researchers build just to hopefully knock down. Thus, in our Coca-Cola experiment, our null hypothesis would be the claim that everyday consumption of Coca-Cola has no significant positive correlation with fitness.

The statistical significance of data is communicated through something called a p-value, which is the probability of obtaining results at least as extreme as ours in a universe where our null hypothesis is true. The p-value we compute must be less than our chosen benchmark value for us to conclude statistical significance. If you think about it, the smaller our computed p-value is, the harder it is to explain the data we obtained as pure coincidence. The most commonly selected benchmark value for p-value analysis is 0.05, meaning we accept at most a 5% chance of seeing results like ours in a universe where daily Coca-Cola consumption has no significant positive correlation with fitness.

Data dredging, also called p-value hacking, stems from reporting cherry-picked statistically significant results from a set of tests while intentionally leaving out the necessary context of the majority statistically insignificant ones. If a researcher runs ninety-nine statistically insignificant experiments before obtaining a statistically significant one, and they only report the significant one in a scientific paper, that individual is guilty of p-hacking. P-hacking can lead to academic papers headlined with false positives, such as tobacco smoking improving health or vaccinations damaging the human body.
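To see why reporting only the one significant experiment out of a hundred is so misleading, here is a minimal Python sketch of data dredging in a universe where the null hypothesis is true by construction. Everything in it is an assumption made for illustration: the fitness scores are simulated standard normal numbers with a known standard deviation of 1, each group has 50 people, the test is a simple one-sided z-test, and the function names are hypothetical.

```python
import math
import random

ALPHA = 0.05  # the benchmark value discussed above


def one_sided_p_value(cola_scores, control_scores):
    """P-value of a one-sided z-test for 'cola drinkers score higher'.

    Assumes (for simplicity) that fitness scores have a known
    standard deviation of 1 in both groups.
    """
    n = len(cola_scores)
    m = len(control_scores)
    diff = sum(cola_scores) / n - sum(control_scores) / m
    z = diff / math.sqrt(1 / n + 1 / m)
    # Upper-tail probability of a standard normal: P(Z >= z)
    return 0.5 * math.erfc(z / math.sqrt(2))


def run_null_experiment(rng, group_size=50):
    """One experiment in a universe where Coca-Cola has no effect at all."""
    cola = [rng.gauss(0, 1) for _ in range(group_size)]
    control = [rng.gauss(0, 1) for _ in range(group_size)]
    return one_sided_p_value(cola, control)


def dredge(n_experiments=100, seed=0):
    """Run many null experiments and pick out the 'significant' ones."""
    rng = random.Random(seed)
    p_values = [run_null_experiment(rng) for _ in range(n_experiments)]
    significant = [p for p in p_values if p < ALPHA]
    return p_values, significant


def chance_of_false_positive(n_experiments, alpha=ALPHA):
    """Chance that at least one of n independent null experiments has p < alpha."""
    return 1 - (1 - alpha) ** n_experiments


if __name__ == "__main__":
    p_values, significant = dredge()
    print(f"experiments run: {len(p_values)}")
    print(f"'significant' at {ALPHA}: {len(significant)}")
    print(f"chance of at least one false positive: {chance_of_false_positive(100):.3f}")
```

Under these assumptions, the chance that at least one of 100 independent experiments crosses the 0.05 benchmark by luck alone is 1 - 0.95^100, or roughly 99.4%. A paper that reports only the experiments in the `significant` list, and stays silent about the rest of `p_values`, is doing exactly what this post calls p-hacking.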
