Linear Digressions

Anscombe's Quartet

16 min · 19 June 2017
Anscombe's Quartet is a set of four datasets that have nearly identical means, variances and correlations but look very different when plotted. It's easy to think that a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is. Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can even do one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur. It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics. In other words, Anscombe's Quartets can be generated at will, and we should all be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.
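To make the point concrete, here is a minimal Python sketch that computes the summary statistics for the four datasets. The data values are those of Anscombe's 1973 quartet; the names `QUARTET`, `pearson` and `summary` are made up for this illustration and do not come from the episode.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from deviations about the means."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Return (mean x, sample variance x, mean y, sample variance y, correlation)."""
    return mean(xs), variance(xs), mean(ys), variance(ys), pearson(xs, ys)


for name, (xs, ys) in QUARTET.items():
    mx, vx, my, vy, r = summary(xs, ys)
    print(f"{name:>3}: mean_x={mx:.2f} var_x={vx:.2f} "
          f"mean_y={my:.2f} var_y={vy:.2f} corr={r:.3f}")
```

All four rows should come out nearly the same (x mean 9, x variance 11, y mean about 7.5, y variance about 4.1, correlation about 0.82), even though a scatter plot of each dataset looks completely different: one roughly linear cloud, one smooth curve, one tight line with a single outlier, and one vertical stack of points with a single outlier.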

Linear Digressions with Katie Malone is available on several platforms. The information on this page comes from public podcast feeds.