spiders georg: I live in cave and eat over 10,000 spiders each day
everyone else: you fucked up a perfectly good factoid is what you did. look at it. it’s got statistical error.
ACTUALLY!
ACTUALLY SPIDERS GEORG SHOULD BE COUNTED!
Yes, he’s an outlier. But if an outlier is a valid data point – as in, he’s not being counted in error, it’s not a data entry issue, and he’s not being measured in a very different way to everyone else (e.g. measuring his diet over a lifetime while everyone else is measured over a month) – then he absolutely should be counted, for accuracy and honesty. You can’t just ignore data points to manipulate your messaging.
HOWEVER. The mean/average (add up all the values, divide by the number of values) MAY NOT BE the best way to represent statistical data, ESPECIALLY when outliers are concerned. The mean is always pulled towards the outlier – in this case, the mean is closer to 7 ish than 0.
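To see the pull in action, here’s a quick Python sketch. The numbers are completely made up for illustration (999 people eating 0 spiders a day plus one Spiders Georg), so don’t read anything into the exact result; the point is only how far one guy drags the mean.

```python
from statistics import mean

# Hypothetical data, invented for illustration only:
# 999 people who eat 0 spiders a day, plus one Spiders Georg at 10,000 a day.
everyone_else = [0] * 999
spiders_georg = 10_000

# Mean = sum of all values / number of values.
mean_without_georg = mean(everyone_else)
mean_with_georg = mean(everyone_else + [spiders_georg])

print("mean without Georg:", mean_without_georg)
print("mean with Georg:   ", mean_with_georg)
```

In this toy data set, one valid data point moves the mean from 0 to 10 spiders a day, even though 999 out of 1,000 people eat none.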
What you should do with Spiders Georg-style situations:
1. Consider the MEDIAN (the midway point, where there’s an equal # of data points before and after this number) or the MODE (the most common value). They are barely affected by outliers (if at all), so Spiders Georg can eat to his heart’s content.
2. VISUALLY represent the statistical data! You can have two different charts with essentially the same numbers, but the picture tells a different story.
3. Bring up spread (how far apart the lowest and highest data points are) and standard deviation (roughly, how far a typical data point sits from the mean)!
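If you want to try points 1 and 3 yourself, here’s a rough sketch using Python’s built-in `statistics` module. The data set is hypothetical (999 people at 0 spiders a day plus one Spiders Georg at 10,000), and `pstdev` treats those 1,000 people as the whole population, which is an assumption on my part.

```python
from statistics import median, mode, pstdev

# Hypothetical data, invented for illustration only:
# 999 people at 0 spiders a day, plus Spiders Georg at 10,000 a day.
data = [0] * 999 + [10_000]

middle_value = median(data)        # midway point of the sorted data
most_common = mode(data)           # value that shows up most often
spread = max(data) - min(data)     # range: highest minus lowest
std_dev = pstdev(data)             # population standard deviation

print("median:", middle_value)
print("mode:  ", most_common)
print("range: ", spread)
print("stdev: ", round(std_dev, 2))
```

The median and mode both stay at 0 here, Georg and all. The range (10,000) and the standard deviation (roughly 316) are what tell your reader there’s somebody in a cave throwing the numbers off.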
tl;dr: Spiders Georg is likely NOT a statistical error, but there are better ways to include him in your data.
Math major: Look, I explained how to do Spiders Georg properly!
Me: You fucked up a perfectly good meme, is what you did. Look at it, it’s giving people anxiety.