Skewness — Definition, Problem and Reducing Methods

Feb 10, 2021

What is skewness?

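Skewness measures how asymmetric a distribution is: a skewed distribution has a longer tail on one side than on the other. It is easiest to see in a picture: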

How does it happen?

Outliers pull the distribution toward them. For instance, if you have a few extremely low values compared with the rest, your distribution will skew to the left; a few extremely high values will skew it to the right.
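To make the direction of the skew concrete, here is a minimal sketch in plain Python. It assumes the moment-based skewness coefficient (third central moment divided by the standard deviation cubed), which is one common definition and not necessarily the one behind the picture above. The `skewness` helper and the height values are made up for illustration.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical heights in cm, chosen to be symmetric around 170.
heights = [160, 163, 166, 168, 170, 172, 174, 177, 180]

skew_symmetric = skewness(heights)            # no outlier
skew_low_outlier = skewness(heights + [100])  # one extremely low value
skew_high_outlier = skewness(heights + [250]) # one extremely high value

print("symmetric:", skew_symmetric)
print("low outlier:", skew_low_outlier)
print("high outlier:", skew_high_outlier)
```

A negative result means the long tail is on the left, a positive result means it is on the right, and a value near zero means the data is roughly symmetric.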

Why is skewness a problem?

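Many common statistical methods assume that the data is at least approximately normally distributed, for example hypothesis tests such as the z-test and ANOVA. The central limit theorem is often used to justify that assumption, but with heavily skewed data it takes a larger sample before the normal approximation becomes reasonable. Skewed data therefore not only limits the tools we can use, it can also hurt the performance of our models, especially regression-based ones.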

For example, suppose the majority of students have a height between 160–175 cm and a small minority are over 200 cm. The data is skewed to the right. If we apply a linear regression model to this data, this is what happens:

As we can see, with a single outlier (height = 250 cm), our R-squared drops from 0.805 to 0.584. R-squared tells us the percentage of the variation in y (height) that is explained by x (weight). In other words, the regression line works better when it only has to predict the majority.
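The effect is easy to reproduce on a small scale. The sketch below fits a least-squares line in plain Python and compares R-squared with and without one outlier. The weights and heights are invented for illustration, so the numbers will not match the 0.805 and 0.584 above, and `fit_line` and `r_squared` are hypothetical helper names rather than functions from any library.

```python
def fit_line(x, y):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: weight in kg (x) and height in cm (y).
weights = [50, 55, 60, 65, 70, 75, 80]
heights = [160, 163, 165, 168, 170, 172, 175]

r2_without_outlier = r_squared(weights, heights)
# Add one student with an ordinary weight but an extreme height of 250 cm.
r2_with_outlier = r_squared(weights + [62], heights + [250])

print("R-squared without outlier:", round(r2_without_outlier, 3))
print("R-squared with outlier:   ", round(r2_with_outlier, 3))
```

Because the outlier adds a large amount of variation in height that weight cannot explain, R-squared falls even though the other seven points are unchanged.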