What is the difference between Label Encoding and One-Hot Encoding (or Creating Dummies)? Which to apply when?

Senan Mehdiyev
Jun 20, 2022


Let’s start with the explanation. Before modeling, the data has to be pre-processed so that the model can reach the best possible performance. One task we often face in this phase is converting categorical data into numeric form, because a mathematical model (simply put, the computer’s brain) can only work with numbers.

Aha! Now let’s look at the difference between Label Encoding and One-Hot Encoding (or Creating Dummies) and when to apply each. As we remember from statistics, categorical data falls into two groups: nominal and ordinal. Examples of nominal data are “Gender”, “Hair color”, and “Country”; examples of ordinal data are “Position (Junior, Middle, Senior)”, “Education (BS, MS, PhD)”, and “Satisfaction level (Bad, Average, Good, Excellent)”. As the examples show, ordinal values have a natural ranking, while nominal values do not, and no meaningful ranking can be imposed on them.

If our data is ordinal and the count of unique values is large, Label Encoding should be applied: it keeps the ranking in a single column, which also avoids creating a large number of features, saves memory, and does not increase the complexity of the model. Conversely, if the data is nominal and the count of unique values is low, One-Hot Encoding (or Creating Dummies) should be applied, because it does not impose an artificial order on the categories.

Creating Dataset
Dataset
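The original dataset is not reproduced here, so the following is only a minimal sketch of a comparable one. The column names and values are assumptions borrowed from the nominal and ordinal examples mentioned earlier, not the author's actual data.

```python
import pandas as pd

# Hypothetical toy data: "Country" is nominal, "Education" is ordinal.
# Names and values are illustrative assumptions, not the article's dataset.
df = pd.DataFrame({
    "Country": ["Azerbaijan", "Germany", "Japan", "Germany", "Azerbaijan"],
    "Education": ["BS", "PhD", "MS", "BS", "MS"],
})

print(df)
```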
Implementation of Label Encoding
Label Encoding
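As a rough sketch of what label encoding might look like, the example below uses scikit-learn's `LabelEncoder` on a hypothetical "Satisfaction" column (an assumption taken from the ordinal examples above). One thing worth noting: `LabelEncoder` numbers the labels in sorted order, which does not necessarily match the intended ranking, so an explicit mapping is shown next to it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical ordinal column; the values are illustrative assumptions.
df = pd.DataFrame({"Satisfaction": ["Good", "Bad", "Excellent", "Average", "Good"]})

# LabelEncoder assigns integers by the sorted order of the labels:
# Average=0, Bad=1, Excellent=2, Good=3 (alphabetical, not the real ranking).
le = LabelEncoder()
df["Satisfaction_le"] = le.fit_transform(df["Satisfaction"])

# An explicit mapping preserves the ranking Bad < Average < Good < Excellent.
rank = {"Bad": 0, "Average": 1, "Good": 2, "Excellent": 3}
df["Satisfaction_rank"] = df["Satisfaction"].map(rank)

print(df)
```

When the order of the categories matters to the model, the explicit mapping is probably the safer of the two options.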
Implementation of One-Hot Encoding
One-Hot Encoding
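A minimal sketch of one-hot encoding, assuming scikit-learn's `OneHotEncoder` and a hypothetical nominal "Country" column. The new column names are built by hand from `categories_`; that naming scheme is an illustrative choice, not part of the encoder.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical nominal column; the values are illustrative assumptions.
df = pd.DataFrame({"Country": ["Azerbaijan", "Germany", "Japan", "Germany"]})

ohe = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = ohe.fit_transform(df[["Country"]]).toarray()

# categories_ holds the sorted unique values found in each input column.
columns = [f"Country_{c}" for c in ohe.categories_[0]]
one_hot = pd.DataFrame(encoded, columns=columns, index=df.index)

print(one_hot)
```

Each row contains exactly one 1, in the column of its category, so no order is implied between the countries.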

We can also create dummy features using “pd.get_dummies()”, which is more convenient to apply and gives access to some important parameters. For example, the “drop_first” parameter reduces the total number of dummy features by one by dropping the first dummy feature. The reason for dropping one is that the dummy features of a column always sum to one, so any one of them can be derived from the others; keeping all of them loses no degree of freedom but introduces perfect Multicollinearity in models that have an intercept. I will talk about this in detail in another article, stay tuned! :)
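The multicollinearity point can be checked numerically. The sketch below, built on a hypothetical "Country" column, stacks an intercept column next to the dummy features and compares the number of columns with the matrix rank; the helper `design_matrix` is an illustrative name, not a library function.

```python
import numpy as np
import pandas as pd

# Hypothetical nominal column; the values are illustrative assumptions.
countries = pd.Series(["Azerbaijan", "Germany", "Japan", "Germany"], name="Country")

def design_matrix(drop_first):
    """Return [intercept | dummy features] as a float array."""
    dummies = pd.get_dummies(countries, drop_first=drop_first).to_numpy(dtype=float)
    intercept = np.ones((len(countries), 1))
    return np.hstack([intercept, dummies])

full = design_matrix(drop_first=False)    # intercept + 3 dummies = 4 columns
reduced = design_matrix(drop_first=True)  # intercept + 2 dummies = 3 columns

# With all dummies the rank is lower than the column count (redundant column);
# after dropping the first dummy the matrix has full column rank.
print(full.shape[1], np.linalg.matrix_rank(full))
print(reduced.shape[1], np.linalg.matrix_rank(reduced))
```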

Implementation of pd.get_dummies()
Creating Dummies
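A short sketch of `pd.get_dummies()` with and without `drop_first`, again on a hypothetical "Country" column rather than the article's original data.

```python
import pandas as pd

# Hypothetical nominal column; the values are illustrative assumptions.
df = pd.DataFrame({"Country": ["Azerbaijan", "Germany", "Japan", "Germany"]})

# One dummy feature per category.
full = pd.get_dummies(df, columns=["Country"])

# drop_first=True drops the first category's dummy (here Country_Azerbaijan),
# which is then represented by zeros in all remaining dummy columns.
reduced = pd.get_dummies(df, columns=["Country"], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```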

If you liked the article or have any ideas, please like the post and leave a comment. Thanks!
