One Hot Encoding vs Label Encoder

When to use One Hot Encoding vs LabelEncoder?

When choosing between One-Hot Encoding (OHE) and Label Encoding, we must first understand what kind of model we are trying to build. The two categories of models we will consider are:

  1. Tree-Based Models: Gradient Boosted Decision Trees and Random Forests.
  2. Non-Tree-Based Models: linear models, kNN, or neural networks.

Let's consider when to apply OHE and when to apply Label Encoding while building tree-based models.

We apply OHE when:

  1. The values that are close to each other in the label encoding correspond to target values that are not close (the dependence is non-linear).
  2. The categorical feature is not ordinal (dog, cat, etc.).
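
For instance, a minimal sketch of one-hot encoding such a non-ordinal feature with scikit-learn's OneHotEncoder might look like this (the animal data is purely hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data for illustration only.
df = pd.DataFrame({"animal": ["dog", "cat", "dog", "bird"]})

# Each category becomes its own binary column, so no artificial order is implied.
ohe = OneHotEncoder(handle_unknown="ignore")
encoded = ohe.fit_transform(df[["animal"]]).toarray()

print(ohe.get_feature_names_out())  # e.g. ['animal_bird' 'animal_cat' 'animal_dog']
print(encoded)
```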

We apply Label Encoding when:

  1. The categorical feature is ordinal (primary school, high school, etc.).
  2. We can come up with a label encoding that assigns close labels to similar categories; this leads to fewer splits in the trees and hence reduces execution time.
  3. The number of unique values in a categorical feature is huge: one-hot encoding such a feature can lead to
    1. high memory consumption, and
    2. a situation in which the non-categorical features are rarely used by the model.
  You can deal with the first case by employing sparse matrices. The second case can occur if you build a tree using only a subset of features. For example, if you have 9 numeric features and 1 categorical feature with 100 unique values and you one-hot encode that categorical feature, you end up with 109 features. If a tree is built with only a subset of features, the original 9 numeric features will rarely be used. In this case, you can increase the parameter controlling the size of this subset: in xgboost it is called colsample_bytree; in sklearn's Random Forest, max_features. A sketch of both ideas follows this list.
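
As a rough sketch of the label-encoding side, scikit-learn's OrdinalEncoder can assign integer labels in an explicit order, and max_features (or colsample_bytree in xgboost) can be raised when a one-hot encoded feature dominates the feature space. The education data and the 0.5 value below are hypothetical choices, not recommendations:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "education": ["primary school", "high school", "bachelor", "high school"],
})

# Passing the intended order explicitly makes the integer labels reflect the
# ordinal relationship: primary school < high school < bachelor.
order = [["primary school", "high school", "bachelor"]]
enc = OrdinalEncoder(categories=order)
df["education_enc"] = enc.fit_transform(df[["education"]])
print(df)

# If a one-hot encoded feature has swamped the feature space, raising the
# fraction of features considered at each split keeps the numeric features in
# play (max_features here; colsample_bytree in xgboost).
clf = RandomForestClassifier(max_features=0.5, random_state=0)
```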

Now let's consider when to apply OHE and Label Encoding while building non-tree-based models.

Label Encoding can be used effectively only when the dependence between the feature and the target is linear.

Conversely, when the dependency is non-linear, you should prefer OHE, as in the sketch below.
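
A minimal sketch of this for a non-tree model, assuming a hypothetical rent-style dataset, could combine OneHotEncoder with a linear model in a pipeline:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data for illustration only.
X = pd.DataFrame({
    "city": ["paris", "rome", "paris", "berlin"],
    "area": [30.0, 55.0, 42.0, 80.0],
})
y = [900, 700, 1100, 950]

# One-hot encode the categorical column and scale the numeric one before the
# linear model; an arbitrary integer label for "city" would force a
# meaningless linear ordering on the categories.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("scale", StandardScaler(), ["area"]),
])
model = make_pipeline(preprocess, LinearRegression())
model.fit(X, y)
print(model.predict(X))
```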

