# What is categorical data?

Updated: Oct 20, 2021

In machine learning, we often come across data that is not numeric, such as colors, names, etc. Such data is called categorical data.

Though it seems like a good way of collecting information, categorical data is a little difficult to work with.

Machine learning algorithms operate on mathematical vectors.

**Encoding of categorical data**

As we discussed, machine learning algorithms cannot work directly with categorical data because they operate on numbers.

We need to do some work on the data before we can feed it to a machine learning model so that it can operate on it.

The process of turning categorical data into usable, machine-learning ready, mathematical data is called **categorical encoding**.

**Types of Encoding**

**Ordinal Encoding or Label Encoding**

We convert ordered string labels to integer values 1 through *k*, *k* being the number of classes.
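As a minimal sketch of the idea, the snippet below maps ordered labels to the integers 1 through *k* in plain Python. The function name `ordinal_encode`, the `order` parameter, and the example sizes are illustrative assumptions, not part of any particular library.

```python
def ordinal_encode(values, order):
    """Map each label to its 1-based rank in `order` (hypothetical helper)."""
    # Build the label -> integer mapping, starting at 1 so codes run 1..k.
    mapping = {label: rank for rank, label in enumerate(order, start=1)}
    return [mapping[value] for value in values]


# Example data (assumed for illustration): sizes have a natural order.
sizes = ["small", "large", "medium", "small"]
encoded = ordinal_encode(sizes, order=["small", "medium", "large"])
print(encoded)  # [1, 3, 2, 1]
```

The order is passed in explicitly because the integers only carry meaning when they follow the real ranking of the labels.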

**One-Hot Encoding**

We dedicate one column to each data category and fill it with 1 for true and 0 for false in each row.
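The column-per-category idea can be sketched in a few lines of plain Python. The helper name `one_hot_encode` and the choice to sort the categories alphabetically are assumptions made for this example.

```python
def one_hot_encode(values):
    """Return (categories, rows), with one 0/1 column per category (hypothetical helper)."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    # Each row has a 1 in the column of its own category and 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, rows = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
for row in rows:
    print(row)
```

Every row contains exactly one 1, so the number of columns grows with the number of categories.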

**Binary encoding**

First, the categories are encoded by ordinal encoding. Then we convert those integers into binary code, and the digits of each binary number are split into separate columns.
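The three steps (ordinal codes, binary conversion, one column per digit) might look like this in plain Python. The name `binary_encode` and the alphabetical ordering of categories are assumptions for illustration; a library implementation may assign codes differently.

```python
def binary_encode(values):
    """Ordinal-encode the categories, then split each code into binary digits (hypothetical helper)."""
    # Step 1: ordinal codes 1..k, assigned in alphabetical order (an assumption).
    categories = sorted(set(values))
    codes = {category: i for i, category in enumerate(categories, start=1)}
    # Step 2: number of bits needed to write the largest code.
    width = max(codes.values()).bit_length()
    # Step 3: zero-padded binary string, one digit per column.
    return [[int(bit) for bit in format(codes[value], f"0{width}b")]
            for value in values]


rows = binary_encode(["a", "b", "c", "d", "e"])
for row in rows:
    print(row)
# Five categories fit in three columns instead of the five that one-hot needs.
```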

**Base N Encoding**

Binary encoding converts the integers using base 2, but this encoding lets us convert them using any base. A higher base needs fewer digits, and therefore fewer columns, to represent a large number of categories.
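A rough sketch of the same procedure with an arbitrary base is shown below. The helper names `to_base_digits` and `base_n_encode`, and the alphabetical ordering of categories, are assumptions made for this example.

```python
def to_base_digits(number, base, width):
    """Write `number` in the given base as a list of `width` digits (hypothetical helper)."""
    digits = []
    for _ in range(width):
        number, remainder = divmod(number, base)
        digits.append(remainder)
    return digits[::-1]  # most significant digit first


def base_n_encode(values, base):
    """Ordinal-encode the categories, then split each code into base-`base` digits."""
    if base < 2:
        raise ValueError("base must be at least 2")
    categories = sorted(set(values))
    codes = {category: i for i, category in enumerate(categories, start=1)}
    # Smallest width such that the largest code (k) fits in `width` digits.
    width = 1
    while base ** width <= len(categories):
        width += 1
    return [to_base_digits(codes[value], base, width) for value in values]


labels = ["a", "b", "c", "d", "e"]
print(base_n_encode(labels, base=3))  # two columns per row
print(base_n_encode(labels, base=2))  # three columns per row, same as binary encoding
```

With base 2 this reduces to binary encoding; raising the base trades fewer columns for more distinct values per column.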

**Hashing**

A hash function transforms a string of characters into a (usually shorter) fixed-length value that represents the original string.

You can specify the length as *n*, and that will be the number of columns; the number of categories in the actual data doesn’t matter.
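One way to sketch this is to hash each category and use the result to pick one of the *n* columns. The helper name `hash_encode`, the use of MD5 from the standard library, and the one-column-per-value layout are all assumptions made for illustration; real encoders may combine hashes differently.

```python
import hashlib


def hash_encode(values, n_columns):
    """Place each value in one of `n_columns` columns chosen by its hash (hypothetical helper)."""
    rows = []
    for value in values:
        # MD5 is used only as a stable, deterministic hash (an assumption of this sketch).
        digest = hashlib.md5(value.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % n_columns
        row = [0] * n_columns
        row[index] = 1
        rows.append(row)
    return rows


cities = ["Paris", "Tokyo", "Lima", "Paris", "Cairo"]
rows = hash_encode(cities, n_columns=4)
for city, row in zip(cities, rows):
    print(city, row)
```

The output always has exactly `n_columns` columns, however many distinct categories appear. The trade-off is that two different categories can land in the same column (a collision), which becomes more likely as *n* gets smaller.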