Skip to main content
  1. Posts/

Cleaning Index and Column Names in Data

·1 min
Data Data Basics Data Pandas
Table of Contents

The first step in cleaning a dataset is to organize and standardize the index and column names to avoid confusion and inconsistencies. This includes renaming index and column labels, as well as sorting them.

Rename Index and Column Names Using a Dictionary
#

To clean index and column names, you can use the rename method with a dictionary to map old names to new names:

df.rename(index={"old_index_1": "new_index_1", "old_index_2": "new_index_2"})

To rename column names, simply use the columns keyword instead of index. By default, this returns a new DataFrame without modifying the original. To update the original DataFrame directly, add the inplace=True argument:

df.rename(index={"old_index_1": "new_index_1"}, inplace=True)

Rename with a Function
#

If you want to rename all index or column names using a function, such as converting all column names to uppercase:

df.rename(columns=str.upper, inplace=True)

This is especially helpful when applying the same transformation to all labels.

Sorting Index or Column Names
#

To sort the index or column names, you can use:

df.sort_index(axis=0)  # axis=0 for index, axis=1 for column names

Like before, this returns a new DataFrame. To modify the original DataFrame in-place:

df.sort_index(axis=0, inplace=True)

Related

Converting Categorical Variables with `get_dummies`
·2 mins
Data Data Basics Data Pandas
Filtering Data with query()
·1 min
Data Data Basics Data Pandas
Grouping Data with `cut`
·2 mins
Data Data Basics Data Pandas