Concatenating and Merging DataFrames

Table of Contents

In real-world data analysis, we often encounter datasets from different sources. For example, one table may contain customer information, and another may contain order records; or we may have transaction logs from different months. To analyze these datasets effectively, we need to combine them. There are three main methods:

Vertical concatenation (stacking rows)
Horizontal concatenation (aligning columns side by side)
Merging (similar to SQL joins)

Concatenating DataFrames
#

Vertical Concatenation
#

If you have two datasets with the same structure (i.e., same column names), such as transactions from two months, you can stack them using pd.concat():

pd.concat([df1, df2])

By default, this preserves the original indices. If you’d like to reset the index:

pd.concat([df1, df2], ignore_index=True)

This creates a new 0-based index.

Horizontal Concatenation
#

To combine datasets side by side (i.e., merge columns):

pd.concat([df1, df2], axis=1)

This aligns data by index. Make sure that df1 and df2 share the same index, or NaN will be introduced for unmatched rows.

Merging DataFrames
#

Using `merge`
#

If you’re familiar with SQL joins, pandas.merge() offers similar functionality. For example, if you have an orders table and a customers table, and both use CustomerID as a key:

df_result = pd.merge(df_orders, df_customers, on="CustomerID", how="inner")

Parameter	Description
`on`	Specifies the common column to join on
`left_on` / `right_on`	Use when the join keys have different names in the two DataFrames
`how`	Join method: `"inner"` (intersection), `"outer"` (union), `"left"`, `"right"`

Using `join`
#

To merge DataFrames based on their index rather than columns:

df_result = df1.join(df2, how="left", lsuffix="_left", rsuffix="_right")

By default, join() merges on index
If both DataFrames have columns with the same name, use lsuffix and rsuffix to disambiguate them

Converting Categorical Variables with `get_dummies`

6 May 2025·2 mins

Data Data Basics Data Pandas

Filtering Data with query()

3 May 2025·1 min

Data Data Basics Data Pandas

Grouping Data with `cut`

3 May 2025·2 mins

Data Data Basics Data Pandas

Concatenating DataFrames #

Vertical Concatenation #

Horizontal Concatenation #

Merging DataFrames #

Using merge #

Using join #

Related

Concatenating DataFrames
#

Vertical Concatenation
#

Horizontal Concatenation
#

Merging DataFrames
#

Using `merge`
#

Using `join`
#