Merging and Concatenating pandas DataFrames

Learn how to combine multiple DataFrames using merge and concat operations.

20 min read
Intermediate

Merging and Concatenating pandas DataFrames

When working with pandas DataFrames, there might arise a case where you would want to merge or concatenate multiple DataFrames together for easier data manipulation. So, in this lesson, you will learn how to perform these operations using pandas.


Merging/Joining pandas DataFrames

Two different pandas DataFrames can be merged or joined together if they share a common column.

python
# Importing the pandas library as pd
import pandas as pd

The first DataFrame will hold information about students from a class,

python
# Creating a Pandas DataFrame using a Python Dictionary
dict_values = {
    "Roll Number": [1, 2, 3, 4, 5, 6, 7],
    "Student": ["Aaron", "Aaron", "Britney", "Chris", "Diana", "Dave", "Evan"],
    "Age": [18, 19, 18, 18, 18, 17, 19],
}

students_df = pd.DataFrame(dict_values)

# Printing the DataFrame
print(students_df)

The second DataFrame will hold information about their grades in their final examination and their attendance record,

python
# Creating a Pandas DataFrame using a Python Dictionary
grades_dict_values = {
    "Roll Number": [1, 2, 3, 4],
    "Attendance": [220, 219, 220, 220],
    "Final Grades": ["A", "A", "C", "F"],
}

grades_df = pd.DataFrame(grades_dict_values)

# Printing the DataFrame
print(grades_df)

Since both of these DataFrames are for the same set of students and share a unique column called Roll Number, we can combine the two together in a single DataFrame. There are different kinds of joins that can be performed between pandas DataFrames (similar to that of SQL tables).

python
# Performing a left outer join
print(pd.merge(students_df, grades_df, on="Roll Number", how="left"))
python
# Performing a right outer join
print(pd.merge(students_df, grades_df, on="Roll Number", how="right"))
python
# Performing a full outer join
print(pd.merge(students_df, grades_df, on="Roll Number", how="outer"))
python
# Performing an inner join
print(pd.merge(students_df, grades_df, on="Roll Number", how="inner"))

Concatenating pandas DataFrames

We can concatenate two or more DataFrames using the pandas concat() method. First, let us create two DataFrames with the same column names.

python
# Initializing a pandas DataFrame
dict_1 = {
    "Roll Number": [1, 2, 3],
    "Student": ["Aaron", "Aaron", "Britney"],
}

students_df_1 = pd.DataFrame(dict_1)

# Printing the DataFrame
print(students_df_1)
python
# Initializing a pandas DataFrame
dict_2 = {
    "Roll Number": [4, 5, 6],
    "Student": ["Chris", "Diana", "Dave"],
}

students_df_2 = pd.DataFrame(dict_2)

# Printing the DataFrame
print(students_df_2)

Now, concatenating the DataFrames,

python
# Concatenate pandas DataFrames along the row axis
print(pd.concat([students_df_1, students_df_2], axis=0))
python
# Concatenate pandas DataFrames along the column axis
print(pd.concat([students_df_1, students_df_2], axis=1))