Aggregation and GroupBy in pandas

Learn to perform aggregation and groupby operations in pandas.

Aggregation in pandas

Each column of a pandas DataFrame is a pandas Series, so aggregation methods such as sum(), min(), and max() can be called directly on a column. Let's have a look at some of them:

python

# Importing the pandas library as pd
import pandas as pd

# Creating a pandas DataFrame using a Python Dictionary
dict_values = {
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Weight": ["1kg", "1kg", "2kg", "2kg", "5kg"],
    "Price": [100, 200, 300, 400, 500],
}

df = pd.DataFrame(dict_values)

# Printing the DataFrame
print(df)

python

# Returns the sum of the column
print("Sum of Price:", df["Price"].sum())

# Returns the minimum of the column
print("Min Price:", df["Price"].min())

# Returns the maximum of the column
print("Max Price:", df["Price"].max())

python

# Returns the specified aggregated values
print(df["Price"].agg(["min", "max", "sum", "mean", "median"]))
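agg() is also available on the DataFrame itself. The following is a minimal sketch, assuming the same sample data as above, that passes a dictionary mapping column names to the aggregations wanted for each column. The variable name `summary` is only illustrative:

```python
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Weight": ["1kg", "1kg", "2kg", "2kg", "5kg"],
    "Price": [100, 200, 300, 400, 500],
})

# Map each column name to the list of aggregations to compute for it
summary = df.agg({"Price": ["sum", "mean"], "Fruit": ["nunique"]})

# Rows are the aggregation names, columns are the selected columns;
# cells for aggregations not requested on a column are NaN
print(summary)
```

Cells are filled only where an aggregation was requested for that column, so the result is easiest to read with one lookup per (aggregation, column) pair, for example `summary.loc["sum", "Price"]`.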

pandas GroupBy

A pandas GroupBy operation splits an object into groups, applies a function to each group, and combines the results. This makes it possible to compute per-group values, such as sums or medians, over large amounts of data.
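To make the three steps concrete, here is a rough sketch that performs the split, apply, and combine steps by hand and compares the outcome with groupby(). It assumes a reduced version of the sample data, and the names `manual`, `manual_result`, and `groupby_result` are illustrative only. This is not how pandas implements GroupBy internally:

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Price": [100, 200, 300, 400, 500],
})

manual = {}
for fruit in df["Fruit"].unique():
    # Split: select the rows that belong to this fruit
    piece = df[df["Fruit"] == fruit]
    # Apply: aggregate the Price column of this piece
    manual[fruit] = piece["Price"].sum()

# Combine: collect the per-group results into one Series,
# sorted by key to match the default ordering of groupby()
manual_result = pd.Series(manual, name="Price").sort_index()

# The same computation expressed with groupby()
groupby_result = df.groupby("Fruit")["Price"].sum()

print(manual_result)
print(groupby_result)
```

Both Series hold the same per-fruit totals. groupby() expresses the whole loop in a single call.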

Group by the Fruit column and get the sum of the Price column for each group:

python

print(df.groupby("Fruit")["Price"].sum())

Group by the Weight column and get the sum of the Price column for each group:

python

print(df.groupby("Weight")["Price"].sum())

Group by both the Weight and Fruit columns and get the sum of the Price column for each combination:

python

print(df.groupby(["Weight", "Fruit"]).sum())
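When grouping by more than one column, the result is indexed by the combinations of the group keys. The sketch below, which assumes the same sample data, shows one way to look up a single combination and to turn the keys back into ordinary columns. The names `grouped` and `flat` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Weight": ["1kg", "1kg", "2kg", "2kg", "5kg"],
    "Price": [100, 200, 300, 400, 500],
})

grouped = df.groupby(["Weight", "Fruit"])["Price"].sum()

# The result is indexed by (Weight, Fruit) pairs,
# so one combination can be looked up with a tuple
print(grouped.loc[("1kg", "Apple")])

# reset_index() turns the group keys back into regular columns
flat = grouped.reset_index()
print(flat)
```

A flat table like `flat` is often easier to export or merge than a result with a multi-level index.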

You can also apply a custom function to each group:

python

# Importing NumPy
import numpy as np

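def grouped_func(g):
    # g is the sub-DataFrame for one (Fruit, Weight) group.
    # Take the median of the numeric Price column only;
    # np.median on the whole group would fail on the string columns.
    median_val = np.median(g["Price"]) * 100
    return pd.Series({"Median * 100": median_val})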

print(df.groupby(["Fruit", "Weight"]).apply(grouped_func))

The same computation using a lambda function:

python

# Select the numeric Price column inside the lambda before taking the median
print(
    df.groupby(["Fruit", "Weight"]).apply(
        lambda g: pd.Series({"Median * 100": np.median(g["Price"]) * 100})
    )
)
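For a simple statistic like the median of Price, a possible alternative to apply() is the built-in median() method on the selected column, scaled afterwards. This is a sketch assuming the same sample data. The names `medians` and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Weight": ["1kg", "1kg", "2kg", "2kg", "5kg"],
    "Price": [100, 200, 300, 400, 500],
})

# Median of Price per (Fruit, Weight) group, then scaled by 100
medians = df.groupby(["Fruit", "Weight"])["Price"].median() * 100

# Convert the Series to a one-column DataFrame with a descriptive name
result = medians.to_frame("Median * 100")
print(result)
```

Selecting the column before aggregating keeps the string columns out of the computation. apply() remains useful when the per-group logic cannot be expressed with a built-in method.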