Aggregation and GroupBy in pandas
Learn how to aggregate and group data in pandas for powerful data analysis.
Aggregation in pandas
Because each column of a pandas DataFrame is a pandas Series, Series aggregation methods such as sum(), min(), and max() can be called on any column. Here are a few of them:
# Importing the pandas library as pd
import pandas as pd
# Creating a pandas DataFrame using a Python Dictionary
dict_values = {
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Weight": ["1kg", "1kg", "2kg", "2kg", "5kg"],
    "Price": [100, 200, 300, 400, 500],
}
df = pd.DataFrame(dict_values)
# Printing the DataFrame
print(df)
# Returns the sum of the column
print("Sum of Price:", df["Price"].sum())
# Returns the minimum of the column
print("Min Price:", df["Price"].min())
# Returns the maximum of the column
print("Max Price:", df["Price"].max())
# Returns the specified aggregated values
print(df["Price"].agg(["min", "max", "sum", "mean", "median"]))
pandas GroupBy
A pandas GroupBy operation splits an object into groups, applies a function to each group, and combines the results. This makes it possible to partition large amounts of data and compute statistics for each group.
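To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand on the same fruit data and prints the result next to the built-in groupby sum. The names manual_totals, manual, and builtin are illustrative only, and the loop is meant to show the idea rather than how pandas implements it internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Weight": ["1kg", "1kg", "2kg", "2kg", "5kg"],
    "Price": [100, 200, 300, 400, 500],
})

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
# Apply: sum the Price column of each sub-DataFrame
manual_totals = {}
for fruit, group in df.groupby("Fruit"):
    manual_totals[fruit] = group["Price"].sum()

# Combine: collect the per-group results into a single Series
manual = pd.Series(manual_totals, name="Price")
manual.index.name = "Fruit"

# The built-in one-liner performs the same three steps
builtin = df.groupby("Fruit")["Price"].sum()

print(manual)
print(builtin)
```

In practice the one-liner is preferable; the explicit loop is mainly useful for inspecting what each group contains.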
Grouping by the Fruit column and summing the Price column within each group:
print(df.groupby("Fruit")["Price"].sum())
Grouping by the Weight column and summing the Price column within each group:
print(df.groupby("Weight")["Price"].sum())
Grouping by both the Weight and Fruit columns and summing the remaining column (Price) within each group:
print(df.groupby(["Weight", "Fruit"]).sum())
You can also apply a custom function to each group:
# Importing NumPy
import numpy as np
def grouped_func(g):
    # Select Price explicitly so the median is taken over numeric values only
    median_val = np.median(g["Price"]) * 100
    return pd.Series({"Median * 100": median_val})
print(df.groupby(["Fruit", "Weight"]).apply(grouped_func))
Using a lambda function:
print(df.groupby(["Fruit", "Weight"]).apply(lambda g: pd.Series({"Median * 100": np.median(g["Price"]) * 100})))
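For simple statistics, a custom function is often not needed. The sketch below shows two alternatives, assuming the same fruit data: named aggregation with agg to compute several statistics per group in one call, and the built-in median scaled by 100. The names summary, total_price, mean_price, n_rows, and median_x100 are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Mango", "Apple"],
    "Weight": ["1kg", "1kg", "2kg", "2kg", "5kg"],
    "Price": [100, 200, 300, 400, 500],
})

# Named aggregation: each keyword becomes an output column,
# defined as a (source column, aggregation function) pair
summary = df.groupby("Fruit").agg(
    total_price=("Price", "sum"),
    mean_price=("Price", "mean"),
    n_rows=("Price", "count"),
)
print(summary)

# Built-in median per (Fruit, Weight) group, scaled by 100
median_x100 = df.groupby(["Fruit", "Weight"])["Price"].median() * 100
print(median_x100)
```

Built-in aggregations such as sum, mean, and median are generally faster than apply with a Python function, so apply is best kept for logic that the built-ins cannot express.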