Introduction
This section is about grouping data.
Grouping is one of the most important topics in pandas.
groupby is most useful on categorical data, where each distinct value of a column defines one group.
After grouping, printing the result does not show the grouped rows; the only output is the representation of the groupby object:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f677f267130>
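To see what is actually inside the groupby object, you can inspect it or loop over it. Here is a minimal sketch; the small zoo DataFrame and its animal column are made up for illustration and are not the original dataset.

```python
import pandas as pd

# Hypothetical sample data: the rows and the "animal" column name are assumptions.
zoo = pd.DataFrame({
    "animal": ["elephant", "elephant", "tiger", "tiger", "zebra", "zebra"],
    "water_need": [500, 600, 300, 320, 200, 220],
})

grouped = zoo.groupby("animal")

# Printing the object shows only its representation, not the rows.
print(grouped)

# The groups become visible once you ask for them explicitly.
print(grouped.ngroups)             # number of distinct groups
print(grouped.get_group("tiger"))  # rows that belong to one group

# Iterating yields (group name, sub-DataFrame) pairs.
for name, group in grouped:
    print(name, len(group))
```

Nothing is computed until you request a group or apply an aggregation, which is why the bare print shows so little.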
Aggregate Functions in pandas
The most basic aggregation method is counting.
Counting the animals is as easy as applying the pandas count function to the whole zoo DataFrame, which returns the number of non-null values in each column:
zoo.count()
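A quick sketch of how count behaves, using a made-up zoo DataFrame with one missing value (the data and the animal column name are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data with one missing water_need value.
zoo = pd.DataFrame({
    "animal": ["elephant", "elephant", "tiger", "tiger", "zebra", "zebra"],
    "water_need": [500, 600, 300, None, 200, 220],
})

# count() works column by column and skips missing values.
counts = zoo.count()
print(counts)

# len() gives the number of rows regardless of missing values.
total_rows = len(zoo)
print(total_rows)
```

Because count skips missing values, the per-column counts can be lower than the number of rows; if you want the number of rows, len(zoo) is the safer choice.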
Following the same logic, you can easily sum the values in the water_need column by typing:
zoo.water_need.sum()
Finally, let’s calculate statistical averages, like the mean and the median!
The syntax is the same as it was with the other aggregation methods above:
zoo.water_need.mean()
zoo.water_need.median()
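Put together, the column-level aggregations look like this. The sketch uses a small made-up zoo DataFrame, so the numbers differ from the ones in the original dataset.

```python
import pandas as pd

# Hypothetical sample data: the values are invented for illustration.
zoo = pd.DataFrame({
    "animal": ["elephant", "elephant", "tiger", "tiger", "zebra", "zebra"],
    "water_need": [500, 600, 300, 320, 200, 220],
})

# Each call reduces the whole column to a single number.
total = zoo.water_need.sum()
average = zoo.water_need.mean()
middle = zoo.water_need.median()

print(total)    # 2140
print(average)
print(middle)   # 310.0
```

The median sits well below the mean here because the two elephant rows pull the mean upwards.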
As a data scientist, you will probably do segmentations all the time. For instance, it’s nice to know the mean water_need of all animals (we have just learned that it’s 347.72). But very often it’s much more actionable to break this number down, let’s say, by animal type. With that, we can compare the species to each other. (Do lions or zebras drink more?) Or we can find outliers! (Elephants drink a lot!)
Here’s a simple visual showing how pandas performs “segmentation” with groupby and aggregation:
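In code, the same idea can be sketched in three steps: split the rows by animal type, apply an aggregation to each group, and combine the results. The zoo DataFrame below is made up for illustration, and the animal column name is an assumption.

```python
import pandas as pd

# Hypothetical sample data: the rows and the "animal" column name are assumptions.
zoo = pd.DataFrame({
    "animal": ["elephant", "elephant", "tiger", "tiger", "zebra", "zebra"],
    "water_need": [500, 600, 300, 320, 200, 220],
})

# Split by animal type, then take the mean water_need within each group.
mean_by_animal = zoo.groupby("animal")["water_need"].mean()
print(mean_by_animal)

# Several aggregations can be computed per group in one call.
summary = zoo.groupby("animal")["water_need"].agg(["count", "mean", "median"])
print(summary)
```

The result is indexed by the group names, so mean_by_animal["elephant"] returns the elephants’ average directly.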