(1) Grouping by column
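import pandas as pd
import numpy as np
# Values copied from the output shown below, so the example is reproducible
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one']})
df
      data1     data2 key1 key2
0 -1.488061 -0.002241    a  one
1  0.707773  0.338733    a  two
2 -1.689161  0.647643    b  one
3  0.987463 -0.584322    b  two
4 -0.560973 -1.147191    a  one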
Group by a single column name, 'key1':
group1 = df.groupby('key1')
[x for x in group1]
[('a',
       data1     data2 key1 key2
  0 -1.488061 -0.002241    a  one
  1  0.707773  0.338733    a  two
  4 -0.560973 -1.147191    a  one),
 ('b',
       data1     data2 key1 key2
  2 -1.689161  0.647643    b  one
  3  0.987463 -0.584322    b  two)]
Group by several column names, ['key1', 'key2']:
group2 = df.groupby(['key1','key2'])
[x for x in group2]
[(('a', 'one'),
       data1     data2 key1 key2
  0 -1.488061 -0.002241    a  one
  4 -0.560973 -1.147191    a  one),
 (('a', 'two'),
       data1     data2 key1 key2
  1  0.707773  0.338733    a  two),
 (('b', 'one'),
       data1     data2 key1 key2
  2 -1.689161  0.647643    b  one),
 (('b', 'two'),
       data1     data2 key1 key2
  3  0.987463 -0.584322    b  two)]
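Here group1 is an intermediate grouping variable of the GroupBy type; it holds the grouping but computes nothing until a function is applied to it.
The comprehension [x for x in group1] is used to display the contents of each group as (key, sub-DataFrame) pairs.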
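As a small illustration of what iterating over a GroupBy gives you, here is a sketch that rebuilds the same df from the values printed above. The names rows_per_key, pair_keys and group_a are made up for this example; get_group() is a standard pandas method for pulling out a single group by its key.

```python
import pandas as pd

# Same values as the df printed above
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

group1 = df.groupby('key1')
group2 = df.groupby(['key1', 'key2'])

# Iterating yields (key, sub-DataFrame) pairs
rows_per_key = {key: len(sub) for key, sub in group1}

# With several grouping columns, each key is a tuple
pair_keys = [key for key, _ in group2]

# get_group() fetches one group directly, without looping
group_a = group1.get_group('a')
```

Unpacking each pair as `key, sub` is usually easier to read than printing the whole list of tuples.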
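(2) Group statistics
Applying statistical functions such as size(), sum() and count() to group1 and group2 gives, respectively, the number of rows in each group, the per-group sum of each column, and the per-group count of non-null values in each column.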
group1.size()
key1
a 3
b 2
dtype: int64
group1.sum()
         data1     data2
key1
a    -1.341260 -0.810698
b    -0.701698  0.063321
group2.size()
key1  key2
a     one     2
      two     1
b     one     1
      two     1
dtype: int64
group2.count()
           data1  data2
key1 key2
a    one       2      2
     two       1      1
b    one       1      1
     two       1      1
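In the example above size() and count() return the same numbers because df has no missing values. The sketch below uses a hypothetical frame, demo, with one NaN to show where they differ: size() counts rows, while count() counts only non-null entries. The frame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in group 'a'
demo = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'data1': [1.0, np.nan, 3.0, 4.0, 5.0],
})
g = demo.groupby('key1')

sizes = g.size()              # rows per group, NaN rows included
counts = g['data1'].count()   # non-null values per group
sums = g['data1'].sum()       # NaN is skipped when summing
```

Here group 'a' has three rows but only two non-null values in data1.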
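(3) agg()
agg(func) applies the function func to one or more columns of the grouped data; it also extends to applying several functions to several columns at once.
Example: take the mean of the 'data1' column after grouping
group1['data1'].agg('mean')
key1
a   -0.447087
b   -0.350849
Name: data1, dtype: float64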
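Example: compute both the mean and the sum of the 'data1' and 'data2' columns after grouping (select several columns with a list; indexing with a bare tuple such as group1['data1','data2'] is rejected by recent pandas versions)
group1[['data1','data2']].agg(['mean','sum'])
         data1               data2
          mean       sum      mean       sum
key1
a    -0.447087 -1.341260 -0.270233 -0.810698
b    -0.350849 -0.701698  0.031660  0.063321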
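agg() can also take a different set of functions for each column. The following sketch shows two common forms, a dict keyed by column name and keyword-style named aggregation; the output names data1_mean and data2_sum are arbitrary choices for this example, and df is rebuilt from the values printed earlier.

```python
import pandas as pd

# Same values as the df printed above
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})
group1 = df.groupby('key1')

# A dict maps each column to its own list of functions;
# the result has two-level (column, function) column labels
per_column = group1.agg({'data1': ['mean'], 'data2': ['min', 'max']})

# Named aggregation: output_name=(column, function) gives flat column names
named = group1.agg(
    data1_mean=('data1', 'mean'),
    data2_sum=('data2', 'sum'),
)
```

Named aggregation is convenient when the result is going to be saved, since it avoids the two-level column header.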
Per-group means of data1 and data2, for group1 and then for group2:
         data1     data2
key1
a    -0.447087 -0.270233
b    -0.350849  0.031660
              data1     data2
key1 key2
a    one  -1.024517 -0.574716
     two   0.707773  0.338733
b    one  -1.689161  0.647643
     two   0.987463 -0.584322
(5) reset_index()
reset_index() moves the group keys out of the index of a groupby() result and back into ordinary columns, giving a flat DataFrame that is easier to save.
group1[['data1','data2']].agg(['mean','sum']).reset_index()
  key1     data1               data2
            mean       sum      mean       sum
0    a -0.447087 -1.341260 -0.270233 -0.810698
1    b -0.350849 -0.701698  0.031660  0.063321
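Two related options, sketched here under the same df as above: passing as_index=False to groupby() gives the same layout as calling reset_index() afterwards, and the two-level column header produced by agg([...]) can be joined into single names before saving. The '_' separator and the variable names are arbitrary choices for this example.

```python
import pandas as pd

# Same values as the df printed above
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

# reset_index() moves the group key from the index into a column
means = df.groupby('key1')[['data1', 'data2']].mean().reset_index()

# as_index=False keeps the key as a column from the start
means_alt = df.groupby('key1', as_index=False)[['data1', 'data2']].mean()

# Join the two-level (column, function) labels into flat names
stats = df.groupby('key1')[['data1', 'data2']].agg(['mean', 'sum'])
stats.columns = ['_'.join(col) for col in stats.columns]
stats = stats.reset_index()
```

After flattening, stats can be written out with a single header row.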