Grouped data operations in pandas: groupby

2021-08-14 10:24:42

(1) Grouping by column

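import pandas as pd
import numpy as np
df = pd.DataFrame({  # values copied from the output printed below
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one']})
df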

      data1     data2 key1 key2
0 -1.488061 -0.002241    a  one
1  0.707773  0.338733    a  two
2 -1.689161  0.647643    b  one
3  0.987463 -0.584322    b  two
4 -0.560973 -1.147191    a  one

Group by a single column name, 'key1':

group1 = df.groupby('key1')  

[x for x in group1]

[('a',
       data1     data2 key1 key2
  0 -1.488061 -0.002241    a  one
  1  0.707773  0.338733    a  two
  4 -0.560973 -1.147191    a  one),
 ('b',
       data1     data2 key1 key2
  2 -1.689161  0.647643    b  one
  3  0.987463 -0.584322    b  two)]

Group by several column names, ['key1', 'key2']:

group2 = df.groupby(['key1','key2'])  

[x for x in group2]

[(('a', 'one'),
       data1     data2 key1 key2
  0 -1.488061 -0.002241    a  one
  4 -0.560973 -1.147191    a  one),
 (('a', 'two'),
       data1     data2 key1 key2
  1  0.707773  0.338733    a  two),
 (('b', 'one'),
       data1     data2 key1 key2
  2 -1.689161  0.647643    b  one),
 (('b', 'two'),
       data1     data2 key1 key2
  3  0.987463 -0.584322    b  two)]

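Here group1 is an intermediate grouping variable of GroupBy type (a DataFrameGroupBy object). It records how the rows are split into groups, but no result is computed until a function is applied to it.

The list comprehension [x for x in group1] is used only to display the contents of each group.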
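A GroupBy object can also be inspected without building the whole list. The following is a minimal sketch, assuming the same df values as printed above; the names group_rows and b_rows are introduced here only for illustration.

```python
import pandas as pd

# Same values as the df printed earlier in this article.
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

group1 = df.groupby('key1')

# Iterating yields (group key, sub-DataFrame) pairs.
group_rows = {name: list(sub.index) for name, sub in group1}
print(group_rows)

# get_group() pulls out a single group by its key.
b_rows = group1.get_group('b')
print(b_rows)

# ngroups is the number of distinct groups.
print(group1.ngroups)
```

Iterating or calling get_group() is usually enough for a quick look at how the rows were split, and it avoids printing every group at once.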

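(2) Group statistics

Applying statistical functions such as size(), sum(), and count() to group1 and group2 gives, respectively, the number of rows in each group, the per-group sum of each column, and the per-group count of non-null values in each column.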

group1.size()
key1
a    3
b    2
dtype: int64

# numeric_only=True keeps the string column key2 out of the sum
group1.sum(numeric_only=True)
         data1     data2
key1
a    -1.341260 -0.810698
b    -0.701698  0.063321

group2.size()
key1  key2
a     one     2
      two     1
b     one     1
      two     1
dtype: int64

group2.count()
           data1  data2
key1 key2
a    one       2      2
     two       1      1
b    one       1      1
     two       1      1
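On the df above, size() and count() give the same numbers because no value is missing. The sketch below uses a small hypothetical frame, df_nan (not part of the original article), with one NaN to show where the two differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same keys as the article's df, but one value is missing.
df_nan = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'data1': [1.0, np.nan, 3.0, 4.0, 5.0],
})

grouped = df_nan.groupby('key1')

sizes = grouped.size()             # rows per group, NaN rows included
counts = grouped['data1'].count()  # non-null values per group
sums = grouped['data1'].sum()      # NaN is skipped when summing

print(sizes)
print(counts)
print(sums)
```

Group 'a' has three rows but only two non-null values, so size() reports 3 while count() reports 2.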

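(3) agg()

agg(func) applies the function func to one or more columns of the grouped data. It also generalizes to applying several functions to several columns at once.

Example: compute the mean of the 'data1' column after grouping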

group1['data1'].agg('mean')
key1
a   -0.447087
b   -0.350849
Name: data1, dtype: float64
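Besides function names given as strings, agg() accepts a Python callable that reduces each group to a single value. A minimal sketch, assuming the same df values as above; value_range is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

# Same values as the df printed earlier in this article.
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})


def value_range(s):
    """Hypothetical helper: largest value minus smallest value in a group."""
    return s.max() - s.min()


# Each group's 'data1' values are passed to value_range as a Series.
ranges = df.groupby('key1')['data1'].agg(value_range)
print(ranges)
```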

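Example: compute both the mean and the sum of the 'data1' and 'data2' columns after grouping. Several columns are selected with a list, i.e. double brackets; the tuple form group1['data1','data2'] no longer works in recent pandas versions.

group1[['data1','data2']].agg(['mean','sum'])
         data1               data2
          mean       sum      mean       sum
key1
a    -0.447087 -1.341260 -0.270233 -0.810698
b    -0.350849 -0.701698  0.031660  0.063321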
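When different columns need different functions, agg() can also take a dict that maps column names to functions, or keyword arguments of the form new_name=(column, function) (named aggregation, available from pandas 0.25). The sketch below assumes the same df values as above; the result names per_column, named, data1_mean and data2_sum are chosen here for illustration.

```python
import pandas as pd

# Same values as the df printed earlier in this article.
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

# Dict form: one function for data1, two functions for data2.
per_column = df.groupby('key1').agg({'data1': 'mean', 'data2': ['sum', 'max']})
print(per_column)

# Named aggregation: each keyword becomes a flat output column name.
named = df.groupby('key1').agg(
    data1_mean=('data1', 'mean'),
    data2_sum=('data2', 'sum'),
)
print(named)
```

The dict form produces two-level column labels, while named aggregation produces flat column names, which can be easier to work with afterwards.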

         data1     data2
key1
a    -0.447087 -0.270233
b    -0.350849  0.031660

              data1     data2
key1 key2
a    one  -1.024517 -0.574716
     two   0.707773  0.338733
b    one  -1.689161  0.647643
     two   0.987463 -0.584322

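(5) reset_index()

reset_index() moves the group keys of a groupby() result out of the index and back into ordinary columns, giving a flat DataFrame that is convenient to save.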

group1[['data1','data2']].agg(['mean','sum']).reset_index()
  key1     data1               data2
            mean       sum      mean       sum
0    a -0.447087 -1.341260 -0.270233 -0.810698
1    b -0.350849 -0.701698  0.031660  0.063321
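Before saving such a result, it can help to flatten the two-level column labels as well. The following is a sketch of one way to do this, assuming the same df values as above; the joined names such as data1_mean are a naming choice made here, and to_csv() is called without a path so that it returns the CSV text instead of writing a file.

```python
import pandas as pd

# Same values as the df printed earlier in this article.
df = pd.DataFrame({
    'data1': [-1.488061, 0.707773, -1.689161, 0.987463, -0.560973],
    'data2': [-0.002241, 0.338733, 0.647643, -0.584322, -1.147191],
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

flat = (
    df.groupby('key1')[['data1', 'data2']]
    .agg(['mean', 'sum'])
    .reset_index()  # key1 becomes an ordinary column
)

# Join the two column levels: ('data1', 'mean') -> 'data1_mean', ('key1', '') -> 'key1'.
flat.columns = ['_'.join(col).rstrip('_') for col in flat.columns]

# With no path argument, to_csv() returns the CSV content as a string.
csv_text = flat.to_csv(index=False)
print(csv_text)
```

Passing a file name to to_csv() instead would write the same content to disk.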
