Unstacking after a groupby aggregation

2021-10-20 10:48:12

# Read the employee dataset and compute the mean base salary for each race

import pandas as pd

employee = pd.read_csv('data/employee.csv')

employee.groupby('race')['base_salary'].mean().astype(int)

race
american indian or alaskan native    60272
asian/pacific islander               61660
black or african american            50137
hispanic/latino                      52345
others                               51278
white                                64419
Name: base_salary, dtype: int32

# Group by both race and gender, then compute the mean salary

agg = employee.groupby(['race', 'gender'])['base_salary'].mean().astype(int)

agg

'''
race                               gender
american indian or alaskan native  female   60238
                                   male     60305
asian/pacific islander             female   63226
                                   male     61033
black or african american          female   48915
                                   male     51082
hispanic/latino                    female   46503
                                   male     54782
others                             female   63785
                                   male     38771
white                              female   66793
                                   male     63940
Name: base_salary, dtype: int32
'''

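# stack: reshapes data from a "table" layout into a nested "curly brace" layout, i.e. it moves the column labels into the row index.

# unstack: reshapes data from the nested "curly brace" layout back into a "table" layout, i.e. it moves a level of the row index into the column labels.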
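To make the two definitions above concrete, here is a minimal sketch on a tiny hand-made table. The `wide` frame and its values are invented for illustration and do not come from employee.csv.

```python
import pandas as pd

# Hypothetical toy data (not from employee.csv): rows are races, columns are genders.
wide = pd.DataFrame(
    {"female": [10, 30], "male": [20, 40]},
    index=pd.Index(["a", "b"], name="race"),
)
wide.columns.name = "gender"

# stack: the column labels become the innermost level of the row index,
# giving a Series indexed by (race, gender).
long = wide.stack()

# unstack: the innermost row-index level ("gender") moves back to the columns.
round_trip = long.unstack()

print(long)
print(round_trip)
```

In this sketch `long` has the same shape as `agg` in the article (a Series with a two-level row index), and `round_trip` has the same shape as the result of `agg.unstack('gender')`.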

agg.unstack('gender')

gender                               female    male
race
american indian or alaskan native     60238   60305
asian/pacific islander                63226   61033
black or african american             48915   51082
hispanic/latino                       46503   54782
others                                63785   38771
white                                 66793   63940
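The same race-by-gender table can usually be produced in one call with `pivot_table`. The sketch below compares the two routes on a small invented frame; `employee_demo` and its values are assumptions standing in for employee.csv.

```python
import pandas as pd

# Hypothetical stand-in for employee.csv with only the three columns used here.
employee_demo = pd.DataFrame({
    "race": ["white", "white", "asian", "asian", "white"],
    "gender": ["female", "male", "female", "male", "male"],
    "base_salary": [60000, 50000, 70000, 65000, 54000],
})

# Route 1: group by both keys, aggregate, then unstack the gender level.
via_unstack = (
    employee_demo.groupby(["race", "gender"])["base_salary"].mean().unstack("gender")
)

# Route 2: pivot_table does the grouping and the reshaping in one call.
via_pivot = employee_demo.pivot_table(
    index="race", columns="gender", values="base_salary", aggfunc="mean"
)

print(via_unstack)
print(via_pivot)
```

Grouping first and unstacking afterwards keeps the intermediate Series available for other uses. `pivot_table` is shorter when only the final table is needed.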

# Unstack the race index level instead

agg.unstack('race')

race    american indian or alaskan native  asian/pacific islander  black or african american  hispanic/latino  others  white
gender
female                              60238                   63226                      48915            46503   63785  66793
male                                60305                   61033                      51082            54782   38771  63940
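The level to unstack can be given by name or by position, and with no argument pandas unstacks the innermost level. The sketch below shows this on an invented Series, `agg_demo`, in which one race and gender combination is deliberately missing so the effect of `fill_value` is visible.

```python
import pandas as pd

# Hypothetical aggregated Series; the ("b", "female") combination is missing on purpose.
idx = pd.MultiIndex.from_tuples(
    [("a", "female"), ("a", "male"), ("b", "male")],
    names=["race", "gender"],
)
agg_demo = pd.Series([1, 2, 3], index=idx)

by_name = agg_demo.unstack("race")   # select the level by name
by_position = agg_demo.unstack(0)    # level 0 is "race", so this is the same table
default = agg_demo.unstack()         # no argument: innermost level ("gender")

# Missing combinations become NaN unless a fill_value is supplied.
filled = agg_demo.unstack("gender", fill_value=0)

print(by_name)
print(default)
print(filled)
```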

# unstack with a MultiIndex

# Group by race and gender, then compute the mean, maximum, and minimum salary

agg2 = employee.groupby(['race', 'gender'])['base_salary'].agg(['mean', 'max', 'min']).astype(int)

agg2

                                             mean     max     min
race                               gender
american indian or alaskan native  female   60238   98536   26125
                                   male     60305   81239   26125
asian/pacific islander             female   63226  130416   26125
                                   male     61033  163228   27914
black or african american          female   48915  150416   24960
                                   male     51082  275000   26125
hispanic/latino                    female   46503  126115   26125
                                   male     54782  165216   26104
others                             female   63785   63785   63785
                                   male     38771   38771   38771
white                              female   66793  178331   27955
                                   male     63940  210588   26125

# unstack('gender') now produces multi-level column labels; stack and unstack can be used to rearrange the structure

agg2.unstack('gender')

                                       mean              max              min
gender                               female    male  female    male  female    male
race
american indian or alaskan native     60238   60305   98536   81239   26125   26125
asian/pacific islander                63226   61033  130416  163228   26125   27914
black or african american             48915   51082  150416  275000   24960   26125
hispanic/latino                       46503   54782  126115  165216   26125   26104
others                                63785   38771   63785   38771   63785   38771
white                                 66793   63940  178331  210588   27955   26125
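Once the columns have two levels, they can be rearranged or collapsed. The sketch below shows two possible adjustments, assuming an invented `employee_demo` frame in place of employee.csv. The names `agg2_demo`, `by_gender`, and `flat` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for employee.csv (values invented for illustration).
employee_demo = pd.DataFrame({
    "race": ["white", "white", "asian", "asian", "white"],
    "gender": ["female", "male", "female", "male", "male"],
    "base_salary": [60000, 50000, 70000, 65000, 54000],
})
agg2_demo = (
    employee_demo.groupby(["race", "gender"])["base_salary"]
    .agg(["mean", "max", "min"])
    .astype(int)
)

# Two column levels: the statistic on top, gender underneath.
wide = agg2_demo.unstack("gender")

# Put gender on top instead, then sort so each gender's columns sit together.
by_gender = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Or collapse the two levels into single labels such as "mean_female".
flat = wide.copy()
flat.columns = [f"{stat}_{gender}" for stat, gender in wide.columns]

print(by_gender)
print(flat)
```

Flat column labels are often easier to export or to merge with other tables, while the two-level form keeps related columns selectable as a group, for example `by_gender["female"]`.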
