import pandas as pd

# Read the employee dataset and compute the mean salary for each race
employee = pd.read_csv('data/employee.csv')
employee.groupby('race')['base_salary'].mean().astype(int)
race
american indian or alaskan native 60272
asian/pacific islander 61660
black or african american 50137
hispanic/latino 52345
others 51278
white 64419
name: base_salary, dtype: int32
# Group by race and gender, then compute the mean salary
agg = employee.groupby(['race', 'gender'])['base_salary'].mean().astype(int)
agg
'''
race                              gender
american indian or alaskan native female   60238
                                  male     60305
asian/pacific islander            female   63226
                                  male     61033
black or african american         female   48915
                                  male     51082
hispanic/latino                   female   46503
                                  male     54782
others                            female   63785
                                  male     38771
white                             female   66793
                                  male     63940
name: base_salary, dtype: int32
'''
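# stack: converts data from a "table structure" to a "curly-brace structure", i.e. moves the column index into the row index.
# unstack: converts data from a "curly-brace structure" to a "table structure", i.e. moves a level of the row index into the column index.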
agg.unstack('gender')
gender                             female    male
race
american indian or alaskan native   60238   60305
asian/pacific islander              63226   61033
black or african american           48915   51082
hispanic/latino                     46503   54782
others                              63785   38771
white                               66793   63940
# Unstack the race index level instead
agg.unstack('race')
race    american indian or alaskan native  asian/pacific islander  black or african american  hispanic/latino  others  white
gender
female                              60238                   63226                      48915            46503   63785  66793
male                                60305                   61033                      51082            54782   38771  63940
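The relationship between the two operations can be checked on a small example. The sketch below is only an illustration: `toy` is a made-up stand-in for `data/employee.csv` with invented values, and the names `toy_agg`, `wide` and `tall` are hypothetical. It unstacks the `gender` level into columns and then stacks the columns back into the row index.

```python
import pandas as pd

# Hypothetical toy data standing in for data/employee.csv (values are invented)
toy = pd.DataFrame({
    'race': ['a', 'a', 'b', 'b'],
    'gender': ['female', 'male', 'female', 'male'],
    'base_salary': [100, 200, 300, 400],
})

# Series with a two-level (race, gender) row index
toy_agg = toy.groupby(['race', 'gender'])['base_salary'].mean().astype(int)

# unstack: the 'gender' index level becomes the column labels
wide = toy_agg.unstack('gender')

# stack: the column labels move back to the innermost row index level
tall = wide.stack()

print(wide)
print(tall)
```

With no missing combinations in the data, `tall` should carry the same values and the same (race, gender) index as `toy_agg`, which is why stack and unstack are usually described as inverses of each other.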
# Using unstack with a MultiIndex
# Group by race and gender, then compute the mean, maximum and minimum salary
agg2 = employee.groupby(['race', 'gender'])['base_salary'].agg(['mean', 'max', 'min']).astype(int)
agg2
                                            mean     max     min
race                              gender
american indian or alaskan native female   60238   98536   26125
                                  male     60305   81239   26125
asian/pacific islander            female   63226  130416   26125
                                  male     61033  163228   27914
black or african american         female   48915  150416   24960
                                  male     51082  275000   26125
hispanic/latino                   female   46503  126115   26125
                                  male     54782  165216   26104
others                            female   63785   63785   63785
                                  male     38771   38771   38771
white                             female   66793  178331   27955
                                  male     63940  210588   26125
# Here unstack('gender') produces a multi-level column index; stack and unstack can then be used to rearrange the structure
agg2.unstack('gender')
                                     mean              max             min
gender                             female    male   female    male  female    male
race
american indian or alaskan native   60238   60305    98536   81239   26125   26125
asian/pacific islander              63226   61033   130416  163228   26125   27914
black or african american           48915   51082   150416  275000   24960   26125
hispanic/latino                     46503   54782   126115  165216   26125   26104
others                              63785   38771    63785   38771   63785   38771
white                               66793   63940   178331  210588   27955   26125
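One possible way to rearrange the multi-level columns is sketched below. It is an illustration under assumptions, not a step from the original walkthrough: `toy` is invented data standing in for `data/employee.csv`, and `toy_agg2`, `wide` and `by_stat` are hypothetical names. Stacking column level 0 moves the statistic names (mean, max, min) into the row index and leaves `gender` as the only column level.

```python
import pandas as pd

# Hypothetical toy data standing in for data/employee.csv (values are invented)
toy = pd.DataFrame({
    'race': ['a', 'a', 'a', 'b', 'b', 'b'],
    'gender': ['female', 'male', 'male', 'female', 'female', 'male'],
    'base_salary': [100, 200, 400, 300, 500, 600],
})

toy_agg2 = (
    toy.groupby(['race', 'gender'])['base_salary']
    .agg(['mean', 'max', 'min'])
    .astype(int)
)

# Columns become a two-level MultiIndex: (statistic, gender)
wide = toy_agg2.unstack('gender')

# Move column level 0 (the statistic) into the row index;
# the rows are now keyed by (race, statistic) and the columns by gender
by_stat = wide.stack(level=0)

print(wide)
print(by_stat)
```

The order in which the stacked statistic labels appear in the rows can differ between pandas versions, so it is safer to select rows by label, for example `by_stat.loc[('a', 'max')]`, than by position.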