Python Data Processing: Using the Pandas Module (Part 7)

Generating the data

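import numpy as np
import pandas as pd
from pandas import DataFrame
# Reconstructed from the printed table below; data2 is assumed to be random
# draws, so its values change from run to run.
df = DataFrame({'data1': [1, 2, 3, 2, 5],
                'data2': np.random.randn(5),
                'key1': ['a', 'a', 'b', 'b', 'a'],
                'key2': ['one', 'two', 'one', 'two', 'one']})
print(df)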
#    data1     data2 key1 key2
# 0      1  0.493919    a  one
# 1      2 -0.887280    a  two
# 2      3 -0.267727    b  one
# 3      2  0.048972    b  two
# 4      5  0.309441    a  one

Grouping with groupby

groupby is the most commonly used and most effective grouping function in pandas.

# Group by a single column
group1 = df.groupby('key1')
print(group1)
# Group by several columns
group2 = df.groupby(['key1', 'key2'])
print(group2)
# Iterate over group1: each element is a (key, sub-DataFrame) tuple
groups = list(group1)
print(groups)
# [('a',    data1     data2 key1 key2
# 0      1  0.493919    a  one
# 1      2 -0.887280    a  two
# 4      5  0.309441    a  one), 
#  ('b',    data1     data2 key1 key2
# 2      3 -0.267727    b  one
# 3      2  0.048972    b  two)]
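
The iteration above can also be written as an explicit loop. The following is a minimal sketch, assuming the same table as the one printed earlier with data2 fixed to the printed values; the names group_rows and group_a are illustrative and do not appear in the original.

```python
import pandas as pd

# Same layout as the df above, with data2 fixed to the printed values
df = pd.DataFrame({
    "data1": [1, 2, 3, 2, 5],
    "data2": [0.493919, -0.887280, -0.267727, 0.048972, 0.309441],
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
})

group1 = df.groupby("key1")

# Each iteration yields (group key, sub-DataFrame); the sub-DataFrame
# keeps the row labels of the original frame.
group_rows = {}
for name, sub_df in group1:
    group_rows[name] = list(sub_df.index)
    print(name, group_rows[name])

# get_group fetches a single group directly by its key
group_a = group1.get_group("a")
print(group_a["data1"].tolist())
```

Unpacking the tuple in the loop header is usually easier to read than indexing into the list of tuples.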

# Call aggregation functions and print the result for each group
print(group1.size())
# key1
# a    3
# b    2
# dtype: int64
print(group1.sum(numeric_only=True))  # numeric_only skips the string column key2
#       data1     data2
# key1                 
# a         8 -0.083919
# b         5 -0.218755
print(group1.count())
#       data1  data2  key2
# key1                    
# a         3      3     3
# b         2      2     2
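
In the output above, size() and count() show the same numbers only because the table has no missing values. The sketch below illustrates the difference, assuming one data2 value is replaced by NaN on purpose; that missing value is not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data1": [1, 2, 3, 2, 5],
    # Second value replaced by NaN purely for illustration
    "data2": [0.493919, np.nan, -0.267727, 0.048972, 0.309441],
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
})

grouped = df.groupby("key1")

sizes = grouped.size()    # number of rows per group, NaN rows included
counts = grouped.count()  # number of non-null values per column and group

print(sizes)
print(counts)
```

Here size() still reports 3 rows for group a, while count() reports only 2 non-null data2 values for it.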

print(group1['data1'].agg('mean'))
# key1
# a    2.666667
# b    2.500000
# Name: data1, dtype: float64
print(group1['data1'].agg(['mean','sum']))
#           mean  sum
# key1
# a     2.666667    8
# b     2.500000    5
# Select several columns with a list (double brackets)
print(group1[['data1', 'data2']].agg(['mean', 'sum']))
#          data1         data2
#           mean sum      mean       sum
# key1
# a     2.666667   8 -0.556335 -1.669005
# b     2.500000   5  0.052789  0.105577

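# The calls that printed the next two tables are missing from the source;
# the values are per-group means, so they were presumably:
print(group1.mean(numeric_only=True))
#          data1     data2
# key1
# a     2.666667  0.083070
# b     2.500000 -0.346864
print(group2.mean())
#            data1     data2
# key1 key2
# a    one     3.0  0.556917
#      two     2.0 -0.864626
# b    one     3.0  0.723882
#      two     2.0 -1.417610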

Pivot tables with pivot_table

pivot_table produces results similar to an Excel pivot table and is quite intuitive to read.

print(pd.pivot_table(df, index='key1', columns='key2'))
#      data1         data2          
# key2   one two       one       two
# key1                              
# a        3   2  0.246833  1.018249
# b        3   2 -0.508228  1.298586
print(df.pivot_table(['data1'], index='key1',columns='key2'))
#      data1
# key2   one two
# key1
# a        3   2
# b        3   2

# margins=True adds an "All" row and column holding the overall means
print(df.pivot_table(index='key1', columns='key2', margins=True))
#      data1                    data2
# key2   one  two       All       one       two       All
# key1
# a      3.0  2.0  2.666667  0.246833  1.018249  0.503971
# b      3.0  2.0  2.500000 -0.508228  1.298586  0.395179
# All    3.0  2.0  2.600000 -0.004854  1.158417  0.460454
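
With its default aggregation (the mean), pivot_table gives the same numbers as grouping by both keys, taking the mean, and moving the second key into the columns. The following is a minimal sketch of that equivalence for data1 only, assuming the same table as above; the names via_pivot and via_groupby are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "data1": [1, 2, 3, 2, 5],
    "data2": [0.493919, -0.887280, -0.267727, 0.048972, 0.309441],
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
})

# Pivot table: key1 as rows, key2 as columns, mean of data1 in each cell
via_pivot = df.pivot_table(values="data1", index="key1", columns="key2")

# Same numbers built by hand: group by both keys, take the mean,
# then unstack key2 from the row index into the columns
via_groupby = df.groupby(["key1", "key2"])["data1"].mean().unstack()

print(via_pivot)
print(via_groupby)
```

The pivot_table form is shorter and also offers margins, which the hand-built version does not reproduce.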

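Cross-tabulation with crosstab

crosstab counts how often each combination of the given row and column values occurs, which makes it very convenient; the same result can of course also be obtained with groupby.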

print(pd.crosstab(df.key1, df.key2, margins=True))
# key2  one  two  All
# key1
# a       2    1    3
# b       1    1    2
# All     3    2    5
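
The groupby route mentioned above could look roughly like the following sketch. It assumes the same key columns as the table above and leaves out the margins; the names via_crosstab and via_groupby are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
})

# Frequency table straight from crosstab (no margins)
via_crosstab = pd.crosstab(df["key1"], df["key2"])

# Same counts with groupby: count rows per (key1, key2) pair, then
# unstack key2 into columns; fill_value=0 covers pairs that never occur
via_groupby = df.groupby(["key1", "key2"]).size().unstack(fill_value=0)

print(via_crosstab)
print(via_groupby)
```

crosstab stays the more direct choice when the "All" totals are wanted, since the groupby version would need them added separately.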
