3 A First Look at Pandas Data: Adjusting the Index

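(1) Reassigning the index

In a pandas object the index is itself an object, so it can be replaced by assigning new labels to it.

For example: df.index = ['a', 'b', 'c']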

>>> df = 
>>> df = pd.DataFrame(df)
>>> df
        one       two     three
0 -0.996986  0.190981  0.482912
1 -0.233812 -0.140953  0.052706
2  0.470900  0.590664  0.486823
# set the index
>>> df.index = ['a', 'b', 'c']
>>> df
        one       two     three
a -0.996986  0.190981  0.482912
b -0.233812 -0.140953  0.052706
c  0.470900  0.590664  0.486823
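
The construction of df is not shown in full above, so here is a minimal self-contained sketch of the same step. It assumes the data is a 3x3 block of standard-normal values; the seed, the use of np.random.default_rng and the resulting numbers are illustrative assumptions, not the original data.

```python
import numpy as np
import pandas as pd

# Assumed data: 3 rows x 3 columns of standard-normal values (not the original numbers).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 3)), columns=["one", "two", "three"])
print(df.index)  # the default integer index

# Assigning to .index replaces the row labels in place.
df.index = ["a", "b", "c"]
print(df)

# The new labels must match the number of rows; a shorter list is rejected.
try:
    df.index = ["a", "b"]
except ValueError as exc:
    print("ValueError:", exc)
```

Because the assignment modifies df itself, no new object is created, which is the main difference from reindex().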

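(2) Setting a new index

reindex(): conforms the data to a new index and returns a new pandas object.

reindex() does more than relabel a DataFrame; it also filters. Rows whose labels are not listed are dropped, and listed labels that do not exist in the original are added and filled with NaN.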

>>> new = df.reindex(['b', 'c', 'e'])
>>> new
        one       two     three
b -0.233812 -0.140953  0.052706
c  0.470900  0.590664  0.486823
e       NaN       NaN       NaN
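
The filtering behaviour can be checked with a small sketch. The frame below uses made-up values, and fill_value is shown as one optional way to replace the default NaN for labels that were not in the original index.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

# 'a' is not listed, so it is filtered out; 'e' is new, so it is filled with NaN.
new = df.reindex(["b", "c", "e"])
print(new)

# fill_value replaces the default NaN for labels missing from the original.
filled = df.reindex(["b", "c", "e"], fill_value=0.0)
print(filled)

# reindex() returns a new object; the original index is untouched.
print(df.index.tolist())
```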

reindex() can also be used to reorder the columns; in that case set the axis parameter to 'columns' or 1:

>>> df.reindex(['three', 'two', 'one'], axis='columns')
      three       two       one
a  0.766450  0.452801  1.286715
b  0.342262  1.523188  0.620788
c  0.867786  0.758714 -2.343242
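
As a rough comparison, the sketch below (with made-up integer data) shows that passing axis='columns' and passing the columns= keyword give the same result, and that plain column selection produces the same order when every name exists.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame(
    {"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]},
    index=["a", "b", "c"],
)

order = ["three", "two", "one"]

by_axis = df.reindex(order, axis="columns")  # labels plus an explicit axis
by_keyword = df.reindex(columns=order)       # same thing via the columns= keyword
selected = df[order]                         # plain selection, same order

print(by_axis)
print(by_axis.equals(by_keyword))
```

The practical difference is in how unknown names are handled: reindex() adds an unknown column filled with NaN, while df[...] raises a KeyError.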

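(3) set_index() makes a chosen column the index. This is very useful for date data or for data whose rows are distinguished by name; later examples will cover it in more detail.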

>>> df.set_index('one')
                two     three
one
-0.996986  0.190981  0.482912
-0.233812 -0.140953  0.052706
 0.470900  0.590664  0.486823

After the call above, the column that became the index is no longer kept among the columns. To keep it in the data as well, set the drop parameter to False:

>>> df.set_index('one', drop=False)
                one       two     three
one
 1.286715  1.286715  0.452801  0.766450
 0.620788  0.620788  1.523188  0.342262
-2.343242 -2.343242  0.758714  0.867786
>>> df
        one       two     three
a  1.286715  0.452801  0.766450
b  0.620788  1.523188  0.342262
c -2.343242  0.758714  0.867786

                  two     three
  one
a  1.286715  0.452801  0.766450
b  0.620788  1.523188  0.342262
c -2.343242  0.758714  0.867786
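
To make the date use case concrete, here is a small sketch of set_index() with and without drop. The frame sales and its columns date and amount are hypothetical names invented for this example.

```python
import pandas as pd

# Hypothetical data: one row per day.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "amount": [10, 20, 30],
    }
)

indexed = sales.set_index("date")           # 'date' moves out of the columns
kept = sales.set_index("date", drop=False)  # 'date' is the index and stays a column

print(indexed.columns.tolist())
print(kept.columns.tolist())

# Rows can now be looked up by date label.
print(indexed.loc[pd.Timestamp("2024-01-02"), "amount"])

# set_index() returns a new object; sales keeps its default integer index.
print(sales.index.tolist())
```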

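Index labels and column names can be changed with the rename() method, combined with a dictionary, a Series, or a function that maps each label to a new one.

(1) For the dictionary form, see part six of "3 - Differences between pandas Series and DataFrame".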

(2) Mapping with a function

Example: pass in the string upper-casing function to change the column labels

>>> df.rename(columns=str.upper)
        ONE       TWO     THREE
a -0.996986  0.190981  0.482912
b -0.233812 -0.140953  0.052706
c  0.470900  0.590664  0.486823

Example: with a lambda, upper-case the first 2 characters of every column name and lower-case the rest

>>> df.rename(columns=lambda x: x[:2].upper() + x[2:].lower())
        ONe       TWo     THree
a -0.996986  0.190981  0.482912
b -0.233812 -0.140953  0.052706
c  0.470900  0.590664  0.486823
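
The dictionary and function forms can be put side by side in a short sketch. The data and the new label names (row_a, first) are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame(
    {"one": [1, 2], "two": [3, 4], "three": [5, 6]},
    index=["a", "b"],
)

# Dictionary: only the labels listed as keys are changed.
by_dict = df.rename(index={"a": "row_a"}, columns={"one": "first"})

# Function: called once for every label on the chosen axis.
upper = df.rename(columns=str.upper)
mixed = df.rename(columns=lambda x: x[:2].upper() + x[2:].lower())

print(by_dict)
print(upper.columns.tolist())  # ['ONE', 'TWO', 'THREE']
print(mixed.columns.tolist())  # ['ONe', 'TWo', 'THree']
```

rename() returns a new object by default, so df keeps its original labels unless the result is assigned back.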

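Hierarchical indexing makes it possible to represent higher-dimensional data with Series and DataFrame.

In other words, if a DataFrame is stacked (more than one index is needed on a single axis), hierarchical indexing is the tool to use. It is somewhat similar to the Panel type, which has since been removed from pandas. In practice, however, it is not used very often!

Creating a hierarchical index: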

>>> data = pd.Series(np.random.randn(5), index=[['a', 'a', 'b', 'b', 'b'], ['a1', 'a2', 'b1', 'b2', 'b3']])
>>> data
a  a1    0.792324
   a2   -0.650764
b  b1   -0.282874
   b2   -1.402477
   b3   -3.551578
dtype: float64
# inspect the index
>>> data.index
MultiIndex(levels=[['a', 'b'], ['a1', 'a2', 'b1', 'b2', 'b3']],
           codes=[[0, 0, 1, 1, 1], [0, 1, 2, 3, 4]])
# levels holds the labels of each index level; codes (called labels before pandas 0.24) records, for each element, the position of its label within the corresponding level
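
The relationship between levels and codes can be verified with a short sketch. It builds the same index explicitly with pd.MultiIndex.from_arrays and uses np.arange values instead of random numbers so the result is reproducible; both choices are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

outer = ["a", "a", "b", "b", "b"]
inner = ["a1", "a2", "b1", "b2", "b3"]

# Equivalent to passing the two lists directly as index=[outer, inner].
index = pd.MultiIndex.from_arrays([outer, inner])
data = pd.Series(np.arange(5.0), index=index)

print(index.levels)  # the unique labels of each level
print(index.codes)   # per element, the position of its label in each level

# Rebuild the labels of the first element from levels and codes.
first = tuple(level[code[0]] for level, code in zip(index.levels, index.codes))
print(first)

# Selecting by an outer label returns everything stored under it.
print(data.loc["b"])
```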

Every index has a names attribute; the names of the index levels can be set and changed through .index.names.

>>> data.index.names = ['first', 'second']
>>> data
first  second
a      a1        0.792324
       a2       -0.650764
b      b1       -0.282874
       b2       -1.402477
       b3       -3.551578
dtype: float64
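
One reason to name the levels is that the names can then stand in for level numbers. The sketch below uses made-up values; the groupby call and the alternative names outer and inner are illustrative choices, not part of the original example.

```python
import pandas as pd

# Made-up values for illustration.
data = pd.Series(
    [1.0, 2.0, 3.0],
    index=[["a", "a", "b"], ["a1", "a2", "b1"]],
)
print(data.index.names)  # both levels start out unnamed

# Assigning to .names labels the levels in place.
data.index.names = ["first", "second"]

# A level name can be used wherever a level number is accepted.
per_group = data.groupby(level="first").sum()
print(per_group)

# set_names() returns a renamed copy of the index and leaves data.index untouched.
renamed_index = data.index.set_names(["outer", "inner"])
print(renamed_index.names)
```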

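Reordering levels applies to a DataFrame that has more than one index level on an axis.

(1) swaplevel(): swaps two index levels; with axis=1, as here, the swap is done on the columns axis.

>>> df.swaplevel(0, 1, axis=1)

(2) reorder_levels(): specifies the order of several levels at once.

(3) Data can still be extracted with iloc[] and loc[].

(4) unstack(): for a DataFrame with several index levels on the index axis, moves the specified level (the level parameter) to the columns, producing a new DataFrame.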
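
The four operations listed above can be tried together on one small frame. This is only a sketch: the level names first and second, the labels q1 and q2, and the column names x and y are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("a", "q1"), ("a", "q2"), ("b", "q1"), ("b", "q2")],
    names=["first", "second"],
)
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8]}, index=index)

# (1) swaplevel(): exchange two levels (on the index axis by default).
swapped = df.swaplevel(0, 1)

# (2) reorder_levels(): give the complete new order, here by level name.
reordered = df.reorder_levels(["second", "first"])

# (3) loc[] and iloc[] still work; a tuple addresses one full key.
row = df.loc[("a", "q2"), :]  # a single row
block = df.loc["b"]           # every row under 'b'
first_row = df.iloc[0]        # selection by position

# (4) unstack(): move the 'second' level from the index to the columns.
wide = df.unstack(level="second")
print(wide)

# With two column levels available, swaplevel() can act on axis=1.
cols_swapped = wide.swaplevel(0, 1, axis=1)
print(cols_swapped.columns.tolist())
```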
