Data Analysis with Pandas (Part 1)

First, import the required modules:

import pandas as pd
from pandas import Series, DataFrame
import numpy as np

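A Series is an object similar to a one-dimensional array. It is made up of two parts:

values: a set of data (an ndarray)

index: the index labels associated with the data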

There are two ways to create one:

(1) From a list or a NumPy array

The default index is an integer index running from 0 to n-1.

# Create a Series from a list
Series(data=[1,2,3,4,5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

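# Create a Series from a NumPy array
# The index parameter sets the index labels, and name sets the Series name
# The values are random, so the output differs on every run
Series(data=np.random.randint(1,40,size=(5,)),index=['a','d','f','g','t'],name='bobo')

a     3
d    22
f    35
g    19
t    21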

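# (2) Create a Series from a dictionary: the keys become the index labels, so index is not needed to label the data. The implicit positional index (0 to n-1) still exists.

dic =
s = Series(data=dic)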
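The dictionary used in the original is not shown, so the following is only a minimal sketch with a hypothetical `scores` dictionary (the keys and values are invented for illustration). It shows the keys becoming the index, positional access still working, and what passing `index` does when the data is a dictionary.

```python
import pandas as pd

# Hypothetical data: the dictionary in the original post is not shown
scores = {"math": 90, "english": 85, "physics": 70}

s = pd.Series(data=scores)
print(s.index.tolist())  # the labels come from the dictionary keys
print(s.iloc[0])         # positional (implicit) access still works

# With dictionary data, index selects and reorders keys instead of relabeling;
# a label that is not a key in the dictionary gets NaN
subset = pd.Series(data=scores, index=["physics", "math", "history"])
print(subset)
```

Because a missing key introduces NaN, `subset` is stored as floats even though the dictionary holds integers.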

Square brackets with a single index label return one element (a scalar of the element type). Square brackets containing a list of labels return a Series.

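(1) Explicit indexing:

- Use an element of index as the key

- Use s.loc (recommended). Note that the brackets after loc must contain explicit index labels

(2) Implicit indexing:

- Use an integer position as the key

- Use s.iloc (recommended). The brackets after iloc must contain implicit (integer) positions, for example s.iloc[0:2]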
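The two indexing styles above can be sketched as follows. The Series and its labels are made up for illustration; the point is the difference between label-based `loc` and position-based `iloc`, including how their slices treat the end point.

```python
import pandas as pd

# Illustrative Series with string labels
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

single = s.loc["b"]            # one label: returns a scalar
several = s.loc[["a", "c"]]    # list of labels: returns a Series
label_slice = s.loc["a":"c"]   # label slices include the end label
pos_slice = s.iloc[0:2]        # positional slices exclude the end position

print(single)
print(several)
print(label_slice)
print(pos_slice)
```

Using `loc` and `iloc` explicitly avoids ambiguity when the index labels are themselves integers.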

A Series can be viewed as a fixed-length, ordered dictionary.

Attributes such as shape, size, index, and values describe a Series:

s.index
s.values

s.head() and s.tail() return the first n and the last n values:

s.head(1)

Remove duplicate values from a Series:

s = Series(data=[1,1,2,2,3,3,4,4,4,4,4,5,6,7,55,55,44])
s.unique()  # array([ 1,  2,  3,  4,  5,  6,  7, 55, 44], dtype=int64)

Adding two Series: values whose index labels match are added together, and labels present in only one of the two Series produce NaN.

s1 = Series([1,2,3,4,5],index=['a','b','c','d','e'])
s2 = Series([1,2,3,4,5],index=['a','b','f','c','e'])
s = s1+s2
s

a     2.0
b     4.0
c     7.0
d     NaN
e    10.0
f     NaN
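If NaN for unmatched labels is not wanted, one option (not used in the original post) is `Series.add` with `fill_value`, which treats a label missing from one side as the given value. A minimal sketch using the same two Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "f", "c", "e"])

plain = s1 + s2                    # unmatched labels d and f become NaN
filled = s1.add(s2, fill_value=0)  # a label missing on one side counts as 0

print(plain)
print(filled)
```

Whether 0 is a sensible stand-in depends on what the data means, so this is a choice to make per dataset rather than a default.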

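Missing data can be detected with the functions pd.isnull() and pd.notnull(), or with the methods s.isnull() and s.notnull().

s.notnull()  # True where the value is not missing

a     True
b     True
c     True
d    False
e     True
f    False

s.isnull() is the exact opposite: missing values are True.

A typical use is to keep only the rows that are not missing, by using the boolean result as a filter on the Series.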
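The original does not show the code for this step. A minimal sketch, rebuilding the sum of the two Series from earlier and filtering it with the boolean mask; `dropna()` is shown alongside as an equivalent shortcut:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "f", "c", "e"])
s = s1 + s2  # d and f are NaN because their labels do not match

kept = s[s.notnull()]  # boolean mask keeps only the non-missing rows
dropped = s.dropna()   # same result through the dedicated method

print(kept)
```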

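DataFrame attributes: values, columns, index, shape

dic =
# The row labels are the subjects Chinese, Math, English, and Science
df = DataFrame(data=dic,index=['語文','數學','英語','理綜'])

The dictionary keys become the column index, and index supplies the explicit row index.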
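Since the dictionary itself is missing from the original, here is a minimal sketch with hypothetical scores. The column names `zhangsan` and `lisi` and the English row labels are stand-ins chosen for illustration, and every number is invented.

```python
import pandas as pd

# Hypothetical scores: the original dictionary is not shown in the post
dic = {
    "zhangsan": [90, 80, 70, 60],
    "lisi": [65, 75, 85, 95],
}
df = pd.DataFrame(data=dic, index=["chinese", "math", "english", "science"])

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column index, taken from the dictionary keys
print(df.index.tolist())    # explicit row index, taken from the index argument
print(df.values)            # the underlying data as a 2-D ndarray
```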
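(1) Indexing columns
- Dictionary style: df['q']
- Attribute style: df.q
A DataFrame column is returned as a Series. The returned Series has the same index as the original DataFrame, and its name attribute is already set to the column name.
For example: df['張三']
To select several columns, pass a list of column names inside the brackets.
# Modify the column index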
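(2) Indexing rows
- Use .loc with an index label to select a row
- Use .iloc with an integer position to select a row
Each also returns a Series, whose index is the original columns.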
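(3) Indexing a single element
- Through the column index
- Through the row index (iloc[3,1] or loc['c','q']): the row index comes first and the column index second, for example df.iloc[0,1]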
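Summary: ways to index
1. To index columns, use df[] with a column label inside the brackets
2. To index rows, use .loc with an explicit index label, or .iloc with an implicit integer position
Summary: ways to slice
1. A slice in plain brackets, df[0:2], slices rows
2. loc and iloc slice rows and columns together, rows first and columns second: df.loc['b':'c','丙':'丁']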
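The indexing and slicing rules summarized above can be sketched in one place. The DataFrame below is hypothetical (invented names and scores); only the access patterns matter.

```python
import pandas as pd

# Hypothetical DataFrame used only to demonstrate the access patterns
df = pd.DataFrame(
    {"zhangsan": [90, 80, 70, 60], "lisi": [65, 75, 85, 95]},
    index=["chinese", "math", "english", "science"],
)

col = df["zhangsan"]              # one column: a Series named after the column
cols = df[["lisi", "zhangsan"]]   # list of columns: a DataFrame
row = df.loc["math"]              # row by label: a Series indexed by the columns
row_pos = df.iloc[1]              # the same row by integer position
cell = df.loc["math", "lisi"]     # single element: row first, then column
cell_pos = df.iloc[1, 1]          # the same element by position
rows = df[0:2]                    # a bare slice in brackets selects rows
block = df.loc["math":"english", "zhangsan":"lisi"]  # label slices on both axes

print(col.name)
print(cell, cell_pos)
print(block)
```

Note the asymmetry: `df["zhangsan"]` looks up a column, while `df[0:2]` slices rows, which is why `loc` and `iloc` are the clearer choice for anything beyond picking a column.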