Handling Missing Data in pandas

pandas has two markers for missing data: None and np.nan.

None is Python's built-in null value. An array that contains it gets dtype object, so None cannot take part in any numeric computation.
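As a minimal sketch of this point (the names arr and none_sum_failed are only illustrative), summing a NumPy array that holds None fails, because adding an int to None is undefined:

```python
import numpy as np

# An array containing None falls back to dtype=object.
arr = np.array([1, 2, None])
print(arr.dtype)

try:
    arr.sum()
    none_sum_failed = False
except TypeError:
    # int + NoneType is undefined, so the reduction raises.
    none_sum_failed = True

print(none_sum_failed)
```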

Operations on the object dtype are much slower than operations on an int dtype.

Time the same sum for each dtype:

%timeit np.arange(1e5,dtype=***).sum()
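%timeit is an IPython magic. Outside a notebook, the same comparison can be sketched with the standard-library timeit module. The array size and repeat count below are arbitrary choices, and the absolute timings depend on the machine:

```python
import timeit
import numpy as np

# Same values, two storage types: native int64 vs boxed Python objects.
int_arr = np.arange(100_000, dtype=np.int64)
obj_arr = int_arr.astype(object)

# Time 100 full-array sums for each dtype.
t_int = timeit.timeit(int_arr.sum, number=100)
t_obj = timeit.timeit(obj_arr.sum, number=100)

print(f"int64 : {t_int:.4f} s")
print(f"object: {t_obj:.4f} s")
```

Both sums give the same value; only the time taken differs.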

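np.nan is a float, so it can take part in calculations, but any arithmetic involving it returns nan.

The np.nan*() family of functions (np.nansum, np.nanmean and so on) skips nan values instead. For a sum, this is the same as treating nan as 0.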

import numpy as np

n = np.array([1, 2, 3, 4, 5, np.nan])
np.nansum(n)
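To make the difference concrete, here is a short sketch on the same array that compares an ordinary reduction with two nan-aware ones (the variable names are illustrative):

```python
import numpy as np

n = np.array([1, 2, 3, 4, 5, np.nan])

plain_sum = np.sum(n)     # nan propagates through ordinary reductions
nan_sum = np.nansum(n)    # nan contributes nothing to the sum
nan_mean = np.nanmean(n)  # nan is skipped: mean of the 5 valid values

print(plain_sum, nan_sum, nan_mean)
```

np.nanmean divides by the number of valid values (5), not by the array length (6), so the nan-aware functions do not simply substitute 0 in every case.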

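import pandas as pd

# The constructor arguments are omitted here; the frame needs 4 rows and at least 3 columns.
df = pd.DataFrame(...)
df.index = ['張三', '李四', '王五', '趙六']
df.iloc[0, 1] = np.nan
df.iloc[3, 2] = None
df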
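Since the data behind df is not shown, here is one possible construction so the snippets can be run. The scores and the column name '語文' are made up for illustration. '數學' and '英語' appear later in the text, but their positions in the frame are an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical scores; float columns so missing values can be stored directly.
df = pd.DataFrame(
    [[90.0, 85.0, 70.0],
     [60.0, 75.0, 88.0],
     [72.0, 91.0, 64.0],
     [83.0, 66.0, 95.0]],
    columns=['語文', '數學', '英語'],
)
df.index = ['張三', '李四', '王五', '趙六']

df.iloc[0, 1] = np.nan  # row 張三, column 數學
df.iloc[3, 2] = None    # row 趙六, column 英語; stored as NaN in a float column

print(df)
```

In a numeric column pandas stores both markers as NaN, so the two assignments end up with the same representation.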

(1) Detection functions

# Check whether each column contains a missing value
df.isnull().any(axis=0)
# Check whether each row contains a missing value
df.isnull().any(axis=1)
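A self-contained sketch of these checks, using a small frame with made-up scores (the column '語文' and all values are assumptions). It also shows notnull(), the complement of isnull():

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries.
df = pd.DataFrame(
    {'語文': [90.0, 60.0, 72.0, 83.0],
     '數學': [np.nan, 75.0, 91.0, 66.0],
     '英語': [70.0, 88.0, 64.0, np.nan]},
    index=['張三', '李四', '王五', '趙六'],
)

cols_with_nan = df.isnull().any(axis=0)   # one boolean per column
rows_with_nan = df.isnull().any(axis=1)   # one boolean per row
rows_complete = df.notnull().all(axis=1)  # rows with no missing value

# The boolean Series can be used directly as a row mask.
print(df[rows_with_nan])
```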

(2) Filtering functions

# Rows are dropped by default
df.dropna()
df.dropna(axis=1)
# Drop rows or columns purely by label
df.drop(labels=['張三', '趙六'], axis=0)
df.drop(labels=['數學', '英語'], axis=1)
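The calls above can be tried on a small frame with made-up data (the column '語文' and the scores are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 張三 lacks 數學, 趙六 lacks 英語.
df = pd.DataFrame(
    {'語文': [90.0, 60.0, 72.0, 83.0],
     '數學': [np.nan, 75.0, 91.0, 66.0],
     '英語': [70.0, 88.0, 64.0, np.nan]},
    index=['張三', '李四', '王五', '趙六'],
)

dropped_rows = df.dropna()        # removes rows 張三 and 趙六
dropped_cols = df.dropna(axis=1)  # removes columns 數學 and 英語
by_label = df.drop(labels=['張三', '趙六'], axis=0)  # removes rows by name only

print(dropped_rows)
print(dropped_cols)
```

Both dropna and drop return a new frame and leave df unchanged unless inplace=True is passed.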

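You can choose whether to filter rows or columns (rows by default).

You can also choose the filtering rule with how: how='any' (the default) drops a row or column that contains any missing value, while how='all' drops it only when every value is missing.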

df.dropna(axis=1, how='any', subset=['張三', '趙六'])
df.dropna(axis=0, how='any', subset=['數學', '英語'])
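The following sketch contrasts how='any', how='all' and subset on a frame with made-up data. The row 李四 is deliberately left entirely empty so that the rules give different results. Note that subset names labels on the other axis: column names when dropping rows, row labels when dropping columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 李四 has no scores at all.
df = pd.DataFrame(
    {'語文': [90.0, np.nan, 72.0, 83.0],
     '數學': [np.nan, np.nan, 91.0, 66.0],
     '英語': [70.0, np.nan, 64.0, np.nan]},
    index=['張三', '李四', '王五', '趙六'],
)

any_rows = df.dropna(axis=0, how='any')  # keep only fully populated rows
all_rows = df.dropna(axis=0, how='all')  # drop only rows that are entirely missing
# Only the listed columns are inspected when deciding which rows to drop.
subset_rows = df.dropna(axis=0, how='any', subset=['語文', '數學'])
# Only the listed rows are inspected when deciding which columns to drop.
subset_cols = df.dropna(axis=1, how='any', subset=['王五', '趙六'])

print(list(any_rows.index), list(all_rows.index))
print(list(subset_rows.index), list(subset_cols.columns))
```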

(3) Fill functions (Series/DataFrame)

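df.fillna(value=100)
df['數學'].fillna(value=100)
# method: fill from the previous row (axis=0) or the previous column (axis=1).
# 'pad' is an alias of 'ffill'. In pandas 2.1+ fillna(method=...) is deprecated in favor of df.ffill().
df.fillna(method='ffill', axis=0)
df.fillna(method='pad', axis=1)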
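Here is a sketch of constant and forward fills on made-up data. It uses df.ffill() rather than fillna(method='ffill'), because the method argument is deprecated in recent pandas releases. The scores and the column '語文' are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 張三 lacks 數學, 趙六 lacks 英語.
df = pd.DataFrame(
    {'語文': [90.0, 60.0, 72.0, 83.0],
     '數學': [np.nan, 75.0, 91.0, 66.0],
     '英語': [70.0, 88.0, 64.0, np.nan]},
    index=['張三', '李四', '王五', '趙六'],
)

filled_const = df.fillna(value=100)  # every gap becomes 100
filled_down = df.ffill(axis=0)       # copy the value from the row above
filled_across = df.ffill(axis=1)     # copy the value from the column to the left

print(filled_down)
print(filled_across)
```

A forward fill cannot fill a gap that has no predecessor: with axis=0 the missing 數學 score in the first row stays NaN, because there is no row above it.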

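For a DataFrame, you also need to choose the fill axis: axis=0 propagates values down each column (from row to row), and axis=1 propagates values across each row (from column to column).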
