Handling missing dates with pandas

# Two ways to do it
'''1) DataFrame.reindex: conforms the DataFrame's index to a new index
2) DataFrame.resample: resamples a time series and supports filling in missing values
'''
import pandas as pd
import os
%matplotlib inline

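df = pd.DataFrame({"pdate": ["2019-12-01", "2019-12-02", "2019-12-04", "2019-12-05"],
                   "pv": [100, 200, 400, 500],
                   "uv": [10, 20, 40, 50]})
df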

        pdate   pv  uv
0  2019-12-01  100  10
1  2019-12-02  200  20
2  2019-12-04  400  40
3  2019-12-05  500  50

# df is missing the date 2019-12-03
df.set_index('pdate').plot()

[Figure: output_2_1.png]

# Method 1

        pdate   pv  uv
0  2019-12-01  100  10
1  2019-12-02  200  20
2  2019-12-04  400  40
3  2019-12-05  500  50

# Set the date column as the index (the index is converted to a datetime type further down)
df_date = df.set_index('pdate')
df_date

             pv  uv
pdate
2019-12-01  100  10
2019-12-02  200  20
2019-12-04  400  40
2019-12-05  500  50

df_date.index

Index(['2019-12-01', '2019-12-02', '2019-12-04', '2019-12-05'], dtype='object', name='pdate')

# Convert the DataFrame's index from strings to datetimes, then display the resulting DatetimeIndex
df_date = df_date.set_index(pd.to_datetime(df_date.index))
df_date.index

DatetimeIndex(['2019-12-01', '2019-12-02', '2019-12-04', '2019-12-05'], dtype='datetime64[ns]', name='pdate', freq=None)

pdate = pd.date_range(start='2019-12-01', end='2019-12-05')
pdate

DatetimeIndex(['2019-12-01', '2019-12-02', '2019-12-03', '2019-12-04',
               '2019-12-05'],
              dtype='datetime64[ns]', freq='D')

# Note: conforming to a new index is done with reindex, not set_index
df_date_new = df_date.reindex(pdate, fill_value=0)
df_date_new

             pv  uv
2019-12-01  100  10
2019-12-02  200  20
2019-12-03    0   0
2019-12-04  400  40
2019-12-05  500  50
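
The reindex step above can be sketched end to end as follows. This is a minimal illustration, assuming the same four-row sample data; the names `full_range`, `missing`, `with_nan` and `with_zero` are made up for this example. It also lists which dates are absent and shows what happens when `fill_value` is left out.

```python
import pandas as pd

# Same sample data as above: 2019-12-03 is absent.
df = pd.DataFrame({
    "pdate": ["2019-12-01", "2019-12-02", "2019-12-04", "2019-12-05"],
    "pv": [100, 200, 400, 500],
    "uv": [10, 20, 40, 50],
})
df_date = df.set_index(pd.to_datetime(df["pdate"])).drop("pdate", axis=1)

# Full daily range spanning the first and last observed dates.
full_range = pd.date_range(df_date.index.min(), df_date.index.max(), freq="D")

# Dates in the full range that the data does not contain.
missing = full_range.difference(df_date.index)

# Without fill_value, the added rows hold NaN (and the columns become float).
with_nan = df_date.reindex(full_range)

# With fill_value=0, the added rows hold 0 and the columns stay integer.
with_zero = df_date.reindex(full_range, fill_value=0)

print(missing)
print(with_nan)
print(with_zero)
```

Whether 0 or NaN is the better filler depends on what the numbers mean: 0 claims "no traffic that day", while NaN only says "no record".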

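'''
The reindex() method creates a new object that conforms to a new index.

① For a Series, calling reindex() rearranges the data according to the new index; if an index label did not exist before, a missing value is introduced.

② For a DataFrame, reindex() can change both the row index and the column index.

set_index(), as the name suggests, sets the index. It can set a single index or a composite (multi-level) index.

Calling it produces a new DataFrame that uses one or more columns as its index.

reset_index() is the inverse of set_index(): calling it moves the levels of a hierarchical index back into columns.
'''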
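
A small sketch of the three methods described in these notes may help keep them apart. The data and the variable names (`s`, `frame`, `indexed`, `restored`) are invented for illustration only.

```python
import pandas as pd

# Series: reindex reorders by the new labels; unseen labels get NaN.
s = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = s.reindex(["c", "a", "x"])

# DataFrame: reindex can conform rows (index=) and columns (columns=).
frame = pd.DataFrame({"pv": [100, 200], "uv": [10, 20]}, index=["r1", "r2"])
frame2 = frame.reindex(index=["r2", "r1", "r3"], columns=["uv", "pv"])

# set_index moves a column into the index and returns a new DataFrame;
# the original df is left unchanged.
df = pd.DataFrame({"pdate": ["2019-12-01", "2019-12-02"], "pv": [100, 200]})
indexed = df.set_index("pdate")

# reset_index is the inverse: the index goes back to an ordinary column.
restored = indexed.reset_index()

print(s2)
print(frame2)
print(indexed)
print(restored)
```

In short, reindex changes which labels exist, while set_index and reset_index only change where an existing column lives.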

df_date_new.plot()

[Figure: output_10_1.png]

# Method 2: the DataFrame.resample method

df

        pdate   pv  uv
0  2019-12-01  100  10
1  2019-12-02  200  20
2  2019-12-04  400  40
3  2019-12-05  500  50

df_new2 = df.set_index(pd.to_datetime(df['pdate']))
df_new2

                 pdate   pv  uv
pdate
2019-12-01  2019-12-01  100  10
2019-12-02  2019-12-02  200  20
2019-12-04  2019-12-04  400  40
2019-12-05  2019-12-05  500  50

df_new2 = df.set_index(pd.to_datetime(df['pdate'])).drop('pdate', axis=1)
df_new2

             pv  uv
pdate
2019-12-01  100  10
2019-12-02  200  20
2019-12-04  400  40
2019-12-05  500  50

# Resample to daily frequency; days with no rows become NaN, which fillna(0) replaces with 0
df_new3 = df_new2.resample('D').mean().fillna(0)
df_new3

               pv    uv
pdate
2019-12-01  100.0  10.0
2019-12-02  200.0  20.0
2019-12-03    0.0   0.0
2019-12-04  400.0  40.0
2019-12-05  500.0  50.0
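
To compare the two methods side by side, here is a minimal self-contained sketch. It assumes the same sample data as above; the names `ts`, `by_reindex`, `by_resample` and `same_values` are illustrative only. It runs both approaches and checks that they fill the gap with the same values.

```python
import pandas as pd

df = pd.DataFrame({
    "pdate": ["2019-12-01", "2019-12-02", "2019-12-04", "2019-12-05"],
    "pv": [100, 200, 400, 500],
    "uv": [10, 20, 40, 50],
})
ts = df.set_index(pd.to_datetime(df["pdate"])).drop("pdate", axis=1)

# Method 1: reindex against an explicit daily range.
full_range = pd.date_range("2019-12-01", "2019-12-05", freq="D")
by_reindex = ts.reindex(full_range, fill_value=0)

# Method 2: resample to daily frequency; empty days become NaN, then 0.
by_resample = ts.resample("D").mean().fillna(0)

# The values match; only the dtype differs (int for reindex, float for mean).
same_values = (by_reindex.to_numpy() == by_resample.to_numpy()).all()

print(by_reindex.dtypes)
print(by_resample.dtypes)
print(same_values)
```

One practical difference: reindex needs the target range spelled out, while resample infers it from the first and last dates in the data. With resample, the aggregation (mean here) also applies to days that have more than one row.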
