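pandas is a high-performance data analysis library for Python that makes statistical analysis of data much more convenient.
This article focuses on DataFrame, the most commonly used data structure in pandas, and summarizes everyday operations such as creating, deleting, updating, and querying data, as well as grouped statistics.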
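df = pd.DataFrame()  # create an empty DataFrame
df.empty  # True when the DataFrame has no elements
df.drop_duplicates()  # drop duplicate rows (returns a new DataFrame)
df.replace(, regex=True, inplace=True)  # regex-based replacement, in place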
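The first argument of `replace` is not shown in the line above. As a minimal sketch only, assuming a hypothetical mapping from regex patterns to replacement strings (the column name `phone`, the data, and the pattern are all made up for illustration), a regex-based replacement might look like this:

```python
import pandas as pd

# Hypothetical data, not taken from the article.
df = pd.DataFrame({"phone": ["138-0000-1111", "139 2222 3333"]})

# Hypothetical mapping: each key is a regex pattern, each value its replacement.
# With regex=True the keys are interpreted as regular expressions.
mapping = {r"[-\s]": ""}

# Returns a new DataFrame; pass inplace=True to modify df directly instead.
cleaned = df.replace(mapping, regex=True)
print(cleaned["phone"].tolist())
```

Returning a new frame rather than using `inplace=True` keeps the original data available for comparison.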
import pandas as pd
data = [[101, 90], [102, 99], [103,]]
df = pd.DataFrame(data=data, columns=['id', 'score'])
df.fillna("unknown", inplace=True)  # replace missing values with a placeholder
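Filling every missing value with one string turns a numeric column into a mixed-type column. A possible alternative, sketched here under the assumption that `0` is an acceptable placeholder for a missing score, is to pass a dict so that each column gets its own fill value:

```python
import pandas as pd

data = [[101, 90], [102, 99], [103, None]]
df = pd.DataFrame(data=data, columns=["id", "score"])

# Fill only the 'score' column; 0 is an arbitrary placeholder chosen for this sketch.
filled = df.fillna({"score": 0})
print(filled)
```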
import pandas as pd
data = [['101', 90, '2003-05-10'], ['102', 99, '2003-05-12'], ['103', 105, '2003-05-11']]
df = pd.DataFrame(data=data, columns=['id', 'score', 'dt'])
# Reshape: use dt as the index and the id values as column names, filled with the matching score
print(df.pivot(index='dt', columns='id', values='score').reset_index())
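df.dtypes  # data type of each column
df.shape  # (number of rows, number of columns)
df.columns.values  # column names as an array
df.rename(columns=, inplace=True)  # rename columns in place
df.drop(['id'], axis=1)  # drop the 'id' column (returns a new DataFrame)
df['id2'] = pd.Series(range(df.shape[0]))  # add a 0..n-1 column (assumes the default integer index)
df['id'] = df['id'].astype('int')  # convert a column to int
# or: df['id'] = df['id'].map(lambda x: int(x))
df["id"].values.tolist()  # column values as a Python list
df.sort_values(['colnamea', 'colnameb'], ascending=False)  # sort by several columns, descending
int(df['score'].min())  # minimum of a column as a plain int
pd.concat([df1, df2])  # stack two DataFrames vertically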
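The one-liners above can be chained on a small frame to see how they interact. This is only a sketch: the data and the new column name `points` are hypothetical, since the rename mapping is not given in the text.

```python
import pandas as pd

# Hypothetical data: ids stored as strings, in no particular order.
df = pd.DataFrame({"id": ["3", "1", "2"], "score": [70, 90, 80]})

# Rename a column ('points' is a made-up name); returns a new DataFrame.
renamed = df.rename(columns={"score": "points"})

# Convert the string ids to integers.
renamed["id"] = renamed["id"].astype("int")

# Sort by id ascending, then read the minimum of one column as a plain int.
ordered = renamed.sort_values(["id"], ascending=True)
lowest = int(ordered["points"].min())

# Stack two frames vertically; ignore_index=True renumbers the rows 0..n-1.
stacked = pd.concat([ordered, ordered], ignore_index=True)
print(ordered["id"].tolist(), lowest, stacked.shape)
```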
import pandas as pd
data = [['101', 90, '2003-05-10'], ['102', 99, '2003-05-10'], ['103', 105, '2003-05-11']]
df = pd.DataFrame(data=data, columns=['id', 'score', 'dt'])
data2 = [('101', '1'), ('102', '0')]
df2 = pd.DataFrame(data=data2, columns=['id', '***'])
# left join
left_join_df = df.merge(right=df2, how="left", on="id")
# right join
# right_join_df = df.merge(right=df2, how="right", on="id")
# or right_join_df = df.merge(right=df2, how="right", left_on="id", right_on="id")
# inner join
inner_join_df = df.merge(right=df2, how="inner", left_on="id", right_on="id")
# outer join
outer_join_df = df.merge(right=df2, how="outer", on="id")
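Summary: a left join works like SQL's LEFT JOIN. For a left join b, every row of a is kept, and the columns coming from b are filled with NaN wherever b has no matching row. Right, inner, and outer joins follow the same analogy.
DataFrame.merge in pandas defaults to an inner join.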
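The two points in the summary can be checked on a small example. This sketch uses a made-up column name `flag` for the second frame, and adds `indicator=True`, which is not used in the article, to show where each row came from:

```python
import pandas as pd

left = pd.DataFrame({"id": ["101", "102", "103"], "score": [90, 99, 105]})
right = pd.DataFrame({"id": ["101", "102"], "flag": ["1", "0"]})  # 'flag' is a hypothetical column

# No `how` argument: merge falls back to an inner join, so id 103 is dropped.
default_join = left.merge(right, on="id")

# Left join keeps every row of `left`; the unmatched id 103 gets NaN in 'flag'.
# indicator=True adds a '_merge' column describing the origin of each row.
left_join = left.merge(right, how="left", on="id", indicator=True)

print(default_join["id"].tolist())
print(left_join[["id", "flag", "_merge"]])
```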
# These operations are similar to WHERE conditions in SQL.
import pandas as pd
data = [['101', 90, '2003-05-10'], ['102', 99, '2003-05-10'], ['103', 105,]]
df = pd.DataFrame(data=data, columns=['id', 'score', 'dt'])
# AND
print(df[(df['id'].isin(['101', '102']) & (df['score'] == 90))])
# NOT
print(df[~df['id'].isin(['101', '102'])])
# inequality
print(df[df['score'] >= 99])
# keep only the rows where dt is not null
print(df[df['dt'].notnull()])
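The filters above cover AND, NOT, and a simple inequality. A few other WHERE-style conditions can be sketched the same way; the specific thresholds below are chosen only for illustration:

```python
import pandas as pd

data = [["101", 90, "2003-05-10"], ["102", 99, "2003-05-10"], ["103", 105, None]]
df = pd.DataFrame(data=data, columns=["id", "score", "dt"])

# OR: each condition needs its own parentheses because | binds tighter than == and >.
either = df[(df["id"] == "101") | (df["score"] > 100)]

# Range check, similar to SQL's BETWEEN (both ends inclusive by default).
in_range = df[df["score"].between(90, 99)]

# Rows where dt is missing, the complement of notnull().
missing_dt = df[df["dt"].isnull()]

print(either["id"].tolist(), in_range["id"].tolist(), missing_dt["id"].tolist())
```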
import pandas as pd
data = [
    (1, 80, 90, 1, 2),
    (2, 60, 80, 1, 2),
    (3, 70, 90, 1, 3),
    (4, 90, 80, 1, 3),
    (5, 60, 90, 1, 3),
    (6, 50, 80, 2, 3),
    (7, 80, 90, 2, 3),
    (8, 70, 70, 2, 1),
    (9, 90, 90, 2, 1),
    (10, 80, 90, 2, 1)
]
df = pd.DataFrame(data=data, columns=['id', 'language', 'math', 'grade', 'class'])
# Grouped sum: group by grade and class, then sum language and math separately
print(df.groupby(['grade', 'class'])[['language', 'math']].sum().reset_index())
# Top N per group: group by grade and class, then take the N rows with the highest math score in each group
print(pd.concat([subgroup.sort_values(['math'], ascending=False).head(2)
                 for subgroupname, subgroup in df.groupby(['grade', 'class'])]))
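The same grouped statistics can also be written without an explicit loop. The sketch below is an alternative to the approach above, not a replacement for it: it uses named aggregation (available from pandas 0.25 on) for several statistics at once, and `sort_values` followed by `groupby(...).head(2)` for the top 2 per group. The output column names `math_mean`, `language_max`, and `n` are made up for this example.

```python
import pandas as pd

data = [
    (1, 80, 90, 1, 2), (2, 60, 80, 1, 2), (3, 70, 90, 1, 3), (4, 90, 80, 1, 3),
    (5, 60, 90, 1, 3), (6, 50, 80, 2, 3), (7, 80, 90, 2, 3), (8, 70, 70, 2, 1),
    (9, 90, 90, 2, 1), (10, 80, 90, 2, 1),
]
df = pd.DataFrame(data=data, columns=["id", "language", "math", "grade", "class"])

# Several statistics per group in one call, using named aggregation.
stats = (
    df.groupby(["grade", "class"])
    .agg(math_mean=("math", "mean"), language_max=("language", "max"), n=("id", "count"))
    .reset_index()
)

# Top 2 by math per group: sort the whole frame first,
# then keep the first 2 rows of every group.
top2 = df.sort_values("math", ascending=False).groupby(["grade", "class"]).head(2)

print(stats)
print(top2.sort_values(["grade", "class"]))
```

When several rows in a group share the same math score, which of them ends up in the top 2 depends on the sort order of the ties.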
data = [(101, 90), (102, 99), (103, 99)]  # from a list of tuples
df = pd.DataFrame(data=data, columns=['id', 'score'])
data = [[101, 90], [102, 99], [103, 99]]  # from a list of lists
df = pd.DataFrame(data=data, columns=['id', 'score'])
data=[,,]
df = pd.DataFrame(data=data)
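The contents of the last `data` list are not shown, and no `columns` argument is passed. One input shape that works without `columns` is a list of dicts, so the sketch below assumes that form; the records themselves are hypothetical.

```python
import pandas as pd

# Hypothetical records: the dict keys become the column names.
data = [
    {"id": 101, "score": 90},
    {"id": 102, "score": 99},
    {"id": 103},  # a missing key becomes NaN in that column
]
df = pd.DataFrame(data=data)
print(df)
```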
df = pd.read_excel(io="excelpath.xlsx", sheet_name="sheetname")
# Note: to read a TSV file, change the separator to \t.
df = pd.read_csv(filepath, sep=',', names=['colnamea', 'colnameb'...])
df = pd.read_json(filepath, lines=True)
df.to_csv("csvresult.csv", index=False, header=True, sep=',', encoding='utf-8-sig')
df.to_json(path_or_buf="jsondata.json", orient='records', lines=True)
df = pd.read_sql(sql=sql, con=conn)
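The `sql` and `conn` variables in the last line are not defined in the text. As a minimal sketch, assuming an in-memory SQLite database as a stand-in for a real connection (the table `scores` and its contents are hypothetical), `read_sql` could be exercised like this:

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for a real connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [(101, 90), (102, 99)])
conn.commit()

# The query result is loaded straight into a DataFrame.
sql = "SELECT id, score FROM scores WHERE score >= 95"
df = pd.read_sql(sql=sql, con=conn)
conn.close()
print(df)
```

For other databases the connection object would come from the corresponding driver or from SQLAlchemy; only the `con` argument changes.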