Python data manipulation summary

Creating a DataFrame

df = pd.DataFrame([list1, list2], index=['list1', 'list2']).T  # each list becomes one column
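As a minimal sketch of the pattern above, the following builds a small frame from two lists. The list contents are made-up sample data, and the dict-based constructor is shown as an alternative that is not in the original notes.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
names = ["ann", "bob", "cat"]
scores = [90, 85, 77]

# Each list is first stored as a row labelled by `index`;
# .T transposes so that every list ends up as a column.
df = pd.DataFrame([names, scores], index=["names", "scores"]).T

# Alternative: a dict of columns keeps each column's own dtype,
# whereas the transposed frame above holds generic objects.
df_from_dict = pd.DataFrame({"names": names, "scores": scores})

print(df)
print(df_from_dict.dtypes)
```

The transposed version is handy when the lists already exist as a list of rows; the dict version is usually easier to read when columns are known by name.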

Renaming DataFrame columns

Method 1

df.columns = ['names', 'scores']

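Method 2: reindex and reindex_like

index refers to rows, columns refers to columns. Note that reindex selects and reorders existing labels; it does not rename them.

df.reindex(index=['c', 'f', 'b'], columns=['three', 'two', 'one'])
df.reindex(['c', 'f', 'b'], axis='index')
df.reindex_like(df2)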
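To make the behaviour of reindex concrete, here is a small sketch on invented data. It shows that labels missing from the original frame (here the row 'f') come back as rows of NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical frame with row labels a, b, c.
df = pd.DataFrame(
    {"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]},
    index=["a", "b", "c"],
)

# Reorder rows and columns; 'f' does not exist, so that row is all NaN.
out = df.reindex(index=["c", "f", "b"], columns=["three", "two", "one"])
print(out)

# reindex_like copies the row and column labels of another frame.
same_shape = df.reindex_like(out)
print(same_shape)
```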

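Method 3: rename

movies_df.rename(columns=..., inplace=True)  # columns takes a mapping of old names to new names

df.rename(columns=..., index=...)  # both arguments take a mapping (or a function)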
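The mappings passed to rename are not preserved in the notes above, so the following is only a sketch with invented column names showing the general shape of the call.

```python
import pandas as pd

# Hypothetical columns; the real movies_df column names are not given in the notes.
movies_df = pd.DataFrame({"Runtime (Minutes)": [120, 95], "Rating": [8.1, 6.4]})

# rename returns a new frame unless inplace=True is passed.
renamed = movies_df.rename(
    columns={"Runtime (Minutes)": "runtime", "Rating": "rating"}
)

# Row labels can be renamed the same way through the index argument.
relabelled = renamed.rename(index={0: "first", 1: "second"})

print(renamed.columns.tolist())
print(relabelled.index.tolist())
```

Returning a new frame (rather than using inplace=True) keeps the original available for comparison and allows method chaining.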

Sorting a DataFrame

df.sort_values(by=['list1', 'list2'], ascending=True)

df.sort_values(by="grade")
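A short sketch of multi-column sorting on invented data. Passing a list to ascending (one flag per column) is an addition to the notes above, which only use a single boolean.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"list1": [2, 1, 2, 1], "list2": [3, 4, 1, 2]})

# Sort by list1 ascending, then break ties by list2 descending.
ordered = df.sort_values(by=["list1", "list2"], ascending=[True, False])
print(ordered)
```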

List comprehensions

Using % to test divisibility, and the zip function

[x for x in range(1, 8) if x % 2 == 0]

a=
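The zip example is cut off in the notes, so the snippet below is only a sketch of how a comprehension and zip are commonly combined; the variable names and values are invented.

```python
# Even numbers between 1 and 7, selected with the modulo operator.
evens = [x for x in range(1, 8) if x % 2 == 0]

# Hypothetical parallel lists.
names = ["ann", "bob"]
scores = [90, 85]

# zip pairs up items position by position.
pairs = list(zip(names, scores))

# A dict comprehension over zip builds a lookup table.
lookup = {name: score for name, score in zip(names, scores)}

print(evens)
print(pairs)
print(lookup)
```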

Repeating elements (repeat)

ls = [1, 2, 3, 4]
ls * 2  # Method 1: repeats the whole list
sorted(ls * 2)
np.repeat(ls, 2)  # Method 2: repeats each element
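The two methods produce different orderings, which the following sketch makes explicit. np.tile is included as an extra for comparison and is not part of the original notes.

```python
import numpy as np

ls = [1, 2, 3, 4]

# List multiplication repeats the whole sequence end to end.
whole_twice = ls * 2

# np.repeat repeats each element in place and returns an array.
each_twice = np.repeat(ls, 2)

# np.tile is the NumPy counterpart of list multiplication.
tiled = np.tile(ls, 2)

print(whole_twice)
print(each_twice)
print(tiled)
```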

Viewing DataFrame column information

movies_df.info()

Viewing DataFrame dimensions

movies_df.shape

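Removing duplicates from a DataFrame

temp_df = temp_df.drop_duplicates(keep=False)  # keep=False drops every copy of a duplicated row; do not combine assignment with inplace=True, which returns None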
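The keep argument and the inplace pitfall are easy to mix up, so here is a small sketch on invented data comparing the variants.

```python
import pandas as pd

# Hypothetical frame in which rows 0 and 1 are identical.
temp_df = pd.DataFrame({"title": ["a", "a", "b"], "year": [2000, 2000, 2001]})

# Default keep='first': the first copy of each duplicate survives.
first_kept = temp_df.drop_duplicates()

# keep=False: every row that has a duplicate is removed.
none_kept = temp_df.drop_duplicates(keep=False)

# With inplace=True the method modifies the frame and returns None,
# so assigning the result back would overwrite the frame with None.
returned = temp_df.copy().drop_duplicates(inplace=True)

print(first_kept)
print(none_kept)
print(returned)
```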

Describing a DataFrame column

movies_df['genre'].describe()

Counting values in a column

movies_df['genre'].value_counts()

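Slicing a DataFrame

Dictionaries cannot be sliced; their items have to be picked out by key.

df[0:3]  # select rows
df["a"]  # select a column
df.loc   # selection by label
df.iloc  # selection by position
df.at    # fast access to a single value by label
df.iat   # fast access to a single value by position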
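The accessors listed above differ mainly in whether they use labels or positions. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

first_three = df[0:3]              # rows by position, end excluded
col_a = df["a"]                    # one column as a Series
by_label = df.loc["x":"y", "a"]    # label slice, both ends included
by_pos = df.iloc[1:3, 0]           # position slice, end excluded
one_value = df.at["z", "b"]        # single value by label
same_value = df.iat[3, 1]          # single value by position

print(by_label.tolist(), by_pos.tolist(), one_value, same_value)
```

Note that label slices with loc include the end label, while positional slices with iloc follow normal Python slicing and exclude it.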

Selecting by condition

df[df > 0]
df[df.a > 0]

Membership tests with isin (like R's %in%)

df2[df2['e'].isin(['two', 'four'])]
movies_df[movies_df['director'].isin(['christopher nolan', 'ridley scott'])].head()

Negating isin

~movies_df['director'].isin(['christopher nolan', 'ridley scott'])
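The negated mask is normally used as a row filter. The sketch below uses an invented three-row frame to show both the mask and its complement.

```python
import pandas as pd

# Hypothetical data; titles are placeholders.
movies_df = pd.DataFrame(
    {
        "title": ["t1", "t2", "t3"],
        "director": ["christopher nolan", "ridley scott", "someone else"],
    }
)

# Boolean Series: True where the director is in the list.
mask = movies_df["director"].isin(["christopher nolan", "ridley scott"])

selected = movies_df[mask]    # rows that match
others = movies_df[~mask]     # rows that do not match

print(selected)
print(others)
```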

Assigning random values

df.loc[:, 'd'] = np.random.randint(0, 7, size=10)

Negating values that meet a condition

df2[df2 > 0] = -df2

Dropping missing values

df1.dropna(how='any')

Applying a function with apply

df.apply(lambda x: x.max() - x.min())
# lambda bodies can use if / for expressions: lambda x: x if ... / for ...
movies_df["rating_category"] = movies_df["rating"].apply(lambda x: 'good' if x >= 8.0 else 'bad')
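The snippet below is a small sketch on invented data of the two uses of apply: column-wise on a DataFrame and element-wise on a Series.

```python
import pandas as pd

# Hypothetical numeric frame.
df = pd.DataFrame({"a": [1, 5, 3], "b": [10, 2, 6]})

# On a DataFrame, the function receives one column (a Series) at a time.
spread = df.apply(lambda x: x.max() - x.min())

# Hypothetical ratings.
movies_df = pd.DataFrame({"rating": [8.5, 7.9, 8.0]})

# On a Series, the function receives one value at a time.
movies_df["rating_category"] = movies_df["rating"].apply(
    lambda x: "good" if x >= 8.0 else "bad"
)

print(spread)
print(movies_df)
```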

Vertical concatenation of DataFrames (like R's rbind)

...True)  # Method 1

pieces = [df[:3], df[3:7], df[7:]]  # Method 2

pd.concat(pieces)
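A minimal sketch of row-wise concatenation with pd.concat on invented data. The ignore_index=True variant is an assumption about a common way to append rows; it is not taken from the notes.

```python
import pandas as pd

# Hypothetical ten-row frame.
df = pd.DataFrame({"x": list(range(10))})

# Split into pieces and stack them back together.
pieces = [df[:3], df[3:7], df[7:]]
rebuilt = pd.concat(pieces)

# Appending another frame; ignore_index=True renumbers the rows 0..n-1.
extra = pd.DataFrame({"x": [99]})
combined = pd.concat([df, extra], ignore_index=True)

print(rebuilt["x"].tolist())
print(combined.tail(2))
```

Without ignore_index=True the appended row would keep its own label 0, giving a duplicated index.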

Merging with merge

pd.merge(left, right, on='key')
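By default pd.merge performs an inner join. The sketch below, on invented frames, contrasts that with an outer join through the how parameter.

```python
import pandas as pd

# Hypothetical frames sharing a 'key' column.
left = pd.DataFrame({"key": ["k1", "k2"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["k2", "k3"], "rval": [20, 30]})

# Inner join (default): only keys present in both frames.
inner = pd.merge(left, right, on="key")

# Outer join: all keys, with NaN where one side has no match.
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```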

Grouped totals (similar to R's aggregate)

df.groupby(['a', 'b']).sum()
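A small sketch of grouped sums on invented data. The as_index=False variant is an addition that returns the group keys as ordinary columns, which is closer to what R's aggregate produces.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {
        "a": ["x", "x", "y", "y"],
        "b": ["p", "p", "p", "q"],
        "c": [1, 2, 3, 4],
    }
)

# The group keys become a two-level row index.
totals = df.groupby(["a", "b"]).sum()

# as_index=False keeps the keys as regular columns instead.
flat = df.groupby(["a", "b"], as_index=False)["c"].sum()

print(totals)
print(flat)
```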

Stacking multiple columns into one (stack)

stacked = df2.stack()

Pivot tables with pivot_table

pd.pivot_table(df, values='d', index=['a', 'b'], columns=['c'])
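To show what the call above returns, here is a sketch on an invented frame. With no aggfunc given, pivot_table averages the values in each cell, and combinations that never occur are filled with NaN.

```python
import pandas as pd

# Hypothetical data: each (a, b) pair occurs with only one value of c.
df = pd.DataFrame(
    {
        "a": ["x", "x", "y", "y"],
        "b": ["p", "q", "p", "q"],
        "c": ["m", "n", "m", "n"],
        "d": [1.0, 2.0, 3.0, 4.0],
    }
)

# Rows are indexed by (a, b); one output column per distinct value of c.
table = pd.pivot_table(df, values="d", index=["a", "b"], columns=["c"])
print(table)
```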

Changing a column's type with astype

df["grade"] = df["raw_grade"].astype("category")  # like R's as.factor
dft[['a', 'b']] = dft[['a', 'b']].astype(np.uint8)
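The following sketch, on invented data, checks the resulting dtypes after both conversions.

```python
import numpy as np
import pandas as pd

# Hypothetical grades stored as plain strings.
df = pd.DataFrame({"raw_grade": ["a", "b", "b", "a"]})
df["grade"] = df["raw_grade"].astype("category")

# Hypothetical numeric frame; only columns a and b are converted.
dft = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5.5, 6.5]})
dft[["a", "b"]] = dft[["a", "b"]].astype(np.uint8)

print(df.dtypes)
print(dft.dtypes)
```

np.uint8 only holds values from 0 to 255, so this conversion is appropriate only when the data is known to fit in that range.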

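Dropping columns with drop

df.drop(['a', 'd'], axis=1)  # axis=1 drops columns; axis=0 would drop the rows labelled 'a' and 'd'

To delete rows, see the DataFrame slicing section above.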

Computing correlation coefficients

movies_df.corr()

Selecting on multiple conditions with or (equivalent to %in%)

movies_df[(movies_df['director'] == 'christopher nolan') | (movies_df['director'] == 'ridley scott')].head()

Returning the indices of matches (like which / where)

Method 1
a = df[(df.boolcol == 3) & (df.attr == 22)].index.tolist()
df['names'].tolist().index('random')  # returns the first match
Method 2: np.where()
np.where(df['names'] == 'random')  # returns all matches
Method 3: index slicing
df.loc[df['names'] == 'random', 'scores']
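The three methods return different kinds of results (a single position, an array of positions, or the matching values). A sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which 'random' appears twice.
df = pd.DataFrame(
    {"names": ["ann", "random", "bob", "random"], "scores": [1, 2, 3, 4]}
)

# list.index gives the position of the first match only.
first_pos = df["names"].tolist().index("random")

# np.where returns a tuple of arrays; element 0 holds all matching positions.
all_pos = np.where(df["names"] == "random")[0]

# Filtering and reading .index gives the row labels of all matches.
labels = df[df["names"] == "random"].index.tolist()

# loc with a boolean mask returns the values themselves.
matched_scores = df.loc[df["names"] == "random", "scores"]

print(first_pos, all_pos, labels, matched_scores.tolist())
```

Positions and labels coincide here only because the frame uses the default 0..n-1 index.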

Comparing against the value at a matched index

# float(...) assumes the selection holds exactly one value
vframe.scores > float(vframe.loc[vframe['names'] == 'random', 'scores'])
vframe.loc[vframe.scores > float(vframe.loc[vframe['names'] == 'random', 'scores']), 'names']
df.a > df.a[df.b == 1].iloc[0]

Converting a list to a pandas Series

x_label_update = pd.Series(x_label_update)
