Summary of pandas tips, part 1

A record of some pandas tricks I have used at work.

Import the necessary libraries and set the path.

Preprocess the loaded data.

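cs = well_data['壓裂段'].str.split('-', n=2, expand=True)
# Format of the 壓裂段 (fracturing stage) column: well number + layer number + stage number, joined by '-'
# Series.str.split() is the vectorized counterpart of Python's str.split()
# First argument: the separator (defaults to whitespace). n: maximum number of splits (default -1, meaning all), so n=2 yields at most three parts
# expand: defaults to False, which returns a Series of lists; with True, the parts are spread across the columns of a DataFrame
cols = ['井號', '層號', '段號']  # well number, layer number, stage number
# Move a pandas column to the last position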
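
The split and the "move a column to the end" step can be sketched together as below. This is only a minimal illustration: the sample rows and the `產量` (output) column are made up, and the list-based reordering is one common way to move a column, not necessarily the one used in the original workflow.

```python
import pandas as pd

# Hypothetical sample data; the real well_data is loaded elsewhere.
well_data = pd.DataFrame({
    "壓裂段": ["A1-2-3", "A1-2-4", "B7-1-1"],  # well-layer-stage
    "產量": [10.5, 12.0, 8.3],                  # made-up numeric column
})

# n=2 allows at most two splits, so each value yields at most three pieces.
cs = well_data["壓裂段"].str.split("-", n=2, expand=True)
cs.columns = ["井號", "層號", "段號"]

# Attach the new columns to the original frame.
well_data = pd.concat([well_data, cs], axis=1)

# Move one column to the end: remove it from the column list, then append it.
col = "產量"
new_order = [c for c in well_data.columns if c != col] + [col]
well_data = well_data[new_order]
```

Reindexing with a reordered column list returns a new DataFrame and leaves the row order untouched.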

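Normalization: four common methods

1. Rescaling (min-max normalization), sometimes simply called normalization (scales values into the range [0, 1] without changing the shape of the data distribution)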

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

2. Mean normalization

$$x' = \frac{x - \operatorname{mean}(x)}{\max(x) - \min(x)}$$

3. Standardization (z-score normalization) (centers values around 0 without changing the shape of the data distribution)

$$x' = \frac{x - \operatorname{mean}(x)}{\sigma}$$

4. Scaling to unit length

$$x' = \frac{x}{\lVert x \rVert}$$

Because the groups of data in this dataset differ widely in scale, the data needs to be normalized.

well_data = (well_data - well_data.min()) / (well_data.max() - well_data.min())
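
For comparison, all four methods can be written column-wise in pandas roughly as follows. This is a sketch under assumptions: the DataFrame `df` and its values are invented, every column is assumed to be numeric, and the population standard deviation (`ddof=0`) is an arbitrary choice for the z-score.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for well_data.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 60.0],
})

value_range = df.max() - df.min()            # per-column max - min

min_max = (df - df.min()) / value_range      # 1. rescaling to [0, 1]
mean_norm = (df - df.mean()) / value_range   # 2. mean normalization
z_score = (df - df.mean()) / df.std(ddof=0)  # 3. standardization
unit_len = df / np.sqrt((df ** 2).sum())     # 4. unit L2 norm per column
```

A constant column makes the denominator zero in the first three variants and produces NaN, so such columns need to be dropped or handled separately. Non-numeric columns, such as a string identifier, should be excluded before applying any of these expressions.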
