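pandas personal usage notes

Mainly a record of problems I ran into while using pandas, kept so I do not have to search for the same thing twice.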

import pandas as pd

# Read a CSV file
val_x = pd.read_csv('val_feature.csv')
# Read in chunks. on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
reader = pd.read_csv(file, iterator=True, on_bad_lines='skip')
chunk = reader.get_chunk(1000)
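As an alternative to calling get_chunk by hand, passing chunksize makes read_csv return an iterator of DataFrames. The sketch below assumes a small in-memory CSV (the columns id and speed are made up for illustration) in place of a real file.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk
csv_text = "id,speed\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# With chunksize set, read_csv yields DataFrames of at most 4 rows each
chunk_sizes = []
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    total_rows += len(chunk)

print(chunk_sizes, total_rows)
```

Only one chunk is held in memory at a time, so this pattern suits files that are too large to load at once.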
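# Sort by a column. sort_values returns a new DataFrame, so assign the result (or pass inplace=True) to change the original
val_x = val_x.sort_values(by='time', ascending=True)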
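# Group by a column. Iterate with `for id, data in grouped` to get each group key and its rows.
# A single column name gives scalar keys; a one-element list gives 1-tuples in pandas 2.x
grouped = val_x.groupby('id_road')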
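A minimal sketch of iterating over the groups, assuming made-up sample data (only the column name id_road comes from the notes; speed is hypothetical):

```python
import pandas as pd

# Hypothetical sample data
val_x = pd.DataFrame({
    "id_road": [1, 2, 1, 2, 3],
    "speed": [30.0, 45.0, 50.0, 55.0, 20.0],
})

# Each iteration yields the group key and the sub-DataFrame for that key
group_sizes = {}
for road_id, data in val_x.groupby("id_road"):
    group_sizes[int(road_id)] = len(data)

# The same grouping can be aggregated without an explicit loop
mean_speed = val_x.groupby("id_road")["speed"].mean()

print(group_sizes)
print(mean_speed)
```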
# Convert a string column to datetime with pd.to_datetime
time_stamp = pd.to_datetime(raw_features['time'])
# Extract parts of the date (.dt.year, .dt.month, .dt.day)
time_stamp.dt.month
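# Ordinal encoding. sort=True makes the integer codes follow the sorted order of the values;
# calling sort_values without assigning the result has no effect
times = raw_features['time']
raw_features["time"] = pd.factorize(times, sort=True)[0]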
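To show what ordering the codes follow, here is a small sketch comparing pd.factorize with and without sort=True. The date strings are invented for illustration.

```python
import pandas as pd

# Hypothetical time column, deliberately out of order
raw_features = pd.DataFrame(
    {"time": ["2020-01-03", "2020-01-01", "2020-01-02", "2020-01-01"]}
)

# Default: codes are assigned in order of first appearance
codes_by_appearance, _ = pd.factorize(raw_features["time"])

# sort=True: codes follow the sorted order of the unique values
codes_sorted, uniques = pd.factorize(raw_features["time"], sort=True)

print(codes_by_appearance.tolist())
print(codes_sorted.tolist(), list(uniques))
```

Because factorize returns a plain array aligned with the input rows, the codes can be assigned straight back to the column without reordering the frame.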
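# Select a column. loc accepts labels only, not integer positions; label slices include both endpoints.
df.loc[:, ['id_sample']]
# Select all rows where age is greater than 30
df.loc[df['age'] > 30, :]
# iloc accepts integer positions only, not labels; position slices exclude the end point. Get the first three rows
df.iloc[:3, :]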
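# df.ix, which mixed label and position indexing, was removed in pandas 1.0.
# To get name in the third row, look up the row label by position and use loc
df.loc[df.index[2], 'name']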
# Select a single cell: at by label, iat by integer position
df.at['b', 'name']
df.iat[1, 0]
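The indexers above can be compared side by side on a small frame. This is a sketch with made-up data (the names and ages are hypothetical); it also shows one way to replace the removed df.ix lookup.

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid", "Dee"], "age": [25, 35, 45, 28]},
    index=["a", "b", "c", "d"],
)

label_slice = df.loc["a":"c"]        # label slice: both endpoints included
position_slice = df.iloc[0:2]        # position slice: end point excluded
over_30 = df.loc[df["age"] > 30, "name"].tolist()
third_name = df.loc[df.index[2], "name"]  # stands in for the removed df.ix[2, 'name']
cell_by_label = df.at["b", "name"]
cell_by_position = df.iat[1, 0]

print(len(label_slice), len(position_slice), over_30, third_name)
```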
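# Add a column filled with the value 1
val_x['add_column'] = 1
# Insert a column at a given position, setting its name and values
x_test.insert(1, 'speed', value=y_pred)
# Drop a column. drop returns a new DataFrame, so assign the result
raw_features = raw_features.drop('time', axis=1)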
# Drop rows whose value in the given column is duplicated, keeping none of the copies
df.drop_duplicates(['output'], inplace=True, keep=False)
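The effect of keep=False is easiest to see next to the default keep='first'. A minimal sketch with invented data (the tag column is hypothetical):

```python
import pandas as pd

# Hypothetical data: output value 1 is unique, 2 and 3 are repeated
df = pd.DataFrame({"output": [1, 2, 2, 3, 3, 3], "tag": list("abcdef")})

# Default keep='first' keeps one row per distinct value
keep_first = df.drop_duplicates(["output"])

# keep=False removes every row whose value appears more than once
keep_none = df.drop_duplicates(["output"], keep=False)

print(keep_first["tag"].tolist())
print(keep_none["tag"].tolist())
```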
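# Concatenate rows
speed_x = pd.concat([speed_x, tmp_x])
# Merge on a shared column; the default is an inner join (intersection of the keys)
df3 = pd.merge(df1, df2, on='key')
# Merge when the key columns have different names on each side
df3 = pd.merge(df1, df2, left_on='lkey', right_on='rkey')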
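# An outer join is the union of the left and right joins. A left join keeps every row of the left DataFrame
# and matches rows from the right one (a right join mirrors this). how takes a single value, not 'left/outer'
df3 = pd.merge(df1, df2, how='left')  # or how='right' / how='outer'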
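A small sketch of how the join type changes which keys survive, using two made-up frames (left_val and right_val are hypothetical column names):

```python
import pandas as pd

# Hypothetical frames sharing the keys b and c
df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

inner = pd.merge(df1, df2, on="key")               # keys in both frames
left = pd.merge(df1, df2, on="key", how="left")    # every key from df1
right = pd.merge(df1, df2, on="key", how="right")  # every key from df2
outer = pd.merge(df1, df2, on="key", how="outer")  # union of all keys

for name, frame in [("inner", inner), ("left", left), ("right", right), ("outer", outer)]:
    print(name, sorted(frame["key"]))
```

Keys with no match on the other side get NaN in the columns that come from that side.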
# Convert to a list of rows
val_x = val_x.values.tolist()
# Write to CSV, keeping the header and leaving out the index
id_sample.to_csv(output_file, header=True, index=False)
