Python Encoding Problem Notes 2

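In Python 3:

This QR (query/reply) data is UTF-8 encoded, and errors='ignore' is passed to choose how decoding errors are handled.

Specify the encoding directly when reading and writing the data: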

with open(data_path,'r',encoding='utf-8',errors='ignore') as file_content:
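To make the effect of errors='ignore' concrete, here is a minimal sketch that compares it with the default strict mode and with errors='replace'. The sample bytes and the file name sample.txt are made up for illustration and are not part of the original data.

```python
import os
import tempfile

# Hypothetical sample: valid UTF-8 text with one stray byte (0xff) in the middle.
raw = "query\t".encode("utf-8") + b"\xff" + "reply\n".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "wb") as f:
        f.write(raw)

    # Strict decoding (the default) raises on the invalid byte.
    try:
        with open(sample_path, "r", encoding="utf-8") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors='ignore' silently drops the undecodable byte.
    with open(sample_path, "r", encoding="utf-8", errors="ignore") as f:
        ignored = f.read()

    # errors='replace' keeps a visible U+FFFD marker instead.
    with open(sample_path, "r", encoding="utf-8", errors="replace") as f:
        replaced = f.read()

print(strict_failed, repr(ignored), repr(replaced))
```

Note that errors='ignore' discards data without any warning, so errors='replace' may be the safer choice when you want to see where the bad bytes were.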

Reading and writing the whole file this way raises no errors, and the error handling mode is visible in the open() call:

def split_qr(data_path, output_path):
    with open(output_path, 'a', encoding="utf-8") as f:
        with open(data_path, 'r', encoding='utf-8', errors='ignore') as file_content:
            for line in file_content:
                print("line in file_content: ")  # print each line of the text in turn, e.g. the first line: 比較好哪方面的壓力啊
                print(line)  # the earlier form print("line in file_content: ", line) gave garbled output, probably because the environment was Python 2
                query_reply_arr = line.strip('\n').split('\t')  # in the raw data, q and r on each line are separated by \t, and each line ends with \n
                if len(query_reply_arr) != 2:  # skip lines that have only a q or only an r
                    continue
                label = '0'
                out_line_q = '%s\t%s\n' % (label, query_reply_arr[0].strip(' '))  # join label and content
                f.write(out_line_q)
                out_line_r = '%s\t%s\n' % (label, query_reply_arr[1].strip(' '))  # join label and content
                f.write(out_line_r)

split_qr("./part100_train_raw_data.txt", "./splited_qr.txt")
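The per-line logic can also be checked without touching any files. The sketch below pulls it into a helper; the name split_line and the sample lines are hypothetical and exist only for this illustration.

```python
def split_line(line, label="0"):
    """Return the two labelled output lines for one raw line, or [] if malformed."""
    parts = line.strip("\n").split("\t")
    if len(parts) != 2:
        # Only a query, only a reply, or too many fields: skip the line.
        return []
    return ["%s\t%s\n" % (label, part.strip(" ")) for part in parts]


# Hypothetical sample lines: one well-formed, one with a single field, one with three.
samples = ["你好 \t 很高興認識你\n", "只有一個欄位\n", "a\tb\tc\n"]
results = [split_line(s) for s in samples]

for result in results:
    print(result)
```

Keeping everything as str and leaving the encoding to open(..., encoding="utf-8") means no explicit encode or decode calls are needed in the loop.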

Teacher Lu's code does run into the problem, most likely because it was written for Python 2:

llabel.encode("utf-8"), word_content.strip(' ').encode("utf-8")
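As a rough illustration of why encode calls written for Python 2 misbehave under Python 3, the sketch below formats a line both ways. The variable names and sample text are assumptions made for the example, not taken from the original code.

```python
label = "0"
word_content = " 你好 "

# Python 2 style: encode before formatting. In Python 3 this yields bytes objects.
encoded_label = label.encode("utf-8")
encoded_content = word_content.strip(" ").encode("utf-8")

# %s calls str() on bytes, so the line contains the b'...' repr, not the text itself.
py2_style_line = "%s\t%s\n" % (encoded_label, encoded_content)

# Python 3 style: keep str and let open(..., encoding="utf-8") encode on write.
py3_style_line = "%s\t%s\n" % (label, word_content.strip(" "))

print(repr(py2_style_line))
print(repr(py3_style_line))
```

In Python 3, dropping the explicit encode calls and passing encoding="utf-8" to open() is usually enough.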
