Traversing the files under a path in Python and converting them to UTF-8

os.walk is very convenient. Below are two versions of a traversal function: one that does not use os.walk and one that does.

import os


def detect_nowalk(dir_path):
    # List the entries directly under dir_path and recurse into subdirectories.
    files = os.listdir(dir_path)
    for filename in files:
        print("file:%s\n" % filename)
        # Build the full path; a bare name only resolves in the current directory.
        next_path = os.path.join(dir_path, filename)
        if os.path.isdir(next_path):
            print("file folds:%s\n" % filename)
            detect_nowalk(next_path)


if __name__ == "__main__":
    detect_nowalk(".")

import os


def detect_walk(dir_path):
    # os.walk visits dir_path and every subdirectory, so no recursion is needed.
    for root, dirs, files in os.walk(dir_path):
        for filename in files:
            print("file:%s\n" % filename)
        for dirname in dirs:
            print("dir:%s\n" % dirname)


if __name__ == "__main__":
    detect_walk(".")
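Both functions above print only bare names. When the full path of each file is needed (for example, to open it), the root value yielded by os.walk can be joined with each file name. The following is a minimal sketch; the helper name iter_file_paths and its suffix parameter are illustrative and not part of the original code.

```python
import os


def iter_file_paths(dir_path, suffix=None):
    """Yield the full path of every file under dir_path (hypothetical helper).

    If suffix is given (e.g. ".txt"), only matching file names are yielded.
    """
    for root, dirs, files in os.walk(dir_path):
        for filename in files:
            if suffix is None or filename.endswith(suffix):
                # root is the directory currently being visited by os.walk
                yield os.path.join(root, filename)
```

For instance, `for path in iter_file_paths(".", suffix=".txt"): print(path)` would list every .txt file below the current directory.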

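Also attached is the source code that converts file encodings using the first method (without os.walk). After conversion, some files show up as garbled text when opened in gedit, but display correctly in vi.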

import os


def gbktoutf8(path):
    # Recursively convert every "name.txt" file under path from GBK to UTF-8.
    files = os.listdir(path)
    for filename in files:
        # Join with the parent directory so nested entries resolve correctly.
        full_path = os.path.join(path, filename)
        if os.path.isdir(full_path):
            print("file folds:%s\n" % filename)
            gbktoutf8(full_path)
            continue
        try:
            tokens = filename.split('.')
            if len(tokens) != 2 or tokens[1] != 'txt':
                continue
            print("encode converting (gbk to utf-8) : %s" % filename)
            with open(full_path, 'rb') as gbkfile:
                raw = gbkfile.read()
            # Decode the GBK bytes first; calling encode() directly on the raw
            # bytes does not convert them. Assumes the file really is GBK.
            tstr = raw.decode("gbk").encode("utf-8")
            with open(full_path, 'wb') as utffile:
                utffile.write(tstr)
        except (UnicodeError, IOError, OSError):
            print("error %s" % full_path)


if __name__ == "__main__":
    gbktoutf8(".")
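The same conversion can also be built on os.walk, which removes the manual recursion. The sketch below is one possible variant, not the original script: the function names, the default source encoding gb18030 (a superset of GBK), and the rule of skipping files that already decode as UTF-8 are all assumptions chosen for illustration.

```python
import os


def convert_file_to_utf8(path, src_encoding="gb18030"):
    """Rewrite one file as UTF-8; return True if the file was changed."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
        # Heuristic (an assumption): bytes that are already valid UTF-8,
        # including pure ASCII, are left untouched.
        return False
    except UnicodeDecodeError:
        pass
    text = raw.decode(src_encoding)  # raises UnicodeDecodeError on bad input
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True


def convert_tree_to_utf8(dir_path, suffix=".txt", src_encoding="gb18030"):
    """Convert every file ending in suffix under dir_path; return changed paths."""
    changed = []
    for root, dirs, files in os.walk(dir_path):
        for filename in files:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(root, filename)
            try:
                if convert_file_to_utf8(path, src_encoding):
                    changed.append(path)
            except (UnicodeDecodeError, OSError) as exc:
                print("error %s: %s" % (path, exc))
    return changed
```

Skipping files that are already valid UTF-8 makes the function safe to run twice over the same tree, at the cost of relying on a heuristic rather than on real encoding detection.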

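Update (1.14): I found that the iconv tool that ships with Linux works better: iconv -f gb18030 -t utf8 a.txt >> b.txt. With it, the garbled output that sometimes appeared when using decode("gb18030") (and likewise with "gbk") no longer occurs. Calling it from a Python script is not difficult, so I won't describe it in detail.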
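Calling iconv from a Python script, as mentioned above, might look like the following sketch. It assumes Python 3 and an iconv executable on PATH; the function name iconv_to_utf8 is made up for illustration, and unlike the shell command with >>, this version overwrites the destination file instead of appending to it.

```python
import shutil
import subprocess


def iconv_to_utf8(src_path, dst_path, src_encoding="gb18030"):
    """Convert src_path to UTF-8 with the external iconv tool, writing dst_path."""
    if shutil.which("iconv") is None:
        raise RuntimeError("iconv was not found on PATH")
    with open(dst_path, "wb") as out:
        # Roughly equivalent to: iconv -f gb18030 -t utf8 src_path > dst_path
        subprocess.run(
            ["iconv", "-f", src_encoding, "-t", "utf8", src_path],
            stdout=out,
            check=True,  # raise CalledProcessError if iconv reports a failure
        )
```

A call such as `iconv_to_utf8("a.txt", "b.txt")` would then mirror the command line shown above.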
