Searching text files for specific content with Python

Environment: Python 2.7 on Windows.

The script takes the directory to search as its first argument and the string to search for as its second.

For example: >find.py ./ hello

# coding=utf-8
import os
import sys


# List the files and subdirectories directly under path
def findfile(path):
    f = []
    d = []
    for x in os.listdir(path):
        full = os.path.join(path, x)  # join with path, not the current directory
        if os.path.isfile(full):
            f.append(full)
        else:
            d.append(full)
    return f, d  # return the lists of files and directories


# Count the occurrences of a string in one text file
def findstrcount(file, strtofind):
    count = 0
    thefile = open(file, 'rb')  # binary mode; under Python 2.7 str is a byte string
    while True:
        buffer = thefile.read()
        if not buffer:
            break
        count += buffer.count(strtofind)
    thefile.close()
    return count


# Print every line of a file that contains the string.
# Returns a description of the last match, or None if there is none.
def findstr(file, strtofind):
    s = None
    i = 1
    with open(file) as fh:
        for line in fh:
            # find() returns the start index of the string, or -1 if absent
            if line.find(strtofind) != -1:
                s = os.path.abspath(file) + "------>line:%d" % i
                print(s)
            i = i + 1
    return s


l = []  # global variable holding the target files that were found


# Walk the file list recursively, looking for files that contain the string
def find(p, strtofind):
    try:
        f, d = findfile(p)
        for x in f:
            ret = findstr(x, strtofind)
            if ret:
                l.append(ret)  # presumably the lost body: record the match
        for x in d:
            find(x, strtofind)  # recurse with the full path, no chdir needed
    except Exception as e:
        print(e)


if __name__ == '__main__':
    find(sys.argv[1], sys.argv[2])
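
The same search can probably be written more compactly with os.walk from the standard library, which handles the directory recursion itself. The following is only a sketch of that alternative, not the script above: the function name walk_matches and its return format are made up for illustration, and it assumes the search string can be compared as UTF-8 bytes.

```python
# coding=utf-8
import os


def walk_matches(root, needle):
    """Yield (path, line_number) for every line under root that contains needle.

    walk_matches is a hypothetical helper name used only for this sketch.
    """
    # Files are read in binary mode, so compare bytes with bytes.
    # Assumption: a text needle is encoded as UTF-8.
    if not isinstance(needle, bytes):
        needle = needle.encode("utf-8")
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    for number, line in enumerate(fh, 1):
                        if needle in line:
                            yield path, number
            except (IOError, OSError):
                # Skip files that cannot be opened (permissions, locks, ...)
                continue
```

Looping over walk_matches("./", "hello") and printing each path and line number should give roughly the same report as find.py ./ hello. Because the paths are built from dirpath, nothing depends on the current working directory.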
