pip install python-docx
Word document structure
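document: the whole document
paragraph: a paragraph
run: a text run, i.e. a contiguous piece of text inside a paragraph that shares the same character formatting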
Reading the contents of a Word document
doc.paragraphs returns a list containing one Paragraph object for each paragraph in the document.
from docx import Document  # the class name is capitalized

doc = Document('這是乙個文件.docx')
print(doc.paragraphs)  # a list of Paragraph objects
paragraph.text returns the text content of that paragraph.
from docx import Document

doc = Document('這是乙個文件.docx')
# Print the text of every paragraph, one per line
for paragraph in doc.paragraphs:
    print(paragraph.text)
paragraph.runs returns a list of the runs (text blocks) in that paragraph.
from docx import Document

doc = Document('這是乙個文件.docx')
paragraph = doc.paragraphs[1]  # the second paragraph (indexing starts at 0)
runs = paragraph.runs
print(runs)  # a list of Run objects
run.text returns the text content of that run.
from docx import Document

doc = Document('這是乙個文件.docx')
paragraph = doc.paragraphs[1]
runs = paragraph.runs
# Print the text of each run in the second paragraph
for run in runs:
    print(run.text)
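Putting it together
The following script counts how many paragraphs of the NetEase Q2 2019 earnings release contain the string "profit".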
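from docx import Document

doc = Document('netease q2 2019 earnings release-final.docx')

# Count the paragraphs whose text contains 'profit'.
# The test is case-sensitive and counts each paragraph at most once.
count = 0
for paragraph in doc.paragraphs:
    if 'profit' in paragraph.text:
        count += 1
print(count)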