18 Reading Word Document Content with Python

2021-10-23 06:14:45

pip install python-docx

Word document structure

document: the whole Word file

paragraph: one paragraph in the document

run: a block of text inside a paragraph
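To make the document / paragraph / run hierarchy concrete, here is a minimal sketch that uses only the standard library instead of python-docx. It relies on the fact that a .docx file is a ZIP archive whose main text lives in word/document.xml, where paragraphs are `w:p` elements and runs are `w:r` elements. The in-memory archive, the `DOCUMENT_XML` sample, and the `read_runs` helper are made up for illustration; the archive is a stripped-down stand-in, not a complete file that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical sample content: two paragraphs, the second one with two runs
DOCUMENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>Title</w:t></w:r></w:p>
    <w:p>
      <w:r><w:t xml:space="preserve">Net </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>profit</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def read_runs(docx_bytes):
    """Return one list of run texts per paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iterfind("w:body/w:p", NS):
        runs = []
        for r in p.iterfind("w:r", NS):
            # A run's text is stored in its w:t children
            runs.append("".join(t.text or "" for t in r.iterfind("w:t", NS)))
        paragraphs.append(runs)
    return paragraphs


# Build the stand-in archive in memory (assumption: only document.xml is needed here)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", DOCUMENT_XML)

structure = read_runs(buffer.getvalue())
print(structure)
```

python-docx wraps this same structure in objects, so in everyday code you would normally work with `doc.paragraphs` and `paragraph.runs` rather than parsing the XML yourself.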

Reading Word document content

doc.paragraphs returns a list containing one Paragraph instance for each paragraph in the document

from docx import Document

# Open the Word file
doc = Document('這是乙個文件.docx')

# Prints a list of Paragraph objects, not their text
print(doc.paragraphs)

paragraph.text returns the text content of that paragraph

from docx import Document

doc = Document('這是乙個文件.docx')

# Print the text of every paragraph, one per line
for paragraph in doc.paragraphs:
    print(paragraph.text)

paragraph.runs returns a list containing every run in that paragraph

from docx import Document

doc = Document('這是乙個文件.docx')

# Index 1 is the second paragraph of the document
paragraph = doc.paragraphs[1]
runs = paragraph.runs

# Prints a list of Run objects, not their text
print(runs)

run.text returns the text content of that run

from docx import Document

doc = Document('這是乙個文件.docx')

# Index 1 is the second paragraph of the document
paragraph = doc.paragraphs[1]
runs = paragraph.runs

# Print the text of each run in that paragraph
for run in runs:
    print(run.text)

Putting it together

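from docx import Document

doc = Document('netease q2 2019 earnings release-final.docx')

# The initial value of ls_1 is missing in the source; an empty list is assumed (it is not used below)
ls_1 = []
count = 0

# Count the paragraphs whose text contains 'profit'
for paragraph in doc.paragraphs:
    if 'profit' in paragraph.text:
        count += 1

print(count)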
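Note that the loop above counts paragraphs containing the lowercase string 'profit', not the number of times the word appears: a paragraph that mentions it twice adds 1, and 'Profit' with a capital P is not matched. If the number of occurrences is what you want, one possible approach is sketched below. The `count_word` helper and the sample sentences are made up for illustration; the function works on plain strings so it can be tried without a Word file.

```python
import re


def count_word(texts, word):
    """Return (paragraphs containing word, total occurrences), ignoring case."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    paragraphs_with_word = 0
    total_occurrences = 0
    for text in texts:
        hits = len(pattern.findall(text))
        if hits:
            paragraphs_with_word += 1
        total_occurrences += hits
    return paragraphs_with_word, total_occurrences


# Hypothetical paragraph texts standing in for a real document
sample = [
    "Net profit rose; gross profit fell.",
    "Profit margins improved.",
    "Revenue was flat.",
]

result = count_word(sample, "profit")
print(result)  # (2, 3)
```

With a real document you could pass `(p.text for p in doc.paragraphs)` as `texts`. This is a substring match, so words such as 'profitable' are counted as well; text inside tables is not part of doc.paragraphs and would need to be read separately.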
