re module usage examples

A few words up front:

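The re module has many functions, but three of them, re.sub, re.findall, and re.match, come up constantly in web scraping, where they are used to clean and extract data. Make sure you know these three well.

The small examples below practice re.sub, re.findall, and re.match and show what output each one produces.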

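findall: extract. The key question is what to extract (define a regular expression, then run it against **).

sub: replace. The key question is what to replace and what to replace it with, applied to **.

match: match. The key question is what to match, and it is matched against **.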
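
As a quick illustration of these three roles, here is a minimal sketch on a made-up string. The sample text and the variable names are assumptions chosen for this illustration only; they are not part of the index.html example.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "order 42 shipped, order 7 pending"

# findall: extract every substring that matches the pattern.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['42', '7']

# sub: replace every match with the replacement string.
masked = re.sub(r"\d+", "#", text)
print(masked)  # order # shipped, order # pending

# match: test the pattern at the very start of the string only.
print(re.match(r"order", text) is not None)    # True
print(re.match(r"shipped", text) is not None)  # False, "shipped" is not at the start
```

Note that re.match only ever looks at the beginning of the string, which is why the last call finds nothing even though "shipped" appears later in the text.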

with open('index.html', 'r', encoding='utf-8') as f:
    html = f.read()
print(html)

The printed value of html is:

<html lang="en">
<head>
    <meta charset="utf-8">
    <title>title</title>
</head>
<body>
    <footer>
        <div>
            <div class="email">
                email:kefu@csdn.net
            </div>
            <div class="tel">
                手機號:400-660-0108
            </div>
        </div>
    </footer>
</body>
</html>

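# First import the re module
import re

# Define a regular expression that extracts the email:
# it matches the contents of the div tag with class="email"
pattern_1 = '<div class="email">(.*?)</div>'
# Use the regular expression to extract from html
ret_1 = re.findall(pattern_1, html)
print(ret_1)
# The result is an empty list, because . matches any character except a newline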
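
As a side note, one way around this without touching the string is the re.DOTALL flag (alias re.S), which makes . match newlines as well. The sketch below is a hedged illustration: html_sample is an assumed stand-in for the contents of index.html, and the variable names are made up for this example.

```python
import re

# Stand-in for the contents of index.html (assumed for illustration).
html_sample = (
    '<div class="email">\n'
    '    email:kefu@csdn.net\n'
    '</div>\n'
)
pattern = r'<div class="email">(.*?)</div>'

# Without a flag, "." stops at "\n", so nothing is found.
no_flag = re.findall(pattern, html_sample)
print(no_flag)  # []

# With re.DOTALL, "." also matches "\n", so the div body is captured.
with_flag = re.findall(pattern, html_sample, re.DOTALL)
print(with_flag)  # ['\n    email:kefu@csdn.net\n']
```

The walkthrough here takes the other route and removes the newlines with re.sub() before extracting.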

# So first remove the \n characters, using re.sub()
html_s = re.sub('\n', '', html)
print(html_s)

<html lang="en"><head>    <meta charset="utf-8">    <title>title</title></head><body>    <footer>        <div>            <div class="email">                email:kefu@csdn.net            </div>            <div class="tel">                手機號:400-660-0108            </div>        </div>    </footer></body></html>

# With \n removed, run the extraction (findall) again
ret_2 = re.findall(pattern_1, html_s)
print(ret_2)

['                email:kefu@csdn.net            ']

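# The list element above has leading and trailing spaces; take index 0 of the list and call strip() to remove them
print(ret_2[0].strip())
# The output below is the extracted email address, which is the data we wanted
email:kefu@csdn.net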
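
The trimming can also be folded into the pattern itself. The following is a minimal sketch, not the approach used above: placing \s* on both sides of the capture group absorbs the surrounding spaces and newlines, so neither re.sub() nor strip() is needed. Here html_sample is an assumed stand-in for the relevant part of index.html, and trimmed_pattern, result, and email are names invented for this example.

```python
import re

# Stand-in for the relevant part of index.html (assumed for illustration).
html_sample = (
    '<div class="email">\n'
    '                email:kefu@csdn.net\n'
    '            </div>\n'
)

# \s* outside the group absorbs spaces and newlines,
# so the group holds only the text itself.
trimmed_pattern = r'<div class="email">\s*(.*?)\s*</div>'
result = re.findall(trimmed_pattern, html_sample)
print(result)  # ['email:kefu@csdn.net']

# Guard against an empty list before indexing.
email = result[0] if result else None
print(email)  # email:kefu@csdn.net
```

The guard on the last step matters in scraping code, because indexing [0] on an empty findall result raises IndexError when the page layout changes.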

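# Define a regular expression that matches a password
# Note: the r prefix keeps backslashes from being treated as escapes; ^ and $ anchor the pattern to the start and end of the string
password_pattern = r'^[a-zA-Z0-9_]{6,16}$'
# The password may contain only letters, digits, or underscores, and must be 6 to 16 characters long
pass1 = '1234567'
pass2 = 'k123456'
pass3 = 'k123'

print(re.match(password_pattern, pass1))
print(re.match(password_pattern, pass2))
print(re.match(password_pattern, pass3))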

<re.Match object; span=(0, 7), match='1234567'>
<re.Match object; span=(0, 7), match='k123456'>
None
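
For validation, re.fullmatch is often a tighter fit than re.match with ^ and $. The sketch below assumes the intended rule is 6 to 16 letters, digits, or underscores; PASSWORD_RE and is_valid_password are hypothetical names introduced for this example.

```python
import re

# Assumed rule: 6 to 16 letters, digits, or underscores.
PASSWORD_RE = re.compile(r'[a-zA-Z0-9_]{6,16}')


def is_valid_password(candidate: str) -> bool:
    """Return True if the whole string satisfies the assumed rule."""
    return PASSWORD_RE.fullmatch(candidate) is not None


print(is_valid_password('k123456'))    # True
print(is_valid_password('k123'))       # False, too short
print(is_valid_password('k123456\n'))  # False, the trailing newline is rejected

# For comparison, match() with ^...$ accepts the trailing newline,
# because $ also matches just before a newline at the end of the string.
print(re.match(r'^[a-zA-Z0-9_]{6,16}$', 'k123456\n') is not None)  # True
```

The difference only shows up on input that ends with a newline, which is common when passwords are read from a file or from standard input without stripping.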

Takeaways:

re.sub, re.findall, and re.match are the three commands used most often in web scraping for data cleaning and extraction. Make sure you master all three.
