Steps for Using Regular Expressions

s = r'ab\ncde'  # the r prefix keeps the backslash from being treated as an escape
print(s)  # ab\ncde
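To make the effect of the r prefix concrete, the following small sketch compares a raw literal with an ordinary one. The variable names raw and cooked are only illustrative.

```python
raw = r'ab\ncde'     # backslash and 'n' stay as two separate characters
cooked = 'ab\ncde'   # \n is converted into a single newline character

print(len(raw))      # 7
print(len(cooked))   # 6
print('\n' in raw)   # False
```

Raw literals are the usual choice for patterns because sequences such as \d then reach the re module unchanged.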

1. Import the re module

import re

Define the strings to match against

base_str = 'he6ll2ow7or9ld'
base_str1 = '67he6ll2ow7or9ld'

2. Define the matching rules (using the compile() method)

pattern = re.compile(r'\d+')
pattern2 = re.compile(r'\d')
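As a rough illustration of what compile() gives you: it returns a reusable pattern object, while the module-level functions in re accept the pattern string directly. The names digits and text below are assumptions made for this sketch.

```python
import re

digits = re.compile(r'\d+')        # compiled once, can be reused many times
text = 'he6ll2ow7or9ld'

print(digits.pattern)              # \d+
print(digits.findall(text))        # ['6', '2', '7', '9']
print(re.findall(r'\d+', text))    # ['6', '2', '7', '9']
```

Both calls produce the same result; compiling mainly helps readability when the same rule is applied in several places.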

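3. Start matching

3.1 match('string to match') matches at most once

By default, match() starts at the beginning of the string. If the text at the start does not satisfy the rule, it returns None immediately and does not look any further along the string.

If the start of the string does satisfy the rule, match() returns the matched content as a match object and stops; it does not go on to look for further matches.

result1 = pattern.match(base_str)
print(result1)   # None (no match)

# base_str = '6he6ll2ow7or9ld'
# pattern = re.compile(r'\d+')
# result1 = pattern.match(base_str)
# print(result1)  # <re.Match object; span=(0, 1), match='6'>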
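Because match() can return None, calling .group() on the result without a check raises AttributeError. A minimal sketch of the usual guard, with no_match and hit as illustrative names:

```python
import re

pattern = re.compile(r'\d+')

no_match = pattern.match('he6ll2ow7or9ld')   # starts with 'h', not a digit
print(no_match)                              # None

hit = pattern.match('67he6ll2ow7or9ld')      # starts with digits
if hit is not None:                          # check before using the match object
    print(hit.group())                       # 67
    print(hit.span())                        # (0, 2)
```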

match('string to match'[, start, end]) matches within the specified range

result2 = pattern2.match(base_str1, 7, 10)
print(result2)  # <re.Match object; span=(7, 8), match='2'>
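The start and end arguments can be checked against an ordinary slice. In the sketch below (names digit, text and m are illustrative), the reported span is still relative to the whole string, not to the slice.

```python
import re

digit = re.compile(r'\d')
text = '67he6ll2ow7or9ld'

print(text[7:10])                  # 2ow
m = digit.match(text, 7, 10)       # matching begins at index 7
print(m.group(), m.span())         # 2 (7, 8)
print(digit.match(text, 8, 10))    # None, because text[8] is 'o'
```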

3.2 group(): capture groups

group_str = '4h3e2lloworld'
pattern3 = re.compile(r'\dh\de\dl')
res = pattern3.match(group_str)
print(res)  # <re.Match object; span=(0, 6), match='4h3e2l'>
print(res.group())  # 4h3e2l
pattern4 = re.compile(r'(\d)h(\d)e(\d)l')  # parentheses create capture groups
res = pattern4.match(group_str)
print(res.group())  # 4h3e2l
print(res.group(0))  # 4h3e2l
print(res.group(1))  # 4
print(res.group(2))  # 3
print(res.group(3))  # 2
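Besides group(n), a match object offers a few related accessors. This sketch reuses the same pattern and string; the variable name m is illustrative.

```python
import re

m = re.compile(r'(\d)h(\d)e(\d)l').match('4h3e2lloworld')

print(m.groups())      # ('4', '3', '2')   all groups as a tuple
print(m.group(1, 3))   # ('4', '2')        several groups at once
print(m.span(2))       # (2, 3)            position of group 2 in the string
```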

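Backreferences to groups (extension)

Note: a backreference is not itself a group; it only refers to the value captured by an earlier group.

html_str = '<html><h1>hellopython</h1></html>'
pattern = re.compile(r'<(html)><(h1)>(.*)</\2></\1>')  # \2 and \1 refer back to groups 2 and 1
res2 = pattern.match(html_str)
print(res2.group(1))  # html
print(res2.group(2))  # h1
print(res2.group(3))  # hellopython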
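A backreference is most useful when the captured text is not known in advance. The following sketch, with the hypothetical name tag_pair, requires the closing tag to repeat whatever the opening tag captured.

```python
import re

tag_pair = re.compile(r'<(\w+)>(.*?)</\1>')    # \1 must equal what group 1 captured

ok = tag_pair.match('<h1>hellopython</h1>')
print(ok.group(1), ok.group(2))                # h1 hellopython

bad = tag_pair.match('<h1>hellopython</h2>')   # closing tag differs from opening tag
print(bad)                                     # None
```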

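3.3 search('string to match'[, start, end]) scans forward from the beginning

search() scans the whole string and returns None only if no position in the string satisfies the rule.

On success it returns a match object for the first match and stops there; like match(), it matches at most once.

search_str = 'h3e2l5l7oworld'
pattern = re.compile(r'\d')
res = pattern.search(search_str)
print(res)  # <re.Match object; span=(1, 2), match='3'>
pattern = re.compile(r'\d+')
res = pattern.search(search_str)
print(res)  # <re.Match object; span=(1, 2), match='3'>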
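The difference between match() and search() shows up when the first character does not satisfy the rule. A short side-by-side sketch (digits, text and first are illustrative names):

```python
import re

text = 'h3e2l5l7oworld'
digits = re.compile(r'\d')

print(digits.match(text))             # None: the first character is 'h'
first = digits.search(text)
print(first.group(), first.span())    # 3 (1, 2)
print(digits.search('hello'))         # None: there is no digit anywhere
```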

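3.4 findall() differs from both match() and search()

findall() scans the whole string and returns every substring that satisfies the rule.

The return value is a list in which each element is a matched substring.

The elements are not match objects.

If nothing in the string satisfies the rule, the result is an empty list, not None.

findall_str = 'h3e2ll5o6p8yt9hon'
pattern = re.compile(r'\d')
res = pattern.findall(findall_str)
print(res)  # ['3', '2', '5', '6', '8', '9']
pattern1 = re.compile(r'v')
res1 = pattern1.findall(findall_str)
print(res1)  # []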
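One detail worth knowing: when the rule contains capture groups, findall() returns the group contents rather than the whole match. A small sketch on the same string:

```python
import re

text = 'h3e2ll5o6p8yt9hon'

whole = re.findall(r'[a-z]\d', text)        # no groups: whole matches
one = re.findall(r'[a-z](\d)', text)        # one group: that group's text
two = re.findall(r'([a-z])(\d)', text)      # several groups: tuples

print(whole)     # ['h3', 'e2', 'l5', 'o6', 'p8', 't9']
print(one)       # ['3', '2', '5', '6', '8', '9']
print(two[:2])   # [('h', '3'), ('e', '2')]
```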

3.5 finditer() is very similar to findall()

Both scan the whole string; findall() returns a list,

while finditer() returns an iterator that yields match objects.

finditer_str = 'h3e1l4lo5wo6rl8d'
pattern = re.compile(r'\d')
res = pattern.finditer(finditer_str)
print(res)  # an iterator object, not a list
for i in res:
    # print(i)  # each item is a match object
    print(i.group())
# 3
# 1
# 4
# 5
# 6
# 8
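Since finditer() yields match objects, positions are available alongside the matched text. The sketch below also shows that the iterator can only be consumed once; positions and it are illustrative names.

```python
import re

text = 'h3e1l4lo5wo6rl8d'

# Collect each digit together with the index where it was found.
positions = [(m.group(), m.start()) for m in re.finditer(r'\d', text)]
print(positions)   # [('3', 1), ('1', 3), ('4', 5), ('5', 8), ('6', 11), ('8', 14)]

it = re.finditer(r'\d', text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]   # the iterator is already exhausted
print(first_pass)    # ['3', '1', '4', '5', '6', '8']
print(second_pass)   # []
```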

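3.6 Greedy and non-greedy modes

html = '<div>hello</div><div>world</div><div>python</div>'  # the <div> tags are illustrative

Greedy mode: .* consumes as much as possible

pattern = re.compile(r'<div>.*</div>')
res = pattern.findall(html)
print(res)  # ['<div>hello</div><div>world</div><div>python</div>']

Non-greedy mode: .*? consumes as little as possible

pattern = re.compile(r'<div>.*?</div>')
res = pattern.findall(html)
print(res)  # ['<div>hello</div>', '<div>world</div>', '<div>python</div>']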
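The same greedy versus non-greedy contrast can be seen on a plain string with quote characters as the boundaries. This is a minimal sketch with made-up sample text.

```python
import re

text = '"hello" and "world"'

greedy = re.findall(r'".*"', text)    # runs from the first quote to the last quote
lazy = re.findall(r'".*?"', text)     # stops at the nearest closing quote

print(greedy)   # ['"hello" and "world"']
print(lazy)     # ['"hello"', '"world"']
```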

3.7 Matching Chinese characters

Code point range for Chinese characters: [\u4e00-\u9fa5]

cn_str = 'hello 你好 world 世界'
pattern = re.compile(r'[\u4e00-\u9fa5]+')
res = pattern.findall(cn_str)
print(res)  # ['你好', '世界']
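Note that this range covers Chinese ideographs only; full-width punctuation such as '，' (U+FF0C) and '！' (U+FF01) lies outside it. A small sketch with a made-up sample sentence:

```python
import re

han = re.compile(r'[\u4e00-\u9fa5]+')
text = 'Python 正则，很好用！'

words = han.findall(text)   # punctuation splits the runs of ideographs
print(words)                # ['正则', '很好用']
```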

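A general-purpose regular expression pattern for web scraping:

.*? (anchored by boundary markers on both sides), combined with re.S (so that . also matches newlines)  ---> a pattern that handles almost anything
re.compile(r'<boundary>(.*?)<boundary>', re.S)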
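As a sketch of why re.S matters here, the example below extracts multi-line content between two boundaries. The page snippet and its div markup are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical page fragment whose content spans several lines.
page = '<div class="title">\nHello\nPython\n</div>'

rule = r'<div class="title">(.*?)</div>'

with_s = re.compile(rule, re.S).findall(page)   # . also matches newlines
without_s = re.compile(rule).findall(page)      # . stops at each newline

print(with_s)      # ['\nHello\nPython\n']
print(without_s)   # []
```

Without re.S the rule finds nothing, because the content between the boundaries contains newline characters that . refuses to cross.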
