正規表示式的相關用法

正規表示式，又稱規則表示式。（英語：regular expression，在**中常簡寫為regex、regexp或re），電腦科學的乙個概念。正則表通常被用來檢索、替換那些符合某個模式(規則)的文字。

大家在寫正規表示式的過程中，可利用開源中國的正規表示式測試工具，對其進行檢測，鏈結為：

常見的正規表示式匹配模式如下所示：

re.match嘗試從字串的起始位置匹配的乙個模式，如果不是起始位置匹配的話，match()就返回none

基本的語法結構：re.match(pattern,string,flags=0)

最常規的匹配

#_*_coding: utf-8_*_
import re
content="hello 123 456 word_this is a regex demo"
result=re.match("^hello\s\d+\s\d+\s\w+.*demo$",content)
print(result)
print(result.group())
print(result.span())

泛匹配

import re
content="hello 123 4567 world_this is a regex demo"
result=re.match("^hello.*demo$",content)
print(result)
print(result.group())
print(result.span())

匹配目標

import re
content="hello 123 4567 world_this is a regex demo"
result=re.match("^hello\s(\d+).*demo$",content)
print(result)
print(result.group())
print(result.group(1))
print(result.span())

貪婪匹配

import re
content="hello 123 4567 world_this is a regex demo"
result=re.match("^h.*(\d+).*demo$",content)
print(result)
print(result.group(1))

非貪婪匹配

import re
content="hello 123 4567 world_this is a regex demo"
result=re.match("^h.*?(\d+).*?demo$",content)
print(result)
print(result.group(1))

指定匹配模式，.*可匹配換行符

import re
content='''hello 123 4567 world_this
is a regex demo'''
result=re.match("^h.*?(\d+).*?demo$",content,re.s)
print(result)
print(result.group(1))

轉義字元

import re
content="the price is $5.00"
result=re.match("the price is \$5\.00",content)
print(result)

總結：盡量使用泛匹配，使用括號獲取到匹配目標，盡量使用非貪婪模式，有換行符就用re.s

re.search,掃瞄整個字串，並返回第乙個成功的匹配

總結：為匹配方便，能用search，就不用match

import re
content='''extra string hello 123 4567 world_this
is a regex demo extra string'''
result=re.search("h.*?(\d+).*?demo",content,re.s)
print(result)
print(result.group(1))

匹配例項：

#_*_coding: utf-8_*_
import re
html='''
id="songs-list>搜尋字串，以列表形式返回所有匹配結果
#_*_coding: utf-8_*_
import re
html='''
id="songs-list>替換字串中每乙個匹配的子串返回替換後的字元
#_*_coding: utf-8_*_
import re
content="extra strings hello 1234567 world_this is a regex demo extra strings"
content=re.sub('\d+','',content)
print(content)

如果替換的字串包含元字串本身，可採用下面的方法：

#_*_coding: utf-8_*_
import re
content="extra strings hello 1234567 world_this is a regex demo extra strings"
content=re.sub('(\d+)',r'\1 8910',content)
print(content)

例項：去除html**中的a標籤，並獲取歌名

#_*_coding: utf-8_*_
import re
html='''
id="songs-list>將正規表示式字串編譯成正規表示式物件
#_*_coding: utf-8_*_
import re
content='''hello 1234567 world_this
is a regex demo
'''pattern=re.compile('hello.*?demo',re.s)
result=re.match(pattern,content)
print(result)

#_*_coding: utf-8_*_
import re
import requests
content=requests.get("").text
pattern=re.compile('(.*?)
.*?',re.s)
content=re.sub(' ','',content)
results=re.findall(pattern,content)
forresult
in results:
print("url=",result[0].strip())
print("作者是", result[2].strip())
print("書名是", result[1].strip())

執行結果如下：

正規表示式相關

限定符說明指定零個或更多個匹配例如 w 或 abc 等效於。指定乙個或多個匹配例如 w 或 abc 等效於。指定零個或乙個匹配例如 w?或 abc 等效於。指定恰好 n 個匹配例如 pizza 指定至少 n 個匹配例如 abc 指定至少n 個但不多於m 個匹配。指定盡可能少地使用重複的...

正規表示式相關

我們知道匹配字串通常用正規表示式，因為幾乎每種語言都有自己的正規表示式引擎，所以效率會比你自己寫演算法要高效的多。下面來看下一些常用的正規表示式運算子。注意這裡主要是個人總結，所以都會以一些自己用到的東西為主，如果要看具體的api，請在網上查詢基礎知識儲備稍微注意下一些細節的地方，比如和的...

正規表示式相關

table 特殊符號代表意義 alnum 代表英文大小寫字元及數字，亦即 0 9,a z,a z alpha 代表任何英文大小寫字元，亦即 a z,a z blank 代表空白鍵與 tab 按鍵兩者 cntrl 代表鍵盤上面的控制按鍵，亦即包括 cr,lf,tab,del.等等 digit 代表數...

正規表示式的相關用法

正規表示式 相關

正規表示式相關

正規表示式相關

相關推薦

正規表示式相關