Learning to Use Regular Expressions

1. Use a regular expression to check whether an email address is well formed.

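import re
# local part, '@', then a domain with at least one dot-separated suffix
r = r'^\w+(\.\w+)*@\w+(\.\w+)+$'
e = '[email protected]'  # placeholder; put the address to check here
m = re.match(r, e)
if m:
    print(m.group(0))
else:
    print('error')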
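As a minimal sketch of the same check, the helper below wraps a pattern in a function and runs it against a few addresses. The name `is_valid_email` and the sample addresses are hypothetical, and the pattern is a deliberately simple approximation, not a complete validator for every legal address.

```python
import re

# Word characters with optional dotted parts, '@', then a domain
# made of at least two dot-separated labels.
EMAIL_RE = re.compile(r'\w+(\.\w+)*@\w+(\.\w+)+')

def is_valid_email(address):
    """Return True if the whole string matches the simple pattern."""
    return EMAIL_RE.fullmatch(address) is not None

# Hypothetical sample inputs
for sample in ['alice@example.com', 'bob.smith@mail.example.org',
               'no-at-sign', 'a@b']:
    print(sample, is_valid_email(sample))
```

Using `fullmatch` avoids having to write the `^` and `$` anchors by hand.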

2. Use a regular expression to find all phone numbers.

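import re
text = ''  # placeholder: the text to search goes here
# each match is a (digits before the hyphen, digits after the hyphen) tuple
number = re.findall(r'(\d+)-(\d+)', text)
print(number)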
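To illustrate how `re.findall` behaves with and without capture groups, here is a small sketch. The sample text and the digit counts (3 to 4 digits before the hyphen, 6 to 8 after it) are assumptions chosen for illustration; adjust them to the number format you expect.

```python
import re

# Hypothetical sample text containing two hyphenated numbers.
text = 'Office: 010-12345678, front desk: 0755-87654321.'

# With two groups, findall returns a list of (prefix, number) tuples.
pairs = re.findall(r'(\d{3,4})-(\d{6,8})', text)
print(pairs)

# Without groups, findall returns the full matched strings instead.
full = re.findall(r'\d{3,4}-\d{6,8}', text)
print(full)
```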

3. Use a regular expression to split English text into words: re.split('', news)

import re
news = '''failure is probably the fortification in your pole. it is like a peek your wallet as the thief, when you are thinking how to spend several hard-won lepta, when you are wondering whether new money, it has laid background.'''
# split on runs of whitespace, commas, periods, question marks, and hyphens
word = [w for w in re.split(r'[\s,.?\-]+', news) if w]  # drop empty strings
print(word)
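The sketch below compares two ways to tokenize a short sentence: splitting on separators with `re.split`, and matching the words directly with `re.findall`. The sample sentence is made up for illustration.

```python
import re

# Hypothetical sample sentence
sentence = 'Failure is hard-won, is it not? Yes.'

# Splitting on separator runs leaves an empty string at the end,
# because the sentence itself ends with a separator.
parts = re.split(r'[\s,.?\-]+', sentence)
print(parts)

# Filter out the empty strings ...
words = [w for w in parts if w]

# ... or match the words themselves, which never yields empty strings.
tokens = re.findall(r'[A-Za-z]+', sentence)
print(words == tokens)
```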

4. Use a regular expression to extract the news ID.

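import re
newsurl = ''  # placeholder: a news detail page URL containing '_' and ending in '.html'
# capture everything between the first '_' and the final '.html'
num = re.search(r'_(.*)\.html', newsurl).group(1)
print(num)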
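A short sketch of the extraction on a concrete input follows. The URL is hypothetical (the real address is not given in the text); it only assumes the shape the pattern implies, an underscore somewhere before a trailing `.html`. The check for `None` guards against URLs that do not match.

```python
import re

# Hypothetical URL used only to show what the pattern captures.
newsurl = 'http://news.example.com/html/2018/campus_0404/9183.html'

match = re.search(r'_(.*)\.html', newsurl)
if match:
    captured = match.group(1)         # everything between '_' and '.html'
    newsid = captured.split('/')[-1]  # keep only the last path segment
    print(captured, newsid)
else:
    print('no news ID found')
```

Escaping the dot as `\.` keeps the pattern from matching any character before `html`.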

5. Build the request URL for the click count.

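import re
newsurl = ''  # placeholder: news detail page URL
# the news ID is the last path segment before '.html'
newsid = re.search(r'_(.*)\.html', newsurl).group(1).split('/')[-1]
res = ''.format(newsid)  # placeholder: click-count URL template containing '{}'
print(res)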
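One way to build such a URL is sketched below. The endpoint `http://api.example.com/count` and the query parameter name `id` are hypothetical, since the real template is blank in the text; `urlencode` is used so that unusual characters in the ID are escaped.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real click-count address.
COUNT_URL = 'http://api.example.com/count?{}'

def build_count_url(newsid):
    """Return the click-count URL for one news ID (illustrative only)."""
    return COUNT_URL.format(urlencode({'id': newsid}))

print(build_count_url('9183'))
```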

6. Fetch the click count.

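import requests
import re
newsurl = ''  # placeholder: news detail page URL
newsid = re.search(r'_(.*)\.html', newsurl).group(1).split('/')[-1]
res = requests.get(''.format(newsid))  # placeholder: click-count URL template
# take the text after the last '.html', then strip the surrounding ( ' ) ; characters
clickcount = int(res.text.split('.html')[-1].lstrip("(')").rstrip("');"))
print(clickcount)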
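The parsing step can be tried offline on a sample string. The response body below is an assumption about what the endpoint returns, inferred from the strip calls; the real output is not shown in the text. Note that `lstrip` and `rstrip` remove any of the listed characters, not a fixed prefix or suffix, so a regular expression is shown as a stricter alternative.

```python
import re

# Hypothetical response body
sample = "$('#hits').html('1234');"

# Approach from the text: take what follows the last '.html', then strip
# the characters ( ' ) from the left and ' ) ; from the right.
tail = sample.split('.html')[-1]
count_a = int(tail.lstrip("(')").rstrip("');"))

# Alternative: capture the digits directly.
m = re.search(r"\.html\('(\d+)'\);?\s*$", sample)
count_b = int(m.group(1)) if m else 0

print(count_a, count_b)
```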

7. Wrap steps 4, 5, and 6 in a function: def getclickcount(newsurl):

def getclickcount(newsurl):
    # news ID: last path segment before '.html'
    newsid = re.search(r'_(.*)\.html', newsurl).group(1).split('/')[-1]
    res = requests.get(''.format(newsid))  # placeholder: click-count URL template
    clickcount = int(res.text.split('.html')[-1].lstrip("(')").rstrip("');"))
    return clickcount

8. Wrap the code that fetches news details in a function: def getnewsdetail(newsurl):

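import re
from datetime import datetime
import requests
from bs4 import BeautifulSoup

def getnewsdetail(newsurl):
    resd = requests.get(newsurl)
    resd.encoding = 'utf-8'
    soupd = BeautifulSoup(resd.text, 'html.parser')  # open the news detail page
    title = soupd.select('.show-title')[0].text
    info = soupd.select('.show-info')[0].text
    # c = soupd.select('#content')[0].text  # body text
    # Assumption: info contains a timestamp in the form YYYY-MM-DD HH:MM:SS.
    m = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info)
    dati = datetime.strptime(m.group(0), '%Y-%m-%d %H:%M:%S') if m else None
    # Assumption: the source name follows a '來源:' label in info.
    s = re.search(r'(?:來源|来源)[:：]\s*(\S+)', info)
    if s:
        source = s.group(1)
    else:
        source = 'none'
    content = soupd.select('.show-content')[0].text.strip()
    click = getclickcount(newsurl)
    print(dati, title, newsurl, source, click)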
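The format string passed to `datetime.strptime` is case sensitive: `%Y` is a four-digit year while `%y` is a two-digit year, `%H` is the hour, and `%M` and `%S` are minutes and seconds. The sketch below shows a successful parse and what happens with the wrong case; the timestamp is a made-up example.

```python
from datetime import datetime

stamp = '2018-04-04 10:30:00'  # hypothetical timestamp

dati = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
print(dati.year, dati.month, dati.minute)

# Lowercase '%y' expects a two-digit year, so the four-digit value fails.
try:
    datetime.strptime(stamp, '%y-%m-%d %H:%M:%S')
    failed = False
except ValueError as err:
    failed = True
    print('parse failed:', err)
```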

9. Wrap fetching every news item on one list page in a function: def getlistpage(pageurl):

def getlistpage(pageurl):
    res = requests.get(pageurl)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    for news in soup.select('li'):
        # only list items that contain a news title are news entries
        if len(news.select('.news-list-title')) > 0:
            newsurl = news.select('a')[0].attrs['href']  # link to the detail page
            getnewsdetail(newsurl)

10. Get the total number of news items and compute the total number of list pages, wrapped in a function: def getpagen():

def getpagen():
    res = requests.get('')  # placeholder: news list page URL
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    # the '.a1' element holds the total item count followed by '條'
    total = int(soup.select('.a1')[0].text.rstrip('條'))
    page = (total + 9) // 10  # 10 items per page, rounded up
    return page
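Computing a page count is a ceiling division. The expression `n // 10 + 1` gives one page too many whenever `n` is an exact multiple of 10, which the sketch below makes visible. The helper name `page_count` is hypothetical, and 10 items per page is taken from the divisor used in the text.

```python
def page_count(total_items, per_page=10):
    """Number of pages needed to show total_items, rounding up."""
    return (total_items + per_page - 1) // per_page

# Compare ceiling division with the 'floor plus one' shortcut.
for total in (0, 1, 10, 11, 95):
    print(total, page_count(total), total // 10 + 1)
```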

11. Fetch the details of every news item on every list page.

n = getpagen()
for i in range(1, n + 1):
    pageurl = ''  # placeholder: URL of list page i
    getlistpage(pageurl)
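The loop needs one URL per page number. A minimal sketch of generating them is shown below; the template `http://news.example.com/list/{}.html` is hypothetical, since the real list-page address is blank in the text, and `list_page_urls` is an illustrative helper name.

```python
# Hypothetical template; substitute the real list-page address.
LIST_URL = 'http://news.example.com/list/{}.html'

def list_page_urls(n):
    """Yield the URL of each list page from 1 to n inclusive."""
    for i in range(1, n + 1):
        yield LIST_URL.format(i)

urls = list(list_page_urls(3))
print(urls)
```

Each generated URL could then be passed to `getlistpage` in turn.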
