A fun personal blog; come take a look if you don't believe it: fangzengye.com
import requests
import re
import json
from bs4 import BeautifulSoup
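def get_one_page(url):
    # The User-Agent value is missing from the source; supply a real browser string
    user_agent = ''
    # Assumed header shape; the original value is missing from the source
    headers = {'User-Agent': user_agent}
    # Pass headers by keyword; the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)
    return response.text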
Fetch the page content.
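Why the headers should be passed by keyword can be shown without touching the network. The sketch below uses `fake_get`, a hypothetical stand-in that only copies the leading parameters of `requests.get(url, params=None, **kwargs)`; the URL and the User-Agent string are placeholders, not values from the original post.

```python
# Hypothetical stand-in that mirrors the leading parameters of
# requests.get(url, params=None, **kwargs); it sends nothing.
def fake_get(url, params=None, **kwargs):
    return {
        'url': url,
        'params': params,
        'headers': kwargs.get('headers'),
    }

# Placeholder header value, not taken from the original post
headers = {'User-Agent': 'example-agent/1.0'}

# Positional: the dict is bound to params, so no headers are set
positional = fake_get('https://example.com', headers)

# Keyword: the dict is bound to headers, as intended
keyword = fake_get('https://example.com', headers=headers)

print(positional['headers'])
print(keyword['headers'])
```

With the positional call the dictionary would end up in the query string rather than in the request headers, which is probably not what a scraper wants.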
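def get_information(html_text):
    # The HTML tags inside this pattern were stripped during extraction,
    # so it is kept as found and is likely incomplete
    pattern = re.compile('shtml">(.).*?"rank__price">(.).*?(.*?)', re.S)
    items = re.findall(pattern, html_text)
    for item in items:
        # The yielded structure is missing from the source; the raw tuple is yielded here
        yield item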
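Regular-expression matching.
yield assembles each match into a data structure.
findall returns the list of matches, each of which is a tuple.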
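The notes above can be tried on a small string. This is a minimal sketch, assuming a made-up HTML fragment and a simplified two-group pattern; neither is the real ZOL markup nor the post's original pattern, and the dictionary keys `name` and `price` are illustrative only.

```python
import re

# Hypothetical HTML fragment; the real page markup is not shown in the post
sample_html = '''
<a href="/cam/1.shtml">Camera A</a>
<div class="rank__price">5999</div>
<a href="/cam/2.shtml">Camera B</a>
<div class="rank__price">7999</div>
'''

# Simplified pattern (an assumption); re.S lets '.' match newlines
pattern = re.compile(r'shtml">(.*?)</a>.*?"rank__price">(.*?)</div>', re.S)


def get_information(html_text):
    # findall returns a list of tuples, one tuple per match
    for name, price in re.findall(pattern, html_text):
        # yield one dict per item; the key names are illustrative
        yield {'name': name, 'price': price}


items = list(get_information(sample_html))
for item in items:
    print(item)
```

Because the function is a generator, records are produced one at a time and can be written out as they arrive.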
def recording(information):
    # Append one JSON object per line; ensure_ascii=False keeps Chinese text readable
    with open('豆瓣top250.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(information, ensure_ascii=False) + '\n')
Write the scraped information to a file.
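A file written this way holds one JSON object per line, so it can be read back line by line. The sketch below is a variant under stated assumptions: `recording` takes an extra `path` parameter so it can write to a temporary file, `load_records` is a hypothetical helper, and the sample records are invented.

```python
import json
import os
import tempfile


def recording(information, path):
    # Append mode keeps earlier records; one JSON object per line
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(information, ensure_ascii=False) + '\n')


def load_records(path):
    # Hypothetical helper: parse each non-empty line back into a dict
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Invented sample records written to a temporary file
path = os.path.join(tempfile.mkdtemp(), 'records.txt')
recording({'name': '索尼 A7', 'price': '5999'}, path)
recording({'name': 'Camera B', 'price': '7999'}, path)
records = load_records(path)

# Without ensure_ascii=False, non-ASCII text is written as \uXXXX escapes
escaped = json.dumps({'name': '索尼'})
readable = json.dumps({'name': '索尼'}, ensure_ascii=False)
print(escaped)
print(readable)
```

Both forms parse back to the same data; `ensure_ascii=False` only makes the file easier to read by eye.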
def main():
    for i in range(0, 1):
        # The target URL is missing from the source
        response = get_one_page('')
        html_text = get_information(response)
        for m in html_text:
            recording(m)
        print('Scraping page ' + str(i))
    print('Scraping finished!')

main()