Scraping the Top 100 Movies on the Maoyan Ranking

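import json
import re
import time

import requests
from requests.exceptions import RequestException


# Fetch the content of a single page; return None on any failure.
def get_one_page(url):
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


# Filter one page with a regular expression and build a dict for each movie.
# yield turns this into a generator: each next() call resumes from the
# position of the previous yield.
# NOTE: the HTML tags inside the pattern and the body of the yield are missing
# from the source. The tags and dict keys below are an assumed reconstruction.
def parse_one_page(html):
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a.*?>(.*?)</a>'
        r'.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
        r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)
    for item in re.findall(pattern, html):
        yield {
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'star': item[3].strip(),
            'releasetime': item[4].strip(),
            'score': item[5] + item[6],
        }


# Append one record to the output file as a JSON line.
def write_to_file(content):
    with open('result.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


# Take an offset, fetch the matching page, and filter it.
def main(offset):
    # The base URL is blank in the source and is left as-is here; set it to
    # the ranking page URL that ends with the offset parameter.
    url = '' + str(offset)
    html = get_one_page(url)
    if html is None:  # the request failed, so there is nothing to parse
        return
    # Iterating the generator keeps calling next() until it is exhausted.
    for item in parse_one_page(html):
        print(item)
        write_to_file(item)


if __name__ == '__main__':
    for i in range(10):
        main(offset=i * 10)
        time.sleep(1)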
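
To see how the generator in the parsing step behaves without touching the network, here is a minimal sketch that runs a shortened pattern over a made-up HTML fragment. The fragment, the names `SAMPLE_HTML`, `PATTERN`, and `parse_entries`, and the dict keys are all assumptions for illustration; the real page markup may differ.

```python
import re

# Hypothetical fragment imitating two ranking entries (not taken from the real site).
SAMPLE_HTML = '''
<dd><i class="board-index board-index-1">1</i>
<p class="name"><a href="/films/1">Movie A</a></p>
<p class="releasetime">Release date: 1993-01-01</p></dd>
<dd><i class="board-index board-index-2">2</i>
<p class="name"><a href="/films/2">Movie B</a></p>
<p class="releasetime">Release date: 1994-09-10</p></dd>
'''

# re.S lets "." match newlines, so one pattern can span a whole <dd> entry.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>'
    r'.*?name"><a.*?>(.*?)</a>'
    r'.*?releasetime">(.*?)</p>.*?</dd>',
    re.S,
)


def parse_entries(html):
    """Yield one dict per matched entry; the body only runs once the caller iterates."""
    for index, title, release in PATTERN.findall(html):
        yield {'index': index, 'title': title, 'time': release.strip()}


entries = parse_entries(SAMPLE_HTML)  # creates the generator; no matching has run yet
first = next(entries)                 # runs the body up to the first yield
rest = list(entries)                  # resumes after that yield until exhausted
print(first)
print(rest)
```

Calling `next()` once and then `list()` on the same generator shows that iteration picks up where the previous `yield` left off rather than starting over.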

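Scraping the Maoyan movie ranking with Python

The complete code is below. Having some spare time, I switched the HTML parsing from regular expressions to XPath and BeautifulSoup; each approach has its own strengths. With a regular expression, extraction is done in one go: a single pattern pulls out every field you need, provided the pattern is written correctly. That also makes a regex harder to write than XPath or BeautifulSoup, and when extraction goes wrong, ...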
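
As a rough illustration of the path-based style contrasted with regex above, here is a sketch that uses the limited XPath subset in the standard library's `xml.etree.ElementTree` as a stand-in for lxml or BeautifulSoup. The markup in `SAMPLE` and the function name `parse_with_paths` are hypothetical, and the fragment is deliberately well-formed XML; real HTML pages usually are not and would need one of those third-party parsers.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment for illustration only.
SAMPLE = """
<dl>
  <dd><i class="board-index">1</i><p class="name"><a href="#">Movie A</a></p><p class="star">Starring: X</p></dd>
  <dd><i class="board-index">2</i><p class="name"><a href="#">Movie B</a></p><p class="star">Starring: Y</p></dd>
</dl>
"""


def parse_with_paths(markup):
    """Yield one dict per <dd> entry, looking up each field with its own path."""
    root = ET.fromstring(markup)
    for dd in root.findall("dd"):
        yield {
            "index": dd.findtext("i[@class='board-index']"),
            "title": dd.findtext("p[@class='name']/a"),
            "star": dd.findtext("p[@class='star']"),
        }


movies = list(parse_with_paths(SAMPLE))
print(movies)
```

Each field has its own short path, so a field that fails to match comes back as `None` on its own instead of breaking the whole match, which is the usual trade-off against a single long regex.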

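Scraping the Maoyan movie ranking list

Import the modules we need: import re, import requests. Part 1, fetching the page content: (1) declare the target URL, i.e. the address to scrape: base_url; (2) imitate a browser: headers; (3) send the request: response = requests.get(base_url, headers=headers); (4) receive the response da...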
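
The first three steps (target URL, browser-like headers, sending the request) can be sketched as follows. To keep the example free of third-party dependencies it uses the standard library's `urllib.request` instead of `requests.get(base_url, headers=headers)`, which is a deliberate substitution. The URL, the User-Agent string, and the helper names `build_request` and `fetch` are placeholders, not values from the original post.

```python
import urllib.request

# Hypothetical values for illustration; neither appears in the original text.
BASE_URL = "https://example.com/board"
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def build_request(url, headers):
    """Attach browser-like headers to a request object without sending it."""
    return urllib.request.Request(url, headers=headers)


def fetch(url, headers, timeout=10):
    """Send the request and return the decoded body (performs network I/O)."""
    with urllib.request.urlopen(build_request(url, headers), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


request = build_request(BASE_URL, HEADERS)
print(request.full_url, request.header_items())
```

Keeping the request construction separate from the network call makes it possible to inspect the headers before anything is sent.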

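Scraping Maoyan movies

One of my jobs requires me to list the daily screening schedules of two cinemas, and I did not want to copy and paste them from Maoyan every time, so I wrote a crawler that reports each day's schedule. Limitation: it only works on the day itself, not on the evening before; I will consider fixing that later. # coding: utf-8 import requests import re from bs4 imp...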
