Scraping the Maoyan Movies Top 100 with a Web Crawler

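Approach

1: Request ** and fetch the page source

2: Parse the page and extract the data we want

3: Loop over multiple pages

4: Write the data to a local file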
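Step 3 depends on how the board is paged. The listing below calls `main(i*10)` for `i` in `range(10)`, which suggests ten pages of ten films each, selected by an offset appended to the URL. A minimal sketch of that offset arithmetic follows; `BASE_URL` and `page_urls` are hypothetical names, and the address is a placeholder because the real one is elided in this post.

```python
# Hypothetical base URL: the real address is elided in the original post.
BASE_URL = "https://example.com/board?offset="


def page_urls(pages=10, page_size=10):
    """Return one URL per page, stepping the offset by page_size."""
    return [BASE_URL + str(i * page_size) for i in range(pages)]


urls = page_urls()
print(urls[0])   # first page, offset 0
print(urls[-1])  # last page, offset 90
```

Building the list up front keeps the paging logic separate from the network code, so it can be checked without sending any requests.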

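'''
Goal: scrape the film information for the Maoyan Top 100 board.
Approach:
1. Request ** and fetch the page source
2. Parse the page and extract the data we want
3. Loop over multiple pages
4. Write the data to a local file
'''
import json
import re

import requests
from requests.exceptions import RequestException
# import vthread  # imported in the original listing but unused in the code shown


# Fetch the response body for a single page
def get_one_page(url):
    try:
        # The header values are elided in the source; a User-Agent normally goes here
        headers = {}
        response = requests.get(url, headers=headers)
        # Check that the request succeeded
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


# Parse a single page with a regular expression.
# The HTML tags inside the pattern were stripped from the source listing; the
# tags below are restored from the capture-group layout and are an assumption
# about the page markup.
def regular_one_page(html, regular_method=re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a.*?>(.*?)</a>'
        r'.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
        r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
        re.S)):
    istr = re.findall(regular_method, html)
    # print(istr)
    for istr1 in istr:
        # The yielded value is elided in the source; these key names are assumed
        yield {
            'index': istr1[0],
            'image': istr1[1],
            'title': istr1[2],
            'star': istr1[3].strip(),
            'releasetime': istr1[4].strip(),
            'score': istr1[5] + istr1[6],
        }


# Append one record to the output file as a line of JSON
def write_to_file(content):
    # encoding='utf-8' together with ensure_ascii=False keeps Chinese text readable
    with open('maoyantop100.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


# Main routine: fetch, parse, and save one page
def main(offset):
    # The base URL is elided in the source; only the offset suffix survives
    url = '' + str(offset)
    # print(url)
    html = get_one_page(url)
    if html is None:
        # The request failed, so there is nothing to parse
        return
    for istr1 in regular_one_page(html):
        print(istr1)
        write_to_file(istr1)


if __name__ == '__main__':
    # Ten pages of ten films each: offsets 0, 10, ..., 90
    for i in range(10):
        main(i * 10)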
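The parsing and serialization steps can be tried without touching the network. The sketch below runs a pattern with the same seven capture groups over a hand-written HTML fragment and serializes the result as one line of JSON. `SAMPLE_HTML`, `PATTERN`, `parse_entries`, and the dictionary keys are hypothetical, and the tag layout is an assumption about what one board entry looks like, not a copy of the real page.

```python
import json
import re

# Hypothetical fragment shaped like one board entry; the real page markup
# is not shown in the post, so this layout is an assumption.
SAMPLE_HTML = """
<dd>
  <i class="board-index board-index-1">1</i>
  <img data-src="https://example.com/poster.jpg" class="board-img" />
  <p class="name"><a href="/films/1">Example Film</a></p>
  <p class="star">
      Starring: Actor A, Actor B
  </p>
  <p class="releasetime">Release date: 1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
"""

# Seven capture groups: rank, poster URL, title, cast, release date,
# and the integer and fractional parts of the score.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a.*?>(.*?)</a>'
    r'.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
    re.S,  # let "." match newlines, since each entry spans several lines
)


def parse_entries(html):
    """Yield one dict per board entry found in html."""
    for index, image, title, star, release, integer, fraction in PATTERN.findall(html):
        yield {
            "index": index,
            "image": image,
            "title": title,
            "star": star.strip(),
            "releasetime": release.strip(),
            "score": integer + fraction,
        }


for entry in parse_entries(SAMPLE_HTML):
    # ensure_ascii=False keeps non-ASCII titles readable in the output
    print(json.dumps(entry, ensure_ascii=False))
```

Each record is written as its own JSON line, so the output file can later be read back one line at a time with `json.loads`.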

