Scraping the Douban Top 250 movies with Python: titles, ratings, comments, and more

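import requests
from bs4 import BeautifulSoup

# Assumption: the original post never shows how `headers` is defined.
headers = {'User-Agent': 'Mozilla/5.0'}

def get_movies():
    movie_list = []
    for i in range(0, 10):  # 10 pages, 25 films per page
        # The base URL is missing from the source; the page offset is appended to it.
        link = '' + str(i * 25)
        r = requests.get(link, headers=headers, timeout=10)
        print("Page", i + 1, "response status code:", r.status_code)
        soup = BeautifulSoup(r.text, "lxml")
        # `'hd' or 'bd'` evaluates to just 'hd', so only 'hd' divs were ever matched.
        div_list = soup.find_all('div', class_='hd')
        for each in div_list:
            movie = each.a.span.text.strip()
            movie_list.append(movie)
    return movie_list

movies = get_movies()
# Mode 'w' clears the file before writing. The original stops after open();
# writing one title per line is an assumed completion.
with open('top250.txt', 'w', encoding='utf-8') as f:
    for movie in movies:
        f.write(movie + '\n')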
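The loop above walks ten pages of 25 films each by appending the offset i*25 to the list URL. A minimal sketch of that offset arithmetic follows. The helper names `page_offsets` and `page_urls` and the placeholder base URL are hypothetical, introduced only for illustration; they are not part of the original code.

```python
def page_offsets(pages=10, page_size=25):
    """Return the start offset of each results page: 0, 25, 50, ..."""
    return [i * page_size for i in range(pages)]


def page_urls(base_url, pages=10, page_size=25):
    """Append each offset to a base URL that ends in the offset parameter."""
    return [base_url + str(offset) for offset in page_offsets(pages, page_size)]


# "https://example.invalid/top?start=" is a placeholder, not the real address.
urls = page_urls("https://example.invalid/top?start=")
print(len(urls))
print(urls[0])
print(urls[-1])
```

Building the URL list up front keeps the paging logic separate from the request and parsing code, which makes it easy to check before any page is fetched.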
Stripped of its HTML tags, the content of r.text for each film looks roughly like this:
肖申克的救贖
/ The Shawshank Redemption
/ 月黑高飛(港)  /  刺激1995(臺)
導演: 弗蘭克·德拉邦特 Frank Darabont   主演: 蒂姆·羅蘋斯 Tim Robbins /...
1994 / 美國 / 犯罪 劇情
9.7 1612528人評價
希望讓人自由。
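Two lines of that text carry structured data: the year / country / genre line, and the rating followed by the vote count, which may run together in the extracted text. The sketch below shows one way to split them using only the standard library. The function names are hypothetical, and it assumes the year is a plain number and the rating always has exactly one decimal digit, which holds for the sample above but is not guaranteed for every entry.

```python
import re


def parse_meta(line):
    """Split a line like '1994 / 美國 / 犯罪 劇情' into year, country, genres."""
    year, country, genres = [part.strip() for part in line.split("/")]
    return {"year": int(year), "country": country, "genres": genres.split()}


def parse_rating(line):
    """Split the rating from the vote count, with or without a space between."""
    # Assumes one decimal digit in the rating, so '9.71612528' reads as 9.7 + 1612528.
    match = re.match(r"\s*(\d+\.\d)\s*(\d+)人評價", line)
    if match is None:
        raise ValueError("unrecognised rating line: " + line)
    return {"rating": float(match.group(1)), "votes": int(match.group(2))}


print(parse_meta("1994 / 美國 / 犯罪 劇情"))
print(parse_rating("9.71612528人評價"))
```

In practice it is more robust to read the rating and the vote count from their own elements in the HTML, as the later examples do, rather than from the flattened text.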

A slightly more advanced version

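import urllib.request
from bs4 import BeautifulSoup
# from selenium import webdriver  # only needed for the browser-driven variant below

# The page URL is missing from the source.
page = urllib.request.urlopen("")
contents = page.read()
soup = BeautifulSoup(contents, "html.parser")
# driver = webdriver.Chrome("chromedriver.exe")  # path to chromedriver
# driver.get(r"")

# The attrs arguments were lost in the source. The class names below are
# assumptions based on Douban's Top 250 markup, not the author's confirmed values.
mov_list = soup.find_all(attrs={'class': 'item'})
for each in mov_list:
    movname = each.find(attrs={'class': 'title'}).get_text()
    print('Title:', movname)
    rate = each.find(attrs={'class': 'rating_num'}).get_text()
    print('Rating:', rate)
    comment = each.find(attrs={'class': 'inq'}).get_text()
    print('Comment:', comment)  # assumed: the original ends before printing the comment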


Alternatively, the same scraper can be written with requests and BeautifulSoup like this:

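import requests
from bs4 import BeautifulSoup

# Assumption: `headers` is not defined anywhere in the original post.
headers = {'User-Agent': 'Mozilla/5.0'}

# Mode 'w+' truncates the file before writing; use 'a' to append instead.
f = open('top250.txt', 'w+', encoding='utf-8')
for i in range(0, 10):
    # The base URL is missing from the source; the page offset is appended to it.
    link = "" + str(i * 25)
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.text, "lxml")
    mov_list = soup.find_all(class_="item")
    # The attrs arguments below were lost in the source; these class names are
    # assumptions based on Douban's Top 250 markup.
    for each in mov_list:
        number = each.find(attrs={'class': 'pic'}).em.text.strip()
        print('Rank:', number)
        movname = each.find(attrs={'class': 'title'}).get_text().strip()
        print('Title:', movname)
        # .p.text extracts the text of the <p> element; strip() removes
        # leading and trailing whitespace from the string.
        director = (each.find(attrs={'class': 'bd'}).p.text
                    .replace(" ", "").replace("\n", "")
                    .replace("...", "").replace("/", "").strip())
        print(director)
        rate = each.find(attrs={'class': 'rating_num'}).get_text()
        print('Rating:', rate)
        comment = each.find(attrs={'class': 'inq'}).get_text().strip()
        f.writelines([number, '\n', movname, '\n', director, '\n', rate, '\n', comment, '\n'])
f.close()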
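The director line above is cleaned with a chain of replace() and strip() calls. A minimal sketch of the same cleanup as a single helper follows; `clean_credits` is a hypothetical name, not part of the original script. It differs from the chain in one respect: the regular expression removes every kind of whitespace, including tabs and non-breaking spaces, not only ordinary spaces and newlines.

```python
import re


def clean_credits(raw):
    """Remove all whitespace, '...' and '/' from a credits string."""
    # \s matches spaces, newlines, tabs and Unicode whitespace such as \xa0.
    no_space = re.sub(r"\s+", "", raw)
    return no_space.replace("...", "").replace("/", "")


sample = "\n  導演: 弗蘭克·德拉邦特 Frank Darabont   主演: 蒂姆·羅蘋斯 Tim Robbins /...\n  1994 / 美國 / 犯罪 劇情\n"
print(clean_credits(sample))
```

Keeping the cleanup in one function makes it easy to change later, for example to keep the spaces inside English names.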

