Scraping Douban movie ranking data with Python

2021-10-04 16:43:01

Movie recommendations: scraping Douban movie ranking data

Goal:

Target data: (1) rank (2) movie title (3) link (4) director and cast (5) review (6) rating score (7) number of ratings (8) review text
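One way to hold the fields listed above is a small record type that can be written out as CSV. This is only a sketch: the names `MovieRecord` and `records_to_csv`, the field names, and the sample values are illustrative assumptions, not taken from the original post. Items (5) and (8) are folded into a single `review` field here, since the scraping code gathers seven values per movie.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class MovieRecord:
    """One ranking entry; field names are illustrative, not from the original post."""
    rank: int
    title: str
    link: str
    director_cast: str
    score: float
    rating_count: int
    review: str = ""  # some entries may have no one-line review


def records_to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(MovieRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values, not real ranking data.
sample = MovieRecord(1, "Example Movie", "https://example.com/movie/1",
                     "Director: A. Person", 9.5, 1000, "A placeholder review.")
csv_text = records_to_csv([sample])
```

A fixed set of columns with a header row tends to be easier to load back into a spreadsheet or a data frame than space-separated key/value pairs.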

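import requests
from bs4 import BeautifulSoup

# Assumption: the source never defines headers; a browser-like User-Agent is a common choice.
headers = {'User-Agent': 'Mozilla/5.0'}


# Extract the fields
def get_top(url):
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    nums = soup.select('em')
    titles = soup.find_all('div', class_='hd')
    actors = soup.find_all('p', class_='')
    links = soup.select('ol li div a')
    rating_nums = soup.find_all('span', class_='rating_num')
    evaluate_numbers = soup.find_all('div', class_='star')
    evaluates = soup.find_all('span', class_='inq')
    # Put the fields into a dict
    for num, title, link, actor, rating_num, evaluate_number, evaluate in zip(
            nums, titles, links, actors, rating_nums, evaluate_numbers, evaluates):
        # The dict literal is missing from the source. The keys and values
        # below are an assumed reconstruction based on the target fields.
        data = {
            'rank': num.get_text(),
            'title': title.get_text(' ', strip=True),
            'link': link['href'],
            'director_cast': actor.get_text(' ', strip=True),
            'score': rating_num.get_text(),
            'rating_count': evaluate_number.get_text(' ', strip=True),
            'review': evaluate.get_text(),
        }
        print(data)
        # Write to file (the with block closes the file; the original
        # wrote file.close without parentheses, which never closed it)
        with open(r'd:\hh.txt', 'a', encoding='utf-8') as file:
            for k, v in data.items():
                s2 = str(v)
                file.write(k + ' ')
                file.write(s2 + ' ')
            file.write('\n')


if __name__ == '__main__':
    # Crawl multiple pages
    for i in range(11):
        # The beginning of this URL (including its {} placeholder) is missing from the source.
        urls = {'…&filter='.format(i * 25)}
        # Iterate over the URLs
        for url in urls:
            get_top(url)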
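The code above builds one page-wide list per field and pairs them with zip. zip stops at the shortest list and pairs by position, so a movie that lacks one field (for example, no `inq` span) would shift every later value in that list. A possible alternative, offered as a sketch rather than as the author's method, is to parse each list item on its own. The sample HTML below is invented for illustration and only mimics the class names used in the code above; `parse_entries` and `text_of` are hypothetical helper names, and `html.parser` is used instead of `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Invented sample markup: two entries, the second one without a review line.
SAMPLE_HTML = """
<ol>
  <li><div class="item">
    <em>1</em>
    <div class="hd"><a href="https://example.com/m/1"><span class="title">First</span></a></div>
    <p class="">Director: A</p>
    <div class="star"><span class="rating_num">9.5</span><span>1000 ratings</span></div>
    <span class="inq">A placeholder review.</span>
  </div></li>
  <li><div class="item">
    <em>2</em>
    <div class="hd"><a href="https://example.com/m/2"><span class="title">Second</span></a></div>
    <p class="">Director: B</p>
    <div class="star"><span class="rating_num">9.0</span><span>500 ratings</span></div>
  </div></li>
</ol>
"""


def text_of(tag):
    """Return the stripped text of a tag, or an empty string if the tag is missing."""
    return tag.get_text(" ", strip=True) if tag is not None else ""


def parse_entries(html):
    """Parse each list item separately so a missing field cannot shift the others."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for item in soup.select("ol li"):
        link = item.select_one("div.hd a")
        entries.append({
            "rank": text_of(item.find("em")),
            "title": text_of(item.select_one("div.hd")),
            "link": link["href"] if link is not None else "",
            "director_cast": text_of(item.find("p")),
            "score": text_of(item.find("span", class_="rating_num")),
            "star": text_of(item.find("div", class_="star")),
            "review": text_of(item.find("span", class_="inq")),
        })
    return entries


entries = parse_entries(SAMPLE_HTML)
for entry in entries:
    print(entry)
```

With this layout the second entry is kept with an empty `review` instead of being dropped or mismatched. Whether the real page matches the sample markup is an assumption that should be checked against the live HTML.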
