Python Web Scraping: Crawling Douban Movies (Part 2)

For the previous project, see:

The previous project collected a batch of movie URLs. This time we fetch the details of an individual movie.

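import re

import bs4
import requests

# Helper modules from the author's own project (not shown in this post):
# request_body provides get_header()/get_proxy(); file_operation provides segmented().
import file_operation
import request_body


# Return a BeautifulSoup object named soup for the given url
def get_url_html_soup(url):
    header = request_body.get_header()
    proxies = request_body.get_proxy()
    req = requests.get(url=url, proxies=proxies, headers=header)
    html = req.text
    if req.status_code == 200:
        print("Request succeeded!")
        soup = bs4.BeautifulSoup(html, 'lxml')  # class name is BeautifulSoup, not beautifulsoup
        return soup
    else:
        print("Request failed!")
        return None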
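get_url_html_soup depends on request_body.get_header() and request_body.get_proxy(), whose code is not shown in this post. As a rough sketch only, assuming get_header returns a headers dict with a randomly chosen User-Agent and get_proxy returns a mapping in the {'http': ..., 'https': ...} shape that requests accepts for its proxies argument, the two helpers might look like this. The User-Agent strings and the proxy address are placeholders, not values from the original project.

```python
import random

# Hypothetical stand-ins for the author's request_body module (not the original code).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Safari/605.1.15",
]

# Placeholder proxy pool; replace with real proxy addresses before use.
PROXY_POOL = ["http://127.0.0.1:8080"]


def get_header():
    """Return a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def get_proxy():
    """Return a proxies mapping in the shape requests.get(proxies=...) expects."""
    proxy = random.choice(PROXY_POOL)
    return {"http": proxy, "https": proxy}
```

Douban is widely reported to reject requests that carry the default python-requests User-Agent, which is presumably why the original code sends a custom header; whether the author also rotates it is not stated.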

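# The url contains the movie's id, so a regular expression pulls the digits out of the url.
def get_movie_id(url):
    pattern = re.compile(r'\d')
    digits = pattern.findall(url)               # every single digit in the url, as a list
    joined = file_operation.segmented(digits)   # author's helper; presumably joins the list with spaces
    movie_id = joined.replace(' ', '')          # drop the separators so only the digits remain
    return movie_id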
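Matching r'\d' collects every digit anywhere in the URL, so a URL that carries other digits (for example in a query string) produces a wrong id. Assuming movie URLs follow Douban's https://movie.douban.com/subject/<id>/ pattern, a narrower alternative captures only the digit run after /subject/ and needs no helper module. The function name extract_movie_id is hypothetical.

```python
import re
from typing import Optional

# Assumes Douban movie URLs of the form https://movie.douban.com/subject/<id>/
SUBJECT_ID = re.compile(r"/subject/(\d+)")


def extract_movie_id(url: str) -> Optional[str]:
    """Return the numeric id that follows /subject/, or None if there is none."""
    match = SUBJECT_ID.search(url)
    return match.group(1) if match else None


url = "https://movie.douban.com/subject/1292052/?ref=top250&page=2"
print("".join(re.findall(r"\d", url)))  # 12920522502  (every digit in the url)
print(extract_movie_id(url))            # 1292052
```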

# Pass soup in as the argument and extract the corresponding information from it
# Return the movie title
def get_movie_title(soup):
    movie_name = soup.find('h1').text  # get the movie title
    return movie_name
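soup.find('h1').text returns all the text inside the <h1>, including the surrounding whitespace. If, as assumed here, Douban's heading puts the release year in parentheses after the title, a small helper can separate the two. The helper name and the sample input are written for illustration only.

```python
import re
from typing import Optional, Tuple

# Assumes the h1 text looks like "<title>\n(<year>)" with extra whitespace around it.
YEAR_IN_PARENS = re.compile(r"\((\d{4})\)\s*$")


def split_title_year(h1_text: str) -> Tuple[str, Optional[str]]:
    """Split raw h1 text into (title, year); year is None when absent."""
    text = " ".join(h1_text.split())  # collapse newlines and runs of spaces
    match = YEAR_IN_PARENS.search(text)
    if match is None:
        return text, None
    return text[:match.start()].strip(), match.group(1)


print(split_title_year("\n    肖申克的救赎 The Shawshank Redemption\n    (1994)\n"))
# ('肖申克的救赎 The Shawshank Redemption', '1994')
```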
# Return the movie's director names
def get_movie_directors(soup):
    contents = soup.find('div', id='info')
    # Build the regular expression; the label must match the Simplified Chinese text on the Douban page
    directors = re.compile(r'导演:(.*)')
    # Search the text of contents for a match (run the compiled regular expression)
    f = directors.findall(contents.text)[0]
    lists = f.split()
    director_name = file_operation.segmented(lists)
    return director_name
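findall(...)[0] raises IndexError whenever the info block has no line with the requested label, for example if a page lists no screenwriter. A hedged alternative returns None instead of crashing. It is sketched on plain text so it runs without fetching a page; the helper name first_field and the sample info text are made up for illustration.

```python
import re
from typing import Optional


def first_field(info_text: str, label: str) -> Optional[str]:
    """Return the text after '<label>:' on its line, or None if the label is absent."""
    match = re.search(re.escape(label) + r":(.*)", info_text)
    return match.group(1).strip() if match else None


sample_info = "导演: 弗兰克·德拉邦特\n主演: 蒂姆·罗宾斯 / 摩根·弗里曼\n类型: 剧情 / 犯罪"
print(first_field(sample_info, "导演"))  # 弗兰克·德拉邦特
print(first_field(sample_info, "编剧"))  # None
```

In get_movie_directors this would be called as first_field(contents.text, '导演'), with the caller deciding what to do when the result is None.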
# Return the movie's screenwriter names
def get_movie_screenwriter(soup):
    contents = soup.find('div', id='info')
    screenwriter = re.compile(r'编剧:(.*)')
    f = screenwriter.findall(contents.text)[0]
    lists = f.split()
    screenwriter_name = file_operation.segmented(lists)
    return screenwriter_name
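The pattern r'编剧:(.*)' works because, by default, . in a Python regular expression does not match a newline, so (.*) stops at the end of the screenwriter line instead of swallowing the rest of the info block. This assumes each field sits on its own line of contents.text, as the functions above appear to rely on. The made-up sample below shows the difference re.DOTALL would make.

```python
import re

sample_info = "编剧: 弗兰克·德拉邦特 / 斯蒂芬·金\n主演: 蒂姆·罗宾斯\n类型: 剧情"

# Default: '.' stops at '\n', so only the screenwriter line is captured.
line_only = re.findall(r"编剧:(.*)", sample_info)[0]

# With re.DOTALL, '.' also matches '\n' and the capture runs to the end of the text.
whole_rest = re.findall(r"编剧:(.*)", sample_info, re.DOTALL)[0]

print(repr(line_only))   # ' 弗兰克·德拉邦特 / 斯蒂芬·金'
print(repr(whole_rest))  # ' 弗兰克·德拉邦特 / 斯蒂芬·金\n主演: 蒂姆·罗宾斯\n类型: 剧情'
```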
# Return the names of the movie's leading cast
def get_movie_character(soup):
    contents = soup.find('div', id='info')
    character = re.compile(r'主演:(.*)')
    f = character.findall(contents.text)[0]
    lists = f.split()
    characters_name = file_operation.segmented(lists)
    return characters_name
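f.split() (written str.split(f) in the original) splits on any whitespace, so the ' / ' separators end up as separate '/' items, and names that themselves contain spaces, such as Latin-script names, are cut in two. Assuming multiple names are separated by '/', as in the made-up sample below, splitting on the slash keeps each name whole. The helper name split_names is hypothetical.

```python
from typing import List


def split_names(field_text: str) -> List[str]:
    """Split a 'name / name / name' field into a list of stripped names."""
    return [name.strip() for name in field_text.split("/") if name.strip()]


cast = " Tim Robbins / Morgan Freeman / Bob Gunton"
print(cast.split())       # ['Tim', 'Robbins', '/', 'Morgan', 'Freeman', '/', 'Bob', 'Gunton']
print(split_names(cast))  # ['Tim Robbins', 'Morgan Freeman', 'Bob Gunton']
```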
# Return the movie's genres
def get_movie_type(soup):
    contents = soup.find('div', id='info')
    # named pattern rather than type, so the built-in type() is not shadowed
    pattern = re.compile(r'类型:(.*)')
    f = pattern.findall(contents.text)[0]
    lists = f.split()
    type_name = file_operation.segmented(lists)
    return type_name
# Return the movie's country/region of production
def get_movie_country(soup):
    contents = soup.find('div', id='info')
    pattern = re.compile(r'制片国家/地区:(.*)')
    f = pattern.findall(contents.text)[0]
    lists = f.split()
    country = file_operation.segmented(lists)
    return country
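The five field extractors differ only in the label they search for. As one possible consolidation, assuming every field in contents.text sits on its own line in the form 'label: value / value', a single pass can build a dict of all fields. The function name parse_info and the sample text are illustrative, not part of the original project.

```python
from typing import Dict, List


def parse_info(info_text: str) -> Dict[str, List[str]]:
    """Map each 'label: a / b' line of the info block to a list of values."""
    fields: Dict[str, List[str]] = {}
    for line in info_text.splitlines():
        label, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # skip lines without a colon
        fields[label.strip()] = [v.strip() for v in value.split("/") if v.strip()]
    return fields


sample_info = """
导演: 弗兰克·德拉邦特
编剧: 弗兰克·德拉邦特 / 斯蒂芬·金
类型: 剧情 / 犯罪
制片国家/地区: 美国
"""
info = parse_info(sample_info)
print(info["类型"])           # ['剧情', '犯罪']
print(info["制片国家/地区"])  # ['美国']
```

Splitting at the first colon keeps labels such as 制片国家/地区 intact, but splitting values on '/' would mangle any field whose value is a URL, so such fields would need special handling.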

If you want to store the scraped information in a MySQL database, see:
