Scraping Douban movie information over HTTPS with Python

1. The Douban site is served over HTTPS, so the Python script has to imitate a browser by adding request header information.
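A minimal sketch of step 1 using only the standard library. The URL and the User-Agent string below are assumptions for illustration (the article does not list them), and `build_request` and `fetch_html` are hypothetical helper names:

```python
import urllib.request

# Assumed values for illustration; the article does not give the URL or headers.
URL = "https://movie.douban.com/"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def build_request(url=URL):
    """Build a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url=URL, timeout=10):
    """Download the page and decode it as UTF-8 text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `data = fetch_html()` would then give the `data` string that the regular expressions in the following steps operate on.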

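2. Open the browser's developer tools and work out how to extract the information.

2.1 Locate the movie information block and first pull out the part of the page we care about.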

table = re.findall(r'(.*?)顯示全部影片', data, re.S)

#print(table)

firsttable = table[0]
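The `re.S` (DOTALL) flag matters here because the block being captured spans many lines. A small sketch on a made-up fragment shows the difference; the sample HTML and the `<ul class="lists">` opening marker are assumptions, not Douban's actual markup:

```python
import re

# Hypothetical, heavily simplified page fragment; the real page is much larger.
sample = """<div id="nowplaying">
<ul class="lists">
<li class="list-item" data-title="A"></li>
</ul>
顯示全部影片
</div>"""

pattern = r'<ul class="lists">(.*?)顯示全部影片'

# Without re.S, "." does not match newlines, so the multi-line block is missed.
without_flag = re.findall(pattern, sample)

# With re.S, "." also matches newlines; the lazy group stops at the first marker.
with_flag = re.findall(pattern, sample, re.S)

print(len(without_flag), len(with_flag))
```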

2.2 Extract the description of each individual movie: split the block on list-item and pull out each movie's information.

Python program:

import re
import pandas as pd

# `data` is the HTML text of the page fetched in step 1.
print("HTTPS fetch succeeded")
table = re.findall(r'(.*?)顯示全部影片', data, re.S)
#print(table)
firsttable = table[0]
#print(firsttable)

def step3():
    print("start step3")
    score = []
    # 1. Split the block into rows by li tag and store them in the list rows
    #    (the closing </li> in the pattern is assumed):
    rows = re.findall(r'class="list-item"(.*?)</li>', firsttable, re.S)
    print(rows[0])
    # 2. Iterate over rows, collect each row's fields into an items list,
    #    and append every items list to movielist:
    movielist = []
    for row in rows:
        items = []
        # Movie title (strip the surrounding double quotes)
        title = re.findall(r'data-title=(.*?)\n', row, re.S)
        print("title is", title[0])
        title[0] = title[0].replace('"', '')
        items.append(title[0])
        # Score
        score = re.findall(r'data-score=(.*?)\n', row, re.S)
        print("score is", score[0])
        items.append(score[0])
        # Release date
        release = re.findall(r'data-release=(.*?)\n', row, re.S)
        print("release is", release[0])
        items.append(release[0])
        # Running time
        duration = re.findall(r'data-duration=(.*?)\n', row, re.S)
        print("duration is", duration[0])
        items.append(duration[0])
        # Release region
        region = re.findall(r'data-region=(.*?)\n', row, re.S)
        print("region is", region[0])
        items.append(region[0])
        # Director
        director = re.findall(r'data-director=(.*?)\n', row, re.S)
        print("director is", director[0])
        items.append(director[0])
        # Main actors
        actors = re.findall(r'data-actors=(.*?)\n', row, re.S)
        print("actors is", actors[0])
        items.append(actors[0])
        # Vote count
        votecount = re.findall(r'data-votecount=(.*?)\n', row, re.S)
        print("votecount is", votecount[0])
        items.append(votecount[0])
        movielist.append(items)
    df = pd.DataFrame(movielist, index=list(range(1, len(movielist) + 1)),
                      columns=['Title', 'Score', 'Release', 'Duration', 'Region', 'Director', 'Actors', 'Votes'])
    df.to_csv("movie.csv", encoding='utf_8_sig')
    return movielist

def test():
    #time.sleep(2)
    print("test is running!")
    print("get result")
    print(step3())
    print("end result")
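The eight near-identical extraction steps can also be folded into one loop over the field names. The following is only a sketch: `FIELDS`, `parse_row`, and the sample row are hypothetical, and it assumes every attribute value is wrapped in double quotes, which lets the pattern drop the quotes directly instead of relying on the trailing newline.

```python
import re

# Hypothetical list-item fragment with one data-* attribute per line.
# All values are made up for illustration.
sample_row = '''class="list-item"
data-title="Example Movie"
data-score="8.1"
data-release="2019"
data-duration="120 min"
data-region="Example Region"
data-director="Some Director"
data-actors="A / B / C"
data-votecount="12345"
>'''

FIELDS = ["title", "score", "release", "duration",
          "region", "director", "actors", "votecount"]


def parse_row(row):
    """Return the data-* attribute values of one row as a dict."""
    record = {}
    for field in FIELDS:
        # Capture the text between the double quotes of data-<field>="...".
        match = re.search(r'data-%s="(.*?)"' % field, row)
        record[field] = match.group(1) if match else None
    return record


record = parse_row(sample_row)
print(record)
```

A missing attribute yields `None` here rather than the `IndexError` that indexing an empty `findall` result would raise.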
