Scraping Movie Heaven (電影天堂) resources with Python

2021-08-21 02:19:52

from urllib import request
from lxml import etree
import re

url1 = ""  # listing-page URL (left blank in the original post)

req1 = request.Request(url1)
response1 = request.urlopen(req1)
html1 = response1.read()
content1 = etree.HTML(html1)

# List of movie names
name_list = content1.xpath('//*[@id="header"]/div/div[3]/div[2]/div[2]/div[1]/div/div[2]/div[2]/ul/table/tr/td[1]/a[2]/text()')
# List of detail-page links
lianjie_list = content1.xpath('//*[@id="header"]/div/div[3]/div[2]/div[2]/div[1]/div/div[2]/div[2]/ul/table/tr/td[1]/a[2]/@href')

# Prepend the site's base URL (left blank in the original post) to each relative link
for m in range(len(lianjie_list)):
    lianjie_list[m] = "" + lianjie_list[m]

ftps = []

for i in lianjie_list:
    url = i
    req = request.Request(url)
    response = request.urlopen(req)
    html = response.read()

    # Here I first dump the binary HTML into a txt file, then match that
    # text with a regular expression
    f = open("1.txt", "wb")
    f.write(html)
    f.close()

    f = open("1.txt", "r")
    st = f.read()
    f.close()

    rule_name = r''  # regex for the download link (left blank in the original post)
    compile_name = re.compile(rule_name, re.M)
    res_name = compile_name.findall(st)
    ftps += res_name

# Pair each movie name with its download link
dic = dict(zip(name_list, ftps))

for k in dic:
    print(k + ": \n" + dic[k] + ".mkv")

print("download over!")
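The two extraction steps above (XPath over the listing page, then a regex over each detail page) can be exercised offline with a small sketch. The HTML snippet, the base URL `https://example.com`, the simplified XPath, and the `ftp://` regex below are all illustrative assumptions, not the real site's markup; the real pages are gb2312-encoded, so `html.decode('gb2312', errors='ignore')` would also avoid the temp-file round trip used above.

```python
from lxml import etree
import re

# Stand-in for the downloaded listing page (assumed structure, not the
# real site's markup): each row has two anchors, the second being the
# movie's name and detail link.
listing_html = b"""
<table>
  <tr><td><a href="/x">ignore</a><a href="/html/gndy/1.html">Movie A</a></td></tr>
  <tr><td><a href="/x">ignore</a><a href="/html/gndy/2.html">Movie B</a></td></tr>
</table>
"""

content = etree.HTML(listing_html)
# Same idea as the article's XPath: take the second <a> in each cell.
names = content.xpath('//tr/td/a[2]/text()')
links = ["https://example.com" + href
         for href in content.xpath('//tr/td/a[2]/@href')]

# Stand-in for one detail page; a plausible regex for an ftp link.
detail_html = '<a href="ftp://example.com/Movie.A.2021.mkv">download</a>'
ftps = re.findall(r'ftp://[^"<>\s]+', detail_html)

print(names)  # movie names from the listing
print(links)  # absolute detail-page links
print(ftps)   # extracted ftp download links
```

Matching names to links by position, as the article's `dict(zip(...))` does, only works if every listing row yields exactly one detail page and every detail page yields exactly one download link.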

A screenshot of the output was attached here (image not preserved).
