I don't know what project to work on yet, so I'm just following along to practice. Get comfortable with requests first and set Scrapy aside for a few days; practice regular expressions by scraping the Maoyan Movies Top 100 and writing it to a CSV file. Tomorrow or the day after I'll write the IP proxy pool.
import requests
import re
import time

def gethtml(url):
    # The Cookie value is truncated and the User-Agent value was dropped when
    # the post was published; fill in your own values before running.
    header = {'cookie': '; mojo-trace-id=2; hm_lpvt_703e94591e87be68cc8da0da7cbd0be2=1593172421; _lxsdk_s=172f079e98b-3b8-0b8-141%7c%7c3',
              'user-agent': 'Mozilla/5.0'}
    try:
        html = requests.get(url, headers=header, timeout=30).text
        return html
    except requests.RequestException:
        return 'request failed'

def getpage(ulist, html):
    # The HTML tags inside this pattern were stripped when the post was
    # published; rebuilt here from Maoyan's board-page markup around the four
    # surviving groups: title, cast, score integer part, score fraction part.
    pattern = re.compile(
        '<dd>.*?name"><a.*?>(.*?)</a>.*?star">(.*?)</p>'
        '.*?integer">(.*?)</i>.*?fraction">(.*?)</i>', re.S)
    results = re.findall(pattern, html)
    for result in results:
        title, author, num1, num2 = result
        author = re.sub(r'\s+', '', author)  # strip whitespace around the cast list
        number = num1 + num2                 # e.g. '9.' + '6' -> '9.6'
        ulist.append([title, author, number])
    return ulist

def info(ulist):
    with open('movie.csv', 'w', encoding='utf-8-sig') as f:
        for row in ulist:  # renamed from `list` so the builtin is not shadowed
            print(row)
            res = ','.join(row)
            f.writelines(res + '\n')

def main():
    starturl = ''  # the board URL was stripped when the post was published
    depth = 10
    ulist = []
    for i in range(depth):
        url = starturl + str(i * 10)  # pages are offset in steps of 10
        html = gethtml(url)
        ulist = getpage(ulist, html)
        time.sleep(2)  # pause between pages to avoid getting blocked
    info(ulist)

main()
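Since the HTML tags in the pattern had to be rebuilt, it is worth sanity-checking the regex against a hand-written fragment that mimics Maoyan's board markup. The snippet below is fabricated for illustration, not real page output:

```python
import re

# A fabricated <dd> block shaped like one Maoyan board entry.
sample = '''
<dd>
  <p class="name"><a href="/films/1203" title="霸王別姬">霸王別姬</a></p>
  <p class="star">
    主演:張國榮,張豐毅,鞏俐
  </p>
  <p class="score"><i class="integer">9.</i><i class="fraction">6</i></p>
</dd>
'''

pattern = re.compile(
    '<dd>.*?name"><a.*?>(.*?)</a>.*?star">(.*?)</p>'
    '.*?integer">(.*?)</i>.*?fraction">(.*?)</i>', re.S)

for title, author, num1, num2 in re.findall(pattern, sample):
    author = re.sub(r'\s+', '', author)
    print([title, author, num1 + num2])
```

`re.S` makes `.` match newlines as well, which is what lets the lazy `.*?` runs step across the line breaks between tags.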
A very simple little 40-line script. It can't even be called a project, only practice. Keep at it!
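As for the IP proxy pool mentioned at the top, a minimal sketch might look like the following. `ProxyPool` and `fetch` are hypothetical names of my own, not part of requests, and the proxy addresses are placeholders:

```python
import random
import requests

class ProxyPool:
    """A minimal rotating proxy pool: pick a random proxy, drop dead ones."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    def get(self):
        # Choose a proxy at random so requests are spread across the pool.
        return random.choice(self.proxies)

    def remove(self, proxy):
        # Drop a proxy that failed; removing an unknown proxy is a no-op.
        if proxy in self.proxies:
            self.proxies.remove(proxy)

def fetch(url, pool):
    # Retry with a fresh proxy until the pool is exhausted.
    while pool.proxies:
        proxy = pool.get()
        try:
            return requests.get(url, proxies={'http': proxy, 'https': proxy},
                                timeout=10).text
        except requests.RequestException:
            pool.remove(proxy)
    raise RuntimeError('no working proxies left')

# Example (placeholder addresses):
# pool = ProxyPool(['http://127.0.0.1:8001', 'http://127.0.0.1:8002'])
# html = fetch('https://example.com', pool)
```

In a real pool the proxy list would come from a proxy provider or a scraped free-proxy site and be re-validated periodically; the sketch only shows the rotate-and-discard loop.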