Scraping Lagou, finally successful: a roundup of the problems I ran into along the way

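I am new to web scraping and have only a basic understanding of it. I had long wanted to try Lagou, but my requests kept getting rejected; Lagou's anti-scraping measures are really strong. Still, no wall is completely airtight: after digging through all kinds of material and reading all kinds of posts, I finally got past one problem after another. Here is a summary, so I can avoid the same pitfalls in the future.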

Problem 1: the response body is {'status': false, 'msg': '您操作太頻繁,請稍後再訪問', 'clientip': '117.136.41.41', 'state': 2402} (the msg means "You are operating too frequently, please try again later")

import requests

url = ''      # target URL was omitted in the original post
headers = {}  # header fields were omitted in the original post
r = requests.get(url, headers=headers)
print(r.text)

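Roughly 80% of websites will hand over their basic information with the approach above, but Lagou is much tougher: after many attempts I kept getting this response, which says I was making requests too frequently. I then found some free proxy IPs online, but the problem persisted, and clientip was still my own network's IP. After reading more posts, I gathered that Lagou was probably recording cookies from my visits, so I added a Cookie to the headers. That solved problem 1, but then problem 2 appeared.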
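The throttled response from problem 1 can also be detected in code, so that the crawler backs off instead of retrying immediately. The following is only a minimal sketch, assuming a blocked response always carries 'status': false and 'state': 2402 as in the message above. The names is_rate_limited, fetch_with_backoff and the fetch callable are hypothetical and are not part of the original script.

```python
import time


def is_rate_limited(payload):
    """Return True if the JSON body looks like the 'too frequent' response from problem 1."""
    # Assumption: a blocked response always has status False and state 2402.
    return payload.get("status") is False and payload.get("state") == 2402


def fetch_with_backoff(fetch, retries=3, delay=5.0, sleep=time.sleep):
    """Call fetch() until it returns a non-throttled body or the retries run out.

    fetch is any zero-argument callable that returns the decoded JSON dict.
    Returns None if every attempt was throttled.
    """
    for attempt in range(retries):
        payload = fetch()
        if not is_rate_limited(payload):
            return payload
        # Wait a little longer after each throttled attempt.
        sleep(delay * (attempt + 1))
    return None
```

In practice fetch would wrap the actual request, for example a lambda that calls the session's post method and returns response.json().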

Problem 2: HTTPConnectionPool(host:xx) Max retries exceeded with url ': Failed to establish a new connection: [Errno 99] Cannot assign requested address'

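This error shows up after the crawler has been requesting the same site repeatedly for a while. Before each data transfer the client has to establish a TCP connection with the server. To reduce overhead, the default is keep-alive, that is, connect once and transfer many times. After many requests, however, the connections are not closed and returned to the connection pool, so no new connection can be created.

The Connection field in the headers defaults to keep-alive.

The fix is to set the Connection field in the headers to close.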
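To see the effect of that change without sending anything over the network, the request can be built and inspected locally. This is a small sketch rather than the code from my script: https://example.com/ is a placeholder URL, and prepare_request only assembles the request so the outgoing headers can be checked.

```python
import requests

session = requests.Session()
default_connection = session.headers["Connection"]  # the default set by requests

# Override the default for every request sent through this session.
session.headers.update({"Connection": "close"})

# Build the request without sending it, to inspect the headers that would go out.
# Assumption: https://example.com/ is only a placeholder URL.
prepared = session.prepare_request(requests.Request("GET", "https://example.com/"))
print(default_connection, "->", prepared.headers["Connection"])

session.close()  # release any pooled connections when done
```

Setting the header on the session rather than on each call keeps every request consistent, and calling session.close() at the end releases whatever connections are still pooled.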

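import json
import time

import requests


def main():
    # agents = random.sample(agent, 1)
    # The scheme and host of both URLs were lost in the original post; they are kept as published.
    url_start = "資料分析?city=%e6%88%90%e9%83%bd&cl=false&fromsearch=true&labelwords=&suginput="
    url_parse = "全國&needaddtionalresult=false"
    # Only Connection is known from the text; the remaining header fields were omitted in the original post.
    headers = {"Connection": "close"}
    data = {}       # POST form fields were not shown in the original post
    proxies = None  # proxy mapping was not shown in the original post

    s = requests.Session()
    s.get(url_start, headers=headers, timeout=3)  # request the search page first to obtain cookies
    cookie = s.cookies  # cookies set by that first request
    response = s.post(url_parse, data=data, headers=headers, proxies=proxies, cookies=cookie, timeout=3)  # fetch the job listing JSON
    time.sleep(5)  # pause before the next request

    text = json.loads(response.text)
    print(text)
    info = text["content"]["positionresult"]["result"]
    for i in info:
        print(i["companyfullname"])
        companyfullname = i["companyfullname"]
        print(i["positionname"])
        positionname = i["positionname"]
        print(i["salary"])
        salary = i["salary"]
        print(i["companysize"])
        companysize = i["companysize"]
        print(i["skilllables"])
        skilllables = i["skilllables"]
        print(i["createtime"])
        createtime = i["createtime"]
        print(i["district"])
        district = i["district"]
        print(i["stationname"])
        stationname = i["stationname"]


if __name__ == '__main__':
    main()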
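The field-by-field loop in the script can also be condensed into one helper that returns the records instead of printing them. This is a sketch under assumptions: it takes the JSON layout and key names exactly as they appear in the script, extract_positions and FIELDS are hypothetical names, and the sample payload below is made up purely for illustration.

```python
FIELDS = ("companyfullname", "positionname", "salary", "companysize",
          "skilllables", "createtime", "district", "stationname")


def extract_positions(payload):
    """Return one dict per job posting, with None for any missing field."""
    # Fall back to an empty list when the expected keys are absent,
    # for example when the body is the throttled response from problem 1.
    results = payload.get("content", {}).get("positionresult", {}).get("result", [])
    return [{field: item.get(field) for field in FIELDS} for item in results]


# Made-up sample data, only to show the shape of the output.
sample = {"content": {"positionresult": {"result": [
    {"companyfullname": "Example Co.", "positionname": "Data Analyst", "salary": "10k-15k"},
]}}}
rows = extract_positions(sample)
print(rows[0]["positionname"], rows[0]["district"])
```

Using .get with defaults means a blocked or incomplete response yields an empty list or None values rather than a KeyError, which makes it easier to tell a parsing failure from a throttled request.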
