Crawler: Common urllib2 Headers

Crawler study notes

When a crawler requests a website, it should mimic a browser, and that is done by sending a User-Agent header.

# PC

useragent =

# phone
useragent =
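
The values of the two assignments are not shown in these notes. As a minimal sketch, assuming `useragent` is a dict keyed by the names the script below looks up (`'ie 9.0'` and `'uc standard'`) and that each value is a `Header-Name:value` string (the shape the script's `split(':')` call expects), useragents.py might look like this. The user-agent strings and the `split_header` helper are illustrative assumptions, not the author's actual content.

```python
# Hypothetical contents of useragents.py; the original values are not shown.
# Each entry is a "Header-Name:value" string.
useragent = {
    # Example desktop entry (illustrative value)
    'ie 9.0': 'User-Agent:Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    # Example mobile entry (illustrative value)
    'uc standard': 'User-Agent:NOKIA5700/ UCWEB7.0.2.37/28/999',
}


def split_header(entry):
    """Split a 'Name:value' entry on the first colon only.

    User-agent values can contain colons themselves (for example 'rv:11.0'),
    so splitting on every colon would truncate the value.
    """
    name, value = entry.split(':', 1)
    return name.strip(), value.strip()
```

Keeping the header name inside each string lets the caller pass the pair straight to `add_header`, at the cost of having to split it first.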

Use the prepared useragents.py file to supply the crawler's request headers.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2

# useragents.py is a custom module located in the current directory
import useragents


class urllib2modifyheader(object):
    '''Modify request headers with the urllib2 module.'''

    def __init__(self):
        # PC user-agent
        pcua = useragents.useragent.get('ie 9.0')
        # mobile user-agent
        mbua = useragents.useragent.get('uc standard')
        # the test site is Youdao Translate; its URL is left blank here
        self.url = ''
        self.useuseragent(pcua, 1)
        self.useuseragent(mbua, 2)

    def useuseragent(self, useragent, name):
        request = urllib2.Request(self.url)
        # each entry is a "Header-Name:value" string; split on the first
        # colon only, because the value itself may contain colons
        key, value = useragent.split(':', 1)
        request.add_header(key.strip(), value.strip())
        response = urllib2.urlopen(request)
        # save each response to <name>.html, preceded by the user-agent used
        filename = str(name) + '.html'
        with open(filename, 'a') as fp:
            fp.write("%s\n\n" % useragent)
            fp.write(response.read())


if __name__ == '__main__':
    umh = urllib2modifyheader()

The same website returns different content to different browsers, so when running a web crawler, set one fixed user-agent whenever possible.
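
One way to keep the user-agent fixed is to build every request through a single helper. The sketch below uses Python 3's `urllib.request`, which is where `urllib2` was merged in Python 3. The `build_request` function, the `FIXED_UA` constant, and the example.com URL are assumptions made for illustration and do not appear in the notes.

```python
import urllib.request

# Assumed fixed user-agent string; substitute the one your crawler should present.
FIXED_UA = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'


def build_request(url, user_agent=FIXED_UA):
    """Return a Request that always carries the given User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


if __name__ == '__main__':
    # example.com is a placeholder URL, not the site used in the notes
    req = build_request('http://example.com/')
    # urllib stores header names capitalized, so the key is 'User-agent'
    print(req.get_header('User-agent'))
```

Building the request does not touch the network; pass the result to `urllib.request.urlopen` to actually fetch the page.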
