Python crawlers: generating User-Agent strings

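Some sites use anti-crawling techniques. One of the more basic ones inspects the User-Agent field of the request headers to decide whether the request came from a browser.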

When crawling such sites, the crawler needs to send a browser-like User-Agent.
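As a minimal sketch of what "sending a browser-like User-Agent" means, the snippet below attaches the header to a request using only the standard library. The user-agent string and the URL are illustrative placeholders, and `build_request` is a hypothetical helper, not something from the original post.

```python
import urllib.request

# Example desktop Chrome user-agent string (illustrative value only).
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a Request object that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# https://example.com is a placeholder; nothing is sent until urlopen() is called.
req = build_request("https://example.com", CHROME_UA)
# urllib stores header names capitalized, hence the "User-agent" spelling here.
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header.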

import random
import re
from typing import Dict, List, Union


class UserAgent:
    '''Pool of user-agent strings loaded from a text file (singleton).'''

    # Path of the file to read, one user-agent string per line
    __filepath = 'user-agent.txt'
    # The single shared instance
    __instance = None
    # User-agent strings grouped by browser keyword
    __dict: Dict[str, List[str]] = {}
    # All user-agent strings
    __list: List[str] = []

    def __init__(self):
        '''Load the file and group each line by browser keyword.'''
        if self.__list:
            # __init__ runs on every UserAgent() call; load the file only once
            return
        reg = re.compile(r'firefox|chrome|msie|opera', re.I)
        with open(self.__filepath, 'r', encoding='utf_8_sig') as f:
            for r in f:
                r = r.strip()
                if not r:
                    continue
                match = reg.search(r)
                if match:
                    self.__dict.setdefault(match.group().lower(), []).append(r)
                self.__list.append(r)

    def __new__(cls):
        '''Singleton: always return the same instance.'''
        if not cls.__instance:
            cls.__instance = super(UserAgent, cls).__new__(cls)
        return cls.__instance

    @property
    def chrome(self) -> str:
        '''A random Chrome user agent.'''
        return random.choice(self.__dict['chrome'])

    @property
    def firefox(self) -> str:
        '''A random Firefox user agent.'''
        return random.choice(self.__dict['firefox'])

    @property
    def ie(self) -> str:
        '''A random Internet Explorer user agent.'''
        return random.choice(self.__dict['msie'])

    @property
    def opera(self) -> str:
        '''A random Opera user agent.'''
        return random.choice(self.__dict['opera'])

    def random(self) -> str:
        '''A random user agent from the whole list.'''
        return random.choice(self.__list)

    def __iter__(self):
        '''Iteration support.'''
        self.__iter = iter(self.__list)
        return self

    def __next__(self):
        '''Next item.'''
        return next(self.__iter)

    def __getitem__(self, index) -> Union[str, List[str]]:
        '''Index or slice access.'''
        return self.__list[index]


user_agent = UserAgent()
print(user_agent.random())
# for n in user_agent:
#     print(n)
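The post does not show what user-agent.txt looks like. The sketch below assumes one user-agent string per line (inferred from the line-by-line loop in the constructor) and shows how the same regular expression would group such lines. The sample strings and the `group_by_browser` helper are illustrative assumptions, not part of the original code.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Same keywords as the class above; the first keyword found decides the group.
BROWSER_RE = re.compile(r"firefox|chrome|msie|opera", re.I)


def group_by_browser(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group user-agent strings by the first browser keyword found in each."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        ua = line.strip()
        match = BROWSER_RE.search(ua)
        if ua and match:
            groups[match.group().lower()].append(ua)
    return dict(groups)


# Hypothetical file contents, one user-agent string per line.
sample = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0\n",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\n",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)\n",
]
groups = group_by_browser(sample)
print(sorted(groups))  # ['chrome', 'firefox', 'msie']
```

Lines that contain none of the four keywords are left out of the groups, so a property such as `opera` only works if the file actually contains a matching line.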

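Scrapy: adding a random User-Agent request header

To avoid a server's anti-crawling checks, requests are usually sent with randomized header information. In Scrapy it is typically enough to add a middleware to middlewares.py; each request then carries a random User-Agent, which helps avoid the server's anti-crawling measures...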
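The teaser does not include the middleware itself, so the following is only a sketch of the general pattern, not the code from that article. It relies on the fact that a Scrapy downloader middleware is a plain class with a `process_request(request, spider)` method. The class name, the user-agent pool, and the `myproject` module path are hypothetical.

```python
import random


class RandomUserAgentMiddleware:
    """Downloader middleware sketch: set a random User-Agent on each request."""

    # Hypothetical pool; in practice it could be loaded from a file or settings.
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
        "Gecko/20100101 Firefox/121.0",
    ]

    def process_request(self, request, spider):
        # Overwrite the header on the outgoing request.
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        # Returning None tells Scrapy to keep processing this request.
        return None


# To enable it, settings.py would contain something like the following
# ("myproject" and the priority 400 are placeholder choices):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RandomUserAgentMiddleware": 400,
# }
```

Because the class does not import Scrapy, it can be unit-tested with any object that has a `headers` mapping.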

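Python crawlers: asynchronous crawling

Drawback: threads or processes cannot be spawned without limit. Use thread pools and process pools in moderation, or use asynchronous I/O for high-performance data crawling. Environment setup: pip install aiohttp, then use the ClientSession class from that module. In the example, 2 means that two coroutines exist at the same time (the pool is created with size 2 and the URLs are built in a for loop over range)...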
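The idea of capping concurrency at two can be sketched with `asyncio.Semaphore`. To keep the example runnable offline, the download is simulated with `asyncio.sleep` instead of a real aiohttp `ClientSession` request; the URLs and function names are assumptions for illustration only.

```python
import asyncio
from typing import List


async def fetch(url: str, limit: asyncio.Semaphore) -> str:
    """Pretend to download url; the semaphore caps concurrent downloads."""
    async with limit:
        await asyncio.sleep(0.01)  # stand-in for an aiohttp request
        return f"fetched {url}"


async def crawl(urls: List[str], max_concurrency: int = 2) -> List[str]:
    """Run all fetches, with at most max_concurrency in flight at once."""
    limit = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(fetch(u, limit) for u in urls))


# Placeholder URLs; no network access happens in this sketch.
urls = [f"https://example.com/page/{i}" for i in range(1, 6)]
results = asyncio.run(crawl(urls))
print(len(results))  # 5
```

With a real HTTP client, the `asyncio.sleep` call would be replaced by the request made inside the same `async with limit:` block.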

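Python crawlers: a first look

A crawler simulates a browser opening a web page and extracts the part of the page's data that we want. How a browser opens a page: when you enter an address in the browser, DNS resolves it to the server host and a request is sent to that server. The server processes the request and returns the result to the user's browser, including HTML, JS, CSS and other files, which the browser parses and finally renders as what the user sees. The browser sends a message to...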
