Configuration options in a Scrapy crawler's settings.py

1. Meaning of the auto-generated content in settings.py

BOT_NAME = 'taocartest'

SPIDER_MODULES = ['taocartest.spiders']

NEWSPIDER_MODULE = 'taocartest.spiders'

#USER_AGENT = 'taocartest (+

'''If enabled, Scrapy will respect robots.txt policies.'''

ROBOTSTXT_OBEY = False

'''The maximum number of concurrent requests that Scrapy performs.'''

#CONCURRENT_REQUESTS = 32

'''The maximum number of concurrent requests sent to any single domain.'''

#CONCURRENT_REQUESTS_PER_DOMAIN = 16

'''Whether the cookies middleware is enabled. If it is disabled, no cookies are sent to the web server.'''

#COOKIES_ENABLED = False

'''Boolean that indicates whether the telnet console (and its extension) is enabled.'''

#TELNETCONSOLE_ENABLED = False

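'''To enable a spider middleware, add it to the SPIDER_MIDDLEWARES setting.

This setting is a dict whose keys are middleware class paths and whose values are the middleware orders. Uncommenting the setting enables it.'''

#SPIDER_MIDDLEWARES =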
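The path-to-order dict described above can be illustrated in plain Python. This is only a minimal sketch, not Scrapy's own code: the `taocartest.middlewares.*` paths and the helper `enabled_in_order` are hypothetical, and it assumes that a value of `None` disables an entry, as the Scrapy documentation describes.

```python
def enabled_in_order(middlewares):
    """Return middleware paths sorted by their order value, skipping None entries."""
    active = {path: order for path, order in middlewares.items() if order is not None}
    return sorted(active, key=active.get)


# Hypothetical middleware paths, for illustration only.
SPIDER_MIDDLEWARES = {
    "taocartest.middlewares.SecondMiddleware": 700,
    "taocartest.middlewares.FirstMiddleware": 543,
    "taocartest.middlewares.DisabledMiddleware": None,
}

ordered = enabled_in_order(SPIDER_MIDDLEWARES)
print(ordered)
```

The smaller the order value, the earlier the middleware appears in the resulting chain; entries set to `None` drop out entirely.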

#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

'''Enables AutoThrottle debug mode, which displays stats for every response received. You can use it to see how the throttling parameters are adjusted in real time.'''

#AUTOTHROTTLE_DEBUG = False
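To give a feel for the adjustments that debug mode would show, here is a simplified sketch of the throttling rule as the AutoThrottle documentation describes it: the target delay is latency divided by the target concurrency, the next delay is the average of the previous delay and the target, and non-200 responses may not decrease the delay. The function `next_delay` and its default bounds are assumptions made for illustration; the real extension may differ in detail.

```python
def next_delay(current_delay, latency, target_concurrency=1.0,
               min_delay=0.0, max_delay=60.0, status=200):
    """Simplified AutoThrottle-style update (illustrative, not Scrapy's code)."""
    target_delay = latency / target_concurrency
    # Move halfway from the current delay towards the target delay.
    new_delay = (current_delay + target_delay) / 2.0
    # Keep the result inside the allowed range.
    new_delay = min(max(min_delay, new_delay), max_delay)
    # Error responses are not allowed to lower the delay.
    if status != 200 and new_delay <= current_delay:
        return current_delay
    return new_delay


delay = 5.0
for latency in (1.0, 1.0, 1.0):
    delay = next_delay(delay, latency)
    print(f"latency={latency:.1f}s -> next delay={delay:.2f}s")
```

With a steady latency, the delay converges step by step towards `latency / target_concurrency` instead of jumping there at once.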

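'''Whether the HTTP cache is enabled.'''

'''Do not cache responses with the HTTP status codes listed in this setting.'''

'''The class that implements the cache storage backend.'''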
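The setting lines that belong to these three comments are missing from the listing. The names below are real Scrapy HTTP cache settings, but pairing them with the comments above is an assumption, and the status codes are placeholder values chosen for illustration.

```python
# Assumed mapping of the three comments above to Scrapy's HTTP cache settings.

# Whether the HTTP cache is enabled.
HTTPCACHE_ENABLED = True

# Responses with these status codes are not cached (illustrative values).
HTTPCACHE_IGNORE_HTTP_CODES = [500, 502, 503]

# Class implementing the cache storage backend.
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
```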

2. Settings that are not in the default file but can be added

# The following settings do not appear in the default settings.py, but you can add them yourself

CONCURRENT_ITEMS

'''Default: 100

The maximum number of items (per response) that the item processor (that is, the item pipeline) handles concurrently.

'''

DEFAULT_ITEM_CLASS

'''Default: 'scrapy.item.Item'

The default class used to instantiate items in the Scrapy shell.'''

DEPTH_LIMIT

'''Default: 0

The maximum depth allowed when crawling a site. If 0, no limit is imposed.'''
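The effect of a depth limit can be sketched on a toy link graph. This is not Scrapy code: the `crawl` function and the `LINKS` dict are hypothetical, and the sketch assumes the start page has depth 0 and that links deeper than the limit are simply dropped.

```python
from collections import deque


def crawl(links, start, depth_limit=0):
    """Breadth-first walk of a toy link graph; depth_limit=0 means no limit."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        page, depth = queue.popleft()
        visited.append((page, depth))
        for nxt in links.get(page, []):
            next_depth = depth + 1
            if depth_limit and next_depth > depth_limit:
                continue  # too deep: this link is not followed
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, next_depth))
    return visited


# Hypothetical site structure: page -> pages it links to.
LINKS = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"]}

print(crawl(LINKS, "/", depth_limit=1))
print(crawl(LINKS, "/", depth_limit=0))
```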

DEPTH_PRIORITY

'''Default: 0

An integer used to adjust request priority based on depth.

If 0, no priority adjustment is made based on depth.'''
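The Scrapy documentation gives the adjustment as `priority - depth * DEPTH_PRIORITY`. The sketch below applies that formula to a few hypothetical `(url, depth)` pairs; the helper names `adjusted_priority` and `rank` are made up for illustration.

```python
def adjusted_priority(priority, depth, depth_priority=0):
    """Apply the documented rule: priority - depth * DEPTH_PRIORITY."""
    return priority - depth * depth_priority


def rank(requests, depth_priority):
    """Order (url, depth) pairs from highest to lowest adjusted priority."""
    ranked = sorted(
        requests,
        key=lambda r: adjusted_priority(0, r[1], depth_priority),
        reverse=True,
    )
    return [url for url, _ in ranked]


# Hypothetical requests, all starting with priority 0.
REQUESTS = [("/", 0), ("/a", 1), ("/a/1", 2)]

for value in (0, 1, -1):
    print(value, rank(REQUESTS, value))
```

A positive value pushes deeper requests back, which tends towards breadth-first order; a negative value pulls them forward, which tends towards depth-first order.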

DEPTH_STATS

'''Default: True

Whether to collect maximum depth stats.'''

DEPTH_STATS_VERBOSE

'''Default: False

Whether to collect verbose depth stats. If enabled, the number of requests at each depth is collected in the stats.'''
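The difference between the two depth-stats options can be sketched with a counter. This is an illustration only: the `depth_stats` helper is hypothetical, and the key names are modelled on the ones Scrapy is believed to report, which is an assumption here.

```python
from collections import Counter


def depth_stats(depths, verbose=False):
    """Summarize request depths: always the maximum, per-depth counts if verbose."""
    stats = {"request_depth_max": max(depths, default=0)}
    if verbose:
        for depth, count in sorted(Counter(depths).items()):
            stats[f"request_depth_count/{depth}"] = count
    return stats


# Hypothetical depths of the requests seen during a crawl.
observed = [0, 1, 1, 2]

print(depth_stats(observed))
print(depth_stats(observed, verbose=True))
```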

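RANDOMIZE_DOWNLOAD_DELAY

'''Default: True

If enabled, Scrapy waits a random amount of time (a random value between 0.5 and 1.5, multiplied by DOWNLOAD_DELAY) when fetching data from the same website.

This randomization lowers the chance of the crawler being detected (and then blocked). Some websites analyze requests and look for similarities in the time between them.

The randomization policy is the same as that of the wget --random-wait option.

If DOWNLOAD_DELAY is 0 (the default), this option has no effect.

'''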
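The randomized delay described above comes down to one line of arithmetic. The sketch below is an illustration rather than Scrapy's implementation; the function name `effective_delay` and the `rng` parameter are assumptions added so the result can be reproduced.

```python
import random


def effective_delay(download_delay, randomize=True, rng=random):
    """Return the wait before the next request to the same website."""
    if not randomize or download_delay == 0:
        # With no base delay there is nothing to randomize.
        return download_delay
    # A random value between 0.5 and 1.5 times the configured delay.
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


rng = random.Random(0)  # seeded only to make the demo repeatable
samples = [effective_delay(2.0, rng=rng) for _ in range(5)]
print([round(s, 2) for s in samples])
```

With `DOWNLOAD_DELAY = 2`, every wait falls between 1 and 3 seconds, so the gaps between requests no longer form a fixed pattern.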
