Scraping Autohome (汽車之家) data with Scrapy

1. Create the Scrapy project

> scrapy startproject scrapy_carhome

2. Find the target endpoint (the URL that returns the data)

3. Create the spider file

> cd scrapy_carhome\scrapy_carhome\spiders

scrapy_carhome\scrapy_carhome\spiders> scrapy genspider car

4. Comment out the robots.txt setting
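As a sketch of what this step touches: in a project generated by scrapy startproject, robots.txt handling is controlled by the ROBOTSTXT_OBEY setting in settings.py. The generated file normally sets it to True. Commenting that line out makes Scrapy fall back to its built-in default, which is False in the Scrapy versions I am aware of. The explicit assignment below is an assumption added for clarity and is not part of the original step.

```python
# settings.py
# Commented out, as the step describes, so the generated value no longer applies:
# ROBOTSTXT_OBEY = True

# Assumption: setting the value explicitly states the intent more clearly
# than relying on the built-in default.
ROBOTSTXT_OBEY = False
```

Check the target site's terms of use before disabling this setting.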

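    import scrapy


    class CarSpider(scrapy.Spider):
        name = 'car'
        allowed_domains = ['']
        # Note: if the requested URL ends in .html, do not append a trailing /
        start_urls = ['']

        def parse(self, response):
            # One selector per car name and one per price on the listing page
            name_list = response.xpath('//div[@class="main-title"]/a/text()')
            price_list = response.xpath('//div[@class="main-lever"]//span/span/text()')

            # Walk both lists in parallel; zip stops at the shorter one
            for name_sel, price_sel in zip(name_list, price_list):
                name = name_sel.extract()
                price = price_sel.extract()
                print(name, price)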
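To see what the two XPath expressions select without running a crawl, here is a small offline sketch. It uses the standard library's ElementTree, which supports only a subset of XPath, rather than Scrapy's own selectors, so text() is replaced by reading the .text attribute. The SAMPLE markup is hypothetical and heavily simplified; the real page structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not copied from the real site.
SAMPLE = """
<div id="list">
  <div class="main-title"><a href="#">Model A</a></div>
  <div class="main-lever"><span>Guide price: <span>10.00-15.00</span></span></div>
  <div class="main-title"><a href="#">Model B</a></div>
  <div class="main-lever"><span>Guide price: <span>20.00-25.00</span></span></div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree has no text() step, so select the elements and read .text instead.
names = [a.text for a in root.findall('.//div[@class="main-title"]/a')]
prices = [s.text for s in root.findall('.//div[@class="main-lever"]//span/span')]

# Pair each name with the price at the same position.
pairs = list(zip(names, prices))
for name, price in pairs:
    print(name, price)
```

The nested span/span step matters: the outer span holds the label text, and only the inner span holds the price.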

Run the spider file
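The usual way to run the spider is the command scrapy crawl car, issued from inside the project directory. The spider name car comes from the genspider step. If you prefer to start it from an IDE, one option is a small launcher script. The file name run.py and the helper names below are hypothetical; scrapy.cmdline.execute is the function Scrapy's own command-line entry point uses.

```python
# run.py, placed next to scrapy.cfg (hypothetical helper file)


def crawl_argv(spider_name):
    """Return the argv list equivalent to typing `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def run_spider(spider_name="car"):
    # Imported here so this file can be imported without starting Scrapy.
    from scrapy.cmdline import execute

    # execute() runs the command and then exits the interpreter.
    execute(crawl_argv(spider_name))
```

Calling run_spider() from the directory that contains scrapy.cfg should behave like the command-line invocation.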

Architecture components

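3) Spiders -> A Spider class defines how a given site (or group of sites) is crawled. This covers the crawling behavior (for example, whether to follow links) and how structured data (items) is extracted from page content. In other words, a spider is where you define how to crawl and parse a particular page (or set of pages).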

4) Scheduler -> Has its own scheduling rules; you do not need to deal with it.

5) Item pipeline -> The final stage that processes the data; it exposes hooks where we can plug in our own processing.

After an item has been collected in a spider, it is passed to the item pipeline, where several components process it in a defined order.
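The order is configured through the ITEM_PIPELINES setting in settings.py: each entry maps a pipeline class path to an integer, customarily in the 0-1000 range, and items pass through lower values first. A minimal sketch follows; the three class names are hypothetical, and the run_order line is only an illustration of the resulting order, not something Scrapy requires.

```python
# settings.py
# Assumption: the pipeline class names below are hypothetical examples.
ITEM_PIPELINES = {
    "scrapy_carhome.pipelines.CleanHtmlPipeline": 100,
    "scrapy_carhome.pipelines.RequiredFieldsPipeline": 200,
    "scrapy_carhome.pipelines.SqlitePipeline": 300,
}

# Illustration only: lower numbers run first, so sorting by value
# reproduces the order in which items visit the components.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```

Leaving gaps between the numbers makes it easy to slot another component in later.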

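Each item pipeline component (sometimes simply called an "item pipeline") is a Python class that implements a simple method. It receives an item, performs some action on it, and decides whether the item continues through the pipeline or is dropped and not processed any further.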

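Typical uses of an item pipeline:

1. Cleaning HTML data
2. Validating scraped data (checking that items contain certain fields)
3. Checking for duplicates (and dropping them)
4. Storing the scraped results in a database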
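The four uses above can be sketched as one small component each. This is an illustration under assumptions, not the original author's code: the class names, the name and price fields, the cars table, and the in-memory SQLite database are all hypothetical. process_item, open_spider, close_spider, and DropItem are Scrapy's real pipeline hooks; the fallback DropItem class only exists so the sketch also runs where Scrapy is not installed.

```python
import re
import sqlite3

try:
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Stand-in so the sketch runs without Scrapy installed."""

TAG_RE = re.compile(r"<[^>]+>")


class CleanHtmlPipeline:
    """1. Strip leftover tags and surrounding whitespace from string fields."""

    def process_item(self, item, spider):
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = TAG_RE.sub("", value).strip()
        return item


class RequiredFieldsPipeline:
    """2. Drop items that lack any required field."""

    required = ("name", "price")  # hypothetical field names

    def process_item(self, item, spider):
        missing = [field for field in self.required if not item.get(field)]
        if missing:
            raise DropItem(f"missing fields: {missing}")
        return item


class DuplicatesPipeline:
    """3. Drop items whose name was already seen during this run."""

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        if item["name"] in self.seen:
            raise DropItem(f"duplicate: {item['name']}")
        self.seen.add(item["name"])
        return item


class SqlitePipeline:
    """4. Store items in a SQLite table (in memory by default)."""

    def __init__(self, path=":memory:"):
        self.path = path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS cars (name TEXT, price TEXT)")

    def process_item(self, item, spider):
        self.conn.execute(
            "INSERT INTO cars VALUES (?, ?)", (item["name"], item["price"])
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Returning the item hands it to the next component, while raising DropItem stops processing for that item. For these components to receive anything, the spider has to yield items (for example dicts) instead of printing them.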

How Scrapy works
