Calling selector methods in the Scrapy shell

For example, after you enter the following command, the shell drops into an interactive Python (or IPython) session:

scrapy shell ""

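Note that the URL must be wrapped in double quotes; single quotes raise an error (this is the behavior on the Windows command prompt).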

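The shell then lists the objects it currently holds so you can inspect them. These are the same objects you work with when writing a .py script, so their methods can be called directly, for example: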

>>> response.xpath('//title/text()')
>>> response.css('title::text')

Both calls return node-type data (selectors), so in either case you can use .extract() or .extract_first() to pull out the data part.

The following examples demonstrate extraction using the page source:

Extracting a tag attribute:

>>> response.xpath('//base/@href').extract()
[u'']
>>> response.css('base::attr(href)').extract()
[u'']

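To filter the tags at the target path, use a predicate: contains(@href, "image") means the href attribute must contain the string "image". CSS works the same way with the [href*=image] attribute selector: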

response.xpath('//a[contains(@href, "image")]/@href').extract()
out[1]: ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
response.xpath('//a[contains(@href, "image1")]/@href').extract()
out[2]: ['image1.html']
response.css('a[href*=image]::attr(href)').extract()
out[3]: ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
response.css('a[href*=image2]::attr(href)').extract()
out[4]: ['image2.html']

Selectors also have built-in regular-expression methods, re and re_first, which can be combined with either approach:

response.xpath('//a[contains(@href, "image")]/text()')
out[8]: [,,
,,]
response.xpath('//a[contains(@href, "image")]/text()').re(r'name:\s*(.*)')
out[7]: ['my image 1 ', 'my image 2 ', 'my image 3 ', 'my image 4 ', 'my image 5 ']
response.xpath('//a[contains(@href, "image")]/text()').re_first(r'name:\s*(.*)')
out[9]: 'my image 1 '
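The pattern itself can be checked with Python's standard re module. The sketch below is not Scrapy code: the input strings are hypothetical stand-ins for the link texts, and it only shows what the capture group (.*) keeps, which matches what .re() and .re_first() return in the session above.

```python
import re

# Hypothetical link texts, standing in for the text nodes selected above.
texts = ["name: my image 1 ", "name: my image 2 "]

pattern = re.compile(r'name:\s*(.*)')

# Collect the first capture group from every text that matches.
names = []
for text in texts:
    match = pattern.search(text)
    if match:
        names.append(match.group(1))

# Analogue of re_first: the first captured string, or None when nothing matches.
first = names[0] if names else None

print(names)
print(first)
```

Because \s* consumes the whitespace after the colon and (.*) runs to the end of the string, the trailing space is kept in each result, just as in the outputs above.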
