Learning Scrapy, Part 1: A Simple Example

pip install -i   scrapy

scrapy startproject articlespider

main.py is not generated by startproject; it is created later and is used to run Scrapy.

spider name **domain name

Create main.py inside articlespider; Scrapy can then be run through this file.

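from scrapy.cmdline import execute
import sys
import os

# print(__file__)  # path of this file
# print(os.path.dirname(__file__))  # directory containing this file
# print(os.path.abspath(os.path.dirname(__file__)))  # absolute path of that directory

# Equivalent to running "scrapy crawl jobbole" on the command line
execute(["scrapy", "crawl", "jobbole"])

Running the script above may raise an error on Windows.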
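The commented-out print calls in main.py explore how `__file__` resolves to a directory. A common pattern, which the source does not show, is to append that directory to `sys.path` before calling `execute`, so the project's modules can be imported no matter where the script is started. The sketch below is an assumption about what the otherwise unused `sys` and `os` imports were for; the helper name `project_dir` is hypothetical, and the source does not say whether this relates to the Windows error.

```python
import os
import sys


def project_dir(file_path):
    """Return the absolute path of the directory that contains file_path."""
    return os.path.dirname(os.path.abspath(file_path))


# In main.py this would be project_dir(__file__); a literal path is used
# here only so that the sketch runs on its own.
root = project_dir(os.path.join("articlespider", "main.py"))

# Make the project directory importable before Scrapy is started.
if root not in sys.path:
    sys.path.append(root)

print(root)
```

In main.py itself, these lines would go before the `execute([...])` call.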

In jobbole.py:

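# Imports needed at the top of jobbole.py
from urllib import parse
from scrapy.http import Request

def parse(self, response):
    # Get the URL of every item on the list page
    post_urls = response.css('#archive .floated-thumb .post-thumb a::attr(href)').extract()
    for post_url in post_urls:
        print(post_url)
        # Hand each URL to the detail-page method
        yield Request(url=parse.urljoin(response.url, post_url), callback=self.parse_info)

    # Look for the link to the next list page
    next_url = response.css('.next.page-numbers::attr(href)').extract_first()
    if next_url:
        # Assumed: the body of this branch is missing in the source;
        # re-entering parse for the next list page is the usual pattern
        yield Request(url=parse.urljoin(response.url, next_url), callback=self.parse)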
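The list-page code passes every extracted link through `parse.urljoin(response.url, ...)` before requesting it. A minimal sketch of why that is useful: `urljoin` resolves a relative href against the page URL and leaves an already absolute URL unchanged, so the spider does not need to know which form the page uses. The URLs below are made-up examples, not taken from the original.

```python
from urllib import parse

# Hypothetical URL of the list page being parsed.
base_url = "http://example.com/all-posts/page/2/"

# A root-relative href is resolved against the site root of base_url.
relative = parse.urljoin(base_url, "/114466/")

# An absolute href is returned as-is.
absolute = parse.urljoin(base_url, "http://example.com/114466/")

# A path-relative href is resolved against the directory of base_url.
sibling = parse.urljoin(base_url, "../3/")

print(relative)
print(absolute)
print(sibling)
```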

def parse_info(self, response):
    """Get the information on the detail page."""
    # Needs "import re" at the top of jobbole.py
    # Everything below extracts fields from the detail page
    res_title = response.xpath('//div[@class="entry-header"]/h1/text()').extract_first()
    res_date = response.xpath('//p[@class="entry-meta-hide-on-mobile"]/text()').extract_first().strip().replace('·', '').strip()
    # Upvote count
    res_zhan = response.xpath('//span[contains(@class, "vote-post-up")]/h10/text()').extract_first()
    res_content = response.xpath('//div[@class="entry"]/p/text()').extract_first()

    # Category and tag link texts, joined with commas
    res_cate_a = response.xpath('//p[@class="entry-meta-hide-on-mobile"]/a/text()').extract()
    # Assumed: the line defining res_cate_b is missing in the source
    res_cate_b = [cate.strip() for cate in res_cate_a]
    res_cate_c = ','.join(res_cate_b)

    # Bookmark count; the non-greedy ".*?" lets (\d+) capture the whole number
    res_shoucang = response.xpath('//div[@class="post-adds"]/span[2]/text()').extract_first().strip()
    match_obj1 = re.match(r'.*?(\d+).*', res_shoucang)
    if match_obj1:
        res_shoucang = int(match_obj1.group(1))
    else:
        res_shoucang = 0

    # Comment count, extracted the same way
    res_comment = response.xpath('//div[@class="post-adds"]/a/span/text()').extract_first().strip()
    match_obj2 = re.match(r'.*?(\d+).*', res_comment)
    if match_obj2:
        res_comment = int(match_obj2.group(1))
    else:
        res_comment = 0
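The bookmark and comment counts are pulled out of the surrounding text with a regular expression. The sketch below isolates that step so it can be tried without a crawl. It also shows a pitfall: with a greedy `.*` in front of `(\d+)`, the group is left with only the last digit, while a non-greedy `.*?` captures the whole number. The helper name `extract_count` and the sample strings are assumptions made for illustration.

```python
import re


def extract_count(text):
    """Return the first run of digits in text as an int, or 0 if there is none."""
    match = re.match(r".*?(\d+).*", text)
    return int(match.group(1)) if match else 0


sample = " 12 bookmarks"  # made-up sample text

# Greedy prefix: ".*" swallows " 1", so the group holds only the last digit.
greedy = re.match(r".*(\d+).*", sample)
print(greedy.group(1))

# Non-greedy prefix: the group holds the full number.
print(extract_count(sample))

# No digits at all: fall back to 0.
print(extract_count(" bookmarks"))
```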
