Scrapy: Recursive Crawling and Storing the Data in a Database (Example 2)

Reference:

After Scrapy has crawled a link, how do you go on to crawl the content that the link points to?

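parse can return a list of requests or a list of items. A returned request is placed on the queue of pages to fetch next; only returned items are passed on to the pipelines (or saved directly, if you use the default feed exporter). So if parse() returns the next link to follow, how do the items get returned and saved? A Request object accepts a callback argument that names the function used to parse the page that request fetches (parse is in fact the default callback for start_urls). You can therefore have parse return requests and designate a second method, parse_item, to return the items.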
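To make the routing concrete, here is a minimal toy model of the rule described above, written with the standard library only. It is not Scrapy's engine: the Request class is a stand-in, the callback signature (url, page, meta) is simplified, and FAKE_PAGES is an invented site so that no network access is needed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    """Toy stand-in for a crawler request (not scrapy.Request)."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


# Invented site: URL -> page content, so the sketch runs offline.
FAKE_PAGES = {
    "list": ["post/1", "post/2"],
    "post/1": "body of post 1",
    "post/2": "body of post 2",
}


def parse(url, page, meta):
    """First-level callback: yields requests, so nothing is stored yet."""
    for link in page:
        yield Request(link, callback=parse_item, meta={"item": {"link": link}})


def parse_item(url, page, meta):
    """Second-level callback: completes the item and yields it."""
    item = meta["item"]
    item["content"] = page
    yield item


def crawl(start_url):
    """Send yielded requests back to the queue and yielded items to storage."""
    queue = deque([Request(start_url, callback=parse)])
    stored = []
    while queue:
        request = queue.popleft()
        page = FAKE_PAGES[request.url]
        for result in request.callback(request.url, page, request.meta):
            if isinstance(result, Request):
                queue.append(result)   # fetched on a later iteration
            else:
                stored.append(result)  # this is what a pipeline would receive
    return stored


if __name__ == "__main__":
    for item in crawl("list"):
        print(item)
```

Only parse_item produces anything that reaches storage; parse merely schedules more work and hands the half-built item along in meta.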

Take crawling the Nanjing University BBS as an example:

1. The file in the spiders directory:

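# -*- coding: utf-8 -*-
# Python 2 code for an early Scrapy release (HtmlXPathSelector, urljoin_rfc).
# The spider class header is missing from this listing; the three functions
# below are methods of that spider class. bbsitem is the project's Item class.
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.utils.url import urljoin_rfc

def parsecontent(self, content):
    # Assumed first step: extract() returns a list of unicode strings, and the
    # slice offsets below count UTF-8 bytes, so encode the first element.
    content = content[0].encode('utf-8')
    authorindex = content.index('信區')
    author = content[11:authorindex-2]
    boardindex = content.index('標題')
    board = content[authorindex+8:boardindex-2]
    timeindex = content.index('南京大學小百合站 (')
    time = content[timeindex+26:timeindex+50]
    return (author, board, time)
    #content = content[timeindex+58:]
    #return (author,board,time,content)

def parse2(self, response):
    # Second-level callback: fill in the item that parse() started.
    hxs = HtmlXPathSelector(response)
    item = response.meta['item']
    items = []
    content = hxs.select('/html/body/center/table[1]/tr[2]/td/textarea/text()').extract()
    parsetuple = self.parsecontent(content)
    item['author'] = parsetuple[0].decode('utf-8')
    item['board'] = parsetuple[1].decode('utf-8')
    item['time'] = parsetuple[2]
    #item['content'] = parsetuple[3]
    items.append(item)
    return items

def parse(self, response):
    # First-level callback: read titles and links from the listing page.
    hxs = HtmlXPathSelector(response)
    items = []
    title = hxs.select('/html/body/center/table/tr[position()>1]/td[3]/a/text()').extract()
    url = hxs.select('/html/body/center/table/tr[position()>1]/td[3]/a/@href').extract()
    for i in range(0, 10):
        item = bbsitem()
        # The base URL (first argument) is empty in this listing.
        item['link'] = urljoin_rfc('', url[i])
        item['title'] = title[i][:]
        items.append(item)
    for item in items:
        # Hand the partly filled item to parse2 through meta.
        yield Request(item['link'], meta={'item': item}, callback=self.parse2)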
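The loop in parse assumes the listing page always has at least ten rows and relies on urljoin_rfc, which later Scrapy releases no longer provide. As a hedged alternative, the sketch below pairs titles and links with zip and resolves relative links with urljoin from the standard library. BASE_URL and the sample values are invented for illustration, and build_items is a hypothetical helper, not part of the spider above.

```python
from urllib.parse import urljoin

# Invented base URL and extracted values, for illustration only.
BASE_URL = "http://example.com/bbs/"
titles = ["First post", "Second post", "Third post"]
hrefs = ["post?id=1", "post?id=2", "/top/3"]


def build_items(base_url, titles, hrefs, limit=10):
    """Pair each title with its absolute link, keeping at most `limit` rows."""
    items = []
    # zip stops at the shorter list, so a short page cannot raise IndexError.
    for title, href in list(zip(titles, hrefs))[:limit]:
        items.append({"title": title, "link": urljoin(base_url, href)})
    return items


if __name__ == "__main__":
    for item in build_items(BASE_URL, titles, hrefs):
        print(item["title"], item["link"])
```

Each resulting dict plays the role of the partly filled item that parse passes to parse2 through meta.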

2. The pipelines file:

# -*- coding: utf-8 -*-
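The body of the pipelines file does not appear in this post beyond its encoding line. As a stand-in, here is a minimal sketch of an item pipeline that writes each item to a database. It swaps in SQLite from the standard library for MySQL so that it runs without a server; the class name SQLiteStorePipeline, the table name posts, and the default file name are assumptions, while the five fields are the ones the spider fills in. open_spider, process_item and close_spider are the hook names Scrapy calls on a pipeline.

```python
import sqlite3


class SQLiteStorePipeline:
    """Hypothetical pipeline: stores each item as one row in SQLite."""

    def __init__(self, db_path="bbs.db"):
        self.db_path = db_path  # assumed file name; ":memory:" also works
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: connect and create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS posts ("
            "link TEXT PRIMARY KEY, title TEXT, author TEXT, board TEXT, time TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every item; a repeated link overwrites the earlier row.
        self.conn.execute(
            "INSERT OR REPLACE INTO posts (link, title, author, board, time) "
            "VALUES (?, ?, ?, ?, ?)",
            (item["link"], item["title"], item["author"], item["board"], item["time"]),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

Using the link as the primary key means a second crawl updates existing rows instead of duplicating them.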

3. Configure settings.py:

ITEM_PIPELINES = ['tutorial.pipelines.mysqlstorepipeline']
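Later Scrapy releases expect ITEM_PIPELINES to be a dict that maps each pipeline's import path to an integer order, with lower values running first, rather than a list. A sketch of the same setting in that form follows; the value 300 is an arbitrary choice, and the import path is copied unchanged from the line above.

```python
# settings.py, dict form used by later Scrapy releases
ITEM_PIPELINES = {
    # 300 is an arbitrary order value; lower numbers run earlier.
    "tutorial.pipelines.mysqlstorepipeline": 300,
}
```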