05 Using the Scrapy Framework

settings.py: the project's configuration file

2. cd proname (enter the project directory)

3. Create the spider source file:

4. Run the project

5. Configure settings.py:

2. Set the log level

3. Spoof the User-Agent (UA)
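
The two settings named above might look like this in settings.py. This is a minimal sketch: LOG_LEVEL and USER_AGENT are standard Scrapy setting names, but the values shown are illustrative assumptions rather than values taken from these notes.

```python
# settings.py (fragment): illustrative values, not taken from the original notes

# Only log messages at ERROR level or above, so the console shows the parsed
# output instead of Scrapy's default DEBUG-level messages.
LOG_LEVEL = "ERROR"

# Send a browser-like User-Agent header with every request.
# This string is a placeholder; substitute the one your own browser sends.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)
```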

Run command: scrapy crawl spidername -o filepath (for example, scrapy crawl duanzi -o duanzi.json)

Garbled Chinese characters in the exported file:
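
The notes do not spell out the fix here. One common cause, assumed in this sketch, is that JSON output escapes non-ASCII characters as \uXXXX sequences by default; Scrapy's FEED_EXPORT_ENCODING setting controls this for feed exports. The example reproduces the effect with the standard json module only, and the sample item is made up.

```python
import json

# Hypothetical item with Chinese text, standing in for a scraped record.
item = {"title": "段子", "content": "内容"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes,
# which looks like garbled output when the file is opened in an editor.
escaped = json.dumps(item)

# With ensure_ascii=False the characters are kept as-is; the file must then
# be written with a UTF-8 encoding.
readable = json.dumps(item, ensure_ascii=False)

print(escaped)
print(readable)

# Scrapy's feed exports expose the same choice through one line in settings.py:
FEED_EXPORT_ENCODING = "utf-8"
```

Both strings decode to the same data; only the on-disk representation differs.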

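2. Define the relevant fields in items.py

3. In the spider file, store the parsed data in an item object

4. Submit the item object to the pipeline

5. In the pipeline file (pipelines.py), receive the item objects submitted by the spider and persist them in whatever form you need

6. Enable the pipeline mechanism in the configuration file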
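
The pipeline-based steps above can be traced end to end without running Scrapy. The sketch below is a stand-in built on assumptions: parse_rows, CollectPipeline, and run are hypothetical names, plain dicts are used as items (Scrapy accepts dicts as well as scrapy.Item subclasses), and run only imitates the order in which Scrapy calls the pipeline hooks.

```python
def parse_rows(rows):
    """Stand-in for a spider's parse(): wrap each parsed record in an item."""
    for title, content in rows:
        # Step 3: store the parsed data in an item (a plain dict here).
        item = {"title": title.strip(), "content": content.strip()}
        # Step 4: yielding the item is how a spider submits it to the pipeline.
        yield item


class CollectPipeline:
    """Step 5: receive each item and persist it (here: into a list in memory)."""

    def open_spider(self, spider):
        self.saved = []

    def process_item(self, item, spider):
        self.saved.append(f"{item['title']}:{item['content']}")
        return item

    def close_spider(self, spider):
        pass


def run(rows, pipeline):
    """Imitate the call order: open once, process once per item, close once."""
    pipeline.open_spider(spider=None)
    for item in parse_rows(rows):
        pipeline.process_item(item, spider=None)
    pipeline.close_spider(spider=None)
    return pipeline.saved


saved = run([(" title A ", "content A"), ("title B", " content B ")], CollectPipeline())
print(saved)
```

In a real project the item would normally be a scrapy.Item subclass declared in items.py, and Scrapy itself would play the role of run.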

1. Writing a pipeline that saves the data to a txt file

class DuanziproPipeline:
    # To open the file only once, implement the open_spider hook
    fp = None

    def open_spider(self, spider):
        print('open_spider: runs only once, when the spider starts')
        self.fp = open('duanzi.txt', 'w', encoding='utf-8')

    # Closing the file also happens only once
    def close_spider(self, spider):
        print('close_spider: runs only once, when the spider finishes')
        self.fp.close()

    # Receives item objects, one item per call, so it is called many times
    # item: the item object that was received
    def process_item(self, item, spider):
        # print(item)           # the item behaves like a dict
        # Save the item to the text file
        self.fp.write(item['title'] + ':' + item['content'] + '\n')
        return item

2. Writing a pipeline that saves the data to MySQL

# Save the data to MySQL
import pymysql

class mysqlPileLine(object):
    conn = None
    cursor = None

    def open_spider(self, spider):
        self.conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='521221', db='spider', charset='utf8')
        print(self.conn)

    def process_item(self, item, spider):
        self.cursor = self.conn.cursor()
        # Let the driver bind the values instead of formatting them into the SQL string
        sql = 'insert into duanzi values (%s, %s)'
        # Transaction handling: commit on success, roll back on failure
        try:
            self.cursor.execute(sql, (item['title'], item['content']))
            self.conn.commit()
        except Exception as e:
            print(e)
            self.conn.rollback()
        return item

    def close_spider(self, spider):
        # The cursor only exists if at least one item was processed
        if self.cursor:
            self.cursor.close()
        self.conn.close()
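
Formatting values directly into an INSERT statement breaks as soon as a title or body contains a quote character; letting the driver bind the values avoids this. The sketch below shows the difference using the standard library's sqlite3 as a stand-in for MySQL, so it runs without a server. That substitution is an assumption made for illustration: sqlite3 uses ? placeholders where pymysql uses %s, and the table layout and sample item are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table duanzi (title text, content text)")

# A hypothetical item whose text contains a double quote.
item = {"title": 'He said "hi"', "content": "body"}

# Formatting the values into the SQL string produces a malformed statement.
bad_sql = 'insert into duanzi values ("%s","%s")' % (item["title"], item["content"])
try:
    conn.execute(bad_sql)
    formatted_ok = True
except sqlite3.Error as e:
    formatted_ok = False
    print("formatted SQL failed:", e)

# Binding the values lets the driver handle the quoting.
try:
    conn.execute("insert into duanzi values (?, ?)", (item["title"], item["content"]))
    conn.commit()
except sqlite3.Error as e:
    print(e)
    conn.rollback()

rows = conn.execute("select title, content from duanzi").fetchall()
print(rows)
conn.close()
```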

3. Writing a pipeline that saves the data to Redis

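# Write the data to Redis
from redis import Redis

class redisPileLine(object):
    conn = None

    def open_spider(self, spider):
        self.conn = Redis(host='127.0.0.1', port=6379)
        print(self.conn)

    def process_item(self, item, spider):
        # If this raises an error (redis-py 3.0+ rejects a dict as a value),
        # pin the redis module to 2.10.6: pip install -U redis==2.10.6
        self.conn.lpush('duanzidata', item)
        # Return the item so that any later pipeline still receives it
        return item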
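
An alternative to pinning the old client version, offered here as an assumption rather than as the original author's method, is to serialize each item to a JSON string before pushing it, since recent redis-py versions accept only bytes, strings, and numbers as values. FakeList below is a hypothetical in-memory stand-in for the Redis connection so that the sketch runs without a server; with a real connection the same string would be passed to lpush.

```python
import json


class FakeList:
    """Hypothetical stand-in for a Redis connection; only mimics lpush."""

    def __init__(self):
        self.data = {}

    def lpush(self, key, value):
        # Like recent redis-py, refuse values that are not bytes/str/int/float.
        if not isinstance(value, (bytes, str, int, float)):
            raise TypeError(f"invalid input of type {type(value).__name__}")
        self.data.setdefault(key, []).insert(0, value)
        return len(self.data[key])


def push_item(conn, item):
    """Serialize the item to JSON text, then push it onto the list."""
    payload = json.dumps(dict(item), ensure_ascii=False)
    conn.lpush("duanzidata", payload)
    return item


conn = FakeList()
push_item(conn, {"title": "t1", "content": "c1"})
push_item(conn, {"title": "t2", "content": "c2"})

# LPUSH prepends, so the most recently pushed item comes first.
restored = [json.loads(s) for s in conn.data["duanzidata"]]
print(restored)
```

Readers of the list then call json.loads on each entry to get the dict back.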

4. Enable the pipelines in settings.py and set their priorities

ITEM_PIPELINES =

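Note: an item submitted by the spider goes first to the pipeline class with the highest priority (the lowest number in ITEM_PIPELINES). Once that class has finished with it, it must return the item so that the next pipeline class can use it. The pipelines run one after another, not concurrently.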
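
The hand-off described in this note can be imitated in plain Python. This is a sketch under assumptions: the class names and priority numbers are made up, and run_pipelines is a hypothetical helper that only mimics the ordering rule (lower number runs first) and the passing of each return value to the next pipeline.

```python
class TxtPipeline:
    def process_item(self, item, spider):
        item["seen_by"].append("txt")
        return item  # hand the item on to the next pipeline


class MysqlPipeline:
    def process_item(self, item, spider):
        item["seen_by"].append("mysql")
        return item


# Same shape as Scrapy's ITEM_PIPELINES setting: lower number = higher priority.
# Real settings use dotted import paths as keys; classes are used here for brevity.
PIPELINES = {MysqlPipeline: 301, TxtPipeline: 300}


def run_pipelines(item, pipelines):
    for cls, _priority in sorted(pipelines.items(), key=lambda kv: kv[1]):
        # Whatever process_item returns is what the next pipeline receives.
        item = cls().process_item(item, spider=None)
    return item


result = run_pipelines({"title": "t", "seen_by": []}, PIPELINES)
print(result["seen_by"])
```

If TxtPipeline omitted its return statement, MysqlPipeline would receive None instead of the item.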
