Chinese Output and Storage in Scrapy

1. Chinese output

In Python 3.x, Chinese text can be printed and processed directly.

In Python 2.x, encode the Chinese text first with encode("gbk") or encode("utf-8").
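As a minimal sketch of what those encode() calls produce, the snippet below runs under Python 3, where str is already Unicode and encode() returns bytes. The sample string and the variable names are illustrative and do not come from the original post.

```python
# Illustrative sample text; any Chinese string behaves the same way.
text = "中文"

# encode() turns the Unicode string into bytes in the chosen encoding.
utf8_bytes = text.encode("utf-8")  # three bytes per character for this text
gbk_bytes = text.encode("gbk")     # two bytes per character for this text

# decode() with the matching encoding restores the original string.
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text

print(text)  # Python 3 prints the characters directly
print(len(utf8_bytes), len(gbk_bytes))
```

Decoding with the wrong codec is the usual source of garbled output, so the encoding used for writing and the one used for reading should match.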

2. Storing Chinese text

In Scrapy, scraped data is processed in pipelines.py. First open the project settings file, settings.py, and configure the pipelines.

# Configure item pipelines
# See
#ITEM_PIPELINES =

In the code above, the parts of 'firstpjt.pipelines.firstpjtpipeline' stand for "core directory name.pipelines file name.class name". Uncomment the setting so that it reads:

# Configure item pipelines
# See
ITEM_PIPELINES =
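For reference, a filled-in version of this setting might look like the sketch below. The dictionary key is the dotted path named in the text. The integer 300 is an assumed priority and is not taken from the original post: Scrapy runs pipelines in ascending order of these values, which are conventionally chosen in the 0-1000 range.

```python
# settings.py (sketch): enable the pipeline class described in the text.
# 300 is an assumed priority value; pipelines with lower numbers run earlier.
ITEM_PIPELINES = {
    "firstpjt.pipelines.firstpjtpipeline": 300,
}
```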

Then write pipelines.py:

# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See:
# Import the codecs module and let it handle the encoding directly
import codecs


class firstpjtpipeline(object):
    def __init__(self):
        # Create or open a plain file for writing, to store the scraped data
        self.file = codecs.open("e:/steveworkspace/firstpjt/mydata/mydata1.txt", "wb", encoding="utf-8")

    def process_item(self, item, spider):
        # Build the line to write for this item
        line = str(item) + '\n'
        # Print it as well, which helps when debugging
        print(line)
        # Write the line to the file
        self.file.write(line)
        return item

    def close_spider(self, spider):
        self.file.close()
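The pipeline above can be tried without running a crawl. The sketch below is a variant built on stated assumptions rather than the author's code: it uses the built-in open() with encoding="utf-8" in place of codecs.open, opens the file in Scrapy's open_spider hook, and takes the output path as a constructor argument. The class name TextFilePipeline, the temporary path, and the sample item are all hypothetical.

```python
import os
import tempfile


class TextFilePipeline:
    """Hypothetical pipeline that writes one item per line as UTF-8 text."""

    def __init__(self, path):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Text mode with an explicit encoding, so Chinese characters are
        # written as UTF-8 regardless of the platform default.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        self.file.write(str(item) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()


# Drive the pipeline by hand, with a plain dict standing in for an item.
out_path = os.path.join(tempfile.mkdtemp(), "mydata1.txt")
pipeline = TextFilePipeline(out_path)
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "蘋果"}, spider=None)
pipeline.close_spider(spider=None)

with open(out_path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```

In a real project Scrapy constructs the pipeline itself, so the path would more likely come from a setting than from a constructor argument.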

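3. Writing Chinese text to a JSON file

JSON data has two common basic storage structures: arrays and objects.

Array form: ["蘋果","梨子","葡萄"]

An object takes the form of key-value pairs.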
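To make the two structures concrete, here is a small sketch using Python's json module. The array is the one shown above. The object's keys and values are made-up sample data, since the original example is not available.

```python
import json

# A JSON array maps to a Python list.
fruits = json.loads('["蘋果", "梨子", "葡萄"]')

# A JSON object maps to a Python dict of key-value pairs.
# The keys and values here are made-up sample data.
book = json.loads('{"bookname": "Python 入門", "price": "59.00"}')

print(type(fruits).__name__, fruits[0])
print(type(book).__name__, book["bookname"])
```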

Modify pipelines.py along these lines:

# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See:
import codecs
import json


class scrapytestpipeline(object):
    def __init__(self):
        # Create or open a JSON file; "ab" appends to any existing content
        self.file = codecs.open("e:/pycharmworkspace/scrapytest/mydata/datayamaxun.json", "ab", encoding="utf-8")
        print("File opened ---------------")

    def process_item(self, item, spider):
        print("Start writing ---------------")
        for j in range(0, len(item["bookname"])):
            bookname = item["bookname"][j]
            # author = item["author"][j]
            price = item["price"][j]
            # The right-hand side of this assignment is missing in the source;
            # a dict built from the fields above is assumed here.
            book = {"bookname": bookname, "price": price}
            # dict(...) yields a dictionary, which json.dumps() then serializes.
            # json.dumps() escapes non-ASCII characters by default, so
            # ensure_ascii=False is needed to keep Chinese text readable.
            i = json.dumps(dict(book), ensure_ascii=False)
            # Append "\n" to form one line of output
            line = i + '\n'
            print("Writing to file ---------------")
            self.file.write(line)
        return item

    def close_spider(self, spider):
        self.file.close()
        print("File closed ---------------")
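The effect of ensure_ascii can be seen in isolation. The sketch below uses a made-up record, not data from the original post, and compares the default output of json.dumps() with the output when ensure_ascii=False is passed.

```python
import json

# Made-up sample record.
book = {"bookname": "蘋果", "price": "59.00"}

escaped = json.dumps(book)                       # default: ensure_ascii=True
readable = json.dumps(book, ensure_ascii=False)  # keeps Chinese characters as-is

print(escaped)
print(readable)

# Both strings decode to the same data; only the on-disk text differs.
assert json.loads(escaped) == json.loads(readable) == book
```

Because the pipeline writes one serialized object per line, the resulting file is best read back line by line with json.loads() rather than with a single json.load() call on the whole file.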
