A small Scrapy crawler for Douban Movies: study notes (3)

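Once the film information has been scraped, the next step is to store it. Many examples online save the data as JSON; here I want to store it in a MySQL server instead.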

1. First, install the MySQL database and create the filminfo table with its columns.
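As a rough sketch of this step, the table can also be created from Python. The column names are taken from the insert statement used in step 3; the column types and lengths are assumptions (the original post does not list them), and the helper names are hypothetical.

```python
# Column names follow the insert statement in step 3 of this post.
# The SQL types and lengths are assumptions; adjust them to your data.
FILMINFO_COLUMNS = [
    ("type", "VARCHAR(32)"),
    ("filmid", "VARCHAR(32)"),
    ("title", "VARCHAR(255)"),
    ("year", "VARCHAR(8)"),
    ("directer", "VARCHAR(255)"),
    ("scenarist", "VARCHAR(255)"),
    ("actor", "TEXT"),
    ("genre", "VARCHAR(255)"),
    ("area", "VARCHAR(255)"),
    ("language", "VARCHAR(255)"),
    ("releasedate", "VARCHAR(255)"),
    ("runtime", "VARCHAR(64)"),
    ("anname", "VARCHAR(255)"),
    ("imdb", "VARCHAR(32)"),
    ("imagepath", "VARCHAR(255)"),
    ("plotsummary", "TEXT"),
]


def build_create_table_sql(table="filminfo"):
    """Return a CREATE TABLE statement for the columns listed above."""
    columns = ",\n  ".join(
        "`{}` {}".format(name, sql_type) for name, sql_type in FILMINFO_COLUMNS
    )
    return "CREATE TABLE IF NOT EXISTS `{}` (\n  {}\n) DEFAULT CHARSET=utf8".format(
        table, columns
    )


def create_table(conn):
    """Run the statement on an open DB-API connection, e.g. from pymysql.connect()."""
    cursor = conn.cursor()
    try:
        cursor.execute(build_create_table_sql())
    finally:
        cursor.close()
    conn.commit()
```

Calling create_table(conn) once before the first crawl is enough; IF NOT EXISTS makes repeated calls harmless.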

2. In items.py, add the information you want to store in the database: define the corresponding class, whose declared fields make up the item object.

import scrapy

class doubanitem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    type = scrapy.Field()
    id = scrapy.Field()
    title = scrapy.Field()
    year = scrapy.Field()
    directer = scrapy.Field()
    scenarist = scrapy.Field()
    actor = scrapy.Field()
    genre = scrapy.Field()
    area = scrapy.Field()
    language = scrapy.Field()
    releasedate = scrapy.Field()
    runtime = scrapy.Field()
    anname = scrapy.Field()
    imdb = scrapy.Field()
    # also needed: the spider in step 4 sets these two fields
    poster_url = scrapy.Field()
    plot_summary = scrapy.Field()

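3. Edit pipelines.py to save the data collected in the item class to the MySQL database. A few points to note:

(1) On Python 3.6 the MySQL module to use is pymysql; the MySQLdb module used with Python 2.x is not supported.

(2) Set the server connection parameters in pymysql.connect.

(3) After an SQL statement has been executed, call commit() so that the change is actually written.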

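import pymysql

class DoubanPipeline(object):  # class name assumed; use the one in your pipelines.py
    def __init__(self):
        # connection parameters for the MySQL server
        self.conn = pymysql.connect(host='localhost', port=3306, user="root", passwd="******", db="filmandtv", charset='utf8')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # directer, scenarist, ..., anname and filmrelative_path are local variables prepared from the item earlier in this method (not shown here)
        insert_sql = '''
            insert into filminfo(type,filmid,title,year,directer,scenarist,actor,genre,area,language,releasedate,runtime,anname,imdb,imagepath,plotsummary) values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
        '''
        self.cursor.execute(insert_sql, (item["type"], item["id"], item["title"], item["year"], directer, scenarist, actor, genre, area, language, releasedate, runtime, anname, item["imdb"], filmrelative_path, item["plot_summary"]))
        self.conn.commit()
        return item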
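For comparison, here is a minimal sketch of a more defensive variant. It is not the code used in this post. It opens the connection when the spider starts, closes it when the spider finishes, and rolls back if an insert fails. The class name, the COLUMNS/ITEM_KEYS lists (a shortened subset of the filminfo columns), and the connect/placeholder parameters are all assumptions made for illustration.

```python
# Illustrative subset of the filminfo columns and the item fields that feed them.
COLUMNS = ["type", "filmid", "title", "year", "imdb"]
ITEM_KEYS = ["type", "id", "title", "year", "imdb"]


def default_connect():
    """Open the MySQL connection with the same parameters as the pipeline above."""
    import pymysql  # imported lazily so this module loads even without the driver
    return pymysql.connect(host="localhost", port=3306, user="root",
                           passwd="******", db="filmandtv", charset="utf8")


class MySQLStorePipeline:
    """Hypothetical pipeline: one connection per crawl, commit per item."""

    def __init__(self, connect=default_connect, placeholder="%s"):
        self.connect = connect
        marks = ",".join([placeholder] * len(COLUMNS))
        self.insert_sql = "insert into filminfo({}) values ({})".format(
            ",".join(COLUMNS), marks)

    def open_spider(self, spider):
        # Scrapy calls this once when the spider is opened.
        self.conn = self.connect()
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        # Scrapy calls this once when the spider is closed.
        self.cursor.close()
        self.conn.close()

    def process_item(self, item, spider):
        values = tuple(item.get(key) for key in ITEM_KEYS)
        try:
            self.cursor.execute(self.insert_sql, values)
            self.conn.commit()    # without commit() the row is not persisted
        except Exception:
            self.conn.rollback()  # keep the connection usable for the next item
            raise
        return item               # pass the item on to any later pipeline
```

The values are passed separately from the SQL text, as in the original insert, so the driver handles quoting. As with any pipeline, the class still has to be listed in ITEM_PIPELINES in settings.py.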

4. One more important step: have the spider return the item filled with the scraped information.

return doubanitem(type=type,id=id,title=title,year=year,directer=directer,scenarist=scenarist,actor=actor,genre=genre,area=area,language=language,releasedate=releasedate,runtime=runtime,anname=anname,imdb=imdb,poster_url=poster_url,plot_summary=plot_summary)
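Scrapy raises a KeyError when an item is given a field that its class does not declare, so every keyword in the return statement above needs a matching Field in items.py. The helper below is a small standard-library sketch for checking this up front; its name and signature are hypothetical.

```python
def undeclared_fields(values, declared):
    """Return the keys of `values` that are missing from `declared`, sorted.

    `values` is the dict of keyword arguments meant for the item;
    `declared` is any iterable of declared field names.
    """
    return sorted(set(values) - set(declared))
```

In the spider, `declared` can be the item class's `fields` attribute, for example undeclared_fields(kwargs, doubanitem.fields); an empty list means the item can be constructed safely.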

5. Run the crawler and the data is saved. Success!
