Scrapy: A Beginner's Example

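Scrapy is an application framework written for crawling websites and extracting structured data. It can be used in a range of programs for data mining, information processing, or archiving historical data. It was originally designed for page scraping (more precisely, web scraping), but it can also be used to fetch data returned by APIs (for example, Amazon Associates Web Services) or as a general-purpose web crawler. Scrapy has a wide range of uses, including data mining, monitoring, and automated testing.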

I am using Anaconda3; the corresponding Python version is 3.7.

pip install scrapy

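The goal is to scrape the latest top-100 list from Meiju Tiantang (meijutt.com), as shown in the figure below:

Steps: (1) In cmd, change to the folder where the project should be created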

cd /d e:\scrapytest

(2) Create the project

scrapy startproject movie

(3) Create the spider

Enter the project directory, then generate the spider:

cd movie
scrapy genspider meiju meijutt.com

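After these commands run, Scrapy automatically creates the directory structure and files, as shown in the figure below:

File descriptions: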
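For reference, a project generated by `scrapy startproject` normally contains `scrapy.cfg` at the top level and an inner package holding `items.py`, `middlewares.py`, `pipelines.py`, `settings.py`, and a `spiders/` directory, to which `scrapy genspider` adds the spider module. The sketch below is only an illustrative helper (the function names `expected_files` and `missing_files` are made up here) that reports which of those files are absent from a given folder:

```python
from pathlib import Path
from typing import List


def expected_files(project: str, spider: str) -> List[Path]:
    """Paths normally created by `scrapy startproject` and `scrapy genspider`,
    relative to the outer project folder."""
    package = Path(project)
    return [
        Path("scrapy.cfg"),
        package / "__init__.py",
        package / "items.py",
        package / "middlewares.py",
        package / "pipelines.py",
        package / "settings.py",
        package / "spiders" / "__init__.py",
        package / "spiders" / f"{spider}.py",
    ]


def missing_files(root: str, project: str, spider: str) -> List[str]:
    """Return the expected files that do not exist under `root`."""
    base = Path(root)
    return [
        path.as_posix()
        for path in expected_files(project, spider)
        if not (base / path).is_file()
    ]
```

Calling `missing_files("e:/scrapytest/movie", "movie", "meiju")` after step (3) should return an empty list if the project was generated as described.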

(4) Define the data template (the format of each record)

# items.py
import scrapy


class MovieItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()  # title of the show

(5) The spider class

# meiju.py
# -*- coding: utf-8 -*-
import scrapy
from movie.items import MovieItem


class MeijuSpider(scrapy.Spider):
    name = 'meiju'
    allowed_domains = ['meijutt.com']
    start_urls = ['']

    def parse(self, response):
        # each <li> in the top list is one show
        movies = response.xpath('//ul[@class="top-list  fn-clear"]/li')
        for each_movie in movies:
            item = MovieItem()
            # the show title is stored in the link's title attribute
            item['name'] = each_movie.xpath('./h5/a/@title').extract()[0]
            yield item
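To see what the two XPath expressions in `parse` select, here is a minimal sketch that uses `xml.etree.ElementTree` from the standard library as a stand-in for Scrapy's selectors. The HTML fragment is hypothetical and heavily simplified; the real page may differ. ElementTree cannot select attributes with `/@title`, so the attribute is read with `.get("title")` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not copied from the real site.
SAMPLE = """
<div>
  <ul class="top-list  fn-clear">
    <li><h5><a href="/a.html" title="Show A">Show A</a></h5></li>
    <li><h5><a href="/b.html" title="Show B">Show B</a></h5></li>
  </ul>
</div>
"""


def extract_titles(markup):
    """Collect the title attribute of each link in the top list."""
    root = ET.fromstring(markup.strip())
    # The class value is compared as an exact string, including both spaces.
    items = root.findall(".//ul[@class='top-list  fn-clear']/li")
    titles = []
    for li in items:
        link = li.find("./h5/a")
        if link is not None and link.get("title"):
            titles.append(link.get("title"))
    return titles


print(extract_titles(SAMPLE))  # ['Show A', 'Show B']
```

Because the predicate compares the whole `class` string, a page that changes the spacing or order of the class names would yield no matches, which is worth checking first if the spider returns nothing.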

(6) Configure the settings file

# settings.py: add the following
ITEM_PIPELINES =
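The value of the setting is not shown above. As a hedged sketch: `ITEM_PIPELINES` is a dict that maps the import path of each pipeline class to an integer order (conventionally in the 0 to 1000 range, lower values run first). Assuming the pipeline class is named `MoviePipeline` and lives in `movie/pipelines.py`, and picking `100` as an arbitrary order value, the entry might look like this:

```python
# settings.py (fragment)
# Assumptions: class name MoviePipeline, module movie.pipelines,
# and 100 as an arbitrary order value.
ITEM_PIPELINES = {
    "movie.pipelines.MoviePipeline": 100,
}
```

With only one pipeline enabled, the exact number does not matter; it becomes relevant once several pipelines must run in a fixed order.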

(7) Post-processing of the scraped data

# pipelines.py
# -*- coding: utf-8 -*-
# define your item pipelines here
#
# don't forget to add your pipeline to the ITEM_PIPELINES setting
# see:
class MoviePipeline(object):
    def process_item(self, item, spider):
        # append each title to the output file, one per line
        with open("e://my_meiju.txt", 'a', encoding='utf-8') as fp:
            print(type(item['name']))
            fp.write(item['name'] + "\n")
        # return the item so that any later pipeline can process it
        return item
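The pipeline above reopens the output file for every item. One possible variation, sketched here under assumptions, opens the file once per crawl using the `open_spider` and `close_spider` hooks that Scrapy calls on item pipelines. The class name `MovieFilePipeline` and the default file name are hypothetical, not taken from the original post.

```python
class MovieFilePipeline:
    """Hypothetical variant that keeps one file handle open for the whole crawl."""

    def __init__(self, path="my_meiju.txt"):
        # Output path; the default is an assumption for illustration.
        self.path = path
        self.fp = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.fp = open(self.path, "a", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        if self.fp is not None:
            self.fp.close()
            self.fp = None

    def process_item(self, item, spider):
        # Write one title per line and pass the item on.
        self.fp.write(item["name"] + "\n")
        return item
```

To use it, the class would have to be listed in `ITEM_PIPELINES` in place of the original pipeline. For a list of about 100 items the difference is negligible; the pattern matters more for larger crawls.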

(8) Run the spider

cd movie
scrapy crawl meiju

Reference blog: the parts that contained errors have been corrected.
