Python in Practice: Crawling Across Pages

Recap of the Previous Chapter

The previous chapter, "Python in Practice: First Steps with Scrapy", covered creating a Scrapy project, creating a spider, and extracting data.

How to Crawl Across Pages

All of that, however, dealt with scraping data from a single page. Often the data we need is spread over many pages, so how do we crawl across them?

Implementing Cross-Page Crawling

In Scrapy, cross-page crawling is easy to implement: it only takes a few extra lines of code in courses.py.

import scrapy
from educsdn.items import CoursesItem  # item class from the previous chapter; adjust the name if your items.py differs


class CoursesSpider(scrapy.Spider):
    name = 'courses'
    allowed_domains = ['edu.csdn.net']
    start_urls = ['']  # first page (URL left blank in the source)
    p = 1  # current page number

    def parse(self, response):
        # Parse course information:
        # select every course entry on the currently requested page
        dl = response.selector.css("div.course_item")
        # Iterate over the courses and wrap each one in an item
        for dd in dl:
            # Debug aid: print(dd.xpath("./div[@class='titleinfor']/text()").extract())
            item = CoursesItem()
            item['title'] = dd.css("span.title::text").extract_first()
            item['url'] = dd.css("a::attr(href)").extract_first()
            item['pic'] = dd.css("img::attr(src)").extract_first()
            item['teacher'] = dd.css("span.lecname::text").extract_first()
            item['time'] = dd.css("span.course_lessons::text").extract_first()
            item['price'] = dd.css("p.priceinfo i::text").extract_first()
            print(item)
        # Cross-page extraction: advance the page counter and request the next page
        self.p += 1
        if self.p < 4:
            next_url = '' + str(self.p)
            url = response.urljoin(next_url)
            yield scrapy.Request(url=url, callback=self.parse)

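The condition self.p < 4 means only the first three pages are crawled. The output is much the same as in the previous chapter, just with two more pages of data, so it is not listed here.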
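
To see why the counter stops after three pages, the paging logic can be traced without running Scrapy at all. The sketch below is only an illustration under assumptions: FIRST_PAGE and NEXT_PAGE_PREFIX are hypothetical placeholders (the article does not give the real course-list URL), and urllib.parse.urljoin stands in for response.urljoin, which resolves a relative URL against the current page in roughly the same way.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration only; the real URL is not given in the article.
FIRST_PAGE = "https://example.com/courses/p1"
NEXT_PAGE_PREFIX = "p"


def simulate_crawl(first_page, limit=4):
    """Return the URLs visited when each parse() call bumps the page
    counter and follows one more page while the counter is below limit."""
    p = 1                      # mirrors the class attribute `p = 1`
    current = first_page
    visited = [current]        # the start URL is always fetched
    while True:
        p += 1                 # mirrors `self.p += 1`
        if not p < limit:      # mirrors the guard `if self.p < 4`
            break
        # stand-in for response.urljoin(next_url)
        current = urljoin(current, NEXT_PAGE_PREFIX + str(p))
        visited.append(current)
    return visited


pages = simulate_crawl(FIRST_PAGE)
for page in pages:
    print(page)
```

With the default limit of 4 the start page plus two follow-up requests are made, which matches the "first three pages" described above. Raising the limit is the only change needed to crawl further.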
