Python web scraping: crawling Lianjia's second-hand housing listings for Zhengzhou

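Today I'm scraping Lianjia's second-hand housing listings for Zhengzhou. I'll start with something simple: using Lianjia's filter tags, I narrowed the results down to 5 matching listings and scraped only the content of the list page. Slightly more complex pages will come in a later post.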

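First, look at the URL and the HTML text it returns. All the information we want is already in the HTML returned for the current URL, so no extra requests or JavaScript rendering are needed, which makes this very simple.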

[Screenshot: the returned HTML text]
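One quick way to confirm that the fields live in the raw response is to search the response text for the class names seen in the browser's inspector. The sketch below is only an illustration: `missing_markers` is a hypothetical helper, and `sample_html` is a made-up fragment standing in for the real response body.

```python
def missing_markers(html_text, markers):
    """Return the markers that do not occur in the raw HTML text."""
    return [m for m in markers if m not in html_text]


# Hypothetical fragment standing in for the response body; not real Lianjia output.
sample_html = """
<div class="title"><a href="#">Sample listing</a></div>
<div class="totalprice"><span>100</span></div>
"""

# Class names as they appear in the XPath expressions used in this post.
wanted = ['class="title"', 'class="totalprice"', 'class="unitprice"']

# Anything listed here is absent from the raw HTML and may be rendered client-side.
print(missing_markers(sample_html, wanted))
```

If the returned list is empty for the real response, every field can be extracted directly from the HTML text.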

That being the case, extracting the content is straightforward.

This one is fairly simple, so I won't explain it at length. The complete code is as follows:

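'''
Scrape Lianjia second-hand housing listings
(an exercise in using a class)
'''
# Import the standard-library and third-party modules
import csv
import time

import requests
from fake_useragent import UserAgent
from lxml import etree

# Random User-Agent generator
ua = UserAgent()


# Spider class for Lianjia
class lianjia():
    # Initialise the object
    def __init__(self):
        # The start URL is empty in this copy of the post; set it to the filtered list-page URL.
        self.start_url = ""
        # The header value is missing in this copy; a random User-Agent header is assumed.
        self.headers = {"User-Agent": ua.random}

    # Fetch a URL and return its HTML text
    def get_html(self, url):
        time.sleep(1)  # pause before each request
        html = requests.get(url, headers=self.headers).content.decode()
        return html

    # Parse the HTML text and save the extracted fields
    def paser_html(self, html):
        e = etree.HTML(html)
        # Class names are lowercase in this copy; XPath attribute matching is
        # case-sensitive, so check them against the live page.
        # Listing titles
        room_title = e.xpath('//div[@class="title"]/a/text()')
        # Detailed information for each listing
        house_info = e.xpath('//div[@class="address"]/div[@class="houseinfo"]/text()')
        # Total price of each listing, with the unit appended
        price_info = e.xpath('//div[@class="totalprice"]/span/text()')
        price_info = [i + "萬" for i in price_info]
        # Unit price of each listing
        unit_price = e.xpath('//div[@class="unitprice"]/span/text()')
        # Save the extracted information to CSV (one row per field)
        with open("lianjia.csv", 'w', newline='', encoding='utf-8') as f:
            cs_writer = csv.writer(f)
            cs_writer.writerow(unit_price)
            cs_writer.writerow(price_info)
            cs_writer.writerow(room_title)
            cs_writer.writerow(house_info)

    # Run method: the main logic
    def run(self):
        url = self.start_url
        html = self.get_html(url)
        self.paser_html(html)


# Program entry point
if __name__ == '__main__':
    lianjia_spider = lianjia()
    lianjia_spider.run()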
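The code in this post writes each field as a single CSV row, so one listing's values end up spread across four rows. If one row per listing is preferred, the four lists can be combined with `zip`. This is a sketch of that alternative, not the layout used in the post: `write_listing_rows`, the header names, and the sample values are all made up for illustration.

```python
import csv
import io


def write_listing_rows(out, titles, infos, totals, units):
    """Write a header plus one CSV row per listing to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["title", "house_info", "total_price", "unit_price"])
    # zip stops at the shortest list, so extra items in a longer list are dropped.
    rows = list(zip(titles, infos, totals, units))
    writer.writerows(rows)
    return len(rows)


# Made-up sample values, for illustration only.
buf = io.StringIO()
count = write_listing_rows(
    buf,
    ["Listing A", "Listing B"],
    ["2 rooms | 89 sqm", "3 rooms | 120 sqm"],
    ["100萬", "150萬"],
    ["11236", "12500"],
)
print(buf.getvalue())
```

To write to disk instead, pass a file opened with `open("lianjia.csv", "w", newline="", encoding="utf-8")` in place of the `io.StringIO` buffer.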

Finally, the results are written to a CSV file. [Screenshot: the extracted results]
