Hands-on: a native crawler

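2. Find the web page that holds the data and study the structure of what you are crawling: decide which page and which content to fetch, then locate the tags that contain the data (in Google Chrome, press F12 to open the developer tools and inspect the HTML).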

3. How to crawl

Simulate an HTTP request: send it to the server and receive the HTML that the server returns.
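As a minimal sketch of this step, the standard library's `urllib.request` can send the request and hand back the response body. The function name `fetch_html`, the `encoding` and `timeout` defaults, and the `data:` URL used for the offline demo are assumptions for illustration; a real run would pass the URL of the page being crawled.

```python
from urllib import request


def fetch_html(url, encoding="utf-8", timeout=10):
    """Send a GET request to `url` and return the response body as text."""
    # urlopen returns a response object that can be used as a context manager
    with request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the body arrives as bytes
    return raw.decode(encoding)


# Offline demo: a data: URL stands in for a real page (assumption for illustration)
html = fetch_html("data:text/html,<p>hello</p>")
print(html)
```

Decoding explicitly matters because the regular expressions in the next step operate on `str`, not on raw bytes.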

Use regular expressions to extract the data we want (name and popularity).
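A small illustration of the idea, assuming a made-up HTML fragment: the class names `nickname` and `number` and the sample values are hypothetical, not taken from any real page. With two capture groups, `re.findall` returns one `(name, popularity)` tuple per match.

```python
import re

# Hypothetical HTML fragment; the class names are made up for illustration
html = '''
<span class="nickname">Alice</span><span class="number">1.5萬</span>
<span class="nickname">Bob</span><span class="number">8000</span>
'''

# ([\s\S]*?) captures any characters, including newlines, non-greedily,
# so each group stops at the first closing tag instead of the last one
pattern = (r'<span class="nickname">([\s\S]*?)</span>\s*'
           r'<span class="number">([\s\S]*?)</span>')

pairs = re.findall(pattern, html)
print(pairs)
```

The non-greedy `*?` is the important part: a greedy `*` would run from the first opening tag to the last closing tag on the page and return a single oversized match.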

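1. Pick an anchor tag: it should be as unique as possible, sit as close as possible to the data being extracted, and preferably be a tag that has a closing counterpart.

2. Once the tag is chosen, write the regular expression that extracts the data.

3. Write the analysis function.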
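The three steps above can be sketched as a two-stage analysis function: first match the enclosing block that is unique to each entry, then pull the fields out of that block. The patterns, the class names (`video-info`, `video-nickname`, `video-number`) and the function name `analyze` are assumptions for a hypothetical page layout and would need to be adapted to the real page.

```python
import re

# Assumed patterns for a hypothetical page layout
ROOT_PATTERN = r'<div class="video-info">([\s\S]*?)</div>'
NAME_PATTERN = r'<span class="video-nickname">([\s\S]*?)</span>'
NUMBER_PATTERN = r'<span class="video-number">([\s\S]*?)</span>'


def analyze(html):
    """Return one {'name', 'number'} dict per enclosing block found in `html`."""
    anchors = []
    for block in re.findall(ROOT_PATTERN, html):
        name = re.search(NAME_PATTERN, block)
        number = re.search(NUMBER_PATTERN, block)
        if name and number:  # skip blocks that lack either field
            anchors.append({'name': name.group(1), 'number': number.group(1)})
    return anchors


sample = '''
<div class="video-info">
    <span class="video-nickname">
        Alice
    </span>
    <span class="video-number">1.5萬</span>
</div>
<div class="video-info">
    <span class="video-nickname">Bob</span>
    <span class="video-number">8000</span>
</div>
'''
print(analyze(sample))
```

Note that the first name comes back with its surrounding newlines and spaces intact, which is why a separate clean-up pass with `strip()` is useful before sorting and printing.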

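'''
this is a module
'''
import re
from urllib import request

# Tip: use breakpoints to step through the code while debugging


class Spider:
    # NOTE: go() also calls __fetch_content() and __analysis(). Neither method
    # is included in this excerpt, so go() will not run until they are added.

    # Refine: clean up the raw extracted values
    def __refine(self, anchors):
        # The lambda body is missing in the source. Stripping both fields with
        # the built-in strip() (removes \n and spaces) is an assumed reconstruction.
        l = lambda anchor: {
            'name': anchor['name'].strip(),
            'number': anchor['number'].strip(),
        }
        return map(l, anchors)

    def __sort(self, anchors):
        # reverse=True sorts in descending order (highest popularity first)
        anchors = sorted(anchors, key=self.__sort_seed, reverse=True)
        return anchors

    def __sort_seed(self, anchor):
        # Match the leading number, including a decimal part such as "1.5"
        r = re.findall(r'\d+\.?\d*', anchor['number'])
        number = float(r[0])
        if '萬' in anchor['number']:  # 萬 means ten thousand
            number *= 10000
        return number

    def __show(self, anchors):
        for rank in range(0, len(anchors)):
            print('rank' + str(rank + 1)
                  + ':' + anchors[rank]['name']
                  + "------" + anchors[rank]['number'])

    def go(self):
        htmls = self.__fetch_content()
        anchors = self.__analysis(htmls)
        anchors = list(self.__refine(anchors))
        anchors = self.__sort(anchors)
        self.__show(anchors)


spider = Spider()
spider.go()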
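The refine, sort and display steps can be tried in isolation on stand-in data, without any network access. The sketch below is not the author's class: the function name `popularity`, the sample records and the choice to treat unparseable values as zero are all assumptions made for illustration.

```python
import re


def popularity(text):
    """Convert a display string such as '1.5萬' or '8000' to a number."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        return 0.0  # assumption: treat unparseable values as zero
    value = float(match.group())
    if '萬' in text:  # 萬 means ten thousand
        value *= 10000
    return value


# Hypothetical records standing in for the output of the extraction step
anchors = [
    {'name': ' Alice\n', 'number': '8000'},
    {'name': 'Bob', 'number': '3.5萬'},
    {'name': 'Carol', 'number': '1.5萬'},
]

# Refine: strip surrounding whitespace from both fields
refined = [{'name': a['name'].strip(), 'number': a['number'].strip()}
           for a in anchors]

# Sort by numeric popularity, highest first
ranked = sorted(refined, key=lambda a: popularity(a['number']), reverse=True)

for rank, anchor in enumerate(ranked, start=1):
    print('rank' + str(rank) + ':' + anchor['name'] + '------' + anchor['number'])
```

Sorting on the converted number rather than on the raw string is what keeps '3.5萬' ahead of '8000'; a plain string sort would compare the characters and rank '8000' first.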
