Crawler basics: scraping Qiushibaike posts (content and titles)

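import re
import time
from urllib import request

# Assignment 2: scrape every joke on Qiushibaike's text pages and print each one as "xx says: ***x"
#   # 1 is the page number
#   regcom = re.compile('(.*?)', re.S)
#   # get the name
#   namecom = re.compile('', re.S)
#   # get the content
#   contentcom = re.compile('(.*?)', re.S)

# Request headers that make the crawler look like a browser (the site has anti-crawler checks).
# Assumed value: any browser-like User-Agent string; the original header dict is not preserved.
headers = {"User-Agent": "Mozilla/5.0"}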
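def getdata(url):
    # Build the request object with the browser-like headers
    req = request.Request(url, headers=headers)
    response = request.urlopen(req)
    html = response.read().decode()
    # re.S lets "." also match newlines, so (.*?) can span several lines of HTML
    regcom = re.compile('(.*?)', re.S)
    comment_list = regcom.findall(html)  # returns a list of post blocks
    # print(comment_list)
    namecom = re.compile('', re.S)
    contentcom = re.compile('(.*?)', re.S)
    item_list = []
    for comment in comment_list:
        name = namecom.findall(comment)[0].strip()
        # print(name)
        content = contentcom.findall(comment)[0].strip()
        # print(content)
        # Store each post as a dict so the caller can read "name" and "content"
        item_list.append({"name": name, "content": content})
    return item_list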
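if __name__ == "__main__":
    # All scraped data: a list of dicts, [{"name": ..., "content": ...}, ...]
    alldata = []
    # Fetch pages 1 through 9
    for i in range(1, 10):
        url = "" + str(i) + "/"
        list1 = getdata(url)
        # print(list1)
        alldata.extend(list1)
        time.sleep(0.5)  # pause between requests so the site is not hammered
    # Walk through alldata and print every entry
    for dict1 in alldata:
        print("%s says: %s" % (dict1["name"], dict1["content"]))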
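The script disguises itself as a browser by passing headers into request.Request. The short sketch below shows, without any network access, what that request object carries before urlopen sends it. The URL and the User-Agent string are placeholders chosen for illustration, not values from the original post.

```python
from urllib import request

# Placeholder values for illustration only: example.com is not the real site,
# and any browser-like User-Agent string would do.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
url = "https://example.com/text/page/1/"

# Building the Request object only prepares the request; nothing is sent yet.
req = request.Request(url, headers=headers)

# urllib stores header names via str.capitalize(), so "User-Agent" is kept as "User-agent".
print(req.get_header("User-agent"))  # the browser-like string above
print(req.get_method())              # GET, because no request body was given
print(req.full_url)
```

Passing req to request.urlopen(req), as getdata does, is the step that actually sends these headers to the server.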

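The parsing in getdata works in two stages: one non-greedy (.*?) pattern cuts the page into per-post blocks, and two more patterns pull the author name and the joke text out of each block. Because the real page markup is not preserved in this post, the sketch below runs the same two-stage idea on a small hypothetical HTML string; the tags and class names in it are invented for illustration.

```python
import re

# Hypothetical markup: these tags only illustrate the technique,
# they are not the real Qiushibaike page structure.
SAMPLE_HTML = """
<div class="article"><h2>
Alice
</h2><div class="content"><span>First joke</span></div></div>
<div class="article"><h2>Bob</h2><div class="content"><span>
Second joke
</span></div></div>
"""

# Stage 1: cut out each post block; (.*?) stops at the first closing match,
# and re.S lets "." match newlines so a block may span several lines.
block_re = re.compile(r'<div class="article">(.*?)</div></div>', re.S)
# Stage 2: pull the author name and the joke text out of a single block.
name_re = re.compile(r'<h2>(.*?)</h2>', re.S)
content_re = re.compile(r'<span>(.*?)</span>', re.S)


def parse(html):
    items = []
    for block in block_re.findall(html):
        # strip() removes the line breaks that surround the captured text
        name = name_re.findall(block)[0].strip()
        content = content_re.findall(block)[0].strip()
        items.append({"name": name, "content": content})
    return items


for item in parse(SAMPLE_HTML):
    print("%s says: %s" % (item["name"], item["content"]))
```

Without re.S, "." does not match a line break, so a name wrapped in newlines inside the h2 tag would not be captured at all; that is why every pattern in the script passes re.S.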