Python Web Scraping in Practice: Fetching Jokes from Qiushibaike

Goal: fetch the jokes from Qiushibaike and save them to files.

Result:

#!/usr/bin/python3
#-*- coding: utf-8 -*-
import urllib.request
import re
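# Browser disguise: make the crawler look like a browser so the site is less likely to block it.
# The definition of headers is missing from the source; the value below is a placeholder assumption.
headers = ("User-Agent", "Mozilla/5.0")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)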
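# Fetch all of the joke content
weburl = ""  # the target URL is blank in the source
webcontent = urllib.request.urlopen(weburl).read().decode("utf-8", "ignore")
# The HTML tags in this pattern were lost during extraction; only the wildcards remain
matchpat = '.*?(.*?)'
jokes = re.compile(matchpat, re.S).findall(webcontent)
jokecount = len(jokes)
print("Total number of jokes: " + str(jokecount))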
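for idx in range(jokecount):
    # Print every joke
    print("*****==")
    joke = re.sub(r'\n', "", str(jokes[idx]))
    # This pattern shows up as a bare line break in the source, which suggests
    # a <br> tag that was rendered away; it is treated as one here (assumption)
    joke = re.sub(r'<br\s*/?>', "", joke)
    print(joke)
    # Save each joke to its own file
    filename = "joke_" + str(idx+1) + ".txt"
    with open(filename, "wb") as fo:
        fo.write(joke.encode("gbk", "ignore"))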
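
The listing above depends on a live page and on a pattern whose HTML tags did not survive, so it is hard to try out as-is. Below is a minimal offline sketch of the same extract, clean, and save steps. The sample markup, the `<div class="content">` / `<span>` pattern, and the helper names `extract_jokes` and `save_jokes` are all assumptions made for illustration, not the real page structure or the author's code. It also writes UTF-8 text instead of GBK bytes, which is a deliberate swap.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical markup: the real page structure is not shown in the source.
SAMPLE_HTML = """
<div class="content">
<span>First joke,<br/>second line.</span>
</div>
<div class="content">
<span>Second joke.</span>
</div>
"""

# Assumed pattern; re.S lets "." match newlines so a joke may span lines.
JOKE_PATTERN = re.compile(r'<div class="content">.*?<span>(.*?)</span>', re.S)
BR_PATTERN = re.compile(r'<br\s*/?>')


def extract_jokes(html):
    """Return the cleaned text of every joke matched in html."""
    jokes = []
    for raw in JOKE_PATTERN.findall(html):
        text = BR_PATTERN.sub("", raw)          # drop <br> tags
        text = text.replace("\n", "").strip()   # drop raw newlines
        jokes.append(text)
    return jokes


def save_jokes(jokes, out_dir):
    """Write each joke to joke_<n>.txt in out_dir and return the paths."""
    paths = []
    for idx, joke in enumerate(jokes, start=1):
        path = Path(out_dir) / f"joke_{idx}.txt"
        path.write_text(joke, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    jokes = extract_jokes(SAMPLE_HTML)
    print("Total number of jokes:", len(jokes))
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_jokes(jokes, tmp):
            print(path.name, "->", path.read_text(encoding="utf-8"))
```

Compiling the patterns once and using `enumerate(..., start=1)` keeps the loop free of manual index arithmetic. To run it against a real page, the sample string would be replaced by the decoded response body and the pattern adjusted to that page's actual markup.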
