Scraping the Weibo Hot Search List with Python and Storing the Data in a Database

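The goal here is to scrape the data on the Weibo hot search list. Open the page, press F12 to enter developer mode, and find the content inside ..., as shown in the figure:

The text after href is the percent-encoded form of the corresponding Chinese title. The many 25s in it are probably interference characters; once they are removed and the rest is decoded, what comes out is the hot search title. I counted 27 of them, which matches exactly: the first title, "比伯願為賽琳娜捐腎", is nine characters long, and each Chinese character takes three bytes in UTF-8, 27 in total.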
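The 25s are what one would expect if the titles had been percent-encoded twice: `%25` is the encoding of the `%` sign itself, so `%E6` turns into `%25E6`. The following is a minimal sketch under that assumption. The helper names and the second sample title are made up for illustration, and the doubly encoded string is simulated with `quote(quote(...))` rather than taken from the page. It compares stripping the 25s with simply decoding twice.

```python
from urllib.parse import quote, unquote


def decode_by_stripping(encoded: str) -> str:
    """Approach used in this post: drop every "25", then decode once."""
    return unquote(encoded.replace("25", ""))


def decode_twice(encoded: str) -> str:
    """Alternative: undo the two rounds of percent-encoding one after the other."""
    return unquote(unquote(encoded))


title = "比伯願為賽琳娜捐腎"
# Simulated href fragment (assumption: the page encodes the title twice).
encoded = quote(quote(title))

print(encoded.count("%25"))  # 27: nine characters, three UTF-8 bytes each
print(decode_by_stripping(encoded) == title)
print(decode_twice(encoded) == title)

# A title that itself contains the digits "25" only survives the second approach.
tricky = quote(quote("2025新年"))
print(decode_by_stripping(tricky))  # the literal "25" in "2025" is removed too
print(decode_twice(tricky))
```

For titles made up only of Chinese characters both approaches give the same result; decoding twice is just the safer choice when a title may contain digits.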

I used Python 3.6.0, with PyCharm 2017.2.3 as the development tool and MySQL as the database.

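# -*- coding: utf-8 -*-
import re
import urllib.parse

import pymysql
import requests

# Database configuration (the connection settings are left blank in the original)
config = {}
# Connect to the database
conn = pymysql.connect(**config)
cursor = conn.cursor()

# Fetch the hot search page source (the URL is left blank in the original)
weibohotfile = requests.get('')
weibohothtml = weibohotfile.text

# Regular expression that matches the URL and finds the title
# (the pattern is cut off in the original after td_05\\">)
hotkey = re.compile(r'td class=\\"td_05\\">')
hotkeylistbe = hotkey.findall(weibohothtml)

rank = 1
# Iterate over the list of captured titles
for title in hotkeylistbe:
    # Remove the interfering "25" digits
    title = title.replace('25', '')
    # The URL prefix is left blank in the original
    url = '' + title
    title = urllib.parse.unquote(title)
    print(str(rank) + ' ' + title + ' ' + ' ' + url + '\n')
    # Execute the insert statement; `rank` is quoted because it is a reserved word in MySQL 8.0+
    sql = 'insert into hotsearch (`rank`, daydate, mindate, title, url) values (%s, curdate(), curtime(), %s, %s)'
    cursor.execute(sql, (rank, title, url))
    rank += 1

conn.commit()
cursor.close()
conn.close()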
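The loop in the script mixes decoding, ranking, and database writes. As a sketch only, the row-building part can be pulled out into a small function that is easy to check without a database. `build_rows`, the sample fragments, and the `base_url` value are hypothetical and not part of the original script.

```python
from urllib.parse import unquote


def build_rows(encoded_titles, base_url):
    """Return (rank, title, url) tuples, ranked from 1 in page order."""
    rows = []
    for rank, encoded in enumerate(encoded_titles, start=1):
        cleaned = encoded.replace("25", "")  # same "25" removal as in the script
        rows.append((rank, unquote(cleaned), base_url + cleaned))
    return rows


# Hypothetical input: two fragments of the kind the regex is meant to capture.
sample = ["%25E7%2583%25AD", "%25E6%2590%259C"]
for row in build_rows(sample, "https://example.invalid/"):
    print(row)
```

The resulting list has the same field order as the insert statement, so it could be passed to `cursor.executemany(sql, rows)` and followed by a single `conn.commit()`.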

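A note on the database connection: db is the database name, weibo (you can pick any name you like), and charset sets the character set to utf8. The table is called hotsearch, and rank, daydate, mindate, title, and url are its columns. The original author did not say how these columns were defined, so I defined them myself as follows: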
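The definition itself is not shown here. As a rough sketch only, a table with these five columns could be created along the following lines. Every column type and length below is an assumption chosen for illustration (DATE and TIME simply match the curdate() and curtime() values in the insert statement), not the definition actually used.

```python
# Hypothetical DDL for the hotsearch table; all types and lengths are assumptions.
CREATE_HOTSEARCH = """
CREATE TABLE IF NOT EXISTS hotsearch (
    `rank`  INT          NOT NULL,
    daydate DATE         NOT NULL,
    mindate TIME         NOT NULL,
    title   VARCHAR(255) NOT NULL,
    url     VARCHAR(512) NOT NULL
) DEFAULT CHARSET = utf8
"""

print(CREATE_HOTSEARCH.strip())
```

With an open pymysql connection, `cursor.execute(CREATE_HOTSEARCH)` would run it once before the first insert.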

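For varchar columns, the encoding defaults to latin1 if you leave it unset. I did not know this at first, and running the .py file kept failing with errors. After searching online I found that latin1 has to be changed to utf8, and then it worked.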
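The underlying problem can be seen outside MySQL as well, because latin1 has no code points for Chinese characters. The snippet below is an illustration in plain Python, not the MySQL error itself. The `ALTER TABLE` statement at the end is one way to convert an existing table and assumes the table is named hotsearch.

```python
title = "比伯願為賽琳娜捐腎"

try:
    title.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    # None of the characters can be represented in latin1.
    fits_latin1 = False

utf8_bytes = title.encode("utf-8")
print(fits_latin1)      # False
print(len(utf8_bytes))  # 27

# Converts the table default and its character columns to utf8.
ALTER_HOTSEARCH = "ALTER TABLE hotsearch CONVERT TO CHARACTER SET utf8"
print(ALTER_HOTSEARCH)
```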

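In the end, the successfully imported data can be viewed in the database.

The original author also described running the .py file directly from a .bat file, which is quicker. I looked it up online and learned how to do it too: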

1. Find the directory that contains the .py file and note it down. Mine, for example, is e:\***\pythonprojects\blogspider

2. Create a txt file and write the following into it:

@echo off

cd /d e:\***\pythonprojects\blogspider

start python crawlerblog01.py

3. Change the extension of the txt file to .bat, then double-click it to run.
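Steps 2 and 3 can also be done in one go from Python. This is a sketch and not part of the original instructions: the `write_launcher` helper and the `run_crawler.bat` file name are made up, and the project path keeps the `***` placeholder from step 1, so it has to be replaced with the real directory.

```python
from pathlib import Path


def write_launcher(bat_path, project_dir, script_name):
    """Write a .bat file that changes to project_dir and starts the script."""
    lines = [
        "@echo off",
        f"cd /d {project_dir}",  # /d also switches drives when needed
        f"start python {script_name}",
    ]
    bat = Path(bat_path)
    bat.write_text("\n".join(lines) + "\n")
    return bat


if __name__ == "__main__":
    # Placeholder path from step 1; replace *** with the real directory.
    write_launcher("run_crawler.bat", r"e:\***\pythonprojects\blogspider", "crawlerblog01.py")
```

Double-clicking the generated file then behaves like the hand-written one.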
