A simple crawler for a city's seven-day weather forecast on China Weather Network (中國天氣網)

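First, we need to understand the structure of the page that holds the data we want. It is shown in the figure below (only part of the page is captured):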

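As the figure shows, each day's weather data is wrapped in an li element, and all seven li elements sit inside a single ul element (the one whose class is "t clearfix").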
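To make the ul/li nesting concrete, here is a minimal sketch that walks a hypothetical, heavily simplified copy of that markup. It swaps in Python's standard-library html.parser for BeautifulSoup so it runs without third-party packages. The sample HTML, the class name ForecastListParser and the "Day N" labels are made up for illustration; only the ul class "t clearfix" and the h1 / p.wea / p.tem layout come from the code in this post, and the real page contains far more markup.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified stand-in for the forecast markup:
# one <ul class="t clearfix"> holding one <li> per day.
SAMPLE_HTML = """
<ul class="t clearfix">
  <li><h1>Day 1</h1><p class="wea">Sunny</p><p class="tem"><i>20℃</i></p></li>
  <li><h1>Day 2</h1><p class="wea">Cloudy</p><p class="tem"><span>25℃</span><i>18℃</i></p></li>
</ul>
"""


class ForecastListParser(HTMLParser):
    """Collect the text pieces of each <li> inside <ul class="t clearfix">.

    Assumes the target <ul> contains no nested <ul> elements.
    """

    def __init__(self):
        super().__init__()
        self.in_list = False  # currently inside the target <ul>?
        self.current = None   # text pieces of the <li> being read
        self.items = []       # one list of text pieces per <li>

    def handle_starttag(self, tag, attrs):
        # Exact class match, like the CSS selector ul[class='t clearfix']
        if tag == "ul" and dict(attrs).get("class") == "t clearfix":
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.current = []

    def handle_endtag(self, tag):
        if tag == "li" and self.current is not None:
            self.items.append(self.current)
            self.current = None
        elif tag == "ul" and self.in_list:
            self.in_list = False

    def handle_data(self, data):
        # Keep non-blank text only while inside a target <li>
        if self.current is not None and data.strip():
            self.current.append(data.strip())


if __name__ == "__main__":
    parser = ForecastListParser()
    parser.feed(SAMPLE_HTML)
    for item in parser.items:
        print(item)
```

Each inner list corresponds to one li: the date from h1, the weather from p.wea, then one or two temperature strings from p.tem.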

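So our goal is to extract the data from each li under that ul. Note that the "today" entry has only one temperature value, while each of the following six days has two, so the first entry has to be handled separately.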
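The special case for "today" can be isolated in a small function. The sketch below uses hypothetical names and sample values. temp_by_position mirrors the rule in the code (entry 0 reads only the i value; later entries join the span and i values with "/"), and temp_by_presence is an alternative, not from the original, that decides by whether a span value exists at all.

```python
from typing import List, Optional, Tuple

# Hypothetical sample: (span_text, i_text) for each day in page order.
# None stands for a missing <span>, as described for "today".
SAMPLE_TEMPS: List[Tuple[Optional[str], str]] = [
    (None, "20℃"),
    ("25℃", "18℃"),
    ("27℃", "19℃"),
]


def temp_by_position(n: int, span_text: Optional[str], i_text: str) -> str:
    """Rule used in the post: entry 0 ("today") reads only the <i> value."""
    if n > 0:
        if span_text is None:
            raise ValueError("expected a <span> value after today")
        # Later days: join the <span> and <i> values as "span/i"
        return span_text + "/" + i_text
    return i_text


def temp_by_presence(span_text: Optional[str], i_text: str) -> str:
    """Alternative rule: join both values only when a <span> value exists."""
    if span_text:
        return span_text + "/" + i_text
    return i_text


if __name__ == "__main__":
    for n, (span_text, i_text) in enumerate(SAMPLE_TEMPS):
        print(n, temp_by_position(n, span_text, i_text), temp_by_presence(span_text, i_text))
```

The position-based rule assumes "today" is always the first li and that no earlier li fails before the counter is incremented; the presence-based rule depends only on the markup of each li. This sketch does not check which assumption holds on the live page.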

The code is as follows (IDE: PyCharm):

from bs4 import BeautifulSoup
from bs4 import UnicodeDammit
import urllib.request
import sqlite3


class WeatherDB:
    def openDB(self):
        # Open (or create) the SQLite database file weathers.db
        self.con = sqlite3.connect("weathers.db")
        self.cursor = self.con.cursor()
        try:
            self.cursor.execute("create table weathers (wcity varchar(16),wdate varchar(16),wweather varchar(64),wtemp varchar(32),constraint pk_weather primary key(wcity,wdate))")
        except sqlite3.OperationalError:
            # The table already exists from an earlier run: clear the old rows
            self.cursor.execute("delete from weathers")

    def closeDB(self):
        self.con.commit()
        self.con.close()

    def insert(self, city, date, weather, temp):
        try:
            self.cursor.execute("insert into weathers(wcity,wdate,wweather,wtemp) values(?,?,?,?)", (city, date, weather, temp))
        except Exception as err:
            # e.g. a duplicate (city, date) violates the primary key
            print(err)

    def show(self):
        self.cursor.execute("select * from weathers")
        rows = self.cursor.fetchall()
        print("%-16s%-16s%-32s%-16s" % ("city", "date", "weather", "temp"))
        for row in rows:
            print("%-16s%-16s%-32s%-16s" % (row[0], row[1], row[2], row[3]))


class WeatherForecast:
    def __init__(self):
        # Request headers that make the request look like it comes from a browser
        # (the header values were elided in the original, e.g. a browser User-Agent)
        self.headers = {}
        # Names of the four cities to crawl and their codes on China Weather Network
        # (the mapping was elided in the original)
        self.cityCode = {}

    def forecastCity(self, city):
        if city not in self.cityCode:
            print(city + " code cannot be found")
            return
        # URL to visit (the base URL was elided in the original)
        url = "" + self.cityCode[city] + ".shtml"
        try:
            # Build the Request object with the browser-like headers
            req = urllib.request.Request(url, headers=self.headers)
            data = urllib.request.urlopen(req)
            data = data.read()
            # Detect the page encoding, trying UTF-8 first and then GBK
            dammit = UnicodeDammit(data, ["utf-8", "gbk"])
            # data now holds the whole returned page as text
            data = dammit.unicode_markup
            soup = BeautifulSoup(data, features="html.parser")
            # Use BeautifulSoup to find every li in the ul whose class is 't clearfix'
            lis = soup.select("ul[class='t clearfix'] li")
            # Distinguishes "today" (n == 0) from the other six days
            n = 0
            for li in lis:
                try:
                    # Text of the h1 under the li (the date)
                    date = li.select('h1')[0].text
                    # Text of the p with class "wea" under the li (the weather)
                    weather = li.select('p[class="wea"]')[0].text
                    if n > 0:
                        # Other six days: two temperatures, from the span and the i under p class "tem"
                        temp = li.select('p[class="tem"] span')[0].text + "/" + li.select('p[class="tem"] i')[0].text
                    else:
                        # "Today": one temperature, from the i under p class "tem"
                        temp = li.select('p[class="tem"] i')[0].text
                    print(city, date, weather, temp)
                    n = n + 1
                    # Insert the record into the database
                    self.db.insert(city, date, weather, temp)
                except Exception as err:
                    print(err)
        except Exception as err:
            print(err)

    def process(self, cities):
        self.db = WeatherDB()
        self.db.openDB()
        for city in cities:
            self.forecastCity(city)
        self.db.closeDB()


ws = WeatherForecast()
ws.process(["北京", "上海", "廣州", "深圳"])
print("completed")

After running the code above, you should see the following result:

I don't fully understand all of this myself, so corrections and suggestions are welcome.
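The storage part of the code relies on two SQLite behaviours. On a second run, "create table" fails because the table already exists, so the code deletes the old rows instead. And the composite primary key (wcity, wdate) rejects a second row for the same city and date, which is why the insert method wraps its statement in try/except. The sketch below reproduces both in an in-memory database; the helper names create_or_clear and insert_row and the sample row are hypothetical.

```python
import sqlite3

# Same schema as in the post: (wcity, wdate) together form the primary key.
CREATE_SQL = (
    "create table weathers (wcity varchar(16), wdate varchar(16), "
    "wweather varchar(64), wtemp varchar(32), "
    "constraint pk_weather primary key (wcity, wdate))"
)


def create_or_clear(cursor: sqlite3.Cursor) -> None:
    """Create the table on the first run; on later runs only delete old rows."""
    try:
        cursor.execute(CREATE_SQL)
    except sqlite3.OperationalError:
        # Raised as "table weathers already exists"
        cursor.execute("delete from weathers")


def insert_row(cursor: sqlite3.Cursor, city: str, date: str, weather: str, temp: str) -> bool:
    """Insert one forecast row; return False if (city, date) is already stored."""
    try:
        cursor.execute(
            "insert into weathers (wcity, wdate, wweather, wtemp) values (?, ?, ?, ?)",
            (city, date, weather, temp),
        )
        return True
    except sqlite3.IntegrityError:
        # Duplicate primary key (same city and date)
        return False


# In-memory database so the demo leaves no file behind.
con = sqlite3.connect(":memory:")
cur = con.cursor()
create_or_clear(cur)                                # first run: creates the table
insert_row(cur, "CityA", "Day 1", "Sunny", "20℃")  # hypothetical row
create_or_clear(cur)                                # second run: clears old rows
after_clear = cur.execute("select count(*) from weathers").fetchone()[0]
first = insert_row(cur, "CityA", "Day 1", "Sunny", "20℃")
duplicate = insert_row(cur, "CityA", "Day 1", "Cloudy", "18℃")
con.commit()
print(after_clear, first, duplicate)
```

Running it prints `0 True False`: the second "run" starts from an empty table, and the duplicate (city, date) row is rejected while the first one is kept.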
