If you want to save yourself some typing, you can also scrape the city names into a list first and generate the years and months with loops (the second script below does exactly that).

import requests
from requests.exceptions import RequestException
from bs4 import BeautifulSoup
import os
import csv
import time
def get_one_page(url):
    '''Fetch a page and return its raw content.'''
    print('loading ' + url)
    headers = {}  # the original post elides the header dict (typically a User-Agent)
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.content
        return None
    except RequestException:
        return None
def parse_one_page(html):
    '''Parse the page content.'''
    soup = BeautifulSoup(html, "lxml")
    info = soup.find('div', class_='wdetail')
    rows = []
    tr_list = info.find_all('tr')[1:]  # skip the header row, start from the second tr
    for index, tr in enumerate(tr_list):  # enumerate yields each element's position and content
        td_list = tr.find_all('td')
        # take each tag's text and use replace() to drop the newlines
        date = td_list[0].text.strip().replace("\n", "")
        weather = td_list[1].text.strip().replace("\n", "").split("/")[0].strip()
        temperature_high = td_list[2].text.strip().replace("\n", "").split("/")[0].strip()
        temperature_low = td_list[2].text.strip().replace("\n", "").split("/")[1].strip()
        rows.append((date, weather, temperature_high, temperature_low))
    return rows
cities = ['chengdu', 'aba', 'bazhong', 'dazhou', 'deyang', 'ganzi', 'guangan',
          'guangyuan', 'leshan', 'luzhou', 'meishan', 'mianyang', 'neijiang',
          'nanchong', 'panzhihua', 'scsuining', 'yaan', 'yibin', 'ziyang',
          'zigong', 'liangshan']
years = ['2012', '2013', '2014', '2015', '2016', '2017', '2018']
months = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']

if __name__ == '__main__':
    # os.chdir()  # set the working directory
    for city in cities:
        with open(city + '_weather.csv', 'a', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['date', 'weather', 'temperature_high', 'temperature_low'])
            for year in years:
                for month in months:
                    url = '' + city + '/month/' + year + month + '.html'  # base URL elided in the original post
                    html = get_one_page(url)
                    content = parse_one_page(html)
                    writer.writerows(content)
                    print(city + year + month + ' is ok!')
                    time.sleep(2)
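The strip/replace/split cleaning chain applied to each td's text above can be exercised on its own. A minimal sketch, where the sample strings are made up to mimic the site's cell text (the original post does not show the raw cells):

```python
def first_part(s):
    # strip outer whitespace, drop newlines, keep the text before "/"
    return s.strip().replace("\n", "").split("/")[0].strip()

def second_part(s):
    # same cleaning, but keep the text after "/"
    return s.strip().replace("\n", "").split("/")[1].strip()

weather_text = " 晴 / 多雲 \n"   # made-up weather cell
temp_text = " 8℃ / -2℃ "        # made-up temperature cell

print(first_part(weather_text))   # 晴
print(first_part(temp_text))      # 8℃
print(second_part(temp_text))     # -2℃
```

Splitting on "/" is what lets one td hold both the high and the low temperature; index 0 is the high, index 1 the low.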
This could probably still be optimized; suggestions are welcome.

import requests
from bs4 import BeautifulSoup
import re

url = ''  # base URL elided in the original post
headers = {}  # the original post elides the header dict (typically a User-Agent)
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, "lxml")
# this works for other regions too; just swap 四川 in the title attribute
results = soup.find('a', title='四川歷史天氣預報').parent.next_sibling
patterns = re.compile('href=.*?/lishi/([a-z]+).*?html')
cities = re.findall(patterns, str(results))
years = []
for i in range(7):
    years.append('20' + str(i + 12).zfill(2))
months = []
for i in range(12):
    months.append(str(i + 1).zfill(2))
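The regex extraction and the zfill loops can be checked against a stand-in fragment. The HTML below is made up to resemble the city links the pattern targets; the real markup comes from the page behind the elided URL:

```python
import re

# Made-up anchor list shaped like the /lishi/ city links the regex targets.
results = ('<a href="/lishi/chengdu/index.html">成都</a>'
           '<a href="/lishi/mianyang/index.html">綿陽</a>')

patterns = re.compile('href=.*?/lishi/([a-z]+).*?html')
cities = re.findall(patterns, results)
print(cities)  # ['chengdu', 'mianyang']

# the two loops above are equivalent to these comprehensions
years = ['20' + str(i + 12).zfill(2) for i in range(7)]
months = [str(i + 1).zfill(2) for i in range(12)]
print(years[0], years[-1])    # 2012 2018
print(months[0], months[-1])  # 01 12
```

The capture group `([a-z]+)` pulls out only the lowercase city slug from each link, which is exactly the piece the first script's URLs need.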