Crawling the News on the Campus News Home Page

1. Use the requests and BeautifulSoup libraries to crawl the title, link, body text, and show-info of each news item on the campus news home page.

2. Parse the info string to extract each news item's publication time, author, photographer, and other details.

import requests
from bs4 import BeautifulSoup
from datetime import datetime

url = ""  # address of the campus news home page (blank in the source)
res = requests.get(url)
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, "html.parser")

for news in soup.select("li"):
    if len(news.select(".news-list-title")) > 0:  # skip <li> elements that are not news items
        time = news.select(".news-list-info")[0].contents[0].text
        title = news.select(".news-list-title")[0].text
        description = news.select(".news-list-description")[0].text
        a = news.select('a')[0].attrs['href']

        # Fetch the detail page that the list entry links to
        detail_res = requests.get(a)
        detail_res.encoding = "utf-8"
        detail_soup = BeautifulSoup(detail_res.text, "html.parser")
        print(detail_soup.select("#content")[0].text)  # body text
        print(time, title, description, a)
        content = detail_soup.select("#content")[0].text
        info = detail_soup.select(".show-info")[0].text
        date_time = info.lstrip('')[:19]  # the prefix passed to lstrip() is missing in the source
        print(info)
        break  # stop after the first news item
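The list-page extraction above can be tried without touching the network. The following is only a minimal sketch: the HTML fragment, the `parse_news_list` helper, and the `example.edu` base URL are all made up for illustration, and only the three `news-list-*` class names come from the post. It swaps `select(...)[0]` for `select_one(...)`, which returns `None` instead of raising when an element is absent, and resolves relative links with `urljoin`, on the assumption that some `href` values may not be absolute.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical list-page fragment; only the class names come from the post.
SAMPLE_HTML = """
<ul>
  <li class="nav">Home</li>
  <li>
    <a href="/news/1.html">
      <div class="news-list-title">Sample title</div>
      <div class="news-list-description">Sample description</div>
      <div class="news-list-info"><span>2018-04-01</span><span>Office</span></div>
    </a>
  </li>
</ul>
"""


def parse_news_list(html, base_url):
    """Return one dict per <li> that contains a .news-list-title element."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for li in soup.select("li"):
        title = li.select_one(".news-list-title")
        if title is None:  # not a news entry, e.g. a navigation item
            continue
        link = li.select_one("a")
        info = li.select_one(".news-list-info")
        desc = li.select_one(".news-list-description")
        items.append({
            "title": title.get_text(strip=True),
            "description": desc.get_text(strip=True) if desc else "",
            # First child of .news-list-info, as in the loop above
            "time": info.contents[0].get_text(strip=True) if info else "",
            # Turn a relative href into an absolute URL
            "url": urljoin(base_url, link["href"]) if link else "",
        })
    return items


for item in parse_news_list(SAMPLE_HTML, "http://example.edu/index.html"):
    print(item["time"], item["title"], item["url"])
```

Returning plain dicts keeps the parsing step separate from the HTTP requests, so the selectors can be checked against a saved copy of the page before crawling the live site.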
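info = ''  # sample .show-info text; the literal is missing in the source
detail_time = info.lstrip('')[:19]  # the 19-character timestamp left after stripping the prefix
sh = info[info.find("審核"):].split()[0].lstrip('')  # reviewer field, located by the label "審核"
print(detail_time, sh)

info1 = ''  # second sample string, also missing in the source
info1 = info1[info1.find("作者"):info1.find('')].lstrip('').split()[1]  # author field, located by the label "作者"
print(info1)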
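Slicing with `find()` and `split()` works, but it raises an `IndexError` as soon as a label is absent. As a rough alternative, the info string can be split on its labels with a regular expression. This is a sketch under assumptions: the labels 作者 and 審核 appear in the code above, while the 發布時間 and 攝影 labels, the sample names, the timestamp, and the `label：value` layout are guesses about the page, and `parse_show_info` is a hypothetical helper.

```python
import re

# Hypothetical .show-info text; the real string is not shown in the post.
SAMPLE_INFO = "發布時間：2018-04-01 11:57:00  作者：張三  審核：李四  攝影：王五"

# Labels to look for; extend this tuple if the page uses other fields.
LABELS = ("發布時間", "作者", "審核", "攝影")

# One capture group, so re.split() keeps the matched label in its result.
LABEL_RE = re.compile(r"(%s)[：:]" % "|".join(LABELS))


def parse_show_info(info):
    """Map each label found in info to the text that follows it."""
    parts = LABEL_RE.split(info)
    # parts looks like [prefix, label1, value1, label2, value2, ...]
    return {label: value.strip() for label, value in zip(parts[1::2], parts[2::2])}


fields = parse_show_info(SAMPLE_INFO)
print(fields["發布時間"])   # 2018-04-01 11:57:00
print(fields.get("作者"))   # 張三
print(fields.get("來源"))   # None, because that label is not in the string
```

A label that does not occur is simply left out of the dict, so `fields.get(...)` can be used to treat the photographer or reviewer as optional.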
now_time = datetime.now()
now_time.year
print(datetime.strptime(date_time, "%Y-%m-%d %H:%M:%S"))  # format codes are case-sensitive
print(now_time.strftime('%Y\\%m\\%d'))
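The format codes are case-sensitive, which is easy to get wrong: `%Y` is the four-digit year while `%y` is the two-digit one, `%m` is the month while `%M` is the minute, and `%H` is the 24-hour clock hour. A small sketch of the round trip follows; the timestamp value and the `parse_publish_time` helper are made up for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # e.g. 2018-04-01 11:57:00


def parse_publish_time(text):
    """Parse the first 19 characters of text as 'YYYY-MM-DD HH:MM:SS'."""
    return datetime.strptime(text.strip()[:19], TIMESTAMP_FORMAT)


# Hypothetical value; trailing text after the timestamp is ignored.
published = parse_publish_time("2018-04-01 11:57:00 more text")
print(published.year)                  # 2018
print(published.strftime("%Y/%m/%d"))  # 2018/04/01
```

Once the string is a `datetime` object, fields such as `year` are available directly and the value can be reformatted with `strftime` in any layout.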

