1. Use the requests and BeautifulSoup libraries to scrape the title, link, body text, and show-info of the news items on the campus news home page.
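import requests
from bs4 import BeautifulSoup
# Home page of the campus news site (the URL is left blank in the original post)
newsurl=''
res=requests.get(newsurl)
res.encoding='utf-8'
soup=BeautifulSoup(res.text,'html.parser')
# Link of one news item on the home page (the 11th <li> element)
a=soup.select('li')[10].a.attrs['href']
print(a)
# Fetch that article's detail page
newsurl1=a
res2=requests.get(newsurl1)
res2.encoding='utf-8'
soup1=BeautifulSoup(res2.text,'html.parser')
# Body text and the show-info line of the article
t=soup1.select('#content')[0].text
t2=soup1.select('.show-info')[0].text
print(t)
print(t2)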
2. Parse the info string to get each article's publication time, author, photographer, and other fields.
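The post gives no code for this step, so here is a minimal sketch of one way to do it with plain string splitting. The sample string, its labels, and the helper name parse_info are assumptions made for illustration; the real show-info text may use different labels, spacing, or full-width colons.

```python
# Hypothetical show-info text; the real page's labels and spacing may differ.
info = "发布时间:2018-04-01 11:57:00 作者:张三 摄影:李四"

def parse_info(info):
    """Split a show-info string into a {label: value} dict."""
    fields = {}
    label = None
    for token in info.split():
        if ':' in token and not token[0].isdigit():
            # A token such as "作者:张三" starts a new field
            label, value = token.split(':', 1)
            fields[label] = value
        elif label is not None:
            # A token without a label (here the time "11:57:00")
            # continues the previous field
            fields[label] += ' ' + token
    return fields

print(parse_info(info))
```

Keeping the result in a dict means a missing field (for example, an article with no photographer) simply has no key instead of shifting the positions of the other values.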
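3. Convert the publication time from a string to a datetime object.
from datetime import datetime
# t5 (the date) and t3[1] (the time) are presumably obtained by splitting the show-info string in step 2
datetime1=t5+' '+t3[1]
print(datetime1)
# Format codes are case-sensitive: %Y is the four-digit year, %H the hour, %M the minute, %S the second
d2=datetime.strptime(datetime1,"%Y-%m-%d %H:%M:%S")
print(d2)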
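As a self-contained illustration of the conversion, the sketch below uses made-up values in place of t5 and t3[1]. The variable names date_part, time_part, and published are assumptions, not names from the post.

```python
from datetime import datetime

# Hypothetical values standing in for t5 (date) and t3[1] (time)
date_part = "2018-04-01"
time_part = "11:57:00"

# %Y = four-digit year, %m = month, %d = day, %H = hour (24h), %M = minute, %S = second
published = datetime.strptime(date_part + ' ' + time_part, "%Y-%m-%d %H:%M:%S")

print(published)
print(published.year, published.hour)
# strftime goes the other way, from datetime back to a string
print(published.strftime("%Y/%m/%d"))
```

Once the value is a datetime, articles can be sorted or compared by time directly instead of by string.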
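4. Use a regular expression to get the news ID.
import re
# Article URL (left blank in the original post)
newsurl2=''
# Capture all the digits right before ".html"; \d alone would match only the last digit
newsurl4=re.search(r'(\d+)\.html',newsurl2).group(1)
print(newsurl4)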
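The sketch below shows the ID extraction on a made-up URL. The helper name get_news_id and the example.com address are hypothetical. It also shows why str.rstrip('.html') is not a safe way to drop the suffix: rstrip removes any trailing characters from the given set, not the literal suffix.

```python
import re

def get_news_id(news_url):
    """Return the run of digits right before '.html', or None if there is none."""
    match = re.search(r'(\d+)\.html$', news_url)
    return match.group(1) if match else None

# Hypothetical URL, used only to illustrate the pattern
sample_url = "http://example.com/html/2018/news_0404/9183.html"
print(get_news_id(sample_url))
print(get_news_id("http://example.com/"))

# rstrip('.html') strips any trailing '.', 'h', 't', 'm', 'l' characters,
# so it can eat part of the name as well as the suffix
print("mail.html".rstrip('.html'))
```

Checking the match before calling group() avoids an AttributeError when the URL does not end in a numeric page name.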
5. Build the request URL for the click count.
# Click-count URL (left blank in the original post)
url=''
res3=requests.get(url).text
print(res3)
6. Get the click count.
res4=res3.split('html')
res5=res4[-1].lstrip("('").rstrip("');")
print(res5)
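7. Wrap steps 4, 5, and 6 in a function, def getclickcount(newsurl):
import re
import requests
def getclickcount(newsurl):
    # News ID: the digits right before ".html" in the article URL
    newsurl4=re.search(r'(\d+)\.html',newsurl).group(1)
    # The base of the click-count URL is left blank in the original post
    newid=''+newsurl4+'&modelid=80'
    res3=requests.get(newid).text
    print(res3)
    # Keep what follows the last 'html' and trim the surrounding (' and ');
    res4=res3.split('html')
    res5=res4[-1].lstrip("('").rstrip("');")
    print(res5)
    return res5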
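Because the real URLs are not given, the function is hard to try out as written. The variant below is only a sketch: it takes the URL pattern and the download step as parameters so it can be run without a network connection. The names get_click_count, count_url_template, fetch, and fake_fetch, the example.com addresses, and the shape of the response body are all assumptions.

```python
import re

def get_click_count(news_url, count_url_template, fetch):
    """Return the click count for news_url as an int.

    count_url_template: hypothetical URL pattern containing '{id}'
    fetch: any callable that takes a URL and returns the response text,
           for example lambda u: requests.get(u).text
    """
    news_id = re.search(r'(\d+)\.html', news_url).group(1)
    text = fetch(count_url_template.format(id=news_id))
    # Same extraction as step 6: keep what follows the last 'html'
    # and trim the surrounding (' and ');
    return int(text.split('html')[-1].lstrip("('").rstrip("');"))

# Stand-in for the network call, returning a made-up response body
def fake_fetch(url):
    return "$('#hits').html('123');"

count = get_click_count(
    "http://example.com/news/9183.html",
    "http://example.com/api?id={id}&modelid=80",
    fake_fetch,
)
print(count)
```

Returning the count rather than only printing it lets the caller store it alongside the other fields of the article.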
8. Wrap the code that fetches the news details in a function, def getnewdetail(newsurl):
# Article URL (left blank in the original post)
schoolurl=''
def getnewdetail(schoolurl):
    res10=requests.get(schoolurl)
    res10.encoding='utf-8'
    soup10=BeautifulSoup(res10.text,'html.parser')
    # Body text of the article
    b=soup10.select('#content')[0].text
    print(b)
getnewdetail(schoolurl)
9. Try using regular expressions to parse the show-info string and the click-count string.
t2=soup1.select('.show-info')[0].text
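A minimal sketch of the regular-expression approach follows. Both sample strings are hypothetical stand-ins for t2 and for the click-count response, so the labels and the html('...') wrapper are assumptions that may need adjusting to the real text.

```python
import re
from datetime import datetime

# Hypothetical samples; the real strings may use different labels or punctuation
info = "发布时间:2018-04-01 11:57:00 作者:张三 摄影:李四"
click_text = "$('#hits').html('123');"

# Timestamp: four-digit year, then the date and time parts
time_match = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info)
published = datetime.strptime(time_match.group(), "%Y-%m-%d %H:%M:%S")

# A labelled field: capture the non-space characters after "作者:"
author = re.search(r'作者:(\S+)', info).group(1)

# Click count: the digits between the quotes of the html('...') call
clicks = int(re.search(r"html\('(\d+)'\)", click_text).group(1))

print(published, author, clicks)
```

Compared with split and strip, each pattern states exactly what it expects, so a change in field order does not silently shift the values.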