Scraping the China University Rankings (Top 567)

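import re

import requests
from bs4 import BeautifulSoup

# One entry per university: [school, province/city, training scale, rank, total score].
# This layout is inferred from the indices used in printunivlist.
alluniv = []


def gethtmltext(url):
    """Return the page text, or an empty string if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        return ""


# Runs of Chinese characters: school, province/city, training scale.
findpaiming = re.compile(r'[\u4e00-\u9fa5]+')
# Three or more digits with one decimal place, or exactly three digits: the total score.
findshuzi = re.compile(r'\d\d\d+\.\d|\d\d\d')


def fillunivlist(soup):
    paiming = 1  # running rank counter
    for item in soup.find_all('tr'):  # each tr holds the information for one school
        item = str(item)
        link = re.findall(findpaiming, item)  # Chinese strings: [school, province/city, training scale]
        link2 = re.findall(findshuzi, item)   # numeric matches: the total score
        if len(link) == 4:
            # A name such as 中國地質大學(武漢), China University of Geosciences (Wuhan),
            # is extracted as two strings; merge them back into one.
            data = [str(link[0]) + '(' + str(link[1]) + ')', link[2], link[3]]
        elif len(link) == 3:
            data = list(link)
        else:
            continue  # header rows and anything else that is not a school row
        if 1 <= len(link2) <= 2:
            # A rank above 100 is also matched as a number. With one match it is
            # the score; with two matches the score is the second value.
            score = link2[0] if len(link2) == 1 else link2[1]
            alluniv.append(data + [paiming, score])
            paiming = paiming + 1


def printunivlist(num):
    # Column template; the widths here are an assumption.
    tplt = "{:^6}\t{:^16}\t{:^10}\t{:^8}\t{:^10}"
    print(tplt.format("Rank", "School", "Province/City", "Total score", "Training scale"))
    for u in alluniv[:num]:  # slicing avoids an IndexError when fewer rows were parsed
        print(tplt.format(u[3], u[0], u[1], u[4], u[2]))


def main(num):
    url = ''  # URL of the ranking page (left blank in the source)
    html = gethtmltext(url)
    soup = BeautifulSoup(html, 'html.parser')
    fillunivlist(soup)
    printunivlist(num)


main(567)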
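
To see what the two regular expressions in the listing actually pick out of a row, the following minimal sketch runs them on two made-up tr strings. The helper name parse_row and both sample rows are assumptions for illustration only; they are not taken from the real ranking page, whose markup may differ.

```python
import re

# Same patterns as in the listing above.
findpaiming = re.compile(r'[\u4e00-\u9fa5]+')
findshuzi = re.compile(r'\d\d\d+\.\d|\d\d\d')


def parse_row(row_html):
    """Return [school, province, scale, score] for one row, or None (hypothetical helper)."""
    names = findpaiming.findall(row_html)
    numbers = findshuzi.findall(row_html)
    if len(names) == 4:
        # The full-width parentheses are not Chinese characters, so the
        # campus suffix is matched separately and has to be merged back.
        names = [names[0] + '(' + names[1] + ')', names[2], names[3]]
    if len(names) != 3 or not 1 <= len(numbers) <= 2:
        return None
    # With two numeric matches the first one is a three-digit rank,
    # so the score is always the last match.
    return names + [numbers[-1]]


# Invented sample rows, for illustration only.
plain_row = '<tr><td>1</td><td>清華大學</td><td>北京</td><td>綜合</td><td>852.5</td></tr>'
split_row = '<tr><td>101</td><td>中國地質大學（武漢）</td><td>湖北</td><td>理工</td><td>212.3</td></tr>'

plain_result = parse_row(plain_row)
split_result = parse_row(split_row)
print(plain_result)
print(split_result)
```

Note that the number pattern only matches values with at least three digits before the decimal point, so a total score below 100 would be missed by this approach.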

Targeted Scraping of the China University Rankings

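Step 1: Fetch the university ranking page from the web.
Step 2: Extract the information in the page into a suitable data structure.
Step 3: Use the data structure to display and print the results.

Viewing the page source (right-click, then view source) shows that the information sits under the tbody tag, and that the td cells inside each tr hold the values we want to scrape. We return only the first four td values; the...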
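
Reading the td cells of each tr directly, as described above, can be sketched as follows. This is only an illustration under stated assumptions: it swaps in the standard library's HTMLParser instead of bs4 so that it runs without third-party packages, the names RowCollector and first_four_cells are hypothetical, and the sample table is invented rather than copied from the ranking page.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of the td cells of every tr (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the tr being read, or None outside a tr
        self._cell = None  # text fragments of the td being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows without td cells (for example header rows) are skipped
                self.rows.append(self._row)
            self._row = None


def first_four_cells(html):
    """Return the first four td values of every data row."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [row[:4] for row in parser.rows]


# Invented sample table, for illustration only.
SAMPLE = """
<table><tbody>
<tr><th>h1</th><th>h2</th></tr>
<tr><td>1</td><td>University A</td><td>Province X</td><td>95.3</td><td>extra</td></tr>
<tr><td>2</td><td>University B <span>(Campus)</span></td><td>Province Y</td><td>90.1</td><td>extra</td></tr>
</tbody></table>
"""

rows = first_four_cells(SAMPLE)
for row in rows:
    print(row)
```

Compared with matching Chinese characters by regular expression, reading whole cells keeps a name with a campus suffix in one piece, at the cost of depending on the column order of the table.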

Scraping the China University Rankings

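Functional description
Input: the URL of the university ranking page
Output: the ranking information printed to the screen (rank, university name, total score)
Technical route: requests + bs4
Targeted crawler: crawl only the given URL, without following further links

Program structure
Step 1: Fetch the page from the web: gethtmltext
Step 2: Extract the information in the page into a suitable data structure: fillu...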
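
For the screen output of rank, university name and total score, one common way to keep columns of Chinese text aligned is to pad with the full-width space chr(12288) instead of an ASCII space. The sketch below assumes that approach; the function name format_rows, the column widths and the sample row are hypothetical and not taken from the original program.

```python
def format_rows(rows):
    """Return a header line plus one line per (rank, name, score) tuple."""
    pad = chr(12288)  # full-width space, the same width as a CJK character
    # Field 1 uses the fourth argument as its fill character.
    template = "{0:^6}\t{1:{3}^10}\t{2:^8}"
    lines = [template.format("Rank", "University", "Score", pad)]
    for rank, name, score in rows:
        lines.append(template.format(rank, name, score, pad))
    return lines


# Invented sample row, for illustration only.
lines = format_rows([(1, "清華大學", 852.5)])
for line in lines:
    print(line)
```

The nested field {3} inside the format specification is what lets the fill character be passed as an argument; padding Chinese names with ordinary spaces usually leaves the following columns misaligned in a terminal.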
