Web Scraping Study Notes, 5/17

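1. Use text() to get an element's text.

2. Use url_infos = selector.xpath('//div[@class="article block untagged mb15"]') to locate the "loop point", the element that repeats once per item.

3. The string(.) method handles nested tags (a tag inside a tag).

4. starts-with() can match several tags whose attribute values share a common prefix.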
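The XPath helpers listed above can be tried on a small inline document. This is a minimal sketch, assuming lxml is installed; the HTML in SAMPLE and its class names are invented for illustration and are not taken from the page being scraped.

```python
from lxml import etree

# Hypothetical HTML, invented for illustration only.
SAMPLE = """
<ul>
  <li class="tag-1">first</li>
  <li class="tag-2">second</li>
  <li class="other">third</li>
</ul>
<div id="post">Hello <b>nested</b> world</div>
"""

selector = etree.HTML(SAMPLE)

# text() returns only the text nodes that are direct children of the element,
# so the text inside <b> is skipped.
direct_text = selector.xpath('//div[@id="post"]/text()')

# string(.) concatenates all descendant text, so nested tags are included.
post = selector.xpath('//div[@id="post"]')[0]
full_text = post.xpath('string(.)')

# starts-with() selects every <li> whose class begins with "tag-".
tagged = selector.xpath('//li[starts-with(@class, "tag-")]/text()')

print(direct_text)
print(full_text)
print(tagged)
```

The contrast between direct_text and full_text is the reason string(.) is preferred when tags are nested.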

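# Method 1: regular expressions
# The HTML-tag parts of the patterns and the value of info did not survive in these notes.
import re
import requests

res = requests.get(url, headers=headers)
ids = re.findall('', res.text, re.S)
contents = re.findall('.*?(.*?)', res.text, re.S)
laughs = re.findall(r'(\d+)', res.text, re.S)    # \d+ matches a run of digits
# comments is used below but is never assigned in these notes
for id, content, laugh, comment in zip(ids, contents, laughs, comments):
    info =
    return info    # only return the data, do not store it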
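Two details of the regex approach are easy to get wrong: the flag is spelled re.S (capital S), and the digit class is \d with a backslash. The sketch below shows both on a made-up HTML fragment; the markup and the field names id and laugh are assumptions for illustration, not the patterns used on the real page.

```python
import re

# Hypothetical HTML fragment, invented for illustration only.
html = """<div class="item">
<h2>
alice
</h2>
<i class="number">42</i>
</div>
<div class="item">
<h2>
bob
</h2>
<i class="number">7</i>
</div>"""

# Without re.S, "." stops at a newline, so a pattern spanning lines finds nothing.
no_flag = re.findall(r'<h2>(.*?)</h2>', html)

# With re.S, "." also matches newlines and the lazy group captures each name.
names = [n.strip() for n in re.findall(r'<h2>(.*?)</h2>', html, re.S)]

# \d+ matches one or more digits.
votes = re.findall(r'<i class="number">(\d+)</i>', html)

# zip pairs up the parallel lists, giving one dict per item.
infos = [{'id': n, 'laugh': int(v)} for n, v in zip(names, votes)]

print(no_flag)
print(infos)
```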

# Method 2: BeautifulSoup
import requests
from bs4 import BeautifulSoup

res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
ids = soup.select('a > h2')
contents = soup.select('div > span')
laughs = soup.select('span.stats-vote > i')
comments = soup.select('i.number')
for id, content, laugh, comment in zip(ids, contents, laughs, comments):
    info =         # value did not survive in these notes
    return info
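Because return sits inside the loop, the snippet above hands back only the first item. One possible variation, sketched here under the assumption that bs4 is installed, collects every item into a list instead. The markup in SAMPLE, the function name parse, and the dict keys are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; not the real page.
SAMPLE = """
<div class="article">
  <a><h2>alice</h2></a>
  <span class="stats-vote"><i class="number">42</i></span>
</div>
<div class="article">
  <a><h2>bob</h2></a>
  <span class="stats-vote"><i class="number">7</i></span>
</div>
"""

def parse(markup):
    """Return one dict per item instead of returning inside the loop."""
    soup = BeautifulSoup(markup, 'html.parser')
    ids = soup.select('a > h2')
    laughs = soup.select('span.stats-vote > i')
    infos = []
    for id_tag, laugh_tag in zip(ids, laughs):
        # select() returns Tag objects; get_text() extracts their text.
        infos.append({
            'id': id_tag.get_text().strip(),
            'laugh': laugh_tag.get_text().strip(),
        })
    return infos

infos = parse(SAMPLE)
print(infos)
```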

# Method 3: lxml
import requests
from lxml import html

res = requests.get(url, headers=headers)
etree = html.etree
selector = etree.HTML(res.text)
url_infos = selector.xpath('//div[@class="article block untagged mb15"]')
try:
    for url_info in url_infos:
        id = url_info.xpath('div[1]/a[2]/h2/text()')[0]
        content = url_info.xpath('a[1]/div/span/text()')[0]
        laugh = url_info.xpath('div[2]/span/span[2]/i/text()')[0]
        comment = url_info.xpath('a[1]/div/span/text()')[0]
        info =         # value did not survive in these notes
        return info
except IndexError:
    pass    # ignore the IndexError raised when an XPath matches nothing
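With the try wrapped around the whole loop, the first IndexError ends the loop and the remaining items are lost. A sketch of an alternative, assuming lxml is installed, moves the try inside the loop so that only the incomplete item is skipped. The class string comes from the notes; the inner structure of SAMPLE, the function name parse, and the dict keys are invented for illustration.

```python
from lxml import etree

# Hypothetical markup: the second block has no <i>, so its lookup fails.
SAMPLE = """
<div class="article block untagged mb15"><h2>alice</h2><i>42</i></div>
<div class="article block untagged mb15"><h2>bob</h2></div>
<div class="article block untagged mb15"><h2>carol</h2><i>7</i></div>
"""

def parse(markup):
    selector = etree.HTML(markup)
    # The "loop point": one element per item.
    blocks = selector.xpath('//div[@class="article block untagged mb15"]')
    infos = []
    for block in blocks:
        try:
            # Relative paths (no leading //) are evaluated inside this block only.
            info = {
                'id': block.xpath('h2/text()')[0],
                'laugh': block.xpath('i/text()')[0],
            }
        except IndexError:
            # An XPath matched nothing, so [0] failed; skip just this item.
            continue
        infos.append(info)
    return infos

infos = parse(SAMPLE)
print(infos)
```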

Run time of each method (raw timer output, presumably in seconds):

regular expressions  14.73269534111023
BeautifulSoup        17.247904062271118
lxml                 14.632810115814209
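The notes do not show how these numbers were measured. A minimal sketch of one way to time a scraper function over a list of URLs follows; time_scraper, fake_scraper, and the URL strings are all hypothetical stand-ins, and no network request is made.

```python
import time

def time_scraper(scraper, urls):
    """Return the wall-clock seconds taken to call scraper on every URL."""
    start = time.perf_counter()
    for url in urls:
        scraper(url)
    return time.perf_counter() - start

def fake_scraper(url):
    # Stand-in for a real scraper function; does no network access.
    return {'url': url}

# Placeholder URLs, invented for illustration.
urls = ['page/{}'.format(n) for n in range(1, 4)]

elapsed = time_scraper(fake_scraper, urls)
print('fake_scraper', elapsed)
```

Passing each of the three scraper functions to the same helper with the same URL list keeps the comparison fair, although network latency will still vary between runs.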
