Things to watch out for when writing a Python web scraper

1. Getting the content of an a tag: use .string

# Cast
'''Starring:
張譯黃景瑜海清
'''
actors=li.find('p',attrs='pactor')
act=''
for actor in actors:
    act+=actor.string+' '

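2. Getting the content of a tag: use .text

# Synopsis
'''Synopsis: 楊維 (played by 王健), deputy director at 遠達建築公司, is constantly held down by his superiors and colleagues at work, and at home his wife (played by 王妍) shows him no respect. Under all this pressure, 楊維 goes astray. He seizes 周燕 (played by 呂小漫), 白亞楠 (played by 徐藝涵), and 沈美玲 (played by 劉雨晴) and holds them in his cellar, where he abuses them......Outside the cellar, the relatives of the three women search desperately; 白亞楠's father 白景山 (played by 梁岩) and 周燕... Expand all
'''
instroture=li.find('p',attrs=).text

print (instroture)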
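
The difference between points 1 and 2 can be seen on a small example. This is only a sketch, assuming BeautifulSoup 4 (bs4) with the built-in html.parser; the HTML snippet and the class name pintro are made up for illustration and are not taken from the scraped page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, not the real page.
html = ('<p class="pactor"><a>A</a><a>B</a></p>'
        '<p class="pintro">Intro: <em>text</em></p>')
soup = BeautifulSoup(html, 'html.parser')

# .string works when a tag contains exactly one string, as each <a> does here.
links = soup.find('p', class_='pactor').find_all('a')
cast = ' '.join(a.string for a in links)

intro = soup.find('p', class_='pintro')
# .string is None when a tag has more than one child...
intro_string = intro.string
# ...while .text joins all the text found anywhere inside the tag.
intro_text = intro.text
```

So .string is fine for a simple tag such as a link, while .text is the safer choice for a paragraph that may contain nested tags.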

3. The scraper fails with AttributeError: 'NoneType' object has no attribute 'text'. The line that raises it is

time=li.find('span',attrs=).text

because the HTML contains no span of this kind, so find() returns None.

In this case, wrap the lookup in try/except and print a message explaining the situation:

try:
    time=li.find('span',attrs=).text
    print (time)
except AttributeError:
    print ('Not yet released')
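
An alternative to try/except is to test the result of find() for None before reading .text. The following is a minimal sketch, assuming bs4; the helper name release_time and the HTML are hypothetical, and the attribute filter is left out because its value is not shown above.

```python
from bs4 import BeautifulSoup

def release_time(li, default='Not yet released'):
    """Return the text of the first <span> inside li, or default if there is none."""
    span = li.find('span')  # find() returns None when nothing matches
    return span.text if span is not None else default

# Hypothetical HTML: the second item has no <span>.
html = '<ul><li><span>2018-05-01</span></li><li><a>no date</a></li></ul>'
soup = BeautifulSoup(html, 'html.parser')
times = [release_time(li) for li in soup.find_all('li')]
```

Checking for None keeps unrelated errors visible, whereas a broad except can hide them.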

4. URL path problems

requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http:imgwx2.2345.com/dypcimg/img/8/65/sup196226_223x310.jpg?1525231260?

Fix: prepend 'http:' to the path.
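
Instead of concatenating the prefix by hand, the standard library can resolve the path against the URL of the page it came from. This sketch assumes the scraped value is a scheme-relative URL starting with //, which the error message above does not show; the page URL http://example.com/ and the helper name absolutize are placeholders.

```python
from urllib.parse import urljoin

def absolutize(src, page_url='http://example.com/'):
    """Resolve a scraped src against the page URL so that it gains a scheme."""
    return urljoin(page_url, src)

# Assumed shape of the scraped value: scheme-relative, beginning with '//'.
fixed = absolutize('//imgwx2.2345.com/dypcimg/img/8/65/sup196226_223x310.jpg')

# A URL that is already absolute is returned unchanged.
unchanged = absolutize('https://example.org/a.jpg')
```

The result can then be passed to requests.get() without triggering MissingSchema.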

5. Getting the content of an a tag nested inside a span tag:

# Get the movie title
'''媽媽咪鴨
'''
name=li.find('span',attrs=).a.text
#print (name)

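6. Beginners really need to watch out for this one. Without systematic study, many problems are only discovered by running into them.

To get a single object, use soup.find('tag', 'attribute').

To get a collection, use soup.find_all('tag', 'attribute').

Once we have a single object: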

ul=soup.find('ul',attrs=)

we can iterate over it:

for li in ul:
    #print (li)
    name=li.find('a',attrs=).text
    print (name)

But this raises an error:

name=li.find('a',attrs=).text
TypeError: find() takes no keyword arguments

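Yet the a tag really is there in the HTML!

How to fix it? This is where find_all comes in. Iterating over a tag directly also yields the text nodes between its child tags, and those are plain strings whose find() is str.find, which is what raises the error above. Iterating over the result of find_all yields only the matching tags, so data can be extracted from each one.

Like this: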

# Get the single object
ul=soup.find('ul',attrs=)
# Get the collection
li_list=ul.find_all('li',attrs=)
for li in li_list:
    #print (li)
    name=li.find('a',attrs=).text
    print (name)

With this, the names are retrieved.
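
To see why the direct loop fails, it helps to look at what iterating over a tag actually yields. The sketch below assumes bs4 with html.parser and uses made-up HTML; with line breaks between the li elements, the children of ul include whitespace text nodes, which are str subclasses.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical HTML with newlines between the <li> elements.
html = """<ul>
<li><a>first</a></li>
<li><a>second</a></li>
</ul>"""
soup = BeautifulSoup(html, 'html.parser')
ul = soup.find('ul')

# Iterating a tag yields every direct child, including whitespace text nodes.
child_types = [type(child).__name__ for child in ul]

# Option 1: skip the text nodes and keep only real tags.
names_from_children = [child.find('a').text for child in ul if isinstance(child, Tag)]

# Option 2: let find_all return only the matching tags.
names_from_find_all = [li.find('a').text for li in ul.find_all('li')]
```

Both options give the same names; find_all is usually the simpler one because it also filters by tag name.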

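7. Searching by attribute versus searching directly

+0.658

Searching for it like this finds nothing:

time=li.find('h3',attrs=).text

Instead, search through the attributes one level at a time: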

sco=li.find('div',attrs=).h3.text
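
A small reproduction of this case, assuming the attribute sits on the enclosing div rather than on the h3 itself. The class name score is invented for the example, since the real attribute value is not shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the class is on the <div>, the <h3> has no attributes.
html = '<li><div class="score"><h3>+0.658</h3></div></li>'
li = BeautifulSoup(html, 'html.parser').find('li')

# Filtering the <h3> by the div's class matches nothing.
direct = li.find('h3', attrs={'class': 'score'})

# Find the <div> by its attribute first, then step down to its <h3>.
sco = li.find('div', attrs={'class': 'score'}).h3.text
```

The attribute filter only applies to the tag being searched for, so it has to be given at the level where the attribute actually appears.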
