First Steps with Selenium and PhantomJS

Selenium combined with PhantomJS can be used to scrape pages whose content is loaded via AJAX.

1. pip install selenium

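The home page of Wuhan University of Science and Technology has a block of content that is loaded asynchronously with JavaScript, as shown in the figure.

The approach to scraping this block has two steps: first determine whether the block has finished loading, then extract it with Selenium.

For the first step, the check is whether the link text '校企合作' ("school-enterprise cooperation") has appeared on the page.

(P.S. The sounder approach is to wait for whichever element inside the asynchronous content loads last, but in this example the elements have no other distinguishing features to select on.)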
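The idea of waiting for a marker to appear boils down to polling a condition until it holds or a timeout expires. The following is a minimal Python 3 sketch of that idea only, not Selenium's implementation: `wait_until`, `FakePage`, and their parameters are hypothetical names made up for illustration, and `FakePage` merely stands in for a page whose content shows up after a few checks.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f s" % timeout)
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page that finishes loading after N checks."""

    def __init__(self, ready_after_polls):
        self.polls = 0
        self.ready_after_polls = ready_after_polls

    def is_loaded(self):
        self.polls += 1
        return self.polls >= self.ready_after_polls


page = FakePage(ready_after_polls=3)
# Returns as soon as the third check succeeds instead of sleeping a fixed time.
wait_until(page.is_loaded, timeout=2.0, poll_interval=0.01)
```

In Selenium itself, `WebDriverWait(driver, 10).until(...)` plays the role of `wait_until`, and an expected condition such as `presence_of_element_located` plays the role of `is_loaded`.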

# coding: utf-8
# Written against the Selenium 2/3 API; Selenium 4 removed PhantomJS support.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# executable_path should point at the PhantomJS executable
driver = webdriver.PhantomJS(executable_path='c://python27//scripts')
driver.get("")  # the home page URL goes here

try:
    # Block for up to 10 seconds until a link containing '校企合作' is present
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, '校企合作')))
    ul = driver.find_element_by_id('infocont_137575764138965434_148645613741998292')
    lis = ul.find_elements_by_tag_name('li')
    if not lis:  # find_elements_* returns an empty list, never None
        print('lookup failed')
    for li in lis:
        text = li.find_element_by_tag_name('a').text
        if text != '':  # skip entries that are not displayed
            print('true:' + text)
finally:
    driver.quit()  # always shut down the PhantomJS process

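The script runs through these steps:

Wait until a link containing the string '校企合作' is present.

Find the ul tag whose id is infocont_137575764138965434_148645613741998292.

Find the li tags inside that ul.

Find the a tag inside each li and extract its text.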

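A few points worth noting:

Under Python 2, a source file containing non-ASCII characters needs an encoding declaration on its first line (the author ran into this on Windows).

WebDriverWait checks whether the content has actually loaded, which works better than a fixed time.sleep: it returns as soon as the condition holds and fails explicitly on timeout.

The asynchronous load may return more li tags than the page displays. They are visible in the element inspector but not rendered, so the script has to check text != ''.

The lookups here proceed one level at a time (ul, then li, then a) rather than jumping across levels in a single tag-name query.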
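The last two points can be tried out without a browser. The sketch below uses the standard library's `xml.etree.ElementTree` on a small static snippet, so it is only an analogy for the Selenium calls: the markup, the `demo-list` id, and the variable names are all hypothetical. It compares the level-by-level lookup with a single path query and then drops the entries that have no text. In Selenium 2/3 the rough counterpart of the single query would be `ul.find_elements_by_css_selector('li a')`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the asynchronously loaded list.
SAMPLE = """
<ul id="demo-list">
  <li><a href="/a">News item A</a></li>
  <li><a href="/b"></a></li>
  <li><a href="/c">News item C</a></li>
</ul>
""".strip()

ul = ET.fromstring(SAMPLE)

# Level by level: ul -> li -> a.
stepwise = [li.find("a").text for li in ul.findall("li")]

# One path query that crosses both levels at once.
direct = [a.text for a in ul.findall("li/a")]

# ElementTree reports an empty element's text as None, so filter on truthiness.
visible = [text for text in direct if text]
print(visible)
```

Both lookups return the same three entries, including the empty one, which is why the filtering step is needed regardless of how the a tags are located.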

Result of running the script:
