XPath
Common XPath rules
text = '''
'''
from lxml import etree
# Parse the HTML string, then select every element in the document
selector = etree.HTML(text)
result = selector.xpath('//*')
print(result)
Output:
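[<Element html at 0x...>, <Element body at 0x...>, <Element div at 0x...>, <Element ul at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>]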
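The sample markup assigned to text is missing from this copy of the post. The following is a minimal sketch of a document that is consistent with the outputs shown throughout (14 elements for //*, five links, two li elements with class item-0). The wrapper div and ul, the class names of the middle items, and the names sample_html and all_tags are assumptions, not the original sample. Printing tag names instead of Element objects makes the result easier to read.

```python
from lxml import etree

# Assumed reconstruction of the sample markup (hypothetical, not the original).
# The last <li> is left unclosed on purpose; etree.HTML repairs it while parsing.
sample_html = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
'''

selector = etree.HTML(sample_html)

# '//*' matches every element, including the html and body wrappers
# that the HTML parser adds around the fragment.
all_tags = [el.tag for el in selector.xpath('//*')]
print(len(all_tags))
print(all_tags)
```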
from lxml import etree
# Select every a element that is a direct child of an li element
selector = etree.HTML(text)
result = selector.xpath('//li/a')
print(result)
Output:
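[<Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>]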
from lxml import etree
# ".." selects the parent of each li element
selector = etree.HTML(text)
result = selector.xpath('//li/..')
print(result)
Output:
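As a minimal sketch of what the parent step returns, the snippet below runs the same query on a small hypothetical list (the markup and the names html, root, parent_tags are illustrative assumptions). Both li elements share one parent, and XPath node-sets contain no duplicates, so a single element comes back. The parent:: axis is the long form of "..".

```python
from lxml import etree

# Hypothetical two-item list used only for illustration
html = '<ul id="menu"><li><a href="a.html">A</a></li><li><a href="b.html">B</a></li></ul>'
root = etree.HTML(html)

# ".." steps from each li to its parent; the shared parent appears once
parents = root.xpath('//li/..')
parent_tags = [p.tag for p in parents]

# Equivalent query written with the explicit parent axis
parents_axis = root.xpath('//li/parent::*')
parent_ids = [p.get('id') for p in parents_axis]

print(parent_tags)
print(parent_ids)
```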
from lxml import etree
# Filter by attribute: li elements whose class is exactly "item-0"
selector = etree.HTML(text)
result = selector.xpath('//li[@class="item-0"]')
print(result)
Output:
[<Element li at 0x...>, <Element li at 0x...>]
from lxml import etree
selector = etree.HTML(text)
# text() directly under li versus text() under the nested a element
result1 = selector.xpath('//li[@class="item-0"]/text()')
result2 = selector.xpath('//li[@class="item-0"]/a/text()')
print(result1)
print(result2)
Output:
['\n ']
['first item', 'fifth item']
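Note: //li[@class="item-0"]/text() returns ['\n '] because "/" selects only direct children. The link text belongs to the nested a element, so the only text found directly under these li elements is whitespace, which is why result2 steps into a first.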
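The difference between direct and nested text can be sketched on a one-item list. The markup and variable names here are hypothetical. "/text()" looks only at text nodes that are children of li, "//text()" collects text from all descendants, and string() concatenates the descendant text into a single string.

```python
from lxml import etree

# Hypothetical markup: the visible text sits inside <a>, not directly in <li>
html = '<ul><li class="item-0"><a href="link1.html">first item</a></li></ul>'
root = etree.HTML(html)

# Direct text children of li: there are none in this markup
direct_text = root.xpath('//li[@class="item-0"]/text()')

# Text anywhere below li, at any depth
nested_text = root.xpath('//li[@class="item-0"]//text()')

# string() returns the concatenated text of the first matching li
joined_text = root.xpath('string(//li[@class="item-0"])')

print(direct_text)
print(nested_text)
print(joined_text)
```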
from lxml import etree
# "@href" returns the attribute value instead of the element
selector = etree.HTML(text)
result = selector.xpath('//li[@class="item-0"]/a/@href')
print(result)
Output:
['link1.html', 'link5.html']
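from lxml import etree
text1 = '''
first item
'''
selector = etree.HTML(text1)
# An exact match on @class fails when the attribute holds several values;
# contains() matches when the value includes the given substring
result1 = selector.xpath('//li[@class="li"]/a/text()')
result2 = selector.xpath('//li[contains(@class,"li")]/a/text()')
print(result1)
print(result2)
Output:
[]
['first item']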
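The markup of this sample is not shown above, so here is a minimal sketch with a hypothetical li that carries two class values. It also illustrates a caveat: contains() is a plain substring test, so "li" also matches a class such as "link". A common workaround, shown as the third query, pads the attribute with spaces and searches for the whole token. The markup and variable names are assumptions made for illustration.

```python
from lxml import etree

# Hypothetical markup: one li with two class values, one with a look-alike class
html = '''
<ul>
<li class="li li-first"><a href="link.html">first item</a></li>
<li class="link"><a href="other.html">other item</a></li>
</ul>
'''
root = etree.HTML(html)

# Exact comparison: neither "li li-first" nor "link" equals "li"
exact = root.xpath('//li[@class="li"]/a/text()')

# Substring test: both class values contain the letters "li"
partial = root.xpath('//li[contains(@class, "li")]/a/text()')

# Whole-token test: pad with spaces, then look for " li "
token = root.xpath(
    '//li[contains(concat(" ", normalize-space(@class), " "), " li ")]/a/text()'
)

print(exact)
print(partial)
print(token)
```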
from lxml import etree
text2 = '''
first item
'''
selector = etree.HTML(text2)
# Combine two attribute conditions with "and"
result = selector.xpath('//li[contains(@class,"li") and @name="item"]/a/text()')
print(result)
Output:
['first item']
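To make the effect of the operator visible, the sketch below uses a hypothetical three-item list in which only the first item satisfies both conditions. It compares "and" with "or" and with two chained predicates, which behave like "and" here. The markup and variable names are illustrative assumptions.

```python
from lxml import etree

# Hypothetical markup: only the first li matches both conditions
html = '''
<ul>
<li class="li li-first" name="item"><a href="link.html">first item</a></li>
<li class="li" name="other"><a href="link2.html">second item</a></li>
<li class="extra" name="item"><a href="link3.html">third item</a></li>
</ul>
'''
root = etree.HTML(html)

# Both conditions must hold
both = root.xpath('//li[contains(@class, "li") and @name="item"]/a/text()')

# Either condition is enough
either = root.xpath('//li[contains(@class, "li") or @name="item"]/a/text()')

# Chained predicates filter one after the other, like "and" in this case
chained = root.xpath('//li[contains(@class, "li")][@name="item"]/a/text()')

print(both)
print(either)
print(chained)
```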
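from lxml import etree
text = '''
'''
selector = etree.HTML(text)
# Positions are 1-based: li[1] is the first li
result1 = selector.xpath('//li[1]/a/text()')
print(result1)
# last() is the position of the last li
result2 = selector.xpath('//li[last()]/a/text()')
print(result2)
# position()<3 keeps the first two li elements
result3 = selector.xpath('//li[position()<3]/a/text()')
print(result3)
# last()-2 is the third li counted from the end
result4 = selector.xpath('//li[last()-2]/a/text()')
print(result4)
Output: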
['first item']
['fifth item']
['first item', 'second item']
['third item']
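One detail worth keeping in mind with positional predicates: in //li[1] the position is counted among the li children of each parent, not across the whole document. The sketch below uses two hypothetical lists to show the difference, and wraps the path in parentheses to index the combined result instead. The markup and variable names are assumptions for illustration.

```python
from lxml import etree

# Hypothetical markup with two separate lists
html = '''
<ul><li>a1</li><li>a2</li><li>a3</li></ul>
<ul><li>b1</li><li>b2</li></ul>
'''
root = etree.HTML(html)

# First / last li within each parent: one hit per list
first_per_list = root.xpath('//li[1]/text()')
last_per_list = root.xpath('//li[last()]/text()')

# Parentheses apply the position to the whole node-set
first_overall = root.xpath('(//li)[1]/text()')
last_overall = root.xpath('(//li)[last()]/text()')

print(first_per_list)
print(last_per_list)
print(first_overall)
print(last_overall)
```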
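from lxml import etree
text3 = '''
'''
selector = etree.HTML(text3)
# ancestor::* selects all ancestors of the first li
result1 = selector.xpath('//li[1]/ancestor::*')
print(result1)
# ancestor::div keeps only the div ancestors
result2 = selector.xpath('//li[1]/ancestor::div')
print(result2)
# attribute::* returns all attribute values of the first li
result3 = selector.xpath('//li[1]/attribute::*')
print(result3)
# child::a selects a elements that are direct children, filtered by href
result4 = selector.xpath('//child::a[@href="link1.html"]')
print(result4)
# descendant::span selects span elements at any depth below the first li
result5 = selector.xpath('//li[1]/descendant::span')
print(result5)
# following::*[2] selects the second element after the first li in document order
result6 = selector.xpath('//li[1]/following::*[2]')
print(result6)
# following-sibling::* selects all later siblings of the first li
result7 = selector.xpath('//li[1]/following-sibling::*')
print(result7)
Output: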
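[<Element html at 0x...>, <Element body at 0x...>, <Element div at 0x...>, <Element ul at 0x...>]
['item-0']
[<Element li at 0x...>, <Element li at 0x...>, <Element li at 0x...>, <Element li at 0x...>]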
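Since the markup of text3 is missing from this copy, the sketch below runs the same axis queries on a hypothetical three-item list and prints tag names or attribute values instead of Element objects. The markup, including the span inside the first link, is an assumption chosen so that every query returns something. It is not the original sample.

```python
from lxml import etree

# Hypothetical markup (assumed): the first link wraps its text in a span
html = '''
<div>
<ul>
<li class="item-0"><a href="link1.html"><span>first item</span></a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-1"><a href="link3.html">third item</a></li>
</ul>
</div>
'''
root = etree.HTML(html)

def tags(nodes):
    """Return the tag name of each element in a result list."""
    return [node.tag for node in nodes]

ancestors = tags(root.xpath('//li[1]/ancestor::*'))
div_ancestors = tags(root.xpath('//li[1]/ancestor::div'))
attributes = root.xpath('//li[1]/attribute::*')
link = tags(root.xpath('//child::a[@href="link1.html"]'))
spans = tags(root.xpath('//li[1]/descendant::span'))

# Elements after the first li in document order: li, a, li, a; [2] picks the first of those a elements
second_following = [el.get('href') for el in root.xpath('//li[1]/following::*[2]')]
siblings = tags(root.xpath('//li[1]/following-sibling::*'))

for name, value in [('ancestor::*', ancestors), ('ancestor::div', div_ancestors),
                    ('attribute::*', attributes), ('child::a', link),
                    ('descendant::span', spans), ('following::*[2]', second_following),
                    ('following-sibling::*', siblings)]:
    print(name, value)
```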