(Bug fix) C# Crawlers, Demystified

Fixed: Chinese text came back garbled when loading HTTPS URLs, which caused node parsing to fail.
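
The post does not show how the garbling was fixed. One plausible approach, sketched here purely as an assumption, is to tell HtmlWeb which encoding to use rather than letting it guess. OverrideEncoding is an existing HtmlWeb property; the URL and the choice of UTF-8 are placeholders, not details from the original fix.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

class EncodingExample
{
    static void Main()
    {
        // Placeholder URL (assumption): substitute the page that came back garbled.
        var url = "https://example.com/";

        var web = new HtmlWeb
        {
            // Assumption: the page is served as UTF-8.
            // Use whatever charset the page actually declares.
            OverrideEncoding = Encoding.UTF8
        };

        HtmlDocument doc = web.Load(url);

        // If the encoding is right, the title prints as readable text.
        Console.WriteLine(doc.DocumentNode.SelectSingleNode("//title")?.InnerText);
    }
}
```

If the text is still garbled, the page is probably not UTF-8, and the encoding it declares in its Content-Type header or meta tag is the one to pass instead.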

Loading HTML:

// From file: load HTML from a file
var docFromFile = new HtmlDocument();
docFromFile.Load(filePath);
// From string: load HTML from a string
var docFromString = new HtmlDocument();
docFromString.LoadHtml(html);
// From web: load HTML from a URL
var url = "";
var web = new HtmlWeb();
var doc = web.Load(url);

On the web object we can also set cookies, headers, and similar information to handle site-specific requirements, such as pages that require a login.
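
As a rough sketch of what that can look like: HtmlWeb exposes UserAgent, UseCookies, and a PreRequest callback that receives the outgoing HttpWebRequest. The URL, header value, and cookie name and value below are all hypothetical placeholders, and the details may differ between HtmlAgilityPack versions.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

class RequestSetupExample
{
    static void Main()
    {
        // Placeholder URL (assumption): a page that needs a login cookie.
        var url = "https://example.com/member";

        var web = new HtmlWeb
        {
            UseCookies = true,
            // Hypothetical user agent string.
            UserAgent = "Mozilla/5.0 (compatible; DemoCrawler/1.0)"
        };

        // PreRequest runs before the request is sent; returning true lets it proceed.
        web.PreRequest = request =>
        {
            // Extra header (hypothetical value).
            request.Headers.Add("Accept-Language", "zh-TW,zh;q=0.9");

            // Make sure a cookie container exists, then attach a session cookie.
            if (request.CookieContainer == null)
            {
                request.CookieContainer = new CookieContainer();
            }
            request.CookieContainer.Add(
                new Cookie("sessionid", "your-session-value", "/", "example.com"));

            return true;
        };

        HtmlDocument doc = web.Load(url);
        Console.WriteLine(doc.DocumentNode.OuterHtml.Length);
    }
}
```

The cookie value would normally be copied from a logged-in browser session or obtained from a login request made beforehand.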

Once you view its source, a web page is just a string. What a crawler does is search that string for the information we want and pick it out.

The traditional way to filter: regular expressions (too tedious, and a headache to write).

HtmlAgilityPack supports using XPath to locate the information we need.

Right-click an element on the page and choose Inspect.

With an XPath you can get exactly the element you want, along with all of its information.

Getting the selected element

var web = new HtmlWeb();
var doc = web.Load(url);
var htmlNode = doc?.DocumentNode?.SelectSingleNode("/html/body/header");

Getting element information

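var text = htmlNode.InnerText;
var innerHtml = htmlNode.InnerHtml;
// Get a value by attribute name, with a fallback if the attribute is missing
var src = htmlNode?.GetAttributeValue("src", "not found");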
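
To try these calls without hitting a real site, here is a small self-contained sketch that parses an inline HTML string. The markup is made up for illustration. Note that SelectNodes returns null rather than an empty collection when nothing matches, so the loop is guarded.

```csharp
using System;
using HtmlAgilityPack;

class XPathExample
{
    static void Main()
    {
        // Made-up markup for illustration only.
        var html = "<html><body><header><h1>Demo</h1></header>" +
                   "<img src=\"a.png\"><img alt=\"no source\"></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Single node by absolute XPath, as in the snippet above.
        var header = doc.DocumentNode.SelectSingleNode("/html/body/header");
        Console.WriteLine(header?.InnerText);

        // Multiple nodes: SelectNodes returns null when there is no match.
        var images = doc.DocumentNode.SelectNodes("//img");
        if (images != null)
        {
            foreach (var img in images)
            {
                // Falls back to "not found" when the src attribute is absent.
                Console.WriteLine(img.GetAttributeValue("src", "not found"));
            }
        }
    }
}
```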

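using HtmlAgilityPack;

/// <summary>
/// XPath helper extension methods
/// </summary>
public static class LoadHtmlHelper
{
    // Note: the method bodies are missing from the source;
    // the ones below are a minimal reconstruction.

    /// <summary>
    /// Extension method: get a single node
    /// </summary>
    /// <param name="htmlDocument">Document object</param>
    /// <param name="xpath">XPath path</param>
    /// <returns>The first matching node, or null</returns>
    public static HtmlNode GetSingleNode(this HtmlDocument htmlDocument, string xpath)
    {
        return htmlDocument?.DocumentNode?.SelectSingleNode(xpath);
    }

    /// <summary>
    /// Extension method: get multiple nodes
    /// </summary>
    /// <param name="htmlDocument">Document object</param>
    /// <param name="xpath">XPath path</param>
    /// <returns>The matching nodes, or null</returns>
    public static HtmlNodeCollection GetNodes(this HtmlDocument htmlDocument, string xpath)
    {
        return htmlDocument?.DocumentNode?.SelectNodes(xpath);
    }

    /// <summary>
    /// Extension method: get multiple nodes
    /// </summary>
    /// <param name="htmlNode">Node object</param>
    /// <param name="xpath">XPath path</param>
    /// <returns>The matching nodes, or null</returns>
    public static HtmlNodeCollection GetNodes(this HtmlNode htmlNode, string xpath)
    {
        return htmlNode?.SelectNodes(xpath);
    }

    /// <summary>
    /// Extension method: get a single node
    /// </summary>
    /// <param name="htmlNode">Node object</param>
    /// <param name="xpath">XPath path</param>
    /// <returns>The first matching node, or null</returns>
    public static HtmlNode GetSingleNode(this HtmlNode htmlNode, string xpath)
    {
        return htmlNode?.SelectSingleNode(xpath);
    }
}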

///
/// Address
/// File path
