Jsoup Crawler Task Summary

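For the past two weeks I have been doing crawler work nonstop, because the company needed a large amount of data scraped into the database so that materials could be displayed to users. It is now basically finished; only the data cleanup is left.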

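The company has a collector-management back-end project: you just package the crawler as a jar, import it there, and set the scheduling parameters.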

Some examples of using jsoup:

Parse an HTML document:

Document doc = Jsoup.parse(html);
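Once the HTML is parsed, elements are usually read with CSS selectors (`select`) or tag lookups (`getElementsByTag`). Below is a minimal sketch of that, assuming a made-up sample page; the class `ParseDemo`, the `extract` helper and the HTML are hypothetical, only the jsoup calls themselves are real.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseDemo {

    // hypothetical sample page
    static final String SAMPLE = "<html><head><title>Books</title></head><body>"
            + "<h2 class='name'>First Book</h2>"
            + "<a href='/download/1'>Download</a>"
            + "</body></html>";

    // returns {page title, text of the first h2.name, href of the first <a>}
    static String[] extract(String html) {
        // parse the string into a DOM tree; no network access is involved
        Document doc = Jsoup.parse(html);

        // CSS selector lookup; first() returns null if nothing matches
        Element name = doc.select("h2.name").first();

        // tag lookup returns an Elements list (array-like), so take the first entry
        Elements links = doc.getElementsByTag("a");

        return new String[] { doc.title(), name.text(), links.first().attr("href") };
    }

    public static void main(String[] args) {
        String[] r = extract(SAMPLE);
        System.out.println(String.join(" | ", r)); // Books | First Book | /download/1
    }
}
```

`text()` returns the visible text inside the element, while `attr()` returns the raw attribute value as written in the page.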
Load a Document from a URL:

Document doc = Jsoup.connect("url").get();
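In practice some sites block or throttle requests that do not look like they come from a browser, so a crawler often sets a User-Agent and a timeout before calling `get()`. A sketch under those assumptions; the UA string, the 10-second timeout, the class name `FetchDemo` and `https://example.com/` are placeholders, not values from the original project.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

public class FetchDemo {

    // Configure the request; nothing is sent over the network until get() is called.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)") // placeholder UA string
                .timeout(10_000); // request timeout in milliseconds
    }

    // get() performs an HTTP GET and parses the response body into a Document
    static Document fetch(String url) throws IOException {
        return buildRequest(url).get();
    }

    public static void main(String[] args) throws IOException {
        Document doc = fetch("https://example.com/"); // placeholder URL
        System.out.println(doc.title());
    }
}
```

Keeping the request set-up in one method means every page the crawler visits is fetched with the same headers and timeout.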
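A typical crawler example:

public void testaddsbkk88epubdata() {
    try { /* crawling logic omitted */ }
    catch (Exception e) { /* error handling omitted */ }
    finally { /* clean-up omitted */ }
}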
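The crawler example above only shows its skeleton, so here is a hedged sketch of what a list-page crawler with the same try/catch/finally shape might look like. The URL pattern, the `div.item a.title` selector and the names `ListCrawler`, `parseItems` and `crawl` are all hypothetical. A failing page is logged and skipped instead of aborting the run, and the thread sleeps between pages to avoid hammering the site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ListCrawler {

    // Extract {title, absolute link} pairs from one listing page.
    // "div.item a.title" is an assumed page structure.
    static List<String[]> parseItems(Document doc) {
        List<String[]> items = new ArrayList<>();
        for (Element a : doc.select("div.item a.title")) {
            // "abs:href" resolves relative links against the document's base URI
            items.add(new String[] { a.text(), a.attr("abs:href") });
        }
        return items;
    }

    // Fetch pages 1..maxPage, e.g. urlPattern = "https://example.com/list?page=%d" (hypothetical)
    static List<String[]> crawl(String urlPattern, int maxPage) {
        List<String[]> all = new ArrayList<>();
        for (int page = 1; page <= maxPage; page++) {
            String url = String.format(urlPattern, page);
            try {
                Document doc = Jsoup.connect(url).timeout(10_000).get();
                all.addAll(parseItems(doc));
                Thread.sleep(1000); // pause between requests
            } catch (IOException e) {
                // network error or HTTP error status: skip this page, keep going
                System.err.println("failed: " + url + " (" + e.getMessage() + ")");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } finally {
                System.out.println("page " + page + " done, " + all.size() + " items so far");
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // offline demo: parse an inline listing instead of fetching one
        String html = "<div class='item'><a class='title' href='/b/1'>Book One</a></div>"
                + "<div class='item'><a class='title' href='/b/2'>Book Two</a></div>";
        Document doc = Jsoup.parse(html, "https://example.com/list");
        for (String[] item : parseItems(doc)) {
            System.out.println(item[0] + " -> " + item[1]);
        }
    }
}
```

Splitting parsing from fetching also makes the selector logic easy to check against a saved HTML file before pointing the crawler at the live site.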

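An example of simulated login:

import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

private void login(String username, String password) {
    // findElement() throws NoSuchElementException instead of returning null,
    // so wait explicitly (up to 10 s) for each element to become clickable.
    // WebDriverWait(WebDriver, Duration) is the Selenium 4 constructor.
    WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));

    // switch to the phone-number login form
    WebElement phoneLogin = wait.until(ExpectedConditions.elementToBeClickable(
            By.xpath("//a[@class='do-phone-login']")));
    phoneLogin.click();

    // submit button of the phone login form
    WebElement lastLogin = wait.until(ExpectedConditions.elementToBeClickable(
            By.xpath("//a[@class='sure-btn sure-success phone-login-btn']")));

    // fill in the account (phone number) and password
    WebElement phoneInput = driver.findElement(By.xpath("//input[@name='phone' and @class='form-put']"));
    phoneInput.clear();
    phoneInput.sendKeys(username);

    WebElement passwdInput = driver.findElement(By.xpath("//input[@name='passwd' and @class='form-put']"));
    passwdInput.clear();
    passwdInput.sendKeys(password);

    lastLogin.click();
    log.info("Clicked login, logging in~~~~~~~~~");

    getPage();

    // return focus to main window
    // driver.switchTo().defaultContent();
}

// driver set-up elsewhere in the class (enclosing method not shown);
// Selenium 4 takes the Firefox profile through FirefoxOptions
this.driver = new FirefoxDriver(new FirefoxOptions().setProfile(firefoxProfile));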
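Waiting for a page element to appear before clicking it is really a polling loop: retry the lookup until it succeeds or a timeout runs out (Selenium's `WebDriverWait` is built around the same idea). Here is a JDK-only sketch of such a loop; `Poll.waitFor` is a hypothetical helper, not part of Selenium or jsoup.

```java
import java.time.Duration;
import java.util.function.Supplier;

public class Poll {

    // Call lookup repeatedly until it returns a non-null value or the timeout expires.
    // A RuntimeException from lookup (e.g. "element not found yet") counts as "not ready".
    static <T> T waitFor(Supplier<T> lookup, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException last = null;
        while (true) {
            try {
                T value = lookup.get();
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                last = e; // remember why the most recent lookup failed
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("condition not met within " + timeout, last);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // simulated element that only "appears" on the third lookup
        int[] calls = {0};
        String found = waitFor(() -> ++calls[0] >= 3 ? "button" : null,
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(found + " after " + calls[0] + " lookups"); // button after 3 lookups
    }
}
```

With Selenium it could wrap a lookup such as `Poll.waitFor(() -> driver.findElement(By.xpath("//a[@class='do-phone-login']")), Duration.ofSeconds(10), Duration.ofMillis(500))`; because `findElement` throws `NoSuchElementException` (a `RuntimeException`) while the element is missing, those failures are simply retried until the deadline.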
