Scraping Pengfu (捧腹網) jokes with Java

First, the result:

Preparation:

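/**
 * Establishes an HTTP connection and returns the response as a string.
 */
public static String connect(String address) {
    ...
    try { ... } catch (Exception e) { ... } finally {
        try { ... } catch (Exception e) { ... }
    }
    return stringBuffer.toString();
}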
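
Since only the outline of connect survives above, here is a minimal sketch of what such a helper might look like. It is not the original implementation: the class name PageFetcher, the 5-second timeouts, the UTF-8 charset, and the use of StringBuilder with try-with-resources (instead of StringBuffer and a manual finally block) are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is not from the original post.
class PageFetcher {

    /**
     * Fetches the page at the given address and returns its body as a string.
     * Returns an empty string if the address is invalid or the request fails.
     */
    public static String connect(String address) {
        StringBuilder body = new StringBuilder();
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setConnectTimeout(5000); // assumed timeout, in milliseconds
            conn.setReadTimeout(5000);    // assumed timeout, in milliseconds
            // try-with-resources closes the reader and the underlying stream
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            // Covers malformed URLs as well as network errors
            e.printStackTrace();
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
        return body.toString();
    }
}
```

Returning an empty string on failure keeps the caller simple, at the cost of hiding the reason for the failure; a stricter variant could rethrow the exception instead.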

/**
 * Writes the content to a file on disk.
 * @param allText the text to write
 */
private static void writeToFile(String allText) {
    ...
    try {
        ...
        if (!targetFile.exists()) { ... }
        bos = new BufferedOutputStream(new FileOutputStream(targetFile, true));
        bos.write(allText.getBytes());
    } catch (IOException e) { ... } finally {
        ...
    }
    System.out.println("Finished writing.");
}
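
A complete version of this file-writing step could look roughly like the sketch below. It is an illustration rather than the original code: the class name FileSink, the extra targetFile parameter (the original target path is not shown), the boolean return value, and the explicit UTF-8 encoding are assumptions.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is not from the original post.
class FileSink {

    /**
     * Appends the given text to targetFile, creating the file if it is missing.
     * Returns true on success and false if an I/O error occurred.
     */
    static boolean writeToFile(File targetFile, String allText) {
        // FileOutputStream in append mode creates the file when it does not exist,
        // and try-with-resources flushes and closes the stream for us.
        try (BufferedOutputStream bos = new BufferedOutputStream(
                new FileOutputStream(targetFile, true))) {
            bos.write(allText.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}
```

Passing an explicit charset avoids depending on the platform default, which matters when the scraped text contains Chinese characters.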

Add the jsoup jar (used to parse the DOM):

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.2</version>
</dependency>

Analysis: Pengfu jokes

First, locate the content we need (author, title, and body).

Inspect the page elements; here I am inspecting the title tag:
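
As a rough illustration of how an inspected structure translates into jsoup calls, the sketch below parses a small hand-written HTML fragment. The markup and the selectors (div.joke, span.author, h1.title, div.content) are assumptions made up for this example, not Pengfu's actual page structure; substitute whatever the browser's element inspector shows. The class name JokeParser is likewise hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical parser class; selectors below are placeholders, not the real site's.
class JokeParser {

    /** Extracts author, title, and body text for every joke item in the HTML. */
    static String extract(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder out = new StringBuilder();
        // Each hypothetical "div.joke" element is assumed to wrap one joke
        for (Element item : doc.select("div.joke")) {
            String author = item.select("span.author").text();
            String title = item.select("h1.title").text();
            String body = item.select("div.content").text();
            out.append("Author: ").append(author).append('\n')
               .append("Title: ").append(title).append('\n')
               .append("Body: ").append(body).append("\n\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-written sample markup standing in for a fetched page
        String html = "<div class=\"joke\">"
                + "<span class=\"author\">Alice</span>"
                + "<h1 class=\"title\">A title</h1>"
                + "<div class=\"content\">Some text</div>"
                + "</div>";
        System.out.print(extract(html));
    }
}
```

Selecting inside each item, rather than across the whole document, keeps each author, title, and body together even if one of them is missing from a particular joke.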

Once the structure is known, we can extract the content we want:

public static void main(String[] args) {
    ...
    for (...) {
        ...
        System.out.println("Finished scraping page " + i + ".");
    }
    // Write the content to disk
    Test.writeToFile(allText.toString());
}
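
The overall flow of main (loop over pages, fetch each one, extract the text, accumulate it) can be sketched as follows. This is an illustration under stated assumptions, not the original loop: the class name PageLoop, the crawl method, the URL template, and the page count are all hypothetical, and the fetch and extract steps are passed in as functions so the sketch runs without network access.

```java
import java.util.function.Function;

// Hypothetical driver class; names and the URL template are placeholders.
class PageLoop {

    /**
     * Fetches pages 1..pageCount and concatenates whatever the extractor
     * returns for each page. urlTemplate must contain one %d for the page number.
     */
    static String crawl(String urlTemplate, int pageCount,
                        Function<String, String> fetch,
                        Function<String, String> extract) {
        StringBuilder allText = new StringBuilder();
        for (int i = 1; i <= pageCount; i++) {
            String url = String.format(urlTemplate, i);
            String html = fetch.apply(url);
            allText.append(extract.apply(html));
            System.out.println("Finished scraping page " + i + ".");
        }
        return allText.toString();
    }

    public static void main(String[] args) {
        // Stand-in fetcher and extractor; a real run would pass an HTTP fetcher
        // and a jsoup-based extractor instead.
        String result = crawl("https://example.com/page_%d.html", 3,
                url -> "<html>" + url + "</html>",
                html -> html.length() + " chars\n");
        System.out.print(result);
    }
}
```

Accumulating everything in memory and writing once at the end mirrors the structure above; for a large number of pages, writing after each page would bound memory use and preserve partial results if a later request fails.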
