import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    # Request headers; the dict was stripped from the original post.
    # Typically it carries a browser User-Agent string for UA spoofing.
    headers = {}
    # Crawl the index page of the novel
    url = ''  # the index-page URL was stripped from the original post
    page_text = requests.get(url=url, headers=headers).text
    # Parse the chapter titles and detail-page URLs out of the index page
    # 1. Instantiate a BeautifulSoup object and load the page source into it
    soup = BeautifulSoup(page_text, 'lxml')
    # Select every chapter entry in the table of contents
    li_list = soup.select('.book-mulu > ul > li')
    with open('./sanguoyanyi.txt', 'w', encoding='utf-8') as fp:
        for li in li_list:
            title = li.a.string
            # Site domain prefix (also stripped from the original post) + relative chapter link
            detail_url = '' + li.a['href']
            # Request the detail page and parse out its content
            detail_page_text = requests.get(url=detail_url, headers=headers).text
            # Parse the chapter text out of the detail page
            detail_soup = BeautifulSoup(detail_page_text, 'lxml')
            div_tag = detail_soup.find('div', class_='chapter_content')
            content = div_tag.text
            fp.write(title + ':' + content + '\n')
            print(title + ' crawled successfully!!')
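Note that requests can mis-decode the Chinese chapter text when the site omits or mislabels its charset, and fetching every chapter back-to-back with no pause is impolite to the server. Below is a minimal sketch of a more defensive fetch helper, assuming the same headers dict as above; the function name fetch_html and the 0.5-second delay are illustrative choices, not from the original post.

import time
import requests

def fetch_html(url, headers, delay=0.5):
    # Fetch a page, fail loudly on HTTP errors, and let requests guess the
    # encoding so Chinese chapter text is not garbled.
    resp = requests.get(url=url, headers=headers)
    resp.raise_for_status()                 # raise on 4xx/5xx responses
    resp.encoding = resp.apparent_encoding  # guess GBK/UTF-8 from the body
    time.sleep(delay)                       # be polite between chapter requests
    return resp.text

Inside the loop, detail_page_text = fetch_html(detail_url, headers) would then replace the bare requests.get(...).text call.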