import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    # Request headers; the dict was stripped from the original post.
    # Typically it carries a browser User-Agent string for UA spoofing.
    headers = {}
    # Crawl the index page of the novel
    url = ''  # the index-page URL was stripped from the original post
    page_text = requests.get(url=url, headers=headers).text
    # Parse the chapter titles and detail-page URLs out of the index page
    # 1. Instantiate a BeautifulSoup object and load the page source into it
    soup = BeautifulSoup(page_text, 'lxml')
    # Select every chapter entry in the table of contents
    li_list = soup.select('.book-mulu > ul > li')
    with open('./sanguoyanyi.txt', 'w', encoding='utf-8') as fp:
        for li in li_list:
            title = li.a.string
            # Site domain prefix (also stripped from the original post) + relative chapter link
            detail_url = '' + li.a['href']
            # Request the detail page and parse out its content
            detail_page_text = requests.get(url=detail_url, headers=headers).text
            # Parse the chapter text out of the detail page
            detail_soup = BeautifulSoup(detail_page_text, 'lxml')
            div_tag = detail_soup.find('div', class_='chapter_content')
            content = div_tag.text
            fp.write(title + ':' + content + '\n')
            print(title + ' crawled successfully!!')
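Note that requests can mis-decode the Chinese chapter text when the site omits or mislabels its charset, and fetching every chapter back-to-back with no pause is impolite to the server. Below is a minimal sketch of a more defensive fetch helper, assuming the same headers dict as above; the function name fetch_html and the 0.5-second delay are illustrative choices, not from the original post.

import time
import requests

def fetch_html(url, headers, delay=0.5):
    # Fetch a page, fail loudly on HTTP errors, and let requests guess the
    # encoding so Chinese chapter text is not garbled.
    resp = requests.get(url=url, headers=headers)
    resp.raise_for_status()                 # raise on 4xx/5xx responses
    resp.encoding = resp.apparent_encoding  # guess GBK/UTF-8 from the body
    time.sleep(delay)                       # be polite between chapter requests
    return resp.text

Inside the loop, detail_page_text = fetch_html(detail_url, headers) would then replace the bare requests.get(...).text call.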