Scraping fushuwang novels to txt

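import re
import requests
from bs4 import BeautifulSoup  # HTML parsing and data extraction


def askurl(url):
    # Only fragments of this function survive in the source; the def line,
    # the retry assignment target, and the final request are reconstructed.
    requests.adapters.DEFAULT_RETRIES = 5  # increase the number of reconnection attempts
    s = requests.session()
    s.keep_alive = False  # close surplus connections
    return s.get(url)


def getdata(i):  # name and signature assumed; the def line is missing from the source
    # The first page's URL does not follow the pattern, so it is scraped separately
    url = ''.format(i)  # the URL template is blank in the source
    html = askurl(url)
    bs = BeautifulSoup(html.text, "html.parser")
    data = []  # the initial value is missing from the source; an empty list is assumed
    for item in bs.find_all('div'):
        findlink = re.compile(r'(.*?)')  # pattern as it survives; any surrounding tags were lost
        link = re.findall(findlink, str(item))
        if len(link) != 0:
            link = link[0].replace("\u3000", "")  # remove full-width spaces
            link = "".join(link.split())  # remove the remaining whitespace
            data.append(link)  # assumed: the step that collects the text is missing
            # print(link)
        else:
            continue
    # print(txt)
    return data  # the source returns txt, which is not defined in the surviving code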


def write_data(data):
    # data is the result produced above; convert it to a string before writing
    txt = str(data).replace('[', '').replace(']', '').replace("'", "")
    with open('存放結果.txt', 'a', encoding='utf-8') as file_handle:
        # the .txt file does not need to exist beforehand; it is created automatically
        file_handle.write(txt)  # write; the with block closes the file


def change():
    # Put a line break before and after each "第N章" (chapter N) heading
    keyword = re.compile(r'第[1-9]\d*章')
    sep = '\r\n'  # renamed from str so the builtin is not shadowed
    with open('存放結果.txt', 'r', encoding='utf-8') as file:
        content = file.read()
    # Walk the matches from last to first so the earlier offsets stay valid
    for post in reversed(list(keyword.finditer(content))):
        # print(post.group())
        content = content[:post.start()] + sep + post.group() + sep + content[post.end():]
    with open(r'上.txt', 'w', encoding='utf-8') as file:
        file.write(content)


if __name__ == '__main__':
    main()  # main() itself is not included in the surviving source
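
As a hedged sketch of how the last two steps could be written without the string replacement in write_data and the index bookkeeping in change: join the collected pieces directly, and let re.sub add the line breaks in one pass. The helper names join_chunks and break_at_chapters are introduced here for illustration and are not from the original script; data is assumed to be a list of strings.

```python
import re

# Same chapter pattern as change(): "第", a number without a leading zero, "章"
CHAPTER = re.compile(r'第[1-9]\d*章')


def join_chunks(data):
    """Concatenate scraped pieces; assumes data is a list of str (hypothetical helper)."""
    return "".join(data)


def break_at_chapters(content, sep="\r\n"):
    """Surround every chapter heading with sep in a single pass (hypothetical helper)."""
    return CHAPTER.sub(lambda m: sep + m.group() + sep, content)


if __name__ == "__main__":
    text = join_chunks(["第1章", "他說'好'", "第12章", "[完]"])
    print(repr(break_at_chapters(text, sep="\n")))
```

Joining keeps any brackets and quotes that belong to the novel text, which str(data).replace(...) would strip along with the list punctuation.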

References:

1. Increasing the retry count and closing connections in askurl

Python: about the "Max retries exceeded with url" error
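
One commonly used way to get retries in requests is to mount an HTTPAdapter that carries a urllib3 Retry policy, and to send a Connection: close header so sockets are not kept alive. The sketch below assumes that approach; make_session, its default values, and the timeout are illustrative choices, not part of the original askurl. As far as I can tell, assigning requests.adapters.DEFAULT_RETRIES after import and setting keep_alive on a session have no effect in current requests releases, which is why the adapter is mounted explicitly here.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=5, backoff=0.5):
    """Build a Session that retries failed requests (hypothetical helper)."""
    policy = Retry(
        total=retries,                          # overall retry budget
        backoff_factor=backoff,                 # wait grows between attempts
        status_forcelist=[500, 502, 503, 504],  # also retry on these statuses
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=policy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    session.headers["Connection"] = "close"     # ask the server not to keep the socket open
    return session


def askurl(url, session=None, timeout=10):
    """Fetch url and return the Response; the timeout value is an assumption."""
    session = session or make_session()
    return session.get(url, timeout=timeout)
```

Creating the session once and passing it to every askurl call avoids rebuilding the adapters for each page.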

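2. Adding line breaks at each "Chapter N" position in change

Inserting a string at a specified position in a file

Insert the string str after keyword in file a: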

file = open('a', 'r')
content = file.read()
file.close()
post = content.find(keyword)  # index of the first occurrence, or -1
if post != -1:
    content = content[:post + len(keyword)] + str + content[post + len(keyword):]
    file = open('a', 'w')
    file.write(content)
    file.close()

Here content[:post] is the text before keyword, and content[post:] is the text from keyword onward, keyword included. So to insert str after keyword, use content[:post+len(keyword)] and content[post+len(keyword):].
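
The slicing rule can be checked on an in-memory string before touching a file. This is a small sketch; insert_after is a name introduced here for illustration, and it handles only the first occurrence of keyword, as str.find does.

```python
def insert_after(content, keyword, text):
    """Return content with text inserted right after the first keyword (hypothetical helper)."""
    post = content.find(keyword)   # index where keyword starts, or -1
    if post == -1:
        return content             # keyword absent: leave content unchanged
    cut = post + len(keyword)      # first index after keyword
    return content[:cut] + text + content[cut:]


content = "abc-KEY-xyz"
post = content.find("KEY")
print(content[:post])                         # abc-
print(content[post:])                         # KEY-xyz
print(insert_after(content, "KEY", "<new>"))  # abc-KEY<new>-xyz
```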
