Multithreaded crawling of the Qiushi site with Python 3

This crawler uses multiple threads, and the threads exchange data through the standard-library queue module.
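As a rough illustration of how a queue passes data between threads, here is a minimal sketch. The names `producer`, `consumer`, and `run_demo` are made up for this example, and it stops the consumer with a `None` sentinel, which is a simplification and not the exit-flag approach used in the calling function of this post.

```python
import threading
from queue import Queue


def producer(q, items):
    """Put every item on the queue, then a None sentinel to signal the end."""
    for item in items:
        q.put(item)
    q.put(None)


def consumer(q, results):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = q.get()  # blocks until something is available
        if item is None:
            break
        results.append(item * 2)


def run_demo():
    q = Queue()
    results = []
    t1 = threading.Thread(target=producer, args=(q, [1, 2, 3]))
    t2 = threading.Thread(target=consumer, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

`Queue` does its own locking, so the two threads need no extra lock just to hand items over.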

2. Create the class that crawls the pages
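The crawl class itself is not reproduced in this post, so what follows is only a sketch of what it might look like, matching the constructor arguments `(threadname, pagequeue, dataqueue)` used by the calling function. `BASE_URL`, the `fetch` helper, the User-Agent string, and the timeouts are all assumptions for illustration, not the original code.

```python
import threading
import urllib.request
from queue import Queue, Empty

# Exit flag; the main thread sets it to True once the page queue is drained
cram_exit = False

# Hypothetical URL template; replace it with the real listing URL of the site
BASE_URL = "https://example.com/page/{}/"


class ThreadCrawl(threading.Thread):
    def __init__(self, threadname, pagequeue, dataqueue):
        super().__init__()
        self.threadname = threadname
        self.pagequeue = pagequeue   # page numbers waiting to be fetched
        self.dataqueue = dataqueue   # raw HTML handed to the parse threads
        self.headers = {"User-Agent": "Mozilla/5.0"}  # assumed header

    def fetch(self, page):
        """Download one page and return its HTML as text."""
        request = urllib.request.Request(BASE_URL.format(page),
                                         headers=self.headers)
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")

    def run(self):
        while not cram_exit:
            try:
                # A short timeout lets the loop re-check the exit flag
                page = self.pagequeue.get(timeout=0.1)
            except Empty:
                continue
            try:
                self.dataqueue.put(self.fetch(page))
            except OSError as exc:  # URLError and HTTPError derive from OSError
                print(f"{self.threadname}: page {page} failed: {exc}")
```

Keeping the download in its own `fetch` method makes it easy to swap in a stub when trying the thread logic without network access.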

3. Create the class that processes the data
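The processing class is also not shown here. The sketch below is one possible shape for it, matching the constructor arguments `(threadname, dataqueue, filename, lock)` used by the calling function. The `parse` method is a placeholder that only pulls out the page title with a regular expression, and writing one JSON object per line is an assumption; the real fields depend on the site's markup.

```python
import json
import re
import threading
from queue import Queue, Empty

# Exit flag; the main thread sets it to True once the data queue is drained
parse_exit = False


class ThreadParse(threading.Thread):
    def __init__(self, threadname, dataqueue, filename, lock):
        super().__init__()
        self.threadname = threadname
        self.dataqueue = dataqueue  # raw HTML produced by the crawl threads
        self.filename = filename    # an already opened file object
        self.lock = lock            # shared lock guarding writes to the file

    def parse(self, html):
        """Placeholder extraction: return only the page title, if any."""
        match = re.search(r"<title>(.*?)</title>", html, re.S)
        return {"title": match.group(1).strip() if match else None}

    def run(self):
        while not parse_exit:
            try:
                # A short timeout lets the loop re-check the exit flag
                html = self.dataqueue.get(timeout=0.1)
            except Empty:
                continue
            record = json.dumps(self.parse(html), ensure_ascii=False)
            # Only one thread may write to the shared file at a time
            with self.lock:
                self.filename.write(record + "\n")
```

The lock matters because several parse threads share a single file object; without it, records from different threads could interleave.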

4. Create the calling function

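import threading
from queue import Queue

# Exit flags polled by the crawl threads and the parse threads
cram_exit = False
parse_exit = False


def main():
    global cram_exit, parse_exit

    # Queue of page numbers 1 to 20
    pagequeue = Queue(20)
    for i in range(1, 21):
        pagequeue.put(i)

    # Queue holding the crawled results
    dataqueue = Queue()
    filename = open("e://file/qiushi2.json", "a")
    # Create the lock that guards writes to the file
    lock = threading.Lock()

    # Names of the three crawl threads
    crawlist = ['Thread 1', 'Thread 2', 'Thread 3']
    # Keeps the three crawl threads so they can be joined later
    threadcrawl = []
    for threadname in crawlist:
        # ThreadCrawl is the crawl class from step 2
        thread = ThreadCrawl(threadname, pagequeue, dataqueue)
        thread.start()
        threadcrawl.append(thread)

    # Names of the three parse threads
    parselist = ["Parse thread 1", "Parse thread 2", "Parse thread 3"]
    # Keeps the three parse threads
    threadparse = []
    for threadname in parselist:
        # ThreadParse is the processing class from step 3
        thread = ThreadParse(threadname, dataqueue, filename, lock)
        thread.start()
        threadparse.append(thread)

    # Wait until every page number has been taken, then stop the crawl threads
    while not pagequeue.empty():
        pass
    cram_exit = True
    print('pagequeue is empty')
    for thread in threadcrawl:
        thread.join()
    print('1')

    # Wait until every result has been taken, then stop the parse threads
    while not dataqueue.empty():
        pass
    parse_exit = True
    for thread in threadparse:
        thread.join()
    print('2')

    with lock:
        # Close the file
        filename.close()
    print("Thanks for using!")


if __name__ == "__main__":
    main()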
