Implementing a simple Python crawler program

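A crawler, also called a web spider or web robot, is a program or script that automatically fetches information from the World Wide Web according to a set of rules.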
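To make "according to a set of rules" concrete, here is a minimal sketch of one common rule: collect the links found on a page so they can be visited next. It uses only the standard library. The names `LinkCollector` and `extract_links` are made up for this illustration and are not part of the original program.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

A real crawler would add the returned links to a queue, skip the ones it has already visited, and fetch the rest.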

HTTP (Hypertext Transfer Protocol)

HTTPS (Hypertext Transfer Protocol Secure)
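For a crawler, the visible difference between the two protocols is the URL scheme: `http` connections are unencrypted and default to port 80, while `https` connections run over TLS and default to port 443. The sketch below reads this off a URL with the standard library. The function name `describe_url` and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Return the scheme, host, effective port and encryption flag of a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # An explicit port in the URL overrides the scheme's default.
    port = parts.port or DEFAULT_PORTS[scheme]
    return {
        "scheme": scheme,
        "host": parts.hostname,
        "port": port,
        "encrypted": scheme == "https",
    }
```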

Usage (basic flow):
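One step of this flow is attaching the search parameters to the URL of a GET request. As a rough illustration of what passing a dictionary through `params=` amounts to, the sketch below builds the query string by hand with the standard library. The helper name `build_search_url`, the example URL, and the parameter name `q` are all hypothetical and are not taken from the original program.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    """Append params to base_url as a URL-encoded query string."""
    query = urlencode(params)
    if not query:
        return base_url
    # Use "&" if the base URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"
```

With `requests`, the same dictionary is simply handed to `requests.get(url, params=...)` and the library performs an equivalent encoding.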

Implementing web page collection

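# coding=utf-8
# author : 安城 ance
# Web page collection with requests
# time : 2021/1/18 22:54
import requests

# Prepare the URL (the value is empty in the source; set it to the search endpoint you target)
url = ""
# Prepare the request headers, used to imitate a browser request
# (the original header values were not preserved in the source)
headers = {}
# Read the search term from the keyboard
content = input('Search term: ')
# Query parameters (the original keys were not preserved in the source;
# presumably they carry the search term stored in content)
param = {}
# GET request carrying the packaged parameters and the request headers
response = requests.get(url=url, params=param, headers=headers)
# Get the text of the response
page_text = response.text
# Print it to check whether any data was fetched
print(page_text)
# Write the text to an HTML file
filename = content + '.html'
with open(filename, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print(filename, 'saved successfully')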

After a successful crawl, an HTML file named after the search term is created in the current project directory.
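Because the file name is built directly from the search term, a term containing characters such as `/` or `?` can make the write fail or land in an unexpected directory. The following sketch shows one possible way to guard against that. The helpers `safe_filename` and `save_page`, and the fallback name `page`, are assumptions for illustration and are not part of the original program.

```python
import re
from pathlib import Path


def safe_filename(keyword, suffix=".html"):
    """Turn a search term into a file name that is safe on common filesystems."""
    # Replace path separators, reserved characters and whitespace with "_".
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", keyword).strip("_.")
    # Fall back to a fixed name if nothing usable is left.
    return (cleaned or "page") + suffix


def save_page(page_text, keyword, directory="."):
    """Write page_text as UTF-8 to directory and return the resulting path."""
    path = Path(directory) / safe_filename(keyword)
    path.write_text(page_text, encoding="utf-8")
    return path
```

In the program above, `save_page(page_text, content)` could stand in for the final `with open(...)` block.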

