The Most Basic Web Crawler

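# 1. Import the module
from urllib import request
# 2. Usage
# (1) Define the target URL
base_url = ""  # set this to the page to fetch; an empty string makes Request raise ValueError
# Request headers
headers = {}  # left empty here; put header name/value pairs in this dict
req = request.Request(base_url, headers=headers)  # build a Request object that carries headers
# Notes on the Request arguments:
# a. url: the address to request.
# b. data (default None): data submitted along with the URL (for example, data to POST); when it is set, the HTTP request method changes from GET to POST.
# c. headers (default empty): a dict of key-value pairs to send as HTTP headers.
# c.1 User-Agent: identifies the browser.
# History: in the Netscape vs. IE browser war Netscape lost, and its programmers moved on to Mozilla (which went open source).
# Add more header information
req.add_header("Connection", "keep-alive")
# Read a header back
print(req.get_header("Connection"))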
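
The notes on data and headers above can be checked without sending anything over the network. The following is a minimal sketch, not the original author's code: the URL, the form fields, and the User-Agent value are made-up placeholders. It shows that passing data switches the method to POST, and that add_header stores the header name with only its first letter capitalized, which matters when reading it back with get_header.

```python
from urllib import parse, request

# Hypothetical URL and form fields, used only for illustration.
url = "http://example.com/login"
form = {"user": "alice", "page": "1"}

# Without data, the request method defaults to GET.
get_req = request.Request(url)

# data must be bytes, so the form is URL-encoded and then encoded to bytes.
post_req = request.Request(url, data=parse.urlencode(form).encode("utf-8"))

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST

# add_header stores the name via str.capitalize(), so "User-Agent"
# is kept as "User-agent" and must be looked up under that spelling.
post_req.add_header("User-Agent", "Mozilla/5.0")
print(post_req.get_header("User-agent"))  # Mozilla/5.0
```

Nothing is sent until the Request object is passed to request.urlopen, so this kind of inspection is a cheap way to confirm what a crawler is about to send.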

# Use urllib to request a page of the langlang2017 site and save it
# 1. Import the modules
from urllib import request
from urllib import error
# 2. Usage
# (1) Set the URL
base_url = ""
try:
    # (2) Request the URL (timeout is in seconds)
    response = request.urlopen(base_url, timeout=0.02)
    # (3) Read the content
    html = response.read()
    # (4) Decode the bytes into text
    html = html.decode("utf-8")
    # (5) Save to a file
    with open("route.html", "w", encoding="utf-8") as f:
        f.write(html)
except error.URLError as e:
    print(e)
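
The request, read, decode, and save steps above could also be wrapped in a reusable function. This is one possible sketch under stated assumptions: the function name fetch_and_save, its parameters, and the 5-second default timeout are hypothetical choices, not part of the original post. It catches HTTPError separately from URLError, and also catches socket.timeout, which can be raised while reading the body and is not always wrapped in URLError.

```python
import socket
from urllib import error, request


def fetch_and_save(url, path, timeout=5.0):
    """Download url, decode it as UTF-8, and write it to path.

    Hypothetical helper for illustration; returns True on success.
    """
    try:
        with request.urlopen(url, timeout=timeout) as response:
            html = response.read().decode("utf-8")
    except error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print("server returned an error status:", e.code)
        return False
    except error.URLError as e:
        # Covers connection failures such as DNS errors or refused connections.
        print("could not reach the server:", e.reason)
        return False
    except socket.timeout:
        # A timeout while reading the body may surface here instead.
        print("timed out after", timeout, "seconds")
        return False

    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return True
```

A call such as fetch_and_save(base_url, "route.html") would then replace the try block, with base_url set to a real address.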
