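Python 3 Web Crawler Study Notes

Crawling a phone product page on JD.com

BeautifulSoup

The original notes were too long, so they have been excerpted and organized by topic here.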

pip3 install jieba

kou@ubuntu:~/python$ cat clahamlet.py 
```python
#!/usr/bin/env python
# coding=utf-8
# e10.1calhamlet.py
def gettext():
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")   # replace special characters in the text with spaces
    return txt


hamlettxt = gettext()
words = hamlettxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1    # count occurrences of each word
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, highest first
for word, count in items[:10]:                # print the 10 most frequent words
    print("{0:<10}{1:>5}".format(word, count))
```
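The counting part of the script (the `counts` dictionary followed by `items.sort`) can also be written with `collections.Counter` from the standard library. The sketch below is an alternative, not the course's original code; `top_words` is a hypothetical helper name, and it applies the same punctuation-stripping idea as `gettext()` to a string instead of reading `hamlet.txt` itself.

```python
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    text = text.lower()
    # Same idea as gettext(): turn punctuation into spaces before splitting.
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, " ")
    # Counter tallies the words; most_common returns them sorted by count.
    return Counter(text.split()).most_common(n)
```

Calling `top_words(text)` with the contents of `hamlet.txt` should give the same top ten as the script, since both keep words with equal counts in the order they first appear.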

The learning resource is the web crawler course on China University MOOC (中國大學MOOC), taught by Professor Song Tian (嵩天).

Below are a few simple **! Once you are familiar with writing these **, you can basically handle most needs!

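r.raise_for_status()  # raises requests.HTTPError if the status code is an error (4xx or 5xx)

print(r.text[:1000])  # prints only the first 1000 characters of the page, not the first 1000 lines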
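The two lines above are usually combined into a small fetch helper. The following is a minimal sketch assuming the `requests` library is installed (`pip3 install requests`); the function name `get_html_text` and the choice to return an empty string on failure are illustrative assumptions, not a fixed API.

```python
import requests


def get_html_text(url, timeout=30):
    """Fetch a page and return its text, or "" if the request fails (hypothetical helper)."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()               # raises requests.HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding   # guess the charset from the body, not only the headers
        return r.text
    except requests.RequestException:      # bad URLs, connection errors, timeouts, HTTPError
        return ""
```

For example, `print(get_html_text("https://example.com/")[:1000])` prints the first 1000 characters of a page (the URL is a placeholder). Setting `r.encoding = r.apparent_encoding` helps with sites whose headers omit or misstate the charset, at the cost of scanning the response body.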

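Python 3 Crawler Notes

A crawler is an automated program that requests web pages and extracts data from them. Its workflow is: send a request, get the response content, parse the text, and store the data. 1. The browser sends information to the server hosting the URL; this step is called an HTTP request. 2. After receiving the message, the server processes it according to what the browser sent and returns a message to the browser; this step is called an HTTP response. ...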
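The four steps above (request, response, parse, store) can be sketched with only the standard library. This is a sketch under stated assumptions: it uses `urllib.request` instead of `requests` and `html.parser` in place of BeautifulSoup so it has no dependencies, and the parse step simply collects link targets as example data. All function and class names are hypothetical.

```python
import csv
import urllib.request
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Parse step: collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url, timeout=10):
    """Request/response steps: send an HTTP request and return the decoded body."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def parse_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def save_links(links, path):
    """Store step: write one link per row to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["href"])
        for link in links:
            writer.writerow([link])
```

A typical run would be `save_links(parse_links(fetch("https://example.com/")), "links.csv")`, where the URL and file name are placeholders.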

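Python 3 Crawler Study Notes 0.0: Overview

Thanks for stopping by. Some time ago I studied Python crawlers in bits and pieces, and now I am organizing what I learned. My knowledge is limited and mistakes are inevitable, so corrections are welcome. Thank you. This blog series is based on Cui Qingcai's blog 靜覓 (Jingmi), used with permission. Python version: 3.5. Prerequisites: Python basics, the HTTP protocol, regular expressions, URL...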

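Python 3 Crawler Study Notes 1.0: What Is a Crawler?

Let's consider a question: what is a crawler? Here is my understanding. Before understanding crawlers, let's think about what kind of thing the web is. A crawler is an automated program that roams this web, browsing it automatically and collecting all the content it visits, so that you can obtain the information you need from the web. See also: web spider / crawler (Wikipedia). By the way, Google is the world's largest...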
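The idea of a program that moves through the web on its own can be sketched as a breadth-first traversal: keep a queue of URLs to visit and a set of URLs already seen, fetch each page, and queue the links found on it. In this sketch the page-fetching function is passed in as a parameter (the hypothetical `get_page`), so it can be tried against an in-memory dictionary of pages instead of the real network. The regular expression used to find links is a deliberate simplification; a real crawler would use an HTML parser such as BeautifulSoup.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Simplified link pattern: only matches double-quoted href attributes.
HREF_RE = re.compile(r'href="([^"]+)"')


def crawl(start_url, get_page, max_pages=100):
    """Breadth-first crawl; returns the URLs of successfully fetched pages in visit order.

    get_page(url) must return the page's HTML, or None if it cannot be fetched.
    """
    frontier = deque([start_url])   # URLs waiting to be visited
    seen = {start_url}              # URLs already queued, so each page is fetched once
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = get_page(url)
        if html is None:            # skip pages that could not be fetched
            continue
        visited.append(url)
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)   # resolve relative links against the current page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

For a live crawl, `get_page` could wrap a fetch helper that returns `None` on errors; `max_pages` keeps the crawl bounded.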
