Learning to Write a Web Crawler in Python (Part 1)

# The simplest crawler
import urllib2

def download(url):
    # Fetch the page at url and return its raw HTML
    return urllib2.urlopen(url).read()

print download('')
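
The snippet above relies on urllib2, which exists only in Python 2. As a rough sketch of the same idea on Python 3, assuming urllib.request as the replacement module (the name download_py3 and the data: URL used to keep the demonstration offline are illustrative and not from the original post):

```python
import urllib.request

def download_py3(url):
    """Fetch url and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()

# A data: URL keeps this demonstration offline; swap in a real http(s) URL to crawl.
print(download_py3('data:text/plain,hello'))
```

Note that on Python 3 the body comes back as bytes, so it usually needs an explicit decode before any text processing.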

# A more robust version that catches exceptions
import urllib2

def download(url):
    print 'downloading:', url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        # Report the failure and return None instead of crashing
        print 'download error:', e.reason
        html = None
    return html

print download('')
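
For readers on Python 3, the same exception handling might look like the sketch below. It assumes urllib.error.URLError as the counterpart of urllib2.URLError; the name download_safe and the file: URL, which points at a deliberately missing path so the error branch runs without network access, are illustrative.

```python
import urllib.error
import urllib.request

def download_safe(url):
    """Return the page body as bytes, or None if the request fails."""
    print('downloading:', url)
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.URLError as e:
        print('download error:', e.reason)
        return None

# A file: URL for a path that does not exist raises URLError locally.
result = download_safe('file:///no/such/file/for/this/demo')
print(result)
```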

# A version that retries when the server returns a 5xx error
import urllib2

def download(url, num_retries=2):
    print 'downloading:', url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print 'download error:', e.reason
        html = None
        if num_retries > 0:
            # Only 5xx codes signal a server-side problem worth retrying
            if hasattr(e, 'code') and 500 <= e.code < 600:
                return download(url, num_retries - 1)
    return html

print download('')
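
To see the retry branch run without a real failing server, one option is to make the opener injectable. The sketch below is a Python 3 variant under that assumption: the opener parameter, download_retry, flaky_opener and the example.invalid URL are all hypothetical names introduced for illustration.

```python
import io
import urllib.error
import urllib.request

def download_retry(url, num_retries=2, opener=urllib.request.urlopen):
    """Download url, retrying up to num_retries times on HTTP 5xx errors."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.URLError as e:
        print('download error:', e.reason)
        # HTTPError (a URLError subclass) carries .code; a plain URLError does not.
        code = getattr(e, 'code', None)
        if num_retries > 0 and code is not None and 500 <= code < 600:
            return download_retry(url, num_retries - 1, opener)
        return None

attempts = []

def flaky_opener(url):
    """Hypothetical stand-in for urlopen: fails twice with 503, then succeeds."""
    attempts.append(url)
    if len(attempts) <= 2:
        raise urllib.error.HTTPError(url, 503, 'Service Unavailable', None, None)
    return io.BytesIO(b'ok')

body = download_retry('http://example.invalid/', opener=flaky_opener)
print(body, len(attempts))
```

With num_retries=2 the function makes at most three attempts in total, which is why the stand-in opener is written to succeed on the third call.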

# Compared with the ** I wrote earlier, the only change is the added **: with it, my CSDN blog can be
# crawled; without **, the crawl fails.

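import urllib2

def download(url, user_agent='wswp', num_retries=2):
    print 'downloading:', url
    # The value of headers is missing in the source; assumed from the user_agent parameter
    headers = {'User-agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    try:
        html = urllib2.urlopen(request).read()
    except urllib2.URLError as e:
        print 'download error:', e.reason
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # Pass user_agent along so num_retries lands in the right parameter
                return download(url, user_agent, num_retries - 1)
    return html

print download('')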
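
By default urllib identifies itself with a Python-urllib user agent, which some sites reject, so setting the header explicitly can make the difference described above. A minimal Python 3 sketch of building such a request, assuming urllib.request.Request as the counterpart of urllib2.Request (build_request and the example.com URL are illustrative; nothing is fetched here):

```python
import urllib.request

def build_request(url, user_agent='wswp'):
    """Create a Request that carries a custom User-agent header."""
    headers = {'User-agent': user_agent}
    return urllib.request.Request(url, headers=headers)

request = build_request('http://example.com/', user_agent='wswp')
# Inspect the header that would be sent, without making a network call.
print(request.get_header('User-agent'))
```

The resulting object can be passed to urllib.request.urlopen in place of a plain URL string.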
