Web Scraping Reflections 3

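from bs4 import BeautifulSoup
import requests

# The header dict, base URL, and `content` value were left out of the original post;
# fill them in before running.
headers = {}
url_path = ''
content = ''
url = url_path + content + '/'

wb_data = requests.get(url, headers=headers)
soup = BeautifulSoup(wb_data.text, 'html.parser')
imgs = soup.select('a > img')  # <img> tags that are direct children of a link

# Collect the src attribute of every image (None if the tag has no src).
photo_urls = []
for img in imgs:
    photo_urls.append(img.get('src'))

# Every backslash is doubled so that none of them starts an escape sequence.
path = 'c:\\users\\jerry\\desktop\\photo'
i = 1
for item in photo_urls:
    if item is None:
        continue
    data = requests.get(item, headers=headers)
    if '?' in item:
        # URLs with a query string make poor file names, so number these files instead.
        with open(path + content + str(i) + '.jpeg', 'wb') as fp:
            fp.write(data.content)
        i = i + 1
    else:
        # Otherwise reuse the last 10 characters of the URL as the file name.
        with open(path + item[-10:], 'wb') as fp:
            fp.write(data.content)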
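The loop above names each file either with a counter or with the last 10 characters of the URL, which can break if those characters contain a `/` or a query string. One possible alternative is to take the file name from the URL path itself. This is a minimal sketch, not part of the original script: `filename_from_url` is a hypothetical helper, and the `.jpeg` fallback simply mirrors the extension used above.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, index):
    """Return a file name for an image URL (hypothetical helper).

    The query string is ignored; if the URL path has no usable
    file name, fall back to a numbered .jpeg name.
    """
    # urlsplit separates the path from the query, so '?w=200' is dropped.
    name = posixpath.basename(urlsplit(url).path)
    if not name or '.' not in name:
        return str(index) + '.jpeg'
    return name
```

For example, `filename_from_url('https://example.com/a/b/cat.png?w=200', 1)` yields `cat.png`, while a URL that ends in `/` falls back to the numbered name.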

path = 'c:\users\desktop\photo'

Writing this in ** always produced the following result:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX escape

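After reading other people's answers online, I understood that this is an escaping problem: Python treats \u as the start of a Unicode escape, so the path string cannot be parsed. Adding one more \ in front of it is enough. Rewritten:

path = 'c:\\users\desktop\photo'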
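The fix above works because only `\u` is a recognized escape here; `\d` and `\p` are not, so Python keeps them literally (recent versions emit a warning for such sequences). A more uniform way to write Windows paths is to double every backslash, use a raw string, or use forward slashes. The sketch below compares these spellings and reproduces the error with `compile`; the helper names `compiles` and `save_path` are made up for illustration.

```python
from pathlib import PureWindowsPath


def compiles(source):
    """Return True if the given Python source text compiles."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False


# Source text containing the single-backslash literal from the post.
broken_source = r"path = 'c:\users\desktop\photo'"

# Three equivalent spellings of the same Windows path.
doubled = 'c:\\users\\desktop\\photo'   # every backslash escaped
raw = r'c:\users\desktop\photo'         # raw string: backslashes are literal
forward = 'c:/users/desktop/photo'      # forward slashes need no escaping


def save_path(folder, filename):
    """Join a folder and a file name using Windows path rules."""
    return str(PureWindowsPath(folder) / filename)
```

Here `compiles(broken_source)` is `False`, while `doubled` and `raw` are the same string, and `PureWindowsPath(forward)` converts to that same backslash form.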

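The static HTML of this page only contains a few of the images; the rest are loaded dynamically with JavaScript, so a different approach is needed.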
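To see why parsing the static HTML finds only some of the images, here is a small sketch using the standard-library `html.parser` instead of BeautifulSoup. The markup in `STATIC_HTML` is invented for illustration and is not the page from this post: two images are present in the markup, and a third would only exist after a browser runs the script.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag found in the markup."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src is not None:
                self.srcs.append(src)


# Hypothetical page: the third image is only created when the script runs.
STATIC_HTML = """
<html><body>
<a href="/p/1"><img src="/img/1.jpg"></a>
<a href="/p/2"><img src="/img/2.jpg"></a>
<script>
document.body.innerHTML += '<img src="/img/3.jpg">';
</script>
</body></html>
"""


def static_img_srcs(html):
    """Return the image URLs visible without executing any JavaScript."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.srcs
```

The parser treats the script body as plain text, so `static_img_srcs(STATIC_HTML)` returns only the two images written directly in the markup.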
