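from bs4 import BeautifulSoup
import requests

headers = {}   # request headers (e.g. a User-Agent); the values were omitted in the original post
url_path = ''  # base URL of the site; left empty in the original post
content = ''   # page name appended to url_path; not defined in the original post
url = url_path + content + '/'
wb_data = requests.get(url, headers=headers)
soup = BeautifulSoup(wb_data.text, 'html.parser')

# Collect the src attribute of every <img> that is a direct child of an <a>.
imgs = soup.select('a > img')
photo_list = []
for img in imgs:
    photo = img.get('src')
    photo_list.append(photo)

# No trailing separator: files are written to the Desktop folder with names starting with "photo".
path = 'C:\\Users\\jerry\\Desktop\\photo'
i = 1
for item in photo_list:
    if item is None:
        # <img> tag without a src attribute: nothing to download.
        continue
    elif '?' in item:
        # URL carries a query string, so number the file instead of reusing the URL tail.
        data = requests.get(item, headers=headers)
        with open(path + content + str(i) + '.jpeg', 'wb') as fp:
            fp.write(data.content)
        i = i + 1
    else:
        # Plain URL: use its last 10 characters as the file name.
        data = requests.get(item, headers=headers)
        with open(path + item[-10:], 'wb') as fp:
            fp.write(data.content)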
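The script above names files either by a counter (when the URL contains '?') or by the last 10 characters of the URL. As one possible alternative, the sketch below derives the name from the URL path and ignores the query string. The helper name filename_from_url and the example.com URLs are hypothetical and are not part of the original post.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, fallback):
    """Return the last path segment of url, or fallback if the path has none."""
    # urlsplit separates the path from the query string, so '?w=200' is dropped.
    name = posixpath.basename(urlsplit(url).path)
    return name if name else fallback


# Hypothetical URLs, used only to illustrate the two cases.
print(filename_from_url('https://example.com/img/cat.jpg?w=200', 'photo1.jpeg'))  # cat.jpg
print(filename_from_url('https://example.com/img/?id=3', 'photo1.jpeg'))          # photo1.jpeg
```

With a helper like this, the '?' branch and the item[-10:] slice could share one code path, with the counter-based name kept only as the fallback.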
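path = 'C:\Users\Desktop\photo'
Writing the path this way always produced:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
After reading other people's answers online, I understood that this is an escaping problem: in a normal string literal, \U starts a Unicode escape sequence, so Python cannot parse the path at all. Adding one more \ in front of it is enough; rewritten as: path = 'C:\\Users\Desktop\photo'
The remaining single backslashes only work because \D and \p are not recognized escape sequences. Doubling every backslash, or using a raw string such as r'C:\Users\Desktop\photo', is the safer habit.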
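A minimal sketch of the escaping issue, assuming the same folder names as in the post. Nothing here touches the file system: compile() only checks whether a piece of source text parses, and PureWindowsPath only manipulates strings. The helper name literal_compiles is made up for this illustration.

```python
from pathlib import PureWindowsPath


def literal_compiles(source):
    """Return True if the given Python source text parses without a SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False


# The unescaped path fails because \U begins a Unicode escape sequence.
broken = literal_compiles(r"path = 'C:\Users\Desktop\photo'")

# Three spellings that all produce the same string.
escaped = 'C:\\Users\\Desktop\\photo'   # every backslash doubled
raw = r'C:\Users\Desktop\photo'         # raw string: backslashes kept literally
built = str(PureWindowsPath('C:/', 'Users', 'Desktop', 'photo'))  # built from parts

print(broken)                    # False
print(escaped == raw == built)   # True
```

Raw strings are usually the least error-prone choice for hard-coded Windows paths, while building the path from parts avoids backslashes in the source altogether.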
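The static HTML of this page only contains a few of the images; the rest are loaded dynamically with JavaScript, so a different approach is needed for those.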