I saw someone post this on a forum, typed it out myself as I followed along, changed a few things, and learned a lot.
# -*- coding: utf-8 -*-
# @Time    : 2020/6/17 18:24
# @Author  : banshaohuan
# @Site    :
# @File    : bizhi.py
# @Software: PyCharm
import requests
from bs4 import BeautifulSoup
import os
import time
import random
from fake_useragent import UserAgent
index = ""  # root URL of the target site; the literal was stripped when the post was saved
interval = 0.1
first_dir = "d:/彼岸桌面爬蟲"  # download root ("netbian desktop crawler")
# information about each category sub-page is stored here
classification_dict = {}
# build a random request header
def get_headers():
    ua = UserAgent()
    # the dict body was lost from the post; this is the usual fake_useragent pattern
    headers = {"User-Agent": ua.random}
    return headers
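As a side note, fake_useragent loads a browser database and can fail at import or lookup time. A defensive variant (my own addition, not part of the original post) could fall back to a fixed string:

```python
def get_headers_safe():
    """Variant of get_headers() with a hard-coded fallback, for when
    fake_useragent is missing or cannot load its browser database."""
    try:
        from fake_useragent import UserAgent
        return {"User-Agent": UserAgent().random}
    except Exception:
        # fallback UA string chosen for illustration only
        return {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                              "AppleWebKit/537.36 (KHTML, like Gecko) "
                              "Chrome/91.0 Safari/537.36"}
```

Either branch returns a plain dict that can be passed straight to `requests.get(..., headers=...)`.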
# fetch a page and return the elements matched by a CSS selector
def screen(url, select):
    headers = get_headers()
    html = requests.get(url=url, headers=headers)
    html = html.text
    soup = BeautifulSoup(html, "lxml")
    return soup.select(select)
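screen() is just requests plus one CSS select. On a static snippet (a stand-in I made up, using the stdlib `html.parser` backend so no lxml install is needed) the select step works like this:

```python
from bs4 import BeautifulSoup

# a static stand-in for the site's category menu; the real markup may differ
html = ('<ul><li><a href="/fengjing/">風景</a></li>'
        '<li><a href="/dongman/">動漫</a></li></ul>')
soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
links = soup.select("ul li a")
for a in links:
    print(a.get("href"), a.string)
```

Each matched element supports `.get("href")` and `.string`, which is exactly how the rest of the script consumes what screen() returns.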
# store each category sub-page's info in the dictionary
def init_classification():
    url = index
    select = "#header > div.head > ul > li:nth-child(1) > div > a"
    classifications = screen(url, select)
    for c in classifications:
        href = c.get("href")
        text = c.string
        if text == "4k桌布":  # the 4K wallpapers need a login, so skip them
            continue
        # the f-string body and dict value were lost from the post; reconstructed here
        second_dir = f"{first_dir}/{text}"
        url = index + href
        global classification_dict
        classification_dict[text] = {"url": url, "path": second_dir}
# get the page count
# locate the 1920x1080 resolution link
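The page-count code itself did not survive in the post. A hypothetical helper (the function name, regex, and sample markup are all my own, not the author's) could pull the largest page number out of the pagination links:

```python
import re

def get_page_count(pagination_html):
    """Sketch only: take the largest number appearing as a link's text."""
    numbers = [int(n) for n in re.findall(r">(\d+)</a>", pagination_html)]
    return max(numbers) if numbers else 1

sample = ('<a href="index_2.htm">2</a>'
          '<a href="index_3.htm">3</a>'
          '<a href="index_208.htm">208</a>')
print(get_page_count(sample))  # -> 208
```

A regex is enough here because only the link text matters; for anything more involved, reusing screen() with a pagination selector would be the consistent choice.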
def handle_images(links, path):
    for link in links:
        href = link.get("href")
        # filter out adverts (the literal compared against was stripped from the post)
        if href == "":
            continue
        # first jump: open the wallpaper's detail page
        url = index + href
        # NOTE: the CSS selector for this first page was lost when the post
        # was saved; it must be filled in before this will run
        select = ""
        link = screen(url, select)
        if link == []:
            print(f"{url}: page not found, scrape failed")
            continue
        href = link[0].get("href")
        # second jump: the page with the full-size image
        url = index + href
        # locate the image to download
        select = "div#main table a img"
        link = screen(url, select)
        if link == []:
            print(f"{url}: this image needs a login, scrape failed")
            continue
        # strip all symbols from the alt text, keeping only the name
        # (the download/save code that followed was lost from the post)
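The symbol-stripping step that last comment describes was also lost. One way to do it (the helper name and regex are my guesses, not the author's code) is a re.sub that keeps only word characters:

```python
import re

def clean_name(alt_text):
    """Sketch: drop everything that is not a word character.
    In Python 3, \\w matches letters, digits, underscore, and CJK
    characters, so Chinese wallpaper names survive intact."""
    return re.sub(r"[^\w]", "", alt_text)

print(clean_name("4k壁纸: 星空-夜景!"))  # -> 4k壁纸星空夜景
```

The result is safe to use as a filename on both Windows and Linux, which is presumably why the original stripped symbols in the first place.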
# interactive text UI
# (select_classification() was defined in the original post but was lost here)
def ui():
    print("-----------netbian----------")
    print("All", end=" ")
    for c in classification_dict.keys():
        print(c, end=" ")
    print()
    choice = input("Enter a category name: ")
    if choice == "All":
        for c in classification_dict.keys():
            select_classification(c)
    elif choice not in classification_dict.keys():
        print("Invalid input, please try again!")
        print("----")
        ui()
    else:
        select_classification(choice)
def main():
    if not os.path.exists(first_dir):
        os.mkdir(first_dir)
    init_classification()
    ui()

if __name__ == "__main__":
    main()
Reference: "Scraping netbian desktop wallpapers with Python"