Scraping NetEase Cloud Music Playlists

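Learning to write a crawler comes down to fetching content from a target site in bulk. First you need the URL of the target site. In particular, when you want the content behind the links on that site, you first have to collect the URLs of all those sub-links in bulk.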

Second, you extract the information you want from that large volume of data and organize it.

Simple enough, right?
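The first step, collecting index-page URLs in bulk, can be sketched as follows. This is only an illustration: the base URL and the `cat` parameter name are assumptions (the real address is not given in the text), while `order`, `limit`, and `offset` follow the query string that appears in the listing.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real address is not given in the text.
BASE_URL = "https://example.com/discover/playlist"


def index_page_urls(category, total, page_size=35):
    """Build one index-page URL per offset step (0, page_size, 2*page_size, ...)."""
    urls = []
    for offset in range(0, total, page_size):
        # "cat" is an assumed parameter name; the other keys come from the listing.
        query = urlencode(
            {"cat": category, "order": "hot", "limit": page_size, "offset": offset}
        )
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = index_page_urls("pop", total=105)
for u in urls:
    print(u)
```

Using `urlencode` also percent-encodes non-ASCII category names, which plain string concatenation does not.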

The usual tool is the Beautiful Soup library, which is designed for extracting data from web pages and works well for crawlers.

Introduction to Beautiful Soup

Official documentation

You can also extract the data directly with regular expressions.
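As a rough illustration of the regular-expression route, the sketch below pulls playlist links and titles out of a small HTML fragment. The fragment and the pattern are made up for this example and assume that `title` always precedes `href` inside the tag, so the pattern is more fragile than a real HTML parser.

```python
import re

# Made-up HTML fragment standing in for a playlist index page.
html = '''
<a title="Road Trip" href="/playlist?id=101" class="msk"></a>
<a title="Late Night Jazz" href="/playlist?id=202" class="msk"></a>
'''

# Capture the title and the relative link of each playlist anchor.
LINK_RE = re.compile(r'<a\s+title="([^"]*)"\s+href="(/playlist\?id=\d+)"')

# findall returns (title, href) tuples; reorder them to (href, title).
playlists = [(href, title) for title, href in LINK_RE.findall(html)]
for href, title in playlists:
    print(href, title)
```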

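from bs4 import BeautifulSoup  # Beautiful Soup is a Python library whose main job is extracting data from web pages
import requests
import time

# The header values are missing from the source; supply your own dictionary here.
# The author notes that these headers apparently cannot simply be swapped for others.
headers = {}

for i in range(0, 1330, 35):
    print(i)
    time.sleep(2)  # pause for 2 seconds to avoid being flagged as a bot
    # The part of the URL before the category value is missing from the source.
    url = '歐美&order=hot&limit=35&offset=' + str(i)
    response = requests.get(url=url, headers=headers)
    html = response.text  # page source as text
    soup = BeautifulSoup(html, 'html.parser')  # parse the text into a tree that mirrors the page source
    # Tags that hold the playlist detail-page URL
    ids = soup.select('.dec a')  # select() takes a CSS selector and can address child tags directly: .dec is the parent, a is the child
    # Tags that hold the playlist index-page information
    lis = soup.select('#m-pl-container li')
    print(len(lis))
    for j in range(len(lis)):
        # Playlist detail-page address
        url = ids[j]['href']
        # Playlist title
        title = ids[j]['title']
        # Play count
        play = lis[j].select('.nb')[0].get_text()
        # Name of the playlist's creator
        user = lis[j].select('p')[1].select('a')[0].get_text()
        # Print the index-page information
        print(url, title, play, user)
        # Append the information to a CSV file
        with open('playlist.csv', 'a+', encoding='utf-8-sig') as f:
            f.write(url + ',' + title + ',' + play + ',' + user + '\n')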
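The listing builds each CSV row by string concatenation, so a comma inside a playlist title would shift the columns. One possible alternative, shown here only as a sketch with made-up data, is to let the standard `csv` module quote such fields. The helper name `write_rows` is hypothetical.

```python
import csv
import io


def write_rows(stream, rows):
    """Write rows to an open text stream, quoting fields that contain commas."""
    writer = csv.writer(stream)
    writer.writerows(rows)


# Made-up row; the title contains a comma on purpose.
rows = [("/playlist?id=1", "Rock, Pop and More", "1200", "alice")]

buffer = io.StringIO()  # in-memory stand-in for a file
write_rows(buffer, rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original four fields.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)
```

With a real file, the same helper would be called on something like `open('playlist.csv', 'a', newline='', encoding='utf-8-sig')`; `newline=''` is what the `csv` module documentation recommends for file objects.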

from bs4 import BeautifulSoup
import pandas as pd
import requests
import time

# on_bad_lines='skip' needs pandas >= 1.3; older versions used error_bad_lines=False.
df = pd.read_csv('playlist.csv', header=None, on_bad_lines='skip',
                 names=['url', 'title', 'play', 'user'])

# The header values are missing from the source; supply your own dictionary here.
headers = {}

for i in df['url']:
    time.sleep(2)
    # The site prefix is missing from the source; i is the relative link saved in playlist.csv.
    url = '' + i
    response = requests.get(url=url, headers=headers)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Playlist title (ASCII commas become full-width commas so they do not break the CSV)
    title = soup.select('h2')[0].get_text().replace(',', '，')
    # Tags
    tags = []
    tags_message = soup.select('.u-tag i')
    for p in tags_message:
        tags.append(p.get_text())
    # Format the tags
    if len(tags) > 1:
        tag = '-'.join(tags)
    else:
        tag = tags[0]
    # Playlist description
    if soup.select('#album-desc-more'):
        text = soup.select('#album-desc-more')[0].get_text().replace('\n', '').replace(',', '，')
    else:
        text = '無'  # '無' means "none"
    # Number of times the playlist was saved
    collection = soup.select('#content-operation i')[1].get_text().replace('(', '').replace(')', '')
    # Play count
    play = soup.select('.s-fc6')[0].get_text()
    # Number of songs in the playlist
    songs = soup.select('#playlist-track-count')[0].get_text()
    # Number of comments
    comments = soup.select('#cnt_comment_count')[0].get_text()
    # Print the detail-page information
    print(title, tag, text, collection, play, songs, comments)
    # Append the detail-page information to a CSV file
    with open('music_message.csv', 'a+', encoding='utf-8-sig') as f:
        f.write(title + ',' + tag + ',' + text + ',' + collection + ',' + play + ',' + songs + ',' + comments + '\n')
    # Names of the songs in the playlist
    li = soup.select('.f-hide li a')
    for j in li:
        with open('music_name.csv', 'a+', encoding='utf-8-sig') as f:
            f.write(j.get_text() + '\n')
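The tag-formatting branch in the listing indexes `tags[0]`, which raises an `IndexError` when a playlist has no tags. A small sketch of a more forgiving variant follows; the helper name `format_tags` and the `"none"` fallback are assumptions for illustration, not part of the original script.

```python
def format_tags(tags, default="none"):
    """Join tag strings with '-'; fall back to a default when no tags remain."""
    cleaned = [t.strip() for t in tags if t.strip()]
    # str.join already handles a single-element list, so no length check is needed.
    return "-".join(cleaned) if cleaned else default


print(format_tags(["Western", "Rock"]))
print(format_tags(["Pop"]))
print(format_tags([]))
```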
