Python-Based Web Scraping in Practice

Method 1: Using the bs4 (BeautifulSoup) package

1. Fetch the Kugou Music page

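#coding=utf-8
# Python 3: urlopen() lives in urllib.request, which "import urllib" alone does not load
import urllib.request

from bs4 import BeautifulSoup

# The target URL is elided in the original (presumably the Kugou homepage)
result = urllib.request.urlopen("")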
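By default, urlopen() sends Python's own User-Agent header, and some sites reject that header or respond with a reduced page. Below is a minimal standard-library sketch that sends a browser-like header instead. The helper names build_request and fetch_html are hypothetical, and so is the User-Agent string. The decoder falls back to UTF-8 when the server does not declare a charset.

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any realistic value works.
DEFAULT_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=DEFAULT_UA):
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and decode it, assuming UTF-8 if no charset is declared."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

You can pass the returned string to BeautifulSoup in place of result.read(), because BeautifulSoup accepts both str and bytes.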

2. Locate the target tags using the page's HTML structure

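soup = BeautifulSoup(result.read(), 'html.parser')
# The song list sits inside <div id="songtabcontent">
tab = soup.find("div", id="songtabcontent")
# One <li> per song; empty list if the div is not found
s = tab.find_all("li") if tab is not None else []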
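The step above stops once it has the <li> elements. The sketch below pulls each song's title, duration and link out of them with BeautifulSoup. It assumes the markup has the shape that the XPath expressions in Method 2 imply: the first <span> holds the title, span.songtime holds the duration, and <a href> holds the link. SAMPLE_HTML and extract_songs are hypothetical, and the live page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the structure the scraper expects.
SAMPLE_HTML = """
<div id="songtabcontent">
  <ul>
    <li><a href="/song/1"><span>Song A</span><span class="songtime">03:45</span></a></li>
    <li><a href="/song/2"><span>Song B</span><span class="songtime">04:10</span></a></li>
  </ul>
</div>
"""


def extract_songs(html):
    """Return (title, duration, link) tuples for every <li> in #songtabcontent."""
    soup = BeautifulSoup(html, "html.parser")
    tab = soup.find("div", id="songtabcontent")
    if tab is None:
        return []
    songs = []
    for li in tab.find_all("li"):
        link = li.find("a")
        if link is None:  # skip list items that are not song links
            continue
        title = link.find("span")  # first <span> holds the title
        duration = link.find("span", class_="songtime")
        songs.append((title.get_text(strip=True),
                      duration.get_text(strip=True),
                      link.get("href")))
    return songs


print(extract_songs(SAMPLE_HTML))
```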

Method 2: Using the Scrapy framework

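(1) Create the project. Name it groad, because the spider below imports from groad.items.

scrapy startproject groad

(2) cd into the groad directory and run

scrapy genspider newsong www.kugou.com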

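(3) In settings.py, uncomment the following three lines to enable the item pipeline

ITEM_PIPELINES = {
    'groad.pipelines.GroadPipeline': 300,
}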
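Scrapy passes every item through the enabled pipelines in ascending order of the integers in ITEM_PIPELINES. By convention these values are chosen in the 0 to 1000 range. The plain-Python sketch below only mimics that ordering rule to make it concrete. It is not Scrapy code, and DedupPipeline is a hypothetical second pipeline that is not part of this project.

```python
# Hypothetical settings: DedupPipeline is added only to illustrate ordering.
ITEM_PIPELINES = {
    "groad.pipelines.GroadPipeline": 300,
    "groad.pipelines.DedupPipeline": 100,
}


def pipeline_order(settings):
    """Return pipeline paths in the order Scrapy would run them (lowest value first)."""
    return [path for path, _ in sorted(settings.items(), key=lambda kv: kv[1])]


print(pipeline_order(ITEM_PIPELINES))
```

Here the hypothetical DedupPipeline (100) would see each item before GroadPipeline (300) writes it out.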

(4) Write items.py

import scrapy

(5) newsong.py (the spider)

import scrapy

from groad.items import GroadItem


class NewsongSpider(scrapy.Spider):
    name = 'newsong'
    allowed_domains = ['www.kugou.com']
    start_urls = ['/']  # URL truncated in the original (presumably the Kugou homepage)

    def parse(self, response):
        # Every <li> inside a <ul> under #songtabcontent is one song
        for li in response.xpath('//*[@id="songtabcontent"]/ul/li'):
            item = GroadItem()  # fresh item per song rather than one shared object
            # Relative XPath (./...) searches only inside the current <li>
            item['songname'] = li.xpath('./a/span[1]/text()').get()
            item['songtime'] = li.xpath('./a/span[@class="songtime"]/text()').get()
            item['href_song'] = li.xpath('./a/@href').get()
            yield item
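It is risky to create the item once before the loop and yield that same object repeatedly. Every yielded value is a reference to a single object, so anything downstream that keeps items around or processes them later sees only the last song's data. The sketch below uses plain dicts as stand-ins for GroadItem, since an Item behaves like a mutable dict in this respect. parse_shared, parse_fresh and the rows data are hypothetical.

```python
def parse_shared(rows):
    """Risky pattern: one dict created before the loop and mutated each time."""
    item = {}
    for name, duration in rows:
        item["songname"] = name
        item["songtime"] = duration
        yield item


def parse_fresh(rows):
    """Safer pattern: build a new dict for every row."""
    for name, duration in rows:
        yield {"songname": name, "songtime": duration}


rows = [("Song A", "03:45"), ("Song B", "04:10")]  # hypothetical data
print(list(parse_shared(rows)))  # both entries show Song B
print(list(parse_fresh(rows)))   # Song A, then Song B
```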

(6) pipelines.py: write each item to a file

import json


class GroadPipeline(object):
    def __init__(self):
        # Path as given in the original; it must name a file, not a directory
        self.file = open("e://downloads", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # One JSON object per line; ensure_ascii=False keeps Chinese text readable
        text = json.dumps(dict(item), ensure_ascii=False) + '\n'
        self.file.write(text)
        return item

    def close_spider(self, spider):
        self.file.close()
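This pipeline writes JSON Lines, that is, one JSON object per line. By default, json.dumps escapes non-ASCII characters as \uXXXX, which is why the pipeline passes ensure_ascii=False to keep Chinese song titles readable. Below is a standard-library sketch of writing and reading such a file. The helper names and the sample item are hypothetical, and io.StringIO stands in for the real output file.

```python
import io
import json


def write_json_lines(items, fp):
    """Write one JSON object per line, keeping non-ASCII text unescaped."""
    for item in items:
        fp.write(json.dumps(dict(item), ensure_ascii=False) + "\n")


def read_json_lines(fp):
    """Parse a JSON Lines stream back into a list of dicts."""
    return [json.loads(line) for line in fp if line.strip()]


buf = io.StringIO()
write_json_lines([{"songname": "示例歌曲", "songtime": "03:45"}], buf)  # hypothetical item
print(buf.getvalue(), end="")  # {"songname": "示例歌曲", "songtime": "03:45"}
```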

(7) Run the spider from the project directory

scrapy crawl newsong
