Scraping the Douban Books "Programming" Tag with Python

Target to scrape:

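Simple version:

import requests
from bs4 import BeautifulSoup


def get_one_page(url):
    # Return the page HTML, or None if the request did not succeed.
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    return None


def main():
    url = ''  # address of the tag page to scrape
    html = get_one_page(url)
    if html is None:
        return
    soup = BeautifulSoup(html, 'lxml')
    # Each book on the page sits in an element with class "subject-item".
    for book in soup.select('.subject-item'):
        # bookimg = book.find('img')
        # strip=True removes line breaks and surrounding spaces
        bookbrief = book.get_text(strip=True)
        print(bookbrief)


main()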

Upgraded version:

import requests
from bs4 import BeautifulSoup


def get_one_page(url):
    # Return the page HTML, or None if the request did not succeed.
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    return None


def main():
    url = ''  # address of the tag page to scrape
    html = get_one_page(url)
    if html is None:
        return
    soup = BeautifulSoup(html, 'lxml')
    for book in soup.select('.subject-item'):
        # Title: the <a> tag that carries a title attribute
        for link in book.find_all('a'):
            if link.get('title') is not None:
                print("《" + link.get_text(strip=True) + "》")
        # Link to the book's page
        bookurl = book.find('a').get('href')
        print(bookurl)
        # Publication info
        bookpub = book.select('.pub')[0].text.lstrip('\n ').rstrip('\n ')
        print(bookpub)
        # User ratings
        bookfeedback = book.select('.pl')[0].text.lstrip('\n ').rstrip('\n ')
        print(bookfeedback)


main()
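get_one_page() above calls requests.get(url) with no timeout and no request headers. The sketch below is one possible hardening, not the original author's code: the HEADERS value and the timeout parameter are illustrative assumptions, and whether Douban actually requires a browser-like User-Agent is not stated in this post.

```python
import requests

# Hypothetical header value; any browser-like User-Agent string could be used here.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}


def get_one_page(url, timeout=10):
    """Return the page HTML, or None if the request fails for any reason."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
    except requests.RequestException:
        # Covers connection errors, timeouts and malformed URLs.
        return None
    if response.status_code == 200:
        return response.text
    return None
```

With this version a bad URL or a network failure yields None instead of an exception, so the caller only has to check for None before handing the result to BeautifulSoup.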

1. Determine the scope

# Find the elements with class='subject-item'; each one holds one book
for book in soup.select('.subject-item'):

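2. Get the book's title

for link in book.find_all('a'):
    if link.get('title') is not None:
        print("《" + link.get_text(strip=True) + "》")
# Under the class="subject-item" element, find every <a> tag, then filter with if.
# Filter condition: title is not None

Why filter on title not being None? Each item contains more than one <a> tag, so we need a feature that singles out the one holding the book title. That feature is that its title attribute is not empty.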
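The filter can be tried offline on a small piece of markup. The HTML below is a hypothetical, simplified stand-in for one list entry (it is not copied from the real page), and the built-in html.parser is used so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for one list entry.
SAMPLE = """
<li class="subject-item">
  <div class="pic"><a href="https://example.com/book/1"><img src="cover.jpg"></a></div>
  <div class="info">
    <h2><a href="https://example.com/book/1" title="Example Book">Example Book</a></h2>
  </div>
</li>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
book = soup.select(".subject-item")[0]

# Every <a> in the entry, including the cover link, which has no title attribute.
all_links = book.find_all("a")

# Keep only the links whose title attribute is present.
titled_links = [link for link in all_links if link.get("title") is not None]

# Same filter expressed through find_all: title=True matches any tag that has the attribute.
titled_links_short = book.find_all("a", title=True)

for link in titled_links:
    print("《" + link.get_text(strip=True) + "》")
```

The cover link and the title link share the same href, so href alone cannot tell them apart. The title attribute can.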

3. Get the book's link

bookurl = book.find('a').get('href')

4. Get the book's publication info

bookpub = book.select('.pub')[0].text.lstrip('\n ').rstrip('\n ')

5. Get the book's user ratings

bookfeedback = book.select('.pl')[0].text.lstrip('\n ').rstrip('\n ')
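Steps 1 to 5 can be combined into one function that returns data instead of printing it. This is a minimal sketch under assumptions: parse_books is a hypothetical helper name, SAMPLE is invented markup that only mimics the class names used above, and html.parser replaces lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the class names from the steps above.
SAMPLE = """
<ul>
  <li class="subject-item">
    <div class="pic"><a href="https://example.com/book/1"><img src="cover.jpg"></a></div>
    <div class="info">
      <h2><a href="https://example.com/book/1" title="Example Book">Example Book</a></h2>
      <div class="pub">
        Jane Doe / Example Press / 2020-1 / 59.00
      </div>
      <span class="pl">
        (1234 ratings)
      </span>
    </div>
  </li>
</ul>
"""


def parse_books(html):
    """Return one dict per .subject-item element found in html."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for book in soup.select(".subject-item"):           # step 1: scope
        title_link = book.find("a", title=True)         # step 2: title
        first_link = book.find("a")                     # step 3: link
        pub = book.select(".pub")                       # step 4: publication info
        feedback = book.select(".pl")                   # step 5: user ratings
        books.append({
            "title": title_link.get_text(strip=True) if title_link else None,
            "url": first_link.get("href") if first_link else None,
            # strip('\n ') has the same effect as lstrip('\n ').rstrip('\n ')
            "pub": pub[0].text.strip("\n ") if pub else None,
            "feedback": feedback[0].text.strip("\n ") if feedback else None,
        })
    return books


books = parse_books(SAMPLE)
print(books)
```

Checking that each lookup found something before indexing avoids an IndexError or AttributeError on entries that lack one of the fields.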

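(1) How to get the text inside a tag

bookfeedback = book.select('.pl')[0].text.lstrip('\n ').rstrip('\n ')

[0].text takes the first matched tag and returns its content as text.

lstrip('\n ').rstrip('\n ') removes the extra spaces and line breaks from the start and the end of that text.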
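The effect of the character argument is easy to check on a plain string. The sample value below is made up for illustration.

```python
raw = "\n        (1234 ratings)\n    "

# The form used in the scraper: trim the left side, then the right side.
two_step = raw.lstrip("\n ").rstrip("\n ")

# str.strip with the same character set trims both sides in one call.
one_step = raw.strip("\n ")

# With no argument, strip removes every kind of leading/trailing whitespace.
no_args = raw.strip()

# The argument is a set of characters, so a leading tab is NOT removed by '\n '.
with_tab = "\t x \n"
kept_tab = with_tab.strip("\n ")
dropped_tab = with_tab.strip()

print(repr(two_step), repr(kept_tab), repr(dropped_tab))
```

Only characters listed in the argument are removed, and only from the two ends. Spaces inside the text are left alone.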

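(2) How to use find_all() and select()

find_all() returns a list of matching tags. Read a tag's text with the get_text() method.

select() also returns a list of matching tags, so index into it first and then read the text with the .text property, as in [0].text. Note that .text is a property, not a method, so it takes no parentheses.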
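A short side-by-side comparison, run on an invented one-line snippet, shows that both calls return tags of the same kind and that .text and get_text() work on either result. select_one() is a BeautifulSoup method the original code does not use. It is shown here only as an alternative to select(...)[0].

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration.
SAMPLE = '<div class="info"><span class="pl"> (1234 ratings) </span><a href="/x" title="T">T</a></div>'
soup = BeautifulSoup(SAMPLE, "html.parser")

# find_all filters by tag name and attributes; select takes a CSS selector.
by_find_all = soup.find_all("span", class_="pl")
by_select = soup.select(".pl")

first = by_select[0]
raw_text = first.text                       # property: text exactly as it appears
clean_text = first.get_text(strip=True)     # method: can strip whitespace

# select_one returns the first match directly, or None when nothing matches.
first_or_none = soup.select_one(".pl")
missing = soup.select_one(".missing")

print(repr(raw_text), repr(clean_text), missing)
```

Because select() returns an empty list when nothing matches, select(...)[0] raises IndexError on a miss, while select_one() returns None.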
