Python 3 Web Scraping for Beginners

1. Install requests

pip install requests

2. Import requests

>>> import requests

3. requests methods

requests.get()  # the main method for fetching an HTML page; corresponds to HTTP GET

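4. Fetch workflow

url = ""
# Use the get method to fetch the data, with a timeout; it returns a Response object containing the page data
r = requests.get(url, timeout=***)
# Status of the HTTP response; 200 means the request succeeded
r.status_code
# Text content of the response
r.text
# Binary form of the response
r.content
# Encoding of the response
r.encoding
# Encoding inferred from the response content (fallback encoding)
r.apparent_encoding
# Raise an exception on an error status
r.raise_for_status()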
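The workflow above can be sketched as one small helper. This is a minimal sketch, not a definitive implementation: the function name fetch_html and the 10-second default timeout are assumptions made for illustration and do not come from the original.

```python
import requests


def fetch_html(url, timeout=10):
    """Fetch a page and return its decoded text (hypothetical helper)."""
    # timeout=10 is an assumed default; tune it for the target site.
    r = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes.
    r.raise_for_status()
    # Decode with the encoding detected from the body, which can help
    # when the response headers do not declare a charset.
    r.encoding = r.apparent_encoding
    return r.text
```

Callers would typically wrap fetch_html(...) in a try/except for requests.RequestException, which covers timeouts, connection errors, and the HTTPError raised by raise_for_status().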

IV. Parsing and matching data

Three approaches: BeautifulSoup, XPath via lxml, and regular expressions.

Efficiency comparison:

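2.1.1 Import lxml and parse the HTML into an XML tree:

from lxml import etree
html = '''
# omitted'''
# etree.HTML parses the string and returns the root element
s = etree.HTML(html)
print(s.xpath())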
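As a concrete illustration of the snippet above, the sketch below parses a small inline document and runs one XPath query. The HTML string and the expression //li/a are made up for this example and are not taken from the original.

```python
from lxml import etree

# Made-up sample markup standing in for a downloaded page.
html = """
<html><body>
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# etree.HTML parses the string and returns the root <html> element.
s = etree.HTML(html)

# xpath() with an element path returns a list of matching Element objects.
links = s.xpath("//li/a")
print(len(links))    # 2
print(links[0].tag)  # a
```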

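2.2.2 Some XPath functions

# Get the text content
text()
# Get comments
comment()
# Get any other attribute with @xx
@href, @src, @value
# Get all text under a tag (including text inside child tags) with string
string()
# Match when the beginning of the string is equal
starts-with
# Match when any position is equal
contains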
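The functions listed above might be used as follows. This is only a sketch: the sample markup, the attribute values, and the variable names are invented for illustration.

```python
from lxml import etree

# Made-up sample markup for illustration only.
html = """
<div id="post-1" class="post featured">
  <!-- draft -->
  <p>Hello <b>world</b></p>
  <a href="/next">Next</a>
</div>
"""
root = etree.HTML(html)

# text(): direct text nodes of <p> only, without the text inside <b>.
p_text = root.xpath("//p/text()")
# comment(): comment nodes; .text holds the comment body.
comments = [c.text for c in root.xpath("//comment()")]
# @href: attribute values.
hrefs = root.xpath("//a/@href")
# string(): all text under <p>, including text in child tags.
full = root.xpath("string(//p)")
# starts-with(): the attribute value must begin with the given prefix.
by_prefix = root.xpath('//div[starts-with(@id, "post-")]')
# contains(): the substring may appear anywhere in the value.
by_substr = root.xpath('//div[contains(@class, "featured")]')

print(p_text, comments, hrefs, full)
```

Note the difference between the first and fourth queries: text() yields only "Hello ", while string() also includes the text of the nested <b> tag.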

Common XPath symbols:

2.3 Regular expressions

Some common regular expressions are listed below:
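To show how a regular expression can pull data out of page text, here is a short sketch using the standard-library re module. The sample markup and both patterns are assumptions written for this example; they are not the patterns from the original list.

```python
import re

# Made-up sample markup for illustration only.
html = (
    '<a href="/page/1">One</a> <a href="/page/2">Two</a> '
    '<span class="price">42.50</span>'
)

# Non-greedy groups capture each link's href and its visible text.
links = re.findall(r'<a href="(.*?)">(.*?)</a>', html)

# re.search returns the first match, or None when nothing matches.
m = re.search(r'<span class="price">(\d+\.\d+)</span>', html)
price = float(m.group(1)) if m else None

print(links)  # [('/page/1', 'One'), ('/page/2', 'Two')]
print(price)  # 42.5
```

Patterns like these depend on the exact markup, so they tend to break when attribute order or whitespace changes; for anything beyond simple extraction a parser is usually the safer choice.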

Beautiful Soup is a Python library whose main purpose is extracting data from web pages.

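import requests
from bs4 import BeautifulSoup

r = requests.get("")
# Parse the response text with the lxml parser
soup = BeautifulSoup(r.text, 'lxml')
# The <title> tag
soup.title
# Iterator over the direct children of <head>
soup.head.children
# All <a> tags
soup.find_all('a')
a = soup.find_all('small', attrs=)
soup.find('small', attrs=).get_text()
soup.find('div', attrs=).get_text()
# Print the text of every matched <small> tag
for item in a:
    print(item.get_text())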
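The calls above can be tried without a network request by parsing an inline string. In this sketch the sample markup and the attrs dictionaries are hypothetical, since the original attribute filters are not shown, and the standard-library "html.parser" is swapped in for "lxml" so that only bs4 is required.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for r.text.
html = """
<html><head><title>Quotes</title></head>
<body>
  <div class="quote">To be or not to be</div>
  <small class="author">Shakespeare</small>
  <small class="author">Austen</small>
  <a href="/page/2">Next</a>
</body></html>
"""

# "html.parser" ships with Python; the original passes "lxml" instead.
soup = BeautifulSoup(html, "html.parser")

title = soup.title.get_text()
# attrs filters on attribute values; these dicts are hypothetical examples.
authors = [
    tag.get_text()
    for tag in soup.find_all("small", attrs={"class": "author"})
]
quote = soup.find("div", attrs={"class": "quote"}).get_text()
hrefs = [a["href"] for a in soup.find_all("a")]

print(title, authors, quote, hrefs)
```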
