Crawler 2: Introduction to Python Crawlers, Part 2

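1. Understanding crawlers

1.1 What is a crawler

Crawler: a program that automatically fetches information from the internet and extracts the parts that are valuable to us.

1.2 Architecture of a Python crawler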

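Web page parser: parses a web page string and extracts the information we need according to our requirements; it can also work from a DOM tree. The available parsers are: regular expressions (intuitive: the page is treated as a string and the valuable information is pulled out by fuzzy matching, but extraction becomes very difficult once the document is complex); html.parser (bundled with Python); BeautifulSoup (a third-party library that can parse with either Python's built-in html.parser or with lxml, and is more powerful than the other options); and lxml (a third-party library that can parse both XML and HTML). BeautifulSoup and lxml parse the page into a DOM-style tree; html.parser on its own is event-driven, and BeautifulSoup builds its tree on top of it.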
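To make the difference between pattern matching and structured parsing concrete, here is a minimal sketch that uses only the standard library (re and html.parser). The sample markup, the ItemParser class and the helper function names are all invented for this illustration; they are not taken from any real site or from the code later in this post.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup, invented for this illustration.
SAMPLE_HTML = """
<ul>
  <li data-sku="1001"><div class="name"><em>Phone A</em></div></li>
  <li data-sku="1002"><div class="name"><em>Phone B</em></div></li>
</ul>
"""


def names_with_regex(html):
    """Extract the text inside <em>...</em> by matching on the raw string."""
    return re.findall(r"<em>(.*?)</em>", html)


class ItemParser(HTMLParser):
    """Collect (data-sku, name) pairs by reacting to tags as they are parsed."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished (sku, name) pairs
        self._sku = None     # data-sku of the most recent <li>
        self._in_em = False  # True while inside an <em> element

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._sku = dict(attrs).get("data-sku")
        elif tag == "em":
            self._in_em = True

    def handle_endtag(self, tag):
        if tag == "em":
            self._in_em = False

    def handle_data(self, data):
        # Only keep text that sits inside <em> within a known <li>.
        if self._in_em and self._sku is not None:
            self.items.append((self._sku, data.strip()))


def items_with_parser(html):
    """Run ItemParser over the markup and return the collected pairs."""
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(names_with_regex(SAMPLE_HTML))
    print(items_with_parser(SAMPLE_HTML))
```

The regular expression can grab the names but knows nothing about which list item they belong to, while the parser can pair each name with the attribute of its enclosing tag. BeautifulSoup and lxml go one step further and expose the whole tree for searching.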

Application: an application built from the useful data extracted from the web pages.

2. Training the crawler

3. Starting to crawl

Several Python libraries need to be imported or installed, as follows:

import requests
import csv
from requests.exceptions import RequestException
from bs4 import BeautifulSoup
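Before running the crawler it can help to check that the third-party libraries are actually installed. The following is a small sketch using only the standard library; the function name missing_modules is made up for this example. The module names checked are requests, bs4 (installed from the beautifulsoup4 package) and lxml (passed to BeautifulSoup as the parser in the full code).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # csv ships with Python, so only the third-party modules are checked.
    missing = missing_modules(["requests", "bs4", "lxml"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Anything reported as missing can be installed with pip, for example: pip install requests beautifulsoup4 lxml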

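Workflow:

1. We need phone data, so the first step is to fetch the page source that lists all the phones. The URL is as follows:

2. Here is the complete code: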

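# -*- coding: utf-8 -*-
# _author_:zeng
# 2020/8/24_15:16
import requests
import csv
from requests.exceptions import RequestException
from bs4 import BeautifulSoup


# Send a GET request to the JD.com (京東) page with requests.get. The headers are
# set to look like a browser; otherwise the default python User-Agent is sent.
def download(url, headers, num=3):
    try:
        response = requests.get(url, headers=headers)
        # Print the status code
        print(response.status_code)
        if response.status_code == 200:
            return response.content
        return None
    except RequestException as rr:
        # Note: requests.get does not raise for HTTP error statuses on its own,
        # so rr.response is often None here (for example on connection errors).
        print(rr.response)
        html = ""
        if hasattr(rr.response, 'status_code'):
            code = rr.response.status_code
            print('error code', code)
            # Retry on 5xx server errors while attempts remain
            if num > 0 and 500 <= code < 600:
                html = download(url, headers, num - 1)
        else:
            code = None
        return html


# Look up the phones on the page
def find_iphone(url, headers):
    # Call download(); on success the page source is assigned to r
    r = download(url, headers=headers)
    if not r:
        # Nothing was downloaded, so there is nothing to parse
        return
    # Build the page object with bs4
    page = BeautifulSoup(r, "lxml")
    # Find every product entry with find_all and assign the result to all_items.
    # The attrs filter is missing from the source; {} matches every <li>.
    all_items = page.find_all('li', attrs={})
    # Write the results to iphone.csv (the raw string keeps the backslashes literal)
    with open(r"d:\pycharm\iphone.csv", 'w', newline='', encoding='utf-8') as f:
        write = csv.writer(f)
        fields = ('id', 'price', 'name')
        write.writerow(fields)
        # Loop over every phone entry on the JD page
        for item in all_items:
            # Get the phone id
            iphone_id = item["data-sku"]
            print(f"Phone id: {iphone_id}")
            # Get the phone price (attrs filter missing from the source)
            iphone_price = item.find('div', attrs={}).find('i').text
            print(f"Phone price: {iphone_price} yuan")
            # Get the phone name (attrs filter missing from the source)
            iphone_name = item.find('div', attrs={}).find('em').text
            print(f"Phone name: {iphone_name}")
            # Collect the phone's id, price and name into one row
            row = [iphone_id, iphone_price, iphone_name]
            # Write the row
            write.writerow(row)
    # The with block closes the file, so no explicit f.close() is needed


def main():
    # Set the request headers. The dict is missing from the source; add a
    # browser "User-Agent" entry here.
    headers = {}
    # URL of the JD phone search page (left empty in the source)
    url = ""
    # Call find_iphone
    find_iphone(url, headers=headers)


if __name__ == '__main__':
    main()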
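One detail of download() is worth a closer look: requests.get() does not raise an exception for a 5xx status unless raise_for_status() is called, so a retry that lives only in the except branch may never run for server errors. The sketch below shows one possible way to retry on 5xx responses. To keep it runnable offline, the HTTP call is abstracted as a hypothetical fetch callable that returns (status_code, body); fetch_with_retry and make_flaky_fetch are names invented for this example, and in a real crawler fetch would wrap requests.get().

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=0.0):
    """Call fetch(url) and retry while it reports a 5xx status.

    `fetch` is a hypothetical callable returning (status_code, body).
    Returns the body on HTTP 200, otherwise None.
    """
    attempts_left = retries
    while True:
        status, body = fetch(url)
        if status == 200:
            return body
        if 500 <= status < 600 and attempts_left > 0:
            attempts_left -= 1
            time.sleep(delay)  # optional pause between attempts
            continue
        # Non-retryable status, or no attempts left
        return None


def make_flaky_fetch(statuses):
    """Build a fake fetch that returns the given statuses in order (offline demo)."""
    remaining = list(statuses)

    def fetch(url):
        status = remaining.pop(0)
        body = b"<html>ok</html>" if status == 200 else b""
        return status, body

    return fetch


if __name__ == "__main__":
    # Two simulated server errors followed by a success.
    fake = make_flaky_fetch([503, 502, 200])
    print(fetch_with_retry(fake, "http://example.invalid/"))
```

A loop is used here instead of recursion so the number of remaining attempts is easy to follow; the delay parameter is there because retrying a struggling server immediately is rarely useful.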

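3. The final output

4. And with that, we are done! This is a beginner's introduction, so experts, please go easy on it. If you found it helpful, a like is welcome.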
