Python Web Scraping Basics (1)

Example: opening the Bing search page

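PS C:\Users\Desktop> mkdir xy_web_scraping  # create a new folder on the desktop
PS C:\Users\Desktop> cd .\xy_web_scraping\  # enter that folder
PS C:\Users\Desktop\xy_web_scraping> python -m venv scraping_venv  # create a Python virtual environment inside the folder
PS C:\Users\Desktop\xy_web_scraping\scraping_venv\Scripts> ./activate  # activate the virtual environment (PowerShell script execution must be allowed before this step, e.g. via the execution policy, and can be restricted again once the environment is active)
(scraping_venv) PS C:\Users\Desktop\xy_web_scraping\scraping_venv\Scripts>  # the virtual environment is now active
(scraping_venv) PS C:\Users\Desktop\xy_web_scraping> pip install beautifulsoup4  # install beautifulsoup4
(scraping_venv) PS C:\Users\微軟\Desktop\xy_web_scraping\scraping_code> python -m idlelib  # open Python's bundled IDE (IDLE)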
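To confirm that the interpreter you are running is the one inside the virtual environment, and that beautifulsoup4 was installed into it, a small check like the one below can help. It is a sketch that relies on the standard-library behaviour that sys.prefix differs from sys.base_prefix inside a venv; the helper names in_virtualenv and has_package are made up for this example.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when this interpreter was started from a venv (sys.prefix then points at the venv)."""
    return sys.prefix != sys.base_prefix


def has_package(module_name):
    """True if the module can be found on the import path, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("inside venv:", in_virtualenv())
# The beautifulsoup4 distribution is imported under the module name bs4
print("bs4 installed:", has_package("bs4"))
```

Run it from IDLE opened inside the activated environment; if it reports False for the venv, IDLE was most likely started by a different Python installation.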
In IDLE, create a new file (File > New File) and enter:
from urllib.request import urlopen
from bs4 import BeautifulSoup  # note the capital B and S
html = urlopen("")  # URL of the page to open
bso = BeautifulSoup(html.read(), "html.parser")  # create a BeautifulSoup object
print(bso.h1)  # find the 'h1' content in the page source
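The fetch above needs a real URL. To try just the parsing step offline, the sketch below hands BeautifulSoup an HTML string instead of a urlopen response; the sample markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# A small, made-up page standing in for the downloaded HTML
html_doc = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Hello, scraping</h1></body></html>"
)

bso = BeautifulSoup(html_doc, "html.parser")  # name the parser explicitly
print(bso.h1)             # the whole first <h1> tag: <h1>Hello, scraping</h1>
print(bso.h1.get_text())  # only the text inside it: Hello, scraping
```

Attribute access such as bso.h1 returns only the first matching tag; it returns None when the page has no such tag, which is why the error handling below matters.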

If errors of this kind come up, they can be handled as follows:

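from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

try:
    html = urlopen("")
except HTTPError as e:  # if an error is raised (the server returned an error status)
    print(e)  # then print the error
except URLError as e:  # if an error is raised (the server could not be reached)
    print(e)  # then print the error

def get_title(url):
    try:
        html = urlopen(url)
    except HTTPError as e:  # if an error is raised
        return None
    try:
        bso = BeautifulSoup(html.read(), "html.parser")
        title = bso.title
    except AttributeError as e:  # if an error is raised
        return None
    return title

url = ""
title = get_title(url)
if title is None:
    print("there is no title.")
else:
    print(title)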
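A variant of the title-fetching idea, sketched below, returns the title text and shows where AttributeError actually arises: bso.title is None on a page without a <title>, so calling .get_text() on it fails. To exercise both paths without a network connection, it assumes a data: URL (which urlopen can open) for the success case and a made-up URL scheme, which makes urlopen raise URLError. The function name get_title_text is hypothetical.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import urlopen

from bs4 import BeautifulSoup


def get_title_text(url):
    """Return the text of the page's <title>, or None if fetching or parsing fails."""
    try:
        html = urlopen(url)
    except HTTPError as e:  # the server answered with an error status (404, 500, ...)
        print("HTTP error:", e)
        return None
    except URLError as e:  # the server could not be reached, or the URL scheme is unsupported
        print("URL error:", e)
        return None
    bso = BeautifulSoup(html.read(), "html.parser")
    try:
        # bso.title is None when there is no <title>, so .get_text() raises AttributeError
        return bso.title.get_text()
    except AttributeError:
        return None


demo_page = "data:text/html," + quote("<html><head><title>Demo</title></head></html>")
print(get_title_text(demo_page))                 # Demo
print(get_title_text("nosuchscheme://example"))  # prints the URL error first, then None
```

HTTPError is a subclass of URLError, so it has to be caught first; otherwise the more general handler would swallow it.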

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("")
bso = BeautifulSoup(html, "html.parser")
print(bso)
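BeautifulSoup accepts either a string (html.read(), as in the first example) or a file-like object such as the urlopen response, which it reads itself. The sketch below uses io.StringIO as a stand-in for the response to show that both inputs produce the same tree; passing "html.parser" explicitly selects Python's built-in parser and avoids the warning bs4 emits when it has to guess one.

```python
import io

from bs4 import BeautifulSoup

doc = "<html><body><p>Same page, two inputs</p></body></html>"

from_string = BeautifulSoup(doc, "html.parser")
from_file = BeautifulSoup(io.StringIO(doc), "html.parser")  # stands in for the urlopen response

print(str(from_string) == str(from_file))  # True
print(from_file.prettify())  # an indented view of the parsed tree, easier to read than print(bso)
```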

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("")
bso = BeautifulSoup(html, "html.parser")
a_list = bso.find_all("div")  # uses the find_all method
for item in a_list:
    print(item.get_text())  # uses the get_text method
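The same find_all / get_text loop can be tried on an inline document, as sketched below with invented markup. Note that the method is spelled find_all: an all-lowercase findall is not a bs4 method and would be read as a search for a <findall> tag, returning None.

```python
from bs4 import BeautifulSoup

doc = """
<div>First block</div>
<div>Second block with <b>bold</b> text</div>
<p>   padded paragraph   </p>
"""

bso = BeautifulSoup(doc, "html.parser")

a_list = bso.find_all("div")  # every <div>, in document order
texts = [item.get_text() for item in a_list]
print(texts)  # ['First block', 'Second block with bold text']

# get_text() joins all nested strings; strip=True trims whitespace around each piece
print(repr(bso.p.get_text()), repr(bso.p.get_text(strip=True)))
```

get_text() pulls in the text of nested tags as well (the <b> above), which is convenient but also the reason nested containers can repeat text.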

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("")
bso = BeautifulSoup(html, "html.parser")
a_list = bso.find_all("div",

Always take the hierarchy of the tags (which elements are nested inside which) into account.
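Why the hierarchy matters can be seen with nested <div> tags: find_all searches every level, so an outer container matches alongside its children and its text repeats theirs. The sketch below uses invented markup; filtering by an attribute (class_ here, as an assumed example, since the original second argument is cut off) or limiting the search depth with recursive=False removes the duplicate.

```python
from bs4 import BeautifulSoup

doc = (
    '<div id="outer">'
    '<div class="item">A</div>'
    '<div class="item">B</div>'
    "</div>"
)
bso = BeautifulSoup(doc, "html.parser")

# find_all matches the outer <div> and both inner ones
all_divs = [d.get_text() for d in bso.find_all("div")]
print(all_divs)  # ['AB', 'A', 'B']: the outer div's text repeats its children's

# Restrict the search by attribute, or to direct children only
items = [d.get_text() for d in bso.find_all("div", class_="item")]
outer = bso.find("div", id="outer")
children = [d.get_text() for d in outer.find_all("div", recursive=False)]
print(items, children)  # ['A', 'B'] ['A', 'B']

# .parent walks back up the tree
print(bso.find("div", class_="item").parent["id"])  # outer
```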
