Learning Python Web Scraping

For example, to print the current time:
from datetime import datetime
print(datetime.now())
Alternatively, import the module itself:
import datetime
print(datetime.datetime.now())

********HTML structure********
html → <html><head>...</head><body>...</body></html>
— head → <head><title>A Useful Page</title></head>
— title → <title>A Useful Page</title>
— body → <body><h1>An Int...</h1><div>Lorem ip...</div></body>
— h1 → <h1>An Interesting Title</h1>
— div → <div>Lorem ipsum dolor...</div>
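The <h1> tag we extracted from the page is nested two levels deep in the BeautifulSoup object bsObj (html → body → h1). However, when we pull the h1 tag out of the object, we can call it directly: bsObj.h1
In fact, all of the following calls produce the same result:
bsObj.html.body.h1
bsObj.body.h1
bsObj.html.h1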
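The equivalence can be checked on a small page. This is a minimal sketch: SAMPLE_HTML is a hypothetical stand-in for the page outlined in the structure listing, parsed from a string instead of being fetched over the network.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page (an assumption, not the page used in the post).
SAMPLE_HTML = (
    "<html><head><title>A Useful Page</title></head>"
    "<body><h1>An Interesting Title</h1>"
    "<div>Lorem ipsum dolor...</div></body></html>"
)

bsObj = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Each expression walks a different path but ends at the first <h1> tag.
candidates = [bsObj.html.body.h1, bsObj.body.h1, bsObj.html.h1, bsObj.h1]
for tag in candidates:
    print(tag)
```

Attribute access such as bsObj.h1 searches the whole subtree for the first matching tag, which is why the intermediate levels can be skipped.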
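Fetching a page can fail in several ways, so the lookup is wrapped in a function that handles each case:
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:  # the page does not exist on the server (or fetching it failed)
        return None
    except (URLError, ValueError) as e:  # the server cannot be reached, or the URL is malformed
        return None
    try:
        bsObj = BeautifulSoup(html.read(), "html.parser")
        # bsObj.body is None when the page has no <body>, so .h1 raises AttributeError
        title = bsObj.body.h1
    except AttributeError as e:  # the page does not contain this content
        return None
    return title

title = getTitle("")  # the URL is blank in the source; supply a real address here
if title is None:
    print("Title could not be found")
else:
    print(title)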

# nameList holds the tags returned by an earlier bsObj.findAll(...) call
for name in nameList:
    print(name.get_text())

Previously, calling bsObj.tagName returned only the first matching tag on the page. Now, calling bsObj.findAll(tagName, tagAttributes) returns every matching tag on the page, not just the first one.
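The difference can be seen on a small page. This is a minimal sketch under stated assumptions: SAMPLE_HTML, the span tags, and the "green" class are hypothetical and do not come from the post. It uses find_all, the current spelling of the method that the post calls findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page with several <span> tags (an assumption).
SAMPLE_HTML = """
<html><body>
<span class="green">Anna</span>
<span class="red">not a name</span>
<span class="green">Boris</span>
</body></html>
"""

bsObj = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute access returns only the first <span> on the page.
first_span = bsObj.span
print(first_span.get_text())

# find_all returns every <span> whose class attribute is "green".
nameList = bsObj.find_all("span", {"class": "green"})
names = [name.get_text() for name in nameList]
for name in names:
    print(name)
```

The second argument is a dictionary of attributes to filter on; leaving it out returns every tag with the given name.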

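Regular expressions

When you start writing a regular expression, it is best to first write a list of steps describing the structure of the target string.

Pay attention to the details as well. For example, when you match phone numbers, will you account for country codes and extensions?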
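As an illustration, the step list can be written directly into the pattern as comments. This is a minimal sketch under stated assumptions: the phone format (an optional "+" country code, three groups of 3, 3, and 4 digits, and an optional "ext" or "x" extension) and the names PHONE_PATTERN and parse_phone are hypothetical, chosen only to show the approach.

```python
import re

# Each line of the verbose pattern corresponds to one step of the list.
PHONE_PATTERN = re.compile(
    r"""
    (?:\+(?P<country>\d{1,3})[\s-])?         # step 1: optional country code
    (?P<area>\d{3})[\s-]                     # step 2: three-digit area code
    (?P<prefix>\d{3})[\s-]                   # step 3: three-digit prefix
    (?P<line>\d{4})                          # step 4: four-digit line number
    (?:\s(?:ext\.?|x)\s?(?P<ext>\d{1,5}))?   # step 5: optional extension
    """,
    re.VERBOSE,
)


def parse_phone(text):
    """Return the named parts of a phone number, or None if it does not match."""
    match = PHONE_PATTERN.fullmatch(text.strip())
    return match.groupdict() if match else None


print(parse_phone("+1 555-123-4567 ext 89"))
print(parse_phone("555-123-4567"))
print(parse_phone("555-1234"))
```

Optional parts that are absent come back as None in the result, so the caller can tell whether a country code or extension was present.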
