"Data Collection and Web Crawlers" (《資料採集與網路爬蟲》): Data Parsing

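# Extract every city name from the page using plain string searching
import requests

url = ""  # the URL is missing from the source text
response = requests.get(url)
response.encoding = "gbk"  # the page is served in GBK encoding
html = response.text
# Sample of the page content (the surrounding markup did not survive in the source text):
#   石家莊市
#   唐山市

temp = html
str_begin = ""  # start marker (the markup before each city name); missing from the source text
str_end = ""    # end marker (the markup after each city name); missing from the source text
# Both markers must be filled in before running: with empty markers the loop below never ends.
list_city = []
while True:
    pos_begin = temp.find(str_begin)  # position of the start marker
    if pos_begin == -1:
        break
    # look for the end marker only after the start marker
    pos_end = temp.find(str_end, pos_begin + len(str_begin))
    if pos_end == -1:
        break
    city = temp[pos_begin + len(str_begin):pos_end]  # the text between the two markers
    list_city.append(city)  # add it to the list
    temp = temp[pos_end + len(str_end):]  # the next iteration searches after the end marker

# Cleaning: drop every '轄區' and '轄縣' entry.
# A list comprehension is used because removing items from a list while looping over it skips elements.
list_remove = ['轄區', '轄縣']
list_city = [city for city in list_city if city not in list_remove]
print(list_city)
print(len(list_city))  # the original post reports 362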
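
Since the URL and the two markers did not survive in the listing above, here is a minimal, self-contained sketch of the same find-and-slice loop run on an inline sample. The `<td>` tags, the sample string, and the helper name `extract_between` are assumptions made for illustration; the real page's markup is not known.

```python
def extract_between(text, start_marker, end_marker):
    """Return every substring found between start_marker and end_marker."""
    results = []
    pos = 0
    while True:
        begin = text.find(start_marker, pos)
        if begin == -1:
            break
        begin += len(start_marker)
        # search for the end marker only after the start marker
        end = text.find(end_marker, begin)
        if end == -1:
            break
        results.append(text[begin:end])
        pos = end + len(end_marker)  # continue after the end marker
    return results


# Hypothetical sample markup, not taken from the real page.
sample_html = "<td>石家莊市</td><td>轄區</td><td>唐山市</td>"
cities = extract_between(sample_html, "<td>", "</td>")

# Cleaning step: filter out the unwanted entries.
unwanted = {"轄區", "轄縣"}
cities = [c for c in cities if c not in unwanted]
print(cities)  # ['石家莊市', '唐山市']
```

The filter is written as a list comprehension rather than a call to `remove()` inside a `for` loop over the same list, because removing during iteration skips the element that follows each removal.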

Example 1:

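# Extract every city name from the page using a regular expression
import requests
import re  # Python's regular expression library

url = ""  # the URL is missing from the source text
response = requests.get(url)
response.encoding = "gbk"
html = response.text
# Sample of the page content (the surrounding markup did not survive in the source text):
#   石家莊市
#   唐山市

# The markup that surrounded the capture group in this pattern is missing from the source text.
list_city = re.findall("(.+?)", html)
# Note: the parentheses mark the part of the match to extract; the ? makes the match
# non-greedy, i.e. it matches as few characters as possible.

# Cleaning: drop every '轄區' and '轄縣' entry without mutating the list while looping over it.
list_remove = ['轄區', '轄縣']
list_city = [city for city in list_city if city not in list_remove]
print(list_city)
print(len(list_city))  # the original post reports 362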
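
To make the note about the capture group and the non-greedy `?` concrete, the sketch below runs a greedy and a non-greedy pattern over the same inline sample. The `<td>` tags and the sample string are assumptions for illustration only; they are not the real page's markup.

```python
import re

# Hypothetical sample markup, not taken from the real page.
sample_html = "<td>石家莊市</td><td>唐山市</td>"

# Greedy: .+ grabs as much as it can, so the group runs up to the LAST </td>.
greedy = re.findall("<td>(.+)</td>", sample_html)

# Non-greedy: .+? stops at the FIRST </td> that lets the pattern match.
lazy = re.findall("<td>(.+?)</td>", sample_html)

print(greedy)  # ['石家莊市</td><td>唐山市']
print(lazy)    # ['石家莊市', '唐山市']
```

Because `findall` returns only the contents of the capture group when the pattern has exactly one group, the tags themselves never appear in the non-greedy result.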

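Conclusion: extracting data with plain string searching is fairly tedious, whereas the regular-expression approach is comparatively simple.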

Example 2:

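# Extract every secondary college (department) from the page using a regular expression
import requests
import re  # Python's regular expression library

# 1. Fetch the HTML response
url = ""  # the URL is missing from the source text
response = requests.get(url)
response.encoding = "utf-8"
html = response.text

# 2. Narrow the search range: look only inside the div with id="jx"
str_begin = 'id="jx"'
str_end = ""  # end marker is missing from the source text; fill it in, or temp ends up empty
pos_begin = html.find(str_begin)
temp = html[pos_begin + len(str_begin):]
pos_end = temp.find(str_end)
temp = temp[:pos_end]
# Sample of the narrowed content (the surrounding markup did not survive in the source text):
#   機械工程學院

# 3. Regular-expression search
# The markup that surrounded the capture group in this pattern is missing from the source text.
list_department = re.findall(r"(.+?)", temp)
# Note: \) and \" stand for a literal parenthesis and a literal double quote. Parentheses are
# regex metacharacters; the double quote is escaped because it also delimits the Python string.
print(list_department)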
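
The two ideas in this example, narrowing the search to one `div` and escaping parentheses and double quotes in the pattern, can be tried without network access. The sketch below is only an illustration: the sample markup, the `onclick="show(...)"` attribute, and every department name other than 機械工程學院 are hypothetical, since the real page's structure is not shown.

```python
import re

# Hypothetical sample markup, not taken from the real page.
sample_html = (
    '<div id="nav"><a onclick="show(0)">首頁</a></div>'
    '<div id="jx"><a onclick="show(1)">機械工程學院</a>'
    '<a onclick="show(2)">電氣工程學院</a></div>'
    '<div id="footer"><a onclick="show(9)">聯絡我們</a></div>'
)

# Step 1: narrow the search range to the div with id="jx".
start_marker = 'id="jx"'
begin = sample_html.find(start_marker)
section = sample_html[begin + len(start_marker):]
section = section[:section.find("</div>")]  # cut at the first closing div

# Step 2: \( and \) match literal parentheses; \" matches a literal double quote
# and lets the quote sit inside a double-quoted raw string.
pattern = r"<a onclick=\"show\(\d+\)\">(.+?)</a>"
departments = re.findall(pattern, section)
print(departments)  # ['機械工程學院', '電氣工程學院']
```

Without the narrowing step the same pattern would also pick up the links in the navigation and footer blocks, which is why the search range is cut down first.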
