Scraping lawyers' phone numbers across the country with Python

[This article is by 天外歸雲]

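Lawyers' phone numbers from all over the country are collected from 64365**. The HTML pages are parsed with Python's lxml library. To obtain the XPath expressions and check that they are correct, install the Firebug and FirePath plug-ins in Firefox. The page content is as follows (the goal is to scrape "name + **"):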

The ** is as follows:

# coding:utf-8
import os

import lxml.html
import requests

# The site's base address is missing from this copy of the article; set it before running.
BASE_URL = ""


class myerror(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)


def get_lawyers_info(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    phones = html.xpath('//span[@class="law-tel"]')
    names = html.xpath('//div[@class="fl"]/p/a')
    # Names and phone numbers are paired by position, so the counts must match.
    if len(phones) != len(names):
        error = "lawyers amount are not equal to the amount of phone_nums: " + url
        raise myerror(error)
    phone_infos = [(name.text, phone.text_content()) for name, phone in zip(names, phones)]
    phone_infos_list = []
    for phone_info in phone_infos:
        if phone_info[1] == "":
            info = phone_info[0] + ": " + u"沒留**\r\n"
        else:
            info = phone_info[0] + ": " + phone_info[1] + "\r\n"
        print(info)
        # Collect each line so the caller can write it to the file.
        phone_infos_list.append(info)
    return phone_infos_list


def get_pages_num(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    # The text of the second-to-last pager link is read as the page count.
    result = html.xpath('//div[@class="u-page"]/a[last()-1]')
    if not result:
        return None
    pages_num = result[0].text
    if pages_num and pages_num.isdigit():
        return pages_num
    return None


def get_all_lawyers(cities):
    dir_path = os.path.abspath(os.path.dirname(__file__))
    print(dir_path)
    file_path = os.path.join(dir_path, "lawyers_info.txt")
    print(file_path)
    # Start from a clean file, since it is opened in append mode below.
    if os.path.exists(file_path):
        os.remove(file_path)
    with open(file_path, "ab") as file:
        for city in cities:
            pages_num = get_pages_num(BASE_URL + city + "/lawyer/page_1.aspx")
            if pages_num:
                for i in range(int(pages_num)):
                    url = BASE_URL + city + "/lawyer/page_" + str(i + 1) + ".aspx"
                    info = get_lawyers_info(url)
                    for each in info:
                        file.write(each.encode("gbk"))


if __name__ == '__main__':
    cities = ['beijing', 'shanghai', 'guangdong', 'guangzhou', 'shenzhen',
              'wuhan', 'hangzhou', 'ningbo', 'tianjin', 'nanjing', 'jiangsu',
              'zhengzhou', 'jinan', 'changsha', 'shenyang', 'chengdu',
              'chongqing', 'xian']
    get_all_lawyers(cities)
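The three XPath expressions carry most of the logic, so it can help to try them on a small offline snippet first. The sketch below is only an illustration: the markup in SAMPLE_PAGE is a hypothetical stand-in (the real page is not reproduced here), the helper names extract_pairs and extract_page_count are made up for this example, and the standard library's xml.etree.ElementTree is used in place of lxml so that it runs without any install. ElementTree needs well-formed markup and a leading "." in the path, whereas lxml accepts the "//..." form used in the script.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one listing page.
SAMPLE_PAGE = """
<html><body>
  <div class="fl"><p><a href="#">Lawyer A</a></p></div>
  <span class="law-tel">000-0000-0000</span>
  <div class="fl"><p><a href="#">Lawyer B</a></p></div>
  <span class="law-tel"></span>
  <div class="u-page">
    <a>1</a><a>2</a><a>3</a><a>Next</a>
  </div>
</body></html>
"""


def extract_pairs(root):
    """Pair each name link with the phone span at the same position."""
    names = root.findall(".//div[@class='fl']/p/a")
    phones = root.findall(".//span[@class='law-tel']")
    if len(names) != len(phones):
        raise ValueError("name count and phone count differ")
    # itertext() yields nothing for an empty span, so the phone becomes "".
    return [(n.text, "".join(p.itertext())) for n, p in zip(names, phones)]


def extract_page_count(root):
    """Read the second-to-last pager link as the number of pages."""
    links = root.findall(".//div[@class='u-page']/a[last()-1]")
    if links and links[0].text and links[0].text.isdigit():
        return int(links[0].text)
    return None


root = ET.fromstring(SAMPLE_PAGE)
pairs = extract_pairs(root)
page_count = extract_page_count(root)
print(pairs)
print(page_count)
```

In this sample the last pager link is a "Next" button, which is why the expression picks the link before it. If the real pager is laid out differently, the a[last()-1] step would need adjusting.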

The script crawls only the popular cities listed in `cities`, and the output is saved to the file "lawyers_info.txt" in the current directory.
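Because the script appends GBK-encoded bytes, the file has to be decoded as GBK when it is read back. The following sketch shows that round trip under stated assumptions: the two records are invented placeholders, format_line is a hypothetical helper, "(no number)" stands in for the script's own placeholder text, and a temporary directory is used so nothing real is overwritten.

```python
import os
import tempfile


def format_line(name, phone):
    """Build one 'name: phone' line, ending in CRLF as in the script."""
    shown = phone if phone else "(no number)"  # hypothetical placeholder text
    return name + ": " + shown + "\r\n"


# Invented sample records; the second one has no phone number.
records = [("张三", "000-0000-0000"), ("李四", "")]

path = os.path.join(tempfile.mkdtemp(), "lawyers_info.txt")

# Write: append GBK-encoded bytes, one line per record.
with open(path, "ab") as f:
    for name, phone in records:
        f.write(format_line(name, phone).encode("gbk"))

# Read back: decode the whole file as GBK, then split on line endings.
with open(path, "rb") as f:
    lines = f.read().decode("gbk").splitlines()

for line in lines:
    print(line)
```

Opening the file in a text editor that assumes UTF-8 would show garbled names, so the encoding is worth keeping in mind when inspecting the results.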
