[This article is by 天外歸雲]
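This post scrapes the phone numbers of lawyers across the country from 64365. It uses Python's lxml library to parse the HTML page content. To obtain the XPath expressions and check that they are correct, install the Firebug and FirePath add-ons in Firefox. On each listing page, the goal is to scrape "name + phone number".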
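As a minimal sketch of the parsing step described above, the snippet below runs the XPath queries against a small inline HTML string rather than a live page. The sample markup and the helper name extract_pairs are assumptions made up for illustration; only the two class names ("fl" and "law-tel") are taken from the script in this post.

```python
import lxml.html

# Hypothetical markup: the real page structure is not shown in this post.
SAMPLE_HTML = """
<html><body>
  <div class="fl"><p><a href="#">Lawyer A</a></p></div>
  <span class="law-tel">010-0000 0001</span>
  <div class="fl"><p><a href="#">Lawyer B</a></p></div>
  <span class="law-tel"></span>
</body></html>
"""


def extract_pairs(html_text):
    """Return (name, phone) tuples, pairing elements by document order."""
    doc = lxml.html.fromstring(html_text)
    names = doc.xpath('//div[@class="fl"]/p/a')
    phones = doc.xpath('//span[@class="law-tel"]')
    # text_content() flattens nested tags; an empty span yields "".
    return [(n.text, p.text_content().strip()) for n, p in zip(names, phones)]


pairs = extract_pairs(SAMPLE_HTML)
for name, phone in pairs:
    print(name, phone or "(no phone)")
```

Pairing by position only works when both queries return the same number of elements, which is why it is worth comparing the two lengths before zipping.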
The code is as follows. It crawls a set of popular cities and saves the results to the "lawyers_info.txt" file in the current directory:

# coding:utf-8
from lxml import etree
import requests, lxml.html, os

class myerror(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)

def get_lawyers_info(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    phones = html.xpath('//span[@class="law-tel"]')
    names = html.xpath('//div[@class="fl"]/p/a')
    # Names and phone elements are paired by position, so the counts must match
    if len(phones) == len(names):
        phone_infos = [(name.text, phone.text_content()) for name, phone in zip(names, phones)]
    else:
        error = "lawyers amount are not equal to the amount of phone_nums: " + url
        raise myerror(error)
    phone_infos_list = []
    for phone_info in phone_infos:
        if phone_info[1] == "":
            # The lawyer did not leave a phone number
            info = phone_info[0] + ": " + u"no phone number given\r\n"
        else:
            info = phone_info[0] + ": " + phone_info[1] + "\r\n"
        print(info)
        phone_infos_list.append(info)
    return phone_infos_list

def get_pages_num(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    # The text of the second-to-last pager link is taken as the page count
    result = html.xpath('//div[@class="u-page"]/a[last()-1]')
    pages_num = result[0].text
    if pages_num.isdigit():
        return pages_num

def get_all_lawyers(cities):
    dir_path = os.path.abspath(os.path.dirname(__file__))
    print(dir_path)
    file_path = os.path.join(dir_path, "lawyers_info.txt")
    print(file_path)
    # Remove the output of any previous run
    if os.path.exists(file_path):
        os.remove(file_path)
    #input()
    with open("lawyers_info.txt", "ab") as file:
        for city in cities:
            #file.write("city:"+city+"\n")
            #print city
            # NOTE: the URL prefix that precedes the city name is missing from the source text
            pages_num = get_pages_num("" + city + "/lawyer/page_1.aspx")
            if pages_num:
                for i in range(int(pages_num)):
                    url = "" + city + "/lawyer/page_" + str(i + 1) + ".aspx"
                    info = get_lawyers_info(url)
                    for each in info:
                        file.write(each.encode("gbk"))

if __name__ == '__main__':
    cities = ['beijing', 'shanghai', 'guangdong', 'guangzhou', 'shenzhen', 'wuhan',
              'hangzhou', 'ningbo', 'tianjin', 'nanjing', 'jiangsu', 'zhengzhou',
              'jinan', 'changsha', 'shenyang', 'chengdu', 'chongqing', 'xian']
    get_all_lawyers(cities)
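
The pagination logic in the script can be tried offline. The following is a rough sketch, assuming a pager whose second-to-last link carries the highest page number. The sample pager markup, the function names last_page_number and page_urls, and the base URL "http://example.com/" are all hypothetical; only the "u-page" class, the a[last()-1] expression, and the "/lawyer/page_N.aspx" path pattern come from the script.

```python
import lxml.html

# Hypothetical pager markup: three numbered links followed by a "Next" link.
PAGER_HTML = """
<html><body>
  <div class="u-page">
    <a href="page_1.aspx">1</a>
    <a href="page_2.aspx">2</a>
    <a href="page_3.aspx">3</a>
    <a href="page_2.aspx">Next</a>
  </div>
</body></html>
"""


def last_page_number(html_text):
    """Return the page count as an int, or None if it cannot be read."""
    doc = lxml.html.fromstring(html_text)
    links = doc.xpath('//div[@class="u-page"]/a[last()-1]')
    if not links:
        return None
    text = (links[0].text or "").strip()
    return int(text) if text.isdigit() else None


def page_urls(base_url, city, pages):
    """Build one listing URL per page; base_url is a placeholder assumption."""
    return ["%s%s/lawyer/page_%d.aspx" % (base_url, city, n)
            for n in range(1, pages + 1)]


total = last_page_number(PAGER_HTML)
urls = page_urls("http://example.com/", "beijing", total)
```

Returning None for a missing or non-numeric pager lets the caller skip a city instead of failing with an IndexError on an empty result list.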