XPath syntax in practice: crawling BOSS Zhipin job listings


1: Handling the URL

import urllib.request
from lxml import etree

def bo_url(url):
    # The headers dict, and the code that actually requests the page and parses
    # it with etree, were stripped when this post was published; a sketch of
    # that missing step follows just below.
    headers = {}
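Only the first line of bo_url survived publishing, so the following is a minimal sketch of what the missing fetch-and-parse step usually looks like. The User-Agent value and the XPath used to pick out the job nodes are assumptions; the class names are taken from the extraction loop in the next part.

import urllib.request
from lxml import etree

def bo_url(url):
    # Hypothetical headers: the original dict body is not in the post,
    # and a User-Agent is the usual minimum for this kind of request
    headers = {"User-Agent": "Mozilla/5.0"}
    request = urllib.request.Request(url, headers=headers)
    html = urllib.request.urlopen(request).read().decode("utf-8")
    tree = etree.HTML(html)
    # Assumed query for the job list; the relative paths below ('./li//...')
    # suggest each node handed to bo_spider() is a <ul> of job <li> cards
    bo_list = tree.xpath('//div[@class="job-list"]/ul')
    return bo_list

bo_spider() in the next part then walks over bo_list and pulls the individual fields out of each job card.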

2: Matching the data with XPath

def bo_spider(bo_list):
    # bo_list is the list of parsed job nodes returned by bo_url();
    # the control function in part 4 calls bo_spider(bo_url(url))
    bo_dict = {}
    for bo in bo_list:
        bo_dict['職位型別'] = bo.xpath('./li//h3//div[@class="job-title"]/text()')
        bo_dict['待遇'] = bo.xpath('./li//h3//span[@class="red"]/text()')
        bo_dict['發布時間'] = bo.xpath('./li//div[@class="info-publis"]//p/text()')
        bo_dict['公司名稱'] = bo.xpath('./li//div[@class="company-text"]//a/text()')
        bo_dict['地點'] = bo.xpath('./li//div[@class="info-primary"]//p/text()')
        print(bo_dict)
    return bo_dict

3: Saving the matched data with json

import json

def xiazai(bo_dict):
    # Serialize once; the original code dumped the result a second time when
    # writing, which stores an escaped JSON string instead of a JSON object.
    # ensure_ascii=False keeps the Chinese field names readable in the file.
    bo_list = json.dumps(bo_dict, ensure_ascii=False)
    with open("boos.json", 'a', encoding='utf-8') as fp:
        # one JSON record per line (a read-back check follows this function);
        # the with-block closes the file by itself, so fp.close() is not needed
        fp.write(bo_list + '\n')
    return bo_list
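A quick way to confirm what xiazai() actually appends to boos.json is to parse each line back with json.loads. This is only a usage sketch; it assumes the one-record-per-line layout written above.

import json

# Read back every record appended by xiazai(); one JSON object per line
with open("boos.json", encoding="utf-8") as fp:
    for line in fp:
        record = json.loads(line)
        print(record.get('公司名稱'), record.get('待遇'))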

4: The control function

def main():
    work = input("Enter the job title you want to crawl: ")
    # The base search URL was stripped when the post was published, so only the
    # query-string tail survives; the keyword is also spliced in without being
    # encoded, see the note on percent-encoding after this block
    url = "" + work + "%e7%88%ac%e8%99%ab&scity=101280600&industry=&position="
    # text = xiazai(bo_spider(bo_url(url)))
    text = bo_spider(bo_url(url))
    return text

if __name__ == '__main__':
    main()
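The hard-coded %e7%88%ac%e8%99%ab in the query string is simply the percent-encoded form of the word 爬虫 (crawler), while the keyword typed at the prompt is concatenated raw. A small sketch of encoding it with urllib.parse.quote; the keyword value is only an example, and the base address is left empty here as well because it is missing from the original post.

from urllib.parse import quote

work = "数据分析"  # hypothetical example keyword typed at the prompt

# percent-encode the Chinese keyword before splicing it into the query string
url = "" + quote(work) + "&scity=101280600&industry=&position="
print(url)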

5: The complete code

import json
import urllib.request
from lxml import etree

def bo_url(url):
    # The headers dict and the request/parse code are missing from the post;
    # see the sketch after part 1 for a reconstruction
    headers = {}

def bo_spider(bo_list):
    bo_dict = {}
    for bo in bo_list:
        bo_dict['職位型別'] = bo.xpath('./li//h3//div[@class="job-title"]/text()')
        bo_dict['待遇'] = bo.xpath('./li//h3//span[@class="red"]/text()')
        bo_dict['發布時間'] = bo.xpath('./li//div[@class="info-publis"]//p/text()')
        bo_dict['公司名稱'] = bo.xpath('./li//div[@class="company-text"]//a/text()')
        bo_dict['地點'] = bo.xpath('./li//div[@class="info-primary"]//p/text()')
        print(bo_dict)
    return bo_dict

def xiazai(bo_dict):
    bo_list = json.dumps(bo_dict, ensure_ascii=False)
    with open("boos.json", 'a', encoding='utf-8') as fp:
        fp.write(bo_list + '\n')
    return bo_list

def main():
    work = input("Enter the job title you want to crawl: ")
    # the base search URL is missing from the post
    url = "" + work + "%e7%88%ac%e8%99%ab&scity=101280600&industry=&position="
    text = xiazai(bo_spider(bo_url(url)))
    return text

if __name__ == '__main__':
    main()
