Scraping car detail pages from xcar (愛卡)

2021-09-20 02:41:07

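A scraping exercise built mainly on requests. Because the data has to be pulled out of the page's inline script, I matched it directly with re regular expressions. For ordinary markup, BeautifulSoup would probably be more convenient.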
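To illustrate why a regular expression is handy for script content, here is a minimal sketch that pulls a quoted value out of an inline script. The HTML fragment, the variable name nextUrl and the helper extract_js_string are all made up for this example and are not copied from xcar's pages.

```python
import re

# Hypothetical page fragment: the value we want lives inside <script>,
# where an HTML parser only sees one opaque block of text.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
var nextUrl = '/2674/2015/detail/2.htm';
</script>
</body></html>
"""


def extract_js_string(html, var_name):
    """Return the single-quoted string assigned to var_name, or None."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*'(?P<value>[^']*)'"
    match = re.search(pattern, html)
    return match.group('value') if match else None


print(extract_js_string(SAMPLE_HTML, 'nextUrl'))
```

Using `[^']*` instead of `.*` keeps the match inside one pair of quotes even when several quoted strings share a line.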

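Start from the first page of a model's gallery, a URL of the form /2674/2015/detail/1.htm on xcar.com.cn. To fetch everything, we always begin at the address ending in 1.htm.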

Then crawl all the remaining pages recursively, following each page's next-page link.
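The post walks the pages recursively. As an alternative sketch (not the author's method), the same traversal can be written as a loop, which avoids Python's recursion limit on long galleries and stops if the links ever loop back. The function crawl and the FAKE_SITE link table are hypothetical stand-ins for real HTTP requests.

```python
def crawl(start, fetch_next):
    """Follow next-page links from start until they run out or repeat."""
    visited = []
    seen = set()
    url = start
    while url and url not in seen:
        seen.add(url)
        visited.append(url)
        url = fetch_next(url)  # returns the next URL, or None at the end
    return visited


# Hypothetical link table: page -> next page. The last page points back
# to the first, a case in which unbounded recursion would never stop.
FAKE_SITE = {
    '/2674/2015/detail/1.htm': '/2674/2015/detail/2.htm',
    '/2674/2015/detail/2.htm': '/2674/2015/detail/3.htm',
    '/2674/2015/detail/3.htm': '/2674/2015/detail/1.htm',
}

pages = crawl('/2674/2015/detail/1.htm', FAKE_SITE.get)
print(pages)
```

In a real crawler, fetch_next would download the page and extract its next-page link.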

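One extra thing I did here: an xcar_lst that records every page's URLs and related information. It is kept only as a record and is not used for anything yet.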
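One possible use for such a record, sketched under assumptions: collect one dictionary per page and serialize the list as JSON at the end of a run. The field names follow the script's own dict (pic_url, next_url, title, cont, model). The helper record_page and the sample values are invented for illustration.

```python
import json


def record_page(records, pic_url, next_url, title, cont, model):
    """Append one page's details to records and return the new entry."""
    entry = dict(pic_url=pic_url, next_url=next_url,
                 title=title, cont=cont, model=model)
    records.append(entry)
    return entry


records = []
# Sample values are placeholders, not real xcar data.
record_page(records,
            pic_url='http://example.com/img/0001.jpg',
            next_url='/2674/2015/detail/2.htm',
            title='sample title', cont='sample caption', model='sample model')

# ensure_ascii=False keeps Chinese titles readable in the output.
dumped = json.dumps(records, ensure_ascii=False, indent=2)
print(dumped)
```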

# coding: utf-8

__author__ = 'bonfy chen'

import re

import requests

proxies = None
headers = {}
base_folder = 'd:/***_folder/'


class XcarDown(object):
    """Download every picture of one model's gallery on xcar.com.cn."""

    _base_folder = None
    _proxies = None
    _headers = None
    # Site root that relative next-page links are joined to (empty in this listing).
    _website = ''

    def set_base_folder(self, base_folder):
        self._base_folder = base_folder

    def set_headers(self, headers):
        self._headers = headers

    def set_proxies(self, proxies):
        self._proxies = proxies

    def __init__(self, base_folder=base_folder, proxies=proxies, headers=headers):
        self.set_base_folder(base_folder)
        self.set_headers(headers)
        self.set_proxies(proxies)
        # Per-instance record of every page visited (kept for reference only).
        self._xcar_lst = []

    def download_image_from_url(self, url, name=None):
        """download_image_from_url

        :param url: the resource image url
        :param name: the destination file name
        :return: the local file name
        """
        basename = url.split('/')[-1]
        local_filename = name + '_' + basename if name else basename
        r = requests.get(url, proxies=self._proxies, headers=self._headers, stream=True)
        # Stream the body to disk in 1 KB chunks; the with block closes the file.
        with open(self._base_folder + local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
        return local_filename

    def download_xcar(self, url):
        """
        :param url: the source url in xcar.com.cn
            /2674/2015/detail/1.htm
        :return:
        """
        r = requests.get(url, proxies=self._proxies, headers=self._headers)
        # print(r.encoding)
        r.encoding = 'gbk'

        m1 = re.search(r"var nexturl = '(?P<n_url>.*\.htm)'", r.text)
        next_url = m1.groupdict()['n_url'] if m1 else None

        # The HTML markup inside the next four patterns was stripped from this
        # listing; "..." marks the lost parts, to be restored from the page source.
        # The group name pic_url is assumed from the variable it feeds.
        m2 = re.search(r"...(?P<pic_url>.*)...", r.text)
        pic_url = m2.groupdict()['pic_url'] if m2 else None
        m3 = re.search(r"...(?P<title>.*)...", r.text)
        title = m3.groupdict()['title'] if m3 else ''
        m4 = re.search(r"...(?P<cont>.*)...", r.text)
        cont = m4.groupdict()['cont'] if m4 else ''
        m5 = re.search(r"...(?P<model>.*)...", r.text)
        model = m5.groupdict()['model'] if m5 else ''

        if pic_url:
            try:
                self.download_image_from_url(pic_url, name='_'.join([model, title, cont]))
                print('download complete: pic from {} '.format(pic_url))
            except IOError:
                # The combined name was not a valid file name; retry with the model only.
                print('file name ioerror')
                self.download_image_from_url(pic_url, name=model)
                print('download complete: pic from {} '.format(pic_url))
            except Exception as e:
                print(e)

        dct = dict(pic_url=pic_url, next_url=next_url, title=title, cont=cont, model=model)
        self._xcar_lst.append(dct)

        # Recurse into the next page; stop when no next link was found.
        if next_url and next_url.endswith('.htm'):
            self.download_xcar(self._website + next_url)


if __name__ == '__main__':
    print("welcome to the pic download for xcar.com")
    print("downloaded files in the folder: " + base_folder)
    print("---------------------------------------")
    id_modell = int(input("please enter the modell id(eg.2674): "))
    year = int(input("please enter the year (eg.2015): "))
    # The site root is missing from this URL in the listing; requests needs a full http(s) URL.
    url = '/{}/{}/detail/1.htm'.format(id_modell, year)
    xcar = XcarDown()
    xcar.download_xcar(url)
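The script builds each file name from the model, title and caption, and falls back to the model alone when opening the file raises IOError. A possible alternative, sketched here as an assumption rather than as part of the original script, is to clean the name before opening the file. The helper safe_filename, its fallback value and its length cap are hypothetical.

```python
import re


def safe_filename(parts, fallback='image', max_length=100):
    """Join parts with '_' and replace characters Windows rejects in file names."""
    joined = '_'.join(p.strip() for p in parts if p and p.strip())
    # Collapse runs of forbidden characters and whitespace into one underscore.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', joined).strip('_')
    return cleaned[:max_length] or fallback


print(safe_filename(['Model X', 'front/left', 'a:b?']))
print(safe_filename(['', None]))
```

With a helper like this, the retry in the IOError branch would only be needed for failures unrelated to the name, such as a missing target folder.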
