1. Configuring Scrapy debugging
Create a new Python file main.py under the project directory to debug the project (you can also debug with pdb).
main.py
from scrapy.cmdline import execute
import sys
import os

# Put the project directory on sys.path so Scrapy can locate the spider
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

# Equivalent to running "scrapy crawl buycar" from the command line
execute(["scrapy", "crawl", "buycar"])
2. Set ROBOTSTXT_OBEY to False
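In settings.py this corresponds to the following line (the default project template generates it set to True):

# settings.py
# Do not filter requests based on robots.txt rules
ROBOTSTXT_OBEY = False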
3. Get the xsrf token
get_xsrf()

4. Login code (excerpt):
        print('Login succeeded')
    else:
        print('Login failed')

zhihu_login('18328020353', '*****')
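The fragment above is only the tail of the login routine. A minimal sketch of what get_xsrf() and zhihu_login() could look like, assuming a requests session and that the sign-in page embeds the token as a hidden _xsrf field; the URLs, form field names, and success check below are illustrative assumptions, not the original code:

import re
import requests

session = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0'}

def get_xsrf():
    # Assumed: the token sits in a hidden input named _xsrf on the sign-in page
    resp = session.get('https://www.zhihu.com/#signin', headers=headers)
    match = re.search(r'name="_xsrf" value="(.*?)"', resp.text)
    return match.group(1) if match else ''

def zhihu_login(phone, password):
    # Assumed endpoint and form field names for a phone-number login
    post_data = {
        '_xsrf': get_xsrf(),
        'phone_num': phone,
        'password': password,
    }
    resp = session.post('https://www.zhihu.com/login/phone_num',
                        data=post_data, headers=headers)
    if resp.status_code == 200 and resp.json().get('r') == 0:
        print('Login succeeded')
    else:
        print('Login failed')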
Notes on using WebDriver XPath
Getting an attribute value
dr = driver.find_element_by_id('tooltip')
dr.get_attribute('data-original-title')  # get the content of the tooltip
The only thing we can be certain of is that the text 'profile' will always be contained in this image's src, so we can use that hint in the XPath as follows:
web.find_element_by_xpath(".//*[@class='login-content']/form/button/img[contains(@src,'profile')]").click()
Common locator methods (a short usage example follows the list):
find_element_by_name
find_element_by_id
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
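A quick sketch of how a few of these locators are used in practice; the driver setup, URL, and element names below are made up for illustration:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://example.com/login')

driver.find_element_by_id('username').send_keys('user')        # locate by id
driver.find_element_by_name('password').send_keys('secret')    # locate by name
driver.find_element_by_css_selector('button.submit').click()   # locate by CSS selector
driver.quit()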
5. Handling the Zhihu captcha
from selenium import webdriver
from scrapy.http import HtmlResponse
import time

class JSMiddleware(object):
    def process_request(self, request, spider):
        web = webdriver.Chrome("e:/software/python3.6/chromedriver.exe")
        try:
            if spider.name == "douyuimage":
                # self.web.get(request.url)
                web.get(request.url)
                time.sleep(3)
                body = web.page_source
                print("Visiting: {}".format(request.url))
                print("^" * 50)
                return HtmlResponse(url=web.current_url, body=body,
                                    encoding="utf-8", request=request)
        except Exception as e:
            print(e)
            print("webdriver failed")
            return None
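For the middleware to run, it also has to be enabled in the project settings; a sketch, assuming the class above lives in middlewares.py of a project package named douyu (both names are illustrative):

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'douyu.middlewares.JSMiddleware': 543,
}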
If a SQL statement built in Python for MySQL contains Chinese text, be sure to build it with format:
sql = 'select id from question where user = "{}" and q_title = "{}"'.format(item['q_user'], item['q_title'])
Write it exactly like this, and remember that the {} placeholders are wrapped in double quotes inside the SQL string!
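A minimal sketch of that statement in context, assuming pymysql and made-up connection parameters and data; a parameterized cursor.execute(sql, params) would sidestep the quoting issue, but the formatted string mirrors the approach described above:

import pymysql

conn = pymysql.connect(host='localhost', user='root', password='root',
                       db='zhihu', charset='utf8mb4')
cursor = conn.cursor()

item = {'q_user': '张三', 'q_title': 'Python 爬蟲怎麼入門?'}
sql = 'select id from question where user = "{}" and q_title = "{}"'.format(
    item['q_user'], item['q_title'])

cursor.execute(sql)
row = cursor.fetchone()
conn.close()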