A Universal Method for Simulated Login in Python

This article is a repost, copied here only to make it easier to study. Thanks to the original author for sharing.

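Simulated login in Python gives many people a headache, so here is a universal way to log in. You do not need to be fluent in HTML, or even in Python, yet you can still log in successfully. The method described here works for any website; it is not limited to Weibo and Zhihu, which serve only as examples.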

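The libraries used are "selenium" and "requests". Selenium performs the login, the cookies are then passed to requests, and requests does the actual scraping of the website. This avoids selenium's slow scraping speed (it is used only for logging in) and also avoids the tedious work of building cookies by hand for a requests login (they are taken directly from selenium). The first part of the article lists the steps and the code; the second part adds worked examples for logging in to Weibo and Zhihu.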

------------Start---------

Import the selenium library

from selenium import webdriver

Specify where the browser driver is stored on your computer. Mine, for example, is on the D: drive.

chromepath = r'd:\python program\chromedriver.exe'

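Use selenium's webdriver to point to the driver's path and open a browser at the same time. Several browsers can be automated this way, for example Firefox and Safari; this article uses Google Chrome. Note: the 'C' in '.Chrome' is uppercase.

wd = webdriver.Chrome(executable_path=chromepath)  # Selenium 3 style; Selenium 4 passes the path through a Service object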

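After opening the login page with wd.get(), let webdriver fill in the username and password for you

# find_element_by_xpath is the Selenium 3 API; Selenium 4 uses find_element(By.XPATH, ...)
wd.find_element_by_xpath('XPath of the username field').send_keys('username')
wd.find_element_by_xpath('XPath of the password field').send_keys('password')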

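Have webdriver click the login control: use click() if it is a button, or submit() if it is a form.

wd.find_element_by_xpath('XPath of the login button').click()  # if it is a button
wd.find_element_by_xpath('XPath of the login button').submit()  # if it is a form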

Login is now complete. All the cookies are held in 'wd' and can be retrieved at any time.

Import the requests library and create a session

import requests
req = requests.session()

Retrieve the cookies from 'wd'

cookies = wd.get_cookies()

Convert the selenium-style cookies into cookies that requests can use.

for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])
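The loop above copies only each cookie's name and value. As a minimal sketch, assuming the dictionaries returned by wd.get_cookies() also carry 'domain' and 'path' keys (they usually do, but treat that as an assumption), the helper below keeps those two fields so that requests sends each cookie only to the site that set it. The function name cookie_kwargs and the sample cookie are hypothetical.

```python
def cookie_kwargs(selenium_cookie):
    """Build keyword arguments for req.cookies.set() from one Selenium cookie dict."""
    kwargs = {
        'name': selenium_cookie['name'],
        'value': selenium_cookie['value'],
    }
    # Copy domain and path only when Selenium reported them.
    for key in ('domain', 'path'):
        if selenium_cookie.get(key):
            kwargs[key] = selenium_cookie[key]
    return kwargs


# Made-up cookie in the shape returned by wd.get_cookies().
sample = {'name': 'sid', 'value': 'abc123', 'domain': '.example.com',
          'path': '/', 'secure': True, 'httpOnly': True}
print(cookie_kwargs(sample))
```

With a live browser and session, the loop would then read: for cookie in wd.get_cookies(): req.cookies.set(**cookie_kwargs(cookie)).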

All done! Try fetching a page with requests.

req.get('URL to test')

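That is the universal method for simulated login in Python. You do not need to analyze the cookies sent to the website; you only need to tell Python where to fill in the username and password, which is very convenient.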

import requests
from selenium import webdriver
chromepath = r'path where the browser driver is stored'
wd = webdriver.Chrome(executable_path=chromepath)  # create the browser
loginurl = ''
wd.get(loginurl)  # open the login page
wd.find_element_by_xpath('//*[@id="loginname"]').send_keys('userword')  # enter the username
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[2]/div/input').send_keys('password')  # enter the password
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[6]/a').click()  # click login
req = requests.session()  # create the session
cookies = wd.get_cookies()  # export the cookies
for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])  # convert the cookies
test = req.get('URL to test')

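A few key steps explained:

1. Finding the location. I recommend using Google Chrome to look up each element's XPath; see: Getting an XPath from Chrome.

2. Choosing between click() and submit(). Try each one; one of them will work.

3. What if Weibo asks for a captcha at login? Weibo sometimes requires a captcha when you log in. In that case we can add one line of code that lets you type the captcha in by hand. For example: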

wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[6]/a').click()  # click login
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[3]/div/input').send_keys(input("Enter the captcha: "))
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[6]/a').click()  # click login again

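When a captcha is required, login has to be clicked twice, because the captcha input box only appears after the first click! Applying selenium flexibly to suit each website is very important, but compared with analyzing all those cookies it is child's play.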

import time
import requests
from selenium import webdriver
chromepath = r'path where the browser driver is stored'
wd = webdriver.Chrome(executable_path=chromepath)
time.sleep(45)  # sleep for 45 seconds to allow a manual login; this is essential
cookies = wd.get_cookies()  # retrieve the cookies
req = requests.session()
for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])
req.headers.clear()  # clear the default headers; explained below
test = req.get('URL to test')

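req.headers.clear() removes the information in the session's default headers that marks the client as a Python bot. Some websites (Zhihu, for example) detect this information, and the logged-in crawl then fails. Be sure to remove it!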
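The script above waits a fixed 45 seconds for the manual login. A possible refinement, sketched here under assumptions, is to poll until a cookie that only exists after login shows up, so the script continues as soon as you finish. The helper wait_for_cookie and the cookie name 'login_token' are hypothetical; which cookie marks a logged-in session differs from site to site and has to be found by inspecting the browser.

```python
import time


def wait_for_cookie(get_cookies, name, timeout=45.0, interval=1.0):
    """Poll get_cookies() until a cookie called `name` appears.

    get_cookies is any zero-argument callable returning a list of dicts
    with a 'name' key, such as the bound method wd.get_cookies.
    Returns the cookie list on success, or None if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        cookies = get_cookies()
        if any(cookie.get('name') == name for cookie in cookies):
            return cookies
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for wd.get_cookies: the cookie "appears" on the third poll.
calls = {'count': 0}


def fake_get_cookies():
    calls['count'] += 1
    if calls['count'] < 3:
        return []
    return [{'name': 'login_token', 'value': 'xyz'}]


result = wait_for_cookie(fake_get_cookies, 'login_token', timeout=5.0, interval=0.01)
print(result)
```

In the script, cookies = wait_for_cookie(wd.get_cookies, 'login_token') would take the place of the time.sleep(45) line and the wd.get_cookies() call that follows it.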

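Thanks for reading this far. The lazy method mentioned at the start of the article is the semi-manual one I used to log in to Zhihu. Do not think less of it, though: our goal is to scrape the website's content, so the right course is to get past the login as quickly as possible and start scraping. This method gets you logged in fast and saves a great deal of time. It is universal because it drives a real browser, so any website the browser can reach under normal conditions can be logged in to this way.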

Question 2: How can a newly opened webdriver carry cookies that were saved earlier?

from selenium import webdriver
from requests import session
from time import sleep
req = session()
req.headers.clear()
chromepath = r'd:\python program\chromedriver.exe'
wd = webdriver.Chrome(executable_path=chromepath)
zhihuloginurl = ''
wd.get(zhihuloginurl)
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/div[1]/div[1]/div[2]/span').click()
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/form/div[1]/div[1]/input').send_keys('username')
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/form/div[1]/div[2]/input').send_keys('password')
sleep(10)  # enter the captcha by hand
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/form/div[2]/button').submit()
sleep(10)  # wait for the cookies to load
cookies = wd.get_cookies()
for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])
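On the question of reusing saved cookies, one possible approach is to write the list from wd.get_cookies() to a JSON file and read it back in a later run. This is a sketch rather than the author's method: the function names and the file name are hypothetical. The Selenium side appears only in comments; it relies on wd.add_cookie(), which takes one cookie dictionary at a time and requires the browser to already be on a page of the cookie's domain.

```python
import json
import os
import tempfile


def save_cookies(cookies, path):
    """Write a list of cookie dicts (as returned by wd.get_cookies()) to a JSON file."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(cookies, f)


def load_cookies(path):
    """Read the cookie list back from the JSON file."""
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


# Round trip with a made-up cookie standing in for wd.get_cookies().
saved = [{'name': 'sid', 'value': 'abc123', 'domain': '.example.com', 'path': '/'}]
cookie_file = os.path.join(tempfile.mkdtemp(), 'cookies.json')
save_cookies(saved, cookie_file)
restored = load_cookies(cookie_file)
print(restored == saved)

# In a later run, with a freshly opened webdriver `wd`:
#     wd.get(loginurl)              # visit the site first so the domain matches
#     for cookie in load_cookies(cookie_file):
#         wd.add_cookie(cookie)
#     wd.refresh()                  # reload so the page sees the cookies
```

Saved cookies expire, so a login through the browser is still needed whenever the site stops accepting them.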
