Simulating Login and Scraping GitHub

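The code given by Cui ran with errors, so I modified and simplified it slightly.

This is an example from the book, so I won't explain it further.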

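    import requests
    from lxml import etree


    class Login:  # the class name is missing in the source; Login is a placeholder
        def __init__(self):
            self.headers = {}  # request headers (values missing in the source)
            # Login page URL (missing in the source)
            self.login_url = ''
            # URL for the POST request (missing in the source)
            self.post_url = ''
            # Use a session to keep state and handle cookies automatically
            # (stays logged in while visiting other pages of the site)
            self.session = requests.Session()

        def token(self):
            # Fetch the login page
            response = self.session.get(self.login_url, headers=self.headers)
            # Extract the authenticity_token we need from the page and return it
            selector = etree.HTML(response.text)
            # xpath() returns a list of matching attribute values
            token = selector.xpath('//input[@name="authenticity_token"]/@value')
            return token

        def login(self, email, password):
            post_data = {}  # form fields (missing in the source)
            # Simulate the login with a POST request
            response = self.session.post(self.post_url, data=post_data, headers=self.headers)
            # On success, print the source of the page returned after login
            # and save it to d:/github.txt
            if response.status_code == 200:
                print(response.text)
                with open('d:/github.txt', 'w', encoding='utf-8') as f:
                    f.write(response.text)
            else:
                print("error!!!")


    if __name__ == "__main__":
        login = Login()
        # Enter your own account and password
        login.login(email='1024593536@qq.com', password='password')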
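The token extraction step can be tried offline without touching GitHub. The following is only a minimal sketch and not the code from the book: it swaps lxml's XPath for the standard library's `html.parser` so that it runs without third-party packages. `TokenParser`, `extract_token`, and `SAMPLE_HTML` are hypothetical names and data made up for illustration; a real page would come from `session.get()`.

```python
from html.parser import HTMLParser


class TokenParser(HTMLParser):
    """Collects the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "authenticity_token":
            # Keep only the first match
            if self.token is None:
                self.token = attributes.get("value")


def extract_token(html):
    """Return the authenticity_token value found in html, or None."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


# Hypothetical, hand-written login form (assumption, not GitHub's real markup)
SAMPLE_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
"""

token = extract_token(SAMPLE_HTML)
print(token)  # abc123
```

Unlike `xpath()`, which returns a list of matches, this helper returns a single string, or `None` when the page contains no such field.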

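The saved file can be changed to web page form (for example, by giving it an .html extension) so that it can be viewed in a browser.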
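One way to get the web page form directly is to write the response with an .html extension from the start. This is a rough sketch under assumptions: `save_as_html` is a hypothetical helper, `page_source` stands in for `response.text`, and the system temp directory replaces the `d:/` path so the example runs on any machine.

```python
import tempfile
from pathlib import Path


def save_as_html(page_source, path):
    """Write page_source to path as UTF-8 and return the resulting Path."""
    target = Path(path)
    target.write_text(page_source, encoding="utf-8")
    return target


# Hypothetical stand-in for response.text
page_source = "<html><body><h1>Signed in</h1></body></html>"

output = save_as_html(page_source, Path(tempfile.gettempdir()) / "github.html")
print(output.name)  # github.html
```

The file can then be opened by double-clicking it, or from Python with the standard library call `webbrowser.open(output.as_uri())`.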
