Scraping Weibo data with Python

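# Import the required modules
import urllib.request
import json

# Weibo user ID of the high-profile ("big V") account whose posts will be crawled
id = '1259110474'

# Set the proxy IP (host:port)
proxy_addr = "122.241.72.191:808"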

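# Page-opening function: fetch a URL through the proxy and return the body as text.
# The body of this function is missing from the post; this is a minimal urllib.request version.
def use_proxy(url, proxy_addr):
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    proxy = urllib.request.ProxyHandler({"http": proxy_addr, "https": proxy_addr})
    opener = urllib.request.build_opener(proxy)
    with opener.open(req, timeout=10) as resp:
        return resp.read().decode("utf-8", "ignore")


# Get the containerid of the user's Weibo tab; it is needed when crawling the posts
def get_containerid(url):
    data = use_proxy(url, proxy_addr)
    content = json.loads(data).get('data')
    for tab in content.get('tabsInfo').get('tabs'):
        if tab.get('tab_type') == 'weibo':
            return tab.get('containerid')
    return None  # no Weibo tab was found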

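# Fetch the user's basic profile information and print it
def get_userinfo(id):
    url = '' + id  # the API base URL is elided in the original post
    data = use_proxy(url, proxy_addr)
    content = json.loads(data).get('data')
    userinfo = content.get('userInfo')
    profile_image_url = userinfo.get('profile_image_url')
    description = userinfo.get('description')
    profile_url = userinfo.get('profile_url')
    verified = userinfo.get('verified')
    guanzhu = userinfo.get('follow_count')     # number of accounts followed
    name = userinfo.get('screen_name')
    fensi = userinfo.get('followers_count')    # number of followers
    gender = userinfo.get('gender')
    urank = userinfo.get('urank')              # user level
    print("Nickname: " + str(name))
    print("Profile URL: " + str(profile_url))
    print("Avatar URL: " + str(profile_image_url))
    print("Verified: " + str(verified))
    print("Description: " + str(description))
    print("Following: " + str(guanzhu))
    print("Followers: " + str(fensi))
    print("Gender: " + str(gender))
    print("User level: " + str(urank))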

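# Crawl the user's posts page by page and append them to `file`
def get_weibo(id, file):
    url = '' + id  # the API base URL is elided in the original post
    # The containerid is the same for every page, so look it up only once
    containerid = get_containerid(url)
    if containerid is None:
        print("No Weibo tab found for user " + id)
        return
    i = 1
    while True:
        weibo_url = '' + id + '&containerid=' + containerid + '&page=' + str(i)
        try:
            data = use_proxy(weibo_url, proxy_addr)
            content = json.loads(data).get('data')
            cards = content.get('cards') or []
            if len(cards) == 0:
                break  # an empty page means all posts have been fetched
            for j in range(len(cards)):
                print("-----Crawling page " + str(i) + ", post " + str(j) + "------")
                card_type = cards[j].get('card_type')
                # Only type-9 cards are handled as posts; other card types are skipped
                if card_type == 9:
                    mblog = cards[j].get('mblog')
                    attitudes_count = mblog.get('attitudes_count')  # likes
                    comments_count = mblog.get('comments_count')
                    created_at = mblog.get('created_at')
                    reposts_count = mblog.get('reposts_count')
                    scheme = cards[j].get('scheme')  # link to the post
                    text = mblog.get('text')
                    with open(file, 'a', encoding='utf-8') as fh:
                        fh.write("----Page " + str(i) + ", post " + str(j) + "----" + "\n")
                        fh.write("Post URL: " + str(scheme) + "\n")
                        fh.write("Posted at: " + str(created_at) + "\n")
                        fh.write("Text: " + str(text) + "\n")
                        fh.write("Likes: " + str(attitudes_count) + "\n")
                        fh.write("Comments: " + str(comments_count) + "\n")
                        fh.write("Reposts: " + str(reposts_count) + "\n\n")
            i += 1
        except Exception as e:
            # Report the error and stop, instead of retrying the same page forever
            print(e)
            break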

if __name__ == "__main__":
    file = id + ".txt"
    get_userinfo(id)
    get_weibo(id, file)

[Screenshot: run results]

[Screenshot: contents of the output text file]
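The JSON handling in get_containerid() and the paging loop in get_weibo() can be tried offline, without a proxy or network access. The following is a minimal sketch under stated assumptions: the sample responses are invented and heavily trimmed, the field names mirror the ones the script reads (assuming the camel-cased tabsInfo key), and find_containerid, fake_fetch and crawl are hypothetical helper names, with fake_fetch standing in for use_proxy().

```python
import json

# Invented, trimmed-down profile response (real responses carry many more fields).
PROFILE_JSON = json.dumps({
    "data": {
        "tabsInfo": {"tabs": [
            {"tab_type": "profile", "containerid": "100505demo"},
            {"tab_type": "weibo", "containerid": "107603demo"},
        ]},
    }
})

# Invented post pages; any page number not listed here returns no cards.
PAGES = {
    1: {"data": {"cards": [
        {"card_type": 9, "scheme": "post-1",
         "mblog": {"text": "first post", "attitudes_count": 3}},
        {"card_type": 11},  # a card of another type: skipped by the filter
    ]}},
    2: {"data": {"cards": [
        {"card_type": 9, "scheme": "post-2",
         "mblog": {"text": "second post", "attitudes_count": 5}},
    ]}},
}


def find_containerid(profile_text):
    """Return the containerid of the 'weibo' tab, or None if there is none."""
    tabs = json.loads(profile_text)["data"]["tabsInfo"]["tabs"]
    for tab in tabs:
        if tab.get("tab_type") == "weibo":
            return tab.get("containerid")
    return None


def fake_fetch(page):
    """Hypothetical stand-in for use_proxy(): the JSON text for one page."""
    return json.dumps(PAGES.get(page, {"data": {"cards": []}}))


def crawl(fetch):
    """Walk pages 1, 2, ... until a page has no cards; keep card_type 9 posts."""
    posts = []
    page = 1
    while True:
        cards = json.loads(fetch(page))["data"].get("cards") or []
        if not cards:
            break  # empty page: stop paging
        for card in cards:
            if card.get("card_type") == 9:
                mblog = card["mblog"]
                posts.append((page, card.get("scheme"), mblog.get("text")))
        page += 1
    return posts


print(find_containerid(PROFILE_JSON))  # → 107603demo
for post in crawl(fake_fetch):
    print(post)
```

Running it prints the weibo tab's containerid, then one tuple per type-9 card: `(1, 'post-1', 'first post')` and `(2, 'post-2', 'second post')`. Passing the fetch function in as a parameter keeps the paging logic testable; in the real script the same role is played by use_proxy() with the proxy address.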
