Web Scraping: Crawling Weibo

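3. I had forgotten that a function containing yield is a generator function. Calling it returns a generator, and the point of that generator is that it can be used as an iterable object.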
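Here is a toy sketch of that behavior. The name count_up is made up and is not part of the crawler. Calling a generator function runs none of its body. Each next() call, or each pass of a for loop, resumes it up to the next yield.

```python
def count_up(n):
    """Yield 1..n one value at a time (toy example, not from the original post)."""
    for i in range(1, n + 1):
        yield i


gen = count_up(3)   # nothing in the body has run yet; gen is a generator object
first = next(gen)   # runs until the first yield and returns 1
rest = list(gen)    # consumes the remaining values: [2, 3]
print(first, rest)  # 1 [2, 3]
```

A generator can be consumed only once. After list(gen) it is exhausted, so a second pass yields nothing.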

Source code:

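import requests
from urllib.parse import urlencode
from pyquery import PyQuery as pq

# The API URL and the request headers were elided in the original post; fill them in before running.
base_url = ''
header = {}


def get_page(page):
    # Hypothetical query parameters: the contents of the original dict were lost.
    params = {'page': page}
    url = base_url + urlencode(params)
    try:
        r = requests.get(url, headers=header, timeout=10)
        if r.status_code == 200:
            return r.json()
    except (requests.RequestException, ValueError) as e:
        print('fail:', e)
    return None


def parse_json(data):
    if not data:
        return
    # The parsed JSON is a dict, so .get() works on it directly,
    # but nested fields have to be fetched one level at a time.
    items = data.get('cards') or []
    for item in items:
        mblog = item.get('mblog')
        if not mblog:  # some cards carry no post; skip them
            continue
        html = mblog.get('text') or ''
        # Build a fresh dict per post so earlier results are not overwritten.
        weibo = {
            'time': mblog.get('created_at'),
            'comments': mblog.get('comments_count'),
            'zan': mblog.get('attitudes_count'),  # "zan" means likes
            'text': pq(html).text() if html else '',  # strip HTML tags
        }
        # At first I wrote `yield weibo, url_list` here and printed it in the main block.
        # The function hands back a value every time it reaches yield, so the printed
        # url_list held only 1 item the first time, 2 the second time, and so on.
        yield weibo


# An attempt at fetching comments; it failed (URL elided in the original):
# def get_commet():
#     try:
#         r = requests.get("", headers=header)
#         if r.status_code == 200:
#             return r.json()
#     except:
#         print('fail')

if __name__ == '__main__':
    print('{}\t{}\t{}\t{}'.format('time', 'comments', 'likes', 'text'))
    for num in range(1, 5):
        data = get_page(num)
        for weibo in parse_json(data):
            print('{}\t{}\t{}\t{}'.format(weibo['time'], weibo['comments'], weibo['zan'], weibo['text']))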
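You can reproduce the url_list pitfall from the comment inside parse_json offline. The sketch below is minimal and uses a made-up payload. The card layout, the id field, and the names parse_buggy and parse_fixed are assumptions for illustration only, not the real Weibo API.

```python
# Hypothetical payload shaped like the "cards" list parsed above (not real Weibo data).
cards = [
    {'mblog': {'id': '1', 'text': 'first'}},
    {'card_type': 11},                      # a card without "mblog"; skipped below
    {'mblog': {'id': '2', 'text': 'second'}},
]


def parse_buggy(cards):
    """Mimics the earlier `yield weibo, url_list` version."""
    url_list = []
    for card in cards:
        mblog = card.get('mblog')
        if not mblog:
            continue
        url_list.append(mblog['id'])
        # Control returns to the caller here, before the loop finishes,
        # so the caller sees url_list while it is still growing.
        yield mblog['text'], url_list


def parse_fixed(cards):
    """Yield one self-contained record per post; no shared, growing state."""
    for card in cards:
        mblog = card.get('mblog')
        if not mblog:
            continue
        yield {'id': mblog['id'], 'text': mblog['text']}


# Size of url_list as seen by a caller printing inside the loop.
seen_sizes = [len(urls) for _, urls in parse_buggy(cards)]

# Collect ids only after the generator has been fully consumed.
posts = list(parse_fixed(cards))
ids = [post['id'] for post in posts]
print(seen_sizes, ids)
```

The design choice is to yield one complete record at a time. Build any aggregate list only after the loop over the generator has finished, rather than yielding a list that is still being filled.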
