JD.com Bra Crawler and Data Analysis

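I haven't written anything in a long while; running the summer camp has genuinely left me no time. This post writes up what I covered in my last livestream, so buckle up, I'm taking you all for a ride.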

```python
import csv
import json
import re
import time

import requests
from lxml import etree

# The request headers (e.g. a User-Agent) were omitted in the original post.
headers = {}

# Pulls the JSON payload out of the JSONP callback wrapping JD's comment API response.
CALLBACK_RE = re.compile(r'fetchJSON_comment98vv6\((.*)\);')

fp = open('C:/Users/luopan/Desktop/wenxiong1.csv', 'wt', newline='', encoding='utf-8')
writer = csv.writer(fp)
writer.writerow(('content', 'creationTime', 'productColor', 'productSize',
                 'userClientShow', 'userLevelName'))


def get_id(url):
    """Collect product SKU ids from a JD search-result page and crawl their comments."""
    html = requests.get(url, headers=headers)
    selector = etree.HTML(html.text)
    infos = selector.xpath('//ul[@class="gl-warp clearfix"]/li')
    for info in infos:
        try:
            sku_id = info.xpath('@data-sku')[0]
            # The comment-API URL template was omitted in the original post.
            comment_url = ''.format(sku_id)
            get_comment_info(comment_url, sku_id)
        except IndexError:
            pass  # list item without a data-sku attribute, or no JSONP payload


def get_comment_info(url, sku_id):
    """Fetch every comment page for one product and append the rows to the CSV."""
    html = requests.get(url, headers=headers)
    t = CALLBACK_RE.findall(html.text)
    json_data = json.loads(t[0])
    page = json_data['maxPage']
    # The page-URL template ({} for the page number, %s for the SKU id) was omitted.
    urls = [''.format(str(i)) for i in range(0, int(page))]
    for path in urls:
        html1 = requests.get(path % sku_id, headers=headers)
        t1 = CALLBACK_RE.findall(html1.text)
        json_data = json.loads(t1[0])
        for comment in json_data['comments']:
            writer.writerow((comment['content'], comment['creationTime'],
                             comment['productColor'], comment['productSize'],
                             comment['userClientShow'], comment['userLevelName']))
        time.sleep(2)  # pause between pages so we don't hammer the server


if __name__ == '__main__':
    url = ''  # JD search-result URL, omitted in the original post
    try:
        get_id(url)
    finally:
        fp.close()  # flush and close the CSV even if a request fails
```
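
The crawler relies on JD wrapping each comment response in a JSONP callback and extracts the JSON with a regular expression. Below is a minimal sketch of that step as a standalone helper so it can be tested offline; the name `unwrap_jsonp` is hypothetical, and the default callback name simply mirrors the one used in the crawler. Unlike `re.findall(...)[0]`, it raises a descriptive `ValueError` when the wrapper is missing (for example, when the server returns an error page).

```python
import json
import re


def unwrap_jsonp(text, callback="fetchJSON_comment98vv6"):
    """Return the parsed JSON inside callback(...); from a JSONP response body."""
    # re.escape keeps the callback name literal; DOTALL lets the payload span lines.
    pattern = re.escape(callback) + r"\((.*)\);?"
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError(f"no {callback}(...) wrapper found in response")
    return json.loads(match.group(1))
```

With this helper, `json_data = unwrap_jsonp(html.text)` would replace the `findall` plus `json.loads(t[0])` pair.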

First, import the required libraries and read in the data.
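
Loading the CSV back in can be done with the standard library alone. A minimal sketch, assuming the file is the one written by the crawler, with a header row; the helpers `parse_comments` and `read_comments` are hypothetical.

```python
import csv


def parse_comments(lines):
    """Parse CSV text (any iterable of lines) into dicts keyed by the header row."""
    return list(csv.DictReader(lines))


def read_comments(path):
    """Read the crawler's CSV file into a list of dicts."""
    # newline='' is what the csv module expects; utf-8-sig also accepts a BOM
    # in case the file was re-saved by a tool such as Excel.
    with open(path, newline='', encoding='utf-8-sig') as f:
        return parse_comments(f)
```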

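Seasoned "old drivers" are probably most interested in bra size, color, and purchase time, so we do some simple cleaning on these columns to prepare them for visualization.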
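
A minimal sketch of this cleaning step, assuming each row is a dict keyed by the crawler's CSV header (`creationTime`, `productColor`, `productSize`); adjust the column names if yours differ. The helper `clean_rows` is hypothetical, and it simply drops any row with a blank field.

```python
# Assumption: column names follow the crawler's CSV header row.
KEEP_COLUMNS = ("creationTime", "productColor", "productSize")


def clean_rows(rows, columns=KEEP_COLUMNS):
    """Keep only the given columns, strip whitespace, and drop rows with a blank value."""
    cleaned = []
    for row in rows:
        # Missing keys and None values both become empty strings.
        record = {col: (row.get(col) or "").strip() for col in columns}
        if all(record.values()):
            cleaned.append(record)
    return cleaned
```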

We start by extracting the purchase time and visualizing it.
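
One way to get the hour-of-day distribution behind such a chart, treating each comment's `creationTime` as the purchase time, as the article does. The timestamp format `%Y-%m-%d %H:%M:%S` is an assumption about JD's data, and `count_by_hour` and `print_hour_histogram` are hypothetical helpers; the text histogram stands in for a real plot.

```python
from collections import Counter
from datetime import datetime

# Assumption: creationTime strings look like "2017-07-20 10:23:45".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def count_by_hour(creation_times):
    """Count comments per hour of day (0-23), skipping unparseable timestamps."""
    hours = Counter()
    for ts in creation_times:
        try:
            hours[datetime.strptime(ts.strip(), TIME_FORMAT).hour] += 1
        except ValueError:
            continue  # malformed or empty timestamp
    return hours


def print_hour_histogram(hours):
    """Print one plain-text bar per hour; Counter returns 0 for missing hours."""
    for hour in range(24):
        print(f"{hour:02d}:00 {'#' * hours[hour]}")
```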

The chart shows that women tend to buy bras after 10 a.m.: they have barely started work before they begin "slacking off."

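For most of the guys out there, these raw size labels are enough to make your head spin, so we use Python to clean them into cup sizes A, B, C, D and E. Heh.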
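
A sketch of the size-to-cup cleaning with a regular expression. It assumes size labels like `75B`, `B75` or `34B=75B`, where the cup letter (A to E) sits right next to a two- or three-digit band number; anything else (free size, S/M/L) maps to `None`. `extract_cup` is a hypothetical helper, not the author's code.

```python
import re
from typing import Optional

# Assumption: the cup letter directly follows or precedes the band number.
# The lookaround guards stop letters that are part of a longer word from matching.
CUP_RE = re.compile(r"\d{2,3}\s*([A-Ea-e])(?![A-Za-z])|(?<![A-Za-z])([A-Ea-e])\s*\d{2,3}")


def extract_cup(size: str) -> Optional[str]:
    """Return the cup letter 'A'-'E' found in a size label, or None."""
    match = CUP_RE.search(size)
    if match is None:
        return None
    # Exactly one of the two alternatives matched.
    return (match.group(1) or match.group(2)).upper()
```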

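The chart shows that B cups are the most common, but something felt off to me. I later checked some of the products on JD and found that size A was sold out on some of them and not offered at all on others, which may explain why A is underrepresented. That stings, bro.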
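
The counting behind a cup-size chart can be sketched as follows; `cup_shares` and `print_cup_bars` are hypothetical helpers that take already-cleaned cup letters. The numbers count comments, not people, so availability gaps like the missing A sizes still skew the shares.

```python
from collections import Counter

VALID_CUPS = {"A", "B", "C", "D", "E"}


def cup_shares(cups):
    """Return (cup, count, share) tuples for cups A-E, most common first."""
    # A set lookup (not a substring test) so None, "" or "F" are excluded.
    counts = Counter(cup for cup in cups if cup in VALID_CUPS)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(cup, n, n / total) for cup, n in counts.most_common()]


def print_cup_bars(shares, width=40):
    """Plain-text bar chart: one row per cup, bar length proportional to its share."""
    for cup, n, share in shares:
        print(f"{cup} {'#' * round(share * width)} {n} ({share:.1%})")
```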

The colors get the same cleaning and visualization treatment, so let's go straight to the chart.
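
A minimal sketch of the color cleaning, assuming JD color labels are Simplified Chinese strings such as `肤色` or `黑色-单件` that can be bucketed by a single keyword character. The keyword table and `normalize_color` are illustrative assumptions; a label naming several colors goes to whichever keyword is listed first.

```python
from collections import Counter

# Assumption: one keyword character identifies the base color; order matters.
COLOR_KEYWORDS = [
    ("肤", "nude"),    # 肤色 = skin tone
    ("粉", "pink"),    # checked before 红 so 粉红 counts as pink
    ("红", "red"),
    ("黑", "black"),
    ("白", "white"),
    ("紫", "purple"),
    ("蓝", "blue"),
    ("灰", "grey"),
]


def normalize_color(label):
    """Map a raw color label to a base color name, or 'other'."""
    for keyword, name in COLOR_KEYWORDS:
        if keyword in label:
            return name
    return "other"


def color_counts(labels):
    """Count base colors, skipping blank labels."""
    return Counter(normalize_color(label) for label in labels if label.strip())
```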

Nude (skin-tone) is the most common color. Can you guess why? Heh.
