Extracting a field's data and analyzing its distribution

import csv
import re
import matplotlib.pyplot as plt
## function to convert a time string "h:mm:ss" into seconds
def time2itv(stime):
    p = r"^([0-9]+):([0-5][0-9]):([0-5][0-9])$"
    cp = re.compile(p)
    try:
        mtime = cp.match(stime)
    except TypeError:
        # stime was not a string
        raise TypeError("[InModuleError]: time2itv(stime) invalid argument type")
    if mtime:
        # map() returns an iterator in Python 3, so unpack it instead of indexing
        h, m, s = map(int, mtime.group(1, 2, 3))
        return 3600 * h + 60 * m + s
    else:
        return 0

## write the times (already converted to seconds and sorted) into a csv file, one value per row
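def write2csv(stime):
    # text mode with newline='' is what the csv module expects in Python 3
    with open('result.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['time'])
        writer.writerows([t] for t in stime)   # one row per value under the 'time' header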
## read the ids and times from the txt file; the two fields are separated by '\t'
timelist = []
with open("test.txt") as f:
    for line in f:
        numbers, time = line.split("\t")
        time = time.strip()       # drop whitespace and the trailing newline
        time = time.strip('"')    # drop the surrounding quotes
        timelist.append(time2itv(time))
timelist.sort(reverse=True)
# print(timelist)
write2csv(timelist)
## draw the points
y = timelist
plt.plot(y, marker='o')
plt.show()

The long-tail shape of the distribution is very pronounced, consistent with the Pareto principle (the 80/20 rule).
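The Pareto claim can be made concrete by asking what share of the total duration the longest 20% of items account for; under the 80/20 rule that share would be around 80%. A minimal sketch, not taken from the original post: the function name `top_share`, the default 20% cutoff and the sample durations are illustrative assumptions.

```python
def top_share(values, fraction=0.2):
    """Return the share of the total contributed by the largest `fraction` of values.

    A result near 0.8 for fraction=0.2 matches the classic "80/20" pattern.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)      # longest durations first
    k = max(1, round(len(ordered) * fraction))  # number of items in the top slice
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:k]) / total

# Hypothetical sample durations in seconds, not the article's data
durations = [3600, 1800, 120, 90, 60, 45, 30, 20, 15, 10]
print(round(top_share(durations), 2))  # 0.93
```

Applied to the `timelist` built above, a value well above 0.8 would point to an even heavier tail than 80/20.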

Problems solved

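1. Reading a txt file and writing a csv file in Python;

2. Stripping unnecessary characters, such as spaces and quotes, from the required field;

3. Converting the time format from xx:yy:zz to *** format.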
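The steps listed above can also lean on the standard `csv` module, which understands a tab delimiter and removes the surrounding double quotes by itself. A minimal sketch, assuming the input has two tab-separated columns (an id and a quoted `h:mm:ss` string, as in `test.txt`) and that unparseable times count as 0 like in `time2itv`; the names `hms_to_seconds` and `convert` are hypothetical.

```python
import csv
import io
import re

# Same pattern as time2itv: hours, then minutes and seconds in 00-59
HMS = re.compile(r"^([0-9]+):([0-5][0-9]):([0-5][0-9])$")

def hms_to_seconds(text):
    """Convert 'h:mm:ss' to seconds; anything that does not match becomes 0."""
    m = HMS.match(text.strip())
    if not m:
        return 0
    h, mi, s = map(int, m.groups())
    return 3600 * h + 60 * mi + s

def convert(src, dst):
    """Read id<TAB>"time" rows from src, write one seconds value per row to dst."""
    reader = csv.reader(src, delimiter="\t")  # csv strips the surrounding quotes
    seconds = sorted((hms_to_seconds(row[1]) for row in reader if len(row) >= 2),
                     reverse=True)
    writer = csv.writer(dst)
    writer.writerow(["time"])
    writer.writerows([s] for s in seconds)
    return seconds

# Usage with in-memory files; for real files, open them with newline=""
src = io.StringIO('1\t"0:01:05"\n2\t"1:00:00"\n3\t"0"\n')
dst = io.StringIO()
print(convert(src, dst))  # [3600, 65, 0]
```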

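Looking back a month later at the ** pile of crap I had written,

the only part that looked like real work was the regular expression for parsing the time, and even that I seem to have copied from someone else.

So I rewrote this piece of **,

and at least it looks a bit nicer now.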

from numpy import array
from matplotlib import pyplot
def get_time(filename):
    stime = []
    with open(filename) as readfile:
        lines = readfile.readlines()
    for line in lines:
        video_id, time = line.split("\t")
        time = time.strip()
        time = time.strip('"')
        if time != '0':
            # "h:mm:ss" -> seconds
            hour, minute, second = (int(x) for x in time.split(':'))
            total_time = 3600 * hour + 60 * minute + second
        else:
            total_time = 0
        stime.append(total_time)   # collect every duration
    return array(stime)
def draw_hist(lengths):
    pyplot.hist(lengths, 100)     # 100 bins
    pyplot.xlabel('length')
    pyplot.xlim(0.0, 10000)
    pyplot.ylabel('frequency')
    pyplot.title('length of fake urls')
    pyplot.show()
stime=get_time("stat_content_time.txt")
draw_hist(stime)
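Because `pyplot.xlim(0.0, 10000)` cuts the histogram off at 10000 seconds, the longest items in the tail are not visible. One common alternative, not used in the original post, is to bin on a logarithmic scale. The sketch below counts durations per power of ten using only the standard library; the function name `decade_counts` and the sample values are hypothetical.

```python
from collections import Counter

def decade_counts(seconds):
    """Count values per power-of-ten bin: 0, 1-9, 10-99, 100-999, ...

    Keys are each bin's lower bound; 0 (and any non-positive value) gets its own bin.
    """
    counts = Counter()
    for s in seconds:
        s = int(s)  # also accepts numpy integers from get_time()
        if s <= 0:
            counts[0] += 1
        else:
            counts[10 ** (len(str(s)) - 1)] += 1  # digit count gives the decade
    return dict(sorted(counts.items()))

# Hypothetical durations in seconds
print(decade_counts([0, 5, 42, 75, 300, 3600, 12000]))
# {0: 1, 1: 1, 10: 2, 100: 1, 1000: 1, 10000: 1}
```

For a plot, passing `log=True` to `pyplot.hist` (logarithmic y-axis) is another way to keep the sparse tail bins visible.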

