Python: A Multithreaded Script for Generating 1 TB Data Files

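Sorry, I lied to you!!!

"Multithreaded": multithreading in Python is of little real use here. I just wanted to end up with several 1 TB data files.

"Script": this one is true, it really is a Python script...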

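Important notes:

On Windows and macOS, the generated file ends up slightly larger than the size you define, though not by much. I blame the file system.

On Ubuntu Server 14.04, the generated file is the same size as size.

I don't know why either... it must be the file system's fault!!!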
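One way to investigate the difference is to compare the logical byte count Python reports with the space the file system has allocated, since file managers often show the allocated figure, which is rounded up to whole blocks. The following is a minimal sketch, not a diagnosis: `size_report` is a hypothetical helper name, and it assumes a POSIX system where `os.stat` exposes `st_blocks` (Windows does not provide it, so the helper returns `None` there).

```python
import os
import tempfile

def size_report(path):
    """Return (logical_bytes, allocated_bytes) for path.

    allocated_bytes is None where st_blocks is unavailable (e.g. Windows).
    """
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    # POSIX defines st_blocks in units of 512 bytes
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated

if __name__ == "__main__":
    # Write a small throwaway file and report both figures
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0.123456789\n" * 10)
    try:
        logical, allocated = size_report(tmp.name)
        print("logical bytes:  ", logical)
        print("allocated bytes:", allocated)
    finally:
        os.remove(tmp.name)
```

If the two numbers differ on your machine, the gap comes from how space is allocated and displayed rather than from the script writing extra data.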

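# -*- coding: utf-8 -*-
import os
import random
import threading
####
## blog:
## Location: Big Data Lab, Tianjin Polytechnic University
####
# Storage directory on Windows
#dx = os.path.getsize("d:\\users\\wyj\\desktop\\shadowsocks-manyuser\\a.txt")
# Generate random numbers and write them to the file named filename, stored under the file directory.
def randomnumber(filename, size):
    # Initialize the file size
    dx = 0
    # Storage directory on Mac
    file = '/users/tanishindaira/desktop/express/' + filename + '.txt'
    # Keep appending as long as the data generated so far is smaller than size (the target size you define)
    while dx < size:
        f = open(file, "a")
        f.write(str(random.random()) + "\n")  # This can be changed; it does not have to be random numbers
        f.close()
        # Measure after closing so the line just written is counted
        dx = os.path.getsize(file)
# Run the code, using multiple threads.
try:
    threads = [
        threading.Thread(target=randomnumber, args=("a", 1024*1024*1024*1024)),
        threading.Thread(target=randomnumber, args=("b", 1024*1024*1024*1024)),
        #threading.Thread(target=randomnumber, args=("c", 1024*1024*1024*1024)),
        #threading.Thread(target=randomnumber, args=("d", 1024*1024*1024*1024)),
    ]
    for t in threads:
        t.start()
    # Wait for the workers to finish before the script exits
    for t in threads:
        t.join()
except Exception:
    print("Something went wrong (thread conflict?); please check the file path")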
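Reopening the file and calling `os.path.getsize` for every line is likely to be slow at terabyte scale. As an alternative, here is a minimal sketch that keeps the file open and counts the bytes it writes itself. The names `write_random_lines` and `generate_files` are hypothetical, and the 1 MiB size in the usage part is only for a quick trial. Unlike the script above, it counts only the bytes written by the current call, not data already in the file, and it may overshoot `size` by part of one line.

```python
import os
import random
import tempfile
import threading

def write_random_lines(path, size):
    """Append random-number lines to path until at least size bytes are written.

    Returns the number of bytes written by this call.
    """
    written = 0
    # Binary mode, so the counted bytes match what lands on disk
    # (no platform-specific newline translation)
    with open(path, "ab") as f:
        while written < size:
            line = (str(random.random()) + "\n").encode("ascii")
            f.write(line)
            written += len(line)
    return written

def generate_files(directory, names, size):
    """Write one file per name, each in its own thread, and wait for all of them."""
    threads = []
    for name in names:
        path = os.path.join(directory, name + ".txt")
        t = threading.Thread(target=write_random_lines, args=(path, size))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        generate_files(d, ["a", "b"], 1024 * 1024)  # 1 MiB each, for a quick trial
        for name in ("a", "b"):
            print(name, os.path.getsize(os.path.join(d, name + ".txt")))
```

Threads are still a reasonable fit here because the work is mostly waiting on disk writes, but the disk itself is likely to be the bottleneck, so adding more threads may not make the files appear any faster.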

The code is short, but very handy.

What you need to change:
