A First Look at Parallel File Processing in Python (Part 2)

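The previous parallel computation worked by splitting the large file into small files on disk. A more efficient approach is to split the file in memory, as byte-offset ranges that each process reads and computes on separately, and then write the returned results directly to the target file. This removes the steps of creating, merging, and deleting the small files.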
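The splitting step can be sketched as plain arithmetic on byte offsets, with no chunk files written. This is a minimal sketch under stated assumptions, not the article's exact code: `split_ranges` is a hypothetical helper name, the ranges are inclusive, and `block_num` is assumed to be at least 1.

```python
import math


def split_ranges(file_size, block_num):
    """Return inclusive (start, end) byte ranges that cover 0..file_size."""
    block_size = math.ceil(file_size / block_num)
    ranges = []
    start = 0
    while start <= file_size:
        # Clip the last range so it never runs past the end of the file.
        end = min(start + block_size, file_size)
        ranges.append((start, end))
        # The next range begins on the byte right after this one ends.
        start = end + 1
    return ranges


print(split_ranges(100, 4))
# prints [(0, 25), (26, 51), (52, 77), (78, 100)]
```

Each worker then only needs the file name and one `(start, end)` pair, so nothing is copied or written until the results come back.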

The code is as follows:

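"""Process one file with several worker processes, without splitting it into smaller files."""
import json
import math
from multiprocessing import Pool

import requests


# User business logic: look up the coordinates for the address in one input line
def get_jw(addr_name):
    addr_name = addr_name.split(',')[1]
    # The request URL is missing from the source; it must contain an {addr_name} placeholder
    url = ''
    result = requests.get(url.format(addr_name=addr_name))
    result_str = str(result.content, encoding="utf-8")
    rj = json.loads(result_str)
    if len(rj['geocodes']) > 0:
        jwd = rj['geocodes'][0]['location']
        print(jwd)
        return addr_name + ',' + jwd + '\n'
    else:
        print('-,-')
        return addr_name + ',' + '-,-' + '\n'


# Append the result of one block to the target file
def my_callback(lines):
    with open('/opt/test/qiuxue/target2.txt', 'a') as f:
        f.writelines(lines)


# Read one block of the file
class Reader(object):
    def __init__(self, file_name, start_pos, end_pos, business_func):
        self.file_name = file_name
        self.start_pos = start_pos
        self.end_pos = end_pos
        self.business_func = business_func

    def execute(self):
        lines = []
        with open(self.file_name, 'rb') as f:
            if self.start_pos != 0:
                f.seek(self.start_pos - 1)
                # The file is opened in binary mode, so compare against bytes
                if f.read(1) != b'\n':
                    # The block starts mid-line; the previous block owns that line, so skip it
                    f.readline()
                    self.start_pos = f.tell()
            f.seek(self.start_pos)
            # The loop bound is cut off in the source; end_pos is the assumed limit
            while self.start_pos <= self.end_pos:
                raw = f.readline()
                if not raw:  # end of file
                    break
                line = str(raw.strip(), encoding='utf8')
                try:
                    lines.append(self.business_func(line))
                except Exception:
                    # Rewind to the start of this line so it is retried;
                    # a line that always fails is retried indefinitely
                    f.seek(self.start_pos)
                self.start_pos = f.tell()
        return ''.join(lines)


# Split the file into the requested number of blocks and return the start/end positions as a list
class FileBlock(object):
    def __init__(self, file_name, block_num):
        self.file_name = file_name
        self.block_num = block_num

    def block_file(self):
        pos_list = []
        with open(self.file_name, 'r') as f:
            f.seek(0, 2)
            start_pos = 0
            file_size = f.tell()
            block_size = math.ceil(file_size / self.block_num)
            while start_pos <= file_size:
                # The append calls are missing from the source; (start, end) tuples are assumed
                if start_pos + block_size > file_size:
                    pos_list.append((start_pos, file_size))
                else:
                    pos_list.append((start_pos, start_pos + block_size))
                start_pos = start_pos + block_size + 1
        return pos_list


if __name__ == '__main__':
    concurrency = 8
    p = Pool(concurrency)
    input_file = '/opt/test/qiuxue/target.txt'
    fb = FileBlock(input_file, concurrency)
    for s, e in fb.block_file():
        reader = Reader(input_file, s, e, get_jw)
        # The submit call is missing from the source; apply_async with my_callback is assumed
        p.apply_async(reader.execute, callback=my_callback)
    p.close()
    p.join()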
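Because the block boundaries are byte offsets, they usually fall in the middle of a line. The reader handles this by treating a line as belonging to the block that contains its first byte. The sketch below checks that rule on a temporary file without any network calls or worker processes. It is an illustration under assumptions, not the article's code: `read_block` and `demo` are hypothetical names, and the blocks are read one after another in a single process.

```python
import math
import os
import tempfile


def read_block(path, start, end):
    """Return the lines whose first byte lies in the inclusive range [start, end]."""
    lines = []
    with open(path, 'rb') as f:
        if start != 0:
            f.seek(start - 1)
            if f.read(1) != b'\n':
                # Mid-line start: the previous block owns this line, so skip it.
                f.readline()
        pos = f.tell()
        while pos <= end:
            raw = f.readline()
            if not raw:  # end of file
                break
            lines.append(raw.decode('utf8').rstrip('\n'))
            pos = f.tell()
    return lines


def demo(block_num=4):
    """Split a temp file into byte ranges and check every line is read exactly once."""
    text = ''.join('row{},addr{}\n'.format(i, i) for i in range(50))
    with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
        tmp.write(text.encode('utf8'))
        path = tmp.name
    try:
        size = os.path.getsize(path)
        block_size = math.ceil(size / block_num)
        collected = []
        start = 0
        while start <= size:
            end = min(start + block_size, size)
            collected.extend(read_block(path, start, end))
            start = end + 1
        return collected == text.splitlines()
    finally:
        os.remove(path)


print(demo())
```

If a boundary rule were off by one, a line would be dropped or read twice and the comparison with the original lines would fail.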
