Reducing Memory Usage: Input Data Compression

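In machine learning practice competitions, the input data is often large enough that loading it consumes too much memory, which puts a heavy load on lower-spec machines. A common mitigation is data compression, that is, converting high-precision data types to lower-precision ones, as follows: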

import numpy as np

def reduce_mem_usage(df, verbose=True):
    # df is a pandas DataFrame; numeric columns are downcast in place.
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2  # size in MB before conversion
    for col in df.columns:
        col_type = df[col].dtypes
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                # Integers: pick the narrowest type whose range covers [c_min, c_max].
                if c_min >= np.iinfo(np.int8).min and c_max <= np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min >= np.iinfo(np.int16).min and c_max <= np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min >= np.iinfo(np.int32).min and c_max <= np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                # Anything wider stays int64.
            else:
                # Floats: the range check only guards against overflow;
                # float16 and float32 can still round the stored values.
                if c_min >= np.finfo(np.float16).min and c_max <= np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min >= np.finfo(np.float32).min and c_max <= np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
    end_mem = df.memory_usage().sum() / 1024**2  # size in MB after conversion
    if verbose:
        print('Mem. usage decreased to {:5.2f} MB ({:.1f}% reduction)'.format(
            end_mem, 100 * (start_mem - end_mem) / start_mem))
    return df
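
To see the effect in isolation, here is a minimal sketch on a hypothetical DataFrame (the column names and values are invented for illustration). It does not call the function above. Instead it uses pandas' built-in `pd.to_numeric(..., downcast=...)`, a related but different route: it picks the smallest integer type that fits and does not go below float32 for floats.

```python
import numpy as np
import pandas as pd

# Hypothetical data: small integers and floats stored in 64-bit columns.
df = pd.DataFrame({
    "user_id": np.arange(1000, dtype=np.int64),
    "score": np.linspace(0.0, 1.0, 1000, dtype=np.float64),
})
before = df.memory_usage(index=False).sum()  # bytes used by the column data

# downcast="integer" chooses the smallest signed integer type that fits;
# downcast="float" stops at float32.
small = df.copy()
small["user_id"] = pd.to_numeric(small["user_id"], downcast="integer")
small["score"] = pd.to_numeric(small["score"], downcast="float")
after = small.memory_usage(index=False).sum()

print(small.dtypes.to_dict())
print(f"{before} bytes -> {after} bytes")
```

Whether to prefer this over a hand-written loop is a design choice: the built-in avoids float16 entirely, at the cost of a smaller saving on float columns.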

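The principle is straightforward: each column's value range determines whether a lower-precision type can replace a higher-precision one, which reduces the overall memory footprint. For integer columns the conversion is lossless, because every in-range integer is represented exactly. For floating-point columns the range check only rules out overflow, so converting to float16 or float32 can still round the stored values; check that the reduced precision is acceptable before relying on it.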
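
The difference between "fits in range" and "keeps its precision" can be checked directly with NumPy. The following is a small sketch using arbitrary sample values chosen for illustration; it is not part of the original function.

```python
import numpy as np

# Integer ranges: a value fits a type if it lies within [min, max].
for t in (np.int8, np.int16, np.int32):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# An in-range integer survives the round trip unchanged.
ints = np.array([-128, 0, 127], dtype=np.int64)
round_trip = ints.astype(np.int8).astype(np.int64)
print((round_trip == ints).all())

# A float can be well inside the float16 range and still lose digits.
x = np.float64(1234.5678)           # arbitrary sample value
f16_max = float(np.finfo(np.float16).max)
as_f16 = float(np.float16(x))       # rounded to the nearest float16
as_f32 = float(np.float32(x))       # closer to x, but still not exact
print(f16_max, as_f16, as_f32)
```

Here `x` is far below the float16 maximum, yet the float16 copy differs from it in the integer part, which is why the range check alone does not protect float columns.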

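Reducing the Memory Used by a Virtual System (Unresolved)

I have recently been learning Linux driver development and installed Ubuntu 10.4 in VMware, with about 700 MB of memory allocated. There is one problem, though: most of my work is done on the command line and I rarely use the graphical interface, so I connect to the virtual system directly via telnet from Windows XP over the virtual network that VMware provides. With this setup, Linux feels...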

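Go Object Alignment: How to Easily Reduce Memory Usage

Let's first look at the following: var a struct var b struct. These two variables appear to contain exactly the same fields, two bytes and one int each, so are they the same size? We can check with the reflect package, as shown below: typea reflect.typeof a typeb reflect.typeof b fm...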

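Redis Hash Data Memory Usage Test

For a hash or a plain string set (both are really hashes), memory usage clearly depends heavily on the number of keys, while the length of each key's value has little effect on memory usage. r redis.redis host 127.0.0.1 port 6379,db 0 for i in xrange 100000 r.hmset...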
