Converting an ARFF file to a CSV file in Python

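A dataset may be stored in the ARFF format used by Weka. Most machine learning work in Python relies on numpy, pandas and sklearn, none of which reads ARFF files directly, so scipy.io.arff.loadarff is needed as a bridge.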

from scipy.io import arff
import pandas as pd

file_name = '/users/schillerxu/documents/sourcecode/python/pandas/cm1.arff'
# loadarff returns the records (a NumPy structured array) and the metadata
data, meta = arff.loadarff(file_name)
# print(data)
print(meta)
df = pd.DataFrame(data)
print(df.head())
# print(df)
# Save as a CSV file
# out_file = '/users/schillerxu/documents/sourcecode/python/pandas/cm1.csv'
# output = pd.DataFrame(df)
# output.to_csv(out_file, index=False)

Running the program produces the following output:

[running] python -u "/users/schillerxu/documents/sourcecode/python/pandas/arff_to_csv.py"
dataset: cm1
	loc_blank's type is numeric
	branch_count's type is numeric
	call_pairs's type is numeric
	loc_code_and_comment's type is numeric
	loc_comments's type is numeric
	condition_count's type is numeric
	cyclomatic_complexity's type is numeric
	cyclomatic_density's type is numeric
	decision_count's type is numeric
	decision_density's type is numeric
	design_complexity's type is numeric
	design_density's type is numeric
	edge_count's type is numeric
	essential_complexity's type is numeric
	essential_density's type is numeric
	loc_executable's type is numeric
	parameter_count's type is numeric
	halstead_content's type is numeric
	halstead_difficulty's type is numeric
	halstead_effort's type is numeric
	halstead_error_est's type is numeric
	halstead_length's type is numeric
	halstead_level's type is numeric
	halstead_prog_time's type is numeric
	halstead_volume's type is numeric
	maintenance_severity's type is numeric
	modified_condition_count's type is numeric
	multiple_condition_count's type is numeric
	node_count's type is numeric
	normalized_cylomatic_complexity's type is numeric
	num_operands's type is numeric
	num_operators's type is numeric
	num_unique_operands's type is numeric
	num_unique_operators's type is numeric
	number_of_lines's type is numeric
	percent_comments's type is numeric
	loc_total's type is numeric
	defective's type is nominal, range is ( 'y', 'n' )

   loc_blank  branch_count  call_pairs  ...  percent_comments  loc_total  defective
0        6.0           9.0         2.0  ...              4.00       25.0       b'n'
1       15.0           7.0         3.0  ...             39.22       32.0       b'y'
2       27.0           9.0         1.0  ...             47.27       33.0       b'y'
3        7.0           3.0         2.0  ...              0.00       12.0       b'n'
4       51.0          25.0        13.0  ...             11.67      106.0       b'n'

[5 rows x 38 columns]

[done] exited with code=0 in 0.664 seconds

As the output shows, meta holds the dataset's basic information: its name and the name and type of every attribute.
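
The output above also shows that the nominal column defective is loaded as byte strings (b'n', b'y'), which would be written to the CSV file in that form. The following is a minimal sketch of one way to handle this, not the original script: it uses meta.names() and meta.types() to find the nominal columns and decodes them before exporting. The tiny in-memory dataset ARFF_TEXT and the helper arff_to_csv_text are hypothetical names introduced only for illustration, and UTF-8 is an assumed encoding.

```python
import io

import pandas as pd
from scipy.io import arff

# Hypothetical miniature ARFF content standing in for cm1.arff
ARFF_TEXT = """@relation toy
@attribute loc_total numeric
@attribute percent_comments numeric
@attribute defective {y,n}
@data
25,4.0,n
32,39.22,y
"""


def arff_to_csv_text(arff_file):
    """Load ARFF from a path or file-like object; return (meta, df, csv_text)."""
    data, meta = arff.loadarff(arff_file)
    df = pd.DataFrame(data)
    # Nominal attributes arrive as bytes (e.g. b'n'); decode them to strings.
    # UTF-8 is an assumption about the file's encoding.
    for name, kind in zip(meta.names(), meta.types()):
        if kind == "nominal":
            df[name] = df[name].str.decode("utf-8")
    # With no path argument, to_csv returns the CSV content as a string
    return meta, df, df.to_csv(index=False)


meta, df, csv_text = arff_to_csv_text(io.StringIO(ARFF_TEXT))
print(meta.names())   # attribute names in file order
print(meta.types())   # 'numeric' or 'nominal' for each attribute
print(csv_text)
```

To write a real file instead, pass the path of the ARFF file to the helper and call df.to_csv(out_file, index=False) with an output path of your choice.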
