Converting VOC-format data to COCO format

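When I annotated object-detection data with labelme, the only output option was VOC-format files. Newer networks, however, are generally run on COCO-format data, so a script that converts VOC to COCO becomes very important. This post mainly exists to be referenced from my post on data augmentation, so I'll skip the theory and go straight to the code.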

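To run it, fill in pre_define_categories as a dictionary that maps each class name to its id, with one entry per class in your data, then change the paths in the main section to point to your dataset.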
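As a rough illustration of that dictionary, the sketch below uses a hypothetical three-class dataset. The class names and ids are invented for the example, and the helper `category_id_for` is not part of the script. It only mirrors how the script hands an id to a class that is missing from the dictionary.

```python
# Hypothetical class names and ids; replace them with the classes in your own dataset.
pre_define_categories = {"cat": 1, "dog": 2, "bird": 3}


def category_id_for(name, categories):
    """Return the id for `name`, registering unseen classes the way the script does."""
    if name not in categories:
        # An unseen class receives the current dictionary size as its id.
        categories[name] = len(categories)
    return categories[name]


print(category_id_for("dog", pre_define_categories))    # a predefined class keeps its id
print(category_id_for("horse", pre_define_categories))  # an unseen class is added on the fly
```

When the predefined ids start at 1, an unseen class receives an id equal to the dictionary size, which is already taken by the last predefined class (here "bird"). Listing every class up front avoids that collision.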

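Also, this code is based on someone else's work, but it has been so long that I can no longer find a link to the original author. If this causes any offense, I'll take it down promptly.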

import os
import json
import xml.etree.ElementTree as ET

start_bounding_box_id = 1
# Map each class name to its id, e.g. {"cat": 1, "dog": 2} (illustrative names only).
pre_define_categories = {}


def get(root, name):
    """Return every child element of root named `name`."""
    return root.findall(name)


def get_and_check(root, name, length):
    """Find children named `name`, check their count, and unwrap a single match."""
    found = root.findall(name)
    if len(found) == 0:
        raise NotImplementedError('Can not find %s in %s.' % (name, root.tag))
    if length > 0 and len(found) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'
                                  % (name, length, len(found)))
    if length == 1:
        found = found[0]
    return found


def get_filename_as_int(filename):
    """Use the file name without its extension as the integer image id."""
    try:
        filename = os.path.splitext(filename)[0]
        return int(filename)
    except ValueError:
        raise NotImplementedError('Filename %s is supposed to be an integer.' % filename)


def convert(xml_dir, json_file):
    xml_files = os.listdir(xml_dir)
    # Assumed layout: the three lists that the rest of this function fills in.
    json_dict = {'images': [], 'annotations': [], 'categories': []}
    categories = pre_define_categories
    bnd_id = start_bounding_box_id
    num = 0
    for line in xml_files:
        num += 1
        if num % 50 == 0:
            print("processing ", num, "; file ", line)
        xml_f = os.path.join(xml_dir, line)
        tree = ET.parse(xml_f)
        root = tree.getroot()
        # The filename must be a number
        filename = line[:-4]
        image_id = get_filename_as_int(filename)
        size = get_and_check(root, 'size', 1)
        width = int(get_and_check(size, 'width', 1).text)
        height = int(get_and_check(size, 'height', 1).text)
        # Assumed image record; the .jpg extension follows the renaming step noted below.
        image = {'file_name': filename + '.jpg', 'height': height,
                 'width': width, 'id': image_id}
        json_dict['images'].append(image)
        # Currently we do not support segmentation
        #  segmented = get_and_check(root, 'segmented', 1).text
        #  assert segmented == '0'
        for obj in get(root, 'object'):
            category = get_and_check(obj, 'name', 1).text
            if category not in categories:
                # Unseen class: its id is the current number of known classes.
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bndbox = get_and_check(obj, 'bndbox', 1)
            # xmin and ymin are shifted by one pixel; xmax and ymax are kept as-is.
            xmin = int(get_and_check(bndbox, 'xmin', 1).text) - 1
            ymin = int(get_and_check(bndbox, 'ymin', 1).text) - 1
            xmax = int(get_and_check(bndbox, 'xmax', 1).text)
            ymax = int(get_and_check(bndbox, 'ymax', 1).text)
            assert xmax > xmin
            assert ymax > ymin
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            # Assumed annotation record, built from the values computed above.
            ann = {'area': o_width * o_height, 'iscrowd': 0, 'image_id': image_id,
                   'bbox': [xmin, ymin, o_width, o_height],
                   'category_id': category_id, 'id': bnd_id, 'segmentation': []}
            json_dict['annotations'].append(ann)
            bnd_id = bnd_id + 1
    for cate, cid in categories.items():
        # Assumed category record.
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
    with open(json_file, 'w') as json_fp:
        json_fp.write(json.dumps(json_dict))


'''
Before generating the COCO-format annotation files:
1. Run renamedata.py to give the xml and jpg files matching names;
2. Run the splitdata method to split the data into the train/val/test sets.
'''
if __name__ == '__main__':
    folder_list = ["train", "val", "test"]
    # Change base_dir to the local path of your images and annotation files
    base_dir = "data/coco/"
    for foldername in folder_list:
        xml_dir = base_dir + foldername + "/annotations/"
        json_dir = base_dir + foldername + "/instances_" + foldername + ".json"
        print("deal: ", foldername)
        print("xml dir: ", xml_dir)
        print("json file: ", json_dir)
        convert(xml_dir, json_dir)
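To make the box arithmetic in the script easier to follow, here is a minimal, self-contained sketch of just that step. The inline XML is a made-up annotation so the snippet runs without any files, and `voc_box_to_coco` is a hypothetical helper name. It assumes the usual COCO convention of storing a box as [x, y, width, height].

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC annotation, inlined so the sketch needs no files on disk.
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>101</xmin><ymin>51</ymin><xmax>300</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>
"""


def voc_box_to_coco(bndbox):
    """Turn a VOC <bndbox> element into a COCO-style [x, y, width, height] list."""
    xmin = int(bndbox.find("xmin").text) - 1  # same one-pixel shift as the script
    ymin = int(bndbox.find("ymin").text) - 1
    xmax = int(bndbox.find("xmax").text)
    ymax = int(bndbox.find("ymax").text)
    return [xmin, ymin, xmax - xmin, ymax - ymin]


root = ET.fromstring(SAMPLE_XML)
for obj in root.findall("object"):
    box = voc_box_to_coco(obj.find("bndbox"))
    # The area stored in the annotation is simply width * height of the box.
    print(obj.find("name").text, box, "area:", box[2] * box[3])
```

VOC stores the two corners of a box, while COCO stores one corner plus the box size, which is why the script computes o_width and o_height before building each annotation.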
