`Update`

Direct Colab Link. Just grab the given dummy data set and load it into Colab.

I'm trying to train an object detection model for a multi-class problem, and for this task I'm using Mosaic augmentation (Paper).

I'm stuck on how to retrieve the correct class label for each box, because the augmentation randomly picks a sub-region of each sample. Below is the result of the mosaic augmentation we've achieved so far, with the corresponding bounding boxes.

`Data Set`

I've created a dummy data set. Link here. The df.head():

It has 4 classes in total, with df.object.value_counts():

human    23
car      13
cat       5
dog       3
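For a multi-class setup, the string values in df.object need integer ids before they can go into a label tensor. A minimal sketch, assuming ids start at 1 (some detectors reserve 0 for background; change `start` if yours doesn't) and that names are sorted alphabetically to get a stable mapping; `build_class_map` and `encode_labels` are hypothetical helpers, not part of the original notebook:

```python
from typing import Dict, Iterable, List


def build_class_map(object_names: Iterable[str], start: int = 1) -> Dict[str, int]:
    """Map each distinct class name to an integer id (sorted for a stable order)."""
    return {name: i for i, name in enumerate(sorted(set(object_names)), start=start)}


def encode_labels(object_names: Iterable[str], class_map: Dict[str, int]) -> List[int]:
    """Convert a sequence of class names into their integer ids."""
    return [class_map[name] for name in object_names]


# Works on any iterable of names, e.g. df.object
names = ["human", "car", "cat", "dog", "human"]
class_map = build_class_map(names)
print(class_map)                        # {'car': 1, 'cat': 2, 'dog': 3, 'human': 4}
print(encode_labels(names, class_map))  # [4, 1, 2, 3, 4]
```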

`Data Loader and Mosaic Augmentation`

The data loader is defined as follows. The mosaic augmentation should eventually live inside it, but for now I'll keep it in a separate code snippet for clarity.


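import random

import numpy as np
import torch
from torch.utils.data import Dataset
import albumentations as A
from albumentations.pytorch import ToTensorV2

IMG_SIZE = 2000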

class DatasetRetriever(Dataset):

    def __init__(self, main_df, image_ids, transforms=None, test=False):
        super().__init__()

        self.image_ids = image_ids
        self.main_df = main_df
        self.transforms = transforms
        self.size_limit = 1
        self.test = test

    def __getitem__(self, index: int):
        image_id = self.image_ids[index]
        image, boxes, labels = self.load_mosaic_image_and_boxes(index)

        # labels = torch.tensor(labels, dtype=torch.int64) # for multi-class
        labels = torch.ones((boxes.shape[0],), dtype=torch.int64) # for single-class

        target = {}
        target['boxes'] = boxes
        target['cls'] = labels
        target['image_id'] = torch.tensor([index])

        if self.transforms:
            # try the transform up to 10 times until at least one box survives
            for _ in range(10):
                sample = self.transforms(**{
                    'image': image,
                    'bboxes': target['boxes'],
                    'labels': target['cls']
                })

                # check that the transform did not drop any box
                assert len(sample['bboxes']) == target['cls'].shape[0], 'not equal!'
                if len(sample['bboxes']) > 0:
                    # image
                    image = sample['image']

                    # box: reorder pascal_voc [x1, y1, x2, y2] -> [y1, x1, y2, x2]
                    target['boxes'] = torch.tensor(sample['bboxes'])
                    target['boxes'][:, [0, 1, 2, 3]] = target['boxes'][:, [1, 0, 3, 2]]

                    # label
                    target['cls'] = torch.stack(sample['labels'])
                    break

        return image, target

    def __len__(self) -> int:
        return self.image_ids.shape[0]
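The line `target['boxes'][:,[0,1,2,3]] = target['boxes'][:,[1,0,3,2]]` reorders each box from [x1, y1, x2, y2] to [y1, x1, y2, x2]. A small NumPy sketch of the same reordering (`xyxy_to_yxyx` is a hypothetical helper name), showing that the in-place form is safe because the right-hand side is evaluated into a copy before anything is written:

```python
import numpy as np


def xyxy_to_yxyx(boxes):
    """Reorder box columns from [x1, y1, x2, y2] to [y1, x1, y2, x2]."""
    boxes = np.asarray(boxes)
    return boxes[:, [1, 0, 3, 2]]


boxes = np.array([[10, 20, 110, 220],
                  [5, 6, 7, 8]])
print(xyxy_to_yxyx(boxes).tolist())  # [[20, 10, 220, 110], [6, 5, 8, 7]]

# The in-place form used in __getitem__: fancy indexing on the right-hand side
# returns a copy, so no column is overwritten before it has been read.
inplace = boxes.copy()
inplace[:, [0, 1, 2, 3]] = inplace[:, [1, 0, 3, 2]]
print(inplace.tolist())              # [[20, 10, 220, 110], [6, 5, 8, 7]]
```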

Basic Transform

def get_transforms():
    return A.Compose(
        [
            A.Resize(height=IMG_SIZE, width=IMG_SIZE, p=1.0),
            ToTensorV2(p=1.0),
        ],
        p=1.0,
        bbox_params=A.BboxParams(
            format='pascal_voc',
            min_area=0,
            min_visibility=0,
            label_fields=['labels']
        )
    )
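With format='pascal_voc', boxes are absolute pixel coordinates [x_min, y_min, x_max, y_max], so the Resize step scales x-coordinates by the width ratio and y-coordinates by the height ratio. Because `label_fields=['labels']` is set, albumentations carries the labels along with the boxes, which is why `sample['labels']` stays aligned with `sample['bboxes']`. A NumPy sketch of the box arithmetic only (not albumentations' actual implementation, which may differ by tiny floating-point amounts; `resize_boxes` is hypothetical):

```python
import numpy as np


def resize_boxes(boxes, src_hw, dst_hw):
    """Scale pascal_voc boxes [x_min, y_min, x_max, y_max] from a src (h, w) image to a dst (h, w) image."""
    boxes = np.asarray(boxes, dtype=np.float64)
    sy = dst_hw[0] / src_hw[0]  # height ratio
    sx = dst_hw[1] / src_hw[1]  # width ratio
    # columns 0 and 2 are x-coordinates, columns 1 and 3 are y-coordinates
    return boxes * np.array([sx, sy, sx, sy])


# e.g. a 1000 (h) x 4000 (w) image resized to IMG_SIZE x IMG_SIZE = 2000 x 2000
print(resize_boxes([[400, 100, 1200, 300]], (1000, 4000), (2000, 2000)).tolist())
# [[200.0, 200.0, 600.0, 600.0]]
```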

Mosaic Augmentation

Note that it should be defined inside the data loader. The main issue is that, while iterating over all 4 samples to build the mosaic, each image is cropped and pasted into its quadrant, and its bounding boxes are shifted accordingly:

mosaic_image[y1a:y2a, x1a:x2a] = image[y1b:y2b, x1b:x2b]

offset_x = x1a - x1b
offset_y = y1a - y1b
boxes[:, 0] += offset_x
boxes[:, 1] += offset_y
boxes[:, 2] += offset_x
boxes[:, 3] += offset_y
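To see what the offset does, take the top-left tile (i == 0): the crop [x1b:x2b, y1b:y2b] of the source image is pasted at [x1a:x2a, y1a:y2a] of the mosaic, so every box moves by (x1a - x1b, y1a - y1b). A worked sketch with made-up numbers (s = 3000, xc = 1200, yc = 900); `tile_coords` and `shift_boxes` are hypothetical helpers that mirror the branches of the full code. The shift is applied to every box, including ones that lie outside the crop; those are only removed later by clipping and the size filter.

```python
import numpy as np


def tile_coords(i, s, xc, yc):
    """Return (mosaic region a, source crop b), each as (x1, y1, x2, y2), for tile i."""
    if i == 0:    # top left, taken from the source's bottom right
        return (0, 0, xc, yc), (s - xc, s - yc, s, s)
    if i == 1:    # top right, taken from the source's bottom left
        return (xc, 0, s, yc), (0, s - yc, s - xc, s)
    if i == 2:    # bottom left, taken from the source's top right
        return (0, yc, xc, s), (s - xc, 0, s, s - yc)
    if i == 3:    # bottom right, taken from the source's top left
        return (xc, yc, s, s), (0, 0, s - xc, s - yc)
    raise ValueError(f"tile index must be 0..3, got {i}")


def shift_boxes(boxes, a, b):
    """Translate pascal_voc boxes from source-image to mosaic coordinates."""
    offset_x = a[0] - b[0]
    offset_y = a[1] - b[1]
    return np.asarray(boxes) + np.array([offset_x, offset_y, offset_x, offset_y])


a, b = tile_coords(0, s=3000, xc=1200, yc=900)
print(a, b)  # (0, 0, 1200, 900) (1800, 2100, 3000, 3000)
print(shift_boxes([[2000, 2200, 2500, 2600]], a, b).tolist())  # [[200, 100, 700, 500]]
```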

Given this, how do I select the class labels that belong to the bounding boxes that end up in the mosaic? Please see the full code below:

def load_mosaic_image_and_boxes(self, index, s=3000,
                                    minfrac=0.25, maxfrac=0.75):
        self.mosaic_size = s
        # randint expects integer bounds
        xc, yc = np.random.randint(int(s * minfrac), int(s * maxfrac), (2,))

        # random other 3 sample
        indices = [index] + random.sample(range(len(self.image_ids)), 3)

        mosaic_image = np.zeros((s, s, 3), dtype=np.float32)
        final_boxes  = [] # box for the sub-region
        final_labels = [] # relevant class labels

        for i, index in enumerate(indices):
            image, boxes, labels = self.load_image_and_boxes(index)

            if i == 0:    # top left
                x1a, y1a, x2a, y2a =  0,  0, xc, yc
                x1b, y1b, x2b, y2b = s - xc, s - yc, s, s # from bottom right
            elif i == 1:  # top right
                x1a, y1a, x2a, y2a = xc, 0, s , yc
                x1b, y1b, x2b, y2b = 0, s - yc, s - xc, s # from bottom left
            elif i == 2:  # bottom left
                x1a, y1a, x2a, y2a = 0, yc, xc, s
                x1b, y1b, x2b, y2b = s - xc, 0, s, s-yc   # from top right
            elif i == 3:  # bottom right
                x1a, y1a, x2a, y2a = xc, yc,  s, s
                x1b, y1b, x2b, y2b = 0, 0, s-xc, s-yc    # from top left

            # calculate and apply box offsets due to replacement
            offset_x = x1a - x1b
            offset_y = y1a - y1b
            boxes[:, 0] += offset_x
            boxes[:, 1] += offset_y
            boxes[:, 2] += offset_x
            boxes[:, 3] += offset_y

            # cut image, save boxes
            mosaic_image[y1a:y2a, x1a:x2a] = image[y1b:y2b, x1b:x2b]
            final_boxes.append(boxes)

            '''
            ATTENTION:
            Need some mechanism to get relevant class labels
            '''
            final_labels.append(labels)

        # collect boxes
        final_boxes  = np.vstack(final_boxes)
        final_labels = np.hstack(final_labels)

        # clip boxes to the image area
        final_boxes[:, 0:] = np.clip(final_boxes[:, 0:], 0, s).astype(np.int32)
        w = (final_boxes[:,2] - final_boxes[:,0])
        h = (final_boxes[:,3] - final_boxes[:,1])

        # discard boxes whose width or height is below self.size_limit
        final_boxes = final_boxes[(w>=self.size_limit) & (h>=self.size_limit)]

        return mosaic_image, final_boxes, final_labels

That's it. I hope I've made my question clear. Any suggestions would be highly appreciated.

Along with this question, I've also updated a closely related question that I asked a few days ago but that didn't get much response; I've made it clearer as well. In case you're interested, here's the link: Stratified K-Fold For Multi-Class Object Detection?

Solution

`Solved -)`

The problem is solved. Initially I approached it in a much harder way than necessary; all I needed was to parse the bounding box and class label information at the same time. Jokes aside, I lost 100 bounty points >_<. I should have given it one more try.
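One way to read "parse the bounding box and class label information at the same time": carry the class id as a fifth column next to [x1, y1, x2, y2], so every shift, clip and filter acts on box and label together and they cannot drift apart. The sketch below is a standalone rework under stated assumptions, not the author's notebook code: each sample is an (image, boxes, labels) tuple whose image is already s x s x 3 (as the crop indices in the question imply), xc and yc are passed in instead of drawn randomly (for reproducibility), and boxes are clipped to their own tile rather than the whole canvas (a deliberate change so a box cannot spill into a neighbouring tile). `mosaic_with_labels` is a hypothetical name.

```python
import numpy as np


def mosaic_with_labels(samples, s, xc, yc, size_limit=1):
    """Build an s x s mosaic from 4 samples and return (image, boxes, labels).

    Each sample is (image, boxes, labels): image is s x s x 3, boxes is (N, 4)
    pascal_voc [x1, y1, x2, y2], labels is (N,) integer class ids.
    """
    mosaic = np.zeros((s, s, 3), dtype=np.float32)
    rows = []  # one (N, 5) array per tile: [x1, y1, x2, y2, label]
    regions = [
        ((0, 0, xc, yc), (s - xc, s - yc, s, s)),  # top left     <- source bottom right
        ((xc, 0, s, yc), (0, s - yc, s - xc, s)),  # top right    <- source bottom left
        ((0, yc, xc, s), (s - xc, 0, s, s - yc)),  # bottom left  <- source top right
        ((xc, yc, s, s), (0, 0, s - xc, s - yc)),  # bottom right <- source top left
    ]
    for (image, boxes, labels), (a, b) in zip(samples, regions):
        x1a, y1a, x2a, y2a = a
        x1b, y1b, x2b, y2b = b
        mosaic[y1a:y2a, x1a:x2a] = image[y1b:y2b, x1b:x2b]

        # Parse boxes and labels together: the label becomes column 4
        # (float32 holds integer ids exactly up to 2**24).
        boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
        labels = np.asarray(labels, dtype=np.float32).reshape(-1, 1)
        joint = np.hstack([boxes, labels])

        # Shift into mosaic coordinates, then clip to this tile's region.
        dx, dy = x1a - x1b, y1a - y1b
        joint[:, [0, 2]] = np.clip(joint[:, [0, 2]] + dx, x1a, x2a)
        joint[:, [1, 3]] = np.clip(joint[:, [1, 3]] + dy, y1a, y2a)
        rows.append(joint)

    joint = np.vstack(rows)
    w = joint[:, 2] - joint[:, 0]
    h = joint[:, 3] - joint[:, 1]
    joint = joint[(w >= size_limit) & (h >= size_limit)]  # filters boxes and labels in one go
    return mosaic, joint[:, :4], joint[:, 4].astype(np.int64)


s, xc, yc = 100, 40, 60
blank = np.zeros((s, s, 3), dtype=np.float32)
samples = [
    (blank, [[70, 50, 90, 80]], [4]),  # inside the crop shown in the top-left tile: kept
    (blank, [[80, 80, 95, 95]], [2]),  # outside the crop used for the top-right tile: dropped
    (blank, np.zeros((0, 4)), []),     # a sample with no boxes
    (blank, [[10, 5, 50, 30]], [1]),   # inside the crop shown in the bottom-right tile: kept
]
img, out_boxes, out_labels = mosaic_with_labels(samples, s, xc, yc)
print(out_boxes.tolist())   # [[10.0, 10.0, 30.0, 40.0], [50.0, 65.0, 90.0, 90.0]]
print(out_labels.tolist())  # [4, 1]
```

The returned labels can then be used in `__getitem__` through the commented-out multi-class line (`torch.tensor(labels, dtype=torch.int64)`) instead of the single-class `torch.ones(...)`.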

Anyway, below is the output we get now. If you'd like to try it on your own data set, here is the Colab notebook as a starter. Happy coding -)
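If you'd rather keep the question's `load_mosaic_image_and_boxes` as it is, a smaller change is enough: right after `np.vstack`/`np.hstack`, `final_boxes` and `final_labels` are still row-aligned, and only the final size filter breaks that, because its mask is applied to `final_boxes` alone. A minimal sketch, assuming the same canvas clipping and `size_limit` threshold as the question's code, that computes the mask once and applies it to both arrays (`filter_boxes_and_labels` is a hypothetical helper):

```python
import numpy as np


def filter_boxes_and_labels(boxes, labels, s, size_limit=1):
    """Clip boxes to the s x s canvas and drop tiny ones, keeping labels row-aligned."""
    boxes = np.clip(np.asarray(boxes, dtype=np.float32), 0, s)
    labels = np.asarray(labels)
    assert len(boxes) == len(labels), "boxes and labels must be row-aligned"
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    keep = (w >= size_limit) & (h >= size_limit)
    # The same boolean mask indexes both arrays, so row k still matches row k.
    return boxes[keep], labels[keep]


boxes = [[200, 100, 700, 500],      # inside the canvas: kept
         [-900, -50, -100, 400],    # left of the canvas: width 0 after clipping, dropped
         [2500, 2800, 3400, 3300]]  # partly outside: clipped to [2500, 2800, 3000, 3000]
labels = [4, 2, 1]
kept_boxes, kept_labels = filter_boxes_and_labels(boxes, labels, s=3000)
print(kept_labels.tolist())  # [4, 1]
```

Inside the question's function this corresponds to computing `keep` once and using `final_boxes[keep]` and `final_labels[keep]` in the return statement.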
