How to use Mask RCNN

Installing IceVision

We usually install IceVision with [all], but we can also use [inference] to install only the packages that the inference methods depend on.

!pip install icevision[all] icedata

Imports

from icevision.all import *

Data

We'll be using the Penn-Fudan dataset, which is already available under datasets.

data_dir = icedata.pennfudan.load_data()
class_map = icedata.pennfudan.class_map()

As usual, let's create the parser and perform a random data split.

parser = icedata.pennfudan.parser(data_dir)

train_records, valid_records = parser.parse()
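To make the random split concrete, here is a minimal standard-library sketch of the idea. It assumes an 80/20 split, which I believe is the default when parse() is called without a splitter, and it uses the 170 images of Penn-Fudan. The random_split helper is hypothetical and is not part of IceVision.

```python
import random

def random_split(record_ids, probs=(0.8, 0.2), seed=42):
    """Shuffle ids and cut them into consecutive groups sized by `probs`."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    splits, start = [], 0
    for i, p in enumerate(probs):
        # The last group takes whatever remains, so no id is dropped.
        end = len(ids) if i == len(probs) - 1 else start + round(p * len(ids))
        splits.append(ids[start:end])
        start = end
    return splits

# Assumption: 170 records split 80/20 into training and validation ids.
train_ids, valid_ids = random_split(range(170))
print(len(train_ids), len(valid_ids))
```

Fixing the seed makes the split reproducible, which helps when comparing runs.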


Let's use the usual aug_tfms for training transforms, with two small modifications:

- Decrease the rotation limit from 45 to 10.
- Use a more aggressive crop function.

shift_scale_rotate = tfms.A.ShiftScaleRotate(rotate_limit=10)
crop_fn = partial(tfms.A.RandomSizedCrop, min_max_height=(384//2, 384), p=.5)
train_tfms = tfms.A.Adapter(
    [
        *tfms.A.aug_tfms(size=384, presize=512, shift_scale_rotate=shift_scale_rotate, crop_fn=crop_fn),
        tfms.A.Normalize(),
    ]
)
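Note that crop_fn is passed as a partial rather than as a ready-made transform, presumably so that aug_tfms can supply the output height and width itself. The sketch below shows that pattern with a hypothetical stand-in class (FakeRandomSizedCrop is not an Albumentations class, it only stores its arguments).

```python
from functools import partial

class FakeRandomSizedCrop:
    """Hypothetical stand-in for a crop transform; it only stores its arguments."""

    def __init__(self, min_max_height, height, width, p=1.0):
        self.min_max_height = min_max_height
        self.height, self.width, self.p = height, width, p

# Bind the options we care about now ...
crop_fn = partial(FakeRandomSizedCrop, min_max_height=(384 // 2, 384), p=0.5)

# ... and let the caller fill in the output size later.
crop = crop_fn(height=384, width=384)
print(crop.min_max_height, crop.p)
```

With min_max_height=(384 // 2, 384), the crop window height ranges from 192 to 384 pixels before it is resized to the output size.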

And for validation transforms, the simple resize_and_pad.

valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(size=384), tfms.A.Normalize()])
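As a rough illustration of what a resize-and-pad step does, the sketch below computes the geometry by hand. It assumes the longest side is scaled to the target size and the shorter side is padded evenly. The function name is hypothetical, and the exact rounding and padding placement in the library may differ.

```python
def resize_and_pad_geometry(height, width, size):
    """Return the resized (h, w) and the (top, bottom, left, right) padding."""
    # Scale so that the longest side matches the target size.
    scale = size / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # Pad the remaining space, split as evenly as possible.
    pad_h, pad_w = size - new_h, size - new_w
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)

# Example: a 300x500 (height x width) image brought to a 384x384 canvas.
resized, padding = resize_and_pad_geometry(height=300, width=500, size=384)
print(resized, padding)
```

Because the aspect ratio is preserved, no cropping happens, which is why this transform suits validation.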

Now we can create the Dataset and take a look at how the images look after the transforms.

train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)

samples = [train_ds[1] for _ in range(6)]
show_samples(samples, denormalize_fn=denormalize_imagenet, ncols=3, display_label=False, show=True)
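The denormalize_fn argument is needed because the images were normalized by the transforms, so they have to be mapped back to the 0-255 range before display. A minimal per-pixel sketch, assuming the standard ImageNet mean and standard deviation (the helper names here are illustrative, not IceVision functions):

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize(pixel):
    """Map an RGB pixel in 0-255 to zero-centered, unit-scaled values."""
    return tuple((c / 255 - m) / s for c, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))

def denormalize(pixel):
    """Invert normalize() and round back to integer 0-255 values."""
    return tuple(round((c * s + m) * 255) for c, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))

rgb = (124, 116, 104)
restored = denormalize(normalize(rgb))
print(restored)
```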


Now we're ready to create the DataLoaders:

train_dl = mask_rcnn.train_dl(train_ds, batch_size=16, shuffle=True, num_workers=4)
valid_dl = mask_rcnn.valid_dl(valid_ds, batch_size=16, shuffle=False, num_workers=4)
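Model-specific DataLoader helpers are useful because detection batches cannot be built by stacking everything into one tensor: each image carries a different number of objects. The sketch below illustrates the general idea in plain Python. It is not IceVision's actual collate code, and collate_detection and the sample data are hypothetical.

```python
def collate_detection(samples):
    """Group per-image (image, target) pairs into two parallel lists."""
    images = [image for image, _ in samples]
    targets = [target for _, target in samples]
    return images, targets

# Placeholder samples: strings stand in for image tensors.
samples = [
    ("image_0", {"boxes": [[10, 20, 50, 90]], "labels": [1]}),
    ("image_1", {"boxes": [[5, 5, 40, 80], [60, 10, 95, 85]], "labels": [1, 1]}),
]

images, targets = collate_detection(samples)
# Each target keeps its own number of boxes.
print([len(t["boxes"]) for t in targets])
```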

Metrics

Metrics for Mask RCNN are still a work in progress, so the line below is left commented out.

# metrics = [COCOMetric(COCOMetricType.mask)]
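For intuition about what a mask metric measures, the sketch below computes the intersection over union (IoU) of two small binary masks. COCO-style metrics go further and average precision over several IoU thresholds. This only shows the building block, and mask_iou is an illustrative helper, not a library function.

```python
def mask_iou(pred, truth):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    pairs = [(p, t) for pred_row, truth_row in zip(pred, truth)
             for p, t in zip(pred_row, truth_row)]
    intersection = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    # Two empty masks have no union; report 0.0 instead of dividing by zero.
    return intersection / union if union else 0.0

pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 0],
         [0, 1, 0]]

iou = mask_iou(pred, truth)
print(iou)
```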

Model

As with faster_rcnn, we only need num_classes to create a Mask RCNN model.

model = mask_rcnn.model(num_classes=len(class_map))
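Keep in mind that the class count includes the background. The sketch below uses a plain list as a hypothetical stand-in for the class map, assuming index 0 is reserved for background and Penn-Fudan contributes a single person class.

```python
# Hypothetical stand-in for a class map: index 0 is reserved for background.
class_names = ["background", "person"]
name_to_id = {name: idx for idx, name in enumerate(class_names)}

# This is the value that would be passed as num_classes.
num_classes = len(class_names)
print(num_classes, name_to_id["person"])
```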

Training - fastai

We just need to create the learner and fine-tune.

Optional: once the learner has been created, you can use learn.lr_find() to find a good learning rate.

learn = mask_rcnn.fastai.learner(dls=[train_dl, valid_dl], model=model)

learn.fine_tune(10, 5e-4, freeze_epochs=2)

epoch	train_loss	valid_loss	time
0	1.609195	0.804607	00:19
1	1.115979	0.536064	00:15

epoch	train_loss	valid_loss	time
0	0.777911	0.435238	00:22
1	0.618750	0.363259	00:19
2	0.555032	0.341701	00:18
3	0.507919	0.335720	00:19
4	0.486197	0.330338	00:19
5	0.450105	0.278701	00:19
6	0.426801	0.280237	00:18
7	0.417322	0.296473	00:16
8	0.411688	0.289382	00:17
9	0.401172	0.283575	00:17
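The two tables above correspond to the two phases of fine_tune: first the frozen epochs, then the remaining epochs with all layers unfrozen. The sketch below describes that schedule, based on my reading of fastai's fine_tune (the learning rate is halved before unfreezing and spread over a range using lr_mult). Treat the details as an assumption, since they may differ between fastai versions. fine_tune_plan is a hypothetical helper.

```python
def fine_tune_plan(epochs, base_lr, freeze_epochs=1, lr_mult=100):
    """Describe the two training phases of a fastai-style fine_tune call."""
    # Phase 1: only the head is trained, at the base learning rate.
    frozen = {"epochs": freeze_epochs, "frozen": True, "max_lr": base_lr}
    # Phase 2: everything is trained, with a halved learning rate that
    # ranges from base_lr / lr_mult (early layers) to base_lr (head).
    base_lr /= 2
    unfrozen = {"epochs": epochs, "frozen": False,
                "lr_range": (base_lr / lr_mult, base_lr)}
    return [frozen, unfrozen]

plan = fine_tune_plan(10, 5e-4, freeze_epochs=2)
for phase in plan:
    print(phase)
```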

Visualize predictions

Let's grab some images from valid_ds to visualize. For more info on how to do inference, check the inference tutorial.

mask_rcnn.show_results(model, valid_ds, class_map=class_map)


Happy Learning!

If you need any assistance, feel free to join our forum.