pipert.utils.structures¶
Submodules¶
Package Contents¶
Classes¶
Boxes | This structure stores a list of boxes as a Nx4 torch.Tensor.
BoxMode | Enum of different ways to represent a box.
ImageList | Structure that holds a list of images (of possibly varying sizes) as a single tensor.
Instances | This class represents a list of instances in an image.
Keypoints | Stores keypoint annotation data. GT Instances have a gt_keypoints property containing the x,y location and visibility flag of each keypoint.
PolygonMasks | This class stores the segmentation masks for all objects in one image, in the form of polygons.
Functions¶
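pairwise_iou(boxes1, boxes2) | Given two lists of boxes of size N and M, compute the IoU (intersection over union) between all N x M pairs of boxes.
heatmaps_to_keypoints(maps, rois) | Extract predicted keypoint locations from heatmaps.
rasterize_polygons_within_box(polygons, box, mask_size) | Rasterize the polygons into a mask image and crop the mask content in the given box.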
-
class
pipert.utils.structures.Boxes(tensor: torch.Tensor)[source]¶ This structure stores a list of boxes as a Nx4 torch.Tensor. It supports some common box methods (area, clip, nonempty, etc.), and also behaves like a Tensor (supports indexing, to(device), .device, and iteration over all boxes).
- Attributes:
tensor: float matrix of Nx4.
-
BoxSizeType¶
-
clone(self) → pipert.utils.structures.boxes.Boxes¶ Clone the Boxes.
- Returns:
Boxes
-
to(self, device: str) → pipert.utils.structures.boxes.Boxes¶
-
area(self) → torch.Tensor¶ Computes the area of all the boxes.
- Returns:
torch.Tensor: a vector with areas of each box.
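As a rough illustration of the computation, the sketch below works on plain Python tuples instead of the tensor-backed Boxes. It assumes each row is laid out as (x0, y0, x1, y1); the helper name box_areas is hypothetical and not part of this package.

```python
def box_areas(boxes):
    """Return width * height for each (x0, y0, x1, y1) box."""
    return [(x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes]


# Two example boxes: a 4x3 box and a 1x4 box.
areas = box_areas([(0.0, 0.0, 4.0, 3.0), (1.0, 1.0, 2.0, 5.0)])
```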
-
clip(self, box_size: BoxSizeType) → None¶ Clip (in place) the boxes by limiting x coordinates to the range [0, width] and y coordinates to the range [0, height].
- Args:
box_size (height, width): The clipping box’s size.
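The clipping rule can be sketched on plain Python lists as follows. This is only an illustration of the documented ranges, assuming (x0, y0, x1, y1) rows; clip_boxes is a hypothetical helper, not the method itself.

```python
def clip_boxes(boxes, box_size):
    """Clip (in place) x coordinates to [0, width] and y coordinates to [0, height]."""
    height, width = box_size  # note the (height, width) order
    for box in boxes:
        box[0] = min(max(box[0], 0.0), float(width))
        box[1] = min(max(box[1], 0.0), float(height))
        box[2] = min(max(box[2], 0.0), float(width))
        box[3] = min(max(box[3], 0.0), float(height))


boxes = [[-5.0, 10.0, 120.0, 90.0]]
clip_boxes(boxes, (80, 100))  # height=80, width=100
```

Like the method it mirrors, the sketch modifies its input and returns nothing.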
-
nonempty(self, threshold: int = 0) → torch.Tensor¶ Find boxes that are non-empty. A box is considered empty if either of its sides is no larger than threshold.
- Returns:
- Tensor:
a binary vector which represents whether each box is empty (False) or non-empty (True).
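A minimal sketch of the emptiness test, again on plain tuples and assuming (x0, y0, x1, y1) rows. The helper name nonempty_flags is hypothetical.

```python
def nonempty_flags(boxes, threshold=0):
    """True where both the width and the height are strictly larger than threshold."""
    return [
        (x1 - x0) > threshold and (y1 - y0) > threshold
        for x0, y0, x1, y1 in boxes
    ]


# The second box has zero width, so it counts as empty.
mask = nonempty_flags([(0, 0, 4, 3), (2, 2, 2, 6)])
```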
-
__getitem__(self, item: Union[int, slice, torch.BoolTensor]) → pipert.utils.structures.boxes.Boxes¶
- Returns:
Boxes: Create a new Boxes by indexing.
The following usages are allowed:
1. new_boxes = boxes[3]: return a Boxes which contains only one box.
2. new_boxes = boxes[2:10]: return a slice of boxes.
3. new_boxes = boxes[vector], where vector is a torch.BoolTensor with length = len(boxes). Nonzero elements in the vector will be selected.
Note that the returned Boxes might share storage with this Boxes, subject to PyTorch’s indexing semantics.
-
inside_box(self, box_size: BoxSizeType, boundary_threshold: int = 0) → torch.Tensor¶
- Args:
box_size (height, width): Size of the reference box.
boundary_threshold (int): Boxes that extend beyond the reference box boundary by more than boundary_threshold are considered “outside”.
- Returns:
a binary vector, indicating whether each box is inside the reference box.
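The inside/outside test can be sketched as below. This is an assumption-laden illustration: it uses plain tuples in (x0, y0, x1, y1) order, the helper name inside_box_flags is hypothetical, and whether a box lying exactly on the tolerated boundary counts as inside may differ in the actual implementation.

```python
def inside_box_flags(boxes, box_size, boundary_threshold=0):
    """True where a box stays within the reference box, allowing a tolerance."""
    height, width = box_size
    t = boundary_threshold
    return [
        x0 >= -t and y0 >= -t and x1 <= width + t and y1 <= height + t
        for x0, y0, x1, y1 in boxes
    ]


flags = inside_box_flags(
    [(0, 0, 50, 50), (-3, 0, 50, 50), (90, 90, 104, 99)],
    (100, 100),
    boundary_threshold=2,
)
```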
-
get_centers(self) → torch.Tensor¶ Returns: The box centers in a Nx2 array of (x, y).
-
scale(self, scale_x: float, scale_y: float) → None¶ Scale the box with horizontal and vertical scaling factors.
-
static
cat(boxes_list: List[‘Boxes’]) → pipert.utils.structures.boxes.Boxes¶ Concatenates a list of Boxes into a single Boxes
- Arguments:
boxes_list (list[Boxes])
- Returns:
Boxes: the concatenated Boxes
-
__iter__(self) → Iterator[torch.Tensor]¶ Yield a box as a Tensor of shape (4,) at a time.
-
class
pipert.utils.structures.BoxMode[source]¶ Bases: enum.Enum
Enum of different ways to represent a box.
- Attributes:
XYXY_ABS: (x0, y0, x1, y1) in absolute floating-point coordinates. The coordinates are in the range [0, width or height].
XYWH_ABS: (x0, y0, w, h) in absolute floating-point coordinates.
XYXY_REL: (x0, y0, x1, y1) in range [0, 1]. They are relative to the size of the image.
XYWH_REL: (x0, y0, w, h) in range [0, 1]. They are relative to the size of the image.
-
XYXY_ABS= 0¶
-
XYWH_ABS= 1¶
-
XYXY_REL= 2¶
-
XYWH_REL= 3¶
-
static
convert(box: _RawBoxType, from_mode: pipert.utils.structures.boxes.BoxMode, to_mode: pipert.utils.structures.boxes.BoxMode) → _RawBoxType¶
- Args:
box: can be a 4-tuple, 4-list or a Nx4 array/tensor.
from_mode, to_mode (BoxMode)
- Returns:
The converted box of the same type.
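To make the four representations concrete, here is a small sketch of the arithmetic that relates them. The function names are hypothetical and this is not the convert implementation; the relative form assumes x values are divided by the image width and y values by the image height.

```python
def xyxy_to_xywh(box):
    """(x0, y0, x1, y1) -> (x0, y0, w, h)."""
    x0, y0, x1, y1 = box
    return (x0, y0, x1 - x0, y1 - y0)


def xywh_to_xyxy(box):
    """(x0, y0, w, h) -> (x0, y0, x1, y1)."""
    x0, y0, w, h = box
    return (x0, y0, x0 + w, y0 + h)


def abs_to_rel(box, image_size):
    """Divide x-like values by width and y-like values by height (an assumption)."""
    height, width = image_size
    a, b, c, d = box
    return (a / width, b / height, c / width, d / height)


xywh = xyxy_to_xywh((10.0, 20.0, 50.0, 80.0))
rel = abs_to_rel((10.0, 20.0, 50.0, 80.0), (100, 200))  # height=100, width=200
```

Because widths and heights scale the same way as coordinates, abs_to_rel applies equally to the XYXY and XYWH layouts.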
-
pipert.utils.structures.pairwise_iou(boxes1: pipert.utils.structures.boxes.Boxes, boxes2: pipert.utils.structures.boxes.Boxes) → torch.Tensor[source]¶ Given two lists of boxes of size N and M, compute the IoU (intersection over union) between all N x M pairs of boxes. The box order must be (xmin, ymin, xmax, ymax).
- Args:
boxes1, boxes2 (Boxes): two Boxes, containing N and M boxes, respectively.
- Returns:
Tensor: IoU, sized [N,M].
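The quantity being computed can be sketched in plain Python as follows. This is an unvectorized illustration on tuples in (xmin, ymin, xmax, ymax) order, not the tensor implementation; the name pairwise_iou_sketch and the choice to return 0.0 for a zero-area union are assumptions.

```python
def pairwise_iou_sketch(boxes1, boxes2):
    """Return an N x M nested list of IoU values."""

    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            # Overlap extent along each axis, clamped at zero when disjoint.
            iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
            ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = iw * ih
            union = area(a) + area(b) - inter
            row.append(inter / union if union > 0 else 0.0)
        result.append(row)
    return result


iou = pairwise_iou_sketch(
    [(0, 0, 2, 2)],
    [(1, 1, 3, 3), (0, 0, 2, 2), (5, 5, 6, 6)],
)
```

The three columns cover a partial overlap (intersection 1, union 7), an identical box, and a disjoint box.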
-
class
pipert.utils.structures.ImageList(tensor: torch.Tensor, image_sizes: List[Tuple[int, int]])[source]¶ Bases: object
Structure that holds a list of images (of possibly varying sizes) as a single tensor. This works by padding the images to the same size, and storing in a field the original sizes of each image.
- Attributes:
image_sizes (list[tuple[int, int]]): each tuple is (h, w)
-
__getitem__(self, idx: Union[int, slice]) → torch.Tensor¶ Access the individual image in its original size.
- Returns:
Tensor: an image of shape (H, W) or (C_1, …, C_K, H, W) where K >= 1
-
to(self, *args: Any, **kwargs: Any) → pipert.utils.structures.image_list.ImageList¶
-
static
from_tensors(tensors: Sequence[torch.Tensor], size_divisibility: int = 0, pad_value: float = 0.0) → pipert.utils.structures.image_list.ImageList¶
- Args:
tensors: a tuple or list of torch.Tensors, each of shape (Hi, Wi) or (C_1, …, C_K, Hi, Wi) where K >= 1. The Tensors will be padded with pad_value so that they will have the same shape.
size_divisibility (int): If size_divisibility > 0, also adds padding to ensure the common height and width are divisible by size_divisibility.
pad_value (float): value to pad with.
- Returns:
an ImageList.
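The common padded shape can be sketched as below. This only computes the target height and width from the individual (h, w) sizes; where the padding is placed and how the tensors are copied is not shown. The helper name padded_size is hypothetical.

```python
def padded_size(image_sizes, size_divisibility=0):
    """Return the (height, width) every image would be padded to."""
    max_h = max(h for h, _ in image_sizes)
    max_w = max(w for _, w in image_sizes)
    if size_divisibility > 0:
        d = size_divisibility
        # Round each dimension up to the next multiple of d.
        max_h = (max_h + d - 1) // d * d
        max_w = (max_w + d - 1) // d * d
    return max_h, max_w


plain = padded_size([(480, 640), (500, 333)])
aligned = padded_size([(480, 640), (500, 333)], size_divisibility=32)
```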
-
class
pipert.utils.structures.Instances(image_size: Tuple[int, int], **kwargs: Any)[source]¶ This class represents a list of instances in an image. It stores the attributes of instances (e.g., boxes, masks, labels, scores) as “fields”. All fields must have the same __len__ which is the number of instances.
All other (non-field) attributes of this class are considered private: they must start with ‘_’ and are not modifiable by a user.
Some basic usage:
1. Set/Get a field:
   instances.gt_boxes = Boxes(…)
   print(instances.pred_masks)
   print('gt_masks' in instances)
2. len(instances) returns the number of instances.
3. Indexing: instances[indices] applies the indexing to all the fields and returns a new Instances. Typically, indices is a binary vector of length num_instances, or a vector of integer indices.
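The field behavior described above can be mimicked with a small stand-in class. This is a simplified sketch using plain lists and a boolean mask, assuming nothing about the real class beyond what is documented here; FieldContainerSketch and its get method are hypothetical names.

```python
class FieldContainerSketch:
    """Toy container: every field must have the same length."""

    def __init__(self, image_size, **fields):
        self._image_size = image_size
        self._fields = {}
        for name, value in fields.items():
            self.set(name, value)

    def set(self, name, value):
        # New fields must agree in length with the existing ones.
        if self._fields and len(value) != len(self):
            raise ValueError("field length does not match number of instances")
        self._fields[name] = value

    def get(self, name):
        return self._fields[name]

    def __len__(self):
        for value in self._fields.values():
            return len(value)
        return 0

    def __contains__(self, name):
        return name in self._fields

    def __getitem__(self, mask):
        # Apply the same boolean mask to every field.
        picked = {
            name: [v for v, keep in zip(values, mask) if keep]
            for name, values in self._fields.items()
        }
        return FieldContainerSketch(self._image_size, **picked)


inst = FieldContainerSketch(
    (480, 640), scores=[0.9, 0.2, 0.7], labels=["cat", "dog", "cat"]
)
kept = inst[[True, False, True]]
```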
-
set(self, name: str, value: Any) → None¶ Set the field named name to value. The length of value must be the number of instances, and must agree with other existing fields in this object.
-
get_fields(self) → Dict[str, Any]¶ - Returns:
dict: a dict which maps names (str) to data of the fields
Modifying the returned dict will modify this instance.
-
to(self, device: str) → pipert.utils.structures.instances.Instances¶ Returns: Instances: all fields are called with a to(device), if the field has this method.
-
__getitem__(self, item: Union[int, slice, torch.BoolTensor]) → pipert.utils.structures.instances.Instances¶
- Args:
item: an index-like object that will be used to index all the fields.
- Returns:
If item is a string, return the data in the corresponding field. Otherwise, returns an Instances where all fields are indexed by item.
-
static
cat(instance_lists: List[‘Instances’]) → pipert.utils.structures.instances.Instances¶ - Args:
instance_lists (list[Instances])
- Returns:
Instances
-
class
pipert.utils.structures.Keypoints(keypoints: Union[torch.Tensor, np.ndarray, List[List[float]]])[source]¶ Stores keypoint annotation data. GT Instances have a gt_keypoints property containing the x,y location and visibility flag of each keypoint. This tensor has shape (N, K, 3) where N is the number of instances and K is the number of keypoints per instance.
The visibility flag follows the COCO format and must be one of three integers:
* v=0: not labeled (in which case x=y=0)
* v=1: labeled but not visible
* v=2: labeled and visible
-
to(self, *args: Any, **kwargs: Any) → pipert.utils.structures.keypoints.Keypoints¶
-
to_heatmap(self, boxes: torch.Tensor, heatmap_size: int) → torch.Tensor¶ - Arguments:
boxes: Nx4 tensor, the boxes to draw the keypoints to
- Returns:
- heatmaps:
A tensor of shape (N, K) containing an integer spatial label in the range [0, heatmap_size**2 - 1] for each keypoint in the input.
- valid:
A tensor of shape (N, K) containing whether each keypoint is in the roi or not.
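One way to picture the integer spatial label is as a flattened bin index inside the box. The sketch below is an assumption rather than the documented algorithm: it maps the keypoint into a heatmap_size x heatmap_size grid over the box and flattens in row-major order (row * heatmap_size + col). It ignores the visibility flag, and keypoint_to_label is a hypothetical name.

```python
import math


def keypoint_to_label(x, y, box, heatmap_size):
    """Return (label, valid) for one keypoint and one (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    col = math.floor((x - x0) * heatmap_size / (x1 - x0))
    row = math.floor((y - y0) * heatmap_size / (y1 - y0))
    # Points exactly on the right/bottom edge fall into the last bin.
    if x == x1:
        col = heatmap_size - 1
    if y == y1:
        row = heatmap_size - 1
    valid = 0 <= col < heatmap_size and 0 <= row < heatmap_size
    # Row-major flattening (assumed); invalid keypoints get label 0.
    label = row * heatmap_size + col if valid else 0
    return label, valid


inside = keypoint_to_label(3.0, 7.0, (0.0, 0.0, 10.0, 10.0), 5)
outside = keypoint_to_label(12.0, 5.0, (0.0, 0.0, 10.0, 10.0), 5)
corner = keypoint_to_label(10.0, 10.0, (0.0, 0.0, 10.0, 10.0), 5)
```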
-
__getitem__(self, item: Union[int, slice, torch.BoolTensor]) → pipert.utils.structures.keypoints.Keypoints¶ Create a new Keypoints by indexing on this Keypoints.
The following usages are allowed:
new_kpts = kpts[3]: return a Keypoints which contains only one instance.
new_kpts = kpts[2:10]: return a slice of key points.
new_kpts = kpts[vector], where vector is a torch.BoolTensor with length = len(kpts). Nonzero elements in the vector will be selected.
Note that the returned Keypoints might share storage with this Keypoints, subject to PyTorch’s indexing semantics.
-
pipert.utils.structures.heatmaps_to_keypoints(maps: torch.Tensor, rois: torch.Tensor) → torch.Tensor[source]¶ Extract predicted keypoint locations from heatmaps. Output has shape (#rois, #keypoints, 4) with the last dimension corresponding to (x, y, logit, prob) for each keypoint.
- Args:
maps (Tensor): (#ROIs, #keypoints, POOL_H, POOL_W)
rois (Tensor): (#ROIs, 4)
Converts a discrete image coordinate in an NxN image to a continuous keypoint coordinate. We maintain consistency with keypoints_to_heatmap by using the conversion from Heckbert 1990: c = d + 0.5, where d is a discrete coordinate and c is a continuous coordinate.
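The c = d + 0.5 conversion places each continuous coordinate at the center of its discrete cell. A minimal sketch follows; the inverse shown (d = floor(c)) is the usual companion of this convention and is included as an assumption, and both function names are hypothetical.

```python
import math


def discrete_to_continuous(d):
    """Center of discrete cell d on the continuous axis: c = d + 0.5."""
    return d + 0.5


def continuous_to_discrete(c):
    """Assumed inverse: the cell that contains continuous coordinate c."""
    return math.floor(c)


centers = [discrete_to_continuous(d) for d in range(4)]
round_trip = [continuous_to_discrete(c) for c in centers]
```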
-
class
pipert.utils.structures.PolygonMasks(polygons: List[List[Union[torch.Tensor, np.ndarray]]])[source]¶ This class stores the segmentation masks for all objects in one image, in the form of polygons.
- Attributes:
polygons: list[list[ndarray]]. Each ndarray is a float64 vector representing a polygon.
-
to(self, *args: Any, **kwargs: Any) → pipert.utils.structures.masks.PolygonMasks¶
-
get_bounding_boxes(self) → pipert.utils.structures.boxes.Boxes¶ Returns: Boxes: tight bounding boxes around polygon masks.
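A tight box around one instance can be sketched as the min/max over all of its polygon vertices. The sketch assumes each polygon vector is flattened as [x0, y0, x1, y1, ...], as in COCO-style annotations; that layout and the helper name polygons_bounding_box are assumptions.

```python
def polygons_bounding_box(polygons):
    """Return (xmin, ymin, xmax, ymax) covering all polygons of one instance."""
    xs = [v for poly in polygons for v in poly[0::2]]  # even positions: x
    ys = [v for poly in polygons for v in poly[1::2]]  # odd positions: y
    return (min(xs), min(ys), max(xs), max(ys))


# One instance made of two triangles.
bbox = polygons_bounding_box(
    [
        [1.0, 2.0, 5.0, 2.0, 3.0, 6.0],
        [0.0, 4.0, 2.0, 4.0, 2.0, 7.0],
    ]
)
```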
-
nonempty(self) → torch.Tensor¶ Find masks that are non-empty.
- Returns:
- Tensor:
a BoolTensor which represents whether each mask is empty (False) or not (True).
-
__getitem__(self, item: Union[int, slice, List[int], torch.BoolTensor]) → pipert.utils.structures.masks.PolygonMasks¶ Support indexing over the instances and return a PolygonMasks object. item can be:
An integer. It will return an object with only one instance.
A slice. It will return an object with the selected instances.
A list[int]. It will return an object with the selected instances, corresponding to the indices in the list.
A vector mask of type BoolTensor, whose length is num_instances. It will return an object with the instances whose mask is nonzero.
-
__iter__(self) → Iterator[List[torch.Tensor]]¶ Yields: list[ndarray]: the polygons for one instance. Each Tensor is a float64 vector representing a polygon.
-
crop_and_resize(self, boxes: torch.Tensor, mask_size: int) → torch.Tensor¶ Crop each mask by the given box, and resize results to (mask_size, mask_size). This can be used to prepare training targets for Mask R-CNN.
- Args:
boxes (Tensor): Nx4 tensor storing the boxes for each mask
mask_size (int): the size of the rasterized mask.
- Returns:
Tensor: A bool tensor of shape (N, mask_size, mask_size), where N is the number of predicted boxes for this image.
-
pipert.utils.structures.rasterize_polygons_within_box(polygons: List[np.ndarray], box: numpy.ndarray, mask_size: int) → torch.Tensor[source]¶ Rasterize the polygons into a mask image and crop the mask content in the given box. The cropped mask is resized to (mask_size, mask_size).
This function is used when generating training targets for mask head in Mask R-CNN. Given original ground-truth masks for an image, new ground-truth mask training targets in the size of mask_size x mask_size must be provided for each predicted box. This function will be called to produce such targets.
- Args:
polygons (list[ndarray[float]]): a list of polygons, which represents an instance.
box: 4-element numpy array
mask_size (int): the side length of the output mask; the cropped mask is resized to (mask_size, mask_size).
- Returns:
Tensor: BoolTensor of shape (mask_size, mask_size)
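Before any rasterization, the polygon has to be expressed in the coordinate frame of the cropped and resized mask. The sketch below shows only that coordinate transform, under the assumptions that polygons are flattened as [x0, y0, x1, y1, ...], that the box is (x0, y0, x1, y1), and that the box has positive width and height. The rasterization step itself is omitted, and polygon_to_mask_coords is a hypothetical name.

```python
def polygon_to_mask_coords(polygon, box, mask_size):
    """Shift a flat polygon to the box origin and scale it to mask_size."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0  # assumed to be positive
    out = []
    for i in range(0, len(polygon), 2):
        out.append((polygon[i] - x0) * mask_size / width)
        out.append((polygon[i + 1] - y0) * mask_size / height)
    return out


# A triangle spanning the whole box, mapped into a 4x4 mask frame.
scaled = polygon_to_mask_coords(
    [10.0, 10.0, 30.0, 10.0, 30.0, 50.0], (10.0, 10.0, 30.0, 50.0), 4
)
```

After this step every vertex lies in [0, mask_size] along both axes, so a rasterizer can fill a mask_size x mask_size grid directly.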