Protocol Documentation
easy_vision/python/protos/anchor_generator.proto
AnchorGenerator
Configuration proto for the anchor generator to use in the object detection
pipeline. See core/anchor_generator.py for details.
GridAnchorGenerator
Configuration proto for GridAnchorGenerator. See
anchor_generators/grid_anchor_generator.py for details.
Field | Type | Label | Description |
height |
int32 |
optional |
Anchor height in pixels. Default: 256 |
width |
int32 |
optional |
Anchor width in pixels. Default: 256 |
height_stride |
int32 |
optional |
Anchor stride in height dimension in pixels. Default: 16 |
width_stride |
int32 |
optional |
Anchor stride in width dimension in pixels. Default: 16 |
height_offset |
int32 |
optional |
Anchor height offset in pixels. Default: 0 |
width_offset |
int32 |
optional |
Anchor width offset in pixels. Default: 0 |
scales |
float |
repeated |
List of scales for the anchors. |
aspect_ratios |
float |
repeated |
List of aspect ratios for the anchors. |
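As a rough illustration of how the GridAnchorGenerator fields combine, the sketch below places one set of anchors at every grid point. The function name grid_anchors, the grid_height/grid_width arguments, and the convention that an aspect ratio means width / height (with the area preserved for each scale) are assumptions for this example, not taken from grid_anchor_generator.py.

```python
import math


def grid_anchors(grid_height, grid_width, scales, aspect_ratios,
                 height=256, width=256,
                 height_stride=16, width_stride=16,
                 height_offset=0, width_offset=0):
    """Return anchors as (ycenter, xcenter, h, w) tuples in pixels.

    Hypothetical helper: defaults mirror the documented field defaults.
    """
    anchors = []
    for gy in range(grid_height):
        for gx in range(grid_width):
            # Anchor centers are spaced by the stride and shifted by the offset.
            ycenter = height_offset + gy * height_stride
            xcenter = width_offset + gx * width_stride
            for scale in scales:
                for ratio in aspect_ratios:
                    # Assumed convention: ratio = w / h, area kept per scale.
                    h = scale * height / math.sqrt(ratio)
                    w = scale * width * math.sqrt(ratio)
                    anchors.append((ycenter, xcenter, h, w))
    return anchors
```

Under these assumptions a grid of H x W points yields H * W * len(scales) * len(aspect_ratios) anchors.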
MultiscaleAnchorGenerator
Configuration proto for RetinaNet anchor generator described in
https://arxiv.org/abs/1708.02002. See
anchor_generators/multiscale_grid_anchor_generator.py for details.
Field | Type | Label | Description |
min_level |
int32 |
optional |
Minimum level in the feature pyramid. Default: 2 |
max_level |
int32 |
optional |
Maximum level in the feature pyramid. Default: 6 |
anchor_scale |
float |
optional |
Scale of the anchor relative to the feature stride, e.g. 8 * 4 = 32. Default: 8 |
aspect_ratios |
float |
repeated |
Aspect ratios for anchors at each grid point. |
scales_per_octave |
int32 |
optional |
Number of intermediate scales in each scale octave. Default: 1 |
normalize_coordinates |
bool |
optional |
Whether to produce anchors in normalized coordinates. Default: true |
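The relationship between pyramid level, anchor_scale, and scales_per_octave can be sketched as follows. This assumes, as in the RetinaNet paper, that the feature stride at level l is 2^l and that intermediate scales within an octave are spaced geometrically; the function name multiscale_anchor_sizes is hypothetical.

```python
def multiscale_anchor_sizes(min_level=2, max_level=6,
                            anchor_scale=8.0, scales_per_octave=1):
    """Map each pyramid level to its list of base anchor sizes in pixels."""
    sizes = {}
    for level in range(min_level, max_level + 1):
        stride = 2 ** level  # assumed feature stride at this level
        sizes[level] = [
            # Base size is anchor_scale * stride; intermediate scales
            # step geometrically towards the next octave.
            anchor_scale * stride * 2 ** (i / scales_per_octave)
            for i in range(scales_per_octave)
        ]
    return sizes
```

With the defaults, level 2 has stride 4 and a base anchor size of 8 * 4 = 32 pixels, which matches the example in the anchor_scale description.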
SsdAnchorGenerator
Configuration proto for SSD anchor generator described in
https://arxiv.org/abs/1512.02325. See
anchor_generators/multiple_grid_anchor_generator.py for details.
Field | Type | Label | Description |
num_layers |
int32 |
optional |
Number of grid layers to create anchors for. Default: 6 |
min_scale |
float |
optional |
Scale of anchors corresponding to finest resolution. Default: 0.2 |
max_scale |
float |
optional |
Scale of anchors corresponding to coarsest resolution. Default: 0.95 |
scales |
float |
repeated |
Can be used to override min_scale->max_scale, with an explicitly defined
set of scales. If empty, then min_scale->max_scale is used. |
aspect_ratios |
float |
repeated |
Aspect ratios for anchors at each grid point. |
interpolated_scale_aspect_ratio |
float |
optional |
When this aspect ratio is greater than 0, then an additional
anchor, with an interpolated scale is added with this aspect ratio. Default: 1 |
interpolate_in_all_layers |
bool |
optional |
When set to true, interpolate scale=sqrt(max_size*min_size), aspect_ratios=1.0
in all layers; otherwise the lowest layer is ignored. Default: false |
reduce_boxes_in_lowest_layer |
bool |
optional |
Whether to use the following aspect ratio and scale combinations for the
layer with the finest resolution: (scale=0.1, aspect_ratio=1.0),
(scale=min_scale, aspect_ratio=2.0), (scale=min_scale, aspect_ratio=0.5). Default: true |
reduce_boxes_in_larger_layers |
bool |
optional |
Default: false |
base_anchor_height |
float |
optional |
The base anchor size in height dimension. Default: 1 |
base_anchor_width |
float |
optional |
The base anchor size in width dimension. Default: 1 |
height_stride |
int32 |
repeated |
Anchor stride in height dimension in pixels for each layer. The length of
this field is expected to be equal to the value of num_layers. |
width_stride |
int32 |
repeated |
Anchor stride in width dimension in pixels for each layer. The length of
this field is expected to be equal to the value of num_layers. |
height_offset |
int32 |
repeated |
Anchor height offset in pixels for each layer. The length of this field is
expected to be equal to the value of num_layers. |
width_offset |
int32 |
repeated |
Anchor width offset in pixels for each layer. The length of this field is
expected to be equal to the value of num_layers. |
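A minimal sketch of how min_scale, max_scale, num_layers, and scales could interact in SsdAnchorGenerator, assuming the per-layer scales are spaced linearly as in the SSD paper and that the interpolated anchor uses the geometric mean of two neighbouring scales. The function names are hypothetical, and the real multiple_grid_anchor_generator.py may differ in details.

```python
import math


def ssd_scales(num_layers=6, min_scale=0.2, max_scale=0.95, scales=None):
    """Return one anchor scale per layer, finest resolution first."""
    if scales:
        # An explicit list overrides the min_scale -> max_scale range.
        return list(scales)
    if num_layers == 1:
        return [min_scale]
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * i for i in range(num_layers)]


def interpolated_scale(scale, next_scale):
    """Scale of the extra anchor added between two neighbouring layers."""
    return math.sqrt(scale * next_scale)
```

With the defaults this gives scales of roughly 0.2, 0.35, 0.5, 0.65, 0.8, and 0.95 for the six layers.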
TemporalGridAnchorGenerator
Configuration proto for TemporalGridAnchorGenerator. See
anchor_generators/temporal_grid_anchor_generator.py for details.
Field | Type | Label | Description |
length |
int32 |
optional |
Anchor length in pixels. Default: 8 |
stride |
int32 |
optional |
Anchor stride in height dimension in pixels. Default: 8 |
offset |
int32 |
optional |
Anchor height offset in pixels. Default: 0 |
scales |
float |
repeated |
List of scales for the anchors. |
YOLOAnchorGenerator
Configuration proto for YOLOAnchorGenerator. See
anchor_generators/yolo_anchor_generator.py for details.
Field | Type | Label | Description |
anchor_group |
YOLOAnchorGenerator.AnchorGroup |
repeated |
List of anchor groups; the number of groups must be equal to
the number of feature maps. |
YOLOAnchorGenerator.AnchorGroup
List of Anchors in one level feature map
YOLOAnchorGenerator.AnchorSize
Anchor width and height in pixels
Field | Type | Label | Description |
width |
int32 |
required |
|
height |
int32 |
required |
|
easy_vision/python/protos/argmax_matcher.proto
ArgMaxMatcher
Configuration proto for ArgMaxMatcher. See
matchers/argmax_matcher.py for details.
Field | Type | Label | Description |
matched_threshold |
float |
optional |
Threshold for positive matches. Default: 0.5 |
unmatched_threshold |
float |
optional |
Threshold for negative matches. Default: 0.5 |
ignore_thresholds |
bool |
optional |
Whether to construct ArgMaxMatcher without thresholds. Default: false |
negatives_lower_than_unmatched |
bool |
optional |
If true, negative matches are those below the unmatched_threshold,
and ignored matches are those between the matched and unmatched
thresholds. If false, negative matches are those between the matched
and unmatched thresholds, and everything below unmatched is ignored. Default: true |
force_match_for_each_row |
bool |
optional |
Whether to ensure each row is matched to at least one column. Default: false |
use_matmul_gather |
bool |
optional |
Force constructed match objects to use matrix-multiplication-based gather
instead of standard tf.gather. Default: false |
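The threshold semantics of ArgMaxMatcher can be sketched in plain Python. The encoding of results (a row index for a positive match, -1 for a negative, -2 for an ignored column) and the function name argmax_match are assumptions made for this example; force_match_for_each_row and ignore_thresholds are not modelled.

```python
def argmax_match(similarity, matched_threshold=0.5, unmatched_threshold=0.5,
                 negatives_lower_than_unmatched=True):
    """Match each column to its best row.

    similarity is a list of rows; returns one entry per column:
    the matched row index, -1 (negative) or -2 (ignored).
    """
    num_cols = len(similarity[0])
    matches = []
    for col in range(num_cols):
        column = [row[col] for row in similarity]
        best_row = max(range(len(column)), key=column.__getitem__)
        best = column[best_row]
        if best >= matched_threshold:
            matches.append(best_row)
            continue
        below_unmatched = best < unmatched_threshold
        if negatives_lower_than_unmatched:
            # Below unmatched -> negative; in between -> ignored.
            matches.append(-1 if below_unmatched else -2)
        else:
            # In between -> negative; below unmatched -> ignored.
            matches.append(-2 if below_unmatched else -1)
    return matches
```

With the default thresholds (both 0.5) there is no in-between band, so every column is either a positive or a negative match.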
YOLOArgMaxMatcher
Configuration proto for YOLOArgMaxMatcher. See
matchers/argmax_matcher.py for details.
Field | Type | Label | Description |
matched_threshold |
float |
optional |
Threshold for positive matches. Default: 1 |
use_matmul_gather |
bool |
optional |
Force constructed match objects to use matrix-multiplication-based gather
instead of standard tf.gather. Default: false |
easy_vision/python/protos/aspp_block.proto
ASPPBlock
Field | Type | Label | Description |
image_level_features |
bool |
optional |
Default: true |
batchnorm_trainable |
bool |
optional |
Default: true |
weight_decay |
float |
optional |
Default: 0 |
feature_depth |
int32 |
required |
|
atrous_rates |
int32 |
repeated |
|
aspp_with_separable_conv |
bool |
optional |
Default: true |
keep_prob |
float |
optional |
Dropout config: keep probability for the ASPP output features. Default: 1 |
easy_vision/python/protos/auto_compression.proto
CompressConfig
Messages for configuring the strategy for auto compression
Field | Type | Label | Description |
compress_mode |
string |
optional |
Compression mode: one of [`prune`, `quantize`]. Default: prune |
is_finetune |
bool |
optional |
Whether to fine-tune from a compressed model. Default: false |
speedup_target |
float |
optional |
The target speedup ratio. Default: 1 |
pretrain_model |
string |
required |
Path to pretrained model. |
tune_mode |
string |
optional |
Auto-tuning mode.
For `prune`, one of [`RL`, `random`, `uniform`].
For `quantize`, one of [`KL`, `LSQ`]. Default: uniform |
num_trials |
int32 |
optional |
Number of trials for the automatic search of the compression strategy. Default: 10 |
interval_steps |
int32 |
optional |
Number of re-training steps for each compressed model. Default: 1000 |
metric_key |
string |
required |
Metric key to evaluate the model performance. |
metric_mode |
string |
optional |
Metric mode: which direction is better, one of [`bigger`, `smaller`]. Default: bigger |
prune_params |
CompressConfig.PruneHparams |
optional |
|
quant_params |
CompressConfig.QuantHparams |
optional |
|
CompressConfig.PruneHparams
Configuration message for hyper-parameter of auto compression under `prune`
Field | Type | Label | Description |
include_scopes |
string |
repeated |
Graph scopes to be included. |
exclude_scopes |
string |
repeated |
Graph scopes to be excluded. |
nb_iters_recon |
int32 |
optional |
Number of iterations for layer reconstruction. Default: 1000 |
lr_pgd_init |
float |
optional |
Initial learning rate for layer selection. Default: 1e-10 |
lr_pgd_incr |
float |
optional |
Learning rate increase ratio for layer selection. Default: 1.4 |
lr_pgd_decr |
float |
optional |
Learning rate decrease ratio for layer selection. Default: 0.7 |
lr_adam |
float |
optional |
Learning rate for layer reconstruction with Adam. Default: 0.0001 |
prunable_types |
string |
repeated |
List of op types that can be pruned.
Subset of [`Conv2D`, `MatMul`] |
channel_base_mod |
int32 |
optional |
Base number for the channel count: the number of remaining channels is kept a multiple of it. Default: 4 |
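To illustrate channel_base_mod, the sketch below snaps a pruned channel count to a multiple of the base number. Whether the real implementation rounds up, down, or to the nearest multiple is not stated in this documentation; rounding up (with at least one multiple kept) is an assumption, and round_channels is a hypothetical name.

```python
def round_channels(num_channels, channel_base_mod=4):
    """Snap a remaining-channel count to a multiple of channel_base_mod.

    Assumption: round up, and never go below one multiple.
    """
    # Ceiling division written with integer arithmetic only.
    multiples = -(-num_channels // channel_base_mod)
    return max(channel_base_mod, multiples * channel_base_mod)
```

For example, with the default base of 4, a layer pruned down to 13 channels would keep 16 under this assumption.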
CompressConfig.QuantHparams
Configuration message for hyper-parameter of auto compression under `quantize`
Field | Type | Label | Description |
include_scopes |
string |
repeated |
Graph scopes to be included. |
exclude_scopes |
string |
repeated |
Graph scopes to be excluded. |
bits |
int32 |
optional |
Number of quantization bits, such as 4, 8, 16, or 32. Default: 4 |
int8_layers |
string |
repeated |
Layers to be kept in INT8 when using INT4 quantization. |
per_channel |
CompressConfig.QuantHparams.PerChannel |
optional |
Whether to use per-channel quant for specific op types. |
calibs |
int32 |
optional |
Number of data batches for calibration. |
calib_path |
string |
optional |
Path to save calibration scale file. |
CompressConfig.QuantHparams.PerChannel
Configuration message to set whether to use
per-channel quant for specific op types.
Field | Type | Label | Description |
Conv2D |
bool |
optional |
Whether to use per-channel quant for Conv2D. Default: true |
MatMul |
bool |
optional |
Whether to use per-channel quant for MatMul. Default: false |
easy_vision/python/protos/backbone.proto
Backbone
Field | Type | Label | Description |
class_name |
string |
required |
backbone class name, such as resnet_v1_50 |
weight_decay |
float |
required |
Weight decay factor. Default: 0.0005 |
batchnorm_trainable |
bool |
optional |
If set to false, batchnorm parameters and the moving mean/std will not be updated. Default: true |
output_stride |
int32 |
optional |
Currently only the resnet backbones support this parameter.
If output_stride is set greater than 0, then once the product of the layer strides
equals output_stride, the stride of the upper conv layers is set to 1
and dilated convolution is used instead. Default: -1 |
global_pool |
bool |
optional |
boolean flag to control the avgpooling before the
logits layer. If false or unset, pooling is done with a fixed window
that reduces default-sized inputs to 1x1, while larger inputs lead to
larger outputs. If true, any input size is pooled down to 1x1. Default: false |
depth_multiplier |
float |
optional |
Used only for mobilenet, to adjust the network for different
computation costs; see https://arxiv.org/abs/1704.04861. Default: 1 |
use_true_shape |
bool |
optional |
When images are padded, use_true_shape needs to be set to true
so that the network uses the true shape for global pooling. Default: true |
use_fc |
bool |
optional |
Whether to use the fc layer. When the fully connected layers have too many parameters,
set this to false to drop the fc layer when fine-tuning. Default: true |
norm_type |
NormType |
optional |
normalization layer type Default: BATCH |
connect_survival_prob |
float |
optional |
Survival probability of block connections when training EfficientNet; the default is 0.8. |
dropout_keep_prob |
float |
optional |
keep prob for dropout Default: 1 |
param |
UserDefinedParam |
repeated |
user-defined args |
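The output_stride behaviour described for the Backbone message can be sketched as a walk over the per-layer strides: strides are applied as usual until their product reaches output_stride, after which each layer keeps stride 1 and the skipped stride is folded into the dilation rate. This is a simplified model under that reading of the description; apply_output_stride is a hypothetical helper, not part of the library.

```python
def apply_output_stride(layer_strides, output_stride=-1):
    """Return a (stride, dilation_rate) pair for every layer."""
    if output_stride <= 0:
        # Feature disabled: keep the nominal strides, no dilation.
        return [(stride, 1) for stride in layer_strides]
    current_stride = 1
    rate = 1
    result = []
    for stride in layer_strides:
        if current_stride == output_stride:
            # Target reached: keep the resolution and dilate instead.
            result.append((1, rate))
            rate *= stride
        else:
            result.append((stride, 1))
            current_stride *= stride
    return result
```

For five stride-2 layers and output_stride=8, the first three layers keep stride 2 and the last two run at stride 1, with dilation rates 1 and 2 respectively.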
Block
Field | Type | Label | Description |
resnet_block |
ResnetBlock |
optional |
|
fc_block |
FCBlock |
optional |
|
block_func |
string |
optional |
|
batchnorm_trainable |
bool |
optional |
Default: false |
FCBlock
Field | Type | Label | Description |
fc_hyperparams |
Hyperparams |
required |
|
depth |
int32 |
required |
Default: 1024 |
num_layers |
int32 |
required |
Default: 2 |
ResnetBlock
Field | Type | Label | Description |
class_name |
string |
required |
model class name |
block_name |
string |
required |
model block name, e.g. block4 |
depth |
int32 |
optional |
deprecated |
depth_bottleneck |
int32 |
optional |
deprecated |
stride |
int32 |
optional |
stride of the block Default: 1 |
unit_num |
int32 |
optional |
deprecated |
weight_decay |
float |
optional |
weight decay factor Default: 0 |
NormType
Name | Number | Description |
NONE |
1 |
|
BATCH |
2 |
|
GROUP |
3 |
|
easy_vision/python/protos/bipartite_matcher.proto
BipartiteMatcher
Configuration proto for bipartite matcher. See
matchers/bipartite_matcher.py for details.
Field | Type | Label | Description |
use_matmul_gather |
bool |
optional |
Force constructed match objects to use matrix-multiplication-based gather
instead of standard tf.gather. Default: false |
easy_vision/python/protos/box_coder.proto
BoxCoder
Configuration proto for the box coder to be used in the object detection
pipeline. See core/box_coder.py for details.
FasterRcnnBoxCoder
Configuration proto for FasterRCNNBoxCoder. See
box_coders/faster_rcnn_box_coder.py for details.
Field | Type | Label | Description |
y_scale |
float |
optional |
Scale factor for anchor encoded box center. Default: 10 |
x_scale |
float |
optional |
Default: 10 |
height_scale |
float |
optional |
Scale factor for anchor encoded box height. Default: 5 |
width_scale |
float |
optional |
Scale factor for anchor encoded box width. Default: 5 |
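The four scale factors can be read against the usual Faster R-CNN box parameterization. The sketch below assumes boxes and anchors are given as (ycenter, xcenter, height, width) and that the scale factors multiply the encoded offsets; the function names encode and decode are illustrative and not taken from faster_rcnn_box_coder.py.

```python
import math


def encode(box, anchor, y_scale=10.0, x_scale=10.0,
           height_scale=5.0, width_scale=5.0):
    """Encode a box relative to an anchor as (ty, tx, th, tw)."""
    ycenter, xcenter, h, w = box
    ya, xa, ha, wa = anchor
    ty = y_scale * (ycenter - ya) / ha
    tx = x_scale * (xcenter - xa) / wa
    th = height_scale * math.log(h / ha)
    tw = width_scale * math.log(w / wa)
    return ty, tx, th, tw


def decode(code, anchor, y_scale=10.0, x_scale=10.0,
           height_scale=5.0, width_scale=5.0):
    """Invert encode(): recover (ycenter, xcenter, h, w)."""
    ty, tx, th, tw = code
    ya, xa, ha, wa = anchor
    ycenter = ty / y_scale * ha + ya
    xcenter = tx / x_scale * wa + xa
    h = math.exp(th / height_scale) * ha
    w = math.exp(tw / width_scale) * wa
    return ycenter, xcenter, h, w
```

Larger scale factors amplify the regression targets, which is why the center offsets (default 10) and size offsets (default 5) are scaled differently.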
KeypointBoxCoder
Configuration proto for KeypointBoxCoder. See
box_coders/keypoint_box_coder.py for details.
Field | Type | Label | Description |
num_keypoints |
int32 |
optional |
|
y_scale |
float |
optional |
Scale factor for anchor encoded box center and keypoints. Default: 10 |
x_scale |
float |
optional |
Default: 10 |
height_scale |
float |
optional |
Scale factor for anchor encoded box height. Default: 5 |
width_scale |
float |
optional |
Scale factor for anchor encoded box width. Default: 5 |
MeanStddevBoxCoder
Configuration proto for MeanStddevBoxCoder. See
box_coders/mean_stddev_box_coder.py for details.
Field | Type | Label | Description |
stddev |
float |
optional |
The standard deviation used to encode and decode boxes. Default: 0.01 |
SquareBoxCoder
Configuration proto for SquareBoxCoder. See
box_coders/square_box_coder.py for details.
Field | Type | Label | Description |
y_scale |
float |
optional |
Scale factor for anchor encoded box center. Default: 10 |
x_scale |
float |
optional |
Default: 10 |
length_scale |
float |
optional |
Scale factor for anchor encoded box length. Default: 5 |
YOLOBoxCoder
Configuration proto for YOLOBoxCoder. See
box_coders/yolo_box_coder.py for details.
Field | Type | Label | Description |
y_scale |
float |
optional |
Scale factor for anchor encoded box center. Default: 1 |
x_scale |
float |
optional |
Default: 1 |
height_scale |
float |
optional |
Scale factor for anchor encoded box height. Default: 1 |
width_scale |
float |
optional |
Scale factor for anchor encoded box width. Default: 1 |
easy_vision/python/protos/box_predictor.proto
BoxPredictor
Configuration proto for box predictor. See core/box_predictor.py for details.
Convolutional3DBoxPredictor
Configuration proto for Convolutional box predictor.
Field | Type | Label | Description |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in the box predictor. |
min_depth |
int32 |
optional |
Minimum feature map depth prior to predicting box encodings and class
predictions, used collaborately with Default: 0 |
max_depth |
int32 |
optional |
Maximum feature depth map prior to predicting box encodings and class
predictions. If max_depth is set to 0, no additional feature map will be
inserted before location and class predictions. Default: 0 |
num_layers_before_predictor |
int32 |
optional |
Number of the additional conv layers before the predictor. Default: 0 |
dropout_keep_probability |
float |
optional |
Keep probability for dropout Default: 1 |
kernel_size |
int32 |
optional |
Size of final convolution kernel. If the spatial resolution of the feature
map is smaller than the kernel size, then the kernel size is set to
min(feature_width, feature_height). Default: 1 |
box_code_size |
int32 |
optional |
Size of the encoding for boxes. Default: 2 |
class_prediction_bias_init |
float |
optional |
Default: 0 |
use_depthwise |
bool |
optional |
Whether to use depthwise separable convolution for box predictor layers. Default: false |
ConvolutionalBoxPredictor
Configuration proto for Convolutional box predictor.
Field | Type | Label | Description |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in the box predictor. |
min_depth |
int32 |
optional |
Minimum feature map depth prior to predicting box encodings and class
predictions, used collaborately with Default: 0 |
max_depth |
int32 |
optional |
Maximum feature depth map prior to predicting box encodings and class
predictions. If max_depth is set to 0, no additional feature map will be
inserted before location and class predictions. Default: 0 |
num_layers_before_predictor |
int32 |
optional |
Number of the additional conv layers before the predictor. Default: 0 |
dropout_keep_probability |
float |
optional |
Keep probability for dropout Default: 1 |
kernel_size |
int32 |
optional |
Size of final convolution kernel. If the spatial resolution of the feature
map is smaller than the kernel size, then the kernel size is set to
min(feature_width, feature_height). Default: 1 |
box_code_size |
int32 |
optional |
Size of the encoding for boxes. Default: 4 |
class_prediction_bias_init |
float |
optional |
Default: 0 |
use_depthwise |
bool |
optional |
Whether to use depthwise separable convolution for box predictor layers. Default: false |
MaskRCNN3DBoxPredictor
Field | Type | Label | Description |
fc_hyperparams |
Hyperparams |
optional |
Hyperparameters for fully connected ops used in the box predictor. |
num_layers_before_predictor |
int32 |
optional |
Number of the additional fc layers before the predictor. Default: 0 |
depth |
int32 |
optional |
Output depth for the fc ops prior to predicting box encodings
and class predictions. Default: 0 |
dropout_keep_probability |
float |
optional |
Keep probability for dropout. This is only used if use_dropout is true. Default: 1 |
box_code_size |
int32 |
optional |
Size of the encoding for the boxes. Default: 2 |
agnostic |
bool |
optional |
Whether to use one box for all classes rather than a different box for each
class. Default: true |
MaskRCNNBoxPredictor
Field | Type | Label | Description |
fc_hyperparams |
Hyperparams |
optional |
Hyperparameters for fully connected ops used in the box predictor. |
num_layers_before_predictor |
int32 |
optional |
Number of the additional fc layers before the predictor. Default: 0 |
depth |
int32 |
optional |
Output depth for the fc ops prior to predicting box encodings
and class predictions. Default: 0 |
dropout_keep_probability |
float |
optional |
Keep probability for dropout. This is only used if use_dropout is true. Default: 1 |
box_code_size |
int32 |
optional |
Size of the encoding for the boxes. Default: 4 |
agnostic |
bool |
optional |
Whether to use one box for all classes rather than a different box for each
class. Default: true |
RfcnBoxPredictor
Field | Type | Label | Description |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in the box predictor. |
num_spatial_bins_height |
int32 |
optional |
Bin sizes for RFCN crops. Default: 3 |
num_spatial_bins_width |
int32 |
optional |
Default: 3 |
depth |
int32 |
optional |
Target depth to reduce the input image features to. Default: 1024 |
box_code_size |
int32 |
optional |
Size of the encoding for the boxes. Default: 4 |
crop_height |
int32 |
optional |
Size to resize the rfcn crops to. Default: 12 |
crop_width |
int32 |
optional |
Default: 12 |
agnostic |
bool |
optional |
Default: true |
WeightSharedConvolutionalBoxPredictor
Configuration proto for weight shared convolutional box predictor.
Field | Type | Label | Description |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in the box predictor. |
num_layers_before_predictor |
int32 |
optional |
Number of the additional conv layers before the predictor. Default: 0 |
depth |
int32 |
optional |
Output depth for the convolution ops prior to predicting box encodings
and class predictions. Default: 0 |
kernel_size |
int32 |
optional |
Size of final convolution kernel. If the spatial resolution of the feature
map is smaller than the kernel size, then the kernel size is set to
min(feature_width, feature_height). Default: 3 |
box_code_size |
int32 |
optional |
Size of the encoding for boxes. Default: 4 |
class_prediction_bias_init |
float |
optional |
Bias initialization for class prediction. It has been shown to stabilize
training when there is a large number of negative boxes. See
https://arxiv.org/abs/1708.02002 for details. Default: 0 |
dropout_keep_probability |
float |
optional |
Keep probability for dropout Default: 1 |
share_prediction_tower |
bool |
optional |
Whether to share the multi-layer tower between box prediction and class
prediction heads. Default: true |
use_depthwise |
bool |
optional |
Whether to use depthwise separable convolution for box predictor layers. Default: false |
box_encodings_clip_range |
WeightSharedConvolutionalBoxPredictor.BoxEncodingsClipRange |
optional |
|
agnostic |
bool |
optional |
Default: true |
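For class_prediction_bias_init, the paper cited above initializes the classification bias so that every anchor starts with a small foreground probability pi, using b = -log((1 - pi) / pi). The sketch below computes that value; the helper names and the choice pi = 0.01 (the value used in the paper) are illustrative, and the documented default for this field remains 0.

```python
import math


def focal_loss_bias_init(prior_probability=0.01):
    """Bias b such that sigmoid(b) equals the prior foreground probability."""
    return -math.log((1.0 - prior_probability) / prior_probability)


def sigmoid(x):
    """Logistic function, used here only to check the initialization."""
    return 1.0 / (1.0 + math.exp(-x))
```

A config that follows the paper would set class_prediction_bias_init to the value returned by focal_loss_bias_init(0.01), which is about -4.6.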
WeightSharedConvolutionalBoxPredictor.BoxEncodingsClipRange
If specified, apply clipping to box encodings.
Field | Type | Label | Description |
min |
float |
optional |
|
max |
float |
optional |
|
YOLOBoxPredictor
Configuration proto for YOLO box predictor.
Field | Type | Label | Description |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in the box predictor. |
num_layers_before_predictor |
int32 |
optional |
Number of the additional conv layers before the predictor. Default: 1 |
easy_vision/python/protos/classification.proto
ClassificationModel
Field | Type | Label | Description |
input_width |
int32 |
optional |
Input width and height. If not set, the default input size is used instead. |
input_height |
int32 |
optional |
|
backbone |
Backbone |
required |
Backbone configuration |
num_classes |
int32 |
required |
Number of classes |
loss |
ClassificationLoss |
required |
Loss configuration for training |
add_summary |
bool |
optional |
Whether to write summaries of training-related info. Default: true |
label_id_offset |
int32 |
optional |
Label id offset, subtracted from the groundtruth class
when calculating loss and during evaluation. Default: 0 |
hidden_size |
int32 |
optional |
Hidden size of the last fc layer. If assigned, the original fc layer is replaced. Default: -1 |
LargeScaleClassificationModel
Field | Type | Label | Description |
backbone |
Backbone |
required |
Backbone configuration |
input_layer |
string |
required |
input layer name |
num_classes |
int32 |
required |
Number of classes |
loss |
ClassificationLoss |
required |
Loss configuration for training |
label_id_offset |
int32 |
optional |
Label id offset, subtracted from the groundtruth class
when calculating loss and during evaluation. Default: 0 |
global_pool |
bool |
optional |
global pooling after backbone Default: true |
hidden_sizes |
int32 |
repeated |
dense layer hidden size appended before logits |
easy_vision/python/protos/cv_model.proto
CVModel
CustomModel
easy_vision/python/protos/data_config.proto
DataConfig
Field | Type | Label | Description |
separator |
string |
optional |
Separator between class name and description, for example:
in family_name#QING, # is the separator. |
default_class |
string |
optional |
When no label in class_map matches
and default_class is set, the label is mapped to
the default class. |
error_class |
string |
repeated |
All images of error_class, or images that contain objects of error_class,
will be discarded. |
ignore_class |
string |
repeated |
all objects of ignore_class will be ignored
ignore_class is only used in detection tfrecord conversion |
class_map |
DataConfig.ClassMap |
repeated |
Specifies a map from label to class_name.
If class_name is not specified, class_name == label_name. |
max_image_size |
uint32 |
optional |
Maximum height/width of all the images used in training.
If not specified, the default value is 0 and
images will not be resized to fit within max_image_size,
which can result in slow training or even an
OOM (out of memory) error. Default: 0 |
image_format |
string |
required |
Default: jpg |
model_type |
DataConfig.ModelType |
required |
Default: CLASSIFICATION |
converter_class |
string |
required |
Converter class. Some converters are already implemented, such
as QinceConverter; users can also pass a self-defined converter
using a format like module.class_name. Default: QinceConverter |
proc_num |
uint32 |
required |
Number of generator processes. Default: 10 |
oss_config |
string |
optional |
path to .osscredentials file |
param_key |
string |
repeated |
custom key value parameters |
param_val |
string |
repeated |
|
char_replace_map_path |
string |
optional |
A csv containing two columns, ["original", "replaced"], for replacing
special chars in text, such as traditional Chinese characters with
simplified Chinese characters, Chinese punctuation with English punctuation, etc. |
default_char_dict_path |
string |
optional |
A txt file listing the characters used in
model training, one character per line.
If default_char_dict_path is empty, output_char_dict
will be inferred from the input data. |
prefetch_thread_num |
uint32 |
optional |
Number of parallel prefetch threads. Default: 10 |
write_thread_num |
uint32 |
optional |
Number of parallel write threads. Default: 1 |
part_record_num |
int32 |
optional |
The number of samples in each part of the tfrecords:
If Mod(total_num, part_record_num) < part_record_num / 2:
the remaining samples are padded to the end of each tfrecord,
even though its size is then larger than part_record_num;
If Mod(total_num, part_record_num) > part_record_num / 2:
the remaining samples will be placed into a new tfrecord. Default: -1 |
test_ratio |
float |
optional |
train/test dataset split ratio of test dataset Default: 0 |
max_test_image_size |
int32 |
optional |
max image size of test dataset images, if not set, it will use max_image_size |
decode_type |
int32 |
optional |
video decode parameters
decode type Default: 4 |
sample_fps |
int32 |
optional |
sample rate, default -1, full sampling. Default: -1 |
reshape_width |
int32 |
optional |
Output size of decoded frames; -1 means no resize.
If a scalar is provided, height=width;
otherwise the output frame size is (height, width). Default: 112 |
reshape_height |
int32 |
optional |
Default: 112 |
decode_batch_size |
int32 |
optional |
batch size of each decode phase. Default: 10 |
decode_keep_size |
int32 |
optional |
left size of last decode phase. Default: 0 |
optical_flow |
string |
optional |
Optical flow calculation algorithm:
'' means optical flow is not calculated,
opencv means optical flow is calculated using opencv,
tvnet means optical flow is calculated using tvnet. |
min_bbox_size |
int32 |
optional |
Minimum bounding box size; bounding boxes smaller than this value will be filtered out. Default: 5 |
user_defined_converter_path |
string |
optional |
file path for self defined converter |
user_defined_generator_path |
string |
optional |
file path for self-defined generator |
generator_class |
string |
optional |
class name for generator |
exif_rotate |
bool |
optional |
If false, do not rotate the image according to EXIF's orientation flag. Default: false |
ignore_recog_class |
string |
repeated |
all objects of ignore recog class will not be recognized in TextEnd2End model |
input_queue_size |
int32 |
optional |
Size of the input queue; one thread reads the .csv
file and feeds input data into the input queue. Default: 1048576 |
prefetch_queue_size |
int32 |
optional |
Size of the prefetch queue; multiple threads prefetch
file data and feed it into the input queue. Default: 1024 |
output_queue_size |
int32 |
optional |
Size of the output queue; subprocesses write serialized
tf examples into the output queue, and the main queue
acquires the data from the output queue. Default: 1024 |
roi_padding_min_ratio_of_short_edge |
float |
optional |
Minimum padding of the cropped roi image, as a multiple of the roi short edge. Default: 0.75 |
roi_padding_max_ratio_of_short_edge |
float |
optional |
Maximum padding of the cropped roi image, as a multiple of the roi short edge. Default: 0.75 |
task_id |
string |
optional |
label task id for PaiConverter |
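The part_record_num rule in the table above can be sketched as a function that returns the size of each tfrecord part. Several points are assumptions because the description does not settle them: leftover samples are spread one at a time across the existing parts, a remainder of exactly half a part goes into a new part, and a non-positive part_record_num (such as the default -1) means a single part. The name split_into_parts is hypothetical.

```python
def split_into_parts(total_num, part_record_num=-1):
    """Return the number of samples written to each tfrecord part."""
    if part_record_num <= 0 or total_num <= part_record_num:
        # Assumed: no splitting requested, or everything fits in one part.
        return [total_num]
    full_parts, remainder = divmod(total_num, part_record_num)
    sizes = [part_record_num] * full_parts
    if remainder == 0:
        return sizes
    if remainder < part_record_num / 2:
        # Small remainder: append the leftover samples to existing parts.
        for i in range(remainder):
            sizes[i % full_parts] += 1
    else:
        # Large remainder: it becomes a part of its own.
        sizes.append(remainder)
    return sizes
```

Either way the part sizes always sum to total_num; only the number of parts changes.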
DataConfig.ClassMap
using this structure, we could map multiple marked
labels into one, for example: 'name'=>'text', 'address'=>'text'
thus enabling flexible collapse of labels
Field | Type | Label | Description |
label_name |
string |
required |
marked class |
class_name |
string |
optional |
the class used in tf record |
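The label collapsing described for DataConfig.ClassMap, together with the default_class fallback, can be sketched with a plain dictionary. Representing class_map as a dict from label_name to an optional class_name, and returning None for an unmatched label when no default_class is set, are assumptions made for this example; resolve_class is a hypothetical name.

```python
def resolve_class(label, class_map, default_class=None):
    """Map a marked label to the class name stored in the tf record."""
    if label in class_map:
        # A missing class_name means class_name == label_name.
        return class_map[label] or label
    # No entry matched: fall back to default_class if one is set.
    return default_class


# Usage: 'name' and 'address' collapse into the single class 'text'.
example_map = {"name": "text", "address": "text", "logo": None}
collapsed = [resolve_class(label, example_map) for label in ("name", "address", "logo")]
```

Here collapsed holds the class names for the three labels, with 'logo' keeping its own name because no class_name was given.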
DataConfig.ModelType
task type classification, detection, segmentation
or even instance segmentation
Name | Number | Description |
CLASSIFICATION |
0 |
|
DETECTION |
1 |
|
SEGMENTATION |
2 |
|
INSTANCE_SEGMENTATION |
3 |
|
TEXT_END2END |
4 |
|
TEXT_RECOGNITION |
5 |
|
TEXT_DETECTION |
6 |
|
VIDEO_CLASSIFICATION |
7 |
|
TEXT_RECTIFICATION |
8 |
|
POLYGON_SEGMENTATION |
9 |
|
SELF_DEFINED |
100 |
|
easy_vision/python/protos/dataset.proto
ActionDetectionDataDecoder
Field | Type | Label | Description |
label_map_path |
string |
optional |
label map path: specifying the mapping from class_name to class_ids |
ClassificationDataDecoder
Field | Type | Label | Description |
label_map_path |
string |
optional |
label map path: specifying the mapping from class_name to class_ids |
is_multi_label |
bool |
optional |
Default: false |
CustomDataDecoder
DatasetConfig
Field | Type | Label | Description |
input_path |
string |
repeated |
Dataset input path; filename patterns are supported
using tf.match_files(input_path). |
batch_size |
uint32 |
optional |
Effective batch size to use for training. Default: 32 |
data_augmentation_options |
PreprocessingStep |
repeated |
Data augmentation options. |
shuffle |
bool |
optional |
whether to shuffle data Default: true |
shuffle_buffer_size |
uint32 |
optional |
Buffer size to be used when shuffling. Default: 2048 |
filenames_shuffle_buffer_size |
uint32 |
optional |
Buffer size to be used when shuffling file names. Default: 100 |
num_epochs |
uint32 |
optional |
The number of times a data source is read. If set to zero, the data source
will be reused indefinitely. Default: 0 |
num_readers |
uint32 |
optional |
Number of reader instances to create. Default: 1 |
read_block_length |
uint32 |
optional |
Number of records to read from each reader at once. Default: 32 |
prefetch_size |
uint32 |
optional |
Number of decoded records to prefetch before batching. Default: 512 |
num_parallel_map_calls |
uint32 |
optional |
Number of parallel decode ops to apply. Default: 64 |
use_diff |
bool |
optional |
whether to use difficult samples Default: true |
shard |
bool |
optional |
shard dataset to 1/num_workers in distribute mode Default: false |
drop_remainder |
bool |
optional |
whether the last batch should be dropped in the case it has
fewer than batch_size elements Default: true |
bucket_sizes |
uint32 |
repeated |
Bucket sizes for the height and width of images. Default: empty (no bucketing).
The decoder configs that follow hold settings specific to each dataset, such as voc;
more datasets may be supported in the future |
input_class |
string |
optional |
Input class name, if a single input class should be used directly |
voc_decoder_config |
VocDataDecoder |
optional |
|
classification_decoder_config |
ClassificationDataDecoder |
optional |
|
seg_decoder_config |
SegmentationDataDecoder |
optional |
|
text_recognition_decoder_config |
TextRecognitionDataDecoder |
optional |
|
text_end2end_decoder_config |
TextEnd2EndDataDecoder |
optional |
|
text_detection_decoder_config |
TextDetectionDataDecoder |
optional |
|
text_rectification_decoder_config |
TextRectificationDataDecoder |
optional |
|
video_classification_decoder_config |
VideoClassificationDataDecoder |
optional |
|
action_detection_decoder_config |
ActionDetectionDataDecoder |
optional |
|
custom_decoder_config |
CustomDataDecoder |
optional |
|
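The interplay of batch_size, num_epochs and drop_remainder can be illustrated with a small pure-Python sketch. This is not the easy_vision input pipeline: the helper name iter_batches is hypothetical, shuffling and prefetching are left out, and it assumes that records are repeated first and batched afterwards, so a batch may span an epoch boundary.

```python
import itertools


def iter_batches(records, batch_size=32, num_epochs=0, drop_remainder=True):
    """Yield lists of records following the documented DatasetConfig semantics."""
    # num_epochs == 0 means the data source is reused indefinitely.
    epochs = itertools.count() if num_epochs == 0 else range(num_epochs)
    stream = (record for _ in epochs for record in records)
    while True:
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            return
        # A short final batch is dropped when drop_remainder is true.
        if len(batch) < batch_size and drop_remainder:
            return
        yield batch


data = list(range(10))
full_only = list(iter_batches(data, batch_size=4, num_epochs=1))
with_tail = list(iter_batches(data, batch_size=4, num_epochs=1, drop_remainder=False))
```

With ten records and batch_size 4, full_only holds two batches, while with_tail also contains the two leftover records as a third, shorter batch.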
SegmentationDataDecoder
TextDetectionDataDecoder
Field | Type | Label | Description |
num_keypoints |
int32 |
optional |
key points number Default: 4 |
label_map_path |
string |
required |
label map path: specifying the mapping from class_name to class_ids |
TextEnd2EndDataDecoder
Field | Type | Label | Description |
char_dict_path |
string |
required |
dict_path: specifying the char dict |
upper_case |
bool |
optional |
transform label to upper case Default: false |
num_keypoints |
int32 |
optional |
key points number Default: 4 |
label_map_path |
string |
required |
label map path: specifying the mapping from class_name to class_ids |
TextRecognitionDataDecoder
Field | Type | Label | Description |
char_dict_path |
string |
required |
dict_path: specifying the char dict |
max_input_ratio |
float |
optional |
specify the maximal width/height of all the training images |
min_input_ratio |
float |
optional |
specify the minimal width/height of all the training images |
num_buckets |
int32 |
optional |
put data into similar-length buckets |
upper_case |
bool |
optional |
transform label to upper case Default: false |
filter_long_image |
bool |
optional |
filter image with aspect ratio > max_input_ratio Default: true |
max_text_length |
float |
optional |
specify the maximal text length of all the training images |
TextRectificationDataDecoder
VideoClassificationDataDecoder
Field | Type | Label | Description |
label_map_path |
string |
optional |
label map path: specifying the mapping from class_name to class_ids |
input_modal |
string |
optional |
load optical flow or rgb frame
'rgb', 'flow', 'rgb+flow' Default: rgb |
is_multi_label |
bool |
optional |
load multilabel data or not Default: false |
VocDataDecoder
Field | Type | Label | Description |
label_map_path |
string |
required |
label map path: specifying the mapping from class_name to class_ids |
load_instance_masks |
bool |
optional |
Whether to load groundtruth instance masks. Default: false |
mask_format |
MaskFormat |
optional |
Type of instance mask. Default: NUMERICAL_MASK_FORMAT |
num_keypoints |
uint32 |
optional |
Number of groundtruth keypoints per object. Default: 0 |
Name | Number | Description |
NUMERICAL_MASK_FORMAT |
1 |
[num_masks, H, W] float32 binary masks. |
PNG_MASK_FORMAT |
2 |
Encoded PNG masks. |
easy_vision/python/protos/decoder.proto
FullyConnectedCTCDecoder
RNNDecoderWithAttention
Field | Type | Label | Description |
embedding_size |
int32 |
optional |
embedding size Default: 256 |
num_layers |
int32 |
optional |
decoder depth Default: 2 |
basic_lstm |
BasicLSTM |
optional |
|
gru |
GRU |
optional |
|
layer_norm_basic_lstm |
LayerNormBasicLSTM |
optional |
|
nas |
NAS |
optional |
|
residual |
bool |
optional |
whether to add residual connections Default: true |
beam_width |
int32 |
optional |
beam width when using beam search decoder. If 0 (default), use standard decoder with greedy helper Default: 0 |
length_penalty_weight |
float |
optional |
length penalty for beam search Default: 0 |
train_sampling_probability |
float |
optional |
the probability of sampling from the outputs instead of reading directly from the inputs when training Default: 0 |
attention_mechanism |
string |
optional |
attention mechanisms luong | scaled_luong | bahdanau | normed_bahdanau Default: normed_bahdanau |
num_attention_heads |
int32 |
optional |
number of attention heads Default: 1 |
output_attention |
bool |
optional |
whether to use attention as the cell output at each timestep Default: true |
visualize_type |
string |
optional |
Type of attention visualization, if any. Choices: line | spatial |
pass_hidden_state |
bool |
optional |
whether to pass encoder's rnn state to decoder Default: true |
attention_type |
string |
optional |
attention type line | spatial Default: line |
Field | Type | Label | Description |
num_layers |
int32 |
required |
number of encoder layers |
hidden_size |
int32 |
required |
hidden units size |
num_heads |
int32 |
required |
number of attention heads |
filter_size |
int32 |
required |
hidden size of FeedForwardLayer |
layer_postprocess_dropout |
float |
optional |
postprocess layer dropout Default: 0.1 |
attention_dropout |
float |
optional |
attention layer dropout Default: 0.1 |
relu_dropout |
float |
optional |
relu layer dropout Default: 0.1 |
beam_width |
int32 |
optional |
beam search width Default: 1 |
length_penalty_weight |
float |
optional |
length penalty for beam search Default: 0 |
easy_vision/python/protos/deeplab.proto
DeepLab
easy_vision/python/protos/eval.proto
EvalConfig
Message for configuring DetectionModel evaluation jobs (eval.py).
Field | Type | Label | Description |
num_visualizations |
uint32 |
optional |
Number of visualization images to generate. Default: 10 |
num_examples |
uint32 |
optional |
Number of examples to process for evaluation. Default: 0 |
eval_interval_secs |
uint32 |
optional |
How often to run evaluation. Default: 300 |
max_evals |
uint32 |
optional |
Maximum number of times to run evaluation. If set to 0, will run forever. Default: 0 |
save_graph |
bool |
optional |
Whether the TensorFlow graph used for evaluation should be saved to disk. Default: false |
visualization_export_dir |
string |
optional |
Path to directory to store visualizations in. If empty, visualization
images are not exported (only shown on Tensorboard). |
eval_master |
string |
optional |
BNS name of the TensorFlow master. |
metrics_set |
string |
repeated |
Type of metrics to use for evaluation.
possible values:
pascal_voc_detection_metrics
pascal_voc07_detection_metrics
coco_detection_metrics |
export_path |
string |
optional |
Path to export detections to COCO compatible JSON format. |
ignore_groundtruth |
bool |
optional |
Option to not read groundtruth labels and only export detections to
COCO-compatible JSON file. Default: false |
use_moving_averages |
bool |
optional |
Use exponential moving averages of variables for evaluation.
TODO(rathodv): When this is false make sure the model is constructed
without moving averages in restore_fn. Default: false |
eval_instance_masks |
bool |
optional |
Whether to evaluate instance masks.
Note that since there is no evaluation code currently for instance
segmentation this option is unused. Default: false |
min_score_threshold |
float |
optional |
Minimum score threshold for a detected object box to be visualized Default: 0.5 |
max_num_boxes_to_visualize |
int32 |
optional |
Maximum number of detections to visualize Default: 20 |
skip_scores |
bool |
optional |
When drawing a single detection, each label is by default visualized as
<label name> : <label score>. One can skip the name or/and score using the
following fields: Default: false |
skip_labels |
bool |
optional |
Default: false |
visualize_groundtruth_boxes |
bool |
optional |
Whether to show groundtruth boxes in addition to detected boxes in
visualizations. Default: false |
groundtruth_box_visualization_color |
string |
optional |
Box color for visualizing groundtruth boxes. Default: black |
keep_image_id_for_visualization_export |
bool |
optional |
Whether to keep image identifier in filename when exported to
visualization_export_dir. Default: false |
retain_original_images |
bool |
optional |
Whether to retain original images (i.e. not pre-processed) in the tensor
dictionary, so that they can be displayed in Tensorboard. Default: true |
include_metrics_per_category |
bool |
optional |
If True, additionally include per-category metrics. Default: false |
coco_analyze |
bool |
optional |
If True, enables the COCO analysis function. Default: false |
matching_iou_threshold |
float |
optional |
iou threshold used for evaluation Default: 0.5 |
include_metrics_per_dataset |
bool |
optional |
If True, additionally include per-dataset metrics. Default: false |
dataset_names |
string |
repeated |
When include_metrics_per_dataset is true, evaluate the datasets listed in dataset_names |
easy_vision/python/protos/export.proto
ExportConfig
Message for configuring exporting models.
Field | Type | Label | Description |
batch_size |
int32 |
optional |
batch size used for exported model, -1 indicates batch_size is None
which is only supported by classification model right now, while
other models support static batch_size Default: -1 |
exporter_type |
string |
optional |
type of exporter [final | latest | none] when train_and_evaluation
final: performs a single export in the end of training
latest: regularly exports the serving graph and checkpoints
none: do not perform export Default: final |
color_format |
string |
optional |
type of color format [bgr | rgb] Default: rgb |
export_video_preprocess |
bool |
optional |
whether to export the preprocess graph Default: false |
param |
UserDefinedParam |
repeated |
custom defined parameters |
easy_vision/python/protos/faster_rcnn.proto
FasterRcnn
Configuration for RegionProposal models; only objectness is predicted,
multiclass is not supported.
Field | Type | Label | Description |
backbone |
Backbone |
required |
backbone config |
fpn |
FPN |
optional |
|
rpn_head |
RPNHead |
required |
rpn head config |
region_feature_extractor |
Block |
optional |
block reuse part of backbone to extract box feature in second stage |
rcnn_head |
RCNNHead |
required |
rcnn head config |
mrcnn_head |
MRCNNHead |
optional |
mask head config |
easy_vision/python/protos/fpn.proto
FPN
Field | Type | Label | Description |
input |
string |
repeated |
|
fea_dim |
int32 |
optional |
Default: 256 |
extra_conv_layers |
int32 |
optional |
Param extra_conv_layers is used to extend feature maps beyond the backbone,
so that larger anchors (larger than 256) can be placed on coarser
features (stride >= 64).
When param retina_net is set to true, strided convolution (s=2) is used.
For fpn, extra_conv_layers = 1, which means that the fpn feature maps
will be P2(C2) P3(C3) P4(C4) P5(C5) P6(P5 pooled).
C2, C3, C4, C5 are backbone feature maps of level 2, 3, 4, 5, such as
resnet/block1, resnet/block2, resnet/block3, resnet/block4.
The anchors placed on PXs will be 32,64,128,256,512. Default: 0 |
retina_net |
bool |
optional |
Default: false |
resize_method |
ResizeMethod.Enum |
optional |
Default: BILINEAR |
roi_min_level |
int32 |
optional |
level refers to feature map indices, level is associated with feature
map strides = 2 ^ level, usually:
feature_map level stride
conv1 1 2
conv2(resnet/block1) 2 4
conv3(resnet/block2) 3 8
conv4(resnet/block3) 4 16
conv5(resnet/block4) 5 32
roi_min_level: refers to the roi level of lowest fpn feature map
example: resnet-50/block1 => 2 Default: 2 |
roi_max_level |
int32 |
optional |
roi_max_level: refers to the roi level of highest fpn feature map
example: resnet-50/block4 => 5 Default: 5 |
roi_canonical_level |
int32 |
optional |
roi_canonical_scale and roi_canonical_level specify the parameters
used to distribute proposals to feature maps:
k = floor(k0 + log2(sqrt(wh)/224))
here, roi_canonical_level = k0, roi_canonical_scale = 224
see (https://arxiv.org/abs/1612.03144) for details. Default: 4 |
roi_canonical_scale |
int32 |
optional |
Default: 224 |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in fpn. |
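The level/stride relation and the proposal-to-level formula from the FPN table can be sketched in a few lines of Python. The function names are hypothetical, the defaults are the documented ones (roi_canonical_level: 4, roi_canonical_scale: 224, roi_min_level: 2, roi_max_level: 5), and clamping the result to the available levels is an assumption about how out-of-range values are handled.

```python
import math


def level_to_stride(level):
    """Feature-map stride for a pyramid level: stride = 2 ** level."""
    return 2 ** level


def roi_to_level(width, height, roi_canonical_level=4, roi_canonical_scale=224,
                 roi_min_level=2, roi_max_level=5):
    """Assign an RoI of the given pixel size to a pyramid level.

    Implements k = floor(k0 + log2(sqrt(w * h) / 224)) with k0 and 224
    replaced by the configurable canonical level and scale.
    """
    k = math.floor(roi_canonical_level
                   + math.log2(math.sqrt(width * height) / roi_canonical_scale))
    # Assumption: levels outside the pyramid are clamped to the nearest valid one.
    return max(roi_min_level, min(roi_max_level, k))


print(roi_to_level(224, 224), level_to_stride(roi_to_level(224, 224)))
```

A 224 x 224 proposal maps to the canonical level, and halving both sides moves it one level down to a finer feature map.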
easy_vision/python/protos/graph_rewriter.proto
GraphRewriter
Message to configure graph rewriter for the tf graph.
Field | Type | Label | Description |
quantization |
Quantization |
optional |
|
Quantization
Message for quantization options. See
tensorflow/contrib/quantize/python/quantize.py for details.
Field | Type | Label | Description |
delay |
int32 |
optional |
Number of steps to delay before quantization takes effect during training. Default: 500000 |
weight_bits |
int32 |
optional |
Number of bits to use for quantizing weights.
Only 8 bit is supported for now. Default: 8 |
activation_bits |
int32 |
optional |
Number of bits to use for quantizing activations.
Only 8 bit is supported for now. Default: 8 |
easy_vision/python/protos/hyperparams.proto
BatchNorm
Configuration proto for batch norm to apply after convolution op. See
https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm
Field | Type | Label | Description |
decay |
float |
optional |
Default: 0.999 |
center |
bool |
optional |
Default: true |
scale |
bool |
optional |
Default: false |
epsilon |
float |
optional |
Default: 0.001 |
train |
bool |
optional |
Whether to train the batch norm variables. If this is set to false during
training, the current value of the batch_norm variables are used for
forward pass but they are never updated. Default: true |
Hyperparams
Configuration proto for the convolution op hyperparameters to use in the
object detection pipeline.
Field | Type | Label | Description |
op |
Hyperparams.Op |
optional |
Default: CONV |
regularizer |
Regularizer |
optional |
Regularizer for the weights of the convolution op. |
initializer |
Initializer |
optional |
Initializer for the weights of the convolution op. |
activation |
Hyperparams.Activation |
optional |
Default: RELU |
batch_norm |
BatchNorm |
optional |
BatchNorm hyperparameters. If this parameter is NOT set then BatchNorm is
not applied! |
regularize_depthwise |
bool |
optional |
Whether depthwise convolutions should be regularized. If this parameter is
NOT set then the conv hyperparams will default to the parent scope. Default: false |
Initializer
Proto with one-of field for initializers.
L1Regularizer
Configuration proto for L1 Regularizer.
See https://www.tensorflow.org/api_docs/python/tf/contrib/layers/l1_regularizer
Field | Type | Label | Description |
weight |
float |
optional |
Default: 1 |
L2Regularizer
Configuration proto for L2 Regularizer.
See https://www.tensorflow.org/api_docs/python/tf/contrib/layers/l2_regularizer
Field | Type | Label | Description |
weight |
float |
optional |
Default: 1 |
RandomNormalInitializer
Configuration proto for random normal initializer. See
https://www.tensorflow.org/api_docs/python/tf/random_normal_initializer
Field | Type | Label | Description |
mean |
float |
optional |
Default: 0 |
stddev |
float |
optional |
Default: 1 |
Regularizer
Proto with one-of field for regularizers.
TruncatedNormalInitializer
Configuration proto for truncated normal initializer. See
https://www.tensorflow.org/api_docs/python/tf/truncated_normal_initializer
Field | Type | Label | Description |
mean |
float |
optional |
Default: 0 |
stddev |
float |
optional |
Default: 1 |
VarianceScalingInitializer
Configuration proto for variance scaling initializer. See
https://www.tensorflow.org/api_docs/python/tf/contrib/layers/variance_scaling_initializer
XavierInitializer
Field | Type | Label | Description |
uniform |
bool |
optional |
Default: true |
Hyperparams.Activation
Type of activation to apply after convolution.
Name | Number | Description |
NONE |
0 |
Use None (no activation) |
RELU |
1 |
Use tf.nn.relu |
RELU_6 |
2 |
Use tf.nn.relu6 |
LEAKY_RELU |
3 |
Use leaky relu |
MISH |
4 |
Use mish |
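For reference, the activations named in this enum can be written as scalar Python functions. This is only an illustration of the usual definitions, not the easy_vision code; the leaky relu slope of 0.2 is an assumption (it is the default of tf.nn.leaky_relu), since the table does not state one.

```python
import math


def relu(x):
    return max(0.0, x)


def relu6(x):
    return min(max(0.0, x), 6.0)


def leaky_relu(x, alpha=0.2):
    # alpha is an assumed slope for negative inputs.
    return x if x >= 0.0 else alpha * x


def softplus(x):
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


ACTIVATIONS = {
    "NONE": lambda x: x,
    "RELU": relu,
    "RELU_6": relu6,
    "LEAKY_RELU": leaky_relu,
    "MISH": mish,
}
```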
Hyperparams.Op
Operations affected by hyperparameters.
Name | Number | Description |
CONV |
1 |
Convolution, Separable Convolution, Convolution transpose. |
FC |
2 |
Fully connected |
VarianceScalingInitializer.Mode
Name | Number | Description |
FAN_IN |
0 |
|
FAN_OUT |
1 |
|
FAN_AVG |
2 |
|
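The three modes select which fan value the initializer scales its variance by. The sketch below shows how those values are commonly derived from a weight shape; the function name fan_value and the [kernel_h, kernel_w, in_channels, out_channels] layout are assumptions for illustration, not taken from the proto.

```python
def fan_value(shape, mode="FAN_IN"):
    """Return the fan used for scaling, given a weight shape.

    The last two dimensions are treated as (inputs, outputs); any leading
    dimensions form the receptive field of a convolution kernel.
    """
    receptive_field = 1
    for dim in shape[:-2]:
        receptive_field *= dim
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    if mode == "FAN_IN":
        return float(fan_in)
    if mode == "FAN_OUT":
        return float(fan_out)
    if mode == "FAN_AVG":
        return (fan_in + fan_out) / 2.0
    raise ValueError("unknown mode: %s" % mode)


conv_shape = (3, 3, 64, 128)
fans = {mode: fan_value(conv_shape, mode) for mode in ("FAN_IN", "FAN_OUT", "FAN_AVG")}
```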
easy_vision/python/protos/keypoint_predictor.proto
KeypointPredictor
TextResnetKeypointPredictor
Field | Type | Label | Description |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in the keypoint predictor. |
fc_hyperparams |
Hyperparams |
optional |
Hyperparameters for fc ops used in the keypoint predictor. |
num_blocks_before_predictor |
int32 |
optional |
Number of resnet blocks before the keypoint predictor; downsampling is applied between blocks Default: 1 |
num_units_per_block |
int32 |
optional |
Number of resnet units per resnet block Default: 1 |
base_depth_before_predictor |
int32 |
optional |
The depth of first resnet block Default: 256 |
se_rate |
int32 |
optional |
The rate of squeeze_and_excitation; a value less than or equal to zero disables it Default: 0 |
keypoint_prediction_num_fc_layers |
int32 |
optional |
The number of fc layers before predictor Default: 2 |
keypoint_prediction_fc_depth |
int32 |
optional |
The depth of fc layers Default: 1024 |
easy_vision/python/protos/losses.proto
BootstrappedSigmoidClassificationLoss
Classification loss using a sigmoid function over the class prediction with
the highest prediction score.
Field | Type | Label | Description |
alpha |
float |
optional |
Interpolation weight between 0 and 1. |
hard_bootstrap |
bool |
optional |
Whether hard bootstrapping should be used or not. If true, will only use
one class favored by model. Otherwise, will use all predicted class
probabilities. Default: false |
anchorwise_output |
bool |
optional |
DEPRECATED, do not use.
Output loss per anchor. Default: false |
ClassificationLoss
Configuration for class prediction loss function.
HardExampleMiner
Configuration for hard example miner.
Field | Type | Label | Description |
num_hard_examples |
int32 |
optional |
Maximum number of hard examples to be selected per image (prior to
enforcing max negative to positive ratio constraint). If set to 0,
all examples obtained after NMS are considered. Default: 64 |
iou_threshold |
float |
optional |
Minimum intersection over union for an example to be discarded during NMS. Default: 0.7 |
loss_type |
HardExampleMiner.LossType |
optional |
Default: BOTH |
max_negatives_per_positive |
int32 |
optional |
Maximum number of negatives to retain for each positive anchor. If
max_negatives_per_positive is 0, no prespecified negative:positive ratio is
enforced. Default: 0 |
min_negatives_per_image |
int32 |
optional |
Minimum number of negative anchors to sample for a given image. Setting
this to a positive number samples negatives in an image without any
positive anchors and thus does not bias the model towards having at least one
detection per image. Default: 0 |
LocalizationLoss
Configuration for bounding box localization loss function.
Loss
Message for configuring the localization loss, classification loss and hard
example miner used for training object detection models. See core/losses.py
for details
Field | Type | Label | Description |
localization_loss |
LocalizationLoss |
optional |
Localization loss to use. |
classification_loss |
ClassificationLoss |
optional |
Classification loss to use. |
hard_example_miner |
HardExampleMiner |
optional |
If not left to default, applies hard example mining. |
classification_weight |
float |
optional |
Classification loss weight. Default: 1 |
localization_weight |
float |
optional |
Localization loss weight. Default: 1 |
random_example_sampler |
RandomExampleSampler |
optional |
If not left to default, applies random example sampling. |
RandomExampleSampler
Configuration for random example sampler.
Field | Type | Label | Description |
positive_sample_fraction |
float |
optional |
The desired fraction of positive samples in batch when applying random
example sampling. Default: 0.01 |
SigmoidFocalClassificationLoss
Sigmoid Focal cross entropy loss as described in
https://arxiv.org/abs/1708.02002
Field | Type | Label | Description |
anchorwise_output |
bool |
optional |
DEPRECATED, do not use. Default: false |
gamma |
float |
optional |
modulating factor for the loss. Default: 2 |
alpha |
float |
optional |
alpha weighting factor for the loss. |
label_smoothing |
float |
optional |
use label smoothing in loss
please refer to label_smoothing explanation in tf.losses.sigmoid_cross_entropy Default: 0 |
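To make the roles of gamma and alpha concrete, here is a scalar sketch of the focal loss from the referenced paper, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is an illustration only, not the easy_vision implementation; the value alpha=0.25 is the paper's suggestion and an assumption here, since the table gives no default for alpha.

```python
import math


def sigmoid_focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction; target is 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-logit))
    # p_t is the probability the model assigns to the true class.
    p_t = p if target == 1 else 1.0 - p
    cross_entropy = -math.log(p_t)
    # The modulating factor shrinks the loss of well-classified examples.
    modulating_factor = (1.0 - p_t) ** gamma
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if target == 1 else 1.0 - alpha
    return alpha_t * modulating_factor * cross_entropy


easy = sigmoid_focal_loss(4.0, 1)
hard = sigmoid_focal_loss(-4.0, 1)
```

With gamma set to 0 and alpha set to None the function reduces to plain sigmoid cross entropy; with the defaults, the easy example contributes far less than the hard one.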
WeightedIOULocalizationLoss
Intersection over union location loss: 1 - IOU
Field | Type | Label | Description |
mode |
string |
optional |
iou type [iou/giou/diou/ciou] Default: iou |
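A minimal sketch of the plain iou mode, 1 - IOU, for two axis-aligned boxes. The (ymin, xmin, ymax, xmax) box layout and the function names are assumptions for illustration; the giou, diou and ciou variants add further terms that are not shown here.

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def iou_loss(prediction, target):
    """Localization loss in plain iou mode: 1 - IOU."""
    return 1.0 - iou(prediction, target)
```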
WeightedL2LocalizationLoss
L2 location loss: 0.5 * ||weight * (a - b)|| ^ 2
Field | Type | Label | Description |
anchorwise_output |
bool |
optional |
DEPRECATED, do not use.
Output loss per anchor. Default: false |
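The formula 0.5 * ||weight * (a - b)|| ^ 2 can be written out directly. The sketch below treats a and b as equal-length coordinate lists and weight as a scalar, which is an assumption made for illustration.

```python
def weighted_l2_loss(prediction, target, weight=1.0):
    """0.5 * ||weight * (a - b)||^2 over paired coordinates."""
    return 0.5 * sum((weight * (a - b)) ** 2 for a, b in zip(prediction, target))
```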
WeightedSigmoidClassificationLoss
Classification loss using a sigmoid function over class predictions.
Field | Type | Label | Description |
anchorwise_output |
bool |
optional |
DEPRECATED, do not use.
Output loss per anchor. Default: false |
label_smoothing |
float |
optional |
use label smoothing in loss
please refer to label_smoothing explanation in tf.losses.sigmoid_cross_entropy Default: 0 |
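The label_smoothing explanation in tf.losses.sigmoid_cross_entropy amounts to pulling every label towards 0.5. A short sketch of that transformation follows; the helper name smooth_labels is hypothetical.

```python
def smooth_labels(labels, label_smoothing=0.0):
    """Apply sigmoid-style label smoothing to a list of 0/1 labels.

    new_label = label * (1 - label_smoothing) + 0.5 * label_smoothing
    """
    return [y * (1.0 - label_smoothing) + 0.5 * label_smoothing for y in labels]


smoothed = smooth_labels([1.0, 0.0], label_smoothing=0.1)
```

With label_smoothing 0.1, a positive label becomes roughly 0.95 and a negative label roughly 0.05; with the default of 0 the labels are unchanged.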
WeightedSmoothL1LocalizationLoss
SmoothL1 (Huber) location loss.
The smooth L1_loss is defined elementwise as 0.5 x^2 if |x| <= delta and
delta * (|x| - 0.5 * delta) otherwise, where x is the difference between
predictions and target.
Field | Type | Label | Description |
anchorwise_output |
bool |
optional |
DEPRECATED, do not use.
Output loss per anchor. Default: false |
delta |
float |
optional |
Delta value for huber loss. Default: 1 |
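As a worked example, the standard Huber form of this loss for a single residual x can be sketched as follows. This is the textbook definition, shown under the assumption that easy_vision follows it; the function name is hypothetical.

```python
def smooth_l1(x, delta=1.0):
    """Huber loss of one residual x = prediction - target."""
    abs_x = abs(x)
    if abs_x <= delta:
        # Quadratic region near zero.
        return 0.5 * x * x
    # Linear region; the offset keeps value and slope continuous at |x| = delta.
    return delta * (abs_x - 0.5 * delta)
```

At |x| = delta both branches give 0.5 * delta^2, so the loss is continuous, and large residuals grow linearly rather than quadratically.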
WeightedSoftmaxClassificationAgainstLogitsLoss
Classification loss using a softmax function over class predictions and
a softmax function over the groundtruth labels (assumed to be logits).
Field | Type | Label | Description |
anchorwise_output |
bool |
optional |
DEPRECATED, do not use. Default: false |
logit_scale |
float |
optional |
Scale and softmax groundtruth logits before calculating softmax
classification loss. Typically used for softmax distillation with teacher
annotations stored as logits. Default: 1 |
WeightedSoftmaxClassificationLoss
Classification loss using a softmax function over class predictions.
Field | Type | Label | Description |
anchorwise_output |
bool |
optional |
DEPRECATED, do not use.
Output loss per anchor. Default: false |
logit_scale |
float |
optional |
Scale logit (input) value before calculating softmax classification loss.
Typically used for softmax distillation. Default: 1 |
label_smoothing |
float |
optional |
use label smoothing in loss
please refer to label_smoothing explanation in tf.losses.sigmoid_cross_entropy Default: 0 |
HardExampleMiner.LossType
Whether to use classification losses ('cls'), localization losses
('loc') or both losses ('both', default). In the case of 'both', cls_loss_weight and
loc_loss_weight are used to compute weighted sum of the two losses.
Name | Number | Description |
BOTH |
0 |
|
CLASSIFICATION |
1 |
|
LOCALIZATION |
2 |
|
easy_vision/python/protos/mask_predictor.proto
MaskPredictor
MaskRCNNMaskPredictor
Field | Type | Label | Description |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters for convolution ops used in the mask predictor. |
mask_prediction_conv_depth |
int32 |
optional |
The depth for the first conv2d_transpose op applied to the
image_features in the mask prediction branch. If set to 0, the value
will be set automatically based on the number of channels in the image
features and the number of classes. Default: 256 |
mask_height |
int32 |
optional |
The height and the width of the predicted mask. Default: 15 |
mask_width |
int32 |
optional |
Default: 15 |
mask_prediction_num_conv_layers |
int32 |
optional |
The number of convolutions applied to image_features in the mask prediction
branch. Default: 2 |
masks_are_class_agnostic |
bool |
optional |
Default: false |
convolve_then_upsample_masks |
bool |
optional |
Whether to apply convolutions on mask features before upsampling using
nearest neighbor resizing.
By default, mask features are resized to [`mask_height`, `mask_width`]
before applying convolutions and predicting masks. Default: false |
easy_vision/python/protos/matcher.proto
Matcher
Configuration proto for the matcher to be used in the object detection
pipeline. See core/matcher.py for details.
easy_vision/python/protos/multi_label_classification.proto
MultiLabelClassification
Field | Type | Label | Description |
backbone |
Backbone |
required |
Backbone configuration |
multi_label_classification_head |
MultiLabelClassificationHead |
required |
multi-label classification head |
include_metrics_per_category |
bool |
optional |
whether to display class-specific evaluation metrics Default: false |
MultiLabelClassificationHead
Field | Type | Label | Description |
input_layer |
string |
repeated |
input layer |
num_classes |
int32 |
required |
Number of classes |
multi_label_loss_weight |
float |
optional |
loss weight Default: 1 |
global_pooling_type |
string |
optional |
global pooling type, max for max_pooling, avg for average pooling Default: max |
hidden_sizes |
int32 |
repeated |
extra conv layer hidden size |
loss |
ClassificationLoss |
optional |
classification loss |
easy_vision/python/protos/optimizer.proto
AdamOptimizer
Configuration message for the AdamOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
ConstantLearningRate
Configuration message for a constant learning rate.
Field | Type | Label | Description |
learning_rate |
float |
optional |
Default: 0.002 |
CosineDecayLearningRate
Configuration message for a cosine decaying learning rate as defined in
utils/learning_schedules.py
Field | Type | Label | Description |
learning_rate_base |
float |
optional |
Default: 0.002 |
total_steps |
uint32 |
optional |
Default: 4000000 |
warmup_learning_rate |
float |
optional |
Default: 0.0002 |
warmup_steps |
uint32 |
optional |
Default: 10000 |
hold_base_rate_steps |
uint32 |
optional |
Default: 0 |
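One plausible reading of how these five fields combine is sketched below: a linear warmup from warmup_learning_rate to learning_rate_base, an optional hold at the base rate, then a cosine decay to zero at total_steps. The exact behaviour is defined in utils/learning_schedules.py, which is not reproduced here, so treat the boundary handling as an assumption.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base=0.002, total_steps=4000000,
                             warmup_learning_rate=0.0002, warmup_steps=10000,
                             hold_base_rate_steps=0):
    """Learning rate at a given global step (illustrative sketch)."""
    if step > total_steps:
        return 0.0
    if step < warmup_steps:
        # Linear ramp from the warmup rate up to the base rate.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    if step <= warmup_steps + hold_base_rate_steps:
        return learning_rate_base
    decay_span = float(total_steps - warmup_steps - hold_base_rate_steps)
    progress = (step - warmup_steps - hold_base_rate_steps) / decay_span
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))
```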
ExponentialDecayLearningRate
Configuration message for an exponentially decaying learning rate.
See https://www.tensorflow.org/versions/master/api_docs/python/train/decaying_the_learning_rate#exponential_decay
Field | Type | Label | Description |
initial_learning_rate |
float |
optional |
Default: 0.002 |
decay_steps |
uint32 |
optional |
Default: 4000000 |
decay_factor |
float |
optional |
Default: 0.95 |
staircase |
bool |
optional |
Default: true |
burnin_learning_rate |
float |
optional |
Default: 0 |
burnin_steps |
uint32 |
optional |
Default: 0 |
min_learning_rate |
float |
optional |
Default: 0 |
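The fields in this table can be read as the following schedule. This is a sketch under assumptions: the burn-in rate is used for the first burnin_steps steps (falling back to the initial rate when it is 0), and min_learning_rate acts as a floor on the decayed value. The actual implementation may differ in these details.

```python
import math


def exponential_decay(step, initial_learning_rate=0.002, decay_steps=4000000,
                      decay_factor=0.95, staircase=True,
                      burnin_learning_rate=0.0, burnin_steps=0,
                      min_learning_rate=0.0):
    """Learning rate at a given global step (illustrative sketch)."""
    if step < burnin_steps:
        # Assumption: a burn-in rate of 0 means "use the initial rate".
        return burnin_learning_rate or initial_learning_rate
    exponent = step / float(decay_steps)
    if staircase:
        # Decay in discrete jumps every decay_steps steps.
        exponent = math.floor(exponent)
    rate = initial_learning_rate * decay_factor ** exponent
    return max(rate, min_learning_rate)
```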
LearningRate
Configuration message for optimizer learning rate.
ManualStepLearningRate
Configuration message for a manually defined learning rate schedule.
ManualStepLearningRate.LearningRateSchedule
Field | Type | Label | Description |
step |
uint32 |
optional |
|
learning_rate |
float |
optional |
Default: 0.002 |
MomentumOptimizer
Configuration message for the MomentumOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
Optimizer
Top level optimizer message.
PolyDecayLearningRate
Configuration message for a poly decaying learning rate.
See https://www.tensorflow.org/api_docs/python/tf/train/polynomial_decay.
Field | Type | Label | Description |
learning_rate_base |
float |
required |
|
total_steps |
int64 |
required |
|
power |
float |
required |
|
end_learning_rate |
float |
optional |
Default: 0 |
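The polynomial schedule documented for tf.train.polynomial_decay (without cycling) can be sketched as below, using the field names of this message as parameters. Mapping total_steps onto that function's decay_steps argument is an assumption.

```python
def polynomial_decay(step, learning_rate_base, total_steps, power,
                     end_learning_rate=0.0):
    """Learning rate at a given global step (illustrative sketch)."""
    # After total_steps the rate stays at end_learning_rate.
    step = min(step, total_steps)
    remaining = 1.0 - step / float(total_steps)
    return ((learning_rate_base - end_learning_rate) * remaining ** power
            + end_learning_rate)
```

With power 1.0 the decay is linear; larger powers drop the rate faster early in training.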
RMSPropOptimizer
Configuration message for the RMSPropOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/RMSPropOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
decay |
float |
optional |
Default: 0.9 |
epsilon |
float |
optional |
Default: 1 |
Field | Type | Label | Description |
learning_rate_base |
float |
required |
|
hidden_size |
int32 |
required |
|
warmup_steps |
int32 |
required |
|
step_scaling_rate |
float |
optional |
Default: 1 |
easy_vision/python/protos/param_space.proto
ParamSpace
Field | Type | Label | Description |
task_type |
ParamSpace.TaskType |
required |
task type, such as CLASSIFICATION | DETECTION etc. Default: CLASSIFICATION |
data_prefixs |
string |
repeated |
dataset data directory example
+ data/
+ tfrecord/
- name_train_1.tfrecord
- name_train_1_info.json
- name_test.tfrecord
- name_label_map.pbtxt
- name_char_dict
train data path prefix, e.g. data/tfrecord/name |
preference_type |
ParamSpace.PreferenceType |
optional |
param space preference Default: ACCURATE |
pretrained_model_dir |
string |
optional |
pretrained model directory for incremental training |
space_size |
int32 |
optional |
param space size Default: 1 |
ParamSpace.PreferenceType
Name | Number | Description |
ACCURATE |
0 |
|
FAST |
1 |
|
ParamSpace.TaskType
Name | Number | Description |
CLASSIFICATION |
0 |
|
DETECTION |
1 |
|
SEGMENTATION |
2 |
|
INSTANCE_SEGMENTATION |
3 |
|
TEXT_END2END |
4 |
|
TEXT_RECOGNITION |
5 |
|
TEXT_DETECTION |
6 |
|
easy_vision/python/protos/pipeline.proto
CVEstimator
CVEstimator config: including train and test parameters
Field | Type | Label | Description |
train_config |
TrainConfig |
optional |
train config, including optimizer, weight decay, num_steps and so on |
eval_config |
EvalConfig |
optional |
|
export_config |
ExportConfig |
optional |
|
train_data |
DatasetConfig |
optional |
|
eval_data |
DatasetConfig |
optional |
|
model_config |
CVModel |
required |
cv model config |
user_resource_path |
string |
optional |
in local mode user_resource_path should be
set to the directory containing customized code |
ac_config |
CompressConfig |
optional |
auto-compression options. |
easy_vision/python/protos/post_processing.proto
BatchNonMaxSuppression
Configuration proto for non-max-suppression operation on a batch of
detections.
Field | Type | Label | Description |
score_threshold |
float |
optional |
Scalar threshold for score used in evaluation (low scoring boxes are removed). Default: 0 |
predict_score_threshold |
float |
optional |
Score threshold used for prediction; a workaround that avoids
revising the config file before exporting models Default: 0.5 |
iou_threshold |
float |
optional |
Scalar threshold for IOU (boxes that have high IOU overlap
with previously selected boxes are removed). Default: 0.6 |
max_detections_per_class |
int32 |
optional |
Maximum number of detections to retain per class. Default: 100 |
max_total_detections |
int32 |
optional |
Maximum number of detections to retain across all classes. Default: 100 |
class_agnostic |
bool |
optional |
Class agnostic set in nms. Default: false |
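The roles of score_threshold, iou_threshold and the detection caps can be seen in a greedy single-class non-max suppression sketch. It is a plain-Python illustration rather than the batched implementation; the (ymin, xmin, ymax, xmax) box layout, the function names, and the strict comparison against score_threshold are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.0, iou_threshold=0.6,
                        max_detections=100):
    """Return indices of the boxes kept, highest score first."""
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Drop a box that overlaps too much with an already selected box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
            if len(kept) == max_detections:
                break
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with an IOU of about 0.68, above the default 0.6, so it is suppressed and kept is [0, 2].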
PostProcessing
Configuration proto for post-processing predicted boxes and
scores.
Field | Type | Label | Description |
batch_non_max_suppression |
BatchNonMaxSuppression |
optional |
Non max suppression parameters. |
score_converter |
PostProcessing.ScoreConverter |
optional |
Score converter to use. Default: IDENTITY |
logit_scale |
float |
optional |
Scale logit (input) value before conversion in post-processing step.
Typically used for softmax distillation, though can be used to scale for
other reasons. Default: 1 |
PostProcessing.ScoreConverter
Enum to specify how to convert the detection scores.
Name | Number | Description |
IDENTITY |
0 |
Input scores equals output scores. |
SIGMOID |
1 |
Applies a sigmoid on input scores. |
SOFTMAX |
2 |
Applies a softmax on input scores |
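The three converters, combined with logit_scale, can be sketched as follows. Dividing the logits by logit_scale before conversion (temperature-style scaling) is an assumption about how the scale is applied, and the function name is hypothetical.

```python
import math


def convert_scores(logits, score_converter="IDENTITY", logit_scale=1.0):
    """Convert one box's per-class logits to scores."""
    # Assumption: the scale is applied by division before conversion.
    scaled = [x / logit_scale for x in logits]
    if score_converter == "IDENTITY":
        return scaled
    if score_converter == "SIGMOID":
        return [1.0 / (1.0 + math.exp(-x)) for x in scaled]
    if score_converter == "SOFTMAX":
        # Subtract the maximum for numerical stability.
        peak = max(scaled)
        exps = [math.exp(x - peak) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError("unknown score converter: %s" % score_converter)
```

SIGMOID scores each class independently, while SOFTMAX makes the classes compete so that the scores of one box sum to 1.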
easy_vision/python/protos/predictor_eval.proto
PredictorEval
PredictorEval config: including train and test parameters
Field | Type | Label | Description |
predictor_name |
string |
required |
name of predictor or predictor class path |
model_path |
string |
optional |
predictor model path |
eval_config |
EvalConfig |
required |
evaluator config |
eval_data |
DatasetConfig |
required |
evaluation data config |
easy_vision/python/protos/preprocessor.proto
ActionDetectionPreprocessing
Field | Type | Label | Description |
length |
int32 |
optional |
video length Default: 768 |
crop_size |
int32 |
optional |
video input size Default: 112 |
frame_height |
int32 |
optional |
resize height and width Default: 128 |
frame_width |
int32 |
optional |
Default: 171 |
means |
float |
repeated |
means |
norm_values |
float |
repeated |
|
is_flip |
bool |
optional |
flip and random crop indicator Default: true |
is_random_crop |
bool |
optional |
Default: true |
CifarNetPreprocessing
Field | Type | Label | Description |
output_width |
int32 |
optional |
Default: 32 |
output_height |
int32 |
optional |
Default: 32 |
is_training |
bool |
optional |
Default: false |
add_image_summaries |
bool |
optional |
Default: false |
ClassificationAutoAugment
Distort classification image with Auto Augment
ClassificationCentralCrop
Central crop image
Field | Type | Label | Description |
central_crop_fraction |
float |
optional |
central crop fraction Default: 0.875 |
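As a rough illustration of what central_crop_fraction means, the sketch below computes the crop window for a given image size. The function name and the rounding (truncating the offsets toward zero) are assumptions; the actual operator may round differently.

```python
from typing import Tuple


def central_crop_box(height: int, width: int,
                     fraction: float = 0.875) -> Tuple[int, int, int, int]:
    """Returns (offset_y, offset_x, crop_height, crop_width)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Remove an equal margin on both sides of each dimension.
    offset_y = int((height - height * fraction) / 2)
    offset_x = int((width - width * fraction) / 2)
    return offset_y, offset_x, height - 2 * offset_y, width - 2 * offset_x
```

With the default fraction of 0.875, a 256 x 256 image yields a centered 224 x 224 crop.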
ClassificationRandomAugment
Distort classification image with Random Augment
Field | Type | Label | Description |
num_layers |
int32 |
optional |
the number of augmentation transformations to apply
sequentially to an image Default: 2 |
magnitude |
int32 |
optional |
shared magnitude across all augmentation operations Default: 10 |
ClassificationRandomCrop
Randomly crops the image
Field | Type | Label | Description |
min_aspect_ratio |
float |
optional |
Aspect ratio bounds of cropped image. Default: 0.75 |
max_aspect_ratio |
float |
optional |
Default: 1.33 |
min_area |
float |
optional |
Allowed area ratio of cropped image to original image. Default: 0.1 |
max_area |
float |
optional |
Default: 1 |
DeepLabRandomCrop
Field | Type | Label | Description |
crop_size |
int32 |
optional |
Default: 513 |
DeepLabRandomHorizontalFlip
DeepLabResizeImage
Field | Type | Label | Description |
new_height |
int32 |
required |
|
new_width |
int32 |
required |
|
EfficientNetPreprocessing
Field | Type | Label | Description |
model_name |
string |
optional |
If model_name is set, output_width and output_height use the defaults for
this model, e.g., efficientnet-b0: 224, efficientnet-b1: 240 |
output_width |
int32 |
optional |
Default: 224 |
output_height |
int32 |
optional |
Default: 224 |
is_training |
bool |
optional |
Default: false |
augment_name |
string |
optional |
the name of the augmentation method to apply to the image.
`autoaugment` if AutoAugment is to be used
`randaugment` if RandAugment is to be used Default: randaugment |
randaug_num_layers |
int32 |
optional |
the number of augmentation transformations to apply
sequentially to an image in randaugment Default: 2 |
randaug_magnitude |
int32 |
optional |
shared magnitude across all augmentation operations in randaugment Default: 10 |
InceptionPreprocessing
Field | Type | Label | Description |
output_width |
int32 |
optional |
Default: 224 |
output_height |
int32 |
optional |
Default: 224 |
is_training |
bool |
optional |
Default: false |
add_image_summaries |
bool |
optional |
Default: false |
central_crop_fraction |
float |
optional |
Default: 0.875 |
KineticsPreprocessing
kinetics preprocess
Field | Type | Label | Description |
sample_duration |
int32 |
optional |
Default: 16 |
input_c |
int32 |
optional |
Default: 3 |
initial_scale |
float |
optional |
scale parameters Default: 1 |
n_scales |
int32 |
optional |
Default: 5 |
scale_step |
float |
optional |
Default: 0.840896428 |
train_crop |
string |
optional |
spatial crop type Default: corner |
sample_size |
int32 |
optional |
sample size Default: 112 |
n_samples_for_each_video |
int32 |
optional |
Default: 1 |
is_spatial_transform |
bool |
optional |
Default: true |
is_training |
bool |
optional |
Default: true |
LeNetPreprocessing
Field | Type | Label | Description |
output_width |
int32 |
optional |
Default: 28 |
output_height |
int32 |
optional |
Default: 28 |
is_training |
bool |
optional |
Default: false |
LetterBoxImage
Pads the short edge of the image to fit the target image aspect_ratio.
Field | Type | Label | Description |
aspect_ratio |
float |
optional |
target image aspect ratio |
pad_value |
float |
optional |
constant value to pad |
NormalizeImage
Normalizes pixel values in an image.
For every channel in the image, moves the pixel values from the range
[original_minval, original_maxval] to [target_minval, target_maxval].
Field | Type | Label | Description |
original_minval |
float |
optional |
|
original_maxval |
float |
optional |
|
target_minval |
float |
optional |
Default: 0 |
target_maxval |
float |
optional |
Default: 1 |
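The range mapping described for NormalizeImage is a linear rescale. A minimal per-pixel sketch (the function name is hypothetical, and the real operator works on whole image tensors):

```python
def rescale_pixel(value: float, original_minval: float, original_maxval: float,
                  target_minval: float = 0.0, target_maxval: float = 1.0) -> float:
    """Maps value from [original_minval, original_maxval] to the target range."""
    scale = (target_maxval - target_minval) / (original_maxval - original_minval)
    return (value - original_minval) * scale + target_minval
```

For example, with original range [0, 255] and target range [-1, 1], the midpoint 127.5 maps to 0.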
PreprocessingStep
Message for defining a preprocessing operation on input data.
See: //third_party/tensorflow_models/core/preprocessor.py
RGBtoGray
Converts the RGB image to a grayscale image. This also converts the image
depth from 3 to 1, unlike RandomRGBtoGray which does not change the image
depth.
RandomAdjustBrightness
Randomly changes image brightness by up to max_delta. Image outputs will be
saturated between 0 and 1.
Field | Type | Label | Description |
max_delta |
float |
optional |
Default: 0.2 |
RandomAdjustContrast
Randomly scales contrast by a value between [min_delta, max_delta].
Field | Type | Label | Description |
min_delta |
float |
optional |
Default: 0.8 |
max_delta |
float |
optional |
Default: 1.25 |
RandomAdjustHue
Randomly alters hue by a value of up to max_delta.
Field | Type | Label | Description |
max_delta |
float |
optional |
Default: 0.02 |
RandomAdjustSaturation
Randomly changes saturation by a value between [min_delta, max_delta].
Field | Type | Label | Description |
min_delta |
float |
optional |
Default: 0.8 |
max_delta |
float |
optional |
Default: 1.25 |
RandomBlackPatches
Randomly adds black square patches to an image.
Field | Type | Label | Description |
max_black_patches |
int32 |
optional |
The maximum number of black patches to add. Default: 10 |
probability |
float |
optional |
The probability of a black patch being added to an image. Default: 0.5 |
size_to_image_ratio |
float |
optional |
Ratio of the black patch's side length to the minimum dimension of the image
(patch_width = patch_height = size_to_image_ratio * min(image_height, image_width)). Default: 0.1 |
RandomCropImage
Randomly crops the image and bounding boxes.
Field | Type | Label | Description |
min_object_covered |
float |
optional |
Cropped image must cover at least one box by this fraction. Default: 1 |
min_aspect_ratio |
float |
optional |
Aspect ratio bounds of cropped image. Default: 0.75 |
max_aspect_ratio |
float |
optional |
Default: 1.33 |
min_area |
float |
optional |
Allowed area ratio of cropped image to original image. Default: 0.1 |
max_area |
float |
optional |
Default: 1 |
overlap_thresh |
float |
optional |
Minimum overlap threshold of cropped boxes to keep in new image. If the
ratio between a cropped bounding box and the original is less than this
value, it is removed from the new image. Default: 0.3 |
random_coef |
float |
optional |
Probability of keeping the original image. Default: 0 |
RandomCropPadImage
Randomly crops an image followed by a random pad.
Field | Type | Label | Description |
min_object_covered |
float |
optional |
Cropping operation must cover at least one box by this fraction. Default: 1 |
min_aspect_ratio |
float |
optional |
Aspect ratio bounds of image after cropping operation. Default: 0.75 |
max_aspect_ratio |
float |
optional |
Default: 1.33 |
min_area |
float |
optional |
Allowed area ratio of image after cropping operation. Default: 0.1 |
max_area |
float |
optional |
Default: 1 |
overlap_thresh |
float |
optional |
Minimum overlap threshold of cropped boxes to keep in new image. If the
ratio between a cropped bounding box and the original is less than this
value, it is removed from the new image. Default: 0.3 |
random_coef |
float |
optional |
Probability of keeping the original image during the crop operation. Default: 0 |
min_padded_size_ratio |
float |
repeated |
Maximum dimensions for padded image. If unset, will use double the original
image dimension as a lower bound. Both of the following fields should be
length 2. |
max_padded_size_ratio |
float |
repeated |
|
pad_color |
float |
repeated |
Color of the padding. If unset, will pad using average color of the input
image. This field should be of length 3. |
RandomCropTextImage
Randomly crops the text image and bounding boxes.
Field | Type | Label | Description |
min_object_covered |
float |
optional |
Cropped image must cover at least one box by this fraction. Default: 1 |
min_aspect_ratio |
float |
optional |
Aspect ratio bounds of cropped image. Default: 0.2 |
max_aspect_ratio |
float |
optional |
Default: 5 |
min_area |
float |
optional |
Allowed area ratio of cropped image to original image. Default: 0.1 |
max_area |
float |
optional |
Default: 1 |
random_coef |
float |
optional |
Probability of keeping the original image. Default: 0.1 |
RandomCropTextRegion
Randomly crops the text region.
Used by text recognition and text rectification.
RandomCropToAspectRatio
Randomly crops an image to a given aspect ratio.
Field | Type | Label | Description |
aspect_ratio |
float |
optional |
Aspect ratio. Default: 1 |
overlap_thresh |
float |
optional |
Minimum overlap threshold of cropped boxes to keep in new image. If the
ratio between a cropped bounding box and the original is less than this
value, it is removed from the new image. Default: 0.3 |
RandomDistortColor
Performs a random color distortion. color_ordering should be either 0 or 1.
Field | Type | Label | Description |
color_ordering |
int32 |
optional |
0 means first adjust brightness then adjust saturation, 1 otherwise Default: 0 |
fast_mode |
bool |
optional |
in fast_mode, only adjust brightness and saturation
otherwise, adjust brightness, saturation, hue, contrast Default: false |
RandomHorizontalFlip
Randomly horizontally flips the image and detections 50% of the time.
Field | Type | Label | Description |
keypoint_flip_permutation |
int32 |
repeated |
Specifies a mapping from the original keypoint indices to horizontally
flipped indices. This is used in the event that keypoints are specified,
in which case when the image is horizontally flipped the keypoints will
need to be permuted. E.g. for keypoints representing left_eye, right_eye,
nose_tip, mouth, left_ear, right_ear (in that order), one might specify
the keypoint_flip_permutation below:
keypoint_flip_permutation: 1
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 3
keypoint_flip_permutation: 5
keypoint_flip_permutation: 4 |
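The role of keypoint_flip_permutation can be sketched as below: after mirroring the x coordinates, left/right keypoints swap slots so that each index still refers to the same semantic part. The function name and the normalized (y, x) keypoint layout are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]  # assumed layout: (y, x), normalized to [0, 1]


def flip_keypoints_horizontally(keypoints: Sequence[Keypoint],
                                flip_permutation: Sequence[int]) -> List[Keypoint]:
    """Mirrors keypoints left-right and reorders them by flip_permutation."""
    # Mirror the x coordinate around the vertical image axis.
    mirrored = [(y, 1.0 - x) for y, x in keypoints]
    # Slot i of the output takes the mirrored keypoint at flip_permutation[i].
    return [mirrored[i] for i in flip_permutation]


# Order: left_eye, right_eye, nose_tip, mouth, left_ear, right_ear.
face = [(0.5, 0.25), (0.5, 0.75), (0.5, 0.5), (0.75, 0.5), (0.5, 0.125), (0.5, 0.875)]
flipped = flip_keypoints_horizontally(face, [1, 0, 2, 3, 5, 4])
```

Because this example face is symmetric, `flipped` equals `face`: the left_eye slot again holds the point on the left side of the flipped image.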
RandomImageScale
Randomly enlarges or shrinks image (keeping aspect ratio).
Field | Type | Label | Description |
min_scale_ratio |
float |
optional |
Default: 0.5 |
max_scale_ratio |
float |
optional |
Default: 2 |
RandomJitterAspectRatio
Random Change Image Aspect Ratio
RandomJitterBoxes
Randomly jitters corners of boxes in the image determined by ratio.
ie. If a box is [100, 200] and ratio is 0.02, the corners can move by [1, 4].
Field | Type | Label | Description |
ratio |
float |
optional |
Default: 0.05 |
RandomPadImage
Randomly adds padding to the image.
Field | Type | Label | Description |
min_height_ratio |
float |
optional |
Minimum dimensions for padded image. If unset, will use original image
dimension as a lower bound. |
min_width_ratio |
float |
optional |
|
max_height_ratio |
float |
optional |
Maximum dimensions for padded image. If unset, will use double the original
image dimension as an upper bound. |
max_width_ratio |
float |
optional |
|
pad_color |
float |
repeated |
Color of the padding. If unset, will pad using average color of the input
image. |
RandomPixelValueScale
Randomly scales the values of all pixels in the image by some constant value
between [minval, maxval], then clip the value to a range between [0, 1.0].
Field | Type | Label | Description |
minval |
float |
optional |
Default: 0.9 |
maxval |
float |
optional |
Default: 1.1 |
RandomRGBtoGray
Randomly converts the entire image to grayscale.
Field | Type | Label | Description |
probability |
float |
optional |
Default: 0.1 |
RandomResizeImage
Random Resize images
RandomResizeMethod
Randomly resizes the image up to [target_height, target_width].
Field | Type | Label | Description |
target_height |
float |
optional |
|
target_width |
float |
optional |
|
RandomResizeToRange
RandomRotateTextRegion
Randomly rotates the text region image counter-clockwise.
Field | Type | Label | Description |
min_angle |
float |
optional |
Default: -10 |
max_angle |
float |
optional |
Default: 10 |
rot90 |
bool |
optional |
Whether to randomly rotate the image by 90 degrees. Default: true |
RandomRotation
Randomly rotates the image and detections by (min_angle to max_angle) degrees counter-clockwise
Field | Type | Label | Description |
min_angle |
float |
optional |
Default: -10 |
max_angle |
float |
optional |
Default: 10 |
use_keypoints_calc_boxes |
bool |
optional |
use keypoints to compute new bounding box or not Default: false |
RandomRotation90
Randomly rotates the image and detections by 90 degrees counter-clockwise
50% of the time.
RandomVerticalFlip
Randomly vertically flips the image and detections 50% of the time.
Field | Type | Label | Description |
keypoint_flip_permutation |
int32 |
repeated |
Specifies a mapping from the original keypoint indices to vertically
flipped indices. This is used in the event that keypoints are specified,
in which case when the image is vertically flipped the keypoints will
need to be permuted. E.g. for keypoints representing left_eye, right_eye,
nose_tip, mouth, left_ear, right_ear (in that order), one might specify
the keypoint_flip_permutation below:
keypoint_flip_permutation: 1
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 3
keypoint_flip_permutation: 5
keypoint_flip_permutation: 4 |
ResizeImage
Resizes images to [new_height, new_width].
ResizeImageWithFixedHeight
Resizes images to a fixed new_height, keeping the aspect ratio.
ResizeToRange
SSDRandomCrop
Randomly crops an image according to:
Liu et al., SSD: Single shot multibox detector.
This preprocessing step defines multiple SSDRandomCropOperations. Only one
operation (chosen at random) is actually performed on an image.
SSDRandomCropFixedAspectRatio
Randomly crops an image to a fixed aspect ratio according to:
Liu et al., SSD: Single shot multibox detector.
Multiple SSDRandomCropFixedAspectRatioOperations are defined by this
preprocessing step. Only one operation (chosen at random) is actually
performed on an image.
SSDRandomCropFixedAspectRatioOperation
Field | Type | Label | Description |
min_object_covered |
float |
optional |
Cropped image must cover at least this fraction of one original bounding
box. |
min_area |
float |
optional |
The area of the cropped image must be within the range of
[min_area, max_area]. |
max_area |
float |
optional |
|
overlap_thresh |
float |
optional |
Cropped box area ratio must be above this threshold to be kept. |
random_coef |
float |
optional |
Probability a crop operation is skipped. |
SSDRandomCropOperation
Field | Type | Label | Description |
min_object_covered |
float |
optional |
Cropped image must cover at least this fraction of one original bounding
box. |
min_aspect_ratio |
float |
optional |
The aspect ratio of the cropped image must be within the range of
[min_aspect_ratio, max_aspect_ratio]. |
max_aspect_ratio |
float |
optional |
|
min_area |
float |
optional |
The area of the cropped image must be within the range of
[min_area, max_area]. |
max_area |
float |
optional |
|
overlap_thresh |
float |
optional |
Cropped box area ratio must be above this threshold to be kept. |
random_coef |
float |
optional |
Probability a crop operation is skipped. |
SSDRandomCropPad
Randomly crops and pads an image according to:
Liu et al., SSD: Single shot multibox detector.
This preprocessing step defines multiple SSDRandomCropPadOperations. Only one
operation (chosen at random) is actually performed on an image.
SSDRandomCropPadFixedAspectRatio
Randomly crops and pads an image to a fixed aspect ratio according to:
Liu et al., SSD: Single shot multibox detector.
Multiple SSDRandomCropPadFixedAspectRatioOperations are defined by this
preprocessing step. Only one operation (chosen at random) is actually
performed on an image.
Field | Type | Label | Description |
operations |
SSDRandomCropPadFixedAspectRatioOperation |
repeated |
|
aspect_ratio |
float |
optional |
Aspect ratio to pad to. This value is used for all crop and pad operations. Default: 1 |
min_padded_size_ratio |
float |
repeated |
Min ratio of padded image height and width to the input image's height and
width. Two entries per operation. |
max_padded_size_ratio |
float |
repeated |
Max ratio of padded image height and width to the input image's height and
width. Two entries per operation. |
SSDRandomCropPadFixedAspectRatioOperation
Field | Type | Label | Description |
min_object_covered |
float |
optional |
Cropped image must cover at least this fraction of one original bounding
box. |
min_aspect_ratio |
float |
optional |
The aspect ratio of the cropped image must be within the range of
[min_aspect_ratio, max_aspect_ratio]. |
max_aspect_ratio |
float |
optional |
|
min_area |
float |
optional |
The area of the cropped image must be within the range of
[min_area, max_area]. |
max_area |
float |
optional |
|
overlap_thresh |
float |
optional |
Cropped box area ratio must be above this threshold to be kept. |
random_coef |
float |
optional |
Probability a crop operation is skipped. |
SSDRandomCropPadOperation
Field | Type | Label | Description |
min_object_covered |
float |
optional |
Cropped image must cover at least this fraction of one original bounding
box. |
min_aspect_ratio |
float |
optional |
The aspect ratio of the cropped image must be within the range of
[min_aspect_ratio, max_aspect_ratio]. |
max_aspect_ratio |
float |
optional |
|
min_area |
float |
optional |
The area of the cropped image must be within the range of
[min_area, max_area]. |
max_area |
float |
optional |
|
overlap_thresh |
float |
optional |
Cropped box area ratio must be above this threshold to be kept. |
random_coef |
float |
optional |
Probability a crop operation is skipped. |
min_padded_size_ratio |
float |
repeated |
Min ratio of padded image height and width to the input image's height and
width. Two entries per operation. |
max_padded_size_ratio |
float |
repeated |
Max ratio of padded image height and width to the input image's height and
width. Two entries per operation. |
pad_color_r |
float |
optional |
Padding color. |
pad_color_g |
float |
optional |
|
pad_color_b |
float |
optional |
|
ScaleBoxesToPixelCoordinates
Scales boxes from normalized coordinates to pixel coordinates.
SubtractChannelMean
Normalizes an image by subtracting a mean from each channel.
Field | Type | Label | Description |
means |
float |
repeated |
The mean to subtract from each channel. Should be of same dimension of
channels in the input image. |
TemporalCenterCrop
temporal center crop
Field | Type | Label | Description |
sample_duration |
int32 |
optional |
crop length Default: 16 |
sample_stride |
int32 |
optional |
downsampling stride after temporal crop
output length is sample_duration/sample_stride Default: 1 |
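A small sketch of how sample_duration and sample_stride determine the selected frames: take a centered window of sample_duration frames, then keep every sample_stride-th frame. The function name and the handling of clips shorter than sample_duration (simply truncating, with no padding or looping) are assumptions.

```python
from typing import List


def temporal_center_crop_indices(num_frames: int, sample_duration: int = 16,
                                 sample_stride: int = 1) -> List[int]:
    """Returns the frame indices kept by a centered temporal crop."""
    # Center a window of sample_duration frames inside the clip.
    start = max(0, (num_frames - sample_duration) // 2)
    end = min(num_frames, start + sample_duration)
    # Downsample inside the window; length is sample_duration / sample_stride.
    return list(range(start, end, sample_stride))
```

For a 32-frame clip with sample_duration 16 and sample_stride 2, this keeps 8 frames starting at index 8.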
TemporalRandomCrop
temporal random crop
Field | Type | Label | Description |
sample_duration |
int32 |
optional |
crop length Default: 16 |
sample_stride |
int32 |
optional |
downsampling stride after temporal crop
output length is sample_duration/sample_stride Default: 1 |
VggPreprocessing
For classification models, a fixed preprocessing pipeline is provided for each
backbone. To add more preprocessing steps, use the ones above by adding them
to data_augmentation_options.
Field | Type | Label | Description |
output_width |
int32 |
optional |
Default: 224 |
output_height |
int32 |
optional |
Default: 224 |
is_training |
bool |
optional |
Default: false |
resize_side_min |
int32 |
optional |
Default: 256 |
resize_side_max |
int32 |
optional |
Default: 512 |
VideoSpatialCenterCrop
Field | Type | Label | Description |
crop_size |
int32 |
repeated |
crop size |
VideoSpatialRandomCrop
Field | Type | Label | Description |
crop_size |
int32 |
repeated |
crop size |
RandomJitterAspectRatio.Method
Name | Number | Description |
AREA |
1 |
|
BICUBIC |
2 |
|
BILINEAR |
3 |
|
NEAREST_NEIGHBOR |
4 |
|
ResizeImageWithFixedHeight.Method
Name | Number | Description |
AREA |
1 |
|
BICUBIC |
2 |
|
BILINEAR |
3 |
|
NEAREST_NEIGHBOR |
4 |
|
easy_vision/python/protos/rc3d.proto
RC3D
Configuration for RegionProposal models. Only objectness is predicted;
multiclass is not supported.
Field | Type | Label | Description |
backbone |
Backbone |
required |
backbone config |
trpn_head |
TRPNHead |
required |
rpn head config |
region_feature_extractor |
Block |
optional |
Block that reuses part of the backbone to extract box features in the second stage |
trcnn_head |
TRCNNHead |
required |
rcnn head config |
easy_vision/python/protos/rcnn_head.proto
MRCNNHead
Field | Type | Label | Description |
input_layer |
string |
repeated |
|
num_classes |
int32 |
required |
|
initial_crop_size |
int32 |
optional |
Output size (width and height are set to be the same) of the initial
bilinear interpolation based cropping during ROI pooling. |
maxpool_kernel_size |
int32 |
optional |
Kernel size of the max pool op on the cropped feature map during
ROI pooling. |
maxpool_stride |
int32 |
optional |
Stride of the max pool op on the cropped feature map during ROI pooling. |
third_stage_mask_predictor |
MaskPredictor |
optional |
Hyperparameters for the third stage mask predictor. |
second_stage_mask_loss_weight |
float |
optional |
Second stage instance mask loss weight. Default: 1 |
RCNNHead
Field | Type | Label | Description |
input_layer |
string |
repeated |
|
num_classes |
int32 |
required |
|
initial_crop_size |
int32 |
optional |
Output size (width and height are set to be the same) of the initial
bilinear interpolation based cropping during ROI pooling. |
maxpool_kernel_size |
int32 |
optional |
Kernel size of the max pool op on the cropped feature map during
ROI pooling. |
maxpool_stride |
int32 |
optional |
Stride of the max pool op on the cropped feature map during ROI pooling. |
second_stage_box_predictor |
BoxPredictor |
optional |
Hyperparameters for the second stage box predictor. If box predictor type
is set to rfcn_box_predictor, a R-FCN model is constructed, otherwise a
Faster R-CNN model is constructed. |
nms_config |
BatchNonMaxSuppression |
required |
|
second_stage_batch_size |
int32 |
optional |
The batch size per image used for computing the classification and refined
location loss of the box classifier.
Note that this field is ignored if `hard_example_miner` is configured. Default: 128 |
second_stage_balance_fraction |
float |
optional |
Fraction of positive examples to use per image for the box classifier. Default: 0.25 |
hard_example_miner |
HardExampleMiner |
optional |
|
second_stage_localization_loss_weight |
float |
optional |
Second stage RCNN localization loss weight Default: 1 |
second_stage_classification_loss_weight |
float |
optional |
Second stage RCNN classification loss weight Default: 1 |
output_roi_features |
bool |
optional |
Output detection roi features or not Default: false |
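The interaction between second_stage_batch_size and second_stage_balance_fraction can be sketched as a balanced sampling budget. This is an illustration under the assumption that the fraction caps the number of positives and negatives fill the remainder; the function name is hypothetical.

```python
from typing import Tuple


def second_stage_sample_counts(num_positives: int, num_negatives: int,
                               batch_size: int = 128,
                               positive_fraction: float = 0.25) -> Tuple[int, int]:
    """Returns (positives, negatives) sampled per image for the box classifier."""
    # Positives are capped at batch_size * positive_fraction.
    max_positives = int(batch_size * positive_fraction)
    sampled_pos = min(num_positives, max_positives)
    # Negatives fill whatever remains of the batch.
    sampled_neg = min(num_negatives, batch_size - sampled_pos)
    return sampled_pos, sampled_neg
```

With the defaults, an image with plenty of proposals contributes at most 32 positives and 96 negatives.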
easy_vision/python/protos/region_similarity_calculator.proto
IoaSimilarity
Configuration for intersection-over-area (IOA) similarity calculator.
IouSimilarity
Configuration for intersection-over-union (IOU) similarity calculator.
NegSqDistSimilarity
Configuration for negative squared distance similarity calculator.
RegionSimilarityCalculator
Configuration proto for region similarity calculators. See
core/region_similarity_calculator.py for details.
easy_vision/python/protos/resize_method.proto
ResizeMethod
Enumeration type for image resizing methods provided in TensorFlow.
ResizeMethod.Enum
Name | Number | Description |
AREA |
1 |
Corresponds to tf.image.ResizeMethod.AREA |
BICUBIC |
2 |
Corresponds to tf.image.ResizeMethod.BICUBIC |
BILINEAR |
3 |
Corresponds to tf.image.ResizeMethod.BILINEAR |
NEAREST_NEIGHBOR |
4 |
Corresponds to tf.image.ResizeMethod.NEAREST_NEIGHBOR |
easy_vision/python/protos/rnn.proto
BasicLSTM
Field | Type | Label | Description |
num_units |
int32 |
optional |
Hidden unit size. Default: 256 |
forget_bias |
float |
optional |
Forget bias for BasicLSTMCell. Default: 1 |
dropout |
float |
optional |
Dropout rate (not keep_prob) Default: 0.2 |
GRU
Field | Type | Label | Description |
num_units |
int32 |
optional |
Hidden unit size. Default: 256 |
dropout |
float |
optional |
Dropout rate (not keep_prob) Default: 0.2 |
LayerNormBasicLSTM
Field | Type | Label | Description |
num_units |
int32 |
optional |
Hidden unit size. Default: 256 |
forget_bias |
float |
optional |
Forget bias for BasicLSTMCell. Default: 1 |
dropout |
float |
optional |
Dropout rate (not keep_prob) Default: 0.2 |
NAS
Field | Type | Label | Description |
num_units |
int32 |
optional |
Hidden unit size. Default: 256 |
dropout |
float |
optional |
Dropout rate (not keep_prob) Default: 0.2 |
easy_vision/python/protos/rpn_head.proto
RPNHead
Field | Type | Label | Description |
input_layer |
string |
repeated |
|
box_predictor |
BoxPredictor |
required |
|
first_stage_minibatch_size |
int32 |
optional |
The batch size to use for computing the first stage objectness and
location losses. Default: 256 |
first_stage_positive_balance_fraction |
float |
optional |
Fraction of positive examples per image for the RPN. Default: 0.5 |
first_stage_nms_score_threshold |
float |
optional |
Non max suppression score threshold applied to first stage RPN proposals. Default: 0 |
first_stage_nms_iou_threshold |
float |
optional |
Non max suppression IOU threshold applied to first stage RPN proposals. Default: 0.7 |
first_stage_max_proposals |
int32 |
optional |
Maximum number of RPN proposals retained after first stage postprocessing. Default: 300 |
first_stage_anchor_generator |
AnchorGenerator |
optional |
Anchor generator to compute RPN anchors. |
first_stage_localization_loss_weight |
float |
optional |
First stage RPN localization loss weight. Default: 1 |
first_stage_objectness_loss_weight |
float |
optional |
First stage RPN objectness loss weight. Default: 1 |
rpn_min_size |
int32 |
optional |
At the postprocessing stage, filters RPN output boxes, dropping all boxes whose width or height is less than rpn_min_size. Default: 0 |
pre_nms_topn |
int32 |
optional |
Number of top-scoring boxes kept before NMS; only valid when larger than 0.
It should be set large enough, otherwise
it may hurt performance. Default: -1 |
boundary_threshold |
int32 |
optional |
Removes RPN anchors that go outside the image by boundary_threshold pixels.
Set to -1 or a large value, e.g. 100000, to disable anchor pruning. Default: 0 |
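One plausible reading of boundary_threshold is sketched below: an anchor is dropped when any edge extends past the image border by more than boundary_threshold pixels, and a negative value disables pruning. The function name, the (ymin, xmin, ymax, xmax) pixel layout and the exact comparison are assumptions, not the library's code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (ymin, xmin, ymax, xmax)


def prune_outside_anchors(anchors: Sequence[Box], image_height: int,
                          image_width: int, boundary_threshold: int = 0) -> List[Box]:
    """Keeps anchors lying within the image, allowing boundary_threshold slack."""
    if boundary_threshold < 0:
        return list(anchors)  # -1 disables pruning
    slack = boundary_threshold
    return [a for a in anchors
            if a[0] >= -slack and a[1] >= -slack
            and a[2] <= image_height + slack and a[3] <= image_width + slack]
```

A very large threshold such as 100000 has the same practical effect as -1, since no anchor extends that far beyond the image.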
easy_vision/python/protos/seg_decode_head.proto
SegDecoderHead
Field | Type | Label | Description |
weight_decay |
float |
optional |
Default: 0 |
batchnorm_trainable |
bool |
optional |
Default: true |
input_layer |
string |
repeated |
|
use_separable_conv |
bool |
optional |
Default: true |
decoder_depth |
int32 |
required |
|
output_stride |
int32 |
required |
Default: 4 |
num_classes |
int32 |
required |
Default: 5 |
resize_to_original |
bool |
optional |
Whether to convert the predictions to the original shape
during postprocessing. Default: true |
easy_vision/python/protos/simple_rpn.proto
SimpleRPN
Configuration for RegionProposal models. Only objectness is predicted;
multiclass is not supported.
Field | Type | Label | Description |
backbone |
Backbone |
required |
backbone config |
rpn_head |
RPNHead |
required |
rpn head config |
first_stage_localization_loss_weight |
float |
optional |
First stage RPN localization loss weight. Default: 1 |
first_stage_objectness_loss_weight |
float |
optional |
First stage RPN objectness loss weight. Default: 1 |
easy_vision/python/protos/ssd.proto
FPNFeaturemapLayout
Field | Type | Label | Description |
from_layer |
string |
repeated |
from which layer to construct fpn feature map |
layer_depth |
int32 |
optional |
layer depth for all the fpn feature Default: 256 |
extra_conv_layers |
int32 |
optional |
number of layers appended after the pyramid features Default: 0 |
PPNFeaturemapLayout
Field | Type | Label | Description |
from_layer |
string |
repeated |
from which layer to construct ppn feature map |
num_layers |
int32 |
optional |
Default: 6 |
layer_depth |
int32 |
optional |
layer depth for all the fpn feature Default: 1024 |
Ssd
Configuration for Single Shot Detection (SSD) models.
Field | Type | Label | Description |
normalize_method |
Ssd.NormalizeMethod |
optional |
Method to normalize the resized image before feeding it into the backbone. Default: SUBMEAN |
backbone |
Backbone |
required |
Backbone configuration |
ssd_head |
SsdHead |
required |
SSD head configuration |
freeze_batchnorm |
bool |
optional |
Whether to update batch norm parameters during training or not.
When training with a relatively small batch size (e.g. 1), it is
desirable to disable batch norm update and use pretrained batch norm
params.
Note: Some feature extractors are used with canned arg_scopes
(e.g resnet arg scopes). In these cases training behavior of batch norm
variables may depend on both values of `batch_norm_trainable` and
`is_training`.
When canned arg_scopes are used with feature extractors `conv_hyperparams`
will apply only to the additional layers that are added and are outside the
canned arg_scope. Default: false |
inplace_batchnorm_update |
bool |
optional |
Whether to update batch_norm inplace during training. This is required
for batch norm to work correctly on TPUs. When this is false, user must add
a control dependency on tf.GraphKeys.UPDATE_OPS for train/loss op in order
to update the batch norm moving average parameters. Default: false |
SsdFeaturemapLayout
Field | Type | Label | Description |
from_layer |
string |
repeated |
Layers from which to construct the multi-scale feature map;
the number of entries must equal the size of layer_depth |
layer_depth |
int32 |
repeated |
Specify each feature map layer depth |
SsdHead
Field | Type | Label | Description |
num_classes |
int32 |
required |
Number of classes to predict. |
ssd_featuremap_layout |
SsdFeaturemapLayout |
optional |
multi-scale feature map used in original ssd paper https://arxiv.org/abs/1512.02325 |
fpn_featuremap_layout |
FPNFeaturemapLayout |
optional |
use feature pyramid network (https://arxiv.org/abs/1612.03144)
to extract multi-scale feature |
ppn_featuremap_layout |
PPNFeaturemapLayout |
optional |
use Pooling Pyramid network (https://arxiv.org/abs/1807.03284)
to extract multi-scale feature |
depth_multiplier |
float |
optional |
The factor to alter the depth of the channels in the multi-scale feature extraction. Default: 1 |
min_depth |
int32 |
optional |
Minimum number of the channels in the multi-scale feature extraction. Default: 16 |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters that affect the layers of feature extractor added on top
of the base feature extractor. |
box_coder |
BoxCoder |
optional |
Box coder to encode the boxes. |
matcher |
Matcher |
optional |
Matcher to match groundtruth with anchors. |
similarity_calculator |
RegionSimilarityCalculator |
optional |
Region similarity calculator to compute similarity of boxes. |
anchor_generator |
AnchorGenerator |
optional |
Anchor generator to compute anchors. |
box_predictor |
BoxPredictor |
optional |
Box predictor to attach to the features. |
post_processing |
PostProcessing |
optional |
Post processing to apply on the predictions. |
negative_class_weight |
float |
optional |
Classification weight associated with negative anchors.
The weight must be in [0., 1.]. Default: 1 |
normalize_loss_by_num_matches |
bool |
optional |
Whether to normalize the loss by number of groundtruth boxes that match to
the anchors. Default: true |
normalize_loc_loss_by_codesize |
bool |
optional |
Whether to normalize the localization loss by the code size of the box
encodings. This is applied along with other normalization factors. Default: false |
loss |
Loss |
optional |
Loss configuration for training. |
add_summary |
bool |
optional |
Whether to write summaries of training-related information. Default: true |
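One common reading of depth_multiplier and min_depth, borrowed from TensorFlow-style feature extractors, is that each channel count is scaled by the multiplier and then floored at the minimum. The table above does not state the formula, so the sketch below is an assumption; `scaled_depth` is a hypothetical helper.

```python
def scaled_depth(depth, depth_multiplier=1.0, min_depth=16):
    """Scale a channel count and keep it at or above min_depth (assumed rule)."""
    return max(int(depth * depth_multiplier), min_depth)


half_width = scaled_depth(256, depth_multiplier=0.5)  # scaled down
floored = scaled_depth(24, depth_multiplier=0.5)      # hits the min_depth floor
```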
Ssd.NormalizeMethod
Name | Number | Description |
SUBMEAN |
0 |
|
DIVIDE_255 |
1 |
|
DIVIDE_255_MULTIPLY_2_MINUS_1 |
2 |
|
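The enum values carry no descriptions, so the sketch below infers their meaning from the names alone: subtract a mean, divide by 255, or divide by 255 and map to [-1, 1]. The function `normalize_pixel` and the `mean` argument are hypothetical; where the mean used by SUBMEAN comes from is not documented here.

```python
def normalize_pixel(value, method, mean=0.0):
    """Apply one of the Ssd.NormalizeMethod variants to a single pixel value."""
    if method == "SUBMEAN":
        return value - mean
    if method == "DIVIDE_255":
        return value / 255.0
    if method == "DIVIDE_255_MULTIPLY_2_MINUS_1":
        return value / 255.0 * 2.0 - 1.0
    raise ValueError("unknown normalize method: %s" % method)


unit_range = normalize_pixel(255, "DIVIDE_255")
signed_low = normalize_pixel(0, "DIVIDE_255_MULTIPLY_2_MINUS_1")
```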
easy_vision/python/protos/string_int_label_map.proto
Top
Message to store the mapping from class label strings to class id. Datasets
use string labels to represent classes while the object detection framework
works with class ids. This message maps them so they can be converted back
and forth as needed.
StringIntLabelMap
StringIntLabelMapItem
Field | Type | Label | Description |
name |
string |
optional |
String name. The most common practice is to set this to a MID or synsets
id. |
id |
int32 |
optional |
Integer id that maps to the string name above. Label ids should start from
1. |
display_name |
string |
optional |
Human readable string label. |
ignore_recog |
bool |
optional |
Whether recognition is ignored for this label. Default: false |
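The conversion between string labels and class ids described above can be sketched with two lookup tables. This assumes the label map items have been loaded into plain dicts; `build_label_lookups` and the example labels are hypothetical.

```python
def build_label_lookups(items):
    """Build name -> id and id -> display name tables from label map items."""
    name_to_id = {}
    id_to_display = {}
    for item in items:
        if item["id"] < 1:
            # The field description says label ids should start from 1.
            raise ValueError("label id %d is below 1" % item["id"])
        name_to_id[item["name"]] = item["id"]
        # Fall back to the raw name when no display_name is given.
        id_to_display[item["id"]] = item.get("display_name", item["name"])
    return name_to_id, id_to_display


items = [
    {"name": "cat", "id": 1, "display_name": "Cat"},
    {"name": "dog", "id": 2},
]
name_to_id, id_to_display = build_label_lookups(items)
```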
easy_vision/python/protos/text_encoder.proto
Top
CNNLineEncoder
Field | Type | Label | Description |
cnn_name |
string |
optional |
CNN class name. If not specified, the encoder degenerates to a LineEncoder |
input_layer |
string |
optional |
Name of the CNN output feature. Defaults to the last layer of the CNN |
norm_type |
NormType |
optional |
normalization layer type Default: BATCH |
batchnorm_trainable |
bool |
optional |
batchnorm trainable or not Default: true |
weight_decay |
float |
optional |
weight_decay for l2 regularization Default: 0.0001 |
CNNSpatialEncoder
Field | Type | Label | Description |
cnn_name |
string |
optional |
CNN class name. If not specified, the encoder degenerates to a SpatialEncoder |
input_layer |
string |
optional |
Name of the CNN output feature. Defaults to the last layer of the CNN |
norm_type |
NormType |
optional |
normalization layer type Default: BATCH |
batchnorm_trainable |
bool |
optional |
batchnorm trainable or not Default: true |
weight_decay |
float |
optional |
weight_decay for l2 regularization Default: 0.0001 |
CRNNEncoder
Field | Type | Label | Description |
cnn_name |
string |
optional |
CNN class name. If not specified, the encoder degenerates to a RNNEncoder |
input_layer |
string |
optional |
Name of the CNN output feature. Defaults to the last layer of the CNN |
norm_type |
NormType |
optional |
normalization layer type Default: BATCH |
batchnorm_trainable |
bool |
optional |
batchnorm trainable or not Default: true |
weight_decay |
float |
optional |
weight_decay for l2 regularization Default: 0.0001 |
num_layers |
int32 |
optional |
rnn encoder depth Default: 2 |
basic_lstm |
BasicLSTM |
optional |
|
gru |
GRU |
optional |
|
layer_norm_basic_lstm |
LayerNormBasicLSTM |
optional |
|
nas |
NAS |
optional |
|
encoder_type |
CRNNEncoder.RnnEncoderType |
optional |
Either uni or bi. For bi, num_encoder_layers/2 bi-directional layers are built. Default: UNI |
residual |
bool |
optional |
whether to add residual connections Default: true |
Field | Type | Label | Description |
num_layers |
int32 |
required |
number of encoder layers |
hidden_size |
int32 |
required |
hidden units size |
num_heads |
int32 |
required |
number of attention heads |
filter_size |
int32 |
required |
hidden size of FeedForwardLayer |
pooling_rate |
int32 |
optional |
pooling rate of input's width Default: 4 |
layer_postprocess_dropout |
float |
optional |
postprocess layer dropout Default: 0.1 |
attention_dropout |
float |
optional |
attention layer dropout Default: 0.1 |
relu_dropout |
float |
optional |
relu layer dropout Default: 0.1 |
CRNNEncoder.RnnEncoderType
Name | Number | Description |
UNI |
1 |
|
BI |
2 |
|
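The encoder_type description says that BI builds half as many bi-directional layers. A minimal sketch of that rule follows, assuming the layer count referred to is num_layers and that the division rounds down; `rnn_layer_plan` is a hypothetical helper.

```python
def rnn_layer_plan(num_layers, encoder_type):
    """Return how many stacked layers to build for UNI or BI encoders."""
    if encoder_type == "UNI":
        return {"bidirectional": False, "stacked_layers": num_layers}
    if encoder_type == "BI":
        # Each bi-directional layer holds a forward and a backward cell,
        # so half as many stacked layers are built (integer division assumed).
        return {"bidirectional": True, "stacked_layers": num_layers // 2}
    raise ValueError("unknown encoder type: %s" % encoder_type)


plan = rnn_layer_plan(2, "BI")
```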
easy_vision/python/protos/text_end2end.proto
Top
FixedHeightFeatureGather
Field | Type | Label | Description |
input_layer |
string |
required |
|
height |
int32 |
required |
feature output with fixed height Default: 8 |
max_width |
int32 |
required |
feature filtered with max width Default: 300 |
visualize_height |
int32 |
required |
roi visualize images with fixed height Default: 32 |
visualize_width |
int32 |
required |
roi visualize images with fixed width Default: 100 |
num_buckets |
int32 |
optional |
number of buckets Default: 1 |
subsample_batch_size |
int32 |
optional |
batch size of sampled text line when training |
FixedHeightPyramidFeatureGather
Field | Type | Label | Description |
input_layer |
string |
repeated |
pyramid input features |
height |
int32 |
repeated |
Pyramid ROI feature heights. The length of height must equal the length
of input_layer; the last value of height is the output height |
max_width |
int32 |
required |
feature filtered with max width Default: 300 |
visualize_height |
int32 |
required |
roi visualize images with fixed height Default: 32 |
visualize_width |
int32 |
required |
roi visualize images with fixed width Default: 100 |
num_buckets |
int32 |
optional |
number of buckets Default: 1 |
subsample_batch_size |
int32 |
optional |
batch size of sampled text line when training |
norm_type |
NormType |
optional |
normalization layer type Default: BATCH |
batchnorm_trainable |
bool |
optional |
batchnorm trainable or not Default: true |
weight_decay |
float |
optional |
weight_decay for l2 regularization Default: 0.0001 |
use_se |
bool |
optional |
use squeeze and excitation layer or not Default: true |
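For FixedHeightPyramidFeatureGather, the repeated input_layer and height fields must have the same length, and the last height is the output height. The sketch below checks that on a plain dict; `check_pyramid_gather` and the layer names are hypothetical.

```python
def check_pyramid_gather(config):
    """Validate the input_layer/height pairing and report the output height."""
    layers = config.get("input_layer", [])
    heights = config.get("height", [])
    if not layers or len(layers) != len(heights):
        raise ValueError("height must have one entry per input_layer")
    return {
        "per_level": list(zip(layers, heights)),
        "output_height": heights[-1],  # the last height is the output height
    }


# Hypothetical pyramid level names, for illustration only.
info = check_pyramid_gather({"input_layer": ["p2", "p3", "p4"], "height": [32, 16, 8]})
```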
FixedSizeFeatureGather
Field | Type | Label | Description |
input_layer |
string |
required |
|
height |
int32 |
optional |
feature output with fixed height Default: 8 |
width |
int32 |
optional |
feature output with fixed width Default: 25 |
visualize_height |
int32 |
optional |
roi visualize images with fixed height Default: 32 |
visualize_width |
int32 |
optional |
roi visualize images with fixed width Default: 100 |
subsample_batch_size |
int32 |
optional |
batch size of sampled text line when training |
TextEnd2End
easy_vision/python/protos/text_head.proto
Top
TextAttentionHead
Field | Type | Label | Description |
input_layer |
string |
optional |
input layer |
crnn_encoder |
CRNNEncoder |
optional |
|
cnn_line_encoder |
CNNLineEncoder |
optional |
|
cnn_spatial_encoder |
CNNSpatialEncoder |
optional |
|
attention_decoder |
RNNDecoderWithAttention |
required |
rnn attention decoder |
time_major |
bool |
optional |
whether to use time-major mode,
if time major, features must be [time, batch, ...] style Default: true |
TextCTCHead
TextKeypointHead
Field | Type | Label | Description |
input_layer |
string |
repeated |
input layer |
keypoint_predictor |
KeypointPredictor |
required |
keypoints predictor name |
initial_crop_size |
int32 |
required |
Output size (width and height are set to be the same) of the initial
bilinear interpolation based cropping during ROI pooling. |
maxpool_kernel_size |
int32 |
required |
Kernel size of the max pool op on the cropped feature map during
ROI pooling. |
maxpool_stride |
int32 |
required |
Stride of the max pool op on the cropped feature map during ROI pooling. |
num_keypoints |
int32 |
optional |
number of key points Default: 4 |
predict_direction |
bool |
optional |
predict text direction or not Default: false |
direction_trainable |
bool |
optional |
train text direction predictor or not Default: false |
unified_direction |
bool |
optional |
unify all texts direction when inference or evaluation Default: false |
smart_unified_direction |
bool |
optional |
unify almost all texts direction (except height > 2 * width)
when inference or evaluation Default: false |
third_stage_batch_size |
int32 |
optional |
The batch size per image used for computing the classification and refined
location loss of the box classifier. Default: 128 |
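The three ROI pooling fields interact: the crop is first resized to initial_crop_size, then max-pooled. The sketch below computes the resulting spatial size, assuming the pooling uses no padding (often called VALID); the padding mode is not stated in this documentation, and `roi_pool_output_size` is a hypothetical helper.

```python
def roi_pool_output_size(initial_crop_size, maxpool_kernel_size, maxpool_stride):
    """Spatial size after max pooling a square crop without padding (assumed)."""
    if initial_crop_size < maxpool_kernel_size:
        raise ValueError("crop is smaller than the pooling kernel")
    return (initial_crop_size - maxpool_kernel_size) // maxpool_stride + 1


size = roi_pool_output_size(14, 2, 2)
```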
TextRectificationHead
Field | Type | Label | Description |
input_layer |
string |
optional |
input layer |
keypoint_predictor |
KeypointPredictor |
required |
keypoints predictor name |
num_keypoints |
int32 |
optional |
number of key points Default: 4 |
predict_direction |
bool |
optional |
predict text direction or not Default: true |
direction_trainable |
bool |
optional |
train text direction predictor or not Default: true |
TextTransformerHead
easy_vision/python/protos/text_krcnn.proto
Top
TextKRCNN
Field | Type | Label | Description |
backbone |
Backbone |
required |
backbone config |
fpn |
FPN |
optional |
FPN |
rpn_head |
RPNHead |
optional |
rpn head config |
rcnn_head |
RCNNHead |
optional |
rcnn head config |
fcn_head |
RCNNHead |
optional |
fcn head config |
keypoint_head |
TextKeypointHead |
optional |
keypoint head config |
easy_vision/python/protos/text_recognition.proto
Top
TextRecognition
easy_vision/python/protos/text_rectification.proto
Top
TextRectification
Field | Type | Label | Description |
backbone |
Backbone |
required |
backbone config |
rectification_head |
TextRectificationHead |
optional |
text rectification head config |
easy_vision/python/protos/train.proto
Top
TrainConfig
Message for configuring DetectionModel training jobs (train.py).
Next id: 25
optimizer options
Field | Type | Label | Description |
optimizer |
Optimizer |
optional |
Optimizer used to train the DetectionModel. |
gradient_clipping_by_norm |
float |
optional |
If greater than 0, clips gradients by this value. Default: 0 |
bias_grad_multiplier |
float |
optional |
If greater than 0, multiplies the gradient of bias variables by this
amount. Default: 0 |
regularization_loss |
float |
optional |
Weight of the regularization loss (also called weight_decay) added to `total_loss`. Default: 0.0001 |
num_steps |
uint32 |
optional |
Number of steps to train the CVModel for. If 0, will train the model
indefinitely. Default: 0 |
fine_tune_checkpoint |
string |
optional |
Checkpoint to restore variables from. Typically used to load feature
extractor variables trained outside of object detection. |
fine_tune_checkpoint_type |
string |
optional |
Type of checkpoint to restore variables from, e.g. 'classification' or
'detection'. Provides extensibility to from_detection_checkpoint. |
fine_tune_ckpt_var_map |
string |
optional |
|
sync_replicas |
bool |
optional |
Whether to synchronize replicas during training.
If so, a SyncReplicateOptimizer is built. Default: false |
startup_delay_steps |
float |
optional |
Number of training steps between replica startup.
This flag must be set to 0 if sync_replicas is set to true. Default: 15 |
replicas_to_aggregate |
int32 |
optional |
Number of replicas to aggregate before making parameter updates. Default: 1 |
num_worker_replicas |
int32 |
optional |
Number of worker replicas Default: 1 |
model_dir |
string |
required |
train model save dir |
save_checkpoints_steps |
uint32 |
optional |
Step interval for saving checkpoint Default: 5000 |
save_summary_steps |
uint32 |
optional |
Save summaries every this many steps. Default: 100 |
log_step_count_steps |
uint32 |
optional |
How often, in steps, the global step/sec and the loss are logged during training. Default: 100 |
summary_model_vars |
bool |
optional |
summary model variables or not Default: false |
train_distribute |
string |
optional |
DistributionStrategy. Available values are 'mirrored', 'collective', and 'ess':
- mirrored: MirroredStrategy, single machine and multiple devices;
- collective: CollectiveAllReduceStrategy, multiple machines and multiple devices. |
num_gpus_per_worker |
int32 |
optional |
Number of gpus per machine Default: 1 |
write_graph |
bool |
optional |
Whether to write the meta graph to graph.pbtxt, the summary, and the checkpoint. Default: true |
is_profiling |
bool |
optional |
profiling or not Default: false |
force_restore_shape_compatible |
bool |
optional |
if variable shape is incompatible, clip or pad variables in checkpoint Default: false |
summary_outputs |
bool |
optional |
summary output tensor or not Default: false |
use_unified_memory |
bool |
optional |
If true, uses CUDA unified memory for memory allocations. Default: false |
sub_learning_rate |
float |
optional |
Sub learning rate: a coefficient that controls the learning rate of a sub-part of the parameters. Default: 0 |
iter_size_per_step |
int32 |
optional |
Number of iterations over which gradients are accumulated per step. Default: 1 |
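The gradient_clipping_by_norm field clips gradients when it is greater than 0. The sketch below shows norm clipping for a single gradient vector in plain Python, following the usual rule of rescaling to the clip norm when the L2 norm exceeds it. Whether easy_vision clips per tensor or by global norm is not stated here, so treat this as an illustration only; `clip_by_norm` is a hypothetical helper.

```python
import math


def clip_by_norm(grad, clip_norm):
    """Rescale grad so its L2 norm does not exceed clip_norm; 0 disables clipping."""
    if clip_norm <= 0:
        return list(grad)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    scale = clip_norm / norm
    return [g * scale for g in grad]


clipped = clip_by_norm([3.0, 4.0], 1.0)  # norm 5.0 is rescaled to norm 1.0
```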
easy_vision/python/protos/trcnn_head.proto
Top
TRCNNHead
Field | Type | Label | Description |
input_layer |
string |
repeated |
|
num_classes |
int32 |
required |
|
initial_crop_size |
int32 |
optional |
Output size (width and height are set to be the same) of the initial
bilinear interpolation based cropping during ROI pooling. |
maxpool_kernel_size |
int32 |
optional |
Kernel size of the max pool op on the cropped feature map during
ROI pooling. |
maxpool_stride |
int32 |
optional |
Stride of the max pool op on the cropped feature map during ROI pooling. |
second_stage_box_predictor |
BoxPredictor |
optional |
Hyperparameters for the second stage box predictor. If box predictor type
is set to rfcn_box_predictor, a R-FCN model is constructed, otherwise a
Faster R-CNN model is constructed. |
nms_config |
BatchNonMaxSuppression |
required |
|
second_stage_batch_size |
int32 |
optional |
The batch size per image used for computing the classification and refined
location loss of the box classifier.
Note that this field is ignored if `hard_example_miner` is configured. Default: 128 |
second_stage_balance_fraction |
float |
optional |
Fraction of positive examples to use per image for the box classifier. Default: 0.25 |
hard_example_miner |
HardExampleMiner |
optional |
|
second_stage_localization_loss_weight |
float |
optional |
Second stage RCNN localization loss weight Default: 1 |
second_stage_classification_loss_weight |
float |
optional |
Second stage RCNN classification loss weight Default: 1 |
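The second_stage_batch_size and second_stage_balance_fraction fields together bound how many positive and negative examples are sampled per image. The sketch below assumes the common balanced-sampling rule: cap positives at the fraction of the batch and fill the remainder with negatives. `sample_counts` is a hypothetical helper, not the library's sampler.

```python
def sample_counts(num_pos_available, num_neg_available,
                  batch_size=128, positive_fraction=0.25):
    """Return (positives, negatives) sampled for one image under the assumed rule."""
    max_pos = int(batch_size * positive_fraction)
    num_pos = min(num_pos_available, max_pos)
    # Negatives fill whatever room the positives leave in the batch.
    num_neg = min(num_neg_available, batch_size - num_pos)
    return num_pos, num_neg


counts = sample_counts(100, 1000)
```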
easy_vision/python/protos/trpn_head.proto
Top
TRPNHead
Field | Type | Label | Description |
input_layer |
string |
repeated |
|
box_predictor |
BoxPredictor |
required |
|
first_stage_minibatch_size |
int32 |
optional |
The batch size to use for computing the first stage objectness and
location losses. Default: 256 |
first_stage_positive_balance_fraction |
float |
optional |
Fraction of positive examples per image for the RPN. Default: 0.5 |
first_stage_nms_score_threshold |
float |
optional |
Non max suppression score threshold applied to first stage RPN proposals. Default: 0 |
first_stage_nms_iou_threshold |
float |
optional |
Non max suppression IOU threshold applied to first stage RPN proposals. Default: 0.7 |
first_stage_max_proposals |
int32 |
optional |
Maximum number of RPN proposals retained after first stage postprocessing. Default: 300 |
first_stage_anchor_generator |
AnchorGenerator |
optional |
Anchor generator to compute RPN anchors. |
first_stage_localization_loss_weight |
float |
optional |
First stage RPN localization loss weight. Default: 1 |
first_stage_objectness_loss_weight |
float |
optional |
First stage RPN objectness loss weight. Default: 1 |
rpn_min_size |
int32 |
optional |
At the postprocessing stage, filters RPN output boxes: drops every box whose width or height is less than rpn_min_size. Default: 0 |
pre_nms_topn |
int32 |
optional |
Number of top-scoring boxes kept before NMS; only takes effect when larger than 0.
It should be set large enough, otherwise
it may hurt performance. Default: -1 |
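The first-stage proposal fields can be read as a filtering pipeline: drop small boxes, keep the top-scoring candidates, then apply non-max suppression up to a proposal limit. The sketch below is a plain-Python illustration under those assumptions, with boxes given as (ymin, xmin, ymax, xmax). The order of the steps and the use of >= for the score threshold are assumptions, and `filter_proposals` is a hypothetical helper.

```python
def iou(a, b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_proposals(boxes, scores, score_threshold=0.0, iou_threshold=0.7,
                     max_proposals=300, rpn_min_size=0, pre_nms_topn=-1):
    """Return the (score, box) pairs kept after size filtering and greedy NMS."""
    candidates = [
        (s, b) for s, b in zip(scores, boxes)
        if s >= score_threshold
        and (b[2] - b[0]) >= rpn_min_size
        and (b[3] - b[1]) >= rpn_min_size
    ]
    candidates.sort(key=lambda sb: sb[0], reverse=True)
    if pre_nms_topn > 0:
        candidates = candidates[:pre_nms_topn]
    kept = []
    for score, box in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
            if len(kept) == max_proposals:
                break
    return kept


boxes = [(0, 0, 10, 10), (0, 1, 10, 11), (20, 20, 30, 30), (0, 0, 1, 1)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = filter_proposals(boxes, scores, rpn_min_size=2)
```

In this example the second box overlaps the first with an IoU above 0.7 and is suppressed, and the 1x1 box is removed by the rpn_min_size filter.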
easy_vision/python/protos/user_defined_param.proto
Top
UserDefinedParam
Field | Type | Label | Description |
name |
string |
required |
|
int64_value |
int64 |
optional |
|
int32_value |
int32 |
optional |
|
uint64_value |
uint64 |
optional |
|
uint32_value |
uint32 |
optional |
|
float_value |
float |
optional |
|
bool_value |
bool |
optional |
|
string_value |
string |
optional |
|
UserDefinedParams
Field | Type | Label | Description |
param |
UserDefinedParam |
repeated |
parameter with type float, int64, uint64, bool, string |
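A list of UserDefinedParam entries can be flattened into a name-to-value mapping. The sketch below works on plain dicts and assumes that exactly one of the typed value fields is set per parameter, which this documentation does not state explicitly; `params_to_dict` is a hypothetical helper.

```python
VALUE_FIELDS = (
    "int64_value", "int32_value", "uint64_value", "uint32_value",
    "float_value", "bool_value", "string_value",
)


def params_to_dict(params):
    """Map each parameter name to whichever typed value field it carries."""
    result = {}
    for param in params:
        set_fields = [f for f in VALUE_FIELDS if f in param]
        if len(set_fields) != 1:
            raise ValueError(
                "parameter %r must set exactly one value field" % param.get("name")
            )
        result[param["name"]] = param[set_fields[0]]
    return result


# Hypothetical parameter names, for illustration only.
values = params_to_dict([
    {"name": "lr_scale", "float_value": 0.5},
    {"name": "use_aug", "bool_value": True},
])
```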
easy_vision/python/protos/video_classification.proto
Top
VideoClassificationModel
Field | Type | Label | Description |
input_width |
int32 |
optional |
Input width and height. If not set, the default input size is used instead |
input_height |
int32 |
optional |
|
backbone |
Backbone |
required |
Backbone configuration |
num_classes |
int32 |
required |
Number of classes |
loss |
ClassificationLoss |
required |
Loss configuration for training |
preprocessing_method |
string |
optional |
Preprocessing method name, if not set, use the corresponding method for
the backbone |
add_summary |
bool |
optional |
Whether to summary training related info Default: true |
label_id_offset |
int32 |
optional |
Label id offset, subtracted from the groundtruth class
when calculating the loss and during evaluation. Default: 0 |
class_specific_evaluation |
bool |
optional |
Whether to add class-specific evaluation Default: false |
modal |
string |
optional |
Model input modality: 'rgb', 'flow', or 'rgb+flow'. Default: rgb |
easy_vision/python/protos/yolo.proto
Top
YOLO
Configuration for YOLO models.
Field | Type | Label | Description |
backbone |
Backbone |
required |
Backbone configuration |
yolo_head |
YOLOHead |
required |
YOLO head configuration |
YOLOFeaturemapLayout
Field | Type | Label | Description |
from_layer |
string |
repeated |
Layers from which to construct the multi-scale feature map |
use_pan |
bool |
optional |
use path aggregation network structure or not Default: false |
use_spp |
bool |
optional |
use spatial pyramid pooling structure or not Default: false |
use_sam |
bool |
optional |
use convolutional spatial attention module or not Default: false |
fpn_shrink_channel_before_fusion |
bool |
optional |
in top_down branch, shrink feature channels to
half of original channels before feature fusion. Default: false |
fixed_features_output_dim |
int32 |
optional |
Makes the YOLO head FPN transform all output feature maps to the same number of channels. 0 means no transformation. Default: 0 |
YOLOHead
Field | Type | Label | Description |
num_classes |
int32 |
required |
Number of classes to predict. |
yolo_featuremap_layout |
YOLOFeaturemapLayout |
required |
YOLO featuremap definition |
conv_hyperparams |
Hyperparams |
optional |
Hyperparameters that affect the layers of feature extractor added on top
of the base feature extractor. |
box_coder |
BoxCoder |
repeated |
Box coder to encode the boxes, if the number of box_coder > 1,
the number of box_coder must be equal to the number of feature_maps |
matcher |
Matcher |
required |
Matcher to match groundtruth with anchors. |
anchor_generator |
AnchorGenerator |
optional |
Anchor generator to compute anchors. |
box_predictor |
BoxPredictor |
required |
Box predictor to attach to the features. |
post_processing |
PostProcessing |
required |
Post processing to apply on the predictions. |
loss |
Loss |
optional |
Loss configuration for training. |
ignore_threshold |
float |
optional |
Ignore threshold. A predicted box whose IoU with a groundtruth box exceeds this threshold
but which is not matched to a groundtruth box is not treated as a negative sample. Default: 0.5 |
output_roi_features |
bool |
optional |
Output detection roi features or not Default: false |
roi_feature_depth |
int32 |
optional |
number of output channels of roi features Default: 512 |
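The ignore_threshold rule splits predictions into three groups. The sketch below expresses that rule directly, assuming each prediction already has a match flag and its best IoU against any groundtruth box; `label_prediction` is a hypothetical helper.

```python
def label_prediction(matched, best_iou, ignore_threshold=0.5):
    """Classify a predicted box as positive, ignored, or negative."""
    if matched:
        return "positive"
    if best_iou > ignore_threshold:
        # Overlaps a groundtruth box well but was not matched:
        # excluded from the negative samples.
        return "ignored"
    return "negative"


labels = [
    label_prediction(True, 0.9),
    label_prediction(False, 0.6),
    label_prediction(False, 0.2),
]
```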
Scalar Value Types
.proto Type | Notes | C++ Type | Java Type | Python Type |
double |
|
double |
double |
float |
float |
|
float |
float |
float |
int32 |
Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. |
int32 |
int |
int |
int64 |
Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. |
int64 |
long |
int/long |
uint32 |
Uses variable-length encoding. |
uint32 |
int |
int/long |
uint64 |
Uses variable-length encoding. |
uint64 |
long |
int/long |
sint32 |
Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. |
int32 |
int |
int |
sint64 |
Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. |
int64 |
long |
int/long |
fixed32 |
Always four bytes. More efficient than uint32 if values are often greater than 2^28. |
uint32 |
int |
int |
fixed64 |
Always eight bytes. More efficient than uint64 if values are often greater than 2^56. |
uint64 |
long |
int/long |
sfixed32 |
Always four bytes. |
int32 |
int |
int |
sfixed64 |
Always eight bytes. |
int64 |
long |
int/long |
bool |
|
bool |
boolean |
boolean |
string |
A string must always contain UTF-8 encoded or 7-bit ASCII text. |
string |
String |
str/unicode |
bytes |
May contain any arbitrary sequence of bytes. |
string |
ByteString |
str |
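The notes on variable-length encoding can be checked with a small encoder. The sketch below implements the protobuf base-128 varint and the ZigZag mapping used by sint32 in plain Python. It is an illustration of the wire format, not the protobuf library's API, and the function names are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value):
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value):
    """sint32: ZigZag maps small negative values to small unsigned values."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(zigzag)


int32_len = len(encode_int32(-1))    # ten bytes on the wire
sint32_len = len(encode_sint32(-1))  # one byte on the wire
```

The same encoder shows why fixed32 wins for large values: a varint needs five bytes once the value reaches 2^28, while fixed32 always uses four.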