Writing Configuration File
NNTrainer requires a network configuration file that describes the network layers and hyper-parameters. The configuration file uses the commonly used iniparser format. Keywords are not case-sensitive, and any line starting with '#' is ignored.
For more about iniparser, please visit https://github.com/ndevilla/iniparser
Sections
Model Section
The Model section includes the hyper-parameters for the network, such as type, epochs, loss, save path and batch size.
Start with "[Model]"
-
type (mandatory) = <string>
Type of Network
- regression : network for linear regression
- knn : K-nearest neighbor
- neuralnetwork : Deep Neural Network
-
epochs = <unsigned int>
Number of epochs to train
-
loss = <string>
Loss function
- mse : mean squared error
- cross : cross entropy. Only allowed with the sigmoid and softmax activation functions
- skip this property if no loss is desired for the model (this model will only support inference)
-
save_path = <string>
Model file path to save updated weights
-
batch_size = <unsigned int>
Mini batch size
Below is a sample Model section.
# Network Section : Network
[Model]
type = NeuralNetwork
epochs = 1500
loss = cross
save_path = "model.bin"
batch_size = 32
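Since the loss property can be skipped for a model that only supports inference, a Model section for that case might look like the sketch below. The file name is hypothetical, and it is an assumption here that epochs can also be left out when no training is performed.

```ini
# Inference-only model: the loss property is intentionally omitted
[Model]
type = NeuralNetwork
save_path = "model.bin"   # hypothetical weight file
batch_size = 1
```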
Optimizer Section
Define the optimizer to be used for training. This is an optional section needed only for training, and can be skipped for inference.
Start with "[Optimizer]"
-
type = <string>
Optimizer type used to apply the gradients to the weights. Defaults to adam if type is not specified.
- adam : Adaptive Moment Estimation
- sgd : stochastic gradient descent
-
beta1 = <float>
beta1 parameter for adam optimizer. Only valid for adam. The default value is 0.9.
-
beta2 = <float>
beta2 parameter for adam optimizer. Only valid for adam. The default value is 0.999.
-
epsilon = <float>
Epsilon parameter for adam optimizer. Only valid for adam. The default value is 1.0e-7.
Below is a sample Optimizer section.
# Optimizer Section
[Optimizer]
type = adam
beta1 = 0.9
beta2 = 0.999
epsilon = 1e-7
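For comparison, here is a minimal sketch of an Optimizer section that selects sgd. Because beta1, beta2 and epsilon are only valid for adam, they are left out.

```ini
# Optimizer Section using stochastic gradient descent
[Optimizer]
type = sgd
```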
Learning Rate Scheduler Section
Define the type, learning rate, decay steps and decay rate.
Start with "[LearningRateScheduler]"
-
type = <string>
constant, exponential and step are supported.
- constant : constant learning rate
- exponential : exponential decay
- step : step decay
-
learning_rate = <float>
Initial learning rate to decay.
The constant and exponential types take a single float value.
The step type, however, requires two or more float values separated by commas.
learning_rate = <float>, <float>, ..., <float>
-
decay_steps = <float>
Decay steps. Only valid for exponential.
-
decay_rate = <float>
Decay rate
-
iteration = <unsigned int>, <unsigned int>, ..., <unsigned int>
Iteration. Only valid for step. Step receives one or more unsigned int values separated by commas.
Below is a sample Learning Rate Scheduler section.
# Learning Rate Scheduler Section
[LearningRateScheduler]
type=constant
learning_rate = 1e-4 # Learning Rate
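The other two scheduler types could be configured along the lines of the sketch below. All numeric values are illustrative only, and the two sections are alternatives: a real file would contain just one [LearningRateScheduler] section. For step, the sketch assumes one more learning rate than iteration values. This satisfies the "two or more" and "one or more" requirements described above, but how the values pair up is otherwise an assumption.

```ini
# Alternative 1: exponential decay (a single learning rate)
[LearningRateScheduler]
type = exponential
learning_rate = 1e-3
decay_steps = 1000
decay_rate = 0.96

# Alternative 2: step decay (comma-separated learning rates and iterations)
[LearningRateScheduler]
type = step
learning_rate = 1e-2, 1e-3, 1e-4
iteration = 1000, 2000
```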
Train Set Section
Define the type and path of the training data file.
Start with "[train_set]"
-
type = <string>
Currently only file is supported.
-
path = <string>
Data path for training. The path is mandatory.
Below is a sample Train Set section.
# Train Set Section
[train_set]
type = file
path = trainDataset.dat
Validation Set Section
Define the type and path of the validation data file.
Start with "[valid_set]"
-
type = <string>
Currently only file is supported.
-
path = <string>
Data path for validation.
Below is a sample Validation Set Section.
# Validation Set Section
[valid_set]
type = file
path = validDataset.dat
Test Set Section
Define the type and path of the test data file.
Start with "[test_set]"
-
type = <string>
Currently only file is supported.
-
path = <string>
Data path for test.
Below is a sample Test Set Section.
# Test Set Section
[test_set]
type = file
path = testDataset.dat
Layer Section
Describes the hyper-parameters of a layer. The order of layers in the model follows the order in which the layers are defined here, from top to bottom.
Start with "[ ${layer name} ]". This layer name must be unique throughout the network model.
-
type = <string>
Type of Layer
- input : input layer
- fully_connected : fully connected layer
- batch_normalization : batch normalization layer
- conv2d : convolution 2D layer
- pooling2d : pooling 2D layer
- flatten : flatten layer
- activation : activation layer
- addition : addition layer
- concat : concat layer
- multiout : multiout layer
- embedding : embedding layer
- rnn : RNN layer
- lstm : LSTM layer
- split : split layer
- gru : GRU layer
- permute : permute layer
- dropout : dropout layer
- backbone_nnstreamer : backbone layer using nnstreamer
- backbone_tflite : backbone layer using tflite
- centroid_knn : centroid KNN layer
- conv1d : convolution 1D layer
- lstmcell : LSTM Cell layer
- grucell : GRU Cell layer
- rnncell : RNN Cell layer
- zoneout_lstmcell : Zoneout LSTM Cell layer
- preprocess_flip : preprocess flip layer
- preprocess_translate : preprocess translate layer
- preprocess_l2norm : preprocess l2norm layer
- mse : MSE loss layer
- cross_sigmoid : cross entropy with sigmoid loss layer
- cross_softmax : Cross entropy with softmax loss layer
-
key = value
The tables below show the available keys and values for each layer type. There are two kinds of layers: those that commonly include trainable weights and those that do not. The following are the available properties for each layer type that commonly includes trainable weights:
| Type | Key | Value | Default value | Description |
| --- | --- | --- | --- | --- |
| (Universal properties) | | | | Universal properties that apply to every layer |
| | name | (string) | | An identifier for each layer |
| | trainable | (boolean) | true | Allow weights to be trained if true |
| | input_layers | (string) | | Comma-separated names of layers to be inputs of the current layer |
| | input_shape | (string) | | Comma-separated formatted string as “channel:height:width”. If there is no channel then it must be 1. The first layer of the model must have input_shape. For other layers it can be omitted, as it is calculated at the compile phase. |
| | flatten | (boolean) | | Flatten shape from c:h:w to 1:1:c*h*w |
| | activation | (categorical) | | Activation type |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| | loss | (float) | 0 | Loss |
| | weight_initializer | (categorical) | xavier_uniform | Weight initializer |
| | | zeros | | Zero initialization |
| | | lecun_normal | | LeCun normal initialization |
| | | lecun_uniform | | LeCun uniform initialization |
| | | xavier_normal | | Xavier normal initialization |
| | | xavier_uniform | | Xavier uniform initialization |
| | | he_normal | | He normal initialization |
| | | he_uniform | | He uniform initialization |
| | bias_initializer | (categorical) | zeros | Bias initializer |
| | | zeros | | Zero initialization |
| | | lecun_normal | | LeCun normal initialization |
| | | lecun_uniform | | LeCun uniform initialization |
| | | xavier_normal | | Xavier normal initialization |
| | | xavier_uniform | | Xavier uniform initialization |
| | | he_normal | | He normal initialization |
| | | he_uniform | | He uniform initialization |
| | weight_regularizer | (categorical) | | Weight regularizer. Currently, only l2norm is supported |
| | | l2norm | | L2 weight regularizer |
| | weight_regularizer_constant | (float) | 1 | Weight regularizer constant |
| fully_connected | | | | Fully connected layer |
| | unit | (unsigned integer) | | Number of outputs |
| conv2d | | | | 2D Convolution layer |
| | filters | (unsigned integer) | | Number of filters |
| | kernel_size | (array of unsigned integer) | | Comma-separated unsigned integers for kernel size, height, width respectively |
| | stride | (array of unsigned integer) | 1, 1 | Comma-separated unsigned integers for strides, height, width respectively |
| | padding | (categorical) | valid | Padding type |
| | | valid | | No padding |
| | | same | | Preserve height/width dimension |
| | | (unsigned integer) | | Size of padding applied uniformly to all sides |
| | | (array of unsigned integer of size 2) | | Padding for height, width |
| | | (array of unsigned integer of size 4) | | Padding for top, bottom, left, right |
| embedding | | | | Embedding layer |
| | in_dim | (unsigned integer) | | Vocabulary size |
| | out_dim | (unsigned integer) | | Word embedding size |
| rnn | | | | RNN layer |
| | unit | (unsigned integer) | | Number of output neurons |
| | hidden_state_activation | (categorical) | tanh | Activation type |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| | return_sequences | (boolean) | false | Return only the last output if true, else return full output |
| | dropout | (float) | 0 | Dropout rate |
| lstm | | | | LSTM layer |
| | unit | (unsigned integer) | | Number of output neurons |
| | hidden_state_activation | (categorical) | tanh | Activation type |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| | recurrent_activation | (categorical) | sigmoid | Activation type for recurrent step |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| | return_sequences | (boolean) | false | Return only the last output if true, else return full output |
| | dropout | (float) | 0 | Dropout rate |
| gru | | | | GRU layer |
| | unit | (unsigned integer) | | Number of output neurons |
| | hidden_state_activation | (categorical) | tanh | Activation type |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| | recurrent_activation | (categorical) | sigmoid | Activation type for recurrent step |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| | return_sequences | (boolean) | false | Return only the last output if true, else return full output |
| | dropout | (float) | 0 | Dropout rate |

The following are the available properties for each layer type that does not include the weight_initializer, bias_initializer, weight_regularizer and weight_regularizer_constant properties:

| Type | Key | Value | Default value | Description |
| --- | --- | --- | --- | --- |
| (Universal properties) | | | | Universal properties that apply to every layer |
| | name | (string) | | An identifier for each layer |
| | trainable | (boolean) | true | Allow weights to be trained if true |
| | input_layers | (string) | | Comma-separated names of layers to be inputs of the current layer |
| | input_shape | (string) | | Comma-separated formatted string as “channel:height:width”. If there is no channel then it must be 1. The first layer of the model must have input_shape. For other layers it can be omitted, as it is calculated at the compile phase. |
| | flatten | (boolean) | | Flatten shape from c:h:w to 1:1:c*h*w |
| | activation | (categorical) | | Activation type |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| | loss | (float) | 0 | Loss |
| input | | | | Input layer |
| | normalization | (boolean) | false | Normalize input if true |
| | standardization | (boolean) | false | Standardize input if true |
| batch_normalization | | | | Batch normalization layer |
| | epsilon | (float) | 0.001 | Small value to avoid divide by zero |
| | moving_mean_initializer | (categorical) | zeros | Moving mean initializer |
| | | zeros | | Zero initialization |
| | | lecun_normal | | LeCun normal initialization |
| | | lecun_uniform | | LeCun uniform initialization |
| | | xavier_normal | | Xavier normal initialization |
| | | xavier_uniform | | Xavier uniform initialization |
| | | he_normal | | He normal initialization |
| | | he_uniform | | He uniform initialization |
| | moving_variance_initializer | (categorical) | ones | Moving variance initializer |
| | | zeros | | Zero initialization |
| | | lecun_normal | | LeCun normal initialization |
| | | lecun_uniform | | LeCun uniform initialization |
| | | xavier_normal | | Xavier normal initialization |
| | | xavier_uniform | | Xavier uniform initialization |
| | | he_normal | | He normal initialization |
| | | he_uniform | | He uniform initialization |
| | gamma_initializer | (categorical) | ones | Gamma initializer |
| | | zeros | | Zero initialization |
| | | lecun_normal | | LeCun normal initialization |
| | | lecun_uniform | | LeCun uniform initialization |
| | | xavier_normal | | Xavier normal initialization |
| | | xavier_uniform | | Xavier uniform initialization |
| | | he_normal | | He normal initialization |
| | | he_uniform | | He uniform initialization |
| | beta_initializer | (categorical) | zeros | Beta initializer |
| | | zeros | | Zero initialization |
| | | lecun_normal | | LeCun normal initialization |
| | | lecun_uniform | | LeCun uniform initialization |
| | | xavier_normal | | Xavier normal initialization |
| | | xavier_uniform | | Xavier uniform initialization |
| | | he_normal | | He normal initialization |
| | | he_uniform | | He uniform initialization |
| | momentum | (float) | 0.99 | Momentum for moving average in batch normalization |
| pooling2d | | | | Pooling layer |
| | pooling | (categorical) | | Pooling type |
| | | max | | Max pooling |
| | | average | | Average pooling |
| | | global_max | | Global max pooling |
| | | global_average | | Global average pooling |
| | pool_size | (array of unsigned integer) | | Comma-separated unsigned integers for pooling size, height, width respectively |
| | stride | (array of unsigned integer) | 1, 1 | Comma-separated unsigned integers for stride, height, width respectively |
| | padding | (categorical) | valid | Padding type |
| | | valid | | No padding |
| | | same | | Preserve height/width dimension |
| | | (unsigned integer) | | Size of padding applied uniformly to all sides |
| | | (array of unsigned integer of size 2) | | Padding for height, width |
| | | (array of unsigned integer of size 4) | | Padding for top, bottom, left, right |
| flatten | | | | Flatten layer |
| activation | | | | Activation layer |
| | activation | (categorical) | | Activation type |
| | | tanh | | Hyperbolic tangent |
| | | sigmoid | | Sigmoid function |
| | | relu | | Relu function |
| | | softmax | | Softmax function |
| addition | | | | Addition layer |
| concat | | | | Concat layer |
| multiout | | | | Multiout layer |
| split | | | | Split layer |
| | split_dimension | (unsigned integer) | | Which dimension to split. Splitting the batch dimension is not allowed |
| permute | | | | Permute layer |
| dropout | | | | Dropout layer |
| | dropout | (float) | 0 | Dropout rate |
| backbone_nnstreamer | | | | NNStreamer layer |
| | model_path | (string) | | NNStreamer model path |
| backbone_tflite | | | | TensorFlow Lite layer |
| | model_path | (string) | | TensorFlow Lite model path |
| centroid_knn | | | | Centroid KNN layer |
| | num_class | (unsigned integer) | | Number of classes |
| preprocess_flip | | | | Preprocess flip layer |
| | flip_direction | (categorical) | | Flip direction |
| | | horizontal | | Horizontal direction |
| | | vertical | | Vertical direction |
| | | horizontal_and_vertical | | Horizontal and vertical direction |
| preprocess_translate | | | | Preprocess translate layer |
| | random_translate | (float) | | Translate factor value |
| preprocess_l2norm | | | | Preprocess l2norm layer |
| mse | | | | MSE loss layer |
| cross_sigmoid | | | | Cross entropy with sigmoid loss layer |
| cross_softmax | | | | Cross entropy with softmax loss layer |

Below is a sample of layer sections that define a model.
[conv2d_c2_layer]
type = conv2d
kernel_size = 5,5
bias_initializer = zeros
activation = sigmoid
weight_initializer = xavier_uniform
filters = 12
stride = 1,1
padding = 0,0

[outputlayer]
type = fully_connected
Unit = 10
weight_initializer = xavier_uniform
bias_initializer = zeros
activation = softmax
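To illustrate some of the layer types without weight initializer properties, the sketch below chains an input layer, a convolution, batch normalization, a standalone activation and max pooling. It only uses keys listed in the property tables above. The layer names and all values are hypothetical, and the combination has not been verified against NNTrainer.

```ini
# Input layer with normalization enabled
[inputlayer]
type = input
input_shape = 1:28:28
normalization = true

# Convolution that preserves height/width through "same" padding
[conv]
type = conv2d
input_layers = inputlayer
filters = 8
kernel_size = 3,3
padding = same

# Batch normalization using the documented default values explicitly
[bn]
type = batch_normalization
input_layers = conv
epsilon = 0.001
momentum = 0.99

# Standalone activation layer
[act]
type = activation
input_layers = bn
activation = relu

# 2x2 max pooling with stride 2
[pool]
type = pooling2d
input_layers = act
pooling = max
pool_size = 2,2
stride = 2,2
```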
Backbone section
This section describes another model, termed a backbone, to be used within the model described by the current ini file. The backbone can be specified either as the path to another ini configuration file or as a model file for an external framework. Backbones from Tensorflow-Lite are supported natively through the Tensorflow-Lite framework. Backbones from other external frameworks are supported through nnstreamer and its plugins. When using nnstreamer for an external framework, make sure to add the corresponding baseline ML framework and its nnstreamer plugin as dependencies, or install them manually. For example, when using a PyTorch-based model as a backbone, both the PyTorch and nnstreamer-pytorch packages must be installed.
Backbones made of nntrainer models, described using ini, can also be trained. However, training is not supported for backbones from external frameworks. It is possible to describe a backbone inside a backbone ini configuration file, as well as to list multiple backbones to build a single model. In a backbone ini configuration file, the Model and Dataset sections are ignored.
Describing a backbone is very similar to describing a layer. Start with "[ ${layer name} ]", which must be unique throughout the model. For a backbone, the name of the backbone is prepended to the names of all the layers inside the backbone.
-
backbone = <string>
Path of the backbone file. Supported model files:
- .ini - NNTrainer models
- .tflite - Tensorflow-Lite models
- .pb / .pt / .py / .circle etc via NNStreamer (corresponding nnstreamer plugin required)
-
trainable = <bool>
Whether this backbone is to be trained (defaults to false). Only supported for ini backbones (nntrainer models).
Below is a sample backbone section.
# Model Section
[Model]
...

# block1
[block1]
backbone = resnet_block.ini
trainable = false

# block2
[block2]
backbone = resnet_block.ini
trainable = true

[outputlayer]
type = fully_connected
unit = 10
activation = softmax
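A backbone from an external framework is declared in the same way. The sketch below assumes a hypothetical TensorFlow Lite file named feature_extractor.tflite. The trainable key is left at its default of false, since training is only supported for ini backbones.

```ini
# External TensorFlow Lite backbone (hypothetical file name)
[feature_extractor]
backbone = feature_extractor.tflite

# Layer defined after the backbone, following the top-to-bottom order of the file
[outputlayer]
type = fully_connected
unit = 10
activation = softmax
```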
Configuration file example
Only INI-formatted files (*.ini) are supported for constructing a model from a file. The special sections [Model], [Optimizer], [LearningRateScheduler], [train_set], [valid_set] and [test_set] refer to the model, optimizer, learning rate scheduler and data provider objects, respectively. The rest of the INI sections each map to a layer. Keys and values from each section set properties of the layer. All keys and values are treated as case-insensitive.
The following restrictions must be adhered to:
- The model file must have a [Model] section.
- The model file must have at least one layer.
- Valid keys must have valid values. Invalid keys in any section result in an error.
- All paths inside the INI file are relative to the INI file path unless the absolute path is stated.
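Under these restrictions, the smallest possible file would contain a [Model] section and a single layer. The following is only a sketch of that shape: the layer name and values are hypothetical, and whether NNTrainer accepts a model this small without further properties has not been verified here.

```ini
# Mandatory model section; type is the only property marked mandatory
[Model]
type = NeuralNetwork

# At least one layer; the first layer must carry input_shape
[outputlayer]
type = fully_connected
input_shape = 1:1:784
unit = 10
activation = softmax
```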
Below is a sample configuration file. It takes 1 x 28 x 28 gray data (0~255) as an input. The Adam optimizer is used to apply gradients, and the learning rate is 1.0e-4.
# Model Section
[Model]
type = NeuralNetwork # Network Type : Regression, KNN, NeuralNetwork
epochs = 1500 # Epochs
loss = cross # Loss function : mse (mean squared error)
             #                 cross ( for cross entropy )
save_path = "mnist_model.bin" # model path to save / read
batch_size = 32 # batch size

[Optimizer]
type = adam
beta1 = 0.9 # beta 1 for adam
beta2 = 0.999 # beta 2 for adam
epsilon = 1e-7 # epsilon for adam

[LearningRateScheduler]
type=constant
learning_rate = 1e-4 # Learning Rate

# Train Set Section
[train_set]
type = file
path = "trainDataset.dat"

# Layer Section : Name
[inputlayer]
type = input
input_shape = 1:28:28

# Layer Section : Name
[conv2d_c1_layer]
type = conv2d
input_layers = inputlayer
kernel_size = 5,5
bias_initializer = zeros
activation = sigmoid
weight_initializer = xavier_uniform
filters = 6
stride = 1,1
padding = 0,0

[pooling2d_p1]
type = pooling2d
input_layers = conv2d_c1_layer
pool_size = 2,2
stride = 2,2
padding = 0,0
pooling = average

[conv2d_c2_layer]
type = conv2d
input_layers = pooling2d_p1
kernel_size = 5,5
bias_initializer = zeros
activation = sigmoid
weight_initializer = xavier_uniform
filters = 12
stride = 1,1
padding = 0,0

[pooling2d_p2]
type = pooling2d
input_layers = conv2d_c2_layer
pool_size = 2,2
stride =2,2
padding = 0,0
pooling = average

[flatten]
type = flatten
input_layers = pooling2d_p2

[outputlayer]
type = fully_connected
input_layers = flatten
unit = 10 # Output Layer Dimension ( = Weight Width )
weight_initializer = xavier_uniform
bias_initializer = zeros
activation = softmax # activation : sigmoid, softmax