Writing Configuration File

NNTrainer requires a network configuration file that describes the network layers and hyper-parameters. The configuration file uses the widely used INI format, parsed with iniparser. Keywords are not case sensitive, and lines starting with '#' are ignored.

For more details about iniparser, please visit https://github.com/ndevilla/iniparser

Sections

Model Section

The Model section includes the hyper-parameters of the network, such as its type, the number of epochs, the loss function, the save path, and the batch size.

Start with "[Model]"

  1. type (mandatory) = <string>

    Type of Network

    • regression : network for linear regression
    • knn : K-nearest neighbor
    • neuralnetwork : Deep Neural Network
  2. epochs = <unsigned int>

    Number of epochs to train

  3. loss = <string>

    Loss function

    • mse : mean squared error
    • cross : cross entropy. Only allowed with the sigmoid and softmax activation functions
    • Skip this property if no loss is desired for the model (such a model supports only inference)
  4. save_path = <string>

    Model file path to save updated weights

  5. batch_size = <unsigned int>

    Mini batch size

Below is a sample Model section.

# Model Section
[Model]
type = NeuralNetwork
epochs = 1500
loss = cross
save_path = "model.bin"
batch_size = 32

Optimizer Section

Define the optimizer to be used for training. This is an optional section needed only for training, and can be skipped for inference.

Start with "[Optimizer]"

  1. type = <string>

    Optimizer type used to apply the gradients to the weights. Defaults to adam if not specified.

    • adam : Adaptive Moment Estimation
    • sgd : stochastic gradient descent
  2. beta1 = <float>

    beta1 parameter for the adam optimizer. Only valid for adam. The default value is 0.9.

  3. beta2 = <float>

    beta2 parameter for the adam optimizer. Only valid for adam. The default value is 0.999.

  4. epsilon = <float>

    Epsilon parameter for the adam optimizer. Only valid for adam. The default value is 1.0e-7.

Below is a sample Optimizer section.

# Optimizer Section
[Optimizer]
type = adam
beta1 = 0.9
beta2 = 0.999
epsilon = 1e-7

Learning Rate Scheduler Section

Define the type, learning rate, decay steps and decay rate.

Start with "[LearningRateScheduler]"

  1. type = <string>

    constant, exponential and step are supported.

    • constant : constant learning rate
    • exponential : exponential decay
    • step: step decay
  2. learning_rate = <float>

    Initial learning rate to decay.

    Constant and exponential receive only one float value.

    However, step must receive two or more float values separated by commas (see the step sample below).

    learning_rate = <float>, <float>, ..., <float>

  3. decay_steps = <float>

    Decay steps. Only valid for exponential.

  4. decay_rate = <float>

    Decay rate

  5. iteration = <unsigned int>, <unsigned int>, ..., <unsigned int>

    Iterations at which the learning rate changes. Only valid for step; receives one or more unsigned integer values separated by commas.

Below is a sample Learning Rate Scheduler section.

# Learning Rate Scheduler Section
[LearningRateScheduler]
type=constant
learning_rate = 1e-4 	# Learning Rate
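
Since the constant sample above does not exercise the multi-value syntax of the step type, below is a sketch of a step scheduler. The rates and the iteration boundary are illustrative values only, assuming the n-th learning rate applies up to the n-th iteration boundary and the last rate applies afterwards.

# Learning Rate Scheduler Section (step decay, illustrative values)
[LearningRateScheduler]
type = step
learning_rate = 1e-3, 1e-4    # first rate until the boundary below, second rate afterwards
iteration = 1000              # iteration boundary at which the rate changes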

Train Set Section

Define the type and path of the training data file.

Start with "[train_set]"

  1. type = <string>

    Currently only file is supported.

  2. path = <string>

    Data path for training. The path is mandatory.

Below is a sample Train Set section.

# Train Set Section
[train_set]
type = file
path = trainDataset.dat

Validation Set Section

Define the type and path of the validation data file.

Start with "[valid_set]"

  1. type = <string>

    Currently only file is supported.

  2. path = <string>

    Data path for validation.

Below is a sample Validation Set Section.

# Validation Set Section
[valid_set]
type = file
path = validDataset.dat

Test Set Section

Define the type and path of the test data file.

Start with "[test_set]"

  1. type = <string>

    Currently only file is supported.

  2. path = <string>

    Data path for test.

Below is a sample Test Set Section.

# Test Set Section
[test_set]
type = file
path = testDataset.dat

Layer Section

Describes the hyper-parameters for each layer. The order of layers in the model follows the order in which they are defined here, from top to bottom.

Start with "[ ${layer name} ]". The layer name must be unique throughout the network model.

  1. type = <string>

    Type of Layer

    • input : input layer
    • fully_connected : fully connected layer
    • batch_normalization : batch normalization layer
    • conv2d : convolution 2D layer
    • pooling2d : pooling 2D layer
    • flatten : flatten layer
    • activation : activation layer
    • addition : addition layer
    • concat : concat layer
    • multiout : multiout layer
    • embedding : embedding layer
    • rnn : RNN layer
    • lstm : LSTM layer
    • split : split layer
    • gru : GRU layer
    • permute : permute layer
    • dropout : dropout layer
    • backbone_nnstreamer : backbone layer using nnstreamer
    • backbone_tflite : backbone layer using tflite
    • centroid_knn : centroid KNN layer
    • conv1d : convolution 1D layer
    • lstmcell : LSTM Cell layer
    • grucell : GRU Cell layer
    • rnncell : RNN Cell layer
    • zoneout_lstmcell : Zoneout LSTM Cell layer
    • preprocess_flip : preprocess flip layer
    • preprocess_translate : preprocess translate layer
    • preprocess_l2norm : preprocess l2norm layer
    • mse : MSE loss layer
    • cross_sigmoid : cross entropy with sigmoid loss layer
    • cross_softmax : cross entropy with softmax loss layer
  2. key = value

    The table below shows the available keys and values for each layer type. There are two kinds of layers: those that include the commonly trainable weights and those that do not. The following are the available properties for each layer type that includes the commonly trainable weights:

    (Universal properties) : properties that apply to every layer
     • name (string) : An identifier for each layer
     • trainable (boolean, default: true) : Allow the weights to be trained if true
     • input_layers (string) : Comma-separated names of the layers serving as inputs to the current layer
     • input_shape (string) : Formatted as "channel:height:width". If there is no channel dimension, it must be 1. The first layer of the model must have input_shape; for the other layers it can be omitted, as it is calculated at compile phase
     • flatten (boolean) : Flatten the shape from c:h:w to 1:1:c*h*w
     • activation (categorical) : Activation type (tanh, sigmoid, relu, softmax)
     • loss (float, default: 0) : Loss
     • weight_initializer (categorical, default: xavier_uniform) : Weight initializer
       • zeros : zero initialization
       • lecun_normal : LeCun normal initialization
       • lecun_uniform : LeCun uniform initialization
       • xavier_normal : Xavier normal initialization
       • xavier_uniform : Xavier uniform initialization
       • he_normal : He normal initialization
       • he_uniform : He uniform initialization
     • bias_initializer (categorical, default: zeros) : Bias initializer (same choices as weight_initializer)
     • weight_regularizer (categorical) : Weight regularizer. Currently, only l2norm (L2 weight regularizer) is supported
     • weight_regularizer_constant (float, default: 1) : Weight regularizer constant

    fully_connected : Fully connected layer
     • unit (unsigned integer) : Number of outputs

    conv2d : 2D convolution layer
     • filters (unsigned integer) : Number of filters
     • kernel_size (array of unsigned integers) : Comma-separated unsigned integers for kernel height and width, respectively
     • stride (array of unsigned integers, default: 1, 1) : Comma-separated unsigned integers for stride height and width, respectively
     • padding (categorical, default: valid) : Padding type
       • valid : no padding
       • same : preserve height/width dimension
       • (unsigned integer) : size of padding applied uniformly to all sides
       • (array of unsigned integers of size 2) : padding for height, width
       • (array of unsigned integers of size 4) : padding for top, bottom, left, right

    embedding : Embedding layer
     • in_dim (unsigned integer) : Vocabulary size
     • out_dim (unsigned integer) : Word embedding size

    rnn : RNN layer
     • unit (unsigned integer) : Number of output neurons
     • hidden_state_activation (categorical, default: tanh) : Activation type (tanh, sigmoid, relu, softmax)
     • return_sequences (boolean, default: false) : Return the full output sequence if true, else return only the last output
     • dropout (float, default: 0) : Dropout rate

    lstm : LSTM layer
     • unit (unsigned integer) : Number of output neurons
     • hidden_state_activation (categorical, default: tanh) : Activation type (tanh, sigmoid, relu, softmax)
     • recurrent_activation (categorical, default: sigmoid) : Activation type for the recurrent step (tanh, sigmoid, relu, softmax)
     • return_sequences (boolean, default: false) : Return the full output sequence if true, else return only the last output
     • dropout (float, default: 0) : Dropout rate

    gru : GRU layer
     • unit (unsigned integer) : Number of output neurons
     • hidden_state_activation (categorical, default: tanh) : Activation type (tanh, sigmoid, relu, softmax)
     • recurrent_activation (categorical, default: sigmoid) : Activation type for the recurrent step (tanh, sigmoid, relu, softmax)
     • return_sequences (boolean, default: false) : Return the full output sequence if true, else return only the last output
     • dropout (float, default: 0) : Dropout rate

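    None of the samples in this document exercise the recurrent-layer keys, so below is a minimal sketch of an embedding layer feeding an LSTM layer, built from the keys in the table above; the layer names and values are illustrative, not recommendations.

    [embedding_layer]
    type = embedding
    in_dim = 1000                    # vocabulary size
    out_dim = 64                     # word embedding size
    
    [lstm_layer]
    type = lstm
    input_layers = embedding_layer
    unit = 32                        # number of output neurons
    return_sequences = false         # keep only the last output
    dropout = 0.2
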
    The following are the available properties for each layer type that does not include the weight_initializer, bias_initializer, weight_regularizer, and weight_regularizer_constant properties:

    (Universal properties) : properties that apply to every layer
     • name (string) : An identifier for each layer
     • trainable (boolean, default: true) : Allow the weights to be trained if true
     • input_layers (string) : Comma-separated names of the layers serving as inputs to the current layer
     • input_shape (string) : Formatted as "channel:height:width". If there is no channel dimension, it must be 1. The first layer of the model must have input_shape; for the other layers it can be omitted, as it is calculated at compile phase
     • flatten (boolean) : Flatten the shape from c:h:w to 1:1:c*h*w
     • activation (categorical) : Activation type (tanh, sigmoid, relu, softmax)
     • loss (float, default: 0) : Loss

    input : Input layer
     • normalization (boolean, default: false) : Normalize the input if true
     • standardization (boolean, default: false) : Standardize the input if true

    batch_normalization : Batch normalization layer
     • epsilon (float, default: 0.001) : Small value to avoid division by zero
     • moving_mean_initializer (categorical, default: zeros) : Moving mean initializer (zeros, lecun_normal, lecun_uniform, xavier_normal, xavier_uniform, he_normal, he_uniform)
     • moving_variance_initializer (categorical, default: ones) : Moving variance initializer (same choices)
     • gamma_initializer (categorical, default: ones) : Gamma initializer (same choices)
     • beta_initializer (categorical, default: zeros) : Beta initializer (same choices)
     • momentum (float, default: 0.99) : Momentum for the moving average in batch normalization

    pooling2d : Pooling layer
     • pooling (categorical) : Pooling type
       • max : max pooling
       • average : average pooling
       • global_max : global max pooling
       • global_average : global average pooling
     • pool_size (array of unsigned integers) : Comma-separated unsigned integers for pooling height and width, respectively
     • stride (array of unsigned integers, default: 1, 1) : Comma-separated unsigned integers for stride height and width, respectively
     • padding (categorical, default: valid) : Padding type
       • valid : no padding
       • same : preserve height/width dimension
       • (unsigned integer) : size of padding applied uniformly to all sides
       • (array of unsigned integers of size 2) : padding for height, width
       • (array of unsigned integers of size 4) : padding for top, bottom, left, right

    flatten : Flatten layer (no additional properties)

    activation : Activation layer
     • activation (categorical) : Activation type (tanh, sigmoid, relu, softmax)

    addition : Addition layer (no additional properties)

    concat : Concat layer (no additional properties)

    multiout : Multiout layer (no additional properties)

    split : Split layer
     • split_dimension (unsigned integer) : Which dimension to split; splitting the batch dimension is not allowed

    permute : Permute layer (no additional properties)

    dropout : Dropout layer
     • dropout (float, default: 0) : Dropout rate

    backbone_nnstreamer : NNStreamer layer
     • model_path (string) : NNStreamer model path

    backbone_tflite : TensorFlow Lite layer
     • model_path (string) : TensorFlow Lite model path

    centroid_knn : Centroid KNN layer
     • num_class (unsigned integer) : Number of classes

    preprocess_flip : Preprocess flip layer
     • flip_direction (categorical) : Flip direction (horizontal, vertical, horizontal_and_vertical)

    preprocess_translate : Preprocess translate layer
     • random_translate (float) : Translate factor value

    preprocess_l2norm : Preprocess l2norm layer (no additional properties)

    mse : MSE loss layer (no additional properties)

    cross_sigmoid : Cross entropy with sigmoid loss layer (no additional properties)

    cross_softmax : Cross entropy with softmax loss layer (no additional properties)

    Below is a sample of layers defining a model.

    [conv2d_c2_layer]
    type = conv2d
    kernel_size = 5,5
    bias_initializer = zeros
    activation = sigmoid
    weight_initializer = xavier_uniform
    filters = 12
    stride = 1,1
    padding = 0,0
    
    [outputlayer]
    type = fully_connected
    unit = 10
    weight_initializer = xavier_uniform
    bias_initializer = zeros
    activation = softmax
    
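    The samples above use only the two-value form of padding. As listed in the table, padding also accepts the other forms below; the values are illustrative:

    padding = valid          # no padding
    padding = same           # preserve height/width dimension
    padding = 1              # uniform padding of 1 on all sides
    padding = 1,2            # padding for height, width
    padding = 1,1,2,2        # padding for top, bottom, left, right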

    Backbone section

    This allows describing another model, termed a backbone, to be used within the model described by the current INI file. The backbone can be described with the path of another INI configuration file, or with a model file from an external framework. Backbones in the Tensorflow-Lite format are supported natively via the Tensorflow-Lite framework. Backbones from other external frameworks are supported via NNStreamer and its plugins. When using NNStreamer for an external framework, ensure that the corresponding baseline ML framework and its NNStreamer plugin are added as dependencies or installed manually. For example, when using a PyTorch-based model as a backbone, both the PyTorch and nnstreamer-pytorch packages must be installed.

    Backbones made of NNTrainer models, described using INI, also support training the backbone. However, this is not supported for external frameworks. It is possible to describe a backbone inside a backbone INI configuration file, as well as to list multiple backbones to build a single model. For a backbone INI configuration file, the Model and Dataset sections are ignored.

    Describing a backbone is very similar to describing a layer. Start with "[ ${layer name} ]", which must be unique throughout the model. For a backbone, the backbone's name is prepended to the names of all the layers inside it.

    1. backbone = <string>

      Path of the backbone file. Supported model files:

      • .ini - NNTrainer models
      • .tflite - Tensorflow-Lite models
      • .pb / .pt / .py / .circle etc. via NNStreamer (corresponding NNStreamer plugin required)
    2. trainable = <bool>

      Whether this backbone is to be trained (defaults to false). Only supported for INI backbones (NNTrainer models).

    Below is a sample backbone section.

    # Model Section
    [Model]
    ...
    
    # block1
    [block1]
    backbone = resnet_block.ini
    trainable = false
    
    # block2
    [block2]
    backbone = resnet_block.ini
    trainable = true
    
    [outputlayer]
    type = fully_connected
    unit = 10
    activation = softmax
    
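    A backbone can also point at an external model file instead of an INI file. Below is a sketch assuming a hypothetical local Tensorflow-Lite file named feature_extractor.tflite; trainable is omitted because training is not supported for external-framework backbones.

    [extractor]
    backbone = feature_extractor.tflite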

    Configuration file example

    Only INI formatted files (*.ini) are supported for constructing a model from a file. The special sections [Model], [Optimizer], [LearningRateScheduler], [train_set], [valid_set], and [test_set] refer to the model, optimizer, learning rate scheduler, and data provider objects, respectively. All other INI sections each map to a layer, with the keys and values of a section setting the properties of that layer. All keys and values are treated as case-insensitive.

    The following restrictions must be adhered to:

    • Model file must have a [Model] section.
    • Model file must have at least one layer.
    • Every key in a section must be a valid property with a valid value; invalid keys in any section result in an error.
    • All paths inside the INI file are relative to the INI file path unless an absolute path is stated.

    Below is a sample configuration file. It takes 1 x 28 x 28 grayscale data (0~255) as input. The adam optimizer is used to apply gradients, and the learning rate is 1.0e-4.

    # Model Section
    [Model]
    type = NeuralNetwork          # Network Type : Regression, KNN, NeuralNetwork
    epochs = 1500                 # Epochs
    loss = cross                  # Loss function : mse (mean squared error)
                                  #                 cross ( for cross entropy )
    save_path = "mnist_model.bin" # model path to save / read
    batch_size = 32               # batch size
    
    [Optimizer]
    type = adam
    beta1 = 0.9       # beta 1 for adam
    beta2 = 0.999     # beta 2 for adam
    epsilon = 1e-7    # epsilon for adam
    
    [LearningRateScheduler]
    type=constant
    learning_rate = 1e-4 # Learning Rate
    
    # Train Set Section
    [train_set]
    type = file
    path = "trainDataset.dat"
    
    # Layer Section : Name
    [inputlayer]
    type = input
    input_shape = 1:28:28
    
    # Layer Section : Name
    [conv2d_c1_layer]
    type = conv2d
    input_layers = inputlayer
    kernel_size = 5,5
    bias_initializer = zeros
    activation = sigmoid
    weight_initializer = xavier_uniform
    filters = 6
    stride = 1,1
    padding = 0,0
    
    [pooling2d_p1]
    type = pooling2d
    input_layers = conv2d_c1_layer
    pool_size = 2,2
    stride = 2,2
    padding = 0,0
    pooling = average
    
    [conv2d_c2_layer]
    type = conv2d
    input_layers = pooling2d_p1
    kernel_size = 5,5
    bias_initializer = zeros
    activation = sigmoid
    weight_initializer = xavier_uniform
    filters = 12
    stride = 1,1
    padding = 0,0
    
    [pooling2d_p2]
    type = pooling2d
    input_layers = conv2d_c2_layer
    pool_size = 2,2
    stride =2,2
    padding = 0,0
    pooling = average
    
    [flatten]
    type = flatten
    input_layers = pooling2d_p2
    
    [outputlayer]
    type = fully_connected
    input_layers = flatten
    unit = 10		# Output Layer Dimension ( = Weight Width )
    weight_initializer = xavier_uniform
    bias_initializer = zeros
    activation = softmax 	# activation : sigmoid, softmax
    
