Writing Configuration File

NNTrainer requires a network configuration file that describes the network layers and hyper-parameters. The file uses the widely used iniparser (INI) format. Keywords are not case sensitive, and lines starting with '#' are ignored.

For more about iniparser, see https://github.com/ndevilla/iniparser

Sections

Model Section

The Model section includes the hyper-parameters for the network, such as type, epochs, loss, save path and batch size.

Start with "[Model]"

type (mandatory) = <string>

Type of Network
- regression : network for linear regression
- knn : K-nearest neighbor
- neuralnetwork : Deep Neural Network
epochs = <unsigned int>

Number of epochs to train

loss = <string>

Loss function
- mse : mean squared error
- cross : cross entropy. Only allowed with the sigmoid and softmax activation functions
- skip this property if no loss is desired for the model (this model will only support inference)
save_path = <string>

Model file path to save updated weights
batch_size = <unsigned int>

Mini batch size

Below is a sample Model section.

# Network Section : Network
[Model]
type = NeuralNetwork
epochs = 1500
loss = cross
save_path = "model.bin"
batch_size = 32
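
The sample above can be inspected outside NNTrainer with any INI reader. The sketch below uses Python's standard `configparser` purely as an illustration of the format. It is not the parser NNTrainer uses (that is iniparser), and the helper name `read_model_section` is hypothetical. It assumes `#` comments and quoted string values as in the sample.

```python
import configparser

SAMPLE = """
# Network Section : Network
[Model]
type = NeuralNetwork
epochs = 1500
loss = cross
save_path = "model.bin"
batch_size = 32
"""


def read_model_section(text):
    """Return the [Model] properties as a dict with lower-cased keys.

    Hypothetical helper for illustration; not part of NNTrainer.
    """
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    # configparser keeps section names case-sensitive, so match them manually.
    for name in parser.sections():
        if name.lower() == "model":
            # Option names are lower-cased by configparser by default.
            return {key: value.strip('"') for key, value in parser[name].items()}
    raise KeyError("no [Model] section found")


model = read_model_section(SAMPLE)
print(model["type"], int(model["epochs"]), model["save_path"])
```

Numeric properties such as epochs and batch_size come back as strings and are converted by the caller, which keeps the helper independent of any particular key list.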

Optimizer Section

Define the optimizer to be used for training. This is an optional section needed only for training, and can be skipped for inference.

Start with "[Optimizer]"

type = <string>

Optimizer type used to apply the gradients to the weights. The default is adam if type is not specified.
- adam : Adaptive Moment Estimation
- sgd : stochastic gradient descent
beta1 = <float>

beta1 parameter for adam optimizer. Only valid for adam. The default value is 0.9.
beta2 = <float>

beta2 parameter for adam optimizer. Only valid for adam. The default value is 0.999.
epsilon = <float>

Epsilon parameter for adam optimizer. Only valid for adam. The default value is 1.0e-7.

Below is a sample Optimizer section.

# Optimizer Section
[Optimizer]
type = adam
beta1 = 0.9
beta2 = 0.999
epsilon = 1e-7

Learning Rate Scheduler Section

Define the type, learning rate, decay steps and decay rate.

Start with "[LearningRateScheduler]"

type = <string>

constant, exponential and step are supported.
- constant : constant learning rate
- exponential : exponential decay
- step: step decay
learning_rate = <float>

Initial learning rate to decay.

Constant and exponential receive only one float value.

However, step must receive two or more float values separated by commas.

learning_rate = <float>, <float>, ..., <float>
decay_steps = <float>

Decay steps. Only valid for exponential.
decay_rate = <float>

Decay rate
iteration = <unsigned int>, <unsigned int>, ..., <unsigned int>

Iteration. Only valid for step. Step receives one or more unsigned int values separated by commas.

Below is a sample Learning Rate Scheduler section.

# Learning Rate Scheduler Section
[LearningRateScheduler]
type=constant
learning_rate = 1e-4 	# Learning Rate
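
To make the three scheduler types more concrete, here is a minimal Python sketch of how such schedules are commonly computed. The formulas are assumptions for illustration and are not taken from the NNTrainer source: exponential decay is taken to be `learning_rate * decay_rate ** (iteration / decay_steps)`, and step decay is assumed to need exactly one more learning rate than there are iteration boundaries, switching to the next rate once a boundary is reached. All function names are hypothetical.

```python
import bisect


def constant_lr(learning_rate, iteration):
    """Constant schedule: the same rate at every iteration."""
    return learning_rate


def exponential_lr(learning_rate, decay_rate, decay_steps, iteration):
    """Exponential decay (assumed formula, see the text above)."""
    return learning_rate * decay_rate ** (iteration / decay_steps)


def step_lr(learning_rates, iterations, iteration):
    """Step decay: pick a rate based on which boundaries have been passed.

    Assumes len(learning_rates) == len(iterations) + 1 and that a boundary
    takes effect at the iteration equal to it (both are assumptions).
    """
    if len(learning_rates) != len(iterations) + 1:
        raise ValueError("expected one more learning rate than iterations")
    return learning_rates[bisect.bisect_right(iterations, iteration)]


# Hypothetical values: learning_rate = 0.1, 0.01, 0.001 and iteration = 100, 200
for it in (0, 150, 250):
    print(it, step_lr([0.1, 0.01, 0.001], [100, 200], it))
```

Under these assumptions, a step scheduler with two boundaries needs three learning rates, which is consistent with step requiring two or more learning_rate values and one or more iteration values.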

Train Set Section

Define the type and path of the training data file.

Start with "[train_set]"

type = <string>

Currently only file is supported.
path = <string>

Data path for training. The path is mandatory.

Below is a sample Train Set section.

# Train Set Section
[train_set]
type = file
path = trainDataset.dat

Validation Set Section

Define the type and path of the validation data file.

Start with "[valid_set]"

type = <string>

Currently only file is supported.
path = <string>

Data path for validation.

Below is a sample Validation Set Section.

# Validation Set Section
[valid_set]
type = file
path = validDataset.dat

Test Set Section

Define the type and path of the test data file.

Start with "[test_set]"

type = <string>

Currently only file is supported.
path = <string>

Data path for test.

Below is a sample Test Set Section.

# Test Set Section
[test_set]
type = file
path = testDataset.dat

Layer Section

Describes the hyper-parameters for a layer. The order of layers in the model follows the order in which the layers are defined here, from top to bottom.

Start with "[ ${layer name} ]". The layer name must be unique throughout the network model.

type = <string>

Type of Layer
- input : input layer
- fully_connected : fully connected layer
- batch_normalization : batch normalization layer
- conv2d : convolution 2D layer
- pooling2d : pooling 2D layer
- flatten : flatten layer
- activation : activation layer
- addition : addition layer
- concat : concat layer
- multiout : multiout layer
- embedding : embedding layer
- rnn : RNN layer
- lstm : LSTM layer
- split : split layer
- gru : GRU layer
- permute : permute layer
- dropout : dropout layer
- backbone_nnstreamer : backbone layer using nnstreamer
- backbone_tflite : backbone layer using tflite
- centroid_knn : centroid KNN layer
- conv1d : convolution 1D layer
- lstmcell : LSTM Cell layer
- grucell : GRU Cell layer
- rnncell : RNN Cell layer
- zoneout_lstmcell : Zoneout LSTM Cell layer
- preprocess_flip : preprocess flip layer
- preprocess_translate : preprocess translate layer
- preprocess_l2norm : preprocess l2norm layer
- mse : MSE loss layer
- cross_sigmoid : cross entropy with sigmoid loss layer
- cross_softmax : Cross entropy with softmax loss layer

key = value

The tables below show the available keys and values for each layer type. There are two kinds of layers: those that commonly include trainable weights and those that do not. The following properties are available for layer types that commonly include trainable weights:

Type	Key	Value	Default value	Description
(Universal properties)				Universal properties that apply to every layer
	name	(string)		An identifier for each layer
	trainable	(boolean)	true	Allow weights to be trained if true
	input_layers	(string)		Comma-separated names of layers to be inputs of the current layer
	input_shape	(string)		String formatted as “channel:height:width”. If there is no channel, it must be set to 1. The first layer of the model must have input_shape. It can be omitted for other layers, as it is calculated at the compile phase.
	flatten	(boolean)		Flatten shape from `c:h:w` to `1:1:chw`
	activation	(categorical)		Activation type
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
	loss	(float)	0	Loss
	weight_initializer	(categorical)	xavier_uniform	Weight initializer
		zeros		Zero initialization
		lecun_normal		LeCun normal initialization
		lecun_uniform		LeCun uniform initialization
		xavier_normal		Xavier normal initialization
		xavier_uniform		Xavier uniform initialization
		he_normal		He normal initialization
		he_uniform		He uniform initialization
	bias_initializer	(categorical)	zeros	Bias initializer
		zeros		Zero initialization
		lecun_normal		LeCun normal initialization
		lecun_uniform		LeCun uniform initialization
		xavier_normal		Xavier normal initialization
		xavier_uniform		Xavier uniform initialization
		he_normal		He normal initialization
		he_uniform		He uniform initialization
	weight_regularizer	(categorical)		Weight regularizer. Currently, only l2norm is supported
		l2norm		L2 weight regularizer
	weight_regularizer_constant	(float)	1	Weight regularizer constant
`fully_connected`				Fully connected layer
	unit	(unsigned integer)		Number of outputs
`conv2d`				2D Convolution layer
	filters	(unsigned integer)		Number of filters
	kernel_size	(array of unsigned integer)		Comma-separated unsigned integers for kernel size, `height, width` respectively
	stride	(array of unsigned integer)	1, 1	Comma-separated unsigned integers for strides, `height, width` respectively
	padding	(categorical)	valid	Padding type
		valid		No padding
		same		Preserve height/width dimension
		(unsigned integer)		Size of padding applied uniformly to all sides
		(array of unsigned integer of size 2)		Padding for height, width
		(array of unsigned integer of size 4)		Padding for top, bottom, left, right
`embedding`				Embedding layer
	in_dim	(unsigned integer)		Vocabulary size
	out_dim	(unsigned integer)		Word embedding size
`rnn`				RNN layer
	unit	(unsigned integer)		Number of output neurons
	hidden_state_activation	(categorical)	tanh	Activation type
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
	return_sequences	(boolean)	false	Return the full output sequence if true, else return only the last output
	dropout	(float)	0	Dropout rate
`lstm`				LSTM layer
	unit	(unsigned integer)		Number of output neurons
	hidden_state_activation	(categorical)	tanh	Activation type
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
	recurrent_activation	(categorical)	sigmoid	Activation type for recurrent step
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
	return_sequences	(boolean)	false	Return the full output sequence if true, else return only the last output
	dropout	(float)	0	Dropout rate
`gru`				GRU layer
	unit	(unsigned integer)		Number of output neurons
	hidden_state_activation	(categorical)	tanh	Activation type
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
	recurrent_activation	(categorical)	sigmoid	Activation type for recurrent step
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
	return_sequences	(boolean)	false	Return the full output sequence if true, else return only the last output
	dropout	(float)	0	Dropout rate
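
The padding property of conv2d in the table above accepts several forms. The following Python sketch shows one way those forms could be normalised to explicit `(top, bottom, left, right)` values. It is an illustration only: `parse_padding` is a hypothetical helper, and treating the two-value form as symmetric padding (the height value for top and bottom, the width value for left and right) is an assumption. `same` is returned unchanged because resolving it requires the kernel size.

```python
def parse_padding(value):
    """Normalise a padding property to (top, bottom, left, right).

    Hypothetical helper for illustration; returns the string "same"
    unchanged, since its size depends on the kernel.
    """
    text = value.strip().lower()
    if text == "valid":
        return (0, 0, 0, 0)  # no padding
    if text == "same":
        return "same"
    numbers = [int(part) for part in text.split(",")]
    if any(n < 0 for n in numbers):
        raise ValueError("padding values must be unsigned")
    if len(numbers) == 1:  # uniform padding on all sides
        p = numbers[0]
        return (p, p, p, p)
    if len(numbers) == 2:  # height, width (assumed symmetric)
        h, w = numbers
        return (h, h, w, w)
    if len(numbers) == 4:  # top, bottom, left, right
        return tuple(numbers)
    raise ValueError("padding must have 1, 2 or 4 values")


for sample in ("valid", "2", "0,0", "0,1,2,3"):
    print(sample, "->", parse_padding(sample))
```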

The following properties are available for layer types that do not include the weight_initializer, bias_initializer, weight_regularizer and weight_regularizer_constant properties.

Type	Key	Value	Default value	Description
(Universal properties)				Universal properties that apply to every layer
	name	(string)		An identifier for each layer
	trainable	(boolean)	true	Allow weights to be trained if true
	input_layers	(string)		Comma-separated names of layers to be inputs of the current layer
	input_shape	(string)		String formatted as “channel:height:width”. If there is no channel, it must be set to 1. The first layer of the model must have input_shape. It can be omitted for other layers, as it is calculated at the compile phase.
	flatten	(boolean)		Flatten shape from `c:h:w` to `1:1:chw`
	activation	(categorical)		Activation type
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
	loss	(float)	0	Loss
`input`				Input layer
	normalization	(boolean)	false	Normalize input if true
	standardization	(boolean)	false	Standardize input if true
`batch_normalization`				Batch normalization layer
	epsilon	(float)	0.001	Small value to avoid divide by zero
	moving_mean_initializer	(categorical)	zeros	Moving mean initializer
		zeros		Zero initialization
		lecun_normal		LeCun normal initialization
		lecun_uniform		LeCun uniform initialization
		xavier_normal		Xavier normal initialization
		xavier_uniform		Xavier uniform initialization
		he_normal		He normal initialization
		he_uniform		He uniform initialization
	moving_variance_initializer	(categorical)	ones	Moving variance initializer
		zeros		Zero initialization
		lecun_normal		LeCun normal initialization
		lecun_uniform		LeCun uniform initialization
		xavier_normal		Xavier normal initialization
		xavier_uniform		Xavier uniform initialization
		he_normal		He normal initialization
		he_uniform		He uniform initialization
	gamma_initializer	(categorical)	ones	Gamma initializer
		zeros		Zero initialization
		lecun_normal		LeCun normal initialization
		lecun_uniform		LeCun uniform initialization
		xavier_normal		Xavier normal initialization
		xavier_uniform		Xavier uniform initialization
		he_normal		He normal initialization
		he_uniform		He uniform initialization
	beta_initializer	(categorical)	zeros	Beta initializer
		zeros		Zero initialization
		lecun_normal		LeCun normal initialization
		lecun_uniform		LeCun uniform initialization
		xavier_normal		Xavier normal initialization
		xavier_uniform		Xavier uniform initialization
		he_normal		He normal initialization
		he_uniform		He uniform initialization
	momentum	(float)	0.99	Momentum for moving average in batch normalization
`pooling2d`				Pooling layer
	pooling	(categorical)		Pooling type
		max		Max pooling
		average		Average pooling
		global_max		Global max pooling
		global_average		Global average pooling
	pool_size	(array of unsigned integer)		Comma-separated unsigned integers for pooling size, `height, width` respectively
	stride	(array of unsigned integer)	1, 1	Comma-separated unsigned integers for stride, `height, width` respectively
	padding	(categorical)	valid	Padding type
		valid		No padding
		same		Preserve height/width dimension
		(unsigned integer)		Size of padding applied uniformly to all sides
		(array of unsigned integer of size 2)		Padding for height, width
		(array of unsigned integer of size 4)		Padding for top, bottom, left, right
`flatten`				Flatten layer
`activation`				Activation layer
	activation	(categorical)		Activation type
		tanh		Hyperbolic tangent
		sigmoid		Sigmoid function
		relu		Relu function
		softmax		Softmax function
`addition`				Addition layer
`concat`				Concat layer
`multiout`				Multiout layer
`split`				Split layer
	split_dimension	(unsigned integer)		Which dimension to split. Split batch dimension is not allowed
`permute`				Permute layer
`dropout`				Dropout layer
	dropout	(float)	0	Dropout rate
`backbone_nnstreamer`				NNStreamer layer
	model_path	(string)		NNStreamer model path
`backbone_tflite`				TensorFlow Lite layer
	model_path	(string)		TensorFlow Lite model path
`centroid_knn`				Centroid KNN layer
	num_class	(unsigned integer)		Number of classes
`preprocess_flip`				Preprocess flip layer
	flip_direction	(categorical)		Flip direction
		horizontal		Horizontal direction
		vertical		Vertical direction
		horizontal_and_vertical	horizontal_and_vertical	Horizontal and vertical direction
`preprocess_translate`				Preprocess translate layer
	random_translate	(float)		Translate factor value
`preprocess_l2norm`				Preprocess l2norm layer
`mse`				MSE loss layer
`cross_sigmoid`				Cross entropy with sigmoid loss layer
`cross_softmax`				Cross entropy with softmax loss layer

Below is a sample of layer sections that define a model.

[conv2d_c2_layer]
type = conv2d
kernel_size = 5,5
bias_initializer = zeros
activation = sigmoid
weight_initializer = xavier_uniform
filters = 12
stride = 1,1
padding = 0,0

[outputlayer]
type = fully_connected
Unit = 10
weight_initializer = xavier_uniform
bias_initializer = zeros
activation = softmax

Backbone section

This section describes another model, termed a backbone, to be used in the model described by the current ini file. The backbone can be specified either with the path of another ini configuration file or with a model file for an external framework. Tensorflow-Lite backbones are supported natively through the Tensorflow-Lite framework. Backbones of other external frameworks are supported through nnstreamer and its plugins. When using nnstreamer for an external framework, make sure to add the corresponding baseline ML framework and its nnstreamer plugin as a dependency, or install them manually. For example, when using a PyTorch-based model as a backbone, both the PyTorch and nnstreamer-pytorch packages must be installed.

Backbones made of nntrainer models, described using ini, can also be trained. However, training is not supported for backbones of external frameworks. A backbone ini configuration file can itself describe a backbone, and multiple backbones can be listed to build a single model. In a backbone ini configuration file, the Model and Dataset sections are ignored.

Describing a backbone is very similar to describing a layer. Start with "[ ${layer name} ]", which must be unique throughout the model. For a backbone, the name of the backbone is prepended to the names of all the layers inside the backbone.

backbone = <string>

Path of the backbone file. Supported model files:
- .ini - NNTrainer models
- .tflite - Tensorflow-Lite models
- .pb / .pt / .py / .circle etc via NNStreamer (corresponding nnstreamer plugin required)
trainable = <bool>

Whether this backbone is to be trained (defaults to false). Only supported for ini backbones (nntrainer models).

Below is a sample backbone section.

# Model Section
[Model]
...

# block1
[block1]
backbone = resnet_block.ini
trainable = false

# block2
[block2]
backbone = resnet_block.ini
trainable = true

[outputlayer]
type = fully_connected
unit = 10
activation = softmax

Configuration file example

Only INI-formatted files (*.ini) are supported for constructing a model from a file. The special sections [Model], [Optimizer], [LearningRateScheduler], [train_set], [valid_set] and [test_set] refer to the model, optimizer, learning rate scheduler and data provider objects. Each remaining INI section maps to a layer, and the keys and values in that section set the properties of the layer. All keys and values are treated as case-insensitive.

The following restrictions must be adhered to:

- The model file must have a [Model] section.
- The model file must have at least one layer.
- Valid keys must have valid property values. Invalid keys in any section result in an error.
- All paths inside the INI file are relative to the INI file path unless an absolute path is given.
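
Some of these restrictions can be checked before handing a file to NNTrainer. The Python sketch below is a rough pre-check under stated assumptions, not NNTrainer's own validation: `check_config` and `resolve_path` are hypothetical helpers, the special section names are compared case-insensitively, and "relative to the INI file path" is read as relative to the directory containing the INI file. Key validation is left out because it would need the full key list of every layer type.

```python
import configparser
from pathlib import PurePosixPath

# Section names that do not describe a layer (compared in lower case).
SPECIAL_SECTIONS = {
    "model", "optimizer", "learningratescheduler",
    "train_set", "valid_set", "test_set",
}


def check_config(text):
    """Return a list of problems found; an empty list means none found."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    names = [name.strip().lower() for name in parser.sections()]
    problems = []
    if "model" not in names:
        problems.append("missing [Model] section")
    if not [name for name in names if name not in SPECIAL_SECTIONS]:
        problems.append("no layer section")
    return problems


def resolve_path(ini_file, value):
    """Resolve a path property against the directory of the INI file.

    PurePosixPath keeps the example platform-independent; joining with an
    absolute path returns that absolute path unchanged.
    """
    return PurePosixPath(ini_file).parent / value.strip('"')


print(check_config("[Optimizer]\ntype = adam\n"))
print(resolve_path("/models/mnist/model.ini", '"trainDataset.dat"'))
```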

Below is a sample configuration file. It takes 1 x 28 x 28 grayscale data (0~255) as input. The Adam optimizer is used to apply gradients, and the learning rate is 1.0e-4.

# Model Section
[Model]
type = NeuralNetwork          # Network Type : Regression, KNN, NeuralNetwork
epochs = 1500                 # Epochs
loss = cross                  # Loss function : mse (mean squared error)
                              #                 cross ( for cross entropy )
save_path = "mnist_model.bin" # model path to save / read
batch_size = 32               # batch size

[Optimizer]
type = adam
beta1 = 0.9       # beta 1 for adam
beta2 = 0.999     # beta 2 for adam
epsilon = 1e-7    # epsilon for adam

[LearningRateScheduler]
type=constant
learning_rate = 1e-4 # Learning Rate

# Train Set Section
[train_set]
type = file
path = "trainDataset.dat"

# Layer Section : Name
[inputlayer]
type = input
input_shape = 1:28:28

# Layer Section : Name
[conv2d_c1_layer]
type = conv2d
input_layers = inputlayer
kernel_size = 5,5
bias_initializer = zeros
activation = sigmoid
weight_initializer = xavier_uniform
filters = 6
stride = 1,1
padding = 0,0

[pooling2d_p1]
type = pooling2d
input_layers = conv2d_c1_layer
pool_size = 2,2
stride = 2,2
padding = 0,0
pooling = average

[conv2d_c2_layer]
type = conv2d
input_layers = pooling2d_p1
kernel_size = 5,5
bias_initializer = zeros
activation = sigmoid
weight_initializer = xavier_uniform
filters = 12
stride = 1,1
padding = 0,0

[pooling2d_p2]
type = pooling2d
input_layers = conv2d_c2_layer
pool_size = 2,2
stride =2,2
padding = 0,0
pooling = average

[flatten]
type = flatten
input_layers = pooling2d_p2

[outputlayer]
type = fully_connected
input_layers = flatten
unit = 10		# Output Layer Dimension ( = Weight Width )
weight_initializer = xavier_uniform
bias_initializer = zeros
activation = softmax 	# activation : sigmoid, softmax
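
To follow how the tensor shape changes through the layers of this example, the Python sketch below traces `channel:height:width` from layer to layer. It uses the standard output-size formula `(size + 2 * pad - kernel) // stride + 1` for convolution and pooling. The layer list is transcribed by hand from the file above, and representing the fully connected output as `1:1:unit` is an assumption. `out_size` and `trace_shapes` are hypothetical helpers, not NNTrainer functions.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, type, parameters) transcribed from the example; padding is 0 throughout.
LAYERS = [
    ("conv2d_c1_layer", "conv2d", {"filters": 6, "kernel": 5, "stride": 1}),
    ("pooling2d_p1", "pooling2d", {"kernel": 2, "stride": 2}),
    ("conv2d_c2_layer", "conv2d", {"filters": 12, "kernel": 5, "stride": 1}),
    ("pooling2d_p2", "pooling2d", {"kernel": 2, "stride": 2}),
    ("flatten", "flatten", {}),
    ("outputlayer", "fully_connected", {"unit": 10}),
]


def trace_shapes(input_shape, layers):
    """Return {layer name: (channel, height, width)} for each layer output."""
    c, h, w = input_shape
    shapes = {"inputlayer": (c, h, w)}
    for name, kind, params in layers:
        if kind in ("conv2d", "pooling2d"):
            h = out_size(h, params["kernel"], params["stride"])
            w = out_size(w, params["kernel"], params["stride"])
            c = params.get("filters", c)  # pooling keeps the channel count
        elif kind == "flatten":
            c, h, w = 1, 1, c * h * w  # c:h:w becomes 1:1:chw
        elif kind == "fully_connected":
            c, h, w = 1, 1, params["unit"]
        shapes[name] = (c, h, w)
    return shapes


for layer, shape in trace_shapes((1, 28, 28), LAYERS).items():
    print(layer, ":".join(str(dim) for dim in shape))
```

Under these assumptions the flatten layer produces 1:1:192, so the output layer maps 192 inputs to 10 units.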
