GELU

Versioned name : Gelu-2

Category : Activation function

Short description : Gaussian error linear unit element-wise activation function.

Detailed description

The Gelu operation is introduced in this article. It applies the activation function element-wise to the input tensor, according to the following formula:

$$Gelu(x) = x \cdot \Phi(x) = x \cdot \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{x}{\sqrt{2}} \right) \right]$$

where Φ(x) is the cumulative distribution function of the standard Gaussian distribution.

Additionally, the Gelu function may be approximated as follows:

$$Gelu(x) \approx 0.5 \cdot x \cdot \left( 1 + \tanh\left[ \sqrt{\frac{2}{\pi}} \left( x + 0.044715 \cdot x^{3} \right) \right] \right)$$
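
The two formulas can be compared numerically. The following is a minimal Python sketch, not the reference implementation of the operation: the function names `gelu_exact` and `gelu_tanh` are hypothetical and chosen for illustration, and only the standard `math` module is assumed.

```python
import math


def gelu_exact(x: float) -> float:
    """Gelu(x) = x * Phi(x), with Phi expressed through the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of Gelu."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print both variants and their absolute difference for a few sample points.
    for x in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
        exact = gelu_exact(x)
        approx = gelu_tanh(x)
        print(f"x={x:+.1f}  exact={exact:+.6f}  tanh={approx:+.6f}  "
              f"diff={abs(exact - approx):.2e}")
```

On these sample points the two variants should agree closely, which is why the approximation is often acceptable where computing erf is costly.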

Attributes : The Gelu operation has no attributes.

Inputs :

  • 1 : A tensor of type T and arbitrary shape. Required.

Outputs :

  • 1 : The result of the element-wise Gelu function applied to the input tensor. A tensor of type T with the same shape as the input tensor.

Types

  • T : any supported floating-point type.

Example

<layer ... type="Gelu">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>
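
The element-wise behaviour of the layer above can be illustrated in plain Python. This is only a sketch under stated assumptions: the helper `gelu_2d` and the input values are hypothetical, and a nested list stands in for a tensor of shape [1, 128].

```python
import math


def gelu(x: float) -> float:
    """Gelu(x) = x * Phi(x), using the erf-based formula."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_2d(tensor):
    """Apply Gelu to every element of a 2-D nested list, keeping its shape."""
    return [[gelu(v) for v in row] for row in tensor]


# Hypothetical input data with shape [1, 128], matching the ports above.
data = [[(i - 64) / 16.0 for i in range(128)]]
result = gelu_2d(data)

# The output has the same shape as the input.
print(len(result), len(result[0]))  # prints "1 128"
```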