A roundup of activation functions: formulas, derivations, and NumPy implementations in one long, practical reference

AI浩, posted 2022/01/15 05:26:33

1. Implementing activation functions

1.1 sigmoid

1.1.1 Function

Function:

$$f(x)=\frac{1}{1+e^{-x}}$$

1.1.2 Derivative

Derivation:

By the quotient rule:

$$\left(\frac{u}{v}\right)'=\frac{u'v-uv'}{v^{2}}$$

$$\begin{aligned}
f'(x) &=\left(\frac{1}{1+e^{-x}}\right)' \\
&=\frac{1' \times\left(1+e^{-x}\right)-1 \times\left(1+e^{-x}\right)'}{\left(1+e^{-x}\right)^{2}} \\
&=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} \\
&=\frac{1+e^{-x}-1}{\left(1+e^{-x}\right)^{2}} \\
&=\left(\frac{1}{1+e^{-x}}\right)\left(1-\frac{1}{1+e^{-x}}\right) \\
&=f(x)(1-f(x))
\end{aligned}$$

1.1.3 Code implementation

import numpy as np

class Sigmoid():
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def gradient(self, x):
        return self.__call__(x) * (1 - self.__call__(x))
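
As a quick sanity check on the derivation, the analytic gradient f(x)(1 - f(x)) can be compared against a central finite difference. This is only a minimal sketch: the helper name `numerical_gradient` and the step size `eps` are assumptions made for illustration and are not part of the original code.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid, same formula as the Sigmoid class."""
    return 1 / (1 + np.exp(-x))


def sigmoid_gradient(x):
    """Analytic derivative f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1 - s)


def numerical_gradient(f, x, eps=1e-5):
    """Central finite difference (hypothetical helper, used only for checking)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x = np.linspace(-5.0, 5.0, 11)
max_error = np.max(np.abs(sigmoid_gradient(x) - numerical_gradient(sigmoid, x)))
print(max_error)  # should be very small if the derivation is right
```

The derivative peaks at x = 0, where it equals 0.25, and decays toward 0 on both sides, which is the usual explanation for vanishing gradients with sigmoid.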

1.2 softmax

1.2.1 Function

Softmax is used for multi-class classification. It maps the outputs of several neurons into the interval (0, 1) so that they sum to 1, which lets them be read as class probabilities.

Suppose we have an array V, where V_i denotes its i-th element. The softmax value of that element is:

$$S_{i}=\frac{e^{V_{i}}}{\sum_{j} e^{V_{j}}}$$

For example, with three inputs z_1, z_2, z_3 the outputs are:

$$\begin{aligned}
y_{1}&=\frac{e^{z_{1}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}} \\
y_{2}&=\frac{e^{z_{2}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}} \\
y_{3}&=\frac{e^{z_{3}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}}
\end{aligned} \tag{1}$$

Gradient descent needs a loss function, and cross-entropy is the usual choice. With y_i the target label and a_i the softmax output for class i, it has the form:

$$Loss=-\sum_{i} y_{i} \ln a_{i} \tag{2}$$
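
To make equations (1) and (2) concrete, here is a small numeric sketch. The scores `z` and the one-hot target `y` are made-up values chosen for illustration, and the function names are assumptions rather than anything defined in the article.

```python
import numpy as np


def softmax(z):
    """Softmax along the last axis, shifted by the max for stability."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


def cross_entropy(y, a):
    """Loss = -sum_i y_i * ln(a_i) for a one-hot (or soft) target y."""
    return -np.sum(y * np.log(a), axis=-1)


z = np.array([1.0, 2.0, 3.0])  # example scores z1, z2, z3 (made-up values)
y = np.array([0.0, 0.0, 1.0])  # one-hot target: the third class is correct

a = softmax(z)
loss = cross_entropy(y, a)
print(a, a.sum())  # three probabilities that sum to 1
print(loss)        # about 0.4076, i.e. -ln(a[2])
```

With a one-hot target only the term for the correct class survives, so the loss is simply the negative log of the probability assigned to that class.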

1.2.2 Derivative

The derivative splits into two cases, depending on whether S_i is differentiated with respect to its own input V_i (j = i) or with respect to another input V_j (j ≠ i).

Case 1, j = i:

$$\begin{aligned}
\frac{\partial S_{i}}{\partial V_{i}} &=\left(\frac{e^{V_{i}}}{\sum_{k} e^{V_{k}}}\right)' \\
&=\frac{e^{V_{i}} \times \sum_{k} e^{V_{k}}-e^{V_{i}} \times e^{V_{i}}}{\left(\sum_{k} e^{V_{k}}\right)^{2}} \\
&=\frac{e^{V_{i}}}{\sum_{k} e^{V_{k}}}-\frac{e^{V_{i}}}{\sum_{k} e^{V_{k}}} \times \frac{e^{V_{i}}}{\sum_{k} e^{V_{k}}} \\
&=\frac{e^{V_{i}}}{\sum_{k} e^{V_{k}}}\left(1-\frac{e^{V_{i}}}{\sum_{k} e^{V_{k}}}\right) \\
&=S_{i}(1-S_{i})
\end{aligned}$$

That is, f(1 - f) with f = S_i.

Case 2, j ≠ i: the numerator e^{V_i} does not depend on V_j, so only the denominator contributes:

$$\frac{\partial S_{i}}{\partial V_{j}}=\frac{0 \times \sum_{k} e^{V_{k}}-e^{V_{i}} \times e^{V_{j}}}{\left(\sum_{k} e^{V_{k}}\right)^{2}}=-S_{i} S_{j}$$

1.2.3 Code implementation

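import numpy as np

class Softmax():
    def __call__(self, x):
        # Subtract the row-wise max before exponentiating to avoid overflow;
        # the shift cancels in the ratio, so the result is unchanged.
        e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
        return e_x / np.sum(e_x, axis=-1, keepdims=True)

    def gradient(self, x):
        # Diagonal of the Jacobian only (the j = i case): dS_i/dV_i = S_i * (1 - S_i).
        p = self.__call__(x)
        return p * (1 - p)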
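
The `gradient` method returns only the j = i terms. For a single 1-D input, the full matrix of partial derivatives dS_i/dV_j can be sketched as below. The function name `softmax_jacobian` is an assumption for illustration, and the off-diagonal entries use the standard result dS_i/dV_j = -S_i S_j for j ≠ i.

```python
import numpy as np


def softmax(v):
    """Softmax of a 1-D array."""
    e = np.exp(v - np.max(v))
    return e / np.sum(e)


def softmax_jacobian(v):
    """J[i, j] = dS_i/dV_j = S_i * (delta_ij - S_j) for a 1-D input."""
    s = softmax(v)
    return np.diag(s) - np.outer(s, s)


v = np.array([1.0, 2.0, 3.0])
s = softmax(v)
J = softmax_jacobian(v)

# The diagonal reproduces p * (1 - p), the value returned by Softmax.gradient.
print(np.allclose(np.diag(J), s * (1 - s)))
# Each row sums to zero because the outputs always sum to 1.
print(np.allclose(J.sum(axis=1), 0.0))
```

Whether the diagonal alone is enough depends on how the layer is combined with the loss, so treat the element-wise `p * (1 - p)` as a simplification rather than the complete derivative.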

1.3 tanh

1.3.1 Function

$$\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$

1.3.2 Derivative

Derivation:

$$\begin{aligned}
\tanh'(x) &=\left(\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\right)' \\
&=\frac{\left(e^{x}-e^{-x}\right)'\left(e^{x}+e^{-x}\right)-\left(e^{x}-e^{-x}\right)\left(e^{x}+e^{-x}\right)'}{\left(e^{x}+e^{-x}\right)^{2}} \\
&=\frac{\left(e^{x}+e^{-x}\right)^{2}-\left(e^{x}-e^{-x}\right)^{2}}{\left(e^{x}+e^{-x}\right)^{2}} \\
&=1-\left(\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\right)^{2} \\
&=1-\tanh^{2}(x)
\end{aligned}$$

1.3.3 Code implementation

import numpy as np
class TanH():
    def __call__(self, x):
        return 2 / (1 + np.exp(-2*x)) - 1

    def gradient(self, x):
        return 1 - np.power(self.__call__(x), 2)
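
The class computes tanh as 2 / (1 + e^{-2x}) - 1 rather than from the definition. The two agree because multiplying the numerator and denominator of the definition by e^{-x} gives (1 - e^{-2x}) / (1 + e^{-2x}). A short sketch comparing this form with NumPy's built-in `np.tanh` (the function name `tanh_via_sigmoid` is an assumption for illustration):

```python
import numpy as np


def tanh_via_sigmoid(x):
    """tanh(x) written as 2 * sigmoid(2x) - 1, as in the TanH class."""
    return 2 / (1 + np.exp(-2 * x)) - 1


x = np.linspace(-3.0, 3.0, 13)
matches_numpy = np.allclose(tanh_via_sigmoid(x), np.tanh(x))
gradient = 1 - tanh_via_sigmoid(x) ** 2  # 1 - tanh(x)^2

print(matches_numpy)
print(gradient.max())  # the largest slope is at x = 0
```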

1.4 relu

1.4.1 Function

$$f(x)=\max(0, x)$$

1.4.2 Derivative

$$f'(x)=\begin{cases} 1 & \text{if } x>0 \\ 0 & \text{if } x \leq 0 \end{cases}$$

1.4.3 Code implementation

import numpy as np
class ReLU():
    def __call__(self, x):
        return np.where(x >= 0, x, 0)

    def gradient(self, x):
        # Strict inequality so the gradient at x = 0 is 0, as in the derivative formula.
        return np.where(x > 0, 1, 0)

1.5 leakyrelu

1.5.1 Function

$$f(x)=\max(a x, x), \quad 0<a<1$$

1.5.2 Derivative

$$f'(x)=\begin{cases} 1 & \text{if } x>0 \\ a & \text{if } x \leq 0 \end{cases}$$

1.5.3 Code implementation

import numpy as np
class LeakyReLU():
    def __init__(self, alpha=0.2):
        self.alpha = alpha

    def __call__(self, x):
        return np.where(x >= 0, x, self.alpha * x)

    def gradient(self, x):
        # Strict inequality so the gradient at x = 0 is alpha, as in the derivative formula.
        return np.where(x > 0, 1, self.alpha)
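
To see how ReLU and LeakyReLU differ in practice, the sketch below evaluates both on a few sample values, following the piecewise derivative formulas (gradient 0 or a at x = 0). The input values are made up, and `alpha = 0.2` mirrors the class default.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
alpha = 0.2  # same default slope as the LeakyReLU class

relu = np.maximum(0.0, x)
leaky = np.maximum(alpha * x, x)  # max(ax, x) is valid because 0 < alpha < 1

relu_grad = np.where(x > 0, 1.0, 0.0)
leaky_grad = np.where(x > 0, 1.0, alpha)

print(relu.tolist())        # [0.0, 0.0, 0.0, 1.5]
print(leaky.tolist())       # [-0.4, -0.1, 0.0, 1.5]
print(relu_grad.tolist())   # [0.0, 0.0, 0.0, 1.0]
print(leaky_grad.tolist())  # [0.2, 0.2, 0.2, 1.0]
```

For negative inputs ReLU passes back no gradient at all, while LeakyReLU still passes back a small one, which is the motivation usually given for the leaky variant.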

1.6 ELU

1.6.1 Function

$$f(x)=\begin{cases} x, & \text{if } x \geq 0 \\ a\left(e^{x}-1\right), & \text{if } x<0 \end{cases}$$

1.6.2 Derivative

For x >= 0, the derivative is 1.

For x < 0, the derivative is obtained as follows:

$$\begin{aligned}
f'(x) &=\left(a\left(e^{x}-1\right)\right)' \\
&=a e^{x} \\
&=a\left(e^{x}-1\right)+a \\
&=f(x)+a
\end{aligned}$$

So the complete derivative is:

$$f'(x)=\begin{cases} 1 & \text{if } x \geq 0 \\ f(x)+a=a e^{x} & \text{if } x<0 \end{cases}$$

1.6.3 Code implementation

import numpy as np
class ELU():
    def __init__(self, alpha=0.1):
        self.alpha = alpha 

    def __call__(self, x):
        return np.where(x >= 0.0, x, self.alpha * (np.exp(x) - 1))

    def gradient(self, x):
        return np.where(x >= 0.0, 1, self.__call__(x) + self.alpha)
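
One practical detail: `np.where` evaluates both branches for every element, so `np.exp(x)` is also computed for large positive x, where it overflows to `inf` and emits a warning even though that branch is discarded. A possible workaround, sketched here under the assumption that clamping the exponent input to non-positive values is acceptable, is to exponentiate `np.minimum(x, 0)` instead. The function names are hypothetical.

```python
import numpy as np


def elu(x, alpha=0.1):
    """ELU that only exponentiates non-positive values."""
    x = np.asarray(x, dtype=float)
    negative_branch = alpha * (np.exp(np.minimum(x, 0.0)) - 1)
    return np.where(x >= 0.0, x, negative_branch)


def elu_gradient(x, alpha=0.1):
    """1 for x >= 0, alpha * e^x (= f(x) + alpha) for x < 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))


x = np.array([-2.0, -1.0, 0.0, 1000.0])
y = elu(x)
g = elu_gradient(x)

# For x < 0 the two forms of the derivative agree: f(x) + alpha == alpha * e^x.
print(np.allclose(g[:2], y[:2] + 0.1))
print(y[3], g[3])  # the large positive input passes through without overflow
```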

1.7 selu

1.7.1 Function

$$\operatorname{selu}(x)=\lambda \begin{cases} x & \text{if } x>0 \\ \alpha e^{x}-\alpha & \text{if } x \leq 0 \end{cases}$$

1.7.2 Derivative

$$\operatorname{selu}'(x)=\lambda \begin{cases} 1 & \text{if } x>0 \\ \alpha e^{x} & \text{if } x \leq 0 \end{cases}$$

1.7.3 Code implementation

import numpy as np
class SELU():
    # Reference : https://arxiv.org/abs/1706.02515,
    # https://github.com/bioinf-jku/SNNs/blob/master/SelfNormalizingNetworks_MLP_MNIST.ipynb
    def __init__(self):
        self.alpha = 1.6732632423543772848170429916717
        self.scale = 1.0507009873554804934193349852946 

    def __call__(self, x):
        return self.scale * np.where(x > 0.0, x, self.alpha * (np.exp(x) - 1))

    def gradient(self, x):
        # Strict inequality: at x = 0 the derivative formula takes the alpha * e^x branch.
        return self.scale * np.where(x > 0.0, 1, self.alpha * np.exp(x))
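
The specific values of alpha and scale come from the self-normalizing networks paper referenced in the code: they are chosen so that inputs with mean 0 and variance 1 keep roughly that mean and variance after the activation. A rough empirical sketch of that property follows; the sample size and random seed are arbitrary choices, and the constants are truncated to double precision.

```python
import numpy as np

ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """SELU; exponentiates only non-positive values to avoid overflow."""
    return SCALE * np.where(x > 0.0, x, ALPHA * (np.exp(np.minimum(x, 0.0)) - 1))


rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
z = rng.standard_normal(1_000_000)  # inputs with mean 0 and variance 1
out = selu(z)

print(out.mean(), out.var())  # both should stay close to 0 and 1
```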

1.8 softplus

1.8.1 Function

$$\operatorname{Softplus}(x)=\log\left(1+e^{x}\right)$$

1.8.2 Derivative

Here log is the natural logarithm (base e), so ln e = 1:

$$f'(x)=\frac{e^{x}}{\left(1+e^{x}\right) \ln e}=\frac{1}{1+e^{-x}}=\sigma(x)$$

1.8.3 Code implementation

import numpy as np
class SoftPlus():
    def __call__(self, x):
        return np.log(1 + np.exp(x))

    def gradient(self, x):
        return 1 / (1 + np.exp(-x))
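
The direct form `np.log(1 + np.exp(x))` overflows once x is large enough for `np.exp(x)` to exceed the float64 range. One alternative, shown here as a sketch rather than as the article's method, is NumPy's `np.logaddexp`, which computes log(e^a + e^b) in a numerically stable way. With a = 0 this is exactly log(1 + e^x).

```python
import numpy as np


def softplus(x):
    """log(1 + e^x) computed as logaddexp(0, x) = log(e^0 + e^x)."""
    return np.logaddexp(0.0, x)


x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
y = softplus(x)
print(y)  # finite everywhere; approaches 0 on the left and x on the right
```

For moderate inputs the result matches the direct formula, so the change only matters at the extremes.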

1.9 Swish

1.9.1 Function

$$f(x)=x \cdot \operatorname{sigmoid}(\beta x)$$

1.9.2 Derivative

Writing σ for the sigmoid function:

$$\begin{aligned}
f'(x) &=\sigma(\beta x)+\beta x \cdot \sigma(\beta x)(1-\sigma(\beta x)) \\
&=\sigma(\beta x)+\beta x \cdot \sigma(\beta x)-\beta x \cdot \sigma(\beta x)^{2} \\
&=\beta x \cdot \sigma(\beta x)+\sigma(\beta x)(1-\beta x \cdot \sigma(\beta x)) \\
&=\beta f(x)+\sigma(\beta x)(1-\beta f(x))
\end{aligned}$$

1.9.3 Code implementation

import numpy as np


class Swish(object):
    def __init__(self, b):
        self.b = b  # beta in the formula

    def __call__(self, x):
        # x * sigmoid(b * x); this form avoids the inf / inf that
        # exp(b * x) / (exp(b * x) + 1) produces for large b * x.
        return x / (1 + np.exp(-self.b * x))

    def gradient(self, x):
        # beta * f(x) + sigmoid(beta * x) * (1 - beta * f(x))
        sig = 1 / (1 + np.exp(-self.b * x))
        fx = x * sig
        return self.b * fx + sig * (1 - self.b * fx)
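
The parameter beta interpolates between several familiar functions, which a few sample values can illustrate. This is a sketch with made-up inputs; the function name `swish` and the choice of beta = 50 as a "large" value are assumptions for illustration.

```python
import numpy as np


def swish(x, beta):
    """x * sigmoid(beta * x)."""
    return x / (1 + np.exp(-beta * x))


x = np.array([-2.0, -0.5, 0.5, 2.0])
silu = x * (1 / (1 + np.exp(-x)))  # SiLU: x * sigmoid(x), see section 1.11

print(np.allclose(swish(x, 1.0), silu))                 # beta = 1 gives SiLU
print(np.allclose(swish(x, 0.0), x / 2))                # beta = 0 gives the linear map x / 2
print(np.allclose(swish(x, 50.0), np.maximum(0.0, x)))  # large beta approaches ReLU
```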

1.10 Mish

1.10.1 Function

$$f(x)=x \cdot \tanh\left(\ln\left(1+e^{x}\right)\right)$$

1.10.2 Derivative

$$\begin{aligned}
f'(x) &=\operatorname{sech}^{2}(\operatorname{softplus}(x)) \cdot x \cdot \operatorname{sigmoid}(x)+\frac{f(x)}{x} \\
&=\Delta(x) \operatorname{swish}(x)+\frac{f(x)}{x}
\end{aligned}$$

where softplus(x) = ln(1 + e^x), sigmoid(x) = 1 / (1 + e^{-x}), Δ(x) = sech^2(softplus(x)) and swish(x) = x · sigmoid(x). The term f(x) / x is simply tanh(softplus(x)), which is the form the code uses.

1.10.3 Code implementation

import numpy as np


def sech(x):
    """Hyperbolic secant."""
    return 2 / (np.exp(x) + np.exp(-x))


def sigmoid(x):
    """Sigmoid function."""
    return 1 / (1 + np.exp(-x))


def soft_plus(x):
    """Softplus function."""
    return np.log(1 + np.exp(x))


def tan_h(x):
    """Hyperbolic tangent."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


class Mish:

    def __call__(self, x):
        return x * tan_h(soft_plus(x))

    def gradient(self, x):
        return sech(soft_plus(x)) * sech(soft_plus(x)) * x * sigmoid(x) + tan_h(soft_plus(x))
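
The derivative formula contains f(x) / x, which is undefined at x = 0, while the code uses the equivalent tanh(softplus(x)). The sketch below checks that the two forms agree for nonzero x and that the code form is well defined at 0. It swaps the hand-written helpers for NumPy's built-in `np.tanh` and `np.logaddexp` (an assumption made here for numerical stability, not the article's implementation) and uses sech^2 = 1 - tanh^2.

```python
import numpy as np


def mish(x):
    """x * tanh(softplus(x))."""
    return x * np.tanh(np.logaddexp(0.0, x))


def mish_gradient(x):
    """sech^2(softplus(x)) * x * sigmoid(x) + tanh(softplus(x))."""
    t = np.tanh(np.logaddexp(0.0, x))
    sig = 1 / (1 + np.exp(-x))
    return (1 - t ** 2) * x * sig + t  # sech^2 = 1 - tanh^2


x = np.array([-3.0, -1.0, 1.0, 3.0])  # nonzero, so f(x) / x is defined
t = np.tanh(np.logaddexp(0.0, x))
formula = (1 - t ** 2) * x / (1 + np.exp(-x)) + mish(x) / x

print(np.allclose(mish_gradient(x), formula))
print(mish_gradient(np.array([0.0])))  # tanh(ln 2) = 0.6, no division by zero
print(mish(np.array([800.0])))         # stays finite for large inputs
```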

1.11 SiLU

1.11.1 Function

$$f(x)=x \times \operatorname{sigmoid}(x)$$

1.11.2 Derivative

Derivation:

$$\begin{aligned}
f'(x) &=(x \cdot \operatorname{sigmoid}(x))' \\
&=\operatorname{sigmoid}(x)+x \cdot \operatorname{sigmoid}(x)(1-\operatorname{sigmoid}(x)) \\
&=\operatorname{sigmoid}(x)+x \cdot \operatorname{sigmoid}(x)-x \cdot \operatorname{sigmoid}^{2}(x) \\
&=f(x)+\operatorname{sigmoid}(x)(1-f(x))
\end{aligned}$$

1.11.3 Code implementation

import numpy as np


def sigmoid(x):
    """Sigmoid function."""
    return 1 / (1 + np.exp(-x))


class SILU(object):

    def __call__(self, x):
        return x * sigmoid(x)

    def gradient(self, x):
        return self.__call__(x) + sigmoid(x) * (1 - self.__call__(x))

1.12 Complete code

Create a file named activation_function.py and copy the code below into it. That completes the activation functions.

import numpy as np


# Collection of activation functions
# Reference: https://en.wikipedia.org/wiki/Activation_function

class Sigmoid():
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def gradient(self, x):
        return self.__call__(x) * (1 - self.__call__(x))


class Softmax():
    def __call__(self, x):
        e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
        return e_x / np.sum(e_x, axis=-1, keepdims=True)

    def gradient(self, x):
        # Diagonal of the Jacobian only (the j = i case): dS_i/dV_i = S_i * (1 - S_i).
        p = self.__call__(x)
        return p * (1 - p)


class TanH():
    def __call__(self, x):
        return 2 / (1 + np.exp(-2 * x)) - 1

    def gradient(self, x):
        return 1 - np.power(self.__call__(x), 2)


class ReLU():
    def __call__(self, x):
        return np.where(x >= 0, x, 0)

    def gradient(self, x):
        # Strict inequality so the gradient at x = 0 is 0, as in the derivative formula.
        return np.where(x > 0, 1, 0)


class LeakyReLU():
    def __init__(self, alpha=0.2):
        self.alpha = alpha

    def __call__(self, x):
        return np.where(x >= 0, x, self.alpha * x)

    def gradient(self, x):
        # Strict inequality so the gradient at x = 0 is alpha, as in the derivative formula.
        return np.where(x > 0, 1, self.alpha)


class ELU(object):
    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def __call__(self, x):
        return np.where(x >= 0.0, x, self.alpha * (np.exp(x) - 1))

    def gradient(self, x):
        return np.where(x >= 0.0, 1, self.__call__(x) + self.alpha)


class SELU():
    # Reference : https://arxiv.org/abs/1706.02515,
    # https://github.com/bioinf-jku/SNNs/blob/master/SelfNormalizingNetworks_MLP_MNIST.ipynb
    def __init__(self):
        self.alpha = 1.6732632423543772848170429916717
        self.scale = 1.0507009873554804934193349852946

    def __call__(self, x):
        return self.scale * np.where(x > 0.0, x, self.alpha * (np.exp(x) - 1))

    def gradient(self, x):
        # Strict inequality: at x = 0 the derivative formula takes the alpha * e^x branch.
        return self.scale * np.where(x > 0.0, 1, self.alpha * np.exp(x))


class SoftPlus(object):
    def __call__(self, x):
        return np.log(1 + np.exp(x))

    def gradient(self, x):
        return 1 / (1 + np.exp(-x))


class Swish(object):
    def __init__(self, b):
        self.b = b  # beta in the formula

    def __call__(self, x):
        # x * sigmoid(b * x); this form avoids the inf / inf that
        # exp(b * x) / (exp(b * x) + 1) produces for large b * x.
        return x / (1 + np.exp(-self.b * x))

    def gradient(self, x):
        # beta * f(x) + sigmoid(beta * x) * (1 - beta * f(x))
        sig = 1 / (1 + np.exp(-self.b * x))
        fx = x * sig
        return self.b * fx + sig * (1 - self.b * fx)


def sech(x):
    """Hyperbolic secant."""
    return 2 / (np.exp(x) + np.exp(-x))


def sigmoid(x):
    """Sigmoid function."""
    return 1 / (1 + np.exp(-x))


def soft_plus(x):
    """Softplus function."""
    return np.log(1 + np.exp(x))


def tan_h(x):
    """Hyperbolic tangent."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


class Mish:

    def __call__(self, x):
        return x * tan_h(soft_plus(x))

    def gradient(self, x):
        return sech(soft_plus(x)) * sech(soft_plus(x)) * x * sigmoid(x) + tan_h(soft_plus(x))

class SILU(object):

    def __call__(self, x):
        return x * sigmoid(x)

    def gradient(self, x):
        return self.__call__(x) + sigmoid(x) * (1 - self.__call__(x))
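
To show where `__call__` and `gradient` fit into training, here is a minimal sketch of one forward and backward pass through a single dense layer. Everything outside the `Sigmoid` class is an assumption for illustration: the weights `W`, bias `b`, input `x`, target, and the squared-error loss are made up. The class is repeated so the snippet runs by itself; in practice it would be imported from activation_function.py.

```python
import numpy as np


class Sigmoid():
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def gradient(self, x):
        return self.__call__(x) * (1 - self.__call__(x))


rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
W = rng.standard_normal((3, 4))  # hypothetical layer weights
b = np.zeros(3)                  # hypothetical bias
x = rng.standard_normal(4)       # one input sample
target = np.array([0.0, 1.0, 0.0])

act = Sigmoid()

# Forward pass: pre-activation z, then activation a.
z = W @ x + b
a = act(z)
loss = 0.5 * np.sum((a - target) ** 2)

# Backward pass: chain rule through the activation.
grad_a = a - target                # dLoss/da for the squared error
grad_z = grad_a * act.gradient(z)  # dLoss/dz uses the activation's gradient at z
grad_W = np.outer(grad_z, x)       # dLoss/dW
print(loss, grad_W.shape)
```

Note that `gradient` is evaluated at the pre-activation `z`, not at the output `a`, which matches how every class in this file defines it.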


Reference formulas
$$(C)'=0$$
$$\left(a^{x}\right)'=a^{x} \ln a$$
$$\left(x^{\mu}\right)'=\mu x^{\mu-1}$$
$$\left(e^{x}\right)'=e^{x}$$
$$(\sin x)'=\cos x$$
$$\left(\log_{a} x\right)'=\frac{1}{x \ln a}$$
$$(\cos x)'=-\sin x$$
$$(\ln x)'=\frac{1}{x}$$
$$(\tan x)'=\sec^{2} x$$
$$(\arcsin x)'=\frac{1}{\sqrt{1-x^{2}}}$$
$$(\cot x)'=-\csc^{2} x$$
$$(\arccos x)'=-\frac{1}{\sqrt{1-x^{2}}}$$
$$(\sec x)'=\sec x \cdot \tan x$$
$$(\arctan x)'=\frac{1}{1+x^{2}}$$
$$(\csc x)'=-\csc x \cdot \cot x$$
$$(\operatorname{arccot} x)'=-\frac{1}{1+x^{2}}$$

Hyperbolic sine: $$\sinh x=\frac{e^{x}-e^{-x}}{2}$$
Hyperbolic cosine: $$\cosh x=\frac{e^{x}+e^{-x}}{2}$$
Hyperbolic tangent: $$\tanh x=\frac{\sinh x}{\cosh x}=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$
Hyperbolic cotangent: $$\coth x=\frac{1}{\tanh x}=\frac{e^{x}+e^{-x}}{e^{x}-e^{-x}}$$
Hyperbolic secant: $$\operatorname{sech} x=\frac{1}{\cosh x}=\frac{2}{e^{x}+e^{-x}}$$
Hyperbolic cosecant: $$\operatorname{csch} x=\frac{1}{\sinh x}=\frac{2}{e^{x}-e^{-x}}$$

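[Disclaimer] This content comes from a blogger in the Huawei Cloud developer community and does not represent the views or positions of Huawei Cloud or the Huawei Cloud developer community. Reposts must credit the source (Huawei Cloud community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find content in this community that appears to be plagiarized, please report it by email with supporting evidence. Once verified, the community will remove the infringing content immediately. Report email: cloudbbs@huaweicloud.com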