Multi-layer perceptron (MLP) is a universal classifier

</script> Notes from CMU 11-785. Lecture #1 ## Perceptron is a linear classifier A single perceptron (or neuron) acts as a linear classifier. `W` are weights, `X` is the input vector, `b` is the bias and `f` is the activation function - step function for binary classification. $$ Y = f(W^T X + b) $$ In two dimensions it will represent a line. $$ \begin{bmatrix} y \\ \end{bmatrix} = \begin{bmatrix} w1 & w2 \\ \end{bmatrix} \begin{bmatrix} x1 \\ x2 \end{bmatrix} + \begin{bmatrix} b \end{bmatrix} $$ ## Logic AND is a linear classification problem There exists atleast one linear decision boudnary that can separate the two outputs (0 and 1). An example is $$ f(x1 + x2 - 1.5) $$ | x1 | x2 | x1 + x2 - 1.5 | f(x1 + x2 - 1.5) | |----|----|---------------|------------------| | 0 | 0 | -1.5 | 0 | | 0 | 1 | -0.5 | 0 | | 1 | 0 | -0.5 | 0 | | 1 | 1 | 0.5 | 1 | Here is a way to visualize ```plaintext y | 1 + (1,1) | | 0 + (0,0) (0,1) | (1,0) +----------------------> x 0 1 ``` Tangentially, individual perceptrons can act as boolean gates (e.g AND, OR), network of perceptrons are boolean functions (eg. XOR) and MLPs are universal boolean functions. ## Perceptron(s) can create an arbitrary bounding box We can use multiple perceptrons to create an arbitrary bounding box. Let's create a triangle. Everything within the triangle belongs to class A, everything outside is class B. $$ \begin{bmatrix} y1 \\ y2 \\ y3 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x1 \\ x2 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} $$ The output of the above perceptrons can then be used by an AND layer. For examle, ```plaintext Input Layer: First Layer (Bounding Box): Second Layer (AND): (x1, x2) ----> [Perceptron 1: x ≥ 0.5] -----------> [AND Perceptron] ----> Output \ / \ / \-> [Perceptron 2: x ≤ 1.5] -/ /-> [Perceptron 3: y ≥ 0.5] -\ / \ / \ (x1, x2) ----> [Perceptron 4: y ≤ 1.5] -----------> ``` A universal classifier can then be a tree of such linear classifers + AND layers and could be used to model any classification problem with arbitary precision.