
Advanced Topics in Machine Learning for Bioinformatics and Biomedical Engineering
February 16, 2026
In vertebrates, the nervous system consists of two main parts:

By the numbers:

Images from iStockphoto and the Queensland Brain Institute

Sensory neurons convert a specific type of stimulus, via their receptors, into action potentials or graded receptor potentials.
Role: Activated by sensory input from the environment.
Types of input:
Structure: Mostly pseudounipolar — one axon split into two branches.



\[ y = H\left( \sum_{i=1}^{n} w_i x_i + b \right) \]
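A minimal numerical sketch of this threshold unit with the Heaviside step \(H\) (the function name and the AND-gate weights below are illustrative, not from the slides):

```python
import numpy as np

def heaviside_neuron(x, w, b):
    """Binary threshold unit: y = H(w.x + b), with H the Heaviside step."""
    return np.heaviside(np.dot(w, x) + b, 0.0)

# Example: a unit that fires only when both binary inputs are active (AND-like behaviour)
w = np.array([1.0, 1.0])
b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", heaviside_neuron(np.array(x, dtype=float), w, b))
```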
\[ \tau_m \frac{dV}{dt} = R \cdot I(t) \]
\[ \tau_m \frac{dV}{dt} = -(V(t) - V_{rest}) + R \cdot I(t) \]
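A small Euler-integration sketch of this leaky integrate-and-fire equation, assuming the usual threshold-and-reset convention for spikes (all parameter values are illustrative, not from the slides):

```python
import numpy as np

# Leaky integrate-and-fire: tau_m dV/dt = -(V - V_rest) + R*I(t)
tau_m, R = 10.0, 1.0                         # ms, MOhm (illustrative)
V_rest, V_th, V_reset = -65.0, -50.0, -65.0  # mV (assumed threshold/reset)
dt, T = 0.1, 100.0                           # ms

t = np.arange(0.0, T, dt)
I = np.where(t > 20.0, 20.0, 0.0)            # step current (nA) switched on at t = 20 ms

V = np.full_like(t, V_rest)
spikes = []
for k in range(1, len(t)):
    dV = (-(V[k - 1] - V_rest) + R * I[k - 1]) / tau_m
    V[k] = V[k - 1] + dt * dV
    if V[k] >= V_th:                         # threshold crossing: record a spike and reset
        spikes.append(t[k])
        V[k] = V_reset

print("number of spikes:", len(spikes))
```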


\[ C \frac{dV}{dt} = -g_L (V - E_L) + I(t) - w \]
\[ \tau_w \frac{dw}{dt} = a (V - E_L) - w \]
Spike condition:
Parameters:
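A small Euler sketch of this adaptive model, assuming the usual spike condition (when \(V\) reaches a threshold \(V_{th}\), \(V\) is reset and \(w\) is incremented by a fixed amount \(b\)); all parameter values are illustrative, not from the slides:

```python
import numpy as np

# Adaptive LIF: C dV/dt = -g_L (V - E_L) + I(t) - w ;  tau_w dw/dt = a (V - E_L) - w
C, g_L, E_L = 200.0, 10.0, -70.0        # pF, nS, mV (illustrative)
a, b, tau_w = 2.0, 60.0, 200.0          # nS, pA, ms (illustrative)
V_th, V_reset = -50.0, -58.0            # assumed spike condition: V >= V_th -> reset, w += b

dt, T = 0.1, 500.0                      # ms
t = np.arange(0.0, T, dt)
I = np.where(t > 50.0, 400.0, 0.0)      # pA step current

V, w = np.full_like(t, E_L), np.zeros_like(t)
n_spikes = 0
for k in range(1, len(t)):
    dV = (-g_L * (V[k - 1] - E_L) + I[k - 1] - w[k - 1]) / C
    dw = (a * (V[k - 1] - E_L) - w[k - 1]) / tau_w
    V[k] = V[k - 1] + dt * dV
    w[k] = w[k - 1] + dt * dw
    if V[k] >= V_th:                    # spike: reset V, increment adaptation current
        V[k] = V_reset
        w[k] += b
        n_spikes += 1

print("spikes:", n_spikes)              # adaptation slows the spike train over time
```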

STDP updates a synaptic weight \(w\) based on spike timing.
Let the timing difference be: \[\Delta t = t_{post} - t_{pre}\]
\(t_{pre}\) = the time when the presynaptic neuron (the neuron sending the signal along the synapse) fires a spike.
\(t_{post}\) = the time when the postsynaptic neuron (the neuron receiving the signal) fires a spike.
Updates a synaptic weight based on spike timing.
Let the timing difference be: \[\Delta t = t_{post} - t_{pre}\]
The closer the spikes occur in time, the larger the change in weight.
\[ \Delta w = \begin{cases} A_+ e^{-\Delta t / \tau_+}, & \Delta t > 0 \\ -A_- e^{\Delta t / \tau_-}, & \Delta t < 0 \end{cases} \]
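A minimal sketch of this pairwise window (the function name and the amplitude/time-constant values are assumptions for illustration):

```python
import numpy as np

def stdp_dw(delta_t, A_plus=0.01, A_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pairwise STDP: potentiate when pre precedes post (delta_t > 0), depress otherwise."""
    if delta_t > 0:
        return A_plus * np.exp(-delta_t / tau_plus)
    elif delta_t < 0:
        return -A_minus * np.exp(delta_t / tau_minus)
    return 0.0

for dt_ms in (-40, -10, -1, 1, 10, 40):
    print(f"delta_t = {dt_ms:+3d} ms  ->  dw = {stdp_dw(dt_ms):+.4f}")
```

The closer the two spikes, the larger \(|\Delta w|\); the sign depends on which spike came first.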
Weight dependence (multiplicative STDP)
The update depends on the current synaptic strength \(w\), e.g.: \[\Delta w = f(\Delta t)\,g(w)\]
Homeostatic scaling (stabilize activity)
If the unit fires too much or too little, the weights \(w_i\) are scaled to push its rate toward a target rate \(r^{*}\): \[w_i \leftarrow w_i\left(1+\eta\,(r^{*}-r)\right)\] where \(r\) is the current firing rate and \(\eta\) is a small step size (a small sketch follows after this list).
Metaplasticity (plasticity of plasticity)
The ability to learn changes over time: parameters like learning rate or thresholds adapt based on activity/history, e.g.: \[A_{+}=A_{+}(r), \qquad A_{-}=A_{-}(r)\]
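As mentioned above for homeostatic scaling, a minimal sketch of the multiplicative rescaling step (the weights, rates and step size are illustrative; in a real simulation \(r\) would be re-estimated after each update):

```python
import numpy as np

def homeostatic_scaling(w, r, r_target, eta=0.02):
    """Scale all incoming weights multiplicatively toward a target firing rate r*."""
    return w * (1.0 + eta * (r_target - r))

w = np.array([0.2, 0.5, 0.8])
r, r_target = 12.0, 5.0                 # Hz: the unit fires too fast, so weights shrink
for step in range(5):
    w = homeostatic_scaling(w, r, r_target)
    print(f"step {step + 1}: w = {np.round(w, 3)}")
```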
Hebb's postulate, from empirical findings (Donald Hebb, 1949):
“When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased”
Also:
“Cells that fire together, wire together.”

Assume a simple model of a neuron \(j\), represented as
\[ y_j = \mathbf{w_j}^\top \mathbf{x} \]
\[ \Delta w_{ij} = \eta y_j x_i \]
\[ \Delta \mathbf{w_{j}} = \eta \, ( \mathbf{w_j}^\top \mathbf{x} ) \mathbf{x} \]
If we use a set of data patterns \(\mathbf{S}\) for learning: \[ \Delta \mathbf{w_{j}} = \eta \sum_s ( \mathbf{w_j}^\top \mathbf{x^s} ) \mathbf{x^s} \propto \eta \langle ( \mathbf{w_j}^\top \mathbf{x} ) \mathbf{x} \rangle_S \]
\[ \Delta w_{ij} = \eta \, x_i \, y_j \]
\(x_i\) : presynaptic activity
\(y_j\) : postsynaptic activity
\(\eta\): learning rate
Vector form for one postsynaptic unit with input vector \(\mathbf{x}\) and output \(y\):
\[ \Delta \mathbf{w} = \eta \, y \, \mathbf{x} \]
\[ \Delta \mathbf{w_{j}} = \eta \langle ( \mathbf{w_j}^\top \mathbf{x} ) \mathbf{x} \rangle_S \]
\[ \lVert \mathbf{w}_{t+1} \rVert^2 = \lVert \mathbf{w}_t \rVert^2 + 2\eta\, y\, \mathbf{w}_t^\top \mathbf{x} + \eta^2 y^2 \lVert \mathbf{x} \rVert^2 \]
Since \(y = \mathbf{w}_t^\top \mathbf{x}\), the cross term equals \(2\eta y^2 \ge 0\), so the norm never decreases: the plain Hebbian rule lets the weights grow without bound.
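A toy numerical check of this unbounded growth (the random data, learning rate and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.005
w = 0.1 * rng.normal(size=3)

for step in range(1, 1001):
    x = rng.normal(size=3)             # zero-mean random input pattern
    y = w @ x                          # postsynaptic activity y = w^T x
    w += eta * y * x                   # plain Hebbian update: dw = eta * y * x
    if step in (100, 500, 1000):
        print(f"step {step:4d}  ||w|| = {np.linalg.norm(w):.3g}")
```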
\[ \Delta w_{ij} = \eta \, y_j \, (x_i - y_j w_{ij}) \]
\[ \Delta w_{i} = \eta \, y \, (x_i - y w_{i}) \]
\[ \Delta w_{i} = \eta \, y \, \bigl(x_i - \cancel{y w_{i}}\bigr) = \eta y x_i \]
\[ \Delta w_{i} = \eta \, y \, \bigl(\cancel{x_i} - y w_{i}\bigr) = - \eta y^2 w_i \]
\[ y = \mathbf{w}^\top \mathbf{x} \]
\[ \eta^{-1} \Delta \mathbf{w} = y \mathbf{x} - y^2 \mathbf{w} \]
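A numerical sketch of Oja's rule on synthetic two-dimensional data (the covariance matrix, learning rate and seed are illustrative); it anticipates the result derived below: \(\mathbf{w}\) converges to the leading eigenvector of \(C\) with unit norm.

```python
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[2.0, 1.2],            # anisotropic covariance: one dominant direction
                [1.2, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20000)

eta = 0.005
w = 0.1 * rng.normal(size=2)
for x in X:
    y = w @ x                          # y = w^T x
    w += eta * y * (x - y * w)         # Oja's rule: dw = eta * y * (x - y * w)

C = np.cov(X.T)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                   # leading eigenvector (largest eigenvalue)
print("||w||                =", round(np.linalg.norm(w), 3))
print("|cos(angle(w, PC1))| =", round(abs(w @ pc1) / np.linalg.norm(w), 4))
```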
We can evaluate three aspects of the Oja correction:
Let’s simplify the update rule a bit.
\[ \begin{aligned} \eta^{-1} \Delta \mathbf{w} &= \langle y \mathbf{x} - y^2 \mathbf{w} \rangle \\ &= \langle ( \mathbf{w}^\top \mathbf{x}) \mathbf{x} - ( \mathbf{w}^\top \mathbf{x})^2 \mathbf{w} \rangle \\ &= \langle \mathbf{x} ( \mathbf{w}^\top \mathbf{x}) - ( \mathbf{w}^\top \mathbf{x}) ( \mathbf{w}^\top \mathbf{x}) \mathbf{w} \rangle \\ &= \langle \mathbf{x} ( \mathbf{x}^\top \mathbf{w}) - ( \mathbf{w}^\top \mathbf{x}) ( \mathbf{x}^\top \mathbf{w}) \mathbf{w} \rangle \\ &= \langle ( \mathbf{x} \mathbf{x}^\top ) \mathbf{w} - ( \mathbf{w}^\top ( \mathbf{x} \mathbf{x}^\top ) \mathbf{w}) \mathbf{w} \rangle \\ &= \langle \mathbf{x} \mathbf{x}^\top \rangle \mathbf{w} - ( \mathbf{w}^\top \langle \mathbf{x} \mathbf{x}^\top \rangle \mathbf{w}) \mathbf{w} \\ &= C \mathbf{w} - ( \mathbf{w}^\top C \mathbf{w}) \mathbf{w} \end{aligned} \]
So
\[ \eta^{-1} \Delta \mathbf{w} = \bigl(C-(\mathbf{w}^\top C \mathbf{w})\,I \bigr)\, \mathbf{w} \]
The stationary (fixed) points are given by \[
\eta^{-1} \Delta \mathbf{w} = 0,
\] so \[
C\mathbf{w} = (\mathbf{w}^\top C \mathbf{w} ) \mathbf{w}.
\] If we define \(\lambda = \mathbf{w}^\top C \mathbf{w}\) we retrieve a standard eigenvalue problem: \[
C\mathbf{w} = \lambda \mathbf{w},
\] in which we see that
\[
\lambda = \mathbf{w}^\top C \mathbf{w} = \mathbf{w}^\top \lambda \mathbf{w} = \lambda ||\mathbf{w} ||^2
\]
So in the stationary case the norm of \(\mathbf{w}\) must be 1.
In summary, at stationary values of the weights (\(\eta^{-1} \langle \Delta \mathbf{w} \rangle = 0\)):
We have this dynamics: \[ \eta^{-1} \Delta \mathbf{w} = C\mathbf{w} - (\mathbf{w}^\top C \mathbf{w} ) \mathbf{w} \]
Then, we can perturb \(\mathbf{w}\) around \(e_\alpha\) to examine the stability of these fixed points:
\[ \mathbf{w} = e_\alpha + \varepsilon \]
Substitute into \(\eta^{-1} \Delta \mathbf{w}\); keeping terms to first order in \(\varepsilon\), and using \(Ce_\alpha=\lambda_\alpha e_\alpha\) and the symmetry of \(C\), we find: \[ \begin{aligned} \eta^{-1}\Delta \mathbf{w} &= C(e_\alpha+\varepsilon) -\bigl((e_\alpha+\varepsilon)^\top C(e_\alpha+\varepsilon)\bigr)(e_\alpha+\varepsilon) \\ &= (Ce_\alpha + C\varepsilon) -\Bigl(e_\alpha^\top C e_\alpha + e_\alpha^\top C\varepsilon + \varepsilon^\top C e_\alpha + \varepsilon^\top C\varepsilon\Bigr)(e_\alpha+\varepsilon) \\ &= Ce_\alpha + C\varepsilon -(e_\alpha^\top C e_\alpha)e_\alpha -(e_\alpha^\top C e_\alpha)\varepsilon -(e_\alpha^\top C\varepsilon)e_\alpha -(\varepsilon^\top C e_\alpha)e_\alpha + \mathcal{O}(\|\varepsilon\|^2) \\ &= \lambda_\alpha e_\alpha + C\varepsilon -\lambda_\alpha e_\alpha -\lambda_\alpha \varepsilon -(e_\alpha^\top C\varepsilon)e_\alpha -(\varepsilon^\top C e_\alpha)e_\alpha + \mathcal{O}(\|\varepsilon\|^2) \\ &= (C-\lambda_\alpha I)\varepsilon -\Bigl(e_\alpha^\top C\varepsilon+\varepsilon^\top C e_\alpha\Bigr)e_\alpha + \mathcal{O}(\|\varepsilon\|^2) \\ &= (C-\lambda_\alpha I)\varepsilon -2\lambda_\alpha (e_\alpha^\top \varepsilon)\,e_\alpha + \mathcal{O}(\|\varepsilon\|^2), \end{aligned} \]
So
\[ \eta^{-1}\Delta \mathbf{w} = (C-\lambda_\alpha I)\varepsilon -2\lambda_\alpha (e_\alpha^\top \varepsilon)\,e_\alpha + \mathcal{O}(\|\varepsilon\|^2) \]
We can simplify by projecting against one eigenvector \(e_\beta\) (note that \(\Delta\varepsilon = \Delta\mathbf{w}\), since \(e_\alpha\) is fixed): \[ \begin{aligned} \eta^{-1} e_\beta^\top \Delta \varepsilon &\approx e_\beta^\top\Bigl[(C-\lambda_\alpha I)\varepsilon -2\lambda_\alpha (e_\alpha^\top \varepsilon)\,e_\alpha\Bigr] \\[4pt] &= e_\beta^\top C\varepsilon -\lambda_\alpha e_\beta^\top \varepsilon -2\lambda_\alpha (e_\alpha^\top \varepsilon)\, e_\beta^\top e_\alpha \\[4pt] &= \lambda_\beta e_\beta^\top \varepsilon -\lambda_\alpha e_\beta^\top \varepsilon -2\lambda_\alpha (e_\alpha^\top \varepsilon)\,\delta_{\alpha\beta} \\[4pt] &= (\lambda_\beta-\lambda_\alpha)\,e_\beta^\top \varepsilon -2\lambda_\alpha \delta_{\alpha\beta}\, e_\alpha^\top \varepsilon \\[8pt] &= \begin{cases} -2\lambda_\alpha\, e_\alpha^\top \varepsilon, & \beta=\alpha,\\[4pt] (\lambda_\beta-\lambda_\alpha)\, e_\beta^\top \varepsilon, & \beta\neq \alpha. \end{cases} \end{aligned} \]
We have now
\[ \begin{aligned} \eta^{-1} e_\beta^\top \Delta \varepsilon &\approx \begin{cases} -2\lambda_\beta\, e_\beta^\top \varepsilon, & \beta=\alpha,\\[4pt] (\lambda_\beta-\lambda_\alpha)\, e_\beta^\top \varepsilon, & \beta\neq \alpha. \end{cases} \end{aligned} \]
then:
\[ \begin{aligned} e_\beta^\top \Delta \varepsilon &= e_\beta^\top ( \varepsilon^{n+1} - \varepsilon^{n} ) \\ &= (e_\beta^\top \varepsilon)^{n+1} - (e_\beta^\top \varepsilon)^{n} \\ &= \Delta (e_\beta^\top \varepsilon) \end{aligned} \]
Let’s define: \[ \begin{aligned} \mathbf{s}_\beta& := e_\beta^\top \varepsilon \\ \mathbf{\kappa}_{\alpha \beta}& := \begin{cases} \eta\, (-2\lambda_\alpha) & \beta=\alpha,\\[4pt] \eta\, (\lambda_\beta-\lambda_\alpha) & \beta\neq \alpha. \end{cases} \end{aligned} \]
then
\[ \Delta \mathbf{s}_\beta \sim \mathbf{\kappa}_{\alpha \beta} \mathbf{s}_\beta \]
So, we have three cases :)
Case 1 \(\begin{aligned} \beta &\neq \alpha \\ \lambda_\beta &< \lambda_\alpha\end{aligned}\)
\[ \eta^{-1} e_\beta^\top \Delta \varepsilon \approx (\lambda_\beta-\lambda_\alpha)\, e_\beta^\top \varepsilon \]
Case 2 \(\begin{aligned} \beta &\neq \alpha \\ \lambda_\beta &> \lambda_\alpha\end{aligned}\)
\[ \eta^{-1} e_\beta^\top \Delta \varepsilon \approx (\lambda_\beta-\lambda_\alpha)\, e_\beta^\top \varepsilon \]
Case 3 \(\begin{aligned} \beta &\neq \alpha \\ \lambda_\beta &= \lambda_\alpha\end{aligned}\)
\[ \eta^{-1} e_\beta^\top \Delta \varepsilon \approx (\lambda_\beta-\lambda_\alpha)\, e_\beta^\top \varepsilon = 0 \]
Decompose the weight direction around an eigenvector \(e_\alpha\) with eigenvalue \(\lambda_\alpha\):
If \(\lambda_\beta < \lambda_\alpha\): perturbations shrink \(\Rightarrow\) \(e_\alpha\) is stable against lower-eigenvalue directions.
If \(\lambda_\beta > \lambda_\alpha\): perturbations grow \(\Rightarrow\) \(e_\alpha\) is unstable (pushed toward larger eigenvalues).
If \(\lambda_\beta = \lambda_\alpha\): perturbations are neutral \(\Rightarrow\) any direction in the degenerate eigenspace can persist.
Oja’s rule drives \(\mathbf{w}\) toward the principal eigenvector (largest \(\lambda\)); only the top-eigenvalue subspace is asymptotically stable.
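A quick numerical check of this stability picture, iterating the averaged update \(\Delta\mathbf{w} = \eta\,(C\mathbf{w} - (\mathbf{w}^\top C \mathbf{w})\,\mathbf{w})\) from perturbed eigenvectors (the matrix, step size and seed are illustrative assumptions):

```python
import numpy as np

C = np.diag([3.0, 1.0, 0.5])             # covariance with distinct eigenvalues
eigvals, eigvecs = np.linalg.eigh(C)      # ascending order
e_top = eigvecs[:, -1]                    # principal eigenvector (lambda = 3.0)
e_mid = eigvecs[:, 1]                     # non-principal eigenvector (lambda = 1.0)

def run_averaged_oja(w, eta=0.05, steps=2000):
    """Iterate the averaged Oja dynamics: dw = eta * (C w - (w^T C w) w)."""
    for _ in range(steps):
        w = w + eta * (C @ w - (w @ C @ w) * w)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(2)
eps = 1e-3 * rng.normal(size=3)           # small random perturbation

w_mid = run_averaged_oja(e_mid + eps)     # saddle point: the component along e_top grows
w_top = run_averaged_oja(e_top + eps)     # stable point: the perturbation decays

print("start near e_mid -> |cos(w, e_mid)| =", round(abs(w_mid @ e_mid), 4),
      " |cos(w, e_top)| =", round(abs(w_mid @ e_top), 4))
print("start near e_top -> |cos(w, e_top)| =", round(abs(w_top @ e_top), 4))
```

Starting near the non-principal eigenvector, the weight vector escapes toward \(e_{top}\); starting near \(e_{top}\), it stays there, as the linearized analysis predicts.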

b2slab.upc.edu alexandre.perera@upc.edu