Variational Free Energy
Last updated
“Variational free energy is defined as a function of sensory outcomes and a probability density over their hidden causes.”
-Active Inference in Multiagent Systems; Chapter 4
Variational Free Energy is an information-theoretic quantity derived from Bayes' Theorem, but let's first explore Bae's Theorem, which is much easier to understand:
The probability of Chill happening, conditional upon Netflix [P(chill | Netflix)] =
Probability of Netflix happening, conditional upon Chill [P(Netflix | chill)] *
Probability of Chill happening [P(chill)], all divided by
Probability of Netflix happening [P(Netflix)].
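As a minimal sketch, the same formula can be written in a few lines of Python. The function name bayes_posterior and the three input probabilities are made up purely for illustration; they are not taken from any dataset.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(H | D) = P(D | H) * P(H) / P(D)."""
    if evidence <= 0:
        raise ValueError("the evidence P(D) must be positive")
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only to illustrate the arithmetic.
p_netflix_given_chill = 0.9  # P(Netflix | chill)
p_chill = 0.2                # P(chill)
p_netflix = 0.6              # P(Netflix)

p_chill_given_netflix = bayes_posterior(p_netflix_given_chill, p_chill, p_netflix)
print(f"P(chill | Netflix) = {p_chill_given_netflix:.2f}")
```

With these assumed inputs, knowing that Netflix is happening raises the probability of Chill from 0.2 to roughly 0.3.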
Variational Free Energy is computed using Bayesian statistics, a branch of statistics that treats probability as a degree of belief and makes explicit use of prior (a priori) knowledge about the outcome of an event. For reference:
Just as probabilities can be graphed along a distribution (Normal, also known as Gaussian; Pareto; etc.)…
…Bayesian probabilities can be graphed along a posterior distribution:
The distribution is called posterior because it comes after the relevant evidence has been accounted for:
In other words, what does the probability look like if we account for previously known data?
At a very basic level, this describes Learning.
The Prior probability of an event is the probability assigned to it before any new evidence is accounted for. When the Prior is combined with the Likelihood of the observed Data, the result of this integration (learning) is a Posterior Distribution.
That is, a summary of what you know after the data has been observed.
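The prior-to-posterior step can be sketched as a small loop. This is an illustrative toy, not a method from the quoted chapter: the hidden cause is assumed to be the bias of a coin, the three candidate biases and the observed flips are invented, and update is a hypothetical helper name.

```python
# Candidate values for a hidden cause: the probability that a coin lands heads.
hypotheses = [0.25, 0.5, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]  # no preference before seeing any data


def update(belief, hypotheses, observation):
    """One step of learning: posterior is proportional to likelihood * prior."""
    likelihood = [h if observation == "H" else 1 - h for h in hypotheses]
    unnormalized = [l * b for l, b in zip(likelihood, belief)]
    evidence = sum(unnormalized)  # P(observation) under the current belief
    return [u / evidence for u in unnormalized]


posterior = prior
for flip in "HHTH":  # made-up data: three heads and one tail
    posterior = update(posterior, hypotheses, flip)  # yesterday's posterior is today's prior

for h, p in zip(hypotheses, posterior):
    print(f"P(bias = {h} | data) = {p:.3f}")
```

Each observation reshapes the belief, and the posterior after one flip serves as the prior for the next, which is the sense in which the update describes learning.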
Statistics can help separate signal from noise. A distribution is really just a set of predictions about how much uncertainty (entropy, or randomness) may exist in the world. Put very simply, the term Variational refers to the variations amongst the unknowns and hidden states that will always exist because of (information) entropy. In statistics, the word comes from variational methods, in which a candidate distribution is varied until it approximates the true posterior as closely as possible.
So, the free energy corresponding to information entropy, which according to the Free Energy Principle is minimized by all self-organizing systems, is Variational Free Energy.
It can be conceptualized as the margin of error in approximating the posterior distribution:
Essentially, it is a Prediction Error, which every system is trying to curtail by minimizing its Variational Free Energy.
Why?
Because minimizing Variational Free Energy enables the system to generate accurate Models of the World. Generating Models of the World, in turn, will facilitate the minimization of Free Energy.
The process is circular.
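The "margin of error" reading can be made concrete with a small numerical sketch. It assumes the standard discrete form F = sum over s of q(s) * (ln q(s) - ln p(o, s)), where q is the system's approximate belief about the hidden states s and o is the sensory outcome. The two-state model, its numbers, and the name free_energy are hypothetical and chosen only for illustration.

```python
import math


def free_energy(q, prior, likelihood_of_obs):
    """F = sum_s q(s) * (ln q(s) - ln p(o, s)), with p(o, s) = p(o | s) * p(s)."""
    total = 0.0
    for q_s, p_s, p_o_given_s in zip(q, prior, likelihood_of_obs):
        if q_s > 0:  # terms with q(s) = 0 contribute nothing
            total += q_s * (math.log(q_s) - math.log(p_o_given_s * p_s))
    return total


# Hypothetical two-state world and one observed outcome o.
prior = [0.5, 0.5]              # p(s)
likelihood_of_obs = [0.9, 0.2]  # p(o | s) for the outcome that was observed

evidence = sum(l * p for l, p in zip(likelihood_of_obs, prior))  # p(o)
surprise = -math.log(evidence)
exact_posterior = [l * p / evidence for l, p in zip(likelihood_of_obs, prior)]

f_rough = free_energy([0.5, 0.5], prior, likelihood_of_obs)  # belief that ignores the data
f_exact = free_energy(exact_posterior, prior, likelihood_of_obs)

print(f"surprise        = {surprise:.4f}")
print(f"F (rough guess) = {f_rough:.4f}")
print(f"F (exact)       = {f_exact:.4f}")
```

In this sketch the free energy of the rough guess sits above the surprise, and the gap closes exactly when the belief equals the true posterior. That gap is the approximation error described above.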