klderivation

===== Assumption 1: Temporal progress as conditioning =====
  
First we need to model temporal progress of any kind. We'll go with a "spacetime" representation that is standard in measure theory. This works as follows. We assume that we have a collection of all the possible realizations of a process of interest. This is our sample space $\Omega$. To make things simple, let's assume this set is finite (but potentially huge). We also place a probability distribution $P$ over all the realizations $\omega \in \Omega$.
  
Now, any event (be it a choice, an observation, a thought, etc.) is a subset of $\Omega$. Whenever an event $e \subset \Omega$ occurs, we condition our sample space on $e$. This means that we restrict our attention to the elements $\omega \in e$ inside the event, and then renormalize our probabilities:
\[
  P(\omega|e) = \frac{P(\omega)}{P(e)} \quad \text{if } \omega \in e, \qquad P(\omega|e) = 0 \quad \text{otherwise.}
\]
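This restrict-and-renormalize step is easy to sketch in code. Below is a minimal illustration with a made-up six-element sample space and prior (all numbers are invented for the example, not taken from the text):

```python
import numpy as np

# Toy sample space: 6 realizations with a prior distribution P over them.
# (The specific numbers are made up for illustration.)
omega = np.arange(6)
P = np.array([0.10, 0.25, 0.15, 0.20, 0.20, 0.10])

def condition(P, event):
    """Restrict attention to the realizations in `event` and renormalize."""
    mask = np.isin(omega, list(event))
    restricted = np.where(mask, P, 0.0)
    return restricted / restricted.sum()

# The event "an even-numbered realization occurred":
e = {0, 2, 4}
P_given_e = condition(P, e)
```

Conditioning on a further event is just another call to `condition` on the already-conditioned distribution, which is exactly how temporal progress accumulates in this representation.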
===== Assumption 2: Restrictions on the cost function =====
  
Next, we'll impose constraints on the cost function. We want our cost function to capture efforts that are structurally consistent with the underlying probability space. (Later, we'll see how to relax these assumptions without compromising these structural constraints.) The following requirements are natural:
  
{{ ::cost-axioms.png?nolink |}}
===== Cost of deliberation =====
  
Now, based on our sketch above, let's calculate the cost of transforming the prior choice probabilities into posterior choice probabilities:
\[
  \begin{align}
    C(d|c) &= \sum_x P(x|d)\, C(x \cap d \,|\, x \cap c) + \frac{1}{\beta} \sum_x P(x|d) \log \frac{ P(x|d) }{ P(x|c) }.
  \end{align}
\]
We've obtained two expectation terms. The second is proportional to the Kullback-Leibler divergence of the posterior from the prior choice probabilities. What is the first expectation?
  
The first expectation represents the expected cost of each individual choice (if each choice were to occur deterministically). This is because each term $C(x \cap d|x \cap c)$ measures the cost of transforming the relative probability of a specific choice.
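Numerically, the two expectation terms are straightforward to evaluate. The sketch below uses made-up prior and posterior choice probabilities, made-up per-choice costs $C(x \cap d|x \cap c)$, and an illustrative inverse temperature $\beta$ (none of these values come from the derivation):

```python
import numpy as np

beta = 2.0                             # inverse temperature (illustrative)
prior = np.array([0.5, 0.3, 0.2])      # P(x|c): prior choice probabilities (made up)
posterior = np.array([0.2, 0.3, 0.5])  # P(x|d): posterior choice probabilities (made up)
costs = np.array([1.0, 0.5, 0.2])      # per-choice costs C(x ∩ d | x ∩ c) (made up)

# First expectation: expected cost of each individual choice.
expected_choice_cost = posterior @ costs

# Second expectation: proportional to KL(posterior || prior).
kl = np.sum(posterior * np.log(posterior / prior))

deliberation_cost = expected_choice_cost + kl / beta
```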
  
===== Connecting to the free energy objective =====
We can transform the above equality into a variational principle by replacing the individual choice costs $C(x \cap d|x \cap c)$ with arbitrary numbers. The resulting expression is convex in the posterior choice probabilities $P(x|d)$, so we get a nice and clean objective function with a unique minimum.
  
We can even go a step further: since the variational problem is translation-invariant in the costs, multiplying the expression by $-1$ lets us treat the resulting "negative costs plus a constant" as utilities, obtaining
\[
  \sum_x P(x|d) U(x) - \frac{1}{\beta} \sum_x P(x|d) \log \frac{ P(x|d) }{ P(x|c) }.
\]
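As a sanity check under assumed numbers, the sketch below maximizes this free energy objective. Its unique maximizer has the well-known closed form $P(x|d) \propto P(x|c)\, e^{\beta U(x)}$, and we verify that randomly drawn alternative posteriors never do better. The utilities, prior, and $\beta$ are invented for illustration:

```python
import numpy as np

beta = 2.0                          # inverse temperature (illustrative)
prior = np.array([0.5, 0.3, 0.2])   # P(x|c): prior choice probabilities (made up)
U = np.array([1.0, 2.0, 0.0])       # utilities U(x) (made up)

def free_energy(post):
    """Expected utility minus (1/beta) times KL(posterior || prior)."""
    return post @ U - np.sum(post * np.log(post / prior)) / beta

# Closed-form maximizer: prior reweighted by exp(beta * U), renormalized.
best = prior * np.exp(beta * U)
best /= best.sum()

# No randomly drawn posterior should beat the closed-form solution.
rng = np.random.default_rng(0)
for _ in range(200):
    q = rng.dirichlet(np.ones_like(prior))
    assert free_energy(q) <= free_energy(best) + 1e-9
```

Note how the maximizer interpolates between the prior (as $\beta \to 0$, deliberation is infinitely costly) and the pure utility maximizer (as $\beta \to \infty$, deliberation is free).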
  • Last modified: 2024/12/24 12:45
  • by pedroortega