belief_flows — created 2023/11/19, last modified 2025/04/08 15:20 by pedroortega
- **Prior:** We place a Gaussian distribution $P(w)$ over the parameters to represent our uncertainty. To simplify the exposition, we assume that the covariance matrix is diagonal, so that \[ P(w) = N(w; \mu, \Sigma) = \prod_n N(w_n; \mu_n, \sigma^2_n), \] where $\mu_n$ and $\sigma^2_n$ are the mean and variance of the $n$-th parameter.
- **Parameter choice:** The learning algorithm now has to choose model parameters to minimize the prediction error. It does so using Thompson sampling, that is, by sampling a parameter vector $\bar{w}$ from the prior distribution: \[ \bar{w} \sim P(w). \]
- **Evaluation of Loss and Local Update:** Once the parameter is chosen, the learning algorithm is given a supervised pair $(x, y)$ that it can use to evaluate the loss $\ell(y, \hat{y})$, where $\hat{y} = F_{\bar{w}}(x)$ is the predicted output. Based on this loss, the learning algorithm can calculate an updated parameter $\bar{w}'$ using SGD: \[ \bar{w}' = \bar{w} - \eta \nabla_{\bar{w}} \ell(y, \hat{y}), \] where $\eta > 0$ is the learning rate.
- **Global Update:** Now, the algorithm has to change its prior beliefs $P(w)$ into posterior beliefs $P'(w)$ that are consistent with the local update it has just witnessed: the whole distribution is moved so that the sampled parameter $\bar{w}$ flows to its updated value $\bar{w}'$.
- If we assume a quadratic error function with uncorrelated coordinates, the global update decouples across coordinates, so that each mean $\mu_n$ and variance $\sigma^2_n$ can be updated independently.
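The loop above can be sketched in a few lines of Python. This is a minimal toy sketch, not the page's exact update: the linear model, the learning rate `eta`, the synthetic target `w_true`, and the simplified global update (shift each mean by the sampled point's SGD step, shrink each variance geometrically) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Diagonal Gaussian belief over the weights of a toy linear model y = w . x
mu = np.zeros(2)    # per-coordinate means mu_n
var = np.ones(2)    # per-coordinate variances sigma_n^2
eta = 0.1           # SGD learning rate (illustrative choice)

w_true = np.array([1.5, -0.5])  # synthetic ground truth for the demo

for _ in range(500):
    # Supervised pair (x, y) from a toy data source
    x = rng.normal(size=2)
    y = w_true @ x

    # Parameter choice: Thompson-sample w_bar from the current belief P(w)
    w_bar = rng.normal(mu, np.sqrt(var))

    # Local update: one SGD step on the squared loss l = (y - y_hat)^2 / 2
    y_hat = w_bar @ x
    grad = -(y - y_hat) * x          # gradient of the loss w.r.t. w_bar
    w_bar_new = w_bar - eta * grad

    # Global update (simplified stand-in for the belief-flow rule):
    # translate each mean by the step the sampled point took, and shrink
    # the variance slightly so the belief concentrates over time.
    mu = mu + (w_bar_new - w_bar)
    var = 0.99 * var

print(mu)  # the means should drift toward w_true
```

Because the loss is quadratic and the covariance is diagonal, each coordinate of `mu` and `var` is indeed updated independently here, mirroring the decoupling noted above.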