Suppose two Bayesian agents are presented with the same spreadsheet – IID samples of data in each row, a feature in each column. Each agent develops a generative model of the data distribution. We’ll assume the two converge to the same predictive distribution, but may have different generative models containing different latent variables. We’ll also assume that the two agents develop their models independently, i.e. their models and latents don’t have anything to do with each other informationally except via the data. Under what conditions can a latent variable in one agent’s model be faithfully expressed in terms of the other agent’s latents?

Let’s put some math on that question.

The n “features” in the data are random variables $X_1, \ldots, X_n$. By assumption the two agents converge to the same predictive distribution (i.e. distribution of a data point), which we’ll call $P[X_1, \ldots, X_n]$, or $P[X]$ for short. Agent $j$’s generative model $M_j$ must account for all the interactions between the features, i.e. the features must be independent given the latent variables $\Lambda_j$ in model $M_j$. So, bundling all the latents together into one, we get the high-level graphical structure in which $\Lambda_j$ is the sole parent of every feature ($\Lambda_j \to X_i$ for each $i$), which says that all features are independent given the latents, under each agent’s model.
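To make "the latent accounts for all interactions between the features" concrete, here is a minimal sketch in Python. The toy model is an assumption for illustration only: one fair binary latent and two binary features, each a copy of the latent flipped with probability `FLIP`. The names `prob`, `marginal_gap`, and `conditional_gap` are hypothetical helpers, not anything from the argument itself.

```python
import itertools

# Hypothetical toy model (all numbers are illustrative assumptions):
# a fair binary latent, and two binary features which are each a copy of
# the latent flipped with probability FLIP.
P_LATENT = {0: 0.5, 1: 0.5}
FLIP = 0.1
VALUES = (0, 1)
NAMES = ("lam", "x1", "x2")


def p_feature_given_latent(x, lam):
    """P[x_i | lam] for a noisy copy of the latent."""
    return FLIP if x != lam else 1.0 - FLIP


# Joint P[lam, x1, x2] = P[lam] * P[x1 | lam] * P[x2 | lam].
joint = {
    (lam, x1, x2): P_LATENT[lam]
    * p_feature_given_latent(x1, lam)
    * p_feature_given_latent(x2, lam)
    for lam, x1, x2 in itertools.product(VALUES, repeat=3)
}


def prob(**fixed):
    """Probability of an event fixing any of lam, x1, x2, read off the joint."""
    return sum(
        p
        for outcome, p in joint.items()
        if all(outcome[NAMES.index(k)] == v for k, v in fixed.items())
    )


# Gap between the predictive P[x1, x2] and the product of its marginals:
# nonzero here, so the features interact.
marginal_gap = max(
    abs(prob(x1=a, x2=b) - prob(x1=a) * prob(x2=b))
    for a, b in itertools.product(VALUES, repeat=2)
)

# The same gap after conditioning on the latent: zero up to rounding, so the
# latent accounts for all of the interaction.
conditional_gap = max(
    abs(
        prob(lam=lam, x1=a, x2=b) / prob(lam=lam)
        - (prob(lam=lam, x1=a) / prob(lam=lam))
        * (prob(lam=lam, x2=b) / prob(lam=lam))
    )
    for lam, a, b in itertools.product(VALUES, repeat=3)
)

print(marginal_gap, conditional_gap)
```

The predictive distribution here is just the joint with the latent summed out (`prob(x1=a, x2=b)`); a second agent could reproduce it with a completely different latent.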

Now for the question: under what conditions on agent 1’s latent(s) $\Lambda_1$ can we *guarantee* that $\Lambda_1$ is expressible in terms of $\Lambda_2$, no matter what generative model agent 2 uses (so long as the agents agree on the predictive distribution $P[X]$)? In particular, let’s require that $\Lambda_1$ be a function of $\Lambda_2$. (Note that we’ll weaken this later.) So, when is $\Lambda_1$ *guaranteed* to be a function of $\Lambda_2$, for *any* generative model $M_2$ which agrees on the predictive distribution $P[X]$? Or, worded in terms of latents: when is $\Lambda_1$ *guaranteed* to be a function of $\Lambda_2$, for *any* latent(s) $\Lambda_2$ which account for all interactions between features in the predictive distribution $P[X]$?

## The Main Argument

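$\Lambda_1$ must be a function of $\Lambda_2$ for *any* generative model $M_2$ which agrees on the predictive distribution. So, here’s one graphical structure for a simple model $M_2$ which agrees on the predictive distribution: the latent $\Lambda_2 := X_{\bar i}$ is the sole parent of every feature $X_1, \ldots, X_n$.

In English: we take $\Lambda_2$ to be $X_{\bar i}$, i.e. all but the $i^{th}$ feature. Since the features are always independent given all but one of them (because any random variables are independent given all but one of them), $X_{\bar i}$ is a valid choice of latent $\Lambda_2$. And since $\Lambda_1$ must be a function of $\Lambda_2$ for any valid choice of $\Lambda_2$, we conclude that $\Lambda_1$ must be a function of $X_{\bar i}$. Graphically, this implies $X_i \to X_{\bar i} \to \Lambda_1$.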
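The parenthetical claim, that any random variables are independent given all but one of them, can be spelled out in one line. Conditional on all features except the $i^{th}$, every feature other than $X_i$ is deterministic, so the conditional distribution factors trivially. In the derivation below, $x_{\bar i}$ denotes the conditioned-on values of all features except the $i^{th}$, and $x'$ is a candidate value of the full data point (both are notation introduced here for illustration):

```latex
P[X = x' \mid X_{\bar i} = x_{\bar i}]
  = P[X_i = x'_i \mid X_{\bar i} = x_{\bar i}]
    \cdot \prod_{k \neq i} \mathbb{1}[x'_k = x_k]
```

Each factor involves a single feature, which is exactly the conditional independence required of a latent. And because $X_{\bar i}$ is computed from the data itself, a model using it as its latent has marginal $P[X]$ over the features by construction.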

By repeating the argument, we conclude that the same must apply for all $i$: $X_i \to X_{\bar i} \to \Lambda_1$ for every $i$.

Now we’ve shown that, in order to *guarantee* that $\Lambda_1$ is a function of $\Lambda_2$ for *any* valid choice of $\Lambda_2$, and for $\Lambda_1$ to account for all interactions between the features in the first place, $\Lambda_1$ must satisfy at least the conditions:

$$\Lambda_1 \to X_i \text{ for all } i \quad \text{(mediation)}, \qquad X_i \to X_{\bar i} \to \Lambda_1 \text{ for all } i \quad \text{(redundancy)}$$

… which are exactly the (weak) __natural latent conditions__, i.e. $\Lambda_1$ *mediates* between all $X_i$’s and all information about $\Lambda_1$ is *redundantly represented* across the $X_i$’s. From the standard __Fundamental Theorem of Natural Latents__, we also know that the natural latent conditions are almost sufficient^{[1]}: they don’t quite guarantee that $\Lambda_1$ is a function of $\Lambda_2$, but they guarantee that $\Lambda_1$ is a *stochastic function* of $\Lambda_2$, i.e. $\Lambda_1$ can be computed from $\Lambda_2$ plus some noise which is independent of everything else (and in particular the noise is independent of $X$).

… so if we go back up top and allow for $\Lambda_1$ to be a stochastic function of $\Lambda_2$, rather than just a function, then the natural latent conditions provide necessary and sufficient conditions for the guarantee which we want.
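As a sanity check of the exact case, here is a minimal sketch in Python. The toy distribution is an assumption chosen for illustration: the latent is a fair bit, and each of three features is the pair (latent, private fair noise bit), so every feature carries a full copy of the latent. The helper names `marginal` and `conditional_entropy` are hypothetical.

```python
import itertools
import math


def marginal(joint, keep):
    """Sum a joint (dict: outcome tuple -> prob) down to the coordinates in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def conditional_entropy(joint, target, given):
    """H(target | given) in bits; `target` and `given` are lists of coordinate indices."""
    p_both = marginal(joint, given + target)
    p_given = marginal(joint, given)
    h = 0.0
    for key, p in p_both.items():
        if p > 0:
            h -= p * math.log2(p / p_given[key[: len(given)]])
    return h


# Toy distribution (an illustrative assumption): coordinate 0 is the latent,
# coordinates 1..3 are features x_i = (latent, private_noise_i).
N = 3
joint = {}
for lam, n1, n2, n3 in itertools.product((0, 1), repeat=4):
    joint[(lam, (lam, n1), (lam, n2), (lam, n3))] = 1 / 16

features = list(range(1, N + 1))

# Mediation: sum_i H(X_i | latent) - H(X | latent) is zero exactly when the
# features are independent given the latent.
mediation_gap = sum(
    conditional_entropy(joint, [i], [0]) for i in features
) - conditional_entropy(joint, features, [0])

# Redundancy: H(latent | all features except the i-th) is zero exactly when
# the latent is a function of the remaining features.
redundancy_gaps = [
    conditional_entropy(joint, [0], [k for k in features if k != i])
    for i in features
]

print(mediation_gap, redundancy_gaps)
```

Each entry of `redundancy_gaps` is also the entropy of the latent given the other agent's latent, for the particular choice in which that latent is "all features except the $i^{th}$". A value of zero therefore says the first latent is a function of that choice of second latent, as the main argument requires.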

## Approximation

Since we’re basically just invoking the Fundamental Theorem of Natural Latents, we might as well check how the argument behaves under approximation.

The standard approximation results allow us to relax both the mediation and redundancy conditions. So, we can weaken the requirement that the latents *exactly* mediate between features under each model to allow for *approximate* mediation, and we can weaken the requirement that information about $\Lambda_1$ be *exactly* redundantly represented to allow for *approximately* redundant representation. In both cases, we use the KL-divergences associated with the relevant graphs in the previous sections to quantify the approximation. The standard results then say that $\Lambda_1$ is approximately a stochastic function of $\Lambda_2$, i.e. $\Lambda_2$ contains all the information about $X$ relevant to $\Lambda_1$ to within the approximation bound (measured in bits).
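To illustrate what the approximation errors look like numerically, here is a minimal sketch in Python. It assumes one common reading of "the KL-divergence associated with a graph": the divergence between the actual joint distribution and its factorization over that graph. The toy model (a fair binary latent with noisy-copy features) and all function names are hypothetical, and the sketch computes only the two errors, not the bound from the theorem.

```python
import itertools
import math


def noisy_copy_joint(n_features, flip_prob):
    """Joint over (lam, x_1..x_n): lam is a fair bit, each x_i is lam flipped w.p. flip_prob."""
    joint = {}
    for lam in (0, 1):
        for xs in itertools.product((0, 1), repeat=n_features):
            p = 0.5
            for x in xs:
                p *= flip_prob if x != lam else 1.0 - flip_prob
            joint[(lam,) + xs] = p
    return joint


def marginal(joint, keep):
    """Sum a joint (dict: outcome tuple -> prob) down to the coordinates in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def kl_bits(p, q_of):
    """KL(p || q) in bits, where q_of(outcome) returns q's probability of that outcome."""
    return sum(px * math.log2(px / q_of(o)) for o, px in p.items() if px > 0)


def mediation_error(joint, n):
    """KL( P[lam, X] || P[lam] * prod_i P[X_i | lam] )."""
    p_lam = marginal(joint, [0])
    p_lam_xi = {i: marginal(joint, [0, i]) for i in range(1, n + 1)}

    def q(o):
        out = p_lam[(o[0],)]
        for i in range(1, n + 1):
            out *= p_lam_xi[i][(o[0], o[i])] / p_lam[(o[0],)]
        return out

    return kl_bits(joint, q)


def redundancy_error(joint, n, i):
    """KL( P[lam, X] || P[X] * P[lam | X_{not i}] ), i.e. I(lam; X_i | X_{not i})."""
    rest = [k for k in range(1, n + 1) if k != i]
    p_x = marginal(joint, list(range(1, n + 1)))
    p_rest = marginal(joint, rest)
    p_lam_rest = marginal(joint, [0] + rest)

    def q(o):
        rest_vals = tuple(o[k] for k in rest)
        return p_x[o[1:]] * p_lam_rest[(o[0],) + rest_vals] / p_rest[rest_vals]

    return kl_bits(joint, q)


for flip in (0.1, 0.01):
    j = noisy_copy_joint(3, flip)
    print(flip, mediation_error(j, 3), redundancy_error(j, 3, 1))
```

In this toy model the mediation error is zero by construction, while the redundancy error is positive and shrinks as the noise drops from 0.1 to 0.01: the less noisy the copies, the less any single feature adds to what the others already say about the latent.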

The main remaining loophole is the tiny mixtures problem: arguably small differences in the two agents’ predictive distributions can sometimes allow large failures in the theorems. On the other hand, our two hypothetical agents could in principle resolve such differences via experiment, since they involve different predictions.

## Why Is This Interesting?

This argument was one of our earliest motivators for natural latents. It’s still the main argument we have which singles out natural latents *in particular* – i.e. the conclusion says that the natural latent conditions are not only *sufficient* for the property we want, but *necessary*. Natural latents are the only way to achieve the guarantee we want, that our latent can be expressed in terms of *any* other latents which explain all interactions between features in the predictive distribution.

^{[1]} Note that, in invoking the Fundamental Theorem, we also implicitly put weight on the assumption that the two agents’ latents have nothing to do with each other except via the data. That particular assumption can be circumvented or replaced in multiple ways – e.g. we could instead construct a new latent via resampling, or we could add an assumption that either $\Lambda_1$ or $\Lambda_2$ has low entropy given $X$.
