Introduction

This section contrasts generative models with discriminative models.
Generative modeling is a branch of machine learning that involves training a model to produce new data that is similar to a given dataset.
Generative models require a lot of training data → each data point is called an “observation”.
Generative models are probabilistic.
- A generative model must include a random component that influences the individual samples generated by the model.
- In other words, we can imagine that there is some unknown probability distribution that explains why some images are likely to be found in the training dataset and other images are not. It is our job to build a model that mimics this distribution as closely as possible and then sample from it to generate new, distinct observations that look as if they could have been included in the original training set.
In discriminative modeling, each data point comes with a label and the model estimates $p(y|\mathbf{x})$; in generative modeling, there is no label and the model estimates $p(\mathbf{x})$.
- There are conditional generative models, though, which model $p(\mathbf{x}|y)$. For example, if our dataset contains different types of fruit, we could tell our generative model to specifically generate an image of an apple.
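
To make the difference between $p(\mathbf{x})$ and $p(\mathbf{x}|y)$ concrete, here is a minimal sketch using only the Python standard library. It is a toy under stated assumptions, not how image models work: each observation is a single made-up number (imagine a fruit's weight in grams), and the "model" is just a Gaussian fitted to the data. The names `fit_gaussian`, `fit_conditional`, and `sample` are hypothetical helpers introduced for illustration.

```python
import random
import statistics

# Toy training set of (observation, label) pairs. The numbers are invented
# for illustration; real observations would be images, not single floats.
observations = [
    (150.0, "apple"), (160.0, "apple"), (155.0, "apple"),
    (5.0, "grape"), (6.0, "grape"), (5.5, "grape"),
]

def fit_gaussian(values):
    """Estimate (mean, standard deviation) from a list of floats."""
    return statistics.mean(values), statistics.stdev(values)

def fit_conditional(data):
    """Fit one Gaussian per label: a crude stand-in for p(x | y)."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    return {y: fit_gaussian(xs) for y, xs in by_label.items()}

def sample(params, rng):
    """Draw one new observation; the random draw is the model's random component."""
    mu, sigma = params
    return rng.gauss(mu, sigma)

rng = random.Random(0)  # seeded so repeated runs give the same draws

# Unconditional model p(x): labels are ignored entirely.
p_x = fit_gaussian([x for x, _ in observations])

# Conditional model p(x | y): one distribution per label.
p_x_given_y = fit_conditional(observations)

new_x = sample(p_x, rng)                       # some new observation
new_apple = sample(p_x_given_y["apple"], rng)  # a new observation, given y = "apple"
```

A single Gaussian is a poor fit for the unconditional case here because the data has two clusters, which is exactly the kind of mismatch a better generative model would need to avoid.
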
Why is generative AI important?

Representation Learning

Representation Learning → Instead of trying to model the high-dimensional sample space directly, we describe each observation in the training set using some lower-dimensional latent space and then learn a mapping function that can take a point in the latent space and map it to a point in the original domain. In other words, each point in the latent space is a representation of some high-dimensional observation.
- Example: To us, it is obvious that there are two features that can uniquely represent each of these tins: the height and width of the tin. That is, we can convert each image of a tin to a point in a latent space of just two dimensions, even though the training set of images is provided in high-dimensional pixel space. Notably, this means that we can also produce images of tins that do not exist in the training set, by applying a suitable mapping function $f$ to a new point in the latent space.
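
The tin example can be sketched in a few lines of Python. This is an illustration under assumptions, not a learned model: the mapping $f$ is written by hand as a hypothetical `decode_tin` function that draws a centered rectangle on a small binary pixel grid, and `encode_tin` is a hypothetical inverse that recovers the two latent values. In practice both mappings would be learned from data.

```python
def decode_tin(height, width, size=16):
    """Hypothetical mapping f: latent point (height, width) -> size x size image.

    A pixel is 1 inside a centered rectangle of the given height and width,
    and 0 everywhere else.
    """
    if not (0 <= height <= size and 0 <= width <= size):
        raise ValueError("height and width must lie between 0 and size")
    top = (size - height) // 2
    left = (size - width) // 2
    return [
        [1 if top <= r < top + height and left <= c < left + width else 0
         for c in range(size)]
        for r in range(size)
    ]

def encode_tin(image):
    """Hypothetical inverse of f: image -> latent point (height, width)."""
    filled_rows = [row for row in image if any(row)]
    height = len(filled_rows)
    width = sum(filled_rows[0]) if filled_rows else 0
    return height, width

# A 16 x 16 image has 256 pixel values, yet two numbers describe it fully.
image = decode_tin(height=10, width=4)
latent = encode_tin(image)

# A latent point that need not appear in any training set still decodes
# to a well-formed tin.
unseen_tin = decode_tin(height=7, width=9)
```

The round trip `encode_tin(decode_tin(h, w))` returns `(h, w)` for any non-empty rectangle, which is the sense in which the two-dimensional latent point is a representation of the high-dimensional image.
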

<aside> 💡 The concept of encoding the training dataset into a latent space so that we can sample from it and decode the point back to the original domain is common to many generative modeling techniques. Mathematically speaking, encoder-decoder techniques try to transform the highly nonlinear manifold on which the data lies (e.g., in pixel space) into a simpler latent space that can be sampled from, so that it is likely that any point in the latent space is the representation of a well-formed image.

The dog manifold in high-dimensional pixel space is mapped to a simpler latent space that can be sampled from.

</aside>

Core Probability Theory

Sample Space → The sample space is the complete set of all values an observation $\mathbf{x}$ can take.

<aside> 💡 NOTE In our previous example (map of the world), the sample space consists of all points of latitude and longitude $\mathbf{x} = (x_1, x_2)$ on the world map. For example, $\mathbf{x} = (40.7306, -73.9352)$ is a point in the sample space (New York City) that belongs to the true data-generating distribution. $\mathbf{x} = (11.3493, 142.1996)$ is a point in the sample space that does not belong to the true data-generating distribution (it’s in the sea).

</aside>
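
As a small sketch of the definition, the sample space of the world-map example can be written as a membership check. The function name `in_sample_space` and the coordinate bounds (latitude in $[-90, 90]$, longitude in $[-180, 180]$, both in degrees) are assumptions made for illustration.

```python
def in_sample_space(x):
    """Return True if x = (latitude, longitude) is a valid point on the world map."""
    lat, lon = x
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

new_york = (40.7306, -73.9352)    # on land
in_the_sea = (11.3493, 142.1996)  # in the sea
off_the_map = (95.0, 0.0)         # latitude out of range

# Both points from the note are in the sample space, even though only one
# of them belongs to the data-generating distribution.
results = [in_sample_space(p) for p in (new_york, in_the_sea, off_the_map)]
```

Membership in the sample space only says that a value is possible in principle; it says nothing about how likely that value is under the data-generating distribution.
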