Introduction
Unsupervised Learning (UL) is a branch of machine learning where the inputs are not labeled. Generally speaking, the goal of UL is to find some sort of "hidden" (sometimes called "latent") structure in the dataset.
The two most common tasks in UL are dimensionality reduction and clustering. Parts of the field known as generative modeling also fit under the umbrella of unsupervised learning.
- Dimensionality Reduction consists of techniques that, as the name suggests, reduce the dimension of the data. Concretely, suppose your data consists of vectors in \(\mathbb{R}^d\). For each data point \(x_i\in\mathbb{R}^d\) in your dataset, a dimensionality reduction procedure creates a corresponding vector \(\tilde{x}_i\in \mathbb{R}^k\), where \(k < d\) (and typically \(k\) is much smaller than \(d\)).
- Clustering consists of methods to partition the data into subgroups that are disjoint (i.e., do not overlap). After clustering has been performed, the discovered groups can be analyzed and compared.
- Generative Modeling is a subfield of machine learning that aims to learn the probability distribution that generated the dataset. This can be useful for generating synthetic examples of data. In addition, some techniques for generative modeling, such as variational autoencoders (VAEs), can also be used for dimensionality reduction.
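To make the three tasks above concrete, here is a minimal NumPy sketch on a toy dataset. Everything in it is an illustrative assumption rather than a prescribed method: the synthetic two-blob data, the choice of PCA (computed via the SVD) for dimensionality reduction, a bare-bones k-means loop (with a hypothetical helper named `kmeans`) for clustering, and a single fitted Gaussian standing in for a generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (assumed for illustration): two well-separated blobs in R^5.
d, n_per_blob = 5, 100
centers = np.array([[-3.0] * d, [3.0] * d])
X = np.vstack([c + rng.normal(size=(n_per_blob, d)) for c in centers])

# --- Dimensionality reduction: PCA via the SVD of the centered data ---
k = 2
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_tilde = X_centered @ Vt[:k].T  # each x_i in R^5 is mapped to a vector in R^2


# --- Clustering: a bare-bones k-means (Lloyd's algorithm) ---
def kmeans(points, init_centroids, n_iters=10):
    """Hypothetical helper: returns (labels, centroids) after n_iters updates."""
    centroids = init_centroids.copy()
    for _ in range(n_iters):
        # Assign every point to its nearest centroid, so groups are disjoint.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids


# Simple initialization (an assumption): the first and last data points.
labels, centroids = kmeans(X_tilde, X_tilde[[0, -1]])

# --- Generative modeling: fit a Gaussian to one cluster, then sample from it ---
cluster0 = X[labels == 0]
mu = cluster0.mean(axis=0)
Sigma = np.cov(cluster0, rowvar=False)
synthetic = rng.multivariate_normal(mu, Sigma, size=10)  # 10 synthetic points

print(X.shape, X_tilde.shape, np.bincount(labels), synthetic.shape)
```

In practice one would usually reach for a library implementation instead of hand-rolled loops, and a single Gaussian is far too simple a model for most real datasets. The sketch is only meant to show the shape of each task: a map from \(\mathbb{R}^d\) to \(\mathbb{R}^k\), an assignment of each point to exactly one group, and a fitted distribution that can be sampled.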