Demystifying AI/ML algorithms – Part III: Selfies, the Unsupervised.

About the series

In this third part of the series on the basics of AI/ML algorithms, I deal with the so-called unsupervised algorithms, which I refer to as Selfies. ‘Seen-it-before’ or supervised algorithms were the subject of the second part (https://ai-positive.com/2024/10/20/demystifying-ai-ml-algorithms-part-ii-supervised-algorithms-2/). The series started with my treatment of Good-Old-Fashioned AI, which gave a real start to the practical use of AI (https://ai-positive.com/2024/08/28/understanding-gofai-rules-rule-and-symbolic-reasoning-in-ai/).

Getting rid of the teacher

All the algorithms we discussed in the second part require labelled data: an outcome which the teacher, acting as a trainer, relates to the other input variables in the data to identify patterns. The output of a machine learning algorithm is a trained model which, when fed new data, can predict the label to which that data belongs, and hence the outcome. Finding a good teacher is always a challenging task. What if we must find the patterns in the data automatically, without explicit labels?

A child learns a great deal within the first three years of life without a real teacher! Children observe, listen, touch, taste, and smell everything they encounter, which helps them learn about their environment. Imitation and play enable a child to learn quickly. The logic of learning is already there in the child’s mind. It is human nature to group things together or categorize them to make better sense of the world; we look at stars and constellations appear. Unsupervised algorithms, in that sense, are Selfies: they find hidden patterns, structures, and relationships within the data on their own. Several unsupervised learning algorithms are widely used in machine learning. Let us look at the most common ones.

K-Means Clustering: This algorithm partitions data into K distinct clusters based on the distance of each point to the centroids of those clusters. This is like separating items of a particular colour from a mixed pile of items of various colours. K indicates the number of clusters to be formed. Popular applications of the K-Means clustering algorithm are:

Customer Segmentation: Grouping customers based on purchasing behaviour for targeted marketing.

Image Compression: Reducing the number of colours in an image by clustering similar colours together.

Document Classification: Organizing documents into topics or categories based on their content.
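The partitioning described above can be sketched in a few lines of Python with scikit-learn. The three “blobs” of two-dimensional points below are made-up toy data, purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Three hypothetical groups of points scattered around different centres
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(50, 2)),
])

# K = 3: ask the algorithm to form three clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

labels = kmeans.labels_                 # cluster index assigned to each point
centroids = kmeans.cluster_centers_     # one centroid per cluster
print(centroids.shape)                  # (3, 2)
```

Each point ends up in the cluster whose centroid it is closest to; the centroids themselves are recomputed iteratively until they stop moving.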

Hierarchical Clustering: This is a method of clustering that builds a hierarchy of clusters, which can be visualized as a dendrogram. It works something like this. Suppose you have a group of students in a class and want to form groups with similar interests for club activities. Initially, each student is a cluster. Identify how similar each student is to every other student based on their interests. Group together the two students with the most similar interests. Now find the similarity of this new group to the other students or to other groups so formed. Repeat the process until everyone is part of one of the groups.

Typical use cases of Hierarchical Clustering are:

Gene Expression Analysis: Grouping genes with similar expression patterns in biological research.

Market Research: Segmenting markets based on consumer preferences and behaviours.

Social Network Analysis: Identifying communities within social networks.
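The classroom analogy maps directly onto SciPy’s agglomerative clustering functions. The “interest scores” below are invented for illustration; each row is one student:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rows = students, columns = hypothetical interest scores
# (sports, music, coding), each on a 0-10 scale
students = np.array([
    [9, 2, 1],
    [8, 3, 2],
    [1, 9, 8],
    [2, 8, 9],
    [5, 5, 5],
])

# Repeatedly merge the two most similar clusters (average linkage);
# the result encodes the full dendrogram
merges = linkage(students, method="average")

# Cut the hierarchy so that everyone ends up in one of 2 groups
groups = fcluster(merges, t=2, criterion="maxclust")
print(groups)
```

The first two students (sporty) land in one group and the next two (music/coding) in the other; the all-rounder joins whichever group it merges with earlier in the hierarchy.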

DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN groups together points that are close to each other based on their density and marks points in low-density regions as outliers/noise. It uses two parameters, a radius and a minimum number of points, to define dense regions, and expands clusters outward from core points.

Suppose you are at a crowded party and want to figure out who the loners are. Starting with one person, check who is close by; anyone within close range is part of the same group. For each of the new group members, you again see who is close to them and keep adding those people to the group. If someone is not near enough to anyone to form a group, they are considered an outsider, or noise. The process is repeated until everyone at the party is either part of a group or classified as noise.

It is quite natural that DBSCAN is used for these applications:

Anomaly Detection: Identifying outliers in datasets, such as fraudulent transactions.

Geospatial Analysis: Detecting clusters of geographic locations, like hotspots in crime data.

Astronomy: Clustering stars or galaxies based on their characteristics.
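The party analogy translates into scikit-learn’s `DBSCAN` with `eps` playing the role of “close range” and `min_samples` the minimum crowd size. The coordinates below are toy data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight "groups" of party-goers and one loner, invented for illustration
points = np.array([
    [0.0, 0.0], [0.1, 0.1], [0.2, 0.0],   # dense group A
    [5.0, 5.0], [5.1, 5.1], [5.0, 5.2],   # dense group B
    [9.0, 0.0],                           # far from everyone
])

# eps = "close range"; min_samples = points needed to count as dense
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points)
print(labels)  # two cluster labels plus -1 for the noise point
```

Note that, unlike K-Means, we never told DBSCAN how many clusters to expect; it discovered two, and flagged the isolated point with the special label -1.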

Apriori Algorithm: Apriori is a learning method that discovers frequent itemsets in data and then generates association rules for those sets of items. By calculating measures such as ‘support,’ ‘confidence,’ and ‘lift,’ the Apriori algorithm eliminates rules that do not meet the minimum requirements and retains only those that qualify.

This algorithm works, in the context of a supermarket, to identify items that are frequently bought together. It first looks at individual items, such as soap or shampoo, and counts how often each is bought. Retaining the items that are bought frequently enough (above a particular number of times in a period such as a week), the algorithm then looks at pairs of these items, such as ‘soap and shampoo,’ to see how often they are bought together. Retaining only those pairs bought together frequently enough, the algorithm looks for larger sets of items, like soap, shampoo, and possibly conditioner, and repeats the process. It keeps expanding and counting sets of items, filtering out the ones not bought together often enough. The result is a collection of itemsets that are frequently bought together, which helps supermarket management understand customer behaviour and make decisions.

Apriori Algorithm is used for:

Market Basket Analysis: Finding frequent item sets in transactional data to understand buying patterns.

Recommendation Systems: Suggesting products to customers based on their purchase history as well as other customers’ purchase histories.

Web Usage Mining: Identifying common patterns in web navigation behaviour.
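The expand-and-filter loop described above fits in a short pure-Python sketch (unoptimised, for illustration only; the baskets and the 40% support threshold are invented):

```python
# Hypothetical shopping baskets
baskets = [
    {"soap", "shampoo"},
    {"soap", "shampoo", "conditioner"},
    {"soap", "bread"},
    {"shampoo", "conditioner"},
    {"soap", "shampoo"},
]
min_support = 0.4  # itemset must appear in at least 40% of baskets

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

# Level 1: frequent individual items
candidates = {frozenset([i]) for b in baskets for i in b}
frequent = [s for s in candidates if support(s) >= min_support]

# Level k: combine frequent (k-1)-itemsets into k-itemsets,
# keep only those still frequent, and repeat until nothing survives
level = frequent
while level:
    candidates = {a | b for a in level for b in level
                  if len(a | b) == len(a) + 1}
    level = [s for s in candidates if support(s) >= min_support]
    frequent += level

# Confidence of the rule soap -> shampoo: P(shampoo | soap)
conf = support({"soap", "shampoo"}) / support({"soap"})
print(len(frequent), conf)
```

With this data, ‘bread’ is pruned at level 1 and the triple {soap, shampoo, conditioner} at level 3, leaving the pairs {soap, shampoo} and {shampoo, conditioner} as the largest frequent itemsets.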

Self-Organizing Maps (SOM): SOM is used for clustering and visualizing high-dimensional data, i.e., data with many features/variables. Preserving the topological structure of the original data, it creates a lower-dimensional grid of computational units (called neurons), making it easier to identify patterns, clusters, and relationships.

SOM can be used to visualize similarities between songs by breaking down high-dimensional features such as tempo, genre, duration, energy, danceability, loudness, musical key, acousticness, valence, instrumentalness, and speechiness into a two-dimensional grid. The top-left cluster might contain songs with high energy, fast tempo, and high danceability (dance and electronic music); the bottom-right, songs with high acousticness, low energy, and high instrumentalness (classical and acoustic music); and the centre, songs with moderate energy, positive valence, and high speechiness (pop and hip-hop).

Typical real-life applications of SOMs include:

Speech/ Handwriting Recognition: Recognizing patterns in complex datasets, such as speech or handwriting.

Social Network Analysis: Visualizing the structure of social networks and identifying communities or influential individuals within the network.

Manufacturing Process Monitoring: Monitoring the health of machinery and detecting potential failures from patterns in sensor data on temperature, vibration, and acoustic emissions, where deviations from the normal patterns indicate potential issues.
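A toy SOM can be written directly in NumPy. The three song features and the 3×3 grid below are illustrative assumptions (a real project would more likely reach for a dedicated SOM library):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical songs: [tempo, energy, acousticness], each scaled to 0..1
songs = np.array([
    [0.90, 0.90, 0.10],   # dance/electronic-like
    [0.85, 0.95, 0.05],
    [0.20, 0.10, 0.90],   # classical/acoustic-like
    [0.15, 0.20, 0.95],
])

grid_w, grid_h, dim = 3, 3, songs.shape[1]
weights = rng.random((grid_w * grid_h, dim))   # one neuron per grid cell
coords = np.array([(x, y) for x in range(grid_w) for y in range(grid_h)])

lr, sigma = 0.5, 1.0
for epoch in range(50):
    for s in songs:
        # Best-matching unit: the neuron whose weights are closest to the song
        bmu = np.argmin(np.linalg.norm(weights - s, axis=1))
        # Neurons near the BMU on the grid are pulled towards the song too,
        # which is what preserves the topology of the data
        d = np.linalg.norm(coords - coords[bmu], axis=1)
        h = np.exp(-(d ** 2) / (2 * sigma ** 2))
        weights += lr * h[:, None] * (s - weights)

# After training, similar songs map to the same or nearby grid cells
bmus = [int(np.argmin(np.linalg.norm(weights - s, axis=1))) for s in songs]
print(bmus)
```

The dance-like and acoustic-like songs end up in different regions of the grid, which is exactly the kind of 2-D layout described above.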

Principal Component Analysis (PCA): PCA reduces the dimensionality of data by transforming it into a new set of variables (principal components) that capture the most variance. Dimensionality refers to the number of features or variables in a dataset. In other words, PCA simplifies the data while preserving as much of the variance as possible, so that the resulting data is easier to visualize and analyse. PCA is often used as a pre-processing step to reduce the number of features before applying another machine learning algorithm to build a model.

Imagine you have a huge photo album, and each photo has several details: people, locations, activities, attire, and dates. It would be overwhelming to look through every photo to find the key moments. Instead, we can identify key features or common themes that most photos share, such as weddings, birthdays, and vacations, and group the photos by theme. We then choose a few representative photos from each group that capture the essence of its theme, highlighting the key moments and people. PCA works in a similar way.

Most common use cases of PCA are:

Face Recognition: Identifying the prominent features in facial images for recognition, typically used by the police to identify a criminal from a witness’s description.

Stock Market Analysis: PCA is used to analyse and reduce the dimensionality of financial data, identifying the key factors affecting stock prices so that informed decisions can be made.

Environmental Studies: Analysing environmental data, such as air quality and water pollution measurements, to determine the main sources of pollution and develop strategies for environmental protection.
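A minimal PCA sketch with scikit-learn: four correlated features (toy data, mostly driven by one hidden factor) are compressed into two principal components with almost no loss of variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# One hidden factor drives four observed, highly correlated features
base = rng.normal(size=(100, 1))
data = np.hstack([
    base,
    2 * base,
    -base,
    base + 0.1 * rng.normal(size=(100, 1)),  # same factor plus small noise
])

pca = PCA(n_components=2)
reduced = pca.fit_transform(data)            # 100 rows, now only 2 columns

print(reduced.shape)                         # (100, 2)
print(pca.explained_variance_ratio_)         # first component dominates
```

Because the four columns mostly repeat the same underlying signal, the first principal component alone captures nearly all the variance, which is exactly the simplification PCA promises.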

Seen-it-before or Selfies – which way to go?

Selfies are best for exploratory tasks such as customer segmentation, anomaly detection, and market basket analysis, when you do not have labels.

Selfies focus on exploration and on discovering insights from data without pre-defined labels, rather than on the predictive accuracy that Seen-it-before algorithms can deliver.

Selfie algorithms can be used to pre-train a model or to extract features from data, which can then be fed into Seen-it-before algorithms to build models with more accurate predictions.

There is also a cross between Seen-it-before algorithms and Selfies, known as semi-supervised learning, where a small amount of labelled data is combined with a large amount of unlabelled data to improve learning accuracy iteratively.

Ensemble methods, which combine multiple models from both categories, can also be used to arrive at the most accurate final model.
