Unsupervised Machine Learning | Introduction to Machine Learning, Part 2

Unsupervised machine learning looks for patterns in datasets that don’t have labeled responses. You’d use this technique when you want to explore your data but don’t yet have a specific goal, or you’re not sure what information the data contains. It’s also a good way to reduce the dimensionality of your data.

As we’ve previously discussed, most unsupervised learning techniques are a form of cluster analysis, which separates data into groups based on shared characteristics. Clustering algorithms fall into two broad groups: hard clustering, where each data point belongs to only one cluster, and soft clustering, where each data point can belong to more than one cluster.

For context, here’s a hard clustering example. Say you’re an engineer building cell phone towers. You need to decide where and how many towers to construct. To make sure you’re providing the best signal reception, you need to locate the towers within clusters of people. To start, you need an initial guess at the number of clusters. To do this, compare scenarios with three towers and four towers to see how well each is able to provide service. Because a phone can only talk to one tower at a time, this is a hard clustering problem. For this, you could use k-means clustering, because the k-means algorithm treats each observation in the data as an object having a location in space. It finds cluster centers, or means, that reduce the total distance from data points to their cluster centers.

So that was hard clustering. Let’s see how you might use a soft clustering algorithm in the real world. Pretend you’re a biologist analyzing the genes involved in normal and abnormal cell division. You have data from two tissue samples, and you want to compare them to determine whether certain patterns of gene features correlate to cancer. Because the same genes can be involved in several biological processes, no single gene is likely to belong to one cluster only. Apply a fuzzy c-means algorithm to the data, and then visualize the clusters to see which groups of genes behave in similar ways. You can then use this model to help see which features correlate with normal or abnormal cell division.

This covers the two main techniques, hard and soft clustering, for exploring data with unlabeled responses. Remember, though, that you can also use unsupervised machine learning to reduce the number of features, or the dimensionality, of your data. You do this to make your data less complex, especially if you’re working with data that has hundreds or thousands of variables. By reducing the complexity of your data, you’re able to focus on the important features and gain better insights.

Let’s look at three common dimensionality reduction algorithms. Principal component analysis, or PCA, performs a linear transformation on the data so that most of the variance in your data set is captured by the first few principal components. This could be useful for developing condition indicators for machine health monitoring. Factor analysis identifies underlying correlations between variables in your data set. It provides a representation of unobserved latent, or common, factors. Factor analysis is sometimes used to explain stock price variation. Non-negative matrix factorization is used when model terms must represent non-negative quantities, such as physical quantities. If you need to compare a lot of text on webpages or documents, this would be a good method to start with, as text is either not present or occurs a positive number of times.

In this video, we took a closer look at hard and soft clustering algorithms, and we also showed why you’d want to use unsupervised machine learning to reduce the number of features in your data set.

As for your next steps, unsupervised learning might be your end goal. If you’re just looking to segment data, a clustering algorithm is an appropriate choice. On the other hand, you might want to use unsupervised learning as a dimensionality reduction step for supervised learning. In our next video, we’ll take a closer look at supervised learning.

For now, that wraps up this video. Don’t forget to check out the description below for more resources and links.
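As a rough illustration of the cell tower example, here is a minimal k-means sketch in plain Python that compares three towers against four. The `people` coordinates are made up for the example (four hypothetical towns on a grid), and the helper names `nearest`, `farthest_first`, and `kmeans` are assumptions, not anything from the video. Two details differ from the spoken description: the seeding uses a simple farthest-point rule so the run is repeatable, and the reported cost is the sum of squared distances, which is what standard k-means minimizes.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point (hard assignment: exactly one)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): pick points far from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centers, labels, sum of squared distances)."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centers[j]  # empty cluster keeps its old center
            )
        if new_centers == centers:
            break  # nothing moved, so the assignments are stable
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    cost = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, cost


# Hypothetical data: four small towns, nine people each, on a regular grid.
town_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
offsets = (-0.5, 0.0, 0.5)
people = [
    (cx + dx, cy + dy)
    for cx, cy in town_centers
    for dx in offsets
    for dy in offsets
]

# Compare the three-tower and four-tower scenarios.
results = {k: kmeans(people, k) for k in (3, 4)}
for k, (centers, labels, cost) in results.items():
    print(f"{k} towers: cost = {cost:.2f}, tower sites = {centers}")
```

With this toy data, the four-tower scenario should report a much lower cost than the three-tower one, because three towers force at least one town to share a distant tower. On real data, a library implementation with several random restarts would be the more usual choice.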
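The claim that PCA captures most of the variance in the first few principal components can also be checked on a small example. The sketch below finds only the first principal component, using power iteration on the covariance matrix in plain Python. The `readings` data is invented for illustration (three hypothetical sensor channels, the second roughly twice the first), and `first_principal_component` is a hypothetical helper name. A numerical library would normally compute all components at once.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, fraction of total variance) for the first PC."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly multiplying by cov pulls v toward the
    # eigenvector with the largest eigenvalue. This assumes the start vector
    # is not orthogonal to that eigenvector, which holds for the data below.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v (the top eigenvalue) versus total variance (the trace).
    cov_v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Hypothetical readings: channel 2 is about twice channel 1, channel 3 is a
# small independent wiggle.
readings = [
    (float(t), 2.0 * t + (0.1 if t % 2 == 0 else -0.1), 0.1 * ((t % 3) - 1))
    for t in range(10)
]

direction, ratio = first_principal_component(readings)
print("first principal direction:", [round(x, 3) for x in direction])
print("fraction of variance captured:", round(ratio, 4))
```

Because two of the three channels move together, a single component should account for nearly all of the variance here, so the three features could be summarized by one number per reading with little loss.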
