Wednesday 22 June 2022

MathWithNaziaa: Mathematics in Machine Learning

Machine Learning

Machine learning is, without a doubt, the best-known application of artificial intelligence (AI). The main idea behind machine learning is giving systems the ability to learn and improve from experience automatically, without being explicitly programmed to do so. Machine learning works by building programs that have access to data (static or continuously updated) which they analyze to find patterns and learn from. Once a program discovers relationships in the data, it applies this knowledge to new sets of data.

Linear algebra has several applications in machine learning, such as loss functions, regularization, support vector classification, and much more. In this article, however, we will only cover linear algebra in loss functions.

Loss Function

Machine learning algorithms work by collecting data, analyzing it, and then building a model using one of many approaches (linear regression, logistic regression, decision tree, random forest, etc.). Based on the resulting model, they can then make predictions for new data.

How can you measure the accuracy of your prediction model?

Using linear algebra, in particular loss functions. A loss function is a method of evaluating how accurate your prediction model is: will it perform well on new datasets? If your model is totally off, the loss function outputs a higher number; if it is a good one, the loss function outputs a lower one.

Regression models the relationship between a dependent variable, Y, and one or more independent variables, Xi. After plotting this relationship, we try to fit a line through the data points and then use this line to predict Y for future values of the Xi. There are many types of loss functions, some more complicated than others; the two most commonly used are Mean Squared Error and Mean Absolute Error.
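
To make this concrete, here is a minimal sketch (with made-up data) of fitting a line with NumPy and using it to predict new values:

```python
import numpy as np

# Toy data: X is the independent variable, Y the dependent one.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a line Y = a*X + b (a degree-1 polynomial) through the points.
a, b = np.polyfit(X, Y, deg=1)

# Use the fitted line to predict Y for new values of X.
X_new = np.array([6.0, 7.0])
print(a * X_new + b)
```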

  • Mean Squared Error

Mean Squared Error (MSE) is probably the most widely used loss function. It is easy to understand and implement, and it generally works quite well for most regression problems. Most Python libraries used in data science, such as NumPy, scikit-learn, and TensorFlow, have their own built-in implementation of MSE. They all work from the same equation:

MSE = (1/N) · Σᵢ (yᵢ − ŷᵢ)²

where N is the number of data points in both the observed and predicted values, yᵢ is the i-th observed value, and ŷᵢ is the corresponding predicted value.

Steps for calculating the MSE (a short code sketch follows the list):

  1. Calculate the difference between each pair of the observed and predicted values.
  2. Take the square of the difference.
  3. Add the squared differences together to find the cumulative value.
  4. Calculate the average error of the cumulative sum.
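
As an illustrative sketch, these four steps translate directly into a few lines of NumPy (the observed and predicted values below are made up):

```python
import numpy as np

def mse(observed, predicted):
    """Mean Squared Error, following the four steps above."""
    diff = observed - predicted     # 1. difference for each pair of values
    squared = diff ** 2             # 2. square of the difference
    total = squared.sum()           # 3. cumulative sum of squared differences
    return total / len(observed)    # 4. average error

observed = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.8, 5.4, 2.0, 7.1])
print(mse(observed, predicted))  # lower output means a better model
```
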
  • Mean Absolute Error

The Mean Absolute Error (MAE) is quite similar to the MSE; the difference is that we take the absolute value of the difference between the observed and predicted values instead of squaring it.

MAE = (1/N) · Σᵢ |yᵢ − ŷᵢ|

The MAE cost is more robust to outliers than the MSE. A disadvantage of MAE, however, is that the absolute value (modulus) operator is not easy to handle in mathematical equations, since it is not differentiable at zero. Still, MAE is the most intuitive of all the loss functions.
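
A matching sketch for the MAE, using the same made-up values:

```python
import numpy as np

def mae(observed, predicted):
    """Mean Absolute Error: the average of the absolute differences."""
    return np.abs(observed - predicted).mean()

observed = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.8, 5.4, 2.0, 7.1])
print(mae(observed, predicted))
```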

Computer Vision

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world using images, videos, and deep learning models. Doing so allows algorithms to accurately identify and classify objects in visual data; in other words, they are able to “see”.

In computer vision, linear algebra is used in applications such as image recognition and image processing techniques, including image convolution and the representation of images as tensors (or, as we call them in linear algebra, vectors).

Image Convolution

I know what you’re thinking: “convolution” sounds like a very advanced term, so it must be something complex. The truth is, it really isn’t. Whether you have done any computer vision work or not, I am quite sure you have either performed an image convolution or seen one. Have you ever blurred or smoothed an image? That’s convolution!

Convolutions are one of the fundamental building blocks of computer vision in general and image processing in particular. To put it simply, a convolution is an element-wise multiplication of two matrices followed by a sum. In image processing applications, an image is represented by a multi-dimensional array: it has rows and columns representing the pixels of the image, plus additional dimensions for the color data. For example, RGB images have a depth of 3, describing each pixel’s red, green, and blue components.
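
As a quick illustration, here is a minimal NumPy sketch of a tiny, made-up RGB image stored as such an array:

```python
import numpy as np

# A hypothetical 4x4 RGB image: rows x columns x color channels.
image = np.zeros((4, 4, 3), dtype=np.uint8)

# Set the top-left pixel to pure red (R=255, G=0, B=0).
image[0, 0] = [255, 0, 0]

print(image.shape)  # (4, 4, 3): height, width, and a depth of 3 for RGB
```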

One way of thinking about image convolution is to consider the image as a big matrix and the kernel (the convolution matrix) as a small matrix used for blurring, sharpening, edge detection, or other image processing functions. The kernel slides over the image from left to right and top to bottom, applying an element-wise multiply-and-sum at each (x, y) coordinate of the image to produce the convolved image.

Different kernels perform different types of image convolutions. Kernels are always square matrices; they are often 3×3, though the size can be adjusted to suit the image and the task.
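
Here is a minimal pure-NumPy sketch of the sliding-kernel idea, applying a 3×3 box-blur kernel to a made-up grayscale image with no padding (a real application would typically call a library routine such as scipy.signal.convolve2d, which also flips the kernel first; for a symmetric kernel like this one, the result is the same):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over a grayscale image, left to right, top to bottom.

    At each position, multiply element-wise and sum (no padding, stride 1).
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]   # region under the kernel
            out[y, x] = (patch * kernel).sum()  # element-wise multiply + sum
    return out

blur = np.ones((3, 3)) / 9.0                      # 3x3 averaging (blur) kernel
image = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5x5 grayscale image
print(convolve2d(image, blur))                    # 3x3 blurred output
```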

Word Embedding

Computers can’t understand text data; that’s why to perform any NLP techniques on text; we need to represent the test data numerically. Here’s where algebra comes in! Word Embedding is a type of word representation that allows words with similar meaning to be understood by machine learning algorithms.

Okay, where’s the math?

Word embeddings represent words as vectors of numbers while preserving their context in the document. These representations are obtained by training a neural network on a large amount of text called a corpus; it is a language modeling technique. One of the most widely used word embedding techniques is called word2vec.

Word2vec is a technique for producing word embeddings that give better word representations. It does so by capturing a large number of precise syntactic and semantic word relationships: word2vec learns a word’s meaning by examining its surrounding context. Word2vec uses two methods (a short code sketch follows the list):

  1. Continuous bag of words (CBOW): predicts the current word given the context words within a specific window.
  2. Skip-gram: predicts the surrounding context words within a specific window, given the current word.
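
As an illustrative sketch, the popular gensim library (version 4.x assumed here) exposes both methods through a single sg flag; the tiny corpus below is made up:

```python
from gensim.models import Word2Vec  # pip install gensim

# A tiny made-up corpus: each sentence is a list of tokens.
corpus = [
    ["machine", "learning", "uses", "linear", "algebra"],
    ["word", "embeddings", "represent", "words", "as", "vectors"],
    ["similar", "words", "get", "similar", "vectors"],
]

# sg=0 selects continuous bag of words, sg=1 selects skip-gram;
# window sets the context size, vector_size the embedding dimension.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

print(model.wv["words"].shape)         # a 50-dimensional vector
print(model.wv.most_similar("words"))  # nearest neighbors in vector space
```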

Dimensionality Reduction

We live today in a world surrounded by massive amounts of data, data that needs to be processed, analyzed, and stored. This data often has a large number of features; if we are talking about image data, a high-resolution image or video translates into huge matrices of numbers. Dealing with big matrices can be a challenging task even for supercomputers, so we sometimes reduce the original data to a smaller subset of features, the ones most relevant to our application.

The most famous approach to reducing the dimensions of data is called Singular Value Decomposition (SVD). SVD is a matrix decomposition method that reduces a matrix to its constituent parts to make subsequent matrix calculations simpler. SVD is the basic concept behind the Latent Semantic Analysis (LSA) technique. LSA is used in topic modeling, an unsupervised ML technique that matches words to topics across various text documents. In NLP, topics are represented as clusters of related words (synonyms). A topic model analyzes the various topics, their distributions in each document, and the frequency of the different words they contain.

Steps of Topic Modeling:

  1. Create a matrix representation of a text document.
  2. Use SVD to break the text matrix into three sub-matrices: a document-topic matrix, a topic-importance diagonal matrix, and a topic-term matrix.
  3. Reduce the dimensions of the matrices based on the importance of the topics.

These smaller matrices are then used to match words to topics and create a distribution of words and topics, as the sketch below illustrates.
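
As an illustrative sketch, scikit-learn’s TruncatedSVD can carry out these steps on a small, made-up corpus:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
]

# Step 1: a matrix representation of the text (documents x terms).
X = CountVectorizer().fit_transform(documents)

# Steps 2-3: SVD, keeping only the 2 most important topics.
svd = TruncatedSVD(n_components=2)
doc_topic = svd.fit_transform(X)  # document-topic matrix
print(doc_topic.round(2))         # each row: one document's mix of the 2 topics
```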
