Machine Learning
Machine learning is, without a doubt, the best-known application of artificial intelligence (AI). The main idea behind machine learning is giving systems the ability to learn and improve from experience automatically, without being explicitly programmed to do so. Machine learning works by building programs that have access to data (static or continuously updated) which they analyze, find patterns in, and learn from. Once the programs discover relationships in the data, they apply this knowledge to new sets of data.
Linear algebra has several applications in machine learning, such as loss functions, regularization, support vector classification, and much more. In this section, however, we will only cover linear algebra in loss functions.
Loss Function
Machine learning algorithms work by collecting data, analyzing it, and then building a model using one of many approaches (linear regression, logistic regression, decision trees, random forests, etc.). Based on the resulting model, they can then make predictions for new data queries.
How can you measure the accuracy of your prediction model?
Using linear algebra, in particular, loss functions. A loss function is a method of evaluating how accurate your prediction model is: will it perform well on new datasets? If your model is totally off, the loss function outputs a high number; if the model is a good one, the loss function outputs a low one.
Regression is modeling the relationship between a dependent variable, Y, and several independent variables, the Xi's. Once this relationship is plotted, we try to fit a line through the data points, and we then use that line to predict Y for new values of the Xi's.
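As a quick sketch of such a line fit (the data points below are made up for illustration), NumPy's least-squares polynomial fit can estimate the slope and intercept and predict new values:

```python
import numpy as np

# Hypothetical observed data: one independent variable X and a dependent variable Y
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Fit a straight line Y ~ slope * X + intercept (ordinary least squares)
slope, intercept = np.polyfit(X, Y, deg=1)

# Use the fitted line to predict Y for new values of X
X_new = np.array([6.0, 7.0])
Y_pred = slope * X_new + intercept
print(slope, intercept, Y_pred)
```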
There are many types of loss functions, some more complicated than others; however, the two most commonly used are the Mean Squared Error and the Mean Absolute Error.
- Mean Squared Error
Mean Squared Error (MSE) is probably the most widely used loss function; it is easy to understand and implement and generally works quite well for most regression problems. Most Python libraries used in data science, such as NumPy, scikit-learn, and TensorFlow, have their own built-in implementation of MSE. Despite that, they all compute the same equation:

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2

where N is the number of data points and y_i and \hat{y}_i are the observed and predicted values, respectively.
Steps of calculating the MSE (a short NumPy sketch follows the list):
- Calculate the difference between each pair of observed and predicted values.
- Square each difference.
- Add the squared differences together to find the cumulative value.
- Divide the cumulative sum by the number of data points, N, to get the average error.
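Here is a minimal NumPy sketch of these four steps; the observed and predicted arrays are made up for illustration:

```python
import numpy as np

# Hypothetical observed and predicted values
observed  = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5,  0.0, 2.0, 8.0])

# 1. Difference between each observed/predicted pair
diff = observed - predicted
# 2. Square the differences
squared_diff = diff ** 2
# 3. Sum the squared differences
cumulative = squared_diff.sum()
# 4. Divide by the number of data points to get the mean
mse = cumulative / len(observed)

print(mse)  # 0.375
```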
- Mean Absolute Error
The Mean Absolute Error (MAE) is quite similar to the MSE; the difference is that we take the absolute difference between the observed values and the predicted ones rather than the squared difference. The MAE cost is more robust to outliers than the MSE. However, a disadvantage of MAE is that the absolute value (modulus) operator is not easy to handle in mathematical equations, since it is not differentiable at zero. Yet, MAE is the most intuitive of all the loss function calculation methods.
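The corresponding formula is MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|. Below is a minimal sketch reusing the same made-up arrays as in the MSE example; scikit-learn's built-in mean_absolute_error gives the same result:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

observed  = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5,  0.0, 2.0, 8.0])

# Absolute differences instead of squared ones, then the mean
mae = np.abs(observed - predicted).mean()

print(mae)                                       # 0.5
print(mean_absolute_error(observed, predicted))  # 0.5
```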
Computer Vision
Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world using images, videos, and deep learning models. Doing so allows algorithms to accurately identify and classify objects in visual data; in other words, they become able to "see".
In computer vision, linear algebra is used in applications such as image recognition, some image processing techniques including image convolution, and image representation as tensors (or, as we call them in linear algebra, vectors and matrices).
Image Convolution
I know what you're thinking: "convolution" sounds like a very advanced term, so it must be something complex. The truth is, it really isn't. Whether you have done some computer vision before or not, I am quite sure you have either performed an image convolution or seen one. Have you ever blurred or smoothed an image? That's convolution!
Convolutions are one of the fundamental building blocks in computer vision in general and image processing in particular. To put it simply, a convolution is an element-wise multiplication of two matrices followed by a sum. In image processing applications, images are represented by a multi-dimensional array: multi-dimensional because it has rows and columns representing the pixels of the image, as well as an extra dimension for the color data. For example, RGB images have a depth of 3, used to describe each pixel's corresponding red, green, and blue intensities.
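As a tiny illustration of this representation (the dimensions are arbitrary), an RGB image is just a three-dimensional NumPy array:

```python
import numpy as np

# A hypothetical 480x640 RGB image: rows x columns x color channels
image = np.zeros((480, 640, 3), dtype=np.uint8)

# Set one pixel to pure red: (R, G, B) = (255, 0, 0)
image[100, 200] = [255, 0, 0]

print(image.shape)  # (480, 640, 3)
```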
One way of thinking about image convolution is to consider the image as a big matrix and the kernel (the convolutional matrix) as a small matrix used for blurring, sharpening, edge detection, or any other image processing function. The kernel passes over the image, sliding from left to right and top to bottom. While doing so, it applies a mathematical operation (the element-wise multiplication and sum described above) at each (x, y) coordinate of the image to produce the convolved image.
Different kernels perform different types of image convolution. Kernels are almost always small square matrices; 3x3 is the most common size, although larger sizes such as 5x5 or 7x7 are also used depending on the application.
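Below is a minimal sketch of a convolution, assuming NumPy and SciPy are available; the image values and the 3x3 box-blur kernel are made up for illustration:

```python
import numpy as np
from scipy.signal import convolve2d

# A hypothetical small grayscale image
image = np.array([
    [10, 10, 10, 10, 10],
    [10, 50, 50, 50, 10],
    [10, 50, 90, 50, 10],
    [10, 50, 50, 50, 10],
    [10, 10, 10, 10, 10],
], dtype=float)

# 3x3 box-blur kernel: every output pixel becomes the average of its neighborhood
kernel = np.ones((3, 3)) / 9.0

# Slide the kernel over the image; each step is an element-wise
# multiplication of the overlapping values followed by a sum
blurred = convolve2d(image, kernel, mode="same", boundary="symm")
print(blurred.round(1))
```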
Word Embedding
Computers can't understand text data; that's why, to perform any NLP technique on text, we need to represent the text data numerically. Here's where linear algebra comes in! Word embedding is a type of word representation that gives words with similar meanings similar numerical representations, so machine learning algorithms can work with them.
Okay, where's the math?
Word embeddings are a way of representing words as vectors of numbers while preserving their context in the document. These representations are obtained by training neural networks on a large amount of text called a corpus. It is a language modeling learning technique. One of the most used word embedding techniques is called word2vec.
Word2vec is a technique for producing word embeddings that give better word representations. It does so by capturing a large number of precise syntactic and semantic word relationships. Word2Vec learns a word's meaning by checking its surrounding context. Word2Vec utilizes two methods (a short sketch follows the list below):
1. Continuous bag of words (CBOW): predicts the current word given the context words within a specific window.
2. Skip-gram: predicts the surrounding context words within a specific window given the current word.
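Here is a minimal sketch assuming the gensim library is available (the toy corpus and parameter values are made up; the vector_size and sg arguments follow gensim 4.x):

```python
from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens
corpus = [
    ["linear", "algebra", "powers", "machine", "learning"],
    ["word", "embeddings", "represent", "words", "as", "vectors"],
    ["vectors", "preserve", "the", "context", "of", "words"],
]

# sg=0 -> continuous bag of words (CBOW); sg=1 -> skip-gram
cbow_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
skipgram_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

# Each word is now a 50-dimensional vector
print(cbow_model.wv["vectors"].shape)              # (50,)
print(skipgram_model.wv.most_similar("words", topn=3))
```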
Dimensionality Reduction
We live today in a world surrounded by massive amounts of data, data that needs to be processed, analyzed, and stored. This data often has a large number of features; if we are talking about image data, it might be a high-resolution image or video, which translates to huge matrices of numbers. Dealing with big matrices of numbers can be a challenging task even for supercomputers. That's why we sometimes reduce the original data to a smaller subset of features, the ones that are most relevant to our application.
The most famous approach to reducing the dimensions of data is called Singular-Value Decomposition (SVD). SVD is a matrix decomposition method used for reducing a matrix to its essential parts in order to make matrix calculations simpler. SVD is the basic concept behind the Latent Semantic Analysis (LSA) technique. LSA is used in topic modeling, an unsupervised ML technique that matches words to topics across various text documents. In NLP, topics are represented as clusters of synonyms and related words. A topic model analyzes the various topics, their distributions in each document, and the frequency of the different words they contain.
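As a small illustration of the decomposition itself (the matrix values are made up), NumPy's np.linalg.svd splits a matrix into its factors, and keeping only the largest singular values gives a low-rank approximation:

```python
import numpy as np

# A small made-up matrix standing in for some data
A = np.array([
    [4.0, 0.0, 2.0],
    [3.0, 5.0, 1.0],
    [0.0, 2.0, 6.0],
    [1.0, 1.0, 1.0],
])

# Decompose A into U, the singular values, and V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values for a low-rank approximation
k = 2
A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(s)          # singular values, largest first
print(A_approx)   # rank-2 approximation of A
```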
Steps of Topic Modeling:
1. Create a matrix representation of the text documents.
2. Use SVD to break the text matrix into three sub-matrices: a document-topic matrix, a diagonal matrix of topic importances, and a topic-term matrix.
3. Reduce the dimensions of the matrices based on the importance of the topics.
These smaller matrices are then used to match words to topics and create a distribution of words and topics.
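Here is a minimal sketch of these steps, assuming scikit-learn is available; the toy documents and the choice of two topics are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# A toy corpus of short "documents"
documents = [
    "cats and dogs are popular pets",
    "dogs love to play fetch in the park",
    "stock markets fell sharply on Monday",
    "investors worry about falling stock prices",
]

# Step 1: matrix representation of the text (documents x terms)
vectorizer = TfidfVectorizer()
doc_term_matrix = vectorizer.fit_transform(documents)

# Steps 2-3: SVD truncated to 2 topics (document-topic and topic-term factors)
svd = TruncatedSVD(n_components=2)
doc_topic = svd.fit_transform(doc_term_matrix)   # document-topic matrix
topic_term = svd.components_                     # topic-term matrix

print(doc_topic.shape)   # (4, 2)
print(topic_term.shape)  # (2, number_of_terms)
```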