Wednesday 22 June 2022

MathWithNaziaa : Mathematics in Data Science

The big three in data science

When you Google for the math requirements for data science, the three topics that consistently come up are calculus, linear algebra, and statistics. The good news is that — for most data science positions — the only kind of math you need to become intimately familiar with is statistics.

Calculus

Multivariate calculus is used for gradient descent and in algorithm training. You will study derivatives, curvature, divergence, and quadratic approximations.  

For many people with traumatic experiences of mathematics from high school or college, the thought that they’ll have to re-learn calculus is a real obstacle to becoming a data scientist.

In practice, while many elements of data science depend on calculus, you may not need to (re)learn as much as you might expect. For most data scientists, it’s only really important to understand the principles of calculus, and how those principles might affect your models. 

If you understand that the derivative of a function returns its rate of change, for example, then it’ll make sense that the rate of change trends toward zero as the graph of the function flattens out. 
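To make that concrete, here is a minimal sketch that estimates a derivative numerically. The function f(x) = x**2, the helper name derivative, and the step size h are all assumptions picked for illustration; they are not from any particular library.

```python
def derivative(f, x, h=1e-6):
    """Approximate the rate of change of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    # Illustrative function: a bowl-shaped curve that is flat at x = 0.
    return x ** 2


points = (2.0, 1.0, 0.5, 0.0)
slopes = [derivative(f, x) for x in points]

for x, slope in zip(points, slopes):
    print(f"x = {x}: slope is about {slope:.4f}")
```

As x moves toward 0, where the graph flattens out, the estimated slope shrinks toward zero.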

 

That, in turn, will allow you to understand how gradient descent works by finding a local minimum of a function. It also makes clear that traditional gradient descent only works well for functions with a single minimum. If the function has multiple minima (or saddle points), gradient descent might settle on a local minimum without finding the global minimum unless you start from multiple points. 
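The idea can be sketched in a few lines of Python. Everything here is an assumption chosen for illustration: the function x**4 - 3*x**2 + x (which happens to have two minima), the learning rate, the step count, and the starting points.

```python
def f(x):
    # Illustrative function with a global minimum near x = -1.30
    # and a shallower local minimum near x = 1.13.
    return x ** 4 - 3 * x ** 2 + x


def gradient(x):
    # Derivative of f, worked out by hand.
    return 4 * x ** 3 - 6 * x + 1


def descend(start, learning_rate=0.01, steps=1000):
    """Plain gradient descent: keep stepping against the slope."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# A single start can get stuck in the shallower valley.
single_run = descend(2.0)

# Starting from several points and keeping the lowest result
# gives a better chance of finding the global minimum.
starts = [-2.0, -0.5, 0.5, 2.0]
best = min((descend(s) for s in starts), key=f)

print(f"single start at 2.0 ends near x = {single_run:.3f}")
print(f"best of several starts ends near x = {best:.3f}")
```

Starting only from x = 2.0 settles in the local minimum, while trying several starting points and keeping the lowest value finds the deeper one.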

 

Now, if it’s been a while since you did high school math, the last few sentences might sound a little dense. But the good news is that you can learn all of these principles in under an hour (look out for a future article on the topic!). And it’s way less difficult than being able to algebraically solve a differential equation, which (as a practicing data scientist) you’ll probably never have to do — that’s what we have computers and numerical approximations for!

Linear algebra

Knowing how to build linear equations is a critical component of machine learning algorithm development. You will use these to examine and observe data sets. For machine learning, linear algebra is used in loss functions, regularization, covariance matrices, and support vector machine classification.


If you’re doing data science, your computer is going to be using linear algebra to perform many of the required calculations efficiently. If you perform a Principal Component Analysis to reduce the dimensionality of your data, you’ll be using linear algebra. If you’re working with neural networks, the representation and processing of the network is also going to be performed using linear algebra. In fact, it’s hard to think of many models that aren’t implemented using linear algebra under the hood for the calculations.
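As a rough illustration of the linear algebra behind Principal Component Analysis, the sketch below finds the first principal component of a tiny two-column data set. The data values and helper names are made up for this example, and it uses power iteration on the covariance matrix, which is one simple way to do it rather than what any particular library does.

```python
def mean(values):
    return sum(values) / len(values)


def covariance_matrix(xs, ys):
    """2x2 sample covariance matrix of two columns of data."""
    mx, my = mean(xs), mean(ys)
    n = len(xs) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return [[sxx, sxy], [sxy, syy]]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_principal_component(cov, steps=100):
    """Power iteration: repeatedly multiply and normalise a vector."""
    v = [1.0, 0.0]
    for _ in range(steps):
        w = mat_vec(cov, v)
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


# Made-up data where y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

cov = covariance_matrix(xs, ys)
pc1 = first_principal_component(cov)

# Project each centred point onto the component: two dimensions become one.
mx, my = mean(xs), mean(ys)
projected = [(x - mx) * pc1[0] + (y - my) * pc1[1] for x, y in zip(xs, ys)]

print("first principal component:", pc1)
print("one-dimensional projection:", projected)
```

The component points along the direction in which the data varies most, which for this data is close to the line y = 2x.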

At the same time, it’s very unlikely that you’re going to be handwriting code to apply transformations to matrices when applying existing models to your particular data set. So, again, understanding of the principles will be important, but you don’t need to be a linear algebra guru to model most problems effectively.

Probability and statistics

The bad news is that this is a domain you’re really going to have to learn. And if you don’t have a strong background in probability and statistics, learning enough to become a practicing data scientist is going to take a significant chunk of time. The good news is that there is no single concept in this field that’s super difficult — you just need to take the time to really internalize the basics and then build from there.

This is essential in machine learning when working with classification methods such as logistic regression and discriminant analysis. It is just as critical for hypothesis testing and for working with distributions, such as the Gaussian distribution and its probability density function. 
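Here is a small sketch of a hypothesis test built on the Gaussian distribution, using NormalDist from Python’s standard statistics module. The scenario and every number in it (the hypothesized mean, the standard deviation, the sample size, and the sample mean) are invented for illustration, and it assumes the population standard deviation is known.

```python
from statistics import NormalDist

# Hypothetical numbers: we believe the true mean is 100 with a known
# standard deviation of 15, and a sample of 36 values averaged 106.
hypothesized_mean = 100.0
known_sigma = 15.0
sample_size = 36
sample_mean = 106.0

# z-score: how many standard errors the sample mean sits from the hypothesis.
standard_error = known_sigma / sample_size ** 0.5
z = (sample_mean - hypothesized_mean) / standard_error

# Two-sided p-value from the standard Gaussian distribution.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
p_value = 2 * (1 - standard_normal.cdf(abs(z)))

# Height of the probability density function at the observed z-score.
density_at_z = standard_normal.pdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}, density = {density_at_z:.4f}")
```

A small p-value means a sample mean this far from 100 would be unlikely if the hypothesis were true.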

Even more math

There are lots of other types of math that may also help you when thinking about how to solve a data science problem. They include:

Discrete math

This isn’t math that won’t blab. Rather, it’s mathematics dealing with distinct, separate values, which includes numbers stored with finite precision. In continuous math, you are often working with functions that could (at least theoretically) be calculated for any possible set of values and with any necessary degree of precision.

As soon as you start to use computers for math, you’re in the world of discrete mathematics because each number only has so many “bits” available to represent it. There are a number of principles from discrete math that will both serve as constraints and inspiration for approaches to solving problems.
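A quick way to see the limited number of bits in action is to add two decimal fractions in Python. This is a minimal sketch, and the choice of 0.1 and 0.2 is simply the classic example.

```python
import math

total = 0.1 + 0.2

# Neither 0.1 nor 0.2 can be stored exactly in binary floating point,
# so the sum is very slightly off from 0.3.
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False

# Comparing with a tolerance is the usual way to work within this constraint.
print(math.isclose(total, 0.3))  # True
```

This is why numeric code usually compares floating point values with a tolerance rather than with exact equality.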

Graph theory

Certain classes of problems can be solved using graph theory. Whether you’re looking to optimize routes for a shipping system or building a fraud detection system, a graph-based approach will sometimes outperform other solutions.
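For a taste of the route optimization case, here is a minimal sketch of Dijkstra’s shortest path algorithm using Python’s heapq module. The tiny shipping network, its stop names, and its distances are all hypothetical.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest distance from source to each reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return dist


# Hypothetical network: each key maps a stop to its neighbours and distances.
routes = {
    "warehouse": {"A": 4, "B": 1},
    "B": {"A": 2, "C": 5},
    "A": {"C": 1},
    "C": {},
}

distances = shortest_distances(routes, "warehouse")
print(distances)
```

Here the direct road from the warehouse to A costs 4, but going through B costs only 3, and the algorithm picks that up.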

Information theory

You’re going to bump up along the edges of information theory pretty often while learning data science. Whether you’re optimizing the information gain when building a decision tree or maximizing the information retained using Principal Component Analysis, information theory is at the heart of many optimizations used for data science models.
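The information gain used when building a decision tree can be sketched in a few lines. The labels and the split below are made up for illustration, and the helper names are assumptions rather than any library’s API.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Made-up example: ten labels, split into two groups by some feature.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
print(f"information gain of this split: {gain:.3f} bits")
```

A decision tree algorithm would compute this for each candidate split and prefer the one with the highest gain.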

Mathematics is an integral part of data science. Any practicing data scientist or person interested in building a career in data science will need to have a strong background in specific mathematical fields. 

Depending on the data science role you choose, most organizations will expect a B.A., M.A., or Ph.D. degree before they hire you. A significant portion of your ability to apply your data science skills to real-world scenarios depends on how well you understand mathematics.

Data science careers require mathematical study because building machine learning algorithms, performing analyses, and discovering insights from data all require math. Math will not be the only requirement for your educational and career path in data science, but it’s often one of the most important. Identifying and understanding business challenges and translating them into mathematical ones is widely considered one of the most important steps in a data scientist’s workflow.

Will you be a data scientist, machine learning engineer, business intelligence developer, data architect, or another industry specialist? Maybe you don’t yet know the exact path you will take in your data science career. But take a look at the various types of mathematical requirements and what they are used for in data science. You will have a better understanding of your skills and interests and can ultimately better pursue your choice of mathematical education. 

Taking a look at the different types of math used in data science, as this article has done, should give you a better idea of what you really need to know when it comes to mathematics and your data science career.

