Table of content:


Maths are the foundation of science

Mathematics are the foundation of science, you may need to review below areas if you really want to be an expert of a data scientist. There are so many easy to use packages, libraries, examples to help us on machine learning, natural language processing development. The real question is: what is the right model / algorithm I have to choose? How do you know the algorithm, model fit into your questions?

The answer is to know about the assumption of each model and algorithm. The knowledge about math in below areas will definitely give you a hand to fully understand those assumptions. You will not only know how but also why you choose the model to deal with your own questions.

There are suggesting topics you may need know now (or maybe in the future :) ).

1. Calculus

1-0. Reference material: Machine Learning Cheatsheet - Calculus

1-1. Limits

The foundation of differential and integral calculus.
1-2. Taylor Series

Taylor series is a method to approximate a function by polynomial. In some of machine learning algorithms, we may need exponential function as target function due to its nice properties (can be differential in everywhere). Due to the complexities of differential calculus on exponential function, we can use Taylor series to approaching the similar result base on specific point. When you understand the background technique of those algorithms, you will understand why learning rate has to be a small number (or steps).

1-3. Differential calculus

How to get the optimal(minimum or maximum) value in a function? What is Gradient Decent optimizer? It’s all about differential calculus.

1-4. Integral calculus.

Probabilistic modeling is one of most important models in machine learning. Integral calculus help us to get the expectation of our models.

2. Probability and Statistics

2-1. Probabilities and Expectations

Gaussian model, Bayes theory, Naive Bayes, Markov Chain, Hidden Markov Model, Viterbi Algorithm…etc. All of those models related to probability and statistics.

2-1-1. Review of Probability Theory at Stanford CS229 machine learning

2-1-2. Distributions and Tests

You will need these tools to make sure your data distribution is the same as your assumption.

3. Linear Algebra / Discrete Mathematics

3-1 Linear Algebra Review and Reference at Stanford CS229 machine learning 3-2 Deep Learning Book Series · Introduction

Most of machine learning algorithm involved high dimensional computations. A single dimensional array is a vector, a two dimensional array calls a matrix, a three or higher dimension array calls tensor. The linear algebra helps us to efficiently calculate high dimensional operations in an easy form. GPU is designed to perform 3D computer graphics and its hardware also help on deep learning high dimensional computation.