This repository contains examples of popular machine learning algorithms implemented in Python, with the mathematics behind them explained.

For the Octave/MatLab version of the machine learning algorithms, please check the Machine Learning Course in Octave / MatLab repository.

For deep learning algorithms, please check the Deep Learning repository.

For Natural Language Processing (NLP = NLU + NLG), please check the Natural Language Processing repository.

For Computer Vision, please check the Computer Vision repository.

Machine Learning Map
Courses
Online Courses (Coursera, Udacity, edX, DataCamp, etc.)
YouTube Videos
Books
Conferences
- International
- North America
- Europe
- Ukraine
Websites
GitHub Repositories :octocat:
Awesome List :octocat:
Other
Big Data
Neural Networks
Reinforcement Learning
- Books
- Classes
- Task / Tutorials
- GitHub Repositories
Mathematics for AI, ML, DL, CV
- Linear Algebra
- Theory of Probability and Mathematical Statistics
- Bayesian Statistics
- Causal Inference
Algorithms
Machine Learning System Design
Python, IPython, Scikit-learn etc.
Code editors
JavaScript-libraries for visualizing
R
LaTeX
Open Datasets list
Reddit
Social Networks (channels, chats, groups, etc.)
Implementation
- Supervised Learning
  - Classification
    - K Nearest Neighbor, K-NN
    - Logistic Regression
  - Regression
    - Linear Regression
    - Polynomial Regression
- Weak Supervision
- Active Learning
- Unsupervised Learning
  - Clustering
    - K-Means
- Neural Networks
  - CNN
  - FCNN
  - GRU
What is the difference between the train, validation and test sets in neural networks?
Projects
- Spam Detection
- Text Generator
- Quora Insincere Questions Classification
- Question Answering System using BiDAF Model on SQuAD

Machine Learning Map

🎓 Courses

MIT OpenCourseWare
- MIT OpenCourseWare on YouTube
Top 50 FREE Artificial Intelligence, Computer Science, Engineering and Programming Courses from the Ivy League Universities
This is The Entire Computer Science Curriculum in 1000 YouTube Videos
Machine Learning course by Vorontsov
“Machine Learning” course at the HSE Faculty of Computer Science, Evgeny Sokolov
Data Mining
Deep Learning LUN Courses
A ton of diverse courses on programming and algorithms, including 14 courses on ML
Machine Learning Foundations

Machine Learning Foundations: Linear Algebra, Calculus, Statistics & Computer Science

🔹 Introductory Lectures:

These are great courses to get started in machine learning and AI. No prior experience in ML and AI is needed. You should have some knowledge of linear algebra, introductory calculus and probability. Some programming experience is also recommended.

Machine Learning (Stanford CS229)
- Course website
- This modern classic of machine learning courses is a great starting point to understand the concepts and techniques of machine learning. The course covers many widely used techniques. The lecture notes are detailed and review the necessary mathematical concepts.
CS 329S: Machine Learning Systems Design by Stanford, Winter 2021
- This course aims to provide an iterative framework for designing real-world machine learning systems. The goal of this framework is to build a system that is deployable, reliable, and scalable.
- Machine Learning Interviews
Convolutional Neural Networks for Visual Recognition (Stanford CS231n)
- Course website
- (:octocat: repo on GitHub): an excellent ten-week course on neural networks and computer vision.
- A great way to start with deep learning. The course focuses on convolutional neural networks and computer vision, but also gives an overview of recurrent networks and reinforcement learning.
Introduction to Artificial Intelligence (UC Berkeley CS188)
- Course website
- Covers the whole field of AI, from search methods, game trees and machine learning to Bayesian networks and reinforcement learning.
Applied Machine Learning 2020 (Columbia)
- An alternative to Stanford CS229. As the name implies, this course takes a more applied perspective than Andrew Ng’s machine learning lecture at Stanford. You will see more code than mathematics. Concepts and algorithms are demonstrated using the popular Python libraries scikit-learn and Keras.
Introduction to Reinforcement learning with David Silver (DeepMind)
- UCL Course on Reinforcement Learning by David Silver
- Course website
- Introduction to reinforcement learning by one of the leading researchers behind AlphaGo and AlphaZero.
Introduction to Deep Learning (MIT 6.S191)
- MIT’s official introductory course on deep learning methods with applications in medicine, and more!
Natural Language Processing with Deep Learning (Stanford CS224N)
- Course website
- Modern NLP techniques, from recurrent neural networks and word embeddings to transformers and self-attention. Covers applied topics like question answering and text generation.
Machine Learning at MIPT
- This course aims to introduce students to the modern state of Machine Learning and Artificial Intelligence. It is designed to take one year (two terms at MIPT), approximately 2 × 15 lectures and seminars.

🔸 Advanced Lectures:

Advanced courses that require prior knowledge in machine learning and AI.

🔹 Online Courses

🟥 YouTube

TensorFlow: Coding TensorFlow playlist
Google Cloud Platform: AI Adventures
3Blue1Brown channel
Grammarly AI-NLP Club playlist
Lviv Data Science Summer School 2020 lectures
Samsung AI Innovation Campus - Russia
Machine Learning University, on GitHub

📚 Books

Conferences

International

North America

MAIS, Montreal AI Symposium
Computer Vision:
Natural Language Processing:
- NAACL, North American Chapter of the Association for Computational Linguistics

Europe

Ukraine

▶️ Websites

Talking Machines
Made with ML - Join 20K+ developers in learning how to responsibly deliver value with applied ML.
Laconic Machine Learning
Towards AI
- Tutorials
  - AI-related tutorials.

:octocat: GitHub Repositories

Title	Description, Information
Top-down learning path: Machine Learning for Software Engineers
100-Days-Of-ML-Code
ml-course-msu	Repository with lecture notes, code and other materials for the machine learning seminars at the MSU Faculty of Computational Mathematics and Cybernetics (VMK)
100-best-github-machine-learning
awesome-machine-learning
trekhleb, homemade-machine-learning	Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained
trekhleb, machine-learning-experiments	Interactive Machine Learning experiments: models training + models demo
trekhleb, machine-learning-octave	MatLab/Octave examples of popular machine learning algorithms with code examples and mathematics being explained
Machine Learning Notebooks	A collection of Machine Learning fundamentals and useful python notebooks by Diego Inácio
Open Source Society University’s Data Science course	This is a solid path for those of you who want to complete a Data Science course on your own time, for free, with courses from the best universities in the world
data-science-blogs
Dive into Machine Learning	(:octocat: repo on github) with Python Jupyter notebook and scikit-learn
	Recommendations from the instructors of the “Mathematics and Python” course and specialization
Reading list for admission to the Yandex School of Data Analysis (ShAD)
Machine learning cheat sheet	- soulmachine (2015)
Probabilistic Programming and Bayesian Methods for Hackers	(free)
ml-surveys	Survey papers summarizing advances in deep learning, NLP, CV, graphs, reinforcement learning, recommendations, etc.
Machine_Learning_and_Deep_Learning	Getting started with Machine Learning and Deep Learning
MachineLearning_DeepLearning	Share about Machine Learning and Deep Learning
Machine Learning Guide	A guide covering Machine Learning including the applications, libraries and tools that will make you better and more efficient with Machine Learning development.

Awesome List

📌 Other

Big Data

Books:
- Big Data Fundamentals: Concepts, Drivers & Techniques by Thomas Erl, Wajid Khattak
- Big Data: Principles and best practices of scalable realtime data systems by Nathan Marz, James Warren
Courses:
- Coursera:
  - Big Data Specialization
Data Engineer VS Data Scientist:
- Data Scientist vs Data Engineer
- Data Engineer VS Data Scientist

Neural Networks

Reinforcement Learning

Books:
- Reinforcement Learning: An Introduction (2nd Edition)
Classes:
- David Silver’s Reinforcement Learning Course (UCL, 2015)
  - Introduction to Reinforcement learning with David Silver (DeepMind)
  - Course website
  - Introduction to reinforcement learning by one of the leading researchers behind AlphaGo and AlphaZero.
- CS294 - Deep Reinforcement Learning (Berkeley, Fall 2015)
- CS 8803 - Reinforcement Learning (Georgia Tech)
- CS885 - Reinforcement Learning (UWaterloo), Spring 2018
- CS294-112 - Deep Reinforcement Learning (UC Berkeley)
Task / Tutorials:
GitHub Repositories:
- dennybritz/reinforcement-learning, Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton’s Book and David Silver’s course.
Deep Reinforcement Learning, Decision Making and Control (UC Berkeley CS285)
New Directions in Reinforcement Learning and Control (Institute for Advanced Study)

Mathematics for AI, ML, DL, CV

Linear Algebra

Theory of Probability and Mathematical Statistics

Bayesian Statistics

Files from lecture:
- Code

Bayesian statistics and related books:

C.P. Robert: The Bayesian choice (advanced)
Gelman, Carlin, Stern, Rubin: Bayesian data analysis (nice easy older book)
Congdon: Applied Bayesian modelling; Bayesian statistical modelling (relatively nice books for references)
Casella, Robert: Introducing Monte Carlo methods with R (nice book about MCMC)
Robert, Casella: Monte Carlo Statistical Methods
Some parts of Bishop: Pattern recognition and machine learning (very nice book for engineers)
Kruschke: Doing Bayesian Data Analysis (the “puppy book”)

Causal Inference

Correlation does not imply causation
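A small simulation can make this concrete. The sketch below assumes a hypothetical confounder Z (say, daily temperature) that drives both X (ice-cream sales) and Y (drownings), while X has no effect on Y at all. The raw correlation between X and Y comes out strong, but it disappears once the linear effect of Z is regressed out of both variables. All names and coefficients are illustrative assumptions.

```python
import random

def simulate(n=10_000, seed=0):
    """Hypothetical data: Z drives both X and Y; X has no causal effect on Y."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)            # confounder, e.g. daily temperature
        x = 2.0 * z + rng.gauss(0, 1)  # e.g. ice-cream sales: depends only on z
        y = 1.5 * z + rng.gauss(0, 1)  # e.g. drownings: depends only on z
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(v, z):
    """Remove the linear effect of z from v (one-variable least squares)."""
    mv, mz = mean(v), mean(z)
    slope = sum((p - mv) * (q - mz) for p, q in zip(v, z)) / sum((q - mz) ** 2 for q in z)
    return [p - mv - slope * (q - mz) for p, q in zip(v, z)]

xs, ys, zs = simulate()
raw = corr(xs, ys)                                    # strongly positive (about 0.74 in theory)
partial = corr(residuals(xs, zs), residuals(ys, zs))  # close to 0 once Z is controlled for
print(round(raw, 2), round(partial, 2))
```

Banning ice cream would not reduce drownings here; only the shared cause Z links them.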

More online lectures, courses, papers, books, etc. on Causality:

Coursera:
- A Crash Course in Causality: Inferring Causal Effects from Observational Data
Powerful Concepts in Social Science playlists, Duke
4 lectures on causality by J. Peters (8 h), MIT Statistics and Data Science Center, 2017
Causality tutorial by D. Janzing and S. Weichwald (4 h), Conference on Cognitive Computational Neuroscience 2019
Course on causality by S. Bauer and B. Schölkopf (3 h), Machine Learning Summer School 2020
Course on causality by D. Janzing and B. Schölkopf (3 h), Machine Learning Summer School 2013
Causal Inference 3: Counterfactuals
Causality for Machine Learning, Bernhard Schölkopf, 2019
Elements of Causal Inference
Causal Structure Learning, Christina Heinze-Deml, Marloes H. Maathuis, Nicolai Meinshausen, 2017
Causal inference in statistics: An overview, 2009
Judea Pearl, Madelyn Glymour, Nicholas P. Jewell: Causal Inference in Statistics: A Primer
Judea Pearl: Causality, 2nd Edition, 2009
Causation, Prediction, and Search, Second Edition
Learning DAGs with Continuous Optimization
Causality in cognitive neuroscience: concepts, challenges, and distributional robustness
Active Invariant Causal Prediction: Experiment Selection through Stability, Juan L Gamella, Christina Heinze-Deml, 2020
Investigating Causal Relations by Econometric Models and Cross-spectral Methods, 1969
Fast Greedy Equivalence Search (FGES) Algorithm for Continuous Variables
Greedy Fast Causal Inference (GFCI) Algorithm for Continuous Variables
awesome-causality-algorithms

Causal Machine Learning (Papers):

Causal Decision Trees, Jiuyong Li, Saisai Ma, Thuc Duy Le, Lin Liu and Jixue Liu, 2015
Discovery of Causal Rules Using Partial Association, 2012
Causal Inference in Data Science: From Prediction to Causation, 2016

Experimental designs for causal learning:

Matching
Incident user design
Active comparator
Instrumental variables estimation
Difference-in-differences
Regression discontinuity design
Modeling
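As one example from the list above, the difference-in-differences estimate compares the before/after change in a treated group with the change in a control group over the same period. A minimal sketch, assuming made-up group measurements and the usual parallel-trends assumption:

```python
def _mean(values):
    return sum(values) / len(values)

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences estimate from group means.

    Relies on parallel trends: absent treatment, the treated group's mean
    would have changed by the same amount as the control group's mean.
    """
    treated_change = _mean(treat_post) - _mean(treat_pre)
    control_change = _mean(ctrl_post) - _mean(ctrl_pre)
    return treated_change - control_change

# Hypothetical outcomes (e.g. weekly sales) before and after an intervention.
effect = diff_in_diff(
    treat_pre=[10, 12, 11], treat_post=[18, 20, 19],
    ctrl_pre=[9, 11, 10], ctrl_post=[12, 14, 13],
)
print(effect)  # 5.0
```

The treated group rose by 8 while the control group rose by 3, so the change attributed to the intervention is 8 - 3 = 5, provided parallel trends hold.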

Algorithms

Data Structures and Algorithms – a specialization on Coursera
MIT 6.046J Introduction to Algorithms – teaches techniques for the design and analysis of efficient algorithms, emphasizing methods useful in practice
Visualizing Algorithms
Algorithm implementations

Machine Learning System Design

Machine Learning System Design

Deploy Machine Learning Model to Production

API:
- FastAPI
  - How to deploy Machine Learning models as a Microservice using FastAPI
  - Why should you try FastAPI?

Python, IPython, Scikit-learn etc.

Code editors

PyCharm by JetBrains - a serious IDE for large projects
Spyder – the Scientific PYthon Development EnviRonment. Spyder is included in Anaconda (just type spyder on the command line)
Canopy - scientific and analytic Python deployment with an integrated analysis environment (recommended in the MITx course)
Rodeo - a data science IDE for Python
Jupyter – open source, interactive data science and scientific computing across over 40 programming languages. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text
nbviewer – renders notebooks available on other websites
Sublime Text 3 - the Vim of the 21st century, well suited to Python when used together with these plugins:
- Package Control - for fast and convenient management of add-ons
- Git - for working with git
- Jedi - makes Python autocompletion smarter and deeper
- SublimeREPL - runs a read-eval-print loop (REPL) in an adjacent tab, convenient for step-by-step debugging
- Auto-PEP8 - brings code into line with the PEP 8 style guide
- Python Checker - code checking
PyCharm vs Sublime Text – a blog post comparing these two popular development tools and text editors.
PEP 0008 – Style Guide for Python Code.

JavaScript-libraries for visualizing

R

R in Action
Basic Statistics – this course has practical assignments in R;
Data Analysis in R through Examples and Problems – video lectures from the Computer Science Center;
Advanced R by Hadley Wickham – an online book for those who want to improve their R programming skills and understand the language better (including programmers coming from other languages);
A detailed list of online courses on R;
Machine Learning in R (:octocat: github repo) – Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions;
The best machine learning packages in R – an article on Habrahabr.

LaTeX

A not-so-short introduction to LaTeX (PDF),
IDE:
- http://www.texstudio.org/,
- https://www.lyx.org/,
LaTeX course on Coursera from the Higher School of Economics,
https://vk.com/hse.latex – the course’s VK group (lots of useful and varied material).

📑 Open Datasets list

Kaggle
Google’s Public Data Sets
/r/datasets
UCI Machine Learning Repository
awesome-public-datasets – an awesome list of high-quality open datasets in public domains (on-going)

The initial list was provided by Kevyn Collins-Thomson from the University of Michigan School of Information.

Long general-purpose list of datasets:
- https://vincentarelbundock.github.io/Rdatasets/datasets.html
This website has dozens of public datasets: some fun, some a bit, well... quirky.
- https://rs.io/100-interesting-data-sets-for-statistics/
The Academic Torrents site has a growing number of datasets, including a few text collections that might be of interest (Wikipedia, email, twitter, academic, etc.) for current or future projects.
- http://academictorrents.com/browse.php?cat=6
Google Books n-gram corpus
- http://books.google.com/ngrams
- Dataset: http://aws.amazon.com/datasets/8172056142375670
Common Crawl: currently 6 billion web documents (81 TB), available as an Amazon S3 Public Data Set
- http://aws.amazon.com/datasets/41740
- https://commoncrawl.atlassian.net/wiki/display/CRWL/About+the+Data+Set
- Award project using Common Crawl: http://norvigaward.github.io/entries.html
- Python example: http://www.freelancer.com/projects/Python-Data-Processing/Python-script-for-CommonCrawl.html
Business/commercial data from Yelp:
- http://www.yelp.com/developers/documentation/v2/search_api
- Upcoming Deprecation of Yelp API v2 on June 30, 2018 (Posted by Yelp Jun 28, 2017)
Internet Archive (huge, ever-growing archive of the Web going back to the 1990s):
- http://archive.org/help/json.php
WikiData:
- https://www.wikidata.org/wiki/Wikidata:Main_Page
World Food Facts
- http://world.openfoodfacts.org/data
Data USA - a variety of census data
- https://datausa.io/
U.S. Government open data - datasets from 75 agencies and subagencies
- https://data.gov/
NASA data portal - space and earth science
- https://data.nasa.gov/

Deep Learning Russia - channel of the vk.com/deeplearning_ru community
ModelOverfit - channel of the modeloverfit community
Data Science - the first news channel about data science
Big Data & Machine Learning - a chat about big data, data processing and machine learning
Data Science Chat - a chat about Data Science

What is the difference between the train, validation and test sets in neural networks?

Training Set: this data set is used to adjust the weights of the neural network.

Validation Set: this data set is used to detect and limit overfitting. You do not adjust the network’s weights with it; you only verify that any increase in accuracy on the training set also yields an increase in accuracy on data the network has not been trained on. If accuracy on the training set increases while accuracy on the validation set stays the same or decreases, the network is overfitting and you should stop training.

The validation set is a sample of data for the function you want to learn that you do not use directly to train the network. You train the network with a separate sample, the training set. If you use a gradient-based algorithm, the error surface and its gradient at any point depend entirely on the training set, so the training set directly adjusts the weights. To make sure you don’t overfit, you feed the validation set to the network and check whether its error stays within an acceptable range. Because the validation set is not used directly to adjust the weights, a low error on the validation set (and on the test set) indicates that the network not only fits the training examples but can also be expected to perform well on new examples that were not used during training.

Testing Set: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.
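A minimal sketch of carving one dataset into the three disjoint subsets described above, assuming a 60/20/20 split and a fixed shuffle seed (both are illustrative choices, not prescribed by the text):

```python
import random

def train_val_test_split(samples, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then carve out disjoint test, validation and training subsets."""
    if val_frac + test_frac >= 1:
        raise ValueError("val_frac + test_frac must be < 1")
    items = list(samples)
    random.Random(seed).shuffle(items)   # shuffle so each subset is a random sample
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]       # everything left over is used for training
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Only `train` drives weight updates; `val` guides decisions such as when to stop; `test` is evaluated once at the end.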

If you do not have enough data for a separate validation set, you can use cross-validation to tune the hyperparameters as well as estimate the test error.

The cross-validation set is used for model selection, for example, to select the polynomial model with the lowest error for a given parameter set. The test set is then used to report the generalization error of the selected model.
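The model-selection procedure above can be sketched with NumPy under these assumptions: a synthetic noisy quadratic as data, least-squares polynomial fits via `np.polyfit`, mean squared error as the score, and 5-fold cross-validation on the data left after a test set is held out. The test set is touched only once, after the degree has been chosen.

```python
import numpy as np

def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def cv_error(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial of the given degree over k folds."""
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(x), k, rng)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)  # fit on k-1 folds
        pred = np.polyval(coeffs, x[val_idx])                     # score on held-out fold
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: a noisy quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=200)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=1.0, size=x.size)

# Hold out a test set first; cross-validate only on the remaining data.
x_dev, x_test, y_dev, y_test = x[:160], x[160:], y[:160], y[160:]
scores = {d: cv_error(x_dev, y_dev, d) for d in range(1, 8)}
best = min(scores, key=scores.get)                # degree with the lowest CV error
final = np.polyfit(x_dev, y_dev, best)            # refit on all development data
test_mse = float(np.mean((np.polyval(final, x_test) - y_test) ** 2))
print(best, round(test_mse, 2))
```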

Early stopping is a criterion for deciding when to stop training. There are several variations, but the main outline is the same: both the training and the validation errors are monitored. The training error decreases at each iteration (of backpropagation and related algorithms), and at first the validation error decreases as well. Training is stopped at the moment the validation error starts to rise. The weight configuration at this point gives a model that predicts the training data well, as well as data the network has not seen. However, the validation data indirectly affects the weights, because it is used to select the weight configuration. This is where the test set comes in. The test set is never used in the training process. Once a model is selected based on the validation set, the test data is applied to the model and its error is computed. This error is representative of the error we can expect on completely new data for the same problem.
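One common variation of early stopping adds a patience window, so a single noisy uptick does not end training. A minimal sketch, assuming the validation error for each epoch is already available as a list (the numbers below are hypothetical); with `patience=1` it stops exactly when the validation error first rises:

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch whose weights early stopping would keep.

    Training halts once the validation error has not improved for `patience`
    consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0  # new best: snapshot weights here
        else:
            waited += 1
            if waited >= patience:
                break                                      # no improvement for `patience` epochs
    return best_epoch

# Hypothetical validation curve: falls, then rises as the network overfits.
val_curve = [0.90, 0.70, 0.55, 0.48, 0.47, 0.49, 0.52, 0.56, 0.61]
print(early_stopping_epoch(val_curve))  # 4
```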

⚙️ Models and Algorithms Implementation:

k Nearest Neighbor
- Code
- Subset of MNIST
Linear Regression
- 📘 Math
- 📘 Polynomial Regression
- 💻 Code
Logistic Regression
- Code
Fully Connected Neural Networks
- Fully connected neural network that recognizes handwritten digits from the MNIST database (Modified National Institute of Standards and Technology database)
- MNIST Database
- Code
Convolutional Neural Network (CNN)
- Code
Gated Recurrent Units (GRU)
- Understanding GRU Networks, Towards Data Science
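The entries above link to the repository’s own implementations. As an independent, minimal illustration of the first one, here is a plain-Python k-nearest-neighbour classifier, assuming Euclidean distance, majority voting and a tiny hypothetical 2-D dataset in place of MNIST:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points with two classes.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # a
print(knn_predict(train_X, train_y, (6.5, 6.5)))  # b
```

With two classes, an odd k avoids voting ties.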

👩‍💻 Projects:

Spam Detection

⚙️ Methods:
- Naive Bayes spam filtering
- K-Nearest Neighbors algorithm
- Decision Tree learning
- Support Vector Machine (SVM)
- Random Forest
💻 Code
SMS Spam Collection Dataset
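The first method listed, Naive Bayes spam filtering, fits in a few lines of plain Python. A minimal multinomial sketch, assuming whitespace tokenization, add-one (Laplace) smoothing and a handful of made-up messages rather than the SMS Spam Collection Dataset:

```python
import math
from collections import Counter

def train_nb(messages, labels, alpha=1.0):
    """Fit multinomial Naive Bayes: per-class word counts plus class priors."""
    word_counts = {c: Counter() for c in set(labels)}
    class_counts = Counter(labels)
    for text, c in zip(messages, labels):
        word_counts[c].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab, alpha

def predict_nb(model, text):
    """Return the class with the highest log posterior for `text`."""
    word_counts, class_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in class_counts:
        n_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total)  # log prior
        for w in text.lower().split():
            if w in vocab:                           # ignore words never seen in training
                score += math.log((word_counts[c][w] + alpha) / (n_words + alpha * len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical training messages.
messages = ["win a free prize now", "free cash win now", "call me later",
            "are we meeting for lunch", "lunch later today"]
labels = ["spam", "spam", "ham", "ham", "ham"]
model = train_nb(messages, labels)
print(predict_nb(model, "free prize"))   # spam
print(predict_nb(model, "lunch later"))  # ham
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.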
Text Generator

A neural network, trained in Google Colab, that generates text based on a training .txt file. Alice in Wonderland by Lewis Carroll was used as the base text.
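The project’s exact preprocessing is not described here. Assuming a character-level model, a typical setup turns the training text into (window, next character) pairs and, at generation time, samples the next character from the model’s predicted distribution with a temperature. The helper names and parameters below are hypothetical, and no network is included:

```python
import math
import random

def build_char_dataset(text, seq_len=40, step=3):
    """Turn raw text into (input window, next character) index pairs."""
    chars = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    inputs, targets = [], []
    for start in range(0, len(text) - seq_len, step):
        window = text[start:start + seq_len]
        inputs.append([char_to_idx[ch] for ch in window])
        targets.append(char_to_idx[text[start + seq_len]])  # character right after the window
    return chars, char_to_idx, inputs, targets

def sample_with_temperature(probs, temperature, rng):
    """Reweight a next-character distribution and draw one index (all probs must be > 0)."""
    logits = [math.log(p) / temperature for p in probs]  # low temperature sharpens the distribution
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

sample = "Alice was beginning to get very tired of sitting by her sister on the bank"
chars, char_to_idx, X, y = build_char_dataset(sample, seq_len=10, step=5)
print(len(X), "training pairs; first target:", repr(chars[y[0]]))
```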
Question Answering System using BiDAF Model on SQuAD

Implemented a Bidirectional Attention Flow neural network as a baseline on SQuAD, improving Chris Chute’s model implementation, adding word-character inputs as described in the original paper and improving GauthierDmns’ code.
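The core of BiDAF is its attention flow layer, which combines context-to-query and query-to-context attention. A forward-pass-only NumPy sketch following the formulation in the original BiDAF paper, assuming random arrays in place of the contextual encoder outputs (H for the T context words, U for the J query words) and an untrained weight vector w; it is not the project’s code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_flow(H, U, w):
    """BiDAF attention flow layer (forward pass only).

    H: context encodings, shape (T, 2d); U: query encodings, shape (J, 2d);
    w: weight vector, shape (6d,), used in S[t, j] = w . [h_t; u_j; h_t * u_j].
    Returns G with shape (T, 8d).
    """
    T, J = H.shape[0], U.shape[0]
    h = np.repeat(H[:, None, :], J, axis=1)         # (T, J, 2d)
    u = np.repeat(U[None, :, :], T, axis=0)         # (T, J, 2d)
    S = np.concatenate([h, u, h * u], axis=-1) @ w  # similarity matrix, (T, J)

    a = softmax(S, axis=1)                          # context-to-query weights, rows sum to 1
    U_tilde = a @ U                                 # (T, 2d): attended query per context word

    b = softmax(S.max(axis=1))                      # query-to-context weights, (T,)
    h_tilde = b @ H                                 # (2d,): most query-relevant context summary
    H_tilde = np.tile(h_tilde, (T, 1))              # (T, 2d)

    return np.concatenate([H, U_tilde, H * U_tilde, H * H_tilde], axis=-1)

rng = np.random.default_rng(0)
d = 4
H = rng.normal(size=(7, 2 * d))  # 7 context words
U = rng.normal(size=(3, 2 * d))  # 3 query words
w = rng.normal(size=6 * d)
G = attention_flow(H, U, w)
print(G.shape)  # (7, 32)
```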

Constantly updated. Subscribe so you don’t miss anything.
