Machine Learning
roadmap.sh: https://roadmap.sh/machine-learning
Suggested path through the Machine Learning roadmap nodes. Each node links to its lesson once that lesson has been written.
Nodes
Math foundations
- Linear algebra
  - Vector operations
  - Matrix operations
  - Determinants & inverse of a matrix
  - Eigenvalues & diagonalization
- Calculus
  - Derivatives & partial derivatives
  - Chain rule of differentiation
  - Gradient, Jacobian, Hessian
- Discrete mathematics
Probability & statistics
- Basics of probability
- Bayes' theorem
- Probability distributions
- Descriptive statistics
- Inferential statistics
Programming & tooling
- Basic syntax
- Data structures
- Conditionals
- Functions & built-in functions
- Exceptions
- Essential libraries
  - NumPy
  - Pandas
  - Scikit-learn
- Version control
- APIs
- Databases (SQL / NoSQL)
Data collection & preparation
- Data sources
- Data formats (CSV, JSON, Excel)
- Data loading
- Data cleaning
- Data preparation
- Feature engineering
- Feature selection
- Feature scaling / normalization
- Dimensionality reduction
- Visualization (graphs & charts)
Core ML concepts
- Basic concepts
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Overfitting
- Underfitting
- Regularization (Lasso, Ridge, ElasticNet)
Supervised algorithms
- Linear regression
- Logistic regression
- Decision trees & random forest
- K-nearest neighbors (KNN)
- Support vector machines (SVM)
- Gradient boosting machines
- XGBoost
Unsupervised algorithms
- Clustering
- Hierarchical clustering
- Dimensionality reduction (PCA)
Model evaluation & selection
- Model evaluation
- Model selection
- Confusion matrix
- Accuracy
- Precision
- Recall
- F1-score
- ROC-AUC
- Log loss
- Mean squared error
- Root mean squared error
- K-fold cross-validation
- LOOCV
- Train-test split
- Validation strategies
Optimization
- Optimization
- Gradient descent
- Stochastic gradient descent (SGD)
Neural networks & deep learning
- Neural networks
- Neural network architectures
- Multilayer perceptron
- Forward propagation
- Backpropagation
- Activation functions
- Softmax
- Vanishing gradient
- Deep learning architectures
- Deep learning libraries
  - TensorFlow
  - Keras
  - PyTorch
Computer vision
- Convolutional neural network (CNN)
  - Convolution
  - Pooling
- Applications of CNNs
  - Image classification
  - Image segmentation
  - Object detection
  - Image & video recognition
Sequence models
- Recurrent neural network (RNN)
- LSTM
- GRU
- Attention mechanisms
- Attention models
- Transformers
- Embeddings
- Multimodal learning
NLP
- Natural language processing
- Text processing
- Preprocessing
- Tokenization
- Word tokenization
- Stemming
- Lemmatization
- Word embeddings
- Sentiment analysis
- Qualitative analysis
Generative & advanced models
- Autoencoders
- Variational autoencoders
- Generative adversarial networks (GANs)
- Transfer learning
- Recommendation systems
- Pattern recognition
Reinforcement learning
- Q-learning
- Deep Q-networks
- Actor-critic methods
Production & MLOps
- Production
- Quantization
- Explainable AI
- Uncertainty estimation
Resources
See resources.md.
Project ideas
- House-price regression pipeline — clean a tabular dataset, engineer features, compare linear regression, random forest, and XGBoost with k-fold cross-validation, and report RMSE/R².
- Image classifier with transfer learning — fine-tune a pretrained CNN (e.g. ResNet) on a small custom image dataset in PyTorch or Keras, tracking accuracy and a confusion matrix.
- Sentiment-analysis NLP service — build a text-preprocessing + embedding + classifier pipeline, wrap it in an API, and serve predictions on movie/product reviews.
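
As a possible starting point for the house-price project above, the evaluation loop (k-fold cross-validation, reported as RMSE) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not a reference implementation: the data is synthetic, the single `area` feature and the helper names (`kfold_indices`, `fit_mean`, `fit_linear`, `cross_val_rmse`) are hypothetical, and a mean-predicting baseline stands in for the random forest and XGBoost models, which a real project would take from scikit-learn and XGBoost.

```python
import math
import random


def kfold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Baseline model: ignore the feature and always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_val_rmse(fit, xs, ys, k=5, seed=0):
    """Average held-out RMSE of a fit function over k folds."""
    scores = []
    for held_out in kfold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return sum(scores) / len(scores)


# Synthetic stand-in data (assumption): price = 3 * area + 50 + Gaussian noise.
rng = random.Random(42)
areas = [rng.uniform(50, 200) for _ in range(100)]
prices = [3.0 * a + 50.0 + rng.gauss(0, 10) for a in areas]

baseline_rmse = cross_val_rmse(fit_mean, areas, prices)
linear_rmse = cross_val_rmse(fit_linear, areas, prices)
print(f"mean baseline RMSE: {baseline_rmse:.2f}")
print(f"linear model RMSE:  {linear_rmse:.2f}")
```

Both models are scored on the same folds because `cross_val_rmse` reuses the same seed, which keeps the comparison fair. With scikit-learn, the same loop is usually written with `cross_val_score` and a regressor object in place of these helpers.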