Data Analyst
roadmap.sh: https://roadmap.sh/data-analyst
Suggested path through the Data Analyst nodes. Each node will link to its lesson once that lesson is written.
Nodes
Introduction
- What is Data Analysis?
- Data Analyst vs Data Scientist vs Data Engineer
- The data analysis lifecycle
- Types of data analysis (descriptive, diagnostic, predictive, prescriptive)
- Data-driven decision making
- Common tools overview
Data Literacy & Fundamentals
- Structured vs unstructured data
- Qualitative vs quantitative data
- Data types & formats (CSV, JSON, Parquet)
- Data sources (databases, APIs, files, web scraping)
- Metadata
- Data quality dimensions
Spreadsheets
- Excel / Google Sheets fundamentals
- Formulas & functions
- Pivot tables
- Lookup functions (VLOOKUP / XLOOKUP / INDEX-MATCH)
- Conditional formatting
- Charts in spreadsheets
SQL
- Relational database basics
- SELECT, WHERE, ORDER BY
- Joins (INNER, LEFT, RIGHT, FULL)
- Aggregations & GROUP BY
- HAVING
- Subqueries
- Common Table Expressions (CTEs)
- Window functions
- CASE expressions
- Set operations (UNION / INTERSECT / EXCEPT)
- Indexes & query performance
Programming for Analysis
- Python for data analysis
- NumPy
- Pandas
- R & tidyverse
- Jupyter notebooks
- Virtual environments & package management
Data Collection & Wrangling
- Data importing & loading
- Data cleaning
- Handling missing values
- Removing duplicates
- Outlier detection & treatment
- Data transformation & normalization
- Feature engineering basics
- Joining & merging datasets
- Reshaping (pivot / melt)
- ETL / ELT concepts
Statistics & Probability
- Descriptive statistics (mean, median, mode)
- Measures of dispersion (variance, standard deviation)
- Probability distributions
- Correlation vs causation
- Hypothesis testing
- p-values & significance
- Confidence intervals
- A/B testing
- Regression analysis
- Sampling methods
Exploratory Data Analysis (EDA)
- Univariate analysis
- Bivariate & multivariate analysis
- Distribution analysis
- Identifying trends & patterns
- Summary statistics
- Correlation matrices
Data Visualization
- Principles of effective visualization
- Choosing the right chart type
- Matplotlib
- Seaborn
- Plotly
- Storytelling with data
- Color, labeling & accessibility
- Avoiding misleading charts
BI & Dashboarding Tools
- Tableau
- Power BI
- Looker / Looker Studio
- Metabase
- Building interactive dashboards
- KPIs & metrics design
Communication & Soft Skills
- Translating business questions into data questions
- Reporting & presentations
- Stakeholder communication
- Data storytelling
- Documentation
Advanced & Next Steps
- Big data tools (Spark, BigQuery)
- Introduction to machine learning
- Version control with Git
- Data ethics & privacy (GDPR)
- Building a portfolio
- Reproducible analysis
Resources
See resources.md.
Project ideas
- Take a messy public dataset (e.g. NYC taxi trips), clean it in Pandas, run EDA, and write up three insights with supporting visualizations.
- Build an end-to-end sales dashboard in Power BI or Looker Studio on top of a SQL database, with KPIs, filters, and a drill-down view.
- Design and analyze a simulated A/B test: define a hypothesis, compute significance, and present a go/no-go recommendation to a “stakeholder”.
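
For the A/B test project, the "compute significance" step could be sketched as below. This is a minimal sketch, not a prescribed method: it assumes a two-sided, two-proportion z-test on conversion counts, a significance level of 0.05, and made-up example numbers. The function names `two_proportion_z_test` and `recommend` are hypothetical helpers introduced only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: conversions and visitors in the control group (A).
    conv_b / n_b: conversions and visitors in the variant group (B).
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def recommend(z, p_value, alpha=0.05):
    """Return "go" only if the variant is significantly better than control."""
    return "go" if (p_value < alpha and z > 0) else "no-go"


# Simulated (made-up) results: 2000 visitors per group.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.3f}, p = {p_value:.4f}, recommendation: {recommend(z, p_value)}")
```

The z-test is a reasonable choice here because both groups are large. With small samples or very low conversion rates, an exact test may be a better fit. The write-up should also state the hypothesis and the significance level before looking at the results.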