Data Analyst

roadmap.sh: https://roadmap.sh/data-analyst

Suggested path through the Data Analyst nodes. Each node will link to its lesson once that lesson is written.

Nodes

Introduction

  • What is Data Analysis?
  • Data Analyst vs Data Scientist vs Data Engineer
  • The data analysis lifecycle
  • Types of data analysis (descriptive, diagnostic, predictive, prescriptive)
  • Data-driven decision making
  • Common tools overview

Data Literacy & Fundamentals

  • Structured vs unstructured data
  • Qualitative vs quantitative data
  • Data types & formats (CSV, JSON, Parquet)
  • Data sources (databases, APIs, files, web scraping)
  • Metadata
  • Data quality dimensions

Spreadsheets

  • Excel / Google Sheets fundamentals
  • Formulas & functions
  • Pivot tables
  • Lookup functions (VLOOKUP / XLOOKUP / INDEX-MATCH)
  • Conditional formatting
  • Charts in spreadsheets

SQL

  • Relational database basics
  • SELECT, WHERE, ORDER BY
  • Joins (INNER, LEFT, RIGHT, FULL)
  • Aggregations & GROUP BY
  • HAVING
  • Subqueries
  • Common Table Expressions (CTEs)
  • Window functions
  • CASE expressions
  • Set operations (UNION / INTERSECT / EXCEPT)
  • Indexes & query performance

Programming for Analysis

  • Python for data analysis
  • NumPy
  • Pandas
  • R & tidyverse
  • Jupyter notebooks
  • Virtual environments & package management

Data Collection & Wrangling

  • Data importing & loading
  • Data cleaning
  • Handling missing values
  • Removing duplicates
  • Outlier detection & treatment
  • Data transformation & normalization
  • Feature engineering basics
  • Joining & merging datasets
  • Reshaping (pivot / melt)
  • ETL / ELT concepts

Statistics & Probability

  • Descriptive statistics (mean, median, mode)
  • Measures of dispersion (variance, std dev)
  • Probability distributions
  • Correlation vs causation
  • Hypothesis testing
  • p-values & significance
  • Confidence intervals
  • A/B testing
  • Regression analysis
  • Sampling methods

Exploratory Data Analysis (EDA)

  • Univariate analysis
  • Bivariate & multivariate analysis
  • Distribution analysis
  • Identifying trends & patterns
  • Summary statistics
  • Correlation matrices

Data Visualization

  • Principles of effective visualization
  • Choosing the right chart type
  • Matplotlib
  • Seaborn
  • Plotly
  • Storytelling with data
  • Color, labeling & accessibility
  • Avoiding misleading charts

BI & Dashboarding Tools

  • Tableau
  • Power BI
  • Looker / Looker Studio
  • Metabase
  • Building interactive dashboards
  • KPIs & metrics design

Communication & Soft Skills

  • Translating business questions into data questions
  • Reporting & presentations
  • Stakeholder communication
  • Data storytelling
  • Documentation

Advanced & Next Steps

  • Big data tools (Spark, BigQuery)
  • Introduction to machine learning
  • Version control with Git
  • Data ethics & privacy (GDPR)
  • Building a portfolio
  • Reproducible analysis

Resources

See resources.md.

Project ideas

  • Take a messy public dataset (e.g. NYC taxi trips), clean it in Pandas, run EDA, and write up three insights with supporting visualizations.
  • Build an end-to-end sales dashboard in Power BI or Looker Studio on top of a SQL database, with KPIs, filters, and a drill-down view.
  • Design and analyze a simulated A/B test: define a hypothesis, compute significance, and present a go/no-go recommendation to a “stakeholder”.
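
For the A/B test project, the "compute significance" step can be sketched with a two-proportion z-test. This is one common choice, not the only valid one, and the sketch below uses only the Python standard library. The function name, the 0.05 threshold, and the visitor and conversion counts are all hypothetical values made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a, conv_b: number of conversions in control (A) and variant (B)
    n_a, n_b:       number of visitors in each group
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical simulated counts, not real data.
    z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
    alpha = 0.05  # assumed significance threshold, chosen before the test
    decision = "go" if p_value < alpha and z > 0 else "no-go"
    print(f"z = {z:.2f}, p = {p_value:.4f}, recommendation: {decision}")
```

A fuller write-up would also report the effect size and a confidence interval, since a small p-value alone does not say whether the lift is large enough to matter to the business.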
