Automata & formal languages
A formal language is a set of strings over an alphabet, and an automaton is an abstract machine that decides which strings belong — the foundation for parsing, regex, and what “computable” even means.
Why it matters
This is where computation gets a precise definition. Regular expressions, the lexers in every compiler, protocol validators, and search engines all rest on automata theory. The Chomsky hierarchy tells you, before you write a line of code, whether a problem is even solvable with the tool you reached for — e.g. why you cannot match balanced parentheses with a true regular expression.
How it works
The classic ladder of language classes, each strictly more powerful, with a matching machine:
- Regular — recognized by finite automata (
DFA/NFA); exactly what regular expressions describe. Finite memory only, so no counting. - Context-free — recognized by pushdown automata (a finite machine plus a stack); generated by
grammars like
S -> aSb | ε. The stack lets them balance nesting. - Context-sensitive and recursively enumerable — top of the ladder, recognized by the Turing machine, the model of general computation.
A DFA is a directed graph of states: an edge per input symbol, one start state, a set of
accepting states. An NFA allows multiple/ε transitions but recognizes the same languages — you can
always determinize it via subset construction. The pumping lemma is the standard tool for proving
a language is not regular (or not context-free).
Example
A DFA over {a, b} accepting strings with an even number of as — two states, even and odd:
state even: on a -> odd; on b -> even (accepting)
state odd: on a -> even; on b -> odd
start = even
Feed abba: even -> odd -> odd -> odd -> even → accept. A trie is a closely related
structure: a DFA-shaped tree that recognizes a finite set of words.
Pitfalls
- Asking regex to count or balance — matching
a^n b^nor nested brackets needs a stack; that is context-free, beyond true regular languages. (Many “regex” engines cheat with extensions.) - Confusing
NFApower withDFApower — nondeterminism adds convenience, not expressiveness; both recognize exactly the regular languages. - Forgetting the pumping lemma proves a negative only — passing it does not prove a language is regular.