Regular languages #
Definitions #
Alphabet #
Definition.
An alphabet is a finite set of characters.
Examples.
{
a
} is a alphabet,{
a
,b
} is another alphabet,{
a
,b
,3
,5
,\n
} is yet another alphabet.
Warning. A blank space is is a character.
The letter $\Sigma$ is often used to refer to an alphabet.
Word #
Definition. A word over an alphabet $\Sigma$ is a (possibly empty) finite sequence of character in $\Sigma$.
The empty word (a.k.a. “empty string”) is traditionally denoted with $\varepsilon$.
Examples. Let $\Sigma$ be the alphabet {
a
,b
,3
,5
,\n
}. Then possible words over $\Sigma$ are:
- $\varepsilon$
a
aa
ba\n
\n\nba353\n\nb
Language #
Definition. A formal language over an alphabet $\Sigma$ is a (possibly infinite) set of words over $\Sigma$.
Examples. Let $\Sigma$ be the alphabet {
a
,b
}.Then possible languages over $\Sigma$ are:
- { }
- {
a
}- {
a
,aa
,aaa
,b
,aba
}- { $\varepsilon$,
a
,ba
}- the set of all words that contain only the character
a
- the set of all words over $\Sigma$
- the set of all words over $\Sigma$ that do not contain 2 consecutive
b
’s- the set of all words over $\Sigma$ that contain as many
a
’s asb
’s
Operations on languages #
Element-wise concatenation #
If $L_1$ and $L_2$ are languages, then we use $L_1 \circ L_2$ for the language that consists of the element-wise concatenation of all words in $L_1$ and $L_2$ (in that order).
Examples.
If $L_1$ = {
ab
,c
} and $L_2$ = {d
,ef
}, then $L_1 \circ L_2$ = {abd
,abef
,cd
,cef
}If $L_3$ = {
ab
,b
} and $L_4$ = { $\varepsilon$,ac
}, then $L_3 \circ L_4$ = {ab
,abac
,b
,bac
}
Closure #
If $L$ is a language, then the closure (also called Kleene closure) $L^*$ of $L$ is the language
$L^*$ = { $\varepsilon$ } $\cup\ L \cup\ (L \circ L) \cup\ (L \circ L \circ L) \cup (L \circ L \circ L \circ L) \cup …$
Equivalently, $L^*$ can be defined as:
- the set that consists of $L$, the empty word $\varepsilon$, and all possible concatenations of words in $L$.
- the smallest superset of $L\ \cup$ { $\varepsilon$ } that is closed under word concatenation (meaning that if $L^*$ contains two words $w_1$ and $w_2$, then it also contains the word $w_1 w_2$).
Regular language #
Definition. A language is regular if it is:
- the empty set, or
- the language { $\varepsilon$ }, or
- the language {
a
} for some charactera
, or- the union of two regular languages, or
- the element-wise concatenation of two regular languages, or
- the closure of a regular language.
Examples
- $L_1$ = {
a
} is regular- $L_2$ = {
b
} is regular- $L_3$ = $L_1 \cup L_2$ = {
a
,b
} is regular- $L_4 = L_1 \circ L_3$ = {
aa
,ab
} is regular- $(L_4)^*$ = { $\varepsilon$,
aa
,ab
,aaaa
,aaab
,abaa
,abab
,aaaaaa
,aaaaab
, … } is regular
Observation. If $\Sigma$ is an alphabet, then the language $\Sigma^*$ of all possible words over $\Sigma$ is regular.
Counter-examples.
If $\Sigma$ = {
a
,b
}, then the set of all words over $\Sigma$ that contain as manya
’s asb
’s is not regular.If $\Sigma$ = {
0
,1
,2
,3
,4
,5
,6
,7
,8
,9
} $\cup$ {+
,-
,(
,)
} , then the set of well-formed algebraic expressions over $\Sigma$ (in particular, where each opening parenthesis is closed) is not regular.More generally, most programming languages are not regular.
Properties #
- The intersection of two regular languages is a regular language.
- The difference between two regular languages is a regular language.
Regular expressions #
A language is regular iff it can be described by a regular expression.
Regular expressions are covered in the dedicated chapter.
Finite automata #
Regular languages have a close connection to so-called finite automata.
A finite automaton can be viewed as a simple program that recognizes a regular language $L$, meaning that the automaton takes as input a word $w$ and returns true iff $w$ belongs to $L$.
Finite automata are one of the simplest (theoretical) models of computation. They (approximately) correspond to computers that cannot use memory.
Automata theory is beyond the scope of this course.