In Short
- Consists of a sequence of atoms (an atom is a literal - letter, digit, special character)
- Letters are case sensitive
- You can test regexes in online tools like: https://regex101.com/
Quantifiers
The quantifier tells how many occurrences of a given atom there can be. A quantifier refers to the element to the left of the quantifier. If there is no quantifier next to an atom, it means that the atom will have one occurrence (in the examples below, atom b
always has one occurrence.
*
- zero or more occurrencese.g.
a*b
can translate into, among others, the following expressions:b
,ab
,aab
,aaaaaaab
, …+
- one or more occurrencese.g.
a+b
can translate into, among others, the following expressions:ab
,aab
,aaaaab
, …?
- zero or one occurrencee.g.
a?b
translates tob
,ab
{min,max}
- At leastmin
occurrences and maximummax
occurrencese.g.
a{2,4}b
translates into the expressions:aab
,aaab
,aaaab
{min,}
- At leastmin
occurrences. The maximum can be infinite.{,max}
- There is no minimum number of occurrences, and the maximum can bemax
occurrences{n}
- the exact number of occurrences equal ton
e.g.
a{4}b
- translates into:aaaab
Ranges []
Range means that the expression can be one letter/number from a range. The dash -
denotes a range from, to.
[abc]
- means that the expression will bea
orb
orc
(only one letter)[a-bB-Z]
- the expression can bea
,b
,B
,C
,D
, …,Z
[a-Z]
- the expression can be any English character (uppercase or lowercase)[0-9]
- the expression can be any digit[a-ZąćęłńóśźżĄĘŁŃÓŚŹŻ]
- range of all Polish characters.
Groups ()
(ab){2}
- means the expressionabab
#Flags
(?i)
- ignore the case of letters to the right of this character
Special signs
.
- any character$
- end of line (if we use this character at the end, it means that there is no character after the searched expression.^
- beginning of the line (if we use this character at the beginning, it means that there is no character before the searched expression.[^e]
- negation - the expression will not contain the lettere
.|
- the sign means logical or, i.e. the expression will be any expression separated by the|
signe.g.
a|b
- meansa
orb
\s
- Space, tab or newline\S
- A character that is the negation of\s
, i.e. a character that is not a space, tab or newline character\w
- letter, digit or character_
(equivalent to[a-Z_]
\W
- a character that is the negation of\w
, i.e. a character that is not a letter, digit or_
\d
- digit (equivalent to[0-9]
)\D
- a character that is the negation of\d
, i.e. a character that is not a digit\b
- Any whitespace character, the beginning of a string, the end of a string, and any character that is neither a letter nor a digit.
If we want to use one of the special characters in the regular expression such as .
, *
, /
, ?
, :
, .
, ^
, +
, \
, =
, |
, then we precede it with the \
character, e.g. \.