Awk

From Linuxmemo.

awk 'instructions' files
awk -f script files

Awk, in the usual case, interprets each input line as a record and each word on that line, delimited by spaces or tabs, as a field. (These defaults can be changed.) One or more consecutive spaces or tabs count as a single delimiter. Awk allows you to reference these fields in either patterns or procedures. $0 represents the entire input line; $1, $2, ... refer to the individual fields on that line.

awk '{ print $1 }' list
John
Alice
Orville
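
As a minimal sketch of the default field splitting described above, the following feeds awk two made-up lines (hypothetical data, not the contents of the "list" file) whose words are separated by runs of spaces and tabs. NF is awk's built-in count of fields in the current record.

```shell
# Hypothetical input: words separated by a mix of several spaces and tabs.
out=$(printf 'John   Daggett\t\t341\nAlice\tFord  22\n' |
  awk '{ print NF ": " $1 "|" $2 "|" $3 }')
echo "$out"
```

Each run of whitespace counts as one delimiter, so both lines should be reported as having three fields.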

Print the first word of each line containing the string "MA". We can say "word" because, by default, awk splits the input into fields using spaces or tabs as the field separator.

awk '/MA/ { print $1 }' list
John
Eric
Sal

We use the -F option to change the field separator to a comma.

awk -F, '/MA/ { print $1 }' list
John Daggett
Eric Adams
Sal Carpenter

Multiple commands are separated by semicolons.

awk -F, '{ print $1; print $2; print $3 }' list
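
To make the effect of the three print statements visible, here is a small sketch run on a single made-up record (the address is an assumption for illustration, not taken from the "list" file). It also shows that with -F, the space after each comma stays in the field, and that a separator longer than one character is treated as a regular expression, so -F', *' can absorb those spaces.

```shell
# Hypothetical comma-separated record.
record='John Daggett, 341 King Road, Plymouth MA'

# Three statements separated by semicolons: one output line per field.
out=$(printf '%s\n' "$record" | awk -F, '{ print $1; print $2; print $3 }')
echo "$out"

# Separator given as a regular expression: a comma followed by any spaces.
trimmed=$(printf '%s\n' "$record" | awk -F', *' '{ print $2 }')
echo "$trimmed"
```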

Variables

Note that we don't have to assign to a variable before using it, because awk variables are initialized to the empty string, which is treated as 0 in a numeric context.
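
A short sketch of this behaviour, using made-up input lines: count and chars are used as numbers without ever being assigned, and never_set (a hypothetical name that is never assigned anywhere) prints as an empty string.

```shell
# Hypothetical input; no variable is initialized before use.
out=$(printf 'Plymouth MA\nRichmond VA\nBoston MA\n' |
  awk '/MA/ { count++ }              # starts from 0 on first use
       { chars += length($0) }       # running total of line lengths
       END { print count, chars, "[" never_set "]" }')
echo "$out"
```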

Special Characters Usage

.      Matches any single character except newline. In awk, dot can match newline also.
*      Matches any number (including zero) of the single character (including a character specified by a regular expression) that immediately precedes it.
[...]   Matches any one of the class of characters enclosed between the brackets. A circumflex (^) as first character inside brackets reverses the match to all characters except newline and those listed in the class. In awk, newline will also match. A hyphen (-) is used to indicate a range of characters. The close bracket (]) as the first character in class is a member of the class. All other metacharacters lose their meaning when specified as members of a class.
^     First character of regular expression, matches the beginning of the line. Matches the beginning of a string in awk, even if the string contains embedded newlines.
$     As last character of regular expression, matches the end of the line. Matches the end of a string in awk, even if the string contains embedded newlines.
\{n,m\}     Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. \{n\} will match exactly n occurrences, \{n,\} will match at least n occurrences, and \{n,m\} will match any number of occurrences between n and m. (sed and grep only, may not be in some very old versions.)
\      Escapes the special character that follows.

Extended Metacharacters (egrep and awk)

+     Matches one or more occurrences of the preceding regular expression.
?     Matches zero or one occurrences of the preceding regular expression.
|     Specifies that either the preceding or following regular expression can be matched (alternation).
()     Groups regular expressions.
{n,m}     Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. {n} will match exactly n occurrences, {n,} will match at least n occurrences, and {n,m} will match any number of occurrences between n and m. (POSIX egrep and POSIX awk, not in traditional egrep or awk.)
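
The extended metacharacters can be tried out with a few one-liners. This is only a sketch on made-up words; it relies on the fact that an awk pattern with no action prints the matching line. Interval expressions such as {n,m} are left out because, as noted above, not every awk supports them.

```shell
# Hypothetical word list, one word per line.
input='color
colour
colouur
gray
grey
groy'

optional=$(printf '%s\n' "$input" | awk '/^colou?r$/')   # ?  zero or one "u"
repeated=$(printf '%s\n' "$input" | awk '/^colou+r$/')   # +  one or more "u"
either=$(printf '%s\n' "$input" | awk '/^gr(a|e)y$/')    # () and |  "a" or "e"

printf '%s\n' "$optional" "$repeated" "$either"
```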

POSIX Character Classes - Matching Characters

[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Space and tab characters
[:cntrl:] Control characters
[:digit:] Numeric characters
[:graph:] Printable and visible (non-space) characters
[:lower:] Lowercase characters
[:print:] Printable characters (includes whitespace)
[:punct:] Punctuation characters
[:space:] Whitespace characters
[:upper:] Uppercase characters
[:xdigit:] Hexadecimal digits
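
A class name is only valid inside a bracket expression, so in a pattern it is written with two pairs of brackets, for example [[:digit:]]. The sketch below assumes an awk with POSIX character class support (very old implementations may lack it) and uses made-up input.

```shell
# Hypothetical input lines.
out=$(printf 'abc\n12345\nroom 42\nA1\n' |
  awk '/^[[:digit:]]+$/ { print "number: " $0 }
       /[[:upper:]]/    { print "has uppercase: " $0 }')
echo "$out"
```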

Tips

  • Remove duplicate lines:
awk '!a[$0]++'
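
How this works, sketched on made-up input: the array element indexed by the whole line is empty (0) the first time a line is seen, so the negated test is true and the line is printed by the default action; the post-increment then makes it non-zero, so later copies are skipped. The array is named seen here purely for readability.

```shell
# Hypothetical input with repeated lines; the first occurrence of each is kept, in order.
out=$(printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++')
echo "$out"
```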