As with so many Linux tools, the awk command was developed at Bell Labs in the 1970s. It was named after its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan, whose last-name initials form "AWK". Awk is a tool for processing text. It can take a text file as input, read its lines one by one, perform pattern matching, and then take actions based on what it finds. If this sounds familiar, it's probably because you're used to "grep", which sounds like it does something similar. But while grep is great for simple pattern matching, awk takes things a step further. Here's an introduction to grep that you might find useful.
Why Awk is Different from Grep
Given how common grep is, it seems fitting to begin by examining what makes awk different. A comparison like this will better illustrate what makes awk special.
1. Awk Can Process Individual Fields
While both awk and grep process input line by line, only awk can manipulate individual fields. Using whitespace as the default delimiter, you can access each "field" in awk through built-in variables. This means you can perform calculations or comparisons on text within each line and take appropriate actions.
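As a quick sketch of field-level processing (the sample data here is invented for illustration):

```shell
# Print the name (field 1) of each line whose third field exceeds 100.
# grep could match the lines, but it couldn't compare a field numerically.
printf 'alice 30 250\nbob 25 80\ncarol 41 120\n' | awk '$3 > 100 { print $1 }'
# prints:
# alice
# carol
```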
2. Can Perform Complex Operations
With awk, you can take individual fields, compare them to others, perform calculations, and even use control structures like if statements and for and while loops. You can do none of this with grep, which only matches patterns.
3. Built-in Variables
Awk makes extensive use of variables to assist with the processing. For example, there is a special variable for each field in a line. You also have variables indicating the number of fields in each line and the number of records processed so far. With awk, you can also use persistent variables, allowing you to carry forward the results of previous lines. So if you want to add all the lines in a column, for example, you can do so.
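For example, summing a column uses exactly this kind of persistent variable. A minimal sketch, with made-up input data:

```shell
# "total" is an ordinary awk variable that persists from line to line;
# the END block runs once after the last line has been processed.
printf '10 5\n20 7\n30 3\n' | awk '{ total += $2 } END { print total }'
# prints: 15
```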
4. Customizable Outputs
With grep, you can only print out the lines that match a certain pattern. With awk, on the other hand, you can perform complex output formatting. For example, you can add clarifying information to fields such as labels, you can change the way the fields are displayed by changing the separator, change the alignment of the fields with the labels, and much more.
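As a sketch of this kind of output formatting, awk's printf statement can add labels and control alignment (the field names and sample data here are invented):

```shell
# Label each field and align the columns with printf format specifiers:
# %-8s left-justifies a string in 8 characters, %5d right-justifies a number in 5.
printf 'alice 250\nbob 80\n' | awk '{ printf "User: %-8s Score: %5d\n", $1, $2 }'
```

Each line comes out as a labeled, column-aligned record rather than raw matching text.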
In other words, you can completely transform the contents of a text file using awk, while with grep, you can only display the matching lines.
5. Awk Allows for Scripting
While you can only use simple pattern matching with grep, with awk you can script functions and perform more complex manipulations on fields. You don't need to be restricted to pattern matching alone.
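For instance, awk lets you define your own functions inside the program, something grep has no equivalent for. A small sketch:

```shell
# A user-defined awk function applied to each input line.
printf '3\n7\n' | awk 'function square(x) { return x * x } { print square($1) }'
# prints:
# 9
# 49
```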
In conclusion, the awk command is much more powerful than grep, and while the latter is faster since it’s so simple, there are many situations when awk is the only viable option.
Basic Awk Syntax
The basic awk syntax is this:
awk 'pattern { action }' input-file
Here, “pattern” refers to the pattern that you’re searching for in the file. It’s optional, and if it’s missing, awk simply assumes that it evaluates to “true” for each line. This is a way to feed awk all the lines in a file.
The “action” section refers to what you want to do. This too is optional, and in its absence, awk assumes that you just want to print the whole line.
Here’s an example of an awk command that prints the second field in every line of a text file:
awk '{ print $2 }' input-file
Awk prints the second field in each line of the file. The above command didn't have a "pattern" to match, so it simply executed the action for every line.
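To see this with a concrete input instead of a file:

```shell
# With no pattern, the action runs on every line; $2 is the second field.
echo "one two three" | awk '{ print $2 }'
# prints: two
```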
The reverse is true as well. You can have a pattern without an action, meaning that you can use the awk command much like grep. For example, the following command prints all lines that have the pattern “THIRD” in it:
awk '/THIRD/' test.txt
As you can see, by default, awk prints each line where the pattern is found. Of course, if this is the only thing you want to do, then it's far more efficient to use grep. Awk is meant for more sophisticated use cases.
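A self-contained sketch of this grep-like behavior, with the input supplied inline rather than from test.txt:

```shell
# A pattern with no action: awk defaults to printing the matching line.
printf 'FIRST line\nTHIRD line\nSECOND line\n' | awk '/THIRD/'
# prints: THIRD line
```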
List of Awk Built-in Variables
One of the strengths of awk is the ability to use in-built variables to manipulate the lines of text. Here is a list of what you can use:
$0:
This is the entire input record, which means the current line that’s being processed by awk. It includes all the fields and the field separators. When you don’t specify an action, the default action by awk is:
{ print $0 }
Which is to print the whole line.
$1, $2…:
These refer to the individual fields in the line. $1 refers to the first field, $2 to the second, and so forth. Think of these like column numbers in a table. By default, fields are separated by whitespace, so these variables can also be used to refer to specific words in a line.
If you want, you can use awk to iterate through each of the fields using a for-loop like the one below:
awk '{ for (i = 1; i <= NF; i++) print "Field " i ": " $i }' input-file
This will print each field in each line one by one. “NF” is another variable as explained below.
NF:
This refers to the number of fields in the current record. Since each line of input might have a different number of fields if the data isn't tabular, you should expect this variable to change from line to line.
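A quick sketch showing NF changing with each line:

```shell
# NF is re-evaluated for every record: 3 fields, then 2.
printf 'a b c\nd e\n' | awk '{ print NF }'
# prints:
# 3
# 2
```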
NR:
NR refers to the number of records processed so far. It starts at 1 for the first line and increments by one for each record. You can use it to prepend line numbers to every line of the input file, for example.
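That line-numbering idea looks like this as a minimal sketch:

```shell
# NR is the current record number, so this numbers each line of input.
printf 'alpha\nbeta\n' | awk '{ print NR ": " $0 }'
# prints:
# 1: alpha
# 2: beta
```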
FS:
"FS" stands for "field separator". It's what awk uses when parsing the input text file to determine where one field ends and the next begins. By default, it's any run of spaces or tabs, but you can set it to something else, like a comma. For example, if your input is a CSV file, then you should set FS to a comma like this:
echo "a,b,c" | awk -F, '{ print $1, $2, $3 }'
This ensures that awk can identify which character is used to separate the various fields.
OFS:
OFS refers to the output field separator. You can use it to control how awk displays results when it prints a sequence of fields separated by a comma. For example, consider this:
echo "a:b:c" | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $2, $3 }'
The above code first tells awk to use a colon for the FS, and then a dash (-) for the OFS, so when it prints the output, it looks like this:
a-b-c
RS and ORS:
These are the input and output record separators. Both default to "\n", meaning a new line. But you can change them to whatever you want. If you want awk to separate output records with a comma, set ORS to a comma; RS works the same way for splitting the input into records.
For example, you can use these variables to process complicated formatting with awk to generate reports.
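As a small sketch of changing the output record separator:

```shell
# With ORS set to ", ", awk appends a comma and space after each record
# instead of a newline (note the trailing separator after the last record).
printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ", " } { print }'
# prints: a, b, c,
```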
Conclusion
As you can see, awk is much more complex than simple pattern matching. You can use it to process and display reports in a way that grep simply can’t. The downside of that power is that it’s slower than grep, so only use it when grep can’t do the job.
I’m a NameHero team member, and an expert on WordPress and web hosting. I’ve been in this industry since 2008. I’ve also developed apps on Android and have written extensive tutorials on managing Linux servers. You can contact me on my website WP-Tweaks.com!