Awk One Liners

I’ve been doing this stuff all day, so let me show you a few nifty awk tricks. For these examples, let’s assume we have a tab-delimited file with n columns and m rows.

To take the average of each row (that is, average the values from every column of a given row), do the following:

awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; sum/=NF; print sum}' file
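
Here is a quick sanity check of that one-liner, as a minimal sketch. The three rows of numbers are made up for illustration, and I pipe them in with printf instead of reading them from a file; the awk program itself is unchanged.

```shell
# Hypothetical 3x3 tab-delimited input (made-up values), piped in via printf.
row_avgs=$(printf '1\t2\t3\n4\t5\t6\n10\t20\t30\n' |
  awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; sum/=NF; print sum}')

# One average per input row.
echo "$row_avgs"
```

Resetting sum at the start of each record is what keeps one row's total from leaking into the next.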

NF is a built-in awk variable that holds the number of fields (or columns) in the current row. If you want the average of each column across all rows instead, do:

awk '{for(i=1; i<=NF; i++){sum[i]+=$i}} END {for(i=1; i<=NF; i++){printf "%s\t", sum[i]/NR}; print ""}' file
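
A slightly more defensive variant of the column average, sketched on made-up data. Two things here are my own assumptions rather than requirements: I save the field count in a variable I call n, because some older awk implementations are reported not to keep NF around in the END block, and I print a newline instead of a trailing tab after the last column.

```shell
# Hypothetical 3x3 tab-delimited input (made-up values), piped in via printf.
col_avgs=$(printf '1\t2\t3\n4\t5\t6\n10\t20\t30\n' |
  awk '{for(i=1; i<=NF; i++){sum[i]+=$i}; n=NF}
       END {for(i=1; i<=n; i++){printf "%s%s", sum[i]/NR, (i<n ? "\t" : "\n")}}')

# One average per column, separated by tabs.
echo "$col_avgs"
```

This assumes every row has the same number of columns; n simply remembers the field count of the last row read.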

The NR variable is another awk built-in; it gives you the number of records (or rows) read so far. You can use NR to add line numbers to your file:

awk '{print NR, $0}' file

Of course the same can be accomplished by:

cat -n file

To find the number of lines in a file you can do:

awk 'END {print NR}' file

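If you want to find the combined number of lines in all the files passed in on the command line, the same program works, because NR keeps counting across files (FNR, by contrast, restarts at 1 for every file):

awk 'END {print NR}' file1 file2 file3 ...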
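
To see how NR and FNR differ when awk reads more than one file, here is a small sketch. The two throwaway files and their contents are hypothetical; the behavior of FNR inside END is what I'd expect from gawk, mawk and the BSD awk, so treat that part as an assumption on other implementations.

```shell
# Create two small throwaway files in a temporary directory (made-up data).
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/file1"   # 3 lines
printf 'd\ne\n'    > "$tmpdir/file2"   # 2 lines

# In END, NR holds the combined count; FNR holds only the last file's count.
totals=$(awk 'END {print "NR=" NR, "FNR=" FNR}' "$tmpdir/file1" "$tmpdir/file2")
echo "$totals"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

So NR is the one to print for a grand total, while FNR is handy when you need a per-file line number.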

A lot of data manipulation can be done with some awk magic and simple Unix commands such as grep, paste and wc.



3 Responses to Awk One Liners

Peteris Krumins says:

September 28, 2008 at 1:58 am

Hi. I wanted to let you know that I just wrote a blog post about Awk One-Liners.

In this post I explain all the famous (you’ll see what I mean) Awk one liners.

The post is here:
Famous Awk One-Liners Explained

Sincerely,
Peteris

John says:

October 27, 2008 at 5:56 pm

Hello,

How do I stop awk from rounding? If I have a variable “N” with
the following values in it:

“138.00, 0.00, 0.00, 1351.95, 0.00, 90030.80, 58.50, 0.00”

And I use part of one of your one liners up there like this:

echo $N | awk '{sum=0; for(i=0; i<=NF; i++){sum+=$i}; print sum}'

I get the following number/output:

91717.2

When it actually should be:

91579.25

Thanks a lot,
John

Chris Wellons says:

April 14, 2009 at 3:05 pm
John, your mistake is starting the loop at i=0, which adds $0 to the sum. $0 is actually the entire line, not the first element. Starting at i=1 will output 91579.2, which is still missing a digit, because print formats numbers with awk's default output format (%.6g). To get the digit back, use printf. Here is the fix:
```
awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; printf "%.2f\n", sum}'
```
Something that awk needs is a CSV switch so that it can handle CSV files with quoting, like my Apache logs.
