Awk One Liners

I’ve been doing this stuff all day, so let me show you a few nifty awk tricks. For these examples, let's assume we have a tab-delimited file with n columns and m rows.

To take the average of each row (that is, average the values from all the columns of a given row), do the following:

awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; sum/=NF; print sum}' file
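
If you want to try this without touching real data, here is a minimal sketch. The two-row sample file and the variable names `sample` and `row_avgs` are made up for illustration:

```shell
# Build a small tab-delimited sample (hypothetical data: 2 rows, 3 columns).
sample=$(mktemp)
printf '1\t2\t3\n4\t5\t6\n' > "$sample"

# One average per input row, exactly as in the one-liner above.
row_avgs=$(awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; sum/=NF; print sum}' "$sample")
echo "$row_avgs"

rm -f "$sample"
```

For this sample the two rows average to 2 and 5, one value per line.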

NF is a built-in awk variable that holds the number of fields (or columns) in the current row. If you want the average of each column instead, do:

awk '{for(i=1; i<=NF; i++){sum[i]+=$i}} END {for(i=1; i<=NF; i++){printf "%s%s", sum[i]/NR, (i<NF ? "\t" : "\n")}}' file
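
Here is a hedged variant of the column average that you can run on a throwaway file. It differs from the one-liner in one respect: it remembers the widest row in a variable `n` (a name introduced here, not an awk built-in), so the END block does not depend on NF still holding the last row's field count. The sample data is made up.

```shell
# Hypothetical sample: 2 rows, 3 tab-separated columns.
sample=$(mktemp)
printf '1\t2\t3\n4\t5\t6\n' > "$sample"

col_avgs=$(awk '
{
    for (i = 1; i <= NF; i++) sum[i] += $i   # running total per column
    if (NF > n) n = NF                       # n = widest row seen so far
}
END {
    # Print each column average, tab-separated, ending with a newline.
    for (i = 1; i <= n; i++)
        printf "%s%s", sum[i] / NR, (i < n ? "\t" : "\n")
}' "$sample")
echo "$col_avgs"

rm -f "$sample"
```

With this sample the three column averages come out as 2.5, 3.5 and 4.5 on one line.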

The NR variable is another awk built-in that gives you the number of records (or rows) read so far. You can exploit NR to add line numbers to your file:

awk '{print NR, $0}' file

Of course, much the same can be accomplished by:

cat -n file

To find the number of lines in a file you can do:

awk 'END {print NR}' file

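If you want to find the combined number of lines in all the files passed in on the command line, the same command works, because NR keeps counting across files:

awk 'END {print NR}' file1 file2 file3 ...

FNR, by contrast, restarts at 1 for each file, so in the END block it holds only the line count of the last file.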
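
A quick way to see how NR and FNR behave across several files is to run both on two small throwaway files. The file names and the variables `total` and `last_file` below are made up for this sketch:

```shell
# Two hypothetical input files: 2 lines and 3 lines.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/file1"
printf 'c\nd\ne\n' > "$dir/file2"

# NR counts records across all files; FNR restarts with each file.
total=$(awk 'END {print NR}' "$dir/file1" "$dir/file2")
last_file=$(awk 'END {print FNR}' "$dir/file1" "$dir/file2")
echo "combined: $total, last file only: $last_file"

rm -rf "$dir"
```

Here NR reports 5 (the combined count), while FNR reports 3 (the lines in the last file only).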

A lot of data manipulation can be done with some awk magic and simple unix commands such as grep, paste and wc.




3 Responses to Awk One Liners

  1. Hi. I wanted to let you know that I just wrote a blog post about Awk One-Liners.

    In this post I explain all the famous (you’ll see what I mean) Awk one liners.

    The post is here:
    Famous Awk One-Liners Explained

    Sincerely,
    Peteris

  2. John says:

    Hello,

    How do I stop awk from rounding?  If I have a variable “N” with
    the following values in it:

    “138.00, 0.00, 0.00, 1351.95, 0.00, 90030.80, 58.50, 0.00”

    And I use part of one of your one liners up there like this:

    echo $N | awk '{sum=0; for(i=0; i<=NF; i++){sum+=$i}; print sum}'

    I get the following number/output:

    91717.2

    When it actually should be:

    91579.25

    Thanks a lot,
    John

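  3. John, your mistake is starting the loop at i=0: $0 is the entire line, not the first field, so its leading number (138.00) gets added a second time. With the loop starting at 1, print outputs 91579.2, which is still missing a digit, because print formats non-integer numbers using OFMT (%.6g by default). To get the digit back, use printf. Here is the fix:

    awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; printf "%.2f\n", sum}'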
    

    Something that awk needs is a CSV switch so that it can handle CSV files with quoting, like my Apache logs.

