I’ve been doing this stuff all day, so let me show you a few nifty awk tricks. For these examples, let’s assume we have a tab-delimited file with n columns and m rows.
To take the average of each row (that is, average all the values across the columns of a given row), do the following:
awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; sum/=NF; print sum}' file
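For example, on a small made-up sample file (the name data.tsv is just for illustration):

```shell
# Two rows, three tab-separated columns
printf '1\t2\t3\n4\t5\t6\n' > data.tsv
# Row averages: (1+2+3)/3 = 2 and (4+5+6)/3 = 5
awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; sum/=NF; print sum}' data.tsv
# prints 2, then 5
```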
The NF variable is an awk built-in that expands to the number of fields (or columns) in the current row. If you want to take the average of each column instead, do:
awk '{for(i=1; i<=NF; i++){sum[i]+=$i}} END {for(i=1; i<=NF; i++){printf sum[i]/NR "\t"}}' file
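On the same kind of made-up sample file, the column averages come out tab-separated (note that this one-liner prints a trailing tab and no final newline):

```shell
# Two rows, three tab-separated columns
printf '1\t2\t3\n4\t5\t6\n' > data.tsv
# Column averages: (1+4)/2, (2+5)/2, (3+6)/2
awk '{for(i=1; i<=NF; i++){sum[i]+=$i}} END {for(i=1; i<=NF; i++){printf sum[i]/NR "\t"}}' data.tsv
# prints: 2.5	3.5	4.5
```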
The NR variable is another awk built-in; it gives you the number of records (or rows) read so far. You can exploit NR to add line numbers to your file:
awk '{print NR, $0}' file
Of course the same can be accomplished by:
cat -n file
To find the number of lines in a file you can do:
awk 'END {print NR}' file
If you want to find the combined number of lines in all the files passed in on the command line, use NR rather than FNR — FNR resets to zero at the start of each file, so in the END block it only reflects the last file:
awk 'END {print NR}' file1 file2 file3 ...
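A quick check of the difference between NR and FNR, using two throwaway files:

```shell
printf 'a\nb\n' > f1
printf 'c\n' > f2
awk 'END {print NR}' f1 f2     # NR keeps counting across files: prints 3
awk 'END {print FNR}' f1 f2    # FNR resets per file, so this is just f2's count: prints 1
```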
A lot of data manipulation can be done with some awk magic and simple unix commands such as grep, paste, and wc.
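For instance (file names made up), paste can glue columns from separate files together and awk can do the arithmetic:

```shell
printf '1\n2\n' > a.txt
printf '10\n20\n' > b.txt
# paste joins the files side by side with tabs; awk sums each row
paste a.txt b.txt | awk '{print $1 + $2}'
# prints 11, then 22
```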
[tags]awk, unix, linux, text processing, text manipulation, data processing[/tags]
Hi. I wanted to let you know that I just wrote a blog post about Awk One-Liners.
In this post I explain all the famous (you’ll see what I mean) Awk one-liners.
The post is here:
Famous Awk One-Liners Explained
Sincerely,
Peteris
Hello,
How do I stop awk from rounding? If I have a variable “N” with
the following values in it:
“138.00, 0.00, 0.00, 1351.95, 0.00, 90030.80, 58.50, 0.00”
And I use part of one of your one liners up there like this:
echo $N | awk '{sum=0; for(i=0; i<=NF; i++){sum+=$i}; print sum}'
I get the following number/output:
91717.2
When it actually should be:
91579.25
Thanks a lot,
John
John, your mistake is starting the loop at i=0: $0 is the entire line, not the first field, so its numeric value (138) gets added in a second time. Starting at i=1 will output 91579.2, which is still missing a digit — print formats numbers with awk’s default OFMT of “%.6g”, i.e. six significant digits. To get the digit back, use printf with an explicit format such as “%.2f”.
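A sketch with both fixes applied to John’s data (awk’s numeric conversion simply ignores the trailing comma on each field):

```shell
N="138.00, 0.00, 0.00, 1351.95, 0.00, 90030.80, 58.50, 0.00"
# Start at i=1 so $0 (the whole line) isn't added in, and use printf
# with an explicit format instead of print's default "%.6g"
echo $N | awk '{sum=0; for(i=1; i<=NF; i++){sum+=$i}; printf "%.2f\n", sum}'
# prints 91579.25
```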
Something that awk needs is a CSV switch so that it can handle CSV files with quoting, like my Apache logs.
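GNU awk does offer a partial workaround: the FPAT variable defines fields by what they look like rather than by a separator, so quoted commas stay inside one field. A commonly used pattern (which does not handle escaped quotes or newlines inside fields) is:

```shell
# GNU awk only: a field is either a run of non-commas or a quoted string
echo '1,"Smith, John",42' | gawk 'BEGIN {FPAT = "([^,]+)|(\"[^\"]+\")"} {print $2}'
# prints "Smith, John" (quotes included)
```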