Adding new column to a text file

The tests I run for my research produce pages upon pages of log files detailing what is happening to my data. This was very useful during debugging, and now it helps me gather and analyze data about the test runs. Since the actual output is human readable, I usually need to extract the data itself from the log files using a few simple awk scripts. What I end up with are basic files with a single numeric value on each line. I graph these files using gnuplot.

Sometimes the stuff I want to graph is spread over several files that need to be merged. I’m posting this here because the other day I saw someone opening a bunch of these files and copying them one by one into a Star Office spreadsheet to make a composite graph of all the data. Ugh…

That is way too much work. Not to mention that Star Office takes like a full minute to open, while flashing some ugly splash screen at you. I hate fucking splash screens. But I digress. There is a much simpler way of merging simple list files on the command line.

Let’s assume we have two files. Our first file will look like this:

file1:
11
11
111
1
111
1
111
11

Our second file (file2) will be a tab-delimited list looking like this:

file2:
22 3333
222 33
2 3
2222 33333
22 333
2 3333
222

We want to merge them both and create a file with three columns. How do you do it? There is nothing simpler than using the unix paste command:

$ paste file1 file2 > file3

The output of this will look as follows:

file3:
11 22 3333
11 222 33
111 2 3
1 2222 33333
111 22 333
1 2 3333
111 222
11
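If you want to try this yourself, here is a minimal sketch that recreates the two sample files with printf and pastes them together. The file names match the ones used above; the script overwrites file1, file2 and file3 in the current directory, so it assumes you run it in an empty scratch directory.

```shell
# NOTE: overwrites file1, file2 and file3 in the current directory.

# file1: one value per line (8 lines).
printf '%s\n' 11 11 111 1 111 1 111 11 > file1

# file2: tab-separated values (7 lines, the last one has a single value).
printf '22\t3333\n222\t33\n2\t3\n2222\t33333\n22\t333\n2\t3333\n222\n' > file2

# Glue the corresponding lines together, separated by a tab.
paste file1 file2 > file3

# Show the merged result.
cat file3
```

By default paste separates the pieces with a tab, so the result can be fed straight to tools that expect tab-delimited input.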

There is one small issue that you need to watch out for: when you “paste” together two files with different numbers of rows (lines), you need to be really careful. For example, see what happens when I do the following:

$ paste file2 file1 > file4

The output will look like this:

file4:
22 3333 11
222 33 11
2 3 111
2222 33333 1
22 333 111
2 3333 1
222 111
11

Note how in the last row, the value ended up in the second column instead of the third. This is because paste doesn’t actually know about columns. It just glues together the corresponding lines from the input files and puts a tab between them. You have to remember this little caveat when you paste together many files.
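One way to spot this kind of misalignment is to count the tab-separated fields on every pasted line. The following is only a sketch: it recreates the sample files from above (so run it in a scratch directory) and uses awk with a tab as the field separator to print the field count in front of each line.

```shell
# NOTE: overwrites file1 and file2 in the current directory.
printf '%s\n' 11 11 111 1 111 1 111 11 > file1
printf '22\t3333\n222\t33\n2\t3\n2222\t33333\n22\t333\n2\t3333\n222\n' > file2

# Print the number of tab-separated fields next to each pasted line.
# A well-formed three-column result would report 3 on every line.
paste file2 file1 | awk -F'\t' '{ print NF ": " $0 }'
```

Any line that reports fewer than 3 fields is one where a value slid into the wrong column.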

The quick workaround here is to put the “bigger” file first in the command, so that it appears as the leftmost column. It’s probably best, though, to pad the smaller file with an appropriate number of lines containing just a tab. For example, if one of your files has 25 more lines than the other one, you can do:

# Append 25 lines to file2, each holding a single tab (two empty fields).
for i in $(seq 1 25); do printf '\t\n' >> file2; done

This might be enough to make paste do what you want. To be safe, paste to standard output first and make sure it looks right before you write it to a file.
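If you would rather not count the missing lines by hand, the padding can be worked out from the line counts. This is a rough sketch, assuming file2 is the shorter two-column file and file1 the longer one; it recreates the sample files, so run it in a scratch directory.

```shell
# NOTE: overwrites file1 and file2 in the current directory.
printf '%s\n' 11 11 111 1 111 1 111 11 > file1
printf '22\t3333\n222\t33\n2\t3\n2222\t33333\n22\t333\n2\t3333\n222\n' > file2

# Count the lines in each file (tr strips the padding some wc versions add).
long=$(wc -l < file1 | tr -d ' ')
short=$(wc -l < file2 | tr -d ' ')

# Append lines holding a single tab until file2 is as long as file1.
while [ "$short" -lt "$long" ]; do
    printf '\t\n' >> file2
    short=$((short + 1))
done

# Check the result on standard output before redirecting it to a file.
paste file2 file1
```

This only fixes lines that are missing entirely. A line that is short a value in the middle of the file, like the lone 222 in the sample, still needs its own trailing tab.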

Oh, and if you messed up the loop, you can easily delete all the blank lines at the end with vim. Open the file, type /^\n\|^\t and hit enter to find the first line that is either empty or starts with a tab. Then press dG to delete that line and everything below it.
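The same cleanup can be done without opening an editor. This is not the vim approach itself but a small awk sketch of a different technique: it holds back whitespace-only lines and prints them only if a non-blank line follows, so trailing padding gets dropped. The output name file2.trimmed is just a placeholder, and the script recreates a short file2 with stray padding, so run it in a scratch directory.

```shell
# NOTE: overwrites file2 and file2.trimmed in the current directory.
# Recreate a short file2 with three stray padding lines at the end.
printf '22\t3333\n222\t33\n222\n\t\n\t\n\t\n' > file2

# Hold back lines made only of spaces or tabs; print them only when a
# non-blank line follows, which drops the padding at the end of the file.
awk '/^[ \t]*$/ { held = held $0 "\n"; next }
     { printf "%s", held; held = ""; print }' file2 > file2.trimmed

cat file2.trimmed
```

Writing to a separate file first lets you compare the two before replacing the original.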

2 Responses to Adding new column to a text file

  1. Todd says:

    This would have been great when I was analyzing my dissertation data. I didn’t cut and paste in StarOffice, but I did write some programs in Pascal to transform and organize my data. That’s right, I said Pascal! ;-)

  2. Luke says:

    Oh man… I never really used Pascal so I’m not sure how suited it would be for the task. But I guess it’s better to use something you are familiar with than nothing at all. :)

    My first instinct when working with text files is to use simple unix tools like awk, sed, grep, paste, etc. If it’s more complex than that, I use perl.
