The tests I run for my research produce pages upon pages of log files detailing what is happening to my data. This was very useful during debugging, and now it helps me gather and analyze data about the test runs. Since the actual output is human-readable, I usually need to extract the data itself out of the log files using a few simple awk scripts. What I end up with are basic files with a single numeric value on each line. I graph these files using gnuplot.
Sometimes the stuff I want to graph is spread over several files that need to be merged. I’m posting this here because the other day I saw someone opening a bunch of these files and copying them one by one into a Star Office spreadsheet to make a composite graph of all the data. Ugh…
That is way too much work. Not to mention that Star Office takes like a full minute to open, while flashing some ugly splash screen at you. I hate fucking splash screens. But I digress. There is a much simpler and easier way of merging simple list files on the command line.
Let’s assume we have two files. Our first file will look like this:
Our second file (file2) will be tab delimited list looking like this:
We want to merge the two and create a file with three columns. How do you do it? There is nothing simpler than using the Unix paste command:
$ paste file1 file2 > file3
The output of this will look as follows:
11 22 3333
11 222 33
111 2 3
1 2222 33333
111 22 333
1 2 3333
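If you want to try this without any log files at hand, here is a small sketch that recreates the two inputs and merges them. The file contents are my assumption, inferred from the output shown above, and the scratch directory is only there so that nothing you already have gets overwritten.

```shell
# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)"

# file1: one value per line (contents assumed, inferred from the output above).
printf '%s\n' 11 11 111 1 111 1 > file1

# file2: two tab-separated values per line (also assumed).
printf '%s\t%s\n' 22 3333 222 33 2 3 2222 33333 22 333 2 3333 > file2

# Glue line N of file1 to line N of file2, separated by a tab.
paste file1 file2 > file3
cat file3
```

Each row of file3 should then hold three tab-separated values: one from file1 followed by two from file2.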
There is one small issue that you need to watch out for: when you “paste” together two files with different numbers of rows (lines), you need to be really careful. For example, see what happens when I do the following:
$ paste file2 file1 > file4
The output will look like this:
22 3333 11
222 33 11
2 3 111
2222 33333 1
22 333 111
2 3333 1
Note how in the last row, the value ended up in the second column instead of the third. This is because paste doesn’t actually know about columns. It just glues together the corresponding lines from the input files and puts a tab between them. When the shorter file runs out of lines, paste uses an empty string in its place, so that row gets a single tab instead of two. You have to remember this little caveat when you paste together many files.
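The effect is easier to see when the tabs are made visible. Here is a minimal sketch, assuming a hypothetical three-line file called long and a two-line, two-column file called short (neither comes from my actual runs). `sed -n l` prints each tab as `\t` and marks the end of every line with `$`.

```shell
cd "$(mktemp -d)"

# Hypothetical inputs: 'long' has three lines, 'short' has only two.
printf '%s\n' 1 2 3 > long
printf '%s\t%s\n' a b c d > short

# Once 'short' runs out, paste emits an empty string for it, so the
# last row holds a single tab followed by the value from 'long'.
paste short long | sed -n l
```

The last line comes out as `\t3$`: a single tab, so the 3 sits in the second column rather than the third.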
The quick workaround here is to put the “bigger” file first in the command, thus making it the leftmost column. It’s probably best, though, to pad the smaller file with an appropriate number of lines that contain nothing but a tab. For example, if file2 has 25 fewer lines than the other file, you can do:
for i in $(seq 1 25); do printf '\t\n' >> file2; done
This might be enough to make paste do what you want. For safety, paste to standard output first and make sure it looks right before you write it to a file.
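Counting the missing lines by hand gets tedious, so the padding can also be computed with wc. This is a rough sketch, assuming file2 is the shorter, two-column file and file1 the longer one; the sample contents and the file2.padded name are made up for illustration. One tab per padding line is right for a two-column file; a file with more columns would need one tab fewer than its column count.

```shell
cd "$(mktemp -d)"

# Hypothetical inputs: file1 has five lines, file2 has two (two columns each).
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\t%s\n' a b c d > file2

# Number of lines file2 is missing compared to file1.
missing=$(( $(wc -l < file1) - $(wc -l < file2) ))

# Append that many lines, each holding a single tab (two empty columns).
# Work on a copy so the original file2 stays untouched.
cp file2 file2.padded
i=0
while [ "$i" -lt "$missing" ]; do
    printf '\t\n' >> file2.padded
    i=$((i + 1))
done

# Inspect on standard output first; every row now has three columns.
paste file2.padded file1
```

Padding a copy rather than the original means a wrong count costs nothing: delete the copy and try again.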
Oh, and if you messed up the loop, you can easily delete all the padding lines at the end with vim. Open the file, type /^\n\|^\t and hit enter to jump to the first line that is either empty or starts with a tab. Then press dG to delete that line and everything below it.
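If you would rather not open an editor, the same cleanup can be scripted. This is an alternative to the vim approach, not the same thing: a sketch that drops only the trailing lines that are empty or contain nothing but tabs, and leaves any blank lines in the middle of the file alone. The file names padded and trimmed are hypothetical.

```shell
cd "$(mktemp -d)"

# Hypothetical file: two data lines followed by three padding lines.
printf 'a\tb\nc\td\n\t\n\t\n\n' > padded

# Hold back blank or tab-only lines; print them only if real data follows.
awk '
    /^\t*$/ { held = held $0 "\n"; next }   # possible trailing padding
            { printf "%s", held; held = ""; print }
' padded > trimmed

cat trimmed
```

Writing to a new file instead of editing in place keeps the padded version around until you have checked the result.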