Have I mentioned that I love Vim? It is such a useful little tool. Let me give you an example. The other day someone gave me a messy list: a dump of email addresses from some database as a comma-separated file. Here is a sample of what it looked like. Note that instead of actual emails I’m using names of vegetables, fruits and other stuff (like Rupert, or poo for example):
test, poop, boob, apple, carrot, mango, kiwi, apricot, banana, apple, tomato, prune, cranberry, raspberry, orange, lemon, potato, pudding, lemonade, pants, spoon, flax, dogmeat, poison, pee, hamburger, rupert, apple, mango, sunflower, bee, pumpernickel, puddle
Of course the actual list had over a thousand emails and did not include Rupert (or his poo for that matter). It was equally messy, full of duplicate emails and basically looking like a wall of text. Someone requested it expecting a sorted, itemized list that they could print out and look at for reference. What they got was a text blob.
So I grabbed the file and opened it in the trusty old Vim and issued three commands. The first one was:
:%s/, /\r/g
This is a single substitution with a regexp. I’m replacing every occurrence of a comma followed by a space with a line break (in the replacement part of :s, Vim treats \r as "start a new line"). This unrolled my CSV into a file with a single item on each line:
test
poop
boob
apple
carrot
mango
kiwi
apricot
banana
apple
tomato
prune
cranberry
raspberry
orange
lemon
potato
pudding
lemonade
pants
spoon
flax
dogmeat
poison
pee
hamburger
rupert
apple
mango
sunflower
bee
pumpernickel
puddle
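For anyone who wants to check what that substitution does outside the editor, here is a minimal Python sketch of the same step. It is only an illustration, not what I ran on the real file; the names raw, items and unrolled, and the shortened sample, are made up for the example.

```python
# Hypothetical shortened sample in the same format as the list above.
raw = "test, poop, boob, apple, carrot, mango, kiwi, apricot, banana, apple"

# Rough equivalent of :%s/, /\r/g : break the text at every comma-plus-space.
items = raw.split(", ")

# One item per line, like the Vim buffer after the substitution.
unrolled = "\n".join(items)
print(unrolled)
```

Note that the duplicates (apple appears twice here) are still present at this stage, just as they are in the buffer.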
To sort it, I simply did:
:%sort u
This sorted my list and removed the duplicates (the ‘u’ stands for unique list):
apple
apricot
banana
bee
boob
carrot
cranberry
dogmeat
flax
hamburger
kiwi
lemon
lemonade
mango
orange
pants
pee
poison
poop
potato
prune
pudding
puddle
pumpernickel
raspberry
rupert
spoon
sunflower
test
tomato
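The sort-and-deduplicate step can be sketched the same way in Python. This is an assumption-laden stand-in rather than a description of how Vim implements :sort: it relies on the items being lowercase ASCII, where Python's default string ordering should agree with Vim's default one. The names items and unique_sorted are hypothetical.

```python
# Hypothetical sample with duplicates, one item per list entry.
items = ["test", "poop", "apple", "mango", "apple", "kiwi", "mango"]

# Rough equivalent of :%sort u : drop duplicates, then sort what is left.
unique_sorted = sorted(set(items))

print("\n".join(unique_sorted))
```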
The last touch was to add line numbers to every single line. Yes, I know – I could print the file with line numbers enabled, but the person who would be using this file was barely capable of using Notepad. So the numbers had to be hard-coded into the text. This is actually a new trick that I just learned and it goes like this:
:%s/^/\=line('.').". "/
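The \= at the start of the replacement tells Vim to evaluate what follows as an expression. The line('.') call returns the number of the current line, and the .". " bit uses the . concatenation operator to append a dot and a space to each number so they nicely stand out from the actual items. The end result looks like this: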
1. apple
2. apricot
3. banana
4. bee
5. boob
6. carrot
7. cranberry
8. dogmeat
9. flax
10. hamburger
11. kiwi
12. lemon
13. lemonade
14. mango
15. orange
16. pants
17. pee
18. poison
19. poop
20. potato
21. prune
22. pudding
23. puddle
24. pumpernickel
25. raspberry
26. rupert
27. spoon
28. sunflower
29. test
30. tomato
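The numbering step boils down to "prefix each line with its 1-based position, a dot and a space". A minimal Python sketch of that idea, with a hypothetical three-item list standing in for the sorted buffer:

```python
# Hypothetical sorted items, standing in for the buffer contents.
items = ["apple", "apricot", "banana"]

# Rough equivalent of :%s/^/\=line('.').". "/ : number lines starting at 1.
numbered = [f"{n}. {item}" for n, item in enumerate(items, start=1)]

print("\n".join(numbered))
```

The start=1 argument matters because Vim counts lines from 1, while enumerate counts from 0 by default.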
I’m putting this here as a useful tip – more for myself than anyone else. Chances are I will forget the line numbering trick in a few weeks and will need to look it up again. Hopefully some of you may find it useful as well.
To summarize: Vim is awesome. It is like a Swiss Army Knife for text files. Use it, learn it, love it!
It’s these little ‘tricks’ that distinguish Vim from other text editors.
The \=line bit is new to me. Any explanation regarding the ('.') would be appreciated.
Thanks for the tip.
Nice tip. Maybe this will be a stupid question, but why use “\r” instead of “\n” ?
@ mcai8sh4:
Here it is:
This is taken directly from Vim’s inline help (i.e. I typed in :help line). :)
@ Zel:
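Oh… Partly because I was working with a Windows file, which needs Windows line endings (\r\n). The real reason it works, though, is that in the replacement part of :s Vim uses \r for a line break, while \n there inserts a NUL character (displayed as ^@).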
And yet most programmers dismiss vim (and/or emacs) as some kind of pointless historical relic! They don’t know what they’re missing.
I like your note about how this post is mostly for your own memory. I do the exact same thing.
lol – I googled….and failed. But not even thinking about the help files – major fail.
/me takes his red face and cowers in the corner.
@Chris : I know what you mean, although, as with most things, it’s about the right tool for the job. Now imagine a full-featured IDE that is fused with Vim – everyone would be a winner.
OR you could just have done:
awk -v RS=", " '{print $0}' ugly.csv | sort -u | awk '{print FNR ". " $0}'
;)
@ Luke Maciak: For Windows line-endings, you can always do
:set fileformat=dos
@ mcai8sh4:
I actually did the same thing when I first saw the line('.') thing used. I tried to google it and failed. Then I was like: “hey, vim has inline help – let me try that” :)
@ Tino:
Yep, that works too. I usually use awk for bigger things though – for example, if I had 20 documents like this one, I wouldn’t even bother with vim and would go directly to awk. When I’m working with a single data file however I usually prefer to have it loaded in the editor and do these things one step at a time so I can see how the data shapes up.