Comments on: File Format Overhead for Data Storage

By: Glenna James

Glenna James — Thu, 26 Feb 2009 22:58:50 +0000

I tried to send an email to the address listed on your contact page, but it was rejected. So I’m taking a chance and asking this question here.

You helped me with an Excel problem a while back. I went back to your site to get help with this problem, but I didn’t know how to look it up in your archives, so I’m asking you directly.

BACKGROUND

I am an accountant and I work in governmental accounting. (I am not employed by the government; I just do work for them.)

I have been working for months (on and off) on a spreadsheet that will let me enter information as few times as possible. I have numerous formulas where I enter the information once in a workbook and it pulls to multiple other sheets in the workbook. I work with three different areas, so I did the master work on one area (Central), and a week or so ago, when I finally finished my work, I copied that workbook (Central’s) and made a duplicate for each of the other two areas (Western and Southwest). After copying the original workbook, I went into Western and Southwest and entered their data into their workbooks.

PROBLEM

With Central only, many of my cells are not pulling the data I enter in the original without help from me. I have to go to the sheet, click in the cell, and then click up at the top. I don’t know what you call that space, but immediately to the left of it, it says “fx”.

Then I hit Enter and the data is pulled to that cell.

I checked everything I could think of under Tools and Options. It is set to calculate automatically.

Of course having to do this defeats the whole purpose of having the formula in the first place.

The thing that puzzles me is that I have the problem only in the original workbook and not in the two I copied.

Since I consider you to be a genius, I hope you will be able to help me. Thanks.

Glenna James

By: Luke Maciak

Luke Maciak — Thu, 20 Mar 2008 04:43:52 +0000

Nice! I’d love a review copy if you can spare one. :)

I’ll shoot you an email.

By: Philipp K Janert

Philipp K Janert — Thu, 20 Mar 2008 01:58:47 +0000

This may be slightly off-topic, but since you mention it…

There is now a book on Gnuplot: “Gnuplot in Action”. You can pre-order it directly from the publisher: Manning: Gnuplot in Action.

The book assumes no previous familiarity with Gnuplot. It introduces the most basic concepts rapidly, and then moves on to explain Gnuplot’s advanced concepts and power features in detail.

If you want to learn more about the book and the author, check out my book page at Principal Value – Gnuplot in Action.

Let me know if you are interested in a review copy.

By: Luke Maciak

Luke Maciak — Tue, 12 Feb 2008 00:26:47 +0000

[quote comment=”8064″]Luke,

The xlsx files are already zip compressed. Rename one to end with zip & you’ll be able to decompress it, then you can see how (un)cooperative m$ has been with their new ‘open’ standard.[/quote]

Will, I know that. This is why I said this in my post:

[quote post=”2282″]There is naturally a reason for this. In case you didn’t know the OpenXML files are really zip compressed directory trees full of verbose MSXML. They are already compressed – there is not much we can do about it! These files are and will be huge for many reasons.[/quote]

Funny thing is that the XML inside is convoluted and it can’t be readily edited. I tried to unzip a Word file, slightly change some text and then zip it back but naturally this doesn’t work. They have checksums and hashes of the content stowed away in more than one place to ensure that modifying OOXML files is as convoluted as possible.

By: Will Sheldon

Will Sheldon — Mon, 11 Feb 2008 22:55:18 +0000

Luke,

The xlsx files are already zip compressed. Rename one to end with zip & you’ll be able to decompress it, then you can see how (un)cooperative m$ has been with their new ‘open’ standard.
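
As a rough illustration of the point above, an xlsx file can be opened with any zip library without even renaming it. The sketch below uses Python's standard zipfile module to list what is inside the package; the function name list_package_parts is made up for this example, and the path you pass in is whatever workbook you have at hand.

```python
import zipfile


def list_package_parts(path):
    """Return (name, compressed_size, uncompressed_size) for each archive entry."""
    # zipfile looks at the file contents, not the extension,
    # so an .xlsx path works as-is.
    with zipfile.ZipFile(path) as package:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in package.infolist()
        ]
```

Calling list_package_parts on a workbook and printing the result should show the XML parts and how much each one was already squeezed by the zip container, which is why zipping the file a second time gains so little.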

By: Luke Maciak

Luke Maciak — Fri, 08 Feb 2008 20:54:48 +0000

[quote post=”2282″]What is this mudkips captcha, I don’t get it.[/quote]

Mudkips == 4chan meme. In other words, you probably don’t want to know. ;) I added it to the word list because it is a short, random word that is not really in a dictionary, but it is easy to spell and type in, and it also acts as a silly inside joke for some.

By: Luke Maciak

Luke Maciak — Fri, 08 Feb 2008 20:38:37 +0000

[quote post=”2282″]you can bet your donkey you’ll be able to open a txt file in 30 years and import it into whatever you need. If your data was only in xls/xlsx or other proprietary formats (see stats program for more examples), in 30 years you’re probably hosed. [/quote]

Ditto! You hit the nail on the head. I totally glossed over that, but the availability of your information in the future is paramount. Locking all your data in proprietary formats is not a smart thing to do.

By: jambarama

jambarama — Fri, 08 Feb 2008 19:39:30 +0000

BTW – saving your data in ASCII is a great idea for a few reasons: it is more searchable, more easily manipulated, smaller in size (as you demonstrated), and you can bet your donkey you’ll be able to open a txt file in 30 years and import it into whatever you need. If your data was only in xls/xlsx or other proprietary formats (see stats program for more examples), in 30 years you’re probably hosed.
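
To make the "import it into whatever you need" point concrete, here is a minimal sketch of reading tab-separated numbers with nothing but the Python standard library. The function name read_numeric_rows is invented for this example, and the two rows are borrowed from the data sample posted in this thread; io.StringIO stands in for an open text file.

```python
import csv
import io


def read_numeric_rows(text_stream):
    """Parse tab-separated text into rows of floats, skipping blank lines."""
    reader = csv.reader(text_stream, delimiter="\t")
    return [[float(cell) for cell in row] for row in reader if row]


# Two rows in the same layout as a plain-text export.
sample = io.StringIO("0.002479512\t0.003447492\n0.002243689\t0.003142163\n")
rows = read_numeric_rows(sample)
print(rows[0])
```

No special library or vendor format is involved, which is the whole argument for plain text.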

By: jambarama

jambarama — Fri, 08 Feb 2008 19:35:08 +0000

What is this mudkips captcha, I don’t get it.

A blog I used to read (when entries were still forthcoming), 3monkeys, did a file size comparison between doc, xml, txt, & odt about a year ago. They came up with essentially the same results as you. They also compared file sizes across different apps using the same filetype and got different results. OOXML wasn’t out at that time, so no comparison there.

Thanks for this, it is pretty interesting to me. If you ever try it again, maybe run some odf comparisons too. :)

By: Luke Maciak

Luke Maciak — Fri, 08 Feb 2008 00:52:09 +0000

The file was essentially rows of floating point numbers. They were readings taken at each iteration, and they were supposed to converge and stop changing at some point. Here is a sample of the data:

0.002479512	0.003447492	0.004360885	0.00417913
0.002243689	0.003142163	0.003983163	0.003899171
0.002080678	0.002927045	0.003714036	0.00369954
0.001953793	0.002756688	0.003503308	0.003538016
0.001857135	0.002622514	0.003341999	0.003409899
0.001786521	0.002518304	0.003223567	0.003311575
0.001739003	0.002439494	0.003143176	0.003240258
0.001712549	0.002382714	0.003097226	0.003193814
0.001705819	0.002345494	0.003083038	0.003170632
0.001718007	0.002326056	0.003098644	0.003169526
0.001748726	0.00232317	0.003142633	0.003189665
0.001797924	0.002336054	0.003214054	0.003230512
0.001865815	0.002364298	0.003312335	0.003291781
0.001952821	0.002407811	0.003437223	0.003373395
0.002059526	0.002466784	0.003588746	0.00347545
0.002186619	0.002541665	0.003767171	0.003598183
0.002334844	0.002633143	0.003972968	0.003741938
0.002504938	0.002742141	0.004206782	0.003907128
0.002697553	0.002869812	0.00446939	0.004094192
0.002913174	0.003017547	0.004761655	0.004303547
0.003151996	0.003186983	0.005084467	0.004535527
0.003413796	0.003380021	0.005438662	0.004790304
0.003697758	0.003598848	0.005824921	0.005067787

It’s all numeric, no letters, and the values are relatively close to each other. It compresses very well. For this test I zipped all the files using WinRAR with the “Best Compression” option.
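
For anyone who wants to poke at this themselves, here is a small sketch that measures how well a few of the rows above compress. It is only an approximation of the test: it uses Python's zlib module (DEFLATE, the same family of compression that zip uses) rather than WinRAR, and the function name deflate_ratio is made up for this example.

```python
import zlib

# A few rows copied from the sample above, tab-separated as in the text file.
SAMPLE = "\n".join([
    "0.002479512\t0.003447492\t0.004360885\t0.00417913",
    "0.002243689\t0.003142163\t0.003983163\t0.003899171",
    "0.002080678\t0.002927045\t0.003714036\t0.00369954",
    "0.001953793\t0.002756688\t0.003503308\t0.003538016",
    "0.001857135\t0.002622514\t0.003341999\t0.003409899",
    "0.001786521\t0.002518304\t0.003223567\t0.003311575",
]) + "\n"


def deflate_ratio(text, level=9):
    """Return compressed size divided by original size (smaller is better)."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, level)) / len(raw)


print(f"{len(SAMPLE)} bytes, ratio {deflate_ratio(SAMPLE):.2f}")
```

The alphabet is tiny (ten digits, a dot, a tab and a newline), so each character needs far fewer than eight bits once it is entropy coded. On a six-row snippet the fixed overhead of the compressed stream still matters, so expect the full file to do better than this sample does.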