There is, of course, a point at which current OSes and apps give diminishing returns as more CPUs are added. There are also some studies saying that beyond a certain point, processing actually slows down for some tasks.
However, having four or eight cores is not a waste for many consumer desktops. I have an eight-core Mac Pro with 10 GB of RAM, running OSX (and often Windows and Linux in VMware). OSX has a nice little dev tool where you can view the CPU load graphically and disable CPUs as you wish - down to one CPU.
To make a point this weekend to someone considering whether to get a dual core or a quad core machine, I ran only an email client, a browser and iTunes - a typical consumer app load. In iTunes I was using a "spectrum analyzer" visualizer which does load down the CPUs a bit, but it is not unusual for consumers to do something like this (or to be watching a DVD or streaming video).
At 8 cores not all 8 cores were used at the same time - it was switching between them. At four cores it was using all four cores simultaneously at about 25% average load each. At two cores it was using about 50% average load each. At one core the CPU was often pegged.
The point is that it doesn't take much to load up current CPUs - even a top-of-the-line machine like mine with an efficient OS can easily make efficient use of more than 2 cores with typical consumer use. Whether it is worth the extra money for a given consumer depends on their budget, their use and what they are buying. I have seen Dell quad core desktops on sale for under $500, so I don't hesitate to tell someone to get a quad core machine if the price is right. It won't be wasted, and in the near future apps and OSes will take even more advantage of more cores.
I remember when the '386 first came out. A pundit in a review article said it would never be useful for the desktop - it would only be useful for "file servers". I also remember a couple of years after the Mac came out - another pundit asserted that Macs would never run Windows.
Guy: And remember, recovery time is just as important as backup time.
Why would you claim this? It seems that, generally speaking, you should be doing the backup well over 100x as many times as you do the recovery (other than spot checks for backup reliability, but that's not a time-sensitive task unless you insert it as a blocker). If I have a catastrophic failure and need to take 2 hours to decompress the backup, I'm willing to charge that to the unfortunate but uncommon catastrophic failure event; if I'm taking 2 hours to compress every day so that recovery is a 10 minute operation, I think I'm wasting a heck of a lot more time for little real payoff!
Or, are you implying that every backup should be uncompressed and verified immediately before closing the backup window?
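Back-of-the-envelope, with numbers invented purely for illustration (a cheap quarter-hour backup versus the 2-hour / 10-minute figures above), the totals over a hundred backup cycles look like this:

    # Illustrative arithmetic only: total hours over 100 backup cycles,
    # assuming one catastrophic restore somewhere in that window.
    BACKUPS = 100  # backups per restore event ("well over 100x")

    # Strategy A: cheap daily backup, expensive restore
    fast_backup_hours, slow_restore_hours = 0.25, 2.0
    total_a = BACKUPS * fast_backup_hours + slow_restore_hours

    # Strategy B: expensive daily backup tuned for a fast restore
    slow_backup_hours, fast_restore_hours = 2.0, 10 / 60
    total_b = BACKUPS * slow_backup_hours + fast_restore_hours

    print(f"cheap backups, slow restore: {total_a:.1f} hours")  # 27.0
    print(f"slow backups, fast restore : {total_b:.1f} hours")  # ~200.2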
@ Tom Dibble: Recovery time is as important as backup time if you want to have a DR site ready to go when a failure happens. In that kind of situation, you'll want to restore your latest backup to your DR server as fast as possible.
Unfortunately compression algorithms are like symmetric block ciphers. If you want fast, then it won't compress well / it will leak information.
If you want it to compress well / not leak information, the task becomes highly serialized.
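You can see the shape of that trade-off with whatever codec is handy; here is a quick, unscientific sketch using zlib's fastest and best levels (the file name is just a placeholder for any large, redundant file):

    import time
    import zlib

    data = open("backup.sql", "rb").read()  # placeholder: any large, redundant file

    for level in (1, 9):  # 1 = fastest, 9 = best ratio
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        print(f"level {level}: {len(packed) / len(data):6.1%} of original "
              f"in {elapsed:.2f}s")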
If we could only standardize on a compression algorithm and then add specialized circuits to run it, it would be fast. It's kinda like that test where the Via Nano trounced a C2Q on AES encryption.
In 7-Zip 4.19 (IIRC) the bzip2 implementation was rewritten to achieve higher compression ratios on redundant data (such as source tarballs) at the highest modes while remaining compatible with bzip2. Unfortunately, that rewrite also made the normal mode slower in total CPU time than the standard bzip2 implementation, without the higher compression ratio.
@Jeff: Even the simplest compression algorithms (Shannon-Fano and Huffman) already eliminate 80-90% of the redundancy in a file.
With so much redundancy removed, trying to compress the file again usually yields less than 10% further size reduction (or even negative compression), so it's far more efficient to use a better algorithm from the very beginning (I bet SQL 2008 is simply using ZIP's deflate compression).
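That is easy to check with an off-the-shelf codec; a small sketch using Python's zlib as a stand-in (the file name is a placeholder):

    import zlib

    data = open("backup.sql", "rb").read()  # placeholder: any redundant file

    once = zlib.compress(data, 9)
    twice = zlib.compress(once, 9)  # compress the already-compressed output

    print(f"first pass : {len(once) / len(data):.1%} of original")
    print(f"second pass: {len(twice) / len(once):.1%} of the first pass")
    # The second pass typically saves a few percent at best (or even grows
    # the file) because the first pass already removed most of the redundancy.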
Now, what I have never understood is: why has nobody ever created a compression card, just as we have specialized graphics, sound and network cards? As a DBA I have to do backups continuously, copy them over the network, burn them, etc. (and yes, I do full backups only on Sundays, differential backups at midnight the other days and transaction backups twice during working hours - I hope the same's done on Stack Overflow's database).
I think there's a market for a card whose circuitry is dedicated to compression.
Just imagine it: it'd copy a stream of bytes from memory (using DMA, of course), compress it without bothering the main CPU(s), then copy the result back to main memory. For both database and web server scenarios this would be a huge win!!!
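No such card exists on my desk, but the effect can be roughly faked in software by handing compression to a dedicated worker process so the main process only shuffles bytes. A sketch only, with a made-up file name and chunk size:

    import bz2
    from concurrent.futures import ProcessPoolExecutor

    def compress_chunk(chunk: bytes) -> bytes:
        """Runs in a worker process, so the main process stays responsive."""
        return bz2.compress(chunk, 9)

    def background_compress(path: str, chunk_size: int = 8 * 1024 * 1024):
        # The main process streams the file and queues chunks; one dedicated
        # worker does the compressing - a crude stand-in for the card + DMA idea.
        with open(path, "rb") as src, ProcessPoolExecutor(max_workers=1) as pool:
            futures = []
            while chunk := src.read(chunk_size):
                futures.append(pool.submit(compress_chunk, chunk))
            return [f.result() for f in futures]

    if __name__ == "__main__":
        parts = background_compress("backup.sql")  # placeholder file name
        print(sum(len(p) for p in parts), "compressed bytes")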
@Developer Dude:
You must be kidding. Seriously. Are you really a developer?
Watching a video is the only CPU-intensive application in the bunch you mention. Fetching email is mostly bandwidth-constrained rather than processor-bound - no matter what Intel tells you, you won't browse or retrieve your mail any faster by using a more powerful CPU (or more cores).
Writing to the HD is thousands of times slower than writing to memory - your backup task is constrained by your HD's speed, so you're not gaining much by sending it to its own core.
Placing data on a CD/DVD is actually a slow operation and requires very little CPU; the big CPU usage when burning comes from copying big chunks of data from the HD to memory, and from there to the burner. The process is also highly intolerant of delays, so unless you have a RAID or a darn fast HD you should not be playing a video while burning a disc anyway.
Why not simply try splitting the original file into four equally sized parts and compressing them all individually with 7zip?
I'd be really curious what the total size would be. If it's not significantly larger than the total size when compressing the original file in one piece, then doing this in parallel would be a piece of cake. Once you've established a maximum size of input file at which point making the input file larger doesn't give better compression, you can easily split up the input file. It would require a slightly different 7zip file format though.
If the combined size is much larger than the single size, it might be much harder to parallelize 7zip though...
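A quick way to get a ballpark answer without touching the 7zip container format at all: Python's lzma module uses the same LZMA algorithm that 7zip defaults to, so compressing the pieces separately versus the whole file gives a first approximation (the file name is a placeholder):

    import lzma

    data = open("backup.sql", "rb").read()  # placeholder for the original file
    PARTS = 4

    whole = len(lzma.compress(data))

    chunk = (len(data) + PARTS - 1) // PARTS  # ceiling division into PARTS pieces
    split = sum(len(lzma.compress(data[i:i + chunk]))
                for i in range(0, len(data), chunk))

    print(f"compressed whole       : {whole:>12,} bytes")
    print(f"compressed in {PARTS} pieces: {split:>12,} bytes ({split / whole - 1:+.1%})")

If the gap turns out to be small, compressing the pieces on separate cores is the easy win.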
Did you notice that Apple has remained resolutely in favor of Core Duos in the latest refresh of their iMac consumer desktop? I'm in complete agreement with you: desktop PC performance is primarily driven by CPU clock speed, L1/L2 cache size and memory bus bandwidth rather than the number of cores.
As an aside, did you measure file I/O during this test? I wonder how much of the bzip time was just spent reading/writing to the disk. I/O latency is definitely something that multiple cores can do very little about, and they may actually make things worse (by randomizing the read/write operations).
Even if additional CPUs past 2 cannot be used efficiently, I would think that a system with 3+ cores would feel more responsive in interactive use, since there is a greater chance that CPU resources would be available immediately to respond to the user.
Basic queuing theory might indicate that 3 cores/CPUs may offer a practical advantage over just 2 even in light-load settings (under ~33% utilization), so a quad scenario might be overkill on the desktop.
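For what it's worth, here is the kind of number I mean: an M/M/c Erlang C sketch of the chance that a newly arriving task finds every core busy at roughly 33% per-core utilization (real desktop load is far burstier than this assumes, so treat it as a rough indicator only):

    from math import factorial

    def erlang_c(servers: int, offered_load: float) -> float:
        """Probability that an arriving job must wait in an M/M/c queue."""
        a = offered_load  # in Erlangs; per-core utilization = a / servers
        top = (a ** servers / factorial(servers)) * (servers / (servers - a))
        bottom = sum(a ** k / factorial(k) for k in range(servers)) + top
        return top / bottom

    for cores in (2, 3, 4):
        load = 0.33 * cores  # hold per-core utilization at ~33%
        print(f"{cores} cores: P(wait) = {erlang_c(cores, load):.1%}")

The drop from 2 to 3 cores is noticeable; the extra gain from a fourth core is smaller, which is the "quad might be overkill" intuition.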
Yep, as a compromise between limiting to dual-core and going all the way to quad-core, AMD offers tri-core (X3) versions of the Phenom and Phenom II. Sadly, Intel doesn't offer tri-cores.
I should mention, however, that in similar scenarios I have often found no improvement, or even poorer performance, when parallelizing encodings in this manner. You won't know until you try, though. It all depends on where your bottlenecks are, and what kind of resource contention might develop between the processes.
Another alternative you can consider for remote backups is rdiff-backup. It uses delta compression like rsync does to only send the difference. It also keeps reverse diffs on the server side so that you can go back to a previous version of the data at any time. It works kind of like Time Machine on the Mac. You can see what everything looked like X days ago and extract files from the backup.
This means you can keep 60 days worth of old backups at a fraction of the cost of disk space. It's very convenient to be able to go back to old data in case you find a bug in your code that's been making your data go away.
We back up 25 GB in less than an hour over a 70 kbytes/sec connection.
It can be tricky to get this going. Let me know if you want to try it and I'll give you some pointers.
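To give a flavor of the workflow (host, paths and the retention window are placeholders, and option syntax varies a bit between rdiff-backup versions, so double-check against rdiff-backup --help):

    import subprocess

    SRC = "/var/data"                                # placeholder source
    DEST = "backupuser@backuphost::/backups/data"    # placeholder destination

    # Nightly backup: only deltas go over the wire, rsync-style, and reverse
    # diffs are kept on the server so older versions stay reachable.
    subprocess.run(["rdiff-backup", SRC, DEST], check=True)

    # Cap server-side disk usage by dropping increments older than 60 days.
    subprocess.run(["rdiff-backup", "--remove-older-than", "60D", "--force", DEST],
                   check=True)

    # Pull back a single file as it looked 10 days ago.
    subprocess.run(["rdiff-backup", "--restore-as-of", "10D",
                    DEST + "/projects/schema.sql", "/tmp/schema.sql"],
                   check=True)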
JeffA wrote:
No actual published benchmarks of typical computer use support your statement. I can point to dozens of articles backed by data on AnandTech, TechReport, Tom's Hardware, etc, that all show the same thing - there is a massive point of diminishing return beyond 2 cores.
If you aren't in one of the narrow niches that can exploit parallelism, an ultra-fast dual core is all you need.
If you are running benchmarks that look at the performance of a single app, then sure, many apps are not very parallel, and many of the ones that are don't scale beyond 2 CPUs.
However, if you look at the bigger picture, where someone is using more than one app at a time, then you can start exploiting the ability of most modern OSes to spread work across CPUs and give each application at least one CPU of its own.
It is not a "narrow" niche to be listening to tunes or watching a video while something else is happening on your computer. Maybe your email client is fetching email/RSS feeds/etc., and/or you are burning a DVD, and/or you have an automatic backup process running, and/or you are downloading something. These are all typical uses where you - the user - will find, if you multitask, that you can easily put more than 2 CPUs to good use.
I bought my Mac Pro because it supports OSX (so I can run OSX, Linux and Windows, including 64-bit versions, all at the same time - I am a cross-platform developer) and because it supports up to 32 GB of RAM (those VMs, DBMS servers and web app back ends take up a lot of memory) - not so much because it had 8 Xeon CPUs. But I did get the 8 CPUs instead of the 4 because I knew that from time to time I could use them, and that in the near future more and more apps would take advantage of them.
Is there some reason not to compress (and maybe encrypt) the db in situ? Would insert/compress or extract/decompress on the fly on (presumably) mostly small text files impact response time? If not, your backup problem becomes a simple copy or ftp. Sarel's point is a good one - when the backup medium of choice was 9-track tapes, backing up only the changes was the only feasible method.
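As a rough sketch of what compress-on-insert / decompress-on-read costs at the application layer (zlib purely for illustration; the sample value is made up):

    import zlib

    def pack(text: str) -> bytes:
        """Compress a text value before it is stored."""
        return zlib.compress(text.encode("utf-8"), 6)

    def unpack(blob: bytes) -> str:
        """Decompress a stored value on the way back out."""
        return zlib.decompress(blob).decode("utf-8")

    row = "question body text " * 150   # made-up, mostly-text record
    stored = pack(row)
    assert unpack(stored) == row

    print(f"{len(row.encode())} bytes -> {len(stored)} bytes stored")
    # Note: values much smaller than ~100 bytes often do not shrink at all,
    # so whether this pays off depends on typical row size and the per-row
    # CPU hit you can tolerate at response time.

If the rows are already compressed on disk, the nightly backup does largely collapse into a plain copy or ftp, as suggested above.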