Bottom Line: I wrote a script to go through my CrashPlan log and find out which directories were being backed up most frequently.

I have a local CrashPlan backup that goes to my Raspberry Pi. It could be a little faster, but it generally works pretty well.

A week or so ago, I finally completed a full sync after not having done so in a couple of weeks. The next day, I noticed that I already had a few GB of changes queued up to sync, after relatively light use and no new large files I could think of. I was curious what was going on, so I went searching through my CrashPlan logs.

Unfortunately, just looking at the raw logs didn't tell me much; there are simply too many individual files to wrap my head around. So instead, I wrote a quick script that reads the most recent log of backed-up files and writes out a text file listing each directory and the number of times it was referenced in the backup log, sorted by count. It turned out there were several directories full of frequently modified files that I didn't really need to be backing up at all. I added those directories to CrashPlan's Settings -> Backup -> Filename exclusions and have been pleased with the results.

#!/usr/bin/env python3
'''crashplan_dirs.py
Takes the CrashPlan log and sorts it by the most commonly referenced directories.
As of 20140907 only configured for Mac OS X.
'''
import collections
import logging
import re
import sys

OUTPUT_FILE = 'crashplan_dirs.txt'

logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    # filename='crashplan_dirs.log',
    # filemode='a'
)
logger_name = str(__file__) + " :: " + str(__name__)
logger = logging.getLogger(logger_name)

# Location of the most recent log of backed-up files; only the Mac OS X
# path is configured so far.
if sys.platform == 'darwin':
    logfile = '/Library/Logs/CrashPlan/backup_files.log.0'
else:
    logger.error("No CrashPlan log path configured for platform %r.", sys.platform)
    sys.exit(1)

try:
    with open(logfile, 'r') as f:
        lines = f.readlines()
except FileNotFoundError:
    logger.exception("Unable to find the CrashPlan log file. Make sure you "
                     "have the right file set for your system.")
    sys.exit(1)

# Each backed-up file shows up as a log line ending in its absolute path;
# capture the path portion of every matching line.
regex = re.compile(r'^I \d{2}/\d{2}/\d{2} \d{2}:\d{2}[AP]M \d+ \w+ \d (/.*?)$')
paths = [match.group(1) for match in (regex.match(line) for line in lines)
         if match]

# Strip the filename, keeping everything up to and including the last slash.
dirs = [re.match(r'^.*/', path).group(0) for path in paths]

# Count how many times each directory appears and sort with the most
# frequently referenced directories first.
dir_counts = collections.Counter(dirs)
output = sorted(dir_counts.items(), key=lambda x: x[1], reverse=True)

with open(OUTPUT_FILE, 'w') as f:
    f.write('\n'.join('{}: {}'.format(count, directory)
                      for (directory, count) in output))
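To use it, I just run the script from a terminal (python3 crashplan_dirs.py) and open the resulting crashplan_dirs.txt. Each line is a count followed by the directory it refers to, so the noisiest directories float to the top. The entries below are purely illustrative examples of the format, not taken from my actual log:

1843: /Users/me/Library/Application Support/Google/Chrome/Default/
912: /Users/me/Library/Caches/com.example.someapp/
407: /Users/me/Dropbox/.dropbox.cache/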