Finding CrashPlan's Most Frequently Used Directories
Bottom Line: I wrote a script to go through my CrashPlan log and find out which directories were being backed up most frequently.
I have a local CrashPlan backup that goes to my Raspberry Pi. It could be a little faster, but it generally works pretty well.
A week or so ago, I finally completed a full sync for the first time in a couple of weeks. The next day, I noticed that I already had a few GB of changes queued up to sync, despite relatively light use and no new large files I could think of. I was curious about what was going on, so I went searching through my CrashPlan logs.
Unfortunately, just looking at the raw logs didn’t give me a clear picture; there are just too many files to wrap my head around. So instead, I wrote up a quick script that goes through the most recent log of backed-up files and outputs a text file listing each directory and the number of times it was referenced in the backup log, sorted by count. I found that there were several directories that had tons of frequently modified files that I didn’t really need to be backing up at all. I added these directories to CrashPlan’s Settings -> Backup -> Filename exclusions and have been pleased with the results.
#!/usr/bin/env python3
'''crashplan_dirs.py
Takes the CrashPlan log and sorts it by the most commonly used directories.
As of 20140907 only configured for Mac OS X.
'''

import collections
import logging
import re
import sys

OUTPUT_FILE = 'crashplan_dirs.txt'

logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    # filename='crashplan_dirs.log',
    # filemode='a'
)

logger_name = str(__file__) + " :: " + str(__name__)
logger = logging.getLogger(logger_name)

if sys.platform == 'darwin':
    logfile = '/Library/Logs/CrashPlan/backup_files.log.0'
else:
    # Only the Mac path is configured; stop here instead of hitting a
    # NameError on `logfile` further down.
    logger.error("No CrashPlan log path configured for platform %r.",
                 sys.platform)
    sys.exit(1)

try:
    with open(logfile, 'r') as f:
        lines = f.readlines()
except FileNotFoundError:
    logger.exception("Unable to find the CrashPlan log file. Make sure you "
                     "have the right file set for your system.")
    sys.exit(1)

# Capture the absolute path at the end of each backed-up-file entry.
regex = re.compile(r'^I \d{2}/\d{2}/\d{2} \d{2}:\d{2}[AP]M \d+ \w+ \d (/.*?)$')
paths = [match.group(1) for match in (regex.match(line) for line in lines)
         if match]

# Keep everything up to and including the last slash: the parent directory.
dirs = [re.match(r'^.*/', path).group(0) for path in paths]

# Directories ordered from most to least frequently referenced.
counts = collections.Counter(dirs)
output = sorted(counts.items(), key=lambda x: x[1], reverse=True)

with open(OUTPUT_FILE, 'w') as f:
    f.write('\n'.join('{}: {}'.format(v, k) for (k, v) in output))
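If you want to try the parsing logic without a real CrashPlan log handy, here is a minimal sketch of the same idea wrapped in a function. The sample lines are made up purely to fit the regex in the script; they are not copied from an actual log, so treat the exact field values as assumptions. `count_dirs` is just a hypothetical helper name for illustration.

```python
import collections
import re

# Same pattern the script uses to pull the path off the end of a log entry.
LINE_RE = re.compile(
    r'^I \d{2}/\d{2}/\d{2} \d{2}:\d{2}[AP]M \d+ \w+ \d (/.*?)$')


def count_dirs(lines):
    """Return (directory, count) pairs, most frequently referenced first."""
    counts = collections.Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that isn't a backed-up-file entry
        path = match.group(1)
        # Everything up to and including the last slash is the parent dir.
        counts[path[:path.rfind('/') + 1]] += 1
    return counts.most_common()


# Hypothetical sample lines, invented to match the regex (not real log data).
sample = [
    'I 09/07/14 10:15AM 41 abc123 0 /Users/me/Library/Caches/app/a.db',
    'I 09/07/14 10:16AM 41 abc123 0 /Users/me/Library/Caches/app/b.db',
    'I 09/07/14 10:17AM 41 abc123 0 /Users/me/Documents/notes.txt',
    'not a backup entry',
]

for directory, count in count_dirs(sample):
    print('{}: {}'.format(count, directory))
```

With these sample lines, the cache directory is listed first with a count of 2, followed by the Documents directory with a count of 1. This is the same "count: directory" layout the script writes to crashplan_dirs.txt.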