Digitizing Treasures
So, everything is supposed to be in “the cloud” already, right? I mean, how many times have I already backed up my DVD collection, and tossed away the physical discs? I mean, you need those physical discs in case you’re ever raided and need to prove you’ve actually purchased those hundreds of movies, and not just downloaded them from the internet, right?
Over the years, I’ve managed to back up a fair number of DVDs, audio CDs, pictures from phones, pictures from photo albums, and the like. Mostly they end up on the good old reliable Synology NAS sitting under my desk. All code is up on GitHub, other than the code I don’t ever want to share publicly.
As we are moving to India, I’m making a final push to digitize everything.
Photos – We have accumulated about 80 years of family photos, because we’ve been the last stop for such things as various family members have passed away. There have been attempts over the years to digitize the same. We have packets of CD-ROMs from the likes of WalMart and Costco, where they developed the film and gave you a CD containing the same. For the really old stuff, there have been waves of efforts to digitize albums through various services, leaving us with more discs, or perhaps with nothing, the scans lost to the ether.
A couple years back, I purchased a photo digitizer: the Epson FastFoto FF-640.
As it says on the box, it is fast. If you give it a pile of non-sticky photos, it will digitize a typical 5×7 in about a second. That’s great when your photos are all nice and clean and slippery. When they’re not in this condition, you might need to feed them by hand, one by one. In that case, it’s closer to 10 seconds or more of manual labor per photo.
We went through the albums, and tried to pull out pictures without ripping them. We were not always successful, and we’d just leave those alone, and move on to the ones we could separate. In all, we had about 25 albums total, amounting to thousands of photos.
All photos are now on a nice SanDisk USB SSD. This is a super snappy 4TB drive. More than enough to hold 80 years of family photos. I think they are all .jpeg files. Certainly we can convert to other formats, but that should last another decade at least.
Onto the same SSD went all the photos from the Synology. That includes previous rounds of digitized images, as well as backups from phones that have long since added their structure to landfills or drawers. Phones are interesting, because sometimes their images are in standard formats such as .jpg, or possibly .png, but more often than not, they’re in a format that’s either raw, or some vendor proprietary thing. The key to making these useful is to convert them to something universal and modern while copying, or soon thereafter.
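The "convert while copying" idea starts with knowing which files need converting. Here is a minimal Python sketch of that first pass, under my own assumptions: the function name `copy_and_flag` and the `UNIVERSAL` set of extensions are hypothetical, chosen only for illustration. It copies a folder tree and returns the files that are not already in a common format. The conversion itself is left to a tool like IrfanView.

```python
import shutil
from pathlib import Path

# Assumption: these are the extensions treated as "universal" in this sketch.
UNIVERSAL = {".jpg", ".jpeg", ".png"}


def copy_and_flag(src: Path, dst: Path) -> list[Path]:
    """Copy every file under src into dst, keeping the folder layout.

    Returns the copied files whose extension is not in UNIVERSAL,
    i.e. the ones that still need converting.
    """
    needs_conversion = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also tries to preserve timestamps
        if path.suffix.lower() not in UNIVERSAL:
            needs_conversion.append(target)
    return needs_conversion
```

Calling `copy_and_flag(Path("old_phone_backup"), Path("ssd_photos"))` (both folder names made up here) would give back a to-do list of files to run through a converter.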
Apple has had a variety of formats, and has added things like spatial photos to the mix more recently. To do these conversions, I’m sure Apple has software, but I tend to use IrfanView or something similar, because they’re treasure troves of esoteric file format support.
Then there are audio CDs. On a Windows machine, I can just pop one into the drive, and open it up with Windows Media Player. There is a convenient button that says “Rip CD”. Just press that, and you get a nice rip, in .m4a format. This is supposed to be universal enough that I just keep it in that format, and move along. If there’s audio that I want to preserve with as much fidelity as possible, eliminating any errors coming from the disc drive, I’ll use Exact Audio Copy. This is an older tool, going back to the days when ‘ripping’ was a bleeding edge thing, and encryption was a challenge. It’s great, and generates lossless .wav files. They’re big, but if you’re trying to do archival work (which I’m not), this is the way to go.
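To put “big” in numbers, here is a quick back-of-the-envelope sketch in Python. It uses the standard audio CD parameters (44.1 kHz, 16-bit, stereo). The function name `wav_bytes` is just something made up for this example, and the 74-minute length is an illustrative full disc, not a measurement from my collection.

```python
# Standard audio CD parameters: 44.1 kHz sample rate, 16-bit samples, stereo.
SAMPLE_RATE = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2


def wav_bytes(minutes: float) -> int:
    """Approximate size of uncompressed CD audio (ignores the small WAV header)."""
    return round(minutes * 60 * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


print(wav_bytes(1))   # 10584000 bytes, roughly 10 MB per minute
print(wav_bytes(74))  # 783216000 bytes for a 74-minute disc
```

So an uncompressed rip costs somewhere under a gigabyte per disc, which is a lot next to a compressed .m4a but small next to a 4TB drive.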
Lastly, there are DVDs and Blu-rays. There are legal reasons not to archive these, so I’ll pretend I’m only dealing with discs that have no encryption on them. I’ve collected DVDs from various places in the world while traveling. DVDs can be region locked, so a DVD from India won’t necessarily play on a disc drive in the US, and vice versa. Besides this, the DVD from India is meant to be played on a PAL system, rather than NTSC. For years, I’ve had multiple physical DVD players connected to my TVs, putting into service the right one for the media and region. Well, digitizing this lot removes this burden once and for all.
There’s a two-step process here. First is to get an external drive that can be unlocked so it does not care about regions. I started with an OWC Mercury Pro 16X Blu-ray. It’s a really good drive, and most importantly, there’s a SDF Tool Flasher, which unlocks the full potential of the drive. This firmware upgrade allows the drive to be in LibreDrive mode. This mode allows it to run faster while ripping, removes the region lock, and generally allows software to access the raw disc data without restriction. This is all very nice. The second step is the actual ripping, for which I use MakeMKV to convert the raw DVD information into the .mkv container format. In the past, I would just store the DVD as a .iso file for archival storage. Later, I would use HandBrake to pull out a single .mpg file for the main feature. These days, .mkv is a better archival format, as some players, such as VLC Player, can play from it directly, and more devices can read that format, even if they can’t read a .iso directly.
Having this unlocked drive makes it easier to get DVDs and Blu-rays from other places in the world. This time, when I’m in India and want to buy the latest on a DVD, I don’t have to worry about getting an India region specific DVD player to connect to the TV. I can just connect the drive to the laptop, and play away.
Ignoring streaming for the moment, I find that getting off these old discs is a MUST. I mean, they have a physical lifespan. They’re prone to fingerprints, scratches, water damage, delamination, and just general bit rot. Now is the time to preserve the content (some as old as 20 years), and finally get rid of the physical discs. The fact that a palm sized SSD can hold 4 terabytes of data makes it much easier than in the past. Previously, that much storage required some sort of NAS, with redundancy, and permanence, and cloud connection. Nowadays, if I want to back up, and take it on the road, I can just get another of these SanDisk drives (or smaller/cheaper), make a copy, and move on with life. No giant server, constantly consuming electricity, redundancy, and all that.
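If the second drive is going to be the only redundancy, it is worth checking that the copy really matches before moving on with life. Below is a minimal sketch of one way to do that in Python, comparing SHA-256 hashes file by file. The helper names `sha256_of` and `find_mismatches` are hypothetical, invented for this example, and any real backup tool with verification would do the same job.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large movies never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(original: Path, copy: Path) -> list[Path]:
    """Return relative paths that are missing from copy or differ in content."""
    bad = []
    for path in sorted(original.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(original)
        twin = copy / rel
        if not twin.is_file() or sha256_of(path) != sha256_of(twin):
            bad.append(rel)
    return bad
```

An empty list back from `find_mismatches` means every file on the original drive has an identical twin on the copy. It does not check for extra files that exist only on the copy.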
And what to do with all this content?
More movies than I’d ever have time to prioritize watching, more photos than I’d ever care to look at. I did it so that I can have my own data sources for AI research! I mean, 80 years of family photos, from different regions, different photo quality, different ages of the same people. Surely that’s an interesting set to have. Yes, other family members can look at the stuff, but I think it will make good pickings for playing with programmatically. I can even see a data service where I say “just send us your photos if you’re doing nothing with them, and we’ll make you your own private AI of the same”.
De-treasuring is hard. The age of digitization is upon us. It’s easier than ever to preserve history, and at the same time, as we move away from physical forms and into digital, it’s easier to lose track of history altogether.
Here we are, and here we go!