Posts Tagged With: geekery

I wanted to make my Plex library more efficient and accidentally built a media processing platform

One of the more dangerous phrases in technology is:

“It’ll be a fairly simple project.”

The latest example began with a perfectly reasonable observation. I have a lot of media. This isn’t exactly breaking news to anyone who knows me, but it became particularly obvious once I found myself storing films measured in tens of gigabytes each whilst simultaneously insisting that storage space wasn’t really a concern.

To be fair, storage isn’t a concern. The NAS has plenty of room left and I have absolutely no hesitation filling it with unnecessarily large files if they happen to be shiny enough. Even so, I started wondering how much of what I’d accumulated over the years was actually efficient.

Some files were old encodes. Some contained audio tracks that would never be used. Some carried around subtitle streams, metadata and assorted baggage that served little practical purpose for me. Others were encoded using formats that made perfect sense when they were created but are no longer particularly efficient.

The original plan was straightforward enough: analyse the library, identify opportunities for improvement and make the collection a bit leaner without upsetting Plex, or perhaps even making Plex happier with less transcoding work to do.

That’s it.

A modest objective.

An afternoon’s work, perhaps.

Naturally it evolved into something considerably larger.

Part of the problem was that I wasn’t really looking for a transcoding tool. Those already exist, and some of them are excellent. Tdarr, FileFlows and Unmanic all tackle various aspects of media optimisation extremely well. If your goal is to point a tool at a library and have it start transcoding files, there are plenty of mature solutions available.

What I found myself wanting was something slightly different.

I didn’t want a system that immediately started changing media. I wanted a system that could analyse a library, explain what it thought should happen, categorise recommendations by risk and allow me to decide how aggressively I wanted to proceed.

After all, if I blindly pointed a tool at my library and trusted it too much, I could end up with non-English-language films like Pan’s Labyrinth or Apocalypto beautifully re-encoded but with no audio track at all.

More importantly, I wanted to understand the potential benefit before committing to days or weeks of processing time.

That distinction turned out to be important.

The other thing I quickly discovered was that once you’re dealing with thousands of files, relying on memory isn’t enough. I wasn’t interested in a tool that spat out recommendations and expected me to remember them. I wanted something that could track decisions, record outcomes, understand what had already been reviewed and provide a clear path from “this file could be improved” to “this file has been improved”.

Which is roughly the point at which a media optimiser starts suspiciously resembling a workflow platform.
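The tracking described above can be sketched as a small per-file state machine. The state names and allowed transitions here are hypothetical, chosen to mirror the path from “this file could be improved” to “this file has been improved”. They are not Media Auditor’s real schema, just an illustration of two ideas: refusing to skip steps, and keeping a history of every decision.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Hypothetical lifecycle states for one file's recommendation.
    RECOMMENDED = "recommended"
    APPROVED = "approved"
    REJECTED = "rejected"
    PROCESSED = "processed"
    VERIFIED = "verified"
    REPLACED = "replaced"


# Legal moves from each state; anything not listed is refused.
ALLOWED = {
    State.RECOMMENDED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.PROCESSED},
    State.PROCESSED: {State.VERIFIED},
    State.VERIFIED: {State.REPLACED},
    State.REJECTED: set(),
    State.REPLACED: set(),
}


@dataclass
class Tracked:
    path: str
    state: State = State.RECOMMENDED
    history: list[tuple[State, str]] = field(default_factory=list)

    def advance(self, new: State, note: str = "") -> None:
        """Move to `new`, recording a note, or raise if the move skips a step."""
        if new not in ALLOWED[self.state]:
            raise ValueError(f"{self.path}: cannot go from {self.state.value} to {new.value}")
        self.history.append((new, note))
        self.state = new
```

Because `ALLOWED` has no direct route from `RECOMMENDED` to `REPLACED`, nothing can reach the library without passing through approval, processing and verification first.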

The first surprise was discovering that media optimisation isn’t really one problem. It’s several completely different problems wearing a trench coat and pretending to be one problem.

Some recommendations are almost entirely safe. If a file contains redundant subtitle tracks, duplicate audio streams or other bits of media archaeology that nobody is ever likely to use, cleaning them up is relatively straightforward. Other recommendations require human judgement. Perhaps a file contains multiple audio tracks and the “correct” answer depends entirely on how you consume your media. Then there are conversions, where the potential savings are significantly larger but so are the risks.

Before long I realised I wasn’t building a tool that made recommendations.

I was building a workflow.

What eventually emerged was a three-stage process.

The first stage deals with safe recommendations. Things that can be cleaned up with minimal risk and relatively little processing overhead. The second stage covers recommendations that require a human being to look at them and decide whether they’re sensible. The third stage is conversion, where files are re-encoded into more efficient formats and where the genuinely substantial savings begin to appear.

This may sound suspiciously like enterprise workflow software.

I would like to stress that the original objective was merely to tidy up a Plex library.
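A minimal sketch of that three-stage triage, assuming each recommendation carries a kind and an estimated saving. The kind names, the `Recommendation` fields and the mapping to stages are all hypothetical; the ordering simply follows the rollout plan described in this post (Safe first, then Review, then Conversion).

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    # Lower values run first: quick, low-risk work before slow, risky work.
    SAFE = 1
    REVIEW = 2
    CONVERT = 3


@dataclass
class Recommendation:
    path: str
    kind: str             # hypothetical labels, e.g. "drop_unused_subtitles"
    est_saving_gb: float  # hypothetical estimate produced by the analysis step


# Hypothetical mapping from recommendation kind to stage.
SAFE_KINDS = {"drop_unused_subtitles", "drop_duplicate_audio", "strip_metadata"}
CONVERT_KINDS = {"reencode_video"}


def stage_for(rec: Recommendation) -> Stage:
    if rec.kind in SAFE_KINDS:
        return Stage.SAFE
    if rec.kind in CONVERT_KINDS:
        return Stage.CONVERT
    # Anything unrecognised goes to a human rather than being applied automatically.
    return Stage.REVIEW


def rollout_order(recs: list[Recommendation]) -> list[Recommendation]:
    """Safe items first, then Review, then Conversion; biggest savings first within a stage."""
    return sorted(recs, key=lambda r: (stage_for(r), -r.est_saving_gb))
```

Sending unknown kinds to Review rather than Safe is the conservative default: a human sees them before anything changes.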

The conversion side of things is where the project really started to become interesting. Modern video codecs are astonishingly effective when given the opportunity. One of the first large validation runs took a 17.5GB UHD source file and reduced it to around 8.5GB whilst preserving the streams and characteristics I actually cared about. Nearly 9GB saved from a single file is the sort of result that immediately gets your attention when you’re looking at a library measured in terabytes rather than gigabytes.

That file hasn’t even been through the Safe and Review paths yet, so there are likely to be further savings to be squeezed out of it.
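The arithmetic behind that first result, as a tiny helper. The inputs are the 17.5GB and 8.5GB figures from the run above; the helper itself is just illustration. Strictly speaking the output is slightly under half the original size, so this particular file was a little more than halved.

```python
def saving(before_gb: float, after_gb: float) -> tuple[float, float]:
    """Return (gigabytes saved, percentage of the original size saved)."""
    saved = before_gb - after_gb
    return saved, 100 * saved / before_gb


saved, pct = saving(17.5, 8.5)
print(f"{saved:.1f}GB saved ({pct:.1f}% of the original)")
# prints "9.0GB saved (51.4% of the original)"
```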

The dangerous thing about seeing a saving like that is that your brain immediately starts extrapolating.

Save 9GB here.

Save 12GB there.

Repeat often enough and suddenly you’re talking about hundreds of gigabytes, perhaps even terabytes, of reclaimed storage. That in turn makes life easier for Plex by reducing transcoding overhead and network traffic.

That’s roughly the point where a fun experiment starts looking suspiciously like a project.

The obvious response to discovering that you can almost halve the size of some files is to enthusiastically start converting everything.

The less obvious response, which is usually the correct one, is to spend several days proving that doing so won’t create an absolute disaster.

This has been one of the more challenging aspects of the project because I am not, by nature, a particularly patient person. Most software development rewards impatience. You write some code, refresh a browser and immediately discover whether you’ve made things better or significantly worse.

Media conversion does not work like that.

Some of the larger UHD files take twenty hours or more to process. One of the validation runs spent the better part of a day chewing through a single film before finally revealing whether my assumptions had been correct. When you’re accustomed to software projects where the feedback loop is measured in seconds, that’s a surprisingly uncomfortable amount of waiting.

If the answer turns out to be “no”, congratulations: you’ve just spent the best part of a day discovering a new and exciting way to waste the best part of a day.

Mercifully, the pre-work I put in meant all of the major conversion tests yielded a resounding yes.

The idea itself has existed for a few weeks. The last few days, however, have largely involved staring at progress bars and developing a newfound appreciation for delayed gratification.

The irony is that the conversion queue isn’t even the beginning of the actual optimisation work.

The eventual rollout plan deliberately starts elsewhere. Safe recommendations are processed first because they’re quick, low risk and immediately reduce clutter. After that come the review items, where human decisions determine what should happen next. Only then does the conversion queue get unleashed on the library.

In other words, the slowest and most resource-intensive part of the entire project happens last.

This feels entirely consistent with how the rest of the project has gone.

Along the way, Media Auditor, the tool at the centre of all this, has become increasingly conservative. Source media remains read-only. Generated files are written elsewhere. Conversions are verified. Outputs are audited. Human approval remains mandatory. A surprising amount of engineering effort is spent proving that a change is safe before the system is allowed to do anything particularly exciting.

This is partly because media libraries are valuable, but mostly because I have no desire to discover what happens when an overenthusiastic optimisation engine decides to improve several terabytes of content in an unexpected manner.
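One way to make “conversions are verified” concrete, and to guard against the silent Pan’s Labyrinth scenario, is to compare `ffprobe`’s JSON description of the source with that of the converted file. This is a sketch, assuming both files are probed with `ffprobe -v error -show_format -show_streams -of json`; the particular checks and the two-second duration tolerance are assumptions for illustration, not Media Auditor’s actual rules.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Run ffprobe and parse its JSON output (requires ffprobe on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def audio_languages(info: dict) -> list[str]:
    """Language tags of every audio stream, in order ("und" when untagged)."""
    return [
        s.get("tags", {}).get("language", "und")
        for s in info.get("streams", [])
        if s.get("codec_type") == "audio"
    ]


def verify_conversion(source: dict, output: dict, tolerance_s: float = 2.0) -> list[str]:
    """Return a list of problems; an empty list means the output passed these checks."""
    problems = []
    if not any(s.get("codec_type") == "video" for s in output.get("streams", [])):
        problems.append("no video stream")
    out_langs = audio_languages(output)
    if not out_langs:
        problems.append("no audio stream")
    # A conversion should not lose any language the source had.
    missing = [lang for lang in audio_languages(source) if lang not in out_langs]
    if missing:
        problems.append(f"audio languages lost: {missing}")
    # ffprobe reports format.duration as a string of seconds.
    drift = abs(float(source["format"]["duration"]) - float(output["format"]["duration"]))
    if drift > tolerance_s:
        problems.append(f"duration changed by {drift:.1f}s")
    return problems
```

The language check is aimed at conversions, which shouldn’t drop tracks; a Safe-stage clean-up that deliberately removes an unused track would need its own expectations.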

One of the more amusing discoveries has been that the most dangerous part of the entire workflow isn’t actually the conversion process itself.

It’s putting the converted file back.

Creating a replacement file is relatively straightforward. Replacing an existing file without upsetting Plex, preserving rollback options and ensuring that everything can be verified afterwards turns out to be considerably more complicated. As a result, a disproportionate amount of current effort is focused on replacement workflows, validation and approval processes rather than the conversion technology itself.

This probably says something profound about software engineering.

Or possibly just about me.
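The replacement step described above can be sketched as a replace-with-rollback routine, assuming the converted file already exists outside the library and has been approved. It keeps a verified copy of the original, stages the new file beside the target so the final `os.replace` is a single rename on one filesystem, and checks hashes along the way. The function names and backup layout are hypothetical; a real library would also need to think about how Plex notices the change, file permissions, backup name collisions and disk space for the backup copy.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large films don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replace_with_rollback(original: Path, converted: Path, backup_dir: Path) -> Path:
    """Swap `converted` into `original`'s path and return where the old file was kept."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / original.name
    # 1. Keep a verified copy of the original before touching anything.
    shutil.copy2(original, backup)
    if sha256(backup) != sha256(original):
        raise OSError(f"backup of {original} does not match")
    # 2. Stage the new file next to the original so the swap stays on one filesystem.
    staged = original.with_name(original.name + ".incoming")
    shutil.copy2(converted, staged)
    if sha256(staged) != sha256(converted):
        staged.unlink()
        raise OSError(f"staged copy of {converted} does not match")
    # 3. A single rename: the library path never points at a half-written file.
    os.replace(staged, original)
    return backup


def rollback(original: Path, backup: Path) -> None:
    """Put the backed-up original back using the same stage-then-rename approach."""
    staged = original.with_name(original.name + ".rollback")
    shutil.copy2(backup, staged)
    os.replace(staged, original)
```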

The project has also changed direction over time. Initially the objective was simply to optimise an existing library. Increasingly, though, that feels like the first phase rather than the end goal.

Once the existing library has been processed, the more interesting opportunity is preventing inefficient media from entering it in the first place.

The long-term vision is to move the same workflow further upstream. New downloads can be analysed before they ever reach the library. Safe improvements can be applied automatically. More complex recommendations can be surfaced for approval. By the time something appears in Plex, it has already passed through a process designed to ensure it’s as efficient and well-structured as possible.

In other words, stop cleaning up after myself and start avoiding the mess entirely.

Which feels suspiciously like a life lesson.

What I find most amusing is how quickly all of this happened. A few weeks ago I was looking for ways to make a media library slightly more efficient. Today I’m validating conversion batches that run for the better part of a day, designing approval workflows, thinking about resource management profiles and planning an optimisation campaign that may ultimately take weeks to complete.

The really annoying thing is that it’s worked.

The first validated conversions have demonstrated exactly the sort of savings I’d hoped for, the workflow is gradually proving itself and there’s a very real possibility that this ends up reclaiming a substantial amount of storage whilst simultaneously making the library easier for Plex to manage.

Which means I can no longer dismiss the whole thing as a ridiculous over-engineered distraction.

Somewhere along the way, “let’s make Plex a bit more efficient” quietly evolved into a media processing platform.

I should probably stop being surprised when this happens.

At this point it’s becoming a pattern.

Categories: Blog

I bought a NAS and accidentally built a tiny data centre

A couple of weeks ago I bought a NAS because I wanted somewhere sensible to store my Plex library (which was sat on a flaky USB hard drive connected to my always-on Mac mini) and to provide Time Machine backups for my two Macs.

That was the plan. Simple.

A nice, boring, responsible grown-up storage solution.

Fast forward a couple of weeks and I’ve accidentally-on-purpose built what can only be described as a budget enterprise media bunker with VPN mesh networking, internal DNS routing, automated torrent workflows, dashboard telemetry, HomeKit camera integrations and enough storage to archive a modest Principality.

As these things tend to go.

The heart of it all is a QNAP TS-464 stuffed with four 22TB Toshiba Enterprise drives in RAID 5. Which means:

  • It stores an absurd amount of data, and
  • One drive can die without me immediately entering a state of spiritual collapse.
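The rough arithmetic behind “an absurd amount of data”: RAID 5 spends one drive’s worth of space on parity, so usable capacity is (number of drives − 1) × drive size, and the array survives any single drive failure. This sketch ignores filesystem and snapshot overhead, and the gap between the decimal terabytes on a drive label and binary tebibytes, which is one reason a NAS typically reports a smaller figure.

```python
def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 5 array: one drive's worth of space goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drives - 1) * drive_tb


usable = raid5_usable_tb(4, 22)
print(f"{usable} TB usable, roughly {usable * 1e12 / 2**40:.1f} TiB")
# prints "66 TB usable, roughly 60.0 TiB"
```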

Originally, the goal was just:

  • Move Plex media off the randomly disconnecting USB drive
  • Centralise Time Machine backups
  • Stop relying on ‘vibes’ as a data resilience strategy

But once the NAS existed, it immediately became obvious that it could do far more than hold files. Memories of my first job, tinkering with the servers at an internet hosting company, came screaming out of the void of forgotten things and back to the forefront of my brain. Who knew you could have muscle memory for vi?

Plex moved off the Mac and became a proper always-on media server without having a computer running all the time. Then I added an HDHomeRun Flex Quatro, which basically turned the whole setup into a DIY Sky+/TiVo replacement. Live TV streams around the house now, Plex records broadcasts directly onto the NAS, and somewhere along the line I found myself learning far more about multicast networking than any sane person should. Particularly since I never really watch live TV, haha!

Of course, the second you start self-hosting things, IP addresses begin breeding in dark corners. Suddenly you’re trying to remember whether qBittorrent lives on :8080 or :8090 and whether Homarr was .71 or .73, and honestly, life is too short for that nonsense.

So naturally I ended up deploying AdGuard Home and Nginx Proxy Manager to create proper internal DNS routing.

Now everything has delightfully nerdy addresses like:

  • adguard.home.arpa
  • nas.home.arpa
  • router.home.arpa
  • etc.

Which makes the whole thing feel dramatically more professional than it probably is (and certainly more so than it needs to be!).
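Conceptually the two tools split the job: AdGuard Home’s DNS rewrites point each friendly name at the proxy’s IP address, and Nginx Proxy Manager looks at the hostname each request asks for and forwards it to the right internal IP and port. The sketch below models that second step as a lookup table; every hostname, IP address and port in it is made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical proxy-host table: friendly name -> (internal IP, port).
PROXY_HOSTS = {
    "qbittorrent.home.arpa": ("192.168.1.71", 8080),
    "homarr.home.arpa": ("192.168.1.73", 7575),
}


def upstream_for(url: str) -> str:
    """Return the internal address a reverse proxy would forward this request to."""
    host = urlsplit(url).hostname
    if host not in PROXY_HOSTS:
        raise LookupError(f"no proxy host configured for {host!r}")
    ip, port = PROXY_HOSTS[host]
    return f"http://{ip}:{port}"
```

The point is that the port numbers live in one table instead of in your head.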

Then came the dashboard phase.

Homarr dashboard on mobile

I discovered Homarr and immediately lost a few hours redesigning widgets that nobody except me will probably ever properly appreciate.

But now I’ve got a mobile-friendly dashboard that surfaces quick links to the assorted things installed on there, NAS stats, service health and other telemetry, so I can feel like I’m managing a tiny data centre from the sofa (which I suppose I am!).

It works beautifully as a web app on my iPhone.

And because I’m apparently incapable of leaving things alone, I also wanted all of this available remotely.

Securely, naturally.

Without opening horrifying holes in the router.

Enter Tailscale – which honestly feels like cheating. Suddenly my phone and laptop behave as though they’re still inside the home network even when I’m elsewhere. My entire Homarr dashboard, internal services and admin tools now work remotely as though the house itself has been quietly stuffed in my pocket.

The ‘tiny but brilliant’ things are probably my favourite parts though.

For example: I now have a non-HomeKit compatible doorbell camera appearing inside Apple Home because Scrypted is essentially digital witchcraft.

My torrent setup can accept magnet links emailed to a dummy address, automatically feed them into qBittorrent, then email me when the download is complete like some sort of shady digital butler.
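How the email side is wired up isn’t covered here, so this is only a sketch of one plausible approach: pull `magnet:` links out of a raw email with Python’s standard `email` module, then build a request for qBittorrent’s Web API (`POST /api/v2/torrents/add`, which accepts newline-separated links in a `urls` field). The regular expression and function names are assumptions. Fetching the mail, logging in to qBittorrent (`/api/v2/auth/login`, unless localhost authentication is bypassed) and sending the completion email are all left out.

```python
import re
import urllib.request
from email import message_from_string
from urllib.parse import urlencode

# Assumed pattern: a BitTorrent v1 magnet link runs until whitespace, a quote or an angle bracket.
MAGNET_RE = re.compile(r"magnet:\?xt=urn:btih:[A-Za-z0-9]+[^\s<>\"']*")


def extract_magnets(raw_email: str) -> list[str]:
    """Return unique magnet links from the plain-text parts of a raw email, in order."""
    message = message_from_string(raw_email)
    found: list[str] = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, with any base64/QP undone
        if not payload:
            continue
        text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
        for link in MAGNET_RE.findall(text):
            if link not in found:
                found.append(link)
    return found


def build_add_request(base_url: str, magnets: list[str]) -> urllib.request.Request:
    """Build (but do not send) a qBittorrent Web API request that adds the given links."""
    body = urlencode({"urls": "\n".join(magnets)}).encode()
    return urllib.request.Request(f"{base_url}/api/v2/torrents/add", data=body, method="POST")
```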

The Nvidia Shield TV Pro has also evolved into an absurdly polished media appliance. Projectivy Launcher cleared the interface of all the Google cruft, Surfshark routes only selected apps through the VPN, and an assortment of apps covers everything I watch, whether it’s streamed from outside or served internally by Plex. The whole thing feels smoother and cleaner than most commercial streaming boxes I’ve encountered.

Somewhere along the way I also accidentally reawakened the person who gets excited by:

  • Reverse proxies
  • SSL certificates
  • DNS propagation
  • Multicast traffic
  • Graceful UPS shutdown behaviour
  • Whether dashboards have the correct border radius

I regret nothing.

What I love most is that it no longer feels like a pile of separate gadgets.

Everything talks to everything else.

The NAS handles storage and services. Plex handles media. HDHomeRun handles TV. AdGuard handles DNS. Tailscale stitches the entire thing together remotely. Homarr surfaces everything cleanly. The Shield makes it pleasant to actually use day to day.

And underneath all the nerdy nonsense, the original goal still quietly works perfectly: the Macs back up automatically, the Plex library is centralised, and everything feels vastly more robust than it did before.

It’s just that, somewhere along the way, the “simple NAS storage project” accidentally evolved into a full-blown self-hosted ecosystem. Blame a geeky tendency and an awareness of what’s possible, which, thanks to the power of AI assistants, can now be turned into clear step-by-step instructions for achieving whatever you need!

I’m sure there’ll be more tinkering to come…

Categories: Blog
