
I wanted to make my Plex library more efficient and accidentally built a media processing platform

One of the more dangerous phrases in technology is:

“It’ll be a fairly simple project.”

The latest example began with a perfectly reasonable observation. I have a lot of media. This isn’t exactly breaking news to anyone who knows me, but it became particularly obvious once I found myself storing films measured in tens of gigabytes each whilst simultaneously insisting that storage space wasn’t really a concern.

To be fair, storage isn’t a concern. The NAS has plenty of room left and I have absolutely no hesitation filling it with unnecessarily large files if they happen to be shiny enough. Even so, I started wondering how much of what I’d accumulated over the years was actually efficient.

Some files were old encodes. Some contained audio tracks that would never be used. Some carried around subtitle streams, metadata and assorted baggage that served little practical purpose for me. Others were encoded using formats that made perfect sense when they were created but are no longer particularly efficient.

The original plan was straightforward enough: analyse the library, identify opportunities for improvement and make the collection a bit leaner without upsetting Plex, and perhaps even make Plex happier by giving it less transcoding work to do.

That’s it.

A modest objective.

An afternoon’s work, perhaps.

Naturally it evolved into something considerably larger.

Part of the problem was that I wasn’t really looking for a transcoding tool. Those already exist, and some of them are excellent. Tdarr, FileFlows and Unmanic all tackle various aspects of media optimisation extremely well. If your goal is to point a tool at a library and have it start transcoding files, there are plenty of mature solutions available.

What I found myself wanting was something slightly different.

I didn’t want a system that immediately started changing media. I wanted a system that could analyse a library, explain what it thought should happen, categorise recommendations by risk and allow me to decide how aggressively I wanted to proceed.

After all, if I blindly pointed a tool at my library and trusted it too much, I could end up with non-English-language films like Pan’s Labyrinth or Apocalypto beautifully optimised and left with no audio track at all.

More importantly, I wanted to understand the potential benefit before committing to days or weeks of processing time.

That distinction turned out to be important.

The other thing I quickly discovered was that once you’re dealing with thousands of files, my own memory isn’t enough. I wasn’t interested in a tool that spat out recommendations and expected me to remember them. I wanted something that could track decisions, record outcomes, understand what had already been reviewed and provide a clear path from “this file could be improved” to “this file has been improved”.

Which is roughly the point at which a media optimiser starts suspiciously resembling a workflow platform.

The first surprise was discovering that media optimisation isn’t really one problem. It’s several completely different problems wearing a trench coat and pretending to be one problem.

Some recommendations are almost entirely safe. If a file contains redundant subtitle tracks, duplicate audio streams or other bits of media archaeology that nobody is ever likely to use, cleaning them up is relatively straightforward. Other recommendations require human judgement. Perhaps a file contains multiple audio tracks and the “correct” answer depends entirely on how you consume your media. Then there are conversions, where the potential savings are significantly larger but so are the risks.

Before long I realised I wasn’t building a tool that made recommendations.

I was building a workflow.

What eventually emerged was a three-stage process.

The first stage deals with safe recommendations. Things that can be cleaned up with minimal risk and relatively little processing overhead. The second stage covers recommendations that require a human being to look at them and decide whether they’re sensible. The third stage is conversion, where files are re-encoded into more efficient formats and where the genuinely substantial savings begin to appear.
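
For the curious, the triage can be sketched in a few lines of Python. This is a minimal illustration rather than the real code: the `Recommendation` fields, the action labels and the `assign_stage` rule are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    SAFE = "safe"        # low-risk clean-up, no re-encode
    REVIEW = "review"    # needs a human decision
    CONVERT = "convert"  # full re-encode: biggest savings, biggest risk


@dataclass
class Recommendation:
    path: str
    action: str           # hypothetical label, e.g. "drop_duplicate_audio"
    needs_reencode: bool  # True when the video itself would be re-encoded
    ambiguous: bool       # True when the right answer depends on the viewer


def assign_stage(rec: Recommendation) -> Stage:
    """Place a recommendation in the most cautious stage that applies."""
    if rec.needs_reencode:
        return Stage.CONVERT
    if rec.ambiguous:
        return Stage.REVIEW
    return Stage.SAFE
```

The only design choice worth noting is that anything involving a re-encode lands in the conversion stage regardless of how harmless it otherwise looks.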

This may sound suspiciously like enterprise workflow software.

I would like to stress that the original objective was merely to tidy up a Plex library.

The conversion side of things is where the project really started to become interesting. Modern video codecs are astonishingly effective when given the opportunity. One of the first large validation runs took a 17.5GB UHD source file and reduced it to around 8.5GB whilst preserving the streams and characteristics I actually cared about. Roughly 9GB saved from a single file is the sort of result that immediately gets your attention when you’re looking at a library measured in terabytes rather than gigabytes.
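
The arithmetic behind that is trivial, but worth writing down. The sketch below uses the rounded sizes quoted above, so the result is approximate; the `saving` helper is just a name chosen for the example.

```python
def saving(original_gb: float, converted_gb: float) -> tuple[float, float]:
    """Return (gigabytes saved, percentage of the original saved)."""
    saved = original_gb - converted_gb
    return saved, 100 * saved / original_gb


saved_gb, saved_pct = saving(17.5, 8.5)
print(f"{saved_gb:.1f} GB saved ({saved_pct:.0f}%)")  # 9.0 GB saved (51%)
```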

That file hasn’t even been through the Safe and Review paths yet, so there are likely to be further savings to be squeezed out of it.

The dangerous thing about seeing a saving like that is that your brain immediately starts extrapolating.

Save 9GB here.

Save 12GB there.

Repeat often enough and suddenly you’re talking about hundreds of gigabytes, perhaps even terabytes, of reclaimed storage. That in turn makes life easier for Plex by reducing transcoding overhead and network traffic.

That’s roughly the point where a fun experiment starts looking suspiciously like a project.

The obvious response to discovering that you can almost halve the size of some files is to enthusiastically start converting everything.

The less obvious response, which is usually the correct one, is to spend several days proving that doing so won’t create an absolute disaster.

This has been one of the more challenging aspects of the project because I am not, by nature, a particularly patient person. Most software development rewards impatience. You write some code, refresh a browser and immediately discover whether you’ve made things better or significantly worse.

Media conversion does not work like that.

Some of the larger UHD files take twenty hours or more to process. One of the validation runs spent the better part of a day chewing through a single film before finally revealing whether my assumptions had been correct. When you’re accustomed to software projects where the feedback loop is measured in seconds, that’s a surprisingly uncomfortable amount of waiting.

If the answer turns out to be “no”, congratulations: you’ve just spent the best part of a day discovering a new and exciting way to waste the best part of a day.

Mercifully, the pre-work I put in meant all of the major conversion tests yielded a resounding yes.

The idea itself has existed for a few weeks. The last few days, however, have largely involved staring at progress bars and developing a newfound appreciation for delayed gratification.

The irony is that the conversion queue isn’t even the beginning of the actual optimisation work.

The eventual rollout plan deliberately starts elsewhere. Safe recommendations are processed first because they’re quick, low risk and immediately reduce clutter. After that come the review items, where human decisions determine what should happen next. Only then does the conversion queue get unleashed on the library.

In other words, the slowest and most resource-intensive part of the entire project happens last.

This feels entirely consistent with how the rest of the project has gone.

All of this caution means that Media Auditor, the tool at the centre of all this, has become increasingly conservative. Source media remains read-only. Generated files are written elsewhere. Conversions are verified. Outputs are audited. Human approval remains mandatory. A surprising amount of engineering effort is spent proving that a change is safe before the system is allowed to do anything particularly exciting.
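
To give a flavour of what auditing an output might involve, here is a minimal sketch of one such check: making sure a converted file hasn’t quietly lost its audio. It assumes each stream is described by a small dictionary with a `codec_type` and an optional `language` key, loosely modelled on the kind of information a probing tool reports. The function names and the data shape are assumptions for illustration, not the actual implementation.

```python
def audio_languages(streams):
    """Collect the languages of all audio streams ('und' when untagged)."""
    return {s.get("language", "und") for s in streams if s["codec_type"] == "audio"}


def audit_output(source_streams, output_streams, keep_languages):
    """Return a list of problems; an empty list means the output passed."""
    problems = []

    src_video = sum(1 for s in source_streams if s["codec_type"] == "video")
    out_video = sum(1 for s in output_streams if s["codec_type"] == "video")
    if out_video != src_video:
        problems.append("video stream count changed")

    src_audio = audio_languages(source_streams)
    out_audio = audio_languages(output_streams)
    if src_audio and not out_audio:
        problems.append("output has no audio track")

    # Any language we asked to keep must survive if the source had it.
    for lang in sorted(keep_languages):
        if lang in src_audio and lang not in out_audio:
            problems.append(f"audio language {lang!r} was dropped")

    return problems
```

A Spanish-language film whose only audio track vanished during processing would fail both of the audio checks, which is exactly the sort of thing a human should hear about before anything gets replaced.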

This is partly because media libraries are valuable, but mostly because I have no desire to discover what happens when an overenthusiastic optimisation engine decides to improve several terabytes of content in an unexpected manner.

One of the more amusing discoveries has been that the most dangerous part of the entire workflow isn’t actually the conversion process itself.

It’s putting the converted file back.

Creating a replacement file is relatively straightforward. Replacing an existing file without upsetting Plex, preserving rollback options and ensuring that everything can be verified afterwards turns out to be considerably more complicated. As a result, a disproportionate amount of current effort is focused on replacement workflows, validation and approval processes rather than the conversion technology itself.
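
One way to picture the replacement step is as a swap with a safety net. The sketch below is an illustration under several assumptions: the converted file keeps the original’s name, a hypothetical `verify` callback decides whether the result is acceptable, and `backup_dir` is a location of your choosing. It is not how Media Auditor actually does it.

```python
import shutil
from pathlib import Path


def replace_with_rollback(original: Path, candidate: Path,
                          backup_dir: Path, verify) -> bool:
    """Swap `candidate` into `original`'s place, restoring the original on failure.

    `verify` is a caller-supplied function taking the replaced path and
    returning True when the new file is acceptable.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / original.name

    # Step 1: move the original aside so it can always be restored.
    shutil.move(str(original), str(backup))
    try:
        # Step 2: copy the converted file into the original's location.
        shutil.copy2(str(candidate), str(original))
        # Step 3: only keep the change if verification passes.
        if not verify(original):
            raise ValueError("verification failed")
    except Exception:
        # Roll back: discard whatever was written and restore the original.
        original.unlink(missing_ok=True)
        shutil.move(str(backup), str(original))
        return False
    return True
```

Keeping the backup after a successful swap is deliberate: it means rollback remains possible until someone explicitly decides the old file can go.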

This probably says something profound about software engineering.

Or possibly just about me.

The project has also changed direction over time. Initially the objective was simply to optimise an existing library. Increasingly, though, that feels like the first phase rather than the end goal.

Once the existing library has been processed, the more interesting opportunity is preventing inefficient media from entering it in the first place.

The long-term vision is to move the same workflow further upstream. New downloads can be analysed before they ever reach the library. Safe improvements can be applied automatically. More complex recommendations can be surfaced for approval. By the time something appears in Plex, it has already passed through a process designed to ensure it’s as efficient and well-structured as possible.

In other words, stop cleaning up after myself and start avoiding the mess entirely.

Which feels suspiciously like a life lesson.

What I find most amusing is how quickly all of this happened. A few weeks ago I was looking for ways to make a media library slightly more efficient. Today I’m validating conversion batches that run for the better part of a day, designing approval workflows, thinking about resource management profiles and planning an optimisation campaign that may ultimately take weeks to complete.

The really annoying thing is that it’s worked.

The first validated conversions have demonstrated exactly the sort of savings I’d hoped for, the workflow is gradually proving itself and there’s a very real possibility that this ends up reclaiming a substantial amount of storage whilst simultaneously making the library easier for Plex to manage.

Which means I can no longer dismiss the whole thing as a ridiculous over-engineered distraction.

Somewhere along the way, “let’s make Plex a bit more efficient” quietly evolved into a media processing platform.

I should probably stop being surprised when this happens.

At this point it’s becoming a pattern.

