Script to remove unwanted data streams from movies/tv shows

Status
Not open for further replies.

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Some of you have considerable collections of movies and/or TV shows. Often you accidentally include audio and/or subtitles that you might not want. Well, these tracks take up space and we do like to save space whenever possible, right? So I've created a script to solve this problem:

Enter mkvstrip.

It lets you specify the languages you speak (audio) and read (subtitles), and automatically removes the rest.

There are a few bugs to work out and a few features to add, so check the README file.

If you are an expert Python coder and would like to help with bugs, features, or just cleaning up the code in general, feel free.

I've released this to the world before reaching 1.0 because I've hit the limits of my Python knowledge (which is seriously lacking), and I figured the community could probably help faster than I could cobble together what is left to work on.

I've done some fairly extensive testing and I'm pretty sure that the bugs list is complete. But as always, YMMV.

The intent of this script is to let you set up a FreeNAS jail, install mkvtoolnix and Python 2.7, and then point this script at your movies and/or TV shows. It will systematically go through them and remove all the extra tracks that are taking up space on your server.

Ultimately I want to be able to run this script as a nightly cronjob so it automatically keeps your video collections lean and mean. Currently there's a bug or two that makes this impossible, but with the help of the community we can fix this.

All you need to do to use this in a FreeNAS jail is the following:

# pkg install python2_7 mkvtoolnix nano

Then download mkvstrip.py into your jail, either with git or by downloading the file manually. Change the variables in the script as necessary and let it run.

NOTE: If you have many TB of videos, this could take hours or days to run, and it could make your file shares almost unresponsive. This script will max out your pool's throughput until it completes, so use it with care!
 

fracai | Guru | Joined: Aug 22, 2012 | Messages: 1,212
Ah yes, I hate how I'm always accidentally including extra languages when I rip my media. I'm so silly.

I do have a few MKVs that I can test this with though and I'll try to get to that later today.

For now I've had a look through the code and can offer a few tips.

The README mentions wanting to process media in alphabetical order, and notes that it currently seems to be ordered by when the media was created. That may be accurate, but it may also just be a coincidence. os.walk returns files and folders in whatever order the filesystem returns them, and that order has no guarantee of any kind (file name, creation date, or anything else). It probably seems to follow creation date because that's closest to how the records are stored on disk, but that isn't guaranteed. If you want them ordered, you'll have to walk the whole tree and build a list of files to process, then sort the list, then perform the processing. It'll require more memory, but that's probably negligible.
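A minimal sketch of that walk-then-sort approach in Python. `collect_mkv_files` is a hypothetical helper name, not code from mkvstrip itself:

```python
import os


def collect_mkv_files(root):
    """Walk the whole tree first, then return the .mkv paths in sorted order.

    os.walk() yields entries in whatever order the filesystem hands them back,
    so sorting can only happen once the walk is complete.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".mkv"):
                found.append(os.path.join(dirpath, name))
    # Case-insensitive sort so "alpha.mkv" and "Zulu.mkv" interleave naturally.
    found.sort(key=lambda path: path.lower())
    return found
```

The processing loop would then iterate over `collect_mkv_files(DIR)` instead of over os.walk directly. Sorting on the lowercased path is one design choice; plain `sorted()` orders uppercase letters before lowercase ones.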

You have a flow control pattern inside a loop that looks like:
Code:
for ...:
    match = ...
    if match is not None:
        do something
    else:
        match = ...
        if match is not None:
            do something

This would be a bit cleaner written as:
Code:
for ...:
    match = ...
    if match is not None:
        do something
        continue
    match = ...
    if match is not None:
        do something
        continue

Basically, the nesting isn't as deep, but you get the same flow through the loop.
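To make the pattern concrete, here is a runnable version of the second form. The sample lines are modelled loosely on `mkvmerge --identify` output and simplified; the regexes and the `classify_tracks` name are illustrative assumptions, not mkvstrip's actual code.

```python
import re

# Illustrative patterns only; real mkvmerge output may carry more fields.
AUDIO_RE = re.compile(r"Track ID (\d+): audio")
SUBTITLE_RE = re.compile(r"Track ID (\d+): subtitles")


def classify_tracks(lines):
    """Split track IDs into audio and subtitle lists using early `continue`."""
    audio, subtitles = [], []
    for line in lines:
        match = AUDIO_RE.search(line)
        if match is not None:
            audio.append(int(match.group(1)))
            continue  # handled; skip the remaining checks
        match = SUBTITLE_RE.search(line)
        if match is not None:
            subtitles.append(int(match.group(1)))
            continue  # redundant here, but keeps each branch self-contained
        # Anything else (video, attachments, chapters) falls through untouched.
    return audio, subtitles


sample = [
    "Track ID 0: video (V_MPEG4/ISO/AVC)",
    "Track ID 1: audio (A_AC3)",
    "Track ID 2: subtitles (S_TEXT/UTF8)",
]
print(classify_tracks(sample))  # ([1], [2])
```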

The code looks like it skips processing a file if it doesn't contain the desired audio. Instead, I'd suggest processing the file anyway, just without removing any of its audio tracks. That seems to be how subtitles are already handled.
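A sketch of that suggestion, assuming the audio tracks have already been parsed into hypothetical `(track_id, language)` tuples: keep the tracks that match, and if none do, keep all of the audio so the file can still be processed for subtitles instead of being skipped.

```python
def audio_tracks_to_keep(tracks, wanted_langs):
    """Return the IDs of the audio tracks to keep.

    tracks is a list of (track_id, language) tuples for the audio tracks only.
    When no track matches wanted_langs, every audio track is kept, so the file
    can still be processed (e.g. for subtitles) instead of being skipped.
    """
    keep = [track_id for track_id, lang in tracks if lang in wanted_langs]
    if not keep:
        # Nothing matched: leave all audio in place rather than skip the file.
        keep = [track_id for track_id, _lang in tracks]
    return keep
```

For example, `audio_tracks_to_keep([(1, "ger"), (2, "fre")], ["eng"])` returns `[1, 2]`, while `audio_tracks_to_keep([(1, "eng"), (2, "fre")], ["eng"])` returns `[1]`.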

Instead of the rename movie and tv options (these seem like they'd be very much tied to the organization that you want your media to have) I'd suggest a flag that replicates the directory structure of processed files in another location. In other words, if you process: "unprocessed-media/movie/P/Primer.mkv", the output would go to: "media/movie/P/Primer.mkv".
I have a script that uses HandBrake to re-encode a directory structure to an iOS-friendly format. In my case I want the output to mirror the input: if a file is already in the output it isn't re-encoded, and if a file is in the output but not in the input, the output copy is deleted. Part of the process requires building up the directory hierarchy on the output side. I don't have this script online anywhere, but it's written in Python and some pieces could probably help with what you're doing. I'll post it tonight.
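A rough sketch of the mirroring idea using only the standard library. `mirrored_output_path` and `stale_outputs` are hypothetical names (this is not the HandBrake script mentioned above), and the sketch assumes output files keep the same relative name as their inputs.

```python
import os


def mirrored_output_path(src_path, input_root, output_root):
    """Map input_root/<rel> to output_root/<rel> and create its parent dirs."""
    rel = os.path.relpath(src_path, input_root)
    dest = os.path.join(output_root, rel)
    parent = os.path.dirname(dest)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    return dest


def stale_outputs(input_root, output_root):
    """Return output files that no longer have a counterpart in the input tree."""
    stale = []
    for dirpath, _dirnames, filenames in os.walk(output_root):
        for name in filenames:
            out_path = os.path.join(dirpath, name)
            rel = os.path.relpath(out_path, output_root)
            if not os.path.exists(os.path.join(input_root, rel)):
                stale.append(out_path)
    return sorted(stale)
```

With the example above, `mirrored_output_path("unprocessed-media/movie/P/Primer.mkv", "unprocessed-media", "media")` returns `media/movie/P/Primer.mkv` on POSIX and creates `media/movie/P` if it is missing. A caller could skip any file whose destination already exists and delete whatever `stale_outputs` reports.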

One final note: the regexes you're using could be too restrictive. In the MKVs I've worked with (specifically, the kind where a user is more likely to have "accidentally left in extra tracks"), the fields included in the identify output aren't always consistent. Then again, I haven't needed to work with this much, so I'll assume the format returned here is more consistent than in my experience.
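One way to make the parsing less brittle is to collect the bracketed `key:value` properties into a dict, so their order and presence don't matter. This assumes the `mkvmerge --identify-verbose` line shape shown in the comment; `parse_track_line` is a hypothetical helper.

```python
import re

# Assumed line shape, modelled on `mkvmerge --identify-verbose` output:
#   Track ID 1: audio (A_AC3) [number:2 language:eng default_track:1]
TRACK_RE = re.compile(r"Track ID (\d+): (\w+)")
PROPS_RE = re.compile(r"\[(.*)\]")


def parse_track_line(line):
    """Return (track_id, track_type, props), or None for non-track lines.

    Properties go into a dict, so a field that is missing or appears in a
    different position does not break the match.
    """
    match = TRACK_RE.search(line)
    if match is None:
        return None
    props = {}
    bracket = PROPS_RE.search(line)
    if bracket is not None:
        for item in bracket.group(1).split():
            key, sep, value = item.partition(":")
            if sep:  # ignore tokens without a colon
                props[key] = value
    return int(match.group(1)), match.group(2), props
```

A caller could then treat a missing language as undetermined with `props.get("language", "und")`, which fits the advice about `und` given in this thread.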

Anyway, thanks for posting the project. I'll be sure to look at it more and contribute what I can to the code.
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
I did test this with 1TB of random movies I had and they all appeared to be "okay". So aside from the cosmetic stuff I'm fairly sure it's safe to use without blowing up your movie collection.
 

willforde | Cadet | Joined: Aug 11, 2014 | Messages: 2
Great idea, cyberjock.
I had a look at the code and really wanted to run it on my own collection, so I fixed the bug with RENAME_TV/RENAME_MOVIE by using mkvinfo to check the current title and mkvpropedit to edit the metadata without remuxing.
I also added sorting of the MKV file list in alphabetical order, plus a counter.
You can also pass in a file instead of a directory and it will process only that file. I probably should have added a separate argument for the file, but for now you can use -d for a file as well.
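For reference, mkvpropedit can change the title in place with `--edit info --set title=...`, which is what makes editing without remuxing possible. A minimal sketch of building and running that command from Python; `set_title_command` and `set_title` are hypothetical helpers (not willforde's code), and the binary path is the MKVPROPEDIT_BIN value shown in the configuration logs in this thread.

```python
import subprocess

# Path as printed in the configuration logs in this thread; adjust if needed.
MKVPROPEDIT_BIN = "/usr/local/bin/mkvpropedit"


def set_title_command(path, title):
    """Build an mkvpropedit command that sets the segment title in place.

    "--edit info" selects the segment information section, and
    "--set title=..." changes the title there without remuxing.
    """
    return [MKVPROPEDIT_BIN, path, "--edit", "info", "--set", "title=" + title]


def set_title(path, title, dry_run=True):
    """Run the command, or only print it when dry_run is True."""
    cmd = set_title_command(path, title)
    if dry_run:
        print(" ".join(cmd))
        return
    subprocess.check_call(cmd)
```

Passing the command as a list (rather than one shell string) avoids quoting problems with titles that contain spaces or apostrophes.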

I have uploaded the code to pastebin.
http://pastebin.com/hSHvtBjg
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Thanks willforde! I'll get that merged in this weekend. :D
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Merged in the changes. 0.9.1 is out!
 

bobbob1016 | Explorer | Joined: Mar 26, 2014 | Messages: 51
I ran:
Code:
python2.7 mkvstrip.py -a eng -y -d /path 

and I got:
Code:
Traceback (most recent call last):
  File "/media/mkvstrip-master/mkvstrip.py", line 278, in <module>
    result = subprocess.check_output(cmd)
  File "/usr/local/lib/python2.7/subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/local/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/local/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
[ 2014-09-19 08:15:27 AM ]  --
[ 2014-09-19 08:15:27 AM ]  Finished processing.


Did I install the wrong version, or is this a bug?

Edit: I forgot the output above the error; it's added below.
Code:
[ 2014-09-19 08:15:27 AM ]  Log file opened at /media/mkvstrip-master/log_20140919-081527.log
[ 2014-09-19 08:15:27 AM ]  --
[ 2014-09-19 08:15:27 AM ]  Running mkvstrip.py with configuration:
[ 2014-09-19 08:15:27 AM ]  MKVMERGE_BIN = /usr/local/bin/mkvmerge
[ 2014-09-19 08:15:27 AM ]  MKVINFO_BIN = /usr/local/bin/mkvinfo
[ 2014-09-19 08:15:27 AM ]  MKVPROPEDIT_BIN = /usr/local/bin/mkvpropedit
[ 2014-09-19 08:15:27 AM ]  DIR = /media/TV
[ 2014-09-19 08:15:27 AM ]  DRY_RUN = True
[ 2014-09-19 08:15:27 AM ]  PRESERVE_TIMESTAMP = True
[ 2014-09-19 08:15:27 AM ]  AUDIO_LANG = ['eng']
[ 2014-09-19 08:15:27 AM ]  SUBTITLE_LANG = ['eng', 'und']
[ 2014-09-19 08:15:27 AM ]  LOG_MISSING_SUBTITLE = True
[ 2014-09-19 08:15:27 AM ]  RENAME_TV = False
[ 2014-09-19 08:15:27 AM ]  RENAME_MOVIE = False
[ 2014-09-19 08:15:27 AM ]  Starting processing of 1897 Videos
[ 2014-09-19 08:15:27 AM ]  ==========================================

The above error happens with those videos, as well as another directory of 400 or so.
 

willforde | Cadet | Joined: Aug 11, 2014 | Messages: 2
It looks to me like it can't find mkvmerge. Maybe check where mkvtoolnix is installed on your system.
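A small pre-flight check along those lines, assuming the tool paths from the configuration log above. `missing_binaries` is a hypothetical helper, not part of mkvstrip.

```python
import os

# Tool paths as printed in the configuration log above; adjust to your install.
BINARIES = {
    "MKVMERGE_BIN": "/usr/local/bin/mkvmerge",
    "MKVINFO_BIN": "/usr/local/bin/mkvinfo",
    "MKVPROPEDIT_BIN": "/usr/local/bin/mkvpropedit",
}


def missing_binaries(binaries):
    """Return the config names whose path is not an executable file.

    subprocess raises OSError (errno 2, "No such file or directory") when the
    program it is asked to start does not exist, as in the traceback above.
    """
    return [name for name, path in sorted(binaries.items())
            if not (os.path.isfile(path) and os.access(path, os.X_OK))]
```

Calling this at startup and exiting with a message such as `Not found: MKVMERGE_BIN` would give a clearer hint than the bare `OSError: [Errno 2]` traceback.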
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
willforde said:
It looks to me like it can't find mkvmerge. Maybe check where mkvtoolnix is installed on your system.

And maybe open the .py script, read it, and set the variables as necessary.
 

bobbob1016 | Explorer | Joined: Mar 26, 2014 | Messages: 51
Yeah, I realized that it didn't install. When I ran the install line given, it said it couldn't find python, so I had to search for the right package. I had assumed it still installed mkvmerge even though it failed on python. I ran the install again, and everything is working now. One quick question though, am I right to assume that I can do "-a eng" and it'll keep everything "eng" and "english"?
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
bobbob1016 said:
One quick question though, am I right to assume that I can do "-a eng" and it'll keep everything "eng" and "english"?

Yep. That's what it's there for. ;)

Just keep in mind that if an audio track isn't labeled as English, it will be removed. For *many* people this may result in audio-less video, because some tracks are labeled as "undetermined" (und). So I strongly recommend including und. The same is true for subtitles.

If you read the beginning of my document, I explain the need for undetermined there. The script should skip a video file if it would end up audio-less or subtitle-less based on what you provide. But if you have a video with 2 audio tracks, one English (commentary) and one undetermined (the actual movie audio), you will end up with an unwatchable video.
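The commentary scenario can be shown in a few lines. This is a plain language filter with no fallback (a hypothetical `kept_audio` helper, not mkvstrip's code): with only `eng` selected, the commentary survives and the real soundtrack tagged `und` is dropped, so the result is not audio-less and the file would still be processed.

```python
def kept_audio(tracks, audio_lang):
    """Names of the audio tracks whose language is in audio_lang (no fallback)."""
    return [name for name, lang in tracks if lang in audio_lang]


# The scenario described above: the real soundtrack is tagged "und".
tracks = [("commentary", "eng"), ("main audio", "und")]
print(kept_audio(tracks, ["eng"]))         # ['commentary']
print(kept_audio(tracks, ["eng", "und"]))  # ['commentary', 'main audio']
```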

So caution and good pre-planning is required when using this script. And as always, have backups!
 

bobbob1016 | Explorer | Joined: Mar 26, 2014 | Messages: 51
cyberjock said:
Yep. That's what it's there for. ;)

Thanks. I thought "eng" and "english" would show up as different tracks, but I've checked a few with mkvtool and they seem to be listed as eng as well. I was mainly concerned because I bought some Rifftrax along with their respective movies, and wasn't sure if it would remove the Rifftrax tracks from the file as well.
 

Oldtimer | Dabbler | Joined: Feb 19, 2015 | Messages: 10
Hello, I would like to try mkvstrip, but I do not know how to make mkvstrip act upon my DIR setting.
I have tried various combinations.

From my media jail I am trying and failing.

Here are my steps thus far.
Step 1
root@sabnzbd_1:/ # pkg install python2_7 mkvtoolnix nano
Updating FreeBSD repository catalogue...
FreeBSD repository is up-to-date.
All repositories are up-to-date.
pkg: No packages available to install matching 'python2_7' have been found in the repositories

Step 2
root@sabnzbd_1:/ # git clone git://github.com/cyberjock/mkvstrip.git /usr/local/share/mkvstrip/
fatal: destination path '/usr/local/share/mkvstrip' already exists and is not an empty directory.

Step 3
Command
python2.7 mkvstrip.py -a eng -y -d /usr/local/share/mkvstrip

Result
root@sabnzbd_1:/usr/local/share/mkvstrip # python2.7 mkvstrip.py -a eng -y -d /usr/local/share/mkvstrip
[ 2016-05-20 03:18:52 AM ] Log file opened at /usr/local/share/mkvstrip/log_20160520-031852.log
[ 2016-05-20 03:18:52 AM ] --
[ 2016-05-20 03:18:52 AM ] Running mkvstrip.py with configuration:
[ 2016-05-20 03:18:52 AM ] MKVMERGE_BIN = /usr/local/bin/mkvmerge
[ 2016-05-20 03:18:52 AM ] MKVINFO_BIN = /usr/local/bin/mkvinfo
[ 2016-05-20 03:18:52 AM ] MKVPROPEDIT_BIN = /usr/local/bin/mkvpropedit
[ 2016-05-20 03:18:52 AM ] DIR = /usr/local/share/mkvstrip
[ 2016-05-20 03:18:52 AM ] DRY_RUN = True
[ 2016-05-20 03:18:52 AM ] PRESERVE_TIMESTAMP = True
[ 2016-05-20 03:18:52 AM ] AUDIO_LANG = ['eng']
[ 2016-05-20 03:18:52 AM ] SUBTITLE_LANG = ['eng', 'und']
[ 2016-05-20 03:18:52 AM ] LOG_MISSING_SUBTITLE = True
[ 2016-05-20 03:18:52 AM ] RENAME_TV = False
[ 2016-05-20 03:18:52 AM ] RENAME_MOVIE = False
[ 2016-05-20 03:18:52 AM ] Starting processing of 0 Videos
[ 2016-05-20 03:18:52 AM ] --
[ 2016-05-20 03:18:52 AM ] Finished processing.



Data from mkvstrip.py
# Directory to process.
# Note that the location always uses the / versus the \ for location despite what the OS uses (*cough* Windows).
# Windows is usually something like C:/Movies
# FreeNAS jails (and FreeBSD) should be something like /mnt/tank/Movies or similar.
DIR = '/mnt/media/downloads/complete/movie' <----- My Entry
# DIR = '/mnt/tank/Entertainment/Movies'
#!/usr/bin/env python

Step 4
from the last log file called log_20160520-023631.log

[ 2016-05-20 02:36:31 AM ] Log file opened at /usr/local/share/mkvstrip/log_20160520-023631.log
[ 2016-05-20 02:36:31 AM ] --
[ 2016-05-20 02:36:31 AM ] Running mkvstrip.py with configuration:
[ 2016-05-20 02:36:31 AM ] MKVMERGE_BIN = /usr/local/bin/mkvmerge
[ 2016-05-20 02:36:31 AM ] MKVINFO_BIN = /usr/local/bin/mkvinfo
[ 2016-05-20 02:36:31 AM ] MKVPROPEDIT_BIN = /usr/local/bin/mkvpropedit
[ 2016-05-20 02:36:31 AM ] DIR = /usr/local/share/mkvstrip <------ not equal to '/mnt/media/downloads/complete/movie'
[ 2016-05-20 02:36:31 AM ] DRY_RUN = True
[ 2016-05-20 02:36:31 AM ] PRESERVE_TIMESTAMP = True
[ 2016-05-20 02:36:31 AM ] AUDIO_LANG = ['eng']
[ 2016-05-20 02:36:31 AM ] SUBTITLE_LANG = ['eng', 'und']
[ 2016-05-20 02:36:31 AM ] LOG_MISSING_SUBTITLE = True
[ 2016-05-20 02:36:31 AM ] RENAME_TV = False
[ 2016-05-20 02:36:31 AM ] RENAME_MOVIE = False
[ 2016-05-20 02:36:31 AM ] Starting processing of 0 Videos
[ 2016-05-20 02:36:31 AM ] --
[ 2016-05-20 02:36:31 AM ] Finished processing.

There are 50 folders in that directory.
Can anyone help me?

FN9.10 | Asus AMD F1-a55 LE | AMD A4-3300 APU with Radeon(tm) HD Graphics | 16gb ECC memory | 6 x 3TB raid 6
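Comparing the Step 3 command with its log suggests that the `-d` argument takes precedence over the DIR set inside the script: the log reports `DIR = /usr/local/share/mkvstrip`, exactly the value passed with `-d`, and that directory is the mkvstrip checkout, which presumably holds no videos (hence "0 Videos"). A minimal sketch of that common pattern, assuming argparse-style parsing; mkvstrip's real argument handling isn't shown in this thread, and `parse_args` here is hypothetical.

```python
import argparse

# The DIR value Oldtimer set inside the script.
DIR = "/mnt/media/downloads/complete/movie"


def parse_args(argv=None):
    """Hypothetical mkvstrip-style parser: -d, when given, replaces DIR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", dest="dir", default=DIR,
                        help="file or directory to process")
    return parser.parse_args(argv)


print(parse_args([]).dir)                                   # /mnt/media/downloads/complete/movie
print(parse_args(["-d", "/usr/local/share/mkvstrip"]).dir)  # /usr/local/share/mkvstrip
```

Under that pattern, either point `-d` at the video directory (`-d /mnt/media/downloads/complete/movie`) or leave `-d` off so the in-script DIR is used.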
 