OS X Terminal for Data Movers pt. 1
September 2, 2016
by Bennett Cain and Edward Richardson
Like it or not, moving around piles of data is just part of the job for those working with modern digital camera systems and the files they generate. In the digital film & television production trenches, terabytes of the stuff can easily be amassed in a single day. To stay employed, Digital Imaging Technicians, Assistant Editors, Facility Ingest Operators and others on the front lines must be able to keep up quickly and efficiently with the production's precious mountains of zeros and ones. There are many tools in this arsenal, but an under-utilized one is the command line, readily available in every Mac's Terminal app.
If you don't have much programming or scripting experience, writing Bash commands in the Terminal can be a little intimidating, and for good reason: a bad entry can be devastating. Data deleted here doesn't go into the Trash; it's gone forever, as if it never existed. Nothing to fret over, but definitely something to be aware of!
The good news is there are many excellent tutorials out there on this topic. A cursory Google search will often yield a knowledge base for whatever you're trying to accomplish in the terminal. Once you’ve learned the basics, generic commands found online can easily be tweaked for one's own needs.
For simplicity's sake, this article sticks to Bash as it ships on OS X and doesn't cover Bash on Linux or other Unix systems. While there is a lot of crossover, for our purposes it's a somewhat different beast and not worth exploring here.
For starters, if you're not sure where the Terminal app lives, it's in your Mac's Applications > Utilities folder. Open it up, find a few files you can safely experiment with, pour yourself a strong cup of coffee and let's get started. We recommend you start here and work through to the end in order, as each section of the tutorial builds on the previous one.
1. Anatomy of a Bash Command
The first thing to be aware of with any programming language is syntax, which, like any linguistic grammar, has its own rules. The Terminal likes what it likes and is totally unforgiving: any syntax error means the command won't do what you asked, and it's important to note that commands and flags are case-sensitive (-l and -L, for example, are different flags)!
Commands are made up of three distinct parts. Let's take a look at a very basic command that will list the contents of my Documents folder:
ls -l ~/Documents
ls is the Utility, sometimes referred to simply as the command. There are many different Utilities and they cast the broadest net in terms of what you’re asking the system to do. You will refine the Utility by using Flags and Arguments.
-l is the Flag, which modifies the Utility by making it more specific. There are many different Flags; they begin with one or two dashes and usually come right after the Utility. Single-dash flags are a single letter, and case matters. When you use several single-letter flags, they can usually be combined behind one "-" (ls -la), and their order generally isn't important.
~/Documents is the Argument, which tells the Utility and Flag exactly where in the file system to execute the command. The Argument usually comes after the Utility and Flag, can be simple or complex, and may stand on its own or be one of many Arguments. Because of this, syntax errors within the Argument may be harder to spot than in the more straightforward Utility and Flag. The tilde, ~, is shorthand for your home directory, so ~/Documents means the Documents folder inside your home directory, no matter where you currently are. More on this in a moment.
ls -l ~/Documents asks the system to follow the path to my Documents folder (the argument) and then list (the utility) its contents in long format, one item per line showing permissions, ownership, size and modification date (the flag). This command returns something that looks like this:
If you see "#" in a command, it marks a "comment": the shell ignores everything from the # to the end of the line, so it's a handy place to leave a note for anyone reading the command later and trying to make sense of it.
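Here is a minimal sketch of the same anatomy that you can run in bash. Rather than touching your real Documents folder, it assumes a throwaway folder made with mktemp and two made-up clip names, and it uses # comments to label each part of the command.

```shell
#!/bin/bash
# A throwaway folder, so the experiment can't touch real media.
demo=$(mktemp -d)
touch "$demo/A001_C001.mov" "$demo/A001_C002.mov"   # made-up clip names

# utility   flag   argument
ls          -l     "$demo"    # long format: permissions, owner, size, date

ls -la "$demo"   # two flags behind one dash (-a adds hidden entries like . and ..)
ls -al "$demo"   # the same listing: the order of combined flags doesn't matter
```

Because -la and -al produce the same listing, you can stack flags in whatever order you remember them.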
2. Getting Around in the Terminal Window
The most immediately useful (and least destructive) command set concerns basic file system navigation and networking.
The first thing to be aware of regarding navigation is you can drag folders and file icons into the terminal window to create the file path instead of typing it out manually, saving yourself a lot of time and energy.
In the graphic above you'll notice the line Bennetts-MacBook-Air:~ bencain$ several times. This is the prompt. It shows the name of the machine, then the current directory (the ~ means your Home Directory, the default location of new terminal windows), then the username, and finally "$", which means the terminal is ready to accept a command from this normal, non-root user. The home directory and where you are in the file system are important concepts, as they affect how your arguments need to be written out.
Closely related to this is the concept of the Working Directory. Wherever you are in the file system is your working directory. This location determines how file paths need to be written out, which in turn affects how your command's argument is written.
For example, when I open a new terminal window on my machine I get this:
Bennetts-MacBook-Air:~ bencain$
This tells me that I'm in the home directory of the user bencain. By default, it's also my current working directory.
I can type:
pwd
And this will return:
/Users/bencain
Bennetts-MacBook-Air:~ bencain$
The command pwd for Print Working Directory tells me exactly where I am in the file system at any time.
I can also type:
whoami
The terminal will return the name of the current user, in this case bencain.
Bennetts-MacBook-Air:~ bencain$ whoami
bencain
Bennetts-MacBook-Air:~ bencain$
The command that changes your working directory is cd, for Change Directory.
cd
In the terminal, type cd followed by a space, then drag in the destination folder you'd like to move to. For example:
Bennetts-MacBook-Air:~ bencain$ cd /Users/bencain/Pictures
Hit Enter and this is what returns:
Bennetts-MacBook-Air:Pictures bencain$
When you're not in the home directory, the name of the working directory will appear before the username, in this case "Pictures." Using the pwd command once more, the following would return:
/Users/bencain/Pictures
Bennetts-MacBook-Air:Pictures bencain$
This shows me that my current working directory is /Users/bencain/Pictures. The cd command also works as a shortcut: typing cd on its own, with no path, takes you back to your home directory from anywhere in the file system.
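The commands in this section can be tried together as a short bash sketch. A throwaway folder from mktemp stands in for ~/Pictures (an assumption, so the example behaves the same on any machine), and the comments note what each line should print.

```shell
#!/bin/bash
whoami                 # the current user, e.g. bencain
pwd                    # the current working directory

pics=$(mktemp -d)      # throwaway stand-in for ~/Pictures
cd "$pics" && pwd      # change directory, then confirm where you landed

cd                     # no argument: straight back to the home directory
pwd                    # your home directory, e.g. /Users/bencain
```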
At this point it's worth noting the difference between Absolute and Relative File Paths.
/Users/bencain/Pictures/samplePhoto.jpg
This is an absolute file path: it's the complete path, starting at the very top of the file system (the leading "/") and running through the computer's file hierarchy to samplePhoto.jpg.
Users/bencain/Pictures/samplePhoto.jpg
This is a relative file path: it tells you samplePhoto.jpg is three folders down, but only from wherever your current working directory happens to be. With a leading "/" it would be an absolute file path.
The tilde "~" signifies the home directory, so wherever you are in the file system you can cd (change directories) to a folder in your home directory. For example:
Bennetts-MacBook-Air:HDR_images bencain$ cd ~/Documents
Hit Enter:
Bennetts-MacBook-Air:Documents bencain$
Before, I was several levels deep inside my Pictures folder, but because Documents is in my home directory, by typing:
cd ~/Documents
I can get into Documents without having to type cd /Users/bencain/Documents.
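A small sketch of both ideas, assuming bash. The first two lines show that ~ is simply shorthand for your home directory; the rest rebuild a fake Users/bencain/Pictures tree inside a throwaway folder (hypothetical, not your real home folder) to show that a relative path only works from the right starting point, while an absolute one works from anywhere.

```shell
#!/bin/bash
# The shell expands ~ to your home directory before the command runs.
echo ~/Documents            # e.g. /Users/bencain/Documents
echo "$HOME/Documents"      # the same path, spelled out

# Absolute vs relative, inside a throwaway folder (made-up layout).
root=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$root/Users/bencain/Pictures"
touch "$root/Users/bencain/Pictures/samplePhoto.jpg"

cd "$root"
ls Users/bencain/Pictures/samplePhoto.jpg     # relative: resolved from here, so it's found

cd "$root/Users"
ls Users/bencain/Pictures/samplePhoto.jpg 2>/dev/null || echo "not found from $(pwd)"

ls "$root/Users/bencain/Pictures/samplePhoto.jpg"   # absolute: works from anywhere
```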
Closely related to "~" is the use of "." and ".." for navigating around the file system.
"." always indicates the current working directory, whereas ".." always means one level up from the current working directory. For example:
cd ..
This will take you one level up from where you are.
cd ../..
Two levels up.
cd ../../..
Three levels up, etc.
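To watch . and .. in action, this bash sketch builds a made-up Pictures/2016/HDR_images tree in a throwaway folder, climbs back out of it with .., and confirms each stop with pwd.

```shell
#!/bin/bash
base=$(cd "$(mktemp -d)" && pwd)           # throwaway folder, path resolved once
mkdir -p "$base/Pictures/2016/HDR_images"  # made-up nested folders
cd "$base/Pictures/2016/HDR_images"

cd ..     && pwd    # one level up: .../Pictures/2016
cd ../..  && pwd    # two more levels up: back in the throwaway folder
ls .                # "." is the current directory; lists Pictures
```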
What we're doing here is navigating the file system without a GUI, which is why ls, for list, is so helpful.
The Finder can take some time to display the contents of folders filled with thousands of large files. By typing:
ls -l
or simply
ls
followed by a space, then drag the desired folder into the terminal, and it will quickly return a nice orderly list of that folder's contents. ls -l returns one item per line with ownership, permissions and some other info, whereas ls lists file names only, such as this:
There are many other flags you can use with ls to suit your needs. Conveniently, you can call up a User Manual of sorts right in the terminal that will show you relevant command options.
cat
The cat command, short for concatenate, followed by a path to a file, prints that file's contents to the terminal as text. The manual itself is opened with man followed by the name of a command; this calls up the manual page for cat:
man cat
Very helpful when you're trying to figure out options for flags or combinations of them. When you're done, press q to return to the prompt.
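As a quick sketch of cat, assuming bash and a made-up camera-roll log in a throwaway file: cat prints a file's text, and given several files it joins them end to end, which is where the name concatenate comes from. The man lines are commented out because the manual opens in an interactive viewer.

```shell
#!/bin/bash
log=$(mktemp)                                  # throwaway text file
printf 'Roll A001\nRoll A002\n' > "$log"       # made-up camera-roll notes
printf 'Roll A003\n' > "$log.more"

cat "$log"                                     # prints the file's two lines
cat "$log" "$log.more" > "$log.all"            # concatenate: join files end to end
cat "$log.all"                                 # three lines now

# The manual opens in a viewer; press q to get back to the prompt.
# man cat
# man ls
```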
By listing directories out with ls and using cd and keyboard shortcuts, you can jump around in the filesystem almost as fast as in the Finder.
A few other tips to increase your speed:
Hit tab to autocomplete something you're typing based on what files exist in the system. For example, if you're typing the name of one of your files, "sampleFile.txt," you could type "samp" and hit tab to autocomplete the entry.
Hitting the up and down arrows at the prompt will cycle through your previous entries. If you have a command to repeat, there's no need to retype it: press Up until it reappears at the prompt, then hit Enter. Be careful, though: if you've changed directories since you last ran it and the command uses a relative path (or relies on the working directory), it may now act on an unintended folder.
To edit a command, you can navigate the cursor with the left and right arrows. An unmodified arrow moves the cursor one character at a time, while adding the Option key moves it to the previous or next word. Control-A moves the cursor to the beginning of the line and Control-E moves it to the end. And by Option-clicking you can move the cursor to the location of your mouse pointer. Notice that anything you type will be inserted into the existing command; it won't type over what's already there.
Moving the cursor: a few helpful options
Control-A goes to the beginning of the line
Control-E goes to the end of the line
Option-Right Arrow goes to the next word
Option-Left Arrow goes to the previous word
Option-Click moves the cursor to the mouse pointer
Pinging:
Something peripherally related to file system navigation is the simple “ping” command. This is extremely useful for testing connections between devices on your network. In the terminal type:
ping 127.0.0.1
Hit Enter
This is the Loopback address of your machine, also called localhost. It should always be available so you should get something like this:
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.121 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.184 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.110 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.161 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.161 ms
Now try:
ping 8.8.8.8
This is the address of one of Google’s DNS servers. There’s a good chance this server is running, so it’s a quick way to see if you have a valid connection to the internet. The output might look like this:
64 bytes from 8.8.8.8: icmp_seq=0 ttl=55 time=19.402 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=55 time=19.951 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=55 time=16.319 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=55 time=27.158 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=55 time=11.032 ms
When we see output like this, we know we have a valid network connection. You might notice the times (27.158 ms, for example) are much longer than those from localhost, because the ping is being answered from out on the internet.
If what returns is this:
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Then there's no working connection between your machine and whatever IP address you pinged, or the device at that address isn't answering pings.
With ping and all other terminal commands, hit Control-C at any time to pull the plug on what's running. Pinging is by far the easiest and handiest command for network troubleshooting and should be one of the first steps.
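Left alone, ping on a Mac keeps going until you press Control-C. The -c flag (supported by ping on both OS X and Linux) sets a count so it stops by itself, which makes it easy to drop into a script. This is a minimal bash sketch; the wording of the messages is our own, and a lack of replies can also mean a firewall is dropping pings.

```shell
#!/bin/bash
# -c sets how many pings to send, so ping stops on its own.
if ping -c 5 127.0.0.1; then
    echo "Loopback answered"
fi

# 8.8.8.8 is one of Google's public DNS servers.
if ping -c 3 8.8.8.8 > /dev/null 2>&1; then
    status="replies received: the internet connection looks good"
else
    status="no replies: no connection, or pings are being blocked"
fi
echo "$status"
```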
One last tip for this section: when the window gets too cluttered with text, you can always type clear at the prompt to push it away and give yourself a fresh place to start over.
3. Moving Data
And now to the meat of the article.
This will actually move a lot faster now that the basics have been established. First, why would you move files around with the terminal when it can be accomplished so easily and graphically in the Finder? For most tasks, the Finder suffices but once you get into moving around thousands of files, or hundreds of thousands of files, its limitations become readily apparent.
The problem is that when the Finder copies files, it first catalogs them and prepares their graphical representation. This can tie up the CPU, causing the OS to limp along on basic tasks such as populating a folder with its contents. The Finder also treats a copy as one operation even if it's actually thousands of files. If the Finder barfs on it, not only will the operation freeze and nothing more be copied, but every file before the error could be compromised.
For my own purposes, I find the terminal most useful for filling in the gaps left by the Finder's limitations: tasks like getting folder and file sizes for massive items, copying, moving or deleting sets of more than 5,000 files, and listing the contents of directories the Finder takes too long to populate, among many other things.
The first handy command is just to determine how much data there is on a volume or in a directory.
du
or
du -sh followed by the path to the volume (remember you can just drag the folder icon into the terminal).
du, short for "disk usage," shows how much disk space a file or folder takes up, measured in blocks by default. It can be combined with many different flags, such as -a and -k, but with -s (summarize) and -h (human readable) it returns the total size of the directory on one tidy line, in kilobytes, megabytes, gigabytes or terabytes. Run du on the Source and Destination folders after copying for a quick and dirty size comparison. It isn't a true checksum, and because du measures space used on disk, identical data can report slightly different sizes on drives formatted differently.
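A minimal bash sketch of du, assuming a throwaway folder standing in for a camera card, holding two 1 MiB files of random data with made-up names (random data isn't shrunk by file-system compression, so the sizes stay predictable). -sh gives the tidy human-readable line; -sk reports the same total in kilobytes, which is handier when you want to compare two numbers exactly.

```shell
#!/bin/bash
card=$(mktemp -d)                                      # throwaway stand-in for a card
head -c 1048576 /dev/urandom > "$card/clip_001.mov"    # 1 MiB of random data
head -c 1048576 /dev/urandom > "$card/clip_002.mov"

du -sh "$card"      # one human-readable line, e.g. 2.0M
du -sk "$card"      # the same total in kilobytes, for exact comparisons
```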
The command for copy is very simple:
cp
When copying files using the Terminal, this is the order of the Source and Destination within the argument.
command (flag) /file/path/to/Source /file/path/to/Destination
For the first and most basic example, let's say I want to copy a file called Good_Photo.jpg from Pictures to a folder on my Desktop called Best_Work, but first I need to make that folder with the mkdir (make directory) command.
I'm going to change directory to my Desktop and then make the new directory. It looks like this:
Bennetts-MacBook-Air:~ bencain$ cd ~/Desktop
I'll now make a new directory with mkdir:
Bennetts-MacBook-Air:Desktop bencain$ mkdir Best_Work
Now I'll run the copy command, cp:
Bennetts-MacBook-Air:~ bencain$ cp /Users/bencain/Pictures/Good_Photo.jpg /Users/bencain/Desktop/Best_Work
Or, if I had first navigated into Pictures with cd, it would look like this, because I don't need to type the path to a file that's in my current working directory:
Bennetts-MacBook-Air:Pictures bencain$ cp Good_Photo.jpg /Users/bencain/Desktop/Best_Work
After running this command, the file Good_Photo.jpg now exists in two directories — Pictures and Best_Work.
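The same copy can be rehearsed safely in bash. This sketch assumes throwaway stand-ins for Pictures and Desktop built with mktemp and a made-up Good_Photo.jpg; it runs the absolute-path version first and the relative-path version second.

```shell
#!/bin/bash
# Throwaway stand-ins for ~/Pictures and ~/Desktop, with a made-up photo.
home_demo=$(mktemp -d)
mkdir -p "$home_demo/Pictures" "$home_demo/Desktop"
echo "pretend image data" > "$home_demo/Pictures/Good_Photo.jpg"

cd "$home_demo/Desktop"
mkdir Best_Work                       # make the destination folder first

# Absolute paths: source first, destination second.
cp "$home_demo/Pictures/Good_Photo.jpg" "$home_demo/Desktop/Best_Work/"

# From inside Pictures, the source can be given relative to the working directory.
cd "$home_demo/Pictures"
cp Good_Photo.jpg "$home_demo/Desktop/Best_Work/"   # overwrites the first copy without asking

ls "$home_demo/Pictures" "$home_demo/Desktop/Best_Work"   # Good_Photo.jpg in both
```

Notice the second cp silently replaced the first copy; cp also accepts the -i and -n flags listed for mv below, which guard against that.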
In another scenario, I have a file called Bad_Photo.jpg that I want to move to a folder on my Desktop called Delete_Later.
The command for move works just like dragging a file from one Finder window to another. It is:
mv
To move my file, this is how I would write out the command:
Bennetts-MacBook-Air:~ bencain$ mv /Users/bencain/Pictures/Bad_Photo.jpg /Users/bencain/Desktop/Delete_Later
After running it, the file Bad_Photo.jpg now only exists in the directory Delete_Later.
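A bash sketch of the move, again in a throwaway folder with a made-up file, so you can confirm that the file leaves Pictures rather than being duplicated.

```shell
#!/bin/bash
demo=$(mktemp -d)                                   # throwaway folder
mkdir -p "$demo/Pictures" "$demo/Desktop/Delete_Later"
echo "blurry" > "$demo/Pictures/Bad_Photo.jpg"      # made-up file

mv "$demo/Pictures/Bad_Photo.jpg" "$demo/Desktop/Delete_Later/"

ls "$demo/Pictures"                 # empty: the file is no longer here
ls "$demo/Desktop/Delete_Later"     # Bad_Photo.jpg
```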
Here are a few flags that make the mv command safer and more transparent:
-i prompt for user interaction if a file is going to be overwritten
-n don’t overwrite files
-v verbose, show each file as it’s copied or moved
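Here is a minimal bash sketch of -n and -v in a throwaway folder (file names are made up). -i is left as a comment because it stops and waits for you to answer y or n.

```shell
#!/bin/bash
demo=$(mktemp -d)                           # throwaway folder
echo "take 1" > "$demo/A001.mov"            # made-up file names
echo "take 2" > "$demo/incoming_A001.mov"

# -n: A001.mov already exists, so mv leaves both files where they are.
# (|| true keeps a script going in case this mv reports the skip as an error.)
mv -n "$demo/incoming_A001.mov" "$demo/A001.mov" || true
cat "$demo/A001.mov"                        # still "take 1"

# -v: print each move as it happens.
mv -v "$demo/incoming_A001.mov" "$demo/A001_take2.mov"

# -i asks before overwriting; it waits for y or n, so it's only shown here:
# mv -i "$demo/A001_take2.mov" "$demo/A001.mov"
```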
While you can easily just drag all this stuff into the trash, there is a command to delete it.
rm
Short for remove, this command deletes files and should be used with the utmost care, especially with flags such as recursive, signified with:
-r
This flag means apply the command to everything in the directory including subdirectories and their contents. There's a little gotcha here to be aware of.
sudo rm -r or rm -rf
sudo stands for "superuser do": it runs the command with root (administrator) privileges, so the system will let it change or delete almost anything. We'll get more into this in the next tutorial, but sudo commands can be incredibly powerful and inadvertently destructive. For example, a sudo rm -r run with a relative path from the wrong working directory, or with a mistyped path, can wipe your entire hard drive. Just to reiterate, data deleted in the terminal doesn't go to the Trash; it is gone forever. Double-check the full, absolute path of what you're deleting before running any delete command in the terminal! The force flag, -f, as in rm -rf (with or without sudo), deletes whatever is in the argument without asking for confirmation, even if the files are write-protected.
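A cautious pattern for rm -r in a script, assuming bash: look at what's in the folder first, and write the path as ${target:?} so the shell refuses to run rm at all if the variable is empty. This guard is a common shell habit rather than something from the commands above, and the folder here is a throwaway made with mktemp, holding made-up files.

```shell
#!/bin/bash
target=$(mktemp -d)                                              # throwaway folder
touch "$target/old_render_0001.exr" "$target/old_render_0002.exr"   # made-up files

ls "$target"                 # look before you delete

# ${target:?} makes bash stop with an error if target is empty or unset.
# Without it, an empty target would turn the next line into rm -r /*
rm -r "${target:?}"/*
rmdir "$target"              # remove the now-empty folder itself
```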
The key to copying, moving or deleting lots of data is this -r flag, which is once again short for recursive, meaning everything in the directory. By using this along with bracket sets {..} and chaining commands with &&, mountains of data can be efficiently dealt with in the terminal.
To help with these tasks, there's a phenomenal text editor called Sublime Text. "Find All" and "Replace All" can ease the pain of a complex command line workflow by isolating and modifying the same object across multiple commands. I highly recommend writing your commands in something like Sublime, also because of the Quotation Mark Problem. In the terminal we want straight quotes ( " " ), not curly quotes ( “ ” ). By default, TextEdit enables smart quotes, which automatically turn straight quotes into curly ones, and it's common to see curly quotes in online examples for the terminal. Be on the lookout! They won't work when you try to run the command. Sublime leaves straight quotes alone. Sublime is your friend.
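If you do paste commands from the web or TextEdit, one quick way to catch the Quotation Mark Problem is to search your saved commands for curly quotes with grep. This bash sketch assumes a made-up file holding one good command and one bad one; the curly quotes appear in the grep patterns on purpose, as the characters being searched for.

```shell
#!/bin/bash
cmds=$(mktemp)                                                  # throwaway "saved commands" file
printf '%s\n' 'ls -l "My Drive"' 'ls -l “My Drive”' > "$cmds"   # made-up examples

# Print every line containing a curly quote, with its line number (line 2 here).
grep -n -e '“' -e '”' -e '‘' -e '’' "$cmds"
```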
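In the example above I'm copying 5,000 files at a time from my working directory to a folder on a network drive. There is a limit to how much a single command can be handed at once: expand too many file names and the shell stops with an "Argument list too long" error. The exact ceiling depends on how long the file names and paths are, and splitting the job into batches of around 5,000 files, as here, works around it. The batches are chained using &&, which specifies that the next command will run only if the previous one succeeded.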
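Two small checks related to this, sketched in bash: getconf ARG_MAX reports the system's limit, in bytes, on the arguments (plus environment) a single command can be handed, which is the limit the batches work around; and the && lines show that a command chained with && runs only when the one before it succeeds.

```shell
#!/bin/bash
# The system's limit, in bytes, on the arguments (plus environment) one command can receive.
getconf ARG_MAX

# && runs the next command only if the previous one succeeded.
false && echo "this never prints"
true  && echo "the first command succeeded, so this runs"
```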
Here's another similar case. I want to copy OpenEXR frames 20,000 through 40,000 (sample_project_20000.exr to sample_project_40000.exr) from my system drive to a directory called Transfer on an external drive called Backup. At 33 MB per file, the Finder would struggle with this copy, so setting it up in the terminal is a better option.
Step 1, change to the directory you'd like to copy files from:
Mac-Workstation:~ bencain$ cd ~/Media/OpenEXR/Sample_Project
Step 2, set up the first command, which uses cp -r to copy all the files within the range specified by the brackets (note there are exactly two dots between the numbers; this is the required syntax):
Mac-Workstation:Sample_Project bencain$ cp -r sample_project_{20000..24999}.exr /Volumes/Backup/Transfer/ &&
Step 3, && allows you to chain the rest of the commands together:
cp -r sample_project_{25000..29999}.exr /Volumes/Backup/Transfer/ &&
cp -r sample_project_{30000..34999}.exr /Volumes/Backup/Transfer/ &&
cp -r sample_project_{35000..40000}.exr /Volumes/Backup/Transfer/
That's all there is to it. While the transfer is in progress, typing in the terminal window will be unavailable. You'll know the operation is complete when the prompt reappears and you can begin typing again. One thing to be mindful of with bracket sets is that the syntax is very specific and easy to screw up. For example, there can only be two dots between the numbers in the range, no more, no less. {25000..29999} will work but {25000...29999} will not.
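The bracket-set workflow can be rehearsed at small scale before touching real frames. This bash sketch assumes throwaway folders in place of Sample_Project and /Volumes/Backup/Transfer and 20 empty files instead of 20,001; it previews the expansion with echo, shows that three dots are left unexpanded, and copies in two non-overlapping batches chained with &&. Plain cp is enough here because these are individual files; -r only matters for folders.

```shell
#!/bin/bash
# Throwaway stand-ins for Sample_Project and /Volumes/Backup/Transfer.
src=$(mktemp -d)
dest=$(mktemp -d)
cd "$src"
touch sample_project_{20000..20019}.exr      # 20 empty "frames"

# Preview the range with echo before copying anything.
echo sample_project_{20000..20003}.exr
# -> sample_project_20000.exr sample_project_20001.exr sample_project_20002.exr sample_project_20003.exr

# Three dots isn't a valid range, so bash leaves the text as typed.
echo sample_project_{20000...20003}.exr

# Two non-overlapping batches; the second runs only if the first succeeds.
cp sample_project_{20000..20009}.exr "$dest"/ &&
cp sample_project_{20010..20019}.exr "$dest"/

ls "$dest" | wc -l                           # number of copied frames: 20
```

If a frame in a range is missing, cp reports an error and the && stops the remaining batches, which is usually what you want on a transfer.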
When you're done, you can run du on both the Source and Destination directories and compare the totals as a quick sanity check.
rsync:
One step beyond all this is the command rsync, short for remote sync, which is so useful, and has so many options, that it's worthy of its own post.
rsync -r /path/to/source_directory/ /path/to/destination_directory
For example:
Bennetts-MacBook-Air:~ bencain$ rsync -r /Users/bencain/Desktop/ProjectFiles/ /Users/bencain/Desktop/BackUp
This command will copy the contents of the source directory, ProjectFiles, into the destination directory, BackUp. Note the trailing slash on the source: it tells rsync to copy what's inside ProjectFiles; without it, rsync would create BackUp/ProjectFiles. While similar to cp, the main difference is rsync's versatility, with many more options for flag combinations. With -a, for archive mode, rsync also preserves details such as timestamps, permissions and symbolic links, and it can be written into a reusable script for Time Machine-like backups. This sort of functionality can be very useful to the Data Mover looking to synchronize the contents of multiple drives that will be constantly amended. More on all this next time!
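A bash sketch of the trailing-slash difference, assuming throwaway folders standing in for ProjectFiles and BackUp and a made-up file. It uses -a, the archive mode mentioned above, and checks that rsync is installed first, since it isn't on every system.

```shell
#!/bin/bash
desk=$(mktemp -d)                                     # throwaway stand-in for ~/Desktop
mkdir -p "$desk/ProjectFiles" "$desk/BackUp"
echo "edit v1" > "$desk/ProjectFiles/edit_v1.txt"     # made-up project file

if command -v rsync > /dev/null; then
    # Trailing slash on the source: copy what's inside ProjectFiles into BackUp.
    rsync -a "$desk/ProjectFiles/" "$desk/BackUp"
    # No trailing slash: copy the folder itself, creating BackUp/ProjectFiles.
    rsync -a "$desk/ProjectFiles" "$desk/BackUp"
    ls -R "$desk/BackUp"
else
    echo "rsync isn't installed on this system"
fi
```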
4. Summary
From the material covered in this first article, these are the key items:
pwd Returns Current Working Directory
whoami Returns Current User
cd Change Directory
~/ Shorthand for your Home Directory
ls List
ls -l List in Long Format (Vertical Column)
mkdir Make Directory
du -sh Disk Usage Human Readable
tab Auto Complete
ping Test Network Connection
clear Clear the Terminal Window
cp Copy
cp -r Copy Recursively
{..} Bracket to Apply a Command to a Range of Files
mv Move
sudo rm -r Delete a Directory and All of its Contents with Root Privileges (Password Prompt; Use with Extreme Care)
rm -rf Recursively and Forcibly Remove a Directory and All of its Contents, Without Asking
-v Verbose, view the progress of the command
cat Concatenate (Read Out the Contents of a File)
man Open the Manual Page for a Command, e.g., man cat ("q" to Leave It)
Control-C Cancel Process
&& Run the Next Command Only if the Previous One Succeeds
Up and Down arrow keys Scroll through Previous Entries
Left and Right arrow keys Scroll Left and Right through the Command
drag icon into terminal Way Faster than Typing
copy and paste into terminal Way Faster than Typing
# Comments
Get Sublime!
This is just a drop in the bucket. Like any other language, the more you practice, the greater your degree of fluency. Commands can be written to accomplish virtually any task within the filesystem.
5. Conclusion
That's it for this one. Next time we'll dig deeper into sudo commands, filtering with grep, using pipes "|", more on rsync, wildcards "*" and some other advanced commands. Filtering and automation is where the real power of these commands lies, allowing you to get very specific and potentially save a lot of time. For now, the best way to learn is just to play around, trying different flags and arguments, just not on any critical data! I recommend setting up a few test directories on the Desktop for learning.
It's tempting to try to reinvent the wheel in the terminal just because you can. Basic utilities like Automator can do a lot of the things you may want to accomplish, such as batch renaming, in a far easier and more intuitive way. Bash and Automator used together can be a particularly powerful combination. Additionally, while you could write a custom command for generating MD5 hashes and other checksum operations, there are many inexpensive applications available that do it faster and more easily than whatever you may come up with. It's a fun challenge to try to figure some of this stuff out, but at the end of the day, a much easier way might already exist.
Many thanks to Edward Richardson for helping make this article happen.
Please touch base with any feedback on this series.