Extracting filenames and their extensions is a common task when working with files in a Unix-like environment. It is particularly useful in scripts that need to manipulate file names or process different types of files based on their extensions. While some languages provide built-in functions for this purpose, like Python's os.path.splitext, Bash has its own robust set of tools and techniques that can achieve the same results efficiently.
Understanding Shell Parameter Expansion
Bash offers powerful parameter expansion capabilities which allow you to manipulate strings directly within shell scripts without needing external programs. This is particularly useful when working with file paths as it minimizes overhead and keeps scripts efficient. The key operators used in Bash for string manipulation are:
${variable%%pattern}: Removes the longest match of pattern from the end.
${variable##pattern}: Removes the longest match of pattern from the beginning.
${variable%pattern}: Removes the shortest match of pattern from the end.
${variable#pattern}: Removes the shortest match of pattern from the beginning.
These operators are used to parse filenames and extensions by manipulating strings based on patterns like dots (.), which typically separate file names from their extensions.
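As a quick illustration, the four operators can be compared side by side on a single value. This is only a sketch: the variable names and the sample path are made up for the example.

```shell
# Hypothetical sample value, chosen only to show how the operators differ.
path="some/directory/somefile.tar.gz"

longest_from_end="${path%%.*}"      # some/directory/somefile
longest_from_start="${path##*/}"    # somefile.tar.gz
shortest_from_end="${path%.*}"      # some/directory/somefile.tar
shortest_from_start="${path#*/}"    # directory/somefile.tar.gz

printf '%s\n' "$longest_from_end" "$longest_from_start" \
              "$shortest_from_end" "$shortest_from_start"
```

Note that the pattern is a shell glob, not a regular expression, so .* here means a literal dot followed by any characters.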
Extracting Base Filename
To extract just the filename (without the directory path), you can call the external basename command or use parameter expansion, as here:
fullpath="some/directory/somefile.tar.gz"
filename="${fullpath##*/}"
This snippet uses ${fullpath##*/} to strip everything up to and including the last slash (/) from fullpath, leaving just the filename.
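The two approaches agree for ordinary paths but differ when the path ends in a slash. The sketch below shows both cases; the variable names and the second sample path are assumptions made for the example.

```shell
fullpath="some/directory/somefile.tar.gz"
with_expansion="${fullpath##*/}"         # somefile.tar.gz
with_basename="$(basename "$fullpath")"  # somefile.tar.gz

# Hypothetical path with a trailing slash: the results diverge.
trailing="some/directory/"
trailing_expansion="${trailing##*/}"         # empty string
trailing_basename="$(basename "$trailing")"  # directory

printf '[%s] [%s]\n' "$with_expansion" "$with_basename"
printf '[%s] [%s]\n' "$trailing_expansion" "$trailing_basename"
```

The expansion avoids starting an external process, while basename ignores trailing slashes, so the better choice depends on the input you expect.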
Extracting Filename without Extension
To separate the filename from its extension, use:
base_filename="${filename%.*}"
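Here, the % operator removes the shortest suffix matching the pattern .* (a literal dot followed by any characters), which is the last dot and everything after it. What remains is everything before the last dot in filename. If filename contains no dot, nothing matches and the value is returned unchanged.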
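The difference between the shortest and longest match matters for names with more than one dot. A minimal sketch, with sample names chosen only for illustration:

```shell
filename="archive.tar.gz"
last_dot_removed="${filename%.*}"    # archive.tar (shortest match: only .gz goes)
all_dots_removed="${filename%%.*}"   # archive     (longest match: .tar.gz goes)

# Hypothetical name without any dot: the pattern does not match.
noext="README"
noext_result="${noext%.*}"           # README (unchanged)

printf '%s\n' "$last_dot_removed" "$all_dots_removed" "$noext_result"
```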
Extracting File Extension
For extracting the extension, you can do:
extension="${filename##*.}"
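The ## operator with the pattern *. removes the longest prefix that ends in a dot, that is, everything from the start of filename up to and including the final dot. Only the characters after the last dot remain. Note that if filename contains no dot, the pattern does not match and the whole filename is returned as the "extension".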
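Because the expansion returns the whole name when there is no dot, it can help to check for a dot first. The helper below is one possible sketch; the function name get_extension is hypothetical, and treating a leading-dot name such as .bashrc as having no extension is a design assumption, not a rule.

```shell
# Hypothetical helper: prints the extension, or an empty line if there is none.
get_extension() {
    case "$1" in
        # At least one character, then a dot: a real separator exists.
        ?*.*) printf '%s\n' "${1##*.}" ;;
        # No dot, or only a leading dot (hidden file): no extension.
        *)    printf '\n' ;;
    esac
}

get_extension "archive.tar.gz"   # prints: gz
get_extension "README"           # prints an empty line
get_extension ".bashrc"          # prints an empty line
```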
Comprehensive Example Script
Here’s a script that handles various edge cases for files with or without extensions, hidden files, and more:
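#!/bin/bash
for fullpath in "$@"
do
    filename="${fullpath##*/}"                      # Strip the longest match of */ from the start
    dir="${fullpath:0:${#fullpath} - ${#filename}}" # Everything before the filename
    # Split base and extension, considering edge cases
    base="${filename%.[^.]*}"      # Strip the last dot followed by at least one non-dot character
    ext="${filename:${#base} + 1}" # Whatever follows base and its dot (empty if there is none)
    # A name such as .hiddenfile yields an "extension" but no base: treat it all as the base
    if [[ -z "$base" && -n "$ext" ]]; then
        base=".$ext"
        ext=""
    fi
    echo -e "$fullpath:\n\tdir = \"$dir\"\n\tbase = \"$base\"\n\text = \"$ext\""
done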
Testing the Script
To verify our script works correctly, consider running it with a set of diverse test cases:
$ ./filename_extractor.sh / /home/user/ /home/user/file /home/user/archive.tar.gz /home/user/.hiddenfile /home/user/..
/:
    dir = "/"
    base = ""
    ext = ""
/home/user/:
    dir = "/home/user/"
    base = ""
    ext = ""
/home/user/file:
    dir = "/home/user/"
    base = "file"
    ext = ""
/home/user/archive.tar.gz:
    dir = "/home/user/"
    base = "archive.tar"
    ext = "gz"
/home/user/.hiddenfile:
    dir = "/home/user/"
    base = ".hiddenfile"
    ext = ""
/home/user/..:
    dir = "/home/user/"
    base = ".."
    ext = ""
Conclusion
Using Bash’s parameter expansion and pattern matching, you can efficiently parse filenames and extensions directly in your shell scripts. This method is both powerful and flexible, capable of handling a wide range of file naming conventions without the need for external tools or languages. Understanding these techniques allows you to write more efficient and portable shell scripts tailored to handle files effectively.