Unix: Get file name without extension from file path

I recently found myself needing to extract the file name, without its extension, from a bunch of file paths, and wanted to share a neat technique that I learnt for doing it.

I started with a bunch of Jupyter notebook files, which I listed using the following command:

$ find notebooks/ -maxdepth 1 -iname '*ipynb'

notebooks/09_Predictions_sagemaker.ipynb
notebooks/00_Environment.ipynb
notebooks/05_Train_Evaluate_Model.ipynb
notebooks/01_DataLoading.ipynb
notebooks/05_SageMaker.ipynb
notebooks/09_Predictions_sagemaker-Copy2.ipynb
notebooks/09_Predictions_sagemaker-Copy1.ipynb
notebooks/02_Co-Author_Graph.ipynb
notebooks/04_Model_Feature_Engineering.ipynb
notebooks/09_Predictions_scikit.ipynb
notebooks/03_Train_Test_Split.ipynb

If we pick one of those files:

file="notebooks/05_Train_Evaluate_Model.ipynb"

We want to extract the file name without its extension from this file path, which would give us 05_Train_Evaluate_Model. We can get the file name, extension included, using the basename command:

$ basename "${file}"

05_Train_Evaluate_Model.ipynb
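
As an aside, basename also accepts an optional second argument: a suffix to strip from the result. That only helps when the extension is known in advance. A minimal sketch, reusing the file variable from above (the stem variable name is just for illustration):

```shell
file="notebooks/05_Train_Evaluate_Model.ipynb"

# The second argument is a suffix that basename removes from the result
stem=$(basename "$file" .ipynb)
echo "$stem"
```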

StackOverflow has many suggestions for stripping out the file extension, but my favourite is one that uses parameter expansion. The Bash manual describes the suffix-removal form as follows:

${parameter%word}

${parameter%%word}

The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the "%" case) or the longest matching pattern (the "%%" case) deleted. If parameter is '@' or '*', the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with '@' or '*', the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.

We can use it like this:

$ basename "${file%.*}"

05_Train_Evaluate_Model

Because we’ve used the % variant, this deletes the shortest matching pattern, i.e. only the last file extension.
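
The same result can be had without calling basename at all, by combining the suffix form with its prefix counterpart, ${parameter##word}, which deletes the longest match from the start of the value. This is only a sketch; the name and stem variables are illustrative and not part of the original approach:

```shell
file="notebooks/05_Train_Evaluate_Model.ipynb"

# ##*/ deletes the longest prefix ending in a slash, i.e. the directory part
name="${file##*/}"
# %.* deletes the shortest suffix starting with a dot, i.e. the last extension
stem="${name%.*}"

echo "$name"
echo "$stem"
```

Since both steps are done by the shell itself, no external process is started, which can matter when looping over many paths.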

If we had a file that ends with multiple file extensions and wanted to strip all of them, we’d need to use the %% variant instead:

$ filename="notebooks/05_Train_Evaluate_Model.ipynb.bak"
$ echo "${filename%%.*}"

notebooks/05_Train_Evaluate_Model
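
One thing to watch out for: the %% variant matches from the first dot anywhere in the value, so a dot in a directory name would cut the path short. The sketch below uses a made-up path (my.project is hypothetical) to show the problem, and works around it by removing the directory part before stripping extensions:

```shell
path="my.project/05_Train_Evaluate_Model.ipynb.bak"

# %% matches from the first dot in the whole string, which here is in the directory name
whole="${path%%.*}"

# Stripping the directory first limits the match to the file name
name="${path##*/}"
stem="${name%%.*}"

echo "$whole"
echo "$stem"
```

Here the first echo prints my, while the second prints 05_Train_Evaluate_Model.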

Going back to our original problem, we can extract the file names for all of our Jupyter notebooks by running the following:

for file in $(find notebooks -maxdepth 1 -iname '*.ipynb'); do
  basename "${file%.*}"
done

09_Predictions_sagemaker
00_Environment
05_Train_Evaluate_Model
01_DataLoading
05_SageMaker
09_Predictions_sagemaker-Copy2
09_Predictions_sagemaker-Copy1
02_Co-Author_Graph
04_Model_Feature_Engineering
09_Predictions_scikit
03_Train_Test_Split
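
Looping over the output of find splits on whitespace, so it would break on file names that contain spaces. A possible alternative, sketched below, is to loop over a shell glob instead. Note the assumptions: the temporary directory and sample file names are invented for the demonstration, the stems variable exists only to collect the results, and unlike -iname the glob is case sensitive.

```shell
# Throwaway directory with sample files, purely for demonstration
dir=$(mktemp -d)
touch "$dir/00_Environment.ipynb" "$dir/01 Data Loading.ipynb" "$dir/notes.txt"

stems=""
for file in "$dir"/*.ipynb; do
  [ -e "$file" ] || continue   # skip the literal pattern if nothing matched
  name="${file##*/}"           # drop the directory part
  stem="${name%.*}"            # drop the last extension
  echo "$stem"
  stems="$stems$stem;"
done

rm -r "$dir"
```

Like -maxdepth 1, the glob only looks at the top level of the directory, and quoting "$file" keeps names with spaces intact.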