The goal of this post is to create a grep-like command-line interface (CLI) in Python that crawls a directory structure, reads the files, and reports which ones contain a string we pass to the CLI as an argument. This is useful if, for example, we want to find where we left a TODO item in our code, or need to track down where an exception message comes from in a large code base, but there are many use cases for finding a specific piece of text.
Start
Let's start with our imports and setting up our argument parser.
In search.py
```python
import os
import argparse

parser = argparse.ArgumentParser(
    description="""Search files for a specified string. Default is to search all
    files in the current working directory."""
)
# positional arguments are always required, so required=True is not allowed here
parser.add_argument('string', metavar="s", type=str,
                    help='a string to search for')
# nargs='+' lets -ext take one or more space-separated extensions (stored as a list)
parser.add_argument('-ext', '--extensions', nargs='+', default='',
                    help='filter files for extensions')
parser.add_argument('-root', '--root_directory', default=os.getcwd(),
                    help='specify the root directory to search')
args = parser.parse_args()
# ...
```
I'll be using the argparse module from Python's standard library for this. We start by creating an ArgumentParser object and then registering our arguments with its add_argument method. For this exercise, I want to be able to search for an arbitrary string across all files in a given directory (or in the current working directory if the user doesn't specify one), optionally restricted to files with particular extensions when the -ext argument is provided.
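To see what the parser actually hands back, the sketch below builds a parser with the same arguments and feeds parse_args an explicit list instead of reading the real command line. It assumes -ext is declared with nargs='+' (which argparse needs in order to accept several extensions after one flag) and uses the -root flag that the terminal examples in this post use; the sample path is just the one from those examples.

```python
import argparse
import os

# Same arguments as search.py; -ext uses nargs='+' so it can take several
# space-separated extensions, and -root matches the terminal examples.
parser = argparse.ArgumentParser(description="Search files for a specified string.")
parser.add_argument('string', metavar='s', type=str, help='a string to search for')
parser.add_argument('-ext', '--extensions', nargs='+', default='',
                    help='filter files for extensions')
parser.add_argument('-root', '--root_directory', default=os.getcwd(),
                    help='specify the root directory to search')

# parse_args accepts an explicit list, which is handy for experimenting;
# without one it reads sys.argv[1:].
args = parser.parse_args(['parser', '-ext', '.js', '.cs',
                          '-root', '/Volumes/Source Code'])
print(args.string)          # parser
print(args.extensions)      # ['.js', '.cs']
print(args.root_directory)  # /Volumes/Source Code

# With only the positional argument, the defaults kick in.
defaults = parser.parse_args(['TODO'])
print(repr(defaults.extensions))  # ''
```

Note that the attribute names come from the long option strings: --extensions becomes args.extensions and --root_directory becomes args.root_directory.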
Search Function
Next we move on to our search function and its invocation. This will be a pretty basic function, but you can add regex support and other bells and whistles if desired. I just want to check whether a string appears within a file, so the in operator will do just fine. I'll use os.walk from the os module to crawl the directory structure.
In search.py
```python
# ... (continued)
def search(search_for, file_ext, path):
    # the parser places -ext args in a list, so cast them to a tuple
    ext = tuple(file_ext) if file_ext else file_ext
    files_with_search_term = []
    for (root, dirs, files) in os.walk(path, topdown=True):
        for filename in files:
            if filename.endswith(ext):  # str.endswith accepts a string or a tuple of strings
                try:
                    with open(os.path.join(root, filename), 'r') as rqstd:
                        print("Searching for '{}' in {}...".format(
                            search_for, filename))
                        if search_for in rqstd.read():
                            files_with_search_term.append(
                                os.path.join(root, filename)
                            )
                # UnicodeDecodeError is raised for binary files, e.g. executables;
                # PermissionError for files we are not allowed to read
                except (UnicodeDecodeError, PermissionError):
                    continue
    print("\nFound the following files with '{}' in the text:".format(search_for))
    for file in files_with_search_term:
        print("**** " + file)
    print()  # adds newline to stdout


def main():
    # --root_directory is stored as args.root_directory
    search(args.string, args.extensions, args.root_directory)


if __name__ == "__main__":
    main()
```
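If you want to reuse the search from other Python code (or write tests for it), it helps to have a version that returns the matching paths instead of printing them. Below is a minimal sketch of such a function; find_files and its use_regex option are hypothetical names, not part of search.py. The regex option is one way to add the regex support mentioned earlier, using re.search from the standard library. The sketch also reads files as UTF-8 explicitly and skips anything it cannot open or decode.

```python
import os
import re


def find_files(search_for, extensions=(), path='.', use_regex=False):
    """Return a sorted list of file paths under `path` whose text contains `search_for`.

    extensions: iterable of suffixes such as ('.py', '.js'); empty means every file.
    use_regex: hypothetical option that treats `search_for` as a regular expression.
    """
    # str.endswith accepts a tuple of suffixes, and '' matches every filename.
    suffixes = tuple(extensions) if extensions else ''
    pattern = re.compile(search_for) if use_regex else None
    matches = []
    for root, _dirs, files in os.walk(path):
        for filename in files:
            if not filename.endswith(suffixes):
                continue
            full_path = os.path.join(root, filename)
            try:
                with open(full_path, 'r', encoding='utf-8') as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                # Skip binary/non-UTF-8 files and files we cannot open.
                continue
            if pattern is not None:
                found = pattern.search(text) is not None
            else:
                found = search_for in text
            if found:
                matches.append(full_path)
    return sorted(matches)
```

For instance, find_files(r'TODO|FIXME', ('.py',), 'src', use_regex=True) would list the Python files under a (hypothetical) src directory that mention either marker. Keeping the printing in a thin wrapper such as search() and the logic in a function like this one makes the logic easy to test without capturing stdout.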
Running From the Terminal
Let's test out our sweet new CLI utility. Open up a terminal window and navigate to your project's root directory; afterwards, type in the following at the prompt (substituting the parts in the angle brackets):
$ python search.py "<some string here>" -ext <list of space-separated extensions e.g.: .py .js .html> -root "</some/root/directory/path>"
When I did this, I used a code repository as the directory and searched for the text "parser" in all JavaScript and C# files:
$ python search.py "parser" -ext .js .cs -root "/Volumes/Source Code"
# stdout:
Searching for 'parser' in Program.cs...
Searching for 'parser' in AssemblyInfo.cs...
Searching for 'parser' in BatchMergeContacts.cs...
Searching for 'parser' in BusinessGroupSelect.cs...
Searching for 'parser' in BusinessGroupSelectAll.cs...
.
.
.
Found the following files with 'parser' in the text:
**** /Volumes/Source Code/Reports/scripts/d3.js
**** /Volumes/Source Code/CRM/scripts/lib.js
Installing the CLI
Great, we've got a working script. Now say I want to add it as a terminal command, so that instead of running
$ python search.py ...
I can just type
$ text_search ...
right at the prompt.
We can do that by turning the project into a Python package, which is simpler than it sounds: all it takes is changing up our directory structure and adding a couple of files.
First, we'll want to set up our directory as a package. The current folder structure is pretty meager and looks like this:
search_cli/
└── search.py
...and that's it.
Let's change this so that search_cli is a package. Inside search_cli, create a subfolder with the same name and move search.py into it. Then, in search_cli/search_cli, create an __init__.py file (it can be empty); its presence tells Python that the subdirectory search_cli/search_cli is a package whose modules can be imported. Our folder structure now looks like this:
search_cli/
└── search_cli/
    ├── __init__.py
    └── search.py
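If you'd rather script the restructuring than do it by hand, here is a minimal sketch using pathlib and shutil from the standard library. make_package is a hypothetical helper name, and it assumes search.py currently sits directly in the project root (the outer search_cli folder).

```python
import shutil
from pathlib import Path


def make_package(project_root, package_name='search_cli', script='search.py'):
    """Hypothetical helper: turn project_root/script into project_root/package_name/script.

    Creates the package directory, adds an empty __init__.py, and moves the
    script into the package if it is still in the project root.
    """
    root = Path(project_root)
    package_dir = root / package_name
    package_dir.mkdir(exist_ok=True)          # search_cli/search_cli/
    (package_dir / '__init__.py').touch()     # marks the folder as a package
    source = root / script
    if source.exists():
        shutil.move(str(source), str(package_dir / script))
    return package_dir
```

Calling make_package('.') from inside the outer search_cli folder would produce the layout shown above; running it a second time is harmless, since mkdir(exist_ok=True) and touch() tolerate existing paths and the move only happens if search.py is still in the root.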
Finally, create a file in the project's root directory called setup.py.
In setup.py
```python
from setuptools import setup

setup(
    name='text_search',
    version='1.0',
    # must match the package folder name, search_cli/search_cli
    packages=['search_cli'],
    entry_points={
        'console_scripts': [
            # command name = package.module:function
            'text_search = search_cli.search:main'
        ]
    },
    description="A grep-like utility to parse files for a user-provided string.",
    author="Ado Sibalo",
    author_email="adosib1@gmail.com",
    license="MIT"
)
```
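This tells pip how to install the package: packages lists the package directory to include, and the console_scripts entry point asks pip to create a text_search command that calls the main function from search.py. I got this idea from a post on Medium, so check that out to learn more; also be sure to look at the official setuptools documentation for more details on what all this is doing.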
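The console_scripts line has the form name = module:function. When pip installs the package, it generates a small text_search executable that, roughly speaking, imports search_cli.search and calls its main() function. As an illustration only (the package itself doesn't need this), the sketch below uses importlib.metadata.EntryPoint from the standard library to show how such a spec string is split up and loaded; the module and attr properties assume Python 3.9 or newer.

```python
from importlib.metadata import EntryPoint  # .module and .attr need Python 3.9+

# The same spec string that setup.py registers under console_scripts.
ep = EntryPoint(name='text_search', value='search_cli.search:main',
                group='console_scripts')
print(ep.module)  # search_cli.search
print(ep.attr)    # main

# load() imports the module and looks up the attribute. json:dumps stands in
# here because it is importable everywhere; search_cli only is after pip install.
dumps = EntryPoint(name='demo', value='json:dumps', group='demo').load()
print(dumps({'ok': True}))  # {"ok": true}
```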
In the terminal, navigate to the project's root directory and then at the prompt enter
$ pip install -e .
to install the package in development (editable) mode, which means edits you make to search.py take effect without reinstalling. The folder structure at this point looks like:
search_cli/
├── search_cli/
│   ├── __init__.py
│   └── search.py
├── text_search.egg-info/
└── setup.py
All that's left to do is run the same command as before, with text_search in place of python search.py:
$ text_search "parser" -ext .js .cs -root "/Volumes/Source Code"
The result will be the same, but this is a lot nicer than having to navigate to the Python script and run
$ python .../search_cli/search.py ...
every time.
Wrapping Up
Hopefully this post inspires you to check out the argparse module and create some sweet CLIs with Python.
If you want to get up and running with the code, you can clone this project from the GitHub repository where it's hosted.
That's it for this blog post - thanks for reading. Be sure to check out some of the other posts and leave a comment if you have something to say!