Creating a Basic Search CLI with Python

May 16, 2020

The goal of this post is to create a grep-like command-line interface (CLI) in Python that crawls a directory structure, reads the files, and shows which ones contain a string passed to the CLI as an argument. This is useful if, for example, we want to find where we left a TODO item in our code, or need to track down an exception message in a large code base, but there are many other reasons to search for a specific piece of text.

Start

Let's start with our imports and setting up our argument parser.

In search.py

import os
import argparse

parser = argparse.ArgumentParser(
    description="""Search files for a specified string. Default is to search all 
    files in the current working directory."""
)
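# positional arguments are always required, so no `required` keyword here
# (argparse raises a TypeError if it is passed for a positional)
parser.add_argument('string', metavar="s", type=str,
                    help='a string to search for')
# nargs='+' collects one or more space-separated extensions into a list
parser.add_argument('-ext', '--extensions', nargs='+', default='',
                    help='filter files for extensions')
parser.add_argument('-root', '--root_directory', default=os.getcwd(),
                    help='specify the root directory to search')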
args = parser.parse_args()
# ...

I'll be using the argparse module from the standard library for this. We start by setting up the ArgumentParser object and adding our arguments. For this exercise, I want to be able to search for an arbitrary string across all files (or, if the -ext argument is provided, only in files with particular extensions) in a given directory (or the current one if the user doesn't specify one). The add_argument method is used to set up these arguments in the parser.
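To see what the parser hands back, here is a minimal sketch that feeds it a list of arguments directly instead of reading them from the terminal. The build_parser helper is a hypothetical name used only for this illustration, and the sketch assumes -ext is declared with nargs='+' so that it accepts several space-separated values.

```python
import argparse
import os


def build_parser():
    """Hypothetical helper: builds a parser like the one described in this post."""
    parser = argparse.ArgumentParser(description="Search files for a string.")
    parser.add_argument('string', help='a string to search for')
    # Assumption: nargs='+' gathers one or more values into a list
    parser.add_argument('-ext', '--extensions', nargs='+', default='',
                        help='filter files for extensions')
    parser.add_argument('--root_directory', default=os.getcwd(),
                        help='specify the root directory to search')
    return parser


demo_parser = build_parser()
# Passing a list to parse_args stands in for the command-line arguments
demo_args = demo_parser.parse_args(['TODO', '-ext', '.py', '.js'])

print(demo_args.string)          # the positional search string
print(demo_args.extensions)      # the extensions, gathered into a list
print(demo_args.root_directory)  # falls back to the current working directory
```

Passing an explicit list to parse_args is also a handy way to try out a parser from a REPL or a test without touching the shell.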

Search Function

Next we move on to our search function and its invocation. This will be a pretty basic function, but you can add regex support and other bells and whistles if desired. I just want to be able to search for a string within a file, so the in operator will do just fine. I'll be making use of the os module to crawl the directory structure.
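If you do want the regex variant at some point, the membership test could be swapped for a call to re.search. This is only a sketch of that idea and not part of the utility built in this post; text_matches is a hypothetical helper name.

```python
import re


def text_matches(pattern, text):
    """Return True if the regular expression `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


sample = "# TODO: handle binary files\nraise ValueError('bad input')"

print(text_matches(r"TODO|FIXME", sample))      # True: 'TODO' is present
print(text_matches(r"raise \w+Error", sample))  # True: matches 'raise ValueError'
print(text_matches(r"^import ", sample))        # False: text does not start with 'import '
```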

In search.py

# ... (continued)
def search(search_for, file_ext, path):
    # the parser places -ext args in a list so cast -ext args as tuple
    ext = tuple(file_ext) if file_ext else file_ext

    files_with_search_term = []
    for (root, dirs, files) in os.walk(path, topdown=True):
        for filename in files:
            if filename.endswith(ext):  # string method - requires tuple or string arg
                try:
                    with open(os.path.join(root, filename), 'r') as rqstd:
                        print("Searching for '{}' in {}...".format(
                            search_for, filename))
                        if search_for in rqstd.read():
                            files_with_search_term.append(
                                os.path.join(root, filename)
                            )
                except UnicodeDecodeError:  # raised for binary files, e.g. executables
                    continue

    print("\nFound the following files with '{}' in the text:".format(search_for))
    for file in files_with_search_term:
        print("**** " + file)
    print()  # adds newline to stdout


def main():
    # argparse derives the attribute name from the long option, --root_directory
    search(args.string, args.extensions, args.root_directory)


if __name__ == "__main__":
    main()
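The extension filter relies on str.endswith accepting either a single string or a tuple of strings (but not a list), which is why the extensions are cast to a tuple. The sketch below isolates that behavior; matches_extension is a hypothetical helper that does not appear in search.py.

```python
def matches_extension(filename, extensions):
    """Mimic the filter in search(): an empty value means 'match every file'."""
    # endswith accepts a str or a tuple of str, so a list must be converted
    ext = tuple(extensions) if extensions else ''
    return filename.endswith(ext)


print(matches_extension('app.js', ['.js', '.cs']))     # True
print(matches_extension('notes.txt', ['.js', '.cs']))  # False
print(matches_extension('notes.txt', ''))              # True: every string ends with ''
```

The last case is what makes the empty-string default work: when no extensions are given, every filename passes the check.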

Running From the Terminal

Let's test out our sweet new CLI utility. Open up a terminal window and navigate to your project's root directory; afterwards, type in the following at the prompt (substituting the parts in the angle brackets):

$ python search.py "<some string here>" -ext <list of space-separated extensions e.g.: .py .js .html> -root "</some/root/directory/path>"

When I did this, I used a code repository as the directory and searched for the text "parser" in all JavaScript and C# files:

$ python search.py "parser" -ext .js .cs -root "/Volumes/Source Code"
#stdout:

Searching for 'parser' in Program.cs...
Searching for 'parser' in AssemblyInfo.cs...
Searching for 'parser' in BatchMergeContacts.cs...
Searching for 'parser' in BusinessGroupSelect.cs...
Searching for 'parser' in BusinessGroupSelectAll.cs...
.
.
.
Found the following files with 'parser' in the text:
**** /Volumes/Source Code/Reports/scripts/d3.js
**** /Volumes/Source Code/CRM/scripts/lib.js

Installing the CLI

Great, we've got a working script. Say I now want to add this as a command in the terminal, so that instead of running $ python search.py ... I can do something like $ text_search ... from any directory. We can do that easily enough by creating a Python package, which is far easier than it sounds. All it takes is changing up our directory structure and adding a few files.

First, we'll want to set up our directory as a package. The current folder structure is pretty meager, and looks like this:

search_cli/
└── search.py
... and that's it.

Let's change this so that search_cli is a package. We'll nest the code in a subfolder of the same name: create search_cli/search_cli, move search.py into it, and add an empty __init__.py file alongside it. The __init__.py file just lets Python know that the subdirectory search_cli/search_cli is a package. Our folder structure now looks like this:

search_cli/
└── search_cli
    ├── search.py
    └── __init__.py

Finally, create a file in the project's root directory called setup.py.

In setup.py

from setuptools import setup

setup(
    name='text_search',
    version='1.0',
    packages=['search_cli'],  # must match the package folder name
    entry_points={
        'console_scripts': [
            'text_search = search_cli.search:main'
        ]
    },
    description="A grep-like utility to parse files for a user-provided string.",
    author="Ado Sibalo",
    author_email="adosib1@gmail.com",
    license="MIT"
)

This just lets pip know how to install the package. I got this idea from a post on Medium, so check that out to learn more; also be sure to look at the official setuptools documentation for more details on what all this is doing.
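The console_scripts entry follows the pattern command_name = package.module:function. Conceptually, running the installed command imports the module and calls the named function. The sketch below is a simplified illustration of that lookup, not how setuptools or pip actually implement it; resolve_entry_point is a hypothetical helper, and json:dumps is just a standard-library stand-in for search_cli.search:main.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'module.path:attribute' string to the object it names."""
    module_path, _, attribute = spec.partition(':')
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# Stand-in target from the standard library, used here only for illustration
func = resolve_entry_point('json:dumps')
print(func([1, 2]))  # calls json.dumps([1, 2])
```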

In the terminal, navigate to the project's root directory and then at the prompt enter

$ pip install -e .

to install the package in development mode. The folder structure at this point looks like:

search_cli/
├── search_cli
│   ├── search.py
│   └── __init__.py
├── text_search.egg-info
└── setup.py

All that's left to do is run the same command we did earlier, except instead of $ python search.py ... we're going to use $ text_search ... :

$ text_search "parser" -ext .js .cs -root "/Volumes/Source Code"

The result will be the same, but this is a lot nicer than having to navigate to the Python script and run $ python .../search_cli/search.py ... every time.

Wrapping Up

Hopefully this post inspires you to check out the argparse module and create some sweet CLIs with Python.

If you want to get up and running with the code, you can clone this project from the GitHub repository where it's hosted.

That's it for this blog post - thanks for reading. Be sure to check out some of the other posts and leave a comment if you have something to say!