charlesreid1.com blog

Python: From Args to Kwargs

Posted in Python


Overview

In this short blog post, we talk about how and when you can take a method signature that defines input positional arguments by name, like this:

def foo(arg1, arg2, arg3):
    pass

and write code that will return a dictionary containing a keyword arguments-like structure:

>>> foo('red', 'blue', 'green')
{
    'arg1': 'red',
    'arg2': 'blue',
    'arg3': 'green'
}

We will cover an example of writing a decorator that utilizes input arguments from both the decorator and from the function it wraps, and how to keep all of that information straight.

The Easy Way: locals()

We'll start with the easiest possible way to turn args into kwargs: locals(). The locals() function is one of the built-in functions provided by Python:

>>> help(locals)

Help on built-in function locals in module builtins:

locals()
    Return a dictionary containing the current scope's local variables.

    NOTE: Whether or not updates to this dictionary will affect name lookups in
    the local scope and vice-versa is *implementation dependent* and not
    covered by any backwards compatibility guarantees.

This is a straightforward way to get a dictionary of input argument names mapping to the values provided by the user:

>>> def foo(arg1, arg2, arg3):
...     print(locals())
...
>>> foo('asdf', 'qwerty', 'oioioioi')
{'arg3': 'oioioioi', 'arg2': 'qwerty', 'arg1': 'asdf'}
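One caveat worth keeping in mind: locals() reports every local name defined so far, not only the arguments. A minimal sketch, assuming we only want the arguments, is to copy locals() on the first line of the function (the names captured and scratch are just for illustration):

```python
def foo(arg1, arg2, arg3):
    # Copy locals() before any other local variables exist,
    # so the dictionary holds only the input arguments.
    captured = dict(locals())
    scratch = arg1 + arg2  # a later local; it is not part of `captured`
    return captured

result = foo('red', 'blue', 'green')
print(result)
```

Calling locals() later in the function body would also pick up captured and scratch, which is usually not what we want for a kwargs-like dictionary.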

When locals() Won't Work: Getting a Method Signature Programmatically

Sometimes, locals() won't get you what you need - like when you're writing a decorator, where the wrapper only receives *args and **kwargs and doesn't have the original method signature.

In that case, you can still use a function handle and get the original positional argument names from the function signature.

In Python, the signature of a function can be obtained using the inspect module's signature() function, which is passed the function itself:

>>> import inspect
>>> def foo(arg1, arg2, arg3):
...      pass
...
>>> print(inspect.signature(foo))
(arg1, arg2, arg3)

The parameters attribute of the signature is an ordered mapping of parameter names to Parameter objects. Its keys are the variable names used in the function definition, in order (arg1, arg2, and arg3 in the example foo() function above):

>>> print(list(inspect.signature(foo).parameters))
['arg1', 'arg2', 'arg3']
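Each value in that mapping is an inspect.Parameter object that also records the parameter's kind and default. Here is a small sketch (the function bar and its parameters are made up for illustration) showing why matching names to positional values by index only works cleanly for plain positional parameters:

```python
import inspect

def bar(arg1, arg2='blue', *extra, flag=False, **options):
    pass

# Collect (name, kind, has_default) for every parameter, in order
summary = [
    (name, param.kind.name, param.default is not inspect.Parameter.empty)
    for name, param in inspect.signature(bar).parameters.items()
]

for row in summary:
    print(row)
```

Parameters like *extra and **options show up with the kinds VAR_POSITIONAL and VAR_KEYWORD, so a name-by-index lookup would need to treat them specially.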

Args to Kwargs: Parameter Extraction from Decorator

We can use this to get the original variable names from a function handle, even if we don't have its original method signature (i.e., if we're a decorator and are just passed the function).

Here is an example of a decorator that extracts positional arguments from the function it decorates (and prints them out!):

import functools
import inspect

def real_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):

        # This is where the interesting stuff starts!
        # We have a handle to a function that we're
        # decorating, but we don't have its original
        # method signature.
        # No sweat. Turn positional args into augmented
        # kwargs!
        func_kwargs = {}
        # Inspect the decorated function itself
        sig = inspect.signature(func)
        for i, p in enumerate(sig.parameters):
            try:
                func_kwargs[p] = args[i]
            except IndexError:
                # Unspecified positional argument
                # (using default value)
                pass

        print(f"wrapper extracted the following params: {func_kwargs}")
        # Pass the wrapped function's return value back to the caller
        return func(*args, **kwargs)

    return wrapper

# don't forget to top it off
# by decorating a simple function
# and calling it if script is run
@real_decorator
def foo(arg1, arg2, arg3):
    print("hello world!")

if __name__ == "__main__":
    foo('asdf', 'qwerty', 'wioioioio')

And when run, the result is:

$ py five.py
wrapper extracted the following params: {'arg1': 'asdf', 'arg2': 'qwerty', 'arg3': 'wioioioio'}
hello world!
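As an alternative to matching names to values by index, the standard library's Signature.bind() method does the matching for us, including keyword arguments and defaults. The sketch below is one possible variation, not the approach used above; bind_decorator, paint, and the last_call attribute are hypothetical names chosen for the example:

```python
import functools
import inspect

def bind_decorator(func):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map every supplied argument to its parameter name
        bound = sig.bind(*args, **kwargs)
        # Fill in parameters the caller left at their default values
        bound.apply_defaults()
        # Stash the result on the wrapper so it can be inspected later
        wrapper.last_call = dict(bound.arguments)
        return func(*args, **kwargs)

    return wrapper

@bind_decorator
def paint(arg1, arg2, arg3='green'):
    return f"{arg1}-{arg2}-{arg3}"

value = paint('red', arg2='blue')
print(paint.last_call)
```

A side benefit of bind() is that it raises a TypeError for calls that would not match the signature, before the wrapped function is ever invoked.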

Tags:    python    programming    arguments    functions    methods    parameters   

Confuse-A-Constructor: When Class A's Constructor Returns Objects of Type B

Posted in Python


Confuse-A-Constructor

Today, we are going to confuse a constructor.

What is the constructor?

One of the first concepts encountered in object-oriented programming is that of the constructor: the method that runs immediately after an object is created, in order to configure and initialize it.

In Python, the constructor is the __init__ method. It is not permitted to return a value (returning anything other than None raises a TypeError), because constructing a new instance of class A should result in an object of type A. Returning something else would only confuse things.

But does it ever make sense for a constructor of class A to return an object of type B? And if it does make sense, how do we go about doing it?

Rewiring the constructor to do... weird stuff

The answer lies in Python's __new__ method, which is called when a class is instantiated, before __init__ runs: __new__ creates and returns the object, and __init__ then initializes it. The __new__ method is different from the __init__ method, and does not do the same thing.

The __new__ method for class A normally returns a new instance of class A. If __new__ returns anything that is not an instance of A, Python will not run the __init__ method for class A.
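To make the call order concrete, here is a minimal sketch (the Tracked class and the calls list are invented for illustration) that records which method runs first when an object is instantiated:

```python
calls = []

class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # object.__new__ only needs the class to create the instance
        return super().__new__(cls)

    def __init__(self, label):
        calls.append("__init__")
        self.label = label

obj = Tracked("demo")
print(calls)
```

Both methods receive the arguments of the original call; __new__ gets the class as its first argument, while __init__ gets the instance that __new__ returned.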

For example, suppose we want a wrapper class that transparently constructs different kinds of objects conditionally - based on a configuration file, or the state of a file, or some other condition. We want to construct an object of type A and get back an object of type B, C, or D. How to do that?

First, let's look at how the __new__ method works.

A simple example class

Start with a simple example class:

class A(object):

    def __init__(self, *args, **kwargs):
        print("Instance of class A created")

    def hello(self):
        print("Hello world")

Executing this gives:

In [3]: my_object = A()
Instance of class A created

In [4]: my_object.hello()
Hello world

Adding a __new__ method

Now let's look at a class where we define the __new__ method. This method controls how instantiation of objects of that class works, so we can do something like limiting the creation of objects to when a certain condition is met:

import random

def tossCoin():
    if random.random() < 0.5:
        return True
    else:
        return False

class A5050(object):
    def __new__(cls, *args, **kwargs):
        if not tossCoin():
            raise RuntimeError("Could not create instance")
        # object.__new__ accepts only the class, so args/kwargs are not forwarded
        instance = super().__new__(cls)
        return instance

    def __init__(self, *args, **kwargs):
        print("Instance of class A5050 created")

    def hello(self):
        print("Hello world")

Now we can run this block of code:

def make_a5050():
    try:
        my_object = A5050()
        my_object.hello()
    except RuntimeError:
        print("Better luck next time!")

It takes a few tries:

In [9]: make_a5050()
Better luck next time!

In [10]: make_a5050()
Instance of class A5050 created
Hello world

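The __new__ method for the A5050 class raises a RuntimeError with 50% probability. Otherwise, it calls the __new__ method of the parent class (object), which returns a new, uninitialized instance of A5050. Python then passes the arguments and keyword arguments (args/kwargs) of the original call on to __init__. Note that object.__new__ itself raises a TypeError if it is handed extra arguments, so anything beyond cls should only be forwarded to a parent class whose __new__ accepts it. We could also do other work here before creating the instance (say, checking the state of a file).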
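The behavior of object.__new__ with extra arguments is easy to trip over, so here is a short sketch of it. The class names Forwarding and Dropping are hypothetical; the first forwards its arguments to object.__new__, the second does not:

```python
class Forwarding:
    def __new__(cls, *args, **kwargs):
        # Forwarding extra arguments to object.__new__ raises TypeError
        return super().__new__(cls, *args, **kwargs)

class Dropping:
    def __new__(cls, *args, **kwargs):
        # Only the class is passed up; __init__ still receives the arguments
        return super().__new__(cls)

    def __init__(self, color):
        self.color = color

try:
    Forwarding("red")
    forwarding_error = None
except TypeError as err:
    forwarding_error = err

dropped = Dropping("red")
print(type(forwarding_error).__name__, dropped.color)
```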

This is just an example of how the instantiation behavior of a class can be modified before its constructor is even called by using the __new__ method.

When __new__ returns objects of another class

In the above example, our __new__ method returned the result of a call to __new__ of a parent class. What happens if __new__ returns something else?

First, repeating an important point made above: if __new__ for a class returns anything other than an instance of that class, then __init__ will not be called for that class.

That means that the __new__ method should either return a new, uninitialized instance of its own class (the result of calling the parent class's __new__, like a normal __new__ method does), or it should return a fully instantiated object of some other class.
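Here is a small sketch of that rule in isolation. The classes Plain, SameType, and OtherType are invented for the example; only SameType returns an instance of itself from __new__, so only its __init__ runs:

```python
init_calls = []

class Plain:
    pass

class SameType:
    def __new__(cls):
        # Returns an instance of cls, so __init__ runs afterwards
        return super().__new__(cls)

    def __init__(self):
        init_calls.append("SameType")

class OtherType:
    def __new__(cls):
        # Returns an instance of an unrelated class, so __init__ is skipped
        return Plain()

    def __init__(self):
        init_calls.append("OtherType")

a = SameType()
b = OtherType()
print(init_calls, type(b).__name__)
```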

Let's imagine that we want to create different instances of different classes based on a command line flag passed to the script:

class BaseClass(object):
    def hello(self):
        print("Hello world from class %s"%(self.__class__.__name__))

class B(BaseClass):
    pass

class C(BaseClass):
    pass

class D(BaseClass):
    pass

class A(object):
    def __new__(cls, args):
        if args.B:
            return B()
        elif args.C:
            return C()
        elif args.D:
            return D()
        else:
            raise RuntimeError("Could not create instance")

if __name__=="__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-B', action='store_true',
                        help='Return object of type B')
    parser.add_argument('-C', action='store_true',
                        help='Return object of type C')
    parser.add_argument('-D', action='store_true',
                        help='Return object of type D')

    args = parser.parse_args()

    a = A(args)
    print(type(a))

Now if we run this script and pass it different flags, we get a variable a with different types:

$ py wat2.py -h
usage: wat2.py [-h] [-B] [-C] [-D]

optional arguments:
  -h, --help  show this help message and exit
  -B          Return object of type B
  -C          Return object of type C
  -D          Return object of type D

Now try the three flags:

$ py wat2.py -B
<class '__main__.B'>

$ py wat2.py -C
<class '__main__.C'>

$ py wat2.py -D
<class '__main__.D'>

Moving beyond argparse

The example above shows how the constructor can use argparse options to determine what kind of object to return with __new__, but you can use other types of conditions as well:

  • using command line options (see argparse example above)
  • using configuration file options
  • using environment variable values
  • checking status of a file or port
  • checking whether internet connection is available
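As a sketch of one of these alternatives, here is a variation that picks the class from an environment variable instead of argparse flags. The variable name WAT_BACKEND, the CHOICES table, and the optional environ argument (which makes the example easy to run without touching the real environment) are all assumptions made for illustration:

```python
import os

class BaseClass:
    def hello(self):
        return "Hello world from class %s" % self.__class__.__name__

class B(BaseClass):
    pass

class C(BaseClass):
    pass

class A:
    # Hypothetical mapping of environment variable values to classes
    CHOICES = {"B": B, "C": C}

    def __new__(cls, environ=None):
        environ = os.environ if environ is None else environ
        label = environ.get("WAT_BACKEND", "B")
        try:
            return cls.CHOICES[label]()
        except KeyError:
            raise RuntimeError("Could not create instance") from None

a = A({"WAT_BACKEND": "C"})
print(a.hello())
```

Calling A() with no argument would read the real os.environ and fall back to class B when the variable is not set.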

Using __new__ in your patterns

We have already covered the Registry pattern in a prior blog post, but the __new__ method lends itself well to all kinds of other patterns, including the Singleton pattern and the Factory pattern.
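For instance, a bare-bones Singleton can be sketched with __new__ as follows. This is one common way to write it, not the only one; the Settings class and its values attribute are hypothetical:

```python
class Settings:
    _instance = None

    def __new__(cls):
        # Create the single shared instance on first use only
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.values = {}
        return cls._instance

first = Settings()
second = Settings()
first.values["debug"] = True
print(first is second, second.values)
```

Note that if this class defined __init__, it would run on every call to Settings(), because __new__ returns an instance of the class each time; that is why the one-time setup lives in __new__ here.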

There are some very useful patterns covered in this Github repository: https://github.com/faif/python-patterns

Python Patterns: The Registry

Posted in Python


Overview

This post is a summary of a useful Python programming pattern called the Registry pattern.

The Registry pattern is a way of keeping track of all subclasses of a given class.

More details about this pattern are available at https://github.com/faif/python-patterns.

What is the Registry Pattern?

Let's start with a common scenario: you have some kind of manager class that is managing multiple related subclasses, and you are looking for a simpler way to manage the subclasses.

The registry pattern creates a map of labels to class types, and allows you to access a single registry common to all of those classes. The registry is updated every time a new class is added.

This is useful for several situations, including these two examples:

  • You want the manager class to iterate over every available subclass and call a particular method on each subclass
  • You want to streamline a factory class, which takes an input label (like the name of a class) and creates/returns an object of that type

The Registry Base Type

We start by defining a registry base type (or class). This class defines one behavior, which is adding each newly defined class to the class registry. It creates a class attribute called REGISTRY, which is shared amongst all of the classes and is also accessible via a class method.

This is the registry metaclass; any class we want registered should use it as its metaclass, either directly or by inheriting from a class that does. Because it is a metaclass, it extends the type class:

class RegistryBase(type):

    REGISTRY = {}

    def __new__(cls, name, bases, attrs):
        # create the class object for whichever class is being defined
        # (a base class first, then each of its child classes)
        new_cls = type.__new__(cls, name, bases, attrs)
        cls.REGISTRY[new_cls.__name__] = new_cls
        return new_cls

    @classmethod
    def get_registry(cls):
        return dict(cls.REGISTRY)

The design here is subtle, but the details are important to understand how it works.

A few important things to note, progressing from top to bottom:

  • The REGISTRY variable is defined in the class body, outside any method, meaning it is a class attribute: a single dictionary shared by RegistryBase and every class created through it. This shared state is similar in spirit to the Borg pattern.

  • The RegistryBase class defines __new__, not __init__. Because RegistryBase is a metaclass, its __new__ method runs each time a class that uses it is defined, whereas the __init__ method of a registered class only runs when that class is instantiated. This ensures that any subclasses that inherit from a registered base class add themselves to the registry when they are defined, not when they are instantiated.

  • In the __new__ method, we create the class object being defined (and keep a reference to it) using this line:

new_cls = type.__new__(cls, name, bases, attrs)

This creates a new class object whose type is RegistryBase (confusing, but basically it means we store a reference to the class itself, not to a particular instance of it).

Note that each value stored in the registry is therefore a class, so it is callable: calling it creates an object of that type.

  • We define a get_registry method to return a copy of the registry; this is decorated with @classmethod (not @staticmethod) so that it receives cls and can reach the registry, which is a class attribute.

BONUS NOTE: From prior experience, we have found it easiest to ignore the case of the label (uppercase, lowercase, CamelCase, etc) and convert all class names to lowercase in the registry:

        cls.REGISTRY[new_cls.__name__.lower()] = new_cls

The Base Registered Class

Now we can define a base class that uses the RegistryBase class. Because RegistryBase is a metaclass (a subclass of type), we shouldn't inherit from it directly - we should use it as a metaclass.

class BaseRegisteredClass(metaclass=RegistryBase):
    pass

Any class that inherits from this BaseRegisteredClass will now be included in the registry when it is defined.

Extending the Base Registered Class

The next step is to use the base registered class to start creating interesting classes:

class ExtendedRegisteredClass(BaseRegisteredClass):
    def __init__(self, *args, **kwargs):
        pass

We leave the constructor empty, since what matters here is when the class gets registered, not what its instances do.

Seeing it in action

Now we can see the process of adding subclasses to the registry in action.

Start by checking the registry before we have created any subclasses:

>>> print(list(RegistryBase.REGISTRY))
['BaseRegisteredClass']

Next, define the extended registered class:

>>> class ExtendedRegisteredClass(BaseRegisteredClass):
...     def __init__(self, *args, **kwargs):
...         pass

Remember, we add the new class to the registry in the metaclass's __new__ method, which runs when the class is defined, so we don't even need to instantiate an ExtendedRegisteredClass object for it to be added to the registry. Check the registry again:

>>> print(list(RegistryBase.REGISTRY))
['BaseRegisteredClass', 'ExtendedRegisteredClass']
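Because the registry maps names to the classes themselves, it can back a simple factory. The sketch below repeats the registry definitions so that it runs on its own; the create() helper is a hypothetical addition, not part of the pattern as described above:

```python
class RegistryBase(type):

    REGISTRY = {}

    def __new__(cls, name, bases, attrs):
        new_cls = type.__new__(cls, name, bases, attrs)
        cls.REGISTRY[new_cls.__name__] = new_cls
        return new_cls

class BaseRegisteredClass(metaclass=RegistryBase):
    pass

class ExtendedRegisteredClass(BaseRegisteredClass):
    pass

def create(label, *args, **kwargs):
    """Hypothetical factory helper: look up a class by name and instantiate it."""
    try:
        registered_cls = RegistryBase.REGISTRY[label]
    except KeyError:
        raise ValueError(f"Unknown class label: {label!r}") from None
    return registered_cls(*args, **kwargs)

obj = create("ExtendedRegisteredClass")
print(type(obj).__name__)
```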

Examples

Let's consider a specific example to help illustrate the usefulness of the Registry pattern: adding the ability to index different kinds of documents to a search engine.

Example: Search Engine

Suppose we are writing a search engine, and we are working on the search engine backend. Specifically, consider the backend portion that iterates over a group of documents, checks if the document is already in the search index, and either adds a new item to the search index, updates an existing item in the search index, or deletes an item from the search index.

Registry Subclasses

A search engine may index multiple kinds of documents, each living in different locations. For example, a search index may index .docx files in a Google Drive folder, Github issues, and/or a local folder full of Markdown files. For each of these document types, we must define:

  • How to add or update all documents in the document storage system (Google Drive, Github API, local filesystem, etc.); this method should be able to get a list of documents of this document type that are already in the search index, either by running a query itself, or by accepting a list of documents already in the index as an input argument

  • What schema to use (mapping various field names to data types) for documents of this type

  • How to display search results when they are documents of this type (i.e., how to display what fields when showing the user search results)

This can be done by defining classes corresponding to different document types, where each class defines how to do the above actions, and wraps them in high-level API methods that the manager classes can call.

The manager classes want a way to get a list of available document types, and to call each document type's high-level API methods. This can be done with the registry.

Doctype Registry Base Type

Start by defining a metaclass that will register new subclasses in a doctype registry. Note that as per the __new__ documentation, the __new__ method takes the class being instantiated as its first argument cls (in our case the metaclass, a subclass of type), and it should return the new object (in our case the newly defined class, which is an instance of the metaclass).

class DoctypeRegistryBase(type):

    DOCTYPE_REGISTRY = {}

    def __new__(cls, name, bases, attrs):
        new_cls = type.__new__(cls, name, bases, attrs)
        cls.DOCTYPE_REGISTRY[new_cls.__name__.lower()] = new_cls
        return new_cls

    @classmethod
    def get_doctype_registry(cls):
        return dict(cls.DOCTYPE_REGISTRY)

Base Registered Doctype Class

Next, we define a base class for any document type that we want registered in the doctype registry, using DoctypeRegistryBase as our metaclass (because it is a subclass of type, as opposed to a normal class). All document types are required to define three bits of behavior: interacting with the document storage system to iterate over all documents; extracting information from individual documents; and displaying a given document type when it is a search result. We define these virtual methods on the base registered doctype class.

class BaseRegisteredDoctypeClass(metaclass=DoctypeRegistryBase):

    def add_update_delete(self, *args, **kwargs):
        """
        Iterate over all documents in the document storage system
        and add/update/delete documents as needed.
        """
        raise NotImplementedError()

    def get_schema(self, *args, **kwargs):
        """
        Assemble and return the schema (map of field names to data types)
        """
        raise NotImplementedError()

    def render_search_result(self, *args, **kwargs):
        """
        Use a Jinja template to render a single search result of type doctype
        to display to the user via the web interface
        """
        raise NotImplementedError()

Derived Registered Doctype Class

Now that we have a base class, we can define child classes that have the specific API functionality needed.

Here is a high-level sketch of what the Github issue document type class might look like, starting with the add/update/delete method, which boils down to some set operations to determine what documents to add, update, or delete from the search index.

(Also note, these methods are defined as class methods because they are called once for each doctype class, but this does not necessarily need to be the case.)

class GithubIssueDoctype(BaseRegisteredDoctypeClass):

    @classmethod
    def add_update_delete(cls, *args, **kwargs):
        # Get set of indexed document IDs
        indexed_docs = set()
        results = run_search_index_query(doctype = cls.__name__.lower())
        for result in results:
            indexed_docs.add(result.id)

        # Get set of remote document IDs
        remote_docs = set()
        for org_name in get_org_names():
            for repo_name in get_repo_names(org_name):
                for file in get_repo_files(repo_name, org_name):
                    remote_docs.add(file.id)

        # Do some set math to figure out what to add/update/delete
        add_ids = remote_docs.difference(indexed_docs)
        # Documents present both remotely and in the index get updated
        update_ids = indexed_docs.intersection(remote_docs)
        delete_ids = indexed_docs.difference(remote_docs)

        # Do the operations
        for add_id in add_ids:
            add_document_to_index(add_id)
        for update_id in update_ids:
            update_document_in_index(update_id)
        for delete_id in delete_ids:
            delete_document_in_index(delete_id)

    @classmethod
    def get_schema(cls, *args, **kwargs):
        ...

    @classmethod
    def render_search_result(cls, *args, **kwargs):
        ...
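The set arithmetic in add_update_delete() can be checked on a tiny example. This sketch assumes that "update" means documents present both in the index and in the remote store; the document IDs are made up:

```python
# Hypothetical document IDs already in the search index
indexed_docs = {"doc1", "doc2", "doc3"}
# Hypothetical document IDs currently in the remote storage system
remote_docs = {"doc2", "doc3", "doc4"}

# Remote only: new documents to add to the index
add_ids = remote_docs - indexed_docs
# In both places: existing documents to refresh
update_ids = indexed_docs & remote_docs
# Index only: documents that no longer exist remotely
delete_ids = indexed_docs - remote_docs

print(sorted(add_ids), sorted(update_ids), sorted(delete_ids))
```

The three resulting sets are disjoint, so no document is added, updated, and deleted in the same pass.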

Now we do the same for documents on Google Drive, defining a different add_update_delete() method specific to Google Drive documents (note that while the sketch of this method looks similar to the Github doctype above, the implementation will look more and more different as we include more and more detail in our method and API calls).

As before, we make these methods class methods.

class GoogleDocsDoctype(BaseRegisteredDoctypeClass):

    @classmethod
    def add_update_delete(cls, *args, **kwargs):
        # Get set of indexed document IDs
        indexed_docs = set()
        results = run_search_index_query(doctype = cls.__name__.lower())
        for result in results:
            indexed_docs.add(result.id)

        # Get set of remote document IDs
        remote_docs = set()
        for file in get_gdrive_file_list():
            remote_docs.add(file.id)

        # Do some math to figure out what to add/update/delete
        add_ids = remote_docs.difference(indexed_docs)
        # Documents present both remotely and in the index get updated
        update_ids = indexed_docs.intersection(remote_docs)
        delete_ids = indexed_docs.difference(remote_docs)

        # Do the operations
        for add_id in add_ids:
            add_document_to_index(add_id)
        for update_id in update_ids:
            update_document_in_index(update_id)
        for delete_id in delete_ids:
            delete_document_in_index(delete_id)

    @classmethod
    def get_schema(cls, *args, **kwargs):
        ...

    @classmethod
    def render_search_result(cls, *args, **kwargs):
        ...

Using the Registry

The last step is to actually use the registry from the class that manages all of our subclasses. In this case, we have a Search class that defines high-level operations (such as, "add/update/delete all documents indexed by this search engine") and in turn calls the corresponding method for each doctype that has been registered.

class Search(object):

    def add_update_delete_all(self, *args, **kwargs):

        # Iterate over every doc type
        for doctype_name in DoctypeRegistryBase.DOCTYPE_REGISTRY:

            # Get a handle to the doctype class
            doctype_class = DoctypeRegistryBase.DOCTYPE_REGISTRY[doctype_name]

            # The base class registers itself too; skip it, since its
            # methods only raise NotImplementedError
            if doctype_class is BaseRegisteredDoctypeClass:
                continue

            # Call the class method add_update_delete on the doctype class
            doctype_class.add_update_delete(*args, **kwargs)

            # Note: if we need to create an instance first
            # doctype_instance = doctype_class()
            # doctype_instance.add_update_delete(*args, **kwargs)
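To see the whole flow run end to end, here is a stripped-down sketch in which the real indexing work is replaced by appending the class name to a list. The log argument and the simplified method bodies are stand-ins for illustration, not the search engine's actual logic:

```python
class DoctypeRegistryBase(type):

    DOCTYPE_REGISTRY = {}

    def __new__(cls, name, bases, attrs):
        new_cls = type.__new__(cls, name, bases, attrs)
        cls.DOCTYPE_REGISTRY[new_cls.__name__.lower()] = new_cls
        return new_cls

class BaseRegisteredDoctypeClass(metaclass=DoctypeRegistryBase):
    @classmethod
    def add_update_delete(cls, log):
        raise NotImplementedError()

class GithubIssueDoctype(BaseRegisteredDoctypeClass):
    @classmethod
    def add_update_delete(cls, log):
        # Stand-in for the real add/update/delete work
        log.append(cls.__name__)

class GoogleDocsDoctype(BaseRegisteredDoctypeClass):
    @classmethod
    def add_update_delete(cls, log):
        log.append(cls.__name__)

class Search:
    def add_update_delete_all(self, log):
        for doctype_class in DoctypeRegistryBase.DOCTYPE_REGISTRY.values():
            # The base class is registered as well, so skip it
            if doctype_class is BaseRegisteredDoctypeClass:
                continue
            doctype_class.add_update_delete(log)

log = []
Search().add_update_delete_all(log)
print(log)
```

Adding a third doctype only requires defining another subclass; the Search class picks it up without any changes.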

Further Modifications

In the example above, we don't have any need to create specific instances of the document type classes, since they do not need to preserve their state between calls, and we only need one instance of the doctype class per search engine.

We could re-implement this pattern, modifying the search engine doctype classes to be normal, non-static classes. This would allow us to have, say, two instances of the Github issues doctype class (corresponding to two different sets of Github API credentials, or two different Github accounts), or two instances of the Google Drive doctype class (corresponding to two different Google Drive folders). In this case, we would want to restructure the Search class to instantiate and save doctype class instances in the constructor, and use them later.

Here's an example Search class that would instantiate one of each subclass in the constructor, to be used in later methods:

class Search(object):

    def __init__(self, *args, **kwargs):

        # Store instances of each doctype
        self.all_doctypes = []

        # Iterate over every doc type
        for doctype_name in DoctypeRegistryBase.DOCTYPE_REGISTRY:

            # Get a handle to the doctype class
            doctype_class = DoctypeRegistryBase.DOCTYPE_REGISTRY[doctype_name]

            # The base class registers itself too; skip it
            if doctype_class is BaseRegisteredDoctypeClass:
                continue

            # Create an instance of type doctype_class
            doctype_instance = doctype_class()

            # Save for later
            self.all_doctypes.append(doctype_instance)

    def add_update_delete(self, *args, **kwargs):

        # Call the add_update_delete method on each doctype instance
        for doctype_instance in self.all_doctypes:
            doctype_instance.add_update_delete(*args, **kwargs)

(Note that this would work best if we removed the @classmethod decorators from the doctype classes defined above.)

Summary

In this post, we covered a useful programming pattern, the Registry, and showed it in action for one example - allowing a search engine index to register various document type subclasses, then use those registered subclasses to propagate calls to all document types later.

Tags:    python    programming    patterns    design patterns    registry    computer science   
