Typed NameTuples

Filed under: programming

Last week I tried a Python 3.6+ featured called NamedTuples using the typing module. This blog post I write my thoughts about using the NamedTuple from the typing module however there’s also a NamedTuple object in the collections module.

Type Checking with Python

In the ideal Python set up you’d have the mypy package installed when using typing to achieve the full benefit of type checking. However, you can still import typing and from typing import * to take advantage of the NamedTuple data structures.

My task required organizing data during transformation in a pipeline from one format in one arrangement to another format in another arrangement. The bits, (or fields for you database folks), are the same in both arrangements. There was a challenge in keeping track of the 30 fields in my mental space, yet more importantly keeping track of them in the code, (1) using the core Python interpreter without use of extra libraries, and (2) making sure the output of the fields is consistent in order in the resulting CSV format.

In my first version of the data transformations code, I used a list of dictionaries to track the passing of data from one function to another. It was tedious to write and also verbose; it was easy to reason with as a prototype. Here’s an example using wine to demonstrate that it’s easy to end up with a lot of lines of code.

wine_rack = []
bottle = {}
bottle["product_name"] = "Remy Martin 1738"
bottle["size"] = "1L"
bottle["price"] = "65.00"
bottle["currency"] = "USD$"
bottle["type"] = "Cognac"
bottle["year"] = "2010"
bottle["flavor"] = "Butterscotch"
bottle["country_of_origin"] = "France"

It’s readable, but I thought I could do better than that in terms of thinking about maintainability of code. More lines of code is not necessarily more productive code. More lines of code is actually more lines of potential typos, so I personally prefer tuples if it’s a list of the same data structure, and with the typing.NamedTuple I could even add in the type information for future reference. Here’s the same example again using the preferred data structure, NamedTuple

Example use of NamedTuple

First you create the class object for the NamedTuple, the order defined here is the order in which you’ll input the information later. You define this once in the part where you define the specs and other business logics, and you can easily update it depending on the business stakeholders’ needs. I really like this format as it allows for easier time working with tentative structure. Say I want to save the information on the store that is selling the bottle and whether I’ve had it before, I can default the values.

from typing import NamedTuple

wine_rack = []
class Bottle(NamedTuple):
  product_name: str
  size: str
  price: float
  currency: str
  type: str
  year: int
  flavor: str
  country_of_origin: str
  store: str = "Local Wine Store"
  tried_before: bool = False

Then later in your code when you’re working to save and transform the data structure, it becomes so much easier to work with, such as below. I can add two bottles to the wine rack and only think about the order and the type of input. With Tuples, the order does matter… so do keep that in mind. It’s not for everyone, but since I usually an pulling data and labeling them with variables, I like tuples for maintaining code.

wine_rack.append(Bottle("Remy Martin 1738", "1L", "65.00", "USD$", "Cognac", "2010", "Butterscotch", "France"))

wine_rack.append(Bottle("Dom Perignon", "750ml", "150", "USD$", "Champagne", "2010", "Citrus", "France", "Wine.com", True))

I had one function focused on the management of that list of tuples. In another function, the wine_rack list object was modified with list comprehension make it list of lists menu = [[bottle[attr] for attr in bottle._fields] for bottle in wine_rack] to writerows(menu) to a CSV file. It was super easy to pass that original list along from stage to stage of the data transformation and feel the guarantee that the order was right, the types matched, and that there were far fewer lines of code to maintain down the road.

That’s how easy it was! The choice between tuples and dictionaries as the base data structure isn’t always clear. For this project I enjoyed using tuples.