My style

This could be way longer, but I thought I would show my version of the term frequency problem from Exercises in Programming Style.

The goal of the programs is "Given a text file, we want to display the N (e.g. 25) most frequent words and corresponding frequencies ordered by decreasing value of frequency." The programs also throw out stop words (the, a, an, etc.) and standardize capitalization.

Run on the Gettysburg Address, the output looks something like this:

here: 8
nation: 5
can: 5
dedicated: 4
shall: 3
...

Object Oriented Style (From Exercises in Programming Style):

This is in a style called object oriented programming, where part of the goal is to encapsulate complexity into objects.

import sys,re,operator,string

from abc import ABCMeta

# Python 3 spelling; a __metaclass__ class attribute is ignored there.
class TFExercise(metaclass=ABCMeta):

    def info(self):
        return self.__class__.__name__

class DataStorageManager(TFExercise):

    def __init__(self, path_to_file):
        with open(path_to_file) as f:
            self._data = f.read()
        pattern = re.compile(r"[\W_]+")
        self._data = pattern.sub(' ',self._data).lower()

    def words(self):
        return self._data.split()

class StopWordManager(TFExercise):

    def __init__(self):
        with open('./stop_words.txt') as f:
            self._stop_words = f.read().split(',')
        self._stop_words.extend(list(string.ascii_lowercase))

    def is_stop_word(self,word):
        return word in self._stop_words


class WordFrequencyManager(TFExercise):

    def __init__(self):
        self._word_freqs = {}

    def increment_count(self, word):
        if word in self._word_freqs:
            self._word_freqs[word] += 1
        else:
            self._word_freqs[word] = 1

    def sorted(self):
        return sorted(self._word_freqs.items(), key=operator.itemgetter(1), reverse=True)

class WordFrequencyController(TFExercise):

    def __init__(self,path_to_file):
        self._storage_manager = DataStorageManager(path_to_file)
        self._stop_word_manager = StopWordManager()
        self._word_freq_manager = WordFrequencyManager()

    def run(self):
        for w in self._storage_manager.words():
            if not self._stop_word_manager.is_stop_word(w):
                self._word_freq_manager.increment_count(w)

        word_freqs = self._word_freq_manager.sorted()
        for (w,c) in word_freqs[0:25]:
            print(w, ' - ',c)


WordFrequencyController(sys.argv[1]).run()

Modern, hide everything away style.

My version

import string
import sys

def read_file(file_name):
    with open(file_name,"r") as f:
        return f.read()

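def top_25(file_name):
    # Read the text and normalize capitalization.
    file_data = read_file(file_name).lower()

    # The stop words file is comma-separated.
    stop_words = read_file('./stop_words.txt').split(',')

    # Replace punctuation and newlines with spaces.
    punc_free = ''.join([c if c not in string.punctuation + '\n' else ' ' for c in file_data])

    # Split on spaces and drop the empty strings left by repeated spaces.
    words = [word for word in punc_free.split(' ') if len(word)]

    filtered_words = [word for word in words if word not in stop_words]

    # Count each distinct word.
    freq_dict = {word: filtered_words.count(word) for word in set(filtered_words)}

    # Sort by count, highest first, and keep the top 25.
    return sorted(list(freq_dict.items()), key=lambda tup: tup[1], reverse=True)[:25]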

print(top_25(sys.argv[1]))


The exposed brick style

My style

Everything here is done for a reason. Younger me would probably have compressed it even further to show off, but literate code is a better design goal, both for yourself and for others who have to read it. I am still not fully happy with this; there are a few things I go back and forth on, like the .lower() after the read_file call.

My current style is the fusion of functional programming and data programming.

The main part of the file is trying to be stepwise literate, meaning anyone coming into this program can read it quickly from top to bottom and understand the entire flow. This is a small example, but I still do this with larger programs. I hate code where I can't quickly get the gist of what it is trying to do.

This combines some functional programming style with a data programming style. Everything is a transformation, and the transformations happen in steps. I like having basic steps that can be broken apart or changed and, most importantly, debugged. When the code tries to do everything at once, as it would if this were one big for loop reading the text and doing each step one word at a time, it sucks to debug, log, or change.

With code like this, it is easy to log each step and see whether that step is correct. When code is more nested, any change could break the whole thing. Here each step builds on the previous one: if the previous step is correct, you can forget about it and move on.
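As a rough sketch of what that per-step logging might look like, the snippet below runs a cut-down version of the pipeline on an inline string. The `log_step` helper, the sample text, and the two-word stop list are made up for illustration; they are not part of the program above.

```python
import string

def log_step(name, value):
    # Hypothetical helper: print a short preview of a step's result,
    # then hand the value back unchanged so the chain keeps going.
    print(f"{name}: {str(value)[:60]}")
    return value

text = "The nation, the people; a nation."   # assumed sample input
stop_words = ["the", "a"]                     # assumed sample stop list

lowered = log_step("lowered", text.lower())
punc_free = log_step(
    "punc_free",
    "".join(c if c not in string.punctuation else " " for c in lowered),
)
words = log_step("words", punc_free.split())
filtered = log_step("filtered", [w for w in words if w not in stop_words])
```

Because every step has a name and a value, any one of them can be checked alone without touching the rest.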

There are also no if statements or explicit loops. Under the hood the list comprehensions are loops, and their filters are conditionals, but none of them are typed out as separate statements. The reason is that a large number of bugs are introduced when writing conditionals or the end conditions of loops. Boundary conditions are a pain and easy to screw up, so it is best to try not to write them. Let the code process a whole array or data structure, then filter out what you don't need.
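To make the boundary-condition point concrete, here is a small comparison on made-up sample data. Both function names are hypothetical. The first walks the list by index and has to get the end condition right; the second hands the whole list to a comprehension and has no end condition to get wrong.

```python
words = ["four", "score", "and", "seven", "years", "ago"]  # assumed sample data
stop_words = ["and", "ago"]

def filter_with_loop(words, stop_words):
    # Index-based loop: writing `<=` instead of `<` below would raise an
    # IndexError, which is the kind of boundary bug described above.
    kept = []
    i = 0
    while i < len(words):
        if words[i] not in stop_words:
            kept.append(words[i])
        i += 1
    return kept

def filter_whole_list(words, stop_words):
    # Process the whole list and keep only what is needed.
    return [w for w in words if w not in stop_words]

print(filter_whole_list(words, stop_words))
```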

There are also no new data structures. When possible, always stick to native data structures so that other code can just use them. Cognitive load is part of the domain too.
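A small illustration of why that matters, assuming a result in the same shape `top_25` returns (a list of `(word, count)` tuples). The values are the first three lines of the Gettysburg output shown earlier, and the variable names are made up. Nothing here needs to know where the list came from.

```python
import json

# Assumed sample result: a plain list of (word, count) tuples.
top_words = [("here", 8), ("nation", 5), ("can", 5)]

as_dict = dict(top_words)               # look up a count by word
first_word, first_count = top_words[0]  # plain tuple unpacking
total = sum(c for _, c in top_words)    # built-in aggregation
as_json = json.dumps(top_words)         # serializes with no custom encoder
```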

At a high level, here are some of the things I am trying to capture in my style:

  • Lower cognitive load: readability, basic data structures, small functions in a functional style
  • Reduce state, or make it explicit
  • Limit variables, since naming is hard, and limit conditionals and basic loops, since they are a common source of bugs
  • Referential transparency - every function should be easy to read and stateless
  • Composability - Use native or basic data structures and stateless functions to aid in composability.
  • Solve problems as a chain of separable operations; this makes debugging and changes easy.
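One way the last two points could look if the steps were pulled out into named functions is sketched below. This is an illustration, not a rewrite of the program above: the function names, the sample sentence, and the stop list are all invented. Each function takes plain values and returns plain values, so they can be chained, reordered, or reused on their own.

```python
import string

def strip_punctuation(text):
    # Replace every punctuation character with a space.
    return "".join(c if c not in string.punctuation else " " for c in text)

def remove_stop_words(words, stop_words):
    return [w for w in words if w not in stop_words]

def count_words(words):
    # Plain dict of word -> count.
    return {w: words.count(w) for w in set(words)}

def top_n(freqs, n):
    return sorted(freqs.items(), key=lambda pair: pair[1], reverse=True)[:n]

text = "A new nation, a nation so dedicated."   # assumed sample input
words = strip_punctuation(text.lower()).split()
kept = remove_stop_words(words, ["a", "so"])
result = top_n(count_words(kept), 1)
print(result)  # [('nation', 2)]
```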