A Pythonic n-wise Iterator for Any Iterable

01 June 2010

Over the weekend, I was working on upgrading python-ngrams because I had discovered a bug where the tokenization was incorrect. I was reading a research paper that was describing q-grams and while following their examples, I realized I was getting incorrect results for a fundamental n-gram result.

The tokenization that's required here is quite simple, given some iterable consisting of values [x0, x1, x2, x3...,xi], produce an exhaustive iterator of n-tuples such that it satisfies (x0,...xn), (x1,...xn+1), (xm,...xi).

To make this easier to understand, if given a list of [0, 1, 2, 3, 4, 5, 6], I want to be able to return an iterator such that:

for first, second, third in n_wise([0, 1, 2, 3, 4, 5, 6], 3):
    print first, second, third

Will have a result of:

0, 1, 2
1, 2, 3
2, 3, 4
3, 4, 5
4, 5, 6

Fortunately, this wasn't too difficult as the trivial implementation of this is already done in itertools, under pairwise. Below I've implemented a n-wise iterator implementation that can take any iterator and return n-iterators where each iterator is advanced by a step ahead of the other.

from itertools import tee, izip


def n_wise(iterable, n):
    """
    Returns n iterators for an iterable that are sequentially

    n-wise

    """
    n_iterators = tee(iterable, n)
    zippables = [n_iterators[0]]

    for advance, iteratee in enumerate(n_iterators[1:]):
        advance += 1  # since enumerate is 0 indexed.

        while advance > 0:
           # we advance the iterator ``advance+1`` steps
           next(iteratee, None)
           advance -= 1
           # append everything to the zippables
           zippables.append(iteratee)
           # return the izip expansion of each iterator
    return izip(*zippables)

I find that I sometimes need to open an iterator for a file and I need to read n-wise lines each step, this iterator will do just that. For sake of completeness, look at how concise, powerful, and easy to use this combination is:

# assuming that n_wise is imported into this namespace

from functools import partial

triple_wise_line_reader = partial(n_wise, n=3)

for line1, line2, line3 in triple_wise_line_reader(open("some_file.log", "r")):
    # do some computation with line1, line2 and line3
    do_something(line1, line2, line3)

    # next step:
    # line1 <- line2
    # line2 <- line3
    # line3 <- line4

Imagine doing this in C ;) It will be just ugly!

Please let me know if you find this useful or you have a better implementation to the problem!

This entry was tagged as programming python

blog comments powered by Disqus