2010-04-23

Python lexical closures

I just learnt some nuances of Python’s lexical closures. Since I had no experience with closures in other languages, particularly Perl (which behaves differently), I had the small advantage of having nothing to unlearn. Anyway, the key point is that a Python closure binds to the variable (watch, I’m going to screw up the terminology) rather than to the value the variable held when the closure was created.
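
To make "variable rather than value" concrete, here is a minimal sketch. The name `make_reader` is made up for illustration, and it is written for Python 3, unlike the Python 2 listings in this post. The lambda is created while `x` is `'first'`, but it reports whatever `x` holds when it is finally called.

```python
def make_reader():
    x = 'first'
    reader = lambda: x   # closes over the variable x, not the string 'first'
    x = 'second'         # rebind x after the lambda already exists
    return reader

print(make_reader()())   # prints: second
```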


In particular, I wanted to loop over a bunch of strings, each of which was a simple regexp pattern, and create a substitution function for each one. Basically, I wanted to anonymize an IRC log file, so I predefined a dictionary mapping real handles to anonymized handles.


My problem was that the naive way of doing things gave me only the last substitution in my dictionary.


 
    import re

    lines = ['Adam: hello Charlie', 'Barbara: hi Adam', 'Charlie: howdy Barbara']
    subs = {'Adam': 'Nobody', 'Barbara': 'Somebody', 'Charlie': 'Dr. Who'}

    relist = []
    for k,v in subs.iteritems():
        relist.append(lambda x: re.compile(k).sub(v,x))

    newlines = []
    for l in lines:
        nl = l
        for s in relist:
            nl = s(nl)
        newlines.append(nl)

    for l in newlines: print l


Which produces this output:

 
    Nobody: hello Charlie
    Barbara: hi Nobody
    Charlie: howdy Barbara


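Obviously wrong. Only one substitution worked. Each lambda closed over the variables k and v, not the values they held when the lambda was created. The lambdas only look up k and v when they are called, which happens after the loop over subs has finished, so every one of them uses the last (k,v) pair.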
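
A quick way to check this is to build the list the naive way and confirm that every function in it does the same thing. This is only a sketch, written for Python 3 (so `items()` instead of `iteritems()`), and `build_naive` is a name I made up for it. Which pair ends up being "last" depends on the dictionary's iteration order, so the check compares against whatever pair the loop finished on.

```python
import re

def build_naive(subs):
    relist = []
    for k, v in subs.items():
        # k and v are looked up when the lambda is called, not here
        relist.append(lambda x: re.compile(k).sub(v, x))
    # k and v outlive the loop and hold the last pair visited
    return relist, (k, v)

subs = {'Adam': 'Nobody', 'Barbara': 'Somebody', 'Charlie': 'Dr. Who'}
relist, (last_k, last_v) = build_naive(subs)

text = 'Adam Barbara Charlie'
results = [s(text) for s in relist]

print(len(set(results)))                           # prints: 1
print(results[0] == text.replace(last_k, last_v))  # prints: True
```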


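Here are two correct ways to do this. You have to force the current value of (k,v) to be captured by each substitution function, either through default arguments, which are evaluated when the function is defined, or by passing the pair to an outer function that is called immediately.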


 
    import re

    def dosub(relist, lines):
        newlines = []
        for l in lines:
            nl = l
            for s in relist:
                nl = s(nl)
            newlines.append(nl)
        return newlines

    def right(lines):
        subs = {'Adam': 'Nobody', 'Barbara': 'Somebody', 'Charlie': 'Dr. Who'}

        # the long and clear way
        relist = []
        for k,v in subs.iteritems():
            def subber(x,k=k,v=v):
                return re.compile(k).sub(v, x)
        
            relist.append(subber)

        newlines = dosub(relist, lines)

        for l in newlines: print l
        print '==================='

        # the short and opaque way
        relist = [(lambda kv: lambda x: re.compile(kv[0]).sub(kv[1], x))((k,v)) for k,v in subs.iteritems()]

        newlines = dosub(relist, lines)
        for l in newlines: print l

    if __name__ == '__main__':
        lines = ['Adam: hello Charlie', 'Barbara: hi Adam', 'Charlie: howdy Barbara']
        for l in lines: print l
        print '==================='
        right(lines)


Which gives the desired output:


    Adam: hello Charlie
    Barbara: hi Adam
    Charlie: howdy Barbara
    ===================
    Nobody: hello Dr. Who
    Somebody: hi Nobody
    Dr. Who: howdy Somebody
    ===================
    Nobody: hello Dr. Who
    Somebody: hi Nobody
    Dr. Who: howdy Somebody
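
The "short and opaque" version works because every call to the outer lambda creates a fresh scope, so each inner lambda gets its own variables. The same idea may read more easily with a named factory function. This is a sketch rather than what the original answer proposed: `make_subber` and `anonymize` are hypothetical names, it targets Python 3, and it also compiles each pattern once instead of on every call.

```python
import re

def make_subber(pattern, replacement):
    """Return a function that replaces pattern with replacement."""
    compiled = re.compile(pattern)  # compiled once, when the subber is made

    def subber(text):
        # pattern/replacement belong to this call of make_subber only
        return compiled.sub(replacement, text)

    return subber

def anonymize(lines, subs):
    """Apply every substitution in subs to every line."""
    subbers = [make_subber(k, v) for k, v in subs.items()]
    result = []
    for line in lines:
        for subber in subbers:
            line = subber(line)
        result.append(line)
    return result

lines = ['Adam: hello Charlie', 'Barbara: hi Adam', 'Charlie: howdy Barbara']
subs = {'Adam': 'Nobody', 'Barbara': 'Somebody', 'Charlie': 'Dr. Who'}

for line in anonymize(lines, subs):
    print(line)
```

Because none of the replacement handles contains one of the real handles, the order in which the substitutions run does not matter here.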


Credit is due to the contributor piro at Stack Overflow who responded to a question.


Elsewhere, someone has done a nice study on various methods of concatenating a list of strings in Python. It turns out that the one-liner using a list comprehension is the most efficient.
