Fork me on GitHub

Overview

Duck Typing

prefixtree provides two container classes, PrefixDict and PrefixSet. They are implementations of the MutableMapping and MutableSet Abstract Base Classes. Any modules that adhere to the principle of duck-typing should be able to accept a PrefixtDict or PrefixSet in place of a dict or set.

Compatability

prefixtree is implemented to be compatible with Python 2.x and Python 3.x. It has been tested against the following Python implementations:

  • CPython 2.6
  • CPython 2.7
  • CPython 3.2
  • PyPy 1.9.0

Continuous integration testing is provided by Travis CI.

Benchmarks

Benchmarks show that prefixtree is 200 times slower than the builtin dict and requires 10 times the memory.

Collection Memory Run Time
dict 40MB 0.34s
PrefixDict 453MB 67s

The following script was used to benchmark the memory usage and cpu utilisation of PrefixDict.

"""Test memory consumption and processing time with PrefixDict.

Use words from '/usr/share/dict/words' as keys for PrefixDict and measure
memory consumption and load time.
"""
import resource
import sys

from prefixtree import PrefixDict


if __name__ == '__main__':
    start = resource.getrusage(resource.RUSAGE_SELF)
    glossary = PrefixDict()
    with open('/usr/share/dict/words') as words:
        for word in words:
            glossary[word.strip()] = None
    stop = resource.getrusage(resource.RUSAGE_SELF)
    rss_mb = (stop.ru_maxrss- start.ru_maxrss) / 1024.0 / 1024.0
    tused = (stop.ru_utime + stop.ru_stime)
    sys.stdout.write('{0} MB\n'.format(rss_mb))
    sys.stdout.write('{0} seconds\n'.format(tused))

They are compared to the results for the following script testing the builtin dict:

"""Test memory consumption and processing time with builtin dict.

Use words from '/usr/share/dict/words' as keys for dict and measure memory
consumption and load time.
"""
import resource
import sys


if __name__ == '__main__':
    start = resource.getrusage(resource.RUSAGE_SELF)
    glossary = {}
    with open('/usr/share/dict/words') as words:
        for word in words:
            glossary[word.strip()] = None
    stop = resource.getrusage(resource.RUSAGE_SELF)
    rss_mb = (stop.ru_maxrss- start.ru_maxrss) / 1024.0 / 1024.0
    tused = (stop.ru_utime + stop.ru_stime)
    sys.stdout.write('{0} MB\n'.format(rss_mb))
    sys.stdout.write('{0} seconds\n'.format(tused))

The benchmarks where run using:

  • CPython 3.2, 64-bit
  • Max OSX 10.7.4
  • 2010 Macbook Pro

The benchamrks values were averaged from three runs of each benchmark script.