Fork me on GitHub

Guide

Introduction

prefixtree provides both PrefixDict, a dictionary like object, and PrefixSet, a set like object. Both are implemented using prefix trees, or tries.

Tries

Tries, also known as prefix trees, are an ordered tree data structure. Trie minimise the ammount of memory required to store keys if the keys frequently share the same prefix.

In addition to minimising memory, the keys in tries are ordered. This allows prefix tree based dicts and sets to support slicing operations.

Note

Memory minimisation is an academic property of the data structure. Comparing a pure Python trie to an optimised C hash table may not demonstrate any memory savings.

Keys

The keys used in prefixtree collections must be a str or bytes object. Unicode strings will be encoded to bytes before storage and after retrieval. Because of this '\u2641' and b'\xe2\x99\x81' are equivalent keys.

Installation

Use pip to install prefixtree from PyPI.

$ pip install --use-mirrors filemagic

The --use-mirrors argument is optional. However, it is a good idea to use this option as it both reduces the load on PyPI as well as continues to work if PyPI is unavailable.

The prefixtree module should now be availabe from the Python shell.

>>> import prefixtree

The sections bellow will describe how to use PrefixDict and PrefixSet.

PrefixDict

This dictionary like object, implemented using a trie, is an implementation of the MutableMapping Abstract Base Class. PrefixDict supports the same construction methods as the builtin dict object.

>>> from prefixtree import PrefixDict
>>> pd = PrefixDict()
>>> pd['a'] = Ellipsis
>>> 'a' in pd
True

The most significant difference between PrefixDict and builtin dict object is that PrefixDict supports using a slice when getting, setting and deleting keys. When a slice is used to get values an iterator is returned.

>>> pd.update([('a', 0), ('b', 1), ('c', 2)])
>>> list(pd['a':'b'])
[0, 1]

Unlike slices for sequence objects, such as list and tuple, slices on PrefixDict are inclusive of both the start and the stop values. Step values of 1 and -1 are supported. Indicating forward and reverse iteration.

>>> list(pd['a':'c':-1])
[2, 1, 0]

When setting a range of values using a slice from a PrefixDict, the new values are iterated over in order, replacing the current values from the slice.

>>> pd[:'b'] = [3, 4]
>>> pd['a']
3
>>> pd['b']
4

If there are fewer new values than there are values in the slice an ValueError exception is raised. The exception i raised after updating all possible values from the PrefixDict.

>>> pd['b':] = [5]
Traceback (most recent call last):
    ...
ValueError: Fewer new elements to than slice length
>>> pd['b']
5

Deleting slices works similar to getting slices. They are also inclusive of both the start and the stop value.

>>> del pd['b':'b']
>>> 'b' in pd
False

In addition to the standard dict interface, a PrefixDict has the following additional methods.

commonprefix`() returns the longest common prefix between the supplied key and the keys already in the PrefixDict.

>>> pd.commonprefix('aa')
'a'

startswith() iterates over all keys that begin with the supplied prefix.

>>> pd = PrefixDict(aa=0, ab=1, ac=2)
>>> list(pd.startswith('a'))
['aa', 'ab', 'ac']

Matching keys are returned in order. The order can be reversed by passing True for the reverse parameter.

PrefixSet

This set like object, implemented using a trie, is an implementation of the collections.MutableSet. Abstract Base Class.

>>> from prefixtree import PrefixSet
>>> ps = PrefixSet()
>>> ps.add('abc')
>>> 'abc' in ps
True

PrefixSet supports the same construction methods as the builtin set object.