Now that we have covered many algorithms whose worst-case complexities are known, it is useful to examine how well suited they are to implementation on a broad range of devices and how they perform under real-life conditions, from small home or office systems to larger enterprise and core systems.
Home network routers use a very small forwarding table because they simply pass all outbound traffic to the Internet Service Provider (ISP) gateway, which takes care of all further routing. This is accomplished with a default route; a home router's forwarding table typically contains ten or fewer entries. Such routers do not employ any of the complex algorithms discussed earlier. Instead they use the simplest (naive) approach for finding the best matching prefix: a linear search over the prefixes. The few prefix entries are stored in an array, and the search iterates through each prefix and compares it with the destination address. Whenever a match occurs, it is remembered as the best match so far and the search continues; the best match is updated each time a better (longer) match is found. When the search terminates, the last prefix remembered is returned as the best matching prefix. The time complexity of such a search is O(N).

If the home router has a cache, route caching can be combined with linear search to speed up lookups. The cache stores recently seen 32-bit destination addresses and their associated next-hop information. When a packet arrives at the router, the destination address is extracted and the route cache is consulted first. If an entry exists in the cache, the lookup is complete and that next-hop information is used; otherwise the linear search is invoked and the result is cached. A caching scheme is effective only when there is locality in the packet stream, i.e., the arrival of a packet implies that, with high probability, another packet with the same destination address will arrive in the near future.
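The linear search with a route cache in front of it can be sketched as follows. This is a minimal illustration, not a production router; the prefixes, next-hop names, and class layout are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the naive longest-prefix match described above, with an
// optional route cache consulted first. Prefix and next-hop values are
// illustrative only.
public class LinearSearchLpm {
    static final class Prefix {
        final int bits;      // prefix bits, left-aligned in a 32-bit word
        final int length;    // prefix length in bits (0..32)
        final String nextHop;
        Prefix(int bits, int length, String nextHop) {
            this.bits = bits; this.length = length; this.nextHop = nextHop;
        }
    }

    private final Prefix[] table;                              // small prefix array
    private final Map<Integer, String> cache = new HashMap<>(); // route cache

    LinearSearchLpm(Prefix[] table) { this.table = table; }

    // Scan every prefix; remember the longest one that matches: O(N).
    String lookup(int dest) {
        String hit = cache.get(dest);
        if (hit != null) return hit;                 // cache hit: lookup done
        Prefix best = null;
        for (Prefix p : table) {
            // length 0 is the default route; shifting by 32 is a no-op in Java,
            // so it needs a special case.
            int mask = p.length == 0 ? 0 : -1 << (32 - p.length);
            if ((dest & mask) == p.bits && (best == null || p.length > best.length))
                best = p;                            // better match so far
        }
        String result = best == null ? null : best.nextHop;
        cache.put(dest, result);                     // exploit traffic locality
        return result;
    }

    public static void main(String[] args) {
        // 0.0.0.0/0 is the default route; 10.0.0.0/8 a more specific prefix.
        Prefix[] prefixes = {
            new Prefix(0, 0, "isp-gateway"),
            new Prefix(10 << 24, 8, "lan"),
        };
        LinearSearchLpm r = new LinearSearchLpm(prefixes);
        System.out.println(r.lookup((10 << 24) | 1)); // "lan"
        System.out.println(r.lookup(8 << 24));        // "isp-gateway"
    }
}
```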
For office systems, where the number of packets to be routed and the number of prefixes are both small, simple unibit tries and path-compressed tries such as Patricia may be better choices than the linear search discussed above. For instance, the BSD and Linux implementations of IP forwarding use these algorithms for routing packets that originate from applications; these implementations are typically found on servers and desktop machines. What makes these algorithms satisfactory in an office environment is the data rate they can sustain. For instance, if the memory access time is 10 ns, a lookup consumes at most 320 ns in the worst case (32 memory accesses). This implies a maximum forwarding speed of 3.125 million packets per second (Mpps), i.e., 1/320 ns. Assuming minimum-sized 40-byte IP packets (TCP acknowledgments), this can support link speeds of at most 1 Gbps.
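A unibit trie of the kind referred to above can be sketched in a few lines: one bit of the destination address is consumed per level, so a lookup visits at most 32 nodes, which is exactly the 32-memory-access worst case used in the calculation. The class and prefix values below are assumptions for illustration.

```java
// Minimal unibit (one bit per level) trie sketch. Each lookup walks at most
// 32 nodes, matching the worst case of 32 memory accesses discussed above.
public class UnibitTrie {
    private static final class Node {
        Node left, right;   // child for bit 0 / bit 1
        String nextHop;     // non-null if a prefix ends at this node
    }
    private final Node root = new Node();

    // Insert a prefix of `len` bits (taken from the top of `bits`).
    void insert(int bits, int len, String nextHop) {
        Node n = root;
        for (int i = 0; i < len; i++) {
            boolean one = ((bits >>> (31 - i)) & 1) == 1;
            if (one) { if (n.right == null) n.right = new Node(); n = n.right; }
            else     { if (n.left  == null) n.left  = new Node(); n = n.left;  }
        }
        n.nextHop = nextHop;
    }

    // Walk the address bit by bit, remembering the last prefix seen so far.
    String lookup(int dest) {
        Node n = root;
        String best = n.nextHop;
        for (int i = 0; i < 32 && n != null; i++) {
            n = ((dest >>> (31 - i)) & 1) == 1 ? n.right : n.left;
            if (n != null && n.nextHop != null) best = n.nextHop;
        }
        return best;
    }

    public static void main(String[] args) {
        UnibitTrie t = new UnibitTrie();
        t.insert(0, 0, "default");       // default route at the root
        t.insert(10 << 24, 8, "lan");    // 10.0.0.0/8
        System.out.println(t.lookup((10 << 24) | 5)); // "lan"
        System.out.println(t.lookup(192 << 24));      // "default"
    }
}
```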
“An LC-trie is a compacted version of a path compressed trie in which complete subtries are level compressed from the top down. It uses array-based storage and hence incremental updates are expensive. Since the internal nodes do not contain any next-hop information, a failed search at the leaf requires examination of separate tables, which is inefficient.”
“The Lulea scheme uses multibit tries, but the wasted space in the trie nodes is compressed using bitmaps. Because of compression, the forwarding data structure is very compact, which fits the entire data structure in a cache, and hence lookup times are fast. But the updates are very slow due to compression and leaf pushing. However, the tree bitmap algorithm, by avoiding leaf pushing, provides guaranteed fast update times. Furthermore, the lookup speed and the size of the compressed forwarding data structure are still comparable to Lulea. Also, the tree bitmap scheme provides sufficient flexibility for adapting the implementation to various memory architectures.” Because of such attractive features, it is implemented in larger enterprise and core systems such as the Cisco CRS-1.
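The core bitmap-compression trick shared by Lulea and tree bitmap can be sketched as follows: instead of storing 2^stride child pointers per node, a node stores a bitmap of which children exist plus a dense array, and a child's array slot is found by counting the set bits (popcount) before its position. This is a toy sketch of that one idea only; the real schemes' internal bitmaps and next-hop tables are omitted, and the names used here are assumptions.

```java
// Sketch of the bitmap-compression idea behind Lulea/tree-bitmap nodes:
// bit i of `bitmap` is set iff a child exists for symbol i, and the child's
// slot in the dense `children` array is the popcount of the bits below i.
public class BitmapNode {
    private final int bitmap;        // which of up to 32 children exist
    private final String[] children; // dense array, one slot per set bit

    BitmapNode(int bitmap, String[] children) {
        this.bitmap = bitmap; this.children = children;
    }

    // Return the child for `symbol` (0..31 here), or null if absent.
    String child(int symbol) {
        if ((bitmap & (1 << symbol)) == 0) return null;  // no such child
        // Count set bits strictly below `symbol` to find the dense slot.
        int slot = Integer.bitCount(bitmap & ((1 << symbol) - 1));
        return children[slot];
    }

    public static void main(String[] args) {
        // Children exist for symbols 1, 4, and 9 only: three slots, not 32.
        BitmapNode n = new BitmapNode((1 << 1) | (1 << 4) | (1 << 9),
                                      new String[] { "c1", "c4", "c9" });
        System.out.println(n.child(4)); // "c4"
        System.out.println(n.child(7)); // null
    }
}
```

The compactness comes from the dense array: the node pays one bit per possible child instead of one pointer, which is what lets the whole structure fit in cache.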
The binary search on prefix lengths scales very well to IPv6 addresses, which are 128 bits long. However, its main disadvantage is the use of hashing, which introduces collisions, and hence there is no guarantee on worst-case lookup times. Furthermore, implementing hash tables in hardware is tricky. Hence, the scheme is better suited to software implementation, and a few vendors have already done so.
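The idea can be sketched as one hash table per prefix length, binary-searched by length: a hit (real prefix or "marker" left behind by a longer prefix) sends the search toward longer lengths, a miss toward shorter ones, and each marker carries the best matching prefix found so far so no backtracking is needed. In this toy sketch the lengths are restricted to {8, 16, 24} and the single marker is placed by hand; table contents and names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of binary search on prefix lengths. tables[i] maps the destination
// masked to LENGTHS[i] bits to the best-matching-prefix (bmp) next hop known
// at that entry, for both real prefixes and markers.
public class BinarySearchOnLengths {
    static final int[] LENGTHS = {8, 16, 24};   // toy set of lengths, sorted
    final Map<Integer, String>[] tables;

    @SuppressWarnings("unchecked")
    BinarySearchOnLengths() {
        tables = new Map[LENGTHS.length];
        for (int i = 0; i < tables.length; i++) tables[i] = new HashMap<>();
    }

    static int mask(int dest, int len) { return dest & (-1 << (32 - len)); }

    // A hash hit means some prefix of this length matches: record its bmp and
    // try longer lengths; a miss means try shorter lengths.
    String lookup(int dest) {
        int lo = 0, hi = LENGTHS.length - 1;
        String best = null;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            String bmp = tables[mid].get(mask(dest, LENGTHS[mid]));
            if (bmp != null) { best = bmp; lo = mid + 1; }
            else hi = mid - 1;
        }
        return best;
    }

    public static void main(String[] args) {
        BinarySearchOnLengths b = new BinarySearchOnLengths();
        int p24 = (10 << 24) | (1 << 16) | (1 << 8);   // 10.1.1.0/24
        b.tables[0].put(10 << 24, "A");                // real prefix 10.0.0.0/8
        b.tables[2].put(p24, "B");                     // real prefix 10.1.1.0/24
        // Hand-placed marker on the /24's search path; its bmp is 10/8's hop.
        b.tables[1].put(mask(p24, 16), "A");
        System.out.println(b.lookup(p24 | 7));                // "B"
        System.out.println(b.lookup((10 << 24) | (2 << 16))); // "A"
    }
}
```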
The binary search on prefix ranges provides reasonably fast lookup performance and consumes only a modest amount of storage. Lookup times can be improved further through the use of multiway branching, wider memories, and pipelined implementations. Moreover, unlike the trie schemes, all of which are subject to patents, binary search is free of such restrictions, and this is its most important advantage.
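The range scheme can be sketched as follows: each prefix covers a contiguous range [low, high] of addresses, the range endpoints cut the address space into intervals, each interval is precomputed with its best-matching prefix, and a lookup is then a plain binary search over the interval start points. The two-prefix table and the hand-precomputed cut points below are assumptions for illustration.

```java
import java.util.Arrays;

// Sketch of binary search on prefix ranges. `starts` holds the sorted start
// address of each interval; `nextHops` holds the precomputed best match for
// the interval beginning at the corresponding start.
public class RangeSearch {
    final long[] starts;      // interval start addresses, sorted ascending
    final String[] nextHops;  // best-matching next hop per interval

    RangeSearch(long[] starts, String[] nextHops) {
        this.starts = starts; this.nextHops = nextHops;
    }

    String lookup(long dest) {
        int i = Arrays.binarySearch(starts, dest);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the interval
        // containing dest begins at insertionPoint - 1.
        if (i < 0) i = -i - 2;
        return nextHops[i];
    }

    public static void main(String[] args) {
        // Prefixes: 0.0.0.0/0 -> "default", 10.0.0.0/8 -> "lan".
        // 10/8 covers [10.0.0.0, 10.255.255.255]; the cut points below were
        // precomputed by hand for this toy table.
        long lo10 = 10L << 24;
        RangeSearch r = new RangeSearch(
            new long[]   { 0,         lo10,  11L << 24 },
            new String[] { "default", "lan", "default" });
        System.out.println(r.lookup(lo10 + 99));  // "lan"
        System.out.println(r.lookup(200L << 24)); // "default"
    }
}
```

Because the data is just two sorted arrays, the multiway-branching and pipelining improvements mentioned above amount to searching k boundaries per memory access instead of one.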
Finally, Table 1 compares our three implemented algorithms. The comparison is based solely on the number of nodes visited (i.e., memory accesses) during a lookup operation. For this test, the same 40000 prefixes were inserted into the unibit trie, the Patricia trie, and the fixed-stride multibit trie.
| W | Unibit | Patricia | 1-bit fixed stride | 2-bit fixed stride | 3-bit fixed stride | 4-bit fixed stride | 8-bit fixed stride |
Table 1: Unibit, Patricia, and Multibit Comparison Using the 40000-Prefix Input File
First, it is obvious that the unibit worst case is 32, as expected. It is also worth mentioning that the 1-bit fixed stride behaves identically to the unibit trie. Patricia's worst case of 23, on the other hand, is a large improvement, but its average case is very close to that of the unibit trie; the tree is probably not sparse enough.
Turning to the fixed-stride multibit tries, the number of memory accesses is clearly reduced significantly, but the space required increases dramatically. The average cases for the 3-bit and 4-bit fixed strides are very close. Although the lowest average case, 3.73605, belongs to the 8-bit fixed stride, it is very close to that trie's worst case of 4. Given these trends, we believe a fixed stride of 3 or 4 is the optimal choice.
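The worst-case column of the table follows directly from the stride: a k-bit fixed-stride trie over 32-bit addresses visits at most ceil(32/k) nodes, while each node grows to 2^k children. A quick sketch reproduces the worst cases above (32 for 1-bit, 4 for 8-bit); the class name is our own.

```java
// Worst-case memory accesses for a k-bit fixed-stride trie over 32-bit
// addresses: one node visited per stride level, i.e. ceil(32 / k).
public class StrideTradeoff {
    static int worstCaseAccesses(int k) {
        return (32 + k - 1) / k;   // integer ceil(32 / k)
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 2, 3, 4, 8})
            System.out.println(k + "-bit stride: " + worstCaseAccesses(k)
                + " accesses worst case, 2^" + k + " children per node");
    }
}
```

The node-size factor 2^k is the source of the dramatic space growth: each doubling of the stride roughly halves the access count but squares the per-node fan-out.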
It is worth mentioning that we tried to increase the stride size above 8, but a java.lang.OutOfMemoryError was encountered. We could increase the Java heap size to test larger values, but this does not seem worthwhile: a very large amount of memory would be needed to store these 40000 prefixes, and the average case clearly would not improve.