Caching Strategies
You can also use caching strategies (i.e. ways of storing data in a temporary area) to serve future requests for the same data faster. Caching is useful in situations where retrieving data, or computing it, is resource-intensive. You don't want to perform these operations more than once! Python provides various tools for caching, as described below.
LRU (Least Recently Used) Cache
The LRU caching strategy removes the "least recently used" items first. It's useful in applications where you want to cache a limited number of items and when the likelihood of accessing recently used items is high.
Python's functools module provides an @lru_cache decorator to implement LRU caching. It can be applied to any function whose output you want to cache:
from functools import lru_cache

@lru_cache(maxsize=100)
def expensive_function(param):
    # Simulate an expensive operation
    return some_expensive_computation(param)
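To see the cache at work, here's a minimal, hypothetical sketch (slow_square stands in for a real expensive computation). @lru_cache attaches a cache_info() method to the decorated function, which reports hits and misses:

from functools import lru_cache
import time

@lru_cache(maxsize=100)
def slow_square(n):
    time.sleep(1)  # stand-in for an expensive operation
    return n * n

slow_square(4)  # takes ~1 second: computed and cached
slow_square(4)  # returns instantly: served from the cache
print(slow_square.cache_info())
# CacheInfo(hits=1, misses=1, maxsize=100, currsize=1)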
Memoization
Memoization is a specific form of caching that stores the results of expensive function calls and returns the cached result when the same inputs occur again; it is caching applied at the level of individual function calls. The @lru_cache decorator can also be used for memoization:
from functools import lru_cache

@lru_cache(maxsize=None)  # No limit on cache size
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
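Without memoization, this naive recursive implementation makes an exponential number of calls; with the cache, each Fibonacci number is computed exactly once. Continuing the example above:

print(fibonacci(35))  # 9227465, returned near-instantly
print(fibonacci.cache_info())
# CacheInfo(hits=33, misses=36, maxsize=None, currsize=36)

Since Python 3.9, functools.cache is available as a shorthand for lru_cache(maxsize=None).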
TTL (Time To Live) Cache
TTL caching invalidates cache entries after a set period. This strategy is suitable when the data changes over time, or when you want to ensure that data doesn't become stale. The third-party cachetools library offers TTL caching capabilities:
from cachetools import TTLCache, cached

ttl_cache = TTLCache(maxsize=100, ttl=300)  # Cache up to 100 items, each for 300 seconds

@cached(ttl_cache)
def get_data(key):
    # Fetch data that changes over time
    return fetch_some_data(key)
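Here's a hypothetical, self-contained sketch of the expiry behaviour, using a deliberately short TTL so it can be observed (get_timestamp stands in for a real, changing data source):

import time
from cachetools import TTLCache, cached

@cached(TTLCache(maxsize=100, ttl=2))  # entries expire after 2 seconds
def get_timestamp(key):
    return time.time()

first = get_timestamp("a")
assert get_timestamp("a") == first  # within the TTL: served from the cache
time.sleep(2.5)
assert get_timestamp("a") != first  # TTL elapsed: entry expired and recomputed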
Disk-based Caching
When you are dealing with large amounts of data, or you need to persist a cache across program restarts, disk-based caching helps. Libraries such as joblib provide mechanisms for storing cached data on disk:
from joblib import Memory

memory = Memory("./cachedir", verbose=0)

@memory.cache
def expensive_function(param):
    # Operations that take a lot of resources
    return some_expensive_computation(param)
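Because joblib stores results as files under ./cachedir, cached values survive interpreter restarts: re-running the same script skips the computation entirely. A hypothetical sketch (slow_double stands in for real heavy work):

import time
from joblib import Memory

memory = Memory("./cachedir", verbose=0)

@memory.cache
def slow_double(x):
    time.sleep(2)  # stand-in for heavy work
    return x * 2

print(slow_double(21))  # slow the first time this is ever run...
print(slow_double(21))  # ...instant afterwards, even in a fresh process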
Choosing the Right Strategy
Implementing caching can significantly improve the performance of Python applications by reducing the need to recompute results or re-fetch data. However, it's important to choose the right caching strategy based on the application's requirements and the nature of the data being cached. To summarise:
- LRU caching is useful for fixed-size caches, and when recently used data is more likely to be accessed again
- Memoization is best for optimising expensive functions with a limited set of inputs
- TTL caching is suitable for data that changes over time and needs a degree of "freshness"
- Disk-based caching is ideal for large amounts of data, or when you need cache persistence.