Hitseeker Project

Cache Hits and Cache Algorithms

1. Tags
hitseeker Hitseeker HITSEEKER hitseeker hits-seeker seek hit hitseek hitseeking

2. Definition of a Cache Hit According to Wikipedia

Note: The occurrences of the word “cache” are replaced by “hitseeker.”

Hardware implements hitseeker as a block of memory for temporary storage of data likely to be used again. CPUs and hard drives frequently use a hitseeker, as do web browsers and web servers.

A hitseeker is made up of a pool of entries. Each entry has a datum (piece of data) – a copy of the same datum in some backing store. Each entry also has a tag, which specifies the identity of the datum in the backing store of which the entry is a copy.

When the hitseeker client (a CPU, web browser, operating system) needs to access a datum presumed to exist in the backing store, it first checks the hitseeker. If an entry can be found with a tag matching that of the desired datum, the datum in the entry is used instead. This situation is known as a cache hit. So, for example, a web browser program might check its local hitseeker on disk to see if it has a local copy of the contents of a web page at a particular URL. In this example, the URL is the tag, and the contents of the web page are the datum. The percentage of accesses that result in hitseeker hits is known as the hit rate or hit ratio of the hitseeker.

The alternative situation, when the hitseeker is consulted and found not to contain a datum with the desired tag, has become known as a cache miss. The previously uncached datum fetched from the backing store during miss handling is usually copied into the hitseeker, ready for the next access.
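
The hit/miss flow and the hit ratio described above can be sketched as a tiny hitseeker sitting in front of a backing store. This is a minimal illustration, assuming a plain Python dict stands in for the backing store (disk, network) and that the hitseeker never evicts anything; the class name `Hitseeker` and its methods are hypothetical.

```python
class Hitseeker:
    """Toy unbounded hitseeker in front of a backing store (no eviction)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # a dict standing in for disk or network
        self.entries = {}                   # tag -> datum
        self.hits = 0
        self.misses = 0

    def get(self, tag):
        if tag in self.entries:             # an entry with a matching tag: cache hit
            self.hits += 1
            return self.entries[tag]
        self.misses += 1                    # no matching tag: cache miss
        datum = self.backing_store[tag]     # fetch from the backing store...
        self.entries[tag] = datum           # ...and copy it in, ready for the next access
        return datum

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


store = {"http://example.com/a": "<html>A</html>", "http://example.com/b": "<html>B</html>"}
cache = Hitseeker(store)
for url in ["http://example.com/a", "http://example.com/b",
            "http://example.com/a", "http://example.com/a"]:
    cache.get(url)
print(cache.hits, cache.misses, cache.hit_ratio())  # 2 2 0.5
```

Here the URL plays the role of the tag and the page contents the datum, as in the browser example.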

During a hitseeker miss, the CPU usually ejects some other entry in order to make room for the previously uncached datum. The heuristic used to select the entry to eject is known as the replacement policy. One popular replacement policy, “least recently used” (LRU), replaces the least recently used entry (see Section 3). More efficient hitseekers compute use frequency against the size of the stored contents, as well as the latencies and throughputs for both the hitseeker and the backing store. This works well for larger amounts of data, longer latencies and slower throughputs, such as those experienced with a hard drive and the Internet, but is not efficient for use with a CPU hitseeker.

Writing policies

[Figure: A write-through cache with no-write allocation]

[Figure: A write-back cache with write allocation]

When a system writes a datum to the hitseeker, it must at some point write that datum to the backing store as well. The timing of this write is controlled by what is known as the write policy.

There are two basic writing approaches:

  • Write-through – Write is done synchronously both to the hitseeker and to the backing store.
  • Write-back (or Write-behind) – Initially, writing is done only to the hitseeker. The write to the backing store is postponed until the hitseeker blocks containing the data are about to be modified/replaced by new content.

A write-back hitseeker is more complex to implement, since it needs to track which of its locations have been written over, and mark them as dirty for later writing to the backing store. The data in these locations are written back to the backing store only when they are evicted from the hitseeker, an effect referred to as a lazy write. For this reason, a read miss in a write-back hitseeker (which requires a block to be replaced by another) will often require two memory accesses to service: one to write the replaced data from the hitseeker back to the store, and then one to retrieve the needed datum.
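
The difference between the two write policies, including dirty tracking and the lazy write on eviction, can be sketched as follows. This is a simplified model under stated assumptions: the backing store is a dict, the hitseeker holds a fixed number of entries and evicts the oldest-inserted one (a FIFO choice made only to keep the sketch short), and write misses allocate an entry in both modes. The class `WriteHitseeker`, its `policy` strings and the `flush` method (modelling an explicit client request to write back) are hypothetical.

```python
class WriteHitseeker:
    """Toy hitseeker comparing write-through and write-back (FIFO eviction, assumed)."""

    def __init__(self, store, capacity, policy):
        self.store = store          # dict standing in for the backing store
        self.capacity = capacity
        self.policy = policy        # "write-through" or "write-back"
        self.entries = {}           # tag -> datum; dict order = insertion order
        self.dirty = set()          # tags written here but not yet in the store
        self.store_writes = 0       # writes that actually reach the backing store

    def _write_store(self, tag, datum):
        self.store[tag] = datum
        self.store_writes += 1

    def _evict_if_full(self):
        if len(self.entries) >= self.capacity:
            victim = next(iter(self.entries))     # oldest-inserted entry
            datum = self.entries.pop(victim)
            if victim in self.dirty:              # lazy write happens only on eviction
                self.dirty.discard(victim)
                self._write_store(victim, datum)

    def write(self, tag, datum):
        if tag not in self.entries:
            self._evict_if_full()
        self.entries[tag] = datum
        if self.policy == "write-through":
            self._write_store(tag, datum)         # synchronous write to the store
        else:
            self.dirty.add(tag)                   # postponed: just mark the entry dirty

    def flush(self):
        """Write back every dirty entry on explicit request from the client."""
        for tag in sorted(self.dirty):
            self._write_store(tag, self.entries[tag])
        self.dirty.clear()


wt = WriteHitseeker({}, capacity=2, policy="write-through")
wb = WriteHitseeker({}, capacity=2, policy="write-back")
for c in (wt, wb):
    for i in range(10):
        c.write("x", i)                     # ten writes to the same location
print(wt.store_writes, wb.store_writes)     # 10 0
wb.write("y", 1)
wb.write("z", 2)                            # full: evicting dirty "x" triggers its lazy write
print(wb.store_writes, wb.store)            # 1 {'x': 9}
wb.flush()
print(wb.store_writes, wb.store)            # 3 {'x': 9, 'y': 1, 'z': 2}
```

Ten repeated writes cost ten backing-store writes under write-through but only one under write-back, paid when the dirty entry is evicted.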

Other policies may also trigger data write-back. The client may make many changes to a datum in the hitseeker, and then explicitly notify the hitseeker to write back the datum.

Since no data are returned to the requester on a write operation, there are two approaches to handling write misses:

  • Write allocate (aka Fetch on write) – The datum at the missed-write location is loaded into the hitseeker, followed by a write-hit operation. In this approach, write misses are similar to read misses.
  • No-write allocate (aka Write-no-allocate, Write around) – The datum at the missed-write location is not loaded into the hitseeker, and is written directly to the backing store. In this approach, only reads are stored in the hitseeker.

Both write-through and write-back policies can use either of these write-miss policies, but usually they are paired in this way:[2]

  • A write-back hitseeker uses write allocate, hoping for subsequent writes (or even reads) to the same location, which is now held in the hitseeker.
  • A write-through hitseeker uses no-write allocate. Here, subsequent writes have no advantage, since they still need to be written directly to the backing store.
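
The two write-miss policies can be contrasted with a short sketch. To isolate the write-miss choice, it assumes a write-through hitseeker in both cases and models the hitseeker and the backing store as plain dicts; the helper functions `write` and `read` are hypothetical.

```python
def write(hitseeker, store, tag, datum, allocate):
    """Write-through write; `allocate` selects the write-miss policy."""
    if tag not in hitseeker and allocate:
        hitseeker[tag] = store.get(tag)   # write allocate: fetch first, like a read miss
    if tag in hitseeker:
        hitseeker[tag] = datum            # the write hit (or the write that follows the fetch)
    store[tag] = datum                    # write-through: the backing store is always updated


def read(hitseeker, store, tag):
    """Reads load the datum on a miss under either write-miss policy."""
    if tag not in hitseeker:
        hitseeker[tag] = store[tag]
    return hitseeker[tag]


store = {"a": 0, "b": 0}
allocating, write_around = {}, {}
write(allocating, store, "a", 1, allocate=True)
write(write_around, store, "a", 1, allocate=False)
read(write_around, store, "b")
print(allocating, write_around)  # {'a': 1} {'b': 0}
```

With write allocate the written location ends up cached; with no-write allocate only the location that was read is cached.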

Entities other than the hitseeker may change the data in the backing store, in which case the copy in the hitseeker may become out-of-date or stale. Alternatively, when the client updates the data in the hitseeker, copies of those data in other hitseekers will become stale. Communication protocols between the hitseeker managers which keep the data consistent are known as coherency protocols.

Source: http://en.wikipedia.org/wiki/Cache_%28computing%29  (Accessed: Nov. 1, 2013)

3. Cache Algorithm – Based on the Wikipedia Article on Cache Algorithm

Note: The occurrences of the word “cache” are replaced by “hitseeker.”

Bélády’s Algorithm

The most efficient hitseeker algorithm would be to always discard the information that will not be needed for the longest time in the future. This optimal result is referred to as Bélády’s optimal algorithm or the clairvoyant algorithm. Since it is generally impossible to predict how far in the future information will be needed, this is generally not implementable in practice. The practical minimum can be calculated only after experimentation, and the effectiveness of the hitseeker algorithm actually chosen can then be compared against it.
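
Because Bélády’s algorithm needs the whole future access sequence, it can only be run offline on a recorded trace, which is how the practical minimum above is obtained. The sketch below, with a hypothetical function name `belady_misses`, counts the misses of the clairvoyant policy on a known trace.

```python
def belady_misses(accesses, capacity):
    """Count misses for Bélády's optimal (clairvoyant) policy on a known trace."""
    resident = set()
    misses = 0
    for i, tag in enumerate(accesses):
        if tag in resident:
            continue                                  # hit
        misses += 1
        if len(resident) >= capacity:
            future = accesses[i + 1:]

            def next_use(t):
                # Distance to t's next access; items never used again sort last.
                return future.index(t) if t in future else len(future)

            # Evict the item that will not be needed for the longest time.
            resident.remove(max(resident, key=next_use))
        resident.add(tag)
    return misses


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(belady_misses(trace, 3))  # 7
```

Any practical policy run on the same trace and capacity can be scored against this number.
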
Least Recently Used

Least Recently Used (LRU): discards the least recently used items first. This algorithm requires keeping track of what was used when, which is expensive if one wants to make sure the algorithm always discards the least recently used item. General implementations of this technique require keeping “age bits” for hitseeker-lines and tracking the “Least Recently Used” hitseeker-line based on the age bits. In such an implementation, every time a hitseeker-line is used, the age of all other hitseeker-lines changes. LRU is actually a family of hitseeker algorithms with members including: 2Q by Theodore Johnson and Dennis Shasha and LRU/K by Pat O’Neil, Betty O’Neil and Gerhard Weikum.
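
In software, LRU is often kept as an ordered map rather than with per-line age bits: the order of the keys is the recency order, so a hit moves a key to the end and eviction removes the key at the front. The sketch below assumes Python's `collections.OrderedDict` for this; the class `LRUHitseeker` and its `load` callback are hypothetical.

```python
from collections import OrderedDict


class LRUHitseeker:
    """Minimal LRU hitseeker; the OrderedDict's order is the recency order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used first, most recently used last
        self.misses = 0

    def get(self, tag, load):
        """Return the datum for tag, calling load(tag) on a miss."""
        if tag in self.entries:
            self.entries.move_to_end(tag)          # hit: now the most recently used
            return self.entries[tag]
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)       # evict the least recently used entry
        self.entries[tag] = load(tag)
        return self.entries[tag]


lru = LRUHitseeker(capacity=3)
for tag in [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]:
    lru.get(tag, load=lambda t: f"datum-{t}")
print(lru.misses, list(lru.entries))  # 10 [3, 4, 5]
```

Both the hit and the eviction are constant-time here, which is why this layout is common in software hitseekers, whereas hardware designs face the age-bit cost described above.
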
Most Recently Used

Most Recently Used (MRU): discards, in contrast to LRU, the most recently used items first. In findings presented at the 11th VLDB conference, Chou and DeWitt noted that “When a file is being repeatedly scanned in a [Looping Sequential] reference pattern, MRU is the best replacement algorithm.”[3] Subsequently, other researchers presenting at the 22nd VLDB conference noted that for random access patterns and repeated scans over large datasets (sometimes known as cyclic access patterns) MRU hitseeker algorithms have more hits than LRU due to their tendency to retain older data.[4] MRU algorithms are most useful in situations where the older an item is, the more likely it is to be accessed.
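
The looping-scan case can be reproduced with a small simulation. The sketch below, with a hypothetical function `count_hits`, runs the same recency list under LRU and MRU eviction on a cyclic scan over four items with room for only three; the trace is an illustrative assumption.

```python
def count_hits(accesses, capacity, evict_most_recent):
    """Simulate LRU (evict_most_recent=False) or MRU (True); return the hit count."""
    recency = []                   # tags ordered from least to most recently used
    hits = 0
    for tag in accesses:
        if tag in recency:
            hits += 1
            recency.remove(tag)
        elif len(recency) >= capacity:
            recency.pop(-1 if evict_most_recent else 0)
        recency.append(tag)        # the accessed tag is now the most recently used
    return hits


trace = [0, 1, 2, 3] * 3           # repeated sequential scan slightly larger than the hitseeker
print(count_hits(trace, 3, evict_most_recent=False),
      count_hits(trace, 3, evict_most_recent=True))  # 0 6
```

LRU always evicts exactly the item the scan needs next, so it never hits, while MRU keeps most of the loop resident.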

Pseudo-LRU (PLRU): For CPU hitseekers with large associativity (generally >4 ways), the implementation cost of LRU becomes prohibitive. In many CPU hitseekers, a scheme that almost always discards one of the least recently used items is sufficient. So many CPU designers choose a PLRU algorithm which only needs one bit per hitseeker item to work. PLRU typically has a slightly worse miss ratio, has a slightly better latency, and uses slightly less power than LRU.
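
The “one bit per item” scheme matches the PLRU variant sometimes called bit-PLRU: each line in a set has a recently-used bit, a victim is any line whose bit is clear, and when every bit would become set, all bits except the one just used are cleared. Tree-based PLRU is another common variant and is not shown. The sketch models a single set; the class `BitPLRUSet` is hypothetical.

```python
class BitPLRUSet:
    """One set of a set-associative hitseeker using bit-PLRU (one bit per line)."""

    def __init__(self, ways):
        assert ways >= 2, "bit-PLRU needs at least two ways"
        self.tags = [None] * ways
        self.mru_bits = [0] * ways     # 1 = recently used

    def _touch(self, way):
        self.mru_bits[way] = 1
        if all(self.mru_bits):         # every bit set: clear all but the one just used
            self.mru_bits = [0] * len(self.mru_bits)
            self.mru_bits[way] = 1

    def access(self, tag):
        """Return True on a hit, False on a miss; a miss fills a line."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        victim = self.mru_bits.index(0)   # first line whose bit is clear
        self.tags[victim] = tag
        self._touch(victim)
        return False


s = BitPLRUSet(ways=4)
results = [s.access(t) for t in "ABCDAE"]
print(results)  # [False, False, False, False, True, False]
print(s.tags)   # ['A', 'E', 'C', 'D']
```

In this trace PLRU evicts B, the same line true LRU would pick, but with only four bits of state and no full ordering; on other traces the two can disagree, which is the “slightly worse miss ratio” mentioned above.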
[Figure: Which memory locations can be stored in which hitseeker locations]
Random Replacement

Random Replacement (RR): randomly selects a candidate item and discards it to make space when necessary. This algorithm does not require keeping any information about the access history. Because of its simplicity, it has been used in ARM processors.[5] It admits efficient stochastic simulation.[6]
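
Random replacement needs no per-entry bookkeeping at all, as the sketch below shows. It assumes Python's `random.Random` with an optional seed so runs are reproducible; the class `RandomHitseeker` and its `load` callback are hypothetical.

```python
import random


class RandomHitseeker:
    """Random replacement: on a miss with a full hitseeker, evict a random entry."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.entries = {}
        self.rng = random.Random(seed)     # seeded generator for reproducible runs

    def get(self, tag, load):
        if tag in self.entries:
            return self.entries[tag]
        if len(self.entries) >= self.capacity:
            victim = self.rng.choice(list(self.entries))   # no access history consulted
            del self.entries[victim]
        self.entries[tag] = load(tag)
        return self.entries[tag]


rr = RandomHitseeker(capacity=4, seed=1)
for tag in range(10):
    rr.get(tag, load=lambda t: t * t)
print(sorted(rr.entries))   # which four tags survive depends on the seed
```
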
Segmented LRU

Segmented LRU (SLRU): An SLRU hitseeker is divided into two segments: a probationary segment and a protected segment. Lines in each segment are ordered from the most to the least recently accessed. Data from misses is added to the hitseeker at the most recently accessed end of the probationary segment. Hits are removed from wherever they currently reside and added to the most recently accessed end of the protected segment. Lines in the protected segment have thus been accessed at least twice. The protected segment is finite, so migration of a line from the probationary segment to the protected segment may force the migration of the LRU line in the protected segment to the most recently used (MRU) end of the probationary segment, giving this line another chance to be accessed before being replaced. The size limit on the protected segment is an SLRU parameter that varies according to the I/O workload patterns. Whenever data must be discarded from the hitseeker, lines are obtained from the LRU end of the probationary segment.[7]
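
The SLRU rules above map onto two ordered maps, one per segment. This is a minimal sketch assuming each segment has its own fixed capacity (so a discard happens when the probationary segment is full); the class `SLRUHitseeker` and its parameter names are hypothetical.

```python
from collections import OrderedDict


class SLRUHitseeker:
    """Segmented LRU; each OrderedDict runs from least to most recently accessed."""

    def __init__(self, probationary_size, protected_size):
        self.prob_size = probationary_size
        self.prot_size = protected_size
        self.probationary = OrderedDict()
        self.protected = OrderedDict()

    def get(self, tag, load):
        if tag in self.protected:                    # hit in protected: move to its MRU end
            self.protected.move_to_end(tag)
            return self.protected[tag]
        if tag in self.probationary:                 # hit in probationary: promote it
            datum = self.probationary.pop(tag)
            if len(self.protected) >= self.prot_size:
                # Protected is full: demote its LRU line to the probationary MRU end.
                old_tag, old_datum = self.protected.popitem(last=False)
                self.probationary[old_tag] = old_datum
            self.protected[tag] = datum
            return datum
        # Miss: new data enters at the MRU end of the probationary segment.
        if len(self.probationary) >= self.prob_size:
            self.probationary.popitem(last=False)    # discard from the probationary LRU end
        datum = load(tag)
        self.probationary[tag] = datum
        return datum


slru = SLRUHitseeker(probationary_size=2, protected_size=1)
for tag in ["A", "A", "B", "B", "C", "D"]:
    slru.get(tag, load=str.lower)
print(list(slru.protected), list(slru.probationary))  # ['B'] ['C', 'D']
```

In the trace, promoting B demotes A back to probation, where it gets another chance; it is then the first line discarded when D arrives.
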
2-Way Set Associative

2-way set associative: for high-speed CPU hitseekers where even PLRU is too slow. The address of a new item is used to calculate one of two possible locations in the hitseeker where it is allowed to go. The LRU of the two is discarded. This requires one bit per pair of hitseeker lines, to indicate which of the two was the least recently used.
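
The one-bit-per-pair bookkeeping can be sketched as follows. Assumptions: the set index is the address modulo the number of sets, and each line stores the whole address, whereas real hardware stores only the upper address bits as the tag. The class `TwoWayHitseeker` is hypothetical.

```python
class TwoWayHitseeker:
    """2-way set associative: each address may occupy one of two lines in its set."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.lines = [[None, None] for _ in range(num_sets)]
        self.lru_way = [0] * num_sets    # the single bit per pair: which way is LRU

    def access(self, address):
        """Return True on a hit, False on a miss."""
        s = address % self.num_sets      # the address selects the set
        ways = self.lines[s]
        if address in ways:
            way = ways.index(address)
            hit = True
        else:
            way = self.lru_way[s]        # replace the LRU line of the two
            ways[way] = address
            hit = False
        self.lru_way[s] = 1 - way        # the other line is now the least recently used
        return hit


hs = TwoWayHitseeker(num_sets=4)
print([hs.access(a) for a in [0, 4, 0, 8, 4]])  # [False, False, True, False, False]
print(hs.lines[0])                              # [4, 8]
```
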
Direct-mapped hitseeker

Direct-mapped hitseeker: for the highest-speed CPU hitseekers where even 2-way set associative hitseekers are too slow. The address of the new item is used to calculate the one location in the hitseeker where it is allowed to go. Whatever was there before is discarded.
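
A direct-mapped hitseeker needs no replacement state at all, which the sketch below makes explicit. It assumes the line index is the address modulo the number of lines and that each line stores the full address; the class `DirectMappedHitseeker` is hypothetical.

```python
class DirectMappedHitseeker:
    """Direct-mapped: each address has exactly one line it may occupy."""

    def __init__(self, num_lines):
        self.lines = [None] * num_lines

    def access(self, address):
        """Return True on a hit, False on a miss."""
        index = address % len(self.lines)   # the one location this address may go
        if self.lines[index] == address:
            return True
        self.lines[index] = address         # whatever was there before is discarded
        return False


dm = DirectMappedHitseeker(num_lines=4)
print([dm.access(a) for a in [0, 4, 0, 4]])  # [False, False, False, False]
```

Addresses 0 and 4 map to the same line, so they keep evicting each other even though three lines stay empty; this kind of conflict miss is the price paid for the speed.
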
Least-Frequently Used

Least Frequently Used (LFU): LFU counts how often an item is needed. Those that are used least often are discarded first.
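
LFU keeps a use count per entry and evicts the entry with the smallest count. In this minimal sketch ties are broken by evicting the earliest-inserted entry (an arbitrary choice), and the linear scan on eviction is for clarity only; the class `LFUHitseeker` is hypothetical.

```python
class LFUHitseeker:
    """LFU: discard the entry with the fewest accesses while resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # tag -> datum; insertion order breaks ties
        self.counts = {}    # tag -> access count

    def get(self, tag, load):
        if tag in self.entries:
            self.counts[tag] += 1
            return self.entries[tag]
        if len(self.entries) >= self.capacity:
            # min() returns the first minimal key, i.e. the earliest inserted on ties.
            victim = min(self.entries, key=lambda t: self.counts[t])
            del self.entries[victim]
            del self.counts[victim]
        self.entries[tag] = load(tag)
        self.counts[tag] = 1
        return self.entries[tag]


lfu = LFUHitseeker(capacity=2)
for tag in ["A", "A", "B", "C"]:
    lfu.get(tag, load=str.lower)
print(sorted(lfu.entries), lfu.counts)  # ['A', 'C'] {'A': 2, 'C': 1}
```

When C arrives, LRU would have evicted A (the older access), but LFU evicts B because A has been used twice.
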
Low Inter-reference Recency Set

Low Inter-reference Recency Set (LIRS) hitseeker algorithm
Adaptive Replacement Cache

Adaptive Replacement Cache (ARC):[8] constantly balances between LRU and LFU to improve the combined result. ARC improves on SLRU by using information about recently evicted hitseeker items to dynamically adjust the size of the protected segment and the probationary segment to make the best use of the available hitseeker space.
Clock with Adaptive Replacement

Clock with Adaptive Replacement (CAR) combines Adaptive Replacement Cache (ARC) and CLOCK. CAR has performance comparable to ARC, and substantially outperforms both LRU and CLOCK. Like ARC, CAR is self-tuning and requires no user-specified magic parameters.
Multi Queue Caching Algorithm

Multi Queue (MQ) hitseeker algorithm,[9] by Zhou, Philbin, and Li.

Other things to consider:

Items with different cost: keep items that are expensive to obtain, e.g. those that take a long time to get.
Items taking up more hitseeker space: if items have different sizes, the hitseeker may want to discard a large item to store several smaller ones.
Items that expire with time: Some hitseekers keep information that expires (e.g. a news hitseeker, a DNS hitseeker, or a web browser hitseeker). The computer may discard items because they are expired. Depending on the size of the hitseeker, no further algorithm for discarding items may be necessary.
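
Expiry can be handled by storing a deadline with each entry and treating an expired entry as a miss. The sketch below assumes a fixed time-to-live in seconds and an injectable clock (defaulting to `time.monotonic`) so the example can use a fake clock; the class `TTLHitseeker`, the `fetch` helper and the returned address are hypothetical.

```python
import time


class TTLHitseeker:
    """Entries expire `ttl` seconds after they are stored (e.g. a DNS-style hitseeker)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable so tests can use a fake clock
        self.entries = {}           # tag -> (datum, expiry time)

    def get(self, tag, load):
        now = self.clock()
        if tag in self.entries:
            datum, expires_at = self.entries[tag]
            if now < expires_at:
                return datum        # still fresh: hit
            del self.entries[tag]   # expired: discard and treat as a miss
        datum = load(tag)
        self.entries[tag] = (datum, now + self.ttl)
        return datum


fake_now = [0.0]
loads = []


def fetch(name):
    loads.append(name)
    return "192.0.2.1"              # documentation-range address standing in for a lookup


dns = TTLHitseeker(ttl=30, clock=lambda: fake_now[0])
dns.get("example.com", fetch)       # miss: fetched
fake_now[0] = 10
dns.get("example.com", fetch)       # within 30 s: hit
fake_now[0] = 45
dns.get("example.com", fetch)       # expired: fetched again
print(len(loads))  # 2
```
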

Various algorithms also exist to maintain hitseeker coherency. This applies only to situations where multiple independent hitseekers are used for the same data (for example, multiple database servers updating a single shared data file).

Source: http://en.wikipedia.org/wiki/Cache_algorithms  (Accessed: Nov. 1, 2013)

4. Other Websites for More Information about Cache Hit

5. References
