Reading notes by Yu Guan: Temporal Locality in Today’s Content Caching: Why it Matters and How to Model it

2017-10-10

Posted by 关宇

Temporal Locality in Today’s Content Caching: Why it Matters and How to Model it

 

This is a work about test data generation for content caching.

 

Main idea

When we design a cache, we usually need some request traces to test the performance of cache. The most popular trace is IRM (Independent Reference Model). (generating traces according to Zipf distribution) However, IRM loses the temporal locality of real traces so it can not approximately measure the performance of cache. So authors analyze real traces in VoD system and present a new model——SNM (Shot Noise Model), to replace IRM, which can be used for testing performance of cache.

 

Contributions

  1. Analyze the performance of IRM trace generation method and explain why it can not accurately evaluate the performance of cache.
  2. Present a new model SNM trace generation method and experimental results shows that it can accurately evaluate the performance of cache.

 

In this reading notes, I divide this paper into 3 parts: (1) Why we need a request trace generation method for cache evaluation? (2) Weakness of IRM and why? (3) SNM and its strength.

 

 

 

 

Why we need a request trace generation method for cache evaluation?

Content cacheing plays a critical role in many network architectures, such as ICN, CDN. But shen evaluating the benefits of these proposals, researchers are faced with the question of what to evaluate their methodology.

  1. Trace-driven analysis is effective only when large data sets are available. Many researchers even don’t have real data traces.
  2. This kind of analysis does not allow users to test potential changes in the traffic profile.

 

Weakness of IRM and why?

Weakness: Specification of the content popularity distribution, subject to a given content catalogue of size N.

 

Experiment 1: Collect a real trace. Then split the trace into K slices. Authors compare the distribution of different K and different slices in a same K. Results show that these distributions are very different.

图片 1

Experiment 2: We also split the trace into K slices. Then randomly reorder requests in each slice to break the temporal locality in varying degrees. Then use these traces to test the performance of LRU cache. Results show that cache performs better in traces with greater K.

图片 2

Experiment 3: During different time of one day, the frequency of requests differs greatly. But it does not influence the performance of cache.

图片 3

Experiment 4: Draw Cdf gram for each content. The shape of the line for different content different greatly.

图片 4

SNM and its strength

The goal of designing SNM: maintain the generality and flexibility of the IRM approach, but improve on its accuracy without significantly increasing the model’s complexity.

 

Each content has 3 physical parameters:

 

V: the average number of requests generated by the content

λ, l: λ is popularity profile, describing how the request rate for a given object evolves over time. L is the lifetime of the content.

τ: the time instant at which the content enters the system

Then we divide content into 6 groups:

Group 0: V < 10

Group 1: V >= 10 && l <= 2

Group 2: V >= 10 && 2 < l <= 5

Group 3: V >= 10 && 5 < l <= 8

Group 4: V >= 10 && 8 < l <= 13

Group 5: V >= 10 && l > 13

For group 0 and 5, simply use IRM. Because group 0 has few requests, it is difficult to build a distribution. Group 5 has too long lifetime, so it may be a long-term content which exist not only in the test time interval. So simply apply IRM is a good choice. For group 1 to 4, SNM is applied.

 

SNM generation method:

  1. Set τ for each content. (homogeneous poison)
  2. Set l for each content to the average l in its class.
  3. Set V for each content. (V and τ form a distribution)

 

Model validation

Authors tries different type of λ. Results show that different λ cause little change of performance. In conclusion, testing performance using SNM model is more accurate than testing performance using IRM model.

 

 

Some thinking:

This paper mentions pattern of request traces for a content. I believe it can be used not only in building test data, it can be also used for popularity prediction. Given a part of the pattern of a trace, we can match this pattern with existed pattern and then predict what the rest of the part should be looked like.