Cache pool
==========

Purpose
-------

Use a pool of fast storage devices (probably SSDs) as a cache for an
existing, slower and larger pool.

Use a replicated pool as a front-end to service most I/O, and destage
cold data to a separate erasure-coded pool that does not currently
(and cannot efficiently) handle the workload.

We should be able to create and add a cache pool to an existing pool
of data, and later remove it, without disrupting service or migrating
data around.

Use cases
---------

Read-write pool, writeback
~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and put a fast cache pool "in front" of
it. Writes will go to the cache pool and immediately ack. We flush
them back to the data pool based on the defined policy.

Read-only pool, weak consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and add one or more read-only cache
pools. We copy data to the cache pool(s) on read. Writes are
forwarded to the original data pool. Stale data is expired from the
cache pools based on the defined policy.

This is likely only useful for specific applications with specific
data access patterns. It may be a match for rgw, for example.


Interface
---------

Set up a read/write cache pool foo-hot for pool foo::

  ceph osd tier add foo foo-hot
  ceph osd tier cache-mode foo-hot writeback

Direct all traffic for foo to foo-hot::

  ceph osd tier set-overlay foo foo-hot

Set the target size and enable the tiering agent for foo-hot::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1
  ceph osd pool set foo-hot hit_set_period 3600   # 1 hour
  ceph osd pool set foo-hot target_max_bytes 1000000000000  # 1 TB
  ceph osd pool set foo-hot min_read_recency_for_promote 1
  ceph osd pool set foo-hot min_write_recency_for_promote 1

Drain the cache in preparation for turning it off::

  ceph osd tier cache-mode foo-hot forward
  rados -p foo-hot cache-flush-evict-all

When the cache pool is finally empty, disable it::

  ceph osd tier remove-overlay foo
  ceph osd tier remove foo foo-hot

Read-only pools with lazy consistency::

  ceph osd tier add foo foo-east
  ceph osd tier cache-mode foo-east readonly
  ceph osd tier add foo foo-west
  ceph osd tier cache-mode foo-west readonly
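Putting the read/write commands above together, a minimal end-to-end
sketch might look as follows. The pool names, PG counts, and the choice
of an erasure-coded base pool are illustrative assumptions, not
requirements::

  # Illustrative pool creation; sizes and profiles are assumptions.
  ceph osd pool create foo 128 128 erasure          # slow, large base pool
  ceph osd pool create foo-hot 128 128 replicated   # fast cache pool (SSDs)

  # Wire up the tier and direct traffic through it.
  ceph osd tier add foo foo-hot
  ceph osd tier cache-mode foo-hot writeback
  ceph osd tier set-overlay foo foo-hot

  # Sanity check: the pool entries in the OSD map should now show the
  # tier relationship (exact output format varies by release).
  ceph osd dump | grep -E 'foo|foo-hot'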
Tiering agent
-------------

The tiering policy is defined as properties on the cache pool itself.

HitSet metadata
~~~~~~~~~~~~~~~

First, the agent requires HitSet information to be tracked on the
cache pool in order to determine which objects in the pool are being
accessed. This is enabled with::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1
  ceph osd pool set foo-hot hit_set_period 3600   # 1 hour

The supported HitSet types include 'bloom' (a bloom filter, the
default), 'explicit_hash', and 'explicit_object'. The latter two
explicitly enumerate accessed objects and are less memory efficient.
They are there primarily for debugging and to demonstrate pluggability
for the infrastructure. For the bloom filter type, you can additionally
define the false positive probability for the filter (default is 0.05)::

  ceph osd pool set foo-hot hit_set_fpp 0.15

The ``hit_set_period`` and ``hit_set_count`` define how much time each
HitSet covers and how many such HitSets to store. Binning accesses
over time allows Ceph to determine independently whether an object was
accessed at least once and whether it was accessed more than once over
some time period ("age" vs "temperature").

``min_read_recency_for_promote`` defines how many HitSets to check for
the existence of an object when handling a read operation. The result
is used to decide whether to promote the object asynchronously. Its
value should be between 0 and ``hit_set_count``. If it is set to 0,
the object is always promoted; if it is set to 1, only the current
HitSet is checked, and the object is promoted only if it is found
there. For greater values, that many of the most recent HitSets are
checked, and the object is promoted if it is found in any of them.

A similar parameter, ``min_write_recency_for_promote``, can be set for
write operations::

  ceph osd pool set {cachepool} min_read_recency_for_promote 1
  ceph osd pool set {cachepool} min_write_recency_for_promote 1

Note that the longer the ``hit_set_period`` and the higher the
``min_read_recency_for_promote``/``min_write_recency_for_promote``, the
more RAM the ceph-osd process will consume. In particular, when the
agent is active flushing or evicting cache objects, all
``hit_set_count`` HitSets are loaded into RAM.
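For example, a configuration that keeps four one-hour HitSets and
promotes on read only if the object appears in one of the two most
recent HitSets might look like this (the values are illustrative, and
trade promotion eagerness against the RAM cost noted above)::

  ceph osd pool set foo-hot hit_set_count 4       # keep 4 HitSets
  ceph osd pool set foo-hot hit_set_period 3600   # each covers 1 hour
  ceph osd pool set foo-hot min_read_recency_for_promote 2

  # A read now promotes an object only if it appears in one of the two
  # most recent HitSets, i.e. it was accessed within roughly the last
  # two hours.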
Cache mode
~~~~~~~~~~

The most important policy is the cache mode::

  ceph osd tier cache-mode foo-hot writeback

The supported modes are 'none', 'writeback', 'forward', and
'readonly'. Most installations want 'writeback', which will write
into the cache tier and only later flush updates back to the base
tier. Similarly, any object that is read will be promoted into the
cache tier.

The 'forward' mode is intended for when the cache is being disabled
and needs to be drained. No new objects will be promoted or written
to the cache pool unless they are already present. A background
operation can then do something like::

  rados -p foo-hot cache-try-flush-evict-all
  rados -p foo-hot cache-flush-evict-all

to force all data to be flushed back to the base tier.

The 'readonly' mode is intended for read-only workloads that do not
require consistency to be enforced by the storage system. Writes will
be forwarded to the base tier, but objects that are read will get
promoted to the cache. No attempt is made by Ceph to ensure that the
contents of the cache tier(s) are consistent in the presence of object
updates.

Cache sizing
~~~~~~~~~~~~

The agent performs two basic functions: flushing (writing 'dirty'
cache objects back to the base tier) and evicting (removing cold and
clean objects from the cache).

The thresholds at which Ceph will flush or evict objects are specified
relative to a 'target size' of the pool. For example::

  ceph osd pool set foo-hot cache_target_dirty_ratio .4
  ceph osd pool set foo-hot cache_target_dirty_high_ratio .6
  ceph osd pool set foo-hot cache_target_full_ratio .8

will begin flushing dirty objects when 40% of the target size is
dirty, flush them more aggressively once 60% is dirty, and begin
evicting clean objects when the pool reaches 80% of the target size.

The target size can be specified either in terms of objects or bytes::

  ceph osd pool set foo-hot target_max_bytes 1000000000000   # 1 TB
  ceph osd pool set foo-hot target_max_objects 1000000       # 1 million objects

Note that if both limits are specified, Ceph will begin flushing or
evicting when either threshold is triggered.

Other tunables
~~~~~~~~~~~~~~

You can specify a minimum object age before a recently updated object is
flushed to the base tier::

  ceph osd pool set foo-hot cache_min_flush_age 600    # 10 minutes

You can specify the minimum age of an object before it will be evicted from
the cache tier::

  ceph osd pool set foo-hot cache_min_evict_age 1800   # 30 minutes
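Putting the sizing knobs together, here is a worked example (all
numbers are illustrative): with a 1 TB target and the ratios shown
under `Cache sizing`_, the agent's thresholds work out as annotated
below::

  # target_max_bytes = 1,000,000,000,000 bytes (1 TB)
  #
  #   flushing begins at    .4 * 1 TB = 400 GB of dirty data
  #   high-speed flush at   .6 * 1 TB = 600 GB of dirty data
  #   eviction begins at    .8 * 1 TB = 800 GB total in the pool
  ceph osd pool set foo-hot target_max_bytes 1000000000000
  ceph osd pool set foo-hot cache_target_dirty_ratio .4
  ceph osd pool set foo-hot cache_target_dirty_high_ratio .6
  ceph osd pool set foo-hot cache_target_full_ratio .8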