X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=blobdiff_plain;f=src%2Fceph%2Fdoc%2Fcephfs%2Fdirfrags.rst;fp=src%2Fceph%2Fdoc%2Fcephfs%2Fdirfrags.rst;h=0000000000000000000000000000000000000000;hb=7da45d65be36d36b880cc55c5036e96c24b53f00;hp=717553fea9afc2e3128b68e7cf0fb26eeacc46a4;hpb=691462d09d0987b47e112d6ee8740375df3c51b2;p=stor4nfv.git

diff --git a/src/ceph/doc/cephfs/dirfrags.rst b/src/ceph/doc/cephfs/dirfrags.rst
deleted file mode 100644
index 717553f..0000000
--- a/src/ceph/doc/cephfs/dirfrags.rst
+++ /dev/null
@@ -1,100 +0,0 @@
-
-===================================
-Configuring Directory fragmentation
-===================================
-
-In CephFS, directories are *fragmented* when they become very large
-or very busy. This splits up the metadata so that it can be shared
-between multiple MDS daemons, and between multiple objects in the
-metadata pool.
-
-In normal operation, directory fragmentation is invisible to
-users and administrators, and all the configuration settings mentioned
-here should be left at their default values.
-
-While directory fragmentation enables CephFS to handle very large
-numbers of entries in a single directory, application programmers should
-remain conservative about creating very large directories, as they still
-have a resource cost in situations such as a CephFS client listing
-the directory, where all the fragments must be loaded at once.
-
-All directories are initially created as a single fragment. This fragment
-may be *split* to divide up the directory into more fragments, and these
-fragments may be *merged* to reduce the number of fragments in the directory.
-
-Splitting and merging
-=====================
-
-An MDS will only consider doing splits and merges if the ``mds_bal_frag``
-setting is true in the MDS's configuration file, and the ``allow_dirfrags``
-setting is true in the filesystem map (set on the mons). These settings
-are both true by default since the *Luminous* (12.2.x) release of Ceph.
-
-When an MDS identifies a directory fragment to be split, it does not
-do the split immediately. Because splitting interrupts metadata IO,
-a short delay is used to allow short bursts of client IO to complete
-before the split begins. This delay is configured with
-``mds_bal_fragment_interval``, which defaults to 5 seconds.
-
-When the split is done, the directory fragment is broken up into
-a power-of-two number of new fragments. The number of new
-fragments is given by two to the power ``mds_bal_split_bits``, i.e.
-if ``mds_bal_split_bits`` is 2, then four new fragments will be
-created. The default setting is 3, i.e. splits create 8 new fragments.
-
-The criteria for initiating a split or a merge are described in the
-following sections.
-
-Size thresholds
-===============
-
-A directory fragment is eligible for splitting when its size exceeds
-``mds_bal_split_size`` (default 10000). Ordinarily this split is
-delayed by ``mds_bal_fragment_interval``, but if the fragment size
-exceeds a factor of ``mds_bal_fragment_fast_factor`` the split size,
-the split will happen immediately (holding up any client metadata
-IO on the directory).
-
-``mds_bal_fragment_size_max`` is the hard limit on the size of
-directory fragments. If it is reached, clients will receive
-ENOSPC errors if they try to create files in the fragment. On
-a properly configured system, this limit should never be reached on
-ordinary directories, as they will have split long before. By default,
-this is set to 10 times the split size, giving a dirfrag size limit of
-100000. Increasing this limit may lead to oversized directory fragment
-objects in the metadata pool, which the OSDs may not be able to handle.
-
-A directory fragment is eligible for merging when its size is less
-than ``mds_bal_merge_size``. There is no merge equivalent of the
-"fast splitting" explained above: fast splitting exists to avoid
-creating oversized directory fragments, and there is no equivalent issue
-to avoid when merging. The default merge size is 50.
-
-Activity thresholds
-===================
-
-In addition to splitting fragments based
-on their size, the MDS may split directory fragments if their
-activity exceeds a threshold.
-
-The MDS maintains separate time-decaying load counters for read and write
-operations on directory fragments. The decaying load counters have an
-exponential decay based on the ``mds_decay_halflife`` setting.
-
-On writes, the write counter is
-incremented, and compared with ``mds_bal_split_wr``, triggering a
-split if the threshold is exceeded. Write operations include metadata IO
-such as renames, unlinks and creations.
-
-The ``mds_bal_split_rd`` threshold is applied based on the read operation
-load counter, which tracks readdir operations.
-
-By default, the read threshold is 25000 and the write threshold is
-10000, i.e. 2.5x as many reads as writes would be required to trigger
-a split.
-
-After fragments are split due to the activity thresholds, they are only
-merged based on the size threshold (``mds_bal_merge_size``), so
-a spike in activity may cause a directory to stay fragmented
-forever unless some entries are unlinked.
-
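
As a reading aid for the size thresholds described in the removed document, here is a minimal Python sketch. It is not Ceph's implementation: the `DirfragSettings` class and the `new_fragment_count` and `size_action` helpers are hypothetical names invented for illustration. The defaults for split size, merge size, split bits and size limit are the ones quoted in the text; the value used for `fast_factor` is an assumption, since the text does not give a default for ``mds_bal_fragment_fast_factor``.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirfragSettings:
    """Hypothetical container mirroring the settings named in the text."""
    split_size: int = 10000     # mds_bal_split_size (default quoted in the text)
    merge_size: int = 50        # mds_bal_merge_size (default quoted in the text)
    split_bits: int = 3         # mds_bal_split_bits (default quoted in the text)
    size_max: int = 100000      # mds_bal_fragment_size_max (10x the split size)
    fast_factor: float = 1.5    # ASSUMPTION: illustrative value, not from the text


def new_fragment_count(settings: DirfragSettings) -> int:
    """A split creates two to the power split_bits new fragments."""
    return 2 ** settings.split_bits


def size_action(frag_size: int, settings: DirfragSettings) -> str:
    """Classify a fragment by size alone, following the prose description."""
    if frag_size > settings.split_size * settings.fast_factor:
        return "split_now"       # fast split: no delay
    if frag_size > settings.split_size:
        return "split_delayed"   # split after mds_bal_fragment_interval
    if frag_size < settings.merge_size:
        return "merge"           # small enough to merge; no fast path exists
    return "none"


if __name__ == "__main__":
    s = DirfragSettings()
    print(new_fragment_count(s))
    for size in (10, 5000, 12000, 20000):
        print(size, size_action(size, s))
```

The sketch covers only the size thresholds; the activity thresholds depend on time-decaying counters whose exact update rule the text does not specify, so they are left out.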