X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=blobdiff_plain;f=src%2Fceph%2Fdoc%2Fdev%2Fmds_internals%2Fexports.rst;fp=src%2Fceph%2Fdoc%2Fdev%2Fmds_internals%2Fexports.rst;h=c5b0e39153353bfe07e33b67a6fd761d03274bb4;hb=812ff6ca9fcd3e629e49d4328905f33eee8ca3f5;hp=0000000000000000000000000000000000000000;hpb=15280273faafb77777eab341909a3f495cf248d9;p=stor4nfv.git diff --git a/src/ceph/doc/dev/mds_internals/exports.rst b/src/ceph/doc/dev/mds_internals/exports.rst new file mode 100644 index 0000000..c5b0e39 --- /dev/null +++ b/src/ceph/doc/dev/mds_internals/exports.rst @@ -0,0 +1,76 @@ + +=============== +Subtree exports +=============== + +Normal Migration +---------------- + +The exporter begins by doing some checks in export_dir() to verify +that it is permissible to export the subtree at this time. In +particular, the cluster must not be degraded, the subtree root may not +be freezing or frozen (\ie already exporting, or nested beneath +something that is exporting), and the path must be pinned (\ie not +conflicted with a rename). If these conditions are met, the subtree +freeze is initiated, and the exporter is committed to the subtree +migration, barring an intervening failure of the importer or itself. + +The MExportDirDiscover serves simply to ensure that the base directory +being exported is open on the destination node. It is pinned by the +importer to prevent it from being trimmed. This occurs before the +exporter completes the freeze of the subtree to ensure that the +importer is able to replicate the necessary metadata. When the +exporter receives the MExportDirDiscoverAck, it allows the freeze to proceed. + +The MExportDirPrep message then follows to populate a spanning tree that +includes all dirs, inodes, and dentries necessary to reach any nested +exports within the exported region. This replicates metadata as well, +but it is pushed out by the exporter, avoiding deadlock with the +regular discover and replication process. The importer is responsible +for opening the bounding directories from any third parties before +acknowledging. This ensures that the importer has correct dir_auth +information about where authority is delegated for all points nested +within the subtree being migrated. While processing the MExportDirPrep, +the importer freezes the entire subtree region to prevent any new +replication or cache expiration. + +The warning stage occurs only if the base subtree directory is open by +nodes other than the importer and exporter. If so, then a +MExportDirNotify message informs any bystanders that the authority for +the region is temporarily ambiguous. In particular, bystanders who +are trimming items from their cache must send MCacheExpire messages to +both the old and new authorities. This is necessary to ensure that +the surviving authority reliably receives all expirations even if the +importer or exporter fails. While the subtree is frozen (on both the +importer and exporter), expirations will not be immediately processed; +instead, they will be queued until the region is unfrozen and it can +be determined that the node is or is not authoritative for the region. + +The MExportDir message sends the actual subtree metadata to the importer. +Upon receipt, the importer inserts the data into its cache, logs a +copy in the EImportStart, and replies with an MExportDirAck. The exporter +can now log an EExport, which ultimately specifies that +the export was a success. In the presence of failures, it is the +existence of the EExport that disambiguates authority during recovery. + +Once logged, the exporter will send an MExportDirNotify to any +bystanders, informing them that the authority is no longer ambiguous +and cache expirations should be sent only to the new authority (the +importer). Once these are acknowledged, implicitly flushing the +bystander to exporter message streams of any stray expiration notices, +the exporter unfreezes the subtree, cleans up its state, and sends a +final MExportDirFinish to the importer. Upon receipt, the importer logs +an EImportFinish(true), unfreezes its subtree, and cleans up its +state. + + +PARTIAL FAILURE RECOVERY + + + +RECOVERY FROM JOURNAL + + + + +