Introduction to the cachefilesd Caching Project

The original text is quoted from here.

FS Cache

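FS-Cache is a kernel facility through which network filesystems and other filesystems can cache data on local disk, reducing the amount of data transferred over the network and thereby improving performance. The benefit is greatest when the network is slow.

FS-Cache can be used by any filesystem that wants to add local caching, for example AFS, NFS, CIFS and Isofs.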

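FS-Cache is transparent to the client filesystem: when the facility is enabled, the client does not notice that file requests go through the cache. As the figure below shows, FS-Cache can be viewed as the intermediary between network filesystems and cache backends:
[Figure: FS-Cache as the intermediary between network filesystems and cache backends]

A more detailed figure follows. FS-Cache gives the network filesystem a caching facility while keeping the cache invisible to the user.
[Figure: more detailed view of FS-Cache]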

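FS-Cache does not follow the idea of completely loading every opened netfs file into the cache before allowing it to be accessed, for the following reasons:

  1. It must be possible to operate normally without a cache.
  2. The size of any accessed file must not be limited to the size of the cache.
  3. The combined size of all opened files must not be limited to the size of the cache.
  4. The user should not be forced to download and cache an entire file just to access a small portion of it once.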

FS-Cache provides the following facilities:

1. More than one cache can be used at once. Caches can be selected explicitly by use of tags.
2. Caches can be added / removed at any time.
3. The netfs is provided with an interface that allows either party to withdraw caching facilities from a file (required for (2)).
4. The interface to the netfs returns as few errors as possible, preferring rather to let the netfs remain oblivious.
5. Cookies are used to represent indices, files and other objects to the netfs. The simplest cookie is just a NULL pointer - indicating nothing cached there.
6. The netfs is allowed to propose - dynamically - any index hierarchy it desires, though it must be aware that the index search function is recursive, stack space is limited, and indices can only be children of indices.
7. Data I/O is done direct to and from the netfs’s pages. The netfs indicates that page A is at index B of the data-file represented by cookie C, and that it should be read or written. The cache backend may or may not start I/O on that page, but if it does, a netfs callback will be invoked to indicate completion. The I/O may be either synchronous or asynchronous.
8. Cookies can be “retired” upon release. At this point FS-Cache will mark them as obsolete and the index hierarchy rooted at that point will get recycled.
9. The netfs provides a “match” function for index searches. In addition to saying whether a match was made or not, this can also specify that an entry should be updated or deleted.
10. As much as possible is done asynchronously.

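FS-Cache maintains a complete index of the network filesystem data. This information may reside in one or more caches, as shown in the figure below:

[Figure: netfs index hierarchies held in one or more caches]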

In the example above, you can see two netfs’s being backed: NFS and AFS. These have different index hierarchies:

* The NFS primary index contains per-server indices. Each server index is indexed by NFS file handles to get data file objects. Each data file object can have an array of pages, but may also have further child objects, such as extended attributes and directory entries. Extended attribute objects themselves have page-array contents.
* The AFS primary index contains per-cell indices. Each cell index contains per-logical-volume indices. Each volume index contains up to three indices for the read-write, read-only and backup mirrors of those volumes. Each of these contains vnode data file objects, each of which contains an array of pages.

Cache Management Inside the Kernel

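In brief, FS-Cache tracks two kinds of in-kernel objects: cookies (struct fscache_cookie), which represent what the network filesystem is interested in, and, separately, the objects that a cache backend is actively caching. The details are as follows: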

FS-Cache maintains an in-kernel representation of each object that a netfs is currently interested in. Such objects are represented by the fscache_cookie struct and are referred to as cookies.

FS-Cache also maintains a separate in-kernel representation of the objects that a cache backend is currently actively caching. Such objects are represented by the fscache_object struct. The cache backends allocate these upon request, and are expected to embed them in their own representations. These are referred to as objects.


There is a 1:N relationship between cookies and objects. A cookie may be represented by multiple objects - an index may exist in more than one cache - or even by no objects (it may not be cached).

Furthermore, both cookies and objects are hierarchical. The two hierarchies correspond, but the cookies tree is a superset of the union of the object trees of multiple caches:

    NETFS INDEX TREE               :      CACHE 1     :      CACHE 2
                                   :                  :
                                   :   +-----------+  :
                          +----------->|  IObject  |  :
      +-----------+       |        :   +-----------+  :
      |  ICookie  |-------+        :         |        :
      +-----------+       |        :         |        :   +-----------+
            |             +------------------------------>|  IObject  |
            |                      :         |        :   +-----------+
            |                      :         V        :         |
            |                      :   +-----------+  :         |
            V             +----------->|  IObject  |  :         |
      +-----------+       |        :   +-----------+  :         |
      |  ICookie  |-------+        :         |        :         V
      +-----------+       |        :         |        :   +-----------+
            |             +------------------------------>|  IObject  |
      +-----+-----+                :         |        :   +-----------+
      |           |                :         |        :         |
      V           |                :         V        :         |
+-----------+     |                :   +-----------+  :         |
|  ICookie  |------------------------->|  IObject  |  :         |
+-----------+     |                :   +-----------+  :         |
      |           V                :         |        :         V
      |     +-----------+          :         |        :   +-----------+
      |     |  ICookie  |-------------------------------->|  IObject  |
      |     +-----------+          :         |        :   +-----------+
      V           |                :         V        :         |
+-----------+     |                :   +-----------+  :         |
|  DCookie  |------------------------->|  DObject  |  :         |
+-----------+     |                :   +-----------+  :         |
                  |                :                  :         |
          +-------+-------+        :                  :         |
          |               |        :                  :         |
          V               V        :                  :         V
    +-----------+   +-----------+  :                  :   +-----------+
    |  DCookie  |   |  DCookie  |------------------------>|  DObject  |
    +-----------+   +-----------+  :                  :   +-----------+
                                   :                  :

In the above illustration, ICookie and IObject represent indices and DCookie and DObject represent data storage objects. Indices may have representation in multiple caches, but currently, non-index objects may not. Objects of any type may also be entirely unrepresented.

As far as the netfs API goes, the netfs is only actually permitted to see pointers to the cookies. The cookies themselves and any objects attached to those cookies are hidden from it.
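
The cookie/object relationship described above can be modelled in a few lines of Python. This is only a toy sketch for intuition: the class names `Cookie` and `CacheObject`, the `attach` method and the cache tags are hypothetical and are not the kernel's C structures or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheObject:
    """Toy stand-in for fscache_object: one cache's view of a cookie."""
    cache_tag: str
    kind: str  # "index" or "data"


@dataclass
class Cookie:
    """Toy stand-in for fscache_cookie: the netfs's view of an object."""
    name: str
    kind: str  # "index" or "data"
    parent: Optional["Cookie"] = None
    objects: list = field(default_factory=list)  # zero, one or many backing objects

    def attach(self, obj: CacheObject) -> None:
        if obj.kind != self.kind:
            raise ValueError("object kind must match cookie kind")
        # Per the text, only indices may be represented in more than one cache.
        if self.kind == "data" and self.objects:
            raise ValueError("a data cookie may be backed by at most one cache")
        self.objects.append(obj)


# A small hierarchy: index -> index -> data files (names are illustrative).
root = Cookie("nfs", "index")
server = Cookie("server-1", "index", parent=root)
datafile = Cookie("file-handle-A", "data", parent=server)
uncached = Cookie("file-handle-B", "data", parent=server)

server.attach(CacheObject("cache1", "index"))   # index present in two caches
server.attach(CacheObject("cache2", "index"))
datafile.attach(CacheObject("cache1", "data"))  # data object in exactly one cache
# `uncached` has no objects at all: it is simply not cached.
```

The `objects` list captures the 1:N relationship: an index cookie may have several entries, a data cookie at most one, and any cookie may have none.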

Object Management State Machine

Within FS-Cache, each active object is managed by its own individual state machine. The state for an object is kept in the fscache_object struct, in object->state. A cookie may point to a set of objects that are in different states.

Each state has an action associated with it that is invoked when the machine wakes up in that state. There are four logical sets of states:

* Preparation: states that wait for the parent objects to become ready. The representations are hierarchical, and it is expected that an object must be created or accessed with respect to its parent object.
* Initialisation: states that perform lookups in the cache and validate what’s found and that create on disk any missing metadata.
* Normal running: states that allow netfs operations on objects to proceed and that update the state of objects.
* Termination: states that detach objects from their netfs cookies, that delete objects from disk, that handle disk and system errors and that free up in-memory resources.

In most cases, transitioning between states is in response to signalled events. When a state has finished processing, it will usually set the mask of events in which it is interested (object->event_mask) and relinquish the worker thread. Then when an event is raised (by calling fscache_raise_event()), if the event is not masked, the object will be queued for processing (by calling fscache_enqueue_object()).
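
The event-mask behaviour in the paragraph above can be illustrated with a small Python model. It is a sketch under assumptions: the `ToyObject` class, the Python functions `raise_event` and `enqueue_object`, and the two event names are hypothetical stand-ins, not the kernel's `fscache_raise_event()` and `fscache_enqueue_object()` themselves.

```python
from collections import deque


class ToyObject:
    def __init__(self, name, event_mask=0):
        self.name = name
        self.event_mask = event_mask  # events the current state is interested in
        self.events = 0               # events raised so far


work_queue = deque()  # stands in for the pool of worker threads


def enqueue_object(obj):
    # Queue the object's state machine for processing, once.
    if obj not in work_queue:
        work_queue.append(obj)


def raise_event(obj, event_bit):
    # Record the event; only wake the state machine if the event is unmasked.
    obj.events |= event_bit
    if obj.event_mask & event_bit:
        enqueue_object(obj)


# Hypothetical event bits, for illustration only.
EV_PARENT_READY = 1 << 0
EV_UPDATE = 1 << 1

obj = ToyObject("file", event_mask=EV_PARENT_READY)
raise_event(obj, EV_UPDATE)        # not in the mask: recorded, not queued
raise_event(obj, EV_PARENT_READY)  # in the mask: object is queued
```

After these two calls the object sits in the queue exactly once, and both events are recorded in `obj.events` for the state action to inspect when it runs.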

Provision of CPU Time

The work to be done by the various states was given CPU time by the threads of the slow work facility. This was used in preference to the workqueue facility because:

Threads may be completely occupied for very long periods of time by a particular work item. These state actions may be doing sequences of synchronous, journalled disk accesses (lookup, mkdir, create, setxattr, getxattr, truncate, unlink, rmdir, rename).

Threads may do little actual work, but may rather spend a lot of time sleeping on I/O. This means that single-threaded and 1-per-CPU-threaded workqueues don’t necessarily have the right numbers of threads.

Locking Simplification

Because only one worker thread may be operating on any particular object’s state machine at once, this simplifies the locking, particularly with respect to disconnecting the netfs’s representation of a cache object (fscache_cookie) from the cache backend’s representation (fscache_object) - which may be requested from either end.

The Set of States
The object state machine has a set of states that it can be in. There are preparation states in which the object sets itself up and waits for its parent object to transit to a state that allows access to its children:

1. State FSCACHE_OBJECT_INIT.

Initialise the object and wait for the parent object to become active. In the cache, it is expected that it will not be possible to look an object up from the parent object, until that parent object itself has been looked up.

There are initialisation states in which the object sets itself up and accesses disk for the object metadata:

2. State FSCACHE_OBJECT_LOOKING_UP.

Look up the object on disk, using the parent as a starting point. FS-Cache expects the cache backend to probe the cache to see whether this object is represented there, and if it is, to see if it’s valid (coherency management).

The cache should call fscache_object_lookup_negative() to indicate lookup failure for whatever reason, and should call fscache_obtained_object() to indicate success.

At the completion of lookup, FS-Cache will let the netfs go ahead with read operations, no matter whether the file is yet cached. If not yet cached, read operations will be immediately rejected with ENODATA until the first known page is uncached - as to that point there can be no data to be read out of the cache for that file that isn’t currently also held in the pagecache.

3. State FSCACHE_OBJECT_CREATING.

Create an object on disk, using the parent as a starting point. This happens if the lookup failed to find the object, or if the object’s coherency data indicated what’s on disk is out of date. In this state, FS-Cache expects the cache to create

The cache should call fscache_obtained_object() if creation completes successfully, fscache_object_lookup_negative() otherwise.

At the completion of creation, FS-Cache will start processing write operations the netfs has queued for an object. If creation failed, the write ops will be transparently discarded, and nothing recorded in the cache.

There are some normal running states in which the object spends its time servicing netfs requests:

4. State FSCACHE_OBJECT_AVAILABLE.

A transient state in which pending operations are started, child objects are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary lookup data is freed.

5. State FSCACHE_OBJECT_ACTIVE.

The normal running state. In this state, requests the netfs makes will be passed on to the cache.

6. State FSCACHE_OBJECT_INVALIDATING.

The object is undergoing invalidation. When the state comes here, it discards all pending read, write and attribute change operations as it is going to clear out the cache entirely and reinitialise it. It will then continue to the FSCACHE_OBJECT_UPDATING state.

7. State FSCACHE_OBJECT_UPDATING.

The state machine comes here to update the object in the cache from the netfs’s records. This involves updating the auxiliary data that is used to maintain coherency.

And there are terminal states in which an object cleans itself up, deallocates memory and potentially deletes stuff from disk:

8. State FSCACHE_OBJECT_LC_DYING.

The object comes here if it is dying because of a lookup or creation error. This would be due to a disk error or system error of some sort. Temporary data is cleaned up, and the parent is released.

9. State FSCACHE_OBJECT_DYING.

The object comes here if it is dying due to an error, because its parent cookie has been relinquished by the netfs or because the cache is being withdrawn.

Any child objects waiting on this one are given CPU time so that they too can destroy themselves. This object waits for all its children to go away before advancing to the next state.

10. State FSCACHE_OBJECT_ABORT_INIT.

The object comes to this state if it was waiting on its parent in FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself so that the parent may proceed from the FSCACHE_OBJECT_DYING state.

11. State FSCACHE_OBJECT_RELEASING.

12. State FSCACHE_OBJECT_RECYCLING.

The object comes to one of these two states when dying once it is rid of all its children, if it is dying because the netfs relinquished its cookie. In the first state, the cached data is expected to persist, and in the second it will be deleted.

13. State FSCACHE_OBJECT_WITHDRAWING.

The object transits to this state if the cache decides it wants to withdraw the object from service, perhaps to make space, but also due to error or just because the whole cache is being withdrawn.

14. State FSCACHE_OBJECT_DEAD.

The object transits to this state when the in-memory object record is ready to be deleted. The object processor shouldn’t ever see an object in this state.
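
As a reading aid, the fourteen states listed above can be grouped into the four logical sets named earlier. The Python sketch below does only that grouping; the `Phase` enum and the `states_in` helper are hypothetical names introduced here, and the mapping simply follows the order in which the text presents the states.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = "preparation"
    INITIALISATION = "initialisation"
    NORMAL_RUNNING = "normal running"
    TERMINATION = "termination"


# State name -> logical set, following the listing in the text.
STATE_PHASE = {
    "FSCACHE_OBJECT_INIT": Phase.PREPARATION,
    "FSCACHE_OBJECT_LOOKING_UP": Phase.INITIALISATION,
    "FSCACHE_OBJECT_CREATING": Phase.INITIALISATION,
    "FSCACHE_OBJECT_AVAILABLE": Phase.NORMAL_RUNNING,
    "FSCACHE_OBJECT_ACTIVE": Phase.NORMAL_RUNNING,
    "FSCACHE_OBJECT_INVALIDATING": Phase.NORMAL_RUNNING,
    "FSCACHE_OBJECT_UPDATING": Phase.NORMAL_RUNNING,
    "FSCACHE_OBJECT_LC_DYING": Phase.TERMINATION,
    "FSCACHE_OBJECT_DYING": Phase.TERMINATION,
    "FSCACHE_OBJECT_ABORT_INIT": Phase.TERMINATION,
    "FSCACHE_OBJECT_RELEASING": Phase.TERMINATION,
    "FSCACHE_OBJECT_RECYCLING": Phase.TERMINATION,
    "FSCACHE_OBJECT_WITHDRAWING": Phase.TERMINATION,
    "FSCACHE_OBJECT_DEAD": Phase.TERMINATION,
}


def states_in(phase):
    """Return the state names belonging to one logical set, in listing order."""
    return [name for name, p in STATE_PHASE.items() if p is phase]


for phase in Phase:
    print(phase.value, "->", ", ".join(states_in(phase)))
```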

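Introduction to CacheFiles

CacheFiles is a Linux kernel module used mainly to cache already mounted filesystems. It is a cache backend: when a filesystem is mounted locally, a cache directory can be set up on top of CacheFiles. CacheFiles uses a userspace daemon for cache management, such as reaping stale nodes and culling. This daemon is called cachefilesd.

The filesystem and data integrity of the cache is only as good as that of the filesystem providing the backing services. Because the journalling interfaces of the various filesystems are each very specific, CacheFiles does not attempt to journal anything.

CacheFiles creates a misc character device, "/dev/cachefiles", that is used to communicate with the daemon. Only one thing may have the device open at a time, and while it is open, a cache is at least partially in existence. The daemon opens the device and sends commands down it to control the cache. CacheFiles is currently limited to a single cache.

CacheFiles tries to keep a certain percentage of the filesystem free, shrinking the cache by culling some of the objects it contains when space is needed. This means the cache can live on the same medium as a live set of data: it expands to use spare space and contracts when that data needs more room.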

Requirements

Using CacheFiles requires the following features:

  • dnotify: used to watch for changes to files in a directory
  • extended attributes (xattrs)
  • openat() and friends
  • bmap() support on files in the filesystem (FIBMAP ioctl)
  • the use of bmap() to detect a partial page at the end of the file

Configuration

The configuration file is /etc/cachefilesd.conf. Its main contents are:

brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
Configure the culling limits. Optional. See the section on culling. The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.

The commands beginning with a ‘b’ are file space (block) limits, those beginning with an ‘f’ are file count limits.

dir <path>
Specify the directory containing the root of the cache. Mandatory.
tag <name>
Specify a tag to FS-Cache to use in distinguishing multiple caches. Optional. The default is “CacheFiles”.
debug <mask>
Specify a numeric bitmask to control debugging in the kernel module. Optional. The default is zero (all off). The following values can be OR’d into the mask to collect various information:

1 Turn on trace of function entry (_enter() macros)
2 Turn on trace of function exit (_leave() macros)
4 Turn on trace of internal debug points (_debug())
This mask can also be set through sysfs, eg:

echo 5 >/sys/module/cachefiles/parameters/debug
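
To make the directives above concrete, here is a minimal Python sketch that parses a configuration in this format and fills in the documented defaults. It is not the daemon's own parser: the function name `parse_conf`, the sample path `/var/cache/fscache`, the tag `mycache` and the treatment of `#` as a comment marker are assumptions made for illustration.

```python
LIMIT_KEYS = ("brun", "bcull", "bstop", "frun", "fcull", "fstop")

# Defaults quoted in the text: 7% (run), 5% (cull), 1% (stop).
DEFAULTS = {"brun": 7, "bcull": 5, "bstop": 1,
            "frun": 7, "fcull": 5, "fstop": 1,
            "tag": "CacheFiles", "debug": 0}

SAMPLE = """\
# illustrative values only
dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
debug 5
"""


def parse_conf(text):
    """Parse 'keyword value' lines into a dict, applying the defaults."""
    conf = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"missing value: {raw!r}")
        key, value = parts[0], parts[1].strip()
        if key in LIMIT_KEYS:
            conf[key] = int(value.rstrip("%"))
        elif key == "debug":
            conf[key] = int(value, 0)
        elif key in ("dir", "tag"):
            conf[key] = value
        else:
            raise ValueError(f"unknown directive: {key!r}")
    if "dir" not in conf:
        raise ValueError("'dir' is mandatory")
    return conf


conf = parse_conf(SAMPLE)
print(conf)
```

In this sample the block limits are overridden while the file-count limits keep their defaults, and a debug mask of 5 corresponds to function-entry tracing (1) plus internal debug points (4).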

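Starting the Service

Start the daemon. It opens the cache device (/dev/cachefiles), configures the cache and tells it to begin caching. At that point the cache binds to fscache and becomes live. The command and its flags are as follows: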

The daemon is run as follows:

/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
The flags are:

-d
Increase the debugging level. This can be specified multiple times and is cumulative with itself.
-s
Send messages to stderr instead of syslog.
-n
Don’t daemonise and go into background.
-f <configfile>
Use an alternative configuration file rather than the default one.
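
The flag combinations above can be assembled programmatically. The helper below is a hypothetical convenience function (the name `cachefilesd_argv` and its parameters are made up here); it only builds the argument list from the documented flags and does not start anything.

```python
def cachefilesd_argv(debug_level=0, to_stderr=False, foreground=False, config=None):
    """Build an argument list for /sbin/cachefilesd from the documented flags."""
    argv = ["/sbin/cachefilesd"]
    argv += ["-d"] * debug_level      # -d is cumulative, so repeat it
    if to_stderr:
        argv.append("-s")             # log to stderr instead of syslog
    if foreground:
        argv.append("-n")             # do not daemonise
    if config is not None:
        argv += ["-f", config]        # alternative configuration file
    return argv


# Example: a verbose foreground run against a test configuration (path is illustrative).
print(cachefilesd_argv(debug_level=2, to_stderr=True, foreground=True,
                       config="/tmp/cachefilesd-test.conf"))
```

The resulting list could be handed to `subprocess.run` on a machine where the daemon is installed and the caller has sufficient privileges.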

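Cache Culling

The cache needs to be culled occasionally to free up space. Culling removes objects that have not been used recently, judged by file access time; empty directories that are not in use are culled too. Culling is driven by configured percentages of blocks and of files on the underlying filesystem. There are six limits, as follows: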

brun, frun
If the amount of free space and the number of available files in the cache rises above both these limits, then culling is turned off.
bcull, fcull
If the amount of available space or the number of available files in the cache falls below either of these limits, then culling is started.
bstop, fstop
If the amount of available space or the number of available files in the cache falls below either of these limits, then no further allocation of disk space or files is permitted until culling has raised things above these limits again.
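
The three pairs of limits form a hysteresis: culling starts below the cull limits and only stops once both run limits are exceeded. The Python sketch below models that rule as stated in the text. The function names and the `limits` dictionary are hypothetical, and the handling of values exactly equal to a limit is a choice made for this sketch, not something the text specifies.

```python
DEFAULT_LIMITS = {"brun": 7, "bcull": 5, "bstop": 1,
                  "frun": 7, "fcull": 5, "fstop": 1}


def next_culling(culling, free_blocks_pct, free_files_pct, limits=DEFAULT_LIMITS):
    """Return whether culling should be active after observing the free percentages."""
    # Either resource below its cull limit: start culling.
    if free_blocks_pct < limits["bcull"] or free_files_pct < limits["fcull"]:
        return True
    # Both resources above their run limits: stop culling.
    if free_blocks_pct > limits["brun"] and free_files_pct > limits["frun"]:
        return False
    # In between: keep doing whatever we were doing.
    return culling


def allocation_allowed(free_blocks_pct, free_files_pct, limits=DEFAULT_LIMITS):
    """Allocation is refused while either resource is below its stop limit."""
    return (free_blocks_pct >= limits["bstop"]
            and free_files_pct >= limits["fstop"])


state = False
for blocks, files in [(20, 50), (6, 50), (4, 50), (6, 50), (8, 50)]:
    state = next_culling(state, blocks, files)
    print(blocks, files, "culling" if state else "idle",
          "alloc ok" if allocation_allowed(blocks, files) else "alloc refused")
```

Note how 6% free blocks leaves the state unchanged in both directions: it does not start culling on the way down, and it does not stop culling on the way back up.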

The limits are normally configured so that:

0 <= bstop < bcull < brun < 100
0 <= fstop < fcull < frun < 100
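
A configuration can be checked against these two inequalities before it is handed to the daemon. The following is a small sketch; the function name `limits_are_ordered` and the dictionary layout are assumptions of this example.

```python
def limits_are_ordered(limits):
    """Check 0 <= stop < cull < run < 100 for both block and file limits."""
    blocks_ok = 0 <= limits["bstop"] < limits["bcull"] < limits["brun"] < 100
    files_ok = 0 <= limits["fstop"] < limits["fcull"] < limits["frun"] < 100
    return blocks_ok and files_ok


good = {"brun": 7, "bcull": 5, "bstop": 1, "frun": 7, "fcull": 5, "fstop": 1}
bad = dict(good, bcull=9)  # cull limit above run limit

print(limits_are_ordered(good), limits_are_ordered(bad))
```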

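Note that these values are percentages of available space and available files; they are not 100 minus the percentage shown by df. The userspace daemon scans the cache to build up a table of objects to be culled, and then culls them in least recently used order. A new scan of the cache is started as soon as space is made in the table. An object is skipped if its atime (last access time) has changed, or if the kernel module reports that it is still in use.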
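
To get a feel for the quantities the limits are compared against, the sketch below reads the available block and file percentages of a filesystem with `os.statvfs`. This is an approximation for inspection only: the function name is hypothetical, and the kernel module's exact accounting is not described in this text and may differ.

```python
import os


def available_percentages(path):
    """Return (available blocks %, available files %) for the filesystem at path."""
    st = os.statvfs(path)
    # f_bavail / f_favail: blocks / inodes available to unprivileged users.
    blocks_pct = 100.0 * st.f_bavail / st.f_blocks if st.f_blocks else 0.0
    files_pct = 100.0 * st.f_favail / st.f_files if st.f_files else 0.0
    return blocks_pct, files_pct


blocks_pct, files_pct = available_percentages("/")
print(f"available blocks: {blocks_pct:.1f}%  available files: {files_pct:.1f}%")
```

Some filesystems report zero total inodes, in which case this sketch returns 0.0 for the file percentage rather than dividing by zero.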

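Cache Structure

There are two directories:

  • cache/
  • graveyard/

Active cache objects live in the cache/ directory. The CacheFiles kernel module moves objects that are no longer in use or that have been culled into the graveyard/ directory. The daemon uses dnotify to monitor graveyard/ and deletes whatever appears there.

  • The CacheFiles module represents index objects as directories, with names of the form "I…" or "J…".

  • Data objects without children are represented as files, and data objects with children are represented as directories. The names are of the form "D…" or "E…". When the object is a directory, it contains a file named "data" that holds the actual data.

  • Special objects are similar to data objects, except that their names are of the form "S…" or "T…".

If an object has children, it is represented as a directory. Under that directory is a set of subdirectories, each named "@" followed by the hash value of the child object, as shown below: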

 /INDEX    /INDEX     /INDEX                            /DATA FILES
/=========/==========/=================================/================
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry

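If a name is too long and exceeds NAME_MAX, it is split into multiple pieces. The leading pieces are used to create nested directories, and the last piece is placed in the final directory. The name of each intermediate directory is prefixed with "+", for example: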

J1223/@23/+xy...z/+kl...m/Epqr
Note that keys are raw data, and not only may they exceed NAME_MAX in size, they may also contain things like ‘/’ and NUL characters, and so they may not be suitable for turning directly into a filename.

To handle this, CacheFiles will use a suitably printable filename directly and “base-64” encode ones that aren’t directly suitable. The two versions of object filenames indicate the encoding:

OBJECT TYPE     PRINTABLE       ENCODED
Index           “I…”            “J…”
Data            “D…”            “E…”
Special         “S…”            “T…”
Intermediate directories are always “@” or “+” as appropriate.

Each object in the cache has an extended attribute label that holds the object type ID (required to distinguish special objects) and the auxiliary data from the netfs. The latter is used to detect stale objects in the cache and update or retire them.

Note that CacheFiles will erase from the cache any file it doesn’t recognise or any file of an incorrect type (such as a FIFO file or a device file).
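
The naming scheme above can be read mechanically from the first character of each path component. The Python sketch below classifies the components of a path relative to the cache/ directory. The `PREFIXES` table restates the prefixes given in the text; the function names are hypothetical, and the sample path is the example shown earlier, elision included.

```python
# First character of a component -> (object type, key encoding).
PREFIXES = {
    "I": ("index", "printable"),
    "J": ("index", "encoded"),
    "D": ("data", "printable"),
    "E": ("data", "encoded"),
    "S": ("special", "printable"),
    "T": ("special", "encoded"),
    "@": ("hash directory", None),
    "+": ("key continuation", None),
}


def classify(component):
    """Classify one path component by its leading character."""
    if not component:
        return ("unrecognised", None)
    return PREFIXES.get(component[0], ("unrecognised", None))


def describe(relative_path):
    """Return (component, type, encoding) for each component below cache/."""
    return [(part, *classify(part)) for part in relative_path.split("/") if part]


# Example path from the listing above, relative to the cache/ directory.
path = "@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry"
for part, kind, encoding in describe(path):
    print(f"{part:32} {kind:16} {encoding or ''}")
```

For the sample path this yields alternating hash directories and objects: a printable index, an encoded index, and finally an encoded data object.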

Reading Notes (Reposted)

yum install cachefilesd; 

Mount command: mount the directory shared by the server directly onto the local /mnt directory. The -o fsc option must be used;

All access to files under /mount/point will go through the cache, unless the file is opened for direct I/O or writing;

Opening a file from a shared file system for direct I/O automatically bypasses the cache. This is because this type of access must be direct to the server.

To avoid coherency management problems between superblocks, all NFS superblocks that wish to cache data have unique Level 2 keys. Normally, two NFS mounts with same source volume and options share a superblock, and thus share the caching, even if they mount different directories within that volume.


Opening a file from a shared file system for writing will not work on NFS version 2 and 3, because these versions do not provide enough coherency information to keep concurrent writes consistent;

Furthermore, this release of FS-Cache only caches regular NFS files. FS-Cache will not cache directories, symlinks, device files, FIFOs and sockets. In other words, only file data is cached.
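
Returning to the mount step in the notes above, the sketch below builds an NFS mount command that always includes the fsc option. The helper name `nfs_mount_argv` and the export `server:/export` are hypothetical; the function only constructs the argument list and does not run it.

```python
def nfs_mount_argv(source, mount_point, extra_options=()):
    """Build a mount command line for NFS with the fsc option always present."""
    options = ["fsc", *extra_options]
    return ["mount", "-t", "nfs", "-o", ",".join(options), source, mount_point]


# Illustrative export and mount point; running the command requires root.
print(" ".join(nfs_mount_argv("server:/export", "/mnt")))
print(" ".join(nfs_mount_argv("server:/export", "/mnt", extra_options=["ro"])))
```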