Posted on 2012-02-16 19:55

Redis may drop its current VM mechanism in favor of a new diskstore model

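Salvatore Sanfilippo (@antirez), the author of Redis, posted a message to the Redis Google Group today saying that he is not satisfied with Redis's current VM mechanism and has started reworking it into a new implementation. The main points are translated below. For the original, see: http://goo.gl/uMKQN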

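There are roughly three ways to persist data to disk:

1. Use virtual memory: keep cold data on disk and hot data in memory, and maintain a mapping between the two. (This is what Redis does today.)
2. Store the data on disk through memory-mapped files, operate on the mapped data directly, and let the operating system's cache act as the buffer layer. (The author calls this the MongoDB way.)
3. Store the data on disk in a custom format, but never operate on the disk directly; operate on memory instead, and write the in-memory data to disk under certain conditions. (This is the new approach the author plans to use.)

Drawbacks of the current VM: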

1. Slow restart
2. Slow saving
3. Slow replication, as a consequence of the two points above
4. Complex code

In the end the third approach was chosen. The implementation is called diskstore, and the following points describe how it works:

- In diskstore, key-value pairs are stored on disk.

- Memory works as a cache for live objects. Operations are only performed on in-memory keys, so data on disk does not need to be stored in complex forms. The on-disk layout can therefore be whatever is most convenient.

- The cache-max-memory limit is strict. Redis will never use more RAM, even if we have 2 MB of max memory and 1 billion keys. This works because we no longer need to keep keys in memory.
(Translator's note: so not even the keys are kept in memory? Is that the meaning, and how can this work?)

- Data is flushed to disk asynchronously. If a key is marked as dirty, an I/O operation is scheduled for this key.

- You can control the delay between modifications of keys and disk writes, so that if a key is modified many times within a short period, it will be written to disk only once.

- Setting the delay to 0 means: sync it as fast as possible.

- All I/O is performed by a single dedicated thread that is long-running and not spawned on demand. The thread is awakened with a condition variable.

- The system is much simpler and saner than the VM implementation, as there is no need to “undo” operations on race conditions.

- Zero start-up time… as objects are loaded on demand.

- There is negative caching. If a key is not on disk we remember it (if there is memory to do so). So we avoid accessing the disk again and again for keys that are not there.

- The system is very fast if we access mostly our working set, and this working set happens to fit in memory. Otherwise the system is much slower (I/O bound).
(Translator's note: this is unavoidable.)

- The system does not support BGSAVE currently, but will support it, and what is cool, with minimal overhead and memory use in the saving child, as data on disk is already written using the same serialization format as .rdb files. So our child will just copy files to obtain the .rdb. In the meantime the objects in cache are not flushed, so the system may use more memory, but it's not about copy-on-write, so it will use very little additional memory.
(Translator's note: I do not fully understand the final "but" here.)

- Persistence is *PER KEY*; this means there is no point-in-time persistence. Writes reach disk key by key, so there is no single moment at which the on-disk data reflects the whole dataset.
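
To make the flushing behaviour in the list above more concrete, here is a minimal Python sketch of the dirty-key idea: writes touch memory only, a single long-running I/O thread is woken through a condition variable, repeated writes inside the delay window collapse into one disk write, and misses are remembered in a negative cache. This is only an illustration and not Redis code. The class name `ToyDiskStore`, its attributes, and the `flush_delay` parameter are invented for this example, a plain dict stands in for the disk, and eviction under cache-max-memory is not modeled.

```python
import threading
import time


class ToyDiskStore:
    """Toy write-behind cache. A dict plays the role of the disk (assumption)."""

    def __init__(self, flush_delay=0.05):
        self.disk = {}         # stand-in for on-disk storage
        self.cache = {}        # live objects held in memory
        self.missing = set()   # negative cache: keys known to be absent on disk
        self.dirty = {}        # key -> time of its first unflushed modification
        self.disk_reads = 0    # number of simulated disk lookups
        self.disk_writes = 0   # number of simulated disk writes
        self.flush_delay = flush_delay
        self.running = True
        self.cond = threading.Condition()
        # One dedicated I/O thread, started once and never spawned on demand.
        self.io_thread = threading.Thread(target=self._io_loop, daemon=True)
        self.io_thread.start()

    def set(self, key, value):
        with self.cond:
            self.cache[key] = value
            self.missing.discard(key)
            # Keep the earliest timestamp so repeated writes share one flush.
            self.dirty.setdefault(key, time.monotonic())
            self.cond.notify()  # wake the I/O thread

    def get(self, key):
        with self.cond:
            if key in self.cache:
                return self.cache[key]
            if key in self.missing:
                return None     # answered without touching the disk
            self.disk_reads += 1
            if key in self.disk:
                self.cache[key] = self.disk[key]  # loaded on demand
                return self.cache[key]
            self.missing.add(key)
            return None

    def _io_loop(self):
        # For simplicity the lock is held while "writing"; a real system
        # would not hold it during actual disk I/O.
        with self.cond:
            while self.running or self.dirty:
                now = time.monotonic()
                due = [k for k, t in self.dirty.items()
                       if not self.running or now - t >= self.flush_delay]
                for key in due:
                    self.disk[key] = self.cache[key]
                    self.disk_writes += 1
                    del self.dirty[key]
                if self.dirty:
                    # Sleep until the oldest dirty key becomes due.
                    oldest = min(self.dirty.values())
                    remaining = oldest + self.flush_delay - time.monotonic()
                    self.cond.wait(timeout=max(0.0, remaining))
                elif self.running:
                    self.cond.wait()  # nothing to do until the next set()

    def close(self):
        """Stop the I/O thread after flushing every key that is still dirty."""
        with self.cond:
            self.running = False
            self.cond.notify()
        self.io_thread.join()


store = ToyDiskStore(flush_delay=1.0)
for i in range(100):
    store.set("counter", i)   # many modifications inside the delay window
store.get("no-such-key")      # first miss goes to the "disk"
store.get("no-such-key")      # second miss is served by the negative cache
store.close()
print(store.disk["counter"], store.disk_writes, store.disk_reads)
```

In this toy model, passing `flush_delay=0` makes every dirty key due as soon as the I/O thread wakes up, which corresponds to the "sync it as fast as possible" setting. Because each key is flushed on its own schedule, the dict that stands in for the disk never holds a snapshot of the whole dataset at one instant, which is the per-key persistence caveat from the last point.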

Related link: an article on the same topic by timyang on Sina Weibo, "Redis's new storage mode: diskstore"

如果有一天21

Posted on 2012-02-17 22:20

Thanks for sharing.
