Shard Keys¶
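The shard key determines how the documents of a collection are distributed across the shards in the cluster. The shard key fields must be indexed and must be present in every document in the collection. The shard key can be a single field or a compound of several fields.
MongoDB partitions data across shards using ranges of shard key values. Each range, also called a chunk, defines a non-overlapping range of shard key values. MongoDB distributes the chunks, together with the documents they contain, across the shards in the cluster.
When a chunk grows beyond the maximum chunk size, MongoDB splits it into smaller chunks based on the shard key ranges.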
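As a rough illustration of range-based chunks, the sketch below models chunks as half-open ranges over a numeric shard key, routes a key value to the chunk that contains it, and splits one chunk in two. The names `chunks`, `findChunk`, and `splitChunk` are hypothetical and exist only for this example; MongoDB chooses its own split points and uses minKey/maxKey rather than infinities.

```javascript
// Hypothetical model: each chunk covers the half-open range [min, max).
// -Infinity / Infinity stand in for MongoDB's minKey / maxKey.
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" },
];

// Return the single chunk whose range contains the shard key value.
function findChunk(chunkList, keyValue) {
  return chunkList.find((c) => keyValue >= c.min && keyValue < c.max);
}

// Split one chunk into two non-overlapping chunks at splitPoint.
// Both halves stay on the original shard until a migration moves one.
function splitChunk(chunkList, chunk, splitPoint) {
  if (!(splitPoint > chunk.min && splitPoint < chunk.max)) {
    throw new Error("split point must fall strictly inside the chunk range");
  }
  const i = chunkList.indexOf(chunk);
  const left = { min: chunk.min, max: splitPoint, shard: chunk.shard };
  const right = { min: splitPoint, max: chunk.max, shard: chunk.shard };
  return [...chunkList.slice(0, i), left, right, ...chunkList.slice(i + 1)];
}

const target = findChunk(chunks, 150);
const afterSplit = splitChunk(chunks, target, 150);
console.log(target.shard, afterSplit.length);
```

Because the ranges never overlap, every shard key value maps to exactly one chunk both before and after the split.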
Important
Once you shard a collection, the shard key and the shard key values are immutable; i.e.
- You cannot update the values of the shard key fields.
The shard key cannot be changed after it has been written. See the limits on collections for more information.
Shard Key Specification¶
To shard a collection, you must specify the target collection and the shard key to the sh.shardCollection() method:
sh.shardCollection( namespace, key )
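A minimal sketch of such a call follows. The namespace `records.people` and the field `zipcode` are hypothetical placeholders, not names from this page. The call is guarded so that the snippet does nothing when `sh` is unavailable, which is the case anywhere other than a mongo shell connected to a mongos.

```javascript
// Hypothetical namespace in "<database>.<collection>" form.
const namespace = "records.people";

// Hypothetical ranged shard key on a single field.
const key = { zipcode: 1 };

// `sh` exists only inside the mongo shell, so guard the call
// to keep the snippet harmless in any other JavaScript runtime.
if (typeof sh !== "undefined") {
  sh.shardCollection(namespace, key);
}
```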
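If you shard an empty collection using a hashed shard key, MongoDB automatically creates and migrates chunks so that each shard has two chunks. To control how many chunks MongoDB creates initially, specify the numInitialChunks parameter when running shardCollection, or split the chunks manually with the split command.
To shard a collection using a hashed shard key, see /tutorial/shard-collection-with-a-hashed-shard-key .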
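The following sketch shows one way the numInitialChunks parameter might be passed to the shardCollection command. The namespace `records.events` and the value 8 are assumptions chosen for illustration. The call is guarded because `db` is only defined in the mongo shell.

```javascript
// Command document for shardCollection with a hashed shard key.
const shardCommand = {
  shardCollection: "records.events", // hypothetical namespace
  key: { _id: "hashed" },            // hashed shard key on _id
  numInitialChunks: 8,               // assumed value, for illustration only
};

// `db` exists only in the mongo shell; the command is run
// against the admin database through a mongos.
if (typeof db !== "undefined") {
  db.adminCommand(shardCommand);
}
```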
Shard Key Indexes¶
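When you query a collection that is sharded with a hashed shard key, MongoDB computes the hash values automatically. Applications do not need to compute the hashes themselves.
- If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exist.
The shard key affects how data is distributed across shards and how efficiently mongos can route operations to the cluster, so it affects the read and write performance of the cluster. Consider how the following operations are affected by the shard key.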
If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.
Unique Indexes¶
For a sharded collection, only the _id field index and the index on the shard key or a compound index where the shard key is a prefix can be unique:
- You cannot shard a collection that has unique indexes on other fields.
- You cannot create unique indexes on other fields for a sharded collection.
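The rule above can be sketched as a small check. This is an illustrative helper, not a MongoDB API: `uniqueIndexAllowed` is a hypothetical name, and the sketch compares field names only, ignoring index direction and index type.

```javascript
// Return true if a unique index with key pattern indexKey is allowed on a
// collection sharded by shardKey, following the rule stated above.
function uniqueIndexAllowed(indexKey, shardKey) {
  const indexFields = Object.keys(indexKey);
  const shardFields = Object.keys(shardKey);

  // The index on the _id field alone may be unique.
  if (indexFields.length === 1 && indexFields[0] === "_id") return true;

  // Otherwise the shard key fields must be a prefix of the index fields
  // (an exact match counts as a prefix).
  if (shardFields.length > indexFields.length) return false;
  return shardFields.every((field, i) => indexFields[i] === field);
}

const shardKey = { zipcode: 1 }; // hypothetical shard key
console.log(uniqueIndexAllowed({ zipcode: 1, name: 1 }, shardKey)); // true
console.log(uniqueIndexAllowed({ email: 1 }, shardKey));            // false
```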
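Some shard keys allow an application to reach the maximum write performance the cluster can provide, and others do not. Consider, for example, using the default _id field as the shard key.
When a document is inserted, MongoDB generates a globally unique ObjectId identifier for _id. Note, however, that the leading part of this identifier represents a timestamp, which means that _id increases in a regular and predictable way. Even though _id has high cardinality, if you use _id or any other monotonically increasing data as the shard key, all write operations are concentrated on a single shard.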
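To make the timestamp prefix concrete, the sketch below reads the first 4 bytes (8 hexadecimal characters) of an ObjectId string as seconds since the Unix epoch. The helper `objectIdTimestamp` and the two sample values are hypothetical and used only for illustration.

```javascript
// Read the creation time encoded in the first 4 bytes (8 hex characters)
// of an ObjectId's 24-character hex string, as seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// Hypothetical ObjectId values; the second has a timestamp one second later.
const earlier = "507f1f77bcf86cd799439011";
const later = "507f1f78bcf86cd799439012";

console.log(objectIdTimestamp(earlier)); // 1350508407
console.log(objectIdTimestamp(later) > objectIdTimestamp(earlier)); // true
```

Since newer identifiers carry later timestamps in their leading bytes, they sort after older ones, which is why a ranged shard key on _id sends new inserts toward the same end of the key range.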
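However, if your write rate is low, or most of your writes are update() operations, a monotonically increasing shard key will not affect performance much. In general, choose a shard key that has both high cardinality and the ability to distribute requests across the whole cluster.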
You cannot specify a unique constraint on a hashed index.
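Typically, a computed shard key has some degree of "randomness". For example, a shard key that includes a cryptographic hash (such as MD5 or SHA1) of other fields gives the cluster good write scaling. However, random shard keys usually do not provide query isolation, which is another important characteristic of a shard key.
MongoDB can shard a collection using a hashed shard key. Hashed shard keys provide good write scaling. See /tutorial/shard-collection-with-a-hashed-shard-key for more details.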
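The effect of hashing on a monotonically increasing key can be sketched as follows. This is a simplified model under stated assumptions: FNV-1a is a stand-in hash (MongoDB uses its own hash function for hashed indexes), the shard count of 4 is arbitrary, and keys are assigned by modulo rather than by ranges of hash values as in real hashed sharding.

```javascript
// FNV-1a (32-bit): a stand-in hash for illustration only.
// MongoDB's hashed indexes use their own hash function, not this one.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply, kept unsigned
  }
  return h;
}

const shardCount = 4; // hypothetical cluster size
const perShard = new Array(shardCount).fill(0);

// Monotonically increasing keys, such as a counter.
for (let key = 1000; key < 2000; key++) {
  perShard[fnv1a(String(key)) % shardCount] += 1;
}

console.log(perShard); // inserts per shard; every shard receives some
```

Consecutive key values land on different shards, so writes are spread out, but a range query over the original values can no longer be served by a single shard.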
The shard key affects the performance and efficiency of the sharding strategy used by the sharded cluster.
The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster.
At minimum, consider the consequences of the cardinality, frequency, and rate of change of a potential shard key.
See the mongos and config servers documentation for an overview of the cluster environment and for detailed information on queries.
For restrictions on shard keys, see Shard Key Limitations.
Collection Size¶
When sharding a collection that is not empty, the shard key can constrain the maximum supported collection size for the initial sharding operation only. See Sharding Existing Collection Data Size.
Important
A sharded collection can grow to any size after successful sharding.
Shard Key Cardinality¶
The cardinality of a shard key determines the maximum number of chunks the balancer can create. This can reduce or remove the effectiveness of horizontal scaling in the cluster.
A unique shard key value can exist on no more than a single chunk at any given time. If a shard key has a cardinality of 4, then there can be no more than 4 chunks within the sharded cluster, each storing one unique shard key value. This constrains the number of effective shards in the cluster to 4 as well - adding additional shards would not provide any benefit.
The following image illustrates a sharded cluster using the field X as the shard key. If X has low cardinality, the distribution of inserts may look similar to the following:
The cluster in this example would not scale horizontally, as incoming writes would only route to a subset of shards.
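The bound described above can be written down as a short sketch. The helpers `maxChunks` and `effectiveShards` and the sample values for X are hypothetical, introduced only to illustrate the arithmetic.

```javascript
// Upper bound on chunks: at most one chunk per distinct shard key value.
function maxChunks(shardKeyValues) {
  return new Set(shardKeyValues).size;
}

// Shards beyond the number of possible chunks cannot receive any data.
function effectiveShards(shardKeyValues, shardCount) {
  return Math.min(shardCount, maxChunks(shardKeyValues));
}

// Hypothetical field X with only four distinct values.
const xValues = ["north", "south", "east", "west", "north", "east", "north"];

console.log(maxChunks(xValues));           // 4
console.log(effectiveShards(xValues, 10)); // 4
```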
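If the field has low cardinality (that is, it is not selective enough), add a second field to form a compound shard key. With a compound shard key, the data can be partitioned more effectively.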
If your data model requires sharding on a key that has low cardinality, consider using a compound index using a field that has higher relative cardinality.
Shard Key Frequency¶
Consider a set representing the range of shard key values - the frequency of the shard key represents how often a given value occurs in the data. If the majority of documents contain only a subset of those values, then the chunks storing those documents become a bottleneck within the cluster. Furthermore, as those chunks grow, they may become indivisible chunks as they cannot be split any further. This reduces or removes the effectiveness of horizontal scaling within the cluster.
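A rough way to spot high-frequency values is to count occurrences, as in the sketch below. The threshold `maxDocsPerChunk` is a simplifying assumption (the real chunk size limit is expressed in bytes, not documents), and the helper names and sample data are hypothetical.

```javascript
// Count how often each shard key value occurs.
function keyFrequencies(values) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  return counts;
}

// Values whose documents alone exceed the per-chunk budget would end up in a
// chunk that cannot be split, since one value cannot span two chunks.
function indivisibleCandidates(values, maxDocsPerChunk) {
  return [...keyFrequencies(values)]
    .filter(([, count]) => count > maxDocsPerChunk)
    .map(([value]) => value);
}

// Hypothetical shard key values dominated by a single value.
const statuses = ["active", "active", "active", "active", "active", "closed", "pending"];

console.log(indivisibleCandidates(statuses, 3)); // [ 'active' ]
```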
In a sharded cluster, mongos performs a merge sort on the data returned from all shards. See mongos and Use Indexes to Sort Query Results for more information.
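The merge step can be pictured with the sketch below, which combines result lists that are already sorted on each shard. It is a conceptual illustration, not the mongos implementation, and `mergeSortedResults` is a hypothetical name.

```javascript
// Merge already-sorted result lists from several shards into one sorted list.
function mergeSortedResults(shardResults, compare) {
  const positions = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let s = 0; s < shardResults.length; s++) {
      if (positions[s] >= shardResults[s].length) continue; // shard exhausted
      if (
        best === -1 ||
        compare(shardResults[s][positions[s]], shardResults[best][positions[best]]) < 0
      ) {
        best = s;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

// Hypothetical sorted results from three shards, one of them empty.
const fromShards = [[1, 4, 9], [2, 3, 10], []];
console.log(mergeSortedResults(fromShards, (a, b) => a - b)); // [ 1, 2, 3, 4, 9, 10 ]
```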
A shard key with low frequency does not guarantee even distribution of data across the sharded cluster. The cardinality and rate of change of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that has high frequency values, consider using a compound index using a unique or low frequency value.
Monotonically Changing Shard Keys¶
A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single shard within the cluster.
This occurs because every cluster has a chunk that captures a range with an upper bound of maxKey. maxKey always compares as higher than all other values. Similarly, there is a chunk that captures a range with a lower bound of minKey. minKey always compares as lower than all other values.
If the shard key value is always increasing, all new inserts are routed to the chunk with maxKey as the upper bound. If the shard key value is always decreasing, all new inserts are routed to the chunk with minKey as the lower bound. The shard containing that chunk becomes the bottleneck for write operations.
The following image illustrates a sharded cluster using the field X as the shard key. If the values for X are monotonically increasing, the distribution of inserts may look similar to the following:
If the shard key value was monotonically decreasing, then all inserts would route to Chunk A instead.
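The routing behavior described in this section can be sketched with a toy model. The chunk boundaries, the chunk names, and the helper functions are hypothetical; Infinity and -Infinity stand in for maxKey and minKey.

```javascript
// Hypothetical chunks over a numeric field X.
// -Infinity stands in for minKey and Infinity for maxKey.
const xChunks = [
  { name: "Chunk A", min: -Infinity, max: 20 },
  { name: "Chunk B", min: 20, max: 40 },
  { name: "Chunk C", min: 40, max: Infinity },
];

// Name of the chunk whose range [min, max) contains the value.
function chunkFor(value) {
  return xChunks.find((c) => value >= c.min && value < c.max).name;
}

// Count inserts per chunk for a sequence of shard key values.
function insertCounts(values) {
  const counts = {};
  for (const v of values) {
    const name = chunkFor(v);
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

const increasing = Array.from({ length: 100 }, (_, i) => 41 + i); // 41..140
const decreasing = Array.from({ length: 100 }, (_, i) => 19 - i); // 19..-80

console.log(insertCounts(increasing)); // all 100 inserts land in Chunk C
console.log(insertCounts(decreasing)); // all 100 inserts land in Chunk A
```

In this model, splitting Chunk C would not help an ever-increasing key: the new upper half still ends at maxKey and continues to receive every insert.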
A shard key that does not change monotonically does not guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.