当前位置：

首页
/
袖珍分布式系统（一）

袖珍分布式系统（一）

看了好多分布式系统，如 Amazon's Dynamo, Google's BigTable and MapReduce, Apache's Hadoop 等，那这些系统背后有没有什么共性的东西呢？本文试图去带领大家阐述这些系统背后的思想。

在作者看来，distributed programming 主要是在处理分布式带来的两个主要的影响：

information travels at the speed of light
independent things fail independently

具体怎么理解呢？

信息的传输速度是光速，虽然已经很快了，但是还是需要时间的，另一个系统要处理的错误不仅一类，而是会很多，而且这些错误之间没有联系，彼此独立。

作者希望通过本文，让读者能认识对

distance, time and consistency models

怎么交互有个更好的理解。

Distributed systems at a high level

Distributed programming is the art of solving the same problem that you can solve on a single computer using multiple computers.

分布式系统遇到的问题和单机系统是一样的，只是解决同一个问题的时候，其系统的 constraints 不同了。

计算机系统主要解决两类问题

计算
存储

市面上各种框架其可以划分为两个领域的战争，一个是偏向底层存储的战争，一个是偏向计算的战争

偏向存储的战争有关系型数据库和非关系型数据库 (Relational vs NoSQL) 的战争，它们两者都有各自的应用特点。

关系型数据库最大的特点是事务的一致性，读写操作都是事务的，具有 ACID 的特点，它在银行这样对一致性有要求的系统中应用广泛。
而非关系型数据库一般对一致性要求不高，但支持高性能并发读写，海量数据访问，在微博、Facebook 这类 SNS 应用中广泛使用。

另外，非关系型数据库内部也有战争，比如说 HBase 和 Cassandra，前者注重一致性 (Consistency) 和可用性 (Availability)，后者提供可用性(Availability) 和分区容错性(Partition tolerance)

Redis 和 Memcached，它们都是内存内的 Key/Value 存储，但 Redis 还支持哈希表，有序集和链表等多种数据结构。

MongoDB，CouchDB 和 Couchbase 这三个文档型数据库，MongoDB 更适用于需要动态查询的场景，CouchDB 偏向于预定义查询，Couchbase 比 CouchDB 有更强的一致性，而且还可以作为 Key/Value 存储。

搜索引擎 Solr 和 Elasticsearch 的，它们都是基于 Lucene，性能上相近，但是前者在 Java/C# 开发者中大受欢迎，而后者深受 Python/PHP 开发者喜爱。

偏向计算的战争有 MapReduce 和 Spark 之间的战争，此外还有 Spark Streaming 和 Storm 之间的战争等等。

以上关于计算和存储的讨论来自

回到本文的内容上：我们为什么需要分布式？理论上如果单机系统有无限的内存，极致的计算速度，那我们是没必要将系统设计为分布式的，因为分布式引入了一些列问题，但是呢，我们没有 money 啊，我们只能妥协了，用平价机器来实现高性能。

下面回答第二个问题：我们通过分布式希望得到什么？

在作者看来所有的问题起源都是：

everything starts with the need to deal with size

everything starts with size - scalability

那什么是呢？

Scalability is the ability of a system, network, or process, to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.

：伸缩性，系统伸缩性好意味着当系统处理 growing amount of work 的时候，即使处理难度不是线性增长的，也应该是可预测的，而不是指数级增长的。

那什么是 growing amount of work 呢？作者又举了 3 个具体的例子：

Size scalability: adding more nodes should make the system linearly faster; growing the dataset should not increase latency
Geographic scalability: it should be possible to use multiple data centers to reduce the time it takes to respond to user queries, while dealing with cross-data center latency in some sensible manner.
Administrative scalability: adding more nodes should not increase the administrative costs of the system (e.g. the administrators-to-machines ratio).

上面的讨论我们可以看到每个例子都试着从存储和计算两方面进行解释，然后每个处理都是在 time，distance 和 consistency 的抉择。

随着系统 size 的增加，我们的设计需要不断满足实际的需求，而这些需求主要来自于

performance
availability

Performance (and latency)

而这两方面在实际系统中的测量方式也有很多，先来看 performance

Performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.

性能主要是通过计算机系统有效工作占计算时间和资源的比例来衡量的，更具体来说：

Short response time/low latency for a given piece of work
High throughput (rate of processing work)
Low utilization of computing resource(s)

低延迟，高吞吐，低利用率

先看延迟

Latency：The state of being latent ; delay, a period between the initiation of something and the occurrence.

Latent：From Latin latens, latentis, present participle of lateo ("lie hidden"). Existing or present but concealed or inactive.

延迟：一件事情发生到真正产生影响

举个例子：假设一个数据中心，有一个功能是通过计算所有数据，然后返回一个结果

result = query(all data in the system)

对于上面这个请求，影响延迟的因素是什么呢？

the amount of old data
the speed at which new data "takes eﬀect" in the system

是 1 还是 2 呢？

我们可以知道如果系统处理 query 的速度和新数据在系统中生效的速度一样，那么我们会发现 query 将永远不会返回，因此真正决定延迟的是 2.

另外一点我们需要注意的是：延迟是因为有新数据的产生，那如果系统没有产生数据，那就没有延迟一说。

最后一点我们无法忽略的延迟是：

the speed of light limits how fast information can travel, and
hardware components have a minimum latency cost incurred per operation (think RAM and hard drives but also CPUs).

信息传输产生的延迟以及硬件本身产生的延迟。

因此决定延迟的因素有：

操作本身的耗时
信息传输延迟

看完性能后，我们看下一个分布式系统的主需求：可用性

Availability (and fault tolerance)

Performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.

单机系统相比较于分布式系统永远无法达到的一点是：高可用或者说容错

分布式系统可以通过一群不可靠的组件，最后达到一个高可用的系统，这就是分布式的魅力。

那我们来看下什么是容错（fault tolerance）

Fault tolerance：ability of a system to behave in a well-deﬁned manner once faults occur

从定义我们可以看到，首先我们要规定问题边界：要处理的错误，基于定义好的错误我们去开发算法去解决它。如果系统发生了我们从未考虑的错误，那何来谈容错一说。

What prevents us from achieving good things?

分布式系统受限于两个物理约束：

the number of nodes (which increases with the required storage and computation capacity)
the distance between nodes (information travels, at best, at the speed of light)

简单来说就是数量和距离，上面的物理限制在增加节点的时候会带来：

an increase in the number of independent nodes increases the probability of failure in a system (reducing availability and increasing administrative costs)【降低可用性和管理成本】
an increase in the number of independent nodes may increase the need for communication between nodes (reducing performance as scale increases)【通信成本增加降低系统性能】
an increase in geographic distance increases the minimum latency for communication between distant nodes (reducing performance for certain operations)【多数据中心增加通信成本】

上面说了这么多限制，那怎么定义系统好不好呢？总的来说是从 performance 和 availability 来看，具体衡量则是我们经常说的 SLA(service level agreement)。

除了 performance 和 availability 外，还有一个很重要的点是： intelligibility，即系统好不好理解，如果系统设计的非常好，但是只有设计的人懂，那这种系统价值也是不大的。

Abstractions and models

抽象帮我们做减法，只剩下本质的东西，而模型则将抽象具现化，让我们能更精确的定义抽象的东西。

我们来举一些具体的模型的例子，这些也是之后章节会讲的东西：

System model (asynchronous / synchronous)
Failure model (crash-fail, partitions, Byzantine)
Consistency model (strong, eventual)

我们设计分布式系统的目标是： work like a single system ，但是现实中我们有很多的节点需要处理，我们很难让使用者在无感知的情况下使用分布式系统，模型越是简单，意味着提供给用户的功能越是抽象，如果用户想要获得更好的性能，更高级的功能，只能穿过给用户提供的抽象，去看系统的实现细节。

Design techniques: partition and replicate

为了存储更多的数据，我们将数据保存到不同的节点上，为了让运算更快，我们将同一份数据 copy 多份，让多个实例进行计算，一个很好的总结是：

Divide and conquer - I mean, partition and replicate.

Partitioning

Partitioning improves performance by limiting the amount of data to be examined and by locating related data in the same partition【数据分片后减少数据大小从而提高性能】
Partitioning improves availability by allowing partitions to fail independently, increasing the number of nodes that need to fail before availability is sacriﬁced【每个数据片失败都是独立的，提供了可用性】

Replication

To replication! The cause of, and solution to all of life's problems.

复制是应对延迟的有效手段

Replication improves performance by making additional computing power and bandwidth applicable to a new copy of the data
Replication improves availability by creating additional copies of the data, increasing the number of nodes that need to fail before availability is sacriﬁced

复制来到好处的同时，也带来了很多问题，最大的就是数据一致性问题，只有当模型是 strong consistency 的时候，我们才会得到一个简单的编程模型（和单机系统一致），其他模型我们都好去理解系统内部是怎么做的，这样子才能很好的满足我们的需求。

来源:

与本文相关文章

暂无,快来抢沙发吧！