Kubernetes Cookbook Programming Guide: Chinese Edition Tutorial
Created: 2018-12-07

Clustering etcd

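etcd stores the network information and state of Kubernetes, so any data loss would be critical. Running etcd as a cluster is strongly recommended. etcd supports clustering natively; a cluster of N members can tolerate up to (N-1)/2 member failures, rounded down. There are three mechanisms for creating an etcd cluster:

  • Static

  • etcd discovery

  • DNS discovery

In this recipe, we will discuss how to bootstrap an etcd cluster with the static and etcd discovery mechanisms.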
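
The failure tolerance quoted above follows from the majority rule: the cluster stays available only while more than half of its members can reach each other. The following is a minimal sketch of that arithmetic, assuming integer division; the function names quorum and fault_tolerance are illustrative and are not part of etcd.

```shell
#!/bin/sh
# Majority (quorum) size for a cluster of N members: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Member failures a cluster of N members can survive: floor((N-1)/2).
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print the figures for small cluster sizes.
for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Note that a four-member cluster tolerates no more failures than a three-member one, which is why odd cluster sizes are usually preferred.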

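Getting ready

Before you start building an etcd cluster, decide how many members it needs. The right size depends on your environment. In production, at least three members are recommended, so that the cluster can tolerate at least one permanent failure. In this recipe, we will use three members as an example of a development environment: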

Name/Hostname IP address
ip-172-31-0-1 172.31.0.1
ip-172-31-0-2 172.31.0.2
ip-172-31-0-3 172.31.0.3

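How to do it...

The static mechanism is the simplest way to set up a cluster. However, the IP address of every member must be known in advance. This means that if you bootstrap an etcd cluster in a cloud provider environment, the static mechanism may not be practical. For that reason, etcd also provides a discovery mechanism that lets a new cluster bootstrap itself with the help of an existing one.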

Static

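With the static mechanism, you have to know the address information of every member:

Parameter                      Meaning
-name                          Name of this member
-initial-advertise-peer-urls   Peer URLs used to peer with the other members; must match this member's entry in -initial-cluster
-listen-peer-urls              URLs on which to accept peer traffic
-listen-client-urls            URLs on which to accept client traffic
-advertise-client-urls         Client URLs this member advertises to the rest of the cluster
-initial-cluster-token         Unique token that distinguishes one cluster from another
-initial-cluster               Advertised peer URLs of all members
-initial-cluster-state         State of the initial cluster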
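
The -initial-cluster value is the same on every member, so it can be derived once from the member table. The following is a minimal sketch, assuming the three example hosts of this recipe and peer port 2380; the MEMBERS variable and the initial_cluster function are illustrative names, not etcd features.

```shell
#!/bin/sh
# Member table from this recipe, one "name ip" pair per line.
MEMBERS="ip-172-31-0-1 172.31.0.1
ip-172-31-0-2 172.31.0.2
ip-172-31-0-3 172.31.0.3"

# Emit name=http://ip:2380 for every member, joined by commas without spaces.
initial_cluster() {
  echo "$MEMBERS" | while read -r name ip; do
    printf '%s=http://%s:2380\n' "$name" "$ip"
  done | paste -sd, -
}

initial_cluster
```

The result must not contain spaces, because the shell would otherwise split it into several arguments.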

Use the etcd command-line tool with these parameters on each member to bootstrap the cluster:

// On host ip-172-31-0-1: run etcd so that it peers with ip-172-31-0-2 and
// ip-172-31-0-3, serves and advertises client traffic on port 2379, and
// accepts peer traffic on port 2380
# etcd -name ip-172-31-0-1 \
-initial-advertise-peer-urls http://172.31.0.1:2380 \
-listen-peer-urls http://172.31.0.1:2380 \
-listen-client-urls http://0.0.0.0:2379 \
-advertise-client-urls http://172.31.0.1:2379 \
-initial-cluster-token mytoken \
-initial-cluster ip-172-31-0-1=http://172.31.0.1:2380,ip-172-31-0-2=http://172.31.0.2:2380,ip-172-31-0-3=http://172.31.0.3:2380 \
-initial-cluster-state new
...
2016-05-01 18:57:26.539787 I | etcdserver: starting member
e980eb6ff82d4d42 in cluster 8e620b738845cd7
2016-05-01 18:57:26.551610 I | etcdserver: starting server... [version:
2.2.5, cluster version: to_be_decided]
2016-05-01 18:57:26.553100 N | etcdserver: added member 705d980456f91652
[http://172.31.0.3:2380] to cluster 8e620b738845cd7
2016-05-01 18:57:26.553192 N | etcdserver: added member 74627c91d7ab4b54
[http://172.31.0.2:2380] to cluster 8e620b738845cd7
2016-05-01 18:57:26.553271 N | etcdserver: added local member
e980eb6ff82d4d42 [http://172.31.0.1:2380] to cluster 8e620b738845cd7
2016-05-01 18:57:26.553349 E | rafthttp: failed to dial 705d980456f91652
on stream MsgApp v2 (dial tcp 172.31.0.3:2380: getsockopt: connection
refused)
2016-05-01 18:57:26.553392 E | rafthttp: failed to dial 705d980456f91652
on stream Message (dial tcp 172.31.0.3:2380: getsockopt: connection
refused)
2016-05-01 18:57:26.553424 E | rafthttp: failed to dial 74627c91d7ab4b54
on stream Message (dial tcp 172.31.0.2:2380: getsockopt: connection
refused)
2016-05-01 18:57:26.553450 E | rafthttp: failed to dial 74627c91d7ab4b54
on stream MsgApp v2 (dial tcp 172.31.0.2:2380: getsockopt: connection
refused)
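
The "failed to dial" entries in the log above name the peers that refused a connection. The following is a minimal sketch for listing those peer addresses, assuming the etcd 2.x log format shown in this recipe with every entry on a single line; the function name unreachable_peers is illustrative.

```shell
#!/bin/sh
# Read etcd 2.x log lines on stdin and print each peer address that
# refused a connection, once.
unreachable_peers() {
  sed -n 's/.*failed to dial [0-9a-f]* on stream .*(dial tcp \([0-9.]*:[0-9]*\):.*connection refused).*/\1/p' |
    sort -u
}

# Two entries copied from the log above, unwrapped to one line each.
unreachable_peers <<'EOF'
2016-05-01 18:57:26.553349 E | rafthttp: failed to dial 705d980456f91652 on stream MsgApp v2 (dial tcp 172.31.0.3:2380: getsockopt: connection refused)
2016-05-01 18:57:26.553424 E | rafthttp: failed to dial 74627c91d7ab4b54 on stream Message (dial tcp 172.31.0.2:2380: getsockopt: connection refused)
EOF
```

For these two sample entries the function prints 172.31.0.2:2380 and 172.31.0.3:2380, the two members that have not been started yet.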

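The etcd daemon on ip-172-31-0-1 now starts checking whether all members are online. The log shows that ip-172-31-0-2 and ip-172-31-0-3 are not up yet, so the connections are refused. Let's move on to the next member and run the etcd command: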

// On host ip-172-31-0-2: run etcd so that it peers with ip-172-31-0-1 and
// ip-172-31-0-3, serves and advertises client traffic on port 2379, and
// accepts peer traffic on port 2380
# etcd -name ip-172-31-0-2 \
-initial-advertise-peer-urls http://172.31.0.2:2380 \
-listen-peer-urls http://172.31.0.2:2380 \
-listen-client-urls http://0.0.0.0:2379 \
-advertise-client-urls http://172.31.0.2:2379 \
-initial-cluster-token mytoken \
-initial-cluster ip-172-31-0-1=http://172.31.0.1:2380,ip-172-31-0-2=http://172.31.0.2:2380,ip-172-31-0-3=http://172.31.0.3:2380 \
-initial-cluster-state new
...
2016-05-01 22:59:55.696357 I | etcdserver: starting member
74627c91d7ab4b54 in cluster 8e620b738845cd7
2016-05-01 22:59:55.696397 I | raft: 74627c91d7ab4b54 became follower at
term 0
2016-05-01 22:59:55.696407 I | raft: newRaft 74627c91d7ab4b54 [peers: [],
term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-05-01 22:59:55.696411 I | raft: 74627c91d7ab4b54 became follower at
term 1
2016-05-01 22:59:55.706552 I | etcdserver: starting server... [version:
2.2.5, cluster version: to_be_decided]
2016-05-01 22:59:55.707627 E | rafthttp: failed to dial 705d980456f91652
on stream MsgApp v2 (dial tcp 172.31.0.3:2380: getsockopt: connection
refused)
2016-05-01 22:59:55.707690 N | etcdserver: added member 705d980456f91652
[http://172.31.0.3:2380] to cluster 8e620b738845cd7
2016-05-01 22:59:55.707754 N | etcdserver: added local member
74627c91d7ab4b54 [http://172.31.0.2:2380] to cluster 8e620b738845cd7
2016-05-01 22:59:55.707820 N | etcdserver: added member e980eb6ff82d4d42
[http://172.31.0.1:2380] to cluster 8e620b738845cd7
2016-05-01 22:59:55.707873 E | rafthttp: failed to dial 705d980456f91652
on stream Message (dial tcp 172.31.0.3:2380: getsockopt: connection
refused)
2016-05-01 22:59:55.708433 I | rafthttp: the connection with
e980eb6ff82d4d42 became active
2016-05-01 22:59:56.196750 I | raft: 74627c91d7ab4b54 is starting a new
election at term 1
2016-05-01 22:59:56.196903 I | raft: 74627c91d7ab4b54 became candidate at
term 2
2016-05-01 22:59:56.196946 I | raft: 74627c91d7ab4b54 received vote from
74627c91d7ab4b54 at term 2
2016-05-01 22:59:56.949201 I | raft: raft.node: 74627c91d7ab4b54 elected
leader e980eb6ff82d4d42 at term 112
2016-05-01 22:59:56.961883 I | etcdserver: published {Name:ip-172-31-0-2
ClientURLs:[http://10.0.0.2:2379]} to cluster 8e620b738845cd7
2016-05-01 22:59:56.966981 N | etcdserver: set the initial cluster
version to 2.1

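After starting member 2, we can see that the current cluster version is 2.1. The error messages in the log show that the peer connection to 705d980456f91652 is unhealthy. From the log, we can tell that member 705d980456f91652 points to http://172.31.0.3:2380. Let's start the last member, ip-172-31-0-3: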

# etcd -name ip-172-31-0-3 \
-initial-advertise-peer-urls http://172.31.0.3:2380 \
-listen-peer-urls http://172.31.0.3:2380 \
-listen-client-urls http://0.0.0.0:2379 \
-advertise-client-urls http://172.31.0.3:2379 \
-initial-cluster-token mytoken \
-initial-cluster ip-172-31-0-1=http://172.31.0.1:2380,ip-172-31-0-2=http://172.31.0.2:2380,ip-172-31-0-3=http://172.31.0.3:2380 \
-initial-cluster-state new
...
2016-05-01 19:02:19.106540 I | etcdserver: starting member
705d980456f91652 in cluster 8e620b738845cd7
2016-05-01 19:02:19.106590 I | raft: 705d980456f91652 became follower at
term 0
2016-05-01 19:02:19.106608 I | raft: newRaft 705d980456f91652 [peers: [],
term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-05-01 19:02:19.106615 I | raft: 705d980456f91652 became follower at
term 1
2016-05-01 19:02:19.118330 I | etcdserver: starting server... [version:
2.2.5, cluster version: to_be_decided]
2016-05-01 19:02:19.120729 N | etcdserver: added local member
705d980456f91652 [http://10.0.0.75:2380] to cluster 8e620b738845cd7
2016-05-01 19:02:19.120816 N | etcdserver: added member 74627c91d7ab4b54
[http://10.0.0.204:2380] to cluster 8e620b738845cd7
2016-05-01 19:02:19.120887 N | etcdserver: added member e980eb6ff82d4d42
[http://10.0.0.205:2380] to cluster 8e620b738845cd7
2016-05-01 19:02:19.121566 I | rafthttp: the connection with
74627c91d7ab4b54 became active
2016-05-01 19:02:19.121690 I | rafthttp: the connection with
e980eb6ff82d4d42 became active
2016-05-01 19:02:19.143351 I | raft: 705d980456f91652 [term: 1] received
a MsgHeartbeat message with higher term from e980eb6ff82d4d42 [term: 112]
2016-05-01 19:02:19.143380 I | raft: 705d980456f91652 became follower at
term 112
2016-05-01 19:02:19.143403 I | raft: raft.node: 705d980456f91652 elected
leader e980eb6ff82d4d42 at term 112
2016-05-01 19:02:19.146582 N | etcdserver: set the initial cluster
version to 2.1
2016-05-01 19:02:19.151353 I | etcdserver: published {Name:ip-172-31-0-3
ClientURLs:[http://10.0.0.75:2379]} to cluster 8e620b738845cd7
2016-05-01 19:02:22.022578 N | etcdserver: updated the cluster version
from 2.1 to 2.2

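We can see that member 3 joined and the etcd cluster was initialized successfully without any errors, and that the current cluster version is 2.2. What about member 1 now?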

2016-05-01 19:02:19.118910 I | rafthttp: the connection with
705d980456f91652 became active
2016-05-01 19:02:22.014958 I | etcdserver: updating the cluster version
from 2.1 to 2.2
2016-05-01 19:02:22.018530 N | etcdserver: updated the cluster version
from 2.1 to 2.2

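With member 2 and member 3 online, member 1 can now connect to them and is online as well. Looking at the logs, we can see that a leader election took place in the etcd cluster: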

ip-172-31-0-1: raft: raft.node: e980eb6ff82d4d42 (ip-172-31-0-1) elected
leader e980eb6ff82d4d42 (ip-172-31-0-1) at term 112
ip-172-31-0-2: raft: raft.node: 74627c91d7ab4b54 (ip-172-31-0-2) elected
leader e980eb6ff82d4d42 (ip-172-31-0-1) at term 112
ip-172-31-0-3: 2016-05-01 19:02:19.143380 I | raft: 705d980456f91652
became follower at term 112
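
One way to confirm that the members agree on the leader is to extract the "elected leader" entries from each member's log and compare them. The following is a minimal sketch, assuming the etcd 2.x log format shown above with every entry on a single line; the function name leaders_seen is illustrative.

```shell
#!/bin/sh
# Read etcd 2.x log lines on stdin and print each distinct
# "leader-id term" pair found in "elected leader" entries.
leaders_seen() {
  sed -n 's/.*elected leader \([0-9a-f][0-9a-f]*\).* at term \([0-9][0-9]*\).*/\1 \2/p' |
    sort -u
}

# Entries from members 2 and 3, unwrapped to one line each.
leaders_seen <<'EOF'
2016-05-01 22:59:56.949201 I | raft: raft.node: 74627c91d7ab4b54 elected leader e980eb6ff82d4d42 at term 112
2016-05-01 19:02:19.143403 I | raft: raft.node: 705d980456f91652 elected leader e980eb6ff82d4d42 at term 112
EOF
```

A single output line means that every log fed to the function reports the same leader for the same term.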

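The etcd cluster sends heartbeats to its members to check their health. Note that when you need to add or remove a member, the preceding etcd command has to be rerun on every member so that all of them are notified of the membership change. This way, every member in the cluster knows about all the online members; if a node goes offline, the other members keep polling the failed member until the membership is refreshed by the etcd command. If we set a message through one member, we can get the same message from any other member. If one member becomes unhealthy, the remaining members of the etcd cluster stay in service and elect a new leader.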
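
Setting a message through one member and reading it back through another can be sketched with curl against the etcd v2 keys API, which matches the etcd 2.2.5 version shown in the logs. This is a sketch under assumptions: the key name message, the helper names, and the DRY_RUN switch are illustrative, and newer etcd releases do not serve the v2 API by default.

```shell
#!/bin/sh
# Dry run by default: print the commands. Set DRY_RUN=0 to execute them
# against a running cluster.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Write a key through one member's client URL (etcd v2 keys API).
put_key() {  # usage: put_key MEMBER_IP KEY VALUE
  run curl -s -X PUT "http://$1:2379/v2/keys/$2" -d "value=$3"
}

# Read a key through a member's client URL.
get_key() {  # usage: get_key MEMBER_IP KEY
  run curl -s "http://$1:2379/v2/keys/$2"
}

put_key 172.31.0.1 message hello   # write via member 1
get_key 172.31.0.2 message         # read via member 2
```

If the cluster is healthy, the read on member 2 should return the value written through member 1.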

etcd discovery

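Before using etcd discovery, you need a discovery URL with which to bootstrap the cluster. If you later want to add or remove a member, use the etcdctl command for runtime reconfiguration. The command line is very similar to the one for the static mechanism: all we need to do is replace -initial-cluster with -discovery, which specifies the discovery service URL. We can use the etcd discovery service (https://discovery.etcd.io) to request a discovery URL: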

// Request a discovery URL for a cluster of size 3 from the etcd discovery service
# curl -w "\n" 'https://discovery.etcd.io/new?size=3'
https://discovery.etcd.io/be7c1938bbde83358d8ae978895908bd
// Bootstrap the first member with the returned URL
# etcd -name ip-172-31-0-1 \
-initial-advertise-peer-urls http://172.31.0.1:2380 \
-listen-peer-urls http://172.31.0.1:2380 \
-listen-client-urls http://0.0.0.0:2379 \
-advertise-client-urls http://172.31.0.1:2379 \
-discovery https://discovery.etcd.io/be7c1938bbde83358d8ae978895908bd
...
2016-05-02 00:28:08.545651 I | etcdmain: listening for peers on
http://172.31.0.1:2380
2016-05-02 00:28:08.545756 I | etcdmain: listening for client requests on
http://127.0.0.1:2379
2016-05-02 00:28:08.545807 I | etcdmain: listening for client requests on
http://172.31.0.1:2379
2016-05-02 00:28:09.199987 N | discovery: found self e980eb6ff82d4d42 in
the cluster
2016-05-02 00:28:09.200010 N | discovery: found 1 peer(s), waiting for 2
more

The first member has joined the cluster and is waiting for the other two peers. Let's start etcd on the second node:

# etcd -name ip-172-31-0-2 \
-initial-advertise-peer-urls http://172.31.0.2:2380 \
-listen-peer-urls http://172.31.0.2:2380 \
-listen-client-urls http://0.0.0.0:2379 \
-advertise-client-urls http://172.31.0.2:2379 \
-discovery https://discovery.etcd.io/be7c1938bbde83358d8ae978895908bd
...
2016-05-02 00:30:12.919005 I | etcdmain: listening for peers on
http://172.31.0.2:2380
2016-05-02 00:30:12.919074 I | etcdmain: listening for client requests on
http://0.0.0.0:2379
2016-05-02 00:30:13.018160 N | discovery: found self 25fc8075ab1ed17e in
the cluster
2016-05-02 00:30:13.018235 N | discovery: found 1 peer(s), waiting for 2
more
2016-05-02 00:30:22.985300 N | discovery: found peer e980eb6ff82d4d42 in
the cluster
2016-05-02 00:30:22.985396 N | discovery: found 2 peer(s), waiting for 1
more

We now know that the etcd cluster has two members and is waiting for the last one to join. The following command starts the last node:

# etcd -name ip-172-31-0-3 \
-initial-advertise-peer-urls http://172.31.0.3:2380 \
-listen-peer-urls http://172.31.0.3:2380 \
-listen-client-urls http://0.0.0.0:2379 \
-advertise-client-urls http://172.31.0.3:2379 \
-discovery https://discovery.etcd.io/be7c1938bbde83358d8ae978895908bd

After the new node joins, we can verify from the logs that a new election took place:

2016-05-02 00:31:01.152215 I | raft: e980eb6ff82d4d42 is starting a new
election at term 308
2016-05-02 00:31:01.152272 I | raft: e980eb6ff82d4d42 became candidate at
term 309
2016-05-02 00:31:01.152281 I | raft: e980eb6ff82d4d42 received vote from
e980eb6ff82d4d42 at term 309
2016-05-02 00:31:01.152292 I | raft: e980eb6ff82d4d42 [logterm: 304,
index: 9739] sent vote request to 705d980456f91652 at term 309
2016-05-02 00:31:01.152302 I | raft: e980eb6ff82d4d42 [logterm: 304,
index: 9739] sent vote request to 74627c91d7ab4b54 at term 309
2016-05-02 00:31:01.162742 I | rafthttp: the connection with
74627c91d7ab4b54 became active
2016-05-02 00:31:01.197820 I | raft: e980eb6ff82d4d42 received vote from
74627c91d7ab4b54 at term 309
2016-05-02 00:31:01.197852 I | raft: e980eb6ff82d4d42 [q:2] has received
2 votes and 0 vote rejections
2016-05-02 00:31:01.197882 I | raft: e980eb6ff82d4d42 became leader at
term 309

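With the discovery method, we can see that the cluster can be brought up without knowing the other members' IP addresses in advance. When nodes join or leave, etcd starts a new election and, in this multi-node setup, keeps the service online as long as a majority of members remains available.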
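
With discovery, the per-member commands differ only in the member name and IP address, so they can be generated from a small helper. The following is a minimal sketch that only prints the commands; the function name etcd_discovery_cmd is illustrative, and the ports and discovery URL are the ones used in this recipe. A discovery URL is tied to a single cluster, so a new cluster needs a freshly requested one.

```shell
#!/bin/sh
# Print the etcd command line for one member bootstrapped via a discovery URL.
etcd_discovery_cmd() {  # usage: etcd_discovery_cmd NAME IP DISCOVERY_URL
  echo "etcd -name $1" \
    "-initial-advertise-peer-urls http://$2:2380" \
    "-listen-peer-urls http://$2:2380" \
    "-listen-client-urls http://0.0.0.0:2379" \
    "-advertise-client-urls http://$2:2379" \
    "-discovery $3"
}

DISCOVERY_URL="https://discovery.etcd.io/be7c1938bbde83358d8ae978895908bd"
etcd_discovery_cmd ip-172-31-0-1 172.31.0.1 "$DISCOVERY_URL"
etcd_discovery_cmd ip-172-31-0-2 172.31.0.2 "$DISCOVERY_URL"
etcd_discovery_cmd ip-172-31-0-3 172.31.0.3 "$DISCOVERY_URL"
```

Each printed line is meant to be run on the host it names.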

See also

To understand how to install a single-node etcd server, refer to the Building datastore recipe in Chapter 1, Building Your Own Kubernetes.