Welcome to this article on OpenStack and High Availability, in which I explain how I built a highly available OpenStack cloud for my company.
We start our cluster with two machines, each with one public and two private network interfaces:
·server1: 5.x.x.x (public IP), 10.0.0.1 (eth1), 10.1.0.1 (eth2)
·server2: 5.x.x.x (public IP), 10.0.0.2 (eth1), 10.1.0.2 (eth2)
The hosts file on both nodes (relevant part):
10.0.0.1 server1
10.0.0.2 server2
Installing Pacemaker and Corosync
First we need to install Pacemaker and Corosync:
apt-get install pacemaker corosync
To configure Corosync, copy the following into /etc/corosync/corosync.conf on every node:
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 10.8.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}

amf {
    mode: disabled
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
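As a side note, bindnetaddr must be the network address (not a host address) of the interface Corosync binds to; the config above uses 10.8.0.0, which matches the ring IDs shown later, while the 10.0.0.x/16 addressing from the start of this article would give 10.0.0.0. A minimal sketch of computing it, using a hypothetical net_addr helper:

```shell
#!/bin/sh
# Sketch: derive the bindnetaddr value (the network address) from an
# interface IP and its netmask by ANDing the octets together.
net_addr() {
    oldifs=$IFS
    IFS=.
    set -- $1 $2    # split ip and mask into 8 positional parameters
    IFS=$oldifs
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}

# with the 10.0.0.x/16 addressing used in this article:
net_addr 10.0.0.1 255.255.0.0    # prints 10.0.0.0
```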
For reasons unknown, you must create the /var/log/cluster directory by hand, otherwise you get a "parse error in config: ." error:
mkdir /var/log/cluster
We also need both services to start at boot time, so run:
update-rc.d pacemaker start 50 1 2 3 4 5 . stop 01 0 6 .
to add Pacemaker, and edit /etc/default/corosync to set:
START=yes
to enable Corosync as well.
Note: the settings above must be applied on both hosts.
Checking the Corosync configuration
Start the corosync service:
service corosync start
Check that everything is working:
#corosync-cfgtool -s
Printing ring status.
Local node ID 33556490
RING ID 0
id = 10.8.0.2
status = ring 0 active with no faults
Also check the cluster nodes and their votes:
corosync-quorumtool -l
Nodeid Votes Name
16779274 1 server1
33556490 1 server2
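If you want to automate this membership check, a sketch like the following could parse the quorumtool output; the sample output is hard-coded here for illustration (in a real check you would capture it from corosync-quorumtool itself):

```shell
#!/bin/sh
# Sketch of a monitoring check over `corosync-quorumtool -l` style output:
# sum the Votes column and verify both nodes contribute one vote each.
sample_output='Nodeid     Votes  Name
16779274   1      server1
33556490   1      server2'

total_votes=$(printf '%s\n' "$sample_output" | awk 'NR > 1 { sum += $2 } END { print sum }')

if [ "$total_votes" -eq 2 ]; then
    echo "quorum membership OK ($total_votes votes)"
else
    echo "unexpected vote count: $total_votes" >&2
fi
```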
Checking the Pacemaker configuration
Once Corosync is confirmed to be working, let's configure Pacemaker. First, start the service:
service pacemaker start
Now check whether it has recognized our cluster:
crm_mon -1
============
Last updated: Mon Jul 16 15:01:57 2012
Last change: Mon Jul 16 14:52:34 2012 via cibadmin on server1
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ server1 server2 ]
Because this is a two-node installation, losing one node also means losing quorum, which would otherwise freeze all resources (the usual split-brain safeguard), so the quorum check has to be disabled:
crm configure property no-quorum-policy=ignore
After that you can also disable STONITH, if you don't need it:
crm configure property stonith-enabled=false
Corosync and Pacemaker are now installed; the next step is to install MySQL and have Pacemaker make it highly available.
At the core of OpenStack sits the MySQL database: almost every component uses MySQL to read and write its state, so let's see how to build a fully highly available MySQL endpoint.
Add this line to the hosts file /etc/hosts:
10.0.1.1 mysqlmaster
First we need to download the resource agent for MySQL:
cd /usr/lib/ocf/resource.d/
mkdir percona
cd percona
wget -q https://github.com/y-trudeau/resource-agents-prm/raw/master/heartbeat/mysql
chmod u+x mysql
The idea is that whenever a MySQL server is promoted from slave to master, we also bind the "mysqlmaster" IP to that node; when the failed server comes back, it starts MySQL in slave mode. So let's create our virtual IP:
crm configure primitive mysqlmasterIP ocf:heartbeat:IPaddr2 params ip=10.0.1.1 cidr_netmask=16 nic=eth1 op monitor interval=10s
We can check our new IP by running the cluster monitor again:
============
Last updated: Mon Jul 16 16:10:34 2012
Last change: Mon Jul 16 16:10:33 2012 via cibadmin on server1
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 3 expected votes
2 Resources configured.
============
Online: [ server1 server2 ]

mysqlmasterIP (ocf::heartbeat:IPaddr2): Started server1
Now that the virtual IP is in place, let's set up MySQL replication. Install the MySQL server on both nodes:
apt-get install mysql-server
We start with basic replication. On server1, edit /etc/mysql/my.cnf and, in the [mysqld] section (around line 85), uncomment:
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
On the second server, in the same file, uncomment and edit:
server-id = 2
log_bin = /var/log/mysql/mysql-bin.log
Also make MySQL listen on all addresses:
bind-address = 0.0.0.0
Then create a replication user and a test user by issuing the following in the mysql client on both servers:
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO repl_user@'10.0.%.%' IDENTIFIED BY 'password';
GRANT REPLICATION CLIENT, REPLICATION SLAVE, SUPER, PROCESS, RELOAD ON *.* TO repl_user@'localhost' IDENTIFIED BY 'password';
GRANT SELECT ON mysql.user TO test_user@'localhost' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
Now disable MySQL at boot time. Since its init script has been converted to upstart, open /etc/init/mysql.conf on every node and comment out this line:
start on runlevel [2345]
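If you prefer not to edit the file by hand, the same change can be scripted; a sketch, assuming the line appears exactly as shown above (it runs against a throwaway copy here, while on the real nodes the target would be /etc/init/mysql.conf):

```shell
#!/bin/sh
# Sketch: comment out the upstart "start on" stanza non-interactively.
conf=$(mktemp)
printf '%s\n' 'description "MySQL Server"' 'start on runlevel [2345]' > "$conf"

# prefix the matching line with '#'; '&' expands to the whole match
sed -i 's/^start on runlevel \[2345\]$/#&/' "$conf"

grep '^#start on' "$conf"    # prints: #start on runlevel [2345]
```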
Now create the MySQL resource:
crm configure primitive clustermysql ocf:percona:mysql \
    params binary="/usr/bin/mysqld_safe" log="/var/log/mysql.log" \
    socket="/var/run/mysqld/mysqld.sock" evict_outdated_slaves="false" \
    config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
    replication_user="repl_user" replication_passwd="password" \
    test_user="test_user" test_passwd="password" \
    op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="2s" role="Slave" timeout="30" OCF_CHECK_LEVEL="1" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
You should see MySQL running on one of the nodes:
crm_mon -1
============
Last updated: Mon Jul 16 17:36:22 2012
Last change: Mon Jul 16 17:14:55 2012 via cibadmin on server1
Stack: openais
Current DC: server2 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 3 expected votes
3 Resources configured.
============
Online: [ server1 server2 ]

mysqlmasterIP (ocf::heartbeat:IPaddr2): Started server2
clustermysql (ocf::heartbeat:mysql): Started server2
Now for the master/slave controller. First we need to record each host's IP so the agent can migrate the MySQL master; change these lines via crm configure edit:
node server1 attributes clustermysql_mysql_master_IP="10.0.0.1"
node server2 attributes clustermysql_mysql_master_IP="10.0.0.2"
Then create the actual master/slave resource, which takes a single crm command:
crm configure ms ms_MySQL clustermysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
    notify="true" globally-unique="false" target-role="Master" is-managed="true"
MySQL should now start in master/slave mode, and crm_mon -1 produces something like:
============
Last updated: Tue Jul 17 11:26:04 2012
Last change: Tue Jul 17 11:00:34 2012 via cibadmin on server1
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 3 expected votes
4 Resources configured.
============
Online: [ server1 server2 ]

Master/Slave Set: ms_MySQL [clustermysql]
    Masters: [ server1 ]
    Slaves: [ server2 ]
mysqlmasterIP (ocf::heartbeat:IPaddr2): Started server1
The last step is to move the master IP whenever MySQL is promoted on a node, which these two rules take care of:
crm configure colocation masterIP_on_mysqlMaster inf: mysqlmasterIP ms_MySQL:Master
crm configure order mysqlPromote_before_IP inf: ms_MySQL:promote mysqlmasterIP:start
That's it: now when you stop the pacemaker service on one node, MySQL is promoted to master on the other node and the IP moves along with it.
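To convince yourself the failover works from a client's point of view, a small retry helper can poll the virtual IP until MySQL answers again. This is a sketch: the wait_until helper and the mysqladmin example are illustrative, not part of the original setup.

```shell
#!/bin/sh
# Sketch: retry an arbitrary health-check command until it succeeds
# or the given number of attempts is exhausted (one second apart).
wait_until() {
    tries=$1; shift
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# during a failover test you might run, e.g.:
# wait_until 30 mysqladmin -h mysqlmaster ping
```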
After MySQL, we need to make RabbitMQ highly available.
Note: RabbitMQ really only supports active/passive with DRBD. Active/active is possible too, but it requires changing the queue declarations in the OpenStack code; a patch by Eugene Kirpichov is still under development and can be found here: https://review.openstack.org/#/c/10305/
So for now we simply install RabbitMQ from the official repository; if you want the newer behavior, wait for the patch:
echo "deb http://www.rabbitmq.com/debian/ testing main" > /etc/apt/sources.list.d/rabbitmq.list
wget -q http://www.rabbitmq.com/rabbitmq-signing-key-public.asc -O- | sudo apt-key add -
apt-get update
apt-get install rabbitmq-server
With the groundwork done, it's time to install Keystone and make it highly available. I won't cover the installation itself in this tutorial, since the official manual already covers all of it.
There are only a few differences:
·You must install keystone on both hosts, not just one
·Set the MySQL host to "clustermysql", so Keystone always talks to the current MySQL master
·When you define the services, create a virtual IP for each one (keystoneip, glanceip, novacomputeip and so on) and point the endpoints at those IPs when you create them
Now that Keystone is installed and you have created your users, roles, services and endpoints, let's make it "highly available". We need to disable its automatic start at boot; on both hosts:
echo “manual” > /etc/init/keystone.override
Now download the resource agent:
mkdir /usr/lib/ocf/resource.d/openstack
cd /usr/lib/ocf/resource.d/openstack/
wget https://raw.github.com/madkiss/keystone/master/tools/ocf/keystone
chmod u+x *
Then create the primitive for Keystone:
crm configure primitive keystoneService ocf:openstack:keystone \
    params config="/etc/keystone/keystone.conf" \
    os_auth_url="http://clusterkeystone:5000/v2.0/" \
    os_password="admin" os_tenant_name="admin" os_username="admin" \
    user="keystone" client_binary="/usr/bin/keystone" \
    op monitor interval="15s" timeout="30s"
Here "clusterkeystone" is the virtual IP assigned to Keystone, and the os_* parameters are the credentials of the admin user you set up while installing Keystone.
It is useful to group the virtual IP and the service so they start on the same host:
crm configure group Keystone keystoneIP keystoneService
To have Keystone start only after the MySQL master IP is up, add:
crm configure order keystone_after_mysqlmasterIP inf: mysqlmasterIP:start Keystone
You now have a working Keystone failover in case a host goes down.
An example with both hosts running:
============
Last updated: Mon Jul 30 15:03:40 2012
Last change: Mon Jul 30 15:03:38 2012 via cibadmin on server2
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Online: [ server1 server2 ]

mysqlmasterIP (ocf::heartbeat:IPaddr2): Started server1
Master/Slave Set: ms_MySQL [clustermysql]
    Masters: [ server1 ]
    Slaves: [ server2 ]
Resource Group: Keystone
    keystoneIP (ocf::heartbeat:IPaddr2): Started server2
    keystoneService (ocf::openstack:keystone): Started server2
Now stop server1, and after a few seconds you get:
============
Last updated: Mon Jul 30 15:08:34 2012
Last change: Mon Jul 30 15:08:26 2012 via crm_attribute on server2
Stack: openais
Current DC: server2 - partition WITHOUT quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Online: [ server2 ]
OFFLINE: [ server1 ]

mysqlmasterIP (ocf::heartbeat:IPaddr2): Started server2
Master/Slave Set: ms_MySQL [clustermysql]
    Masters: [ server2 ]
    Stopped: [ clustermysql:0 ]
Resource Group: Keystone
    keystoneIP (ocf::heartbeat:IPaddr2): Started server2
    keystoneService (ocf::openstack:keystone): Started server2
As in the Keystone part, I assume you have already installed Glance and have it working.
The differences from the usual setup are:
·When configuring /etc/glance/glance-api-paste.ini and /etc/glance/glance-registry-paste.ini, remember to also edit the auth host:
service_protocol = http
service_host = clusterkeystone
service_port = 5000
auth_host = clusterkeystone
auth_port = 35357
auth_protocol = http
auth_uri = http://clusterkeystone:5000/
admin_tenant_name = service
admin_user = glance
admin_password = glance
·When configuring /etc/glance/glance-registry.conf, use "clustermysql" as the MySQL host
·At the end of the setup, run glance-manage version_control 0 and glance-manage db_sync
·Be sure to install curl; the glance-registry resource agent doesn't say so, but it needs it
Now we let Pacemaker run Glance where needed. First stop the glance services and disable them at boot, then download the resource agents. On all hosts:
echo "manual" > /etc/init/glance-api.override
echo "manual" > /etc/init/glance-registry.override
service glance-api stop
service glance-registry stop
cd /usr/lib/ocf/resource.d/openstack/
wget https://raw.github.com/madkiss/glance/ha/tools/ocf/glance-api
wget https://raw.github.com/madkiss/glance/ha/tools/ocf/glance-registry
chmod u+x *
Then add the resources:
crm configure primitive glanceApiService ocf:openstack:glance-api \
    params config="/etc/glance/glance-api.conf" \
    os_auth_url="http://clusterkeystone:5000/v2.0/" \
    os_password="admin" os_tenant_name="admin" os_username="admin" \
    user="glance" client_binary="/usr/bin/glance" \
    op monitor interval="15s" timeout="30s"
crm configure primitive glanceRegistryService ocf:openstack:glance-registry \
    params config="/etc/glance/glance-registry.conf" \
    os_auth_url="http://clusterkeystone:5000/v2.0/" \
    os_password="admin" os_tenant_name="admin" os_username="admin" \
    user="glance" \
    op monitor interval="15s" timeout="30s"
Pacemaker can now run the Glance API and Registry on our cluster.
As usual, group them and add the proper ordering for Glance:
group Glance glanceIP glanceApiService glanceRegistryService
crm configure order glance_after_keystone inf: Keystone Glance
I made Glance start after Keystone because it depends on both Keystone and MySQL; since Keystone already starts after MySQL, starting Glance after Keystone is enough.
This is the resulting configuration:
============
Last updated: Mon Jul 30 16:14:09 2012
Last change: Mon Jul 30 16:11:54 2012 via crm_attribute on server1
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
8 Resources configured.
============
Online: [ server1 server2 ]

mysqlmasterIP (ocf::heartbeat:IPaddr2): Started server1
Master/Slave Set: ms_MySQL [clustermysql]
    Masters: [ server1 ]
    Slaves: [ server2 ]
Resource Group: Keystone
    keystoneIP (ocf::heartbeat:IPaddr2): Started server2
    keystoneService (ocf::openstack:keystone): Started server2
Resource Group: Glance
    glanceIP (ocf::heartbeat:IPaddr2): Started server1
    glanceApiService (ocf::openstack:glance-api): Started server1
    glanceRegistryService (ocf::openstack:glance-registry): Started server1
After MySQL, RabbitMQ, Keystone and Glance, it's time to install the Nova services, have Pacemaker manage them and make them highly available.
As in the other walkthroughs, when editing /etc/nova/api-paste.ini, change the service host as well:
service_protocol = http
service_host = clusterkeystone
service_port = 5000
auth_host = clusterkeystone
auth_port = 35357
auth_protocol = http
auth_uri = http://clusterkeystone:5000/
admin_tenant_name = service
admin_user = nova
admin_password = nova
Also, before running nova-manage db sync, be sure to set the SQL host to "clustermysql". Here is how I set up /etc/nova/nova.conf:
[DEFAULT]
dhcpbridge_flagfile=/etc/nova/nova.conf
dhcpbridge=/usr/bin/nova-dhcpbridge
logdir=/var/log/nova
state_path=/var/lib/nova
lock_path=/run/lock/nova
allow_admin_api=true
use_deprecated_auth=false
auth_strategy=keystone
scheduler_driver=nova.scheduler.simple.SimpleScheduler
s3_host=clusterglance
ec2_host=clusterec2
ec2_dmz_host=clusterec2
rabbit_host=clusterrabbit
cc_host=clusterec2
nova_url=http://clusternova:8774/v1.1/
glance_api_servers=clusterglance:9292
image_service=nova.image.glance.GlanceImageService
iscsi_ip_prefix=192.168.4
sql_connection=mysql://novadbadmin:password@clustermysql/nova
ec2_url=http://clusterec2:8773/services/Cloud
keystone_ec2_url=http://clusterkeystone:5000/v2.0/ec2tokens
api_paste_config=/etc/nova/api-paste.ini
libvirt_type=kvm
libvirt_use_virtio_for_bridges=true
start_guests_on_host_boot=true
resume_guests_state_on_host_boot=true
novnc_enabled=true
novncproxy_base_url=http://5.9.x.x:6080/vnc_auto.html
vncserver_proxyclient_address=10.8.0.1
vncserver_listen=0.0.0.0
network_manager=nova.network.manager.FlatDHCPManager
public_interface=eth0
flat_interface=eth2
flat_network_bridge=br100
flat_injected=False
force_dhcp_release=true
iscsi_helper=tgtadm
connection_type=libvirt
root_helper=sudo nova-rootwrap
verbose=True
debug=True
multi_host=true
enabled_apis=ec2,osapi_compute,osapi_volume,metadata
Double-check your /etc/hosts and make sure virtual IPs such as "clustermysql" and "clusterglance" are declared exactly as you used them during the Keystone installation (in the endpoint configuration) and in the MySQL grants.
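A sketch of such a sanity check (the check_names helper is hypothetical): it flags any of the cluster names used in this article that fail to resolve on the node.

```shell
#!/bin/sh
# Sketch: verify that the virtual hostnames relied on by the endpoints
# and by nova.conf resolve, typically via /etc/hosts entries.
check_names() {
    missing=0
    for name in "$@"; do
        if ! getent hosts "$name" > /dev/null; then
            echo "missing hosts entry: $name" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# on the cluster nodes you would run, e.g.:
# check_names clustermysql clusterglance clusterkeystone clusterrabbit clusternova clusterec2
```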
Now you can follow the db_sync part of the official guide.
Next we must stop the services and hand them over to Pacemaker:
service nova-api stop
service nova-cert stop
service nova-compute stop
service nova-consoleauth stop
service nova-network stop
service nova-objectstore stop
service nova-scheduler stop
service nova-volume stop
service novnc stop
echo "manual" > /etc/init/nova-api.override
echo "manual" > /etc/init/nova-cert.override
echo "manual" > /etc/init/nova-compute.override
echo "manual" > /etc/init/nova-consoleauth.override
echo "manual" > /etc/init/nova-network.override
echo "manual" > /etc/init/nova-objectstore.override
echo "manual" > /etc/init/nova-scheduler.override
echo "manual" > /etc/init/nova-volume.override
echo "manual" > /etc/init/novnc.override
Download the resource agents for the services:
cd /usr/lib/ocf/resource.d/openstack/
wget https://raw.github.com/leseb/OpenStack-ra/master/nova-api-ra
wget https://raw.github.com/leseb/OpenStack-ra/master/nova-cert-ra
wget https://raw.github.com/leseb/OpenStack-ra/master/nova-consoleauth-ra
wget https://raw.github.com/leseb/OpenStack-ra/master/nova-scheduler-ra
wget https://raw.github.com/leseb/OpenStack-ra/master/nova-vnc-ra
wget https://raw.github.com/alex88/nova-network-ra/master/nova-network-ra
wget https://raw.github.com/alex88/nova-compute-ra/master/nova-compute-ra
wget https://raw.github.com/alex88/nova-objectstore-ra/master/nova-objectstore-ra
wget https://raw.github.com/alex88/nova-volume-ra/master/nova-volume-ra
chmod +x *
Configure the services to be started by Pacemaker:
crm configure primitive novaApiService ocf:openstack:nova-api-ra \
    params config="/etc/nova/nova.conf" op monitor interval="5s" timeout="5s"
crm configure primitive novaCertService ocf:openstack:nova-cert-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure primitive novaConsoleauthService ocf:openstack:nova-consoleauth-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure primitive novaSchedulerService ocf:openstack:nova-scheduler-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure primitive novaVncService ocf:openstack:nova-vnc-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure primitive novaNetworkService ocf:openstack:nova-network-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure primitive novaComputeService ocf:openstack:nova-compute-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure primitive novaObjectstoreService ocf:openstack:nova-objectstore-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure primitive novaVolumeService ocf:openstack:nova-volume-ra \
    params config="/etc/nova/nova.conf" op monitor interval="30s" timeout="30s"
crm configure clone novaVolume novaVolumeService meta clone-max="2" clone-node-max="1"
crm configure clone novaNetwork novaNetworkService meta clone-max="2" clone-node-max="1"
crm configure clone novaCompute novaComputeService meta clone-max="2" clone-node-max="1"
crm configure clone novaApi novaApiService meta clone-max="2" clone-node-max="1"
crm configure clone novaVnc novaVncService meta clone-max="2" clone-node-max="1"
crm configure group novaServices novaConsoleauthService novaCertService novaSchedulerService
crm configure order novaServices_after_keystone inf: Keystone novaServices
Note: apply the clone directives only where your deployment needs them; I clone services such as API and network because I run a multi_host OpenStack.
Since my nova.conf sets s3_host to the glance IP, be sure to edit group Glance to include the nova-objectstore service: run crm configure edit and make sure the line reads:
group Glance glanceIP novaObjectstoreService glanceApiService glanceRegistryService
Now you can check the status of your OpenStack cluster:
Binary            Host     Zone  Status   State  Updated_At
nova-compute      server1  nova  enabled  :-)    2012-07-31 10:00:27
nova-compute      server2  nova  enabled  :-)    2012-07-31 10:00:19
nova-network      server2  nova  enabled  :-)    2012-07-31 10:00:26
nova-network      server1  nova  enabled  :-)    2012-07-31 10:00:26
nova-scheduler    server2  nova  enabled  :-)    2012-07-31 10:00:26
nova-consoleauth  server2  nova  enabled  :-)    2012-07-31 10:00:26
nova-cert         server2  nova  enabled  :-)    2012-07-31 10:00:26
nova-volume       server1  nova  enabled  :-)    2012-07-31 10:00:26
nova-volume       server2  nova  enabled  :-)    2012-07-31 10:00:26
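nova-manage marks live services with ":-)" in the State column and dead ones with "XXX", so this output is easy to monitor with a simple grep. A sketch, with the sample output hard-coded for illustration:

```shell
#!/bin/sh
# Sketch of a health check over `nova-manage service list` style output:
# alert if any service row carries the "XXX" (dead) marker.
sample='nova-compute server1 nova enabled :-) 2012-07-31 10:00:27
nova-network server2 nova enabled :-) 2012-07-31 10:00:26
nova-scheduler server2 nova enabled :-) 2012-07-31 10:00:26'

if printf '%s\n' "$sample" | grep -q 'XXX'; then
    echo "some nova services are dead" >&2
else
    echo "all nova services healthy"
fi
```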
That's it: you now have all of the OpenStack components managed by Corosync and Pacemaker. OpenStack's own high-availability features for virtual machines are still under development, so stay tuned for updates on that front.