#1, posted 2014-01-20 14:11

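I started studying OpenStack last October, even though our company already makes heavy use of VMware and IBM virtual machines. I consulted a lot of documentation online, for both old and new releases. Earlier, a colleague had installed Red Hat's all-in-one version, but that was as far as it went: nobody knew how to use it, and when a VM died a few days after being launched, nobody knew why. I also started with the Red Hat installer, aiming for a test environment of one controller node + one network node + two compute nodes. My colleagues all said to just use Red Hat's packages because they are quick and convenient, but after a week of struggling I decided to install from source instead.
The ControllerNode was the hardest to install. It was my first time, after all: I went through two machines and three installs before it worked, mostly because of Keystone and RabbitMQ problems. Once that was done, installing the NetworkNode and ComputeNode was easy. There were a few hiccups along the way. On the ControllerNode I spent ages configuring Quantum networking by following a document, only to find that this is needed only for all-in-one; in a multi-node setup it is done on the NetworkNode, so I wasted several days. I also wasted several days on the NetworkNode: the network settings kept failing, and I later found out that they only work properly once a ComputeNode is present.
I won't write out the detailed build steps; there are plenty online. Here are some references I used:
Installing OpenStack Folsom from source on CentOS 6.3, part XX: http://bbs.chinaunix.net/thread-4054916-1-1.html
OpenStack Folsom installation on Ubuntu 12.04 (GRE mode): http://www.chenshake.com/opensta ... 12-04/#Open-vSwitch
Complete OpenStack installation manual (CentOS 6.2): http://blog.lightcloud.cn/?p=91#sec-3.10
OpenStack Quantum network architecture: http://hi.baidu.com/wylinux/item/e1cc323af851d0f7e6bb7a5a
Setting up an OpenStack Quantum development environment: http://blog.csdn.net/quqi99/article/details/7433285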

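From the start I chose GRE (Generic Routing Encapsulation) mode; I had read online that it has three advantages (according to foreign authors). I did not expect it to work poorly in our company's tightly controlled Cisco environment. (We have Cisco gear everywhere, from low end to high end. The network people swear by Cisco, look down on the open-source Open vSwitch, and sneer at OpenStack's network node. Cisco even wants to sell us blades; they figure that since we bought so many HP and IBM blades, there is no reason not to buy theirs.)
The basic requirements are:
1. Internet access. This is essential: building from source pulls in many Python packages, large and small, some quite obscure.
2. Install the following three RPM packages: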
-rw-r--r--. 1 root root 14540 Oct 29 16:35 epel-release-6-8.noarch.rpm
-rw-r--r--. 1 root root  7573 Oct 29 16:35 rdo-release-grizzly-3.noarch.rpm
-rw-r--r--. 1 root root 12640 Oct 29 16:35 rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm
3. yum groupinstall everything. Here is my example:
yum -y groupinstall "Additional Development" "Base" "Console internet tools" "Debugging Tools" "Desktop" "Desktop Platform" "Dial-up Networking Support" "Directory Client" "E-mail server" "Electronic Lab" "Fonts" "General Purpose Desktop" "Graphical Administration Tools" "Hardware monitoring utilities" "Internet Applications" "Internet Browser" "Java Platform" "Large Systems Performance" "Legacy UNIX compatibility" "Legacy X Window System compatibility" "Milkymist" "MySQL Database client" "MySQL Database server" "NFS file server" "Network Infrastructure Server" "Network Storage Server" "Network file system client" "Networking Tools" "Performance Tools" "Perl Support" "PostgreSQL Database client" "PostgreSQL Database server" "Remote Desktop Clients" "Scalable Filesystems" "Scientific support" "Server Platform" "System administration tools" "Virtualization" "Virtualization Tools" "Web Server" "X Window System" "iSCSI Storage Client" "Arabic Support" "Armenian Support" "Georgian Support" "Hebrew Support" "Inuktitut Support" "Japanese Support" "Korean Support" "Lao Support" "Tajik Support" "Backup Client" "Backup Server" "CIFS file server" "Client management tools" "Compatibility libraries" "Desktop Debugging and Performance Tools" "Desktop Platform Development" "Development tools" "Directory Server" "Eclipse" "Educational Software" "Emacs" "FCoE Storage Client" "FTP server" "Fedora Packager" "Graphics Creation Tools" "Haskell" "High Availability" "High Availability Management" "Identity Management Server" "Infiniband Support" "Input Methods" "KDE Desktop" "Load Balancer" "Mainframe Access" "Messaging Client Support" "Office Suite and Productivity" "PHP Support" "Print Server" "Printing client" "Resilient Storage" "Ruby Support" "SNMP Support" "Security Tools" "Server Platform Development" "Smart card support" "Storage Availability Tools" "System Management" "Systems Management Messaging Server support" "TeX support" "Technical Writing" "TurboGears application framework" "Virtualization Client" "Virtualization Platform" "Web Servlet Engine" "Web-Based Enterprise Management" "Xfce"
  Then run yum erase paste* and yum erase gmp, and build and install GMP 5.1 from source; the bundled version is too old and hard to remove cleanly. I think you need to run ldconfig after installing, but I don't remember exactly.
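4. Set up NTP: the ControllerNode is the NTP server (it syncs from other NTP sources, of course), and the other nodes are NTP clients.
5. Use the Python 2.6 that ships with CentOS 6.4. Installing 2.7 causes many problems, and don't bother with 3.0. I tried installing 2.7 on the ControllerNode; it conflicted with 2.6 and I ended up having to reinstall the system.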

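The following are the key points for installing the ControllerNode (controller node):
1. Build and install Erlang and RabbitMQ
Red Hat uses Qpid; many people say it has problems and RabbitMQ does not. I have not compared them.
RabbitMQ needs Erlang, and Erlang needs wxWidgets. I only succeeded with wxWidgets-2.9.5.tar; with other versions the Erlang build kept complaining that wxWidgets was missing.
2. Keystone is the foundation and must be installed properly first. My final lesson: when creating endpoints, use management-network IPs throughout, so that a change of public IP has no effect.
Do not delete the _member_ role. I deleted it once; try it if you want to see the consequences: you start over from scratch.
Use driver = keystone.catalog.backends.sql.Catalog rather than driver = keystone.catalog.backends.templated.TemplatedCatalog
I chose the right one at first, but at some point it got changed to the latter. As a result Quantum kept hanging when creating a network, and then I found that Keystone could not recreate endpoints. It took a long time to track down this mistake.
Since then, I back up every conf file before editing it each day, so that when something breaks I can tell whether a bad edit is to blame.
Starting keystone failed; checking showed "pkg_resources.DistributionNotFound: python-keystoneclient>=0.2.1,<0.3".
I found that the installed python-keystoneclient was 4.1; after I finally installed python-keystoneclient-0.2.5, keystone could start and work.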
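
The habit of backing up a conf file before every edit could be sketched as a small shell helper. This is only an illustration: the function name backup_conf and the ".bak.<timestamp>" suffix are assumptions, not part of the original setup.

```shell
# Hypothetical helper: copy a config file to a timestamped backup before editing.
# The name backup_conf and the ".bak.<timestamp>" suffix are illustrative choices.
backup_conf() {
    conf_file=$1
    if [ ! -f "$conf_file" ]; then
        echo "backup_conf: no such file: $conf_file" >&2
        return 1
    fi
    backup_file="$conf_file.bak.$(date +%Y%m%d-%H%M%S)"
    # -p preserves mode and timestamps (and ownership where permitted)
    cp -p "$conf_file" "$backup_file" && echo "$backup_file"
}
```

For example, run backup_conf /etc/keystone/keystone.conf before editing, then diff the printed backup path against the live file when something stops working.
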
3. Glance installation:
[]# glance-manage
      AttributeError: 'module' object has no attribute 'HAVE_DECL_MPZ_POWM_SEC'
      1. Install gmp-devel
      2. Uninstall pycrypto: pip-python uninstall pycrypto
      3. Reinstall pycrypto: pip-python install pycrypto
      Warning: GMP >= 5
      1. Install gmp-5.1.3: ./configure && make && make check && make install
      2. echo /usr/local/lib > /etc/ld.so.conf.d/local_lib.conf
      3. ldconfig
      4. Reinstall pycrypto

[root@controllernode src]# glance image-list
   Authorization Failed: An unexpected error prevented the server from fulfilling your request. cannot import name ALL_BYTE_VALUES (HTTP 500)
[root@controllernode glance]# su - glance
[glance@controllernode ~]$  /usr/bin/glance-api --config-file=/etc/glance/glance-api.conf
   Traceback (most recent call last):
      File "/usr/bin/glance-api", line 5, in <module>
      pkg_resources.run_script('glance==2013.1.4', 'glance-api')
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 492, in run_script
      if entry is None:
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1350, in run_script
      zip_stat = self.zipinfo[zip_path]
      File "/usr/lib/python2.6/site-packages/glance-2013.1.4-py2.6.egg/EGG-INFO/scripts/glance-api", line 44, in <module>
      from glance.common import config
      File "/usr/lib/python2.6/site-packages/glance-2013.1.4-py2.6.egg/glance/common/config.py", line 30, in <module>
      from paste import deploy
      File "/usr/lib/python2.6/site-packages/paste/deploy/__init__.py", line 3, in <module>
      from paste.deploy.loadwsgi import *
      File "/usr/lib/python2.6/site-packages/paste/deploy/loadwsgi.py", line 11, in <module>
      from paste.deploy.util import fix_call, lookup_object
   ImportError: cannot import name fix_call
[root@controllernode src]# yum erase paste
[root@controllernode src]# yum install python-paste python-paste-deploy
Error: glance index always shows this error:
      cannot import name ALL_BYTE_VALUES (HTTP 500)
      {"message": "An unexpected error prevented the server from fulfilling your request. cannot import name ALL_BYTE_VALUES", "code": 500, "title": "Internal Server Error"}}

         Searching Google for the glance error got me nowhere; finally someone on Baidu said to look at keystone.log: it is an auth error in keystone!
         Then I found I had been using curl -d against 10.38.149.152 instead of 10.38.149.151! Checking showed that 10.38.149.151 could not authenticate!
         It turned out keystone had not started this time! After su - keystone, keystone-all reported an error:
         [keystone@controllernode ~]$ /usr/bin/keystone-all --config-file=/etc/keystone/keystone.conf
         Traceback (most recent call last):
         File "/usr/bin/keystone-all", line 4, in <module>
            import pkg_resources
         File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2797, in <module>

         File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 576, in resolve

         pkg_resources.DistributionNotFound: python-keystoneclient>=0.2.1,<0.3
         But pip-python list showed python-keystoneclient=4.1; reinstalling still gave 4.1 and the same error.
         Finally, running python setup.py install inside keystone-2013.1.4 installed python-keystoneclient 0.2.5, and then keystone worked!
Now glance index works without errors.
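
The DistributionNotFound error above comes down to a version-range check: keystone requires python-keystoneclient>=0.2.1,<0.3, and the installed version fell outside that range. A minimal sketch of the comparison in shell follows. It assumes GNU sort with the -V (version sort) option; the function names are made up for illustration, and this is not how pkg_resources itself is implemented.

```shell
# version_ge A B: succeeds if version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# version_lt A B: succeeds if version A < version B.
version_lt() {
    [ "$1" != "$2" ] && version_ge "$2" "$1"
}

# version_in_range V MIN MAX: succeeds if MIN <= V < MAX.
version_in_range() {
    version_ge "$1" "$2" && version_lt "$1" "$3"
}

# The two versions mentioned in this post, checked against >=0.2.1,<0.3:
for v in 0.2.5 4.1; do
    if version_in_range "$v" 0.2.1 0.3; then
        echo "python-keystoneclient $v satisfies >=0.2.1,<0.3"
    else
        echo "python-keystoneclient $v does NOT satisfy >=0.2.1,<0.3"
    fi
done
```

Running such a check against the version reported by pip-python list before starting keystone would have pointed at the mismatch directly.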

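4. Quantum installation (renamed Neutron in later releases)
Only install quantum-server here; don't install anything more.

5. Nova installation
  From Grizzly onward there is no nova-volume (replaced by Cinder), and nova-network gives way to Quantum.
Error: TypeError: dist must be a Distribution instance. Root cause: setuptools too new (1.1.7); replaced it with 0.9.8.
The error persisted, so I then installed the dependency packages manually.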
error: CRITICAL nova [-] Could not load paste app 'ec2' from /etc/nova/api-paste.ini
      The problem is the old python-paste version of centos. So if you get this error, just remove python-paste with:
      yum remove python-paste python-paste-deploy python-paste-script  and install the new one via pip:
      pip install --upgrade Paste PasteDeploy PasteScript
Error: nova-network cannot be started by the nova user; it reports: sudo nova-rootwrap /etc/nova/rootwrap.conf iptables-restore -c execute /usr/lib/python2.6/site-packages/nova-2013.1.4-py2.6.egg/nova/utils.py:212
      I logged in as nova and ran this command; it always asks for nova's sudo password!
      Checking showed that the /etc/sudoers.d/nova file should contain: "nova ALL = (root) NOPASSWD: /usr/bin/nova-rootwrap /etc/nova/rootwrap.conf * "

  These are all the Nova services:
  service nova-api restart
  service nova-network restart
  service nova-cert restart
  service nova-consoleauth restart
  service nova-scheduler restart
  service nova-conductor restart
  service nova-novncproxy restart

  Check: nova-manage service list
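
In this release, nova-manage service list marks a live service with ":-)" and a dead one with "XXX" in the State column. A small filter for spotting dead services could look like the sketch below; the function name down_services is hypothetical, and it assumes the six-column layout (Binary, Host, Zone, Status, State, Updated_At).

```shell
# Print "binary host" for every service row whose State column is not ":-)".
# Reads `nova-manage service list` output on stdin; line 1 is the header.
down_services() {
    awk 'NR > 1 && NF >= 5 && $5 != ":-)" { print $1, $2 }'
}
```

Usage: nova-manage service list | down_services. Empty output means every listed service reported in as alive.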

6. Cinder installation
Nothing much to say.

7. Horizon installation
  Simple. This is what I did:
  tar xvf horizon-2013.1.4.tar, then copy horizon-2013.1.4 to /opt, and ln -s horizon-2013.1.4 horizon
mkdir /opt/horizon/.blackhole
python /opt/horizon/manage.py collectstatic
python /opt/horizon/manage.py syncdb
chown -R apache:apache /opt/horizon
service httpd restart
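Error: /usr/bin/env: node: No such file or directory
      Fix: create a symlink at /usr/bin/node
service httpd restart; service memcached restart
  There were some errors along the way; in the end they turned out to be because Nova and Cinder were not running properly.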

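The following are the key points for installing the NetworkNode (network node):
1. yum update. It is better to update the kernel to 2.6.32-358.123.2.openstack.el6.x86_64, and you need to install iproute-2.6.32-130.el6ost.netns.2.x86_64, which supports namespaces.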
yum install http://repos.fedorapeople.org/re ... .netns.2.x86_64.rpm
2. Open vSwitch installation; I installed 2.0
3. Quantum installation
This one is a pain. I redid it many times; every time you rebuild the network you must recreate the Quantum database.
ImportError: cannot import name fix_call
  Resolve:
   1. Uninstall all Paste*: yum erase paste*
   2. Install nose: pip install nose (quantum ImportError: No module named nose)
   3. Uninstall setuptools 1.4.1, install setuptools 0.9.8
   4. Uninstall python_paste*: pip uninstall
   5. Install Paste*: pip install Paste
ImportError: cannot import name deploy
   1. yum install python-paste python-paste-deploy
quantum-openvswitch-agent start error: AttributeError: 'Connection' object has no attribute 'connection_errors'
   From https://bugs.launchpad.net/anvil/+bug/1186453: downgrading to kombu==2.0 seemed to fix it.
   So I checked pip list|grep kombu and found 1.1; I ran pip uninstall kombu and reinstalled it, which gave kombu 3.0.5!
   Re-running quantum-openvswitch-agent reported no kombu package. I checked the controllernode and found kombu=1.4.1,
   so I ran pip install kombu==1.4.1, and then it started OK!
Finally, create the networks using a quantum-network.sh script downloaded from the web, adapted to your own setup.

Compute node installation
1. yum update
2. Open vSwitch installation
3. Quantum installation: only quantum-openvswitch-agent is needed
4. Nova installation: only nova-compute needs to be started.
  Pay attention to the VNC setting: in nova.conf, the IP in novncproxy_base_url should be the public IP, i.e. novncproxy_base_url=http://$PUBLIC_IP:6080/vnc_auto.html

[root@controllernode]$ nova-manage service list
Binary            Host                                 Zone      Status   State  Updated_At
nova-conductor    controllernode.wux.chin.seagate.com  internal  enabled         2013-11-27 00:54:20
nova-cert         controllernode.wux.chin.seagate.com  internal  enabled         2013-11-27 00:54:21
nova-consoleauth  controllernode.wux.chin.seagate.com  internal  enabled         2013-11-27 00:54:21
nova-scheduler    controllernode.wux.chin.seagate.com  internal  enabled         2013-11-27 00:54:22
nova-compute      ComputeNode2                         nova      enabled         2013-11-27 00:54:23
nova-compute      ComputeNode1                         nova      enabled         2013-11-27 00:54:27

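When quantum net-list -v hangs, look carefully for the cause: did something change, or did something die? The Quantum components on all nodes must be restarted.
The Quantum L3 agent causes the most problems. A sign of a healthy start is two dnsmasq processes (these are launched by the DHCP agent):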
nobody 19834    1  0 08:01 ?       00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap5a2fe537-19 --except-interface=lo --pid-file=/var/lib/quantum/dhcp/d8b13702-67c8-4b34-b470-9ea435b44222/pid --dhcp-hostsfile=/var/lib/quantum/dhcp/d8b13702-67c8-4b34-b470-9ea435b44222/host --dhcp-optsfile=/var/lib/quantum/dhcp/d8b13702-67c8-4b34-b470-9ea435b44222/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,10.10.10.0,static,120s --conf-file= --domain=openstacklocal
root    19835 19834  0 08:01 ?       00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap5a2fe537-19 --except-interface=lo --pid-file=/var/lib/quantum/dhcp/d8b13702-67c8-4b34-b470-9ea435b44222/pid --dhcp-hostsfile=/var/lib/quantum/dhcp/d8b13702-67c8-4b34-b470-9ea435b44222/host --dhcp-optsfile=/var/lib/quantum/dhcp/d8b13702-67c8-4b34-b470-9ea435b44222/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,10.10.10.0,static,120s --conf-file= --domain=openstacklocal
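
Checking for the two dnsmasq processes can be scripted. The sketch below counts dnsmasq entries in ps -ef output; the function name count_dnsmasq is hypothetical, and it assumes the standard ps -ef layout, where the command is the 8th field.

```shell
# Count dnsmasq processes in `ps -ef` output read from stdin.
# Field 8 is the command; match both "dnsmasq" and a full path ending in /dnsmasq.
count_dnsmasq() {
    awk '$8 ~ /(^|\/)dnsmasq$/ { n++ } END { print n + 0 }'
}
```

Usage: ps -ef | count_dnsmasq. Matching on the command field rather than the whole line keeps a concurrent "grep dnsmasq" from being counted.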

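Finally, boot a VM. CirrOS 0.3.0 is the usual choice, though CirrOS 0.3.1 is better. This is where the most problems show up. At first the VM could not get an IP, that is, it could not get metadata. This tormented me for several weeks until I found that the first time, the virtual router has to be activated manually.
In the running CirrOS VM, set the IP and default route manually:
sudo ip addr add 10.5.5.3/24 broadcast 10.5.5.255 dev eth0
sudo route add default gw 10.5.5.1
On the NetworkNode:
ip netns exec qrouter-xxxxx ping 10.5.5.3
Then manually add two routes on the NetworkNode.

(My public IPs run from 172.168.149.129 to 172.168.149.254; the virtual router takes two of them, 172.168.149.129 and 172.168.149.130.)
route add -net 172.168.149.128/25 gw 172.168.149.129 dev br-ex
route add -net 10.5.5.0/24 gw 172.168.149.130
These two routes should be added automatically by Quantum. For some reason my Quantum network added them automatically only once; every time since, I have added them by hand.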
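
The /25 in the first route follows from the address range: every address from 172.168.149.129 to 172.168.149.254 falls inside 172.168.149.128/25. A small sketch of that arithmetic in shell follows; the function name network_of is hypothetical, and it assumes a shell with 64-bit integer arithmetic, as in bash.

```shell
# network_of IP PREFIX: print the network address of IP under a /PREFIX mask.
# The body runs in a subshell so the IFS change does not leak to the caller.
network_of() (
    prefix=$2
    IFS=.
    set -- $1    # split the dotted quad into $1..$4
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
)

network_of 172.168.149.129 25   # first public IP
network_of 172.168.149.254 25   # last public IP
network_of 10.5.5.3 24          # a VM fixed IP
```

The first two calls print the same network, 172.168.149.128, and the third prints 10.5.5.0, matching the two -net arguments used in the routes.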

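I can now boot CentOS 6.4, Windows 7, and Windows 2012 images. The problem is that GRE mode does not suit the company's complex, expensive network, so the VMs cannot be reached directly; they are reachable only from inside the OpenStack environment.
Next I plan to try VLAN mode.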

#2, hosyp, posted 2014-01-20 14:26 (last edited by hosyp on 2014-01-20 14:26)

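OpenStack has several networks, and they are easy to confuse:
1. Management network: reachable from all nodes; I use 192.168.X.X/24
2. Data network: connects only the network node and the compute nodes; I use 10.10.10.x/24
3. External network: this is the public IP range, used for access from outside the OpenStack environment; only the controller node and the network node have it. I set it to 172.168.149.x/24
There is also the fixed IP / flat IP, which is the VM's own IP; I set it to 10.5.5.x/24. At first I confused it with the data network; once I realized, I had to rebuild the Quantum network.
In GRE mode a VM comes up with a fixed IP and a public IP, but you still have to assign a floating IP before it can be reached from outside, and that floating IP is also a public IP. So one VM takes up two public IPs!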

#3, hosyp, posted 2014-01-20 14:29

My ControllerNode has two NICs: em1=192.168.0.151, em2=172.168.149.51
The NetworkNode has three NICs: em1=192.168.0.152, em2=10.10.10.152, br-ex(em3)=172.168.149.129
The ComputeNode has two NICs: em1=192.168.0.153, em2=10.10.10.153

#4, sunshiene, posted 2014-01-21 22:33

Marking this for now; I hope to get a chance to exchange notes with the OP later.

[OpenStack] OpenStack Grizzly source-install experiment on CentOS 6.4 (controller node + network node + compute nodes)