[Beginner] Performance Other Tools > ganglia

Posted 2008-04-24 23:16
While chatting on LU, beginner-bj mentioned the open-source monitor Ganglia and its data flow; the graphing features look quite good from the pictures, so I will study it at the office tomorrow. This software uses rrdtool for its graphs and deploys in a client/server structure. A great many open-source packages draw their graphs with rrdtool; rrdtool really is powerful.

   



  

Source: "ganglia" on the IBM developerWorks Wiki, http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/ganglia
Added by Nigel Griffiths, last edited by Nigel Griffiths on Dec 31, 2007
    Ganglia HowTo


    In Brief:

    • Many large AIX High Performance Computing (HPC) clusters use this excellent tool to monitor performance across large clusters of machines.
    • The data is displayed graphically on a website and includes configuration as well as performance statistics. Ganglia is also increasingly being used in commercial data centres to monitor large groups of machines.
    • Ganglia can also be used to monitor a group of logical partitions (LPARs) on a single machine - these just look like a cluster to Ganglia.
    • Ganglia is not limited to just AIX, which makes it even more useful in heterogeneous computer rooms.
    • For more information go to the Ganglia home website at
      http://ganglia.sourceforge.net/



    • For the Ganglia for AIX and Linux on POWER binaries go to
      http://www.perzl.org/ganglia/



    • Briefly, a daemon runs on each node, machine or LPAR and the data is collected by a further daemon and placed in an rrdtool database. Ganglia then uses PHP scripts on a web server to generate the graphs as directed by the user (a quick look at this data flow is sketched below). There is also an on-going project to add POWER5 micro-partition statistics.
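    To make that data flow concrete: gmond answers any plain TCP connection with an XML dump of everything it currently knows, and this XML is what gmetad collects before writing the values into rrdtool. A quick hedged check, assuming netcat (nc) is installed and gmond is listening on its default TCP port of 8649:

    # Connect to a gmond daemon and show the start of its XML metric dump
    # (8649 is the default tcp_accept_channel port; change host/port to suit)
    nc localhost 8649 | head -20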

    Contents
  • Introduction to Ganglia
  • Performance Monitoring in General
  • Uses for Ganglia in Performance Monitoring
  • Have you seen Ganglia yet?
  • The components of Ganglia
  • Ganglia Setup
  • Before you start
  • Setting up the simplest possible Ganglia with the two following nodes
  • Larger Setup with groups of machines
  • POWER5 additions
  • Advanced topics


    When the content relates to IBM POWER5 based machines running Advanced POWER Virtualisation and Logical Partitions you should find this little logo. Otherwise the content should apply to any machine running Ganglia.
    Performance Monitoring in General
    Every systems administrator knows that they should be monitoring the performance of their machines to:

    • Check for general machine and OS health,
    • Spot longer term trends,
    • Avoid the "hitting the performance wall" issues,
    • Identify bottlenecks, etc.

    The problems are also many:

    • If you manage one or two machines this is easy, but if you are in charge of a few hundred then you just don't have the time.
    • Many monitoring tools provide "gallons" of statistics, far too many to deal with regularly, so what you need is a smaller number of stats, and just the important ones.
    • You also rapidly get a data overload problem - with hundreds of servers capturing data files you need to manage all the files and data so that you can sort it and find it later.
    • Next you find the different operating systems have different tools, stats and data, displayed in different ways.

    From the data centres that I visit, I find various approaches to this problem:

    • They have purchased a large cross-platform performance monitoring suite. The downside is that such tools are usually expensive, have a performance hit and you are at the mercy of the vendor to update the tool.
    • They do nothing.
    • They capture raw data to the disk and only look at the data in post mortem style i.e. when it all goes horribly wrong.

    Now, as the developer of the nmon tool for detailed monitoring on AIX and Linux systems, I have long been a fan of low level, high volume performance data, but I have come to understand that this is impossible for large numbers of machines and does not let you take an overall view of the computer room. To be honest, I was shocked when I first came across Ganglia. This was clearly the tool I had been looking for for a long time and had even thought of writing myself. When I first got it running I was shocked again - the flexibility and the ability to add new stats were amazing.
    I have since shown my working version of Ganglia to many system administrators and the reaction is always exactly the same:

    • "Wow! That is really cool ... I think that is exactly what I want ... where do I get it?"

    There is one problem. Getting started with Ganglia is quite hard work. The problem is that the designers' and developers' brains are too large and us regular guys struggle to understand the basic setup. The developers tend to be High Performance Computing (HPC) people running 50 to 2000+ nodes at universities. The Ganglia documents refer to distributed, scalable, multiple-resolution, network broadcast, XML protocol models - all very well, but how can I get it working quickly?
    The rest of this article is for regular guys who just want to get it working and get the benefits - the theory can come later.
    Uses for Ganglia in Performance Monitoring
    In Ganglia we have a number of terms:

    • Node - a machine - typically small rack-mounted 1, 2 or 4 CPU machines all essentially helping to do one job, task or calculation
    • Cluster - a group of nodes
    • GRID - a group of clusters

    I see multiple uses of Ganglia:

    • Large scale clusters (i.e. what it was designed for) - this is the focus area of the developers and it works very well. In this case, each parallel computing infrastructure is seen as a Ganglia cluster, and if you have more than one you can get Ganglia to view these as a GRID of clusters, so you can seamlessly see the performance stats of the various clusters on the Ganglia website.
    • Just a bunch of machines - if you have a group of machines in your data centre with different purposes, sizes, speeds and applications then you can call all of these machines your cluster. Fortunately, Ganglia also displays the important machine information like hostname, number of CPUs and memory. Using this you can see both the machine details and the performance, so you can find the busy machines and then see what is going on. Hopefully, your machine names will make sense!
    • Data centre(s) sprawl - if you have even more machines you may view them in different terms. You may split them into groups in terms of production, test and development. You may split them up in terms of geography like London, Glasgow, Swansea, Birmingham and Paris. Or you might have a functional split like external customers, admin, sales, human resources etc. Whatever grouping you decide on can be used with Ganglia. In Ganglia terms each of these groups is a cluster and the groups in total are your GRID.
    • Logical Partitions (LPAR) and Virtualization engines - this is a growing area in the computer industry and performance monitoring in this area is often forgotten. These operating system images are sharing a physical computer but you want to track which of them is taking what resources from the pool. In Ganglia terms each of these operating system images is a node and the machine as a whole is your Ganglia cluster.


    Term     HPC                                    Bunch of machines                    Data centre                         Virtualization
    Node     each machine                           each machine                         each machine                        LPAR
    Cluster  all the nodes used for a single task   all the nodes in the computer room   whatever grouping you decide        the LPARs of a single machine
    GRID     groups of clusters                     other machine rooms or not used      all the cluster groups as a whole   multiple machines
    You will have to decide what makes sense for you. Below we will show, at "black box" level, what makes up Ganglia, then set up a tiny two machine cluster (bunch of machines style) that you can follow for practice, and then a Virtualization example, which is easy once you have the basic understanding.
    Have you seen Ganglia yet?
    If not, now would be a good time to have a look. Fortunately some of the largest clusters in the world use Ganglia and have made the user interface public, so you can see them from the Internet. Here is a good link:

    • University of California Berkeley Grid at
      http://monitor.millennium.berkeley.edu/




      • This is at the Grid level: you can select a cluster to see all the nodes and the overall cluster stats for CPU and memory
      • This simply lets you drill down to the cluster you are interested in
      • An example of a cluster is the Nano cluster within the Grid - scroll down, find the Nano cluster and click on the name or the graph for Nano

    • You should end up at the Nano Cluster URL
      http://monitor.millennium.berkeley.edu/?c=Nano




      • At this level you can select the statistics for the whole cluster that you want to see for all nodes (at the bottom).
      • At the top you have:

        • the node and CPU counts (these nodes have 2 CPUs each)
        • the summary graphs for the cluster
        • the pie chart showing how much of the cluster is busy
        • a set of small graphs, one per node

      • Click on the "Physical View" top right and you will find details of the nodes:

        • the total CPUs, memory and disk space
        • the fullest disk
        • each node has 2 GHz CPUs and 2 GB of memory
        • This is a good view to spot odd nodes or configurations

      • Go back to the "Full View" by clicking on it or on the Nano Cluster name
      • You can look at lots of different stats and configuration details
      • For example click on the Metric (default is load_one = the CPU load in the last 1 minute) and select "cpu_user".
      • Now the graphs show you the nodes' User Utilisation numbers.
      • Select "machine_type" and you see they are powerpc machines - actually the IBM JS20 Blades.
      • Select "os_name" and then "os_release" and you see they are all Linux using the 2.6 kernel.
      • Now select the Last field and you will see you can view the graphs over the last hour, day, week, month and year
      • If you then select a node of the Nano cluster from the list, or just click on the n1 graph, you see the machine (node) details for node 1

    • You should end up at the n1 node URL at
      http://monitor.millennium.berkeley.edu/?c=Nano&h=n1




      • Now you see all the graphs for this one node.

    Have a browse around. Once you understand the levels (Grid, cluster, node) it is relatively easy to work it all out.
    Here are some screen dumps from a Grid of IBM pSeries machines using Ganglia to monitor Virtualization Engines (LPARs in IBM speak) on a few machines:
    Below is the Grid View and the graphs are the summary of each cluster:


    • Cluster demo_p505 is the machine with virtualization
    • The "other" cluster is just a collection of older machines that I also want to monitor

    Below is the Cluster View of cluster demo_p505 with the node graphs at the bottom

    Below is the Physical View of the demo_p505 cluster showing the CPU and memory of each node

    Below shows some of the different stats that can be shown and how you select the different time periods:




    Below is a quick check of the Operating System and version configuration:



    Note: in the above you can see the Virtual I/O Server, two copies of AIX and many copies of Linux running (Red Hat EL4, SUSE SLES 9 and Fedora 4)
    Below are the stats for network packets out, disk write and memory free (in that order):



    Below is a look at the physical CPU use - this is a new statistic added for POWER5 and shows how much of the real CPU time each LPAR is taking up



    For POWER5 running AIX and Linux, LPAR weight, SMT status, Entitlement, Capped status, kernel64bit status and others have also been added.



    Below are the details for one individual node




    Note: the above also shows some POWER5 only options but the bulk is standard Ganglia
    The components of Ganglia
    The components of Ganglia are as follows:

    • The data collector (G)

      • The daemon is a single file called gmond (Ganglia MONitor Daemon!)
      • Its configuration file /etc/gmond.conf
      • This goes on each node

    • The data consolidator (G)

      • The daemon is a single file called gmetad (Ganglia METAdata Daemon!)
      • Its configuration file /etc/gmetad.conf
      • You need one of these for each cluster. On massive clusters you can have more than one and a hierarchy.
      • This daemon collects the gmond data via the network and saves it in an rrdtool database.

    • The database

      • Ganglia uses the well known and respected Open Source tool rrdtool

    • The Web GUI tools (G)

      • These are a collection of PHP scripts started by the Webserver to extract the ganglia data and generate the graphs for the website

    • The web server with PHP

      • This could be any web server that supports PHP, SSL and XML
      • Everyone uses Apache2 - you are on your own if you use anything else!

    • Additional advanced tools (G)

      • gmetric to add extra stats - in fact anything you like, numbers or strings, with units etc.
      • gstat to get at the Ganglia data to do anything else you like

    The parts that are marked up with (G) are part of Ganglia.
    The other parts you have to get and install as pre-requisites namely Apache2, PHP and rrdtool - these may also have pre-requisites.
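    Before moving on, it can save time to confirm those pieces are actually installed. A minimal sanity check, assuming typical command names (paths vary by platform and may be under /opt/freeware or similar):

    # Are the Ganglia daemons and rrdtool on the PATH?
    which gmond gmetad rrdtool
    # Check the web server and PHP versions (the PHP command-line binary may
    # not be installed even when mod_php is - treat a missing php as inconclusive)
    apachectl -v
    php -v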
    The below diagram shows the connections:

    In this diagram you should note the following features:

    • The left hand side shows the gmond daemon processes running, one on each node of the cluster. This is configured by a single /etc/gmond.conf file on each node. So the installation on the nodes is very simple - just two files that are identical on each node (assuming it is the same hardware and OS). You also need to make sure the gmond process is started every time the machine is rebooted. The gmond.conf only needs three or four lines changed for each cluster, like the cluster name and where to forward the stats.
    • The top right hand side shows the more complicated central machine (which normally is one of the nodes in the cluster but does not have to be). On this machine the gmetad daemon process collects the performance stats and saves them to rrdtool databases. Again this is controlled by a single configuration file - in this case /etc/gmetad.conf. This only needs a couple of lines changed for each cluster too, and if it is to report a grid it needs one line of configuration to be able to find the other gmetad daemons and get the stats they hold.
    • The lower right hand side shows the website details. The user browses to the website and invokes the PHP scripts that fetch the data from the co-located rrdtool databases and dynamically generate the graphs you have seen above.
    • The setup of the right versions of Apache2 with the right built-in features is the hardest part and depends on your operating system.

    A word on suitable Web Servers:

    • On Linux this is really easy, as Apache2 and PHP4 or PHP5 usually come with recent versions of your Linux distribution, i.e. on the standard CDROM or on your network install server.
    • On the other UNIX platforms, you have to either check that the Apache2 and PHP are available or compile them from the Open Source code directly. Fortunately, this is relatively simple - even for AIX!
    • For AIX, see the "AIX and Open Source" Wiki page at this
      Direct Link to AIX and Open Source Wiki page
      for details on how to compile your own version of Apache2 and PHP with the required features for Ganglia. At the time of writing, I could find no downloadable version that would work with Ganglia for AIX.

    Ganglia Setup
    Before you start
    It can be tricky if you change some things after you have Ganglia running. So before you start:

    • Make sure you are not going to change your hostnames. This is a given in production, so think about this mainly for prototype and test systems.
    • Make sure you are not going to change IP addresses.
    • Make sure the timezone, time and date are consistent on all machines in a cluster - the use of NTP is recommended (a quick check is sketched below).
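    For the last point, a quick way to eyeball clock consistency across a handful of nodes is a loop like the sketch below, assuming ssh access and using hypothetical host names:

    # Print each node's idea of the current date/time, one per line
    for n in node1 node2 node3        # hypothetical host names
    do
        echo "$n: $(ssh $n date)"
    done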

    Setting up the simplest possible Ganglia with the two following nodes

    • One Ganglia Client node with just the gmond data collector
    • One ganglia Server node with gmond, gmetad, rrdtool, Apache2 and PHP5

    We will tackle this in three steps:

    • Install and setup of gmond on the Client node
    • Install and setup of gmetad on the Server node
    • Install and setup of Ganglia web "front end" PHP scripts on the Server node

    Before you start, I hope you have determined the type of configuration you want in terms of Grid and Cluster names. In this simple case we are going to ignore the Grid level, so you need a Cluster name. This can, in fact, be the hardest part.

    For this worked example we are going to call the Cluster "serenity".
    Note: there is documentation for Ganglia that can be found at
    http://ganglia.sourceforge.net/docs/ganglia.html


    Install and setup of gmond on the Client node - simplest example
    You now need the gmond package for your operating system and platform.
    At the time of writing ganglia 3.0.3 is the latest available but you might be on a later release.
    You are looking for something with a name like:

    • ganglia-gmond-3.0.3-.rpm
      For example:

      Operating System          RPM Filename
      AIX 5.3                   ganglia-gmond-3.0.3-1.aix5.3.ppc.rpm
      SUSE on POWER machines    ganglia-gmond-3.0.3-1.suse.ppc64.rpm
      Red Hat on x86            ganglia-gmond-3.0.3-1.rhel4.x86_64.rpm

    The Ganglia Download website is at
    http://ganglia.sourceforge.net/downloads.php


    or the AIX or Linux on POWER binaries at
    http://www.perzl.org/ganglia/


    If you can't find the right download then you are going to have to recompile the Open Source code yourself but first check the end of this wiki page.
    To install the gmond daemon/command use:
    rpm -Uvh filename.rpm
    This will:

    • Install the gmond binary - usually in /usr/sbin or /opt/freeware/sbin
    • Create a /etc/gmond.conf default config file
    • Set your system to automatically restart gmond on reboot (on most systems) and
    • Start the gmond process.

    But you need to edit the /etc/gmond.conf file, kill gmond and restart gmond.
    The gmond command can be used to create the default gmond.conf file like this: gmond -t >/etc/gmond.conf
    The file needs to be changed as follows. Change:
    cluster {
      name = "unspecified"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }
    to:
    cluster {
      name = "serenity"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }
    Note for POWER5 additions - with Linux on POWER the gmond process needs access to /proc/ppc64/lparcfg but this is only allowed for the root super user. You can chmod or chown this pseudo file on reboot, or change the /etc/gmond.conf file as follows; in the "globals" section at the top, change:
    setuid = yes
    to:
    setuid = no
    This will use the defaults for the rest of the setup, including the broadcast address, which we will change in the more complex example.
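    If you are preparing the file by script, the name change is a one-liner; a sketch using GNU sed (the -i flag is GNU-only, so on AIX write to a temporary file and move it into place instead):

    # Generate a fresh default config, then set the cluster name in place
    gmond -t > /etc/gmond.conf
    sed -i 's/name = "unspecified"/name = "serenity"/' /etc/gmond.conf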
    You now need to kill and restart gmond for this new configuration file.
    On most systems you will find the automatic control script at either:

    • /etc/init.d/gmond
    • /etc/rc.d/init.d/gmond

    You can then restart gmond, for example: /etc/init.d/gmond restart
    The options to this script are: start|stop|restart|status
    That is all - sounds complex and takes time to explain but much simpler to do. In practice:

    • FTP the rpm file to the machine or better yet have some shared disk over NFS
    • Run the rpm command
    • Edit the gmond.conf file
    • Restart gmond

    That comes to about 10 seconds per node and you could automate it as it is only a couple of files and they are identical on each node.
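    A sketch of that automation, assuming ssh/scp access from one management machine, hypothetical host names, and that you have already prepared the edited gmond.conf; adjust the rpm file name and start script path for your platform:

    # Push the rpm and the prepared gmond.conf to each node, install and restart
    for n in node1 node2 node3        # hypothetical host names
    do
        scp ganglia-gmond-3.0.3-1.aix5.3.ppc.rpm gmond.conf $n:/tmp/
        ssh $n 'rpm -Uvh /tmp/ganglia-gmond-3.0.3-1.aix5.3.ppc.rpm; cp /tmp/gmond.conf /etc/gmond.conf; /etc/init.d/gmond restart'
    done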
    To check it is still running use either:

    • /etc/init.d/gmond status
    • ps -ef | grep gmond

    Help about gmond options can be found using: gmond --help
    Problem determination
    If you find gmond fails to start, or is not running after starting it and checking with ps (as above), then it is simple to start gmond in debug mode for more information. You need to run the actual gmond command (not the start up script in /etc/ ... ) in debug mode and in the foreground - i.e. the output goes to the screen and a control-C will halt gmond.
    # gmond --debug=9
    udp_recv_channel mcast_join=239.2.11.71 mcast_if=NULL port=8649 bind=239.2.11.71
    tcp_accept_channel bind=NULL port=8649
    udp_send_channel mcast_join=239.2.11.71 mcast_if=NULL host=NULL port=8649
            metric 'cpu_user' being collected now
            metric 'cpu_user' has value_threshold 1.000000
            metric 'cpu_system' being collected now
            metric 'cpu_system' has value_threshold 1.000000
            metric 'cpu_idle' being collected now
    ...
    ...
    In the case above it started normally. If there is a problem starting gmond it should be detailed in the output and it will stop. If you watch the gmond debug output for a minute or so, you will notice the gmond processes on all the nodes "chatting" to each other. This allows the auto-discovery of new nodes in a cluster to work, but does suggest that on large clusters the default time between sending the performance information could be tuned to be less often.
    One problem that has happened to me, and whose cause is difficult to determine, is error messages about "failing to create a multicast server". This is caused by not having a network gateway (default route) set up on your system. Check if you have a route with:

    • on Linux the "route" command
    • on AIX the "netstat -C" command
      and look in the output for a "default destination" line with Flags set to "UG". In a production environment this is unlikely, but in a quick test setup for Ganglia it is easy to forget to set the gateway.
      Don't use "route -f" on AIX - you might think it would list full output but it actually flushes (drops) ALL default routes, i.e. gateways - the exact opposite of what you want, and it may cause lots of network problems until you add back the default route, i.e. gateway (been there, done that).
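    On a Linux test box the check and the fix look like this; the gateway address is of course an example for your own network:

    # Look for a destination 0.0.0.0 line with Flags "UG"
    route -n
    # If there is none, add the default route (example gateway address)
    route add default gw 192.168.1.1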

    Install and setup of gmetad on the Server node - simplest example
    If the Server side is on a node of the cluster then you should, of course, install the gmond data collector on this node too. It is done exactly the same way as described above and in this simple example it is assumed that you will.
    Now we need to set up the data management side followed by the Web Server. Just like the installation of the gmond daemon we need to locate the RPM file for the gmetad daemon. And you can probably guess what that file is going to be called too. Make sure it is the same version number as the gmond you are using. To install the gmetad daemon/command use: rpm -Uvh filename.rpm
    This will

    • Install the gmetad binary - usually in /usr/sbin or /opt/freeware/sbin
    • Create a /etc/gmetad.conf default config file
    • Set your system to automatically restart gmetad on reboot (on most systems)
    • Start the gmetad process.

    You may find that this rpm command will fail due to pre-requisites. This depends on whether you have previously installed other libraries and tools.
    For my system I needed: rrdtool and libart_lgpl

    There may be other pre-reqs for rrdtool but these are already installed due to Apache and PHP pre-reqs including:



      • libpng-1.2.1-6.aix5.1.ppc.rpm
      • freetype2-2.1.7-2.aix5.1.ppc.rpm
      • zlib-1.2.2-4.aix5.1.ppc.rpm
      • perl 5.8.2

    But you need to edit the /etc/gmetad.conf file, kill gmetad and restart gmetad.
    The file needs to be changed as follows. Find the section with comments about the data_source syntax and add the line:
    data_source "serenity" localhost
    The name "serenity" identifies the cluster whose data is to be saved on this machine and "localhost" means that this machine will hold a copy, rather than getting information stored in another gmetad database on a different machine. This will be covered more in the more complex example.
    This will use the defaults for the rest of the setup, including the broadcast address, which we will change in the more complex example later on.
    Now restart gmetad using the command: /etc/rc.d/init.d/gmetad restart (or /etc/rc.d/gmetad restart depending on your system).
    To check gmetad is still running use either:

    • /etc/init.d/gmetad status
    • ps -ef | grep gmetad

    Also take a look at where the daemon is saving the data in rrdtool databases. The directory is actually named in the /etc/gmetad.conf file, but the default is /var/lib/ganglia/rrds. There should be a series of directories and files in here for each cluster, node and statistic, with some summaries too.
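    A quick way to confirm data is really flowing into those databases is to look for recently updated .rrd files (GNU find shown; AIX find lacks -mmin, so inspect ls -l timestamps there instead):

    # List rrd files written to in the last five minutes
    find /var/lib/ganglia/rrds -name '*.rrd' -mmin -5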
    Install and setup of Ganglia web "front end" PHP scripts on the Server node - simplest example
    Warning: This section assumes you have a Web Server with PHP, SSL and XML support built-in
    You now need the front-end PHP scripts package, which is independent of operating system and platform.
    At the time of writing ganglia 3.0.3 is the latest available but you might be on a later release.
    You are looking for the file: ganglia-web-3.0.3-1.noarch.rpm
    It can be found at the Ganglia download website at
    http://ganglia.sourceforge.net/downloads.php


    or the AIX or Linux on POWER binaries at
    http://www.perzl.org/ganglia/


    Install the RPM with: rpm -Uvh ganglia-web-3.0.3-1.noarch.rpm
    Now the bad news - this is installed at /var/www/html/ganglia.
    You must move this directory to the directories served by your web server.
    This directory could be anywhere but popular examples are:

    • /usr/local/apache2/htdocs
    • /srv/www/htdocs
    • /webpages
      For Apache this directory is named in the httpd.conf configuration file in the line (for example):
      DocumentRoot "/usr/local/apache2/htdocs"
      The UNIX owner of the files in this directory is given in the lines:
      User apache
      Group apache

    You can rename the "ganglia" directory but we will retain this for this example and will assume your top level web server directory is /usr/local/apache2/htdocs
    Copy the files and set the right owner with:
    cp -R /var/www/html/ganglia /usr/local/apache2/htdocs
    chown -R apache:apache /usr/local/apache2/htdocs/ganglia
    Now point your browser at the ganglia scripts with the following URL:

    • http:///ganglia
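    As an alternative to copying the files, you could leave them where the rpm put them and point Apache at that directory with an Alias. A sketch for an Apache 2.0-style httpd.conf (directives vary between Apache versions, so treat this as an outline); the files still need to be readable by the User/Group that Apache runs as:

    Alias /ganglia "/var/www/html/ganglia"
    <Directory "/var/www/html/ganglia">
        Order allow,deny
        Allow from all
    </Directory>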

    Problem determination
    Problem 1) If the above URL does not work, your web server does not automatically find index.php files (like it normally finds index.html files when you don't explicitly have this at the end of the URL), so try: http:///ganglia/index.php
    Problem 2) If naming the index.php file does not work, try creating in the ganglia directory a file called test.php with contents:
    PHP Test
    <?php phpinfo(); ?>
    Make this file readable with:
    chmod 755 /usr/local/apache2/htdocs/ganglia/test.php
    Then try the following URL: http:///ganglia/test.php
    This should show you lots of PHP details.
    Problem 3) If it does not work and you only get the words "PHP Test", or just the raw text content of the file, or it refuses with an error, then you probably do not have PHP support on your web server. Sorry, but adding PHP support to whatever software you run for your web server is beyond the scope of this article. For recent (last 2 years) Linux systems we can recommend the Apache that comes with your distribution, as these all seem to have built-in PHP support - or at least the ones I have tried, which are primarily SUSE SLES9 and Red Hat EL4. For the AIX platform, all we can offer is the instructions for using the latest Apache and PHP - this is best done by recompiling the source code and is not as hard as it sounds. Find the details at
    Direct Link to AIX and Open Source Wiki page
    . For other platforms, you need to ask your vendor or start searching the Internet for a suitable download. It can be very hard to determine if a web server and PHP download has all the optional components required to support Ganglia without actually trying it. If you have success perhaps you can add to the list below:

    • AIX - best to recompile Apache and PHP details at
      Direct Link to AIX and Open Source Wiki page

    • SUSE SLES9 - Apache and PHP from the distribution works fine on x86 and POWER hardware.
    • Red Hat EL4 - Apache and PHP from the distribution works fine on x86 and POWER hardware.

    Larger Setup with groups of machines
    In this section we assume you have tried Ganglia or have worked through the simple example above. In the above, we accepted lots of the default settings for the Ganglia gmond and gmetad daemons to make the setup simple. In this section, we are only going to set a few extra options to allow multiple clusters. These clusters (as described above) could be grouping machines together for a number of purposes, like:

    • They are all nodes of a HPC super computer working as a whole
    • They are different machine rooms or geographical groupings
    • They are functionally grouped together like web, database, admin, app servers
    • They are the logical partitions of a virtualized machine and share hardware.

    Whatever the reason, the mechanics of setting these clusters up are the same. Actually deciding your clustering groups and their names is far harder than setting them up! Also note that Ganglia is a powerful tool and there are hundreds of options and possible ways of setting it up. We are going to cover here only what is necessary to have a single Ganglia Grid with multiple clusters containing multiple nodes.
    WARNING:
    There are network issues here that need to be understood. Ganglia by default uses network broadcast packets from the gmond daemon which are picked up by any listening gmetad daemon. This is for maximum flexibility and minimum setup. These packets are issued only every few seconds - like between 10 to 15 seconds. As Ganglia is designed to scale to thousands of machines this is unlikely to cause network bottlenecks, but you need to be aware this is happening, and if you are not the network administrator then you need to discuss this with them. The default is to broadcast with UDP to the IP address 239.2.11.71 on port 8649. These can be changed, but beyond changing the IP address this is not covered here in further detail.
    Below is a diagram showing three clusters (green, blue and yellow) being available from one web server co-hosted with the yellow cluster:

    Here we have four examples of clusters
  • Yellow Cluster - with local nodes and supporting the front end user interface

    • This cluster is supporting the web server from which users can view the Ganglia data
    • It will show data for the locally supported nodes (yellow nodes) and the remotely supported blue and green clusters
    • The stats for the yellow nodes are collected and stored locally

  • Yellow Cluster - without local nodes and supporting the front end user interface

    • As above but just running the front end web server, as it is not mandatory that there are locally supported nodes

  • Blue Cluster - this is a group of nodes with no local data repository

    • The blue nodes share their performance data and it is collected on one node
    • The gmetad daemon of the Yellow cluster collects this information and stores it
    • If the yellow cluster was not saving the stats, the information would be lost

  • Green Cluster - this is a group of nodes with local data repository

    • The green nodes share their performance data and it is collected on one node
    • The gmetad daemon of this cluster collects this information and stores it
    • This means its data is stored independently of the yellow cluster (unlike the blue cluster).

    This shows some of the options for Ganglia setup - there are many more. In practice you would simplify things and not have one of each type. I suggest two typical setups:

    • Lots of green clusters and one yellow cluster with local nodes
    • Lots of blue clusters and one yellow cluster without local nodes

    Note: make all the gmond.conf files the same in each cluster to make life simple.
    So how to setup the Green Cluster?
    This is the same as the simple example above but we don't need to setup the web server, PHP or the Ganglia PHP scripts. The gmond daemon on each node of the cluster broadcasts the performance and configuration information and the gmetad daemon for the cluster saves the data in the rrdtool database. This data is later sent on to the higher level gmetad with the web server. A simple way to get this to work effectively is to install the gmond and gmetad processes (just as before) and then make the following changes to the gmond.conf and gmetad.conf files. We want all the nodes to only send their data to the gmetad daemon for this cluster. There is no point in others knowing about or seeing these packets. This is controlled by the multicast parameters. In this example:

    • cluster name green
    • The gmetad is running on a machine with a host name of "green23" and also running a gmond daemon.

    Change the cluster name = "unspecified" to name = "green" and change each reference to 239.2.11.71 to green23 (or its IP address if you prefer). The top of the /etc/gmond.conf file will look like this:
    /* This configuration is as close to 2.5.x default behavior as possible
       The values closely match ./gmond/metric.h definitions in 2.5.x */
    globals {
      daemonize = yes
      setuid = yes
      user = nobody
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = no
      host_dmax = 0 /*secs */
      cleanup_threshold = 300 /*secs */
      gexec = no
    }
    /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
     * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
     * NOT be wrapped inside of a <CLUSTER> tag. */
    cluster {
      name = "green"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }
    /* The host section describes attributes of the host, like the location */
    host {
      location = "unspecified"
    }
    /* Feel free to specify as many udp_send_channels as you like.  Gmond
       used to only support having a single channel */
    udp_send_channel {
      mcast_join = green23
      port = 8649
    }
    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      port = 8649
      family = inet4
    }
    ...
    Notes:

    • Remember for POWER5 and Linux LPARS, you should have "setuid = no".
    • It is tempting to set the owner, latlong, url and location fields but these will not normally appear on the resulting Ganglia website. There may be advanced settings to get at this information but the writer has not found them yet! If you know how to display these then please add it here. The latlong field has been used to draw world maps and display the sites of Ganglia clusters on them. The other fields could be useful information too - for example, knowing who to notify of a problem or knowing how to find a machine that has failed in a large computer room
    • Don't forget to update all the /etc/gmond.conf files on all the nodes.
    • Don't forget to restart all the gmond daemons on all the nodes.

    For the machine or LPAR running gmetad you need to add to /etc/gmetad.conf a single data source line, so that it gathers the data and saves it in local rrdtool files, as below:
    data_source "green" localhost
    So how to setup the Blue Cluster?
    This is very much like the Green cluster except there is no node running gmetad. Just select one node, say for example "bigblue", and replace 239.2.11.71 with bigblue (or its IP address if you prefer) and change the cluster name to "blue". Change all the /etc/gmond.conf files and restart all the daemons. This node bigblue will forward the stats on to the Yellow cluster when asked. The selection of which node is not important - it just needs to be available.
    So how to setup the Yellow Cluster?
    This is very much like the Green cluster except the setup for the gmetad daemon machine is a bit special. Again change the /etc/gmond.conf file, set the name of the cluster to "yellow" and replace 239.2.11.71 with the hostname of the node running gmetad (or its IP address if you prefer). Change all the /etc/gmond.conf files and restart all the daemons.
    The details for /etc/gmetad.conf are a little more complex. Assume we already have the Blue and Green clusters running. We have local nodes in the Yellow cluster and we want to also display the other clusters. At the top of the /etc/gmetad.conf file we add the following lines.
    data_source "yellow" localhost
    data_source "blue" bigblue
    data_source "green" green23This directs gmetad to:

    • contact the local gmond daemon to get the stats for the yellow nodes
    • contact the bigblue node to get information about the blue nodes - as this is a gmond daemon it will save the data locally in rrdtool
    • contact the green23 node to get information about the green cluster - as this has gmetad running and saving local data, yellow will collect summary stats from green23 and will ask green23 for more data if required for the front end website graphs.

    In addition we want them to appear in one Grid called "Rainbow". Further down the /etc/gmetad.conf file we set the following line:
    gridname "Rainbow"Now the Ganglia website should display a grid called Rainbow and have three clusters of yellow, green and blue. If you drill down into one of the clusters you should see only the nodes of that cluster and the summaries of the cluster should reflect just the correct nodes.
    POWER5 additions


    The POWER5 and POWER5+ machines from IBM can run AIX and Linux on POWER in logical partitions (LPARs) that are less than one CPU or parts of a CPU. These range from 0.1 CPUs up to the maximum of 64 CPUs, in increments of 0.01 of a CPU. These are called Micro-partitions or Shared Processor partitions. Additions have been made to Ganglia to support the statistics from these types of partition, so that the LPARs form a Ganglia cluster. This makes Ganglia an ideal extra performance tool for monitoring such partitioned machines.
    What are the new POWER5 stats that are available?
    The following additional metrics are defined for AIX and Linux on POWER:

    Ganglia stat name    Value
    capped               boolean 0=false, 1=true
    cpu_entitlement      a number
    cpu_in_lpar          a number
    cpu_in_machine       a number
    cpu_in_pool          a number
    cpu_pool_idle        a number
    cpu_used             a number
    disk_read            a number
    disk_write           a number
    kernel64bit          boolean 0=false, 1=true
    lpar                 boolean 0=false, 1=true
    lpar_name            a string
    lpar_num             a number
    oslevel              a string
    serial_num           a string
    smt                  boolean 0=false, 1=true
    splpar               boolean 0=false, 1=true
    weight               boolean 0=false, 1=true
    The meaning of these new performance stats should be fairly obvious to experienced systems administrators familiar with POWER5 and Micro-partitions, except cpu_in_lpar, which is the Virtual Processor number on POWER5 and just the number of CPUs on POWER4 machines. The stats should also work on non-POWER5 machines - some details will clearly not be possible but they should be reported in a suitable way. If you are new to POWER5 and the Advanced POWER Virtualisation (APV) features then there are two excellent Redbooks to read up on the subject:
    Redbook: Advanced POWER Virtualization on IBM System p5
    URL for downloading the .pdf: http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247940.html

    Redbook: Advanced POWER Virtualization on IBM eServer p5 Servers: Architecture and Performance Considerations
    URL for downloading the .pdf: http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg245768.html
    Where do I get Ganglia?
    The Ganglia download website has the pre-built binaries for Fedora and Red Hat EL4 on two platforms, x86 (AMD and Intel) and ia64 (Itanium). The download website is
    http://ganglia.sourceforge.net/downloads.php
    - take the "ganglia monitor core" link - or get the AIX or Linux on POWER binaries at
    http://www.perzl.org/ganglia/
    If you have such a Linux system and want to run a quick test to learn, then these are recommended, and you have Apache and PHP support with the distribution.
    Where do I get the Source code for Ganglia, Apache and PHP?
    You can recompile the Ganglia daemons from the open source code found on the Ganglia prime website, but it can be a little tricky as it needs quite a few supporting libraries and tools. Don't let me put you off. If you are a developer this can be done by downloading the code and then finding the latest versions of the required packages. It is simply a case of trying the ./configure and make commands and waiting for errors to tell you what is missing.
    There are Ganglia instructions for compiling the daemon at
    http://ganglia.sourceforge.net/docs/ganglia.html




    Many of the websites and hints for recompiling Apache 2 and PHP 5 are useful so check out the AIX and Open Source wiki page
    aixopen
    . You can get the POWER5 source code updates from
    http://www.perzl.org/ganglia/


    Note: the Ganglia front end PHP scripts are platform independent.
    You could run just the gmond daemons on the POWER based machines (AIX and/or Linux) and the gmetad plus Apache + PHP on an x86 Linux based machine.
    Where do I download the binaries for standard Ganglia or Ganglia for AIX and Linux on POWER with the POWER5 additions?

    Component                  POWER AIX                 POWER Linux               x86 Linux                   Other platform or Operating System
    Web Server Apache2+PHP5    See details at aixopen    with Linux Distro         with Linux Distro           Ask vendor
    Ganglia FE PHP scripts     Platform independent set  Platform independent set  Platform independent set    Platform independent set
    gmetad                     POWER5 RPMs               POWER5 RPMs               Ganglia download site (2)   Need to compile (2)
    gmond                      POWER5 RPMs               POWER5 RPMs               Ganglia download site       Need to compile (2)

    • (2) If you want the web front end on non-POWER machines (like a Linux/PC), just make sure you run the gmetad on POWER5 (to enable the POWER5 additions) and get the web front end gmetad to talk to the POWER5 gmetad for the data.

    This is the problem at the moment. The POWER5 additions are being offered to the Ganglia developers as updates and will hopefully be in the standard code when it is next released - possibly 3.1. The binary RPMs with POWER5 additions for gmond and gmetad are for:

    • AIX 5L v5.1
    • AIX 5L v5.2
    • AIX 5L v5.3
    • Linux SUSE SLES9 and SLES10 for POWER
    • Linux Red Hat EL AS 4 for POWER

    Part of the design of Ganglia means you need a gmetad that requests these extra POWER5 stats from gmond, so you will need to run gmetad on POWER based AIX and Linux to be able to see the new POWER5 stats. This means the Ganglia web server normally has to also be on AIX or Linux on POWER. So in this case, you could run the gmond daemons, gmetad daemon and Apache + PHP on the POWER based machines (AIX and/or Linux).
    Advanced topics
    Using gmetric to add more stats
    You have your Ganglia cluster working nicely but you start thinking "I wish I could also monitor XYZ". Well, whatever XYZ is, if you can get a number or a string at the command line then you can add it to the Ganglia monitored data.
    Examples for AIX might be:

    • Transaction rate of your database - this will depend on the database
    • Number of database users connected - this will depend on the database
    • Machine model - on AIX use: lsattr -El sys0 -a modelname -F value
    • Machine firmware level - on AIX use: lsattr -El sys0 -a fwversion -F value
    • Number of disks - on AIX use: lspv | wc -l

    You can read the gmetric documentation at
    http://ganglia.sourceforge.net/docs/ganglia.html
    - it is near the bottom.
    To add the firmware string:
    gmetric --name firmware --value `lsattr -El sys0 -a fwversion -F value` --type "string"
    To add the number of disks:
    gmetric --name number_of_disks --value `lspv | wc -l` --type int32
    To add the number of transactions, assuming you have a script called "transactions" that works this out and returns a number with a decimal point (you will have to write this yourself!):
    gmetric --name tpm --value `/usr/local/bin/transactions` --type double
    The above will only save the statistics once. The firmware level is unlikely to change, the number of disks could change and the number of transactions per minute will definitely change. To keep these up to date, it is recommended to run the commands regularly (say once every 60 seconds) via cron.
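    For example, a crontab entry on the node could refresh the disk count every minute; a sketch, assuming gmetric was installed as /usr/bin/gmetric (check where your rpm put it):

    # Refresh the number_of_disks metric once a minute
    * * * * * /usr/bin/gmetric --name number_of_disks --value `lspv | wc -l` --type int32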
    Seemingly by magic, these new stats or strings will appear on the Ganglia website. Find the machine involved and all the data about the node, then click on the "Gmetrics" link - it was not obvious to me the first time that this is where the new data would appear! You may have to give it a minute or two for the values to appear.
    Using gstat to extract data
    The gstat tool can give you information about your cluster; it can be useful for determining a number of things.
    For example, to check the hosts that are up or dead just run: gstat
    $ gstat
    CLUSTER INFORMATION
           Name: demo_p505
          Hosts: 9
    Gexec Hosts: 0
    Dead Hosts: 0
      Localtime: Wed Jun 21 17:51:05 2006
    There are no hosts running gexec at this time
    You can also get more information about the status with the --all option:
    $ gstat --all --single_line
    CLUSTER INFORMATION
           Name: demo_p505
          Hosts: 9
    Gexec Hosts: 0
    Dead Hosts: 0
      Localtime: Wed Jun 21 17:56:29 2006
    CLUSTER HOSTS
    Hostname                     LOAD                       CPU              Gexec
    CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]
    daic4.aixncc.uk.ibm.com     4 (    0/   66) [  0.00,  0.00,  0.00] [   0.0,   0.0,   0.0,  99.9,   0.0] OFF
    daic3.aixncc.uk.ibm.com     4 (    0/   82) [  0.00,  0.00,  0.00] [   0.0,   0.0,   0.1,  99.8,   0.1] OFF
    daic2.aixncc.uk.ibm.com     4 (    0/   57) [  0.00,  0.00,  0.00] [   0.0,   0.0,   0.1,  99.9,   0.0] OFF
    daivios1.aixncc.uk.ibm.com  4 (    0/   77) [  0.00,  0.00,  0.00] [   0.0,   0.0,   0.0,  99.9,   0.0] OFF
    dainim.aixncc.uk.ibm.com    4 (    0/   77) [  0.00,  0.00,  0.00] [   0.0,   0.0,   0.0, 100.0,   0.0] OFF
    dai6.aixncc.uk.ibm.com      4 (    0/   86) [  0.08,  0.03,  0.01] [   0.1,   0.0,   0.1,  99.5,   0.3] OFF
    daic1.aixncc.uk.ibm.com     4 (    1/   60) [  1.00,  1.02,  1.09] [   0.0,   0.0,   0.0, 100.0,   0.0] OFF
    daic5.aixncc.uk.ibm.com     2 (    0/   53) [  0.00,  0.00,  0.00] [   0.0,   0.0,   0.0,  99.9,   0.0] OFF
    daivios.aixncc.uk.ibm.com   4 (    1/   74) [  2.01,  2.09,  1.94] [   0.1,   0.0,   0.6,  99.2,   0.0] OFF
    The Cluster Summary Pie Chart


    In the summary of the Cluster you are shown a pie chart for load_one (i.e. the CPU load over the last minute) for the various nodes in different colours. However, you might like to display some other number in the pie chart. To do this, go to the Ganglia web server directory. There you should find a file called conf.php - this file has all sorts of interesting options. The default_metric decides the stat used for the pie chart. For the POWER5 additions I wanted to have the new metric cpu_used (this is the physical CPU used by the partition), so I changed:
    #
    # Default metric
    #
    $default_metric = "load_one";
    to:
    $default_metric = "cpu_used";
    I have not tried this with a number that is not a percentage, as it may fail or need other things changed.
    Default sort order of the machines
    You can change the default sort to sorting by hostname in order to have the nodes always in the same order. This stops the order changing depending on the statistics that you are looking at - which I find confusing. To achieve this edit get_context.php and change:
    if (!$sort)
          $sort = "descending";
    to:
    if (!$sort)
          $sort = "by hostname";
    POWER5 Cross Partition/Whole Machine/CEC/Global LPAR View graphs - via Automated Add-on


    This Add-on extracts the CPU use from the rrdtool databases in the /var/lib/ganglia/rrds directory.
    With this POWER5 Add-On you have a new graph at the Ganglia Cluster (in our case the LPARs of one machine) of the added up CPU use and the size of the Shared CPU Pool.
    An example is below:

    You can see the dark blue line around this graph (this is the Zoomable Add-on) which means you can click on it to get a much larger and more detailed graph as below.

    In the above graph you can see all the Logical Partitions (LPARs) on this two CPU pSeries p505 machine - this includes the following operating systems:

    • AIX 5.3 and AIX 6.1
    • Linux on POWER SUSE SLES 9, Fedora 5 and RedHat 4
    • Virtual I/O Server (called the p505ivm partition)

    This is a crash and burn machine used for demonstrations; the "fake workload" is generated via nstress ncpu programs on the red partition p505lpar9. The other workloads were started by hand to create a more interesting graph. You can see that the p505ivm LPAR (Virtual I/O Server) is in dark blue. Also note how little CPU time the mostly idle LPARs are taking - around 0.02 to 0.04 of a CPU. This is likely to be just the regular (100 times a second) timer interrupts, device drivers and daemons ticking over. At the top of the chart the number of CPUs in the Shared Pool is shown as a black line at the 2 CPU level. When using the Integrated Virtualization Manager (IVM) all CPUs are normally in the shared pool. We can also see that when the three LPARs are busy we take practically all the CPU time.
    Where to get these Add-Ons?

    Further cool additions - finer control of the graph times
    By default Ganglia records enough data in the rrdtool database to draw last hour, day, week, month and year graphs, but if you make a one line change to the /etc/gmetad.conf file to increase the data held (at the cost of a little bit more disk space) then you can use the Calendar add-on, where you can ask for graphs between any two dates and times. For example, if you find a peak from three weeks ago you can ask Ganglia to graph just that half day or even hour.
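    The one-line change is the RRAs setting in /etc/gmetad.conf, which defines the rrdtool round-robin archives, i.e. how many rows are kept at each resolution. As a sketch, the ganglia 3.0.x default is shown below (check the comments in your own gmetad.conf); raising the trailing row counts keeps more history at each resolution, at the cost of more disk space per metric, and the new sizes only apply to rrd files created after the change:

    # Default RRAs line from gmetad.conf - increase the final number in each
    # "RRA:AVERAGE:0.5:<step>:<rows>" entry to hold more history
    RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"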
    See below for selecting the start or end date and time:

    Further cool additions - custom graphs
    Yes, it gets even better. Now you can specify what you want to graph - which stats, labels, dates, colours and more. This is a further add-on for custom graphs. See below for the options for specifying the graphs.

    Here we have the CPU entitlement and the CPU actually used, plus the numbers of CPUs in the machine and pool (if different), for a particular peak a few days ago.
    This generated the graph below - the graph details can be saved and used again later on.

    POWER5 Cross Partition/Whole Machine/CEC/Global LPAR View graphs - via Manually written PHP scripts


    As all the LPAR data is held in rrdtool databases in the /var/lib/ganglia/rrds directory of the gmetad and webserver machine, it is possible to extract the Physical CPU used in each Logical Partition (LPAR) of the machine to see the use of CPU power as a whole. The same goes for memory, disk and network stats.
    This is still a work in progress but it is a start. Below are a few of the graphs generated so far, and then how this was done is explained. Click on any thumbnail graph for a bigger version:
    CPU Last Hour, CPU Last Day, CPU Last Week, CPU Last Month, CPU Last 3 Months, CPU Last Year, Memory Free, Memory Total, Network In, Network Out, Disk Read, Disk Write, Run Queue
    Each of these graphs is generated by a PHP script. Below is a sample one:
    "Content-type: image/gif");
    passthru("/usr/bin/rrdtool graph - \
    --title 'Global LPAR View for machine demo_p505 - Physical-CPU-Use for Last-Hour' \
    --vertical-label 'Physical-CPUs' \
    --start end-1h \
    --width 800 \
    --height 600 \
    --lower-limit 0 \
    DEF:LABEL1=/var/lib/ganglia/rrds/demo_p505/daivios.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    AREA:LABEL1#0000FF:daivios \
    DEF:LABEL2=/var/lib/ganglia/rrds/demo_p505/daic1.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL2#00FF00:daic1 \
    DEF:LABEL3=/var/lib/ganglia/rrds/demo_p505/daic11.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL3#FF0000:daic11 \
    DEF:LABEL4=/var/lib/ganglia/rrds/demo_p505/lpar9.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL4#00FFFF:lpar9 \
    DEF:LABEL5=/var/lib/ganglia/rrds/demo_p505/daic3.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL5#FFFF00:daic3 \
    DEF:LABEL6=/var/lib/ganglia/rrds/demo_p505/daic4.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL6#FF00FF:daic4 \
    DEF:LABEL7=/var/lib/ganglia/rrds/demo_p505/daic5.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL7#000088:daic5 \
    DEF:LABEL8=/var/lib/ganglia/rrds/demo_p505/dainim.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL8#008800:dainim \
    DEF:LABEL9=/var/lib/ganglia/rrds/demo_p505/dai6.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL9#880000:dai6 \
    DEF:LABEL10=/var/lib/ganglia/rrds/demo_p505/daivios1.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL10#008888:daivios1 \
    DEF:LABEL11=/var/lib/ganglia/rrds/demo_p505/lpar10.aixncc.uk.ibm.com/cpu_used.rrd:sum:AVERAGE \
    STACK:LABEL11#888800:lpar10 \
    DEF:LABEL12=/var/lib/ganglia/rrds/demo_p505/daivios.aixncc.uk.ibm.com/cpu_in_pool.rrd:sum:AVERAGE \
    LINE3:LABEL12#FF0000:CPU_in_pool \
    2>/tmp/err");
    ?>
    These PHP scripts are generated via a small configuration file and a shell script.
    The configuration file looks like this (called data505):
    demo_p505
    daivios.aixncc.uk.ibm.com daivios 0000FF
    daic1.aixncc.uk.ibm.com daic1 00FF00
    daic11.aixncc.uk.ibm.com daic11 FF0000
    lpar9.aixncc.uk.ibm.com lpar9 00FFFF
    daic3.aixncc.uk.ibm.com daic3 FFFF00
    daic4.aixncc.uk.ibm.com daic4 FF00FF
    daic5.aixncc.uk.ibm.com daic5 000088
    dainim.aixncc.uk.ibm.com dainim 008800
    dai6.aixncc.uk.ibm.com dai6 880000
    daivios1.aixncc.uk.ibm.com daivios1 008888
    lpar10.aixncc.uk.ibm.com lpar10 888800
    Notes:
  • The first line is the cluster name, as found in the directory under /var/lib/ganglia/rrds
  • The rest of the lines are one per LPAR with:
  • the hostname for the node, as found in the /var/lib/ganglia/rrds/<clustername>/ directory
  • the short hand name you want on the graph
  • a six digit hexadecimal number for the colour
    The shell script is here (called create_global):
    write_php()
    {
    title=$1
    time=$2
    period=$3
    variable=$4
    poolline=$5
    units=$6
    i=1
    read machine
    printf "
    printf "header(\"Content-type: image/gif\");\n"
    printf "passthru(\"/usr/bin/rrdtool graph - %c\n" '\'
    printf -- "--title \'Global LPAR View for machine %s - %s for %s\' %c\n" $machine $title $time '\'
    printf -- "--vertical-label \'%s\' %c\n" $units '\'
    printf -- "--start %s %c\n" $period '\'
    printf -- "--width 800 %c\n" '\'
    printf -- "--height 600 %c\n" '\'
    printf -- "--lower-limit 0 %c\n" '\'
    #do the first line as it need the AREA tag - other lines need STACK
    read node1 name1 colour1
    printf "DEF:LABEL%d=/var/lib/ganglia/rrds/%s/%s/%s.rrd:sum:AVERAGE %c\n" $i $machine $node1 $variable '\'
    printf "AREA:LABEL%d#%s:%s %c\n" $i $colour1 $name1 '\'
    while read node name colour
    do
    let i=i+1
    printf "DEF:LABEL%d=/var/lib/ganglia/rrds/%s/%s/%s.rrd:sum:AVERAGE %c\n" $i $machine $node $variable '\'
    printf "STACK:LABEL%d#%s:%s %c\n" $i $colour $name '\'
    done
    if [[ "$poolline" == "yes" ]]
    then
    let i=i+1
    printf "DEF:LABEL%d=/var/lib/ganglia/rrds/%s/%s/cpu_in_pool.rrd:sum:AVERAGE %c\n" $i $machine $node1 '\'
    printf "LINE3:LABEL%d#FF0000:CPU_in_pool %c\n" $i '\'
    fi
    printf -- "2>/tmp/err\");\n"
    printf -- "?>\n"
    }
    # Main script here
    input_file=$1
    read machine <$input_file
    write_php "Physical-CPU-Use" "Last-Hour"    end-1h cpu_used yes "Physical-CPUs" <$input_file >${machine}_hour.php
    write_php "Physical-CPU-Use" "Last-Day"     end-1d cpu_used yes "Physical-CPUs" <$input_file >${machine}_day.php
    write_php "Physical-CPU-Use" "Last-Week"    end-1w cpu_used yes "Physical-CPUs" <$input_file >${machine}_week.php
    write_php "Physical-CPU-Use" "Last-Month"   end-1m cpu_used yes "Physical-CPUs" <$input_file >${machine}_month.php
    write_php "Physical-CPU-Use" "Last-Quarter" end-3m cpu_used yes "Physical-CPUs" <$input_file >${machine}_quarter.php
    write_php "Physical-CPU-Use" "Last-Year"    end-1y cpu_used yes "Physical-CPUs" <$input_file >${machine}_year.php
    write_php "CPU-Entitlement"  "Last-Day"     end-1d cpu_entitlement yes "Physical-CPUs" <$input_file >${machine}_entitle.php
    write_php "Memory-Free"      "Last-Day"     end-1d mem_free no "Bytes" <$input_file >${machine}_mem_free.php
    write_php "Memory-Total"     "Last-Day"     end-1d mem_total no "Bytes" <$input_file >${machine}_mem_total.php
    write_php "Network-In"       "Last-Day"     end-1d bytes_in no "Bytes" <$input_file >${machine}_network_in.php
    write_php "Network-Out"      "Last-Day"     end-1d bytes_out no "Bytes" <$input_file >${machine}_network_out.php
    write_php "Disk-Read"        "Last-Day"     end-1d disk_read no "Bytes" <$input_file >${machine}_disk_read.php
    write_php "Disk-Write"       "Last-Day"     end-1d disk_write no "Bytes" <$input_file >${machine}_disk_write.php
    write_php "Run-Queue"        "Last-Day"     end-1d proc_run no "Processes" <$input_file >${machine}_proc_run.php
    The script is called as follows, from within a directory of the webserver:
    create_global data505
    This generates the 14 PHP scripts. When the PHP scripts are accessed via the web browser, the graphs are generated on the fly. You might want to make this simple
    via a webpage containing something like this:
    Welcome to this Ganglia Cross Partition or Global LPAR View
    CPU Graphs Over Time
    Last Hour
    Last Day
    Last Week
    Last Month
    Last Quarter
    Last Year
    For the Last Day only
    Entitlement
    Run Queue
    Memory Total
    Memory Free
    Network In
    Network Out
    Disk Read
    Disk Write
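    The links above were plain HTML anchors pointing at the generated PHP files. A minimal index page sketch, using the file names that create_global produces for the demo_p505 example:

    <html><body>
    <h1>Welcome to this Ganglia Cross Partition or Global LPAR View</h1>
    <h2>CPU Graphs Over Time</h2>
    <a href="demo_p505_hour.php">Last Hour</a>
    <a href="demo_p505_day.php">Last Day</a>
    <a href="demo_p505_week.php">Last Week</a>
    <a href="demo_p505_month.php">Last Month</a>
    <a href="demo_p505_quarter.php">Last Quarter</a>
    <a href="demo_p505_year.php">Last Year</a>
    <h2>For the Last Day only</h2>
    <a href="demo_p505_entitle.php">Entitlement</a>
    <a href="demo_p505_proc_run.php">Run Queue</a>
    <a href="demo_p505_mem_total.php">Memory Total</a>
    <a href="demo_p505_mem_free.php">Memory Free</a>
    <a href="demo_p505_network_in.php">Network In</a>
    <a href="demo_p505_network_out.php">Network Out</a>
    <a href="demo_p505_disk_read.php">Disk Read</a>
    <a href="demo_p505_disk_write.php">Disk Write</a>
    </body></html>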
    Using unicast for multiple cluster configuration
    The Ganglia webnode is an LPAR on the p550. We have two machines, p505 and p550, and the LPARs from each one should appear in a different cluster.
    On the Ganglia web-node I used the following configuration for gmetad:
    data_source "p550" localhost
    data_source "p505" 172.28.255.203And this gmond.conf:
    cluster {
      name = "p550"
      owner = "Tomas Baublys"
      latlong = "unspecified"
      url = "unspecified"
    }
    #...
    udp_send_channel {
    # The headnode of the p550 cluster is the webnode itself
    host = 172.28.255.100
    port = 8666
    }
    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      port = 8666
    }
    On all p550 LPARs I used the same gmond.conf above.
    On the p505 cluster I chose one LPAR (172.28.255.203) to be the head node (running gmond only), with all others sending information to it. I used this gmond.conf for all p505 LPARs:
    cluster {
      name = "p505"
      owner = "Tomas Baublys"
      latlong = "unspecified"
      url = "unspecified"
    }
    #...
    udp_send_channel {
    host = 172.28.255.203
    port = 8666
    }
    udp_recv_channel {
      port = 8666
    }
    Larger detailed graphs via an enhanced Ganglia Web-Frontend script
    People have noted that some Ganglia websites on the Internet allow you to click on the small Ganglia-generated graphs to get a much larger and more detailed graph, and have wondered how to get this on their own Ganglia system. The change is very simple to make and makes the graphs much more valuable.
    Take this link to Michael's webpage with the details and download of the enhanced scripts.
    Scenario: setting up Unicast configuration through Firewalls
    Have a look at this Wiki page if you need to go for Unicast through a Firewall for more secure networks

    Add your tips here ... please!

    The postings on this site solely reflect the personal views of the authors and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.
    Comments
    The rpm installation of the mentioned ganglia-web-3.0.3-1.noarch.rpm from
    http://ganglia.sourceforge.net/downloads.php
    which is supposed to be platform independent, failed to install like this:
    rpm -Uvh ganglia-web-3.0.3-1.noarch.rpm
    package ganglia-web-3.0.3-1 is for a different operating system
    Thus we used the php files from the ganglia source as below, which works fine:
    wget
    http://belnet.dl.sourceforge.net/sourceforge/ganglia/ganglia-3.0.3.tar.gz


    gunzip ganglia-3.0.3.tar.gz; tar xvf ganglia-3.0.3.tar
    mv ganglia-3.0.3/web /usr/local/apache2/htdocs/ganglia

    Posted by georg kuehnberger at Dec 17, 2006 18:38

    To work around the ganglia-web-3.0.3-1.noarch.rpm install problem, use:
    rpm -Uvh --ignoreos ganglia-web-3.0.3-1.noarch.rpm

    Posted by Nigel Griffiths at Apr 20, 2007 17:14



    This article comes from the ChinaUnix blog; to see the original, visit: http://blog.chinaunix.net/u/6436/showart_576673.html