Wednesday, February 26, 2014

Setting Up Zenoss for Monitoring Grails Applications


I spent this week setting up a simple, monitored set of virtualized Grails application servers. As the monitoring service, I chose Zenoss.

Multi-Machine Setup with Vagrant

In order to simulate a production-like private network, I created a multi-machine Vagrant configuration comprising three machines:
  • 10.0.0.2 is the installation target for the Zenoss server
  • 10.0.0.3 and 10.0.0.4 are the two to-be-monitored application servers, each configured as a blue/green deployable Tomcat for hosting Grails applications

Vagrantfile
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.define "zenoss" do |zenoss_server|
    zenoss_server.vm.box = "CentOS-6.2-x86_64"
    zenoss_server.vm.box_url = "https://dl.dropboxusercontent.com/u/17905319/vagrant-boxes/CentOS-6.2-x86_64.box"

    zenoss_server.vm.network :private_network, ip: "10.0.0.2"
    # ...
  end

  (1..2).each do |idx|
    config.vm.define "grails#{idx}" do |grails_web|
      grails_web.vm.box = "squeezy"
      grails_web.vm.box_url = "https://dl.dropboxusercontent.com/u/17905319/vagrant-boxes/squeezy.box"

      grails_web.vm.network :private_network, ip: "10.0.0.#{2 + idx}"
      # ...
    end
  end
end
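
With this Vagrantfile in place, the whole environment can be brought up in one go, or machine by machine using the names from the define blocks above:
Terminal
# bring up all three machines
vagrant up
# ...or only a single one, e.g. the monitoring server
vagrant up zenoss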

Installing the Zenoss Server

All the machines are provisioned with Chef. For the server, there is a dedicated role in roles/zenoss_server.rb. Besides filling the run list with the zenoss::server recipe, it configures various attributes for Java and the Zenoss installation.
Vagrantfile
  config.vm.define "zenoss" do |zenoss_server|
    # ...
    zenoss_server.vm.provision :chef_solo do |chef|
      # ...
      chef.add_role "zenoss_server"
      # ...

      chef.json = {
        domain: "localhost"
      }
    end
  end
roles/zenoss_server.rb
name "zenoss_server"
description "Configures the Zenoss monitoring server"

default_attributes(
  "zenoss" => {
    "device" => {
      "properties" => {
        "zCommandUsername" => "zenoss",
        "zKeyPath" => "/home/zenoss/.ssh/id_dsa",
        "zMySqlPassword" => "zenoss",
        "zMySqlUsername" => "zenoss"
      }
    }
  }
)

override_attributes(
  "java" => {
    "install_flavor" => "oracle",
    "jdk_version" => "7",
    "oracle" => {
      "accept_oracle_download_terms" => true
    }
  },
  "zenoss" => {
    "server" => {
      "admin_password" => "zenoss"
    },
    "core4" => {
      "rpm_url" => "http://downloads.sourceforge.net/project/zenoss/zenoss-4.2/zenoss-4.2.4/4.2.4-1897/zenoss_core-4.2.4-1897.el6.x86_64.rpm?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fzenoss%2Ffiles%2Fzenoss-4.2%2Fzenoss-4.2.4%2F4.2.4-1897%2F&ts=1392587207&use_mirror=skylink"
    },
    "device" => {
      "device_class" => "/Server/SSH/Linux"
    }
  }
)

run_list(
  "recipe[zenoss::server]"
)
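
After provisioning, it is worth verifying that the Zenoss daemons actually came up. A quick sanity check, assuming the RPM installation registers the usual zenoss init script:
Terminal
vagrant ssh zenoss
sudo service zenoss status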

Installing the Application Servers

In order to prepare an application server for monitoring, you have to install the SNMP daemon. The Simple Network Management Protocol provides insight into various system parameters such as CPU utilization, disk usage, and RAM statistics. I bundled my common run list and attributes in roles/monitored.rb.
Vagrantfile
  (1..2).each do |idx|
    config.vm.define "grails#{idx}" do |grails_web|
      # ...
      grails_web.vm.provision :chef_solo do |chef|
        # ...
        chef.add_role   "monitored"

        chef.json = {
          domain: "localhost",
        }
      end
    end
  end
roles/monitored.rb
name "monitored"
description "Bundles settings for nodes monitored by Zenoss"

default_attributes()

override_attributes(
  "snmp" => {
    "snmpd" => {
      "snmpd_opts" => '-Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid'
    },
    "full_systemview" => true,
    "include_all_disks" => true
  }
)

run_list(
  "recipe[snmp]"
)
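
Before signing a node up in Zenoss, you can verify from the monitoring server that its SNMP daemon answers. This is only a sketch: it assumes the snmp cookbook keeps the default public read-only community and that net-snmp-utils is available on the Zenoss box.
Terminal
vagrant ssh zenoss
snmpwalk -v 2c -c public 10.0.0.3 system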

Signing up the Application Servers for Monitoring

Now we must acquaint the Application Servers with Zenoss. As a first step, I did this manually via the Zenoss Web UI. The Web UI is only reachable through the server's loopback interface. To make it accessible from my browser, I tunneled HTTP traffic to the loopback device via SSH:
Terminal
ssh -p 2222 -o "UserKnownHostsFile /dev/null" -o "StrictHostKeyChecking no" -N -L 8080:127.0.0.1:8080 root@localhost
# Password is 'vagrant'
Now I can access the UI from localhost:8080.
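
Alternatively, the tunnel can be opened through Vagrant itself, which passes any extra arguments after -- on to the underlying ssh call; this should achieve the same result:
Terminal
vagrant ssh zenoss -- -N -L 8080:127.0.0.1:8080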

Logging in with the credentials from roles/zenoss_server.rb, we can access the dashboard:

Switching over to the Infrastructure tab, we can Add Multiple Devices:

We input the IP addresses of our two virtual app servers, 10.0.0.3 and 10.0.0.4, and keep the default value for the device type, Linux Server (SNMP).
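
Instead of clicking through the dialog, devices can also be added via the Zenoss JSON API. This is only a sketch: it assumes the device_router endpoint of Zenoss Core 4, the admin credentials from roles/zenoss_server.rb, and that the /Server/Linux device class corresponds to the Linux Server (SNMP) type.
Terminal
curl --user admin:zenoss \
     --header "Content-Type: application/json" \
     --data '{"action": "DeviceRouter", "method": "addDevice", "data": [{"deviceName": "10.0.0.3", "deviceClass": "/Server/Linux"}], "type": "rpc", "tid": 1}' \
     http://localhost:8080/zport/dmd/device_router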

Now, Zenoss adds these two nodes to its server pool in the background:

Once this is finished, Zenoss starts recording events and measurements for the nodes. Here is an example from a simple load scenario of a Grails application on node grails2 (10.0.0.4):

Now you are prepared for further exploration of the server performance jungle. All my sources are available from GitHub.

Tuesday, February 11, 2014

Zero-Downtime Deployment for Grails Applications

Often it's okay to have a (short) downtime when deploying a new version of your application. But my recent customer runs a time-critical, round-the-clock business: downtime is critical, and there is only one short deployment window per day. In this context, continuous deployment is not an option, which limits the level of support and the possibilities for feedback.

The solution is Blue/Green Deployment: one deploys the new version to an offline service and moves the incoming traffic from the old version to the new one once it is ready. I adapted a solution from Jakub Holy.
There are several options for deploying different versions of an application in parallel on Tomcat. I want to discuss them briefly:

Different context roots

Deploying to different context roots within the same Tomcat container, e.g. localhost:8080/version1, localhost:8080/version2 etc.

Pros

  • No changes to the Tomcat installation or configuration

Cons

  • Requires URL rewriting by the reverse proxy, which is harder to configure.
  • Very likely, due to memory leaks, the Tomcat instance will run out of memory (PermGen), and there is no possibility to restart the instance without downtime.

Different Tomcat listeners

One can start multiple listeners within the same container, serving the applications on different ports, e.g. localhost:8080/ and localhost:8081/.

Pros

  • No changes to the Tomcat installation (startup scripts, default environment variables and paths).

Cons

  • Some changes to the Tomcat config file, server.xml, are necessary.
  • Very likely, due to memory leaks, the Tomcat instance will run out of memory (PermGen), and there is no possibility to restart the instance without downtime.

Different Tomcat instances

Last, but definitely not least, there is the "big" solution: start two completely separate Tomcat instances.

Pros

  • It is possible to restart the offline Tomcat instance without any downtime.
  • This enables repeated deployments without eventually running out of memory.

Cons

  • Requires many changes to the system configuration, because every configuration artifact must exist twice: two startup scripts, two Catalina home directories, two server.xml and context.xml files, two logging directories, and so on.

Since it is the only option that allows true zero-downtime operation, I chose the latter.

    Session Handling

    The last problem to tackle is session handling. By default, session information such as logins is limited to one application instance. If every deployment requires the users to log in again, zero-downtime will result in zero-acceptance, too. The solution to this problem is clustering the two Tomcat instances.
    This requires a few changes to the application itself. The application must be marked as 'distributable'. The simplest way to achieve this is to create a deployment descriptor in src/templates/war/web.xml:
    <web-app ...>
      <display-name>/@grails.project.key@</display-name>
     
      <!-- Add this line -->
      <distributable />
      ...
    </web-app>
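    If the project does not yet contain this file, Grails can generate the customizable templates (including src/templates/war/web.xml) with its install-templates command; the exact set of generated templates depends on the Grails version:
    grails install-templates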
    Besides, clustering must be activated in Tomcat's server.xml:
    <Server port="8005" shutdown="SHUTDOWN">
      <!-- ... -->
      <Service name="Catalina">
        <!-- ... -->
        <Engine name="Catalina" defaultHost="localhost">

          <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
                   channelSendOptions="8">

            <Manager className="org.apache.catalina.ha.session.DeltaManager"
                     expireSessionsOnShutdown="false"
                     notifyListenersOnReplication="true"/>

            <Channel className="org.apache.catalina.tribes.group.GroupChannel">
              <Membership className="org.apache.catalina.tribes.membership.McastService"
                          address="228.0.0.4"
                          port="45564"
                          frequency="500"
                          dropTime="3000"/>
              <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                        address="auto"
                        port="4000"
                        autoBind="100"
                        selectorTimeout="5000"
                        maxThreads="6"/>

              <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
                <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
              </Sender>

              <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
              <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
              <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
            </Channel>

            <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
            <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>

            <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
            <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
          </Cluster>

        </Engine>
      </Service>
    </Server>
    This configuration enables in-memory session replication: the instances discover each other via multicast and replicate session deltas over the configured NIO channel. There are alternatives where session information is persisted to disk, which would enable failover and recovery from crashes. But for my scenario (just two Tomcat instances on the same machine) direct in-memory synchronization seems sufficient.
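    Whether replication actually works can be checked roughly with a cookie jar: request a session from one instance, then replay the cookie against the other. This is only a sketch; it assumes the application issues a JSESSIONID cookie on its root URL and that the two instances listen on ports 8080 and 8081:
    # the first request creates a session on one instance
    curl --cookie-jar /tmp/session.txt http://localhost:8080/
    # if replication works, the other instance accepts the same session
    # and does not answer with a new Set-Cookie header
    curl --cookie /tmp/session.txt --head http://localhost:8081/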

    Moving from Blue to Green

    Finally, incoming requests have to be routed to the active Tomcat instance. In my setup, that's the duty of haproxy. As described in the documentation, one can configure haproxy to forward incoming requests to either of several backends.
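    For reference, the two configuration variants only differ in which backend receives the traffic. The following is merely a sketch of what haproxy.blue.cfg could look like; the frontend port and the exact directives are assumptions, not taken from the actual cookbook:
    # haproxy.blue.cfg (sketch): all traffic goes to the blue Tomcat on port 8080
    frontend http_in
        bind *:80
        mode http
        default_backend tomcat_blue

    backend tomcat_blue
        mode http
        server blue 127.0.0.1:8080 check
    # haproxy.green.cfg is identical except that its backend points at 127.0.0.1:8081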
    To simplify the process of deployment and reconfiguration of haproxy, I developed a little Bash script:
    #!/bin/bash
    if [ $# -ne 1 ]; then
      echo "Usage: $0 <war-file>"
      exit 1
    fi

    set -e
    retry=60
    war_file=$1

    # Determine which environment currently receives traffic and derive
    # the offline deployment target from it
    current_link=$(readlink /etc/haproxy/haproxy.cfg)
    if [ "$current_link" = "./haproxy.green.cfg" ]; then
      current_environment="GREEN"
      target_environment="BLUE"
      target_service="tomcat6-blue"
      target_port="8080"
      target_webapps="/var/lib/tomcat6-blue/webapps"
      target_config_file="./haproxy.blue.cfg"
    elif [ "$current_link" = "./haproxy.blue.cfg" ]; then
      current_environment="BLUE"
      target_environment="GREEN"
      target_service="tomcat6-green"
      target_port="8081"
      target_webapps="/var/lib/tomcat6-green/webapps"
      target_config_file="./haproxy.green.cfg"
    else
      echo "Unexpected haproxy.cfg symlink target: $current_link"
      exit 1
    fi
    echo "haproxy is connected to $current_environment backend"

    # Undeploy the previous version from the offline Tomcat and stop it
    curl --user deployer:supersecret "http://localhost:$target_port/manager/undeploy?path=/"
    service $target_service stop

    # Deploy the new WAR as the root application and bring the instance back up
    cp --verbose "$war_file" "$target_webapps/ROOT.war"
    service $target_service start

    # Wait until the freshly deployed application answers HTTP requests
    until curl --head --fail --max-time 10 "http://localhost:$target_port/"; do
        if [ $retry -le 0 ]; then
          echo "$war_file was not deployed successfully within retry limit"
          exit 1
        fi
        echo "Waiting 5 secs for successful deployment"
        sleep 5
        echo "$((--retry)) attempts remaining"
    done

    # Point haproxy at the new environment and reload its configuration
    ln --symbolic --force --no-target-directory --verbose "$target_config_file" /etc/haproxy/haproxy.cfg
    service haproxy reload
     

    Putting everything together

    Finally, I collected all of the configuration, scripts, and so on into a Chef cookbook, forked from the original Tomcat cookbook. I provide a GitHub repository that helps you set up a virtual machine with Vagrant and the described Tomcat/haproxy configuration.
    git clone https://github.com/andreassimon/zero-downtime.git
    cd zero-downtime
    bundle install
    librarian-chef install
    vagrant up
    Copy your WAR file into the project directory, and deploy it to the virtual machine:
    cp /home/foo/your-war-file.war .
    vagrant ssh
    sudo -i
    deploy-war /vagrant/your-war-file.war
    Now you can access the application in your host browser via http://localhost:8080/.
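    The same check works from the command line on the host:
    curl --head http://localhost:8080/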