Clustering of the messaging subsystem HornetQ in JBoss AS7 and EAP 6

Nov 24, 2012


in akquinet, as7, cluster, eap, failover, high availability, hornetq, jboss, jboss as 7, jms, live backup, load balancing, messaging, redhat

In the recent posts of this series we talked about many different aspects of clustering for the JBoss AS 7 and its quality-assured version EAP 6.

Until now, there is one important thing we have not covered: clustering of the messaging subsystem. Both EAP 6 and AS 7 use HornetQ as the default messaging provider. In this post we give an overview of the clustering abilities of HornetQ and explain how to use the various clustering features with EAP 6 or JBoss AS 7. We implemented a simple JMS client application to demonstrate these abilities.

Overview

As already explained in a previous post about cluster topologies, clustering can mean many different things depending on what you want to achieve. One important goal is raw computing power, which here means a higher message throughput: you want to be able to receive and deliver more messages in the same time frame. Another goal is high availability and failover.

Higher message throughput

Higher message throughput is what HornetQ means by the term clustering. In a HornetQ cluster, all newly received messages can be distributed across the whole cluster. By default, a server that receives a message distributes it round robin among itself and the servers it is connected to by a cluster connection. The illustration below shows a possible topology of a symmetric cluster with server-side message load balancing.

HornetQ – symmetric cluster topology

Load balancing can be configured to be a little bit more intelligent, so that messages will only be redirected to servers with corresponding consumers. But that does not completely solve the problem: What if the consumers disconnect while the message is on its way to the server? That is the case message redistribution was invented for. If enabled, the message will then, after a configurable timeout, be redirected to another server with consumers for that message.
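As a rough sketch, the two settings described above could look like this in the messaging subsystem. The element names follow the HornetQ configuration as exposed by AS 7 / EAP 6. The cluster-connection name my-cluster, the connector netty, the discovery group dg-group1 and the delay of 1000 ms are illustrative assumptions, so check them against your own configuration file.

```xml
<subsystem xmlns="urn:jboss:domain:messaging:1.3">
   <hornetq-server>
      <cluster-connections>
         <cluster-connection name="my-cluster">
            <address>jms</address>
            <connector-ref>netty</connector-ref>
            <!-- false: forward messages only to nodes that have matching consumers -->
            <forward-when-no-consumers>false</forward-when-no-consumers>
            <discovery-group-ref discovery-group-name="dg-group1"/>
         </cluster-connection>
      </cluster-connections>
      <address-settings>
         <address-setting match="#">
            <!-- wait 1000 ms after the last local consumer has gone,
                 then redistribute to another node; -1 disables redistribution -->
            <redistribution-delay>1000</redistribution-delay>
         </address-setting>
      </address-settings>
      <!-- ... -->
   </hornetq-server>
</subsystem>
```

A delay greater than zero avoids moving messages around when a consumer only reconnects briefly.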

Another approach which avoids network overhead is client-side load balancing during the creation of a new connection via the JMS connection factory.
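A minimal sketch of client-side load balancing, assuming a connection factory named RemoteConnectionFactory that finds the servers through a discovery group called dg-group1 (both names are assumptions at this point). The policy class shown is one of the policies shipped with HornetQ. It decides which node each newly created connection goes to.

```xml
<connection-factory name="RemoteConnectionFactory">
   <discovery-group-ref discovery-group-name="dg-group1"/>
   <entries>
      <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
   </entries>
   <!-- chooses the target node for each new connection on the client side -->
   <connection-load-balancing-policy-class-name>org.hornetq.api.core.client.loadbalance.RoundRobinConnectionLoadBalancingPolicy</connection-load-balancing-policy-class-name>
</connection-factory>
```

Because the choice is made when the connection is created, no message has to be forwarded between servers afterwards.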

High Availability

A second goal could be high availability. High availability is not about preventing message loss: HornetQ already guarantees the delivery of durable JMS messages, because data such as messages and destinations is stored in a file-based journal. What we want to achieve is that producers can still send messages and consumers can still receive messages if one HornetQ server crashes.

High availability in HornetQ is realized through a live-backup structure, as shown in the illustration below. There is always one live server that processes the current message load. A live server can be backed by a second server. This backup server will not process any messages until the live server crashes and failover has occurred.

HornetQ – live-backup structure

A backup server can in turn be backed by as many servers as you want. Those servers are waiting to become a backup server. If the live server crashes, the backup will become live and one of the awaiting-to-become-backup servers will become the passive backup of the new live server. When the previous live server is restarted, it will become the backup if there is none, or otherwise wait to become backup. To negotiate the roles with the other servers, every node of the HA cluster needs a cluster-connection to all the other nodes. So you need to enable clustering even if you do not use the clustering capabilities to achieve a higher throughput.

Only the live server processes messages. When the live server dies, we expect the backup not only to receive new messages but also to deliver the messages that the previous live server had already received. The backup knows about these messages through a shared journal. In the upcoming version 2.3 of HornetQ the journal can be shared via replication over the network. For now, however, a shared file system directory is required for failover. For performance reasons it is recommended to use a block-based protocol like Fibre Channel or HyperSCSI to share the journal directory, instead of a file-based protocol like NFS or SMB/CIFS. According to several discussion threads, NFS is not supported or tested and might lead to problems. There are plenty of SAN solutions from different big companies that will do a good job here. For the demo in this post we will simply put everything on the same machine.

With the live-backup structure we get a highly available messaging cluster topology. But how is a failover of the producers or consumers handled between live and backup? The participants of the cluster, including the live and backup servers, send information about their connection details using IP multicast. If IP multicast cannot be used, it is also possible to use a static configuration of initial connections. After an initial connection, the client is also informed about the topology. If the current connection is stale, the client establishes a new connection to another node.
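For the static alternative, a sketch along the following lines may serve as a starting point in AS 7 / EAP 6. All names here (messaging-remote, netty-remote, my-cluster) and the host other-node.example.com are hypothetical placeholders, and the port assumes the stock messaging socket binding. The fragments belong in different parts of the server configuration, as the comments indicate.

```xml
<!-- in the socket-binding-group: where the other node listens (placeholder host and port) -->
<outbound-socket-binding name="messaging-remote">
   <remote-destination host="other-node.example.com" port="5445"/>
</outbound-socket-binding>

<!-- in the hornetq-server element: a connector pointing at the other node -->
<connectors>
   <netty-connector name="netty" socket-binding="messaging"/>
   <netty-connector name="netty-remote" socket-binding="messaging-remote"/>
</connectors>

<!-- in the hornetq-server element: static list instead of a discovery group -->
<cluster-connections>
   <cluster-connection name="my-cluster">
      <address>jms</address>
      <connector-ref>netty</connector-ref>
      <static-connectors>
         <connector-ref>netty-remote</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```

The static list only provides the initial connections. Once connected, the nodes still exchange the full topology as described above.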

Setting up our HornetQ cluster-environment

Now that we know the basics, we want to show how the two clustering mechanisms explained above can be configured. We use EAP 6 and set up a cluster in standalone mode. You could also use AS 7 and build the JBoss AS 7.1.2 tag yourself.

Setting up multiple HornetQ servers

Let us start with two fresh installations of the application server. As already mentioned, we will use the standalone mode with the standalone-full-ha.xml configuration. The configuration is already prepared for clustering, including the HornetQ configuration for server-side load-balancing between multiple HornetQ servers.

The most important configuration is that the messaging system is configured to be clustered, as shown in the following listing.

<subsystem xmlns="urn:jboss:domain:messaging:1.3">
   <hornetq-server>
      <clustered>true</clustered>
      <cluster-user>clusteruser</cluster-user>
      <cluster-password>cluster-secret</cluster-password>
      <!-- ... -->
   </hornetq-server>
</subsystem>

As you may have guessed, all cluster nodes need to be configured with the same cluster-user and cluster-password. By default, the cluster user is HORNETQ.CLUSTER.ADMIN.USER.

The next configuration defines how the nodes of the cluster connect to each other. In the default configuration, the servers use IP multicast to discover other nodes. Each server propagates information about its connectors to the other servers. How a server broadcasts its connector is configured via the broadcast-group, as shown in the following listing.

<subsystem xmlns="urn:jboss:domain:messaging:1.3">
   <hornetq-server>
      <broadcast-groups>
         <broadcast-group name="bg-group1">
            <socket-binding>messaging-group</socket-binding>
            <broadcast-period>5000</broadcast-period>
            <connector-ref>netty</connector-ref>
         </broadcast-group>
      </broadcast-groups>
      <!-- ... -->
   </hornetq-server>
</subsystem>

The server receives the connector information of other nodes over the discovery group. This discovery group will be referenced by the cluster-connection to establish connections to other servers of the group. The following listing contains the default configuration:

<subsystem xmlns="urn:jboss:domain:messaging:1.3">
   <hornetq-server>
      <discovery-groups>
         <discovery-group name="dg-group1">
            <socket-binding>messaging-group</socket-binding>
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
      </discovery-groups>
      <!-- ... -->
   </hornetq-server>
</subsystem>

Now, with those connections, the load can be server-side distributed between the cluster nodes. There are many other configuration parameters for cluster-connections, such as reconnect-attempts or max-hops in a chain-topology. For our cluster, the defaults are sufficient and we can now start our server nodes.
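To show how these pieces fit together, here is a sketch of the multicast socket binding used by both groups and of a cluster-connection that references the discovery group. The multicast address and port are what we remember from the stock standalone-full-ha.xml and may differ in your version. The name my-cluster and the explicit values for reconnect-attempts and max-hops are assumptions for illustration.

```xml
<!-- in the socket-binding-group: multicast endpoint shared by
     the broadcast-group and the discovery-group -->
<socket-binding name="messaging-group" port="0"
                multicast-address="${jboss.messaging.group.address:231.7.7.7}"
                multicast-port="${jboss.messaging.group.port:9876}"/>

<!-- in the hornetq-server element: cluster connection to the discovered nodes -->
<cluster-connections>
   <cluster-connection name="my-cluster">
      <address>jms</address>
      <connector-ref>netty</connector-ref>
      <!-- -1: keep trying to reconnect to a node that went away -->
      <reconnect-attempts>-1</reconnect-attempts>
      <!-- 1: forward messages only to directly connected nodes -->
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="dg-group1"/>
   </cluster-connection>
</cluster-connections>
```

Nodes only find each other if they use the same multicast address and port, so separate clusters on one network need different values.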

Configuration for high availability

Let us set up a backup server for each cluster node before we try out our cluster. We start again with two fresh installations of the application server. Both live and backup need to have clustering enabled as explained in the previous subsection.

High availability in HornetQ requires a shared journal. To keep it easy we will not be using a SAN for the journal (this will not make a difference in the configuration). Instead the live server and its backup will be running on the same machine and share a directory.

The first configuration makes the journal persistent and tells the server that it will share the journal directory with other servers, as shown in the listing below:

<subsystem xmlns="urn:jboss:domain:messaging:1.3">
   <hornetq-server>
      <!-- ... -->
      <!-- first of all we want to use a journal on disk (this is important) -->
      <persistence-enabled>true</persistence-enabled>

      <!-- the journal will be shared by multiple servers -->
      <shared-store>true</shared-store>

      <!-- directories which can be accessed by both live and backup -->
      <journal-directory path="path/to/journal" relative-to="user.home"/>
      <bindings-directory path="path/to/bindings" relative-to="user.home"/>
      <large-messages-directory path="path/to/large-message" relative-to="user.home"/>
      <paging-directory path="path/to/paging" relative-to="user.home"/>

      <journal-file-size>102400</journal-file-size><!-- you may tune this -->
      <journal-min-files>2</journal-min-files><!-- you may tune this -->

      <!-- When we shut down the live, we want failover to kick in. -->
      <failover-on-shutdown>true</failover-on-shutdown>
      <!-- ... -->
   </hornetq-server>
</subsystem>

Note that the configuration is the same for live and backup. The configuration contains the details about the journal, such as the location. The last element enables failover on server shutdown. Normally there would not be failover if you manually shut the live down.

As a next step, it is necessary to decide which server should be the backup server. The configuration for a backup is the same as for an “awaiting-to-become-backup”, as shown in the listing below. Note: A server that has been marked as the backup needs the live server for successful start-up.

<subsystem xmlns="urn:jboss:domain:messaging:1.3">
   <hornetq-server>
      <backup>true</backup><!-- true for backup, false for live -->
      <!-- ... -->
   </hornetq-server>
</subsystem>

So that the client can fail over to the backup without another lookup of the ConnectionFactory, the initial ConnectionFactory must know the other cluster nodes. Therefore, the ConnectionFactory needs to reference the discovery group and must be marked with ha, as shown in the listing below:

<subsystem xmlns="urn:jboss:domain:messaging:1.3">
   <hornetq-server>
      <!-- ... -->
      <jms-connection-factories>
         <connection-factory name="RemoteConnectionFactory">
            <discovery-group-ref discovery-group-name="dg-group1"/><!-- This only works for clients that can be reached by multicasts from the servers -->
            <entries>
               <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
            </entries>
            <ha>true</ha><!-- important for automatic client failover -->
            <reconnect-attempts>-1</reconnect-attempts>
         </connection-factory>
      </jms-connection-factories>
      <!-- ... -->
   </hornetq-server>
</subsystem>

By default the connection does not try to reconnect automatically. This can be configured by the reconnect-attempts element. In the example above, the value -1 is used for infinite retries. Maybe not the best configuration for production.
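For production, a bounded retry policy may be preferable. The following sketch limits the number of attempts and backs off between them. The concrete numbers are assumptions chosen for illustration and should be tuned to your environment.

```xml
<connection-factory name="RemoteConnectionFactory">
   <discovery-group-ref discovery-group-name="dg-group1"/>
   <entries>
      <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
   </entries>
   <ha>true</ha>
   <!-- first retry after 1 s, then 2 s, 4 s, ... capped at 30 s -->
   <retry-interval>1000</retry-interval>
   <retry-interval-multiplier>2.0</retry-interval-multiplier>
   <max-retry-interval>30000</max-retry-interval>
   <!-- give up after 20 attempts instead of retrying forever -->
   <reconnect-attempts>20</reconnect-attempts>
</connection-factory>
```

With these values a client keeps trying for several minutes before it reports the failure to the application.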

Try out the cluster with the demo application

We provide a little demo application in the jms-cluster-example directory of our GitHub project jbosscc-as7-examples. The server-side application features a message-driven bean that listens on a queue. The payload of received messages is written to the console of the application server that processes the message. The client application contains a message producer for the queue.

The repository also contains the configuration for a cluster with two live servers and their two backups. The configuration has been tested with EAP 6 but should also work well with JBoss AS 7.1.2. Simply set up four fresh instances and copy the contents of the standalone/configuration/ folders into your JBoss installation directories. Start the instances on the same machine with the following commands:

live1  : ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jl1
backup1: ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jbu1
live2  : ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jl2
backup2: ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jbu2

Note: The instances will bind to 127.0.0.1; other IP addresses may not work with the provided configurations.

The demo application can be compiled via mvn package. After the servers have been started and the message driven bean has been deployed on the live servers, you can use the client application to produce messages. Simply start it with mvn exec:exec.

The client connects to the first live server, live1, and starts to produce messages. You can observe load balancing in the console output of the message-driven beans: the messages will be distributed between the live servers. When you kill the live servers, or just one of them, you can observe failover, which is accompanied by some harmless warnings. If you then start the live servers again, or just one of them, you can see its message-driven bean consuming not only new messages but also messages the backup received while the live server was down. A remote consumer would receive messages even while the live server is down. In our example, however, the message-driven bean is deployed on the live server and dies with it if the JVM process of the live server is killed.

Summary

The clustering mechanisms described in this post are very powerful but can also become very complex. They are stable but you can easily make your system unreliable if you configure it wrong or do things like forgetting to close sessions. So it needs to be handled with care.

If it is finished early enough, HornetQ 2.3 will be included in EAP 6.1. That would mean you could share the journal without a SAN. Until then, we have to use a SAN for high availability. Alternatively, you could use a “higher-message-throughput” cluster without any live-backup pairs but with an ha-enabled connection-factory. Clients will always be able to send and receive messages, but individual messages might get stuck in the journal of a crashed server. These messages would not be delivered right away, only after that server is back up.

The next and last post of this series will be about JGroups and cloud issues. Thank you for your feedback on the last posts. If you have any questions or comments on this post, feel free to comment or write an e-mail to

heinz.wilming (at) akquinet.de
immanuel.sims (at) akquinet.de


Comments

20 responses to “Clustering of the messaging subsystem HornetQ in JBoss AS7 and EAP 6”

Bhaskar

21. November 2016

Hi,
First of all thanks for the wonderful post.

I am working on developing switchyard applications in fuse service works and implementing the HA for these services. Most of the applications and the interactions is through JMS queue using HornetQ. For messaging HA we have setup the live1,backup1,backup2 cluster configurations, each of the switchyard application runs in active/passive mode which is working fine.

From the solution and the project complexity/constraints point of view we dont want to do node level HA but at service level.
1) There are many services( switchyard applications) installed in one fsw node (one jboss eap instance). What we need is selective failover of a particular service instead of the entire node. In this case, how do we identify the failure of a particular service( i know the question sounds trivial, anything that FSW inherently supports this) and fail over only those service to the secondary node.

2) May be converting all our switchyard applications to singleton-deployed EJB could be an idea, but is a constraint and loosing features of switchyard at this moment.

All of our services are kind of of event processing and forwarding to next service.
Please let me know your views on this and any suggestions you have.

Thanks in Advance
john

4. August 2016

So each live/backup share same queue or each has its own queue in your example? Is there any way to share ONE queue between multiple Jboss live/backup server pairs? thanks
1. Immanuel Sims
  
  4. August 2016
  
  I’m not sure what you mean with ‘sharing a queue’. Anyway: The backup will be able to deliver every message that was given to the live and has not been delivered yet and vice versa. That’s the HA scenario. Note that you must not send/consume messages to/from a backup as long as the live is up and running.
  
  Clustering without HA is a bit different. Every server in the cluster can forward messages to other servers in the cluster. By thus a client connected to server A can get a message that was originally given to server B. But once a server passes out all the messages that it has in its journal will be stuck until the server comes up again.
fabrizio

3. March 2016

Hi,
I get an ERROR when inserting discovery-group-ref into RemoteConnectionFactory; it seems to be an illegal element, and the server cannot start!
Balasubramanian

18. February 2016

really much helpful information about hornetq cluster setup… thanks a lot…
FTP

28. October 2014

Thanks a Lot for a great tutorial.
Would like to know how to setup Live 1 and Live 2 when they are different host. We have JBOSS EAP 6 and the default standalone_full_ha.xml did not have the broadcast-groups etc xml but we have added.
Things work good but i am not seeing Messages are getting balanced.
OG

11. September 2014

How could we set the Hornetq path variables via cli ?
1. Immanuel Sims
  
  15. September 2014
  
  Like this:
  
  /subsystem=messaging/hornetq-server=default/path=journal-directory:add(path=some/directory/with/journal.file,relative-to=user.home)
rafaeltuelho

28. January 2014

Hi.

You used standalone mode in your example. But what about domain mode? How can I choose a backup server and set the element? In domain mode the configuration of the subsystems is centralized in the domain.xml configuration.

Thanks in advance.
1. Immanuel Sims
  
  5. February 2014
  
  Instead of multiple different standalone.xml files, you would need to set up different profiles that differ in the same places where the standalone.xml files differ.
Daniel Nuss

4. November 2013

Hi! Thank you for your great posts regarding JBoss clustering. I find it quite hard to find good documentation on this topic and your posts helped me a lot to setup a clustered JBoss in domain mode. Could you shortly explain where to configure the ha (backup server) of hornetq in domain mode? It’s quite clear to me when running in standalone-mode, since there is a different config for each of the servers then. But how to do the same in domain mode, when all the configuration is done in one domain.xml?
Thank you a lot for your help!
1. Immanuel Sims
  
  22. November 2013
  
  I think you would have to use different profiles for the servers. I. e. one profile for the live and one profile for the backup.
  1. Daniel Nuss
    
    22. November 2013
    
    Ah, this seems to be right. The answer is so simple and still i did not think of this solution. Thank you!
Fathil Ismail

17. September 2013

Hi. I'm trying to find a solution for the .jsp web session expiring when the live server fails over to a backup server.

If I apply this configuration, will I be able to keep the web session alive when it fails over to a backup server?

thanks in advance.
Eyal

25. February 2013

Hi there. Thanks for a very clear and detailed post on this rather cumbersome topic.
Our team here has tried to test the new Replication feature (available in the latest nightly of Jboss 7.2) and we kind of hit a wall. It doesn’t seem to work well at all, and messages are not being received after the live server goes down.
Did anyone have success in using Replication?
BTW – we’ve consulted with RedHat and they advised us the Replication will be available in Jboss 6.1 EAP only as a ‘technical preview’ – and not as a production feature.
Sam

17. January 2013

Hi guys,
Great post! Very clear and to the point.

I have a question running your maven example…

When I run “./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jl1”, am I running from $JBOSS_HOME/bin. When you say to copy the various configuration directories(live1, backup1, live2, backup2) to my servers, do I copy them directly under $JBOSS_HOME/standalone? I have a standalone-full-ha.xml directly under standalone so I’m trying to figure out how the “./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jl1” command knows to use the one under the “live1” directory.
1. Sam
  
  17. January 2013
  
  I actually was able to run all four instances now. I made 4 copies of the standalone folder and renamed them to live1, live2, backup1, backup2 and replaced the configuration directory with the respective configuration directory from your repo. I then ran the following command: ./standalone.sh -Djboss.server.base.dir=../live1/Standalone --server-config=standalone-full-ha.xml -Djboss.node.name=jl1. I did the same for the other 3 instances and it started working.
  1. Immanuel Sims
    
    18. January 2013
    
    I’m sorry, that part is a little bit ambiguous. It was meant to have four different JBoss installations. But your solution should work as well.