Using JBoss Messaging in a firewalled environment: the toll road

Are you using JBoss Messaging in a firewalled environment? Do your long-running JMS consumers fail to receive JMS notifications after some time of inactivity? If you answered yes to either of these questions, you might be interested in our field report, in which we describe how we diagnosed and fixed such issues with JBoss Messaging and long-running JMS consumers in a Java Swing client.

The Scenario

Our customer's application is an EJB-based server application that runs on a Java EE application server. The clients are Swing based and use RMI to interact with the server. Due to the nature of the application, it is common that data displayed in one client is concurrently modified by another client. The application therefore uses JMS messages to notify all interested parties about data changes, so that the clients can reload the changed data and update their views.

System Architecture

This approach worked well until we migrated the application from another Java EE application server to the JBoss Enterprise Application Platform. Users kept reporting that after some time of inactivity the update feature in the clients stopped working, so we began to investigate why this was happening.

First Suspects

We suspected JBoss Messaging to be the culprit, and could confirm this by examining the message count of the affected queue.

Examining the Message Count of a JMS queue or topic

Point your browser to the URL of the jmx-console of your JBoss installation (e.g. http://localhost:8080/jmx-console) and find the MBean that represents your JMS queue or topic. If you followed the examples provided by JBoss, it is jboss.messaging.destination:name=MyName,service=Queue. Look at the MessageCount attribute of that MBean. The message count of a given JMS queue should always go down to zero, especially while the system is idle. A non-zero message count may indicate that messages are no longer being consumed by the JMS clients.

Once a client was in the broken state, it seemed unable to receive any further JMS notifications. We confirmed this by adding logging statements to the client-side code that is executed after a JMS message has been received. Apart from that, the client still worked fine and was able to access the server; a manual refresh of the view also displayed the correct data.

With this initial observation we tried to reproduce the issue outside production. We could not reproduce it in our development environment, but luckily the test environment exhibited exactly the same behavior. Looking at our network diagrams we soon had an idea of what could cause the difference: a firewall located “in front” of the test server is the major difference between the development and test environments, as you can see by comparing images 1 and 2. So we kept investigating in that direction.

Development Environment Topology

Quality Environment Topology

Going deeper

To further isolate the problem we followed a divide-and-conquer approach. We put our JBoss stack aside for a while and wrote a simple Java client/server application to see if we could find any network-related issues. On startup the client connects to the server and sends a text message; the server simply echoes it back. After successfully receiving the reply, the client waits for some time and then sends the next message. The connection established initially is kept open the whole time. We increased the wait interval between two client/server interactions exponentially, in order to find out whether the firewall closes connections after some period of inactivity.

Interactions of the test client / server
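A minimal sketch of such an echo probe might look like the following. This is our illustrative reconstruction, not the original test code: the class and method names are invented, and the intervals are far shorter than the minutes-to-hours waits the real test used.

```java
import java.io.*;
import java.net.*;

// Sketch of the echo test described above: the client opens one connection,
// sends a line, reads the echo, then waits increasingly long before the next
// round to probe for idle timeouts on the path (e.g. a firewall).
public class EchoIdleProbe {

    // Echo server: returns every received line on the same connection.
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0); // any free port
        new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.println(line); // echo the text back unchanged
                }
            } catch (IOException ignored) {
            }
        }).start();
        return server;
    }

    // One round trip over an already open connection; returns the echoed text.
    static String roundTrip(BufferedReader in, PrintWriter out, String msg)
            throws IOException {
        out.println(msg);
        return in.readLine();
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = startEchoServer();
        try (Socket client = new Socket("localhost", server.getLocalPort());
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            long wait = 100; // the real test started much higher and ran for hours
            for (int i = 0; i < 3; i++) {
                String reply = roundTrip(in, out, "ping " + i);
                System.out.println("got: " + reply + ", next wait: " + wait + " ms");
                Thread.sleep(wait);
                wait *= 2; // exponential back-off between interactions
            }
        } finally {
            server.close();
        }
    }
}
```

When the intermediate firewall silently drops the idle connection, the next roundTrip call either blocks indefinitely or fails, which is exactly the symptom we were probing for.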

As suspected, after approximately one hour the echo client/server stopped working: the firewall had dropped the connection. Interestingly, neither client nor server was aware that the connection was gone. Could that happen to JMS too?

A look at the usual resources (Google, JBoss wiki, JBoss forum) did not offer obvious solutions, so we kept on digging. Next we used Wireshark to look at what was going on at the network level.

Wireshark is a free packet analyzer computer application. It is used for network troubleshooting, analysis, software and communications protocol development, and education. Originally named Ethereal, in May 2006 the project was renamed Wireshark due to trademark issues. (cited from Wikipedia).

Wireshark turned out to be a really helpful tool. Initially we needed some time to figure out how to use it properly, but in the long run that time was well invested, because Wireshark gave us a much better understanding of what was going on at the network level.

We could see that all connections were opened by the client. This makes sense in a firewall scenario, as firewalls usually forbid opening connections from the server to the outside world. However, it was unclear to us what would happen if a client lost its connection to the server: if the server then wanted to push JMS messages to the client, it would need a way to open a new connection.

JBoss Remoting

The JBoss Messaging documentation says: “JBoss Messaging uses JBoss Remoting for all client to server communication. For full details of what JBoss Remoting is capable of and how it is configured please consult the JBoss Remoting documentation.” So we had a look at the JBoss Remoting documentation.

There you can find detailed information about the transports used within JBoss Messaging.
JBoss Messaging uses the so-called bisocket transport:

The bisocket transport, like the multiplex transport, is a bidirectional transport that can function in the presence of restrictions that would prevent a unidirectional transport like socket or http from creating a server to client push callback connection. … For example, security restrictions could prevent the application from opening a ServerSocket on the client, or firewall restrictions could prevent the server from contacting a ServerSocket even if it were possible to create one.

The bisocket transport uses a control connection that is established between client and server. As soon as the server needs to push JMS messages to the client, it sends a request to open a new data connection via the control connection. The client that receives this request creates a new connection to the server. The server side waits for this new incoming client connection and associates it with the JMS session. This ensures that the server can establish new data connections to the client without actually acting as a TCP client (see here for a detailed description and visualization of the mechanism).
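The mechanism can be illustrated with a minimal sketch. This is our own simplified code, not JBoss Remoting's: the class name and the OPEN_DATA_CONNECTION command are invented for illustration.

```java
import java.io.*;
import java.net.*;

// Simplified illustration of the bisocket idea: the server never dials the
// client; when it needs a push channel, it asks the client, over the
// long-lived control connection, to open one.
public class BisocketSketch {

    // Server side: request a data connection over the control connection,
    // then wait for the client to dial back.
    static Socket requestDataConnection(ServerSocket listener, Socket control)
            throws IOException {
        PrintWriter out = new PrintWriter(control.getOutputStream(), true);
        out.println("OPEN_DATA_CONNECTION"); // request travels over control conn
        return listener.accept();            // client dials back: firewall-friendly
    }

    // Client side: watch the control connection and open a fresh connection
    // to the server whenever asked.
    static Socket awaitOpenRequest(Socket control, String host, int port)
            throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(control.getInputStream()));
        String cmd = in.readLine();
        if (!"OPEN_DATA_CONNECTION".equals(cmd)) {
            throw new IOException("unexpected command: " + cmd);
        }
        return new Socket(host, port); // the new data connection, opened client-side
    }
}
```

The crucial property is that every TCP connect() happens on the client side, which is exactly what a typical firewall policy allows.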

Now it was obvious to us that dropping this control connection could render the JMS system useless. Luckily the JBoss Remoting documentation described a feature that seemed quite useful to us: the bisocket transport can be configured using the properties PING_FREQUENCY and PING_WINDOW_FACTOR.

Setting the ping frequency to a value in milliseconds causes the server side to send a ping packet once per interval. If the client side does not receive a ping packet within PING_FREQUENCY * PING_WINDOW_FACTOR, it assumes the control connection is broken and opens a new one. These properties looked really helpful, so we decided to experiment with them. To change the values, edit the file JBOSS_HOME/server/default/deploy/jboss-messaging.sar/remoting-bisocket-service.xml and search for the MBean jboss.messaging:service=Connector,transport=bisocket.
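Our understanding of this contract can be sketched as a small watchdog. This is a hypothetical illustration with invented class and method names, not JBoss Remoting code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Client-side view of the ping contract: remember when the last ping
// arrived and consider the control connection broken once
// pingFrequency * pingWindowFactor elapses without one.
public class PingWatchdog {
    private final long pingFrequencyMs;
    private final int pingWindowFactor;
    private final AtomicLong lastPingMs = new AtomicLong(System.currentTimeMillis());

    public PingWatchdog(long pingFrequencyMs, int pingWindowFactor) {
        this.pingFrequencyMs = pingFrequencyMs;
        this.pingWindowFactor = pingWindowFactor;
    }

    // Called whenever a ping packet arrives from the server.
    public void pingReceived() {
        lastPingMs.set(System.currentTimeMillis());
    }

    // True once no ping has arrived within the allowed window; at this
    // point the client would re-establish the control connection.
    public boolean controlConnectionBroken() {
        long window = pingFrequencyMs * pingWindowFactor;
        return System.currentTimeMillis() - lastPingMs.get() > window;
    }
}
```

With the values we chose below (20-minute frequency, factor 2), the watchdog would flag the connection roughly 40 minutes after the last ping.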

We were puzzled to see the configuration JBoss Messaging uses by default:

<attribute name="pingFrequency" isParam="true">214748364</attribute>
<attribute name="pingWindowFactor" isParam="true">10</attribute>

A ping every 214748364 ms (roughly two and a half days) effectively renders the ping feature useless. Ignoring the warning text in the file, we set more appropriate values:

...
<!-- There should be no reason to change these parameters - warning!
     Changing them may stop JBoss Messaging working correctly -->
...
<attribute name="pingFrequency" isParam="true">1200000</attribute>
<attribute name="pingWindowFactor" isParam="true">2</attribute>

As our main focus was simply to create some traffic on the connection, we chose these rather large values (a 20-minute ping frequency). In case the control connection was lost, it would take about 40 minutes for the client to notice and re-establish it. As we suspected the firewall to have an inactivity timeout of about 60 minutes, this seemed fine to us.

Restarting the server with the changed configuration exposed a new issue: the JMS subsystem did not start up properly. Instead we saw the following message in the log file and on the console:

ERROR [ConnectionFactory] Parameter pingFrequency has a different value ( 1200000) to the default shipped with this version of JBM
(214748364). There is rarely a valid reason to change this parameter value. If you are using ServiceBindingManager to supply the remoting
configuration you should check that the parameter value specified there exactly matches the value in the configuration supplied with JBM.
This connection factory will now not deploy. To override these checks set 'disableRemotingChecks' to true on the connection factory.
Only do this if you are absolutely sure you know the consequences.

So we also had to add the mentioned setting in the file JBOSS_HOME/server/default/deploy/jboss-messaging.sar/connection-factories-service.xml.

With this last change we were able to restart the server. Using Wireshark we could see the ping packets going from server to client. We left our client open overnight and were very happy to see that the connection was still up the next day; more than that, JMS was still working. You can imagine our disappointment when we found JMS dysfunctional again after about one hour of idle time.

Some more investigation revealed that we had only solved half of the puzzle. The control connection no longer caused any trouble, but the data connection suffered from the same timeout issue. Pushing JMS messages from the server to the client caused the creation of a data connection; if the system was then left idle for more than one hour, that data connection was also dropped by the firewall. Obviously the ping we had enabled was only effective for the control connection, not the data connection. So we dug further into the JBoss Remoting source code and documentation. We found another property, timeout, which can be used to enable the SO timeout on the client side. This causes blocking methods like read() to time out after the specified interval, throwing an exception.
The socket remains valid after that particular exception, so the JBoss Remoting code simply ignores it and continues to read from the socket. Luckily for us, you can influence that behavior by setting the property continueAfterTimeout=false, which causes the connection to be closed when such a timeout occurs.
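This SO timeout behavior is plain java.net functionality and easy to demonstrate in isolation. The following is a standalone sketch (not JBoss Remoting code) showing that a read() times out with a SocketTimeoutException yet the socket stays usable afterwards:

```java
import java.io.*;
import java.net.*;

// Demonstrates SO_TIMEOUT: a blocking read() throws SocketTimeoutException
// after the configured interval, but the socket itself remains valid and a
// later read still succeeds.
public class SoTimeoutDemo {

    public static String readWithTimeout(int timeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket peer = server.accept()) {
            client.setSoTimeout(timeoutMs); // enable SO_TIMEOUT on the client side
            InputStream in = client.getInputStream();
            try {
                in.read(); // nothing has been sent yet: blocks, then times out
                return "unexpected data";
            } catch (SocketTimeoutException e) {
                // The socket is still valid here. A reader can either keep
                // reading (as JBoss Remoting does by default) or treat the
                // timeout as fatal and close the connection (the effect of
                // continueAfterTimeout=false in our configuration).
            }
            peer.getOutputStream().write('x'); // now the peer sends a byte
            return in.read() == 'x' ? "still usable" : "broken";
        }
    }
}
```

The fact that the socket survives the timeout is exactly why the default behavior (ignore and keep reading) can sit forever on a connection the firewall has already dropped.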

As we had confirmed before that the control connection was now stable, we were confident that this change might be the missing piece. So we added the following to the bisocket configuration (JBOSS_HOME/server/default/deploy/jboss-messaging.sar/remoting-bisocket-service.xml):

<attribute name="timeout" isParam="true">1800000</attribute>
<attribute name="continueAfterTimeout" isParam="true">false</attribute>

Again we used Wireshark to verify that data connections were closed after the timeout (30 minutes) and that JMS remained functional beyond it: JMS simply opened another data connection via the control connection mechanism explained before.

As everything looked good, we set up another long-running test. Finally we were able to use our client, leave it idle overnight, and use it again the next day.

Conclusion

The most important point is that we fixed the issue; the fix is currently under test and will soon go into production. We think that the remoting configuration JBoss Messaging uses by default is quite firewall-unfriendly. The fact that JBoss Messaging actively enforces these default settings (see disableRemotingChecks above) discourages any experiments with them. This is a pity, as JBoss Remoting turns out to be a powerful library that contains all the options necessary to fix our problem.

Looking at the JBoss forum, it seems that other people have had similar issues, but there appears to be no official JBoss-approved solution. We would like to see more appropriate configuration defaults and/or some official information on how to tune the settings for firewalled environments. Until then, we hope this field report helps other users who face the same problem.

Wireshark was a really helpful tool throughout the process. Java EE usually relieves us of the burden of dealing with low-level details, but sometimes you need to get your feet wet, and it is good to have a powerful tool like Wireshark to keep you from getting thoroughly soaked.

Summary

In this report we explained how we investigated a problem in which long-running JMS consumers in a Java Swing client lost their connection to the JMS server. The connection loss was caused by a firewall that silently closed connections after one hour of inactivity. We found JBoss Messaging/JBoss Remoting configuration changes that allowed us to avoid the problem. We made the following changes to our installation:

File JBOSS_HOME/server/default/deploy/jboss-messaging.sar/remoting-bisocket-service.xml

Before change

<attribute name="pingFrequency" isParam="true">214748364</attribute>
<attribute name="pingWindowFactor" isParam="true">10</attribute>

After change

<attribute name="timeout" isParam="true">1800000</attribute>
<attribute name="continueAfterTimeout" isParam="true">false</attribute>
<attribute name="pingFrequency" isParam="true">1200000</attribute>
<attribute name="pingWindowFactor" isParam="true">2</attribute>

File JBOSS_HOME/server/default/deploy/jboss-messaging.sar/connection-factories-service.xml

Before change

<mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
       name="jboss.messaging.connectionfactory:service=ConnectionFactory"
       xmbean-dd="xmdesc/ConnectionFactory-xmbean.xml">
  <depends optional-attribute-name="ServerPeer">
    jboss.messaging:service=ServerPeer
  </depends>
  <depends optional-attribute-name="Connector">
    jboss.messaging:service=Connector,transport=bisocket
  </depends>
  ...
</mbean>

After change

<mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
       name="jboss.messaging.connectionfactory:service=ConnectionFactory"
       xmbean-dd="xmdesc/ConnectionFactory-xmbean.xml">
  <depends optional-attribute-name="ServerPeer">
    jboss.messaging:service=ServerPeer
  </depends>
  <depends optional-attribute-name="Connector">
    jboss.messaging:service=Connector,transport=bisocket
  </depends>
  ...
  <attribute name="DisableRemotingChecks">true</attribute>
</mbean>

10 thoughts on “Using JBoss Messaging in a firewalled environment: the toll road”

  1. Bravo! Great job of analysis.

    I was aware of your solution, and, to be honest, I didn’t see why it worked. Your blog has sent me back to think about it, and now I understand. When the timed out socket closes on the client side, one last byte is written to the server side. When the server goes to send some data to the client, it checks for that byte, and, if it’s present, it closes the old connection and creates a new one. The embarrassing fact is that I’m the one that installed that test for a final byte. :-)

    Through some principle of synchronicity, this issue has come up recently multiple times, after lying dormant, apparently, for years, and we proposed two solutions:

    1. Set the “socket.check_connection” parameter to true. With this setting, a connection is tested by trying to send a round trip byte. If the test fails, then the connection is discarded. Of course, this solution increases network traffic and latency.

    2. Set the “numberOfCallRetries” parameter to a value greater than 1. With this setting, a stale connection might get used, but when the write() fails, a new connection is created.

    The latter solution is the one we’ve recommended, as described in the JBossMessaging JIRA issue JBMESSAGING-1733 “Update settings in remoting-bisocket-service.xml” (https://jira.jboss.org/jira/browse/JBMESSAGING-1733). Note that this solution depends on some changes in the most recent release of JBossRemoting, 2.2.3.SP1, which will ship with the next cumulative patch release of EAP 4.3. Your solution has the advantage of not requiring an update. And it’s elegant.

    One point. Your statement “Luckily for us you can influence that behavior by setting the property continueAfterTimeout=true.” isn’t right. In fact, you set “continueAfterTimeout” to “false”. I’ll just mention that “false” is the default value, so no setting is required.

    In any case, thank you for the idea (wish I’d thought of it), and congratulations on slogging through the Mirkwood Forest that is the bisocket transport. :-)

    -Ron Sigal, JBossRemoting developer

    1. Hi Alexandre,

      please understand that we cannot name the firewall vendor (and model), to protect our customer from potential attacks ;).

  2. Hi,

    Great blog! Thanks for the detailed explanation. However, now that pingFrequency and pingWindowFactor are clear, can you please explain the validatorPingPeriod, validatorPingTimeout and callbackTimeout parameters? With so many timeout parameters, it gets messier. Also, what is clientLeasePeriod? We are using JBoss Messaging 2.5.4 SP1.

    Regards,
    Vishal Pore

  3. Pingback: Server Hilfe
  4. Just wanted to say that your blog article single-handedly solved a production problem for us and we could probably not have fixed it without your help!

    Many thanks!

