Closed

Schedule Adapter seems to cause host to hang in Stopping State

description

I am using the latest beta of the Schedule adapter (version 3.0, Aug 10 2010) in a BizTalk 2010 environment.
 
I've set up the adapter to run in its own 64-bit host called UtilityHost. Once a schedule has run for the first time, UtilityHost won't stop cleanly: it sits in the Stopping state indefinitely, and I have to kill the process to make it stop.
 
This problem doesn't occur on a couple of development machines, so it could be environment related, and I'm looking for pointers on where to look next. So far the problem happens on Windows 2008 R2 Enterprise but not on Windows 2008 R2 Standard.
 
I’ve tried a few things:
  1. Changed UtilityHost from 64-bit to 32-bit: no change, which indicates the problem does not depend on whether the host is 64-bit or 32-bit.
  2. Set the receive pipeline to Pass Thru: no change, which indicates it is not related to the custom pipeline.
  3. Stopped the Schedule receive location before stopping UtilityHost: no change, which suggests that state held in the host could be causing the problem.
  4. Reduced the number of worker threads in the receive host: no change.

Any further suggestions?
Closed Mar 4 at 6:38 PM by sandro_asp

comments

Graphain wrote Jun 26, 2012 at 5:49 AM

I encountered this in version 3.0 on 31/8/11 as well and for whatever reason it appears I didn't post the fix. I'm not sure if this is still in version 4.0 (and/or still useful to anyone) but here it is:

In ScheduledEndpoint.cs go to the ControllerEndpointTask() method. In this method you will see:

this.EndpointTask();
GC.Collect();
this.controlledTermination.Leave();

Remove the last of these lines (the "this.controlledTermination.Leave()" line).

Explanation:
  • This call causes the controlledTermination counter to decrement too many times.
  • This in turn freezes the host instance during stop/restart until the service stop timeout (which was about 300 seconds on my PC).
  • The reason Leave() is not needed here is that the Batch type(s) used perform their own controlledTermination.Leave() on EndBatchComplete (decompile the BizTalk code if you want to verify).
  • The existing batches make this Leave() call without performing their own Enter() call (i.e. they assume the controlledTermination is open).
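
The counting problem described above can be reproduced in isolation with a minimal sketch. TerminationGate below is a hypothetical, simplified stand-in for the adapter's controlledTermination object, not the actual BizTalk class, and the sequence of calls in StopsCleanly is an assumption based on the explanation in this comment: one Enter() before the scheduled task runs, one Leave() from the batch on completion, and optionally the extra Leave() from the task. A short timeout stands in for the service stop timeout.

```csharp
using System;
using System.Threading;

// Hypothetical model of an Enter/Leave termination counter (not the BizTalk SDK class).
class TerminationGate
{
    private readonly object sync = new object();
    private readonly ManualResetEvent drained = new ManualResetEvent(false);
    private int active;
    private bool stopping;

    // Registers one unit of in-flight work; refused once shutdown has begun.
    public bool Enter()
    {
        lock (sync)
        {
            if (stopping) return false;
            active++;
            return true;
        }
    }

    // Marks one unit of work as finished; signals shutdown only when the count is exactly zero.
    public void Leave()
    {
        lock (sync)
        {
            active--;
            if (stopping && active == 0) drained.Set();
        }
    }

    // Returns true if all work drained within the timeout, false if shutdown would hang.
    public bool Terminate(TimeSpan timeout)
    {
        lock (sync)
        {
            stopping = true;
            if (active == 0) return true;
        }
        return drained.WaitOne(timeout);
    }
}

static class Demo
{
    // Simulates one scheduled run followed by a host stop.
    static bool StopsCleanly(bool extraLeave)
    {
        var gate = new TerminationGate();
        gate.Enter();                   // assumed: taken before the scheduled task runs
        gate.Leave();                   // assumed: performed by the batch when it completes
        if (extraLeave) gate.Leave();   // the redundant call: the count drops to -1
        return gate.Terminate(TimeSpan.FromMilliseconds(200));
    }

    static void Main()
    {
        Console.WriteLine("Balanced Enter/Leave stops cleanly: " + StopsCleanly(false));
        Console.WriteLine("Extra Leave stops cleanly: " + StopsCleanly(true));
    }
}
```

In this model the extra Leave() leaves the count at -1, so a shutdown that waits for the count to be exactly zero is never signalled and only ends when the timeout expires. That matches the symptom of a host stuck in Stopping until the service stop timeout.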

Graphain wrote Jun 26, 2012 at 5:58 AM

(Additional details)
I am using BizTalk 2010 and a 64-bit host.
It looks like this code remains unchanged in the v4.0 adapter so will probably continue to be an issue.

This has been working in production for me. However, I last looked at this about a year ago, and if I were looking at it for the first time again I would check that the batch completion code (and therefore controlledTermination.Leave()) is always going to be called. Otherwise we would have the opposite problem: a controlledTermination counter that is over-incremented rather than over-decremented. For instance, check whether the scheduler can throw an exception before initialising the Batch, or whether the Batch can terminate without calling controlledTermination.Leave() (e.g. due to no subscribers). Like I said, this is working for me, but with fresh eyes I see these as potential issues, and I can't recall whether I verified them over a year ago.

sandro_asp wrote Jun 26, 2012 at 2:50 PM

Thanks for your feedback and explanation.

I’m also not sure whether this bug still exists in the current version. However, I’m working on a new version and will take this feedback into consideration.

Graphain wrote Nov 16, 2012 at 12:28 AM

Can confirm that I had to reapply the same fix to the latest version 4.0 (commenting out the "this.controlledTermination.Leave()").

The fix seems to work just as well and I have not experienced any issues as a result.

Have also tested the "no subscribers" concern I had in a previous comment and ruled that out as an issue.

Would recommend you apply the fix to the core solution.
