Alaska Software Inc. - Index file lock fails - application hangs
Thomas Braun
Index file lock fails - application hangs
on Tue, 01 Dec 2009 11:07:53 +0100
Hi,

I'm currently struggling with intermittent WAA application hangs where the
only option is to kill the entire WAA1SRV process.

First I suspected that my use of Phil Ide's regex package was causing this, but
after replacing it with plain Xbase++ code, the hangs did not go away.

Finally, using Sysinternals procmon, I found that there seems to be an
issue with index file locking: as soon as the WAA hang starts, procmon
reports a failed file lock operation on a CDX file. It looks to me like
some kind of deadlock.

It does not always fail on the same CDX file, and I have already deleted all
CDX files and re-created them.

The server still runs on Windows 2000 (and has for about 4 years now) with
Xbase++ 1.82.294.

Any ideas what I could do to find the actual cause of this?

regards
Thomas
Thomas Braun
Re: Index file lock fails - application hangs
on Wed, 02 Dec 2009 11:57:09 +0100
Thomas Braun wrote:

> Any ideas what I could do to find the actual cause of this?
> 

I have now set the lock retry to 1000 and so far there have been no more
lockups.

But I'm still wondering why this shows up now after years of running the
application without any locking issues.

Thomas
Andreas Herdt
Re: Index file lock fails - application hangs
on Thu, 03 Dec 2009 09:58:51 +0100
Hi Thomas,

If the lock retry setting for the CDXDBE makes the difference, can
you please post what your settings were? The default
value for the CDX is 100000. On large setups people tend to
use 1000000 or much more.




   Andreas Herdt
   Alaska Software
Thomas Braun
Re: Index file lock fails - application hangs
on Thu, 03 Dec 2009 18:10:48 +0100
Andreas  Herdt wrote:

> If the lock retry for the CDXDBE makes the difference, can
> you please post what your settings have been?

It did not make a difference (I set it to 1000) - it happened again today.

> The default value for the CDX is 100000. On large setups people tend to
> use 1000000 or much more.

Hmm - what exactly happens if the lock retry actually times out?

So you suggest increasing the value instead of decreasing it?

And how does this help in case of such a deadlock?

Thomas
Andreas Herdt
Re: Index file lock fails - application hangs
on Fri, 04 Dec 2009 10:06:43 +0100
Hello Thomas,

>> The default value for the CDX is 100000. On large setups people tend to
>> use 1000000 or much more.
> 
> Hmm - what exactly happens if the lock retry actually times out?
> 
> So you suggest increasing the value instead of decreasing it?
> 
> And how does this help in case of such a deadlock?

To be honest, my posting was slightly off topic. I posted
the default values because people tend to choose timeout values
that are too small. When a timeout occurs, a corresponding
runtime error is raised. Even a very large number does not cause
a deadlock; it only increases the time until a timeout occurs.
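
The behaviour described above can be sketched as a bounded retry loop. This is a minimal illustration in Python, not the CDXDBE implementation: `try_lock`, `acquire_with_retry` and `LockTimeout` are hypothetical names, and the real engine's retry count and error type may work differently.

```python
class LockTimeout(RuntimeError):
    """Stands in for the runtime error raised when the retry budget is used up."""


def acquire_with_retry(try_lock, max_retries):
    """Call try_lock() until it succeeds or max_retries attempts have failed.

    try_lock is a hypothetical non-blocking callable that returns True
    when the lock was granted. Returns the number of attempts needed.
    """
    for attempt in range(1, max_retries + 1):
        if try_lock():
            return attempt
    # A larger max_retries only delays this point; it never blocks forever.
    raise LockTimeout(f"lock not granted after {max_retries} attempts")


# Example: a lock that becomes free on the third attempt.
answers = iter([False, False, True])
print(acquire_with_retry(lambda: next(answers), max_retries=1000))  # prints 3
```

With a loop of this shape, a hang can only come from the retry budget never being exhausted or from the loop never being left for some other reason, which is why the retry value alone should not produce a deadlock.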

For the time being we know very little about this issue, since
we do not even know which part of the WAA is running into a
deadlock. This is even more confusing since you tell us that
the application had been running for years without this kind of
trouble.

Between 1.82, 1.90 and 1.90 SL1 we resolved some issues that
can result in deadlock situations. However, this does not explain much,
because 1.82 ran well for you up to this point.
Very confusing.

I would guess that the cause is to be found in your environment.
Did anything change somewhere in your network? Anything that
changed could be a clue (automatic updates, virus scanners
somewhere, ...).

A desperate measure would be to restart the operating system
if it has been running continuously for the last 4 years.

None of this is much help, I fear.

   Andreas Herdt
   Alaska Software

--------------------------------------------------------------------

Technical Support:      support@alaska-software.com

News Server:            news.alaska-software.com
Homepage:               http://www.alaska-software.com
WebKnowledgeBase:       http://www.alaska-software.com/kbase.shtm

Fax European Office:    +49 (0) 61 96 - 77 99 99 23
Fax US Office:          +1 (646) 218 1281
--------------------------------------------------------------------
Thomas Braun
Re: Index file lock fails - application hangs
on Fri, 04 Dec 2009 11:23:11 +0100
Andreas  Herdt wrote:

> To be honest, my posting was slightly off topic. I have posted
> the default values because people tend to choose too small
> timeout values. When a timeout occurs, then a corresponding
> runtime error is posted.

OK - that is what I thought.

> For the time being we know very little about this issue since
> we do not even know what part of the WAA is encountering a
> deadlock.

Well - as far as I can tell it must be part of my package code, since it
is one of the system's CDX files.

> Very confusing.

I agree with you 

> I would guess that the reason is to be found with your environment.
> Did anything change somewhere in your network? Anything that
> changed could be an indication (automatic updates, virus scanners
> somewhere, .... )

Well, the server is patched as soon as Microsoft releases patches for
Win2K. So the system changes quite often 

> A desperate measure would be to restart the operating system
> if it has been running continuously for the last 4 years.

No - it is restarted on a regular basis anyway.

I will be setting up a new machine with Windows 2003 Server in January
anyway - so I hope the problem disappears that way 

regards
Thomas
Thomas Braun
Re: Index file lock fails - application hangs
on Thu, 14 Jan 2010 10:56:10 +0100
Andreas  Herdt wrote:

> For the time being we know very little about this issue since
> we do not even know what part of the WAA is encountering a
> deadlock. This is even more confusing since you tell us that
> the application had been running for years without this kind of
> trouble.

Hi Andreas,

For what it's worth, I thought it might help to include the details of how
procmon reports this:

Process Name   WAA1SRV.EXE
PID            2148
Operation      LockFile
Path           C:\Alaska\HP2\00127\CONFIRM.CDX
Result         NOT GRANTED
Detail         Exclusive: True, Offset: 2.147.483.646, Length: 1, Fail Immediately: True
TID            1672

This error message repeats around 60 times each second....
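
The offset in the Detail field is easier to read in hex. The short Python sketch below parses the Detail string exactly as logged above and shows that 2.147.483.646 equals 2^31 - 2 (0x7FFFFFFE). The helper name `parse_lock_detail` is hypothetical, and treating the dots as thousands separators is an assumption about how this log was formatted.

```python
def parse_lock_detail(detail):
    """Extract (offset, length) from a procmon LockFile 'Detail' string."""
    fields = dict(part.split(": ", 1) for part in detail.split(", "))
    # Assumption: '.' is a thousands separator in this log, so strip it.
    offset = int(fields["Offset"].replace(".", ""))
    length = int(fields["Length"].replace(".", ""))
    return offset, length


detail = "Exclusive: True, Offset: 2.147.483.646, Length: 1, Fail Immediately: True"
offset, length = parse_lock_detail(detail)
print(hex(offset), offset == 2**31 - 2, length)  # prints 0x7ffffffe True 1
```

So every refused request is for the same single byte just below the 2 GB mark. That the same byte is requested each time is consistent with a retry loop, though the log excerpt alone does not show who holds it.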

Thomas
Andreas Herdt
Re: Index file lock fails - application hangs
on Mon, 18 Jan 2010 10:02:50 +0100
Hi Thomas,

What is happening here is that the CDX engine tries to acquire a lock
while the file is already locked.

The CDX file needs to be locked implicitly on read and write operations.
Internally, when a lock cannot be granted, the engine retries
until it can acquire the lock. So the log entry you sent
is nothing unusual and does not, at first glance, point to a
deadlock.

Maybe you can investigate what else is happening to confirm.cdx.
When was that offset last locked successfully? Can the lock
be acquired after some time?



   Andreas Herdt
   Alaska Software

Thomas Braun
Re: Index file lock fails - application hangs
on Mon, 18 Jan 2010 16:11:56 +0100
Andreas  Herdt wrote:

> The CDX file needs to be locked implicitly on read and write operations.

I know  

> Internally - when a lock can not be granted - the engine will retry
> until it can acquire the lock. Insofar the log entry you have sent here
> is nothing unusual and does not give a hint on a deadlock at first
> glance.

Well - the application hangs completely... WAA does not shut down and it
does not work even after waiting some minutes.

Unfortunately this is our production system - so waiting indefinitely to
find out if the situation clears by itself is not an option.

What I can say is that the error seems to be load-related. In December
the system was under heavy load; now there are only a few hits every hour,
and the error has shown up only once in January.

I will prepare a small test program this week to put the server under
artificial load to see if I can reproduce this.

> Maybe you can investigate what else is happening to confirm.cdx.
> When was the offset successfully locked at that offset. Can the lock
> be acquired after some time.

The problem is that it is not predictable when this occurs, so I
can only run procmon after the problem shows up, and thus miss the actual
point in time when the lock was first acquired.

Maybe I could run procmon for a few days, but AFAIK procmon consumes a lot
of memory this way because it buffers all file system events in the
background even when the view is filtered - so I doubt this would run
long enough to find out what happens without causing additional trouble.

regards
Thomas
Thomas Braun
Re: Index file lock fails - application hangs
on Wed, 20 Jan 2010 13:15:29 +0100
Thomas Braun wrote:

> I will prepare a small test program this week to put the server under
> artificial load to see if I can reproduce this.

By using WCat

   http://www.iis.net/downloads/default.aspx?tabid=34&i=1466&g=6

to set the server under load I can now reproduce the effect at will anytime
on my test system running Windows XP and it takes less than 90 seconds
until it happens.

What I can say from looking at the Process Monitor output is that there are
always locks with "not granted" status in between "success" entries before
the complete lockup occurs.

There seems to be no difference between various settings of
CDXDBE_LOCKRETRY, regardless of whether it is set to 10, 1000 or even 100000.

I could isolate the exact point where "it" happens (whatever "it" exactly
might be) and have attached the procmon log file in CSV format.

What you can see are various LockFile/ReadFile/UnlockFileSingle sequences
until thread ID 3792 locks the CDX file and thread 3540 tries to lock it as
well before the first lock is released. After this point, no file
actions other than "LockFile -> not granted" are recorded. WAA hangs, and none
of the WAA gateway instances return until waa1srv.exe is killed.
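
The pattern described above (one thread holds the lock, another is refused again and again) can be pulled out of a procmon CSV export with a few lines of Python. This is only a sketch under assumptions: the column names `Operation`, `Path`, `Result` and `TID` and the result strings are taken from the procmon record quoted earlier in this thread, the sample rows are made up, and the script tracks one lock per file without looking at offsets.

```python
import csv
import io
from collections import Counter


def summarize_locks(csv_file):
    """Return (held, blocked) from procmon rows.

    held:    {path: TID} for locks acquired but not yet released.
    blocked: Counter of (path, holder TID, waiting TID) for refused locks.
    """
    held = {}
    blocked = Counter()
    for row in csv.DictReader(csv_file):
        path, tid = row["Path"], row["TID"]
        op, result = row["Operation"], row["Result"]
        if op == "LockFile" and result == "SUCCESS":
            held[path] = tid
        elif op == "UnlockFileSingle" and result == "SUCCESS":
            held.pop(path, None)
        elif op == "LockFile" and result == "NOT GRANTED":
            blocked[(path, held.get(path), tid)] += 1
    return held, blocked


# Made-up sample rows shaped like the sequence described above.
sample = io.StringIO(
    "Operation,Path,Result,TID\n"
    "LockFile,CONFIRM.CDX,SUCCESS,3540\n"
    "UnlockFileSingle,CONFIRM.CDX,SUCCESS,3540\n"
    "LockFile,CONFIRM.CDX,SUCCESS,3792\n"
    "LockFile,CONFIRM.CDX,NOT GRANTED,3540\n"
    "LockFile,CONFIRM.CDX,NOT GRANTED,3540\n"
)
held, blocked = summarize_locks(sample)
print(held)           # {'CONFIRM.CDX': '3792'}
print(dict(blocked))  # {('CONFIRM.CDX', '3792', '3540'): 2}
```

A lock that stays in `held` at the end of the log, together with a steadily growing count in `blocked`, marks the thread that never released the file and the thread that is stuck retrying.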


Thomas


Logfile-Lockup.csv.zip
Boris Borzic
Re: Index file lock fails - application hangs
on Wed, 20 Jan 2010 14:40:05 +0100
Thomas Braun <spam@software-braun.de> wrote in
news:ywmab9gzxaaj$.1hwdur06yj11l$.dlg@40tude.net: 

> By using WCat
>    http://www.iis.net/downloads/default.aspx?tabid=34&i=1466&g=6
> 
> to set the server under load I can now reproduce the effect at will
> anytime on my test system running Windows XP and it takes less than 90
> seconds until it happens.

You should try this test using Advantage Local Server instead of CDXDBE. 
It's an easy change and it may fix your problem.

Best regards,
Boris Borzic

http://xb2.net
http://sqlexpress.net
industrial strength Xbase++ development tools
Thomas Braun
Re: Index file lock fails - application hangs
on Wed, 20 Jan 2010 17:00:56 +0100
Boris Borzic wrote:

> Thomas Braun <spam@software-braun.de> wrote in
> news:ywmab9gzxaaj$.1hwdur06yj11l$.dlg@40tude.net: 
> 
>> By using WCat
>>    http://www.iis.net/downloads/default.aspx?tabid=34&i=1466&g=6
>> 
>> to set the server under load I can now reproduce the effect at will
>> anytime on my test system running Windows XP and it takes less than 90
>> seconds until it happens.
> 
> You should try this test using Advantage Local Server instead of CDXDBE. 
> It's an easy change and it may fix your problem.

Honestly, I would like to avoid using another 3rd party tool... apart from
that, my Advantage knowledge has become a little bit "rusty" since I have not
used it for years, so it might take some time until I get the right setup
for the local engine.

I'm thinking about creating a generic WAA package so someone else could
test this with a 1.9x version of WAA (as I can currently only use 1.8x).

Thomas
Thomas Braun
Re: Index file lock fails - application hangs
on Wed, 20 Jan 2010 17:15:52 +0100
Thomas Braun wrote:

> By using WCat
> 
>    http://www.iis.net/downloads/default.aspx?tabid=34&i=1466&g=6
> 

BTW, the program above is really nice if you want to know how many
requests/second it takes to bring your WAA application to its knees - or
even lower.

It is a scriptable tool with a very small footprint (3 MB) that sends HTTP(S)
requests to your web server, either from one machine or remotely controlled
from several physical machines at the same time.

regards
Thomas