Duke4.net Forums: eduke32 crash linux / work around. - Duke4.net Forums

eduke32 crash linux / work around.

User is offline   cybertiger 

#1

Under Linux, eduke32 crashes randomly on computers with multiple processors and/or cores, and possibly on single-core processors with Hyper-Threading (HT) support.

A simple fix is to use taskset to force it to run on one processor.

taskset -c 1 ./eduke32

You may need to install the taskset utility first:

sudo apt-get install schedutils
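
For anyone who would rather do the pinning from code than from the shell, here is a minimal sketch of the same idea using the Linux sched_setaffinity() call. The helper name pin_to_first_allowed_cpu is made up for illustration and is not part of eduke32. It picks the first CPU the process is already allowed to use, rather than hard-coding CPU 1 as the taskset command above does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_setaffinity() and the CPU_* macros */
#endif
#include <sched.h>

/* Hypothetical helper (not from eduke32): restrict the caller to the first
 * CPU it is currently allowed to run on.
 * Returns that CPU index, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;   /* no allowed CPU found */
}
```

If this were called at the very start of main(), before any other thread exists, threads created later would inherit the mask, which is roughly what launching under taskset achieves.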

The reason for the crash is that the sound buffer is modified by two threads concurrently with no synchronization.

A simple example of this is that fillData in jaudiolib/src/driver_sdl.c can be called before MixBuffer is initialized, which reliably (about 50% of the time for me) causes a crash on startup before the logo is shown.
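
To illustrate the startup case, here is a rough sketch of a fill callback that hands back silence when the mix buffer has not been set up yet. All names here (mix_buffer, set_mix_buffer, fill_data_guarded) are hypothetical stand-ins, not the real jaudiolib code or the actual fix. The guard on its own does not make the access thread-safe; it only shows the "not initialized yet" check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the driver state; not the real eduke32 globals. */
static const unsigned char *mix_buffer = NULL;  /* NULL until the mixer is set up */
static size_t mix_buffer_size = 0;

/* Called once the mixer side is ready. */
void set_mix_buffer(const unsigned char *buf, size_t size)
{
    mix_buffer = buf;
    mix_buffer_size = size;
}

/* Audio-callback-style routine: the audio thread asks for len bytes. */
void fill_data_guarded(unsigned char *stream, size_t len)
{
    if (mix_buffer == NULL || mix_buffer_size == 0) {
        memset(stream, 0, len);          /* nothing to mix yet: output silence */
        return;
    }

    size_t n = len < mix_buffer_size ? len : mix_buffer_size;
    memcpy(stream, mix_buffer, n);       /* copy what is available */
    memset(stream + n, 0, len - n);      /* pad the rest with silence */
}
```

In real code the pointer and size would also need to be published to the audio thread safely, for example under the same lock the callback takes.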

-CT.

User is offline   ZedDB 

#2

Thank you! It's been getting on my nerves with these crashes.

User is offline   ZedDB 

#3

Try the new svn rev. This is now probably fixed.

User is offline   cybertiger 

#4

ZedDB, on Jan 11 2010, 08:07 PM, said:

Try the new svn rev. This is now probably fixed.


That should fix the more common cases, but not the crash on startup that I get about 50% of the time, where MixBuffer and MixCallback are both 0 the first time fillData() is called by SDL_mixer (because line 185 writes a bunch of zeros to the channel).

My C is rusty as hell or I'd offer a patch that was a bit more watertight.

This post has been edited by cybertiger: 11 January 2010 - 11:06 AM


User is offline   Plagman 

  • Former VP of Media Operations

#5

What about now?

User is offline   cybertiger 

#6

Thanks, that fixes the most annoying crash.

The synchronization in the sound system is still substantially broken; if I get time I'll try to write a patch.

Something you should watch out for is the following:

if (InterruptsDisabled++)
    return;
SDL_LockMutex(flibble);


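If two threads hit this code at the same time (on a system with multiple cores and/or processors), the results can vary: the increment is not atomic, so both threads can read a count of 0 and both go on to take the mutex, or one of the updates to the count can be lost.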

If you're using SDL_LockMutex and SDL_UnlockMutex, there is no need to do your own ref counting; SDL will do it for you. Some other locking functions might not perform the reference counting for you, so the best thing is to avoid them and use an SDL_mutex instead.

With SDL_mutex:

SDL_LockMutex(flibble);
... code ...
SDL_UnlockMutex(flibble);
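
As a rough illustration of what "the lock does the counting for you" means, here is a sketch using a POSIX recursive mutex as a stand-in for an SDL mutex. That substitution is an assumption made so the example builds without SDL, and nested_lock_demo is a made-up name. The same thread takes the lock twice and releases it twice, with no hand-rolled counter.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for PTHREAD_MUTEX_RECURSIVE */
#endif
#include <pthread.h>

/* Hypothetical demo: returns 0 if one thread can lock a recursive mutex
 * twice and unlock it twice, -1 if any step fails. */
int nested_lock_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc = -1;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0 ||
        pthread_mutex_init(&lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    if (pthread_mutex_lock(&lock) == 0) {        /* outer lock */
        if (pthread_mutex_lock(&lock) == 0) {    /* nested lock, same thread */
            pthread_mutex_unlock(&lock);         /* undo the nested lock */
            rc = 0;
        }
        pthread_mutex_unlock(&lock);             /* undo the outer lock */
    }

    pthread_mutex_destroy(&lock);
    return rc;
}
```

The mutex itself tracks the owner and the nesting depth, so the caller never has to keep a separate counter that two threads could race on.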


This might explain some of the random hangs people have been having, combined with some missing and commented-out locking in multivoc.c.

-CT

User is offline   Plagman 

  • Former VP of Media Operations

#7

Keep in mind that everything runs in the same thread except fillData, which gets called from a thread spawned by SDL_mixer. No use relying on atomic recursive locking when we can do it safely and faster ourselves.

The commented-out locking is on purpose, so I don't believe any gaps remain in the current logic. The freezes you're talking about were fixed a while ago.

User is offline   cybertiger 

#8

Plagman, on Jan 12 2010, 12:10 AM, said:

No use relying on atomic recursive locking when we can do it safely and faster ourselves.

I very much appreciate that the project is actively maintained and I don't want to get into some sort of fight over this; however, you are wrong.

Your code is not safe. If you actually locked the things you need to lock, it would hang; as most of that code is commented out, it crashes instead. If you do not believe the crashes are caused by a lack of synchronization, then why does pinning the process to one CPU stop them?

I don't mind patching it myself, and when I'm happy with the code I'll post a diff, which you can choose to use or ignore. For now I'll run the program on one CPU (unless I'm testing), because it will continue to crash randomly otherwise.

I'm sorry if I upset the apple cart, I'll go away until I have a patch.

User is offline   Plagman 

  • Former VP of Media Operations

#9

It's by no means a fight, I was just replying to your observations with what I thought was right; I'm happy to be proven wrong.
If you're still experiencing crashes, attaching the backtraces here would be a good place to start. Can you elaborate on what exactly you think isn't safe in the current code? Are you implying that several distinct threads can access SDLDrv_PCM_Lock and Unlock? If so, that's definitely a bug and I need to fix that.

User is offline   cybertiger 

#10

Plagman, on Jan 12 2010, 01:49 AM, said:

It's by no means a fight, I was just replying to your observations with what I thought was right; I'm happy to be proven wrong.

If you're still experiencing crashes, attaching the backtraces here would be a good place to start. Can you elaborate on what exactly you think isn't safe in the current code? Are you implying that several distinct threads can access SDLDrv_PCM_Lock and Unlock? If so, that's definitely a bug and I need to fix that.

There's only two threads in the code, as far as I know.

Regarding maintaining ref counts for locking in a safe and fast way, reading the libSDL source is enlightening (it's only about 5 lines of code each for lock/unlock).

Regarding thread safety and the lack of synchronization: anything called from fillData() is also going to be on the audio thread. I am thinking specifically of MV_MixCallBack(), which in turn calls FX_CallBack(), which in turn calls S_TestSoundCallback(). These are all called with the sound lock held.

Now any state these functions access is shared between two threads, and when that state is accessed from the main thread the sound lock should be obtained first; in the vast majority of cases it is not.
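
A small sketch of the rule being described, using pthreads rather than SDL so it builds on its own. sound_lock, shared_state and run_guarded_counter are hypothetical names, not eduke32 code. One "audio" thread and the main thread both touch the same variable, and each takes the same lock around every access.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>

/* Hypothetical stand-ins: one lock guarding one piece of shared state. */
static pthread_mutex_t sound_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_state = 0;

/* Work done by both threads: every access happens with the lock held. */
static void *worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&sound_lock);
        shared_state++;
        pthread_mutex_unlock(&sound_lock);
    }
    return NULL;
}

/* Runs an "audio" thread alongside the calling "main" thread.
 * Returns the final value of the shared state, or -1 on error. */
long run_guarded_counter(long iters)
{
    pthread_t audio_thread;

    shared_state = 0;
    if (pthread_create(&audio_thread, NULL, worker, &iters) != 0)
        return -1;

    worker(&iters);   /* main thread does the same work under the same lock */

    if (pthread_join(audio_thread, NULL) != 0)
        return -1;
    return shared_state;   /* safe to read: the other thread has finished */
}
```

With the lock in place the result is always twice iters. Without it, the two increments can interleave and updates can be lost, which is the kind of damage unsynchronized access from the main thread can cause.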

I'm not suggesting using locks on all access to the shared state, I don't think that would be a good (or fast ;) ) solution. There are probably other more sensible solutions.

Given there's only two threads and one lock, in theory getting deadlocks should be impossible (you need a minimum of 2 threads and 2 locks for this).

I'll turn core dumps on and run against the latest svn to see if I can obtain backtraces for you. In theory the crashes should happen at points where the shared state I listed is accessed, if they don't overwrite the stack with junk before crashing.

User is offline   Plagman 

  • Former VP of Media Operations

#11

cybertiger, on Jan 11 2010, 10:11 PM, said:

There's only two threads in the code, as far as I know.


Yes, which is why bothering with atomic recursive locking in PCM_Lock/PCM_Unlock is pointless since the worker thread will never access these functions.

cybertiger, on Jan 11 2010, 10:11 PM, said:

Regarding thread safety and the lack of synchronization: anything called from fillData() is also going to be on the audio thread. I am thinking specifically of MV_MixCallBack(), which in turn calls FX_CallBack(), which in turn calls S_TestSoundCallback(). These are all called with the sound lock held.


Right, the code in S_TestSoundCallback() is written with that in mind.

cybertiger, on Jan 11 2010, 10:11 PM, said:

Now any state these functions access is shared between two threads, and when that state is accessed from the main thread the sound lock should be obtained first; in the vast majority of cases it is not.


Which exact state are you talking about? Can you describe a precise case where concurrent access to this shared state would result in a crash? Or even unwanted behavior?

cybertiger, on Jan 11 2010, 10:11 PM, said:

Given there's only two threads and one lock, in theory getting deadlocks should be impossible (you need a minimum of 2 threads and 2 locks for this).


Yes.

cybertiger, on Jan 11 2010, 10:11 PM, said:

I'll turn core dumps on and run against the latest svn to see if I can obtain backtraces for you. In theory the crashes should happen at points where the shared state I listed is accessed, if they don't overwrite the stack with junk before crashing.


Core dumps won't help, but be sure to `make veryclean && make RELEASE=0`. Also, changing sdlayer.c:585 to "#if 0" should help with dealing with gdb and EDuke32 at the same time. I still don't think you're going to run into any (related) crashes, though.

User is offline   cybertiger 

#12

Plagman, on Jan 12 2010, 09:15 AM, said:

Core dumps won't help, but be sure to `make veryclean && make RELEASE=0`. Also, changing sdlayer.c:585 to "#if 0" should help with dealing with gdb and EDuke32 at the same time. I still don't think you're going to run into any (related) crashes, though.


Thanks.

User is offline   cybertiger 

#13

You win, I can't crash it any more ;)