• Networked message handling/thread going to 100% cpu & no imports

    From Khelair@VERT to All on Sat Dec 6 21:36:44 2014
    I know I've mentioned this before, but the bug in Synchronet that a few people have talked about, the one that pegs a thread at 100% CPU usage after one of the networks tries to pull messages (I believe, though this is a glorified assumption at this point), has been bugging me a lot more often recently. Basically at least once a day now I'm finding that after a prolonged period of time no messages have been imported to any of the networked subs, and inevitably, when I check the CPU stats, the sbbs process is pegged at 100%. A kill -15 won't kill it; after a while I kill -9 it, restart it, and things seem to be working again. This time around I haven't noticed any particular sub-boards being corrupted in the process, but I've trimmed down the number of sub-boards that I'm reading lately due to not enough time, and FIDONet not posting anything for me in an error that my RC doesn't seem to be able to help me fix.
    So I'm not sure exactly which networked function it may be, but when it happens it shuts down importing of all networked messages across 5 networks. It's really a hindrance, and I'd rather not have to fall back on setting up a shell script that runs every hour, checks whether usage has been pegged for too long, and then kills the process off and restarts it. That just can't be good for anything.
    Can anybody give me some more information on how to get around this, since I still can't get a more recent version compiled on OBSD? I guess I could just default to disabling networked bases, one at a time (my preliminary suspect is FIDO), until it doesn't seem to happen any more, but that seems like it'd be unreliable and a really time-consuming way to get to the bottom of this.
    Any input appreciated.

    ---
    ■ Synchronet ■ Tinfoil Tetrahedron BBS telnet://
  • From Access Denied@VERT to Khelair on Sun Dec 7 09:14:42 2014
    Hello Khelair,

    On 06 Dec 14 21:36, Khelair wrote to All:

    Can anybody give me some more information on how to get around this, since I still can't get a more recent version compiled on OBSD? I guess I could just default to disabling networked bases, one at a time (my preliminary suspect is FIDO), until it doesn't seem to happen any more, but that seems like it'd be unreliable and a really time-consuming way to get to the bottom of this. Any input appreciated.

    When this is occurring, take a look in your /sbbs/data/ directory for *.now files. If one exists (usually fidoin.now or fidoout.now or something similar), no other events will run until that one is done. So if no other events are running and one of those .now files exists, *that* is the event keeping the others from running.

    With that, you can narrow down exactly which event is doing this. After you know that, and if it's fidoin.now, you can check /sbbs/data/sbbsecho.log for any errors importing messages during that timeframe. If it's fidoout.now check the same log for exporting errors.
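    The check described above can be sketched in a few lines of Python. This is only an illustration, not part of Synchronet: the function name find_stale_semaphores and the 10-minute threshold are made up for the example, and it assumes that a .now file's modification time is a fair stand-in for when its event started.

```python
import glob
import os
import time


def find_stale_semaphores(data_dir, min_age_seconds=0, now=None):
    """Return (filename, age_in_seconds) for each *.now file, oldest first.

    Hypothetical helper: the age is derived from the file's modification
    time, which is assumed to roughly match when the event was triggered.
    """
    now = time.time() if now is None else now
    results = []
    for path in glob.glob(os.path.join(data_dir, "*.now")):
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            continue  # the file vanished between listing and stat
        if age >= min_age_seconds:
            results.append((os.path.basename(path), age))
    # Oldest file first: that is the most likely blocker.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # /sbbs/data is the directory named in the post; 600 seconds is an
    # arbitrary threshold chosen for this sketch.
    for name, age in find_stale_semaphores("/sbbs/data", min_age_seconds=600):
        print(f"{name}: present for {age / 60:.0f} minutes")
```

    A fidoin.now or fidoout.now that has been sitting there for a long time points at sbbsecho.log as the next place to look.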

    Sometimes it's a DOS event while processing door games for interBBS. If one game hangs during processing, it will stay locked up, and your DOS emulator will continue to run, pegging your CPU at 100%. Then again, you're running OpenBSD, so you may not have any DOS games being processed, unless you're using something like DOSCMD or maybe got DOSEMU to compile for it.

    Regards,
    Nick

    --- GoldED+/LNX 1.1.5-b20
  • From Digital Man@VERT to Khelair on Sun Dec 7 18:29:19 2014
    Re: Networked message handling/thread going to 100% cpu & no imports
    By: Khelair to All on Sat Dec 06 2014 09:36 pm

    I know I've mentioned this before, but the bug in Synchronet that a few people have talked about, the one that pegs a thread at 100% CPU usage after one of the networks tries to pull messages (I believe, though this is a glorified assumption at this point), has been bugging me a lot more often recently. Basically at least once a day now I'm finding that after a prolonged period of time no messages have been imported to any of the networked subs, and inevitably, when I check the CPU stats, the sbbs process is pegged at 100%. A kill -15 won't kill it; after a while I kill -9 it, restart it, and things seem to be working again. This time around I haven't noticed any particular sub-boards being corrupted in the process, but I've trimmed down the number of sub-boards that I'm reading lately due to not enough time, and FIDONet not posting anything for me in an error that my RC doesn't seem to be able to help me fix.
    So I'm not sure exactly which networked function it may be, but when it happens it shuts down importing of all networked messages across 5 networks. It's really a hindrance, and I'd rather not have to fall back on setting up a shell script that runs every hour, checks whether usage has been pegged for too long, and then kills the process off and restarts it. That just can't be good for anything.
    Can anybody give me some more information on how to get around this, since I still can't get a more recent version compiled on OBSD? I guess I could just default to disabling networked bases, one at a time (my preliminary suspect is FIDO), until it doesn't seem to happen any more, but that seems like it'd be unreliable and a really time-consuming way to get to the bottom of this.
    Any input appreciated.

    Are all of these "networks" FidoNet technology networks (FTNs)? If so, then the process that handles importing and exporting would be SBBSecho, not sbbs. Which process exactly do you see at 100% CPU utilization? What is the log output at the time this is occurring? What version of SBBS and SBBSecho are you using? Without more details, it's really hard to help.
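    One way to answer the "which process" question is to take a snapshot with ps and keep only the busy entries. The sketch below is an assumption-laden illustration: the helper names (current_processes, parse_ps, busiest) and the 90% threshold are invented for the example, and it assumes a ps that accepts "-ax -o pid,pcpu,comm", as the ps on Linux and the BSDs generally does.

```python
import subprocess


def current_processes():
    """Return raw 'PID %CPU COMMAND' text from ps (assumes a BSD-style ps)."""
    result = subprocess.run(
        ["ps", "-ax", "-o", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps(output):
    """Parse ps text into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, cpu, command = parts
        try:
            rows.append((int(pid), float(cpu), command.strip()))
        except ValueError:
            continue  # ignore rows that do not look like process entries
    return rows


def busiest(rows, threshold=90.0):
    """Return the rows at or above the CPU threshold, busiest first."""
    hot = [row for row in rows if row[1] >= threshold]
    return sorted(hot, key=lambda row: row[1], reverse=True)
```

    Calling busiest(parse_ps(current_processes())) while the problem is happening shows whether the pegged process is sbbs itself, sbbsecho, or something else entirely.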

    digital man

    Synchronet "Real Fact" #19:
    Michael Swindell was directly responsible for Synchronet's commercial success.
    Norco, CA WX: 67.0°F, 54.0% humidity, 0 mph WSW wind, 0.00 inches rain/24hrs
    --
  • From Access Denied@VERT to Khelair on Wed Dec 10 17:15:14 2014
    Hello Khelair,

    On 09 Dec 14 20:36, Khelair wrote to Access Denied:

    Well I caught a couple of atypical ones now. Straight up crashes,
    where I've got an open session and I come back awhile later and the connection is terminated. These ones appear to be happening right
    around the time that qnet-qwk.now is being created, though they don't appear to have anything in the associated .lo? file.

    For one, you don't ever have to associate QWK messages with .?lo files at all. They belong to two completely different transfer protocols. My question for you would be, are you hosting a QWK network? Or maybe it's when you're polling VERT for Dovenet?

    Maybe check your system log and see if there's any odd things going on right around the time it crashes.

    Regards,
    Nick

    --- GoldED+/LNX 1.1.5-b20130910
    * Origin: thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin) (723:1/701)
    ■ Synchronet ■ thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin)
  • From Khelair@VERT to Access Denied on Wed Dec 10 21:41:22 2014
    Re: Re: Networked message handling/thread going to 100% cpu & no imports
    By: Access Denied to Khelair on Wed Dec 10 2014 17:15:14

    don't appear to have anything in the associated .lo? file.

    For one, you don't ever have to associate QWK messages with .?lo files whatsoever. Two completely different transfer protocols. My question for you would be, are you hosting a QWK network? Or maybe it's when you're polling VERT for Dovenet?

    I meant what I said about .lo? files, as in the ones that accumulate in /sbbs/data/logs/*.lo? (.log & .lol).

    Maybe check your system log and see if there's any odd things going on right around the time it crashes.

    Yep, that's what I referenced doing in the above file extensions. ;)

    ---
    ■ Synchronet ■ Tinfoil Tetrahedron BBS telnet://tinfoil.synchro.net
  • From Access Denied@VERT to Khelair on Thu Dec 11 17:12:44 2014
    Hello Khelair,

    On 10 Dec 14 21:41, Khelair wrote to Access Denied:

    I meant what I said about .lo? files, as in the ones that accumulate
    in /sbbs/data/logs/*.lo? (.log & .lol).

    Maybe check your system log and see if there's any odd things
    going on right around the time it crashes.

    Yep, that's what I referenced doing in the above file extensions.
    ;)

    I don't think those logs give you all information about your system, do they? Maybe you compiled it that way for your OS?

    Otherwise, check your system log. I use syslog-ng on Gentoo here, and it logs to /var/log/messages (aside from the stuff in the /sbbs/data/logs directory).

    Regards,
    Nick

    --- GoldED+/LNX 1.1.5-b20130910
    * Origin: thePharcyde_
  • From mark lewis@VERT to Khelair on Thu Dec 11 22:15:51 2014
    On Wed, 10 Dec 2014, Khelair wrote to Access Denied:

    don't appear to have anything in the associated .lo? file.

    For one, you don't ever have to associate QWK messages with .?lo files

    someone confused .lo? files with .?lo files... the latter are binkley style mailer files ;)

    )\/(ark


    * Origin: (1:3634/12)
    ---
    ■ Synchronet ■ Vertrauen ■ Home of Synchronet ■ telnet://vert.synchro.net
  • From Nicholas Boel@VERT to mark lewis on Thu Dec 11 22:57:06 2014
    Hello mark,

    On 11 Dec 14 22:15, mark lewis wrote to Khelair:

    For one, you don't ever have to associate QWK messages with .?lo
    files

    someone confused .lo? files with .?lo files... the latter are binkley style mailer files ;)

    I did. But then again, I originally wasn't referring to anything in /sbbs/data/logs, either. I was referring to the system log (e.g. /var/log/messages in some Linux distros, journalctl on Arch Linux, etc.). In other words, your SYSTEM log, not your BBS logs. If installed normally, Synchronet will automatically log to your system logs if you don't tell it not to, or don't run as a daemon.

    Regards,
    Nick

    --- GoldED+/LNX 1.1.5-b20130910
    * Origin: thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin)