Questions and Answers : Windows : Hanging Process
Author | Message |
---|---|
I've noticed that MM WUs often get stuck on my systems. It looks like the wrapper loses track of the worker application, which seems to finish normally. The wrapper then fails to report progress and status to the BOINC client, so the task keeps occupying a slot until well past its deadline (e.g., this WU). | |
ID: 1272 | |
Here's an update on something that I just observed: 2 MM WUs, one running and the other suspended in memory.
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
I'll wait for the running one to finish to see what happens to the other. | |
ID: 1274 | |
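The "No heartbeat from core client for 30 sec - exiting" line in the output above reads like a heartbeat timeout: the application expects a periodic signal from the client and gives up when none arrives within the window. The following is a generic sketch of that pattern, not BOINC's actual code; the class name `HeartbeatWatchdog`, its methods, and the injectable `clock` parameter are all hypothetical, and only the 30-second figure comes from the log message.

```python
import time

# Timeout taken from the log message; everything else here is illustrative.
HEARTBEAT_TIMEOUT = 30.0  # seconds


class HeartbeatWatchdog:
    """Tracks the time since the last heartbeat was received."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the logic can be tested
        self.last_beat = clock()    # treat start-up as the first heartbeat

    def beat(self):
        """Record that a heartbeat just arrived."""
        self.last_beat = self.clock()

    def expired(self):
        """Return True if no heartbeat arrived within the timeout."""
        return self.clock() - self.last_beat > self.timeout
```

An application built this way would call `beat()` whenever the client signals it and poll `expired()` in its main loop, logging a message and exiting once it returns `True`. If the exit itself never completes, the same message could be logged again on every poll.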
Now that the first WU has finished, the other WU was restarted, as can be seen in its error output:
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
Notice that the usual start-up log was appended to the existing error output. | |
ID: 1275 | |
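Because each restart appends a fresh start-up log to the same error output, the number of restarts can be read off by splitting the text at the first start-up line. The sketch below assumes that "ACTR: boinc_init_options complete" always opens a run, as it does in the output above; the function names `split_runs` and `summarize` are hypothetical helpers, not part of any BOINC tool.

```python
# Marker lines copied from the error output posted in this thread.
START_MARKER = "ACTR: boinc_init_options complete"
HEARTBEAT_MSG = "No heartbeat from core client for 30 sec - exiting"


def split_runs(stderr_text):
    """Split concatenated error output into one list of lines per run."""
    runs = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # A start marker (or the very first line) opens a new run.
        if line == START_MARKER or not runs:
            runs.append([])
        runs[-1].append(line)
    return runs


def summarize(stderr_text):
    """Report, per run, the last line and the number of heartbeat exits."""
    return [
        {"last_line": run[-1], "heartbeat_exits": run.count(HEARTBEAT_MSG)}
        for run in split_runs(stderr_text)
    ]
```

Applied to the restarted WU's output, this would yield two runs: the first ending in heartbeat messages, the second ending at the watchdog trace.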
After restarting, the other WU finished successfully. | |
ID: 1276 | |
Here's the error output of a stuck process:
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
I'll leave it alone and see how the BOINC client deals with it. PS: this is the WU in question. | |
ID: 1277 | |
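In the stuck process's output, the final run stops at Trace 6, while the earlier runs continue through Trace 11. A small helper can pull out the last trace number reached in the most recent run, which may help when comparing several stuck WUs. This is only a sketch: `last_trace_of_final_run` is a hypothetical name, and the line format is assumed to match the output posted above.

```python
import re

# Line formats assumed from the error output posted in this thread.
START_MARKER = "ACTR: boinc_init_options complete"
TRACE_RE = re.compile(r"^ACTR: Trace (\d+)")


def last_trace_of_final_run(stderr_text):
    """Return the last trace number logged in the most recent run, or None."""
    last = None
    for line in stderr_text.splitlines():
        line = line.strip()
        if line == START_MARKER:
            last = None  # a new run begins; forget the previous run's traces
        match = TRACE_RE.match(line)
        if match:
            last = int(match.group(1))
    return last
```

Note that none of the runs posted here log a Trace 7, so a gap between 6 and 8 is not by itself a sign of trouble.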
It looks like the zombie processes are eventually purged. | |
ID: 1279 | |
"It looks like the zombie processes are eventually purged."
Well, not really. I have a couple of dangling MM processes, one for 24 h, the other for 48 h. They keep two files in their BOINC slots:
No heartbeat from core client for 30 sec - exiting
Consequently, the file just keeps growing in size. One is at 4 MB, the other at 8 MB. Bottom line: I cannot afford to run this project. Please advise. | |
ID: 1281 | |
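The two file sizes reported above (4 MB after 24 h, 8 MB after 48 h) point to a steady growth rate. As a rough check, the sketch below estimates how often the heartbeat message would have to be written to produce files of that size. It assumes the growing file contains nothing but that one message repeated, that lines end in CRLF (2 bytes) as is typical on Windows, and that "MB" means 1024 × 1024 bytes; none of these is confirmed in the thread, and `lines_per_second` is a hypothetical helper.

```python
# Message text copied from the error output posted in this thread.
MESSAGE = "No heartbeat from core client for 30 sec - exiting"


def lines_per_second(file_bytes, elapsed_hours, newline_bytes=2):
    """Estimate the logging rate, assuming the file holds only MESSAGE lines."""
    bytes_per_line = len(MESSAGE.encode("ascii")) + newline_bytes
    lines = file_bytes / bytes_per_line
    return lines / (elapsed_hours * 3600)


# Sizes and ages as reported in the post above.
for size_mb, hours in [(4, 24), (8, 48)]:
    rate = lines_per_second(size_mb * 1024 * 1024, hours)
    print(f"{size_mb} MB after {hours} h: about {rate:.2f} lines/s")
```

Under those assumptions both files work out to a little under one line per second, which would be consistent with a process that logs the message on every pass of a once-per-second loop instead of exiting.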