1) Questions and Answers : Windows : Hanging Process (Message 1281)
Posted 20 Mar 2009 by ebahapo
"It looks like the zombie processes are eventually purged."

Well, not really. I have a couple of dangling MM processes, one for 24h, the other for 48h. They keep two files in their BOINC slots:

  • boinc_lockfile
  • stderr.txt


The latter is updated every few seconds with a new line:

No heartbeat from core client for 30 sec - exiting

Consequently, the file just keeps growing. One is at 4 MB, the other at 8 MB.
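For anyone who wants to check their own hosts, here is a minimal sketch that scans the BOINC slot directories for a stderr.txt dominated by that heartbeat line. The function names and the threshold of 10 lines are my own assumptions, not anything from the project, and the slots directory has to be supplied by you since its location depends on the installation.

```python
from pathlib import Path

# The message exactly as it appears in stderr.txt.
HEARTBEAT_MSG = "No heartbeat from core client for 30 sec - exiting"


def count_heartbeat_lines(text: str) -> int:
    """Count lines that consist solely of the heartbeat-timeout message."""
    return sum(1 for line in text.splitlines() if line.strip() == HEARTBEAT_MSG)


def suspect_slots(slots_dir: Path, threshold: int = 10) -> list[tuple[str, int, int]]:
    """Return (slot name, heartbeat line count, file size in bytes) for every
    slot whose stderr.txt holds at least `threshold` heartbeat lines.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    results = []
    for stderr_file in sorted(slots_dir.glob("*/stderr.txt")):
        text = stderr_file.read_text(errors="replace")
        hits = count_heartbeat_lines(text)
        if hits >= threshold:
            results.append((stderr_file.parent.name, hits, stderr_file.stat().st_size))
    return results
```

Calling suspect_slots(Path(...)) with the path of your own slots directory returns the slots worth inspecting by hand. It only reads files and does not kill anything.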

Bottom line: I cannot afford to run this project.

Please advise.
2) Questions and Answers : Windows : Hanging Process (Message 1279)
Posted 19 Mar 2009 by ebahapo
It looks like the zombie processes are eventually purged.
3) Questions and Answers : Windows : Hanging Process (Message 1277)
Posted 18 Mar 2009 by ebahapo
Here's the error output of a stuck process:
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6

I'll leave it alone and see how the BOINC client deals with it.

PS: this is the WU in question.

4) Questions and Answers : Windows : Hanging Process (Message 1276)
Posted 18 Mar 2009 by ebahapo
After restarting, the other WU finished successfully.

I'll keep an eye out for hung wrappers.
5) Questions and Answers : Windows : Hanging Process (Message 1275)
Posted 18 Mar 2009 by ebahapo
Now that the one WU finished, the other WU was restarted, as can be seen by its error output:
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)

Notice that the usual start-up log was appended to the existing error output.
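Since each launch appends its start-up lines to the same stderr output, counting the first start-up line gives a rough launch count for a WU. A small sketch, assuming that "ACTR: boinc_init_options complete" appears exactly once per launch (my assumption from the logs above, not something the project has confirmed); summarize_stderr is a name I made up.

```python
# Marker strings copied from the error output above.
STARTUP_MARKER = "ACTR: boinc_init_options complete"
HEARTBEAT_MSG = "No heartbeat from core client for 30 sec - exiting"


def summarize_stderr(text: str) -> dict[str, int]:
    """Count application launches and heartbeat timeouts in stderr output.

    Substring counting is used so that it does not matter whether the log
    is split into lines or pasted as a single run of text.
    """
    return {
        "launches": text.count(STARTUP_MARKER),
        "heartbeat_timeouts": text.count(HEARTBEAT_MSG),
    }
```

More than one launch together with at least one heartbeat timeout would match the pattern I saw here: an instance lost contact with the client and a new one was started in the same slot.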
6) Questions and Answers : Windows : Hanging Process (Message 1274)
Posted 18 Mar 2009 by ebahapo
Here's an update on something that I just observed: 2 MM WUs, one running and the other suspended in memory.

While I was watching the system with BOINCView and Process Explorer, the suspended WU suddenly vanished from Process Explorer (all of it: the wrapper, the worker and the watchdog), yet both BOINCView and the official BOINC Manager still report it as active.

I wonder if the watchdog killed the other WU even though it was suspended...

Here's the other's error output:
ACTR: boinc_init_options complete
ACTR: boinc_get_init_data(actr_aid) complete
ACTR: Trace 1
ACTR: Trace 2
ACTR: Trace 3
ACTR: Trace 4
ACTR: Trace 5
ACTR: Trace 6
ACTR: Trace 8
ACTR: Trace 9
ACTR: Trace 10 -- Lisp Running
ACTR: Trace 11 -- Watchdog Running (if Win32)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting

I'll wait for the one to finish to see what happens to the other.
7) Questions and Answers : Windows : Hanging Process (Message 1272)
Posted 18 Mar 2009 by ebahapo
I noticed that MM WUs often get stuck on my systems. It looks like the wrapper loses sight of the worker application, which seems to finish normally; the wrapper then fails to report progress and status to the BOINC client, so it keeps occupying a slot until way past its due time (e.g., this WU).

I suspect that it happens when an MM WU is suspended but kept in memory. It seems that the worker application keeps on running and the suspended wrapper misses its completion signal.

Killing the wrapper solves things, but it still happens with about 10% of the WUs.

Please advise.
8) Message boards : Number crunching : Application hangs if suspended (Message 1243)
Posted 14 Feb 2009 by ebahapo
If the BOINC client suspends the MM application for whatever reason, such as getting a higher-priority WU from another project, then even with suspending to memory enabled, a new MM application is launched when its turn to run comes up again, and the previous one becomes a zombie.

Eventually, a system collects several such zombie MM applications in memory, unbeknown to the BOINC client, which does not report the zombie MM application and returns the WU crunched by the new instance.

I've only observed this on Windows, but I cannot say that it does not happen on Linux too.

TIA
9) Message boards : Number crunching : Understanding WU Progression (Message 1145)
Posted 7 Dec 2008 by ebahapo
Should this project identify itself as a non-CPU-intensive project? I barely notice any CPU usage by it, just a few spikes lasting a fraction of a second every now and then.



TIA
10) Message boards : Number crunching : Understanding WU Progression (Message 1103)
Posted 24 Nov 2008 by ebahapo
"How much CPU time is being used versus completion time?"

As I said above, about an order of magnitude (i.e., the reported CPU time is about 5 min, but the WU holds the CPU for over 1 h).
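To put a number on it, here is the arithmetic as a tiny sketch. The 300 s and 3600 s figures are just my rough observations from above rounded off (the WU actually ran for over an hour), and cpu_efficiency is a name I made up.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Fraction of the elapsed (wall-clock) time actually spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Roughly 5 min of CPU time in roughly 1 h of elapsed time.
efficiency = cpu_efficiency(5 * 60, 60 * 60)
print(f"CPU efficiency: {efficiency:.1%}")
print(f"Elapsed time is {1 / efficiency:.0f}x the CPU time")
```

So the slot sits idle for more than 90% of the time the WU holds it, which is the order-of-magnitude gap I mentioned.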

"If the overhead is wasting CPU cycles, I can adjust a credit fix."

Given that a WU does take up the CPU that could be used by other BOINC projects, the credit should at least compensate for the credits that would have been gained by running other projects.

Another possibility would be to mark the project as non-CPU intensive. This may not be so easy because it seems that the WUs vary, but if all the WUs being generated follow this profile, you might consider this.

I believe that in a future BOINC version specific applications or WUs may be marked non-CPU intensive, but until then the whole project must be marked as such.

HTH
11) Message boards : Number crunching : Understanding WU Progression (Message 1085)
Posted 20 Nov 2008 by ebahapo
I have a couple of such slow-progressing WUs. Looking at Windows' Task Manager, they barely use any CPU time, and the time that the BOINC client reports matches that of Task Manager: just a few minutes even though they've been running for about an hour.

I've also noticed something similar on Linux, but I cannot confirm the CPU time taken by the WUs.

Please advise.
12) Questions and Answers : Web site : Nice! (Message 851)
Posted 28 Jun 2008 by ebahapo
The account page is the nicest one among all BOINC projects! Really cool.

Kudos.






Copyright © 2020 MindModeling.org