60+ hours... is this normal?
Message boards : Number crunching : 60+ hours... is this normal?
| Author | Message |
|---|---|
|
I thought that these longer tasks were supposed to end at 48 hours? I don't mind letting it go but I am curious if there is a problem. | |
| ID: 2050 · Rating: 0 · rate:
| |
|
Workunits should definitely NOT take that long. For a quick fix, I suggest either aborting the task or restarting the client. We have seen this issue of workunits getting "stuck" in the past, but have not yet narrowed down the cause. Some other users are currently experiencing similar symptoms (on their Windows hosts). | |
| ID: 2051 · Rating: 0 · rate:
| |
|
I suspended the task and rebooted the Windows 7 PC (uptime had been a few months). When I relaunched BOINC, the stuck tasks cleared and either errored out or were marked invalid with "Completed, too late to validate". | |
| ID: 2053 · Rating: 0 · rate:
| |
|
Tom, I still get some tasks that get stuck at 100% (Win 7 64-bit). | |
| ID: 2054 · Rating: 0 · rate:
| |
|
vaughn - my 4-year-old MacBook takes about an hour to crunch a single WU | |
| ID: 2055 · Rating: 0 · rate:
| |
|
If I have any never-ending tasks, I abort them. | |
| ID: 2059 · Rating: 0 · rate:
| |
"If I have any never-ending tasks, I abort them."

Yeah, that's fine if you're watching your computer every 30 minutes. I just had to abort 3 more WUs that were at 8, 10 and 11 hours of runtime. Wasted half my day, basically. This is ridiculous. Is it really too much to ask to get this fixed, or at the very least to add a line in the code like: IF CurRuntime > 3600 THEN CALL ForceCompletion, or whatever equivalent does the same thing? I like the project, but I can't be wasting my CPU time and electricity like this. I'm sorry. | |
| ID: 2060 · Rating: 0 · rate:
| |
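The runtime cap suggested in the post above can be written as a small check. This is only a sketch of the rule, not anything from the project's code or the BOINC API: the `Task` record, the function name, and the assumption that the 3600 in the post means seconds are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical record for one workunit; not a BOINC data structure."""
    name: str
    elapsed_seconds: float


def tasks_over_limit(tasks, limit_seconds=3600):
    """Return the names of tasks whose elapsed runtime exceeds the cap.

    The default of 3600 mirrors the 'CurRuntime > 3600' rule from the post,
    assuming that number is in seconds.
    """
    return [t.name for t in tasks if t.elapsed_seconds > limit_seconds]
```

What to do with the flagged tasks (abort, restart, report) is a separate decision; the sketch only picks them out.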
|
I discovered that with 100% repeatability, if the task is suspended and restarts for ANY reason on my systems, with keep tasks in memory NOT SELECTED, then the tasks will go into overtime while using ZERO CPU time and never finish properly. I've proved this on 3 systems. I have NOT tested this with Keep Suspended tasks in memory option set. | |
| ID: 2061 · Rating: 0 · rate:
| |
"I discovered that with 100% repeatability, if the task is suspended and restarts for ANY reason on my systems, with keep tasks in memory NOT SELECTED, then the tasks will go into overtime while using ZERO CPU time and never finish properly."

Actually, way back in 2008(!) a suspend/resume bug was found, and never fixed AFAIK. There was a problem with the checkpointing, which meant that only the last part of the WU's run was reported, so the credit calculation came out way too low. Maybe it was fixed, but the fix introduced this bug. I never used to use 'keep tasks in memory', but this and another project at the time needed that option selected to run properly, so I've left it enabled since then. FWIW, I'm not sure that there's much of a downside to keeping tasks in memory, other than that it uses more swap space. Maybe it's important if you have your swap on an SSD? Cheers, Al. | |
| ID: 2064 · Rating: 0 · rate:
| |
|
I don't mind setting the flag to keep things in memory, except when I am running tasks that use over 2 GB of memory each. | |
| ID: 2065 · Rating: 0 · rate:
| |
|
FYI | |
| ID: 2076 · Rating: 0 · rate:
| |
|
For the current batch of workunits (i.e. jobs prefixed with "veksler"), see http://mindmodeling.org/forum_thread.php?id=554&nowrap=true#2080 regarding variable workunit runtimes. We've since pulled these jobs from the system. For reference, these workunits are prefixed with MindModeling-188, MindModeling-190, and MindModeling-192. The number on the end corresponds to an internal job ID. | |
| ID: 2082 · Rating: 0 · rate:
| |
|
As an FYI, a number of the 191 tasks are also experiencing long run times > 1 hour. I've been aborting anything that has run for more than an hour while its remaining time is still increasing (the remaining estimate is usually over an hour as well). I've got a few that I'm going to let run on my Win7 Q6600 machine just to see whether they ever complete or, like the earlier tasks, run for 4+ hours. | |
| ID: 2084 · Rating: 0 · rate:
| |
"As an FYI, a number of the 191 tasks are also experiencing long run times > 1 hour"

Yes, I have the same thing. Quite a few 191s crunching for 7-12 hours, still using 100% CPU, but looking to complete way after deadline. I've aborted a bunch that hadn't started, and some that were clearly going to run far too long. For educational (comedy?) purposes, I've left a few to run. I think they might complete, even if after deadline. Might learn something. Might not. Whatever :) Cheers, Al. | |
| ID: 2085 · Rating: 0 · rate:
| |
|
I have a couple of 191's that went high priority after lots of hours. I did not write down exactly how many. I found this thread and shut down boinc to restart it. Boinc exited but the worker threads were still running according to ps, so I rebooted the machine. | |
| ID: 2086 · Rating: 0 · rate:
| |
|
A couple of the ones I had on the Q6600 finished in about 2 hours. I have one though that is at 3:09 elapsed and 2:45 remaining (h:m) with both increasing at about the same rate. Going to abort that one. | |
| ID: 2087 · Rating: 0 · rate:
| |
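The symptom described above (elapsed and remaining time both climbing at about the same rate) can be turned into a simple check over a few readings. This is a rough sketch under assumptions: the function name, the sample format, and the three-sample threshold are invented here, and nothing in it comes from BOINC itself.

```python
def looks_stalled(samples, min_samples=3):
    """Guess whether a task is stuck from (elapsed, remaining) readings.

    samples: chronological list of (elapsed_seconds, remaining_seconds).
    Returns True only if, across the last `min_samples` readings, elapsed
    time kept growing while the remaining estimate never went down.
    """
    if len(samples) < min_samples:
        return False  # not enough evidence yet
    recent = samples[-min_samples:]
    for (e0, r0), (e1, r1) in zip(recent, recent[1:]):
        if e1 <= e0:
            return False  # task was not running between readings
        if r1 < r0:
            return False  # estimate is shrinking, so there is progress
    return True
```

For example, readings starting at 3:09 elapsed and 2:45 remaining that each grow by ten minutes per sample would be flagged, while a task whose remaining time drops between readings would not.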
|
Just as a test, I aborted a long running task on Linux and it leaves behind the extra 1.77 process. So I now have 5 tasks - ah, it's a dual core - running. Since I've had to abort a few tasks on this machine, I suspect these have just been building up. I had to do that more on the Windows machine, so the problem was actually worse there. | |
| ID: 2088 · Rating: 0 · rate:
| |
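Leftover worker processes like the ones described above could be picked out of a process listing along these lines. It is a sketch only: the function and the worker name passed to it are hypothetical (the thread does not give the real binary name), and the input is assumed to be (pid, ppid, command) rows such as one might build from `ps -eo pid,ppid,comm` on Linux.

```python
def find_leftover_workers(processes, client_pid, worker_name):
    """Return PIDs of worker processes not owned by the running client.

    processes: iterable of (pid, ppid, command) tuples.
    client_pid: PID of the current BOINC client, or None if it is not running.
    worker_name: substring identifying the science app (an assumption here).
    """
    leftovers = []
    for pid, ppid, command in processes:
        # A worker whose parent is not the current client was left behind
        # by an earlier abort or client exit.
        if worker_name in command and ppid != client_pid:
            leftovers.append(pid)
    return leftovers
```

The sketch only reports candidates; whether to kill them or reboot, as posters here did, is left to the user.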
"I have a couple of 191's that went high priority after lots of hours. I did not write down exactly how many. I found this thread and shut down boinc to restart it. Boinc exited but the worker threads were still running according to ps, so I rebooted the machine."

Both units finished after a system reboot that caused them to start from scratch. They both finished in under an hour of runtime. One of them had previously run for over ten hours, was using 1 GB of RAM, and was hung at 75%. Both of the units had been aborted three other times. Edit: I should probably mention this is on a 64-bit Linux machine. | |
| ID: 2089 · Rating: 0 · rate:
| |
"I have a couple of 191's that went high priority after lots of hours. I did not write down exactly how many. I found this thread and shut down boinc to restart it. Boinc exited but the worker threads were still running according to ps, so I rebooted the machine."

I just discovered I have the same problem: 6 WUs running far too long and stuck. All six are MindModeling@Beta, application 1.77 ACT-R cognitive modeling environment leveraging Clozure Common Lisp (sse2), status Running High P., on host Linux-Compaq:

| Task | Elapsed (CPU) | Progress % | CPU % | Remaining | Deadline |
|---|---|---|---|---|---|
| MindModeling-191-500dedafb1b51_1 | 21:14:39 (21:10:07) | 87.500 | 99.64 | 03:01:44 | 7/24/2012 11:37:13 PM |
| MindModeling-191-500e563ba5995_1 | 15:56:51 (15:53:15) | 87.500 | 99.62 | 02:16:25 | 7/25/2012 4:58:35 AM |
| MindModeling-191-500dd87e36702_0 | 20:55:07 (20:50:35) | 37.500 | 99.64 | 01d,02:21:10 | 7/24/2012 11:37:13 PM |
| MindModeling-191-500e5b311a770_0 | 17:17:07 (17:13:20) | 37.500 | 99.63 | 21:46:32 | 7/25/2012 3:32:59 AM |
| MindModeling-191-500e14ed6828e_4 | 15:51:46 (15:48:11) | 37.500 | 99.62 | 19:59:00 | 7/25/2012 4:58:35 AM |
| MindModeling-191-500dd7fd3a125_0 | 19:46:02 (19:41:38) | 25.000 | 99.63 | 01d,10:17:03 | 7/25/2012 1:03:28 AM |

I will try the reboot thing... :)

Update: Reboot worked. Looks like there is some sync problem that leaves some old stuff running in memory or something... I have no real idea, but this is 64-bit Linux 3.0.23 running on a 1055T processor. | |
| ID: 2090 · Rating: 0 · rate:
| |
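The durations in the listing above come in two shapes, `21:14:39` and `01d,02:21:10`. A small parser makes them comparable, and dividing the bracketed CPU time by the elapsed time reproduces the figure printed after the progress value (99.64 for the first task). The function names are made up for this sketch, and the day-prefixed format is inferred only from the strings shown in the post.

```python
import re

# Optional "NNd," day prefix, then hours:minutes:seconds.
_DURATION = re.compile(r"^(?:(\d+)d,)?(\d+):(\d{2}):(\d{2})$")


def parse_duration(text):
    """Convert 'HH:MM:SS' or 'DDd,HH:MM:SS' into a number of seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_percent(elapsed_text, cpu_text):
    """CPU time as a percentage of elapsed time, rounded to two decimals."""
    return round(100 * parse_duration(cpu_text) / parse_duration(elapsed_text), 2)


print(parse_duration("21:14:39"))           # 76479
print(parse_duration("01d,02:21:10"))       # 94870
print(cpu_percent("21:14:39", "21:10:07"))  # 99.64
```

A value near 100 means the task really is burning CPU the whole time, which is a different failure from the zero-CPU stall reported earlier in the thread.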
|
There are still quite a few tasks that have very long run times. I suspect that in the morning I will have most cores stuck running these and will wind up having to kill them. | |
| ID: 2091 · Rating: 0 · rate:
| |