Posts by ChertseyAl |
1)
Message boards :
Number crunching :
Negative CPU - WTH?!
(Message 2141)
Posted 292 days ago by ChertseyAl
D'oh, forgot to mention the existing thread on this project: http://mindmodeling.org/beta/forum_thread.php?id=580 Cheers, Al. |
|
2)
Message boards :
Number crunching :
Negative CPU - WTH?!
(Message 2140)
Posted 292 days ago by ChertseyAl
It's happening with other projects too. I'll see if I can find the relevant threads on those message boards. Cheers, Al. |
|
3)
Message boards :
Number crunching :
60+ hours... is this normal?
(Message 2093)
Posted 329 days ago by ChertseyAl
I aborted my last long-running WUs as they weren't getting anywhere. One had reached nearly 30 hours. When I aborted them, the message log had "[error] Can't rename output file" for each WU - Don't know if that's a clue. Oh, and as reported elsewhere, the app continues to run and burn up CPU time even though the WU has been aborted and reported. Sadly I didn't notice this yesterday, so some of my hosts spent 12 hours running phantom WUs overnight :( Cheers, Al. |
|
4)
Message boards :
Number crunching :
60+ hours... is this normal?
(Message 2085)
Posted 330 days ago by ChertseyAl
"As an FYI, a number of the 191 tasks are also experiencing long run times > 1 hour"
Yes, I have the same thing. Quite a few 191s crunching for 7-12 hours, still using 100% CPU, but looking to complete way after deadline. I've aborted a bunch that hadn't started, and some that were clearly going to run far too long. For educational (comedy?) purposes, I've left a few to run. I think they might complete, even if after deadline. Might learn something. Might not. Whatever :) Cheers, Al. |
|
5)
Message boards :
Number crunching :
Program still listed in Task list after I suspend work.
(Message 2078)
Posted 330 days ago by ChertseyAl
Maybe connected with this? http://mindmodeling.org/beta/forum_thread.php?id=574&nowrap=true#2061 Cheers, Al. |
|
6)
Message boards :
Number crunching :
60+ hours... is this normal?
(Message 2064)
Posted 342 days ago by ChertseyAl
"I discovered that with 100% repeatability, if the task is suspended and restarts for ANY reason on my systems, with keep tasks in memory NOT SELECTED, then the tasks will go into overtime while using ZERO CPU time and never finish properly."
Actually, way back in 2008(!) a suspend/resume bug was found, and never fixed AFAIK. There was a problem with the checkpointing meaning that only the last part of the run of the WU was reported, meaning the credit calculation was way too low. Maybe it was fixed, but introduced this bug. I never used to use 'keep tasks in memory', but this and another project at the time needed that option selected to run properly, so I've left it enabled since then. FWIW, I'm not sure that there's much of a downside to keeping in memory, other than it uses more swap space. Maybe it's important if you have your swap on an SSD? Cheers, Al. |
|
7)
Message boards :
Number crunching :
New task mostly resulting in computation errors
(Message 2043)
Posted 361 days ago by ChertseyAl
"It is an annoying message that looks like an error but the results are still good."
Thanks for the quick feedback, especially at the weekend! I'll keep on crunching :) Cheers, Al. |
|
8)
Message boards :
Number crunching :
New task mostly resulting in computation errors
(Message 2041)
Posted 361 days ago by ChertseyAl
I'm seeing a lot of errors in my message logs like this:
23/06/12 12:10:53|MindModeling@Beta|Starting MindModeling-164-4fe57dac18ac5_0
23/06/12 12:10:56|MindModeling@Beta|Starting task MindModeling-164-4fe57dac18ac5_0 using ccl_wrap version 175
23/06/12 12:46:19|MindModeling@Beta|[error] Can't rename output file MindModeling-164-4fe57dac18ac5_0_1
23/06/12 12:46:20|MindModeling@Beta|Computation for task MindModeling-164-4fe57dac18ac5_0 finished
23/06/12 12:46:20|MindModeling@Beta|Starting MindModeling-164-4fe57e070ce58_0
23/06/12 12:46:24|MindModeling@Beta|Starting task MindModeling-164-4fe57e070ce58_0 using ccl_wrap version 175
23/06/12 12:46:27|MindModeling@Beta|Started upload of MindModeling-164-4fe57dac18ac5_0_0
23/06/12 12:46:31|MindModeling@Beta|Finished upload of MindModeling-164-4fe57dac18ac5_0_0
The WU for that one was:
http://mindmodeling.org/beta/workunit.php?wuid=432068
http://mindmodeling.org/beta/result.php?resultid=517531
I'm getting loads of these, but they are difficult to find as I'm crunching so many :) I don't see any errored or invalid WUs in my tasks list, so presumably it's not important? Cheers, Al. |
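Since these entries are hard to spot in a busy message log, here is a minimal sketch of how one might filter them out. It assumes the log has been saved as plain text with one `timestamp|project|message` entry per line, and that the timestamps are DD/MM/YY (an assumption based on the dates in the post). The function names `parse_line` and `find_errors` are hypothetical helpers, not part of BOINC.

```python
from datetime import datetime

# Sample entries copied from the post above; a real log would be read from a file.
SAMPLE_LOG = """\
23/06/12 12:10:53|MindModeling@Beta|Starting MindModeling-164-4fe57dac18ac5_0
23/06/12 12:46:19|MindModeling@Beta|[error] Can't rename output file MindModeling-164-4fe57dac18ac5_0_1
23/06/12 12:46:20|MindModeling@Beta|Computation for task MindModeling-164-4fe57dac18ac5_0 finished
"""


def parse_line(line):
    """Split one 'timestamp|project|message' line; return None if malformed."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        # Assumption: day/month/two-digit-year, as the post's dates suggest.
        when = datetime.strptime(stamp, "%d/%m/%y %H:%M:%S")
    except ValueError:
        return None
    return when, project, message


def find_errors(text, project="MindModeling@Beta"):
    """Return (timestamp, message) pairs for '[error]' entries of one project."""
    hits = []
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, proj, message = parsed
        if proj == project and message.startswith("[error]"):
            hits.append((when, message))
    return hits


if __name__ == "__main__":
    for when, message in find_errors(SAMPLE_LOG):
        print(when.isoformat(), message)
```

The output file name at the end of each error message carries the task name, so the hits can be matched against the tasks list by hand.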
|
9)
Message boards :
Number crunching :
New task mostly resulting in computation errors
(Message 2040)
Posted 362 days ago by ChertseyAl
Now that I can see my results page again I can confirm that I've had no errors at all :) Just one WU pending at the moment. Nice!
"I've temporarily increased the task deadline to 2 days for all newly created workunits. Since we don't generate all the workunits for a job up front, you should start downloading these longer workunits very soon (but maybe not immediately). We'll have to be more dynamic about this in the future. Currently, we hard code the delay_bound value, so all workunits have the same deadline despite the size of the job. In the future, we'll set the deadline as a function of the total estimated runtime of a job."
Good news. I don't think I've got to the longer deadline WUs yet, but it will be neat when the deadlines adjust to suit the volume of work. Should make things much easier for the crunchers. I don't mind short bursts of 'urgent' work, but days of it at a time can be a bit tricky to manage :) Looking at the percentage complete of the current batch I'm looking forward to a week of solid crunching :) Cheers, Al. |
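To illustrate what "the deadline as a function of the total estimated runtime of a job" could look like, here is a purely hypothetical sketch. The post does not say what function the admins intend to use. The scaling fraction, the one-day floor, the seven-day ceiling and the name `delay_bound_for_job` are all assumptions made up for this example; only `delay_bound` itself (the per-workunit deadline, in seconds) comes from the quoted text.

```python
DAY = 24 * 60 * 60  # seconds


def delay_bound_for_job(estimated_job_runtime_s,
                        fraction=0.25,
                        floor_s=1 * DAY,
                        ceiling_s=7 * DAY):
    """Hypothetical policy: scale the deadline with the job size, within limits.

    estimated_job_runtime_s: estimated time for the whole job to finish, in seconds.
    Returns a delay_bound in seconds, never below floor_s or above ceiling_s.
    """
    if estimated_job_runtime_s < 0:
        raise ValueError("estimated runtime must be non-negative")
    scaled = estimated_job_runtime_s * fraction
    return int(min(max(scaled, floor_s), ceiling_s))


if __name__ == "__main__":
    for days in (1, 8, 100):
        bound = delay_bound_for_job(days * DAY)
        print(f"job of ~{days} days -> deadline of {bound / DAY:.1f} days")
```

With these made-up numbers, a short job keeps a one-day deadline while a big batch gets a longer one, which is the behaviour the crunchers were asking for: fewer hosts stuck in high-priority mode during long runs.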
|
10)
Message boards :
Number crunching :
New task mostly resulting in computation errors
(Message 2037)
Posted 362 days ago by ChertseyAl
The new huge batch is good news. Well, sort of ... I think your database is getting clogged up with so many results coming in. I'm unable to access my tasks list or even view my computers via my account page. Well, maybe if I waited for more than 10 minutes it would work, but I get easily bored ;) Anyway, from what I saw earlier today all of my WUs are running to completion and validating (all XP 32-bit at the moment). My only problem is the short deadline forcing panic mode on all hosts and limiting the amount of work I'm able to get (no work fetch when stuck in high priority) and also blocking other projects. Given the size of this batch and the length of time it's going to take to complete, maybe the deadline could be increased to 2 or 3 days? Anyway, nice to have some work to crunch :) Cheers, Al. |
|
11)
Message boards :
Number crunching :
Credit New?
(Message 2023)
Posted 384 days ago by ChertseyAl
p.s. Forgot to say that I do have a lot of sympathy for the project admins who just want to get the work done and not worry about the technical details. It's a shame that the BOINC software puts them in the firing line sometimes when they haven't deliberately done anything to upset the crunchers. Of course, some admins like to get their hands dirty and try to make the project fun, and worthwhile for the crunchers (not just in terms of credit, but with feedback, fixes etc). But only when they don't vanish on "Lost Weekends" ;) [In-joke for some of us] I'll shut up now. Cheers, Al. |
|
12)
Message boards :
Number crunching :
Credit New?
(Message 2022)
Posted 384 days ago by ChertseyAl
From what I've seen with CreditRandom so far (and not talking specifically about this project, which doesn't seem badly afflicted from my POV), each batch of different work starts with decent credit, then fizzles out down to the 'not worth the electricity cost' level. I think it must recalibrate when the app changes (which would be sensible, if you could ever call CreditRandom sensible!). Solution: Crunch until the credit starts dropping, reset the project, wait for a new app version, rinse and repeat. Of course, this is terrible for the project as so much work has to be resent and causes delays, but that's the result of a choice made by the project admins. This is the approach I've taken with a couple of projects in the past. I'm not *that* concerned with credit, but when I'm paying for the electricity I don't like my contribution devalued by as much as 90% in some cases. Anyway. Whatever. Yeah ;) Cheers, Al. |
|
13)
Message boards :
Number crunching :
Rarely able to get work although it appears to be available
(Message 2021)
Posted 384 days ago by ChertseyAl
Thanks for the explanation - Makes sense now :) I know that the project has (historically) not had much work available, and I was just concerned that something was broken. I'll open up my hosts for work again once the current SIMAP run is ended, and hopefully start getting some work, even if sporadically :) Cheers, Al. |
|
14)
Message boards :
Number crunching :
Rarely able to get work although it appears to be available
(Message 2016)
Posted 391 days ago by ChertseyAl
Ah, no, I think maybe you've missed my point. There *is* work available according to the home page, but none gets sent. At the moment you have "vdv-SecondLife... " showing as only 5% complete, but none of my machines can get any of the remaining 95% :) I did get a brief flurry of work yesterday, but nothing since. Just tried an 'update' on one machine:
24/05/12 19:03:30|MindModeling@Beta|Sending scheduler request: Requested by user. Requesting 43423 seconds of work, reporting 0 completed tasks
24/05/12 19:03:36|MindModeling@Beta|Scheduler request succeeded: got 0 new tasks
24/05/12 19:03:36|MindModeling@Beta|Message from server: Project has no tasks available
Same with both Windows and Linux hosts (all 32 bit). I've got a dozen machines requesting work regularly at the moment. Maybe there's some platform specific limitation? Maybe the homepage is out of date and all of the work has actually been sent out? Cheers, Al. |
|
15)
Message boards :
Number crunching :
Rarely able to get work although it appears to be available
(Message 2010)
Posted 398 days ago by ChertseyAl
For a while now I've been noticing that although there is plenty of work available, I rarely get any. My hosts are asking for work, but just get "project has no tasks available". Then suddenly I'll get a few dozen WUs across my hosts, and then nothing for days. Is this deliberate behaviour, or is there something amiss? Cheers, Al. |
|
16)
Message boards :
Number crunching :
New task mostly resulting in computation errors
(Message 2005)
Posted 405 days ago by ChertseyAl
Not 'mostly' but ALL - Not one single successful WU here, but 932 errors! Al. |
|
17)
Message boards :
Number crunching :
Not getting any WU
(Message 1925)
Posted 532 days ago by ChertseyAl
"I checked the server status and it shows that there are WU's available."
The front page shows no work available ... Current Jobs None ... And the server status page at http://mindmodeling.org/beta/server_status.php is not properly configured, so does not show anything (this needs fixing, admins!). Not sure where you are seeing that there is work available :) Al. |
|
18)
Message boards :
Number crunching :
Downloads hanging up.
(Message 1855)
Posted 754 days ago by ChertseyAl
"Will abort tomorrow if it is not done"
With runtimes measured in milliseconds, hopefully it will finish on time ;) My longest WU ran for 0.09 seconds :) Al. |
|
19)
Message boards :
Number crunching :
No Credit
(Message 1821)
Posted 813 days ago by ChertseyAl
I only got one WU that apparently ran in 0.05 seconds ;) Maybe it took longer, but perhaps the old checkpointing bug is still there? Al. |
|
20)
Questions and Answers :
Windows :
replace file error, just started in the last few days
(Message 1049)
Posted 1722 days ago by ChertseyAl
I don't micromanage my hosts, so that one could have been sitting there for hours/days doing nothing :( Sadly, today, every host that I run MM on has been affected by this. The only reason that I noticed was that as soon as I VNC'd into my hosts to open up SIMAP they were all showing the dreaded dialog box. I've no doubt that running MM as the only project doesn't show this problem (clue: it only happens when resuming after being checkpointed out), but I have minimal computing power and have to share it around many projects. I can't afford to leave hosts dead waiting for user input when they have no KVM :( I'm afraid I'll have to drop this project for a while, or maybe just leave one host crunching only MM - Do you have enough work to keep this host running? Al. |