Message boards : Number crunching : One WU says it will take 4+ days to complete - normal time is < 45 minutes

David Ball
Joined: 9 Dec 12
Posts: 6
Credit: 141,068
RAC: 0
Message 3623 - Posted: 18 Jan 2016, 18:57:47 UTC
Last modified: 18 Jan 2016, 18:57:47 UTC

I have a workunit which has run for 2 hours 23 minutes and is only 2% complete. It says it will take 4 days and 21 hours to complete.

https://mindmodeling.org/workunit.php?wuid=20966706

I have another which has run for 25 minutes but is also still at 2%. Should I abort these workunits?

PRL
Joined: 2 Aug 09
Posts: 4
Credit: 82,326
RAC: 61
Message 3624 - Posted: 18 Jan 2016, 19:08:41 UTC
Last modified: 18 Jan 2016, 19:08:41 UTC

I have had a few of those recently. As far as I can tell, with only six hours to complete the task, they do not finish on time and are invalidated. All I see is them disappearing from BOINC, having wasted processing time.
I am not accepting new tasks until this is resolved.

Reeltime
Joined: 31 Jan 08
Posts: 16
Credit: 122,275
RAC: 0
Message 3632 - Posted: 21 Jan 2016, 8:35:31 UTC
Last modified: 21 Jan 2016, 8:35:31 UTC

It's because they sit at 2% for the whole crunching time. I've found that if you start a unit, you will get credit for it, even if it is past the deadline. Generally, units are taking up to 2.5 hours on my machine.

PRL
Joined: 2 Aug 09
Posts: 4
Credit: 82,326
RAC: 61
Message 3633 - Posted: 21 Jan 2016, 16:44:43 UTC - in response to Message 3632.
Last modified: 21 Jan 2016, 16:44:43 UTC

As far as I can tell, once a work unit is past the deadline, it is re-sent to somebody else. Even if you have spent time processing it, whoever finishes first gets the credit.

Six hours seems rather short.

tferr
Joined: 19 Nov 15
Posts: 3
Credit: 919,696
RAC: 918
Message 3635 - Posted: 21 Jan 2016, 19:36:39 UTC
Last modified: 21 Jan 2016, 19:36:39 UTC

My machine has rocked through all the WUs I've done over the past few months in around an hour each, and now, all of a sudden, they will not complete. I even let a few run for over a day and they still did not complete. I will be suspending all MindModeling WUs until this issue is resolved.

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3636 - Posted: 22 Jan 2016, 14:07:00 UTC
Last modified: 22 Jan 2016, 14:07:00 UTC

Thanks for letting me know, guys.

I will take a look at this.

Reeltime, it looks like your work units are returning just fine.

PRL, I couldn't see any of your recent work units.

David Ball, all the work units you ran yesterday validated and were complete, so I would try running a few more and see if you run into an issue.

tferr, I have seen the work units of yours that have the issue and will take a look at them.

Thanks for supporting us and happy crunching,

Brandon

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3640 - Posted: 22 Jan 2016, 15:50:56 UTC
Last modified: 22 Jan 2016, 15:50:56 UTC

Also, the Python app currently does not report completion-percentage updates back to BOINC.

So if your work unit is stuck at 2%, don't stop it; it is still working properly.
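A rough illustration of why the time-remaining estimate keeps rising when progress is frozen: if the remaining time is extrapolated linearly from the reported fraction done (an assumption; the BOINC client's real estimator may also weigh in the static estimate), a fraction stuck at 2% makes the estimate grow by about 49 seconds for every second of run time. The helper names `remaining_seconds` and `fmt_dhm` are hypothetical.

```python
def remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Hypothetical linear extrapolation of time remaining.

    Assumes progress is proportional to run time:
    total = elapsed / fraction_done, remaining = total - elapsed.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done - elapsed_s


def fmt_dhm(seconds: float) -> str:
    """Format seconds as 'Xd Yh Zm', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    return f"{days}d {hours}h {minutes}m"


if __name__ == "__main__":
    # Figures from the first post: 2 h 23 min elapsed, still at 2%.
    elapsed = (2 * 60 + 23) * 60
    print(fmt_dhm(remaining_seconds(elapsed, 0.02)))  # prints 4d 20h 47m
```

Under this assumption the result, about 4 days 21 hours, lines up with the estimate in the first post, which suggests the long estimates come from the frozen progress value rather than from the task itself.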

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3641 - Posted: 22 Jan 2016, 15:57:47 UTC
Last modified: 22 Jan 2016, 15:57:47 UTC

The job that was causing issues was suspended and the issue is being resolved.

The available work units are from a different job and should have no more problems getting completed on time.

PRL
Joined: 2 Aug 09
Posts: 4
Credit: 82,326
RAC: 61
Message 3646 - Posted: 24 Jan 2016, 2:55:52 UTC - in response to Message 3636.
Last modified: 24 Jan 2016, 2:56:02 UTC

Brandon

Thanks for your help. Given your other comments, I will give it another try.

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3648 - Posted: 25 Jan 2016, 2:15:01 UTC
Last modified: 25 Jan 2016, 2:15:01 UTC

Awesome!! Let me know if you run into any other issues.

tferr
Joined: 19 Nov 15
Posts: 3
Credit: 919,696
RAC: 918
Message 3649 - Posted: 25 Jan 2016, 5:06:13 UTC
Last modified: 25 Jan 2016, 5:06:13 UTC

Wonderful, I will keep up the crunching.

marmot
Joined: 6 Dec 15
Posts: 12
Credit: 2,071,787
RAC: 0
Message 3654 - Posted: 25 Jan 2016, 22:35:02 UTC
Last modified: 25 Jan 2016, 22:35:02 UTC

Yeah, I just aborted a WU like this. It was suspended in RAM after the machine ran out of all available memory (Swap+Physical) and I think it's related to the growing memory issue.

Gary Wilson
Joined: 25 Nov 08
Posts: 56
Credit: 3,088,536
RAC: 907
Message 3656 - Posted: 26 Jan 2016, 2:31:56 UTC
Last modified: 26 Jan 2016, 2:31:56 UTC

The 2.5-hour timeout was actually put in, probably more than a year ago, because this has long been an issue on Windows machines. So the task isn't really finishing all the way; it's just hitting a self-imposed time limit that keeps it from running forever. Just let them run. They will all stop at 2.5 hours anyway, and you get credit for them.
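A self-imposed limit like this can be sketched as a wall-clock watchdog around the model run. This is a hypothetical illustration, not the project's actual code: the helper name `run_with_time_limit`, running the model as a child process, and the 9,000-second (2.5-hour) default are all assumptions.

```python
import subprocess
import sys
from typing import Sequence

TIME_LIMIT_S = 2.5 * 60 * 60  # assumed 2.5-hour limit, as described in the thread


def run_with_time_limit(cmd: Sequence[str], limit_s: float = TIME_LIMIT_S) -> bool:
    """Run cmd; return True if it exits on its own, False if the limit is hit.

    subprocess.run kills the child when the timeout expires and then
    raises TimeoutExpired, so a stuck run cannot keep going forever.
    """
    try:
        subprocess.run(cmd, timeout=limit_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    # A child that sleeps for 5 s, checked against a 1 s limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_time_limit(slow, limit_s=1.0))  # prints False
```

A watchdog of this kind stops the run at the limit no matter what progress value was last reported, which fits the description of tasks ending at about 2.5 hours while still showing 2%.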

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3657 - Posted: 26 Jan 2016, 2:37:18 UTC
Last modified: 26 Jan 2016, 2:37:18 UTC

Hello marmot,

I have sent an email to the modeler to verify these results.

After running a number of work units from this job through my system, I see that each task uses about 1 GB of RAM for its working directory.

This seems to be the norm for this job.

I'll update again if I have any more information.

Thanks for supporting us and happy crunching,

Brandon

Atis Kozulis
Joined: 28 Feb 14
Posts: 1
Credit: 387
RAC: 0
Message 3663 - Posted: 29 Jan 2016, 13:42:24 UTC
Last modified: 29 Jan 2016, 13:42:24 UTC

FYI: I got 3 tasks, two of which are 'stuck' at 2%. They are:
Native Python v2.7 Application (Windows Only) 1.10
- MindModeling_4972_1652956ab483fba0dc (estimated time to completion 4 days and rising; has already run for 2h)
- MindModeling_4979_1653256ab496b90072 (same estimate)
(I'll let them 'expire' if you wish.)

The third was about to complete in 3-3.5h (already at 41%)... hmm, or not (it has run for 1h39m and is 'stuck' at 41.308%) :S
- MindModeling_4979_1653256ab5001c14f8

Task Manager shows 3 python2.7.exe processes, each running a full core; RAM: 416'104K and 2x 23'684K respectively. Maybe this will help diagnose the problem. (RAM consumption doesn't increase.)

Regards,
Atis

P.S. Suspending new job requests for now.

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3664 - Posted: 29 Jan 2016, 14:23:09 UTC
Last modified: 29 Jan 2016, 14:23:09 UTC

Thanks for letting me know; I'll take a look at this.

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3665 - Posted: 29 Jan 2016, 16:05:26 UTC
Last modified: 29 Jan 2016, 16:05:26 UTC

Hey Atis,

Can you please post a link to a few of your work units that are giving you issues?

Thanks for supporting us and happy crunching,

Brandon

marmot
Joined: 6 Dec 15
Posts: 12
Credit: 2,071,787
RAC: 0
Message 3677 - Posted: 7 Feb 2016, 23:44:17 UTC - in response to Message 3656.
Last modified: 7 Feb 2016, 23:46:38 UTC

But the 2.5-hour limit isn't working.

I have a WU at 3 hours 31 minutes and it reports 7 days remaining.

The server shows a "timed out" message, and even after clicking Update on the local machine, the work unit is neither completed nor cancelled server-side. I will have to abort this WU manually.

Here, my machine is still working on it, and it has already been given to someone else's machine:
http://mindmodeling.org//workunit.php?wuid=21923780

tferr
Joined: 19 Nov 15
Posts: 3
Credit: 919,696
RAC: 918
Message 3722 - Posted: 4 Apr 2016, 16:58:18 UTC
Last modified: 4 Apr 2016, 16:58:18 UTC

Well, it appears this may be happening again. I believe they do end up finishing around the two-and-a-half-hour mark regardless of what the estimate says, but it's something to note.

Just giving a heads-up.

Thanks guys!

Brandon
Project administrator
Project developer
Project tester
Joined: 5 Jan 15
Posts: 261
Credit: 1,456,117
RAC: 0
Message 3727 - Posted: 6 Apr 2016, 13:52:41 UTC
Last modified: 6 Apr 2016, 13:52:41 UTC

Hello tferr,

Thanks for the heads up.

Thanks for your support and happy crunching,

Brandon



Copyright © 2018 MindModeling.org