[error] No close tag in scheduler reply

Message boards : Number crunching : [error] No close tag in scheduler reply

frankhagen
Joined: 22 Oct 08
Posts: 21
Credit: 28,287
RAC: 3
Message 1968 - Posted: 6 Mar 2012, 21:12:14 UTC

06.03.2012 22:02:43 | MindModeling@Beta | update requested by user
06.03.2012 22:02:47 | MindModeling@Beta | Sending scheduler request: Requested by user.
06.03.2012 22:02:47 | MindModeling@Beta | Requesting new tasks for CPU
06.03.2012 22:02:49 | MindModeling@Beta | [error] No close tag in scheduler reply

zombie67 [MM]
Volunteer tester
Joined: 25 Jan 08
Posts: 85
Credit: 1,111,054
RAC: 1
Message 1969 - Posted: 7 Mar 2012, 16:43:51 UTC

Yeah. I cannot return a (failed) task. It's been sitting in my queue for a day or two.
____________
Dublin, CA
Team SETI.USA

frankhagen
Joined: 22 Oct 08
Posts: 21
Credit: 28,287
RAC: 3
Message 1970 - Posted: 7 Mar 2012, 17:01:11 UTC - in response to Message 1969.

Yeah. I cannot return a (failed) task. It's been sitting in my queue for a day or two.


i bet it's affecting 7.x clients again.. :(

Tom
Volunteer moderator
Project administrator
Project developer
Joined: 23 Jun 08
Posts: 311
Credit: 105,388
RAC: 0
Message 1971 - Posted: 7 Mar 2012, 17:56:46 UTC - in response to Message 1970.
Last modified: 7 Mar 2012, 17:58:54 UTC

Yeah. I cannot return a (failed) task. It's been sitting in my queue for a day or two.


i bet it's affecting 7.x clients again.. :(


I'm sorry, what bug are you referring to?

And Zombie, which task is in a waiting queue? Do you have an error message or a link to the workunit?

frankhagen
Joined: 22 Oct 08
Posts: 21
Credit: 28,287
RAC: 3
Message 1972 - Posted: 7 Mar 2012, 19:37:53 UTC - in response to Message 1971.

did you subscribe to http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev and read what's going on there?

zombie67 [MM]
Volunteer tester
Joined: 25 Jan 08
Posts: 85
Credit: 1,111,054
RAC: 1
Message 1977 - Posted: 7 Mar 2012, 22:15:14 UTC - in response to Message 1971.
Last modified: 7 Mar 2012, 22:19:07 UTC

And Zombie, which task is in a waiting queue? Do you have an error message or a link to the workunit?


I would post a link to the task, but I can't. According to my task list, all tasks have been returned. But according to BOINC, one is still sitting on my machine, waiting to be reported.

3/7/2012 2:14:27 PM | MindModeling@Beta | Sending scheduler request: Requested by user.
3/7/2012 2:14:27 PM | MindModeling@Beta | Reporting 1 completed tasks, not requesting new tasks
3/7/2012 2:14:27 PM | MindModeling@Beta | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
3/7/2012 2:14:27 PM | MindModeling@Beta | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices
3/7/2012 2:14:27 PM | MindModeling@Beta | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
3/7/2012 2:14:32 PM | MindModeling@Beta | [error] No close tag in scheduler reply
3/7/2012 2:14:32 PM | MindModeling@Beta | [sched_op] Deferring communication for 1 min 19 sec
3/7/2012 2:14:32 PM | MindModeling@Beta | [sched_op] Reason: can't parse scheduler reply
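
For anyone who wants to pull the failing lines out of a longer client log, here is a minimal sketch. It assumes only the `timestamp | project | message` layout visible in the lines above; the `LogEntry`, `parse_line` and `errors` names are made up for this example and are not part of BOINC.

```python
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    project: str
    message: str


def parse_line(line: str) -> LogEntry:
    # Split on the first two " | " separators only, so a message that
    # itself contains " | " is kept in one piece.
    timestamp, project, message = line.split(" | ", 2)
    return LogEntry(timestamp.strip(), project.strip(), message.strip())


def errors(lines: Iterable[str]) -> List[LogEntry]:
    # Keep only the entries whose message is flagged with [error].
    return [e for e in map(parse_line, lines) if e.message.startswith("[error]")]


# Three lines copied from the log above.
LOG = [
    "3/7/2012 2:14:27 PM | MindModeling@Beta | Reporting 1 completed tasks, not requesting new tasks",
    "3/7/2012 2:14:32 PM | MindModeling@Beta | [error] No close tag in scheduler reply",
    "3/7/2012 2:14:32 PM | MindModeling@Beta | [sched_op] Reason: can't parse scheduler reply",
]

for entry in errors(LOG):
    print(entry.timestamp, entry.project, entry.message)
```

Only the `[error]` line is printed; the `[sched_op]` lines that explain the deferral are skipped, so widen the filter if you want those too.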

This happened on a Win 64 machine running 7.0.18. I just upgraded it to 7.0.20, but there was no change in behavior.

Edit: I looked through the task list on the web site and figured out it is this one. I had to abort it after 4 hours, because it was one of those run-forever tasks.

http://mindmodeling.org/beta/result.php?resultid=252844
____________
Dublin, CA
Team SETI.USA

Gary Wilson
Joined: 25 Nov 08
Posts: 50
Credit: 949,716
RAC: 3,817
Message 1978 - Posted: 8 Mar 2012, 3:21:42 UTC - in response to Message 1977.

I seem to be getting a number of tasks that error out on multiple machines, like this one:

http://mindmodeling.org/beta/workunit.php?wuid=213877

Mark Gallaher
Joined: 1 Feb 08
Posts: 3
Credit: 57,334
RAC: 4
Message 1979 - Posted: 8 Mar 2012, 6:17:41 UTC

I am running Windows 7 Home Premium x64 on a few I7 systems. Some tasks run a minute or so and finish ok.

Several other tasks had hours of run time but only about a minute of CPU time. They clogged up two of the three machines. A CPU time of 9 seconds with a run time of (forever) was not uncommon; I aborted those.

I am running BOINC 6.10.60 on all machines. Is my BOINC client version too out of date for these tasks? I'm sticking on that version due to another project for the next ... year or so.

If there is any other info I can give, let me know. I work, so I can no longer run this project during the day, but I can run some WUs at night when I can keep an eye on things.

My machines are fully Windows patched but otherwise pretty stock.

Hopefully we can figure out what is happening with these problem WUs :)

Gary Wilson
Joined: 25 Nov 08
Posts: 50
Credit: 949,716
RAC: 3,817
Message 1982 - Posted: 8 Mar 2012, 13:46:53 UTC - in response to Message 1979.

That's a known issue on this set of tasks with Win 7 64-bit. Not sure about other versions of Windows.

However, as I noted above, a lot of tasks have errored out for other reasons on my Windows and Linux machines, and many of those have gone 4 strikes and you're out (failed on 4 machines, so the task was marked failed and not reissued). Oddly, my Mac seems to have no problems with any of the tasks, and most of the ones still left in its queue have failed a number of times on other machines. So at least these tasks will probably finish then and not be useless to the project.

Tom
Volunteer moderator
Project administrator
Project developer
Joined: 23 Jun 08
Posts: 311
Credit: 105,388
RAC: 0
Message 1985 - Posted: 8 Mar 2012, 17:14:55 UTC - in response to Message 1982.

That's a known issue on this set of tasks with Win 7 64-bit. Not sure about other versions of Windows.

However, as I noted above, a lot of tasks have errored out for other reasons on my Windows and Linux machines, and many of those have gone 4 strikes and you're out (failed on 4 machines, so the task was marked failed and not reissued). Oddly, my Mac seems to have no problems with any of the tasks, and most of the ones still left in its queue have failed a number of times on other machines. So at least these tasks will probably finish then and not be useless to the project.


Correct. Although we have reduced the frequency of "hung" tasks on Win 7 64-bit, we haven't eliminated them completely. Mark, how often is this afflicting your machines?

Also, one of the new errors is occurring because our model is trying to allocate more memory than is allowed by our CCL application. Specifically, the error you will see is:

# Error message: value 16777216 is not of the expected type (UNSIGNED-BYTE 24).


We'll need to adjust the memory options of our application/model.
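
The value in the error message above is exactly 2^24, one more than the largest integer the Common Lisp type (UNSIGNED-BYTE 24) can hold. The Python check below only illustrates that arithmetic. It is not the project's code, and `fits_unsigned_byte` is a made-up helper name.

```python
def fits_unsigned_byte(value: int, bits: int) -> bool:
    """True if value lies in the range of (UNSIGNED-BYTE bits),
    i.e. 0 <= value < 2**bits."""
    return 0 <= value < 2 ** bits


print(2 ** 24)                             # 16777216, the value in the error
print(fits_unsigned_byte(16777215, 24))    # True: largest value that fits
print(fits_unsigned_byte(16777216, 24))    # False: one past the limit
```

So whatever size is being requested has reached the first value that no longer fits in 24 bits.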

Trog Dog
Joined: 5 Jan 08
Posts: 3
Credit: 31,794
RAC: 42
Message 1988 - Posted: 11 Mar 2012, 1:43:19 UTC

silverbox

2208 MindModeling@Beta 11/03/2012 9:08:44 AM Sending scheduler request: To report completed tasks.
2209 MindModeling@Beta 11/03/2012 9:08:44 AM Reporting 1 completed tasks, not requesting new tasks
2210 MindModeling@Beta 11/03/2012 9:08:48 AM [error] No close tag in scheduler reply
2211 MindModeling@Beta 11/03/2012 9:13:21 AM Sending scheduler request: To report completed tasks.
2212 MindModeling@Beta 11/03/2012 9:13:21 AM Reporting 1 completed tasks, not requesting new tasks
2213 MindModeling@Beta 11/03/2012 9:13:25 AM [error] No close tag in scheduler reply
2217 MindModeling@Beta 11/03/2012 9:23:32 AM Sending scheduler request: To report completed tasks.
2218 MindModeling@Beta 11/03/2012 9:23:32 AM Reporting 1 completed tasks, not requesting new tasks
2219 MindModeling@Beta 11/03/2012 9:23:35 AM [error] No close tag in scheduler reply


Are you doing anything about this error? It's not trivial; it's preventing results from being reported.
____________
CIC1=CC=C(C2=N[C@@H](CC(OC(C)(C)C)=O)C3=NN=C(C)N3C4=C2C(C)=C(C)S4)C=C1

Jack.Harris
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 24 Apr 07
Posts: 499
Credit: 636,587
RAC: 564
Message 1989 - Posted: 11 Mar 2012, 16:56:45 UTC - in response to Message 1972.
Last modified: 11 Mar 2012, 16:57:43 UTC

The only mention of 'No close tag in' in the listserv archives came from Travis back in March 2011.

However, the 'No close tag' in this case does not seem to be related to using a very old version of boinc server code.

Instead, it appears as though there is an incompatibility between the 7.x clients (Development version) and the latest boinc_stable server code.

I suppose the best way to figure out where things went awry is to find which version of the client started causing this problem. With that information it would be possible to identify 'what changed' in the client source code and feed the fix to Rom for a future Development version.

I am aware of a new XML parser that was integrated into some parts of BOINC back in January, which could be related to these issues.

Thanks to Zombie we know 7.0.18 and 7.0.20 have this issue.

Older versions of the development client can be found here http://boinc.berkeley.edu/dl/

Of course, the 'Recommended version' of the BOINC client (6.12.34) is what MindModeling officially supports, but if we know exactly where in the Development version history this problem started, we will help work to resolve this issue.
____________
MindModeling@Home is fun

Trog Dog
Joined: 5 Jan 08
Posts: 3
Credit: 31,794
RAC: 42
Message 1990 - Posted: 13 Mar 2012, 11:14:48 UTC - in response to Message 1989.

Of course, the 'Recommended version' of the BOINC client (6.12.34) is what MindModeling officially supports, but if we know exactly where in the Development version history this problem started, we will help work to resolve this issue.


No probs, however, more and more projects are requiring a 7.0.x client so it will become more of a problem. I will start downgrading through each of the testing versions starting at 7.0.17 and report back.
____________
CIC1=CC=C(C2=N[C@@H](CC(OC(C)(C)C)=O)C3=NN=C(C)N3C4=C2C(C)=C(C)S4)C=C1

Jack.Harris
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 24 Apr 07
Posts: 499
Credit: 636,587
RAC: 564
Message 1991 - Posted: 13 Mar 2012, 11:35:31 UTC - in response to Message 1990.
Last modified: 13 Mar 2012, 11:35:55 UTC

Trog Dog -- let me just say -- 'you are the man'.

You may want to divide and conquer (do a binary search of the versions) -- test a middle one first (7.10ish or so) and narrow things down from there based on whether it worked or not.
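
As a rough sketch of that binary search: the version list and the `is_bad` check below are placeholders invented for the example. In practice `is_bad` would mean installing that client and looking for the error in its log. The search also assumes that once a version shows the problem, every later version shows it too.

```python
from typing import Callable, Optional, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest version for which is_bad is True, or None.

    Assumes versions are ordered oldest to newest and that every version
    after the first bad one is bad as well.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # the first bad version is at mid or earlier
        else:
            lo = mid + 1      # the first bad version is after mid
    return versions[lo] if lo < len(versions) else None


# Hypothetical, hand-written list of client versions (oldest first).
VERSIONS = ["6.12.34", "6.12.42", "7.0.1", "7.0.10", "7.0.17", "7.0.18", "7.0.20"]


def is_bad(version: str) -> bool:
    # Stand-in for a real test run of that client version.
    return version.startswith("7.")


print(first_bad(VERSIONS, is_bad))  # 7.0.1
```

With seven versions this needs at most three test installs instead of seven.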

We really appreciate the support and hopefully we can figure out this 7.X issue soon.
____________
MindModeling@Home is fun

Trog Dog
Joined: 5 Jan 08
Posts: 3
Credit: 31,794
RAC: 42
Message 1993 - Posted: 13 Mar 2012, 14:57:41 UTC - in response to Message 1991.

OK, I just checked out and compiled every 7.0.x tag, and the no close tag error is there in every one. I have one box running 6.12.42 which is OK, so it looks like it's somewhere between changeset 24473 and 24699.
____________
CIC1=CC=C(C2=N[C@@H](CC(OC(C)(C)C)=O)C3=NN=C(C)N3C4=C2C(C)=C(C)S4)C=C1

Jack.Harris
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 24 Apr 07
Posts: 499
Credit: 636,587
RAC: 564
Message 1994 - Posted: 13 Mar 2012, 15:19:19 UTC - in response to Message 1993.

Wow! Fabulous Work.

We'll start with the earliest version of 7 and start diffing from there.

Thanks!!



____________
MindModeling@Home is fun

Jack.Harris
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 24 Apr 07
Posts: 499
Credit: 636,587
RAC: 564
Message 1995 - Posted: 13 Mar 2012, 18:47:34 UTC - in response to Message 1994.
Last modified: 13 Mar 2012, 19:28:19 UTC

Well Trog Dog - thanks for the help
I did a comparison between the 7.0 and 6.12.35 source code and found that 7.0 was using a different (stricter / more correct) XML parser

From that we were able to identify which XML from the scheduler was causing issues.
It turned out to be related to the skinning configuration having a bad closing tag.

Anyway -- 7.X clients should be working now
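
To illustrate the kind of difference involved (a sketch only: the XML fragment and tag names below are invented, not the real scheduler reply, and the "lenient" function is just one way a looser parser could behave, not the old client's actual code):

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical fragments; the second has a misspelled closing tag.
GOOD = "<reply><skin><name>default</name></skin></reply>"
BAD = "<reply><skin><name>default</nme></skin></reply>"


def parses_strictly(text: str) -> bool:
    """True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def lenient_value(text: str, tag: str) -> Optional[str]:
    """Loose extraction: take whatever follows <tag> up to the next '<',
    without ever checking the closing tag."""
    match = re.search(rf"<{re.escape(tag)}>([^<]*)", text)
    return match.group(1) if match else None


print(parses_strictly(GOOD))        # True
print(parses_strictly(BAD))         # False: mismatched closing tag
print(lenient_value(BAD, "name"))   # default
```

A loose reader can get the value out of the broken fragment without noticing anything, while a strict parser rejects the whole document. That would fit with the older clients not showing the error.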
____________
MindModeling@Home is fun

frankhagen
Joined: 22 Oct 08
Posts: 21
Credit: 28,287
RAC: 3
Message 1997 - Posted: 13 Mar 2012, 20:31:46 UTC - in response to Message 1995.
Last modified: 13 Mar 2012, 20:32:46 UTC

Well Trog Dog - thanks for the help
I did a comparison between the 7.0 and 6.12.35 source code and found that 7.0 was using a different (stricter / more correct) XML parser

From that we were able to identify which XML from the scheduler was causing issues.
It turned out to be related to the skinning configuration having a bad closing tag.

Anyway -- 7.X clients should be working now


only took you 6 days after i told you it's affecting 7.x clients - pretty sportive.

i also told you that creditnew will fry you with your setup of wu's.
are you watching what's happening to credits?

Jack.Harris
Volunteer moderator
Project administrator
Project developer
Project scientist
Posts: 499
Credit: 636,587
RAC: 564
Message 1998 - Posted: 14 Mar 2012, 14:45:50 UTC - in response to Message 1970.

Thanks -- you were right -- it was 7.X related.

As for credit: while looking through the DB, I have not found a credit inconsistency due to this issue.

If this is not the case, just let us know the result (e.g., link) where credit should have been assigned and wasn't and we will see what we can do.


____________
MindModeling@Home is fun

frankhagen
Joined: 22 Oct 08
Posts: 21
Credit: 28,287
RAC: 3
Message 1999 - Posted: 14 Mar 2012, 19:28:56 UTC - in response to Message 1998.

Thanks -- you were right -- it was 7.X related.


only you got access to the server-code you are running.

boinc is not XML-standard compatible - this has fouled up a lot of projects over the years if they changed something - several times - go ask google.

it might even be simply a missing CR/LF which causes 7.x clients to be unable to parse the scheduler reply.

if you stick to supporting only 6.x stable, you'll be out of business soon because more and more projects require 7.x clients.




Copyright © 2013 MindModeling.org