Error: no complex attribute for threshold np_load_avg
OGE: no value for load_avg

(Server Fault question, asked Sep 9 '14; tags: linux, centos, gridengine; http://serverfault.com/questions/627180/oge-no-value-for-load-avg)

There is a problem with my OGE configuration. The load_avg for the nodes does not get set (it remains at -NA-). Because of this, and because of the np_load_avg threshold on the queue, no jobs are being run.

    [ce@node1 ce]$ qhost -F -l h=node2
    HOSTNAME  ARCH  NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    -------------------------------------------------------------------------------
    node2     -     -     -     -       -       -       -

No errors show up in default/spool/localhost/messages nor in qmaster/messages. The queue scheduling message is "no value for complex attribute np_load_avg". I see no indication of what could be going wrong; the following all work on the execution node: gethostname, gethostbyname master, qstat -f, loadcheck.

Answer (accepted, 1 vote):

The problem was in my /etc/hosts file. I had:

    127.0.0.1 node2

This had to become:

    10.0.0.2 node2

finally giving me:

    [ce@node1 ce]$ qhost -F -l h=node2
    HOSTNAME  ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    -------------------------------------------------------------------------------
    node2     linux-x64  8     0.00  31.3G   308.8M  11.9G   0.0

and:

    [ce@node2 ce]# utilbin/linux-x64/gethostname
    Hostname: node2
    Aliases:
    Host Address(es): 10.0.0.2
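Why the loopback entry breaks load reporting (my reading; the answer does not spell this out): the execution daemon reports load under the address its own hostname resolves to, so when node2 maps to 127.0.0.1 the qmaster's idea of the host and the execd's idea no longer match, and the load values never attach to the queue instance. Schematically, for /etc/hosts on each node (addresses taken from the answer):

```
# Broken: the node's own name resolves to loopback, so qmaster and
# execd disagree about which host is reporting load.
127.0.0.1   localhost localhost.localdomain node2

# Fixed: the node's name resolves to its real cluster address.
127.0.0.1   localhost localhost.localdomain
10.0.0.2    node2
```

The same class of mistake is common on installs where the distro installer appended the hostname to the 127.0.0.1 line.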
Re: …work on Linux/PPC64 (gridengine-users mailing list, September 2008; archived at http://arc.liv.ac.uk/pipermail/gridengine-users/2008-September/020456.html)

Strange. I've never seen that "error: no complex attribute for threshold np_load_avg" error before in qstat output. I don't think it is a communication problem: if that were the case, your queue instances would be in state "au", with the (u) meaning "unreachable". Also, when SGE does not hear from a compute node after a timeout threshold, those "-NA-" fields should fall back to the built-in safety default of "99.99". We may have to wait for someone more familiar with the source code to divine the root cause behind that error message; failing that, the SGE dev list may have more insight.

-Chris

On Sep 25, 2008, at 5:45 PM, Nick Tan wrote:

> Hi Chris,
>
> I ran utilbin/loadcheck and got this:
>
>     arch            lx26-ppc64
>     num_proc        2
>     load_short      0.00
>     load_medium     0.00
>     load_long       0.00
>     mem_free        4130.062500M
>     swap_free       2047.992188M
>     virtual_free    6178.054688M
>     mem_total       4363.109375M
>     swap_total      2047.992188M
>     virtual_total   6411.101562M
>     mem_used        233.046875M
>     swap_used       0.000000M
>     virtual_used    233.046875M
>     cpu             0.3%
>
> It looks like it can collect the data, so would that indicate a
> communication error then?
>
> Thanks,
>
> Nick
>
> Chris Dagdigian wrote:
>> Hi Nick,
>> I'm guessing that maybe the PDC part of SGE on your ppc systems is
>> unable to poll the Apple nodes to get load and state status.
>> Can you try the following?
>> Run the utilbin/loadcheck program on your PPC systems and see what
>> comes back?
>> Running it on my OS X Intel MacBook Pro returns:
>>
>>>     $ /opt/sge/utilbin/darwin-x86/loadcheck
>>>     arch            darwin-x86
>>>     num_proc        2
>>>     load_short      1.35
>>>     load_medium     1.37
>>>     load_long       1.39
>>>     mem_free        2044.082031M
>>>     swap_free       0.000000M
>>>     virtual_free    2044.082031M
>>>     mem_total       4096.000000M
>>>     swap_total      0.000000M
>>>     virtual_total   4096.000000M
>>>     mem_used        2051.917969M
>>>     swap_used       0.000000M
>>>     virtual_used    2051.917969M
>>>     cpu             45.5%
>>
>> If you can't find the equivalent for your PPC/Linux setup then I think
>> that may be the issue (SGE is running but can't collect local
>> performance data).
>> Regards,
>> Chris
>>
>> On Sep 25, 2008, at 2:26 AM, Nick Tan wrote:
>>> Hi all,
>>>
>>> I am setting up a cluster with 33 nodes running Linux on x86_64
>>> (SunFire X2100) and 40 nodes running Linux on ppc64 (Apple Xserve
>>> G5 cluster node).
>>>
>>> I am using the precompiled SGE binaries for the x86_64 nodes, which
>>> are working fine. I have comp[…]
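Chris's loadcheck diagnostic can be repeated across every execution host at once; a sketch, assuming passwordless ssh, $SGE_ROOT exported on the remote side, and the stock util/arch helper (these assumptions are mine, not from the thread):

```shell
#!/bin/sh
# Run loadcheck on all execution hosts known to the qmaster.
# `qconf -sel` lists the configured exec hosts; util/arch prints the
# architecture string used to name the per-platform utilbin directory.
for host in $(qconf -sel); do
    echo "== $host =="
    ssh "$host" '"$SGE_ROOT"/utilbin/$("$SGE_ROOT"/util/arch)/loadcheck'
done
```

Any host that fails to print the arch/num_proc/load_* table is a candidate for the "SGE is running but can't collect local performance data" failure mode described above.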
Using Open Containers with runc in a Univa Grid Engine Compute Cluster (2015-06-28)

runc is a tool written in Go which creates and starts up a Linux container according to the OCF (Open Container Format) specification. Its source code repository can be found here. If you have a Go development environment then building it is very simple - just follow the instructions in the README (they use a Makefile which internally calls godep, the standard tool for handling package dependencies in Go; you probably need to install it as well). After installing the single runc binary you can start up containers right on the command line by pointing runc to a JSON description of the container. The container root filesystem obviously also needs to be on the file system, so that runc can chroot into it. One major difference from Docker itself is that runc does not do any kind of image management, but this is probably not required if you have a good shared filesystem.

How to use runc in Univa Grid Engine

After runc is verified to run on the command line, it is time to use it under the control of Univa Grid Engine in order to exploit your compute cluster's resources. The integration can be very straightforward, depending on what you want to achieve; I keep it here as simple as possible. First of all you want to submit the container, described by an Open Container Format (OCF) JSON description, to Univa Grid Engine, and probably also use Grid Engine's resource management for handling cgroups and other limitations. This is possible since all container processes are children of runc - no daemon is in play here. In order to set up running runc you can override the starter_method in the Univa Grid Engine queue configuration.
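A minimal sketch of what such a starter_method override could look like (the script path, argument convention, and queue name are my assumptions, not the post's verbatim setup):

```shell
#!/bin/sh
# runc_starter.sh - hypothetical starter_method for a runc-enabled queue.
# Installed via:  qconf -mq all.q   ->   starter_method /opt/cluster/bin/runc_starter.sh
#
# Grid Engine invokes the starter method with the job's command line as
# arguments. Here the job is assumed to submit the path of a directory
# containing the container's rootfs and its OCF JSON description.
CONTAINER_DIR="$1"

cd "$CONTAINER_DIR" || exit 100   # exit 100 puts the job into error state

# runc reads the JSON container description from the current directory
# and starts the container as a direct child of this process, so the
# whole container process tree stays under Grid Engine's control
# (cgroup limits, accounting, signal delivery on qdel).
exec runc
```

Because runc replaces the starter via exec, Grid Engine sees the container as an ordinary job process tree, which is exactly what makes the "no daemon in play" property useful here.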
…format of the template file for the cluster queue configuration. Via the -aq and -mq options of the command, you can add cluster queues and modify the configuration of any queue in the cluster. Any of these change operations can be rejected as a result of a failed integrity verification.

The queue configuration parameters take as values strings, integer decimal numbers, or boolean, time, and memory specifiers (see time_specifier and memory_specifier), as well as comma-separated lists. Note that Grid Engine allows backslashes (\) to be used to escape newline (\newline) characters. The backslash and the newline are replaced with a space (" ") character before any interpretation.

FORMAT

The following list of parameters specifies the queue configuration file content:

qname
    The name of the cluster queue as defined for queue_name. As template default, "template" is used.

hostlist
    A list of host identifiers as defined for host_identifier. For each host, Grid Engine maintains a queue instance for running jobs on that particular host. Large numbers of hosts can easily be managed by using host groups rather than single host names. As list separators, white space and "," can be used (template default: NONE). If more than one host is specified, it can be desirable to specify divergences from the parameter settings below for certain hosts. These divergences can be expressed using the enhanced queue configuration specifier syntax. This syntax is built, separately for each parameter, on the regular parameter specifier syntax: "["host_identifier=
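The divergence syntax the excerpt begins to describe looks like this in practice (a sketch with illustrative host and hostgroup names; the default value comes first, followed by per-host overrides in brackets):

```
# Queue configuration fragment: 4 slots per queue instance by default,
# 8 slots on node2, and 16 slots on every host in the @bighosts group.
slots   4,[node2=8],[@bighosts=16]
```

The same bracketed form applies to any queue parameter whose value should differ on particular hosts or host groups of the hostlist.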