rschedule

NAME

rschedule - User interface for Perl Schedule::Load configuration and status

SYNOPSIS

rschedule [ --help ] [ --port=port ] [ --dhost=host ] [ --version ]

rschedule top rtop

rschedule hosts rhosts

rschedule loads rloads

rschedule holds

rschedule status

rschedule [ --host=host ] reserve

rschedule [ --host=host ] release

rschedule [ --host=host ] allow_class class

rschedule [ --host=host ] deny_class class

rschedule [ --host=host ] set_const var=value

rschedule [ --host=host ] set_stored var=value

rschedule --class=class best

rschedule --class=class best_or_none

rschedule [ --load=load ] fixed_load pid

DESCRIPTION

rschedule will report or set status for load distribution using the Perl Schedule::Load package.

If symbolically linked to the name "rtop" rschedule will by default produce a listing of each host and the top loads on those hosts. Similarly, a link to "rhosts" will show the host report by default, and a link to "rloads" will show the load report.
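The name-based defaults described above can be sketched as a lookup on the program's basename. This is only an illustration, not rschedule's actual code; the table and the `default_command` helper are hypothetical names.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical table mapping an invocation name to its default command.
my %DEFAULT_COMMAND = (
    rtop   => 'top',
    rhosts => 'hosts',
    rloads => 'loads',
);

# Return the command implied by the program name, or undef when the
# program was invoked under a name that has no default (e.g. "rschedule").
sub default_command {
    my ($program_path) = @_;
    return $DEFAULT_COMMAND{ basename($program_path) };
}

my $command = default_command($0);
print defined $command ? "default command: $command\n" : "no default command\n";
```

With links such as `ln -s rschedule rtop`, the same executable would pick a different default report depending on the name it was started under.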

COMMANDS

allow_class <class>

Sets the hostname to allow the specified class of jobs. This sticks across reboots.

best

Returns the best host for a new job.

best_or_none

Returns the best host if there are free CPUs available, else fails.

classes

Displays a listing of the classes of jobs each host can run.

cmnd_comment <pid>

Sets the command comment for the given process ID. In rschedule top (rtop) displays, this will be shown rather than the name of the command. Command comments are inherited by children of commented parents.

deny_class <class>

Sets the hostname to deny the specified class of jobs.

fixed_load <pid>

Sets the given process ID to have that process count as one host load, even if it is using less CPU time than that due to high disk activity or other sleeps.

holds

Displays a listing of jobs that are blocked waiting for resources.

hosts

Displays a listing of each host being monitored along with its load and system type.

hostnames

Displays list of each hostname. Multi-CPU hosts appear once.

idle_host_names

Displays list of each idle CPU. Multi-CPU hosts appear multiple times.

loads

Displays a longer command line of top jobs, along with any fixed_load jobs.

release

Releases a host from dedicated use. Use --host to specify which host.

Any person may release a host, not just the original user requesting the reservation. You may also use "release reserve" together to change an existing reservation.

reserve

Reserves a host for dedicated use. To be reservable, the reservable flag must be set when that host's slreportd is invoked. This is indicated on the top report by an "R" in the column next to the command. To override an existing reservation you need to release the reservation first. Use --host to specify which host. An optional --comment specifies the reservation comment; the default time and user will be prepended unless a leading - is used.

set_const var=value

Sets a constant reporter parameter to the specified value. Slreportd will lose the information when rebooted, so this should only be used to avoid restarting the daemon after changing slreportd's boot flags.

set_stored var=value

Sets a stored reporter parameter to the specified value. Slreportd will keep the information when rebooted, and override any set_const setting.

sleep secs

For debugging only, sleep the specified number of seconds.

status

Displays a listing of each host and its daemon's status. Intended only for debugging problems with the scheduler.

top

Displays a listing of top processes across all hosts being monitored.

ARGUMENTS

--allow-reserved
--no-allow-reserved

Specifies if reserved hosts may be returned by the best, best_or_none, hostnames, idle_host_names, and jobs commands.

--class <class>

Specifies the job class for the best, best_or_none, hostnames, idle_host_names, and jobs commands.

--comment <comment>

Specifies the command comment for the cmnd_comment command.

--dhost <hostname>

Specifies the host name that slchoosed uses. May be specified multiple times to specify backup hosts. Defaults to SLCHOOSED_HOST environment variable, which contains colon separated host names.

--help

Displays this message and program version and exits.

--kill <signal>

With the "loads" command, convert the listing to a form that will log in to the host and kill the processes. With an argument, use the argument as the signal name.

--load <load>

Specifies the load value for the fixed_load command, defaults to 1.

--port <portnumber>

Specifies the port number that slchoosed uses.

--similar

Specifies only machines with the same OS version as the current host should be returned for the best, best_or_none, hostnames, idle_host_names, and jobs commands.

--version

Displays program version and exits.

DISTRIBUTION

The latest version is available from CPAN and from http://www.veripool.org/.

Copyright 1998-2011 by Wilson Snyder. This package is free software; you can redistribute it and/or modify it under the terms of either the GNU Lesser General Public License Version 3 or the Perl Artistic License Version 2.0.

SEE ALSO

Schedule::Load

AUTHORS

Wilson Snyder <wsnyder@wsnyder.org>


slchoosed

NAME

slchoosed - Distributed load chooser for Perl Schedule::Load

SYNOPSIS

slchoosed [ --help ] [ --port=port ] [ --dhost=host ] [ --version ]

DESCRIPTION

slchoosed will start a daemon to choose machines for the Schedule::Load package. Slchoosed creates two processes, so that if the second process exits, the first may restart it automatically.

slchoosed is run on one host in the network. This host is specified in the SLCHOOSED_HOST environment variable, which may also specify additional cold standby hosts in case the first host goes down. Slchoosed collects connections from the slreportd reporters, and maintains an internal database of the entire network. User clients also connect to the chooser, which then gets updated information from the reporters, and returns the information to the user client. As the chooser has the entire network state, it can also choose the best host across all CPUs in the network.

It will take 30-60 seconds for the reporting hosts to be rediscovered when the chooser first starts.

ARGUMENTS

--help

Displays this message and program version and exits.

--dhost

Specifies the daemon host name that slchoosed uses. May be specified multiple times to specify backup hosts. Defaults to SLCHOOSED_HOST environment variable, which contains colon separated host names. When slchoosed starts, any hosts listed AFTER the current host are assumed to be backup hosts, and are sent a reset so that this host may take over the choosing task.
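As a rough illustration of the "hosts listed AFTER the current host" rule, the following sketch picks the backup hosts out of a colon-separated list. The `backup_hosts` helper is a hypothetical name, and the sketch does not send any reset; it only shows which hosts would be treated as backups.

```perl
use strict;
use warnings;

# Return the hosts listed after $current in the colon-separated $dhosts list.
# Returns an empty list if $current is absent or is the last entry.
sub backup_hosts {
    my ($dhosts, $current) = @_;
    my @hosts = grep { length } split /:/, $dhosts;
    for my $i (0 .. $#hosts) {
        return @hosts[ $i + 1 .. $#hosts ] if $hosts[$i] eq $current;
    }
    return ();
}

# Running on "beta" with three configured hosts leaves "gamma" as a backup.
print join(',', backup_hosts('alpha:beta:gamma', 'beta')), "\n";
```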

--nofork

For debugging, prevents the daemon from creating additional processes and from going into the background. This allows messages to appear on stdout, and ctrl-C to stop the daemon.

--port

Specifies the port number that slchoosed uses. Defaults to SLCHOOSED_PORT environment variable or slchoosed service, or 1752.

--version

Displays program version and exits.

SEE ALSO

slchoosed_watchd, Schedule::Load, Schedule::Load::Chooser


slchoosed_watchd

NAME

slchoosed_watchd - Make sure the slchoosed stays up

SYNOPSIS

slchoosed_watchd [ --help ]

DESCRIPTION

slchoosed_watchd will periodically ask the slchoosed server for information, and if it does not respond, restart it. This is rarely needed, as slchoosed is fairly stable, but provides another level of assurance for critical applications.

ARGUMENTS

--help

Displays this message and program version and exits.

--nofork

For debugging, prevents the daemon from creating additional processes and from going into the background. This allows messages to appear on stdout, and ctrl-C to stop the daemon.

--period secs

Specify the period in seconds between scheduler requests. The default is 10 minutes.

--timeout secs

Specify the longest acceptable delay in seconds.

--version

Displays program version and exits.

SEE ALSO

Schedule::Load


slpolice

NAME

slpolice - Warn and renice top CPU hogs

SYNOPSIS

slpolice [ --help ] [ --port=port ] [ --dhost=host ] [ --cpu-hours ] [ --version ]

DESCRIPTION

slpolice will determine the top cpu users across a cluster of hosts. It will send mail if a process has over a specified amount of cpu time.

It will also mail if a user has a reservation for a long period of time.

Usually slpolice is run with a crontab entry similar to:

    5 8-21 * * * /usr/local/bin/slpolice --cpu_min 120 --reserved_min 120 long=999 >/dev/null 2>&1

This sends warnings each hour after 2 hours of CPU time. It does not check at night so that long overnight jobs will not receive warnings.

Additional non-parameter arguments specify specific command regular expressions. When a process' command matches that regexp, the specified number of minutes will be used to determine when to send mail instead of the default.

This program is most valuable when used with the nicercizerd program, or an operating system where nice 19 processes get only leftover cpu resources. It requires a program called nice19, which is a version of nice that is setgid root and renices a job to 19. This comes with nicercizerd.

ARGUMENTS

--help

Displays this message and program version and exits.

--debug-user

With --debug, who to send the mail to instead of the process owner.

--port <portnumber>

Specifies the port number that slchoosed uses.

--dhost <hostname>

Specifies the host name that slchoosed uses. May be specified multiple times to specify backup hosts. Defaults to SLCHOOSED_HOST environment variable, which contains colon separated host names.

--cpu-min

Number of cpu minutes the job should have before being reported to the user. Defaults to 0, which is off.

--renice-min

Number of minutes after which the nice value of a high cpu using process that is not at 1 or 10 is reniced to 19. Defaults to 0, which is off.

--reserved-min

Number of minutes a host may be reserved before reporting it to the user. Defaults to 0, which is off.

--version

Displays program version and exits.

SEE ALSO

Schedule::Load, nicercizerd, nice19


slreportd

NAME

slreportd - Distributed load reporter for Perl Schedule::Load

SYNOPSIS

slreportd [ --help ] [ --port=port ] [ --dhost=host ] [ --version ]

DESCRIPTION

slreportd will start a daemon to report machine loading for the Schedule::Load package. It will create two similar processes, so that if the second process exits, the first may restart it automatically.

slreportd must be running on every host in the network, usually started with an init.d script. It reports itself to the slchoosed daemon periodically, and is responsible for checking loading and top processes specific to the host that it runs on.

slreportd may also be invoked with some variables set. This allows static host information, such as class settings to be passed to applications.

ARGUMENTS

--help

Displays this message and program version and exits.

--dhost

Specifies the daemon host name that slchoosed uses. May be specified multiple times to specify backup hosts. Defaults to SLCHOOSED_HOST environment variable, which contains colon separated host names.

--fake

Specifies load management should not be used, for reporting of a "fake" host's status. Often the hostname and other parameters will want to be overridden, for example:

    slreportd hostname=lab_1 cpus=1 max_clock=100 osname=myos osvers=1 archname=myarch reservable=1 load_limit=1

--nofork

For debugging, prevents the daemon from creating additional processes and from going into the background. This allows messages to appear on stdout, and ctrl-C to stop the daemon.

--port

Specifies the port number that slchoosed uses. Defaults to SLCHOOSED_PORT environment variable or slchoosed service, or 1752.

--version

Displays program version and exits.

{variable}={value}

Sets an arbitrary constant variable to the specified value. This variable may be used so that a process requesting a machine can choose a machine with specific properties.

dynamic_cache_timeout={secs}

When set, after this number of seconds the dynamic load information for this host will no longer be cached by slchoosed, and when next needed must be reread from the slreportd. If not set, slchoosed picks a default, currently 10 seconds. Turning this number up may improve performance at the cost of decreased accuracy.

load_limit={value}

Set a maximum number of jobs that the scheduler can run on this machine.

load_pctcpu={1|0}

When set, determine load as a floating point number based on CPU usage percentage of all tasks. By default, or when clear, load is an absolute number where each high CPU job counts as one job, regardless of what percentage of the CPU is used. Using pctcpu tends to keep CPUs busy more often, at the possible expense of slowing down interactive jobs that are not using an entire CPU.

rating_adder={value}

Add the specified value to the rating obtained for the machine. A positive rating will make the machine less desirable for scheduling.

rating_mult={value}

Multiply the rating obtained for the machine by the specified value. The value 2 would act the same as a halved clock frequency, making the machine less desirable for scheduling.
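A minimal sketch of how the two rating adjustments might combine, assuming a higher rating means a less desirable machine. The order of operations (multiply first, then add) and the `adjusted_rating` helper are assumptions made for illustration; the package's real rating calculation is not shown in this document.

```perl
use strict;
use warnings;

# Apply the two adjustments to a base rating. Higher results are taken to
# mean "less desirable". Multiplying before adding is an assumption.
sub adjusted_rating {
    my ($base, %opt) = @_;
    my $mult  = defined $opt{rating_mult}  ? $opt{rating_mult}  : 1;
    my $adder = defined $opt{rating_adder} ? $opt{rating_adder} : 0;
    return $base * $mult + $adder;
}

# A multiplier of 2 doubles the rating, as a halved clock frequency would.
printf "%.1f\n", adjusted_rating(10, rating_mult => 2);
printf "%.1f\n", adjusted_rating(10, rating_adder => 5);
```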

SEE ALSO

Schedule::Load, Schedule::Load::Reporter


slrsh

NAME

slrsh - Perform rsh command on all clump systems

SYNOPSIS

slrsh command

slrsh command command ... quit

DESCRIPTION

slrsh executes the arguments as a shell command like rsh does. However the command is executed on every host registered with rschedule. This is useful for system management functions.

Without an argument, slrsh will prompt for commands and execute them.

In any command, @HOST is replaced with the name of the local host (as reported by `hostname`), and @HOSTS causes the command to be replicated for each host. Thus this command on a 2 machine clump:

    slrsh mount /net/@HOSTS

will execute 4 commands:

    ssh host1 mount /net/host1
    ssh host1 mount /net/host2
    ssh host2 mount /net/host1
    ssh host2 mount /net/host2
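The @HOST and @HOSTS substitution can be sketched as a small expansion routine. This is a hedged illustration that only builds the command strings; `expand_commands` is a hypothetical helper, not part of slrsh, and nothing is executed.

```perl
use strict;
use warnings;

# Expand one command template into "ssh <target> <command>" strings.
sub expand_commands {
    my ($template, @hosts) = @_;
    my @out;
    for my $target (@hosts) {
        # @HOSTS replicates the command once per host; otherwise run it once.
        my @variants = ($template =~ /\@HOSTS\b/)
            ? (map { my $c = $template; $c =~ s/\@HOSTS\b/$_/g; $c } @hosts)
            : ($template);
        for my $cmd (@variants) {
            # @HOST becomes the host the command runs on.
            $cmd =~ s/\@HOST\b/$target/g;
            push @out, "ssh $target $cmd";
        }
    }
    return @out;
}

print "$_\n" for expand_commands('mount /net/@HOSTS', qw(host1 host2));
```

For two hosts, the template containing @HOSTS yields four commands, in the same order as the listing above.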

ARGUMENTS

--help

Displays this message and program version and exits.

--hosts

Add a host to the list of hosts to be executed on, or add a list of colon separated hostnames or class aliases. If not specified, the default is all hosts.

--noprefix

Disable the default printing of the hostname in front of all --parallel output.

--parallel

Run each command on all machines in parallel. The command cannot require any input. The name of the machine will be prefixed to all output unless --noprefix is used.

--summary

With --parallel, summarize the output, showing hosts with identical outputs together. This is useful for then creating a new list of hosts from those hosts which had a specific output.
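The idea behind --summary can be sketched as grouping hosts by identical output text. The `summarize` helper and the sample outputs are hypothetical; slrsh's real report format may differ.

```perl
use strict;
use warnings;

# Group hosts by identical output text. Returns a list of
# [ output, [sorted hostnames] ] pairs, ordered by output text.
sub summarize {
    my (%output_of) = @_;
    my %hosts_with;
    push @{ $hosts_with{ $output_of{$_} } }, $_ for sort keys %output_of;
    return map { [ $_, $hosts_with{$_} ] } sort keys %hosts_with;
}

my %results = (
    host1 => "up 3 days\n",
    host2 => "up 9 days\n",
    host3 => "up 3 days\n",
);
for my $group (summarize(%results)) {
    my ($output, $hosts) = @$group;
    print join(' ', @$hosts), ":\n  $output";
}
```

Each group's host list can then be reused as the host list for a follow-up command.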

COMMANDS

exit (or x)

Exit slrsh. Control-C will not exit this program, as hitting Ctrl-C is more commonly used to interrupt commands on the remote machines.

hosts

Specify the list of hosts to run the following commands on. If nothing is specified on the command line, print a list of all class aliases, and prompt for the list of hosts. Hosts may be separated by spaces, commas, or colons. Hosts may also be a scheduler class, which adds all hosts in that class. Hosts may also include a leading - (minus) to remove the specified host. Thus "hosts CLASS_COUNTRIES -turkey washington" would return all hosts that are of scheduler class "COUNTRIES", excluding the host "turkey," and adding the host "washington".
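The host list syntax can be sketched as follows. This is an illustration only: `resolve_hosts` is a hypothetical helper, and the class table is made-up sample data keyed by the token as typed, which is an assumption about how class names are recognized.

```perl
use strict;
use warnings;

# Resolve a host specification into an ordered, de-duplicated host list.
# %$classes maps a class token to an array reference of member hosts.
sub resolve_hosts {
    my ($spec, $classes) = @_;
    my (@selected, %seen);
    for my $token (grep { length } split /[\s,:]+/, $spec) {
        if ($token =~ s/^-//) {
            # Leading minus: drop the host from what has been selected so far.
            @selected = grep { $_ ne $token } @selected;
            delete $seen{$token};
            next;
        }
        my @names = exists $classes->{$token} ? @{ $classes->{$token} } : ($token);
        push @selected, grep { !$seen{$_}++ } @names;
    }
    return @selected;
}

my %classes = (CLASS_COUNTRIES => [qw(france turkey japan)]);
print join(' ', resolve_hosts('CLASS_COUNTRIES -turkey washington', \%classes)), "\n";
```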

quit (or q)

Same as exit.

SETUP

Here's an example of setting up ssh keys so root can get between systems. This example will differ for your site.

  ssh-keygen -t dsa
  mv .ssh/authorization_keys2 .ssh/authorized_keys2
  slrsh su root
  ssh -l root jamaica
  rm -rf /root/.ssh
  ln -s \$(DIRPROJECT_PREFIX)/root/.ssh /root/.ssh

SEE ALSO

Schedule::Load, rhosts


Schedule::Load

NAME

Schedule::Load - Load distribution and status across multiple host machines

SYNOPSIS

  #*** See the SETUP section of the Schedule::Load manpage.
  #*** Daemons must be running for this test
  # Get per-host or per top process information
  use Schedule::Load::Hosts;
  my $hosts = Schedule::Load::Hosts->fetch();
  foreach my $host ($hosts->hosts_sorted) {
      print $host->hostname," is on our network\n";
  }
  # Choose hosts
  use Schedule::Load::Schedule;
  my $scheduler = Schedule::Load::Schedule->fetch();
  print "Best host for a new job: ", $scheduler->best(), "\n";
  # user access
  rschedule reserve <hostname>

DESCRIPTION

This package provides useful utilities for load distribution and status across multiple machines in a network. To just see what is up in the network, see the rschedule command. For initial setup, see below.

Most users do not need the Perl API and can use the command line utilities that come with this package, which are installed in your standard binary directory like other Unix applications. This package provides these four Unix programs:

rschedule

rschedule is a command line interface to this package. It and the potential aliases rtop, rhosts, and rloads report the current state of the network including hosts and top loading. rschedule also allows reserving hosts and setting the classes of the machines, as described later.

slchoosed

slchoosed is run on one host in the network. This host is specified in the SLCHOOSED_HOST environment variable, which may also specify additional cold standby hosts in case the first host goes down. Slchoosed collects connections from the slreportd reporters, and maintains an internal database of the entire network. User clients also connect to the chooser, which then gets updated information from the reporters, and returns the information to the user client. As the chooser has the entire network state, it can also choose the best host across all CPUs in the network.

slreportd

slreportd must be running on every host in the network, usually started with an init.d script. It reports itself to the slchoosed daemon periodically, and is responsible for checking loading and top processes specific to the host that it runs on.

slreportd may also be invoked with some variables set. This allows static host information, such as class settings to be passed to applications.

slpolice

slpolice is an optional client daemon which is run as a cron job. When a user process has over an hour of CPU time, it nices that process and sends mail to the user. It is intended as an example which can be used directly or changed to suit the system manager's preferences.

lockerd

Part of the IPC::PidStat package. If running, it allows the scheduler to automatically cancel held resources if the process that requested the resource exits or is killed without cleaning up.

MODULES

For those desiring finer control, or automation of new scripts, the Perl API may be used. The Perl API includes the following major modules:

Schedule::Load::Hosts

Schedule::Load::Hosts provides the connectivity to the slchoosed daemon, and accessors to load and modify that information.

Schedule::Load::Schedule

Schedule::Load::Schedule provides functions to choose the best host for a new job, reserving hosts, and for setting what hosts specific classes of jobs can run on.

Schedule::Load::Reporter

Schedule::Load::Reporter implements the internals of slreportd.

Schedule::Load::Chooser

Schedule::Load::Chooser implements the internals of slchoosed.

RESERVATIONS

Occasionally clusters have members that are only to be used by specific people, and not for general use. A host may be reserved with rschedule reserve. This will place a special comment on the machine that rschedule hosts will show. Reservations also prevent the Schedule::Load::Schedule package from picking that host as the best host.

To be able to reserve a host, the reservable variable must be set on that host. This is generally done when slreportd is invoked on the reservable host by using slreportd reservable=1.

CLASSES

Different hosts often have different properties, and jobs need to be able to select a host with certain properties, such as hardware or licensing requirements. Classes are generally just boolean variables which start with class_. Classes can be specified when slreportd is invoked, for example slreportd class_foo=1. The class setting may be seen with rschedule classes or may be read (as may any other variable) as an accessor from a Schedule::Load::Hosts::Host object.

Once a class is defined, a scheduling call can include it in the classes array that is passed when the best host is requested. Only machines which match one of those classes will be selected.
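The "match one of those classes" rule can be sketched without the daemons by modeling each host as a hash of reporter variables. The hashes stand in for Schedule::Load::Hosts::Host objects, and `eligible_hosts` is a hypothetical helper; the real selection also weighs load and reservations.

```perl
use strict;
use warnings;

# Each host is modeled as a hash of reporter variables (a stand-in for a
# Schedule::Load::Hosts::Host object).
my @hosts = (
    { hostname => 'alpha', class_foo => 1 },
    { hostname => 'beta',  class_foo => 0, class_bar => 1 },
    { hostname => 'gamma' },
);

# True if the host has a true value for at least one requested class.
sub classes_match {
    my ($host, $classes) = @_;
    return scalar grep { $host->{$_} } @$classes;
}

# Hostnames eligible for a request listing the given classes.
sub eligible_hosts {
    my ($classes, @candidates) = @_;
    return map { $_->{hostname} } grep { classes_match($_, $classes) } @candidates;
}

print join(' ', eligible_hosts(['class_foo'], @hosts)), "\n";
```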

COMMAND COMMENTS

rschedule loads or rloads show the command that is being run. By default this is the basename of the command invoked, as reported by the operating system. Often this is of little use, especially when the same program is used by many people. The rschedule cmnd_comment command or Schedule::Load::Schedule::cmnd_comment function will assign a more verbose command to that process id. For example, we use dc_shell, and put the name of the module being compiled into the comment, so rather than several copies of the generic "dc_shell" we see "dc module", "dc module2", etc.

HOLD KEYS

Hold keys allow a job request to be queued, so that when the resource is freed, it will be issued to the oldest requester. The hold will persist for a specified time until a process actually starts up on the selected host, and enough CPU time elapses for that new process to claim CPU time.

For this limited time, the load on the host will be incremented. When the job begins and a little CPU time has elapsed, the hold is released by a hold_release call, the timer expiring, or IPC::PidStat detecting that the holding process died. This will cause the load reported by rschedule hosts to occasionally be higher than the number of jobs on that host.

FIXED LOADS

Some jobs have CPU usage patterns which contain long periods of low CPU activity, such as when doing disk IO. make is a typical example; the parent make process uses little CPU time, but the children of the make pop in and out of the CPU run list.

When scheduling, it is useful to have such jobs always count as one (or more) job, so that the idle time is not misinterpreted and another job scheduled onto that machine. Fixed loading allows all children of a given parent to count as a given fixed CPU load. Using make again, if the parent make process is set as a fixed_load of one, the make and all children will always count as one load, even if not consuming CPU resources. The rschedule loads or rloads command includes not only top CPU users, but also all fixed loads. If a child process is using CPU time, that is what is displayed. If no children are using appreciable CPU time (~2%), the parent is the one shown in the loads list.

SETUP

When setting up a new site with Schedule::Load, first read the DESCRIPTION section about the various daemons.

First, make sure you've built and installed this package on all of your machines.

Then, pick a reliable master machine for the chooser. Set the SLCHOOSED_HOST environment variable to include this host name, and add this setting to a site wide file so that all users including daemons may see it when booting. You may add additional colon separated hostnames which will be backups if the first machine is down. Run slchoosed on the SLCHOOSED_HOST specified host(s).

On all the hosts in the network you wish to schedule onto, check SLCHOOSED_HOST is set appropriately, then run slreportd. Optionally run pidstatd (from IPC::Locker) on these hosts also.

The rschedule hosts command should now show your hosts.

If you run slreportd before slchoosed, there may be a 60 second wait before slreportd detects the new slchoosed process is running. During this time rschedule won't show all of the hosts.

When everything is working manually, it's a good idea to set things up to run at boot time. Manually kill all of the daemons you started. Then, make init files in /etc/init.d so the daemons start at boot time. Some examples are in the init.d directory provided by the distribution, but you will need to edit them. Exactly how this works is OS dependent; please consult your documentation or the web.

ENVIRONMENT

SLCHOOSED_HOST

A colon separated list of hostnames to contact to find slchoosed. They will be contacted in order; after the first connection is established, remaining hostnames will be backups.

SLCHOOSED_PORT

Default port number that slchoosed uses. If not defined, defaults to /etc/services assigned slchoosed port number, or if not specified there, 1752.
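The lookup order for these two variables can be sketched as below. The helper names are hypothetical, and using the "tcp" protocol for the services lookup is an assumption; the package's own resolution code may differ.

```perl
use strict;
use warnings;

# Hostnames to try, in order, taken from a colon-separated string.
sub chooser_hosts {
    my ($value) = @_;
    return grep { length } split /:/, (defined $value ? $value : '');
}

# Port resolution: explicit environment value, then the "slchoosed" entry
# in the services database, then the documented fallback of 1752.
sub chooser_port {
    my ($env_port) = @_;
    return $env_port if defined $env_port && $env_port =~ /^\d+$/;
    my $service_port = (getservbyname('slchoosed', 'tcp'))[2];
    return defined $service_port ? $service_port : 1752;
}

print join(' ', chooser_hosts($ENV{SLCHOOSED_HOST})), "\n";
print chooser_port($ENV{SLCHOOSED_PORT}), "\n";
```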

SEE ALSO

User program for viewing loading, etc:

rschedule, slrsh, slpolice

Daemons:

slreportd, slchoosed, slpolice

Perl modules:

Schedule::Load::Chooser, Schedule::Load::FakeReporter, Schedule::Load::Hosts, Schedule::Load::Hosts::Host, Schedule::Load::Hosts::Proc, Schedule::Load::Reporter, Schedule::Load::ResourceReq, Schedule::Load::Schedule


Schedule::Load::Chooser

NAME

Schedule::Load::Chooser - Distributed load choosing daemon

SYNOPSIS

  use Schedule::Load::Chooser;
  Schedule::Load::Chooser->start(port=>1234,);

DESCRIPTION

Schedule::Load::Chooser on startup creates a daemon that clients can connect to using the Schedule::Load package.

start ([parameter=>value ...]);

Starts the chooser daemon. Does not return.

PARAMETERS

port

The port number of slchoosed. Defaults to 'slchoosed' looked up via /etc/services, else 1752.

ping_dead_timeout

Seconds after which if a client doesn't respond to a ping, it is considered dead.

SEE ALSO

Schedule::Load, slchoosed


Schedule::Load::FakeReporter

NAME

Schedule::Load::FakeReporter - Distributed load reporting daemon

SYNOPSIS

  use Schedule::Load::Reporter;
  Schedule::Load::Reporter->start(fake=>1);

DESCRIPTION

Schedule::Load::FakeReporter creates a Schedule::Load::Reporter::ProcessTable similar to Proc::ProcessTable, which allows replacing the normal host information with special fixed information. This allows the Schedule::Load facilities to be used to manage other resources, such as laboratory equipment, that have CPU-like status but cannot locally run slreportd.

Pctcpu is based on the load_limit; if that is unspecified, each fixed load counts as 100%. Pid is the process ID that should be tracked on the current CPU. If this is not desired, add a pid_track=0 attribute.

See Schedule::Load::Reporter for most accessors.

SEE ALSO

Schedule::Load::Reporter, slreportd


Schedule::Load::Hold

NAME

Schedule::Load::Hold - Return hold/wait information

SYNOPSIS

  See Schedule::Load::Schedule

DESCRIPTION

This package provides accessors for information about a specific request that is either waiting for a host, or has obtained a host and is holding it temporarily.

ACCESSORS

allocated

Set by scheduler to indicate this hold has been scheduled resources, versus a hold that is awaiting further resources to complete. For informational printing, not set by user requests.

comment

Text comment for printing in reports.

hold_key

Key for generating and removing the request via Schedule::Load::Schedule.

hold_load

Number of loads to apply, for Schedule::Load::Schedule applications. Negative will request all resources on that host.

hold_time

Number of seconds the hold should apply before deletion.

req_age

Computed number of seconds since request was issued.

req_hostname

Host the request for holding was issued from.

req_pid

Pid the request for holding was issued by.

req_pri

Priority of the request, defaults to zero. Lower is higher priority.

req_time

Time the request for holding was issued. The chooser may move this time back to correspond to the very first request if the new hold's key matches a hold issued earlier. Due to this, hold_keys should be different with each unique request.
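One way to picture how req_pri and req_time could order waiting requests is the sort below. Combining the two fields this way (priority first, then oldest request) is an assumption, and `order_holds` is a hypothetical helper operating on plain hashes rather than Schedule::Load::Hold objects.

```perl
use strict;
use warnings;

# Order pending holds: lower req_pri first, then the earliest req_time.
sub order_holds {
    my (@holds) = @_;
    return sort {
        $a->{req_pri} <=> $b->{req_pri} or $a->{req_time} <=> $b->{req_time}
    } @holds;
}

my @queue = order_holds(
    { hold_key => 'late',   req_pri => 0,  req_time => 200 },
    { hold_key => 'urgent', req_pri => -1, req_time => 300 },
    { hold_key => 'early',  req_pri => 0,  req_time => 100 },
);
print join(' ', map { $_->{hold_key} } @queue), "\n";
```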

SEE ALSO

Schedule::Load, Schedule::Load::Hosts, Schedule::Load::Hosts::Host


Schedule::Load::Hosts

NAME

Schedule::Load::Hosts - Return host loading information across a network

SYNOPSIS

    use Schedule::Load::Hosts;
    my $hosts = Schedule::Load::Hosts->fetch();
    print $hosts->print_hosts;
    print $hosts->print_top;
    # Overall machine status
    $hosts = Schedule::Load::Hosts->fetch();
    (my $FORMAT =    "%-12s    %4s     %4s   %6s%%       %5s    %s\n") =~ s/\s\s+/ /g;
    printf ($FORMAT, "HOST", "CPUs", "FREQ", "TotCPU", "LOAD", "ARCH/OS");
    foreach my $host ($hosts->hosts_sorted) {
        printf STDOUT ($FORMAT,
                       $host->hostname,
                       $host->cpus_slash,
                       $host->max_clock,
                       sprintf("%3.1f", $host->total_pctcpu),
                       sprintf("%2.2f", $host->adj_load),
                       $host->archname ." ". $host->osvers,
                       );
    }
    # Top processes
    ($FORMAT =    "%-12s   %6s    %-10s     %-5s    %6s     %5s%%    %s\n") =~ s/\s\s+/ /g;
    printf ($FORMAT, "HOST", "PID", "USER",  "STATE", "RUNTM", "CPU","COMMAND");
    foreach my $host ($hosts->hosts_sorted) {
        foreach my $p ($host->top_processes) {
            printf($FORMAT,
                   $host->hostname,
                   $p->pid,             $p->uname,
                   $p->state,           $p->time_hhmm,
                   $p->pctcpu,          $p->fname);
        }
    }

DESCRIPTION

This package provides information about host loading and top processes from many machines across an entire network.

$self->fetch ()

Fetch the data structures from across the network. This also creates a new object. Accepts the port and host parameters.

$self->format_table(formats=>[...], data=>[...]);

Used internally by the print routines, but may be useful for external use also. Returns a table as a string. The named formats argument must be an array reference containing sprintf strings; in addition, '^' may be used as the width of the widest data column. The named data argument must be a two-dimensional array reference of the data table to be printed.
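The '^' width substitution can be illustrated with a stand-alone sketch. `format_table_sketch` is a hypothetical re-creation, not the module's code; joining the columns with a single space is an assumption.

```perl
use strict;
use warnings;

# Render rows with one sprintf format per column. A '^' in a format is
# replaced by the width of the widest value in that column.
sub format_table_sketch {
    my (%params) = @_;
    my @formats = @{ $params{formats} };
    my @rows    = @{ $params{data} };
    for my $col (0 .. $#formats) {
        my $width = 0;
        for my $row (@rows) {
            my $len = length(defined $row->[$col] ? $row->[$col] : '');
            $width = $len if $len > $width;
        }
        $formats[$col] =~ s/\^/$width/g;
    }
    my $line_format = join(' ', @formats) . "\n";
    return join '', map { sprintf($line_format, @$_) } @rows;
}

print format_table_sketch(
    formats => [ '%-^s', '%^s' ],
    data    => [ [ 'HOST', 'LOAD' ], [ 'washington', '0.25' ], [ 'peru', '12.00' ] ],
);
```

Because the width is computed from the data, the header and every row line up without hard-coding column sizes.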

$self->restart ()

Restart all daemons, loading their code from the executables again. Use sparingly. chooser parameter if true (default) restarts chooser, reporter parameter if true (default) restarts reporter.

$self->hosts ()

Returns the host objects in name sorted order, accessible with Schedule::Load::Hosts::Host. In an array context, returns a list; in a scalar context, returns a reference to a list. This function is historical; using hosts_sorted or hosts_unsorted is faster.

$self->hosts_sorted ()

Returns array of host objects in name sorted order, accessible with Schedule::Load::Hosts::Host.

$self->hosts_unsorted ()

Returns array of host objects in unsorted order, accessible with Schedule::Load::Hosts::Host.

$self->hosts_match (...)

Returns Schedule::Load::Hosts::Host objects for every host that matches the specified criteria. Criteria are named parameters, as described in Schedule::Load::Schedule, of the following: classes specifies an arrayref of allowed classes. match_cb is a routine returning true if this host matches. allow_reserved=>0 disables returning of reserved hosts.

$self->idle_host_names (...)

Returns a list of host cpu names which are presently idle. Multiple free CPUs on a given host will result in that name being returned multiple times.
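The "one name per free CPU" behavior can be sketched as below. Treating the number of idle CPUs as the CPU count minus the load rounded up is an assumption made for illustration, and `idle_names` is a hypothetical helper working on plain hashes.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Return each hostname once per idle CPU. "Idle" is approximated here as
# cpus minus the load rounded up; the real criterion may differ.
sub idle_names {
    my (@hosts) = @_;
    my @names;
    for my $host (@hosts) {
        my $free = $host->{cpus} - ceil($host->{load});
        push @names, ($host->{hostname}) x $free if $free > 0;
    }
    return @names;
}

print join(' ', idle_names(
    { hostname => 'alpha', cpus => 4, load => 1.5 },
    { hostname => 'beta',  cpus => 2, load => 2.0 },
)), "\n";
```

A caller can hand one job to each returned name, so a host with two free CPUs receives two jobs.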

$self->ping

Return true if the slchoosed server is up.

$self->get_host ($hostname)

Returns a reference to a host object with the specified hostname, or undef if not found.

$self->classes ()

Returns all class_ variables under all hosts. In an array context, returns a list; in a scalar context, returns a reference to a list.

$self->print_classes

Returns a string with the list of machines and classes that may run on them in a printable format.

$self->print_hosts

Returns a string with the list of host machines and loading in a printable format.

$self->print_top

Returns a string with the top jobs on all machines in a printable format, similar to the top program.

$self->print_loads

Returns a string with the top jobs' command lines, including any jobs with a fixed loading.

PARAMETERS

dhost

List of daemon hosts that may be running the slchoosed server. The second host is only used if the first is down, and so on down the list.

port

The port number of slchoosed. Defaults to 'slchoosed' looked up via /etc/services, else 1752.

SEE ALSO

Schedule::Load, rschedule

Schedule::Load::Hosts::Host, Schedule::Load::Hosts::Proc


Schedule::Load::Hosts::Host

NAME

Schedule::Load::Hosts::Host - Return information about a host

SYNOPSIS

  See Schedule::Load::Hosts

DESCRIPTION

This package provides accessors for information about a specific host obtained via the Schedule::Load::Hosts package.

classes_match

Passed an array reference. Returns true if this host's class matches any class in the array referenced.

eval_match

Passed a subroutine reference that takes a single argument of a host reference. Returns true if the subroutine returns true. It may also be passed a string which forms a subroutine ("sub { my $self = shift; ....}"), in which case the string will be evaluated in a safe container.

fields

Returns all information fields for this host.

exists (key)

Returns true if a specific field exists for this host.

get (key)

Returns the value of a specific field for this host.

ACCESSORS

An accessor exists for each field returned by the fields() call. Typical elements are described below.

adj_load

Total number of processes in run or on processor state, adjusted for any jobs that have a specific fixed_load or hold time, and adjusted for jobs that have not yet scheduled but are collecting resources for a new run. This is the load used for picking hosts.

archname

Architecture name from Perl build.

cpus

Number of CPUs. On hyperthreaded Linux systems, this indicates the maximum number of simultaneous threads that may execute; see physical_cpus for the real physical CPU count.

cpus_slash

Returns a string with the number of cpus, or in hyperthreaded systems, the number of physical cpus "/" the number of SMT cpus.

holds

Returns list of Schedule::Load::Hosts::Hold objects, sorted by age.

hostname

Name of the host.

max_clock

Maximum clock frequency.

load_limit

Limit on the loading that a machine can bear, often set to the number of CPUs to not allow overloading of a machine. Undefined if no limit.

osname

Operating system name from Perl build.

physical_cpus

Number of CPUs physically present.

reservable

If true, this host may be reserved for exclusive use by a user.

reserved

If true, this host is reserved, and this field contains a username and start time comment.

systype

System type from Perl build.

top_processes

Returns the top process objects; use Schedule::Load::Hosts::Proc to access the information for each process. In an array context, returns a list; in a scalar context, returns a reference to a list.

total_load

Total number of processes in run or on processor state.

total_pctcpu

Total CPU percentage used by all processes.

total_rss

Total resident memory used by all processes.

total_size

Total memory size, resident and swapped, used by all processes. This will often exceed the physical memory size.

SEE ALSO

Schedule::Load, Schedule::Load::Hosts, Schedule::Load::Hosts::Proc


Schedule::Load::Hosts::Proc

NAME

Schedule::Load::Hosts::Proc - Return process information

SYNOPSIS

  See Schedule::Load::Hosts

DESCRIPTION

This package provides accessors for information about a specific process obtained via the Schedule::Load::Hosts package.

fields

Returns all information fields for this process.

exists (key)

Returns true if a specific field exists for this process.

get (key)

Returns the value of a specific field for this process.

ACCESSORS

An accessor exists for each field returned by the fields() call. Typical elements are described below. All fields that Proc::ProcessTable supports are also accessible.

nice0

Nice value with 0 being normal and 19 maximum nice.

time_hhmm

Returns the runtime of the process in mmm:ss or hh.hH format, whichever is appropriate.

username

Textual user name running this process.

SEE ALSO

Schedule::Load, Schedule::Load::Hosts, Schedule::Load::Hosts::Host


Schedule::Load::Reporter

NAME

Schedule::Load::Reporter - Distributed load reporting daemon

SYNOPSIS

  use Schedule::Load::Reporter;
  Schedule::Load::Reporter->start(dhost=>['host1', 'host2'],
                                  port=>1234,);

DESCRIPTION

Schedule::Load::Reporter on startup connects to the requested server host and port. The server connected to can then poll this host for information about system configuration and current loading conditions.

start ([parameter=>value ...]);

Starts the reporter. Does not return.

PARAMETERS

dhost

List of daemon hosts that may be running the slchoosed server. The second host is only used if the first is down, and so on down the list.

port

The port number of slchoosed. Defaults to 'slchoosed' looked up via /etc/services, else 1752.

fake

Specifies that load management should not be used, for reporting a "fake" host's status or scheduling a non-host related resource, like a license.

min_pctcpu

The minimum percentage of the CPU that a job must have to be included in the list of top processes sent to the client. Defaults to 3. Setting to 0 will consume a lot of bandwidth.

stored_filename

The filename to store persistent items in, such as if this host is reserved. Must be either local-per-machine, or have the hostname in it. Defaults to /usr/local/lib/rschedule/slreportd_{hostname}_store. Set to undef to disable persistence (thus if the machine reboots the reservation is lost.) The path must be **ABSOLUTE** as the daemons do a chdir.

SEE ALSO

Schedule::Load, slreportd


Schedule::Load::Reporter::Disk

NAME

Schedule::Load::Reporter::Disk - slreportd disk data collector

SYNOPSIS

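  use Schedule::Load::Reporter::Disk;
  use Data::Dumper;
  use Time::HiRes qw(gettimeofday);
  my $n = Schedule::Load::Reporter::Disk->new;
  $n->poll(gettimeofday());   # current time as (seconds, microseconds)
  print Dumper($n->stats);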

DESCRIPTION

Schedule::Load::Reporter::Disk is a plugin for slreportd that collects disk performance statistics from Linux 2.6 machines.

new

Creates a new report object.

poll ($now_secs, $now_usecs)

Collects statistics, and scales by the time since the last poll. Pass in the current time (this avoids multiple syscalls when there are many plugins).

stats

Return an array reference with the statistics.

SEE ALSO

Schedule::Load, slreportd


Schedule::Load::Reporter::Filesys

NAME

Schedule::Load::Reporter::Filesys - slreportd filesystem data collector

SYNOPSIS

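  use Schedule::Load::Reporter::Filesys;
  use Data::Dumper;
  use Time::HiRes qw(gettimeofday);
  my $n = Schedule::Load::Reporter::Filesys->new;
  $n->poll(gettimeofday());   # current time as (seconds, microseconds)
  print Dumper($n->stats);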

DESCRIPTION

Schedule::Load::Reporter::Filesys is a plugin for slreportd that collects filesystem performance statistics for most Linux systems.

new

Creates a new report object.

poll ($now_secs, $now_usecs)

Collects statistics, and scales by the time since the last poll. Pass in the current time (this avoids multiple syscalls when there are many plugins).

stats

Return an array reference with the statistics.

SEE ALSO

Schedule::Load, slreportd


Schedule::Load::Reporter::Network

NAME

Schedule::Load::Reporter::Network - slreportd network data collector

SYNOPSIS

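  use Schedule::Load::Reporter::Network;
  use Data::Dumper;
  use Time::HiRes qw(gettimeofday);
  my $n = Schedule::Load::Reporter::Network->new;
  $n->poll(gettimeofday());   # current time as (seconds, microseconds)
  print Dumper($n->stats);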

DESCRIPTION

Schedule::Load::Reporter::Network is a plugin for slreportd that collects network statistics from Linux 2.6 machines.

new

Creates a new report object.

poll ($now_secs, $now_usecs)

Collects statistics, and scales by the time since the last poll. Pass in the current time (this avoids multiple syscalls when there are many plugins).

stats

Return an array reference with the statistics.

SEE ALSO

Schedule::Load, slreportd


Schedule::Load::ResourceReq

NAME

Schedule::Load::ResourceReq - Generate a request for a single resource

SYNOPSIS

  See Schedule::Load::Schedule

DESCRIPTION

This package provides a constructor for a request of a single resource. When scheduling, multiple resource requests may be created and the scheduler will fill (or deny) all requests in one atomic operation. This prevents nasty deadlocks (like the chopsticks deadlock.)

METHODS

new (...)

Create a new object with the parameters specified in the following section.

PARAMETERS

The following parameters are accepted by new(), and may also be read via accessor methods.

allow_reserved

When set, reserved hosts may be scheduled.

classes

An array reference of which classes the host must support to allow this job to be run on that host. Defaults to [], which allows any host.

favor_host

The hostname to try and choose if all is equal, under the presumption that there are disk access time benefits to doing so. Defaults to the current host.

jobs_running

Current number of jobs the requester is running. This is compared to max_jobs.

keep_idle_cpus

Minimum number of CPUs that should remain idle before scheduling this job. A negative fraction indicates that percentage of the clump; for example, -0.5 will keep at least 50% of all CPUs idle. Defaults to 0.
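The arithmetic behind the negative-fraction convention can be sketched as follows. The function name idle_cpus_required is hypothetical, and the scheduler's real rounding behavior is not documented here, so treat this as an illustration only.

```perl
use strict;
use warnings;

# Convert a keep_idle_cpus setting into an absolute CPU count.
# Assumption: a negative value is a fraction of the clump's total CPUs;
# the real scheduler's rounding may differ.
sub idle_cpus_required {
    my ($keep_idle_cpus, $total_cpus) = @_;
    return $keep_idle_cpus if $keep_idle_cpus >= 0;   # absolute count
    return -$keep_idle_cpus * $total_cpus;            # fraction of the clump
}

print idle_cpus_required(2, 40), "\n";      # an absolute count of CPUs
print idle_cpus_required(-0.5, 40), "\n";   # half of a 40-CPU clump
```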

match_cb

A string containing a subroutine which will be passed a host reference and should return true if this host has the necessary properties. This must only look at constant properties of the host (IE NOT the current host loading), as the match results may be cached. This will be evaluated in a Safe container, and can do only minimal core functions. For example: match_cb=>"sub{return $_[0]->get_undef('memory')>512;}"

max_jobs

Maximum number of jobs that can be issued if allow_none is specified in a scheduler request. Negative fraction indicates that percentage of the clump, for example -0.5 will use at most 50% of all CPUs. Defaults to 100% of the clump.

rating_cb

A string containing a subroutine which will be passed a host reference and should return a number that is compared against other hosts' ratings to determine the best host for a new job. This may include dynamic information such as instantaneous loading. A return of zero indicates this host may not be used. Ratings closer to zero are better. Defaults to a function that includes the load_limit and the cpu percentage free. Evaluated in a Safe container, and can do only minimal core functions.
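The selection rule described above (zero excludes a host, and otherwise the rating closest to zero wins) can be sketched without a running daemon. The host records, the pick_best helper, and the rating formula below are all hypothetical; real callbacks receive host objects and are passed as strings.

```perl
use strict;
use warnings;

# Pick the host whose rating is nearest zero, skipping hosts rated 0
# (which the documentation says may not be used).
sub pick_best {
    my ($rating_cb, @hosts) = @_;
    my ($best, $best_rating);
    for my $host (@hosts) {
        my $rating = $rating_cb->($host);
        next if !$rating;    # a rating of 0 means "do not use this host"
        if (!defined $best_rating || abs($rating) < abs($best_rating)) {
            ($best, $best_rating) = ($host, $rating);
        }
    }
    return $best;
}

# Hypothetical host records standing in for host objects.
my @hosts = (
    { hostname => 'alpha', adj_load => 3, cpus => 4 },
    { hostname => 'beta',  adj_load => 1, cpus => 4 },
    { hostname => 'gamma', adj_load => 4, cpus => 4 },
);

# Hypothetical rating: 0 when the host is full, else (load + 1) per CPU.
my $rating_cb = sub {
    my $h = shift;
    return 0 if $h->{adj_load} >= $h->{cpus};
    return ($h->{adj_load} + 1) / $h->{cpus};
};

my $best = pick_best($rating_cb, @hosts);
print $best ? $best->{hostname} : 'none', "\n";
```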

SEE ALSO

Schedule::Load, Schedule::Load::Hosts, Schedule::Load::Hosts::Host


Schedule::Load::Safe

NAME

Schedule::Load::Safe - Evaluate callback in Safe container with caching

SYNOPSIS

  See Schedule::Load::Schedule

DESCRIPTION

This package is for internal use of Schedule::Load. It allows a function to be defined inside a Safe container, then saved inside a cache for later use. This is significantly faster than creating a safe container for each evaluation.
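The idea of compiling a callback once in a Safe compartment and reusing it can be sketched with the core Safe module. The cache layout and the cached_callback helper are hypothetical and are not taken from the module's source.

```perl
use strict;
use warnings;
use Safe;

# Hypothetical cache: callback source string -> compiled code reference.
my %compiled;
my $compartment = Safe->new;

sub cached_callback {
    my ($source) = @_;
    # Compile each distinct source string only once, then reuse it.
    $compiled{$source} ||= do {
        my $code = $compartment->reval($source);
        die "Callback failed to compile: $@"
            if $@ || ref($code) ne 'CODE';
        $code;
    };
    return $compiled{$source};
}

# A plain hash stands in for a host object in this sketch.
my $match = cached_callback('sub { return $_[0]{memory} > 512; }');
print $match->({ memory => 1024 }) ? "match\n" : "no match\n";
```

Repeated calls with the same source string return the already compiled subroutine, so the cost of evaluating the string is paid once per distinct callback.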

SEE ALSO

Schedule::Load


Schedule::Load::Schedule

NAME

Schedule::Load::Schedule - Functions for choosing a host among many

SYNOPSIS

    use Schedule::Load::Schedule;
    my $scheduler = Schedule::Load::Schedule->fetch();
    print "Best host for a new job: ", $scheduler->best();

DESCRIPTION

This package allows the most lightly loaded host to be chosen for new jobs from among many machines across an entire network.

It is also a subclass of Schedule::Load::Hosts, so any functions that work for that module also work here.

METHODS

best (...)

Returns the hostname of the best host in the network for a single new job. Parameters may be parameters specified in this class, Schedule::Load::Hold, or Schedule::Load::ResourceReq. Those packages must be used individually if multiple resources need to be scheduled simultaneously.

fixed_load (load=>load_value, [pid=>$$], [host=>localhost], [req_pid=>$$, req_hostname=>localhost])

Sets the current process and all children as always having at least the load value specified. This prevents under-counting CPU utilization when a large batch job is running which is just paused in the short term to do disk IO or sleep. Requests to fake reporters (resources not associated with a CPU) may specify req_pid and req_hostname which are the PID and hostname that must continue to exist for the fixed_load to remain in place.

hold_release (hold_key=>key)

Releases the temporary hold placed with the best function.

hosts_of_class (class=>name)

Deprecated, and to be removed in later releases. Use hosts_match instead.

jobs (...)

Returns the maximum number of jobs suggested for the given scheduling parameters. Presumably this will be used to spawn parallel jobs for one given user, such as with the make -j command. jobs() takes the same arguments as best(), in addition to the max_jobs parameter.
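A minimal usage sketch, assuming a reachable slchoosed daemon; the max_jobs value of 8 is an arbitrary illustration.

```perl
use strict;
use warnings;
use Schedule::Load::Schedule;

# Requires a reachable slchoosed daemon.
my $scheduler = Schedule::Load::Schedule->fetch();

# Ask how many parallel jobs are suggested, capped at 8 (arbitrary).
my $jobs = $scheduler->jobs(max_jobs => 8);

# Print the command that a build wrapper might then run.
print "make -j$jobs\n";
```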

release (host=>hostname)

Releases the machine from exclusive use of any user. The user doing the release does not have to be the same user that reserved the host.

reserve (host=>hostname, [comment=>comment])

Reserves the machine for exclusive use of the current user. The host chosen must have the reservable flag set. rschedule hosts will show the host as reserved, along with the provided comment.

schedule (hold=>Schedule::Load::Hold ref, resources=>[], [allow_none=>1])

Schedules the passed list of Schedule::Load::ResourceReq resources, and holds them using the passed hold key. If allow_none is set and the loading is too high, does not schedule any resources. Returns an object reference to use with scheduled_hosts, or undef if no resources are available.

scheduled_hosts

Returns a list of Schedule::Load::Hosts::Host objects that were scheduled using the last schedule() call.

set_stored (host=>hostname, [set_const=>1], [key=>value])

Sets a key/value parameter in the persistent storage on the remote server, such as whether a class is allowed on that host. With set_const=>1, the value does not persist, but instead looks as if the daemon was started with that option; when the daemon restarts the information will be lost.

PARAMETERS

Parameters for the new and fetch calls are shown in Schedule::Load::Hosts.

allow_none

If allow_none is true and there is less than one free CPU across the entire network, then no CPU will be chosen. This is useful for programs that can dynamically adjust their outstanding job count. (Presumably you would only set allow_none if you already have one job running, or you can get live-locked out of getting anything!)

SEE ALSO

Schedule::Load, Schedule::Load::Hosts, rschedule