Friday, February 20, 2015

Netman, ITA Agent and their sons

When TWS runs, many processes are involved; some are historical, others have been added in the latest releases for new functionalities.

In order to better manage the environment and autonomously troubleshoot issues, it can be useful to know when they run, which command starts / stops each process, and which main files and which TCP ports they use.

For this reason (and since some customers were asking for this information) I've decided to consolidate this information here, hoping it can also be useful to other TWS admins.



Dynamic Agent


The processes for a Dynamic Agent are the following:
  • ITA Agent is the network daemon for the dynamic agent. It's started by the StartUpLwa script and stopped by the ShutDownLwa script. It listens on the dynamic agent TCP port (31114 by default) and passes incoming requests to its child processes.
  • JobManagerGW is an optional process, available since 9.2, that is present if a local Gateway is configured (during the installation or later). It modifies the Dynamic Agent protocol, removing any connection from the broker server to the agent. It's started by the ITA Agent at startup (if enabled) and opens TCP connections to the broker server. It also receives from the ITA Agent the requests coming from the agents.
  • JobManager is the actual dynamic agent process. It processes the submit requests coming from the broker server or from one of its Gateways. It's started by the ITA Agent at startup and opens connections to the broker server or its Gateway.
  • CIT is a set of tools periodically run by the Dynamic Agent to discover machine resources.
  • SSM Agent is used to monitor local files for the Event Driven Workload Automation feature. It's started and stopped by JobManager, but since there is an intermediate temporary process during the startup, the SSM Agent is reported with no parent process. If both Dynamic Agent and FTA are running on the system, there are two SSM Agent processes; the one for Dynamic Agent is the one with EDWA in the config path: ./ssmagent.bin -f <TWA dir>/TWS/EDWA/ssm/config
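To tell the two SSM Agent processes apart, the config path check described above can be automated. The following is a minimal Python sketch, assuming the command lines were already collected (for example from ps output). The function name and the sample paths under /opt/IBM/TWA are hypothetical and used only for illustration.

```python
import shlex


def ssm_agent_owner(cmdline):
    """Classify an ssmagent.bin command line.

    Returns "dynamic agent" when the -f config path contains an EDWA
    directory, "other" for any other ssmagent.bin process, and None
    when the command line is not an SSM Agent at all.
    """
    args = shlex.split(cmdline)
    if not args or not args[0].endswith("ssmagent.bin"):
        return None
    # The config path is the value that follows the -f option.
    config = ""
    if "-f" in args:
        idx = args.index("-f")
        if idx + 1 < len(args):
            config = args[idx + 1]
    return "dynamic agent" if "EDWA" in config.split("/") else "other"


# Sample command lines. /opt/IBM/TWA is a made-up install path, and the
# second path is invented only to have a case without EDWA in it.
samples = [
    "./ssmagent.bin -f /opt/IBM/TWA/TWS/EDWA/ssm/config",
    "./ssmagent.bin -f /opt/IBM/TWA/TWS/ssm/config",
    "/usr/sbin/sshd -D",
]
for line in samples:
    print(ssm_agent_owner(line), "<-", line)
```

Since the SSM Agent is reported with no parent process, the command line is the only hint this sketch relies on.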
Every time a job is started, the following processes are launched:
  • taskLauncher is a process started by JobManager to run a native job. It runs under the job streamlogon and is used to switch userid and create the job environment. There is one process for each native job that is running.
  • taskLauncher.sh is a script started by taskLauncher to complete the creation of the job environment; it can be customized by the admin. Since it uses "eval" to start the job, there are usually two taskLauncher.sh processes for each job.
  • Your job in the above picture stands for the actual processes of your jobs: scripts, executables, etc. The chain varies depending on what is run.

Fault Tolerant Agent


For the FTA architecture, including Standard Agent, Domain Manager, Dynamic Domain Manager and Master, the above picture illustrates the main processes. Most of the processes have a .msg file as input queue, and each queue can have only one process reading events from it. More details about the message flow can be found in the Message flow and processes on FTAs and classic E2E article.
  • Netman is the network daemon for the FTA architecture. It's started by StartUp and stopped by conman shut (ShutDown on Windows). It listens on the agent TCP ports (31111 by default and 31113 for SSL), starting the required services and in some cases passing the incoming connection to its child processes. It also has a local incoming message file, NetReq.msg.
With start / stop commands (local or remote) the following processes are started or stopped:
  • Mailman is responsible for connecting to other agents and routing messages to Batchman and to the other agents that need to receive them. It's started and stopped by netman and reads messages from Mailbox.msg.
    A DM or Master may have multiple instances of the mailman process if mailman servers are used in the definitions of the workstations for that domain. Each mailman server is identified by a character and reads messages from a dedicated .msg file, e.g. mailman server A reads from serverA.msg.
    Mailman processes also open connections to remote agents, and use in exclusive mode the .msg files in the pobox directory, which hold messages for unlinked agents.
  • Batchman (not present on Standard Agents) is the process that manages the plan locally and schedules jobs. It's started and stopped by mailman when mailman starts or stops. Batchman reads messages from Intercom.msg.
  • Jobman is the process responsible for starting jobs. It's started and stopped by batchman and reads messages from Courier.msg.
  • Writer is the process that writes locally the events received from remote agents via the network; there is one writer for each workstation linked to the local one. Writer is started by netman when the remote workstation links to the local one and is stopped when the local workstation is unlinked.
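The "who starts whom" relationships in the list above form a small tree, which helps when reasoning about why a process is missing. The following is a minimal Python sketch of that tree; the dictionary and function names are hypothetical, and only the relationships stated in the list are encoded.

```python
# Who starts whom, as described in the list above.
STARTED_BY = {
    "mailman": "netman",
    "batchman": "mailman",
    "jobman": "batchman",
    "writer": "netman",
}


def start_chain(process):
    """Return the chain of processes from netman down to `process`."""
    chain = [process]
    while chain[-1] in STARTED_BY:
        chain.append(STARTED_BY[chain[-1]])
    return list(reversed(chain))


def started_under(process):
    """Return every process started, directly or indirectly, by `process`."""
    result = []
    for child, parent in STARTED_BY.items():
        if parent == process:
            result.append(child)
            result.extend(started_under(child))
    return result


print(" -> ".join(start_chain("jobman")))
print(started_under("mailman"))
```

For example, if jobman is not running, the chain shows that batchman and mailman are the first processes to check.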
Monitoring processes are controlled by startmon / stopmon commands, run locally or remotely:
  • Monman downloads the event management configuration for the local agent, then filters and sends events about TWS objects. It reads events from Monbox.msg and quick commands from Moncmd.msg.
  • SSM Agent is used to monitor local files for the Event Driven Workload Automation feature. It's started and stopped by JobManager, but since there is an intermediate temporary process during the startup, the SSM Agent is reported with no parent process. If both Dynamic Agent and FTA are running on the system, there are two SSM Agent processes; the one for Dynamic Agent is the one with EDWA in the config path: ./ssmagent.bin -f <TWA dir>/TWS/EDWA/ssm/config
Masters and dynamic domain managers include a WebSphere Application Server, which is started or stopped with startappserver / stopappserver:
  • Appservman is responsible for starting, stopping and monitoring WAS, restarting it in case of crashes. It reads messages (actually a few commands) from Appserverbox.msg.
  • WAS (WebSphere Application Server) provides APIs and centralized access to the DB, and hosts the broker and, on the master, the planning processing. It listens on many TCP ports and receives messages from other processes via several .msg files: server.msg for Workload Service Assurance, planbox.msg for cross dependencies, and since 9.1 mirrorbox*.msg to replicate the plan in the DB for UI access.
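Because each queue has a single reader, a .msg file that keeps growing points at one specific process. The following is a minimal Python sketch of a lookup built from the descriptions above. The names are hypothetical, and the server?.msg pattern is my assumption for mailman server queues, generalized from the serverA.msg example.

```python
import fnmatch

# .msg files and the process that reads each one, taken from the
# descriptions in this article. Patterns use shell-style wildcards.
QUEUE_READERS = {
    "NetReq.msg": "netman",
    "Mailbox.msg": "mailman",
    "server?.msg": "mailman (mailman server)",  # assumed pattern, e.g. serverA.msg
    "Intercom.msg": "batchman",
    "Courier.msg": "jobman",
    "Monbox.msg": "monman",
    "Moncmd.msg": "monman",
    "Appserverbox.msg": "appservman",
    "server.msg": "WAS",
    "planbox.msg": "WAS",
    "mirrorbox*.msg": "WAS",
}


def reader_of(filename):
    """Return the process that reads `filename`, or None if unknown."""
    for pattern, process in QUEUE_READERS.items():
        # fnmatchcase keeps the comparison case-sensitive on every platform.
        if fnmatch.fnmatchcase(filename, pattern):
            return process
    return None


for name in ["Courier.msg", "serverA.msg", "server.msg"]:
    print(name, "is read by", reader_of(name))
```

Note that server.msg (read by WAS) and serverA.msg (read by a mailman server) are different queues, which is why the lookup uses exact, case-sensitive patterns.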
Every time a job is started, the following processes are launched:
  • Job Monitor impersonates the required userid, launches the job and monitors its execution to report its completion. On UNIX this is just a clone of the parent Jobman process, on Windows this is started by jobman and is actually called jobmon.
  • jobmanrc is a script started by job monitor to complete the creation of the job environment, and can be customized by the admin.
  • Your jobs in the above picture are the actual processes of your jobs: scripts, executables, etc. The chain varies depending on what is run.
The StartUp script automatically issues a startappserver, which also starts Appservman and WAS (if not already running). However, this does not include start and startmon; for a complete startup at boot the following commands are needed:
  1. StartUp
  2. start
  3. startmon
When conman shut / Shutdown is run, netman stops its child processes, with the only exception of WAS, which is kept running.
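The three-step startup listed above can be scripted for use at boot. The following is a minimal Python sketch, assuming that StartUp and conman are on the PATH of the TWS user and that a non-zero exit code means failure; both are assumptions to verify on your installation. The function names are hypothetical, and the example uses a dry-run runner so nothing is actually started.

```python
import subprocess

# Full startup order for an FTA. The commands are assumed to be on the
# PATH of the TWS user; adjust them to the local installation.
STARTUP_SEQUENCE = [
    ["StartUp"],
    ["conman", "start"],
    ["conman", "startmon"],
]


def run_sequence(commands, runner=subprocess.call):
    """Run commands in order and stop at the first non-zero exit code.

    Returns the list of commands that were actually executed. Treating a
    non-zero exit code as a failure is an assumption of this sketch.
    """
    executed = []
    for command in commands:
        executed.append(command)
        if runner(command) != 0:
            break
    return executed


def dry_run(command):
    """Print the command instead of running it and report success."""
    print("would run:", " ".join(command))
    return 0


run_sequence(STARTUP_SEQUENCE, runner=dry_run)
```

Calling run_sequence(STARTUP_SEQUENCE) without the runner argument executes the commands for real through subprocess.call.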

The following table summarizes the .msg files and TCP ports used by the processes.

Process        .msg files read                              Default TCP ports
ITA Agent                                                   31114
netman         NetReq.msg                                   31111, 31113
mailman        Mailbox.msg
batchman       Intercom.msg
jobman         Courier.msg
monman         Monbox.msg, Moncmd.msg
appservman     Appserverbox.msg
WAS (java)     server.msg, planbox.msg, mirrorbox*.msg      31115 to 31124, 31131, 41114
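A quick way to verify that the network daemons are up is to try a TCP connection to their default ports. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, the ports are the defaults listed above and may differ on your installation, and a successful connection only proves that something is listening, not that it is TWS.

```python
import socket

# Default ports of the two network daemons; installations may use others.
DEFAULT_PORTS = {
    "ITA Agent": [31114],
    "netman": [31111, 31113],
}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host="localhost", ports=None):
    """Return a {(process, port): bool} map for every configured port."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {
        (name, port): is_listening(host, port)
        for name, port_list in ports.items()
        for port in port_list
    }


if __name__ == "__main__":
    for (name, port), up in check_ports().items():
        print(f"{name:10} {port}: {'listening' if up else 'not reachable'}")
```

The same check can be pointed at a remote agent by passing its host name, provided the firewalls in between allow the connection.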

References:
Tivoli Workload Scheduler workstation processes
Configuring dynamic agent communications through a gateway


11 comments:

  1. Keep it up Franco.Very Nice Article.

  2. Hi Franco, Images in above article are not visible. can you please re-upload them

    Replies
    1. That should be an issue with your firewalls or browser. Images are hosted on docs.google.com and can be accessed without any account.

  3. This is a great. Thank you!

  4. Hello Franco,

    after unexpected reboot of MDM monman can't start. I tried all start/stop scripts, it just doesn't work. I even can't find any error code or something in logs. Any idea where to find useful info?

    Thanks
    Ivan

    Replies
    1. The full startup sequence for FTA is:
      - StartUp
      - conman start
      - conman startmon

      If monman doesn't start, look for errors in the NETMAN and TWSMERGE logs in stdlist/traces.

      If you are not able to find the cause on your own, open a PMR attaching the mustdatagather output (http://www-01.ibm.com/support/docview.wss?uid=swg21295038)

  5. Please keep Posting such article, it is really nice to read.

  6. It's really good article to understand the basic functionality of the TWS processes. Appreciate it :)
