IBM & HCL Workload Automation Best Practices: 2014

Monday, December 22, 2014

Prevent and solve queuing issues in Tivoli Workload Scheduler

In this article I talked about Tivoli Workload Scheduler message queues and how they work as input queues for TWS processes.
In large and busy environments these message files can start growing, creating delays.

I've just published with Paolo Salerno a new article on developerWorks about monitoring message files, detecting issues, and a new feature introduced with 9.2 FP1. The feature allows TWS administrators to prevent and solve delay issues by monitoring how much of the mailman capacity is used and which workstations should be moved under a mailman server to have a more reliable environment.

The article is available here: http://bit.ly/wablog-mailman-queues

If you like this article and you find it useful, please share it on social media so other people may take advantage of it.

Friday, December 19, 2014

How to replace CA7 eXtended Agent in Tivoli Workload Scheduler

I've just published together with Silvano Lutri an article on developerWorks about two possible solutions to replace the CA7 XA: http://bit.ly/wablog-raplace-CA7-XA

This specific Tivoli Workload Scheduler agent is used to coordinate workload scheduled by TWS with workload run on z/OS by CA7.

This agent reaches end of support on September 30, 2015, together with Tivoli Workload Scheduler for Applications 8.4, which is the last version that includes that eXtended Agent.

If you are currently running that agent, check the article to verify which solution can work for you. You can comment on this blog or contact me on social media if you need any further clarification or help.

If you like this article and you find it useful, please share it on social media so other people may take advantage of it.

Wednesday, December 17, 2014

Running What If Analysis on Tivoli Workload Scheduler

I'm very excited to tell you about the new "What If" feature that we published yesterday on IBM Workload Automation SaaS and that is also coming to the beta with the new refresh we are publishing right now.
You may already be aware of this new capability if you are participating in the Transparent Development program.

This new capability is targeted at answering the following questions:

How much time do I have to fix this failure without impacting the SLAs for my critical workload?
What will happen if this job takes longer today?
Why is my workflow completing so late? Which jobs and dependencies should I work on to make it complete earlier?

Message flow and processes on FTAs and classic E2E

I've received a request from a customer for information about the flow of messages (events) in Tivoli Workload Scheduler for z/OS classic End-to-End.
By messages I mean the information exchanged between TWS agents and servers in order to start and track job execution, submit new workload, modify existing workload, etc.
I'll not use the word event, which is sometimes used in this context, to avoid confusion with the events of Event Driven Workload Automation (EDWA).

This is a very specific and technical topic; however, understanding this flow was the first thing I did when I started working on the TWSd code to begin the port to z/OS and the integration with TWSz (OPC at that time) that produced the first release of the classic E2E. It was the year 2000 and TWS development was still in Santa Clara, while OPC was already here in Rome. I started creating the diagrams that I'll use in this article, and they were on the wall in front of me for several months.
Even if this information is also available in the manuals, I think it could be useful to have it on this blog too.

The picture above represents the basic message flows for a Fault Tolerant Agent (FTA).

Workload Service Assurance and Ideal Batch

Workload Service Assurance is one of the most powerful features in Tivoli Workload Scheduler, available both in TWS for z/OS and in TWS distributed.
It is also known as "Dynamic Critical Path" or by the acronym WSA.
It was designed to address the need to keep the completion of complex critical workflows under control.

Scheduling batch processes is a fundamental backbone of every IT infrastructure, from the smallest organizations to the largest, and very critical processes are delegated to the scheduler: creating payrolls, driving money transfers, calculating and distributing price lists, creating financial statements, automatically processing orders, processing claims in insurance companies, etc.
Most of these processes are still scheduled and usually need to complete within a specific time; otherwise there will be significant business impact, often resulting in fees to pay. Dynamic workload may also have SLAs that require requests to be processed within a specific amount of time.
Traditionally these critical processes are constantly monitored by operations or application teams to ensure they complete on time. The main challenge is to identify all the jobs that are part of the flows and make sure that none of them has issues that can impact the completion of the overall process. If there is a high number of jobs in the flow, users try to identify the critical path that needs to be monitored more closely in order to react quickly to any issue. This remains complex and time-consuming work.
Workload Service Assurance is designed to address those needs, and at the same time takes a step forward by removing the need to constantly monitor those processes.
The ideal batch is the one that you can forget: you can assume it is working and will provide the expected result on time. You should care about it only when there is an unexpected issue, and in that case you should be notified and able to easily find where the issue is.
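
To make the idea of a critical path concrete, here is a minimal Python sketch that finds the longest chain of dependent jobs in a small flow. It only illustrates the general concept and is not how Workload Service Assurance is implemented; the function name, the job names and the durations in minutes are all hypothetical, and durations are assumed to be positive.

```python
def critical_path(durations, predecessors):
    """Return (total_minutes, path) for the longest chain of dependent jobs.

    durations:    job name -> estimated duration in minutes (hypothetical data)
    predecessors: job name -> list of jobs that must complete first
    """
    finish = {}    # earliest finish time of each job, in minutes
    previous = {}  # predecessor that determines each job's start time

    def earliest_finish(job):
        if job not in finish:
            start, prev = 0, None
            for dep in predecessors.get(job, []):
                dep_finish = earliest_finish(dep)
                if prev is None or dep_finish > start:
                    start, prev = dep_finish, dep
            finish[job] = start + durations[job]
            previous[job] = prev
        return finish[job]

    # The critical path ends at the job with the latest earliest-finish time.
    last = max(durations, key=earliest_finish)
    total = finish[last]

    # Walk back through the determining predecessors to rebuild the chain.
    path = []
    while last is not None:
        path.append(last)
        last = previous[last]
    return total, path[::-1]


# Hypothetical four-job flow.
durations = {"EXTRACT": 30, "LOAD": 45, "REPORT": 20, "ARCHIVE": 10}
predecessors = {"LOAD": ["EXTRACT"], "REPORT": ["LOAD"], "ARCHIVE": ["EXTRACT"]}
print(critical_path(durations, predecessors))
```

In this example the chain EXTRACT, LOAD, REPORT takes 95 minutes, so a delay in any of those three jobs delays the whole flow, while ARCHIVE has room to slip. With hundreds of jobs and durations that change from day to day, doing this by hand is the complex and time-consuming work described above.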

Using Tivoli Workload Scheduler to automate complex reboots

A few weeks ago I published with Enrica Alberti an article on IBM developerWorks about how we automated the reboot of machines in our IBM Workload Automation SaaS environment.

This is an example of how we used TWS itself to manage our infrastructure for Workload Automation SaaS. On our SaaS we have tens of servers running the product for customers. In addition, we have a couple of machines used to control the infrastructure, where we run an internal TWS to automate recurring tasks:
- create, configure and deprovision VMs used to run customer subscriptions
- create, delete, suspend, resume customer subscriptions
- add and remove users to customer subscriptions

For these tasks we have created some REST APIs that are invoked by the Service Engage common infrastructure components and that submit to TWS the appropriate job stream to actually modify the environment.

In addition we have some scheduled housekeeping job streams and now the reboot process described in the article.

The actual work we need to do on the environment is minimal, with all the operations running automatically and easy to monitor and recover thanks to TWS.

This experience reinforces the message that automation can save a lot of effort, especially when it is considered and planned at the beginning of the project, or quickly recognized later if it was missed in the first analysis.

If you like this article and you find it useful, please share it on social media so other people may take advantage of it.

Tuesday, December 2, 2014

Scheduling FINAL (Best Practices)

One of the most important decisions when setting up a TWSd environment is when and how to extend the plan, and there are several options in optman related to this process.

I already presented this topic during the ASAP conference in 2011, but given its importance, and because many users are not yet familiar with it, I think this is a good topic to start the blog.

The first decision to make is when to run the FINAL job stream to extend the plan. The decision has to take into consideration several elements, such as:

At what time is most of the batch workload complete, so that the plan extension has the minimum impact?
When are the administrators available in case there is any issue with the plan extension?
Are the administrators available on weekends and holidays?
How many hours of buffer do I want to keep in order to be able to fix extension problems without impacting production?

The result of this decision could be, for example, to schedule FINAL only on working days, running at 7 AM with 5 hours of buffer.
That means we need a FINAL scheduled at 7 AM on working days that extends the plan until 12 PM (noon) of the next working day.
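
The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration of the calculation, not a TWS command or API: the function names are hypothetical, and working days are assumed to be Monday to Friday with no holiday calendar.

```python
from datetime import datetime, timedelta


def next_working_day(day):
    """Return the first Monday-to-Friday date after `day` (holidays ignored)."""
    candidate = day + timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def plan_end(final_run, buffer_hours):
    """Plan end = start time of the next FINAL run plus the buffer."""
    next_final = datetime.combine(next_working_day(final_run.date()),
                                  final_run.time())
    return next_final + timedelta(hours=buffer_hours)


# FINAL running on a Tuesday and on a Friday at 7 AM with 5 hours of buffer.
tuesday_run = datetime(2014, 12, 2, 7, 0)
friday_run = datetime(2014, 12, 5, 7, 0)
print(plan_end(tuesday_run, 5))               # 2014-12-03 12:00:00
print(plan_end(friday_run, 5))                # 2014-12-08 12:00:00
print(plan_end(friday_run, 5) - friday_run)   # 3 days, 5:00:00
```

Note that the length of the extension is not constant: on a Friday the plan has to cover the whole weekend plus the buffer, 77 hours in this example, instead of the usual 29.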

Welcome to this blog

I've been working for IBM since 1998 and have spent most of my time in development and support for IBM Tivoli Workload Scheduler, both distributed and z/OS.
After years spent as a developer, L3 technical leader and Lab Advocate for several customers, I'm currently the TWS Chief Designer and Architect for IBM Workload Automation SaaS.

I've decided to open this personal blog to share my experience with the product, how to use new features and common best practices. This blog is also a way to discuss these topics with customers and users, so please use the comments to raise your questions or share your opinion.

The best practices that I'll publish here come from the experience I gained working at L3, as a Lab Advocate, and running our SaaS offering.

Franco

About Franco Mossotto

I'm the lead architect for the IBM/HCL Workload Automation products.
I started working at IBM in 1998 as a Tivoli Workload Scheduler for z/OS developer, and then worked in design, development and support of IBM Tivoli scheduling and provisioning products. In the scheduling area I've worked as a developer, chief designer and L3 technical leader for both Tivoli Workload Scheduler and Tivoli Workload Scheduler for z/OS, and as an architect for the development of SaaS offerings.
Following the IBM and HCL partnership in 2016, I transitioned to HCL with the rest of the development team to continue my work on the IBM Workload Automation portfolio.
