Friday, December 12, 2014

Workload Service Assurance and Ideal Batch

Workload Service Assurance is one of the most powerful features of Tivoli Workload Scheduler, available in both TWS for z/OS and TWS distributed.
It is also known as "Dynamic Critical Path" or by the acronym WSA.
It was designed to keep the completion time of complex critical workflows under control.

Scheduling batch processes is a fundamental part of every IT infrastructure, from the smallest organizations to the largest, and very critical processes are delegated to the scheduler: creating payrolls, driving money transfers, calculating and distributing price lists, creating financial statements, automatically processing orders, processing insurance claims, and so on.
Most of these processes are still scheduled and usually need to complete within a specific time; otherwise there is a significant business impact, often including fees to pay. Dynamic workload may also have SLAs that require a request to be processed within a specific amount of time.
Traditionally these critical processes are constantly monitored by operations or application teams to make sure they complete on time. The main challenge is to identify all the jobs that are part of the flows and make sure none of them has issues that can impact the completion of the overall process. If the flow contains a high number of jobs, users try to identify the critical path, which needs to be monitored more closely so they can react quickly to any issue. This remains complex and time-consuming work.
Workload Service Assurance was designed to address those needs, but it also goes a step further by removing the need to constantly monitor those processes.
The ideal batch is the one that you can forget: you can assume it is working and that it will provide the expected result on time. You should care about it only when there is an unexpected issue, and in that case you should be notified and able to easily find where the issue is.



That's exactly the goal of Workload Service Assurance:

  • monitoring the critical workload for you
  • constantly estimating when it will finish
  • changing the scheduling and execution priority in case of delays
  • notifying you if there is any delay or problem in the network that can impact your critical workload, and relating the alert to the workload that is impacted
  • letting you easily list all the problems that have an impact on a specific workload
  • providing other views to analyze and monitor the critical workload, focusing on critical paths and current issues

And all it takes is flagging as critical the last job of your critical workload, the one that needs to complete within a specific time, and setting a deadline for it. It's that simple.
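To make the idea more concrete, here is a minimal sketch in Python of how an estimated end time and a critical path could be derived from a critical job and its deadline. This is not how TWS implements it; the job names, durations, start time, deadline, and the `estimate` helper are all hypothetical, chosen only to illustrate the principle of walking back from the critical job through its predecessors.

```python
from datetime import datetime, timedelta

# Hypothetical workload: job -> (estimated duration in minutes, predecessors).
JOBS = {
    "EXTRACT": (30, []),
    "LOAD_A": (45, ["EXTRACT"]),
    "LOAD_B": (20, ["EXTRACT"]),
    "PAYROLL": (15, ["LOAD_A", "LOAD_B"]),  # last job, flagged as critical
}


def estimate(jobs, critical_job, start):
    """Return (estimated end of critical_job, critical path leading to it)."""
    end = {}     # job -> estimated end time
    driver = {}  # job -> predecessor that finishes last (None for a root job)

    def visit(job):
        if job in end:
            return end[job]
        duration, preds = jobs[job]
        begin, driver[job] = start, None
        for pred in preds:
            pred_end = visit(pred)
            if pred_end > begin:
                # This predecessor is the one holding the job back.
                begin, driver[job] = pred_end, pred
        end[job] = begin + timedelta(minutes=duration)
        return end[job]

    visit(critical_job)

    # Walk back from the critical job along the latest-finishing predecessors.
    path, job = [], critical_job
    while job is not None:
        path.append(job)
        job = driver[job]
    return end[critical_job], path[::-1]


start = datetime(2014, 12, 12, 1, 0)      # assumed start of the flow
deadline = datetime(2014, 12, 12, 2, 15)  # assumed deadline on PAYROLL
estimated_end, critical_path = estimate(JOBS, "PAYROLL", start)
at_risk = estimated_end > deadline
print(critical_path, estimated_end.time(), "AT RISK" if at_risk else "on track")
```

In this toy example only the jobs on the path EXTRACT, LOAD_A, PAYROLL decide whether the deadline is met, so those are the ones worth watching; LOAD_B has slack and can be ignored until something changes.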

On z/OS, Workload Service Assurance has been available since version 8.3, with many improvements delivered via APARs or in later releases. On TWS distributed it was introduced with TWS 8.5. The two implementations have small differences, but the scenarios run in almost the same way.
Although it provides the same value in both versions of the product, this feature is very popular on TWSz but has never become as popular on TWSd. I think the main reason for this difference in adoption lies in how scheduling teams are organized on z/OS and on distributed platforms. On z/OS the batch is still managed by a centralized team that is accountable for when the critical workload completes. On distributed platforms scheduling is decentralized and the end user has fewer scheduling skills, sometimes expecting no more from the scheduler than cron provides.

References:
Workload Service Assurance on IBM Knowledge Center

