Swaminathan Nagarajan: IT Health Checkup

Thursday, November 29, 2018

CIOs and CTOs: How frequently do you perform a health check up of your portfolio?

An IT landscape, like any other real estate, grows and changes continuously. In the process, it accumulates technical debt, grows equally well in both legacy and modern technologies, redevelops functionality, creates redundant systems and keeps tweaking many products to suit the BAU scenario. On top of it, if the organization acquires another company, the diversity of the portfolio multiplies. In the entire melee, no one bothers about keeping a proper count of inventory or documentation or health of the portfolio.

Very few organizations invest in a portfolio assessment exercise, and fewer still do so periodically. No doubt such an exercise is time-consuming and expensive. But a proper exercise, done well, helps the organization take decisions faster. Not knowing the extent of technical debt, or how little agility the portfolio has, adds more and more work to every IT project and eventually pulls back the organization’s ability to develop new differentiating products or services.

Now the industry is moving from on-premises to cloud; from legacy to differentiating technologies; from systems that are obsolete or about to go out of support to open source; from separate development and support to DevOps; from waterfall to agile; and from a sedate landscape to one that evolves and learns. Consequently, such assessment exercises should bring out these factors in the form of metrics.

Based on my experience, here is a primer in the form of FAQ on such an exercise:

What is such an exercise called?

It is known by various names. Some of them are APA (Application Portfolio Assessment), APR (Application Portfolio Rationalization), APO (Application Portfolio Optimization) and APM (Application Portfolio Modernization).

How frequently should this be carried out?

Once in three years is the minimum. Adjust this depending on how fast the organization grows in terms of IT systems, inorganic acquisition of other entities, etc. If the pace of change is high, once in two years is better.

What is the duration and involvement of such an exercise?

The duration can range from 1 to 3 months, depending on the quality of the IT asset and configuration database that is maintained. Involvement from product owners, enterprise architects, procurement and senior management will be required. This initiative should be owned and driven by the CTO’s office.

Let’s talk about the outcome of a typical APA exercise.

Once we understand the output, the activities to be done during the exercise can be inferred easily. For clarity, let’s look at the output as consisting of two parts:

Part-1 is about metrics that indicate the health of the landscape across various dimensions
Part-2 is about resulting initiatives and recommendations.

What does Part-1 focus on?

This has two sets of metrics. The first set focuses on hygiene factors; the second set on value-added parameters. The hygiene set should address, at a minimum, the completeness, quality and currency of data pertaining to the following 7 dimensions:

Inventory (List of applications and related details like description, technology, # users etc.)
Mapping (Business capability or function to Platform, Platform to IT application, and IT application to Infra)
Product Usage (Product Name, associated details, Extent of customization of the product, Vendor)
License (Vendor, License, ILF – Initial License Fee, RLF – Recurring License Fee, Contract expiry date etc.)
Documentation (Required documents for understanding the system, required documents for providing support/development of the system)
Technical debt (a quick assessment in the form of KLOC – Kilo Lines of Code or Cost)
Cost related (Various cost elements including but not limited to resources, service provider, product vendor, license vendor, compute, network, storage, cloud etc.)

The second value-added set should focus on the ability of the IT organization to closely align with business and quickly respond to the changing needs without undue lag. What are the areas to focus here?

Cloud readiness – Cloud is a great leveler of the landscape and imposes more uniformity and standardization without compromising on the business capability. How much of the landscape can be easily shifted from on-premises to cloud?

Platformication readiness – How are the IT systems and infra being developed? Do they lend themselves to a clean vertical cut that lets the business, application, data and infra layers move towards a platform or utility model? Typically, the top 30 to 40% of the landscape that is key to the business should fall into this bucket.

System of innovation and differentiation – All the systems can be bucketed into systems of record (back office, core and legacy systems that undergo little change), differentiation (the middle layer that gives opportunities to differentiate the services) and innovation (the key ones that are integral and unique to the organization’s capability). As the landscape matures, more and more systems should move from record to differentiation and from differentiation to innovation. That way, more and more of the landscape will lend itself to agility and help in a faster turnaround.

DevSecOps readiness – How are changes implemented in the portfolio? How is support provided? Is there a wall between them? What type of people are being used across the landscape? What is the release periodicity? Is the entire IT organization operating on a single cadence even if many methodologies are followed? How are security principles embedded in the life cycle? Today’s world is taking the landscape towards a common cadence with no blockers on the ground.

Skill readiness – The organization should adopt a uniform system across its internal as well as external employees / contractors / service providers to denote skill and competency. Dreyfus or SFIA can be adopted. This helps in talking the same language, drawing up training plans, cross-skilling people, providing a career path etc.

Tell us about Part-2 of the outcome

Once the above is complete, various metrics can be collected that can provide a useful basis for arriving at a set of recommendations. We can classify the recommendations into 3 categories:

Category-1: Hygiene initiatives

All the hygiene factors mentioned in Part-1 should be ranked on a scale of 1 to 10 (10 being high quality, complete and up to date). Wherever the score is below 5, a program plan for bringing it to 8+ should be provided. If the score is 5 or above but still short of 8, the improvement should be specified either as a separate plan or as a by-product of a major upgrade/project.

Whatever it is, by a certain time line, all the hygiene factors should be brought to 8+. Even if no modernization is undertaken, this is very important. Otherwise, this will drag the entire development and release.
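
The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the class and function names, the sample scores and the "maintain" label for factors already at 8+ are assumptions, and a score of exactly 5 is placed in the middle band.

```python
from dataclasses import dataclass

TARGET_SCORE = 8      # every hygiene factor should eventually reach 8+
DEDICATED_BELOW = 5   # scores below 5 need their own program plan


@dataclass
class HygieneScore:
    dimension: str
    score: int  # 1 (poor) to 10 (high quality, complete, up to date)


def recommend(item: HygieneScore) -> str:
    """Map a hygiene score to the kind of plan described in the text."""
    if not 1 <= item.score <= 10:
        raise ValueError(f"score out of range for {item.dimension}: {item.score}")
    if item.score >= TARGET_SCORE:
        return "maintain"  # assumed label: already at target
    if item.score < DEDICATED_BELOW:
        return "dedicated program plan"
    # Assumption: 5, 6 and 7 all fall into the middle band.
    return "separate plan or by-product of a major upgrade"


# Hypothetical scores for four of the seven hygiene dimensions.
scores = [
    HygieneScore("Inventory", 9),
    HygieneScore("Mapping", 4),
    HygieneScore("License", 6),
    HygieneScore("Documentation", 3),
]
plan = {s.dimension: recommend(s) for s in scores}
for dimension, action in plan.items():
    print(f"{dimension}: {action}")
```

Tracking the same table at every assessment shows whether each factor is actually moving towards 8+.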

Category-2: Extrinsic initiatives

These should address factors like the following:

Data centre Consolidation (Number and locations)
Service Provider Consolidation (for a given IT spend, how many service providers should exist? Scope for consolidation; Dependency vs Risk vs Savings)
License Optimization (Optimization of licenses across the organization, management of RLFs, negotiation with vendors etc.)

Category-3: Intrinsic initiatives

These arise from within the portfolio and enable the organization to move to a certain state. Every state should be characterized by the set of value-added factors mentioned in Part-1.

Some of these could be:

Categorization of applications into those that can be decommissioned, migrated and modernized and consequent programs to implement the same. All the initiatives (a functional migration, technical reengineering, refactoring etc.) should be ranked according to the RoI, Time taken to implement and risk including cost of change across the organization. Implementing such initiatives will lead to reduced application intensity, reduced technology spread etc.
Program plan for increasing the cloud usage
Program plan for bringing the entire organization into uniform DevOps or DevSecOps
Program plan for eliminating niche, obsolete and other technologies that will soon go out of support
Program plan for baselining the skill, productivity and measurement across the landscape
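
The ranking of initiatives by RoI, time to implement and risk can be sketched as a simple scoring function. The post does not prescribe a formula, so the weighting below, the 24-month horizon and the sample initiatives are all hypothetical; an organization would substitute its own model.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    roi: float     # expected return on investment, e.g. 2.0 = 200%
    months: float  # time taken to implement
    risk: float    # 0 (low) to 1 (high), including cost of change


def priority(item: Initiative, horizon_months: float = 24.0) -> float:
    """Higher is better: reward RoI, penalise long duration and high risk."""
    time_factor = 1.0 - min(item.months, horizon_months) / horizon_months
    # Assumed weighting: duration can at most halve the score.
    return item.roi * (0.5 + 0.5 * time_factor) * (1.0 - item.risk)


# Hypothetical initiatives, for illustration only.
initiatives = [
    Initiative("Re-engineer core billing", roi=3.0, months=24, risk=0.6),
    Initiative("Decommission legacy reporting", roi=2.0, months=6, risk=0.2),
    Initiative("Migrate intranet to cloud", roi=1.2, months=3, risk=0.1),
]
ranked = sorted(initiatives, key=priority, reverse=True)
for item in ranked:
    print(f"{item.name}: {priority(item):.2f}")
```

With these sample numbers the quick, low-risk decommissioning ranks ahead of the high-RoI but slow and risky re-engineering, which is the kind of trade-off the ranking is meant to expose.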

Tell us about the stakeholders

These exercises should be driven by the CxO through an appointed PO (Program Owner). A ToR (Terms of Reference) covering the objective, timeline, and the involvement and support expected from different stakeholders should be drawn up, and roadshows held one to two months before the exercise. The program should be part of the CxO’s steering committee. A dashboard indicating how the health and agility of the portfolio evolve over the next three years is a must.

Wednesday, October 24, 2018

How to address technical debt in an evolving IT landscape?

Can you wipe out technical debt from your enterprise landscape? Can you always remain one step ahead? These are tough questions that require a calibrated answer. Before answering, we need to understand what is meant by technical debt, why it occurs and how to manage it.

The first realization should be that technical debt is bound to happen, no matter how good your toll gates and quality checks are. The second is that an organized program should be initiated, at periodic intervals, to estimate its size and manage its growth. The third is that not all technical debt is bad.

Like financial debt, if it is not managed or contained, it grows to such an extent that it affects the time to market, sinks employee morale and makes the IT shop unattractive. Technical debt accrues interest in the sense that it can generate new debt and make development progressively harder to manoeuvre. Tech debt will always be there.

What is TECHNICAL DEBT?

In simple terms, it is the marginal (incremental) work required to complete the software properly and address its drawbacks. It does not apply only to projects in the development stage; it persists when the system is in BAU (Business As Usual). Technical debt holds the organization back from introducing new functionality quickly. We can estimate the debt as the time or money required for refactoring. This refactoring could involve a change in design, a clean-up of bad code or porting to a new technology.

What causes technical debt?

The experts cite 4 major reasons for introduction of technical debt.

Cause #1 - Poor Conception: It is the rush or speed in delivery that causes poorly designed software.

Cause #2 – Poor Scheduling: Underestimating the time to develop a product or complete a project often is the culprit that introduces technical debt.

Cause #3 - Bad and inconsistent development practices: Developers working across different modules tend to introduce their own practices, which affects the design, and may each rebuild the same logic independently.

Cause #4 - Outdated Technology: As technology evolves, software standards become higher every day. With each improvement, new technical debt can arise.

Types of TECHNICAL DEBT

Type #1: Intentional debt – Software engineers almost always know the right way to code something and the quicker way to do it. In many cases the quick way also turns out to be the right way; in others, it does not. To deliver a project or a product quickly, the functionality is achieved, but not in the best possible manner.

Type #2: Design tech debt – Do we spend time thinking ahead and future-proofing our design, or do we favour quicker delivery? As systems become richer with more functionality, developers may find implementing a new feature very difficult. It may be easy to refactor the original design if it was constructed in a suitable manner. If not, what do you do? Be ready to bite the bullet and get into significant refactoring.

Type #3: Obsolete debt – As many people work on a system, it tends to rot as it evolves. Symptoms such as copy-paste and cargo-cult programming make this type easy to identify. It is incurred directly by the developers. This is one debt that we should consistently avoid.

Another way to look at this is a quadrant that classifies debt along two dimensions: whether it was deliberate or accidental, and the manner in which it was introduced.

How to manage this?

Type #1 – Record the debt as backlog at the time it is incurred. This can be revisited later.

Type #2 – Whenever the system is in steady state BAU, allocate some time to look at this type of debt and see if any refactoring has to be done.

Type #3 – Continuous refactoring is the solution to address this. Very experienced and strong teams tend to take time to understand the design of the system before they work on it. When they work, they improve the design incrementally and clean up the bad code en route.

Measures / Signs of technical debt:

Source Code Formatting: Inconsistent formatting is a common sign. Insisting on the right tool and template before and during the SDLC can reduce this type of debt.

Low Test Coverage: It is a measure of code quality. Very low test coverage reduces confidence that the software behaves correctly and makes it difficult to solve problems when they occur.

Lack of Modularity: This generally results from poor code design. The same code sometimes serves several pieces of business logic. The more code developers write, the more the lack of modularity becomes a bottleneck. It is harder to manage software whose logic is scattered all over the code and whose individual parts handle several pieces of logic.

Code Complexity: Complexity can be measured in several different ways, but in essence it reflects the dependencies and the path length needed to perform an operation. A long path leads to complex code.

Lack of Documentation: Documentation is part of software development best practice. Software is constantly driven to evolve, so it is important that the written code can be understood at all times by everyone involved in the development process.
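
Of the signs above, code complexity is the easiest to illustrate. The sketch below approximates cyclomatic complexity for Python source by counting decision points with the standard `ast` module. The counting rule and the `approx_complexity` name are simplifying assumptions, not what any particular tool reports.

```python
import ast

# Node types treated as one extra decision point each (a simplification).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in the source."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
    return score


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2:
            return "odd found"
    return "done"
'''

print(approx_complexity(SAMPLE))  # 5: base 1 + if + and + for + if
```

A rising score for the same function across releases is a hint that its paths are getting longer and that refactoring may be due.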

One approach is to perform a static analysis of code using tools that support the analysis. The following is a list of the most popular tools used: Coverity, SonarQube, Checkstyle, Closure Compiler.

There are two ways of measuring technical debt. The first is to derive a ratio of technical debt relative to code volume. The second is to use the estimates given by the tools directly: along with the list of technical debt items and their references to the code, SonarQube gives an estimate of the days or hours needed to fix the debt. For the ratio approach, we can use the initial estimates or, even better, the overall time spent developing the software so far, and extrapolate the debt from the technical debt ratio. Because the time already spent on development is usually known accurately, the ratio approach can give a reasonable estimate of the work needed to fix the issues.
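
The ratio approach amounts to simple arithmetic. The sketch below assumes the ratio is obtained from a sample of the code base, as remediation effort divided by the effort spent developing that sample, and is then applied to the total development time. The function names and all figures are hypothetical.

```python
def debt_ratio(remediation_hours: float, development_hours: float) -> float:
    """Technical debt ratio: effort to fix divided by effort to build."""
    if development_hours <= 0:
        raise ValueError("development_hours must be positive")
    return remediation_hours / development_hours


def extrapolate_debt(ratio: float, total_development_hours: float) -> float:
    """Estimate total debt by applying the ratio to all development time."""
    return ratio * total_development_hours


# Hypothetical figures: a sampled module needs 250 hours of fixes
# and took 2,000 hours to build.
ratio = debt_ratio(250, 2_000)
# The whole system has absorbed 16,000 development hours so far.
estimated_debt_hours = extrapolate_debt(ratio, 16_000)

print(f"debt ratio: {ratio:.1%}")                         # debt ratio: 12.5%
print(f"estimated debt: {estimated_debt_hours:.0f} hours")  # estimated debt: 2000 hours
```

The estimate is only as good as the assumption that the sampled code is representative of the rest of the system.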

How to Reduce or Eliminate Technical Debt

Being agile is the best way of managing technical debt and reducing it when it appears. The sooner we address the issue, the less interest we'll have to pay over time. To address technical debt, software development teams can use the following approach.

The Quickest to Solve: Fixing debts that take little time is an excellent way to eliminate technical debt gradually. Debts like code formatting can be solved quickly: create a template, apply it to all the code developed so far, and then integrate it into the tools used by the developers.

Priority: It is also important to address issues by priority. All issues that can lead to more significant issues should be addressed quickly and should be prioritized to avoid accumulation.

Technology Update: When outdated technology leads to technical debt, it is important to update the software to the newest versions of the frameworks, application servers, databases etc. It is also important to adopt every stable release of a framework in use, so that the software always has the latest update and changes arrive in small steps without breaking it.

Refactoring: Reviewing the software architecture and refactoring code often helps avoid duplicate code and code that lacks modularity.

The quadrant described earlier can help categorize technical debt and identify which items to fix first; there are many such approaches. In agile processes, how do we estimate and eliminate technical debt? It should be entered in the product backlog as a user story and prioritized like any other user story. The prioritization, done at the beginning of a new iteration, should take into account the impact of not managing the identified technical debt. When deciding which stories to include in the next development iteration, we should analyze whether postponing a technical debt correction to a later iteration is more or less advantageous in the long term. As a rule of thumb, we should address every critical issue as soon as it is identified.
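
As a rough illustration of treating technical debt as ordinary backlog items, the sketch below orders stories so that critical issues come first and the rest are ranked by value plus the cost of postponing, per unit of effort. The scoring rule, field names and sample backlog are hypothetical; a team would use its own prioritization scheme.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Story:
    title: str
    effort: int                 # story points
    value: int                  # business value (hypothetical scale)
    postponement_cost: int = 0  # extra cost if deferred (hypothetical scale)
    critical: bool = False


def plan_iteration(backlog: list[Story], capacity: int) -> list[Story]:
    """Pick stories for the next iteration within the given capacity."""
    ordered = sorted(
        backlog,
        # Critical items first, then highest (value + postponement cost) per point.
        key=lambda s: (not s.critical, -(s.value + s.postponement_cost) / s.effort),
    )
    selected, remaining = [], capacity
    for story in ordered:
        if story.effort <= remaining:
            selected.append(story)
            remaining -= story.effort
    return selected


backlog = [
    Story("New checkout flow", effort=5, value=8),
    Story("Fix injection flaw in search", effort=2, value=3,
          postponement_cost=5, critical=True),
    Story("Upgrade framework to supported version", effort=3, value=2,
          postponement_cost=4),
    Story("Reformat legacy module", effort=1, value=1),
]
chosen = plan_iteration(backlog, capacity=8)
for story in chosen:
    print(story.title)
```

In this example the critical fix and the framework upgrade displace the new feature for one iteration, because deferring them is assumed to cost more than the feature gains.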

Closing remarks

We have to learn to live with a certain amount of technical debt. A good IT shop will always have a handle on the measure, in terms of either the time or the money needed to bring the accumulated debt down to a manageable level. Understanding the different types of debt and the rate at which the organization accumulates them will help in introducing specific measures to eliminate or minimize such debt.

References:

https://www.agilealliance.org/introduction-to-the-technical-debt-concept/

https://hackernoon.com/there-are-3-main-types-of-technical-debt-heres-how-to-manage-them-4a3328a4c50c

https://www.atlassian.com/agile/software-development/technical-debt

https://www.bmc.com/blogs/technical-debt-explained-the-complete-guide-to-understanding-and-dealing-with-technical-debt/