Thursday, April 26, 2012

VCAP5-DCD - The Hot Target for VCDX

I am planning to start blogging about the objectives of both exams soon.


VCAP4-DCD - yes, that's right, this is version 4 for now, as version 5 is not GA yet (but will be soon).

VCAP5-DCD - the link will be available once it becomes GA.

VCP5-DT 

All other certification details are available here.

You can also find blog posts by others on similar subjects.

All the discussion and material here is purely for my own study, so feel free to change it and use it as you see fit.

Brain dump people - this is not the place for you, as there won't be any questions and answers here ;-)

Well, let's get started.

I am going to start posting the objectives for the Design exam, as it will be GA in the near future. If anyone wants to do a guest post, all are welcome to contribute and share the knowledge.

Just contact me offline through @mandivs


VCAP5-DCD


Objective 1.1 – Gather and analyze business requirements 
(e.g. current availability, manageability)

Skills and Abilities

·         Determine the relevant data set required to understand the current customer environment.
·         Given a design requirement and data set within a multi site environment, determine which components would be included in a design.
·         Given results of a requirement gathering survey, identify the business requirements.
·         Given one or more business requirements, analyze and determine the impact of the requirements on the design.

Tools 

VMware Virtualization Case Studies                                 



-          A good  VMware design matches products, features, and capabilities to business needs
-          Business need is the combination of many things summarized by: requirements, constraints, assumptions
-          Resulting in the identification of – design decisions, justifications, impacts, risks
Architecture vision
-          Scope – project boundaries, eliminate creep
-          Goals – business problems we’re solving with measurable results
-          Requirements – must meet
-          Assumptions - valid but not proven, as few as possible
-          Constraints – limit design choices
-          Risks- every design will have them, find them and communicate them
The vision will guide the project through its phases
Anything that does not fall into requirements or constraints falls into assumptions
Take the assumptions from the customer and get clarity – they are valid but not proven
e.g. the customer has budgetary concerns and does not have enough infrastructure, but will purchase additional hosts in future
Categories written on the board:
Requirements, assumptions, constraints and risks (items that are not conveyed properly or are not clear)
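
For my own study I find it helps to keep these categories in a small register. The following is only a minimal sketch, not part of any VMware tooling: the `Category`, `DesignInput` and `open_assumptions` names are made up for illustration, and the two sample entries are taken from the budget example above.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    REQUIREMENT = "requirement"  # must be met
    CONSTRAINT = "constraint"    # limits design choices
    ASSUMPTION = "assumption"    # valid but not proven
    RISK = "risk"                # find it and communicate it


@dataclass
class DesignInput:
    statement: str
    category: Category
    confirmed: bool = False  # has the customer clarified this item?


def open_assumptions(inputs):
    """Return the assumptions that still need clarity from the customer."""
    return [
        item for item in inputs
        if item.category is Category.ASSUMPTION and not item.confirmed
    ]


# Hypothetical entries based on the budget example in the notes
inputs = [
    DesignInput("Customer has budgetary concerns", Category.CONSTRAINT),
    DesignInput("Additional hosts will be purchased in future", Category.ASSUMPTION),
]

for item in open_assumptions(inputs):
    print("Needs clarification:", item.statement)
```

The idea is simply that every unconfirmed assumption stays visible until the customer has clarified it, which keeps the number of assumptions as small as possible.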

Five steps of design
1)      Initial Design meeting – scope, goals, requirements, constraints, who should be invited?
2)      Current state analysis – complete datacenter inventory, virtualization candidates, tools, constraints, assumptions
3)      Stakeholder and SME training – educate SMEs who can help make informed design decisions
4)      Design sessions – design decisions with stakeholders/SMEs, no surprises later
5)      Design deliverable – documentation: capacity analysis, hosts, vCenters, clusters, network, storage, monitoring, patching, backup, restore, DR, security, installation, operations, scalability, support, logical, physical etc.

Current state Analysis and vApps
-          Identify virtualization candidates and applications (identify non-virtualization candidates)
-          Capture baseline performance metrics including average and peak loads (feeds into capacity requirements, feeds into consolidation ratios, can be used for comparison purposes)
-          Identify unique dependencies
-          Identify reusable hardware for the design
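
To show how the baseline peak loads feed into capacity requirements and consolidation ratios, here is a rough sketch. It is my own simplification and not an official sizing method: the function names, the sample numbers and the 20% headroom are all assumptions, and summing the peaks is deliberately conservative because real peaks rarely coincide.

```python
import math


def hosts_required(peak_loads, host_capacity, headroom=0.2):
    """Estimate the host count for one resource from summed peak loads.

    peak_loads    -- per-server peak demand (e.g. MHz or MB)
    host_capacity -- capacity of one host in the same unit
    headroom      -- fraction of each host kept free (assumed value)
    """
    usable = host_capacity * (1 - headroom)
    return math.ceil(sum(peak_loads) / usable)


def consolidation_ratio(server_count, host_count):
    """Number of virtualized servers per host."""
    return server_count / host_count


# Hypothetical baseline for four virtualization candidates
cpu_peaks_mhz = [2000, 1500, 3000, 2500]
mem_peaks_mb = [4096, 8192, 16384, 8192]

cpu_hosts = hosts_required(cpu_peaks_mhz, host_capacity=20000)
mem_hosts = hosts_required(mem_peaks_mb, host_capacity=32768)

# The most constrained resource decides the host count
hosts = max(cpu_hosts, mem_hosts)
print(hosts, consolidation_ratio(len(cpu_peaks_mhz), hosts))
```

In this made-up example memory, not CPU, is the limiting resource, which is why both resources are sized separately before taking the maximum.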
Design best practice Gems
-          Avoid known problems and achieve predictable results
-          Use as flexible guidelines, not rigid rules
-          Blend with the business unit goals and requirements
-          Evolve with technological advances
-          VCDX defense caution (Requirement, Constraint, Assumption or Risk?)

Managing downtime

-          While major disasters may be infrequent events, the risk of downtime is constant and costly. The graph shown here breaks down the average cost of downtime by industry; the overall average is around $1.5 million per hour. This includes not only lost revenue to the company, but also lost employee productivity and damage to the company’s relationship with partners and suppliers.

-          Another key fact to point out is that downtime is common.  More than half of companies experience some amount of downtime in a given year, and some analysts have found that around a quarter of companies have had to declare a disaster at least once in the past five years.  Often these events are due to causes like an extended power outage, something you wouldn’t normally think of as being a disaster.  Companies who weren’t prepared to handle such disaster recovery situations would stand to lose millions of dollars and, if recovery takes long enough or is ultimately unsuccessful, could also lose their business.

Recovery Risk

 Additional Reading


   
Recovery Time Objective (RTO) is the period of time within which systems, applications, or functions must be recovered after an outage. This defines the amount of downtime that a business can endure, and survive. Recovery time includes: fault detection, data recovery, and bringing applications back online.

Recovery Point Objective (RPO) is the point in time to which systems and data must be recovered after an outage. This defines the amount of data loss a business can endure. Different business units within an organization may have varying RPOs.
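
The two definitions above can be turned into a quick check. This is only a sketch under my own assumptions: the `RecoveryPlan` class, its field names and all the sample durations are hypothetical, and the worst-case data loss is assumed to equal the interval between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    fault_detection: timedelta
    data_recovery: timedelta
    app_restart: timedelta
    backup_interval: timedelta  # time between two recovery points

    def recovery_time(self):
        # Recovery time = fault detection + data recovery + apps back online
        return self.fault_detection + self.data_recovery + self.app_restart

    def worst_case_data_loss(self):
        # Assumption: an outage just before the next backup loses one interval
        return self.backup_interval


def meets_objectives(plan, rto, rpo):
    """Return (meets_rto, meets_rpo) for the given plan."""
    return (plan.recovery_time() <= rto,
            plan.worst_case_data_loss() <= rpo)


# Hypothetical numbers for one business unit
plan = RecoveryPlan(
    fault_detection=timedelta(minutes=15),
    data_recovery=timedelta(hours=2),
    app_restart=timedelta(minutes=30),
    backup_interval=timedelta(hours=24),
)
rto = timedelta(hours=4)
rpo = timedelta(hours=4)

print(meets_objectives(plan, rto, rpo))
```

With these numbers the plan recovers within the RTO, but a nightly backup cannot satisfy a four-hour RPO. Since different business units may have different RPOs, the same check would be run once per unit.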

Business Continuity is a holistic approach to planning, preparing, and recovering from an adverse event. The focus is on prevention, identifying risks, and developing procedures to ensure the continuity of business function.

Disaster recovery planning should be included as part of business continuity.

Objectives of Business Continuity:
- Facilitate uninterrupted business support despite the occurrence of problems.
- Create plans that identify risks and mitigate them wherever possible.
- Provide a road map to recover from any event.

Disaster Recovery is more about specific cures, to restore service and damaged assets after an adverse event. In our context, Disaster Recovery is the coordinated process of restoring systems, data, and infrastructure required to support key ongoing business operations.

Business Continuity Planning (BCP) is a risk management discipline. It involves the entire business--not just IT. BCP proactively identifies vulnerabilities and risks, planning in advance how to prepare for and respond to a business disruption. A business with strong BC practices in place is better able to continue running the business through the disruption and to return to “business as usual.”

BCP actually reduces the risk and costs of an adverse event because the process often uncovers and mitigates potential problems.

In summary, a traditional DRP that is based on physical systems has three major complications:

First, there is a variety of information and data that needs to be protected, all of which needs to be available to ensure recovery. These different kinds of data require different processes and tools. For example, backing up system disks (which are typically on internal storage) uses a different process than backing up data disks. Tracking hardware configuration requirements is also difficult. A common approach is to use spreadsheets, which easily become out of date or get lost. And backups used to protect data can be easily misplaced or corrupted.

Second, the recovery process is very complex. It has many steps, any of which can cause recovery to fail. For example, it is easy to miss a hardware dependency, which leads to a failure and additional steps. Given the number of manual steps, it is easy for the staff who are executing recovery to make errors that impede recovery. All in all, the process can easily take days, and maybe even longer.

Third, DRPs are very difficult to test. Tests may require additional servers and storage arrays so that the recovery environment is not disturbed. It is also very difficult to test without disrupting production systems and the protection process.

Disaster protection for services tends to be tiered. Tier one includes services for which no downtime at all can be tolerated. These services tend to be deployed from the start in active-active configurations. For the remaining services, however, disaster recovery plans are typically in a three-ring binder to be used in the event of a disaster. Instructions in the binder describe how to recover a particular server, configure particular hardware, reinstall the operating system, recover from tape, and so forth.

Of course, these steps are very manual, and they are often very difficult to test.

A Recovery Point Objective is the point in time to which restored systems must be recovered. A Recovery Time Objective is the amount of time allowed between the disaster and the moment the systems have been recovered to that recovery point. The terms “recovery point objective” and “recovery time objective” will be discussed in more depth later in this course.

Disaster recovery planning and business continuity planning are two different things. A DRP is a plan or a set of procedures that guides employees during the chaos of a disaster and the time immediately following it. It is focused on safeguarding assets and personnel, and it is procedure-oriented. It will include things like a step-by-step procedure on how to get critical systems back online fast at the recovery site. By nature, it is designed to provide a temporary safe harbor for the business. This involves failover planning. Failover is the process by which key business systems are transitioned rapidly during an emergency to a remote recovery site. In contrast, business continuity is the process of keeping daily operations running after the disaster.

A business continuity plan, or BCP, provides guidance on how to keep day-to-day business operations going at the recovery site. Business continuity planning also changes depending on perspective. For example, the IT department might be concerned mainly about data backups. (Data backups must continue to be made even while operating at the recovery site.) Other departments might have a wider view. To get a call center back online, you not only need your customer data, you also need a facility with desks and phones.

BCPs should specifically address three things. First, how to run business operations with the smaller capacity of the recovery site. Second, how to prevent interference between recovery operations and the production operations that were formerly run at the site. And third, how to eventually fail back to the original primary site.

Talk about RPO (recovery point objective) – which is about how much data is lost, i.e., the data written since the last backup (BU). We can handle almost anything, from minutes to years.

Also talk about RTO (recovery time objective) – which is about how quickly we can get back to work. We cannot do real time or near real time. Hours is generally what we can do, but it is very important to understand that we must test to determine exactly what is possible.

Continued .......
Objective 1.2 – Gather and analyze application requirements [soon]

