The latest BriefingsDirect discussion focuses on one of the toughest balancing acts in seeking the best of cloud computing benefits. This balance comes from obtaining the proper degree of centralization or “common good” for infrastructure efficiency, while preserving a sufficient culture of decentralization for agility, innovation, and departmental-level control.
The requirement for empowering centralization is no more evident than in a large university setting, where support and consensus must be preserved among such constituencies as faculty, staff, students, and researchers — across an expansive educational community.
But the typical IT model does not support localized agility when it takes weeks to spin up a server, if online services lack automation, or if manual processes hold back efficient ongoing IT operations. Too much IT infrastructure redundancy also means weak security, high costs, lack of agility, and slow upgrades.
We’re joined by an IT executive from the University of New Mexico (UNM) to learn more about moving to a streamlined and automated private cloud model to gain a common good benefit, while maintaining a vibrant and reassured culture of innovation. We’re also joined by a VMware executive to learn more about the latest ways to manage cloud architectures and processes to attain the best of cloud efficiencies, while empowering improved services delivery and process agility.
They are: Brian Pietrewicz, Director of Computing Platforms at the University of New Mexico in Albuquerque, and Kurt Milne, Director of Product Marketing in the Management Business Unit at VMware. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: Tell us about your IT organization at the university and how you’ve been able to do change, but at the same time not alienate your users, who are, I imagine, used to having things their way.
Pietrewicz: At the University of New Mexico, it’s a highly decentralized organization. In most cases, the departments are responsible for their own IT. In most cases, that means they don’t have the resources to effectively run IT, in particular, things like data centers, servers, storage, disaster recovery (DR), and backups.
What we’re doing to improve the process is providing infrastructure as a service (IaaS) to those groups so that they don’t have to worry about the heavy lifting of the infrastructure pieces that I mentioned before. They can stay focused on their core mission, whether that’s physics, or psychology, or who knows what.
So we offer IaaS. We’re running a VMware stack, and we’re also running vCloud Automation Center (vCAC). We’ve deployed the Self-Service Portal. We give departments, faculty members, or departmental IT folks the ability to go into the portal and deploy their own machines at will.
Then, they are administrators of that machine. They also have additional management features through the vCAC console so that they can effectively do whatever they need to do with the server, but not have to worry about any of the underlying infrastructure.
Gardner: That sounds like the best of both worlds. In a sense, you’re a service provider in the organization, getting the benefits of centralization and efficiency, but allowing them to still have a lot of hands-on control, which I assume that they want.
Pietrewicz: Correct. The other part is the agility, the ability for them to be able to react quickly, to consume infrastructure on demand as they need it, and have the benefit of all the things that virtualization brings with redundant infrastructure, lower cost of ownership, and those sorts of things.
Milne: It’s an interesting time to be in the IT space, because there’s this new set of expectations being imposed on IT by the business to be strategic, to quickly adopt new technology, and boost innovation.
At the same time, IT still has the full set of responsibilities they’ve always had — to stay secure, to avoid legacy debt, to drive operational excellence so they maintain uptime, security, and quality of service for transactional systems and business-critical systems.
It’s really an interesting paradox. How do you do these two things that are seemingly mutually exclusive — go fast, but at the same time, stay in control?
Brian’s approach is what I call it “push button IT,” where you give folks a button to push and they get what they need when they want it. But if IT controls the button and they control what happens when the user pushes the button, IT is able to maintain control. It’s really the best of both worlds.
Gardner: Brian, tell us a little bit about how long you have been there and what it was like before you began this journey?
Pietrewicz: I’ve been at UNM for about two-and-a-half years, and I can tell you the number one complaint. We suffer from a lot of the same problems that other large IT shops have, with funding and things like that. But the primary issue that we had when I walked in the door was customers being upset because we didn’t have clearly-defined services, and we had sold these services to these customers.
We had sold virtual machines (VMs) with database backups, and all kinds of interesting things, with no service-level agreements (SLAs), no processes, nothing wrapped around it. The delivery of these services was completely inconsistent.
So I started out down the new path. The first thing that we did was to make the services more consistent. Just to give you an example, deploying a virtual machine for a customer. The way that it was when I got here was that a ticket came into the service desk. It went to a single technician, and then whichever technician got that ticket figured out their own way of getting that machine deployed.
As the next step in that process, we went through and, instead of just having it being done a different way by whoever received the ticket, we identified all the steps associated. In looking at all the steps associated, we identified over a 100 manual steps that went though six different completely separate groups inside of our organization.
Those included operating system, storage, virtualization, security, and networking for firewall changes. In all those various groups that deploy their individual piece of that puzzle, it was being done differently every time. Our deployment times were taking as long as three weeks. You can imagine how painful that is when it takes 20 minutes to spin up a VM — but it was taking three weeks to deploy it to a customer.
We identified all the steps and defined the process very, very clearly; exactly what it takes to deploy a VM. The interesting thing that came out of that was that it gave us the content necessary to be able to start developing a true service description and an SLA.
It also made it so that it was consistent. We did a few things after we did the process development. We generated workflows within our ticketing system, so that all that happened was a ticket was put in and then it auto-generated all the necessary tickets to deploy the VM, so it happened in a very consistent way.
That dropped the deployment time from three weeks down to about three days, because it still had to go through certain approval process and things like that with security.
For the next step we said, “Okay, how can we do this better?” We looked at all of those steps that we put in place and found that they were all repetitive, manual steps that could be easily automated. So enters VMware vCAC.
We took all the steps, after we had them clearly defined, and we automated all the steps that we could. We couldn’t automate all of them, for example, sending information to our billing system to bill the customer back. From vCAC we shoot an email over to our ticketing system, that generates a ticket. Then, the billing information is still entered manually, and we are working on an upgrade to that.
UNM is approximately 45,000 faculty, staff, and students. We have about 100 either departments or affiliates, and today, we’re running about 660 VMs for our organization. For central IT, we’re between 98 percent and 99 percent virtualized.
When I first got here, the services were not defined and the processes were not defined. Since then, we have clearly defined the processes, narrowed those down into the very specific processes and tasks that had to be done, and then we automated. We’re going through the process of automating every step in that process.
Now, we have a thing we call Lobo Cloud — our mascot is the Lobo. Customers can now go online and deploy a machine within 20 minutes. So basically everything has transformed from extremely inconsistent service and taking as long as three weeks to deploy, to now it being the equivalent going into McDonald’s and ordering a Big Mac. It’s extremely consistent and down from three weeks to 20 minutes.
Gardner: I assume Brian that you’ve adopted some industry-standard methods, perhaps a framework, that gave you some guidance on this. How does your service delivery policy adhere to an industry standard like ITIL?
Pietrewicz: That’s what we use. We follow ITIL and we’re at varying levels of maturity with it. ITIL is very challenging to implement, but it’s extremely helpful, because it gives you a framework to work within, to start narrowing down these process, defining services, setting SLAs. It gives you a good overarching framework to work within.
The absolute hardest part of all of this is implementing the ITIL framework, identifying your processes, identifying what your service is, and identifying your SLA. Walking through all of that is exponentially harder than putting the technology in place.
Gardner: It seems to me that not only are you going to get faster servers, response times, and automation, but there are some other significant benefits to this approach. I’m thinking about security, disaster recovery (DR), the ability to budget better through an OPEX model, and then ultimately reduce total costs.
Is it too soon or have some of these other benefits that I have heard about typically when people move to a more automated cloud approach? How is that working for you?
Pietrewicz: We don’t really have good statistics on it. For the folks that had machines sitting underneath their desks and in closets before, we don’t have a lot of the statistics to know exactly the cost and the time they were spending on that.
Anybody who works with virtualization quickly learns that once you hit a certain size, it becomes significantly less expensive. You become far more agile and you get a huge number of benefits. Some of them are things that you mentioned — the deployment time, DR, the ability to automate, the taking advantage of economies of scale.
Instead of deploying one $10,000 server per application, you’re now loading up 70 machines on a $15,000 server. All of those things come into play. But we really don’t have good statistics, because we didn’t really have any good processes before we started.
What’s interesting now is that our next step in the process is to automate our billing process. Once we do that, we’re going to have everything from our virtual infrastructure deployed into our billing system and either a charge-back or a show-back methodology.
So we’ll have complete detailed costs of all of our infrastructure associated with every department and every application that is using our service. We’ll be able to really show the total cost of ownership (TCO).
Milne: Brian, it sounds like you’re on a path that a lot of our customers are on. What we see typically is that there is a change in consumption behavior when your customers know that they can get IaaS on demand. They stop hoarding resources. The same kind of tools and processes that can automate the delivery of those services can also automate tearing down those services when they’re done.
Virtualization by itself increases capacity utilization quite a bit, but then going to this kind of services delivery, service consumption for infrastructure, actually further increases utilization and drives down over-provisioning.
Adding that cost transparency to that service will further change your consumers’ behavior and the ability to get it when you need it and only pay for what you use drives down the amount of resources that you have to keep in your data center.
Pietrewicz: Absolutely. It’s amazing what happens when you have to pay for something and it’s very visible.
Milne: I always feel that if IT is free that really changes the supply and demand equation, if you study economics. People don’t know what to do with free. They typically take too much.
Pietrewicz: Right. This really starts driving basic economic and social behavior into the equation in IT. It’s a difficult thing for organizations to get their head around, and they’re sort of getting it here at the university. It’s not completely in place. The way that we look at it is as a, “We’ll build it, and they’ll come” kind of thing.
Most folks have figured out that they can really save that money. Instead of going out and buying a $10,000 server, they can buy a $1,000 VM from us that does the exact same thing. If they don’t want it any more, they can turn it off and not pay any more. All of those things come into play.
Another piece on that is the university was experimenting with a thing called reliability centered maintenance (RCM), which is a budgeting process that works toward the bottom line of a particular organization. That means that people have to be transparent and make clear decisions about where they’re spending their money. That’s also starting to drive adoption.
Gardner: We talked about some of the ancillary benefits of your approach, but there are some direct benefits when you go to a cloud model, which gives you more options. You can have your private cloud. You can look to public cloud and other hosting models, and then you can start to see a path or a vision towards a hybrid cloud environment, where you might actually move workloads around based on the right infrastructure approach for the right job at the right time. Any thoughts about where your clouds goals are vis-à-vis the hybrid potential?
Pietrewicz: We have a few things in play that we’re actively working. Today, we have people using various cloud providers. The interesting part about that they’re just paying for it with a credit card out of their department, and the university doesn’t have any clear way of knowing exactly what’s out there. We don’t really have any good security mechanisms in place for determining whether there’s any sensitive data being stored out there inadvertently.
We’re working with a lot of the cloud providers that we are already spending money with and we are already working with to develop consolidated accounts. One, we can save money through economies of scale. And two, we can get some visibility into what folks are actually using the cloud for. And then three, IT would like to act as an adviser to be able to point out for the various cloud providers that are out there — this particular provider is good at functionality or this particular provider is good at security.
The first step is to corral the use of public cloud for UNM and create an escorting process to the cloud. The second step is going to be a hybrid cloud that we’ll set up from our private cloud here on site. We envision setting up hybrid cloud services with those public cloud providers to be able to move the workloads back and forth when necessary.
The other major benefit that we very much look forward to is being able to do DR in the cloud and taking advantage of the ability to replicate data and then spin up systems as you need them, rather than having a couple of million dollars in equipment sitting, waiting, and hoping you never use it. Things that you have to refresh every four years so that you have a viable DR plan.
Gardner: Is vCloud Automation Center something that will be useful in moving to this hybrid model? The one button to push, as it were, on the private cloud, will that become a one button to push in the hybrid model as well?
Pietrewicz: It will. I mentioned those various cloud service providers. Most of them are compatible with the vCloud Connector, so that you can simply just connect up that hybrid cloud service and with a little bit of work, be able to massage your portal.
We can have a menu option of public cloud providers through our portal that they could just select and say that they want to get a vCHS, Amazon, or Terremark, and then potentially move workloads back and forth. So vCAC and vCloud Connector are all at the center of it.
The other interesting piece that we’re working on and going to try to figure out as part of this is that we really want to start looking into NSX and/or VIX to be able to provide very clear security boundaries, basically multi-tenancy, and then potentially be able to move those multi-tenant environments back and forth in the cloud or extend them from public to private cloud as well.
Gardner: Brian, you mentioned multi-tenancy earlier, and of course, there is a lot going on with software-defined data center, networking, and storage. What is it about it that’s interesting to you and why is this a priority for you, software-defined networking (SDN), for example?
Pietrewicz: SDN is the next sort of step in being able to truly automate your IaaS and your virtual environment. If you want to be able to dynamically deploy systems and have them be in a SAN box that is multi-tenant by customer, you really need to have an SDN-type solution, or at least that’s extremely helpful to do that.
One of the things that we are looking at next is to be able to implement something like NSX, so that we can deploy the equivalent of what’s a virtual wire, a multi-tenant environment, to individual customers, so that they can only see their stuff and can’t see their neighbors and vice versa.
The key is the ability to orchestrate that on demand and not have to deal with the legacy VLAN and firewall kind of issues that you have with the legacy environment.
Gardner: It’s interesting how a lot of these major trends — service delivery, cloud, private cloud, DR, and SDN — are interrelated. It’s a complex bundle, but the payoffs, when you do this inclusively, are pretty impressive.
Pietrewicz: Whenever you get to the point of abstracting things to the software level, you provide the ability to automate. When you have the ability to automate, you get tremendous flexibility. That sometimes can be an issue in and of itself, just making decisions on how you want to do something. But along with that flexibility, you get the ability to automate just about anything that you want or need to be able to do.
The second piece to that is that we’re really excited about figuring out, when we build the hybrid cloud model, how we might be able to extend those tenants into the cloud, either as active running workloads or in a DR model, so that the multi-tenancy is retained.
Milne: From VMware’s perspective, that kind of network virtualization capability is critical for our hybrid cloud service. It’s that capability that NSX provides that creates that seamless experience from your data center out to the hybrid cloud.
As you said, Brian, that kind of network configuration, allocation, and reallocation of IP addresses, when you are moving things from one data center to another, is not something you want to do on a manual basis. So NSX is a key component of our hybrid cloud vision. It’s something that lot of the other cloud providers just don’t have.
Pietrewicz: I see it as the next frontier in IT. I think that when SDN starts taking off, it’s going to be a game changer in ways that we are not even recognizing yet, and that’s one example. Moving a workload from one network to another network is extremely powerful.
Gardner: Kurt, this sounds as if not only is Brian transitioning into being a service provider to his constituencies, but now he’s also becoming a cloud broker. Is this typical of what you’re seeing in the market as well?
Milne: It is. Some of our customers will take a step to try to get their arms around shadow IT, users going around IT, to just offer that provisioning option through the IT portal. So it’s like, “You’re using Amazon? That’s fine. We can help you do that.” So putting a button in the service catalog deploys the kind of work that they’ve been doing in a public cloud like Amazon, but it has to come through IT. Then, IT is aware of it.
There’s a saying I like. It’s called the “cloud boomerang.” A lot of times, the IT customers will put thing out in the public cloud, but like a boomerang, it seems to always come back. The customer wants to integrate it with an existing system or they realize that they have to support it up in the cloud. A lot of times, those rogue deployments make their way back to the IT organization. So putting an Amazon service in the vCAC portal and not changing anything else is a nice first step in corralling that.
Pietrewicz: That is exactly what we’re seeing. At a university, because there isn’t really governance, it’s more like build a good service and hope they come. We take the approach of trying to enable it. We want to make it very transparent and say that they can use Amazon or vCHS, but there’s a better way to do it. If you do it through the portal, you may be able to move those workloads back and forth.
We are actually seeing exactly what you mentioned, Kurt. Folks are reaching the limitations of using some of the cloud providers, because they need to get access to data back here at UNM and are actually doing the boomerang approach. They started out there and now they’re migrating their machines into our IaaS so that they can get access to the data that they need.
Gardner: Kurt, we heard some very interesting things at VMworld recently around the cloud-management platform. Why don’t you tell us a little bit about that and how that fits into what we’ve been discussing in terms of this ongoing maturity and evolution that a large organization like the University of New Mexico is well into?
Over the years, VMware has either built or acquired quite a few different management products. We’ve combined those products into a number of suites, like our automation, operations, and our business management suites. Now, we’re taking that next step and combining a lot of those capabilities into a single platform.
There are a couple of guiding ideas there. We see in organizations like Brian’s is that the lines between the automated provisioning of those workloads automation, provisioning those workloads, and the ongoing operations and maintenance and support of those workloads, is really starting to blur.
So you have automation tasks that might happen when you’re doing a support call. Maybe you want to provision some more resources, and there are operations tasks like checking system health that you might want to do as a step in an automation routine.
Our product strategy change is to move toward a shared-services model, similar to a service-oriented architecture. The different services that are underlying our management products would be executable through a tool like vCAC, through a command line interface, or through like a REST API. There’s kind of a mix-and-match opportunity to execute those services in different ways.
To build that platform with the shared service model on top, we need to start re-architecting some of our products in the back-end, so that we have a common orchestration engine, a common DR backup and a common policy engine. You don’t want one tool to undo the work that another tool did yesterday. You can’t have conflicting robots going out and doing automated tasks.
The general idea is to try to further consolidate these different management functions into a single platform. The overall goal is to try to help organizations maintain control, but then also increase flexibility and speed for their business users.
Gardner: Brian, is that something that you think is going to be on your radar? Is management so distributed now that you’re looking for a more consolidated approach that’s inclusive?
Pietrewicz: That would be wonderful. We’re doing things many different ways. If you take the example of orchestration, we are using Orchestrator, PowerShell, Perl, and starting to experiment with Puppet.
It would be really good if you could have one standardized way that you approach orchestration, as an example, and how that might tie into all the other pieces for back-end management, rather than handling it several different ways. As Kurt was mentioning, one part starts to step on another part. Having that be consolidated and consistent would be a huge value.
Milne: The other part of the strategy is also to make that work across environments. So the same tools and services would be available if you are provisioning up to Amazon or to your private cloud or hybrid cloud service, and even different hypervisors.
We’re fully aware of the heterogeneous nature of the modern data center. So we’re shifting to try to create that kind of powerful common management stack with that unified management experience across all of the environment. It’s kind of a nirvana. When we talk to people, they say that’s exactly what they want. So our vision is to kind of march towards delivering on that.
Gardner: Kurt, I am trying to recall from VMworld whether this was offered on-premises, as a service from a cloud, or some combination?
Milne: That’s the other interesting part of this. We’re starting to go down the path of offering a number of our management products as a service. For example, at VMworld, we announced the availability of a beta for our vCAC product as a software as a service (SaaS), so you can without installing any software get a service portal, get that workflow and policy engine, and deploy infrastructure services across different environments.
We’ll be rolling out betas for our other products in subsequent quarters over the next year or so. Then potentially we could have the SaaS services interact with and combine with the services that are available through the products that are installed on-premise. Our goal is to get these out there and then understand what the best use cases are, but that kind of mix and match is part of the vision.
Gardner: It’s interesting. We might have a reverse boomerang when it comes to the management of all of this. Does that sound appealing Brian? Is that something you would look to as a cloud service, comprehensive management?
Pietrewicz: Absolutely, but it’s largely dependent on return on investment (ROI). It’s that balance of, when you get to a certain level in an IT shop, it’s sometimes cheaper to do things in-house than it is to outsource it, and sometimes not. You have to do the analysis on the ROI on what makes more sense to bring it in or to use a SaaS.
As an example, we completely outsourced all of our email, because it’s a lot of work. It’s very simple and easy to do as a SaaS solution, but it’s a lot more work to do in-house. It’s definitely something that we would look into.
Milne: In a mid-sized organization that might have 300 different applications that the IT organization supports, maybe 50 of those are IT tools. Already we’ve seen progress with companies like ServiceNow that have a SaaS-based service desk. It makes sense to start to turn more of those management products into a SaaS delivery model.
Gardner: Brian, any thoughts about others who are starting to move in your direction, perhaps their own Lobo Cloud, their own portal rationalizing these services, being able to measure them better. What in 20/20 hindsight do you have that you could recommend for them as they go about this? Any learned lessons you could share?
Pietrewicz: The biggest lesson learned, without a doubt, is the focus on the process orientation, the ITIL model. The technology is really not that hard. It’s determining what your service is, what are you trying to deliver, and then how do you build that into a consistently delivered service, complete with SLAs and service descriptions that meet the customer needs. That’s the most difficult part.
The technical folks can definitely sling the technology. That doesn’t seem to be that big of a deal. The partners and providers do a very good job of putting together products that make it happen, but the hard part is defining the processes and defining the services and making sure that they are meeting the customer needs.
Gardner: Kurt, any thoughts in reaction to what Brian said in terms of getting started on the right path around cloud rationalization of your IT organization?
Milne: One of the things that I’ve seen is a lot of organizations go through this process that Brian has described, trying to clearly define their services and figure out which parts of those services they’re going to automate.
A lot of organizations start that service definition effort from an inside-out perspective, get a bunch of IT guys together, and try to define what you do on a daily basis in a service. That’s hard.
The easier approach is just to go talk to your customers and users and ask, “If I were going to give you a button you could click to get what you need, what would you put behind the button?” Then, you define your services more from an outside-in perspective. It seems to be where companies get anyway and you just shortcut a lot of teeth gnashing and internal meetings when you do it that way.
You may also be interested in:
- A Gift That Keeps Giving, Software-Defined Storage Now Demonstrates Architecture-Wide Benefits
- The Open Group panel: Internet of things poses massive opportunities and obstacles
- Navicure Gains IT Capacity Optimization and Performance Monitoring Using VMware vCenter Operations Manager
- How Waste Management builds a powerful services continuum across IT operations, infrastructure, development, and processes
- VMware vCloud Hybrid Service Powers Journey to Zero-Cost Applications Support for City of Melrose
- Millennium Pharmacy Takes SaaS Model to New Heights VIA Policy-Driven Operations Management and Automation
- Thomas Duryea Consulting provides insights into how leading adopters successfully solve cloud risks
- Indiana health care provider goes fully virtualized, gains head start on BYOD and DR benefits
- AT&T Cloud Services Built on VMware Cloud Datacenter Meet Evolving Business Demands for Advanced IaaS