The next BriefingsDirect big-data use case leadership discussion explores how retail luxury goods market analysis provider Sky I.T. Group has upped its game to provide more buyer behavior analysis faster — and with more user depth.
Learn how Sky I.T. changed its data analysis platform infrastructure to Hewlett Packard Enterprise (HPE) Vertica — and why that has helped solve its challenges around data variety, velocity, and volume and make better insights available across the luxury retail marketplace.
To share how retail intelligence just got a whole lot smarter, we welcome Jay Hakami, President; Dane Adcock, Vice President of Business Development, and Stephen Czetty, Vice President and Chief Technology Officer, all at Sky I.T. Group in New York. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: What’s driving the need for greater and better big-data analysis for luxury retailers? Why do they need to know more, better, faster?
Adcock: Well, customers have more choices. As a result, businesses need to be more agile and responsive and fill the customer’s needs more completely or lose the business. That’s driving the entire industry into practices that mean shorter times from design to shelf in order to be more responsive.
It has created a great deal of gross margin pressure, because there’s simply more competition and more choices that a consumer can make with their dollar today.
Gardner: Is there anything specific to the retail process around luxury goods that is even more pressing when it comes to this additional speed?
Adcock: Yes. The downside to making mistakes in terms of designing a product and allocating it in the right amounts to locations at the store level carries a much greater penalty, because it has to be liquidated. There’s not a chance to simply cut back on the supply chain side, and so margins are more at risk in terms of making the mistake.
Ten years ago, from a fashion perspective, it was about optimizing the return and focusing on winners. Today, you also have to plan to manage and optimize the margins on your losers as well. So, it’s a total package.
Gardner: So, clearly, knowing more about what those users are doing, or what they have done, is going to be essential. It seems to me, though, that we’re talking about a market-wide look rather than just one store, one retailer, or one brand.
How does that work, Jay? How do we get to the point where we’ve been able to gather information at a fairly comprehensive level, rather than cherry-picking or maybe getting a non-representative look based on only one organization’s view into the market?
Hakami: With SKYPAD, what we’re doing is collecting data from the supplier, from the wholesaler, as well as from their retail stores, their wholesale business, and their dot-com, meaning the whole omnichannel. When we collect that data, we cleanse it to make sure it’s meaningful to the user.
Now, we’re dealing with a connected world where the retailer, wholesalers, and suppliers have to talk to one another and plan together for the buying season. So the partnerships, and the insight they get into product performance, are extremely important, as Dane mentioned, in terms of the gross margin and in terms of the sell-through information. SKYPAD basically provides that intelligence, that insight, into this retail/wholesale world.
Gardner: Isn’t this also a case where people are opening up their information and making it available for the benefit of a community or recognizing that the more data and the more analysis that’s available, the better it is for all the participants, even if there’s an element of competition at some point?
Hakami: That’s correct. The retail business likes to share the information with their suppliers, but they’re not sharing it across all the suppliers. They’re sharing it with each individual supplier. Then, you have the market research companies who come in and give you aggregation of trends and so on. But the retailers are interested in sell-through. They’re interested in telling X supplier, “This is how your products are performing in my stores.”
If they’re not performing, then there’s going to be a markdown. There’s going to be less of a margin for you and for us. So, there’s a very strong interest between the retailer and a specific supplier to improve the performance of the product and the sell-through of those products on the floor.
Gardner: Before we learn more about the data science and dealing with the technology and business case issues, tell us a little bit more about Sky I.T. Group, how you came about, and what you’re doing with SKYPAD to solve some of these issues across this entire supply chain and retail marketplace.
Hakami: I’ll take the beginning. I’ll give you a little bit of the history, Dana, and then maybe Dane and Stephen can jump in and tell you what we are doing today, which is extremely complex and interesting at the same time.
We started with SKYPAD about eight years ago. We found a pain point within our customers where they were dealing with so many retailers, as well as their own retail stores, and not getting the information that they needed to make sound business decisions on a timely basis.
We started with one customer, which was Theory. We came to them and we said, “We can give you a solution where we’re going to take some data from your retailers, from your retail stores, from your dot-com, and bring it all into one dashboard, so you can actually see what’s selling and what’s not selling.”
Fast forward, we’ve been able to take not only EDI transactions, but also retail portals. We’re taking information from any format you can imagine — from Excel, PDF, merchant spreadsheets — bringing that wealth of data into our data warehouse, cleansing it, and then populating the dashboard.
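The collect-and-cleanse step described above can be sketched roughly as follows. This is a minimal illustration, not SKYPAD’s proprietary parsing: it assumes hypothetical per-retailer CSV exports whose column headers (`UPC`, `item_upc`, and so on) are mapped onto one common schema, and it drops rows that are missing a field or carry non-numeric quantities.

```python
import csv
import io

# Hypothetical per-retailer header maps: retailer column name -> common field name.
HEADER_MAPS = {
    "retailer_a": {"UPC": "upc", "Store #": "store", "Units": "units_sold", "OH": "on_hand"},
    "retailer_b": {"item_upc": "upc", "loc": "store", "sales_qty": "units_sold", "inv_qty": "on_hand"},
}

COMMON_FIELDS = ("upc", "store", "units_sold", "on_hand")


def normalize(retailer: str, raw_csv: str) -> list:
    """Parse one retailer's CSV export into common-schema records, skipping bad rows."""
    header_map = HEADER_MAPS[retailer]
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Keep only mapped columns; missing trailing values come back as None.
        rec = {header_map[k]: (v or "").strip() for k, v in row.items() if k in header_map}
        # Cleansing rule 1: every common field must be present and non-empty.
        if any(not rec.get(f) for f in COMMON_FIELDS):
            continue
        # Cleansing rule 2: quantities must be whole numbers.
        try:
            rec["units_sold"] = int(rec["units_sold"])
            rec["on_hand"] = int(rec["on_hand"])
        except ValueError:
            continue
        rec["retailer"] = retailer
        records.append(rec)
    return records
```

For example, `normalize("retailer_a", "UPC,Store #,Units,OH\n0001,12,3,5\n0002,12,,4\n")` keeps only the first row, because the second has no unit count. Real feeds such as EDI transactions, retail portals, and PDFs would each need their own parser in front of a step like this.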
So today, SKYPAD gives users a wealth of information simply because they don’t have to go retailer by retailer to get it. That’s what we do, and on Monday morning we give them the information they need to make decisions.
Dane, can you elaborate more on this as well?
Adcock: This process has evolved from a time when EDI was easy, because it was structured, but it was also limited in the number of metrics it provided. As these business intelligence (BI) tools have become more popular, the distribution of data coming from the retailers has become more ubiquitous and broader in terms of the metrics.
But the challenge has moved from reporting to identifying all these data sources, communication methodologies, and different formats. These can change from week to week, because they’re being sent by individuals, rather than systems, as Excel spreadsheets and PDF files. Sometimes, they come from multiple sources at the same retailer.
One of our accounts would like to see all of their data together, so they can see trends across categories and different geographies and markets. The challenge is to bring all those data sources together and align them to their own item master file, rather than the retailer’s item master file, and then be able to understand trends, which accounts are generating the most profits, and what strategies are the most profitable.
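Aligning retailer data to the brand’s own item master, rather than the retailer’s, amounts to a keyed lookup. The sketch below assumes, hypothetically, that each row carries a UPC and that each brand keeps a master keyed by UPC with its own style, color, and size. Unmatched UPCs are returned separately so they can be audited instead of silently lost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterItem:
    """One entry in a brand's own (hypothetical) item master file."""
    style: str
    color: str
    size: str


def align_to_master(rows, item_master):
    """Attach the brand's own style/color/size to each retailer row, keyed by UPC.

    Returns (aligned_rows, unmatched_rows) so unknown UPCs can be reviewed.
    """
    aligned, unmatched = [], []
    for row in rows:
        item = item_master.get(row["upc"])
        if item is None:
            unmatched.append(row)
            continue
        aligned.append({**row, "style": item.style, "color": item.color, "size": item.size})
    return aligned, unmatched
```

Once every retailer’s rows carry the brand’s own style and color, trends can be compared across accounts, categories, and geographies on a single key.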
It’s been a shifting model from the challenge of reporting all this data together, to data collection. And there’s a lot more of it today, because more retailers report at the UPC level, size level, and the store level. They’re broadcasting some of this data by day. The data pours in, and the quicker they can make a decision, the more money they can make. So, there’s a lot of pressure to turn it around.
Gardner: When you’re putting out those reports on Monday morning, do you get queries back? Is this a sort of a conversation, if you will, where not only are you presenting your findings, but people have specific questions about specific things? Do you allow for them to do that, and is the data therefore something that’s subject to query?
Subject to queries
Adcock: It’s subject to queries in the sense that they’re able to do their own discovery within the data. In other words, we put it in a BI tool, it’s on the web, and they’re doing their own analysis. They’re probing to see what their best styles are. They’re trying to understand how colors are moving, and they’re looking to see where they’re low on stock, where they may be able to backfill in the marketplace, and trying to understand what attributes are really driving sales.
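The kind of probing Adcock describes, finding the best styles and seeing how colors are moving, often starts from a sell-through figure. This is a minimal sketch, assuming rows already aligned to the item master with hypothetical `units_sold` and `on_hand` fields, and using one common definition of sell-through (units sold divided by units sold plus units on hand); retailers and brands define it in different ways.

```python
from collections import defaultdict


def sell_through_by(rows, keys=("style", "color")):
    """Aggregate rows by the given attributes and compute sell-through per group.

    Definition used here: units sold / (units sold + units on hand).
    """
    sold = defaultdict(int)
    on_hand = defaultdict(int)
    for row in rows:
        group = tuple(row[key] for key in keys)
        sold[group] += row["units_sold"]
        on_hand[group] += row["on_hand"]
    rates = {}
    for group in sold:
        total = sold[group] + on_hand[group]
        rates[group] = sold[group] / total if total else 0.0
    return rates
```

Sorting the result, for example with `sorted(rates.items(), key=lambda kv: kv[1], reverse=True)`, gives a best-to-worst ranking of style and color combinations.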
But of course, they always have questions about completeness of the data. When things don’t look correct, they have questions about it. That drives us to do analysis on the fly, on demand, and deliver responses such as, “All your stores are there, all of your locations, everything looks normal.” Or perhaps there seem to be some flaws in the data that don’t actually look correct.
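An answer like “all your stores are there” can be grounded in a simple set comparison. This hypothetical check compares the store locations a client expects from a retailer with those actually present in the week’s rows; the `store` field name is an assumption.

```python
def audit_completeness(expected_stores, rows):
    """Report expected stores missing from a week's data, and unexpected extras."""
    received = {row["store"] for row in rows}
    expected = set(expected_stores)
    return {
        "missing": sorted(expected - received),     # stores that did not report
        "unexpected": sorted(received - expected),  # stores not on the client's list
        "complete": expected <= received,           # True when nothing is missing
    }
```

An empty `missing` list supports the “everything looks normal” reply; a non-empty one tells the analyst exactly which locations to chase.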
Not only do we need to organize it and provide it to them so that they can do their own broad, flexible analysis, but they’re coming back to us with questions about how their data was audited. And they’re looking for us to do the analysis on the spot and provide them with satisfactory answers.
Gardner: Stephen Czetty, we’ve heard about the use case, the business case, and how this data challenge has grown in terms of variety as well as volume. What do you need to bring to the table from the data architecture to sustain this growth and provide for the agility that these market decision-makers are demanding?
Czetty: We started out with an abacus, in a sense, but today we collect information from thousands of sources literally every single week. Close to 9,000 files come across to us, and we process them correctly and sort them out by which client they belong to and so forth, but the challenge is forever growing.
We needed to move from older technology to newer technology, because our data volumes are increasing while the amount of time we have to process that data is fixed.
So we’re quite aware that we have a time limit. We found HPE Vertica as a platform for us to be able to collect the data into a coherent structure in a very rapid time as opposed to our legacy systems.
It allows us to treat the data in a truly vertical way, although that has nothing to do with the application or the database itself. In the past we had to deal with each client separately. Now we can deal with each retailer separately and just collect their data for every single client that we have. That makes our processes much more pipelined and far faster in performance.
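Processing by retailer rather than by client means one retailer feed is loaded once and then split out to every client it contains. The sketch assumes, hypothetically, that each row carries the retailer’s vendor number (`vendor_id`) and that a vendor number maps to exactly one client. Rows with unknown vendors are set aside because they are not ours.

```python
from collections import defaultdict


def fan_out_by_client(retailer_rows, vendor_to_client):
    """Split one retailer's normalized rows into per-client batches.

    vendor_to_client: hypothetical mapping of the retailer's vendor numbers to clients.
    Returns (per_client_batches, rows_with_unknown_vendor).
    """
    per_client = defaultdict(list)
    not_ours = []
    for row in retailer_rows:
        client = vendor_to_client.get(row["vendor_id"])
        if client is None:
            not_ours.append(row)
        else:
            per_client[client].append(row)
    return dict(per_client), not_ours
```

The design point is that the expensive part, understanding one retailer’s feed, is done once per retailer, while adding a client only adds entries to the mapping.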
The secret sauce behind that is the ability in our Vertica environment to rapidly sort out the data — where it belongs, who it belongs to — calculate it out correctly, put it into the database tables that we need to, and then serve it back to the front end that we’re using to represent it.
That’s why we’ve shifted from a traditional database model to a Vertica-type model. It’s 100 percent SQL for us, so it looks the same for everybody who is querying it, but under the covers we get tremendous performance and compression and lots of cost savings.
Gardner: For some organizations that are dealing with the different sources and different types of data, cleansing is one problem. Then, the ability to warehouse that and make it available for queries is a separate problem. You’ve been able to tackle those both at the same time with the same platform. Is that right?
Czetty: That’s correct. We get the data, and we have proprietary parsers for every single data type that we get. There are a couple of hundred of them at this point. But all of that data, after parsing, goes into Vertica. From there, we can very rapidly figure out what is going where and what is not going anywhere, because it’s incomplete or it’s not ours, which happens, or it’s not relevant to our processes, which happens.
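The parse-then-sort flow Czetty outlines can be pictured as a registry of per-retailer, per-format parsers followed by a triage pass. Everything here is a hypothetical stand-in: the registry, the toy comma-separated parser, and the required fields are illustrative only. The sketch covers two of the rejection reasons mentioned, incomplete rows and rows that are not ours, plus files for which no parser exists yet.

```python
from typing import Callable

# Hypothetical registry: (retailer, file format) -> parser returning row dicts.
Parser = Callable[[bytes], list]
PARSERS: dict = {}

REQUIRED = {"upc", "store", "units_sold"}


def register(retailer: str, fmt: str):
    """Decorator that registers a parser for one retailer's file format."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[(retailer, fmt)] = fn
        return fn
    return wrap


@register("retailer_a", "csv")
def parse_retailer_a_csv(payload: bytes) -> list:
    """Toy parser for headerless 'upc,store,units' lines; blank fields are dropped."""
    rows = []
    for line in payload.decode().splitlines():
        row = dict(zip(("upc", "store", "units_sold"), line.split(",")))
        rows.append({k: v for k, v in row.items() if v})
    return rows


def triage(retailer: str, fmt: str, payload: bytes, known_upcs: set) -> dict:
    """Parse a file with its registered parser and sort rows into buckets."""
    buckets = {"unparsed": False, "loaded": [], "incomplete": [], "not_ours": []}
    parser = PARSERS.get((retailer, fmt))
    if parser is None:
        buckets["unparsed"] = True  # new retailer or new format: a parser must be written
        return buckets
    for row in parser(payload):
        if not REQUIRED.issubset(row):
            buckets["incomplete"].append(row)
        elif row["upc"] not in known_upcs:
            buckets["not_ours"].append(row)
        else:
            buckets["loaded"].append(row)
    return buckets
```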
We can sort out what we’ve collected very rapidly and then integrate it with the information we already have, or insert new information if it’s brand-new. Prior to this, we’d been doing this by hand to a large extent, and that’s no longer effective with our number of clients growing.
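Integrating new data with what is already loaded is essentially an upsert. This minimal in-memory sketch assumes a hypothetical natural key of client, retailer, store, UPC, and week, and treats a row whose key already exists as a restatement that replaces the old values. In Vertica itself this would more likely be expressed in SQL, for example with a `MERGE` statement.

```python
def merge_week(existing: dict, incoming: list) -> tuple:
    """Merge a batch of rows into an in-memory store.

    Key (assumed): (client, retailer, store, upc, week). An existing key is
    overwritten as a restatement; a new key is inserted.
    Returns (inserted_count, updated_count).
    """
    inserted = updated = 0
    for row in incoming:
        key = (row["client"], row["retailer"], row["store"], row["upc"], row["week"])
        if key in existing:
            updated += 1
        else:
            inserted += 1
        existing[key] = row
    return inserted, updated
```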
Gardner: I’d like to hear more about what your actual deployment is, but before we do that, let’s go back to the business case. Dane and Jay, when HPE Vertica came online, when Steve was able to give you some of these more pronounced capabilities, how did that translate into a benefit for your business? How did you bring that out to the market, and what’s been the response?
Hakami: I think the first response was “wow.” And I think the second response was, “Wow, how can we do this fast and move quickly to this platform?”
Let me give you some examples. When Steve did the proof of concept (POC) with the folks from HPE, we were very impressed with the statistics we had seen. In other words, going from a processing time of eight or nine hours to minutes was a huge advantage that we saw from the business side, showing our customers that we can load data much faster.
The ability to use less hardware and infrastructure as a result of the architecture of Vertica allowed us to reduce, and to continue to reduce, the cost of infrastructure. These two are the major benefits that I’ve seen in the evolution of us moving from our legacy to Vertica.
From the business perspective, if we’re able to deliver faster and more reliably to the customer, we accomplished one of the major goals that we set for ourselves with SKYPAD.
Adcock: Let me add something there. Jay is exactly right. The real impact, as it translates into the business, is that we have to stop processing and stop collecting data at a certain point in the morning and start processing it in order for us to make our service-level agreements (SLAs) on reporting for our clients, because they start their analysis. The retail data comes in staggered over the morning and it may not all be in by the time that we need to shut that processing off.
One of the things that moving to Vertica has allowed us to do is to cut that time off later, and when we cut it off later, we have more data, as a rule, for a customer earlier in the morning to do their analysis. They don’t have to wait until the afternoon. That’s a big benefit. They get a much better view of their business.
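The effect of a later cutoff can be shown with a small partition of file arrival times. This sketch assumes, for illustration, that each incoming file has a known arrival timestamp; files arriving after the cutoff simply wait for the next run.

```python
from datetime import datetime, time


def split_at_cutoff(arrivals, run_date, cutoff: time):
    """Partition (file_name, arrival_datetime) pairs into this run vs. deferred."""
    limit = datetime.combine(run_date, cutoff)
    included = [name for name, ts in arrivals if ts <= limit]
    deferred = [name for name, ts in arrivals if ts > limit]
    return included, deferred
```

With files arriving at 5:30, 6:45, and 7:50, a 6:00 cutoff processes one file and a 7:00 cutoff processes two, so clients see more of their data in the morning run. Faster loading is what makes the later cutoff affordable within the same reporting SLA.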
Driving more metrics
The other thing that it has enabled us to do is drive more metrics into the database and do some processing in the database, rather than in the user tool, which makes the user tool faster and it provides more value.
For example, maybe for age on the floor, we can do the calculation in the background, in the database, and it doesn’t impede the response in the front-end engine. We get more metrics in the database calculated rather than in our user tool, and it becomes more flexible and more valuable.
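Pushing a metric such as age on the floor into the database can be as simple as defining it in SQL, so the front end only reads a column. SQLite stands in for Vertica here purely so the sketch runs anywhere; the table names, columns, and the definition of age (days between a store’s first receipt of an item and the week-ending date) are assumptions.

```python
import sqlite3

# In-memory SQLite as a stand-in for the warehouse; only the idea carries over:
# the metric is computed in SQL, before the BI tool ever sees the data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (store TEXT, upc TEXT, first_received TEXT);
CREATE TABLE weekly_sales (store TEXT, upc TEXT, week_ending TEXT, units_sold INTEGER);
CREATE VIEW sales_with_age AS
SELECT s.store, s.upc, s.week_ending, s.units_sold,
       CAST(julianday(s.week_ending) - julianday(r.first_received) AS INTEGER)
           AS age_on_floor_days
FROM weekly_sales AS s
JOIN receipts AS r ON r.store = s.store AND r.upc = s.upc;
""")
conn.executemany("INSERT INTO receipts VALUES (?, ?, ?)", [("NY1", "0001", "2024-01-01")])
conn.executemany("INSERT INTO weekly_sales VALUES (?, ?, ?, ?)", [("NY1", "0001", "2024-01-14", 5)])

# The front end just selects the precomputed column.
rows = conn.execute("SELECT store, upc, age_on_floor_days FROM sales_with_age").fetchall()
```

A view computes the value at query time inside the database; a pipeline could instead materialize the column during the load so that queries do no date arithmetic at all.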
Gardner: So not only are you doing what you used to do faster, better, cheaper, but you’re able to now do things you couldn’t have done before in terms of your quality of data and analysis. Is there anything else that is of a business nature that you’re able to do vis-à-vis analytics that just wasn’t possible before, and might, in fact, be the equivalent of a new product line or a new service for you?
Czetty: In the old model, when we got a new client we had to essentially recreate the processes that we’d built for other clients to match that new client, because we were collecting that data just for that client at that moment.
So 99 percent of it is the same as any other client, but one percent is always different, and it had to be built out. On-boarding a client, as we call it, took us a considerable amount of time — we are talking weeks.
In the current model, where we’re centered on retailers, the only thing that will take us a long time to do in this particular situation is if there’s a new retailer that we’ve never collected data from. We have to understand their methodology of delivery, how it comes, how complex it is and so forth, and then create the logic to load that into the database correctly to match up with what we are collecting for others.
In this scenario, since we’ve got so many clients, very few new stores or new retailers show up; typically a new client sells through retail chains we already collect from. Our on-boarding is therefore simplified, because if we are getting Nordstrom’s data from client A, we’re getting the same exact data for clients B, C, D, E, and F.
Now, it comes through a single funnel and it’s the Nordstrom funnel. It’s just a lot easier to deal with, and on-boarding comes naturally.
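The on-boarding advantage of a retailer-centric pipeline can be expressed as a short check: for a new client, only retailers without existing load logic require engineering work. The function name, the vendor-number mapping, and the retailer names below are hypothetical.

```python
def onboard_client(client, vendor_ids_by_retailer, supported_retailers, vendor_to_client):
    """Hypothetical on-boarding step for a retailer-centric pipeline.

    For each retailer the new client sells through, record the client's vendor
    number so the existing retailer feed fans rows out to this client too.
    Returns the retailers that still need new load logic written.
    """
    gaps = []
    for retailer, vendor_id in vendor_ids_by_retailer.items():
        if retailer in supported_retailers:
            vendor_to_client[(retailer, vendor_id)] = client  # reuse the existing funnel
        else:
            gaps.append(retailer)  # never collected from before: needs a parser
    return sorted(gaps)
```

In the common case the returned list is empty, and on-boarding reduces to configuration rather than weeks of rebuilding processes.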
Hakami: In addition to that, since we’re adding more significant clients, the ability to increase variety, velocity, and volume is very important to us. We couldn’t scale without having Vertica as a foundation for us. We’d be standing still, rather than moving forward and being innovative, if we stayed where we were. So this is a monumental change and a very instrumental change for us going forward.
Gardner: Steve, tell us about your actual deployment. Is this a single tenant environment? Are you on a single database? What’s your server or data center environment? What’s been the impact of that on your storage and compression and costs associated with some of the ancillary issues?
Czetty: To begin with, we’re coming from a multi-tenant environment. Every client had its own private database in the past, because in IBM DB2, we couldn’t add all these clients into one database and get the job done. There was not enough horsepower to do the queries and the loads.
We ran a number of databases on a farm of servers, on Rackspace as our hosting system. When we brought in Vertica, we put up a minimal configuration with three nodes, and we’re still living with that minimal configuration with three nodes.
We haven’t exhausted our capacity on the license by any means whatsoever in loading up this data. The compression is obscenely high for us, because at the end of the day, our data absolutely lends itself to being compressed.
Everything repeats over and over again every single week. In the world of Vertica, that means a repeated value is effectively stored only once wherever it lives in the database, and the rest is magic. Without getting into the underlying technology, from our perspective it’s just very effective in that scenario.
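The compression Czetty describes comes from column-oriented storage of highly repetitive data. Vertica offers several column encodings, including run-length encoding (RLE); the sketch below is only a conceptual illustration of RLE, not Vertica’s implementation, showing how a column of repeated values collapses into a few (value, count) pairs.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column: consecutive repeats become (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) runs back into the original column."""
    return [value for value, count in runs for _ in range(count)]
```

Because runs only form across adjacent values, sort order matters: a store column sorted by store turns thousands of weekly rows into one run per store, which is why repetitive retail data compresses so well in this kind of layout.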
Also in our IBM DB2 world, we’re using quite costly large SAN configurations with lots of spindles, so that we can have the data distributed all across the spindles for performance on DB2, and that does improve the performance of that product.
However, in HPE Vertica, we have 600 GB drives and we can just pop more in if we need to expand our capacity. With the three nodes, we’ve had zero problems with performance. It hasn’t been an issue at all. We’re just looking back and saying that we wish we had this a little sooner.
Vertica came in and did the install for us initially. Then, we ended up taking those servers down and reinstalling it ourselves. With a little information from the guide, we were able to do it. We wanted to learn it for ourselves. That took us probably a day and a half to two days, as opposed to Vertica doing it in two hours. But other than that, everything is just fine. We’ve had a little training, we’ve gone to the Vertica event to learn how other people are dealing with things, and it’s been quite a bit of fun.
Now there is a lot of work we have to do at the back end to transform our processes to this new methodology. There are some restrictions on how we can do things, updates and so forth. So, we had to reengineer that into this new technology, but other than that, no changes. The biggest change is that we went vertical on the retail silos. That’s just a big win for us.
Gardner: As you know, HPE Vertica is cloud-ready. Is there any benefit to that further down the road where maybe it’s around issues of a spike demand in holiday season, for example, or for backup recovery or business continuity? Any thoughts about where you might leverage that cloud readiness in the future?
Czetty: We’re already sort of in the cloud with the use of dedicated servers, but in our business, the holiday increase in store volume doesn’t double the volume. It adds 10 percent, 15 percent, maybe 20 percent for the holiday season. It hasn’t been that big a problem in DB2, so it’s certainly not going to be a problem in Vertica.
We’ve looked at virtualization in the cloud, but given the size of the hardware we want to run, we want to take advantage of the speed, the memory, and everything else. We put up pretty robust servers ourselves, and it turns out that in secure cloud environments like the one we’re using right now at Rackspace, it’s simply less expensive to run dedicated equipment. Spinning up another node for us at Rackspace would take about the same time as setting up and configuring a virtual system: a day or so. They can give us another node just like this on our rack.
We looked at the cloud financially every single time that somebody came around and said there was a better cloud deal, but so far, owning it seems to be a better financial approach.
Gardner: Before we close out, looking to the future, I suppose the retailers are only going to face more competition. They’re going to get more demand from their customers for better user experiences and for information.
We’re going to see more mobile devices that will be used in a dot-com world or even a retail world. We are going to start to see geolocation data brought to bear. We’re going to expect the Internet of Things (IoT) to kick in at some point where there might be more sensors involved either in a retail environment or across the supply chain.
Clearly, there’s going to be more demand for more data doing more things faster. Do you feel like you’re in a good position to do that? Where do you see your next challenges from the data-architecture perspective?
Czetty: Not to disparage the luxury industry too much, but at this point they’re not on the bleeding edge on the data collection and analysis side, whereas they are on the bleeding edge on social media and so forth. We’ve anticipated that. We’ve got some clients who were collecting information about their web activities, and we have done analysis to identify customers who present different personas through the different channels they use to contact the company.
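Recognizing one customer behind several personas is a form of identity resolution. As a hypothetical sketch, not the analysis Sky I.T. actually ran, records from different channels can be linked whenever they share an identifier such as an email address or phone number, using a union-find structure so that links carry through transitively.

```python
def link_personas(records):
    """Group customer records that share any identifier, directly or transitively.

    records: {record_id: set of identifier strings, e.g. "email:..." or "phone:..."}
    Returns a sorted list of sorted groups of record ids.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # identifier -> first record id that carried it
    for rid, identifiers in records.items():
        for ident in identifiers:
            if ident in first_seen:
                union(first_seen[ident], rid)
            else:
                first_seen[ident] = rid

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(g) for g in groups.values())
```

For example, a web account and an in-store record sharing an email address, plus an app profile sharing that in-store record’s phone number, end up in one group even though the web account and the app profile share nothing directly.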
We’re dabbling in that area, and that’s going to grow as the interfaces become more tablet-oriented and phone-oriented. In the future, a lot of sales are potentially going to go through social media, and not just the official websites.
We’ll be capturing that information as well. We’ve got some experience with that kind of data that we’ve done in the past. So, this is something I’m looking forward to getting more of, but as of today, we’re only doing it for a few clients.
Hakami: In terms of planning, we’re very well-positioned as a hub between the wholesaler and the retailer, the wholesaler and their own retail stores, as well as the wholesaler and their dot-coms. One of the things that we are looking into, and this is going to probably get more oxygen next year, is also taking a look at the relationships and the data between the retailer and the consumer.
As you mentioned, this is a growing area, and the retailers are looking to capture more of the consumer information so they can target-market to them, not based on segment but based on individual preferences. This is again a huge amount of data that needs to be cleansed, populated, and then presented to the CMOs of companies to be able to sell more, market more, and be in front of their customers much more than ever before.
Gardner: That’s a big trend we are seeing in many different sectors of the economy, that drive for personalization, and it really is these data technologies that allow it to happen.
Any other thoughts about where the intersection of computer science capabilities and market intelligence demands are coming together in new and interesting ways?
Adcock: I’m excited about the whole approach to leveraging some predictive capabilities alongside the great inventory of data that we’ve put together for our clients. It’s not just about creating better forecasts of demand, but optimizing different metrics, using this data to understand when product should be marked down, what types of attributes of products seem to be favored by different locations of stores that are obviously alike in terms of their shopper profiles, and bringing together better allocations and quantities in breadth and depth of products to individual locations to drive better, higher percentage of full-price selling and fewer markdowns for our clients.
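A very simple stand-in for the markdown question is to project season-end sell-through from the current sales rate and flag items that fall short. This is a naive linear projection with a hypothetical 80 percent target, not a predictive model of the kind Adcock has in mind; it only shows the shape of the calculation.

```python
def markdown_candidates(items, weeks_left, target_sell_through=0.8):
    """Flag items whose projected season-end sell-through misses a target.

    Naive projection (assumption): the recent weekly sales rate continues
    unchanged for the remaining weeks, capped by units on hand.
    items: {item_id: (units_sold_to_date, units_on_hand, recent_weekly_rate)}
    Returns {item_id: projected_sell_through} for flagged items.
    """
    flagged = {}
    for item_id, (sold, on_hand, rate) in items.items():
        received = sold + on_hand
        if received == 0:
            continue
        projected_sold = sold + min(on_hand, rate * weeks_left)
        projected = projected_sold / received
        if projected < target_sell_through:
            flagged[item_id] = round(projected, 3)
    return flagged
```

With four weeks left, an item that has sold 20 of 100 units at 5 a week projects to 40 percent and is flagged, while one that has sold 60 of 100 at 10 a week projects to sell out.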
So it’s a predictive side, rather than discovery using a BI tool.
Czetty: Just to add to that, there’s the margin. When we talked to CEOs and CFOs five or six years ago and told them we could improve business by two, three, or four percent, they were laughing at us, saying it was meaningless to them. Now, three, four, or five percent, even in the luxury market, is a huge improvement to business. Companies like Michael Kors, Tory Burch, Marc Jacobs, Giorgio Armani, and Prada are all looking for those margins.
So, how do we become more efficient with a product assortment, how do we become more efficient with distribution of all these products to different sales channels, and then how do we increase our margins? How do we not over-manufacture, not create those blue shirts for Florida, where they are not selling, and instead create them for Detroit, where they’re selling like hotcakes?
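Sending the blue shirts to Detroit rather than Florida is, at its simplest, a proportional allocation problem. The sketch below splits a fixed quantity across stores in proportion to recent weekly sales, using the largest-remainder method so whole units add up exactly. The store names and sales figures are made up, and a real allocation would also weigh presentation minimums, size curves, and store clusters.

```python
def allocate(total_units, weekly_sales_by_store):
    """Split total_units across stores in proportion to recent weekly sales.

    Largest-remainder method: give each store the floor of its exact share,
    then hand leftover units to the largest fractional remainders.
    Stores with zero sales get nothing (a simplifying assumption).
    """
    total_sales = sum(weekly_sales_by_store.values())
    if total_sales == 0:
        return {store: 0 for store in weekly_sales_by_store}
    quotas = {s: total_units * v / total_sales for s, v in weekly_sales_by_store.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = total_units - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc
```

For instance, allocating 100 units against weekly sales of 45 in Detroit, 30 in Chicago, and 5 in Florida yields 56, 38, and 6 units.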
These are the things that customers are looking at and they must have that tool or tools in place to be able to manage their merchandising and by doing so become a lot more agile and a lot more profitable.