Handling the explosion in data volume

Apologies for any typing or grammatical errors! Please point them out in your invaluable comments.

Let’s say you are not Google: you do not have BigTable, you need to work with huge volumes of data, and on top of that you cannot invest in infrastructure the way Google has. As if this were not enough, you want to do some sort of analytics on this huge data set. Now think about handling petabytes of seemingly useful information that you would have to analyze before you can say “Bingo, here it is! That’s it!”

How did that sound?

I have been reading a bunch of articles for quite some time, and the key topic concerning everyone across the board is what to do with the explosive volume of data. We all know “data” is a rich storehouse of information, and the better we use it, the better we serve our customers. After all, customers are gods.

So we have two statements here:
1. We need to serve our customers (who could be business or retail) and thus their needs.
2. We have to deal with a freakishly large and ever-growing amount of data.

I have a small question here: did the customers ask for data “of” everything “for” everything? Well, the truth is “No”. And this is probably what could solve the problem.

All tasks need data, and every task’s data requirements are different. What you need for customer analytics is not what you need for fraud management. The point I want to make is that data treatment is where the real play is. In a fraud management scenario, the actual xDRs (call, event and other detail records) really do not serve much purpose (except maybe in a court of law). For fraud identification, it is the properties of the data that matter. So this is not exactly subscriber profiling; it is closer to profiling the information about the data, that is, working with the profiled “data of data”. A shift (delta) in these “properties” tells you more than looking at the entire raw data. In that case, do we really need the xDR details?
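As a rough illustration of working with “data of data”, here is a minimal Python sketch. It assumes a hypothetical, heavily simplified xDR with just a subscriber, a duration, a destination and an international flag. Each period’s xDRs are reduced to a small per-subscriber property profile, and two profiles are then compared to flag a shift. The thresholds (call count jumping 3x, or the international share rising by 0.5) are purely illustrative, not a real fraud rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class XDR:
    # Hypothetical, heavily simplified detail record; real xDRs carry many more fields.
    subscriber: str
    duration_s: int
    destination: str
    international: bool


def profile(xdrs):
    """Reduce raw xDRs to one small property profile per subscriber."""
    acc = defaultdict(lambda: {"calls": 0, "seconds": 0, "dests": set(), "intl": 0})
    for x in xdrs:
        p = acc[x.subscriber]
        p["calls"] += 1
        p["seconds"] += x.duration_s
        p["dests"].add(x.destination)
        p["intl"] += int(x.international)
    return {
        sub: {
            "calls": p["calls"],
            "minutes": round(p["seconds"] / 60, 2),
            "distinct_dests": len(p["dests"]),
            "intl_share": p["intl"] / p["calls"],
        }
        for sub, p in acc.items()
    }


def shifted(baseline, current, call_ratio=3.0, intl_jump=0.5):
    """Return subscribers whose profile moved sharply (illustrative thresholds only)."""
    flagged = []
    for sub, cur in current.items():
        base = baseline.get(sub)
        if base is None:
            continue  # no baseline yet, nothing to compare against
        if (cur["calls"] >= call_ratio * base["calls"]
                or cur["intl_share"] - base["intl_share"] >= intl_jump):
            flagged.append(sub)
    return sorted(flagged)


# Two periods of toy data: B suddenly makes several short international calls.
day1 = [XDR("A", 120, "x", False), XDR("B", 60, "y", False)]
day2 = [XDR("A", 100, "x", False)] + [XDR("B", 30, f"d{i}", True) for i in range(4)]
print(shifted(profile(day1), profile(day2)))  # ['B']
```

Once the profiles exist, the comparison itself no longer needs the raw xDRs; they could be archived for the court-of-law case.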

Imagine analyzing customer behavior: a call center gets 5 calls from a single customer in a single day. Is this an unhappy customer? Maybe not. A single customer can have multiple issues and need to call customer care for each of them. I faced exactly this while working in a call center in my first job. (The hit was on my personal performance, though, because of the SLAs and KRAs. That is a separate issue.) Do we need the transaction details of all 5 calls? I think not. The key observations say it all: 5 issues, 5 calls, 5 closed, 5 satisfied events, 1 customer. This is still a happy customer. So the data I need here is the number of distinct issue types, a few sums (say of calls and closed issues), and related meta information about the actual transactions. Analyzing the meta information solves bigger chunks of the analytics problem, and I am not storing the entire detailed information.
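To make the call-center example concrete, here is a minimal Python sketch, assuming a hypothetical call record with customer, day, issue type, closed and satisfied fields (the field names are illustrative, not from any real system). It keeps one summary row per customer per day instead of the transaction details, and applies an illustrative “happy” rule: every call was closed and ended satisfied, however many calls there were.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    # Hypothetical minimal call-center record; field names are illustrative.
    customer: str
    day: str
    issue_type: str
    closed: bool
    satisfied: bool


def summarize(calls):
    """Collapse detailed call records into one summary row per (customer, day)."""
    acc = defaultdict(lambda: {"calls": 0, "issue_types": set(), "closed": 0, "satisfied": 0})
    for c in calls:
        s = acc[(c.customer, c.day)]
        s["calls"] += 1
        s["issue_types"].add(c.issue_type)
        s["closed"] += c.closed        # bool counts as 0 or 1
        s["satisfied"] += c.satisfied
    out = {}
    for key, s in acc.items():
        row = {
            "calls": s["calls"],
            "distinct_issues": len(s["issue_types"]),
            "closed": s["closed"],
            "satisfied": s["satisfied"],
        }
        # Illustrative rule: happy if every call was closed and satisfied.
        row["happy"] = row["closed"] == row["calls"] and row["satisfied"] == row["calls"]
        out[key] = row
    return out


# The example from the text: 5 issues, 5 calls, 5 closed, 5 satisfied, 1 customer.
issues = ["billing", "roaming", "sim", "data_pack", "handset"]  # hypothetical issue types
day = [Call("C1", "d1", i, True, True) for i in issues]
print(summarize(day))
# {('C1', 'd1'): {'calls': 5, 'distinct_issues': 5, 'closed': 5, 'satisfied': 5, 'happy': True}}
```

Only the summary row needs to be retained; the five detailed transactions can be dropped or archived.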

In telecom there is a boom in the amount of data. Segmenting data by the “kind” of meta information it holds could be more helpful than the data itself. I was listening to a free webinar by a particular company, where they proposed that data could be classified into 3 classes: the prime class, the middle class and the lower class. (I have deliberately changed all the terminology and methods of classification.) The prime class data is what we would look at in full detail. From the middle class, we could take a snapshot of the information and work more with the meta information to track behavior, while the lower class is almost a throwaway class, from which summarized and condensed information could be extracted and the rest turned into cold data. The challenge is identifying the classification.
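A minimal Python sketch of the three-class treatment follows. Since the webinar’s actual terminology and classification method are not given here, the revenue-based `classify` rule, the record fields and the choice of snapshot fields are all hypothetical. The point is only the shape of the treatment: full detail for the prime class, a field snapshot for the middle class, and counts plus cold archiving for the lower class.

```python
from collections import Counter
from enum import Enum


class Tier(Enum):
    PRIME = "prime"    # analyzed in full detail
    MIDDLE = "middle"  # only a snapshot of selected fields is kept
    LOWER = "lower"    # summarized; raw rows go to cold storage


def classify(record):
    """Hypothetical revenue-based rule; the real basis depends on the domain."""
    if record["revenue"] >= 10.0:
        return Tier.PRIME
    if record["revenue"] > 0:
        return Tier.MIDDLE
    return Tier.LOWER


SNAPSHOT_FIELDS = ("id", "type", "revenue")  # hypothetical choice of "meta" fields


def treat(records, rule=classify):
    """Route each record according to its tier."""
    hot, snapshots, cold = [], [], []
    lower_summary = Counter()
    for r in records:
        tier = rule(r)
        if tier is Tier.PRIME:
            hot.append(r)
        elif tier is Tier.MIDDLE:
            snapshots.append({k: r[k] for k in SNAPSHOT_FIELDS})
        else:
            lower_summary[r["type"]] += 1
            cold.append(r)
    return {"hot": hot, "snapshots": snapshots,
            "lower_summary": dict(lower_summary), "cold": cold}


records = [
    {"id": 1, "type": "voice", "revenue": 12.5, "cell": "A"},
    {"id": 2, "type": "sms", "revenue": 0.1, "cell": "B"},
    {"id": 3, "type": "signalling", "revenue": 0.0, "cell": "B"},
    {"id": 4, "type": "signalling", "revenue": 0.0, "cell": "C"},
]
out = treat(records)
print(len(out["hot"]), len(out["snapshots"]), out["lower_summary"], len(out["cold"]))
# 1 1 {'signalling': 2} 2
```

The rule is passed in as a parameter because, as argued in the next paragraph, the basis of classification depends on the domain.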

Data classification is what plays the real role in handling information and then serving “customers”. Now the question could be: what is the basis of the classification? Well, that depends on the needs of the domain. Revenue assurance has its own needs, fraud and risk management has its own, and analytics (depending on the scope of the analysis) has its own. There could be obvious overlaps, but then the trick is to be the clever manager who gets a “single” piece of work done for two or more different “people”. Basically, the trick lies in the methods of “cleaning” the data. I would say the requirement is to extract the meta information from the seemingly important, real-looking data.

By the way, the analyst does not always look at the change in the meta information. I was reading a blog post by Kathy Romano from Verizon (via a link in the talkRA blog post “Raw Data, Workflows and Mr. RA Analyst”, http://talkra.com/page/5; talkRA is one of the best blogs of its kind), where she said that it is more important to look at the raw data and analyze the shifts. That context was more from the revenue assurance side. But note that the objective was still to identify the shifts, at least as far as I can gauge from her requirement. Thus the shift in behavior is not the real data; it is the “data of the data”.

This entire post is just the first phase of a wild thought! I may post more to clean up this initial idea, so, you know, at some point (after all you readers have contributed) we will get the “meta” that helps ease our work!

– c ya.

This entry was posted in Customer, Revenue Assurance, Revenue management, telecom.
