Identifying Toll Fraud is Harder Than Finding a Needle in a Haystack

What Does That Have To Do With Big Data?

At the time of creation, God spoke with man directly, without any proxy. God spoke to Adam, Eve and the snake, and even handled the first murder interrogation by himself when asking Cain, “Where is your brother Abel?” After this, when there were too many people, God abandoned the one-on-one approach and started sending his messages and commands through prophets. Then came the kings who listened to the oracles and ignored the prophets.

And then came the scientific revolutionaries, visionaries, dreamers and, most recently, the predictors. Unlike the prophets, scientific revolutionaries, visionaries and dreamers, the predictors look to the past to predict the future. And the more deeply the predictor studies the past, the more clearly he can envision the future.

The predictor is a by-product of an emerging technology: big data. I am sure that big data was invented by a male, since it’s built entirely on a male character trait: don’t throw away anything that you may one day need. This is probably why another male invented large garages.

Unlike traditional relational databases, big data deals with voluminous amounts of unstructured data (not organized by any method), gathered from many sources in large quantities, various formats and varying qualities. Four main characteristics are associated with big data (aka the four Vs): Volume, Velocity, Variety and Volatility. Allow me to add a simple analogy from my life to describe the difference between “rational” and “irrational” databases. When I return from the grocery store, I pile ALL the vegetables and fruit into the refrigerator, still inside their plastic bags. My lovely and more rational wife washes them all, skins the melons and watermelon and cuts them into pieces, and peels the vegetables and sometimes cuts them as well, so they are ready for making salad or cooking.

Mr. Gurdeep Singh Pall is the Corporate Vice President for Skype and Lync at Microsoft Corp. Mr. Pall just returned to the Lync unit after spending the past two years working on Artificial Intelligence projects within Microsoft. He used his opening keynote at this year’s Lync Conference to describe how the work on analytics and Bayesian prediction will eventually make its way into communications systems. In practice, Pall said, “We can actually predict who you will be calling in the next five minutes.”

Big data and Unified Communications & VoIP services

Unified Communications and VoIP services collect a lot of raw data, and this data is worth analyzing for the wealth of intelligence it holds. Companies are increasingly aware that this information can be collected and refined into an essence that can be used for performance optimization and network design improvements. UC and VoIP big data analytics will be the key element in converting big data into a tool that safeguards cloud-based VoIP service across privacy, security, toll fraud prevention, performance, cost and more.

So what can you do with voice analytics?

  • VoIP analytics will build its own multi-layered picture of the network’s topology, derived from the big data over time.
  • VoIP analytics will provide network and user profiling.
  • VoIP analytics will provide advanced call fraud detection and attack prediction.
  • VoIP analytics will provide advanced multi-dimensional cost analysis.

Toll Fraud detection and prevention using big data analytics

Call fraud is associated with significant revenue loss and is hard to discover. I read that discovering call fraud in the masses of call records is more difficult than finding a needle in a haystack. Actually, the needle is an easy problem to solve: you know what a needle looks like, and by adding enough manpower to do the looking you can eventually find it. But fraudulent calls look like legitimate calls, so if you can’t tell what a fraudulent call looks like, no amount of manpower (or CPU power in this case) will detect it. It’s more like trying to find a specific strand of hay in a haystack.

A common approach to detecting call fraud is to examine account summaries made up of several statistics computed over a specific period. For example, average call duration, longest call duration, and number of calls to particular countries might be computed over the past hour, several hours, day or several days. Account summaries can be compared to thresholds for each period, and an account whose summary exceeds a threshold can be queued and analyzed for fraud.
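As a rough illustration of the threshold approach described above, here is a minimal Python sketch. The `Call` record, the statistic names and the threshold values are all assumptions made up for the example; a real deployment would tune them per period and per customer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    account: str
    country: str
    duration_s: float


# Hypothetical thresholds for one period (e.g. the past hour).
THRESHOLDS = {
    "avg_duration_s": 900.0,
    "max_duration_s": 3600.0,
    "calls_per_country": 20,
}


def summarize(calls):
    """Build one summary of statistics per account for a single period."""
    by_account = defaultdict(list)
    for call in calls:
        by_account[call.account].append(call)

    summaries = {}
    for account, acct_calls in by_account.items():
        durations = [c.duration_s for c in acct_calls]
        per_country = defaultdict(int)
        for c in acct_calls:
            per_country[c.country] += 1
        summaries[account] = {
            "avg_duration_s": sum(durations) / len(durations),
            "max_duration_s": max(durations),
            # Highest call count to any single country in the period.
            "calls_per_country": max(per_country.values()),
        }
    return summaries


def flag_accounts(calls, thresholds=THRESHOLDS):
    """Return {account: [exceeded statistics]} for accounts to queue for review."""
    flagged = {}
    for account, summary in summarize(calls).items():
        exceeded = [name for name, limit in thresholds.items() if summary[name] > limit]
        if exceeded:
            flagged[account] = exceeded
    return flagged
```

Note that the thresholds here are the same for every account, which is exactly the weakness of this approach: a value that is normal for a contact center may be alarming for a small office.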

VoIP analytic fraud detection is designed on a statistical principle of dynamic fraud detection. The algorithm is based on tracking account behavior, which makes it possible to raise an alert on a fraudulent call, or terminate it, as it occurs. The algorithm relies on runtime and historical attributes gathered per user, group of users, site, SIP interface, etc. The VoIP analytics create a signature of predicted usage behavior for each user, group or interface, update the statistical model with each call, and score calls for fraud using the predicted behavior as the baseline. When a call exceeds the boundary of a predicted user signature, the VoIP analytics may take action as configured.
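The signature idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual algorithm: the signature here tracks a single attribute (call duration), uses Welford's running mean and variance as the statistical model, and treats a z-score above a hypothetical `boundary` of 3.0 as exceeding the signature. The names `Signature` and `process_call` are invented for the example.

```python
import math


class Signature:
    """Running mean/variance of call duration for one user (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def score(self, x):
        """Z-score of x against the signature; 0.0 until there is enough history."""
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / std


def process_call(signatures, user, duration_s, boundary=3.0):
    """Score the call against the user's signature, then update the model with it."""
    sig = signatures.setdefault(user, Signature())
    z = sig.score(duration_s)
    sig.update(duration_s)
    return z > boundary, z
```

Each call is scored against the baseline built from the calls before it, and only then folded into the model, so a single outlier is judged against the user's history rather than against itself.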

The VoIP fraud detection analytic is built on three stages:

  1. Training – Analysis of large numbers of enterprises of various types, such as Unified Communications, Contact Center, etc. Based on this information, the VoIP analytic fraud detection system creates preliminary statistical information, which is later segmented according to the organization’s characteristics.
  2. Adaptation – Adjustment of the statistics collected in the previous stage to the specific organization. This is done by comparing the statistics to the organization’s actual call activity in real time.
  3. Test – Each call is compared against the statistical call pattern in real time. Calls that don’t match the pattern result in fraud alarms with a probability (confidence) grade.
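The three stages above might look something like the following Python sketch. Everything specific in it is an assumption for illustration: the pattern is reduced to the mean and standard deviation of call duration, adaptation is a simple weighted blend controlled by a hypothetical `prior_weight`, and the confidence grade is the probability that a normally distributed value falls closer to the mean than the observed call.

```python
import math
from statistics import mean, pstdev


def train(segment_durations):
    """Stage 1: preliminary (mean, std) per segment, built from many enterprises."""
    return {seg: (mean(d), pstdev(d)) for seg, d in segment_durations.items()}


def adapt(prior, org_durations, prior_weight=20):
    """Stage 2: shift the segment statistics toward the organization's own calls.

    prior_weight acts like a number of pseudo-observations backing the prior:
    the more real calls the organization has, the less the prior matters.
    """
    if not org_durations:
        return prior
    prior_mean, prior_std = prior
    n = len(org_durations)
    w = n / (n + prior_weight)
    return (
        w * mean(org_durations) + (1 - w) * prior_mean,
        w * pstdev(org_durations) + (1 - w) * prior_std,
    )


def check_call(pattern, duration_s, alarm_at=0.99):
    """Stage 3: compare one call to the pattern; return (alarm, confidence)."""
    mu, sigma = pattern
    if sigma == 0:
        confidence = 0.0 if duration_s == mu else 1.0
    else:
        z = abs(duration_s - mu) / sigma
        # P(|Z| < z) for a standard normal Z: how unusual this call is.
        confidence = math.erf(z / math.sqrt(2))
    return confidence >= alarm_at, confidence
```

For example, `adapt(train(data)["contact_center"], org_calls)` would give the pattern that `check_call` then tests each new call against, where `data` and `org_calls` stand in for whatever call records are available.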

Conclusion

We are often tempted to impose the way we see things, through the prism of our own life experiences, on our friends and family, when what we are really doing is judging them for the way they see things. A friend once told me that life experience is like a flashlight hanging on your back when you are going forward; in other words, useless. That may be true. But in the case of big data analytics, the system’s life experience is the basis for predicting a better future.

2 replies
  1. Tsahi Levent-Levi (@tsahil) says:

    Yossi, that first half of this post? The best intro I’ve read about Big Data there is. Enjoyed it immensely – thanks 🙂

    As for haystack and needles – easiest way may be to just burn all the haystack. What’s left will be ashes and a needle. Can’t do that for toll fraud either.
    The problem I see with call metadata and analytics is that the only one who seems to be doing anything interesting in this space is the NSA – the rest aren’t making as much use of it as they should. VoIP and telecom vendors need to look at how web companies are focusing on analytics to learn a trick or two.

  2. Yossi Zadah says:

    Hi Tsahi,
    I am happy that you liked it.
    As an enthusiastic follower of your blog, I was probably influenced by your writing style.
    Seriously, cloud providers can gather plenty of statistical data. The question is whether they will mine gold from it.
