For example, it’s tough to comprehend what fine actually means.

July 2nd, 2018

We now have the best writing support available online today. Customer service is truly pleasant! Every client receives a free revision guarantee. New and returning customers can consistently save money when buying documents on our site, and you can rest assured that you will make a good purchase here. When selecting a site, be sure to review it thoroughly, because everything happens online without meeting anybody in person.


Hello world!

June 15th, 2018

Welcome to WordPress. This is your first post. Edit or delete it, then start blogging!



ScoreData announces Promotion for HPE Customers

July 26th, 2016

Promo Title: HPE Big Data Marketplace Promotion

Promo Description: ScoreData Corporation is pleased to offer HPE customers 3 months of ScoreFast license at no charge, based on the trailing six months of data. This will be extremely useful as you build your Big Data Analytics solutions.

This offer expires on December 31, 2016.

For more information, please write to us at


Predictive Analytics for the Intelligent Customer Engagement Center

July 20th, 2016


The ScoreData Team

A Customer Engagement center is a central point from which an enterprise manages all customer contacts, including voice calls, email, social media, faxes, and letters. It is part of a company’s customer relationship management. With the large amounts of data collected by modern engagement centers, there are many opportunities to apply predictive analytics to improve engagement center efficiency and customer satisfaction. The applications can be broadly classified into the following categories:

  • Enhancing Customer Engagement and improving customer experience. The possible applications include:
    • Improved caller/agent matching
    • Enhanced cross-selling and upselling
    • Superior customer retention
    • Interactive Voice Response (IVR) analytics
    • Behavioral targeting to better serve customers
  • Optimizing contact center management and control. Possible applications include:
    • Agent ranking and performance measurement
    • Improved call routing and distribution
    • Centralized global queueing
    • Staffing optimization

ScoreData has extensive experience in building predictive models using its ScoreFast™ engine, many of which can be integrated with caller-business engagement scenarios, e.g., improved customer retention through churn prediction and mitigation, enhanced cross-selling and upselling, and risk analytics. This paper first describes a staffing optimization project undertaken in collaboration with our partner Avaya, and then examines how predictive analytics will help the contact centers of the future.

Customer Engagement centers employ a large number of contract agents as the workloads tend to vary significantly over the course of a year.  Being able to predict workload and staffing requirements to keep the customer wait times less than a specified threshold is critical for agent capacity planning.

Engagement Center Optimization Example

Business Objective: To build a workload prediction system that accurately forecasts call volumes and agent capacity requirements for the following week.  While forecasting is a common use case in many verticals, as the data, the number of variables, and the requirements for finer levels of forecasting increase, the modeling problem becomes more challenging.  Avaya provided the data for this project from a demonstration system using historical data sets.

Methodology: We used the following methodology for the forecasting project:

  1. Data preparation, audit and understanding – We engaged with Avaya teams to grasp the business context and understand the data (tables, fields and their meanings). We normalized and merged all datasets to form a single view of the data for modeling.
  2. Feature Engineering and Data Loading – ScoreData created information rich features from the single view dataset. The resultant dataset was loaded into the ScoreData analytics platform.
  3. Feature Extraction and Model Development – We conducted statistical analyses on the ScoreData platform to generate the best predictors of the target variables. Then we developed various models for forecasting workload and staffing.
  4. Insights and Report Preparation – Analyses outcomes and insights were collated in a report along with model characteristics.

Data Audit and Insights:

  • Looking at the per-day call volume data, one can see two flat periods with no activity (‘call volume vs. days’ graph below). As a consequence, ScoreData did not treat the data from 6/11/13 to 7/22/13 as a single homogeneous time series; we used only the two time periods that have call volumes and ignored the flat periods.

  • Call Segment Data Funnel – Record counts at each step as filters were applied to remove Interactive Voice Response (IVR) calls:
  1. Total number of segments (or calls): 9,242,604
  2. After removing IVR calls, which we cannot use: 2,107,867
  3. Several more rows that did not meet Avaya’s criteria for inclusion in the analysis were then removed
  • Average Number of agents handling calls for each hour

  • Average Queue Group Time  (Ring  + Talk + After call)

  • Average Queue Group Time (without the Talk time)

  • Time Distributions

  • Disposition Time of Agents

Models: Since the call volumes varied by day, prediction models were built for each day of the week.  We used the Generalized Linear Model (GLM) algorithm with Poisson distribution to predict the number of agents required for a given call volume, wait time, day of week and queue group.  The following figure shows the features with the highest relative importance.  The most important feature turns out to be the Number of Calls, with wait time coming in next; there is a negative dependence on wait time (highlighted in orange).  The greater the wait time allowed, the fewer the agents required.
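To make the modeling idea concrete, here is a minimal sketch of a Poisson GLM with a log link, fitted by Newton's method in plain Python. It is not the ScoreFast™ implementation. The data is synthetic, and the feature names (`calls_100` for call volume in hundreds, `wait_min` for allowed wait time in minutes) and the generating coefficients are assumptions made up for illustration.

```python
import math

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_poisson_glm(rows, y, iters=25):
    """Fit log(E[y]) = rows . beta by Newton's method; column 0 must be the intercept."""
    p = len(rows[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Gradient X^T (y - mu) and Hessian X^T diag(mu) X of the Poisson log-likelihood.
        grad = [sum(r[j] * (yi - mi) for r, yi, mi in zip(rows, y, mu)) for j in range(p)]
        hess = [[sum(r[j] * r[k] * mi for r, mi in zip(rows, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Synthetic data (an assumption): more calls need more agents,
# a longer allowed wait needs fewer.
rows, agents = [], []
for calls_100 in (1, 2, 3, 4, 5):
    for wait_min in (1, 2, 3, 4, 5):
        rows.append([1.0, float(calls_100), float(wait_min)])
        agents.append(round(math.exp(1.0 + 0.4 * calls_100 - 0.2 * wait_min)))

beta = fit_poisson_glm(rows, agents)
print("intercept, calls, wait:", [round(b, 3) for b in beta])
```

On this toy data the fitted coefficient for call volume comes out positive and the one for wait time negative, which mirrors the dependence described above: the greater the wait time allowed, the fewer agents required.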

The project helped the ScoreData team understand the nature of call center data and create a unified view of the data. Call volumes were forecast on a daily basis. The next step in the project, given enough data, is to forecast call volumes on an hourly basis and then predict staffing requirements on an hourly basis as well.

Analytics for the Engagement Centers of the Future:

In this section, we will examine a few emerging trends in the contact centers of the future and discuss how analytics will play a crucial role in the transformation of such centers. The engagement centers of the future must be agile enough to adapt quickly as customers’ expectations shift with advances in and varieties of interaction options open to them. Analytics will play a central role as contact centers adapt to the changing demands of the future. According to Laurent Philonenko, SVP of Corporate Strategy & Development, CTO of Avaya, most businesses that are making analytics an urgent investment are doing so to be better positioned to (1) compete more successfully, and (2) grow their business to increase revenue potential.

Virtualized Engagement Centers: The engagement center of the future is not likely to be a centralized facility, but will be distributed geographically with agents working out of their homes.  High turnover in the engagement center staff is causing business leaders to look for effective ways to attract and retain the best talent.  Flexibility in working hours and workplace is an important factor in this.  Virtualization technology will play a key role in making this practicable.  Key issues here are security and privacy of customer data.  Machine Learning and Analytics are increasingly playing a critical role in detecting security breaches and avoiding future attacks.  Analytics techniques are now being applied across the network, application and data layers to provide increased security and privacy.  One of the techniques used to provide privacy is data encryption.  Ability to process encrypted data and draw insights will become increasingly important.

Future Agents: It turns out that today contact center staff accounts for almost 75% of the cost of running a center.  Therefore, it is important to optimize agent performance with the right tools and information.  Agents will increasingly have to multitask among voice calls, social, chat and email interactions with the customers.  Today, most agents have to toggle between a set of standalone software applications to access the information they need to service customers.  Agents will need a seamless view of customer information to fluidly meet customer needs.  With the digital transformation of an enterprise, the data silos that are entrenched in today’s enterprises can be overcome.  Again according to Philonenko, in a digital enterprise, with a single analytics application it becomes much easier to have a single view into the entire journey of customer data, partner data, employee data, process data, etc.

Instead of specific agent skill groups that today’s contact centers employ, future centers will have a fluid workforce in terms of skills.  For example, a particular agent may have foreign language skills as well as the ability to cross-sell or up-sell effectively.  Agents will not belong to specific skill groups, but will be called upon to service customers based on the needs and demand.  Analytics will play an important role in forecasting the workload, optimizing the staff, and routing customer calls to the right agents to minimize the customer waiting times.  Optimal workload forecasting is an extremely challenging problem under those constraints.  ScoreData has worked out an approach to solving this challenging problem with predictive analytics.  We hope that will be the subject of another report in the future.

The way agent performance is measured and ranked will also need to change to keep up with customer demands.  Contact centers will need to transition from a reliance on efficiency-based metrics, such as Average Handling Time (AHT) and calls handled per hour, to customer and business focused measures like First Call Resolution (FCR), customer satisfaction, and ROI.  Analytics can be leveraged in that transition as well.  For example, customer feedback surveys – both ratings and comments – can be analyzed to assess customer satisfaction.

Future Customers: According to the American Express 2011 Global Service Barometer, U.S. consumers prefer to resolve their service issues using a variety of touch-points, including the telephone (90%), face to face (75%), company website or email (67%), online chat (47%), text message (22%), social networking site (22%), and using an automated response system (20%).  And according to the 2014 Global Service Barometer, for simple issues consumers prefer going online (36% versus 14% by phone) and for difficult enquiries talking to an agent by phone (48% versus 10% by email).  Consumer preferences will keep changing.  When they escalate a service issue from chat to voice, they expect the new agent to know the interaction that has already happened.  Cloud-based contact center and analytics can help agents to follow a customer’s journey seamlessly.

How ScoreFast™ Makes a difference

The entire project, including the run-time engine, was delivered in six weeks. Our models were built using the ScoreFast™ engine, a web-based model development and management platform. It is built as an enterprise-grade modeling system that can be used to develop a broad range of models for use cases inside and outside the Engagement Center.

Its data and model management modules are easy to use, dashboard driven and intuitive. It chooses models for specific use cases by trying out many algorithms internally and selecting the one with the best performance metrics. ScoreFast has collaboration features that encourage knowledge sharing and innovation within large cross-functional teams, and access control features designed with the specific needs of ScoreData’s large enterprise clients in mind.

The platform has built-in hooks to link raw data feeds into the system, and its one-push provisioning means that models, once developed and tested, can be deployed onto downstream systems with a single push of a button. These features make ScoreFast easy to integrate into existing business processes without disruption or cost overheads.

The canonical engagement center use cases deal with Agent Ranking, Caller-Agent Mapping, Dashboards with cross-sell or upsell with detailed presentation of customer profiles, and Engagement Center workload optimization.  All these yield substantial improvements in customer satisfaction and improved top-line and bottom-line benefits.  The ScoreFast™ engine delivers unique value throughout the predictive analytics insights-to-decision process, with a dramatically lower total cost of ownership.

Conclusion: Consumer demands will continue to change and technology will continue to evolve.  Predictive Analytics will play a crucial role in helping businesses to adapt to the changing world.  Engagement centers that adapt to changes to empower agents while keeping customer satisfaction in mind will become “relationship centers.”  As Philonenko says, analytics allow human beings to be smarter, act faster, evolve and grow, all of which are essential for an agile relationship center.


ScoreFast™: Predicting hospital readmission rates for diabetic patients

June 16th, 2016

I am a biologist by training and new to Machine Learning (ML).  As a member of the ScoreData team, I wanted to take the opportunity to try to analyze bioscience data on ScoreData’s ML platform (ScoreFast™).  This project allowed me to use my skills as a biologist and my passion to learn new technologies and concepts. After searching for a public data set to experiment with, I decided to use Diabetic 130-US hospitals (1999-2008). This data set has a relatively large size (about 100,000 instances, 55 features).  Also, it has a published paper  linked to the data.

ScoreFast™, the web based model development and management platform from ScoreData, is built as an enterprise grade machine learning platform.  Its data and model management modules are dashboard driven, and intuitively easy to use.  ScoreFast™ supports many algorithms and with its in-memory computation, it makes it easy to build and test models quickly.  As someone new to machine learning, the easy interface to build models made it easy for me to get started.

The goals of this project were to:

(1) Use ScoreFast™ platform to find correlations between data features and readmission of patients to hospitals within 30 days,

(2) Compare results from the ScoreFast™ platform with other platforms such as IBM’s Watson, Amazon’s ML platform and with the published paper, and

(3) Build a predictive model on ScoreFast™ platform.

First, I read the paper to understand a bit more about the data. In the paper, the authors hypothesized that measurement of HbA1c is associated with a reduction in readmission rates in individuals admitted to the hospital. (The Hemoglobin A1c test is an important blood test that shows how well diabetes is being controlled.) The authors started from a large dataset of about 74 million unique encounters corresponding to 17 million unique patients and extracted 101,766 encounters from it. They further cleaned up the data to avoid bias: (1) they removed all patient encounters that resulted in either discharge to a hospice or patient death, and (2) they considered only the first encounter for each patient as the primary admission and determined whether or not the patient was readmitted within 30 days. The final data set consists of 69,984 encounters. This is the dataset we used for data analysis and to build a predictive model.
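The two cleaning rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`encounter_id`, `patient_id`, `disposition`), the disposition labels, and the idea that a lower `encounter_id` means an earlier visit are all hypothetical, and the filters are applied in the order listed above, which may not match the exact order used in the paper.

```python
# Hypothetical labels for discharges to a hospice or patient death; the real
# data set uses its own coding scheme, which should be taken from its documentation.
EXCLUDED_DISPOSITIONS = {"hospice", "expired"}

def build_cohort(encounters):
    """Drop hospice/death discharges, then keep each patient's first remaining encounter."""
    kept = [e for e in encounters if e["disposition"] not in EXCLUDED_DISPOSITIONS]
    first = {}
    # Assumption: encounter_id increases with time, so sorting by it finds the first visit.
    for enc in sorted(kept, key=lambda e: e["encounter_id"]):
        first.setdefault(enc["patient_id"], enc)
    return list(first.values())

sample = [
    {"encounter_id": 1, "patient_id": "A", "disposition": "home"},
    {"encounter_id": 2, "patient_id": "A", "disposition": "home"},
    {"encounter_id": 3, "patient_id": "B", "disposition": "expired"},
    {"encounter_id": 4, "patient_id": "C", "disposition": "home"},
]
cohort = build_cohort(sample)
print([e["encounter_id"] for e in cohort])
```

In the toy sample, patient A contributes only the first of two encounters and patient B drops out entirely, so two encounters remain.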

A few key details about the data set:  1) Although all patients were diabetic, only 8.2% of the patients had diabetes as primary diagnosis (Table 3 of paper), 2) HbA1c test was done for only 18.4% of the patients during the hospital encounter (Table 3 of paper).

We used the same criteria, as mentioned in the paper, to pare down the data to 69,984 encounters.  To reduce bias, I did not identify the parameters that were found to be significant in the paper.  I wanted to see which parameters the ScoreFast™ machine learning platform picked up to be significant to the hospital readmission rate and how well they compared to the analysis in the paper and also to other ML platforms.


Data Analysis: Understanding the data

The data set had 55 features. After ignoring some of the features, such as Encounter_ID, Patient_No, Weight (97% of the data was missing), and Payer_Code, which were either irrelevant to the response (readmission to hospital) or had poor data quality, the total number of relevant features was 46. The first step was a data quality check on the 69,984 records (46 features). Out of 46 features, 23 were medicine related, and of those, 22 were diabetic medications and one was cardiovascular. The data had 17 features that had constant values. The diagnosis features (diag_1, diag_2, diag_3) held ICD-9 codes (the International Classification of Diseases, Ninth Revision; a standard list of 6-character alphanumeric codes to describe diagnoses). These codes were mapped to their corresponding descriptions (Table 2 of the paper). Diag_1 was renamed to ic9Groupname, diag_2 to ic9Groupname2 and diag_3 to ic9Groupname3. With this aggregation, the correlation between any disease/procedure and hospital readmission was easier to see.

When I used the ScoreFast™ platform to build the models, I got an AUC (Area Under the Curve) in the range of 0.61-0.65 (Table B, below), which indicated that the data was not of very good quality. This was consistent with both IBM Watson, which gave a data quality score of 54, and the Amazon ML platform, which gave an AUC of 0.63 (Table A).
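For readers who, like me, are new to the metric: AUC can be read as the probability that a randomly chosen readmitted patient gets a higher model score than a randomly chosen non-readmitted one, so 0.5 is no better than chance and 1.0 is perfect ranking. The sketch below computes it directly from that definition. The labels and scores are made up for illustration and are not taken from the project.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]
print(round(auc(labels, scores), 3))  # 6 of 9 positive/negative pairs are ranked correctly
```

An AUC in the 0.61-0.65 range therefore means the models rank a readmitted patient above a non-readmitted one only somewhat more often than a coin flip would.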

Building a predictive Model on ScoreFast™ Platform

I found it easy to upload the data set onto the ScoreFast™ platform. The platform easily splits the data into train and test sets. The platform has options to build four different classes of models, among others: GBM, GLM, DL (Deep Learning) and DRF (Distributed Random Forest). ScoreData plans to add more algorithms in the near future. Without knowing the details of the algorithms, I was able to build the four models easily on the platform. All four models had very similar results, with AUC ranging from 0.61 to 0.65.

Once the models were built, I could click on the “Detail” links to learn more about each model: the MSE, ROC, thresholds and the key features. The top ten features with significant correlation with hospital readmission rates were similar on the GBM and DRF models (Table C, below). I was intrigued to see that the top ten features were different on the GLM and DL models (Table D, below). After discussions with my colleagues, I understood the sensitivity differences between the various algorithms. I realized that one can use different models depending on what one is looking for and the data available.

Comparison with the Amazon and IBM Machine Learning Platform

The IBM Watson platform showed the following three parameters to have significant correlation with the hospital readmission rate: discharge disposition (where the patient is discharged to), the number of inpatient visits to hospital, and time spent at each hospital visit. Its interaction graphs are a useful visual aid. When the data was analyzed on ScoreFast™, the three features that were significant on the IBM platform were also among the top ten significant features on the GBM and DRF models of ScoreFast™ (Table C for the ScoreFast™ platform and Table E for IBM’s Watson).

The Amazon ML platform did not provide a way to visualize the top features of the model. It does provide the cross-validation result: 91% of predictions were correct, for an error rate of 9%. The number of True Positives (TP) was very low. This was consistent with the confusion matrix results on the ScoreFast™ platform, as shown in Table F.

The top features, model accuracy and results on ScoreFast™ platform were very similar when compared with IBM Watson and Amazon ML platform.


Predicting readmission rates for new patients

With the push of a button, the models can be deployed in a production environment in the ScoreFast™ platform. 14,462 rows (20% of the original data) were held out to test the models. Using the batch prediction interface on ScoreFast™, I tested the hospital readmission models; the results are shown below (Table G).

The model correctly predicted the 92% of the population that did not require readmission (True Negatives). Of the remaining 8%, the GBM model correctly predicted 9 out of 1299 (TP), but the number of False Negatives was high (1290). Deep learning had a slightly better True Positive rate (19 out of 1299). The DRF model did not give good True Positive predictions. The low TP rate was expected from what we had observed while building the models.
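The counts quoted above translate directly into a True Positive rate (recall). The short calculation below uses only those counts; the False Negative count for the DL model is derived as 1299 minus 19 and is not a figure reported separately.

```python
def recall(tp, fn):
    """Share of actual positives that the model caught."""
    return tp / (tp + fn)

actual_readmissions = 1299
gbm_recall = recall(tp=9, fn=actual_readmissions - 9)    # 9 TP, 1290 FN
dl_recall = recall(tp=19, fn=actual_readmissions - 19)   # 19 TP, 1280 FN (derived)
print(f"GBM recall: {gbm_recall:.2%}")  # 0.69%
print(f"DL recall:  {dl_recall:.2%}")   # 1.46%
```

Both models catch fewer than 2 in 100 of the patients who were actually readmitted, which is why high overall accuracy on its own is misleading for this data set.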

Table H (below) shows how the threshold impacts the accuracy of the GBM model. Maximum accuracy is obtained with a threshold of 0.644. If the threshold is tuned to increase TP, it starts to reduce TN. In this case, TN (a patient who is predicted not to be readmitted and is not readmitted) and TP (a patient who is predicted to be readmitted and is readmitted) are both important. In general, the threshold can be used to tune the model for the desired result.
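The trade-off is easy to see on a toy example. In the sketch below, the labels and scores are invented (they are not the project's data), and a patient is predicted as readmitted when the score is at or above the threshold.

```python
def confusion(labels, scores, threshold):
    """Return (TP, FP, TN, FN) when score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp, fp, tn, fn

# Invented data: eight non-readmitted patients (0) and two readmitted patients (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60, 0.40, 0.70]

for threshold in (0.65, 0.5, 0.3):
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    accuracy = (tp + tn) / len(labels)
    print(f"threshold={threshold}: TP={tp} FP={fp} TN={tn} FN={fn} accuracy={accuracy:.1f}")
```

Lowering the threshold from 0.65 to 0.3 raises TP from 1 to 2 but drops TN from 8 to 5, so accuracy falls from 0.9 to 0.7 on this toy data.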

To improve the TP and/or reduce the False Negative (FN), we need more data samples as well as additional features. We had observed from the beginning that the data quality was not that great and the model had an AUC of 0.63.


As mentioned earlier, the data quality was not good.  The paper mentioned that there was a significant correlation between the HbA1c test not done during the hospital stay and the readmission rate.  The DL model on the ScoreFast™ platform also showed a correlation between HbA1c test not done and hospital readmission rate.  The DL model also picked up Circulatory diagnosis as a significant parameter which was also mentioned as a significant feature in the paper.  Again, due to the differences in the algorithms, the other models did not pick up these features.  As a next step, I need to understand the algorithms better to answer how data/features affect the choice of models for building predictive models and to understand the impact of each feature.

The significant parameters with a high correlation to hospital readmission found on the ScoreFast™ (GBM & DRF models) and Watson platforms were comparable to the data in Table 3 of the paper. Table 3 of the paper shows a correlation between the hospital readmission rate and (1) discharge disposition, i.e., where the patient is discharged to after the first visit (discharge_disposition_id), (2) primary diagnosis, and (3) age (a person older than 60 years has a higher chance of being readmitted).

I found ScoreFast™ easy to use.  It was easy to load large data sets and analyze data.  The platform provides a choice of many different kinds of algorithms.  These can be used either to confirm the significant correlations between parameters or can be used for performing different kinds of analysis.  It supports model versioning which allows one to try different variations of models keeping track of the performance of each model.

What’s next? 

I would like to understand the algorithms behind the different models in more depth, so I can figure out how to choose different models for different purposes. As a next step, I plan to study the data further and see how I can improve its quality. One idea is to consolidate the 23 medication features into 2. I also want to understand the feature correlations and the importance of variables. Finally, I plan to work with my data scientist colleagues to tweak the algorithm configurations to see if I can improve the model accuracy.

Beyond this diabetic data analysis project, I plan to experiment with a few more datasets in order to gain more insight into how machine learning and ScoreFast™ can be used to get actionable insights from clinical or medical data.


I would like to thank Prasanta Behera for guiding me through this project.


ScoreData announces Promotion for Avaya Customers

June 2nd, 2016

Promo Title: Agent Ranking Promotion

Promo Description: ScoreData Corporation is pleased to offer Avaya customers 3 months of ScoreFast license at no charge for Agent Ranking, based on the trailing six months of data. This will be extremely useful as you build your caller-agent mapping solutions.

This offer expires on December 31, 2016.


Churn Management for the Masses using ScoreFast™

May 10th, 2016

There are broadly three ways for a business to grow and defend its current revenue stream: by acquiring new customers, by cross-selling or up-selling to existing customers, and by improving customer retention. All three have a cost associated with them, and businesses are interested in the ROI on their investments. Acquiring new customers may cost anywhere from five to fifteen times more than selling to an installed customer base. For a consumer-facing business, setting up robust processes that predict consumers’ propensity to churn well in advance, with enough time to run retention campaigns and stop critical consumer segments from leaving, makes sound business sense and is essential to building a robust business.

Companies have been employing machine-learning techniques on their data to find patterns that signal their customers’ propensity to churn. Historically, companies worked with analytics consulting companies that specialized in developing churn prediction models for specific industries and functions. These consulting companies used established processes to develop churn prediction scorecards for each significant consumer segment in their consumer base. Here are some example steps:

First, ETL & Data wrangling: the efficacy of predictive models is based on the datasets used for machine learning (model development).

Second, Feature engineering: the process of defining customer attributes through historical information about them, followed by identifying predictor features (attributes with highest impact on churn behavior) through statistical algorithms and numerical methods.

Third, Model development:  This is usually followed by defining time frames (input data window, output/prediction window), and model training and validation. The models and churn propensity scores thus developed are used to identify future churn propensity based on recent customer behavior.

Fourth, Deployment: These inputs are plugged into churn retention campaigns for specific customer segments.
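The time frames in the third step are the part people most often ask about, so here is a minimal sketch of how an input window and a prediction (output) window can turn raw activity dates into a feature and a churn label. The 90-day and 30-day window lengths, the function name `label_customer`, and the rule that "no activity in the output window" means churn are illustrative assumptions, not a description of any specific production process.

```python
from datetime import date, timedelta

def label_customer(events, cutoff, input_days=90, output_days=30):
    """Build one feature from the input window and a churn label from the output window."""
    input_start = cutoff - timedelta(days=input_days)
    output_end = cutoff + timedelta(days=output_days)
    # Feature: number of activity dates in [input_start, cutoff).
    recent_activity = sum(1 for d in events if input_start <= d < cutoff)
    # Label: churned if there is no activity in [cutoff, output_end).
    active_later = any(cutoff <= d < output_end for d in events)
    return {"recent_activity": recent_activity, "churned": not active_later}

cutoff = date(2016, 4, 1)
quiet_customer = [date(2016, 1, 15), date(2016, 2, 20), date(2016, 3, 5)]
active_customer = [date(2016, 3, 20), date(2016, 4, 10)]
print(label_customer(quiet_customer, cutoff))
print(label_customer(active_customer, cutoff))
```

Keeping the two windows strictly separate matters: features may only use information from before the cutoff, otherwise the model would be trained on data it could not have at prediction time.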

In this traditional paradigm, churn prediction and management was only accessible to mid-to-large size companies, those that could build data science teams or hire analytics consulting companies. This traditional model of churn management is no longer relevant today for two reasons.

First: new-age SaaS prediction services like ours, ScoreData’s ScoreFast™, are bringing down the infrastructure investment and upfront costs substantially and abstracting away the science of churn propensity prediction, making it easier for business managers to use. All of this contributes to making churn prediction and management accessible to businesses of all sizes.

Second: Cloud, social media, IoT and ubiquitous devices are redefining consumer touch points and the competitive landscape for businesses every day. Companies today are employing smart customer engagement solutions at multiple layers. In this new world, by the time manually developed churn prediction models get deployed, the underlying assumptions (the indicators of churn behavior) may have already changed. This means your churn prediction models can be obsolete by the time they are deployed. Companies need systems that are nimble, systems that keep up with the ever-changing business landscape and keep giving superior results.

Let’s try to understand these concepts with the help of some real world examples of churn management in business.

Churn Management for Telecom

First, let’s look at the telecommunication industry. It is one of the earliest adopters of churn management solutions and among the heaviest users today. The landscape of churn prediction and management has gone through a sea change in the last couple of years in this industry on account of big data analytics.

In the Telecom industry, customers (subscribers) are known to frequently switch from one company to another and this voluntary churn has always been a critical business concern. It is a subscription based business model where the majority of revenues come from recurring monthly subscription fees from existing customers.

Although telecom companies have accumulated a lot of domain knowledge about the drivers of churn behavior, they cannot predict (and contain) churn on the basis of these static insights. For example, new plans from competitors are a well-known driver of voluntary churn. Companies offer lucrative data and voice packages for new customers but not for existing ones, frequently resulting in customers moving from one company to another to get a better plan.

But subscription plans are very dynamic in nature: new plans are launched every day, and the whole landscape changes within a matter of months. So you cannot predict future churn based on the competitive plan landscape of today. Moreover, mobile phones are no longer just telecommunication devices; subscribers’ needs have a very strong social purpose as well (due to social media, image/media sharing, etc.), and these social attributes are not captured anywhere in regular telecom data sets.

The point is that you cannot manage churn effectively based solely on the known reasons for churn behavior. This is where ScoreData can help. What you need is a strategy that lets you develop predictive models that quantify current churn drivers and keep up with the changing landscape of churn behavior at all times. Model performance necessarily decays over time, and you need systems that keep fine-tuning the models whenever performance decays beyond the accepted thresholds.

Churn Management for Weight Loss/Management

Let’s look at another industry, and another churn management problem. The “weight loss/management” industry has a big customer churn problem. Consumers subscribe to plans for x months and then discontinue a program even though they may still need to continue the program to experience full benefits from their program. One very important driver of this churn behavior is the difference between expectations and reality. Customers sign up with unrealistic expectations and that often results in disappointments even with modest results (moderate weight loss).

Although this is a well-established phenomenon and companies do try to manage expectations for existing customers, there are several other, more important drivers of churn behavior as well. These drivers keep changing with time, location and other macro parameters. If a business needs to incorporate hundreds of additional factors to determine which features are really causing churn, it may want to compare and contrast several models with several data sets while experimenting with new signals or external data sets. It needs systems that develop churn prediction models, capture these signals from the data, and monitor them on a continuous basis. Systems like these enable businesses to understand the churn behavior of their important customer segments and devise retention strategies.

ScoreFast™, the web based model development and management platform from ScoreData, is built as an enterprise grade churn management system. Its data and model management modules are easy to use, dashboard driven and intuitive. It chooses models for specific use cases after trying out hundreds of algorithms internally and selecting the one with best performance metrics. ScoreFast has collaboration features encouraging sharing and collaboration within large cross functional teams, and access control features that are designed keeping in mind the specific needs of ScoreData’s big enterprise clients. The collaboration features encourage cross functional knowledge sharing and innovation within the companies.

The platform has built-in hooks to link raw data feeds into the system, and its one-push provisioning features mean that models, once developed and tested, can be deployed onto downstream systems with a single push of a button. These features make ScoreFast easy to integrate into existing business processes without any disruption or cost overheads. ScoreFast’s real-time self-learning module makes sure your model performance never stays below the statistical or business validation thresholds that you set up. As soon as the performance drops below the line, it triggers a retrain without any human intervention. This means your churn prediction models are always on top of the game and all relevant signals are taken into consideration while scoring a consumer for their propensity to churn.
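
To make the threshold idea concrete, here is a minimal sketch of a monitor that retrains when performance drops below a configured line. It is an illustration only and not ScoreFast's implementation: the names RetrainMonitor, train_fn and auc_threshold are hypothetical, and AUC is simply one assumed choice of validation metric.

```python
from typing import Callable, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class RetrainMonitor:
    """Hypothetical monitor: retrains when AUC on fresh data falls below a threshold."""

    def __init__(self, model: Callable, train_fn: Callable, auc_threshold: float = 0.7):
        self.model = model              # callable: row -> churn score
        self.train_fn = train_fn        # callable: (rows, labels) -> new model
        self.auc_threshold = auc_threshold
        self.retrain_count = 0

    def check(self, rows, labels) -> float:
        """Score fresh labeled data; retrain if performance is below the line."""
        current = auc(labels, [self.model(r) for r in rows])
        if current < self.auc_threshold:
            self.model = self.train_fn(rows, labels)
            self.retrain_count += 1
        return current
```

In practice check would run on a schedule against the most recent labeled outcomes, and the retrained model would be validated before it replaces the one in production.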

ScoreFast has features for advanced users as well: those who want to peek under the hood and customize the models. The platform is not just for the business user, it empowers the data scientist to get into the specifics of model definitions, analyze performance comparisons, and fine tune the models.

ScoreFast is the market leading machine learning and model management platform that is making predictive model development, specifically churn prediction, accessible to companies regardless of their size, with identical predictive power for all. With ScoreFast’s cloud based architecture and built-for-business-manager interaction designs, the paradigms for churn management are changing very quickly. Companies that respond to these changes and efficiently leverage future ready platforms like ScoreFast for their churn prediction and retention strategies are going to have a substantial competitive edge in the marketplace, today and in the future.

- Mudit Chandra and the ScoreData team

Rapid Development and Deployment of Machine Learned Models

April 20th, 2016

In the past ten years, we have seen a dramatic rise in the use of machine learning techniques to build predictive models.  In the rapidly evolving Predictive Analytics tools landscape, more and more applications are using machine-learned models as part of the core business process. This trend will continue to grow. From an evolutionary perspective, according to Gartner, the landscape is moving from descriptive to prescriptive analytics.

Enterprises are being challenged with what to do with the tremendous amount of data being generated within the enterprise.  If you look outside Silicon Valley, there are not many data scientists available, and not every enterprise can afford the high cost of performing the analysis.  Even to start a project, the cost is high and value may not be quickly realized.

In order for ML technology to be used in all kinds of enterprises (and not just tech savvy ones), a new generation of platforms/tools needs to be easy to use and reasonably priced. Platforms must make it easy for business professionals (not just data scientists) to use ML techniques to improve business outcomes. Business outcomes span the customer engagement journey from repeat business to enhanced customer retention and customer satisfaction.

We have seen many recent announcements of Deep Learning technology used by Google, IBM, Microsoft and Facebook and private companies such as H2O among others, and now some of those technologies are being open-sourced by enterprises. We now have more than a dozen open-sourced technologies that one can leverage to get started. However, there are a few challenges which we need to address to make platforms easy to use.  Higher-level abstractions need to be defined in the platform not only for data scientists but also for business users. It would be great if my product manager could use the platform to find what models are running in it, and could even build a model using the configuration I used and set it up for an A/B test. Why not? Yes, big tech companies have built tools to support that, but it is still a struggle for smaller departments, mid-sized companies and startups.  Personally, I have experienced that challenge in large technology companies as well as in startups.

Predictive Analytics and Machine Learning are such oft-used overloaded phrases that there is a tendency to overpromise the benefits. It takes time to show value to a business and the “start small, move fast” philosophy comes in handy. So, the need to be able to start a project at a low cost is critical. Another point to note is that there is no substitute for on-line testing. Don’t use back-tested results to “overpromise” the impact to business.

The best way to approach testing is rapid iteration. Let’s look at key features required in a platform to achieve rapid development. Let me start with a cautionary note – I am not targeting tech-heavy enterprises which have big teams of data scientists, but rather enterprises which want to leverage this technology to solve their business problems and can afford a small data team to prove the worth before investing more. Cloud-based solutions from small and big companies (e.g., Amazon ML) are now available to test out new ideas at a smaller cost.

Let me start with a problem that I ran into in the ad tech area recently and use that case to discuss different features that are important for the ML platform. Even if I target “US” audiences, invariably we find that some percentage of traffic is tagged as outside the targeting criteria by reporting systems (such as DFA). Those fraudulent impressions cost real money.  We need a quick way to detect them and not bid on those suspicious impression calls via the Real-Time Bidding (RTB) system. This means we need a platform that can be used to continuously update the model every hour.

So, the ability to build models faster with automation for production use at a low cost is important. Let’s look at what key features the next generation of machine learning platform should support.

Understanding Data

Data preparation is a big effort by itself and many tools / platforms may be used to process, clean, augment, and to wrangle the data.  This is a big topic by itself. Right now, 60-80% of the time is spent on data preparation.  For the sake of this blog, we are assuming that data has been prepared to build the models (okay, I will come back and do a post later on data wrangling – hopefully :-) ).

There are a couple of things ML platforms can provide for additional insight from data, such as understanding the “goodness of data” for a good model fit.  A data set with mostly constant data is not good: even if it is complete, it will be hard to build a good model from it.  Simple statistical properties like outlier bounds, skew, variance, correlation, and histograms can be easily computed. However, the platforms should go to the next step, i.e., provide a “data quality scorecard” at a feature level and overall. But what does a score of 86 mean?  Is it good or bad? That’s where additional insights and recommendations can help.  The platform can show the score “compared to” other similar data sets or to a configured well-known feature.  The system can be trained to provide that score, and even better, a model can be built to generate the quality scorecard.

When one is dealing with hundreds of features, it is quite hard to review data properties, so a recommendation/hint can go a long way toward understanding the data and making sure highly correlated/dependent variables are excluded from the model. (Note: Highly correlated variables will be removed in the feature reduction process.)
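
As a rough illustration of the simple statistical properties mentioned above, the sketch below profiles a numeric feature and flags highly correlated pairs. It is not a full scorecard: the function names, the top_value_share measure and the 0.95 correlation cutoff are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def profile_feature(values):
    """Return simple 'goodness of data' statistics for one numeric column."""
    mu, sigma = mean(values), pstdev(values)
    # Share of rows taken by the single most common value; 1.0 means constant.
    top_share = Counter(values).most_common(1)[0][1] / len(values)
    # Population skewness; reported as 0.0 for a constant column.
    skew = 0.0 if sigma == 0 else mean([((v - mu) / sigma) ** 3 for v in values])
    return {"variance": sigma ** 2, "skew": skew, "top_value_share": top_share}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return 0.0 if norm == 0 else cov / norm


def correlated_pairs(columns, threshold=0.95):
    """List feature pairs whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            flagged.append((a, b))
    return flagged
```

A feature with a top_value_share close to 1.0 is the "mostly constant" case described earlier, and each flagged pair is a candidate for dropping one of the two variables during feature reduction.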

Ease of Use in building Models

The ability to build models for common problems easily is important to platform adoption and broader support. The platform should provide solution templates so that one can get started easily. If I am building a customer churn prediction model, it is not hard to build a workflow that can guide the user in easy steps. Can the past models built for the same use cases guide the user in feature engineering?

A wide array of algorithms, such as GLM, GBM, Deep Learning, and Random Forest, is now available in most of the platforms. Platforms supporting in-memory computations are able to build models faster and at a lower cost. This is important since newer use cases need to adapt to real-time conditions and need the ability to build a model frequently (every hour, say).  Start with simple algorithms such as GLM and GBM; they are easier to understand and tune.  Whenever a data scientist in a team comes up with a proposal to solve a problem with complex algorithms, ask them to take a pause and see how to get started with a simple algorithm first and iterate. The iteration is more important than finding the exact algorithm.

Productizing the models

Once models are built, it is critical that they be enabled for production quickly.  There is no better test than running in production with a small percentage of on-line traffic and getting some results. The quicker it can be done, the better. The platform should support experimental logging of scores. This way you can get scores from your model on production traffic without impacting the production application. This functionality is much needed by data scientists and will enable them to experiment with models quickly.
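
Experimental logging of scores is often implemented as a "shadow" call: the production model answers the request while the candidate model's score is only recorded. The sketch below shows one way this could look; score_request, the plain-list shadow_log and the two model callables are hypothetical names for illustration, not any platform's API.

```python
import logging

logger = logging.getLogger("shadow_scores")


def score_request(features, production_model, candidate_model, shadow_log):
    """Serve the production score; record the candidate's score on the side."""
    served = production_model(features)
    try:
        shadow_log.append(
            {
                "features": features,
                "production": served,
                "candidate": candidate_model(features),
            }
        )
    except Exception:
        # A failing experimental model must never affect the live response.
        logger.exception("candidate model failed; production response unaffected")
    return served
```

The logged pairs can later be joined with actual outcomes to compare the two models on real traffic before any switch is made.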

In the past, models were built, converted to code and pushed to the production system, a process that took weeks. The new generation of SaaS-based ML platforms has integrated model building and scoring into the platform so that models can be enabled for scoring with the click of a button and can be scaled easily. PMML adds portability to the model, although a PMML model never runs as efficiently as a model that is built and scored in the same platform. So, PMML gives flexibility but sometimes at the expense of optimization – a normal tradeoff, which we encounter in other technology stacks also.

Quick iteration is the best way to know the efficacy of the model and make tuning adjustments.

Visibility & Collaboration

Data science is still a black box for many inside the company.  What models have been built, what models are being used for A/B testing for a certain application, etc., are hard to get at. If you ask a few months later what training data was used to build a model, there is no easy answer.  Many tech-savvy companies have built proprietary tools to manage it.  Data scientists are now using a wide array of tools such as R, Python, H2O, and Spark/MLlib, among many others. Integration with other platforms/tools is important in providing visibility to peers and fostering collaboration.  Organizing how models are built across this wide array of tools, and sharing the learnings, should be part of it.

A platform which makes it easy to organize/tag models, allows collaboration, and keeps track of changes will help speed innovation. The more open it is, the better the chance of success.

Model Templates  

There will be complex problems for which one has to do lots of analysis, feature engineering and build sophisticated algorithms, but there are classes of problems for which simple algorithms and solutions will be good enough. The platform should be easy enough for product managers / business analysts to be able to build models. They should also be able to leverage model configurations from their data scientists to play around with new data.  It should be easy to compare scores of multiple models to see how the new model stacks up.  Providing model templates for a common set of problems / verticals can help new users leverage the platform better.

Data drift: Adaptive to data change & Model Versioning

In most organizations, retraining of models is scheduled once every six weeks or once a quarter. These days, data is changing at a much faster rate and it is important to leverage it sooner. So, the platform needs to surface important changes in data characteristics, including at the feature level. These changes can also point to data pipeline issues, which need to be addressed sooner since they impact model performance.

A tool that compares two models, including their configuration and feature differences, would be valuable. It would also be a good analysis tool for understanding how data is changing over time and what impact that has.
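
One common way to quantify a change in a feature's distribution is the Population Stability Index (PSI). The post does not name a specific drift measure, so PSI is swapped in here purely as an example; the bin count and the epsilon floor are assumptions of this sketch.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline's range; values outside that range
    fall into the first or last bin. Empty bins are floored at `eps` so the
    logarithm stays defined.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that cutoff is a convention and should be tuned per feature. Computing PSI for each feature between the training data and recent scoring data gives a feature-level view of drift.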

Note, tech-savvy companies will have lots of tools and a big team of data scientists and they will build custom tools – we are not talking about them.  We are talking about the many companies which cannot afford a big data science team and are not in the technology area; they need tools which are simple to use and can help speed adoption of machine learning into their business. Cloud based SaaS platforms are the best way to get started at a lower cost.

At ScoreData, we have built ScoreFast™, a cloud based platform that is geared for such businesses – simple to use at lower cost. Once a model is built, it can be enabled by a single click for scoring. The model is optimized for speed.  The models can be shared among peers so that they can see what features are being used as well as leverage the configuration to build models using their data.  A data quality ScoreFast™ Scorecard can be configured for each feature and for the overall data set, with recommendations to the modeler.

The next generation of ML platforms will make it more transparent, collaborative and easier to use at a lower cost.


Predictive Analytics for Financial Services Industry and ScoreFast™

April 4th, 2016

History shows that the financial services industry has always been an early adopter of new age technologies, and even more so when it comes to utilizing its data assets for business benefit, which is essentially what data analytics is. This is because financial services is, at its core, the business of making profits on the spread between the earnings on assets and the expenses on liabilities over a reasonably big customer portfolio.

So the industry by definition is data intensive and success depends a lot on an organization’s ability to understand its customers, their behaviors, and to leverage those insights in day-to-day operations. This is the reason why during the early computing era in the sixties and seventies, banks and financial services institutions were the first businesses to leverage their historical datasets for important business functions like credit decisioning.

Today, with the advent of IoT and big data, almost everything we do is being captured at an unprecedented rate, and cloud technologies both deliver new data sources and provide a scalable, pervasive ecosystem for analytics. In this environment, the same DNA of the financial services industry is fostering an era of unprecedented innovation in the use of data and machine learning technologies. On the one hand, traditional financial services firms are finding novel ways to leverage machine learning and big data to optimize standard business processes, while on the other hand new age FinTech firms like Klarna, Notion and Affirm are using all this technological power to redefine the industry itself. One example would be using social media signals and Internet footprint in the credit profiling and decision process, making it more robust and at the same time reducing turnaround times.

Data is the bedrock on which machine learning and predictive analytics stand. So in order to look at how predictive analytics and new advancements in these technologies are changing the banking and financial services industry, let’s look at all the different types of data and signals that are available to these businesses.

At a broader level, there are two types of datasets that a company has access to – business data (the data a company gathers while conducting its business- customer demographics, transactional datasets) and outside data, which in turn can be either public data (social media etc.) or private datasets available for restricted usage (e.g. credit ratings). Companies use these datasets for all sorts of purposes, but essentially to understand their customer segments, their habits, behaviors and preferences; and use these insights to inform their (the company’s) business decisions.

If we list the core business functions in the financial services industry, from a business standpoint as well as from the perspective of the variety of predictive use cases, we get the following: Sales and Marketing, Risk Management (fraud risk, credit risk etc.), Customer Relationship Management, and Collections and Recoveries.

Having established the categories of datasets and important business functions, let’s now look at predictive use cases for various business functions one at a time:

Sales and Marketing

Marketing involves investing money into campaigns in order to lure new (or old) customers into the business. In order to allocate marketing budgets optimally, it is vitally important to understand the historical returns on investment from various marketing campaigns. One important predictive use case in this arena is Promotion Response Models, which capture the interplay of promotions and the resulting responses (footfalls, click-through-rates (CTRs), sales etc.) based on historical data. Essentially, these models help companies simulate potential sales (or other relevant metrics) for a specific allocation of promotional dollars and run different scenarios according to business strategy. Companies can then use all these simulations to come up with winning budget allocations for maximum ROI.
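
The simulate-and-allocate idea can be sketched with a toy response model. Everything here is an assumption made for illustration: the diminishing-returns form response = a + b * log(1 + spend), the two-channel restriction, and the function names are not taken from any particular product.

```python
import math


def fit_log_response(spend, response):
    """Least-squares fit of response = a + b * log(1 + spend) for one channel."""
    xs = [math.log1p(s) for s in spend]
    n = len(xs)
    mx, my = sum(xs) / n, sum(response) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, response)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def simulate(curves, allocation):
    """Predicted total response for a {channel: spend} allocation."""
    return sum(a + b * math.log1p(allocation[ch]) for ch, (a, b) in curves.items())


def best_split(curves, budget, step=1):
    """Grid-search every split of an integer budget across exactly two channels."""
    first, second = sorted(curves)
    options = [{first: x, second: budget - x} for x in range(0, budget + 1, step)]
    return max(options, key=lambda alloc: simulate(curves, alloc))
```

Each channel's curve is fitted from its own campaign history, and best_split then plays out the scenarios for a fixed budget. A real promotion response model would add seasonality, interaction effects between channels, and uncertainty estimates.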

Another evolved area of predictive analytics application is Sales Forecasting Tools. Being able to use historical sales trends, market directions, macroeconomic data and other relevant signals and accurately foresee a future sales trend is of primary importance to any business, more so for large-scale financial services organizations. Channel Optimization is another area that is a very important predictive use case in sales and marketing functions. It entails devising an elaborate channel wise budget allocation plan for maximum ROI.

To summarize, fundamentally the focus of predictive analytics in sales and marketing functions is on improving marketing efficiency and maximizing the ROI on sales campaigns.

Risk Management

Being able to accurately understand underlying risks (be it fraud risk or credit risk or other risks) and use this information efficiently for business benefit is at the core of success in the financial services industry. And this is why one of the first use cases of predictive analytics in the industry was in the area of credit ratings. Today, with a lot of diverse datasets available, the industry is innovating everyday in order to improve risk management functions.

In Credit Risk Assessment functions, especially at the time of onboarding, new age FinTech firms are using all sorts of signals – from customers’ social media footprint, to social network maps (friends/ colleagues/ family), along with more traditional data sources like demographic and profile information and credit history to improve the credit decision processes – making it more efficient as well as cutting down timelines. The decision cycles are getting shorter without compromising on decision qualities, and in fact in many cases improving them. The whole cloud based distributed computing ecosystem and mobile technologies have made high end computing resources available for innovation and opened up the marketplace. New age FinTech companies (e.g. LendingClub, Affirm, Klarna etc.) are leading efforts in these areas.

A lot of predictive analytics is also used in Credit Line Management- a dynamic assessment of credit line that is to be extended to a customer based on her profile, past behavior and most recent transactional signals.

Another key area for predictive analytics applications in risk management functions is Fraud Risk Management. This comprises all fraud risk exposures for a bank, from the time of sourcing to all transactional fraud exposures during the lifetime of a customer relationship. Financial services companies use predictive analytics to predict the propensity to fraud at customer levels as well as at transaction levels, and use this information in their risk management decisions to establish acceptable risk criteria. Cloud based distributed computing architecture is allowing companies to be very nimble with their fraud risk containment decisions. Companies are using the most recent signals and trends to inform their fraud alert systems, helping them tread the fine balance between customer experience and fraud risk exposure at all times.

Customer Relationship Management

From the time a customer comes on board until the relationship ends, all interactions with the customer can be labeled under customer relationship management. Predictive analytics plays an instrumental role in various CRM functions. One such high impact area is Cross-sell/ Up-sell. Selling to an already existing customer makes more business sense than acquiring a new customer. If you do it right, you not only deepen your existing customer relationships but also invest your marketing dollars in the most efficient place. And on the flip side, if you don’t do it right, cross selling to uninterested customers can result in irate customers and eroded brand equity. Financial services companies use customer behavior data along with their demographic information and other signals to accurately predict customers’ interest in other products. Today’s machine learning tools can ingest even the most obscure signals to predict the propensity of customers to react positively to cross-sell / up-sell offers.

Another important area of predictive focus is Customer Churn. Acquiring a customer can involve big investments and churn takes away the opportunity of a business to make good on a customer relationship. Being able to successfully predict customers’ propensities to churn in a given period gives businesses enough time to run preventive campaigns and contain customer churn.


Collections and Recoveries

Collections is one of the core functions in the financial services business. A company’s ability to collect efficiently on its debts in today’s market depends a great deal on its ability to use historical data efficiently. This enables it to preempt possible default events, predict payment propensities, and so on, which in turn helps it allocate collections budgets optimally. Some established predictive use cases in the collections function are various Delinquency Prediction Scorecards and Payment Propensity Prediction Scorecards (for recoveries portfolios).

With such a complex array of functions to perform across the spectrum of customer engagement, speed of execution, speed of anticipation and speed of delivery of offers to consumers are essential, especially for the Banking and Financial Services industry. ScoreData, with its ScoreFast™ engine, makes it possible for financial services companies of all sizes to make their decisions in real-time or near real-time in the broad spectrum of applications in Sales and Marketing, Customer Churn, Risk Management, and Customer Relationship Management.

The most important need for any consumer-facing industry such as Banking and Financial Services is customer engagement. In the three years since ScoreData was founded, they have focused on building solutions for consumer facing industries. In order to further assist banks to improve the responsiveness and effectiveness of their sales and marketing campaigns, and to implement cross-sell strategies to assess customer loyalty, the analytics platform has a variety of pre-built model offerings in consumer analytics, risk analytics and other areas like churn management etc.

The ScoreFast™ platform fosters widespread analytics consumption and insights usage across organizations and has an easy to use business dashboard driven data/model development and deployment facility.   Comprehensive centralized model management with version control means less duplication, more collaboration, and ease of diagnosis when model performance deteriorates.

At run time, models update themselves incorporating a wide variety of company internal, and third party and regulatory data. The platform is flexible enough to ingest new data sources or tune out old data sources during the model building process.

This ensures that the most accurate models get deployed over time. ScoreFast™ compares and contrasts results from hundreds of models built in memory across algorithms. This is a significant improvement over legacy practices, shrinking model-to-market times from weeks or months to days or hours. ScoreFast™ is an ideal platform for new ecosystems in the Banking, Financial Services, and Insurance Industries.

Mudit Chandra
April 03, 2016

