Opportunities and Challenges of Driving Value from Data Analytics

Over the next few posts, we will discuss the progression of Data Analytics: where we are today and where we are headed next. But first, some history. Basic statistics is the foundation of analytics, and the use of analytics dates back to the 1900s; it began receiving significant attention in the late 1960s, when computers started to serve as decision-support systems.

Data analytics has spread to almost every industry in the world, and data collection has become an integral part of any organization. Every click or scroll you make and every app you open generates data that is stored for business intelligence and data mining.

Industries such as finance, banking, transportation, manufacturing, e-commerce, and healthcare use this data to make smarter decisions, gain meaningful insights, and predict outcomes. Today, businesses increasingly use data science to uncover patterns, build predictive models, and apply machine learning and AI techniques.

For example, the banking industry uses data analytics for credit risk modeling, fraud detection, and customer lifetime value evaluation. Erica, Bank of America’s virtual assistant, studies customers’ banking habits, gets smarter with every transaction, and suggests relevant financial advice. Financial firms use machine learning algorithms to segment their customers, personalize their relationships with them, and increase profitability.


Predictive analytics is another aspect of data science that has become necessary for the transportation and logistics industry. Public and private transportation providers use statistical data analysis to map customer journeys and provide people with personalized experiences during normal and unexpected circumstances. Logistics companies use artificial intelligence to optimize their operations in distribution networks, anticipate demand, and allocate resources accordingly.

Data science and AI applied to biomedical and healthcare data are modernizing the healthcare industry and providing public health solutions. From medical image analysis and drug discovery to personalized medicine, data analytics is transforming patient outcomes. Data science and machine learning have shown that even the most difficult problems in many industries can be addressed, and companies’ future success depends on adopting data-centric approaches to discover actionable insights. Automating the analytic process shortens the time it takes to unlock insights, enabling rapid forecasting and decision-making.

“By 2020, 50% of analytic queries will be generated using search, natural-language processing or voice, or will be auto-generated.” (Gartner Analytics Magic Quadrant, 2019)

In next week’s post, we will discuss the major challenges and opportunities businesses face when adopting various Data Analytics techniques. Watch this space or follow us on LinkedIn to stay tuned.


Expert Analysis on Implementing Machine Learning on Lobectomy Data

Our research enabled us to develop models that capture nearly eight of every 10 readmitted patients. Our final model uses a combination of demographic and diagnosis-related features, which allowed us to analyze the likelihood of a patient being readmitted after undergoing a lobectomy.

This has helped us understand which variables contribute the most to the model.

Among the medical factors, circulatory system diseases (I00-I99), certain infectious and parasitic diseases (A00-B99), neoplasms (C00-D49), and musculoskeletal system and connective tissue diseases (M00-M99) were among the top contributors to our model’s predictive ability.

By understanding the likelihood of a patient’s readmission, care teams can introduce pre- or post-operative interventions, such as weight loss, home monitoring programs, or additional medical procedures, into the patient’s hospital care cycle. This would improve the patient’s outcome and reduce the associated costs for the patient, the healthcare provider, and the hospital.

Likewise, our approach can be applied to other medical procedures using any dataset with similar information, even one that lacks some of the features used in our models.

Limitations

One of the key limitations of our research was that ICD-10 data was available only from Q4 2015 to Q4 2017, which restricted us to data from a two-year period.

Similar research on readmission cases has covered a decade’s worth of data.

Acquiring more data would enable us to further optimize the models for the desired target metric and help address class imbalance. The study is also limited to the non-medical factors collected in the NRD; depending on the data available from healthcare information providers, the final model is subject to change.

Next Steps

  • Refine the readmission prediction model on a smaller subset of medical and non-medical features, and validate it further on real-world data.
  • Refine the model by applying it to larger datasets from other sources.
  • Work with the medical community on possible preventive actions to reduce readmissions.

The healthcare industry has been one of the primary adopters of machine learning initiatives over the past decade. Applications of ML go beyond this kind of prescriptive analysis and can even contribute to highly sensitive AI operations.

Follow us on LinkedIn to keep up with the latest technology trends, or contact our experts at info@allwyncorp.com.


Applying the right Machine Learning model for accurate statistics of Lobectomy Patients

We compared more than ten classification methods, such as Logistic Regression, Random Forest, and XGBoost, across different feature combinations on our target classification metrics to choose an optimal model.

We shortlisted models whose scores stayed consistently within a close range during validation. The best-performing models were then optimized for high recall using cross-validation and grid search, while keeping precision and accuracy within an acceptable range. We chose an XGBoost model that combines socioeconomic features and medical code groups as the final model because of its 75% recall, interpretability, high efficiency, and fast scoring time.
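The selection rule described above (maximize recall while keeping precision and accuracy acceptable) can be sketched in a few lines of Python. This is a minimal illustration, not Allwyn’s pipeline: the `CandidateScore` type, the `select_model` helper, the floor values, and all scores are hypothetical. In practice the scores would come from cross-validation, for example scikit-learn’s `GridSearchCV` with `scoring="recall"`.

```python
from dataclasses import dataclass


@dataclass
class CandidateScore:
    """Cross-validated scores for one model configuration (hypothetical)."""
    name: str
    recall: float
    precision: float
    accuracy: float


def select_model(candidates, min_precision, min_accuracy):
    """Return the highest-recall candidate whose precision and accuracy
    both meet the given floors, or None if no candidate qualifies."""
    eligible = [
        c for c in candidates
        if c.precision >= min_precision and c.accuracy >= min_accuracy
    ]
    return max(eligible, key=lambda c: c.recall, default=None)


# Made-up scores for illustration only; NOT the results from this study.
candidates = [
    CandidateScore("logistic_regression", recall=0.70, precision=0.15, accuracy=0.62),
    CandidateScore("random_forest", recall=0.55, precision=0.22, accuracy=0.78),
    CandidateScore("gradient_boosting", recall=0.74, precision=0.18, accuracy=0.70),
    # Flagging everyone gives perfect recall but useless precision.
    CandidateScore("flag_everyone", recall=1.00, precision=0.08, accuracy=0.08),
]

best = select_model(candidates, min_precision=0.12, min_accuracy=0.60)
print(best.name)  # gradient_boosting
```

The floors matter: without them, a model that flags every patient as readmitted would win on recall alone.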

XGBoost, an implementation of the gradient boosting framework of machine learning algorithms, has proven to be a consistent and highly efficient problem solver, and it can run in major distributed environments.

Recall is the ability of a model to find all relevant cases within a dataset. In our case, true positives (TP) were the correctly classified readmitted patients, and false negatives (FN) were the readmitted patients who were incorrectly classified as not readmitted.

We specifically aimed for a higher recall, TP / (TP + FN), because accuracy is not a good measure of model performance on an imbalanced dataset, and we needed to focus on identifying the readmitted patients so that we could target them and properly analyze their underlying features.
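A small worked example makes the point about accuracy concrete. The sketch below (plain Python; the toy labels are invented, but they use the 8% / 92% class split reported for this dataset) compares a “model” that never predicts a readmission with one that catches six of eight readmitted patients.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary task (1 = readmitted)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted readmitted, actually not
        elif t == positive:
            fn += 1  # missed a readmitted patient
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


# Toy labels with the post's class ratio: 8 readmitted, 92 not readmitted.
y_true = [1] * 8 + [0] * 92

# A "model" that predicts nobody is readmitted.
y_naive = [0] * 100
tp, fp, fn, tn = confusion_counts(y_true, y_naive)
print(accuracy(tp, fp, fn, tn), recall(tp, fn))  # 0.92 0.0

# A model that catches 6 of the 8 readmissions at the cost of 20 false alarms.
y_model = [1] * 6 + [0] * 2 + [1] * 20 + [0] * 72
tp, fp, fn, tn = confusion_counts(y_true, y_model)
print(accuracy(tp, fp, fn, tn), recall(tp, fn))  # 0.78 0.75
```

The do-nothing model scores 92% accuracy with zero recall, while the second model has lower accuracy but finds 75% of the readmitted patients.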

Feature importance of the final XGBoost model and recall/accuracy curve

The final model showed that socioeconomic features, such as Medicare as the payer category, patient age, gender, wage index, and the patient’s population category, along with diagnosis code groups and many other features, contribute to the readmission classification.

Follow us on LinkedIn and don’t miss our final blog post on Machine Learning for Lung Cancer.


Machine Learning to Improve Outcomes by Analyzing Lung Cancer Data

Finding a suitable dataset for predicting readmission with machine learning was the first challenge we had to overcome, since the healthcare datasets currently available tend to be either dirty and unstructured or clean but lacking information.

Most patient-level data are not publicly available for research due to privacy reasons.

With these limitations in mind, and after researching multiple data sources, including SEER-Medicare, HCUP, and public repositories, we chose the Nationwide Readmissions Database (NRD) from the Healthcare Cost and Utilization Project (HCUP). The Agency for Healthcare Research and Quality (AHRQ) creates the HCUP databases through a Federal-State-Industry partnership, and the NRD is a unique database designed to support various types of analyses of national readmission rates for all patients, regardless of the expected payer for the hospital stay.

Our research used machine learning and statistical methods to analyze the NRD. Data understanding, preparation, and engineering were the most time-consuming and complex phases of this data science project, taking nearly seventy percent of the overall time.

Using big data processing and extraction technologies such as Spark and Python, we filtered 40 million patient records down to those of patients who had undergone at least one lobectomy procedure. The filtered data then went through rigorous data quality checks and was cleaned, with missing values imputed. We explored more than 100 input variables, analyzing their correlations with the outcome, profiling our target group’s demographics, and identifying redundant variables.
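The two steps in this paragraph, filtering to lobectomy patients and imputing missing values, look roughly like the following in plain Python. This is a hedged sketch: the post used Spark, the record layout (`patient_id`, `procedures`, `age`) and the placeholder codes `LOBE_A`/`LOBE_B` are hypothetical, and median imputation is an assumed strategy since the post does not name one.

```python
from statistics import median


def filter_lobectomy_patients(records, lobectomy_codes):
    """Keep every record belonging to a patient with at least one stay that
    includes a lobectomy procedure code (so later stays are kept too)."""
    codes = set(lobectomy_codes)
    patients = {
        r["patient_id"] for r in records if codes.intersection(r["procedures"])
    }
    return [r for r in records if r["patient_id"] in patients]


def impute_median(records, field):
    """Fill missing (None) values of a numeric field with the median of the
    observed values, returning new dicts and leaving the input unchanged."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


# Hypothetical records; "LOBE_A"/"LOBE_B" stand in for real procedure codes.
records = [
    {"patient_id": "P1", "procedures": ["LOBE_A"], "age": 67},
    {"patient_id": "P1", "procedures": ["OTHER"], "age": 67},
    {"patient_id": "P2", "procedures": ["OTHER"], "age": 50},
    {"patient_id": "P3", "procedures": ["OTHER", "LOBE_B"], "age": None},
    {"patient_id": "P4", "procedures": ["LOBE_A"], "age": 71},
]

cohort = impute_median(filter_lobectomy_patients(records, {"LOBE_A", "LOBE_B"}), "age")
print([(r["patient_id"], r["age"]) for r in cohort])
# [('P1', 67), ('P1', 67), ('P3', 67), ('P4', 71)]
```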

Many of these features were categorical that required additional research and feature engineering.

The NRD consists of three main files: Core, Hospital, and Severity.

The Core file contains patient-level medical and non-medical factors. The non-medical (socioeconomic) factors include the patient’s age, gender, payment category, urban/rural location, and more, while the medical factors include detailed information about every diagnosis and procedure code, their diagnosis-related groups (DRG), the timing of procedures, the quarter of the admission, and so on.

Allwyn’s data engineering work included analyzing and researching every feature, creating data dictionaries, and applying feature transformations to see which features contribute to our prediction algorithms. Lobectomy patients had an average age of 65, and the data showed that although women had more lobectomies than men, more men than women were readmitted.

The Severity file provided the summarized severity level of the diagnosis codes. The Hospital file provided hospital-level information such as bed size, control/ownership of the hospital, urban/rural designation, and teaching status of urban hospitals.

We consulted subject matter experts in the lung cancer field and, on their advice, added features such as the Elixhauser and Charlson comorbidity indices to enrich our dataset. By delving deep into the clinical features, we also ensured that the chosen variables contain only pre-procedure information, and we verified that there was no information leakage from post-operative or other future-dated variables.
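As an illustration of what a comorbidity index adds, the sketch below computes a Charlson-style score from a set of already-identified conditions. It is a deliberate simplification: it uses only a subset of the original Charlson weights, the condition labels are hypothetical, and the step that maps ICD-10 codes to conditions (for which published coding algorithms exist) is omitted.

```python
# Subset of the original Charlson comorbidity weights (Charlson et al., 1987).
# Mapping ICD-10 diagnosis codes to these conditions is NOT shown here.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "mild_liver_disease": 1,
    "diabetes": 1,
    "diabetes_with_end_organ_damage": 2,
    "moderate_severe_renal_disease": 2,
    "any_tumor": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
}

# Commonly applied hierarchy: when the severe form is present, the milder
# form is not counted a second time.
SUPERSEDES = {
    "diabetes_with_end_organ_damage": "diabetes",
    "moderate_severe_liver_disease": "mild_liver_disease",
    "metastatic_solid_tumor": "any_tumor",
}


def charlson_score(conditions):
    """Sum the weights of a patient's recognised conditions."""
    present = set(conditions) & set(CHARLSON_WEIGHTS)  # ignore unknown labels
    for severe, mild in SUPERSEDES.items():
        if severe in present:
            present.discard(mild)
    return sum(CHARLSON_WEIGHTS[c] for c in present)


print(charlson_score({"chronic_pulmonary_disease", "any_tumor"}))  # 3
print(charlson_score({"any_tumor", "metastatic_solid_tumor", "diabetes"}))  # 7
```

The resulting integer can then be added to each patient’s record as one more pre-procedure feature.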

We then examined correlation matrices and feature importance charts to check whether the features were statistically significant for the predictive models we had selected.

The initial distributions of many features required us to remove outliers, transform skewed distributions, and scale most features for algorithms that are particularly sensitive to non-normalized variables. Diagnosis codes were grouped into 22 categories to reduce dimensionality and improve interpretability.
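The grouping and scaling steps can be sketched as follows. The post does not list its 22 categories, so `ICD10_GROUPS` below covers only the four ICD-10 chapter ranges named earlier in this series, with everything else mapped to `other`; the `log1p` plus min-max scaling is one common choice for right-skewed, non-negative features, assumed here for illustration.

```python
import math

# Partial lookup: only the four ICD-10 chapter ranges named in this series.
# The post's full 22-category scheme is not reproduced here.
ICD10_GROUPS = [
    ("A00", "B99", "infectious_parasitic"),
    ("C00", "D49", "neoplasms"),
    ("I00", "I99", "circulatory"),
    ("M00", "M99", "musculoskeletal_connective"),
]


def icd10_group(code):
    """Map a diagnosis code such as 'I25.10' or 'I2510' to a coarse group
    by comparing its three-character category against each range."""
    category = code.replace(".", "").upper()[:3]
    for start, end, name in ICD10_GROUPS:
        if start <= category <= end:
            return name
    return "other"


def log_min_max(values):
    """Reduce right skew with log1p, then rescale to [0, 1].
    Assumes non-negative inputs (counts, lengths of stay, charges)."""
    logged = [math.log1p(v) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:  # constant feature: nothing to scale
        return [0.0] * len(logged)
    return [(v - lo) / (hi - lo) for v in logged]


print([icd10_group(c) for c in ["I25.10", "C3411", "D50.9", "E119"]])
# ['circulatory', 'neoplasms', 'other', 'other']
print(log_min_max([0, 10, 100])[::2])  # [0.0, 1.0]
```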

The resulting dataset was highly imbalanced between the readmitted and not-readmitted classes, at 8% and 92%, respectively. Because most classification models are highly sensitive to class imbalance, we trained our algorithms with multiple data balancing techniques, including oversampling the minority class, undersampling the majority class, and the Synthetic Minority Oversampling Technique (SMOTE), and compared the outcomes.
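The three balancing techniques named above can be sketched in plain Python for numeric feature tuples. This is a simplified illustration with hypothetical data and helper names, not a production implementation; libraries such as imbalanced-learn provide tested versions.

```python
import math
import random


def random_oversample(minority, majority, rng):
    """Duplicate randomly chosen minority rows until the classes are equal."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def random_undersample(minority, majority, rng):
    """Keep a random subset of majority rows the size of the minority class."""
    return minority, rng.sample(majority, len(minority))


def smote(minority, n_synthetic, k, rng):
    """Generate synthetic minority points by interpolating between a random
    minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


rng = random.Random(42)
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (2.5, 2.5)]  # e.g. readmitted
majority = [(float(x), float(x % 7)) for x in range(46)]     # e.g. not readmitted

over_min, _ = random_oversample(minority, majority, rng)
_, under_maj = random_undersample(minority, majority, rng)
new_points = smote(minority, n_synthetic=10, k=2, rng=rng)
print(len(over_min), len(under_maj), len(new_points))  # 46 4 10
```

Whichever method is used, resampling should be applied only to the training folds, so that validation scores still reflect the original class balance.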

Initial machine learning models had low precision and recall scores. Although this could have had many causes, the Allwyn team focused mainly on additional feature engineering to reduce the high dimensionality of the initial input variables, while also comparing different data balancing methods. This was a time-consuming, iterative process that required training more than a thousand models on different combinations or groupings of diagnosis codes (shown in Table 2) along with other non-medical factors.

We also used k-fold cross-validation during training and validation to ensure that the training results were representative of test performance. To further improve the classification of readmitted patients, we weighted the not-readmitted and readmitted classes, trained models with these weights, and compared their validation scores.
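Both ideas in this paragraph fit in a short sketch: a plain k-fold splitter and “balanced” class weights computed as n_samples / (n_classes × class_count), the same formula scikit-learn uses for `class_weight="balanced"`. The helper names are hypothetical, and the post does not say which weighting scheme was used.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is in exactly one
    validation fold and in the training set of the other k - 1 folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so the
    rare class contributes as much total weight as the common one."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {c: n / (n_classes * count) for c, count in counts.items()}


labels = [1] * 8 + [0] * 92  # 1 = readmitted, 0 = not readmitted
weights = balanced_class_weights(labels)
print(weights[1], round(weights[0], 3))  # 6.25 0.543

for train_idx, val_idx in k_fold_indices(len(labels), k=5):
    # Train on train_idx with `weights`, score recall on val_idx (not shown).
    print(len(train_idx), len(val_idx))  # 80 20 for each of the 5 folds
```

With the 8% / 92% split, each readmitted patient gets a weight of 6.25 and each non-readmitted patient about 0.54. For imbalanced data, a stratified variant that keeps the class ratio in every fold is a common refinement.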

We also collaborated with George Mason University through its DAEN Capstone program. A team led by Dr. James Baldo, with several participants from the graduate program, analyzed the underlying data and developed predictive models using various technologies, including AWS SageMaker Autopilot. We further analyzed and tuned the resulting models and their hyperparameters to achieve high recall.

After choosing the best model, we implemented the workflow in Alteryx Designer to automate the process and placed it in a feedback and re-evaluation loop, following the Cross-Industry Standard Process for Data Mining (CRISP-DM), so that the model can evolve and be deployed in production.

To learn more about how we chose the best model and the associated classification methods, follow us on LinkedIn.


Predicting hospital readmissions and underlying risk factors of Lung Cancer with Machine Learning

Readmission after pulmonary lobectomy is a frequent challenge for hospitals, healthcare plans, and insurance providers. A readmission occurs when a patient is admitted to a hospital for any reason within 30 days of being discharged. Recurring problems and readmissions have been a major issue in the healthcare system. Readmissions are often costly; however, analyzing them can yield findings that benefit both the public and the healthcare industry. With this in mind, and to improve Americans’ healthcare, the Centers for Medicare & Medicaid Services (CMS) implemented the Hospital Readmissions Reduction Program (HRRP), which penalizes hospitals with excessive readmissions.
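The 30-day definition translates directly into a labeling rule. The sketch below flags each hospital stay that is followed by another admission of the same patient within 30 days of discharge; the field names (`stay_id`, `patient_id`, `admit_day`, `discharge_day`) are hypothetical, and how transfers or overlapping stays are treated (here, a negative gap is not counted) is an assumption.

```python
from collections import defaultdict


def flag_30_day_readmissions(stays, window=30):
    """Return {stay_id: bool}, True when the same patient's next admission
    starts within `window` days after this stay's discharge.

    Day fields are assumed to be integers on one timeline per patient.
    """
    by_patient = defaultdict(list)
    for stay in stays:
        by_patient[stay["patient_id"]].append(stay)

    flags = {}
    for patient_stays in by_patient.values():
        patient_stays.sort(key=lambda s: s["admit_day"])
        for current, nxt in zip(patient_stays, patient_stays[1:]):
            gap = nxt["admit_day"] - current["discharge_day"]
            # A negative gap (overlapping stay, e.g. a transfer) is not counted.
            flags[current["stay_id"]] = 0 <= gap <= window
        flags[patient_stays[-1]["stay_id"]] = False  # no later admission
    return flags


stays = [
    {"stay_id": "a1", "patient_id": "A", "admit_day": 0, "discharge_day": 5},
    {"stay_id": "a2", "patient_id": "A", "admit_day": 20, "discharge_day": 22},
    {"stay_id": "b1", "patient_id": "B", "admit_day": 0, "discharge_day": 3},
    {"stay_id": "b2", "patient_id": "B", "admit_day": 60, "discharge_day": 61},
]
print(flag_30_day_readmissions(stays))
# {'a1': True, 'a2': False, 'b1': False, 'b2': False}
```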

Allwyn is developing a machine-learning-based approach to reduce readmissions by recommending data-driven preventive actions before a lobectomy procedure. Organizations such as hospitals and healthcare companies can use this approach to take proactive measures and prevent readmissions by predicting:

  • The probability of a patient’s readmission
  • Underlying risk factors

In the coming weeks, we will share the challenges we faced in data exploration and engineering, followed by our strategy and its impact. Follow us on LinkedIn to stay tuned.


The Importance of Data Asset Awareness and Protection as part of Your Cybersecurity Strategy

When it comes to cybersecurity, there are three basic knowledge issues that plague most organizations:

  1. They don’t know what data assets they have.
  2. They don’t know where those assets are stored.
  3. They don’t know how secure those assets are.

A good recent example is the July 2019 Capital One breach, in which a hacker reportedly accessed data on credit card customers and applicants through a firewall misconfiguration in the firm’s cloud infrastructure. Having the answers to questions 1 to 3 above might have helped prevent this loss of valuable data and of even more valuable customer trust.

For the longest time, the common rule among firms was to “collect as much info as you can from visitors/members in hopes that it might prove insightful one day.” This collection of unessential data is still prevalent on most websites and mobile applications, and sadly in brick-and-mortar stores. At best, would-be users must choose between handing over inconsequential data and forgoing discounts. At worst, they are denied access to tools deemed necessary in today’s society (banking tools, social media, etc.). Their data, including phone numbers, email addresses, and geographic locations derived from IP addresses, can be used to identify them completely and has become the mandatory price of entry. This cost, at first a light burden to consumers and a supposed boon to the businesses and sites they patronized, is now rightly recognized as a double-edged sword: insightful and valuable, yes, but also a severe liability given the ever-increasing prevalence of hacking and data abuse.

The number of cyber attacks, data breaches, data leaks, and acts of espionage has increased dramatically and devastatingly in the past decade, with cyber crime evolving into a sophisticated industry of “hacking” companies. These companies operate with ever more impunity, some with physical buildings and regularly paid employees, and they reside primarily in places where authorities turn a blind eye to their operations. These firms create demand for their own services by attacking targets and then, posing as legitimate cybersecurity companies, selling the solution to their hapless victims.

This year, the World Economic Forum listed cyber threats as the fourth greatest risk to world economies, behind climate change and natural disasters. Companies and individuals in locations lauded for their social fabric and lack of corruption are no longer insulated by geographical boundaries from the lawlessness and vicious criminal activity present elsewhere. With the growth of the Internet of Things and the mass migration of data to the cloud, all data is accessible to anyone with the right tools or information. Cyber criminals now use AI to create nearly undetectable polymorphic malware and to personalize attacks. The dark web provides hackers with increasingly sophisticated communication tools and a relatively secure marketplace. Worse, the amount of money in play in this industry is growing exponentially. All of these factors have turned cyber crime defense into a never-ending race to stay ahead of cyber criminals, and sadly, since no one, company or individual, is immune, everyone must participate.

Cost of cybercrime (infographic courtesy of Raconteur)

What are the first things that companies and individuals should do to avoid suffering a devastating attack?

  • Do a Data Asset Audit. Know what you have, where it is stored, and how well it is protected. Follow a standard framework such as SOC 2 to accomplish this.
  • Do a Risk Analysis. Use a standard framework such as NIST 800-53 and understand the potential impact of your vulnerabilities being exploited.
  • Determine Your Risk Appetite. Decide what damage you can live with, and prioritize defending against the damage that would hurt you most.
  • Implement Defensive Measures. Hire a well-known or recommended firm to help resolve your prioritized vulnerabilities. Fix easy issues first, such as training employees on best practices and enforcing least-privilege access rules.
  • Manage and Maintain. Last, but not least, form an emergency plan and establish best practices for monitoring your systems and managing your data. If possible, also keep a cybersecurity firm on retainer to help you restore systems and mitigate damage at a moment’s notice when, not if, you are successfully attacked.
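Steps two and three (risk analysis and risk appetite) are often operationalized with a simple likelihood × impact score. The sketch below shows that convention on a made-up risk register; the 1 to 5 scales, the `appetite` threshold, and the example risks are assumptions, not something prescribed by SOC 2 or NIST 800-53.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain), assumed scale
    impact: int      # 1 (negligible) to 5 (severe), assumed scale

    @property
    def score(self):
        return self.likelihood * self.impact


def prioritize(risks, appetite):
    """Keep risks whose score exceeds the appetite threshold, worst first."""
    above = [r for r in risks if r.score > appetite]
    return sorted(above, key=lambda r: r.score, reverse=True)


risks = [
    Risk("misconfigured cloud firewall", likelihood=4, impact=5),
    Risk("unpatched internal wiki", likelihood=2, impact=2),
    Risk("phishing of staff", likelihood=4, impact=3),
]
for r in prioritize(risks, appetite=6):
    print(r.name, r.score)
# misconfigured cloud firewall 20
# phishing of staff 12
```

Risks at or below the appetite threshold are consciously accepted rather than ignored; the threshold itself is a business decision.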

Of course, if you are a company, the first action item is to convince your decision makers that these steps are necessary and must be prioritized. That’s easier said than done. If the quick rundown of facts in this article isn’t sufficient, bring us in to make the case for you.


5 ways to fight cyber attacks using AI

The last few years have seen an increase in cyber attacks, whether hacking into personal data, bringing down electric grids, or tampering with federal data. According to the State of Cyber 2019 report, there is an exponentially increasing breach rate of 232 records per second. This trend will only continue as the number of connected devices grows, increasing exposure to cyber attacks.

Source: Wipro State of Cyber Report 2019

It is humanly impossible to handle the terabytes of data that are vulnerable to such attacks, so automation is the only answer to the challenge of defending our data. Unlike traditional software, artificial intelligence tools such as machine learning can plough through vast quantities of data to find vulnerabilities, hacking patterns, and response mechanisms. Machine learning is a discipline of AI in which an algorithm learns from vast quantities of data and makes predictions without being explicitly programmed for an output.

Here we take a look at five ways to use AI and machine learning to fight cyber attacks.

1. Intrusion detection:

Typical intrusion detection and defense software relies on monitors based on previously classified intruders and known malicious attributes. Using deep learning, a machine learning technique, intrusion detection systems can identify previously unrecognized patterns. Deep learning models can learn from highly unstructured data coming from heterogeneous environments, and they can outperform other forms of machine learning because they learn incrementally and can derive new features from a limited data set.

2. Multi-entity response:

With the advent of machine learning, a new form of intelligent threat response is being used to respond to threats rapidly and accurately. Based on the results of threat detection, threat responses can be driven by machine learning algorithms, typically configured according to users’ recommendations. Depending on the type of threat, AI programs can block the source automatically or outmaneuver the attacker by sending false signals to gather additional information. As threat volume increases, it becomes increasingly useful to deploy automated responses to cyber attacks to reduce security incident response times.

3. Tracing the dark web:

The dark web is content on the internet that requires specific software, configurations, or authorization to access. It is usually a breeding ground for illegal activities and can be a source of emerging cyber threats. Machine learning can be used in two ways to monitor activity on the dark web: (1) to identify potential threats and keep you abreast of upcoming attack trends and patterns, and (2) to identify any information pertaining to your organization, your employees, or your products. It can also reveal whether company assets such as software source code are being openly developed or traded. Exploits identified on the dark web help accelerate your response to attacks. Because most hackers constantly change their IP addresses and domain infrastructure, tracking their activities with traditional mechanisms is almost impossible; machine learning helps extract insights from these chaotic patterns. The dark web also makes heavy use of local languages, and machine learning and natural language processing can help transcend these linguistic and geographic barriers.


Source: Kali Tutorials DarkWeb Statistics

4. Endpoint and network monitoring:

Cyber security teams often face reduced budgets alongside a growing volume of security activities such as detection and response. Automating the monitoring of networks and device endpoints is crucial to ensuring compliance with your security governance rules, and machine learning/AI provides the tools to do so. Machine learning can also help you break down data silos and authenticate all users accessing your various data sources, whether transactional or reporting systems. With the help of machine learning, you can detect new variants of malware by learning from the aspects and attributes of known malware and viruses. You can also use machine learning to simplify your many endpoint and network monitoring tools and consolidate them into a single dashboard.

5. Third party detection:

Large organizations have vast numbers of in-house systems, applications, and devices, and on top of these it is almost impossible to keep track of the third-party systems, such as those of vendors and suppliers, that often integrate with them. Your ecosystem multiplies your risk and exposes your systems if its members do not take security as seriously as you do. Recent research shows that organizations are far behind in instituting governance and technology for third-party risks across the software supply chain, access governance, and data handling.
Machine learning can be used to monitor data flowing into and out of third-party systems and raise alerts, by learning the normal patterns of that data and of the breaches that occur. To manage the security of third-party data effectively, you need additional monitoring, controls, and governance in place. Machine learning can help automate the monitoring process across a wide variety of unstructured data, and it can also be used to enforce system controls and security policies.
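A very simple version of “learning the patterns of data” coming from a third party is a statistical baseline: learn the normal volume of a feed, then alert on large deviations. The sketch below uses a z-score rule as a stand-in for the machine learning models described above; the hourly-volume metric, the history values, and the threshold of 3 standard deviations are hypothetical.

```python
from statistics import mean, stdev


def learn_baseline(history):
    """Learn the normal level and spread of a metric, e.g. MB/hour
    received from one third-party feed (needs at least two values)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


history = [100, 102, 98, 101, 99, 100, 103, 97]  # hypothetical hourly volumes
baseline = learn_baseline(history)
print(is_anomalous(104, baseline))  # False: 2 standard deviations away
print(is_anomalous(450, baseline))  # True: far outside the learned pattern
```

A learned model would replace the mean and standard deviation with richer patterns (time of day, seasonality, content type), but the alerting logic stays the same: compare new traffic against what was learned as normal.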

Conclusion:
We are in an age of data proliferation and of increasing cyber attacks and security incidents. The only way to manage data protection, reduce risk, and increase security is to automate the process. Artificial intelligence mechanisms such as machine learning can sift through vast quantities of data and use intelligent algorithms to learn and detect patterns of vulnerability, so that cyber threats are thwarted and your organization is protected.


Tired of managing multiple properties? Maybe there is a solution

The rate of home ownership in the USA is expected to fall to 50 percent by 2050. A migrant population and young people’s changing perception of real estate as a way to save money are the main causes of this trend. Landlords can capitalize on it, since it should lead to higher rental yields. However, they may face a number of property management issues, one of which is tracking rent payments. Here are some of the issues we have noticed:

Managing Multiple Properties/Tenants

Many landlords now own multiple properties and depend on the rent they earn for their livelihood. Tracking rental payments from multiple tenants can be a hassle, especially when some tenants pay late.

Payment tracking

Tenants mostly pay through non-cash channels such as checks and direct bank transfers. Landlords need a well-planned digital system to track these payments and manage their finances efficiently.

Repair Management

It is the landlord’s duty to carry out repairs to the property and solve other problems raised by tenants. Keeping track of these concerns and managing their resolution can be a tiresome process, especially when multiple properties are involved.

What is the solution?

A good solution to the issues faced by landlords is a software product offering custom digital solutions. Custom-built features embedded into such software should facilitate easy management of properties.

OneRoof from Allwyn Innovations is one such software product that makes life easier for landlords. OneRoof is a cloud-based ‘customer relationship management platform’ that lets landlords access all their documents in one place and manage the crucial information they need to oversee multiple properties, track rental payments, and handle repairs.


Allwyn Corporation Wins Technology Leader Award for 2018

Allwyn Corporation’s CEO, Ms. Madhu Garlanka, won the Technology Leader Award at the Dulles Regional Chamber of Commerce’s annual business awards for 2018. The awards were announced at the “Stars Over Dulles” Awards Luncheon held on Dec 5th at the Crowne Plaza Dulles Airport in Herndon.

The annual awards recognize the companies and organizations operating in the Dulles region that have exhibited outstanding performance. Every year, the awards are announced in areas ranging from leadership in running a small business to demonstrating exemplary corporate social responsibility. All the award winners were presented with Congressional Record recognition by the office of the Congressman for D-VA 11th District and a Congressional Proclamation by the office of the Congresswoman for R-VA 10th District.

The award reflects Allwyn Corporation’s efforts to contribute to the local economy and improve the quality of life of the people affected by its operations. It also demonstrates Madhu Garlanka’s ability to lead the organization toward success.


Expanding your horizons and your audience

We know that building a website on your own can be messy. We don’t expect business owners to be experts in web design and we would hope that you don’t expect web designers to be experts in business!


As the world grows rapidly more connected through the exponential growth in the use of websites and social media, it is of the utmost importance for small businesses and startups to build an online presence. A website is the easiest way to increase word-of-mouth recognition for your brand, expand the audience you reach, and increase sales.

Maximize your website with Google AdWords, which makes each Google search work for you and generates more hits for your site, making it the most effective marketing decision for your company without the fees of a professional marketing team. With the Allwyn Google AdWords package, you pay per click (PPC) instead of paying a flat fee, giving you more bang for your buck.

Allwyn is for everyone!

Nonprofit?

Are you a nonprofit organization struggling with updating your website with new events and ways to donate?

Let our team at Allwyn update your site for you and add a directory and credit card integration so you can accept online donations.

Businesses, big and small

Are you a business trying to reach more people or trying to limit spam form submissions?

Allwyn can help create and market a website for you and secure your form submission page.

Tired of trying to use social media to market?

We have the solution for you! Let Allwyn highlight your key points and create best-fit Google AdWords campaigns to bring more attention to your hard work.

Marketing, Sales, and Branding

The internet is an absolute necessity for advertising a brand, and a website gives the brand legitimacy. These days, running a brand without a website means your business is not truly global. A website also works very well as a sales CRM and lets you present multiple brands on a single site.

So let us here at Allwyn Corporation handle the website and create effective Google AdWords campaigns to increase your business’s online brand and visibility. We also help small businesses create the software their industry needs, and help nonprofits create websites conducive to their cause.

Feel free to contact us at (703) 435-4248 or drop us a line at info@allwyncorp.com.
