Predicting Length of Stay of Patients with Lung Cancer and Mental Illness

To study outcomes for lung cancer patients who have undergone lobectomy and have a mental illness, the team developed multiple machine learning models. For this study, we split the data into 80% training data (4,464 samples) and 20% test data (1,117 samples).
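An 80/20 split of this kind can be sketched with scikit-learn's train_test_split. The data below is a synthetic placeholder with 5,581 rows (4,464 + 1,117); the feature columns, the target distribution, and the random seed are assumptions for illustration, not the study's actual data or settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the study data: 5,581 rows (4,464 + 1,117).
# The 10 binary columns are hypothetical diagnosis-code flags.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5581, 10))
y = rng.gamma(shape=2.0, scale=3.0, size=5581)  # placeholder LOS in days

# Hold out 20% of the rows for testing; random_state is an arbitrary seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape[0], X_test.shape[0])
```

With 5,581 rows and test_size=0.2, the split sizes match the sample counts reported above.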

We divided this problem statement into two areas of evaluation:

  • Predicting LOS of a patient with both lung cancer and mental illness using only Diagnosis codes.
  • Predicting LOS of a patient with both lung cancer and mental illness using both Diagnosis codes and Socio-demographic features.

Models were then built with the following scikit-learn regressors, plus TensorFlow:

  • SGDRegressor
  • GradientBoostingRegressor
  • LinearRegression
  • KNeighborsRegressor
  • RandomForestRegressor
  • SVR
  • TensorFlow
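As a rough sketch of how several of the regressors listed above can be trained and compared on the same split, the loop below fits each scikit-learn model and records its mean absolute error. The data is synthetic, all hyperparameters are library defaults, and the TensorFlow model is left out to keep the example short; none of this reflects the study's actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic data: 10 hypothetical binary diagnosis flags and a noisy target.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 10)).astype(float)
y = 2.0 + X @ rng.uniform(0.0, 2.0, size=10) + rng.normal(0.0, 0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Default hyperparameters throughout (an assumption, not the study's settings).
models = {
    "SGDRegressor": SGDRegressor(random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
    "LinearRegression": LinearRegression(),
    "KNeighborsRegressor": KNeighborsRegressor(),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "SVR": SVR(),
}

# Fit every model on the training split and score it on the test split.
mae_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, model.predict(X_test))

# Print models from lowest to highest test MAE.
for name, mae in sorted(mae_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.3f}")
```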

Problem 1a: Predicting LOS of a patient with both lung cancer and mental illness using only Diagnosis codes.

As an example, here are the results for the Gradient Boosting Regression model:

  • Prediction Model: 1.6898821756888593 days, RMS 0.06232652015934372
  • Median Model: 2.2820053715308863 days, RMS 0.7993881552828291
  • Average Model: 2.267122095573982 days, RMS 0.07624966411136928
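The median and average models above read like constant baselines. Assuming that is what they are, one way to set up such a comparison is scikit-learn's DummyRegressor, which always predicts the training median or mean. The data below is synthetic and the error is measured as mean absolute error in days; both are assumptions, since the source does not state how its figures were computed.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data: 8 hypothetical binary flags driving a placeholder LOS.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(600, 8)).astype(float)
y = 3.0 + X @ rng.uniform(0.5, 2.0, size=8) + rng.normal(0.0, 0.5, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidates = {
    "prediction": GradientBoostingRegressor(random_state=0),
    "median": DummyRegressor(strategy="median"),  # always predicts train median
    "average": DummyRegressor(strategy="mean"),   # always predicts train mean
}

# Mean absolute error, in days, of each candidate on the held-out rows.
mae_days = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    mae_days[name] = mean_absolute_error(y_test, model.predict(X_test))

for name, mae in mae_days.items():
    print(f"{name} model: {mae:.3f} days")
```

A learned model is only useful if it beats these constant predictions, which is why they make a convenient reference point.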


Comparing the performance of the different machine learning models showed that Gradient Boosting Regression (GBR) had the lowest mean absolute error (MAE). GBR was then cross-validated with a KFold split of 4, and a feature importance plot was produced. The plot shows that F0 (Mental disorders due to physiological conditions) is the top feature for predicting LOS, followed by F4 (Anxiety, dissociative, stress-related, and other nonpsychotic mental disorders) and F1 (Mental and behavioral disorders due to psychoactive substance use).
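A minimal sketch of 4-fold cross-validation and feature importance ranking with GBR follows. The feature names F0 to F4 are reused as hypothetical column labels, the target is synthetic and built so that F0 dominates, and shuffling the folds with a fixed seed is an assumption rather than the study's documented setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical binary diagnosis-group flags, one column per group.
feature_names = ["F0", "F1", "F2", "F3", "F4"]
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(400, 5)).astype(float)

# Synthetic target in which F0 has the largest effect by construction.
y = 2.0 + 4.0 * X[:, 0] + 1.0 * X[:, 4] + rng.normal(0.0, 0.3, size=400)

gbr = GradientBoostingRegressor(random_state=0)

# 4-fold cross-validation; scikit-learn reports MAE as a negative score.
cv = KFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(gbr, X, y, cv=cv, scoring="neg_mean_absolute_error")
fold_mae = -scores
print("MAE per fold:", np.round(fold_mae, 3))

# Fit on all rows and rank features by impurity-based importance.
gbr.fit(X, y)
ranking = sorted(
    zip(feature_names, gbr.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The importances sum to 1, so each value can be read as a feature's share of the total error reduction across the trees.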

Problem 1b: Predicting LOS of a patient with both lung cancer and mental illness using both Diagnosis codes and Socio-demographic features.

Machine learning models were developed with both mental illness diagnosis and socio-demographics such as age, gender, and income quartiles. Here is the performance of the various machine learning models:

From the above results, GBR performed well with low RMSE, MAE, MSE, and Max error.
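For reference, the four error measures named here can be computed with scikit-learn as in the sketch below. The true and predicted values are made-up numbers chosen for easy arithmetic, not results from the study.

```python
import numpy as np
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error

# Made-up lengths of stay (days) and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 8.0])
y_pred = np.array([2.5, 5.0, 4.0, 7.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = float(np.sqrt(mse))                 # square root of MSE, in days
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors, in days
worst = max_error(y_true, y_pred)          # largest single absolute error

print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f} Max error={worst:.4f}")
```

RMSE and MSE penalize large misses more heavily than MAE, while max error reports only the single worst prediction, so the four together give a fuller picture than any one alone.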

As per the feature importance plot, AGE is the most important factor for predicting length of stay, followed by ZIPINC_QRTL (median income quartile) and PAY1 (primary payer). This is plausible, since older patients tend to stay longer in hospital.
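Combining diagnosis flags with socio-demographic columns such as AGE, ZIPINC_QRTL, and PAY1 usually means encoding the categorical ones before fitting. The sketch below one-hot encodes the payer and income-quartile columns and passes the rest through to a GBR model. The column values, their code ranges, and the synthetic target are assumptions for illustration, and treating ZIPINC_QRTL as categorical rather than ordinal is a design choice, not something the source specifies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic table: two hypothetical diagnosis flags plus three
# socio-demographic columns with assumed code ranges.
rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "F0": rng.integers(0, 2, n),
    "F4": rng.integers(0, 2, n),
    "AGE": rng.integers(40, 90, n),
    "ZIPINC_QRTL": rng.integers(1, 5, n),  # assumed codes 1..4
    "PAY1": rng.integers(1, 7, n),         # assumed codes 1..6
})

# Placeholder LOS target that rises with age and with the F0 flag.
los = (2.0 + 0.08 * (df["AGE"] - 40) + 1.5 * df["F0"]
       + rng.normal(0.0, 0.5, n))

# One-hot encode the categorical codes; pass numeric columns through.
categorical = ["ZIPINC_QRTL", "PAY1"]
preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([
    ("preprocess", preprocess),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])
model.fit(df, los)

predictions = model.predict(df.head(5))
print(np.round(predictions, 2))
```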

Problem 3a: Predicting the total charges for a patient with both lung cancer and mental illness using Diagnosis codes.

From the above results, both the Gradient Boosting and Linear Regression models performed well, with low RMSE, MAE, MSE, and Max error.

As per the feature importance plot, F0 (Mental disorders due to physiological conditions) is the top feature for predicting total charges (TOTCHG), followed by F4 and F2. A plausible explanation is that patients with these disorders stay in hospital longer and therefore incur higher charges.

Problem 3b: Predicting the total charges for a patient with both lung cancer and mental illness using both Diagnosis codes and Socio-demographics such as age, sex, race, median household income for the patient’s ZIP code, expected primary payer, hosp_locteach (rural, urban nonteaching, urban teaching), and HOSP_REGION.

From the above results, we find that Gradient Boosting Regression performed well with low RMSE, MAE, MSE and Max error values.

As per the feature importance plot, AGE is the top feature. HOSP_REGION and ZIPINC_QRTL also play a role in predicting total charges.

Limitations

As with any machine learning/AI project, more data can yield better results. We used only two years of data for this exploratory study, which limits the results we were able to achieve.

Another limitation is that we did not achieve a high level of accuracy for parts 2 and 3, which concern the relationship between hospital length of stay and socio-economic parameters.

Conclusion

Here are the results for all three problems studied for patients who have undergone lobectomy (lung cancer surgery) and have mental illnesses. Analysis of hospital length of stay showed that F0 (Mental disorders due to physiological conditions) is the top feature affecting length of stay, followed by F4 (Anxiety, dissociative, stress-related, and other nonpsychotic mental disorders) and F1 (Mental and behavioral disorders due to psychoactive substance use).

When the socio-demographic factors are included, AGE is the most important factor affecting both the length and the cost of stay for lobectomy patients with serious mental illness (SMI).

We hope that you found our research informative. Please follow us on LinkedIn for similar articles on how IT is transforming the healthcare and medical research industries.