Making the BN-Cox model

2020-11-24

Making the BN-Cox model computationally efficient
One of the challenges in applying the BN-Cox model is the exponential growth of the conditional probability tables (CPTs) of the survival variables as the number of risk factors increases [22]. When the number of risk factors is high, these CPTs become intractable. We evaluated two approaches to mitigate this problem: (1) decomposing the underlying Bayesian network, a technique known as parent divorcing, and (2) simplifying the network structure by removing the least influential risk factors.
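The sketch below makes the growth concrete. A survival variable with n binary risk-factor parents needs one distribution per parent configuration, so its CPT has 2 × 2^n entries. The sketch compares that count with one hypothetical parent-divorcing layout, in which parents are merged two at a time through intermediate nodes. The pairwise grouping, the number of intermediate states, and the function names are illustrative assumptions and are not necessarily the decomposition used in [22]. Whether a divorced network reproduces the original CPT exactly depends on each intermediate node having enough states to preserve what its parents jointly contribute.

```python
from math import prod

def cpt_size(child_states: int, parent_states: list[int]) -> int:
    """Entries in a CPT: one distribution over the child per parent configuration."""
    return child_states * prod(parent_states)

def divorced_size(child_states: int, parent_states: list[int], intermediate_states: int) -> int:
    """Total CPT entries after a hypothetical pairwise parent-divorcing layout.

    Parents are merged two at a time into intermediate nodes with
    `intermediate_states` states each (an assumption), layer by layer, until a
    single node remains; that node becomes the only parent of the survival variable.
    """
    total = 0
    layer = list(parent_states)
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer) - 1, 2):
            # One intermediate node with two parents from the current layer.
            total += cpt_size(intermediate_states, layer[i:i + 2])
            next_layer.append(intermediate_states)
        if len(layer) % 2 == 1:
            # An unpaired node is carried up to the next layer unchanged.
            next_layer.append(layer[-1])
        layer = next_layer
    # The survival variable now has a single parent.
    return total + cpt_size(child_states, layer)

# Binary survival variable with 10 and 20 binary risk factors.
print(cpt_size(2, [2] * 10))  # 2048
print(cpt_size(2, [2] * 20))  # 2097152
print(divorced_size(2, [2] * 10, intermediate_states=4))  # 344
```

In this layout, one table with 2 × 2^n entries is replaced by many small tables. The savings grow quickly with n, but an intermediate node with too few states discards information about its parents.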
Discussion
The CPH model, Bayesian networks learned from data, and the K–M estimates have similar predictive ability when there are enough data to learn from. However, when there are few data records, the survival curves derived from the K–M estimates and from Bayesian networks learned from data depart from the CPH survival curves. In most practical problems, data include many combinations of risk factors, most of which are represented by very few records. In such cases, the CPH model helps to smooth out the distributions, similarly to other models of interactions among variables [30]. Hence, Bayesian networks interpreted from the CPH model may be more useful in practice than the K–M estimates or Bayesian networks learned from data. Their largest benefit over the CPH model, however, lies in cases where the CPH assumptions are violated. Bayesian networks have no problem modeling such situations, and they offer a more flexible modeling tool that allows expert knowledge to be combined with data.
We have presented one application of the BN-Cox model: a risk calculator for pulmonary arterial hypertension (PAH). We created the BN-Cox model solely from the CPH parameters available in the literature, without access to the REVEAL Registry data [5]. Our calculator reproduced the results of the current PAH Risk Calculator exactly. We plan to refine this calculator by (1) learning the parameters of the BN model from the data captured in the REVEAL Registry, and (2) enhancing the resulting BN model with medical expert knowledge. The extended model will relax the assumption that the risk factors interact multiplicatively in their effect on the survival variable. We have little doubt that, with some modeling effort, we will be able to obtain a calculator that produces more accurate risk estimates than the original CPH-based risk calculator.
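The following is a minimal sketch of how CPH parameters can fill a survival CPT. It assumes the standard Cox relation S(t | x) = S0(t)^exp(β·x), evaluated at a single fixed time t with an uncentered linear predictor. Each parent configuration gets its own column. The baseline survival, the β values, and the risk-factor names are hypothetical and are not the PAH calculator's parameters. The multiplicative interaction mentioned above shows up directly: effects combine through the exponential of a sum, so hazards multiply.

```python
import itertools
import math

def survival_cpt(baseline_survival, betas, levels):
    """Build P(Survival | risk factors) at one fixed time t from CPH parameters.

    baseline_survival: S0(t), survival probability when every risk factor is 0.
    betas: CPH coefficient for each risk factor (hypothetical values below).
    levels: the values each risk factor can take in the network.
    Returns {parent configuration: (P(alive at t), P(dead at t))}.
    """
    names = list(betas)
    cpt = {}
    for combo in itertools.product(*(levels[name] for name in names)):
        # Linear predictor of the Cox model for this parent configuration.
        linear_predictor = sum(betas[name] * x for name, x in zip(names, combo))
        # Cox relation: S(t | x) = S0(t) ** exp(linear predictor).
        p_alive = baseline_survival ** math.exp(linear_predictor)
        cpt[combo] = (p_alive, 1.0 - p_alive)
    return cpt

# Hypothetical parameters for two binary risk factors.
betas = {"risk_factor_a": 0.7, "risk_factor_b": 0.4}
levels = {"risk_factor_a": [0, 1], "risk_factor_b": [0, 1]}
cpt = survival_cpt(0.9, betas, levels)
for combo, (p_alive, p_dead) in cpt.items():
    print(combo, round(p_alive, 4), round(p_dead, 4))
```

Every column is computed from the same small set of coefficients. As a result, configurations that are rare in the data still receive probabilities consistent with the rest of the table, which is the smoothing effect described above.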
We can use any statistical variable selection method [29] to reduce the number of risk factors in a CPH model when a data set is available to refit the simplified model. When data are not available, however, we can simplify the model by removing the least influential risk factors, judged by both the values of their β coefficients and their statistical significance. When removing risk factors, we suggest marginalization, as it leads to the smallest error on average. There are good reasons not to worry too much about the precision of BN parameters in practice. Oniśko and Druzdzel [31] found that in medical diagnostic systems based on Bayesian networks, the precision of parameters may not be as important as popularly believed. Hence, approximating relationships may have minimal effect on the practical accuracy of systems based on Bayesian networks.
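The following is a minimal sketch of removing a risk factor by marginalization. It assumes the removed factor is independent of the remaining parents, so that P(S | a) = Σ_b P(S | a, b) P(b). For comparison, it also shows dropping the factor outright by fixing it at a reference level. The CPT values, the prior, and the function names are hypothetical. One way to see why marginalization tends to give smaller error is that, for each remaining parent configuration, the prior-weighted average of the removed columns is exactly the single value that minimizes the expected squared error over the removed factor.

```python
def marginalize_out(cpt, factor_index, prior):
    """Remove one parent from a CPT of P(alive | parents) by prior-weighted averaging.

    Assumes the removed factor is independent of the remaining parents, so
    P(alive | rest) = sum over b of P(alive | rest, b) * P(b).
    """
    reduced = {}
    for combo, p_alive in cpt.items():
        rest = combo[:factor_index] + combo[factor_index + 1:]
        reduced[rest] = reduced.get(rest, 0.0) + prior[combo[factor_index]] * p_alive
    return reduced

def drop_at_reference(cpt, factor_index, reference):
    """Remove one parent by keeping only the columns where it equals `reference`."""
    return {
        combo[:factor_index] + combo[factor_index + 1:]: p_alive
        for combo, p_alive in cpt.items()
        if combo[factor_index] == reference
    }

def expected_squared_error(cpt, factor_index, prior, reduced):
    """Sum over remaining configurations of the prior-weighted squared error."""
    total = 0.0
    for combo, p_alive in cpt.items():
        rest = combo[:factor_index] + combo[factor_index + 1:]
        total += prior[combo[factor_index]] * (p_alive - reduced[rest]) ** 2
    return total

# Hypothetical P(alive | A, B) for binary risk factors A and B, and a prior for B.
cpt = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.6}
prior_b = {0: 0.75, 1: 0.25}

marginal = marginalize_out(cpt, factor_index=1, prior=prior_b)
dropped = drop_at_reference(cpt, factor_index=1, reference=0)
print(marginal)  # roughly 0.875 for A=0 and 0.675 for A=1
print(expected_squared_error(cpt, 1, prior_b, marginal))
print(expected_squared_error(cpt, 1, prior_b, dropped))
```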
Acknowledgements
We acknowledge the support of the National Institutes of Health under grants U01HL101066-01 and 1R01HL134673-01, the Department of Defense under grant number W81XWH-17-1-0556, and the Faculty of Information and Communication Technology, Mahidol University, Thailand. Implementation of this work is based on GeNIe and SMILE, a Bayesian modeling environment and inference engine developed at the Decision Systems Laboratory, University of Pittsburgh. They are currently commercial products but remain available free of charge for academic research and teaching from BayesFusion, LLC, at https://www.bayesfusion.com/. Parts of this paper were presented at the 2014 and 2016 International Conferences on Probabilistic Graphical Models [13], [22]. The BN-Cox interpretation of the PAH risk calculator, discussed in detail in [20], was developed in collaboration with Dr. Raymond L. Benza of Allegheny General Hospital, Pittsburgh, PA.