RIBEIRO, LUIZ A. P. A.; GARCIA, A. C. B.; DOS SANTOS, PAULO SÉRGIO MEDEIROS. Dependency factors in evidence theory: a benchmark analysis in a multisensor information fusion scenario. Sensors, v. 22, p. 1-33, 2022. doi: 10.3390/s22062310
Dependency Factors in Evidence Theory: An Analysis in an Information Fusion Scenario Applied in Adverse Drug Reactions
Luiz Alberto Pereira Afonso Ribeiro (UNIRIO)
Ana Cristina Bicharra Garcia (UNIRIO)
Paulo Sérgio Medeiros dos Santos (UNIRIO)
Multisensor information fusion brings challenges such as data heterogeneity, varying source precision, and the fusion of uncertainties, all of which affect the quality of classifiers. A widely used approach for classification problems in a multisensor context is the Dempster–Shafer Theory. This approach combines the beliefs that each source assigns to the classification hypotheses to produce a classifier with higher precision. Nevertheless, the fundamental premises for using the approach are that the sources are independent and that the classification hypotheses are mutually exclusive. Some approaches ignore these premises, which can lead to unreliable results. Other approaches, based on statistics and machine learning techniques, remove the dependencies or include a discount factor to mitigate the risk of dependencies. We propose a novel approach based on a Bayesian network, Pearson’s test, and linear regression that adjusts the beliefs for more accurate data fusion, mitigating possible correlations or dependencies between sources. We tested our approach in the domain of adverse drug reaction discovery. The experiment used nine databases containing data from 50,000 active patients of a Brazilian cancer hospital, including clinical exams, laboratory tests, physicians’ anamneses, medical prescriptions, clinical notes, medicine package leaflets, the International Classification of Diseases, and sickness diagnosis models. The study was approved by the hospital’s ethics committee. Compared with existing approaches, ours achieved a statistically significant improvement in precision and recall. The results show that the credibility index proposed by the model significantly increases the quality of the evidence generated with the Random Forest algorithm. A benchmark was performed across three datasets, incrementally enriched with credibility-index attributes, achieving a precision of 92%.
Finally, we performed a benchmark on a public heart disease dataset, achieving good results.
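The abstract relies on Dempster's rule of combination and mentions discount factors as one existing way to mitigate dependence between sources. The minimal Python sketch below shows those two standard building blocks, assuming mass functions are stored as dictionaries keyed by frozensets of hypotheses. `discount` is Shafer's classical discounting, not the credibility index proposed in the paper. The `pearson` helper computes only the correlation coefficient (no significance test), and the rule `reliability = 1 - |r|`, the source names, and all belief values and scores are hypothetical, chosen only for illustration.

```python
import math
from itertools import product
from typing import Dict, FrozenSet, Sequence

# A mass function maps focal elements (sets of hypotheses) to belief mass.
Mass = Dict[FrozenSet[str], float]


def discount(m: Mass, reliability: float, frame: FrozenSet[str]) -> Mass:
    """Shafer's classical discounting: scale every focal element by the
    source reliability and transfer the remaining mass to the whole frame."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    out: Mass = {}
    for focal, mass in m.items():
        out[focal] = out.get(focal, 0.0) + reliability * mass
    out[frame] = out.get(frame, 0.0) + (1.0 - reliability)
    return out


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal elements and
    renormalise by 1 - K, where K is the mass falling on empty intersections."""
    combined: Mass = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if math.isclose(conflict, 1.0):
        raise ValueError("total conflict: the sources cannot be combined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical example: two sources judging whether a drug caused an ADR.
frame = frozenset({"ADR", "noADR"})
m_lab = {frozenset({"ADR"}): 0.6, frozenset({"noADR"}): 0.1, frame: 0.3}
m_notes = {frozenset({"ADR"}): 0.5, frozenset({"noADR"}): 0.2, frame: 0.3}

# Hypothetical past scores of both sources on the same patients.
lab_scores = [0.9, 0.4, 0.7, 0.2, 0.8]
notes_scores = [0.8, 0.5, 0.6, 0.3, 0.9]
r = pearson(lab_scores, notes_scores)

# Hypothetical heuristic: the more correlated the sources, the less the
# second one is trusted, so its belief is discounted toward ignorance.
reliability = max(0.0, 1.0 - abs(r))
m_notes_disc = discount(m_notes, reliability=reliability, frame=frame)
fused = dempster_combine(m_lab, m_notes_disc)

for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 3))
```

With these made-up scores the two sources are strongly correlated (r ≈ 0.93), so the second source is heavily discounted and the fused beliefs stay close to those of the first source. Discounting is only one of the mitigation strategies the abstract mentions; the paper's own adjustment uses a Bayesian network, Pearson's test, and linear regression.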
Keywords: information fusion; Dempster–Shafer Theory; Bayesian network; electronic health records; machine learning; adverse drug reactions