Title
Interactive Graph-based Prediction and Diagnosis of Communicable Diseases /
Author
Ali, Noha Gamal El-Din Saad.
Preparation committee
Researcher / Noha Gamal El-Din Saad Ali
Supervisor / حســام الدين مصطفى فهيم
Examiner / هدى مختار عمر مختار
Examiner / ايمان محمد شعبان
Publication date
2023.
Number of pages
195 P. :
Language
English
Degree
Doctorate
Specialization
Computer Science (miscellaneous)
Approval date
1/1/2023
Place of approval
Ain Shams University - Faculty of Medicine - Department of Computer Systems
Index
Only 14 of 195 pages are available for public view

Abstract

This thesis addresses the research problem of designing a reliable epidemiological model and identifies two research gaps. The first is the reliance on classical methods that collect the information needed for datasets from time-consuming and costly traditional sources, such as population health surveys or group studies. The second is the limitation caused by the use of simple homogeneous-population, time-independent, deterministic epidemiology models. These models were not designed to capture the heterogeneity inherent in any population structure, the complexities of human behavior (i.e., outbreak-related awareness patterns and social contacts), or the effect of deployed mitigation policies over time. These flaws may underlie inadequate epidemic outbreak modeling and inaccurate outbreak forecasts, leading to unrealistic or ineffective mitigation procedures. Accordingly, this study addresses these shortcomings in the field of infectious disease outbreak modeling and prediction systems, seeking to support the assessment of mitigation policies and decisions that can slow the progression of an infectious disease outbreak without imposing negative economic and societal consequences.
Very few related investigations have presented a methodology to measure the potential usefulness of social network data and structures for developing a reliable infectious disease outbreak diffusion network. This study therefore employed and assessed social network data as a reliable data source for the proposed early information surveillance and warning system for pandemic outbreaks. In addition, an accurate mathematical epidemiology model, such as the one designed and developed in the current study, has previously been lacking. To implement the proposed framework for modeling an infectious disease diffusion network, this study applied machine learning and deep learning classification algorithms, along with enhanced mathematical epidemiology modeling and social graph algorithms, to develop a reliable diffusion network model that accurately predicts infection progression and yields insightful information about an ongoing infectious disease outbreak.
The current study proposes an enhanced framework that integrates various interlinked subsystems and processes and is designed to accomplish the following: 1) collecting, integrating, preprocessing, and preparing necessary datasets from heterogeneous publicly available data sources, 2) analyzing sentiment taxonomies against the emergence of an infectious disease outbreak, 3) inferring the most hazardous geolocation, 4) evaluating the temporal associations between public awareness and outbreak status, 5) constructing a synthetic social network structure for simulating contact networks based on the geographic, demographic, and awareness levels of the population of a specific geolocation, as indicated in subsystems 1, 2, and 3 mentioned above, 6) building, modifying, and enhancing a mathematical epidemiology model to simulate the spatiotemporal emergence of an infectious disease, and 7) simulating the infectious disease model progression over the constructed synthetic social network. The study combined these subsystems to build an infectious disease diffusion network that can accurately simulate various disease progression scenarios and test cases.
Specifically, with COVID-19 employed as a case study, subsystem 1 collected and prepared various data sources in order to compile the unique datasets required for the current framework’s testing and evaluation processes. Subsystem 2 employed social network data collected during the early emergent phase of an outbreak to predict and identify the public sentiment in the most hazardous geolocations. Subsystem 3 inferred the geolocation of non-geo-tagged Tweets in order to enhance the recognition of the most hazardous geolocations. Subsystem 4 measured and quantified public awareness levels in response to the infectious disease progression and the control measures employed by national and international healthcare organizations and governments. At this stage, the framework incorporated social network broadcast data to identify specific countries for further analysis and to extract other features, such as the awareness levels of the population in the selected countries, while sustaining the focus on the principal objective of the current study: developing an infectious disease diffusion modeling network. Subsystem 5 of the proposed framework used the previously collected data and extracted features to construct a social network structure that could realistically describe the social contact patterns among the population of any country affected by the infectious disease outbreak. The resulting network (graph) was designed and constructed to consider multiple information layers, such as the population agents (nodes) and their connections (edges), agents’ demographic distribution (age brackets, gender brackets), and agents’ awareness parameterization layers, among other factors that can be considered. Numerous social network structures were implemented to provide diverse test environments for diffusing the infectious disease model proposed and developed in the next subsystems.
Subsystem 6 derived the SEIR(S) infectious disease model in its conventional form, as a deterministic, homogeneously mixed population, compartmental model, and then constructed it in its fitted form. The fitted-SEIRS model facilitated the estimation of remarkably accurate time-dependent parameters by developing an optimization algorithm to fit the compartmental model output to the actual outbreak statistics of the country under investigation. The fitted-SEIRS model was then modified to include symptomatic infection, asymptomatic infection, and fatal compartments to form the proposed SE(2I)RSF epidemiology model. In addition to the compartmental SE(2I)RSF, the modified model was implemented in a stochastic agent-based form capable of including real-world stochasticity and heterogeneity for more accurate and higher-precision forecasts that would be comparable to actual statistics. The final step entailed integrating the constructed social network and the infectious disease model in Subsystem 7 to yield the infectious disease diffusion network, which could then be examined and evaluated in various scenarios.
The resulting diffusion network model was tested to show the effects of diffusing the agent-based SE(2I)RSF model to simulate the spread of an infectious disease on different network topologies, with the aim of separating out the model’s performance according to the dynamicity of the network.
6.2 CONCLUSIONS
Since several aspects of infectious disease progression or diffusion modeling are included in the proposed framework, the organization of this conclusion section is designed to highlight the study findings for each of the following areas:
• The potential employment of social network data to predict the geolocation-based social impact of an emerging infectious disease outbreak,
• The capacity for applying conventional and enhanced mathematical modeling to an infectious disease outbreak to predict the outbreak’s future impact in a specific geolocation, and
• The value added by using social networking (graphs) algorithms to represent geolocation-based synthetic contact networks to facilitate studying infectious disease progression on the agent level.
Social Media Data and Classification Methods
1- Social media postings, along with health organizations’ official websites or news blogs, can be used in a retrospective analysis of any trending topic, especially for life-threatening situations such as infectious disease emergence.
2- This framework incorporated Tweet sentiment prediction as an additional feature to the Tweet text, user screen name, and profile description in order to predict the Tweet’s geolocation (place). Although this new feature has not been used in previous research, the current study’s literature review and application of knowledge demonstrated the ability to improve Tweet classification-based location prediction with both classical machine learning and neural network algorithms. Additionally, the new feature provided a deeper level of insight into the collected data, allowing the analysis process to pinpoint the most vulnerable and infection-prone areas.
3- It was concluded that it is vital to recognize the word sense or meaning as used in a text originating from a specific location for the same language, as a different geographical region might have given a different meaning to a word (regional context). Therefore, we concluded that, in addition to employing and testing conventional machine learning-based classification models, various neural network-based algorithms should be tailored and evaluated as in the current study.
4- The experiments conducted revealed that the employment of deep learning methods could overcome many NLP problems, including textual classification, by improving the text representation component and overall model design. Furthermore, it was evident that despite the time complexity of the modern AI algorithms, they were shown to be useful for accurately analyzing, assessing, and managing unplanned situations, especially when empowered with appropriate amounts of data gathered from social networks. Also, it was observable that the employment of pre-trained word embedding was extremely powerful, reducing the time and space complexity required for the classification problem under investigation.
5- It was also concluded that the geolocation of an emerging event can be predicted from the spatial indications of words in relation to their context and position (index) in the text. This was achieved with the bidirectional RNN empowered by bidirectional pre-trained word embedding, which outperformed the other implemented neural network models, including MLP and CNN. When compared to state-of-the-art approaches presented in the literature, the proposed framework, with its effective selection of different algorithms, hyper-parameters, and configurations, resulted in noticeably better geolocation prediction accuracy.
6- Given the multilingual nature of social media data, the proposed methods in this study analyzed the spread of an epidemic outbreak in terms of linguistic features and numerous text-dense representation algorithms rather than relying on lexicons. This technique could greatly improve the scalability of the proposed framework, allowing it to be applied to a wide range of other incidents or situations. Such incidents may require the analysis of non-English textual data by using appropriate word embedding algorithms capable of dealing with the linguistic features of a new language, such as using Ara-BERT to process Arabic corpora within the proposed implementation.
7- The collected social network’s outbreak-related early information supported the development of a location-based surveillance system for pandemic outbreaks as a part of the proposed framework, aimed to assist the development of a retrospective analysis of the global population outbreak-related awareness.
Conventional Epidemiology Mathematical Model
8- It was also concluded that using the deterministic conventional SEIR model to demonstrate the infectious disease outbreak progression for a certain geolocation might not accurately represent the actual outbreak status in this geolocation. It was noticed that the infection transmission probability, β, was responsible for the counts of the exposed compartment but had a lesser effect on the peak emergence time. Additionally, the latent rate (inverse of the latent period), σ, was observed to be responsible for the timing of the infection peak, in turn exerting a major effect on infection transmission: a longer latent period (the time required for an exposed individual to become infectious) prevents the exposed individual from transmitting the disease to a large number of susceptible individuals. The performance of the model was also observed with different recovery rates (inverse of the recovery period, that is, the period during which an individual remains infectious), γ: reduced recovery periods directly reduced the exposed and infectious population counts but had little effect on the peak emergence timing.
9- Furthermore, both the mathematical and numerical analyses presented for the conventional SEIR model supported the conclusion that such models might have two equilibria—disease-free equilibrium, when the basic reproduction number is less than unity, and endemic disease equilibrium, when the basic reproduction number is equal to unity. That said, if the basic reproduction number is greater than unity, the infectious disease model is unstable and can be labeled an epidemic.
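As an illustration of conclusions 8 and 9, the following is a minimal sketch of a conventional deterministic SEIR model, not the thesis implementation. It assumes a closed population with no births or deaths (in which case the basic reproduction number reduces to β/γ), a forward-Euler integration, and purely illustrative parameter values. The function names simulate_seir and infection_peak are hypothetical.

```python
def simulate_seir(beta, sigma, gamma, population=1_000_000,
                  initial_infectious=10, days=365, steps_per_day=10):
    """Integrate the deterministic SEIR equations with a forward-Euler scheme.

    beta  : transmission rate, sigma : latent rate, gamma : recovery rate.
    Assumes a closed population (no births or deaths), so R0 = beta / gamma.
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


def infection_peak(history):
    """Return (day, infectious count) at the maximum of the I compartment."""
    day = max(range(len(history)), key=lambda d: history[d][2])
    return day, history[day][2]


if __name__ == "__main__":
    # Compare a long latent period (sigma = 0.2) with a short one (sigma = 0.5).
    for sigma in (0.2, 0.5):
        day, height = infection_peak(simulate_seir(beta=0.5, sigma=sigma, gamma=0.1))
        print(f"sigma={sigma}: peak of {height:,.0f} infectious on day {day}")
```

Varying one of β, σ, or γ at a time in this sketch is a quick way to inspect how each parameter moves the height and timing of the infection peak under these assumptions.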
Fitted Epidemiology Mathematical Model
10- Another conclusion that emerged from the study was the reliability of combining and integrating two types of models, phenomenological models and compartmental models, to develop an accurate and trustworthy epidemiology mathematical model. The phenomenological models provided much higher accuracy than compartmental models, especially when trying to calibrate different outbreak waves. However, these models were mainly employed to represent a single disease compartment (univariate time-series), such as the cumulative infections, daily active cases, or daily fatalities. In other words, the phenomenological models could not be modified to include, for instance, the governmental countermeasures considered to mitigate the spread of the infectious disease in the same way as compartmental models.
11- Nonetheless, utilizing compartmental models alone might not result in the best outbreak progression predictions, because this type of model cannot capture an abrupt increase or decrease in the outbreak counts and biological parameters.
12- Another conclusion that emerged from this study was that using the fitted-SEIRS model to integrate phenomenological LGM curve fitting with the compartmental SEIRS model, along with the L-BFGS-B nonlinear optimization algorithm for the fitting and parameter estimation process, resulted in very reliable parameter estimation, as indicated by the reduced RMSE.
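The curve-fitting idea in conclusions 10 to 12 can be sketched as follows, under several assumptions. LGM is taken here to mean a three-parameter logistic growth model, the data are synthetic and noise-free, and all names (logistic_curve, rmse, fit_bounded) are hypothetical. To keep the example dependency-free, a simple bounded compass (pattern) search stands in for L-BFGS-B. It is not the optimizer used in the study and is only meant to show the RMSE objective and the role of parameter bounds.

```python
import math


def logistic_curve(t, capacity, rate, midpoint):
    """Logistic growth model: cumulative cases at time t."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def rmse(params, days, observed):
    """Root-mean-square error between the logistic curve and the observations."""
    squared = [(logistic_curve(t, *params) - y) ** 2 for t, y in zip(days, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fit_bounded(days, observed, start, bounds, steps, iterations=200):
    """Bounded compass search that minimises the RMSE (stand-in for L-BFGS-B).

    Tries +/- one step per parameter, keeps any move that lowers the error,
    and halves all step sizes whenever no move helps.
    """
    best = list(start)
    best_error = rmse(best, days, observed)
    steps = list(steps)
    for _ in range(iterations):
        improved = False
        for k in range(len(best)):
            low, high = bounds[k]
            for direction in (1.0, -1.0):
                trial = list(best)
                # Clamp the trial value so it never leaves the allowed interval.
                trial[k] = min(high, max(low, trial[k] + direction * steps[k]))
                error = rmse(trial, days, observed)
                if error < best_error:
                    best, best_error, improved = trial, error, True
        if not improved:
            steps = [step / 2.0 for step in steps]
    return best, best_error


if __name__ == "__main__":
    # Synthetic "cumulative cases" generated from known parameters.
    days = list(range(60))
    observed = [logistic_curve(t, 1000.0, 0.2, 30.0) for t in days]
    start = (800.0, 0.1, 25.0)
    bounds = [(100.0, 5000.0), (0.01, 1.0), (0.0, 60.0)]
    params, error = fit_bounded(days, observed, start, bounds, steps=(100.0, 0.05, 5.0))
    print("start RMSE:", rmse(start, days, observed))
    print("fitted parameters:", params, "fitted RMSE:", error)
```

In a fitted compartmental setting, the same RMSE objective would compare the compartmental model output with reported outbreak counts rather than with a synthetic curve.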
Enhanced Epidemiology Mathematical Model
13- The numerical simulations of the extended SE(2I)RSF model supported the conclusion that any variation in the infectious disease epidemiology model’s compartments might lead to a completely different progression model. Notably, the effect of applying a mitigation policy is highly time-dependent. Specifically, applying any particular control policy at the wrong time could lead to varying adverse consequences, ranging from severe social and economic disruptions to a completely inefficient mitigation plan in the face of the emerging outbreak. It is then inferred that applying control policies early enough can make a significant difference in overall disease progression. Another reasonable conclusion is that a better scenario is likely to result from immediate but slow intervention policy enforcement that gradually decreases the contact rate and, in turn, reduces infection probability while giving the exposed population a better opportunity to recover, as opposed to delayed but more abruptly applied intervention policy enforcement.
14- When applied early, control policies tend to shift the infection peak into the future, while discontinuous interventions or endogenous behavioral responses can generate a flattened, multi-peaked infection curve. Nevertheless, the latter may impose costlier economic consequences by prolonging the duration of an epidemic.
15- The study findings indicate that in an uncontrolled epidemic, the intrinsic uncertainty mostly originates from uncertainties related to the biology of a novel infectious disease (e.g., its unknown reproduction number). In a controlled epidemic, characterized by a low ratio of infected population members, the randomness of the social network becomes the major source of forecast uncertainty, particularly in the context of short-range forecasts. In this light, infectious disease transmission models with accurate social network models are essential for improving epidemic forecasting.
16- The understanding of the interaction between disease dynamics and human behavior is crucial in controlling infection. The experiments conducted over the course of this investigation support the conclusion that the physical and biological characteristics of an infectious disease, such as its infection rate, latent period, basic reproduction number, recovery rate, and fatality rate, along with the contact patterns forming the social network of the population under investigation (characterized by node degree, assortativity, transitivity, and other factors), are the two main aspects that influence the spread of an infectious disease.
17- It was concluded that the real world presents various random factors that have a significant impact on the rate of infection and the rate of diagnosis of patients with the disease, which made it necessary to include a certain degree of randomness in the model. Therefore, using a stochastic agent-based SE(2I)RSF/(W)/(WP) infectious disease diffusion model with human awareness factorization for complex networks is crucial. The proposed methodology showed the importance of employing heterogeneity in epidemiological modeling, such as demographic, geographic, and behavioral heterogeneity. These heterogeneities were employed at the agent level as well as the social network level, which was crucial to implementing realistic epidemic modeling.
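To make the idea of a stochastic agent-based model with an awareness factor concrete, here is a minimal sketch under stated assumptions. It uses a plain four-state SEIR (not the SE(2I)RSF model of the study), a ring-shaped contact graph, and an awareness value in [0, 1] per agent that scales down the per-contact transmission probability. The probabilities and the names ring_contacts, step, and run are illustrative and hypothetical.

```python
import random


def ring_contacts(n, k):
    """Contact graph in which every agent meets its k nearest ring neighbours."""
    half = k // 2
    return {a: [(a + d) % n for d in range(-half, half + 1) if d != 0]
            for a in range(n)}


def step(states, contacts, awareness, p_transmit, p_onset, p_recover, rng):
    """Advance every agent by one day and return the new list of states."""
    new_states = list(states)
    for agent, state in enumerate(states):
        if state == "S":
            infectious = sum(states[c] == "I" for c in contacts[agent])
            # Awareness in [0, 1] scales down the per-contact transmission chance.
            p_contact = p_transmit * (1.0 - awareness[agent])
            # Probability of at least one successful transmission today.
            if rng.random() < 1.0 - (1.0 - p_contact) ** infectious:
                new_states[agent] = "E"
        elif state == "E" and rng.random() < p_onset:
            new_states[agent] = "I"
        elif state == "I" and rng.random() < p_recover:
            new_states[agent] = "R"
    return new_states


def run(n=500, k=6, awareness_level=0.0, days=120, seeds=5, rng_seed=1):
    """Simulate one outbreak and return the number of agents ever infected."""
    rng = random.Random(rng_seed)
    contacts = ring_contacts(n, k)
    awareness = [awareness_level] * n   # could differ per agent (heterogeneity)
    states = ["S"] * n
    for agent in rng.sample(range(n), seeds):
        states[agent] = "I"
    for _ in range(days):
        states = step(states, contacts, awareness, 0.3, 0.2, 0.1, rng)
    return sum(state != "S" for state in states)


if __name__ == "__main__":
    for level in (0.0, 0.5, 0.9):
        print(f"awareness={level}: {run(awareness_level=level)} agents ever infected")
```

Because awareness is stored per agent, demographic or behavioral heterogeneity could be introduced by assigning different values to different agents instead of one uniform level.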
Social Network-Based Diffusion Model
18- Collecting accurate real-world contact networks is likely to be crucial in the process of infectious disease outbreak mitigation, and generating a synthetic social network to represent human-to-human interaction seems imperative, especially when it is designed and constructed to closely match the characteristics of the actual real-world contact network (e.g., node degree, node degree distribution, and demographic and social characteristics of the population under investigation). Additionally, taking into account the population density found in cities and including measures that reflect the population’s demographic distribution can be essential to the design of mitigation policies that address the most active areas and agents involved in the process of spreading an infectious disease.
19- The main advantage of the network approach, compared with conventional epidemiology models, is that the agent-based model can capture the heterogeneity and locality of social contacts as possible vectors of transmission. This focus allows a micro-level, agent-based modeling of health and economic policy outcomes and individual behavioral responses. However, the network approach still requires additional future empirical work in terms of the specification and identification of the social contacts graph, initial conditions, and node path followed by the epidemic, in addition to the standard epidemiological parameters governing disease incubation, infectiousness, recovery, and mortality.
20- Networks have a fundamental role to play in shaping a scientific understanding of epidemiological processes. Restricting individual interactions to others within a network, rather than an entire population, slows and reduces the spread of infection; therefore, when attempting to predict population-level dynamics from individual-level observations, taking network structure into account is a vital consideration.
21- The current work, using network approximations for social contact networks, has highlighted many of the differences between standard random-mixing disease models and disease spread through networks.
22- Standardized network structures, such as small-world and scale-free models, cannot be considered reliable models of real-world human-to-human contact networks because their characteristics differ from those of real-world contact networks. In particular, real-world networks are mainly characterized by high assortativity and transitivity, while small-world networks are characterized by low assortativity and high transitivity; meanwhile, scale-free networks are characterized by high assortativity and low transitivity. Thus, according to the conclusions reached in the current study, a reworked network that integrates both small-world and scale-free models is a more reliable representation of a real-world network.
23- The current study indicates the vital need to interpret network indicators to analyze the key infection indicators currently applied. For example, the network structure can affect the rate of increase in the number of infected persons or the transmission rate. Therefore, considering various network and infection indicators at the same time will provide support for evidence-based decision-making during policy formulation processes. Accordingly, the current study’s findings suggest that by analyzing various network indicators and their distributions, authorities can make their policies more feasible. For instance, if the distribution of node degrees is positively skewed, as in the case of a scale-free network, then removing the nodes of highest degree (super-spreaders) can efficiently reduce the number of patients in the network. Consequently, screening and managing nodes with high infection potential may be more efficient than interventions targeting the entire population. This possibility is supported by the significant effect of the virtual deletion of the super-spreader nodes. Conversely, if the distribution of network indicators is close to a normal distribution, as in the case of a small-world network, comprehensive policies targeting the entire population may be useful.
24- The current study’s findings reveal that viral transmission over a network-connected population can proceed more slowly and reach a lower peak value than transmission via uniform mixing. Network connections introduce uncertainty and path dependence to the epidemic dynamics, highlighting a significant role for bridge links and super spreaders.
25- Furthermore, the current investigation found that implementing corresponding anti-epidemic measures at different pandemic stages could achieve significant results at a low cost. In the beginning, a global lockdown policy was probably necessary, but isolating infected wards and hub nodes could be more beneficial as the situation eases. Considering the cost issue, quarantining communities where infectious cases are detected can also play a role in suppressing transmission; in addition, the earlier measures are taken, the better the effect. Furthermore, it is necessary to isolate hub nodes throughout the entire process, which means inhibiting aggregations. Nevertheless, this approach can be very disruptive to the economic and social states and requires caution.
26- As an alternative, a focus on promoting people’s awareness of self-protection against epidemics while taking measures was found to offer a critical contribution to disease mitigation. In reality, the smooth progression of work and production resumption is inseparable from people’s awareness of self-protection and preventive behavior. The results indicate that human subjective consciousness plays a decisive role in suppressing virus spread.
27- Numerical simulations demonstrated that the awareness introduced in the model does not affect the extinction of the disease; nevertheless, the scale of the disease will eventually decrease as human awareness increases, which may contribute to the eventual control of the disease.
28- With its wide applicability to any infectious disease outbreak, extending to a worldwide scope, this study was performed using COVID-19 data for various countries with detailed results for the United States of America (USA), the United Kingdom, India, Australia, Canada, and Egypt. Various insightful outcomes emerged from the available data using the proposed methodology. Thus, the proposed framework advances the existing literature and yields promising results for continuous predictive monitoring of any infectious disease pandemic.
According to this study’s analysis and results, the proposed framework is capable of fulfilling the urgent need for a reliable predictive model to support the efforts of the health advisory bodies and decision-makers to take calculated proactive measures to contain the pandemic and maintain a healthy society and economy.
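The network indicators named in conclusions 22 and 23 (degree assortativity, transitivity, and the removal of the highest-degree nodes) can be computed as in the following sketch. It is an illustration under assumptions rather than the study's code: the graph is a small synthetic one grown by preferential attachment (a common way to obtain a skewed degree distribution), and all function names are hypothetical.

```python
import random
from itertools import combinations


def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    adjacency = {node: set() for node in range(n)}
    endpoints = []  # every node appears once per incident edge

    def link(a, b):
        adjacency[a].add(b)
        adjacency[b].add(a)
        endpoints.extend((a, b))

    for a, b in combinations(range(m + 1), 2):  # small fully connected core
        link(a, b)
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            link(new, target)
    return adjacency


def transitivity(adjacency):
    """Global clustering: closed triples divided by all connected triples."""
    closed = triples = 0
    for neighbours in adjacency.values():
        degree = len(neighbours)
        triples += degree * (degree - 1) // 2
        closed += sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return closed / triples if triples else 0.0


def degree_assortativity(adjacency):
    """Pearson correlation between the degrees at the two ends of each edge."""
    pairs = [(len(adjacency[a]), len(adjacency[b]))
             for a in adjacency for b in adjacency[a]]
    if not pairs:
        return float("nan")
    mean = sum(x for x, _ in pairs) / len(pairs)
    covariance = sum((x - mean) * (y - mean) for x, y in pairs)
    variance = sum((x - mean) ** 2 for x, _ in pairs)
    return covariance / variance if variance else float("nan")


def remove_hubs(adjacency, count):
    """Return a copy of the graph without its `count` highest-degree nodes."""
    ranked = sorted(adjacency, key=lambda node: len(adjacency[node]), reverse=True)
    hubs = set(ranked[:count])
    return {node: nbrs - hubs for node, nbrs in adjacency.items() if node not in hubs}


def mean_degree(adjacency):
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


if __name__ == "__main__":
    graph = preferential_attachment_graph(300, 2, random.Random(7))
    pruned = remove_hubs(graph, 10)
    print("assortativity:", degree_assortativity(graph))
    print("transitivity:", transitivity(graph))
    print("mean degree before/after hub removal:", mean_degree(graph), mean_degree(pruned))
```

Running an epidemic simulation on the graph before and after remove_hubs would be one way to examine the effect of the virtual deletion of super-spreaders described in conclusion 23.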
6.3 FUTURE WORK
Most of the networks discussed in this study have been static, i.e., the connections have remained constant over time. This assumption contrasts with the intuitive perception of human interactions breaking and forming. However, when these networks are used for epidemic modeling, this problem does not necessarily arise, provided that the turnover of connections is slow relative to the timescale of the pathogen. Nevertheless, if the pathogen is rapidly evolving amongst agents, then considering temporal networks is crucial. Further investigations will therefore be conducted by constructing a temporal contact graph that quantifies the daily contacts between infectious and susceptible individuals by exploiting a large volume of location-related data. Such a temporal contact graph will be employed in various experimental scenarios, such as analyzing dynamic contact behavior, identifying potentially infected contacts, and assisting decision-making on control measures.