An ensemble consists of a set of individual predictors whose predictions are combined. Generally, different classification and regression models tend to work well for different types of data, and it is usually not known which algorithm will be...
Because of limitations in randomized controlled trials, medical researchers are often forced to rely upon studies of observational data. Confounding is a major difficulty encountered in such studies that can create considerable bias in estimates of...
Electronic surveillance; Information technology--Social aspects; Computer networks--Design and construction
Because of the anonymity that P2P networks provide, they are an ideal medium for
the exchange of contraband material such as child pornography. Unfortunately, not
much research has been conducted on how to best monitor these types of networks
for...
Brief Overview of the Problem: The Environmental Protection Agency (EPA), a government-funded agency, provides both legislative and judicial powers for emissions monitoring in the United States. The agency crafts laws based on self-made regulations...
Clustered longitudinal data are often collected as repeated measurements over time on subjects within clusters. Examples include longitudinal community intervention studies, or family studies with repeated measures on each member. Meanwhile,...
Data libraries--Security measures; Computer networks--Security measures; Information storage and retrieval systems; Digital preservation
Data centers (DC) are the core of the national cyber infrastructure. With the incredible growth of critical data volumes in financial institutions, government organizations, and global companies, data centers are becoming larger and more...
Data classification is a task found in many everyday activities. In general, the term can be used for any activity that derives some decision or forecast based on the currently available information. Using a more accurate definition, a...
DNA microarrays--Statistical methods; Gene expression--Statistical methods
Data derived from gene expression microarrays are frequently used to identify candidate genes which can characterize and distinguish between two biological phenotypes. A key step in this process is the selection of an appropriate test statistic to...
Human beings can acquire information more easily by being shown an object than by reading a description of it. The brain stores the images that the eyes see, and through brain mapping, people can analyze information by imagination. This is the...
In recent years, a number of computational and statistical problems for identifying SNP-SNP interactions in high-dimensional survival data have been studied, and several data mining approaches have been proposed. However, the relative performance...
In this dissertation research, we aim to solve problems involving two types of survival data: clustered survival data with potentially informative cluster size, and sojourn time data. The methods for these two types of data are different. However, both...
INTRODUCTION: Attention Deficit/Hyperactivity Disorder (ADHD) is a disorder that is prevalent throughout the world. It is believed that 5% of school-aged children suffer from ADHD, with some estimates indicating that as many as 10% may suffer from the...
Introduction: Children diagnosed with an Autism Spectrum Disorder (ASD) often lack the ability to recognize and properly respond to emotional stimuli. These emotional deficits are also observed in children with Attention-Deficit Hyperactivity...
Longitudinal studies play an important role in scientific research and clinical trials. When analyzing longitudinal data, investigators are often confronted with missing data, which can produce biases, even in...
Data mining; Lungs--Cancer; Outcome assessment (Medical care)
Lung cancer is the leading cause of cancer death in the United States and the world, with more than 1.3 million deaths worldwide per year. However, because of a lack of effective tools to diagnose lung cancer, more than half of all cases are...
Many important applications require the discovery of items which have occurred frequently. Knowledge of these items is commonly used in anomaly detection and network monitoring tasks. Effective solutions for this problem focus mainly on reducing...
Missing data are very common in survey research. However, few guidelines currently exist for diagnosing and remedying missing data in survey research. The goal of this thesis was to investigate properties and effects of three selected...
Bioinformatics; Breast--Cancer--Treatment; Medical care--Data processing
Statistical models have been the first choice for comparative effectiveness in clinical research. Though effective, these models are limited when the data to be analyzed do not fit the assumed distributions, which is mostly the case when the study...
Swarm Intelligence (SI) techniques were inspired by bee swarms, ant colonies, and most
recently, bird flocks. Flock-based Swarm Intelligence (FSI) has several unique features, namely
decentralized control, collaborative learning, high exploration...
The complexity of high-dimensional data creates a number of concerns when trying to analyze it. This data often consists of a response or survival time and potentially thousands of predictors. These predictors can be highly correlated, and the...