Regression analysis is a statistical method used to relate a variable of interest, typically y (the dependent variable), to a set of independent variables, usually X1, X2, ..., Xn. The goal is to build a model that assists statisticians in...
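To make the setup concrete, here is a minimal sketch of ordinary least squares, one common way to fit such a model, assuming a linear relationship y ≈ b0 + b1·X1 + ... + bn·Xn. The function names (fit_ols, predict, solve_linear_system) are illustrative and not taken from the original work. The sketch uses only the Python standard library and solves the normal equations directly, which is adequate for small examples but less numerically robust than QR-based solvers.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations act on a and b together.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (predictors may be collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    # Prepend a column of ones so b0 acts as the intercept.
    design = [[1.0, *row] for row in xs]
    p = len(design[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate b0 + b1*x1 + ... + bn*xn for one observation."""
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))


if __name__ == "__main__":
    # Hypothetical data generated exactly from y = 1 + 2*X1 - 3*X2.
    xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1 + 2 * a - 3 * b for a, b in xs]
    print(fit_ols(xs, y))  # should be very close to [1.0, 2.0, -3.0]
```

Because the toy data contain no noise, the fitted coefficients recover the generating values up to floating-point rounding; with real data the residuals would be nonzero.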
Recent advances in high-throughput methodologies offer researchers the ability to understand complex systems via high-dimensional and multi-relational data. One example is the realm of molecular biology, where disparate data (such as gene sequence,...
Mishaps in prenatal development can influence mammary gland development and, ultimately, affect susceptibility to factors that cause breast cancer. This research was based on the underlying hypothesis that maternal dietary composition during...
Phase II clinical trials are generally single-arm trials in which a homogeneity assumption is placed on the response. In practice, this assumption may be violated, resulting in a heterogeneous response. This heterogeneous or overdispersed response can be...
Images, Photographic--Databases; Image processing--Digital techniques; Data mining; Cluster analysis
The performance of content-based image retrieval systems has proved to be inherently constrained by the low-level features used, and such systems cannot give satisfactory results when the user's high-level concepts cannot be expressed by low-level features. In...
Data classification is a task found in many everyday activities. In general, the term can be used for any activity that derives a decision or forecast from the currently available information. Using a more precise definition, a...
Pattern recognition systems; Land mines--Detection; Pattern perception--Data processing; Data mining
Traditional machine learning and pattern recognition systems use a feature descriptor to describe the sensor data and a particular classifier (also called "expert" or "learner") to determine the true class of a given pattern....
Missing data are very common in survey research. However, few guidelines currently exist for the diagnosis and remedy of missing data in survey research. The goal of this thesis was to investigate properties and effects of three selected...
Pattern perception--Data processing; Pattern recognition systems; Land mines--Detection; Data mining
For complex detection and classification problems, involving data with large intra-class variations and noisy inputs, no single source of information can provide a satisfactory solution. As a result, the combination of multiple classifiers is playing...
The purpose of this research study is to examine the use of time series forecasting and text mining to investigate the prescription of antibiotics. The specific objective is to examine the relationship between the total payments, private insurance...
From a computerized image analysis perspective, early diagnosis of lung cancer involves the detection of suspicious nodules and their classification into different pathologies. The detection stage involves a detection approach, usually by template matching,...
The purpose of this dissertation is to find ways to decrease Medicare costs, to study health outcomes of diabetes patients, and to investigate the influence of Medicare Part D since its introduction in 2006, using the CMS CCW (Chronic...
DNA microarrays--Statistical methods; Gene expression--Statistical methods
Data derived from gene expression microarrays are frequently used to identify candidate genes that can characterize and distinguish between two biological phenotypes. A key step in this process is the selection of an appropriate test statistic to...
Electronic commerce--Corrupt practices--Prevention; Internet advertising--Corrupt practices; Internet fraud
Online search advertising is currently the greatest source of revenue for many Internet giants such as Google™, Yahoo!™, and Bing™. The increased number of specialized websites and modern profiling techniques have all contributed to an...
The complexity of high-dimensional data raises a number of concerns for its analysis. Such data often consist of a response or survival time and potentially thousands of predictors. These predictors can be highly correlated, and the...
African American men--Social conditions; Crime and race; Peer pressure
The initial goals of this study include locating and identifying the taxonomic groups proposed by Moffitt (1993) (i.e., life-course-persistent offenders and adolescent-limited offenders) using data from the National Longitudinal Survey 1997...
The phase II clinical trial is a critical step in the drug development process. In the oncology setting, phase II studies typically evaluate a single primary endpoint: efficacy. In practice, a binary measurement representing the response to the...
Item response theory; Goodness-of-fit tests; Monte Carlo method
Item response theory (IRT) is expanding to diverse research settings, without accompanying access to easily implemented model fit methods. One simple model fit approach involves χ²/df ratios. However, its utility is not known across several...
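As a rough illustration of the ratio itself, the sketch below computes a Pearson χ² statistic from observed and model-expected counts and divides it by its degrees of freedom. How the cells, expected counts, and degrees of freedom are defined for a particular IRT model is assumed to be supplied by the caller; the function names and counts are hypothetical, and this is not the specific fit procedure evaluated in the original work.

```python
from typing import Sequence


def pearson_chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson statistic: the sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_df_ratio(
    observed: Sequence[float], expected: Sequence[float], df: int
) -> float:
    """Divide the chi-square statistic by its degrees of freedom.

    Determining df for a given IRT fit test is left to the caller
    (an assumption of this sketch); this function only does the division.
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return pearson_chi_square(observed, expected) / df


if __name__ == "__main__":
    # Hypothetical observed vs. model-expected counts for three cells.
    print(chi_square_df_ratio([30, 20, 50], [25, 25, 50], df=2))  # 2.0 / 2 -> 1.0
```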
Corpus callosum; Diagnostic imaging; Autism--Diagnosis
Early detection of human disease in today’s society can have an enormous impact on the severity with which the disease manifests. Diseases such as autism and dyslexia, which currently have no cure or proven mechanism of development, can...
An ensemble consists of a set of individual predictors whose predictions are combined. Generally, different classification and regression models tend to work well for different types of data, and it is usually not known which algorithm will be...
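The combination step can be illustrated with two simple, widely used rules: plurality (majority) voting for classifiers and averaging for regressors. This is a sketch under the assumption that each base predictor is just a Python callable; the toy models and function names are hypothetical and are not the specific ensemble method studied in the original work.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence

# Each base predictor is modeled as a plain function from one input vector
# to one prediction (an assumption made for this sketch).
Classifier = Callable[[Sequence[float]], Hashable]
Regressor = Callable[[Sequence[float]], float]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> Hashable:
    """Combine class labels by plurality vote.

    Counter.most_common keeps first-encountered order for equal counts,
    so ties go to the label predicted by the earliest classifier.
    """
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def average_prediction(
    regressors: List[Regressor],
    x: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine numeric predictions by an (optionally weighted) average."""
    preds = [reg(x) for reg in regressors]
    if weights is None:
        weights = [1.0] * len(preds)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, preds)) / total


# Hypothetical toy members: thresholds on the first feature and shifted lines.
TOY_CLASSIFIERS: List[Classifier] = [
    lambda x: "pos" if x[0] > 0.3 else "neg",
    lambda x: "pos" if x[0] > 0.5 else "neg",
    lambda x: "pos" if x[0] > 0.7 else "neg",
]
TOY_REGRESSORS: List[Regressor] = [
    lambda x: 2.0 * x[0],
    lambda x: 2.0 * x[0] + 1.0,
    lambda x: 2.0 * x[0] - 1.0,
]

if __name__ == "__main__":
    print(majority_vote(TOY_CLASSIFIERS, [0.6]))     # two of three vote "pos" -> pos
    print(average_prediction(TOY_REGRESSORS, [1.0]))  # (2 + 3 + 1) / 3 -> 2.0
```

Weighted averaging lets stronger members count more; how those weights are chosen (for example, from validation performance) is one of the design decisions an ensemble method has to make.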