The area of knowledge discovery and data mining is growing rapidly. Feature discretization is a crucial issue in Knowledge Discovery in Databases (KDD), or data mining, because most data sets used in real-world applications have features with...
This study defines a new approach for building a Web Services-based infrastructure for distributed data mining applications. The proposed architecture provides a roadmap for "autonomic" functionality of the infrastructure hiding the...
Missing data is very common in survey research. However, few guidelines currently exist for diagnosing and remedying missing data in survey research. The goal of this thesis was to investigate properties and effects of three selected...
Brief Overview of the Problem: The Environmental Protection Agency (EPA), a government-funded agency, provides both legislative and judicial powers for emissions monitoring in the United States. The agency crafts laws based on self-made regulations...
Data libraries--Security measures; Computer networks--Security measures; Information storage and retrieval systems; Digital preservation
Data centers (DC) are the core of the national cyber infrastructure. With the incredible growth of critical data volumes in financial institutions, government organizations, and global companies, data centers are becoming larger and more...
Swarm Intelligence (SI) techniques were inspired by bee swarms, ant colonies, and, most recently, bird flocks. Flock-based Swarm Intelligence (FSI) has several unique features, namely decentralized control, collaborative learning, high exploration...
Educational leadership; School improvement programs; School management and organization
This study examines how two schools utilized elements of distributed leadership to implement strategies from a reform intervention for whole school and classroom improvement planning from data. The notion of distributed leadership was refined in a...
This work develops a method of estimating peak daily streamflow, Qpeak, for Kentucky streams using daily average streamflow, Qave, data from the United States Geological Survey's (USGS) National Water Information System (NWIS) website. The purpose...
The topic of this dissertation is the automation of the process of extracting understandable patterns and rules from data. An unprecedented amount of data is available to anyone with a computer connected to the Internet. The disciplines of Data...
Clustered longitudinal data are often collected as repeated measurements over time on subjects who belong to clusters. Examples include longitudinal community intervention studies, or family studies with repeated measures on each member. Meanwhile,...
In this dissertation research, we aim to solve problems for two types of survival data: clustered survival data with potentially informative cluster size, and sojourn time data. The methods for these two types of data are different. However, both...
DNA microarrays--Statistical methods; Gene expression--Statistical methods
Data derived from gene expression microarrays are frequently used to identify candidate genes that can characterize and distinguish between two biological phenotypes. A key step in this process is the selection of an appropriate test statistic to...
The revolution in information technology and the explosion in the use of computing devices in people's everyday activities have forever changed the perspective of the data mining and machine learning fields. The enormous amounts of easily...
Data mining; Lungs--Cancer; Outcome assessment (Medical care)
Lung cancer is the leading cause of cancer death in the United States and the world, with more than 1.3 million deaths worldwide per year. However, because of a lack of effective tools to diagnose lung cancer, more than half of all cases are...
The airline regulatory communities are interested in methods that can assess degradation in the polymer insulation of aging aircraft wiring. This study investigates the response of bulk polymer films and aircraft wiring under indentation; changes...
Respiratory syncytial virus is the leading cause of lower respiratory tract infection in infants and currently lacks an effective vaccine or treatment beyond symptom relief. The atomic force microscope is particularly well suited for imaging...
A common interest in medical, biological, and engineering research is determining whether certain independent variables are correlated with survival or failure times. Standard statistical techniques cannot usually be applied for...
United Parcel Service; Aeronautics, Commercial--Freight--Data processing; Forecasting--Data processing
This thesis develops a forecasting model to predict six different volume measures on a weekly and daily basis for UPS-Supply Chain Solutions (UPS-SCS). The volume measures are used by UPS-SCS to develop business plans, operation plans, and staffing...
Nonparametric statistics; Estimation theory; Distribution (Probability theory)
Multistate models are a type of multivariate survival data that provide a framework for describing a complex system where individuals transition through a series of distinct states. This research focuses on nonparametric inference for general...
The performance of pediatric circulatory support devices depends on the properties of pediatric blood. This study reports the measurement of the viscoelastic properties of pediatric blood at 37°C [body temperature]. The results were compared with...