RSARSubsetEval is an “Attribute Selection Evaluator” algorithm (a term within Weka) that evaluates subsets of features. Evaluators for individual dataset attributes (columns/features) also exist, though most data problems are defined by combinations of multiple factors (attributes), so single-attribute evaluators tend to be less applicable.

RSARSubsetEval is what is more widely known as a Feature Selection algorithm, which falls under the Dimensionality Reduction theme of machine learning. Essentially, it tries to reduce the number of dataset attributes (or features) your algorithm is processing. Rough Set Attribute Reduction, or the “QuickReduct” algorithm, is designed to quickly and measurably find a “core reduct” that leads to a consistent label across all instances. Core Reduct is a term from mathematical Rough Set Theory; a quick summary of how that works can be found in these slides. The algorithm selects those attributes that are required (according to the sample dataset instances) to distinguish the instance class labels.
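The idea can be illustrated with a minimal Python sketch of a greedy QuickReduct-style search. This is not the plugin's Java code: the function names `dependency` and `quickreduct`, the toy data, and the tie-breaking rule (first attribute wins) are all assumptions made for illustration, and the sketch assumes attribute values are already discrete. It scores an attribute subset by the fraction of instances whose values on that subset map to a single class label, then adds the best-scoring attribute until the subset is as consistent as the full attribute set. A greedy search of this kind returns a reduct, though not necessarily the smallest one.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Fraction of instances whose values on `attrs` map to exactly one label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)  # equivalence class of this instance
        groups[key].append(label)
    # An equivalence class is consistent if all of its members share one label.
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def quickreduct(rows, labels):
    """Greedily add attributes until the subset matches the full set's dependency."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in all_attrs if a not in reduct]
        scores = {a: dependency(rows, labels, reduct + [a]) for a in candidates}
        chosen = max(candidates, key=lambda a: scores[a])  # ties: first attribute
        reduct.append(chosen)
        best = scores[chosen]
    return reduct


# Hypothetical toy dataset: only attribute 1 is needed to separate the labels.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "b", "a", "b"]
print(quickreduct(rows, labels))  # prints [1]
```

On the toy data, attributes 0 and 2 each leave every instance ambiguous, while attribute 1 alone separates the two labels, so the search stops after one step.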

## Installation Guide

How to install the Weka Package, via Weka’s SimpleCLI.

1. Start Weka and then Weka's simple command line interface, by clicking the 'SimpleCLI' button.
2. Paste in the following command (with the path to the downloaded plugin), and press Enter:

3. After installation, check that the package installed correctly: paste in the following, and press Enter.
(This can also be checked via the Weka PackageManager.)

java weka.core.WekaPackageManager -package-info installed RSARSubsetEval

4. To use the package from within Weka, close the Weka software, and then re-open Weka.
5. To uninstall the package:

java weka.core.WekaPackageManager -uninstall-package RSARSubsetEval

## How to Use RSARSubsetEval in a Classification Pipeline

1. Open the Weka Experimenter or Weka Explorer.
The “AttributeSelection” evaluator will form one step of a machine learning pipeline (e.g. for classification).
2. Select either AttributeSelection or AttributeSelectedClassifier.
This choice depends on the outcomes and process you want to achieve.
3. Choose the Evaluator as RSARSubsetEval.
If the option does not exist, follow the installation guide above (and ensure you have Weka 3.7.2 installed).
4. Choose the search mechanism and its parameters.
Best First Search is a good starting point.

The software plugin is for the open-source data mining application Weka. The plugin was built as part of my undergraduate dissertation project at the Department of Computer Science in Aberystwyth, Wales. The project was supervised by Dr. Richard Jensen.

The plugin algorithm is an implementation of the rough set attribute reduction (RSAR) algorithm presented by A. Chouchoulas and Q. Shen in their 2001 paper: Rough Set-Aided Keyword Reduction for Text Categorisation. This is a feature selection algorithm used to select the attributes of most importance.
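For reference, the measure that rough set attribute reduction relies on is usually written as the dependency degree. The notation below follows common rough set theory conventions and is an assumption here, not taken from the plugin's documentation: $U$ is the set of objects, $C$ the conditional attributes, $D$ the decision (class) attribute, $P \subseteq C$ a candidate subset, and $\underline{P}X$ the lower approximation of $X$ with respect to $P$.

```latex
\gamma_P(D) = \frac{\lvert \mathrm{POS}_P(D) \rvert}{\lvert U \rvert},
\qquad
\mathrm{POS}_P(D) = \bigcup_{X \in U/D} \underline{P}X
```

Under these definitions, a subset $R \subseteq C$ is accepted by the search once $\gamma_R(D) = \gamma_C(D)$, that is, once it distinguishes the class labels as well as the full attribute set does.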

The plugin was built and validated against high dimensionality gene expression datasets from the RSCTC 2010 Discovery Challenge conference competition.
The datasets tested were classified, continuous-valued, and had 25k-54k attributes but fewer than 80 objects. Datasets were prepared from microarray experiments on patients suffering from cancers. Specifically, the datasets under analysis were transcription profiles of human gliomas (E-GEOD-4290) and of gingival tissue (E-GEOD-10334).

## Project

Author: Peter Scully (pds7@aber.ac.uk)
Supervisor: Dr. Richard Jensen
Project: Investigating Rough Set Feature Selection for Gene Expression Analysis.
Date: May 2011
Location: Department of Computer Science, Aberystwyth University.
• Documents: Investigating Rough Set Feature Selection for Gene Expression Analysis (BSc Dissertation PDF) • (Summary Slides PDF)

## Cite

If you intend to publish results obtained with RSARSubsetEval in academic publications, please use the following citation to reference the implementation:

Scully P.M.D., Jensen R.K (2011). Investigating rough set feature selection for gene expression analysis. (BSc Computer Science dissertation). Retrieved from (http://users.aber.ac.uk/pds7/weka/). Accessed on (xx/mm/yyyy).

or

Scully P.M.D., Jensen R.K (2011). Investigating rough set feature selection for gene expression analysis. (BSc Computer Science dissertation). Retrieved from (https://petescully.co.uk/weka/). Accessed on (xx/mm/yyyy).