Write parameters out to, and read them back in from, configuration files
Allows for repeatable runs and historical records
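A minimal sketch of such a round trip, assuming a JSON file format. The parameter names (`num_rows`, `num_attributes`, `random_seed`) and the file name are hypothetical and chosen only for illustration; they are not actual DatGen options.

```python
import json
import os
import tempfile


def write_params(params, path):
    """Write a parameter dictionary to a JSON configuration file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2, sort_keys=True)


def read_params(path):
    """Read a parameter dictionary back from a JSON configuration file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical parameter names, used only for illustration.
params = {"num_rows": 1000, "num_attributes": 8, "random_seed": 42}

config_path = os.path.join(tempfile.mkdtemp(), "datgen_run.json")
write_params(params, config_path)
restored = read_params(config_path)
```

Keeping the saved file alongside each generated data set would give the historical record mentioned above, since the same file can be read back in to repeat the run.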
Write out and read in a rule base
Allows for repeatable experiments
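One possible shape for a stored rule base, sketched under assumptions: each rule is a conjunction of attribute = value tests with a class label, and the whole rule base is serialized as JSON text. The `if`/`then` keys, the attribute names, and the class labels are hypothetical.

```python
import json


def dump_rule_base(rules):
    """Serialize a list of rules to JSON text."""
    return json.dumps(rules, indent=2)


def load_rule_base(text):
    """Rebuild the list of rules from JSON text."""
    return json.loads(text)


def classify(rules, row):
    """Return the class of the first rule whose tests all match the row."""
    for rule in rules:
        if all(row.get(attr) == value for attr, value in rule["if"].items()):
            return rule["then"]
    return None  # no rule fired


# Hypothetical rule base, used only for illustration.
rules = [
    {"if": {"color": "red", "condition": "good"}, "then": "class_1"},
    {"if": {"color": "blue"}, "then": "class_2"},
]

saved = dump_rule_base(rules)
reloaded = load_rule_base(saved)
row = {"color": "red", "condition": "good"}
```

An experiment is repeatable in this sense if the reloaded rule base classifies every row the same way as the original one.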
Support mapping column domains to files
Allows for custom domain values, especially for nominal domains such as
colors and conditions.
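A sketch of how a column-to-file mapping might be loaded, assuming a simple file format with one domain value per line and `#` for comments. The format, the file names, and the example values are assumptions made for illustration.

```python
import os
import tempfile


def load_domain(path):
    """Read one domain value per line, skipping blank lines and '#' comments."""
    values = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            value = line.strip()
            if value and not value.startswith("#"):
                values.append(value)
    return values


def load_domains(column_files):
    """Map each column name to the list of values found in its domain file."""
    return {column: load_domain(path) for column, path in column_files.items()}


# Create two small example domain files (hypothetical contents).
workdir = tempfile.mkdtemp()
examples = {
    "color": ["red", "green", "blue"],
    "condition": ["new", "used", "damaged"],
}
column_files = {}
for column, values in examples.items():
    path = os.path.join(workdir, column + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("# domain values for " + column + "\n")
        f.write("\n".join(values) + "\n")
    column_files[column] = path

domains = load_domains(column_files)
```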
Support for Background Knowledge
Attach generalization
hierarchies to attributes of any data type. For a continuous attribute, a
tree decomposition could look like this (imagine lines
connecting the tokens):
ANY
Low Mid High
A B C D E
[0, 0.9) [0.9, 1.7) [1.7, 2.6) [2.6, 4.1) [4.1, 6]
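A rough sketch of how such a hierarchy could be used to generalize a continuous value. The leaf intervals and labels come from the decomposition above. The assignment of leaves A to E to the Low, Mid and High nodes is NOT recoverable from the outline (the connecting lines are missing), so the `PARENT` table below is an assumption made only for illustration.

```python
from bisect import bisect_right

# Leaf intervals: [0, 0.9) [0.9, 1.7) [1.7, 2.6) [2.6, 4.1) [4.1, 6]
LOWER, UPPER = 0.0, 6.0
CUTS = [0.9, 1.7, 2.6, 4.1]          # interior interval boundaries
LEAVES = ["A", "B", "C", "D", "E"]   # one label per interval

# ASSUMPTION: this leaf-to-parent grouping is illustrative only.
PARENT = {"A": "Low", "B": "Low", "C": "Mid", "D": "High", "E": "High"}


def generalize(value):
    """Return the path [leaf, mid-level node, root] for a continuous value."""
    if not LOWER <= value <= UPPER:
        raise ValueError("value outside the attribute domain [0, 6]")
    # bisect_right places a value equal to a boundary in the interval to its
    # right, which matches the half-open intervals [a, b).
    leaf = LEAVES[bisect_right(CUTS, value)]
    return [leaf, PARENT[leaf], "ANY"]
```

A rule generator could then state a condition at any level of the path, for example on the leaf label for a specific rule or on the mid-level node for a more general one.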
Create rules that map to a balanced decision tree
e.g. Murthy and Salzberg, 1995
changing the knowledge representation may be difficult
better to find a mapping from the tree to DNF rules
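The tree-to-DNF mapping can be sketched as follows: each root-to-leaf path becomes one conjunction of tests, and the conjunctions that end in the same class are joined by OR. The dictionary-based tree layout, the attribute names, and the thresholds are hypothetical; this is an illustration of the general idea, not the method of any cited paper.

```python
def tree_to_dnf(node, path=()):
    """Collect, per class label, the conjunction found on each root-to-leaf path."""
    if "label" in node:  # leaf node: the path so far is one conjunction
        return {node["label"]: [list(path)]}
    rules = {}
    attr, threshold = node["attr"], node["threshold"]
    branches = (
        ((attr, "<=", threshold), node["left"]),
        ((attr, ">", threshold), node["right"]),
    )
    for test, child in branches:
        for label, conjunctions in tree_to_dnf(child, path + (test,)).items():
            rules.setdefault(label, []).extend(conjunctions)
    return rules


def format_dnf(conjunctions):
    """Render a list of conjunctions as a readable DNF string."""
    return " OR ".join(
        "(" + " AND ".join(f"{a} {op} {v}" for a, op, v in conj) + ")"
        for conj in conjunctions
    )


# Hypothetical balanced tree of depth 2 over attributes x and y.
tree = {
    "attr": "x", "threshold": 2.6,
    "left": {"attr": "y", "threshold": 1.0,
             "left": {"label": "pos"}, "right": {"label": "neg"}},
    "right": {"attr": "y", "threshold": 3.0,
              "left": {"label": "neg"}, "right": {"label": "pos"}},
}

dnf = tree_to_dnf(tree)
pos_rule = format_dnf(dnf["pos"])
```

Going in this direction keeps the DNF rule representation unchanged: the balanced tree is built first, and the rules are then read off its paths.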
Generate rules with more than one attribute class
e.g. Association rules (Agrawal et al.)
e.g. Bayesian networks
Support every model representation formalism used by the more important
data mining algorithms.
Decision Lists
Centroids
Neural Networks
First Order Logic
Questions
How to find out what people need?
How will data mining benchmarks be defined?
Dynamic Definition: A general set of DatGen parameters
is specified.
Fixed Rule Definition: A set of domain descriptions
is proposed.
Success Factor: Whichever benchmark style is chosen, it has
to be tested against several learn-from-example algorithms,
and the results must expose the strengths and weaknesses of each.