QUESTION: I need help building a dataset that comprises 250 numerical features, two distribution classes, and at least 1000 rows. I am working with real medical data. My project is a classification system based on a supervised model (basically, I am using a hybrid classification method, GA/WKNN). As a benchmark I would like to use a synthetic dataset, and I found that your website contains an overview of the DataGen system. I tried the different interfaces on the website, but unfortunately I was not able to produce the dataset I am looking for.

 

 

REPLY: From: support@datasetgenerator.com
Sent: Monday, June 19, 2006 11:20:16 AM
 

To create the dataset with 250 columns and 1000 rows via the web-based interface to DATGEN, I recommend that you use the following form:

http://www.datasetgenerator.com/form_implicitComplex.html

The web form is probably just a preliminary way for you to decide whether DATGEN is going to be helpful to you. Afterwards you will likely need to use the command-line version. You may also need to modify the code, particularly because I have tested it only with fewer than 50 columns (not the 250 you require).

 

Anyway, for the web-based form, try changing the defaults for the following parameters.

Description:

You can play around with the settings to get different rule bases. Start with simple descriptions and then make them more complex.
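
For illustration only, the idea of starting from a simple rule can be sketched outside DATGEN in plain Python. This is not DATGEN's code or its description syntax; the function name make_rule_dataset, the uniform feature distribution, and the rule itself are all assumptions made up for this example.

```python
import random


def make_rule_dataset(n_rows=1000, n_features=250, seed=0):
    """Return (rows, labels): uniform features in [0, 1) and a rule-based class."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    rows, labels = [], []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(n_features)]
        # Hypothetical rule (not a DATGEN rule): class 1 when features 0 and 1
        # are both above 0.5. The other 248 features do not affect the class.
        label = 1 if (x[0] > 0.5 and x[1] > 0.5) else 0
        rows.append(x)
        labels.append(label)
    return rows, labels


rows, labels = make_rule_dataset()
```

A more complex rule base could be approximated in the same way by adding further conditions on other features, which mirrors the advice to start simple and add complexity gradually.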

 

You will likely need to spend a couple of days to get what you want, so you will have to decide if this is worth your time.