Dataset Generator (DatGen)

Perfect data for an imperfect world

This site hosts a computer program that produces synthetic data to help analyze the performance of programs that consume data. For example, its output could be used to test the performance of a sorting program or, as was its original purpose, of a data mining classification program. The table below presents a sample of a generated dataset:
 

#  A1  A2  A3  A4 Class
1 4.1 3 0 C1
2.8  n/a C2
... 
9,999,999  7.3  11  C1
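
To make the shape of such a dataset concrete, here is a minimal Python sketch that produces rows in the same spirit as the sample above: a row number, a few attribute values (some missing), and a class label. It is not DatGen's algorithm and does not use DatGen's parameters; the names `generate_rows` and `format_row`, the value range, and the missing-value rate are all hypothetical choices made for illustration.

```python
import random


def generate_rows(n_rows, n_attrs=4, classes=("C1", "C2"), missing_rate=0.1, seed=0):
    """Yield (row_id, attribute_values, class_label) tuples of synthetic data.

    Illustrative only: the value range (0 to 12) and the missing-value rate
    are assumptions, not DatGen settings.
    """
    rng = random.Random(seed)  # fixed seed so the same dataset can be regenerated
    for row_id in range(1, n_rows + 1):
        # Each attribute is a random number, or None to stand for a missing value.
        values = [
            None if rng.random() < missing_rate else round(rng.uniform(0, 12), 1)
            for _ in range(n_attrs)
        ]
        yield row_id, values, rng.choice(classes)


def format_row(row_id, values, label):
    """Render one row as tab-separated text, printing missing values as n/a."""
    cells = ["n/a" if v is None else str(v) for v in values]
    return "\t".join([str(row_id), *cells, label])


if __name__ == "__main__":
    for row in generate_rows(5):
        print(format_row(*row))
```

Because the rows come from a generator, a consumer such as a sorting routine can be timed on millions of rows without holding the whole dataset in memory. The class labels here are drawn at random, so this sketch is only suitable for exercising the speed of data-consuming code, not for judging classification accuracy.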

There are two ways that you can use DatGen. The simplest is to use the interactive Web forms below to describe and create your dataset. You can also use DatGen on your computer by downloading the program and learning the program's input parameters.

An overview of data generation
Download DatGen source code (v3.1 1999/12/14)
An overview of DatGen parameters
Use your Web browser to interactively create data with DatGen!


Frequently Asked Questions
Things to do and ongoing questions (volunteers welcomed :-)
References
 
Updated 2009/01/13 Comments to Gabor Melli