In a synthetic classification data set there is one predicted attribute, and all remaining attributes are predicting attributes. The quantity and distribution of the predicted attribute are determined by the rule base. The characteristics of the predicting attributes include their number, data type, and domain size. Furthermore, some common real-world disturbances should also be modeled, including missing relevant attributes and completely irrelevant attributes.
E.g. 10,3,R,N,V:22,R,O,V:33,5,I,O,V:44,2,R,M,V
Once the predicting attributes and the rule base are defined, the system can proceed to generate data tuples (records). As the number of tuples increases, discovery tools should be able to make better predictions, because more tuples carry more information about the underlying rules.
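
The generation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `Attribute` class, the `rule_base` function, the attribute names, and the uniform sampling of values are all hypothetical. It shows a relevant attribute that is withheld from the output (a missing relevant attribute) and an attribute the rules never consult (a completely irrelevant attribute).

```python
import random
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical description of one predicting attribute."""
    name: str
    domain: list          # allowed values; the domain size is len(domain)
    hidden: bool = False  # True: used by the rules but omitted from the records


def rule_base(row):
    """Hypothetical rule base: maps a full set of attribute values to a class."""
    if row["a"] == 1 and row["b"] in ("x", "y"):
        return "pos"
    if row["h"] == 0:     # depends on the hidden (missing relevant) attribute
        return "pos"
    return "neg"          # default class; note that "noise" is never consulted


def generate(attributes, rules, n, seed=0):
    """Generate n records; values are drawn uniformly (an assumption)."""
    rng = random.Random(seed)
    visible = [a.name for a in attributes if not a.hidden]
    records = []
    for _ in range(n):
        full = {a.name: rng.choice(a.domain) for a in attributes}
        label = rules(full)                      # class comes from the rule base
        record = {name: full[name] for name in visible}
        record["class"] = label
        records.append(record)
    return records


ATTRIBUTES = [
    Attribute("a", [0, 1, 2]),
    Attribute("b", ["x", "y", "z"]),
    Attribute("h", [0, 1], hidden=True),         # missing relevant attribute
    Attribute("noise", list(range(10))),         # completely irrelevant attribute
]

records = generate(ATTRIBUTES, rule_base, n=1000)
```

Because the label is computed before the hidden attribute is dropped, some records appear inconsistent to a discovery tool that sees only the visible attributes, which is the kind of disturbance the text describes.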