By reducing the dimensionality of the model parameter space, this strategy allows the space to be explored in more detail. Another possible strategy is to refine the ensemble by discarding models that use weak attributes. We expect such refinement to improve BMA performance.

To test the assumption made in section 2 and to refine the DT model ensembles obtained with BMA, we propose a new strategy that discards the DT models which use weak attributes. First, the BMA technique described in section 2 is used to collect DT models. Then the posterior probabilities with which attributes are used in the ensemble of DT models are estimated. These estimates provide posterior information on feature importance. Given the range of these posterior probabilities, we define a threshold value and cut off the attributes whose probabilities fall below it; we define such attributes as weak. Finally, we find the DT models that use these weak attributes and discard them from the ensemble.
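The refinement steps can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the original description: each sampled DT is represented only by the set of attribute indices it uses, the posterior probability of an attribute is estimated as the fraction of sampled DTs using it, and the names attribute_posteriors and refine_ensemble are hypothetical.

```python
import numpy as np

def attribute_posteriors(models, n_attributes):
    """Estimate, for each attribute, the fraction of sampled DTs that use it.

    Assumption: `models` is a list of sets of attribute indices, one set per DT,
    and all sampled DTs carry equal weight.
    """
    counts = np.zeros(n_attributes)
    for used in models:
        for a in set(used):          # count each attribute once per model
            counts[a] += 1
    return counts / len(models)

def refine_ensemble(models, n_attributes, threshold):
    """Return indices of DTs that use no weak attribute, plus the weak set."""
    post = attribute_posteriors(models, n_attributes)
    weak = {a for a in range(n_attributes) if post[a] < threshold}
    kept = [i for i, used in enumerate(models) if not (set(used) & weak)]
    return kept, weak, post

# Toy ensemble of five DTs over four attributes (illustrative data only).
models = [{0, 1}, {0, 2}, {0, 1}, {0, 3}, {1}]
kept, weak, post = refine_ensemble(models, n_attributes=4, threshold=0.3)
print(post, weak, kept)
```

In this toy case attributes 2 and 3 are each used by one of the five models, so they fall below the threshold of 0.3 and the two models using them are discarded.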
Obviously, the larger the threshold value, the greater the number of attributes defined as weak, and therefore the larger the portion of DT models discarded. The efficiency of this discarding technique is evaluated in terms of the accuracy of the refined DT ensemble on the test data. The uncertainty in the ensemble outcomes is evaluated in terms of entropy. By trying a set of threshold probability values in a series of experiments, we can expect to find an optimal threshold value at which the performance is highest. We can also expect to find a threshold value at which the uncertainty is reduced. In the following section we test the proposed technique on the problem of assessing newborn brain maturity from sleep EEG.
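As an illustration of the two evaluation criteria, the sketch below computes the test accuracy and the mean entropy of an ensemble restricted to a chosen subset of models. It is a sketch under stated assumptions: the class posterior of the ensemble is taken as the unweighted average of the per-model class probabilities, the entropy is computed on that averaged distribution in bits, and the function name ensemble_metrics and the toy numbers are hypothetical.

```python
import numpy as np

def ensemble_metrics(probs, y_true, kept):
    """Accuracy and mean predictive entropy of the models listed in `kept`.

    Assumption: probs has shape (n_models, n_samples, n_classes) and holds
    each DT's class probabilities on the test data.
    """
    if len(kept) == 0:
        raise ValueError("no models left in the ensemble")
    p = probs[list(kept)].mean(axis=0)            # averaged class posterior
    accuracy = float((p.argmax(axis=1) == np.asarray(y_true)).mean())
    eps = 1e-12                                   # avoids log(0)
    entropy = float(-(p * np.log2(p + eps)).sum(axis=1).mean())
    return accuracy, entropy

# Toy example: three DTs, two test samples, two classes (illustrative only).
probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.4, 0.6]],
    [[0.1, 0.9], [0.95, 0.05]],   # a model assumed to rely on a weak attribute
])
y_test = [0, 1]

acc_full, ent_full = ensemble_metrics(probs, y_test, kept=[0, 1, 2])
acc_ref, ent_ref = ensemble_metrics(probs, y_test, kept=[0, 1])
print(acc_full, ent_full, acc_ref, ent_ref)
```

Repeating this computation for the ensembles kept at each candidate threshold gives the accuracy and entropy curves from which a threshold can be chosen.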

In our experiments we used EEG data recorded from 686 newborns during sleep hours. The newborns were aged from 40 to 45 weeks post-conception. Each of these 6 age groups contains around 100 patients. The EEGs were segmented into 10-s intervals, each represented by 72 attributes: spectral powers and their variances. We averaged the EEG segments of each patient to represent the patient by one data sample, so that the problem was represented by 686 data samples in a 72-attribute space.
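The per-patient averaging step can be sketched as follows. The array layout (one row of attributes per EEG segment, with a parallel array of patient identifiers) and the function name average_segments are assumptions made for illustration; the toy data use two attributes rather than 72.

```python
import numpy as np

def average_segments(features, patient_ids):
    """Average segment-level attribute vectors into one sample per patient.

    Assumption: features has shape (n_segments, n_attributes) and
    patient_ids is a 1-D array giving the patient of each segment.
    """
    features = np.asarray(features, dtype=float)
    ids, inverse = np.unique(np.asarray(patient_ids), return_inverse=True)
    inverse = inverse.ravel()
    sums = np.zeros((len(ids), features.shape[1]))
    np.add.at(sums, inverse, features)            # per-patient attribute sums
    counts = np.bincount(inverse, minlength=len(ids))
    return ids, sums / counts[:, None]

# Toy data: three segments from two patients (illustrative only).
segments = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
patients = [7, 7, 9]
ids, samples = average_segments(segments, patients)
print(ids, samples)
```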
For the experiments we used the Bayesian averaging over DT models introduced in section 3 and described in our previous publications [12]. The BMA was run with the following settings. In the burn-in phase we collected 200,000 DTs, and in the post burn-in phase 10,000 DTs. During the post burn-in phase every 7th model was collected to reduce the...

