How Hard Should the Test Instances Be in Instance-Specific Macro Learning?

1 Introduction
During the instance-specific macro learning experiments [1], we faced a problem:
there was no significant difference between the perfect model and the other
models/macro sets. I initially thought that learning in general is not useful.
But then I realized that the problem was caused partly by the way I collected
the data. The test examples were too easy to reveal any significant difference
in performance between the models. So, we need to make the test instances
harder to solve in general. It is also possible to fall into the other extreme,
making the problems so hard that no model can solve most of the instances.
Practically, for a test instance to be considered, I think I should fix a lower
bound on the runtime of the empty set and an upper bound on the runtime of the
perfect model, in order to have a clear view of the differences between the
models/macro sets.
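
The selection rule described above can be sketched as a simple filter. This is
only an illustration under stated assumptions, not the code used in the
experiments: the names `select_instances`, `EMPTY`, `PERFECT`, `min_time`, and
`max_time` are hypothetical, the runtimes are made-up numbers in seconds, and
the thresholds are arbitrary.

```python
from typing import Dict, List

# Hypothetical labels; the text only speaks of "the empty set" and "the perfect model".
EMPTY = "empty"
PERFECT = "perfect"


def select_instances(
    runtimes: Dict[str, Dict[str, float]],
    min_time: float,
    max_time: float,
) -> List[str]:
    """Keep instances that are hard on the empty set yet tractable for the perfect model."""
    selected = []
    for instance, by_model in runtimes.items():
        hard_enough = by_model[EMPTY] > min_time      # lower bound on the empty set
        not_too_hard = by_model[PERFECT] < max_time   # upper bound on the perfect model
        if hard_enough and not_too_hard:
            selected.append(instance)
    return selected


# Made-up runtimes (seconds), purely for illustration.
runtimes = {
    "p01": {EMPTY: 0.02, PERFECT: 0.01},    # too easy: fails the lower bound
    "p02": {EMPTY: 35.0, PERFECT: 4.0},     # kept
    "p03": {EMPTY: 900.0, PERFECT: 700.0},  # too hard: fails the upper bound
}
print(select_instances(runtimes, min_time=1.0, max_time=600.0))  # ['p02']
```

Fixing both thresholds before the experiment is run keeps the filter from being
tuned to the results.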
2 Details
We need to find hard-enough instances to test the macro performance. It is
essential that the instances I test are hard, because otherwise the differences
in model performance might not be clear. This is easy to check: if a problem is
easily solvable on the empty set, I should not include it in the results. So I
add a lower bound on the runtime of instance i on the empty set m0:
T(i,m0) > MinTime
But is this assumption correct? If we fix the bound in advance, before running
the experiment, then maybe it is okay. My argument is: keeping in mind that we
want to measure the significance of the difference in performance between the
models/macro sets, and given that the process switching time of current
operating systems is nonzero, we should make such an assumption. This is because
every process incurs a small overhead, and we cannot guarantee that the process
gets the whole CPU time, which can blur the differences in runtimes when the
instances are small. So, MinTime is proportional to the process switching time,
the time measurement error, and whatever additional time places an instance on
the relatively hard side of the runtime slope. Test instances that do not
satisfy this constraint should be excluded from the test, because they are not
hard enough.
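
One possible way to pick MinTime in advance is to tie it to the observed timing
noise. The sketch below is an assumption, not the procedure used in the
experiments: it treats the standard deviation of repeated runs of the same job
as a stand-in for the combined scheduling and measurement error, and the names
`suggest_min_time`, `hard_enough`, and the multiplier `factor` are hypothetical.

```python
import statistics
from typing import Sequence


def suggest_min_time(repeated_runtimes: Sequence[float], factor: float = 100.0) -> float:
    """Return a lower bound proportional to the observed timing noise.

    `repeated_runtimes` are wall-clock times (seconds) of the same run repeated
    several times. `factor` is an arbitrary multiplier chosen by the experimenter.
    """
    if len(repeated_runtimes) < 2:
        raise ValueError("need at least two repeated measurements")
    noise = statistics.stdev(repeated_runtimes)  # sample standard deviation
    return factor * noise


def hard_enough(empty_set_runtime: float, min_time: float) -> bool:
    """Mirror the constraint T(i, m0) > MinTime."""
    return empty_set_runtime > min_time


# Made-up repeated measurements of one job, in seconds.
samples = [0.50, 0.52, 0.48, 0.51, 0.49]
min_time = suggest_min_time(samples)
print(hard_enough(35.0, min_time), hard_enough(0.02, min_time))
```

With a bound chosen this way, an instance passes only when its runtime on the
empty set is large compared with the noise of a single measurement.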
In my experiment, when an instance times out, I register the cut-off time as
its runtime. This is obviously inaccurate information. There are two
alternatives: (1) we can set a cut-off time so high that no instance can time
out, or (2) we can remove instances that time out on any macro set. Both of
these options are impractical. The first is not practical because we may need
an extremely large cut-off time for the whole test set to finish. The second
option is not practical because, sometimes, there are...

