White Paper
A Hybrid GA-Heuristic Search Strategy
This
paper by Akeel Al-Attar (PhD), the Technical Director of Attar Software, appeared
in the September 1994 issue of AI
EXPERT USA.
printable version (.pdf format)
The
need for a hybrid GA approach
There is currently wide interest in genetic algorithms among the AI community
world-wide. This interest is driven mainly by the inadequacy of linear programming
and rule based (heuristic) systems in solving complex resource planning and
scheduling problems, and more generally by the desire to have an effective
search algorithm which is independent of the nature of the solution domain.
The GA concepts are very appealing since they appear to offer the ultimate
'free lunch' scenario of good solutions evolving without a problem solving
strategy, once the requirements and constraints are defined.
However, the reality for developers attempting to put GAs into practical use can be quite different from the hype surrounding the technology. Developers are often faced with having to use a very large number of 'genes' to represent real problems and, as a result, hours or days to evolve solutions which may not be of acceptable quality. So how does one reconcile the hype and the reality. The answer is that developers take the 'free lunch' promise too literally. They expect a GA system to discover solutions from zero beginnings; i.e. the GA is expected to search for good solutions in a massive search space without being armed with even primitive common sense heuristics. The lack of such heuristics would account for the large number of genes required and the slowness of the evolution process. Over the last three years I have been involved in developing software tools and methodologies for applying GAs to complex search problems. The approach I have developed is a hybrid GA-Heuristic search strategy. The hybrid approach is reflected in both the implementation of the Genetic Algorithm and in the methodology of applying it to solve problems.
Implementing An
Effective Genetic Algorithm
In addressing real world problems the following features are required from
a genetic algorithm (beyond the standard GA functionality):
A hybrid methodology
for Applying Genetic Algorithms
The basic concept behind the hybrid methodology is not to use the GA to directly
optimise the parameters of the solution, but rather to use the GA to optimise
the parameters of a simple heuristic problem solving strategy. The new approach
is shown in the following diagram:
I will demonstrate the application and the effectiveness of this methodology by using a simple resource planning application. In this problem we have four resources A, B, C and D each being capable of carrying out a number of different operation types O1, O2 and O3. For each resource, the time it takes to carry out an operation, the cost of running, and the available capacity are given in the following table:
01,02,03 = time per unit of operation

Given that we have three requirements (jobs) for 1400 units of O1, 1200 units of O2 and 900 units of O3, what is the optimal resource planning that meets the following objectives:
This problem is quite simple and can readily be solved using linear programming or rule based techniques, however it is useful in illustrating our GA methodology.
The obvious way of using GA's to solve the above problem is to define a number of numeric (phenotype) genes to represent the number of units of each operation assigned to each resource. The GA would then be expected to minimise a cost function which consists of the total cost of running the resources plus the total penalty points for exceeding the available capacity of resources. The relative cost of exceeding the available capacity per minute has to be set to a value which reflects the severity of the constraint. This representation of the problem needed some 30 generations of 50 individuals to evolve a solution which had a cost of 99% of the known optimal cost. This is quite slow considering the simplicity of the problem.
Following the new methodology, we used genes to represent the sequence of jobs to be presented to a simple resource planner. For this strategy to work effectively, we needed first to break down the three large jobs (requirements) into smaller jobs. The jobs were broken down as follows:
The initial three requirements are broken down into 12 jobs in total. We can now use 12 sequence genes to represent the order in which these jobs are presented to a simple planner. The simple planner uses very simple heuristics; namely take the next job in the queue and allocate to the first resource (from A to D) which has available capacity. This new gene representation strategy needed only two generations of 50 individuals to evolve a solution with a cost of 100% of the known optimal cost. The evolution time was reduced from minutes in the initial strategy to seconds using the new methodology.
The methodology described above was used to deliver a very innovative application to Channel 4, the largest commercial TV station in the UK. Channel 4 had the requirement of optimising the sequencing of advertisements within a commercial break. They have to ensure that different copies (versions) of the same advert go out simultaneously to different regions of the UK whilst satisfying as many of advertisers requests for special positions as possible. Advertisers can specify a mix of regions, first or last position in a break or a 'top & tail' pair in the same break. A break sequencer has to ensure that all the requirements are met without overlaps or gaps in the break transmitted to each region. In a four minute break across six regions there are 144 ten second slots to schedule with up to 50 adverts. A heuristic knowledge based system solution to the problem was deemed too difficult to achieve and a GA optimisation approach was considered.
The direct way of solving the break scheduling problem is to use 144 index genes to represent each of the 10 second slots across the six regions. Each index gene would return a pointer to an advert (one of 50). This strategy could take hours to optimise. Given that there are over 70 breaks a day to optimise, the evolution time is unacceptably high. However by applying our methodology, we were able to use sequence genes to represent the sequence of adverts to be scheduled (up to 50) by a simple heuristic scheduler. This approach reduced the evolution time to seconds. The simple scheduler works by taking the next advert in the sequence and placing it in the first available position (across the required regions) in the break. The optimisation task becomes one of optimising the sequence in which the adverts are presented to the simple scheduler. A major advantage of using this methodology is that a simple scheduler would always produce a valid schedule which satisfies the hard (lethal) constraints while the soft constraints are represented in the cost function and are satisfied through natural selection.
Our experience with using GA's to develop resource planning and scheduling applications, is that as the complexity of the problem increases so does the number of genes needed. For complex problems, this would result in very long evolution times and may even require more genes than can be handled by the algorithm. Our strategy is that as the complexity of the application increases, we would keep the number of genes under control at the expense of increasing the complexity of the 'simple' heuristic schedule/planner that is used in conjunction with the GA. Typical increase in complexity involves defining additional index genes to select resources from lists sorted according to domain heuristics.
Applying the Hybrid
Optimisation Strategy to Learning Rules From Data
One area in which GA's can potentially offer an effective search strategy
is that of symbolic learning of rules and patterns from data. Symbolic learning
is a supervised learning paradigm which can derive rules from data. Such rules
relate an outcome of interest to a number of attributes. These rules are typically
of the format:
IF attribute1 = a AND attribute2 < b THEN outcome = 1 (Probability = .9)
Where attribute1 and attribute2 are discrete and numeric attributes respectively.
Rules can be used to understand relationships and patterns in a data file or as a data model to predict future outcomes given attribute values. The problem facing a symbolic learning algorithm is that of searching billions of possible rules for generic rules that can accurately predict the outcomes of a test data set. Several attempts have been made to use GA's to search for rules in data. Whilst some of these attempts have been successful, the evolution times required to derive a small number of rules (less than 6) from even small data sets (several hundred records) can typically be hours. The standard approach is to use a long string of genes to represent the set of rules. A maximum number of conditions is assumed, typically four or so. A condition involving a numeric attribute would need one gene to define the attribute, a second gene to define a comparator and a third gene for an attribute value. Conditions involving discrete attributes are more demanding needing one gene to define the attribute and a number of genes representing the discrete values in the condition. Assuming an average of four genes per condition and four conditions per rule, 16 genes are required to define a rule. It follows that even searching for 6 rules would require about 100 genes and therefore a lengthy evolution time. The data set is normally split between a training and a test sample. The outcomes of the rules represented by the genes are derived by scanning the training data set. The cost function to be minimised during evolution represents the number of test data records wrongly classified by the rules.
The above gene representation does not exploit any heuristic strategies which accounts for the lengthy evolution time. On the other hand, symbolic learning algorithms, such as ID3, are based completely on heuristics. Following our methodology of a hybrid GA-heuristic approach we applied this to the symbolic learning problem. Before explaining the way in which this was achieved, let's first consider how an ID3 type of algorithm works. ID3 builds a symbolic model in the form of a classification tree, an example of which is shown below:

Note that each path from the root of the tree to a leaf (outcome point) represents one rule of the type discussed above. The branching strategy used in the above tree is called binary whereby a numeric attribute would have a threshold branch value, and a discrete attribute would have its values split between the two branches.
ID3 uses entropy as a heuristic measure to determine the best attribute to use at every node in the tree. Entropy is also used as a heuristic to determine the threshold value or the grouping of the attribute values between the branches at a node. When applied to learning from real data, classical ID3 is complemented by a heuristic tree pruning strategy to prevent the tree growing to fit noise and irrelevant attributes in the data. Statistical significance of a new node, or the errors introduced by adding a new node are often used as pruning heuristic. The limitations of ID3 with pruning extensions are:
The basic concept behind a hybrid GA approach is to use GAs to evolve multiple classification trees. The genes are used to define the attributes used in the nodes of the tree as shown below for a tree 3 attributes deep:
While the attributes at every node in the tree are defined by genes, entropy heuristics are used to define the numeric threshold or the grouping of discrete values at every branch. Pruning heuristics are also used to prune the tree defined by genes. This is a similar approach to the resource planning problem discussed above. The genes represents a set of attributes or parameters which are used to drive a simple heuristic tree builder. The hybrid approach is very efficient on genes, it uses N-1 genes to generate (potentially before pruning) N rules. By using sets of genes, we can evolve multiple trees simultaneously. The efficiency of the gene representation and the use of heuristics reduces the evolution time by a few orders of magnitude.
This hybrid symbolic learning approach was applied to Loans data supplied by a Bank. Some 430 data records were available. This was split 50:50 between a test and training test. The resulting training set (215) is on the small side for ID3 with pruning extensions which induced a five leaf tree (5 rules) with a classification accuracy of 73%. Only three attributes out of thirteen were used by the tree. In contrast, from the same data, the hybrid GA system evolved three trees which had a total of 11 leafs (rules), a classification accuracy of 78% and used 6 of the available attributes. The evolution time was only a few minutes on a 33MHz 486.
Conclusions
In this article, I have discussed a practical approach to harnessing the potential
of GA's by combining their search power with heuristic knowledge that exists
in an application domain. I hope that I have demonstrated that even simple
domain heuristics can have a major effect on the performance of a GA in a
particular domain. We at Attar Software have helped corporate clients develop
complex systems using this methodology. These systems would have been almost
impossible to develop using either a pure heuristic rule based approach, or
a pure GA optimisation approach.
Dr Akeel Al-Attar, England, July 1994.
The user case stories from Channel 4 TV and United Distillers show practical implementations of this technology.
Copyright © 2002 Attar Software Limited
![]() |
||||||
![]() |
||||||
|
|
||||||
|