Learning Heuristics via Imitation of Optimal Planners


A heuristic is a policy that takes as input the state of the search, i.e all decisions undertaken and outcomes observed, and decides which vertex to expand. The perfect heuristic is one that selects exactly the vertices that belong to the optimal path, thus being the most computationally efficient. However, the optimal path itself depends on the underlying world map which is partially known, revealed through the decisions made by the search. Hence it is important for a heuristic to update its strategy as it knows more about the world. We explore efficient methods to train such heuristics by imitating optimal planners.