Reinforcement Learning a Stochastic Dynamic Vehicle Routing Problem with Incomplete Information
Vehicle Routing Problems (VRPs) concern the decision of determining a set of routes for a fleet of vehicles that travel from (possibly more than) one central depot to provide service to (or collect items from) a set of customers, often in an urban environment. VRPs have a colorful range of applications in logistics and mobility services, such as transporting goods or passengers, or conducting services at customer homes. In many cases of VRPs, the decision maker (DM) is uncertain about the problem data. In such situations, the DM learns about such information sequentially, over multiple periods, in a dynamic manner. For example, additional customer requests might arrive during the routing mission, the travel time might increase due to road congestions, and the size of the available fleet might change due to accidents and malfunction of vehicles. In such situations, the dynamic nature of how the stochastic information unfolds gives rise to a class of VRPs known as Stochastic Dynamic Vehicle Routing Problem (SDVRP).
The goal of this thesis is to study an SDVRP in which a single vehicle is available to serve customer requests in a given time horizon defined by the vehicle driver’s shift. In this problem, some customer requests are known in advance, but others are received during the day. When a new request occurs, the DM needs to decide whether the new request is accepted or rejected. If the new request is accepted, the DM has to decide on how to update the route to include the location of the newly accepted request. Otherwise, the rejected requests are lost forever. The goal of the DM is to maximize the number of fulfilled requests.
A common approach to proceed with solving an SDVRP is to (1) assume that the underlying stochastic process can be analyzed and transferred into a predictive model (e.g., nominal probability distribution), (2) use the predictive model to anticipate future scenarios (e.g., the likelihood of a customer request occurring at a given location), and (3) incorporate the predicted scenarios into the framework of optimizing the routing decision. Nevertheless, constructing the predictive model requires sufficient past observations (e.g., previous customer requests), which might not always be available. This can be circumvented by leveraging a reinforcement learning framework that allows the DM to use an adaptive prediction model. In this case, initially, the DM is unsure how the predictive model should reflect the uncertainty. However, as time progresses, the DM observes new realizations of the stochastic process (e.g., new demand requests) and updates their belief about how the predictive model should anticipate the uncertainty. For instance, this could be a situation in which the DM does not know in advance the demand distribution at different locations. However, as more demand requests arrive, the DM revises the predictive model to reflect the different levels of demand requests at different locations.
If you are interested, please feel free to write an email with your CV and your transcript of records to email@example.com.
A framework for the vehicle routing problem with dynamically inserted intermediate stops and its application to the skip loader problem
The thesis will be supervised by an employee of PTV AG in Karlsruhe. The thesis will ideally be worked on directly in Karlsruhe. There, the work can be based on an existing code base can be used.
In the context of this thesis, a theoretical framework for VRP variants shall be developed which have the following in common: along a tour, in addition to order stops (e.g., pickup or delivery stops) have to be scheduled in order for the tour to be valid. In rare cases, more than one intermediate stop is made.
A problem with this feature is the skip loader problem. The skip loader is used to transport empty skips to customers. As a rule, full skips are taken away and immediately driven to a disposal site. The empty skip can then be dropped off at the next customer. The problem becomes more difficult due to the fact that there are skips in two sizes (different DIN), whereby each vehicle can transport both types of skips (homogeneous fleet). It is possible that an empty skip is first driven to a loading point, where another skip has to be loaded, in case the next customer requires a different type of skip. In this simple case, there is only one loading point (here, called a depot) and the vehicle has to drive there if the next customer requires a different type of skip than the one that is loaded.
In this thesis, a theoretical framework will be developed to solve such problems. After implementing a solution procedure for the skip loader problem, the effectiveness will be investigated by means of generated test instances. Different problems with the above property should be solvable with the framework and discussed in the thesis (i.e., a simple variant of the EVRP and a simple variant of the truck-and-trailer problem).
If you are interested, please send an e-mail including your CV and transcript of records to firstname.lastname@example.org.