需要关于此作业的帮助?欢迎联系我

COMP4620/8620 Assignment 2

Part C [39 pts]: Model Building.

A vacuum cleaner robot is tasked to clean a grid-world of size m X n. The floor materials of each cell may be one of the following three types: Vynil, thin carpet, and thick carpet. The effectiveness of the robot vacuum depends on the floor type of the cell. Specifically, if a cell's floor is of type Vynil, the probability that each vacuum action cleans the cell with probability 95%. If the cell's floor is of type thin carpet, each vacuum action cleans the cell with probability 85%. Last but not least, each vacuum action cleans the cell whose floor is of type thick carpet only with probability 75%. In a single move step, the robot's movement is limited to one cell to its left, right, up, or down. The goal is to compute the strategy for the robot vacuum that would enable the robot to clean the entire grid-world with as little cost as possible under these different scenarios:

  • Scenario-1. The robot's motion is perfect and information about the types of flooring in each cell is known a priori.
  • Scenario-2. Information about the types of flooring in each cell is known a priori. However, the robot motion is not perfect: It only moves to its intended destination 85% of the time, with 5% of the time the robot ended up at the left of its intended destination, 5% of the time the robot ended up at the right of its intended destination, and 5% of the time the robot ended up not moving. If the intended destination, or the left/right of the intended destination are obstacles or outside of the operating environment, then the probability mass will be added to the robot staying put.
  • Scenario-3. The robot's motion is perfect but information about the types of flooring in each cell is not known a priori and needs to be assessed using the robot's touch sensor. This touch sensor is correct only 80% of the time, and wrongly identify the floor type to be one of the wrong floor types uniformly (10% for each wrong floor type).

Part D [25pts]: Basic MDP Solving.

  1. [10pts] Please solve Part C - Scenario-1 using the synchronous value iteration method for solving MDP off-line. To this end, please implement the method using Python. For this question, you will receive a full mark if your implementation converges to the optimal solution for m = 1 and n = 4 in under 10 minutes.
  2. [15pts] Please empirically analyse the above implementation on the problems you have defined in Part C - Scenario-1 to reveal the scalability issues of the synchronous value iteration method. You should present the empirical results and a discussion of your findings.

Part E [25pts]: Show us your creativity!

Please improve the rudimentary solver in Part D, so as to either be able to solve larger problems of Part C Scenario-1 or able to solve Scenario-3. We prefer you explore online-solver. But, we will leave this open for you to decide.

  1. [5pts] Please describe your improved solver and why you think this method will be more scalable than the method in Part D.
  2. [20pts] Please implement your proposed improvements and empirically analyse your proposed solver on the appropriate problems you have defined in Part C to reveal the improvements you have made. You should submit your code and test cases, and present the empirical results and a discussion of your findings in the report.