Do My CS 452 Artificial Intelligence Assignment using Java
Question - Articial Intelligence (CS 452), Fall 2013
Assignment 05 (100 points)
Due by 10:00 AM, Saturday, 14 December For this assignment, you will be doing some reinforcement learning experiments in the domain of a
simplied version of the video game Tetris. In this game, there are 5 types of blocks; each appears
on-screen as a dierently colored ob ject, and each is encoded using a 2-dimensional integer array.
The types of blocks and their corresponding arrays are as shown in Figure 1. Figure 1: Each block is of a dierent type (from
1to 5) and corresponds to an integer array.
In this game, a player is given a certain number of blocks to begin with (the value is set to 50in the
distributed code). They play by choosing actions. Each action is encod ...Read More ed using a pair of integers
(in a 2-element array), fm; r g, where 0 m 5 is the number of squares to the right that we move
the block (some blocks can only move 4, but that is taken care of by the game itself anyway), and
0 r 3 is the number of rotations of the block (each rotation moves it 90 degrees clockwise).
Once the player has chosen an action, the block drops down the screen and the next block is in
play; blocks are generated randomly, and the player can see the current block and the next one, at
the top of the screen.
As in regular Tetris, each time a row is completely lled on-screen, those blocks disappear. In
addition, whenever any column of blocks gets too high (greater than 3 blocks high) the entire set of
blocks is pushed down, and the bottom row(s) disappear permanently; this way, the screen never
lls up, and the game can always continue until all the blocks are used up. A player begins with a
number of points equal to the total block count for the game (again, 50 in the distributed code).
Each time a row is pushed down, the player loses a point. Thus, the best possible score is equal to
the starting score, and can only be achieved by constantly lling rows and eliminating them before
the columns get too high.
As the game goes on, the screen will ll with blocks. Again, the game-board is encoded using a
2-dimensional array of integers, corresponding to the colors of the various blocks. As shown in
Figure 2, the state of play of the game at any point will be an array (5 rows by 6 columns) with
corresponding to a blank space, and other values (from 1to 5) consisting of colored blocks. Figure 2: One state of the game board area, and the array that encodes it.
The game is written so that at each step, it gets an action from the current player, places the block,
and updates the screen:
Actions are returned by the chooseAction()method in theAgentclass.
After an action happens, the player is given a full copy of the current block, the next block
to be played, and the full game-board array, via the Agentmethod getFullState() .
Based on that state-transition, the player receives a reward via the Agentmethod getReward() .
There are two types of agents that are already written; one acts purely at random, and does very
poorly; the other uses a simple xed policy, and does much better. Your job is to write code for
the third type of agent, which should learn how to play the game in a better fashion (initially, it
just acts randomly, too).
To do this assignment, you will do the following:
Decide upon your reinforcement learning algorithm. You can use any of the methods we
learned about, or use more than one in dierent versions of your program.
Decide upon the agent's state representation. As described, the agent currently gets access
to all details about the blocks and game board. If you try a simple learning algorithm with
such a detailed representation, it will be very slow to learn, since it will treat every dierent
set of colored blocks on-screen as a dierent state, even if they have the same basic shape, for
instance. You will certainly want to somehow simplify the details of the state that the agent
uses when learning its value-function and policy.
Decide upon a reward function. Currently, the reward given by the system, which is returned
by the method getDropReward() in thePlayGame class, is always 0. You should change this
so that a dierent reward is generated, depending upon things like whether the agent has
lled and eliminated a row (a good thing), or if a column has become too high, resulting in
a loss of points (a bad thing). There may be other factors in the reward; this is up to you.
Once you have implemented your learning method, you will need to test it. To do so, you will
need to alter the parameters in the PlayGameclass that control the number of blocks per game
( learningSteps ), and the number of total games ( numRuns). You should set the rst of these
to some reasonably high value (say 2500 or so), so that each game is quite long; you should set
the number of games quite high as well, since your learning agent will need some time to do its
learning. You should also make sure that the game runs very fast when learning; you can adjust
this speed by setting the speedvariable. When testing your code, you can set this value to some
high amount, like 200, which will run the game slowly, so that you can see what it does, but while
learning you will want a low value, like 0, so that it runs as fast as possible. Even at high speed,
learning may take several hours, so you will want to get your code ready, set it up to do some tests,
and let it run for a long time in order to see how it does.
If, by the end of the learning, the agent is not doing very well, you should consider adjusting
some of the details of what you tried. Perhaps you need to run longer, reducing randomness in
your -greedy policy more slowly. Perhaps you need a dierent state representation or reward.
A successful assignment will likely try more than one set of experiments, and report on dierent
approaches tried. You will hand in source-code for the program. In addition, you will hand in a brief report (3 or more
pages) that describes the experiments you ran. In that report you should describe the algorithm
you used, along with any parameters it employed. You should also describe the state-representation
you chose, along with the reward function, in detail. Finally, you should report your results. That
is, how well did the learning work? If you tried dierent methods to make it better, what did you
try? In this section of your report, you should also include one or more graphs of the progress of the
learning algorithm. The code, as written, will record the score for each game played, in a text-le.
You can use this data to plot the progress of learning, which should show the score value trending
upwards over time, given enough runs. (This data is in simple format, and can easily be imported
for use in a program like Excel, or other graphing software, in order to produce your graphs.)
Important Note: Whenever you start a new agent playing the game, any existing data-le wil l
be erased! Make sure that if you do dierent experiments, you move your old data-le to a new
directory for safe-keeping before generating a new one, so you still have records of your older work.
Submit the work by email. Your email should include an archive containing the source code, along
with the report you wrote.
3 ...Read Less
Solution Preview - No Solution Preview Available