Reinforcement Learning (RL)
Learning is part of many of the intelligent aspects of our daily lives. We have always wondered how the weather is forecast so we can decide what to wear, or how traffic cameras recognize plate numbers to issue tickets. In this type of learning, a teacher is needed to provide the necessary prior knowledge. In the traffic-cam example, a teacher enriches the learning algorithm with a set of sample letters and numbers so it learns the pattern.
The other type of learning, which we might not be aware of, is how a baby learns to walk or to balance himself/herself in different positions. This kind of learning is called reinforcement learning. In this type of learning, we don't have a teacher or supervisor to decide the proper action to take. In the traffic-cam case, some intervention from a supervisor is required at the beginning of learning to label the numbers and letters; that is why reinforcement learning is not suited to problems like pattern recognition.
In reinforcement learning you only need to give feedback to the learning algorithm to improve its performance. For example, a cheer or reaching a destination is a reward that encourages the baby while learning to walk, and falling is a penalty. Using these rewards and penalties, the baby learns to choose the correct action for each possible position and, eventually, learns to walk!
Three variants of reinforcement learning have been developed: Monte Carlo (MC), Dynamic Programming (DP), and Temporal Difference (TD). To explain the difference, let us reuse the baby example. If the baby observes the reward or penalty directly after taking an action, the baby judges that action as good or bad, respectively; this is TD. However, if the baby takes several actions while trying to walk and a terminal reward or penalty is given only at the end, he/she has to examine the actions backward to assign credit; this is MC. Obviously, a baby in real life corresponds most closely to the TD variant. For DP, let us assume the baby has super powers: a plan of actions for learning to walk, with the rewards of those actions already known. He/she can then divide the learning problem into smaller problems for which the action to take is known.
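The TD method used later in this report is Q-learning, whose update rule is Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)). A minimal sketch of a single update step, using the same alpha = 0.2 and gamma = 0.9 as the tank below (the reward and Q-values here are made-up illustration numbers):
public class QUpdateSketch {
    public static void main(String[] args) {
        double alpha = 0.2;      // learning rate
        double gamma = 0.9;      // discount factor
        double q = 0.0;          // current estimate of Q(s, a)
        double reward = -10.0;   // e.g. the wall-hit penalty used below
        double maxNextQ = 1.5;   // max over a' of Q(s', a')
        // One temporal-difference (Q-learning) update:
        q = q + alpha * (reward + gamma * maxNextQ - q);
        System.out.println(q);   // approximately -1.73
    }
}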
Using reinforcement learning to build a tank for Robocode
A tank was coded to use RL with a lookup table (LUT). Each row of the table pairs a state, built from the first four items below, with one of the last four actions:
- X,Y position
- Heading of the robot
- Heading of the enemy robot
- Relative distance to enemy
- Move ahead
- Move backward
- Move left
- Move right
The RL variant that was used is temporal difference, specifically Q-learning.
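With 4 robot-heading buckets, 2 distance buckets, 4 enemy-heading buckets, and 48 position tiles, the table holds 4 x 2 x 4 x 48 = 1,536 states, each carrying 4 action Q-values (6,144 entries in total). Each state is addressed by a five-character key concatenating the quantised inputs. A sketch of the encoding the listing below relies on (the class and helper names here are mine, not the author's):
public class StateKeySketch {
    // One digit each for heading (0-3), distance (0-1), and enemy heading
    // (0-3), plus a zero-padded two-digit position tile (00-47).
    static String stateKey(int heading, int distance, int enemyHeading, int tile) {
        String tileString = (tile < 10) ? "0" + tile : Integer.toString(tile);
        return "" + heading + distance + enemyHeading + tileString;
    }

    public static void main(String[] args) {
        System.out.println(stateKey(2, 1, 3, 7)); // prints "21307"
    }
}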
Code: LUTImplementation.java
package rl_look_up_table;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import robocode.AdvancedRobot;
import java.io.*;
import java.util.Random;
import java.util.zip.*;
import robocode.RobocodeFileOutputStream;
public class LUTImplementation implements LUTInterface {
/*
* Constructor. (You will need to define one in your implementation)
* @param argNumInputs The number of inputs in your input vector
* @param argVariableFloor An array specifying the lowest value of each variable in the input vector.
* @param argVariableCeiling An array specifying the highest value of each of the variables in the input vector.
* The order must match the order as referred to in argVariableFloor.
*/
int argNumInputs; //4 + 2
int[] argVariableFloor; // []
int[] argVariableCeiling; // []
static HashMap<String, State> stateTable;
double Alpha = .2;
double Gamma = .9;
//int enemyEnergy_levels=4;
//8192
int robotHeading_levels= 4;
int distanceToEnemy_levels = 2;
int headingOfEnemy_levels = 4;
// int bearningToEnemy_levels = 4;
int robot_coordinates_levels = 48;
//int gunAngle_levels=4;
//int movementAngle_levels=4;
State currentState;
Action currentAction;
boolean robotBeningHitEvent;
boolean robotHitTheWallEvent;
boolean robotHitEnemeyEvent;
boolean RobotDeathEvent;
boolean WinEvent;
boolean HitByBulletEvent;
boolean BulletMissedEvent;
boolean BulletHitEvent;
int reward;
// int enemyEnergy;
int distanceToEnemy;
int headingOfEnemy;
int bearningToEnemy;
String tiledPosition;
int robotHeading;
int countQValueChanges;
double last_qvalue_changed;
boolean load; //everything is static , this variable is used to load the state table from file
int numberOfWins;
ArrayList<Integer> winsArrayList; //must be typed: onRoundEnded iterates it as int
int numberOfWallHits;
int totalReward;
ArrayList<Integer> totalRewardArrayList;
int totalRounds;
boolean terminalRewardsOnly;
boolean toExplore;
double epsilon;
boolean isOnline;
@Override
public boolean load(String argFileName) throws IOException {
throw new UnsupportedOperationException("Not supported yet.");
}
enum ActionCode {head, back, left, right};
public LUTImplementation() // int argNumInputs,
// int [] argVariableFloor,
// int [] argVariableCeiling )
{
// this.argNumInputs= argNumInputs;
// this.argVariableFloor=argVariableFloor;
// this.argVariableCeiling=argVariableCeiling;
// stateTable = new HashMap(robotHeading_levels * distanceToEnemy_levels * headingOfEnemy_levels * bearningToEnemy_levels * robot_coordinates_levels);
stateTable = new HashMap<>(robotHeading_levels * distanceToEnemy_levels * headingOfEnemy_levels * robot_coordinates_levels);
// StateRow X= new StateRow(0,0,0,0);//enemyEnergy, distanceToEnemy, headingOfEnemy, bearningToEnemy
// currentState.actions.add(new Action(0,"up"));
// currentState.actions.add(new Action(0,"down"));
// currentState.actions.add(new Action(0,"left"));
// currentState.actions.add(new Action(0,"right"));
robotBeningHitEvent = false;
robotHitEnemeyEvent = false;
robotHitTheWallEvent = false;
RobotDeathEvent = false;
WinEvent = false;
HitByBulletEvent = false;
BulletMissedEvent = false;
BulletHitEvent = false;
reward = 0;
// enemyEnergy = 0;
distanceToEnemy = 0;
headingOfEnemy = 0;
bearningToEnemy = 0;
tiledPosition = "00";
robotHeading=0;
countQValueChanges=0;
last_qvalue_changed=-1010;
load=true;
numberOfWins=0;
winsArrayList= new ArrayList<>();
numberOfWallHits=0;
totalReward=-100;
totalRewardArrayList= new ArrayList<>();
totalRounds=0;
terminalRewardsOnly=false;
toExplore=false;
epsilon=.7;
isOnline=false;
//Initialise every state in the LUT with the four possible actions, each
//starting at a Q-value of 0.
for (int i = 0; i < robotHeading_levels; i++) {
for (int j = 0; j < distanceToEnemy_levels; j++) {
for (int k = 0; k < headingOfEnemy_levels; k++) {
// for (int l = 0; l < bearningToEnemy_levels; l++) {
for (int m = 0; m < robot_coordinates_levels; m++) {
//Zero-pad the tile index so every key is exactly five characters.
String tile = (m < 10) ? "0" + m : Integer.toString(m);
State state = new State();
state.key = "" + i + j + k + tile;
state.actions.add(new Action(0, ActionCode.head));
state.actions.add(new Action(0, ActionCode.back));
state.actions.add(new Action(0, ActionCode.left));
state.actions.add(new Action(0, ActionCode.right));
stateTable.put(state.key, state);
}
}
}
}
//Start from an arbitrary state; learn() observes the real state each step.
currentState = (State) stateTable.get("00000");
}
//One learning step (reconstructed opening; the published listing was
//truncated here): choose an action epsilon-greedily, take it, compute the
//reward, observe s', and apply the Q-learning update.
public void learn(AdvancedRobot robot) {
//1.choose action a for the current state s (epsilon-greedy)
if (toExplore && Math.random() < epsilon) {
currentAction = anyAction(currentState);
} else {
currentAction = maxAction(currentState);
}
Action lowestAction = minQ(currentState);
System.out.println("Current State:"+ currentState.key+", Action: "+ currentAction.actionCode+", QValue: "+currentAction.QValue+", Don't go "+ lowestAction.actionCode+ ", with Qvalue:"+lowestAction.QValue);
//2.take action a, compute reward, observe s'
LUTImplementation.ActionCode actionCode = currentAction.actionCode;
if (actionCode==ActionCode.head) {
robot.ahead(100);
// robot.turnRadarRight(360);
}
if (actionCode==ActionCode.left) {
robot.setTurnLeft(90);
robot.setAhead(100);
robot.execute();
// robot.turnRadarRight(360);
}
if (actionCode==ActionCode.back) {
robot.ahead(-100);
// robot.turnRadarRight(360);
}
if (actionCode==ActionCode.right) {
robot.setTurnRight(90);
robot.setAhead(100);
robot.execute();
// robot.turnRadarRight(360);
}
if(!terminalRewardsOnly)
{
reward=-1;
// System.out.println("Action: "+currentAction.ActionName+ ", QValue: "+ currentAction.QValue);
//2.2 compute reward
//Select next state and give it in the MaxQ function
if (robotBeningHitEvent) {
robotBeningHitEvent = false;
} else if (robotHitEnemeyEvent) {
robotHitEnemeyEvent = false;
} else if (RobotDeathEvent) {
RobotDeathEvent = false;
} else if (WinEvent) {
WinEvent = false;
} else if (HitByBulletEvent) {
reward = -8;
HitByBulletEvent = false;
System.out.println("[HitByBulletEvent]Reward+="+Integer.toString(reward));
} else if (BulletMissedEvent) {
reward = -1;
BulletMissedEvent = false;
System.out.println("[BulletMissedEvent]Reward+="+Integer.toString(reward));
} else if (BulletHitEvent) {
reward = 8;
BulletHitEvent = false;
System.out.println("[BulletHitEvent]Reward+="+Integer.toString(reward));
} else if (robotHitTheWallEvent) {
reward = -10;
robotHitTheWallEvent = false;
System.out.println("[robotHitTheWallEvent]Reward+="+Integer.toString(reward));
}
totalReward+=reward;
System.out.println("Total Reward"+totalReward);
}else
{
if (WinEvent) {
reward = 10;
WinEvent = false;
} else
if (RobotDeathEvent) {
reward = -10;
RobotDeathEvent = false;
}
totalReward+=reward;
System.out.println("Total Reward"+totalReward);
}
//2.3 compute s'
this.robotHeading = fixHeading(robot.getHeading());
this.tiledPosition = fixPosition(robot.getX(), robot.getY());
String key = Integer.toString(robotHeading)
+ Integer.toString(distanceToEnemy)
+ Integer.toString(headingOfEnemy)
+ tiledPosition;
State nextState = (State) stateTable.get(key);
//3.Q(s,a)= Q(s,a) + alpha * (R(s,a) + gamma * Max(s', a') - Q(s,a))
Action maxedFutureAction = maxAction(nextState);
double maxQ = maxedFutureAction.QValue;
//The declaration of maxedAction is missing from the published listing; it is
//reconstructed here: update the greedy action of s by default, or the action
//actually taken when learning online.
Action maxedAction = maxAction(currentState);
if (isOnline)
maxedAction = currentAction;
maxedAction.QValue = maxedAction.QValue + Alpha * (reward + Gamma * maxQ - maxedAction.QValue); // q-learning
if(maxedAction.QValue!=0.0)
countQValueChanges++;
last_qvalue_changed=maxedAction.QValue;
//update the state
int index = currentState.actions.indexOf(maxedAction);
currentState.actions.set(index, maxedAction);
// save the state in the LUT
stateTable.put(currentState.key, currentState);
//s<---s'
currentState = nextState;
}
//Quantise a heading in [0, 360) into four quadrant buckets 0-3.
private int fixHeading(double heading) {
if (heading >= 0 && heading < 90) // >= 0: a heading of exactly 0 belongs here
return 0;
else if (heading>=90 && heading <180)
return 1;
else if (heading>=180 && heading <270)
return 2;
else
return 3;
}
//Map (x, y) to one of 48 tiles: 100x100-pixel cells, eight columns per row
//(this assumes the default 800x600 battlefield), zero-padded to two digits.
private String fixPosition(double x, double y)
{
int tile = (int) (Math.floor(x / 100) + (8 * Math.floor(y / 100)));
return (tile < 10) ? "0" + Integer.toString(tile) : Integer.toString(tile);
}
//Return the action with the lowest Q-value in this state (used only for the
//diagnostic print in learn()).
Action minQ(State state) {
Action lowestQAction = state.actions.get(0);
double minQ = 100000000;
for (Action action : state.actions) {
if (action.QValue < minQ) {
minQ = action.QValue;
lowestQAction = action;
}
}
return lowestQAction;
}
//Return the greedy action: the one with the highest Q-value in this state.
Action maxAction(State state) {
Action maxedQAction = state.actions.get(0);
double maxQ = -10000000;
for (Action action : state.actions) {
if (action.QValue > maxQ) {
maxQ = action.QValue;
maxedQAction = action;
}
}
return maxedQAction;
}
//Return a uniformly random action, for exploration. The original used
//nextInt(3), which could never select the fourth action.
Action anyAction(State state) {
Random generator = new Random();
int i = generator.nextInt(state.actions.size());
return state.actions.get(i);
}
/*
* My implementation
*/
public int indexFor(StateRow X) {
Key key = new Key(X);
return key.hashCode();
}
/*
* A helper method that translates a vector being used to index the look up table
* into an ordinal that can then be used to access the associated look up table element.
* @param X The state action vector used to index the LUT
* @return The index where this vector maps to
*/
public int indexFor(double[] X) {
return 1;
}
/*
* @param X The input vector. An array of doubles.
* @return The value returned by the LUT or NN for this input vector
*/
public double outputFor(StateRow X) {
return 1.0;
}
/*
*
* This method will tell the NN or the LUT the output
* value that should be mapped to the given input vector. I.e.
* the desired correct output value for an input.
* @param X The input vector
* @param argValue The new value to learn
* @return The error in the output for that input vector
*/
@Override
public double train(double[] X, double argValue) {
return 1.0;
}
/*
* A method to write either a LUT or the weights of a neural net to a file.
* @param argFile of type File.
*/
@Override
public void save(File argFile) {
}
/**
* The Matcher.matches method attempts to match the *entire* input to the
* given pattern all at once.
*/
private static boolean matchAll(String aText) {
// log(fNEW_LINE + "Match ALL:");
String fNEW_LINE = System.getProperty("line.separator");
String fREGEXP
= "\\d*\\.\\d* \\w+" + fNEW_LINE;
Pattern pattern = Pattern.compile(fREGEXP, Pattern.COMMENTS);
Matcher matcher = pattern.matcher(aText);
if (matcher.matches()) {
// log("Num groups: " + matcher.groupCount());
// log("Package: " + matcher.group(1));
// log("Class: " + matcher.group(2));
return true;
} else {
// log("Input does not match pattern.");
return false;
}
}
private String readFile(String file) throws IOException {
StringBuilder stringBuilder = new StringBuilder();
String ls = System.getProperty("line.separator");
//try-with-resources closes the reader; the original leaked it and silently
//swallowed any IOException.
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
String line;
while ((line = reader.readLine()) != null) {
stringBuilder.append(line);
stringBuilder.append(ls);
}
} catch (IOException e) {
System.err.println("Problem reading " + file + ": " + e);
}
return stringBuilder.toString();
}
@Override
public double outputFor(double[] X) {
throw new UnsupportedOperationException("Not supported yet.");
}
}
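fixPosition above tiles the battlefield into 100 x 100-pixel cells, eight columns per row, which on the default 800 x 600 field gives 8 x 6 = 48 tiles, matching robot_coordinates_levels. A quick check of the mapping (the class name TileCheck and the sample coordinates are mine):
public class TileCheck {
    public static void main(String[] args) {
        double x = 250, y = 350;
        // floor(250/100) + 8 * floor(350/100) = 2 + 24 = 26
        int tile = (int) (Math.floor(x / 100) + 8 * Math.floor(y / 100));
        System.out.println(tile); // 26: column 2, row 3 of the 8x6 grid
    }
}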
Code: RobotWithRLWithLUT.java
package rl_look_up_table;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStreamWriter;
import java.io.Serializable;
import java.io.Writer;
import java.text.DecimalFormat;
import java.util.Vector;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import robocode.*;
import robocode.control.events.BattlePausedEvent;
import robocode.control.events.RoundStartedEvent;
import java.io.*;
import java.util.ArrayList;
import java.util.zip.*;
import rl_look_up_table.LUTImplementation.ActionCode;
import robocode.control.BattleSpecification;
import robocode.control.RobotResults;
import robocode.control.events.BattleCompletedEvent;
import robocode.control.events.BattleStartedEvent;
public class RobotWithRLWithLUT extends AdvancedRobot {
static LUTImplementation RL_LUT = new LUTImplementation();
// String fileName="C:\\Users\\Owner\\Google Drive\\ece592\\robots\\2.a\\RL_look_up_table\\build\\classes\\rl_look_up_table\\LUT.txt";
@Override
public void run() {
// load LUT from a file if it exists
RL_LUT.terminalRewardsOnly=false;
RL_LUT.toExplore=true;
RL_LUT.epsilon=.1;
RL_LUT.isOnline=true;
// if(RL_LUT.load)
// load();
while (true) {
RL_LUT.learn(this);
// System.out.println("Learn");
}
}
@Override
public void onScannedRobot(ScannedRobotEvent e) {
RL_LUT.distanceToEnemy=fixDistance(e.getDistance());
RL_LUT.headingOfEnemy=fixHeading(e.getHeading());
//I am updating my robot info from the learning class before observing the next state
// RL_LUT.bearningToEnemy=fixBearing(e.getBearing());
// RL_LUT.tiledPosition=fixPosition(this.getX(), this.getY());
// RL_LUT.robotHeading=fixHeading(this.getHeading());
fire(1);
}
@Override
public void onRobotDeath (RobotDeathEvent event) {
RL_LUT.RobotDeathEvent=true;
//save the LUT to a file
// RL_LUT.save_LUT(Filename);
}
@Override
public void onWin (WinEvent event) {
RL_LUT.WinEvent=true;
//for Matlab
RL_LUT.numberOfWins++;
}
@Override
public void onHitByBullet (HitByBulletEvent e) {
RL_LUT.HitByBulletEvent=true;
// RL_LUT.tiledPosition= fixPosition(this.getX(), this.getY());
// RL_LUT.robotHeading=fixHeading(this.getHeading());
// RL_LUT.bearningToEnemy=fixBearing(e.getBearing());
}
@Override
public void onBulletMissed (BulletMissedEvent e) {
RL_LUT.BulletMissedEvent=true;
// RL_LUT.tiledPosition= fixPosition(this.getX(), this.getY());
// RL_LUT.robotHeading=fixHeading(this.getHeading());
}
@Override
public void onBulletHit (BulletHitEvent e) {
RL_LUT.BulletHitEvent=true;
// RL_LUT.tiledPosition= fixPosition(this.getX(), this.getY());
// RL_LUT.robotHeading=fixHeading(this.getHeading());
}
@Override
public void onHitRobot (HitRobotEvent e) {
if(e.isMyFault())
RL_LUT.robotHitEnemeyEvent=true;
else
RL_LUT.robotBeningHitEvent=true;
// RL_LUT.tiledPosition= fixPosition(this.getX(), this.getY());
// RL_LUT.robotHeading=fixHeading(this.getHeading());
// RL_LUT.bearningToEnemy= fixBearing(e.getBearing());
}
@Override
public void onHitWall(HitWallEvent event) {
RL_LUT.robotHitTheWallEvent=true;
// RL_LUT.tiledPosition= fixPosition(this.getX(), this.getY());
// RL_LUT.robotHeading=fixHeading(this.getHeading());
RL_LUT.numberOfWallHits++;
}
void load()
{
RL_LUT.load = false;
try
{
ZipInputStream zipin = new ZipInputStream(new
FileInputStream(getDataFile("zipped.zip")));
zipin.getNextEntry();
ObjectInputStream in = new ObjectInputStream(zipin);
try {
while (true) {
//Each record is "key qValue actionOrdinal".
String obj = (String) in.readObject();
String[] parts = obj.split(" ");
State savedState = (State) RL_LUT.stateTable.get(parts[0]);
LUTImplementation.ActionCode savedCode =
LUTImplementation.ActionCode.values()[Integer.parseInt(parts[2])];
//Restore the Q-value on the matching action of the pre-built state.
//(The original put() replaced the whole state with a one-action copy,
//dropping the other three actions.)
for (Action action : savedState.actions) {
if (action.actionCode == savedCode) {
action.QValue = Double.parseDouble(parts[1]);
}
}
}
} catch (EOFException eof) {
//readObject() signals end-of-stream with EOFException, not null.
}
in.close();
}
catch (FileNotFoundException e)
{
System.out.println("File not found!");
}
catch (IOException e)
{
System.err.println("Problem reading from the file zipped.zip");
}
catch (ClassNotFoundException e)
{
System.out.println("Class not found! :-(");
e.printStackTrace();
}
}
@Override
public void onRoundEnded(RoundEndedEvent event)
{
RL_LUT.totalRounds++;
System.out.println("Number of wall hits: "+ RL_LUT.numberOfWallHits);
RL_LUT.numberOfWallHits=0;
if(this.getRoundNum()%20==0 && this.getRoundNum()>0)
{
RL_LUT.winsArrayList.add(RL_LUT.numberOfWins);
RL_LUT.numberOfWins=0;
RL_LUT.totalRewardArrayList.add(RL_LUT.totalReward);
RL_LUT.totalReward=0;
// saveWinsAndTotalReward("round_ended");
// save_LUT();
}
}
@Override
public void onBattleEnded(BattleEndedEvent event)
{
// save_LUT();
//for matlab
saveWinsAndTotalReward(Integer.toString(RL_LUT.totalRounds)+"_"+RL_LUT.epsilon);
}
//Quantise the enemy's energy (unused: enemyEnergy was dropped from the
//state). The comparison was garbled in the extracted source; the threshold
//below is an assumption that merely makes the method compile.
private int fixEnergey(double energy) {
int temp = 0;
if (energy >= 3)
temp = 1;
return temp;
}
//Same quantisation as in LUTImplementation.
private int fixHeading(double heading) {
if (heading >= 0 && heading < 90) // >= 0: a heading of exactly 0 belongs here
return 0;
else if (heading>=90 && heading <180)
return 1;
else if (heading>=180 && heading <270)
return 2;
else
return 3;
}
//Quantise a bearing in [-180, 180] into four buckets 0-3.
private int fixBearing(double bearing) {
if (bearing >= 0 && bearing < 90)
return 0;
else if (bearing >= 90 && bearing <= 180) // <= 180 so the boundary is covered
return 1;
else if (bearing>=-180 && bearing <-90)
return 2;
else
return 3;
}
//Same battlefield tiling as in LUTImplementation: 100x100-pixel cells, eight
//columns per row, zero-padded to two digits.
private String fixPosition(double x, double y)
{
int tile = (int) (Math.floor(x / 100) + (8 * Math.floor(y / 100)));
return (tile < 10) ? "0" + Integer.toString(tile) : Integer.toString(tile);
}
public Object readCompressedObject(String filename)
{
try
{
ZipInputStream zipin = new ZipInputStream(new
FileInputStream(getDataFile(filename + ".zip")));
zipin.getNextEntry();
ObjectInputStream in = new ObjectInputStream(zipin);
Object obj = in.readObject();
in.close();
return obj;
}
catch (FileNotFoundException e)
{
System.out.println("File not found!");
}
catch (IOException e)
{
System.out.println("I/O Exception");
}
catch (ClassNotFoundException e)
{
System.out.println("Class not found! :-(");
e.printStackTrace();
}
return null; //could not get the object
}
public void writeObject(Serializable obj, String filename)
{
try
{
File file = getDataFile(filename + ".zip");
ZipOutputStream zipout = new ZipOutputStream(
new RobocodeFileOutputStream(file)); // was "filename", which ignored the data file
zipout.putNextEntry(new ZipEntry(filename));
ObjectOutputStream out = new ObjectOutputStream(zipout);
out.writeObject(obj);
out.flush();
zipout.closeEntry();
out.close();
}
catch (IOException e)
{
System.out.println("Error writing Object:" + e);
}
}
private void save_LUT() {
try
{
File f = getDataFile("LUT");
RobocodeFileOutputStream is = new RobocodeFileOutputStream(f);
OutputStreamWriter osw = new OutputStreamWriter(is);
BufferedWriter w = new BufferedWriter(osw);
//zipped
File file=getDataFile("zipped" + ".zip");
ZipOutputStream zipout = new ZipOutputStream(
new RobocodeFileOutputStream(file));
zipout.putNextEntry(new ZipEntry("zipped"));
ObjectOutputStream out = new ObjectOutputStream(zipout);
for (int i = 0; i < RL_LUT.robotHeading_levels; i++) {
for (int j = 0; j < RL_LUT.distanceToEnemy_levels; j++) {
for (int k = 0; k < RL_LUT.headingOfEnemy_levels; k++) {
// for (int l = 0; l < RL_LUT.bearningToEnemy_levels; l++) {
for (int m = 0; m < RL_LUT.robot_coordinates_levels; m++) {
//fix m
String distanceString;
if(m<10)
distanceString="0"+Integer.toString(m);
else
distanceString=Integer.toString(m);
// Key key = new Key(new StateRow(i, j, k, l, m));
String key = Integer.toString(i)
+ Integer.toString(j)
+ Integer.toString(k)
// + Integer.toString(l)
+ distanceString;
if(key.length()<5)
{
System.out.println("Key is less than 5 digits");
}
State state = (State) RL_LUT.stateTable.get(key);
for (Action action : state.actions) {
// writeObject((state.key+" "+action.QValue + " " + action.ActionName),Filename);
if(action.QValue!=0.0)
{
DecimalFormat myFormatter = new DecimalFormat("0.000");
String output = myFormatter.format(action.QValue);
w.write(state.key+" "+output+ " "+action.actionCode.ordinal());
w.newLine();
//zipped
out.writeObject(state.key+" "+output+" "+ action.actionCode.ordinal());
}
}
}
}
}
// }
}
//zipped
out.flush();
zipout.closeEntry();
out.close();
w.close();
}
catch (Exception e)
{
System.out.print(e.toString());
}
}
public void saveWinsAndTotalReward (String fileName) {
try {
//What ever the file path is.
// File argFile = new File(fileName);
File argFile = getDataFile(fileName+".wins.txt");
RobocodeFileOutputStream is = new RobocodeFileOutputStream(argFile);
OutputStreamWriter osw = new OutputStreamWriter(is);
BufferedWriter w = new BufferedWriter(osw); // declared as BufferedWriter so newLine() needs no cast
for (int x : RL_LUT.winsArrayList)
{
w.write(Integer.toString(x));
w.newLine();
}
w.close();
argFile = getDataFile(fileName+".rewards.txt");
is = new RobocodeFileOutputStream(argFile);
osw = new OutputStreamWriter(is);
w = new BufferedWriter(osw);
for (int y: RL_LUT.totalRewardArrayList)
{
w.write(Integer.toString(y));
w.newLine();
}
w.close();
} catch (IOException e) {
System.err.println("Problem writing to the file wins.txt");
}
}
}
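The listings reference State and Action helper classes (plus StateRow, Key, and the course-scaffolding LUTInterface, which are used only by stubs and are not reconstructed here) that were not included in this report. From their usage (State.key, State.actions, Action.QValue, Action.actionCode, and the Action(double, ActionCode) constructor), they would look roughly like the sketch below; this is a reconstruction from usage, not the author's original source:
package rl_look_up_table;

import java.util.ArrayList;

// Reconstructed from how the listings above use these classes; the exact
// original definitions were not published.
class State {
    String key;                                    // e.g. "21307"
    ArrayList<Action> actions = new ArrayList<>(); // one entry per move
}

class Action {
    double QValue;
    LUTImplementation.ActionCode actionCode;

    Action(double qValue, LUTImplementation.ActionCode actionCode) {
        this.QValue = qValue;
        this.actionCode = actionCode;
    }
}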