01-Introduction
This brief blog post covers key concepts in Reinforcement Learning. Understanding these fundamentals is essential for mastering the field. Some of the notes are drawn from *Reinforcement Learning: An Overview*.
02-Basic Components of RL Systems
Let’s get started:
- Agents: The agent is the central component of an RL system. It can be thought of as the learner or decision-maker. The agent interacts with the environment, takes actions, and receives feedback in the form of rewards or penalties.
- Environment: The environment encompasses everything outside the agent. It represents the external system or situations with which the agent interacts. The environment responds to the agent's actions and provides feedback in the form of rewards or penalties.
- State (S): The state describes the current situation or condition of the environment. It provides the agent with the information about the environment that is relevant for decision-making.
- Action (A): An action is a move or decision made by the agent that affects the environment. The agent selects actions based on its current state and its policy.
- Reward (R): A reward is a feedback signal from the environment that indicates the goodness of an action. Rewards can be positive, encouraging the agent to repeat the action, or negative, discouraging it.
- Return (G): The return is the cumulative sum of rewards that the agent expects to receive starting from a particular state and following a certain policy. It's a measure of how much total reward the agent will get over the course of an episode.
- Policy (π): The policy defines the agent's strategy for selecting actions in different states. It can be deterministic, always choosing the same action for a given state, or stochastic, selecting actions probabilistically. In a stochastic policy, the probabilities of taking each action in a given state are specified by the policy.
- Value Function (V or Q): The value function estimates the long-term value of being in a particular state or taking a specific action. It helps the agent assess the potential future rewards that can be obtained from a given state or action.
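The interaction between these components can be sketched as a simple loop. This is a minimal illustration, not from the post: the toy one-dimensional environment and the "move right" policy below are hypothetical stand-ins for a real agent and environment.

```python
# Minimal sketch of the agent-environment loop: the agent observes a
# state, selects an action via its policy, and the environment returns
# a reward and the next state. G accumulates the return.

def policy(state):
    # Toy deterministic policy: move right until reaching position 3.
    return "right" if state < 3 else "stay"

def step(state, action):
    # Toy environment: reward 1 for reaching position 3, else 0.
    next_state = state + 1 if action == "right" else state
    reward = 1 if next_state == 3 else 0
    done = next_state == 3
    return next_state, reward, done

state, G = 0, 0
done = False
while not done:
    action = policy(state)
    state, reward, done = step(state, action)
    G += reward  # return: cumulative sum of rewards over the episode
```

Each pass through the loop is one timestep: state in, action out, reward back, exactly the cycle the bullets above describe.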
03-Problem Definition
The agent's goal is to select a policy that maximizes the expected sum of rewards. In mathematical terms, this can be expressed as follows:

$$V_{\pi}(s_0) = \mathbb{E}_{p(a_0, s_1, \ldots, a_T, s_T \mid s_0, \pi)}\left[\sum_{t=0}^{T} R(s_t, a_t) \,\Big|\, s_0\right]$$

where $s_0$ is the agent's initial state, $R(s_t, a_t)$ is the reward function that the agent uses to measure the value of performing an action in a given state, and $V_{\pi}(s_0)$ is the value function for policy $\pi$ evaluated at $s_0$.
Now, let's examine the term $p(a_0, s_1, \ldots, a_T, s_T \mid s_0, \pi)$:

$$p(a_0, s_1, \ldots, a_T, s_T \mid s_0, \pi) = \prod_{t=0}^{T} \pi(a_t \mid s_t) \prod_{t=1}^{T} p(o_t \mid o_{1:t-1}, a_{t-1})\, \delta\big(s_t = U(s_{t-1}, a_{t-1}, o_t)\big) \tag{1.1}$$

where $p(o_t \mid o_{1:t-1}, a_{t-1})$ is the environment's distribution over observations (which is usually unknown).
Let’s interpret this formula (1.1) now:
- This equation explains how the probability of each trajectory is calculated: it breaks the joint probability into factors using the chain rule of probability.
- The probability of a specific trajectory depends on the policy, the environment’s dynamics and how both are combined.
- $\pi(a_t \mid s_t)$: The probability of the agent taking action $a_t$ in state $s_t$ according to the current policy $\pi$.
- $p(o_t \mid o_{1:t-1}, a_{t-1})$: The probability of the environment returning observation $o_t$ after the agent takes action $a_{t-1}$, given the history of past observations, where $o_{1:t-1}$ are all previous observations.
- $\delta\big(s_t = U(s_{t-1}, a_{t-1}, o_t)\big)$: This is a Kronecker delta function (equal to 1 if the equality is true and 0 otherwise). It ensures that the next state is equal to the deterministic result of the environment's state-update function $U$, given the current state, action, and observation. This enforces the correct transition. If you were to consider a stochastic environment, there would be no delta; instead, the transition from state $s_{t-1}$ under action $a_{t-1}$ would be probabilistic, with a distribution that the agent would have to learn.
- These equations state that a trajectory's probability is a product of probabilities:
- The action probabilities given by the policy
- The environment's transition and observation probabilities
- The delta function for the state transition
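This factorization can be checked numerically on a toy example. This is a sketch: the two-state stochastic policy, the observation model, and the deterministic update function below are all hypothetical.

```python
# Sketch: a trajectory's probability is the product of the policy's
# action probabilities and the environment's observation probabilities,
# with the deterministic state update playing the role of the delta term.

pi = {                 # stochastic policy: P(action | state)
    0: {"a": 0.7, "b": 0.3},
    1: {"a": 0.6, "b": 0.4},
}
p_obs = {              # P(observation | action); history-independent here
    "a": {"x": 0.9, "y": 0.1},
    "b": {"x": 0.5, "y": 0.5},
}

def update(state, action, obs):
    # Deterministic state-update function U(s, a, o): the delta term
    # forces the next state to equal this value with probability 1.
    return (state + 1) % 2

def trajectory_prob(s0, actions, observations):
    prob, s = 1.0, s0
    for a, o in zip(actions, observations):
        prob *= pi[s][a] * p_obs[a][o]  # policy term x observation term
        s = update(s, a, o)             # delta term: deterministic update
    return prob

p = trajectory_prob(0, ["a", "b"], ["x", "y"])
# (0.7 * 0.9) at the first step, then (0.4 * 0.5) from state 1 = 0.126
```

Because the update is deterministic, the delta term contributes no probability mass of its own; it only dictates which state the next factors are evaluated in.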
So, now, we can define the optimal policy as

$$\pi^{*} = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=0}^{T} R(s_t, a_t) \,\Big|\, s_0, \pi\right] = \arg\max_{\pi} V_{\pi}(s_0)$$
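In the simplest case, this argmax can be sketched by evaluating each candidate policy's return and keeping the best one. This is a toy illustration under assumed dynamics: the deterministic chain environment and the two candidate policies below are hypothetical.

```python
# Sketch: choose the optimal policy by computing the return achieved by
# each candidate from the initial state and taking the argmax.

def episode_return(policy, steps=3):
    # Deterministic toy chain: the reward for taking action a (0 or 1)
    # is a itself, so the return is the sum of the actions chosen.
    state, G = 0, 0
    for _ in range(steps):
        action = policy(state)
        G += action                    # reward R(s, a) = a in this toy
        state = min(state + action, 2)
    return G

candidates = {
    "always_0": lambda s: 0,
    "always_1": lambda s: 1,
}
values = {name: episode_return(p) for name, p in candidates.items()}
best = max(values, key=values.get)     # argmax over candidate policies
```

Real RL algorithms cannot enumerate policies like this; they search the policy space with gradients or value iteration, but the objective being maximized is the same.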
04-Conclusion
In this blog post, we've covered the fundamental concepts of Reinforcement Learning, including the key components like agents, environments, states, actions, and rewards. We've also explored the mathematical framework that defines the RL problem, particularly focusing on policy optimization and value functions. Understanding these basic concepts provides a solid foundation for diving deeper into advanced RL topics and applications.
- Author: Chengsheng Deng
- URL: https://chengshengddeng.com/article/basic-concept-rl
- Copyright: All articles in this blog, except where specially stated, adopt the BY-NC-SA agreement. Please indicate the source!