01-Introduction
This brief blog post covers key concepts in Reinforcement Learning. Understanding these fundamentals is essential for mastering the field. Some of the notes are drawn from *Reinforcement Learning: An Overview*.
02-Basic Components of RL Systems
Let’s get started:
- Agents: The agent is the central component of an RL system. It can be thought of as the learner or decision-maker. The agent interacts with the environment, takes actions, and receives feedback in the form of rewards or penalties.
- Environment: The environment encompasses everything outside the agent. It represents the external system or situations with which the agent interacts. The environment responds to the agent's actions and provides feedback in the form of rewards or penalties.
- State (S): The state describes the current situation or condition of the environment. It provides the agent with the information about the environment that is relevant for decision-making.
- Action (A): An action is a move or decision made by the agent that affects the environment. The agent selects actions based on its current state and its policy.
- Reward (R): A reward is a feedback signal from the environment that indicates the goodness of an action. Rewards can be positive, encouraging the agent to repeat the action, or negative, discouraging it.
- Return (G): The return is the cumulative sum of rewards that the agent expects to receive starting from a particular state and following a certain policy. It's a measure of how much total reward the agent will get over the course of an episode.
- Policy (π): The policy defines the agent's strategy for selecting actions in different states. It can be deterministic, always choosing the same action for a given state, or stochastic, selecting actions probabilistically. In a stochastic policy, the probabilities of taking each action in a given state are specified by the policy.
- Value Function (V or Q): The value function estimates the long-term value of being in a particular state or taking a specific action. It helps the agent assess the potential future rewards that can be obtained from a given state or action.
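The interaction between these components can be sketched as a simple loop. This is a minimal illustration, not from the post: the toy one-dimensional environment and the "move right" policy below are hypothetical stand-ins for a real agent and environment.

```python
# Minimal sketch of the agent-environment loop: the agent observes a
# state, selects an action via its policy, and the environment returns
# a reward and the next state. G accumulates the return.

def policy(state):
    # Toy deterministic policy: move right until reaching position 3.
    return "right" if state < 3 else "stay"

def step(state, action):
    # Toy environment: reward 1 for reaching position 3, else 0.
    next_state = state + 1 if action == "right" else state
    reward = 1 if next_state == 3 else 0
    done = next_state == 3
    return next_state, reward, done

state, G = 0, 0
done = False
while not done:
    action = policy(state)
    state, reward, done = step(state, action)
    G += reward  # return: cumulative sum of rewards over the episode
```

Each pass through the loop is one timestep: state in, action out, reward back, exactly the cycle the bullets above describe.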
03-Problem Definition
The agent's goal is to select a policy that maximizes the expected sum of rewards. In mathematical terms, this can be expressed as follows:

$$V_{\pi}(s_0) = \mathbb{E}_{p(a_0, s_1, \ldots, a_T, s_T \mid s_0, \pi)}\left[\sum_{t=0}^{T} R(s_t, a_t) \,\Big|\, s_0\right]$$

where $s_0$ is the agent's initial state, $R(s_t, a_t)$ is the reward function that the agent uses to measure the value of performing an action in a given state, and $V_{\pi}(s_0)$ is the value function for policy $\pi$ evaluated at $s_0$.
Now, let's examine the term $p(a_0, s_1, \ldots, a_T, s_T \mid s_0, \pi)$:

$$p(a_0, s_1, \ldots, a_T, s_T \mid s_0, \pi) = \prod_{t=0}^{T} \pi(a_t \mid s_t) \prod_{t=1}^{T} p(o_t \mid o_{1:t-1}, a_{t-1})\, \delta\big(s_t = U(s_{t-1}, a_{t-1}, o_t)\big) \tag{1.1}$$

where $p(o_t \mid o_{1:t-1}, a_{t-1})$ is the environment's distribution over observations (which is usually unknown).
Let’s interpret this formula (1.1) now:
- This equation explains how the probability of each trajectory is calculated: it breaks the joint probability into factors using the chain rule of probability.
- The probability of a specific trajectory depends on the policy, the environment’s dynamics and how both are combined.
- $\pi(a_t \mid s_t)$: The probability of the agent taking action $a_t$ in state $s_t$ according to the current policy $\pi$.
- $p(o_t \mid o_{1:t-1}, a_{t-1})$: The probability of the environment returning observation $o_t$ after the agent takes action $a_{t-1}$, given the history of past observations, where $o_{1:t-1}$ are all previous observations.
- $\delta\big(s_t = U(s_{t-1}, a_{t-1}, o_t)\big)$: This is a Kronecker delta function (equal to 1 if the equality is true and 0 otherwise). It ensures that the next state is equal to the deterministic result of the environment's state-update function $U$, given the current state, action, and observation. This enforces the correct transition. If you were to consider a stochastic environment, there would be no delta; instead, the transition from state $s_{t-1}$ under action $a_{t-1}$ would be probabilistic, with a distribution that the agent would have to learn.
- These equations state that a trajectory's probability is a product of probabilities:
- The action probabilities given by the policy
- The environment's transition and observation probabilities
- The delta function for the state transition
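This factorization can be checked numerically on a toy example. This is a sketch: the two-state stochastic policy, the observation model, and the deterministic update function below are all hypothetical.

```python
# Sketch: a trajectory's probability is the product of the policy's
# action probabilities and the environment's observation probabilities,
# with the deterministic state update playing the role of the delta term.

pi = {                 # stochastic policy: P(action | state)
    0: {"a": 0.7, "b": 0.3},
    1: {"a": 0.6, "b": 0.4},
}
p_obs = {              # P(observation | action); history-independent here
    "a": {"x": 0.9, "y": 0.1},
    "b": {"x": 0.5, "y": 0.5},
}

def update(state, action, obs):
    # Deterministic state-update function U(s, a, o): the delta term
    # forces the next state to equal this value with probability 1.
    return (state + 1) % 2

def trajectory_prob(s0, actions, observations):
    prob, s = 1.0, s0
    for a, o in zip(actions, observations):
        prob *= pi[s][a] * p_obs[a][o]  # policy term x observation term
        s = update(s, a, o)             # delta term: deterministic update
    return prob

p = trajectory_prob(0, ["a", "b"], ["x", "y"])
# (0.7 * 0.9) at the first step, then (0.4 * 0.5) from state 1 = 0.126
```

Because the update is deterministic, the delta term contributes no probability mass of its own; it only dictates which state the next factors are evaluated in.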
So, now, we can define the optimal policy as

$$\pi^{*} = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=0}^{T} R(s_t, a_t) \,\Big|\, s_0, \pi\right] = \arg\max_{\pi} V_{\pi}(s_0)$$
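In the simplest case, this argmax can be sketched by evaluating each candidate policy's return and keeping the best one. This is a toy illustration under assumed dynamics: the deterministic chain environment and the two candidate policies below are hypothetical.

```python
# Sketch: choose the optimal policy by computing the return achieved by
# each candidate from the initial state and taking the argmax.

def episode_return(policy, steps=3):
    # Deterministic toy chain: the reward for taking action a (0 or 1)
    # is a itself, so the return is the sum of the actions chosen.
    state, G = 0, 0
    for _ in range(steps):
        action = policy(state)
        G += action                    # reward R(s, a) = a in this toy
        state = min(state + action, 2)
    return G

candidates = {
    "always_0": lambda s: 0,
    "always_1": lambda s: 1,
}
values = {name: episode_return(p) for name, p in candidates.items()}
best = max(values, key=values.get)     # argmax over candidate policies
```

Real RL algorithms cannot enumerate policies like this; they search the policy space with gradients or value iteration, but the objective being maximized is the same.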
04-Conclusion
In this blog post, we've covered the fundamental concepts of Reinforcement Learning, including the key components like agents, environments, states, actions, and rewards. We've also explored the mathematical framework that defines the RL problem, particularly focusing on policy optimization and value functions. Understanding these basic concepts provides a solid foundation for diving deeper into advanced RL topics and applications.
- Author: Chengsheng Deng
- URL: https://chengshengddeng.com/article/basic-concept-rl
- Copyright: All articles in this blog, except where specially stated, adopt the BY-NC-SA agreement. Please indicate the source!