# Intelligent Systems - Summary Notes

[TOC]

## 1. Intelligence vs Autonomy

What is an "Intelligent System"?

### What is Intelligence?

No commonly agreed definition; it is a moving target.

Ideas for defining it:
- Memory
- Decision-making
- Creativity
- Awareness
- Common sense
- Self-reflection
- Ability to learn
- Handling new situations

Categories of definitions:
- social
- emotional
- senso-motoric: e.g. a great tennis player can do cool tricks with a tennis ball
- mental

=> human intelligence as the starting point

#### What is Turing's Definition of Intelligence?

If we can not **distinguish** what a **computer** does from what a **human** does, then the computer is intelligent.
=> we still use this definition in the tech fields, e.g. AlphaGO, Turing++.

Weak AI (mimics intelligence) >< strong AI (actually thinks)

### What are the different AI perspectives?

#### Knowledge-based AI

Intelligence means being able to remember many things and process them to output knowledge,
+ manipulate symbols (understand a language).

#### Behavior-based AI

Intelligence from elementary behaviors, e.g. navigating a maze, grasping an object.
**Senso-motoric** intelligence. Physical grounding hypothesis (a similar grounding idea).

#### Connectionism (neural networks)

Intelligence is in the **connections** between the small units (neurons) -> ANN

#### Distributed AI

**Interaction** = Intelligence.
Interaction at the signal level is connectionism >< interaction at the knowledge level (**higher** level) = Distributed AI
(+) **robustness**: agents can handle each other's tasks

## 2. Expert Systems

How to construct an Intelligent System?

### What are they?

_DEF:_ a computer system that emulates the **decision-making** ability of a human expert.
E.g. building a molecular structure based on a spectral graph.
We build it by extracting knowledge from experts in order to implement it in a system = Knowledge Engineering, closely tied to Expert Systems.
It is popular because it is proven to work and provable (mathematically) >< machine learning -> trustworthy.

#### Comparison with other systems

Expert Systems = ancestor of Intelligent Systems.
Intelligent systems: + autonomous, + interaction with the world.

**Decision Support System**: its goal is to provide **info** and **tools** to help the user make a decision >< an E.S. suggests a decision to the user (reasoning over data).

### How do they work?

#### Components

- non-expert **user**
- **inference engine**: handles facts and queries from the users. The fixed part.
- **rules**: how the environment works, as if-then statements. Gathered from experts.
- **facts** specific to a query: data about specific cases.

By **law**, E.S. are obliged to provide an argument for their recommendation/decision. Interfacing must be clear.

---

#### Probabilistic reasoning

##### Bayes Rule for classification

**Bayesian inference**: the main reasoning technique applied to probabilistic E.S. (vs non-probabilistic ones).
**Marginalizing** = the probability of one event, summed over the values of the other variables.
**Maximising** = finding the most likely event (the most probable one).
-> Bayes Rule uses this (given attribute values, what is the most probable value of the target/class variable?)
---

Naive Bayes = all events are assumed independent.
Not making this assumption can be very costly -> midway solutions for modelling joint probabilities:

* Factor Graph: some events are dependent (those that are obviously dependent), some are not.
  The $\theta$s are potential functions: $\theta(A,B)$ = an independent contribution (a **factor**, also called a "clique"), with A, B random variables.
* Markov network (undirected): we have **cliques** of events which are dependent.
  Z = normalisation constant, because the unnormalised potentials do not sum to 1.
* Bayesian network (directed): arrows are used to define what kind of **causality** we have.
  (see the Reasoning Techniques course)

---

Inference

There are elements which are
- evidence (observed)
- non-observed
- elements for which we want to make a prediction

Variable elimination: eliminate variables, using the evidence.
SotA (state of the art): adding probabilities to logic.

---

ProbLog

$p :: q(t_1,...,t_n)$ :- body
- p is a probability value (between 0 and 1)
- the body restricts the logical variables
-> easier because we don't have to deal with the joint probabilities directly
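As an illustration of the "given attribute values, what is the most probable class?" step, here is a minimal sketch (my own toy example, not from the course) of naive Bayes classification under the independence assumption; the data set and the add-one smoothing are made-up choices:

```python
# Minimal naive Bayes sketch (toy data, my own example): pick the class that
# maximises P(class) * product of P(attribute=value | class).
from collections import defaultdict

def train(examples):
    """examples: list of (attribute_dict, class_label) pairs."""
    class_counts = defaultdict(int)
    attr_counts = defaultdict(int)            # (class, attribute, value) -> count
    for attrs, label in examples:
        class_counts[label] += 1
        for attr, value in attrs.items():
            attr_counts[(label, attr, value)] += 1
    return class_counts, attr_counts

def predict(attrs, class_counts, attr_counts):
    total = sum(class_counts.values())
    best_label, best_p = None, -1.0
    for label, c_count in class_counts.items():
        p = c_count / total                                   # prior P(class)
        for attr, value in attrs.items():                     # naive assumption: independent attributes
            p *= (attr_counts[(label, attr, value)] + 1) / (c_count + 2)   # add-one smoothing
        if p > best_p:
            best_label, best_p = label, p
    return best_label

examples = [({"fever": "yes", "cough": "yes"}, "flu"),
            ({"fever": "no",  "cough": "yes"}, "cold"),
            ({"fever": "yes", "cough": "no"},  "flu"),
            ({"fever": "no",  "cough": "no"},  "healthy")]
model = train(examples)
print(predict({"fever": "yes", "cough": "yes"}, *model))      # -> "flu" (most probable class)
```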
### ES Advantages (vs Human)
- **consistency** (always the same decisions)
- **memory**
- **logic** (no sentimental/ biased decisions)
- **reproducible** (does not get tired)
### ES Disadvantages
- Does not have **common sense** -> it has to be programmed in, which can be hard
- Lack of **Creativity**
- **Maintenance** of knowledge is manual
- Not **adaptable** to new purposes, new conditions
## 3. Agent Oriented Design
### 3.1 The Agent Concept
_DEF:_ Entity **situated** in some environment that is capable of **flexible**, **autonomous** activity
A minimally intelligent agent is:
- **pro-active** (taking initiative)
- **reactive** (perceives and reacts to the env.)
- **social ability** (interacting with humans, capable of cooperating, negotiating)
### Objects vs Agents
both have
- **identity** (id)
- **state** (what)
- **passive** behavior (action done if invoked)
agent ++
- **active** behavior, more complex, threaded
- it must be defined when, why, with whom, and whether to act at all
### Expert Systems vs Agents
An E.S.
- is not **embodied**
- does not interact with the **environment**
- is more **passive** (only answers questions)
### Environment types
* (in)accessible -> level of info that is accessible
* (non-)deterministic -> how accurately/certainly the effect of an action is known
* (non-)episodic -> repeated cycles, resetting the env.
* static (dynamic) -> the agent is the only cause of change, the env. does not move on its own
* discrete (continuous) -> fixed number of actions
### Architectures types
(These architectures can be combined.)
- **logic-based agents**
-> (symbolic) knowledge-based AI view // intelligence from knowledge (e.g. planning agents)
components:
- **theory of agency**: how the agent behaves given a context (fixed)
- **belief database**: info on the environment
- **actions**: chosen by the theory of agency based on the belief database
(-) how to **translate** perceptions into symbolic facts
(-) reasoning takes **time**
(-) writing a theory of agency for a **complex** environment is hard
(+) translation is becoming efficient (DL for vision)
(+) hardware is becoming fast (reasoning takes less time)
(+) logical policies can be learned
- **reactive agents**
-> behavioral, situated //intelligence from interaction with environment
situation(state) -> action
create it by:
1) defining a set of **behaviors** accomplishing some tasks
2) subsumption **hierarchy**
(+) simple + fast
(+) robust against failure
(+) elegant
(+) tractable
(-) only local info is used (**stateless**)
(-) no knowledge of the past/**future** or of **space**
(-) hard to **predict** if the system will work, be intelligent
(-) dynamics of **interaction** can be hard to model into a subsumption hierarchy
- belief/desire/intention (**BDI**) agents
**deliberation**: which goals do we (currently) want to achieve?
**means-ends reasoning**: what do we do to achieve them?
**Intentions**:
model and decide **short-term goals**, used to focus attention (on a small goal rather than the entire goal)
the agent must ask itself:
are the intentions still useful? should they be reconsidered?
-> **bold** (don't reconsider) vs **cautious** (constantly reconsider)
the right choice is influenced by the rate of change of the world
Components (see the control-loop sketch after this list):
1) get **beliefs** (about the environment) and update them from the previous ones
2) based on it, generate **all possible action** options (=**desires**)
3) **filter** -> get the current intention
4) compute and **return the action** to perform, based on the current intentions
- layered
**Horizontal**:
input -> layer n -> action n } a filter/hierarchy to choose which one to take (might be complex)
**Vertical**:
one pass //neural net
two pass (go back and forth through the layers). Each layer = a level of abstraction
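A minimal sketch (my own, with hypothetical helper names) of the BDI control loop from steps 1-4 above:

```python
# Hypothetical sketch of the BDI control loop (steps 1-4 above); the agent
# and environment methods are placeholders, not a real agent framework.

def bdi_loop(agent, environment):
    beliefs = agent.initial_beliefs()
    intentions = set()
    while True:
        percept = environment.sense()
        beliefs = agent.revise(beliefs, percept)                  # 1) update beliefs
        desires = agent.options(beliefs, intentions)              # 2) generate possible options (desires)
        intentions = agent.filter(beliefs, desires, intentions)   # 3) filter -> current intentions
        action = agent.plan(beliefs, intentions)                  # 4) means-ends reasoning -> action to return
        environment.execute(action)
        # a "cautious" agent reconsiders its intentions every cycle,
        # a "bold" one only when a plan completes or fails
```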
## 4. Genetic Algorithms
How to generate an Intelligent System?
### Evolutionary Algos
**What**?
optimize a problem using a population influenced by the dynamics of genetic evolution
characteristics:
- any-time (the process can be stopped at any point and still return a solution)
- any version of the algo could fit the problem
- iterative
- stochastic
**How?**
* initial population
needs to be diverse and large (to have enough different genes).
**Genotype** = the genetic encoding of an individual in the population. **Phenotype** = the candidate solution expressed from it; the phenotype is what is tested for its performance.
* selection
**fitness** evaluation is expensive -> limits the number of individuals and generations. One solution: only evaluate an individual representative of each type.
GA fitness function requirement:
it must be able to evaluate partial solutions; a pure win/lose fitness doesn't work -> solve it by using fitness = % of wins.
types:
- roulette wheel: $p_i = f_i / \sum_j f_j$ + tuning by using a different distribution
- tournaments + tuning (add a probability of winning)
* reproduce -> new population
crossover: single-point / double-point / uniform
mutation: bit flipping / real-value mutation
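A toy sketch (my own example, not from the course) putting roulette-wheel selection, single-point crossover and bit-flip mutation together on a made-up fitness (count of 1-bits):

```python
# Minimal GA sketch (my own toy problem): maximize the number of 1-bits in a
# genotype, with roulette-wheel selection, single-point crossover and
# bit-flip mutation.
import random

GENES, POP, GENERATIONS, P_CROSS, P_MUT = 20, 30, 50, 0.7, 0.01

def fitness(g):                      # partial solutions get partial credit
    return sum(g)

def roulette(pop, fits):             # p_i = f_i / sum(f)
    return pop[random.choices(range(len(pop)), weights=fits)[0]]

def crossover(a, b):
    if random.random() < P_CROSS:
        cut = random.randint(1, GENES - 1)       # single-point crossover
        return a[:cut] + b[cut:]
    return a[:]

def mutate(g):
    return [1 - bit if random.random() < P_MUT else bit for bit in g]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    fits = [fitness(g) for g in pop]
    pop = [mutate(crossover(roulette(pop, fits), roulette(pop, fits)))
           for _ in range(POP)]
print(max(fitness(g) for g in pop))  # best fitness found (at most GENES)
```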
---
### Schema theory - proof of why evolutionary algos work?
**schema**: a subset of the total population (all genotypes matching a template)
notation:
a binary digit = fixed position
(*) = don't care (any value)
-> e.g.: $*1*1$
**defining length** d(s) = distance between the two outermost fixed positions
**order** o(s) = number of fixed positions
$(1-p_m)^{o(s)}$, where $p_m$ is the mutation probability of a single bit and $o(s)$ the order
=> prob of schemata to survive a mutation
$1-p_c \frac{d(s)}{n-1}$, where $p_c$ is the crossover probability, $d(s)$ the defining length, and $n-1$ the number of places where a genotype can be cut
=> prob to survive a crossover
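A quick worked example (my own numbers) of these two survival probabilities:

```python
# Worked example (my own numbers): survival probabilities of the schema
# s = 1***0*1 under mutation and single-point crossover.
p_m, p_c, n = 0.01, 0.7, 7         # bit mutation prob, crossover prob, genotype length
order = 3                          # fixed positions: 1, 5 and 7
defining_length = 7 - 1            # distance between the outermost fixed positions

p_survive_mutation = (1 - p_m) ** order                     # 0.99^3 ~= 0.970
p_survive_crossover = 1 - p_c * defining_length / (n - 1)   # 1 - 0.7*6/6 = 0.30
print(p_survive_mutation, p_survive_crossover)
```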
**Schema Theorem**:
schemata with
- above-average fitness (-> higher probability of being selected and surviving)
- short defining length (-> less likely to be destroyed by crossover)
- low order (-> less likely to be destroyed by mutation)
-> are more likely to survive
**Building block theorem**:
G.A.s create building blocks (schemata in the sense of the schema theorem) of high fitness and recombine them into better solutions
⚠ assumes that a problem is divisible into sub-problems, which is not always possible for complex problems
-> genotype design (guidelines)
- put related genes **close** to each other (so that together they form a b.b.)
- make each gene as **independent** as possible + cooperative coevolution: those independent (groups of) genes have an independent influence on the global fitness (-> make a GA for each group)
-> crossover design
- set a low probability of **destroying** schemas (cutting inside a b.b.)
// avoid destroying (good) schemas
Elitism: copy the best solutions as-is, and reproduce the rest of the population (mutation + crossover)
If all genes are independent, we can look independently at each gene's influence on the global fitness to get an optimised phenotype. Do this by keeping a population per gene.
-> **SNES** - Separable Natural Evolution Strategy
separable -> each gene can be treated independently
natural -> natural (Gaussian) distribution for each gene
---
#### SNES
no more mutation, reproduction, selection
just update each gene's mean and standard deviation according to the **weighted sum** of the genomes in the population, **scaled** by their **utility**
//gradient search for each gene (gene//dimension)
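A hedged sketch of one SNES-style update step; the learning rates, the rank-based utilities and the normalisation are typical choices of mine, not necessarily the exact ones used in the course:

```python
# Sketch of a separable (diagonal-Gaussian) evolution-strategy update:
# per-gene means and std. devs. move along a utility-weighted sum of samples.
import numpy as np

def snes_step(mu, sigma, fitness_fn, pop_size=20, eta_mu=1.0, eta_sigma=0.1):
    d = len(mu)
    s = np.random.randn(pop_size, d)           # standard-normal samples, one row per genome
    genomes = mu + sigma * s                    # phenotypes: mean + std.dev. * sample, per gene
    fit = np.array([fitness_fn(g) for g in genomes])
    order = np.argsort(-fit)                    # best first
    u = np.zeros(pop_size)                      # simple rank-based utilities (sum ~ 0)
    u[order] = np.linspace(1.0, -1.0, pop_size)
    mu = mu + eta_mu * sigma * (u @ s) / pop_size                          # move means toward good samples
    sigma = sigma * np.exp(eta_sigma * (u @ (s**2 - 1)) / (2 * pop_size))  # adapt per-gene std.dev.
    return mu, sigma

mu, sigma = np.zeros(5), np.ones(5)
for _ in range(200):
    mu, sigma = snes_step(mu, sigma, lambda g: -np.sum((g - 3.0) ** 2))
print(mu.round(2))                              # should end up close to [3, 3, 3, 3, 3]
```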
**premature convergence**
-> we can get to a niche (sub-optimal) solution, if genotypes are not diverse enough
mutation is the only way to get out of it
-> fix by making multiple runs, or tuning for getting slower convergence
-> OR fix by adding a **niche penalty** (when a group of similar individuals, within some radius, becomes too big, add a fitness penalty to decrease its size)
## 5. Multi Agent Coordination
How to construct an Intelligent System?
**coordination**: managing **dependencies** between **activities**
---
### **cooperative system**: different agents have the same **goal**
characteristics of cooperative systems (how easy/tricky it is to cooperate) are evaluated using
- the environment (predictable? diverse?)
- cooperation (communication?)
- cooperation entities (number? homogeneity?)
---
### **coordination theory**
- **goal** identification
- **activities** -> to achieve the goals
- **actors** -> who? which tasks?
- **interdependencies**
- prerequisite: an activity must be done before another
- shared resource: multiple activities need the same single resource/tool
- simultaneity: 2+ activities must happen at the same time
coordination is a must
**centralization** is **bad** because a single unit
- has limited processing power (can not handle exponentially complex problems)
- has limited memory storage
- is complex to design (more so than a smaller subunit)
### Coordination types
---
* Client - Server
- the client **asks** questions, gives tasks to do
- the server **handles** the client requests
(an entity can be both client and server)
(+) simple synchronisation
(-) servers can be killed by too many requests
-> multiple servers? not really, because that requires synchronisation (increases complexity)
* Task and results sharing
-> split the task to solve it more efficiently
why? each sub-problem can be solved with less knowledge
(+) each sub-problem requires less knowledge, is less complex
(-) each task division requires design and synchronisation effort
* Blackboard
every agent shares a common **shared knowledge space** and the same **communication format**.
But the agents act independently (they don't have to know about each other, as long as they use the format)
**robots** have trouble with this because it is difficult to transfer a program from one robot to another (each robot has its own specifics)
-> use blackboards with different **channels** for different **topics**; each robot works in its relevant channels
* Contract Net Protocol = negotiation protocol (see the sketch after this list)
1. the manager announces the tasks
2. contractors that can do a task bid for it
3. the manager selects the optimal contractor
(+) flexible, distributed, dynamic
(-) design issues: when should an agent ask for task sharing vs do the task alone? how does the manager select the agents? -> protocols must be thought out
client-server (every role is fixed) < C.N.P. < blackboard (everything is free)
* Functionally Accurate Cooperation (FA/C)
**functionally correct**: each agent communicates only info it is 100% sure about (it decides locally what makes sense).
**cooperation**: merging of the local correctness
-> design issue:
how to combine inconsistencies (one says false, the other true)?
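A minimal sketch (my own simplification, made-up names and costs) of one Contract Net Protocol round: announce -> bid -> award:

```python
# Minimal Contract Net Protocol sketch: the manager announces a task,
# contractors bid, and the manager awards the task to the best bidder.

class Contractor:
    def __init__(self, name, cost_per_unit):
        self.name = name
        self.cost_per_unit = cost_per_unit

    def bid(self, task):
        # a contractor could also decline (return None) if it cannot do the task
        return self.cost_per_unit * task["size"]

class Manager:
    def announce(self, task, contractors):
        bids = {}
        for c in contractors:
            offer = c.bid(task)
            if offer is not None:                 # collect only actual bids
                bids[c.name] = offer
        winner = min(bids, key=bids.get)          # award to the cheapest bid
        return winner, bids[winner]

contractors = [Contractor("r1", 3.0), Contractor("r2", 2.5), Contractor("r3", 4.0)]
task = {"name": "deliver-part", "size": 10}
print(Manager().announce(task, contractors))      # ('r2', 25.0)
```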
---
### Advanced models
* Auctioning:
types:
- **english auction**: first-price open-cry, highest bidder wins
- **first-price sealed-bid**: one round, hidden price
- **vickrey second-price sealed-bid**: same as above but winner pays the second higher price
- **dutch** (descending): the price goes down until someone says "I'll take it"
values:
- **private** value auction: the value is set by the bidder's preferences (art)
- **common-value** auction: the item is worth the same to every bidder (the value is set by the market, not by personal taste)
- **correlated**-value auction: a mix of the two above (a bidder's value depends partly on its own preferences, partly on the others' values)
bidder types:
- **risk-averse bidder**: willing to pay a bit more to make sure they win. **Dutch** and **f-p s-b** yield higher prices here
- **risk-averse auctioneer**: willing to sell for a bit less than the expected price: Vickrey (or English) is better for them (higher expected sold price)
- **risk-neutral**: the auction type doesn't matter, the expected sold price is the same (revenue equivalence)
* Voting
-> used to solve goal coordination
**irrelevant alternative**:
in a vote, A wins over B
+ add option C
now B wins over A, because some voters switched to C
**binary protocol**:
all decisions are made between 2 options (pairwise votes)
could solve i.a., but the **order** of the pairwise decisions biases the final result
**borda protocol** (see the sketch at the end of this section):
all voters **rank** the options; ranks are converted to points and summed
-> still does not solve i.a. (no general solution exists today)
* (Negotiation)
introduces more interaction than subcontracting: instead of just setting a task and letting the best agent take it, agents exchange proposals.
* Joint Planning
move from several individual plans to a single coordinated one.
**Design issues**: does nobody ever get stuck? is it optimal?
single planner with multiple executors (boss-employees) /
multiple planners with a single executor (different requests, merge them) /
multiple planners with multiple executors
relationships among plans:
(+) equality (plans are equivalent)
(+) subsumption (one plan subsumes the other, which makes merging easier)
(-) resource conflict
(-) incompatibility (executing A makes B infeasible)
* Partial Global Planning
it is difficult to design the perfect joint plan, taking everything into account.
// the FA/C version of Joint Planning:
locally decide what must be done in cooperation
components:
- objective: the goal
- plan-activity map: what are the parties doing
- solution-construction graph: about interactions, communication
- status: project state info
(-) some parts are done locally -> not everything is visible (e.g. resource allocation, state incompatibility)
* Commitments
all the coordination stuff can be replaced by commitments and conventions
= agreement to do something (e.g. New Year's resolutions, a promise)
precommitment = a commitment made now about the future
levelled commitment = can be cancelled, but under a penalty
operation types:
- create
- discharge: c. is done
- cancel
- release: the other party agrees to remove the commitment
- delegate: someone else will do it for you
- assign: you do it for someone else
* Conventions
= description of how you can reconsider your commitments
= rules
challenges
they can decrease in reliability (commitments might not be taken, or might be released)
-> find a balance between too tough and too light conventions
social conventions:
should we inform other agents about our commitments?
-> centrality hypothesis of commitments & conventions:
coordination = commitments + conventions (& social)
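Going back to the voting protocols above, a small sketch (my own toy ballots) of the Borda count:

```python
# Borda protocol sketch: every voter ranks the options, each rank is
# converted to points, and the points are summed per option.
from collections import defaultdict

def borda(ballots):
    """ballots: list of rankings, best option first."""
    scores = defaultdict(int)
    for ranking in ballots:
        n = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] += n - 1 - position    # best gets n-1 points, worst gets 0
    return max(scores, key=scores.get), dict(scores)

ballots = [["A", "B", "C"],
           ["A", "C", "B"],
           ["B", "C", "A"],
           ["C", "B", "A"],
           ["B", "A", "C"]]
print(borda(ballots))   # ('B', {'A': 5, 'B': 6, 'C': 4})
```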
## 6. Multi Agent Learning (1/2)
How to generate an Intelligent System?
-> use **learning** because we can not design every agent's behavior by hand (in multi-agent systems)
RL:
* **Policy** $\pi$ = mapping from states to actions. Can be non-deterministic (we get a probability for each action)
* **Exploration** = explore the world to generate experience = gather new info vs **Exploitation** = exploit what has already been learned
* **Value function** V (for a state), Q (for a state + action) = immediate rewards are translated into them (to get an estimate of the long-term return)
* **Reward** r = direct feedback telling the agent whether it is doing well. Can be non-deterministic
* **$\pi^*$** = optimal policy
* **discount factor** $\gamma$ = how much a reward loses its value over time
### Single agent RL:
there is
* set of **states**
* set of **actions**
* **transition** function (if you apply a in s, which s' do you end up in?). Can be non-deterministic
* **reward** function
-> find
* $\pi$ that maximises $V^{\pi}(s_t) = \sum_i \gamma^i r_{t+i}$
where $\gamma$ = discount factor (rewards count less the further into the future they are)
-> assumes that the probability distributions are stationary: the consequences of the agents' actions (the probabilities) do not change over time
**Bellman equation**: (important)
$V_{i+1}(s) = r(s,\pi(s)) + \gamma V_i(\delta(s,\pi (s)))$,
$\delta$ transition function
Value of a state = **reward** from this **current** state + value from the next **future** states (given by $\pi$)
Policy iteration: iterate until the policy converges (then you have the optimal policy)
each iteration: take the max over actions of the reward + discounted value of the next states
-> rewards and next state values might be unknown
-> thus we use a **model-free** approach
using (state-action) Q-values
(+) the transition and reward functions are not needed to find the optimal policy, just a table of Q-values
wait long enough until the (max) Q-values converge
Difference V and Q:
**V**: the policy $\pi$ tells you which action(s) to take, **then** you compute the discounted reward (= V)
**Q** = (s, a): compute all of them for s, **then** update $\pi$ according to them. $V(s) = \max_a Q(s,a)$
### In non-deterministic env.
at each iteration, we add to the current estimate (Q-value):
$Q(s,a) \leftarrow Q(s,a) + \alpha \, [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$
$\alpha$ = learning rate parameter in $[0,1]$: small = slow convergence
In the bracket, the first part is what the agent observes, the second what it believed it would observe; their difference is the temporal-difference error.
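A hedged sketch (my own toy chain world, not from the course) of the tabular Q-learning update above:

```python
# Tabular Q-learning on a toy 5-state chain: reward only in the rightmost
# state; update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
import random
from collections import defaultdict

N_STATES, ACTIONS = 5, (-1, +1)            # states 0..4, move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)                     # Q[(state, action)]

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the rightmost state
    return s2, r

for _ in range(2000):
    s = random.randrange(N_STATES)
    for _ in range(20):
        if random.random() < EPSILON:                         # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        td_error = r + GAMMA * max(Q[(s2, act)] for act in ACTIONS) - Q[(s, a)]
        Q[(s, a)] += ALPHA * td_error
        s = s2

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
# learned greedy policy: moving right (+1) in every state
```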
### Exploration vs Exploitation
- Epsilon-Greedy:
draw a random x; if x < epsilon -> explore (random action)
else exploit (greedy action)
epsilon decays over time (less exploration over time)
- Boltzmann Exploration:
a temperature parameter plays the role of epsilon
the probability of each action being played is given by (a softmax of) the size of its Q-value
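A small sketch (standard formulas, toy Q-values of my own) of the two exploration strategies:

```python
# Epsilon-greedy vs Boltzmann (softmax) action selection over toy Q-values.
import math
import random

q = {"left": 0.2, "right": 1.0, "stay": 0.5}

def epsilon_greedy(q, epsilon=0.1):
    if random.random() < epsilon:                 # explore: random action
        return random.choice(list(q))
    return max(q, key=q.get)                      # exploit: greedy action

def boltzmann(q, temperature=0.5):
    # P(a) = exp(Q(a)/T) / sum_b exp(Q(b)/T); high T -> nearly uniform (explore),
    # low T -> nearly greedy (exploit)
    weights = [math.exp(v / temperature) for v in q.values()]
    return random.choices(list(q), weights=weights)[0]

print(epsilon_greedy(q), boltzmann(q))
```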
### Multi-agents:
-> must take into account other agents' actions in RL
difficulties:
- exponential growth of states (combination of all agents')
- rewards are shared or depend on the other agents
try shared rewards based on each agent's contribution to the win/loss
in classical RL -> find which action led to a reward
in MAL -> additionally, which action of which agent led to a reward?
when agents compete, they need to *NOT* have a deterministic policy (e.g. always playing the same moves in chess is easy to beat)
-> Best response policy:
compute the best (optimal) response to the other agents' strategies.
-> If all agents play a best response -> Nash equilibrium (there can be multiple ones)
a **Nash equilibrium** might not be optimal,
because each agent thinks locally. Pareto optimality: an outcome where no agent can do better without another doing worse; there can be several Pareto optima, e.g. one where one agent does well and another where the other agent does well.
### Non-stationary env.
the rewards are determined by the actions of all agents
agents are learning simultaneously
-> best policy is a moving target because agents are adapting to the other agents' strategies
**normal form game** = e.g. prisoner's dilemma
can be
- (a)symmetric (prisoner's dilemma / battle of the sexes): in a symmetric game the rewards are the same relative to the players' actions
- zero-sum game (e.g. matching pennies / heads-tails)
### Learning Automata
characteristics:
- policy iteration: it adapts its policy so that it converges to an optimal one
- works in stateless env. if not -> make one LA per state
- finite action set (originally)
- uses the immediate reward to update the action probabilities
technically:
keep a probability vector over the actions
at each iteration, increase the probability of the action that got a good reward and decrease the probabilities of the other actions
$\alpha$, $\beta$ = the learning parameters (reward and penalty parameter)
If they are the same, we have linear reward penalty
If $\beta = 0$, linear reward inaction
If $\alpha \gg \beta$, linear reward-$\epsilon$-penalty
If $\alpha = 1$ & $\beta = 0$, cross learning (only rewards are learned: actions that are taken and rewarded become even more likely to be retaken, while all actions that were not taken lose probability)
rewards:
-> r is $\in \{0,1\}$
in state-based environments, r is the average of the rewards received since the last time this state was visited
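A sketch of the linear reward-penalty update on a toy bandit of my own (standard update, made-up reward probabilities):

```python
# Learning automaton (linear reward-penalty): a probability vector over the
# actions is nudged toward actions that received a good reward.
import random

actions = [0, 1, 2]
p = [1 / 3, 1 / 3, 1 / 3]                 # action probabilities
alpha, beta = 0.1, 0.01                   # reward and penalty parameters
true_reward = [0.2, 0.8, 0.5]             # hidden Bernoulli reward probs (toy env.)

for _ in range(5000):
    a = random.choices(actions, weights=p)[0]
    r = 1.0 if random.random() < true_reward[a] else 0.0
    for i in actions:
        if i == a:   # chosen action: reinforced on reward, penalised otherwise
            p[i] += alpha * r * (1 - p[i]) - beta * (1 - r) * p[i]
        else:        # other actions: the opposite adjustment, keeping sum(p) = 1
            p[i] += -alpha * r * p[i] + beta * (1 - r) * (1 / (len(actions) - 1) - p[i])

print([round(x, 2) for x in p])           # mass should concentrate on action 1
```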
### Regret Minimization
characteristics:
- stateless
- after having taken the action, we get to know what the best reward could have been (e.g. after taking an exam). Instead of learning from r, learn from (best estimated r - r obtained) -> try to minimise the accumulated losses
### Joint Learning
Naive approach - do not take the other agents into account?
-> might work, but usually does not
### Joint action Learning
1) model the opponents'/teammates' action probabilities by keeping frequency counts
2) learn Q-values for joint actions
-> expected value of action a = $\sum_o Q(a, o) \cdot P(o)$: sum the Q-values over the other agents' expected actions o, weighted by the probability that they play them
JAL:
(+) can create cooperation
(-) has to store many probabilities
(-+) assumes that observing all agents' actions is possible
-> if it is not, we have to make assumptions about the other agents' policies
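A small sketch (my own toy numbers) of the joint-action learner's expected-value computation:

```python
# Joint action learning: EV(a) = sum over opponent actions o of Q(a, o) * P(o),
# where P(o) comes from frequency counts of the opponent's past actions.

q = {                                   # learned joint-action Q-values Q[(my_action, their_action)]
    ("cooperate", "cooperate"): 3.0, ("cooperate", "defect"): 0.0,
    ("defect",    "cooperate"): 5.0, ("defect",    "defect"): 1.0,
}
opponent_counts = {"cooperate": 7, "defect": 3}        # observed frequencies

total = sum(opponent_counts.values())
p_opponent = {o: c / total for o, c in opponent_counts.items()}

def expected_value(my_action):
    return sum(q[(my_action, o)] * p_opponent[o] for o in p_opponent)

best = max(["cooperate", "defect"], key=expected_value)
print(best, {a: expected_value(a) for a in ["cooperate", "defect"]})
# defect: 5*0.7 + 1*0.3 = 3.8  >  cooperate: 3*0.7 + 0*0.3 = 2.1
```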
## 6. Multi Agent Learning (2/2)
there is a need to compute good Q-values which mimic immediate rewards
### Value iteration
Q(s,a,o), where o is the other agent's action
$V(s) = \max_{\pi} \min_{o} \sum_a \pi(s,a)\, Q(s,a,o)$. Min because the opponent will choose the action that minimises our reward
expected payoff of a game = $\sum_{a_1,a_2} \pi_1(a_1)\, r(a_1,a_2)\, \pi_2(a_2)$ (probabilities of agent 1's actions × immediate reward × probabilities of agent 2's actions)
-> use this as a stand-in for the immediate reward
-> used in MiniMax Q-learning
### MiniMax Q-learning
initialisation:
initialise all Q-values to 1
initialise all state-action probabilities uniformly
choose an action:
using epsilon-greedy,
choose action a according to the current policy
learning:
- get the immediate r from a + o (the opponent's action)
- update the Q-value based on the immediate reward obtained (a weighted average of the new and old Q)
- update the policy (and the state values) based on the maxmin action
properties:
- designed for **zero-sum** games
- designed to take the **safest** strategy (against the worst-case opponent)
- the **linear programming** step (solving for the maxmin strategy) can make the algo slow
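A hedged sketch of the maxmin idea for one state, simplified by me to pure strategies (the real minimax-Q solves a linear program over mixed strategies); the payoff table is a toy example:

```python
# Maxmin over pure strategies: pick the action whose worst-case payoff
# against the opponent is best.

q_s = {                                  # Q(s, a, o) for one fixed state s (toy zero-sum payoffs)
    ("rock",     "rock"): 0,  ("rock",     "paper"): -1, ("rock",     "scissors"): 1,
    ("paper",    "rock"): 1,  ("paper",    "paper"): 0,  ("paper",    "scissors"): -1,
    ("scissors", "rock"): -1, ("scissors", "paper"): 1,  ("scissors", "scissors"): 0,
}
my_actions = opp_actions = ["rock", "paper", "scissors"]

def worst_case(a):
    return min(q_s[(a, o)] for o in opp_actions)     # the opponent minimises our payoff

maxmin_action = max(my_actions, key=worst_case)
v_s = worst_case(maxmin_action)
print(maxmin_action, v_s)   # every pure action has worst case -1 here; the mixed
                            # (uniform) strategy from the LP achieves the game value 0
```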
### Nash Q-learning
properties:
- handles **general-sum games** (a superset of zero-sum -> improvement on minimax)
- Nash (optimal) Q-values = the Q-values obtained when playing a Nash equilibrium
they are computed using every agent's Q-values (-> assuming all other agents also pick N.E. actions)
- agents have to be able to observe the other agents and their respective rewards (or estimate them)
- computing a N.E. per agent at each turn is computationally heavy (more than the linear program of minimax)
- weak convergence guarantees (strong restrictions are needed)
#### Nash Q-values
defined on