This hypothesis suggests that striatal activity
tracks action values in the context of learning. In the present study, we trained animals to carry out a sequential decision-making task under two conditions (Figure 1). In both conditions, the animals had to determine whether there were more red or more blue pixels in a centrally presented cue and make a saccade to the peripheral target that matched the majority pixel color. In the first (random) condition, the correct spatial sequence of saccades changed every trial, so the only information available about saccade direction was in the central fixation cue. In the second (fixed) condition, the correct sequence of decisions remained fixed for blocks of eight correct trials. In this case, the animal could use the information about which spatial sequence had been correct in previous trials to improve its performance. While the animals carried out this task, we recorded simultaneously from lateral prefrontal cortex and the dorsal striatum (Figure 1E). This allowed us to address the three hypotheses set forth above. Specifically, we examined action selection, reinforcement learning, and the trade-off between attention-demanding and automatic behaviors in the lateral prefrontal-dorsal striatal circuit. We found that in the random condition, the fraction of correct decisions improved consistently with increasing color bias (Figure 2A) as
the difficulty of determining the majority color of the stimulus decreased. In the fixed condition, performance was, on average, consistently better
than performance in the random condition at each color bias (Figure 2A). Furthermore, this improvement developed across trials after a sequence switch. Performance on the first trial after a switch to a new sequence (Figure 2A, Fixed 1) was worse than the corresponding performance in the random condition. This reflected the animals' reliance on their memory of which movements had been correct in the previous trial. Their performance quickly improved across trials, however, reaching an asymptote at almost 90% correct by the fourth trial (Figure 2A, Fixed 4), even when the color bias was equivocal, at 50%. At this color bias there was no information in the stimulus, and in the random condition the animal had to guess the correct saccade direction. Examination of the performance in just the 50% color bias condition as a function of trials following a sequence switch showed that performance improved until about trial 4, after which it remained consistent for the rest of the block (Figure 2B). In the 50% condition, the animal was forced to use information from previous trials to make a correct decision. This shows that the animals were able to use feedback from previous trials to improve their performance. This was further reflected in the reaction times. In the fixed condition, the animals would be able to use memory of which sequence had been correct in the previous trials to preplan and execute their decision more quickly.
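The learning curve described above (fraction correct as a function of trial position after a sequence switch) could be tabulated along the following lines. This is only a minimal sketch under stated assumptions, not the analysis code used in the study: the function name `fraction_correct_by_trial`, the `(trials_since_switch, correct)` record layout, and the toy records are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def fraction_correct_by_trial(trials):
    """Return {trial position after switch: fraction correct}.

    `trials` is an iterable of (trials_since_switch, correct) pairs, where
    trials_since_switch is 1 for the first trial after a sequence switch.
    This record layout is an assumption made for the example.
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [n_correct, n_total]
    for position, correct in trials:
        counts[position][0] += int(correct)
        counts[position][1] += 1
    return {
        position: n_correct / n_total
        for position, (n_correct, n_total) in sorted(counts.items())
    }


# Made-up toy records for illustration only; these are NOT data from the study.
toy_trials = [
    (1, False), (1, False), (1, True), (1, False),
    (2, True), (2, False), (2, True), (2, True),
    (3, True), (3, True), (3, True), (3, True),
]

curve = fraction_correct_by_trial(toy_trials)
print(curve)  # {1: 0.25, 2: 0.75, 3: 1.0}
```

Restricting the input to trials with a 50% color bias would give the kind of curve summarized in Figure 2B, where the stimulus carries no information and any improvement across trial positions must come from feedback on earlier trials.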