Memory And Learning, Part 2

Summary: Last month, I began a series of articles about knowing and using the constraints of perception and memory. Unlike some other constraints on learning interventions (such as time), there are no ways to get around memory. We either work with it or people learn less well (or not at all). Here is my second article discussing the relationship between memory and learning.

Discussing Memory And Learning: Part 2

For those who read my Part 1 article on the topic of memory and learning, I’ll begin with some retrieval practice. Retrieval practice is a simple but powerful instructional method. It asks you to retrieve learned information from long-term memory (LTM). Research clearly shows that retrieving information from LTM improves recall of that information. And repeated retrieval strengthens recall even more. I’ll discuss this strategy more in a later article.

In the box below are a few questions from last month’s article. Try to retrieve the answers from memory, using what you remember from last month’s article and whatever else you know. Don’t reread last month’s article before you attempt to dig the answers out of your memory. I’ll summarize the answers at the end of this article.

Retrieval practice:
  1. Why is working memory (WM) so critical to learning?
  2. What happens when we overload WM?
  3. What can learning professionals do to not overload WM?

As I explained last month, Dr. Sweller, an educational psychologist who has extensively researched memory and learning, describes how perception and memory impact learning and calls the process human cognitive architecture. In other words, this is how we integrate, process, and use new knowledge. A simplified version of this process is shown in Figure 1. (More detail is in last month’s article.)


One of the critical issues for learning is that working memory is quite limited both in terms of how much it can hold and how long it can hold it. Remember, to understand new information, WM is, at the same time, processing new information coming from the environment and stored information from LTM. We regularly hear that WM can only process 5 +/- 2 chunks of new information at a time, but actually it’s less because it’s also processing information from LTM in order to understand the new information.

The fact that WM can process so little seems like a faulty system until you realize that it’s intentional. Sweller explains that our brain is merely protecting the information in LTM by not allowing rapid or large changes to it. In other words, it’s a brilliant survival mechanism!

More Design Assertions For Helping Working Memory During Instruction

Last month, I described three of Sweller’s specific assertions on how we should assist working memory during instruction: getting rid of split attention, getting rid of unnecessary redundancies, and considering element interactivity. This month, I’ll discuss three more and answer a question a reader posed last month that I’m betting several of you have (because it has nagged at me, too).

I’m going to use the following example throughout this month’s discussion.

Example: Changes in Medical Coding (ICD-9 to ICD-10)

Let’s imagine ourselves working as Learning and Development practitioners developing US healthcare billing coding training. Medical coders and other healthcare practitioners review medical records and other statements of medical care and assign the correct codes. These codes are based on anatomy, cause, body location, procedure, diagnoses, disability, device, supplies, equipment, procedure location, and more. The codes are regulated by the Centers for Medicare & Medicaid Services.

Coding accuracy is critical because errors can result in anything from denial of a claim to fraud charges. In 2015, a large section of one portion of the codes changed, creating a need for widespread retraining.

This example may not reflect the exact realities of coding training as many people who do this work only work on codes in their medical specialty and there are a bunch of other issues involved that can make this training more complex. So I simplified the example for use in this article.

Now I’ll explain three more of Sweller’s assertions for assisting working memory during instruction. Afterwards, we’ll finish up with an area of confusion from last month’s assertions.

Expertise Reversal

Expertise reversal describes how many instructional techniques that are effective for people with less expertise in a topic become less effective, or even counterproductive, for people with more expertise in the topic. This results from different levels of prior knowledge. It’s based on cognitive load/memory.

To understand it, I need to explain that just because someone has been doing a job a long time does not mean they have a lot of expertise. Most people get to an average level of skill in a job and then stop improving. I explain why this occurs here. Expertise requires far more effort and time than most people are willing to put in.

Good methods for teaching people new to the topic area and people with expertise are different, and if we teach both groups the same way, one of them will suffer. For example, new-to-the-topic participants need to go slower and often need a lot more explanation and guidance. We need to check more often for misunderstandings and provide more practice. Research shows that experts suffer increased cognitive load with training methods meant for people with less knowledge because those methods cause redundancy effects (see below and last month’s article). Experts want to discuss their own projects and problems and get input from other experts. People new to the topic would be lost in an expert-centric learning environment.


How can we deal with expertise reversal in training for the coding changes? The first issue is determining who is expert and who is simply proficient. Remember I said earlier that there are usually few experts in most organizations?

The first step is to find the experts and discuss the training. Most experts keep ahead of changes in their field. They tend to be members of trade organizations and get information about changes and updates through them. Are they already up-to-date on the changes? If the experts are updated, can they help us find existing training programs and resources to train people who are proficient in the previous version?

Since experts are likely either already trained or working on ways to be trained, they can help us determine which training will work well (if the Learning and Development people are not experts in this area). This is a pragmatic way to deal with specialized training like coding.

Design takeaway: When dealing with highly specialized content, utilize experts in the topic. Experts often have good resources and knowledge that can help with the effort. They may not have training skills, but if they are willing to train, they can be taught those skills. Proficient people often prefer learning from experts because they have very precise questions.

Worked Examples

“Worked examples” are examples that clearly show all the steps and thinking of how a problem was worked out. Research shows that people who are new to a topic learn well from worked examples because they show how to think through and solve real problems. This is the reason that cases are so important in medical and legal study, for example. One of the fastest ways to build accurate schemas is through worked problems and simulations of typical problems.

When you are new to a topic, you do not have enough prior knowledge to know how to solve problems. When given problems to solve, people new to the topic throw trial solutions at them to see what works. This is time consuming and frustrating.

I once worked with a radiologist who wanted to learn how to develop numerous radiology examples for medical students so the students could better learn how to read a wide variety of radiographs (the images produced by X-rays). He felt that too many medical students didn’t get enough practice with a wide enough variety of radiographs in their radiology rotation, so they didn’t develop the needed proficiency.

The example in the next section shows worked examples.

Design takeaway: When participants are new to a topic, use worked examples that show all the steps and thinking that go into how a problem is worked out. Good worked examples help people who are new to a topic build accurate schemas. Use questions (and detailed feedback) to determine if they understand the examples.

Guidance Fading

Sweller also recommends that we slowly remove guidance as people gain knowledge. When people have very low levels of expertise, we should provide a great deal of guidance. But we can slowly fade the guidance as expertise increases. Eventually, we remove all guidance. He says that one of the positives of technology is that we can program the guidance to fade out as proficiency increases.
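For readers who build online practice, the idea of programming guidance to fade might look something like the sketch below. This is only a minimal illustration in Python, not Sweller’s method or any product’s actual implementation. The class name GuidanceFader, the five-answer window, and the cutoffs for moving from worked examples to hints to no guidance are all hypothetical and would need to be tuned with real participants.

```python
from collections import deque


class GuidanceFader:
    """Chooses how much guidance to show based on recent answers.

    All defaults are assumptions for illustration only:
    - window: how many of the most recent answers count
    - hints_at: correct answers in the window needed to drop to hints
    - none_at: correct answers in the window needed to remove guidance
    """

    def __init__(self, window=5, hints_at=3, none_at=5):
        self.hints_at = hints_at
        self.none_at = none_at
        # deque(maxlen=...) silently discards the oldest answer when full.
        self.recent = deque(maxlen=window)

    def record(self, correct):
        """Store whether the participant's latest answer was correct."""
        self.recent.append(bool(correct))

    def guidance_level(self):
        """Return 'worked example', 'hints', or 'none' for the next problem."""
        correct_count = sum(self.recent)
        if correct_count >= self.none_at:
            return "none"
        if correct_count >= self.hints_at:
            return "hints"
        return "worked example"
```

A brand-new participant starts with full worked examples, moves to hints after three correct answers in the window, and gets no guidance only after five in a row. Because only recent answers count, guidance comes back if the participant starts making mistakes again, which is one possible design choice rather than a research-backed rule.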


The example below shows some ideas for using worked problems and guidance fading in a course for proficient coders that teaches them how to go from the ICD-9 codes they already know how to use to the new ICD-10 codes.

ICD-9 ⇔ ICD-10
ICD-9 code: 845.09

Other sprains and strains of ankle

ICD-10 code: S93.401A

Sprain of unspecified ligament of right ankle, initial encounter

The course would use the ICD-10-CM Coding Manual, the ICD-9 to ICD-10 Crosswalk, and the General Equivalence Mappings, tools developed to help people learn and translate between the two coding systems. The training process might use the following blended approach.

Segment 1: Asynchronous Module

1. [Video] Instructor shows why and how the systems are different.

2. [Video] Instructor uses resources to code primary and secondary diagnoses in ICD-10.

3. Twelve worked examples.

4. Quiz on worked examples (pass to attend Segment 2).

Segment 2: Classroom Or Synchronous Virtual Classroom Session

5. Q&A with expert about worked examples and any other questions.

6. Participants use resources to find diagnoses, then debrief.

Segment 3: Asynchronous, Self-Paced, With Discussion Boards For Answering Questions

7. Scenarios for coding in ICD-10 with hint button (Some guidance).

8. Scenarios for coding in ICD-10 (No guidance).

9. Test (Anyone can test-out if they have already trained themselves).

This methodology would not work for new coders, because it is built with the understanding that people can already code in ICD-9. Other pieces may be added after a full needs analysis.

Design takeaway: Participants new to a topic require guidance to build accurate schemas and to avoid building in misunderstandings, which are difficult to undo later. As they gain proficiency, guidance can and should be faded out. Too much guidance with experts can result in more cognitive load.

Things That Don’t Make Sense Sometimes Do

Sometimes learning sciences and common sense don’t seem to go together. The idea of learning styles, for example, seems like common sense, but research has debunked it. And this shouldn’t come as such a big surprise. After all, what we prefer to eat often isn’t what is best for us either.

Last month, someone asked a question about Sweller’s proposition that we not use redundant materials in instruction. Redundancy problems occur when more information results in less learning. The rationale is that redundant materials make us process more sources of information, and that can add confusion and mental load.

The reader specifically asked about using illustrations with spoken words. Mayer has written an entire book about learning from multimedia (combinations of text in words or audio, moving or static pictures, and other media) and has performed a great deal of research in this area. Here are three of his research-based principles that apply in this situation:

  • Multimedia principle.
    People learn more from words and pictures than from words alone.
  • Coherence principle.
    People learn better when unnecessary words, pictures and sounds are excluded.
  • Modality principle.
    We should generally present words as narration rather than onscreen text.

So, what does this mean for the reader’s question? When an explanation is needed for a graphic, it should be provided (and it should be with the graphic). If the graphic is self-explanatory, or can be easily explained in a caption, nothing else is needed.

In a 2008 research paper, Mayer and Johnson made some revisions to the redundancy principle. They found that redundancy could assist learning when very short key words were placed near relevant parts of the image (to guide focus). They also said that they expected other exceptions to apply, such as when explanations are needed (for example, when instruction is not in the participant’s native language). Mayer, Heiser, & Lonn’s paper reminds us that adding interesting but unnecessary details to any presentation can detract from remembering the important details we want participants to remember.

My answers to the retrieval practice questions are:

  1. WM is critical to learning because to remember any information, it must be processed by WM, and WM is a very limited resource.
  2. When we overload WM, little or no learning happens. People are literally “overloaded”.
  3. We can use Sweller’s six methods for working with WM in mind.

I know this has been a long article so if you made it to the end, congratulations, amazing learning geek! I’d love to know if you have questions and how you intend to put these ideas into action. You can also post these on Twitter and we can discuss there (@pattishank and @elearnindustry).



  • Liu, T. C., Lin, Y. C., Tsai, M. J. & Paas, F. (2011). Split-attention and redundancy effects on mobile learning in physical environments. Computers & Education, 56(2), 172-181.
  • Kalyuga, S. (2007). Expertise reversal effect and its implications for learner-tailored instruction. Educational Psychology Review, 19, 509-539.
  • Mayer, R. E., Heiser, J., & Lonn, S. (2001). Cognitive Constraints on Multimedia Learning: When Presenting More Material Results in Less Understanding. Journal of Educational Psychology, 93, 187-198.
  • Mayer, R. E., & Johnson, C. I. (2008). Revising the Redundancy Principle in Multimedia Learning. Journal of Educational Psychology, 100, 380-386.
  • Mayer, R. E. (2009). Multimedia learning: Second edition. New York, NY: Cambridge University Press.
  • Sweller, J., Ayres, P. L., Kalyuga, S. & Chandler, P. A. (2003). The expertise reversal effect. Educational Psychologist, 38(1), 23-31.
  • Sweller, J. (2005). Implications of cognitive load theory for multimedia learning. In R. E. Mayer (Ed.), The Cambridge Handbook of Multimedia Learning (pp. 19-30). New York, NY: Cambridge University Press.
  • Sweller, J. (2008). Human Cognitive Architecture. In J. M. Spector, M. D. Merrill, J. V. Merrienboer, & M.P. Driscoll (Eds.), Handbook of Research on Educational Communications and Technology 3rd ed., 369-381. New York, NY: Taylor & Francis Group.