Honing Reasoning Skills Using A Chatbot

Summary: This article introduces a chatbot that aims to hone physiotherapy students' clinical reasoning skills in a structured manner. The chatbot provides students with repeated practice and feedback in a safe, simulated clinical setting.

Using Chatbot Virtual Patients In Physiotherapy Training

ChatGPT has been hogging the limelight, with some championing its use in learning while others hope to leverage it to lighten the effort of developing learning activities. Bruner (2023) found that ChatGPT can produce credible results on deductive reasoning tasks, but not on inductive reasoning tasks.

I had the opportunity to lead a research team consisting of physiotherapists, information and communication technology (ICT) staff, communication skills faculty, and students in developing a chatbot from scratch. Unlike ChatGPT, which provides free-form responses, our chatbot was designed to hone physiotherapy students' clinical questioning and reasoning skills in a structured manner. The chatbot played the role of the virtual patient, replying to questions asked by the physiotherapy student. We took this approach because standardized patients do not scale well, are costly, and take time to train to act in a realistic manner. We also felt a chatbot could give students more practice in clinical questioning and reasoning skills in a safe "clinical" environment, including in remote settings such as those seen during pandemics.

The chatbot has since been rolled out to students as a supplementary practice tool, with no grades or rewards attached. Here, I would like to share the thought process our team went through in developing the chatbot, and perhaps inspire others who might want to develop their own chatbots for their specific teaching and learning purposes.

Our Thought Process In Developing A Chatbot To Hone Reasoning Skills

As this chatbot was developed from scratch, we had to train it by keying a full conversation between a physiotherapist and the virtual patient into Google Dialogflow. The process of writing the script was new to the physiotherapy and communication skills team, as we had not heard of terms like "turn", "intention", "utterances", "input tags", and "output tags". We learned that a "turn" referred to an intention, while "utterances" referred to the questions asked by the physiotherapist. Thus, a turn with the intention "to greet" could contain five utterances, numbered 1.1, 1.2, 1.3, and so on. Furthermore, "input tags" and "output tags" referred to "what questions need to be asked before this question can be asked" and "what questions can be asked after this question is answered", respectively.
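
To make these terms concrete, here is a minimal sketch of how one scripted turn could be represented. This is our illustration only, not the actual Dialogflow training format; the field names and the sample patient reply are assumed for the example.

```python
# Illustrative sketch (not the Dialogflow schema): one scripted "turn" with its
# intention, the physiotherapist utterances that trigger it, and the input/output
# tags that constrain where it may appear in the conversation.
turn_greeting = {
    "turn_id": "1",
    "intention": "to greet",
    "utterances": [                      # numbered 1.1, 1.2, 1.3, ... in the script
        "Hello, I'm your physiotherapist today.",
        "Good morning, how are you feeling?",
        "Hi, my name is Alex and I'll be assessing you today.",
    ],
    "patient_reply": "Good morning. My elbow has been hurting for two weeks.",
    "input_tags": [],                    # nothing needs to be asked before greeting
    "output_tags": ["pain_location"],    # questions that become available afterwards
}
```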

There was debate within the team as to whether to mandate a fixed sequence in the conversation, and we concluded that we would allow students to lead it. We reminded ourselves that our purpose in developing the chatbot was to train students in clinical questioning and reasoning skills, which could be promoted only if students had the freedom to ask questions. Students are still required to follow a three-phase structure: an introduction phase, followed by a patient history-taking phase, and finally a goal-setting phase. Within the history-taking phase, however, they are free to ask relevant questions in any order.
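
A minimal sketch of how such a three-phase constraint could be enforced is shown below, assuming each scripted question is labelled with the phase it belongs to. The phase labels and the simple "current phase or next phase" rule are illustrative, not our deployed logic.

```python
# Phases must be traversed in order, but questions within a phase may come in
# any order. A question from the next phase advances the conversation.
PHASES = ["introduction", "history_taking", "goal_setting"]

def is_allowed(current_phase: str, asked_phase: str) -> bool:
    """Allow questions from the current phase or the immediately following
    phase, but block jumping backwards or skipping a phase entirely."""
    current = PHASES.index(current_phase)
    asked = PHASES.index(asked_phase)
    return asked in (current, current + 1)
```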

The next question the team had to consider was whether to provide students with guidelines or hints in this questioning and reasoning process, as weaker students might get lost in the conversation. We decided to provide hints, in the form of phases following the musculoskeletal flow, to guide students in questioning patients.
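
As a rough illustration of the idea, a hint could simply be looked up by the phase the student is currently in. The phase names and hint wording below are assumptions for the example, not our production hints.

```python
# Illustrative only: hint text keyed by the current phase of the questioning flow.
PHASE_HINTS = {
    "introduction": "Greet the patient, introduce yourself, and obtain consent.",
    "history_taking": "Explore the pain: site, onset, aggravating and easing factors.",
    "goal_setting": "Ask what the patient hopes to achieve from physiotherapy.",
}

def hint_for(phase: str) -> str:
    """Return a phase-appropriate hint, with a generic fallback."""
    return PHASE_HINTS.get(phase, "Follow the musculoskeletal questioning flow.")
```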

As there were many phases, we had to consider how to maintain student engagement (a full conversation could last about 20 minutes). We decided to implement a scoring system with feedback on the utterances students had missed. To keep students from hitting the feedback button repeatedly throughout a practice session, the feedback and score for each session were only provided when the student closed the session.
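
The sketch below captures the essence of this design, under the assumption that each expected utterance maps to an intent the chatbot can recognize: coverage is tracked throughout, but the score and missed items are only computed and revealed when the session closes. The class name and the simple percentage score are illustrative, not the deployed logic.

```python
class PracticeSession:
    """Sketch: track which expected intents a student covered and only reveal
    the score and missed items when the session is closed."""

    def __init__(self, expected_intents: set[str]):
        self.expected = set(expected_intents)
        self.covered: set[str] = set()

    def record(self, intent: str) -> None:
        self.covered.add(intent)

    def close(self) -> dict:
        # Feedback is computed here, at session close, rather than on demand.
        missed = self.expected - self.covered
        score = round(100 * len(self.expected & self.covered) / len(self.expected))
        return {"score": score, "missed": sorted(missed)}
```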

The user testing session was very informative for the team. We found repeated keywords in the script that "confused" the chatbot: it did not know whether pain in the "elbow" or the "wrist" was being referred to, resulting in a "please repeat your question" reply. We also had to add many utterances gathered during user testing that were not originally included. This expanded the script and improved the accuracy of the chatbot's replies.
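
One simple way to handle such ambiguous references, sketched here as an assumption rather than our exact fix, is to check whether a question mentions exactly one known body part and ask the student to rephrase otherwise.

```python
from typing import Optional

# Body-part keywords that clashed in our script; the list here is illustrative.
BODY_PARTS = {"elbow", "wrist"}

def resolve_body_part(question: str) -> Optional[str]:
    """Return the single body part mentioned in the question, or None when the
    reference is absent or ambiguous (the bot then asks the student to rephrase)."""
    mentioned = {part for part in BODY_PARTS if part in question.lower()}
    return mentioned.pop() if len(mentioned) == 1 else None
```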

We also included a text input option (besides the default audio input option), as the chatbot sometimes had trouble recognizing Asian pronunciation. This allowed students to switch between text and audio inputs whenever the chatbot failed to recognize their speech.

To help students track and review their conversation with the chatbot, we also added a dialogue history box showing a text transcript of the conversation. This let students see how the chatbot had sometimes misinterpreted their voice input, and served as a guide for them to improve their pronunciation or to ask the question in a different way.
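
Conceptually, the dialogue history box is backed by nothing more than an ordered list of speaker and text pairs; a minimal sketch, with assumed names, follows.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueHistory:
    """Sketch of the data behind the dialogue history box: every student input
    (as recognized by the chatbot) and every chatbot reply, in order."""
    entries: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)

    def add(self, speaker: str, text: str) -> None:
        self.entries.append((speaker, text))

    def transcript(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.entries)
```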

Conclusion

In summary, the effort of developing the chatbot gave us insights into how chatbots are trained. Accuracy in chatbot training is of utmost importance, and the training script must be continuously updated based on user inputs. We also believe this original version of the chatbot can be extended to train students in other domains, such as software engineering and hospitality. With the proliferation of large language model (LLM) based chatbots such as ChatGPT and Google Bard, we also hope to experiment with using such AI chatbots to create simulated patients, and to investigate methods to restrict these bots to answering only questions within a predefined set of scenarios.

References: