InstruMentAR: Auto-Generation of Augmented Reality Tutorials for Operating Digital Instruments Through Recording Embodied Demonstration (CHI 2023)
InstruMentAR enables seamless creation of augmented reality tutorials by recording real-world demonstrations. The system combines hand-gesture tracking with pressure-sensor data to automatically detect each operation step on a physical instrument and generate the corresponding AR cues. Text instructions can be authored through voice input, and screenshots of the device screen are captured automatically. For learners, InstruMentAR provides real-time haptic feedback to reinforce understanding. Our user study shows that InstruMentAR simplifies tutorial authoring and enhances learning effectiveness compared to traditional methods.
Background: Traditional way to learn device operation
Traditionally, image or video tutorials guide learners through device operation. The learner has to hold the tutorial, memorize an instruction, then turn to the physical interface and perform the operation. This constant context switching imposes a heavy cognitive load.
Background: AR Tutorial
AR tutorials have emerged as an alternative, where visual guidance is displayed directly on the associated object, minimizing context switching and reducing the cognitive load.
The Problem
Creating an AR tutorial is a complex and time-intensive process that demands both programming skills and domain expertise. This high barrier of entry limits the widespread adoption and scalability of AR tutorials, despite their strong potential for improving real-world training and instruction.
What if we could generate the AR tutorial automatically, simply by operating the device?
Formative Study
We began by identifying the most common physical interfaces found on everyday devices: buttons, knobs, sliders, switches, and touch screens.
Through a questionnaire study, we analyzed how users naturally interact with each type and summarized the most frequent manipulation patterns. These findings guided our design of the operation-recognition algorithm.
Hardware Design
The wearable prototype consists of three main components—a wristband and two finger caps. This minimal setup preserves the natural appearance of the hand, ensuring compatibility with existing hand-tracking algorithms.
The finger caps, equipped with thin-film pressure sensors, are worn on the thumb and index finger. The pressure readings inform the system of the exact moment when the user interacts with the interface. The wristband houses the microprocessor and a haptic feedback module that provides immediate tactile feedback during tutorial playback.
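To give a sense of how the pressure readings could be turned into discrete contact events, here is a minimal sketch. The thresholds, the hysteresis scheme, and the sample format are illustrative assumptions, not the values used in the actual system.

```python
# Minimal sketch: turning raw thin-film pressure readings into
# press / release events. Thresholds and the sample source are
# illustrative assumptions, not the system's actual values.

PRESS_THRESHOLD = 0.15    # normalized pressure above which we call it contact
RELEASE_THRESHOLD = 0.05  # lower threshold adds hysteresis against jitter

def detect_contact_events(samples):
    """Yield ('press'|'release', timestamp) events from (t, pressure) samples."""
    in_contact = False
    for t, p in samples:
        if not in_contact and p > PRESS_THRESHOLD:
            in_contact = True
            yield ("press", t)
        elif in_contact and p < RELEASE_THRESHOLD:
            in_contact = False
            yield ("release", t)

# Example: a brief tap on the index-finger sensor
stream = [(0.00, 0.01), (0.05, 0.20), (0.10, 0.30), (0.15, 0.02)]
print(list(detect_contact_events(stream)))
# -> [('press', 0.05), ('release', 0.15)]
```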
Operation Recognition: The Decision Tree
By combining hand-tracking data with pressure sensor input, we developed a decision tree to classify common interaction types. The system first distinguishes gestures based on which fingers are active (thumb, index, or both) and then filters them through pose and transformation layers.
This hierarchy enables InstruMentAR to accurately recognize operations, including button presses, switch toggles, knob rotations, slider movements, and screen touches.
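As a rough illustration of how such a decision tree could look in code, here is a simplified sketch. The feature names, thresholds, and branching order are hypothetical stand-ins for the system's actual pose and transformation layers.

```python
# Simplified sketch of the operation-recognition decision tree.
# Feature names, thresholds, and branching order are illustrative;
# the actual pose and transformation layers are more involved.

def classify_operation(thumb_pressed: bool, index_pressed: bool,
                       hand_rotation_deg: float, hand_translation_cm: float) -> str:
    """Map pressure events plus hand-tracking features to an operation label."""
    # Layer 1: which fingers registered pressure?
    if thumb_pressed and index_pressed:
        # A two-finger grip followed by wrist rotation suggests a knob.
        return "knob_rotation" if abs(hand_rotation_deg) > 15 else "unknown_grip"
    if index_pressed:
        # Layer 2: pose/transformation disambiguates single-finger contact.
        if abs(hand_translation_cm) > 2:
            return "slider_movement"
        if abs(hand_rotation_deg) > 10:
            return "switch_toggle"
        return "button_press"  # a touch on a screen region would map similarly
    return "no_operation"

print(classify_operation(True, True, 40, 0))   # -> knob_rotation
print(classify_operation(False, True, 2, 5))   # -> slider_movement
```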
Authoring Mode
With InstruMentAR, authoring is as simple as demonstrating the operation and pinching to dictate instructions, which are transcribed into text.
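A minimal sketch of what this authoring loop might look like follows; record_voice and transcribe are hypothetical placeholders standing in for the system's speech-to-text pipeline.

```python
# Sketch of the authoring loop: each recognized operation becomes a
# tutorial step, and a pinch gesture attaches a dictated instruction.
# record_voice() and transcribe() are hypothetical stand-ins for the
# system's actual speech-to-text pipeline.

from dataclasses import dataclass

@dataclass
class TutorialStep:
    operation: str         # e.g. "knob_rotation"
    target: str            # which control was manipulated
    instruction: str = ""  # text dictated by the author

def record_voice() -> bytes:
    """Placeholder: would capture audio while the pinch is held."""
    return b""

def transcribe(audio: bytes) -> str:
    """Placeholder: would call a speech-to-text service."""
    return "Turn the trigger knob clockwise."

steps: list[TutorialStep] = []

def on_operation_recognized(operation: str, target: str) -> None:
    # Every demonstrated operation is logged as a new tutorial step.
    steps.append(TutorialStep(operation, target))

def on_pinch() -> None:
    # A pinch starts dictation; the transcript annotates the latest step.
    if steps:
        steps[-1].instruction = transcribe(record_voice())

on_operation_recognized("knob_rotation", "trigger_knob")
on_pinch()
print(steps[0])
```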
Accessing Mode:
Automatic Forward
Once a step is performed correctly, the tutorial automatically advances to the next one.
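In code, the auto-forward logic could be as simple as the following sketch, assuming a list of expected operations recorded during authoring.

```python
# Minimal sketch of automatic forwarding: the step index only advances
# when the recognized operation matches the one recorded for that step.

expected_ops = ["button_press", "knob_rotation", "slider_movement"]  # from authoring

def advance_if_correct(index: int, recognized_op: str) -> int:
    """Return the next step index, or the same index if the op was wrong."""
    if index < len(expected_ops) and recognized_op == expected_ops[index]:
        return index + 1   # AR cues move on to the next step
    return index           # wrong or incomplete operation: stay on this step

index = 0
index = advance_if_correct(index, "button_press")   # -> 1
index = advance_if_correct(index, "switch_toggle")  # -> still 1
print(index)
```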
Accessing Mode:
Preemptive Feedback
Because InstruMentAR continuously tracks hand operation, it can issue preemptive warnings before a wrong operation is performed. The haptic module delivers a tactile warning as well.
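A possible way to implement this check is sketched below: if the tracked fingertip gets close to any control other than the expected one, a warning fires before contact is made. The control positions and the trigger radius are illustrative assumptions.

```python
# Sketch of preemptive feedback: warn before contact when the fingertip
# approaches a control other than the expected one. Control positions
# and the 3 cm trigger radius are illustrative assumptions.

import math

def check_preemptive_warning(fingertip, controls, expected, radius_cm=3.0):
    """Return a warning string if the finger nears the wrong control."""
    nearest = min(controls, key=lambda name: math.dist(fingertip, controls[name]))
    if nearest != expected and math.dist(fingertip, controls[nearest]) < radius_cm:
        # In the real system this would also trigger the haptic module.
        return f"approaching '{nearest}', expected '{expected}'"
    return None

controls = {"power_button": (0.0, 0.0, 0.0), "trigger_knob": (10.0, 0.0, 0.0)}
print(check_preemptive_warning((0.5, 0.0, 0.0), controls, expected="trigger_knob"))
# -> approaching 'power_button', expected 'trigger_knob'
```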
Full Video
[Video placeholder: full demonstration of InstruMentAR]
Personal Thought
Like many other engineering students, I struggled in the ECE labs learning to operate the oscilloscope. I often found myself lost in messy lab manuals, trying to locate the current step and comparing it against the figures to check whether I had pressed the right button.
When I began researching in XR, I immediately saw how AR tutorials could transform this experience by overlaying guidance directly on instruments. However, I soon realized that creating AR tutorials was far from simple—it often took days to build even a single session in Unity.
That’s why I designed InstruMentAR: to make AR authoring as natural as performing the task itself. Instructors can simply demonstrate the operation without dealing with complex authoring interfaces. The user study confirmed its effectiveness—the authoring process was more than twice as fast as conventional immersive methods.
Looking ahead, the authoring process could become even easier. With the advancement of LLMs, perhaps all we'll need is to feed the system a device manual and prompt it to create the desired AR tutorial. Then the creator wouldn't even need to demonstrate the procedure: just spend a few minutes talking, and out comes the AR lab manual.