
Multimodal eye-tracking data-driven image feature extraction and classification model for reading behavior

  • Abstract: Current reading behavior analysis methods rely primarily on single-modal eye-movement features and do not effectively fuse the visual semantic features of text with the spatiotemporal dynamics of eye movements, so they struggle to capture complex reading cognitive processes in full. To address this issue, this paper proposes a classification model based on the joint analysis of eye-movement trajectories and text images. Multimodal eye-tracking data collected by an eye tracker (fixation sequences, pupil diameter variations, and scan paths) are combined with semantic features extracted from text images by a deep convolutional network to construct a multidimensional feature matrix. A spatiotemporal attention mechanism dynamically weights the eye-movement data, and a graph convolutional network (GCN) models the spatial correlations between eye-movement trajectories and text regions. Experiments on a public reading behavior dataset and a self-constructed multimodal dataset show that the proposed model achieves 91.4% accuracy on the text-type classification task, outperforming mainstream multimodal fusion models by 3.2%, and that it can accurately distinguish between "intensive reading" and "skimming" patterns. This work offers a new paradigm for the quantitative analysis of reading cognitive processes, with potential applications in educational assessment, human-computer interaction optimization, and related fields.
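The abstract does not give the architecture in detail, so the following is only a minimal sketch of the two mechanisms it names, under stated assumptions: attention is taken to be softmax-normalized dot-product scoring over fixation features, and the GCN step is taken to be the commonly used symmetric-normalized propagation rule with self-loops. All function names, feature layouts, and values (`attention_pool`, `gcn_layer`, the toy fixations and adjacency) are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention_pool(features, query):
    # Assumed form of attention: score each fixation by its dot product with
    # a query vector, normalize with softmax, and take the weighted average.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(len(features[0]))]
    return weights, pooled

def gcn_layer(adj, feats, weight):
    # Assumed form of graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W).
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical toy input: each fixation is [normalized duration, pupil diameter].
fixations = [[0.2, 0.5], [0.9, 0.7], [0.4, 0.6]]
query = [1.0, 0.5]  # hypothetical query vector; it would be learned in practice
weights, pooled = attention_pool(fixations, query)

# Hypothetical graph: nodes 0 and 1 are fixations, node 2 is a text region
# that both fixations landed on.
adjacency = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
identity_proj = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix
hidden = gcn_layer(adjacency, fixations, identity_proj)
```

In this sketch the attention weights decide how much each fixation contributes to the pooled eye-movement representation, while the graph step mixes each node's features with those of the nodes it is linked to, which is one way a fixation could be related to the text region it falls on.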

     
