Tian Yingli, Yang Xiaodong, Yi Chucai, Arditi Aries
Electrical Engineering Department, The City College, and Graduate Center, City University of New York, New York, NY 10031.
Mach Vis Appl. 2013 Apr 1;24(3):521-535. doi: 10.1007/s00138-012-0431-7.
Independent travel is a well-known challenge for blind and visually impaired persons. In this paper, we propose a proof-of-concept computer vision-based wayfinding aid that helps blind people independently access unfamiliar indoor environments. In order to find different rooms (e.g. an office, a lab, or a bathroom) and other building amenities (e.g. an exit or an elevator), we combine object detection with text recognition. First, we develop a robust and efficient algorithm that detects doors, elevators, and cabinets from their general geometric shape by combining edges and corners. The algorithm is general enough to handle large intra-class variations among objects whose appearance differs across indoor environments, as well as small inter-class differences between objects such as doors and door-like cabinets. Next, in order to distinguish objects within the same class (e.g. an office door from a bathroom door), we extract and recognize text information associated with the detected objects. For text recognition, we first extract text regions from signs with multiple colors and possibly complex backgrounds, and then apply character localization and topological analysis to filter out background interference. The extracted text is recognized using off-the-shelf optical character recognition (OCR) software. The object type, orientation, location, and text information are presented to the blind traveler as speech.
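The abstract describes detecting doors from their general geometric shape by combining edges and corners. The sketch below is a minimal, hypothetical illustration of that idea and is not the authors' implementation. It assumes four candidate corners (ordered top-left, top-right, bottom-right, bottom-left) come from a separate corner detector and that a boolean edge map comes from an edge detector such as Canny. The helper names (`edge_support`, `tilt_from_vertical`, `is_door_candidate`), the thresholds (`min_ratio`, `max_ratio`, `max_tilt_deg`, `min_support`), and the choice to skip the bottom edge are all illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
EdgeMap = Sequence[Sequence[bool]]  # edge_map[y][x] is True on edge pixels


def edge_support(p: Point, q: Point, edge_map: EdgeMap, samples: int = 50) -> float:
    """Fraction of evenly spaced points on segment p-q that land on edge pixels."""
    hits = 0
    for i in range(samples + 1):
        t = i / samples
        x = round(p[0] + t * (q[0] - p[0]))
        y = round(p[1] + t * (q[1] - p[1]))
        if 0 <= y < len(edge_map) and 0 <= x < len(edge_map[0]) and edge_map[y][x]:
            hits += 1
    return hits / (samples + 1)


def tilt_from_vertical(p: Point, q: Point) -> float:
    """Angle in degrees between segment p-q and the image's vertical axis."""
    return math.degrees(math.atan2(abs(q[0] - p[0]), abs(q[1] - p[1])))


def is_door_candidate(
    corners: Tuple[Point, Point, Point, Point],
    edge_map: EdgeMap,
    min_ratio: float = 1.5,      # hypothetical height/width bounds
    max_ratio: float = 3.5,
    max_tilt_deg: float = 10.0,  # hypothetical tolerance for the side edges
    min_support: float = 0.8,    # hypothetical share of edge pixels per side
) -> bool:
    """Hypothetical geometric door test on corners ordered TL, TR, BR, BL."""
    tl, tr, br, bl = corners
    # 1. Both side edges should be roughly vertical.
    if tilt_from_vertical(tl, bl) > max_tilt_deg or tilt_from_vertical(tr, br) > max_tilt_deg:
        return False
    # 2. Doors are taller than they are wide (averaging opposite sides).
    width = (math.dist(tl, tr) + math.dist(bl, br)) / 2
    height = (math.dist(tl, bl) + math.dist(tr, br)) / 2
    if width == 0 or not (min_ratio <= height / width <= max_ratio):
        return False
    # 3. Corners must be joined by detected edges. The bottom edge is skipped
    #    (an assumption) because the floor or clutter often hides it.
    sides = [(tl, bl), (tl, tr), (tr, br)]
    return all(edge_support(p, q, edge_map) >= min_support for p, q in sides)


# Toy 100x100 edge map containing the left, top, and right edges of a door.
edges = [[False] * 100 for _ in range(100)]
for y in range(10, 91):
    edges[y][20] = True
    edges[y][50] = True
for x in range(20, 51):
    edges[10][x] = True

door = ((20, 10), (50, 10), (50, 90), (20, 90))
wide_box = ((20, 10), (80, 10), (80, 40), (20, 40))
print(is_door_candidate(door, edges))      # True
print(is_door_candidate(wide_box, edges))  # False: height/width = 0.5
```

In this toy example, the aspect-ratio check rejects the wide, short box before any edge lookups are made. The abstract states that the authors' algorithm also separates doors from door-like cabinets, so a single aspect-ratio threshold like this one would not be enough on its own for that distinction.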