参考维基百科:
https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
你们叫他SLAM,我还是习惯叫他三维重建......
在机器人学中,SLAM为面对未知环境进行即时位置和地图重建。现阶段最流行的方法为基于视觉的滤波方法,比如离子滤波和EKF方法。基于雷达的方法也可以用于稀疏SLAM问题,却缺乏一定的适用性和亲人性。
在AI行业里,总会这样说,怎样才能达到人的水平。SLAM问题对应了人熟悉未知环境并根据位置环境不断确定自身位置的过程,这个过程中所获的信息一般只是视觉信息,因此,构建类人的方法,使用纯视觉传感器来完成SLAM成为一个更接近于AI的研究方向,且其可达性显而易见。
古老的历史
基于激光雷达构建场景是古老并且可信的方法,激光雷达也是精确场景使用最多的SLAM传感器。它们提供机器人本体与周围环境障碍物间的距离信息。常见的激光雷达,例如SICK、Velodyne还有我们国产的rplidar等,都可以拿来做SLAM。激光雷达能以很高精度测出机器人周围障碍点的角度和距离,从而很方便地实现SLAM、避障等功能。
Problem definition
Given a series of sensor observations
o t {\displaystyle o_{t}}
over discrete time stepst {\displaystyle t},
the SLAM problem is to compute an estimate of the agent's location x t {\displaystyle x_{t}}
and a map of the environmentm t {\displaystyle m_{t}}.
All quantities are usually probabilistic, so the objective is to compute:
- P ( m t , x t | o 1 : t ) {\displaystyle P(m_{t},x_{t}|o_{1:t})}
Applying Bayes' rule gives a framework for sequentially updating the location posteriors, given a map and a transition functionP ( x t | x t − 1 ) {\displaystyle P(x_{t}|x_{t-1})},
- P ( x t | o 1 : t , m t ) = ∑ m t − 1 P ( o t | x t , m t ) ∑ x t − 1 P ( x t | x t − 1 ) P ( x t − 1 | m t , o 1 : t − 1 ) / Z {\displaystyle P(x_{t}|o_{1:t},m_{t})=\sum
_{m_{t-1}}P(o_{t}|x_{t},m_{t})\sum _{x_{t-1}}P(x_{t}|x_{t-1})P(x_{t-1}|m_{t},o_{1:t-1})/Z}
Similarly the map can be updated sequentially by
- P ( m t | x t , o 1 : t ) = ∑ x t ∑ m t P ( m t | x t , m t − 1 , o t ) P ( m t − 1 , x t | o 1 : t − 1 , m t − 1 ) {\displaystyle P(m_{t}|x_{t},o_{1:t})=\sum
_{x_{t}}\sum _{m_{t}}P(m_{t}|x_{t},m_{t-1},o_{t})P(m_{t-1},x_{t}|o_{1:t-1},m_{t-1})}
Like many inference problems, the solutions to inferring the two variables together can be found, to a local optimum solution, by alternating updates of the two beliefs in a form ofEM
algorithm.
被看作一个EM方法
Algorithms
Statistical techniques used to approximate the above equations includeKalman filters,particle
filters (aka. Monte Carlo methods) and scan matching of range data. They provide an estimation of the posterior probability function for the pose of the robot and for the parameters of the map.Set-membership
techniques are mainly based on interval constraint propagation.
They provide a set which encloses the pose of the robot and a set approximation of the map.
Bundle adjustment is another popular technique for SLAM using image data. Bundle adjustment jointly
estimates poses and landmark positions, increasing map fidelity, and is used in many recently commercialized SLAM systems such as Google'sProject
Tango.
New SLAM algorithms remain an active research area, and are often driven by differing requirements and assumptions about the types of maps, sensors and models as detailed below. Many SLAM systems can be viewed as combinations
of choices from each of these aspects.
两种方法:滤波方法和平差方法。
Mapping
Topological maps are a method of environment representation which capture the connectivity (i.e.,topology)
of the environment rather than creating a geometrically accurate map. Topological SLAM approaches have been used to enforce global consistency in metric SLAM algorithms.
In contrast, grid maps use arrays (typically square or hexagonal) of discretized cells to represent a topological world, and make inferences about which cells are occupied. Typically the cells are assumed to be statistically independent in order to simplify computation.
Under such assumption, P ( m t | x t , m t − 1 , o t ) {\displaystyle P(m_{t}|x_{t},m_{t-1},o_{t})}
are set to 1 if the new map's cells are consistent with the observation o t {\displaystyle o_{t}}
at locationx t {\displaystyle x_{t}}
and 0 if inconsistent.
地图:表示真实环境的数据映射,一般使用拓扑图地图表示全局的地图结构。另外的一种表达方式为网格地图,网格地图为离散坐标系的全覆盖。拓扑地图和网格地图有各自的优点,并适用于不同的环境。
Sensing
SLAM will always use several different types of sensors, and the powers and limits of various sensor types have been a major driver of new algorithms.
Statistical independence is the mandatory requirement to cope with metric bias and with noise in measures. Different types of sensors give rise to different SLAM algorithms whose assumptions are which are most appropriate to the sensors. At one extreme, laser
scans or visual features provide details of a great many points within an area, sometimes rendering SLAM inference unnecessary because shapes in these point clouds can be easily and unambiguously aligned at each step viaimage
registration. At the opposite extreme, tactile sensors are extremely sparse as they contain only information about points very close to the agent, so they require strong prior models to compensate in purely tactile SLAM. Most practical SLAM tasks fall somewhere between these visual and tactile
extremes.
Sensor models divide broadly into landmark-based and raw-data approaches. Landmarks are uniquely identifiable objects in the world whose location can be estimated by a sensor—such as wifi access points or radio beacons.
Raw-data approaches make no assumption that landmarks can be identified, and instead modelP ( o t | x t ) {\displaystyle P(o_{t}|x_{t})}
directly as a function of the location.
Optical sensors may be one-dimensional (single beam) or 2D- (sweeping)laser rangefinders, 3D High
Definition LiDAR,3D Flash LIDAR, 2D or 3Dsonar
sensors and one or more 2D cameras. Since 2005, there has been intense
research into VSLAM (visual SLAM) using primarily visual (camera) sensors, because of the increasing ubiquity of cameras such as those in mobile devices.
Visual and LIDAR sensors are informative enough to allow for landmark extraction in many cases. Other recent forms of SLAM include tactile SLAM
(sensing by local touch only), radar SLAM, and wifi-SLAM (sensing by strengths of
nearby wifi access points). Recent approaches apply quasi-optical wireless ranging for multi-lateration (RTLS) ormulti-angulation
in conjunction with SLAM as a tribute to erratic wireless measures. A kind of SLAM for human pedestrians uses a shoe mountedinertial
measurement unit as the main sensor and relies on the fact that pedestrians are able to avoid walls to automatically build floor plans of buildings. by anindoor
positioning system.
For some outdoor applications, the need for SLAM has been almost entirely removed due to high precision differentialGPS
sensors. From a SLAM perspective, these may be viewed as location sensors whose likelihoods are so sharp that they completely dominate the inference. However GPS sensors may go down entirely or in performance on occasions, especially during times of military
conflict which are of particular interest to some robotics applications.
传感器:SLAM有时需要不同的传感器,在数据获取上表现为高斯独立的性质。传感器模型可分为基于标记的模型和基于原始数据的模型。其区别在于有没有对数据进行标记和假设。
光学传感器有一维和二维、三维激光雷达,或者更多的是图像传感器。随着移动摄影设备的普及,基于视觉的SLAM成为主流。
Kinematics modeling
The P ( x t | x t − 1 ) {\displaystyle P(x_{t}|x_{t-1})}
term represents the kinematics of the model, which usually include information about action commands given to a robot. As a part of the model, the kinematics of the robot is included, to improve estimates of sensing under conditions of inherent and ambient
noise. The dynamic model balances the contributions from various sensors, various partial error models and finally comprises in a sharp virtual depiction as a map with the location and heading of the robot as some cloud of probability. Mapping is the final
depicting of such model, the map is either such depiction or the abstract term for the model.
For 2D robots, the kinematics are usually given by a mixture of rotation and "move forward" commands, which are implemented with additional motor noise. Unfortunately the distribution formed by independent noise in
angular and linear directions is non-Gaussian, but is often approximated by a Gaussian. An alternative approach is to ignore the kinematic term and read odometry data from robot wheels after each command—such data may then be treated as one of the sensors
rather than as kinematics.
动力学模型:转移概率用以描述环境模型,包含了运动模型和一部分环境噪音模型。对于2d机器人,动力学模型通常用一个旋转和移动的混合模型进行表述,并附加一定的独立噪音。由于噪音的分布一般不是高斯的,但接近于高斯,因此使用一个传感器模型相对于动力学模型更好一些。
Multiple objects
The related problems of data association and computational complexity are among the problems yet to be fully resolved, for example the identification of multiple confusable landmarks. A significant recent advance in the feature-based SLAM literature involved the re-examination
of the probabilistic foundation for Simultaneous Localisation and Mapping (SLAM) where it was posed in terms of multi-object Bayesian filtering with random finite sets that provide superior performance to leading feature-based SLAM algorithms in challenging
measurement scenarios with high false alarm rates and high missed detection rates without the need for data association.
Popular techniques for handling multiple objects includeJoint
Probabilistic Data Association Filter (JPDAF) and probability hypothesis density filter (PHD).
多目标:数据关联 和计算复杂度估计使用到 多个待分析的目标。一个有意义的提升是基于特征的SLAM的使用,广泛使用机器学习的方法(如一个贝叶斯分类器)。
Moving objects
Non-static environments, such as those containing other vehicles or pedestrians, continue to present research challenges. SLAM with DATMO is a model which tracks moving objects in a similar way to the agent itself.
Loop closure
Loop closure is the problem of recognizing a previously-visited location and updates the beliefs accordingly. This can be a problem because model or algorithm errors can assign low priors to the location. Typical loop
closure methods apply a second algorithm to measure some type of sensor similarity, and re-set the location priors when a match is detected. For example, this can be done by storing and comparingbag
of words vectors of SIFT features from each previously visited location.
闭环检测:是一个识别是否进入先前环境的过程,检测到闭环可以相应的提高可信度,降低误差。经典的闭环检测的算法是使用基于局部特征SIFT的词包模型,匹配每一个走过的场景。
Exploration
"Active SLAM" studies the combined problem of SLAM with deciding where to move next in order to build the map as efficiently as possible. The need for active exploration is especially pronounced in sparse sensing regimes
such as tactile SLAM. Active SLAM is generally performed by approximating the entropy of the map under hypothetical actions. "Multi agent SLAM" extends this problem to the case of multiple robots coordinating themselves to explore optimally.
动态 SLAM 把 Agent ”将要去哪儿“ 这个议题结合起来。
Biological inspiration
In neuroscience, the hippocampus appears to be involved in SLAM-like computations giving rise to place cells and has formed the basis for bio-inspired SLAM systems such as RatSLAM. Bio-inspired methods are not currently competitive with engineering approaches however.
Complexity
Researchers and experts in artificial intelligence have struggled to solve the SLAM problem in practical settings: that is, it required a great deal of computational power to sense a sizable area and process the resulting
data to both map and localize.
A 2008 review of the topic summarized: "[SLAM] is one of the fundamental challenges of robotics . . . [but it] seems that almost all the current approaches can not perform consistent maps for large areas, mainly due to the increase of the computational cost
and due to the uncertainties that become prohibitive when the scenario becomes larger."
Generally, complete 3D SLAM solutions are highly computationally intensive as they use complex real-time particle filters, sub-mapping strategies or hierarchical combination of metric topological representations, etc.
Robots that use embedded systems cannot fully implement SLAM because of their limitation in computing power. Nguyen V., Harati A., & Siegwart R. (2007) presented a fast, lightweight solution called OrthoSLAM, which breaks down the complexity of the environment
into orthogonal planes. By mapping only the planes that are orthogonal to each other, the structure of most indoor environments can be estimated fairly accurately. OrthoSLAM algorithm reduces SLAM to a linear estimation problem since only a single line is
processed at a time.
复杂度:SLAM是一个计算复杂的问题,持续的曾来那个地图和环境增加了过程的复杂度.........
Implementations
Various SLAM algorithms are implemented in the open-sourcerobot operating
system (ROS) libraries.
执行环境:一般的SLAM算法,都实现了对ROS的兼容,可以在ROS上运行。
History
A seminal work in SLAM is the research of R.C. Smith and P. Cheeseman on the representation and estimation of spatial uncertainty in 1986.
Other pioneering work in this field was conducted by the research group of Hugh F. Durrant-Whyte in the early 1990s. which showed
that solutions to SLAM exist in the infinite data limit. This finding motivates the search for algorithms which are computationally tractable and approximate the solution. The self-driving STANLEY car won the DARPA Grand Challenge and included a SLAM system,
bringing SLAM to worldwide attention. Mass-market SLAM implementations can now be found in consumer robot vacuum cleaners such as the Neato XV11.Self-driving
cars by Google and others have now received licenses to drive on public roads in some US states.
SLAM问题最先是由Smith Self 和Cheeseman 在1988年提出来的,被认为是实现真正全自主移动机器人的关键。 由 Smith, R.C. and P. Cheeseman, 提出的论文:On
the Representation and Estimation of Spatial Uncertainty. 《International Journal of Robotics Research》, 1986. 5(4): p. 56 -- 68. 以滤波的形式表示SLAM问题。
基于统计的滤波器思路,把SLAM写成一个运动方程和观测方程,以最小化这两个方程中的噪声项为目的,使用典型的滤波器思路来解决SLAM问题。
具体方法:当一个帧到达时,我们能(通过码盘或IMU)测出该帧与上一帧的相对运动,但是存在噪声,是为运动方程。同时,通过传感器对路标的观测,我们测出了机器人与路标间的位姿关系,同样也带有噪声,是为观测方程。通过这两者信息,我们可以预测出机器人在当前时刻的位置。同样,根据以往记录的路标点,我们又能计算出一个卡尔曼增益,以补偿噪声的影响。于是,对当前帧和路标的估计,即是这个预测与更新的不断迭代的过程。
随后的时间里,Se, S., D. Lowe and J. Little,的论文: Mobile robot localization and
mapping with uncertainty using scale-invariant visual landmarks. 《The international Journal of robotics Research》, 2002. 21(8): p. 735--758.中使用了卡尔曼滤波的方法。
使用EKF的方法是一段时间内的SLAM主流方法。
EKF来由:KF使用高斯分布表示后验概率,这个高斯分布的均值为ut,协方差记为Et.
高斯线性的主要问题在于,他们只有在运动模型f 和 测量模型h 是线性的情况下才比较接近实际情况。对于非线性的f 和h ,滤波器更新的结果往往不是高斯的。因此使用KF的定位算法需要将运动模型和传感器模型进行线性化(linearize),使用一个线性函数对非线性函数进行局部近似。通过一个一阶泰勒展开式对 f 和 h 进行线性化的卡尔曼滤波器为EKF。
稀疏图集:SALM问题通过不同的概率技术得到解决,包括EKF方法。使用EKF方法只需增加状态向量和环境中地标的位置,且EKF方法对地图和状态向量进行二次更新(O(n^2))。对于小地图,计算方法是可行的。对于大地图,则常常使用图松弛-稀疏图集的方法进行地图更新。
BA方法:21世纪之后,SLAM研究者开始借鉴SFM(Structure from Motion)问题中的方法,把平差方法--捆集优化BA(Bundle Adjustment)引入到SLAM中来。优化方法和滤波器方法有根本上的不同。它并不是一个迭代的过程,而是考虑过去所有帧中的信息。通过优化,把误差平均分到每一次观测当中。在SLAM中的Bundle
Adjustment常常以图的形式给出,所以研究者亦称之为图优化方法(Graph Optimization)。图优化可以直观地表示优化问题,可利用稀疏代数进行快速的求解,表达回环也十分的方便,因而成为现今视觉SLAM中主流的优化方法。
The Future
SLAM始终是一个计算复杂的问题,增量地图的更新带来的计算代价是不可避免的,正是由于计算能力的提升,以及残差和函数矩阵的稀疏性的认识,BA方法在视觉SLAM才有更广泛的流行。实时性需要有强有力的计算能力作为支撑,硬件的提升是促进SLAM发展的潜力所在。