加载中...
返回

【STL源码剖析】iterator & traits

本篇对应《STL源码剖析》第三章,主要看看迭代器和由此引出的 iterator_traits 。我对这部分现代源码的阅读顺序是从Clang到GCC,因此如果对本文秉承由上至下的阅读顺序,可能会丢失一些信息,特此说明。

GCC

由于 iterator 部分我是先看的Clang,回头看GCC时又不得不说还是GCC组织的好些(为什么每一篇都觉得第二个看到的好些(x。

GCC对 iterator 的基本概念和定义都放在 bits/stl_iterator_base_types.h 里。

iterator & iterator_traits

  • 首先是迭代器的几种 category ,这些类型的继承关系是后面整个STL算法库的重要基础之一。
/**
   *  @defgroup iterator_tags Iterator Tags
   *  These are empty types, used to distinguish different iterators.  The
   *  distinction is not made by what they contain, but simply by what they
   *  are.  Different underlying algorithms can then be used based on the
   *  different operations supported by different iterator types.
  */
  ///@{
  ///  Marking input iterators.
  struct input_iterator_tag { };

  ///  Marking output iterators.
  struct output_iterator_tag { };

  /// Forward iterators support a superset of input iterator operations.
  struct forward_iterator_tag : public input_iterator_tag { };

  /// Bidirectional iterators support a superset of forward iterator
  /// operations.
  struct bidirectional_iterator_tag : public forward_iterator_tag { };

  /// Random-access iterators support a superset of bidirectional
  /// iterator operations.
  struct random_access_iterator_tag : public bidirectional_iterator_tag { };

#if __cplusplus > 201703L
  /// Contiguous iterators point to objects stored contiguously in memory.
  struct contiguous_iterator_tag : public random_access_iterator_tag { };
#endif
  ///@}
  • 迭代器基本原型,注释指出其他XX迭代器都可以继承这个类,可以省点工作(其实就是少几个typedef)
  /**
   *  @brief  Common %iterator class.
   *
   *  This class does nothing but define nested typedefs.  %Iterator classes
   *  can inherit from this class to save some work.  The typedefs are then
   *  used in specializations and overloading.
   *
   *  In particular, there are no default implementations of requirements
   *  such as @c operator++ and the like.  (How could there be?)
  */
  template<typename _Category, typename _Tp, typename _Distance = ptrdiff_t,
           typename _Pointer = _Tp*, typename _Reference = _Tp&>
    struct iterator
    {
      /// One of the @link iterator_tags tag types@endlink.
      typedef _Category  iterator_category;
      /// The type "pointed to" by the iterator.
      typedef _Tp        value_type;
      /// Distance between iterators is represented as this type.
      typedef _Distance  difference_type;
      /// This type represents a pointer-to-value_type.
      typedef _Pointer   pointer;
      /// This type represents a reference-to-value_type.
      typedef _Reference reference;
    };
  • iterator_traits ,这个东西跟迭代器基本原型放在同一个文件里,对阅读来说是很友好的(说的就是你Clang)

  • 这里放出了GCC在不同C++版本里对 iterator_traits 的实现,注释指出,GCC在早期C++14上是缺失了SFINAE友好型的 iterator_traits 的,说人话就是早期没有利用好SFINAE法则来实现 iterator_traits ,导致对于非官方的迭代器类型(缺失了五元素中的某一些)不能够创建出 iterator_traits 实例。

  /**
   *  @brief  Traits class for iterators.
   *
   *  This class does nothing but define nested typedefs.  The general
   *  version simply @a forwards the nested typedefs from the Iterator
   *  argument.  Specialized versions for pointers and pointers-to-const
   *  provide tighter, more correct semantics.
  */
  template<typename _Iterator>
    struct iterator_traits;

#if __cplusplus >= 201103L
  // _GLIBCXX_RESOLVE_LIB_DEFECTS
  // 2408. SFINAE-friendly common_type/iterator_traits is missing in C++14
  template<typename _Iterator, typename = __void_t<>>
    struct __iterator_traits { };

#if ! __cpp_lib_concepts

  template<typename _Iterator>
    struct __iterator_traits<_Iterator,
			     __void_t<typename _Iterator::iterator_category,
				      typename _Iterator::value_type,
				      typename _Iterator::difference_type,
				      typename _Iterator::pointer,
				      typename _Iterator::reference>>
    {
      typedef typename _Iterator::iterator_category iterator_category;
      typedef typename _Iterator::value_type        value_type;
      typedef typename _Iterator::difference_type   difference_type;
      typedef typename _Iterator::pointer           pointer;
      typedef typename _Iterator::reference         reference;
    };
#endif // ! concepts

  template<typename _Iterator>
    struct iterator_traits
    : public __iterator_traits<_Iterator> { };

#else // ! C++11
  template<typename _Iterator>
    struct iterator_traits
    {
      typedef typename _Iterator::iterator_category iterator_category;
      typedef typename _Iterator::value_type        value_type;
      typedef typename _Iterator::difference_type   difference_type;
      typedef typename _Iterator::pointer           pointer;
      typedef typename _Iterator::reference         reference;
    };
#endif // C++11

C++11之前的版本, iterator_traits 实例化的时候要求模板类型必须定义了迭代器五元素,否则编译不过:

之后的版本则支持对于非标准迭代器,能够实例化出一个空的 iterator_traits

  • iterator_traits 对于标准指针类型的一些特化, remove_cv 这个 type traits 实际上出现的蛮早的,不知道为啥这里要等到C++17以后才把偏特化缩减到一个。
#if __cplusplus > 201703L
  /// Partial specialization for object pointer types.
  template<typename _Tp>
#if __cpp_concepts >= 201907L
    requires is_object_v<_Tp>
#endif
    struct iterator_traits<_Tp*>
    {
      using iterator_concept  = contiguous_iterator_tag;
      using iterator_category = random_access_iterator_tag;
      using value_type	      = remove_cv_t<_Tp>;
      using difference_type   = ptrdiff_t;
      using pointer	      = _Tp*;
      using reference	      = _Tp&;
    };
#else
  /// Partial specialization for pointer types.
  template<typename _Tp>
    struct iterator_traits<_Tp*>
    {
      typedef random_access_iterator_tag iterator_category;
      typedef _Tp                         value_type;
      typedef ptrdiff_t                   difference_type;
      typedef _Tp*                        pointer;
      typedef _Tp&                        reference;
    };

  /// Partial specialization for const pointer types.
  template<typename _Tp>
    struct iterator_traits<const _Tp*>
    {
      typedef random_access_iterator_tag iterator_category;
      typedef _Tp                         value_type;
      typedef ptrdiff_t                   difference_type;
      typedef const _Tp*                  pointer;
      typedef const _Tp&                  reference;
    };
#endif

advance

和侯捷老师书中基本一致,根据不同 iterator_category 选择不同实现以优化效率。

  template<typename _InputIterator, typename _Distance>
    inline _GLIBCXX14_CONSTEXPR void
    __advance(_InputIterator& __i, _Distance __n, input_iterator_tag)
    {
      // concept requirements
      __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
      __glibcxx_assert(__n >= 0);
      while (__n--)
	++__i;
    }

  template<typename _BidirectionalIterator, typename _Distance>
    inline _GLIBCXX14_CONSTEXPR void
    __advance(_BidirectionalIterator& __i, _Distance __n,
	      bidirectional_iterator_tag)
    {
      // concept requirements
      __glibcxx_function_requires(_BidirectionalIteratorConcept<
				  _BidirectionalIterator>)
      if (__n > 0)
        while (__n--)
	  ++__i;
      else
        while (__n++)
	  --__i;
    }

  template<typename _RandomAccessIterator, typename _Distance>
    inline _GLIBCXX14_CONSTEXPR void
    __advance(_RandomAccessIterator& __i, _Distance __n,
              random_access_iterator_tag)
    {
      // concept requirements
      __glibcxx_function_requires(_RandomAccessIteratorConcept<
				  _RandomAccessIterator>)
      if (__builtin_constant_p(__n) && __n == 1)
	++__i;
      else if (__builtin_constant_p(__n) && __n == -1)
	--__i;
      else
	__i += __n;
    }
  
  /**
   *  @brief A generalization of pointer arithmetic.
   *  @param  __i  An input iterator.
   *  @param  __n  The @a delta by which to change @p __i.
   *  @return  Nothing.
   *
   *  This increments @p i by @p n.  For bidirectional and random access
   *  iterators, @p __n may be negative, in which case @p __i is decremented.
   *
   *  For random access iterators, this uses their @c + and @c - operations
   *  and are constant time.  For other %iterator classes they are linear time.
  */
  template<typename _InputIterator, typename _Distance>
    inline _GLIBCXX17_CONSTEXPR void
    advance(_InputIterator& __i, _Distance __n)
    {
      // concept requirements -- taken care of in __advance
      typename iterator_traits<_InputIterator>::difference_type __d = __n;
      std::__advance(__i, __d, std::__iterator_category(__i));
    }

GCC的这个实现跟Clang有几个有意思的不同点(Clang的实现请见后文):

  • GCC的每个 __advance 额外用 __glibcxx_function_requires 检查了迭代器类型,但实际上我怀疑这个东西可能不是那么必要,因为第三个参数本来就是 iterator_traits 从入参迭代器中获得的,理论上是可信的;Clang就没有搞这套检查;

  • GCC这里直接用 difference_type 类型承接了外部传入的 __n ,进行了一波隐式转换,相比之下Clang似乎对这层转换的关注度更高,在模板定义时就指出必须满足整数转换关系,在实际转换时又用了显式转换;

  • GCC对于 random_access_iterator 的实现,额外考虑了内置指针类型的 ++ / -- ,个人没理解为啥要这么搞,难道会更高效?Clang就没有考虑这么多了;

  • GCC用 while 循环,Clang用 for 循环,这个小细节我站GCC~

distance

std::distance 的实现跟侯捷老师书中提到的没什么区别,就不展开了。在Clang的distance一节,会展开记录关于 distance 的其他细节。

  template<typename _InputIterator>
    inline _GLIBCXX14_CONSTEXPR
    typename iterator_traits<_InputIterator>::difference_type
    __distance(_InputIterator __first, _InputIterator __last,
               input_iterator_tag)
    {
      // concept requirements
      __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)

      typename iterator_traits<_InputIterator>::difference_type __n = 0;
      while (__first != __last)
	{
	  ++__first;
	  ++__n;
	}
      return __n;
    }

  template<typename _RandomAccessIterator>
    inline _GLIBCXX14_CONSTEXPR
    typename iterator_traits<_RandomAccessIterator>::difference_type
    __distance(_RandomAccessIterator __first, _RandomAccessIterator __last,
               random_access_iterator_tag)
    {
      // concept requirements
      __glibcxx_function_requires(_RandomAccessIteratorConcept<
				  _RandomAccessIterator>)
      return __last - __first;
    }

#if _GLIBCXX_USE_CXX11_ABI
  // Forward declaration because of the qualified call in distance.
  template<typename _Tp>
    ptrdiff_t
    __distance(_GLIBCXX_STD_C::_List_iterator<_Tp>,
	       _GLIBCXX_STD_C::_List_iterator<_Tp>,
	       input_iterator_tag);

  template<typename _Tp>
    ptrdiff_t
    __distance(_GLIBCXX_STD_C::_List_const_iterator<_Tp>,
	       _GLIBCXX_STD_C::_List_const_iterator<_Tp>,
	       input_iterator_tag);
#endif

  /**
   *  @brief A generalization of pointer arithmetic.
   *  @param  __first  An input iterator.
   *  @param  __last  An input iterator.
   *  @return  The distance between them.
   *
   *  Returns @c n such that __first + n == __last.  This requires
   *  that @p __last must be reachable from @p __first.  Note that @c
   *  n may be negative.
   *
   *  For random access iterators, this uses their @c + and @c - operations
   *  and are constant time.  For other %iterator classes they are linear time.
  */
  template<typename _InputIterator>
    inline _GLIBCXX17_CONSTEXPR
    typename iterator_traits<_InputIterator>::difference_type
    distance(_InputIterator __first, _InputIterator __last)
    {
      // concept requirements -- taken care of in __distance
      return std::__distance(__first, __last,
			     std::__iterator_category(__first));
    }

Clang

iterator_traits

  • 迭代器类型定义:
struct _LIBCPP_TEMPLATE_VIS input_iterator_tag {};
struct _LIBCPP_TEMPLATE_VIS output_iterator_tag {};
struct _LIBCPP_TEMPLATE_VIS forward_iterator_tag       : public input_iterator_tag {};
struct _LIBCPP_TEMPLATE_VIS bidirectional_iterator_tag : public forward_iterator_tag {};
struct _LIBCPP_TEMPLATE_VIS random_access_iterator_tag : public bidirectional_iterator_tag {};
#if _LIBCPP_STD_VER > 17
struct _LIBCPP_TEMPLATE_VIS contiguous_iterator_tag    : public random_access_iterator_tag {};
#endif
  • iterator traits(before cpp20):
// iterator_traits<Iterator> will only have the nested types if Iterator::iterator_category
//    exists.  Else iterator_traits<Iterator> will be an empty class.  This is a
//    conforming extension which allows some programs to compile and behave as
//    the client expects instead of failing at compile time.

template <class _Iter>
struct _LIBCPP_TEMPLATE_VIS iterator_traits
    : __iterator_traits<_Iter, __has_iterator_typedefs<_Iter>::value> {

  using __primary_template = iterator_traits;
};

可以看到这个类继承自内部实现的 __iterator_traits ,先看它的基类:

template <class _Iter, bool> struct __iterator_traits {};

template <class _Iter, bool> struct __iterator_traits_impl {};

template <class _Iter>
struct __iterator_traits_impl<_Iter, true>
{
    typedef typename _Iter::difference_type   difference_type;
    typedef typename _Iter::value_type        value_type;
    typedef typename _Iter::pointer           pointer;
    typedef typename _Iter::reference         reference;
    typedef typename _Iter::iterator_category iterator_category;
};

template <class _Iter>
struct __iterator_traits<_Iter, true>
    :  __iterator_traits_impl
      <
        _Iter,
        is_convertible<typename _Iter::iterator_category, input_iterator_tag>::value ||
        is_convertible<typename _Iter::iterator_category, output_iterator_tag>::value
      >
{};

对于 _iterator_traits_impl<_Iter, true> 实现了偏特化,说明外部传入的第二个参数是能够正确萃取出内容的关键;回到这个模板基类被实例化的地方,发现 iterator_traits 调用了另一个模板 __has_iterator_typedefs 来判断传入的迭代器是否实现了应有的类型定义(就是迭代器五元素):

template <class _Tp>
struct __has_iterator_typedefs
{
private:
    struct __two {char __lx; char __lxx;};
    template <class _Up> static __two __test(...);
    template <class _Up> static char __test(typename __void_t<typename _Up::iterator_category>::type* = 0,
                                            typename __void_t<typename _Up::difference_type>::type* = 0,
                                            typename __void_t<typename _Up::value_type>::type* = 0,
                                            typename __void_t<typename _Up::reference>::type* = 0,
                                            typename __void_t<typename _Up::pointer>::type* = 0);
public:
    static const bool value = sizeof(__test<_Tp>(0,0,0,0,0)) == 1;
};

这个东西用于判断一个类型是否满足迭代器的五元素要求,即是否定义了 iterator_category / difference_type / value_type / reference / pointer 这五个属性。它本质上也是对 SFINAE 法则的一个应用,可以看到这里的 __test 有两种重载,当外部传入的类型,即 _Tp ,定义了迭代器要求的全部五种属性,就会匹配到返回值为 char 的那一版 __test ;反之,匹配到返回值为 __two 的那一版 __test ,此时对返回值的 size 进行判断,即可表明 _Tp 是否实现了全部五种属性的定义。

这个实现看着还是很过瘾的,只能说写标准库的那帮人不愧是玩模板的神仙……

advance

iterator_traits 为迭代器规定了必须具备的五种基本元素,使得STL中的一些行为能够基于迭代器具体类型来得到最优实现。侯捷书中以 advancedistance 为例,我们亦延续其思路,看看当前的实现:

template <class _InputIter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX14
void __advance(_InputIter& __i, typename iterator_traits<_InputIter>::difference_type __n, input_iterator_tag) {
  for (; __n > 0; --__n)
    ++__i;
}

template <class _BiDirIter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX14
void __advance(_BiDirIter& __i, typename iterator_traits<_BiDirIter>::difference_type __n, bidirectional_iterator_tag) {
  if (__n >= 0)
    for (; __n > 0; --__n)
      ++__i;
  else
    for (; __n < 0; ++__n)
      --__i;
}

template <class _RandIter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX14
void __advance(_RandIter& __i, typename iterator_traits<_RandIter>::difference_type __n, random_access_iterator_tag) {
  __i += __n;
}

template <
    class _InputIter, class _Distance,
    class _IntegralDistance = decltype(_VSTD::__convert_to_integral(declval<_Distance>())),
    class = _EnableIf<is_integral<_IntegralDistance>::value> >
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX14
void advance(_InputIter& __i, _Distance __orig_n) {
  typedef typename iterator_traits<_InputIter>::difference_type _Difference;
  _Difference __n = static_cast<_Difference>(_VSTD::__convert_to_integral(__orig_n));
  _LIBCPP_ASSERT(__n >= 0 || __is_cpp17_bidirectional_iterator<_InputIter>::value,
                 "Attempt to advance(it, n) with negative n on a non-bidirectional iterator");
  _VSTD::__advance(__i, __n, typename iterator_traits<_InputIter>::iterator_category());
}

这份实现有几个要点:

  1. 通过 iterator_traits 提取出 iterator_category ,然后实例化一个临时对象作为内部实现的第三个参数,用以实现对不同类型迭代器选择不同的 advance 方式。这个手段在侯捷书中亦有详细介绍;

  2. std::advance 的实现,首先在模板中对 _Distance 类型进行了选择,要求必须可转为整数类型;其次是通过 _LIBCPP_ASSERT 规范了参数 __n ,要求对于 InputIterator 不能传入负数。(这里注意 RandomAccessIterator 也是一种 BiDirectionalIterator ,这就是继承关系的好处)。

distance

最后看看 std::distance 的实现:

template <class _InputIter>
inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX14
typename iterator_traits<_InputIter>::difference_type
__distance(_InputIter __first, _InputIter __last, input_iterator_tag)
{
    typename iterator_traits<_InputIter>::difference_type __r(0);
    for (; __first != __last; ++__first)
        ++__r;
    return __r;
}

template <class _RandIter>
inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX14
typename iterator_traits<_RandIter>::difference_type
__distance(_RandIter __first, _RandIter __last, random_access_iterator_tag)
{
    return __last - __first;
}

template <class _InputIter>
inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX14
typename iterator_traits<_InputIter>::difference_type
distance(_InputIter __first, _InputIter __last)
{
    return _VSTD::__distance(__first, __last, typename iterator_traits<_InputIter>::iterator_category());
}
  1. 对于 InputIterator / BidirectionalIterator 只能对 first 逐单位递增,最后算出 first ~ last 的距离;对于 RandomAccessIterator 则可以直接两者相减;

延伸:

这里的实现是有说法的,我们注意cppreference对 std::distance 的前提要求:

Returns the number of hops from first to last.

If InputIt is not LegacyRandomAccessIterator, the behavior is undefined if last is not reachable from first.

If InputIt is LegacyRandomAccessIterator, the behavior is undefined if first and last are neither reachable from each other.

注意 reachable 的定义:

An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to elements of the same sequence.

实际上标准工作组在 LWG940std::distance 的行为下了最终定义:

我们回归Clang STL的实现,会发现 std::distance 确实对于 RandomAccessIterator 可以返回负值,而对于其他类型迭代器,必须满足 first 可用 ++ 运算抵达 last ,程 序才不会出现未定义行为。

最后更新于 Dec 03, 2023
有朋自远方来,不亦说乎?