[Huawei Cloud MySQL Technical Column] An Analysis of the MySQL open_table Flow

Posted by GaussDB 数据库 on 2024/11/28 10:12:05

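1. Background

Metadata is data that describes data. Without metadata, the data stored in a database can be neither interpreted nor used.

The data dictionary (DD) is the MySQL module that maintains metadata. In practice, however, MySQL processes table data by interacting with the storage engine through the table definition object TABLE. Each TABLE holds a handler that identifies the engine object in use. The TABLE object also stores the information needed to operate on data, such as table and field properties, metadata locks, and record lookup results. Each time an SQL statement is executed, MySQL walks the list of tables involved (TABLE_LIST, for example when several tables are joined) and opens them one by one (open_table), ultimately obtaining the information needed to build each TABLE from Table_impl in the data dictionary.

This article, based on MySQL 8.0.22, describes the open_table flow and the table definition cache implementation behind it in detail.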

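2. Table Definition Cache

To reduce the cost of fetching a table's DD information and to avoid building the TABLE object from the DD module every time, both the Server layer and the InnoDB layer cache table definitions, and the two caches are managed independently.

2.1 Key Data Structures

2.1.1 TABLE_SHARE

Each table corresponds to one TABLE_SHARE. During open_table, the Server layer stores the table name, database name, all column information, column default values, table character set, and similar information in a TABLE_SHARE struct. Put simply, TABLE_SHARE is the Server layer's instantiated table definition.

A TABLE_SHARE object carries a reference count and version information; the version information is incremented each time a FLUSH operation is executed.

TABLE_SHARE is defined as follows: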

struct TABLE_SHARE { 
  ... 
  TABLE_SHARE *next{nullptr}, **prev{nullptr}; /* Link to unused shares */ 
  /** 
    Array of table_cache_instances pointers to elements of table caches respresenting this table in each of table_cache instances. 
  */ 
  Table_cache_element **cache_element{nullptr};
  Field **field{nullptr}; 
  KEY *key_info{nullptr}; /* data of keys defined for the table */ 
  LEX_CSTRING table_cache_key{nullptr, 0}; 
  LEX_CSTRING db{nullptr, 0}; /* Pointer to db */ 
  LEX_CSTRING table_name{nullptr, 0}; /* Table name (for open) */ 
  LEX_STRING path{nullptr, 0}; /* Path to .frm file (from datadir) */ 
  ulong mysql_version{0}; /* 0 if .frm is created before 5.0 */ 
  ulong reclength{0}; /* Recordlength */ 
  ... 
  // How many TABLE objects use this TABLE_SHARE. 
  unsigned int m_ref_count{0}; 
  unsigned long m_version{0}; 
  ... 
 };

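2.1.2 TABLE_SHARE Cache

The TABLE_SHARE cache consists of a hash table (table_def_cache) and a list of unused objects (oldest_unused_share). The hash table is keyed by table name and caches TABLE_SHARE objects, while TABLE_SHARE objects that are not in use are linked together through the oldest_unused_share list.

The table_def_cache cache and the oldest_unused_share list are defined as follows: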

using Table_definition_cache = 
    malloc_unordered_map<std::string, 
                         std::unique_ptr<TABLE_SHARE, Table_share_deleter>>; 
Table_definition_cache *table_def_cache; 
static TABLE_SHARE *oldest_unused_share, end_of_unused_share;

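The organization of the table_def_cache cache and the oldest_unused_share list is shown in Figure 1:

Figure 1. Structure of the table_def_cache cache and the oldest_unused_share list

During open_table, when a session cannot find the TABLE_SHARE object in table_def_cache, it reads the DD to create one and adds it to table_def_cache.

When the reference count of a TABLE_SHARE object drops to 0 (that is, no TABLE object uses it), the object is moved to the oldest_unused_share list. It is not deleted at this point and remains in the cache, so a new session that opens the table can reuse it directly.

A TABLE_SHARE object is deleted only after the table structure has been modified, or when table_def_cache is full, in which case objects on oldest_unused_share are evicted and freed first.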
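The lifecycle described above (build on a miss, park on an unused list at reference count 0, evict unused entries first when the cache is full) can be sketched in a few lines. This is a simplified illustration, not the MySQL implementation: ShareSketch, ShareCacheSketch, the dd_reads counter, and the plain "db.table" string keys are all hypothetical names, and locking and version handling are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for TABLE_SHARE; only the fields this sketch needs.
struct ShareSketch {
  std::string key;         // cache key (illustrative plain string)
  unsigned ref_count = 0;  // number of users of this share
};

class ShareCacheSketch {
 public:
  explicit ShareCacheSketch(std::size_t capacity) : capacity_(capacity) {}

  // Look up a share, build it on a miss, and take a reference on it.
  ShareSketch *acquire(const std::string &key) {
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      evict_unused_if_full();
      auto share = std::make_unique<ShareSketch>();
      share->key = key;
      it = cache_.emplace(key, std::move(share)).first;
      ++dd_reads;  // stands in for "read the DD and build the share"
    } else if (it->second->ref_count == 0) {
      unused_.remove(it->second.get());  // reused, so no longer unused
    }
    ++it->second->ref_count;
    return it->second.get();
  }

  // Drop a reference; at zero the share is parked, not deleted.
  void release(ShareSketch *share) {
    assert(share->ref_count > 0);
    if (--share->ref_count == 0) unused_.push_back(share);
  }

  bool contains(const std::string &key) const { return cache_.count(key) != 0; }
  std::size_t size() const { return cache_.size(); }
  std::size_t unused_count() const { return unused_.size(); }

  int dd_reads = 0;  // hypothetical counter, for illustration only

 private:
  // Evict the oldest unused shares while the cache is at capacity.
  // If every share is in use, the limit is simply exceeded (soft limit).
  void evict_unused_if_full() {
    while (cache_.size() >= capacity_ && !unused_.empty()) {
      const std::string victim_key = unused_.front()->key;  // copy before erase
      unused_.pop_front();
      cache_.erase(victim_key);
    }
  }

  std::size_t capacity_;
  std::unordered_map<std::string, std::unique_ptr<ShareSketch>> cache_;
  std::list<ShareSketch *> unused_;  // oldest unused share at the front
};
```

In this sketch, releasing the last reference leaves the share in the hash table, so a later acquire of the same key returns the same object without another DD read; only an acquire of a new key at capacity removes it.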

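2.1.3 TABLE

After obtaining the TABLE_SHARE object (all sessions accessing the same table share one TABLE_SHARE object), each session creates a TABLE object and owns it exclusively while using it. A Server-layer session interacts with the engine layer through the TABLE object's handler in order to operate on the physical table files. The TABLE object can therefore be seen as the Server layer's mapping of a table, and the handler as the handle it creates in the engine layer for operating on the underlying data files.

The TABLE struct describes the table, as shown below: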

struct TABLE { 
  TABLE_SHARE *s{nullptr}; 
  handler *file{nullptr}; 
  TABLE *next{nullptr}, *prev{nullptr};
 
 private: 
  /** 
    Links for the lists of used/unused TABLE objects for the particular
    table in the specific instance of Table_cache. 
  */ 
  TABLE *cache_next{nullptr}, **cache_prev{nullptr}; 

 public: 
  THD *in_use{nullptr}; /* Which thread uses this */ 
  Field **field{nullptr}; /* Pointer to fields */ 
  ... 
};

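TABLE_SHARE *s: DD information related to the table definition, such as the fields the table contains;

handler *file: pointer to the storage engine interface used by the table;

next/prev: two TABLE pointers. They link all TABLE objects currently being operated on by the current thread (THD), which together form the THD::open_tables list.

2.1.4 TABLE Cache

The MySQL Server layer manages TABLE objects mainly through the TABLE cache. The structures involved are defined as follows: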

class Table_cache_element { 
  /* 
    Doubly-linked (back-linked) lists of used and unused TABLE objects 
    for this table in this table cache (one such list per table cache). 
  */ 
  typedef I_P_List< 
      TABLE, I_P_List_adapter<TABLE, &TABLE::cache_next, &TABLE::cache_prev>> 
      TABLE_list; 
  TABLE_list used_tables; 
  TABLE_list free_tables; 
  TABLE_SHARE *share; 
  ... 
};

class Table_cache { 
  ... 
  /** 
    The hash of table_cache_element objects, each table/table share that 
    has any TABLE object in the table_cache has a table_cache_element from 
    which the list of free TABLE objects in this table cache AND the list 
    of used TABLE objects in this table cache is stored. 
    We use table_cache_element::share::table_cache_key as key for this hash. 
  */ 
  std::unordered_map<std::string, std::unique_ptr<Table_cache_element>> m_cache; 
  /** 
    List that contains all TABLE instances for tables in this particular 
    table cache that are in not use by any thread. 
  */ 
  TABLE *m_unused_tables; 
  /** 
    Total number of TABLE instances for tables in this particular table 
    cache (both in use by threads and not in use). 
  */ 
  uint m_table_count; 
  ...
};

class Table_cache_manager { 
  ... 
  /** 
    An array of table_cache instances. 
    Only the first table_cache_instances elements in it are used. 
  */ 
  Table_cache m_table_cache[MAX_TABLE_CACHES]; 
  ... 
}; 
extern Table_cache_manager table_cache_manager;

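The organization of the TABLE cache is shown in Figure 2:

Figure 2. Structure of the TABLE cache

• Table_cache_manager is the collection of all Table_cache instances and exists as a global singleton. To improve concurrency, sessions are sharded across the Table_cache instances by THD::m_thread_id.

• In a Table_cache, m_unused_tables caches all idle TABLE objects, and m_cache is a hash table built as [key: "database_name + table_name", value: Table_cache_element]. Each Table_cache_element corresponds to exactly one TABLE_SHARE and independently manages all in-use and idle TABLE objects under that TABLE_SHARE. Because Table_cache is sharded by thread_id, the same TABLE_SHARE may be referenced by Table_cache_element objects in several Table_cache instances.

• A Table_cache_element represents all TABLE instances built for one table in the session's Table_cache. Some are in use (maintained by used_tables and also recorded in THD::open_tables), and some have been released and are cached for later use (managed by free_tables) to improve the cache hit rate.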
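To make the sharding concrete, the following minimal sketch shows how sessions could map to cache instances and why one table can end up with an element in several of them. ManagerSketch, TableCacheShard, ElementSketch, and the helper shards_holding are hypothetical names, and the value 64 for the array size is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Upper bound on cache instances; the value 64 is an assumption here.
constexpr std::size_t kMaxTableCaches = 64;

// Hypothetical per-table bookkeeping inside one cache instance.
struct ElementSketch {
  int used = 0;        // TABLE objects currently in use
  int free_count = 0;  // TABLE objects cached for reuse
};

// Hypothetical stand-in for one Table_cache instance.
struct TableCacheShard {
  std::unordered_map<std::string, ElementSketch> elements;
};

class ManagerSketch {
 public:
  explicit ManagerSketch(std::size_t instances) : instances_(instances) {
    assert(instances_ > 0 && instances_ <= kMaxTableCaches);
  }

  // Sessions are spread over the instances by thread id modulo the count.
  std::size_t shard_index(unsigned long thread_id) const {
    return thread_id % instances_;
  }

  TableCacheShard &shard_for(unsigned long thread_id) {
    return shards_[shard_index(thread_id)];
  }

  // Number of instances that hold an element for the given table key.
  std::size_t shards_holding(const std::string &key) const {
    std::size_t n = 0;
    for (std::size_t i = 0; i < instances_; ++i) {
      if (shards_[i].elements.count(key) != 0) ++n;
    }
    return n;
  }

 private:
  std::size_t instances_;
  std::array<TableCacheShard, kMaxTableCaches> shards_;
};
```

With 16 instances, thread ids 5 and 21 land in the same instance while thread id 22 lands in another, so two sessions opening the same table can each create their own element for it.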

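When MySQL runs open_table, its access to the TABLE cache can be summarized as follows:

1) Find the Table_cache to use from the session's m_thread_id, computed as m_thread_id modulo table_cache_instances (that is, m_thread_id % table_cache_instances);

2) Look up the Table_cache_element by table name in the Table_cache hash table. If it exists, go to 3); otherwise go to 4);

3) Take a TABLE from the free_tables of the Table_cache_element and return it, adjusting the free_tables/used_tables lists of the Table_cache_element accordingly;

4) Create a new TABLE and add it to the used_tables list of the corresponding Table_cache_element.

When a session closes a table, the TABLE object is moved from the used_tables list to the free_tables list instead of being deleted and freed immediately. As with TABLE_SHARE, a TABLE object is deleted only after the table structure has been modified, or when the Table_cache is full, in which case objects on m_unused_tables are evicted and freed first.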
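Steps 2) to 4) and the close path can be sketched for a single cache instance as below. This is an illustrative model only: TableSketch, TableCacheSketch, and the created counter are hypothetical, eviction through m_unused_tables is not modeled, and an existing element with an empty free list is assumed to fall through to creating a new TABLE.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for TABLE.
struct TableSketch {
  std::string key;
  bool in_use = false;
};

class TableCacheSketch {
 public:
  // Reuse a free TABLE for this key if one exists, otherwise create one.
  TableSketch *open(const std::string &key) {
    Element &el = cache_[key];  // creates an empty element on first use
    if (!el.free_tables.empty()) {
      // Step 3: move one TABLE from free_tables to used_tables.
      el.used_tables.splice(el.used_tables.begin(), el.free_tables,
                            el.free_tables.begin());
    } else {
      // Step 4: build a new TABLE and register it as used.
      el.used_tables.push_front(std::make_unique<TableSketch>());
      el.used_tables.front()->key = key;
      ++created;
    }
    TableSketch *table = el.used_tables.front().get();
    table->in_use = true;
    return table;
  }

  // Closing moves the TABLE back to free_tables; nothing is destroyed.
  void close(TableSketch *table) {
    Element &el = cache_.at(table->key);
    for (auto it = el.used_tables.begin(); it != el.used_tables.end(); ++it) {
      if (it->get() == table) {
        table->in_use = false;
        el.free_tables.splice(el.free_tables.begin(), el.used_tables, it);
        return;
      }
    }
    assert(false && "table was not open in this cache");
  }

  std::size_t used_count(const std::string &key) const {
    auto it = cache_.find(key);
    return it == cache_.end() ? 0 : it->second.used_tables.size();
  }
  std::size_t free_count(const std::string &key) const {
    auto it = cache_.find(key);
    return it == cache_.end() ? 0 : it->second.free_tables.size();
  }

  int created = 0;  // hypothetical counter of TABLE objects built

 private:
  struct Element {
    std::list<std::unique_ptr<TableSketch>> used_tables;
    std::list<std::unique_ptr<TableSketch>> free_tables;
  };
  std::unordered_map<std::string, Element> cache_;
};
```

Opening, closing, and reopening the same key returns the same object without building a second one; a second concurrent open of that key finds the free list empty and builds a new object.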

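2.1.5 dict_table_t

InnoDB builds a dict_table_t object by reading records from the metadata tables; it is the InnoDB layer's instantiated table definition. Only one such object exists per table (for a partitioned table, one per partition). Once no TABLE references it, the object becomes eligible for release.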

struct dict_table_t { 
  ... 
  /** Id of the table. */ 
  table_id_t id; 
  /** Table name. */ 
  table_name_t name; 
  /** Array of column descriptions. */ 
  dict_col_t *cols; 
  /** Array of virtual column descriptions. */ 
  dict_v_col_t *v_cols; 
  /** List of indexes of the table. */ 
  UT_LIST_BASE_NODE_T(dict_index_t) indexes; 
  /** metadata version number of dd::Table::se_private_data() */ 
  uint64_t version; 
  /** Approximate number of rows in the table. We periodically calculate new estimates. */ 
  ib_uint64_t stat_n_rows; 
  /** Count of how many handles are opened to this table. */ 
  std::atomic<uint64_t> n_ref_count; 
  ... 
};

2.1.6 dict_table_t Cache

dict_table_t objects are cached in dict_sys_t, whose size is the same as that of the TABLE_SHARE cache. dict_sys_t contains two hash tables: table_hash, indexed by name, and table_id_hash, indexed by id.

struct dict_sys_t { 
  ... 
  hash_table_t *table_hash; /*!< hash table of the tables, based on name */ 
  hash_table_t *table_id_hash; /*!< hash table of the tables, based on id */ 
  ... 
}; 
/** the dictionary system */ 
dict_sys_t *dict_sys = nullptr;
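The idea of indexing one set of objects through two hash tables can be sketched as follows. This is not InnoDB code: DictTableSketch and DictSysSketch are hypothetical, standard containers replace hash_table_t, and the "test/t1" style names in the usage are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for dict_table_t.
struct DictTableSketch {
  std::uint64_t id;
  std::string name;
};

class DictSysSketch {
 public:
  // Register one table object under both its name and its id.
  DictTableSketch *add(std::uint64_t id, const std::string &name) {
    assert(by_name_.count(name) == 0 && by_id_.count(id) == 0);
    auto owned = std::make_unique<DictTableSketch>(DictTableSketch{id, name});
    DictTableSketch *raw = owned.get();
    by_id_[id] = raw;                   // non-owning index by id
    by_name_[name] = std::move(owned);  // owning index by name
    return raw;
  }

  DictTableSketch *find_by_name(const std::string &name) const {
    auto it = by_name_.find(name);
    return it == by_name_.end() ? nullptr : it->second.get();
  }

  DictTableSketch *find_by_id(std::uint64_t id) const {
    auto it = by_id_.find(id);
    return it == by_id_.end() ? nullptr : it->second;
  }

  // Remove the object from both indexes so neither one dangles.
  void remove(const std::string &name) {
    auto it = by_name_.find(name);
    if (it == by_name_.end()) return;
    by_id_.erase(it->second->id);
    by_name_.erase(it);
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<DictTableSketch>> by_name_;
  std::unordered_map<std::uint64_t, DictTableSketch *> by_id_;
};
```

Both lookups return the same pointer for a cached table. The design point the sketch highlights is that insertion and removal must always update both indexes together.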

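2.2 Server-Layer Cache Structure

The table definition cache structure of the Server layer is shown in Figure 3:

Figure 3. Structure of the Server-layer table definition cache

A session uses the THD ID and the table name to fetch TABLE and TABLE_SHARE objects from table_cache_manager. If no usable TABLE_SHARE object is found there, it is fetched from table_def_cache. If table_def_cache does not hold the TABLE_SHARE either, the table definition is read from the DD to build the TABLE_SHARE object.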
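The fall-through order described above can be sketched as a three-tier lookup that reports which tier satisfied an open. The names Source and ServerCachesSketch are hypothetical, the tiers are reduced to counters and a set, and sharding, locking, and eviction are deliberately omitted.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Which tier satisfied an open request.
enum class Source { kTableCache, kShareCache, kDataDictionary };

class ServerCachesSketch {
 public:
  Source open(const std::string &key) {
    // Tier 1: a free TABLE object is already cached for this table.
    int &free_tables = free_tables_[key];
    if (free_tables > 0) {
      --free_tables;
      return Source::kTableCache;
    }
    // Tier 2: no free TABLE, but the share is cached, so a new TABLE
    // can be built from it without touching the DD.
    if (shares_.count(key) != 0) return Source::kShareCache;
    // Tier 3: nothing cached; load the definition and cache the share.
    shares_.insert(key);
    ++dd_loads_;
    return Source::kDataDictionary;
  }

  // Closing a table returns its TABLE object to the free pool.
  void close(const std::string &key) {
    assert(shares_.count(key) != 0);  // only opened tables can be closed
    ++free_tables_[key];
  }

  int dd_loads() const { return dd_loads_; }

 private:
  std::map<std::string, int> free_tables_;  // free TABLE objects per table
  std::set<std::string> shares_;            // tables with a cached share
  int dd_loads_ = 0;
};
```

In this model the first open of a table reaches the data dictionary, a second concurrent open is served from the share cache, and an open that follows a close is served from the table cache.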

3. Analysis of the open_table Flow

3.1 Source Code Walkthrough

The main processing flow of the open_table function is as follows:

open_table
|--> open_table_get_mdl_lock // acquire the MDL lock
|--> check_if_table_exists // check whether the table exists. When a table is being created it does not exist yet, so this produces one empty read that passes all the way through the DD module down to the storage engine (at that point the relevant DD information has not been built yet)
|    |--> dd::table_exists
|    |    |--> client->acquire // fetch the metadata object from the cache by key. The overall order is first-level local cache -> second-level shared cache -> storage engine
|    |         |--> acquire_uncommitted(key, &uncommitted_object, &dropped) // first-level cache lookup
|    |         |--> m_registry_committed.get(key, &element);
|    |         |--> Shared_dictionary_cache::get // read from the second-level shared cache
|    |              |--> Shared_dictionary_cache::get_uncached // read the system tables from disk
|    |                   |--> Storage_adapter::get
|    |                        |--> Open_dictionary_tables_ctx::open_tables() // call the Server-layer interface to open all tables
|    |                        |--> Raw_table::find_record() // call the handler interface directly to look up the record by the given key (for example the table name)
|    |--> ha_check_if_table_exists
|--> Table_cache *tc = table_cache_manager.get_cache(thd)
|--> table = tc->get_table(thd, key, key_length, &share); // fetch the TABLE and TABLE_SHARE objects from the cache
|--> if (table) goto table_found; // a TABLE object was obtained
|--> if (share) goto share_found; // a table_share was found
|--> else // neither table nor table_share was found, so the table_share must be built
|    |--> get_table_share_with_discover
|         |--> get_table_share
|              |--> table_def_cache->find // if found in table_def_cache, goto share_found
|              |--> alloc_table_share // if not found, a TABLE_SHARE has to be created
|              |--> client->acquire // fetch the table definition from the DD
|              |--> open_table_def // fill in the TABLE_SHARE from the table definition in the DD
|                   |--> fill_share_from_dd
|                   |--> fill_columns_from_dd
|                   |--> fill_indexes_from_dd
|                   |--> fill_partitioning_from_dd
|                   |--> fill_foreign_keys_from_dd
|                   |--> fill_check_constraints_from_dd
|--> share_found:
|    |--> open_table_from_share // open the TABLE and the dict_table_t through the TABLE_SHARE
|    |    |--> outparam->file = get_new_handler // create the handler object for the TABLE instance
|    |    |--> ha_open
|    |         |--> ha_innobase::open
|    |              |--> dict_table_check_if_in_cache_low // first look in the hash table to see whether it is cached
|    |              |--> if (!cached) ib_table = dd_open_table // if not found, open or load it from the DD
|    |                   |--> dd_open_table_one
|    |                        |--> dd_fill_dict_table // instantiate dict_table_t from dd::Table or dd::Partition
|    |                        |--> dd_fill_dict_index // instantiate the index-related metadata
|    |--> Table_cache *tc = table_cache_manager.get_cache(thd);
|    |--> tc->add_used_table(thd, table) // add the TABLE to the Table_cache and its used_tables list
|--> table_found:

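3.2 Flow Analysis

From the open_table code above:

open_table first acquires the corresponding MDL lock so that, under concurrent DDL/DML, the table's metadata and the table definition caches of the Server layer and the InnoDB layer stay consistent.

Obtaining the TABLE_SHARE and building the TABLE object in the Server layer also involves several cache levels. First, Table_cache_manager caches TABLE objects and tracks all TABLE objects that are in use or were opened earlier. On a miss in Table_cache_manager (the required TABLE object is not found), the Table_definition_cache is searched for a TABLE_SHARE object. On a further miss in Table_definition_cache, the DD metadata is read from the InnoDB layer to build the TABLE_SHARE object.

The InnoDB layer has its own table definition cache, managed by the global dict_sys_t, with one dict_table_t object per table. dict_table_t objects are built by reading records from the metadata tables, and dict_sys_t keeps two dict_table_t hash tables, keyed by table name and by table id. On a miss in dict_sys_t, the DD metadata is read to build the dict_table_t object.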

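4. Summary

This article described MySQL's table definition cache implementation and walked through the open_table flow. open_table is essentially the process of building in-memory objects for user tables (or other objects such as views) by accessing DD metadata. To improve performance, MySQL implements table definition caches in both the Server layer and the InnoDB layer, reducing the cost of accessing the DD directly.

The open_table code has many implementation details and space here is limited; we hope this article serves as a reference that helps interested readers study this part of the source further.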

