1.2 Why HTAP Matters (Introduction to HTAP Databases)

Speaker

Xiaoyu Ma (马晓宇)

Senior Technical Director - Real-time analytics and SQL meta

Tech lead@Quantcast / Netease / PingCAP

Big Data / Distributed database

Before we begin

  • Context: As the need for real-time analytics and HTAP grows, this session introduces the concept of HTAP.
  • Goal: After this session, the audience will have a basic idea of what HTAP is.
  • Outline:
    • Database evolution
    • What the term HTAP means
    • Why HTAP is needed and how it helps you
    • TiDB HTAP architecture
    • Real-world scenarios
  • Lab requirements (if needed): N/A

Part I: What is HTAP

  • An overview of HTAP
  • Goal
  • Subtopics
    • What is HTAP
    • AP & TP Databases
    • Why HTAP and how it helps you
    • Technical difficulties
  • Key points
  • Review of goal

What is HTAP

  • Term coined by Gartner
  • HTAP is a very simple concept: hybrid transactional and analytical processing in one system
  • TP = Transaction Processing
    • Row format, updated in real time
    • High concurrency and consistency; each operation touches only a few rows
    • Current data
  • AP = Analytical Processing
    • Columnar format, updated in batches
    • Low concurrency; each query processes a large batch of data
    • Historical data

HTAP is a term coined by Gartner, a well-known market analysis and research firm.
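The row versus columnar distinction in the list above can be made concrete with a small sketch. This is a toy illustration only, not how any real database lays out data: the `orders` data and the helper names (`to_columns`, `update_amount_row_store`, `total_amount_column_store`) are hypothetical and chosen for this example.

```python
# Toy illustration: the same three orders in a row layout and a columnar layout.
# All names and data here are hypothetical.

rows = [
    {"id": 1, "customer": "a", "amount": 10},
    {"id": 2, "customer": "b", "amount": 25},
    {"id": 3, "customer": "a", "amount": 7},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def update_amount_row_store(rows, order_id, new_amount):
    """TP-style point update: locate one record and change it in place."""
    for row in rows:
        if row["id"] == order_id:
            row["amount"] = new_amount
            return True
    return False


def total_amount_column_store(columns):
    """AP-style aggregate: scan a single column and ignore the others."""
    return sum(columns["amount"])


columns = to_columns(rows)
```

In the row layout, a point update reads and writes one complete record, which suits workloads that touch a few rows at a time. In the columnar layout, an aggregate reads only the `amount` list, which suits queries that scan many rows but few columns.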

Traditional data platform

121TraditionalDataPlatform.png

Data in the data warehouse is not updated promptly, and the architecture is complex.

Why HTAP

The boundary between TP and AP is now blurry

  • TP-ish AP use cases
    • Comprehensive query platforms that serve reports and highly concurrent short queries at the same time
  • AP-ish TP use cases
    • Analyzing and optimizing online transactional business in real time
    • Real-time data services across business units (BUs)

How HTAP helps you

  • Where HTAP databases shine
    • Simplify architecture
    • Lower maintenance cost
    • Empower real-time scenarios
    • Improve business agility

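HTAP simplifies the architecture, lowers operations and maintenance costs, and supports real-time analytics and decision-making.

Case study: sales data platform

122SalesDataPlatform.png

This platform must provide both TP and AP capabilities.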

Difficulties

  • Meeting the requirements from both sides is hard
    • Scalability
      • Building a distributed AP database is relatively easy; building a distributed TP database is hard
    • TP/AP at the same time
      • Supporting both storage forms
      • Avoiding workload interference
    • Seamless integration
      • Data synchronization
      • Fresh data

Part II: How HTAP helps you

  • An overview of TiDB HTAP
  • Goal
  • Subtopics
    • TiDB HTAP introduction
    • Real-world scenarios
  • Key points
  • Review of goal

TiDB HTAP

  • A scalable database
  • Built for strict transactional use cases
  • Proven in core financial business systems
  • Equipped with powerful analytical engines
  • Natural fit for data hubs and real-time data applications

What's new in TiDB 4.0 HTAP

  • Real-time updatable columnar engine
  • Scalable row-wise and columnar engines
    • Separate machines, no interference
    • Consistent data replication
  • Vectorized engine
  • Smart selection between row and column formats

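An updatable columnar storage engine has been added.
The row-store and column-store engines can use separate server resources, so they do not interfere with each other.
Data is replicated from the row store to the column store consistently (and asynchronously).
The optimizer automatically chooses between the row store and the column store.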
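The points above (synchronous row writes, asynchronous replication to a columnar replica, consistent analytical reads, and engine selection) can be sketched as a toy model. This is a minimal sketch under stated assumptions, not TiDB's implementation: `ToyHTAPStore`, `choose_engine`, and the row-count threshold are hypothetical, and "waiting for the replica to catch up" is simulated by applying the pending log entries inline.

```python
class ToyHTAPStore:
    """Hypothetical model: a row store plus an asynchronously fed columnar replica."""

    def __init__(self):
        self.log = []         # ordered change log of (key, value) pairs
        self.row_store = {}   # key -> value, updated on every write
        self.col_pos = {}     # key -> position in col_values
        self.col_values = []  # the replicated "value" column
        self.applied = 0      # how many log entries the replica has applied

    def write(self, key, value):
        """TP path: update the row store and append to the log; the replica lags."""
        self.log.append((key, value))
        self.row_store[key] = value

    def point_get(self, key):
        """TP path: read one record from the row store."""
        return self.row_store[key]

    def replicate(self, up_to=None):
        """Apply pending log entries to the columnar replica, in log order."""
        target = len(self.log) if up_to is None else min(up_to, len(self.log))
        while self.applied < target:
            key, value = self.log[self.applied]
            if key in self.col_pos:
                self.col_values[self.col_pos[key]] = value  # in-place update
            else:
                self.col_pos[key] = len(self.col_values)
                self.col_values.append(value)
            self.applied += 1

    def analytical_sum(self):
        """AP path: catch up to the log position seen at read time, then scan."""
        read_index = len(self.log)
        self.replicate(up_to=read_index)  # assumption: stands in for waiting
        return sum(self.col_values)


def choose_engine(estimated_rows, threshold=1000):
    """Hypothetical rule of thumb: small lookups use rows, large scans use columns."""
    return "row" if estimated_rows < threshold else "column"
```

The design point the sketch tries to capture is that replication can be asynchronous while reads stay consistent: an analytical read first records how far the log has advanced and only scans once the replica has applied everything up to that position. A real optimizer would compare estimated costs rather than apply a fixed threshold.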

TiDB 4.0 architecture

123NewTiDB4.0Architecture.png

Real-world case 1: a one-stop TP + AP application

124TPAPOneStop.png

The architecture is simplified: one system replaces two, and the data stays fresh.

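Real-world case 2: real-time data warehouse

125RealTimeDW.png

The system absorbs data changes from different business systems and supports real-time business analytics.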

Comprehensive data platform

126ComprehensiveDataPlatform.png

TiSpark can span multiple data platforms.