缓存带来的各种问题

在流计算中，缓存主要用于存储中间计算结果或临时数据，以便进行后续的计算操作，如JOIN操作[3]。然而，缓存在流计算中也可能带来一些问题：

内存压力和计算效率问题：流计算中的JOIN操作通常需要缓存整个数据集，但在流计算中，JOIN的对象是无限的数据流，这就导致了内存压力和计算效率的问题[2]。
数据边界问题：在流计算中，缓存的数据集需要有明确的边界，这个边界通常是时间窗口。但在实时流计算中，无法缓存所有数据，因此需要设定基于时间的清理条件以限制JOIN涉及的数据范围[2]。
资源问题：在REGULAR JOIN中，因为历史数据不会被清理，所以允许对输入表进行任意种类的更新操作（insert、update、delete）。然而，由于资源问题，REGULAR JOIN通常是不可持续的[2]。
缓冲消胀问题：如果缓冲消胀机制的预测不准，可能会导致没有足够的缓存数据来提供全量吞吐，或者有太多缓冲数据对checkpoint barrier推进或者非对齐的checkpoint的大小造成不良影响[6]。
状态管理问题：在流数据处理的过程中，可能需要处理事件窗口、时间乱序、多流关联等问题，在解决这些问题的过程中，通常会涉及到对部分流数据的临时缓存，并在处理完后将其清除。这就涉及到流数据状态的管理，如果管理不当，可能会导致数据处理的错误[1]。
数据一致性问题：由于流计算的实时性，数据可能在被缓存和使用之间发生变化，这可能导致数据的一致性问题。

以上就是缓存在流计算中可能带来的一些问题。

Citations: [1] https://cloud.tencent.com/developer/article/1624955 [2] https://developer.aliyun.com/article/739793 [3] https://blog.csdn.net/maoyuanming0806/article/details/122325320 [4] https://www.infoq.cn/article/xvmaffiul0kmo1wxajbh [5] https://developer.huawei.com/consumer/cn/forum/topic/0202775562683000448 [6] https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/memory/network_mem_tuning/ [7] https://www.infoq.cn/article/sjmrwwacjxc9c6*ea8rt [8] https://developer.aliyun.com/article/540778 [9] https://flink-learning.org.cn/article/detail/78ff81d85d022778205a0b84663446e0 [10] https://www.cnblogs.com/xuwc/p/14013060.html [11] https://blog.csdn.net/weixin_47894524/article/details/118659499 [12] https://www.whitewood.me/2020/01/16/Flink-DataStream-%E5%85%B3%E8%81%94%E7%BB%B4%E8%A1%A8%E5%AE%9E%E6%88%98/ [13] https://www.linux88.com/caching-risks/ [14] https://aws.amazon.com/cn/blogs/china/optimizing-flink-join-on-emr-with-alluxio/ [15] https://blog.csdn.net/perfect2011/article/details/127906574 [16] https://blog.sciencenet.cn/blog-3472670-1287445.html [17] https://blog.jrwang.me/2020/2020-01-05-flink-sourcecode-sql-stream-join/ [18] https://www.cnblogs.com/swordfall/p/10612404.html [19] https://developer.aliyun.com/article/770534 [20] https://tech.ipalfish.com/blog/2021/06/29/flink_practice/ [21] https://developer.volcengine.com/articles/7146114617494732813 [22] https://aws.amazon.com/cn/builders-library/caching-challenges-and-strategies/ [23] https://cloud.tencent.com/developer/article/2013323 [24] https://developer.aliyun.com/article/1375609

缓存数据可能带来以下问题：

数据一致性问题：缓存和数据库可能存在数据不一致的情况。例如，如果缓存是定时刷新的，那么在刷新周期内，缓存和数据库的数据可能会不一致。尽管有一些方案可以尽可能保证缓存和数据库的一致性，但在极端情况下，仍然可能出现数据不一致的问题[1]。
内存压力和计算效率问题：缓存数据需要占用内存空间，如果缓存的数据量过大，可能会导致内存压力，影响计算效率[2]。
数据丢失问题：如果出现中断，本地缓存中的数据可能会丢失，需要重新获取和存储，这会给缓存带来不利影响[2]。
安全问题：缓存数据可能存在安全风险，例如，外部缓存队列可能缺乏对序列化数据加密的机制和传输层安全性，这在缓存中保存有敏感用户信息时特别重要[3]。
惊群效应：在许多客户端大约在同一时间发起请求，而这些请求都需要同一个未缓存的下游资源的情况下，可能会出现“惊群效应”，这会对系统性能产生影响[3]。
缓存无法满足请求：在某些情况下，例如冷启动、缓存队列中断、流量模式变化或下游服务长时间中断等，可能会导致无法使用缓存数据来满足请求[3]。

Citations: [1] http://kaito-kidd.com/2021/09/08/how-to-keep-cache-and-consistency-of-db/ [2] https://aws.amazon.com/cn/caching/database-caching/ [3] https://aws.amazon.com/cn/builders-library/caching-challenges-and-strategies/ [4] https://cloud.tencent.com/developer/article/1525256 [5] https://aws.amazon.com/cn/caching/ [6] https://www.51cto.com/article/654020.html [7] https://xiaolincoding.com/redis/architecture/mysql_redis_consistency.html [8] https://tech.meituan.com/2017/03/17/cache-about.html [9] https://jaskey.github.io/blog/2022/04/14/cache-consistency/ [10] https://pdai.tech/md/db/nosql-redis/db-redis-x-cache.html [11] https://www.cnblogs.com/softwarearch/p/16834547.html [12] https://blog.csdn.net/NhnMacos/article/details/133395882 [13] https://developer.aliyun.com/article/935615 [14] https://cloud.tencent.com/developer/techpedia/1733 [15] https://cn.pingcap.com/article/post/2957.html [16] https://www.linux88.com/caching-risks/ [17] http://www.360doc.com/content/23/1208/16/82054882_1106746808.shtml [18] https://time.geekbang.org/column/article/295812 [19] https://blog.csdn.net/weixin_43824238/article/details/103869741 [20] https://www.51cto.com/article/718910.html

🌥️ 晓灰

瞅瞅

缓存带来的各种问题

翻一翻

反向链接