DuckDB Streaming Data Lakes, PostgreSQL 19 REPACK CONCURRENTLY, & AI Framework

Database · 2026-05-28

This week, we look at data inlining in DuckDB's DuckLake, which makes streaming into data lakes practical by sidestepping the small files problem. We also cover PostgreSQL 19's new `REPACK CONCURRENTLY` for non-blocking table rewrites, and a practical framework for using Postgres as an execution environment for AI workloads.

Data Inlining in DuckLake: Unlocking Streaming for Data Lakes (DuckDB Blog)

DuckDB's new data inlining feature in its DuckLake ecosystem offers a different approach to continuous streaming into data lakes. Instead of writing each small batch of changes as its own data file, DuckLake embeds these small updates directly in the catalog, which addresses the pervasive "small files problem" in data lake architectures. Because small changes are stored alongside the metadata rather than as a growing pile of tiny files, the overhead of frequent, minor modifications drops sharply. The blog post reports a benchmark showing a 926x performance improvement for these streaming scenarios, which makes real-time ingestion into a data lake not just feasible but efficient. This positions DuckDB as a strong tool for data pipelines that need low-latency updates and high throughput, a common bottleneck in data lake architectures.

This is a game-changer for anyone dealing with data lakes and streaming ingestion: finally, a practical solution to the small files problem without complex compaction jobs.
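
To make the mechanism concrete, here is a toy Python model of the idea. It is not DuckLake's actual API or storage format. It assumes a hypothetical `inline_row_limit` threshold: batches at or below it are recorded as rows in the catalog, larger batches become their own data file, and a hypothetical `flush` step writes the accumulated inlined rows out as one file. When and how DuckLake itself moves inlined data into Parquet is governed by its own settings, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyLake:
    """Toy model of data inlining; NOT DuckLake's real API or on-disk format."""

    inline_row_limit: int = 10  # hypothetical threshold for inlining a batch
    inlined_rows: List[int] = field(default_factory=list)  # rows kept in the catalog
    data_files: List[List[int]] = field(default_factory=list)  # one list per data file

    def insert(self, rows: List[int]) -> None:
        if len(rows) <= self.inline_row_limit:
            # Small batch: store it as catalog rows instead of creating a new file.
            self.inlined_rows.extend(rows)
        else:
            # Large batch: write it as its own data file.
            self.data_files.append(list(rows))

    def flush(self) -> None:
        # Write all accumulated inlined rows out as a single data file.
        if self.inlined_rows:
            self.data_files.append(self.inlined_rows)
            self.inlined_rows = []

    def scan(self) -> List[int]:
        # Readers see file-backed rows plus any rows still inlined in the catalog.
        rows = [r for f in self.data_files for r in f]
        rows.extend(self.inlined_rows)
        return rows


lake = ToyLake(inline_row_limit=10)
for i in range(1000):  # 1,000 single-row commits from a stream
    lake.insert([i])
print(len(lake.data_files), len(lake.inlined_rows))  # 0 1000
lake.flush()
print(len(lake.data_files), len(lake.scan()))  # 1 1000
```

A writer that creates one file per commit would leave 1,000 tiny files after this stream. In the model, the same stream creates no files until the flush and then exactly one, while `scan` still returns every row at each step.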

REPACK CONCURRENTLY: pg_squeeze Gets a Promotion (Planet PostgreSQL)

PostgreSQL 19 introduces `REPACK CONCURRENTLY`, a long-anticipated native feature for rewriting tables and recovering wasted space without blocking. The existing built-in options, `VACUUM FULL` and `CLUSTER`, hold an `ACCESS EXCLUSIVE` lock for the entire rewrite, so PostgreSQL users have historically relied on external tools such as `pg_repack` and `pg_squeeze` to compact tables online without holding an exclusive lock for the duration of the rewrite. As the post's title suggests, `REPACK CONCURRENTLY` promotes the functionality of `pg_squeeze` into the database core, where it is maintained and supported as part of PostgreSQL itself. Database administrators can now manage table bloat and reclaim disk space with minimal impact on application availability, without installing a separate tool. It is a significant improvement for routine maintenance and overall database health.

Integrating `REPACK CONCURRENTLY` natively in Postgres 19 is fantastic: it simplifies maintenance by removing the dependency on external tools like `pg_repack` and `pg_squeeze`, and it lets reads and writes continue while a table is rewritten.
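
The general pattern behind online rewrites can be sketched as a small Python model: copy the live rows from a snapshot, replay the changes that writers made during the copy, then swap in the compacted table. The change log here is a plain list standing in for the logical decoding stream that `pg_squeeze` uses to capture concurrent changes. The `Heap` layout, the `repack_concurrently` function, and the operation names are hypothetical, and PostgreSQL 19's actual implementation involves far more machinery than this illustrates.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a heap is a list of slots; None marks a dead tuple (bloat).
Heap = List[Optional[Tuple[int, str]]]
# A captured change: ("upsert" | "delete", key, value-or-None).
Change = Tuple[str, int, Optional[str]]


def copy_live_rows(heap: Heap) -> Dict[int, str]:
    """Phase 1: copy only live tuples from a consistent snapshot."""
    return {slot[0]: slot[1] for slot in heap if slot is not None}


def replay(table: Dict[int, str], changes: List[Change]) -> None:
    """Phase 2: apply changes that writers made while the copy was running."""
    for op, key, value in changes:
        if op == "upsert":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


def repack_concurrently(heap: Heap, changes_during_copy: List[Change]) -> Heap:
    # In a real online rewrite, the old table stays readable and writable during the copy.
    new_table = copy_live_rows(heap)
    replay(new_table, changes_during_copy)
    # Phase 3 (not modeled): swap the compacted table in. Online tools aim to hold
    # an exclusive lock only briefly, around steps like this one.
    return sorted(new_table.items())


old_heap = [(1, "a"), None, None, (2, "b"), None, (3, "c")]
captured = [("upsert", 4, "d"), ("delete", 2, None), ("upsert", 1, "a2")]
new_heap = repack_concurrently(old_heap, captured)
print(new_heap)  # [(1, 'a2'), (3, 'c'), (4, 'd')]
print(len(old_heap), "->", len(new_heap))  # 6 -> 3
```

The dead slots disappear in the copy, and all three changes captured during the copy show up in the result. Capturing and replaying changes is why this style of rewrite does not need to block writers while the bulk copy runs.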

Postgres as an Execution Environment for AI: Failure Modes, Hooks, and the ORBIT Framework (Planet PostgreSQL)

This report from PGConf Dev 2026 examines the practicalities of using PostgreSQL as an execution environment for AI workloads and introduces the ORBIT Framework. The article discusses common failure modes that arise when AI workloads are integrated with a relational database, and explores how PostgreSQL's hooks (the extension points through which loadable modules can intercept planning and execution) and broader extensibility can be used to mitigate them. ORBIT is presented as a working solution for developers and MLOps teams who want stable, efficient AI operations on a PostgreSQL backend. Through its architectural considerations and implementation strategies, the post offers guidance on keeping AI workloads reliable in production, and it highlights PostgreSQL's capabilities beyond traditional data storage as a computation engine for intelligent applications.

The ORBIT Framework for running AI workloads directly in Postgres is a practical guide; it’s great to see a concrete approach to address the complexities of in-database AI execution.
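
The hook idea can be illustrated with a Python analogy of how PostgreSQL extensions chain onto a server hook such as `ExecutorRun_hook`: each extension saves the previously installed hook and calls it from its own. This is not ORBIT's design, which the summary does not detail. The names `ai_generate`, `install_ai_guard`, and `standard_executor` and the timeout-to-fallback policy are hypothetical. They were chosen only to show how a hook can contain one failure mode, an unresponsive model endpoint, instead of letting it abort the query.

```python
from typing import Callable, Optional

Hook = Callable[[str], str]


def standard_executor(query: str) -> str:
    """Stand-in for the server's normal execution path."""
    # Hypothetical in-database model call that, in this model, always times out.
    if "ai_generate(" in query:
        raise TimeoutError("model endpoint did not respond")
    return f"rows for {query}"


# Plays the role of a PostgreSQL hook pointer such as ExecutorRun_hook.
executor_hook: Optional[Hook] = None


def execute(query: str) -> str:
    # Like the server: call the installed hook if there is one, else the standard path.
    if executor_hook is not None:
        return executor_hook(query)
    return standard_executor(query)


def install_ai_guard(fallback: str) -> None:
    """Hypothetical extension init: chain onto whatever hook was installed before."""
    global executor_hook
    prev_hook = executor_hook  # saved, as real extensions do in _PG_init

    def guard(query: str) -> str:
        try:
            if prev_hook is not None:
                return prev_hook(query)
            return standard_executor(query)
        except TimeoutError:
            # Contain this failure mode instead of letting it abort the caller.
            return fallback

    executor_hook = guard


install_ai_guard(fallback="<model unavailable>")
print(execute("SELECT 1"))  # rows for SELECT 1
print(execute("SELECT ai_generate(prompt) FROM docs"))  # <model unavailable>
```

Saving and calling the previous hook is what lets several extensions stack on the same extension point without overwriting one another.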