<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Dozer | Start building real-time data apps in minutes Blog</title>
        <link>https://getdozer.io/blog/articles</link>
        <description>Dozer | Start building real-time data apps in minutes Blog</description>
        <lastBuildDate>Mon, 01 Apr 2024 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Scaling In-App Analytics over PostgreSQL For a Last Mile Logistics company]]></title>
            <link>https://getdozer.io/blog/articles/scaling-inapp-analytics</link>
            <guid>https://getdozer.io/blog/articles/scaling-inapp-analytics</guid>
            <pubDate>Mon, 01 Apr 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Imagine managing a growing logistics company with a fleet of 100 drivers, each handling an average of 10 orders per customer. PostgreSQL, your chosen database, handles daily operations smoothly. However, complex analytics involving joins and aggregations across multiple tables become sluggish as your data volume grows.]]></description>
            <content:encoded><![CDATA[<p>Imagine managing a growing logistics company with a fleet of 100 drivers, each handling an average of 10 orders per customer. PostgreSQL, your chosen database, handles daily operations smoothly. However, complex analytics involving joins and aggregations across multiple tables become sluggish as your data volume grows.</p><p><strong>The Problem:</strong></p><ul><li>PostgreSQL excels at basic transactions, but struggles with complex logistics analytics requiring:<ul><li>Joining real-time GPS data, order details, and customer addresses for route optimization.</li><li>Joining order timestamps, delivery statuses, and customer information to assess delivery efficiency.</li></ul></li><li>Extensive joins and aggregations slow down PostgreSQL due to table scans and lookups.</li></ul><p><strong>The Scaling Challenge:</strong></p><ul><li>As you add drivers and customers, data volume and complexity explode in PostgreSQL.</li><li>Complex analytical tasks like route optimization and delivery efficiency require even more joins and aggregations.</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-clickhouse">Why ClickHouse?<a href="#why-clickhouse" class="hash-link" aria-label="Direct link to Why Clickhouse?" title="Direct link to Why Clickhouse?">​</a></h2><p>In such a case, the ideal setup is a real-time replication pipeline from the transactional database (Postgres in this case) to an analytical database such as ClickHouse, which is optimised for analytical queries over large datasets.</p><p>In this scenario, using ClickHouse alongside PostgreSQL can be a game-changer for the logistics company's data analytics needs. Here's why:</p><ol><li><strong>Columnar Data Storage:</strong> ClickHouse stores data in a layout that is well suited to analytics. It organises data by column, so a query reads only the columns it references instead of every field of every row. 
On the other hand, PostgreSQL stores data row by row, so an aggregation over a large table has to read entire rows even when it needs only a few columns, which slows things down on large datasets.</li><li><strong>Materialized Views:</strong> ClickHouse offers built-in support for materialized views, which are precomputed query results stored as tables. These materialized views can significantly improve query performance by eliminating the need to repeatedly perform complex calculations or joins on large datasets, which leads to faster insights.</li><li><strong>Scalability:</strong> ClickHouse is designed to handle massive datasets and can scale horizontally by distributing data across multiple nodes. As the company grows and its data volumes increase, ClickHouse can accommodate the expanding workload without sacrificing performance.</li></ol><h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance-comparisons">Performance Comparisons<a href="#performance-comparisons" class="hash-link" aria-label="Direct link to Performance Comparisons" title="Direct link to Performance Comparisons">​</a></h2><p>Let’s look at the kind of analytical query one would see in a real-world scenario like the one described above, and see how each database performs.</p><p>For this scenario, we have five tables: orders, jobs, drivers, driver_logs and job_assignments. To compare query execution times in ClickHouse and Postgres, we take an example SQL query that is useful for the real-time analytics a logistics company would need.</p><p>The SQL query builds an intermediate result, "driver_locations" (a common table expression over driver_logs), that holds latitude and longitude coordinates rounded to two decimal places alongside the count of drivers at each location (the number of driver_logs rows, reported as num_drivers). 
It then filters the results to include only records where the number of drivers exceeds the average across all locations.</p><p>Essentially, this query identifies high-traffic areas by pinpointing GPS coordinates with an above-average concentration of drivers, aiding in the analysis and decision-making processes for optimizing driver deployment and route planning in logistics operations.</p><p>The single-read query would look something like:</p><div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">-- Query to aggregate driver locations and identify high-traffic areas without using materialized views</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">WITH</span><span class="token plain"> driver_locations </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token function" style="color:#d73a49">ROUND</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span 
class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token function" style="color:#d73a49">ROUND</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token function" style="color:#d73a49">COUNT</span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">*</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> num_drivers</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        driver_logs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">GROUP</span><span class="token plain"> </span><span class="token 
keyword" style="color:#00009f">BY</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token function" style="color:#d73a49">ROUND</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">ROUND</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    num_drivers</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    driver_locations</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">WHERE</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    num_drivers </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">AVG</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">num_drivers</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"> driver_locations</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><h2 class="anchor anchorWithStickyNavbar_LWe7" 
id="querying-postgres-vs-querying-clickhouse-non-materialised">Querying Postgres VS Querying ClickHouse (Non materialised)<a href="#querying-postgres-vs-querying-clickhouse-non-materialised" class="hash-link" aria-label="Direct link to Querying Postgres VS Querying Clickhouse (Non materialised)" title="Direct link to Querying Postgres VS Querying Clickhouse (Non materialised)">​</a></h2><p>We compared ClickHouse and PostgreSQL on a dataset containing approximately 10 million records, using the SQL query above. The outcomes are as follows:</p><p>On ClickHouse, the query ran in 2300 milliseconds (ms). On PostgreSQL, the same query took noticeably longer, measuring 30174 ms (&gt;30 seconds). ClickHouse is therefore roughly 13 times faster than PostgreSQL on this query (30174 ms / 2300 ms is about 13.1).</p><p>These findings underscore the advantage that ClickHouse offers in handling analytical queries efficiently, particularly in scenarios where real-time insights are crucial for logistics operations.</p><p><img loading="lazy" alt="Postgres-Clickhouse-NonMaterialised" src="/blog/assets/images/postgres-ch-82389e5369c754e6d01a368d5f39e3b7.png" width="1850" height="1053" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="querying-postgres-vs-querying-clickhouse-materialised">Querying Postgres VS Querying ClickHouse (Materialised)<a href="#querying-postgres-vs-querying-clickhouse-materialised" class="hash-link" aria-label="Direct link to Querying Postgres VS Querying Clickhouse (Materialised)" title="Direct link to Querying Postgres VS Querying Clickhouse (Materialised)">​</a></h2><p>We can also create a materialised view for this aggregation and then run a single read query against that view. 
So the query would look something like:</p><div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">--Create materialised view to aggregate driver locations</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">CREATE</span><span class="token plain"> MATERIALIZED </span><span class="token keyword" style="color:#00009f">VIEW</span><span class="token plain"> driver_locations_aggregated </span><span class="token keyword" style="color:#00009f">ENGINE</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> AggregatingMergeTree</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">ORDER</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">BY</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> longitude</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">round</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">round</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">count</span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">*</span><span class="token punctuation" style="color:#393A34">)</span><span 
class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> num_drivers</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"> driver_logs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">GROUP</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">BY</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">round</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">round</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">-- Populate the materialised view</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">INSERT</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">INTO</span><span class="token plain"> driver_locations_aggregated</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">round</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" 
style="color:#d73a49">round</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">count</span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">*</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">AS</span><span class="token plain"> num_drivers</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"> driver_logs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">GROUP</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">BY</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">round</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">gps_latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">round</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">gps_longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">-- Query to identify high-traffic areas</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">latitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">longitude</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">num_drivers</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"> driver_locations_aggregated</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">WHERE</span><span class="token plain"> num_drivers </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token keyword" style="color:#00009f">SELECT</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">avg</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">num_drivers</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">FROM</span><span class="token plain"> driver_locations_aggregated</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" 
aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><p>In this case, querying the materialized view on ClickHouse takes 111 ms, which is more than 270 times faster than running the query on Postgres (which took ~30 seconds).</p><p>These findings reinforce ClickHouse's advantage in handling analytical tasks, offering quicker insights and enabling more responsive decision-making in logistics operations.</p><p><img loading="lazy" alt="Postgres-Clickhouse-Materialised" src="/blog/assets/images/postgres-ch-mv-bb6b795e35e3eadfac50baa374bd6038.png" width="1850" height="1053" class="img_ev3q"></p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="cpu-utilisation">CPU Utilisation<a href="#cpu-utilisation" class="hash-link" aria-label="Direct link to CPU Utilisation" title="Direct link to CPU Utilisation">​</a></h3><p>ClickHouse also handles queries more efficiently than Postgres, as the CPU utilization metrics show. ClickHouse (materialized) distributes CPU usage evenly across all cores in a multi-core system, with a maximum of 80% utilization (on an 8 GB RAM machine). 
In contrast, Postgres shows uneven distribution, with some cores reaching 100% utilization while others hover around 20%.</p><h5 class="anchor anchorWithStickyNavbar_LWe7" id="clickhouse">Clickhouse<a href="#clickhouse" class="hash-link" aria-label="Direct link to Clickhouse" title="Direct link to Clickhouse">​</a></h5><p><img loading="lazy" alt="Clickhouse" src="/blog/assets/images/query_clickhouse-438f77a6dc6faf20ff0528e04c2fd0ca.png" width="371" height="253" class="img_ev3q"></p><h5 class="anchor anchorWithStickyNavbar_LWe7" id="postgres">Postgres<a href="#postgres" class="hash-link" aria-label="Direct link to Postgres" title="Direct link to Postgres">​</a></h5><p><img loading="lazy" alt="Postgres" src="/blog/assets/images/query_postgres-f63a5b2ab8e17b19992e85b7be5b2d03.png" width="1026" height="276" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion">​</a></h2><p>Comparing the performance of ClickHouse and PostgreSQL databases shows that ClickHouse is better at handling complex queries efficiently.</p><p>By using both ClickHouse and PostgreSQL together, logistics companies can have a complete solution that combines transactional reliability with strong analytical capabilities. Connecting PostgreSQL to ClickHouse in real-time allows for smooth integration of transactional and analytical data, making the most of each database's strengths for better overall performance.</p><p>Moreover, ClickHouse's use of materialized views simplifies analytical tasks, making it faster to gain insights and make decisions in real-time analytics situations. 
This hybrid approach, using PostgreSQL for transactions and ClickHouse with materialized views for real-time analytics, gives logistics companies an edge in the market.</p><p>Dozer stands out as a great tool for this setup, making it easy to replicate data from PostgreSQL to ClickHouse in real-time. By bringing together different data sources seamlessly, Dozer lets logistics companies create real-time data views using standard SQL and turn them into accessible APIs with minimal effort.</p><p><a href="https://getdozer.io/usecases/analytics.html" target="_blank" rel="noopener noreferrer">Dozer</a> acts as a real-time analytics layer, ensuring that insights are obtained quickly and enabling confident decision-making based on data. Its scalability means that companies can focus on innovation without worrying about managing infrastructure, as Dozer can handle increasing data needs effortlessly.</p><p>In summary, by adopting this two-database strategy, logistics companies can make the most of their data, improving efficiency, enhancing customer experience, and staying competitive in the fast-paced logistics industry.</p>]]></content:encoded>
            <category>PostgreSQL</category>
            <category>ClickHouse</category>
            <category>Logistics</category>
            <category>In-App Analytics</category>
        </item>
        <item>
            <title><![CDATA[GPTs, Assistant APIs and Retrieval Augmented Generation - What has fundamentally changed]]></title>
            <link>https://getdozer.io/blog/articles/gpts-and-assistants</link>
            <guid>https://getdozer.io/blog/articles/gpts-and-assistants</guid>
            <pubDate>Wed, 15 Nov 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[OpenAI's recent announcements of GPTs and Assistant APIs on November 6th, 2023 mark a significant advancement in the AI landscape, especially for developers and businesses looking to use AI in a more accessible and customizable manner. GPTs are customisable versions of ChatGPT that enable users to create AI experiences without writing any code, whereas Assistant APIs are a more powerful set of APIs that allow agent-like workflows similar to LangChain.]]></description>
            <content:encoded><![CDATA[<p>OpenAI's recent announcements of GPTs and Assistant APIs on November 6th, 2023 mark a significant advancement in the AI landscape, especially for developers and businesses looking to use AI in a more accessible and customizable manner. GPTs are customisable versions of ChatGPT that enable users to create AI experiences without writing any code, whereas Assistant APIs are a more powerful set of APIs that allow agent-like workflows similar to LangChain.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="difference-between-gpts-and-assistants">Difference Between GPTs and Assistants:<a href="#difference-between-gpts-and-assistants" class="hash-link" aria-label="Direct link to Difference Between GPTs and Assistants:" title="Direct link to Difference Between GPTs and Assistants:">​</a></h2><h3 class="anchor anchorWithStickyNavbar_LWe7" id="gpts-customizable-chatgpts">GPTs (Customizable ChatGPTs)<a href="#gpts-customizable-chatgpts" class="hash-link" aria-label="Direct link to GPTs (Customizable ChatGPTs)" title="Direct link to GPTs (Customizable ChatGPTs)">​</a></h3><p><code>Purpose</code>: To enable developers to create personalized AI models.</p><p><code>Key Features</code>:</p><ol><li><code>No Coding Requirement</code>:
It simplifies the process for non-programmers, allowing them to develop AI models through basic instructions.</li><li><code>Customization</code>:
Developers can tailor the AI to specific needs or industries.</li><li><code>Ease of Use</code>: The process is as simple as starting a conversation, making AI more accessible to a wider range of users.</li></ol><h3 class="anchor anchorWithStickyNavbar_LWe7" id="assistant-apis">Assistant APIs<a href="#assistant-apis" class="hash-link" aria-label="Direct link to Assistant APIs" title="Direct link to Assistant APIs">​</a></h3><p><code>Purpose</code>: To provide a more agent-like, task-oriented AI experience.</p><p><code>Key Features</code>:</p><ol><li><code>Specificity</code>: Designed for defined purposes with clear instructions and additional knowledge.</li><li><code>Advanced Capabilities</code>: Includes features like Code Interpreter and Retrieval, allowing the AI to perform more complex tasks.</li><li><code>Model and Tool Integration</code>: The ability to call upon various models and tools for executing specific tasks.</li><li><code>Simplified Development</code>: The API streamlines the development process, making it easier to create sophisticated AI assistants.</li></ol><p><strong>LLM Limitations: RAG (Retrieval Augmented Generation) and Fine Tuning</strong></p><p>Large Language Models (LLMs) are powerful, but they are limited by what their training data covers. Two challenges stand out:</p><ol><li><strong>Stale Training Data:</strong> The data used to train LLMs tends to become outdated, posing a hurdle in keeping up with recent trends and events.</li><li><strong>Extrapolation and Hallucination:</strong> In the absence of factual information, LLMs often resort to extrapolation, resulting in the generation of confidently stated yet false statements.</li></ol><p>There are two popular ways of bridging this gap. </p><ol><li><p><strong>RAG</strong>: </p><p>Integrates a retrieval component with a generative model, dynamically fetching external information to enhance the generation process. 
It's great for queries needing up-to-date or specific knowledge, but can struggle with data quality, relevance, and increased response latency.</p></li><li><p><strong>Fine-Tuning</strong>: </p><p>Involves adjusting a pre-trained model on a specific dataset to tailor its responses. It offers more control over the model's output and consistency in performance, but lacks the ability to incorporate real-time information and can be limited by the scope of the training data.</p></li></ol><p>RAG (Retrieval-Augmented Generation) can be seen as superior to fine-tuning in certain scenarios due to its dynamic nature and ability to incorporate a wide range of up-to-date information. Because RAG retrieves information at query time, it can adapt to new topics or recent developments more effectively than a fine-tuned model, which might require retraining to stay current.</p><p><a href="https://python.langchain.com/docs/use_cases/question_answering/" target="_blank" rel="noopener noreferrer">Langchain</a> has been a leading solution in integrating knowledge using RAG, gaining wide popularity. However, OpenAI's recent advancements, particularly in assistant APIs, suggest a shift in the landscape. These APIs now enable direct integration of similar functionality with potentially higher quality, which may make approaches like vector stores and Langchain less critical in certain applications.</p><p><a href="https://platform.openai.com/docs/assistants/overview" target="_blank" rel="noopener noreferrer">The Assistants API</a> allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling. 
</p><p><strong>How Dozer Can Help You with RAG?</strong></p><p><a href="https://getdozer.io/" target="_blank" rel="noopener noreferrer">Dozer</a> supercharges AI applications with lightning-fast API data retrieval, seamlessly fueling retrieval systems and tools for smarter, more informed responses. </p><ul><li><p><strong><code>Enhanced Data Access for RAG</code></strong>: </p><p>  RAG models rely on external data for generating informed and accurate responses. Dozer's ability to quickly fetch data via APIs means that these models can access a broader range of current and relevant information, improving the quality of their output.</p></li><li><p><strong><code>Real-Time Information Retrieval</code></strong>: </p><p>  In scenarios where up-to-date information is crucial, Dozer's rapid API calls enable real-time data retrieval. This is particularly useful for applications that require the latest data, like news updates, stock prices, or weather reports.</p></li><li><p><strong><code>Integration with Diverse Data Sources</code></strong>: </p><p>  The ability to tap into various APIs allows for the integration of diverse data sources. This is beneficial in creating more comprehensive and multifaceted AI applications that can pull in specialized information from different fields.</p></li><li><p><strong><code>Streamlining Development with Assistant APIs</code></strong>: </p><p>  When building applications with Assistant APIs, the rapid integration of external data through Dozer can streamline the development process. It allows developers to focus on building the core functionality of their applications without worrying about the complexities of data retrieval.</p></li><li><p><strong><code>Customization and Flexibility</code></strong>: </p><p>  Dozer's API-driven approach provides developers with the flexibility to customize their retrieval sources. 
This customization is crucial in building applications that cater to specific domains or user needs.</p></li><li><p><strong><code>Scalability and Efficiency</code></strong>: </p><p>  Efficient data retrieval via APIs contributes to the scalability of AI applications. It helps ensure that as user demand increases, the system can maintain performance without significant increases in latency or resource consumption.</p></li></ul><p>In our next article, we'll dive into a hands-on example illustrating how Dozer's API integration can enhance RAG with OpenAI APIs. <a href="https://getdozer.io/contact" target="_blank" rel="noopener noreferrer">Please reach out</a> to us with your thoughts or if you want to have a discussion.</p>]]></content:encoded>
            <category>GPTs</category>
            <category>Assistant APIs</category>
            <category>Retrieval</category>
            <category>AI</category>
        </item>
        <item>
            <title><![CDATA[How to Use Assistant APIs?]]></title>
            <link>https://getdozer.io/blog/articles/how-to-use-assistant-apis</link>
            <guid>https://getdozer.io/blog/articles/how-to-use-assistant-apis</guid>
            <pubDate>Wed, 15 Nov 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[OpenAI has introduced a formidable competitor to Langchain agents, known as Assistants. These user-friendly, low-code tools aim to offer an agent-like experience, enhanced retrieval, and improved function calling capabilities.]]></description>
            <content:encoded><![CDATA[<p>OpenAI has introduced a formidable competitor to Langchain agents, known as Assistants. These user-friendly, low-code tools aim to offer an agent-like experience, enhanced retrieval, and improved function calling capabilities.</p><p>The Assistants API lets you build AI assistants into your own applications. An assistant has instructions and uses models, tools, and knowledge to answer user queries. The Assistants API currently supports three tools: Code Interpreter, Retrieval, and Function Calling. Integrating the Assistants API follows a straightforward flow:</p><ol><li><p><code>Assistant Creation</code>: Create an Assistant in the API, defining custom instructions and selecting a model. Optionally, enable tools like Code Interpreter, Retrieval, and Function Calling to enhance functionality.</p></li><li><p><code>Thread Creation</code>: Create a Thread when a user starts a conversation.</p></li><li><p><code>Message Addition</code>: Add Messages to the Thread as users ask questions or interact.</p></li><li><p><code>Assistant Execution</code>: Run the Assistant on the Thread to produce a response. The run automatically calls the relevant tools.</p></li></ol><p>In this example, we build an assistant specialized in knowledge retrieval that answers questions about the rules of chess using a document. 
Once developed, we plan to test it in the Assistant Playground and potentially publish it to the GPT Store.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="setting-up-the-client"><strong>Setting Up the Client</strong><a href="#setting-up-the-client" class="hash-link" aria-label="Direct link to setting-up-the-client" title="Direct link to setting-up-the-client">​</a></h3><p>The script starts by importing the OpenAI package and defining the client. Reading the API key from an environment variable, rather than hard-coding it, helps prevent inadvertent leaks to version control systems.</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> os</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> time</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Initialize the client with the key from the OPENAI_API_KEY environment variable</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"OPENAI_API_KEY"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><h3 class="anchor anchorWithStickyNavbar_LWe7" id="uploading-knowledge-source"><strong>Uploading Knowledge Source</strong><a href="#uploading-knowledge-source" class="hash-link" aria-label="Direct link to uploading-knowledge-source" title="Direct link to uploading-knowledge-source">​</a></h3><p>Once the client is initialized, the script uploads the document that serves as the assistant's knowledge source.</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Upload file</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token builtin">file</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">files</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token builtin">file</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"LawsOfChess.pdf"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"rb"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">purpose</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'assistants'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 
0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><h3 class="anchor anchorWithStickyNavbar_LWe7" id="initializing-the-chess-assistant"><strong>Initializing the Chess Assistant</strong><a href="#initializing-the-chess-assistant" class="hash-link" aria-label="Direct link to initializing-the-chess-assistant" title="Direct link to initializing-the-chess-assistant">​</a></h3><p>Once the file is uploaded and the client is set up, the next step is to select the model on which the assistant will run. The assistant is then initialized with the required tools and knowledge base. In this case, the assistant is named "Chess Assistant," and specific instructions are provided.</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Add the file to the assistant</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">assistant </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">beta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">assistants</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Chess Assistant"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">instructions</span><span class="token operator" style="color:#393A34">=</span><span class="token triple-quoted-string string" style="color:#e3116c">"""You are an assistant to help people with the game of Chess. You know the rules. Use the knowledge found in the uploaded file to best respond to players' queries."""</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4-1106-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">tools</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"retrieval"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">file_ids</span><span 
class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token builtin">file</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><p>Note the use of the "retrieval" tool, indicating the need to retrieve knowledge from external sources. 
OpenAI also supports tools like function calling and a code interpreter for more complex use cases.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="organizing-conversations-with-threads"><strong>Organizing Conversations with Threads</strong><a href="#organizing-conversations-with-threads" class="hash-link" aria-label="Direct link to organizing-conversations-with-threads" title="Direct link to organizing-conversations-with-threads">​</a></h3><p>The script organizes each conversation as a "thread," which holds the messages exchanged with the assistant, similar to the "Chain of Thought" concept for agents in Langchain.</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Create a thread</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">thread </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">beta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">threads</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" 
class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><h3 class="anchor anchorWithStickyNavbar_LWe7" id="interacting-with-the-chess-assistant"><strong>Interacting with the Chess Assistant</strong><a href="#interacting-with-the-chess-assistant" class="hash-link" aria-label="Direct link to interacting-with-the-chess-assistant" title="Direct link to interacting-with-the-chess-assistant">​</a></h3><p>With the setup complete, the script proceeds to interact with the assistant by sending a message and waiting for a response.</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Send a message to the assistant</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">message </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">beta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">threads</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">messages</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">thread_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">thread</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">role</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">content</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"What is Quickplay finish in Chess?"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">file_ids</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token builtin">file</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" 
d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><h3 class="anchor anchorWithStickyNavbar_LWe7" id="running-the-chess-assistant"><strong>Running the Chess Assistant</strong><a href="#running-the-chess-assistant" class="hash-link" aria-label="Direct link to running-the-chess-assistant" title="Direct link to running-the-chess-assistant">​</a></h3><p>The assistant is then run by associating the thread and assistant.</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Run the assistant</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">run </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">beta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">threads</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">runs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">thread_id</span><span class="token operator" 
style="color:#393A34">=</span><span class="token plain">thread</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">assistant_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">assistant</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><p>To complete the process, the script continually checks the status of the run until it is 'completed' and then processes the response.</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Wait for completion</span><span 
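class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> run</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">status </span><span class="token operator" style="color:#393A34">!=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"completed"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sleep</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Refresh the run so the loop sees its latest status</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    run </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">beta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">threads</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">runs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieve</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">thread_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">thread</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> run_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Retrieve and print the answer</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">messages </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">beta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">threads</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">messages</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">list</span><span class="token punctuation" 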
style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">thread_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">thread</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"ANSWER: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">messages</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" 
d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><h3 class="anchor anchorWithStickyNavbar_LWe7" id="exploring-assistant-capabilities-in-openai-playground"><strong>Exploring Assistant Capabilities in OpenAI Playground</strong><a href="#exploring-assistant-capabilities-in-openai-playground" class="hash-link" aria-label="Direct link to exploring-assistant-capabilities-in-openai-playground" title="Direct link to exploring-assistant-capabilities-in-openai-playground">​</a></h3><p>As we have seen, it is easy to set up an Assistant and interact with it. We can even view our assistant in the OpenAI Playground and test it online.</p><p><img loading="lazy" alt="OpenAI Playground" src="/blog/assets/images/image-e64de7e1d99b2f7b38098bf2fcda9eda.png" width="2000" height="871" class="img_ev3q"></p><p>As we explore the capabilities of OpenAI's Assistant API, it helps to understand two key approaches to improving the performance and relevance of your AI assistant: fine-tuning and retrieval.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="fine-tuning"><strong>Fine-tuning:</strong><a href="#fine-tuning" class="hash-link" aria-label="Direct link to fine-tuning" title="Direct link to fine-tuning">​</a></h2><p>Fine-tuning involves training a pre-existing model on a specific dataset to tailor it for a particular task or domain. It is typically done to stop the model from repeating the same kind of mistake. While fine-tuning allows for a high level of customization, it requires a considerable amount of domain-specific data and expertise. This process is ideal when your AI assistant needs to address niche or specialized queries.</p><p>In the context of our chess assistant, fine-tuning could be applied to make the model more adept at answering intricate chess strategy questions or providing insights into specific chess openings. 
However, the trade-off is that fine-tuning is resource-intensive in both time and data.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="retrieval"><strong>Retrieval:</strong><a href="#retrieval" class="hash-link" aria-label="Direct link to retrieval" title="Direct link to retrieval">​</a></h2><p>Retrieval, on the other hand, is a more flexible approach that leverages external knowledge bases or documents to enhance the assistant's responses. OpenAI's Assistant API supports the retrieval tool, allowing the assistant to pull in information from an external source.</p><p>In our chess assistant example, retrieval could be used to fetch the latest tournament results, rule updates, or any other dynamic information related to chess. This keeps the assistant up to date without the need for continuous fine-tuning.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="choosing-the-right-approach"><strong>Choosing the Right Approach:</strong><a href="#choosing-the-right-approach" class="hash-link" aria-label="Direct link to choosing-the-right-approach" title="Direct link to choosing-the-right-approach">​</a></h2><p>The choice between fine-tuning and retrieval depends on the specific requirements of your AI application. If your domain is highly specialized and you have access to a substantial amount of domain-specific data, fine-tuning may be the preferred route. Conversely, if your application requires constant updates or draws upon a diverse range of information, retrieval offers a more dynamic and resource-efficient solution.</p><p>When building powerful GenAI applications, integrating real-time data is paramount. While static knowledge serves as the foundation, real-time data adds the fuel needed to power dynamic and hyper-personalized user experiences.</p><p>As showcased in the earlier sections, OpenAI's Assistant API, coupled with tools like retrieval, enables seamless integration of external knowledge sources. 
However, to truly unlock the potential of your chatbot, incorporating real-time data is essential. This could include the latest chess tournament results, rule updates, or even personalized user information. By integrating user data, the assistant can tailor responses to the user's preferences, history, or interactions, making each conversation more user-centric.</p><p>Dozer, a powerful tool in the data integration landscape, allows you to effortlessly integrate real-time data from various sources, adding a further layer of personalization and specificity to your assistant's responses.</p><p>As we continue our exploration, it's essential to highlight the versatility of these chatbots beyond the OpenAI Playground. We'll guide you through the steps to integrate your assistant into third-party applications such as Slack, Microsoft Teams, and more.</p><p>Stay tuned for upcoming articles, where we'll delve into the process of extending the reach of your assistant and unlocking its potential in various digital channels.</p><p>In the next segment, we'll focus on the step-by-step process of integrating your assistant with popular messaging platforms using Dozer, enabling you to deploy your chatbot where your users are most active.</p><p>The era of AI-powered, data-integrated chatbots is here, and the possibilities are limitless. Join us as we continue to simplify, enhance, and redefine the landscape of interactive AI applications.</p>]]></content:encoded>
            <category>GPTs</category>
            <category>Assistant APIs</category>
            <category>Retrieval</category>
            <category>AI</category>
        </item>
    </channel>
</rss>