HOW MUCH DATA CAN TABLEAU HANDLE?
Before answering this question, let us look at the relevant terminology and understand the architecture of the data extracts (TDE) that we work with in Tableau.
TDE Data Storage – Columnar Data Model
A Tableau data extract (TDE) is a subset of data that you can use to improve the performance of your workbook, upgrade your data to allow for more advanced capabilities, and to make it possible to analyze your data offline (Source: Working with Tableau Data Extracts)
TDEs are based on a columnar database model, which means the data is stored in sections of columns, which allows for maximum compression of the data. The size of a TDE depends upon the cardinality (uniqueness of data values) of the columns in consideration. If there are fewer unique values, the column size is smaller, and vice versa. For example, if a table holds millions of records but a column has only four unique values in the entire table, then the TDE will store only four values for that column, and while storing the values the compression would also be applied.
This columnar model makes the TDE very efficient for reading operations, i.e., data can be read and processed much faster than that in row-based databases.
TDE is architecture aware
A TDE can use all parts of the computer memory, including the RAM, CPU cache, hard disk, etc.
When a query is sent to the TDE engine, only the data required to be processed at that particular point in time is loaded on to the RAM in the form of memory blocks. The memory mapping technique is used to achieve this, where the data is mapped from the disk to the RAM as memory blocks. The data from the RAM and hard disk is exchanged in and out, depending upon what data is required. The OS takes care of creating space on the RAM and shuffling the memory blocks. If there is enough RAM available, the loaded data will remain there for subsequent usage. In the very unlikely case that no RAM is available, the hard disk space is utilized to load the data from the TDE. Therefore, any data required for query processing can be loaded on the RAM, or if that is not possible, on the hard disk. It is also important to note that data from a TDE is always loaded in the compressed mode.
In principal, you can work with a data set that is larger than the amount of RAM available as you are only using small sections of RAM to load the relevant data at any given time. The OS will manage the space on your RAM to load additional blocks if required, and disk space will be utilized in extreme situations.
How it works: Some examples
The following tests were conducted using Tableau version 8.3.
Test 1: The first test used a table in SQL Server 2012 database of approximately 100 million records, with 16 dimensions and 4 measures. The size of the data file on the SQL Server was over 40 GB. There were two columns holding approximately 400K unique records and one column with almost 100Kunique records. While Tableau connected to this table, a data extract was generated which was approximately 3.5 GB in size.
It took about 30 minutes to generate the extract file. The SQL Server and the machine that Tableau was running on was 64 bit Windows Server 2012 R2, with 64 GB of RAM and 4 processors with 2 cores each.
Using Tableau Desktop, a report was created with three dimensions and one measure:
- Dimension A in the row shelf with 400K unique members
- Dimension B in the column shelf with 3 unique members
- Dimension C in the colour shelf with 4 unique members
- Measure with SUM as aggregation
A total of 2.5 Million “marks” (data points) are expected to be generated in this report.
Please note that the visualization was created for testing purposes only. In reality, these type of charts showing 400K data points do not comply with best practices.
The following tasks were performed by Tableau when the report was accessed.
a) Processing Query
The query required to fetch the data from TDE file was processed by the TDE engine. Using the memory mapping technique (as explained above) the required data from the columns in consideration was loaded on to the RAM and the “result set” was produced.
Total memory utilized: 40 MB
Total time taken: 290 Seconds (4.8 minutes)
b) Computing View Layout
The VizQL component took the result set produced above and generated the desired view/chart (2.5 million marks). On a 64 bit machine, VizQL has only 4GB of allocated memory on the RAM to carry out this task.
Total memory utilized: 2 GB (more memory was required than for query processing)
Total time taken: 90 Seconds (less time was required than for query processing)
Test 2: The same report was tested on a different machine which was running on a 64 bit OS but where the server had only 10 GB of RAM and 2 core processors. The results were as follows:
a) Processing Query
Total memory utilized: 40 MB (same as for Test 1)
Total time taken: 4490 Seconds (almost an hour and 15 minutes – much longer than Test 1)
b) Computing Layout
Total memory utilized: 2 GB (same as for Test 1)
Total time taken: 90 Seconds (same as for Test 1)
There are two areas to look at: the query processing and the computing layout.
With respect to query processing, even though there were 2.5 million data points in consideration, a maximum of 40 MB of RAM was utilized by Tableau for data loading and processing (irrespective of the hardware used for processing). Even when using the smaller server with only 10 GB of RAM, all of the required data could be loaded since a very smaller section of RAM was required. Therefore, we can conclude that, irrespective of the size of the data set, Tableau will rarely, if ever, be short of RAM for loading the required data and processing the queries.
However the time it takes to process the queries (i.e., the performance) will definitely be affected when we shifted to a smaller server. The TDE processing is core based so the greater the number of cores, and the more powerful the cores are, the better the performance will be. If we increase the frequency of the processors and add more cores, the data will be processed much faster, even when there is a smaller RAM.
However, for the second stage of computing the layout, the amount of memory required and the time to complete the process in both cases was similar. Thus we can conclude that, irrespective of the hardware, this process would remain the same. However we need to remember that there is a limitation of 4 GB on a 64 bit OS and 2 GB on a 32 bit OS, so we must be careful depending on the type of visualization being created. However, even if the data processing takes only seconds, you may not be able to display the visualization due to VizQL limitations. There are many articles on best practices for creating visualizations that must be followed.
So if you have a high-volume database, do not hesitate to create a data extract, but invest in a powerful server so that the performance can be boosted. There are good articles by Tableau that explain the best practices for data extracts through aggregation of data, removal of unwanted columns, etc.. However, it is important to remember that Tableau Data Extract is a not a replacement for a Data Warehouse. From a long term perspective there should always be a proper data strategy in place, rather than creating huge TDE files.
This Tableau Tip was written by:
Sourabh Dasgupta is the Director of Technology & BI at Corporate Renaissance Group India and is also the Product Manager of CRG’s in-house software products such as FlexABM and Cost Allocator. Over the past 14 years he has been involved as a technical expert in various projects and assignments related to implementation of Activity Based Costing and Business Intelligence and reporting systems in India as well as South Africa.
He is a one of the professional trainers for Tableau software and has conducted many trainings for the clients. He has been involved in designing and development of Tableau reports and dashboards, and was awarded with a Tableau Desktop 8 Qualified Associate certificate as well as a Tableau Server 8 Qualified Associate certificate. He has successfully conducted trainings on the Tableau Software (Desktop and Server versions) for many clients, which included basic and advanced features of the Tableau Desktop and Tableau Server.
Sourabh has a Bachelor of Science degree (Computer Science, Physics, and Mathematics) from Nagpur University, India and a Diploma from the National Institute of Information Technology (NIIT), Pune, India. He also holds a Microsoft Certified Professional certificate for Designing and Implementing Desktop Applications with Microsoft Visual basic.
If you wish to accelerate your skills and empower your learning with hands-on education, or are seeking an Expert-on-Demand, reach out to Sourabh at email@example.com