Exam DP203 file formats Parquet

From MillerSql.com
NeilM (talk | contribs)

Revision as of 22:18, 1 December 2024

The third file format is Parquet.

Parquet is a columnar data format.

It is also a binary format, so it cannot be read in a text editor such as Notepad++. It needs to be opened in an app like ParquetViewer.
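As a rough illustration of the "binary" point: a Parquet file normally begins and ends with the 4-byte marker PAR1, so a quick check can tell it apart from a text file without opening it in a viewer. The sketch below uses only the Python standard library. The function name has_parquet_magic and the temporary file names are made up for this example, and the "fake" file contains only the marker bytes, so it is not a valid Parquet file.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def has_parquet_magic(path: str) -> bool:
    """Return True if the file starts and ends with the PAR1 marker.

    This only inspects the marker bytes; it does not validate the file.
    """
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # A plain text file: readable in Notepad++, no PAR1 marker.
    text_path = os.path.join(tmp, "notes.txt")
    with open(text_path, "w") as f:
        f.write("plain text, readable in any editor\n")

    # Hypothetical stand-in: marker bytes around dummy binary content.
    # NOT a real Parquet file; it only exercises the check above.
    fake_path = os.path.join(tmp, "fake.parquet")
    with open(fake_path, "wb") as f:
        f.write(PARQUET_MAGIC + b"\x00" * 16 + PARQUET_MAGIC)

    text_result = has_parquet_magic(text_path)
    fake_result = has_parquet_magic(fake_path)

print(text_result, fake_result)
```

A check like this only says whether a file looks like Parquet. Reading the actual data still needs a Parquet-aware tool such as ParquetViewer.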

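It is a hybrid format: the data is split into row groups of (say) 1000 rows, and within each row group it is stored column by column. Min and max values can also be stored for each column in each row group, as optional statistics written by the tool that creates the file. A reader can use them to skip row groups that cannot match a filter.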
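The row-group idea can be sketched in plain Python. This is only a toy model of the layout described above, not the real Parquet encoding: the function names to_row_groups and groups_matching, the dictionary layout, and the sample data are all invented for illustration.

```python
from typing import Any


def to_row_groups(rows: list[dict[str, Any]], row_group_size: int) -> list[dict[str, Any]]:
    """Split rows into row groups and store each group column by column."""
    groups = []
    for start in range(0, len(rows), row_group_size):
        chunk = rows[start:start + row_group_size]
        # Columnar layout inside the row group: one list of values per column.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        # Per-column (min, max) statistics for this row group.
        stats = {name: (min(values), max(values)) for name, values in columns.items()}
        groups.append({"num_rows": len(chunk), "columns": columns, "stats": stats})
    return groups


def groups_matching(groups: list[dict[str, Any]], column: str, value: Any) -> list[int]:
    """Indices of row groups whose min/max range could contain the value."""
    return [
        i for i, group in enumerate(groups)
        if group["stats"][column][0] <= value <= group["stats"][column][1]
    ]


# Made-up sample data: 2500 rows with two integer columns.
rows = [{"id": i, "score": i * 2} for i in range(2500)]
groups = to_row_groups(rows, row_group_size=1000)

print([g["num_rows"] for g in groups])      # [1000, 1000, 500]
print(groups[1]["stats"]["id"])             # (1000, 1999)
print(groups_matching(groups, "id", 1500))  # [1]
```

In this toy model a lookup for id = 1500 only needs to read the second row group, because the min/max values rule out the other two. Real Parquet readers use the stored statistics in a similar way.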