Exam DP203 file formats Parquet
Revision as of 22:32, 1 December 2024
The third file format is Parquet.
Parquet is a columnar data format.
It is also a binary format, so it cannot be read in a text editor such as Notepad++. It needs to be opened in an app like ParquetViewer.
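It is a hybrid format: the data is split into row groups of (say) 1000 rows, and within each row group the values are stored column by column. Minimum and maximum values are also stored for each column within a row group, so a reader can skip row groups that cannot match a filter.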
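The row group idea can be sketched in plain Python. This is only an illustration of the layout, not the real Parquet file structure: the function names `to_row_groups` and `groups_to_scan`, the sample data, and the tiny group size of 2 are all made up for the example.

```python
from typing import Any


def to_row_groups(rows: list[dict[str, Any]], group_size: int) -> list[dict[str, Any]]:
    """Split rows into row groups; inside each group, store values column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Columnar layout within the group: one list of values per column.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        # Min/max statistics per column, kept alongside the group.
        stats = {name: (min(values), max(values)) for name, values in columns.items()}
        groups.append({"columns": columns, "stats": stats})
    return groups


def groups_to_scan(groups: list[dict[str, Any]], column: str, value: Any) -> list[int]:
    """Return the indexes of row groups whose min/max range could contain value."""
    return [
        i for i, group in enumerate(groups)
        if group["stats"][column][0] <= value <= group["stats"][column][1]
    ]


cities = ["Leeds", "York", "Leeds", "Hull", "York", "Leeds"]
rows = [{"id": i, "city": city} for i, city in enumerate(cities)]
groups = to_row_groups(rows, group_size=2)

print(groups[0]["columns"])             # {'id': [0, 1], 'city': ['Leeds', 'York']}
print(groups_to_scan(groups, "id", 4))  # [2]
```

Looking for `id = 4` only needs the third row group, because the min/max values of the other two groups rule them out without reading their data.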
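Compression: the data in each column can be compressed (for example with Snappy or gzip), and values of the same type stored together tend to compress well.
Dictionary encoding: repeated values in a column are stored once in a dictionary, and the column data then holds small integer references to the dictionary entries.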
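As a rough illustration of dictionary encoding, here is a minimal Python sketch. It shows the general technique only, not Parquet's actual on-disk encoding; the function names `dictionary_encode` and `dictionary_decode` and the sample column are assumptions for the example.

```python
def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Store each distinct value once and replace the column with integer codes."""
    dictionary: list[str] = []
    index_of: dict[str, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in index_of:
            # First time this value is seen: add it to the dictionary.
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: list[str], codes: list[int]) -> list[str]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


cities = ["Leeds", "York", "Leeds", "Hull", "York", "Leeds"]
dictionary, codes = dictionary_encode(cities)

print(dictionary)  # ['Leeds', 'York', 'Hull']
print(codes)       # [0, 1, 0, 2, 1, 0]
```

The saving comes from columns with few distinct values: each long string is stored once, and every row then carries only a small integer.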