Native
| Input | Output | Alias |
|---|---|---|
| ✔ | ✔ | |
Description
The Native format is ClickHouse's most efficient format because it is truly "columnar": it does not convert columns to rows.
In this format, data is written and read in binary blocks. Each block records, one after another, the number of columns, the number of rows, and then, for each column, its name, its type, and its data for that block.
This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It might not be practical to work with this format yourself.
Data types wire format
Data is sent over the wire in a columnar format, which means that each column is sent separately, and all values of a column are sent together as a single array.
Each column in a block is preceded by a header containing its name and type, each written as a LEB128-length-prefixed string, similar to the header of RowBinaryWithNamesAndTypes.
When using the native TCP binary protocol (or when the HTTP endpoint receives ?client_protocol_version=<n>), a BlockInfo structure is written before the column and row counts. The examples in this section use the plain HTTP interface without a protocol version, which omits BlockInfo.
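To make the layout concrete, here is a minimal Python sketch that assembles one block as sent over plain HTTP (no BlockInfo): the column count, the row count, and then, for each column, its name, its type, and its bulk-serialized data. The helper names (`write_varuint`, `write_string`, `encode_block`) and the sample column values are hypothetical and chosen only for illustration.

```python
import struct


def write_varuint(value: int) -> bytes:
    """Encode an unsigned integer as LEB128 (ClickHouse's VarUInt)."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)


def write_string(s: str) -> bytes:
    """Encode a string as a LEB128 byte length followed by its UTF-8 bytes."""
    data = s.encode("utf-8")
    return write_varuint(len(data)) + data


def encode_block(num_rows: int, columns: list[tuple[str, str, bytes]]) -> bytes:
    """Encode one Native block without BlockInfo.

    `columns` holds (name, type_name, bulk_serialized_data) for each column.
    """
    out = bytearray()
    out += write_varuint(len(columns))  # number of columns
    out += write_varuint(num_rows)      # number of rows
    for name, type_name, data in columns:
        out += write_string(name)       # column name
        out += write_string(type_name)  # column type
        out += data                     # this block's values for the column
    return bytes(out)


# Hypothetical sample: number UInt64 = [0, 1, 2], str String = ['0', '1', '2'].
values = [0, 1, 2]
number_data = b"".join(struct.pack("<Q", v) for v in values)  # UInt64, little-endian
str_data = b"".join(write_string(str(v)) for v in values)     # String values
block = encode_block(3, [("number", "UInt64", number_data),
                         ("str", "String", str_data)])
print(block.hex(" "))
```

The column data is passed in already serialized so the same block writer works for any type; the per-type encodings are described in the sections that follow.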
Block structure
The following query returns two columns, number and str, with three rows:
The output data fits into a single ClickHouse block, and it will look like this:
Multiple blocks
In many cases, however, the data does not fit into a single block, and ClickHouse sends it as multiple blocks. Consider the following query, which fetches two rows with a reduced block size to force one row per block:
The output:
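Because every block carries its own column count, row count, and column headers, a client can read a multi-block response by decoding blocks in a loop until the input ends. The following Python sketch assumes a plain HTTP response (no BlockInfo) and handles only UInt64 and String columns; `read_blocks`, its helpers, and the hand-built two-block sample are hypothetical.

```python
import io
import struct


def read_varuint(buf: io.BytesIO) -> int:
    """Decode a LEB128 unsigned integer (ClickHouse VarUInt)."""
    result, shift = 0, 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("truncated VarUInt")
        result |= (byte[0] & 0x7F) << shift
        if byte[0] < 0x80:  # continuation bit clear: last byte
            return result
        shift += 7


def read_string(buf: io.BytesIO) -> str:
    return buf.read(read_varuint(buf)).decode("utf-8")


def read_column(buf: io.BytesIO, type_name: str, num_rows: int) -> list:
    # Only the two types used in this sketch are supported.
    if type_name == "UInt64":
        return list(struct.unpack(f"<{num_rows}Q", buf.read(8 * num_rows)))
    if type_name == "String":
        return [read_string(buf) for _ in range(num_rows)]
    raise NotImplementedError(type_name)


def read_blocks(data: bytes):
    """Yield each block as {column_name: values} until the input is exhausted."""
    buf = io.BytesIO(data)
    while buf.tell() < len(data):
        num_columns = read_varuint(buf)
        num_rows = read_varuint(buf)
        block = {}
        for _ in range(num_columns):
            name = read_string(buf)
            type_name = read_string(buf)
            block[name] = read_column(buf, type_name, num_rows)
        yield block


def one_row_block(n: int) -> bytes:
    """Hand-built block: 2 columns, 1 row (number UInt64, str String)."""
    return (b"\x02\x01"
            + b"\x06number" + b"\x06UInt64" + struct.pack("<Q", n)
            + b"\x03str" + b"\x06String" + b"\x01" + str(n).encode())


stream = one_row_block(0) + one_row_block(1)
print(list(read_blocks(stream)))
# [{'number': [0], 'str': ['0']}, {'number': [1], 'str': ['1']}]
```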
Simple data types
The wire format for an individual value of one of the simpler data types is similar to RowBinary/RowBinaryWithNamesAndTypes; in a Native block, a column of such a type is the encoded values of all its rows written back to back.
The types that match this description are:
- (U)Int8, (U)Int16, (U)Int32, (U)Int64, (U)Int128, (U)Int256
- Float32, Float64
- Bool
- String
- FixedString(N)
- Date
- Date32
- DateTime
- DateTime64
- IPv4
- IPv6
- UUID
Refer to the descriptions of these types in "RowBinary data types wire format" for more details.
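As an illustration, the following Python sketch bulk-serializes a few of these types with the standard `struct` module: fixed-width values are packed little-endian back to back, String values get a LEB128 length prefix, and FixedString(N) values are padded to exactly N bytes. The helper names are hypothetical, and only a handful of the listed types are covered.

```python
import datetime
import struct


def encode_fixed(fmt: str, values) -> bytes:
    """Fixed-width column: each value packed little-endian, back to back."""
    return b"".join(struct.pack("<" + fmt, v) for v in values)


def encode_strings(values) -> bytes:
    """String column: LEB128 byte length followed by UTF-8 bytes, per value."""
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        n = len(data)
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
        out += data
    return bytes(out)


def encode_fixed_strings(n: int, values) -> bytes:
    """FixedString(N) column: exactly N bytes per value, right-padded with zero bytes."""
    if any(len(v) > n for v in values):
        raise ValueError(f"value longer than FixedString({n})")
    return b"".join(v.ljust(n, b"\x00") for v in values)


def encode_dates(values) -> bytes:
    """Date column: UInt16 number of days since 1970-01-01."""
    epoch = datetime.date(1970, 1, 1)
    return encode_fixed("H", [(d - epoch).days for d in values])


uint32_col = encode_fixed("I", [1, 2, 3])              # UInt32
float64_col = encode_fixed("d", [0.5, -1.0])           # Float64
bool_col = encode_fixed("B", [True, False])            # Bool: one byte, 1 or 0
string_col = encode_strings(["ab", ""])                # String
fixed_col = encode_fixed_strings(4, [b"ab", b"abcd"])  # FixedString(4)
date_col = encode_dates([datetime.date(1970, 1, 2)])   # Date
```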
Complex data types
The encoding of the following types differs from RowBinary and RowBinaryWithNamesAndTypes:
- Nullable
- LowCardinality
- Array
- Map
- Variant
- Dynamic
- JSON
Nullable
In the Native format, a Nullable column is preceded by a nullable mask: one byte per row in the block, written before the actual data, where 0x01 means the value is NULL and 0x00 means it is not. The nested data that follows still contains an entry for every row, including the NULL ones. For example, with this query, each odd number will be NULL instead:
The output will look like this:
It works similarly with Nullable(String). The null indicator always comes from the nullable mask byte: a mask value of 0x01 means the row is NULL regardless of the string content. For NULL rows, the underlying string is stored as an empty string (LEB128 length 0). A non-NULL empty string also has LEB128 length 0, so only the mask byte distinguishes the two cases. For example, the following query:
The output will look like this:
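A minimal Python sketch of the Nullable layout: the nullable mask comes first, then the nested column with one entry per row. For strings, storing NULL rows as empty strings follows the description above; for numbers, writing 0 at NULL positions is an assumption of this sketch (readers should ignore nested values wherever the mask is 0x01). The helper names are hypothetical.

```python
import struct


def leb128(n: int) -> bytes:
    """Encode an unsigned integer as LEB128."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_uint64(values) -> bytes:
    return b"".join(struct.pack("<Q", v) for v in values)


def encode_strings(values) -> bytes:
    return b"".join(leb128(len(v.encode())) + v.encode() for v in values)


def encode_nullable(values, encode_nested, default) -> bytes:
    """Nullable(T): nullable mask (1 byte per row, 0x01 = NULL), then the nested column."""
    mask = bytes(1 if v is None else 0 for v in values)
    # The nested column still has an entry for every row; NULL rows get `default`.
    nested = [default if v is None else v for v in values]
    return mask + encode_nested(nested)


numbers = encode_nullable([0, None, 2], encode_uint64, 0)
strings = encode_nullable(["", None, "a"], encode_strings, "")
# In `strings`, the empty string and the NULL both encode as length 0;
# only the mask bytes (00 01 00) tell them apart.
```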
LowCardinality
Unlike RowBinary, where LowCardinality is transparent, the Native format uses a dictionary-based columnar encoding. A column is encoded as a version prefix, followed by a dictionary of unique values and an array of integer indexes into that dictionary.
A column can be defined as LowCardinality(Nullable(T)), but not as Nullable(LowCardinality(T)); the server always rejects the latter with an error.
The version prefix is a UInt64(LE) with value 1, written once per column. Then, per block, the following is written:
- UInt64(LE): IndexesSerializationType bitfield. Bits 0–7 encode the index width (0 = UInt8, 1 = UInt16, 2 = UInt32, 3 = UInt64). Bit 8 (NeedGlobalDictionaryBit) is never set in Native format (the server throws an exception if it is encountered). Bit 9 indicates additional dictionary keys are present. Bit 10 indicates the dictionary should be reset.
- UInt64(LE): number of dictionary keys, followed by the keys bulk-serialized using the inner type encoding.
- UInt64(LE): number of rows, followed by index values bulk-serialized using the appropriate UInt width.
The dictionary always contains a default value at index 0 (e.g. empty string for String, 0 for numeric types). For LowCardinality(Nullable(T)), index 0 represents NULL, and the keys are serialized without the Nullable wrapper.
For example, LowCardinality(String) with 5 rows ['foo', 'bar', 'baz', 'foo', 'bar']:
With LowCardinality(Nullable(String)), index 0 is NULL:
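The non-nullable case can be sketched in Python as follows. This sketch assumes UInt8 indexes (at most 256 keys), sets bits 9 and 10 as a writer sending a fresh dictionary in each block would, and adds new keys in order of first appearance after the default value at index 0; that ordering is an assumption. The constant and function names are hypothetical.

```python
import struct

INDEX_WIDTH_UINT8 = 0             # bits 0-7: index width, 0 = UInt8
HAS_ADDITIONAL_KEYS = 1 << 9      # bit 9: dictionary keys follow
NEED_UPDATE_DICTIONARY = 1 << 10  # bit 10: reset the dictionary


def leb128(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_low_cardinality_strings(values) -> bytes:
    """LowCardinality(String) column for a single block, with UInt8 indexes."""
    keys = [""]                 # index 0 holds the default value
    position = {"": 0}
    indexes = []
    for v in values:
        if v not in position:   # assumption: new keys in order of first appearance
            position[v] = len(keys)
            keys.append(v)
        indexes.append(position[v])
    if len(keys) > 256:
        raise ValueError("this sketch only handles UInt8 indexes")
    out = bytearray()
    out += struct.pack("<Q", 1)  # version prefix
    out += struct.pack("<Q", INDEX_WIDTH_UINT8 | HAS_ADDITIONAL_KEYS | NEED_UPDATE_DICTIONARY)
    out += struct.pack("<Q", len(keys))     # number of dictionary keys
    for k in keys:                          # keys, bulk-serialized as String
        data = k.encode("utf-8")
        out += leb128(len(data)) + data
    out += struct.pack("<Q", len(indexes))  # number of rows
    out += bytes(indexes)                   # UInt8 indexes into the dictionary
    return bytes(out)


column = encode_low_cardinality_strings(["foo", "bar", "baz", "foo", "bar"])
print(column.hex(" "))
```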
Array
Unlike RowBinary where each array is prefixed with a LEB128 element count, the Native format encodes arrays as two columnar sub-streams:
- N cumulative UInt64 offsets (little-endian, 8 bytes each). Row i has offset[i] - offset[i-1] elements, with offset[-1] implicitly 0.
- All nested elements across all rows, bulk-serialized contiguously.
For example, Array(UInt32) with 3 rows [[0, 10], [1, 11], [2, 12]]:
An empty array has the same offset as the previous row. For example, Array(String) with 4 rows [[], ['0'], ['0','1'], ['0','1','2']]:
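A minimal Python sketch of the two sub-streams: cumulative offsets first, then every element of every row. `array_sizes` recovers per-row element counts with offset[i] - offset[i-1]. The helper names are hypothetical.

```python
import struct


def encode_uint32(values) -> bytes:
    return b"".join(struct.pack("<I", v) for v in values)


def encode_strings(values) -> bytes:
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        n = len(data)
        while n >= 0x80:                  # LEB128 length prefix
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
        out += data
    return bytes(out)


def encode_array(rows, encode_elements) -> bytes:
    """Array(T): cumulative UInt64 offsets, then every element of every row."""
    offsets, flat = [], []
    for row in rows:
        flat.extend(row)
        offsets.append(len(flat))         # running total of elements so far
    return b"".join(struct.pack("<Q", o) for o in offsets) + encode_elements(flat)


def array_sizes(data: bytes, num_rows: int) -> list[int]:
    """Recover per-row sizes: offset[i] - offset[i-1], with offset[-1] = 0."""
    offsets = struct.unpack(f"<{num_rows}Q", data[: 8 * num_rows])
    return [cur - prev for prev, cur in zip((0,) + offsets, offsets)]


uint_arrays = encode_array([[0, 10], [1, 11], [2, 12]], encode_uint32)
str_arrays = encode_array([[], ["0"], ["0", "1"], ["0", "1", "2"]], encode_strings)
print(array_sizes(str_arrays, 4))  # [0, 1, 2, 3]
```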
Map
A Map(K, V) is encoded as Array(Tuple(K, V)): array offsets, followed by all keys, then all values. This differs from RowBinary, where keys and values are interleaved per entry.
For example, Map(String, UInt64) with 3 rows [{'a':0,'b':10}, {'a':1,'b':11}, {'a':2,'b':12}]:
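A minimal Python sketch of this layout, relying on Python dicts preserving insertion order so that keys and values stay aligned; the helper names are hypothetical.

```python
import struct


def encode_uint64(values) -> bytes:
    return b"".join(struct.pack("<Q", v) for v in values)


def encode_strings(values) -> bytes:
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        n = len(data)
        while n >= 0x80:                  # LEB128 length prefix
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
        out += data
    return bytes(out)


def encode_map(rows, encode_keys, encode_values) -> bytes:
    """Map(K, V) as Array(Tuple(K, V)): offsets, then all keys, then all values."""
    offsets, keys, values = [], [], []
    for row in rows:
        keys.extend(row.keys())
        values.extend(row.values())
        offsets.append(len(keys))         # cumulative number of entries
    return (b"".join(struct.pack("<Q", o) for o in offsets)
            + encode_keys(keys) + encode_values(values))


column = encode_map(
    [{"a": 0, "b": 10}, {"a": 1, "b": 11}, {"a": 2, "b": 12}],
    encode_strings,
    encode_uint64,
)
print(column.hex(" "))
```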
Variant
Unlike RowBinary where each row carries its own discriminant byte followed by the value inline, the Native format separates discriminators from data.
As with RowBinary, the types in the definition are always sorted alphabetically, and the discriminant is the index in that sorted list. 0xFF (255) represents NULL.
A Variant column is encoded as:
- UInt64(LE) discriminators mode prefix (0 = BASIC, 1 = COMPACT). Native format output typically uses BASIC (0); COMPACT mode may appear when reading data stored with use_compact_variant_discriminators_serialization enabled.
- N UInt8 discriminators, one per row.
- Each variant type's data as a separate bulk column containing only the matching rows, in discriminant order.
For example, Variant(String, UInt32) with 5 rows [0::UInt32, 'hello', NULL, 3::UInt32, 'hello'] (sorted: String = 0, UInt32 = 1):
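A minimal Python sketch of a Variant column in BASIC mode. Choosing the variant for each Python value with `isinstance` is an assumption of this sketch (ClickHouse decides from the SQL types); `encode_variant` and the other helpers are hypothetical.

```python
import struct

NULL_DISCRIMINATOR = 0xFF


def encode_strings(values) -> bytes:
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        n = len(data)
        while n >= 0x80:                  # LEB128 length prefix
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
        out += data
    return bytes(out)


def encode_uint32(values) -> bytes:
    return b"".join(struct.pack("<I", v) for v in values)


def encode_variant(values, variants) -> bytes:
    """Variant column in BASIC mode; `variants` maps name -> (python_type, encoder)."""
    names = sorted(variants)                 # discriminant = index in sorted list
    rows = {name: [] for name in names}
    discriminators = []
    for v in values:
        if v is None:
            discriminators.append(NULL_DISCRIMINATOR)
            continue
        name = next((n for n in names if isinstance(v, variants[n][0])), None)
        if name is None:
            raise TypeError(f"no variant type for {v!r}")
        discriminators.append(names.index(name))
        rows[name].append(v)
    out = bytearray(struct.pack("<Q", 0))    # discriminators mode prefix: BASIC
    out += bytes(discriminators)             # one UInt8 per row
    for name in names:                       # each variant's rows, in discriminant order
        out += variants[name][1](rows[name])
    return bytes(out)


column = encode_variant(
    [0, "hello", None, 3, "hello"],
    {"UInt32": (int, encode_uint32), "String": (str, encode_strings)},
)
print(column.hex(" "))
```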
Dynamic
Unlike RowBinary where each value is self-describing (type prefix + value), the Native format serializes Dynamic as a structure prefix followed by a Variant column.
The structure prefix contains a UInt64(LE) serialization version, then the number of dynamic types (as VarUInt), then the type names as strings. In version V1 the type count is written twice for compatibility. The data that follows is a Variant column whose type list is the dynamic types plus an internal SharedVariant type, sorted alphabetically.
For example, Dynamic with 5 rows [0::UInt32, 'hello', NULL, 3::UInt32, 'hello']:
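The following Python sketch follows the description above: it writes the structure prefix (version, type count, type names) and then a Variant column whose type list is the dynamic types plus SharedVariant, sorted alphabetically. Several details are assumptions of this sketch: the numeric value passed as `version`, writing the type names in sorted order, the nested Variant writing its own discriminators mode prefix as a standalone Variant would, and mapping Python values to types with `isinstance`. The helper names are hypothetical. With only two dynamic types nothing lands in SharedVariant, so its empty column contributes no bytes.

```python
import struct


def leb128(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_strings(values) -> bytes:
    return b"".join(leb128(len(v.encode())) + v.encode() for v in values)


def encode_uint32(values) -> bytes:
    return b"".join(struct.pack("<I", v) for v in values)


def encode_dynamic(values, dynamic_types, version, count_twice=False):
    """Dynamic column sketch: structure prefix, then a Variant column.

    dynamic_types maps a type name to (python_type, bulk_encoder).
    Returns (bytes, variant type list, discriminators).
    """
    names = sorted(dynamic_types)                # assumption: names written sorted
    out = bytearray(struct.pack("<Q", version))  # serialization version (value assumed)
    out += leb128(len(names))                    # number of dynamic types
    if count_twice:                              # V1 writes the count twice
        out += leb128(len(names))
    for name in names:
        out += leb128(len(name.encode())) + name.encode()
    variant_types = sorted(names + ["SharedVariant"])
    rows = {t: [] for t in variant_types}
    discriminators = []
    for v in values:
        if v is None:
            discriminators.append(0xFF)          # NULL
            continue
        name = next((n for n in names if isinstance(v, dynamic_types[n][0])), None)
        if name is None:
            raise TypeError(f"no dynamic type for {v!r}")
        discriminators.append(variant_types.index(name))
        rows[name].append(v)
    out += struct.pack("<Q", 0)                  # Variant discriminators mode: BASIC
    out += bytes(discriminators)
    for t in variant_types:
        if t != "SharedVariant":                 # SharedVariant is empty here: no bytes
            out += dynamic_types[t][1](rows[t])
    return bytes(out), variant_types, discriminators


data, variant_types, discriminators = encode_dynamic(
    [0, "hello", None, 3, "hello"],
    {"UInt32": (int, encode_uint32), "String": (str, encode_strings)},
    version=2,  # hypothetical version number
)
print(variant_types, discriminators)
```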
JSON
Unlike RowBinary where each row is self-describing with path names and values, the Native format serializes JSON in a columnar structure. The encoding is complex and version-dependent: it consists of a structure prefix with the serialization version, dynamic path names, and shared data layout, followed by typed paths (each as a bulk column), dynamic paths (each as a Dynamic column), and shared data for overflow paths.
For simpler interoperability, consider using the setting output_format_native_write_json_as_string=1, which serializes JSON columns as plain JSON text strings (one String per row).
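With `output_format_native_write_json_as_string=1`, a JSON column arrives as an ordinary String column, so a client needs only the String decoding plus a JSON parser. A minimal Python sketch, assuming the block header has already been read and you hold the column's data bytes and row count; `read_json_column` and the sample bytes are hypothetical.

```python
import io
import json


def read_varuint(buf: io.BytesIO) -> int:
    """Decode a LEB128 unsigned integer."""
    result, shift = 0, 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("truncated LEB128 value")
        result |= (byte[0] & 0x7F) << shift
        if byte[0] < 0x80:
            return result
        shift += 7


def read_json_column(data: bytes, num_rows: int) -> list:
    """Parse a String column in which every row holds one JSON document."""
    buf = io.BytesIO(data)
    documents = []
    for _ in range(num_rows):
        length = read_varuint(buf)
        documents.append(json.loads(buf.read(length)))
    return documents


sample = b'\x07{"a":1}' + b"\x02{}"  # two hypothetical rows
print(read_json_column(sample, 2))   # [{'a': 1}, {}]
```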