@@ -125,6 +125,81 @@ const s3Destination = new firehose.S3Bucket(bucket, {
125 125 });
126 126 ```
127 127
128+ ### Data Format Conversion
129+
130+ Defining an S3 destination configured with data format conversion:
131+
132+ ```ts
133+ declare const bucket: s3.Bucket;
134+ declare const schemaGlueTable: glue.CfnTable;
135+ const s3Destination = new firehose.S3Bucket(bucket, {
136+   compression: firehose.Compression.GZIP,
137+   fileExtension: '.json.gz',
138+   dataFormatConversionConfiguration: {
139+     schema: Schema.fromCfnTable(schemaGlueTable),
140+     inputFormat: InputFormat.OPENX_JSON,
141+     outputFormat: OutputFormat.PARQUET,
142+   },
143+ });
144+ ```
145+
146+ Data format conversion accepts only JSON input and produces either Parquet or ORC output:
147+ - To read JSON using the OpenX parser, choose `InputFormat.OPENX_JSON`.
148+ - To read JSON using the Hive parser, choose `InputFormat.HIVE_JSON`.
149+ - To transform into Parquet, choose `OutputFormat.PARQUET`.
150+ - To transform into ORC, choose `OutputFormat.ORC`.
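
For orientation, the four choices above line up with the deserializers and serializers of the CloudFormation `DataFormatConversionConfiguration` property on `AWS::KinesisFirehose::DeliveryStream`. The sketch below is plain TypeScript, not CDK code: `renderConversion` and the `InputChoice`/`OutputChoice` types are hypothetical names used only for illustration, the schema configuration is left out, and the assumption that an empty SerDe object selects a format with its defaults should be checked against the CloudFormation reference.

```typescript
// Property names follow the CloudFormation AWS::KinesisFirehose::DeliveryStream
// reference. Only the fields relevant to format selection are modelled here;
// SchemaConfiguration is intentionally omitted.
interface DataFormatConversionConfiguration {
  Enabled: boolean;
  InputFormatConfiguration: {
    Deserializer: { OpenXJsonSerDe?: object; HiveJsonSerDe?: object };
  };
  OutputFormatConfiguration: {
    Serializer: { ParquetSerDe?: object; OrcSerDe?: object };
  };
}

// Hypothetical stand-ins for the input and output choices listed above.
type InputChoice = 'OPENX_JSON' | 'HIVE_JSON';
type OutputChoice = 'PARQUET' | 'ORC';

// Hypothetical helper (not part of the CDK). Assumption: an empty SerDe
// object selects that format with its default settings.
function renderConversion(
  input: InputChoice,
  output: OutputChoice,
): DataFormatConversionConfiguration {
  return {
    Enabled: true,
    InputFormatConfiguration: {
      Deserializer: input === 'OPENX_JSON' ? { OpenXJsonSerDe: {} } : { HiveJsonSerDe: {} },
    },
    OutputFormatConfiguration: {
      Serializer: output === 'PARQUET' ? { ParquetSerDe: {} } : { OrcSerDe: {} },
    },
  };
}

const rendered = renderConversion('OPENX_JSON', 'PARQUET');
console.log(JSON.stringify(rendered, null, 2));
```

Exactly one deserializer and one serializer is set per configuration, which is why the input and output formats are chosen independently.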
151+
152+ Each input and output format has its own props, which you can tune if the defaults do not suit your use case.
153+ These are detailed below.
154+
155+ #### Input: OpenX JSON
156+
157+ Example creation of custom OpenX JSON InputFormat:
158+
159+ ```ts
160+ const inputFormat = new OpenXJsonInputFormat({
161+   lowercaseColumnNames: false,
162+   columnToJsonKeyMappings: {"ts": "timestamp"},
163+   convertDotsInJsonKeysToUnderscores: true,
164+ });
165+ ```
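
To make the effect of the three options in the example above more concrete, here is a minimal plain-TypeScript sketch of how they could reshape the top-level keys of one record. It is an illustration only, not the Firehose or CDK implementation: `toColumns` and `OpenXOptions` are hypothetical names, the order in which the options are applied is an assumption, and `lowercaseColumnNames` is assumed to lowercase keys when enabled.

```typescript
// Illustration only: mimics the described effect of the OpenX JSON options on
// the top-level keys of a record. This is not how Firehose is implemented.
interface OpenXOptions {
  lowercaseColumnNames: boolean;
  columnToJsonKeyMappings: Record<string, string>; // column name -> JSON key
  convertDotsInJsonKeysToUnderscores: boolean;
}

function toColumns(
  record: Record<string, unknown>,
  options: OpenXOptions,
): Record<string, unknown> {
  // Invert the mapping so that a JSON key can be looked up to find its column.
  const jsonKeyToColumn: Record<string, string> = {};
  for (const column of Object.keys(options.columnToJsonKeyMappings)) {
    jsonKeyToColumn[options.columnToJsonKeyMappings[column] as string] = column;
  }

  const columns: Record<string, unknown> = {};
  for (const jsonKey of Object.keys(record)) {
    // Assumed order: explicit mapping first, then dots, then lowercasing.
    let column = jsonKeyToColumn[jsonKey] ?? jsonKey;
    if (options.convertDotsInJsonKeysToUnderscores) {
      column = column.replace(/\./g, '_');
    }
    if (options.lowercaseColumnNames) {
      column = column.toLowerCase();
    }
    columns[column] = record[jsonKey];
  }
  return columns;
}

const columns = toColumns(
  { timestamp: 1700000000000, 'user.id': 'u-1', Status: 'ok' },
  {
    lowercaseColumnNames: false,
    columnToJsonKeyMappings: { ts: 'timestamp' },
    convertDotsInJsonKeysToUnderscores: true,
  },
);
console.log(columns);
```

With these settings the `timestamp` key lands in a column named `ts`, `user.id` becomes `user_id`, and `Status` keeps its capitalization because lowercasing is turned off.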
166+
167+ #### Input: Hive JSON
168+
169+ Example creation of custom Hive JSON InputFormat:
170+
171+ ```ts
172+ const inputFormat = new HiveJsonInputFormat({
173+   timestampParsers: [TimestampParser.fromFormatString('yyyy-MM-dd')],
174+ });
175+ ```
176+
177+ Hive JSON lets you specify custom timestamp formats to parse. Format strings follow the Joda-Time `DateTimeFormat` pattern syntax (link needed).
178+ When you specify a custom `TimestampParser`, the default parser is overridden. To retain the default parser,
179+ add `TimestampParser.DEFAULT` to the list of parsers.
180+
181+ To parse timestamps given in epoch milliseconds, use the convenience parser `TimestampParser.EPOCH_MILLIS`.
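
The practical consequence of a parser list replacing the default can be sketched in plain TypeScript. This is an illustration under stated assumptions, not the Firehose behavior: `Parser`, `parseYearMonthDay`, `parseEpochMillis`, and `parseTimestamp` are hypothetical names, dates are interpreted as UTC, and trying the parsers in list order is an assumption.

```typescript
// Illustration only: a stand-in for applying a list of timestamp parsers.
type Parser = (raw: string) => Date | undefined;

// Hypothetical parser for the pattern 'yyyy-MM-dd', interpreted as UTC.
const parseYearMonthDay: Parser = (raw) => {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(raw);
  if (!match) return undefined;
  const [, year, month, day] = match;
  return new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
};

// Hypothetical parser for epoch milliseconds given as a string of digits.
const parseEpochMillis: Parser = (raw) =>
  /^\d+$/.test(raw) ? new Date(Number(raw)) : undefined;

// Assumption: parsers are tried in list order and the first match wins.
function parseTimestamp(raw: string, parsers: Parser[]): Date | undefined {
  for (const parser of parsers) {
    const parsed = parser(raw);
    if (parsed !== undefined) return parsed;
  }
  return undefined;
}

// Only the listed parsers apply: a value in any other format is not parsed.
const onlyDates = [parseYearMonthDay];
const datesAndMillis = [parseYearMonthDay, parseEpochMillis];

console.log(parseTimestamp('2024-03-01', onlyDates));
console.log(parseTimestamp('1709251200000', onlyDates));
console.log(parseTimestamp('1709251200000', datesAndMillis));
```

Here the epoch value is left unparsed by `onlyDates` but handled by `datesAndMillis`, which mirrors why a parser you still need has to be added to the list explicitly.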
182+
183+ #### Output: Parquet
184+
185+ Example creation of custom Parquet OutputFormat:
186+
187+ ```ts
188+ const outputFormat = new ParquetOutputFormat({
189+   // TODO: Props
190+ });
191+ ```
192+
193+ #### Output: ORC
194+
195+ Example creation of custom ORC OutputFormat:
196+
197+ ```ts
198+ const outputFormat = new OrcOutputFormat({
199+   // TODO: Props
200+ });
201+ ```
202+
128 203 ## Server-side Encryption
129 204
130 205 Enabling server-side encryption (SSE) requires Amazon Data Firehose to encrypt all data