728x90
Snowflake❄️의 Data Warehousing Workshop을 진행하면서 작업한 내용을 정리하고자 합니다.
Badge 1: 데이터 웨어하우징 워크샵 - KR
이번 내용은 Lesson 7: Data Storage Structures 과정을 진행하면서 작성한 내용입니다.
🥋 XML Lab - With Header Row
🥋 View XML File
🥋 Create a Table for the XML Data
// Create an Ingestion Table for XML Data
CREATE TABLE LIBRARY_CARD_CATALOG.PUBLIC.AUTHOR_INGEST_XML
(
"RAW_AUTHOR" VARIANT
);
🥋 Create an File Format for the XML Data
//Create File Format for XML Data
CREATE FILE FORMAT LIBRARY_CARD_CATALOG.PUBLIC.XML_FILE_FORMAT
TYPE = 'XML'
COMPRESSION = 'AUTO'
PRESERVE_SPACE = FALSE
STRIP_OUTER_ELEMENT = FALSE
DISABLE_SNOWFLAKE_DATA = FALSE
DISABLE_AUTO_CONVERT = FALSE
IGNORE_UTF8_ERRORS = FALSE;
🎯 Load the XML Data into the XML Table
앞에서 배운 방식 중 하나를 사용해서 로드합니다.
🥋 View the XML Data You Loaded
모든 행이 단일 행으로 로드된 것 같습니다.
🥋 XML Lab - No Header Row
🥋 Download and View Another XML File
위 단계에서 사용한 파일과 지금 사용한 파일의 차이점을 확인해봅니다.
🥋 View the New Rows
🎯 Consider a Better Solution
<dataset>
태그와 </dataset>
태그를 제거하면 문제가 해결되지만 로드하려는 모든 XML 파일에 대해 이와 같은 작업을 수행하는 것은 실용적이지 않습니다. 이는 "kluge"(어떤 문제에 대한 서툴거나 세련되지 않은 해결책) 이라고 불릴 것 입니다.
🎯 XML Challenge Lab
🎯 Modify Your XML File Format (the Non-Klugey Solution)
XML file format 수정
//MODIFY File Format for XML Data by Changing Config
CREATE OR REPLACE FILE FORMAT LIBRARY_CARD_CATALOG.PUBLIC.XML_FILE_FORMAT
TYPE = 'XML'
COMPRESSION = 'AUTO'
PRESERVE_SPACE = FALSE
STRIP_OUTER_ELEMENT = FALSE
DISABLE_SNOWFLAKE_DATA = FALSE
DISABLE_AUTO_CONVERT = FALSE
IGNORE_UTF8_ERRORS = FALSE;
🎯 Modify, Truncate, Load, View, Repeat
1. file format 수정
2. 테이블 행 삭제 TRUNCATE LIBRARY_CARD_CATALOG.PUBLIC.AUTHOR_INGEST_XML;
3. 다시 파일 로드
🥋 XML Lab - Queries (Optional)
🥋 Fun With XML Variant Data
//Returns entire record
SELECT raw_author
FROM author_ingest_xml;
// Presents a kind of meta-data view of the data
SELECT raw_author:"$"
FROM author_ingest_xml;
//shows the root or top-level object name of each row
SELECT raw_author:"@"
FROM author_ingest_xml;
//returns AUTHOR_UID value from top-level object's attribute
SELECT raw_author:"@AUTHOR_UID"
FROM author_ingest_xml;
//returns value of NESTED OBJECT called FIRST_NAME
SELECT XMLGET(raw_author, 'FIRST_NAME'):"$"
FROM author_ingest_xml;
//returns the data in a way that makes it look like a normalized table
SELECT
raw_author:"@AUTHOR_UID" as AUTHOR_ID
,XMLGET(raw_author, 'FIRST_NAME'):"$" as FIRST_NAME
,XMLGET(raw_author, 'MIDDLE_NAME'):"$" as MIDDLE_NAME
,XMLGET(raw_author, 'LAST_NAME'):"$" as LAST_NAME
FROM AUTHOR_INGEST_XML;
//add ::STRING to cast the values into strings and get rid of the quotes
SELECT
raw_author:"@AUTHOR_UID" as AUTHOR_ID
,XMLGET(raw_author, 'FIRST_NAME'):"$"::STRING as FIRST_NAME
,XMLGET(raw_author, 'MIDDLE_NAME'):"$"::STRING as MIDDLE_NAME
,XMLGET(raw_author, 'LAST_NAME'):"$"::STRING as LAST_NAME
FROM AUTHOR_INGEST_XML;
//MODIFY File Format for XML Data by Changing Config
CREATE OR REPLACE FILE FORMAT LIBRARY_CARD_CATALOG.PUBLIC.XML_FILE_FORMAT
TYPE = 'XML'
COMPRESSION = 'AUTO'
PRESERVE_SPACE = FALSE
STRIP_OUTER_ELEMENT = FALSE
DISABLE_SNOWFLAKE_DATA = FALSE
DISABLE_AUTO_CONVERT = FALSE
IGNORE_UTF8_ERRORS = FALSE;
🎯 Create JSON Table & File Format, and Load the File
🎯 Download a JSON File
🎯 Create a Table Raw JSON Data
CREATE TABLE "LIBRARY_CARD_CATALOG"."PUBLIC"."AUTHOR_INGEST_JSON"
(
"RAW_AUTHOR" VARIANT
);
🎯 Create a File Format to Load the JSON Data
//Create File Format for JSON Data
CREATE FILE FORMAT LIBRARY_CARD_CATALOG.PUBLIC.JSON_FILE_FORMAT
TYPE = 'JSON'
COMPRESSION = 'AUTO'
ENABLE_OCTAL = FALSE
ALLOW_DUPLICATE = FALSE
STRIP_OUTER_ARRAY = TRUE
STRIP_NULL_VALUES = FALSE
IGNORE_UTF8_ERRORS = FALSE ;
🎯 Load the Data into the New Table, Using the File Format You Created
COPY INTO 문을 사용해서 json 데이터를 테이블에 추가합니다.
copy into AUTHOR_INGEST_JSON
from @LIKE_A_WINDOW_INTO_AN_S3_BUCKET
files = ( 'author_with_header.json')
file_format = ( format_name=JSON_FILE_FORMAT );
stage가 없는 경우 아래 sql 문으로 stage를 clone 할 수 있습니다.
CREATE STAGE LIKE_A_WINDOW_INTO_AN_S3_BUCKET
CLONE GARDEN_PLANTS.VEGGIES.LIKE_A_WINDOW_INTO_AN_S3_BUCKET;
🥋 Query the JSON Table
//returns AUTHOR_UID value from top-level object's attribute
select raw_author:AUTHOR_UID
from author_ingest_json;
//returns the data in a way that makes it look like a normalized table
SELECT
raw_author:AUTHOR_UID
,raw_author:FIRST_NAME::STRING as FIRST_NAME
,raw_author:MIDDLE_NAME::STRING as MIDDLE_NAME
,raw_author:LAST_NAME::STRING as LAST_NAME
FROM AUTHOR_INGEST_JSON;
💯 Lesson 8: Intro to Semi-Structured Data 완료!
728x90