Introduction to Talend Training

Course #: WA2700

This 3-day course provides an introduction to Talend.

Audience

  • Developers
  • Data Engineers
  • Integration Engineers
  • Architects
  • Data Stewards

Prerequisites

Participants should preferably have basic knowledge of a programming language such as Java, and must be familiar with relational databases (RDBMS) and SQL.

Duration

3 Days

Outline of Introduction to Talend Training

OVERVIEW (Theory)

  • Introduction to Talend
  • Why Talend?
  • Talend vs Other tools
  • Logical Architecture
  • More on Data Integration Aspects
  • Talend Big Data Integration
  • Talend Open Studio Walkthrough
  • Key components in Palette
  • Conclusion

INTRODUCTION AND GENERAL PRINCIPLES

  • Before you begin
  • Installing the software
  • Enabling tHashInput and tHashOutput

METADATA AND SCHEMAS

  • Introduction
  • Hand-cranking a built-in schema
  • Propagating schema changes
  • Creating a generic schema from the existing metadata
  • Cutting and pasting schema information
  • Dropping schemas to empty components
  • Creating schemas from lists

VALIDATING DATA

  • Introduction
  • Enabling and disabling reject flows
  • Gathering all rejects prior to killing a job
  • Validating against the schema
  • Rejecting rows using tMap
  • Checking a column against a list of allowed values
  • Checking a column against a lookup
  • Creating validation rules for more complex requirements
  • Creating binary error codes to store multiple test results

MAPPING DATA

  • Introduction
  • Simple mapping and tMap time savers
  • Creating tMap expressions
  • Using the ternary operator for conditional logic
  • Using intermediate variables in tMap
  • Filtering input rows
  • Splitting an input row into multiple outputs based on input conditions
  • Joining data using tMap
  • Hierarchical joins using tMap
  • Using reload at each row to process real-time / near real-time data

USING JAVA IN TALEND

  • Introduction
  • Performing one-off pieces of logic using tJava
  • Setting the context and globalMap variables using tJava
  • Adding complex logic into a flow using tJavaRow
  • Creating pseudo components using tJavaFlex
  • Creating custom functions using code routines
  • Importing JAR files to allow use of external Java classes

MANAGING CONTEXT VARIABLES

  • Introduction
  • Creating a context group
  • Adding a context group to your job
  • Adding contexts to a context group
  • Using tContextLoad to load contexts
  • Using implicit context loading to load contexts
  • Turning implicit context loading on and off in a job
  • Setting the context file location in the operating system

WORKING WITH DATABASES

  • Introduction
  • Setting up a database connection
  • Importing the table schemas
  • Reading from database tables
  • Using context and globalMap variables in SQL queries
  • Printing your input query
  • Writing to a database table
  • Printing your output query
  • Managing database sessions
  • Passing a session to a child job
  • Selecting different fields and keys for insert, update, and delete
  • Capturing individual rejects and errors
  • Database and table management
  • Managing surrogate keys for parent and child tables
  • Rewritable lookups using an in-process database

MANAGING FILES

  • Introduction
  • Appending records to a file
  • Reading rows using a regular expression
  • Using temporary files
  • Storing intermediate data in the memory using tHashMap
  • Reading headers and trailers using tMap
  • Reading headers and trailers with no identifiers
  • Using the information in the header and trailer
  • Adding a header and trailer to a file
  • Moving, copying, renaming, and deleting files and folders
  • Capturing file information
  • Processing multiple files at once
  • Processing control/validation files
  • Creating and writing files depending on the input data

WORKING WITH XML, QUEUES, AND WEB SERVICES

  • Introduction
  • Using tXMLMap to read XML
  • Using tXMLMap to create an XML document
  • Reading complex hierarchical XML
  • Writing complex XML
  • Calling a SOAP web service
  • Calling a RESTful web service
  • Reading and writing to a queue
  • Ensuring lossless queues using sessions

DEBUGGING, LOGGING, AND TESTING

  • Introduction
  • Finding the location of compilation errors using the Problems tab
  • Locating execution errors from the console output
  • Using the Talend debug mode – row-by-row execution
  • Using the Java debugger to debug Talend jobs
  • Using tLogRow to show data in a row
  • Using tJavaRow to display row information
  • Using tJava to display status messages and variables
  • Printing out the context
  • Dumping the console output to a file from within a job
  • Creating simple test data using tRowGenerator
  • Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
  • Creating random test data using lookups
  • Creating test data using Excel
  • Testing logic – the most-used pattern
  • Killing a job from within tJavaRow

DEPLOYING AND SCHEDULING TALEND CODE

  • Introduction
  • Creating compiled executables
  • Using a different context
  • Adding command-line context parameters
  • Managing job dependencies
  • Capturing and acting on different return codes
  • Returning codes from a child job without tDie
  • Passing parameters to a child job
  • Executing non-Talend objects and operating system commands

COMMON MISTAKES AND OTHER USEFUL HINTS AND TIPS

  • Introduction
  • My tab is missing
  • Finding the code routine
  • Finding a new context variable
  • Global variables going missing with reload at each row
  • Dragging component globalMap variables
  • Some complex date formats
  • Capturing tMap rejects
  • Adding job name, project name, and other job specific information
  • Printing tMap variables
  • Stopping memory errors in Talend

Software Development Lifecycle (Theory) (Hands-on)

  • Working with Git and Talend
  • How to perform CI/CD with Jenkins and Talend?
  • Job Monitoring using Resource Manager UI
  • Unit Testing
  • Best Practices
    • Joblets
    • Parallelization
    • Reusing Jobs (Child Jobs)
    • Context Variables
    • Repository

Getting started with a basic Big Data Job

  • Creating a Job
  • Adding components to the Job
  • Connecting the components together
  • Configuring the components
  • Executing the Job
  • Various types of Big Data Jobs
    • Pig Workflow
    • Reading and Writing to Hive on Hadoop
    • Working with HDFS
    • Performing Sqoop
    • Using Spark in Talend
    • Kafka

CarParts Project

  • Creating a Spark Batch Job
  • Use cases
    • Scenario: Carparts_demoprep
    • Scenario: Carparts_ETL
    • Scenario: Carparts01_Spark
    • Scenario: LoadCarPartsinHDFS

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.