{"id":1590,"date":"2020-06-15T06:00:34","date_gmt":"2020-06-15T11:00:34","guid":{"rendered":"https:\/\/www.bigdatainrealworld.com\/?p=1590"},"modified":"2023-02-19T07:32:13","modified_gmt":"2023-02-19T13:32:13","slug":"building-a-data-pipeline-with-apache-nifi","status":"publish","type":"post","link":"https:\/\/www.bigdatainrealworld.com\/building-a-data-pipeline-with-apache-nifi\/","title":{"rendered":"Building a Data Pipeline with Apache NiFi"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Apache NiFi is an open source data flow framework that automates the movement of data between systems. It works as a data transporter between a data producer, the system that generates data, and a data consumer, the system that consumes it. NiFi is designed to address the complexity, scalability, maintainability and other major challenges of a Big Data pipeline.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">NiFi is used extensively in Energy and Utilities, Financial Services, Telecommunications, Healthcare and Life Sciences, Retail Supply Chain, Manufacturing and many other industries.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Commonly used sources include data repositories, flat files, XML, JSON, SFTP locations, web servers, HDFS and many others.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Destinations can be S3, NAS, HDFS, SFTP, web servers, RDBMS, Kafka, etc.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-high-level-architecture.png\"><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-full wp-image-1591\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-high-level-architecture.png\" alt=\"NiFi high level architecture\" width=\"464\" height=\"268\" 
srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-high-level-architecture.png 464w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-high-level-architecture-300x173.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-high-level-architecture-253x146.png 253w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-high-level-architecture-50x29.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-high-level-architecture-130x75.png 130w\" sizes=\"(max-width:767px) 464px, 464px\" \/><\/a><\/p>\n<h2 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Why NiFi?<\/span><\/h2>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">The primary use of NiFi is data ingestion. In any Big Data project, the biggest challenge is bringing different types of data from different sources into a centralized data lake. NiFi can ingest many kinds of data from a wide range of sources and deliver them to a wide range of destinations. It ships with 280+ built-in processors for transporting data between systems.<\/span><\/p>\n<p style=\"text-align: justify;\"><b>NiFi is an easy-to-use tool that favors configuration over coding.<\/b><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">However, NiFi is not limited to data ingestion. It can also handle data provenance, data cleaning, schema evolution, data aggregation, transformation, job scheduling and many other tasks. 
We will discuss these in more detail in a future post with a real world data flow pipeline.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Hence, we can say NiFi is a highly automated framework for gathering, transporting, maintaining and aggregating data of various types from various sources to destinations in a data flow pipeline.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">A sample NiFi data flow pipeline looks something like this:<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Sample-data-pipeline.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1598\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Sample-data-pipeline.png\" alt=\"NiFi Sample data pipeline\" width=\"720\" height=\"375\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Sample-data-pipeline.png 720w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Sample-data-pipeline-300x156.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Sample-data-pipeline-260x135.png 260w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Sample-data-pipeline-50x26.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Sample-data-pipeline-144x75.png 144w\" sizes=\"(max-width:767px) 480px, 720px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Seems too complex, right? This is the beauty of NiFi: we can build complex pipelines with just some basic configuration. 
So, always remember that NiFi favors <\/span><b>configuration over coding.<\/b><\/p>\n<h2 style=\"text-align: justify;\">Step by step instructions to build a data pipeline in NiFi<\/h2>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Before we move ahead with the NiFi components, let us build a pipeline. To create a NiFi pipeline, a developer configures or builds certain processors, groups them into a process group, and connects these groups together.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Let us understand these components using a real-time pipeline.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Suppose flat files are continuously arriving in a source directory. We will design and configure a pipeline that checks these files and reads their names, types and other properties. This procedure is known as listing. After listing the files, we will ingest them into a target directory.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">We will create a process group named \u201cList &#8211; Fetch\u201d by dragging the process group icon from the toolbar at the top of the screen onto the canvas and naming it.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-list-fetch.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1595\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-list-fetch.png\" alt=\"NiFi list fetch\" width=\"664\" height=\"397\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-list-fetch.png 664w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-list-fetch-300x179.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-list-fetch-244x146.png 244w, 
https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-list-fetch-50x30.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-list-fetch-125x75.png 125w\" sizes=\"(max-width:767px) 480px, 664px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Now, double-click the process group to enter \u201cList &#8211; Fetch\u201d and drag the processor icon onto the canvas to create a processor. A pop-up will open; search for the required processor (ListFile in our case) and add it.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">The processor is added, but it shows a warning \u26a0 because it is not configured yet. Right-click the processor and go to Configure. Here, we can add or update the scheduling, settings, properties and any comments for the processor. For now, we will set the source path for our processor in the Properties tab. Fields marked in <\/span><b>bold <\/b><span style=\"font-weight: 400;\">are mandatory, and each field has a question mark next to it that explains its usage.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">\u00a0<a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1599\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor.png\" alt=\"NiFi configure processor\" width=\"673\" height=\"496\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor.png 673w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-300x221.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-198x146.png 198w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-50x37.png 50w, 
https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-102x75.png 102w\" sizes=\"(max-width:767px) 480px, 673px\" \/><\/a><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Similarly, add another processor, \u201cFetchFile\u201d. Hover over the ListFile processor and drag the arrow that appears from ListFile to FetchFile. A pop-up shows that the connection from ListFile to FetchFile uses the success relationship of ListFile. Once the connection is established, the warnings on ListFile are resolved and ListFile is ready for execution. This is confirmed by the red square (stopped) icon that replaces the warning icon on the processor.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Processor.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1600\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Processor.png\" alt=\"NiFi Processor\" width=\"623\" height=\"420\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Processor.png 623w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Processor-300x202.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Processor-217x146.png 217w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Processor-50x34.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-Processor-111x75.png 111w\" sizes=\"(max-width:767px) 480px, 623px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Similarly, open FetchFile to configure it. In the Settings tab, select all four options under \u201cAutomatically Terminate Relationships\u201d. 
This ensures that a FlowFile routed to any of these relationships is dropped there, which ends its path through the pipeline.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-automatically-terminate-relationships.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1601\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-automatically-terminate-relationships.png\" alt=\"NiFi automatically terminate relationships\" width=\"563\" height=\"413\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-automatically-terminate-relationships.png 563w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-automatically-terminate-relationships-300x220.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-automatically-terminate-relationships-199x146.png 199w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-automatically-terminate-relationships-50x37.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-automatically-terminate-relationships-102x75.png 102w\" sizes=\"(max-width:767px) 480px, 563px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Next, on the Properties tab, leave the <\/span><b>File to fetch <\/b><span style=\"font-weight: 400;\">field as it is, because it is populated from the attributes of the FlowFiles that ListFile sends on its success relationship. Change Completion Strategy to <\/span><b>Move File<\/b><span style=\"font-weight: 400;\"> and enter the target directory accordingly. Choose the other options as per the use case. 
Apply and close.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1602\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-2.png\" alt=\"NiFi configure processor 2\" width=\"580\" height=\"429\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-2.png 580w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-2-300x222.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-2-197x146.png 197w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-2-50x37.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-configure-processor-2-101x75.png 101w\" sizes=\"(max-width:767px) 480px, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Pipeline is ready with warnings. 
Let\u2019s execute it.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1603\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready.png\" alt=\"NiFi pipeline ready\" width=\"779\" height=\"539\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready.png 779w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready-300x208.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready-768x531.png 768w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready-211x146.png 211w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready-50x35.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-ready-108x75.png 108w\" sizes=\"(max-width:767px) 480px, (max-width:779px) 100vw, 779px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">To execute a single processor, right-click it and choose Start. To run the complete pipeline in a process group, go to the process group by clicking its name in the navigation bar at the bottom left. 
Then right click and start.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-start.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1604\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-start.png\" alt=\"NiFi pipeline start\" width=\"536\" height=\"271\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-start.png 536w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-start-300x152.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-start-260x131.png 260w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-start-50x25.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-start-148x75.png 148w\" sizes=\"(max-width:767px) 480px, 536px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-success.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1605\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-success.png\" alt=\"NiFi pipeline success\" width=\"564\" height=\"359\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-success.png 564w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-success-300x191.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-success-229x146.png 229w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-success-50x32.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-pipeline-success-118x75.png 118w\" sizes=\"(max-width:767px) 480px, 564px\" 
\/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">A green icon indicates that a processor is running, and a red icon indicates that it is stopped. Here, the file moved from one processor to the other through a queue. If a processor completes its work but its successor is stuck, stopped or failing, the processed data waits in the queue. Other details such as execution history, summary, data provenance and flow configuration history can be accessed either by right-clicking the processor or process group, or by clicking the button with three horizontal lines at the top right.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">This is a real world example of building and deploying a NiFi pipeline.<\/span><\/p>\n<h2 style=\"text-align: justify;\">NiFi Components<\/h2>\n<p><span style=\"font-weight: 400;\">Internally, a NiFi pipeline consists of the components below.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">FlowFile<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">FlowFile represents the real 
abstraction that NiFi provides, i.e., the structured or unstructured data that is processed. Structured data includes JSON or XML messages; unstructured data includes images, video and audio. A FlowFile has two parts: content and attributes.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The content is the actual data moving through the flow, which can be read in with processors such as GetFile and GetHTTP, while the attributes are key-value pairs that hold the basic information about the content.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Processor<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">A processor is the building block of a NiFi data flow. Processors perform tasks such as creating FlowFiles, reading FlowFile contents, writing FlowFile contents, routing data, extracting data, modifying data and many more. At the time of writing, NiFi has 280+ built-in processors. We can also build custom processors as per our requirements.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Reporting Task<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">A reporting task analyses and monitors the internal information of NiFi and sends it to external resources.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Process Group<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">It is a set of processors and their connections, which can be connected to other components through its ports.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Queue<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">As the name suggests, a queue holds the data that a processor has finished processing until the next processor picks it up.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">FlowFile Prioritizer<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">It lets us prioritize the data in a queue, which means that the data needed 
urgently is sent first, while the remaining data waits in the queue.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Flow Controller<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The Flow Controller acts as the brain of the operation. It keeps track of the flow of data, which covers initializing the flow, creating components in the flow and coordinating between the components. It is responsible for managing the threads and allocations that all the processes use. The Flow Controller has two major components: Processors and Extensions.<\/span><\/p>\n<h2 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">NiFi Architecture<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Consider a host operating system (your PC) with Java installed on top of it to provide a Java runtime environment (JVM). NiFi runs inside this JVM. Next comes a web server (reachable on localhost on a local PC), whose primary job is to host the HTTP-based command and control API.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Now let\u2019s add the core operational engine of this framework, the <\/span><b>flow controller<\/b><span style=\"font-weight: 400;\">. It acts as the brains of the operation. 
Processors and Extensions are its major components. The important point here is that Extensions operate and execute within the JVM (as explained above).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It is the Flow Controller that provides threads for Extensions to run on and manages the schedule of when Extensions receive resources to execute.<\/span><\/p>\n<p><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-components.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1606\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-components.png\" alt=\"NiFi components\" width=\"495\" height=\"357\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-components.png 495w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-components-300x216.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-components-202x146.png 202w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-components-50x36.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-components-104x75.png 104w\" sizes=\"(max-width:767px) 480px, 495px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Last but not least, let\u2019s add three repositories: the FlowFile Repository, the Content Repository and the Provenance Repository.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">FlowFile Repository\u00a0<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The FlowFile Repository is a pluggable repository that keeps track of the state of each active FlowFile.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Content Repository\u00a0<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The Content Repository is a pluggable repository that stores the actual content of a given FlowFile. 
By default, it uses a simple mechanism that stores content in the file system. More than one file system storage location can be specified to reduce contention on a single volume.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Provenance Repository<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The Provenance Repository is also a pluggable repository. It stores provenance data for a FlowFile in an indexed and searchable manner. Provenance data refers to the details of the process and methodology by which the FlowFile content was produced. It acts as a lineage for the pipeline.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This is the overall design and architecture of NiFi. Refer to the diagram above for a better understanding.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">NiFi can also run on a cluster, coordinated through Apache ZooKeeper.<\/span><\/p>\n<h2 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Apache NiFi Installation on your PC<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Now that we have covered some basic theoretical concepts of NiFi, why not start with some hands-on work? 
To do so, we need to have NiFi installed.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Follow along and complete the steps below, irrespective of your OS:<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Download<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Create a directory of your choice.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Open a browser and navigate to the URL\u00a0<\/span><a href=\"https:\/\/nifi.apache.org\/download.html\"><b>https:\/\/nifi.apache.org\/download.html<\/b><\/a><\/p>\n<p><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1607\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads.png\" alt=\"NiFi downloads\" width=\"1033\" height=\"609\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads.png 1033w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads-300x177.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads-768x453.png 768w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads-1024x604.png 1024w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads-248x146.png 248w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads-50x29.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-downloads-127x75.png 127w\" sizes=\"(max-width:767px) 480px, (max-width:1033px) 100vw, 1033px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">At the time of writing, 1.11.4 was the latest stable release. 
For the latest release, go to the \u201cBinaries\u201d section.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We are free to choose any of the available files; however, I would recommend \u201c.tar.gz\u201d for Mac\/Linux and \u201c.zip\u201d for Windows.<\/span><\/p>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Install<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">While the download continues, make sure you have Java installed on your PC and the JAVA_HOME environment variable pointing to your JDK.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Do not move to the next step if Java is not installed or JAVA_HOME is not set.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once the file from the Download section has been downloaded, extract or unzip it into the directory created earlier.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Open the extracted directory and we will see the files and directories below:<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-installed-files.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1608\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-installed-files.png\" alt=\"NiFi installed files\" width=\"592\" height=\"378\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-installed-files.png 592w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-installed-files-300x192.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-installed-files-229x146.png 229w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-installed-files-50x32.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-installed-files-117x75.png 117w\" sizes=\"(max-width:767px) 480px, 
592px\" \/><\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Open the bin directory. The structure below appears.<\/span><\/p>\n<p><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-bin-directory.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1609\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-bin-directory.png\" alt=\"NiFi bin directory\" width=\"583\" height=\"179\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-bin-directory.png 583w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-bin-directory-300x92.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-bin-directory-260x80.png 260w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-bin-directory-50x15.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-bin-directory-150x46.png 150w\" sizes=\"(max-width:767px) 480px, 583px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Here, we can see OS-specific executables, so the next steps depend on the operating system:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For Mac\/Linux, open a terminal and execute <span class=\"lang:default decode:true crayon-inline \">bin\/nifi.sh run<\/span> from the installation directory to run NiFi in the foreground, or <span class=\"lang:default decode:true crayon-inline \">bin\/nifi.sh start<\/span> to run it in the background.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To install NiFi as a service (Mac\/Linux only), execute <span class=\"lang:default decode:true crayon-inline \">bin\/nifi.sh install<\/span> from the installation directory. This installs the service with the default name nifi. 
For a custom service name, add another parameter to this command, for example <span class=\"lang:default decode:true crayon-inline \">bin\/nifi.sh install dataflow<\/span>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For Windows, open cmd and navigate to the bin directory, for example:<\/span><\/p>\n<p><span class=\"lang:default decode:true crayon-inline \">cd c:\\sw\\nifi-1.11.4-bin\\nifi-1.11.4\\bin<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Then type\u00a0<\/span><b>run-nifi.bat<\/b><span style=\"font-weight: 400;\"> and press Enter.<\/span><\/p>\n<p><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/run-nifi-bat.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1610\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/run-nifi-bat.png\" alt=\"run-nifi-bat\" width=\"578\" height=\"270\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/run-nifi-bat.png 578w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/run-nifi-bat-300x140.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/run-nifi-bat-260x121.png 260w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/run-nifi-bat-50x23.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/run-nifi-bat-150x70.png 150w\" sizes=\"(max-width:767px) 480px, 578px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Go to the logs directory, open <\/span><b>nifi-app.log<\/b><span style=\"font-weight: 400;\"> and scroll down to the end of the file.<\/span><\/p>\n<p><a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-logs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1611\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-logs.png\" alt=\"NiFi logs\" width=\"592\" height=\"367\" 
srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-logs.png 592w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-logs-300x186.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-logs-236x146.png 236w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-logs-50x31.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-logs-121x75.png 121w\" sizes=\"(max-width:767px) 480px, 592px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0In the log, look for an entry like the one below:<\/span><\/p>\n<pre class=\"lang:default decode:true \">\u00a0ServerConnector@1f59a598{HTTP\/1.1,[http\/1.1]}{0.0.0.0:8080}<\/pre>\n<h3 style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Verify<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">By default, NiFi is served on port 8080 of localhost. Open a browser and go to<\/span> <a href=\"http:\/\/localhost:8080\/nifi\/\"><span style=\"font-weight: 400;\">http:\/\/localhost:8080\/nifi\/<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The NiFi home page opens.<\/span><\/p>\n<p><b>\u00a0<a href=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-running-after-installation.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1612\" src=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-running-after-installation.png\" alt=\"NiFi running after installation\" width=\"587\" height=\"476\" srcset=\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-running-after-installation.png 587w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-running-after-installation-300x243.png 300w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-running-after-installation-180x146.png 180w, 
https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-running-after-installation-50x41.png 50w, https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2020\/06\/NiFi-running-after-installation-92x75.png 92w\" sizes=\"(max-width:767px) 480px, 587px\" \/><\/a><\/b><\/p>\n<p><b>\u00a0<\/b><\/p>\n<p>This page confirms that our NiFi is up and running.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>NiFi is an open source data flow framework. It is highly automated for flow of data between systems. 
It works as a data transporter between data<span class=\"excerpt-hellip\"> [\u2026]<\/span><\/p>\n","protected":false},"author":3,"featured_media":1612,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26,1],"tags":[27],"class_list":["post-1590","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-nifi","category-hadoop","tag-nifi"],"_links":{"self":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/1590","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/comments?post=1590"}],"version-history":[{"count":8,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/1590\/revisions"}],"predecessor-version":[{"id":1626,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/1590\/revisions\/1626"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/media\/1612"}],"wp:attachment":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/media?parent=1590"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/categories?post=1590"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/tags?post=1590"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}