[Middleware Learning] Fluentd Basic Learning Tutorial

Article Directory

  • Introduction
  • Installation
    • Recommended installation method
  • Pitfalls with other installation methods
  • Test verification
  • Learning process
    • 1. A simple collection example
    • 2. Fluentd keyword configuration
  • Log access in practice
  • Summary

Introduction

Fluentd is a general-purpose data collection framework, usually used as a unified log processing platform. This article records the process of learning and using Fluentd, including some pitfalls encountered along the way, and shares the experience.

Installation

Installing a Fluentd environment is a basic operation. The most valuable reference is of course the official Fluentd installation guide: https://docs.fluentd.org/installation
The author used a Euler 2.9 environment and ran into some pitfalls during installation.

Recommended installation method:

It is recommended to install td-agent, a stable distribution of Fluentd, from the rpm package.
Reason for the recommendation: it is simple and convenient, and there is no need to install Ruby and pull Fluentd in via gem.

Installation Instructions:
Mainly refer to: https://docs.fluentd.org/installation/install-by-rpm

Key points:

  • Get the installation script:
# td-agent 4
$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh 
  • We are using the Euler 2.9 operating system, and the script reports an error when it runs; edit the script so that the value of $releasever written to /etc/yum.repos.d/td.repo is 8, then rerun the script.
  • Start the service (a status check is shown after this list):
sudo systemctl restart td-agent.service 
  • Default configuration file location:
    /etc/td-agent/td-agent.conf
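
A quick check, not part of the original steps, to confirm the agent actually came up:

sudo systemctl status td-agent.service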

Pitfalls with other installation methods

Summary:

  • The difficulty of installation depends largely on the operating system. On some operating systems a few commands are enough to install successfully.
  • The installation method using the Ruby gem looks simple and mainly requires two commands:
yum install ruby
gem install fluentd --no-doc 

However, these two commands did not work well on the Euler operating system. The following error came up, and attempts to resolve it failed.

mkmf.rb can't find header files for ruby at /usr/share/include/ruby.h 
  • With the Docker installation method, the main problem was an insufficient-permissions error, possibly caused by incorrect configuration file settings; attempts to resolve it also failed.
docker pull fluent/fluentd:v1.7-1

docker run -p 8888:8888 --rm -v $(pwd)/etc:/fluentd/etc -v $(pwd)/log:/fluentd/log fluent/fluentd:v1.7-1 -c /fluentd/etc/fluentd_basic_setup.conf -v

Test verification

td-agent is configured to listen on port 8888 by default, which can be used for testing.

curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
tail -n 1 /var/log/td-agent/td-agent.log 

Effect:

2022-08-02 20:30:50.129095885 +0800 debug.test: {"json":"message"} 
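
This works because the default /etc/td-agent/td-agent.conf already ships with an HTTP input on port 8888 and a match for debug.* events, roughly along these lines (a paraphrased excerpt, not copied verbatim):

# match tag=debug.** and dump to the td-agent log
<match debug.**>
  @type stdout
</match>

# HTTP input: POST http://localhost:8888/<tag>?json=<json>
<source>
  @type http
  port 8888
</source>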

Learning process

Reference documentation:
http://t.zoukankan.com/wzs5800-p-13528430.html

Demand drives learning. Here is the need encountered in practice:

Purpose: Automatically connect logs to the log platform.

First, let’s get familiar with the use of this middleware through a few simple examples.

1. A simple collection example:

<source>
  @type tail
  path /home/fluentd/test.log
  pos_file /var/log/td-agent/test.pos
  tag fluent.test
  <parse>
    @type none
  </parse>
</source>
<match **>
    @type stdout
</match> 
  • Tail the log file /home/fluentd/test.log and output its contents to the console (a quick test is shown after this list).
  • The input plugin in_tail is specified in <source>, and the output plugin out_stdout is specified in <match>.
  • To identify the log format, the in_tail plugin needs a parser plugin; setting the parser @type to none tells td-agent that each log line is a single field of plain text.
  • pos_file records how far the file has already been read so that Fluentd can resume after a restart; it should always be set.
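
A quick sanity check, assuming td-agent is running with the configuration above: append a line to the tailed file and look for it in the td-agent log (the stdout output of td-agent ends up there).

echo "hello fluentd" >> /home/fluentd/test.log
tail -n 1 /var/log/td-agent/td-agent.log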

2. Fluentd keyword configuration

source: defines the input source of the data

# Receive events from 24224/tcp
# This is used by log forwarding and the fluent-cat command
<source>
  @type forward
  port 24224
</source>

# http://<ip>:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source> 
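
For example, an event can be pushed into the forward input with fluent-cat, the small CLI that ships with td-agent (the path below is where the rpm install usually puts it; adjust if yours differs):

echo '{"json":"message"}' | /opt/td-agent/bin/fluent-cat debug.test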

Data carried by each event:

tag: myapp.access # determines how the event is routed
time: (current time) # the event timestamp
record: {"event":"data"} # the record, in JSON format

match: selects events by tag and configures where the log output goes
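
Extra data can also be injected into records on the way through. A minimal sketch, not part of the original setup, using the built-in record_transformer filter to stamp every myapp.access record with the hostname:

<filter myapp.access>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>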

In the following example, logs are ingested by sending an HTTP request and written to files:

# http://<ip>:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>

# Match events tagged with "myapp.access" and
# store them to /var/log/fluent/access.%Y-%m-%d
# Of course, you can control how you partition your data
# with the time_slice_format option.
<match myapp.access>
  @type file
  path /var/log/fluent/access
</match> 

The following example forwards nginx logs to Kafka:

<source>
  @type tcp
  port 1517
  bind 0.0.0.0
  tag nginx
  <parse>
    @type syslog
  </parse>
</source>

<match nginx>
  @type kafka2
  brokers 1.2.3.4:9092
  use_event_time false
  <format>
    @type json
  </format>
  topic_key nginx_log
  default_topic nginx_log
</match> 

There are a few directives whose function I did not particularly understand at first, so I add some notes on them here.

@type    : specifies which plugin to use (input, output, filter, parser, etc.).
@label   : my personal understanding is that it simplifies tag routing. A <source> can attach a label to its events, and that label is associated with the corresponding <filter> and <match> sections, which simplifies the routing logic.
<parse></parse>   : can be used inside <source>, <match>, and <filter>. It is a parser plugin and can parse data in formats such as csv, nginx, and json.
<format></format> : used inside <match> and <filter>. It formats the output as csv, json, etc.
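
A minimal routing sketch with @label (not from the original setup; the label name @STAGING is made up): the source attaches a label to its events, and only the sections inside the matching <label> block handle them.

<source>
  @type http
  port 9880
  @label @STAGING
</source>

<label @STAGING>
  <match **>
    @type stdout
  </match>
</label>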

Log access in practice

  • How does Fluentd transfer logs to Kafka?
    Prerequisite: you need a Kafka environment (install and run one yourself).

Messages can be sent into Kafka by a producer, for example with the console producer.
Send a message:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Here, Fluentd can be regarded as a producer, so how should it be configured?

Scenario:
Logs are generated in the environment and appended to a specific log file, test.log. The next step is to connect Fluentd to Kafka. The configuration file is as follows:

<source>
  @type tail
  path /opt/test/test.log
  pos_file /var/log/td-agent/test.pos
  tag nuclei
  <parse>
    @type regexp
    expression /^\[(?<logtime>[^\]]*)\] \[(?<vul_name>[^\]]*)\] \[(?<protocal>[^\]]*)\] \[(?<level>[^\]]*)\] (?<url>[^ ]*)(?<detail>.*)$/
  </parse>
</source>

<match nuclei>
  @type kafka2
  brokers  ip:port
  use_event_time false
  <format>
    @type json
  </format>
  topic_key nuclei
  default_topic nuclei
</match> 
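
To confirm that events actually reach Kafka, a console consumer can be attached to the topic (assuming the Kafka CLI tools are on hand; localhost:9092 stands in for the real broker address):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic nuclei --from-beginning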

Debugging summary:

  • match is effectively one-to-one: an event is consumed by the first <match> whose pattern it hits, so additional <match> sections for the same tag do not take effect.
  • With the tail input, messages do not arrive in Kafka in real time; there can be a slight delay.
  • The permissions on the log path also matter; Fluentd must be able to read the file.
  • Not every message ends up being logged; it mainly depends on where the data is routed.

For logs in various formats, matching with regular expressions is recommended.

Regular debugging can be performed at the following URL:
https://rubular.com/r/xfQHocREGj
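
For illustration only, here is a made-up line in the shape the expression above expects, together with the fields it would capture (values are invented):

[2022-08-02 20:30:50] [example-vul-name] [http] [high] http://example.com/login some extra detail

logtime  = 2022-08-02 20:30:50
vul_name = example-vul-name
protocal = http
level    = high
url      = http://example.com/login
detail   = " some extra detail" (note the leading space)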

Summary

In practice, we focus on the tail method to collect logs. Fluentd also supports many other input methods.