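Reference documentation:
Advantages:
- Small footprint and easy to run: the current release is only 15 MB, a single binary that runs directly without a JDK.
- Low resource usage: for the same task (writing the SSO logs from Kafka into ES without any processing), Logstash needed about 1.06 GB of memory and about 260% CPU, while gohangout used around 400 MB of memory and about 130% CPU.
- YAML configuration files, which are clearly structured and easy to check.
Disadvantages:
- Relatively little documentation.
Current production configuration: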
```yaml
inputs:
    - Kafka:
        topic:
            test-1: 1
            test-2: 1
        codec: json
        consumer_settings:
            bootstrap.servers: "localhost:21005,localhost:21006"
            group.id: test-group
    - Kafka:
        topic:
            test-3: 1
        codec: json
        consumer_settings:
            bootstrap.servers: ""
            group.id: test-group-2

outputs:
    - Elasticsearch:
        hosts:
            - 'http://username:password@localhost:9200'
        index: 'test-index-%{+2006-01-02}'
        index_type: "log"
        bulk_actions: 10000
        bulk_size: 20
        flush_interval: 10
```
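As the configuration above shows, each topic is consumed by only a single thread (the `1` after each topic name), and writes to ES are not concurrent either.
With the startup parameter `--worker` set to 1, gohangout can already fully replace Logstash for this setup.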
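Local testing
Following the official documentation, I tried a configuration that uses conditional expressions and regular expressions to parse logs: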
```yaml
inputs:
    - Stdin:
        codec: plain

filters:
    - Add:
        overwrite: true
        fields:
            addName: ice
    - Grok:
        src: message
        match:
            - '^(?P<name>\w+) (?P<status>\d+)$'
            - '^(?P<logtime>\S+) (?P<name>\w+) (?P<status>\d+) (?P<url>\S+) (?P<domain>\S+)$'
            - '^(?P<logtime>\S+) (?P<status>\d+) (?P<loglevel>\w+)$'
        remove_fields: ['message']
    - Date:
        src: logtime
        formats:
            - 'RFC3339'
            - '2006-01-02T15:04:05'
            - '2006-01-02T15:04:05Z07:00'
            - '2006-01-02T15:04:05Z0700'
            - '2006-01-02'
            - 'UNIX'
            - 'UNIX_MS'
        remove_fields: ["logtime"]
    - Drop:
        if:
            - 'EQ(name,"test")'
            - 'Before(-24h) || After(24h)'
    - Filters:
        if:
            - '{{if eq .name "childe"}}y{{end}}'
        filters:
            - Add:
                fields:
                    a: 'xyZ'
            - Lowercase:
                fields: ['url', 'domain']

outputs:
    - Stdout:
        if:
            - 'EQ(name,"childe")'
    - Stdout:
        if:
            - 'EQ(name,"test2")'
    - Stdout:
        if:
            - 'EQ(name,"test3")'
```
The test data is as follows:
```
2020-01-10T17:53:00 childea 404 ABCDEFG-URL abcdEFG-Domain
2020-01-10T17:53:00 childe 404 ABCDEFG-URL abcdEFG-Domain
2020-01-10T17:53:00 test 404 ABCDEFG-URL abcdEFG-Domain
2020-01-10T17:53:00 test2 404 ABCDEFG-URL abcdEFG-Domain
2020-01-10T17:53:00 test3 405 ABCDEFG-URL abcdEFG-Domain
```