eBPF for ROS2 #1: 開始測量效能

Weekly Meeting 2019-09-10

在我的上一篇文中實作了最簡單的 publisher 和 subscriber,而當中的 callback 是使用 C++11 lambda。

現在就開始作一個簡單的效能分析,目標是量測從一個 node 發布一個字串到另一個 node 上之間的延遲 (latency),其中會需要找到兩個 callback 的 symbol,因為 C++11 lambda 會是一個 unnamed object,每一個 lambda 都有自己的類型,要找到對應的 lambda 函式執行的位址 (其實是 unnamed object 定義的 operator()) 會相當困難,便改成以 member function 實作。

Notes

Rewrite C++11 lambda callback function into class method.

subscriber:

listener.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

#include "rclcpp/rclcpp.hpp"

#include "std_msgs/msg/string.hpp"

// Create a Listener class that subclasses the generic rclcpp::Node base class.
// The main function below will instantiate the class as a ROS node.
class Listener : public rclcpp::Node
{
public:
Listener() : Node("listener") {
sub_ = this->create_subscription<std_msgs::msg::String>(
"chatter", std::bind(&Listener::callback, this, std::placeholders::_1));
}

private:
void callback(const std_msgs::msg::String::SharedPtr msg) {
RCLCPP_INFO(this->get_logger(), "I heard: [%s]", msg->data.c_str());
}

rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sub_;
};

int main(int argc, char *argv[])
{
rclcpp::init(argc, argv);
auto node = std::make_shared<Listener>();
auto node2 = rclcpp::Node::make_shared("talker");
rclcpp::spin(node);
rclcpp::shutdown();
return 0;
}

rclcpp::Node::create_subscription requires callback functions to be type of function object.
function object (or functor) is a class that defines operator()
callback() is non-static member function, so it’s required to pass this pointer.

std::bind(&Listener::callback, this, std::placeholders::_1)

publisher:

在 19 行可依照需求調整發送頻率,我在測量延遲時是用 50ms。

talker.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#include <chrono>
#include <cstdio>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

using namespace std::chrono_literals;

class Talker : public rclcpp::Node {
public:
Talker()
: Node("talker"), count_(0) {
pub_ = this->create_publisher<std_msgs::msg::String>("chatter");
timerPtr_ = this->create_wall_timer(
500ms, std::bind(&Talker::callback, this));
}

private:
void callback(void) {
char str[7];
this->timerPtr_->reset();
std::snprintf(str, 7, "%06d", this->count_++);
auto message = std_msgs::msg::String();
message.data = "Hello, world! " + std::string(str);
RCLCPP_INFO(this->get_logger(), "Publishing: '%s'", message.data.c_str());
this->pub_->publish(message);
}

int count_;
rclcpp::TimerBase::SharedPtr timerPtr_;
rclcpp::Publisher<std_msgs::msg::String>::SharedPtr pub_;
};

int main(int argc, char *argv[])
{
rclcpp::init(argc, argv);
auto node = std::make_shared<Talker>();
rclcpp::spin(node);
rclcpp::shutdown();
}

Measure latency on the same machine

Environment

Distro: Dashing Diademata
OS: 18.04.1 Ubuntu
CPU: Intel® Core™ i5-2520M CPU @ 2.50GHz (2C4T)

At ROS2 workspace. The executables are placed under install directory.

$ ls
build install log src

path of talker (publisher) and listener (subscriber):

ros_course_demo is the package name.

Executable path
talker install/ros_course_demo/lib/ros_course_demo/talker
listener install/ros_course_demo/lib/ros_course_demo/listener

Find out the symbol for callback()

目標是找到 Talker::callback() 在 ELF 檔中對應的 symbol

List all exported symbols in the executable.

$ objdump install/ros_course_demo/lib/ros_course_demo/talker -t
install/ros_course_demo/lib/ros_course_demo/talker: file format elf64-x86-64

SYMBOL TABLE:
...
0000000000015d40 w F .text 000000000000004d _ZN6TalkerD1Ev
0000000000011806 w F .text 000000000000015b _ZNSt23_Sp_counted_ptr_inplaceIN6rclcpp9PublisherIN8
std_msgs3msg7String_ISaIvEEES5_EESaIS7_ELN9__gnu_cxx12_Lock_policyE2EEC2IJRPNS0_15node_interfaces17NodeBaseInterfa
ceERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEER23rcl_publisher_options_tRKNS0_23PublisherEventCallbacks
ERSt10shared_ptrISaIS6_EEEEES8_DpOT_
...

C++ uses name mangling to handle classes, templates, namespaces, etc. Use c++filt to demangle symbols:

$ objdump install/ros_course_demo/lib/ros_course_demo/talker -t \
| c++filt | grep callback
...
0000000000008944 w F .text 000000000000031c Talker::callback()
...

Retrieve the symbol in ELF file for Talker::callback() using address 0000000000008944.

$ objdump install/ros_course_demo/lib/ros_course_demo/talker -t \
| grep 0000000000008944
0000000000008944 w F .text 000000000000031c _ZN6Talker8callbackEv

其實要找到 demangled symbol 只需要使用 $ nm -C 就可以了,感覺根本不需要這麼複雜。

Register probe function

使用剛剛找到的 symbol 並將其帶入 attach_uprobe() 便能完成註冊,一旦 Talker::callback() 被呼叫,註冊的 talker_probe 就會被執行。 更多 uprobe (在 userspace 追蹤程式) 的相關的資訊請看底下 References

b.attach_uprobe(name="./ros2_course/install/ros_course_demo/lib/ros_course_demo/talker",
sym="_ZN6Talker8callbackEv", fn_name="talker_probe")

Measure latency

以下為測量方法的時序圖。
首先使用 uprobe 註冊 talker 和 listener 探針函式,分別會在 callback 被呼叫時執行,並測量兩個 callback 之間的時間差。

使用 eBPF 測量兩個 callback 之間延遲的時序圖

寫一個 eBPF 的程式追蹤兩個 callback 的呼叫時間:

#!/usr/bin/env python
from __future__ import print_function
from bcc import BPF
from time import sleep

# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>

BPF_HISTOGRAM(dist);
BPF_ARRAY(start_time, uint64_t, 1);

int talker_probe(struct pt_regs *ctx) {
uint64_t curr = bpf_ktime_get_ns();
uint32_t key = 0;

start_time.update(&key, &curr);
// bpf_trace_printk("talker_probe curr=%lu\\n", curr);
return 0;
};

int listener_probe(struct pt_regs *ctx) {
uint64_t curr = bpf_ktime_get_ns();
uint64_t *prev, lat;
uint32_t key = 0;

prev = start_time.lookup(&key);
if (prev) {
lat = (curr - *prev) / 1000;
dist.increment(bpf_log2l(lat));
bpf_trace_printk("listener_probe lat=%luusecs\\n", lat);
}
return 0;
}
""")

b.attach_uprobe(name="./ros2_course/install/ros_course_demo/lib/ros_course_demo/talker",
sym="_ZN6Talker8callbackEv", fn_name="talker_probe")
b.attach_uprobe(name="./ros2_course/install/ros_course_demo/lib/ros_course_demo/listener",
sym="_ZN8Listener8callbackESt10shared_ptrIN8std_msgs3msg7String_ISaIvEEEE",
fn_name="listener_probe")

while 1:
try:
try:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
except ValueError:
continue
print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
# print("%-16s %-6d %s" % (task, pid, msg))
except KeyboardInterrupt:
break

print("")
b["dist"].print_log2_hist("usec")

以下為統計結果,每次均 publish 一個長度為 21 的字串。

統計結果,共約 10000 個樣本點

Future work

STM32MP157A-DK1

STM32MP157 dual Cortex®-A7 32 bits + Cortex®-M4 32 bits MPU

openSTLinux kernel 4.19

Build a PREEMPT_RT kernel for stm32mp1
Cross compile ROS2 for ARMv7

References

BPF In Depth: Communicating with Userspace
https://blogs.oracle.com/linux/notes-on-bpf-3
kprobes
https://www.kernel.org/doc/Documentation/kprobes.txt
uprobe-tracer
https://www.kernel.org/doc/Documentation/trace/uprobetracer.txt
Meet cute-between-ebpf-and-tracing
https://www.slideshare.net/vh21/meet-cutebetweenebpfandtracing
iovisor/bcc
https://github.com/iovisor/bcc

Draft: https://hackmd.io/@1IzBzEXXRsmj6-nLXZ9opw/rJC0fc_2V