Profile Performance with Simpleperf
Simpleperf is a command-line native CPU profiling tool that helps you analyze your app's performance on Vega devices. Use Simpleperf to identify CPU bottlenecks, excessive context switches, inefficient thread scheduling, and other performance issues that might not be visible through code review alone. It provides insights into CPU usage patterns across cores, function call frequencies and execution times, hardware performance counters, process scheduling behavior, and context switching patterns.
Simpleperf works in two modes:
- On-device — Collects real-time performance data by monitoring running apps and system activities.
- On-host — Processes and displays data previously collected on the device.
Use Simpleperf when your app experiences unexpected frame drops, battery depletion, slow responsiveness, or when you suspect CPU usage is higher than expected.
When to use Simpleperf
| Scenario | Recommended tool |
|---|---|
| Real-time CPU and memory monitoring with a GUI | Activity Monitor |
| Hardware counter data, call graphs, or profiling without Vega Studio | Simpleperf |
| JavaScript thread performance and flamegraphs | Chrome DevTools |
| Quick visual check of CPU spikes during development | Activity Monitor |
| Profiling app launch or specific native code paths | Simpleperf |
Prerequisites
Before you begin, make sure you have:
- Vega SDK installed
- A Fire TV Stick device connected through VDA
- Your app built and running on the device
Set up Simpleperf
Simpleperf isn't included in the device image by default. You need to sideload the binary from the Vega SDK onto your device.
Step 1: Verify your build variant
Confirm that your device uses a user-external or user build variant. Either build works with Simpleperf, though user-external is more common:
vda shell "cat /etc/os-release | grep BUILD_VARIANT"
Expected output:
BUILD_VARIANT=user-external
or:
BUILD_VARIANT=user
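If you script your setup, the check above can stop early on an unsupported build. The following is a minimal sketch: the function name check_build_variant is hypothetical, and it assumes the BUILD_VARIANT line has the format shown in the expected output.

```shell
# Hypothetical helper: reads os-release text on stdin and succeeds only
# when BUILD_VARIANT is user-external or user.
check_build_variant() {
  # Strip carriage returns and quotes, then keep the first BUILD_VARIANT value.
  variant=$(tr -d '\r"' | sed -n 's/^BUILD_VARIANT=//p' | head -n 1)
  case "$variant" in
    user-external|user) echo "supported: $variant" ;;
    *) echo "unsupported: ${variant:-unknown}" >&2; return 1 ;;
  esac
}

# Usage on the host (requires a connected device):
#   vda shell "cat /etc/os-release" | check_build_variant
```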
Step 2: Enable developer mode
Enable developer mode on the device:
vda shell vsm developer-mode enable
For more details on developer mode, see Enable Developer Mode.
Step 3: Connect to your device and start your app
Open a shell connection to your device:
vda shell
Start your app using vmsgr. For example:
vmsgr send "pkg://com.amazondeveloper.mytestapp.main"
Step 4: Push the Simpleperf binary to the device
Find the Simpleperf binary in the SDK at:
vega/workspace/env/KeplerCLISimpleperf-1.0/runtime/simpleperf-target/simpleperf
Push it to your app's scratch folder:
vda push vega/workspace/env/KeplerCLISimpleperf-1.0/runtime/simpleperf-target/simpleperf /tmp/scratch/com.amazondeveloper.mytestapp
Example output:
developer@bcd07467961e ~ % vda push vega/workspace/env/KeplerCLISimpleperf-1.0/runtime/simpleperf-target/simpleperf /tmp/scratch/com.amazondeveloper.mytestapp
simpleperf: 1 file pushed, 0 skipped. 0.1 MB/s (130 bytes in 0.001s)
The binary is now available in the component shell's /scratch folder.
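The push target and the later pull source share the same device-side path, so it can help to derive both from the app ID in one place. This is a small sketch: scratch_path is a hypothetical helper, and the /tmp/scratch/<app-id> layout is taken from the commands in this guide.

```shell
# Hypothetical helper: prints the device-side scratch path for an app ID,
# following the /tmp/scratch/<app-id> layout used in this guide.
scratch_path() {
  printf '/tmp/scratch/%s\n' "$1"
}

APP_ID="com.amazondeveloper.mytestapp"
echo "push target: $(scratch_path "$APP_ID")"
echo "pull source: $(scratch_path "$APP_ID")/perf.data"
```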
Step 5: Access the component shell
Open a new terminal on your host machine. You need a separate terminal because your first terminal is running the vda shell session from Step 3.
Connect to the component shell:
vda shell -t component-id com.amazondeveloper.mytestapp.main
Navigate to the scratch directory where you pushed the binary:
cd /scratch
You can now run Simpleperf commands from this directory.
Find your app's process ID
Before profiling, find your app's process ID (PID):
ps -A
Example output:
PID TTY TIME CMD
2 ? 00:00:22 mytestapp
66 pts/1 00:00:00 sh
76 pts/1 00:00:00 ps
You can reference your app by PID, app ID, or pidof:
./simpleperf stat -p 2 # By PID
./simpleperf stat -p com.amazondeveloper.mytestapp # By app ID
./simpleperf stat -p $(pidof com.amazondeveloper.mytestapp) # By pidof
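If pidof isn't available in your shell, you can extract the PID from the ps output instead. The sketch below assumes the four-column layout shown in the example output; pid_for_cmd is a hypothetical name.

```shell
# Hypothetical helper: prints the PID of the first process whose CMD column
# exactly matches the given name, reading `ps -A` output on stdin.
pid_for_cmd() {
  awk -v name="$1" 'NR > 1 && $NF == name { print $1; exit }'
}

# Usage inside the component shell:
#   ./simpleperf stat -p "$(ps -A | pid_for_cmd mytestapp)" --duration 10
```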
List available events
Use simpleperf list to see which events your device supports:
# List all available events
simpleperf list
# List hardware events only
simpleperf list hw
# List software events only
simpleperf list sw
Additional filtering options include:
- cache: Hardware cache events
- raw: Raw CPU PMU events
- tracepoint: Tracepoint events
- cs-etm: CoreSight ETM instruction tracing events
- pmu: System-specific PMU events
To display supported features on your device:
simpleperf list --show-features
Example output for hardware events:
List of hardware events:
cpu-cycles (Hardware event)
instructions (Hardware event)
cache-references (Hardware event)
cache-misses (Hardware event)
branch-instructions (Hardware event)
branch-misses (Hardware event)
Example output for software events:
List of software events:
cpu-clock (Software event)
task-clock (Software event)
page-faults (Software event)
context-switches (Software event)
cpu-migrations (Software event)
minor-faults (Software event)
Use any listed event with the -e option when running simpleperf stat and simpleperf record commands.
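Before passing an event to -e, you might want to confirm that the device lists it. The following sketch checks for an event name in the list output; event_supported is a hypothetical helper, and it assumes each event name is the first word on its line, as in the example output above.

```shell
# Hypothetical helper: succeeds if the named event appears as the first
# field of a line in `simpleperf list` output read from stdin.
event_supported() {
  awk -v ev="$1" '$1 == ev { found = 1 } END { exit !found }'
}

# Usage inside the component shell:
#   ./simpleperf list | event_supported cache-misses && echo "cache-misses is available"
```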
Collect performance statistics with stat
Use simpleperf stat to collect aggregate performance statistics for CPU events. This mode provides a concise summary of hardware and software events without the overhead of detailed sampling data. It's useful for quick performance assessment and A/B testing of optimizations.
# Basic stat collection for a specific process
simpleperf stat -p <pid>
# Stat with specific events
simpleperf stat -e cache-misses,branch-misses -p <pid>
# Stat with custom print interval (every 1000 ms)
simpleperf stat --interval 1000 -p <pid>
# Stat for kernel space only (restricted for unprivileged users; see Known limitations)
simpleperf stat -e cpu-cycles:k -p <pid>
# Stat with event grouping (events in one --group are scheduled together)
simpleperf stat --group cpu-cycles,instructions -p <pid>
# Stat with a fixed duration (10 seconds)
simpleperf stat -p <pid> --duration 10
Example output:
Performance counter statistics:
1,320,496,145 cpu-cycles # 0.131736 GHz (100%)
510,426,028 instructions # 2.587047 cycles per instruction (100%)
4,692,338 branch-misses # 468.118 K/sec (100%)
886.008130(ms) task-clock # 0.088390 cpus used (100%)
753 context-switches # 75.121 /sec (100%)
870 page-faults # 86.793 /sec (100%)
Total test time: 10.023829 seconds.
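The rate column in this output can be checked automatically. The sketch below reads the context-switches line and compares its per-second rate against a limit. The helper name check_ctx_switch_rate is hypothetical, the default limit of 500 per second is the rule of thumb used in the walkthrough later in this guide, and the parsing assumes the column layout shown above (including K/sec and M/sec suffixes).

```shell
# Hypothetical helper: reads stat output on stdin and reports whether the
# context-switch rate exceeds a limit (default: 500 per second).
check_ctx_switch_rate() {
  awk -v limit="${1:-500}" '
    $2 == "context-switches" {
      # The rate is the field right before the "/sec" unit.
      for (i = 4; i <= NF; i++) {
        if ($i == "/sec") rate = $(i - 1)
        else if ($i == "K/sec") rate = $(i - 1) * 1000
        else if ($i == "M/sec") rate = $(i - 1) * 1000000
      }
    }
    END {
      if (rate == "") { print "no context-switch rate found"; exit 1 }
      printf "%s/sec: %s\n", rate, (rate + 0 > limit + 0 ? "high" : "ok")
    }'
}

# Usage, assuming the statistics are written to standard output:
#   ./simpleperf stat -e context-switches -p <pid> --duration 10 | check_ctx_switch_rate
```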
Note: System-wide collection (simpleperf stat -a) is only available with root access.
Interpret stat results
Use these guidelines to interpret the output:
| Metric | What it tells you | Action if high |
|---|---|---|
| cpu-cycles | Total CPU work performed | Profile with record to find hot functions |
| instructions / cycles per instruction | CPU efficiency | Low IPC (>3 cycles/instruction) suggests memory stalls or branch mispredictions |
| branch-misses | Branch prediction failures | Simplify conditional logic or restructure data for predictable access patterns |
| context-switches | Thread preemption frequency | Reduce thread count or consolidate work onto fewer threads |
| page-faults | Memory pages loaded from disk | Pre-allocate memory or reduce working set size |
| cache-misses | CPU cache inefficiency | Improve data locality, reduce object allocations, or restructure data layouts |
For all available options, run simpleperf help stat.
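To apply the cycles-per-instruction guideline without doing the division by hand, you can compute it from the two counters. This is a minimal sketch: cpi_from_stat is a hypothetical helper, the threshold of 3 comes from the table above, and the parsing assumes counts appear in the first column with comma separators.

```shell
# Hypothetical helper: reads `simpleperf stat` output on stdin and prints
# cycles per instruction (CPI), flagging values above 3.
cpi_from_stat() {
  awk '
    { gsub(/,/, "", $1) }                 # strip thousands separators
    $2 == "cpu-cycles"   { cycles = $1 }
    $2 == "instructions" { instr = $1 }
    END {
      if (instr == 0) { print "no instruction count found"; exit 1 }
      cpi = cycles / instr
      printf "CPI %.2f%s\n", cpi, (cpi > 3 ? " (high)" : "")
    }'
}

# Usage, assuming the statistics are written to standard output:
#   ./simpleperf stat -e cpu-cycles,instructions -p <pid> --duration 10 | cpi_from_stat
```

For the example output shown earlier (1,320,496,145 cycles and 510,426,028 instructions), this prints CPI 2.59, which is below the threshold.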
Record detailed performance data
Use simpleperf record to collect in-depth CPU performance metrics by sampling CPU events at predefined intervals. Simpleperf stores all recorded data in a perf.data file for subsequent analysis.
Events can be scoped to monitor user space only (:u) or kernel space only (:k). For most Vega app profiling, use user space monitoring.
# Basic recording
simpleperf record -p <pid>
# Record with call graph (recommended for detailed analysis)
simpleperf record -e cpu-cycles:u --call-graph dwarf,2048 -p <pid> -- sleep 10
# Record specific events
simpleperf record -e cache-misses,branch-misses -p <pid>
# Record with increased sample frequency (1000 samples/sec)
simpleperf record -f 1000 -p <pid>
# Record a specific thread (omit -p, which would add the whole process)
simpleperf record -t <thread_id>
# Record kernel space events (restricted for unprivileged users; see Known limitations)
simpleperf record -e cpu-cycles:k -p <pid>
# Record to a custom output file
simpleperf record -o custom_output.data -p <pid>
Note: System-wide recording (simpleperf record -a) is only available with root access.
For all available options, run simpleperf help record.
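When you record several scenarios, a distinct output file per run keeps one recording from overwriting another. The sketch below only prints a command line built from flags already shown in this section, so you can review it before running it. The helper name record_cmd and the perf_<label>.data naming scheme are hypothetical.

```shell
# Hypothetical helper: prints a record command for the given PID, duration
# in seconds, and label. It does not run Simpleperf itself.
record_cmd() {
  pid=$1; seconds=$2; label=$3
  printf './simpleperf record -e cpu-cycles:u --call-graph dwarf,2048 -o perf_%s.data -p %s -- sleep %s\n' \
    "$label" "$pid" "$seconds"
}

record_cmd 2 10 scroll
# prints: ./simpleperf record -e cpu-cycles:u --call-graph dwarf,2048 -o perf_scroll.data -p 2 -- sleep 10
```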
Transfer performance data to your host machine
The component shell stores performance data files in the /scratch directory. To retrieve them, open a terminal on your host machine (outside the component shell) and run:
vda pull /tmp/scratch/com.amazondeveloper.mytestapp/perf.data
Note: The /scratch directory may be cleared during app restarts. If you want performance data to persist across app launches and reboots, copy the data file to /data before pulling it.
Analyze performance data with report
Use simpleperf report on your host machine to analyze the collected perf.data file. Use the host version of Simpleperf provided with the SDK at:
vega/sdk/<version>/bin/tools/simpleperf
Replace <version> with your installed SDK version (for example, 0.23.6323).
# Generate a basic report (looks for perf.data in current directory)
simpleperf report
# Report with call graph (data must be recorded with -g or --call-graph)
simpleperf report -g
# Report from a specific file
simpleperf report -i perf.data
# Custom sort order
simpleperf report --sort comm,pid,tid,dso,symbol
# Filter by process
simpleperf report --pids <pid1>,<pid2>
# Filter by thread
simpleperf report --tids <tid1>,<tid2>
# Filter by binary
simpleperf report --dsos <path_to_binary>
Example output:
Cmdline: /usr/bin/simpleperf record -e cpu-clock -p 1171 --duration 10
Arch: x86_64
Event: cpu-clock (type 1, config 0)
Samples: 1
Event count: 250000
Overhead Command Pid Tid Shared Object Symbol
100.00% ..pdate-daem:36 1171 2937 /lib/libc.so.6 __errno_location
For all available options, run simpleperf help report.
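Reports for real apps can run to hundreds of rows, so it can help to keep only the entries above a given overhead. This is a sketch under the assumption that each data row starts with a percentage, as in the example output; hot_symbols is a hypothetical helper.

```shell
# Hypothetical helper: reads report output on stdin and prints only the
# rows whose Overhead value is at or above the given percentage.
hot_symbols() {
  awk -v min="$1" '
    $1 ~ /^[0-9.]+%$/ {
      pct = $1
      sub(/%/, "", pct)
      if (pct + 0 >= min + 0) print
    }'
}

# Usage on the host:
#   simpleperf report -i perf.data | hot_symbols 5
```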
Other commands
Simpleperf includes additional commands for specialized profiling needs:
| Command | Description |
|---|---|
| debug-unwind | Tests and debugs offline stack unwinding functionality |
| dump | Extracts and displays raw data from a perf.data file |
| inject | Modifies or injects data into existing perf.data files |
| kmem | Analyzes kernel memory allocation patterns (requires root) |
| merge | Combines multiple perf.data files into one |
| monitor | Monitors events in real-time and prints textual representations to stdout |
| report-sample | Displays raw sample information from a perf.data file |
| trace-sched | Traces system-wide process runtime events (requires root) |
To see all sub-commands and their options:
simpleperf --help
Known limitations
Simpleperf relies on the perf_event_paranoid kernel parameter to determine its operational permissions. By default, Vega OS sets this value to 2, which restricts certain profiling capabilities for security reasons.
For devices with user-external build variants and developer mode enabled, the system automatically adjusts the value to 1, which allows more comprehensive profiling. However, unprivileged users are still restricted from:
- Profiling kernel space events
- Accessing hardware PMU events
- Profiling other users' processes
- Collecting raw tracepoint events
- Accessing kernel call graphs
- Collecting kernel-mode stack traces
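To see which of the two levels applies to your device, you can read the parameter and map it to the descriptions above. This is a minimal sketch: paranoid_level is a hypothetical helper, and the usage line assumes the standard Linux procfs path is readable from your shell on Vega OS.

```shell
# Hypothetical helper: classifies a perf_event_paranoid value using the
# two levels described in this section.
paranoid_level() {
  case "$1" in
    2) echo "default (restricted profiling)" ;;
    1) echo "developer mode (more comprehensive profiling)" ;;
    *) echo "unexpected value: $1" ;;
  esac
}

# Usage on the device (path is the standard Linux location, an assumption here):
#   paranoid_level "$(cat /proc/sys/kernel/perf_event_paranoid)"
```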
Best practices
Data collection
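- Always specify a duration (for example, --duration 10) or use controlled termination (for example, -- sleep 10) so a recording doesn't run indefinitely.
- Use an appropriate sampling frequency (-f). Higher frequencies capture more detail but add CPU overhead.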
- Monitor system load during profiling.
Event selection
- Start with basic events (cpu-cycles, instructions).
- Avoid too many simultaneous hardware events.
- Consider hardware limitations of your target device.
Report analysis
- Compare multiple runs for consistent results.
- Use appropriate filters to narrow down results.
- Always check for lost samples.
Resource management
- Clean up old perf.data files to free device storage.
- Control CPU overhead by limiting sampling frequency.
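One way to handle cleanup is to keep only the most recent recordings. The sketch below deletes all but the newest N files ending in .data in a directory. The helper name prune_perf_data is hypothetical, and it assumes that ls, tail, and rm are available in your shell and that file names contain no newlines.

```shell
# Hypothetical helper: deletes all but the newest N *.data files in a directory.
prune_perf_data() {
  dir=$1; keep=$2
  # ls -t lists newest first; tail skips the first $keep entries.
  ls -t "$dir"/*.data 2>/dev/null | tail -n +"$((keep + 1))" | while IFS= read -r f; do
    rm -f -- "$f"
  done
}

# Usage inside the component shell, keeping the three newest recordings:
#   prune_perf_data /scratch 3
```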
Example: Diagnose frame drops during scrolling
This walkthrough shows how to use Simpleperf to investigate an app that drops frames when scrolling through a content list.
1. Confirm the problem with stat:
./simpleperf stat -e cpu-cycles,cache-misses,context-switches -p $(pidof com.amazondeveloper.mytestapp) --duration 5
If you see high context-switches (>500/sec) or high cache-misses relative to cpu-cycles, there's likely a performance issue worth investigating.
2. Record a call graph during the problematic interaction:
./simpleperf record -e cpu-cycles:u --call-graph dwarf,2048 -p $(pidof com.amazondeveloper.mytestapp) -- sleep 10
Scroll through your content list during the 10-second recording window.
3. Pull and analyze the data on your host machine:
vda pull /tmp/scratch/com.amazondeveloper.mytestapp/perf.data
simpleperf report -g -i perf.data --sort comm,pid,tid,dso,symbol
4. Identify the bottleneck:
Look for functions with high Overhead percentages. For example:
35.2% mytestapp 2 2 libhermes.so hermes::vm::interpretFunction
22.1% mytestapp 2 2 libreact.so facebook::react::ShadowTree::commit
This tells you that 35% of CPU time is spent in Hermes JavaScript execution and 22% in React's shadow tree commits. Your scroll handler is likely doing too much JS work per frame.
5. Fix and verify:
After optimizing (for example, memoizing expensive computations in your scroll handler), re-run the same stat command and confirm the metrics improve.
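To make the before-and-after comparison concrete, you can save the stat output from each run to a file and compute the change for one event. This is a sketch, assuming the statistics are written to standard output so they can be redirected to a file; stat_delta and the file names are hypothetical.

```shell
# Hypothetical helper: prints the percent change of one event count
# between two saved stat outputs (before and after an optimization).
stat_delta() {
  event=$1; before=$2; after=$3
  awk -v ev="$event" '
    { gsub(/,/, "", $1) }                 # strip thousands separators
    $2 == ev { if (FILENAME == ARGV[1]) b = $1; else a = $1 }
    END {
      if (b == 0) { print "event not found in baseline"; exit 1 }
      printf "%s: %+.1f%%\n", ev, (a - b) * 100 / b
    }' "$before" "$after"
}

# Usage on the host, after saving each run's output:
#   stat_delta cache-misses before.txt after.txt
```

A negative value means the count dropped after the change. Compare several runs before drawing conclusions, because counts vary between recordings.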
Related topics
- Monitor CPU Usage
- Investigate JavaScript Thread Performance
- App Performance Best Practices
- Inspect Traces in Vega Apps
Last updated: Jun 18, 2026

