Profile Performance with Simpleperf

Simpleperf is a command-line native CPU profiling tool that helps you analyze your app's performance on Vega devices. Use Simpleperf to identify CPU bottlenecks, excessive context switches, inefficient thread scheduling, and other performance issues that might not be visible through code review alone. It provides insights into CPU usage patterns across cores, function call frequencies and execution times, hardware performance counters, process scheduling behavior, and context switching patterns.

Simpleperf works in two modes:

  • On-device — Collects real-time performance data by monitoring running apps and system activities.
  • On-host — Processes and displays data previously collected on the device.

Use Simpleperf when your app drops frames unexpectedly, drains the battery, responds slowly, or appears to use more CPU than expected.

When to use Simpleperf

| Scenario | Recommended tool |
| --- | --- |
| Real-time CPU and memory monitoring with a GUI | Activity Monitor |
| Hardware counter data, call graphs, or profiling without Vega Studio | Simpleperf |
| JavaScript thread performance and flamegraphs | Chrome DevTools |
| Quick visual check of CPU spikes during development | Activity Monitor |
| Profiling app launch or specific native code paths | Simpleperf |

Prerequisites

Before you begin, make sure you have:

  • Vega SDK installed
  • A Fire TV Stick device connected through VDA
  • Your app built and running on the device

Set up Simpleperf

Simpleperf isn't included in the device image by default. You need to sideload the binary from the Vega SDK onto your device.

Step 1: Verify your build variant

Confirm that your device uses a user-external or user build variant. Either build works with Simpleperf, though user-external is more common:

vda shell "cat /etc/os-release | grep BUILD_VARIANT"

Expected output:

BUILD_VARIANT=user-external

or:

BUILD_VARIANT=user
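
If you check several devices, the comparison can be scripted. The following is a minimal sketch rather than anything shipped with the Vega SDK: check_build_variant is a hypothetical helper name, and it assumes the output has the BUILD_VARIANT=<value> form shown above.

```shell
# Hypothetical helper (not an SDK command): reads os-release-style text on
# stdin and succeeds only for the two build variants named in this step.
check_build_variant() {
  variant=$(awk -F= '$1 == "BUILD_VARIANT" { gsub(/["\r]/, "", $2); print $2; exit }')
  case "$variant" in
    user|user-external) echo "ok: $variant" ;;
    *) echo "unexpected build variant: ${variant:-not found}" >&2; return 1 ;;
  esac
}
```

One way to use it is to pipe the command from this step into the function on your host machine: vda shell "cat /etc/os-release | grep BUILD_VARIANT" | check_build_variant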

Step 2: Enable developer mode

Enable developer mode on the device:

vda shell vsm developer-mode enable

For more details on developer mode, see Enable Developer Mode.

Step 3: Connect to your device and start your app

Open a shell connection to your device:

vda shell

Start your app using vmsgr. For example:

vmsgr send "pkg://com.amazondeveloper.mytestapp.main"

Step 4: Push the Simpleperf binary to the device

Find the Simpleperf binary in the SDK at:

vega/workspace/env/KeplerCLISimpleperf-1.0/runtime/simpleperf-target/simpleperf

From a terminal on your host machine (not the vda shell session from Step 3), push it to your app's scratch folder:

vda push vega/workspace/env/KeplerCLISimpleperf-1.0/runtime/simpleperf-target/simpleperf /tmp/scratch/com.amazondeveloper.mytestapp

Example output:

developer@bcd07467961e ~ % vda push vega/workspace/env/KeplerCLISimpleperf-1.0/runtime/simpleperf-target/simpleperf /tmp/scratch/com.amazondeveloper.mytestapp
simpleperf: 1 file pushed, 0 skipped. 0.1 MB/s (130 bytes in 0.001s)

The binary is now in /tmp/scratch/com.amazondeveloper.mytestapp on the device. The component shell exposes this folder as /scratch.

Step 5: Access the component shell

Open a new terminal on your host machine. You need a separate terminal because your first terminal is running the vda shell session from Step 3.

Connect to the component shell:

vda shell -t component-id com.amazondeveloper.mytestapp.main

Navigate to the scratch directory where you pushed the binary:

cd /scratch

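You can now run Simpleperf from this directory by invoking the binary as ./simpleperf. On-device examples later on this page that show a bare simpleperf command assume this directory; add the ./ prefix if the binary isn't on your PATH.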

Find your app's process ID

Before profiling, find your app's process ID (PID):

ps -A

Example output:

 PID TTY TIME CMD
 2 ? 00:00:22 mytestapp
 66 pts/1 00:00:00 sh
 76 pts/1 00:00:00 ps

You can reference your app by PID, app ID, or pidof:

./simpleperf stat -p 2                                         # By PID
./simpleperf stat -p com.amazondeveloper.mytestapp             # By app ID
./simpleperf stat -p $(pidof com.amazondeveloper.mytestapp)    # By pidof
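
If you save the ps -A output, the PID lookup can also be scripted. This is a minimal sketch under two assumptions: pid_for_cmd is a hypothetical helper name, and the output uses the four-column layout shown above with the command name in the last column. Whether awk is available inside the component shell is not confirmed here, so you may need to run it on your host machine.

```shell
# Hypothetical helper: reads `ps -A` output on stdin and prints the PID of the
# first row whose CMD column (the last field) equals the given name.
# The header row (line 1) is skipped.
pid_for_cmd() {
  awk -v name="$1" 'NR > 1 && $NF == name { print $1; exit }'
}
```

With the example output above, pid_for_cmd mytestapp prints 2. It prints nothing if no row matches.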

List available events

Use simpleperf list to see which events your device supports:

# List all available events
simpleperf list

# List hardware events only
simpleperf list hw

# List software events only
simpleperf list sw

Additional filtering options include:

  • cache — Hardware cache events
  • raw — Raw CPU PMU events
  • tracepoint — Tracepoint events
  • cs-etm — CoreSight ETM instruction tracing events
  • pmu — System-specific PMU events

To display supported features on your device:

simpleperf list --show-features

Example output for hardware events:

List of hardware events:
  cpu-cycles                    (Hardware event)
  instructions                  (Hardware event)
  cache-references              (Hardware event)
  cache-misses                  (Hardware event)
  branch-instructions           (Hardware event)
  branch-misses                 (Hardware event)

Example output for software events:

List of software events:
  cpu-clock                     (Software event)
  task-clock                    (Software event)
  page-faults                   (Software event)
  context-switches              (Software event)
  cpu-migrations                (Software event)
  minor-faults                  (Software event)

Use any listed event with the -e option when running simpleperf stat and simpleperf record commands.
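
Before passing an event to -e, you can check that it appears in the list output. The sketch below assumes the list format shown above, with the event name as the first word on each line; has_event is a hypothetical helper name, not a Simpleperf command.

```shell
# Hypothetical helper: reads `simpleperf list` output on stdin and succeeds
# only if the given event name appears as the first word of a line.
has_event() {
  awk -v ev="$1" '$1 == ev { found = 1 } END { exit !found }'
}
```

For example, after saving the list output to a file (events.txt is a placeholder name), has_event cache-misses < events.txt returns a zero exit status only when cache-misses is listed.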

Collect performance statistics with stat

Use simpleperf stat to collect aggregate performance statistics for CPU events. This mode provides a concise summary of hardware and software events without the overhead of detailed sampling data. It's useful for quick performance assessment and A/B testing of optimizations.

# Basic stat collection for a specific process
simpleperf stat -p <pid>

# Stat with specific events
simpleperf stat -e cache-misses,branch-misses -p <pid>

# Stat with custom print interval (every 1000 ms)
simpleperf stat --interval 1000 -p <pid>

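# Stat for kernel space only (may be restricted for unprivileged users; see Known limitations)
simpleperf stat -e cpu-cycles:k -p <pid>

# Stat with event grouping (--group schedules the events together; -e alone monitors them independently)
simpleperf stat --group cpu-cycles,instructions -p <pid>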

# Stat with a fixed duration (10 seconds)
simpleperf stat -p <pid> --duration 10

Example output:

Performance counter statistics:

 1,320,496,145 cpu-cycles         # 0.131736 GHz          (100%)
   510,426,028 instructions       # 2.587047 cycles per instruction (100%)
     4,692,338 branch-misses      # 468.118 K/sec         (100%)
886.008130(ms) task-clock         # 0.088390 cpus used    (100%)
           753 context-switches   # 75.121 /sec           (100%)
           870 page-faults        # 86.793 /sec           (100%)

Total test time: 10.023829 seconds.

Interpret stat results

Use these guidelines to interpret the output:

| Metric | What it tells you | Action if high |
| --- | --- | --- |
| cpu-cycles | Total CPU work performed | Profile with record to find hot functions |
| instructions / cycles per instruction | CPU efficiency | More than 3 cycles per instruction (low IPC) suggests memory stalls or branch mispredictions |
| branch-misses | Branch prediction failures | Simplify conditional logic or restructure data for predictable access patterns |
| context-switches | Thread preemption frequency | Reduce thread count or consolidate work onto fewer threads |
| page-faults | Memory pages faulted in on first access (loaded from disk or newly mapped) | Pre-allocate memory or reduce working set size |
| cache-misses | CPU cache inefficiency | Improve data locality, reduce object allocations, or restructure data layouts |
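
The cycles-per-instruction figure can be recomputed from the raw counts, which helps when you compare saved runs. The following is a minimal sketch that assumes the stat output layout shown in the example above (count, then event name); cpi_from_stat is a hypothetical helper name.

```shell
# Hypothetical helper: reads `simpleperf stat` output on stdin and prints
# cycles per instruction (CPI) and its inverse (IPC).
cpi_from_stat() {
  awk '
    {
      count = $1; gsub(/,/, "", count)   # "1,320,496,145" -> "1320496145"
      name = $2; sub(/:.*/, "", name)    # "cpu-cycles:u"  -> "cpu-cycles"
    }
    name == "cpu-cycles"   { cycles = count + 0 }
    name == "instructions" { instr  = count + 0 }
    END {
      if (cycles > 0 && instr > 0) {
        printf "CPI=%.3f IPC=%.3f\n", cycles / instr, instr / cycles
      } else {
        print "cpu-cycles or instructions not found" > "/dev/stderr"
        exit 1
      }
    }'
}
```

For the example output above, this prints CPI=2.587 IPC=0.387, which matches the 2.587047 cycles per instruction that Simpleperf reports.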

For all available options, run simpleperf help stat.

Record detailed performance data

Use simpleperf record to collect in-depth CPU performance metrics by sampling CPU events at predefined intervals. Simpleperf stores all recorded data in a perf.data file for subsequent analysis.

Events can be scoped to monitor user space only (:u) or kernel space only (:k). For most Vega app profiling, use user space monitoring.

# Basic recording
simpleperf record -p <pid>

# Record with call graph (recommended for detailed analysis)
simpleperf record -e cpu-cycles:u --call-graph dwarf,2048 -p <pid> -- sleep 10

# Record specific events
simpleperf record -e cache-misses,branch-misses -p <pid>

# Record with increased sample frequency (1000 samples/sec)
simpleperf record -f 1000 -p <pid>

# Record a specific thread (adding -p <pid> would also record the rest of that process)
simpleperf record -t <thread_id>

# Record kernel space events
simpleperf record -e cpu-cycles:k -p <pid>

# Record to a custom output file
simpleperf record -o custom_output.data -p <pid>

For all available options, run simpleperf help record.

Transfer performance data to your host machine

The component shell stores performance data files in the /scratch directory. To retrieve them, open a terminal on your host machine (outside the component shell) and run:

vda pull /tmp/scratch/com.amazondeveloper.mytestapp/perf.data

Analyze performance data with report

Use simpleperf report on your host machine to analyze the collected perf.data file. Use the host version of Simpleperf provided with the SDK at:

vega/sdk/<version>/bin/tools/simpleperf

Replace <version> with your installed SDK version (for example, 0.23.6323).

# Generate a basic report (looks for perf.data in current directory)
simpleperf report

# Report with call graph (data must be recorded with -g or --call-graph)
simpleperf report -g

# Report from a specific file
simpleperf report -i perf.data

# Custom sort order
simpleperf report --sort comm,pid,tid,dso,symbol

# Filter by process
simpleperf report --pids <pid1>,<pid2>

# Filter by thread
simpleperf report --tids <tid1>,<tid2>

# Filter by binary
simpleperf report --dsos <path_to_binary>

Example output:

Cmdline: /usr/bin/simpleperf record -e cpu-clock -p 1171 --duration 10
Arch: x86_64
Event: cpu-clock (type 1, config 0)
Samples: 1
Event count: 250000

Overhead Command         Pid  Tid  Shared Object  Symbol
100.00%  ..pdate-daem:36 1171 2937 /lib/libc.so.6 __errno_location

For all available options, run simpleperf help report.

Other commands

Simpleperf includes additional commands for specialized profiling needs:

| Command | Description |
| --- | --- |
| debug-unwind | Tests and debugs offline stack unwinding functionality |
| dump | Extracts and displays raw data from a perf.data file |
| inject | Modifies or injects data into existing perf.data files |
| kmem | Analyzes kernel memory allocation patterns (requires root) |
| merge | Combines multiple perf.data files into one |
| monitor | Monitors events in real-time and prints textual representations to stdout |
| report-sample | Displays raw sample information from a perf.data file |
| trace-sched | Traces system-wide process runtime events (requires root) |

To see all sub-commands and their options:

simpleperf --help

Known limitations

Simpleperf relies on the perf_event_paranoid kernel parameter to determine its operational permissions. By default, Vega OS sets this value to 2, which restricts certain profiling capabilities for security reasons.

For devices with user-external build variants and developer mode enabled, the system automatically adjusts the value to 1, which allows more comprehensive profiling. However, unprivileged users are still restricted from:

  • Profiling kernel space events
  • Accessing hardware PMU events
  • Profiling other users' processes
  • Collecting raw tracepoint events
  • Accessing kernel call graphs
  • Collecting kernel-mode stack traces
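
To see which value is in effect, you can read the parameter directly. This sketch assumes the standard Linux location /proc/sys/kernel/perf_event_paranoid is readable from your shell on the device, which this guide does not confirm. describe_paranoid is a hypothetical helper that only restates the two values described above.

```shell
# Hypothetical helper: maps a perf_event_paranoid value to the description
# given in this section. Values other than 1 and 2 are not covered here.
describe_paranoid() {
  case "$1" in
    2) echo "2: Vega OS default, restricted profiling" ;;
    1) echo "1: adjusted for user-external builds with developer mode enabled" ;;
    *) echo "$1: not described in this guide" ;;
  esac
}
```

A possible invocation, if the file is readable: describe_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid)"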

Best practices

Data collection

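  • Always specify a duration (for example, --duration 10) or use controlled termination (for example, -- sleep 10).
  • Use an appropriate sampling frequency. The -f option sets samples per second, and higher values add CPU overhead.
  • Monitor system load during profiling.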

Event selection

  • Start with basic events (cpu-cycles, instructions).
  • Avoid too many simultaneous hardware events.
  • Consider hardware limitations of your target device.

Report analysis

  • Compare multiple runs for consistent results.
  • Use appropriate filters to narrow down results.
  • Always check for lost samples.

Resource management

  • Clean up old perf.data files to free device storage.
  • Control CPU overhead by limiting sampling frequency.

Example: Diagnose frame drops during scrolling

This walkthrough shows how to use Simpleperf to investigate an app that drops frames when scrolling through a content list.

1. Confirm the problem with stat:

./simpleperf stat -e cpu-cycles,cache-misses,context-switches -p $(pidof com.amazondeveloper.mytestapp) --duration 5

If you see high context-switches (>500/sec) or high cache-misses relative to cpu-cycles, there's likely a performance issue worth investigating.
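
The 500/sec check can be applied to saved stat output with a small filter. This is a sketch under stated assumptions: check_context_switches is a hypothetical helper name, the limit defaults to the 500/sec figure above, and the input uses the "count event # rate /sec" layout from the earlier stat example.

```shell
# Hypothetical helper: reads `simpleperf stat` output on stdin and labels the
# context-switch rate as "high" or "ok" against a limit (default 500 per second).
# Exits non-zero if no context-switches line is present.
check_context_switches() {
  awk -v limit="${1:-500}" '
    $2 == "context-switches" && $3 == "#" {
      rate = $4; gsub(/,/, "", rate)   # tolerate thousands separators
      verdict = (rate + 0 > limit + 0) ? "high" : "ok"
      printf "context-switches: %s/sec (%s)\n", rate, verdict
      found = 1
    }
    END { exit !found }'
}
```

For the earlier example output (75.121 /sec), this prints context-switches: 75.121/sec (ok).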

2. Record a call graph during the problematic interaction:

./simpleperf record -e cpu-cycles:u --call-graph dwarf,2048 -p $(pidof com.amazondeveloper.mytestapp) -- sleep 10

Scroll through your content list during the 10-second recording window.

3. Pull and analyze the data on your host machine:

vda pull /tmp/scratch/com.amazondeveloper.mytestapp/perf.data
simpleperf report -g -i perf.data --sort comm,pid,tid,dso,symbol

4. Identify the bottleneck:

Look for functions with high Overhead percentages. For example:

35.2%  mytestapp  2  2  libhermes.so  hermes::vm::interpretFunction
22.1%  mytestapp  2  2  libreact.so   facebook::react::ShadowTree::commit

This tells you that 35% of CPU time is spent in Hermes JavaScript execution and 22% in React's shadow tree commits. Your scroll handler is likely doing too much JS work per frame.
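
For long reports, you can filter rows by overhead instead of scanning by eye. The sketch below assumes the row layout shown above, with the overhead percentage first and the symbol last; hot_symbols is a hypothetical helper name and the default threshold of 10 is arbitrary. Symbols that contain spaces would be truncated to their last word.

```shell
# Hypothetical helper: reads `simpleperf report` rows on stdin and prints the
# overhead and symbol for rows at or above a minimum percentage (default 10).
hot_symbols() {
  awk -v min="${1:-10}" '
    $1 ~ /^[0-9.]+%$/ {
      pct = $1; sub(/%$/, "", pct)
      if (pct + 0 >= min + 0) print $1, $NF
    }'
}
```

With a threshold of 20, the two example rows above both pass, and header lines such as Cmdline and Arch are ignored because they don't start with a percentage.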

5. Fix and verify:

After optimizing (for example, memoizing expensive computations in your scroll handler), re-run the same stat command and confirm the metrics improve.
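
If you save the stat output from each run to a file, the comparison can be scripted. This is a minimal sketch: stat_diff is a hypothetical helper name, and it assumes both files use the stat layout shown earlier ("count event # ...") and that the first file is not empty.

```shell
# Hypothetical helper: compares two saved `simpleperf stat` outputs and prints
# the percentage change per event (negative means the second run is lower).
stat_diff() {
  awk '
    $3 == "#" {
      count = $1; gsub(/,/, "", count); sub(/\(.*/, "", count)  # drop commas and "(ms)"
      if (FNR == NR) before[$2] = count + 0                      # first file: baseline
      else if (before[$2] > 0)
        printf "%s %+.1f%%\n", $2, (count - before[$2]) * 100 / before[$2]
    }' "$1" "$2"
}
```

For example, stat_diff stat_before.txt stat_after.txt (both file names are placeholders) prints one line per event that appears in both runs.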


Last updated: Jun 18, 2026