Switch PRO Controller

I bought a Switch PRO Controller!! It’s really cool!

Two files are provided:

  • Packet capture of some form of USB HID device
  • Screen recording of someone typing out a flag with an on-screen keyboard

Unfortunately, the flag is visually obscured in the screen recording. The objective is therefore to recover the button presses from the packet capture and identify the letters entered.

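This thread suggests that the Switch Pro Controller simply transmits its Bluetooth HID reports over USB. With that in mind, the reverse-engineered documentation of the Bluetooth HID protocol matches the data in the packet capture: the standard input reports start with 0x30 and carry the button state in bytes 3 to 5.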
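
To make the report layout concrete, here is a minimal sketch that decodes the three button bytes of a single report. The bit order follows the button table in the reverse-engineered notes (least significant bit first); the helper name `pressed_buttons` and the sample bytes are made up for illustration and are not taken from the capture.

```python
# Bit layout of bytes 3..5 of a standard input report, least significant bit first.
BUTTONS = [
    ["Y", "X", "B", "A", "SR", "SL", "R", "ZR"],             # byte 3 (right)
    ["Minus", "Plus", "R Stick", "L Stick", "Home",
     "Capture", "--", "Charging Grip"],                      # byte 4 (shared)
    ["Down", "Up", "Right", "Left", "SR", "SL", "L", "ZL"],  # byte 5 (left)
]

def pressed_buttons(report: bytes) -> list[str]:
    """Return the names of the buttons held down in a 0x30 input report."""
    if not report or report[0] != 0x30:
        return []  # not a standard input report
    names = []
    for i, byte in enumerate(report[3:6]):
        for bit in range(8):
            if byte & (1 << bit):
                names.append(BUTTONS[i][bit])
    return names

# Hypothetical sample: report ID, two bytes we ignore here, then the button bytes.
sample = bytes.fromhex("30 a1 91 08 00 02")
print(pressed_buttons(sample))  # ['A', 'Up']
```

Here byte 3 is 0x08 (bit 3, the A button) and byte 5 is 0x02 (bit 1, D-pad up).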

We can thus extract the button press and release events from the packet capture:

#!/usr/bin/python3
import csv
from datetime import timedelta

# https://github.com/dekuNukem/Nintendo_Switch_Reverse_Engineering/blob/master/bluetooth_hid_notes.md#standard-input-report---buttons
btn_map_raw = """3 (Right)	Y	X	B	A	SR	SL	R	ZR
4 (Shared)	Minus	Plus	R Stick	L Stick	Home	Capture	--	Charging Grip
5 (Left)	Down	Up	Right	Left	SR	SL	L	ZL"""

btn_map = []
for line in btn_map_raw.splitlines():
    data = line.split("\t")

    label = data[0][3:-1]

    btn_map.append([f'{label}: {key}' for key in data[1:]])

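def to_s(timestring):
    # SRT timestamps look like HH:MM:SS,mmm (comma before the milliseconds)
    td = timedelta(seconds=float(timestring))
    ms = td // timedelta(milliseconds=1)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)

    return f'{h:02}:{m:02}:{s:02},{ms:03}'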

# tshark -r capture.pcapng -T fields -e frame.time_relative -e usb.capdata > time_capdata.txt
with open("time_capdata.txt", newline="") as f:
    r = csv.reader(f, delimiter='\t')

    with open("btn_press.srt", "w") as outfile:
        p_btn = bytearray(3)
        press_time = 0
        caption_count = 0
        for row in r:
            time, capdata = row
            # only interested in packets with usb.capdata
            if len(capdata) == 0:
                continue

            # some tshark versions print usb.capdata colon-separated
            raw_capdata = bytearray.fromhex(capdata.replace(":", ""))
            # only interested in standard input reports (ID 0x30)
            if raw_capdata[0] != 0x30:
                continue
            
            btn = raw_capdata[3:6]
            if btn != p_btn:
                # XOR with the previous state leaves a 1 in every bit that changed
                changed = bytes((now ^ prev) for now, prev in zip(btn, p_btn))
                for i, b in enumerate(changed):
                    for u in range(8):
                        # true means bit has changed
                        if b & (1<<u):
                            key = btn_map[i][u]
                            if btn[i] & (1<<u):
                                state = "Pressed"
                                # a single start time is tracked, so this
                                # assumes presses do not overlap
                                press_time = time
                            else:
                                state = "Release"
                                caption_count += 1
                                outfile.write(f'{caption_count}\n{to_s(press_time)} --> {to_s(time)}\n{key}\n\n')
                            # log both press and release events
                            print(time, key, state)

                p_btn = btn
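
The change detection in the script rests on XOR: XORing the previous and current button byte leaves a 1 only where a bit flipped, and the current byte then tells whether that flip was a press or a release. A minimal sketch of that step in isolation, with a hypothetical helper name `bit_changes` and made-up byte values:

```python
def bit_changes(prev: int, now: int):
    """Yield (bit, pressed) for every bit that differs between two button bytes."""
    diff = prev ^ now
    for bit in range(8):
        if diff & (1 << bit):
            # The bit changed; it is a press if it is set in the new state.
            yield bit, bool(now & (1 << bit))

# Bit 3 goes from set to clear (release), bit 1 from clear to set (press).
print(list(bit_changes(0b0000_1000, 0b0000_0010)))  # [(1, True), (3, False)]
```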

The key press/release events are written out as an SRT subtitle file (btn_press.srt), with one caption per button spanning the time from press to release. Played alongside the screen recording, the captions show when a key is selected on the on-screen keyboard.

The subtitle file has to be delayed by approximately 6 seconds because the packet capture starts about 6 seconds into the screen recording.
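
Most video players can apply a subtitle delay directly, but the offset can also be baked into the file. The following is only a sketch: the helper name `shift_srt` is hypothetical, and the 6-second value is the approximate offset mentioned above, so it may need tuning by eye. The pattern accepts either a comma or a period before the milliseconds and always writes a comma.

```python
import re

# Matches HH:MM:SS,mmm (also accepts '.' as the millisecond separator).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[.,](\d{3})")

def shift_srt(text: str, offset_s: float) -> str:
    """Shift every timestamp in SRT text by offset_s seconds, clamped at zero."""
    def shift(match):
        h, m, s, ms = map(int, match.groups())
        total = (h * 3600 + m * 60 + s) * 1000 + ms + round(offset_s * 1000)
        total = max(0, total)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"
    return TIMESTAMP.sub(shift, text)

cue = "1\n00:00:01,250 --> 00:00:01,400\nRight: A\n"
print(shift_srt(cue, 6.0))
```

For the cue above, both timestamps move from second 1 to second 7 while the caption text is left untouched.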

The flag RCTF{5witch_1s_4m4z1ng_m8dw65} is then recovered (after replacing - with _).