Siri Protocol

Applidium documented the Siri Protocol on 14 November 2011 by setting up a DNS to see the traffic. The traffic is simple HTTPS (with some modifications, mentioned later). The server presents a certificate for guzzoni.apple.com (IP 17.174.4.4) and the client checks for the correct domain certificate. But it does not check the issuer, so you can create a self-signed certificate to see the traffic.

Protocol
The request looks similar to a standard HTTP request: ACE /ace HTTP/1.0 Host: guzzoni.apple.com User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0/9A334) Ace/1.0 Content-Length: 2000000000 X-Ace-Host: 4620a9aa-88f4-4ac1-a49d-e2012910921 The X-Ace-Host is tied to the 4S you are using. The content length of almost 2GB is fixed, so no actual length. The User-Agent is modified depending on your OS version and build. The data itself is binary.

Binary Data

 * Starts with 0x00AACCEE on iOS 5, or 0xAACCEE02 on iOS 6+
 * Rest is compressed with zlib

Then the data is made out of chunks:
 * Starting with 0x020000xxxx are "plist" packets with size xxxx of the binary plist data.
 * Starting with 0x030000xxxx are "ping" packets, sent by the iPhone to Siri server to keep connection alive. xx is the ping sequence number.
 * Starting with 0x040000xxxx are "pong" packets, sent from Siri server to the iPhone to keep connection alive. xx is the pong sequence number.
 * Starting with 0x070000xxxx are "speech" packets, sent by iOS 8.4 (maybe a bit earlier and probably newer versions too, speech is sent as a plist on iOS 5 and 6, and maybe 7? (not tested on 7)). xxxx is the length of the packet.

To decipher the binary plist you can use the plutil command-line tool on Mac OS X.

plist data
The audio data is compressed with Speex audio codec (iOS 5 and 6) or with Opus audio codec. (iOS 8)

(More documentation of plist data is missing here.)