TLS Library Test Tool Source Code
Fix #
- The Go team has resolved the decoding issue; for details, see: golang/go#71862.
- Oracle has addressed the decoding issue, credited us in their Critical Patch Update (CPU), see https://www.oracle.com/security-alerts/cpuoct2025.html, and assigned CVE-2025-53057 for this vulnerability.
CertificateGenerator and TestCertificates #
We focus on the DN and GeneralName ASN1String values (PrintableString, IA5String, UTF8String, BMPString) in Subject, Issuer, SAN, IAN, AIA, SIA, and cRLDistributionPoints.
For DN, we only focus on these common OIDs: 2.5.4.3, 2.5.4.15, 0.9.2342.19200300.100.1.25, 1.2.840.113549.1.9.1, 2.5.4.7, 2.5.4.11, 2.5.4.10, 2.5.4.5, 2.5.4.8. We want to see how different OID+ASN1String combinations handle characters, so we combine different OIDs and ASN1Strings, and fill them with different characters including all characters from U+0000-U+00FF, plus sampled characters from the remaining Unicode range (randomly selecting one from each block in https://unicode.org/Public/15.0.0/ucd/Blocks.txt). Each certificate will only test one field.
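For reference, the per-block sampling can be sketched as follows; this is a minimal illustration (the file path and function name are ours, not the actual GenerateCerts.py code):
import random
import re

def sample_one_char_per_block(blocks_path="Blocks.txt"):
    # Pick one random code point from every range listed in the Unicode Blocks.txt file.
    samples = []
    with open(blocks_path, encoding="utf-8") as f:
        for line in f:
            m = re.match(r"([0-9A-F]+)\.\.([0-9A-F]+);", line.strip())
            if not m:
                continue  # skip comments and blank lines
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            cp = random.randint(start, end)
            if 0xD800 <= cp <= 0xDFFF:
                continue  # lone surrogates cannot be embedded as text
            samples.append(chr(cp))
    return samples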
For GeneralName, we focus only on dNSName, RFC822Name, and URI, the relatively common types. The method for generating test certificates is similar to the above.
In the CertificateGenerator, ASN.1 string types are encoded as follows: BMPString uses UTF-16-BE, while IA5String, PrintableString, and UTF8String use UTF-8. This is useful for decoding inference.
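A minimal sketch of this encoding rule (the helper name is illustrative, not part of GenerateCerts.py):
def encode_asn1_string(value: str, asn1_type: str) -> bytes:
    # BMPString payloads are encoded as UTF-16-BE; IA5String, PrintableString and
    # UTF8String payloads are encoded as UTF-8, even when the characters fall outside
    # what the declared type legally allows.
    if asn1_type == "BMPString":
        return value.encode("utf-16-be")
    return value.encode("utf-8")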
The complete test certificates and their corresponding parsing results need to be generated by CertificateGenerator/GenerateCerts.py and multiple parsers.
File Description #
- Block_unicode15.0.json is our processed file based on the Unicode 15.0 Blocks.txt.
- Generator.py is a convenience script for batch certificate generation.
- GenerateCerts.py is the script for generating test certificates.
- SelectedChars.txt is the Unicode character set selected for our experiments.
- TestGeneration.py is an example of using Generator.py.
When using CertificateGenerator/GenerateCerts.py to generate certificates, please use the cryptography package at version 42.0.7 (cryptography==42.0.7). When prompted about illegal character embedding, please follow the prompt and comment out the corresponding check in cryptography.
Test certificate description #
- sha1: The SHA-1 fingerprint of the certificate
- pem: The certificate in PEM format
- FocusField: The field being tested in the current certificate
- FocusFieldValue: The embedded value in the field
- InsertValue: Indicates the Unicode character being tested
- description: Descriptive information
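For illustration only, a record with these fields might look like the following; every value below is a placeholder, not taken from the real dataset:
example_record = {
    "sha1": "0000000000000000000000000000000000000000",   # placeholder fingerprint
    "pem": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
    "FocusField": "Subject-2.5.4.3-UTF8String",            # field under test (format illustrative)
    "FocusFieldValue": "Subje\u0209ct",                     # value embedded in that field
    "InsertValue": "U+0209",                                # Unicode character being tested
    "description": "CN filled with U+0209 encoded as a UTF8String",
}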
Note: We encountered an issue where the C JSON library (cJSON) fails to properly process strings containing embedded U+0000 characters, resulting in truncation of inserted field values that include U+0000. This directly impacts the normal functioning of our decoding detectors. To address this, we have excluded all test cases involving U+0000 embedding from automated processing and manually validate their results.
CertificateParsers #
Please note that because some libraries, such as Golang Crypto, have since been modified accordingly, you may not obtain the same results.
ParsedCertificates #
Some fields from the input file are included in the output file. The Status field indicates whether the certificate was parsed successfully overall. The xxStatus field, present in some results, indicates whether a specific field was parsed successfully. Other JSON fields in the output correspond to the parsed values of the respective certificate fields. To understand the mapping between the called functions and JSON fields, refer to the parsers in CertificateParsers.
EncodingDetection #
EncodingDetection/src/main/java/TlsImplementationTest/Unicert/EncodingDetection.java extracts the DER byte sequences of subfields in Subject, Issuer, SAN, IAN, AIA, SIA, CRLDistributionPoints from certificates, identifies the Type-Length-Value (TLV) structure within the byte sequences, and verifies whether the Value conforms to the ASN.1 String type declared in Type. This program can detect certificates with ASN.1 string encoding errors.
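As a rough illustration of this check (a Python sketch of the idea, not the Java implementation), the Value of each string TLV must satisfy the constraints of the declared Type:
PRINTABLE_SET = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 '()+,-./:=?")

def value_matches_declared_type(tag: int, value: bytes) -> bool:
    # DER universal tags: 0x13 PrintableString, 0x16 IA5String, 0x0C UTF8String, 0x1E BMPString.
    if tag == 0x13:
        return all(b in PRINTABLE_SET for b in value)   # PrintableString alphabet only
    if tag == 0x16:
        return all(b < 0x80 for b in value)             # IA5String is restricted to 7-bit ASCII
    if tag == 0x0C:
        try:
            value.decode("utf-8")                       # UTF8String must be well-formed UTF-8
            return True
        except UnicodeDecodeError:
            return False
    if tag == 0x1E:
        if len(value) % 2 != 0:                         # BMPString is a sequence of 2-byte units
            return False
        try:
            value.decode("utf-16-be")
            return True
        except UnicodeDecodeError:
            return False
    return True                                         # other types are out of scope for this sketch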
EncodingDetection/src/main/java/TlsImplementationTest/Unicert/CertificateBuilder.java further analyzes certificates with ASN.1 string encoding errors by performing certificate chain validation, including the following checks (a sketch follows the list):
- Whether the last certificate in the chain is self-signed
- Whether the certificate chain signatures are valid
- The trustworthiness of the root certificate (measured by acceptance in mainstream root stores)
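For orientation, these three checks can be sketched in Python with the pyca/cryptography API (the Java CertificateBuilder is the authoritative implementation; the helper names and RSA/ECDSA-only coverage are simplifications):
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec, padding, rsa

def signature_valid(cert: x509.Certificate, issuer: x509.Certificate) -> bool:
    # Verify cert's signature with the issuer's public key (RSA PKCS#1 v1.5 and ECDSA only).
    pub = issuer.public_key()
    try:
        if isinstance(pub, rsa.RSAPublicKey):
            pub.verify(cert.signature, cert.tbs_certificate_bytes,
                       padding.PKCS1v15(), cert.signature_hash_algorithm)
        elif isinstance(pub, ec.EllipticCurvePublicKey):
            pub.verify(cert.signature, cert.tbs_certificate_bytes,
                       ec.ECDSA(cert.signature_hash_algorithm))
        else:
            return False
        return True
    except Exception:
        return False

def is_self_signed(cert: x509.Certificate) -> bool:
    # A self-signed certificate names itself as issuer and verifies under its own key.
    return cert.subject == cert.issuer and signature_valid(cert, cert)

def root_is_trusted(root: x509.Certificate, root_store: list) -> bool:
    # "Trusted" here means the root appears in one of the collected mainstream root stores.
    fp = root.fingerprint(hashes.SHA256())
    return any(c.fingerprint(hashes.SHA256()) == fp for c in root_store)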
The EncodingDetection/src/main/resources/ROOT directory contains all mainstream root stores we selected for evaluation.
The EncodingDetection/src/main/resources/CertsWithEncodingErrors.json contains all certificates with ASN1String encoding issues.
The EncodingDetection/src/main/resources/CertsWithEncodingErrors_check.json is the result of performing certificate chain analysis on CertsWithEncodingErrors.json.
The EncodingDetection/src/main/resources/CertsWithEncodingErrors_check_trusted.json includes all certificates with encoding issues that are signed by trusted CAs.
The EncodingDetection/src/main/resources/CertsWithEncodingErrors_check_trusted_SubjectSAN.json contains certificates from EncodingDetection/src/main/resources/CertsWithEncodingErrors_check_trusted.json that have encoding issues in their Subject or SAN fields.
CertificateDecodingChecker #
We observed that different TLS implementations have decoding errors and special character handling errors when parsing ASN1String. Subsequently, we developed corresponding decoding detectors for each parser.
The purpose of the decoding detectors is to infer the ASN1String decoding method and character handling modes used by TLS libraries.
The principle of the decoding detectors is described in our paper.
Output format of decoding detectors #
For each field, the output of the decoding detector should be as follows:
- parsing_failed: The set of characters that cause parsing failure in the field or the entire certificate
- possible_decodings: The set of potential decoding methods for successfully parsed characters
- deduced_decoding: The unique decoding method derived from possible_decodings
- replacement: The set of characters that trigger character replacement
- escaping: The set of characters that trigger character escaping
- truncation: The set of characters that trigger character truncation
- possible_decodings_ex: The set of potential decoding methods for characters that undergo special handling
- deduced_decoding_ex: The unique decoding method derived from possible_decodings_ex
- deduced_decoding_include_ex: The unique decoding method derived from both deduced_decoding_ex and deduced_decoding
- possible_replacement: The set of characters that may trigger character replacement
- possible_escaping: The set of characters that may trigger character escaping
- possible_truncation: The set of characters that may trigger character truncation
- is_unidecoding: If True, it indicates that (apart from unparsable characters) all other characters in the field follow a unified decoding method, which is deduced_decoding_include_ex
We will mark deduced_decoding and deduced_decoding_ex as "pass", indicating that we do not process this field.
When a field’s is_unidecoding is True and parsing_failed is empty, the field can be processed fully automatically, except for DN parsing in OpenSSL and GN parsing in java.security.cert (the modified decoding interferes with the proper identification of character escaping). Otherwise, manual inspection is required.
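As a rough illustration of the inference step (the detectors’ full logic is described in the paper; the codec list here is illustrative), a candidate decoding is retained only if decoding the field’s DER value bytes with it reproduces the library’s parsed value:
CANDIDATE_CODECS = ["ascii", "iso-8859-1", "utf-8", "utf-16-be"]  # UCS-2 needs a custom BMP-only check

def infer_possible_decodings(der_value: bytes, parsed_value: str) -> list:
    hits = []
    for codec in CANDIDATE_CODECS:
        try:
            if der_value.decode(codec) == parsed_value:
                hits.append(codec)
        except UnicodeDecodeError:
            continue
    return hits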
Usage of the decoding detectors’ output #
In the R1 and R2 cases, the results from the decoding detectors are fully reliable, whereas the UR1, UR2, and UR3 scenarios require manual inspection. Most fields’ results do not necessitate additional manual checks.
R1: The output of decoding detectors is fully reliable.
"Forge-Subject-2.5.4.3-UTF8String": {
"parsing_failed": [],
"possible_replacement": [],
"replacement": [],
"possible_escaping": [],
"escaping": [],
"possible_truncation": [],
"truncation": [],
"possible_decodings": [
"iso-8859-1",
"ascii"
],
"possible_decodings_ex": [],
"is_unidecoding": true,
"deduced_decoding": "iso-8859-1",
"deduced_decoding_ex": "unknown",
"deduced_decoding_include_ex": "iso-8859-1"
}
Forge strictly adheres to ISO-8859-1 for decoding UTF8String in Subject 2.5.4.3, without any special character handling mode.
R2: The output of decoding detectors is reliable, although not all embedded characters can be successfully decoded or match special character handling modes.
"PyOpenSSL-Subject-2.5.4.3-BMPString": {
"parsing_failed": [
"10069-108C79"
],
"possible_replacement": [],
"replacement": [],
"possible_escaping": [],
"escaping": [],
"possible_truncation": [],
"truncation": [],
"possible_decodings": [
"ucs-2"
],
"possible_decodings_ex": [],
"is_unidecoding": true,
"deduced_decoding": "ucs-2",
"deduced_decoding_ex": "unknown",
"deduced_decoding_include_ex": "ucs-2"
}
Although some characters lead to parse failure, these were inherently invalid for UCS-2. Therefore, PyOpenSSL strictly adheres to UCS-2 when decoding BMPString in Subject 2.5.4.3, applying no special character handling modes. When decoding detectors produce reliable output, replacement, escaping, and truncation can be employed for character checks.
UR1: The presence of modified decoding compromises the reliability of decoding detectors’ output.
"OpenSSL-SubjectOneline-2.5.4.3-PrintableString": {
"parsing_failed": [],
"possible_replacement": [],
"replacement": [],
"possible_escaping": [],
"escaping": [
"0001-001F",
"007F-108C79"
],
"possible_truncation": [],
"truncation": [],
"possible_decodings": [
"ascii"
],
"possible_decodings_ex": [
"ascii",
"utf-8"
],
"is_unidecoding": true,
"deduced_decoding": "ascii",
"deduced_decoding_ex": "utf-8",
"deduced_decoding_include_ex": "utf-8"
}
When ‘SubjeU+0209ct’ is encoded as UTF-8 and then decoded, it yields the same string ‘SubjeU+0209ct’, which the detector mistakenly interpreted as character escaping with respect to the parsed value ‘subje\xC8\x89ct’. In reality, OpenSSL simply escapes unrecognized bytes as hexadecimal strings. Manual verification is required in such cases.
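A small demonstration of the ambiguity, assuming OpenSSL-oneline-style byte escaping (the escape formatting below is our approximation, not OpenSSL’s code):
embedded = "Subje\u0209ct"
raw = embedded.encode("utf-8")   # b'Subje\xc8\x89ct'
# Escape every byte outside printable ASCII as \xHH, as oneline-style output does.
byte_escaped = "".join(chr(b) if 0x20 <= b <= 0x7E else "\\x{:02X}".format(b) for b in raw)
print(byte_escaped)              # Subje\xC8\x89ct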
UR2: The output of the decoding detectors becomes unreliable due to unrecognized character handling modes.
"OpenSSL-SubjectOneline-2.5.4.3-BMPString": {
"parsing_failed": [],
"possible_replacement": [],
"replacement": [],
"possible_escaping": [
"0001-108C79"
],
"escaping": [],
"possible_truncation": [],
"truncation": [],
"possible_decodings": [],
"possible_decodings_ex": [],
"is_unidecoding": false,
"deduced_decoding": "unknown",
"deduced_decoding_ex": "unknown",
"deduced_decoding_include_ex": "unknown"
}
Through manual inspection, you can approximate a field’s decoding method. For example, when ‘sU+0664ubject’ undergoes UTF-16-BE encoding and is then decoded (with special character handling) to yield ‘\x00s\x06d\x00u\x00b\x00j\x00e\x00c\x00t’, this suggests an ASCII or ASCII-compatible decoder in which U+0000 is escaped as ‘\x00’. By analyzing such outputs for unicerts embedded with boundary characters (U+0080–U+00FF, U+FFFF–U+10FFFF), the exact decoding method can be determined.
UR3: The decoding method for BMPString in java.security.cert remains unclear, with the only known detail being its full compatibility with ASCII. This represents the sole exception encountered during our decoding inference.
Others #
Our decoding detectors need to determine the relationship between the embedded string in a field and its parsed value. This requires knowledge of how different TLS libraries parse specific field formats (e.g., prefixes like “CN=”, suffixes, etc.). Through extensive test certificate construction, we have precisely identified these parsing patterns, and these findings are directly implemented in the code. All results are shown in CertificateDecodingChecker/result.
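For example (a hypothetical helper; the real per-library patterns are encoded in CertificateDecodingChecker), stripping the known framing recovers the value that is compared against the embedded string:
def strip_known_pattern(parsed: str, prefix: str = "CN=", suffix: str = "") -> str:
    # Remove the library-specific prefix/suffix around the field value before comparison.
    if parsed.startswith(prefix) and parsed.endswith(suffix):
        return parsed[len(prefix):len(parsed) - len(suffix)]
    return parsed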