Rating Methodology

Open Ratings Code Quality
Assessment Framework

A credit-risk-calibrated approach to software quality certification
Version: 1.0
Effective: March 2026
Issuer: Open Ratings, a product of SGX Analytics LLC
Patent: U.S. Provisional 63/983,531

Overview

Open Ratings provides independent, cryptographically anchored quality assessments of software repositories. Our methodology is calibrated to the risk proportionality principles used by established credit rating agencies, adapted for software engineering.

Open Ratings evaluates a specific revision of a code repository — not a continuous stream, but a deliberate snapshot, analogous to a credit rating agency assessing a bond issuance at a point in time. Each rating reflects the quality, security, and reliability of that revision as assessed by our multi-dimensional analytical framework.

Our rating scale mirrors the letter-grade conventions used globally in credit markets (AAA through D), and our target grade distribution is calibrated to the observed distribution of corporate credit ratings maintained by the major rating agencies. This means a BBB from Open Ratings carries the same relative quality signal as a BBB in the bond market: adequate quality with room for improvement, but fundamentally sound.

Every rating is permanently recorded via Merkle-tree hashing and SHA-256 attestation, creating an immutable, independently verifiable record of when the assessment was performed, what grade was assigned, and what computational evidence supports the finding.

Disclosures

Dialect Coverage

Grades shown for COBOL modernization outputs are benchmark-scoped and provisional with respect to dialect coverage. Current validation targets a defined benchmark suite. Validation across IBM Enterprise COBOL, GnuCOBOL, and Micro Focus dialect variants is being expanded. Grades reflect assessment of modernization output quality on the benchmark suite, not comprehensive dialect compliance.

Benchmark Scope

Grades represent an assessment of model outputs on a defined benchmark suite at a specific point in time. They are not a guarantee of future performance, a warranty against defects in production modernization, or an endorsement of the model’s general capability. Model behavior may vary across tasks, parameter configurations, and COBOL program characteristics not represented in the current suite.

Rating Scale

Grade | Score Range | Assessment | Definition | Target Dist.

Investment Grade — Production Recommended

AAA | 9.70 – 10.00 | Human + Auto | Exceptionally strong code quality. Near-zero defect probability. Verified through comprehensive automated analysis, formal methods, and expert human due diligence. Extremely rare. | < 0.05%
AA | 9.30 – 9.69 | Human + Auto | Very strong code quality, differing from the highest-rated repositories only to a small degree. Requires enhanced validation depth and expert review. | ~2%
A | 7.50 – 9.29 | Automated | Strong code quality. Sound architecture, adequate test coverage, no critical vulnerabilities detected. Somewhat more susceptible to quality degradation under rapid development than higher-rated repositories. | ~8%
BBB | 6.00 – 7.49 | Automated | Adequate code quality. Exhibits acceptable quality parameters, but adverse conditions are more likely to weaken quality over time. | ~15%

Speculative Grade — Elevated Risk

BB | 4.50 – 5.99 | Automated | Below-average quality. Faces ongoing uncertainties including incomplete test coverage, known vulnerabilities, or architectural weaknesses. | ~20%
B | 3.00 – 4.49 | Automated | Weak code quality. Currently functional, but adverse conditions will likely impair reliability. | ~25%
CCC | 1.50 – 2.99 | Automated | Substantial quality risk. Dependent on favorable conditions to avoid failure. Critical vulnerabilities or hallucinated dependencies detected. | ~18%
D | 0.00 – 1.49 | Automated | Non-functional or critically defective. Dependencies do not install, runtime errors on basic execution, or fundamental flaws rendering the code unsuitable for any production use. | ~12%
Investment Grade Threshold

Repositories rated BBB or above are considered investment grade — suitable for production use, dependency inclusion, and organizational adoption. Grades below BBB carry speculative-grade designation, indicating elevated risk. This threshold mirrors the BBB/BB boundary used globally in credit markets.
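The score thresholds and the BBB/BB boundary described above can be sketched as a simple mapping. This is an illustrative reading of the rating scale table, not Open Ratings' internal implementation; the function names are hypothetical.

```python
# Thresholds taken from the rating scale table: each grade's score floor.
GRADE_THRESHOLDS = [
    ("AAA", 9.70), ("AA", 9.30), ("A", 7.50), ("BBB", 6.00),
    ("BB", 4.50), ("B", 3.00), ("CCC", 1.50), ("D", 0.00),
]

def grade_for_score(score: float) -> str:
    """Map a composite quality score (0.00-10.00) to a letter grade."""
    for grade, floor in GRADE_THRESHOLDS:
        if score >= floor:
            return grade
    return "D"

def is_investment_grade(grade: str) -> bool:
    """BBB and above are investment grade; BB and below are speculative."""
    return grade in {"AAA", "AA", "A", "BBB"}
```

Note that the boundaries are inclusive at the floor: a composite score of exactly 7.50 maps to A, and 6.00 to BBB.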

Target Distribution

Our rating distribution is calibrated to mirror the observed distribution of corporate credit ratings. This ensures that an Open Ratings grade carries a risk weight proportional to its credit market counterpart. An AAA rating is as rare in software as it is in corporate bonds: today, only two U.S. companies maintain a AAA corporate credit rating.

Expected Rating Distribution Across Rated Repositories

AAA: < 0.05%
AA: ~2%
A: ~8%
BBB: ~15%
BB: ~20%
B: ~25%
CCC: ~18%
D: ~12%

These proportions are targets, not quotas. Distribution targets serve as calibration anchors — if observed distributions diverge significantly from targets, we investigate whether the divergence reflects the true state of software quality or a calibration error in our model.
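The calibration-anchor idea above can be illustrated with a simple drift check: compare the observed grade shares against the targets and flag large divergences for investigation. The target percentages come from the distribution table; the tolerance value and function name are assumptions for illustration only.

```python
# Target distribution from the table above, in percent.
TARGETS = {"AAA": 0.05, "AA": 2, "A": 8, "BBB": 15,
           "BB": 20, "B": 25, "CCC": 18, "D": 12}

def drift_report(observed_counts: dict, tolerance_pct: float = 5.0) -> dict:
    """Return grades whose observed share diverges from the target by more
    than `tolerance_pct` percentage points, mapped to their observed share."""
    total = sum(observed_counts.values())
    flagged = {}
    for grade, target in TARGETS.items():
        share = 100.0 * observed_counts.get(grade, 0) / total
        if abs(share - target) > tolerance_pct:
            flagged[grade] = round(share, 2)
    return flagged
```

A flagged grade would then prompt the investigation described above: is software quality genuinely shifting, or is the scoring model miscalibrated?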

Assessment Dimensions

Each repository is evaluated across six quality dimensions. Dimensions are weighted to reflect their relative contribution to overall software risk, calibrated through empirical analysis of production incident correlation.

Dimension | Weight | What We Assess

Security | 25% | Vulnerability exposure, secret management, input validation, authentication patterns, dependency vulnerability propagation, and detection of hallucinated packages or API endpoints.
Logic Correctness | 25% | Semantic accuracy of implementation relative to stated intent, edge case handling, error propagation paths, type safety, and divergence between specifications and actual behavior.
Codebase Fit | 20% | Architectural coherence, consistency with project conventions, appropriate abstraction levels, and integration quality with existing modules.
Maintainability | 15% | Cyclomatic and cognitive complexity, nesting depth, function length, naming clarity, documentation coverage, and long-term modifiability.
Test Quality | 10% | Test effectiveness measured through mutation analysis: deliberately introduced defects should cause tests to fail, and mutants that survive indicate a weak test suite.
Dependency Health | 5% | Dependency freshness, known vulnerability exposure (CVE), license compatibility, maintainer activity, and whether declared dependencies actually exist.

Assessment Process

Ratings are produced through a multi-phase analytical pipeline. Each phase generates independently verifiable intermediate outputs that are cryptographically combined into a proof-of-work attesting to the depth and integrity of the assessment.

  1. Input Ingestion
    The target repository is cloned at the specified commit. Dependency manifests, language composition, framework indicators, and project metadata are extracted. If the code was generated or assisted by an AI coding agent, the agent may be identified through stylometric analysis.
  2. Static Analysis
    Abstract syntax tree parsing measures structural complexity across every function and module. Pattern-matching rules detect security anti-patterns, code duplication, and known vulnerability signatures. Declared dependencies are cross-referenced against authoritative registries.
  3. Sandboxed Execution
    The codebase is built and executed within an isolated containerized environment. Dependency installation is attempted and failures recorded. Test suites are executed and results captured. Mutation analysis introduces controlled defects to measure test suite effectiveness.
  4. Quality Scoring
    Dimension scores are computed from the combined static and dynamic analysis results, weighted per the dimension table, and aggregated into a composite quality score. The composite score is mapped to a letter grade per the rating scale thresholds.
  5. Proof-of-Work Generation
    Intermediate outputs from each phase are organized into a Merkle tree. A difficulty-adjusted cryptographic proof is generated, with difficulty proportional to the achieved grade.
  6. Cryptographic Anchoring
    The rating grade, composite score, proof-of-work hash, Merkle root, and timestamp are sealed via cryptographic attestation, creating an immutable, timestamped record.
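Step 5 above can be sketched as a standard Merkle-tree construction over the phase outputs. The methodology states SHA-256 is used; the pairing convention below (duplicating the last node on odd-sized levels) is a common scheme, assumed here for illustration rather than taken from Open Ratings' specification.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> str:
    """Compute a Merkle root over a list of phase-output byte strings."""
    level = [sha256(leaf) for leaf in leaves]
    if not level:
        return sha256(b"").hex()
    while len(level) > 1:
        if len(level) % 2:            # duplicate last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

Any change to a single phase output changes its leaf hash and therefore the root, which is what makes the anchored root useful as a tamper-evidence mechanism in step 6.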

Rating Tiers

Automated Assessment (D through A)

Ratings from D through A are produced entirely through the automated pipeline. No human judgment is involved in the scoring. A grade of A represents the highest confidence level achievable through automated analysis alone.

The automated ceiling exists by design. Software quality beyond the A threshold involves factors that no automated system can reliably assess: architectural intent that spans years of technical strategy, organizational commitment to maintenance, and the subtle distinction between code that is merely correct and code that is wisely constructed.

Enhanced Assessment (AA and AAA)

Human Due Diligence Required

Grades of AA and above cannot be achieved through automated analysis alone. They require a structured human due diligence review conducted by senior analysts with extensive experience in software risk assessment.

For a repository to be considered for AA or AAA, it must first achieve a minimum automated score of A (7.50+). The automated score establishes the quantitative floor; the human review determines whether the qualitative factors support elevation above that floor.
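The eligibility gate described above reduces to a single threshold check: the automated composite must reach the A floor before human review can elevate the grade. The constant and function name below are illustrative, not Open Ratings' internal API.

```python
# Minimum automated composite score (the A floor) required before a
# repository can be considered for AA or AAA via human due diligence.
AUTOMATED_FLOOR_FOR_ENHANCED = 7.50

def eligible_for_enhanced_review(automated_score: float) -> bool:
    """True if the automated score establishes the quantitative floor
    for AA/AAA consideration."""
    return automated_score >= AUTOMATED_FLOOR_FOR_ENHANCED
```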

AI Agent Attribution

Open Ratings may identify the AI coding agent that generated or assisted in generating the assessed code. This attribution is reported as part of the rating when detected with sufficient confidence. Different AI coding agents exhibit distinct failure mode profiles that affect the probability of undiscovered defects.

Agent attribution adjusts the weight given to specific assessment dimensions. If an agent is known to have elevated hallucination rates for a particular package ecosystem, the dependency health dimension is scrutinized more heavily.
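One plausible mechanical reading of the paragraph above: scale up the weight of the dimension flagged by the agent's risk profile, then renormalize so the weights still sum to 1.0. The scaling factor, profile format, and function name are assumptions for illustration; the document does not specify how the adjustment is computed.

```python
def adjust_weights(weights: dict, flagged: str, factor: float = 1.5) -> dict:
    """Scale the flagged dimension's weight by `factor` and renormalize
    so the adjusted weights still sum to 1.0."""
    scaled = {d: (w * factor if d == flagged else w)
              for d, w in weights.items()}
    total = sum(scaled.values())
    return {d: round(w / total, 4) for d, w in scaled.items()}
```

For an agent with elevated hallucination rates in a package ecosystem, `flagged` would be the dependency health dimension, increasing its contribution to the composite score.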

Cryptographic Verification

Every Open Rating is sealed via cryptographic attestation at the time of issuance. The on-chain record is a compact attestation containing the essential elements needed to verify the rating’s existence and integrity.

// On-chain rating record structure
repo_hash: SHA-256("owner/repository")
commit_hash: first 20 bytes of rated commit SHA
grade: letter grade enum (D=0 ... AAA=7)
score: composite quality score × 100
pow_hash: proof-of-work hash
merkle_root: Merkle root of analysis phase outputs
timestamp: Unix timestamp of rating issuance
prior_rating: address of previous rating for same repo (if any)

The attestation layer serves as a filing cabinet, not an analytical tool. It provides three guarantees: the rating existed at the claimed time, the rating has not been altered after issuance, and the full history of a repository’s ratings can be traversed by following prior_rating links.
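Two of the three guarantees above can be sketched as verification routines against the record structure shown: recomputing `repo_hash` from the repository name, and walking `prior_rating` links to traverse the rating history. The records here are plain dictionaries standing in for on-chain storage; the field names follow the structure above, but the access pattern is hypothetical.

```python
import hashlib

def verify_repo_hash(record: dict, repo_full_name: str) -> bool:
    """Check that the record's repo_hash matches SHA-256('owner/repository')."""
    expected = hashlib.sha256(repo_full_name.encode("utf-8")).hexdigest()
    return record["repo_hash"] == expected

def rating_history(record: dict, store: dict) -> list:
    """Walk prior_rating links back to the first rating for the repository."""
    history = [record]
    while record.get("prior_rating"):
        record = store[record["prior_rating"]]
        history.append(record)
    return history
```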

Rating Types

Open Rating

Issued for public repositories. The rated code is publicly accessible, and any party can independently verify the codebase against the assessment findings.

Verified Rating

Issued for private repositories using the identical methodology. The underlying code is not publicly accessible, but the assessment is conducted under the same standards.

Shadow Rating

An unsolicited Open Rating issued without the request or involvement of the repository owner. Produced at our discretion for repositories of public interest.

Reproducibility

Each benchmark task is versioned and deterministic. Task definitions include meta.json, prompt.md, source files, and gold.json oracle checks. Runs log model identifiers, decoding parameters, timestamps, and per-task outputs. Suite hashes and prompt hashes are recorded for auditability.
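The suite hashes mentioned above could be computed along these lines: hash each task file, then hash the sorted (path, digest) pairs into a single fingerprint. The file names match the task layout described; the canonical form is not specified in the document, so this is one plausible scheme, not the actual one.

```python
import hashlib

def file_digest(content: bytes) -> str:
    """SHA-256 digest of a single task file's bytes."""
    return hashlib.sha256(content).hexdigest()

def suite_hash(files: dict) -> str:
    """files maps path -> file bytes; returns a deterministic suite hash
    that is independent of insertion order."""
    entries = sorted((path, file_digest(data)) for path, data in files.items())
    canonical = "\n".join(f"{p} {d}" for p, d in entries).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Sorting the entries before the final hash is what makes the fingerprint stable across runs: the same task files always produce the same suite hash.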

The assessment pipeline produces independently verifiable intermediate outputs at each phase. These outputs are organized into a Merkle tree whose root is anchored on-chain, enabling third-party verification that the claimed analysis was actually performed to the claimed depth.

Limitations & Disclaimers

An Open Rating is an assessment of code quality at a specific point in time for a specific revision. It is not a guarantee of future performance, a warranty against defects, or an endorsement of the project’s business viability.

Ratings from D through A reflect automated analysis only. No automated system can guarantee the absence of defects. The absence of detected issues is not proof that no issues exist.

AI agent attribution is probabilistic. Confidence levels below 80% are not published. Attribution reflects our best assessment based on stylometric analysis and is reported as an analytical finding, not a definitive determination.

SGX Analytics LLC

Enterprise reliability frameworks for AI-generated software and modernization workflows.

SGX Analytics LLC is an independent technology company focused on structured evaluation, governance, and risk calibration for AI-assisted software systems.

The firm develops reproducible benchmarking frameworks designed to measure correctness, reliability, and operational robustness in code generation and legacy modernization contexts.

All SGX Analytics frameworks are designed to support enterprise and regulated environments where traceability and reproducibility are essential.

Corporate Structure

Open Ratings™ is a trademark and product of SGX Analytics LLC. All reliability grades are issued under the Open Ratings framework and are administered by SGX Analytics.