Overview
Open Ratings provides independent, cryptographically anchored quality assessments of software repositories. Our methodology is calibrated to the risk proportionality principles used by established credit rating agencies, adapted for software engineering.
Open Ratings evaluates a specific revision of a code repository — not a continuous stream, but a deliberate snapshot, analogous to a credit rating agency assessing a bond issuance at a point in time. Each rating reflects the quality, security, and reliability of that revision as assessed by our multi-dimensional analytical framework.
Our rating scale mirrors the letter-grade conventions used globally in credit markets (AAA through D), and our target grade distribution is calibrated to the observed distribution of corporate credit ratings maintained by the major rating agencies. This means a BBB from Open Ratings carries the same relative quality signal as a BBB in the bond market: adequate quality with room for improvement, but fundamentally sound.
Every rating is permanently recorded via Merkle-tree hashing and SHA-256 attestation, creating an immutable, independently verifiable record of when the assessment was performed, what grade was assigned, and what computational evidence supports the finding.
Disclosures
Grades shown for COBOL modernization outputs are benchmark-scoped and provisional with respect to dialect coverage. Current validation targets a defined benchmark suite. Validation across IBM Enterprise COBOL, GnuCOBOL, and Micro Focus dialect variants is being expanded. Grades reflect assessment of modernization output quality on the benchmark suite, not comprehensive dialect compliance.
Grades represent an assessment of model outputs on a defined benchmark suite at a specific point in time. They are not a guarantee of future performance, a warranty against defects in production modernization, or an endorsement of the model’s general capability. Model behavior may vary across tasks, parameter configurations, and COBOL program characteristics not represented in the current suite.
Rating Scale
| Grade | Score Range | Assessment | Definition | Target Dist. |
|---|---|---|---|---|
| **Investment Grade — Production Recommended** | | | | |
| AAA | 9.70 – 10.00 | Human + Auto | Exceptionally strong code quality. Near-zero defect probability. Verified through comprehensive automated analysis, formal methods, and expert human due diligence. Extremely rare. | < 0.05% |
| AA | 9.30 – 9.69 | Human + Auto | Very strong code quality, differing from the highest-rated repositories only to a small degree. Requires enhanced validation depth and expert review. | ~2% |
| A | 7.50 – 9.29 | Automated | Strong code quality. Sound architecture, adequate test coverage, no critical vulnerabilities detected. Somewhat more susceptible to quality degradation under rapid development than higher-rated repositories. | ~8% |
| BBB | 6.00 – 7.49 | Automated | Adequate code quality. Exhibits acceptable quality parameters, but adverse conditions are more likely to weaken quality over time. | ~15% |
| **Speculative Grade — Elevated Risk** | | | | |
| BB | 4.50 – 5.99 | Automated | Below-average quality. Faces ongoing uncertainties including incomplete test coverage, known vulnerabilities, or architectural weaknesses. | ~20% |
| B | 3.00 – 4.49 | Automated | Weak code quality. Currently functional, but adverse conditions will likely impair reliability. | ~25% |
| CCC | 1.50 – 2.99 | Automated | Substantial quality risk. Dependent on favorable conditions to avoid failure. Critical vulnerabilities or hallucinated dependencies detected. | ~18% |
| D | 0.00 – 1.49 | Automated | Non-functional or critically defective. Dependencies do not install, runtime errors on basic execution, or fundamental flaws rendering the code unsuitable for any production use. | ~12% |
Repositories rated BBB or above are considered investment grade — suitable for production use, dependency inclusion, and organizational adoption. Grades below BBB carry speculative-grade designation, indicating elevated risk. This threshold mirrors the BBB/BB boundary used globally in credit markets.
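As a concrete illustration, the score ranges and the BBB/BB boundary above reduce to a simple threshold lookup. This is a minimal sketch, not the production implementation; the function names are assumptions.

```python
# Illustrative only: thresholds copied from the rating scale table.
THRESHOLDS = [
    (9.70, "AAA"), (9.30, "AA"), (7.50, "A"), (6.00, "BBB"),
    (4.50, "BB"), (3.00, "B"), (1.50, "CCC"), (0.00, "D"),
]

def grade_for(score: float) -> str:
    """Map a composite quality score (0.00-10.00) to a letter grade."""
    for floor, grade in THRESHOLDS:
        if score >= floor:
            return grade
    raise ValueError(f"score out of range: {score}")

def is_investment_grade(grade: str) -> bool:
    """BBB and above are investment grade; BB and below are speculative."""
    return grade in {"AAA", "AA", "A", "BBB"}
```

Note that the boundaries are inclusive at the floor: a composite of exactly 6.00 is BBB, while 5.99 falls to BB.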
Target Distribution
Our rating distribution is calibrated to mirror the observed distribution of corporate credit ratings. This ensures that an Open Ratings grade carries proportional risk weight equivalent to its credit market counterpart. A AAA is as rare in software as it is in corporate bonds — today, only two U.S. companies maintain a AAA corporate credit rating.
These proportions are targets, not quotas. Distribution targets serve as calibration anchors — if observed distributions diverge significantly from targets, we investigate whether the divergence reflects the true state of software quality or a calibration error in our model.
Assessment Dimensions
Each repository is evaluated across six quality dimensions. Dimensions are weighted to reflect their relative contribution to overall software risk, calibrated through empirical analysis of production incident correlation.
| Dimension | Weight | What We Assess |
|---|---|---|
| Security | 25% | Vulnerability exposure, secret management, input validation, authentication patterns, dependency vulnerability propagation, and detection of hallucinated packages or API endpoints. |
| Logic Correctness | 25% | Semantic accuracy of implementation relative to stated intent, edge case handling, error propagation paths, type safety, and divergence between specifications and actual behavior. |
| Codebase Fit | 20% | Architectural coherence, consistency with project conventions, appropriate abstraction levels, and integration quality with existing modules. |
| Maintainability | 15% | Cyclomatic and cognitive complexity, nesting depth, function length, naming clarity, documentation coverage, and long-term modifiability. |
| Test Quality | 10% | Test effectiveness measured through mutation analysis: controlled defects are injected into the code, and mutants that survive without breaking any test indicate a weak test suite. |
| Dependency Health | 5% | Dependency freshness, known vulnerability exposure (CVE), license compatibility, maintainer activity, and whether declared dependencies actually exist. |
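The weighted aggregation implied by the table can be sketched as follows. The dictionary keys and the rounding behavior are assumptions for illustration, not the production scoring code.

```python
# Illustrative only: combine per-dimension scores (0-10) into a composite
# using the weights from the dimension table above.
WEIGHTS = {
    "security": 0.25,
    "logic_correctness": 0.25,
    "codebase_fit": 0.20,
    "maintainability": 0.15,
    "test_quality": 0.10,
    "dependency_health": 0.05,
}

def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of dimension scores; weights must total 1.0."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS), 2)
```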
Assessment Process
Ratings are produced through a multi-phase analytical pipeline. Each phase generates independently verifiable intermediate outputs that are cryptographically combined into a proof-of-work attesting to the depth and integrity of the assessment.
1. **Input Ingestion.** The target repository is cloned at the specified commit. Dependency manifests, language composition, framework indicators, and project metadata are extracted. If the code was generated or assisted by an AI coding agent, the agent may be identified through stylometric analysis.
2. **Static Analysis.** Abstract syntax tree parsing measures structural complexity across every function and module. Pattern-matching rules detect security anti-patterns, code duplication, and known vulnerability signatures. Declared dependencies are cross-referenced against authoritative registries.
3. **Sandboxed Execution.** The codebase is built and executed within an isolated containerized environment. Dependency installation is attempted and failures recorded. Test suites are executed and results captured. Mutation analysis introduces controlled defects to measure test suite effectiveness.
4. **Quality Scoring.** Dimension scores are computed from the combined static and dynamic analysis results, weighted per the dimension table, and aggregated into a composite quality score. The composite score is mapped to a letter grade per the rating scale thresholds.
5. **Proof-of-Work Generation.** Intermediate outputs from each phase are organized into a Merkle tree. A difficulty-adjusted cryptographic proof is generated, with difficulty proportional to the achieved grade.
6. **Cryptographic Anchoring.** The rating grade, composite score, proof-of-work hash, Merkle root, and timestamp are sealed via cryptographic attestation, creating an immutable, timestamped record.
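The Merkle-tree construction referenced in the final two phases can be sketched as follows, assuming SHA-256 hashing and duplication of the last node on odd-sized levels. The actual serialization of phase outputs is not disclosed, so the leaf encoding here is an assumption.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over hashed phase outputs (illustrative)."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # duplicate last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Because every intermediate output contributes a leaf, changing any phase's result changes the root, which is what makes the anchored record tamper-evident.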
Rating Tiers
Automated Assessment (D through A)
Ratings from D through A are produced entirely through the automated pipeline. No human judgment is involved in the scoring. A grade of A represents the highest confidence level achievable through automated analysis alone.
The automated ceiling exists by design. Software quality beyond the A threshold involves factors that no automated system can reliably assess: architectural intent that spans years of technical strategy, organizational commitment to maintenance, and the subtle distinction between code that is merely correct and code that is wisely constructed.
Enhanced Assessment (AA and AAA)
Grades of AA and above cannot be achieved through automated analysis alone. They require a structured human due diligence review conducted by senior analysts with extensive experience in software risk assessment.
For a repository to be considered for AA or AAA, it must first achieve a minimum automated score of A (7.50+). The automated score establishes the quantitative floor; the human review determines whether the qualitative factors support elevation above that floor.
AI Agent Attribution
Open Ratings may identify the AI coding agent that generated or assisted in generating the assessed code. This attribution is reported as part of the rating when detected with sufficient confidence. Different AI coding agents exhibit distinct failure mode profiles that affect the probability of undiscovered defects.
Agent attribution adjusts the weight given to specific assessment dimensions. If an agent is known to have elevated hallucination rates for a particular package ecosystem, the dependency health dimension is scrutinized more heavily.
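A minimal sketch of this reweighting, assuming a simple multiplicative boost followed by renormalization; the boost factor is a placeholder, not a published parameter.

```python
# Illustrative only: increase scrutiny of dependency_health when the
# attributed agent has elevated hallucination rates, then renormalize
# so the weights still sum to 1.0.
def adjust_weights(weights: dict[str, float],
                   boost: float = 1.5) -> dict[str, float]:
    adjusted = dict(weights)
    adjusted["dependency_health"] *= boost
    total = sum(adjusted.values())
    return {k: v / total for k, v in adjusted.items()}
```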
Cryptographic Verification
Every Open Rating is sealed via cryptographic attestation at the time of issuance. The on-chain record is a compact attestation containing the essential elements needed to verify the rating’s existence and integrity.
```
repo_hash:    SHA-256("owner/repository")
commit_hash:  first 20 bytes of rated commit SHA
grade:        letter grade enum (D=0 ... AAA=7)
score:        composite quality score × 100
pow_hash:     proof-of-work hash
merkle_root:  Merkle root of analysis phase outputs
timestamp:    Unix timestamp of rating issuance
prior_rating: address of previous rating for same repo (if any)
```
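Assembling these fields can be sketched as follows. The encoding choices (hex digest for `repo_hash`, fixed-point score scaled by 100) follow the layout above, but the function and its signature are illustrative assumptions.

```python
import hashlib
import time

def build_attestation(repo: str, commit_sha: str, grade: int,
                      score: float) -> dict:
    """Assemble the attestation fields from the record layout (sketch)."""
    return {
        "repo_hash": hashlib.sha256(repo.encode()).hexdigest(),
        "commit_hash": bytes.fromhex(commit_sha)[:20],   # first 20 bytes
        "grade": grade,                                  # D=0 ... AAA=7
        "score": int(round(score * 100)),                # fixed-point x100
        "timestamp": int(time.time()),
    }
```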
The attestation layer serves as a filing cabinet, not an analytical tool. It provides three guarantees: the rating existed at the claimed time, the rating has not been altered after issuance, and the full history of a repository’s ratings can be traversed by following prior_rating links.
Rating Types
Open Rating
Issued for public repositories. The rated code is publicly accessible, and any party can independently verify the codebase against the assessment findings.
Verified Rating
Issued for private repositories using the identical methodology. The underlying code is not publicly accessible, but the assessment was conducted under the same standards.
Shadow Rating
An unsolicited Open Rating issued without the request or involvement of the repository owner. Produced at our discretion for repositories of public interest.
Reproducibility
Each benchmark task is versioned and deterministic. Task definitions include meta.json, prompt.md, source files, and gold.json oracle checks. Runs log model identifiers, decoding parameters, timestamps, and per-task outputs. Suite hashes and prompt hashes are recorded for auditability.
The assessment pipeline produces independently verifiable intermediate outputs at each phase. These outputs are organized into a Merkle tree whose root is anchored on-chain, enabling third-party verification that the claimed analysis was actually performed to the claimed depth.
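Third-party verification against the anchored root can be sketched as a standard Merkle inclusion proof: given one phase's output hash and its sibling path, recompute upward and compare to the root. The left/right ordering convention here is an assumption.

```python
import hashlib

def verify_inclusion(leaf_hash: bytes,
                     path: list[tuple[bytes, str]],
                     root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path (sketch)."""
    node = leaf_hash
    for sibling, side in path:
        pair = sibling + node if side == "left" else node + sibling
        node = hashlib.sha256(pair).digest()
    return node == root
```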
Limitations & Disclaimers
An Open Rating is an assessment of code quality at a specific point in time for a specific revision. It is not a guarantee of future performance, a warranty against defects, or an endorsement of the project’s business viability.
Ratings from D through A reflect automated analysis only. No automated system can guarantee the absence of defects. The absence of detected issues is not proof that no issues exist.
AI agent attribution is probabilistic. Confidence levels below 80% are not published. Attribution reflects our best assessment based on stylometric analysis and is reported as an analytical finding, not a definitive determination.
Legal Basis for Shadow Ratings
Open Ratings publishes unsolicited assessments of public repositories under protections established by over a century of credit rating agency precedent and First Amendment jurisprudence. Our shadow rating practice is modeled on the long-standing practice of unsolicited credit ratings.
Methodology Disclosure
This document describes our assessment framework at a level of detail sufficient for users of our ratings to understand what a grade means, how it was derived, and what factors influence it. Certain implementation details are trade secrets of SGX Analytics LLC and are not disclosed. The assessment methodology is protected by U.S. Provisional Patent Application No. 63/983,531.
Independence
Open Ratings operates independently of the repositories it rates. We do not accept compensation from rated entities for the issuance of ratings, and the payment of fees for solicited ratings does not influence the grade assigned.