Standardization, AI, and the processing costs of sustainability disclosures in the field
Abstract This paper studies how standardization and Artificial Intelligence affect the processing costs of sustainability disclosures. Using field data from a large data collection project, I document how processing costs vary across disclosure formats and with experience. I complement this evidence with a field experiment that randomizes access to more versus less standardized reports and to a generative AI assistant. Processing costs, measured by time and accuracy in assessing firms' performance, are significantly reduced by standardization but not by access to the AI. AI also reduces the beneficial effects of experience and standardization.
One. Introduction
The rate of firms disclosing sustainability information has massively increased in recent years. This is because society is demanding greater accountability and transparency amid growing environmental and social crises. The rapid increase, however, has left many concerned about information overload, i.e., high costs associated with processing these disclosures due to low comparability and high discretion. Responding to these concerns, the European Union has developed the Corporate Sustainability Reporting Directive, a far-reaching disclosure mandate standardizing reports to "make sustainability information easily accessible for users." Processing these reports, which involves sifting through lengthy documents and identifying relevant information, is itself undergoing a fundamental transformation. This is driven by the advent of generative Artificial Intelligence, which potentially renders the benefits of standardization on processing costs irrelevant. In this paper, I document the extent and drivers of these costs, and then examine the effects of standardization through the CSRD and access to AI.
I focus on sustainability reporting not only because of its increasing relevance as a form of 'targeted transparency' employed to address societal challenges, but also because it provides a suitable setting to study the effects of standardization. Although firms have been publishing sustainability reports since at least the 2000s, these disclosures have frequently been labeled as greenwashing and criticized as mere impression management. Moreover, absent mandatory regulation, firms had considerable discretion over what information to present and how to present it. Studies find that firms bias their reports by employing visual information or rhetorical arguments, ultimately reducing their usefulness. Thus, standardization might be especially powerful in this setting.
Research on standardization in the financial reporting domain, such as through the introduction of International Financial Reporting Standards (IFRS), finds mixed effects on reporting outcomes such as transparency and comparability. While many earlier studies have documented benefits from adoption, later studies argue that concurrent changes in the institutional environment and the relevance of reporting incentives also play a significant role. Because IFRS are principles-based standards, these factors weigh heavily in that setting. Against this backdrop, the sustainability reporting domain provides an interesting contrast.
The CSRD was explicitly motivated by the EU's dissatisfaction with the earlier Non-Financial Reporting Directive, which did not deliver the hoped-for benefits with regard to comparability. Thus, in 2020, the EU decided to introduce a disclosure mandate and to entrust its European Financial Reporting Advisory Group with developing a set of reporting standards. The twelve European Sustainability Reporting Standards (ESRS) detail what environmental, social, and governance disclosures affected companies must provide. Moreover, the standards are much more prescriptive about the format in which disclosures must be made, possibly decreasing the influence of reporting incentives. However, given their recent introduction, the standards lack a proper enforcement and assurance infrastructure.
I leverage insights from a data collection project on the Sustainability Reporting Navigator (SRN) to better understand the determinants of processing costs and to inform my experimental design. To inform the regulatory debate on the "CSRD-preparedness" of European firms, the SRN hosts a data entry platform where students, as part of their final assignment at one of the SRN's affiliated universities, collect firms' data points from the two most widely used standards: ESRS E1 on climate change (e.g., a firm's gross Scope 1 GHG emissions) and ESRS S1 on own workforce (e.g., a firm's share of minority workers in its workforce). Starting in March 2023, 143 undergraduate and graduate students from four public German universities collected over 750,000 data points on 851 unique firms' reporting practices (4,453 firm-years). The data points are derived directly from the standards themselves, i.e., they represent the elements an analyst would have to collect to create sustainability benchmarks and analyses.
Monitoring students' data collection allows me to construct a granular measure of information acquisition costs, both for individual data points and aggregated to the level of a firm's emissions and own-workforce reporting. Information processing is commonly divided into three steps that are then linked to specific costs: awareness, acquisition, and integration. I proxy for acquisition costs by the time it takes a student to browse the firm's report, identify the data point, and enter the corresponding value in the data collection platform. This captures the cognitive costs of reading and extracting specific information from a firm's report.
I find that collecting the data points for one firm-year takes about thirty to forty minutes on average. Data points that are uncommon, as well as qualitative data points (i.e., descriptions of methods used or explanations), take longer to collect. Experience gained through the process, i.e., completing data collection for additional firms, as well as experience gained outside the process, e.g., through previous internships or personal investing activities, significantly reduce the acquisition time of the disclosures. Data points that are highly dispersed across the report, as well as those located in more complex texts, are harder to process, which complements findings from prior literature. Presenting information in tabular format, however, aids processing.
Having documented that some of the dimensions explicitly addressed by the CSRD are associated with lower processing costs, I now turn to my field experiment. The experiment allows me to causally study how the 'composite' effect of the CSRD, as well as access to AI, affects processing. It is integrated into a course on GHG emissions reporting under the CSRD, hosted on the SRN Academy and catering to sustainability practitioners and accounting students. To successfully complete the course and gain a certificate, participants have to extract information from sustainability reports and analyze them. Compared to the mere data acquisition task from the first part of the paper, I add an evaluation component to better resemble the task of an ESG analyst who actually evaluates and compares the firms' performance. I randomly allocate participants to groups that receive reports from either before or after the CSRD, and that either have or do not have access to a custom AI tool. To study how experience influences processing and to reduce the risk of participants simply re-doing the task with new credentials, I prepare three industries.
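The two-by-two allocation described above can be illustrated with a short sketch. It is a hypothetical illustration only, not the procedure used in the experiment: the balanced rotation through the four cells, the random ordering of the three industries, the fixed report regime per participant, and all names (`Assignment`, `assign`, `INDUSTRIES`) are assumptions introduced here.

```python
import random
from dataclasses import dataclass

# Placeholder industry labels (assumption; the actual industries are not named here).
INDUSTRIES = ["industry_a", "industry_b", "industry_c"]


@dataclass(frozen=True)
class Assignment:
    participant_id: str
    report_regime: str      # "pre_csrd" or "post_csrd"
    ai_access: bool         # True if the participant may use the AI tool
    industry_order: tuple   # order in which the three industries are shown


def assign(participant_ids, seed=0):
    """Allocate participants to the four cells of a 2x2 design.

    Assumption: participants are shuffled and then rotated through the
    cells so that cell sizes differ by at most one.
    """
    rng = random.Random(seed)
    cells = [(regime, ai)
             for regime in ("pre_csrd", "post_csrd")
             for ai in (False, True)]
    ids = list(participant_ids)
    rng.shuffle(ids)
    assignments = []
    for position, pid in enumerate(ids):
        regime, ai = cells[position % len(cells)]
        # Assumption: each participant sees the industries in a random order.
        order = tuple(rng.sample(INDUSTRIES, k=len(INDUSTRIES)))
        assignments.append(Assignment(pid, regime, ai, order))
    return assignments
```

Fixing the seed makes the allocation reproducible, which helps when the assignment has to be audited after the fact.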
My primary measure is an aggregate score based on the time it takes participants to finish the task and the accuracy of their answers to three questions. Specifically, participants evaluate which of an industry's three firms has the lowest absolute emissions levels, the lowest emissions intensity, and the most ambitious climate goal. The questions are based on a careful review of ESG analyst reports and are designed to capture a comprehensive view of the firms' greenhouse gas emissions performance. I select industries for which climate is a material ESG matter, thereby ensuring that the firms already disclosed their emissions performance before the CSRD. Finally, and similar to the first part of the paper, I elicit participants' prior experience in a pre-experimental questionnaire.
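To make the idea of a combined time-and-accuracy measure concrete, the following sketch shows one possible construction. The exact formula is not given in this section, so everything here is an assumption: accuracy and time are standardized across participants, and the score is the average of standardized accuracy and negated standardized time. The function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # No variation across participants: every z-score is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def aggregate_scores(minutes, correct_answers, n_questions=3):
    """Combine speed and accuracy into one score per participant.

    Assumption: higher accuracy and lower time both raise the score,
    and the two components receive equal weight.
    """
    accuracy = [c / n_questions for c in correct_answers]
    z_accuracy = zscores(accuracy)
    z_time = zscores(minutes)
    return [(a - t) / 2 for a, t in zip(z_accuracy, z_time)]
```

Under this construction, lower processing costs correspond to a higher score, and a participant who is both faster and more accurate than another always receives the higher score.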
For my first treatment arm, receiving pre-CSRD versus post-CSRD reports, I find no significant difference between standardized and unstandardized reports in the time participants take to conduct their analysis and answer the questions. Participants who analyze standardized CSRD reports, however, are fifteen percent more accurate in answering the three questions. Given that the firms' environmental sections published after the CSRD are significantly longer than their pre-CSRD counterparts, the results suggest that a standardized way of presenting information makes up for increases in reporting volume, bolstering accessibility and understandability. Moreover, the benefits of standardization are concentrated in the group with above-median prior experience, suggesting that experienced users gain the most from standardized reports.
But to what extent are the benefits of standardization still relevant? Since the first discussions about mandating sustainability reporting standards in the EU took place in 2020, technological progress has accelerated. Given the widespread use of (generative) artificial intelligence, one question is whether relying on AI can substitute for increased standardization. Many expected AI to significantly alter the nature of jobs and replace humans in manual 'knowledge tasks'. Given its potential to quickly sift through vast amounts of unstructured data, AI can reduce the frictions that cause humans to incur processing costs. Recent large-scale randomized controlled trials, for example, document efficiency gains of ten to fifteen percent for call-center agents and of up to twenty-six percent for software developers.
More and more commentators, however, cast doubt on the qualities of large language models. As most enterprise generative AI pilots fail to deliver financial returns, enterprise users continue to "prefer ChatGPT for simple tasks, but abandon it for mission-critical work". This manifests in a decreasing AI adoption rate among U.S. firms, decreasing levels of trust over the last two years, and employees' dissatisfaction when receiving "workslop," i.e., passable-looking documents or presentations created with AI and little effort. Moreover, it remains unclear which of the more knowledge-intensive tasks AI can perform sufficiently well. In their randomized controlled trial with strategy consultants using generative AI, Dell'Acqua et al. argue for a "jagged technological frontier" as they find that performance improvements manifest only for a set of tasks which are "surprising and not immediately obvious to individuals or even to producers of large language models themselves". It is an open question, therefore, how generative AI affects the processing of complex corporate disclosures.
To answer this question, I present results from the second treatment arm of my field experiment, which involves access to a custom AI tool that I specifically designed to aid the extraction and interpretation of sustainability information. The AI tool works similarly to ChatGPT, but includes a retrieval augmented generation architecture to alleviate concerns of "hallucination". This means that the chatbot injects the pages of the report that are most similar to the users' query so that the large language model can rely on the firms' actual disclosures, not just its training data. The architecture thus combines the generative-interactive capabilities of large language models with domain-specific knowledge, increasing accuracy and speed compared to "off-the-shelf" models.
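The retrieval step of such an architecture can be sketched in a few lines. This is a toy illustration under stated assumptions, not the tool's implementation: it ranks report pages by bag-of-words cosine similarity using only the Python standard library, whereas a production system would more plausibly use learned text embeddings. The function names and the prompt wording are hypothetical, and the call to the language model itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(pages, query, k=2):
    """Return the indices of the k pages most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(Counter(tokenize(page)), query_vec), index)
              for index, page in enumerate(pages)]
    # Highest similarity first; ties are broken by page order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [index for _, index in scored[:k]]


def build_prompt(pages, query, k=2):
    """Inject the retrieved pages into the prompt sent to the language model."""
    context = "\n\n".join(f"[page {i + 1}]\n{pages[i]}"
                          for i in retrieve(pages, query, k))
    return ("Answer using only the report excerpts below.\n\n"
            f"{context}\n\nQuestion: {query}")
```

Because the model only sees the retrieved pages alongside the question, its answer can be checked against the cited page numbers, which is the mechanism by which this design aims to limit hallucination.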
I do not find that access to the AI tool reduces processing costs. On the contrary, participants using AI show lower aggregate scores. This holds irrespective of whether participants see standardized or unstandardized reports, and the negative effect appears in both dimensions, time spent and accuracy of the answers. This, however, is not necessarily due to low accuracy of the AI chatbot. My analysis of the prompts and responses reveals that, when prompted for the Scope 1 emissions (i.e., the subject of the first question and the only question that an AI can answer definitively), the bot responds with the exact value in eighty percent of cases. These findings align with research highlighting that users have difficulty trusting AI output, especially in high-stakes settings. In addition, using AI has detrimental effects on learning. I show that the internal experience gained from completing more rounds only occurs in the group not using AI. Also, the benefits of standardization are concentrated in the groups without access to the AI chatbot. Using AI does, however, drive engagement: AI-equipped participants do not show survey fatigue or signs of attrition over the three rounds, unlike the group manually browsing the reports.
My paper offers several contributions. First, I contribute to research on the standardization of disclosures through mandatory regulation. Whereas studies examining the standardization of financial reporting through IFRS generally find small or no effects on comparability or reporting quality, my study shows direct benefits of standardization for sustainability reporting. This is plausible since sustainability reporting started from lower levels of standardization and exhibits unique characteristics: it features a diverse range of topics with different metrics, involves little to no monetization, and speaks to a broader audience than financial reporting.
Second, I add to a growing literature that examines the use of (generative) AI in professional contexts. A recent meta-analysis reveals that human-AI combinations, on average, do not perform better than either group alone, calling for more research on how AI might lead to 'skill atrophy.' I add to this emerging literature by documenting that in a high-stakes setting, participants do not necessarily have an advantage with AI. In addition, I show that using AI is detrimental to learning.
My findings also have practical relevance to standard-setters in light of the currently debated changes to the CSRD, the "Omnibus initiative." Here, the European Commission proposes to significantly cut the number of data points and make more disclosures voluntary. My findings show that standardizing sustainability reporting through the CSRD has already helped to reduce processing costs. Especially since the CSRD standardized reporting along key dimensions that, in the voluntary setting, have increased processing costs, it is questionable if allowing for more reporting discretion is beneficial. This is in line with calls by many investors, NGOs, and other groups representing users of this reporting who argue that, as the disclosures are increasingly used for investment decisions, they require standardized information. From a welfare perspective, however, these benefits have to be weighed against the costs of the increased disclosures through the CSRD.