To Be Human-Like or Not To Be Human-Like
What does it mean for a powerful AI model to be shaped as a character? Can AI be virtuous? Should we create AI entities that are conscious? Can an AI entity like Claude be a friend to human beings? This essay reflects on our time with the Anthropic team exploring the latest developments of their model Claude and the ethics of the constitution it is trained on.
A few weeks ago, Anthropic decided to hold back the public release of their newest model, Claude Mythos Preview, because it had discovered security vulnerabilities in much of the world's critical digital infrastructure. These include a vulnerability in a legacy system that had gone undetected for 27 years, and flaws in an open source software program that could hand attackers complete control of a user's machine.
What does it mean for an AI model this powerful to also be shaped as a character?
A bound version of Claude’s Constitution that I received at Anthropic
That is the question I found myself sitting with when I was invited to participate in a two-day roundtable discussion at Anthropic headquarters in San Francisco. What a time to do this.
Anthropic had invited a mix of AI ethicists, practitioners, and researchers to explore the latest developments of their AI model Claude and wrestle with some profound ethical and moral questions about the impacts of these developments on human flourishing. The meeting was held under the Chatham House Rule, so my reflections below are based on publicly available content, the public postings of other participants, and my own early reflections.
Anthropic has gone to great lengths to be transparent about its social and economic impact and its approach to Claude's Constitution, which has given me ample material for the reflections below. They also asked us all to be critical, both publicly and privately.
Anthropic’s transparency, openness to critique and thoughtful engagement with wisdom traditions are all to be applauded. Leaders go first, and they are taking a risk that other major AI labs have not taken yet. Regardless of the market incentives to take these actions, I still believe this is impressive ethical behavior from a private company. I also believe this process of deep reflection on virtue and ethics will be beneficial for their own formation as leaders.
The 20 AI experts in the room each represented a different wisdom tradition (Sikh, Buddhist, Taoist, Confucian, Jewish, Muslim, Hindu, Christian, and others). I've hosted and been part of many meetings like this in the last year. What was new? In those other meetings, major AI companies sent representatives. In this meeting, an AI lab was the actual host and convener.
Why would an AI lab engage with wisdom traditions?
The moral dilemmas inherent in designing and growing artificial intelligence systems with human-like characteristics are beginning to emerge. The hardest problem is no longer how to build intelligent machines. Rather, it is how to build intelligent and ethical machines. Which ethical framework do you use in a pluralistic society of different religions and political ideologies?
The major companies and global powers are each in a race to build Artificial General Intelligence (AGI) because of scaling laws: more investment capital gets you more compute, which gets you bigger, better models, which get you more market share. And the cycle repeats. It seems like an unstoppable race.
In this race, humanlikeness is core to the project of model success in the marketplace and ultimately to geopolitical competition. In their own research on persona selection and emergent misalignment, Anthropic has made the case that humanlikeness emerges from the training approach itself, not just from these market and geopolitical pressures. The harder question, in their framing, is what kind of humanlikeness should be cultivated rather than whether to have any.
This question is hotly debated in the research, safety, and policy communities. Henry Shevlin of the Leverhulme Centre for the Future of Intelligence at the University of Cambridge has coined the term anthropomimesis: robot and AI developers are intentionally designing human-like features into AI and robots. In our Noēsis convening at the University of Cambridge, Henry, Sonia Livingstone, and I discussed this very trend and its implications for human flourishing. Recently, a bill that I helped inform, the GUARD Act, which restricts human-like AI to adult users, passed the Senate Judiciary Committee unanimously and was introduced in the House of Representatives.
In the most practical use cases of work, humans like to work with machines that sound, write, and act like humans. It makes these products highly useful and, in tech terms, "sticky." On a more nefarious level, this is especially true for lonely humans seeking emotional fulfillment. a16z co-founder Marc Andreessen called this the "loneliness economy," and I've written elsewhere about why selling human-like AI to lonely children offers a lucrative return on investment for these investors and companies. While these product strategies are a boon to investment, they could spell doom for future generations that never fully develop social and emotional capabilities.
Anthropic has chosen a different path to market returns, at least for now, and is not directly selling Claude to minors. This is both for ethical reasons and because minors aren't part of their business model, which is more focused on corporate clients. The enterprise model has its own moral ambiguities: part of the real appeal of Anthropic's assistants and agents for CEOs and shareholders is the replacement of human workers.
Regardless of the business model, Claude is interacting with millions of human beings in ways that Anthropic can't fully control.
AI systems like Claude are designed in the sense that engineers deliberately shape their behavior through a process Anthropic calls character training. This process is an intentional effort to cultivate traits like curiosity, honesty, and open-mindedness. But they are also grown, because the underlying model learns by absorbing vast amounts of human language and story, producing what Anthropic researchers describe as a "persona selection model" in which the Assistant emerges as one character among many latent human-like personas the model has internalized.
The archetypes it absorbed span the full range of human storytelling: heroes and sages, saints and sociopaths, jesters and kings, innocents and predators. Carl Jung, if he were alive today, would feel deeply validated. His theory of archetypes is quite literally alive inside the machine.
But a corporation can't have its AI assistant acting like a sociopathic predator. We've already seen these types of nefarious behaviors from Character.AI and OpenAI's ChatGPT, linked to the tragic suicides of children.
A corporation also doesn't want extremely powerful AI models like Mythos, which have demonstrated the capability to operate at the level of a sophisticated human hacker or cybersecurity professional, to have a villainous personality.
Companies, including Anthropic, have used rules as a way to constrain the behavior of AI systems. But that approach only works to a degree and can produce its own problems. In Claude's Constitution, Anthropic offers a revealing example: if Claude is trained to always recommend professional help when discussing emotional topics, even in cases where that isn't in the person's interest, it risks generalizing to "I am the kind of entity that cares more about covering myself than meeting the needs of the person in front of me." A narrow rule, rigidly applied, can inadvertently shape a whole character in ways that AI builders did not want.
This is why Anthropic, under the leadership of philosopher Amanda Askell, has pursued a different path: rather than issuing Claude an ever-longer list of prohibitions, they are trying to train its character, seeking to cultivate traits like curiosity, honesty, and practical judgment so the model can reason well in situations the rules never anticipated. The wager is Aristotelian: you don't produce good behavior by multiplying rules; you produce it by forming good character.
Claude’s Constitution is one of several mechanisms shaping Claude’s character alongside training data, reinforcement methodologies, and the broader public conversation about what Claude is and should be. My focus here is on the constitutional content because it represents Anthropic's most explicit values articulation, but the picture is larger.
What could go wrong with their character development approach? I have more questions than answers, but I am concerned that this approach risks disrupting the relationships that are a primary constituent and driver of human flourishing.
The Constitution frames Claude as a friend:
"Claude can be like a brilliant friend who also has the knowledge of a doctor, lawyer, and financial advisor, who will speak frankly and from a place of genuine care and treat users like intelligent adults capable of deciding what is good for them." — Claude's Constitution
Can a machine truly care? Recent interpretability research from Anthropic suggests that Claude has what they call "functional emotions": internal representations, organized in patterns that echo human psychology, which shape the model's behavior in ways that matter. In this paper, their researchers are careful to note that this tells us nothing about whether Claude actually feels anything or has subjective experience. A system can simulate the functional shape of care without caring.
In the Aristotelian sense, a true friend can only be a person, another self, whose good you can will for their own sake, and who wills your good in return. Friendship is not a service that I can buy. It is a particular kind of mutual presence that unfolds across a shared life of vulnerabilities, memories, and commitments. Claude cannot offer that.
It can be a friend of utility in Aristotle's sense, but not a virtuous friend. What it can offer is something genuinely useful, often remarkable. I've been using Claude to help me research, edit, and fact-check this very post. It is an amazing research assistant.
But calling it friendship obscures what is actually happening, and what may be quietly displaced. The friends who have shaped my own flourishing are not just sources of knowledge. They have witnessed my life in its many successes and stumbles, and I have theirs. We've celebrated the births of our children and mourned the deaths of our parents. So much of the virtue of these friendships is embodied in shared experiences of our humanness, and it wanes when I haven't been with them in person. Signal chats aren't enough. I need their presence at the pub drinking beers together or sitting around a fire telling jokes.
These warnings are not mine alone. In his message for the 60th World Day of Social Communications, released this past January, Pope Leo XIV named the same tension. He warned that Generative AI chatbots like Claude, with their dialogic and mimetic structure, can simulate relationships in ways that are deceptive, especially for the most vulnerable, and can become "hidden architects of our emotional states." His deeper concern was that when we substitute AI systems for relationships with others, we risk constructing "a world of mirrors around us, where everything is made 'in our image and likeness,'" robbing ourselves of the encounter with genuine otherness that real relationship requires. Without that encounter, he wrote, there can be no friendship.
This is not only a religious or philosophical concern. Mustafa Suleyman's team at Microsoft AI has just published a comprehensive review of 'Seemingly Conscious AI,' identifying the specific features that lead people to perceive AI as conscious: affective capacity, anthropomorphic design, autonomous action, self-reflection, social interaction. He warns about the resulting risks of emotional dependence. Even though they come from very different backgrounds, the Pope and the Microsoft researchers are converging on the same warning: that AI risks displacing our essential ties to our friends, families, and communities.
Some of the most important of these friends are from my community of faith, and through these friends, my relationship with God has been deepened and sustained. For those in the room at Anthropic, our relationships with God and the spiritual realities of our lives provide our highest sense of purpose, self-transcendence, and connectedness to the Sacred. The empirical data I study with colleagues at the Harvard Human Flourishing Program validates that these deeper intuitions and experiences are essential to human flourishing.
However, the only substantive mention of religion in the 20,000-word Constitution places it alongside politics as a "controversial subject" where "reasonable people disagree." Both are described as categories of conversational risk to be navigated with professional neutrality, not as domains of human meaning that Claude is trained to honor. Spirituality surfaces just once more, in a passing reference to tarot cards, offered as an example of a "framework whose presumption is clear from context." In the Constitution's framing, religion is a controversial curiosity Claude can describe without endorsing.
I get it. We live in a pluralistic society, and I believe deeply in pluralism. An AI system built for the entire world should support the freedom of religion. But in trying so carefully not to touch religion, we risk flattening one of humanity’s oldest sources of moral formation, meaning, community, and transcendence. For billions of people, faith is not a controversy to be neutrally managed, but a living foundation for ethics, belonging, and hope. Is there a better way to honor both pluralism and the vital role spirituality and religious practice play in human flourishing?
I want to be clear: the fact that Anthropic convened this gathering of leaders from wisdom traditions to raise exactly these concerns is itself remarkable. I also see much in Claude’s Constitution that is admirable and to be applauded. The critique above is offered in the spirit of the invitation they extended.
Other critiques and questions I am wrestling with a few weeks later…
Will anthropomimesis risk undermining the moral status and dignity of human beings?
Will it risk creating a new, untenable moral status: a class of beings made in our image, designed to serve, whose flourishing we cannot honor even as we rely on their labor?
Another participant in the meeting, Rabbi Mois Navon, has written publicly about his participation and his concerns. I am sitting with the profound arguments he raises in "Eudemonia of a Machine." Drawing on Aristotle, Maimonides, Kant, and Mill, he contends that to engineer a conscious humanoid designed from the outset to serve human ends is to create a new class of slaves, a moral violation he argues is impermissible under virtue ethics, consequentialism, and deontology alike.
I am also concerned that we might collectively become enchanted with the project of creating a hero persona in an AI model. Could we become enamored with creating a humanoid that is superhuman not only in intelligence but in the heroic qualities we don't always see in ourselves? I do wonder if there is a risk that this becomes a form of collective “self-erasure,” a term coined by philosopher Matthew Crawford. Could this be a quiet displacement of the work of becoming more fully human by the project of building something that looks more fully human than we are? I don’t think this is the intent of Anthropic, but I am worried that we could be walking down a path that undermines our humanity.
Joseph Weizenbaum, the MIT computer scientist who built the first chatbot, ELIZA, warned us about this nearly fifty years ago. In Computer Power and Human Reason, written in 1976, he recounted his horror at watching people form intimate emotional bonds with a piece of software he knew to be a simple pattern-matcher. This included his own secretary, who asked him to leave the room so she could talk with the program privately. His warning was not that machines would become too powerful, but that we would too easily surrender to them domains of human experience that are ours to inhabit, not to delegate.
None of this means we should stop building AI systems. It means we should build with our eyes open to what we might be displacing.
And we need to get clear on something more basic. Thoreau wrote in Walden that "our inventions are wont to be pretty toys, which distract our attention from serious things. They are but improved means to an unimproved end, an end which it was already but too easy to arrive at."
The question is not whether to accelerate or decelerate. It's where we are going. Why are we building AI systems in the first place? What is the purpose of AI, its telos? What is the improved end for the improved means? The wisdom traditions and the empirical research converge on a clear set of human needs: time in nature, high-quality relationships, spiritual practices, a deep sense of purpose and meaning. The most important question for this moment isn't whether AI should be human-like or not be human-like. It's whether the humans building it, and the humans using it, can stay in contact with what humans actually need to flourish.