Ethics and Job Apps: Goodhart’s Law and the Temptation Towards Dishonesty
In the first post in this series, I discussed a moral issue I ran into as someone running a job search. In this post, I want to explore a moral issue that arose when applying to jobs, namely that the application process encourages a subtle sort of dishonesty.
My goal, when working on job applications, is to get the job. But to get the job, I need to appear to be the best candidate. And here is where the problem arises. I don’t need to be the best candidate; I just need to appear to be the best candidate. And there are things that I can do that help me appear to be the best candidate, whether I’m actually the best candidate or not.
To understand this issue, it will be useful to first look at Goodhart’s law, and then see how it applies to the application process.
Goodhart’s Law
My favorite formulation of Goodhart’s law comes from Marilyn Strathern:
When a measure becomes a target, it ceases to be a good measure.
To understand what this means, we need to understand what a measure is. Here, we can think about a ‘measure’ as something you use as a proxy to assess how well a process is going. For example, if I go to the doctor they cannot directly test my health. Rather, they can test a bunch of things that act as measures of my health. They can check my weight, my temperature, my blood pressure, my reflexes, etc. If I have a fever, then that is good evidence I’m sick. If I don’t have a fever, that is good evidence I’m healthy. My temperature, then, is a measure of my health.
My temperature is not the same thing as my health. But it is a way to test whether or not I’m healthy.
So what Goodhart’s law says is that when the measure (in this case temperature) becomes a target, it ceases to be a good measure. What would it mean for it to become a target? Well, my temperature would be a target if I started to take steps to make sure my temperature remains normal.
Suppose that I don’t want to have a fever, since I don’t want to appear sick, and so, whenever I start to feel sick I take some acetaminophen to stop a fever. Now my temperature has become a target. So what Goodhart’s law says is that now that I’m taking steps to keep my temperature low, my temperature is no longer a good measure of whether I’m sick.
This is similar to the worry that people have about standardized tests. In a world where no one knew about standardized tests, standardized tests would actually be a pretty good measure of how much kids are learning. Students who are better at school will, generally, do better on standardized tests.
But, of course, that is not what happens. Instead, teachers begin to ‘teach to the test.’ If I spend hours and hours teaching my students tricks to pass standardized tests, then of course my students will do better on the test. But that does not mean they have actually learned more useful academic skills.
If teachers are trying to give students the best education possible, then standardized tests are a good measure of that education. But if teachers are instead trying to teach their kids to do well on standardized tests, then standardized tests are no longer a good measure of academic ability.
When standardized tests become a target (i.e., when we teach to the test) then they cease to be a good measure (i.e., a good way to tell how much teachers are teaching).
We can put the point more generally. There are various things we use as ‘proxies’ to assess a process (e.g., temperature to assess if someone is sick). We use these proxies, even though they are not perfect (e.g., you can be sick and not have a fever, or have a fever and not be sick), because they are generally reliable. But because the proxies are not perfect, there are often steps you can take to change the proxy without changing the underlying thing that you are trying to measure (e.g., you can lower people’s temperature directly, without actually curing the disease). And so the stronger the incentive people have to manipulate the proxy, the more likely they are to take steps that change the proxy without changing what the proxy was measuring (e.g., if you had to make it to a meeting where they were doing temperature checks to screen out sick people, you’d be strongly tempted to take medicine to lower your temperature even if you really are sick). And because people are taking steps to directly change the proxy, the proxy is no longer a good way to test what you are trying to measure (e.g., you won’t be able to screen out sick people from the meeting by taking their temperature).
The thing is, Goodhart’s law explains a common moral temptation that we have to prioritize appearances.
Take, as an example, an issue that comes up in bioethics. Hospitals have a huge financial incentive to do well on various metrics and ratings. One measure is what percentage of patients die in the hospital. In general, the more a hospital contributes to public health, the lower the percentage of patients who will die there. And indeed, there are all sorts of ways a hospital might improve their care, which would also mean more people survive (they might adopt better cleaning norms, they might increase the number of doctors on shift, they might speed up the triage process in the emergency room, etc.). But there are also ways that a hospital could improve their numbers that would not involve improving care. For example, hospitals might refuse to admit really sick patients (who are more likely to die). Here the hospital would increase the percentage of their patients who survive, but would do so by actually giving worse overall care. The problem is, this actually seems to happen.
Student Evaluations?
So how does Goodhart’s law apply to my job applications?
Well, there is an ever-present temptation to do things that make me appear to be a better job candidate, irrespective of whether I am the better candidate.
The most well-known example of this is student course evaluations. One of the ways that academic search committees assess how good a teacher I will be is by looking at my student evaluations. At the end of each semester, students rate how good the class was and how good I was as a teacher.
Now, there are two ways to improve my student evaluations. First, I can actually improve my teaching. I can make my class better, so that students get more out of it. Or… I can make students like my class more in ways that have nothing to do with how much they actually learn. For example, students who do well in a class tend to rate it more highly. So by lowering my grading standards, I can improve my student evaluations.
Similarly, there are various teaching techniques (such as interleaving) which studies show are more effective at teaching. But studies also show that students rate them as less effective. Why? Because the techniques force the students to put in a lot of effort. Because the techniques make learning difficult, the students ‘feel like’ they are not learning as much. (Here is a nice video introducing this problem.)
One particularly disturbing study drives this point home. At the U.S. Air Force Academy, students are required to take Calculus I and Calculus II. They are required to take Calculus II even if they do very poorly in the first class (you can’t get out of it by becoming a humanities major). The cool thing about this data is that all students take the same exams which are independently graded (so there is no chance of lenient professors artificially boosting grades).
So what did the researchers find when they compared student evaluations and student performance? Well, if you just look at Calculus I, the results are what you’d naturally expect. Some professors were rated highly by students, and students in those classes outperformed the students of other teachers on the final exam. It seems, then, that the top-rated teachers did the best job teaching students.
However, you get a very different result if you then look at Calculus II. There, what they find is that the students who did the best in Calculus I (and who had the top-rated teachers) did the worst in Calculus II.
The researchers conclude that “our results show that student evaluations reward professors who increase achievement in the contemporaneous course being taught, not those who increase deep learning.” Popular teachers are those who ‘teach to the test,’ who give students tricks to help them answer the immediate questions they will face. Teachers who actually force students to do the hard work of understanding the material receive worse evaluations because students find the teaching more difficult and less intuitive. And because difficult, unintuitive learning is what is actually required to learn material deeply, there is an inverse correlation between student ratings and student learning.
Student evaluations are intended to be a measure of teaching competence. However, because I know they are used as a measure of teaching competence, there is constant temptation to treat them as a target – to do things that I know will improve my evaluations, but not actually improve my teaching.
Generalizing the Problem
Student evaluations are one example of this, but they are not the only one. There are tons of ways that measures become targets for job applicants. Take, for example, my cover letter.
For each job I apply for, I write a customized cover letter in which I explain why I’d be a good fit for the job. This cover letter is supposed to be a measure of ‘fit’. The search committee looks at the letter to see if my priorities line up with the priorities of the job.
The problem, however, is that I change around parts of my cover letter to fit what I think the search committee is looking for. My interests are wide-ranging, and my interests are likely to remain wide-ranging. But in my cover letters I don’t emphasize all of these things to the extent of my actual interest. In jobs in normative ethics, I focus my cover letter on my work in normative ethics. For teaching jobs, I focus on my teaching. In other words, I write my cover letter to try and make it look like my interest concentrations match up with what the search committee is looking for.
My cover letters become a target. But because they become a target, they cease to be a good measure.
Another example is anything people do just so that they can reference it in their applications. If a school wants a teacher who cares about diversity, then they may want to hire someone involved in their local Minorities and Philosophy chapter. But, of course, they don’t want to hire someone involved in that chapter just so that they appear to care about diversity.
Similarly, if a school wants to hire someone interested in the ethics of technology, they don’t want to hire someone who wrote a paper on AI ethics just so that they can appear competitive for technology ethics jobs.
Anytime someone does something just for appearances, they are targeting the measure. And by targeting the measure, they damage the measure itself.
Is This a Moral Problem?
As a job applicant, I face a strong temptation to ‘target the measure.’ I am tempted to improve how good an applicant I appear to be, and not how good an applicant I am.
When I give in to that temptation, am I doing something morally wrong? I think so: pursuing appearances is a form of dishonesty. It’s not exactly a lie. I might really think I’m the best applicant for the job. In fact, I might be the best applicant for the job. So it’s not that by pursuing appearances I’m trying to give the other person a false belief. But even when I’m trying to get them to believe something true, I’m presenting the wrong evidence for that true conclusion.
Let’s consider three versions of our temperature example.
Case 1: Suppose as a kid I wanted to stay home from school. Thus, I ‘fake’ a fever by sticking the thermometer in 100ºF water before showing it to my mom. My mom is using ‘what the thermometer says’ as a measure of whether I’m sick. I take advantage of that by targeting the measure, and thus create a misleading result.
Clearly what I did was dishonest. I was creating misleading evidence to get my mom to falsely believe I was sick. But the dishonesty does not depend on the fact that I’m healthy.
Case 2: Suppose I really think I’m sick (and that I really am sick), but I know my mom won’t believe me unless I have a fever. And again I stick the thermometer in 100ºF water before showing it to my mom.
Here I’m trying to get my mom to believe something true, namely that I’m sick (just as in the application where I’m trying to get the search committee to reach the true conclusion that I’m the best person for the job). But still it’s dishonest. One way to see this is that the evidence (what the thermometer says) only leads to the conclusion through a false belief (namely that I have a fever). But the dishonesty does not depend on that false belief.
Case 3: Suppose I know both that I am sick and that my mom won’t believe me unless I have a fever. I don’t want to trick her with the false thermometer result, and so instead I take a pill that will raise my temperature by a few degrees, thereby giving myself a fever.
Here my mom will look at the evidence (what the thermometer says), conclude I have a fever (which is true), and so conclude I am sick (which is also true). But still what I did was dishonest. It was dishonest not because it brought about a false belief, but because I was targeting the measure. It’s dishonest because I’m giving ‘bad evidence’ for my true conclusion. I’m getting my mom to believe something true, but doing so by manipulation. I’m weaponizing her own ‘epistemic processes’ against her.
Now, this third case seems structurally similar to all the various steps people take to improve their ‘appearance’ as a job applicant. Those steps all ‘target the measure’ in a way that damages the sort of evidential support the measure is supposed to provide.
It seems clear that honesty requires that I not take steps to target my student evaluations directly. Similarly, it would be dishonest to put extra effort into classes that will be observed by my letter writers. I recognize that it is morally important to avoid this sort of propaganda. For example, if I’m going to give end-of-semester extra credit, I will wait till after evaluations are done just to make sure I’m not tempted to give that extra credit as a way to boost my evaluations.
But those are the (comparatively) easy temptations to avoid. It’s easy to not do something just to make yourself appear better. What is much harder is being equally willing to do something even knowing it will make you appear worse. For example, there are times when I’ve avoided giving certain really difficult assignments or covering certain controversial topics which I think probably would have been educationally best, because I thought there was a chance they might negatively affect my teaching evaluations. It’s much easier to avoid doing something for a bad reason than it is to avoid refraining from something for a bad reason.
Conclusion
Once you start noticing this temptation to ‘play to appearances’ you start to notice it everywhere. In this way it’s like the vice of vainglory.
In fact, you start to notice that it might be at play in your very posts about the problem. If a potential employer is reading this piece, I expect it reflects well on me. I think it gives the (I hope, true) impression that I try to be unusually scrupulous about my application materials. And that is not necessarily dishonest, but it is dishonest if I would not have written this piece except to give that impression. So is that the real reason I wrote it?
Honestly, I’m not sure. I don’t think so, but self-knowledge is hard for us ordinary non-saintly people. (Though I’ll leave that topic for a future post.)