Browse good first issues to start contributing to open source (github.blog)
256 points by chmaynard on Jan 22, 2020 | 47 comments


In a way, I appreciate the attempt to encourage people to contribute to open source, but in another way, initiatives like this feel a lot like a cargo cult to me. I mean that in a very specific sense:

Traditionally, contributors to free software and open source, especially the more prolific ones, are often good programmers. This has made its way into the collective consciousness in a way that affects hiring, e.g. by considering open source contributions as a positive signal.

People who want to be software developers hear or read about this and want to replicate it, so they are looking for somewhere to contribute. Hence initiatives like in TFA, and GSoC etc.

What they ignore or just don't see is that those traditional contributors to open source largely self-selected by having some issue that they wanted to solve and then just fixing it. They come to the projects with their own issues, which may overlap with issues of others out of coincidence. The difference in motivation and internal drive is significant.

Of course it's still nice when people genuinely want to contribute to make the world a better place, but I'm skeptical about the support burden that these non-traditional contributors often put on project maintainers, because they tend to need more guidance on average and have less intrinsic understanding of the problem space and relevant use cases. Some projects grow structures to support it and manage just fine, but some don't and I think that also needs to be accepted.



It's analogous to the way that software companies train junior engineers: it takes time to define the tasks, set up infrastructure and tests to make development easy, review and provide feedback on contributions, and to encourage learning and further skill development.

While open source communities have been around for a while, we're entering an era where the size of the community is scaling up significantly (thanks to GitHub and improved connectivity worldwide), and this is exposing some of the entrenched issues around the maintainer burden you mention, gatekeeping, and project maturity.

There's a huge opportunity here to create an improved learning, accreditation and contribution ecosystem; one project I'd heard about recently in the UK is 'Code Grades'[0], which aims to provide a learning ladder for developers - whether they intend to code professionally or as a hobby.

There's always going to be a requirement to define tasks and review contributions - the trick will be to spread the load and make consensus-gathering more straightforward, while also preventing accidental & malicious bad actors (reputation and accreditation can probably help get some of the way there).

[0] - https://codegrades.com


Whether someone wants to fix code because they use and have a problem with it, or they just wanted to knock out a random 'good first bug', it's not really up to you.

Personally I think GitHub shouldn't have this feature for a number of unrelated reasons, but don't start excluding one-off noob contributors. Open source is open for a reason.


I'm not excluding them. If they do just knock out a random 'good first bug', then that's actually great and welcome. But there seems to be a pattern of people who heard that contributing to open source is good for their CV, this ends up being their only motivation, and they expect hand-holding from project maintainers or for their patches to be accepted without scrutiny (or, if they are receptive to feedback, their patches are just really low quality and require a lot of review iterations).

It's those last points that can be a problem, because it can cause their contribution to become net-negative.


> But there seems to be a pattern of people who heard that contributing to open source is good for their CV, this ends up being their only motivation

Honestly I don't care what motivates people to contribute to open source. Some people are privileged enough to contribute for fun in their free time. Some do it to improve their resume (not everyone is privileged enough to work at a FAANG or a reputable tech company, or to have gone to an Ivy League school). Some do it to improve the software they use. None of these reasons is honestly better than the others. At the end of the day, what matters is that the code gets committed to the repo. Nobody mentions why they made the contribution in the commit message. Don't worry about maintainers getting overwhelmed by contributors. It's a good problem to have. Most of them are smart enough to come up with a system if that problem actually arises. Honestly, the main reason most open source projects die is not too many contributions. It's a lack of contributors and a single maintainer having to do all the heavy lifting by themselves.


> Most of them are smart enough to come up with a system if that problem actually arises

Well, I'm afraid whatever the system might be, it's definitely not going to favor the newcomers.

(edit: formatting)


I think you are overestimating the number of contributors an average open source project gets. Almost no projects go to the extent of deliberately avoiding new contributors. It's really hard to get people to contribute code to most open source projects. A good maintainer is incentivized to make it easier for new contributors to contribute, not the other way around.


I kinda understand the point. However, at what point can one call himself a "good prolific programmer" who has no time for "hand-holding" and can judge whether a contribution is a sincere attempt by some thirteen-year-old who wants to learn or is just being done for a CV?

One can argue that every kid with their "low quality patch" is net-negative, as their teacher/project maintainer would be better off spending time writing actual "good" code.

At what point does one just start sounding like an elitist who thinks that open source should be left exclusively to prolific programmers?


> ... can judge whether a contribution is a sincere attempt by some thirteen-year-old who wants to learn or is just being done for a CV?

To me it’s no different. Whether they’re thirteen or thirty, I’m not obligated to donate time to help them grow. In fact I’d say “student drivers” deserve less of my time — I’m not a free tutor unless I explicitly agreed to such.

That said, I deal with one-off contributors with respect and handhold out of kindness when necessary (assuming they’re not rude). But I don’t feel like explicitly setting the expectation that I will handhold, especially on trivial patches that’ll take me less time on my own.


I agree with you and think you're well-intentioned. One thing I'd point out is that if you're a project maintainer, who would you rather mentor:

- the inexperienced person who uses the software themselves, has an issue with it, and wants to fix it, or

- the inexperienced person who wants to boost their CV and picked a bug off a "starter task" list.

Assuming they are otherwise equal, I'd certainly prefer the first one, because chances are much higher that they'll stick around. Motivations matter.

Now if a patch just appears out of nowhere, obviously I don't apply a litmus test where I try to divine what the person's motivation is.

However, this logic has turned me off initiatives like GSoC. GSoC is great when there's a student who's already working on a project anyway, and it enables them to up their contributions during summer as an alternative to working a more boring job. However, my impression of GSoC with randoms who specifically scan GSoC for a project to work on has been disappointing.


>but don't start excluding one-off noob contributors. Open source is open for a reason.

The idea isn't that excluding noobs is good, but that noobs are coming in not because they want to contribute but because they want to pad their resume. In doing so, they have diluted the pool to the point that contributing no longer even pads their resume.


In my experience, good first issues are mostly hidden. Most of the time it is big projects with large communities that use the label, and at least for me those issues are almost never actually good first issues.

Projects with at most one or two regular contributors are often easier to get into. They are relatively small, and there is a higher chance that an open issue is simple but no one has had time for it. But the label is not used on those issues.

I have several contributions of only 1-5 lines to projects with more than 10K stars. The issues were definitely good first issues, but no one had time to work on them, let alone label them.


Feel like all decently used open source projects need technical PM contributors to help organize the backlog. Issues can range from no information, to incredibly detailed posts, to what belongs on stackoverflow. It must be exhausting as a maintainer to deal with this.


I'd really like to hear the perspective and success stories of "Good first issue" labels and other means of encouraging open source contributions from other open source projects.

In my experience it unfortunately often hasn't been a net benefit for the projects I worked on. A "good first issue" takes a lot of time to write and often either never gets addressed at all or takes even more time to review and give feedback on, ultimately causing more work than addressing it directly, since most "first issue" contributors do not come back to contribute again.

GitHub has done a lot to streamline the process of contributing to an open source project. I think what is still missing (or I don't know about) is an overall resource where you can learn about open source best practises (or just "Best Practises" since you should be using them everywhere) like writing tests, writing good docs, using conventional commits etc. outside of individual projects - and then in addition to the "Good first issue" label also indicate which one of those best practises apply to that issue, e.g. "This is a good first issue if you know about NodeJS, Mocha tests and Markdown".


When there is an issue with a clear bug report that can be fixed in one or two lines and the solution is very clear, I like to answer something like:

"Thanks for the report! The problem is in link-to-file-in-GitHub. Do you want to send a PR to fix it? If not, we can fix it."

Sometimes they accept the proposal, sometimes not. It is a little more work to fetch the change, rebase, and merge it. But some people like the opportunity and perhaps may become a contributor in the future.

(If they don't accept, fix the problem soon and say thanks again with a link to the fix in case they want to see the change.)


A big challenge I'm seeing is when it comes to tests. Even though it is documented in the contribution guide and - I'd like to think - fairly easy to get running (Clone, `npm i`, `npm test`) many first time bug-fixes or features do not include tests.
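
One way a project might nudge first-time contributors here is an automated check that flags changes touching source files without touching any tests. A minimal sketch, assuming a Mocha-style layout where tests live under a `test/` directory or end in `.test.js`/`.spec.js`; the function names and path conventions are made up for illustration and are not taken from any project mentioned in this thread:

```python
from pathlib import PurePosixPath

# Assumed naming conventions for test files (hypothetical, adjust per project).
TEST_SUFFIXES = (".test.js", ".spec.js")


def is_test_file(path: str) -> bool:
    """Return True if the path looks like a test file under the assumed layout."""
    p = PurePosixPath(path)
    in_test_dir = "test" in p.parts[:-1]  # any parent directory named "test"
    return in_test_dir or p.name.endswith(TEST_SUFFIXES)


def missing_tests(changed_paths: list[str]) -> bool:
    """Return True if JS source files changed but no test file did."""
    touches_source = any(
        p.endswith(".js") and not is_test_file(p) for p in changed_paths
    )
    touches_tests = any(is_test_file(p) for p in changed_paths)
    return touches_source and not touches_tests
```

Fed the list of paths changed in a pull request, `missing_tests` could drive a friendly reminder comment rather than a hard failure, since docs-only changes return `False`.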


As an author they have definitely taken more time than if I fixed the issues myself. But for me, that's not the point, the point was to help other fellow devs get started in OSS programming, kind of "light mentoring". At that, it has been wildly successful. I haven't had much free time to continue doing this in the last couple of years though.


Absolutely. I really enjoy helping get people involved in open source too, and I help run a once-a-month workshop on how to get started with open source here in Vancouver.

I'm just not sure if this is something all open source projects - many of which don't have a lot of resources available - can handle. The very large open projects that already have a lot of contributors that can put in the additional time are definitely at an advantage here.


If an issue takes a lot of time to write, then it's probably not a good first issue. For my project, I found there are all kind of small things that are easy to do but I don't have time, so instead I write a good first issue.

Usually I also put some pointers on how to get started, like "check such method in such file", or "have a look at that reducer action".

It doesn't take long to do this and it's been working pretty well. It's a very good way for new developers to get something done easily and to get familiar with the code base.

As of now there are 40 such issues, including 28 completed: https://github.com/laurent22/joplin/issues?q=label%3A%22good...


This is interesting because it definitely seems to be different for different types of projects like frontend and end-user applications vs. backend and other libraries.


It definitely seems like a lot of the time, the people making these lists could just resolve the issues with not a lot more effort than writing all this 'good first issue' documentation.

It's one thing to throw a tag 'good-for-beginners' on a bunch of issues, but the amount of effort going into some of these 'good first issue' things is astonishing compared to how tiny the issues are in the first place.


I usually bring up this research when talking about employee onboarding but there was a study done in 2015 (with Mozilla) that showed a “good first issue” approach didn’t actually lead to the intended outcome - having more folks successfully integrate into and continue to contribute to a project over a long term. To my knowledge there hasn’t been a ton of follow up or replication but still something to consider.

https://cs.uwaterloo.ca/~rtholmes/papers/msr_2015_labuschagn...


The announcement post is https://github.blog/2020-01-22-browse-good-first-issues-to-s.... We changed to the technical post because it has a bit more detail.


I'd also recommend How I built Good First Issue bot with OpenFaaS Cloud by Rajat Jindal of Proofpoint. It's already used on repos from Docker, Microsoft, OpenFaaS, JetStack (cert-manager), and Google. It's a free SaaS, just add the label and it'll tweet out for you.

Bonus once you've added the label - it should be picked-up by GitHub's tool too.

https://www.openfaas.com/blog/good-first-issue/


Is there a platform for matching contributors with existing projects?

I'm developing one of my projects[1] in public from day one, and I would love to collaborate with some people someday. Recently, I was thinking about moving my private project tasks into GitHub issues and using tags like "good first issue" and "hacktoberfest". But I am not sure if this alone will attract people who want to really drive the project forward instead of "just" solving one task.

On the other hand, I know people who are looking for a project like this to contribute, but they simply are not interested in mine. So I can sense some need for a tool for project owners to post their projects and for other people to find one to contribute.

[1] https://github.com/darekkay/dashboard


The platform has always traditionally been your own computer. You install programs that you find useful, fun, interesting, necessary, etc. When you have an issue with one, you contribute to it.


I've had a weird first time contributor for one of my projects. I had decided to not use any build system for the front end. This caused issues in a subset of browsers that do not support the latest features. I accepted the extra work of testing it in several browsers and chose Firefox ESR as the minimum platform I support. But I didn't expect that there are forks like palemoon which disabled the core feature the entire project was built on, namely WebRTC.

The contributor's idea was to use HLS/DASH as a fallback for passive viewers who aren't broadcasting their own video stream. He was enthusiastic and even willing to spend 4 days per week on the project, but then I realized how inexperienced he was. He had never written a web app, never used a relational database, and had no knowledge of JS, CSS, or HTML, but was clearly willing to learn. However, since I just don't have the time for hand-holding like that, I asked him if he wanted to acquire these skills to get a job as a software developer. That wasn't the case, and the lack of ambition turned me off. I didn't feel like spending 100+ hours on training someone who will never use his skills outside of this project, especially when I can just tell him to go away and come back with Firefox, Chrome, Opera, Edge, anything except Palemoon.


I have noticed that popular repos with "good first issue" labels on issues tend to have those issues claimed fairly quickly. There is real competition in trying to display and assert your coding credentials with open source contributions.

Anecdotally, some issues that have been claimed for months do not have corresponding PRs submitted. Question for those who label issues as "good first issue": have you dealt with flaky contributors?


If you want to volunteer for a project, just contact a project manager (through email or something) declaring the number of hours you're able to donate and your skills. They'll usually give you a good direction to start.

Drive-by commits are time-consuming to maintain. I don't see why maintainers even deal with them.


I can only speak for myself, but I prefer not to manually onboard new contributors. I'd rather do a near-infinite number of other things with that time.

Add a contributors guide, link to it in your README. GH even has some functionality built-in to link to it these days. Help contributors help themselves and spend that time on something else.


Well the alternative is to deal with poor/unresearched PRs, or to not accept volunteer contributions at all (which is what I do with most of my own open-source projects.)


They are just as time consuming as any other code you haven't written yourself.


I don't understand how this is supposed to work.

For example, https://github.com/ClangBuiltLinux/linux/contribute shows: "This repo doesn't have any good first issues, yet"

But I have many open issues tagged "good first issue": https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aissue...


Maybe whoever trained their ML models actually thought no one would call good first issues "good first issues". Kinda like you're trying to build a cat recognizer and you suddenly feed it pictures of cats wearing signs around their necks that say "CAT" in all caps. The recognizer would probably fail.


add a CONTRIBUTING.md file?


Isn't this just filtering issues by the label "good first issue", excluding the one the user is already viewing?

Can someone explain the "machine learning algorithm" part here?
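
For comparison, plain label filtering (as opposed to whatever the ML model does) could look roughly like the following. This is only a sketch: it assumes the `labels` and `state` query parameters of GitHub's REST endpoint `GET /repos/{owner}/{repo}/issues` and the `labels[].name` shape of the returned issue objects; the helper names are made up for illustration, and nothing here makes a network request:

```python
from urllib.parse import urlencode


def good_first_issues_url(owner: str, repo: str, label: str = "good first issue") -> str:
    """Build a GitHub REST API URL listing open issues that carry `label`.

    Hypothetical helper: it only constructs the URL, it does not fetch it.
    """
    query = urlencode({"labels": label, "state": "open"})
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"


def has_label(issue: dict, label: str = "good first issue") -> bool:
    """Return True if an issue dict (REST API shape assumed) carries `label`."""
    wanted = label.casefold()
    return any(
        entry.get("name", "").casefold() == wanted
        for entry in issue.get("labels", [])
    )
```

A label match like this only finds issues that maintainers have already tagged, which is the gap the linked post says the ML approach is meant to cover.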


That's explained in the post linked from the blog: https://github.blog/2020-01-22-how-we-built-good-first-issue...


That seems like perhaps a more substantial post for a thread like this, so we've changed to it from https://github.blog/2020-01-22-browse-good-first-issues-to-s.... Probably a good idea to read both.


The post links to an approach walkthrough on the GitHub engineering blog.


"Up for grabs" is one of the first projects to do this on a wide scale on GitHub, and one I absolutely recommend for getting started with contributing:

https://up-for-grabs.net/

The reason is that it's opt-in (instead of ML-found issues), so the authors of packages have expressed a strong willingness to help first-time contributors. I've used up for grabs before (as an author) and it was great.


If anyone is looking to contribute to Apache Kafka, they can get started with the setup that I have documented on my blog.

https://medium.com/@manasvi/getting-started-with-contributin...



This is a cool thing to support. I'm currently teaching my partner to code, and I am on the lookout for relatively easy contributions she can make to start getting interactions with the community and how things are built in the real world with other people.


If you know basic Python, check out ROS; it has interesting issues to solve -> https://github.com/ros/rosdistro/contribute


> As the first deep-learning-enabled product to launch on Github.com, this feature required careful design to ensure that the infrastructure would generalize to future projects.

It is surprising to see that this is the first time DL is run in production at GitHub. GitHub has a large amount of fairly structured data in the form of code, issues, etc. Plus they have been dabbling with DL for more than two years [1].

It could be that the business problems that are critical to the growth of GitHub as a product do not need DL. Solving problems like best-first-issues and code search is useful to the end user but may not effectively move the business metrics.

[1] https://github.blog/2018-09-18-towards-natural-language-sema...




