Blog

Upcoming Features & Community

We love getting letters from our most active users. They really help us push our community in the right direction as we build out more features for developers. Recently we received a letter from Chris, who has been working his way up the leaderboard. He offered feedback on some of the things he loves about CodeEval as well as some of the things he wants to see. Since his feedback hit on some points we’re working on, we asked if we could share it with our community and talk about some of the things we’ll be rolling out.

First of all I would like to say that I really like the idea behind CodeEval. I joined the site a few years ago and solved a bunch of problems before I was distracted by some other work. I came back to CodeEval recently to solve some problems in Go to help learn the language. CodeEval has been a good learning tool for experimenting with a language when you are first getting started and want to really get a grasp of the syntax. Seeing my rankings grow while solving the problems has been very satisfying and has made your tool pretty addictive. I have been telling some of my co-workers about it and started to get them interested in it. I would really like to see CodeEval grow to be much bigger than it is. I see potential in the form of Stack Overflow, GitHub, and LinkedIn in how it is important to establish one’s online profile. Basically if a company is researching a candidate they may look at these tools to see what impact this person has had in programming and how passionate/knowledgeable they may be. This may not tell the whole story but it does play a factor. As part of a software company we always filter our candidates with a coding problem before hiring and this seems to be a great service through which your site can offer real value for companies and recruiters. Anyways, I know I am preaching to the choir here since you know your own product and its potential better than anyone but I came up with a bunch of ideas from observations I have made about your business model.

Like Stack Overflow, GitHub, LinkedIn, etc., your key value seems to be in the users/programmers who are solving problems. You need to attract many users but also quality users. These are a few ideas I have to grow your user base.

It seems that CodeEval is very focused around users who are seeking a job. This makes sense because that is how you are trying to make money, but job seekers do not necessarily make up a large proportion of, or the most proficient of, the coders in the world, so this group may not be helping as much to grow your user base. To be credible you want to attract the best coders in the world regardless of whether they are seeking a job or not. When I google a popular coder that I really like who is making an impact in the community I should be able to see their CodeEval profile. This is great publicity for your site because I can see that this famous coder is on your site and that I should think about joining your site, or if I didn’t know about CodeEval before, then now I do. Also, I may want to try and pass some of my favorite coders in the ranking; it is a great competitive environment. Ok, so attracting the best coders in the world is not necessarily an easy thing because, what would they have to prove by rising up the ranking? They are already thought of as some of the best coders without being on CodeEval. I am not sure exactly what CodeEval could offer to the best coders in the world but I think that this is good food for thought on how CodeEval can try to attract top coders.

One idea that I do have is to have a team or company ranking. I see you do have teams on the site (although I am not sure how I become part of a team), but it would be nice to see which companies have the best coders. It is great publicity for a company to be close to the top so they have their name there, which also might make candidates want to apply there. As well, the company would encourage its employees to join CodeEval and solve problems to improve the company ranking so that they are close to the top of the list. Also, I think coders should be able to browse company pages to learn a bit more about the companies. Offering badges that the company can put on their site would also be a great way to validate the company from a hiring perspective and would be great publicity and traffic for your site.

In terms of personas, I think CodeEval has done a good job at looking at the coder and company side of things. Another possible persona which might bring in some revenue is the recruiter persona. I am not sure how to handle this one exactly because if I was on CodeEval and being bombarded by recruiters that might stop me from using the site. Maybe there is an option that would allow recruiters to contact you that you can enable or disable. Your site may be trying to cut out the recruiter but that hasn’t really been the case with LinkedIn. Companies may not need paid profiles all year round but a recruiter may gladly pay the monthly price if he is just passing that cost along to the company anyway. This next suggestion may not be the most beneficial for making money, but having a recruiter or company ranking and reviews would be a good service for users who are looking for jobs and seems like something that many people would be willing to pay for if it meant knowing more about good recruiters and companies.

You have done a great job with the rankings, adding badges for each programming language you code in and your percentile ranking. This is really great to have on your public profile and makes the site more addictive. I can’t think of anything that specific but any work on the gamification side is really important so that once you get a user you can drive them to use your site. Earning rewards and offers is also genius and I look forward to seeing this developed more.

Another thing, which I see you have recognized, is that plagiarism would be a killer for the company. I don’t have much to add here because I see you have already taken action on this by adding the uniqueness score, but I am just mentioning it because it is probably one of the most important things for your company and potentially your key competitive advantage over other sites that may compete with you. If users are able to plagiarize their way to the top of the ranking then the site may lose all credibility. So, obviously having a really good plagiarism detection algorithm is really important. Out of curiosity I noticed that there are a few repositories that have many solutions to the problems on your site. While you can’t stop people from posting solutions, non-unique solutions that could have been plagiarized should count negatively to discourage people from sharing solutions.

Anyways, a lot of this was just commending you on the work you have done and your direction. I think your site is a very smart idea and I look forward to seeing its continued growth. I hope these suggestions and feedback are of some help.


Thanks for your message, Chris. It’s super helpful to hear some of the things you want to see, and it helps move us forward. As you may have noticed, we’re making a lot of big changes here and you touched on many of the things at the top of our list. We started out a few years ago by building a screening tool for employers and focusing on helping developers find jobs. We quickly realized that building a community was much more valuable and more fun! Our new direction has been to eliminate the focus on jobs and work on allowing developers to build up their credibility with elegant profiles and a ranking system that lets them know where they stand in the world. We’re building a community to attract the world’s best developers, as well as those who strive to be one day, and there’s a whole list of things in our pipeline to facilitate this transition.

Firstly, more robust team and company pages are coming! Currently, companies have to create a company page to add you to a team. It’s actually been a huge hit among our existing companies since it allows them to showcase their companies and give developers a better idea of what it’s like to work there as an engineer. Things like tech stack and languages are helpful but having team members is key.

With teams, you can see how proficient a company’s developers are and how they compare to other companies. It’s important to know you’ll be working with smart people, and for the company, having actual developers become a recruiting tool just makes sense. Developers can connect with each other and share what it’s really like working there. Companies can build up their ranking and establish credibility as well as gain valuable insight into what kind of people they are attracting. It’s definitely something we’re pushing companies to build out and we’ve made it free, so talk to someone at your company to build one out!

Another thing you touched on is privacy. We’ve gone to great lengths to ensure we’re not just another LinkedIn where you will get bombarded with recruiter emails. We’ve made it optional to turn public profiles on and off, and will be introducing a new messaging system to allow you to communicate with recruiters or other developers.


Gamification has also been at the top of our list, with the introduction of badges and things like offers. In the last few weeks we rolled out many more offers, including credit at Firebase and Kloudless as well as giveaways of iPads and Oculus Rifts. Stay tuned for more exciting offers and rewards!


Something we’re doing that’s pretty unique is plagiarism detection. It’s something that few companies have bothered to address, but because we want to maintain a sense of credibility in our community, we’ve invested heavily in our new plagiarism detection algorithm. More in this blog post: http://blog.codeeval.com/codeevalblog/2014/7/2/code-plagiarism-detection

Thanks for your feedback Chris! If you ever have feedback on our product or features you want to see, drop us a line anytime at support@codeeval.com. We read every email. 

Thanks,

CodeEval Team

New: CodeEval Code Comparison (Plagiarism Detection)

CodeEval is now even smarter with the launch of our code comparison (plagiarism detection) engine.

One of the challenges with providing relevant information and realistic code rankings for developers on CodeEval is building in a comprehensive system to protect the community against code plagiarism.

The open web makes sharing code easy, and we want to make sure that we're providing information and code rankings that are actually relevant, and that the reputation of the platform and our developers is protected. One of the requirements for this is a system that addresses cheating or copying code, so that when you see someone's code rank, you can be confident that they wrote the code themselves.

After considering a number of algorithms for finding plagiarism in source code, we've decided to build our custom similarity detection engine based on the most current academic research in the area of "Winnowing". Here's one example of the research we took a look at: Winnowing: Local Algorithms for Document Fingerprinting

While we're not going to discuss everything we're doing or exactly how we do it, the gist of it is that every code submission goes through an analyzer that splits the source code into lexemes (the basic lexical units of a programming language, such as keywords, identifiers, operators, and literals). We remove the dependence on the names of variables, classes, etc., then apply a hashing algorithm and the principle of minimum hash. This gives us a fingerprint that characterizes the source code. We then compare fingerprints with each other; if they're similar, the code was likely duplicated.
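As a rough illustration of the steps described above, here is a minimal sketch of winnowing-style fingerprinting in Python. It is not CodeEval's engine, whose details are not published: the function names, the regular-expression tokenizer, the k-gram size `k=5`, the window size `window=4`, and the use of Jaccard similarity are all assumptions made for the example, and only Python keywords are recognized.

```python
import hashlib
import keyword
import re

# Hypothetical tokenizer: identifiers/keywords, integer literals, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source):
    """Split source into tokens and replace every non-keyword name with 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_]", tok) and not keyword.iskeyword(tok):
            tokens.append("ID")  # variable, function, and class names are ignored
        else:
            tokens.append(tok)
    return tokens


def kgram_hashes(tokens, k=5):
    """Hash every run of k consecutive tokens (assumed k-gram size)."""
    hashes = []
    for i in range(len(tokens) - k + 1):
        gram = " ".join(tokens[i:i + k])
        digest = hashlib.sha256(gram.encode("utf-8")).hexdigest()
        hashes.append(int(digest[:8], 16))
    return hashes


def winnow(hashes, window=4):
    """Keep the minimum hash of each sliding window (simplified winnowing)."""
    if len(hashes) < window:
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def fingerprint(source, k=5, window=4):
    return winnow(kgram_hashes(normalize(source), k), window)


def similarity(source_a, source_b):
    """Jaccard similarity of the two fingerprints, between 0.0 and 1.0."""
    fa, fb = fingerprint(source_a), fingerprint(source_b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "def total(values):\n    s = 0\n    for v in values:\n        s += v\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"
print(similarity(original, renamed))
```

Because names are replaced before hashing, the two snippets produce the same token stream and therefore the same fingerprint, which is the behaviour the paragraph above describes for renamed variables.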

A few of the features this supports:

  • Fingerprints can be organized in a database to speed up one-against-everyone checks.
  • Works across all of our 18 supported programming languages.
  • The tokenized representation means we automatically ignore the names of functions and variables (classes, objects, and so on). Tokenization also keeps small changes in the program code from affecting the duplication check.
  • Moving small pieces of code around in the source only slightly affects the result of the duplication search.
  • The algorithm is insensitive to permutations of chunks of code.
  • Since the system compares new code to all existing solutions, solutions found online and then submitted (even if they're manipulated) are easily identified.
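To make the "one-against-everyone" point in the list above more concrete, here is a small sketch of an inverted index over fingerprints, written in Python. It is only an assumption about how such a lookup could be organized, not a description of CodeEval's database. The class name `FingerprintIndex`, its methods, and the Jaccard score are hypothetical, and fingerprints are taken to be plain sets of integer hashes.

```python
from collections import defaultdict


class FingerprintIndex:
    """Maps each hash value to the submissions whose fingerprint contains it."""

    def __init__(self):
        self._by_hash = defaultdict(set)  # hash value -> submission ids
        self._sizes = {}                  # submission id -> fingerprint size

    def add(self, submission_id, fingerprint):
        """Store a fingerprint (a set of integer hashes) for one submission."""
        self._sizes[submission_id] = len(fingerprint)
        for h in fingerprint:
            self._by_hash[h].add(submission_id)

    def most_similar(self, fingerprint):
        """Return (submission_id, jaccard_score) of the closest stored fingerprint."""
        shared = defaultdict(int)
        for h in fingerprint:
            # Only submissions sharing at least one hash are ever visited.
            for sid in self._by_hash.get(h, ()):
                shared[sid] += 1
        best_id, best_score = None, 0.0
        for sid, common in shared.items():
            union = len(fingerprint) + self._sizes[sid] - common
            score = common / union
            if score > best_score:
                best_id, best_score = sid, score
        return best_id, best_score


index = FingerprintIndex()
index.add("a", {1, 2, 3, 4})
index.add("b", {10, 11, 12})
print(index.most_similar({1, 2, 3, 9}))  # ('a', 0.6)
```

The design choice here is that a new submission is compared only against stored submissions that share at least one hash with it, rather than against every submission in turn.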

Of course, displaying this information publicly on profiles might be a little sensitive for some developers. It's not intended to offend anyone. It's designed to give you some insight into your code, add some credibility to your profile, and protect the community from those who are cheating or trying to game the rankings.

The comparison engine has been running behind CodeEval for a while now while we thought through how we wanted to present this information to developers. We've tried a number of things and learned a lot. There are benefits that come with simplicity, and in looking at the results returned in tests it becomes fairly evident who has copied code (even if they've manipulated it) and who has written unique code that has some similarity because it solves the same challenge (as could be expected). There are some thumbs on the scale in different regards. For example, simpler solutions have a higher probability of returning similar results, since the code is short and designed to solve a specific challenge, so some duplication is likely. Still, while we feel that we have a pretty good idea, we didn't want to be in the position of reporting any false positives. We decided against making any kind of true/false judgement and instead deliver the results as a percentage of 'uniqueness' when compared with other submissions, leaving the final judgement with the viewer.

Log in to your CodeEval account to see this in action.


How we calculate the percentage...

Whenever code is submitted for a challenge, we record whether the solution duplicates another submission. We then calculate the ratio of a user's challenges with a duplicate to the total number of challenges that user has solved. That ratio determines the uniqueness percentage.

Notes:

  • The percentage of 'not unique' code does not mean that the code was plagiarized, only that it is similar to other submissions (which is to be expected to some extent).
  • We currently don't check the source code for easy level challenges since the code tends to be similar.
  • We will be continually tweaking this to improve the results, so changes are to be expected.
  • While we keep all of the submissions for each coding challenge, we only use the results from your last successful submission. So, if you have a result that is less than favorable, you can write original code and improve your uniqueness score.
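The calculation and notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`difficulty`, `is_duplicate`) is hypothetical, uniqueness is assumed to be the complement of the duplicate ratio, and easy challenges are assumed to be left out of both the numerator and the denominator. The post itself does not spell out these details.

```python
def uniqueness_percent(solved):
    """Return an assumed uniqueness percentage for one user.

    `solved` holds one record per solved challenge, describing the last
    successful submission only (hypothetical keys: 'difficulty', 'is_duplicate').
    """
    # Easy challenges are not checked, so they are skipped here (assumption).
    checked = [c for c in solved if c["difficulty"] != "easy"]
    if not checked:
        return 100.0
    duplicates = sum(1 for c in checked if c["is_duplicate"])
    # Assumption: uniqueness is 100% minus the share of duplicated challenges.
    return 100.0 * (1 - duplicates / len(checked))


solved = [
    {"difficulty": "easy", "is_duplicate": True},      # ignored
    {"difficulty": "moderate", "is_duplicate": False},
    {"difficulty": "moderate", "is_duplicate": True},
    {"difficulty": "hard", "is_duplicate": False},
    {"difficulty": "hard", "is_duplicate": False},
]
print(uniqueness_percent(solved))  # 75.0
```

Replacing the duplicated moderate solution with an original one would change its record to `is_duplicate: False` and raise the score in this sketch to 100.0, which matches the last note above.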

Feel free to email us with feedback or leave a comment below.

-CodeEval Team

 

Additional References:

  1. Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan. Searching the web. ACM Transactions on Internet Technology (TOIT), 1(1):2–43,  2001.

  2. Brenda S. Baker. On finding duplication and near-duplication in large software systems. In L. Wills, P. Newcomb, and E. Chikofsky, editors, Second Working Conference on Reverse Engineering, pages 86–95, Los Alamitos, California, 1995. IEEE Computer Society Press.

  3. Brenda S. Baker and Udi Manber. Deducing similarities in java sources from byte codes. In Proc. of Usenix Annual Technical Conf., pages 179–190, 1998.

  4. Sergey Brin, James Davis, and Hector García-Molina. Copy detection mechanisms for digital documents. In Proceedings of the ACM SIGMOD Conference, pages 398–409, 1995.

  5. Andrei Broder. On the resemblance and containment of documents. In SEQS: Sequences ’91, 1998.

  6. Andrei Broder, Steve Glassman, Mark Manasse, and Geoffrey Zweig. Syntactic clustering of the web. In Proceedings of the Sixth International World Wide Web Conference, pages 391–404, April 1997.

  7. Nevin Heintze. Scalable document fingerprinting. In 1996 USENIX Workshop on Electronic Commerce, November 1996.

  8. Richard M. Karp and Michael O. Rabin. Pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.

  9. Udi Manber. Finding similar files in a large file system. In Proceedings of the USENIX Winter 1994 Technical Conference, pages 1–10, San Francisco, CA, USA, 17–21 1994.

  10. Peter Mork, Beitao Li, Edward Chang, Junghoo Cho, Chen Li, and James Wang. Indexing tamper resistant features for image copy detection, 1999. URL: citeseer.nj.nec.com/mork99indexing.html.

  11. Narayanan Shivakumar and Hector García-Molina. SCAM: A copy detection mechanism for digital documents. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995.

  12. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249–260, 1995.

Real World Challenges

This morning you may have noticed we released a different type of challenge called "Find Flight 370." Give it a shot when you have a chance.

We're experimenting with adding more of these types of "real-world" challenges which utilize actual datasets to see if we can leverage the power of our community to solve real problems in the world.

Hard computer science challenges are great for prepping for interviews and brushing up on your skills but we've increasingly received requests to build challenges that are more interesting and can affect the world around us. 

This is just our first experiment to see if we can help "Crowdsolve" big problems related to humanitarian, environmental, technological, and social issues. As the world becomes more interconnected, we want to build tools to allow developers to change it.

We need your help in spreading the word and giving us feedback. Email our team or leave a comment below. 

Language Upgrades

We're happy to announce a few language updates this week which affect Go, C, C++, and Scala, with more in the pipeline.

  • Go language updated to version 1.2 (from 1.0) - more features and it's really interesting for the community
  • C and C++ compiler updated to version 4.8.1 (from 4.6.3) - now we support the C++11 standard
  • Scala language updated to version 2.10.3

In addition, we're still taking votes for our next set of coding challenges below!
