3 tricks for efficient data science

It’s amazing how little things can turn a data science / mathematical modelling project into a full-fledged mess. The traps are easy to avoid but, unfortunately, just as easy to forget: we love to pretend we “couldn’t do something that silly”.

Following the principles below has saved me (and others) a lot of time and spared me from unnecessary pain:

Always check your data

No one has ever been arrested for excess logging. If you don’t check the input for your models and the outputs of calculations at the intermediate steps, you are in for disappointing results. 
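As a minimal sketch of what checking inputs and intermediate outputs can look like, the snippet below logs a few summary statistics before and after a cleaning step. The column name, the sample values, the 0 to 120 age range and the helper names (is_missing, summarize) are assumptions made up for illustration; they are not part of any particular library.

```python
import logging
import math

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("data_checks")


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize(name, values):
    """Log and return basic sanity statistics for one numeric column."""
    present = [v for v in values if not is_missing(v)]
    summary = {
        "rows": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    logger.info("%s: %s", name, summary)
    return summary


# Hypothetical raw input: two missing values and one impossible age.
raw_ages = [34, 29, None, 41, float("nan"), 230]
raw_summary = summarize("age (raw)", raw_ages)

# Intermediate step: drop missing and out-of-range values, then check again.
cleaned_ages = [a for a in raw_ages if not is_missing(a) and 0 <= a <= 120]
cleaned_summary = summarize("age (cleaned)", cleaned_ages)
```

Logging the same summary at every step makes it obvious where a column picked up missing values or impossible numbers.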

There is a promise of a greener future in which deep learning will release us from all the mess of pre-processing: we will ingest raw data in any format and the intermediate layers will sort it out. Sorry, but here in real life we are still too far from that. Garbage in, garbage out.

Test your model on limit cases

This is somewhat related to the previous point, but not always. What happens to your model in the limit? Many scientists ask this kind of question as a basic sanity check, and the same should be done with your model: does it predict what the domain expert / common sense expects when you make one variable very large or very small? If not, then there is something wrong. Do you have a positive regression coefficient for monthly totals, but a negative one for average totals? Time to revisit your model.
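A limit-case check can be as small as the sketch below. It assumes a hypothetical fitted linear demand model (the intercept and slope are invented for illustration) and simply asks whether predictions at extreme inputs stay within a range that common sense allows, here "units sold cannot be negative".

```python
def linear_demand(price, intercept=100.0, slope=-2.5):
    """Hypothetical fitted model: expected units sold at a given price."""
    return intercept + slope * price


def limit_case_problems(model, extreme_inputs, lower=0.0, upper=float("inf")):
    """Return the (input, prediction) pairs whose prediction leaves [lower, upper]."""
    problems = []
    for x in extreme_inputs:
        y = model(x)
        if not (lower <= y <= upper):
            problems.append((x, y))
    return problems


# Probe the model at a tiny, a typical and an absurdly large price.
problems = limit_case_problems(linear_demand, [0.0, 1.0, 1_000.0])
print(problems)
```

Here the model predicts negative sales at a price of 1,000, which is exactly the kind of result that should send you back to the model (or to the range of inputs you allow it to see).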

A common source of errors here is the scale: having two variables on wildly different scales often produces odd, counter-intuitive models.
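One common remedy, offered here as an assumption rather than the only option, is to standardize each variable to zero mean and unit variance before fitting. A minimal sketch with made-up income and age values:

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(v - mu) / sigma for v in values]


# Hypothetical variables on wildly different scales.
income = [30_000, 45_000, 60_000, 75_000]
age = [25, 35, 45, 55]

income_z = standardize(income)
age_z = standardize(age)
```

After this step both variables live on the same scale, so their coefficients can be compared directly.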

Fail fast

A common error, especially in highly technical teams, is to over-engineer prematurely. There is a justified fear behind it: if the data science experiments become too messy, it will be too complicated for the engineers to implement them when it’s time to scale. But this very rarely justifies discarding the working Python prototype and rewriting everything from scratch in C.

Another common error is to try the “tool of the day” and build the solution in a poorly supported technology. There are thousands of poorly maintained and documented open source projects.

Don’t reinvent the wheel. Bet on tested technologies to iterate quickly and, ultimately, bring value to your final user, whether a consulting client or someone else inside your organization. Once the project is on its way and there’s trust on the final user’s side, there will be plenty of time to explore alternatives.

 

From academia to business: the paradigm shift

The most important change one needs to make comes from inside: it is crucial to understand the logic of the business world, which is rather simple: everything needs to increase revenue or reduce costs.

Unlike academia, where one is free to cherry-pick problems, in the private sector we often find ourselves doing “menial” work for which we feel overqualified. The truth is that one needs to learn by doing, and if you haven’t done your fair share of, say, SQL queries, you will neither appreciate the work involved nor be able to troubleshoot your algorithms yourself.

In my first job as a data scientist, I was super disappointed because I needed to write long queries, when I had expected to just build some beautiful models on top of cleansed and sanitized data. I clearly had the wrong attitude, but it took me some time to realize it.

Later, while interviewing candidates for our team, I met a few PhDs who wanted to become managers immediately (or so their salary requirements suggested) on the sole merit that they, well, had a PhD. This sense of detachment from reality is the first thing you need to shed.

Another source of problems is the relationship with your advisor and close professors. This is a nasty one, and a very case-by-case topic. I had to disclose that I was leaving academia while I was finishing my thesis, just when tradition dictated that it was time for me to find a postdoc. In my case, my advisor was not against my moving out of academia, and he somewhat encouraged it by introducing me to other mathematicians working in related fields who had gone to the private sector. Unfortunately, another professor took it very differently. She was extremely upset about me leaving academia, and she thought I was a traitor and a mercenary who sought nothing but profit out of life. Interestingly, she often spent lots of grant money on taxis for no reason (public transport in Europe is generally fine), but well. This kind of speech affects you personally: you grew up wanting to become a researcher, and being a professor was probably a long-cherished dream. But circumstances change, and you cannot let yourself be held back by your 15-year-old self. Move on.

Getting a job in the private sector after your (quantitative) PhD

First of all, build marketable skills

At least in the beginning, you need to be aware of what the market is looking for, and where you can fit. Coding in some way or another is essential for this. Depending on the level of specialization you want to achieve, this would be C++ (development of high-performance algorithms, for instance in finance), Python, R or SQL (data science and business analytics). Some data science jobs may ask you for Java and so on: apply at your own risk, since such a position might be a bit further from modeling.

Another important skill is communication: you need to be able to understand other people’s needs and communicate your needs and the solution to their problems. If you are coming straight from academia, chances are high that you had to give some lectures to experts and teach some courses to non-experts. In many countries, PhD students have to teach students outside their field. This is particularly common in maths, where pure math PhDs are usually teaching business students. It’s useful to seek out these kinds of opportunities during the PhD, so don’t hesitate.

The message that needs to be delivered to the decision makers of the place you want to work is this: you have to show them that you can get work done, and that your work is bringing in revenue or reducing costs. Anything else is utterly irrelevant to them.

Finding a job

There are lots of websites out there, which I am sure you are familiar with. I have had great success with indeed.com and its localized versions. Be sure that you understand the job requirements, and read between the lines. Sometimes (rather often) recruiters do not understand a lot about the requirements, so they just pull in a long list of requisites. A good recruiter will look at your CV and maybe even call you if they see that you could fit in their team, whether you match their shopping list or not.

The estimates vary, but it is generally agreed that most jobs never make it to the job boards. I know, that’s disappointing, but you should be aware that the best way to get a job you will enjoy is through networking. Go out and talk to people: nowadays every major city has meetups or other interest groups where you can find people with common interests. Don’t hesitate to pull in your network: for sure by the time you finish your PhD you have a solid network of people to rely on.

It is always useful to build a name for yourself, so don’t hesitate to take as many public speaking opportunities as you get, especially if they are for a wider audience. Who knows, maybe someone will offer you a job after your talk…

The interview process varies wildly: some places may ask you to prepare for a coding interview, while others may interview you and then make an offer. Usually there are at least three rounds: a first round for a quick assessment of how suitable you are for the job, a second round to check that you know your stuff, and a third round to meet the final decision maker.

If you are invited to a coding interview, it is highly advisable to get some practice on websites like Codility, HackerRank, Codewars or Project Euler.