Data Science and Finance

Hi guys,

So I’ve been doing a bit of learning recently, particularly around machine learning, AI, and data science, which seem to be hot trends nowadays.

It appears that such technologies will bring lasting change to our economy and the finance profession, with a strong shift in the mix of the future workforce.

What is your take on supplementing an accounting and finance background with knowledge in data science? If worthwhile, any Canadian courses you guys would recommend?

Cheers,

PFF

I’d like to know as well. This always comes up in discussions, but I never hear of a good place to start.

Start by learning to code. Regardless of whether you are a PhD in machine learning, being able to speak to the machines is going to be increasingly important. I started with a Python intro course on Udemy for 10 dollars.

How long, in terms of time commitment, do you think it took you to become good at it? I tried Codecademy, but even after passing the course I didn’t feel I had learned much.

So what I recommend from the coding standpoint is, in order: 1) the Python for Everybody Specialization offered by the University of Michigan via Coursera. It’s five beginner courses that are a very good introduction. Then step up to 2) the five-course Applied Data Science with Python Specialization, also from UoM through Coursera. I also think the graded option is vital. Once you have those, a good add-on would be 3) the Statistics with R Specialization by Duke through Coursera, which is five courses as well. At that point, I would say you have a base-level proficiency in coding. Done at a fast pace, I think you could do all of that in a few months (I think it took me about 2-3 months of pushing the pace pretty hard to do all three, although I also had other classes, work, and family). I think the recommended / casual learner pace is about five months per specialization, but going at that pace would actually be counterproductive, since you’d lose a lot of what you learned over time, and I can’t imagine a serious student moving that slowly.

On the math side, I first had to rebuild a base (I had the Calc I-IV progression and some prior coursework but was overall light on math). That involved Stats I & II, Probability, Linear Algebra I & II and Discrete Math at a local engineering school, and now my master’s begins in earnest in the fall. The basic math progression took me about half a year or so while also juggling work, family, learning to program, etc. If you’re determined and bright with more free time, or have a strong pre-existing background, it can probably be done somewhat faster.

It has been pretty enjoyable so far, because I think I am finally grasping the big picture much more clearly in quantitative terms and seeing how each of these topics is more or less examining the others from a different point of view. My coding has also benefited from my math base in terms of tricks to streamline tasks. It’s also worth noting that, to really pursue data science, everything I mentioned prior to the master’s would be considered the basics required to START studying data science in earnest. These math and Coursera courses were actually what I was told to do to be prepared for graduate coursework. The term “data science” gets thrown around a lot these days as a catchphrase, but that is only causing employers and the industry to examine your skill set more critically for advanced capabilities.

What is Calc 4?

It depends on the school. At the school where I originally took it, the material was covered in three courses; at my current one it is covered in four, although their term structure is a little different, as is some of the content. I got used to saying I-IV here, but the relevant idea is “full calc progression”.

Multivariate calculus, vector functions, partial derivatives, gradient, multivariable optimization, polar coordinates, etc.

Yeah, this was analogous to Calc 3 at my school.

^fools, this is no more than calc 2 on my playfield. Do you even math, brah?

Not sure there is much left after you do all this from a corporate perspective.

“Data science” can be broken into supervised, unsupervised and deep learning. The first two now revolve around one or two algorithms, which can be mastered in a week considering they all have linear algebra under the hood and you’ve ticked that box. Deep learning is a bit more complicated and would take around 6 months to grasp.
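To make the “linear algebra under the hood” point concrete, here is a minimal sketch of the simplest supervised model, a straight-line fit by ordinary least squares. It solves the 2x2 normal equations by hand in plain Python. The function name `fit_line` and the data are made up for illustration; in practice you would more likely reach for a library such as NumPy or scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares.

    Solves the normal equations (X^T X) beta = X^T y for the
    design matrix X = [1, x], which reduces to a 2x2 system:
        [ n    sx  ] [a]   [ sy  ]
        [ sx   sxx ] [b] = [ sxy ]
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of X^T X; zero means all x values are identical.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")

    # Cramer's rule for the 2x2 system.
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Made-up data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

With more predictors the same normal equations apply, just with a bigger matrix, which is where the linear algebra coursework pays off.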

There would be value in understanding how to deploy a model into production pipelines, which means getting familiar with some IT concepts, Docker, etc.

I would be curious to know whether these guys are basically the new quants and these new algorithms are replacing the existing ones, or whether there is some sort of sliding scale from DS -> quant.

What’s average compensation for these guys?

Before starting to learn machine learning and data science, you need to learn Python and pick up some general coding knowledge. Students from an accounting or finance background can complete these courses, but it depends on your capacity to absorb programming and its concepts. Today ML has wide scope in the IT industry, as does data science.

Thanks!!

Tank

Have you taken Analysis or Probability Theory (not the stats and probability everyone has to take in STEM)?

I’m just starting the program so I’ll see what’s ahead, but Mobius, SCB, any recommendations or critiques?

I don’t think compensation is that different. I think it’s just a slight rebranding of stuff already out there; the field is expanding / maturing, so more nuanced sub-disciplines are proliferating with it.

Thanks for the answers guys.

@Black Swan: I like how you gave the step-by-step. I’ve just recently finished VBA for Excel (which is applicable to my job), and am working on SQL (which seems pretty applicable to anything SAS based). Python will be next.

Did you do your math courses concurrently with your coding courses? The reason I ask is that I am pretty far removed from any complex math since graduating, aside from some linear regression and stats. Do I have to rebuild the math base before doing Python, or is it the other way around?

And finally, Udemy vs Coursera?

I’m not trying to go into the coding industry, because there are going to be PhDs that are 10x better than me. I’m hoping to get enough knowledge to incorporate with the finance/accounting knowledge to be useful going forward. Nothing worse than being left behind.

I’ve generally heard that Coursera > edX (although CS50 on edX is raved about) > Udemy

I don’t think order matters or that you *need* the math. Data science is basically a space filled with CS folks that got more into math or MA folks that got more into programming over time. I started with the math, but that was just my preference. You will probably get more bang for your buck learning programming, like Python, before the math though. You may also find that learning R is better for your immediate needs and is worth tackling first (maybe through that Duke Coursera program).

Mathematics:

For Mathematics I think MIT OCW is the best place as long as you do the required assignments and take the exams that other students take to gauge yourself.

You will need:

  1. Single Variable Calculus
  2. Multivariate Calculus
  3. Linear Algebra
  4. Computational Linear Algebra
  5. Probability Theory and Statistics

CS Theory:

I don’t think knowing about finite state machines or language grammars really helps you, but I do think knowing algorithms and data structures will definitely help you.

You will need:

  1. MIT OCW 6.006 (I think this is the Introduction to Algorithms course at MIT)
  2. The intro to Algorithms and Data Structures course on Coursera by Princeton University.
  3. Code: The Hidden Language of Computer Hardware and Software (not required but very useful)

Programming:

  1. Dataquest: While I think what B.S. said is very valuable, there is no better way to learn programming than messing around with the data/code. I think this website provides much more “hands-on” experience than other venues, provided you have completed the steps before.

ML:

  1. Andrew Ng’s ML course on Coursera is superb.

If you have a good foundation in ML, why not start NLP? NLP is currently one of the biggest rainmakers in Data Science and ML.