
Data Science masterclass with Shifra Isaacs

2025/5/28

FP&A Today


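Shifra Isaacs: I'm glad to be getting back to my finance roots, because my career actually started in finance. With generative AI, we will run out of excuses not to become data scientists. Understanding debits and credits and the other side of a transaction helps a data scientist or analyst communicate with finance stakeholders. A data scientist is like a lawyer who has to defend their project and explain why it matters. Financial modeling and forecasting are really watered-down data science, and quant analysis is among the highest-paying jobs at the intersection of data science and finance. Business analytics focuses on descriptive analysis, while data science focuses more on predictive analysis. Data scientists need to clean data to make it readable for algorithms. Maintenance is the hardest part of the data science cycle and requires automated alerts. Excel is not suited to software engineering workflows, whereas software engineering emphasizes continuous integration and deployment. Python for Excel is only an interim state; eventually Copilot will fully integrate Python. A universal mathematical representation has important philosophical implications, because we have translation models between different languages. The three basic types of machine learning are regression, clustering, and classification, and the best way to learn machine learning is to think about the kinds of problems you want to solve. Regression is used for prediction, such as time series analysis and cost projections; classification is used for risk modeling, such as credit risk scoring and mortgage predictions. Clustering may be useful for variance analysis, for example churn prediction and cohort analysis in e-commerce. Machine learning is good at identifying patterns and correlations we might not see, and clustering is very powerful because it is so autonomous: you aren't really telling it what to think. Stakeholder management can protect technical work from being automated by AI; what matters is understanding the theory behind a model and being able to read its output. If you can't explain or understand what a model gets right and wrong, using that model is dangerous. You need to approach these kinds of questions with integrity and know that this kind of analysis is simply not feasible. Data scientists use Python the way high school students use a calculator; programming is just a tool, and the focus is on learning to do basic Excel workflows in SQL and Python. Try the data science agent in Google Colab, which is a beginner-friendly tool. Multimodal learning is recommended, meaning doing the same thing in a familiar medium and in a new one. Engineering has the concept of pseudocode: write down every step of a process, then map it to Python. The first thing to do is what you are already doing, but in a new medium, with GenAI answering questions along the way; once you are more comfortable, you will want to take on a new project. You need to be able to check your work, because the problem with AI is that people can't tell whether the output is correct. Data scientists almost never follow the scientific process or do rigorous statistical modeling, and people should have basic statistical literacy and AI literacy. AI is not a giant black box but a mathematical calculation, and data analysts and data scientists are your decision support. Statistical literacy, AI literacy, and domain literacy matter most. My favorite Excel function is SUMPRODUCT, because it saves a lot of thinking and steps.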


Chapters

Shifra clarifies the often-blurred lines between data science and business analytics, highlighting the core distinction: data science focuses on prediction, while business analytics emphasizes description and narrative.

Data science is predictive, focusing on building models and predicting relationships.
Business analytics is descriptive, explaining trends and crafting narratives using data.

Transcript


If you would like to earn CPE credit for listening to the show, visit earmarkcpe.com slash FPA. Download the app, take a short quiz, and get your CPE certificate. Finally, if you enjoy listening to FP&A Today, please go to your podcast platform of choice, click the subscribe button, and leave a rating and review of the show. And now, on to the show. From Datarails, this is FP&A Today.

Welcome to FP&A Today. I'm your host, Glenn Hopper. Today's guest is Shifra Isaacs, a developer relations advocate at Ascend.io, where she empowers data engineers through automation, education, and communication. Her background spans data science roles at Analec and JPMorgan Chase, analytics at Proz, and technical support at Sigma Computing. Shifra is also a skilled technical writer, having created content for Crash Course and DataLemur, where she's known for making complex data concepts accessible and engaging. Whether she's building models, writing code, or teaching others, Shifra brings a passion for data and a gift for demystifying it. We're excited to have her on the show to help introduce the world of data science to FP&A professionals and explore how finance teams can tap into these tools in the era of AI. Shifra, welcome.

Wow. Thank you so much, Glenn, for an absolutely glowing introduction. It's really cool to be here. Thanks for having me. My roots are actually in finance, which we can talk a little bit about. So it's cool to be getting back to those roots a little bit.

So listeners of the show know that I keep veering towards data science and data analytics. And there's the Venn diagram of FP&A and data science. You know, there's a lot of crossover, but there's difference. And I'm pushing for, and we're going to get into this, but I'm pushing for with generative AI that we're going to run out of excuses to not become data scientists because if you don't have to learn Python, you don't have to learn SQL, you can do it in natural language. It's making it more accessible. So that's why you're on the show and I really appreciate you coming on. And I did not know when I asked you that your roots are in finance. So before we dive into the planned questions, tell me a little bit about that.

Yeah, absolutely. So I got into Rutgers University School of Arts and Sciences and also Rutgers University Business School when I was applying to colleges. And I'm a pretty indecisive person, which is something that I've been working on the last couple of years. And I decided to pick the school that had the fewest majors so that I wouldn't be stuck

between 100 different majors to choose from. And the business school had six and the arts and sciences school had like 150 or something. So I went for the business school. And that means that I was really immersed in finance, accounting, supply chain management, and all these kind of areas of the Rutgers Business School majors. And I took classes, I did internships, and I eventually worked in AI and data science at JPMorgan Chase for an internship. So I feel like that's probably the biggest intersection between our roles here, which we can definitely get into. And I think it really helps to understand debits and credits and the other side of a transaction when you're talking about, you know, the cost of data infra, things like that. It really helps to speak the language of your finance stakeholders as a data scientist or analyst. So there are a lot of valuable tools for each side to learn from each other.

Yeah. And I think that's, I mean, having spent time in another domain too, I mean, I think it shifts your brain a little bit to kind of real world examples where you saw the kind of data that was being used in finance and maybe that feeds into your data science knowledge as well. And so, well, so you started in business and how did you get to business analytics?

Yeah. So business analytics was a major at Rutgers, or it still is, rather. And I actually got into it through a very unorthodox path, which was I was super interested in music. And there was a data science society club that had just started at my school, led by someone who was in the business analytics major called Shreena R. Anand. She's awesome. You can look her up. I think she's a cool founder now. And back in the day, she was hosting a seminar workshop with a guy who worked at Spotify, a data scientist.

And I was really much more going for the Spotify piece than the data science piece because I originally wanted to work in music. And when I heard him talking about statistics and modeling and solving business problems, it kind of resonated with me for a different reason, which was that I took statistics in high school and I really liked the idea.

that statistics is kind of the closest thing we have to truth and that we don't prove things in science. We kind of disprove things and work from there. And it kind of resonated with me unsuspectingly for a different sort of reason. And once I knew that this was a career path that I was like one step away from pursuing, I was like, this seems really cool. I should go for this and see where it takes me.

That's interesting because one of the first exposures I had to data science was one of my professors had done a study using stylometry, stylometry, I'm not sure the pronunciation, but it's a statistical method that he used it to analyze Beatles lyrics.

to determine which lyrics were written by John Lennon and which were written by Paul McCartney. And that was one of the first studies I saw was going through and analyzing the lyrics. And I didn't, you know, I was so green in data science that I couldn't even fathom how that was done. But it was an interesting, interesting way to come into data science through the music. Yeah, that's super cool. I have to look that up after this. I've also done like basic lyric analysis, but that sounds like a whole set of different methods. It sounds really cool.

All right, well, let's get into your background and why I've asked you to come on the show today. Since finishing school, you've moved around to a couple of different jobs. Tell me, so you're at Ascend.io now, and tell me about what you're doing there.

Yeah, absolutely. So Ascend.io is the unified data engineering and agentic data engineering platform. We actually just had our big launch this week where we launched all of our cool AI and agentic features to really make AI native to data engineering rather than just a lot of companies slapping a chatbot in the sidebar. We've done a lot more than that.

And what I do at Ascend is pretty much developer relations, or DevRel for short. And that position really sits at the intersection of marketing, engineering, and product. So I'm building product narratives. I'm building developer community and building trust with that community. And then also working on engineering efforts, maintaining a ton of documentation and really getting to touch a lot of different areas because the company is 17 people.

Wow. Yeah. So that's a cool time to be at a company, when they're that small and you can just kind of move across the board, wear a lot of different hats. Some of my best learning, you know, better than I got in business school, was being in the startup space. So super cool. It's interesting that you have these kinds of crossover skills because in FP&A there are requirements for a lot of different skills too. So I know you have a technical writing background, but technical writing kind of goes along with what we do in FP&A in that it's great if you can build a wonderful model and do all sorts of forecasting and analysis on data. But if you can't convey that story, it falls short. So when I started my career in the Stone Age, we didn't really talk about that. But technical writing actually goes really well with data science and FP&A.

Totally. And in the data space, we call it data storytelling, where you can, again, build the coolest machine learning model in the world or do the best SQL analysis of all time. But if you can't convey it and get executive level buy-in from people who are not necessarily familiar with all the technical details of what you're doing, then that's not going to bring value. That's not going to go anywhere. And one of my favorite data scientists, Tina Huang, who used to work at Meta, always says that as a data scientist, you're basically a lawyer defending your project and why it matters. And I'm sure that FP&A people and people in finance need to do the same thing all the time.

Yeah. And it's funny, it's not just defending your analysis. It's also asking the why questions. Ask why, why again, why again, to keep digging deeper. And it's the same kind of motivation. It's just using different tools. I mean, I get it. You know, if you went and got an MBA or a master's in finance or a master's in accounting, you've already chosen your path and you've got, you know, your domain expertise that you're working on. And then to hear, well, now

you have to learn to write Python and write SQL. And actually more and more FP&A people can write SQL queries because that's just the way we get to so much of our data. And I think even in data science, I mean, I still love Excel, but a lot of FP&A teams really, they rely on Excel and traditional driver models. But I want to start exposing them to, and I want to talk about

sort of classical machine learning and all that before we dive into generative AI, but maybe in a sentence or two, tell us why finance professionals should care about data science methods and maybe now more than ever.

Totally. So I would say that a lot of what I've heard about financial modeling and forecasting specifically is pretty much watered down data science. It's like if you could learn data science in a week instead of a year, then you would be learning how to predict based on the last three months of sales rather than doing a full-fledged time series model with a fancy method like ARIMA, which stands for autoregressive integrated moving average. And basically you're just doing like

you're going halfway toward data science without full sending it. So you might as well learn the full reason. And again, the why behind what you're doing. Why does this work? And how can I fully immerse myself in getting the best forecast possible? So there's that. And then honestly, for people building their career, things like quant analysis are some of the highest paying jobs. If you want to level up your career and get to this really well-paid intersection of data science and finance, there's pretty much no reason to not take the further step.
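To make that contrast concrete, here is a minimal sketch in Python that compares a "last three months" average with a simple autoregressive forecast. It is an illustration under stated assumptions, not the method anyone on the show uses: the sales figures are made up, and the AR(1) fit is only the autoregressive piece of ARIMA (no differencing, no moving-average terms), written out by hand so the idea is visible.

```python
from statistics import mean


def three_month_average_forecast(sales):
    """Forecast next month as the plain average of the last three months."""
    return mean(sales[-3:])


def fit_ar1(sales):
    """Fit y[t] = intercept + phi * y[t-1] by ordinary least squares.

    This is only the AR(1) piece of ARIMA, written out by hand.
    """
    previous, current = sales[:-1], sales[1:]
    prev_mean, curr_mean = mean(previous), mean(current)
    covariance = sum(
        (p - prev_mean) * (q - curr_mean) for p, q in zip(previous, current)
    )
    variance = sum((p - prev_mean) ** 2 for p in previous)
    phi = covariance / variance
    intercept = curr_mean - phi * prev_mean
    return intercept, phi


def ar1_forecast(sales):
    """Predict the next value from the most recent one using the fitted AR(1)."""
    intercept, phi = fit_ar1(sales)
    return intercept + phi * sales[-1]


# Hypothetical monthly sales with a steady upward trend (illustrative only).
monthly_sales = [100, 104, 109, 113, 118, 124, 129, 135]

naive = three_month_average_forecast(monthly_sales)
ar1 = ar1_forecast(monthly_sales)
print(f"3-month average forecast: {naive:.1f}")
print(f"AR(1) forecast:           {ar1:.1f}")
```

On a trending series like this one, the three-month average lands below the latest month because it only looks backward, while the autoregressive fit carries the trend forward. A real ARIMA model would normally come from a statistics library rather than hand-written code.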

Yeah, because you're already down the path and thinking a certain way. And it's not like Excel is a super intuitive tool. I mean, you have to learn Excel. So if you took the time to learn that, take the step further and, you know, R is a great gateway drug, right? So R is a little bit easier than, I don't know, it's a, you know, start there. You know, you can do ARIMA in Excel. It's just painful. It's just, you know, going through and de-seasonalizing and de-trending and doing all that stuff and then doing it back over. It's just not practical. There's a lot of other drivers rather than just doing trend analysis or time series analysis for forecasts, but that's always a great baseline. And then, you know, you go from there and you start playing around in Excel with drivers. And it's funny because in finance, a lot of times those drivers, you know,

are more someone's opinion than they are based on anything. And what I always say with data science is it lets you go from that opinion or that hunch to a hypothesis and something that you could prove and getting to that data-driven sort of mindset. But, you know, on this show, they hear me all the time talk about this, but I guess from your perspective, so imagine you're briefing a senior FP&A director and say, this is someone who's never coded.

And he or she is telling you what they do and you're trying to explain to them what data science is and how it's different from sort of what they may see as classic business analytics. Yeah, that's a really interesting question. And before I answer, I just need to quickly say, I would absolutely love to see your Excel ARIMA model sometime. I didn't know that could be done and I would love to see it. It's been a minute since I touched time series. Getting back to your question, what is the difference between data science and business analytics?

So I think that the line can be really blurry sometimes. And typically business analytics can be like the end product, right? So maybe you built a machine learning model, which is totally in the realm of data science, but now it needs to be in a dashboard and now it's analytics again. So the line can definitely be fuzzy, especially with job titles. But the biggest difference for me is kind of descriptive versus predictive. And I would say that analytics is very descriptive, even in the sense where like a big part of your job as a data analyst is to explain a trend.

sales dip this month, tell me why. And now you're crafting narrative, you're storytelling, you're understanding the business. And the math is just kind of like a tool that you use to do that. Whereas with data science, a lot of your job is predicting, building models, predicting relationships, predicting categories. And yeah, basically, I would pull the lever from descriptive to predictive.

Yeah. So I do see in finance teams, you know, there will be separate teams of data scientists and then there's FP&A, and FP&A is a customer of the data scientists. So they get the data from them. But if you have a shop where everybody's working together, you go from descriptive to predictive to prescriptive, where you are like, OK, here's the look at our past data, here's a prediction of where it's going, and here's what we can do to change the future. Once you identify, and that's through KPIs and finding the right levers to pull and all that, that's sort of the dream. That's the end of the rainbow for the full digital transformation to data-driven decision-making. And there are huge companies, and not a lot of companies are really there, but it's an aspirational state probably more than a realistic one for most companies.

At big companies that do have data science and FP&A teams, well, in FP&A we do go get data ourselves, but a lot of times the data and information we get comes from data scientists. We don't go, you know, query Snowflake and get information out. So we're at the mercy of what they're able to pull. And I think if we haven't worked in that world, a typical data science lifecycle is a little bit different than what we do in FP&A. So kind of the, you know, ingest, clean, model, deploy, monitor. Could you walk through what that is from a data science perspective? Because I think, even if we're not in Python, it's an interesting thing to visualize and go through because a lot of the same steps happen in FP&A.

Yeah, totally. This is something that data analysts will even do in Excel for the first couple of steps.

So the ingest stage really depends where you are, but essentially you're pulling data that is not yet ready to be modeled from some source. So this could be super raw if you're a data engineer, for example, and you're pulling some unstructured data that needs to then be put into a table. For an FP&A analyst, it's probably just getting a data set in an Excel CSV or something. And this is your raw data that you're starting out with, essentially. Then that data needs to be cleaned or reformatted

depending how you use it. And this gets turned up to 11 for data science because we have this system called encoding, which I'll just touch on very briefly here, where, for example, if you have a true or false, like binary sort of column, you're not encoding that to a machine as the words true and the words false. What you'll probably do is turn that into a numeric flag that's either zero for false and one for true.

But you need algorithms to do that. And the more categorical options you have, the more complicated that can sort of become depending on the analysis you're going for. So you're cleaning it to make it human readable for an FP&A analyst. And for a data scientist, you're cleaning it to make it machine readable for your algorithms, essentially. So that's pretty much the cleaning piece.
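The encoding step described here can be sketched in a few lines of plain Python. The column names and values below are hypothetical, and in practice a library routine (for example `pandas.get_dummies`) would usually do this work. The sketch only shows what "machine readable" means for a true/false column and for a column with several categories.

```python
def encode_flag(value):
    """Map a true/false value to the numeric flag 1/0."""
    return 1 if value else 0


def one_hot(values):
    """One-hot encode a categorical column: one 0/1 column per category."""
    categories = sorted(set(values))
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical columns an analyst might receive in a CSV export.
is_recurring = [True, False, True]
region = ["EMEA", "APAC", "EMEA"]

print([encode_flag(value) for value in is_recurring])
for row in one_hot(region):
    print(row)
```

Each extra category adds another 0/1 column, which is one reason the encoding gets more complicated as the number of categorical options grows.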

And then the next piece would be preparing the data for modeling. Sometimes this can include, you know, pulling down the dimensions. So taking really complicated data and projecting it onto simpler dimensions so that it can run faster. It can include scaling your data. If your data goes from like one to a billion, you might want to scale those relationships down to a range like one to a hundred to get it

more optimized performance wise. And then you're going to actually choose your model. So we call this base model selection, because you need to figure out, you know, which models you're actually doing, which models you're actually running. And the best way to do this is to choose a baseline model. So for example, for a regression analysis between like variables x and y and understanding their relationship, you'll pick a very simple linear regression or multiple linear regression where it's just drawing a line between those two relationships. But you might

land on something much more complex like random forest or extreme gradient boosting, which we affectionately call XGBoost. And so there's a whole process of even deciding what the hell kind of model am I going to actually use that's best suited for my use case and what I need to get out of this.
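The scaling step mentioned above (squeezing values that run from one to a billion into a range like one to a hundred) can be sketched as min-max scaling. This is a minimal illustration with made-up revenue figures; the function name and the default range are assumptions chosen to match the spoken example, not a prescribed method.

```python
def min_max_scale(values, new_min=1.0, new_max=100.0):
    """Linearly rescale values so the smallest maps to new_min and the
    largest maps to new_max."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no spread, so map everything to new_min.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lowest) / (highest - lowest) * span for v in values]


# Hypothetical revenue figures spanning one to a billion.
revenue = [1, 250_000, 1_000_000_000]
scaled = min_max_scale(revenue)
print(scaled)
```

Note that a linear rescale keeps the relative gaps, so the middle value still sits very close to the bottom of the new range. When the spread is this extreme, a log transform before scaling is a common alternative.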

And then from there, once you decide those models and you have a comparison to say like, oh, the baseline accuracy was 60%, but the final model was 80%. And we know that we have this like 20% gain across those stages. Then once you have that, you're going to want to deploy those models, keep them running on some cadence, maybe every week or something for a weekly deliverable. And then you're going to want to monitor that to make sure that you're keeping aware of things like data drift, schema drift,

or other kinds of issues where either your input data is changing, the use case is changing, there's some kind of bug in the model, and that you have essentially automated alerts set up so that you can be fixing those issues because maintaining is the hardest part of the cycle. I think this is something people don't know, that building is all exciting and fun, maintaining and getting paged at three o'clock in the morning to fix a problem, not so much fun. You really want to set yourself up for success with a lot of guardrails on the deployment stage. I know that was a lot. Did you want to double-click on any of that?
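A monitoring check of the kind described here could look roughly like the sketch below. Everything in it is a hypothetical illustration: the expected column names, the three-standard-deviation threshold, and the sample batches are assumptions, and a production setup would route the alerts to a paging or messaging system rather than return a list.

```python
from statistics import mean, stdev

# Hypothetical schema the model was trained on.
EXPECTED_COLUMNS = {"customer_id", "region", "monthly_spend"}


def schema_drift(rows, expected=EXPECTED_COLUMNS):
    """Return the columns that are missing or unexpected in a new batch."""
    seen = set().union(*(row.keys() for row in rows))
    return {"missing": expected - seen, "unexpected": seen - expected}


def data_drift(reference, current, threshold=3.0):
    """Flag drift when the new mean sits more than `threshold` reference
    standard deviations away from the reference mean."""
    shift = abs(mean(current) - mean(reference)) / stdev(reference)
    return shift > threshold


def check_batch(reference_spend, rows):
    """Collect alert messages for one incoming batch of rows."""
    alerts = []
    drift = schema_drift(rows)
    if drift["missing"] or drift["unexpected"]:
        alerts.append(f"schema drift: {drift}")
    spend = [row["monthly_spend"] for row in rows if "monthly_spend" in row]
    if spend and data_drift(reference_spend, spend):
        alerts.append("data drift: monthly_spend moved away from the training range")
    return alerts


# Illustrative data only.
reference_spend = [100, 110, 95, 105, 90, 100]
good_rows = [
    {"customer_id": 1, "region": "EMEA", "monthly_spend": 98},
    {"customer_id": 2, "region": "APAC", "monthly_spend": 104},
]
bad_rows = [
    {"customer_id": 3, "region": "EMEA", "monthly_spend": 400, "plan": "pro"},
    {"customer_id": 4, "region": "APAC", "monthly_spend": 420, "plan": "pro"},
]

print(check_batch(reference_spend, good_rows))
print(check_batch(reference_spend, bad_rows))
```

Running a check like this on every scheduled model run is one way to find out about a changed input before the three o'clock page.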

So a couple of things dawned on me when you were saying that, and this has never occurred to me before. Building machine learning models and monitoring, watching the drift is second nature. But I think about how many models over the years I've built in Excel and these models get passed around. They break all the time. Every time it touches sales and marketing, it falls apart.

And it's a static file. Yes. And nobody, it's not even spoken. It's not like, oh, monitor the model. And it's not even because it's being emailed around and all that. It's in like this fluid state. And it's just, it's funny. So model drift in something that, you know, you build the machine learning model. And I know the data changes and all that, but you don't think about the drift with that. But in Excel, it's just sort of accepted. It's like, well, let me go find where the XLOOKUP was broken and go there, you know, go dig through and trace back my formulas and see what went wrong here. But that's a real problem in Excel, but we just sort of deal with it and just go find where the formula is broken or, you know, rebuild it.

Yeah, it's wild. And I think it's really just because Excel is not built for software engineering workflows. In software engineering, we have a term that people might not know, CI/CD, which is continuous integration, continuous deployment. And it's really this overarching principle of like the work is never done. Maintenance is a big brunt of the work.

There's what we call unit tests for everything. So every time you add a new piece of code to your code repository, it's being thoroughly tested before it can even make it in and being tested from different angles in different ways. And unfortunately, this is just something that's not as widespread in the finance community, I think, for things like static Excel sheets.
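For readers who have not seen one, here is a minimal sketch of a unit test using Python's built-in `unittest` module. The function under test and the line items are hypothetical. The second test is the code equivalent of manually checking that the numbers tie.

```python
import unittest


def total_by_department(line_items):
    """Sum (department, amount) line items into one total per department."""
    totals = {}
    for department, amount in line_items:
        totals[department] = totals.get(department, 0) + amount
    return totals


class TotalsTieOut(unittest.TestCase):
    def setUp(self):
        # Hypothetical ledger lines (illustrative only).
        self.line_items = [("Sales", 1200), ("Marketing", 800), ("Sales", 300)]

    def test_department_totals(self):
        self.assertEqual(
            total_by_department(self.line_items),
            {"Sales": 1500, "Marketing": 800},
        )

    def test_totals_tie_to_ledger(self):
        ledger_total = sum(amount for _, amount in self.line_items)
        rolled_up = sum(total_by_department(self.line_items).values())
        self.assertEqual(rolled_up, ledger_total)


# Run the tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalsTieOut)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

In a CI/CD setup, a command such as `python -m unittest` would typically run on every proposed change, and the change would be blocked if any test fails.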

And I would love to hear from you if you're willing to share and let me interview you for a moment. How do you think that finance can kind of take that mindset? Because I think it would be really helpful to not have to manually look to see where these numbers tie.

Yeah. And people are using different forecasting tools and other, I mean, we're not completely dependent on Excel like we used to be, but it is where we start. And it is, especially in a year like this, where you're doing a lot of, you know, redoing forecasts because who knows what's going to happen with tariffs from one minute to the next. And, you know, what could happen to supply chain and are we going to bomb Greenland or whatever?

There's a lot of moving parts this year. And I really, it's almost like the models are getting rebuilt. But then, I don't know, more and more is done in forecasting tools. And when it is in forecasting tools, there is sort of a tendency to think, oh, it's set it and forget it. There's not a mentality around, well, is it...

so deterministic. It just seems like, well, it worked last quarter. Why would it not work this quarter? And there's also maybe, you know, think about the number of features you're using in a machine learning model versus the number of drivers you might have in a forecasting model. I mean, maybe it's, maybe they are more set, but it's still going to be, you know, there's the drift in this case is really just formulas getting broken and or assumptions changing significantly enough, or you didn't build it with the right assumptions. And that's the other crazy thing is,

It's not like you've got a database of features and observations to go from. You're rethinking. And so if you built the model with only five drivers and it turns out there have been significant changes and now there's 11 drivers, well, now you just have to rebuild the whole model. So even if, whether you're in Excel or in another system. So really, I think that it is, we're doing ourselves a disservice by, I mean, my whole career has been, if there's going to be some kind of mundane thing that I have to do every

day or every month or every quarter or even every year, I'm going to spend a bunch of time upfront to automate it so that I don't have to do that exact same thing again. But it's very hard to do that if you're only doing it in Excel, which I just always think of as like a two-dimensional representation of the world. Whereas if you have

all these features and you're building out a true machine learning model, you have a lot more flexibility with it. And it's a lot easier to blanket change things across the board if you're just going through and adding Python to it. You're so tied in in Excel to the original set of features that you used, it's not very flexible to just add new ones or to change between them. Totally. And I was thinking about this when Python for Google Sheets came out, or was it Python for Excel came out? And I was like,

this helps a little bit, but it's like you said, it doesn't really give you the full three-dimensional representation of like what a regular repository of Python files can really open you up to.

Yeah, and I'll tell you what I think Python for Excel is. It is an interim state until they finally nail Copilot and have Python native to it, because when Copilot is fully integrated, it's going to rely on that Python. So I really think this is like an interim step because a lot of people, you know, who write Python do spend some time in Excel. Not many people who spend most of their time in Excel write Python. So it's a very small group that's going to be using both, but I really hope it's just an interim step as they move towards fully integrating Copilot, which they're a long way from now, but you can see the end game there.

Yeah, that's a really interesting point. I'm not super up to date with all the Microsoft integration, so it's good to know what's going on there.

Another thing that you said that I latched onto was when you were talking about embeddings. I have not read the paper yet, so I probably shouldn't even bring it up, but there was a paper out of Cornell. It's called Harnessing the Universal Geometry of Embeddings. The interesting finding was that all language models are converging on the same, like the platonic ultimate form, that same universal geometry of meaning, and that researchers were able to translate between any models' embeddings without seeing the original text. So pretty, pretty amazing.

That's one of those weird, it almost has like philosophical dimensions to it, where these embeddings that were created differently all sort of converged in the same vector space. It's just like there's a single Rosetta Stone for them. I don't know. I'm curious to read the paper and I think I've got my weekend reading set out for me. But that way, when you said embeddings, I just remembered I read that on the plane coming back today.

Totally. I don't even remember saying embeddings. I guess I blacked out a little bit when I was going through that. But that's a really cool paper that I need to check out. And I think that this universal mathematical representation, it does have big philosophical implications because we have translation models between very different kinds of languages like English and Mandarin Chinese. And like, how do we universalize semantic meaning across them? And it's with numbers and it's crazy. And yeah, it's awesome.

So, you know, another thing that I've realized, and this was sort of an unlock for me when I was early learning about machine learning, it seemed so vague and I couldn't grasp it until I realized that machine learning really does two, well, three things, being regression, classification, and clustering. And so all this learning from data and all that. And when you think about it like that, I mean, so regression, you know, in time series analysis or whatever, it's just prediction. Well, actually, you know what? I ramble on about this a lot, so I'm going to actually turn it over to you and ask you to give us kind of the 90-second tour of those, and then maybe your thoughts on which of the model families map most naturally to forecasting and variance analysis and risk scoring, things we do in FP&A.

Totally. So I really like the little overview you gave. So the three categories of machine learning, basic classical machine learning, are regression,

clustering, and classification. They really answer three different types of questions. The best thing you can do when you're learning machine learning is just think about the kinds of problems you want to solve.

and then figure out which kind of method will help you. So for regression, you're asking, what is the relationship between X and Y? Two different things. And the interesting thing about time series is that it typically involves autoregression. That's the AR in ARIMA. So instead of predicting Y from X, you're predicting X from X, which is what makes it so interesting and why it's like a totally different method (bless you), where you're not using ordinary least squares. You're using all these other kinds of, like, Calc II types of series terms. So that's a total aside, but just to bring it back,

regression is answering what is the numeric relationship between X and Y such that when I get 10 more customers, how much will sales go up by? That's an example of a regression question. And the answer to a regression question is always a number. So that's a good little pro tip for people. And then when it comes to classification, you're talking about categories.

So is this person going to default on their loan or not? I'm going to predict yes or no. And the answer to a classification question, a binary question with two options, is typically a yes, no, true, false, red, blue, that type of question. You can get more options. So like I once built a model, my first ever machine learning model was a multinomial classification between six different loan buckets. So that was kind of an interesting random thing that I did. And it gets more and more complicated the more options you have at the end of the day.
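As a small illustration of a classification question, the sketch below labels a borrower "default" or "repaid" by majority vote among the most similar past borrowers (k-nearest neighbors). This is a stand-in technique chosen because it fits in a few lines; it is not the model described in the conversation, and the features and history are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(training, features, k=3):
    """Label `features` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda example: dist(example[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical borrowers:
# (debt-to-income ratio, missed payments last year) -> outcome.
history = [
    ((0.10, 0), "repaid"),
    ((0.15, 1), "repaid"),
    ((0.20, 0), "repaid"),
    ((0.55, 4), "default"),
    ((0.60, 3), "default"),
    ((0.70, 5), "default"),
]

print(knn_predict(history, (0.12, 0)))
print(knn_predict(history, (0.65, 4)))
```

Here the missed-payments column dominates the distance because it is on a larger scale than the ratio, which is one practical reason features are usually scaled before modeling. A multinomial problem like the six loan buckets works the same way, with more than two possible labels.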

So the answer to a classification question is a category. Then the final group, clustering, is all about grouping data points into natural segments based on some kind of similarity. And that similarity can be like, oh, if I put some central points on a board, which one is it the closest to? Or if I compare two words, if I compare Glenn and Shifra, how many letters do they actually have in common? I think we might have none. So we would have like a zero similarity score by most string similarity metrics. And you can kind of see how you'd pull these levers, change parameters,

which algorithm am I actually going to use? And you can see how many opportunities there are to really customize your analysis. Getting into the relationship between each of these with FP&A for something like regression, that's where forecasting comes in. And that's where we talked about forecasting

based on the previous forecasting numbers, which would be a time series analysis. That's what you see with investments, people looking at, you know, the change in a stock price over time, cost projections, or the example I gave, which is like, if this thing happens, then how will sales or some other variable be impacted?
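Returning to the two similarity ideas mentioned for clustering a little earlier (shared letters between two names, and distance to central points on a board), here is a minimal sketch of both. The Jaccard measure over letter sets is one possible string similarity chosen for illustration, and the customer points and centers are hypothetical. The assignment function is only the "which center is closest" step. A full algorithm such as k-means would also move the centers and repeat.

```python
from math import dist


def shared_letter_similarity(a, b):
    """Jaccard similarity of the letter sets of two strings (case-insensitive)."""
    letters_a, letters_b = set(a.lower()), set(b.lower())
    union = letters_a | letters_b
    if not union:
        return 0.0
    return len(letters_a & letters_b) / len(union)


def assign_to_nearest(points, centers):
    """Group each point under the index of its closest center."""
    clusters = {index: [] for index in range(len(centers))}
    for point in points:
        closest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
        clusters[closest].append(point)
    return clusters


print(shared_letter_similarity("Glenn", "Shifra"))

# Hypothetical customers: (monthly orders, average basket size).
customers = [(1, 20), (2, 25), (9, 80), (10, 95)]
centers = [(1, 22), (10, 90)]
print(assign_to_nearest(customers, centers))
```

The two names share no letters, so the similarity comes out as zero, matching the guess made in the conversation.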

So that pretty much covers basic FP&A use cases for regression. Moving on to classification, you would go into the risk modeling that I mentioned. So you're scoring credit risk. You're saying, hey, can I afford to give this person a credit card? Can I trust them to pay it back? Mortgage predictions of is this person going to default on their loan? Payment delinquency, things like that.

And then with clustering, for FP&A use cases, it might be useful for variance analysis. I'm not super familiar actually with clustering use cases for FP&A, but I would love to hear if you've been involved in that type of use case before. The coolest thing about clustering is, so think about churn prediction or cohorts in e-commerce. So you would cluster customers by cohort.

Okay, these are the customers who bought the first time in January. These are the ones that bought in February. And so their cohort, that's a group. But then it could be these are, you know, whatever demographic information, if you're direct to consumer or whatever, you know, this is a female 25 to 40, whatever, you know, lives in this area.

zip code, estimated household income, whatever. We just have these sort of, like think of a marketing persona. We just default to sort of the human understanding of clustering. But in machine learning with clustering, if, and I think about this because I started my career in telecom, like there's all kinds of similarities that customers cluster together that aren't based on any rules we put on them, but on behavior or on what's happened to them or whatever that we wouldn't even pick up on. And if you can run a clustering algorithm and

see that, oh, customers who bought, you know, who signed up for service in November of '22 have, they got the immediate price raise and they also got two others. And then we had the big network outage and there's things that,

this like certain cohorts that we wouldn't have identified. So I think for churn prediction or for customer segmentation in marketing, and I know a lot of this sales and marketing has been using this for years, but also as FP&A gets more, I mean, there's starting to be more crossover where it was interesting to me that I think

starting out in finance and FP&A, when I did, you know, I kind of saw myself and my team as the original business analysts, but something happened with the early days of machine learning where finance was just, well, I don't have big data. I've just got the general ledger. So if I have three years of data, that's only three Marches. That's not exactly big data. So what do I do with that? But then over time, as FP&A has started to embed with other groups the same way data scientists do and work with other groups, then factoring in more information into our forecast, whereas sales and marketing early on, especially in e-commerce or SaaS companies or anywhere where you had that much customer data, they kind of jumped ahead for a while in use of it. So now

by FP&A being embedded with other groups and kind of working with sales and marketing and having these teams

that it's not just RevOps and FP&A, but there's a lot of crossover between them. That's what I'm seeing in clustering and it is cool. That's an eye-opener to people too because it shows machine learning or machines are very good at pattern recognition, patterns that we wouldn't see, finding correlations that we wouldn't see. I'm terrible about p-hacking if I'm trying to figure out something for a forecast and trying to find correlations that we might not have seen naturally and all that. I think that machine learning lets us do that.

Totally. And I think it's worth mentioning, just talking about machine learning and intelligence here, the difference between how we sort of prescribe to these models. So when you have a classification or regression model in a typical use case, what you're doing is you're going to label all the data and say, like, for example, if this is a credit risk model, then I'm going to label all my training data and say, okay, this person defaulted and this person didn't.

And now how do we predict from that? What we do is we take a subset, usually 20%, and we test on that data to prove that our model's working. But with clustering, it's very different. We're not labeling that data. We're literally just giving the machine learning model whatever dimensions we have in most cases and saying, hey, you tell me what's related. And that's why it's so powerful. That's why it feels so magical because even compared to these other typical machine learning use cases, it's very autonomous and you're not really telling it what to think.
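To make the "you tell me what's related" idea concrete, here is a minimal k-means sketch in plain Python. No labels are passed in, only the points and the number of clusters. The customer figures (monthly spend, support tickets), the `kmeans` helper, and the choice of k-means itself are assumptions for illustration; the episode does not name a specific algorithm.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: only the points and k are supplied, never any labels."""
    # Deterministic start for the sketch: the first k points are the centers.
    centers = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 2: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centers

# Hypothetical customers: (monthly spend in dollars, support tickets filed).
points = [(20, 1), (200, 9), (22, 0), (25, 2), (210, 8), (190, 10)]
assignments, centers = kmeans(points, k=2)

for point, cluster in zip(points, assignments):
    print(point, "-> cluster", cluster)
```

In practice you would usually scale the features first, since here the spend column dominates the distance, and you would try several values of k, which is the "is it three clusters, is it five" tuning mentioned in the conversation.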

Yeah, that's, I mean, it always surprises me when you start seeing those clusters and you don't, especially when you get it really dialed in, like, is it three, five, is it three clusters? Is it five clusters or whatever, when it starts to like really make sense. And yeah, that's, that's, it does feel like magic. It feels like seeing the matrix. Totally. You're, you're taking the red pill. Yeah.

FP&A Today is brought to you by DataRails, the world's number one FP&A solution. DataRails is the artificial intelligence-powered financial planning and analysis platform built for Excel users. That's right, you can stay in Excel. But instead of facing hell for every budget, month-end close, or forecast, you can enjoy a paradise of data consolidation, advanced visualization, reporting, and AI capabilities, plus game-changing insights giving you instant answers and your story created in seconds. Find out why more than a thousand finance teams use DataRails to uncover their company's real story. Don't replace Excel, embrace Excel. Learn more at datarails.com.

So there have been drag and drop tools, machine learning tools for years. And I was a big user of RapidMiner until they got bought a couple of years ago. And I got lazy with writing Python. And, you know, even with the drag and drop tools, you couldn't really, if you didn't understand the basics of data science, it's like handing someone who's never taken finance or accounting financial statements and asking them to make sense of them. It's like they could, you know, get a general idea, but they're not going to know the right questions to ask or where to look or what, you know, how things stand out. So it was hard, I guess, to sell finance professionals on, man, if you would learn Python or SQL, you know, if you could get these chops and build some really

cool models for your team, you'd have this superpower. And they were like, I'm having a hard enough time doing what I'm doing in a day. I don't have the time to learn a new language. So I guess it's sort of two questions because I was starting to go down one road. But before I ask that question, I guess I'm going to interrupt myself and ask another question.

You know, we're seeing how good generative AI is at writing code. And so developers, you know, there's like, is AI going to replace everything I do? And I've talked to some of my daughter's friends in college who were studying computer science, and they sort of have this sense of why am I even doing this if I'm going to be replaced by bots? But what's your sense right now?

Paul, do finance professionals need to learn to code or does anyone need to learn to code? Where do you think we are in the world of generative AI writing all of our code on, you know, sort of vibe coding prompts? Totally. Very apt, timely question. So I want to caveat by saying that we don't really know fully where this is going and where it is now might be very different than where it is the next year, the next five years, because we're accelerating at a crazy speed. I'm going to

echo the statements of Zach Wilson, who is the most popular data engineering creator on LinkedIn. He quit his 500k Airbnb data engineering job to teach the whole community. It's very cool. He's a cool guy. And he posted on LinkedIn this week that he believes that stakeholder management is

what is protecting your tech job from AI. So I think that finance has a lot of that. Product managers have a lot of that where your job is to sync with stakeholders, make sure projects are running, clarify business logic and business needs, and make sure that work is aligning with those needs. And I think that is a great place to be to protect yourself from AI automation. So I would say in terms of learning...

programming and building models, it's more important to have knowledge of the theory of how these models work and understanding how to read the outputs and see like, oh, is this model a piece of garbage or can we actually use this? So at this point in time, it's really important to understand the types of models that finance people need to be aware of, whether that's forecasting, risk modeling, et cetera, and then being able to read the outputs of those models.

What I will say is it's very hard to get deep knowledge on that without practice. So I would say that you should be learning whatever you need and doing projects, whatever you need to be able to have that conversation, to be able to manufacture consent and get buy-in with your team. And just think about it that way. Think about it from a perspective of driving value and building stakeholder management skills.

Did you ever use DataRobot? I've never heard of that. Okay, so it was super cool. I don't know what they're doing now, but it was a super cool, very expensive, drag and drop, badass machine learning platform. I was...

with a group that had access to it. And you'd have these people who had no idea what they were doing, no idea how to differentiate between any sort of machine learning model. Because that one thing DataRobot would do is you'd put in the data and tell it, you know, I want to predict whatever variable you want or whatever, and you could let it pick the model. So people would dump

data into this, to them, what was a completely, you know, opaque black box and just take the results and present them. It's like, how, how are you going to present that with any meaning when you don't know the right question to your point? Like if you don't know what a confusion matrix is, or if you don't know how to measure accuracy or precision, recall, F1 score, you know, all the ways that you determine a model is efficient,

then you're really dangerous and shouldn't be using the model if you can't explain or understand where it's right and where it's wrong. So it's kind of like I wouldn't give a first year junior sales person the financial statements and ask him to analyze them and then take his results without question. So that's the equivalent of turning this stuff over to a model when you don't understand it.
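For readers who have not met those terms, here is a minimal sketch of how a confusion matrix and the accuracy, precision, recall, and F1 scores are computed for a yes/no model. The labels are made up (1 = defaulted, 0 = did not default) and the `confusion_counts` helper is hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count the four cells of a binary confusion matrix."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn

# Hypothetical results for ten loans: 1 = defaulted, 0 = did not default.
actual = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / len(actual)   # share of all calls that were right
precision = tp / (tp + fp)           # of predicted defaults, how many were real
recall = tp / (tp + fn)              # of real defaults, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Accuracy alone can look fine while the model still misses a third of the real defaults, which is why the other three numbers are worth reading.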

Totally. And I feel like it's a good time to share a quick cautionary tale that maybe your audience can benefit from. I worked with a team once, not going to name names, not going to say where it was, a team of business managers who wanted this sort of black box magic data project. And this was the time when ChatGPT-3 had just come out and AI wasn't able to replace a lot of functions the way that it does now.

And these people said to me, we want to estimate the wallet size for our midsize, like medium business customers to see how much they can afford to pay for our tools. And I said, OK, cool. What's the data that you have? And they said, well, we have data from companies we've classified as small and companies we've classified as extra large. We would like you to build two models and take the average between them. And I was like, are you joking me?

And this is exactly the type of trap that you don't want to fall into as a finance professional, because you want to look like you know what you're talking about. You want to do the research and you want to work with a team of data scientists that respect you. And you need to approach these types of problems in good faith and know that this analysis is completely infeasible and that you're vastly overreaching in terms of what we call the relevant range of a regression problem.
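One way to picture the relevant-range problem in that story: before trusting a prediction, check whether the model ever saw data near the value you are predicting for. The sketch below is a crude heuristic and not a standard statistical test; the company sizes, the `has_nearby_data` helper, and the 50% tolerance are all assumptions for illustration.

```python
def has_nearby_data(observed, x, tolerance=0.5):
    """True if at least one observed value lies within +/- tolerance * x of x."""
    return any(abs(value - x) <= tolerance * x for value in observed)

# Hypothetical employee counts for the customers the models were built on.
small_companies = [10, 20, 35, 50]
extra_large_companies = [5000, 8000, 12000]
observed = small_companies + extra_large_companies

# A small business sits inside the observed data.
print(has_nearby_data(observed, 40))

# A midsize business falls in the gap between the two groups,
# so any estimate for it is a guess rather than a prediction.
print(has_nearby_data(observed, 500))
```

Averaging a small-company model with an extra-large-company model does not fill that gap, because neither model has any evidence about how midsize customers behave.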

Yeah, I don't know. We could get, I was about to get deep in the weeds there. I'm going to actually back off that a little bit because I'm now thinking, I work with a lot of companies on plans on how to roll out generative AI. And I talked to a guy who, he's not a coder, he's an FP&A analyst, and he had this report he had to do every day.

It took him about an hour. It was consolidating data from two billing systems. They were going through some kind of billing system integration. He was going to have to do this every day for months and months until the end of the year when the new billing systems were going to be integrated. And he spent about, and this was in the early days, so they weren't as good as they are now, but

he spent something like 14 or 16 hours going back and forth with, for whatever reason, bouncing back and forth between ChatGPT and Claude and writing scripts that he could put into, that he could automate this whole workflow. And he finally did it. And so, yeah, I guess it took him two full days of work

But if that was something he was going to have to do every week for months and months, he found the ROI. And I think about, I wonder how long that took him. And then it's like when I first started writing SQL queries, it would, you know, I'd leave off a comma or something that would take me, I could stare at it for hours, ended up not being very efficient, but.

Ultimately, you learn enough to be dangerous, but you also learn enough to understand what's happening with the code. So if you use Code Interpreter in ChatGPT now, you can click on it and write it, and it's all well-commented code and everything. But you don't understand what's happening in the for loop or whatever. So I'm wondering at this point, I don't think they have to be hardcore coders and work in a... It's not like we're going to go work in a production environment, but the ability to write code

worksheets, I don't know. Actually, I'm trying to answer my own question. How about I ask you, the guest? What do you think if somebody's not working in data science but they're adjacent to it, is it worth them learning the coding basics?

Software engineering is a very different skill set from data science. I always say that data scientists use Python the way that high school students use calculators. Yes. Like we use data science to make number do thing or make computer do math, whatever. But we are not expected to have deep knowledge of computer science algorithms and software engineering skills. So I would say like the basics of data analytics are,

really valuable for finance people, especially like you were saying with finance people doing their own SQL, like, okay, now you don't have to Slack somebody to see how many records there are in your table. You can SELECT COUNT(*) from your table all by yourself. And that's great.

In terms of programming itself, I would say programming is really just a tool for you. It's not worth learning like deep software engineering algorithms, dynamic programming, asynchronous programming, all these things. What is worthwhile is understanding how to do your basic Excel workflows in SQL and Python. If you're doing a SUMPRODUCT in Excel, know how to do that in Python. So learn how to use Python interchangeably with whatever you're already doing.
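Both of those everyday tasks fit in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory table; the `invoices` table, its columns, and the figures are hypothetical. It counts the rows with SQL and then reproduces an Excel-style SUMPRODUCT of units and unit price.

```python
import sqlite3

# Build a small in-memory table to stand in for a real finance table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, units INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, 3, 10.0), (2, 5, 4.0), (3, 2, 25.0)],
)

# "How many records are in my table?" answered without asking anyone.
row_count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]

# Equivalent of SUMPRODUCT over the units and unit_price columns in Excel:
# multiply the two columns row by row, then add up the products.
rows = conn.execute("SELECT units, unit_price FROM invoices").fetchall()
total = sum(units * price for units, price in rows)

conn.close()
print(f"rows: {row_count}, revenue: {total}")
```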

Yeah, actually, that's smart. Yeah, I like that. So let's go a step further. And I'm going to put you on the spot here. So if you can't think of something, that's okay. But I'm going to throw it out there. Think of...

like a lightweight, something that someone could do with generative AI. Either, I don't, you know, maybe they are writing code that they're going to do a Colab project or something, or that they're just going to do it within a ChatGPT conversation. But kind of a workflow that an FP&A analyst could try, like something they would normally do in Excel, put it into generative AI or write code and do it in a workbook. Something like predicting next month's

cash burn or forecast or budget analysis? Can you think of something like that that would be a good sort of,

gateway entry for an FP&A person to try out doing some data science with generative AI? Yeah, totally. And I just want to quickly call out there is a new tool that's pretty decent called the Data Science Agent in Google Colab. I'm pretty sure it's free to use. And it's a data science tailored AI agent that you can work with directly in the type of environment that a data scientist would work in. It's very friendly for beginners. So I wanted to call that out there that it's a great tool for finance people who might want to start on this type of project. I think cash burn, which you talked about, is a really good

basis for this. And something I really recommend to my students is multimodal learning. And what I mean by that is doing the same thing in a familiar medium and a new medium. So I'd recommend building some kind of cash burn regression model, which is maybe predicting next month's cash burn using historical data, doing that in Excel with something like the Data Analysis ToolPak or even Solver, I think you can use for that, where you're loading the historical data, you're enabling the ToolPak.

And then you're running the regression model, setting your correct cells as the dependent and independent variables, reading the outputs, and then replicating that analysis in Python. And what you can do is say, here's all the stuff I did in Excel. Now data science agent in Google Colab, now turn this into Python and ask it to comment and explain to you every step of the way. Because the whole point is you're not learning something new. You're learning how to do something you're familiar with in a new way.

And that's a lot less scary. And there's a lot less friction kind of holding you back, I think. I think that's actually a really good idea because one, it puts your mind in where you're thinking about the workflow of what you do. So whether you've coded or not, you're going to follow a logical workflow that is probably going to be how you would lay it out in coding. And I think back to...

When I first started Python, somebody gave me a book, How to Learn Python in a Day. And that was so, there's like, okay, that's, I'm not really gonna learn Python in a day, but you're reading a book rather than being, you know, interacting with something. And I think that that's learning, not full-on computer science, but learning how to sort of interact and do stuff with Python right now. If you could in real time do it and interact with ChatGPT or Gemini or whatever, that's a super cool way to break into it.

Totally. And one more thing to add on to that is we have this concept in engineering of pseudocode, which people also may not have heard of. And this is where we basically just write all the steps of a process down. And the idea is you could give that to different engineers. You could give that to somebody who codes in SQL, somebody who codes in Rust, somebody who codes in Java. They could all write the same program that does the same thing.

So if you can build whatever model that you're doing in Excel for forecasting or cash burn prediction or whatever, and just write down all the steps you did and then map those to Python. Now you're thinking like an engineer because syntax, the grammar of Python, it's just grammar. You need to know the process that you're doing.
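Here is a minimal sketch of that pseudocode-to-Python mapping for a cash burn forecast. Each numbered comment is a step you might have written down after doing the work in Excel, followed by the Python that carries it out. The monthly figures are made up, and a straight-line trend is only one possible model.

```python
# Step 1: load the historical data (hypothetical cash burn, in $ thousands).
months = [1, 2, 3, 4, 5, 6]
cash_burn = [100, 104, 108, 112, 116, 120]

# Step 2: choose the dependent variable (cash burn) and the
# independent variable (month number), as in the Excel regression dialog.
xs, ys = months, cash_burn

# Step 3: run the regression (least squares slope and intercept).
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Step 4: read the output: how much burn grows per month.
print(f"burn grows by about {slope:.1f}k per month")

# Step 5: forecast next month by plugging the next month number into the line.
next_month = months[-1] + 1
forecast = intercept + slope * next_month
print(f"month {next_month} forecast: {forecast:.1f}k")
```

The comments are the process; the code under each one is just the grammar, and the same steps could be handed to someone working in SQL or another language.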

Yep, yep. Love it. So for an FP&A person who's right now they're Excel only, and if they're interested in starting to dabble with data science, what do you think comes first? And I know my answer, I'll tell you mine after you say yours, but is it, do they dive into statistics more deeply? Do they learn SQL? Do they get some basic Python? Or is there some reason that Gen AI shortcuts some of that learning curve?

Uh, so what comes first is always whatever you're already doing and doing that with a new medium. So if your job is to, I don't know, group some data and make an Excel dashboard for your manager that shows like, okay, these are all of the things that are OpEx and these are all the things that are CapEx for this week. Learn how to do that in SQL. Group by. Do that first. So the first thing is whatever you're already doing in a new medium.
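As a sketch of that first exercise, the snippet below totals OpEx and CapEx with a SQL `GROUP BY`, run through Python's built-in `sqlite3` module so it needs no database setup. The `expenses` table, its columns, and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (description TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [
        ("Office rent", "OpEx", 4000.0),
        ("Laptops", "CapEx", 6000.0),
        ("Software subscriptions", "OpEx", 1500.0),
        ("Server hardware", "CapEx", 9000.0),
    ],
)

# Same result as a pivot table in Excel: one total per category.
query = """
    SELECT category, SUM(amount)
    FROM expenses
    GROUP BY category
    ORDER BY category
"""
totals = dict(conn.execute(query).fetchall())
conn.close()

for category, amount in totals.items():
    print(f"{category}: {amount:,.0f}")
```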

And Gen AI should answer questions along the way. Gen AI is your personal tutor that never gets sick of your BS and will always answer questions thoughtfully and make you feel like a genius. So that's a great tool at that stage. Once you're more comfortable in that new medium, you want to do a new project. And then once you've done something, then I would say learn statistics fundamentals, learn how to get around pandas or Polars DataFrames in Python. But it starts with just literally replicating a process you already know with a new tool, like I said.

That's how I think it should start. So I think in another universe, I was going to be a college professor and may still, maybe that'll be my retirement. Maybe I'll be an adjunct professor somewhere. But because I always think, and I think it's because it's the way that I came to it. Because when I was in school, when I was at school, we didn't, I guess when I went back and got my,

analytics certificate, then I had R. But when I was first studying finance, we were doing everything in Excel and it was learning those basic statistics concepts. But that really opened my eyes. It's very different building a statistical model and the way you pick features in statistics than the way you would. But I thought that that was so informative because I can remember before I was

CFO when I was doing someone else's bidding and they would tell me what to use as drivers and assumptions and all that. And I just thought, but you're just making that up, man. What do you. I'm so glad to hear you say that by the way. Yeah, no, I mean, basically that it's just statistics because I locked into that so early before I even had access or understanding to all these other tools. It was, it just, it kind of changed my way of thinking. But, but to your, your point is much more practical. It's much more

You're solving a problem. You're not just going off to some ivory tower of education. You're doing something that you're doing anyway, and your brain already thinks that way. So I think that's probably a more, a quicker path to actually doing something meaningful. Yeah. And you need to be able to check your work because again, the problem with AI is people not being able to decide if the output is correct.

So the only way to know that is if you already know the domain and you already know what the answer should be. And so if you're doing a process that you design that you're already quite familiar with, you're going to fare a lot better with AI as your teacher. One other thing I wanted to say on that is just to call out how statistics makes you think differently.

You definitely should be checking all your assumptions and like being a scientist in that way. But I will say that data scientists almost never follow scientific processes or rigorous statistical modeling. And just to give a background for people who are probably not familiar with this, even something as simple as like a linear regression where how does X predict Y? There's a bunch of assumptions about like, okay, your source data should be this shape in the distribution and it should have no outliers and it should have all these things. But when you're in the real world, this is the only method you have.

So even if you're not necessarily hitting all those benchmarks and hitting all those assumptions, there's nothing else you could do. So you then will end up going ahead and just making the best of it with the data that you have. But the rigorous thinking is still very helpful. - Yeah. So it's funny that you said that, and I thought about something as simple as EDA

in finance, you're not throwing out anything. You don't throw out an outlier because it's an actual data point or, you know, if you're using actual dollar items and not, you know, and you don't, you can't impute if there's a, you know, if there's an NA in a data set, you don't impute and just put in the median or whatever because that's,

it means dollars or whatever so it's funny thinking about the subtle differences here um between the between the two but that's only i mean that's only with financial statement data obviously if you're clustering and doing other types of machine learning modeling that support

the work that you do. You follow the data science principles, which brings me to, all right, I think two more questions because we're running out of time and I could talk about this stuff all day, but people are listening to us on the morning drive. We don't want to have them sit in the car and be late to work. Go to work. Have a great day. So I often say, and I really believe this, that every, as automation and AI take over more and more of kind of the

monotonous and more mindless tasks that tasks that aren't going to be replaced or the people who aren't going to be replaced are the people who

understand what's happening here and how data science works. So I'm always advocating for like everyone should learn data science basics. And I'm wondering what you think on that because you know, so there's a difference between someone who completely immersed in it, studied it. That's your area of domain expertise versus a sideline people. I mean, how much is a finance person or a sales and marketing person or an ops person really going to be a data scientist if they didn't fully commit to the

a degree in it or whatever. So, I mean, what do you think? Should we say, well, that's a whole other domain. I don't need to understand it. Or is there some level that we should understand? What's the workforce going to look like in the future with this? And would it be valuable for people, even though they're not working as a data specialist, to know these skills?

Yeah, I feel like there's a lot of questions in that. Again, not super sure how AI is going to advance over the next one to five years. I think what I'd like to touch on is the level of basic literacy that I think people should have if they have the time and money bandwidth to do so. And for me, that starts with statistical literacy. So we're talking stats one, distributions, outliers, basic hypothesis tests,

And this relates to politics, philosophy, your beliefs as a human. I want you to be able to read a scientific study and be able to read the abstract and the conclusion and know what the hell's going on. That matters to me. I think people need to be literate so that they can form their own opinions. So I think statistical literacy is the beginning. And then now AI literacy is really important where I want people to know that ChatGPT is not a massive black box.

and that it's math calculating the next most likely thing that it thinks you want it to say. And that's really important. So I think that there are a lot of one-pagers where you can learn basic statistics, transformers, which is the architecture behind LLMs like ChatGPT. Read a one-pager on what are transformers and why are they not magic? Because it's important for you to know that. It's important for you to know that the AI that's rejecting your insurance claim is not magic and you should challenge it.
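The "math calculating the next most likely thing" can be shown in miniature. In a language model, the final step turns a score for each candidate token into a probability using the softmax function. The sketch below covers only that last step, with three made-up tokens and made-up scores; a real model produces its scores with a transformer over a vocabulary of many thousands of tokens.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps the exponentials numerically stable.
    exps = {token: math.exp(s - highest) for token, s in scores.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}

# Hypothetical scores for what might follow "Next quarter's cash ...".
scores = {"forecast": 2.0, "budget": 1.0, "banana": -1.0}

probs = softmax(scores)
next_token = max(probs, key=probs.get)

for token, p in probs.items():
    print(f"{token}: {p:.2f}")
print("most likely next token:", next_token)
```

Nothing here is magic: it is arithmetic over scores, which is why the output can be confidently wrong and is worth challenging.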

Right. So I think that Stats 101 and AI literacy are the biggest things when it comes to machine learning. I would say that not everyone needs to know it. But what I will say is that for people who work in the corporate world, data analysts and data scientists, we are your decision support. Right. So it's our job to bring data and experiments to you and say, hey, here's what we found.

please go make a decision now. And I would say that whatever your teams are using to support your decisions as a CFO, as an executive, as a head of FP&A, as a leader, that's where you should get basic literacy on. So I guess to summarize what I said is Stats 101 literacy, so you can read scientific papers and form your own opinions about the world, AI literacy, so you can survive in the AI era, and then just domain literacy for the decisions that you're making in your organization.

Love it. Love it. Shifra, you are wise beyond your years. I'm old enough. I can say stuff like that. And maybe I don't know. I sound like a pedantic old man. I don't know.

All right. So I had a bunch more questions on here. We are running, running close to being out of time. Um, I have to hit these boilerplate questions or, or, or Jonathan, our producer will kill me. So when we ask everybody, what is something most people don't know about you? Confession time. Something most people don't know about me. I got an A on a piano recital when I was a kid. And, and do you still play piano?

A little bit. My main instruments now are like vocals and bass. I haven't picked up my bass in like two months or something, but this is a good reminder to get back on that. Music is awesome. All right. All right. And this one we ask everyone, and I don't know why we're not logging the answers here, but, and this would be interesting from a, you know, because you're in Python, you're doing work there, but we all end up kind of starting with Excel. That's always my starting point anyway. But we ask every guest, what is your favorite Excel function and why?

The only one that's really coming to mind for me is SUMPRODUCT. I really like SUMPRODUCT because it's matrix multiplication. And without it, you'd have to do so many steps in Excel. And I like things that save you a lot of thinking and steps.

It's a good one. And then there's also one in Google Sheets that I'll say, which is the Google Stocks, I think is the formula where it gives you the live stock price. So like, so if you're a tech bro working in Silicon Valley, like a lot of my friends, and you want to know how much money you make at any given moment with your equity in your total compensation package, that will keep it up to date for you, which is pretty cool.

Super cool. Before I let you go, I know you're doing a lot in sort of the educational space and that you are pretty active on LinkedIn. So if our listeners want to connect with you and learn more, how can they reach you?

Yeah, totally. So you can definitely find me on LinkedIn slash Shifra dash Isaacs. And that's where I post a lot of educational content, content about what I'm doing with Ascend, community building with some of my LinkedIn influencer friends. And yeah, you can find me there. My email's there. If you're interested in collaborating, talking, thinking about the future of AI data and finance, that's where I'll be. Well, cool. Shifra, thank you so much for coming on the show. Yeah, thank you for having me.
