{"id":4907,"date":"2020-07-19T11:46:00","date_gmt":"2020-07-19T11:46:00","guid":{"rendered":"https:\/\/azoora.com\/blog\/?p=4907"},"modified":"2020-08-18T12:00:38","modified_gmt":"2020-08-18T12:00:38","slug":"a-step-by-step-guide-to-the-data-analysis-process","status":"publish","type":"post","link":"https:\/\/azoora.com\/blog\/analytics\/a-step-by-step-guide-to-the-data-analysis-process\/","title":{"rendered":"A Step-by-Step Guide to the Data Analysis Process"},"content":{"rendered":"\n<p>Like any scientific discipline, data analysis follows a rigorous step-by-step process. Each stage requires different skills and know-how. To get meaningful insights, though, it\u2019s important to understand the process as a whole. An underlying framework is invaluable for producing results that stand up to scrutiny.<\/p>\n\n\n\n<p><strong>In this post<\/strong>, we\u2019ll explore the main steps in the <strong>data analysis process<\/strong>. This will cover how to define your goal, collect data, and carry out an analysis. Where applicable, we\u2019ll also use examples and highlight a few tools to make the journey easier. When you\u2019re done, you\u2019ll have a much better understanding of the basics. This will help you tweak the process to fit your own needs.<\/p>\n\n\n\n<p>We\u2019ll explore:<\/p>\n\n\n\n<ol><li>Defining the question<\/li><li>Collecting the data<\/li><li>Cleaning the data<\/li><li>Analyzing the data<\/li><li>Sharing your results<\/li><li>Embracing failure<\/li><li>Summary<\/li><\/ol>\n\n\n\n<p>Ready? Let\u2019s get started with step one.<\/p>\n\n\n\n<h2 id=\"1-step-one-defining-the-question\">01. Step One: Defining the question<\/h2>\n\n\n\n<p>The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the \u2018problem statement\u2019.<\/p>\n\n\n\n<p>Defining your objective means coming up with a hypothesis and figuring out how to test it. Start by asking: What business problem am I trying to solve? 
While this might sound straightforward, it can be trickier than it seems. For instance, your organization\u2019s senior management might pose an issue, such as: \u201cWhy are we losing customers?\u201d It\u2019s possible, though, that this doesn\u2019t get to the core of the problem. A data analyst\u2019s job is to understand the business and its goals in enough depth that they can frame the problem the right way.<\/p>\n\n\n\n<p>Let\u2019s say you work for a fictional company called Topnotch Learning. Topnotch creates custom training software for its clients. While it is excellent at winning new clients, its repeat business is much lower. As such, your question might not be, \u201cWhy are we losing customers?\u201d but, \u201cWhich factors are negatively impacting the customer experience?\u201d or better yet: \u201cHow can we boost customer retention while minimizing costs?\u201d<\/p>\n\n\n\n<p>Now that you\u2019ve defined a problem, you need to determine which sources of data will best help you solve it. This is where your business acumen comes in again. For instance, perhaps you\u2019ve noticed that the sales process for new clients is very slick, but that the production team is inefficient. Knowing this, you could hypothesize that the sales process wins lots of new clients, but the subsequent customer experience is lacking. Could this be why customers don\u2019t come back? Which sources of data will help you answer this question?<\/p>\n\n\n\n<h3 id=\"tools-to-help-define-your-objective\">Tools to help define your objective<\/h3>\n\n\n\n<p>Defining your objective is mostly about soft skills, business knowledge, and lateral thinking. But you\u2019ll also need to keep track of business metrics and key performance indicators (KPIs). Monthly reports can help you track problem areas in the business. 
Some KPI dashboards are paid products, such as\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/databox.com\/product\" target=\"_blank\">Databox<\/a>\u00a0and\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/www.dasheroo.com\/\" target=\"_blank\">Dasheroo<\/a>. However, you\u2019ll also find open-source software like\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/grafana.com\/\" target=\"_blank\">Grafana<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/freeboard.io\/\" target=\"_blank\">Freeboard<\/a>, and\u00a0<a rel=\"noreferrer noopener\" href=\"http:\/\/dashbuilder.org\/\" target=\"_blank\">Dashbuilder<\/a>. These are great for producing simple dashboards, both at the beginning and the end of the data analysis process.<\/p>\n\n\n\n<h2 id=\"2-step-two-collecting-the-data\">02. Step Two: Collecting the data<\/h2>\n\n\n\n<p>Once you\u2019ve established your objective, you\u2019ll need to create a strategy for collecting and aggregating the appropriate data. A key part of this is determining which data you need. This might be quantitative (numeric) data, e.g. sales figures, or qualitative (descriptive) data, such as customer reviews. Data sources are commonly grouped into three categories: first-party, second-party, and third-party data. Let\u2019s explore each one.<\/p>\n\n\n\n<h3 id=\"what-is-first-party-data\">What is first-party data?<\/h3>\n\n\n\n<p>First-party data is data that you, or your company, have collected directly from customers. It might come in the form of transactional tracking data or information from your company\u2019s customer relationship management (CRM) system. Whatever its source, first-party data is usually structured and organized in a clear, defined way. 
Other sources of first-party data might include customer satisfaction surveys, focus groups, interviews, or direct observation.<\/p>\n\n\n\n<h3 id=\"what-is-second-party-data\">What is second-party data?<\/h3>\n\n\n\n<p>To enrich your analysis, you might want to secure a secondary data source. Second-party data is the first-party data of other organizations. This might be available directly from the company or through a private marketplace. The main benefit of second-party data is that it is usually structured and, although it will be less relevant than first-party data, it also tends to be quite reliable. Examples of second-party data include website, app, or social media activity (such as online purchase histories) and shipping data.<\/p>\n\n\n\n<h3 id=\"what-is-third-party-data\">What is third-party data?<\/h3>\n\n\n\n<p>Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data). Many organizations collect big data to create industry reports or to conduct market research. The research and advisory firm Gartner is a good real-world example of an organization that collects big data and sells it on to other companies. Open data repositories and government portals are also sources of third-party data.<\/p>\n\n\n\n<h3 id=\"tools-to-help-you-collect-data\">Tools to help you collect data<\/h3>\n\n\n\n<p>Once you\u2019ve devised a data strategy (i.e. you\u2019ve identified which data you need, and how best to go about collecting them), there are many tools you can use to help you. One tool you\u2019ll likely need, regardless of industry or area of expertise, is a data management platform (DMP). A DMP is a piece of software that allows you to identify and aggregate data from numerous sources, before manipulating them, segmenting them, and so on. There are many DMPs available. 
Some well-known enterprise DMPs include&nbsp;<a href=\"https:\/\/www.salesforce.com\/eu\/products\/marketing-cloud\/data-management\/\" target=\"_blank\" rel=\"noreferrer noopener\">Salesforce DMP<\/a>,&nbsp;<a href=\"https:\/\/www.sas.com\/en_us\/software\/data-management.html\" target=\"_blank\" rel=\"noreferrer noopener\">SAS<\/a>, and the data integration platform&nbsp;<a href=\"https:\/\/www.xplenty.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Xplenty<\/a>. If you want to play around, you can also try some open-source platforms like&nbsp;<a href=\"https:\/\/pimcore.com\/en\" target=\"_blank\" rel=\"noreferrer noopener\">Pimcore<\/a>&nbsp;or&nbsp;<a href=\"http:\/\/www.dswarm.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">D:Swarm<\/a>.<\/p>\n\n\n\n<h2 id=\"3-step-three-cleaning-the-data\">03. Step Three: Cleaning the data<\/h2>\n\n\n\n<p>Once you\u2019ve collected your data, the next step is to get it ready for analysis. This means cleaning, or \u2018scrubbing\u2019, it. Key data cleaning tasks include:<\/p>\n\n\n\n<ul><li><strong>Removing major errors, duplicates, and outliers<\/strong>\u2014all of which are almost inevitable problems when aggregating data from numerous sources.<\/li><li><strong>Removing unwanted data points<\/strong>\u2014filtering out irrelevant observations that have no bearing on your intended analysis.<\/li><li><strong>Bringing structure to your data<\/strong>\u2014general \u2018housekeeping\u2019, i.e. fixing typos or layout issues, which will help you map and manipulate your data more easily.<\/li><li><strong>Filling in major gaps<\/strong>\u2014as you\u2019re tidying up, you might notice that important data are missing. Once you\u2019ve identified gaps, you can go about filling them.<\/li><\/ul>\n\n\n\n<p>Cleaning often takes up the majority of a data analyst\u2019s time, with some estimates as high as 70-90%. This might sound excessive. But focusing on the wrong data points (or analyzing erroneous data) will severely impact your results. 
It might even send you back to square one\u2026so don\u2019t rush it!<\/p>\n\n\n\n<h3 id=\"carrying-out-an-exploratory-analysis\">Carrying out an exploratory analysis<\/h3>\n\n\n\n<p>Another thing many data analysts do (alongside cleaning data) is to carry out an exploratory analysis. This helps identify initial trends and characteristics, and can even refine your hypothesis. Let\u2019s use our fictional learning company as an example again. While carrying out an exploratory analysis, perhaps you notice a correlation between how much Topnotch Learning\u2019s clients pay and how quickly they move on to new suppliers. This might suggest that a low-quality customer experience (the assumption in your initial hypothesis) is actually less of an issue than cost. You might, therefore, revise your hypothesis to take cost into account.<\/p>\n\n\n\n<h3 id=\"tools-to-help-you-clean-your-data\">Tools to help you clean your data<\/h3>\n\n\n\n<p>Cleaning datasets manually, especially large ones, can be daunting. Luckily, there are many tools available to streamline the process. Open-source tools, such as&nbsp;<a href=\"https:\/\/openrefine.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenRefine<\/a>, are excellent for basic data cleaning, as well as high-level exploration. However, tools like OpenRefine can become limiting with very large datasets. Python libraries (e.g. pandas) and some R packages are better suited for heavy data scrubbing, although you will, of course, need to be familiar with the relevant language. Alternatively, enterprise tools are also available. One example is&nbsp;<a href=\"https:\/\/dataladder.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Ladder<\/a>, which is one of the highest-rated data-matching tools in the industry. There are many more. Why not see which free data cleaning tools you can find to play around with?<\/p>\n\n\n\n<h2 id=\"4-step-four-analyzing-the-data\">04. Step Four: Analyzing the data<\/h2>\n\n\n\n<p>Finally, you\u2019ve cleaned your data. 
Now comes the fun bit: analyzing it! The type of data analysis you carry out largely depends on your goal, and there are many techniques available. Univariate or bivariate analysis, time-series analysis, and regression analysis are just a few you might have heard of. More important than the different types, though, is how you apply them. This depends on what insights you\u2019re hoping to gain. Broadly speaking, all types of data analysis fit into one of the following four categories.<\/p>\n\n\n\n<h3 id=\"descriptive-analysis\">Descriptive Analysis<\/h3>\n\n\n\n<p><strong>Descriptive analysis<\/strong>\u00a0<strong>identifies what has already happened<\/strong>. It is a common first step that companies carry out before proceeding with deeper explorations. As an example, let\u2019s refer back to our fictional learning provider once more. Topnotch Learning might use descriptive analytics to analyze course completion rates for their customers. Or they might identify how many users access their products during a particular period. Perhaps they\u2019ll use it to measure sales figures over the last five years. While the company might not draw firm conclusions from any of these insights, summarizing and describing the data will help them to determine how to proceed.<\/p>\n\n\n\n<h3 id=\"diagnostic-analysis\">Diagnostic Analysis<\/h3>\n\n\n\n<p><strong>Diagnostic analytics<\/strong>&nbsp;<strong>focuses on understanding why something has happened<\/strong>. It is literally the diagnosis of a problem, just as a doctor uses a patient\u2019s symptoms to diagnose a disease. Remember Topnotch Learning\u2019s business problem? \u2018Which factors are negatively impacting the customer experience?\u2019 A diagnostic analysis would help answer this. For instance, it could help the company identify correlations between the issue (struggling to gain repeat business) and factors that might be causing it (e.g. project costs, speed of delivery, or customer sector). 
Let\u2019s imagine that, using diagnostic analytics, Topnotch realizes its clients in the retail sector are departing at a faster rate than other clients. This might suggest that the company is losing retail customers because it lacks expertise in this sector. And that\u2019s a useful insight!<\/p>\n\n\n\n<h3 id=\"predictive-analysis\">Predictive Analysis<\/h3>\n\n\n\n<p><strong>Predictive analysis allows you to<\/strong>&nbsp;<strong>identify future trends based on historical data<\/strong>. In business, for example, predictive analysis is commonly used to forecast future growth. But it doesn\u2019t stop there. Predictive analysis has grown increasingly sophisticated in recent years, and rapid advances in machine learning allow organizations to make ever more accurate forecasts. Take the insurance industry. Insurance providers commonly use past data to predict which customer groups are more likely to get into accidents, and may then raise premiums for those groups. Likewise, the retail industry often uses transaction data to predict where future trends lie, or to determine seasonal buying habits to inform its strategies. These are just a few simple examples, but the untapped potential of predictive analysis is pretty compelling.<\/p>\n\n\n\n<h3 id=\"prescriptive-analysis\">Prescriptive Analysis<\/h3>\n\n\n\n<p><strong>Prescriptive analysis allows you to make recommendations for the future.<\/strong>&nbsp;This is the final step in the analytics part of the process. It\u2019s also the most complex, because it incorporates aspects of all the other analyses we\u2019ve described. The algorithms that guide self-driving cars, such as those developed by Google (the project now known as Waymo), are a good example of prescriptive analytics. Every second, these algorithms make countless decisions based on past and present data, with the aim of ensuring a smooth, safe ride. 
Prescriptive analytics also helps companies decide on new products or areas of business to invest in.<\/p>\n\n\n\n<h2 id=\"5-step-five-sharing-your-results\">05. Step Five: Sharing your results<\/h2>\n\n\n\n<p>You\u2019ve finished carrying out your analyses. You have your insights. The next step of the data analytics process is to share these insights with the wider world (or at least with your organization\u2019s stakeholders!). This is more complex than simply sharing the raw results of your work: it involves interpreting the outcomes and presenting them in a manner that\u2019s digestible for all types of audiences. Since you\u2019ll often present information to decision-makers, it\u2019s very important that the insights you present are 100% clear and unambiguous. For this reason, data analysts commonly use reports, dashboards, and interactive visualizations to support their findings.<\/p>\n\n\n\n<p>How you interpret and present results will often influence the direction of a business. Depending on what you share, your organization might decide to restructure, to launch a high-risk product, or even to close an entire division. That\u2019s why it\u2019s very important to provide all the evidence that you\u2019ve gathered, and not to cherry-pick data. Covering everything in a clear, concise way will help demonstrate that your conclusions are sound and based on the facts. At the same time, it\u2019s important to highlight any gaps in the data and to flag any insights that might be open to interpretation. Honest communication is the most important part of the process. It will help the business, while also helping you to excel at your job!<\/p>\n\n\n\n<h3 id=\"tools-for-interpreting-and-sharing-your-findings\">Tools for interpreting and sharing your findings<\/h3>\n\n\n\n<p>There are tons of data visualization tools available, suited to different experience levels. 
Popular tools requiring little or no coding skills include&nbsp;<a href=\"https:\/\/developers.google.com\/chart\" target=\"_blank\" rel=\"noreferrer noopener\">Google Charts<\/a>,&nbsp;<a href=\"https:\/\/public.tableau.com\/en-us\/s\/\" target=\"_blank\" rel=\"noreferrer noopener\">Tableau<\/a>,&nbsp;<a href=\"https:\/\/www.datawrapper.de\/\" target=\"_blank\" rel=\"noreferrer noopener\">Datawrapper<\/a>, and&nbsp;<a href=\"https:\/\/infogram.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Infogram<\/a>. If you\u2019re familiar with Python and R, there are also many data visualization libraries and packages available. For instance, check out the Python libraries&nbsp;<a href=\"https:\/\/plotly.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Plotly<\/a>,&nbsp;<a href=\"https:\/\/seaborn.pydata.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Seaborn<\/a>, and&nbsp;<a href=\"https:\/\/matplotlib.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Matplotlib<\/a>. Whichever data visualization tools you use, make sure you polish up your presentation skills, too. Remember: Visualization is great, but communication is key!<\/p>\n\n\n\n<h2 id=\"6-step-six-embrace-your-failures\">06. Step Six: Embrace your failures<\/h2>\n\n\n\n<p>The last \u2018step\u2019 in the data analytics process is to embrace your failures. The path we\u2019ve described above is more of an iterative process than a one-way street. Data analytics is inherently messy, and the process you follow will be different for every project. For instance, while cleaning data, you might spot patterns that spark a whole new set of questions. This could send you back to step one (to redefine your objective). Equally, an exploratory analysis might highlight a set of data points you\u2019d never considered using before. Or maybe you find that the results of your core analyses are misleading or erroneous. 
This might be caused by errors in the data, or by human error earlier in the process.<\/p>\n\n\n\n<p>While these pitfalls can feel like failures, don\u2019t be disheartened if they happen. Mistakes are a normal part of data analysis. What\u2019s important is to hone your ability to spot and rectify errors. If data analytics were always straightforward, it certainly wouldn\u2019t be as interesting. Use the steps we\u2019ve outlined as a framework, stay open-minded, and be creative. If you lose your way, you can refer back to the process to keep yourself on track.<\/p>\n\n\n\n<h2 id=\"7-summary\">07. Summary<\/h2>\n\n\n\n<p>In this post, we\u2019ve covered the main steps of the data analytics process. These core steps can be amended, reordered, and reused as you see fit, but they underpin every data analyst\u2019s work:<\/p>\n\n\n\n<ul><li><strong>Define the question<\/strong>\u2014What business problem are you trying to solve? Frame it as a question to help you focus on finding a clear answer.<\/li><li><strong>Collect data<\/strong>\u2014Create a strategy for collecting data. Which data sources are most likely to help you solve your business problem?<\/li><li><strong>Clean the data<\/strong>\u2014Explore, scrub, tidy, de-dupe, and structure your data as needed. Do whatever you have to! But don\u2019t rush\u2026take your time!<\/li><li><strong>Analyze the data<\/strong>\u2014Carry out various analyses to obtain insights. Focus on the four types of data analysis: descriptive, diagnostic, predictive, and prescriptive.<\/li><li><strong>Share your results<\/strong>\u2014How best can you share your insights and recommendations? A combination of visualization tools and communication is key.<\/li><li><strong>Embrace your mistakes<\/strong>\u2014Mistakes happen. Learn from them. 
This is what transforms a good data analyst into a great one.<\/li><\/ul>\n\n\n\n<p><strong>What next?<\/strong> From here, we strongly encourage you to explore the topic on your own. Get creative with the steps in the<strong> data analysis process<\/strong>, and see what tools you can find. As long as you stick to the core principles we\u2019ve described, you can create a tailored technique that works for you.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Like any scientific discipline, data analysis follows a rigorous step-by-step process. Each stage requires different skills and know-how. To get meaningful insights, though, it\u2019s important to understand the process as a whole. An underlying framework is invaluable for producing results that stand up to scrutiny. In this post, we\u2019ll explore the main steps in the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":4908,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[2,175,131,14],"tags":[127,78,132,137],"jetpack_featured_media_url":"https:\/\/azoora.com\/blog\/wp-content\/uploads\/2020\/08\/data-analysis-process-guide.jpg","jetpack_publicize_connections":[],"jetpack_shortlink":"https:\/\/wp.me\/p7FQPL-1h9","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/posts\/4907"}],"collection":[{"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/comments?post=4907"}],"version-history":[{"count":1,"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/posts\/4907\/revisions"}],"predecesso
r-version":[{"id":4909,"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/posts\/4907\/revisions\/4909"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/media\/4908"}],"wp:attachment":[{"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/media?parent=4907"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/categories?post=4907"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/azoora.com\/blog\/wp-json\/wp\/v2\/tags?post=4907"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}