Thursday, December 10, 2020

Python Program to Append to a File

In the post Python Program to Write a File we saw options to write to a file in Python, but opening a file in write mode has the drawback of overwriting the existing content. If you want to keep adding content to an existing file you should open the file in append mode. In this tutorial we’ll see options to append to a file in Python.

Append mode in Python I/O

To append data to a file, i.e. add content to the end of an existing file, you should open the file in append mode (‘a’). If the file doesn’t exist, a new file is created for writing content.

Appending to a file in Python

The following method opens the passed file in append mode and then adds content to the end of the file.

def append_file(fname):
  with open(fname, 'a') as f:
    f.write('This line is added to the already existing content')

Using ‘a+’ mode to write and read file

The following program opens a file in ‘a+’ mode for both appending and reading. The program also uses the tell() method to get the current position of the file pointer and the seek() method to move to the beginning of the file.

def append_file(fname):
  with open(fname, 'a+') as f:
    f.write('This line is added to the already existing content')
    f.flush()
    print("Current position of file pointer- ", f.tell())
    f.seek(0, 0)
    s = f.read()
    print('Content- ', s)

That's all for the topic Python Program to Append to a File. If something is missing or you have something to share about the topic please write a comment.



Wednesday, December 2, 2020

Spring Boot With Docker Example

In this tutorial you’ll see how to build a Docker image for running a Spring Boot application. We’ll create a basic Dockerfile to dockerize a Spring Boot MVC application where the view is created using Thymeleaf.

Maven Dependencies

Since we are creating a web application we need spring-boot-starter-web, and for Thymeleaf we need spring-boot-starter-thymeleaf. The spring-boot-maven-plugin is also added to our pom.xml. This plugin provides many convenient features-

  • It helps to create an executable jar (über-jar), which makes it more convenient to execute and transport your service.
  • It also searches for the public static void main() method to flag the class having this method as a runnable class.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.knpcode</groupId>
  <artifactId>SprinBootProject</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>SpringBootProject</name>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.3.0.RELEASE</version>
  </parent>
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-thymeleaf</artifactId>
     </dependency>
     <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-devtools</artifactId>
      <optional>true</optional>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
</project>

Classes for Spring Boot Web Application

We’ll add a simple controller for our web application.

import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;

@Controller
public class MessageController {
  @GetMapping("/")
  public String showMessage(Model model) { 
    model.addAttribute("msg", "Welcome to Docker");
    return "message";
  }
}

View (Thymeleaf template)

In src/main/resources add a new folder templates (the default location where Spring Boot looks for Thymeleaf templates) and in it create a message.html file.

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Spring Boot With Docker</title>
</head>
<body>
 <div>
    <p th:text="${msg}"></p>
 </div>
</body>
</html>

Application Class

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringBootProjectApp {
  public static void main(String[] args) {
    SpringApplication.run(SpringBootProjectApp.class, args);
  }
}

Running the application

You can run this Spring Boot web application as a standalone Java application but we'll run it by creating an executable jar.

For creating a completely self-contained executable jar file run mvn package from the command line. Note that you should be in your Spring Boot project directory.

knpcode:SprinBootProject$ mvn package

To run application using the created jar, you can use the java -jar command, as follows-

java -jar target/SprinBootProject-0.0.1-SNAPSHOT.jar

But we’ll do the same thing by creating a Dockerfile.

Dockerfile

For running your application in a Docker container you need to create an image, which is a read-only template with instructions for creating a Docker container.

For creating a Docker image you create a Dockerfile, which is a text file with a simple syntax for defining the steps needed to create the image and run it. Each instruction in a Dockerfile creates a layer in the image.

Create a text file named Dockerfile in your project directory and copy the following text into it. Note that docker build looks for a file named Dockerfile by default.

FROM openjdk:8-jdk-alpine

ARG JAR_FILE=target/SprinBootProject-0.0.1-SNAPSHOT.jar

COPY ${JAR_FILE} app.jar

ENTRYPOINT ["java","-jar","/app.jar"]

  1. Often, an image is based on another image, with some additional customization. That is true in our case too; the base image used here is openjdk:8-jdk-alpine. This image is based on the popular Alpine Linux project, which is much smaller than most distribution base images (~5MB) and thus leads to much slimmer images in general.
  2. Then assign a name (JAR_FILE) to the jar path.
  3. Copy the jar file into the image as app.jar.
  4. Execute the jar using the ENTRYPOINT instruction by providing arguments in the following form- ENTRYPOINT ["executable", "param1", "param2"], which makes it equivalent to running java -jar app.jar inside the container.

Create a Docker image

You can create a Docker image by running a command of the following form-

sudo docker build -t name:tag .

For our project, the command to create the Docker image is-

sudo docker build -t sbexample:1.0 .

Here -t tags the image as sbexample:1.0 and the trailing . means the current directory is used as the build context.
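
If the build succeeds you can verify that the image has been created by listing the local images-

sudo docker images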

To create a container (run an image)

The docker run command must specify an image to derive the container from.

sudo docker run -d -p 8080:8080 sbexample:1.0

Here the options are-

-d Starts the container in detached mode (runs the container in the background).

-p Publishes a container port to a host port; here host port 8080 is mapped to container port 8080, the port the embedded server listens on.
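
If host port 8080 is already in use you can publish the application on a different host port, for example 9090; the container still listens on 8080 (shown here just as a hypothetical variation)-

sudo docker run -d -p 9090:8080 sbexample:1.0

In that case the application would be reachable at http://localhost:9090/ instead.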

If everything works fine then you will have a dockerized Spring Boot application at this point, which you can access by typing the URL http://localhost:8080/ in a browser.

Spring Boot With Docker

If you want to see the running containers use the following command-

sudo docker ps

To stop a running container use the following command-

sudo docker stop container_id

That's all for the topic Spring Boot With Docker Example. If something is missing or you have something to share about the topic please write a comment.



Saturday, August 1, 2020

Installing Java in Windows

This post shows how to install Java in Windows 10. Steps for installing Java in Windows are as follows.

  1. Downloading the latest version of Java.
  2. Installing Java by using the downloaded JDK installer.
  3. Setting the PATH Environment Variable

Downloading Java

You need to download the JDK installer from this location- http://www.oracle.com/technetwork/java/javase/downloads/index.html

The latest version of Java is at the top, which happens to be Java SE 10.0.2 at the time of writing; by scrolling down you can see the other versions of the JDK too.

Installing Java Windows

Click the download button below JDK.

You can download the JRE if you just want to run Java applications on your system, but the JDK contains many other tools helpful for development. As per the Oracle site, this is the description-

  • Software Developers: JDK (Java SE Development Kit). For Java Developers. Includes a complete JRE plus tools for developing, debugging, and monitoring Java applications.
  • End user running Java on a desktop: JRE (Java Runtime Environment). Covers most end-users' needs. Contains everything required to run Java applications on your system.

Once you click the download button you are taken to the page where you need to accept the license agreement and download the Java exe file as per your requirement. Note that from Java 9 onwards only a 64-bit JDK download is available; up to Java 8 there was a choice of both 32-bit and 64-bit.

Installing Java using JDK installer in Windows

Once you have the exe file downloaded you can run it by double-clicking it, which will start the JDK installer. Just follow the instructions provided by the installation wizard.

Setting the PATH Environment Variable

It is useful to set the PATH variable permanently for Java so that it is persistent after the system is rebooted. Otherwise you will have to give the full path every time you run a program.

To set the PATH variable permanently you need to add the path to the jdk-10\bin directory of your Java installation to the PATH variable. Typically the full path would be similar to the following-

C:\Program Files\Java\jdk-10.0.1\bin 

To set the PATH variable in Windows 10, right-click on “This PC” and select Properties. Alternatively you can go to Control Panel – System to open the same screen.
Then click Advanced System Settings – Advanced – Environment Variables.

There you need to add the location of the bin folder of the JDK installation to the Path variable in System Variables.

For that, select Path, click Edit and add the path to the bin directory of your Java installation- C:\Program Files\Java\jdk-10.0.1\bin

Note that from Java 9 onwards the JAVA_HOME variable is not needed. If you are installing an older version you also need to add the JAVA_HOME variable. Under System variables click New. In the Variable name field enter “JAVA_HOME” and as the value enter the JDK installation path- C:\Program Files\Java\jdk-10.0.1
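
Once the PATH variable is set you can verify the installation by opening a new Command Prompt and running the following command, which should print the installed Java version-

java -version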

If you want to write your first Java program after installing Java, have a look at this post- Writing First Java Program – Hello World

That’s all for the topic Installing Java in Windows. If something is missing or you have something to share about the topic please write a comment.



Thursday, May 14, 2020

Counters in Hadoop MapReduce

Counters in Hadoop MapReduce help in getting statistics about a MapReduce job. With counters you get general information about the executed job, like the number of launched map and reduce tasks or the number of map input records. You can use that information to diagnose whether there is any problem with the data, and you can also use it for performance tuning; for example, counters give you information about spilled records and memory used, and using that information you can try to fine tune your job.

Types of counters in Hadoop

Within Hadoop there are many built-in counters for a MapReduce job. They are displayed on the console after running the job, or you can use the web UI to analyze those counters.

You can also have user defined counters. So there are two types of counters in Hadoop.

  1. Built-in counters
  2. User defined counters

Built-in counters in Hadoop

Built-in counters in Hadoop can be divided into the following groups; these counters are defined as Enums in the Hadoop framework.

  1. File System Counters- org.apache.hadoop.mapreduce.FileSystemCounter
  2. Map-Reduce Framework Counters- org.apache.hadoop.mapreduce.TaskCounter
  3. File Input Format Counters- org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
  4. File Output Format Counters- org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
  5. Job Counters- org.apache.hadoop.mapreduce.JobCounter

File System Counters in Hadoop

  • Number of bytes read (BYTES_READ)- Shows the number of bytes read by Map and Reduce tasks. There will be a separate entry for each file system. As example if bytes are read from both local file system and HDFS then there will be two entries prefixed with FILE: and HDFS:.
  • Number of bytes written (BYTES_WRITTEN)- Shows the number of bytes written by Map and Reduce tasks.
  • Number of read operations (READ_OPS)- Shows the number of read operations (like opening a file) by both Map and Reduce tasks.
  • Number of large read operations (LARGE_READ_OPS)- Shows the number of large read operations (like going through a large directory structure) by both Map and Reduce tasks.
  • Number of write operations (WRITE_OPS)- Shows the number of write operations (like creating a file, appending to it) by both Map and Reduce tasks.

Map-Reduce Framework Counters

  • Map input records (MAP_INPUT_RECORDS)- The number of records processed by all the maps.
  • Map output records (MAP_OUTPUT_RECORDS)- The number of output records emitted by all the maps.
  • Map skipped records (MAP_SKIPPED_RECORDS)– The number of records skipped by all the maps.
  • Map output bytes (MAP_OUTPUT_BYTES)- Output of all the maps in bytes.
  • Map output materialized bytes (MAP_OUTPUT_MATERIALIZED_BYTES)- Output bytes written to the disk.
  • Input split bytes (SPLIT_RAW_BYTES)- Metadata about the input splits in bytes.
  • Combine input records (COMBINE_INPUT_RECORDS)- The number of input records processed by combiner.
  • Combine output records (COMBINE_OUTPUT_RECORDS)- The number of output records emitted by combiner.
  • Reduce input groups (REDUCE_INPUT_GROUPS)- The number of key groups processed by all the Reducers.
  • Reduce shuffle bytes (REDUCE_SHUFFLE_BYTES)- Map output copied to Reducers in bytes.
  • Reduce input records (REDUCE_INPUT_RECORDS)- The number of input records processed by all the Reducers.
  • Reduce output records (REDUCE_OUTPUT_RECORDS)- The number of output records emitted by all the Reducers.
  • Reduce skipped records (REDUCE_SKIPPED_RECORDS)- The number of records skipped by Reducer.
  • Spilled Records (SPILLED_RECORDS)- The number of records spilled to the disk.
  • Shuffled Maps (SHUFFLED_MAPS)- The number of map output files copied to nodes where reducers are running.
  • Failed Shuffles (FAILED_SHUFFLE)- The number of map output files failed during shuffle.
  • Merged Map outputs (MERGED_MAP_OUTPUTS)- The number of map outputs merged to create input for the Reducers.
  • GC time elapsed (GC_TIME_MILLIS)- Time spent in garbage collection.
  • CPU time spent (CPU_MILLISECONDS)- CPU time spent for task processing.
  • Physical memory snapshot (PHYSICAL_MEMORY_BYTES)- Total physical memory used.
  • Virtual memory snapshot (VIRTUAL_MEMORY_BYTES)- Total virtual memory used.
  • Total committed heap usage (COMMITTED_HEAP_BYTES)- Total amount of heap memory available.

File Input Format Counters in Hadoop

  • Bytes Read (BYTES_READ)– Bytes read by Map tasks using the Input format used for the task.

File Output Format Counters in Hadoop

  • Bytes Written (BYTES_WRITTEN)- Bytes written by Map and reduce tasks using the Output format used for the task.

Job Counters in Hadoop

  • Launched map tasks (TOTAL_LAUNCHED_MAPS)- Total number of launched map tasks.
  • Launched reduce tasks (TOTAL_LAUNCHED_REDUCES)- Total number of launched reduce tasks.
  • Failed map tasks (NUM_FAILED_MAPS)- The number of failed map tasks.
  • Failed reduce tasks (NUM_FAILED_REDUCES)- The number of failed reduce tasks.
  • Killed map tasks (NUM_KILLED_MAPS)- The number of killed map tasks.
  • Killed reduce tasks (NUM_KILLED_REDUCES)- The number of killed reduce tasks.
  • Data-local map tasks (DATA_LOCAL_MAPS)- The number of map tasks running on the same node where the data they process also resides.
  • Rack-local map tasks (RACK_LOCAL_MAPS)- The number of map tasks running on a node in the same rack where the data they process also resides.
  • Launched uber tasks (TOTAL_LAUNCHED_UBERTASKS)- Total number of launched uber tasks.
  • Map in uber tasks (NUM_UBER_SUBMAPS)- The number of maps run as uber task.
  • Reduce in uber tasks (NUM_UBER_SUBREDUCES)- The number of reduces run as uber task.
  • Failed uber tasks (NUM_FAILED_UBERTASKS)- The number of failed uber tasks.
  • Total time spent by all map tasks (ms) (MILLIS_MAPS)- Time spent in running all the map tasks.
  • Total time spent by all reduce tasks (ms) (MILLIS_REDUCES)- Time spent in running all the reduce tasks.
  • Total vcore-milliseconds taken by all map tasks (VCORES_MILLIS_MAPS)- Total Vcore time taken by all map tasks.
  • Total vcore-milliseconds taken by all reduce tasks (VCORES_MILLIS_REDUCES)- Total Vcore time taken by all reduce tasks.

As you can see from the description of the counters, File System Counters, Map-Reduce Framework Counters, File Input Format Counters and File Output Format Counters provide statistics about the tasks in the MapReduce job. On the other hand, Job Counters provide statistics about the overall job.
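
Although these counters are printed on the console once the job finishes, you can also read them from the driver code. The following is a minimal sketch, assuming a Job instance named job on which waitForCompletion() has already been called; it needs imports for org.apache.hadoop.mapreduce.Counter, org.apache.hadoop.mapreduce.TaskCounter and org.apache.hadoop.mapreduce.JobCounter.

// Look up built-in counters using the enums listed above
Counter mapInputRecords = job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS);
Counter launchedMaps = job.getCounters().findCounter(JobCounter.TOTAL_LAUNCHED_MAPS);
System.out.println(mapInputRecords.getDisplayName() + "=" + mapInputRecords.getValue());
System.out.println(launchedMaps.getDisplayName() + "=" + launchedMaps.getValue());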

User defined counters in Hadoop

You can also create user defined counters in Hadoop MapReduce. Counters also help with debugging, as you can create a counter, increment it when some condition is met and then check the counter output, which gives you an idea whether there is anything wrong with the data.

For creating a counter you can use a Java enum. Each field in the enum is a counter name whereas the enum itself is the group these counters belong to.

User defined counter Hadoop MapReduce example

As an example, if you have data about stock symbol, price and number of transactions, and you want to check for records where the transaction count is missing, you can create a counter in your MapReduce job to do that.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class StockData extends Configured implements Tool{
  enum Stock {
    TRANSACTION_MISSING
  }
  // Mapper 1
  public static class StockFieldMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
    private Text symbol = new Text();
    Integer trans = 0;
    public void map(LongWritable key, Text value, Context context) 
        throws IOException, InterruptedException {
      // Splitting the line on tab
      String[] stringArr = value.toString().split("\t");
      //Setting symbol and transaction values
      symbol.set(stringArr[0]);
      if(stringArr[2] != null && !stringArr[2].trim().equals("")) {
        trans = Integer.parseInt(stringArr[2]);
      }else {
        // incrementing counter
        context.getCounter(Stock.TRANSACTION_MISSING).increment(1);
        trans = 0;
      }      
      context.write(symbol, new IntWritable(trans));
    }
  }
	
  // Reduce function
  public static class TotalTransReducer extends Reducer<Text, IntWritable, Text, IntWritable>{    
    public void reduce(Text key, Iterable<IntWritable> values, Context context) 
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }      
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    int exitFlag = ToolRunner.run(new StockData(), args);
    System.exit(exitFlag);
  }

  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    Job job = Job.getInstance(conf, "Stock data");
    job.setJarByClass(getClass());
    job.setMapperClass(StockFieldMapper.class);    
    job.setReducerClass(TotalTransReducer.class);	 
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }
}

Then in the counters that are displayed you would see something similar to the following-

org.knpcode.StockData$Stock
	TRANSACTION_MISSING=3
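
You can also read a user defined counter in the driver, for example after the waitForCompletion() call in the run() method shown above (a small sketch using the same Stock enum)-

// Read the user defined counter once the job has completed
long missing = job.getCounters().findCounter(Stock.TRANSACTION_MISSING).getValue();
System.out.println("Records with missing transaction count= " + missing);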

That's all for the topic Counters in Hadoop MapReduce. If something is missing or you have something to share about the topic please write a comment.

