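Count how many lines are in a CSV (Python)
I'm using Python (Django Framework) to read a CSV file. As you can see, I only take 2 lines from this CSV. What I have been trying to do is store the total number of rows in the CSV in a variable.
How can I get the total number of rows?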
file = object.myfilePath
fileObject = csv.reader(file)
for i in range(2):
    data.append(next(fileObject))
I have tried:
len(fileObject)
fileObject.length
You need to count the number of rows:
row_count = sum(1 for row in fileObject) # fileObject is your csv.reader
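Using sum() with a generator expression makes for an efficient counter that avoids storing the whole file in memory.
If you already read 2 rows to start with, you need to add those 2 rows to the total; rows that have already been read are not counted.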
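The point about already-read rows can be sketched as follows. This is only a minimal illustration, assuming Python 3; the helper name count_rows_after_peek and the in-memory sample data are hypothetical and not part of the original answer.

```python
import csv
import io


def count_rows_after_peek(lines, peek=2):
    """Read up to `peek` rows first, then count the rest and add them back."""
    reader = csv.reader(lines)
    head = []
    for _ in range(peek):
        try:
            head.append(next(reader))
        except StopIteration:
            break  # the input has fewer rows than `peek`
    # The reader is now positioned after the rows stored in `head`,
    # so sum() only sees the rows that have not been read yet.
    remaining = sum(1 for _ in reader)
    return head, len(head) + remaining


# Hypothetical sample: one header row plus three data rows.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
head, total = count_rows_after_peek(sample)
print(head)
print(total)
```

Here total includes the two rows that were consumed before counting, which is the adjustment described above.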
Edit 2018-10-29
Thank you for the comments.
I tested several kinds of code for getting the number of lines in a csv file, in terms of speed. The best method is below.
with open(filename) as f:
    sum(1 for line in f)
Here is the code that was tested.
import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number = 100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f,1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)
The results were as follows.
# sum1forline
Elapsed time : 0.6327946722068599
n = 2528244
# lenopenreadlines
Elapsed time : 0.655304473598555
n = 2528244
# lenpd
Elapsed time : 0.7561274056295324
n = 2528244
# csvreaderfor
Elapsed time : 1.5571560935772661
n = 2528244
# openenum
Elapsed time : 0.773000013928679
n = 2528244
In conclusion, sum(1 for line in f) is the fastest, but the difference from len(f.readlines()) may not be significant.
sample_submission.csv is 30.2 MB and has 31 million characters.
To do it you need to have a bit of code like my example here:
with open("Task1.csv") as file:
    numline = len(file.readlines())
print(numline)
I hope this helps everyone.
Several of the above suggestions count the number of LINES in the csv file. But some CSV files will contain quoted strings which themselves contain newline characters. MS CSV files usually delimit records with \r\n, but use \n alone within quoted strings.
For a file like this, counting lines of text (as delimited by newline) in the file will give too large a result. So for an accurate count you need to use csv.reader to read the records.
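The difference between counting lines and counting records can be seen in a small sketch. It assumes Python 3 and uses a made-up in-memory sample instead of a real file; the variable names are illustrative only.

```python
import csv
import io

# Hypothetical sample: a header plus two records. Records end with \r\n,
# and the last record has a quoted field containing a bare \n.
text = 'id,comment\r\n1,"plain"\r\n2,"first line\nsecond line"\r\n'

# newline="" keeps the line endings untranslated, as the csv module expects.
physical_lines = sum(1 for _ in io.StringIO(text, newline=""))
records = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))

print(physical_lines)  # counts the embedded newline as an extra line
print(records)         # counts logical CSV records
```

For this sample the plain line count is one higher than the record count, because the quoted field is split across two physical lines.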
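row_count = sum(1 for line in open(filename))
worked for me.
Note: sum(1 for line in csv.reader(filename)) does not give the row count. csv.reader expects an open file object (or another iterable of lines), not a file name; given a string, it iterates over the characters of the name instead of the contents of the file.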
numline = len(file_read.readlines())
Use list() to turn the reader into a more workable object.
You can then count, skip, and mutate to your heart's desire. Convert only once, because the reader is exhausted after the first pass:
rows = list(fileObject) # list of rows
len(rows) # number of rows in the file
rows[10:] # skip the first 10 rows
First you have to open the file with open:
input_file = open("nameOfFile.csv","r+")
Then use csv.reader to read the csv:
reader_file = csv.reader(input_file)
Finally, you can get the number of rows with len:
value = len(list(reader_file))
The complete code is:
input_file = open("nameOfFile.csv","r+")
reader_file = csv.reader(input_file)
value = len(list(reader_file))
Remember that if you want to reuse the csv file, you have to call input_file.seek(0), because building a list from reader_file reads the whole file and leaves the file pointer at the end.
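A minimal sketch of the rewind step, assuming Python 3. The temporary file and its contents are made up for illustration; only open, csv.reader, len(list(...)) and seek(0) correspond to the steps described above.

```python
import csv
import os
import tempfile

# Write a small throwaway CSV (hypothetical data) so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    csv.writer(tmp).writerows([["id", "name"], [1, "a"], [2, "b"]])
    path = tmp.name

with open(path, newline="") as input_file:
    row_count = len(list(csv.reader(input_file)))  # consumes the whole file
    exhausted = list(csv.reader(input_file))       # nothing left to read
    input_file.seek(0)                             # rewind to the start
    rows = list(csv.reader(input_file))            # the rows are available again

os.remove(path)
print(row_count, len(exhausted), len(rows))
```

Opening the file in a with block is a design choice of this sketch, so the file is closed even if reading fails.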
When you instantiate a csv.reader object and iterate over the whole file, you can then read its line_num attribute, which holds the number of lines read from the source. This equals the row count as long as no record spans multiple lines:
import csv
with open('csv_path_file') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        pass
print(csv_reader.line_num)
You might want to try something as simple as the following on the command line:
sed -n '$=' filename
or
wc -l filename
import csv
count = 0
# Python 3: open in text mode with newline='' as the csv module recommends
with open('filename.csv', newline='') as count_file:
    csv_reader = csv.reader(count_file)
    for row in csv_reader:
        count += 1
print(count)
This works for csv and any other text file on Unix-based OSes:
import os
numOfLines = int(os.popen('wc -l < file.csv').read()[:-1])
If the csv file contains a header row, you can subtract one from numOfLines above:
numOfLines = numOfLines - 1
Try:
import pandas as pd
data = pd.read_csv("data.csv")
data.shape
In the output you will see something like (aa, bb), where aa is the number of rows. With the default settings the header row is not counted.
I think we can improve the best answer a little bit. I'm using:
row_count = sum(1 for _ in reader)
Moreover, we shouldn't forget that pythonic code does not always give the best performance in a project. For example, if several operations need to run over the same data set, it is better to do them all in one loop than to write two or more pythonic loops.
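As a rough sketch of the single-loop idea, assuming Python 3: the row count is accumulated in the same pass that does the other work. The sample data and the price total are hypothetical and stand in for whatever processing the project actually needs.

```python
import csv
import io

# Hypothetical sample data: a header row followed by three data rows.
sample = io.StringIO("item,price\napple,3\npear,5\nplum,2\n")

reader = csv.reader(sample)
header = next(reader)  # skip the header row

row_count = 0
total_price = 0
for row in reader:
    row_count += 1              # counting happens in the same pass...
    total_price += int(row[1])  # ...as the other per-row work

print(row_count, total_price)
```

This avoids reading the file twice, at the cost of a slightly less compact loop than the sum() one-liner.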
import pandas as pd
data = pd.read_csv('data.csv')
totalInstances=len(data)
Reference URL: https://stackoverflow.com/questions/16108526/count-how-many-lines-are-in-a-csv-python